12 Commits

Author SHA1 Message Date
Maksym Dovhal
81a25205b7
Update xlsx table (#316)
This PR is a fix for issue https://github.com/roapi/roapi/issues/259

List of updates/fixes:
* module xlsx renamed to excel.
* Allow reading not only xlsx format but also xls, ods, xlsb
* Allow Excel DateTime format and transform it to arrow
Timestamp(Seconds, None)
* Allow using NULLs in any data types and use null value instead of
string "null"
* Fix issue with incorrect data type inference when multiple data types
are detected.
* Add possibility to specify data schema in config.
* Add new options:
 - rows_range_start
 - rows_range_end
 - columns_range_start
 - columns_range_end
 - schema_inference_lines
* Make sheet_name optional; if it is not specified, then use the first
sheet by default

* Bump calamine crate to version 0.23.1 and add feature "dates"
(support for the DateTime column format)

Documentation updates: https://github.com/roapi/docs/pull/20
2024-01-31 20:16:34 -08:00
QP Hou
d531f8ae46
support partitioned tables (#307) 2023-10-20 16:00:07 -07:00
Joe
500e535caa
feat: support datafusion config (#288) 2023-08-13 01:09:12 +00:00
Y Togami
beac9cca6a
Fix: support .sqlite, .sqlite3 (#223)
* fix: support sqlite extension

* chore: add .sqlite, .sqlite3 test data
2022-11-11 15:47:31 -08:00
Y Togami
35549c0ddf
feat: support .xlsx (#218) 2022-10-28 18:46:09 -07:00
Y Togami
8180fb59d6
feat: Support jsonl for the file extension (#200)
* chore: add sample .ndjson .jsonl file

* test: add test for loading .ndjson and .jsonl

* feat: support jsonl for the file extension
2022-10-14 23:31:39 -07:00
zemel leong
3ace6078fa
Add MySQL, Sqlite support. (#162)
* added MySQL and Sqlite datasource support
* updated arrow, datafusion and deltalake to latest version
* cleared simd ci test cache to workaround nightly compiler bug
2022-04-07 00:55:10 -07:00
Thomas Peiselt
ea84099b07
Lazy load delta: Support for large tables (#71)
* Allow for delta tables to be directly backed by storage.

Enables experimental support for delta tables that are too large to be
stored in memory. We directly expose `DeltaTable` instead of copying the
data into a datafusion::Memtable.

Disadvantages:
- in the new mode, no support for S3
- as we're relying on datafusion to handle the parquet files directly,
  nested schemas and certain data types may not work properly.
2021-09-06 00:56:55 -07:00
Erwin Kroon
ff2d06b0e4
add support for all Arrow IPC formats in roapi-http (#67)
* add support for all arrow IPC formats in roapi-http

* refactor: schema inference and partitions in 1 loop
2021-09-05 14:27:09 -07:00
Thomas Peiselt
5ace8b8695
Lazy load parquet (#63)
* PoC for parquet: reading a table by registering parquet directly.

* Adding config flag and restoring existing _in-memory_ code path.

* Addressing review comment: separate `to_mem_table()`.

* Addressing review comment: default-able `LoadOptionParquet`.

* Adding test: make sure we instantiated `datafusion::datasource::ParquetTable`
2021-09-03 19:36:18 +00:00
Qingping Hou
95f9ced23d
support deriving table name from table uri 2021-07-22 23:38:28 -07:00
Qingping Hou
e0f213c316
split project into roapi-http and columnq crates 2021-02-17 13:46:20 -08:00