mirror of
https://github.com/roapi/roapi.git
synced 2026-06-05 21:04:02 +08:00
Update the example HTTP port in the configuration sample to match other query/cli examples in the README file. All examples used port `8080`, but we have `8084` in the configuration example. This is not a major issue, but if the ports are the same everywhere, it will be easier for new users to run the project and test the examples. The main documentation has also been updated in [this PR](https://github.com/roapi/docs/pull/21).
379 lines
10 KiB
Markdown
379 lines
10 KiB
Markdown
# ROAPI
|
|
|
|

|
|
[](https://roapi.github.io/docs/index.html)
|
|
[](https://discord.gg/FchMNYTm7G)
|
|
|
|
ROAPI automatically spins up read-only APIs for static datasets without
|
|
requiring you to write a single line of code. It builds on top of [Apache
|
|
Arrow](https://github.com/apache/arrow) and
|
|
[Datafusion](https://github.com/apache/arrow-datafusion). The
|
|
core of its design can be boiled down to the following:
|
|
|
|
- [Query frontends](https://roapi.github.io/docs/api/query/index.html) to
|
|
translate SQL, FlightSQL, GraphQL and REST API queries into
|
|
Datafusion plans.
|
|
- Datafusion for query plan execution.
|
|
- [Data layer](https://roapi.github.io/docs/config/dataset-formats/index.html)
|
|
to load datasets from a variety of sources and formats with automatic schema
|
|
inference.
|
|
- [Response encoding layer](https://roapi.github.io/docs/api/response.html) to
|
|
serialize intermediate Arrow record batch into various formats requested by
|
|
client.
|
|
|
|
See below for a high level diagram:
|
|
|
|
<img alt="roapi-design-diagram" src="https://roapi.github.io/docs/images/roapi.png">
|
|
|
|
## Installation
|
|
|
|
### Install pre-built binary
|
|
|
|
```bash
|
|
# if you are using homebrew
|
|
brew install roapi
|
|
# or if you prefer pip
|
|
pip install roapi
|
|
```
|
|
|
|
Check out [Github release page](https://github.com/roapi/roapi/releases) for
|
|
pre-built binaries for each platform. Pre-built docker images are also available at
|
|
[ghcr.io/roapi/roapi](https://github.com/orgs/roapi/packages/container/package/roapi).
|
|
|
|
### Install from source
|
|
|
|
```bash
|
|
cargo install --locked --git https://github.com/roapi/roapi --branch main --bins roapi
|
|
```
|
|
|
|
## Usage
|
|
|
|
### Quick start
|
|
|
|
Spin up APIs for `test_data/uk_cities_with_headers.csv` and
|
|
`test_data/spacex_launches.json`:
|
|
|
|
```bash
|
|
roapi \
|
|
--table "uk_cities=test_data/uk_cities_with_headers.csv" \
|
|
--table "test_data/spacex_launches.json"
|
|
```
|
|
|
|
For windows, full scheme(file:// or filesystem://) must filled, and use double quote(") instead of single quote(') to escape windows cmdline limit:
|
|
|
|
```bash
|
|
roapi \
|
|
--table "uk_cities=file://d:/path/to/uk_cities_with_headers.csv" \
|
|
--table "file://d:/path/to/test_data/spacex_launches.json"
|
|
```
|
|
|
|
Or using docker:
|
|
|
|
```bash
|
|
docker run -t --rm -p 8080:8080 ghcr.io/roapi/roapi:latest --addr-http 0.0.0.0:8080 \
|
|
--table "uk_cities=test_data/uk_cities_with_headers.csv" \
|
|
--table "test_data/spacex_launches.json"
|
|
```
|
|
|
|
For MySQL and SQLite, use parameters like this.
|
|
```
|
|
--table "table_name=mysql://username:password@localhost:3306/database"
|
|
--table "table_name=sqlite://path/to/database"
|
|
```
|
|
|
|
Want dynamic register data? Add parameter `-d` to command. `--table` parameter cannot be ignored for now.
|
|
```bash
|
|
roapi \
|
|
--table "uk_cities=test_data/uk_cities_with_headers.csv" \
|
|
-d
|
|
```
|
|
|
|
Then post config to `/api/table` register data.
|
|
```bash
|
|
curl -X POST http://172.24.16.1:8080/api/table \
|
|
-H 'Content-Type: application/json' \
|
|
-d '[
|
|
{
|
|
"tableName": "uk_cities2",
|
|
"uri": "./test_data/uk_cities_with_headers.csv"
|
|
},
|
|
{
|
|
"tableName": "table_name",
|
|
"uri": "sqlite://path/to/database"
|
|
}
|
|
]'
|
|
```
|
|
|
|
Query tables using SQL, GraphQL or REST:
|
|
|
|
```bash
|
|
curl -X POST -d "SELECT city, lat, lng FROM uk_cities LIMIT 2" localhost:8080/api/sql
|
|
curl -X POST -d "query { uk_cities(limit: 2) {city, lat, lng} }" localhost:8080/api/graphql
|
|
curl "localhost:8080/api/tables/uk_cities?columns=city,lat,lng&limit=2"
|
|
```
|
|
|
|
Get inferred schema for all tables:
|
|
|
|
```bash
|
|
curl 'localhost:8080/api/schema'
|
|
```
|
|
|
|
### Config file
|
|
|
|
You can also configure multiple table sources using YAML or Toml config, which supports more
|
|
advanced format specific table options:
|
|
|
|
```yaml
|
|
addr:
|
|
http: 0.0.0.0:8080
|
|
postgres: 0.0.0.0:5433
|
|
tables:
|
|
- name: "blogs"
|
|
uri: "test_data/blogs.parquet"
|
|
|
|
- name: "ubuntu_ami"
|
|
uri: "test_data/ubuntu-ami.json"
|
|
option:
|
|
format: "json"
|
|
pointer: "/aaData"
|
|
array_encoded: true
|
|
schema:
|
|
columns:
|
|
- name: "zone"
|
|
data_type: "Utf8"
|
|
- name: "name"
|
|
data_type: "Utf8"
|
|
- name: "version"
|
|
data_type: "Utf8"
|
|
- name: "arch"
|
|
data_type: "Utf8"
|
|
- name: "instance_type"
|
|
data_type: "Utf8"
|
|
- name: "release"
|
|
data_type: "Utf8"
|
|
- name: "ami_id"
|
|
data_type: "Utf8"
|
|
- name: "aki_id"
|
|
data_type: "Utf8"
|
|
|
|
- name: "spacex_launches"
|
|
uri: "https://api.spacexdata.com/v4/launches"
|
|
option:
|
|
format: "json"
|
|
|
|
- name: "github_jobs"
|
|
uri: "https://web.archive.org/web/20210507025928if_/https://jobs.github.com/positions.json"
|
|
```
|
|
|
|
To run serve tables using config file:
|
|
|
|
```bash
|
|
roapi -c ./roapi.yml # or .toml
|
|
```
|
|
|
|
See [config
|
|
documentation](https://roapi.github.io/docs/config/config-file.html) for more
|
|
options including [using Google spreadsheet as a table
|
|
source](https://roapi.github.io/docs/config/dataset-formats/gsheet.html).
|
|
|
|
### Response serialization
|
|
|
|
By default, ROAPI encodes responses in JSON format, but you can request
|
|
different encodings by specifying the `ACCEPT` header:
|
|
|
|
```bash
|
|
curl -X POST \
|
|
-H 'ACCEPT: application/vnd.apache.arrow.stream' \
|
|
-d "SELECT launch_library_id FROM spacex_launches WHERE launch_library_id IS NOT NULL" \
|
|
localhost:8080/api/sql
|
|
```
|
|
|
|
### REST API query interface
|
|
|
|
You can query tables through REST API by sending `GET` requests to
|
|
`/api/tables/{table_name}`. Query operators are specified as query params.
|
|
|
|
REST query frontend currently supports the following query operators:
|
|
|
|
- columns
|
|
- sort
|
|
- limit
|
|
- filter
|
|
|
|
To sort column `col1` in ascending order and `col2` in descending order, set
|
|
query param to: `sort=col1,-col2`.
|
|
|
|
To find all rows with `col1` equal to string `'foo'`, set query param to:
|
|
`filter[col1]='foo'`. You can also do basic comparisons with filters, for
|
|
example predicate `0 <= col2 < 5` can be expressed as
|
|
`filter[col2]gte=0&filter[col2]lt=5`.
|
|
|
|
### GraphQL query interface
|
|
|
|
To query tables using GraphQL, send the query through `POST` request to
|
|
`/api/graphql` endpoint.
|
|
|
|
GraphQL query frontend supports the same set of operators supported by [REST
|
|
query frontend](https://roapi.github.io/docs/api/query/rest.html). Here how is
|
|
you can apply various operators in a query:
|
|
|
|
```graphql
|
|
{
|
|
table_name(
|
|
filter: { col1: false, col2: { gteq: 4, lt: 1000 } }
|
|
sort: [{ field: "col2", order: "desc" }, { field: "col3" }]
|
|
limit: 100
|
|
) {
|
|
col1
|
|
col2
|
|
col3
|
|
}
|
|
}
|
|
```
|
|
|
|
### SQL query interface
|
|
|
|
To query tables using a subset of standard SQL, send the query through `POST`
|
|
request to `/api/sql` endpoint. This is the only query interface that supports
|
|
table joins.
|
|
|
|
|
|
### Key value lookup
|
|
|
|
You can pick two columns from a table to use a key and value to create a quick
|
|
keyvalue store API by adding the following lines to the config:
|
|
|
|
```yaml
|
|
kvstores:
|
|
- name: "launch_name"
|
|
uri: "test_data/spacex_launches.json"
|
|
key: id
|
|
value: name
|
|
```
|
|
|
|
Key value lookup can be done through simple HTTP GET requests:
|
|
|
|
```bash
|
|
curl -v localhost:8080/api/kv/launch_name/600f9a8d8f798e2a4d5f979e
|
|
Starlink-21 (v1.0)%
|
|
```
|
|
|
|
### Query through Postgres wire protocol
|
|
|
|
ROAPI can present itself as a Postgres server so users can use Postgres clients
|
|
to issue SQL queries.
|
|
|
|
```bash
|
|
$ psql -h 127.0.0.1
|
|
psql (12.10 (Ubuntu 12.10-0ubuntu0.20.04.1), server 13)
|
|
WARNING: psql major version 12, server major version 13.
|
|
Some psql features might not work.
|
|
Type "help" for help.
|
|
|
|
houqp=> select count(*) from uk_cities;
|
|
COUNT(UInt8(1))
|
|
-----------------
|
|
37
|
|
(1 row)
|
|
```
|
|
|
|
|
|
## Features
|
|
|
|
Query layer:
|
|
|
|
- [x] REST API GET
|
|
- [x] GraphQL
|
|
- [x] SQL
|
|
- [x] join between tables
|
|
- [x] access to array elements by index
|
|
- [x] access to nested struct fields by key
|
|
- [ ] column index
|
|
- protocol
|
|
- [x] Postgres
|
|
- [x] FlightSQL
|
|
- [x] Key value lookup
|
|
|
|
Response serialization:
|
|
|
|
- [x] JSON `application/json`
|
|
- [x] Arrow `application/vnd.apache.arrow.stream`
|
|
- [x] Parquet `application/vnd.apache.parquet`
|
|
- [ ] msgpack
|
|
|
|
Data layer:
|
|
|
|
- [x] filesystem
|
|
- [x] HTTP/HTTPS
|
|
- [x] S3
|
|
- [x] GCS
|
|
- [x] Azure Storage
|
|
- [x] Google spreadsheet
|
|
- [x] MySQL
|
|
- [x] SQLite
|
|
- [x] Postgres
|
|
- [ ] Airtable
|
|
- Data format
|
|
- [x] CSV
|
|
- [x] JSON
|
|
- [x] NDJSON
|
|
- [x] parquet
|
|
- [x] xls, xlsx, xlsb, ods: https://github.com/tafia/calamine
|
|
- [x] [DeltaLake](https://delta.io/)
|
|
|
|
Misc:
|
|
|
|
- [ ] auto gen OpenAPI doc for rest layer
|
|
- [ ] query input type conversion based on table schema
|
|
- [ ] stream arrow encoding response
|
|
- [ ] authentication layer
|
|
|
|
## Development
|
|
|
|
The core of ROAPI, including query front-ends and data layer, lives in the
|
|
self-contained [columnq](https://github.com/roapi/roapi/tree/main/columnq)
|
|
crate. It takes queries and outputs Arrow record batches. Data sources will
|
|
also be loaded and stored in memory as Arrow record batches.
|
|
|
|
The [roapi](https://github.com/roapi/roapi/tree/main/roapi) crate wraps
|
|
`columnq` with a multi-protocol query layer. It serializes Arrow record batches
|
|
produced by `columnq` into different formats based on client request.
|
|
|
|
### Debug
|
|
|
|
To log all FlightSQL requests in console, set `RUST_LOG=tower_http=trace`.
|
|
|
|
### Build Docker image
|
|
|
|
```bash
|
|
docker build --rm -t ghcr.io/roapi/roapi:latest .
|
|
```
|
|
### VS Code DevContainer
|
|
|
|
#### Requirements
|
|
- [x] Vscode
|
|
- [x] Ensure this extension is installed on your vs code `ms-vscode-remote.remote-containers`
|
|
---
|
|
Once done you will see prompt from left side to reopen the project in dev container or open command palette and search for open with remote container:
|
|
|
|
1. install dependencies
|
|
```bash
|
|
apt-get update && apt-get install --no-install-recommends -y cmake
|
|
```
|
|
2. connect to database from your local using db client of choice using the following credentials
|
|
```
|
|
username: user
|
|
password: user
|
|
database: test
|
|
```
|
|
once done create table so you can map it in `-t` arg or consider using sample in `.devcontainer/db-migration.sql` to populate some tables with data
|
|
|
|
3. run cargo command with mysql db as feature
|
|
```bash
|
|
cargo run --bin roapi --features database -- -a localhost:8080 -t posts=mysql://user:user@db:3306/test
|
|
```
|
|
otherwise if you are looking for other features
|
|
you have to select appropriate one from `roapi/Cargo.toml`
|
|
|
|
|