updated 8mo ago
Rust for Data Engineering
- Overview
Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow.
Ballista has a scheduler and an executor process that are standard Rust executables and can be executed directly, but Dockerfiles are provided to build images for use in containerized environments, such as Docker, Docker Compose, and Kube...
from Overview — Apache Arrow Ballista documentation
Nicolay Gerold added
- The ability to implement custom Polars plugins in Rust is invaluable. Since we process a lot of textual data for our NLP applications, we can create optimized functions to clean text or detect a language, with data being processed efficiently in batches. This level of customization is rarely seen in other typical processing engines and is even impo...
from Polars — Processing hundreds of GBs of textual data on a daily basis at MDPI
- Spice.ai OSS
What is Spice?
Spice is a small, portable runtime that provides developers with a unified SQL query interface to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.
Spice makes it easy to build data-driven and data-intensive applications by streamlining the use of data and mach...
from GitHub - spiceai/spiceai: A unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake. by spiceai
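The Polars excerpt above mentions optimized Rust functions that clean text in batches. As a rough illustration of that kind of function, here is a minimal sketch in plain standard-library Rust. It does not use the Polars plugin API, and the names `clean_text` and `clean_batch`, as well as the specific cleaning rules (dropping control characters, collapsing whitespace), are assumptions made for this example rather than anything described in the source.

```rust
/// Hypothetical cleaning rule: drop control characters, collapse runs of
/// whitespace into a single space, and trim both ends.
fn clean_text(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut pending_space = false;
    for ch in input.chars() {
        if ch.is_whitespace() {
            // Defer the space until the next visible character, so leading
            // and trailing whitespace never reach the output.
            pending_space = !out.is_empty();
        } else if ch.is_control() {
            // Skip non-whitespace control characters such as NUL.
            continue;
        } else {
            if pending_space {
                out.push(' ');
                pending_space = false;
            }
            out.push(ch);
        }
    }
    out
}

/// Apply `clean_text` to a whole batch at once. `None` stands in for a
/// missing (null) value and is passed through unchanged.
fn clean_batch(batch: &[Option<&str>]) -> Vec<Option<String>> {
    batch.iter().copied().map(|v| v.map(clean_text)).collect()
}

fn main() {
    let batch = [Some("  Rust \t for\n data  "), None, Some("NLP")];
    for value in clean_batch(&batch) {
        println!("{:?}", value);
    }
}
```

In a real plugin, the batch would presumably arrive as a columnar string array rather than a slice of `Option<&str>`; the per-value logic would stay the same.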