Sublime
An inspiration engine for ideas
For high-throughput data, Grab uses Apache Avro with a strategy called Merge on Read (MOR).
Here are the main operations with Merge on Read:
- Write Operations - When data is written, it's appended to the end of a log file. This is much more efficient than merging it into the current data, and it reduces write latency.
- Read Operations - When you need
The Architecture of Grab's Data Lake
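To make the Merge on Read trade-off concrete, here is a minimal in-memory sketch in Python. It assumes simple upsert-by-key semantics and ignores deletes, schemas, and file formats. The class and method names (MergeOnReadTable, write, read, compact) are hypothetical; this is not Grab's implementation or the API of any table format. Writes only append to a log, and merging with the base snapshot is deferred until read time.

```python
# Toy Merge on Read (MOR) table. Hypothetical names; not Grab's code or any
# table format's API. Assumes upsert-by-key semantics and ignores deletes.
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


class MergeOnReadTable:
    def __init__(self) -> None:
        # Base snapshot (stands in for compacted data files), keyed by record key.
        self.base: Dict[str, Record] = {}
        # Append-only log of (key, record) updates, kept in write order.
        self.log: List[Tuple[str, Record]] = []

    def write(self, key: str, record: Record) -> None:
        # Cheap write: append to the log without touching existing data.
        self.log.append((key, record))

    def read(self) -> Dict[str, Record]:
        # Merge at read time: start from the base, then replay the log so
        # later updates overwrite earlier values for the same key.
        view = dict(self.base)
        for key, record in self.log:
            view[key] = record
        return view

    def compact(self) -> None:
        # Fold the log into the base so future reads replay less.
        self.base = self.read()
        self.log.clear()


table = MergeOnReadTable()
table.write("ride-1", {"status": "booked"})
table.write("ride-1", {"status": "completed"})
print(table.read())  # {'ride-1': {'status': 'completed'}}
```

Because every read replays the log, the read cost grows with the number of pending writes; compact() stands in for the periodic compaction that keeps this in check.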
For low-throughput data, Grab uses Parquet with Copy on Write (CoW).
Here are the main operations for Copy on Write:
- Write Operations - Whenever there's a write, you create a new version of the file that includes the latest change. You can also keep the previous version for consistency and rollback purposes. This helps prevent data corruption, incon
The Architecture of Grab's Data Lake
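For contrast, the Copy on Write write path can be sketched the same way. Each write copies the latest snapshot, applies the change to the copy, and publishes it as a new version, keeping older versions for rollback. The names (CopyOnWriteTable, write, read, rollback) are hypothetical, and an in-memory dict stands in for a Parquet file; this is not Grab's implementation.

```python
# Toy Copy on Write (CoW) table. Hypothetical names; not Grab's code or any
# table format's API. Each version is a full snapshot standing in for a file.
from typing import Any, Dict, List

Record = Dict[str, Any]
Snapshot = Dict[str, Record]


class CopyOnWriteTable:
    def __init__(self) -> None:
        # Version 0 is an empty table; older versions are never modified.
        self.versions: List[Snapshot] = [{}]

    def write(self, key: str, record: Record) -> int:
        # Copy the latest snapshot, apply the change to the copy, and
        # publish it as a new version. Returns the new version number.
        new_snapshot = dict(self.versions[-1])
        new_snapshot[key] = record
        self.versions.append(new_snapshot)
        return len(self.versions) - 1

    def read(self, version: int = -1) -> Snapshot:
        # Reads need no merging: every version is already complete.
        return dict(self.versions[version])

    def rollback(self, version: int) -> int:
        # Old versions are kept, so rolling back republishes one of them.
        self.versions.append(dict(self.versions[version]))
        return len(self.versions) - 1


table = CopyOnWriteTable()
v1 = table.write("ride-1", {"status": "booked"})
v2 = table.write("ride-1", {"status": "completed"})
table.rollback(v1)
print(table.read())  # {'ride-1': {'status': 'booked'}}
```

The cost profile is the mirror image of Merge on Read: every write pays for a full copy, while reads are a single lookup, which is why this fits low-throughput data.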
Programmable platform for data in motion
An open-source data streaming platform with in-line computation capabilities. Apply your custom programs to aggregate, correlate, and transform data records in real-time as they move over the network.
The programmable data streaming platform
Fast-csv
Fast-csv is a library for parsing and formatting CSVs or any other delimited value file in Node.
Features
- CSV Formatting
- CSV Parsing
- Built using TypeScript.
- Flexible formatting and parsing options, to fit almost any scenario.
- Built with streams first to avoid creating large memory footprint when parsing large files.
- Battle tested in production, pa
C2FO • GitHub - C2FO/fast-csv: CSV parser and formatter for node
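fast-csv is a Node library, but the stream-first idea in its feature list can be illustrated with Python's standard-library csv module: rows are parsed and formatted one at a time as they flow through, instead of loading the whole file first. This is an analogue, not fast-csv's API; iter_rows and write_rows are hypothetical helper names.

```python
# Stream-first CSV parsing/formatting in Python's standard library, as an
# analogue of the idea fast-csv describes. This is NOT fast-csv's API;
# iter_rows and write_rows are hypothetical helper names.
import csv
import io
from typing import Dict, Iterable, Iterator, List, TextIO


def iter_rows(source: TextIO, delimiter: str = ",") -> Iterator[Dict[str, str]]:
    # DictReader pulls lines from the file object lazily, so rows are
    # produced one at a time rather than all being read into memory at once.
    yield from csv.DictReader(source, delimiter=delimiter)


def write_rows(rows: Iterable[Dict[str, str]], sink: TextIO,
               fieldnames: List[str], delimiter: str = ",") -> None:
    # Format rows one at a time as they arrive from the iterable.
    writer = csv.DictWriter(sink, fieldnames=fieldnames, delimiter=delimiter)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Convert a tab-delimited input to comma-delimited output, row by row.
tsv_input = io.StringIO("name\tcity\nAda\tLondon\nLinus\tHelsinki\n")
csv_output = io.StringIO()
write_rows(iter_rows(tsv_input, delimiter="\t"), csv_output, ["name", "city"])
print(csv_output.getvalue())
```

Chaining a generator into the writer means the pipeline never materializes the full file, which is the property that keeps the memory footprint small on large inputs. With a real file, open it with newline="" as the csv module documentation recommends.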
The ability to implement custom Polars plugins in Rust is invaluable. Since we process a lot of textual data for our NLP applications, we can create optimized functions to clean text or detect a language, with data being processed efficiently in batches. This level of customization is rarely seen in other typical processing engines and is even impo...
Polars — Processing hundreds of GBs of textual data on a daily basis at MDPI
Why gRPC?
gRPC is a modern, open-source, high-performance Remote Procedure Call (RPC) framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking and authentication. It is also applicable in the last mile of distributed computing to connect d...
gRPC

FastStream brokers provide convenient function decorators @broker.subscriber and @broker.publisher to allow you to delegate the actual process of:
- consuming and producing data to Event queues, and
- decoding and encoding JSON encoded messages
These decorators make it easy to specify the processing logic for your consumers and producers, allowing you to ...
airtai • GitHub - airtai/faststream: FastStream is a powerful and easy-to-use Python framework for building asynchronous services that interact with event streams such as Apache Kafka and RabbitMQ.
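The division of labor the FastStream excerpt describes (the decorator handles transport and JSON encoding, the function holds only business logic) can be imitated with a toy in-memory broker. ToyBroker, its publish method, and the topic names are hypothetical; this is not FastStream's implementation, it is synchronous, and it connects to no real Kafka or RabbitMQ broker.

```python
# Toy in-memory broker imitating the decorator style described above.
# ToyBroker and its methods are hypothetical; this is NOT FastStream's API
# and it uses no real Kafka or RabbitMQ connection.
import functools
import json
from collections import defaultdict
from typing import Any, Callable, Dict, List


class ToyBroker:
    def __init__(self) -> None:
        # Every message published to a topic, stored as JSON-encoded bytes.
        self.queues: Dict[str, List[bytes]] = defaultdict(list)
        # Consumer callbacks per topic; each receives the raw bytes.
        self.handlers: Dict[str, List[Callable[[bytes], Any]]] = defaultdict(list)

    def publish(self, topic: str, message: Any) -> None:
        # Encode to JSON, store, and deliver to every subscriber of the topic.
        raw = json.dumps(message).encode("utf-8")
        self.queues[topic].append(raw)
        for handler in self.handlers[topic]:
            handler(raw)

    def subscriber(self, topic: str) -> Callable:
        # Register func as a consumer; raw bytes are JSON-decoded before the call.
        def decorator(func: Callable) -> Callable:
            self.handlers[topic].append(lambda raw: func(json.loads(raw)))
            return func
        return decorator

    def publisher(self, topic: str) -> Callable:
        # Publish whatever func returns (if not None) to the given topic.
        def decorator(func: Callable) -> Callable:
            @functools.wraps(func)
            def wrapper(*args: Any, **kwargs: Any) -> Any:
                result = func(*args, **kwargs)
                if result is not None:
                    self.publish(topic, result)
                return result
            return wrapper
        return decorator


broker = ToyBroker()


@broker.subscriber("orders")
@broker.publisher("receipts")
def handle_order(order: Dict[str, Any]) -> Dict[str, Any]:
    # Business logic only: decoding, encoding, and routing live in the decorators.
    return {"order_id": order["id"], "total": order["qty"] * order["price"]}


broker.publish("orders", {"id": 7, "qty": 2, "price": 5})
print(json.loads(broker.queues["receipts"][0]))  # {'order_id': 7, 'total': 10}
```

Decorators apply bottom-up, so the subscriber sits on top: it registers the publishing wrapper, which means each consumed message's result is forwarded to the "receipts" topic automatically.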
ReadySet is a transparent database cache for Postgres & MySQL that gives you the performance and scalability of an in-memory key-value store without requiring that you rewrite your app or manually handle cache invalidation. ReadySet sits between your application and database and turns even the most complex SQL reads into lightning-fast lookups. ...
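The problem ReadySet addresses, caching query results while keeping them correct as data changes, can be illustrated with a deliberately naive read-through cache over Python's built-in sqlite3. CachingConnection is hypothetical and is NOT how ReadySet works: it requires the application to use an explicit wrapper, and it invalidates the entire cache on every write, which is exactly the kind of manual, coarse invalidation ReadySet says it removes.

```python
# Toy read-through cache in front of a SQL database, using sqlite3 from the
# standard library. CachingConnection is hypothetical and NOT how ReadySet
# works: it needs an explicit wrapper and invalidates the whole cache on writes.
import sqlite3
from typing import Any, Dict, List, Tuple

CacheKey = Tuple[str, Tuple[Any, ...]]


class CachingConnection:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Maps (SQL text, parameters) to the rows that query returned.
        self.cache: Dict[CacheKey, List[tuple]] = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql: str, params: Tuple[Any, ...] = ()) -> List[tuple]:
        # Serve repeated reads from memory; go to the database only on a miss.
        key = (sql, params)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        rows = self.conn.execute(sql, params).fetchall()
        self.cache[key] = rows
        return rows

    def execute(self, sql: str, params: Tuple[Any, ...] = ()) -> None:
        # Naive invalidation: any write drops every cached result, so reads
        # never see stale rows (at the cost of many unnecessary misses).
        self.conn.execute(sql, params)
        self.conn.commit()
        self.cache.clear()


db = CachingConnection(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE rides (id INTEGER PRIMARY KEY, city TEXT)")
db.execute("INSERT INTO rides (city) VALUES (?)", ("Singapore",))
count_sql = "SELECT COUNT(*) FROM rides WHERE city = ?"
first = db.query(count_sql, ("Singapore",))   # miss: reads the database
second = db.query(count_sql, ("Singapore",))  # hit: served from the cache
db.execute("INSERT INTO rides (city) VALUES (?)", ("Singapore",))
third = db.query(count_sql, ("Singapore",))   # miss again after invalidation
print(first, second, third)  # [(1,)] [(1,)] [(2,)]
```

Even this toy shows why invalidation is the hard part: clearing everything is simple and safe but throws away valid entries, while anything finer-grained has to know which cached queries a given write affects.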