The Architecture of Grab's Data Lake
For low-throughput data, Grab uses Parquet with Copy on Write (CoW).
Here are the main operations for Copy on Write:
- Write Operations - Whenever there's a write, you create a new version of the file that includes the latest change. You can also keep the previous version for consistency and rollback purposes. This helps prevent data corruption, incon
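
To make the Copy on Write write path concrete, here is a minimal in-memory sketch. It is an illustration under stated assumptions, not Grab's implementation: the `CopyOnWriteTable` class, its method names, and the `ride-1` records are all hypothetical, and a plain dict stands in for a Parquet file.

```python
import copy


class CopyOnWriteTable:
    """Toy model of Copy on Write (hypothetical; a dict stands in for a Parquet file)."""

    def __init__(self):
        # Each entry is one complete "file version": a dict of key -> record.
        self.versions = [{}]

    def write(self, key, record):
        # Copy the latest version, apply the change, and store the result
        # as a new version. Earlier versions are left untouched.
        new_version = copy.deepcopy(self.versions[-1])
        new_version[key] = record
        self.versions.append(new_version)

    def read(self, version=-1):
        # A read returns a finished version as-is; nothing is merged at read time.
        return self.versions[version]

    def rollback(self):
        # Drop the newest version and fall back to the one before it.
        if len(self.versions) > 1:
            self.versions.pop()


table = CopyOnWriteTable()
table.write("ride-1", {"status": "booked"})
table.write("ride-1", {"status": "completed"})
latest = table.read()

table.rollback()
previous = table.read()
```

Every write pays the cost of copying the whole version, which is why this sketch only suits data that changes infrequently. In exchange, reads are cheap and older versions stay available for rollback.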
For high-throughput data, Grab uses Apache Avro with a strategy called Merge on Read (MoR).
Here are the main operations with Merge on Read:
- Write Operations - When data is written, it's appended to the end of a log file. This is much more efficient than merging it into the existing data, and it reduces write latency.
- Read Operations - When you need
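
A minimal in-memory sketch of the Merge on Read idea follows. The `MergeOnReadTable` class, its method names, and the sample records are hypothetical, and a Python list stands in for the Avro log file. The read path follows the general Merge on Read pattern, in which logged changes are combined with the base data at query time; that detail is an assumption, not something stated above.

```python
class MergeOnReadTable:
    """Toy model of Merge on Read (hypothetical; a list stands in for an Avro log file)."""

    def __init__(self, base=None):
        # Base data as of the last merge: a dict of key -> record.
        self.base = dict(base or {})
        # Append-only log of (key, record) changes written since then.
        self.log = []

    def write(self, key, record):
        # A write only appends to the log; the base data is not rewritten.
        self.log.append((key, record))

    def read(self):
        # Assumed read path: replay the log over a copy of the base data,
        # so later entries for the same key override earlier ones.
        merged = dict(self.base)
        for key, record in self.log:
            merged[key] = record
        return merged


table = MergeOnReadTable(base={"ride-1": {"status": "booked"}})
table.write("ride-1", {"status": "ongoing"})
table.write("ride-2", {"status": "booked"})
snapshot = table.read()
```

The trade-off is the mirror image of Copy on Write: each write is a cheap append, while each read does the merging work.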