Data Processing
Traditional ETL solutions are still quite powerful when it comes to:
- Common connectors with small-to-medium data volumes: we still have a lot of respect for companies like Fivetran, who have really nailed the user experience for the most common ETL use cases, like syncing Zendesk tickets or a production Postgres read replica into Snowflake. The only...
Why you should move your ETL stack to Modal
Koheesio
Koheesio, named after the Finnish word for cohesion, is a robust Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
The framework is versatile, aiming to support multiple implementations and working...
GitHub - Nike-Inc/koheesio: Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
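To make the "complex pipelines from simple, reusable components" idea concrete, here is a minimal sketch of that composable-step pattern in plain Python. It is not Koheesio's actual API; the `Step`, `DropNulls`, and `RenameColumn` names are hypothetical stand-ins.

```python
# Illustrative only: a generic "composable step" pattern, not Koheesio's API.
from dataclasses import dataclass

import pandas as pd


class Step:
    """A reusable unit of work: takes a DataFrame, returns a DataFrame."""

    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError


@dataclass
class DropNulls(Step):
    column: str

    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.dropna(subset=[self.column])


@dataclass
class RenameColumn(Step):
    old: str
    new: str

    def run(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.rename(columns={self.old: self.new})


def run_pipeline(df: pd.DataFrame, steps: list[Step]) -> pd.DataFrame:
    # Each step is small and testable; a pipeline is just an ordered list of steps.
    for step in steps:
        df = step.run(df)
    return df


if __name__ == "__main__":
    raw = pd.DataFrame({"user_id": [1, None, 3], "amt": [10.0, 5.0, 7.5]})
    clean = run_pipeline(raw, [DropNulls("user_id"), RenameColumn("amt", "amount")])
    print(clean)
```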
Open source, high-throughput, fault-tolerant vector embedding pipeline
Simple API endpoint that ingests large volumes of raw data, processes it, and stores or returns the vectors quickly and reliably
dgarnitz • GitHub - dgarnitz/vectorflow: VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of your choice.
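As a rough illustration of that ingest → embed → store/return flow, here is a minimal sketch assuming FastAPI and a sentence-transformers model. It is not VectorFlow's code; the endpoint name, model choice, and in-memory "store" are assumptions made for illustration.

```python
# Minimal sketch of an embedding endpoint: ingest raw documents, embed them,
# then store the vectors and/or return them. Not VectorFlow's implementation.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice
store: dict[str, list[float]] = {}  # stand-in for a real vector DB


class IngestRequest(BaseModel):
    ids: list[str]
    documents: list[str]
    return_vectors: bool = False


@app.post("/embed")
def embed(req: IngestRequest):
    # Batch-encode the raw documents into dense vectors.
    vectors = model.encode(req.documents).tolist()
    for doc_id, vec in zip(req.ids, vectors):
        store[doc_id] = vec  # a real pipeline would upsert into a vector DB
    return {"count": len(vectors), "vectors": vectors if req.return_vectors else None}
```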
Data Transformation
Built For Growth
Don't hack custom scripts or use half-baked tools. SQLMesh ensures accurate and efficient data pipelines with the most complete DataOps solution for transformation, testing, and collaboration.
SQLMesh
So what abstractions do we have as of today? For example, let’s take the resource abstraction (Dagster, Prefect, referred to as an operator in Airflow). You abstract complex environments and connections away with a simple construct like that. You have the immediate benefits of defining that once and using it in every task or pipeline with...
Data Engineering • Data Orchestration Trends: The Shift From Data Pipelines to Data Products
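For instance, a "define once, reuse everywhere" connection in the style of Dagster's classic resource API might look like the sketch below. Decorator and parameter names follow my recollection of that API, and sqlite3 stands in for a real warehouse client, so treat the details as assumptions.

```python
# Sketch of the "resource" abstraction using Dagster's classic resource API
# (names as I recall them; check current Dagster docs for the latest form).
import sqlite3

from dagster import job, op, resource


@resource(config_schema={"path": str})
def warehouse_resource(init_context):
    # The connection is defined once, here, and injected wherever it is needed.
    return sqlite3.connect(init_context.resource_config["path"])


@op(required_resource_keys={"warehouse"})
def load_orders(context):
    # Any op in any job can ask for "warehouse" instead of re-creating the client.
    return context.resources.warehouse.execute("SELECT 1").fetchall()


@job(resource_defs={"warehouse": warehouse_resource})
def nightly_sync():
    load_orders()
```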
This job copied 12m rows from ClickHouse to Snowflake in 16 minutes using:
- 5 CPUs: at $0.192 / CPU-hour that comes out to $0.26
- 4.4 GiB of memory: at $0.024 / GiB-hour that comes out to $0.03
Even if Fivetran had a ClickHouse connector (it doesn’t at the time of this writing), syncing 12m rows would cost ~$3300. The total cost of this Modal job...
Why you should move your ETL stack to Modal
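For reference, the per-resource figures above follow from 16 minutes of usage, and requesting those resources on Modal is a one-line decorator. The sketch below checks the arithmetic and shows the shape of such a function; the decorator kwargs reflect my recollection of Modal's Python SDK, and the app and function names are made up.

```python
# Back-of-the-envelope check of the numbers above, plus a sketch of how the
# resources might be requested on Modal (treat SDK names as assumptions).
import modal

hours = 16 / 60
cpu_cost = 5 * 0.192 * hours    # ~ $0.26
mem_cost = 4.4 * 0.024 * hours  # ~ $0.03
# total ~ $0.29 for the whole sync by this arithmetic

app = modal.App("clickhouse-to-snowflake")


@app.function(cpu=5.0, memory=4506)  # memory is specified in MiB (~4.4 GiB)
def sync_rows():
    ...  # read from ClickHouse, write to Snowflake
```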
The backbone for versatile AI
Meet Instill Cloud, a no-code/low-code platform that accelerates AI application development by 10x. Effortlessly connect to diverse data sources, seamlessly integrate AI models, and deploy customized logic for your projects, no matter how complex, with lightning speed.
Instill AI
(1) The separation between storage and compute, as encouraged by data lake architectures (e.g. the implementation of P would look different in a traditional database like PostgreSQL, or a cloud warehouse like Snowflake). This architecture is the focus of the current system, and it is prevalent in most mid-to-large enterprises (its benefits that...
Jacopo Tagliabue • Reproducible data science over data lakes: replayable data pipelines with Bauplan and Nessie.
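A small illustration of that separation, assuming the data already sits as Parquet files in object storage: any engine can be pointed at the same storage, which is what makes the compute side swappable. DuckDB and the bucket path below are illustrative assumptions, not Bauplan's implementation.

```python
# Storage/compute separation: data lives as open Parquet files in object
# storage; compute is whatever engine you point at it. Path is hypothetical.
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")                 # enable reading from S3
con.sql("SET s3_region='us-east-1'")   # credentials/region are environment-specific

daily = con.sql(
    """
    SELECT event_date, count(*) AS events
    FROM read_parquet('s3://my-lake/events/*.parquet')
    GROUP BY event_date
    """
).df()
print(daily)
```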
What is Hatchet?
Hatchet replaces difficult-to-manage legacy queues or pub/sub systems so you can design durable workloads that recover from failure and solve for problems like concurrency, fairness, and rate limiting. Instead of managing your own task queue or pub/sub system, you can use Hatchet to distribute your functions between a set of...
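A sketch of what "distribute your functions between a set of workers" can look like with Hatchet's Python SDK, based on my recollection of its earlier API; decorator and class names may differ in current releases, so treat this as schematic rather than definitive.

```python
# Schematic only: registering a workflow with a Hatchet worker so queueing,
# retries, and distribution are handled by Hatchet rather than by your own
# queue. API names follow my memory of the hatchet_sdk package and may not
# match the current release.
from hatchet_sdk import Hatchet, Context

hatchet = Hatchet()  # reads connection details from environment variables


@hatchet.workflow(on_events=["order:created"])
class ProcessOrder:
    @hatchet.step()
    def charge(self, context: Context):
        return {"charged": True}

    @hatchet.step(parents=["charge"])
    def notify(self, context: Context):
        return {"notified": True}


def main():
    worker = hatchet.worker("order-worker")
    worker.register_workflow(ProcessOrder())
    worker.start()


if __name__ == "__main__":
    main()
```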