GitHub - dlt-hub/dlt: data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Databases have gotten so good at this that the term is almost misleading now. "Base" suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That's right...
DuckDB Doesn't Need Data To Be a Database
dstack is an open-source toolkit and orchestration engine for running GPU workloads. It's designed for development, training, and deployment of gen AI models on any cloud.
Supported providers: AWS, GCP, Azure, Lambda, TensorDock, Vast.ai, and DataCrunch.
Latest news ✨
- [2024/01] dstack 0.14.0: OpenAI-compatible endpoints preview (Release)
- [2023/12] dst
dstackai • GitHub - dstackai/dstack: dstack is an open-source toolkit for running GPU workloads on any cloud. It works seamlessly with any cloud GPU providers. Discord: https://discord.gg/u8SmfwPpMd
Data science teams can use Baseten to efficiently serve, integrate, design, and ship their custom machine learning models with ease. A key benefit of Baseten is that it collapses the innovation cycle for ML apps, resulting in cheaper experimentation and greater success. It unblocks ML efforts currently bottlenecked by infrastructure, frontend, and ...
Jason Risch • Self-Serve Apps for ML Teams | Greylock
Koheesio
Koheesio, named after the Finnish word for cohesion, is a robust Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
The framework is versatile, aiming to support multiple implementations and working sea...
GitHub - Nike-Inc/koheesio: Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
The last core data stack tool is the orchestrator. It's used quickly as a data orchestrator to model dependencies between tasks in complex heterogeneous cloud environments end-to-end. It is integrated with the above-mentioned open data stack tools. They are especially effective if you have some glue code that needs to be run on a certain cadence, trigg...
Data Engineering • The Open Data Stack Distilled into Four Core Tools
Data Collection Experimentation Evaluation and Deployment Monitoring and Response Metadata Data catalogs, Amundsen, AWS Glue, Hive metastores Weights & Biases, MLFlow, train/test set parameter configs A/B test tracking tools Dashboards, SQL, metric functions and window sizes Unit Data cleaning tools Tensorflow, ML-lib, PyTorch, Scikit-learn, X...
Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.
Unitxt is a Python library for getting data fired up and set for utilization. In one line of code, it preps a dataset or mixtures-of-datasets into an input-output format for training and evaluation. We aspire to be simple, adaptable and transparent.
Unitxt builds on separation. Separation allows adding a dataset, without knowing anything about the m...