GitHub - dlt-hub/dlt: data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

Related Highlights

Data bases have gotten so good at this, that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That’s right...

DuckDB Doesn’t Need Data To Be a Database

FOD#27: "Now And Then"

dstack is an open-source toolkit and orchestration engine for running GPU workloads. It's designed for development, training, and deployment of gen AI models on any cloud.

Supported providers: AWS, GCP, Azure, Lambda, TensorDock, Vast.ai, and DataCrunch.

Latest news ✨

[2024/01] dstack 0.14.0: OpenAI-compatible endpoints preview (Release)

[2023/12] dst

dstackai • GitHub - dstackai/dstack: dstack is an open-source toolkit for running GPU workloads on any cloud. It works seamlessly with any cloud GPU providers. Discord: https://discord.gg/u8SmfwPpMd

Data science teams can use Baseten to efficiently serve, integrate, design, and ship their custom machine learning models with ease. A key benefit of Baseten is that it collapses the innovation cycle for ML apps, resulting in cheaper experimentation and greater success. It unblocks ML efforts currently bottlenecked by infrastructure, frontend, and ...

Jason Risch • Self-Serve Apps for ML Teams | Greylock

Koheesio

Koheesio, named after the Finnish word for cohesion, is a robust Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

The framework is versatile, aiming to support multiple implementations and working sea...

GitHub - Nike-Inc/koheesio: Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

The last core data stack tool is the orchestrator. It models dependencies between tasks in complex, heterogeneous cloud environments end-to-end, and it integrates with the open data stack tools mentioned above. Orchestrators are especially effective if you have some glue code that needs to be run on a certain cadence, trigg...

Data Engineering • The Open Data Stack Distilled into Four Core Tools

Data Collection, Experimentation, Evaluation and Deployment, Monitoring and Response. Metadata: data catalogs, Amundsen, AWS Glue, Hive metastores; Weights & Biases, MLFlow, train/test set parameter configs, A/B test tracking tools; dashboards, SQL, metric functions and window sizes. Unit: data cleaning tools; Tensorflow, ML-lib, PyTorch, Scikit-learn, X...

Shreya Shankar • "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning.

Unitxt is a Python library for getting data fired up and set for utilization. In one line of code, it preps a dataset or mixtures-of-datasets into an input-output format for training and evaluation. We aspire to be simple, adaptable and transparent.

Unitxt builds on separation. Separation allows adding a dataset, without knowing anything about the m...