GitHub - dlt-hub/dlt: data load tool (dlt) is an open source...

GitHub - dlt-hub/dlt: data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

RelatedHighlights

Original creator : Jesse Zhang (GH: emptycrown, Twitter: @thejessezhang), who courteously donated the repo to LlamaIndex!

This is a simple library of all the data loaders / readers / tools / llama-packs that have been created by the community. The goal is to make it extremely easy to connect large language models to a large variety of knowledge sour... See more

GitHub - run-llama/llama-hub: A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain

Discover, Download, and Run Local LLMs

lmstudio.ai lmstudio.ai

LlamaHub is a library of data loaders, readers, and tools created by the LlamaIndex community. It provides utilities to easily connect LLMs to diverse knowledge sources.

Ben Auffarth • Generative AI with LangChain: Build large language model (LLM) apps with Python, ChatGPT, and other LLMs

DataTrove

DataTrove is a library to process, filter and deduplicate text data at a very large scale. It provides a set of prebuilt commonly used processing blocks with a framework to easily add custom functionality.

DataTrove processing pipelines are platform-agnostic, running out of the box locally or on a slurm cluster. Its (relatively) low memory... See more

huggingface • GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

The last core data stack tool is the orchestrator. It’s used quickly as a data orchestrator to model dependencies between tasks in complex heterogeneous cloud environments end-to-end. It is integrated with above-mentioned open data stack tools. They are especially effective if you have some glue code that needs to be run on a certain cadence, trigg... See more

Data Engineering • The Open Data Stack Distilled into Four Core Tools

Clean & curate your data with LLMs

databonsai is a Python library that uses LLMs to perform data cleaning tasks.

Features

Suite of tools for data processing using LLMs including categorization, transformation, and extraction

Validation of LLM outputs

Batch processing for token savings

Retry logic with exponential backoff for handling rate limits an

databonsai • GitHub - databonsai/databonsai: clean & curate your data with LLMs.

jupyter-naas/awesome-notebooks: Data & AI Notebook ... - GitHub

github.com

Data science teams can use Baseten to efficiently serve, integrate, design, and ship their custom machine learning models with ease. A key benefit of Baseten is that it collapses the innovation cycle for ML apps, resulting in cheaper experimentation and greater success. It unblocks ML efforts currently bottlenecked by infrastructure, frontend, and ... See more