GitHub - towhee-io/towhee: Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
Koheesio
CI/CD
Package
Meta
Koheesio, named after the Finnish word for cohesion, is a robust Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
The framework is versatile, aiming to support multiple implementations and working sea... See more
CI/CD
Package
Meta
Koheesio, named after the Finnish word for cohesion, is a robust Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
The framework is versatile, aiming to support multiple implementations and working sea... See more
GitHub - Nike-Inc/koheesio: Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Nicolay Gerold added
DataTrove
DataTrove is a library to process, filter and deduplicate text data at a very large scale. It provides a set of prebuilt commonly used processing blocks with a framework to easily add custom functionality.
DataTrove processing pipelines are platform-agnostic, running out of the box locally or on a slurm cluster. Its (relatively) low memory... See more
DataTrove is a library to process, filter and deduplicate text data at a very large scale. It provides a set of prebuilt commonly used processing blocks with a framework to easily add custom functionality.
DataTrove processing pipelines are platform-agnostic, running out of the box locally or on a slurm cluster. Its (relatively) low memory... See more
huggingface • GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Nicolay Gerold added
txtai
neuml.github.ioGitHub - microsoft/TinyTroupe: LLM-powered multiagent persona simulation for imagination enhancement and business insights.
github.comuniflow provides a unified LLM interface to extract and transform and raw documents.
- Document types: Uniflow enables data extraction from PDFs, HTMLs and TXTs.
- LLM agnostic: Uniflow supports most common-used LLMs for text tranformation, including
- OpenAI models (GPT3.5 and GPT4),
- Google Gemini models (Gemini 1.5, MultiModal),
- AWS BedRock models,
- Huggingf
CambioML • GitHub - CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering: LLM-based text extraction from unstructured data like PDFs, Words and HTMLs. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster...
Nicolay Gerold added
Welcome to RAGatouille
Easily use and train state of the art retrieval methods in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
The main motivation of RAGatouille is simple: bridging the gap between state-of-the-art research and alchemical RAG pipeline practices. RAG is complex, and there are many moving parts. To g... See more
Easily use and train state of the art retrieval methods in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
The main motivation of RAGatouille is simple: bridging the gap between state-of-the-art research and alchemical RAG pipeline practices. RAG is complex, and there are many moving parts. To g... See more
GitHub - bclavie/RAGatouille: Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
Nicolay Gerold added
SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution.
SkyPilot abstracts away cloud infra burdens :
SkyPilot... See more
SkyPilot abstracts away cloud infra burdens :
- Launch jobs & clusters on any cloud
- Easy scale-out: queue and run many jobs, automatically managed
- Easy access to object stores (S3, GCS, R2)
SkyPilot... See more
skypilot-org • GitHub - skypilot-org/skypilot: SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
Nicolay Gerold added