Data Loading
Airbyte enables you to build data pipelines and replicate data from a source to a destination. You can configure how frequently the data is synced, what data is replicated, and how the data is written to the destination.
This page describes the concepts you need to know to use Airbyte.
Source
A source is an API, file, database, or data warehouse... See more
Core Concepts | Airbyte Documentation
Data Integration. Integration is needed when your organization collects large amounts of data in various systems such as databases, CRM systems, application servers, and so on. Accessing and analyzing data that is spread across multiple systems can be a challenge. To address this challenge, data integration can be used to create a unified view of... See more
Data Engineering • The Open Data Stack Distilled into Four Core Tools
DataTrove
DataTrove is a library to process, filter and deduplicate text data at a very large scale. It provides a set of prebuilt commonly used processing blocks with a framework to easily add custom functionality.
DataTrove processing pipelines are platform-agnostic, running out of the box locally or on a slurm cluster. Its (relatively) low memory... See more
huggingface • GitHub - huggingface/datatrove: Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Easily chunk complex documents the same way a human would.
Chunking documents is a challenging task that underpins any RAG system. High-quality results are critical to a successful AI application, yet most open-source libraries are limited in their ability to handle complex documents.
Open Parse is designed to fill this gap by providing a flexible,... See more
Filimoa • GitHub - Filimoa/open-parse: Improved file parsing for LLM’s
Your LLMs deserve better input.
Reader converts any URL to an LLM-friendly input when you add the simple prefix https://r.jina.ai/ in front of it. Get improved output for your agent and RAG systems at no cost.
- Live demo: https://jina.ai/reader
- Or just visit these URLs https://r.jina.ai/https://github.com/jina-ai/reader, https://r.jina.ai/https://x.com/elonmusk and see
jina-ai • jina-ai/reader: Convert any URL to an LLM-friendly input ... - GitHub
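The prefix scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Reader project: the helper name `to_reader_url` and the validation rules are assumptions made for this example, and only the prefix https://r.jina.ai/ comes from the excerpt.

```python
from urllib.parse import urlsplit

# Prefix quoted in the Reader excerpt above.
READER_PREFIX = "https://r.jina.ai/"


def to_reader_url(url: str) -> str:
    """Return the Reader form of an absolute http(s) URL.

    Hypothetical helper for illustration: it only prepends the prefix
    and rejects inputs that are not absolute http(s) URLs.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    return READER_PREFIX + url


if __name__ == "__main__":
    # Builds the same address as the first example URL listed above.
    print(to_reader_url("https://github.com/jina-ai/reader"))
```

The resulting address can then be requested with any HTTP client; the shape of the response is not described in the excerpt, so it is left out here.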
The backbone for versatile AI
Meet Instill Cloud, a no-code/low-code platform that accelerates AI application development by 10x. Effortlessly connect to diverse data sources, seamlessly integrate AI models, and deploy customized logic for your projects, no matter how complex, with lightning speed.
Instill AI
Magika
Magika is a novel AI-powered file type detection tool that relies on recent advances in deep learning to provide accurate detection. Under the hood, Magika employs a custom, highly optimized Keras model that weighs only about 1MB and enables precise file identification within milliseconds, even when running on a single CPU.
In an... See more
google • GitHub - google/magika: Detect file content types with deep learning
watchfiles
Simple, modern and high-performance file watching and code reload in Python.
Documentation: watchfiles.helpmanual.io
Source Code: github.com/samuelcolvin/watchfiles
Underlying file system notifications are handled by the Notify Rust library.
This package was previously named "watchgod"; see the migration guide for more information.
samuelcolvin • GitHub - samuelcolvin/watchfiles: Simple, modern and fast file watching and code reload in python.
Collect, unify, and activate customer data
RudderStack makes it easy to collect and send customer data to the tools and teams that need it