jina-ai/reader: Convert any URL to an LLM-friendly input ......

jina-ai/reader: Convert any URL to an LLM-friendly input ... - GitHub

RelatedHighlights

Data-Juicer: A One-Stop Data Processing System for Large Language Models

Data-Juicer is a one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs. This project is being actively updated and maintained, and we will periodically enhance and add more features and data recipes. We welcome you to join us in pro... See more

alibaba • GitHub - alibaba/data-juicer: A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷为大语言模型提供更高质量、更丰富、更易”消化“的数据！

Nicolay Gerold added

Lumina: The AI Powered Research Suite

lumina-chat.com

Andrés added

GitHub - gregpr07/browser-use: Make websites accessible for AI agents

github.com

Anton Gorshkov added

Browser Automation

Open-Source Pre-Processing Tools for Unstructured Data

The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. The use cases of unstructured revolve around streamlining and optimizing the data processing workflow for LLMs. unstructured modular... See more

Unstructured-IO • GitHub - Unstructured-IO/unstructured: Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Nicolay Gerold added

LLM-PowerHouse: A Curated Guide for Large Language Models with Custom Training and Inferencing

Welcome to LLM-PowerHouse, your ultimate resource for unleashing the full potential of Large Language Models (LLMs) with custom training and inferencing. This GitHub repository is a comprehensive and curated guide designed to empower developers, researche... See more

ghimiresunil • GitHub - ghimiresunil/LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing: LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.

Nicolay Gerold added

Inoreader – Build your own newsfeed

inoreader.com

and added

uniflow provides a unified LLM interface to extract and transform and raw documents.

Document types: Uniflow enables data extraction from PDFs, HTMLs and TXTs.

LLM agnostic: Uniflow supports most common-used LLMs for text tranformation, including
- OpenAI models (GPT3.5 and GPT4),
- Google Gemini models (Gemini 1.5, MultiModal),
- AWS BedRock models,
- Huggingf

CambioML • GitHub - CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering: LLM-based text extraction from unstructured data like PDFs, Words and HTMLs. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster...

Nicolay Gerold added

Original creator : Jesse Zhang (GH: emptycrown, Twitter: @thejessezhang), who courteously donated the repo to LlamaIndex!

This is a simple library of all the data loaders / readers / tools / llama-packs that have been created by the community. The goal is to make it extremely easy to connect large language models to a large variety of knowledge sour... See more

GitHub - run-llama/llama-hub: A library of data loaders for LLMs made by the community -- to be used with LlamaIndex and/or LangChain

Nicolay Gerold added