GitHub - NVIDIA/NeMo-Curator: Scalable toolkit for data curation
Clean & curate your data with LLMs
databonsai is a Python library that uses LLMs to perform data cleaning tasks.
Features
- Suite of tools for data processing using LLMs including categorization, transformation, and extraction
- Validation of LLM outputs
- Batch processing for token savings
- Retry logic with exponential backoff for handling rate limits and
databonsai • GitHub - databonsai/databonsai: clean & curate your data with LLMs.
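databonsai lists retry logic with exponential backoff for rate limits among its features. As a rough illustration of that general technique, here is a minimal sketch in plain Python. It is not databonsai's implementation or API: `call_with_backoff`, `RateLimitError`, and `flaky_llm_call` are hypothetical names, and the capped, randomly jittered delay is an assumed design choice.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing waits.

    Before retry k (0-based) the wait is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)] ("full jitter"), so many
    clients hitting the same limit do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")  # loop always returns or raises


# Usage: a hypothetical LLM call that is rate-limited twice, then succeeds.
calls = {"n": 0}


def flaky_llm_call() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "category: Electronics"


# A no-op sleep keeps the demo instant; real code would use the default.
print(call_with_backoff(flaky_llm_call, sleep=lambda s: None))  # category: Electronics
```

Injecting `sleep` as a parameter is only there so the retry schedule can be exercised without actually waiting.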
Model Explorer is a powerful graph visualization tool that helps one understand, debug, and optimize ML models. It specializes in visualizing large graphs in an intuitive, hierarchical format, but also works well for smaller models.
Graph visualization plays a pivotal role in the machine learning (ML) development process. Visual representations...
Model Explorer: Graph visualization for large model development
Crawl the web in an LLM-friendly style!
Introducing Crawl4AI 🤖🕷️, a web data crawler that extracts semantically labeled chunks into JSON, along with clean HTML and markdown, for RAG, fine-tuning, and AI chatbots.
This open-source tool offers efficient crawling and multi-URL support....
Unclecode (Hossein) • x.com
The Impact of Artificial Intelligence and Digital Technologies: Extractivism, Labor Exploitation, and Environmental Consequences
The document discusses the negative impacts of illegal gold mining in the Brazilian Amazon, as well as the environmental and social consequences of data extraction and AI technology.
cartography-of-generative-ai.net