Sublime
An inspiration engine for ideas
WebDataset
WebDataset is a library for writing I/O pipelines for large datasets. Its sequential I/O and sharding features make it especially useful for streaming large-scale datasets to a DataLoader.
The WebDataset format
A WebDataset file is a TAR archive containing a series of data files. All successive data files with the same prefix are...
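The grouping idea described above can be sketched with the Python standard library alone. This is a minimal illustration, not the WebDataset library's own reader: it assumes that a run of consecutive TAR members sharing a prefix forms one sample, and that the prefix is the part of the file name before the first dot. The names `write_shard`, `read_samples`, and the `sample0001`-style keys are hypothetical, chosen only for this example.

```python
import io
import tarfile
from itertools import groupby


def write_shard(samples):
    """Build an in-memory TAR shard from {key: {extension: bytes}}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, fields in samples.items():
            for ext, payload in fields.items():
                # Members of one sample are written back to back as key.ext
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_samples(shard_bytes):
    """Yield one dict per run of consecutive members sharing a prefix."""
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        members = [(m.name, tar.extractfile(m).read()) for m in tar if m.isfile()]
    # Assumption: prefix = everything before the first dot (no directories).
    for key, group in groupby(members, key=lambda item: item[0].split(".", 1)[0]):
        sample = {"__key__": key}
        for name, payload in group:
            sample[name.split(".", 1)[1]] = payload
        yield sample


shard = write_shard({
    "sample0001": {"txt": b"a cat", "cls": b"0"},
    "sample0002": {"txt": b"a dog", "cls": b"1"},
})
samples = list(read_samples(shard))
```

Because members are read strictly in archive order, the reader never needs random access, which is what makes this layout suitable for sequential streaming. Real shards may use directory components or multi-part extensions that this sketch does not handle.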
A matter of choice: People and possibilities in the age of AI
The 2025 Human Development Report explores the intersection of artificial intelligence and human development, emphasizing the importance of people's choices in shaping equitable outcomes amid technological advancements.
hdr.undp.org
Datasette
datasette.io

Anthropic/EconomicIndex · Datasets at Hugging Face
huggingface.co
RedPajama-V2 is an open dataset for training large language models. The dataset includes over 100B text documents coming from 84 CommonCrawl snapshots and processed using the CCNet pipeline. Out of these, there are 30B documents in the corpus that additionally come with quality signals. In addition, we also provide the ids of duplicated documents...
togethercomputer/RedPajama-Data-V2 · Datasets at Hugging Face

🚨 New preprint 🚨
We introduce Generative Distribution Embeddings (GDEs) — a framework for learning representations of distributions, not just datapoints.
GDEs enable multiscale modeling and come with elegant statistical theory and some miraculous geometric...