WebDataset
Sarah Drinkwater and added
Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.
Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with...
Datasette
Nicolay Gerold added
TabLib
Access on Hugging Face 🤗 (Sample, Full Dataset)
Read the Paper (TabLib)
Introduction
Huge datasets have been critical for the performance of AI models for text and images. Similar advancements can be made for tabular data, which consists of tables consisting of rows and columns, but the research community needs a bigger and more diverse datas...
TabLib
Nicolay Gerold added
Data bases have gotten so good at this, that the term is almost misleading now. “Base” suggests something rigid, without which the data would slip away. But the data is always there, just bits on a nameless hard disk. The structure and the accessibility that a modern database provides exist completely independently from that hard disk. That’s right...
DuckDB Doesn’t Need Data To Be a Database
Nicolay Gerold added
Jordan Bester added
There are two basic approaches to creating AI datasets. The first one, which is typical of the case we have been studying, a pool of open works is purposefully chosen to ensure license compliance. The second approach creates the dataset by scraping the “raw internet” and relying on copyright exceptions. LAION, a dataset of 400 million image-text pa...
Alek Tarkowski • Filling the governance vacuum related to the use of information commons for AI training
madisen added
The key characterizations of web3 are: decentralized, edge computing infrastructure, AI-driven, 3D Graphics (aka the Spatial Web), transparent/open source, anonymous, ubiquitous (IoT)
Rebecca Searles • Web3 101: A Beginner’s Guide
sari added
Web3 is a new vision of the internet. It takes us away from today’s centralized internet, where data is stored in servers managed by trusted counterparties like Facebook and Google, to a decentralized internet, where a trustless network of peer-to-peer servers manages and verifies data.
Alex Taussig • Firehose #190: Creator equity.
sari added