Datasets as Imagination
It teaches them that data is created, not found; and that creating it well demands humanity, rather than objectivity
Melanie Feinberg • The Myth of Objective Data
Datasets aren’t simply raw materials to feed algorithms, but are political interventions. As such, much of the discussion around 'bias' in AI systems misses the mark: there is no 'neutral,' 'natural,' or 'apolitical' vantage point that training data can be built upon. There is no easy technical 'fix' by shifting demographics, deleting offensive ter…
L. M. Sacasas • The Uncanny Gaze of the Machine
Patterns when telling stories with data:
- What is the dataset? Who generated the dataset and why?
- What is the process that underpins the dataset? Given that process, what is missing from the dataset or has been poorly measured? Could other datasets have been generated, and if so, how different could they have been to the one that we have?
- What …
There are two basic approaches to creating AI datasets. The first one, which is typical of the case we have been studying, a pool of open works is purposefully chosen to ensure license compliance. The second approach creates the dataset by scraping the “raw internet” and relying on copyright exceptions. LAION, a dataset of 400 million image-text pa…
Alek Tarkowski • Filling the governance vacuum related to the use of information commons for AI training
As Kristoffer Ørum, artist and self-proclaimed ‘misuser of technology,’ pointed out in a RADAR interview, as LLMs become “very good at drawing things that look like something,” humans have the opportunity to push in the opposite direction, reviving more absurdist and abstract forms of art — much like how the expressionists thrived after the advent …
Our Centaur Future - A RADAR Report
Many of these projects are saving time by training on small, highly curated datasets. This suggests there is some flexibility in data scaling laws. The existence of such datasets follows from the line of thinking in Data Doesn't Do What You Think, and they are rapidly becoming the standard way to do training outside Google
semianalysis.com • Google "We Have No Moat, and Neither Does OpenAI"
Beyond collecting and storing information, new tools for thought allow users to resurface, connect, and generate knowledge. Whether visual or textual, this new breed of tools encourage users to link their ideas together, find interesting patterns, and create original content based on their research.
Anne-Laure Le Cunff • The state of personal knowledge management
To guard against these issues while reaping the potential benefits of image generators, we provide recommendations such as regulation that forces organizations to disclose their training data, and tools that help artists prevent using their content as training data without their consent.
Timnit Gebru