Humans in the AI loop: the data labelers behind some of the most powerful LLMs' training datasets
The future of the data labeling industry may well depend on the AI supply chain as a whole and on how the regulatory landscape responds to it. With the increasing demand for high-quality labeled datasets, it is hard to say how the labour ecosystem for data labeling might change to better protect the rights of workers exploited by a...
At a broader level, beyond the algorithmic pricing model, there is also simply the lack of a standardised wage mechanism endorsed by the capitalist governance model. This can be seen in the gaping pay disparities between countries (e.g., U.S. wages for a task compared to Kenyan wages) and even within countries among different types of specialist labeling...
This is exacerbated (and facilitated) by the lack of a clear contract and fair terms and conditions for microworkers, with some data labelers coming forward about how they were quietly ghosted by their managers without explanation. This exploitative employer-employee relationship creates an unstable and unreliable environment that workers,...
Another annotator in Kenya reported that tasks were drying up in the region, and it was clear that the AI supply chain, which had the advantage of not needing local infrastructure, was migrating to other countries with cheaper labour, like Nepal and the Philippines (until the next cheaper market appears and they set up shop there). The flui...
This exploitative allocation model is symptomatic of the larger governance model favoured by Western companies, which shift their labour force between countries and regions with weaker legal protections and benefits for workers.
MIT Technology Review tested this itself by creating an account on Remotasks and noticed a timer on the top left of the screen, notably 'without a clear deadline or apparent way to pause it to go to the bathroom.' This has been interpreted as an 'inactivity timer' that pushes the task back to the task pool on the platform for someone...
We differentiate between two categories of data labelers behind supervised training datasets: non-specialist data labelers annotating generic, large-scale datasets, and 'expert' data labelers annotating subject-matter-specific datasets. The first type of data labeler has been the most common. It involves microworkers contracted from all over...