A.I. Chatbots Are Hiring Tutors to Train Their Models
After her second child was born, Chelsea Becker took an unpaid, yearlong leave from her full-time job as a flight attendant. After watching a video on TikTok, she found a side hustle: training artificial intelligence models for a website called Data Annotation Tech.
For a few hours every day, Ms. Becker, 33, who lives in Schwenksville, Pa., would sit at her laptop and interact with an A.I.-powered chatbot. For every hour of work, she was paid $20 to $40. From December to March, she made over $10,000.
The boom in A.I. technology has put a more sophisticated spin on a kind of gig work that doesn’t require leaving the house. The growth of large language models like the technology powering OpenAI’s ChatGPT has fueled the need for trainers like Ms. Becker, fluent English speakers who can produce quality writing.
It is not a secret that A.I. models learn from humans. For years, makers of A.I. systems like Google and OpenAI have relied on low-paid workers, typically contractors employed through other companies, to help computers visually identify subjects. The workers might tag vehicles and pedestrians for self-driving cars or label objects in photos used to train A.I. systems. (The New York Times has sued OpenAI and its partner, Microsoft, on claims of copyright infringement.)
But as A.I. technology has become more sophisticated, so has the job of people who must painstakingly teach it. Yesterday’s photo tagger is today’s essay writer.
There are usually two types of work for these trainers: supervised learning, where the A.I. learns from human-generated writing, and reinforcement learning from human feedback, where the chatbot learns from how humans rate its responses.
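The difference between the two approaches can be sketched with toy training records. This is an illustration only: the field names and data shapes below are invented for the example and are not drawn from any company's actual pipeline.

```python
# Illustrative only: field names and structure are invented, not any
# company's real training-data format.

# Supervised learning: a human trainer writes the ideal answer, and the
# model is tuned to reproduce writing like it.
supervised_example = {
    "prompt": "Summarize the water cycle in one sentence.",
    "ideal_response": "Water evaporates, condenses into clouds, "
                      "and returns to Earth as precipitation.",
}

# Reinforcement learning from human feedback (RLHF): the model produces
# candidate answers, and a human trainer ranks them. Those rankings, not
# the text itself, are what the system learns from.
preference_example = {
    "prompt": "Explain photosynthesis to a ten-year-old.",
    "chosen": "Plants use sunlight to turn air and water into food.",
    "rejected": "Photosynthesis is the process by which autotrophs "
                "fix atmospheric carbon dioxide.",
}

def preferred(pair):
    """Return the response the human trainer rated higher."""
    return pair["chosen"]

print(preferred(preference_example))
```

In the first case the trainer's writing is the target; in the second, the trainer's judgment between two machine-written answers is the signal.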
Companies that specialize in data curation, including the San Francisco-based start-ups Scale AI and Surge AI, hire contractors and sell their training data to bigger developers. Developers of A.I. models, such as the Toronto-based start-up Cohere, also recruit in-house data annotators.
It is difficult to estimate the total number of these gig workers, researchers said. But Scale AI, which hires contractors through its subsidiaries, Remotasks and Outlier, said it was common to see tens of thousands of people working on the platform at a given time.
But as with other types of gig work, the flexible hours come with their own challenges. Some workers said they had never interacted with the administrators behind the recruitment sites; others said they had been cut off from the work with no explanation. Researchers have also raised concerns over a lack of standards, since workers typically receive no training on what counts as an appropriate chatbot answer.
To become one of these contractors, workers have to pass an assessment, which includes questions like whether a social media post should be considered hateful, and why. Another assessment requires a more creative approach, asking prospective contractors to write a fictional short story about a green dancing octopus, set in Sam Bankman-Fried’s FTX offices on Nov. 8, 2022. (That was the day Binance, an FTX competitor, said it would buy Mr. Bankman-Fried’s company, before quickly backing out of the deal.)
Sometimes, companies look for subject matter experts. Scale AI has posted jobs for contract writers who hold master’s or doctoral degrees in Hindi and Japanese. Outlier has job listings that mention requirements like academic degrees in math, chemistry and physics.
“What really makes the A.I. useful to its users is the human layer of data, and that really needs to be done by smart humans and skilled humans and humans with a particular degree of expertise and a creative bent,” said Willow Primack, vice president of data operations at Scale AI. “We have been focusing on contractors, particularly within North America, as a result.”
Alynzia Fenske, a self-published fiction writer, had never interacted with an A.I. chatbot, but she had heard plenty from fellow writers who considered A.I. a threat. So when she came across a video on TikTok about Data Annotation Tech, part of her motivation was to learn as much about A.I. as she could and see for herself whether those fears were warranted.
“It’s giving me a whole different view of it now that I’ve been working with it,” said Ms. Fenske, 28, who lives in Oakley, Wis. “It is comforting knowing that there are human beings behind it.” Since February, she has been aiming for 15 hours of data annotation work every week so she can support herself while pursuing a writing career.
Ese Agboh, 28, a master’s student studying computer science at the University of Arkansas, took on coding projects, which paid $40 to $45 an hour. She would ask the chatbot to design a motion-sensor program that helps gymgoers count their repetitions, then evaluate the code the A.I. wrote. In another case, she loaded a data set of grocery items into the program and asked the chatbot to design a monthly budget. Sometimes she would even evaluate other annotators’ code, a practice experts said is used to ensure data quality.
She made $2,500 before the platform permanently suspended her account for violating its code of conduct. She did not receive an explanation, but she suspected it was because she had worked while in Nigeria; the site wanted workers based only in certain countries.
That is the fundamental challenge of online gig work: It can disappear at any time. With no one to turn to for help, frustrated contractors have taken to social media, sharing their experiences on Reddit and TikTok. Jackie Mitchell, 26, gained a large following on TikTok with her content on side hustles, including data annotation work.
“I get the appeal,” she said, referring to side hustles as an “unfortunate necessity” in this economy and “a hallmark of my generation and the generation above me.”
Public records show that Surge AI owns Data Annotation Tech. Neither the company nor its chief executive, Edwin Chen, responded to requests for comments.
It is common for companies to hire contractors through subsidiaries. Doing so protects the identity of their customers and helps the companies avoid bad press over working conditions for their low-paid contract workers, said James Muldoon, a University of Essex management professor whose research focuses on A.I. data work.
A majority of today’s data workers depend on wages from their gig work. Milagros Miceli, a sociologist and computer scientist researching labor conditions in data work, said that while “a lot of people are doing this for fun, because of the gamification that comes with it,” the bulk of the work is still “done by workers who actually really need the money and do this as a main income.”
Researchers are also concerned about the lack of safety standards in data labeling. Workers are sometimes asked to address sensitive issues like whether certain events or acts should be considered genocide or what gender should appear in an A.I.-generated image of a soccer team, but they are not trained on how to make that evaluation.
“It’s fundamentally not a good idea to outsource or crowdsource concerns about safety and ethics,” Professor Muldoon said. “You need to be guided by principles and values, and what your company actually decides as the right thing to do on a particular issue.”