
America Already Has an AI Underclass


On weekdays, between homeschooling her two children, Michelle Curtis logs on to her computer to squeeze in a few hours of work. Her screen flashes with Google Search results, the writings of a Google chatbot, and the outputs of other algorithms, and she has a few minutes to respond to each—judging the usefulness of the blue links she’s been provided, checking the accuracy of an AI’s description of a praying mantis, or deciding which of two chatbot-written birthday poems is better. She never knows in advance what she will have to assess, and for the AI-related tasks, which have formed the bulk of her work since February, she says she has little guidance and not enough time to do a thorough job.

Curtis is an AI rater. She works for the data company Appen, which is subcontracted by Google to evaluate the outputs of the tech giant’s AI products and search algorithm. Countless people do similar work around the world for Google; the ChatGPT-maker, OpenAI; and other tech firms. Their human feedback plays a crucial role in developing chatbots, search engines, social-media feeds, and targeted-advertising systems—the most important parts of the digital economy.

Curtis told me that the job is grueling, underpaid, and poorly defined. Whereas Google has a 176-page guide for search evaluations, the instructions for AI tasks are relatively sparse, she said. For every task that involves rating AI outputs, she is given a few sentences or paragraphs of vague, even convoluted instructions and as little as a few minutes to absorb them before the time allotted for the task runs out. Unlike a page of Google results, chatbots promise authoritative answers—offering the final, rather than first, step of inquiry, which Curtis said makes her feel a heightened moral responsibility to assess AI responses as accurately as possible. She dreads these timed tasks for the very same reason: “It’s just not humanly possible to do in the amount of time that we’re given.” On Sundays, she works a full eight hours. “Those long days can really wear on you,” she said.

Armughan Ahmad, Appen’s CEO, told me through a spokesperson that the company “complies with minimum wages” and is investing in improved training and benefits for its workers; a Google spokesperson said Appen is solely responsible for raters’ working conditions and job training. For Google to mention these people at all is notable. Despite their importance to the generative-AI boom and tech economy more generally, these workers are almost never referenced in tech companies’ prophecies about the ascendance of intelligent machines. AI moguls describe their products as forces akin to electricity or nuclear fission, like facts of nature waiting to be discovered, and speak of “maximally curious” machines that learn and grow on their own, like children. The human side of sculpting algorithms tends to be relegated to opaque descriptions of “human annotations” and “quality tests,” evacuated of the time and energy powering those annotations.

[Read: Google’s new search tool could eat the internet alive]

The tech industry has a history of veiling the difficult, exploitative, and sometimes dangerous work needed to clean up its platforms and programs. But as AI rapidly infiltrates our daily lives, tensions between tech companies framing their software as self-propelling and the AI raters and other people actually pushing those products along have started to surface. In 2021, Appen raters began organizing with the Alphabet Workers Union-Communications Workers of America to push for greater recognition and compensation; Curtis joined its ranks last year. At the center of the fight is a big question: In the coming era of AI, can the people doing the tech industry’s grunt work ever be seen and treated not as tireless machines but simply as what they are—human?  

The technical name for the use of such ratings to improve AI models is reinforcement learning with human feedback, or RLHF. OpenAI, Google, Anthropic, and other companies all use the technique. After a chatbot has processed massive amounts of text, human feedback helps fine-tune it. ChatGPT is impressive because using it feels like chatting with a human, but that pastiche does not naturally arise through ingesting data from something like the entire internet, an amalgam of recipes and patents and blogs and novels. Although AI programs are set up to be effective at pattern detection, they “don’t have any sense of contextual understanding, no ability to parse whether AI-generated text looks more or less like what a human would have written,” Sarah Myers West, the managing director of the AI Now Institute, an independent research organization, told me. Only an actual person can make that call.

The program might write multiple recipes for chocolate cake, which a rater ranks and edits. Those evaluations and examples will inform the chatbot’s statistical model of language and next-word predictions, which should make the program better at writing recipes in the style of a human, for chocolate cake and beyond. A person might check a chatbot’s response for factual accuracy, rate how well it fits the prompt, or flag toxic outputs; subject experts can be particularly helpful, and they tend to be paid more.
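To make the mechanics concrete, here is a minimal sketch, in Python with PyTorch, of how pairwise ratings like Curtis’s can become a training signal. Everything in it is an assumption for illustration—the toy embeddings, the tiny network, and all names are hypothetical stand-ins, not any company’s actual pipeline; production systems train a full reward model over text and then use it in a reinforcement-learning loop.

```python
# A minimal sketch of the preference-modeling step in RLHF.
# Hypothetical throughout: random tensors stand in for real
# embeddings of chatbot responses.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
EMBED_DIM = 16

# Toy reward model: maps a response's embedding to a scalar score.
reward_model = nn.Sequential(
    nn.Linear(EMBED_DIM, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Each training example is a pair: the response a human rater
# preferred ("chosen") and the one they rejected.
chosen = torch.randn(64, EMBED_DIM)
rejected = torch.randn(64, EMBED_DIM)

for step in range(100):
    optimizer.zero_grad()
    r_chosen = reward_model(chosen)      # score of preferred response
    r_rejected = reward_model(rejected)  # score of rejected response
    # Bradley-Terry pairwise loss: push the preferred response's
    # score above the rejected one's. This is how a judgment like
    # "poem A is better than poem B" becomes a gradient.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    loss.backward()
    optimizer.step()

# The trained reward model then scores candidate outputs during a
# reinforcement-learning phase, steering the chatbot toward
# responses humans tend to prefer.
```

The key design point is that the rater never has to supply an absolute score; the loss needs only a judgment that one response beats another, which is exactly the kind of comparison these workers produce thousands of times over.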

Using human evaluations to improve algorithmic products is a fairly old practice at this point: Google and Facebook have been using them for almost a decade, if not more, to develop search engines, targeted ads, and other products, Sasha Luccioni, an AI researcher at the machine-learning company Hugging Face, told me. The extent to which human ratings have shaped today’s algorithms depends on who you ask, however. Major tech companies that design and profit from search engines, chatbots, and other algorithmic products tend to characterize the raters’ work as only one among many important aspects of building cutting-edge AI products. Courtenay Mencini, a Google spokesperson, told me that “ratings do not directly impact or solely train our algorithms. Rather, they’re one data point … taken in aggregate with extensive internal development and testing.” OpenAI has emphasized that training on huge amounts of text, rather than RLHF, accounts for most of GPT-4’s capabilities.

[From the September 2023 issue: Does Sam Altman know what he’s creating?]

AI experts I spoke with outside these companies took a different stance. Targeted human feedback has been “the single most impactful change that made [current] AI models as good as they are,” allowing the leap from GPT-2’s half-baked emails to GPT-4’s convincing essays, Luccioni said. She and others argue that tech companies intentionally downplay the importance of human feedback. Such obfuscation “sockets away some of the most unseemly elements of these technologies,” such as hateful content and misinformation that humans have to identify, Myers West told me—not to mention the conditions the people work under. Even setting aside those elements, describing the extent of human intervention would risk dispelling the magical and marketable illusion of intelligent machines—a “Wizard of Oz effect,” Luccioni said.

Despite tech companies’ stated positions, digging into their own press statements and research papers about AI reveals that they frequently do acknowledge the value of this human labor, if in broad terms. A Google blog post promoting a new chatbot last year, for instance, said that “to create safer dialogue agents, we need to be able to learn from human feedback.” Google has similarly described human evaluations as necessary to its search engine. The company touts RLHF as “particularly useful” for applying its AI services to industries such as health care and finance. Two lead researchers at OpenAI similarly described human evaluations as vital to training ChatGPT in an interview with MIT Technology Review. The company stated elsewhere that GPT-4 exhibited “large improvements” in accuracy after RLHF training and that human feedback was crucial to fine-tuning it. Meta’s most recent language model, released this week, relies on “over 1 million new human annotations,” according to the company.

To some extent, the significance of humans’ AI ratings is evident in the money pouring into them. One company that hires people to do RLHF and data annotation was valued at more than $7 billion in 2021, and its CEO recently predicted that AI companies will soon spend billions of dollars on RLHF, similar to their investment in computing power. The global market for labeling data used to train these models (such as tagging an image of a cat with the label “cat”), another part of the “ghost work” powering AI, could reach nearly $14 billion by 2030, according to an estimate from April 2022, months before the ChatGPT gold rush began.

All of that money, however, rarely seems to be reaching the actual people doing the ghostly labor. The contours of the work are starting to materialize, and the few public investigations into it are alarming: Workers in Africa are paid as little as $1.50 an hour to check outputs for disturbing content that has reportedly left some of them with PTSD. Some contractors in the U.S. can earn only a couple of dollars above the minimum wage for repetitive, exhausting, and rudderless work. The pattern is similar to that of social-media content moderators, who can be paid a tenth as much as software engineers to scan traumatic content for hours every day. “The poor working conditions directly impact data quality,” Krystal Kauffman, a fellow at the Distributed AI Research Institute and an organizer of raters and data labelers on Amazon Mechanical Turk, a crowdsourcing platform, told me.

Stress, low pay, minimal instructions, inconsistent tasks, and tight deadlines—the sheer volume of data needed to train AI models almost necessitates a rush job—are a recipe for human error, according to Appen raters affiliated with the Alphabet Workers Union-Communications Workers of America and multiple independent experts. Documents obtained by Bloomberg, for instance, show that AI raters at Google have as little as three minutes to complete some tasks, and that they evaluate high-stakes responses, such as how to safely dose medication. Even OpenAI has written, in the technical report accompanying GPT-4, that “undesired behaviors [in AI systems] can arise when instructions to labelers were underspecified” during RLHF.

Tech companies have at times responded to these issues by stating that ratings are not the only way they check accuracy, that humans doing those ratings are paid adequately based on their location and afforded proper training, and that viewing traumatic materials is not a typical experience. Mencini, the Google spokesperson, told me that Google’s wages and benefits standards for contractors do not apply to raters, because they “work part-time from home, can be assigned to multiple companies’ accounts at a time, and do not have access to Google’s systems or campuses.” In response to allegations of raters seeing offensive materials, she said that workers “select to opt into reviewing sensitive content, and can opt out freely at any time.” The companies also tend to shift blame to their vendors—Mencini, for instance, told me that “Google is simply not the employer of any Appen workers.”  

[Read: The coming humanist renaissance]

Appen’s raters told me that their working conditions do not align with various tech companies’ assurances—and that they hold Appen and Google responsible, because both profit from their work. Over the past year, Michelle Curtis and other raters have demanded more time to complete AI evaluations, benefits, better compensation, and the right to organize. The job’s flexibility does have advantages, they told me. Curtis has been able to navigate her children’s medical issues; another Appen rater I spoke with, Ed Stackhouse, said the adjustable hours afford him time to deal with a heart condition. But flexibility does not justify low pay and a lack of benefits, Shannon Wait, an organizer with the AWU-CWA, told me; there’s nothing flexible about precarity.

The group made headway at the start of the year, when Curtis and her fellow raters received their first-ever raise. She now makes $14.50 an hour, up from $12.75—still below the minimum of $15 an hour that Google has promised to its vendors, temporary staff, and contractors. The union continued raising concerns about working conditions; Stackhouse wrote a letter to Congress about these issues in May. Then, just over two weeks later, Curtis, Stackhouse, and several other raters received an email from Appen stating, “Your employment is being terminated due to business conditions.”

The AWU-CWA suspected that Appen and Google were punishing the raters for speaking out.  “The raters that were let go all had one thing in common, which was that they were vocal about working conditions or involved in organizing,” Stackhouse told me. Although Appen did suffer a drop in revenue during the broader tech downturn last year, the company also had, and has, open job postings. Four weeks before the termination, Appen had sent an email offering cash incentives to work more hours and meet “a significant spike in jobs available since the beginning of year,” when the generative-AI boom was in full swing; just six days before the layoffs, Appen sent another email lauding “record-high production levels” and re-upping the bonus-pay offer. On June 14, the union filed a complaint with the National Labor Relations Board alleging that Appen and Google had retaliated against raters “by terminating six employees who were engaged in protected [labor] activity.”

Less than two weeks after the complaint was filed, Appen reversed its decision to fire Curtis, Stackhouse, and the others; their positions were reinstated with back pay. Ahmad, Appen’s CEO, told me in an email that his company bases “employment decisions on business requirements” and is “happy that our business needs changed and we were able to hire back the laid off contributors.” He added, “Our policy is not to discriminate against employees due to any protected labor activities,” and that “we’ve been actively investing in workplace enhancements like smarter training, and improved benefits.”

Mencini, the Google spokesperson, told me that “only Appen, as the employer, determines their employees’ working conditions,” and that “Appen provides job training for their employees.” As with compensation and training, Mencini deflected responsibility for the treatment of organizing workers as well: “We, of course, respect the labor rights of Appen employees to join a union, but it’s a matter between them and their employer, Appen.”

That AI purveyors would obscure the human labor undergirding their products is predictable. Much of the data that train AI models is labeled by people making poverty wages, many of them located in the global South. Amazon deliveries are cheap in part because working conditions in the company’s warehouses subsidize them. Social media is usable and desirable because of armies of content moderators also largely in the global South. “Cloud” computing, a cornerstone of Amazon’s and Microsoft’s businesses, takes place in giant data centers.

AI raters might be understood as an extension of that cloud, treated not as laborers with human needs so much as productive units, carbon transistors on a series of fleshly microchips—objects, not people. Yet even microchips take up space; they require not just electricity but also ventilation to keep from overheating. The Appen raters’ termination and reinstatement is part of “a more generalized pattern within the tech industry of engaging in very swift retaliation against workers” when they organize for better pay or against ethical concerns about the products they work on, Myers West, of the AI Now Institute, told me.

Ironically, one crucial bit of human labor that AI programs have proved unable to automate is their own training. Human subjectivity and prejudice have long made their way into algorithms, and those flaws mean machines may not be able to perfect themselves. Various attempts to train AI models with other AI models have bred further bias and worsened performance, though a few have shown limited success. “I can’t imagine that we will be able to replicate [human intervention] with current AI approaches,” Hugging Face’s Luccioni told me in an email; Ahmad said that “using AI to train AI can have dire consequences as it pertains to the viability and credibility of this technology.” The tech industry has so far failed to purge the ghosts haunting its many other machines and services—the people organizing on warehouse floors, walking out of corporate headquarters, unionizing overseas, and leaking classified documents. Appen’s raters are proving that, even amid the generative-AI boom, humanity may not be so easily exorcized.







