
Generative AI Can’t Cite Its Sources


Updated at 8:58 a.m. ET on June 26, 2024

Silicon Valley appears, once again, to be getting the better of America’s newspapers and magazines. Tech companies are injecting every corner of the web with AI language models, which may pose an existential threat to journalism as we currently know it. After all, why go to a media outlet if ChatGPT can deliver the information you think you need?

A growing number of media companies—the publishers of The Wall Street Journal, Business Insider, New York, Politico, The Atlantic, and many others—have signed licensing deals with OpenAI that will formally allow the start-up’s AI models to incorporate recent partner articles into their responses. (The editorial division of The Atlantic operates independently from the business division, which announced its corporate partnership with OpenAI last month.) OpenAI is just the beginning, and such deals may soon be standard for major media companies: Perplexity, which runs a popular AI-powered search engine, has had conversations with various publishers (including The Atlantic’s business division) about a potential ad-revenue-sharing arrangement, the start-up’s chief business officer, Dmitry Shevelenko, told me yesterday. Perplexity has spent the past few weeks defending itself against accusations that it appears to have plagiarized journalists’ work. (A spokesperson for The Atlantic said that its business leadership has been talking with “a number of AI companies” both to explore possible partnerships and to express “significant concerns.”)

OpenAI is paying its partners and receives permission to train its models on their content in exchange. Although a spokesperson for OpenAI did not answer questions about citations in ChatGPT or the status of media-partner products in any detail, Shevelenko was eager to explain why this is relevant to Perplexity: “We need web publishers to keep creating great journalism that is loaded up with facts, because you can’t answer questions well if you don’t have accurate source material.”

[Read: A devil’s bargain with OpenAI]

Although this may seem like media arcana—mere C-suite squabbles—the reality is that AI companies are envisioning a future in which their platforms are central to how all internet users find information. Among OpenAI’s promises is that, in the future, ChatGPT and other products will link and give credit—and drive readers—to media partners’ websites. In theory, OpenAI could improve readership at a time when other distribution channels—Facebook and Google, mainly—are cratering. But it is unclear whether OpenAI, Perplexity, or any other generative-AI company will be able to create products that consistently and accurately cite their sources—let alone drive any audiences to original sources such as news outlets. Currently, they struggle to do so with any consistency.

Curious about how these media deals might work in practice, I tried a range of searches in ChatGPT and Perplexity. Although Perplexity generally included links and citations, ChatGPT—which is not a tailored, Google-like search tool—typically did not unless explicitly asked to. Within those citations, both Perplexity and ChatGPT at times failed to deliver a functioning link to the source that had originated whatever information was most relevant or that I was looking for. The most advanced version of ChatGPT made various errors and missteps when I asked about features and original reporting from publications that have partnered with OpenAI. Sometimes links were missing, or went to the wrong page on the right site, or just didn’t take me anywhere at all. Frequently, the citations were to news aggregators or publications that had summarized journalism published originally by OpenAI partners such as The Atlantic and New York.

For instance, I asked about when Donald Trump had called Americans who’d died at war “suckers” and “losers.” ChatGPT correctly named The Atlantic as the outlet that first reported, in 2020, that Trump had made these remarks. But instead of linking to the source material, it pointed users to secondary sources such as Yahoo News, Military Times, and logicallyfacts.com; the last is itself a subsidiary of an AI company focused on limiting the spread of disinformation. When asked about the leak of the Supreme Court opinion that overturned Roe v. Wade in 2022—a scoop that made Politico a Pulitzer Prize finalist and helped win it a George Polk Award—ChatGPT mentioned Politico but did not link to the site. Instead, it linked to Wikipedia, Rutgers University, Yahoo News, and Poynter. When asked to direct me to the original Politico article, it provided a nonfunctioning hyperlink. In response to questions about ChatGPT’s failure to provide high-quality citations, an OpenAI spokesperson told me that the company is working on an enhanced, attribution-forward search product that will direct users to partner content. The spokesperson did not say when that product is expected to launch.

[Read: Google is turning into a libel machine]

My attempts to use Microsoft Copilot and Perplexity turned up similar errors, although Perplexity was less error-prone than any other chatbot tool I tried. Google’s new AI Overview feature recently missummarized one of my articles into a potentially defamatory claim (the company has since addressed that error). That experience lines up with other reports and academic research demonstrating that these programs struggle to cite sources correctly: One test from last year showed that leading language models did not offer complete citations even half the time in response to questions from a particular data set. Recent Wired and Forbes investigations have alleged that Perplexity closely reproduced journalists’ content and wording to respond to queries or create bespoke “Perplexity Pages”—which the company describes as “comprehensive articles on any topic,” and which at the time of the Forbes article’s publication hid attributions as small logos that linked out to the original content. When I asked Perplexity, “Why have the past 10 years of American life been uniquely stupid?”—a reference to the headline of a popular Atlantic article—the site’s first citation was to a PDF copy of the story; the original link was fifth.

Shevelenko said that Perplexity had adjusted its product in response to parts of the Forbes report, which enumerated various ways that the site minimizes the sources it draws information from for its Perplexity Pages. He also said that the company avoids “the most common sources of pirated, downloadable content,” and that my PDF example may have slipped through because it is hosted on a school website. The company depends on and wants to “create healthy, long-term incentives” to support human journalism, Shevelenko told me, and although he touted the product’s accuracy, he also said that “nobody at Perplexity thinks we’re anywhere near as good as we can be or should be.”

In fairness, these are not entirely new problems. Human-staffed websites already harvest and cannibalize original reporting into knockoff articles designed to rank highly on search engines or social media. When ChatGPT points to an aggregated Yahoo News article instead of the original scoop, it is operating similarly to Google’s traditional search engine (which in one search about the Supreme Court leak did not even place Politico in its top 10 links). Nor is aggregation itself new: Long before the internet existed, newspapers and magazines routinely aggregated stories from their competitors. When Perplexity appears to rip off Wired or Forbes, it may not be so different from any other sketchy website that copies with abandon. But OpenAI, Microsoft, Google, and Perplexity have promised that their AI products will be good friends to the media; linked citations and increased readership have been named as clear benefits to publishers that have contracted with OpenAI.

Several experts I interviewed for this article told me that AI models might never be perfect at finding and citing information. Accuracy and attribution are an active area of research, and substantial improvements are coming, they said. But even if some future model reaches “70 or 80 percent” accuracy, “it’ll never reach, or might take a long time to reach, 99 percent,” Tianyu Gao, a machine-learning researcher at Princeton, told me. Even those who were more optimistic noted that significant challenges lie ahead.

[Read: These 183,000 books are fueling the biggest fight in publishing and tech]

A traditional large language model is not connected to the internet but instead writes answers based on its training data; OpenAI’s most advanced model hasn’t trained on anything since October 2023. OpenAI’s technology is proprietary, but to provide information about anything more recent, or more accurate responses about older events, researchers typically connect the AI to an external data source or even a typical search engine. This process is known as “retrieval-augmented generation,” or RAG. First, a chatbot turns the user’s query into an internet search, perhaps via Google or Bing, and “retrieves” relevant content. Then the chatbot uses that content to “generate” its response. (ChatGPT currently relies on Bing for queries that use RAG.)
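The two steps can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the `CORPUS` list stands in for a web index, keyword overlap stands in for a real search engine, and `generate` is a template rather than a language model. All names and URLs here are hypothetical and do not describe how ChatGPT, Bing, or Perplexity work internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    url: str
    text: str


# Hypothetical stand-in for a web index; a real system would call a search engine.
CORPUS = [
    Document("https://example.com/original-scoop",
             "the court draft opinion leak was first reported by the original outlet"),
    Document("https://example.com/aggregated-summary",
             "a summary of the court draft opinion leak"),
    Document("https://example.com/pizza-recipe",
             "how to keep cheese on pizza"),
]


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def retrieve(query, corpus, k=2):
    """Step 1 ("retrieve"): rank documents by how many query words they share."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc.text)), doc) for doc in corpus]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def generate(query, retrieved):
    """Step 2 ("generate"): compose an answer from the retrieved text.

    A real chatbot would pass the retrieved text to a language model;
    this placeholder only lists the sources with numbered citations.
    """
    if not retrieved:
        return "No sources found."
    lines = [f"Answer to: {query}"]
    for i, doc in enumerate(retrieved, start=1):
        lines.append(f"[{i}] {doc.text} ({doc.url})")
    return "\n".join(lines)


if __name__ == "__main__":
    question = "court opinion leak"
    print(generate(question, retrieve(question, CORPUS)))
```

Even in this toy version, the aggregated summary matches the query as well as the original scoop does, so nothing in the retrieval step favors the outlet that broke the story.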

Every step of this process is currently prone to error. Before a generative-AI program composes its response to a user’s query, it might struggle with a faulty internet search that doesn’t pull up relevant information. “The retrieval component failing is actually a very big part of these systems failing,” Graham Neubig, an AI and natural-language-processing researcher at Carnegie Mellon University, told me. Anyone who has used Google in the past few years has witnessed the search engine pull up tangential results and keyword-optimized websites over more reliable sources. Feeding that into an AI risks creating more mess, because language models are not always good at discriminating between more and less useful search results. Google’s AI Overview tool, for instance, recently seemed to draw from a Reddit comment saying that glue is a good way to get cheese to stick to pizza. And if the web search doesn’t turn up anything particularly helpful, the chatbot might just invent something in order to answer the question, Neubig said.

Even if a chatbot retrieves good information, today’s generative-AI programs are prone to twisting, ignoring, or misrepresenting data. Large language models are designed to write lucid, fluent prose by predicting words in a sequence, not to cross-reference information or create footnotes. A chatbot can tell you that the sky is blue, but it doesn’t “understand” what the sky or the color blue are. It might say instead that the sky is hot pink—evincing a tendency to “hallucinate,” or invent information, that is counter to the goal of reliable citation. Various experts told me that an AI model might invent reasonable-sounding facts that aren’t in a cited article, fail to follow instructions to note its sources, or cite the wrong sources.

Representatives from News Corp, Vox Media, and Axel Springer declined to comment. A spokesperson for The Atlantic told me that the company believes that AI “could be an important way to help build our audience in the future.” The OpenAI spokesperson said that the company is “committed to a thriving ecosystem of publishers and creators” and is working with its partners to build a product with “proper attribution—an enhanced experience still in development and not yet available in ChatGPT.”

[Read: This is what it looks like when AI eats the world]

One way to do that could be to apply external programs that filter and check the AI model’s citations, especially given language models’ inherent limitations. ChatGPT may not be great at citing its sources right now, but OpenAI could build a specialized product that is far better. Another tactic might be to specifically prompt and train AI models to provide more reliable annotations; a chatbot could “learn” that a high-quality response includes citations for each line delivered, for example. “There are potential engineering solutions to some of these problems, but solving all of them in one fell swoop is always hard,” Neubig said. Alex Dimakis, a computer scientist at the University of Texas at Austin and a co-director of the National Science Foundation’s Institute for Foundations of Machine Learning, told me over email that it is “certainly possible” that reliable responses with citations could be engineered “soon.”
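As a rough illustration of what an external citation check might look like, here is a minimal Python sketch. It assumes the chatbot's answer has already been split into (sentence, cited URL) pairs and that the text of each retrieved source is available. The function name `check_citations`, the word-overlap heuristic, and the 0.8 threshold are all hypothetical choices made for this example, not a description of any company's product; word overlap is a crude proxy, and a production system would need something far more robust.

```python
def check_citations(claims, sources, min_overlap=0.8):
    """Label each claim by whether its cited source plausibly supports it.

    claims:  list of (sentence, cited_url) pairs; cited_url may be None.
    sources: dict mapping each retrieved URL to the text of that page.
    Support is estimated by the share of the sentence's words that also
    appear in the cited source (a crude, assumed heuristic).
    """
    report = []
    for sentence, url in claims:
        if url is None:
            report.append((sentence, "missing citation"))
            continue
        if url not in sources:
            # The model cited a page that was never retrieved.
            report.append((sentence, "unknown source"))
            continue
        words = set(sentence.lower().split())
        source_words = set(sources[url].lower().split())
        overlap = len(words & source_words) / len(words) if words else 0.0
        status = "supported" if overlap >= min_overlap else "unsupported"
        report.append((sentence, status))
    return report


if __name__ == "__main__":
    sources = {"https://example.com/a": "the sky is blue on a clear day"}
    claims = [
        ("the sky is blue", "https://example.com/a"),
        ("the sky is hot pink", "https://example.com/a"),
        ("cheese sticks with glue", None),
        ("the sky is blue", "https://example.com/b"),
    ]
    for sentence, status in check_citations(claims, sources):
        print(f"{status}: {sentence}")
```

A filter like this can flag missing or mismatched citations, but it cannot tell an original scoop from an aggregated copy of the same text.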

Still, some of the problems may be inherent to the setup: Reliable summary and attribution require adhering closely to sources, but the magic of generative AI is that it synthesizes and associates information in unexpected ways. A good chatbot and a good web index, in other words, could be fundamentally at odds—media companies might be asking OpenAI to build a product that sacrifices “intelligence” for fidelity. “What we want to do with the generation goes against that attribution-and-provenance part, so you have to make a choice,” Chirag Shah, an AI and internet-search expert at the University of Washington, told me. There has to be a compromise. Which is, of course, what these media partnerships have been all along—tech companies paying to preempt legal battles and bad PR, media companies hedging their bets against a future technology that could ruin their current business model.

Academic and corporate research on making more reliable AI systems that don’t destroy the media ecosystem or poison the web abounds. Just last Friday, OpenAI acquired a start-up that builds information-retrieval software. But absent more details from the company about what exactly these future search products or ChatGPT abilities will look like, the internet’s billions of users are left with the company’s word—no sources cited.







