Добавить новость
smi24.net
News in English
Май
2024

ElevenLabs Is Building an Army of Voice Clones

0

My voice was ready. I’d been waiting, compulsively checking my inbox. I opened the email and scrolled until I saw a button that said, plainly, “Use voice.” I considered saying something aloud to mark the occasion, but that felt wrong. The computer would now speak for me.

I had thought it’d be fun, and uncanny, to clone my voice. I’d sought out the AI start-up ElevenLabs, paid $22 for a “creator” account, and uploaded some recordings of myself. A few hours later, I typed some words into a text box, hit “Enter,” and there I was: all the nasal lilts, hesitations, pauses, and mid-Atlantic-by-way-of-Ohio vowels that make my voice mine.

It was me, only more pompous. My voice clone speaks with the cadence of a pundit, no matter the subject. I type I like to eat pickles, and the voice spits it out as if I’m on Meet the Press. That’s not my voice’s fault; it is trained on just a few hours of me speaking into a microphone for various podcast appearances. The model likes to insert ums and ahs: In the recordings I gave it, I’m thinking through answers in real time and choosing my words carefully. It’s uncanny, yes, but also quite convincing—a part of my essence that’s been stripped, decoded, and reassembled by a little algorithmic model so as to no longer need my pesky brain and body.

  Listen to the author's AI voice:

Using ElevenLabs, you can clone your voice like I did, or type in some words and hear them spoken by “Freya,” “Giovanni,” “Domi,” or hundreds of other fake voices, each with a different accent or intonation. Or you can dub a clip into any one of 29 languages while preserving the speaker’s voice. In each case, the technology is unnervingly good. The voice bots don’t just sound far more human than voice assistants such as Siri; they also sound better than any other widely available AI audio software right now. What’s different about the best ElevenLabs voices, trained on far more audio than what I fed into the machine, isn’t so much the quality of the voice but the way the software uses context clues to modulate delivery. If you feed it a news report, it speaks in a serious, declarative tone. Paste in a few paragraphs of Hamlet, and an ElevenLabs voice reads it with a dramatic storybook flare.

Listen to ElevenLabs read Hamlet:

ElevenLabs launched an early version of its product a little over a year ago, but you might have listened to one of its voices without even knowing it. Nike used the software to create a clone of the NBA star Luka Dončić’s voice for a recent shoe campaign. New York City Mayor Eric Adams’s office cloned the politician’s voice so that it could deliver robocall messages in Spanish, Yiddish, Mandarin, Cantonese, and Haitian Creole. The technology has been used to re-create the voices of children killed in the Parkland school shooting, to lobby for gun reform. An ElevenLabs voice might be reading this article to you: The Atlantic uses the software to auto-generate audio versions of some stories, as does The Washington Post.

It’s easy, when you play around with the ElevenLabs software, to envision a world in which you can listen to all the text on the internet in voices as rich as those in any audiobook. But it’s just as easy to imagine the potential carnage: scammers targeting parents by using their children’s voice to ask for money, a nefarious October surprise from a dirty political trickster. I tested the tool to see how convincingly it could replicate my voice saying outrageous things. Soon, I had high-quality audio of my voice clone urging people not to vote, blaming “the globalists” for COVID, and confessing to all kinds of journalistic malpractice. It was enough to make me check with my bank to make sure any potential voice-authentication features were disabled.

I went to visit the ElevenLabs office and meet the people responsible for bringing this technology into the world. I wanted to better understand the AI revolution as it’s currently unfolding. But the more time I spent—with the company and the product—the less I found myself in the present. Perhaps more than any other AI company, ElevenLabs offers a window into the near future of this disruptive technology. The threat of deepfakes is real, but what ElevenLabs heralds may be far weirder. And nobody, not even its creators, seems ready for it.

In mid-November, I buzzed into a brick building on a London side street and walked up to the second floor. The corporate headquarters of ElevenLabs—a $1 billion company—is a single room with a few tables. No ping-pong or beanbag chairs—just a sad mini fridge and the din of dutiful typing from seven employees packed shoulder to shoulder. Mati Staniszewski, ElevenLabs’ 29-year-old CEO, got up from his seat in the corner to greet me. He beckoned for me to follow him back down the stairs to a windowless conference room ElevenLabs shares with a company that, I presume, is not worth $1 billion.

Staniszewski is tall, with a well-coiffed head of blond hair, and he speaks quickly in a Polish accent. Talking with him sometimes feels like trying to engage in conversation with an earnest chatbot trained on press releases. I started our conversation with a few broad questions: What is it like to work on AI during this moment of breathless hype, investor interest, and genuine technological progress? What’s it like to come in each day and try to manipulate such nascent technology? He said that it’s exciting.

We moved on to what Staniszewski called his “investor story.” He and the company’s co-founder, Piotr Dabkowski, grew up together in Poland watching foreign movies that were all clumsily dubbed into a flat Polish voice. Man, woman, child—whoever was speaking, all of the dialogue was voiced in the same droning, affectless tone by male actors known as lektors.

They both left Poland for university in the U.K. and then settled into tech jobs (Staniszewski at Palantir and Dabkowski at Google). Then, in 2021, Dabkowski was watching a film with his girlfriend and realized that Polish films were still dubbed in the same monotone lektor style. He and Staniszewski did some research and discovered that markets outside Poland were also relying on lektor-esque dubbing.

Mati Staniszewski’s “investor story” as CEO of ElevenLabs begins in Poland, where he grew up watching foreign films clumsily dubbed into a flat voice. (Daniel Stier for The Atlantic)

The next year, they founded ElevenLabs. AI voices were everywhere—think Alexa, or a car’s GPS—but actually good AI voices, they thought, would finally put an end to lektors. The tech giants have hundreds or thousands of employees working on AI, yet ElevenLabs, with a research team of just seven people, built a voice tool that’s arguably better than anything its competitors have released. The company poached researchers from top AI companies, yes, but it also hired a college dropout who’d won coding competitions, and another “who worked in call centers while exploring audio research as a side gig,” Staniszewski told me. “The audio space is still in its breakthrough stage,” Alex Holt, the company’s vice president of engineering, told me. “Having more people doesn’t necessarily help. You need those few people that are incredible.”

ElevenLabs knew its model was special when it started spitting out audio that accurately represented the relationships between words, Staniszewski told me—pronunciation that changed based on the context (minute, the unit of time, instead of minute, the description of size) and emotion (an exclamatory phrase spoken with excitement or anger).

Much of what the model produces is unexpected—sometimes delightfully so. Early on, ElevenLabs’ model began randomly inserting applause breaks after pauses in its speech: It had been training on audio clips from people giving presentations in front of live audiences. Quickly, the model began to improve, becoming capable of ums and ahs. “We started seeing some of those human elements being replicated,” Staniszewski said. The big leap was when the model began to laugh like a person. (My voice clone, I should note, struggles to laugh, offering a machine-gun burst of “haha”s that sound jarringly inhuman.)

Compared with OpenAI and other major companies, which are trying to wrap their large language models around the entire world and ultimately build an artificial human intelligence, ElevenLabs has ambitions that are easier to grasp: a future in which ALS patients can still communicate in their voice after they lose their speech. Audiobooks that are ginned up in seconds by self-published authors, video games in which every character is capable of carrying on a dynamic conversation, movies and videos instantly dubbed into any language. A sort of Spotify of voices, where anyone can license clones of their voice for others to use—to the dismay of professional voice actors. The gig-ification of our vocal cords.

What Staniszewski also described when talking about ElevenLabs is a company that wants to eliminate language barriers entirely. The dubbing tool, he argued, is its first step toward that goal. A user can upload a video, and the model will translate the speaker’s voice into a different language. When we spoke, Staniszewski twice referred to the Babel fish from the science-fiction book The Hitchhiker’s Guide to the Galaxy—he described making a tool that immediately translates every sound around a person into a language they can understand.

Every ElevenLabs employee I spoke with perked up at the mention of this moonshot idea. Although ElevenLabs’ current product might be exciting, the people building it view current dubbing and voice cloning as a prelude to something much bigger. I struggled to separate the scope of Staniszewski’s ambition from the modesty of our surroundings: a shared conference room one floor beneath the company’s sparse office space. ElevenLabs may not achieve its lofty goals, but I was still left unmoored by the reality that such a small collection of people could build something so genuinely powerful and release it into the world, where the rest of us have to make sense of it.

ElevenLabs’ voice bots launched in beta in late January 2023. It took very little time for people to start abusing them. Trolls on 4chan used the tool to make deepfakes of celebrities saying awful things. They had Emma Watson reading Mein Kampf and the right-wing podcaster Ben Shapiro making racist comments about Representative Alexandria Ocasio-Cortez. In the tool’s first days, there appeared to be virtually no guardrails. “Crazy weekend,” the company tweeted, promising to crack down on misuse.

ElevenLabs added a verification process for cloning; when I uploaded recordings of my voice, I had to complete multiple voice CAPTCHAs, speaking phrases into my computer in a short window of time to confirm that the voice I was duplicating was my own. The company also decided to limit its voice cloning strictly to paid accounts and announced a tool that lets people upload audio to see if it is AI generated. But the safeguards from ElevenLabs were “half-assed,” Hany Farid, a deepfake expert at UC Berkeley, told me—an attempt to retroactively focus on safety only after the harm was done. And they left glaring holes. Over the past year, the deepfakes have not been rampant, but they also haven’t stopped.

I first started reporting on deepfakes in 2017, after a researcher came to me with a warning of a terrifying future where AI-generated audio and video would bring about an “infocalypse” of impersonation, spam, nonconsensual sexual imagery, and political chaos, where we would all fall into what he called “reality apathy.” Voice cloning already existed, but it was crude: I used an AI voice tool to try to fool my mom, and it worked only because I had the halting, robotic voice pretend I was losing cell service. Since then, fears of an infocalypse have lagged behind the technology’s ability to distort reality. But ElevenLabs has closed the gap.

The best deepfake I’ve seen was from the filmmaker Kenneth Lurt, who used ElevenLabs to clone Jill Biden’s voice for a fake advertisement where she’s made to look as if she’s criticizing her husband over his handling of the Israel-Gaza conflict. The footage, which deftly stitches video of the first lady giving a speech with an ElevenLabs voice-over, is incredibly convincing and has been viewed hundreds of thousands of times. The ElevenLabs technology on its own isn’t perfect. “It’s the creative filmmaking that actually makes it feel believable,” Lurt said in an interview in October, noting that it took him a week to make the clip.

“It will totally change how everyone interacts with the internet, and what is possible,” Nathan Lambert, a researcher at the Allen Institute for AI, told me in January. “It’s super easy to see how this will be used for nefarious purposes.” When I asked him if he was worried about the 2024 elections, he offered a warning: “People aren’t ready for how good this stuff is and what it could mean.” When I pressed him for hypothetical scenarios, he demurred, not wanting to give anyone ideas.

Daniel Stier for The Atlantic

A few days after Lambert and I spoke, his intuitions became reality. The Sunday before the New Hampshire presidential primary, a deepfaked, AI-generated robocall went out to registered Democrats in the state. “What a bunch of malarkey,” the robocall began. The voice was grainy, its cadence stilted, but it was still immediately recognizable as Joe Biden’s drawl. “Voting this Tuesday only enables the Republicans in their quest to elect Donald Trump again,” it said, telling voters to stay home. In terms of political sabotage, this particular deepfake was relatively low stakes, with limited potential to disrupt electoral outcomes (Biden still won in a landslide). But it was a trial run for an election season that could be flooded with reality-blurring synthetic information.

Researchers and government officials scrambled to locate the origin of the call. Weeks later, a New Orleans–based magician confessed that he’d been paid by a Democratic operative to create the robocall. Using ElevenLabs, he claimed, it took him less than 20 minutes and cost $1.

Afterward, ElevenLabs introduced a “no go”–voices policy, preventing users from uploading or cloning the voice of certain celebrities and politicians. But this safeguard, too, had holes. In March, a reporter for 404 Media managed to bypass the system and clone both Donald Trump’s and Joe Biden’s voices simply by adding a minute of silence to the beginning of the upload file. Last month, I tried to clone Biden’s voice, with varying results. ElevenLabs didn’t catch my first attempt, for which I uploaded low-quality sound files from YouTube videos of the president speaking. But the cloned voice sounded nothing like the president’s—more like a hoarse teenager’s. On my second attempt, ElevenLabs blocked the upload, suggesting that I was about to violate the company’s terms of service.

For Farid, the UC Berkeley researcher, ElevenLabs’ inability to control how people might abuse its technology is proof that voice cloning causes more harm than good. “They were reckless in the way they deployed the technology,” Farid said, “and I think they could have done it much safer, but I think it would have been less effective for them.”

The core problem of ElevenLabs—and the generative-AI revolution writ large—is that there is no way for this technology to exist and not be misused. Meta and OpenAI have built synthetic voice tools, too, but have so far declined to make them broadly available. Their rationale: They aren’t yet sure how to unleash their products responsibly. As a start-up, though, ElevenLabs doesn’t have the luxury of time. “The time that we have to get ahead of the big players is short,” Staniszewski said. “If we don’t do it in the next two to three years, it’s going to be very hard to compete.” Despite the new safeguards, ElevenLabs’ name is probably going to show up in the news again as the election season wears on. There are simply too many motivated people constantly searching for ways to use these tools in strange, unexpected, even dangerous ways.

In the basement of a Sri Lankan restaurant on a soggy afternoon in London, I pressed Staniszewski about what I’d been obliquely referring to as “the bad stuff.” He didn’t avert his gaze as I rattled off the ways ElevenLabs’ technology could be and has been abused. When it was his time to speak, he did so thoughtfully, not dismissively; he appears to understand the risks of his products. “It’s going to be a cat-and-mouse game,” he said. “We need to be quick.”

Later, over email, he cited the “no go”–voices initiative and told me that ElevenLabs is “testing new ways to counteract the creation of political content,” adding more human moderation and upgrading its detection software. The most important thing ElevenLabs is working on, Staniszewski said—what he called “the true solution”—is digitally watermarking synthetic voices at the point of creation so civilians can identify them. That will require cooperation across dozens of companies: ElevenLabs recently signed an accord with other AI companies, including Anthropic and OpenAI, to combat deepfakes in the upcoming elections, but so far, the partnership is mostly theoretical.

The uncomfortable reality is that there aren’t a lot of options to ensure bad actors don’t hijack these tools. “We need to brace the general public that the technology for this exists,” Staniszewski said. He’s right, yet my stomach sinks when I hear him say it. Mentioning media literacy, at a time when trolls on Telegram channels can flood social media with deepfakes, is a bit like showing up to an armed conflict in 2024 with only a musket.

The conversation went on like this for a half hour, followed by another session a few weeks later over the phone. A hard question, a genuine answer, my own palpable feeling of dissatisfaction. I can’t look at ElevenLabs and see beyond the risk: How can you build toward this future? Staniszewski seems unable to see beyond the opportunities: How can’t you build toward this future? I left our conversations with a distinct sense that the people behind ElevenLabs don’t want to watch the world burn. The question is whether, in an industry where everyone is racing to build AI tools with similar potential for harm, intentions matter at all.

To focus only on deepfakes elides how ElevenLabs and synthetic audio might reshape the internet in unpredictable ways. A few weeks before my visit, ElevenLabs held a hackathon, where programmers fused the company’s tech with hardware and other generative-AI tools. Staniszewski said that one team took an image-recognition AI model and connected it to both an Android device with a camera and ElevenLabs’ text-to-speech model. The result was a camera that could narrate what it was looking at. “If you’re a tourist, if you’re a blind person and want to see the world, you just find a camera,” Staniszewski said. “They deployed that in a weekend.”

Repeatedly during my visit, ElevenLabs employees described these types of hybrid projects—enough that I began to see them as a helpful way to imagine the next few years of technology. Products that all hook into one another herald a future that’s a lot less recognizable. More machines talking to machines; an internet that writes itself; an exhausting, boundless comingling of human art and human speech with AI art and AI speech until, perhaps, the provenance ceases to matter.

I came to London to try to wrap my mind around the AI revolution. By staring at one piece of it, I thought, I would get at least a sliver of certainty about what we’re barreling toward. Turns out, you can travel across the world, meet the people building the future, find them to be kind and introspective, ask them all of your questions, and still experience a profound sense of disorientation about this new technological frontier. Disorientation. That’s the main sense of this era—that something is looming just over the horizon, but you can’t see it. You can only feel the pit in your stomach. People build because they can. The rest of us are forced to adapt.








Лео Канделаки и Анжелика Стубайло сыграют в новую трэвел-игру «Кто куда» на ТНТ

Арбуз, кукуруза и холодные напитки: диетолог Садыков назвал продукты, которые портят ваш сон летом

KENGURU GAMES в Звенигороде звенели на всю страну

«А у кого нет проблем?»: Филипп Киркоров заговорил про здоровье после полученного ожога


The founder of Deliciously Ella started a blog when suffering from severe chronic pain. Now, her multimillion-dollar snack empire is going global

OpenAI warns that its new ChatGPT Agent has the ability to aid dangerous bioweapon development

The Best Story Of Chicago Bulls’ Offseason Just Got A Happy Ending

Why Juventus spent 111m this summer for only one new player


Методы стеганографии становятся популярнее: когда картинка может стать угрозой ИБ

Хаски на Summer Sound – выиграй билеты на Like FM

Кайрат Бермуканов поддержал Петра Яна перед боем в Абу‑Даби

Конференцию Infotecs Marketing Stars в Кибердоме посетили более 500 специалистов по маркетингу и рекламе в сфере ИТ и ИБ


'I destroyed months of your work in seconds' says AI coding tool after deleting a devs entire database during a code freeze: 'I panicked instead of thinking'

Настройки GameHub и Winlator для игры в Prey (2017) на Android

Bungie promises to fix Destiny 2's new metroid-style morph ball as it makes players sick and glitches out on ultrawide monitors

The dairy industry would like Gen Z to drink more milk, so they made a Fortnite diner tycoon game



Движение к победе: в России стартует премия «Мы верим твердо в героев спорта»

Аудиосказки про Турбозавров теперь на Яндекс Музыке!

Летняя школа Русского дома приняла детей из Карабаха

Студия Лебедева запустила образовательный проект, в котором заказчики сами создают свой дизайн


Хаски на Summer Sound – выиграй билеты на Like FM

Ярко вспыхнула и быстро погасла: судьба певицы Марины Журавлевой

Лицемерная политика многовекторности. "Улыбается всем": Эрдоган в обход Москвы готов дружить с Баку

Нам есть чем гордиться: каким будет Национальный центр «Россия» на Краснопресненской набережной


Молодежь Подмосковья приглашают принять участие в конкурсе к 295-летию русского полководца Александра Суворова

Парк Юрия Лужкова 19 июля стал площадкой фестиваля The BOWL

Пост №6137249

Церемония прощания с режиссером Юрием Карой проходит в Москве


Александрова покинула турнир в Гамбурге, не дойдя до полуфинала.

Рублев прокомментировал вылет с турнира в Лос-Кабосе

Теннисистка Мур высказалась о проблемах в антидопинговой системе.

Соболенко сохраняет первое место в чемпионской гонке WTA.


Я слышу голоса: «Т-Технологии» опубликовали ИИ-модель, умеющую троллить мошенников

МегаФон увеличил площадь покрытия на 17,4% за полгода

Депутат Нилов предложил разрешить мотоциклистам движение по выделенной полосе

После вспышки инфекции на теплоходе «Леонид Красин» начали расследование


Музыкальные новости

VK Fest Санкт-Петербург: Детская сцена от PICCOLO и Radio Kids FM вновь покорила сердца юных зрителей!

Семеново, Онег, Борисово, Ивановка. Об усадьбах Рахманинова рассказала Наталья Басманова

Продюсер рассказал, почему заменил Shaman в опере «Князь Владимир»

The dairy industry would like Gen Z to drink more milk, so they made a Fortnite diner tycoon game


Аудиосказки про Турбозавров теперь на Яндекс Музыке!

Студия Лебедева запустила образовательный проект, в котором заказчики сами создают свой дизайн

Летняя школа Русского дома приняла детей из Карабаха

Движение к победе: в России стартует премия «Мы верим твердо в героев спорта»


В Пулково перенаправили 22 летевших в Москву авиарейса

Путин поздравил российских школьников с победой на математической олимпиаде

Собянин: еще два беспилотника, летевших на Москву, были ликвидированы

VK Fest 2025 побил рекорд по числу артистов и вошёл в Книгу рекордов России


Рискнул на 100 тысяч рублей: инженер из Смоленска купил более 1,7 тысячи лотерейных билетов и выиграл автомобиль от «Национальной Лотереи» и РОЛЬФ на VK Fest

25 июля пилотажная группа "Звезда" пролетит над пробками на шоссе под Москвой

Москвичей предупредили о задержках наземного транспорта из-за ливня

Зарядки для электромобилей «Энергия Москвы» станут платными


Путин выразил соболезнования семье председателя Верховного суда Подносовой

Министр юстиции поделился информацией о роли адвокатов в специальной военной операции.

Путин отметил успехи российских школьников на математической олимпиаде.

Нам есть чем гордиться: каким будет Национальный центр «Россия» на Краснопресненской набережной




Собянин: создание 5 центров женского здоровья завершат в этом году

Уже не крепкий орешек: как Брюс Уиллис живет с неизлечимой болезнью

Офтальмолог Кирилл Светлаков: секреты здоровых глаз для тех, кто носит контактные линзы

«Я на инвалидном кресле, он на «Тесле»: что случилось с блогером Мариш Мариш


В центре Киева митингуют против реформ Зеленского

«СТРАНА.ua»: Зеленский раскритиковал ЕС за непредоставление обещанной помощи


Движение к победе: в России стартует премия «Мы верим твердо в героев спорта»

Кайрат Бермуканов поддержал Петра Яна перед боем в Абу‑Даби

В Москве впервые пройдет фестиваль экстремальных видов спорта

Уже 59 орловчан решили побороться за просветительскую награду «Знание. Премия»


Лукашенко призвал белорусских нефтяников «не раскачиваться, а бурить и бурить»

Посланник Трампа рассказал, как «неформально» договаривался с Лукашенко: после нескольких тостов Минск освободил политзаключенных

Куда Лукашенко хочет пристроить лишних дипломатов


Собянин рассказал, как город заботится о природе при строительстве метро

Более 800 спортивных площадок обновят и обустроят в Москве — Сергей Собянин

Более 800 спортивных площадок обновят и обустроят в Москве - Сергей Собянин

Собянин: Продолжаем внедрять новый стандарт заботы о женском здоровье в одном месте


Уже 59 орловчан решили побороться за просветительскую награду «Знание. Премия»

Россияне рассказали, по каким критериям выбирают квартиру для покупки в 2025 году

Зарядки для электромобилей «Энергия Москвы» станут платными

Фестиваль «Сады и цветы»: как Москву превращают в городской оазис


Москва отменяется: героиня шоу «Мама в 16» дала еще один шанс отцу своего ребенка и возвращается в Симферополь

Депутат Нилов предложил разрешить мотоциклистам движение по выделенной полосе

Синоптик Тишковец: Пик ливней пройден в столичном регионе

Собянин: создание 5 центров женского здоровья завершат в этом году


Центр подготовки личного состава Северо-Западного округа Росгвардии пополнили призывники

Республика Алтай вошла в десятку регионов России по развитию ипотеки

Девочка упала при посадке на теплоход в порту Архангельска

В Нарьян-Маре из-за холодов возобновили подачу отопления в дома


Поезда "Таврия" по-прежнему задерживаются из-за ЧП в Ростовской области

Поезда в Крым и из Крыма следуют с задержками до пяти часов

В Симферополе на базе «Клинического госпиталя для ветеранов войн» функционирует гериатрический центр для пожилых людей с возрастными нарушениями

Актуальная информация о задержке поездов в Крым и обратно


Закупки госкомпаний у МСП превысили 3,8 трлн рублей в первом полугодии 2025 года

Депутат Нилов предложил разрешить мотоциклистам движение по выделенной полосе

Молодежь Подмосковья приглашают принять участие в конкурсе к 295-летию русского полководца Александра Суворова

Пост №6137249














СМИ24.net — правдивые новости, непрерывно 24/7 на русском языке с ежеминутным обновлением *