An AI startup that lets anyone clone a target’s voice in a matter of seconds is being rapidly embraced by internet trolls. 4chan users have been flocking to free voice synthesis platform ElevenLabs, using the company’s tech to clone the voices of celebrities and read out audio ranging from memes and erotica to hate speech and misinformation.
Such AI voice deepfakes have improved rapidly over the past few years, but ElevenLabs’ software, which seems to have opened up general access over the weekend, offers a potent combination of speed, quality, and availability — as well as a complete lack of safeguards.
Abuse of ElevenLabs’ software was first reported by Motherboard, which found posters on 4chan sharing AI-generated voice clips that sound like famous individuals, including Emma Watson and Joe Rogan. As Motherboard’s Joseph Cox reports:
In one example, a generated voice that sounds like actor Emma Watson reads a section of Mein Kampf. In another, a voice very similar to Ben Shapiro makes racist remarks about Alexandria Ocasio-Cortez. In a third, someone saying ‘trans rights are human rights’ is strangled.
In The Verge’s own tests, we were able to use ElevenLabs’ platform to clone targets’ voices in a matter of seconds and generate audio samples containing everything from threats of violence to expressions of racism and transphobia. In one test, we created a voice clone of President Joe Biden and were able to generate audio that sounded like the president announcing an invasion of Russia and another clip of him admitting that the “pizzagate” conspiracy theory is real, illustrating how the technology could be used to spread misinformation. You can listen to a brief, SFW sample of our Biden voice deepfake below:
ElevenLabs markets its software as a way to quickly generate audio dubs for media, including film, TV, and YouTube. It’s one of a number of startups in this space, but it claims its voices are of such quality that they require little editing, allowing for applications like real-time dubs into foreign languages and the instant generation of audiobooks, as in the sample below:
Posts on 4chan seen by The Verge include guides on how to use ElevenLabs’ technology, how to find the sample audio necessary to train a model, and how to circumvent the company’s “credit” limits on generating audio samples. Typical for 4chan, the content its users create varies widely in tone and intent, running the gamut from memes and copypasta to virulent hate speech and erotic fiction. Voice clones of characters from video games and anime, as well as clones of YouTubers and VTubers, are particularly popular, in part because sample audio of these voices is easy to find and use to train the software.
In a Twitter thread posted on Monday, ElevenLabs acknowledged this abuse, noting it had seen “an increasing number of voice cloning misuse cases” and would be exploring ways to mitigate these issues. The company claims it can “trace back any generated audio back to the user,” and says it will explore safeguards like verifying users’ identities and manually checking each voice cloning request. At the time of publication, though, the company’s software is freely accessible without any limits on the content generated. The Verge has contacted the company for comment and will update this story if we hear back.
To predict how AI voice clones might be used and misused in the future, we can look to the recent history of video deepfakes. This technology began to spread online as a way to generate non-consensual pornography, and though many experts worried it would be used for misinformation, those fears have so far proved largely unfounded. Instead, the vast majority of video deepfakes shared online are pornographic, and the software has been used to harass and intimidate not only celebrities but also private individuals. At the same time, deepfakes are slowly being embraced by commercial entities and used alongside traditional VFX techniques in film and TV.