Deepfake Daddy

Clint Hawkins
May 27, 2023

I did the worst fatherly thing ever. I scared my child. On purpose.

WHY?

Here’s why: as I was binging YouTube AI news and learning about deepfake technology, I came across the story of Jennifer DeStefano, an Arizona mother who received a call from an unknown number. The caller claimed to have kidnapped her 15-year-old daughter, Briana! The scammers used a deepfake of Briana’s voice, making it sound like she was crying and pleading for help. A man then got on the phone and demanded a ransom of $1 million, which he later reduced to $50,000 when DeStefano said she didn’t have the money.

DeStefano was at a dance studio with another daughter at the time, and other mothers at the studio quickly called the police and DeStefano’s husband. They confirmed that Briana was safe within minutes. Despite this, DeStefano reported that she had believed it was her daughter on the phone because the imitation of her voice and manner of speaking was so realistic. She stated, “It was completely her voice. It was her inflection. It was the way she would have cried. I never doubted for one second it was her.” You can read the full story about the DeStefanos on People.com.

So, what could happen next? Will someone “deepfake” my voice and call my kids? How tough is that to do? Well, in under an hour, I figured out how to make a Deepfake Daddy voice that can read any script I feed it, and it sounds JUST LIKE ME. The AI tool I used is called ElevenLabs. ElevenLabs has top-notch tech that lets you train an AI model using sample audio clips. You can upload a 30-second audio clip of yourself, and then BOOM, you have a deepfake of you. This kinda reminded me of that scene in Terminator 2: Judgment Day, when the T-1000 fakes the voice of the foster mom.

This type of tech is called voice synthesis. Voice synthesis, or text-to-speech (TTS), is like a translator that converts written text into spoken words. Just as a translator analyzes the structure of a sentence to convert it into another language, voice synthesis analyzes the linguistic structure of the text, assigns phonetic values, and selects a suitable voice to generate a waveform representing the speech. It’s like a virtual actor who reads the text and produces a natural-sounding audio file, making it possible for written words to come alive as spoken words.
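For the curious, here is a toy Python sketch of those three steps: clean up the text, assign phonetic values, and generate a waveform. To be clear, this is not how Eleven Labs or any real TTS engine works under the hood. The tiny lexicon, the pitch table, and the function names (`normalize`, `to_phonemes`, `synthesize`) are all made up for illustration, and the "voice" is just a series of beeps.

```python
import math

# Hypothetical toy lexicon: word -> phoneme symbols (not a real dictionary).
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "kiddo": ["K", "IH", "D", "OW"],
}

# Hypothetical pitch (Hz) per voiced phoneme; anything missing is silence.
TOY_PITCH_HZ = {"AH": 220.0, "L": 196.0, "OW": 247.0, "IH": 262.0}


def normalize(text):
    """Step 1: split text into lowercase words and drop punctuation."""
    words = (raw.strip(".,!?\"'").lower() for raw in text.split())
    return [w for w in words if w]


def to_phonemes(words):
    """Step 2: assign phonetic values; unknown words become silence."""
    phonemes = []
    for word in words:
        phonemes.extend(TOY_LEXICON.get(word, ["SIL"]))
    return phonemes


def synthesize(phonemes, sample_rate=8000, samples_per_phoneme=800):
    """Step 3: emit one short sine tone per phoneme as floats in [-1, 1]."""
    samples = []
    for phoneme in phonemes:
        freq = TOY_PITCH_HZ.get(phoneme, 0.0)  # 0.0 Hz yields silence
        for i in range(samples_per_phoneme):
            samples.append(math.sin(2 * math.pi * freq * i / sample_rate))
    return samples


words = normalize("Hello, kiddo!")
phonemes = to_phonemes(words)
waveform = synthesize(phonemes)
print(words, phonemes, len(waveform))
```

A real system swaps each of these toy steps for a trained model, which is why the output sounds like a person instead of a microwave.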

Here’s the scary part: if you set your laptop to record its own system audio, you can record anything it plays. Facebook Live, IG posts, TikTok, YouTube, movies, you name it: you can capture someone’s voice and create your own audio file with it. So, I played with the idea. A great friend of mine, Corey Perlman, CSP (Certified Speaking Professional), is a public speaker with a lot of YouTube content out there (like me, he also has kids on devices). I texted him and warned him that I was going to steal his voice. Within a few clicks and some dabbling with audio samples, I uploaded his voice to ElevenLabs as well. Now I had a deepfake Corey voice. I could now blackmail him so badly! Tell me that’s not scary.

That morning, as I finished my pot of Trader Joe’s “Wake Up” yum brew, my youngest kiddo walked into my office (aka my garage, where I have a green screen and my janky audio setup). She asked what I was doing. I asked her, “What’s something daddy says all the time?” She replied, “Do you have your water bottle for practice?” Fair enough, those are my famous last words before dropping off my mini Alex Morgan. Then I asked, “What’s something I would never say?” She responded, “We are going to the mall and I am buying you new Jordans.” I typed exactly what she said into the ElevenLabs prompt area. I turned up my speakers, shut my mouth, and hit play.

My voice, the exact voice, read the script. I tried not to smirk as we listened.

She paused. She backed up to the threshold of the doorway. She squinted her little freckle face and said, “Ew, that was creepy. What did you do?!”

So yeah, it scared her. Good! Warn your kids, your family, your pets. If an old Gen X non-techy person like me can steal a voice off YouTube and create a deepfake voice… Well, you get the idea.

Until next time, this is the Artificial Dummy saying, “If you are looking for ransom, I can tell you I don’t have money. But what I do have are a very particular set of skills, skills I have acquired over a very long morning of watching YouTube.”
