Deepfaking Myself

We’ve all seen deepfakes of varying quality over the years, each iteration more impressive than the last.

But this article by Professor Ethan Mollick about his attempts to make one himself showed me just how easy it has become.

I followed his exact steps (I’ll re-explain them below), and this is what I (or rather, the AI) came up with in just a few minutes:

Does it look perfect? No, but consider how amazing this is as a “Day 1” result, with little to no effort from me to tweak the outcome.

How to Deepfake Yourself

Again, this is taken directly from Professor Mollick’s experiment. As these tools become more widely available, it seems clear that this type of work will soon come in all-in-one packages.

Already figured it out from Prof. Mollick’s page? Skip ahead to the “Critiques” section.

Step One: Your Script

I had ChatGPT write my script for me. Note that I didn’t change anything, so the information isn’t quite accurate. I wanted it to try to capture my style a little bit, but I haven’t tried training ChatGPT to know how I write yet. Instead, I gave it the following prompt:

Write a roughly 200 word speech on why it’s important to learn verb tenses in English. Make the tone fairly casual. Include some ums or uhs in natural positions.

ChatGPT spit out the script, and I was on to step two.
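
If you plan to generate scripts for more than one topic, the prompt above can be turned into a small reusable template. This is only a sketch in Python: the function name and its parameters are hypothetical, and the wording simply mirrors the prompt I used. You would still paste the result into ChatGPT yourself.

```python
def build_script_prompt(topic: str, words: int = 200,
                        tone: str = "fairly casual",
                        fillers: bool = True) -> str:
    """Build a script prompt modeled on the one in this post (hypothetical helper)."""
    parts = [
        f"Write a roughly {words} word speech on {topic}.",
        f"Make the tone {tone}.",
    ]
    # The filler request is optional; leave it out for a more formal script.
    if fillers:
        parts.append("Include some ums or uhs in natural positions.")
    return " ".join(parts)


prompt = build_script_prompt("why it's important to learn verb tenses in English")
print(prompt)
```

Changing `topic`, `words`, or `tone` gives you a new prompt in the same shape without retyping it.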

Step Two: Cloning Your Voice

I’d read articles reporting that Microsoft had figured out a way to clone a voice from a 3-second audio clip, which by itself is mind-blowing. I used Professor Mollick’s approach, though, and signed up for a $5 subscription through ElevenLabs.

This part took me the longest as I wanted an isolated voice recording of myself to upload. In my case, there are a ton of samples available, so I grabbed some of my voice from a podcast episode and edited it so that it was just a segment of my voice with no other speakers, transition sounds, etc.

ElevenLabs doesn’t seem to have a “record in the app” option, so you will have to make your own recording of your voice. For most people, Vocaroo should work fine! Just record and then download your audio.

When you have your recording ready, click on “VoiceLab” then “Instant Voice Cloning”. Upload your audio, and you’re good to go!

To my surprise, the voice was ready to use almost immediately. I expected some sort of progress indicator, so I sat around for an extra minute or two, but it turns out it’s just that fast.

Once your voice is cloned, select the voice you want to use (you get 10 “voices” with the $5 subscription). From there, you simply paste in your text, and it will process it almost immediately.

Last but not least, download the audio.

Step Three: Fake Video

From here, I jumped over to d-id. I used the trial version to test this out. I recommend doing the same and then deciding whether it’s something you want to subscribe to.

As you can see from the video above, this was the weakest part of the process. The head movements, blinking, etc. were fairly natural, but d-id doesn’t seem to recognize that my beard is a part of my face, so the movement gets really awkward.

I also didn’t upload exactly what they asked for, which was a direct front-facing picture, so I need to experiment with this more in the future.

This service is also fairly intuitive, but here are the steps in short:

Click “Create Video”
Under “Choose Presenter” click “Add”
Upload a photo of yourself (they tell you the best setup)

Then, on the right side of the screen there’s an option to drop in your script, but since we want our cloned voice, I used the “Audio” option and uploaded my cloned voice recording.

The processing here did take a few minutes, but the vast majority of the time building the video above was spent figuring out the structure of the platforms and getting my own audio and picture prepared. My entire process from beginning to end took 15 minutes, but now that everything is in place, I imagine it will take about 3-4 minutes total the next time around.

Critiques

Each of these steps has problems, most of which were pointed out above, but to recap:

The video is awkward. It looks like I can stretch the base of my skull off of my head. This can probably be fixed with a picture that follows d-id’s guidelines better, but I’m still not sure if my beard will create a problem there. Assuming you don’t have a beard that covers your neck, it won’t be an issue for most of you.
The audio tone is a little flat. I think adding in the ums and asking for the script in a casual tone helped match my style, but it’s still not quite me. Impressive, nonetheless.
ChatGPT will be ChatGPT. I didn’t check the accuracy, and I didn’t want to change anything it wrote for this first run. There are parts that are wrong, and I probably would speak with more hesitations and interrupters.

Implications

Let’s be honest. We’re going to end up in a world where teachers will make lecture recordings through AI just like this. Like all AI conversations, we’re going to struggle with what is appropriate and what isn’t.

One thing to remember is that your students want a relationship with YOU. Not with Fake You. If you start delegating the core of your work to AI, students will have every right to complain that they’re not working with you.

On the other hand, I can see a place for making quick just-in-time videos that can help students understand discrete points or for review of concepts that were (or should have been) covered in a previous semester. Perhaps teachers should start branding their recordings as AI or Not AI.

For ESL teachers, it’s worth taking the time to slow down and listen to the audio and video that come out. In the same way we adjust our speaking, we may need to adjust our AI output to match our students’ needs.

If you’re getting ChatGPT to write a script, make sure you confirm it’s accurate and uses level-appropriate language.
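
A quick automated pass can help with the “level-appropriate” half of that check before you read the script closely. The sketch below is a rough aid, not a measure of accuracy or difficulty: the function name, the 130 words-per-minute pace, and the 10-letter threshold for “long” words are all assumptions you should tune for your own students.

```python
import re


def script_report(script: str, words_per_minute: int = 130,
                  long_word_len: int = 10) -> dict:
    """Summarize a script's length and flag long words (hypothetical helper)."""
    # Treat runs of letters and apostrophes (straight or curly) as words.
    words = re.findall(r"[A-Za-z'’]+", script)
    # Word length is only a crude stand-in for vocabulary difficulty.
    long_words = sorted({w.lower() for w in words if len(w) >= long_word_len})
    return {
        "word_count": len(words),
        "estimated_seconds": round(len(words) / words_per_minute * 60),
        "long_words": long_words,
    }


report = script_report("Um, verb tenses are fundamentally important.")
print(report)
```

The flagged words are only candidates to review; whether the content is correct still takes a human read-through.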

If you’re uploading your voice for cloning, listen through the output and check that the clone pronounces words properly.

If you’re using d-id to make videos, be aware that the lip movement may not match exactly with what you’re saying. Remember that students use visual cues like lip and tongue movement to help process what they hear, so mismatched movements here can create a lot of confusion.

What are some of the ways that you might find yourself using this technology in the classroom? I’ll be interested to hear people’s takes on this!
