Is Voice AI Ready for Primetime Yet?

I’m going to write a lovely blog post with some big words in it like schadenfreude and taciturn.

I will write conversationally, like, you know, the kids do today.

I might even put in an emoji or two, such as this one -💪and this one -🤘.

Then, I think I will write a run on sentence with a bunch of curse words; son of a bitch, damn it, I forgot to finish the run on sentence with the correct punctuation and spacing.

Then I think I will write very much in my own voice: look, I’m not going to tell you what to think, and I am not going to tell you I know what you’re thinking, shoot, I’m just happy someone is thinking, but don’t overthink it too much, o.k. chief?

So far, so good. Grammarly even likes all of this.

Why am I doing this? Well, I am testing out Voice AIs and seeing how close they are to actually taking blog posts (or any written word) and translating them to the hottest new club in New York – Voice! Voice is everywhere. While music streaming may be the current media darling, the fact is, most people are listening to more audio, and it’s not more music, it’s more voice.

It’s podcasts, it’s news, it’s opinion, it’s sports “radio,” it’s narration over video. It’s Google Home, it’s Alexa, it’s Siri.


The amount of voice being used for Voice Assistants will probably outpace all other forms of voice in the aggregate.

But those voices sound like digital assistants.

They feel computerized.

What if that same text could be converted to your own voice or a voice that sounds like a VO artist?

In 2016, Adobe teased something called #VoCo, which is basically Photoshop for your voice.

This is what it looked like then.

Scary stuff, or exciting stuff – depending on your worldview. Obviously, this is technology that could be used in a very malicious way. Think about a politician who was exposed saying something horribly racist – only they actually didn’t. What about a wife getting a divorce settlement based on faked voicemails?

Word is, Adobe legal kept the project from ever going public.


But there are some great use-cases for this type of tech. What if you could write a blog (like this one) and then sling it to audio, slap some intros and outros on it, and blam, now you have a podcast. Take that same vocal audio and now put it to video, blam, now you have multiple pieces of content, in your own voice.
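For the curious, here is a rough sketch of the "slap some intros and outros on it" step. It assumes the text-to-voice part has already happened and left you with a WAV file of the post, and that the intro and outro were recorded in the same format (same channels, sample width, and sample rate). The function name and the file names are made up for illustration; the only library used is Python's built-in `wave` module.

```python
import wave


def stitch_wavs(paths, out_path):
    """Concatenate WAV files that share one audio format into a single file."""
    fmt = None
    chunks = []
    for path in paths:
        with wave.open(path, "rb") as w:
            this_fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                # Mixing formats would produce garbled audio, so refuse.
                raise ValueError(f"{path} has format {this_fmt}, expected {fmt}")
            chunks.append(w.readframes(w.getnframes()))
    if fmt is None:
        raise ValueError("no input files given")

    with wave.open(out_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for chunk in chunks:
            out.writeframes(chunk)
    return out_path


# Hypothetical usage, with file names invented for this example:
# stitch_wavs(["intro.wav", "blog_post.wav", "outro.wav"], "episode.wav")
```

This only glues the pieces end to end. Fades, music beds, and loudness matching would need a real audio tool.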

I tried to contact this company, Audivity, a few months ago to get an article converted, but never heard back – they are definitely in beta. Their pitch: “Convert Your Blogs Into Professional Podcasts -Less Than 3 Hours. Seriously.”

The gold standard in this is Lyrebird. I tried to “train it in my voice” earlier this year, and it was not stellar.

But it is getting closer.

From January, Lyrebird #1:

From today, Lyrebird #2:

Both of those sound a little like my voice, but horribly digitized, and kind of NOT AT ALL like me!

Today on Product Hunt, there is another entry into this world called Voicepods.

Voicepods is different, though. I’m sure they will be moving into ML and voice learning, but for now, they are really trying to streamline the process of text to voice as a platform. Meaning, once the text is voiced, they sling it to a podcast, push it to Medium, or push it to an Action on Google (Google’s version of Alexa Skills). This is far easier to do than cloning a voice, and it is necessary.

So, let’s see how close we can get. I’m going to copy this entire blog post and see what we get.

Voicepod #1 in the voice of Noah:

Voicepod #2 in the voice of Elijah:

 

So, there ya go. Love what Voicepods is doing. Text-to-Voice is cool and all, but what can you do with it? That’s what they are trying to answer.

The verdict is this: customized and personalized voicing is nowhere near ready. However, using AI assistant voices with Text-to-Voice is very smooth.

Here’s the thing, though: I’ve been listening to these all morning. The Voice AIs all grind on you after about 5 minutes of continuous listening. For short bursts, it’s fine, but for long reads, it hurts my little ears. It’s not a volume issue, it’s a color issue: the tone of the voice itself wears on you. As someone who works with the human voice every day, in VOs, music, ads, reads, I personally don’t think that Voice AI can replace the nuance and the beautiful imperfections that make the human voice awesome. But, much like taste in music, it may be that Voice AI is just fine for many others to listen to for long periods of time. Just not me.

The answer is, no. No, Voice AI is not ready for Primetime.

Still, as a matter of content creation, it may absolutely work.

We shall see.