Run! The AI Voices are coming!

The revolution will be televised

They’re real

Alexa.  Siri.  Google Voice.  Bixby.  Nuance. They’re all out there.  They’re coming.  They’re on their way, and their aim is conquest…and their goal is replacing the humans.

As a voice talent, you should be afraid…right?  Right?  *insert sound of knees quaking here*

They’re formidable.

Yes, today’s AI voices are gaining traction, and they’re improving in tone, cadence, quality, and realism.  Nothing creeps you out quite like Amazon's Alexa, arguably the most realistic of the bunch, who can make you feel like you're hearing from a real human being (she originally was, of course).  It’s just like CGI: watch the original Toy Story, then watch Toy Story 4, and you’ll notice the stark difference in animation quality.  The shadows, the texture, the light play, the movement, the animation style…it’s amazing how much better rendering technology has become.

CGI itself has become so incredibly photorealistic, especially in movies that emulate real life and interact with real-life characters, like The Jungle Book or The Lion King, that it can be hard to tell the fake from the real anymore.

Artificial Intelligence Voices are no exception.  They’re formidable, they’re imposing and becoming better and better, and, like it or not, just like CGI, they’re here to stay.

They’re here to stay

AI Voices are not going away anytime soon.  Voices.com recently partnered with VocalID, and that understandably concerned more than a few voice talent on that platform.

But are we really threatened?  The human voice is inimitable.  Even a character actor or impressionist must take great pains to sound identical to the original human voice.  There’s lots of talk these days about using AI Voices in videos instead of living, breathing humans.  But the cadence, subtleties, nuances, and inflections of humans are hard to duplicate and difficult to replicate.  A human voice has a distinct, unduplicable signature, and it knows when and how to inflect and modulate for any given emotion, phrase, or intonation.  It communicates and connects; it doesn’t just sound out a script.

AI voices are getting closer and closer. Nuance, Alexa, Siri, Google Voice, Bixby, etc.  They're all good, and they've put down roots, and they're integrated into everything we do.  And they're here to stay, like it or not.

Ultimately, however, can they ever replace humans?  Will they gain enough traction that we can no longer tell what's real from what's an imitation?  I can tell the difference between my mom's awesome molasses cookies...and Starbucks' imitations.

But what if artificial sounds so...human?

They’re creepy...and they're nothing in the long run

But - check out that link above again.  Really?  Did you hear the crappy voice quality?  Are you really quaking in your boots after hearing that???  I’ll give it to VocalID for trying, but as a voice talent, I’m not fazed or threatened in the slightest.  After all, these are demos of specific sentences, not real-world scripts.  Every actual phrase will need to be really, really worked on and refined each time it's put to use.  And the results will have to be good.

Unquestionably, convincingly, hard-to-distinguish - Good.

But no matter how you slice it, ultimately, the nuances of the human voice will trounce anything a synthetic voice can provide...because synthetic voices are the hollow chocolate Easter egg.  Yummy on the outside, not much there on the inside at all, and they leave you unsatisfied, like something's deeply missing (namely, that chocolatey caramel goodness you find in the NON-hollow eggs.  God bless you, Cadbury.)

[Image: Texas Instruments speech synthesizer]

Remember these?  When I was a kid, we had a Texas Instruments computer, and I thought we were living large.  It had this little box that you would plug into a module port on the side of the TI box, and it looked like a little Tupperware container with a flip-up lid.  It was a speech synthesizer.  We would type in “QWERTYUIOPL” and it would say “Qwertee-yoo-ee-opp-ull.”  We’d type in “So how do you like them apples?”  And it would say that with a continually rising inflection that sounded nothing like a human.  It was incredibly crude, with the rudimentary sound quality of a Speak & Spell.  It did the job, but it only served to amuse; it didn’t serve to connect.  In fact, it was useless.  In fact, it was creepy.

The hairs go up on my neck when I envision being sold something by a robot through a voiceover.  Ew.

But that’s what the human voice does: it connects.  It breathes.  How do you synthetically replicate breathing?  How do you conjure up a catch in the throat?  What type of syntax generates a vocal pattern that can convey:

  • Fear
  • Anger
  • Sadness
  • Hope
  • Illumination
  • Joy
  • Wonder
  • Awe
  • Excitement
  • Frenzy
  • Passion
  • Lust
  • Peace
  • Contentment
  • Thrill
  • Horror
  • Disgust
  • Rudeness
  • Sarcasm
  • Monstrosity
  • Inebriation
  • Accents
  • Whisper
  • Booming
  • Yelling
  • Screaming
  • Pain
  • Dying…
  • and Living?

The answer?  You can’t.

Be unafraid.  Be very unafraid.  Leave the hills alone.  You don’t need to run just yet.  Stay here and continue to be human.  The revolution will be televised…but by humans.

Sincerely,

Joshua Alexander,

Real Human

===

Need a voiceover?  Request a quote today or visit my Demo Reel.  Or subscribe to my blog.  Or other things.

Joshua Alexander

Seattle Voice Actor & Voiceover Talent for hire

joshua@voicetalentseattle.com

206.557.6690

www.saysomethingjosh.com

www.joshyface.com

www.joshygram.com

www.joshypin.com

www.joshylinked.com

www.joshytweet.com

www.joshyvids.com
