With advances in the AI sector, new platforms have appeared that generate speech from entered text.

This is an opinion piece and does not claim to hold mathematical or absolute truth. We cannot know how AI will advance or what will happen in the coming years, but at vocesdecine.com, given our long experience in the industry, we are convinced, as of today, of what follows.

Can AI replace professional speakers? The answer is yes and no. Today, AI-generated speech in Spanish is still far from fully credible. Some transitions between letters are clearly audible and produce a noticeably robotic effect. Presumably these defects will be corrected over time.

What AI can achieve is a credible narration with a standard intonation and a standard type of voice.

The hardest thing to achieve with AI is the emotional nuance of storytelling. Human beings show emotions with their voices, and those emotions are conveyed through small nuances in speech. A kind voice differs from a cold voice by a small nuance. A good performance lets us know whether a person who cries is crying out of happiness, contained sadness, or heartbreaking pain. It also lets us know whether a person laughs out of fun, revenge, or nervousness. Narrating a children's mystery audiobook is not the same as narrating an encyclopedia. These are just a few examples of the nuances the human voice allows, and such nuances are practically infinite.

We doubt that a machine can be trained to generate them and to know when to use them.

This doubt rests on the following arguments:

  • The number of parameters that must be set is very large, and the work must be repeated for each voice for it to be credible.
  • We doubt the ability of those who will train the machines. Who will be willing to have their voice cloned and then stop working because a machine can do the job for them? We work with many top-notch voice actors. None of them is willing to train a machine to replace them. This leads us to think that the person who trains the machine will be a third-rate actor who is not in demand and has little to lose. Most likely, this actor will not be capable of delivering the nuances that give professional voiceover its creativity.
  • The catalog of voices that AI-generated voice platforms can offer is limited. Our clients greatly value being able to choose the voice for their projects based on timbre, age, the character of the voice...

In summary, AI-generated voices will affect the voice-over market in low-cost projects with little interpretive demand. For projects that require interpretive nuance, AI is not a viable alternative today.

If we transfer this idea to food, AI-generated voiceovers would be the equivalent of fast-food restaurants, and professional voiceovers made by real people would be the equivalent of traditional (not old-fashioned) restaurants, which offer a wide variety of food that retains all its flavor and nutritional value. You can eat at both, but as Alejandro Sanz would say: “It is not the same.”