“AI voice actors sound more human than ever”


Viewing 9 posts - 1 through 9 (of 9 total)
    Steve Naganuma

    Could this be the next step to further automate radio?


    Scott Young

    I thought VO was undervalued enough already without the need to bypass the human being altogether. I know plenty of very talented on-the-beach radio folks who are far from getting rich in the freelance VO marketplace.

    Steve Naganuma

    I don’t want to see the further loss of jobs in the broadcast industry. Reading the article and listening to some of the examples made me wonder about how this technology might continue to mature and be used in the future.

    Scott Young

    I guess this sort of thing has been happening in broadcasting ever since recorded music put live orchestras out of work. One thing for sure is the technology will mature very rapidly. I’ve seen it over the last few years with spectral editing that can isolate vocals and individual instruments to create very credible stereo mixes from mono sources. Not to mention turning awful stereo mixes into excellent ones. The advances are measured in months, not years. All these things are fascinating to observe, if sometimes disheartening.


    A few months ago I kept seeing FB ads for buy-outs for 78 bucks or so for 5-7 voices to use…heard the demo…still could hear some artifacts…but almost perfect…don’t know if I could get the owner of the station I’m producing for to go for it.


    This brought back memories of the “Jelli” radio stations. A few years ago, I heard one of these radio stations on the air. A synthesized female voice announced the song that was about to play. Although the voice sounded good for a computer-generated voice, it still did not sound like a person; the inflection had an excessively monotone quality about it.

    I can see how this enhanced speech synthesis technology might find uses in advertising or even in Travelers’ Information radio or NOAA weather radio (to make the messages more intelligible than more traditional “robotic” synthesized voices). However, I am wondering whether it would gain much traction in the world of music radio, given the widespread adoption of text display of song/artist information, either by RDS, HD radio, or Internet streaming players. I can picture consultants pulling up data showing that having a voice interrupting the music to announce songs encourages listeners to tune out.


    There is a lot of controversy around the most recent Anthony Bourdain biography film.

    They used voice cloning to present letters he wrote in his voice rather than that of a narrator. They also used it a touch here and there elsewhere in the film and are not being specific about where, so other than the letters, it’s so far unclear where the voice is real.

    Check these out:


    For now, that’s real people. A lot of work is being done to eliminate them.


    If I were to place bets, right now someone, somewhere is training an AI on the various forms of broadcast speech.

    Break this down into a few rough archetypes, and the voices will be familiar enough to pass. Bonus for making sure there are cool beats / ambient sounds present to help the listener through the experience. That’s gonna happen.

    As for source material?

    Text, or what needs to be said, what could be said, is the same problem. The archetypes won’t be people as much as they will be content form and purpose, then drill down by region. Think “Tiegard” instead of “Tigard.” (That one really grated and I still remember it.)

    Having followed the voice systems closely, what they don’t talk about too much is the voice cloning. One does not need a ton of source material to get a convincing clone. A full range of emotional connotation and inflection takes considerably more though. It’s quite possible, right now, to dub in short phrases, say for a ratings problem, a botched line, or a change of mind after the shoot. And the facial movements ride along and are convincing in short chunks too.

    Arnold could be saying, “WHILE FAT TACK” as easily as “I’LL BE BACK”, and you can bet some clown somewhere is going to hit a home run with that tech. Then end up in court.

    Hey kids! It’s been a while. I rarely listen to radio, unless I travel. Haven’t done much of that, so…

    But, I have to say this:



    All those conversations we had about connecting with people, mind to mind, people being able to pick up on the mind on the other end, and/or how diluting that mind, passing it through rules, spreading it out, and such reduces the value, the potency of the connection, and what that means for radio as a medium ARE BANG ON CORRECT.

    And everyone here knows that. Nothing new, right?

    Well, here we are a decade or two beyond the breaking point, and it’s all stale, nothing like what we know is possible. We know why, how, all of it.

    Now, here’s the kicker:

    Because it’s been made stale, the problem space needed to “solve” this and have computer generated radio is an order of magnitude smaller than it was prior.

    Lowering all those expectations to save a buck, reduce the value of talent, and, and, and, means people largely won’t have the identification with the medium necessary to either detect this is happening or give a shit if / when they do.

    I predict they will detect it, not due to the voice quality. To me, that’s almost solved. Crazy good. No, they will detect it when the people who do this do the same thing everyone is doing, and that is to save that last buck, max out the unbridled greed, and botch it with a great voice spewing nonsense.

    I also bet the first uses will be quick autogenerated ADS and what I’ll call “parametric” branding and imaging.

    Basically, have someone good generate one, say KBOT, the AI voice of the future heard today.

    Get the music bits, branding phrases that pay, all that crap done. Run it through a data corpus of prior efforts and have it spit out a few different ones using a set of audio images packaged together to work together.

    Once that’s done, pitch ’em by region, “customized” based on “research” and get about what one gets today on much shorter timelines and cost.

    Gonna be brutal!


    On the flipside, maybe, just maybe this is the kind of ultimate bottom needed for people to seek options to compete.

    And that will be either by expanding what can be aired, shock value style, and/or with people that other people can actually connect to.

    Tired of KBOT and the endless shuffle? Say hello to… and maybe the cycle closes in on itself and we get some good radio before we all tip over.


    I watched Roadrunner (the Anthony Bourdain documentary) this weekend, and I was left wondering about the scenes in which Bourdain narrates some of his thoughts. The audio quality was not something that he could have achieved by dictating notes for one of his books using a consumer-grade cassette recorder or something that could have come out of a home video. Because of the way that these scenes were interwoven with other types of footage, I made the assumption that I was hearing audio from pieces of TV show episodes that were cut. I was fooled, but I think that was the intent of the documentary’s producers. I would have never guessed that computer software helped to create the narrative segments.
