Show Notes
Natural Sounding Synthetic Voices
The Future is Spoken presents Rupal Patel as this week’s guest. Rupal is the Founder and CEO of VocaliD, a voice AI company that creates unique synthetic voices. Unlike conventional methods, VocaliD's award-winning technology generates high-quality, natural-sounding voices within hours, not months. The company leverages cutting-edge machine learning techniques, proprietary voice blending algorithms, and its crowdsourced Voicebank dataset to enable brands and individuals to be heard in a voice that is uniquely theirs. VocaliD is a spin-out from her research lab at Northeastern University, where she is a tenured professor in the Department of Communication Science and Disorders and the Khoury College of Computer Sciences.
Starting with their own experiences, Rupal and the host go on to discuss synthetic voices.
Tune in now!
Conversation Highlights:
[00:23] The journey to synthetic voices
- Rupal works on making customized synthetic voices for individuals as well as for companies. VocaliD began with a mission to create voices for people who couldn't speak.
- She also explains how the world of voice is booming right now.
- Their goal is to use Voice as a social connector.
[03:32] Identifying the problem
- Rupal explains the reason behind creating VocaliD. She describes the problems she identified while doing research with people who have speech impairments.
- People with limited speech capabilities can still control the prosody of their voice.
- What does it take to create a natural-sounding voice?
- You can train a voice with any amount of data.
[11:47] Tuning the prompt according to your need.
- Rupal speaks about the different ways to tune the prompt for pitch, speed, or even tone. End-to-end synthesis methods allow speech to be controlled in different ways.
- They have also started implementing a new method for making changes at the word level. Rupal is also excited about some of the style modifications.
- Are people open to having different voices from the same assistant?
- Personalization always wins!
[20:09] The importance of Natural Sounding Voice
- She explains that we now consume information through our ears in almost every setting. With so much content delivered by audio, a natural-sounding voice is essential.
[22:18] What secret skill do you need to enter the text-to-speech world?
- She touches on the skills you need to enter the world of speech and design natural-sounding voices.
- Linguistics is becoming the heart of Voice.
[26:11] Research is an essential part of everything.
- Rupal explains that, apart from running experiments on building voices and making them sound more natural, they also run listening perception experiments to understand how consumers' voice preferences differ.
- She also touches on how they ensure that the quality remains up to par and the operating system's role in amplifying the quality.
[31:37] The question of Privacy?
[37:24] How can someone design a text-to-speech engine for their use?
[40:36] How is VocaliD different from others?
- VocaliD focuses on customized voices, as opposed to the specific voice libraries that other companies offer.
- Machine learning can get you 90% of the way, but you will need an understanding of speech to cover that last mile.
- How do you get to those emotive levels?
[46:40] Must Listen
- Rupal's piece of advice for someone trying to get into the world of Voice.
Special Reminder:
Celebrate The Diversity of Human Voices! Will You Share your Voice?
Join others from around the world in sharing the gift of Voice. Register today.
Learn more about Rupal at
If you enjoyed this episode of The Future is Spoken Podcast, then make sure to subscribe to our podcast.
Follow Shyamala Prayaga at @sprayaga