The Art of Building a Voice Assistant

Dec 02, 2020

Like many leaders in the industry, Ananya Sharan didn’t start her career in voice technology. After working as a software engineer and then attending business school to become a banker, Ananya realized that her true passion was technology and product management. Today, Ananya is the product manager for Pandora’s Voice Mode, an innovative, mobile-only voice assistant that helps users discover and listen to new music with ease.

What is Pandora? 

Pandora is the largest audio-streaming platform in the US. The company is best known today for providing users with personalized audio recommendations. Pandora is able to accomplish this thanks to the Music Genome Project, which annotates each song track in a catalogue. Music experts also listen to the tracks to characterize them and give them multiple attributes. This data is then leveraged to understand what should be played next or recommended for users. 
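To make the idea of attribute-based recommendation concrete, here is a minimal sketch in Python. It is not Pandora's actual method: the track names, attribute names, and scores are all hypothetical, and cosine similarity is simply one common way to compare attribute profiles.

```python
from math import sqrt

# Hypothetical attribute scores (0.0 to 1.0) per track.
# These are illustrative values, not real Music Genome Project data.
TRACKS = {
    "track_a": {"acoustic": 0.9, "tempo": 0.3, "vocals": 0.8},
    "track_b": {"acoustic": 0.8, "tempo": 0.4, "vocals": 0.7},
    "track_c": {"acoustic": 0.1, "tempo": 0.9, "vocals": 0.2},
}


def cosine_similarity(a, b):
    """Compare two attribute profiles; missing attributes count as 0."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_next(current_id, tracks):
    """Return the id of the track most similar to the current one."""
    current = tracks[current_id]
    candidates = [
        (cosine_similarity(current, attrs), track_id)
        for track_id, attrs in tracks.items()
        if track_id != current_id
    ]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(recommend_next("track_a", TRACKS))
```

In this toy catalogue, the track whose attribute profile points in nearly the same direction as the current track is chosen next. A production system would likely combine many more attributes with listening history.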

The inspiration behind Voice Mode

Ananya explains that Voice Mode came about through a combination of internal employees’ ideas and the increased adoption of smart speakers. Although many users were already logged into their Pandora accounts through Google Home or Alexa, there was an additional challenge:  

“On these connected devices and smart speakers, Pandora cannot really leverage its wealth of data science and knowledge about the listeners' personal tastes to deliver personalized music that is tailored to the unique case of each listener.”

To address this issue, Pandora’s team decided to design a product that could showcase the company’s strength in data science while also providing users with a natural, conversational experience as they explored and consumed audio content on their phones. The resulting product was Voice Mode, which is available on Android and iOS.

What makes Voice Mode unique?

Before Voice Mode came onto the market, users could still request that specific songs be played on smart devices. However, Pandora’s creation is unique in its ability to work in ambiguous situations and predict what users are most likely to enjoy listening to. Ananya uses the examples of cooking or driving, which are both contexts that require hands-free control of a listening device.

“On the phone, when you say, ‘Hey Pandora, play me something for cooking,’ we know what you like to listen to; we know that you like Italian cooking radio ... so we play something like that, instead of just picking a random playlist or a station.”

Building a voice assistant as a team

As Voice Mode’s product manager, Ananya works with a team of engineers, scientists, UX designers, researchers, and others to advocate for users’ needs.

“We're building a lot of things that don't have well established industry standards … so we really need to think ahead of what the user might want out of this experience.”

Ananya views herself as a representative for “the work of many” who have invested months or years into the product. Pandora’s team is so involved that Voice Mode was first tested internally to ensure that a “benchmark of quality” was met.

Overcoming challenges in voice

To ensure the highest degree of accuracy, Voice Mode had to be created on a milestone-driven schedule, rather than by a traditional timeline. Accuracy is crucial to voice technology, as users can easily dismiss a product if they are misunderstood and have a bad experience. 

Ananya sees the recognition of varied user accents, individual voices, and unusual names in the industry (such as the artist XXXTentacion, whose name may be spoken as “X-X-X Tentacion” or “Triple X Tentacion”) as challenges that AI is still tackling. However, Voice Mode has made strides in other areas. Ananya uses the example of Lil Nas X’s record-breaking 2019 song, “Old Town Road.” Users who knew the song only by its catchiest lyrics, such as “I got the horses in the back,” were in luck, because Voice Mode has a lyric search mode.

Meeting user needs like these is one of the many rewards of working in voice technology. It also shows the potential to bring more human elements to AI. After all, doesn’t everyone forget song names sometimes?
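As a rough illustration of how a lyric search might work, the Python sketch below matches the words in a spoken request against a small lyric index. It is an assumption-laden toy, not Voice Mode's implementation: the index, the `search_by_lyric` function, and the `min_overlap` threshold are all hypothetical, and only the “Old Town Road” line comes from the example above.

```python
import re

# Hypothetical mini-index mapping song titles to a lyric snippet.
# The first snippet is the line quoted in the article; the other
# two entries are invented placeholders.
LYRIC_INDEX = {
    "Old Town Road": "I got the horses in the back",
    "Placeholder Song A": "dancing under neon city lights",
    "Placeholder Song B": "the rain keeps falling on my window",
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def search_by_lyric(query, index, min_overlap=0.5):
    """Return the title whose lyric shares the most words with the query.

    The score is the fraction of query words found in the lyric.
    Returns None if no title reaches min_overlap.
    """
    query_tokens = tokenize(query)
    if not query_tokens:
        return None
    best_title, best_score = None, 0.0
    for title, lyric in index.items():
        score = len(query_tokens & tokenize(lyric)) / len(query_tokens)
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= min_overlap else None


if __name__ == "__main__":
    print(search_by_lyric("horses in the back", LYRIC_INDEX))
```

Word overlap is deliberately simple. A real system would also need to cope with speech recognition errors and misremembered lyrics, for example through fuzzy or phonetic matching.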

Creating strong voice experiences

For those embarking on projects similar to Voice Mode, Ananya advises that every voice experience should be consistent with the brand’s story and values. Ultimately, the focus should be on “real users and real use cases,” and designers should always assume that a request can be made in multiple ways.

“Unlocking these use cases and moments that are so suited for voice discovery, and doing a really amazing job for the user … these are the foundational things that any voice technology has to get right.” 
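One way to picture the assumption that a request can be made in multiple ways is a small alias table that maps several spoken forms to a single canonical name. The Python sketch below is purely illustrative: the `ARTIST_ALIASES` table and both helper functions are hypothetical, and the spoken variants are the ones mentioned earlier in this article.

```python
import re

# Hypothetical alias table: several spoken forms map to one canonical name.
ARTIST_ALIASES = {
    "x x x tentacion": "XXXTentacion",
    "triple x tentacion": "XXXTentacion",
    "xxxtentacion": "XXXTentacion",
}


def normalize(utterance):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", utterance.lower())
    return " ".join(cleaned.split())


def resolve_artist(utterance):
    """Return the canonical artist name, or None if the form is unknown."""
    return ARTIST_ALIASES.get(normalize(utterance))


if __name__ == "__main__":
    for spoken in ("X-X-X Tentacion", "Triple X Tentacion"):
        print(spoken, "->", resolve_artist(spoken))
```

A hand-written table like this only covers the variants someone thought to list, which is part of why name recognition remains a hard problem for voice products.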

