Ultimate Guide to Digital Assistant Voice Interactions

A voice interaction happens every time a user speaks to a digital assistant and the assistant responds in some way. The response can be spoken, an action taken by the assistant, or a hand-off to a screen or other device. The content of the response is often handled natively by the device or by the assistant's built-in software. In other cases the assistant delegates to a third party to provide the content. To build an effective voice strategy for your business, you need to understand how digital assistants handle voice-initiated interactions. This guide gives an overview of the different types of interactions, how they occur, and which ones matter for your business.

What Are Voice-Initiated Interactions?

A voice-initiated interaction starts when a user makes a spoken request to a digital assistant like Siri, Alexa, or Google Assistant. Common requests are to ask a question, play music, or check the weather. The assistant then responds in a way that fits the request, such as answering the question, playing a song, or giving a weather report.

Voice-Only vs Multimodal Interactions

The type of response an assistant can give depends largely on the hardware the user is interacting with. Smart speakers like Amazon Echo and Google Home will always reply with a spoken response. These are known as "voice-only" devices because audio is their only input and output.

Assistants on mobile phones may reply with a visual response, a spoken response, or both. Many smartphones and smart displays support input and output via voice and screens. These are known as "multimodal" devices.
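
The difference between voice-only and multimodal devices can be illustrated with a minimal sketch. It is not taken from any assistant's real API: the `Device` and `Response` classes and the `build_response` function are hypothetical names used only to show how one answer might be shaped to fit the output channels a device has.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    """Hypothetical description of a device's output channels."""
    name: str
    has_speaker: bool
    has_screen: bool


@dataclass
class Response:
    speech: Optional[str] = None   # text to be spoken aloud
    display: Optional[str] = None  # text to be shown on a screen


def build_response(device: Device, answer: str) -> Response:
    """Shape one answer to fit the channels the device supports."""
    response = Response()
    if device.has_speaker:
        response.speech = answer
    if device.has_screen:
        response.display = answer
    return response


# A voice-only device and a multimodal device (illustrative values).
smart_speaker = Device("smart speaker", has_speaker=True, has_screen=False)
smartphone = Device("smartphone", has_speaker=True, has_screen=True)

print(build_response(smart_speaker, "It is 72 degrees and sunny."))
print(build_response(smartphone, "It is 72 degrees and sunny."))
```

In this sketch the smart speaker receives only a spoken reply, while the smartphone receives both a spoken and a visual one.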

Where Do Voice Responses Come From?

After a user initiates a voice interaction, the assistant must decide how to reply. Some responses rely on first party capabilities that are built into the device. Examples of first party capabilities are setting timers, checking the weather, and answering questions. The content of first party interactions is controlled completely by the device manufacturer.

If the assistant cannot handle the request natively, it may delegate the request to a third party service. Some phone-based assistants like Siri may do a web search and respond with a website link. Smart speaker devices like Echo and Google Home can try to delegate the request to a third party app. Google calls these apps Actions on Google and Amazon calls them Alexa Skills. Collectively they are referred to as "voice apps." The responses from a voice app are controlled completely by the app publisher, who decides which interactions to respond to and how to respond.
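
The decision order described above (built-in capability first, then a third party voice app, then a web search) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not how any real assistant is implemented: the `FIRST_PARTY` and `VOICE_APPS` tables, the keyword matching, and the `route` function are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical built-in (first party) capabilities, keyed by a trigger phrase.
FIRST_PARTY: Dict[str, Callable[[str], str]] = {
    "set a timer": lambda request: "Timer started.",
    "weather": lambda request: "Here is today's forecast.",
}

# Hypothetical third party voice apps, keyed by the name the user speaks.
VOICE_APPS: Dict[str, Callable[[str], str]] = {
    "tide": lambda request: "Response supplied by the Tide voice app.",
}


def route(request: str) -> str:
    """Pick who answers: the assistant itself, a voice app, or a web search."""
    text = request.lower()
    # 1. Try the capabilities built into the assistant.
    for trigger, handler in FIRST_PARTY.items():
        if trigger in text:
            return "first party: " + handler(text)
    # 2. Otherwise delegate to a voice app, if the user named one.
    for app_name, handler in VOICE_APPS.items():
        if f"ask {app_name}" in text or f"open {app_name}" in text:
            return "third party: " + handler(text)
    # 3. Fall back to a web search.
    return "fallback: web search for '" + request + "'"


print(route("Set a timer for 20 minutes"))
print(route("Ask Tide how do I remove a chocolate stain"))
print(route("Who won the game last night"))
```

Real assistants use far more sophisticated language understanding than substring checks; the point here is only the order in which the sources of a response are considered.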

Understanding how first party and third party interactions are related is key to developing an effective voice strategy. In the next sections we discuss each in more detail.

First Party Response

First party interactions are functionality built into an assistant or smart speaker. According to the January 2019 Voicebot Smart Speaker Consumer Adoption Report, four of the top five use cases for smart speakers are first-party services: asking a question, checking the weather, setting an alarm, and setting a timer. Because these are the top use cases, a good voice strategy must include ways to get your content delivered through these first party interactions. Here are some examples of interactions that the device handles natively.

"Alexa, what's the weather forecast?"

"Hey Google, set a timer for 20 minutes."

"Hey Siri, send a text message."

"Hey Google, how much should I water my orchid?"

Third Party Response

With third party interactions, the responses are controlled not by the device manufacturer, but by an external service. This includes third party entertainment like music and news, games, and voice apps. Voice apps are also known as "skills" or "actions." They extend the features of a smart speaker, much like installing an app on a smartphone. The third party category accounts for six of the top ten smart speaker use cases.

A characteristic of a third party interaction is that the user must name the service or app in order for the assistant to route the request properly. An example of this is the Alexa "Tide" skill, which can answer laundry questions. Alexa may not have a great answer if you ask her directly, "Alexa, how do I get out a chocolate stain?" But if you say, "Alexa, ask Tide how do I remove a chocolate stain?" you will get a detailed response directly from the Tide voice app. Here are some additional examples of interactions that will be handled by third party services.

"Alexa, play Spotify."

"Hey Google, ask Sound Check what's new."

"Alexa, open Sixty Second Docs."

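The requirement that the user name the service can be illustrated with a small parsing sketch. It assumes a simplified request shape of wake phrase, launch verb ("ask", "open", or "play"), app name, and an optional remaining request. The `KNOWN_APPS` set, the `PATTERN` expression, and the `parse_invocation` function are hypothetical and do not reflect how Alexa or Google Assistant actually parse speech.

```python
import re
from typing import Optional, Tuple

# Hypothetical set of voice app names the assistant knows about.
KNOWN_APPS = {"tide", "spotify", "sound check", "sixty second docs"}

# Wake phrase, then a launch verb, then the rest of the request.
PATTERN = re.compile(
    r"^(?:alexa|hey google|hey siri),?\s+(ask|open|play)\s+(.+?)[.?!]?$",
    re.IGNORECASE,
)


def parse_invocation(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (app name, remaining request), or None if no known app is named."""
    match = PATTERN.match(utterance.strip())
    if match is None:
        return None
    rest = match.group(2)
    # Check longer names first so multi-word names are matched whole.
    for name in sorted(KNOWN_APPS, key=len, reverse=True):
        if rest.lower() == name:
            return name, ""
        if rest.lower().startswith(name + " "):
            return name, rest[len(name):].strip()
    return None


print(parse_invocation("Alexa, ask Tide how do I remove a chocolate stain?"))
print(parse_invocation("Hey Google, ask Sound Check what's new."))
print(parse_invocation("Alexa, how do I get out a chocolate stain?"))
```

The first two requests name an app, so they can be routed to it along with the rest of the request. The third names no app, so the sketch returns `None` and the request would stay with the assistant.
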
How Is This Relevant To Your Business?

A business's voice-first strategy needs to incorporate both first party and third party interactions. Many voice services focus on building conversational voice apps. Just as some companies need a native smartphone app, some companies also need a dedicated voice app. With Soundcheck you can build and publish simple voice apps to Amazon Alexa and Google Home.

But interacting with a voice app is only the seventh most-used feature on a smart speaker. The most common use case is simply asking a question. For most businesses this avenue is the best way to get content delivered over voice. A generic question would be similar to the final first party example above, "Hey Google, how much should I water my orchid?" Although the user interacts directly with the voice assistant, the content of the answer frequently comes from the web. Soundcheck also lets you publish web content that is optimized for voice, making it more likely to get picked up by first party interactions.