Alexa, grant me a wish!


Digital virtual assistants have come a long way

Principal UX Consultant Swetha Sethu Jones discusses some interesting scenarios and challenges from her personal experience of using Alexa over a six-month period.

As Amazon’s Alexa, Google Assistant and other digital virtual assistants show, we’ve come a long way since Apple first introduced us to Siri in October 2011. Voice interaction technology has reached a point where you can now converse with a device in a way that gives the illusion of talking to a real person. But this also raises people’s expectations that the exchange will behave like a real-life conversation.

Make like the Genie: what should non-human agents do?

People have preconceived ideas about what non-human agents should do. Some of these ideas come from fiction, such as Jarvis from Iron Man. I sometimes wish that Alexa was like the friendly Genie in Disney’s Aladdin. So what is it that people like about Jarvis and the Genie, and is there anything we can learn and apply as UX professionals?

Context of use, and challenges

I use my Amazon Echo every day: it’s made a lot of simple tasks easier. For instance, every night before going to sleep, I ask Alexa to ‘help me sleep’, and she does this by playing ambient sounds such as rainforest or ocean waves (through the Sleep Sounds skill).

‘Help me sleep’ is a good example of how Alexa can make everyday tasks easier, using a simple and naturally worded command. The trouble is, not all commands are this straightforward: typically you need to ask Alexa to open a particular skill (the equivalent of an app) and then use the specific commands/language set by that skill. This is where the experience can get a bit tricky!

UX designers and specialists beware

For UX designers and specialists, this neatly highlights why it’s vital to follow clear guidelines and principles when designing skills for a smart personal assistant. It’s important to note that for a voice-based interaction on a device like Amazon Echo, people won’t necessarily make the distinction between device interaction and skill interaction (unlike with phones, where the distinction between the phone’s operating system and specific apps is clearer).

As a result, any issues with your skill will not only be a problem for your brand, but also for the entire voice interaction ecosystem – because it makes the virtual assistant seem less helpful!

Below I’ve listed some challenges that I’ve experienced when using Alexa. First, I’ll talk about the problems with Alexa, Amazon’s intelligent assistant. Then I’ll focus on specific challenges with some of the skills I’ve used.

Alexa-specific challenges

  • Account setup doesn’t make sense: The Echo device is a shared device in the home, and the user accounts model in its current state doesn’t seem useful. Is there a need for this, and how could it be done properly? This is an example of trying to map existing personal device paradigms onto systems that are intended to be an invisible, shared part of the household.
  • Restricted account access to Spotify: Alexa only allows the primary account holder (which in our household is me) to connect their Spotify account to listen to music. This seems unnecessary and confusing, especially because my husband has the paid Spotify subscription! This is related to the previous point: smart assistants need to understand the complexities of shared households, whether they consist of families or flatmates.
  • Inflexible inputs and control (e.g. for shopping lists): I can only add one thing at a time to my shopping list. If I say “Alexa, please add tomatoes, kitchen towel, and eggs to my shopping list”, it adds all three as a single item on the list (or, if I pause for too long, it adds only the first item). It would be handy if Alexa could recognise pauses within a command and use them to divide the list. More frustratingly, I can’t remove anything from the list without using the companion app!
  • Alexa, are you listening? I’m often doing something else when I ask Alexa a question. So when I say ‘Alexa’, even though the device lights up, I want some audio feedback as well, because I may not be looking at the device. Perhaps Alexa could just make an affirmative noise, like a human would if I called out their name (this is one of the big challenges with voice interaction technologies, because humans can recognise context, and gauge the kind of verbal/non-verbal responses we need to provide).
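To make the pause idea from the shopping-list point more concrete, here is a minimal Python sketch. It assumes, hypothetically, that the speech recogniser gives a skill each word with start and end timestamps; `Word`, `split_on_pauses` and the 0.35-second threshold are illustrative inventions, not part of any Alexa API, and Alexa’s real behaviour may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical recogniser output: one word plus its timing in seconds."""
    text: str
    start: float
    end: float


def split_on_pauses(words: list[Word], min_pause: float = 0.35) -> list[str]:
    """Group recognised words into list items wherever the speaker paused.

    Assumption: a silence of at least `min_pause` seconds marks an item boundary.
    """
    items: list[list[str]] = []
    current: list[str] = []
    prev_end = None
    for w in words:
        token = w.text.strip(",")
        # A long enough gap after some words closes the current item.
        if prev_end is not None and w.start - prev_end >= min_pause and current:
            items.append(current)
            current = []
        prev_end = w.end
        # A leading "and" after a pause joins items; it is not part of one.
        if token.lower() == "and" and not current:
            continue
        current.append(token)
    if current:
        items.append(current)
    return [" ".join(item) for item in items]


spoken = [
    Word("tomatoes", 0.0, 0.5),
    Word("kitchen", 0.9, 1.2),
    Word("towel", 1.25, 1.6),
    Word("and", 2.0, 2.15),
    Word("eggs", 2.2, 2.6),
]
print(split_on_pauses(spoken))  # ['tomatoes', 'kitchen towel', 'eggs']
```

Splitting on pauses rather than on every “and” keeps an item such as “salt and vinegar crisps” intact when it is spoken in one breath; a real skill would still need to read the list back so the user can correct it.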

Skill-specific challenges

  • Missing context: When I use the National Rail skill on Alexa (where I’ve set my home and primary destination), it reads out a list of the next 3 trains – this reflects the information available, but not my needs. As I’m getting ready to go to work, I want to know when I need to leave the house. And I need to decide if I can actually catch that train, or if I have time to have breakfast!
  • Content is not location specific: As in the above example for train times, if the National Rail skill knew that I’m 10 minutes away from the train station, it could give me more relevant results. Perhaps users could choose whether or not to share their location with specific skills, just as we do for apps on smartphones.
    Unlike a smartphone (that follows us around), a home assistant has access to our home location and the members of our household. This creates an opportunity to further understand how this context can be used to provide more relevant content. However, it also raises potential challenges around getting consent from different household members.
  • In-app suggestions biased by shared use (e.g. Spotify): In my home we’ve linked my husband’s Spotify account to our Amazon Alexa, and because of our diverse music preferences, the music recommendations on his Spotify account are now a bit bizarre!
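The “when do I need to leave?” question from the National Rail example comes down to simple arithmetic once a skill knows the departures and the user’s travel time to the station. The sketch below is hypothetical: the function names, the fixed walking time and the two-minute buffer are assumptions for illustration, and this is not how the National Rail skill works.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_catchable_train(
    departures: list[datetime],
    now: datetime,
    walk_minutes: int,
    buffer_minutes: int = 2,
) -> Optional[tuple[datetime, datetime]]:
    """Return (departure, leave_by) for the first train the user can still reach."""
    # Assumed time from the front door to the platform, plus a small margin.
    travel = timedelta(minutes=walk_minutes + buffer_minutes)
    for departure in sorted(departures):
        leave_by = departure - travel
        if leave_by >= now:  # still possible to leave the house in time
            return departure, leave_by
    return None


def spoken_summary(result: Optional[tuple[datetime, datetime]], now: datetime) -> str:
    """Phrase the answer around the user's need (when to leave), not the timetable."""
    if result is None:
        return "There are no more trains you can catch in the listed departures."
    departure, leave_by = result
    spare = int((leave_by - now).total_seconds() // 60)
    return (f"Leave by {leave_by:%H:%M} to catch the {departure:%H:%M} train. "
            f"You have {spare} minutes to spare.")


now = datetime(2024, 1, 8, 7, 40)
departures = [datetime(2024, 1, 8, 7, 45), datetime(2024, 1, 8, 8, 2), datetime(2024, 1, 8, 8, 17)]
print(spoken_summary(next_catchable_train(departures, now, walk_minutes=10), now))
# Leave by 07:50 to catch the 08:02 train. You have 10 minutes to spare.
```

Leading with the leave-by time, rather than a list of the next three trains, answers the question the user is actually asking; the walking time would only be available if the user chose to share it.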

The above are examples of things that I’ve personally encountered in my use of Alexa. But there are other voice interaction possibilities on Alexa that I haven’t used. Non-use of technology is just as interesting as use, especially for a new domain and novel tech: it can help us understand the barriers to adoption! For instance, I’ve never bought anything using voice commands (yet!).

Alexa ‘purchase by voice’

When I set up my Alexa, one of the first things I did was to switch off ‘purchase by voice’. My main reason for this was my concern that purchasing by voice is almost too easy. Clarity and reassurance are needed on how purchases are authenticated.

We’re at a similar stage to when digital payments like PayPal or other mobile wallets were launched. For many of us in the UK, it’s likely second nature now to use PayPal or another mobile/digital wallet for payments. However, many people across the world are still getting used to online payments. For example, in India, e-commerce sites give users the option to pay cash on delivery, and similarly Uber in India lets riders pay cash after the ride or top up credit in the app.

So how is purchasing by voice useful, and how can you reassure users? As Alexa is a shared device, I might start a purchase journey but be interrupted by someone else – for instance, could they jump in and accidentally make the purchase? Is it child proof, accident proof, or drunken shopping spree proof?!

Designing for these devices

There are lots of guidelines available when designing for voice and conversations, developed by the key tech players in this domain. I’ve included links to some of these individual guidelines in the ‘More like this’ section on this page. I’ve pulled out some core principles that seem to be common when designing a voice interaction:

  1. Use natural language: write for how people talk, not how they read and write.
  2. Keep the interaction short: Respond quickly and take people directly to content.
  3. Allow for intuitive and clear interaction, by providing examples where needed.
  4. Provide feedback and acknowledgement: In conversations, people often ‘backchannel’ (e.g. making noises like “uh-huh”, “hmm” or even short words like “yeah”) to provide verbal assurance that they’re still listening.
  5. Identify and handle utterances gracefully: Handle partial commands and corrections from users as well, by asking relevant follow-up questions. People often don’t speak in full sentences, and tend to repeat and correct themselves! If you’ve never heard a recording of yourself in a casual conversation, try this out and you’ll be amazed at how fragmented normal conversations are.
  6. Prevent errors by expecting variations: This is related to the previous point. There are many variations in spoken English, and you need to plan for and test these.
  7. Fail gracefully: Sometimes people don’t understand each other either!
  8. Allow flexible interaction: For instance, let users replay information, skip information, or change instructions.
  9. Adapt the dialogue as needed, and be prepared to help at any time.
  10. Accommodate diverse speaking styles: It’s not just about accents and slang, it’s also about context. The way the same person uses the same skill might change throughout the day, and over time. Even within the UK, the same word can mean different things (e.g. which meal is dinner, supper or tea, and when is it eaten? Ask your colleagues and see if you get the same answer…).
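Principles 5 to 7 (handling fragmented utterances, expecting variations and failing gracefully) can be illustrated with a deliberately tiny matcher. This is a sketch only: the phrase table, the filler list and the function names are hypothetical, and a production skill would rely on its platform’s own language-understanding model and sample utterances rather than hand-written regular expressions.

```python
import re
from typing import Optional

# Hypothetical phrase table: several ways people might say the same thing.
INTENT_PHRASES = {
    "stop": ["stop", "cancel", "never mind", "forget it"],
    "repeat": ["repeat", "say that again", "what was that", "pardon"],
    "skip": ["skip", "next", "move on"],
}

# Hesitations and politeness words that carry no intent of their own.
FILLERS = re.compile(r"\b(um+|uh+|er+|hmm+|please)\b")


def normalise(utterance: str) -> str:
    """Lower-case, drop fillers and punctuation, and collapse whitespace."""
    text = FILLERS.sub(" ", utterance.lower())
    text = re.sub(r"[^a-z' ]", " ", text)
    return " ".join(text.split())


def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose phrase appears as whole words, else None."""
    text = normalise(utterance)
    for intent, phrases in INTENT_PHRASES.items():
        for phrase in phrases:
            if re.search(rf"\b{re.escape(phrase)}\b", text):
                return intent
    return None


def respond(utterance: str) -> str:
    intent = match_intent(utterance)
    if intent is None:
        # Fail gracefully: offer what the user can say instead of a bare error.
        return "Sorry, I didn't catch that. You can say 'repeat', 'skip' or 'stop'."
    return intent


print(respond("Um, could you say that again please?"))  # repeat
print(respond("Never mind"))                             # stop
print(respond("Play some jazz"))  # Sorry, I didn't catch that. You can say 'repeat', 'skip' or 'stop'.
```

The fallback response tells people what they can say rather than just reporting an error, which is one way to stop a misunderstanding from becoming a dead end.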

Designing a smart home skill

In this article I’ve discussed general skills for Amazon Alexa. If you have a smart device that can be connected to Alexa over the internet (e.g. Philips Hue devices that let people adjust the lights or turn them on and off), then you’ll need to design a smart home skill using the Smart Home Skill API (which I’ve not covered in this article).

The main difference is that this API uses a much more restricted set of commands, so that people can interact with smart home devices in a consistent way. This kind of shared command set across similar devices (or device families) is quite interesting, as it opens the door for universal interactions in voice commands.


Conclusion: Alexa, what does all this mean?

Unlike smartphone- and smartwatch-based digital assistants such as Siri, smart home devices such as Amazon Echo are by their very nature used at home, and it’s essential to bear this in mind when designing skills and voice interactions for them.

If you are thinking about creating a skill for a smart home device, think about the context of use, develop scripts and journeys for different flows, and follow the specific voice design and interaction guidelines. As with any service design, follow a user-centred design process: discover the user need(s) and challenge(s), design and iterate the interaction to meet the need, and then build and test it.

It’s particularly interesting to explore the needs and opportunities for a device that is used by multiple people in one shared space (as opposed to a smartphone that’s used by one person and follows them wherever they go).

At System Concepts, our expert UX consultants can help you test early concepts and flows, support you in further improving interaction flows, and test fully functional beta/live skills in real-life contexts.

Improve your approach to smart digital assistant design
Contact us to access unique UX insights and proven techniques

More like this

Amazon Alexa | Voice design guide

Amazon's guide to the process of thinking through the design of a voice experience...

List of design guidelines

A useful list of lists: as many guiding principles as we could find, all in one place. List curated by Ben Sauer at Clearleft...

Inclusion through UX

Our inclusion through UX event marking World Usability Day 2017 featured inspiring talks. Review key event resources and takeaways...