Voice UX: Ideas For A Voice App Quality Metrics Framework

February 23, 2017

Background

In the midst of the ongoing hype around emerging voice-based virtual assistants (including virtual advisors, virtual companions, etc.), one crucial aspect has been pushed to the back burner and largely overlooked by recent media analysis: the overall quality of the voice UX, and particularly the usability of the thousands of skills, actions and voice apps already released or being offered to the public. Compared with more traditional user interface paradigms such as CLI and GUI, this lack of attention to quality oddly suggests a naive belief that a voice user interface is, by itself, enough to automagically maximize the user experience of any given application or product.

We have all had our share of bitter experiences with clumsy IVR (Interactive Voice Response) systems. Some of us will recall that the frustration began to ease as service providers improved and adopted NLP (Natural Language Processing). First with Apple Siri, and then with the new breed of would-be CUI (Conversational User Interface) solutions from Google, Amazon, Samsung, Microsoft, etc., we discovered the actual and potential virtues of voice UX. However, the predictable hype from marketers on the one hand, and the genuine curiosity and easy enthusiasm of early adopters on the other, have generated a thick smokescreen that has mostly masked a creeping consumer dissatisfaction.

There are still few or no trustworthy analytics, released to the public and openly supported by the major market players, on how consumers actually use the thousands of Alexa skills, Google Home conversational actions, etc. However, both our usability testing and our field interviews in the Silicon Valley area offer a number of clues that point to widespread consumer frustration.

Our Goal

The main objective of this quick report is not to compile a list of consumer grievances but to suggest an initial framework of measurable criteria that allow a more detailed evaluation of skills, actions, and similar voice apps already released, or soon to be released, on emerging platforms.

The core of voice UX is making sure users find actual value in what is offered to them. Based on an adaptation of Peter Morville's renowned User Experience Honeycomb model, we can establish, and propose for wider discussion, a tentative voice app (v-app) quality metrics framework as follows:

Is the v-app useful? This question tries to establish whether the content is original and satisfies a user's genuine need. In other words, the v-app design should actually present some innovation in functionality versus other comparable products. It should enable the user to achieve practical goals in a better way compared to what the other existing solutions would allow.
Is the v-app usable? Here we try to measure the overall ease of use of a given v-app. We obviously assume that a v-app should work as claimed, without malfunctions.
Is the v-app desirable? This question relates to those properties of a design that are deemed to trigger positive emotions and enjoyment on the user's end. In other words, a user should like the way a given v-app works in comparison to the existing solutions.
Does the v-app offer easily navigable content? This is all about being intuitive and natural, that is, as close as possible to the user's spontaneous conversational expectations.
Is the v-app accessible to people with disabilities? This is also a crucial aspect of a v-app. It is about a design that enhances all the other qualities without steepening the learning curve for people with some level of physical and/or cognitive challenge.
Is the v-app credible? This question relates both to the content a given v-app offers and to the way that content is presented to the user. Credibility becomes enormously relevant when content is used to support decision making in critical domains such as health, diet, finances, legal issues, etc.

For all six measurement criteria above, we propose to adopt a standard rating scale from 1 (minimum) to 10 (maximum). Such a wide scale might appear to complicate the overall evaluation process. However, we think it is worth the effort because it allows us to better capture nuances that, down the road, could generate unexpected dynamics in user behavior.
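To make the rating scheme concrete, here is a minimal sketch of how a single evaluation could be recorded. The class name, field names and both helper methods are hypothetical, and the unweighted average is only one possible way to summarize the six ratings; the framework itself does not prescribe any aggregation.

```python
from dataclasses import dataclass, fields
from statistics import mean

# Rating scale proposed in the text: 1 (minimum) to 10 (maximum).
SCALE_MIN, SCALE_MAX = 1, 10


@dataclass(frozen=True)
class VAppScorecard:
    """One evaluator's ratings of a v-app on the six criteria (hypothetical names)."""
    useful: int
    usable: int
    desirable: int
    navigable: int
    accessible: int
    credible: int

    def __post_init__(self):
        # Reject any rating that falls outside the 1-10 scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(
                    f"{f.name} must be an integer from {SCALE_MIN} to {SCALE_MAX}, got {value!r}"
                )

    def overall(self) -> float:
        # Assumption: an unweighted mean; the framework does not define a summary score.
        return float(mean(getattr(self, f.name) for f in fields(self)))

    def weakest(self) -> str:
        # Name of the lowest-rated criterion (the first one listed wins ties).
        return min(fields(self), key=lambda f: getattr(self, f.name)).name


# Example ratings, invented purely for illustration.
card = VAppScorecard(useful=8, usable=6, desirable=7,
                     navigable=5, accessible=4, credible=9)
print(card.overall(), card.weakest())
```

Keeping the six ratings separate, rather than storing only a summary number, preserves the nuances the wide scale is meant to capture; an evaluator who considers credibility more important for, say, a health v-app could replace the plain mean with a weighted one.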

We invite all interested parties to weigh in and help test and refine this voice UX quality metrics framework (QMF) as much as possible. We hope and believe that such an initiative will help users, voice UX designers and developers to better approach this emerging human-machine relationship as it expands into more and more domains and their related products and services.
