The particular hype around the Amazon and Google voice gadgets has given marketers, along with a number of self-styled specialists and voice-UX advocates, an unexpected spotlight in which to try making sense of this emerging consumer market segment. So far, only a few usage surveys have surfaced, and their methodology is unknown, so they have little or no value for any serious analysis. Even the statistics on the "actual" number of units sold by each vendor cannot be used as an objective measure, given that their ad hoc leaks follow consumer-market arousal tactics rather than aim at better public knowledge. Additionally, the absence of a generally accepted conceptual framework for this specific field makes it difficult to read and interpret the ongoing trends correctly. Nor do the promotional technology articles published almost daily add real insight, given that they are often drafted by clueless staff writers. The end result is the lack of a comprehensive analytical perspective, a much-needed big picture.
While awaiting more systematic field studies, we can try to organize the current fragments of information by elaborating a few provisional but useful concepts. We assume that the reader is sufficiently familiar with the basic terminology of Amazon Alexa and Google Home.
In our view, a good starting point is the "skills," as they are called in Amazon's platform jargon. According to the official sources, as of this writing, "skills" can be defined as the platform's expandable task-oriented capabilities that allow users to interact with Alexa-enabled devices in a more intuitive way by using voice. Currently, Alexa's feedback is mainly audio; it only partially supplements the information with visual cards displayed inside its companion mobile app.
If we run an extensive analysis of the skills, we can observe essentially three high-level categories of voice-based user experience, which in a number of cases overlap:
- (CVC) skills with added improvement through Natural Language Processing (NLP).
- Interactive Custom Radio (ICR) skills, which include audio pointcasts of entertainment, news, economy, culture, education, sports, games, and other similar topics.
- Ambient Intelligence Gateway (AIG) skills, which include all the capabilities that allow users to interact with the physical environment of their home.
The Interactive Custom Radio (ICR) category is not only the quantitatively predominant one among the skills (it probably forms more than 90% of the current skills corpus); it is also the only one that has displayed, and is still displaying, a very fast growth rate. Such performance seems consistent with Amazon's marketing goals of flooding the market segment, feeding the media hype and possibly wreaking havoc on any emerging competition. Moreover, we may also consider other driving factors on both the developers' and the consumers' end. Since this is an emerging platform, many developers understandably tend to focus on skills that are easier to design and implement, both technically and in terms of available input feeds. All this also helps to significantly shorten time to market (TTM). On the users' side, the ICR skills often gain rapid acceptance because they reconnect with consumers' decades-long habit of listening to traditional radio broadcasts.
The truly original and most interesting category, with tremendous future possibilities, is the Ambient Intelligence Gateway (AIG) class of skills. These skills are opening the way towards a Sentient Environment, where sensing technologies, data processing and supporting middleware fuse to generate and maintain a representation of physical space in terms of a world model, allowing shared perception between computing devices and persons. To better contextualize the AIG category of capabilities, let's imagine, from the bottom up, the four abstract layers of any Sentient Environment:
In their current release, the AIG capabilities are almost exclusively limited to voice commands, either direct or conditionally triggered through external services. Compared to the old-fashioned Interactive Voice Response (IVR) model, the present use of a Natural Language User Interface (NLUI) offers increasing linguistic flexibility. However, we are still substantially in a voice-based equivalent of the decades-old Command Line Interface (CLI) phase of computing. Nevertheless, the most important intrinsic property of the voice user experience, namely invisibility, helps create a context somewhat similar to what the specialized literature calls a Natural User Interface (NUI). The latter induces the feeling of an acquired "Shamanic" empowerment.
The AIG skills still have a long way to go before they mature and transform enough to merge into a (UAL), that is, a maturing Mediated Reality (MR) ecosystem, and "dissolve" completely into the context of people's daily life. Additionally, we still have to see how these skills will integrate with other developing interactivity models, such as Gesture-Control. The future of the unfolding Sentient Environment, particularly its embedded intelligence layer, appears definitely promising and wide open to exciting new developments. The current voice interactivity feature is only the first step in a long march on a rocky road full of hills and cliffs.