Tuesday, August 15, 2017

Amazon Alexa Beyond Echomania: Jumping The Fences Of The Current Constraints


A few days ago Amazon announced the Alexa platform's support for notifications. That is indeed a positive step forward and adds to Alexa's proactiveness. Proactiveness, interestingly, was one of the main points raised in a number of private comments on my last post, published [here] and [here].
According to my readers, the Alexa virtual assistant of the fictional scenarios I referred to was a version far too advanced compared to today's actual Amazon Alexa.
Notifications are indeed a positive step forward and a sign of more and better proactiveness from Alexa.
That's absolutely true. My imaginary interactions (designed for that article's specific purposes) were obviously based on an Alexa platform concept pretty far from what we have today. My fictional Alexa was geared more towards an AGI (Artificial General Intelligence) concept, while today's Alexa is still based on a model just a few, although meaningful, steps past the script-based IVR (Interactive Voice Response) paradigm, with technological roots in the 1970s.
No doubt the current Natural Language Understanding (NLU) is an impressive capability layered on top of ASR (Automatic Speech Recognition), but it does not add much to Alexa's intelligence as a virtual agent; nor does Alexa's excellent human-like, high-definition voice.
Natural Language Understanding is an impressive new capability, but it does not add much to Alexa's intelligence as a virtual agent.
Today's Alexa is essentially a centrally managed collection of ad-hoc, isolated skills, not far from an IVR model with branching logic. Cross-domain real-time capabilities, for instance, are very limited, and context awareness is extremely basic, if not totally nonexistent. The new notification feature is an improvement, but in the absence of other crucial capabilities the whole platform remains inherently hostage to a "parallel silos" logic, while the centrally managed approach becomes a choke point instead of acting as the expected inter-skill facilitator/communicator: Alexa knows about skills A, B, and C, while skill A doesn't have a clue about skills B and C, and so on. Here are a few sample scenarios to better clarify my point:
User: Alexa, ask my InstaBanker about my current balance.
Alexa: Sure, the current balance on your account is $275.
User: Alexa, ask PG&E how much my energy bill is for this month.
Alexa: You got it! PG&E confirms it's $365.
User: Alexa, is there enough money in my account to pay the bill?
Alexa: Sorry, I don't understand your question. 
[----Another common example-----]
User: Alexa, play classical music from Prime.
Alexa: Here is a station for classical music: Classical Focus.
[-------Music starts streaming------]
User: Alexa, news briefing, please.
Alexa: Here is your news briefing.
[---------News starts streaming------]
User: Alexa, stop the briefing. Please resume music.
Alexa: I can change playback music only when music is playing.
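The "parallel silos" failure in the first dialog can be sketched in a few lines of Python. This is a purely illustrative model, not the actual Alexa Skills Kit API: each skill handler keeps its answers to itself, and the central router that knows all the skills shares no context between them.

```python
# Hypothetical sketch of the "parallel silos" problem: every skill is an
# isolated handler with its own state, and the central router that invokes
# them passes no shared context across skills. All names are illustrative.

class BankingSkill:
    def handle(self, intent):
        if intent == "GetBalance":
            # The balance value never leaves this skill's silo.
            return "The current balance on your account is $275."

class UtilitySkill:
    def handle(self, intent):
        if intent == "GetBill":
            return "PG&E confirms it's $365."

class CentralRouter:
    """Knows about every skill, but shares nothing among them."""
    def __init__(self):
        self.skills = {"InstaBanker": BankingSkill(), "PGE": UtilitySkill()}

    def ask(self, skill_name, intent):
        return self.skills[skill_name].handle(intent)

    def ask_cross_domain(self, question):
        # No shared context: the balance and the bill never meet, so the
        # question "is there enough money to pay the bill?" is unanswerable.
        return "Sorry, I don't understand your question."

alexa = CentralRouter()
print(alexa.ask("InstaBanker", "GetBalance"))
print(alexa.ask("PGE", "GetBill"))
print(alexa.ask_cross_domain("Is there enough money to pay the bill?"))
```

The router can reach each silo individually, yet any question spanning two silos falls straight through to the generic failure response, exactly as in the dialog above.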

Towards Future Alexa

The fictional Alexa of my last post is instead a smart, real-time-spawned agent capable of crawling an ever-expanding, distributed, cross-domain knowledge-sphere while delivering edge services to end users. A "knowledge-sphere" could be imagined as a mix of a Wolfram|Alpha-type computational knowledge engine and a standardized network of knowledge nodes (which we can keep calling "skills") that can be independently created by third parties and made available via a virtual marketplace, based on some kind of implicit contracting (e.g. through Amazon Prime) or on-demand service purchase (e.g. through an app store). All that means an Alexa that can navigate and use, on an as-needed basis, both an ever-growing repository of semantically structured Linked Data and output generated instantly by doing computations from an internal knowledge base.
My fictional Alexa is also capable of taking advantage of on-device processing to allow, for instance, for privacy by design, with an approach similar to what is currently offered by the French company Snips.ai.
Now, let's get off the imagination train and smell the morning coffee: yes, we are still very far from this fictional scenario. Yet having a strategic perspective helps a lot to think (and un-think) things while we try to push today's Alexa platform beyond and above itself. In the meantime, let's imagine Alexa as the mythological phoenix and whisper with William Shakespeare:
“Nor shall this peace sleep with her; but as when
The bird of wonder dies, the maiden phoenix,
Her ashes new-create another heir
As great in admiration as herself.”

Voice-First Discreet Charm: Doing More With Less!


After several months of 24/7 peddling of the virtues of voice-first devices such as Amazon Echo and Google Home as the ultimate anticipation of an upcoming golden age of user experience, many marketers and self-styled voice-first forecasters suddenly find themselves in an embarrassing situation of confusion and uncomfortable backpedaling. Some major players, Amazon and reportedly Apple among them, are either releasing or allegedly planning to release a new breed of (former?) voice-first devices with tablet-sized screens. The fundamental explanation is the dramatic discovery that voice alone is unable to do the trick for users far too accustomed to relying on visual cues to fulfill their intents. Does that mean we are back to normal, the voice-first urban legend already shattered, and the "Shamanic" empowerment vacating the front seat? Time will tell.
Is the voice-first urban legend already shattered? Is the user's "Shamanic" empowerment vacating the front seat?
As I explained several months ago [here], [here], and [here]: "The current voice interactivity feature is only the first step in a long march on a rocky road full of hills and cliffs." Therefore, I am not at all surprised by the current partial U-turn, and I actually expect to see even more zigzagging over the coming months and years from almost every major player in this specific technology field. For our purposes, what really matters is the adoption of an explicit, value-based strategic perspective beyond the trivial and inevitably biased marketing rhetoric.
It is important to adopt an explicit value-based strategic perspective beyond the trivial and inevitably biased marketing rhetoric. 

Strategic Vision

The strategy I propose is the vision of an ambient technology that simplifies life, making it easy and convenient by both maximizing utility and improving experience, thanks to a growing disentanglement from our surroundings that minimizes the need for external input. Such an outlook offers a reference framework for establishing a metrics toolset that could be used to evaluate each new product, service, or feature and its design. Only a strategic view allows us to anticipate with the needed confidence whether a new and apparently disruptive product or service will survive and thrive beyond lighthouse-customer acquisition.
A strategic view would allow anticipating with confidence whether a new and disruptive product or service will survive and thrive beyond lighthouse-customer acquisition.
Generally speaking, Natural Language Understanding is definitely among the decisive components of a futuristic ambient technology as described above. The same can be asserted about motion, gesture control, and self-adapting sentient environments overall. Could visual mediators, namely device screens, special glasses, or electronic contact lenses, participate in the same smart ambient and integrate with voice, motion, and gesture? No doubt they could, and actually they should, regardless of the voice-first apocalyptic visionaries' dismay.
In my opinion, however, a relevant issue, still open to creative solutions, is how we might integrate these components so as to dampen the disruptive constraints of discrete time, physical spaces, and events, and reinforce the continuity of the user experience. In other words, this is about possible ways of using technology to build a ubiquitous access layer (very similar to what in computing is called an "abstraction layer") between the user and the surrounding ambient volatility and complexity.
The relevant issue, open to creative solutions, is how we integrate technologies so as to reinforce the continuity of the user experience beyond and above the ambient volatility and complexity.
To better explain my point, I have created a few imaginary advanced Alexa conversations:
User: Alexa, how is the traffic?
Alexa: The fastest route, via I-80 and CA-24, takes about 38 minutes. Would you like to see the navigation map?
User: Yes, please.
Alexa: Should I open the map on your mobile phone or your tv screen?
User: On my tv screen in the living room.
Alexa: Ok, here you go.
In this first example, Alexa uses a familiar, already existing, and available ambient resource (the tv or the mobile phone) to complete a task. This way Alexa easily avoids the disruptive "parallel silos" effect of using its own dedicated display attached to some version of Echo.
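The routing decision in this first example can be sketched as a tiny abstraction layer over the ambient's displays. Everything here, class names, device names, the fallback rule, is a hypothetical illustration, not any vendor's API:

```python
# Hedged sketch of a ubiquitous access layer for output: the assistant
# abstracts over whatever screens the ambient already offers instead of
# depending on its own dedicated display. All names are hypothetical.

class Display:
    def __init__(self, name, available=True):
        self.name = name
        self.available = available

    def show(self, content):
        return f"Showing {content} on {self.name}"

class AmbientLayer:
    def __init__(self, displays):
        self.displays = {d.name: d for d in displays}

    def route(self, content, preferred=None):
        # Honor the user's explicit choice when that screen is available...
        if preferred and self.displays[preferred].available:
            return self.displays[preferred].show(content)
        # ...else fall back to any available screen in the ambient.
        for d in self.displays.values():
            if d.available:
                return d.show(content)
        # No screen at all: degrade gracefully to voice.
        return "No screen available; I can read it aloud instead."

ambient = AmbientLayer([Display("living-room tv"), Display("mobile phone")])
print(ambient.route("navigation map", preferred="living-room tv"))
```

The graceful-degradation branch matters: an abstraction layer over ambient resources must keep working, via voice alone, when the surroundings offer no visual surface at all.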
User: Alexa, shut down my tv and move the navigation map to my car display in 10 minutes.
Alexa: Ok, you got it!
User: Alexa, load my todo list on my phone screen now.
Alexa: Ok, here you go.
Here Alexa is able to flexibly bridge potential gaps in the user's experiential time, physical spaces, and task execution. Alexa's subtle handling of the overall context generates the perception of continuity in the user's subjective experience.
User: Alexa, turn off all lights and set the alarm when I leave the house.
Alexa: Ok, I asked the garage door to let me know when you leave the house.
In the above scenario, Alexa relies on the garage door's smart motion recognizer to schedule a requested task. In the following scenarios, Alexa instead acts as a savvy coordinator, moving intelligently across IoT and M2M networks to schedule and perform assignments. The context develops into a full-blown ubiquitous access layer in the sample interactions below:
User: Alexa, ask the kitchen fridge to send the shopping list to Amazon Fresh Pickup store close to my office.
Alexa: Ok, the fridge forwarded the shopping list. Amazon Fresh wants to know what time you'd like to pick up your grocery bags.
User: At 5:30 pm.
Alexa: Grocery bags pickup set for today at 5:30 pm.
User: Alexa, send a grocery pickup notification to my phone around 5 pm.
Alexa: Ok, a phone reminder set for today at 5 pm.
User: Alexa, where are my car keys?
Alexa: Sure, your car keys are in the kitchen.
User: Alexa, open the garage door and start my car in 5 minutes.
Alexa: You got it! Your Health Advisor reminds you not to forget your daily allergy medication before leaving the house.
User: Ok, thank you! Please ask my Health Advisor to schedule my annual check up any day next week before 9 am. 
Alexa: Sure. Your check up is scheduled for next Friday morning at 7:45 am. The Health Advisor wants you to be there while fasting. I will send a reminder to the tv in the family room on Thursday evening. 
User: Please ask the bathroom mirror to remind me about the fasting on Friday morning.
Alexa: Sure, you got it!
[................. User is already in the car.............]
User: Alexa, what time does my first meeting start?
Alexa: Your first meeting starts at 8:45 am in the Conference room. 
User: Alexa, ask my Office Assistant to set up the slide show for my first meeting.
Alexa: Ok, done! Your phone says it has just received a text message from your daughter. Should I read it to you now?
User: Yes, please. 
[...................User is at the office ....................]
User: Alexa, book a table for 3 people at the Italian restaurant close to home.
Alexa: Ok, done. Do you want to pre-order your bottle of wine?
User: Yes, please text me first the list of the Italian wines.
Alexa: Ok, here you go! 
User: Alexa, arrange for an Uber to pick up my son at the airport today at 6:45 pm.
Alexa: Ok, done!
User: Alexa, did I renew the annual subscription to your services for 2018?
Alexa: No, you didn't. Do you want me to submit the renewal request and schedule the first quarter payment?
User: Yes, please do! Remember to upload a copy of the receipt to my Accountant. 
Alexa: Both the renewal and the first payment are scheduled. A receipt will be sent to your Accountant repository.
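The scheduling pattern running through these interactions, "do X when the garage door sees me leave," "remind me on the bathroom mirror on Friday," can be sketched as a simple event-driven coordinator. The class and event names are hypothetical illustrations of the pattern, not a real IoT API:

```python
# Sketch of the event-driven coordination in the scenarios above: the
# assistant subscribes deferred tasks to ambient sensor events (e.g. the
# garage door's motion recognizer) and runs them when the event fires.
# All names are hypothetical.

from collections import defaultdict

class Coordinator:
    def __init__(self):
        self.subscriptions = defaultdict(list)
        self.log = []

    def when(self, event, task):
        # "Turn off all lights and set the alarm when I leave the house."
        self.subscriptions[event].append(task)

    def emit(self, event):
        # A smart device reports the event; run every task pending on it.
        for task in self.subscriptions.pop(event, []):
            self.log.append(task())

coord = Coordinator()
coord.when("user_left_house", lambda: "lights off")
coord.when("user_left_house", lambda: "alarm set")

coord.emit("user_left_house")   # garage door detects the departure
print(coord.log)                # ['lights off', 'alarm set']
```

Popping the subscription after it fires makes each deferred request one-shot, matching the dialogs above, where each instruction is executed once at the triggering moment rather than on every future departure.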

Voice-First Apps Monetization: Let's Break The Ice!


During the last few months, we have read and heard a lot about the need for, and possible ways of, allowing some sort of monetization to entice and reward Voice-First app developers. This is indeed a totally legitimate demand and should receive an adequate answer. Yet, in an emerging, though rapidly expanding, technology field and market segment, revenue stream generation might face a number of complexities that require survey-based fact gathering, careful analysis, and an open debate among all interested parties.
Monetization could take place in a number of ways by following available paradigms.
At first glance, a hypothetical monetization could take place in a number of ways by following available paradigms. Here is a quick but not necessarily comprehensive list:
  1. Revenue generated through direct voice-app sales and marketing, resting essentially on the developer's shoulders, with or without the Voice-First platform provider's direct support.
  2. Revenue generated indirectly through a larger premium service that offers one or more voice-apps as parallel and complementary access channels.
  3. Revenue produced through the classic downstream advertising sponsorship and sales.
  4. Revenue received directly from the Voice-First platform vendor based on an upstream subscription model.
  5. Revenue created through some sort of viable combination of the previous 4 methods.

Nothing is really as it seems: I called an angel, the devil had my phone tapped

Let's run a quick review and see the potential pros and cons of each approach. 
Number 1 is the classic, well-consolidated mobile app-store monetization paradigm. It appears to be a viable solution and offers a known and functional revenue-sharing model to the benefit of both developers and the platform vendor. Amazon has already put in place the necessary infrastructure; the same could easily be implemented by Google, Microsoft, Samsung, etc. Most probably, vendors currently focused on the enterprise environment, such as Artificial Solutions, would follow alternative approaches better aligned with corporate-wide deployments. The latter might mimic a model based on a mix of traditional licensing and the more recent cloud-based subscription formats.
Amazon has already put in place the necessary specific infrastructure for a store-based monetization solution.
Number 2 presents a situation that is a lot easier to manage, since direct voice-app revenue is not an issue at all for developers, whether they work in-house or as external service providers. In this model, developer revenue is defined as an operating cost and taken care of right off the bat with an upstream solution.
Number 3 offers a more complicated context. On initial consideration, almost everybody gets the impression of a no-brainer: voice, and hence the Voice-First platform, is among the best-ever vehicles for commercials. Such an apparently self-evident truth is simply based on our decades-old collective experience of radio broadcasting.
Our tolerance toward interstitial advertising (on radio and tv, as well as social media) has significantly decreased over the last couple of decades.
Although our tolerance toward interstitial advertising (on radio and tv, as well as social media) has significantly decreased over the last couple of decades, we still live with the rooted habit of hearing sudden, quick ads in the middle of a normal program's audio stream without being surprised. Here is the question: is that enough to make us believe that Voice-First is just the radio broadcast redux, and that therefore ads could be unleashed bar none? I argue that it is not, or at least not always, the case.
In one of my previous articles, I tentatively classified current voice-app production into 3 big categories. Among these, the Interactive Radio or Flash Briefing type of skills/actions (to date, more than 90% of voice-apps) can conceivably earn a better tolerability score (in the ears of the beholder) for embedded ads, because they are ultimately nothing more than interactive radiocasts. However, as we move forward, the Flash Briefing type will inevitably and increasingly be pushed to the margins, leaving, in fact, the centerpiece to more genuine forms of actual conversational user experience, where interjecting ads without breaking the natural dialog flow will become a lot more complicated.
It's reasonable to conceive that advertisement interjected in a naturally flowing human-machine conversational dialog could trigger rejections on the user's end and could produce counterproductive side effects.
It's awkward to imagine, for instance, even a simple conversation about local traffic disrupted here and there by brief commercial announcements. While awaiting the results of more accurate user surveys, it's reasonable to believe that advertisement in a naturally flowing human-machine conversational dialog could trigger rejection on the user's end and produce counterproductive side effects. Here is a quick, imaginary, but realistic taste of the awkwardness:
  • User: Alexa, how is the traffic?
  • Alexa: The Home Depot close to you has a special sales tax-free day. Are you....
  • User: Alexa, what the hell are you talking about?
  • Alexa: With the current traffic you can arrive at your destination in approximately 52 minutes. Please don't forget about today's sales tax-free special at Home Depot...
  • User: Alexa, have you lost your mind?
  • Alexa: Sorry, I don't understand your question.
  • User: Alexa, just leave me alone!
  • Alexa: Ok, but please do not forget about the Home Depot specials. By the way, do you want to learn more about the upcoming ...
  • User: Alexa, shut up, please!
  • Alexa: Sorry, I don't understand your question.
  • User: Alexa, god....
At number 4, we find a clean upstream resolution. Here developers publish their voice apps and contractually participate in a pre-established, usage-based revenue-sharing scheme that is carried out and fully managed by the Voice-First platform vendor, based on end-user service subscriptions. Both Amazon and Google already have such infrastructure in place and could fully open it up to skill/action developer engagement. Such a model could be followed by other players, including Microsoft, Apple, Samsung, Baidu, etc. Enterprise Voice-First vendors such as Artificial Solutions could use adapted forms of such a revenue-sharing system in a number of standard productivity contexts.
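The arithmetic of such an upstream, usage-based split is simple enough to sketch. The 70/30 split and the usage figures below are purely illustrative assumptions, not any vendor's actual terms:

```python
# Back-of-the-envelope sketch of upstream, usage-based revenue sharing:
# the platform vendor collects subscriptions and splits the developer
# pool among voice apps pro rata by usage. The 70% developer share and
# the usage numbers are illustrative assumptions only.

def monthly_payouts(subscription_pool, usage_minutes, developer_share=0.70):
    """Split the developers' share of the pool pro rata by usage minutes."""
    total = sum(usage_minutes.values())
    return {app: round(subscription_pool * developer_share * mins / total, 2)
            for app, mins in usage_minutes.items()}

usage = {"traffic-skill": 6000, "recipe-skill": 3000, "trivia-skill": 1000}
print(monthly_payouts(10_000.00, usage))
# traffic-skill drove 60% of usage, so it gets 60% of the 70% pool: 4200.0
```

Under these assumed numbers, the vendor keeps $3,000 of the $10,000 pool and the rest flows to developers in proportion to how much their apps were actually used, which is the incentive structure the upstream model is meant to create.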
A mixed regime offers arguably a picture rather close to what we should realistically expect to happen in the very near future.
A mixed regime, as mentioned at number 5, arguably offers a picture rather close to what we should realistically expect to see as Voice-First goes mainstream and both the technology and the related market segment grow.
In parallel, we should also expect to see forms of incremental monetization that will allow Voice-First platform providers to significantly increase their own revenue streams through a number of techniques, from voice-app search placement and premium-content revenue sharing all the way to licensed, direct Voice-First platform access for third-party products and services (smartphones, home devices, cars, wearables, etc.).
There is no free beer, food is expensive and wine is served only at the finish line.
If we want to believe the utterly optimistic Voice-First projections for the next 3 to 5 years, then we should brace ourselves for a dazzling future where human and machine intelligence strengthen each other to literally re-invent the ways people make a living. Companies such as Amazon, Google, Apple, Samsung, Baidu, and Microsoft are leading these tectonic movements, yet we should always keep an eye on Shadow Flowers and continue to believe in the existence of icebergs in the open ocean.