Author(s): Peter Bryer
Speech-based user interfaces (UIs) have just received a big boost. Amazon has released a software development kit (SDK) called the Alexa Skills Kit, which lets third-party developers access Alexa, the cloud-based voice-recognition engine behind Amazon's Echo household personal assistant. Users ask questions aloud and Echo answers, much as Apple's Siri does. For Amazon, Echo means perpetual connectivity to users and their homes.
Voice recognition platforms are becoming ecosystems of their own as natural UIs grow in importance, complementing operating systems and their legacy interfaces. Opening access to Alexa will allow Amazon to establish its voice engine as a favourite among developers before Apple, Google or Microsoft release SDKs for their own voice assistants.
Amazon announced the SDK in the same week that Echo became widely available to consumers. The contest for the smart home is narrowing to a few major players, and third-party providers will have to hitch a ride with one or more of the leading competitors.
Developing a high-quality speech user interface is a significant undertaking, and the barriers to entry are rising along with user expectations. Voice recognition engines need to demonstrate ongoing learning and improvement, and must cope with local dialects and late-night mumbles. Inaccurate results are still a source of entertainment, but natural user interfaces are becoming serious business.
Amazon's SDK is likely to prompt competitors to open their voice systems to third-party developers. Listening devices are becoming control points in smart environments such as homes and cars, and CCS Insight has highlighted the importance of natural user interfaces for device makers and services (see Daily Insight: Talk It Up).
The SDK will enable more consumer electronics and appliance makers to include a voice UI as part of the user experience. Ovens, set-top boxes, stereos, televisions and thermostats could be in constant listening mode, with Amazon's engine in the sky as the central control point. In a growing number of languages, voice is becoming a common input method alongside touches and clicks. Listening clouds are evolving into ecosystems of their own.