Speech recognition, both as a tool for augmented and virtual reality and in technology already around us, such as our mobile phones and devices like the Amazon Echo with Alexa, is steadily improving and increasingly surrounding us. The technology behind these services keeps getting better, and its ability to adapt to accents, and even to speech impediments, is being fine-tuned.
So far my experience has been decidedly hit and miss. On most occasions Siri doesn't get my question right; even after training it to recognise my voice in a recent iOS update, he still cracks me up with the most weird and wonderful responses. This morning I only wanted him to go back to his previous search results, and we ended up in a debate over whether the fault was mine or his. Very funny, totally random, and I am glad I am not relying on speech alone to get my work done.
I haven't yet had the privilege of testing Alexa on an Echo. I did try some voice-activated dictation software, which involved far more training of the system than Siri requires. It reached reasonable accuracy, but again nothing that would make me willing to give up written search. With sound input and enunciation quality as factors, I wonder whether we can get to the point where a group of people can all use the same device, each with their own accent, and find the answers they were looking for. Or, better again, different languages all going into the one device, with everyone receiving their answers in the language they asked in (see the sketch below).
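To make that multi-speaker idea concrete, here is a minimal sketch in Python using the open-source speech_recognition library with its Google Web Speech backend. The speaker names, language tags, and the idea of keeping a per-speaker language table are purely illustrative assumptions, not a description of how any of the devices above actually work.

```python
# Minimal sketch: one device, several speakers, each transcribed in their
# own language. Assumes each utterance is already saved as an audio file
# and that we know who spoke (speaker identification is out of scope here).
import speech_recognition as sr

# Hypothetical speakers and their BCP-47 language tags (assumptions).
SPEAKER_LANGUAGES = {
    "anna": "de-DE",   # Anna asks her questions in German
    "liam": "en-IE",   # Liam speaks Irish-accented English
}

def transcribe_for(speaker: str, audio_path: str) -> str:
    """Transcribe an audio file using the language tied to the speaker."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)  # read the whole file
    # recognize_google accepts a language tag; accuracy still depends
    # heavily on input quality and enunciation, as noted above.
    return recognizer.recognize_google(audio, language=SPEAKER_LANGUAGES[speaker])

# Example usage (paths are placeholders):
# print(transcribe_for("anna", "anna_question.wav"))
```

Answering each person back in the language they asked in would need a translation and text-to-speech step on top of this, which is exactly where, in my experience, today's tools still fall short.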
The companies behind these technologies say it is possible, and much of their marketing would have you believe the systems are accurate, can adapt to various accents, and so on. The reality, in my limited experience, is quite different. I had to turn Siri off on my iPad because he thought newsreaders and journalists with voices similar to mine were calling him into action every few minutes. Imagine watching the BBC news and being interrupted every five minutes by "Hi, how can I help?". I have some friends who are absolute fans of Alexa and see their children having conversations with it.
In a virtual world project we are working on, we use technology developed by Edorble. In this scenario my voice and the voices of the other people in the virtual world are our own, just as you would hear them on Skype or over the telephone. At this stage there is no robot or other artificial intelligence involved. However, for the 2-D version on the same client's website we have had to resort to an avatar character that asks you questions to help you navigate. Voice commands would have made the project prohibitively expensive and possibly also a poor experience for the end user.
I worked on translation-related software straight after finishing my first degree; that project fell through because the organic way language evolves could not be translated accurately by machines. Now, a good ten-plus years on, translation tools are more and more accurate. My guess is that the same will happen for speech-powered tools. For now we will stick with voice recordings, or real speech from one user to another, when incorporating speech in our gamification design projects. I am hopeful that in a few years we won't have to limit ourselves to the written word and can expand our reality with voice-activated tools that respond to all voices, whatever the accent, speech impediment or language.
Maybe the future holds a reality where we can simply call out to a cloud-based tool to come to our assistance. Point-and-click style augmented reality may be closer within reach, but the real shift will come when our voice, or even our mind, can activate tools and find answers. My guess is that this will take less than the ten years it took for translation tools. I am looking forward to the design options this will give us.