The Future of Voice Recognition

David Bosley
March 3, 2016

It’s difficult to think of a technology that is advancing faster than Voice Recognition. To see where it’s going, let’s rewind and look at where Voice Recognition was a few years ago, in its infancy.

Back then, there was much more keypad pressing than actual speaking to the voice interface. Today, you can call an airline and book a flight around the world without ever touching your keypad, and the entire conversation sounds real.

So, let’s peek into the future of Voice Recognition.

Whisper Recognition

Samsung, which is lagging behind in the voice recognition competition (see Google Now, Apple’s Siri), has filed patent paperwork outlining technology that would let a phone recognize whispered speech. This is a parallel track to normal voice recognition – your phone would detect when someone is whispering and “change tracks” to a mode built for that speech delivery type, with its very own grammar.

It’s known that Samsung is working with Nuance, the world’s leader in voice recognition technology, so news that Samsung and Nuance are working on a new technology is not a surprise.

Whisper Recognition might be viewed as one of the first steps in “setting” recognition: the ability of phone technology to recognize the setting in which the communication is originating and adjust accordingly. If there is a lot of background noise, in a restaurant, for instance, there could be a voice recognition channel for that. This could spawn a whole new set of Voice Recognition channels.
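
To make the idea of “changing tracks” concrete, here is a minimal sketch of how a phone might pick a recognition channel from two rough measurements. Everything in it is an assumption for illustration: the function name, the channel names, and both thresholds are hypothetical, and nothing here reflects Samsung’s patent filing. A real system would likely look at more than loudness (whispered speech, for example, also lacks the voiced pitch of normal speech).

```python
# Hypothetical thresholds, chosen only for illustration.
WHISPER_LEVEL_DB = 45.0   # speech quieter than this is treated as a whisper
NOISY_SNR_DB = 10.0       # speech less than this far above the noise is "noisy"

def choose_channel(speech_level_db, noise_level_db):
    """Pick a recognition channel from rough level measurements (in dB)."""
    snr_db = speech_level_db - noise_level_db  # how far speech rises above noise
    if snr_db < NOISY_SNR_DB:
        return "noisy"      # e.g. a busy restaurant
    if speech_level_db < WHISPER_LEVEL_DB:
        return "whisper"    # quiet speech in a quiet room
    return "normal"

print(choose_channel(40.0, 20.0))  # quiet speaker, quiet room
print(choose_channel(65.0, 60.0))  # normal speaker, loud room
print(choose_channel(65.0, 35.0))  # normal speaker, quiet room
```

Each returned label would then select its own recognition model and grammar, which is the “parallel track” described above.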

Emotion Recognition

This already exists. There is facial emotion technology that television networks use to test viewers’ reactions to their programming.

So, how does this work? Well, whether we like it or not, our human behavior is in many ways measurable. Emotion Recognition measures the changes in our demeanor when we’re agitated, bored, happy, or sad. It captures these emotions as changes in pitch and energy, which form the basis of emotion recognition technology.
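
As a rough illustration of what “pitch and energy” mean in practice, the sketch below computes both for a short stretch of audio. It is a simplified example under stated assumptions, not how any particular vendor does it: energy is taken as the root-mean-square amplitude, pitch is estimated with a basic autocorrelation search, and the input is a synthetic 200 Hz tone rather than real speech. The function names and the 75 to 400 Hz search range are choices made for this example.

```python
import math

def rms_energy(frame):
    """Root-mean-square amplitude of one audio frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate pitch in Hz from the lag with the strongest autocorrelation."""
    min_lag = int(sample_rate / fmax)   # shortest period we will consider
    max_lag = int(sample_rate / fmin)   # longest period we will consider
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # How well does the frame line up with a copy of itself shifted by `lag`?
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic stand-in for a voiced sound: a 200 Hz tone, 0.1 s at 8 kHz.
sample_rate = 8000
frame = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]

pitch = estimate_pitch(frame, sample_rate)
energy = rms_energy(frame)
print(f"pitch ~ {pitch:.1f} Hz, energy ~ {energy:.3f}")
```

An emotion recognition system would track how values like these rise and fall over the course of a conversation; the step from those measurements to a label such as “agitated” or “bored” is a separate modeling problem that this sketch does not attempt.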

This research is already here. MedCityNews.com reports that there are a number of telemedicine vendors moving into this area, with one of the goals being a new field of telepsychiatry. “The emotion recognition would allow doctors to understand what their patients are feeling even if the patient isn’t physically present or is not explaining their emotions to a psychiatrist or psychologist.”

It’s been widely reported that the emotion recognition industry, which was a $5.66 billion industry in 2015, will be a $22 billion industry by 2020. In fact, law enforcement is expected to lead the way, which has been the case from the start. One of the earliest examples of emotion recognition technology is the lie detector test. Although it is inadmissible in a court of law in the United States, expect this to change as technology improves. This brings us to future research, which will collect this data and make hypotheses about what we will say and how an operator, VR system, or doctor should respond.

Speech Prediction

“So, you’re telling me some computer will predict what I will say next? Poppycock.”

Do you think that a computer program could predict that I’d say “poppycock”? No, perhaps not, but speech prediction is more about predicting human behavior with data than about being a soothsayer. So, like it or not, something you say in a conversation right now might in fact act as a precursor (and predictor) to something you say next.
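
One simple way to see how data can act as a predictor is a word-pair (bigram) count: tally which word most often follows each word in past conversations, then use that tally to guess what comes next. The sketch below is a toy version of this idea, not a description of any production system. The sample utterances are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(utterances):
    """Count, for every word, which words have followed it."""
    follows = defaultdict(Counter)
    for utterance in utterances:
        words = utterance.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Invented sample data standing in for past call transcripts.
history = [
    "i want to book a flight",
    "i want to change my flight",
    "i want to book a hotel",
]
model = train_bigrams(history)

print(predict_next(model, "want"))       # a word seen in every call
print(predict_next(model, "to"))         # "book" follows twice, "change" once
print(predict_next(model, "poppycock"))  # never seen, so no prediction
```

Note that the last call returns nothing at all: with no past data containing “poppycock,” the model has no basis for a guess, which is exactly the point made above.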

The possibilities are endless – from guiding a call center operator to “lead a caller” down a particular path in the conversation, to convincing a patient (who’s in consultation with a counselor) that they are in fact acting a certain way because the counselor can predict the behavior. In short, voice technology is not just about hearing and interpreting sounds, but about hearing and interpreting emotions.

The Future of Voice Recognition

If you still aren’t convinced that Voice Recognition is growing exponentially, just look at the verticals that have embraced the technology. In the 1990s, healthcare was one of the first industries to adopt Voice Recognition, but now automotive, travel, education, military, energy, lead generation, banking and financial services, and many others are jumping into the fray. So, why the growth?

Simple. The technology is improving, which improves the customer experience – and it saves companies money. There are fewer operators, shorter on-hold waiting times, and better use of that on-hold waiting time. In years past, while on hold, callers were subjected to an array of “elevator music.” This time is now used to gather much-needed information so that when the caller is transferred to an operator, the operator already has the basic information to start the transaction.

So, if an IVR (interactive voice response) system is not currently a factor in your industry, chances are it will be soon.

Oh, and here’s one final piece of new emotion-testing technology: a firm called Sensum uses your galvanic skin response to measure your sweat levels and then uses that data to make certain hypotheses about your emotional state.

There’s no question that the future of voice recognition is upon us. It’s exciting (and will be rewarding) that we’re here to use the technology in ways that benefit our companies.
