We take it for granted at this point. We call a business, a computer asks us why we’re calling, and in most cases we get connected to the person or department we were looking for. This happens every day without causing a ripple.
We think it’s time to stop and ask: how does that happen? In other words, how does Voice Recognition work?
The simplest incoming phone call model looks like this:
The most basic IVR (Interactive Voice Response) system simply wants to find out who you’re looking for and get you there. More complicated IVRs, like the ones created for American Airlines or American Express, gather more information and offer additional services to the caller, but they still operate on the same voice recognition model shown above. So, how does an IVR work?
The Science of IVR
Does the computer comprehend the words I’m saying and make decisions based on these responses? No and yes.
From a technical standpoint, the computer doesn’t understand the words. It has no grasp of meaning, but it does recognize that a particular word means something within the context of the IVR. For example, it knows that a response of “yes” routes the call to one place and a “no” routes it to another.
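To make this “recognize, don’t understand” behavior concrete, here is a minimal Python sketch. It assumes a speech recognizer has already turned the caller’s audio into a text transcript, and the destination names (`sales_queue`, `main_menu`, `operator`) are hypothetical placeholders, not part of any real IVR product. The code never interprets meaning; it only checks whether a recognized word appears in a lookup table.

```python
# Hypothetical routing table: each expected keyword maps to a destination.
# The destination names are placeholders, not real IVR endpoints.
ROUTES = {
    "yes": "sales_queue",
    "no": "main_menu",
}

# Where callers go when no expected keyword is recognized.
FALLBACK = "operator"


def route_call(transcript: str) -> str:
    """Pick a destination for a transcript from a (hypothetical) recognizer.

    The function does not understand the caller; it only looks for the
    first word that matches a key in ROUTES, ignoring case and punctuation.
    """
    for word in transcript.lower().split():
        word = word.strip(".,!?")
        if word in ROUTES:
            return ROUTES[word]
    return FALLBACK


print(route_call("Yes, please"))  # sales_queue
print(route_call("No thanks"))    # main_menu
print(route_call("Maybe later"))  # operator
```

Routing anything unexpected to a live operator is one common design choice; a real system might instead replay the prompt.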
This is not unlike how humans interpret sound.
We don’t actually hear words. We hear the vibrations produced when someone speaks, and our brain assembles those vibrations into words and their meaning.
Put another way, sounds are vibrations that travel through air, water, and solid materials to reach our ear. In a fraction of a second, our ear collects the vibrations and sends the information to our brain to be interpreted.
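A computer’s counterpart to the ear is a microphone whose signal is measured (sampled) many times per second and stored as a list of numbers; standard telephone audio is sampled 8,000 times per second. The Python sketch below is a simplified illustration of that capture step, assuming a pure tone in place of a real voice. `sample_tone` is a hypothetical helper, not part of any IVR product.

```python
import math

SAMPLE_RATE = 8_000  # samples per second, the standard rate for telephone audio


def sample_tone(frequency_hz: float, duration_s: float,
                sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Measure a pure tone's vibration at evenly spaced instants.

    A single sine wave stands in for a voice here (an assumption for
    illustration); real speech mixes many frequencies, but it is captured
    the same way: one number per instant.
    """
    count = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(count)]


samples = sample_tone(440.0, 0.01)  # 10 ms of a 440 Hz tone
print(len(samples))  # 80
```

Everything a recognizer does afterward works on lists of numbers like this one, not on “words.”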
Voice Recognition works very much like our sense of hearing, which we discussed in depth in a previous article. Computers, however, have their limits. The best IVR system may know between 5,000 and 10,000 words, while the average human knows between 12,000 and 25,000 words.
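The vocabulary limit can be illustrated with a short Python sketch. It assumes a tiny hypothetical vocabulary of five words and uses the standard library’s `difflib` string similarity as a stand-in for acoustic matching, which real recognizers do very differently. The point it shows is that a slightly misheard word snaps to the nearest known word, while a word outside the vocabulary is simply not recognized.

```python
import difflib
from typing import Optional

# Hypothetical vocabulary: the only words this IVR is listening for.
# Real systems know thousands of words; this one knows five.
VOCABULARY = ["billing", "sales", "support", "yes", "no"]


def recognize(heard: str, cutoff: float = 0.75) -> Optional[str]:
    """Snap a (possibly misheard) word to the closest vocabulary entry.

    difflib's string similarity is only a stand-in for acoustic matching.
    Returns None when nothing in the vocabulary is close enough, which is
    what happens to any word the system was never taught.
    """
    matches = difflib.get_close_matches(heard.lower(), VOCABULARY,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(recognize("biling"))   # billing
print(recognize("Support"))  # support
print(recognize("refund"))   # None
```

The `cutoff` parameter trades off tolerance for mishearing against the risk of matching the wrong word.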
There is one very large aspect of hearing that the computer has not quite perfected: context. A computer does not know when we are being sarcastic, and it makes no interpretation of intent or emotion. It only registers the sound.
This, too, is changing. Computer systems are now being designed to detect human emotion and react accordingly. The future is closer than you think.
The Engineering of a Voice Recognition System
Building a Voice Recognition system is a huge undertaking. Here is how the process works:
Design or use a system that recognizes sound (words)
Determine what information you want to collect from the caller
Write a script that gathers this data and allows for nearly all likely inputs and scenarios. You can’t allow for every possible word and response, so choose the most likely word choices instead.
Design a system that takes this information and converts it to usable data
Conclude the process with a system that gathers data that not only helps you but ultimately provides a superior customer experience. In other words, deliver what the customer was looking for when they called.
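The steps above can be sketched as a small Python program. This is an illustration under stated assumptions, not how Phonexa or any particular vendor builds its systems: step 1 (the recognizer) is stubbed out by passing in text transcripts, and the `Prompt` class, field names, questions, and synonym lists are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    field_name: str          # step 2: the information we want to collect
    question: str            # step 3: what the IVR says to the caller
    answers: dict[str, str]  # step 3: likely caller words -> normalized value


# Hypothetical script covering the most likely word choices, not every one.
SCRIPT = [
    Prompt("department", "Which department do you need?",
           {"billing": "billing", "bill": "billing", "invoice": "billing",
            "sales": "sales", "buy": "sales",
            "support": "support", "help": "support"}),
    Prompt("is_customer", "Are you an existing customer?",
           {"yes": "true", "yeah": "true", "yep": "true",
            "no": "false", "nope": "false"}),
]


def run_script(transcripts: list[str]) -> dict[str, str]:
    """Steps 4-5: turn recognized replies into a structured caller record.

    `transcripts` stands in for step 1, one recognizer output per prompt.
    Replies with no expected word are stored as "unknown" so the call can
    be handed to a live agent.
    """
    record: dict[str, str] = {}
    for prompt, reply in zip(SCRIPT, transcripts):
        words = [w.strip(".,!?") for w in reply.lower().split()]
        value = next((prompt.answers[w] for w in words if w in prompt.answers),
                     "unknown")
        record[prompt.field_name] = value
    return record


print(run_script(["I have a question about my bill", "Yeah"]))
# {'department': 'billing', 'is_customer': 'true'}
```

Mapping several synonyms (“bill,” “invoice”) to one normalized value is what turns free-form replies into usable data for routing.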
This sounds hard…
The good news is that companies like Phonexa provide all the expertise and creativity you need to design your voice recognition system. Your job is to determine your filters and, overall, what you want the system to do.