Speech Recognition

How It Works

     Speech recognition converts spoken words into text. It also lets you place calls and run searches by voice. Another use is dictation: the computer takes what you say (words, numbers, letters, symbols, and so on) and enters it as text.

     Speech recognition is a complicated process with two main steps. The first step is to extract phonemes, the basic units of sound in speech. Phonemes group together to form the words that we use. English has about 40 phonemes, which combine to make up about 500,000 words. To extract phonemes, the recorded waveform is run through a Fourier transform, which converts it from a signal over time into its component frequencies so that it can be analyzed. The second step is to convert the phonemes into words. The Hidden Markov Model (HMM) is the most commonly used method for this. For a single word, the HMM is basically a chain of phonemes: it takes the phonemes that you speak and links them together to figure out which word you are saying. Words that sound alike, such as "write" and "right", are harder because they share the same phonemes, but the model may still be able to tell them apart, typically by taking the surrounding words into account. The image shown below is an example of the Markov Model. It shows two pronunciations of "tomato", with the phonemes joining together to form the word.
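
     The idea of a word as a chain of phonemes can be sketched in a few lines of Python. This is only a rough illustration, not how a real recognizer is built: it is a plain Markov chain over phonemes, without the hidden acoustic layer that a full HMM has. The state names (S1, S2, ...), the phoneme spellings, the 0.5 probabilities, and the function name are all made up for this example.

```python
# Hypothetical word model for "tomato" (illustrative values only).
# Each entry maps a state to its outgoing links:
#   (phoneme heard, next state, probability of taking that link)
# The two pronunciations differ only in the middle vowel ("ey" vs "aa").
TOMATO_MODEL = {
    "START": [("t", "S1", 1.0)],
    "S1": [("ah", "S2", 1.0)],
    "S2": [("m", "S3", 1.0)],
    "S3": [("ey", "S4", 0.5), ("aa", "S4", 0.5)],
    "S4": [("t", "S5", 1.0)],
    "S5": [("ow", "END", 1.0)],
}


def pronunciation_probability(phonemes, model=TOMATO_MODEL):
    """Return the probability that the chain produces this phoneme sequence.

    Returns 0.0 if a phoneme has no matching link from the current state,
    or if the sequence stops before reaching the END state.
    """
    state = "START"
    probability = 1.0
    for phoneme in phonemes:
        for label, next_state, link_probability in model.get(state, []):
            if label == phoneme:
                probability *= link_probability
                state = next_state
                break
        else:
            # No link from this state matches the phoneme that was heard.
            return 0.0
    return probability if state == "END" else 0.0


if __name__ == "__main__":
    print(pronunciation_probability(["t", "ah", "m", "ey", "t", "ow"]))
    print(pronunciation_probability(["t", "ah", "m", "aa", "t", "ow"]))
    print(pronunciation_probability(["p", "ah", "t", "ey", "t", "ow"]))
```

     Both pronunciations of "tomato" follow a complete path through the chain, so each gets a nonzero probability, while a sequence that does not fit the chain (the last call, which is closer to "potato") gets zero. A real system would keep one such model per word and pick the word whose model best explains the phonemes it heard.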