How It Works
Speech recognition converts spoken words into text. It can also be
used to make calls and run searches by voice. With dictation, the
computer takes whatever you say (words, numbers, letters, symbols,
etc.) and writes it out as text.
Speech recognition is a complicated process with two main steps.
The first step is to extract phonemes, the units of sound that
group together to form the words we use. English has about 40
phonemes, which combine to make up about 500,000 words. To extract
phonemes, the recorded waveform is run through a Fourier transform,
which converts it from a signal over time into its component
frequencies so that it can be analyzed. The second step is to
convert the phonemes into words.
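As a rough illustration of the first step, the sketch below runs a short waveform through a Fourier transform and reports its strongest frequency. It is a minimal example under stated assumptions, not how any particular recognizer is built: the 8000 Hz sample rate, the 1000 Hz test tone, and the function names are all hypothetical, and it uses a naive discrete Fourier transform rather than the faster FFT algorithm used in practice.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each frequency bin of a real-valued signal.

    Naive O(n^2) discrete Fourier transform; only bins 0..n/2 are kept,
    because the rest mirror them for real input.
    """
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total))
    return magnitudes


def dominant_frequency(samples, sample_rate):
    """Return the frequency (in Hz) of the strongest non-constant component."""
    magnitudes = dft_magnitudes(samples)
    # Skip bin 0, which only measures the constant offset of the signal.
    peak_bin = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
    return peak_bin * sample_rate / len(samples)


# Hypothetical input: 64 samples of a pure 1000 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(64)]

peak_hz = dominant_frequency(frame, SAMPLE_RATE)
print(peak_hz)
```

Real speech is a mix of many frequencies that change over time, so a recognizer would apply a transform like this to many short slices of the recording rather than to a single tone.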
The Hidden Markov Model (HMM) is the tool most commonly used for
this second step. An HMM represents a word as a chain of phonemes,
with a probability attached to each link in the chain. It takes the
phonemes that you speak and finds the chain that most likely
matches them in order to figure out what you are saying. The model
may also be able to handle words or sentences that sound alike,
such as "write" and "right". The image shown below is an example of
a Markov model. It shows two pronunciations of "tomato", with the
phonemes joining together along either path to form the same word.
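The idea of a word as a chain of phonemes can be sketched in a few lines. The example below is a simplified Markov chain, not a full HMM: a real recognizer would also model how likely each sound is given the audio. The phoneme symbols, state names, and the 0.5 probabilities for the two pronunciations of "tomato" are assumptions made for illustration.

```python
# Phoneme spoken at each state of the hypothetical "tomato" word model.
PHONEME = {
    "s1": "t", "s2": "ow", "s3": "m",
    "s4a": "ey",  # "to-MAY-to"
    "s4b": "aa",  # "to-MAH-to"
    "s5": "t", "s6": "ow",
}

# Probability of moving from one state to the next (illustrative values).
TRANSITIONS = {
    "START": {"s1": 1.0},
    "s1": {"s2": 1.0},
    "s2": {"s3": 1.0},
    "s3": {"s4a": 0.5, "s4b": 0.5},  # the chain branches here
    "s4a": {"s5": 1.0},
    "s4b": {"s5": 1.0},
    "s5": {"s6": 1.0},
    "s6": {"END": 1.0},
}


def word_probability(phonemes):
    """Return the probability that the chain produces this phoneme sequence."""
    current = {"START": 1.0}  # state -> probability of being there
    for phoneme in phonemes:
        following = {}
        for state, prob in current.items():
            for target, step_prob in TRANSITIONS.get(state, {}).items():
                # Only follow links whose target state matches the sound heard.
                if PHONEME.get(target) == phoneme:
                    following[target] = following.get(target, 0.0) + prob * step_prob
        current = following
    # The sequence counts as the word only if the chain can end here.
    return sum(
        prob * TRANSITIONS.get(state, {}).get("END", 0.0)
        for state, prob in current.items()
    )


may_to = word_probability(["t", "ow", "m", "ey", "t", "ow"])
mah_to = word_probability(["t", "ow", "m", "aa", "t", "ow"])
not_a_match = word_probability(["t", "ow", "m", "iy", "t", "ow"])
print(may_to, mah_to, not_a_match)
```

Both pronunciations follow a valid path through the chain and so score above zero, while a sequence that leaves the chain scores zero. A recognizer would compare scores like these across the models for many words and pick the best match.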
