How chaos could improve speech recognition

[Image: phase portrait of the vowel sound ‘a’]

If you’ve ever used speech recognition software, you’ll know how often it fails to work well. Recognition rates are nowhere near what is needed for anything but the simplest applications.

So a new approach for analysing speech by Yuri Andreyev and Maxim Koroteev at the Institute of Radioengineering and Electronics of the Russian Academy of Sciences in Moscow is welcome. Their approach is to treat the production of speech as a chaotic phenomenon.

That’s a significant departure from previous approaches, which predict the next point in a speech signal by extrapolating linearly from previous points.

The linear approach works because the organs that produce speech (the vocal cords) change over a much longer timescale than the sound they produce. So they can be considered essentially stationary for this type of analysis.
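To make the linear idea concrete, here is a minimal sketch in Python of predicting the next sample as a weighted sum of the two previous ones. It is an illustration only, not the method used in any particular recogniser: the function names, the order-2 model and the pure test tone are all assumptions made for the example.

```python
import math

def fit_linear_predictor(signal):
    """Least-squares fit of x[n] ~ a1*x[n-1] + a2*x[n-2] (illustrative order-2 model)."""
    # Accumulate the entries of the 2x2 normal equations.
    s11 = s12 = s22 = b1 = b2 = 0.0
    for n in range(2, len(signal)):
        p1, p2, target = signal[n - 1], signal[n - 2], signal[n]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        b1 += p1 * target
        b2 += p2 * target
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("signal too short or degenerate for an order-2 fit")
    # Solve the 2x2 system by Cramer's rule.
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2

def predict_next(signal, coeffs):
    """Extrapolate one sample ahead from the last two samples."""
    a1, a2 = coeffs
    return a1 * signal[-1] + a2 * signal[-2]

# A steady tone stands in for a short, stationary stretch of sound.
omega = 0.3
tone = [math.sin(omega * n) for n in range(200)]
coeffs = fit_linear_predictor(tone[:-1])
prediction = predict_next(tone[:-1], coeffs)
error = abs(prediction - tone[-1])
```

For a steady tone like this one, two past samples are enough to predict the next almost exactly. Real speech is far less regular, and practical systems use many more coefficients over short windows.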

Of course, one of the characteristics of chaos is that very small changes in starting conditions can produce large changes in output. And if that’s happening, what kind of chaos are we talking about?
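For a feel of what sensitivity to starting conditions looks like, here is a small Python sketch using the logistic map, a standard textbook example of chaos. It is not a model of speech and does not come from the paper; the function name and the starting values are chosen purely for illustration.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r*x*(1-x), which behaves chaotically for r = 4."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

# Two starting points that differ by one part in ten billion.
a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)

# Distance between the two orbits at each step.
gaps = [abs(p - q) for p, q in zip(a, b)]
```

The two orbits are indistinguishable for the first few steps, then drift apart until they bear no resemblance to each other. None of this says what kind of chaos speech might involve, which is the question the researchers take up.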

Andreyev and Koroteev answer this question by measuring the frequency and amplitude of the sound a person makes when saying various vowels and consonants. They then use this data to reconstruct the multidimensional phase space in which the chaotic signal is produced.
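One common way to reconstruct a phase space from a single measured signal is time-delay embedding, in which each point is built from the signal's value now and its values a fixed delay earlier or later. Whether the researchers used exactly this procedure is an assumption here, and the function name, the dimension and the delay below are illustrative choices rather than values from the paper.

```python
import math

def delay_embed(signal, dimension, delay):
    """Turn a 1-D signal into points (x[i], x[i+delay], ..., x[i+(dimension-1)*delay])."""
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    span = (dimension - 1) * delay
    return [
        tuple(signal[i + k * delay] for k in range(dimension))
        for i in range(len(signal) - span)
    ]

# A pure tone with a period of 40 samples stands in for a recorded sound.
samples = [math.sin(2 * math.pi * n / 40) for n in range(200)]

# A delay of a quarter period turns the tone into a circle in two dimensions.
portrait = delay_embed(samples, dimension=2, delay=10)
```

Plotting the points in `portrait` gives a simple closed loop. A chaotic signal would instead trace out a more intricate structure, and that structure is what a phase portrait shows.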

The results are interesting because specific vowels appear to be linked to unique structures in the phase space. Andreyev and Koroteev call these structures phase portraits. The picture above is a phase portrait of the vowel sound ‘a’.

It’s a little harder to identify the shapes associated with consonants, and the researchers haven’t yet tried other sounds such as diphthongs.

It’s a long step from here to speech recognition, but in principle it could be done by looking for the phase portraits of specific phonemes and using them to spell out words.

The question, of course, is whether this would be easier or harder than current approaches.

Ref: arxiv.org/abs/0812.4172: On Chaotic Nature of Speech Signals
