**Problem Detail:**

I'm learning about HMMs and their applications and trying to understand how they are used. My knowledge is a bit spotty, so please correct any incorrect assumptions I'm making. The specific example I'm wondering about is using HMMs for speech recognition, which is a common example in the literature.

The basic method seems to be to treat the incoming sounds (after processing) as observations, where the actual words being spoken are the hidden states of the process. It seems obvious that the hidden variables here are not independent, but I do not understand how they satisfy the Markov property. I would imagine that the probability of the Nth word depends not just on the (N-1)th word, but on many words before that.

Is this simply ignored as a simplifying assumption because HMMs model speech recognition problems very well in practice, or am I misunderstanding what the states and hidden variables in the process are? The same problem would appear to apply to many other applications in which HMMs are popular, such as POS tagging.

###### Asked By : sooniln

#### Answered By : Nikolay Shmyrev

On that subject I recommend reading a very good paper by James Baker and others, who were actually responsible for introducing HMMs into speech recognition:

A Historical Perspective of Speech Recognition http://cacm.acm.org/magazines/2014/1/170863-a-historical-perspective-of-speech-recognition/abstract

> Using Markov models to represent language knowledge was controversial. Linguists knew no natural language could be represented even by context-free grammar, much less by a finite state grammar. Similarly, artificial intelligence experts were more doubtful that a model as simple as a Markov process would be useful for representing the higher-level knowledge sources recommended in the Newell report. However, there is a fundamental difference between assuming that language itself is a Markov process and modeling language as a probabilistic function of a hidden Markov process. The latter model is an approximation method that does not make an assumption about language, but rather provides a prescription to the designer in choosing what to represent in the hidden process. The definitive property of a Markov process is that, given the current state, probabilities of future events will be independent of any additional information about the past history of the process. This property means if there is any information about the past history of the observed process (such as the observed words and sub-word units), then the designer should encode that information with distinct states in the hidden process. It turned out that each of the levels of the Newell hierarchy could be represented as a probabilistic function of a hidden Markov process to a reasonable level of approximation.
>
> For today's state-of-the-art language modeling, most systems still use the statistical N-gram language models and the variants, trained with the basic counting or EM-style techniques. These models have proved remarkably powerful and resilient. However, the N-gram is a highly simplistic model for realistic human language. In a similar manner with deep learning for significantly improving acoustic modeling quality, recurrent neural networks have also significantly improved the N-gram language model. It is worth noting that nothing beats a massive text corpora matching the application domain for most real speech applications.
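
The property described in the quote can be written out in symbols. The notation here ($s_t$ for the hidden state at step $t$, $w_t$ for the word at step $t$) is an illustrative assumption, not taken from the paper. The Markov property on the hidden states says:

$$
P(s_{t+1} \mid s_t, s_{t-1}, \dots, s_1) = P(s_{t+1} \mid s_t)
$$

Suppose, as a simplified case, that each word really depends on the two preceding words, so the word sequence alone is not first-order Markov. Encoding that history in the state, $s_t = (w_{t-1}, w_t)$, restores the property:

$$
P(s_{t+1} \mid s_t, \dots, s_2) = P(w_{t+1} \mid w_t, w_{t-1}) = P(s_{t+1} \mid s_t)
$$

The same construction works for any fixed amount of history, at the cost of a state space that grows with the number of distinct histories.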

Overall, the Markov model is a fairly generic model for decoding a black-box channel, with very relaxed assumptions about the transmission, so it is a perfect fit for speech recognition. The question remains, however, what exactly to encode as a state. It is clear that states should be more complex objects than what we assume now (just a few preceding words). Revealing the true nature of that structure is the subject of ongoing research.
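
As a minimal sketch of what "encoding history in the state" means, the Python below estimates a trigram model (each word conditioned on the two before it) from a toy corpus, then estimates a first-order Markov chain whose states are word pairs, and checks that the two agree. The corpus and the function names `trigram_probs` and `pair_state_chain` are made up for illustration; real recognizers use far larger state spaces together with acoustic models, which this sketch leaves out.

```python
from collections import Counter, defaultdict

def trigram_probs(words):
    """Estimate P(w3 | w1, w2) by counting: a second-order model over words."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        counts[(w1, w2)][w3] += 1
    return {ctx: {w: c / sum(nxt.values()) for w, c in nxt.items()}
            for ctx, nxt in counts.items()}

def pair_state_chain(words):
    """Estimate a first-order chain whose states are word pairs (w1, w2)."""
    states = list(zip(words, words[1:]))
    counts = defaultdict(Counter)
    for s, s_next in zip(states, states[1:]):
        counts[s][s_next] += 1
    return {s: {t: c / sum(nxt.values()) for t, c in nxt.items()}
            for s, nxt in counts.items()}

# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat ran on the road".split()
tri = trigram_probs(corpus)
chain = pair_state_chain(corpus)

# Every trigram probability reappears as a one-step transition
# between pair states: (w1, w2) -> (w2, w3).
for (w1, w2), nxt in tri.items():
    for w3, p in nxt.items():
        assert abs(chain[(w1, w2)][(w2, w3)] - p) < 1e-12
```

The dependence on two previous words has not gone away; it has been moved into the definition of the state, which is why the chain over pairs is first-order even though the word sequence is not.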

###### Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/37709
