
Non-Parametric Methods Like K-Nearest-Neighbours in High-Dimensional Feature Spaces

Problem Detail: 

The main idea of k-Nearest-Neighbours is to take the $k$ points nearest to a query and classify the query by majority vote among them. If so, then high-dimensional data should not pose a problem, because methods like locality-sensitive hashing can find nearest neighbours efficiently.
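The majority-vote rule described above can be sketched in a few lines of plain Python (the toy data and function name here are illustrative, not from the original post):

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; distance is Euclidean.
    """
    dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    # Sort the training set by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda pl: dist(pl[0], query))[:k]
    # Majority vote over the labels of those k neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D data: class "A" near the origin, class "B" near (5, 5).
train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_classify(train, (0.5, 0.5), k=3))  # "A"
print(knn_classify(train, (5.5, 5.5), k=3))  # "B"
```

The brute-force sort here is $O(n \log n)$ per query; the point of locality-sensitive hashing is to avoid exactly this full scan.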

In addition, feature selection with Bayesian networks can reduce the dimension of data and make learning easier.

However, a review paper by John Lafferty on statistical learning points out that non-parametric learning in high-dimensional feature spaces remains a challenging, unsolved problem.

What is going wrong?

Asked By : Strin

Answered By : Nick

This problem is known as the curse of dimensionality. Basically, as the number of dimensions $d$ increases, points in the space tend to become far from all other points. This makes partitioning the space, which is necessary for classification or clustering, very difficult.

You can see this for yourself very easily. I generated $50$ random $d$-dimensional points in the unit hypercube, for 20 evenly spaced values of $d$ from $1$ to $1000$. For each value of $d$, I computed the distance from the first point to all of the others and averaged those distances. The plot shows that this average distance grows with dimensionality, even though the range of each coordinate stays the same.

[Figure: average distance vs. dimensionality]
