
Sharp concentration for selection via random partitioning?

Problem Detail: 

The usual simple algorithm for finding the median element in an array $A$ of $n$ numbers is:

  • Sample $n^{3/4}$ elements from $A$ with replacement into $B$
  • Sort $B$ and let $l$ and $r$ be the elements of $B$ of rank $|B|/2 - \sqrt{n}$ and $|B|/2 + \sqrt{n}$ respectively
  • Check that $l$ and $r$ are on opposite sides of the median of $A$ and that there are at most $Cn^{3/4}$ elements of $A$ between $l$ and $r$, for some appropriate constant $C > 0$. Fail if this doesn't happen.
  • Otherwise, find the median by sorting the elements of $A$ between $l$ and $r$
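
As a concrete illustration, the four steps above might be sketched as follows (an assumed implementation: the function name, the choice $C = 4$, and the return-`None`-on-failure convention are my own, not part of the original description):

```python
import random

def lazy_select_median(A, C=4):
    """Sketch of the sampling-based median algorithm described above.
    Returns the median of A, or None if the (low-probability) checks fail,
    in which case the caller may simply retry."""
    n = len(A)
    if n < 16:                            # tiny inputs: just sort
        return sorted(A)[(n - 1) // 2]
    s = int(n ** 0.75)                    # sample size n^(3/4)
    B = sorted(random.choices(A, k=s))    # sample with replacement, then sort
    d = int(n ** 0.5)                     # rank offset sqrt(n)
    l = B[max(0, s // 2 - d)]             # rank |B|/2 - sqrt(n) element
    r = B[min(s - 1, s // 2 + d)]         # rank |B|/2 + sqrt(n) element
    below = sum(1 for x in A if x < l)    # elements strictly below l
    mid = [x for x in A if l <= x <= r]   # elements between l and r
    k = (n - 1) // 2                      # 0-based rank of the median
    # Check: median lies between l and r, and the middle band is small.
    if not (below <= k < below + len(mid)) or len(mid) > C * s:
        return None                       # fail (happens with low probability)
    mid.sort()                            # sorting o(n) elements: total time O(n)
    return mid[k - below]
```

Since each retry fails with probability $o(1)$, wrapping the call in a retry loop still gives expected linear time.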

It's not hard to see that this runs in linear time and that it succeeds with high probability. (All the bad events are large deviations away from the expectation of a binomial.)

An alternate algorithm for the same problem, which is more natural to teach to students who have seen quick sort is the one described here: Randomized Selection

It is also easy to see that this one has linear expected running time: call a "round" a sequence of recursive calls that ends when one of them produces a 1/4-3/4 split, and observe that the expected length of a round is at most 2. (On the first draw of a round, the probability of getting a good split is 1/2, and as the algorithm is described it only increases on subsequent draws, so the round length is dominated by a geometric random variable.)
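
For reference, the selection algorithm in question can be sketched as follows (a minimal quickselect with three-way partitioning and 0-based ranks; the exact pseudocode in the linked description may differ):

```python
import random

def quickselect(A, k):
    """Sketch of randomized selection: return the k-th smallest (0-based)
    element of A by recursing on the side of a uniformly random pivot."""
    if len(A) == 1:
        return A[0]
    pivot = random.choice(A)
    less = [x for x in A if x < pivot]
    equal = [x for x in A if x == pivot]
    if k < len(less):                      # target is in the left part
        return quickselect(less, k)
    if k < len(less) + len(equal):         # target equals the pivot
        return pivot
    greater = [x for x in A if x > pivot]  # target is in the right part
    return quickselect(greater, k - len(less) - len(equal))
```

The median of an $n$-element array is `quickselect(A, (n - 1) // 2)`.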

So now the question:

Is it possible to show that randomized selection runs in linear time with high probability?

We have $O(\log n)$ rounds, and each round has length at least $k$ with probability at most $2^{-k+1}$, so a union bound gives that the running time is $O(n\log\log n)$ with probability $1-O(1/\log n)$.
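
The geometric tail bound used here is easy to check empirically in an idealized model where each pivot's rank is uniform and a round ends at the first 1/4-3/4 split (a simulation sketch; the model — success probability exactly 1/2 on every draw — and the trial count are my simplifying assumptions):

```python
import random

def round_length(rng):
    """Length of one 'round' in the idealized model: draw uniform pivot
    ranks until one lands in the middle half, i.e. gives a 1/4-3/4 split."""
    draws = 1
    while not 0.25 <= rng.random() <= 0.75:
        draws += 1
    return draws

rng = random.Random(0)
trials = [round_length(rng) for _ in range(100_000)]
mean = sum(trials) / len(trials)                    # E[length] = 2 in this model
tail = sum(t >= 4 for t in trials) / len(trials)    # Pr[length >= 4] = 2^{-3} = 1/8
```

The empirical tail matches $\Pr[\text{length} \geq k] = 2^{-k+1}$, the bound used in the union-bound argument above.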

This is kind of unsatisfying, but is it actually the truth?

Asked By : Louis

Answered By : Yuval Filmus

It's not true that the algorithm runs in linear time with high probability. Considering only the first round, the running time is at least $\Theta(n)$ times a $G(1/2)$ random variable. Let $p(n) \longrightarrow 0$ be the allowed failure probability. Since $\Pr[G(1/2) \geq \log_2 p(n)^{-1}] = p(n)$, the running time is at least $\Omega(n \log_2 p(n)^{-1}) = \omega(n)$.

(There is some cheating involved, since the length of the first round isn't really $G(1/2)$. More careful analysis might or might not validate this answer.)

Edit: Grübel and Rösler proved that the number of comparisons divided by $n$ tends (in a suitable sense) to a limit distribution, which is unbounded. See for example Grübel's paper "Hoare's selection algorithm: a Markov chain approach", which references their original paper.
