Cheap and Secure Web Hosting Provider : See Now

# [Solved]: Can someone explain LazySelect?

, ,
Problem Detail:

The LazySelect algorithm is given in these slides as follows.

We have a set $S$ of $n = 2k$ distinct numbers and want to find the $k$th smallest element.

1. Let $R$ be a set of $n^{3/4}$ elements chosen uniformly at random with replacement from $S$.
2. Sort $R$ and find $a$ and $b$ such that $\mathrm{rank}_R(a) = kn^{-1/4} – sqrt(n)$ and $\mathrm{rank}_R(b) = kn^{-1/4} + sqrt(n)$, where $\mathrm{rank}_X(x) = t$ if $x$ is the $t$th smallest element in $X$.
3. Compute $\mathrm{rank}_S(a)$ and $\mathrm{rank}_S(b)$: Output FAIL if $k < \mathrm{rank}_S(a)$ or $k > \mathrm{rank}_S(b)$.
4. Let $P = \{i \in S\mid a\le y\le b\}$: Output FAIL if $|P| \ge 4n^{3/4}$.
5. Return the $(k - \mathrm{rank}_S(a) + 1)$th smallest element from $P$.

Can someone explain the intuition behind the algorithm in a way that is more detailed and easier to understand than the slides above?

#### Answered By : Yuval Filmus

$\DeclareMathOperator{\rank}{rank}$Given a set $S$ of $n = 2k$ elements, the algorithm is aimed at finding the median of $S$ in linear time, with high probability. The idea is to find two elements $a,b \in S$ with $\rank(a) \leq k \leq \rank(b)$ and $\Delta = \rank(b) - \rank(a)$ as small as possible. Given such elements, we can find the median in time $O(n + \Delta\log\Delta)$ as follows:

1. Determine $\rank(a)$ in time $O(n)$.
2. Compute the list $P = \{ x \in S : a \leq x \leq b \}$ in time $O(n)$.
3. Sort $P$ in time $O(\Delta\log\Delta)$.
4. Output the element $x \in P$ of rank $\rank_P(x) = k - \rank(a)$ in time $O(1)$.

How do we find such elements $a,b$? The idea is to sample a smaller list $R$ which contains a $p$ fraction of the elements of $S$, for suitable $p$. An element $x \in R$ of rank $r$ in $S$ will have rank in the range $pr \pm c\sqrt{pr}$ in $R$ with probability $1-e^{-O(c^2)}$ (due to a Chernoff bound). In particular, if we the elements $a,b$ ranked $pk - c\sqrt{pk}$ and $pk + c\sqrt{pk}$ in $R$, respectively, then $\rank(a) \leq k \leq \rank(b)$ with probability $1 - e^{-O(c^2)}$. The size of the corresponding list $P$ will be roughly $$\Delta \approx p^{-1} (\rank_R(b) - \rank_R(a)) = 2c\sqrt{n/p}.$$ We can find the elements $a,b$ by sorting the list $R$, which takes time $O(pn \log(pn))$. In total, the algorithm looks like this:

1. Sample a set $R$ by putting each element with probability $p$.
2. Sort $R$ in time $O(pn \log n)$.
3. Find the elements $a,b$ ranked $pk-c\sqrt{pk}$ and $pk+c\sqrt{pk}$ in $R$.
4. Proceed as before.

The total running time is $$O(pn\log n + n + c\sqrt{n/p}\log n).$$ In order for this to be $O(n)$, we need $$\Omega(c^2 \log^2 n/n) \leq p \leq O(1/\log n).$$ For the algorithm to succeed with high probability $1 - o(1)$, we need $c = \omega(1)$. One setting which satisfies all these constraints is $p = n^{-1/4}$ and $c = n^{1/8}$, the choice taken in the slides.