**Problem Detail:**

The LazySelect algorithm is given in these slides as follows.

We have a set $S$ of $n = 2k$ distinct numbers and want to find the $k$th smallest element.

- Let $R$ be a set of $n^{3/4}$ elements chosen uniformly at random with replacement from $S$.
- Sort $R$ and find $a$ and $b$ such that $\mathrm{rank}_R(a) = kn^{-1/4} - \sqrt{n}$ and $\mathrm{rank}_R(b) = kn^{-1/4} + \sqrt{n}$, where $\mathrm{rank}_X(x) = t$ if $x$ is the $t$th smallest element in $X$.
- Compute $\mathrm{rank}_S(a)$ and $\mathrm{rank}_S(b)$: output `FAIL` if $k < \mathrm{rank}_S(a)$ or $k > \mathrm{rank}_S(b)$.
- Let $P = \{y \in S \mid a \le y \le b\}$: output `FAIL` if $|P| \ge 4n^{3/4}$.
- Return the $(k - \mathrm{rank}_S(a) + 1)$th smallest element from $P$.
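For concreteness, one attempt of the algorithm above can be sketched in Python as follows (the rounding of the fractional ranks to array indices is my own choice; the slides do not specify it):

```python
import math
import random

def lazy_select(S, k):
    """One attempt of LazySelect: return the k-th smallest element of the
    list S of distinct numbers, or None when the attempt FAILs."""
    n = len(S)
    r = round(n ** 0.75)
    # Sample n^(3/4) elements uniformly at random, with replacement.
    R = sorted(random.choice(S) for _ in range(r))
    # a and b bracket the expected rank k * n^(-1/4) of the answer in R.
    lo = max(0, math.floor(k * n ** -0.25 - math.sqrt(n)) - 1)
    hi = min(r - 1, math.ceil(k * n ** -0.25 + math.sqrt(n)) - 1)
    a, b = R[lo], R[hi]
    rank_a = sum(1 for x in S if x < a) + 1   # rank_S(a)
    rank_b = sum(1 for x in S if x <= b)      # rank_S(b)
    if k < rank_a or k > rank_b:
        return None  # FAIL: the k-th smallest fell outside [a, b]
    P = sorted(x for x in S if a <= x <= b)
    if len(P) >= 4 * n ** 0.75:
        return None  # FAIL: P too large to guarantee linear time
    return P[k - rank_a]  # the (k - rank_S(a) + 1)-th smallest of P
```

On the rare `FAIL`, one simply reruns the attempt; the expected number of retries is constant.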

Can someone explain the intuition behind the algorithm in a way that is more detailed and easier to understand than the slides above?

###### Asked By : Alex G.

#### Answered By : Yuval Filmus

$\DeclareMathOperator{\rank}{rank}$Given a set $S$ of $n = 2k$ elements, the algorithm is aimed at finding the median of $S$ in linear time, with high probability. The idea is to find two elements $a,b \in S$ with $\rank(a) \leq k \leq \rank(b)$ and $\Delta = \rank(b) - \rank(a)$ as small as possible. Given such elements, we can find the median in time $O(n + \Delta\log\Delta)$ as follows:

- Determine $\rank(a)$ in time $O(n)$.
- Compute the list $P = \{ x \in S : a \leq x \leq b \}$ in time $O(n)$.
- Sort $P$ in time $O(\Delta\log\Delta)$.
- Output the element $x \in P$ of rank $\rank_P(x) = k - \rank(a) + 1$ in time $O(1)$.

How do we find such elements $a,b$? The idea is to sample a smaller list $R$ which contains a $p$ fraction of the elements of $S$, for a suitable $p$. An element $x \in R$ of rank $r$ in $S$ will have rank in the range $pr \pm c\sqrt{pr}$ in $R$ with probability $1-e^{-O(c^2)}$ (due to a Chernoff bound). In particular, if we take the elements $a,b$ ranked $pk - c\sqrt{pk}$ and $pk + c\sqrt{pk}$ in $R$, respectively, then $\rank(a) \leq k \leq \rank(b)$ with probability $1 - e^{-O(c^2)}$. The size of the corresponding list $P$ will be roughly $$ \Delta \approx p^{-1} (\rank_R(b) - \rank_R(a)) = 2c\sqrt{n/p}. $$ We can find the elements $a,b$ by sorting the list $R$, which takes time $O(pn \log(pn))$. In total, the algorithm looks like this:

- Sample a set $R$ by including each element of $S$ independently with probability $p$.
- Sort $R$ in time $O(pn \log n)$.
- Find the elements $a,b$ ranked $pk-c\sqrt{pk}$ and $pk+c\sqrt{pk}$ in $R$.
- Proceed as before.
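These steps can be sketched as follows (the rounding of ranks to indices and the retry-on-failure convention are my own guesses; they are not pinned down in the answer):

```python
import math
import random

def lazy_select_p(S, k, p, c):
    """Sampled selection: keep each element of S independently with
    probability p, pick a and b around rank p*k in the sample, then
    select inside [a, b]. Returns None (retry) when the attempt fails."""
    R = sorted(x for x in S if random.random() < p)
    if not R:
        return None  # empty sample; retry
    lo = max(0, math.floor(p * k - c * math.sqrt(p * k)) - 1)
    hi = min(len(R) - 1, math.ceil(p * k + c * math.sqrt(p * k)) - 1)
    a, b = R[lo], R[hi]
    rank_a = sum(1 for x in S if x < a) + 1   # rank_S(a)
    rank_b = sum(1 for x in S if x <= b)      # rank_S(b)
    if not (rank_a <= k <= rank_b):
        return None  # the k-th smallest escaped [a, b]; retry
    P = sorted(x for x in S if a <= x <= b)
    return P[k - rank_a]
```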

The total running time is $$ O(pn\log n + n + c\sqrt{n/p}\log n). $$ In order for this to be $O(n)$, we need $$ \Omega(c^2 \log^2 n/n) \leq p \leq O(1/\log n). $$ For the algorithm to succeed with high probability $1 - o(1)$, we need $c = \omega(1)$. One setting which satisfies all these constraints is $p = n^{-1/4}$ and $c = n^{1/8}$, the choice taken in the slides.
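As a sanity check on this choice: with $p = n^{-1/4}$ and $c = n^{1/8}$ we have $\sqrt{n/p} = n^{5/8}$, so both non-linear terms collapse to the same bound, $$ pn\log n = n^{3/4}\log n = o(n), \qquad c\sqrt{n/p}\log n = n^{1/8}\, n^{5/8}\log n = n^{3/4}\log n = o(n), $$ while the failure probability is $e^{-O(c^2)} = e^{-O(n^{1/4})} = o(1)$.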


Question Source : http://cs.stackexchange.com/questions/27685
