Cheap and Secure Web Hosting Provider : See Now

# [Answers] What is the average-case complexity of trial division?

, ,
Problem Detail:

The trial division algorithm for checking if a number $N$ is prime works by trying to divide $N$ by all integers in the range 2, 3, ..., $\lfloor \sqrt{n} \rfloor$. If any of them cleanly divide $N$, then we say that $N$ is composite; otherwise we say that $N$ is prime.

In the worst case, this requires $O(\sqrt{N})$ divisions to be made. However, this only happens in the case where $N$ is prime. If $N$ is a composite number, then we might stop much earlier than this.

I'm curious about the average-case complexity of this algorithm, assuming that the input is a natural number chosen uniformly at random. Since there's no way to choose an natural number uniformly at random, I'm curious about the asymptotic growth rate of

$$\mathbb{E}[T(X_n)]$$

where $X_n$ is a random variable that takes on a value from $\{2, 3, 4, 5, ..., n\}$ uniformly at random and $T(N)$ denotes the number of divisions made by the trial division algorithm when run on the number $N$. Since the number of divisions made is equal to the smallest prime factor of the number in question, and since the number is chosen uniformly at random, this expression is equivalent to

$$\frac{\sum_{i=2}^n \xi(i)}{n-1}$$

where $\xi(i)$ denotes the smallest prime factor of $i$.

I have some thoughts about what this might look like, but I'm not sure how to formalize them. First, there's the observation that about half the integers in $\{2, 3, 4, 5, ..., n\}$ are divisible by two, about a third are divisible by three, etc., so for many numbers very few divisions are required. Second, there's the prime number theorem that gives a good asymptotic estimate for the density of the prime numbers. However, I'm having trouble seeing how to combine all this information together.

Does anyone know how to determine the asymptotic growth rate of this expression?

By the prime number theorem, about a $1/\log(n)$ fraction of numbers in the range $[n/2,n]$ are prime. We know that the algorithm will take $\Theta(\sqrt{n})$ time for each of them (since for all $x \in [n/2,n]$, we have $x = \Theta(\sqrt{n})$). Therefore,

$$\mathbb{E}[T(X_n)] = \Omega\left({\sqrt{n} \over \log n}\right).$$

This is enough to establish that trial division is inefficient. Also, since we have the upper bound

$$\mathbb{E}[T(X_n)] = O(\sqrt{n}),$$

we get a pair of upper and lower bounds that are almost matching.

It turns out one can show that the correct running time is

$$\mathbb{E}[T(X_n)] = \Theta\left({\sqrt{n} \over \log n}\right).$$

I'll show the argument below. It's a bit detailed, so brace yourself.

There's an error in your post. You claim that

$$\mathbb{E}[T(X_n)] = {\sum_{i=2}^n \xi(i) \over n-1},$$

but that is not correct. The running time on input $i$ is not $\xi(i)$; the running time is $\min(\xi(i),\sqrt{i})$, since trial division aborts early as soon as it has tried all divisors $\le \sqrt{i}$. Therefore, your expression for the running time is not correct.

Yuval Filmus found that your expression was analyzed on Math.SE, and apparently

$${\sum_{i=2}^n \xi(i) \over n-1} \sim {n \over 2 \log n},$$

but unfortunately this doesn't really tell us much about the running time of your trial division, due to the error I mentioned.

If we correct the error, we find

$$\mathbb{E}[T(X_n)] = {\sum_{t<n} \xi(t) \over n-1} + \Theta\left(\sqrt{n} \over \log n\right),$$

where $t$ ranges over all composites less than $n$. Now we can upper-bound the first term as follows:

\begin{align*} \sum_{t < n} \xi(t) &\le \sum_{p < \sqrt{n}} \left\lfloor {n \over p} \right\rfloor \times p\\ &\le \sum_{p < \sqrt{n}} n\\ &= n \times \#\{p : p < \sqrt{n}, p \text{ prime}\}\\ &= n \times O\left({\sqrt{n} \over \log n}\right). \end{align*}

where the sum is over all primes $p$ that are at most $\sqrt{n}$, and using the prime number theorem in the last step to note that there are $O(\sqrt{n}/\log(\sqrt{n})) = O(\sqrt{n}/\log(n))$ many such primes $p$.

It follows that

$$\mathbb{E}[T(X_n)] = O\left(\sqrt{n} \over \log n\right),$$

so the simple bound given at the start of the answer was tight.