**Problem Detail:**

The Von Neumann entropy $S$ of a density matrix $\rho$ is defined to be $S(\rho)= -\text{tr}(\rho \lg \rho)$. Equivalently, $S$ is the classical entropy of the eigenvalues $\lambda_k$ treated as probabilities. So $S(\rho) = -\sum_k \lambda_k \lg \lambda_k$.

Clearly the Von Neumann entropy can be computed by first extracting the eigenvalues and then doing the sum. But extracting eigenvalues is quite expensive. **Is there a faster way to compute the entropy?**

*(For example, computing the sum of the eigenvalues is much easier than computing the individual eigenvalues. You just return the trace of the matrix. The product of the eigenvalues, the determinant, is also easier to compute than the individual eigenvalues. Is there also a shortcut for the entropy?)*
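For concreteness, both of those shortcuts are easy to verify numerically (a quick sketch using a randomly generated density matrix):

```python
import numpy as np

# Build a random density matrix: Hermitian, positive semidefinite, trace 1.
rng = np.random.default_rng(0)
m = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = m @ m.conj().T
rho /= np.trace(rho).real

eigenvalues = np.linalg.eigvalsh(rho)
# Sum of eigenvalues == trace; product of eigenvalues == determinant.
assert np.isclose(np.trace(rho).real, eigenvalues.sum())
assert np.isclose(np.linalg.det(rho).real, eigenvalues.prod())
```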

###### Asked By : Craig Gidney

#### Answered By : Craig Gidney

The paper *Computing the Entropy of a Large Matrix* by Thomas P. Wihler, Bänz Bessire, and André Stefanov suggests approximating $x \lg x$ with a polynomial. Then you can use the traces of powers of the matrix to sum the results of applying that polynomial to each of the eigenvalues.

(The polynomial-via-powers-and-traces trick works because density matrices are Hermitian, so they have an orthonormal eigenbasis. Raising the matrix to a power raises each of its eigenvalues to that power, and the trace then gives you the sum of the eigenvalues at that power.)
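In other words, $\text{tr}(\rho^k) = \sum_j \lambda_j^k$ for every power $k$, which is the identity the whole approach rests on. A quick numerical check of it (the variable names are just for illustration):

```python
import numpy as np

# A random real density matrix, for simplicity.
rng = np.random.default_rng(1)
m = rng.normal(size=(4, 4))
rho = m @ m.T
rho /= np.trace(rho)

eigenvalues = np.linalg.eigvalsh(rho)
for k in range(1, 6):
    # Trace of the k-th matrix power equals the sum of k-th powers of eigenvalues.
    assert np.isclose(np.trace(np.linalg.matrix_power(rho, k)),
                      np.sum(eigenvalues ** k))
```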

Sounds fun. Not going to let the paper spoil it.

The derivatives of $f(x) = x \ln x$ are:

$$\begin{align} f^0(x) &= x \ln x\\ f^1(x) &= 1 + \ln x\\ f^2(x) &= x^{-1} \\ f^3(x) &= -x^{-2}\\ f^4(x) &= 2 x^{-3}\\ f^5(x) &= -6 x^{-4}\\ &\;\;\vdots\\ f^k(x) &= (-1)^k \cdot (k-2)! \cdot x^{1-k} \quad (k \geq 2) \end{align}$$

The derivatives don't always exist at $x=0$, but they do at $x=1$, so we can make a Taylor series from there:

$$\begin{align} x \ln x &= \sum_{k=0}^\infty \frac{(x-1)^k}{k!} f^k(1) \\ &= (1 \ln 1)\frac{(x-1)^0}{0!} + (1 + \ln 1)\frac{(x-1)^1}{1!} + \sum_{k=2}^\infty \frac{(x-1)^k}{k!} (-1)^k 1^{1-k}(k-2)! \\ &= x - 1 + \sum_{k=2}^\infty \frac{(1-x)^k}{k(k-1)} \end{align}$$
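As a sanity check on the algebra, the partial sums of that series do converge to $x \ln x$ on $(0, 1]$ (a quick sketch; the helper name is mine):

```python
import math

def taylor_x_ln_x(x, terms=200):
    # Partial sum of the series derived above:
    # x ln x = (x - 1) + sum_{k>=2} (1-x)^k / (k(k-1))
    total = x - 1
    for k in range(2, terms):
        total += (1 - x) ** k / (k * (k - 1))
    return total

for x in [0.1, 0.5, 0.9]:
    assert abs(taylor_x_ln_x(x) - x * math.log(x)) < 1e-6
```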

So now we have a nice series $\frac{(1-x)^1}{-1} + \frac{(1-x)^2}{2} + \frac{(1-x)^3}{6} + \frac{(1-x)^4}{12} + \frac{(1-x)^5}{20} + ...$ that we can approximate by cutting it off at some index $c$. Except that would converge like $O(c^{-1})$ when $x=0$, so we should probably try to account for the excess when the numerator is stuck at $1$:

$$\begin{align} x \ln x &= x - 1 + \sum_{k=2}^\infty \frac{(1-x)^k}{k(k-1)} \\ &= x - 1 + \sum_{k=2}^{c-1} \frac{(1-x)^k}{k(k-1)} + \sum_{k=c}^\infty \frac{(1-x)^k}{k(k-1)} \\ &\approx x - 1 + \sum_{k=2}^{c-1} \frac{(1-x)^k}{k(k-1)} + \sum_{k=c}^\infty \frac{(1-x)^c}{k(k-1)} \\ &= x - 1 + \left( \sum_{k=2}^{c-1} \frac{(1-x)^k}{k(k-1)} \right) + \frac{(1-x)^c}{c-1} \end{align}$$

Here's a graph of the difference between $x \ln x$ and the approximation when $c=10$:
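The same comparison can be done numerically with a scalar transcription of the truncated series (a sketch; the function name is mine):

```python
import math

import numpy as np

def approx_x_ln_x(x, c=10):
    # Truncated series with the (1-x)^c/(c-1) tail correction derived above.
    u = 1 - x
    total = x - 1 + sum(u ** k / (k * (k - 1)) for k in range(2, c))
    return total + u ** c / (c - 1)

xs = np.linspace(0, 1, 1001)
max_error = max(abs(approx_x_ln_x(x) - (x * math.log(x) if x > 0 else 0.0))
                for x in xs)
# The worst-case gap for c=10 is roughly 3e-2, near x = 0.05.
```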

And voilà, just write code that evaluates that polynomial over the eigenvalues by using traces of the matrix powers:

```python
import math

import numpy

def von_neumann_entropy(density_matrix, cutoff=10):
    x = numpy.asarray(density_matrix)
    one = numpy.identity(x.shape[0])
    base = one - x
    power = base @ base
    result = numpy.trace(base)
    for k in range(2, cutoff):
        result -= numpy.trace(power) / (k*k - k)
        power = power @ base
    result -= numpy.trace(power) / (cutoff - 1)
    return result / math.log(2)  # convert from nats to bits
```

The convergence is still kinda terrible near $x \approx 0$, though. I tried replacing the last correction with some doubling steps like this:

```python
import math

import numpy

def von_neumann_entropy(density_matrix, cutoff=10):
    x = numpy.asarray(density_matrix)
    one = numpy.identity(x.shape[0])
    base = one - x
    power = base @ base
    result = numpy.trace(base)
    for k in range(2, cutoff):
        result -= numpy.trace(power) / (k*k - k)
        power = power @ base

    # Twiddly hacky magic: approximate the series tail with a few
    # squaring steps instead of a single (1-x)^c/(c-1) correction.
    a = cutoff
    for _ in range(3):
        d = (a + 1) / (4 * a * (a - 1))
        result -= numpy.trace(power) * d
        power = power @ power
        result -= numpy.trace(power) * d
        a *= 2
    result -= numpy.trace(power) / (a - 1) * 0.75
    return result / math.log(2)  # convert from nats to bits
```

And that improved things a bit more, so that the maximum offset vs $-x \ln x$ across the [0, 1] range was $3.5 \cdot 10^{-3}$ instead of $2.9 \cdot 10^{-2}$ (for the given cutoff of 10).
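That claim is easy to sanity-check with a scalar transcription of the doubled-correction polynomial. (The matrix code subtracts its terms because it accumulates $-\sum_k f(\lambda_k)$; the sketch below builds $f(x) = x \ln x$ directly, so the signs flip. The function name is just for illustration.)

```python
import math

import numpy as np

def approx_x_ln_x_doubling(x, c=10):
    # Truncated series plus the doubling tail corrections, as scalars.
    u = 1 - x
    total = x - 1 + sum(u ** k / (k * (k - 1)) for k in range(2, c))
    p = u ** c
    a = c
    for _ in range(3):
        d = (a + 1) / (4 * a * (a - 1))
        total += p * d
        p = p * p
        total += p * d
        a *= 2
    return total + p / (a - 1) * 0.75

xs = np.linspace(0, 1, 1001)
max_error = max(abs(approx_x_ln_x_doubling(x) - (x * math.log(x) if x > 0 else 0.0))
                for x in xs)
# Noticeably smaller than the ~3e-2 worst case of the plain truncation.
```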


**Of course, the paper I linked at the start of the answer probably has a much better polynomial.**

For example, my error analysis should have been in terms of the *total* entropy instead of the individual value. And I should have used tricks like computing the trace of $AB$ from $A$ and $B$ in $O(n^2)$ time without doing the multiplication. And I should have thrown some simulated annealing at the coefficients.
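That last trick follows from $\text{tr}(AB) = \sum_{ij} A_{ij} B_{ji}$, so the trace of a product costs only an elementwise multiply (a sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=(50, 50))
b = rng.normal(size=(50, 50))

# O(n^2) elementwise product instead of an O(n^3) matrix multiplication.
fast = np.sum(a * b.T)
slow = np.trace(a @ b)
assert np.isclose(fast, slow)
```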

###### Best Answer from Computer Science Stack Exchange

Question Source : http://cs.stackexchange.com/questions/56261
