Cheap and Secure Web Hosting Provider : See Now

# [Solved]: solving large nonlinear systems in parallel

, ,
Problem Detail:

I am solving a large (~1e5 equations & unknowns) set of nonlinear equations using Newton-Raphson iterations. Currently I am using the GPU accelerated Krylov methods implemented in ViennaCL to solve the linear system to get the update increment. I am solving the system on a single 24 core, 64GB workstation with a NVIDIA Quadro K4000 GPU. However as the number of unknowns exceeds 1e5 and/or the Jacobian matrix gets more dense, there is not enough memory on the GPU to fit the compressed Jacobian matrix on the device. Using the CPU cores allows the Jacobian matrix to fit into memory, however the compute time is very long.

I would like to use a cluster of the above mentioned workstations to solve either the nonlinear system or the linear update increment system, however I am not sure how to go about decomposing the system of equations into pieces that can be tackled via a distributed memory approach. The equations include radiative transfer, and therefore it is not easy to decompose the domain geometrically as the Jacobian matrix is dense.

Does anyone know about distributed memory approaches to solving large dense linear equation systems?

All help is greatly appreciated!

PS I have looked into the parallelised nonlinear solvers implemented in PETSc, however this is a clunky c library and one must write there entire problem in terms of the PETSc interface which I would like to avoid if possible. I would be more interested in understanding the details as to how a distributed memory parallelised nonlinear or linear solver works...

#### Answered By : Nick Alger

The main point of Krylov-Newton is that it does not require computing and storing the whole Jacobian matrix. It only requires the ability to apply the matrix to a vector.

The stardard approach is to do a little bit of pencil and paper work to figure out how to express the action of the derivative without actually computing and storing the whole matrix. If this is not possible, one can try to figure out a compressed representation of the matrix.

For problems where the matrices are dense, but the long range interactions are low rank (I think many radiative transfer problems would fall in this category..?), fast multipole or H-matrix type methods tend to be very effective.

It's difficult to give further advice without knowing more specifics of the problem.