
7.3.2 BDD Method

The BDD method adds a coarse grid correction, and should therefore be more or less independent of the number of blocks and the block size. For certain model problems it is possible to show that the condition number grows like O((1 + log(H/h))^2) [Smith96]. Hence the iteration count is almost independent of both the grid spacing h and the block size H.
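For reference, the bound can be written out as follows; this is a sketch of the standard statement, where the constant C depends on the model problem and the symbol M_BDD for the BDD preconditioner is introduced here only for illustration:

\[
\kappa\!\left(M_{\mathrm{BDD}}^{-1} S\right) \;\le\; C \left(1 + \log\frac{H}{h}\right)^{2},
\]

with S the Schur complement system on the interfaces, h the grid spacing and H the block size.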

Independence of Number of Blocks in 1D This first test shows how a coarse grid correction makes a difference. It is Example A.5 without noise, decomposed in the horizontal direction only.


Example A.5 2D with 1D properties.

BC Homogeneous Dirichlet at east and west, homogeneous Neumann elsewhere.

Forcing Constant.

Discr. 64×16 cells, regular grid.

Block dec. k×1 blocks.

Results Table 7.10.

k×1     # it.   cpu time
2×1     1       1.07
4×1     1       1.27
8×1     1       0.86
16×1    1       0.64

Table 7.10: 2D case

Since the BCs have no variation in the second dimension, and the forcing over the entire domain is constant (no noise), the solution has no variation in the second dimension either; the problem behaves as a 1D problem. The results in Table 7.10 show that for this 2D case with 1D properties, convergence is achieved in one iteration for any number of blocks.

Checkerboard Decomposition The next test applies a checkerboard decomposition to Example A.1 in 2D.

Example A.1 in 2D.

BC Dirichlet at south, Neumann elsewhere.

Forcing Random.

Discr. 48×48 cells, regular grid.

Block dec. k×k blocks, checkerboard pattern.

Results Table 7.11.

The results in Table 7.11 verify the independence of block number and block size. Even the timings stay constant, or decrease, at least in the beginning. The timings are commented on in more detail later.


k×k     # it.   cpu time
2×2     5       7.4
3×3     15      15.0
4×4     15      13.3
6×6     17      12.8
8×8     17      14.2
12×12   16      20.5
16×16   15      36.6

Table 7.11: 2D case

Decomposing Only in One Direction This test is like the first test of the BDD method, except that noise is added to the right-hand side, giving 2D variation.

Example A.5 in 2D.

BC Dirichlet at east and west, homogeneous Neumann elsewhere.

Forcing Random.

Discr. 96×24 cells, regular grid.

Block dec. k×1 blocks.

Results Table 7.12.

k×1     # it.   cpu time
2×1     1       1.8
4×1     3       5.1
8×1     6       6.6
16×1    11      8.7
32×1    33      30
48×1    52      44

Table 7.12: 2D case

Table 7.12 shows that it is not sufficient to decompose in just one direction.


However, if we decompose evenly in both directions:

Example A.5 in 2D.

BC Dirichlet at east and west, homogeneous Neumann elsewhere.

Forcing Random.

Discr. 96×24 cells, regular grid.

Block dec. 4k×k blocks.

Results Table 7.13.

4k×k    # it.   cpu time
4×1     3       5.0
8×2     11      10.5
16×4    16      30
32×8    14      41
48×12   12      84

Table 7.13: 2D case

Table 7.13 then reveals that the iteration count is again independent of the block decomposition, as for the checkerboard test earlier.

Note especially the cpu timings in Table 7.13. They increase quite heavily as the number of blocks grows. The same is the case for the checkerboard decomposition results in Table 7.11.

Consider the last test in Table 7.13: the domain is discretized in 96×24 = 2304 cells. These are decomposed into 48×12 = 576 blocks, each having 2×2 = 4 cells. The decomposition produces in total 1092 block-to-block interfaces², each of which adds 2 shadow variables to the number of unknowns, in total 2184 extra unknowns. Hence there are about as many shadow variables as interior variables. The system has grown to double size, and the extra time needed to calculate the solution for these extra variables has become significant. Furthermore, Matlab must take care of many small matrices, and many loops grow long, so there is most likely some administrative overhead involved in handling that many blocks. The counting of interfaces and shadow variables is sketched below.

² 47·12 + 48·11 = 1092
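As a small illustration of this counting, the following Matlab sketch just reproduces the numbers quoted above for the 48×12 decomposition; the variable names are chosen for illustration and are not part of the thesis code.

nx = 96;  ny = 24;                      % cells in each direction
px = 48;  py = 12;                      % blocks in each direction
bx = nx/px;  by = ny/py;                % cells per block (2-by-2), for reference

n_cells      = nx*ny;                   % 2304 interior variables
n_interfaces = (px-1)*py + (py-1)*px;   % 47*12 + 48*11 = 1092 interfaces
n_shadow     = 2*n_interfaces;          % 2 shadow variables per interface: 2184

fprintf('%d cells, %d interfaces, %d shadow variables\n', ...
        n_cells, n_interfaces, n_shadow);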

Blocks With High Aspect Ratio Table 7.12 shows that it is not sufficient to decompose in just one direction, but what if only one block has a high aspect ratio?

Example A.5 in 3D.

BC Dirichlet at east and west, homogeneous Neumann elsewhere.

Forcing Random.

Discr. 128×5×3 cells, regular grid.

Block dec. 3×1×1 blocks; the width of the middle block is decreased, and its width compared to the total width is given as the aspect ratio.

Results Table 7.14.

Aspect ratio   # it.   cpu time
1:3            2       3.1
1:5            2       3.1
1:7            3       4.3
1:13           5       6.9
1:21           6       7.9
1:31           8       10.1
1:65           10      12.1
1:128          12      15.8

Table 7.14: 2D case

Table 7.14 shows an increase in the iteration count as the aspect ratio gets worse. However, the iteration count does not increase as heavily as in Table 7.12, where the aspect ratio also gets worse as more blocks are used.

Other Examples

The last two examples are provided to show that the method works for more general grids and decompositions than a rectangular grid using a 2×1 or 2×2 block decomposition.


The first example is the 2×2×2 block decomposition, which we have not yet found a way to solve using the DN method.

Example A.1 and A.2 in 3D.

BC Dirichlet at front, homogeneous Neumann elsewhere.

Forcing Random.

Discr. 10×10×10 cells, regular grid.

Block dec. 2×2×2 blocks.

Results Table 7.15.

Example   # it.   cpu time
A.1       13      9.7
A.2       15      9.0

Table 7.15: 2×2×2 block decomposition

Table 7.15 shows that the iteration counts are comparable with those of the other tests so far, on both a regular and a not so regular grid.

The last test is an example of how a grid is typically created close to a corner.

Example A.7 in 3D.

BC Dirichlet at east boundary of top right block, inhomogeneous Neumann on west boundary of top left, homogeneous Neumann elsewhere.

Forcing Random.

Discr. In the 3-block setup: top left block 6×18×4 cells, top right block 18×18×4 cells, bottom block 18×6×4 cells, irregular orthogonal grid.

Block dec. (k+1)×1 blocks.

Results Table 7.16.

The iteration counts in Table 7.16 again compare with those of the other tests.

Note, however, that the block decomposition is only refined in one direction; hence the iteration count grows with the number of blocks.


k   # it.   cpu time
3   17      67.4
5   18      42.6
7   22      35.5

Table 7.16: 2D case


Chapter 8

Summary

This project is in some sense far from over. It has been like a journey with a given destination, but without a map showing the paths.

During the process many paths have been visited. Most have been turned down, maybe because they ended blind, but usually we turned around before we knew where they led; at the turning point a path was evaluated as another dead end, or another long way round. Therefore a lot of paths are left relatively unexplored.

Also, the destination has not yet been reached: NS3 is not yet parallel.

The objective of this chapter is to summarize what has been accomplished so far. I will relate it to articles on the subject where I have found any. I will also outline subjects which need further investigation.

Subjects for further investigation split into two categories: the work to be done to implement a parallel Poisson solver in NS3, and work which from another point of view may be interesting. The former will be presented in a separate section, while the latter is addressed as part of the specific subject.


8.1 Matlab

There have been several problems with Matlab, mainly with the Matlab backslash operator used to solve a system Au = f. The problems arise when the system matrix is singular and the system is consistent. In this case, there are infinitely many solutions.

A matrix that is singular in exact arithmetic is, when represented in Matlab, usually only close to singular due to rounding errors.

In the process of calculating the solution to the singular system, the backslash operator might encounter a zero, which is used in a division. The division by zero produces an Inf value, Matlab's representation of infinity. The Inf value, when encountered in other expressions, will again produce either Inf or NaN, "not a number", and the result returned from the operator is usually a vector of NaN values.

The method has broken down. The breakdown of the method does not depend on whether the right-hand side is consistent or not.
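As a toy illustration of this breakdown (not an example taken from the thesis), consider a small, exactly singular but consistent system:

A = [1 1; 1 1];     % exactly singular matrix
f = [2; 2];         % consistent right-hand side: u = [1; 1] is one solution
u = A \ f           % Matlab warns that the matrix is singular and
                    % typically returns a vector of NaN values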

However, if a strict zero is not encountered, due to the rounding errors, the backslash operator has no problem finding one of the solutions to the system. This difference in behavior between a breakdown case and a near-breakdown case does not seem well justified.

In the case of a consistent system, a Krylov subspace method can be used, and it will find a solution. However, for the systems considered here, the backslash operator usually out-competes a Krylov subspace method in the time used to find a solution, and is therefore preferred.

One solution to this dilemma is simply to add noise of very small magnitude (close to machine precision) to the diagonal of the system matrix, just enough to make sure that the backslash operator never encounters a strict zero. This will usually only alter the solution at the same magnitude, which will usually be acceptable. Another solution is to examine the result from the backslash operator and, if it contains NaN values, try again with a Krylov subspace method. Neither is a very nice solution, both being brute force, but they work.
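A minimal Matlab sketch of the two workarounds is given below, assuming A is the (nearly) singular, consistent system matrix stored as a sparse matrix and f the right-hand side; the tolerance, the iteration cap and the choice of minres as the Krylov subspace method are illustrative assumptions, not the implementation used in the thesis.

n = size(A, 1);

% Workaround 1: perturb the diagonal with noise close to machine precision,
% so that the backslash operator never meets a strict zero.
Ap = A + spdiags(eps*rand(n, 1), 0, n, n);
u  = Ap \ f;

% Workaround 2: try backslash first; if the result contains NaN values,
% fall back to a Krylov subspace method.
u = A \ f;
if any(isnan(u))
    u = minres(A, f, 1e-10, 1000);
end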

As for a singular inconsistent system, there exists no solution to the system. The backslash operator produces a result of order 1/eps, eps being the machine precision of Matlab's double-precision arithmetic.
