SUMMARY
The repeated solution in time of the linear system arising from the finite element integration of coupled
consolidation equations is a major computational effort. This system can be written in either a symmetric
or an unsymmetric form, thus calling for the implementation of different preconditioners and Krylov
subspace solvers. The present paper investigates when a symmetric or an unsymmetric approach
is preferable. The results from a number of representative numerical experiments
indicate that a major role in selecting either form is played by the preconditioner rather than by the Krylov
subspace method itself. Two other important issues addressed are the size of the time integration step and
the possible lumping of the flow capacity matrix. It appears that ad hoc block constrained preconditioners
provide the most robust algorithm independently of the time step size, lumping, and symmetry. Copyright
© 2008 John Wiley & Sons, Ltd.
1. INTRODUCTION
In the finite element (FE) analysis of large-scale consolidation problems, the repeated solution
in time to the linear system of discretized equations is the most challenging and time-consuming
computational effort. This holds true especially in the initial stages of the consolidation process,
where very small time integration steps are generally used [1]. Therefore, the development of robust
and efficient solvers is of paramount importance for the practical design of FE geomechanical codes.
Correspondence to: Massimiliano Ferronato, Department of Mathematical Methods and Models for Scientific
Applications, University of Padova, Via Trieste 63, 35121 Padova, Italy.
E-mail: ferronat@dmsa.unipd.it

Three major classes of solvers for coupled consolidation problems can be readily identified:
direct, partitioned, and projective. Although direct solvers are still widely used in practice [2–4],
they prove inadequate for large-scale 3D problems, e.g. [5, 6], especially because of the great
memory requirement. Partitioned methods are based on uncoupling the overall problem into smaller
and simpler problems and iterating between the partial uncoupled solutions until convergence.
Recent schemes in this class are developed, for instance, in References [7–9]. The main advantage
of a partitioned algorithm stems from the reduced memory requirement, allowing for an easy
implementation also in a common PC environment. Such algorithms, however, are generally
outperformed by projective iterative solvers [10, 11], which are becoming increasingly popular in
large-scale 3D FE analyses [12–14]. The most promising methods in this class rely on Krylov
subspaces. Their attractive features include the limited requirement of core memory and the ease of
implementation in both a scalar and a parallel computational environment. Specific algorithms may
be developed according to whether the system is symmetric positive definite, symmetric indefinite,
or generally unsymmetric. For symmetric positive-definite problems, the preconditioned conjugate
gradient (PCG) is by far the most effective solver in several fields, e.g. [15–18]. If the system is
symmetric but indefinite, there are a number of options. The most classical and well-established
methods are based on the minimal residual (MR) iteration with the elegant Lanczos three-term
recurrence exploiting the symmetry of the coefficient matrix. This gives rise to the conjugate
residual and the symmetric LQ (SYMMLQ) algorithms [19]. Quite interestingly, PCG has also
been recommended for symmetric indefinite problems, provided that a suitable preconditioner is
used [20–22]. One of the most promising Krylov subspace techniques for symmetric indefinite
systems, however, is the symmetric quasi-minimal residual (SQMR) [23], which is an adaptation of
the popular quasi-minimal residual (QMR) method [24] to symmetric problems with the advantage,
as compared with other algorithms such as SYMMLQ, of allowing for the use of generally
indefinite preconditioners. For unsymmetric systems, the generalized minimum residual (GMRES)
[25], transpose-free quasi-minimal residual (TFQMR) [26] and bi-conjugate gradient stabilized
(Bi-CGSTAB) [27] are the most popular options. Among these, Bi-CGSTAB has proved to be a
robust and efficient algorithm for the solution to large-size 3D FE consolidation problems [10, 13].
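The solver classes listed above can be restated compactly in code. The following Python sketch simply maps the matrix class to the Krylov methods named in this paragraph; the function name `candidate_solvers` and its boolean arguments are illustrative assumptions, not part of any library.

```python
def candidate_solvers(symmetric, positive_definite):
    """Return the Krylov methods named in the text for a given matrix class."""
    if symmetric and positive_definite:
        return ["PCG"]
    if symmetric:
        # Symmetric indefinite: SQMR also admits indefinite preconditioners.
        return ["SQMR", "SYMMLQ", "conjugate residual"]
    # Generally unsymmetric systems, positive definite or not.
    return ["Bi-CGSTAB", "GMRES", "TFQMR"]
```

The ordering inside each list is only a convenience: the first entry is the method the text singles out for that class.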
The FE analysis of a coupled consolidation problem requires the solution of a sequence of linear
systems, which can have a different form: either symmetric indefinite or unsymmetric indefinite or
unsymmetric positive definite. According to the discrete formulation implemented, one may use
either SQMR or Bi-CGSTAB or MR, or even PCG under special conditions [22]. A somewhat
natural question is which approach is the most convenient one. The answer basically depends on
two issues: (1) the size of the time integration step and (2) the preconditioner implemented into
the selected Krylov subspace scheme. The time step size affects the conditioning of the native
coefficient matrix [1] as it impacts on the relative importance of the structural, flow, and coupling
sub-matrices. The choice of a suitable preconditioner is essential to get a fast convergence whatever
Krylov subspace method is used. The present paper focusses on the role that preconditioning plays
when an FE coupled consolidation model is solved with different Krylov subspace algorithms,
specifically SQMR for the symmetric indefinite form and Bi-CGSTAB for the unsymmetric indefinite
or positive-definite form. These methods are selected for the robustness and efficiency
they have shown in several applications, e.g. [13, 28]. Three preconditioning strategies
are addressed in order to cover and compare the most widespread alternatives presently
implemented in FE coupled consolidation models:
Copyright © 2008 John Wiley & Sons, Ltd. Int. J. Numer. Anal. Meth. Geomech. 2009; 33:405–423
DOI: 10.1002/nag
THE ROLE OF PRECONDITIONING IN THE SOLUTION 407
2. Incomplete triangular factorization. ILU-type and IC-type preconditioners have been developed
and successfully implemented in various fields, e.g. [6, 31–33]. In coupled consolidation
problems, they have proved robust and efficient in combination with a proper scaling technique
[34]. A major disadvantage is the relatively large storage requirement.
3. Block preconditioning. This technique exploits the block structure of the coefficient matrix
and is a recent development in the area of preconditioning. The various combinations of
different approximations to the structural and flow sub-matrices allow for the construction of
sophisticated preconditioners, e.g. [21, 22, 35], which may yield a very fast convergence.
The above preconditioners are all tested on a large-scale 3D heterogeneous consolidation
problem solved with both SQMR and Bi-CGSTAB. It is shown that their performance is
related to the time step size and to the possible lumping of the flow capacity matrix. The numerical
results help clarify the role of preconditioning relative to the selected Krylov subspace solver and
suggest the most convenient algorithm to be used in the solution of different practical problems.
Soil consolidation is a well-known process where the soil skeleton interacts with the pore water
through Terzaghi's effective stress principle. The mathematical description of consolidation in a
3D setting goes back to the theory developed by Biot [36], which couples the stress equilibrium
equations for a porous medium:
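$$(\nabla \boldsymbol{\sigma})^T - \alpha \nabla p = \mathbf{b} \qquad (1)$$
with a fluid continuity equation:
$$\nabla \left( \frac{\mathbf{k}}{\gamma} \nabla p \right) - \phi \beta \frac{\partial p}{\partial t} - \alpha \frac{\partial\, \mathrm{tr}\, \boldsymbol{\varepsilon}}{\partial t} = q \qquad (2)$$
where $\boldsymbol{\sigma}$ and $\boldsymbol{\varepsilon}$ are the effective stress and strain tensors, respectively, $p$ is the pore fluid pressure,
$\mathbf{b}$ is the applied load vector, $\alpha$ is the Biot coefficient, $\mathbf{k}$ is the hydraulic conductivity tensor, $\phi$ is
the porosity, $\gamma$ and $\beta$ are the pore fluid specific weight and compressibility, respectively, $q$ is the
prescribed discharge, and $\nabla$ is the gradient operator.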
Equations (1) and (2) are solved numerically using FE in space and finite differences in time.
Using the Galerkin method of weighted residuals, the global coupled system can be written as
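$$K \mathbf{u} - Q \mathbf{p} = \mathbf{f}_u$$
$$H \mathbf{p} + Q^T \dot{\mathbf{u}} + P \dot{\mathbf{p}} = \mathbf{f}_p \qquad (3)$$
where
$$K = \sum_e \int_{V_e} B_u^T D_{ep} B_u \, dV_e, \qquad Q = \sum_e \int_{V_e} \alpha B_u^T \mathbf{i} N_p \, dV_e$$
$$H = \sum_e \int_{V_e} (\nabla N_p)^T \frac{\mathbf{k}}{\gamma} (\nabla N_p) \, dV_e, \qquad P = \sum_e \int_{V_e} \phi \beta N_p^T N_p \, dV_e$$
$$\mathbf{f}_u = \sum_e \int_{V_e} N_u^T \mathbf{b} \, dV_e, \qquad \mathbf{f}_p = \sum_e \int_{V_e} N_p^T q \, dV_e$$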
408 M. FERRONATO, G. PINI AND G. GAMBOLATI
Though the above three formulations are mathematically equivalent, numerically they are not, and
different solvers and preconditioners are to be used.
As is well known, the Krylov subspace $\mathcal{K}_\ell$ of size $\ell$ associated with matrix $A$ is defined as
$$\mathcal{K}_\ell = \mathrm{span}\{v_1, A v_1, A^2 v_1, \ldots, A^{\ell-1} v_1\}$$
with $v_1 \in \mathbb{R}^N$ an arbitrary vector such that $\|v_1\|_2 = 1$. Generally speaking, a projection iterative
solver seeks an approximate solution $x_\ell$ to system (7) in the affine Krylov subspace $x_0 + \mathcal{K}_\ell$ by
prescribing the residual vector $r_\ell$ to be orthogonal to another Krylov subspace $\mathcal{L}_\ell$ through a
Petrov–Galerkin condition. The vector $v_1$ is usually set equal to the normalized residual $r_0 / \|r_0\|_2$, i.e.
$$v_1 = \frac{f - A x_0}{\|f - A x_0\|_2} \qquad (14)$$
with $x_0$ the initial guess solution. Different algorithms can be developed according to the different
selections of $\mathcal{L}_\ell$. Efficient methods for generally unsymmetric matrices [38] are obtained with $\mathcal{L}_\ell$
equal to
$$\mathcal{L}_\ell = \mathrm{span}\{w_1, A^T w_1, (A^T)^2 w_1, \ldots, (A^T)^{\ell-1} w_1\}$$
with $w_1 \in \mathbb{R}^N$ another arbitrary vector such that $\|w_1\|_2 = 1$ (generally $w_1 = v_1$). Using the Lanczos
biorthogonalization algorithm [39], two sequences of vectors $v_i$ and $w_i$, $i = 1, \ldots, \ell$, spanning $\mathcal{K}_\ell$
and $\mathcal{L}_\ell$, respectively, are created such that
$$A V_\ell = V_{\ell+1} \bar{T}_\ell \qquad (15)$$
$$A^T W_\ell = W_{\ell+1} \tilde{T}_\ell \qquad (16)$$
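As an illustration of how two such sequences can be generated, the following Python sketch implements a textbook two-sided Lanczos recurrence on a small dense matrix. It is an assumption-laden sketch: the helper names are hypothetical, the vectors are scaled so that $w_i^T v_j = \delta_{ij}$ (a common textbook normalization that may differ from the unit-norm scaling used in the paper), and no look-ahead is included to recover from breakdowns.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def bilanczos(A, v1, w1, steps):
    """Two-sided Lanczos; returns V, W with dot(V[i], W[j]) = delta_ij.

    Requires dot(v1, w1) == 1. Stops early on breakdown.
    """
    At = transpose(A)
    n = len(v1)
    V, W = [list(v1)], [list(w1)]
    v_prev, w_prev = [0.0] * n, [0.0] * n
    beta = delta = 0.0
    for _ in range(steps):
        v, w = V[-1], W[-1]
        Av, Atw = matvec(A, v), matvec(At, w)
        alpha = dot(Av, w)
        # Three-term recurrences for the right and left sequences.
        v_hat = [Av[i] - alpha * v[i] - beta * v_prev[i] for i in range(n)]
        w_hat = [Atw[i] - alpha * w[i] - delta * w_prev[i] for i in range(n)]
        s = dot(v_hat, w_hat)
        if s == 0.0:
            break  # breakdown or invariant subspace reached
        delta = math.sqrt(abs(s))
        beta = s / delta
        v_prev, w_prev = v, w
        V.append([x / delta for x in v_hat])
        W.append([x / beta for x in w_hat])
    return V, W

# Small unsymmetric example matrix (illustrative values only).
A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
V, W = bilanczos(A, [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], steps=2)
```

Only the two previous vectors of each sequence are needed at every step, which is the source of the low memory footprint of Lanczos-based solvers such as QMR and Bi-CGSTAB.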
4. PRECONDITIONING STRATEGIES
SQMR and Bi-CGSTAB may be robust and efficient Krylov subspace methods for symmetric
indefinite and generally unsymmetric matrices, respectively. However, their actual performance,
as that of any other Krylov subspace method, is strongly related to the preconditioner used.
with $D_r$ and $D_c$ diagonal matrices. The main advantage stems from the ease of implementation
and application with a very small memory requirement. However, if $D^{-1} = (D_r D_c)^{-1}$ is a poor
approximation of the inverse of $A$, as is often the case, a correspondingly poor acceleration of
any Krylov subspace method with $A$ can be expected.
The matrices $A_i$, $i = 1, 2, 3$, of Equations (8), (10), and (12), respectively, can be factorized as
$$A_i = L_i J_i U = \begin{bmatrix} I & 0 \\ X_i & I \end{bmatrix} \begin{bmatrix} K & 0 \\ 0 & S_i \end{bmatrix} \begin{bmatrix} I & K^{-1} B^T \\ 0 & I \end{bmatrix} \qquad (22)$$
with
$$X_1 = B K^{-1}, \qquad S_1 = -C - B K^{-1} B^T$$
$$X_2 = -B K^{-1} / \Delta t, \qquad S_2 = (C + B K^{-1} B^T) / \Delta t$$
$$X_3 = -B K^{-1}, \qquad S_3 = C + B K^{-1} B^T$$
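The block factorization can be checked numerically on a tiny example. The Python sketch below assumes the symmetric form $A_1 = [K, B^T; B, -C]$ (Equation (8) is not reproduced in this section, so this block layout is an assumption) and verifies that the product $L_1 J_1 U$ with $X_1 = B K^{-1}$ and $S_1 = -C - B K^{-1} B^T$ returns $A_1$. All matrices and helper names are illustrative.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(c) for c in zip(*X)]

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def block(TL, TR, BL, BR):
    """Assemble [[TL, TR], [BL, BR]] into one dense matrix."""
    top = [r1 + r2 for r1, r2 in zip(TL, TR)]
    bottom = [r1 + r2 for r1, r2 in zip(BL, BR)]
    return top + bottom

K = [[4.0, 1.0], [1.0, 3.0]]   # structural block (symmetric positive definite)
B = [[1.0, 2.0]]               # coupling block
C = [[0.5]]                    # flow block
Bt = transpose(B)
Kinv = inv2(K)

X1 = matmul(B, Kinv)                       # X_1 = B K^{-1}
BKB = matmul(X1, Bt)                       # B K^{-1} B^T
S1 = [[-C[0][0] - BKB[0][0]]]              # S_1 = -C - B K^{-1} B^T

I2, I1 = [[1.0, 0.0], [0.0, 1.0]], [[1.0]]
Z21, Z12 = [[0.0, 0.0]], [[0.0], [0.0]]
L = block(I2, Z12, X1, I1)
J = block(K, Z12, Z21, S1)
U = block(I2, matmul(Kinv, Bt), Z21, I1)

A1 = block(K, Bt, B, [[-C[0][0]]])         # assumed symmetric indefinite form
LJU = matmul(matmul(L, J), U)
```

In a real consolidation model $K^{-1}$ is never formed explicitly; the explicit inverse is used here only because the example is 2 x 2.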
A diagonal substitute for $A_i^{-1}$ can be constructed using the diagonal entries of $J_i$ of Equation (22).
As the Schur complement $S_i$ is too expensive to be computed explicitly, however, a further
approximation is introduced using $\mathrm{diag}(K)^{-1}$ in place of $K^{-1}$ in the computation of the product
$B K^{-1} B^T$, thus obtaining a new approximate $\tilde{S}_i$ instead of $S_i$. The diagonal preconditioner $D_i^{-1}$
for $A_i$ therefore takes on the following form:
$$D_i^{-1} = \begin{bmatrix} \mathrm{diag}(K)^{-1} & 0 \\ 0 & \mathrm{diag}(\alpha \tilde{S}_i)^{-1} \end{bmatrix} \qquad (23)$$
where $\alpha$ is a user-specified parameter. The preconditioner (23) is denoted as generalized Jacobi
(GJ) [29]. The GJ preconditioning has been tested in several applications [11, 41] and
has proven effective, provided that an appropriate $\alpha$ value is selected.
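A minimal sketch of how such a diagonal preconditioner could be assembled is given below. It assumes the approximate Schur complement $\tilde{S} = C + B\,\mathrm{diag}(K)^{-1} B^T$, so that only its diagonal entries $c_{ii} + \sum_j b_{ij}^2 / k_{jj}$ are needed; any sign change required by the symmetric form is assumed to be absorbed into the scaling parameter. The function name and the numerical values, including `alpha`, are arbitrary illustrations.

```python
def gj_inverse_diagonal(K_diag, B, C_diag, alpha):
    """Diagonal entries of a GJ-type inverse, displacement block first.

    K_diag : diagonal of K
    B      : coupling block stored as a list of rows
    C_diag : diagonal of C
    alpha  : user-specified scaling of the Schur-complement block
    """
    inv_K = [1.0 / k for k in K_diag]
    # diag(S~) with S~ = C + B diag(K)^{-1} B^T
    S_diag = [c + sum(b * b * ik for b, ik in zip(row, inv_K))
              for c, row in zip(C_diag, B)]
    return inv_K + [1.0 / (alpha * s) for s in S_diag]

d = gj_inverse_diagonal(K_diag=[4.0, 3.0], B=[[1.0, 2.0]],
                        C_diag=[0.5], alpha=5.0)
```

Applying the preconditioner then amounts to an entry-wise product of `d` with the residual vector, which explains the negligible memory and CPU cost of GJ.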
$$R^{-1} A C^{-1} C x = R^{-1} f \qquad (24)$$
The sum in Equation (26) is taken over all the $A$ coefficients $a_{ij} \neq 0$. In practice, $R$ and $C$ minimize
the sum of the squares of the logarithms of the scaled matrix coefficients so that the entries of
$R^{-1} A C^{-1}$ should be as near to unity as possible.
The LSL scaling in (24) destroys the symmetry of $A$ because generally $r_i \neq c_i$. Therefore, when
using the symmetric formulation $A_1$, Equation (24) is modified as follows:
with no concerns for the square root because all entries in R and C are positive (see Equation (25)).
The above preconditioning strategy has proved robust and effective in both theoretical test cases
[34] and real large-scale engineering applications [45]. It will be denoted as LSL-ILLT (symmetric
formulation) or LSL-ILUT (unsymmetric formulations) in the sequel.
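The quantity minimized by the least-squares logarithmic scaling can be illustrated with a short Python sketch. It only evaluates the objective described above (the sum of the squared logarithms of the scaled non-zero entries) for given diagonal scaling factors; it does not solve the minimization problem, and the function name and test matrix are hypothetical.

```python
import math

def log_scaling_objective(A, r, c):
    """Sum of squared logs of |a_ij / (r_i c_j)| over the non-zero a_ij."""
    total = 0.0
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            if a != 0.0:
                total += math.log(abs(a) / (r[i] * c[j])) ** 2
    return total

# Badly scaled diagonal matrix (illustrative values only).
A = [[100.0, 0.0], [0.0, 0.01]]
unscaled = log_scaling_objective(A, [1.0, 1.0], [1.0, 1.0])
scaled = log_scaling_objective(A, [10.0, 0.1], [10.0, 0.1])
```

With the second pair of scaling vectors every non-zero entry of the scaled matrix is close to one, so the objective drops to essentially zero.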
MCP is an advanced and robust preconditioner that may prove computationally very efficient.
However, its implementation is not as straightforward as a diagonal or an ILU preconditioner is.
In particular, the MCP construction requires the selection of four user-specified parameters:
1. the fill-in degree $\rho_K$ of $L_K$, i.e. the number of non-zeroes in each $L_K$ row stored in excess
to the non-zeroes of the corresponding row of $K$;
2. the tolerance $\tau_A$ for the AINV algorithm, i.e. the fraction of the diagonal term in a row below
which an extra-diagonal coefficient of $Z$ is dropped from the same row;
3. the fill-in degree $\rho_S$ of $L_S$, i.e. the number of non-zeroes in each $L_S$ row stored in excess to
the non-zeroes of the corresponding row of $S$;
4. the tolerance $\tau_S$ for the $S$ computation, i.e. the fraction of the diagonal term of a row below
which an extra-diagonal coefficient of the same row is dropped.
Using the tolerance $\tau_S$ is not strictly required. However, it is recommended in practice as $\tau_S$ can
allow for a significant memory saving in the storage of $S$, which is typically much less sparse
than $C$.
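The dropping rule behind the two tolerances can be sketched as follows. The Python snippet zeroes the extra-diagonal entries of one row that are smaller than a given fraction of the diagonal term; the function name, the dictionary keys, and the sample row are hypothetical, while the parameter values follow our reading of the MCP settings listed in the results tables.

```python
def drop_small(row, diag_index, tol):
    """Zero the off-diagonal entries smaller than tol * |diagonal| in one row."""
    threshold = tol * abs(row[diag_index])
    return [a if j == diag_index or abs(a) >= threshold else 0.0
            for j, a in enumerate(row)]

# Hypothetical container for the four user-specified MCP parameters.
mcp_parameters = {"rho_K": 10, "tau_A": 1e-1, "rho_S": 0, "tau_S": 1e-4}

row = [2.0, 1e-5, -0.3, 5e-4]          # diagonal entry stored first
kept = drop_small(row, diag_index=0, tol=mcp_parameters["tau_S"])
```

Here the threshold is 2.0e-4, so only the entry 1e-5 is discarded; in a sparse implementation the dropped entries would simply not be stored.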
5. NUMERICAL RESULTS
The computational performance of the previous solution algorithms is investigated with a realistic
large-scale consolidation problem dealing with the compaction of a confined aquifer due to
groundwater withdrawal. A vertical cross section of the cylindrical porous volume used as a test
problem is shown in Figure 1. The medium consists of clay incorporating a 1000 m deep and
200 m thick sandy aquifer, with the hydraulic conductivity $k_{sand} = 10^{-5}$ m/s and $k_{clay} = 10^{-8}$ m/s,
porosity $\phi = 0.20$, Poisson's ratio $\nu = 0.25$, and Young's modulus $E = 83.33$ MPa, corresponding to
a uniaxial vertical compressibility $c_M = 10^{-2}$ MPa$^{-1}$. The relatively large $c_M$ combined with the
low clay permeability gives rise to a numerically challenging problem because of ill-conditioning
for a wide range of $\Delta t$ values [1]. Standard Dirichlet conditions are prescribed, with fixed outer and
bottom boundaries, and zero pore pressure variation on the top and outer surfaces (see Figure 1).
The upper boundary is a traction-free plane. This sample problem is solved using a fully 3D
tetrahedral grid totaling 31 775 nodes and $N = 127\,100$ unknowns ($n = 95\,325$, $m = 31\,775$). The
size, the number of non-zeroes, and the spectral norm of each block are given in Table I. Note
the very large difference between the spectral norm of the structural ($K$ and $Q$) and flow ($H$
and $P$) sub-matrices accounting for the ill-conditioning typically encountered in consolidation
problems.
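The material data and problem size quoted above can be cross-checked with a few lines of Python. The sketch uses the standard isotropic-elasticity relation $c_M = (1 + \nu)(1 - 2\nu) / [(1 - \nu) E]$ and assumes three displacement unknowns and one pressure unknown per node, which is consistent with the figures given in the text; the function name is illustrative.

```python
def uniaxial_compressibility(E, nu):
    """c_M = (1 + nu)(1 - 2 nu) / ((1 - nu) E) for an isotropic elastic medium."""
    return (1.0 + nu) * (1.0 - 2.0 * nu) / ((1.0 - nu) * E)

c_M = uniaxial_compressibility(E=83.33, nu=0.25)   # in MPa^-1

nodes = 31775
n = 3 * nodes      # displacement unknowns
m = nodes          # pore pressure unknowns
N = n + m          # global system size
```

The result is $c_M \approx 1.0 \times 10^{-2}$ MPa$^{-1}$ and $N = 127\,100$, in agreement with the values stated for the test problem.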
The test problem considered above is solved for different values of the time step $\Delta t$. As is known
from theory [1], the smaller the $\Delta t$, the more severe the ill-conditioning of $A$. The iterations are
completed when the following exit test is satisfied:
$$\frac{\|x^* - x\|_2}{\|x^*\|_2} \le 10^{-8}$$
where $x^* = [1, 1, \ldots, 1]^T$ is the exact test solution and the right-hand side $f$ is computed as $A x^*$.
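The exit test relies on a manufactured solution, which can be sketched in a few lines of Python. The helper names are hypothetical and the 2 x 2 matrix is only a placeholder for the consolidation matrix; the tolerance matches the value quoted above.

```python
import math

def manufactured_rhs(A):
    """f = A x* with x* = [1, ..., 1]^T, i.e. the row sums of A."""
    return [sum(row) for row in A]

def relative_error(x_exact, x):
    """Relative error in the Euclidean norm."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_exact, x)))
    den = math.sqrt(sum(a * a for a in x_exact))
    return num / den

def converged(x_exact, x, tol=1e-8):
    """Exit test based on the true relative error."""
    return relative_error(x_exact, x) <= tol

A = [[4.0, 1.0], [1.0, 3.0]]
f = manufactured_rhs(A)          # right-hand side consistent with x* = [1, 1]
```

Because the exact solution is known, the test measures the true error rather than the residual, which makes the comparison between differently preconditioned solvers fair.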
GJ, LSL-ILLT/ILUT, and MCP with SQMR and Bi-CGSTAB are compared in terms of both
memory requirement and CPU time to convergence. To provide a measure of the memory occupation
of the preconditioner relative to the coefficient matrix, the following parameters are defined
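[Figure 1. Vertical cross section of the test problem: a 200 m thick sandy aquifer at a depth of 1000 m embedded in clay, with $p = 0$ on the top and outer boundaries; the dimensions labelled in the drawing are 3000 m and 4500 m.]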
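($K$, $B$, and $C$ are the blocks in Equations (8), (10), and (12) with $N$ the global size of $A$):
$$\mu_{GJ} = \frac{N}{\mathrm{nnz}(K) + \mathrm{nnz}(B) + \mathrm{nnz}(C)} \qquad (34)$$
$$\mu_{ILLT} = \frac{2N + \mathrm{nnz}(L)}{\mathrm{nnz}(K) + \mathrm{nnz}(B) + \mathrm{nnz}(C)} \qquad (35)$$
$$\mu_{ILUT} = \frac{2N + \mathrm{nnz}(L) + \mathrm{nnz}(U)}{\mathrm{nnz}(K) + \mathrm{nnz}(B) + \mathrm{nnz}(C)} \qquad (36)$$
$$\mu_{MCP} = \frac{\mathrm{nnz}(L_K) + \mathrm{nnz}(L_S) + \mathrm{nnz}(Z)}{\mathrm{nnz}(K) + \mathrm{nnz}(B) + \mathrm{nnz}(C)} \qquad (37)$$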
where the function nnz() returns the number of non-zeroes stored for a matrix, i.e. the upper
triangular part only, including the main diagonal, for a symmetric matrix, all the non-zeroes for
an unsymmetric or a rectangular matrix, and $L$ and $U$ are the incomplete triangular factors of
ILUT. The parameter $\mu$ is frequently referred to in the specialized literature as the density of
the preconditioner, i.e. the ratio of the number of non-zeroes stored for the preconditioner to the
number of non-zeroes stored for the coefficient matrix. Note that in the denominators of (34)–(37)
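Table II. Memory occupation, preconditioner density, user-defined parameters, number of iterations, and
CPU time to convergence for the symmetric indefinite ($A_1$), unsymmetric indefinite ($A_2$), and unsymmetric
positive-definite ($A_3$) matrices preconditioned with GJ and the optimal $\alpha$ value.

| | $\Delta t = 10^{-1}$ s, SQMR with $A_1$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_3$ | $\Delta t = 10^{6}$ s, SQMR with $A_1$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_3$ |
|---|---|---|---|---|---|---|
| Mbyte | 36.0 | 36.0 | 36.0 | 36.0 | 36.0 | 36.0 |
| $\mu_{GJ}$ | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 |
| $\alpha$ | 5 | - | - | 5 | - | - |
| Number of iterations | 5729 | Failure | Failure | 3347 | Failure | Failure |
| $T_p$ (s) | 0.5 | Failure | Failure | 0.5 | Failure | Failure |
| $T_s$ (s) | 1141.3 | Failure | Failure | 708.5 | Failure | Failure |
| $T_t$ (s) | 1141.8 | Failure | Failure | 709.0 | Failure | Failure |

Note: $T_p$ and $T_s$ are the CPU time to compute the preconditioner and to iterate to convergence, respectively,
with $T_t = T_p + T_s$.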
nnz(B) is taken into account just once because it is not necessary to store the non-zeroes of B T
and that the term 2N appearing in the numerators of (35) and (36) is due to the storage of R
and C of Equation (25). All the numerical experiments have been performed on a scalar computer
equipped with an AMD Athlon(tm) MP 1800+ processor at 1.53 GHz, 2 Gbyte of core memory
and 256 kbyte of secondary cache.
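The density definitions (34)–(37) translate directly into code. The Python sketch below uses hypothetical function names and made-up non-zero counts, since the actual block sizes of Table I are not reproduced in this section.

```python
def density(stored_entries, nnz_K, nnz_B, nnz_C):
    """Ratio of preconditioner storage to coefficient-matrix storage."""
    return stored_entries / (nnz_K + nnz_B + nnz_C)

def mu_gj(N, nnz_K, nnz_B, nnz_C):
    return density(N, nnz_K, nnz_B, nnz_C)

def mu_illt(N, nnz_L, nnz_K, nnz_B, nnz_C):
    # 2N accounts for the two diagonal scaling matrices R and C.
    return density(2 * N + nnz_L, nnz_K, nnz_B, nnz_C)

def mu_ilut(N, nnz_L, nnz_U, nnz_K, nnz_B, nnz_C):
    return density(2 * N + nnz_L + nnz_U, nnz_K, nnz_B, nnz_C)

def mu_mcp(nnz_LK, nnz_LS, nnz_Z, nnz_K, nnz_B, nnz_C):
    return density(nnz_LK + nnz_LS + nnz_Z, nnz_K, nnz_B, nnz_C)

# Made-up counts for a toy problem with N = 100 unknowns.
blocks = dict(nnz_K=600, nnz_B=300, nnz_C=100)
example = {
    "GJ": mu_gj(100, **blocks),
    "ILLT": mu_illt(100, 1800, **blocks),
    "ILUT": mu_ilut(100, 1400, 1400, **blocks),
    "MCP": mu_mcp(900, 400, 700, **blocks),
}
```

Counting nnz(B) only once in the denominator reflects the fact that the transposed block need not be stored.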
Table II shows the performance of SQMR and Bi-CGSTAB preconditioned with GJ for a
small ($10^{-1}$ s) and a large ($10^{6}$ s) $\Delta t$ value. The optimal choice for the user-specified parameter
$\alpha$ (Equation (23)), estimated empirically, is also provided. As expected, GJ is a very cheap
preconditioner requiring almost zero additional memory ($\mu_{GJ} \simeq 0$, see Table II) and CPU time for its
computation. It may allow for a Krylov subspace method to converge; however, in our test problem
this occurs with the symmetric formulation and SQMR only. Bi-CGSTAB does not converge even
for large time step values, where ill-conditioning of the coefficient matrix is less severe. Therefore,
should a diagonal preconditioning, such as GJ, be used, the symmetric approach (8)–(9) in
conjunction with SQMR appears to be the most convenient algorithm.
Using LSL-ILLT or LSL-ILUT as a preconditioner, however, a different conclusion can be
reached, as shown in Table III. The symmetric formulation with SQMR is the most efficient one
for small $\Delta t$ values, while Bi-CGSTAB is superior for large $\Delta t$. The positive-definite property of
the global unsymmetric matrix (compare the results of $A_2$ and $A_3$) appears to play a very marginal
role. Note that LSL-ILUT or LSL-ILLT is much more expensive than GJ as to both the memory
requirement (up to 4.6 times the memory needed for $A$ in the unsymmetric approach) and the
time for the preconditioner computation. The overall performance, however, is markedly superior
to GJ; therefore, if enough computer resources are available, it is decidedly to be preferred.
Table IV shows the Krylov solver performance with MCP for the three variants (31)–(33).
In this case, the symmetric algorithm appears to be slightly cheaper for any time step size,
with the differences between SQMR and Bi-CGSTAB not as pronounced as with the previous
preconditioners. The need for setting four user-specified parameters, instead of one or two as is the
case with GJ and LSL-ILUT, respectively, can make the empirical determination of the optimal
preconditioner somewhat laborious. However, MCP proves to be very robust to the variation of the
parameter set; therefore, it should not be particularly difficult to find in practice an appropriate
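Table III. The same as Table II using LSL-ILLT or LSL-ILUT and the optimal pair ($\rho$, $\tau$).

| | $\Delta t = 10^{-1}$ s, SQMR with $A_1$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_3$ | $\Delta t = 10^{6}$ s, SQMR with $A_1$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_3$ |
|---|---|---|---|---|---|---|
| Mbyte | 108.3 | 173.4 | 173.4 | 93.9 | 134.4 | 134.4 |
| $\mu_{ILLT/ILUT}$ | 2.44 | 4.61 | 4.61 | 1.96 | 3.31 | 3.31 |
| ($\rho$, $\tau$) | (45, $10^{-2}$) | (45, $10^{-2}$) | (45, $10^{-2}$) | (30, $10^{-1}$) | (30, $10^{-1}$) | (30, $10^{-1}$) |
| Number of iterations | 291 | 204 | 248 | 336 | 179 | 158 |
| $T_p$ (s) | 34.3 | 245.4 | 244.9 | 21.7 | 52.0 | 52.3 |
| $T_s$ (s) | 288.7 | 217.4 | 263.1 | 275.2 | 157.0 | 138.4 |
| $T_t$ (s) | 323.0 | 462.8 | 508.0 | 296.9 | 209.0 | 190.7 |

Table IV. [Caption not recovered; results with MCP, same layout as Table II.]

| | $\Delta t = 10^{-1}$ s, SQMR with $A_1$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_3$ | $\Delta t = 10^{6}$ s, SQMR with $A_1$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_3$ |
|---|---|---|---|---|---|---|
| Mbyte | 93.9 | 93.9 | 93.9 | 93.3 | 93.3 | 93.3 |
| $\mu_{MCP}$ | 1.96 | 1.96 | 1.96 | 1.94 | 1.94 | 1.94 |
| ($\rho_K$, $\tau_A$) | (10, $10^{-1}$) | (10, $10^{-1}$) | (10, $10^{-1}$) | (10, $10^{-1}$) | (10, $10^{-1}$) | (10, $10^{-1}$) |
| ($\rho_S$, $\tau_S$) | (0, $10^{-4}$) | (0, $10^{-4}$) | (0, $10^{-4}$) | (0, $10^{-4}$) | (0, $10^{-4}$) | (0, $10^{-4}$) |
| Number of iterations | 143 | 90 | 94 | 139 | 89 | 89 |
| $T_p$ (s) | 33.4 | 33.7 | 33.3 | 31.4 | 31.3 | 31.3 |
| $T_s$ (s) | 113.1 | 142.2 | 146.0 | 109.8 | 138.2 | 139.4 |
| $T_t$ (s) | 146.5 | 175.9 | 179.3 | 141.2 | 169.5 | 170.5 |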
combination. The memory to store MCP is less than twice that required for storing $A$, thus being
placed in an intermediate situation between diagonal and ILU-type preconditioners. This is mainly
because of the possibility of setting different fill-in degrees for $L_K$ and $L_S$. It must also be recalled
that the CPU time $T_p$ to compute the preconditioner does not include the time spent to compute
$Z$ and the product $B Z Z^T B^T$. In fact, such an effort can be actually performed just once at the
beginning of a transient simulation with its cost quickly made up for in very few time steps [22]. In
this test problem, the CPU time to compute $Z$ and $B Z Z^T B^T$ was 17.7 s. On the whole, however,
MCP significantly outperforms the other preconditioners.
A more complete comparison between LSL-ILUT/ILLT and MCP vs the time step size is
shown in Figure 2. The Bi-CGSTAB profiles with LSL-ILUT show that the critical time step
$\Delta t_{crit}$ as defined by Ferronato et al. [1] is approximately equal to $10^{4}$ s. For $\Delta t < \Delta t_{crit}$, SQMR
with LSL-ILLT is always superior to Bi-CGSTAB, while the opposite holds true for $\Delta t > \Delta t_{crit}$. By
contrast, note the much more stable behavior exhibited by MCP independently of symmetry,
with SQMR slightly less expensive.
Figure 2. CPU time $T_t$ (s) vs the time step size $\Delta t$ (s) for SQMR and Bi-CGSTAB with
different preconditioners and matrix formulations.
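Table V. [Caption not recovered; results with GJ and a lumped $P$, same layout as Table II.]

| | $\Delta t = 10^{-1}$ s, SQMR with $A_1$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_3$ | $\Delta t = 10^{6}$ s, SQMR with $A_1$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_3$ |
|---|---|---|---|---|---|---|
| Mbyte | 36.0 | 36.0 | 36.0 | 36.0 | 36.0 | 36.0 |
| $\mu_{GJ}$ | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 |
| $\alpha$ | 5 | - | - | 5 | - | - |
| Number of iterations | 3881 | Failure | Failure | 3347 | Failure | Failure |
| $T_p$ (s) | 0.5 | Failure | Failure | 0.5 | Failure | Failure |
| $T_s$ (s) | 808.7 | Failure | Failure | 701.1 | Failure | Failure |
| $T_t$ (s) | 809.2 | Failure | Failure | 701.6 | Failure | Failure |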
performance and the outcome given above. In particular, we expect to find the most significant
differences with a small $\Delta t$ as the relative importance of $P$ vs $H$ grows in the $C$ computation (see
Equation (8), (10), or (12)).
Tables V–VII show results similar to those provided in Tables II–IV, respectively, for a lumped $P$.
Again SQMR with a diagonal preconditioning converges, while Bi-CGSTAB fails. Also note the
improved GJ performance relative to the outcome of Table II. A more complete comparison is
displayed in Figure 3 for different time step sizes. Similarly to Figure 2, the no-lumping profile
provides evidence of $\Delta t_{crit} \simeq 10^{4}$ s. For $\Delta t < \Delta t_{crit}$, GJ with the lumped $P$ is about 30% faster than
GJ without lumping, while, as expected, the two formulations tend to behave the same way as $\Delta t$
progressively increases.
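The gain quoted for the lumped $P$ can be checked against the total CPU times reported for SQMR with GJ. The short Python sketch below copies the $T_t$ values from the two GJ tables (with and without lumping) and computes the relative time saving; the function and variable names are illustrative.

```python
def time_saving(t_reference, t_new):
    """Fraction of CPU time saved relative to the reference run."""
    return (t_reference - t_new) / t_reference

# T_t (s) for SQMR with A_1 and GJ: no lumping vs lumped P.
small_dt = time_saving(1141.8, 809.2)   # smallest time step
large_dt = time_saving(709.0, 701.6)    # largest time step
```

The saving is about 29% at the smallest time step and about 1% at the largest one, consistent with the statement that the two formulations behave alike as the time step grows.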
Table VI shows that ILU-based preconditioners can be very sensitive to lumping, with the number
of iterations and the total CPU time significantly larger than those of Table III for the smallest
$\Delta t$. Quite surprisingly, the best results from Bi-CGSTAB with ILUT are obtained without the
preliminary LSL scaling. Also note for $\Delta t = 10^{-1}$ s that ILUT is even worse than GJ (see Table V).
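Table VI. [Caption not recovered; results with LSL-ILLT or LSL-ILUT and a lumped $P$, same layout as Table III.]

| | $\Delta t = 10^{-1}$ s, SQMR with $A_1$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_3$ | $\Delta t = 10^{6}$ s, SQMR with $A_1$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_3$ |
|---|---|---|---|---|---|---|
| Mbyte | 106.2 | 157.2 | 157.2 | 93.9 | 134.4 | 134.4 |
| $\mu_{ILLT/ILUT}$ | 2.37 | 4.07 | 4.07 | 1.96 | 3.31 | 3.31 |
| ($\rho$, $\tau$) | (45, $10^{-2}$) | (45, $10^{-2}$) | (45, $10^{-2}$) | (30, $10^{-1}$) | (30, $10^{-1}$) | (30, $10^{-1}$) |
| Number of iterations | 1251 | 819 | 866 | 312 | 159 | 161 |
| $T_p$ (s) | 29.9 | 33.5 | 33.6 | 22.8 | 54.1 | 51.3 |
| $T_s$ (s) | 793.6 | 809.8 | 858.5 | 260.2 | 139.6 | 141.2 |
| $T_t$ (s) | 823.5 | 843.3 | 892.1 | 283.0 | 193.7 | 192.5 |

Table VII. [Caption not recovered; results with MCP and a lumped $P$, same layout as Table IV.]

| | $\Delta t = 10^{-1}$ s, SQMR with $A_1$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{-1}$ s, Bi-CGSTAB with $A_3$ | $\Delta t = 10^{6}$ s, SQMR with $A_1$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_2$ | $\Delta t = 10^{6}$ s, Bi-CGSTAB with $A_3$ |
|---|---|---|---|---|---|---|
| Mbyte | 93.9 | 93.9 | 93.9 | 93.3 | 93.3 | 93.3 |
| $\mu_{MCP}$ | 1.96 | 1.96 | 1.96 | 1.94 | 1.94 | 1.94 |
| ($\rho_K$, $\tau_A$) | (10, $10^{-1}$) | (10, $10^{-1}$) | (10, $10^{-1}$) | (10, $10^{-1}$) | (10, $10^{-1}$) | (10, $10^{-1}$) |
| ($\rho_S$, $\tau_S$) | (0, $10^{-4}$) | (0, $10^{-4}$) | (0, $10^{-4}$) | (0, $10^{-4}$) | (0, $10^{-4}$) | (0, $10^{-4}$) |
| Number of iterations | 141 | 89 | 99 | 139 | 88 | 88 |
| $T_p$ (s) | 32.9 | 33.0 | 33.3 | 31.4 | 31.5 | 31.5 |
| $T_s$ (s) | 115.2 | 140.6 | 159.4 | 110.0 | 137.2 | 136.5 |
| $T_t$ (s) | 148.1 | 173.6 | 192.7 | 141.4 | 168.7 | 168.0 |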
Figure 3. CPU time $T_t$ (s) vs the time step size $\Delta t$ (s) for SQMR with GJ and the possible lumping of $P$.
[Figure 4. Plot of the CPU time vs the time step size $\Delta t$ (s); caption not recovered.]
For $\Delta t$ between $10^{0}$ and $10^{4}$ s, i.e. just below the critical time step, SQMR preconditioned with
LSL-ILLT fails to converge or fulfils the termination criterion after more than 15 000 iterations,
thus revealing an unexpected instability. This is not actually related to symmetry or SQMR itself,
as is proved by the satisfactory convergence with GJ, but rather to a poor ILLT.
Finally, MCP behaves practically the same as without lumping, with an overall speed-up relative
to ILLT and ILUT growing up to about 5.5 in the most favorable example. With the lumped $P$
matrix, the symmetric formulation again slightly outperforms the unsymmetric ones. The overall
solution CPU time vs $\Delta t$ with LSL-ILUT and MCP is shown in Figure 4, which appears to be
quite similar to Figure 2. The SQMR profile with LSL-ILLT is missing because the algorithm fails
to converge for $10^{0} \le \Delta t \le 10^{4}$ s.
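The quoted speed-up can be recovered from the total CPU times reported with a lumped $P$ at the smallest time step. The Python sketch below copies the $T_t$ values of the ILU-type and MCP tables for that case and takes their ratios; the dictionary keys are illustrative labels.

```python
# T_t (s) at the smallest time step with a lumped P.
ilu_total = {"SQMR/A1": 823.5, "Bi-CGSTAB/A2": 843.3, "Bi-CGSTAB/A3": 892.1}
mcp_total = {"SQMR/A1": 148.1, "Bi-CGSTAB/A2": 173.6, "Bi-CGSTAB/A3": 192.7}

# Speed-up of MCP over the ILU-type preconditioner, solver by solver.
speedup = {key: ilu_total[key] / mcp_total[key] for key in ilu_total}
best = max(speedup.values())
```

The largest ratio, about 5.56, is obtained for SQMR with the symmetric form, which matches the "about 5.5" figure in the text.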
6. CONCLUSIONS
problem solved here, SQMR appears to be superior for small time steps ($\Delta t < \Delta t_{crit}$), whereas
Bi-CGSTAB with an unsymmetric matrix proves better for large time steps ($\Delta t > \Delta t_{crit}$);
• block constrained preconditioners are usually significantly better than ILU-based preconditioners
independently of symmetry and are less demanding in terms of computer resources.
This is especially true for small $\Delta t$ values where ill-conditioning may occur. Their implementation
and use, however, are not straightforward, as MCP requires four user-specified parameters
to be preliminarily set in a more or less optimal way. The MCP implementation addressed in
the present paper proves to be quite robust to the parameter selection;
• while with ILUT/ILLT the performance of the symmetric formulation also depends on the
time step size, with block preconditioners SQMR is generally slightly superior to Bi-CGSTAB
independently of $\Delta t$. Hence, with MCP the symmetric form is usually to be preferred to the
unsymmetric form;
• the positive-definite property of the unsymmetric coefficient matrix is not an important factor.
The previous conclusions can change slightly if a lumped flow capacity matrix is used, as is
often done to avoid numerical instabilities for the transient pore pressure solution. In particular, it
has been observed that:
• as far as CPU time is concerned, the diagonal GJ preconditioning with SQMR is about 30%
faster than with no lumping;
• by contrast, the use of a lumped flow capacity matrix may significantly slow down the
convergence with ILU-type preconditioners at small time steps, with GJ becoming possibly
more competitive. The major difference occurs with SQMR, which may even fail to converge;
• MCP appears to be insensitive to lumping, thus proving the most robust and efficient
preconditioner in any situation.
In summary, the above results show that neither symmetry nor the specific Krylov algorithm
appears to be a decisive factor for the most efficient numerical solution to FE coupled consolidation
equations. Rather, preconditioning is actually the most important issue, with MCP superior to both
GJ and LSL-ILUT/ILLT.
REFERENCES
1. Ferronato M, Gambolati G, Teatini P. Ill-conditioning of finite element poroelasticity equations. International
Journal of Solids and Structures 2001; 38:5995–6014.
2. Duff IS, Erisman AM, Reid JK. Direct Methods for Sparse Matrices. Clarendon Press: Oxford, 1986.
3. Dongarra JJ, Duff IS, Sorensen DC, van der Vorst HA. Numerical Linear Algebra for High-performance
Computers. SIAM: Philadelphia, PA, 1998.
4. HSL. Harwell Subroutine Library Archive. AEA Technology, Engineering Software, CCLRC, 2004. Available
from: http://www.cse.clrc.ac.uk/nag/hsl.
5. Pini G, Gambolati G, Ferronato M. A comparison of solution methods for finite element Biot consolidation
equations. In Calibration and Reliability in Groundwater Modelling: A Few Steps Closer to Reality, Kovar K,
Hrkal Z (eds). IAHS: Prague, Czech Republic, 2003; 45–51. IAHS Publication no. 277.
6. Janna C, Comerlati A, Gambolati G. A comparison of projection and direct solvers for finite elements in
elastostatics. Advances in Engineering Software, submitted.
7. Prevost JH. Partitioned solution procedure for simultaneous integration of coupled-field problems. Communications
in Numerical Methods in Engineering 1997; 13:239–247.
8. Golub GH, Wu X, Yuan JY. SOR-like methods for augmented systems. BIT Numerical Mathematics 2001;
41:71–85.
9. Cao ZH. Fast Uzawa algorithm for generalized saddle point problems. Applied Numerical Mathematics 2003;
46:157–171.
10. Gambolati G, Pini G, Ferronato M. Direct, partitioned and projected solution to finite element consolidation
models. International Journal for Numerical and Analytical Methods in Geomechanics 2002; 26:13711383.
11. Chen X, Phoon KK, Toh KC. Partitioned versus global Krylov subspace iterative methods for FE solution of
3-D Biots problem. Computer Methods in Applied Mechanics and Engineering 2007; 196:27372750.
12. Mroueh H, Shahrour I. Use of sparse iterative methods for the resolution of three-dimensional soil/structure
interaction problems. International Journal for Numerical and Analytical Methods in Geomechanics 1999;
23:1961–1975.
13. Gambolati G, Pini G, Ferronato M. Numerical performance of projection methods in finite element consolidation
models. International Journal for Numerical and Analytical Methods in Geomechanics 2001; 25:1429–1447.
14. Lee FH, Phoon KK, Lim KC, Chan SH. Performance of Jacobi preconditioning in Krylov subspace solution of
finite element equations. International Journal for Numerical and Analytical Methods in Geomechanics 2002;
26:341–372.
15. Dupont S, Marchal JM. Preconditioned conjugate gradients for solving the transient Boussinesq equations in
three-dimensional geometries. International Journal for Numerical Methods in Fluids 1988; 8:283–303.
16. Dickinson JK, Forsyth PA. Preconditioned conjugate gradient methods for three-dimensional linear elasticity.
International Journal for Numerical Methods in Engineering 1994; 37:2211–2234.
17. Carey GF, Shen Y, McLay RT. Parallel conjugate gradient performance for least-squares finite elements and
transport problems. International Journal for Numerical Methods in Fluids 1998; 28:1421–1440.
18. Kilic SA, Saied F, Sameh A. Efficient iterative solvers for structural dynamics problems. Computers and Structures
2004; 82:2363–2375.
19. Paige CC, Saunders MA. Solution of sparse indefinite systems of linear equations. SIAM Journal on Numerical
Analysis 1975; 12:617–629.
20. Rozložník M, Simoncini V. Krylov subspace methods for saddle point problems with indefinite preconditioning.
SIAM Journal on Matrix Analysis and Applications 2002; 24:368–391.
21. Toh KC, Phoon KK, Chan SH. Block preconditioners for symmetric indefinite linear systems. International
Journal for Numerical Methods in Engineering 2004; 60:1361–1381.
22. Bergamaschi L, Ferronato M, Gambolati G. Novel preconditioners for the iterative solution to FE-discretized
coupled consolidation equations. Computer Methods in Applied Mechanics and Engineering 2007; 196:2647–2656.
23. Freund RW, Nachtigal NM. A new Krylov-subspace method for symmetric indefinite linear systems. Proceedings
of the 14th IMACS World Congress on Computational and Applied Mathematics, Atlanta, GA, 1994; 1253–1256.
24. Freund RW, Nachtigal NM. QMR: a quasi-minimal residual method for non-Hermitian linear systems. Numerische
Mathematik 1991; 60:315–339.
25. Saad Y, Schultz MH. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems.
SIAM Journal on Scientific and Statistical Computing 1986; 7:856–869.
26. Freund RW. A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems. SIAM Journal
on Scientific Computing 1993; 14:470–482.
27. van der Vorst HA. Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric
linear systems. SIAM Journal on Scientific and Statistical Computing 1992; 13:631–644.
28. Chan SH, Phoon KK, Lee FH. A modified Jacobi preconditioner for solving ill-conditioned Biot's consolidation
equations using symmetric quasi-minimal residual method. International Journal for Numerical and Analytical
Methods in Geomechanics 2001; 25:1001–1025.
29. Phoon KK, Toh KC, Chan SH, Lee FH. An efficient diagonal preconditioner for finite element solution of Biot's
consolidation equations. International Journal for Numerical Methods in Engineering 2002; 55:377–400.
30. Chen X, Toh KC, Phoon KK. A modified SSOR preconditioner for sparse symmetric indefinite linear systems
of equations. International Journal for Numerical Methods in Engineering 2006; 65:785–807.
31. Saad Y. ILUT: a dual threshold incomplete ILU factorization. Numerical Linear Algebra with Applications 1994;
1:387–402.
32. Saint-George P, Warzee G, Notay Y, Beauwens R. Problem-dependent preconditioners for iterative solvers in FE
elastostatics. Computers and Structures 1999; 73:33–43.
33. Li N, Saad Y, Chow E. Crout version of ILU for general sparse matrices. SIAM Journal on Scientific Computing
2003; 25:716–728.
34. Gambolati G, Pini G, Ferronato M. Scaling improves stability of preconditioned CG-like solvers for FE
consolidation equations. International Journal for Numerical and Analytical Methods in Geomechanics 2003;
27:1043–1056.
35. Bergamaschi L, Ferronato M, Gambolati G. Mixed constraint preconditioners for the iterative solution of FE
coupled consolidation equations. Journal of Computational Physics, submitted.
36. Biot MA. General theory of three-dimensional consolidation. Journal of Applied Physics 1941; 12:155–164.
37. Booker JR, Small JC. An investigation of the stability of numerical solutions of Biot's equations of consolidation.
International Journal of Solids and Structures 1975; 11:907–917.
38. Saad Y. Iterative Methods for Sparse Linear Systems. SIAM: Philadelphia, PA, 2003.
39. Lanczos C. Solution of systems of linear equations by minimized iterations. Journal of Research of the National
Bureau of Standards 1952; 49:33–53.
40. Sonneveld P. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. SIAM Journal on Scientific and
Statistical Computing 1989; 10:36–52.
41. Phoon KK, Toh KC, Chen X. Block constrained versus generalized Jacobi preconditioners for iterative solution
of large scale Biot's FEM equations. Computers and Structures 2004; 82:2401–2411.
42. Saad Y. SPARSKIT: a basic tool kit for sparse matrix computations. Technical Report RIACS-90-20, Research
Institute for Advanced Computer Science, NASA Ames Research Center, Moffett Field, CA, 1990.
43. Benzi M. Preconditioning techniques for large linear systems: a survey. Journal of Computational Physics 2002;
182:418–477.
44. Curtis AR, Reid JK. On the automatic scaling of matrices for Gaussian elimination. Journal of the Institute of
Mathematics and its Applications 1972; 10:118–124.
45. Comerlati A, Ferronato M, Gambolati G, Putti M, Teatini P. A coupled model of anthropogenic Venice uplift.
In Poromechanics III, Abousleiman Y et al. (eds). A.A. Balkema: Rotterdam, 2005; 317–321.
46. Benzi M, Tuma M. A sparse approximate inverse preconditioner for nonsymmetric linear systems. SIAM Journal
on Scientific Computing 1998; 19:968–994.
47. Benzi M, Cullum J, Tuma M. Robust approximate inverse preconditioning for the conjugate gradient method.
SIAM Journal on Scientific Computing 2000; 22:1318–1332.