BLOCK KRYLOV SUBSPACE METHODS FOR FUNCTIONS OF MATRICES


1. Introduction. Many applications in scientific computing require the efficient computation of $f(A)B$, where $f$ is a scalar function defined on an $n \times n$ matrix $A$ and $B$ is a block vector in $\mathbb{C}^{n \times s}$. Regarding $B$ as a collection of columns $b_i$,

$$B := [b_1 \,|\, \cdots \,|\, b_s],$$

one might consider applying methods for a single vector, such as those described in [12, 27] or the newly proposed restarted Arnoldi methods [24, 25, 26], to each problem $f(A)b_i$. It is well known for linear systems, however, that block Krylov approaches treating all columns $b_i$ at once can be computationally advantageous; see, e.g., [3, 33, 46, 50, 52, 59, 60, 61, 63]. It is therefore reasonable to consider block Krylov methods for computing $f(A)B$.

Others have considered block Krylov methods for $f(A)B$ before. Lopez and Simoncini developed a block Krylov method for $\exp(A)B$ in such a way that the so-called geometric properties of $B$ are preserved, but they did not undertake a convergence analysis [44]. Benner, Kürschner, and Saak applied Krylov and integral-based methods to compute $\log(A)B$ but did not develop a block-based theory in any detail [6]. Al-Mohy and Higham proposed a direct method for computing $\exp(A)B$ based on the scaling and squaring method and a truncated Taylor series approximation [2]. This method is indeed dominated by matrix-block-vector multiplications $AB$, which can make it superior to Krylov methods. However, the two Krylov methods the authors of [2] compare it with do not feature blocks, unlike the ones we propose here.

Our development draws from two main sources: Krylov methods for matrix functions and the block full orthogonalization method (BFOM) for linear systems, which reduces to block conjugate gradients (BCG) when the matrix considered is Hermitian positive definite (HPD).

1.1. Krylov methods for $f(A)b$. There is an expansive body of literature devoted to methods for computing $f(A)b$ for $f$ more general than the exponential or logarithm and $b$ a single vector. We focus on methods that approximate $f(A)b$ from a Krylov subspace, such as those in [1, 7, 23, 24, 25, 26, 39, 54]. These methods have been referred to as Arnoldi or Lanczos methods for matrix functions, depending on whether the matrix $A$ is Hermitian. For the sake of the framework developed herein, we propose a new nomenclature. We refer to all such methods as full orthogonalization methods (FOM) for functions of matrices, or (FOM)² for short, due to the inherent connections between these methods and FOM theory, which we recapitulate in Section 3. In particular, we generalize [24, 25], which use quadrature to compute stable restarted approximations to $f(A)b$, where $f$ is a Stieltjes function.

1.2. Block Krylov methods. Block Krylov subspace methods for computing $A^{-1}B$ comprise an even larger body of literature, inadvertently causing some ambiguity as to what one means by a "block" Krylov method. We propose the following three categories to describe the variety of methods found in the literature.

Classical block methods. These are methods related to Dianne O'Leary's 1980 seminal paper on block conjugate gradients [50]. BCG has been explored thoroughly in the literature [3, 10, 17, 48, 49, 53] along with other block methods such as BGMRES [33, 34, 46, 58, 59, 60, 61]. A survey of classical block Krylov methods is given in [56].

Global block methods. Global GMRES and global FOM were first introduced in 1999 in [40] for matrix equations. Additional global methods can be found in [5, 9, 22, 36, 52, 67].

Loop-interchange block methods. These methods are not as historically established as the classical and global methods. Rather, they have been proposed as an implementation alternative to the non-block approach in which the systems are solved sequentially, one column after another. While it is likely that such methods have been considered before, they are first formally named and developed in [52].

We denote the FOM variant of each of the above classes as ClBFOM, GlBFOM, and LiBFOM, respectively. A detailed description clarifying what precisely differentiates the three classes is developed later in this paper.

1.3. Advantages of blocks. There is no difference between the matrix-block product $AV$ and the concatenation of matrix-vector products $[Av_1 | \cdots | Av_s]$; indeed, $AV = [Av_1 | \cdots | Av_s]$. However, from the standpoint of computational efficiency, formulating an algorithm in terms of matrix-block products leads to an improved process for a number of reasons described in, e.g., [4, 16, 18, 38]. For one, matrix-block products require that $A$ be accessed only once, versus $s$ times if each column $Av_i$ is computed independently. Reducing the number of times $A$ must be accessed is advantageous when computer architecture (i.e., machine-specific properties of the CPU, GPU, or other hardware accelerators) renders memory access costly. The advantage increases when $A$ is not explicitly stored or must be generated anew each time it is accessed. In a parallel environment, block methods present additional advantages since they reduce communication among processors.

Building a Krylov subspace from block vectors instead of column vectors also "enriches" the theoretical Krylov subspace from which approximations are drawn. Classical block methods, for example, use information from all columns of $B$ to generate approximations to each column of $X$, leading to potentially more accurate approximations per iteration and consequently faster convergence. Methods such as multi-preconditioned GMRES [30], enlarged Krylov subspace conjugate gradients [31], and augmented Krylov subspace methods [62, Section 9] apply this principle even more generally to speed up convergence. We do not, however, explore such techniques further in the present work.

1.4. Notation. Let $\otimes$ denote the Kronecker product between two matrices. With $I_d$ denoting the identity matrix of size $d \times d$, we define the standard unit vector $e_k^d \in \mathbb{C}^d$ as the $k$th column of $I_d$. The standard unit block vector $E_k \in \mathbb{C}^{ms \times s}$ is defined as $E_k := e_k^m \otimes I_s$. We do not distinguish notationally between the matrix $A \in \mathbb{C}^{n \times n}$ as an operator on $\mathbb{C}^n$, i.e., $v \mapsto Av$, and $A$ as an operator on $\mathbb{C}^{n \times s}$, i.e., $V = [v_1 | \cdots | v_s] \mapsto AV = [Av_1 | \cdots | Av_s]$.
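As a small illustration of this notation, the following NumPy sketch (the function name is ours, not the paper's) builds $E_k$ via the Kronecker product and confirms that the action of $A$ on a block vector is just its column-wise action:

```python
import numpy as np

def unit_block_vector(m, k, s):
    """Standard unit block vector E_k = e_k^m (Kronecker) I_s in C^{ms x s}."""
    e_k = np.zeros((m, 1))
    e_k[k - 1] = 1.0                    # kth column of I_m
    return np.kron(e_k, np.eye(s))      # shape (m*s, s)

A = np.arange(9.0).reshape(3, 3)
V = np.ones((3, 2))
# AV equals the concatenation [Av_1 | Av_2] of matrix-vector products:
assert np.allclose(A @ V, np.column_stack([A @ V[:, 0], A @ V[:, 1]]))
```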

1.5. Outline. In Section 2, we introduce the concept of a matrix-valued inner product $\langle\langle \cdot,\cdot \rangle\rangle_S$ for block vectors with values in a *-algebra $S \subseteq \mathbb{C}^{s \times s}$. The *-algebra $S$ together with $\langle\langle \cdot,\cdot \rangle\rangle_S$ are fundamental in the sense that they yield different kinds of block Krylov subspaces and notions of block orthogonality. We develop a general block Arnoldi process based on $\langle\langle \cdot,\cdot \rangle\rangle_S$ encompassing the three classes of block approaches mentioned in Section 1.2. In Section 3, we show how each choice of $S$ together with $\langle\langle \cdot,\cdot \rangle\rangle_S$ determines a specific BFOM for linear equations, and we derive a "true" scalar-valued inner product from $\langle\langle \cdot,\cdot \rangle\rangle_S$, which allows for a convergence analysis of BFOM for matrices self-adjoint and positive definite with respect to $\langle\langle \cdot,\cdot \rangle\rangle_S$. Section 4 then details our main contributions. We expound the BFOM for functions of matrices (B(FOM)²), show how such methods can be stably restarted, and provide a convergence proof for an important class of functions and matrices. In Section 5 we illustrate the performance of the classical, global, and loop-interchange versions of restarted B(FOM)² in comparison to the non-block approach with some numerical experiments.

2. Block Krylov subspaces. The use of matrix-valued functions in the analysis of block Krylov methods is not new; see, e.g., the right bilinear form of [21], the × product of [9], or the block inner product of [28]. Neither is the notion of a matrix-valued inner product novel; see, e.g., the literature on matrix-valued orthogonal polynomials [11, 32].

Let $S$ be a *-subalgebra of $\mathbb{C}^{s \times s}$ with identity, i.e., $S$ is a vector subspace containing $I_s$ that is closed under matrix multiplication and conjugate transposition. An immediate consequence of these properties is that any scalar-valued polynomial of an element of $S$ is also an element of $S$. In particular, $S$ contains the inverses $C^{-1}$ of all its nonsingular elements $C \in S$, since $C^{-1}$ can be expressed as a scalar polynomial in $C$. The following generalizes the notion of an inner product to a bilinear operator mapping $\mathbb{C}^{n \times s} \times \mathbb{C}^{n \times s}$ to $S$ and serves as the foundation for our framework.

DEFINITION 2.1. A mapping $\langle\langle \cdot,\cdot \rangle\rangle_S$ from $\mathbb{C}^{n \times s} \times \mathbb{C}^{n \times s}$ to $S$ is called a block inner product onto $S$ if it satisfies the following conditions for all $X, Y, Z \in \mathbb{C}^{n \times s}$ and $C \in S$:

(i) $S$-linearity: $\langle\langle X, YC \rangle\rangle_S = \langle\langle X, Y \rangle\rangle_S\, C$ and $\langle\langle X + Y, Z \rangle\rangle_S = \langle\langle X, Z \rangle\rangle_S + \langle\langle Y, Z \rangle\rangle_S$,

(ii) symmetry: $\langle\langle X, Y \rangle\rangle_S = \langle\langle Y, X \rangle\rangle_S^*$,

(iii) definiteness: $\langle\langle X, X \rangle\rangle_S$ is positive definite if $X$ has full rank, and $\langle\langle X, X \rangle\rangle_S = 0_s$ if and only if $X = 0$.

Note that by condition (ii) the matrix $\langle\langle X, X \rangle\rangle_S$ is always Hermitian, and conditions (i) and (ii) together imply that $\langle\langle X + YC, Z \rangle\rangle_S = \langle\langle X, Z \rangle\rangle_S + C^* \langle\langle Y, Z \rangle\rangle_S$. Also, since $\langle\langle \cdot,\cdot \rangle\rangle_S$ is continuous, (iii) implies that $\langle\langle X, X \rangle\rangle_S$ is positive semidefinite when $X$ is rank-deficient. The following definition is related to a generalization of the notion of norm.

DEFINITION 2.2. A mapping $N$ which maps each $X \in \mathbb{C}^{n \times s}$ with full rank to a matrix $N(X) \in S$ is called a scaling quotient if for all such $X$ there exists $Y \in \mathbb{C}^{n \times s}$ such that $X = Y N(X)$ and $\langle\langle Y, Y \rangle\rangle_S = I_s$.

The scaling quotient $N(X)$ is nonsingular since for $X$ of full rank the matrix

$$\langle\langle X, X \rangle\rangle_S = N(X)^* \langle\langle Y, Y \rangle\rangle_S N(X) = N(X)^* N(X)$$

is positive definite. For arbitrary $S$, the Hermitian matrix $\langle\langle X, X \rangle\rangle_S^{1/2}$ lies in $S$ since the matrix square root can be represented as a scalar polynomial acting on $\langle\langle X, X \rangle\rangle_S$. The operator $X \mapsto \langle\langle X, X \rangle\rangle_S^{1/2}$ is thus a scaling quotient, with $Y = X \langle\langle X, X \rangle\rangle_S^{-1/2}$. Provided it lies in $S$, the Cholesky factor of $\langle\langle X, X \rangle\rangle_S$ is a scaling quotient as well.

We call a pair of block vectors $X$ and $Y$ $\langle\langle \cdot,\cdot \rangle\rangle_S$-orthogonal if $\langle\langle X, Y \rangle\rangle_S = 0_s$, and we call a block vector $X$ $\langle\langle \cdot,\cdot \rangle\rangle_S$-normalized if $\langle\langle X, X \rangle\rangle_S = I_s$. Combining the two notions, a set of block vectors $\{X_1, \ldots, X_m\}$ is called $\langle\langle \cdot,\cdot \rangle\rangle_S$-orthonormal when $\langle\langle X_i, X_j \rangle\rangle_S = \delta_{ij} I_s$, where $\delta_{ij}$ is the Kronecker delta. Clearly, if $X$ has full rank, then $X N(X)^{-1}$ is $\langle\langle \cdot,\cdot \rangle\rangle_S$-normalized.

The following three combinations of $S$, $\langle\langle \cdot,\cdot \rangle\rangle_S$, and $N$, whose block inner products were first deduced in [22], satisfy the above definitions.

EXAMPLE 2.3 (Classical). $S^{Cl} = \mathbb{C}^{s \times s}$ and $\langle\langle X, Y \rangle\rangle_S^{Cl} := X^* Y$. The scaling quotient $N^{Cl}(X)$ is the factor $R$ from the economical QR factorization of $X$, i.e., $X = QR$ with $Q \in \mathbb{C}^{n \times s}$ and $R \in \mathbb{C}^{s \times s}$. Equivalently, $R = L^*$, with $L$ denoting the Cholesky factor of $X^* X$.

EXAMPLE 2.4 (Global). $S^{Gl}$ is the set of scalar multiples of $I_s$, and the inner product is $\langle\langle X, Y \rangle\rangle_S^{Gl} := \operatorname{trace}(X^* Y)\, I_s$. The scaling quotient is given as $N^{Gl}(X) = \|X\|_F\, I_s$.

EXAMPLE 2.5 (Loop-interchange). $S^{Li}$ is the set of diagonal matrices, and $\langle\langle X, Y \rangle\rangle_S^{Li} := \operatorname{diag}(X^* Y)$, where $\operatorname{diag}$ is the operator that sets to zero all the off-diagonal entries of a given square matrix while preserving its diagonal. For $X = [x_1 | \cdots | x_s]$, the scaling quotient of $X$ is given by

$$N^{Li}(X) = \begin{bmatrix} \|x_1\|_2 & & \\ & \ddots & \\ & & \|x_s\|_2 \end{bmatrix}.$$

Further interesting block inner products exist. One might consider "hybrid" block inner products that combine two of the above three examples. For example, if $s = pq$, we can take $S^{Hy}$ as the set of all block diagonal matrices with blocks of size $q \times q$. For $X = [X_1 | \cdots | X_p] \in \mathbb{C}^{n \times s}$ with $X_i \in \mathbb{C}^{n \times q}$, and similarly for $Y \in \mathbb{C}^{n \times s}$, we can then define

(2.1) $$\langle\langle X, Y \rangle\rangle_S^{Hy} := \begin{bmatrix} X_1^* Y_1 & & \\ & \ddots & \\ & & X_p^* Y_p \end{bmatrix},$$

which is a hybrid of $\langle\langle X, Y \rangle\rangle_S^{Cl}$ and $\langle\langle X, Y \rangle\rangle_S^{Li}$ in some sense. See Figure 2.1 for depictions of the $S$ associated to Examples 2.3-2.5.
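The following NumPy sketch implements the block inner products and scaling quotients of Examples 2.3-2.5 and the hybrid product (2.1). The function names are ours, and the global product is written without a normalizing factor so that it is consistent with the stated scaling quotient $N^{Gl}(X) = \|X\|_F I_s$:

```python
import numpy as np
from scipy.linalg import qr, block_diag

def ip_classical(X, Y):                    # S^Cl = C^{s x s}
    return X.conj().T @ Y

def N_classical(X):                        # R factor of the economical QR
    return qr(X, mode='economic')[1]

def ip_global(X, Y):                       # S^Gl = scalar multiples of I_s
    return np.trace(X.conj().T @ Y) * np.eye(X.shape[1])

def N_global(X):
    return np.linalg.norm(X, 'fro') * np.eye(X.shape[1])

def ip_loop(X, Y):                         # S^Li = diagonal matrices
    return np.diag(np.diag(X.conj().T @ Y))

def N_loop(X):
    return np.diag(np.linalg.norm(X, axis=0))

def ip_hybrid(X, Y, q):                    # S^Hy = block diagonal, q x q blocks
    s = X.shape[1]
    return block_diag(*[X[:, i:i+q].conj().T @ Y[:, i:i+q]
                        for i in range(0, s, q)])
```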

For our work, the following natural extension of self-adjointness to blocks turns out to be the most appropriate.

DEFINITION 2.6. A matrix $A \in \mathbb{C}^{n \times n}$ is $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint if for all $X, Y \in \mathbb{C}^{n \times s}$,

$$\langle\langle AX, Y \rangle\rangle_S = \langle\langle X, AY \rangle\rangle_S.$$

For the block inner products $\langle\langle \cdot,\cdot \rangle\rangle_S$ from Examples 2.3-2.5, $A$ being $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint reduces to $A$ being Hermitian in the usual sense, i.e., $A = A^*$.

DEFINITION 2.7. Given a set of block vectors $\{X_1, \ldots, X_m\} \subset \mathbb{C}^{n \times s}$, the $S$-span of $\{X_1, \ldots, X_m\}$ is defined as

$$\operatorname{span}^S\{X_1, \ldots, X_m\} := \left\{ \sum_{k=1}^m X_k C_k : C_k \in S \right\}.$$

Note that $\operatorname{span}^S\{X_1, \ldots, X_m\}$ is indeed a subspace of $\mathbb{C}^{n \times s}$. We can then define the $m$th block Krylov subspace with respect to $A$, $B$, and $S$ as

$$\mathcal{K}_m^S(A, B) := \operatorname{span}^S\{B, AB, \ldots, A^{m-1}B\}.$$

To illustrate how the choice of $S$ affects the structure of $\mathcal{K}_m^S(A, B)$, we look at the Krylov subspaces associated to Examples 2.3-2.5. The classical block inner product gives rise to what is historically known as the block Krylov space; see [33, 34] for a more detailed description of this space along with a useful notion of block grade. On the other hand, the global and loop-interchange block inner products give rise to special subspaces of $\mathcal{K}_m^{S^{Cl}}(A, B)$:

$$\mathcal{K}_m^{S^{Gl}}(A, B) = \left\{ \sum_{k=1}^m \alpha_k A^{k-1} B : \alpha_k \in \mathbb{C} \right\}, \qquad \mathcal{K}_m^{S^{Li}}(A, B) = \left\{ [z_1 | \cdots | z_s] : z_i \in \mathcal{K}_m(A, b_i) \right\},$$

where $\mathcal{K}_m(A, b_i) := \operatorname{span}\{b_i, A b_i, \ldots, A^{m-1} b_i\} \subset \mathbb{C}^n$.

We can determine $\langle\langle \cdot,\cdot \rangle\rangle_S$-orthonormal bases of these spaces with a block Arnoldi-type process, detailed in Algorithm 1. As in the non-block case, the process simplifies to a Lanczos-type process with a three-term recurrence if $A$ is $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint. For this case, one could also derive a coupled two-term recurrence version as is used for BCG in, e.g., [50].

The block Arnoldi process breaks down for some $k$ if $W$ does not have full rank, because then $N(W)$ does not exist and $W$ cannot be $\langle\langle \cdot,\cdot \rangle\rangle_S$-normalized. This case actually represents a lucky breakdown and can be cured by reducing the dimension $s$ accordingly for subsequent iterations. We implement such deflation in our numerical experiments, but, for simplicity of presentation, we do not discuss it any further here. For now, we thus assume that Algorithm 1 runs without breakdowns up to iteration $m$, i.e., that we obtain

Algorithm 1: Block Arnoldi and Block Lanczos.
Given: $A$, $B$, $S$, $\langle\langle \cdot,\cdot \rangle\rangle_S$, $N$, $m$
1: Compute $B = N(B)$ and $V_1 = B B^{-1}$ (i.e., normalize the original $B$ and overwrite $B$ by its scaling quotient)
2: if $A$ is $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint then
3:   Set $V_0 = 0$ and $H_{0,1} = B$
4:   for $k = 1, \ldots, m$ do
5:     $W = A V_k - V_{k-1} H_{k-1,k}$
6:     $H_{k,k} = \langle\langle V_k, W \rangle\rangle_S$
7:     $W = W - V_k H_{k,k}$
8:     Compute $H_{k+1,k} = N(W)$ and $V_{k+1} = W H_{k+1,k}^{-1}$
9:   Set $H_{k,k+1} = H_{k+1,k}^*$ for $k = 1, \ldots, m-1$
10: else
11:   for $k = 1, \ldots, m$ do
12:     Compute $W = A V_k$
13:     for $j = 1, \ldots, k$ do
14:       $H_{j,k} = \langle\langle V_j, W \rangle\rangle_S$
15:       $W = W - V_j H_{j,k}$
16:     Compute $H_{k+1,k} = N(W)$ and $V_{k+1} = W H_{k+1,k}^{-1}$
17: Return $B$, $\{V_k\}_{k=1}^{m+1}$, $H_m = (H_{j,k}) \in S^{m \times m}$, and $H_{m+1,m}$
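A minimal NumPy sketch of the general (non-self-adjoint) branch of Algorithm 1 follows. It accepts any of the block inner product sketches from Section 2, performs no deflation, and stores the block entries $H_{j,k}$ in a dictionary with 0-based indices; the helper that assembles the $ms \times ms$ matrix $H_m$ is ours:

```python
import numpy as np

def block_arnoldi(A, B, ip, N, m):
    """Sketch of the general branch of Algorithm 1 (modified Gram-Schmidt).

    ip(X, Y) is a block inner product <<X,Y>>_S and N(X) a matching scaling
    quotient returning a nonsingular s x s matrix; no deflation is attempted.
    Returns Bhat = N(B), the basis blocks V[0..m], and the Hessenberg blocks
    H[(j, k)] = H_{j+1,k+1} in 0-based indexing."""
    Bhat = N(B)
    V = [B @ np.linalg.inv(Bhat)]
    H = {}
    for k in range(m):
        W = A @ V[k]
        for j in range(k + 1):
            H[(j, k)] = ip(V[j], W)
            W = W - V[j] @ H[(j, k)]
        H[(k + 1, k)] = N(W)
        V.append(W @ np.linalg.inv(H[(k + 1, k)]))
    return Bhat, V, H

def assemble_H(H, m, s):
    """Assemble the ms x ms block upper Hessenberg matrix H_m."""
    Hm = np.zeros((m * s, m * s), dtype=complex)
    for (j, k), blk in H.items():
        if j < m and k < m:
            Hm[j*s:(j+1)*s, k*s:(k+1)*s] = blk
    return Hm
```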

(i) a $\langle\langle \cdot,\cdot \rangle\rangle_S$-orthonormal basis $\{V_k\}_{k=1}^{m+1} \subset \mathbb{C}^{n \times s}$ such that each $V_k$ has full rank and $\operatorname{span}^S\{V_1, \ldots, V_m\} = \mathcal{K}_m^S(A, B)$,

(ii) a block upper Hessenberg matrix $H_m \in S^{m \times m}$, and

(iii) $H_{m+1,m} \in S$,

all satisfying the block Arnoldi relation

(2.2) $$A \mathcal{V}_m = \mathcal{V}_m H_m + V_{m+1} H_{m+1,m} E_m^*,$$

where $\mathcal{V}_m = [V_1 | \cdots | V_m] \in \mathbb{C}^{n \times ms}$, and

$$H_m = \begin{bmatrix} H_{1,1} & H_{1,2} & \cdots & H_{1,m} \\ H_{2,1} & H_{2,2} & \cdots & H_{2,m} \\ & \ddots & \ddots & \vdots \\ & & H_{m,m-1} & H_{m,m} \end{bmatrix}.$$

Figure 2.2 illustrates the block Arnoldi relation. By construction, each block entry $H_{j,k}$ is an element of $S$, so it is natural to regard $H_m$ as an operator which maps $S^m$, the subspace of $\mathbb{C}^{ms \times s}$ whose block vectors take entries from $S$, onto itself. We will always do so in this paper and stress this by writing $H_m \in S^{m \times m}$.

How the choice of $S$ affects the structure of $H_m$ is depicted in Figure 2.3. In the classical case (top left), $H_m$ is a block upper Hessenberg matrix, in which the only blocks with a fixed structure are the block lower diagonal elements, which are upper triangular. The global and loop-interchange cases (top right) produce a comparatively sparse pattern. In particular, the global case produces $H_m = H \otimes I_s$ for some $m \times m$ upper Hessenberg matrix $H$. A hybrid inner product results in a structure whose sparsity is "in between" that of the classical and loop-interchange structures.

2.1. Matrix polynomials and $\lambda$-matrices. Since

$$\mathcal{K}_m^S(A, B) = \left\{ \sum_{k=0}^{m-1} A^k B\, C_k : C_k \in S \right\},$$

there is an inherent connection between the Krylov subspace $\mathcal{K}_m^S(A, B)$ and polynomials with matrix coefficients.

Let $\{C_k\}_{k=0}^m \subset S$, with $C_m \neq 0_s$, and let $P : \mathbb{C}^{s \times s} \to \mathbb{C}^{s \times s}$ be given by $P(\Lambda) := \sum_{k=0}^m \Lambda^k C_k$. We denote the space of all such $P$ by $\mathbb{P}_m(S)$ and refer to its elements as matrix polynomials of degree $m$. By $\mathbb{P}_m^0(S)$, we denote the subset of matrix polynomials in $\mathbb{P}_m(S)$ with $C_0 = I_s$. In the special case when $\Lambda = \lambda I_s$ for a scalar $\lambda$, $P$ can thus be regarded as a mapping from $\mathbb{C}$ to $\mathbb{C}^{s \times s}$ and is called a $\lambda$-matrix. For more on the properties of $P(\Lambda)$ and $P(\lambda)$ and how they relate to each other, see [15, 29, 43].

Given a matrix polynomial $P \in \mathbb{P}_m(S)$, one can extract its matrix coefficients $\{C_k\}_{k=0}^m$ and define the action of $P$ on an $n \times n$ matrix $A$ paired with an $n \times s$ block vector $B$ via the operation $\circ$ as follows:

(2.3) $$P(A) \circ B := \sum_{k=0}^m A^k B\, C_k.$$

In fact, $P(A) \circ B$ could be viewed as a matrix polynomial in its own right with coefficients $B C_k \in \mathbb{C}^{n \times s}$. However, we find it more appropriate to think of the object $P(A) \circ B$ in terms of the underlying matrix polynomial or $\lambda$-matrix $P$ with coefficients $C_k \in S$.
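A short sketch of the action (2.3), accumulating the powers $A^k B$ by repeated matrix-block products (the function name is ours):

```python
import numpy as np

def poly_action(A, B, C):
    """P(A) o B = sum_k A^k B C_k for coefficients C = [C_0, ..., C_m],
    cf. (2.3); each C_k is an s x s matrix from the *-algebra S."""
    W = np.zeros_like(B)
    AkB = B.copy()                 # holds A^k B
    for C_k in C:
        W = W + AkB @ C_k
        AkB = A @ AkB
    return W
```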

With the operator $\circ$, we can succinctly describe the block Krylov subspace $\mathcal{K}_m^S(A, B)$ as

(2.4) $$\mathcal{K}_m^S(A, B) = \left\{ P(A) \circ B : P \in \mathbb{P}_{m-1}(S) \right\}.$$

3. Block full orthogonalization methods (BFOM). In this section, we recast the convergence theory of block full orthogonalization methods (BFOM) for linear systems in terms of our new generalized framework. To start, recall that $\mathcal{K}_m^S(A, B)$ is the $S$-span of the $\langle\langle \cdot,\cdot \rangle\rangle_S$-orthonormal Arnoldi basis block vectors $V_1, \ldots, V_m$, which, using $\mathcal{V}_m = [V_1 | \cdots | V_m]$ (see (2.2)), can be stated as

(3.1) $$\mathcal{K}_m^S(A, B) = \left\{ \mathcal{V}_m Y : Y \in S^m \right\}.$$

In particular, we have $B = V_1 B = \mathcal{V}_m E_1 B$, with the unit block vector $E_1 \in S^m$ as defined in Section 1.4 (here and in the following, the second $B$ in each such product denotes the output $B = N(B) \in S$ of Algorithm 1).

Given a block Krylov subspace $\mathcal{K}_m^S(A, B)$ and the block inner product $\langle\langle \cdot,\cdot \rangle\rangle_S$, we now define the $m$th BFOM approximation to be the block vector $X_m \in \mathcal{K}_m^S(A, B)$ satisfying the block Galerkin condition

(3.2) $R_m := B - A X_m$ is $\langle\langle \cdot,\cdot \rangle\rangle_S$-orthogonal to $\mathcal{K}_m^S(A, B)$.

THEOREM 3.1. Assume that $H_m : S^m \to S^m$ is nonsingular, and put $Y_m := H_m^{-1} E_1$. Then

(3.3) $$X_m := \mathcal{V}_m Y_m B$$

belongs to $\mathcal{K}_m^S(A, B)$ and satisfies the block Galerkin condition (3.2).

Proof. Since $B \in S$ and $Y_m \in S^m$, we have that $Y_m B \in S^m$ and thus $X_m \in \mathcal{K}_m^S(A, B)$ by (3.1). Using the block Arnoldi relation (2.2) and the fact that $B = \mathcal{V}_m E_1 B$, it holds that

(3.4) $$R_m = \mathcal{V}_m E_1 B - (\mathcal{V}_m H_m + V_{m+1} H_{m+1,m} E_m^*) Y_m B = -V_{m+1} H_{m+1,m} E_m^* Y_m B =: V_{m+1} C_m.$$

Thus, since $V_{m+1}$ is $\langle\langle \cdot,\cdot \rangle\rangle_S$-orthogonal to $\mathcal{K}_m^S(A, B)$ by construction, so is $V_{m+1} C_m$, implying that (3.3) satisfies (3.2).

We say that $R_m$ and $V_{m+1}$ are cospatial since, by (3.4), the columns of each span the same $s$-dimensional subspace of $\mathbb{C}^n$.
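The following sketch assembles the BFOM approximation (3.3) from the output of the block Arnoldi sketch above, using the classical inner product and an HPD diagonal test matrix of our own choosing, and verifies the block Galerkin condition (3.2) numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, m = 200, 4, 10
A = np.diag(np.linspace(1.0, 10.0, n))       # HPD test matrix (ours)
B = rng.standard_normal((n, s))

Bhat, V, H = block_arnoldi(A, B, ip_classical, N_classical, m)
Vm = np.hstack(V[:m])                        # V_m = [V_1 | ... | V_m]
Hm = assemble_H(H, m, s)

E1B = np.zeros((m * s, s), dtype=complex)
E1B[:s] = Bhat                               # E_1 * Bhat
Xm = (Vm @ np.linalg.solve(Hm, E1B)).real    # X_m = V_m H_m^{-1} E_1 Bhat

R = B - A @ Xm                               # residual, cospatial with V_{m+1}
print(np.linalg.norm(Vm.T @ R))              # ~ 1e-14: Galerkin condition (3.2)
```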

It is also helpful to consider the polynomial representations of $X_m$ and $R_m$. Since $X_m \in \mathcal{K}_m^S(A, B)$, we know there exists $Q_{m-1} \in \mathbb{P}_{m-1}(S)$ such that $X_m = Q_{m-1}(A) \circ B$. Then

(3.5) $$R_m = B - A X_m = B - A (Q_{m-1}(A) \circ B) = A^0 B I_s - (A Q_{m-1}(A)) \circ B = P_m(A) \circ B,$$

where $P_m(\lambda) := I_s - \lambda Q_{m-1}(\lambda) \in \mathbb{P}_m^0(S)$.

3.1. Convergence results for BFOM. We analyze the convergence of BFOM with a general block inner product $\langle\langle \cdot,\cdot \rangle\rangle_S$ in the case that the matrix $A$ is $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint and positive definite in a sense to be defined shortly. Our results reduce to what is well known for BCG in the case of the classical block inner product [50]. For the global and loop-interchange block inner products, the results are directly related to what is known for CG for a single right-hand side, and we briefly discuss this at the end of the section.

Assume that $A$ is HPD and let $0 < \lambda_{\min} \leq \lambda_{\max}$ denote the smallest and largest eigenvalues of $A$, respectively. The following constants play an important role:

(3.6) $$\kappa := \frac{\lambda_{\max}}{\lambda_{\min}}, \qquad c := \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}, \qquad \xi_m := \frac{2 c^m}{1 + c^{2m}}.$$

If $\kappa = 1$, then we set $\xi_m = 0$. Our goal is to mimic the well-known convergence result for (non-block) CG stated in the following theorem; see, e.g., [45, Chapter 8] and [55, Chapter 6]. This result uses the $A$-norm $\|x\|_A := \langle x, Ax \rangle_2^{1/2}$.

THEOREM 3.2. The error $e_m$ of the CG iterate at step $m$ satisfies

$$\|e_m\|_A \leq \xi_m\, \|e_0\|_A.$$

3.1.1. A scalar inner product and norm. To deduce results like those of Theorem 3.2 in our block framework, we introduce a scalar inner product $\langle \cdot,\cdot \rangle_S : \mathbb{C}^{n \times s} \times \mathbb{C}^{n \times s} \to \mathbb{C}$:

(3.7) $$\langle X, Y \rangle_S := \operatorname{trace}\big( \langle\langle Y, X \rangle\rangle_S \big).$$

The properties of $\langle\langle \cdot,\cdot \rangle\rangle_S$ from Definition 2.1 ensure that (3.7) is a true inner product on $\mathbb{C}^{n \times s}$, which further induces the norm

$$\|X\|_S := \langle X, X \rangle_S^{1/2}.$$

We note that in the case of Examples 2.3 and 2.5, the resulting norms $\|\cdot\|_S^{Cl}$ and $\|\cdot\|_S^{Li}$ are both identical to the Frobenius norm $\|\cdot\|_F$, while in the case of Example 2.4, we have that $\|X\|_S^{Gl} = \sqrt{s}\, \|X\|_F$.
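A quick numerical confirmation of these identities, using the block inner product sketches from Section 2:

```python
import numpy as np

X = np.random.default_rng(1).standard_normal((50, 3))    # s = 3
nrm = lambda ip: np.sqrt(np.trace(ip(X, X)).real)        # norm induced by (3.7)
print(nrm(ip_classical), nrm(ip_loop),
      np.linalg.norm(X, 'fro'))                          # all three agree
print(nrm(ip_global), np.sqrt(3) * np.linalg.norm(X, 'fro'))  # also agree
```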

DEFINITION 3.3. A matrix $A \in \mathbb{C}^{n \times n}$, as an operator on $\mathbb{C}^{n \times s}$, is called

(i) $\langle \cdot,\cdot \rangle_S$-self-adjoint if for all $X, Y \in \mathbb{C}^{n \times s}$ we have $\langle AX, Y \rangle_S = \langle X, AY \rangle_S$,

(ii) $\langle \cdot,\cdot \rangle_S$-positive definite if, in addition, for all nonzero $X \in \mathbb{C}^{n \times s}$ we have that $\langle X, AX \rangle_S > 0$.

REMARK 3.4. The following assertions follow immediately from the definition.

(i) If $A$ is $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint, then $A$ is also $\langle \cdot,\cdot \rangle_S$-self-adjoint. The converse does not necessarily hold.

(ii) For $A$ $\langle \cdot,\cdot \rangle_S$-positive definite, $\operatorname{spec}(A) \subset (0, \infty)$.

As for (ii), we note that whether we regard $A \in \mathbb{C}^{n \times n}$ as an operator on $\mathbb{C}^n$ or on $\mathbb{C}^{n \times s}$, it has the same spectrum, albeit the geometric multiplicities of the eigenvalues are multiplied by $s$ for the latter.

LEMMA 3.5. Let $p$ be a scalar-valued polynomial with real coefficients, and let $A$ be $\langle \cdot,\cdot \rangle_S$-self-adjoint. Then

$$\|p(A)\|_S = \max_{\lambda \in \operatorname{spec}(A)} |p(\lambda)|.$$

Proof. As a $\langle \cdot,\cdot \rangle_S$-self-adjoint operator on $\mathbb{C}^{n \times s}$, $A$ admits a $\langle \cdot,\cdot \rangle_S$-orthonormal basis of eigenvectors from $\mathbb{C}^{n \times s}$, implying that its operator norm $\|A\|_S$ is given as

$$\|A\|_S = \max_{\substack{V \in \mathbb{C}^{n \times s} \\ V \neq 0}} \frac{\|AV\|_S}{\|V\|_S} = \max_{\lambda \in \operatorname{spec}(A)} |\lambda|.$$

Since $p$ has real coefficients, the operator $p(A)$ is also $\langle \cdot,\cdot \rangle_S$-self-adjoint, and it holds that $\operatorname{spec}(p(A)) = \{p(\lambda) : \lambda \in \operatorname{spec}(A)\}$. This directly gives $\|p(A)\|_S = \max_{\lambda \in \operatorname{spec}(A)} |p(\lambda)|$.

When $A$ is $\langle \cdot,\cdot \rangle_S$-positive definite, we have for any rational function $g$ which is defined and positive on $(0, \infty)$ a $g(A)$-weighted inner product and norm, defined as

$$\langle X, Y \rangle_{S,g(A)} := \langle X, g(A) Y \rangle_S, \qquad \|X\|_{S,g(A)} := \langle X, X \rangle_{S,g(A)}^{1/2}.$$

We will particularly need the cases $g(z) = z^{-1}$ and $g(z) = z + t$, $t \geq 0$, in Theorem 4.5. Lemma 3.5 carries over to the $g(A)$-weighted inner products.

LEMMA 3.6. Let $p$ be a scalar-valued polynomial with real coefficients, $g : (0, \infty) \to (0, \infty)$ a rational function, and $A$ be $\langle \cdot,\cdot \rangle_S$-positive definite. Then

$$\|p(A)\|_{S,g(A)} = \max_{\lambda \in \operatorname{spec}(A)} |p(\lambda)|.$$

Proof. Since $A$ is $\langle \cdot,\cdot \rangle_S$-self-adjoint, it is also $\langle \cdot,\cdot \rangle_{S,g(A)}$-self-adjoint due to $\langle AX, Y \rangle_{S,g(A)} = \langle AX, g(A) Y \rangle_S = \langle X, A g(A) Y \rangle_S = \langle X, g(A) A Y \rangle_S = \langle X, AY \rangle_{S,g(A)}$.

An application of Lemma 3.5 for the $\langle \cdot,\cdot \rangle_{S,g(A)}$ inner product leads to the desired result.

The scalar inner product $\langle \cdot,\cdot \rangle_S$ induces a traditional, scalar notion of $\langle \cdot,\cdot \rangle_S$-orthogonality, and similarly for the $g(A)$-weighted versions. Trivially, $\langle\langle \cdot,\cdot \rangle\rangle_S$-orthogonality implies $\langle \cdot,\cdot \rangle_S$-orthogonality since $\langle\langle Y, X \rangle\rangle_S = 0_s$ implies $\langle X, Y \rangle_S = \operatorname{trace}(\langle\langle Y, X \rangle\rangle_S) = 0$, and likewise for the $g(A)$-weighted cases. The converse, however, is not true. For example, consider the classical block inner product (Example 2.3) on $\mathbb{C}^{3 \times 2}$, let $v := [1, 0, 0]^T$, $w := [0, 1, 0]^T$, and define $Y := [v | w]$, $X := [w | v]$. Then

$$\langle\langle X, Y \rangle\rangle_S^{Cl} = X^* Y = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \neq 0_2,$$

implying that $X$ and $Y$ are not block orthogonal, whereas $\operatorname{trace}(X^* Y) = 0$ implies that $X$ and $Y$ are $\langle \cdot,\cdot \rangle_S$-orthogonal.

3.1.2. A variational characterization and error bounds. We denote by $P$ the $\langle \cdot,\cdot \rangle_{S,A}$-orthogonal projector onto a subspace $K$ of $\mathbb{C}^{n \times s}$. It then follows that for any $Y \in \mathbb{C}^{n \times s}$,

$$PY \in K \quad \text{and} \quad \langle Y - PY, V \rangle_{S,A} = 0 \quad \text{for all } V \in K,$$

and that

(3.8) $$\|Y - PY\|_{S,A} = \min_{V \in K} \|Y - V\|_{S,A}.$$

We are now in a position to formulate the following generalization of Theorem 3.2.

THEOREM 3.7. Let $A \in \mathbb{C}^{n \times n}$ be $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint and $\langle \cdot,\cdot \rangle_S$-positive definite, and let $B \in \mathbb{C}^{n \times s}$ be a block right-hand-side vector. Then the BFOM error $E_m := X_* - X_m$, where $X_* := A^{-1}B$, satisfies

(3.9) $$\|E_m\|_{S,A} = \min_{V \in \mathcal{K}_m^S(A,B)} \|X_* - V\|_{S,A} \leq \xi_m\, \|X_*\|_{S,A},$$

with $\xi_m$ from (3.6).

Proof. Since $X_m$ satisfies the block Galerkin condition (3.2), $R_m$ is $\langle\langle \cdot,\cdot \rangle\rangle_S$-orthogonal and, consequently, $\langle \cdot,\cdot \rangle_S$-orthogonal to $\mathcal{K}_m^S(A, B)$. Then for all $V \in \mathcal{K}_m^S(A, B)$,

$$0 = \langle R_m, V \rangle_S = \langle A E_m, V \rangle_S = \langle E_m, V \rangle_{S,A}.$$

Since $E_m = X_* - X_m$ with $X_m \in \mathcal{K}_m^S(A, B)$, applying (3.8) gives the equality in (3.9). To prove the inequality in (3.9), we make use of the polynomial characterization of the block Krylov subspace $\mathcal{K}_m^S(A, B)$ given in (2.4). We thus have that

(3.10) $$E_m = A^{-1} R_m = A^{-1} \big( P_m(A) \circ B \big),$$

where $P_m$ is the BFOM residual polynomial as in (3.5). By the embedding

$$p(\lambda) = \sum_{k=0}^m \alpha_k \lambda^k \;\longmapsto\; P_p(\lambda) := \sum_{k=0}^m \lambda^k (\alpha_k I_s),$$

we can regard $\mathbb{P}_m(\mathbb{C})$ as a subspace of $\mathbb{P}_m(S)$ with $P_p(A) \circ X = p(A) X$. Along with (3.10) and Lemma 3.6, this gives that

$$\|E_m\|_{S,A} \leq \min_{\substack{p \in \mathbb{P}_m(\mathbb{C}) \\ p(0) = 1}} \|p(A) X_*\|_{S,A} \leq \min_{\substack{p \in \mathbb{P}_m(\mathbb{C}) \\ p(0) = 1}} \max_{\lambda \in \operatorname{spec}(A)} |p(\lambda)|\; \|X_*\|_{S,A}.$$

If we now take $p$ as the (scaled) Chebyshev polynomial of degree $m$ for the interval $[\lambda_{\min}, \lambda_{\max}]$ (as in, e.g., [55, Chapter 6]), then $\max_{\lambda \in [\lambda_{\min}, \lambda_{\max}]} |p(\lambda)| \leq \xi_m$, leading to the inequality in (3.9).

To put Theorem 3.7 in perspective, consider the specific block inner products $\langle\langle \cdot,\cdot \rangle\rangle_S$ from Examples 2.3-2.5. In the classical and loop-interchange cases, $\|X\|_{S,A} = \|A^{1/2} X\|_F =: \|X\|_{A,F}$, and in the global case, $\|X\|_{S,A} = \sqrt{s}\, \|X\|_{A,F}$. Thus, in all three cases, Theorem 3.7 gives that

(3.11) $$\|X_* - X_m\|_{A,F} \leq \xi_m\, \|X_*\|_{A,F}.$$

For the classical case, this result is contained in unpublished work by Eisenstat [20], who rewrites results from [50] in terms of the $A$-weighted Frobenius norm. In the loop-interchange case, we can use the standard CG error bound from Theorem 3.2 for each column as an alternative way to arrive at (3.11). In the global case, the estimate (3.11) can also be obtained as follows. Let $\mathbf{A} = I_s \otimes A$. Then solving the block linear system $AX = B$ with GlBFOM is identical to solving $\mathbf{A} \operatorname{vec}(X) = \operatorname{vec}(B)$ with FOM [36, Theorem 1], where $\operatorname{vec}$ is the operator that reshapes an $n \times s$ block vector into an $ns \times 1$ vector. Since $A$ and $\mathbf{A}$ have identical spectra, $\kappa$, $c$, and $\xi_m$ are just as in (3.6). Applying Theorem 3.2, we obtain that

$$\|\operatorname{vec}(X_*) - \operatorname{vec}(X_m)\|_{\mathbf{A}} \leq \xi_m\, \|\operatorname{vec}(X_*)\|_{\mathbf{A}}.$$

Converting everything back to block form gives (3.11).

It is worth mentioning that since the *-algebras for the three examples are nested as $S^{Gl} \subset S^{Li} \subset S^{Cl}$, the variational characterization from Theorem 3.7 directly provides the comparison

(3.12) $$\|X_* - X_m^{Cl}\|_{A,F} \leq \|X_* - X_m^{Li}\|_{A,F} \leq \|X_* - X_m^{Gl}\|_{A,F}.$$

4. BFOM for functions of matrices: B(FOM)². We begin this section by considering a generic function $f : D \subset \mathbb{C} \to \mathbb{C}$ and an $n \times n$ matrix $A$, along with a starting block vector $B$, a block inner product $\langle\langle \cdot,\cdot \rangle\rangle_S$ on $\mathbb{C}^{n \times s}$ with scaling quotient $N$, and corresponding outputs from the Arnoldi process (Algorithm 1) fulfilling the block Arnoldi relation (2.2). Assuming $f(A)B$ exists, we define the B(FOM)² approximation to $f(A)B$ as

(4.1) $$F_m := \mathcal{V}_m f(H_m) E_1 B,$$

where, just as $H_m$, the matrix function $f(H_m)$ is considered as an operator from $S^m$ to $S^m$.

If we can express f as a contour integral of the form

(4.2) $$f(z) = \int_\Delta \frac{g(t)}{z + t}\, \mathrm{d}t,$$

then $f(A)B = \int_\Delta g(t)\, (A + tI)^{-1} B \,\mathrm{d}t$ and

(4.3) $$F_m = \int_\Delta g(t)\, \mathcal{V}_m (H_m + tI)^{-1} E_1 B \,\mathrm{d}t.$$

A crucial connection to BFOM for systems of equations now arises. The block Arnoldi algorithm is shift invariant, i.e., the block Arnoldi process for $A + tI$ produces the same basis $\mathcal{V}_m$ as the process for $A$, as well as the same block Arnoldi relation, but with the matrix $H_m$ replaced by $H_m + tI$. Therefore, for each $t \in \Delta$, the block vector $\mathcal{V}_m (H_m + tI)^{-1} E_1 B$ appearing in (4.3) is precisely the $m$th BFOM approximation to the solution $X_*(t)$ of the block linear system $(A + tI) X(t) = B$.
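The following sketch continues the BFOM example above: it evaluates (4.1) for $f(z) = z^{-1/2}$ directly via a matrix square root and checks the shift invariance behind (4.3). The test problem is small enough that the exact $f(A)B$ is affordable for comparison; accuracy after a single cycle of length $m$ is modest, as expected:

```python
import numpy as np
from scipy.linalg import sqrtm

# F_m = V_m f(H_m) E_1 Bhat for f(z) = z^{-1/2}, cf. (4.1):
Fm = (Vm @ np.linalg.solve(sqrtm(Hm), E1B)).real
Exact = np.linalg.solve(sqrtm(A), B).real        # affordable for this small test
print(np.linalg.norm(Fm - Exact) / np.linalg.norm(Exact))

# Shift invariance: one basis serves every shifted system (A + tI)X(t) = B.
t = 0.7
Xt = (Vm @ np.linalg.solve(Hm + t * np.eye(m * s), E1B)).real
print(np.linalg.norm(B - (A + t * np.eye(n)) @ Xt))   # shrinks as m grows
```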

Note that $f(H_m)$ is not necessarily defined when $f(A)$ is, because $H_m$ could have an eigenvalue at which $f$ is not defined or not sufficiently smooth. We now show that if $f$ is defined and sufficiently smooth on an appropriate superset of $\operatorname{spec}(A)$, then $f(H_m)$ is defined as well. Recall that for an operator $H$ on a Hilbert space $T$ with inner product $\langle \cdot,\cdot \rangle$, the field of values $\mathcal{W}(H)$ is given as

$$\mathcal{W}(H) := \{ \langle X, HX \rangle : X \in T,\ \langle X, X \rangle = 1 \},$$

and that the field of values contains the spectrum of $H$. We use the field of values for $A$ as an operator on $\mathbb{C}^{n \times s}$ with the inner product $\langle \cdot,\cdot \rangle_S$ and for $H_m$ as an operator on $S^m$ with the inner product $\langle \cdot,\cdot \rangle_{\mathcal{V}_m}$ defined as

$$\langle X, Y \rangle_{\mathcal{V}_m} := \langle \mathcal{V}_m X, \mathcal{V}_m Y \rangle_S, \qquad X, Y \in S^m,$$

where $\mathcal{V}_m$ is the matrix of the block Arnoldi vectors; see (2.2). The following lemma relates the two fields of values.

LEMMA 4.1. We have

$$\mathcal{W}_{\mathcal{V}_m}(H_m) \subseteq \mathcal{W}_S(A).$$

Proof. We first observe that by the Arnoldi relation (2.2), we have that

$$\langle X, H_m X \rangle_{\mathcal{V}_m} = \langle \mathcal{V}_m X, \mathcal{V}_m H_m X \rangle_S = \langle \mathcal{V}_m X, A \mathcal{V}_m X - V_{m+1} H_{m+1,m} E_m^* X \rangle_S = \langle \mathcal{V}_m X, A (\mathcal{V}_m X) \rangle_S.$$

The last equality holds since $\langle \mathcal{V}_m X, V_{m+1} H_{m+1,m} E_m^* X \rangle_S = 0$, which one can see by breaking $\mathcal{V}_m X$ into components and applying the fact that the $V_j$ are orthonormal. Clearly, $\mathcal{V}_m X \in \mathbb{C}^{n \times s}$. Since, moreover, $\langle \mathcal{V}_m X, \mathcal{V}_m X \rangle_S = \langle X, X \rangle_{\mathcal{V}_m}$, this gives the desired result.

As a direct consequence, we obtain the following sufficient condition for the existence of $f(H_m)$.

THEOREM 4.2. If the function $f$ is defined and sufficiently smooth on $\mathcal{W}_S(A)$, then $f(H_m) : S^m \to S^m$ is well defined, and consequently, so is the approximation $F_m$ given by (4.1).

In the remainder of this section we concentrate on Cauchy-Stieltjes functions as a special class of functions closely related to those of the form (4.2) and develop a restart approach for B(FOM)² on such functions. It is important to note that, in principle, the B(FOM)² approximation (4.1) and restart technique can be used for any $f$ of the form (4.2), particularly analytic functions. We, however, develop a theory for Stieltjes functions and $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint, $\langle \cdot,\cdot \rangle_S$-positive definite matrices $A$ only.

4.1. Stieltjes functions. A Stieltjes or Cauchy-Stieltjes function is a function $f : \mathbb{C} \setminus (-\infty, 0] \to \mathbb{C}$ that can be written as a Riemann-Stieltjes integral as follows:

(4.4) $$f(z) = \int_0^\infty \frac{1}{z + t}\, \mathrm{d}\mu(t),$$

where $\mu$ is monotonically increasing and nonnegative on $[0, \infty)$ with $\int_0^\infty (1 + t)^{-1}\, \mathrm{d}\mu(t) < \infty$. The relation between (4.4) and (4.2) becomes obvious if $\mu$ is differentiable, since then $g = \mu'$. Stieltjes functions are a particular case of Markov-type functions [24, 42]. The matrix Stieltjes function $f(A)$ is defined if $A$ has no eigenvalues on $(-\infty, 0]$. If $\mathcal{W}_S(A) \cap (-\infty, 0] = \emptyset$, then by Theorem 4.2 the B(FOM)² approximation $F_m = \mathcal{V}_m f(H_m) E_1 B$ exists.

Many important functions are Stieltjes functions. For example, for $\alpha \in (0, 1)$,

(4.5) $$z^{-\alpha} = \frac{\sin(\alpha \pi)}{\pi} \int_0^\infty \frac{t^{-\alpha}}{z + t}\, \mathrm{d}t$$

and

$$\frac{\log(1 + z)}{z} = \int_1^\infty \frac{t^{-1}}{z + t}\, \mathrm{d}t.$$

For more information on Stieltjes functions, see, e.g., [35].

4.2. Restarts. The computation of a sufficiently accurate $F_m$ faces many limitations in practice, since Krylov methods tend to suffer from steep storage requirements as the number of basis vectors $m$ increases, and even more so in the case of block bases. For linear systems, restarts are the standard approach for mitigating this issue. Based on work for the non-block case in [25], we develop a numerically stable and efficient restart approach for B(FOM)². The key is to use the integral representation of $f$ to develop an integral representation for the error of the B(FOM)² approximation $F_m$. As we will see, this integral representation is defined in a fundamentally different way compared to the non-block case, but we can nevertheless apply a variant of B(FOM)² to it to approximate the error. The error approximation is ultimately defined in a recursive fashion and updated each time a new block basis is computed. Note that the norm of the error approximation can be used as a stopping criterion, an important issue when approximating matrix functions.

Updating the function approximation with an error approximation is the only established approach in the literature for restarting (FOM)² for a single vector [1, 19, 25, 39]. Modifications to this technique include varying cycle lengths or choosing more than one block vector for generating the next basis, as is done with thick restarts [24] or recycling [51]. However, for the sake of simplicity, we assume that the number of basis vectors computed per cycle, $m$, is constant and that only one block vector, specifically the last basis vector $V_{m+1}$ from the previous cycle, is used as the starting vector for computing the basis vectors of the next cycle.

4.2.1. Restarted BFOM for shifted linear systems. Essential to the development of a restart procedure for B(FOM)² is a theory for the restarted BFOM for the shifted linear systems $(A + tI) X_*(t) = B$, $t \geq 0$, which we now expound. We begin by fixing $m$, the number of block basis vectors that are generated and stored at a time in the block Arnoldi process, and henceforth refer to it as the cycle length. Using a superscript $(k)$ to index quantities from a given cycle and prescribing $X^{(0)}(t) := 0$, the restarted BFOM approximation to $X_*(t)$ obtained after the $(k+1)$st cycle is given as

(4.6) $$X^{(k+1)}(t) := X^{(k)}(t) + Z^{(k)}(t),$$

with $Z^{(k)}(t)$ defined as the BFOM approximation to the exact correction $Z_*^{(k)}(t)$ in the block residual equation

$$(A + tI)\, Z_*^{(k)}(t) = R^{(k)}(t) := B - (A + tI) X^{(k)}(t).$$

Consider the first cycle. By the shift invariance of the block Arnoldi relation (2.2) and by the cospatial relation (3.4), we have that

$$R_m^{(1)}(t) = V_{m+1}^{(1)} C_m^{(1)}(t), \qquad C_m^{(1)}(t) := -H_{m+1,m}^{(1)} E_m^* \big( H_m^{(1)} + tI \big)^{-1} E_1 B,$$

where the cospatiality factor $C_m^{(1)}(t)$ depends on $t$, while the block basis vector $V_{m+1}^{(1)}$ does not.

Instead of starting the second cycle with $R_m^{(1)}(t)$, we can just as well start it with $V_{m+1}^{(1)}$, since the two are cospatial. We then obtain the block basis $\{V_k^{(2)}\}_{k=1}^{m+1}$, which block spans $\mathcal{K}_m^S(A, V_1^{(2)})$ with $V_1^{(2)} := V_{m+1}^{(1)}$, and is again independent of $t$. If we then take

$$Z^{(1)}(t) := \mathcal{V}_m^{(2)} \big( H_m^{(2)} + tI \big)^{-1} E_1\, C_m^{(1)}(t),$$

we observe, again by the cospatial relation in (3.4), that the residual of the equation $(A + tI) Z_*(t) = R_m^{(1)}(t)$ for the approximation $Z^{(1)}(t)$ satisfies

$$R_m^{(1)}(t) - (A + tI) Z^{(1)}(t) = -V_{m+1}^{(2)} H_{m+1,m}^{(2)} E_m^* \big( H_m^{(2)} + tI \big)^{-1} E_1\, C_m^{(1)}(t),$$

showing that this residual is $\langle\langle \cdot,\cdot \rangle\rangle_S$-orthogonal to $\mathcal{K}_m^S(A, V_1^{(2)})$, thus satisfying the block Galerkin condition (3.2). This implies that $Z^{(1)}(t)$ is indeed the BFOM approximation for the residual equation $(A + tI) Z_*^{(1)}(t) = R_m^{(1)}(t)$. The residual $R^{(2)}(t)$ of the updated approximation $X^{(2)}(t) = X^{(1)}(t) + Z^{(1)}(t)$ is then given as

$$R^{(2)}(t) = B - (A + tI) X^{(2)}(t) = R_m^{(1)}(t) - (A + tI) Z^{(1)}(t).$$

Defining $C_m^{(2)}(t) := -H_{m+1,m}^{(2)} E_m^* \big( H_m^{(2)} + tI \big)^{-1} E_1\, C_m^{(1)}(t)$ leads to a succinct expression for the cospatiality relationship between $R^{(2)}(t)$ and $V_{m+1}^{(2)}$,

$$R^{(2)}(t) = V_{m+1}^{(2)} C_m^{(2)}(t).$$

Inductively, if we start the $(k+1)$st cycle with the $(m+1)$st block basis vector from the previous cycle, i.e., if we take $V_1^{(k+1)} := V_{m+1}^{(k)}$, we then obtain for all $k \geq 1$ and $t \geq 0$ that

(4.7) $$X^{(k+1)}(t) = X^{(k)}(t) + \mathcal{V}_m^{(k+1)} \big( H_m^{(k+1)} + tI \big)^{-1} E_1\, C_m^{(k)}(t),$$

(4.8) $$R^{(k+1)}(t) = V_{m+1}^{(k+1)} C_m^{(k+1)}(t),$$

where

(4.9) $$C_m^{(k+1)}(t) = -H_{m+1,m}^{(k+1)} E_m^* \big( H_m^{(k+1)} + tI \big)^{-1} E_1\, C_m^{(k)}(t),$$

with

(4.10) $$C_m^{(1)}(t) = -H_{m+1,m}^{(1)} E_m^* \big( H_m^{(1)} + tI \big)^{-1} E_1 B.$$

4.2.2. An integral representation of the B(FOM)² error. We are now ready to state our central result on restarted B(FOM)². All integrals occurring in the following theorem are assumed to exist.

THEOREM 4.3. Let $f$ be a Stieltjes function. For $k \geq 1$ and $t \geq 0$, with the matrices $C_m^{(k)}(t) \in S$ as in (4.9) and (4.10), define the matrix-valued error function $\tilde{e}^{(k)}$ of the complex variable $z$ as

(4.11) $$\tilde{e}^{(k)}(z) := \int_0^\infty \frac{1}{z + t}\, C_m^{(k)}(t)\, \mathrm{d}\mu(t).$$

Let

$$F^{(1)} := \mathcal{V}_m^{(1)} f\big( H_m^{(1)} \big) E_1 B$$

be the B(FOM)² approximation to $f(A)B$ after the first cycle. For $k \geq 1$ put

(4.12) $$F^{(k+1)} := F^{(k)} + \mathcal{V}_m^{(k+1)} \big[ \tilde{e}^{(k)}\big( H_m^{(k+1)} \big) \circ E_1 \big] = F^{(k)} + \mathcal{V}_m^{(k+1)} \int_0^\infty \big( H_m^{(k+1)} + tI \big)^{-1} E_1\, C_m^{(k)}(t)\, \mathrm{d}\mu(t).$$

Then, for $k = 0, 1, \ldots$, the $k$th B(FOM)² approximation error $D^{(k)} := f(A)B - F^{(k)}$ is given as

(4.13) $$D^{(k)} = \int_0^\infty (A + tI)^{-1} V_{m+1}^{(k)} C_m^{(k)}(t)\, \mathrm{d}\mu(t) = \tilde{e}^{(k)}(A) \circ V_{m+1}^{(k)},$$
where for $k = 0$ we use the conventions $F^{(0)} := 0$, $V_{m+1}^{(0)} := V_1^{(1)}$, and $C_m^{(0)}(t) := B$ (the output $N(B)$ of Algorithm 1).

Before starting the proof, we note that the representations (4.12) and (4.13) are not of the form "matrix function times a block vector," as is the main object of our efforts, $f(A)B$. Rather, they are more closely related to the action of a matrix polynomial on a matrix and block vector pair, as defined by the operator $\circ$ in (2.3). By analogy, we have thus extended the meaning of $\circ$ to integrals with matrix-valued coefficients in the definitions (4.12) and (4.13). With this in mind, one can see that the correction $\mathcal{V}_m^{(k+1)} \big[ \tilde{e}^{(k)}(H_m^{(k+1)}) \circ E_1 \big]$ to the approximation $F^{(k)}$ is the natural extension of the B(FOM)² approximation to $\tilde{e}^{(k)}(A) \circ V_{m+1}^{(k)}$ in the space $\mathcal{K}_m^S(A, V_{m+1}^{(k)})$.

Proof of Theorem 4.3. We will make use of the restarted BFOM iterates $X^{(k)}(t)$ together with their updates $Z^{(k)}(t)$, their errors $E^{(k)}(t) := X_*(t) - X^{(k)}(t)$, and their residuals $R^{(k)}(t)$ from equations (4.6), (4.7), and (4.8). Note that, in particular, for all $k = 0, 1, \ldots$, the error representation for $D^{(k)}$ from (4.13) is equivalent to

$$D^{(k)} = \int_0^\infty E^{(k)}(t)\, \mathrm{d}\mu(t).$$

We obtain (4.13) for k = 0 via

$$D^{(0)} = f(A)B = \int_0^\infty (A + tI)^{-1} B\, \mathrm{d}\mu(t) = \int_0^\infty (A + tI)^{-1} V_1^{(1)} C_m^{(0)}(t)\, \mathrm{d}\mu(t) = \tilde{e}^{(0)}(A) \circ V_{m+1}^{(0)}.$$

Inductively then, for $k \geq 1$, we express

[mathematical expression not reproducible]

and use (4.7) and (4.8) to obtain that

[mathematical expression not reproducible]

with the last equality holding by (4.9).

Theorem 4.3 provides the restart technique for our proposed B(FOM)² algorithm, given as Algorithm 2. Note that the cospatial factors $C_m^{(k)}(t)$ can be retrieved for any $t$ as long as the matrices $H_m^{(k)}$ and $H_{m+1,m}^{(k)}$ are stored for all $k$. We employ numerical quadrature to evaluate the integral defining the correction, but one should note that the choice of quadrature rule does not affect the integral representation of the error for the next cycle.

Algorithm 2: B(FOM)² with restarts.
Given: $f$, $A$, $B$, $S$, $\langle\langle \cdot,\cdot \rangle\rangle_S$, $N$, $m$, tol
1: Run Algorithm 1 with inputs $A$, $B$, $S$, $\langle\langle \cdot,\cdot \rangle\rangle_S$, $N$, and $m$ to obtain $B = N(B)$, $\mathcal{V}_m^{(1)}$, $H_m^{(1)}$, and $H_{m+1,m}^{(1)}$
2: Compute $F^{(1)} = \mathcal{V}_m^{(1)} f(H_m^{(1)}) E_1 B$
3: for $k = 1, 2, \ldots$, until convergence do
4:   Determine $C_m^{(k)}(t)$ to define the new error function $\tilde{e}^{(k)}$
5:   Run Algorithm 1 with inputs $A$, $V_{m+1}^{(k)}$, $S$, $\langle\langle \cdot,\cdot \rangle\rangle_S$, $N$, and $m$ to obtain $\mathcal{V}_m^{(k+1)}$, $H_m^{(k+1)}$, and $H_{m+1,m}^{(k+1)}$
6:   Compute $Z^{(k)} = \mathcal{V}_m^{(k+1)} \big[ \tilde{e}^{(k)}(H_m^{(k+1)}) \circ E_1 \big]$, where $\tilde{e}^{(k)}(H_m^{(k+1)}) \circ E_1$ is evaluated via quadrature
7:   Compute $F^{(k+1)} = F^{(k)} + Z^{(k)}$

4.3. Error bounds and convergence results. It remains to investigate when Algorithm 2 produces a convergent sequence of approximations to $f(A)B$. The following lemma is important in this context.

LEMMA 4.4. Let $A$ be $\langle \cdot,\cdot \rangle_S$-positive definite, and let $g : (0, \infty) \to (0, \infty)$ be a scalar rational function. Also, let $g_{\min}$ and $g_{\max}$ denote the minimum and maximum values of $g$ on $\operatorname{spec}(A)$, respectively. Then

$$\sqrt{g_{\min}}\; \|V\|_S \leq \|V\|_{S,g(A)} \leq \sqrt{g_{\max}}\; \|V\|_S \qquad \text{for all } V \in \mathbb{C}^{n \times s}.$$

Proof. Since $A$ is $\langle \cdot,\cdot \rangle_S$-positive definite, its spectrum is positive and $A$ has a $\langle \cdot,\cdot \rangle_S$-orthonormal eigenbasis, i.e., there exist $\{\lambda_j\}_{j=1}^{ns} \subset (0,\infty)$ and $\{Q_j\}_{j=1}^{ns} \subset \mathbb{C}^{n \times s}$ such that $A Q_j = \lambda_j Q_j$ and $\langle Q_j, Q_k \rangle_S = \delta_{jk}$. Given any $V \in \mathbb{C}^{n \times s}$, expand it in terms of this basis as $V = \sum_{j=1}^{ns} \gamma_j Q_j$. Then

$$\|V\|_{S,g(A)}^2 = \langle V, g(A) V \rangle_S = \sum_{j=1}^{ns} g(\lambda_j)\, |\gamma_j|^2,$$

and thus

$$g_{\min} \sum_{j=1}^{ns} |\gamma_j|^2 \leq \|V\|_{S,g(A)}^2 \leq g_{\max} \sum_{j=1}^{ns} |\gamma_j|^2.$$

Noting that $\sum_{j=1}^{ns} |\gamma_j|^2 = \|V\|_S^2$ leads to the desired result.

Define the following shifted versions of (3.6):

(4.14) $$\kappa(t) := \frac{\lambda_{\max} + t}{\lambda_{\min} + t}, \qquad c(t) := \frac{\sqrt{\kappa(t)} - 1}{\sqrt{\kappa(t)} + 1}, \qquad \xi_m(t) := \frac{2\, c(t)^m}{1 + c(t)^{2m}}.$$

Note that for all $t \geq 0$ we have $0 \leq \xi_m(t) < 1$ and $\lim_{t \to \infty} \xi_m(t) = 0$; see [24, Proposition 4.2].

The following theorem generalizes the results of [24, Lemma 4.1 and Theorem 4.3] to the block case. Incidentally, its proof also shows that the improper integral representation (4.13) for the error is finite under the given assumptions on $A$, which at the same time implies that the restarted B(FOM)² approximations are all well defined.

THEOREM 4.5. Let $f$ be a Stieltjes function, $A \in \mathbb{C}^{n \times n}$ $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint and $\langle \cdot,\cdot \rangle_S$-positive definite, and $B \in \mathbb{C}^{n \times s}$. Let $D^{(k)}$ from (4.13) be the error of the restarted B(FOM)² approximation after $k$ cycles of length $m$. Then, with the quantities defined in (4.14), we have

(4.15) $$\|D^{(k)}\|_S \leq \|B\|_S \int_0^\infty \frac{\xi_m(t)^k}{\lambda_{\min} + t}\, \mathrm{d}\mu(t) \leq \xi_m(0)^k\, f(\lambda_{\min})\, \|B\|_S.$$

In particular, B(FOM)² converges for all cycle lengths $m$.

Proof. We begin by writing the error $D^{(k)}$ as an integral over the errors of the shifted restarted BFOM iterations as

$$D^{(k)} = \int_0^\infty (A + tI)^{-1} V_{m+1}^{(k)} C_m^{(k)}(t)\, \mathrm{d}\mu(t) = \int_0^\infty E^{(k)}(t)\, \mathrm{d}\mu(t).$$

The norm can be passed through the integral (see, e.g., [57, Lemma 2.1]) to give

$$\|D^{(k)}\|_S \leq \int_0^\infty \|E^{(k)}(t)\|_S\, \mathrm{d}\mu(t).$$

Using Lemma 4.4 with $g(z) = z + t$, we have for any $V \in \mathbb{C}^{n \times s}$ that

$$\sqrt{\lambda_{\min} + t}\; \|V\|_S \leq \|V\|_{S,A+tI},$$

implying that $\|E^{(k)}(t)\|_S \leq (\lambda_{\min} + t)^{-1/2}\, \|E^{(k)}(t)\|_{S,A+tI}$. Then

(4.16) $$\|D^{(k)}\|_S \leq \int_0^\infty (\lambda_{\min} + t)^{-1/2}\, \|E^{(k)}(t)\|_{S,A+tI}\, \mathrm{d}\mu(t).$$

Repeated application of Theorem 3.7 to the shifted errors $E^{(k)}(t)$ gives that

(4.17) $$\|E^{(k)}(t)\|_{S,A+tI} \leq \xi_m(t)^k\, \|X_*(t)\|_{S,A+tI}.$$

Since $X_*(t) = (A + tI)^{-1} B$, we can use Lemma 4.4, now with $g(z) = (z + t)^{-1}$, to bound

(4.18) $$\|X_*(t)\|_{S,A+tI} = \|B\|_{S,(A+tI)^{-1}} \leq (\lambda_{\min} + t)^{-1/2}\, \|B\|_S.$$

Combining (4.16), (4.17), and (4.18), we obtain that

(4.19) $$\|D^{(k)}\|_S \leq \|B\|_S \int_0^\infty \frac{\xi_m(t)^k}{\lambda_{\min} + t}\, \mathrm{d}\mu(t),$$

which is the first inequality in (4.15). The bound on the right of (4.19) is increased if we replace $\xi_m(t)$ by $\xi_m(0)$, since $\xi_m(t)$ is a monotonically decreasing function of $t$ [24, Proposition 4.2]. Moreover, the geometric mean [??] satisfies

[mathematical expression not reproducible]

so that

[mathematical expression not reproducible]

Observing that the integral on the right is just $f(\lambda_{\min})$ leads to the second inequality in (4.15).

5. Numerical experiments. In this section, we illustrate the behavior of restarted B(FOM)² for a variety of functions--not all of which are Stieltjes--and matrices--not all of which are $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint or $\langle \cdot,\cdot \rangle_S$-positive definite. While timings and computational effort, especially in comparison to the non-block method, are important in practice, we do not devote much attention to them here, as the main purpose of these examples is to establish the versatility and applicability of B(FOM)². Four versions are implemented: ClB(FOM)², GlB(FOM)², LiB(FOM)², and HyB(FOM)² (which takes $\langle\langle \cdot,\cdot \rangle\rangle_S^{Hy}$ as in (2.1) and $N^{Hy}(X) := \operatorname{sqrtm}(X^* X)$), along with the non-block approach of [24] applied to each column sequentially, referred to as (FOM)². For all four corresponding block inner products, $A = A^*$ implies that $A$ is $\langle\langle \cdot,\cdot \rangle\rangle_S$-self-adjoint.

5.1. Expressions for the matrix error function. We consider three functions: $z^{-\alpha}$, $\log(z+1)/z$, and $\exp(z)$. For the respective integral representations of the error functions we use the quadrature rules detailed in [25], which we briefly recapitulate here.

5.1.1. $f(z) = z^{-\alpha}$, $0 < \alpha < 1$. Applying the Cayley transform $t = \delta(1 - x)/(1 + x)$, for some $\delta > 0$, to (4.5), we use $N$-node Gauss-Jacobi quadrature for the interval $[-1, 1]$ (as in, e.g., [13]) and find

$$z^{-\alpha} \approx \frac{2 \sin(\alpha \pi)\, \delta^{1-\alpha}}{\pi} \sum_{j=1}^N \frac{w_j}{z (1 + x_j) + \delta (1 - x_j)},$$

with the Gauss-Jacobi nodes $x_j$ and weights $w_j$ for the weight function $(1 - x)^{-\alpha}(1 + x)^{\alpha - 1}$, and the transformed nodes $t_j := \delta(1 - x_j)/(1 + x_j)$. As discussed in [25], the algorithm is not sensitive to the choice of $\delta$, so we take $\delta = 1$.
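A scalar sanity check of this rule (our derivation of the transformed integrand; SciPy's Gauss-Jacobi convention integrates against the weight $(1-x)^a (1+x)^b$):

```python
import numpy as np
from scipy.special import roots_jacobi

def inv_pow_quad(z, alpha, N, delta=1.0):
    """N-node Gauss-Jacobi approximation of z^{-alpha} via (4.5)."""
    x, w = roots_jacobi(N, -alpha, alpha - 1.0)
    c = 2.0 * np.sin(alpha * np.pi) * delta**(1.0 - alpha) / np.pi
    return c * np.sum(w / (z * (1.0 + x) + delta * (1.0 - x)))

print(inv_pow_quad(2.0, 0.5, 30), 2.0**-0.5)   # values agree closely
```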

5.1.2. $f(z) = \log(z + 1)/z$. With the Gauss-Legendre nodes $x_j$ and weights $w_j$ on $[-1, 1]$, and the transformed nodes $t_j := 2/(1 + x_j)$, we obtain

$$\frac{\log(z + 1)}{z} \approx \sum_{j=1}^N \frac{w_j}{2 + z (1 + x_j)} = \sum_{j=1}^N \frac{w_j}{1 + x_j} \cdot \frac{1}{z + t_j}.$$
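Again as a scalar sketch (our substitution $t = 2/(1 + x)$ applied to the Stieltjes representation of $\log(1+z)/z$ given in Section 4.1):

```python
import numpy as np

def log_quad(z, N):
    """N-node Gauss-Legendre approximation of log(1+z)/z."""
    x, w = np.polynomial.legendre.leggauss(N)
    return np.sum(w / (2.0 + z * (1.0 + x)))

z = 3.0
print(log_quad(z, 30), np.log1p(z) / z)   # agree to many digits
```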

5.1.3. $f(z) = \exp(z)$. Although the exponential is not a Stieltjes function, we can still apply the framework developed in this paper to the Cauchy integral form,

$$\exp(z) = \frac{1}{2\pi i} \int_\Delta \frac{e^w}{w - z}\, \mathrm{d}w.$$

Based on work in [64, 65, 66], the authors of [25] recommend taking $\Delta$ as a parabolic contour parametrized as

$$\gamma(s) = a + is - c s^2, \qquad s \in \mathbb{R}.$$

Here, $a$ and $c$ are chosen anew for each restart cycle to ensure that $\Delta$ encloses the eigenvalues of the matrix $H_m$. The infinite interval of integration for $s$ is truncated for a given error tolerance tol by a truncation parameter $s_0$. From the $N$-point midpoint rule on $[-s_0, s_0]$, we obtain the nodes $s_j := -s_0 + (j - \tfrac{1}{2})\, 2 s_0 / N$, for $j = 1, \ldots, N$. Defining $w_j := \exp(\gamma(s_j))\, \gamma'(s_j)$ and $t_j := -\gamma(s_j)$, we then approximate the error function as

$$\tilde{e}^{(k)}(z) \approx -\frac{s_0}{N \pi i} \sum_{j=1}^N \frac{w_j\, C_m^{(k)}(t_j)}{z + t_j}.$$
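A scalar sketch of this midpoint rule; the parameters $a$, $c$, and $s_0$ below are illustrative fixed choices, not the adaptively chosen ones described above, and the test point must lie inside the contour:

```python
import numpy as np

def exp_quad(z, N, a=2.0, c=0.5, s0=8.0):
    """Midpoint rule on the parabola gamma(s) = a + 1j*s - c*s**2 for exp(z)."""
    s = -s0 + (np.arange(1, N + 1) - 0.5) * (2 * s0 / N)   # midpoint nodes s_j
    gam = a + 1j * s - c * s**2                            # gamma(s_j)
    dgam = 1j - 2 * c * s                                  # gamma'(s_j)
    w = np.exp(gam) * dgam                                 # weights w_j
    t = -gam                                               # shifts t_j
    return -(s0 / (N * np.pi * 1j)) * np.sum(w / (z + t))

z = -1.0 + 0.3j                    # lies inside the parabola for these a, c
print(exp_quad(z, 400), np.exp(z))   # agree closely
```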

5.2. Remarks on the implementation. We highlight some aspects relevant to an implementation of the algorithms described in this paper.

The first aspect pertains to breakdowns of the block Arnoldi process due to the occurrence of a rank-deficient block vector $W$ in Algorithm 1. For the global method, this situation implies that $W = 0$, i.e., that the Krylov space has reached its maximal size, the algorithm has converged, and $f(A)B = F_m^{(k)}$. In a similar fashion, for the loop-interchange method, a zero entry in the $i$th diagonal position of $N(W)$ indicates that the process has converged for the $i$th column of $B$, so this column can be discarded and the process can continue with the remaining ones.

The classical method, however, faces a more complicated issue. Algorithm 1 may generate singular block entries $H_{k+1,k}$ even when none of the columns has converged, due to possible linear dependence amongst the columns of the basis vector $V_k$. A number of methods have been proposed to manage this phenomenon; see, e.g., [8, 33] and references therein. We employ [8, Algorithm 7.3], which uses a rank-revealing QR (RRQR) factorization of $W$ to detect near and exact linear dependence. The RRQR acts as the scaling quotient $N^{Cl}$ instead of the standard economical QR factorization. We demonstrate the behavior of an RRQR-based implementation of the classical method versus the non-deflated implementation in Section 5.4.

The other key aspect is that we use quadrature to evaluate the error function (4.11), which directly affects the quality of the solution update in line 6 of Algorithm 2. Specifically, we implement adaptive quadrature, in which the number of nodes is increased until the difference between two consecutive applications of the quadrature rule is less than a given tolerance. The final number of nodes is stored, and in the next cycle, slightly fewer nodes are first used in an attempt to decrease the computational effort; if they are not sufficient, the number of nodes is increased again until the tolerance is met, and so on. For all examples in this paper, the quadrature tolerance is set to be the same as the convergence tolerance.

As we have access to the exact solution $f(A)B$ (or, rather, a machine-accurate approximation to it) for each example in this paper, we use the norm of the difference between the approximation $F_m^{(k)}$ and $f(A)B$ to measure convergence. A process is regarded as convergent when this error norm falls below a given error tolerance, which is specified for each example.

All experiments are run on a Dell desktop with a Linux 64-bit operating system, an Intel Core i7-4770 CPU @ 3.40 GHz, and 32 GB of RAM. In the plots, we abbreviate ClB(FOM)², GlB(FOM)², LiB(FOM)², and (FOM)² as Cl, Gl, Li, and nB, respectively, where "nB" stands for "non-block."

5.3. B(FOM)² on a random tridiagonal HPD matrix. Our first example consists of the function $f(z) = z^{-1/2}$ acting on a $100 \times 100$ HPD tridiagonal matrix $A$ with random entries and a condition number of [??]. We let $B$ be a random $100 \times 10$ matrix, i.e., $s = 10$, and the cycle length $m$ is set to 5, while the error tolerance is set to $10^{-10}$. For the hybrid method, $q = 5$; i.e., $S^{Hy}$ consists of $10 \times 10$ matrices with $5 \times 5$ blocks on the diagonal. With this example we illustrate that while the error bound given in Theorem 4.5 is valid, it is too far from the actual values of the error to be used as a predictor of convergence, which one can see by comparing the black solid line at $10^1$ with the rest of the convergence curves in Figure 5.1. (Note that GlB(FOM)², LiB(FOM)², and (FOM)² all have nearly identical error curves and therefore appear to overlap each other. In fact, LiB(FOM)² and (FOM)² should be identical in exact arithmetic.) Since $\xi_m(0)$ is very close to 1, this bound cannot precisely predict the convergence exhibited by the actual error curves. We therefore do not include it in the rest of the convergence plots.

5.4. Discretized two-dimensional Laplacian and $f(z) = z^{-1/2}$. We now take $A$ to be the real, symmetric positive definite matrix arising from the second-order central difference discretization of the negative two-dimensional Laplacian operator with 100 grid points in each spatial dimension, so that $n = 10^4$. We look at two different $B$ with $s = 10$, the first having full rank and the second being rank deficient. We also run two versions of ClB(FOM)², one with deflation and one without, as described in Section 5.2, in order to observe the effects of rank deficiency on the stability of ClB(FOM)². For both cases, the cycle length is $m = 25$, and the error tolerance is $10^{-6}$. HyB(FOM)² is not run for this set of examples due to the complexity of developing an adequate deflation procedure. We take $B = u \otimes I_{10}$, where $u$ is the vector of dimension $10^3$ whose entries are all 1 and $I_{10}$ is the $10 \times 10$ identity, making $B$ full rank.

The left part of Figure 5.2 displays the results for $B$ having full rank. One can see that all methods attain the desired accuracy with roughly the same number of cycles. The error curves for ClB(FOM)² with and without deflation coincide, as do those of LiB(FOM)², (FOM)², and GlB(FOM)², leading to what may appear to be only two curves. However, for a given cycle, GlB(FOM)² is slightly less accurate than LiB(FOM)² or (FOM)², and all are less accurate than ClB(FOM)², as predicted by the comparative result (3.12).

For the right part of Figure 5.2, the first column of $B$ is a linear combination of the four other columns. The right figure shows that ClB(FOM)² stagnates almost immediately because it does not react to the rank deficiency of $B$. Indeed, when computing $R$ in the economical QR factorization of $B$ as a scaling quotient, the algorithm treats the $R$ factor as full rank since the smallest element on the diagonal is of size [??], i.e., neither exactly nor numerically zero. Taking inverses introduces enough numerical error so that by the next cycle, the basis vector (now inaccurate) has full rank. It halts once the code detects that the norm of the error is no longer monotonically decreasing. In contrast, ClB(FOM)² with deflation converges correctly, albeit with a slow runtime. This is due to the comparatively high cost of the RRQR factorization. Although the cost per cycle is still high compared to, e.g., GlB(FOM)² and (FOM)², significantly fewer cycles are required overall.

It is important to note that this situation is contrived, and linear dependence amongst the columns of basis vectors is unlikely to occur in most practical situations. However, given how slow the runtime of ClB(FOM)² with deflation is in our Matlab implementation--roughly 20 times slower in the full-rank example--we do not examine it in further examples. Furthermore, given the excellent performance of GlB(FOM)², which requires no special treatment of rank deficiencies, tailoring ClB(FOM)² to work in rank-deficient cases may not be necessary.

5.5. Overlap Dirac operator and $f(z) = \operatorname{sign}(z)$. Quantum chromodynamics (QCD) is an area of theoretical particle physics in which the strong interaction between quarks and gluons is studied. Simulations are carried out on a four-dimensional space-time lattice, where each lattice point is assigned 12 variables and each variable corresponds to a combination of three colors and four spins. One must solve systems involving the overlap Dirac operator, described, e.g., in [23, 47]. The solution of such systems entails the computation of $\operatorname{sign}(Q) E_1$, where the matrix $Q$ represents a periodic nearest-neighbor coupling on the lattice and $E_1$ consists of the first 12 unit vectors of the lattice space. The matrix $Q$ is large, sparse, complex, and Hermitian if the chemical potential is zero, which is what we consider here.

We take an $8^4$ lattice such that $n = 12 \cdot 8^4 = 49152$ and $A = Q^2 \in \mathbb{C}^{49152 \times 49152}$. Using $\operatorname{sign}(z) = (z^2)^{-1/2}\, z$, we compute $\operatorname{sign}(Q) E_1$ as $A^{-1/2} B$, where $B = Q E_1 \in \mathbb{C}^{49152 \times 12}$. We set the error tolerance to $10^{-6}$ and vary the cycle length, letting $m \in \{25, 50, 100, 150\}$. We also include HyB(FOM)² in this series of tests, with $q = 4$.

Figure 5.3 shows the correlation between the cycle length and the number of cycles required to converge. Aside from GlB(FOM)² requiring one extra cycle when $m = 50$, all methods require the same number of cycles to converge for a given $m$. In that case, a computationally less intensive method, such as LiB(FOM)² or GlB(FOM)², should be preferred, since ClB(FOM)² does not take advantage of the extra information in its larger approximation space as $m$ increases.

5.6. Bus power system and $\log(z+1)/z$. We now consider the Stieltjes function $f(z) = \log(z+1)/z$ and its action on the admittance matrix $A$ of a bus power system, specifically matrix HB/1138_bus of the University of Florida Sparse Matrix Collection [14]. The symmetric matrix $A$ has a condition number of [??], and its sparsity pattern is given in the left part of Figure 5.4. The block vector $B$ has random entries, and we vary the number of its columns, i.e., $s \in \{2, 6, 12, 24\}$, while the cycle length is kept constant at $m = 50$. We include HyB(FOM)² in this set of tests as well, with $q \in \{1, 2, 4, 6\}$. Note that with $q = 1$, HyB(FOM)² is arithmetically equivalent to LiB(FOM)².

As shown in Figure 5.4, LiB(FOM)² and (FOM)² require roughly the same number of cycles to converge, with GlB(FOM)² requiring the next largest number of cycles. For both ClB(FOM)² and HyB(FOM)², the cycle count decreases as $s$ increases. Since $S^{Cl} = \mathbb{C}^{s \times s}$, the classical method obtains its iterates from the largest possible block Krylov space among the methods. In particular, it takes information from $m \cdot s = 50s$ columns, which, as $s$ increases, becomes quite large relative to the problem size $n = 1138$; hence ClB(FOM)² benefits so substantially from the additional columns in this example. Considering that HyB(FOM)² is a mixture of LiB(FOM)² and ClB(FOM)², it is noteworthy that the method exhibits trends closer to those of ClB(FOM)² than of LiB(FOM)².

5.7. Convection-diffusion equation and exp(z). In the final example, we look at the action of the exponential on a series of matrices that vary in degree of non-symmetry. The matrices correspond to the standard finite difference discretization of a two-dimensional convection-diffusion equation on [0,1] x [0,1] with a constant convection field and convection parameter v. We use 350 discretization points in each dimension and a scaling parameter of 2 * 10^{-3}, resulting in matrices of size 350^2 x 350^2 = 122,500 x 122,500. We look at three matrices A_v, for v ∈ {0, 100, 200}. When v = 0, A_v is real symmetric; otherwise, A_v is non-symmetric.
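
A standard construction of such a matrix is sketched below; the use of centered differences and the exact placement of the scaling parameter are our assumptions, and the paper's discretization details may differ.

% Sketch of a 2D convection-diffusion matrix on [0,1]^2 via centered
% finite differences (assumed form): -eps*(u_xx + u_yy) + v*(u_x + u_y).
N = 350; h = 1/(N+1); eps_ = 2e-3; v = 100;
e = ones(N,1);
D2 = spdiags([e -2*e e], -1:1, N, N) / h^2;           % 1D second difference
D1 = spdiags([-e zeros(N,1) e], -1:1, N, N) / (2*h);  % 1D centered first difference
I = speye(N);
Av = -eps_*(kron(I,D2) + kron(D2,I)) + v*(kron(I,D1) + kron(D1,I));
% Av is sparse of size N^2 x N^2 = 122,500 x 122,500; symmetric iff v = 0.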

B(FOM)^2 converges in every scenario, as exhibited by the plots in Figure 5.5. In each figure, the curves for ClB(FOM)^2 and HyB(FOM)^2 overlap with each other, and likewise the curves for GlB(FOM)^2, LiB(FOM)^2, and (FOM)^2, resulting in what appears to be only two curves. For a given v, each method requires the same number of cycles: 4 for v = 0, 6 for v = 100, and 9 for v = 200.

6. Conclusions. As matrix functions are featured more in scientific computing, it is vital that a solid theoretical foundation be laid for the methods used to evaluate them. In this work, we have built such a foundation for block Krylov methods for matrix functions within a framework that generalizes existing theory for block Krylov methods for linear systems. This framework, which hinges on a *-subalgebra S, a block inner product <<·,·>>_S, and a scaling quotient N, opens up avenues for new block Krylov methods, as demonstrated by the success of our hybrid method in the numerical examples.

We have also established a variational characterization of, and convergence theory for, B(FOM)^2 in the case of <<·,·>>_S-self-adjoint and <<·,·>>_S-positive-definite matrices and Stieltjes functions, drawing heavily on conjugate gradient theory. While the resulting convergence bounds are shown to be too pessimistic in practice, they comprise a necessary theoretical component for B(FOM)^2. Additionally, in our formulation of B(FOM)^2, not only do we account for restarts, but we do so in an efficient way via the matrix error function [??] and additive correction [??]. The variety of numerical experiments we have presented shows that updating with [??] leads to stable approximations [??].

It is also clear from our numerical experiments that the classical block inner product may not always be the most efficient computationally. In several cases, it requires nearly the same number of cycles to reach the desired accuracy as the global or loop-interchange methods, which are much cheaper to implement due to the sparsity structure of their respective *-algebras. As demonstrated by the bus power system examples, however, the classical method is advantageous when s is large relative to the system size. Because of such differences in performance, our generalized framework becomes all the more important, since one can choose a block inner product that suits the problem at hand without having to conduct a new analysis.
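
For orientation, the three block inner products compared above can be sketched in a few lines of Matlab; the exact scaling conventions (e.g., the 1/s factor in the global product) are our assumptions and may differ from the precise definitions earlier in the paper.

% Sketches of three block inner products <<X,Y>>_S, each returning an
% s x s matrix in the corresponding *-algebra S (scalings assumed).
X = randn(1000, 4); Y = randn(1000, 4); s = size(X, 2);
ip_cl = X' * Y;                          % classical: S = C^{s x s}
ip_gl = (trace(X' * Y) / s) * eye(s);    % global: S = scalar multiples of I_s
ip_li = diag(diag(X' * Y));              % loop-interchange: S = diagonal matrices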

7. Acknowledgments. The authors would like to thank Tyrone Rees for a template used to make Figure 2.2; Marcel Schweitzer for code that formed the foundation of the implementation of Algorithm 2, data used in Section 5.5, and many useful discussions; and Stanley Eisenstat for a copy of [20], which provided insight for the proof of Theorem 3.7.

REFERENCES

[1] M. AFANASJEW, M. EIERMANN, O. G. ERNST, AND S. GUTTEL, Implementation of a restarted Krylov subspace method for the evaluation of matrix functions, Linear Algebra Appl., 429 (2008), pp. 2293-2314.

[2] A. H. AL-MOHY AND N. J. HIGHAM, Computing the action of the matrix exponential, with an application to exponential integrators, SIAM J. Sci. Comput., 33 (2011), pp. 488-511.

[3] J. I. ALIAGA, D. L. BOLEY, R. W. FREUND, AND V. HERNANDEZ, A Lanczos-type method for multiple starting vectors, Math. Comp., 69 (2000), pp. 1577-1601.

[4] A. H. BAKER, J. M. DENNIS, AND E. R. JESSUP, On improving linear solver performance: a block variant of GMRES, SIAM J. Sci. Comput., 27 (2006), pp. 1608-1626.

[5] F. P. A. BEIK AND D. K. SALKUYEH, On the global Krylov subspace methods for solving general coupled matrix equations, Comput. Math. Appl., 62 (2011), pp. 4605-4613.

[6] P. BENNER, P. KURSCHNER, AND J. SAAK, Frequency-limited balanced truncation with low-rank approximations, SIAM J. Sci. Comput., 38 (2016), pp. A471-A499.

[7] M. BENZI AND V. SIMONCINI, Approximation of functions of large matrices with Kronecker structure, Numer. Math., 135 (2017), pp. 1-26.

[8] S. BIRK, Deflated Shifted Block Krylov Subspace Methods for Hermitian Positive Definite Matrices, PhD. Thesis, Fachbereich Mathematik und Naturwissenschaften der Bergischen Universitat Wuppertal, Wuppertal, 2015.

[9] R. BOUYOULI, K. JBILOU, R. SADAKA, AND H. SADOK, Convergence properties of some block Krylov subspace methods for multiple linear systems, J. Comput. Appl. Math., 196 (2006), pp. 498-511.

[10] C. G. BROYDEN, A breakdown of the block CG method, Optim. Methods Softw., 7 (1997), pp. 41-55.

[11] D. DAMANIK, A. PUSHNITSKI, AND B. SIMON, The analytic theory of matrix orthogonal polynomials, Surv. Approx. Theory, 4 (2008), pp. 1-85.

[12] P. I. DAVIES AND N. J. HIGHAM, Computing f(A)b for matrix functions f, in QCD and Numerical Analysis III, A. Borici, A. Frommer, B. Joo, A. Kennedy, and B. Pendleton, eds., vol. 47 of Lect. Notes Comput. Sci. Eng., Springer, Berlin, 2005, pp. 15-24.

[13] P. J. DAVIS AND P. RABINOWITZ, Methods of Numerical Integration, Academic Press, New York, 1975.

[14] T. A. DAVIS AND Y. HU, The University of Florida sparse matrix collection, ACM Trans. Math. Software, 38 (2011), Art. 1, 25 pages.

[15] J. E. DENNIS, JR., J. F. TRAUB, AND R. P. WEBER, The algebraic theory of matrix polynomials, SIAM J. Numer. Anal., 13 (1976), pp. 831-845.

[16] R. DORRANCE, F. REN, AND D. MARKOVIC, A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-BLAS on FPGAs, in Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '14, ACM, New York, 2014, pp. 161-170.

[17] A. A. DUBRULLE, Retooling the method of block conjugate gradients, Electron. Trans. Numer. Anal., 12 (2001), pp. 216-233. http://etna.ricam.oeaw.ac.at/vol.12.2001/pp216-233.dir/pp216-233.pdf

[18] I. S. DUFF, M. A. HEROUX, AND R. POZO, An overview of the sparse basic linear algebra subprograms: the new standard from the BLAS Technical Forum, ACM Trans. Math. Software, 28 (2002), pp. 239-267.

[19] M. EIERMANN AND O. G. ERNST, A restarted Krylov subspace method for the evaluation of matrix functions, SIAM J. Numer. Anal., 44 (2006), pp. 2481-2504.

[20] S. C. EISENSTAT, On the rate of convergence of B-CG and BGMRES, 2015. Unpublished.

[21] A. EL GUENNOUNI, K. JBILOU, AND H. SADOK, The block Lanczos method for linear systems with multiple right-hand sides, Appl. Numer. Math., 51 (2004), pp. 243-256.

[22] L. ELBOUYAHYAOUI, A. MESSAOUDI, AND H. SADOK, Algebraic properties of the block GMRES and block Arnoldi methods, Electron. Trans. Numer. Anal., 33 (2008/09), pp. 207-220. http://etna.ricam.oeaw.ac.at/vol.33.2008-2009/pp207-220.dir/pp207-220.pdf

[23] J. VAN DEN ESHOF, A. FROMMER, T. LIPPERT, K. SCHILLING, AND H. A. VAN DER VORST, Numerical methods for the QCD overlap operator. I. Sign-function and error bounds, Comput. Phys. Comm., 146 (2002), pp. 203-224.

[24] A. FROMMER, S. GUTTEL, AND M. SCHWEITZER, Convergence of restarted Krylov subspace methods for Stieltjes functions of matrices, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 1602-1624.

[25] --, Efficient and stable Arnoldi restarts for matrix functions based on quadrature, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 661-683.

[26] A. FROMMER, K. LUND, M. SCHWEITZER, AND D. B. SZYLD, The Radau-Lanczos method for matrix functions, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 710-732.

[27] A. FROMMER AND V. SIMONCINI, Matrix functions, in Model Order Reduction: Theory, Research Aspects and Applications, W. H. A. Schilders, H. A. van der Vorst, and J. Rommes, eds., vol. 13 of Mathematics in Industry, Springer, Berlin, 2008, pp. 275-304.

[28] A. GAUL, Recycling Krylov Subspace Methods for Sequences of Linear Systems--Analysis and Applications, PhD. Thesis, Fakultat II - Mathematik und Naturwissenschaften, Technische Universitat Berlin, Berlin, 2014.

[29] I. GOHBERG, P. LANCASTER, AND L. RODMAN, Matrix Polynomials, SIAM, Philadelphia, 2009.

[30] C. GREIF, T. REES, AND D. B. SZYLD, GMRES with multiple preconditioners, SeMA J., 74 (2017), pp. 213-231.

[31] L. GRIGORI, S. MOUFAWAD, AND F. NATAF, Enlarged Krylov subspace conjugate gradient methods for reducing communication, SIAM J. Matrix Anal. Appl., 37 (2016), pp. 744-773.

[32] W. GROENEVELT, M. E. H. ISMAIL, AND E. KOELINK, Spectral decomposition and matrix-valued orthogonal polynomials, Adv. Math., 244 (2013), pp. 91-105.

[33] M. H. GUTKNECHT, Block Krylov space methods for linear systems with multiple right-hand sides: an introduction, in Modern Mathematical Models, Methods and Algorithms for Real World Systems, A. Siddiqi, I. Duff, and O. Christensen, eds., Anamaya, New Delhi, 2007, pp. 420-447.

[34] M. H. GUTKNECHT AND T. SCHMELZER, The block grade of a block Krylov space, Linear Algebra Appl., 430 (2009), pp. 174-185.

[35] P. HENRICI, Applied and Computational Complex Analysis. Vol. 2, Wiley, New York, 1977.

[36] M. HEYOUNI AND A. ESSAI, Matrix Krylov subspace methods for linear systems with multiple right-hand sides, Numer. Algorithms, 40 (2005), pp. 137-156.

[37] N. J. HIGHAM, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, 2008.

[38] M. HOEMMEN, Communication-Avoiding Krylov Subspace Methods, PhD. Thesis, Electrical Engineering and Computer Sciences, University of California, Berkeley, 2010.

[39] M. ILIC, I. W. TURNER, AND D. P. SIMPSON, A restarted Lanczos approximation to functions of a symmetric matrix, IMA J. Numer. Anal., 30 (2010), pp. 1044-1061.

[40] K. JBILOU, A. MESSAOUDI, AND H. SADOK, Global FOM and GMRES algorithms for matrix equations, Appl. Numer. Math., 31 (1999), pp. 49-63.

[41] M. D. KENT, Chebyshev, Krylov, Lanczos: Matrix Relationships and Computations, PhD. Thesis, Department of Computer Science, Stanford University, 1989.

[42] L. KNIZHNERMAN AND V. SIMONCINI, A new investigation of the extended Krylov subspace method for matrix function evaluations, Numer. Linear Algebra Appl., 17 (2010), pp. 615-638.

[43] P. LANCASTER, Lambda-Matrices and Vibrating Systems, Pergamon Press, Oxford, 1966.

[44] L. LOPEZ AND V. SIMONCINI, Preserving geometric properties of the exponential matrix by block Krylov subspace methods, BIT, 46 (2006), pp. 813-830.

[45] D. G. LUENBERGER, Linear and Nonlinear Programming, 2nd ed., Addison-Wesley, Reading, 1984.

[46] R. B. MORGAN, Restarted block-GMRES with deflation of eigenvalues, Appl. Numer. Math., 54 (2005), pp. 222-236.

[47] H. NEUBERGER, Exactly massless quarks on the lattice, Phys. Lett. B, 417 (1998), pp. 141-144.

[48] A. A. NIKISHIN AND A. Y. YEREMIN, Variable block CG algorithms for solving large sparse symmetric positive definite linear systems on parallel computers. I. General iterative scheme, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1135-1153.

[49] --, An automatic procedure for updating the block size in the block conjugate gradient method for solving linear systems, J. Math. Sci., 114 (2003), pp. 1844-1853.

[50] D. P. O'LEARY, The block conjugate gradient algorithm and related methods, Linear Algebra Appl., 29 (1980), pp. 293-322.

[51] M. L. PARKS, E. DE STURLER, G. MACKEY, D. D. JOHNSON, AND S. MAITI, Recycling Krylov subspaces for sequences of linear systems, SIAM J. Sci. Comput., 28 (2006), pp. 1651-1674.

[52] S. RASHEDI, G. EBADI, S. BIRK, AND A. FROMMER, On short recurrence Krylov type methods for linear systems with many right-hand sides, J. Comput. Appl. Math., 300 (2016), pp. 18-29.

[53] Y. SAAD, On the Lanczos method for solving symmetric linear systems with several right-hand sides, Math. Comp., 48 (1987), pp. 651-662.

[54] --, Analysis of some Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 29 (1992), pp. 209-228.

[55] --, Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM, Philadelphia, 2003.

[56] T. SCHMELZER, Block Krylov Methods for Hermitian Linear Systems, PhD. Thesis, Department of Mathematics, University of Kaiserslautern, Kaiserslautern, 2004.

[57] M. SCHWEITZER, Restarting and Error Estimation in Polynomial and Extended Krylov Subspace Methods for the Approximation of Matrix Functions, PhD. Thesis, Bergische Universitat Wuppertal, Wuppertal, 2016.

[58] V. SIMONCINI, Ritz and pseudo-Ritz values using matrix polynomials, Linear Algebra Appl., 241/243 (1996), pp. 787-801.

[59] V. SIMONCINI AND E. GALLOPOULOS, An iterative method for nonsymmetric systems with multiple right-hand sides, SIAM J. Sci. Comput., 16 (1995), pp. 917-933.

[60] --, Convergence properties of block GMRES and matrix polynomials, Linear Algebra Appl., 247 (1996), pp. 97-119.

[61] --, A hybrid block GMRES method for nonsymmetric systems with multiple right-hand sides, J. Comput. Appl. Math., 66 (1996), pp. 457-469.

[62] V. SIMONCINI AND D. B. SZYLD, Recent computational developments in Krylov subspace methods for linear systems, Numer. Linear Algebra Appl., 14 (2007), pp. 1-59.

[63] K. M. SOODHALTER, A block MINRES algorithm based on the band Lanczos method, Numer. Algorithms, 69 (2015), pp. 473-494.

[64] L. N. TREFETHEN, J. A. C. WEIDEMAN, AND T. SCHMELZER, Talbot quadratures and rational approximations, BIT, 46 (2006), pp. 653-670.

[65] J. A. C. WEIDEMAN, Optimizing Talbot's contours for the inversion of the Laplace transform, SIAM J. Numer. Anal., 44 (2006), pp. 2342-2362.

[66] J. A. C. WEIDEMAN AND L. N. TREFETHEN, Parabolic and hyperbolic contours for computing the Bromwich integral, Math. Comp., 76 (2007), pp. 1341-1356.

[67] J. ZHANG, H. DAI, AND J. ZHAO, A new family of global methods for linear systems with multiple right-hand sides, J. Comput. Appl. Math., 236 (2011), pp. 1562-1575.

ANDREAS FROMMER (†), KATHRYN LUND (†‡), AND DANIEL B. SZYLD (‡)

(†) Fakultat fur Mathematik und Naturwissenschaften, Bergische Universitat Wuppertal, 42097 Wuppertal, Germany ([emailprotected]).

(‡) Department of Mathematics, Temple University (038-16), 1805 N. Broad Street, Philadelphia, Pennsylvania 19122-6094, USA ({katlund,szyld}@temple.edu).

* Received March 21, 2017. Accepted July 29, 2017. Published online on September 22, 2017. Recommended by K. Jbilou. A. Frommer and K. Lund are supported in part by Deutsche Forschungsgemeinschaft through Collaborative Research Centre SFB TRR55 "Hadron Physics from Lattice QCD." K. Lund and D. B. Szyld are supported in part by the U.S. National Science Foundation grant DMS-1418882. D. B. Szyld is also supported by the U.S. Department of Energy grant DE-SC 0016578.

(1) This equality also explains why we do not extend the concept of a scaling quotient to rank-deficient X. In that case, at least one of the two, Y or N(X), is necessarily rank-deficient, and it seems impossible to guarantee that X = YN(X) can be fulfilled for general <<·,·>>_S.

(2) The notation ∘ is attributed to Gragg in [41].

(3) We refer to, e.g., [27, 37] for a discussion of why and when this definition of a matrix function coincides with the standard definition based on interpolating polynomials.

(4) A package of our routines written in Matlab can be found at https://gitlab.com/katlund/
