The reason for studying the multiplication algorithm for finding the eigen-pairs (eigenvalues and eigenvectors) of a symmetric (Hermitian) matrix are
The multiplication algorithm is not my invention -- my professor mentioned it during his seminar in the Summer of 1948. However, I worked-out the details on my own. And, I also discovered how to reduce the run-time from about 10^6 * n^3 to only about 20 * n^3 per eigen-pair. These are typical values -- formulae are presented.
Often, we have a priori knowledge of the structure of a matrix. This knowledge eases the task of finding the eigen-pairs and the subsequent factorization.
Definition: The function f(x) = det(x I - A) is called the characteristic function of the matrix A.
Lemma: Since the characteristic function is a polynomial, by Gauss's fundamental theorem of algebra, the characteristic
equation f(x) = 0 has a root.
Theorem: Since the characteristic polynomial is of n-th degree in x, by the corollary to the fundamental theorem of algebra,
the characteristic equation has exactly n roots.
Definition: These roots are known as the eigenvalues.
Lemma: For each eigenvalue x, the matrix (x I - A) is singular.
Theorem: Thus, there exists a row-vector v that solves the equation v (x I - A) = 0.
Corollary: Any real (non-zero) multiple of v also satisfies the equation. Normalize v, so that v^2 = 1. It
becomes unique, except for sign.
Definition: This normalized vector is called the eigenvector corresponding to the eigenvalue.
Definition: The pair (x, v) is called an eigen-pair.
Lemma: For a symmetric (Hermitian) matrix A, the eigen-vectors corresponding to distinct eigen-values are orthogonal; thus
orthonormal.
Proof by partition and conquer: It is trivial for the size n=1. This is the first step of
Mathematical Induction.. Prove by brute-force for size n=2 -- not too bad. Partition the matrix into four sub-matrices, with the last
diagonal element as the fourth partition. Apply the result for n=2, to go from n to n+1. This is the second step of Mathematical
Induction. Thus, it is true for any natural number n. QED.
Lemma: For a symmetric (Hermitian) matrix A, if there are such, for any set m of equal -- said to be degenerate --
eigenvalues, the corresponding null-space is of dimension m. We take any set of m orthonormal vectors in this space as the corresponding
eigenvectors. While the degenerate eigenvectors are not unique, the space is unique.
Theorem: For a symmetric (Hermitian) matrix A, the set of eigenvectors may be written as the rows of an orthonormal
(unitary) matrix. For convenience, uniqueness, and convention, we take this matrix as proper; that is, so that its determinant would be
positive.
Corollary: Any symmetric (Hermitian) matrix A may be factored as A = < D >, where > = <tr is an orthonormal
(unitary) matrix of the eigenvectors and D is a real diagonal-matrix of the eigenvalues.
Proof: It only remains to prove that D is real. Since A is symmetric (Hermitian), Atr = A. Thus < D > = A = Atr = (< D
>)tr = >tr Dtr <tr = < Dtr >. Hence Dtr = D. QED.
Definition: Paul Anton Maurice Dirac named the < "bra" and > "ket".
Theorem: It is obvious that > = <tr iff (= if and only if) A is symmetric (Hermitian).
From what we have said, it is apparent that the task of finding the eigenvalues of a symmetric (Hermitian) matrix corresponds to that of finding
the roots of its characteristic polynomial. The same theorems, algorithms, and problems are common to both. There are two differences,
though.
Translation: The translation of the origin to xo may be accomplished by evaluating f(x + xo) = det((x + xo) I - A) =
det(x - (A - xo I)). That is A ==> (A - xo I). This matrix transformation clearly is easier than that for a polynomial. Since
the ket is the transpose of the bra, the factorization carries over as A = < D > ==> < (D - xo I) >. Thus, we see that if xo
is an eigenvalue, we are introducing a single nullity. On the other hand, a nullity disappears under any non-null translation.
Definition: A set of degenerate eigenvalues consists of those eigenvalues which have the same
magnitude. For a symmetric (Hermitian) matrix, these would either positive or negative of the same magnitude. If the degeneracy
involves eigenvalues of mixed signs, it is a pseudo-degeneracy; because, any non-null translation will break this
degeneracy. For emphasis, a same-sign degeneracy is said to be a true-degeneracy.
Definition: The kernel of a matrix consists of the set of null eigenvalues and all of the sets of
degenerate eigenvalues. Please observe that a single null eigenvalue would qualify for membership in the kernel. Caution: I have
seen an unrelated concept called a kernel, as well.
Any invariance is dear to the heart of a mathematician. These invariances may be employed to verify the correctness and precision of various computations.
Theorem: Given a pair of non-singular n-by-n square matrices A and B. The determinant of their product is the
product of their determinants: det(A B) = det(A) det(B).
Proof.
Corollary: The determinant of B A (1/B) is equal to that of A: det(B A / B) = det(A). Thus, the determinant is
invariant under such a transformation.
Theorem: The determinant of an orthonormal (unitary) matrix is either one or minus one. In the latter case, the matrix
is said to be improper.
Proof.
Corollary: Given a non-singular n-by-n square real (complex) matrix A and a pair of proper orthonormal (unitary) matrices
bra < and ket >. The determinant of their product is equal to that of the matrix A: det(< A >) = det(A). Thus, the
determinant is invariant under such a transformation.
Proof is in three steps. For A a diagonal matrix, employ the foregoing strategy. Then, for a general matrix A, factor it into a
bra-prime <', diagonal D, and ket-prime >': A = <' D >'. Finally, consider the product (< <') D (>' >) = < A
>. It is of the form of the first step. QED.
Theorem: Given a non-singular n-by-n square symmetric (Hermitian) matrix A and a pair of proper orthonormal
(unitary) matrices bra < and ket >, where > = <tr. The trace of their product is equal to that of the matrix A: trace(<
A >) = trace(A). Thus, the trace is invariant under such a transformation.
Proof.
Definition: The Frobenious matrix norm is defined as the square-root of the trace of the product of a
matrix by its transpose (Hermitian transpose): Frob-norm(A) = sqrt(trace(A Atr)).
Precision: The Frobenious matrix norm provides a valuable test for the precision of an alleged equality of matrices A and
B. Compute Frob-norm(A / B - I). It should be only negligibly greater than zero.
Theorem: It is obvious that in the factorization A = < D >, the ket > is equal to the transpose of the bra < iff
(= if and only if) the matrix A is symmetric (Hermitian).
Lemma: It is obvious that the sum of two symmetric (Hermitian) matrices, which have the same bras, has the same bra as
either of the matrices.
Lemma: It also is obvious that the square of a symmetric (Hermitian) matrix has the same bra as that matrix.
Lemma: Finally, it is obvious that the inverse of a symmetric (Hermitian) matrix has the same bra as that matrix.
Theorem: Any algebraic function or any function that may be represented by a Taylor series f(x) of a given symmetric
(Hermitian) matrix A has the same bra as that of the matrix. Furthermore, given the factorization A = < D >, where > = <tr, we
have the factorization f(A) = < f(D) >. Thus the eigenvectors are invariant, in particular, under each of the operations involved with
the multiplication or division algorithm.
Proof is obvious.
Lemma: Consider a diagonal matrix D, consisting of all zero's, except a single one somewhere. Also, consider a
vector v = (v1, v2, v3, ..., vn). Place it vertically in the bra <, at the same position as the one in the D. Let the ket > be
the transpose of the bra. Then, the product A = < D > = (a11, a12, ..., ann) has the elements aij = vi * vj, regardless of the position
of the one in the D. The other elements in the bra or ket are irrelevant; because, they multiply by the zero elements of the diagonal.
We may invert this multiplication. Take the square root of one of the diagonal elements of A and divide each of the elements in the
corresponding row (or column) by it, to obtain the elements of the vector v. Normalize it. Place it into the first column of the bra
and into the first row of the ket. Then, D = <tr A >tr, with the eigenvalue in the first position. If the original matrix was
positive, there is no ambiguity in finding the root; otherwise, one of the two roots is extraneous.
Lemma: Consider a diagonal matrix D, consisting of all zero's, except for a single one somewhere and an element d in the
open interval (0, 1) somewhere else. Its p-th power has a one and d^p. Thus, it converges the a D of the aforementioned form -- only a
single one.
Now, we are in a position to compute the speed of this convergence. The precision of a floating-point number usually is stated as 10^(-e),
where e is between 12 and 15, depending upon ones generosity. Then, the largest value of d, which is distinguishably less than one, is (1 -
10^(-e)). We desire that its p-th power be just as close to zero. As an equation
Take its logarithm and solve for p, to obtain
By the classical multiplication, this takes p-1 matrix multiplications. Clearly, an impractical result. However, it does guarantee that any symmetric (Hermitian) matrix is capable of having its eigen-pairs computed. Hence, if some algorithm fails to do so; it is the fault of that algorithm, or of its implementation. My original contribution was to observe that if we perform successive squaring, we only need (ln(p) / ln(2)) matrix multiplications.
This squaring algorithm is eminently practicable. Since each matrix multiplication (or squaring) takes exactly n^3 FP (=
floating-point) multiplications, we have a firm upper-bound upon the amount of FP-multiplications, as n^3 times either of the aforementioned
values. However, do not raise the (A / d) to an excessively high power; because, doing so will cause the accumulated round-off errors to
break the degeneracies falsely. Hence, it is a value-judgment as to when to quit -- certainly well before the aforementioned worst-case
power.
The proof of each of the following is obvious; hence, we omit the proofs.
Theorem -- multiplication algorithm: Given a symmetric (Hermitian) matrix A, with a single largest in magnitude eigenvalue
d. The product (A / d) converges to the form given in the first lemma.
Corollary: Given a symmetric (Hermitian) matrix A, with a single-sign largest in magnitude eigenvalue d, with degeneracy of
m. The product (A / d) converges to m times that in the theorem. The trace of (A / d)^(2 p), with p an appropriately large integer,
will converge to m. If you keep track of it, you also will know the absolute-value of d. But, there is an easier way.
Corollary: The trace of (A (A / d)^(2 p)) is (m d). Division gives you d, including its sign.
Corollary -- short-cut deflation: The matrix (A - (A (A / d)^(2 p))) is deflated by this degenerate eigenvalue. It is
a symmetric (Hermitian) matrix.
Theorem: Given a symmetric (Hermitian) matrix A, with no mixed-sign degeneracies. By recursion, we obtain all of the
eigenvalues. We obtain the corresponding eigenvectors, except as follows. We cannot find the eigenvectors corresponding to any
null-eigenvalues. If the matrix is non-singular and if its smallest in magnitude eigenvalue is not degenerate, we cannot find the
corresponding eigenvector. For each non-null degenerate eigenvalue, we can find only one eigenvector. If -- contrary to the hypothesis
-- the matrix does have a mixed-sign -- that is, a pseudo -- degeneracy, this algorithm will fail; thus, revealing the presence of the mixed-sign
degeneracy.
Deflation -- long form. Place each of the known eigenvectors as a vertical-vector into the left-side of a bra. Fill it
out with random numbers. Employ the Gramm-Schmidt orthonormalization algorithm to obtain an orthonormal matrix <. If there are only
columns, corresponding to a single -- possibly truly-degenerate -- eigenvalue, of random numbers placed into the bra, you are finished.
Otherwise, proceed as follows. Multiply <tr A <, to obtain a matrix with a diagonal portion at the left-upper corner. The
partition in the right-lower corner will be a symmetric (Hermitian) matrix, of strictly smaller size. This long-form deflation does work for
pseudo-degeneracies.
Theorem: By recursion, we obtain the factorization A = < D >. It is unique, up to degeneracies. That is,
for any true-degeneracy of an eigenvalue, only the space is unique. The individual eigenvectors are an arbitrary set of orthonormal
vectors.
Analysis of a degeneracy: Given an n-by-n symmetric (Hermitian) matrix A, whose largest in magnitude eigenvalue has a magnitude d, with a degeneracy of m = pos + neg. Let p the the appropriately large integer of the preceding lemma. If trace(A^2) = 0; then the eigenvalue is equal to zero d = 0, with a multiplicity of m = n. Otherwise, let
Then, we may solve:
Hence, there are pos of eigenvalues equal to d and neg of eigenvalues equal to -d, for a total of m with a magnitude of d. If it should turn out that both pos and neg are not zero, then the short-form of the deflation is not applicable -- you have to employ the long-form, at least for this one eigenvalue. We should observe that, if you are keeping track, by the time that you have computed t0, you already should know the value of d. Hence, there is no need to evaluate t2. Also, usually you would know a priori that the matrix is positive. Hence, there is no need to evaluate t1.
Finally, retrace your steps, to obtain the factorization A = < D >, with > = <tr. Thus, we have obtained the set of
eigen-pairs. This multiplication algorithm is equivalent to the Horner's method for finding
the roots of a polynomial. Translation corresponds in the two algorithms. Deflation in the case of the multiplication algorithm for a
matrix corresponds to the division by the found factor in the Horner's method for a polynomial. The amount of FP-multiplications for the whole
algorithm is bounded by (n^4 / 4) times either of the foregoing values. The multiplication algorithm can be implemented in fixed-point
arithmetic very effectively, if such is available with sufficient precision.
Clean-up: The Gramm-Schmidt ortho-normalization algorithm may be employed to clean-up the bra; thereby reducing the
Frob-norm(< <tr - I). However, the Frob-norm(A (< (1 / D) <tr) - I) will not be affected significantly -- it actually may
increase somewhat.
The closely-related division algorithm consists of applying the foregoing multiplication-algorithm to B = 1 / (y I - A). While the multiplication algorithm finds the eigen-pairs in order of decreasing magnitude of eigenvalues, the division algorithm finds the eigen-pairs in order of distance from y. The division algorithm is particularly effective at finding a specified eigenvalue and the related eigenvector. Typically, it does so in three or four times n^3 multiplications. For a polynomial equation, the inversion is equivalent to the replacement of x by 1/x and then multiplication by x^n.
Some details:
If you perform any operation that alters the eigenvalues; you have to undo the operation, before reporting the eigenvalues of the original
matrix: multiplication by a scalar, rotation, inversion, or translation. On the other hand, the eigenvectors are invariant under each
of these operations; hence, they need not be undone. Also, the long-form of the deflation, which affects the eigenvectors, has to be undone
The eigenvalues are easier to resolve if one translates to their mean value, namely to the trace(A) / n. However, the resulting matrix will
become of mixed-sign.
Webmaster@rism.com Copyright (c) 2000 by R.I. 'Scibor-Marchocki. Last modified on Tuesday 31-st October 2000.