and
In the Complex chapter, we will show how to extend this Factorization-of-a-symmetric chapter to be applicable to a Hermitian matrix (the complex counterpart of a symmetric matrix). At this time, however, we consider each matrix to be real. This same comment applies to each of the succeeding chapters, until we arrive at the Complex chapter.
Definition: A vector is said to be axial, if it is specified up to a multiplicative scalar factor (which may be either positive, negative, or complex). That is, an axial vector has a direction but neither magnitude nor sense.
Lemma: Given an axial vector V. For any scalar z, the product zV is the same axial vector.
Proof is obvious.
Theorem: Given a unit axial vector V. For any scalar z whose magnitude is one, the product zV is the same unit axial vector. That is, a unit axial vector has a direction and a magnitude (of one) but no sense.
Proof is obvious.
Each of the following three theorems is a special case of the theorem providing an algebraic function of a symmetric matrix, which we had in the previous chapter. However, we provide the proofs here, to make the present discussion self-contained.
Theorem translation: Given a symmetric matrix S. For any real number u, each of the eigenvalues of the translated matrix S’ = S + u I is translated by u to become e’ = e + u. The eigenvectors are invariant. The inverse transformation is the translation by the additive inverse of u, namely -u.
Proof: The characteristic polynomial of S’ is 0 = det(S’ – x I) = det((S + u I) – x I) = det(S – (x I – u I)) = det(S – (x – u) I). Hence, e = e’ – u. Add u to each side, to obtain e’ = e + u. It is obvious that the eigenvectors are invariant. QED.
Theorem gauge: Given a symmetric matrix S. For any non-zero real number u, each of the eigenvalues of the matrix S’ = u S is e’ = u e. The eigenvectors are invariant. This transformation is called “gauge” or “scaling”. The inverse transformation is the gauge by the multiplicative inverse of u, namely 1/u.
Proof: The characteristic polynomial of S’ is 0 = det(S’ – x I) = det((u S) – x I). Divide by u, to obtain 0 = det(S – (x / u) I). Hence, e = e’ / u. Multiply each side by u, to obtain e’ = u e. It is obvious that the eigenvectors are invariant. QED.
Theorem inversion: Given a non-singular symmetric matrix S. Each of the eigenvalues of its inverse S’ = Sinv is the reciprocal of the corresponding eigenvalue of S. The eigenvectors are invariant. Often, this transformation is called “division”. This inversion transformation is idempotent; that is, it is its own inverse.
Proof: Factor the matrix S into S = O E Otr, where O is an ortho-normal matrix of the eigenvectors and E is a diagonal matrix of the corresponding eigenvalues. Invert the matrix S, to obtain S’ = Sinv = (Otr)inv Einv Oinv = O Einv Otr. Hence, Einv is the diagonal matrix of the eigenvalues of the matrix S’ and O is the ortho-normal matrix of its eigenvectors. QED.
A similarity transformation by an arbitrary (of convenience) ortho-normal matrix Q. The inverse transformation is that by Qtr.
Find an eigenvector.
Given a symmetric matrix S, each of whose eigenvalues is zero, except for one eigenvalue with a value of one.
Theorem: The matrix is (ai aj) and the eigenvector corresponding to that non-zero eigenvalue is (ai).
Proof: Hint: Factor the matrix S into S = O E Otr, where O is an ortho-normal matrix of the eigenvectors and E is a diagonal matrix of the corresponding eigenvalues.
Corollary: Each of the diagonal elements (ai ai) of the matrix S is positive.
Proof: The square of a real number is positive. QED.
Corollary: The trace of the matrix S is one.
Proof: Factor the matrix S into S = O E Otr, where O is an ortho-normal matrix of the eigenvectors and E is a diagonal matrix of the corresponding eigenvalues. It was given that each element of E is zero, except for one element which is one. Hence, the trace of E is one. The matrix S is similar to the matrix E. The trace is invariant under a similarity transformation. QED.
Corollary: We have aj = (ai aj) / ai and ai = (ai aj) / aj.
Proof is obvious.
Corollary: The maximum of the diagonal elements of S – {ai ai} – exists.
Proof: Hint: The set of real numbers is totally ordered. The maximum of a finite set of totally ordered numbers exists.
Algorithm: Use the corresponding index in the penultimate corollary to compute the elements of the
eigenvector.
The referenced theorem computes the corresponding eigenvalue as e = V S Vtr, where V is
taken as the horizontal eigenvector.
Theorem: For a positive symmetric matrix S, whose largest eigenvalue is e of multiplicity m, the limit, as p increases without bound, of trace(((1 / e) S)^p) --> m and the ((1 / e) S)^p --> Vtr V, where V is the set of m horizontal eigenvectors corresponding to the eigenvalues e. The Gramm-Schmidt ortho-normalization algorithm may be employed to orthogonalize this set of vectors. We call m the degeneracy of this eigenvalue.
Proof: Hint: Consider the similarity transformation of the matrix S to the diagonal matrix of its eigehvalues.
Corollary: The space spanned by these vectors is unique. Furthermore, if the multiplicity m is one, the vector is a unique axial vector. As a result of how we obtained it, this will be a unit vector; that is, its magnitude will be one.
Proof is obvious.
Algorithm multiplication: Making the foregoing theorem into the multiplication algorithm is tricky; because we have no a priori knowledge of the value of m. After each multiplication, we calculate the trace. We divide the current (multiplied) matrix by the trace. In this form, the trace will not converge. However, the expression mod(1.0 / trace + 0.5, 1.0) – 0.5 will converge to zero. Of course, if we multiply too many times; the accumulated round-off errors will break anomalously the degeneracy. Once convergence is achieved, the value of the multiplicity m is the reciprocal of the trace. Now that we have found the value of the multiplicity m, we can go back and correct the final matrix. The correction of the final matrix is easy – just multiply it by m. To find the eigenvalue e, multiply the resulting matrix by the original matrix. Then, the eigenvalue e is given by e = trace / m.
Theorem deflation—matrix rank of the matrix: The deflated
matrix (S - e (Vtr V)) has the original eigenvalue e replaced by zero.
Thus, we say that we have killed the eigenvalue. We will define and discuss the matrix-rank in the next chapter
Proof is obvious.
Lemma: pre-multiply the preceding deflated matrix by V and post-multiply by Vtr, to obtain V (S - e (Vtr V)) Vtr = V S Vtr - e I. In this form, it has m zero's along the diagonal, at the position where we have placed the vector V. The inverse transformation consists of pre-multiplication by Vtr and post-multiplication by V.
Proof is obvious.
Theorem deflation – order of the matrix: We delete these m rows and m columns. Now, we have a matrix smaller by m; that is, the matrix now is (n - m) x (n - m); its matrix order is n – m. The inverse transformation consists of an augmentation by an identity matrix of matrix-rank m.
Proof is obvious.
The foregoing transformation S’ = S – e (Vtr V0 affects only the single eigenvalue e. Contrast the translation transformation S’ = S – e I, which affects all the eigenvalues. Why? Because the translation may be written as S’ = S – e I = S – e (Otr O). In this form, we see that we are multiplying by each of the eigenvectors. Hence, each eigenvalue is affected.
Finally, we arrive at a zero matrix. Are we finished? Not necessarily. Have we found n eigenvalues (counting their multiplicities)? If not all of the n eigenvalues have been accounted; the balance is the nullity of the original matrix. Take all of the eigenvectors that have been found thus far. Augment them with sufficient random vectors to obtain n vectors. Employ the Gramm-Schmidt ortho-normalization algorithm to obtain the remaining eigenvectors -- they correspond to the eigenvalues which are equal to zero.
If the original matrix is not positive, we first fold it. (Since this forward reference is to a definition, we do not create a circular logical argument thereby.) That is, we multiply the given matrix by its transpose. This resulting matrix is positive. Now, we may find its first set of eigenpairs, with the eigenvector being of degeneracy m. Multiply this matrix by the original matrix. The trace will become t = mp - mn, where mp is the positive multiplicity and mn is the negative multiplicity. Their sum, of course, is m = mp + mn. Solve, to obtain mp = (m + t) / 2 and mn = (m - t) / 2. The diagonal element corresponding to each of the mn eigenvalues will be negative. Thus, we have resolved the mixed-sign pseudo-degeneracy of the eigenvalue.
Algorithm division: Works only for a non-singular symmetric matrix So, one of whose eigenvalues e is known approximately. If the exact eigenvalue is known, change its value slightly. Compute the matrix S = (So - e I)inv. You may want to employ the twice algorithm for this inversion. From its construction, the matrix S is symmetric. Apply the multiplication algorithm for just the one eigenvalue. It converges very rapidly. Actually, usually, S already is converged. We have obtained the multiplicity and the set of eigenvectors. Work backwards through the computation of S, to obtain the precise value of the eigenvalue of the original matrix So. The division algorithm yields only one eigenvalue and its set of eigenvectors.
Algorithm squaring: The squaring algorithm is a variant of the multiplication algorithm, but which converges much faster: Instead of multiplying, square. For s squaring operations, the effective multiplying operations m is m = 2^s. Let epsilon = mod(1.0 / trace + 0.5, 1.0) – 0.5. The convergence criterion is abs(epsilon) < delta. The multiplicity is 1 / trace - epsilon. This algorithm is analogous to and based upon the squaring algorithm for finding the real roots of a polynomial. Let us remind the reader that the sum of the eigenvalues (i.e., the roots of the characteristic polynomial) may be computed as the trace of the matrix, while the product of the eigenvalues may be computed as the determinant of the matrix.
[I discovered/devised the squaring algorithm in 1961. While I have not seen it in the literature, it is such an obvious variant of the multiplication algorithm, that probably someone else also has discovered it.]
How fast does the convergence occur? On a computer which employs IEEE double-precision floating-point arithmetic, zero = 10^(-16), approximately. Let r be the ratio of the next largest eigenvalue to the largest one. Then, for the multiplication algorithm, we have the equation r^p = zero. Solve for p, to obtain p = ln(zero) / ln(r). While for the squaring algorithm: p = 2^s. Solve for s, to obtain s = ln(p) / ln(2) = ln(ln(zero) / ln(r)) / ln(2).
Lemma: The product of a converged-power of a given symmetric matrix and the original matrix still is converged.
Proof is obvious.
Lemma: The trace of any converged odd-power of a given symmetric matrix has the sign of the corresponding eigenvalue.
Proof is obvious.
Theorem: The product of a converged squaring of a given symmetric matrix and the original matrix has the sign of the corresponding eigenvalue.
Proof is obvious.
Theorem: Each eigenvalue of a real matrix into which a complex matrix has been mapped is degenerate with an even multiplicity.
Proof: Since this mapping places two copies of the complex matrix onto the real matrix, the multiplicity of each of the complex eigenvalues is doubled. QED.
Now that we have all of the pieces, we need to devise a strategy. I did it three times, already:
On a Texas Instruments TI-49 programmable calculator, employing a built-in proprietary interpreter.
On a Data General mainframe computer, employing a proprietary dialect of Fortran IV.
On an IBM desktop computer, employing their APL (= A Programming Language) interpreter. The multiplicative family is a natural fit here.
The strategy has to include recursively performing the inverse transformations of any transformation that has global side-effects: translation, gauge, inversion, similarity, and the two deflation transformations. Caution: None of these six transformations commute.
I propose the following strategy. It should work, as stated. However, I have not verified it in terms of a
computer program.
Start with either a symmetric xor a Hermitian matrix.
If the matrix is not known to be positive; fold [This is a forward reference; but, since we do not
employ the squaring algorithm anywhere, it is not logically circular.] it. Reminder: the Hermitian transpose implies a
complex-conjugate.
Tie-point #1
Save the matrix, temporarily.
Square it until it converges.
Compute the multiplicity m.
If the original matrix was complex and the mapping from complex to real is being utilized; assert that m is even.
Employ the Gramm-Schmidt ortho-normalization for the first m vectors. the use of
pivoting (a forward reference) will improve the precision of the computation of the
eigenpairs. if m is one; the normalization factor is the reciprocal of the square-root of the diagonal element. Optionally, use only
the first eigenvector.
Compute the eigenvalue e.
Optionally, multiply by the original symmetric matrix and take the trace of the resulting product, to ascertain the sign of this eigenvalue -- as I
did in my original implementation.
Optionally, compute each determinant 0 = det(S - x I) and 0 = det(S - (- x) I), to ascertain the sign(s) of the eigenvalue. I did not do this step
in my original implementation.
Deflate rank.
Optionally, deflate order and save recursively, for convergence time proportional to n^2 (n + 1)^2 / 4 versus n^4. pray, see this
reference for the formula for the sum of n^3.
If rank > 0 then go to tie-point #1.
If any eigenpairs remain to be found; employ the Gramm-Schmidt ortho-normalization for the eigenvectors. each of the corresponding
eigenvalues is zero.
Undo the order deflations, if any.
Optionally, clean-up the eigenvectors by employing the Gramm-Schmidt ortho-normalization.
Since this algorithm has a propensity for anomalously resolving degeneracies, it strongly is urged that the purported answer be checked thoroughly.
If it had been folded; undo the folding and ascertain the sign of each eigenvalue. In my original implementation, I performed neither any of
Gramm-Schmidt ortho-normalization steps, nor did I perform any of the following steps ---
Now that we have found the ortho-normal matrix O, form the product Otr S O, to obtain the value (including the sign) of each eigenvalue that is not
cross-sign degenerate.
If there are any cross-sign degeneratea eigenvalues; by means of a suitable permutation, segregate them at the bottom.
Order-deflate this matrix, to obtain only the cross-sign degenerate kernel.
From the squaring algorithm, we already know the square of each of the eigenvalues. Compute plus or minus the square-root of each, to obtain the
approximate eigenvalue.
For each remaining eigenvalue, perform this sequence of steps:
Translate by minus the approximate eigenvalue, thereby breaking the cross-sign degeneracy.
Perform the division algorithm.
Usually it will converge immediately. If not; perform the squaring algorithm upon it, until it converges.
If this eigenvalue is degenerate; perform the Gramm-Schmidt ortho-normalization algorithm, to obtain the eigenvectors.
You are finished with this eigenvalue.
finish.
As far as I remember, this is the strategy (using only the first eigenvector, above) that I had employed in each of the aforementioned
situations. However, the devil is in the details! I may have forgotten something. :=)
This strategy may be improved interactively by the creative use of the translation, inversion, and similarity. Just be sure to undo them as
appropriate.
The convergence time is proportional to n^3 for the Jacob algorithm, described below. For large n, clearly the Jacobi
algorithm is significantly faster. Hence, on a modern computer, we may as well employ the Jacobi algorithm, exclusively. Then, for
logistical reasons, actual implementation of the squaring algorithm has dubious value.
[I discovered how to factor a general real matrix A as A = < D > in circa 1960-1. The squaring algorithm has the historical distinction of being that algorithm which I had employed to demonstrate this factorization. It immediately was obvious that any algorithm for the factorization of a symmetric matrix could be employed likewise. It also immediately was obvious that the mapping of the complex matrices into the real matrices would extend this factorization to that of a general complex matrix.]
The advantages of the multiplicative family of algorithms for finding the eigenvalue-eigenvector eigenpairs versus the Jacobi algorithm are: Employs only matrix operations; does not need to access the individual elements of the matrix. Easer implementation, especially if the eigenvalues of interest are known to be not degenerate. Faster at finding a single eigenpair. The disadvantages are: Poorer precision. Slower at finding all of the eigenpairs. Unable to resolve relatively closely-spaced, in magnitude, eigenvalues.
For the Hermitian matrix (that is, the complex counterpart of the symmetric matrix), thankfully, the eigenvalues are real -- it could have been worse. However, the mapping that we employ, which carries a complex matrix into a real matrix, doubles the multiplicity of each eigenvalue. Thus, the resultant matrix is highly degenerate. Consider not to employ this mapping; but, instead code the complex Hermitian matrix directly.
Some day, if I feel really ambitious, I may implement the squaring algorithm again, employing the C language. If anyone is interested; pray, let me know.
The process of finding the eigenvalues of a symmetric matrix finds the roots of the characteristic polynomial. Hence, there must be a correspondence between the theorems/algorithms for eigenvalues and those for solving a polynomial equation. For instance, what we call here "deflation" corresponds to the Horner method of the Theory of Equations. Likewise, the squaring is applicable in either context. The practicability may vary, though; because, while it is guaranteed that each eigenvalue of a symmetric (even of a Hermitain) matrix is real, there rarely is any such a priori knowledge regarding a given real polynomial.
Study guide: While the Jacobi algorithm is monolithic, the squaring algorithm may be studied in
gradual stages:
non-degenerate positive-definite
non-degenerate positive
positive, with only one degenerate root
positive, with multiple degenerate roots
no cross-sign degenerate roots
general polynomial -- symmetric matrix
general polynomial -- Hermitian matrix
In each stage, first study the polynomial, then the corresponding matrix.
Lemma: Given the symmetric 2x2 matrix S = (a, b; b, d). We kill the off-diagonal element by taking the angle x as x = Atan(2 b / (r - (a - d))), where r = sqrt((2 b)^2 + (a - d)^2). The sign of r has to be chosen carefully – we will show how, presently.
The resulting marix becomes O S Otr = (1 / 2) ((a + d) + r, 0; 0, (a + d) - r).
The ortho-normal matrix becomes O = (1 / R) (r - (a - d), 2 b; - 2 b, r - (a - d)), where
r = sqrt((2 b)^2 + (a - d)^2)
R = sqrt(2 r (r - (a - d))).
Observe that the result is algebraic, even though we employed Trigonometry to obtain it. Also, observe that the ortho-normal matrix O is proper. Finally, we observe that each of these two square-roots here and each in the following detailed discussion is real. It would be a disaster otherwise. :-)
In the proof of the trace of a similarity transformation, we had multiplied-out a two-by-two case. From it, we have for an arbitrary ortho-normal matrix O and a given symmetric-matrix S
(a (cos(x))^2 + (b + c) sin(x) cos(x) + d (sin(x))^2, - a sin(x) cos(x) + b (cos(x))^2 - c (sin(x))^2 + d sin(x) cos(x);
-a sin(x) cos(x) - b (sin(x))^2 + c (cos(x))^2 + d sin(x) cos(x). a (sin(x))^2 - (b + c) sin(x) cos(x) + d (cos(x))^2).
O S Otr = (a (cos(x))^2 + 2 b sin(x) cos(x) + d (sin(X))^2, - a sin(x) cos(x) + b ((cos(X))^2 - (sin(X))^2) + d sin(x) cos(x);
- a sin(x) cos(x) + b ((cos(x))^2 - (sin(x))^2) + d sin(x) cos(x), a (sin(x))^2 - 2 b sin(x) cos(x) + d (cos(x))^2) =
= (1 / 2) (a (1 + cos(2 x)) + 2 b sin(2 x) + d (1 - cos(2 x)), - a sin(2 x) + 2 b cos(2 x) + d sin(2 x);
- a sin(2 x) + 2 b (cos(2 x) + d sin(2 x), a (1 - cos(2 x)) - 2 b sin(2 x) + d (1 + cos(2 x)).
The upper-right (and lower-left) element is - (a - d) sin(2 x) + 2 b cos(2 x).
Set it equal to zero and solve for the angle (2 x): tan(2 x) = 2 b / (a - d). Then, the sine and cosine of this angle are:
sin(2 x) = 2 b / r
cos(2 x) = (a - d) / r
r = (a == d)? (2 b): ((a >= d)? (sqrt((2 b)^2 + (a - d)^2)): (-sqrt((2 b)^2 + (a - d)^2))).
Substitute back into the product
O S Otr = (1 / (2 r)) (a (r + (a - d)) + 2 b (2 b) + d (r - (a - d)), 0; 0, a (r - (a - d)) - 2 b(2 b) + d(r + (a + d))) =
= (1 / (2 r)) ((a - d)^2 + 4 b^2 + (a + d) r, 0; 0, - (a - d)^2 - 4 b^2 + (a + d) r) =
= (1 / (2 r)) (r^2 + (a + d) r, 0; 0, - r^2 + (a + d) r) =
= (1 / 2) ((a + d) + r, 0; 0, (a + d) - r) = (a + delta / 2, 0; 0, d - delta / 2)
delta = r - (a - d) = (r^2 - (a - d)^2) / (r + (a - d)) = (2 b)^2 / (r + (a - d)) =
= (a == d)? (2 b): (2 b)^2 / ((a - d) (1 + sqrt(1 + (2 b / (a - d))^2))).
Its trace is (a + d), as it should by the aforementioned theorem.
Its Frobenious matrix norm is = (1 / 2) ((a + d)^2 + (2 b)^2 + (a - d)^2) = (a^2 + 2 b^2 + d^2), as it should by the theorem for the norm.
We need the angle x.
tan(x) = sin(x) / cos(x) = sqrt((1 - cos(2 x)) / (1 + cos(2 x))) = sin(2 x) / (1 - cos(2 x)) = 2 b / (r - (a - d)) = 2 b / delta.
Take the angle x in (-pi/2, pi/2).
sin(x) = 2 b / R
cos(x) = (r - (a - d)) / R
R = sqrt((2 b)^2 + (r - (a - d))^2) = sqrt(2 r^2 - 2 r (a - d)) = sqrt(2 r (r - (a - d))) = sqrt(2 r delta). QED.
Summary:
x = (1 / 2) atan(2 b / (a – d))
z = |a – d| + sqrt((2 b)^2 + (a – d)^2)
delta = ((a > d)? (1.0): (-1.0)) * ((2 b)^2 / z xor -z)
sq = +- 1.0 / sqrt((2 b)^2 + delta^2)
cos(x) = 2 b sq
sin(x) = delta sq
hence delta = 2 b tan(x).
We have worked-out the most common case – that for |2 b| <= |a - d|. The coding of the algorithm requires formulae for a == d and the |2 b | > |a - d| cases, as well.
Let us see how the lemma applies to a 3x3 symmetric matrix, which we write as S = (a, b, u; b, d, v; u, v, 1). We need to evaluate the similarity transformation with the ortho-normal matrix O = (cos(x), sin(x), 0; -sin(x), cos(x), 0; 0, 0, 1). It is apparent that O S Otr = (a+delta, 0, u’; 0, d-delta, v’; u’, v’, 1), where we have u’ = u’ = u cos(x) + v sin(x) and v’ = v cos(x) – u sin(x). We also need to evaluate the product of the old bra, whose typical row we write as < = (u, v) – the u is in the column of the aforementioned a and the v is in the column of the aforementioned d. It again is apparent that < O = (u’, v’), where the u’ and v’ are as given previously. Of course the actual u and v and the resulting u’ and v’ are those of the bra.
Theorem/algorithm: For a symmetric matrix which is larger than 2x2, apply this lemma repeatedly to the largest-in-magnitude off-diagonal element. How to find the largest element? Maintain a single column of pointers to the position in each row of the largest in magnitude element to the right of the principal diagonal. Update these pointers, as required, after each kill.
Proof: It obviously converges -- by the ratio test -- for the simplified version of applying the lemma to any element whose square is at least as large as the average of the square of the upper elements. There are m = n (n - 1) / 2 elements above the principal-diagonal. Each kill makes the average to no greater than (m - 1) / m times its previous value. This fixed ratio is less than one. QED.
How fast does it converge? We define a sweep as the act of killing m elements. The resulting ratio is ((m - 1) / m)^m --> 1 / epsilon, as m increases without bound. Hence, the time-constant is one sweep. Since each kill requires (8 n) multiplications, a sweep requires (8 n) m = (8 n) (n (n - 1) / 2) = 4 n^3 + lower-order terms multiplications. Thus, the convergence-time is proportional to the cube of n. Not bad, at all. Especially, when we recall that a single multiplication of matrices requires n^3 multiplications. How many sweeps s do we need to provide a precision of b bits? The formula is (1 / epsilon)^s = (2^(-b))^2 = 2^(-2 b). We have to square; because, we are talking about the sum of squares. Solve for s, to obtain s = 2 b ln(2). This is guaranteed for the simplified algorithm. The original (full) algorithm is faster. After the first two or three sweeps at the aforementioned rate, the advantage of killing the largest element manifests itself. The formula becomes approximately [but I do not see how to prove it] (1 / epsilon)^2^s = 2^(- 2 b). Solve for s, to obtain s = ln(2 b ln(2) / ln(2). For the Pentium (R) microprocessors, b = 55. Then, the two formulae give s = 76.2 or s = 6.25 + whatever it takes to get started = 9 or so , respectively.
Each kill reduces the average as average -= b^2 / m. Because of round-off errors, this running value for the average will become imprecise. It is recommended that the average be recomputed at the start of each sweep. Also, it is recommended that during a sweep, the change to the diagonal-elements be kept in a separate vector; thus, providing double-precision for the calculation of the diagonal-elements.
Corollary: The matrix O indeed is ortho-normal. And – if it was obtained by the Jacobi algorithm (but, not any of the multiplication family of algorithms) – it is proper.
Proof: It is constructed as the product of proper ortho-normal matrices. QED.
Lemma: Given a horizontal eigenvector V of a symmetric matrix S and the factorization of S as S = O E Otr, we have the product V O = (0, 0, ..., 0, 1, 0, ..., 0). That is, a horizontal vector whose elements are zero, except for a single one.
Proof: We already know that the eigenvectors are unique (up to degeneracies). By the corollary, we know that these vectors are ortho-normal; because O is the collection of their transposes. Hence, the product of V by the transpose of any other eigenvector is zero and by the transpose of itself is one.
Theorem: Given a horizontal eigenvector V of a symmetric matrix S, the corresponding eigenvalue e may be computed as e = V S Vtr.
Proof: Consider the factorization of S as S = O E Otr. We need to show that e =? V S Vtr = V (O E Otr) Vtr = (V O) E (Otr Vtr) = (V O) E (V O)tr = (0, 0, ..., 1, 0, ..., 0) E (0, 0, ..., 1, 0, ..., 0)tr = ((0, 0, ..., 1, 0, ..., 0) E) (0, 0, ..., 1, 0, ..., 0)tr = (0, 0, ..., e, 0, ..., 0) (0, 0, ..., 1, 0, ..., 0)tr = e. QED. We already had proven this theorem, earlier; but, in a different way.
Theorem: An ortho-normal (or unitary) matrix Q may be approximated by means of the product of elementary rotations.
Proof: Construct an arbitrary positive-definite diagonal matrix E', with the elements in strictly decreasing order. The product matrix S = Q E' Qtr obviously is symmetric. Employ the Jacobi algorithm to factor it as S = O E Otr. By the construction within the Jacobi algorithm, O is the product of elementary rotations. By the Jacobi theorem, O approaches Q. QED.
Theorem: Each eigenvalue of a symmetric (Hermitian) matrix is real.
Proof: Since the Jacobi algorithm converges and since -- as was stated at the beginning -- none of the radicands is negative, the resulting eigenvalue matrix must be real.
While we just might feel lucky that this theorem is true for the symmetric matrix, it is amazing that it also is true for the Hermitian matrix.
[Historical note: Jacobi discovered/invented this algorithm in 1845. It took him five years to demonstrate it on a 7x7 symmetric matrix. It became accepted as the best algorithm for the factorization of a symmetric-matrix. However, a proof of the convergence was elusive. During the decade 1970-1980, for the first time, someone wrote a computer program for the easier simplified version. The convergence became obvious. Finally, the original algorithm was implemented. It converges even faster. Through its history, surprisingly, this algorithm sometimes would diverge. Once it had been implemented as a computer program, it was discovered that an error in the sign of the square-root for the r was the culprit -- attention to details!]
There are 24 options in the case of a symmetric matrix and 48 options in the case of a Hermitian matrix. These extra 24 are ignored when the Jacobi routine is operating upon a symmetric matrix. I provide all of these choices; but, myself, remain agnostic. I leave it up to the user to make the selection.
Finding the next element to kill – Jacobi_style.
0 = the largest, in magnitude, element. This was the way that Jacobi originally proposed.
1 = downward by rows and to the right within each row, find the first element which, in magnitude, is equal to or greater than the average value. When the Jacobi algorithm first was being implemented on a computer, it had been considered too difficult to find the largest element. It is not clear for whom it was too difficult – the computer or the programmer. :-)
2 = from the right-upper corner, by super-diagonals and downward within each super-diagonal, find the first element which, in magnitude, is equal to or greater than the average value. This choice converges significantly faster than the preceding one.
Finally, someone devised a clever way to implement the largest, in magnitude, search. And, insooth, the 0 choice converges significantly faster that the preceding one.
Add 4 – applicable only for a Hermitian matrix, ignored for a symmetric matrix – within each pair of complementary kills, will kill the primary copy first. It makes absolutely no difference, which you kill first.
In finding the angle x, there are two square-roots, each of which may be taken as either positive xor negative – Jacobi_angle.
0 = always take the delta positive. It simply feels good to do so. As a by-product, it will sort the eigenvalues in decreasing order, down along the principal-diagonal.
1 = always take the delta negative. If you want to be different. As a by-product, it will sort the eigenvalues in increasing order, down along the principal-diagonal.
2 = employ the strategy of choosing that sign for delta which results in the smaller, in absolute-value, delta, for each kill. This strategy will cause the least change to the diagonal elements. As a result, it should provide the best precision.
3 = employ the strategy of choosing that sign for delta which results in the larger, in absolute-value, delta, for each kill. This strategy will cause the greatest change to the diagonal elements. If you want to be contrary.
Add 4 to choose the negative square-root in the computation of the sq. As a consequence, each kill will multiply the each of the two rows of the bra by minus one. Since the eigenvectors are axial unit vectors, the eigenvectors remain the same – they just look different. Since each time two vectors are affected, the parity is zero. Hence, the bra stays proper. This choice makes absolutely no difference.
The convergence test criterion – test_Jac.
It makes little sense to have this relative test criterion significantly less than the precision of the computation. The IEEE double-precision has a precision of approximately 10^(-16). I employ the test_Jac value of 10^(-20), just to see how well the routine works. The convergence requires approximately minus the logarithm to the base ten of the precision, this value multiplied by n cubed, multiplications. Thus, if you do not need the ultimate in precision, you may save some time by specifying a larger value.
Jacobi devised his algorithm to factor a symmetric matrix. Both the foregoing theorem/algorithm and the computer program assert -- in their respective hypotheses -- that the given matrix is symmetric (or Hermitian). What would happen if we were to provide a general real (or complex) matrix to the computer program? Suppose further that we were to by-pass the assertion. Then, the computer program would run and produce some output. Because of the manner that this program is written; it still would factor the given matrix A into A = O D Otr. From the way that it is constructed, the matrix O still would be an ortho-normal (or unitary, respectively) matrix. The matrix D is produced by killing each of the elements of A which are above the principal diagonal. However, while the program computes the alterations to the elements below the principal diagonal, it never attempts to kill any of them. Hence, the matrix D would have each of its elements that are above the principal diagonal become zero -- just the definition of a left-triangular matrix. Thus, we would achieve the factorization A = O L Otr.
For a detailed picture of the result, let us analyze formally the consequence of suspending the assertion.
There are four triangular matrices: 1) The lower L and upper U, each of the elements on whose diagonal is one. We already have encountered these in the discussion of the LDU matrix inversion algorithm. And 2) The left L and right R, whose diagonal elements are not constrained. We already have encountered these in the discussion of the QR matrix inversion algorithm.
Theorem: The product of a pair of like triangular matrices is a triangular matrix of like kind.
Proof is obvious.
Definition: The right-Cholesky factorization -- sometimes called "decomposition" -- of a real (or complex) matrix A is A = O R Otr, where O is an ortho-normal (or unitary, respectively) matrix and R is a right-triangular matrix. The left-Cholesky factorization of a real (or complex) matrix A is A = O L Otr, where O is an ortho-normal (or unitary, respectively) matrix and L is a left-triangular matrix. This factorization also is known as "the Shur form".
Lemma: The transpose of a right-triangular matrix is a left-triangular matrix. The transpose of a left-triangular matrix is a right-triangular matrix.
Proofs are obvious.
Theorem/Algorithm: Given a real (or complex) matrix A. The following steps comprise a constructive proof of the existence of the left-Cholesky factorization and establish its uniqueness -- up to the usual degeneracies. The right-Cholesky factorization is analogous.
Temporarily, designate the matrix A as symmetric; so as to be able to cheat in the next step.
Apply the Jacobi algorithm to this matrix. It will yield an ortho-normal (or unitary, respectively) matrix O and a matrix D which it will designate as diagonal.
Observe that the matrix D actually is left-triangular, rather than diagonal. Hence, remove its designation as a diagonal matrix and rename it as L.
Proof is obvious.
Corollary: Given a real (or complex) matrix A. The two Cholesky factorizations are A = O R Otr and Atr = O L Otr, where O is the same ortho-normal (or unitary, respectively) matrix and the left-triangular matrix L is the transpose of the right-triangular matrix R.
Proof: Take A = O L Otr as the left-Cholesky factorization. Take the transpose of each side of this equation, to obtain Atr = (O L Otr)tr = O Ltr Otr. By the lemma, Ltr is a right-triangular matrix. QED.
Collary/Algorithm: To obtain a right-Cholesky factorization of the matrix A:
Transpose the matrix.
Perform the steps of this theorem.
Transpose the L matrix, to obtain an R matrix.
Proof is obvious.
Note: Depending upon the specific implementation, the actual code of the Jacobi program may need minor modification to serve as a Cholesky program.
As promised, the following two theorems and an algorithm constitute a generalization of the first sentence of a previous theorem to an suitably constrained set of matrices.
[I formulated and proved the following theorem during the last week of January 2006. It is a generalization of a portion of the preceding theorem. Hence, while these two theorems overlap, neither subsumes the other.]
Given two symmetric matrices A and B. In the foregoing sections, we have shown that each may be factored as A = O a Otr and B = Q b Qtr, where O and Q are ortho-normal matrices and a and b are diagonal matrices. As a set, the eigen-values a and b are unique. The ortho-normal matrices O and Q then are unique, up to degeneracies.
We already know that the additive inverse of A may be written as -A = O (-a) Otr and the multiplicative inverse of A may be written as Ainv = O ainv Otr. Also, it is obvious that the additive identity (the null matrix) and the multiplicative identity (the identity matrix) each trivially is a symmetric matrix.
Lemma: The sum of any two symmetric matrices is a symmetric matrix. Furthermore, if the two matrices A and B share the same ortho-normal matrix factor O; then their sum matrix is the symmetric matrix A + B = O (a + b) Otr.
Proof is obvious.
Lemma: If the two symmetric matrices A and B share the same ortho-normal matrix factor O; then their product matrix is the symmetric matrix A B = O a b Otr.
Proof is obvious.
Theorem: Any rational algebraic function of a set of symmetric matrices which share the same ortho-normal matrix factor O is the symmetric matrix given by f(A, B, C, ...) = O f(a, b, c, ...) Otr.
Proof: Matrices constitute a division-ring.
Lemma: The diagonal matrices a and b commute.
Proof is obvious.
Corollary: A set of symmetric matrices which share the same ortho-normal matrix factor O constitute a field.
Proof is obvious.
The foregoing theorem speaks glibly regarding the "same ortho-normal matrix factor O". For a given set of matrices,
even if it is possible; it may require some effort to express the matrices so that they share the same ortho-normal matrix factor O.
Start with a non-degenerate matrix -- call it the matrix A. Factor it as A = O a Otr. Then, for each other matrix -- say, B --, write
Otr B O. If this does not diagonalize B; you are out of luck. Otherwise, we obtain Otr B O = b, a diagonal matrix. Pre-multiply
by O and post-multiply by Otr, to obtain B = O b Otr. If each of the matrices are thus diagonizable; you have placed the matrices in the form
required by the hypothesis.
On the other hand, if there is no non-degenerate matrix in the given set; take one with the least degeneracies, as your starting matrix.
Then, you will have to resolve the degeneracies, as you proceed. Good luck!
The following theorem tells when a common ortho-normal factor exists. However, this theorem will not work reliably for any degenerate
matrices.
Theorem: Given two non-degenerate symmetric-matrices A and B. If they share a common ortho-normal factor; then the product (Otr Q) is a permutation matrix.
Proof: By the hypothesis, Q = O. Then, (Otr Q) = Otr O = I, the identity matrix. An identity matrix is a trivial permutation matrix. QED.
Why is this theorem restricted to non-degenerate matrices? Where does its proof fail if either (or both) of the
matrices is degenerate?
The answer to this rhetorical question is very suble. The theorem regarding the uniqueness of the eigen-vectors states that the eigen-vectors
are unique up to degeneracies. Thus, just because A and B share a common ortho-normal factor does not imply that Q = O, when either matrix is
degenerate.
Converse: Given two symmetric matrices A and B. If (Otr Q) is a permutation matrix; then, A and B share a common factor. The proof is constructive.
Proof: By the hypothesis, (Otr Q) = P, a permutation matrix. Then, we may write the matrix B as B = Q b Qtr = (O Otr) (Q b Qtr) (O Otr) = O (Otr Q) b (Qtr O) Otr = O (P b Ptr) Otr = O b' Otr, where b' = P b Ptr is the indicated permutation of b. QED.
This is not an exact converse; because, the preconditions are different.
Contra positive of the theorem: Given two non-degenerate symmetric-matrices A and B. If the product (Otr Q) is not a permutation matrix; then the matrices do not chare a common ortho-normal factor.
Proof: A contra positive of a theorem is equivalent, logically, to the theorem; hence, it follows from the theorem.
Algorithm: Given two matrices A and B. This algorithm will either factor the two matrices so as to posses the same ortho-normal factor O; else, establish that it is impossible to do so.
Verify that they are symmetric: Compute the Frobenious matrix-norm of (A - Atr). It should be zero. In practice, near zero because of round-off errors. Ditto for (B - Btr).
Factor the matrix A -- which now is known to be symmetric -- into A = O a Otr, where O is an ortho-normal matrix and a is a diagonal matrix. Ditto B = Q b Qtr.
By inspection, verify that neither matrix is degenerate. If either matrix is degenerate; this algorithm is inconclusive, hence, inapplicable.
Compute the product matrix P = Otr Q. Of necessity, it is ortho-normal; because, it is the product of two ortho-normal matrices.
By inspection, verify that the matrix P is comprised of the elements {-1, 0, 1}, to within acceptable round-off errors.
Hence, the matrix P is a permutation matrix; because, any ortho-normal matrix whose elements are drawn from {-1, 0, 1} is a permutation matrix. Pray, see the converse of the theorem following the definition of a permutation matrix.
Employ the the construction in the proof of the converse of the foregoing theorem, to factor the matrix B as B = O b' Otr, where O is the ortho-normal factor of the matrix A and b' = P b Ptr is the indicated permutation of the original diagonal factor b of the matrix B.
On a slightly different tack...
Lemma: If a pair of symmetric matrices A and B commute; their product C = A B is a symmetric matrix.
Proof is obvious.
THIS THEOREM -- and hence, its corollary, as well -- IS FALSE!!! See if you can spot the fallacy in its purported proof.
Theorem: If a pair of non-degenerate positive-definite symmetric matrices A and B commute; the largest eigen-value of the product matrix is equal to the product of the largest eigen-values of the individual matrices. The corresponding eigen-vector may be computed from the product of the converged matrices. Pray, see the footnote number 01.
Proof: Hint: Employ either the multiplication or squaring algorithm.
Corollary; The remaining eigen-values and eigen-vectors may be obtained by Mathematical Induction.
Proof is by Mathematical Induction.
The foregoing theorem and its corollary may be generalized to a pair of any symmetric matrices. However, the details become very tedious.
If the original matrices A and B are large and we desire more that approximately twenty eigen-vectors of the product matrix; it is easier and faster to employ the Jacobi algorithm on the product matrix, instead of the method of the foregoing theorem.
Did you spot the error? The limit of a product need not the equal to the product of the limits. So, why retain a false theorem? To demonstrate how easily one may stray from the narrow path.
Footnote number 01.
All right. So, you want the nitty-gritty?
For convenience and by convention, the a and b diagonal matrices are sorted in decreasing order. Since we know what the limit of either the
multiplication or squaring algorithm is, let us just replace the a and b matrices with a null-matrix. Then, place a one into the upper-left
corner of each. With care, it only requires n^2 multiplications to construct each product (O a Otr) and (Q b Qtr). We need only the top
row (or left column) of the product ((O a Otr) (Q b Qtr)), at a cost of n^2 multiplications. Employ the method from the multiplication or squaring algorithm to normalize either of these vectors. The
total is on the order of (3 n^2) multiplications. Finally, partition-off the top row and left column, to deflate each of the original
matrices A and B. Now, we are ready for the Mathematical Induction.
Copyright (c) 2003, 4, 5, 6 by R.I. ‘Scibor-Marchocki. Last revised Friday 17-th March 2006. Typographical error corrected Thursday 23-rd March 2006.