Linear Algebra Part Four

copyright (c) 1999 by R. I. 'Scibor-Marchocki

 

This is a collections of e-mail letters, which I wrote to my very good friend Deb, as a set of lessons in Linear Algebra. I have deleted each salutation, to protect her privacy. If anybody shows interest, someday, I might edit these lessons into a better organized tutorial. So, please write me at webmaster@rism.com.

 


Jacobi -- a 2x2 matrix

Jacobi invented his algorithm for finding the factorization of a symmetric matrix A

A = O*D*Otr

in 1845. However, until the advent of computers, few people had the patience to perform the computation. Thus, it was only about 1972 that an attempt was launched to implement this algorithm as a computer program. It took until 1980 to implement the whole algorithm in a satisfactory manner. Hence, expect many lessons on this subject. After all, it took eight years of effort to understand the algorithm.

Jacobi's insight was that since the ultimate goal is this factorization, one should work towards it directly.

While, in principle, a 3x3 or even a 4x4 symmetric matrix can be solved analytically; it is not practical to do so.

Sophomore Calculus textbooks, in the third semester, present the Jacobi algorithm for a 2x2 symmetric matrix A = (a b b c). While a purist would do it algebraically, since we know Trigonometry, we will do it the easier way. Take the ortho-normal matrix O = (cos(phi/2) sin(phi/2) -sin(phi/2) cos(phi/2), with phi in the semi-closed interval (-pi pi]. Solve the factorization

D = Otr * A * O

Multiply it out. We know that D will be a symmetric matrix. Set the right-upper corner equal to zero. Solve for phi. Substitute back into the O. Also, substitute into the two diagonal elements of D. You may want to check, substitute into the off-diagonal element of D and see that it indeed is zero.

Well, do it a step at a time. Multiply it out (a(cos(phi)+1)/2 - b sin(phi) + c(cos(phi)-1)/2 (a-c)sin(phi) + b cos(phi) (a-c)sin(phi) + b cos(phi) a(cos(phi)-1)/2 + b sin(phi) + c(cos(phi)+1)/2)

Now, finally, you see the foresight we had in employing phi/2 -- instead of phi -- in the skew-symmetric matrix, when we discussed the bilinear transformations?

The next step is to solve for phi the equation

0 = (a-c)sin(phi) + b cos(phi)

There are two precautions. Of the two possible quadrants, we take that one for which

sin(phi) = -b/d and cos(phi) = (a-c)/d

And, we compute the square-root -- taking into account that b is small in magnitude --

d = sqrt((a-c)**2 + b**2) = (a-c) sqrt(1 + b**2 / (a-c)**2)

Remember, way back when we presented -- as part of the Trigonometry -- some miscellaneous Algebra? Why do you think we told you? You are not allowed to forget *anything*. Once you are told, you *know* it!!!!! You better. -)

This gives

cos(phi) = 1 / sqrt(1 + b**2 / (a-c)**2)

which is nearly one. And

sin(phi) = b / ((a-c) sqrt(1 + b**2 / (a-c)**2))

which, when it will be multiplied by b, will become very small, in magnitude.

I leave it to you to finish. Be careful with the quadrant in which phi/2 falls. Check its placement by computing sin(phi) = 2 sin(phi/2) cos(phi/2)

One more question Can (a-c) be zero? After all, we blissfully indicated division by it in several steps. No, it cannot be zero, because, if it were; then the matrix would be degenerate -- its two eigen-values would be equal. And, by hypothesis, we stipulated that the matrix be non-degenerate.

So much work -- and this is just a 2x2 matrix. -)


Jacobi -- the sharp pattern

Follow-up of the 2x2 lesson. Did you compute the sin(phi/2) as

sqrt((1-xos(phi)*(1+cos(phi))

instead of

sqrt(1 - (cos(phi))**2)

How many times must I remind to watch the details?

Speaking of details. It is vital that you place the angles in the appropriate quadrants. Otherwise, at the least, the algorithm will not converge. At the worst, you will make an inconsistent hopeless mess!

While we had employed trigonometric notation, we never computed any angle. All of our computations had been algebraic. So, the solution is strictly algebraic. Only guided by Trigonometry.

 

Now, let us consider an nxn symmetric matrix A, with n strictly greater than two.

Select an off-diagonal element to kill. ("Kill" is the traditional terminology.) Since the matrix is symmetric, we may as well consider only the upper-right triangle. The strategy for this selection will be the of some future lessons.

Take an ortho-normal matrix, which is an identity matrix, except for the four elements The candidate for killing, its symmetric counterpart in the lower-left triangle, and its two projections upon the principal diagonal. Into this set of four positions, place the elements of the 2x2 ortho-normal matrix discussed last time.

Operate upon the matrix A in-situ. Perform the previous transformation. The four designated elements of the matrix A will become modified as previously.

Now, the sharp pattern. Pre- and post-multiply by the ortho-normal matrix (or its transpose, where required). Only the sharp pattern, through the four designated elements, changes. So, compute only these elements -- and do *not* re-compute the four designated-elements.

There are three or four further things we have to compute.

Do you remember how one computes the Frobenious matrix norm? Do you even remember what a Frobenious matrix norm is? How can we progress, when we have to review all of the time. The most important lesson my Father ever taught me -- and early on -- was that he *never* would repeat a lesson. Learn it the first time! I am not going to help you here. You will have to find our discussion of the Frobenious matrix norm, and review it yourself. -)

Jacobi did not tell us this -- we had to discover it ourselves. We need the sum of the squares of the off-diagonal elements of the current matrix A. For the most part, we can update it by subtraction of the (square of the before an element less the square of the after an element), summed over the off-diagonal elements of the sharp pattern. The whole thing has to be computed initially and any time that the updated value (because of the accumulated round-off errors) becomes negative or smaller than the precision of the computation, relative to the previous re-computed value. We stop, when the re-computed value becomes small enough.

Jacobi did not tell us this, either -- we had to discover it ourselves. We need the location, in each row (or column), of the largest -- in absolute value -- off-diagonal element. We need the location of the row (or column) of the largest off-diagonal element. Again, these should be updated only per the sharp pattern.

This is optional, only if we want the eigen-vectors. Initialize it as an identity matrix. Keep multiplying by each of the ortho-normal matrices. Update only the sharp pattern. Careful, they do *not* commute! Which is it we want? Post multiply by the individual ortho-normal matrices.

Jacobi did not tell us this, either -- we had to discover it ourselves. Also, we do not modify the diagonal values. We accumulate the changes in a separate vector. Only when we are finished, we add this separate vector to the original diagonal. In this manner, we obtain an effective double-precision in our calculation.


Algebraic Topology

Mostly, we are talking about nxn matrices, with n a finite integer.

However, except for some of the algorithms, all that we say extends to infinite n -- either denumerable or even the power of the continuum. In the latter case, summations have to be replaced by Lebesque-Stieltjes integrals, of course.

Any time that n becomes infinite, we have crossed over to Algebraic Topology.

The ortho-normal (unitary) matrices of infinite order n provide a very concise, insightful approach to Fourier series.

The main difference is that instead of attempting to factor a matrix A into O * D * Otr, we begin with a known ortho-normal matrix. Then study the properties of this product. Sometimes, if the given matrix O is not ortho-normal, we will have to employ the Gramm-Schmidt ortho-normalization algorithm.

But, many books have been written about Algebraic Topology. We might write some lessons, after we finish the Linear Algebra.


Jacobi -- strategy

Diagonal. Without an adjective, diagonal means the principal-diagonal Begin at the left-upper corner and go down, to the right, at an angle of minus pi/4. Upper-diagonal begins at the element just to the right of the left-upper corner and goes in the same direction -- it is just above the principal-diagonal. Lower-diagonal begins at the element just below the left-upper corner and goes in the same direction -- it is just below the principal-diagonal. Tri-diagonal is all of the elements in the union of the preceding three diagonals.

 

Where are we with the Jacobi algorithm for factoring a symmetric matrix?

In the mid 1840's, Jacobi invented and demonstrated his algorithm.

Usually, we kill the largest -- in magnitude -- remaining element. However, in any matrix larger than 2x2, some other off-diagonal elements increase, in magnitude. Especially annoying is the resurrection of previously killed elements. It is not apparent whether we are making any overall progress.

In 1972 -- one-and-a-half centuries later -- we *still* do not have a convergence proof. Our -- very limited; because it is difficult by hand -- experience with this algorithm shows that *usually* it converges rapidly; however, occasionally it trashes among several apparently good solutions. Sometimes, it fails to converge. No one has been able to write a computer program for this algorithm -- another eight years will pass before a program becomes available. It is a surprisingly *difficult* algorithm!

Without the confidence that a convergence-proof would provide us, we are unable to distinguish computational errors, programming errors, poor strategy, and possibly inherent failures of the algorithm.

In previous lessons, we already have described all of the computations. What we lack is a viable strategy for selecting elements to kill. When we kill an element, it, of course, becomes zero. All elements not on the sharp-pattern remain unaltered. The other off-diagonal elements of the sharp-pattern change -- some increase in magnitude.

A heuristic would be to kill the elements along the upper-diagonal. Since their sharp-patterns do not intersect the previously killed elements, we will obtain an upper-diagonal with all zero's. But, some of the remaining off-diagonal elements have become larger, in magnitude. Have we made any overall progress? Even if we have in the case of a particular matrix, will there be progress in *every* matrix we may attempt in the future?

And we still are left with many non-zero elements. When we kill any of them, the upper-diagonal zero's are destroyed. Are we making *any* progress? ???

While by hand, it is easy to spot the largest -- in magnitude -- element; for a computer, a brute-force search is slow It requires visiting (n*(n-1)/2) elements.

Let us adapt a strategy which favours speed of selection over efficacy of the killing and the side-effects on the sharp-pattern. Maintain the sum of the squares of the off-diagonal elements. Scan along the upper-right triangle, in whatever pattern is fastest to do so. Kill any element whose square exceeds the average of the aforementioned sum.

What have we to loose? It might work. After all, we do not know of any strategy which has a convergence proof. Hence this strategy might not be so very undesirable. Let us try.

For the first time, we have something that we succeed to implement as a computer program. It *does* work -- and very well. But, still no convergence-proof as a guarantee.

However, it is an easier algorithm, than Jacobi's original. With effort, we succeed in establishing strictly monotonic convergence for this algorithm.

It becomes obvious that Jacobi's strategy of killing the largest -- in magnitude -- remaining element can be guaranteed to have at least as fast a convergence. After all, it conforms to the requirement of killing only those elements which exceed the average.

Now, encouraged and motivated, someone realizes that by placing the magnitudes of the elements into a data-base, one can find the largest, in magnitude, much faster than by the brute-force search. It becomes practical to implement the original Jacobi's strategy.

This insight took eight years -- until 1980 -- and many people mathematicians, computer scientists, and programmers.


Jacobi -- degenerate

A matrix is said to be *singular* iff (= if and only if) it possesses (at least) one eigen-value equal to zero.

A matrix is said to be *degenerate* iff is possesses (at least) one pair of repeated eigen-values.

Unfortunately, one cannot just look at a matrix and tell if it is either of the foregoing. Only if the matrix has been specifically constructed from known pieces, does one know. Otherwise, one infers from the failure-mode of a program, which implements an algorithm with a known inability to handle a specific type of matrix.

The set of programs which implement my squaring algorithm actually analyze a matrix and report its structure. This analysis is its surviving value.

Most algorithms for the inversion of a matrix are unable to handle a singular matrix. With great effort, there are known work-arounds, however.

Many algorithms for the eigen-pairs of a matrix are unable to handle a degenerate matrix. They usually also fail for a singular matrix.

Thus, there is a precedent for difficulties with either singular or degenerate matrices.

More than a century of frustration without a convergence proof for the Jacobi algorithm lead to a very pessimistic attitude when finally a convergence proof was formulated. The hypothesis excluded both singular and degenerate matrices.

Soon, it was observed that the Jacobi algorithm handles singular matrices, in stride. A careful inspection of the convergence proof confirmed that the restriction against singular matrices is not needed.

It also was observed that occasionally, the Jacobi algorithm actually *does* factor degenerate matrices. How can it possibly do so ever? After all, we saw that for a 2x2 degenerate matrix, we would be dividing by zero -- since a and c are the same, a-c is zero.

Let us look at a *degenerate* symmetric 2x2 matrix more carefully. The symmetric matrix A is (a b b c). Then A-xI = (a-x b b c-x). Its determinant -- which we set equal to zero to find the eigen-values -- is

0 = (a-x)*(c-x) - b*b = x**2 - (a+c)*x + (a*c-b*b)

For a degenerate matrix, the two roots must be equal, by definition. Thus, we set its discriminant equal to zero

0 = (a+c)**2 - 4*(a*c-b*b) = (a-c)**2 + 4*b*b

Since these are real numbers, each term individually must be zero. Thus, we have

c = a and b = 0

Substituting back into the matrix A, we have (a 0 0 a).

Surprise, the matrix *already* is factored -- before we did anything to it!

A = (a 0 0 a) = (1 0 0 1) * (a 0 0 a) * (1 0 0 1)tr

Now, we see that, since with our improved algorithm, while we strive to kill the *largest*, in magnitude, element; we are not obliged to go after the largest -- any one which is above average will do. Thus, if we skip attempting to kill any elements whose c is equal to a, we will not need to attempt dividing by zero.

By the time that all of the elements corresponding to non-degenerate eigen-values have been killed, those corresponding to the degenerate eigen-values will have become zero by themselves! Hence, the algorithm actually converges *faster* for a matrix which is degenerate. Now, we can remove the restriction against degenerate matrices from the hypothesis.

As a result, the final form of the Jacobi algorithm can handle *any* symmetric matrix.

How fast is it? To select an element for killing, we have to search n rows. The killing operation is an intricate computation; but, independent of the size or complexity of the matrix -- thus it is a constant. Then, the sharp-pattern computation requires several times n multiplications. Finally, one pass through each of the elements in the upper-right triangle requires n*(n-1)/2 killings. This is a total of some constant times n to the fourth power.

In practice, we observe that the killing increases in effectiveness rapidly. By the time we near the completion of the first pass, already the interactions in the sharp-pattern are decreasing. Usually, convergence occurs before we finish the second pass.

Thus the Jacobi algorithm converges in O(n**4) time. This is the official notation for saying that some constant times the fourth power of n multiplications are required.

For reference, the Gaussian-elimination, the Gauss-Jordan, and the LDU (= Lower, Diagonal, Upper) algorithms for the inversion of a matrix run in O(n**3) time. Likewise matrix multiplication requires exactly n**3 multiplications. Thus, we see that the Jacobi algorithm requires only one-higher power of n.


The transpose of a product is the product of the transposes, taken in reverse order.

It occurs to me, that we have not shown this

Theorem The transpose of a product is the product of the transposes, taken in reverse order. Given two matrices A and B, and their product C = A*B. The transpose of C is

Ctr = (A*B)tr = Btr * Atr

Proof Consider 2x2 matrices. A=(a b c d) and B=(e f g h). Whichever way you compute Ctr = (ae+bg af+gh ce+dg cf+dh). Then, by Mathematical Induction, employing our recursive partitioning of a matrix. QED.

 

Study of the properties of eigen-values and eigen-vectors.

Textbooks do so directly from the definition. This approach is difficult theoretically; because it is difficult to obtain an eigen-vector from an eigen-value. It is not practical; because determinants are too slow to compute. And, the round-off errors of polynomials make any with n greater than about 12 impossible with just double-precision.

In the 1960's, the Jacobi algorithm was not understood, as yet.

I knew that the multiplication algorithm *can* solve any symmetric matrix into its eigen-pairs. I had invented the much-faster -- but, closely related -- squaring algorithm. Textbooks do not describe these algorithms in any detail; but, as you remember, I was not using any textbook; so it did not matter. I elected to study eigen-pairs; based upon the squaring algorithm. Someday, I might provide lessons.

But, now that the Jacobi algorithm is understood, it is the easiest -- and usually fastest -- algorithm. Let us study eigen-p[airs based upon the Jacobi algorithm. This will be all new material -- invented as we go. It will be an adventure.

But, it is late. And I have to inspect Wolczica for the ticks!


The set of eigen-vectors is orthogonal

First, some determinants. We may have had some of these previously; but, in any case, it will be convenient to have all of these together, here.

Lemma The determinant of a diagonal matrix is the product of its elements.

Corollary The determinant of a diagonal matrix is the exponential of the trace of the logarithm of the matrix. Generalize to using a Lebesque-Stieltjes integral instead of a summation, to compute the trace. As will become evident from the discussion of the Jacobi algorithm, this corollary -- and its lemma -- applies to a symmetric (Hermitian) matrix, as well.

Corollary The determinant of an identity matrix is one.

Theorem The determinant of the transpose of a matrix is equal to the determinant of the matrix. Proof Take a 2x2 matrix. Compute it both ways. Employ Mathematical Induction upon our recursive partitioned matrices.

Theorem The determinant of a product of two matrices is equal to the products of their individual determinants. Proof Take a pair of 2x2 matrices. Compute it both ways. Employ Mathematical Induction upon our recursive partitioned matrices.

Corollary The determinant of a product of many matrices is equal to the products of their individual determinants. Proof by Mathematical Induction.

Corollary The determinant of the inverse of a matrix is equal to the reciprocal of the determinant of the matrix. Proof follows from the definition of an inverse of a matrix A * (inverse of A) = I.

Theorem The absolute value -- in the case of a unitary matrix, this absolute value is in the sense of complex numbers -- of an ortho-normal (unitary) matrix is one. Proof follows from O * Otr = I.

------------------------------- this is for an ortho-normal matrix, only -------------

Corollary The determinant of an ortho-normal matrix is either plus or minus one, with one of these values being extraneous.

Definition The ortho-normal matrix is said to be *proper* iff (= if and only if) its determinant is plus one. Otherwise -- that is, iff its determinant is minus one -- the ortho-normal matrix is said to be *improper*.

Theorem The ortho-normal matrix which we obtained from a skew-symmetric matrix is proper. Proof follows from the Trigonometric identity (cos(phi/2))**2 + (sin(phi/2))**2 = 1.

Theorem The product of two -- and by Mathematical Induction, of any even amount of -- improper ortho-normal matrices is a proper ortho-normal matrix.

From the choice of terminology, one may observe a bias against an improper ortho-normal matrix. And with good reason. Had we employed improper ortho-normal matrices in the Jacobi algorithm, it would not have converged. Attention to details!

So, what is an improper ortho-normal matrix, anyhow? Replace any one -- or odd amount of -- row (or column) by its negative. For instance, the 2x2 ortho-normal matrix (cos(phi/2) sin(phi/2) -sin(phi/2) cos(phi/2)) becomes (-cos(phi/2) -sin(phi/2) -sin(phi/2) cos(phi/2)). It may be verified that its determinant is minus one; but that O * Otr = I, still. If we replace an even amount of rows (or columns) by each of their respective negative, we obtain a proper ortho-normal matrix; but a rather *peculiar* looking one.

What happens if we replace a single row (or column) in a unitary matrix? Considered as an ortho-normal matrix, is indeed is an improper ortho-normal matrix. However, it is not isomorphic to a complex matrix; hence, is not a unitary matrix. Thus, we see that a unitary matrix is, of necessity, proper.

----------------- end of symmetric matrices -------------------

The Jacobi algorithm provided us with a factorization of any symmetric (Hermitian) matrix A as

A = O * D * Otr

And, from the foregoing, we may write I = O * Otr, where we are employing the *same* ortho-normal matrix, as that which the Jacobi algorithm provided.

Substitute into the expression, to obtain

A - xI = O * D * Otr - x O * Otr

Take its determinant

det(A - xI) = det(O) * (product of ((each element of D) - x)) * det(Otr)

This simplifies to the product of ((each element of D) - x). This is know as the eigen-value polynomial. Its roots are the eigen-values. Thus, we see that the Jacobi algorithm has provided us with the eigen-values of the matrix.

Definition The corresponding eigen-vector -- or set of eigen-vectors, in the case of a degenerate eigen-value -- is the set of vectors spanning the null-space of the matrix A - xI, for the given eigen-value x. By convention, we normalize -- make their norm equal to one -- these eigen-vectors. And -- also by convention -- in the degenerate case, we employ the Gramm-Schmidt ortho-normalization algorithm, to ortho-normalize the set of eigen-vectors. It is arbitrary which side we employ; however, in keeping with our choice of taking the O matrix on the *left* side in the factorization provided by the Jacobi algorithm, we take the left eigen-vectors.

Theorem The O matrix of the Jacobi algorithm is the set of eigen-vectors -- each corresponding to the eigen-values in the diagonal matrix D. Proof is obvious. Thus, we have established that the set of eigen-vectors is orthogonal. This is a difficult proof from either the squaring algorithm -- which is how I did it in the mid 1960's -- or directly from the definition -- which is the traditional method, employed by most textbooks.

Corollary The eigen-vector corresponding to a given eigen-value is unique. The set of eigen-vectors -- considered as the space which it spans -- corresponding to a degenerate eigen-value is unique; however, the individual eigen-vectors are arbitrary.

Corollary The principal curvatures of a Riemannian geometry are ortho-normal, since they are the eigen-vectors.

Corollary The principal axes of moments of inertia are ortho-normal, since they also are the eigen-vectors.

Dynamics portion of Mechanics. A solid object will wobble in its rotation, except when it rotates about one of its principal axes of moment of inertia.

Application The static balancing of a wheel consists of translating its center of mass to the desired axis of rotation. The dynamic balancing of a wheel, which already is balanced statically, consists of rotating two of the principal axes of moments of inertia to coincide with the desired axis of rotation.

Observe, that the Jacobi algorithm, as presented in previous lessons, has to be constrained not to confound the eigen-values between the two sub-matrices along the principal diagonal, when we apply it to the real isomorph of the Hermitian matrix. Attention to details, once again!

 

Have you solved the challenge to find the "twice" matrix inversion? You are taking *much* longer that I did. -)

We will postpone the discussion of degeneracy of a symmetric (Hermitian) matrix. That study requires the multiplication -- or any one of its relatives -- algorithm. We should discuss degeneracy; because, it is vital to the connection with Group Theory. And hence with Quantum Mechanics.

Instead, next time, we will proceed to the study of matrices which are *not* symmetric (not Hermitian).



webmaster@rism.com
Last modified on Thursday 22-nd July 1999
Copyright (c) 1999 by R. I. 'Scibor-Marchocki