copyright (c) 1999 by R. I. 'Scibor-Marchocki
This is a collections of e-mail letters, which I wrote to my very good friend Deb, as a set of lessons in Linear Algebra. I have deleted each salutation, to protect her privacy. If anybody shows interest, someday, I might edit these lessons into a better organized tutorial. So, please write me at webmaster@rism.com.
In the factorization theorem A = <S>, I had asserted that the ket is an ortho-normal (unitary) matrix. I had not provided a proof; because I considered it obvious. However, if you want a proof, here it is
We need to show that > * >tr = I. From the definition of the ket,
> * >tr = ((1/S) * Otr * A) * ((1/S) * Otr * A) =
= ((1/S) * Otr *A) * (Atr * O * (1/S)) =
= (1/S) * Otr * (A * Atr) * O * (1/S) =
= (1/S) * Otr * (O * D * Otr) * O * (1/S) =
= (1/S) * (Otr * O) * (S * S) * (Otr * O) * (1/S) =
= (1/S) * (S * S) * (1/S) =
= ((1/S) * S) * (S * (1/S)) =
= I * I = I QED.
By the way, the inverse of a diagonal matrix is obtained by replacing each non-zero element by its reciprocal. The zero elements remain zero and correspond to the null-space of the matrix.
An elementary permutation matrix is an identity matrix, with the square (1 1 -1 1), placed with its diagonal on the diagonal of the identity matrix. This matrix may be obtained by setting phi = pi, in the Jacobi algorithm. From the Jacobi algorithm, it is apparent that such a permutation matrix permutes one pair of adjacent rows or columns. In particular, operating upon a diagonal matrix, a pair of adjacent elements is permuted -- the matrix remains a diagonal matrix.
An elementary improper matrix is an identity matrix, with one of its elements replaced by a minus one. Given such a matrix p, it is obvious that the product p * D * ptr = D.
Is the factorization of a given real (complex) matrix A into bra times spectrum times ket unique? Even if we constrain the spectrum to be a diagonal matrix, there still are several common factorizations possible. For instance, the LDU (= Lower Diagonal Upper) algorithm for the inversion of a matrix factors the matrix into A = L * D * U, where L is a lower-left triangular matrix; that is, the diagonal is all ones and the upper-right triangle is all zeros. The U is an upper-right triangular matrix; that is, the diagonal is all ones and the lower-left triangle is all zeros. And D is a diagonal matrix; that is all of the off-diagonal elements are zeros.
Theorem The factorization A = <S> -- where the bra and the ket are ortho-normal (unitary) matrices and the spectrum is a diagonal matrix -- is unique, up to the following transformations Let p be a product of any arbitrary given elementary permutation matrices and elementary improper matrices. Then p * ptr = I. and ptr * p = I Write A = <S> = < * (ptr * p) * S * (ptr * p) * > = (< * ptr) * (p * S * ptr) * (p * >) = (< * ptr) * S * (p * >). Now, we have a new equivalent factorization.
Since the spectrum S had been obtained as the square-root of the absolute value of a diagonal positive real matrix, each of the elements of the spectrum may be either positive or negative. Thus, as a set, the spectrum is unique, in absolute value. We could adapt the conventions that each of the square-roots is the positive one and that we sort the spectrum in decreasing order. This is convenient in the matrix employed in Information Theory. However, for a complex matrix, the two halves of the diagonal have to be the same; hence, we will have each value with an even degeneracy and will have to split these pairs between the two halves, into corresponding positions.
Furthermore, if the matrix A is degenerate -- any values in S repeat, in absolute value -- then the corresponding degenerate portion of the bra -- and thus the corresponding portion of the ket -- are an ortho-normal (unitary) basis, which spans the degenerate space. We make extensive use of such a degeneracy in the factorization of the matrix of Information Theory.
Inversion of a matrix. That mathematician must have stumbled on the twice algorithm for the inversion of an ill-conditioned matrix. He felt -- and rightly so -- that the chance of anyone else ever *stumbling* across the same algorithm is negligible.
However, I had an advantage. While I did not have the Jacobi algorithm, any other algorithm for a symmetric matrix will do. I had been using my squaring algorithm. With my experience with the foregoing factorization theorem, I could construct the twice algorithm *systematically*.
Given the factorization of a real (complex) matrix A into the product of the bra times the spectrum times the ket A = <S>, its inverse is inverse of A = inverse of (<S>) = (inverse of >) * (inverse of S) * (inverse of <) = >tr * (1/S) * <tr. This inversion works for even a singular matrix -- the null space is handled correctly. Since it works for a singular matrix, it will work for any ill-conditioned matrix, as well. The only disadvantage is that it is slow O(n**4), compared to most good matrix inversions at O(n**3).
Well, let us just write the factorization A =- <S>; but treat it *analytically*, without performing the factorization explicitly; that is, numerically. The bra and the ket each is ortho-normal (unitary). Thus, their spectra are the identity matrix -- ill-conditioning ratio of one, as good as it gets. Hence, the numerical inversion of the bra or ket will not result in any degradation of precision. It is in the *spectrum* that the ill-conditioning of A resides. In isolation, we can invert the spectrum without any degradation of precision -- that is why the discussion in the previous paragraph had no degradation.
However, when an ill-conditioned spectrum is multiplied by anything, the ill-conditioning becomes confounded with the elements of that anything. Then, we get into trouble attempting to invert the result.
Let us try any usual algorithm for the inversion of a matrix. The Gaussian-elimination is best; but, it is difficult to code. The Gauss-Jordan is almost as good; but only a little easier to code. The LDU algorithm is only a little slower; but much easier to code. Hence it is the most popular. Like I said, use any one of these or of several others, which are available. Compute the inverse of A and call the *computed* -- approximate -- result B. Since the bra and ket invert precisely, we have B = computed inverse of (<S>) = >tr * (approximate value of 1/S) * <tr.
Multiply C = A * B = (< * S * >) * (>tr * (approximate value of 1/S) * <tr) = < * S * (> * >tr) * (approximate value of 1/S) * <tr = < * (S * (approximate value of 1/S) )* <tr = < (approximately the identity) * <tr.
Since the (approximately the identity) is well-conditioned, the matrix C will invert precisely, employing any of the usual algorithms. Call the result D = inverse of C = inverse of (A * B) = (inverse of B) * (inverse of A). Since C was well-conditioned, this inversion has been performed precisely.
Now, multiply B * D = B * ((inverse of B) * (inverse of A)) = (B * (inverse of B)) * (inverse of A) = I * (inverse of A) = inverse of A. This result also is very precise.
We have performed two inversions and two multiplications, for a total of four times n**3 multiplications, if we employed the Gaussian-elimination algorithm. No more than eight times n**3 multiplications, if we employ some of the other usual inversion algorithms. This time -- still O(n**3) -- is a small price to pay for what amounts to perfect inversion of even the worst possible ill-conditioned matrix.
So there you have it. The "twice" algorithm for the inversion of a matrix.
From extensive experience and theoretical considerations, the worst possible meaningful ill-conditioning has to be a little less than the precision of our calculation. Otherwise, the spectrum is confounded with the bra and the ket to an extent that it is lost in the round-off errors of just expressing the matrix A = <S>. Construct a matrix with known bra, spectrum, and ket. Then the product >tr * (1/S) * <tr is the *exact* inverse, up to the capability of the given precision to display it. Multiply the A = <S> by this inverse to obtain E = A * (exact inverse of A) = (<S>) * (>tr * (1/S) * <tr). Compare it to the identity by employing the Frobinious matrix norm of (E - I). If this is not the precision of the calculation, then the ill-conditioning is so large that computations are impossible. Also, compute the quality of the computed inverse as the Frobinious matrix norm of (A * (B * D) - I). We find that both of these break from the precision of the calculation to around one, at the same time -- as the ill-conditioning approaches the precision of the calculation.
Thus, the twice algorithm provides a precise -- the precision of the calculation -- result, whenever any calculations are meaningful.
I never saw this algorithm -- or any other algorithm that is any good -- for ill-conditioned matrices, in any textbook. I wonder if this algorithm would be of any interest -- or use -- to anyone? Would anyone even just be interested? Well, the saying goes, "Invent a better mouse-trap and the world will beat a path to your doorstep!" I doubt it. But, each of us will post it to the Web. We will see what happens. Will anybody who might decide to utilize this algorithm be ethical enough to let us know?
In Information Theory, there is a problem known as the "three channels in cascade problem". The matrix that needs to be factored is known to be positive symmetric; hence, it is square. It is partitioned into four sub-matrices, with the two on the principal diagonal being square. They ordinarily would be of the same size; but, need not be.
We may write this matrix as (A1 A3 A3tr A2). Then, it follows that A1 and A2 are positive symmetric -- and, by hypothesis -- square.
With the previous theorem for symmetric (Hermitian) matrices, we may factor the A1 = O1 * D1 * O1tr and A2 = O2 * D2 * O2tr. The diagonal matrices D1 and D2 are positive, by the original hypothesis on the original large matrix. Let S1 = sqrt(D1) and S2 = sqrt(D2).
Let A4 = (1/S1) * O1tr * A3 * O2 * (1/S2) = < S > be factored, as shown, by means of the theorem for the real (complex) matrices. We *have* to employ this theorem; because all that can be said of the matrix A4 is that it is real.
Compute A3 = O1 * S1 * < * S * > * S2 * O2tr
Substituting into the original matrix, we obtain (O1 * S1 * S1 * O1tr O1 * S1 * < * S * > * S2 * O2tr O2 * S2 * >tr * S * <tr * S1 * O1tr O2 * S2 * S2 * O2tr). Its obvious factorization is into the large bra = (O1 * S1 * < 0 0 O2 * S2 * >tr), the large spectrum = (I S Str I), and the large ket = (<tr * S1 * O1tr 0 0 > * S2 * O2tr).
The gist of the matrix is the small spectrum S. The interaction with other devices -- represented by other like matrices -- is embodied in the large bra and the large ket. Actually, they are each others transpose -- as one would expect -- and indeed, of necessity are because the original large matrix by hypothesis is symmetric.
Obtaining this factorization was my motivation for the study and research of Linear Algebra.
This theorem has profound ramifications in Statistics, Quantum Mechanics, and General Relativity, to name some obvious immediate applications.
However, it will take something like a thousand typewritten pages to present Information Theory, to lead up to this matrix. Without this background, I admit that the matrix looks enigmatic.
This lesson finishes the main-track of Linear Algebra. There are some side-issues, which are very interesting, in their own right. We may take them up, at some later time.
Did you survive? How did you like the lessons in Linear Algebra? Any comments regarding bras?
Did I miss anything -- that should be described as pre-requisite material? Does anything need better, more detailed, or more extensive explanation? Which specific lesson should be expanded, perhaps?
Will you be interested in possible future descriptions of some of the other specific algorithms eigen-values for the determinant definition and the eigen-vectors as the corresponding null-space, the degenerate structure as revealed by the multiplication family of algorithms and my squaring remember of that family, the various algorithms for inverting a (well-conditioned) matrix (Gaussian elimination, LDU, and some others), several other algorithms for finding the eigen-values and eigen-vectors of a symmetric matrix? Which, if any, interest you?
The relation of the 3Ch theorem -- and the factorization of that matrix -- to Statistics factor analysis (principal factors and their loadings), principal correlation and regression? We know which tools a user possesses from that which he can disassemble and reassemble. Thus we know that Statisticians do *not* posses this factorization theorem -- and the computer program implementation thereof -- because they cannot perform principal regression on a data which had been subjected to principal factor analysis.
It is trivial in one dimension. Any fool can perform multi-dimensional regression. Unfortunately, it makes little sense unless both the principal factor analysis and the transformation to the principal regression had been performed first. Statisticians never do so, for lack of the required tools.
webmaster@rism.com
Last modified on Thursday 22-nd July 1999
Copyright (c) 1999 by R. I. 'Scibor-Marchocki