and
In the Complex chapter, we will show how to extend this Real chapter to be applicable to a complex matrix. At this time, however, we consider each matrix to be real. This same comment applies to each of the succeeding chapters, until we arrive at the Complex chapter.
We have shown already that any similarity transformation of an arbitrary diagonal matrix is symmetric. Hence, for a non-symmetric matrix A, the factorization A = O E Otr is impossible, contrary to the assertion in some textbooks. Something has to give. At least one of the three factors has to be different. For instance, the LDU matrix-inversion algorithm (This is a forward reference. However, there is no resultant logical violation; because, we do not make any use of this concept here.) factors the matrix into A = L D U, where D is a diagonal matrix; but, the L and U each is a triangular matrix. And the Gramm-Schmidt ortho-normalization algorithm factors the matrix A into an ortho-normal (or unitary) matrix O and a right-triangular matrix R as A = O R.
[Historical note: While for the case of a symmetric matrix S, the foregoing nomenclature of eigen-values and eigenvectors is accepted, for the case of a non-symmetric matrix A, there is no accepted consistent nomenclature. Even within a single textbook, there may be inconsistencies. The situation becomes even more confusing for a complex matrix. Thus, to avoid any conflicts, I refrain from the use of "eigenvalue" or "eigenvector" in the following discussion.
There are several possibilities. For instance, P. A. M. Dirac proposed the factorization of the real (complex) matrix A as A = < D > . He named the ortho-normal (unitary) matrix < the "bra" and the ortho-normal (unitary) matrix > the "ket". He also named the diagonal D “spectrum”. He struggled for more than a decade; but, was not successful in discovering/devising an algorithm to accomplish such a factorization. Around 1961, I discovered/devised and proved the following algorithm to accomplish this factorization for a real matrix A. As will be shown in the Complex chapter, the same algorithm also factors the complex matrix.]
Lemma: For any arbitrary non-singular matrix B, let C = Binv A. Then, the product B C = A.
Lemma: Given an arbitrary ortho-normal matrix bra < and an arbitrary non-singular diagonal matrix D, let the corresponding matrix ket > be > = (< D)inv A = (1 / D) <tr A Then, by the preceding lemma, we obviously have (< D) > = < D > = A. With our convention that the reciprocal of zero is zero, this result is true even for a singular matrix D.
Definition of folding of a matrix: The product A Atr is said to be the folding of the matrix A.
Lemma: The product matrix (that is, the folding of the matrix A) S = A Atr is symmetric.
Proof: 0 =? S - Str = (A Atr) - (A Atr)tr = A Atr - A Atr = 0. QED.
Temporarily, assume that the given matrix A is non-singular.
Definition: A symmetric matrix is said to be positive if each of its eigenvalues is non-negative; positive definite if it is positive and has no null eigenvalue.
Lemma: The product matrix (that is, the folding of the matrix A) S = A Atr is positive.
Proof: Consider A to be (a, b; c, d). Its transpose is Atr = (a, c; b, d). The product matrix S becomes S = A Atr = (a, b; c, d) (a, c; b, d) = (a a + b b, a c + b d; a c + b d; c c + d d). By the foregoing discussion of the characteristic polynomial, it is a concave-upward parabola. Its minimum is at x-min = ((a a + b b) + (c c + d d)) / 2 = (a^2 + b^2 + c^2 + d^2) / 2 > 0. Its y-intercept is y-intercept = (a a + b b) (c c + d d) - (a c + b d)^2 = (a a c c + a a d d + b b c c + b b d d) - (a a c c + 2 a b c d + b b d d) = a a d d - 2 a b c d + b b c c = a d (a d - b c) - b c (a d - b c) = (a d - b c)^2 > 0. Since the minimum occurs to the right of the y-axis and the y-intercept is above the x-axis, each of the roots must be positive. QED.
By the theorem, there exists an ortho-normal matrix O and a diagonal matrix E, such that S may be factored as S = O E Otr. Since, by the lemma, E is positive, we may take its square-root. Let the diagonal matrix D = sqrt(E). By convention, we take the positive square-root; but, actually, for each of the elements independently, we could take either sign. Let the bra be this ortho-normal matrix O; that is, < = O. Then, we have the factorization of the original matrix A as A = < D >, where we remember that the ket was taken in terms of the bra as > = (1 / D) <tr A.
Theorem: The ket matrix is ortho-normal.
Proof: I =? > >tr = ((1 / D) <tr A) ((1 / D) <tr A)tr = ((1 / D) <tr A) (Atr < (1 / D) = (1 / D) <tr (A Atr) < (1 / D). Since we had A Atr = S = < E <tr = < D^2 <tr, by substitution, we obtain ... = (1 / D) <tr (< D^2 <tr) < (1 / D) = (1 / D) (<tr <) D^2 (<tr <) (1 / D). Since we know that the bra is ortho-normal; that is <tr < = I, we have ... = (1 / D) D^2 (1 / D) = I. QED.
If the matrix A is singular; we have to perform two additional operations: 1) Move the null eigenvalues of the diagonal matrix E to the bottom right. We may accomplish this placement by a sort, in decreasing order. Certain algorithms for the eigenvalue-eigenvector eigen-pairs accomplishe this sort, as a by-product. And, 2) Perform the Gramm-Schmidt ortho-normalization algorithm on the ket, with as many random vectors as there were null eigenvalues in the E diagonal matrix.
The ket matrix is unique, as described in the Degrees of Freedom chapter.
We summarize the --
Algorithm: Given a general real (or general complex) matrix A. To find its factorization into a bra spectrum ket, proceed as follows:
Fold the matrix A, to obtain a positive symmetric (or positive Hermitian) matrix S.
Employ any algorithm that finds the eigenvalue-eigenvector
eigenpairs for a symmetirc (or Hermitian) matrix, to obtain the factorization into S = O E Otr. This
factorization is unique, up to degeneracies.
[Historical note: In 1960-1, when I first discovered/devised this algorithm, it had been unknown whether the Jacobi algorithm is
guaranteed to converege. Hence, I employed the squaring algorithm
(which I discovered and devised for this purpose). Now, that it is known to converge, employ the Jacobi algorithm -- it is by far the best.]
Sort the E, in decreasing order. Obviously the matrix O has to be permuted accordingly.
Let the bra matrix be < = O.
Take the positive square-root of the diagonal matrix E, to obtain the spectrum diagonal matrix D = sqrt(E).
Compute the ket matrix as > = (1 / D) <tr A. This ket is ortho-normal (or unitary).
Perform a Gramm-Schmidt ortho-normalization of the ket, with a q.s. (quantum sufficit = sufficient quantity) of random vectors. This step is harmless for a non-singular matrix A. Hence, it may be performed always, without the necessity of ascertaining the singularity of the matrix A.
Now, you have obtained the factorization of A as A = < D >. This factorization is unique, up to degeneracies; because, the factorization in step two was unique. However, we postpone the discussion of uniqueness of the ket matrix, until the degrees of freedom chapter.
This factorization and hence this algorithm is crucial to the understanding of the multi-dimensional linear regression.
The A = < D > factorization of a non-square matrix, of necessity, must have at least one non-square factor: the ket for a horizontal matrix or the bra for a vertical matrix, respectively. A non-square bra or ket, of course, it not ortho-normal (unitary). It is suggested that you augment, on a zero (identity works also) background, the A matrix to be square, before factoring it. Then each of the factors will be square. Hence, both the bra and the ket will be ortho-normal (unitary). In the case of a complex matrix, the augmentation should be performed the same way for the real and the imaginary parts, which then may be mapped to the real isomorph. Which, finally, may be factored into a unitary bra, a diagonal, and a unitary ket.
Factorization of a matrix A into a symmetric matrix S and an ortho-normal matrix O.
[I formulated and proved the following two theorems, regarding the factorization A = S O, in the early part of February 2006.]
Theorem: Existence and construction. Any general real (or complex) matrix A may be factored into a symmetric (or Hermitian) matrix S times an ortho-normal (or unitary) matrix O. The proof is constructive.
Proof: From the foregoing, we have the factorization A = < D >, where bra < and ket > each is an ortho-normal (or unitary) matrix and the spectrum D is a real diagonal matrix. Write A = < D > = < D (<tr <) > = (< D <tr) (< >) = S O, where S = < D <tr is a symmetric (or Hermitian) matrix and O = < > is an ortho-normal (or unitary) matrix.
The permutations of the eigen-values permitted in the diagonal matrix D and their propagation into the other matrices in the foregoing proof make it difficult to establish the uniqueness of the result. Hence, we investigate a different computation.
Lemma: Any permutation P of the eigen-values of the factorization of a symmetric matrix S yields a new factorization of the same matrix.
Proof: The product (Ptr P) = I is the identity matrix. Write the factorization of the symmetric matrix S as S = O D Otr, where O is an ortho-normal matrix and D is a diagonal matrix of the eigen-values. Then, S = O D Otr = O I D I Otr = O (Ptr P) D (Ptr P) Otr = (O Ptr) (P D Ptr) (O Ptr)tr = O' D' O'tr, where O' = O Ptr is a new ortho-normal matrix and D' = P D Ptr is a new diagonal matrix.
Lemma: There are 2^<matrix-rank of S> square-roots of a positive symmetric matrix S. Exactly one of them -- which we call the principal square-root -- is positive.
The forward-reference to the definition of matrix-rank is not circular logically; because, the reference is to a definition.
proof: The square-root of a positive symmetric matrix may be calculated by any method that is available for the calculation of the square-root of a real number. Let us factor this matrix S as S = O D Otr, where O is an ortho-normal matrix and D is a diagonal matrix of the eigen-values. Then, the square-root of the matrix S is sqrt(S) = O sqrt(D) Otr. Since D is comprised of the matrix-rank of S of non-zero positive numbers, there are 2^<matrix rank of S> combinations of signs for their square-roots. Exactly one such has all of the signs positive. QED.
Lemma: Once a legitimate matrix S is determined and fixed, the matrix O is unique.
Proof: By the foregoing theorem, we have A = S O. Pre-multiply by the inverse of S, to obtain O = Sinv A. Then, O is unique; because matrices constitute a division-ring. QED.
Theorem: Existence and uniqueness. Any general real (or complex) matrix A may be factored into a symmetric (or Hermitian) matrix S times an ortho-normal (or unitary) matrix O. There are 2^<matrix-rand of A> such pairs. Exactly one of them -- which we call the principal factorization -- has S as a positive symmetric (or positive Hermitian) matrix. The proof is constructive.
Proof: Write the requisite factorization of A as A = S O, where the symmetric matrix S is S = sqrt(A Atr) and the ortho-normal matrix O = Sinv A. QED.
Lemma: The magnitude of the determinant of a matrix A is given by |det(A)| = sqrt(det(A Atr)), where "sqrt" is the name of the square-root function.
Proof follows from the theorem that the determinant of a product of matrices is equal to the product of their respective determinants.
This lemma extends the concept of a determinant to that of a non-square matrix -- directly for a horizontal rectangular matrix or with a pre-transposition for a vertical-rectangular matrix.
Theorem: For a general real (or general complex) matrix A, the magnitude of its determinant is given by |det(A)| = exp((1 / 2) trace(ln(A Atr))) = exp(trace(ln(D))).
Proof: Since, by two of the recent lemmata, the folding of a matrix yields a positive symmetric matrix, the proof of theis theorem follows from the corresponding theorem for a positive-definite symmetric-matrix.
Within the realm of Linear Algebra, the trace in the formulae of this theorem is considered to be a summation. Replace this summation by a Lebesque integral, to obtain the corresponding formulae for use in Banach (or Hilbert) space.
Definition: A null-space vector V (possibly multi-dimensional) of a matrix A is given by the solution of the equation V A = 0.
Usually, this concept is presented in conjunction with the inversion of a matrix. However, we consider it easier to present it here.
Theorem: Given a matrix A. If there is a set of null-space vectors; any linear combination thereof is a null-space, as well.
Proof is obvious.
Definition: The null-space vector is the largest-dimensional null-space vector.
The foregoing discussion pertains to the left null-space. A similar discussion could be provided for the right null-space.
Definition: The dimensionality of the null-space of a matrix is called it nullity.
This whole discussion of null spaces could be framed in terms of the vector space of the rows of a given matrix.
Now that we have studied the concept of nullity of a matrix, how do we ascertain it and how do we find the actual null spaces? There are three methods.
Any matrix inversion algorithm yields the nullity of a matrix; but, not the null-spaces.
The Gramm-Schmidt ortho-normalization algorithm can provide the left or right null-space and hence the nullity; but, not the null-space of the diagonal.
Any program that factors a matrix into its bra, diagonal (spectrum), and ket provides the pieces needed for the computation of each of the three null-spaces.
Consider the general real matrix A = (cos(alpha) cos(beta), sin(alpha) cos(beta); - cos(alpha) sin(beta), - sin(alpha) sin(beta)). It obviously is symmetric; iff (= if and only if) the sum alpha + beta is an integral multiple of pi. The left null-space of A is Vl = (sin(beta), cos(beta). That is, Vl A = 0. The right null-space of A is Vr = (cos(alpha), - sin(alpha). That is, A Vrtr = 0. Clearly these two vectors -- Vl and Vr -- are equal; iff the sum alpha + beta is an integral multiple of pi. Hence, by transitivity, we have that the two vectors are equal iff the matrix is symmetric. Thus, we have the
Caveat: The left and right null-spaces are viewed from a different vantage point, except in the case of a symmetric (or Hermitian) matrix and certain other matrices.
This caveat applies to each of the following theorems regarding null-spaces, as well as to the axes of an ortho-normal matrix, in a later chapter. That is, when we say that "the left and right null-spaces are the same", we do not mean that literally Vl = Vr. What we do mean is that Vl < = Vr >tr, as shown by the following theorem.
Lemma: Let the given matrix A be factored A = < D >. Let V be the null-space of this diagonal matrix, i.e., V D = 0. Then, the left null-space of A is Vl = V <tr and the right null-space of A is Vr = V >. Hence, the size (order) of Vl and that of Vr is equal to that of V.
Proof: By hypothesis, V D = 0. Take its transpose, to obtain D Vtr = 0. To show that Vl is the left null-space of A, we need to show that 0 =? Vl A = (V <tr) (< D >) = V (<tr <) D > = V I D > = (V D) > = 0 > = 0. OK. To show that Vr is the right null-space of A, we need to show that 0 =? A Vrtr = (< D >) (V >)tr = ( < D >) (>tr Vtr) = < D (> >tr) Vtr = < D I Vtr = < (D Vtr) = < 0 = 0. OK. Since the Vl and Vr are computed from the V by means of a multiplication by a non-singular matrix, their resulting size is that of the V OK. QED.
Theorem: Null-space by factorization. For this matrix A, we have Vl < = Vr >tr = V, where Vl is the left null-space of A, Vr is the right null-space of A, and V is the null-space of D -- the diagonal in the factorization of the matrix A, as A = < D >. Furthermore, the matrix-rank of Vl, Vr, and V are the same.
Comment: Notice that it is Vl < rather than Vl <tr and Vr >tr rather than Vr >, as might be expected at first glance.
Proof: From the lemma we have Vl = V <tr. Post-multiply by <, to obtain Vl < = V. From the lemma, we also have Vr = V >. Post multiply by >tr, to obtain Vr >tr = V. The conclusion follows from the transitivity of the equality relationship. QED.
Corollary: For any symmetric (or Hermitian) matrix A, the left null-space Vl and the right null-space Vr coincide; i.e., Vr = Vl.
Proof: Since for a symmetric matrix the ket is equal to the transpose of the bra, post-multiply the Vl < = Vr >tr equation of the conclusion of the theorem by the transpose of the bra on the left and the ket on the right, to obtain Vl (< <tr) = Vr (>tr >), which simplifies to the required Vl = Vr. QED.
Theorem: Null-space by Gramm-Schmidt. The Gramm-Schmidt ortho-normalization algorithm yields the Vltr when performed upon the columns or the Vr when performed upon the rows. The nullity is the same in each case. The Vl and Vr are unique spaces.
Proof: It is obvious that we obtain either the Vltr or the Vr, respectively. Traditionally, a complicated Mathematical Induction proof is employed to establish the equality of the nullity. We, however, invoke the preceding theorem. The uniqueness of the Vl and Vr spaces follows for the uniqueness of each step of the Gramm-Schmidt algorithm.
Corollary: Along the principal diagonal, arrange any assortment of son-singular real and singular symmetric (or non-singular complex and singular unitary) matrices. The resulting matrix has its Vl = Vr.For any ortho-normal (or unitary) matrix a and any matrix b, the partitioned matrix A = (a, b; btr, 0) has its Vr = Vl. For any non-singular matrix a, the partitioned matrix A = (a, 0; 0, 0) has its Vr = Vl.
Proof: Hint: Obtain the left and right null-spaces employing the Gramm-Schmidt ortho-normalization algorithm.
Theorem: The construction in the foregoing corollary exhausts the possibilities for a non-symmetric (of non-Hermitian) matrix whose left and right null-spaces are equal.
Proof: Any non-symmetric portion of the matrix has to be isolated in its own diagonal partition.
Definition: A set of equal eigen-values is said to be degenerate.
Theorem: The factorization of a symmetric (or Hermitian) matrix S into S = O E Otr is unique, up to degeneracies.
Proof follows from the definition of the eigenvector as the null-space of an eigen-value.
Corollary: The factorization of a general real (or general complex) matrix A into A = < D > is unique, up to degeneracies.
Proof follows from the uniqueness of products in the division-ring of the matrices.
Theorem: For any non-singular matrix M, if the vector Vo is in the null-space of the matrix A; then, the vector V = Vo Minv is in the null-space of the matrix B = M A.
Proof: By hypothesis Vo A = 0, the null-matrix. We have to show that V B = 0. Thus, 0 =? V B = (Vo Minv) (M A) = Vo (Minv M) A = Vo A = 0. QED.
Theorem: For any matrix M, if the vector Vo is in the null-space of the matrix A; then, so is it in the null-space of the matrix B = A M.
Proof: By hypothesis Vo A = 0, the null-matrix. We have to show that Vo B = 0. Thus, 0 =? Vo (A M) = (Vo A) M = 0 M = 0. QED.
The foregoing two theorems are stated for the left null-space. Analogous theorems hold for the right null-space, as well.
Theorem: The inverse of the matrix A obviously is given by Ainv = >tr (1 / D) <tr.
Proof is obvious.
For this computation of the inverse of the diagonal matrix D, we adopt the convention that the reciprocal of zero is zero. We sort this diagonal matrix only enough to place each of its zero elements at the bottom-right. We apply the same transposition to the bra. The resulting ket will have the same nullity; hence it cannot be ortho-normal. We employ the Gramm-Schmidt ortho-normalization algorithm to pad the bottom of the ket with random ortho-normalized vectors. Thus, yielding an ortho-normal ket, which still multiplies A = < D >. Then, the results – that the product A = < D >, that the ket is ortho-normal, and that the inverse of A is Ainv = <tr (1 / D) >tr – still are true even for a singular matrix A. This is our “best effort” at the inversion of a singular matrix.
Definition: Given a matrix A and an arbitrary non-singular matrix B. The product (B A Binv) is called the generalized similarity transformation of the matrix A by B.
Theorem: If B is an ortho-normal (or unitary) matrix O; the generalized similarity transformation reduces to the similarity transformation. Hence, the "generalized" is legitimized.
Proof is obvious.
Theorem: The determinant of the matrix A is invariant under the generalized similarity transformation of A by B.
Proof is obvious.
Lemma: Given an arbitrary non-singular diagonal matrix D. The diagonal of A is invariant under the generalized similarity transformation of A by D. Hence, so is the trace of A invariant.
Proofs are obvious.
Theorem: The trace of the matrix A is invariant under the generalized similarity transformation of A by B.
Proof: In the literature of the Linear Algebra, this theorem is proven by means of a very complicated Mathematical Induction. However, we are in a position to provide a much easier proof. Factor the matrix B as B = < D >. Then, the generalized similarity transformation of A by B becomes (B A Binv) = ((< D >) A (< D >)inv) = ((< D >) A (>tr Dinv <tr)) = (< (D (> A >tr) Dinv) <tr). Now, evaluate the trace from the inside to the outside. QED.
It is amazing that this theorem is true!
Definition. Given a matrix A and an arbitrary matrix B. The product (B A Btr) is called the congruence transformation of the matrix A by B.
Theorem: If the matrix B is ortho-normal (unitary); then the concepts of congruence and similarity coincide.
Proof follows from the fact -- the definition of an ortho-normal (unitary) matrix -- that the inverse of an ortho-normal (unitary) matrix is its transpose.
The best reference on tensors (as well as on Riemannian geometry) is the book Gravitation by Charles W. Misner, John A. Wheeler, and Kip S. Thorne. ISBN: 0716703440.
Definition: The rank of a tensor is the total amount of its subscripts and superscripts.
A matrix may be defined as a tensor of rank two. Often in the study of tensors, we isolate two of its sub/super-scripts and consider them as a matrix. Thus, all of the theorems and algorithms (as well as subroutines) of matrices are applicable to tensors.
In the literature of matrices, there are two terms which I do not like. However, I present them here, for completeness of the discussion of Linear Algebra.
Definition: A square n by n matrix is said to be of order n.
Why confuse the reader by introducing a new word, which does not extend to a non-square matrix?
Definition: The rank of a matrix is its order less its nullity.
This use of the word "rank" conflicts with that in the study of tensors.
In the Complex chapter, we will show how to extend this discussion to be applicable to a complex matrix. At this time, however, we consider each matrix to be real; but, show the name of the corresponding complex matrix, in parentheses.
In our chapter on symmetric matrices, we had defined the characteristic polynomial as 0 = det(A - x I). There, we had restricted A to be a symmetric matrix. Obviously, such a restriction is not required. The matrix A might just as well be a general real matrix -- insooth, even a general complex matrix. For an n by n matrix, the degree of this characteristic polynomial is n. Thus, for n less than or equal to four, the roots may be found analytically. And, regardless of the value of n, there are various algorithms for finding the roots numerically. Let us call each of these roots an eigenvalue, provided that it is real. We are out of luck for any complex root, even if the matrix A represents a complex matrix.
In our discussion of the characteristic polynomial for a symmetric matrix, we also had said that for each eigenvalue x, the matrix (A - x I) is singular. Our previous restriction to symmetric matrices is not required here either. Thus, the matrix A might just as well be a general real (or complex) matrix. Let us call the set of vectors that spans its null space, the corresponding eigenvectors. There are algorithms available for the required computation. For instance, our Jacobi subroutines for a real (or complex) matrix already implement the construction of the spanning set of vectors. Either of our LDU or Gauss-Jordan matrix inversion algorithms could be augmented with a call to the Gramm-Schmidt ortho-normalization algorithm to do likewise. Thus, for each eigenvector x, we have 0 = V (A - x I) = (A - x I) Vtr, where V is the corresponding eigenvector. Remember, that a null-space is both a left null-space and a right null-space.
Provided that each of the roots of the characteristic polynomial is real, the set of the corresponding eigenvectors may be arranged into a matrix, call it M. Of course, we have to take due regard for any degeneracies of the eigenvalues. Likewise, we arrange the eigenvalues into the diagonal matrix E. Now, we ask the rhetorical question: Is A = M E Mtr? Let us analyze this question by means of a truth-table. There are three columns. 1) Is the given matrix A symmetric (Hermitian)? 2) Is the matrix M ortho-normal (unitary)? 3) Is this equality true? Since each of these columns is the Boolean binary, there are eight rows.
| # | Is A symmetric (Hermitian)? | Is M ortho-normal (unitary)? | A =? M E Mtr | Comment |
| 0 | T | T | T | one |
| 1 | T | T | F | one - impossible |
| 2 | T | F | T | one - impossible |
| 3 | T | F | F | one - impossible |
| 4 | F | T | T | two - impossible |
| 5 | F | T | F | three - unlikely |
| 6 | F | F | T | three - unlikely |
| 7 | F | F | F | three - most likely |
one This is the theorem(s) discussed on the previous page - the factorization of a symmetric matrix.
two. This is the theorem(s) discussed at the top of this page - real.
three is what remains, by the process of elimination. The "most likely" versus "unlikely" depends upon each specific matrix A.
The result may be expressed as the
Theorem: If the eigenvalue diagonal-matrix E of the non-symmetric (non-Hermitian) matrix A has no complex elements; then it is false that its eigenvector matrix M is ortho-normal (unitary) and the congruence equation A = M E Mtr holds. Alternatively, the conclusion may be stated: then either the eigenvector matrix M is not ortho-normal (unitary) or the congruence equation A != M E Mtr does not hold.
Proof: This theorem is a formal statement of the result of the analysis presented in the foregoing truth-table.
Corollary: In the conclusion of the foregoing theorem, one may replace the "congruence" by "similarity" and Mtr by Minv.
Proof: follows from the fact -- the definition of an ortho-normal (unitary) matrix -- that the inverse of an ortho-normal (unitary) matrix is its transpose.
Theorem: Given a real or complex matrix. The constant term of the characteristic polynomial is equal to the determinant of the matrix.
Proof is obvious from the defining equation of the characteristic polynomial.
Corollary: Given a real or complex matrix. The product of the Eigen values is equal to the determinant of the matrix.
Proof: By the definition of Eigen value, each Eigen value is a root of the characteristic equation. The product of negative of the roots of a monic polynomial is equal to the constant term. QED.
For the following several lemmata, theorems, and
corollaries, consider a given general real (or complex) matrix A and a given
orthonormal (or unitary) matrix O. Let the matrix B = O A Otr.
This is called a congruence transformation of the matrix A by the matrix O.
Since Otr is equal to Oinv, this transformation could be written as B = O A
Oinv. It then could be called a
generalized similarity transformation
of the matrix A by the matrix O. Either name is equally applicable.
Hence, this transformation preserves both the trace and the determinant, per
the applicable theorems, which are presented following the foregoing
definitions. (Pray, see the links.)
While the following is an obvious putting together of existing pieces, the
final two corollaries are unexpected and interesting. This material
has been added on Thursday 18-VI-2009.
Lemma: A root x (real or complex) of the characteristic
polynomial 0 = det(B - x I) is also a root
of 0 = det(A - x I).
Proof: We have that I = O Otr = Otr O, from the definition and properties of an orthonormal matrix. Take 0 = det(B - x I) and express B as B = O A Otr, to obtain 0 = det((O A Otr) - x I)).
We may introduce factors of the identity matrix I, to obtain 0 = det((I) ((O A Otr) - x I) (I)) = det((O Otr) ((O A Otr) - x I) (O Otr)).
Use the associative law of multiplication of matrices (division ring operations), to obtain 0 = det(O (Otr (((O A Otr) - x I) O) Otr).
Use the distributive law of multiplication over addition of matrices, to obtain 0 = det(O (Otr (O A Otr) O - (Otr x O)) Otr) = det(O (Otr (O A Otr) O - ((otr x) O)) Otr).
Use the commutative law of multiplication of a matrix by a scalar, to obtain 0 = det(O (Otr (O A Otr) O - x (Otr O)) Otr).
Again, use the associative law of multiplication of matrices, to obtain 0 = det(O (Otr O) A (Otr O) - x (Otr O)) Otr) = det(O (A - x I) Otr).
Since the determinant of a product of matrices is equal to the product of their determinants, we obtain 0 = det(O) det(A - x I) det(Otr).
Since an orthonormal matrix is non-singular, as is its transpose, we obtain 0 = det(A - x I). QED.
Theorem: The set of roots of the equation 0 = det(B - x I) is the same as that of 0 = det(A - x I).
Proof is obvious.
Corollary: The characteristic polynomial is invariant under the transformation B = O A Otr.
Proof is obvious.
Consider either the
left L or right R triangular matrices (whose diagonal elements are not
constrained,
by definition).
We already have encountered these
matrices in the discussion of the
QR matrix inversion algorithm.
Lemma: If A happens to be either the left L or right R triangular matrix, then its diagonal elements are the roots of 0 = det(A - x I).
Proof is obvious.
Theorem: Both the trace and the determinant of the Cholesky matrix Ch is equal to that of the original matrix.
Proof is obvious from their invariance under the transformation B = O A Otr.
Corollary: The sum of the Eigen values (real or complex) of the Cholesky matrix is equal to the trace of the original matrix.
Proof is obvious.
Corollary: The product of the Eigen values (real or complex) of the Cholesky matrix is equal to the determinant of the original matrix.
Proof is obvious.
When we first undertook the study of the generalized
Eigen-value problem, it
appeared to be a dead-end effort. Forsooth, there is no obvious way to
proceed. However, the foregoing proofs show that once we had obtained
the Cholesky factorization by another route of inquiry, it is obvious
how to find these generalized Eigen-values and how to predict their
sum and product from the original matrix. Huzzah!
[I discovered these properties of a skew-symmetric matrix in the early 1980s. I believe these results to be original. I am posting them here in September 2004.]
Lemma: Given a skew-symmetric matrix K. For any root r of its characteristic polynomial (see above for the generalized definition), its negative also is a root.
Proof is obvious.
Theorem: Given an n by n skew-symmetric matrix K. If n is odd; the characteristic polynomial has only odd-degree terms. If n is even; the characteristic polynomial has only even-degree terms.
Proof is obvious.
Theorem: Given an n by n skew-symmetric matrix K. It is singular; if its n is odd.
Proof: From the defining equation of the characteristic polynomial, it is obvious that the constant term of the characteristic polynomial is the determinant of the matrix. QED.
Corollary. Given an n by n skew-symmetric matrix K. The matrix-rank of K is even.
Proof: Apply the theorem to an appropriate subspace. QED.
Observe that I did not place "skew-Hermitian" in parentheses in the foregoing theorems. These results do not generalize to the complex matrices.
The foregoing definitions and algorithms for the eigenvalue eigenvector is about all that can be accomplished in this direction. Hence, since this generalization is not fruitful, we will not follow it any further. Our favorite path is presented at the top of this page and continued on the next page.
Copyright (c) 2003, 4 by R.I. ‘Scibor-Marchocki. Last revised Tuesday 09-th November 2004. A broken link removed Monday 30-VIII-2005. A new theorem added Saturday 04-th February 2006 and expanded on Friday 24-th February 2006. A new section added on Thursday 18-th VI (June) 2009.