and
In the Complex chapter, we will show how to extend this Symmetric chapter to be applicable to a Hermitian matrix (the complex counterpart of a symmetric matrix). At this time, however, we consider each matrix to be real. This same comment applies to each of the succeeding chapters, until we arrive at the Complex chapter.
Theorem: For any given symmetric matrix S, there exist an ortho-normal matrix O and a diagonal matrix E, such that the matrix S may be factored into the product S = O E Otr. By convention, we order the elements of E in decreasing order of absolute values. Within each set of equal absolute values, the positive elements precede the negative ones. This factorization is unique, up to degeneracies. [The proof is postponed until just after that of the corollary.]
Definition: A symmetric matrix is said to be positive or positive-definite iff (= if and only if) each of its eigenvalues (that is, elements of the E) is positive or strictly-positive, respectively.
Corollary: Given a factoring of a symmetric matrix S as S = O E Otr. For any arbitrary permutation matrix P, so is the factoring S = (O Ptr) (P E Ptr) (O Ptr)tr, where (O Ptr) is an ortho-normal matrix and (P E Ptr) is a diagonal matrix. If each of O and P is proper, so is (O Ptr). If S is positive or positive-definite, so are E and (P E Ptr). It is this corollary that we employ to order the eigenvalues in the preceding theorem.
Proof is obvious. However, since we need the results in a subroutine, we present the details. Let the permutation matrix P be P = (0, 1; -1, 0). Since its determinant is one, P is proper by definition. Let the ortho-normal matrix O be O = (a, b; -b, a). Its determinant is det(O) = a^2 + b^2, which is positive; hence, it must be one. Hence, the matrix O is proper. By definition of ortho-normal, the product (O Otr) is equal to the identity. Compute it, to obtain I =? (a, b; -b a) (a, -b; b, a) = (a^2 + b^2, 0; 0, a^2 + b^2) = (1, 0; 0, 1), as it should. Let the diagonal matrix E be E = (c, 0; 0, d). Compute each of the matrices in the conclusion. (O Ptr) = (a, b; -b, a) (0, -1; 1, 0) = (b, -a; a, b). Its determinant is det(O Ptr) = a^2 + b^2 = 1. Hence it is proper. Is it ortho-normal? Compute I =? (O Ptr) (O Ptr) = (b, -a; a, b) (b, a; -a, b) = (a^2 + b^2, 0; 0, a^2 + b^2) = (1, 0; 0, 1). Yes. Please observe that (O Ptr) has the rows interchanged (from those of O), with the original second row multiplied by minus one and placed in the first position. (P E Ptr) = ((0, 1; -1, 0) (c, 0; 0, d)) (0, -1; 1, 0) = (0, d; -c, 0) (0, -1; 1, 0) = (d, 0; 0, c). Please observe that (P E Ptr) has the two diagonal elements interchanged (from those of E. Since they are the same two eigenvalues, the (P E Ptr) is positive or positive-definite if E is, respectively.
Definition of the characteristic polynomial of a given symmetric matrix S is: 0 = det(S - x I).
Proof of the foregoing theorem. By the corollary to the fundamental theorem of algebra, this polynomial has n roots. They are known as the eigenvalues of the matrix S. Being roots of a specific polynomial, the set of eigenvalues is unique. Let us write the set of these roots as a diagonal matrix E. For each eigenvalue x, the matrix (S - x I) is singular. (Why? Because, by definition of the characteristic polynomial, the determinant det(S - x I) is zero.) Hence, the matrix (S - x I) has a unique null space (Since we do not utilize the uniqueness of the factorization in the proofs of the null space, we are not committing a logical error by employing this forward reference.), which is known as the corresponding eigenvector V, which we take as a row-vector. Normalize it so that 1 = V Vtr. Thus. we write 0 = Vi (S - x I). Collect the transpose these eigenvectors into a matrix O. Obviously, the matrix O is ortho-normal. A direct proof is possible, but difficult. Instead, we obtain this result as a corollary of the Jacobi theorem/algorithm. That the symmetric matrix is equal to S = O E Otr follows from the convergence of either any of the multiplication family of algorithms or from that of the Jacobi algorithm. QED.
Let us be specific. Consider the matrix S to be (a, b; b, d). Its characteristic polynomial becomes 0 = det((a, b; b, d) - x (1, 0; 0, 1)) = det(a - x, b; b, d - x) = (a - x) (d - x) - b b = x^2 - (a + d) x + a d - b^2. Complete the square. x^2 - (a + d) x + (a + d)^2 / 4 = b^2 + (a + d)^2 / 4. (x - (a + d) / 2)^2 = (4 b^2 + (a + d)^2) / 4 = (4 b^2 + a^2 + 2 a d + d^2) / 4. Since the right-hand side is the sum of two squares, the roots are real. This is a parabola, concave upwards. Its minimum is at x-min = (a + d) / 2. Its y-intercept occurs when x is zero. Namely y-intercept = a d - b^2.
Conversely: For any arbitrary given ortho-normal matrix O and any arbitrary given diagonal matrix E, the product (a similarity transformation) S = O E Otr is a symmetric matrix.
Proof is obvious.
Corollary: trace(S) = trace(E).
Proof: Consider the factorization S = O E Otr. This is a similarity transformation. Hence, the trace is invariant. QED.
Lemma: The trace(E) is equal to the sum of the eigenvalues.
Proof is obvious
Corollary: The trace(S) is equal to the sum of the eigenvalues.
Proof is obvious
Theorem: norm(S) = norm(E).
Proof: Consider the factorization S = O E Otr. Its transpose is Str = (O E Otr)tr = O E Otr. Then, the product is S Str = (O E Otr) (O E Otr) = O E (Otr O) E Otr = O E I E Otr = O E^2 Otr. This is a similarity transformation of E^2. Hence, trace(O Otr) = trace(E^2). QED.
Theorem: det(S) = det(E).
Proof: Consider the factorization S = O E Otr. Take its determinant: det(S) = det(O E Otr) = det(O) det(E) det(Otr) = (det(O) det(Otr)) det(E) = det(E).
Lemma: The det(E) is equal to the product of the eigenvalues.
Proof is obvious.
Corollary: The det(S) is equal to the product of the eigenvalues.
Proof is obvious.
Lemma: For any scalar a and symmetric matrix S, we have a S = O (a E) Otr. a I = a (O I Otr) = O (a I) Otr. Etc.
Proofs are obvious.
Theorem: Any rational algebraic function f(x) of a symmetric matrix S is given by f(S) = O f(E) Otr. An infinite series converges provided that it does so for each of the elements of E.
Proofs are obvious.
Corollary: Any two such functions of the same symmetric matrix commute: f(S) g(S) = g(S) f(S).
Proof is obvious.
The first sentence of the foregoing theorem may be generalized to an arbitrary set of matrices, provided that the membership in this set is restricted suitably. However, we have to postpone further discussion, until we develop the necessary tools in the following chapter.
Theorem: A symmetric matrix S satisfies its own characteristic equation.
Proof: The characteristic equation is 0 = det(S – x I) = f(x). Substitute S for x, to obtain 0 =? f(S) = det(S – S I) = det(0) = 0. QED.
Definition: A polynomial is said to be monic if the coefficient of its highest degree term is one
Lemma: The characteristic polynomial of a symmetric matrix is monic.
Proof is obvious.
Theorem: Write the characteristic equation 0 = det(S – x I) as the monic polynomial 0 = f(x) = x^n + ... + an. The inverse of the matrix may be written as Sinv = (f(S) – an) / (- an S). The right-hand side is a monic n-1 degree polynomial in S – the matrix S does not appear in the denominator. The constant an = det(S).
Proof is obvious.
Theorem: For a positive-definite symmetric (or positive-definite Hermitian) matrix S, its determinant is given by det(S) = exp(trace(ln(S))) = exp(trace(ln(E))).
Proof: Consider the factorization S = O E Otr. Its logarithm is ln(S) = O ln(E) Otr. Its trace is trace(ln(S)) = trace(ln(E)) = ln(det(E)) = ln(det(S)). QED.
How do we compute the logarithm of a positive-definite symmetric (or positive-definite Hermitian) matrix? The same way as we compute the logarithm of a real (or complex) number. Pray, see the bilinear transformation of a matrix. How fast does it converge? The convergence depends upon the larger of the largest eigenvalue and the reciprocal of the smallest eigenvalue.
Within the realm of Linear Algebra, the trace in the formulae of this theorem is considered to be a summation. Replace this summation by a Lebesque integral, to obtain the corresponding formulae for use in Banach (or Hilbert) space.
If the matrix is not positive; there are two choices: 1) Employ complex arithmetic. This choice is not very convenient. Or, 2) Employ the formula for a general matrix.
Theorem: Given a horizontal eigenvector V of a symmetric matrix S, the corresponding eigenvalue e may be computed as e = V S Vtr.
Proof: By the definition of an eigenvector, we have 0 = V (S - e I). Post multiply it by Vtr, to obtain 0 = V (S - e I) Vtr = V S Vtr - e V Vtr = V S Vtr - e. QED. Since this is such an important theorem, we will present an alternative proof, after the discussion of the Jacobi algorithm.
Theorem: The eigenvalues of a symmetric matrix S are invariant under a similarity transformation.
Proof: Let the ortho-normal matrix for the similarity transformation be Q. The similarity transformation, then, is T = Q S Qtr. This matrix T also is symmetric; because it is the result of a simiarity transformation of a symmetric matrix. We need to show that T and S have the same characteristic equations. The characteristic equation of T is 0 = det(T - x I) = det((Q S Qtr) - x I) = det(Q (S - x I) Qtr) = det(Q) det(S - x I) det(Qtr) = det(S - x I). QED.
Theorem: The inverse of a positive-definite symmetric matrix S is a positive-definite symmetric matrix.
Proof: By the definition of positive-definite, there exist a factorization of S into an ortho-normal matrix O and a diagonal matrix E, such that S = O E Otr. Then, its inverse is Sinv = (Otr)inv Einv Oinv = O (1 / E) Otr. Since the inverse of a diagonal matrix is a diagonal matrix, consisting of the reciprocals of the elements of the original matrix, these elements are strictly positive. Thus, the matrix Sinv satisfies the definition of a positive-definite symmetric matrix. QED.
Theorem: Any two positive-definite symmetric (or Hermitian) matrices A and B may be factored as B O D’ = A O D’inv, where O is an ortho-normal (or unitary) matrix and D’ is a positive-definite diagonal real matrix.
Proof: Factor the product (Binv A) as (Binv A) = O D Otr, where O is an ortho-normal (or unitary) matrix and D is a positive-definite diagonal real matrix. Let D’ be the principal square-root of D. Then, pre-multiply by B and post-multiply by (O D’inv), to obtain B (Binv A) (O Dinv) = B (O D Otr) (O D’inv). Associate the expressions on each side of this equation as (B Binv) (A O D’inv) = (B O D) (Otr O) D’inv. Simplify as A O D’inv = (B O) (D D’inv) = B O D’. QED.
In Linear Algebra, symmetric (or Hermitian) matrices are treated extensively and very well. However, once one moves outside of the scope, into the general real (or general complex) matrices, confusion reigns -- incomplete, misleading, or outright incorrect. The problem addressed by the foregoing theorem has been around for at least a century. The statement and proof also have been copied -- including the error -- from one textbook into the next. I plead guilty of the perpetuation. Do you see the error in the proof?
There is the tacit assumption that the quotient (Binv A) is a positive-definite symmetric matrix. It might be; but, is very unlikely. Just because each of the matrices A and B is positive-definite symmetric -- and hence so is Binv -- says nothing regarding the quotient (Binv A). Therefore, we have to add to the hypothesis of the theorem and reword it thus:
Theorem: Any two positive-definite symmetric (or Hermitian) matrices A and B, whose quotient (Binv A) also is positive definite symmetric (or Hermitian), may be factored as B O D’ = A O D’inv, where O is an ortho-normal (or unitary) matrix and D’ is a positive-definite diagonal real matrix.
Proof: Now, the foregoing proof is correct, as it stands
Of course, with this rewording, the scope of the theorem dwindles to insignificance. We will return to this problem as an application of the Jordan normal form, where we will expand the scope. Alternatively, by relaxing the conclusion somewhat, we may generalize the hypothesis, as shown by the --
Theorem: Any two non-singular real (or complex) matrices A and B may be factored as B Q D' = A O D'inv, where O and Q each is an ortho-normal (or unitary) matrix and D' is a positive-definite diagonal real matrix. The proof is constructive.
Proof: Since, by hypothesis, the matrix B is non-singular, we may write the product (Binv A). Factor [This is a forward reference. However, since we do not make use of the present theorem, it is not logically circular.] this product as (Binv A) = < D >, where the bra < and the ket > each is an ortho-normal (or unitary) matrix and D is a positive-definite diagonal real matrix. Let O = >tr, Q = <, and D' = sqrt(D), where sqrt indicates the principal (that is, positive) square-root. Pre-multiply the preceding equation by B and post-multiply by O D'inv, to obtain the equation of the conclusion. QED .
Copyright (c) 2003, 4, 5, 6 by R.I. ‘Scibor-Marchocki. Last revised Friday 31-st March 2006.