copyright (c) 1999 by R. I. 'Scibor-Marchocki
This is a collection of e-mail letters, which I wrote to my very good friend Deb, as a set of lessons in Linear Algebra. I have deleted each salutation, to protect her privacy. If anybody shows interest, someday, I might edit these lessons into a better organized tutorial. So, please write me at webmaster@rism.com.
pseudo-metrizable topology
I have spent all morning looking for my old notes of Information Theory.
I did locate the printed listings of the old Linear Algebra library of computer programs. Eventually, I will have to transcribe the standard (not my own) algorithms. But the bulk of the subroutines will be easier to write from scratch; rather than attempting to decipher the forgotten logic, on the faded printing, by a worn-out ribbon -- just barely legible, if that.
All that I found, related to Information Theory, was a folder entitled, "Sequel." It contains some ramblings -- of dubious value --, which require a knowledge of Information Theory, as a pre-requisite. Not suitable for now -- perhaps, much later.
What I did get, is incredibly dusty!
But, I came across a chart relating the angle theta, the channel rate R, the correlation coefficient, and the determinant of the 3Ch matrix. Also, a formula for the channel rate of optimally-cascaded three channels.
This was not my idea; but, it is so obvious that I am ashamed that I did not think of it myself. Actually, nicely, at least it was a Polish mathematician who pointed it out. Define an angle theta as the principal inverse cosine of each of the principal correlation coefficients. One may go on to show a gratifying geometrical interpretation of the correlation coefficient.
However, he did not realize the profundity of this definition. He said that it is confined to the Gaussian probability distribution. What he did not see -- he could not because of his lack of knowledge of my Information Theory -- was the possibility of a generalization.
We further observe that the channel rate R is equal to minus one-half of the logarithm of (1 - the square of that correlation coefficient). I will show you this derivation later.
Hence, since 1 - (cos(theta))**2 = (sin(theta))**2, we may solve for the channel rate as minus the logarithm of the absolute-value (in the real-number sense) of the sine of theta. Now, we have an angle which is a Topological invariant; because the channel rate is invariant -- as follows from the axiomatization of Information Theory (a *very* long proof) -- under a Topological homeomorphism, which also is a Group-Theoretic homomorphism. Observe the slightly different spellings of the "hom...".
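A small numerical sketch of the relation between the correlation coefficient, the angle theta, and the channel rate R may help here. It is written in Python, which is not part of the original lessons; the function names are illustrative assumptions. It only checks that - (1 / 2) * ln(1 - s**2) and - ln(abs(sin(theta))) agree when theta is the principal inverse cosine of s.

```python
import math

def channel_rate_from_correlation(s):
    """R = -(1/2) * ln(1 - s**2), in natural units, for |s| < 1."""
    return -0.5 * math.log(1.0 - s * s)

def channel_rate_from_angle(theta):
    """R = -ln|sin(theta)|, where theta = acos(s)."""
    return -math.log(abs(math.sin(theta)))

s = 0.6                      # sample correlation coefficient (an assumption)
theta = math.acos(s)         # principal inverse cosine
r_from_s = channel_rate_from_correlation(s)
r_from_theta = channel_rate_from_angle(theta)
print(theta, r_from_s, r_from_theta)
```

The two values coincide because 1 - (cos(theta))**2 = (sin(theta))**2, so the one-half is absorbed by the square.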
Remember, I told you that around 1963, I had asserted that one may obtain an elementary -- either spherical, plane, or hyperbolic -- geometry from just an angle. I have so demonstrated with the recent _Plane and Spherical Trigonometry and Synthetic Geometry_ textbook.
This angle theta leads to an *intrinsic* geometry. It follows from the uniqueness -- which can be established, fairly easily -- that there are only these three intrinsic geometries. And that they correspond as follows: A compact topology has an intrinsic spherical geometry. A topology, which is only locally compact, has an intrinsic plane geometry. And a topology, which is not even locally compact, has an intrinsic hyperbolic geometry.
These choices are exhaustive -- hence, the uniqueness. Thus, the intrinsic geometry of any pseudo-metrizable topology is elementary, as a result of this constructive-proof, which I had obtained around 1963.
Originally, the metrization question of Topology was raised around 1925. In the early 1950's, an existence theorem was established: Any pseudo-metrizable topology may be pseudo-metrized. The converse is obvious: Any pseudo-geometry has a pseudo-metrizable topology.
But, it took another decade for me to provide the uniqueness proof and a construction, as I am outlining here.
In Real Variables, there is a theorem -- named for the mathematician who first proved it (as is my wont, I have forgotten his name) -- which states that there is a unique function y = y(x), which solves the equation
Lebesgue integral of f(x), from zero to x =
Lebesgue integral of g(y), from zero to y,
where f(x) and g(y) are any given Lebesgue-integrable functions.
If we take g(y) as the Gaussian probability-density distribution function, then we have mapped any pseudo-metrizable space to another pseudo-metrizable space, which has a Gaussian probability on it. This approach is not very feasible. A better approach is through Group Theory and Quantum Mechanics. But that is another *long* story.
It takes about a hundred pages to prove from my axiomatization; but, let us take it as a starting point. The information rate H(X) is postulated as minus the Lebesgue-Stieltjes integral, over the space X, of p(x) * ln(kappa * p(x)) * dx, where p(x) is the probability density distribution over X. By definition, a probability density distribution is a measure, with the additional restriction that its integral be one.
The scale-factor kappa is required; because p(x) has units that are the reciprocal of those of x. Hence, kappa has the same units as x. For example, metres, centi-metres, feet, inches, or Hertz. The product (kappa * p(x)), then, is abstract -- as required for the argument of the Napierian logarithm. Besides, we want H(X) to have the same value, regardless of the units we employ for x.
Now, take p(x) as the Gaussian probability density distribution (in one dimension)
g(x) = (1 / (sqrt(2 * pi) * sigma)) * exp((- 1 / 2) * (x / sigma)**2)
Its logarithm has two terms -- each negative. The first term is
- ln(sqrt(2 * pi) * sigma)
It integrates to
ln(sqrt(2 * pi) * sigma / kappa)
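The second term is
(- 1 / 2) * (x / sigma)**2
which integrates to one-half; because the mean of (x / sigma)**2 is one.
Their sum may be written as
H(X) = ln(sqrt(2 * pi * epsilon) * sigma / kappa)
where epsilon is the base of the Napierian logarithm.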
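The integration can be checked numerically. The following Python sketch is not from the original lessons: the function names, the midpoint rule, and the sample values of sigma and kappa are all assumptions made for illustration. It integrates - p(x) * ln(kappa * p(x)) over a wide interval and compares the result with the closed form ln(sqrt(2 * pi * e) * sigma / kappa), where e is the base of the natural logarithm.

```python
import math

def gaussian_density(x, sigma):
    """One-dimensional Gaussian probability density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

def information_rate_numeric(sigma, kappa, half_width=12.0, steps=20000):
    """Midpoint-rule estimate of -integral of p(x) * ln(kappa * p(x)) dx."""
    lo = -half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        p = gaussian_density(x, sigma)
        total -= p * math.log(kappa * p) * dx
    return total

def information_rate_closed_form(sigma, kappa):
    """ln(sqrt(2 * pi * e) * sigma / kappa)."""
    return math.log(math.sqrt(2.0 * math.pi * math.e) * sigma / kappa)

numeric = information_rate_numeric(2.0, 0.5)
closed = information_rate_closed_form(2.0, 0.5)
print(numeric, closed)
```

Changing the units of x rescales sigma and kappa by the same factor, so the closed form is unchanged; that is the purpose of the scale-factor kappa.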
The channel-rate R is defined
R(XxY) = (H(X) + H(Y)) - H(XxY)
As any definition, it just is. There is no proof or justification, except convenience and brevity for what follows.
Substitution yields the aforementioned formula.
In the principal coordinates, channel rate just adds. Because of the logarithm, we obtain the product.
Remember that the determinant of a symmetric (Hermitian) matrix A = O * D * Otr is equal to the determinant of the D, which is the product of its elements. Which, in turn, may be written as the exponential of the trace of the logarithm of D. It is incredible that the determinant of a Hermitian matrix is real. Also, we have that the determinant of A is equal to the exponential of the trace of the logarithm of A.
Likewise, in passing, the absolute value of the determinant of a real (complex) matrix A = <S> is equal to the determinant of S, which again, is the product of its elements. Thus, the absolute value of the determinant of A is the exponential of the trace of the logarithm of S. Because each element of S is positive, we do not need to specify absolute value of S.
Remember, that we defined the 3Ch matrix A as the partition (A1 A3 A3tr A2). By hypothesis, A is a positive-definite symmetric matrix. And A1 and A2 are square matrices.
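For the symmetric case, the identity "determinant equals the exponential of the trace of the logarithm" can be illustrated with a two-by-two example. This Python sketch is an added illustration, not part of the original lessons; the helper name and the sample matrix are assumptions. The elements of D are obtained from the closed-form eigenvalues of a symmetric two-by-two matrix.

```python
import math

def eigenvalues_sym2(a11, a12, a22):
    """Eigenvalues of the symmetric matrix (a11 a12 a12 a22)."""
    mean = 0.5 * (a11 + a22)
    radius = math.hypot(0.5 * (a11 - a22), a12)
    return mean + radius, mean - radius

# Sample positive-definite symmetric matrix (an assumption for illustration).
a11, a12, a22 = 2.0, 0.5, 1.0
d1, d2 = eigenvalues_sym2(a11, a12, a22)

det_direct = a11 * a22 - a12 * a12
# The trace of the logarithm of D is the sum of the logarithms of its elements.
det_from_trace_log = math.exp(math.log(d1) + math.log(d2))
print(det_direct, det_from_trace_log)
```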
Taking it all together, we obtain the channel rate R for the matrix A as
- (1 / 2) * ln( det(A) / (det(A1) * det(A2)))
Notice that the kappa has cancelled-out -- always does so for the channel rate. Thus, it is not necessary to factor these matrices, just to compute the channel rate.
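The determinant formula can be tried directly, without any factoring. The sketch below is in Python and is an illustration only: the function names, the Gaussian-elimination determinant, and the sample matrices are assumptions, not the routines of the old library mentioned earlier. It computes R from a partitioned positive-definite symmetric matrix and shows that a change of units in X leaves R unchanged.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def channel_rate(a, n1):
    """R = -(1/2) * ln(det(A) / (det(A1) * det(A2))); A1 is the leading n1 x n1 block."""
    a1 = [row[:n1] for row in a[:n1]]
    a2 = [row[n1:] for row in a[n1:]]
    return -0.5 * math.log(det(a) / (det(a1) * det(a2)))

s = 0.6
rate = channel_rate([[1.0, s], [s, 1.0]], 1)
# The same channel with X measured in units one hundred times smaller.
rate_rescaled = channel_rate([[1.0e4, 100.0 * s], [100.0 * s, 1.0]], 1)
print(rate, rate_rescaled)
```

For the two-by-two matrix (1 s s 1), the result reduces to - (1 / 2) * ln(1 - s**2).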
The concepts of a channel and of its cascading are part of Statistical Communications Theory, which in turn, is a part of Probability Theory. Obscure, but neither profound nor original with me. If I ever locate my notes on Information Theory, the Statistical Communications Theory is an early chapter of the book.
Let us cascade three channels. Each of the channels having been factored in the usual 3Ch style, with the D and S ordered in decreasing order.
It only makes sense for the middle channel to be of a form that cancels the ket of the first channel and the bra of the third channel, and has a net structure that of an ortho-normal matrix. It can be shown that the extrema are for a permutation matrix. And that the maximum actually is attained with an identity matrix.
Finally, a tedious -- but straight-forward -- calculation shows that, in this optimum situation, the individual correlation coefficients multiply to give the overall correlation coefficient. I found this calculation today. I will try to present it in the next lesson.
drunk at a lamp post parable
There is a parable regarding a drunk at a lamp-post, at night.
Someone observes a drunk searching near a lamp-post. Upon inquiry, the drunk says that he is looking for the ignition keys to his automobile, so he may return home.
Asked did he lose the keys here, he replies that no; but this is the only place with enough light so he can see to search.
Why do we study the Gaussian probability distribution? There are several reasons, mostly no better than what the drunk cited.
The Gaussian probability distribution is the only one that lends itself to being studied employing Linear Algebra.
Statisticians have a long history of successful use of the Gaussian probability distribution.
In Probability Theory, it is shown that under fairly general conditions, the limit, as the number of observations increases without bound, of a probability distribution, is Gaussian. Thus, we presume that the usual probability distribution should be close to being Gaussian.
We had shown that the Information Rate [always so capitalized]
H(X) = ln(sqrt(2 * pi * epsilon) * sigma / kappa)
for a Gaussian probability density distribution, with a standard deviation of sigma. This formula is a good approximation for any probability distribution, which is close to being Gaussian. Furthermore, the errors of approximation tend to cancel, during the subtraction, in the calculation of the Channel Rate [also, always so capitalized] R. And it is the Channel Rate R, which interests us; rather than the Information Rate H.
When we obtain a plane geometry -- say, as the intrinsic geometry of a given topology --, the natural probability distribution is Gaussian.
If our probability distribution is not Gaussian, we showed the mapping to obtain a Gaussian probability density distribution. But we seldom go to the effort of performing such a mapping.
If the intrinsic geometry is either spherical or hyperbolic, it may be embedded in a plane geometry, of higher dimension. As already stated, in the plane geometry, we either have -- or can obtain -- a Gaussian probability distribution.
In passing, we mention that matrices may be employed for the study of spherical geometry. This study is known as Quantum Mechanics. P.A.M. Dirac was the first to do so. As Herman Weyl first demonstrated, we further may describe these matrices by means of Group Theory. Thus, we have a different branch of Algebra -- Group Theory -- rather than Linear Algebra.
s12 = s1 * s2
Theorem: The inverse of a positive-definite symmetric (Hermitian) matrix A is likewise a positive-definite symmetric (Hermitian) matrix.
Proof: Factor A = O * D * Otr. The inverse of A is O * (1 / D) * Otr. Its transpose is equal to itself; hence, the inverse of A is a symmetric (Hermitian) matrix. By hypothesis, each element of D is positive; thus, so is each element of (1 / D). Hence the inverse of A is positive-definite. QED.
Theorem: A tri-gonal symmetric (Hermitian) matrix A is positive-definite.
Proof: Compute det(A - xI) = 0. Since there is a non-zero constant term, there is no root equal to zero. Hence, the matrix is definite. By Descartes' rule of signs, there are no negative roots. Since a symmetric (Hermitian) matrix has each root real, each must be positive. QED.
Theorem: The inverse of a symmetric (Hermitian) tri-gonal matrix A is obvious.
Algorithm: Compute the determinant of A. Then the inverse may be written as a matrix, each of whose elements is a polynomial of degree no larger than the order of A, divided by the determinant of A. Since the inverse is symmetric (Hermitian), utilize the symmetry, at each step of the computation. The computation proceeds in the following order, upon the elements of the inverse matrix. Compute along the upper-diagonal. Proceed along each higher-diagonal, in turn. Finish along the principal-diagonal.
Definition: A *channel* is a probability distribution upon the Cartesian product of two spaces, often taken as XxY. The space X is called the *input*. The space Y is called the *output*. The special case of a Gaussian probability distribution is what we presented as the 3Ch.
Theorem (stated without proof): The probability-density distribution upon a memory-less Cartesian product of spaces factors into a product probability-density distribution. Sorry, I realize that this theorem is incomprehensible, without a large background of Information Theory. Even the word "memory" and the phrase "product probability-density distribution" are obscure.
Remember that any positive-definite symmetric matrix A may be used to provide a Gaussian probability-density distribution.
Definition: A sequence of m *cascaded* channels is a probability distribution upon the Cartesian product of m+1 spaces, without memory across any pair of non-contiguous spaces. The special case of a Gaussian probability distribution is that given by a positive-definite symmetric matrix A, whose inverse is a tri-gonal -- perhaps partitioned -- matrix, as follows from the preceding theorem.
Finally, we are in a position to prove the
Theorem: Consider two cascaded elementary (that is, not partitioned) channels XxYxZ, with a Gaussian probability distribution. Let s1 be the correlation coefficient of XxY and s2 that of YxZ. Then the correlation coefficient s12 of XxZ is s12 = s1*s2.
Proof: For your morbid curiosity, the matrix A defining the Gaussian probability-density distribution upon XxYxZ is
A = (1-b**2 -a ab -a 1 -b ab -b 1-a**2) / (1 - a**2 - b**2)
Working backwards, I obtained A by inverting the next matrix. We never use the matrix A. Hence, we did not need to display it.
The inverse of A is the symmetric tri-gonal matrix
(inverse of A) = (1 a 0 a 1 b 0 b 1)
It is positive-definite, by a previous theorem. Hence A is a positive-definite symmetric matrix, by another of the previous theorems. Since the inverse of A is a tri-gonal matrix, and since -- as just shown -- the matrix A is positive-definite, as well, the matrix A may be employed to define the Gaussian probability-density distribution upon the space XxYxZ.
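The displayed matrix A and its tri-gonal inverse can be multiplied together as a check. The Python sketch below is an added illustration; the helper names and the sample values of a and b are assumptions. The sample values are chosen so that a**2 + b**2 < 1, which keeps the common denominator 1 - a**2 - b**2 positive; the leading principal minors of the tri-gonal matrix are then all positive.

```python
def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

a, b = 0.3, -0.5                      # sample values with a**2 + b**2 < 1
d = 1.0 - a * a - b * b               # determinant of the tri-gonal matrix

A = [[(1 - b * b) / d, -a / d, a * b / d],
     [-a / d, 1 / d, -b / d],
     [a * b / d, -b / d, (1 - a * a) / d]]
A_inverse = [[1.0, a, 0.0],
             [a, 1.0, b],
             [0.0, b, 1.0]]

product = matmul(A, A_inverse)        # should be the identity matrix

# Leading principal minors of the tri-gonal matrix: 1, 1 - a**2, 1 - a**2 - b**2.
leading_minors = [1.0, 1.0 - a * a, d]
print(product, leading_minors)
```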
All that remains, is to compute the three correlation coefficients.
First multiply-out
(x y z) * (inverse of A) * (x y z)tr
It is the quadratic, in these three variables,
x**2 + 2 a x y + y**2 + 2 b y z + z**2
where we have omitted the asterisk of multiplication between juxtaposed symbols. Observe, that -- as required for a cascaded channel -- there is no term in the product x*z.
To integrate-out the variable z, we have to complete its square
x**2 + 2 a x y + (1 - b**2) * y**2 + (b*y + z)**2
Only the first three terms remain, after integration with respect to the term containing the variable z.
This result may be factored as
(x y) * (1 a a 1-b**2) * (x y)tr
The middle matrix may be factored as
(1 0 0 sqrt(1-b**2)) * (1 a/sqrt(1-b**2) a/sqrt(1-b**2) 1) * (1 0 0 sqrt(1-b**2))
From the middle factor of which, we observe that the correlation coefficient s1 is
s1 = - a / sqrt(1 - b**2)
By symmetry of notation,
s2 = - b / sqrt(1 - a**2)
Returning to the first polynomial. To integrate-out the variable y, we have to complete its square
(1 - a**2) * x**2 - 2 a b x z + (1 - b**2) * z**2 + (y + a * x + b * z)**2
Only the first three terms remain, after integration with respect to the term containing the variable y.
This result may be factored as
(x z) * (1-a**2 -ab -ab 1-b**2) * (x z)tr
The middle matrix may be factored as
(sqrt(1-a**2) 0 0 sqrt(1-b**2)) *
(1 -ab/sqrt((1-a**2)*(1-b**2)) -ab/sqrt((1-a**2)*(1-b**2)) 1) *
(sqrt(1-a**2) 0 0 sqrt(1-b**2))
From the middle factor of which, we observe that the correlation coefficient s12 is
s12 = a * b / sqrt((1 - a**2) * (1 - b**2))
QED.
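The result s12 = s1 * s2 can be confirmed numerically. In this Python sketch (an added illustration; the function names and sample values are assumptions), the matrix A is treated as the covariance matrix of (x, y, z), which is how it enters the quadratic form above through its inverse. Each correlation coefficient is an off-diagonal element of A divided by the square root of the product of the two corresponding diagonal elements.

```python
import math

def covariance(a, b):
    """Matrix A of the text: the inverse of the tri-gonal matrix (1 a 0 a 1 b 0 b 1)."""
    d = 1.0 - a * a - b * b
    return [[(1 - b * b) / d, -a / d, a * b / d],
            [-a / d, 1 / d, -b / d],
            [a * b / d, -b / d, (1 - a * a) / d]]

def correlation(cov, i, j):
    """Correlation coefficient between coordinates i and j."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])

a, b = 0.3, -0.5                      # sample values with a**2 + b**2 < 1
A = covariance(a, b)
s1 = correlation(A, 0, 1)             # XxY
s2 = correlation(A, 1, 2)             # YxZ
s12 = correlation(A, 0, 2)            # XxZ
print(s1, s2, s12, s1 * s2)
```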
The identity permutation realizes the supremum over all permutations.
The three-channels in cascade problem. I had formulated this problem in about 1962. It took me two decades to solve it.
Here I will state the general problem, in Information Theory. Then, I will attempt to solve the special case of a Gaussian-probability distribution. We need the solution of this special case, on our way to the solution of the general case.
The three-channels in cascade problem. Given three-channels in cascade, say WxXxYxZ, with each end-channel -- WxX and YxZ -- fixed, what is the supremum of the over-all Channel -- WxZ -- Rate, over the set of variable middle-channels XxY? Is this supremum attainable? If so, at what middle channel?
Let us confine ourselves to Gaussian probability.
Each of the three channels may be represented in the form of the 3Ch matrix. Factor each. Just to be a meaningful middle-channel, the bra of the middle channel has to be the inverse of the ket of the first channel. And, likewise, the ket of the middle channel has to be the inverse of the bra of the last channel. What remains of the middle channel is an ortho-normal matrix. The problem reduces to a manipulation of this matrix.
Our goal is to show that an identity matrix for the middle channel realizes the supremum. And we already know how to compute the overall-Channel Rate in that case.
Thus the problem reduces to the product
S1 * (the ortho-normal matrix of the middle channel) * S3
with the S1 and S3 fixed.
Let us consider a special case of a two-dimensional channel. Thus, S1 may be taken as
S1 = (a 0 0 b) with 1 > abs(a) > abs(b) > 0
and S3 as
S3 = (c 0 0 d) with 1 > abs(c) > abs(d) > 0
If the middle-channel is a permutation matrix, there are only two possibilities -- an identity matrix or a transposition matrix --
S = (a*c 0 0 b*d) or S = (a*d 0 0 b*c)
The overall-Channel Rate is
R(WxZ) = - (1 / 2) * ln(det(I S Str I))
Because of the minus sign, the smaller determinant will produce the larger Channel Rate. Thus, we have to show that the difference between the two values is strictly negative.
Expand the determinant. That for the first S, above, is
(1 - a**2 c**2) * (1 - b**2 d**2)
Multiply it out to obtain
1 - (a**2 c**2 + b**2 d**2) + a**2 b**2 c**2 d**2
Subtract the corresponding product for the second S, above, to obtain
- ((a**2 c**2 + b**2 d**2) - (a**2 d**2 + b**2 c**2))
It factors to
- (a**2 - b**2) * (c**2 - d**2) < 0
Hence, we see that the identity matrix wins.
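The two determinants and their difference are easy to evaluate for sample numbers. This Python sketch is an added illustration; the function name and the sample values are assumptions. For a diagonal S, the determinant of (I S Str I) is the product of (1 - s**2) over the diagonal elements s.

```python
def determinant_for_diagonal(s_diagonal):
    """det(I S Str I) for a diagonal S: the product of (1 - s**2)."""
    result = 1.0
    for s in s_diagonal:
        result *= 1.0 - s * s
    return result

# Sample values with 1 > a > b > 0 and 1 > c > d > 0 (assumptions).
a, b, c, d = 0.9, 0.4, 0.8, 0.3

det_identity = determinant_for_diagonal([a * c, b * d])
det_transposition = determinant_for_diagonal([a * d, b * c])
difference = det_identity - det_transposition
factored = -(a * a - b * b) * (c * c - d * d)
print(det_identity, det_transposition, difference, factored)
```

The smaller determinant belongs to the identity, so the identity gives the larger Channel Rate.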
From Group Theory, we know that any permutation may be factored into a product of transpositions.
Then, by Mathematical Induction, the foregoing result generalizes to any dimensionality channels. The identity permutation realizes the supremum over all permutations.
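For a small dimension, the claim can also be checked by brute force over every permutation. The Python sketch below is an added illustration, not the group-theoretic argument itself; the function name and the sample values are assumptions. With a permutation matrix in the middle, the product S1 * P * S3 pairs each element of S1 with one element of S3, so the determinant is a product of factors (1 - (s1 * s3)**2).

```python
import itertools

def determinant_for_pairing(s1, s3, permutation):
    """Product of (1 - (s1[i] * s3[permutation[i]])**2) over all i."""
    result = 1.0
    for i, j in enumerate(permutation):
        result *= 1.0 - (s1[i] * s3[j]) ** 2
    return result

# Non-degenerate sample values, in decreasing order (assumptions).
s1 = [0.9, 0.7, 0.4, 0.2]
s3 = [0.8, 0.6, 0.5, 0.1]

best = min(itertools.permutations(range(len(s1))),
           key=lambda p: determinant_for_pairing(s1, s3, p))
print(best)
```

The permutation with the smallest determinant, hence the largest Channel Rate, is the identity pairing.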
in the two-dimensional case, the identity matrix realizes the supremum, over all ortho-normal matrices
Go back to the two-dimensional case, with a general ortho-normal matrix.
The matrix is (cos(alpha/2) sin(alpha/2) -sin(alpha/2) cos(alpha/2))
with alpha in the closed-interval [-2 * pi, 2 * pi].
The product is
(a 0 0 b) * (the foregoing ortho-normal matrix) * (c 0 0 d)
And the determinant is that of
(I (the foregoing product) (its transpose) I)
Remember the identity (cos(x))**2 + (sin(x))**2 = 1. It is tedious, but the determinant is a constant plus
- (1 / 2) * (a**2 - b**2) * (c**2 - d**2) * cos(alpha)
Set the derivative of the determinant equal to zero, and solve for alpha. We obtain
0 = (a strictly positive value) * sin(alpha)
Thus alpha is either 0, pi, or -pi.
Substitute back into the ortho-normal matrix. The zero corresponds to the identity permutation matrix. The pi or minus pi correspond to the same other permutation matrix.
Last time, we showed that over all permutation matrices, the identity matrix realizes the supremum.
Thus, at least in the two-dimensional case, the identity matrix realizes the supremum, over all ortho-normal matrices.
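The dependence of the determinant upon alpha can be scanned numerically. The following Python sketch is an added illustration; the function names and sample values are assumptions. It forms the product (a 0 0 b) * (the ortho-normal matrix) * (c 0 0 d), evaluates the determinant of (I (the product) (its transpose) I) as det(I - R * Rtr), and compares it with a constant minus (1 / 2) * (a**2 - b**2) * (c**2 - d**2) * cos(alpha).

```python
import math

def overall_determinant(a, b, c, d, alpha):
    """det(I - R * Rtr) with R = diag(a, b) * rotation(alpha / 2) * diag(c, d)."""
    ch, sh = math.cos(alpha / 2), math.sin(alpha / 2)
    r = [[a * c * ch, a * d * sh],
         [-b * c * sh, b * d * ch]]
    g11 = 1.0 - (r[0][0] ** 2 + r[0][1] ** 2)
    g22 = 1.0 - (r[1][0] ** 2 + r[1][1] ** 2)
    g12 = -(r[0][0] * r[1][0] + r[0][1] * r[1][1])
    return g11 * g22 - g12 * g12

def closed_form(a, b, c, d, alpha):
    """A constant minus (1/2) * (a**2 - b**2) * (c**2 - d**2) * cos(alpha)."""
    constant = 1.0 + (a * b * c * d) ** 2 - 0.5 * (a * a + b * b) * (c * c + d * d)
    return constant - 0.5 * (a * a - b * b) * (c * c - d * d) * math.cos(alpha)

a, b, c, d = 0.9, 0.4, 0.8, 0.3       # sample values (assumptions)
steps = 400
alphas = [-2 * math.pi + k * (4 * math.pi / steps) for k in range(steps + 1)]
values = [overall_determinant(a, b, c, d, alpha) for alpha in alphas]
at_zero = overall_determinant(a, b, c, d, 0.0)
at_pi = overall_determinant(a, b, c, d, math.pi)
print(at_zero, min(values), at_pi, max(values))
```

The smallest determinant on the grid occurs at alpha equal to zero (and again at plus or minus 2*pi); the largest occurs at plus or minus pi.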
The two-dimensional transposition, we generalized to the multi-dimensional permutation matrix. We made use of Group Theory.
The generalization of the two-dimensional ortho-normal matrix to multi-dimensions requires the use of the bilinear transformation. I remember that I did it once; but, now I cannot remember exactly how I did it. :-) I will try to build up to it. Maybe it will become apparent, as we progress.
the proof is too long to include here
Some time ago, we discussed bilinear transformations.
There were two such. Either one will do. Designate it as the function f(x). Just be sure to stick with whichever one you decide to employ.
The bilinear transformation has many interesting properties. Here we present a few.
Given a fixed ortho-normal (unitary) matrix O, let F(alpha) be the function
F(alpha) = f(f(O) * tan(alpha / 4))
We took the "/ 4"; because, the outside bilinear transformation doubles the angle and the subsequent multiplication of F by its transpose -- to obtain a rotation through an angle -- doubles the angle again. Thus, F(alpha) is a rotation through the angle alpha.
The following properties of F(alpha) are obvious -- they are stated without proof.
F(alpha) is an ortho-normal (unitary) matrix: F(alpha) * F(alpha)tr = I
F(- alpha) = F(alpha)tr
F(pi) = O
The limit, as alpha approaches 2*pi, of F(alpha) is minus the identity, with any constant elements remaining at the identity.
If we factor F(alpha) = < * S(alpha) * >, the bra and ket are constants -- that is, they are independent of the alpha -- with < = O. This proof is tricky.
The defining condition for an ortho-normal (unitary) matrix O is that O * Otr = I. Any permutation matrix P obviously is an ortho-normal matrix; hence P * Ptr = I. However, we also have the
Theorem (a permutation matrix is idem-potent): Iff (= if and only if) a matrix P is a permutation matrix; then P * P = I. The proof is obvious.
The two-dimensional ortho-normal matrix
(cos(alpha/2) sin(alpha/2) -sin(alpha/2) cos(alpha/2))
which we considered last time, has its O at alpha=pi. It is
O = (0 1 -1 0)
This is a permutation matrix. Conversely, in two-dimensional space, any ortho-normal matrix may be represented as F(alpha); thus, in terms of a permutation matrix. Proof by construction: Set F(alpha) equal to the given ortho-normal matrix and solve the set of two simultaneous trigonometric equations for alpha.
Last time, we showed that, in two-dimensional space, the determinant was a minimum at alpha equal to zero, and a maximum at alpha equal to plus or minus pi. We did so by setting the first derivative of the determinant equal to zero, then substituting back into the equation for the determinant. However, it would have been easier to find the second derivative. It is strictly positive at alpha equal to zero and strictly negative at alpha equal to plus or minus pi. By the way, we obtain another -- equivalent -- minimum at alpha equal to plus or minus 2*pi.
We had said that these computations are straight-forward, but tedious. This computation was not peculiar to two-dimensional space. Actually, it was peculiar to the defining ortho-normal matrix being a permutation matrix.
Hence, the same computation may be carried out for any permutation matrix, in multi-dimensional space. It is a lot more tedious, however. That is why we employed the group-theoretic properties of a permutation, instead, to extend our result to multi-dimensional space.
For an ortho-normal matrix, which is not a permutation matrix, I did prove a similar result. However, that was long after my typewriter had broken several type-bars. Thus, I doubt that there is any written record of that proof. I remember that I had implemented a computer program, which displayed the over-all Channel Rate as a function of alpha, for any given ortho-normal matrix. The function was periodic, with a modulus of periodicity of 2*pi. And a maximum at zero; minima at plus or minus pi. Just as with sun-spots, while the sun-spot cycle has an obvious periodicity of about eleven years, the actual modulus of periodicity is about twenty-two years; the actual periodicity of the over-all channel is 4*pi.
At this time, I will have to plagiarize Fermat, and say, that the proof is too long to include here. Actually, at this time, I have no idea how to prove it. :-) If the proof should occur to me, I will write another lesson on it.
at alpha equal to zero
Given any specific ortho-normal (unitary) matrix O, let B = f(O) be its corresponding skew-symmetric (skew-Hermitian) matrix. Write B = bk * Bk, summed -- by Einstein's convention -- over k, in the closed interval [1, K], where K is less than or equal to n, the order of O. Absorb any minus signs into the Bk, thus, making each bk strictly greater than zero. Each Bk is to be taken as a skew-symmetric (skew-Hermitian) matrix, with exactly one pair of plus-one and minus-one.
Now, F(alpha) becomes
F(alpha) = f((bk * Bk) * tan(alpha / 4))
Use partial differentiation -- actually, the gradient with respect to the bk -- to extend the one-dimensional result, at alpha equal to zero, to this ortho-normal matrix O.
Next time, we will show how to treat the point at alpha equal to pi.
Gaussian Three-channels-in-cascade theorem
Given two diagonal square matrices S1 and S2 and a proper ortho-normal matrix O, each of them of the same size. The elements of S1 and S2 belong to the open interval (-1, 1) and are arranged in decreasing order of magnitudes.
At first, we further will assume that S1 and S2 are not degenerate; that is, that neither S1 nor S2 has any repeated elements.
For brevity of notation, let
R = S1 * O * S2
and consider the symmetric positive-definite matrix
M = (I R Rtr I)
Theorem: The determinant
det = det(M) = det(I R Rtr I)
may be evaluated as each of the following four forms
#1 det = det(I - R * Rtr)
#2 det = det(O - S1**2 * O * S2**2) provided S1 is not singular
#3 det = det(I - Rtr * R)
#4 det = det(Otr - S2**2 * Otr * S1**2) provided S2 is not singular.
If both S1 and S2 are singular, partition off the smaller null-space, in the matrix M. In the complementary space, the remaining portions of S1 and S2 are not both singular. The determinant of this complementary M is the same as that of the original M. Thus, without loss of generality, we further assume that S1 and S2 are not both singular.
Proof of #1. Cross-multiply the matrix M.
Proof of #2. Factor
(O - S1**2 * O * S2**2) = S1 * (I - R * Rtr) * (1/S1) * O
Remember that the determinant of a product of matrices is equal to the product of the determinants of each of the matrices. The determinant of Otr is equal to that of O; each is plus one, because O is a proper ortho-normal matrix. Also, the determinant of (1/S1) is the reciprocal of the determinant of S1. QED.
Proof of #3. The products R * Rtr and Rtr * R have the same characteristic polynomial; hence, det(I - R * Rtr) = det(I - Rtr * R). QED.
Proof of #4. Factor
(Otr - S2**2 * Otr * S1**2) = S2 * (I - Rtr * R) * (1/S2) * Otr
From here on, the proof proceeds like that of #2, above. QED.
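The agreement of the four forms with det(M) can be checked on a small example. This Python sketch is an added illustration; the helper names, the Gaussian-elimination determinant, and the sample matrices are assumptions. It takes two-by-two diagonal S1 and S2 and a proper ortho-normal O (a rotation, so that its determinant is plus one), builds the four-by-four matrix M, and evaluates each form.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(column) for column in zip(*m)]

def subtract(p, q):
    return [[x - y for x, y in zip(rp, rq)] for rp, rq in zip(p, q)]

def diagonal(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

half = 0.7 / 2                                   # alpha / 2 for a sample alpha
O = [[math.cos(half), math.sin(half)], [-math.sin(half), math.cos(half)]]
S1, S2, I2 = diagonal([0.9, 0.4]), diagonal([0.8, 0.3]), diagonal([1.0, 1.0])
R = matmul(matmul(S1, O), S2)
Rtr, Otr = transpose(R), transpose(O)
S1sq, S2sq = matmul(S1, S1), matmul(S2, S2)

# M = (I R Rtr I), assembled row by row as a 4 x 4 matrix.
M = [I2[i] + R[i] for i in range(2)] + [Rtr[i] + I2[i] for i in range(2)]

det_m = det(M)
forms = [
    det(subtract(I2, matmul(R, Rtr))),                       # form 1
    det(subtract(O, matmul(matmul(S1sq, O), S2sq))),         # form 2
    det(subtract(I2, matmul(Rtr, R))),                       # form 3
    det(subtract(Otr, matmul(matmul(S2sq, Otr), S1sq))),     # form 4
]
print(det_m, forms)
```

Forms 2 and 4 rely on the determinant of O being plus one; with an improper O they would differ from det(M) by a sign.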
Theorem: Gaussian Three-channels-in-cascade theorem. Given two fixed end-channels (first and third channel, respectively) and a variable middle-channel, the supremum of the over-all Channel Rate is realized with the middle channel being the product of the inverse of the ket of the first channel, an arbitrary proper ortho-normal matrix over each degenerate partition of the first channel, a diagonal matrix whose elements are plus or minus one, an arbitrary proper ortho-normal matrix over each degenerate partition of the third channel, and the inverse of the bra of the third channel. Likewise, the infimum is realized with the diagonal matrix being the other diagonal of the square; that is, having a slope of plus one.
Thus, the maximum and the minimum are unique, up to degeneracies and axial-vectors.
Proof. The proof for the infimum is similar. We prove the supremum. We consider only non-degenerate, not-singular matrices S1 and S2. We need to show that the global minimum of the determinant occurs with the matrix O being the identity matrix. We employ the factorization #2, above.
Consider a two-dimensional case. Write
S1**2 = (a 0 0 b) with 1 > a > b > 0
S2**2 = (d 0 0 e) with 1 > d > e > 0
and
O = (cos(alpha/2) sin(alpha/2) -sin(alpha/2) cos(alpha/2))
with alpha in the closed interval [-2*pi, 2*pi]
Substitution into #2 yields
det = det((1 - ad)cos(alpha/2) (1 - ae)sin(alpha/2) -(1 - bd)sin(alpha/2) (1 - be)cos(alpha/2))
Cross-multiply to obtain
det = (1/2) * ((1 - ad)(1 - be)(1 + cos(alpha)) + (1 - ae)(1 - bd)(1 - cos(alpha)))
which is some constant plus
- (1/2) * (a - b) * (d - e) cos(alpha)
where the factor (a - b) * (d - e) is strictly positive.
Its first derivative is a strictly-positive coefficient times the sine of alpha. And its second derivative is the same strictly-positive coefficient times the cosine of alpha.
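Thus, the global minimum of the determinant -- hence, the global maximum of the Channel Rate -- occurs at alpha equal to zero or plus or minus 2*pi. And the global maximum of the determinant -- hence, the global minimum of the Channel Rate -- occurs at alpha equal to plus or minus pi. Any values of alpha which differ by an integral multiple of 2*pi correspond to the same axial-vector. QED.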
The computation for the foregoing proof was much easier than that stated as "tedious" several lessons ago. There, we had evaluated the determinant, employing the form #1, above.
For higher dimensions, a representative is the three-dimensional
S1**2 = (a 0 0 0 b 0 0 0 c) with 1 > a > b > c > 0
S2**2 = (d 0 0 0 e 0 0 0 f) with 1 > d > e > f > 0
and the ortho-normal matrix O being the product of
(1 0 0 0 cos(beta/2) sin(beta/2) 0 -sin(beta/2) cos(beta/2))
and (cos(alpha/2) sin(alpha/2) 0 -sin(alpha/2) cos(alpha/2) 0 0 0 1)
Again, with alpha and beta being in the closed interval [-2*pi, 2*pi].
It multiplies to a matrix, which has a lone zero in the upper-right hand corner.
Substitute into #2 and expand by minors of the third column, to obtain the determinant.
It consists of four terms. Each is some constant times (1 + cos(beta)) * (1 + cos(alpha)), with the four permutations of these plus signs being replaced by minus signs.
Take partial derivatives with respect to alpha and beta. The result is the same as that of the two-dimensional case.
If the primitive ortho-normal matrices do not overlap, partition and employ the two-dimensional calculation. In case of complete overlap, alpha+beta becomes the new variable.
The proof proceeds by Mathematical Induction. QED.
Partition-out the singularities, as described earlier. The degeneracies are obvious. QED.
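The three-dimensional case can be explored on a grid of alpha and beta. The Python sketch below is an added numerical illustration, not a substitute for the proof; the function names, the grid, and the sample values are assumptions. It uses form #2: because S1**2 and S2**2 are diagonal, the (i, j) element of O - S1**2 * O * S2**2 is the (i, j) element of O times (1 - s1sq[i] * s2sq[j]).

```python
import math

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def ortho_normal(alpha, beta):
    """Product of the beta matrix and the alpha matrix displayed in the text."""
    ca, sa = math.cos(alpha / 2), math.sin(alpha / 2)
    cb, sb = math.cos(beta / 2), math.sin(beta / 2)
    first = [[1.0, 0.0, 0.0], [0.0, cb, sb], [0.0, -sb, cb]]
    second = [[ca, sa, 0.0], [-sa, ca, 0.0], [0.0, 0.0, 1.0]]
    return matmul(first, second)

def form2_determinant(s1sq, s2sq, o):
    """det(O - S1**2 * O * S2**2) for diagonal S1**2 and S2**2."""
    m = [[o[i][j] * (1.0 - s1sq[i] * s2sq[j]) for j in range(3)] for i in range(3)]
    return det3(m)

s1sq = [0.8, 0.5, 0.2]                # a > b > c (sample values)
s2sq = [0.7, 0.4, 0.1]                # d > e > f (sample values)
steps = 40
grid = [-2 * math.pi + k * (4 * math.pi / steps) for k in range(steps + 1)]

at_identity = form2_determinant(s1sq, s2sq, ortho_normal(0.0, 0.0))
smallest = min(form2_determinant(s1sq, s2sq, ortho_normal(alpha, beta))
               for alpha in grid for beta in grid)
print(at_identity, smallest)
```

On this grid, no choice of alpha and beta gives a determinant below the value at the identity matrix.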
The probability of a degeneracy of a random end-channel is zero. However, in crystallography and in Quantum Mechanics -- where in each case, we have geometric and symmetric constraints -- degeneracies are common and of vital importance.
There you have it. This finishes our over-view of Linear Algebra.
We could go back and fill-in many gaps.
Are you interested?
webmaster@rism.com
Last modified on Sunday 01-st August 1999