and
Not to obscure the gist of the following theorem, we describe the various vectors and matrix involved, first. Each vector is a horizontal-vector of n components. Let X be a vector function of the scalar t. Let y be a scalar function of t. Given a set {ti} of n constants. Let C be a vector, whose constant components -- to be determined later -- we call the coefficients. Let the matrix A consist of n rows of X(ti). We assume that the matrix A is not singular.
Lemma: If the vector C is such that Ytr = A Ctr; then y = X Ctr is a function of the basis X and passes through the points of which the vector Ytr is comprised.
Proof is obvious.
Theorem: The two formulations are equivalent. 0 = det((X, y; A, Ytr)) or y = X Ainv Ytr gives y as a function of the basis X and passes through the points of which the vector Ytr is comprised.
Proof: Since the two formulations say the same thing, only in a different notation, a proof of either proves the other. For the proof of the determinant version, I say, "Behold!". QED. For the independent proof of the matrix version, pre-multiply the first equation of the lemma by Ainv, to obtain Ainv Ytr = Ctr. Substitute into the second equation of the lemma, to obtain the matrix equation of the theorem. QED.
Corollary: If the matrix A is ortho-normal, that is, A = O; then, the three aforementioned equations involving A become, respectively: Ytr = O Ctr, y = X Otr Ytr, and Otr Ytr = Ctr. Furthermore, the matrix A may be made ortho-normal by subjecting Atr to the Gramm-Schmidt ortho-normalization algorithm. Of course, then, the vector Xtr would have to be subject to the same linear transformation.
Matrix multiplication involves repeated addition of the partial products. Instead of addition, we may employ a Lebesque-Stiltjes integral. In the special case where X is comprised of the denumerably-infinite set of components {1 / (2 pi), (1 / pi) cos([i / 2] t), (1 / pi) sin([i / 2] t)}, we obtain the classical Fourier series. The general X corresponds to a generalized Fourier series. If we let X be a function of a real-variable i; we obtain a generalized Fourier integral. Of course, we gloss-over the vital questions 1) A proof that X is a basis for some class of functions y. And 2) The convergences as the finite n increases without bound and then is transformed into a continuous variable.
Instead of t, we may employ another variable – say u – as the independent variable, with t being a function of u, namely t = t(u). A linear function is useful for scaling (gauge transformation) or translation. The Arcsine is another interesting function – it yields ortho-normal polynomials upon an interval.
[Historical note: My very good friend Ralph took the twelfth-grade College Algebra course -- with a textbook of the same title, written by Ryder -- in the Summer of 1942. This was just before I entered the twelfth-grade in the Autumn of that year. As he studied College Algebra, we taught it to me. (If you are reading this, Ralph, gramercy very much for launching me on the study of Linear Algebra.) For some time prior, I had been in the need for an extrapolation formula. As soon as Ralph taught me about determinants, I discovered/formulated the determinant version of the foregoing theorem, for the special case where X is comprised of the powers t^j for j in [0, n). Over the next several years, as I learned of other functions than the powers of t, I realized that X may be any vector function. The only requirement is that the matrix A be non-singular. When in 1980, programs -- I wrote them myself -- capable of computing with matrices became available to me, I re-formulated the determinant form of the theorem to the matrix form. Now, all of the Linear Algebra may be employed in conjunction with the theorem. I never have seen this theorem mentioned in the literature. There are other, better known, interpolation/extrapolation formulae. For instance --]
Theorem: The polynomial function y = sum, with i in [0, n) of (t / ti) times the product, with j in [0, n) but not equal to i, of ((t - tj) / (ti - tj)) passes through the points of which the vector Ytr is comprised.
Proof: Again, I say, "Behold!"
Corollary: This is the same polynomial as that of the special case of the previous theorem, where X is comprised of the powers of t.
Proof: A polynomial of degree (n - 1) that passes through n points is unique. QED.
The French mathematician Alexandre-Theophile Vandermonde (28-II-1735 to 01-I-1798) studied the properties of determinants.
Given an n-dimensional vector X = {x(j), with j an integer in [0, n)}.
Definition: The elementary symmetric function of order i is the sum of the products
of the combinations of the elements of the vector X, taken i at a time. It usually is depicted as the lower-case sigma, with a
subscript. However, for lack of the appropriate glyphs here, we will write it as shown --
Thus: s(0) = 1, s(1) = x(0) + x(1) + x(2) + … + x(n – 1), …, s(n) = x(0) * x(1) * x(2) * … * x(n – 1).
Theorem: The elementary symmetric function of order i has C(n, i) terms.
Proof is obvious. Hint: C(n, i) = n! / (i! (n – i)!) is the amount of combinations of n things taken i at a time.
Theorem: The monic polynomial, whose roots are the elements of the vector X is the product of the (x – x(j)) over j in [0, n). It is equal to x^n – s(1) x^(n – 1) + s(2) x^(n – 2) – s(3) x^(n – 3) + … + (- 1)^n s(n).
Proof is by Mathematical Induction.
Definition: The Vandermonde matrix V of order n has the elements {(x(j))^i, with i an integer in [0, n)}. The i indicates the row and the j indicates the column.
The Vandermonde matrix of order one is (1). Its determinant is one. Henceforth, we will presume that the order is greater than one.
Lemma: Each term in the by-the-definition evaluation of the determinant of the Vandermonde matrix is of degree (n (n – 1) / 2). Hence, the determinant of the Vandermonde matrix also is of degree (n (n – 1) / 2).
Proof is obvious. Hint: Each term is the product of elements of degree j in an arithmetic progression: 1, 2, 3, …, n – 1.
Lemma: For any j1 != j2, the determinant of the Vandermonde matrix has a factor (x(j2) – x(j1)).
Proof is obvious: Hint: The determinant is zero; if x(j2) = x(j1).
Theorem: The determinant of the Vandermonde matrix is f(n) times the product of the product of (x(j2) – x(j1)) over j1 in [0, j2) over j2 in [1, n], where f(n) is some unknown function.
Proof is obvious.
While it is apparent that the f(n) is identically one, I do not see any easy proof. Hence, we will prove the whole thing by Mathematical Induction.
Lemma: The determinant of the Vandermonde matrix of order two is (x(1) – x(0)).
Proof is obvious.
Lemma: The determinant of the Vandermonde matrix of order n is the product of the determinant of the Vandermonde matrix of order (n – 1) by the product of (x(n – 1) – x(j) over j in [0, n – 1).
Proof follows from the foregoing theorem regarding the monic polynomial.
Theorem: The determinant of the Vandermonde matrix is the product of the product of (x(j2) – x(j1)) over j1 in [0, j2) over j2 in [1, n].
Proof is by Mathematical Induction, which is comprised of the foregoing two lemmata.
[It is apparent that, in high-school, I had re-discovered a portion of the work of Vandermonde.]
For the purpose of the following definition and two theorems, consider that x is an element of a division-ring. Thus, an unitary prefix minus-sign designates the additive inverse of the following element and a binary infix minus-sign indicates addition of the additive inverse of the following element to the previous element. Also, the natural-number one is the multiplicative-identity and its successor -- two -- indicates that the following element is to be added two times. We may generalize this concept as the --
Theorem: The product nx, where n is a natural number, commutes.
Proof is by Mathematical Induction upon n.
Definition of a bilinear transformation: y = y(x) = (1 – x) / (1 + x) = 1 +2 ((- x) / (1 + x)).
Lemma: The numerator commutes with the inverse of the denominator, of the bilinear transformation.
Proof: We need to show that (1 – x) (1 / (1 + x)) =? (1 / (1 + x)) (1 – x). Pre and post-multiply by (1 + x), to obtain (1 + x) (1 – x) =? (1 – x) (1 + x). Perform these indicated multiplications, to obtain 1 + x (- x) =? 1 + (- x) x. Subtract 1, to obtain (x) (- x) =? (- x) (x). Ok; because the additive inverse of an element of a division-ring commutes with that element. Now, for a formal proof, reverse the steps. QED.
Theorem: The bilinear transformation is idempotent; i.e., it is its own inverse.
Proof (version one): x =? (1 – (1 – x) / (1 + x)) / (1 + (1 – x) / (1 + x)) = ((1 + x) – (1 – x)) / ((1 + x) + (1 – x)) = (2 x) / (2) = x. QED.
Proof (version two): Post-multiply each side of the definition of the bilinear transformation by (1 + x), to obtain y(x) (1 + x) = (1 - x). Substitute y(x) for each x, to obtain y(y(x)) (1 + (1 - x) / (1 + x)) = 1 - (1 - x) / (1 + x). Post-multiply each side by (1 + x), to obtain y(y(x) ((1 + x) + (1 - x)) = (1 +x) - (1 - x), where we hav employed the distributive law of addition over multiplication. Simplify, to obtain y(y(x)) 2 = 2 x. Division by 2 yields y(y(x)) = x. QED.
Actually, this second version of the proof is easier; because, it did not employ the lemma.
We may generalize the previous lemma into the following theorem.
Lemma: The product f(x) g(x), where f and g are monomials, commutes.
Proof is by Mathematical Induction, first upon one of the exponents then upon the other.
Lemma: The product f(x) g(x), where f and g are polynomials, commutes.
Proof: Hint: By multiplying-out and simplifying, establish that 0 = f(x) g(x) - g(x) f(x).
Lemma: Given the collection, of cardinality n, {fi(x)}of polynomials (not necessarily distinct). The product of these polynomials, taken without replacement, commutes.
Proof is by Mathematical Induction upon n.
Theorem: The product f(x) g(x), where f and g are rational algebraic functions, commutes.
Proof: Strategy: Pre and post multiply by the same sequence of polynomials, to clear of all denominators. Then, apply the last lemma. For a formal proof, reverse the steps.
This theorem succeeds because only a single element x of the division-ring is employed. In general, f(x) g(y) need not commute.
Examples of division-rings:
Real or complex numbers, constituting a field -- which is a special case of a division-ring.
The set of matrices; because it is a division-ring, plus the scalar-multiplication axiom.
Now, we resume our exposition of matrices.
Lemma: For an ortho-normal (or unitary) matrix O, the numerator commutes with the inverse of the denominator, of the fraction (I - O) / (I + Otr).
Proof: We need to show that (I – O) (I / (I + Otr)) =? (I / (I + Otr)) (I – O). Pre and post-multiply by (I + Otr), to obtain (I + Otr) (I – O) =? (I – O) (I + Otr). Perform these indicated multiplications, to obtain Otr – O =? Otr – O. Ok. Now, for a formal proof, reverse the steps. QED. Alternatively, this lemma is just a special case of the preceding theorem. Remember that the transpose of an ortho-normal (or unitary) matrix is its inverse.
Theorem: The bilinear transformation of a skew-symmetric (or skew-Hermitian) matrix is an ortho-normal (or, unitary, respectively) matrix. The bilinear transformation of an ortho-normal (or unitary) matrix is a skew-symmetric (or skew-Hermitian, respectively) matrix.
Proof: In view of the theorem that the bilinear transformation is its own inverse, either one of the statements implies the other. However, because of the importance of this theorem, we will prove each statement directly. For the one that is easier to prove – the first assertion, let K be a skew-symmetric matrix. Let the matrix O = (I – K) / (I + K). Its transpose is Otr = (I / (I + Ktr)) (I – Ktr) = (I / (I – K)) (I + K). We need to show that I =? O Otr = ((I – K) / (I + K)) ((I / (I – K)) (I + K)) = (I – K) (I / ((I + K) (I – K))) (I + K) = (I – K) (I / (I – K K)) (I + K). Post multiply by an identity, in the form of (I – K) / (I – K), to obtain = ((I – K) (I / (I – K K)) (I + K)) ((I – K) / (I – K)) = ((I – K) (I / (I – K K))) ((I + K) (I – K)) / (I – K) = (I – K) ((I / (I – K K)) (I – K K)) / (I – K) = (I – K) / (I – K) = I. QED.
For the second assertion, let O be an ortho-normal matrix. Let the matrix K = (I – O) / (I + O). Post-multiply it by an identity, in the form of (I / (I + Otr)) (I + Otr), to obtain K = ((I – O) / (I + O)) ((I / (I + Otr)) (I + Otr)). Associate the denominators, to obtain K = (I – O) (I / (2 I + O + Otr)) (I + Otr). By the lemma, continue, to obtain, K = ((I – O) (I + Otr)) / (2 I + O + Otr) = (Otr – O) / (2 I + O + Otr). Its transpose is Ktr = (I / (2 I + O + Otr)) (O – Otr). By the lemma, continue, to obtain Ktr = (O – Otr) / (2 I + O + Otr)). We need to show that O =? K + Ktr. It is obvious. QED.
Theorem: The bilinear transformation of a skew-symmetric ortho-normal (or skew-Hermitian unitary) matrix O is its additive inverse, which is skew-symmetric and ortho-normal (or skew-Hermitian and unitary, repsectively).
Proof: That the result of the bilinear transformation is ortho-normal and skew-symmetric follows from the preceding theorem. To find the actual resulting matrix, we have to perform the division. We do so as follows: Pray, view the theorem regarding the properties of a skew-symmetric ortho-normal matrix. In particular, we utilize the two equations (I+ O) (I - O) = 2 I and (I - O) (I - O) = - 2 O. By definition, the bilinear transformation of a matrix O is (I - O) / (I + O). Post-multiply it by the identity in the form I = (I / (I - O)) (I - O), to obtain ((I - O) / (I + O)) ((I / (I - O)) (I - O)). Associate and multiply-out the middle two facotrs, to obtain (I - O) ((I / (I + O)) ((I / (I - O))) (I - O) = (I - O) (I / ((I + O) (I - O))) (I - O) = (I - O) (I / (2 I)) (I - O) = (I - O) (I - O) / 2 = - 2 O / 2 = - O. That this matrix is skew-symmetric and ortho-normal is obvious. QED.
Definition: For brevity of notation, let bt(x) be the bilinear transformation and let Bt(alpha, u) = bt(bt(u) tan(alpha / 2)), where u is a given fixed parameter.
Theorem: At (pi / 2), the bilinear transformation of the bilinear transformation of an ortho-normal matrix is itself; i.e., Bt((pi / 2), O) = O. At (- pi / 2), it is its transpose; i.e., Bt((- pi / 2), O) = Otr. Also, Bt(0, O) = I and Bt(pi, O) = - I.
Proof: The first statement follows from the idempotent property. The remaining statements are obvious.
Theorem: The modulus of periodicity of the bilinear transformation Bt(alpha, u) is (2 pi).
Proof is obvious.
Theorem: The bilinear transformation Bt(alpha, u) has a removable singularity at each odd multiple of pi. It is removed by the multiplication of the numerator and the denominator by cos(alpha). The resulting expression for Bt becomes Bt(alpha, u) = (cos(alpha / 2) - bt(u) sin(alpha / 2)) / (cos(alpha / 2) + bt(u) sin(alpha / 2)) = 1 - 2 bt(u) sin(alpha / 2) / (cos(alpha / 2) + bt(u) sin(alpha / 2)).
Proof is obvious.
Theorem: The first two derivatives, with respect to alpha, of the bilinear transformation Bt(alpha, u) are
Bt'(alpha, u) = - bt(u) / (cos(alpha / 2) + bt(u) sin(alpha / 2))^2
Bt"(alpha, u) = bt(u) (- sin(alpha / 2) + bt(u) cos(alpha / 2)) / (cos(alpha / 2) + bt(u) sin(alpha / 2))^3
Proof is obvious.
Corollary: At alpha = 0, the MacLauren series -- a special case of the Taylor series, where the reference point xo is at the origin -- of the bilinear transformation Bt(alpha, u) is
Bt(alpha, u) = I - bt(u) alpha + (bt(u) alpha)^2 + ....
Proof is obvious.
|
example in two dimensions |
|
Theorem: The bilinear transformation of the ortho-normal matrix function of alpha given by (cos(alpha / 2), sin(alpha / 2); -sin(alpha / 2), cos(alpha / 2)) is (0, -1; 1, 0) tan(alpha / 2). Its matrix factor not only is a skew-symmetric matrix (as it should be); but, also an ortho-normal (actually, permutation) matrix -- a bonus.
Proof: Hint: First, employ the partition and conquer matrix inversion. Then, employ the half-angle formula for the tangent function.
Corollary: The bilinear transformation of the skew-symmetric matrix function of alpha given by (0, -1; 1, 0) tan(alpha / 2) is (cos(alpha / 2), sin(alpha / 2); -sin(alpha / 2), cos(alpha / 2)).
Proof: Version one of the proof follows from the idem-potent property of the bilinear transformation. Version two of the proof is a direct calculation. Hint: Multiply through numerator and denominator by cos(alpha / 2). Make use of the definition of the tangent function. Then, employ the partition and conquer matrix inversion. Finally, employ the addition theorem for the sine and cosine functions. |
Consider complex-numbers, which are developed here. The x is the real-part and the y is the imaginary part.
Theorem: The bilinear transformation maps the y-axis onto a unit-circle, centered at the origin. We go around the circle in the negative (i.e., clockwise) direction; because, we had written the bilinear transformation with the negative sign in the numerator. Had we written it with the negative sign in the denominator, instead; then we would have traversed the circle in the positive (i.e., counterclockwise) direction.
Proof: The field of complex numbers is a special case of a field; hence, the foregoing properties of the bilinear
transformation are applicable. Let x = i tan(w / 2). Then, its bilinear transformation is
(1 - x) / (1 + x) = (1 - i tan(w / 2)) / (1 + i tan(w / 2)) = (1 - i tan(w / 2))^2 / (1 + (tan(w / 2))^2) = (cos(w / 2) - i sin(w / 2))^2 = (exp(-i
w / 2))^2 = exp(-i w). QED.
Corollary: The bilinear transformation maps the unit circle, centered at the origin, onto the y-axis.
Proof follows from the idempotent property of the bilinear transformation.
We would have to generalize the foregoing theorem to a computation of a mapping of the Cartesian coordinate net onto an orthogonal net. That the net is orthogonal follows from the theorem in Complex Variables to the effect that any analytic function provides a conformal mapping. The computation of the net is a generalization of the computation performed for the proof of the foregoing theorem -- nothing new; just tedious. Now, we have constructed a Smith chart. Then, in Ordinary Differential Equations, we have to set-up and solve the telegrapher's equation and show how to plot the results upon the Smith chart. Mayhap, someday.
[The vector cross-product provides a well known rotation about the resultant pseudo-vector as an axis. Textbooks ordinarily say that there is no generalization to higher dimensions. About three-quarters of a century ago, a generalization had been proposed. However, the presentation was too complicated to be of any practical value. In the 1975-1980, I discovered/devised the following concept of an axis of rotation of an ortho-normal matrix. I am posting these original results here in September 2004.]
An ortho-normal matrix is a special case of a general real matrix. As such, it may be factored into a bra, diagonal, ket.
Theorem: Given an ortho-normal (unitary) matrix O. Its factorization into O = < D > is < = I, D = I, and > = O.
Proof is obvious from the general theorems.
This factorization is nice; but, not informative. Instead, let us proceed at a different tack.
Definition: Given an ortho-normal (unitary) matrix O. A (possibly multidimensional) vector V is said to be the axis of rotation of O; if it satisfies the equation V O = V.
Theorem: Existence. Given an arbitrary ortho-normal (unitary) matrix O. There exists an axis of rotation.
Proof: The trivial null-vector (that is, the origin) is an axis of rotation. QED.
We would like something of higher dimensionality. Obviously, its dimensionality cannot exceed the size of the matrix O. We need not be concerned regarding the necessity of choosing among possible candidates.
Theorem: Given an arbitrary ortho-normal (unitary) matrix O. If there is a set of axes; any linear combination thereof is an axis, as well.
Proof is obvious.
Sounds like the axis is a sub-space. Whoops! We have opened a new can of worms. For sooth, matrices do constitute a vector space. As we said at the outset, each row of a matrix may be considered to be a vector.
Theorem: Given an arbitrary ortho-normal (unitary) matrix O. The largest-dimensional axis is the null-space of the bt(O).
Proof: By the definition of the axis, it is the solution of the equation V O = V. Write it as 0 = V (I - O). By the idempotent property of the bilinear transformation, we have that O = bt(bt(O)). Expand the outside bt and substitute into the equation for V, to obtain 0 = V (I - (I - bt(O)) / (i + bt(O))) = V (2 bt(O)) / (I + bt(O)). QED.
Corollary: The axis of the ortho-normal matrix I (the identity matrix) is the whole space. The axis of the ortho-normal matrix -I is the null-vector (the origin).
Proofs are obvious.
Corollary: Given an arbitrary ortho-normal (unitary) matrix O. Except for alpha equal to multiples of pi, each member of the family of ortho-normal (unitary) matrices Bt(alpha, O) has the same largest-dimensional axis as that of O itself.
Proof: The same steps as those for the proof of the theorem yield the equation 0 = V (2 bt(O)) sin(alpha / 2) / (I cos(alpha / 2) + bt(O) sin(alpha / 2)). Since the sine of alpha is not zero, we may divide through by sin(alpha / 2), to obtain the last equation of the theorem. QED.
The foregoing concept of axis of rotation actually is a left axis of rotation. One may define and prove analogous results for a right axix of rotation. We have the
Theorem: The left and right largest-dimensional axes of rotation span the same sub-space; but, viewed from a different vantage-point. Pray, see the caveat within the discussion of the null-space.. Hence, we may speak of the largest-dimensional axis of rotation.
Proof: Since the left axis of rotation leads to the left null-space of the bt(O) and the right axis of rotation leads to the right null-space of bt(O), the two largest-dimensional axes of rotation span the same sub-space; because, the two null-spaces are the same.
Henceforth, when we say "axis", we will mean the "common left and right largest-dimensional axis of rotation".
Lemma: For odd n, an n by n ortho-normal matrix O has at least a one-dimensional, non-null, real axis of rotation.
Proof follows from the singularity of the skew-symmetric bt(O) matrix.
Theorem: The matrix-rank an ortho-normal matrix O is even.
Proof follows from the theorem that the matrix-rank of a skew-symmetric matrix is even. QED.
Corollary: So does the corresponding family.
Proof is obvious.
Observe that I did not place "unitary" in parentheses in the foregoing theorem. It does not generalize to the complex matrices.
Once one has obtained the axis, it would be convenient to take this axis, any given set of vectors, and q.s. (quantum sufficit = sufficient quantity) of random vectors to span the space given by the ortho-normal (unitary) matrix. Then, ortho-normalize them employing the Gramm-Schmidt ortho-normalization algorithm. Thus, we have reduced the concept of an axis of an ortho-normal (unitary) matrix to that of finding the null-space of its bilinear transformation and then performing a Gramm-Schmidt ortho-normalization, each a well-known concept.
The formula for the triple cross-product demonstrates the special case of the foregoing concepts as applied to the cross-product in three-dimensional space.
Documenting this problem is a work in progress. As I bring together the disparate pieces, I will post them -- here, or link to wherever I do post them.
Statement of the problem: Given two fixed end-channels, with a variable middle-channel at our disposal. Consider the overall channel rate.
Does the supremum exist? Yes; because, the channel rate is a real number.
Find that supremum.
Is it realized?
If so; find the middle-channel which realizes the supremum.
The solution of this problem involves Topology, Group Theory (denumerable groups, only), and Linear Algebra. Thus, clearly, it belongs in Algebraic Topology. Here, we will restrict ourselves to only the Linear Algebra portion. It also involves multi-dimensional Spherical Trigonometry and synthetic Geometry, developed from the view-point of an angle rather than a segment. I already have posted the three-dimensional special case. Everything scales easily to higher dimensions, with the exception of the vector cross-product. It has to be replaced by the concept of the axis of rotation of a family of ortho-normal matrices.
A set of cascaded channels becomes a channel, in its own right. Hence, all of the concepts, theorems, and algorithms or a channel are applicable to such a set. Since adjacent channels have to match each other, we insert a variable middle channel between each pair of fixed channels. Thus, we obtain the three channels. To provide the requisite match, each side of this middle channel has to have its LBRA be the transpose of the RKET of the first channel and its RKET be the transpose of the LBRA of the third channel. Also, its LDIAG has to be the inverse of the RDIAG of the first channel and likewise its RDIAG has to be the inverse of the LDIAG of the third channel. Pray, see the factorization of a channel for the terminology.
Then, the middle channel has an arbitrary variable ortho-normal matrix in its middle. However, it is convenient to take it as the transpose of the MKET of the first channel, times an arbitrary permutation, times a family of ortho-normal matrices, times another arbitrary permutation, and finally times the transpose of the MBRA of the third channel.
Taken this way, three of the factors from each of the first and third channel cancel. The remaining factors are the MDIAG of the first channel; the three factors from the middle channel -- the arbitrary permutation, the family of ortho-normal matrices, and other arbitrary permutation; followed by the MDIAG of the third channel. It is in this canonical form that we consider the three-channels-in-cascade problem.
Lemma: Given four non-negative numbers, such that a > b and c > d; then, we have a c + b d > a d + b c.
Proof: We need to show that 0 <? (a c + b d) - (a d + b c) = a (c - d) + b (d - c) = (a - b) (c - d) > 0. QED.
Theorem: Given two sets each of n distinct non-negative numbers, with each set arranged in decreasing sequence. The inequality of the foregoing conclusion generalizes.
Proof is by Mathematical Induction, over the permutations.
Lemma: Given four non-negative numbers, such that a > b and c > d; then, we have (a c)^2 + (b d )^2 > (a d)^2 + (b c)^2.
Proof: Hint: x^2 + y^2 = (x + y)^2 - 2 x y and employ the previous lemma.
Theorem: Given two sets each of n distinct non-negative numbers, with each set arranged in decreasing sequence. The inequality of the foregoing conclusion generalizes.
Proof is by Mathematical Induction, over the permutations.
Lemma: Given four non-negative numbers, such that a > b and c > d; then, we have (1 - (a c)^2) (1 - (b d)^2) < (1 - (a d)^2) (1 - (b c)^2).
Pray, observe that the sense of the inequality has changed.
Proof: Hint: (1 - x) (1 - y) = 1 - (x + y) + x y and employ the previous two lemmata.
Theorem: Given two sets each of n distinct non-negative numbers, with each set arranged in decreasing sequence. The inequality of the foregoing conclusion generalizes.
Proof is by Mathematical Induction, over the permutations.
Lemma: Given four non-negative numbers, such that a > b and c > d; then, we have ln(1 - (a c)^2) + ln(1 - (b d)^2) < ln(1 - (a d)^2) + ln(1 - (b c)^2).
Proof: Hint: The logarithmic function is a strictly monotonic increasing function and employ the previous three lemmata.
Theorem: Inequality of logarithms. Given two sets each of n distinct non-negative numbers, with each set arranged in decreasing sequence. The inequality of the foregoing conclusion generalizes.
Proof is by Mathematical Induction, over the permutations.
Warning: The following two corollaries are in the realm of Algebraic Topology. If your labor contract (as either an instructor or student) precludes your studying of Algebraic Topology; then, skip these corollaries.
Corollary: If the appropriate limit exists, the foregoing theorem generalizes to a denumerably infinite pair of sets.
Corollary: If the appropriate limit exists, the foregoing corollary generalizes to a continuum pair of sets.
Definition: A function is said to be bicontinuous iff (= if and only if) it and its inverse each is continuous.
Theorem: A one-to-one bicontinuous function is strictly monotonic and so is its inverse.
Proof is obvious.
Corollary: A one-to-one bicontinuous function possesses the same extrema as its argument. If the function is increasing; the maxima and minima correspond. Otherwise, if the function is decreasing; the maxima correspond to the minima and the minima correspond to the maxima.
Proof is obvious.
We had obtained some formulae for the channel rate R of a channel: R = -(1/2) ln(det(S) / (det(a) det(c)) = -(1/2) ln(det(InDIAG) = -(1/2) sum(ln(1 - rho^2)). Here, the symbols are S is the positive-definite symmetric matrix of the channel. It is partitioned as S = (a, b; btr, c). The rho's are the individual elements of the MDIAG. Pray, see the chapter on the factorization of a channel for the definitions of the InDIAG and MDIAG.
Theorem: To find the supremum (or maximum) of the channel rate R, it suffices to find the infimum (or minimum) of the determinant det(S).
Proof follows from the definition of the three-channels-in-cascade problem, the previous theorem, the determinant of a product, partition and conquer, and the channel rate formula.
Theorem: Supremum at identity. The supremum of the channel rate is attained (hence, it is a maximum) for the identity matrix among the permutation matrices.
Proof follows from the previous theorem and the inequality of logarithms.
|
example in two dimensions |
|
Lemma: The determinant of the inner-portion of the three-channels-in-cascade problem is an
extremum at alpha a multiple of pi and nowhere else.
Proof is comprised of the following steps:
Theorem: The over-all channel-rate of the three-channels-in-cascade problem attains its global maximum with the middle-channel being an identity matrix. Proof follows from the theorem on bicontinuous functions and that on permutations. |
Now, all that remains is the -- hardly trivial -- task of extending the foregoing result to higher dimmensions. The premutations we had extended by means of Mathematical Induction. I do not see any (easy) way of extending the ortho-normal matrices by Mathematical Induction. Even for the two-dimensional case, we began with the ortho-normal matrix, which does not scale. It would have been much more difficult had we begun with the Bt(alpha, u) function. Furthermore, the bonus -- the matrix bt(u) being ortho-normal -- is rarely true in higher than two-dimensional space. The foregoing lemma had a two-part conclusion: The positive statement -- extremum at alpha a multiple of pi -- and the negative statement -- nowhere else. The challenge is to extend each part to higher dimensions.
We proceed as in the foregoing example:
The ortho-normal matrix of the middle channel is given by the MacLauren series which was described in conjunction with the bilinear transformation.
Pre-multiply by the diagonal d1 of the first channel and post-multiply by the diagonal d3 of the third channel.
Take an identity matrix of twice the size of the resulting product matrix. Place the foregoing matrix at the right-upper corner. Place its transpose at the left-lower corner.
What do we have now? This is a positive-definite symmetric -- hence, of necessity, square -- matrix, which is partitioned in the middle into four equal-sized square matrices. The upper-left and lower-right quadrants are comprised of identity matrices. The lower-left quadrant consists of a matrix which is the transpose of that in the upper-right quadrant. All that remains to describe is this matrix in the upper-right quadrant. Its principal diagonal is the product of an arbitrary permutation of the d1 by another arbitrary permutation of the d3. The remaining (that is, off-diagonal) elements are those resulting from the first-degree (or higher) terms of the MacLauren series expansion of the aforementioned bilinear transformation. Hence, each of these elements contains a factor of alpha.
Compute the MacLauren series (You need only the zero and first degree terms.) of its determinant.
The constant term may be computed by evaluating the determinant with alpha taken as zero. It will become the value of this extremum.
From the definition of a determinant, each product that involves an element off the principal diagonal of either of the right-upper or left-lower quadrants must involve at least one other such element. Since each such element has a factor of alpha, the resulting product has alpha to the (at least) second power. Thus, there is no linear term in alpha in the MacLauren series of the determinant.
Observe that the coefficient of the first degree term is zero.
Hence, we have the ---
Lemma: With the middle-channel given by the orho-normal function Bt(alpha, u), there is an extremum of the overall channel-rate of the three-channels-in-cascade problem at alpha equal to zero.
Proof: Hint: Employ the McLauren series for the bilinear function. It is obvious that the MacLauren series for the determinant does not have a linear term in alpha. QED.
Theorem: Extremum for a permutation matrix. An extremum of the channel rate is attained (hence, it is either a maximum, minimum, inflexion point, or saddle-point) for a permutation matrix among the ortho-normal matrices.
Proof is obvious.
Theorem: A maximum at identity matrix. The over-all channel-rate of the three-channels-in-cascade problem attains its global maximum with the middle-channel being an identity matrix.
Proof follows from the preceding two theorems and the supremum at identity theorem.
Corollary: This maximum is - (1 / 2) sum of ln(1 - (d1i * d2i)^2).
Proof is obvious from the channel rate formula.
Conjecture: The foregoing is the global maximum among the ortho-normal (which, of course, include permutation) matrices. We explicitly exclude any coding.
[I believe that I had devised a proof in circa 1979. However, since I cannot reproduce it at this time, temporarily, I will leave this statement as a "conjecture".]
This conjecture is supported -- but, of course, not proved -- by extensive numerical calculations. Moreover, it is appealing intuitively. Anybody see a proof? Pray, let us discuss.
[I formulated the three-channels-in-cascade problem in 1960 or 1961. Solved it in the later 1970s. However, it
is only now (22 July 2004) that I have written it up in detail.]
Copyright (c) 2003, 4, 5 by R.I. ‘Scibor-Marchocki. Last revised Thursday 01-st December 2005..