and
Definition: The Rayleigh probability density distribution p(X) is defined as p(x) = x exp(-x^2 / 2) on X = [0, oo).
Theorem: Its integral is the cumulative probability distribution P(X): P(x) = 1 - exp(-x^2 / 2). Evaluated over all of X, it is 1.
Proof: int(t exp(-t^2 / 2) dt on [0, x)) = [- exp(-t^2 / 2)] from t = 0 to t = x = 1 - exp(-x^2 / 2). Then, the limit, as x increases without bound, is one. QED.
Corollary: The inverse cumulative probability distribution is x = sqrt(-2 ln(1 - P(x))).
Proof is obvious.
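The corollary may be used to draw Rayleigh-distributed random values from uniform random numbers on [0, 1). What follows is only a minimal sketch in Python; the names rayleigh_inverse_cdf and rayleigh_sample, the seed, and the sample count are our own illustrative choices, not part of the theory.

```python
import math
import random

def rayleigh_inverse_cdf(P):
    """Return x such that 1 - exp(-x^2 / 2) = P, for P in [0, 1)."""
    return math.sqrt(-2.0 * math.log(1.0 - P))

def rayleigh_sample(rng):
    """Draw one Rayleigh-distributed value from a uniform number on [0, 1)."""
    return rayleigh_inverse_cdf(rng.random())

rng = random.Random(12345)  # arbitrary fixed seed, so that the run is repeatable
xs = [rayleigh_sample(rng) for _ in range(100000)]

# For this p(x), the average value is sqrt(pi / 2); the two numbers should be close.
print(sum(xs) / len(xs), math.sqrt(math.pi / 2.0))
```

The argument of the logarithm is 1 - P, which stays positive because the uniform numbers never reach 1.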
Definition: The Gaussian probability density distribution p(X) is defined as p(x) = (1 / sqrt(2 pi)) exp(-x^2 / 2) on X = (-oo, oo).
Theorem: Its integral over all of X is 1.
Proof: Let J = int(p(x) dx on (-oo, oo)). Then J^2 = int(p(x) dx on (-oo, oo)) int(p(y) dy on (-oo, oo)) = (1 / (2 pi)) int(int(exp(-(x^2 + y^2) / 2) dy dx) on (-oo, oo)^2).
Change to polar coordinates, with x^2 + y^2 = r^2 and dy dx = r dr d-theta. J^2 = (1 / (2 pi)) int(int(exp(-r^2 / 2) r dr on [0, oo)) d-theta on [0, 2 pi)) = (1 / (2 pi)) int(1 d-theta on [0, 2 pi)) int(r exp(-r^2 / 2) dr on [0, oo)) = (1 / (2 pi)) (2 pi) (1) = 1. Since J is positive, J = 1. QED.
Theorem: Let (u, v) be a pair of independent random numbers, uniformly distributed on [0, 1)^2. Then the pair of coordinates (x, y) on (-oo, oo)^2 of a random Gaussian event is (x, y) = (r cos(theta), r sin(theta)), where theta = 2 pi u and r = sqrt(-2 ln(1 - v)).
Proof is obvious.
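This construction is commonly known as the Box-Muller transform. A minimal sketch in Python follows, assuming that the standard-library generator supplies the uniform pair (u, v); the name gaussian_pair, the seed, and the sample count are illustrative only.

```python
import math
import random

def gaussian_pair(u, v):
    """Map a uniform pair (u, v) on [0, 1)^2 to a Gaussian pair (x, y)."""
    theta = 2.0 * math.pi * u
    r = math.sqrt(-2.0 * math.log(1.0 - v))  # the Rayleigh inverse cumulative distribution
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(2004)  # arbitrary fixed seed, so that the run is repeatable
samples = []
for _ in range(50000):
    x, y = gaussian_pair(rng.random(), rng.random())
    samples.extend((x, y))

n = len(samples)
mean = sum(samples) / n
variance = sum((s - mean) ** 2 for s in samples) / n
print(mean, variance)  # should be near 0 and near 1, respectively
```

Each call consumes two uniform numbers and yields two Gaussian values, so none of the randomness is wasted.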
Theorem: The average-value xbar of the Gaussian distribution is zero and its standard-deviation sigma is one.
Proof: xbar = ave(x) = int(x p(x) dx over X) = int(x (1 / sqrt(2 pi)) exp(-x^2 / 2) dx) = [-(1 / sqrt(2 pi)) exp(-x^2 / 2)] from -oo to oo = 0.
sigma^2 = int((x - xbar)^2 p(x) dx over X) = int(x^2 (1 / sqrt(2 pi)) exp(-x^2 / 2) dx).
Do integration by parts. Take u = x and dv = x exp(-x^2 / 2) dx, so that v = -exp(-x^2 / 2). Then, the integral = (1 / sqrt(2 pi)) ([-x exp(-x^2 / 2)] from -oo to oo + int(exp(-x^2 / 2) dx)) = (1 / sqrt(2 pi)) (0 + sqrt(2 pi)) = 1. QED.
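The three integrals (the total probability, xbar, and sigma^2) may also be checked numerically. The sketch below uses a plain midpoint rule on the truncated interval [-10, 10]; the truncation, the step count, and the names p and integrate are our own assumptions, chosen only for illustration.

```python
import math

def p(x):
    """The Gaussian probability density with xbar = 0 and sigma = 1."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def integrate(f, lo, hi, steps):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

# The tails beyond |x| = 10 are negligible, so the truncation is harmless here.
total = integrate(p, -10.0, 10.0, 20000)
xbar = integrate(lambda x: x * p(x), -10.0, 10.0, 20000)
var = integrate(lambda x: x * x * p(x), -10.0, 10.0, 20000)
print(total, xbar, var)  # should be near 1, near 0, and near 1
```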
The general formula for the Gaussian-probability density distribution is p(x) = (1 / (sqrt(2 pi) sigma)) exp(-((x - xbar) / sigma)^2 / 2).
For an n-dimensional x, it becomes p(x) = (1 / (sqrt(2 pi))^n) (1 / sqrt(det(S))) exp(-(x - xbar) Sinv (x - xbar)tr / 2), where S is a positive-definite symmetric-matrix, Sinv is its inverse, and x is a horizontal-vector.
Definition: The information-rate H(X) is defined as H(X) = -int(p(x) ln(p(x)) dx) on X. [The ramifications of Information Theory do not concern us, here.]
Theorem: For the Gaussian-probability distribution, the information rate is H(X) = (1 / 2) ln(2 pi) + ln(sigma) + 1 / 2.
Proof: Take xbar = 0, which does not change the integral. Then -ln(p(x)) = (1 / 2) ln(2 pi) + ln(sigma) + (x / sigma)^2 / 2. Hence H(X) = int((1 / (sqrt(2 pi) sigma)) exp(-(x / sigma)^2 / 2) ((1 / 2) ln(2 pi) + ln(sigma) + (x / sigma)^2 / 2) dx) = (1 / 2) ln(2 pi) + ln(sigma) + 1 / 2, since int(p(x) dx) = 1 and int(x^2 p(x) dx) = sigma^2. QED.
For an n-dimensional x, it becomes H(X) = (n / 2) ln(2 pi) + (1 / 2) ln(det(S)) + n / 2.
Definition: The channel-rate R(Y, X) is defined as R(Y, X) = (H(Y) + H(X)) - H(Y, X).
Theorem: For the Gaussian-probability distribution, the channel-rate is R(Y, X) = -(1 / 2) ln(1 - rho^2).
Proof: Let S = (Var-xx, Var-xy; Var-yx, Var-yy). Then, H(X) = (1 / 2) ln(2 pi) + (1 / 2) ln(Var-xx) + 1 / 2. H(Y, X) = (2 / 2) ln(2 pi) + (1 / 2) ln(Var-yy Var-xx - Var-yx Var-xy) + 2 / 2. H(Y) = (1 / 2) ln(2 pi) + (1 / 2) ln(Var-yy) + 1 / 2.
R(Y, X) = H(Y) + H(X) - H(Y, X) = (1 / 2) ln(Var-yy Var-xx) - (1 / 2) ln(Var-yy Var-xx - Var-yx Var-xy) = -(1 / 2) ln(1 - rho^2), where rho^2 = Var-yx Var-xy / (Var-yy Var-xx); the constant terms cancel. QED.
We had defined Theta as cos(Theta) = rho. Hence, sin(Theta) = exp(-R(Y, X)).
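As a worked illustration, the channel-rate of a 2 by 2 channel may be computed in two ways: directly from rho, and from the combination H(Y) + H(X) - H(Y, X), in which every term that does not depend on S cancels. The sketch below is hedged accordingly; the function names and the sample variances are hypothetical choices of ours.

```python
import math

def channel_rate_from_rho(var_xx, var_xy, var_yy):
    """R(Y, X) = -(1 / 2) ln(1 - rho^2), with rho^2 = Var-xy^2 / (Var-xx Var-yy)."""
    rho_squared = var_xy * var_xy / (var_xx * var_yy)
    return -0.5 * math.log(1.0 - rho_squared)

def channel_rate_from_entropies(var_xx, var_xy, var_yy):
    """H(Y) + H(X) - H(Y, X), keeping only the terms that depend on S."""
    det_s = var_yy * var_xx - var_xy * var_xy
    return 0.5 * math.log(var_yy) + 0.5 * math.log(var_xx) - 0.5 * math.log(det_s)

# Hypothetical sample values: rho = 1.2 / sqrt(4 * 1) = 0.6.
rate = channel_rate_from_rho(4.0, 1.2, 1.0)
rate_check = channel_rate_from_entropies(4.0, 1.2, 1.0)
sin_theta = math.exp(-rate)  # sin(Theta), where cos(Theta) = rho
print(rate, rate_check, sin_theta)
```

With rho = 0.6, the value of sin(Theta) is 0.8, as the relation cos(Theta)^2 + sin(Theta)^2 = 1 requires.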
For the n-dimensional x and n-dimensional y, it becomes R(Y, X) = -(1 / 2) ln(det(InDIAG)), where InDIAG is the middle-factor of the large-factor factorization of a channel S. We already had (Pray, see the proof leading up to the formula for the determinant of the channel.) the formula for the determinant of the InDIAG as det(InDIAG) = product of (1 - di^2), where di are the elements of the MDIAG. Substitution yields R(Y, X) = -(1 / 2) sum of ln(1 - di^2).
From the relations among the determinants, it is obvious that this channel-rate may be evaluated without the necessity of factoring the channel. It is R(Y, X) = -(1 / 2) (ln(det(S)) - (ln(det(a)) + ln(det(c)))), where it will be recalled that the matrix S is partitioned as S = (a, b; btr, c). Furthermore, since each of the three matrices S, a, and c is positive-definite symmetric, the channel rate R becomes R(Y, X) = -(1 / 2) (trace(ln(S)) - (trace(ln(a)) + trace(ln(c)))), where we have employed the theorem ln(det(M)) = trace(ln(M)).
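The determinant form of the channel-rate may be sketched in Python as follows. We assume, purely for the sake of a self-contained example, that the logarithm of each determinant is obtained from a Cholesky factorization; the names log_det_spd and channel_rate and the sample 4 by 4 matrix are hypothetical.

```python
import math

def log_det_spd(m):
    """ln(det(m)) of a positive-definite symmetric matrix, via Cholesky factors."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                low[i][j] = math.sqrt(s)
                log_det += 2.0 * math.log(low[i][j])
            else:
                low[i][j] = s / low[j][j]
    return log_det

def channel_rate(S, n):
    """R(Y, X) = -(1 / 2) (ln(det(S)) - (ln(det(a)) + ln(det(c)))), S being 2n by 2n."""
    a = [row[:n] for row in S[:n]]  # upper-left block
    c = [row[n:] for row in S[n:]]  # lower-right block
    return -0.5 * (log_det_spd(S) - (log_det_spd(a) + log_det_spd(c)))

# Hypothetical channel: x1 is correlated only with y1, and x2 only with y2.
S = [[2.0, 0.0, 1.0, 0.0],
     [0.0, 2.0, 0.0, 0.5],
     [1.0, 0.0, 2.0, 0.0],
     [0.0, 0.5, 0.0, 2.0]]
rate = channel_rate(S, 2)
print(rate)
```

In this sample, the two pairs are decoupled, with rho^2 = 1 / 4 and rho^2 = 1 / 16, so the result agrees with the sum of the two 2 by 2 channel-rates, -(1 / 2) (ln(3 / 4) + ln(15 / 16)).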
Then, we may generalize the Theta and rho, by calculating them from the channel-rate R. The average-Theta is given by sin(average-Theta) = exp(-R(Y, X) / n), where the matrix S is of size 2n by 2n. The effective-Theta is given by sin(effective-Theta) = exp(-R(Y, X)). Also, (average-rho)^2 = 1 - exp(-2 R(Y, X) / n) and (effective-rho)^2 = 1 - exp(-2 R(Y, X)).
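These four quantities follow directly from R and n. A small sketch, in which the name generalized_theta_rho and the sample rate (that of a hypothetical 4 by 4 channel having det(S) / (det(a) det(c)) = 0.703125) are our own assumptions:

```python
import math

def generalized_theta_rho(rate, n):
    """Average and effective Theta and rho^2, from the channel-rate of a 2n by 2n S."""
    return {
        "average_theta": math.asin(math.exp(-rate / n)),
        "effective_theta": math.asin(math.exp(-rate)),
        "average_rho_squared": 1.0 - math.exp(-2.0 * rate / n),
        "effective_rho_squared": 1.0 - math.exp(-2.0 * rate),
    }

rate = -0.5 * math.log(0.703125)  # hypothetical channel-rate, with n = 2
values = generalized_theta_rho(rate, 2)
for name in sorted(values):
    print(name, values[name])
```

For n = 1, the average and the effective quantities coincide, and they reduce to the Theta and rho of the 2 by 2 channel.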
Lemma: Let the 2x2 positive-definite symmetric matrix S = (a, b; b, c) define a Gaussian probability density distribution. Then the two marginal probability density distributions are defined by the matrices (a) and (c), respectively.
Proof by Linear Algebra is obvious.
Proof by Calculus: The inverse of S is (c, -b; -b, a) / (a c - b^2). Pre-multiply by the vector V = (x, y) and post-multiply by its transpose, to obtain the quadratic expression V (1 / S) Vtr = (c x^2 - 2 b x y + a y^2) / (a c - b^2). From here on, we consider only the computation of the first marginal probability; the second one is analogous. Complete the square in y, to obtain the equivalent expression ((c - b^2 / a) x^2 + a (y - b x / a)^2) / (a c - b^2). Let u = y - b x / a. Substitute, to obtain the expression ((c - b^2 / a) x^2 + a u^2) / (a c - b^2) = x^2 / a + a u^2 / (a c - b^2). Integrate the density with respect to u; the term in u^2 contributes only a constant factor. The expression becomes x^2 / a. Consider the matrix, which is (1 / a). Its inverse is (a). QED. We have omitted many of the details of the computation. Now, are you not glad that you are studying Linear Algebra, rather than Calculus? :-)
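The lemma may be checked numerically: integrate the two-dimensional density over y and compare the result with the one-dimensional Gaussian density whose variance is a. The sketch below assumes a midpoint rule on a truncated interval; the function names, the interval, and the sample values of a, b, and c are hypothetical.

```python
import math

def joint_density(x, y, a, b, c):
    """Two-dimensional Gaussian density defined by S = (a, b; b, c), with xbar = 0."""
    det = a * c - b * b
    q = (c * x * x - 2.0 * b * x * y + a * y * y) / det  # V (1 / S) Vtr
    return math.exp(-q / 2.0) / (2.0 * math.pi * math.sqrt(det))

def marginal_by_integration(x, a, b, c, half_width=12.0, steps=4000):
    """Midpoint-rule integral of the joint density over y on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    return h * sum(
        joint_density(x, -half_width + (k + 0.5) * h, a, b, c)
        for k in range(steps)
    )

def gaussian_density(x, variance):
    """One-dimensional Gaussian density with xbar = 0 and the given variance."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

a, b, c = 2.0, 0.6, 1.0  # hypothetical positive-definite S
for x in (0.0, 1.5, -2.0):
    print(x, marginal_by_integration(x, a, b, c), gaussian_density(x, a))
```

The two printed densities should agree at each x, which is what the lemma asserts for the first marginal.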
Theorem: Ditto for an n-by-n partitioned matrix S = (A, B; Btr, C).
Proof by Linear Algebra: Follows immediately, from the recursive definition of a matrix, by the application of Mathematical Induction.
Proof by Calculus -- I do not even want to think of it!
Copyright (c) 2003, 4 by R.I. ‘Scibor-Marchocki. Last revised Thursday 11-th November 2004.