Given a set of points, the purpose of linear regression is to pass a surface, which those points over-determine, as close as possible (in
some sense) to the points.
First we have to select a metric, both for the distance between a pair of points and for the aggregate over a set of such pairs. There are
innumerable possible choices, of which several are popular. We are going to adopt the possibly-weighted Lebesgue-Stieltjes integral of the
square of the difference of the dependent variables:
LS-integral over the domain of (y - f(x))^2 p(x), with respect to x,
where p(x), x in X, is a probability density, abbreviated p(X), and where f(x) is the function to be fitted to the given function y = y(x),
x in X, abbreviated y(X).
For the purpose of this exercise, we are going to restrict ourselves to the one-dimensional case.
We write the probability density p(x), x in X, as p(X). Its Lebesgue-Stieltjes integral over X is, by definition, one:
1 = LS-integral over X of p(x), with respect to x
The Greek letter mu, which we write as u, is the mean of x over X. Likewise, yo is the mean of y over X. The subscript of the capital I indicates
the variable of the moment. The second (and higher) order moments are taken about the center of mass. The second-order moments are called
"variance" by statisticians and "moment of inertia" by physicists. The third-order and fourth-order moments, when normalized by the
corresponding power of the standard deviation, are called "skewness" and "kurtosis", respectively. The Greek letter sigma, which we write as s,
is the standard deviation of the variable indicated in its subscript, which we write as a suffix. The Greek letter rho is the correlation
coefficient. The subscript ee stands for expected error.
u = Ix = LS-integral over X of x p(x), with respect to x
yo = Iy = LS-integral over X of y p(x), with respect to x
sx^2 = Ix2 = LS-integral over X of (x - u)^2 p(x), with respect to x
sy^2 = Iy2 = LS-integral over X of (y - yo)^2 p(x), with respect to x
Ixy = LS-integral over X of (x - u)(y - yo) p(x), with respect to x
rho = Ixy / (sx sy)
see^2 = Iee = LS-integral over X of (yp - y)^2 p(x), with respect to x
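For a finite set of points the Lebesgue-Stieltjes integrals above reduce to weighted sums. The following is a minimal sketch of that special case, not a general integrator. The function name weighted_moments, the optional weights ps (standing in for p(x)), and the sample data are our own assumptions, chosen only for illustration.

```python
import math

def weighted_moments(xs, ys, ps=None):
    """Return (u, yo, Ix2, Iy2, Ixy) for a finite set of points.

    ps are non-negative weights playing the role of p(x); they are
    normalized to sum to one. With ps=None every point gets equal weight.
    """
    n = len(xs)
    if ps is None:
        ps = [1.0] * n
    total = sum(ps)
    ps = [p / total for p in ps]  # enforce: integral of p(x) over X is one
    u = sum(p * x for p, x in zip(ps, xs))    # mean of x
    yo = sum(p * y for p, y in zip(ps, ys))   # mean of y
    # Second-order moments, taken about the center of mass (u, yo).
    Ix2 = sum(p * (x - u) ** 2 for p, x in zip(ps, xs))
    Iy2 = sum(p * (y - yo) ** 2 for p, y in zip(ps, ys))
    Ixy = sum(p * (x - u) * (y - yo) for p, x, y in zip(ps, xs, ys))
    return u, yo, Ix2, Iy2, Ixy

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

u, yo, Ix2, Iy2, Ixy = weighted_moments(xs, ys)
sx, sy = math.sqrt(Ix2), math.sqrt(Iy2)
rho = Ixy / (sx * sy)  # correlation coefficient
print(u, yo, Ix2, Iy2, Ixy, rho)
```

Passing explicit weights, for example ps=[1, 1, 2, 1, 1], gives the weighted version of the same moments.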
We observe that each of these two Lebesgue-Stieltjes integrals is zero.
0 = LS-integral over X of (x - u) p(x), with respect to x
0 = LS-integral over X of (y - yo) p(x), with respect to x
The straight line (y predicted, abbreviated as yp), which we want to fit to the set of points by what is known as the method of least
squares, is
yp = m (x - u) + b + yo
We adopt the Euclidean metric for the vertical distance between a point and the line. Then, we integrate its square, weighted by p(x), over X.
Thus, we want to minimize the function
w = Iee = LS-integral over X of (yp - y)^2 p(x), with respect to x = LS-integral over X of [m (x - u) + b - (y - yo)]^2 p(x), with respect to x
To this end, employing "differentiation under the integral sign" (the Leibniz integral rule; the theorem is still to be quoted and a reference
cited), we find the partial derivatives of w with respect to m and b, set each equal to zero, and solve for b and m.
0 = dw/dm = 2 LS-integral over X of [m (x - u) + b - (y - yo)] (x - u) p(x), with respect to x = 2 (m Ix2 - Ixy)
0 = dw/db = 2 LS-integral over X of [m (x - u) + b - (y - yo)] p(x), with respect to x = 2 b
Thus
m = Ixy / Ix2 = rho sy / sx
b = 0
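As a numerical sanity check on this solution, we can evaluate w at the computed (m, b) and at nearby values. This is only a sketch for a finite set of equally weighted points. The helper names fit_line and w, the step size 0.1, and the sample data are our own assumptions.

```python
def fit_line(xs, ys):
    """Return (m, b, u, yo) with m = Ixy / Ix2 and b = 0, equal weights."""
    n = len(xs)
    u = sum(xs) / n
    yo = sum(ys) / n
    Ix2 = sum((x - u) ** 2 for x in xs) / n
    Ixy = sum((x - u) * (y - yo) for x, y in zip(xs, ys)) / n
    return Ixy / Ix2, 0.0, u, yo

def w(m, b, xs, ys, u, yo):
    """Mean squared vertical distance for the line yp = m (x - u) + b + yo."""
    n = len(xs)
    return sum((m * (x - u) + b - (y - yo)) ** 2 for x, y in zip(xs, ys)) / n

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

m, b, u, yo = fit_line(xs, ys)
w_min = w(m, b, xs, ys, u, yo)

# Perturb m and b; every neighbouring value of w should exceed w_min.
steps = (-0.1, 0.0, 0.1)
neighbours = [
    w(m + dm, b + db, xs, ys, u, yo)
    for dm in steps
    for db in steps
    if (dm, db) != (0.0, 0.0)
]
print(m, b, w_min, min(neighbours) > w_min)
```

A check on a grid of neighbours does not prove minimality, but w is a quadratic in m and b, so a stationary point with larger values all around it is consistent with the minimum found analytically.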
Substitution back into w yields
see^2 = Iee = w = m^2 Ix2 + b^2 + Iy2 - 2 m Ixy = Iy2 - Ixy^2 / Ix2 = (Ix2 Iy2 - Ixy^2) / Ix2 = sy^2 (1 - rho^2)
Substitution back into the equation for the straight-line yields the regression of y upon x
yp = rho (sy / sx) (x - u) + yo
which, alternatively, may be written as
(yp - yo) / sy = rho ((x - u) / sx)
The expected error is
see^2 = Iee = Iy2 - Ixy^2 / Ix2 = sy^2 (1 - rho^2)
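The expected error can be cross-checked numerically by computing it three ways: directly from the residuals of the fitted line, from the closed form Iy2 - Ixy^2 / Ix2, and from sy^2 (1 - rho^2). The sketch below assumes a finite set of equally weighted points. The variable names and the sample data are our own, chosen only for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

u = sum(xs) / n
yo = sum(ys) / n
Ix2 = sum((x - u) ** 2 for x in xs) / n
Iy2 = sum((y - yo) ** 2 for y in ys) / n
Ixy = sum((x - u) * (y - yo) for x, y in zip(xs, ys)) / n

m = Ixy / Ix2                   # slope of the regression of y upon x
rho2 = Ixy ** 2 / (Ix2 * Iy2)   # square of the correlation coefficient

# 1. Directly from the residuals yp - y of the fitted line.
direct = sum((m * (x - u) + yo - y) ** 2 for x, y in zip(xs, ys)) / n
# 2. Closed form in terms of the second-order moments.
closed = Iy2 - Ixy ** 2 / Ix2
# 3. In terms of the variance of y and the correlation coefficient.
via_rho = Iy2 * (1.0 - rho2)

print(direct, closed, via_rho)
```

The three values should agree up to floating-point rounding.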
Observe that, when written in terms of the normalized coordinates, the line passes through the origin and has a slope equal to the correlation
coefficient rho. Had we asked for the regression of x upon y, we would have obtained
(xp - u) / sx = rho ((y - yo) / sy)
These are not the same straight lines.
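To see the difference between the two regressions concretely, we can express both lines as slopes dy/dx in the original coordinates. This sketch assumes equally weighted points with Ixy different from zero. The variable names and the sample data are our own, chosen only for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

u = sum(xs) / n
yo = sum(ys) / n
Ix2 = sum((x - u) ** 2 for x in xs) / n
Iy2 = sum((y - yo) ** 2 for y in ys) / n
Ixy = sum((x - u) * (y - yo) for x, y in zip(xs, ys)) / n
rho2 = Ixy ** 2 / (Ix2 * Iy2)

# Regression of y upon x: yp - yo = (Ixy / Ix2) (x - u), so dy/dx is
dydx_y_on_x = Ixy / Ix2
# Regression of x upon y: xp - u = (Ixy / Iy2) (y - yo), so dx/dy is
dxdy_x_on_y = Ixy / Iy2
# The same line rewritten as dy/dx (requires Ixy != 0).
dydx_x_on_y = Iy2 / Ixy

print(dydx_y_on_x, dydx_x_on_y)
# The product of the two regression coefficients equals rho^2.
print(dydx_y_on_x * dxdy_x_on_y, rho2)
```

Both lines pass through (u, yo). Their slopes coincide only when rho^2 = 1, that is, when the points already lie on a straight line.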
If the cardinality n of the set of given points is finite, the unbiased expected error is
unbiased-see^2 = unbiased-Iee = Iee n / (n - 2)
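As a small illustration of this correction, the sketch below computes Iee for a finite set of equally weighted points and rescales it by n / (n - 2), which is the same as dividing the sum of squared residuals by n - 2. The variable names and the sample data are our own assumptions, and n must exceed 2.

```python
# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

u = sum(xs) / n
yo = sum(ys) / n
Ix2 = sum((x - u) ** 2 for x in xs) / n
Ixy = sum((x - u) * (y - yo) for x, y in zip(xs, ys)) / n
m = Ixy / Ix2

# Sum of squared residuals about the fitted line yp = m (x - u) + yo.
ss_res = sum((m * (x - u) + yo - y) ** 2 for x, y in zip(xs, ys))

Iee = ss_res / n                  # expected error with weights 1/n
unbiased_Iee = Iee * n / (n - 2)  # two fitted parameters: slope and intercept

print(Iee, unbiased_Iee, ss_res / (n - 2))
```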
And it is this unbiased expected error that should be employed in the Student-t probability distribution.
References:
Statistics: A First Course, Donald H. Sanders, McGraw-Hill, Inc., 1995 (newer editions may be available). ISBN 0-07-054900-1
Introduction to Mathematical Statistics, Robert V. Hogg and Allen T. Craig, fifth edition, 1994. ISBN 0023557222
The generalization to multiple dimensions would promote the second-order moments Ix2, Iy2, and Ixy to matrices. The parity of the rank of
these tensors is invariant under the generalization, as one would expect. Since this generalization is a profound part of Linear Algebra, we
place the one-dimensional special case within College Algebra, rather than Calculus.
Copyright 2000 by R. I. 'Scibor-Marchocki
Last modified on Wednesday 16-th February 2000
Webmaster@rism.com