Interpolation, Extrapolation, Curve Fitting, Differentiation, Integration, Solving of Differential Equations, Etc. would be more descriptive; but too long for a title.
At the end of the eleventh-grade of high school (in 1942), I wanted to have a general-purpose extrapolation formula, to explain some concepts related to the solution of the popular questions: Given the first several terms of an unknown sequence, what is the next term or what are the next several terms? And does the sequence converge? I had not been aware of any standard interpolation formula. Since I needed one, I just made up one of my own. My formula did solve the mathematical problem.
At that time, determinants and Kramer's rule were in vogue, so I employed what was at hand and expressed my formula in terms of determinants. With care, it is possible to evaluate such a seven-by-seven determinant, by hand, in one day. Furthermore, the determinant form, without expansion, lends itself very well to the proof of certain theorems. I employed it to good advantage, through graduate school; but to the complete bafflement of my professors.
Two decades later, I employed it for the computer solution of a system of ordinary differential equations, employing the Milne predictor-corrector method. A decade and a half later, I translated it into a matrix form and incorporated it into a program library we were constructing. And now, another decade and a half later, here it is.
I do not know if my formula is known in the literature. I never have seen it, so at best it is very obscure. Perhaps it is original with me. However, once you grasp the concept, it is so very obvious that one would think someone else might have obtained the same formula?
An intriguing aspect of either the determinant or the matrix formula is that while it is obvious by inspection, a derivation would be impractically difficult. Hence, the proof consists of the single word "Observe!".
0 = determinant of
y(x) f0(x) f1(x) f2(x) f3(x) ....
y(x0) f0(x0) f1(x0) f2(x0) f3(x0) ....
y(x1) f0(x1) f1(x1) f2(x1) f3(x1) ....
y(x2) f0(x2) f1(x2) f2(x2) f3(x2) ....
y(x3) f0(x3) f1(x3) f2(x3) f3(x3) ....
.... .... .... .... .... ....
In high school, I had written it with the fi(x) as the powers of x, x^i . However, even then I was aware that these elements of the determinant could be general functions. Proof: A determinant is zero whenever two of its rows are identical.
In matrix form, it is even easier to write. Let the matrix of the basis functions fj at the index points xi be
fj(xi)
with the i (row) vertical and the j (column) horizontal. Then the interpolation formula is, employing Einstein's summation convention,
y(x) = (y(xi)) transpose((fj(xi))^(-1)) (fj(x))
If these were written out, the first vector is a row vector, in the matrix the i (row) is vertical and the j (column) is horizontal, and the final vector is a column vector. There is no requirement that the points xi be equidistant. They need be neither monotonic nor increasing. But the points must be distinct, that is, we can interpolate only a function. Proof: We are multiplying the matrix fj(xi) by its inverse to obtain the identity matrix.
Now, we see that the requirement upon the basis-functions f's is that they must be linearly independent, that is, the matrix must be non-singular to be invertible. Instead of discrete, either i or j may be a continuum. Then we have a Banach (if real) or Hilbert (if complex) space.
It gets even better. Let x be an arbitrary function of u, say x = g(u), where we will select g(u) later on. Now, let z(u) be the composite function z(u) = y(g(u)). then we have
z(u) = (z(ui)) transpose((fj(xi))^(-1)) (fj(g(u))
In this form, the domain of the function z is U, while that of the basis f is X. Thus, we can reuse the computation of the matrix (fj(xi)) and its inversion. All that we need is an easy function x = g(u) which maps the index points ui to xi. Much can be accomplished with a linear, bilinear, exponential, logarithmic, trigonometric, inverses trigonometric, or the indefinite integral of some probability density distribution as this function. We may name the composite function hj(u) = fj(g(u)); many have accepted names. Furthermore, we may change the parameters of the function g(u) as often as we please, say dependent upon the neighbourhood of the specific point u.
Hence the product (which is a vector) of the transposed inverted matrix by each of the two basis vectors (the function and its integral) may be reused for the whole solution of a set of simultaneous ordinary differential equations, as long as the relative spacing of the index points remains fixed. Only the vector z(ui) depends upon the function being interpolated and the current reference point.
The functions fj need not, but may, be orthonormal. It that case, the inverse of the matrix (fj(xi)) is obtained easily as its transpose. We could employ the Gramm-Schmidt orthonormalization algorithm to make any arbitrary functions into an orthonormal set. However, there already are many well-known orthonormal functions. The sine and cosine functions give us the usual Fourier series. The complex exponential function gives us the LaPlace transform. There are a dozen or two common ones and the Batmann manuscript project describes some five-hundred other named orthonormal functions.
degree 0 x (0) matrix (1) inverse (1) fj (1) and its integral (x) + C. The derivative of fj is (0).
degree 1 x (-1, 1) matrix
| 1 | -1 |
| 1 | 1 |
inverse
| 1 | 1 |
| -1 | 1 |
divide by 2
fj (1, x) and its integral (x, x^2 / 2) + C. The derivative of fj is (0, 1). Gauge (1, 2).
degree 1 x (-1/2, 1/2) matrix
| 1 | -1/2 |
| 1 | 1/2 |
inverse
| 1 | 1 |
| -2 | 2 |
divide by 2
fj (1, x) and its integral (x, x^2 / 2) + C. The derivative of fj is (0, 1). Gauge (1, 1).
degree 2 x (-1, 0, 1)
matrix
| 1 | -1 | 1 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
inverse
| 0 | 2 | 0 |
| -1 | 0 | 1 |
| 1 | -2 | 1 |
divide by 2
fj (1, x, x^2) and its integral (x, x^2 / 2, x^3 / 3) + C. The derivative of fj is (0, 1, 2 x). Gauge (1, 2, 4).
degree 3 x (-3, -1, 1, 3)
matrix
| 1 | -3 | 9 | -27 |
| 1 | -1 | 1 | -1 |
| 1 | 1 | 1 | 1 |
| 1 | 3 | 9 | 27 |
inverse (courtesy of Macsyma)
| 3 | 27 | 27 | -3 |
| 1 | -27 | 27 | -1 |
| 3 | -3 | -3 | 3 |
| -1 | 3 | -3 | 1 |
divide by 48
fj (1, x, x^2, x^3) and its integral (x, x^2 / 2, x^3 / 3, x^4 / 4) + C. The derivative of fj is (0, 1, 2 x, 3 x^2). Gauge (1, 6, 36, 216).
degree 4 x (-2, -1, 0, 1, 2)
matrix
| 1 | -2 | 4 | -8 | 16 |
| 1 | -1 | 1 | -1 | 1 |
| 1 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 |
| 1 | 2 | 4 | 8 | 16 |
inverse (again, courtesy of Macsyma)
| 0 | 0 | 24 | 0 | 0 |
| 2 | -16 | 0 | 16 | -2 |
| -1 | 16 | -30 | 16 | -1 |
| -2 | 4 | 0 | -4 | 2 |
| 1 | -4 | 6 | -4 | 1 |
divide by 24
fj (1, x, x^2, x^3, x^4) and its integral (x, x^2 / 2, x^3 / 3, x^4 / 4, x^5 / 5) + C. The derivative of fj is (0, 1, 2 x, 3 x^2, 4 x^3). Gauge (1, 4, 16, 64, 256).
degree 5 x (-5, -3, -1, 1, 3, 5)
matrix
| 1 | -5 | 25 | -125 | 375 | -1875 |
| 1 | -3 | 9 | -27 | 81 | -243 |
| 1 | -1 | 1 | -1 | 1 | -1 |
| 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 3 | 9 | 27 | 81 | 243 |
| 1 | 5 | 25 | 125 | 375 | 1875 |
inverse (another courtesy of Macsyma)
| 540 | -2625 | 10125 | 10125 | -2625 | 540 |
| -108 | 875 | -10125 | 10125 | -875 | 108 |
| -600 | 2805 | -2205 | -2205 | -2805 | -600 |
| 120 | -935 | 2205 | -2205 | 935 | -120 |
| 60 | -180 | 120 | 120 | -180 | 60 |
| -12 | 60 | -120 | 120 | -60 | 12 |
divide by 16080
fj (1, x, x^2, x^3, x^4, x^5) and its integral (x, x^2 / 2, x^3 / 3, x^4 / 4 x^5 / 5, x^6 / 6) + C. The derivative of fj is (0, 1, 2 x, 3 x^2, 4 x^3, 5 x^4). Gauge (1m 10, 100, 1000, 10000, 100000).
degree 6 x (-3, -2, -1, 0, 1, 2, 3)
matrix
| 1 | -3 | 9 | -27 | 81 | -243 | 729 |
| 1 | -2 | 4 | -8 | 16 | -32 | 64 |
| 1 | -1 | 1 | -1 | 1 | -1 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 2 | 4 | 8 | 16 | 32 | 64 |
| 1 | 3 | 9 | 27 | 81 | 243 | 729 |
inverse (once more, courtesy of Macsyma)
| 0 | 0 | 0 | 720 | 0 | 0 | 0 |
| -12 | 108 | -540 | 0 | 540 | -108 | 12 |
| 4 | -54 | 540 | -980 | 540 | -54 | 4 |
| 15 | -120 | 195 | 0 | -195 | 120 | -15 |
| -5 | 60 | -195 | 280 | -195 | 60 | -5 |
| -3 | 12 | -15 | 0 | 15 | -12 | 3 |
| 1 | -6 | 15 | -20 | 15 | -6 | 1 |
divide by 720
fj (1, x, x^2, x^3, x^4, x^5. x^6) and its integral (x, x^2 / 2, x^3 / 3, x^4 / 4 x^5 / 5, x^6 / 6. x^7 / 7) + C. The derivative of fj is (0, 1, 2 x, 3 x^2, 4 x^3, 5 x^4, 6 x^5). Gauge (1, 6, 36, 216, 1296, 7776, 46656).
In each case, do not forget to transpose the listed inverse, or else transpose the whole right-hand side of the formula.
A few times in the past, before the wide-spread availability of computers, I had worked out, by hand, the pairs of determinants for the function and its first integral, through the sixth degree, This time, however, in my old age, I have become spoiled by the availability of computers. Each of the larger matrices (as indicated), I inverted in the Macsyma program.. The same matrix serves for the function and its integral or derivative; only the variable post-multiplying vector has to be integrated or differentiated. Anything higher than the sixth degree is not practical to employ by hand; hence, there is no reason to display the higher degree matrices or their inverses.
The ill-conditioning ratio is defined as the ratio of the largest (in magnitude) to the smallest (in magnitude) non-zero eigenvalues. A useful, easily-computed, approximation is the corresponding ratio of the elements of a matrix.
These fj(xi) matrices, with the set of (xi) which we have employed, are ill-conditioned. Thus, not only have we reached the limits of feasible hand calculation; but, we are nearing those of double-precision floating-point calculation, as well. Instead, as a better choice, the (xi) should be taken in the interval [- 1 / 2, 1 / 2]. The resulting fj(xi) matrices will be reasonably well-conditioned; but entirely unsuitable to hand-calculation, even for low degrees of the polynomial. We have demonstrated such a matrix for the first degree. A guage transformation of the other matrices may be performed by multiplying by the powers of the scale-factor. The diagonal-matrix guage multiplier is shown for each inverse matrix.
Even a unit interval does not yield a well-conditioned matrix at higher degrees. An actual computation of the ill-conditioning ratio and adaptive scaling of the interval will help, somewhat. Probably somewhere in the twenties is the highest degree feasible for double-precision floating-point calculation. Advice: Do not push it; settle for the fourteenth degree, as a practical maximum. If you go higher, you will be fighting round-off stability.
Undoubtedly, textbooks of Banach or Hilbert space present the first of the foregoing matrix formulae; but, if so, the concept has not filtered down to common usage as a very practical method of interpolation.
There are several other polynomial interpolation algorithms. The best knows is the LaGrange method. It is very computationally intensive and the use of the mapping from the U to the X spaces is not mentioned in the usual exposition. Another method of finding the interpolating polynomial is by means of linear regression. Indeed, the symbolic mathematics program Derive can fit a user-supplied finite set of functions as a basis. Again there is no mention of the U to X mapping.
It is obvious that the interpolating polynomial (or coefficient vector, in the case of an arbitrary basis) is unique, hence the choice of the algorithm for its computation is only a matter of convenience or speed of computation. But, my algorithm (in its second matrix form) provides for the greatest reusability.
By far, the most popular polynomial interpolation formula is that of LaGrange. We present it for comparison with my formulae.
Let both i and j represent the index set of the given points, at which the values of y(xi) are given,
Consider the quotient (x - xj) / (xi - xj), where j does not equal i, It is linear in x. It has a zero at x = xj. And it is one at x x = xi. Next, consider the product of these quotients over all j not equal to i. This product will be zero for x equal to each xj and one for x equal to xi. Multiply this product by y(xi). Now, we have a polynomial of degree one less than the cardinality of the set of given points. This polynomial passes through the one point at xi and is zero at each of the other points.
Finally, sum these products over the set of index points i. We have obtained an interpolating polynomial of degree one less than the cardinality of the set of given points, which passes through each of them. As already mentioned, since the interpolating polynomial is unique, it must be the same polynomial as obtained by either of my formulae. However, the inverted matrix and the basis vector are commingled. Furthermore, the LaGrange formula does not lend itself to a generalized basis.
Thus we see that what I had developed in high school is a generalization of the orthonormal transform to any arbitrary basis. No wonder my professors were mystified! Remember that there is no derivation available. The proof consists of "Observe!". You either see it or you don't. :-)
If anybody knows of any literature regarding this formula or its applications, please inform me. If anybody decides to employ this formula himself, I would like to know, also, please. This is my first and, so far at least, only publication of this formula. Please acknowledge any use of it that you might make. Thanks.
Copyright ©1997 R. I. 'Scibor-Marchocki last modified on Sunday 03-rd August 1997.