Linear Algebra Part One

copyright (c) 1999 by R. I. 'Scibor-Marchocki

 

This is a collections of e-mail letters, which I wrote to my very good friend Deb, as a set of lessons in Linear Algebra. I have deleted each salutation, to protect her privacy. If anybody shows interest, someday, I might edit these lessons into a better organized tutorial. So, please write me at webmaster@rism.com.

 


linear algebra

I had read Shannon's "A Mathematical Theory of Information" in about 1955. And I have been interested and doing original research in Information Theory ever since. At least in as much as the availability of time, money, and resources and the interest of my various employers permitted.

Information Theory reaches into most other branches of Mathematics. In particular, the practical computational portions tend to be computer programs related to Linear Algebra.

Of course, Linear Algebra has its beginning in China some 5000 years ago. Already then, they had developed the Gaussian Elimination algorithm for the inversion of a matrix. Gauss merely popularized this algorithm in Europe. Many great mathematicians have contributed to Linear Algebra. Thus, one has to retrace a long path, before one arrives at the frontier.

During the latter 1970's, I had a Texas Instruments TI-49 calculator. By today's standards, it was a very small calculator. However, it was *programmable*. On it, I coded some original algorithms of Linear Algebra.

On the basis of my demonstration of those programs, I had been retained by Clyde Sparks (Vice President) of Delphi Data, in April 1980, to develop a library of Linear Algebra on their computers. I did so for the following year and a half.

As I remember, the library consisted of some 700 routines. It was written in a version of FORTRAN IV on the Data General computers. Those computers have not been available since the latter 1980's.

So, that library is not accessible to any modern computer.

 

Today Tuesday 01-st June 1999, I am beginning the process of translating that library to the C language. Actually, most of the routines, I can write from scratch faster and easier than attempting to copy from the old listings. After all, I *know* Linear Algebra. And I have developed some interesting original algorithms.

However, it took 18 months in 1980-1981. It will take something comparable now. And it does not get interesting to any on-looker until many of the fundamentals have been coded. Thus, there is little of interest to say about this project for the present.

 

Tomorrow morning, at 1000 local time, I have an appointment at the Kaiser (Health Maintenance Organization). I want to receive a vaccine for Lyme disease. Also for Streptococcal pneumonia. We will see if the doctor they appointed to me will consent to vaccinate me.


inverse of a matrix

Here is a brand-new algorithm for inverting a matrix. I just invented it this afternoon, 1500 Pacific Daylight Savings Time Thursday 03-rd June 1999. Copyright (c) 1999 by R. I. 'Scibor-Marchocki.

It is as fast at the Gaussian elimination algorithm. Furthermore, it requires only one division.

First a review. A matrix over a division ring is defined as a rectangular array of elements drawn from that ring; with the usual definitions of matrix addition, matrix multiplication, and multiplication of a matrix by a scalar. It is very easy to show that the matrices themselves constitute a division ring. Hence, the definition is recursive.

For the purpose of this discussion, we consider only square matrices.

Given a matrix A, the inverse of the matrix (A, 0, 0, 1), where 0 is a matrix each of whose elements is zero and 1 is an identity matrix, is the matrix (1/A, 0, 0, 1). Thus the inverse of the larger matrix embodies the inverse of the original matrix.

The inverse of a matrix (a) is the matrix (1/a). While a could be a partitioned matrix, this statement is useful only for an elementary matrix (a).

The inverse of a partitioned matrix (A, B, C, D) is the matrix (D, -C, -B, A) / (A*D - B*C), as may be verified easily.

Apply recursively, and do not waste time actually multiplying by zero or by one.

There you have it. And it is *much* easier than Gaussian elimination.

Try it. Please tell me what you think,


inversion of a matrix

I was in a hurry, when I wrote the description of that algorithm for the inversion of a matrix. I just presented a bare-outline. Is it clear enough? Should I elaborate?

This algorithm is suitable for hand calculation. However, it would suffer, if it were to be implemented as a computer program. Unless one is very careful.

For a matrix other than of a size which is an integral power of two, there will be a large block of zeros and a partial diagonal of ones. Hand multiplication by zero or one is trivially fast compared to multiplication by a typical real number. For a computer, such a trivial multiplication is only slightly faster than multiplication by a typical real number. And an if-statement to bypass such a trivial multiplication would consume far more time than that spent in the multiplication.

It is only the first recursion which presents this implementation problem.

Maybe the matrix could be partitioned to isolate the zeros and the ones.

Thus the right-upper quarter of the first matrix would be partitioned into two vertical matrices, the right-most one being a zero matrix. The left-bottom quarter into two like horizontal matrices. The right-bottom quarter matrix into four matrices, with its right-upper and left-bottom being zero-matrices, while the right-bottom being an identity-matrix.

This right-bottom quarter-matrix would be incompatible with the left-upper quarter-matrix -- the only one by which it needs to be multiplied. Thus, the left-upper quarter-matrix would have to be partitioned into the same size and shape sub-matrices.

After the first recursion, each element of the resulting divisor-matrix would by a typical real number. Also, on the way back up from the recursion, there are no such trivial multiplications.

As you see, there often -- *very* often -- is much that is left unsaid in the description of an algorithm, which has to be included in its practical implementation.

How much do you want to include in your posting? If you indicate the level of detail you desire, I will re-write the algorithm accordingly. Or do you want a computer program -- the ultimate in detail?

All maps and aerial photographs are copyrighted. They cannot be posted, without explicit permission from the original publisher (or whomever holds the copyright). Besides, except for a businesses expecting clients to come, there would be little interest to anybody of a map or aerial photograph.


documentation for my matrix inversion algorithm

I have consolidated and polished the documentation for my matrix inversion algorithm. Here it is. Is this the level of detail that you have in mind? Please make suggestions for further improvements.

 

/* ****************** documentation of the rism matrix inversion ****

 

Here is a brand-new algorithm for inverting a matrix. I just invented it this afternoon, 1500 Pacific Daylight Savings Time Thursday 03-rd June 1999. Copyright (c) 1999 by R. I. 'Scibor-Marchocki. Of course, perhaps the Chinese knew this algorithms four thousand years ago?

It is as fast at the Gaussian elimination algorithm. Furthermore, it requires only one division.

First a review. A matrix over a division ring is defined as a rectangular array of elements drawn from that ring; with the usual definitions of matrix addition, matrix multiplication, and multiplication of a matrix by a scalar. It is very easy to show that the matrices themselves constitute a division ring. Hence, the definition is recursive.

For the purpose of this discussion, we consider only square matrices.

Given a matrix A, the inverse of the partitioned matrix (A, 0, 0, 1) (where 0 is a matrix each of whose elements is zero and 1 is an identity matrix), is the matrix (1/A, 0, 0, 1). Thus the inverse of the larger matrix embodies the inverse of the original matrix. And their determinants are equal. Thus, if the given matrix is not of an integral power-of-two size, pad it with an identity matrix to such a size.

The inverse of a matrix (a) is the matrix (1/a). While a could be a partitioned matrix, this statement is useful only for an elementary matrix (a). Furthermore, for an elementary matrix, its determinant is a.

The inverse of a matrix (A, B, C, D) is the matrix (D, -C, -B, A) / (A*D - B*C), as may be verified easily. Furthermore, the determinant of the original matrix is equal to that of the divisor matrix. Apply recursively, and do not waste time actually multiplying by zero or by one.

There you have it. And it is *much* easier than Gaussian elimination.

 

This algorithm is suitable for hand calculation. However, it would suffer, if it were to be implemented as a computer program. Unless one is very careful to avoid the trivial multiplications efficiently.

For a matrix other than of a size which is an integral power of two, there will be a large block of zeros and a partial diagonal of ones. Hand multiplication by zero or one is trivially fast compared to multiplication by a typical real number. For a computer, such a trivial multiplication is only slightly faster than multiplication by a typical real number. And an if-statement to bypass such a trivial multiplication would consume far more time than that spent in the multiplication.

It is only the first recursion which presents this implementation problem.

Sub-partition the matrix to isolate the zeros and the ones. Thus, the right-upper quarter of the first matrix would be partitioned into two vertical matrices, the right-most one being a zero matrix. The left-bottom quarter into two like horizontal matrices. The right-bottom quarter matrix into four matrices, with its right-upper and left-bottom being zero-matrices, while the right-bottom being an identity-matrix.

This right-bottom quarter-matrix would be incompatible with the left-upper quarter-matrix -- the only one by which it needs to be multiplied. Thus, the left-upper quarter-matrix would have to be partitioned into the same size and shape sub-matrices.

After the first recursion, each element of the resulting divisor-matrix would by a typical real number. Also, on the way back up from the recursion, there are no such trivial multiplications. Except at the top level, where appropriate sub-partitioning again is required.

As you see, there often -- *very* often -- is much that is left unsaid in the description of an algorithm, which has to be included in its practical implementation.

 

Speed. The benchmark for the inversion of a matrix is the requirement for the multiplication of two matrices each of size n x n, which is exactly n cubed multiplications -- no divisions. We are not concerned with the much faster addition or subtraction. Most algorithms for the inversion of a matrix require between n cubed and three times as many multiplications and about n squared divisions. A good implementation of the Gaussian elimination algorithm requires exactly n cubed multiplications and n squared divisions. The somewhat easier Gauss-Jordan algorithm requires one-and-a-half times n cubed multiplications. The Kramer rule, however, requires approximately n to the fourth power multiplications -- not acceptable for large matrices. The Kramer rule also suffers from excessive round-off errors. The algorithm described here requires slightly less than n cubed multiplications and only one division.

 

********************************************** */

Once we both are satisfied with the wording, by all means, if you are so inclined, please do post it on the Internet.


rism squaring algorithm

Definition A matrix is said to be *symmetric* iff (= if and only if) its transpose is equal to the given matrix. The complex counterpart is called *Hermitian*.

Definition A matrix is said to be *skew-symmetric* iff its transpose is equal to the negative of the given matrix. The complex counterpart is called *skew-Hermitian*.

These two situations are not exhaustive. That is, a matrix may be neither.

For the purpose of this discussion, let us consider a symmetric (or possibly Hermitian) matrix A. Thus, this matrix of necessity is square, of size n x n.

This is a polynomial equation of degree n, in lambda

determinant (A - lambda I) = -0

I is an identity matrix of the same size n x n. Each root is called an *eigen-value*. This is a bastard word first part is German, second part is English. The Fundamental Theorem of Algebra says that there is one root. At a corollary, it follows that there are exactly n roots, not necessarily distinct.

By Mathematical Induction upon n, it may be shown that for each distinct lambda, a non-null vector x (called *eigen-vector*) exists such that

x . (A - lambda I) = 0

Alternatively, this theorem follows from the properties of determinants.

There is a well-known algorithms for finding an eigen-value. Successively multiply the matrix by itself. Eventually, the largest (in magnitude) eigen-value -- raised to the power of how many multiplications have been performed -- will remain on the principal diagonal. The rest of the elements of the matrix will approach zero.

Successive eigen-values may be obtained.

If the inverse of the original matrix is subjected to this algorithm, the smallest (in magnitude) eigen-value is obtained.

This algorithm is employed to study the structure of the eigen-values and eigen-vectors, especially degenerate conditions of repeated eigen-values.

For actual numerical computation of the eigen-values, this algorithm is not practical; because it requires typically maybe 10000 multiplications to isolate an eigen-value.

In the mid-1960's, I invented a squaring algorithm. I first implemented this algorithm in the mid-1970's upon the Texas Instruments TI-49 programmable calculator.

By squaring instead of multiplying, only a dozen iterations is required to isolate the largest eigen-value.

It is interesting to see how a computer program deals with the degenerate cases of either repeated eigen-values double-root. tipple or higher root. Also the degenerate case of a single, double, tipple or higher zero valued root.

In the process, the structure of such degenerate matrices is revealed.

An advantage of this algorithm -- over most others -- it that only matrix operations are required to obtain eigen-values.

I will have to resurrect this algorithm. Describe it in great detail -- it is these details that make this algorithm interesting. And re-implement it.

Are you interested?


1961

Some advise please.

Times have changed.

Until the advent of wide-spread powerful computers in the 1980's, Linear Algebra had not been practical. Thus there were no classes offered and the few textbooks that existed were miserable, to say the least.

From here and there, I had picked-up pieces of Linear Algebra; but I never had a systematic class (none were available). Neither did I ever read a textbook, and well that I did not. As you will see.

In 1961, I realized that I will need a thorough knowledge of Linear Algebra to support my ongoing research of Information Theory. Since there were no classes offered anywhere, instead, I borrowed the various contemporary textbooks from some libraries.

The books were very poorly written -- actually incomprehensible. I decided that it would be easier to re-create Linear Algebra by myself than to decipher any of these textbooks. I returned them to the respective libraries. I did not purchase a copy of any of them. Of course, it means that I do not know some of the standard terminology, or know it only poorly.

You may have heard the joke regarding the bumble-bee. According to the principles of Aerodynamics, a bumble-bee cannot fly. He, however, not ever having read any textbooks of Aerodynamics, blissfully flies around.

I had not read textbooks on Linear Algebra. Thus I did not know that one cannot find a determinant of a non-square matrix. I did not know that there is no known algorithm for finding the eigen-pairs (eigen-value, eigen-vector) of a non-symmetric (non-Hermitian) matrix.

I needed both to support my Information Theory. Necessity is the mother of invention. I devised algorithms for both. And at the time, I did not realize that I had done anything out of the ordinary.

Now, in the 1990's, there are abundant excellent textbooks of Linear Algebra -- al least elementary ones. And one or two advanced.

Question

When I describe the various algorithms, what background should I assume of the reader? How much background information should I include? Should I write my own textbook of Linear Algebra? Why, when there are many excellent ones on the market, already? But they are *new* books. They leave-out some of the older algorithms, theorems, etc. Which perspective should I adopt that of 1961 (when I devised many of my algorithms), 1980 (when I implemented them), or 2000 (when the material will be read by others)? What is the best compromise????

In the early 1980's, the Mathematical Dictionary was published jointly by the Mathematical Society of America and its Japanese counterpart. It states that, while there are several algorithms for finding the eigen-pairs of a Hermitian matrix, there is no known algorithms for a non-Hermitian matrix. That was when I first realized that I had something unusual. I already had a computer program that could find the eigen-pairs of a non-symmetric matrix. Thus, not only did I have an algorithm, I had it implemented and running.

Note A Hermitian matrix is the complex counterpart of the real matrix know as symmetric matrix. There is a close relationship between corresponding concepts of real and complex matrices.

Not being able to find the eigen-pairs of a non-symmetric matrix prevents statisticians from being able to perform factor-analysis and regression meaningfully upon the same problem, for instance.

But, there is a great background required in the theorems, algorithms, and computer programs to arrive at the point where it makes sense to talk about non-symmetric (or non-Hermitian) matrices.

Like I said at the outset, how much of a background should I assume in describing theorems and algorithms. And it will take a large effort to rewrite all of the subroutines needed to support the high-level routines which do the new/interesting things. Then, the description of these high-level routines, again, involves a large background. Will a description which omits the background be comprehensible to anyone?

Now, even at Mt. San Antonio Community College, we offer a class in Linear Algebra. It is an elementary class. We have employed various textbooks -- some good, some mediocre.

??????


squaring

Since you failed to respond to my inquiry as to the level of knowledge of mathematics which I should pre-suppose, I have taken the stand that the reader knows mathematics. If you should request additional background or more detailed explanations, I will see about providing such. Please feel free to make such a request.

Chinese

To establish what the Chinese might have known 4000 years ago requires access to the relevant old manuscripts (or photo-copies thereof), knowledge of ancient Chinese, being an expert historian, being an expert mathematician specializing in linear algebra, and having credibility in -- that is, acceptance by -- the Western culture. A further complication is that these old manuscripts did not provide an exposition of mathematics. Instead, only some worked-out problems are available. Thus one must infer the possible knowledge of mathematics from the problem that could be solved. An inexact process, at best.

It is considered that, 4000 years ago, the Chinese knew Linear Algebra, up to eigen-pairs. However, this statement does not claim a knowledge of every theorem or algorithm. And as with all of history, there are controversies.

On the other hand, anytime that we invent/discover a new theorem or algorithm, we would have to investigate back to see if the Chinese might have known.

my algorithm for finding the inverse of a matrix

Thus, I do not have any practical way of ascertaining the real novelty of my algorithm for the inversion of a matrix. All that I say is that I have not seen it in any textbook. But, then, I have not read very many textbooks of linear algebra, either.

 

Eigen-pairs (eigen-value, eigen-vector)

Eigen-pairs have been studied for two centuries. In 1845, Jacobi invented his algorithm. It took him two years to work-out -- by hand -- a 6x6 matrix, as an example. It was apparent that his algorithm *usually* was very good. However, there was no *proof* of convergence and no estimate of the speed of convergence. When computers became available, the Jacobi algorithm could not be implemented, at first. Around 1973, someone invented a simplified, poorer version, which it was practical to implement as a computer program. Around 1976, a mathematician managed to prove the convergence of this algorithm. In a few years, it was realized that the original Jacobi algorithm converges as least as fast -- probably much faster. Finally, around 1981, the full Jacobi algorithm was implemented as a computer program.

The multiplication-family of algorithms for the eigen-pairs

Until that time, only the multiplication algorithm had been studied thoroughly. It is easy to prove its convergence and the rate of convergence. Let

R the precision of the computation 10**14 for double-precision

r the ratio of the largest, in magnitude, to the next eigen-value; or the reciprocal of the ratio of the smallest, in magnitude, to the next eigen-value, in the case of the division algorithm

m the power to which the set of eigen-values is raised

p how many multiplications (or divisions) have been performed

s how many squarings have been performed, in my algorithm

Then m gives the rate of convergence

log(R) / log(r) = m = p + 1 = 2 ** s

For any value of r, one may compute either p or s. Typical values of r might be as large as 5 or as small as 1.01. Obviously, values of p in the thousands to tens-of-thousands are not practical too slow and excessive accumulated round-off errors. However, the corresponding values of s will be 10 to 15 -- certainly acceptable.

The complications resulting from degeneracy of the given matrix affect each of the algorithms of the multiplication family in the same manner. And, indeed, the original multiplication algorithm lends itself very well to a study of such degeneracies. Everything carries over to the division algorithm or to my squaring algorithm.

In 1961, I realized that I would need guaranteed algorithms (but not necessarily the fastest ones) of linear algebra to support my continuing research of information theory. I had invented my squaring algorithm in about 1962. I first implemented it on a Texas Instruments TI-49 programmable calculator in about 1976. Finally, in 1980, I re-implemented all of my linear-algebra algorithms as FORTRAN IV programs on a Data General main-frame computer -- which long ago has become obsolete and no longer available. Each of the three events preceded the availability of an understanding of the Jacobi algorithm in the mathematics community.

Once the Jacobi algorithm became understood and implemented, in around 1982, it is unquestionably the best available algorithm for finding the eigen-pairs of a symmetric matrix.

Hermitian

By the well-known mapping of a complex number (a, b) to a real matrix (a, b, -b, a), it follows that any algorithm for a real matrix may be applied to the corresponding complex matrix. Thus, any algorithm for a symmetric matrix may be applied to a Hermitian matrix.

Around 1982, the American Mathematical Society -- in cooperation with the Japanese counterpart -- published the definitive Dictionary of Mathematics. In it is stated that "while there are many known algorithms for finding the eigen-pairs of a Hermitian matrix; there is no known algorithm for a non-Hermitian matrix." This statement would include the multiplication family (multiplication, division, my squaring) and the Jacobi algorithm.

In 1962, I also invented an algorithm for a non-symmetric matrix. I also implemented it on the TI-49 in 1976. When I read the foregoing quotation, I extended it to a non-Hermitian matrix. But, this algorithm will require a background of my Information Theory to both motivate and explain.

This Summer, I have begun re-re-implementing my algorithms of linear algebra and information theory in the C language, as *.dll and VBA (= Visual Basic for Applications) and Access for the data-base and user interface. Since the FORTRAN implementation took a year, I imagine this implementation will, likewise.

A faded "Ditto" copy of the typewritten manuscript is available. Entering it -- about a 1000 pages -- into a computer-readable form will be a large project. Probably, I will be in a position to begin doing so in a year or two.

 

Now is 1612 Pacific Daylight Savings Time Saturday 19-th June 1999. For today it was predicted 33C, for Sunday 34.5C, in Pasadena. We are about the same. However, at the Renaissance Pleasure Faire, it will be warmer. I skipped today -- because of the heat. I will go tomorrow; because it is the last day of the Faire. I will take a small cooler with ice, orange juice, and water. And do as little as possible, except sit in the shade at the Friends of the Faire.

While I am able to connect to Lightside, they seem to have lost their connection to the Internet -- not an unusual occurrence for them. It will be Monday before I will be at my computer again. So this e-mail will be delayed accordingly.



webmaster@rism.com
Last modified on Thursday 22-nd July 1999
Copyright (c) 1999 by R. I. 'Scibor-Marchocki