Enter the matrixes!

Extract from an article by Arthur Cayley published in 1858 in ’Philosophic
Extract from an article by Arthur Cayley published in 1858 in ’Philosophical transactions’ (vol. 148).
Whether performed by a human or a computer, matrix multiplication is a tedious task. Researchers are battling to reduce the time and number of steps required to solve this type of operation.

Excel spreadsheets, climate modeling, simulation of aircraft wing structure, neural network calculations, image processing... Behind these objects or problems often lie matrices. Straight out of algebra, they are above all mathematical objects on which we can perform operations, just as we do with numbers. Among these is multiplication, a simple operation that can demand enormous computing resources from a computer when the matrix becomes large. That’s why, since the 1960s, researchers have been working on optimized multiplication methods to speed up these calculations.

Matrices can be seen as an array of values. Among other things, they can be used to describe a system of linear equations in a compact way. Most physical, chemical or biological phenomena can thus be represented in matrix form. Matrices are also used to characterize an object, such as an image described by a table indicating the value (color, position, etc.) of each of its pixels, or in machine learning, where " most of the computation in a neural network is performed on matrices representing the state of each neuron ", points out Gabriel Peyré, researcher in the Department of Mathematics and Applications at the École normale supérieure 1 .

From tables to matrices

The ancestors of matrices originated in China, in the 1st century BC or AD. " In the anonymous work The Nine Chapters on Mathematical Procedures, we find something that resembles a matrix. This is because the procedure for solving a linear system begins with a description of the layout of the data ," explains Karine Chemla, a researcher in the history of mathematics at the Science, Philosophy, History (Sphere) 2 laboratory. The point of this layout is comparable to our positional numeration, which makes it easy to add up by superimposing units, tens, hundreds and so on. " The idea here is that the algorithm for solving the problem is based on the arrangement of the data in a table ," she emphasizes.

However, these tables could not yet be considered matrices, as they could not be used to perform operations. " This leap was made in the mid-19th century, with the work of Arthur Cayley ," explains Karine Chemla. This British mathematician performed the first elementary operations on these objects: addition, inversion or multiplication. Operating on matrices enabled him to solve geometric problems. Indeed, the geometric transformation of an object - its rotation, for example - can be written using a matrix. And the succession of several transformations is nothing more than the multiplication of matrices, each representing a geometric transformation undergone by the same object. A technique widely used in 3D graphics.

Almost all algorithms use matrix multiplication for numerical resolution.

In an increasingly digital world, matrix multiplication has become a central operation, and not just in the field of mathematics. " Almost all algorithms use matrix multiplication for numerical solving," recalls François Le Gall, a researcher at the Graduate School of Mathematics at Nagoya University (Japan).

The advantage is that the trivial (or standard) method for performing this operation is simple to perform by hand or with a computer, as long as the matrix size remains reasonable. The disadvantage is that the number of calculation steps required for this algorithm grows exponentially with the size of the matrix. " We can count the additions and multiplications required to multiply two matrices using the trivial algorithm. For a matrix with n rows and n columns, this cost, called the ’complexity of the algorithm’, is n3 ", explains Clément Pernet, teacher-researcher at the Laboratoire Jean Kuntzmann 3 . For example, if two matrices each have 1,000 rows and 1,000 columns, then the computer (or human) needs to perform 1 billion operations (i.e. 1,0003) to successfully multiply them. " Matrices of this size are common in applications, particularly in machine learning", warns François Le Gall.

The quest for the best algorithm

As early as the 1960s, mathematicians and computer scientists were wondering whether matrices could be multiplied with fewer operations. " Volker Strassen (a German mathematician) thought it was impossible. But he was the first to find an algorithm for solving a matrix product in fewer than n3 steps," smiles Clément Pernet. A discovery that launched the quest for the optimal matrix product algorithm! " The main motivation is to make fast calculations ," emphasizes François Le Gall. Multiplying two matrices takes time: 30 seconds for two matrices of 10,000 rows and 10,000 columns, but 4 minutes with twice as many rows and columns. Finding an algorithm that reduces the number of calculation steps required would reduce the overall calculation time. This reduction is all the more significant as the size of the matrix increases.

Let’s compare the trivial algorithm with a hypothetical multiplication algorithm whose complexity would be n². When multiplying two matrices of size 3 x 3, the advantage of the second method over the trivial algorithm is not enormous. On the other hand, if the size of the matrix is one million columns for as many rows, the second algorithm will require one million times fewer calculations, i.e. one million times less time and energy expended. " Our theoretical motivation is also to understand how far we can reduce the value of the power of n ", explains François Le Gall. What we do know is that, even optimized, the algorithm cannot go below a complexity of n2. This is because n² is the number of cells making up the result matrix. It therefore takes at least n² steps to write the result. To reduce complexity, researchers are focusing their efforts on reducing the number of multiplications required to solve a matrix product. " We know the number of additions to be made, and it’s small compared to the number of multiplications. So it’s mainly the multiplications that determine the power of n ," explains François Le Gall.

A "painfully" decreasing complexity

The first improvement, Volker Strassen’s algorithm, reduces the complexity of n3 to n2,807. To build it, the mathematician used the divide-and-conquer method. The idea is as follows: the problem (in this case, the matrix) is decomposed into several sub-problems (parts of the matrix), which in turn are decomposed into other sub-problems until only matrices of size 2 x 2 are obtained. All that remains is to multiply all these results together to find the final result. In this way, the multiplication of two matrices of size 2 x 2 requires one less multiplication than the trivial method, and the multiplication of matrices of size 10,000 x 10,000 offers a time saving of around 28%. " After Volker Strassen, there was no progress for ten years. Then, between 1978 and the end of the 1980s, there was enormous competition ," exclaims François Le Gall. Several small improvements were discovered, first thanks to new work by Volker Strassen, then by Shmuel Winograd and Don Coppersmith, both mathematicians. Using a different matrix decomposition method, they succeeded in reducing complexity to n2,376.

Since the 2010s, successive improvements have been made to this precise method. The latest optimization was carried out by Zhou Renfei and his colleagues at the Interdisciplinary Institute of Information Science at Tsinghua University (China), and, at the end of 2023, they increased the exponent from 2.372860 to 2.371866. A result that may not look like much, but is mathematically decisive. " They found that something in Coppersmith and Winograd’s method was not optimal. We’d missed it ," says François Le Gall.

Along the way, theoretical motivation took precedence over real improvements. " After the 1970s, matrix multiplication algorithms became galactic ," warns Clément Pernet. In other words, these algorithms are so complex that they only reduce computing time for matrices so gigantic that all the computers on Earth wouldn’t be enough to store them. So, in practice, they are never used to multiply two matrices, even with thousands of rows and columns. However, just because the result isn’t used doesn’t mean it isn’t interesting. " This research is answering fundamental questions, and requires new techniques ," points out Gabriel Peyré. This race between theorists could well lead to faster algorithms that can be used in concrete applications...