EQUATIONS, THEORY OF. An equation is the statement of an equality between one or more unknown numbers and known, or given, numbers, which is true, not for all values of the unknowns, but only for certain of them (Lat. aequatio, an equalizing). An equality which holds for all values of the unknowns is an identity. Thus 3x+2 = 5, true only for x = 1, is an equation; x² - y² = (x+y)(x-y), holding for all values of x, y, is an identity. To distinguish identities from equations the symbol ≡ is used, as in x² - y² ≡ (x+y)(x-y). This symbol will also be used, where no confusion can arise, to signify a definition; thus x ≡ a means x is a.
The earliest known equivalents of algebraic equations occur in the Rhind papyrus, evidently compiled from earlier works, by the Egyptian Ahmes, about 1650 or 1700 B.C. For example, he proposes this problem: "A quantity and its seventh added together become 19. What is the quantity?" His word for the unknown is aha, formerly written hau and translated "heap" or "mass." The problem, therefore, is to solve the equation x + x/7 = 19, as we would now express it. Lacking a convenient algebraic notation, he proceeded by a cumbersome method later known as that of "false position." Indeed, neither the Egyptians nor their Greek successors made any progress that is significant from a modern point of view, and neither people rose to the abstract conception of a theory of equations as a fruitful field of mathematical science. The Indians, with their peculiar addiction to arithmetic, achieved more.
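In modern terms, "false position" guesses a convenient trial value for the unknown, evaluates the left-hand side, and rescales the guess in proportion. The following is a minimal sketch of that reasoning in Python, not a reproduction of Ahmes's actual procedure:

```python
from fractions import Fraction

# Ahmes's problem: a quantity and its seventh together make 19.
# False position: take the convenient guess x = 7, which makes the
# left-hand side easy to evaluate, then rescale in proportion.
guess = Fraction(7)
trial = guess + guess / 7          # the guess yields 8 instead of 19
x = guess * Fraction(19) / trial   # scale the guess by 19/8
print(x)                           # 133/8, i.e. 16 5/8
```

The rescaling is legitimate because the left-hand side x + x/7 is directly proportional to x.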
The theory of equations is concerned chiefly with the properties of a single algebraic equation of the type c₀xⁿ + c₁xⁿ⁻¹ + … + cₙ = 0, in which n is a positive whole number, the coefficients c₀, c₁, …, cₙ are any given numbers, or numbers that are not specified but are assumed known, and c₀ ≠ 0. The nature of the coefficients will be made more precise later, as on them the whole theory depends. The degree of this equation is n. Roughly speaking, the theory of equations discusses this problem: the coefficients being specified, find all values of x which make the equation true. This again will be amplified and made more definite as we proceed. The finding of x is called solving the equation.
The Greeks are sometimes credited with solving equations of the second degree. Thus Euclid's Elements, ii. 11, is equivalent to solving x² + ax = a². There are two values of x; Euclid was content with one. In the 9th century A.D. the Arab, Mohammed ibn Musa al-Khowarizmi (whose name is variously transliterated) gave both values 3, 7 of x in x² + 21 = 10x; he also discussed many more equations of the second degree. Like other Arabic writers, he used the equivalent of the term "root" for a value of the unknown. Great advances were made in the 15th and 16th centuries by the Italian mathematicians, who solved the general equations of the third and fourth degrees. These will be considered later. In spite of its brilliance, their work had but little direct influence on the evolution of a theory of algebraic equations. Curiosity as to the underlying reason for success or failure seems not to have perturbed the practical mind of the 16th-century algebraist. That rare type of speculation was reserved for the golden age of the late 18th and early 19th centuries; and although significant progress was made in the 18th century, notably by Joseph Louis Lagrange in a classic memoir of 1770-71, it was only with the researches of Evariste Galois (1811-1832) that the theory, at one stride, reached its maturity. Galois was killed in a duel at the age of 21.
Formerly the theory included much that is now relegated to other departments of algebra, e.g., the solution of simultaneous equations of the first degree in several unknowns, which is now an application of determinants (q.v.) and matrices. As commonly understood to-day, the theory of algebraic equations is concerned chiefly with two problems and their numerous ramifications, all of which sprang directly from the necessity for solving the equations presented by problems in pure and applied science. To describe these, a few definitions must first be recalled. For an understanding of certain parts of the sequel an elementary acquaintance with the plotting of simple curves is presupposed; for others, a knowledge of derivatives and Taylor's theorem (q.v.); and finally, for the modern theory, the reader is assumed to have read parts of the article GROUPS.
A number of the form a+bi, in which a, b are real numbers and i = √-1, is called complex (see COMPLEX NUMBERS); if b = 0, the number is real, otherwise it is imaginary. Let n be a positive integer other than zero, and let c₀, c₁, …, cₙ be complex numbers not involving x. If c₀ is not zero, the polynomial f(x) ≡ c₀xⁿ + c₁xⁿ⁻¹ + … + cₙ is of degree n. A complex number k, which is such that f(k) = 0, is called a root of the algebraic equation f(x) = 0 of degree n, and the value k of x is said to satisfy the equation; f(x) is also said to vanish when x = k. According as none, or some, of the coefficients c₀, …, cₙ are imaginary, the equation f(x) = 0 is called real or imaginary. It is necessary to consider both real and imaginary equations. If f(x) = 0 is imaginary, its solution is reduced to that of real equations thus: write f(x) in the form g(x) + ih(x), where the coefficients of g(x), h(x) are all real. Then g(x) + ih(x) = 0. Multiply the last throughout by the conjugate imaginary g(x) - ih(x). The result is a real equation, among whose roots occur all those of f(x) = 0. Real equations are thus the fundamental ones, but we shall not assume f(x) = 0 to be real unless so stated.
The central problems of the theory are these: (a) To find a root of f(x) = 0, i.e., to solve the equation, when the degree n and the coefficients c₀, …, cₙ are given. (b) To determine the precise conditions under which the roots of f(x) = 0 can be expressed in terms of the coefficients by means of a finite number of algebraic operations (additions, multiplications, subtractions, divisions, extractions of roots). This is called the algebraic solution, or the solution by radicals, of f(x) = 0. The exact sense in which the coefficients are "given" in this problem is the crux of the modern theory; for the present it suffices to state that they may be considered as independent variables. If in (a), when the coefficients have given numerical values, a root cannot be found exactly, a practicable process must be devised whereby a root may be exhibited to any prescribed degree of approximation. If in (b) the roots are not expressible in the form demanded, it is required to construct the simplest functions of the coefficients that do satisfy the equation. For example, it was almost proved in 1824 by Niels Henrik Abel, then only 22 years of age, that the solution by radicals of the general equation of degree >4 is impossible. His attempt contains two oversights, now easily rectified by the Galois theory. The current assertion that Abel proved the solution by radicals of the general equation of degree >4 to be impossible is definitely incorrect. The objections raised by William Rowan Hamilton in 1839 to Abel's alleged proof are alone valid.
In 1858 Charles Hermite first solved the general equation of degree 5 by means of elliptic functions (q.v.). Modern work in this direction, originating with Henri Poincare about 1880, solves the general equation of degree n in terms of Fuchsian functions. Current developments of (b) are inextricably interwoven with the theories of substitution groups, algebraic numbers (modern higher arithmetic) and special functions of a complex variable; (a) is practically exhausted.
Actually the fundamental theorem is not in the purview of algebra, as all proofs, depending ultimately upon continuity, are analytic and belong to the calculus. A proof adequate to the demands of modern rigour would implicitly traverse the entire theory of the continuum. Certain ultra-rigorists of the school founded by Leopold Kronecker, and invigorated to-day by L. E. J. Brouwer and Hermann Weyl, might even assert that, not only has the fundamental theorem not yet been proved, but also that it is without meaning. From the standpoint of modern mathematical foundations a fatal epistemological imperfection of the classical proofs is their failure to exhibit a process for constructing, in a well-defined number of well-defined operations, the roots whose existence the purported proofs undertake to establish. The difficulty here, of course, is irrelevant for the pragmatic problem of solving a numerical equation to a prescribed degree of accuracy. However disturbing such scepticism may be to the 20th-century critical logician, it need not deter the engineer who is capable of plotting a graph sufficiently accurate for most practical purposes. It is interesting, however, on account of its indication that mathematical reasoning may be as fallible as any other; and it should not be forgotten by the professional mathematician that to-day's heterodoxy is to-morrow's rigorous orthodoxy.
The jth elementary symmetric function of x₁, x₂, …, xₙ, for j = 1, 2, …, n, is, by definition, the sum of all possible products of j different variables chosen from the set x₁, …, xₙ. Thus, for n = 3, the 1st, 2nd, 3rd elementary symmetric functions are x₁+x₂+x₃, x₁x₂+x₂x₃+x₃x₁, x₁x₂x₃, and these are all the elementary symmetric functions of x₁, x₂, x₃. There are obviously an unlimited number of symmetric functions other than the elementary; it suffices to apply to any rational function of x₁, …, xₙ the n! substitutions of the symmetric group on these letters and add the results; any numerical factor common to all the terms may be suppressed. For example, when n = 3, x₁²+x₂²+x₃² is symmetric in x₁, x₂, x₃ and is equal to (x₁+x₂+x₃)² - 2(x₁x₂+x₂x₃+x₃x₁). The last illustrates the important theorem that any polynomial P which is symmetric in x₁, …, xₙ is equal to a polynomial Q in the elementary symmetric functions and the coefficients of P; the coefficients of Q are whole numbers. If all the coefficients of P are also whole numbers, Q is a polynomial in the elementary symmetric functions alone with whole number coefficients. These properties constitute the fundamental theorem of symmetric functions. The reduction to elementary symmetric functions is unique.
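These definitions are easy to verify mechanically. The following sketch in Python (the particular values of x₁, x₂, x₃ are illustrative only) computes elementary symmetric functions directly from the definition and checks the identity just stated for n = 3:

```python
from itertools import combinations
from math import prod
from fractions import Fraction

def elementary_symmetric(xs, j):
    """Sum of the products of j different variables chosen from xs."""
    return sum(prod(c) for c in combinations(xs, j))

# Illustrative values for x1, x2, x3:
xs = [Fraction(2), Fraction(-3), Fraction(5)]
e1 = elementary_symmetric(xs, 1)     # x1 + x2 + x3
e2 = elementary_symmetric(xs, 2)     # x1x2 + x2x3 + x3x1
power_sum = sum(x**2 for x in xs)    # the symmetric function x1^2 + x2^2 + x3^2
# Its expression in the elementary symmetric functions:
print(power_sum == e1**2 - 2 * e2)   # True
```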
Since c₀, c₁, …, cₙ are any complex numbers independent of x, and c₀ ≠ 0, the equation f(x) = 0 may be divided throughout by c₀. It then becomes xⁿ + a₁xⁿ⁻¹ + … + aₙ = 0, where a₁, …, aₙ are complex numbers, and this form is precisely as general as the original. When convenient we shall use it. If the roots are x₁, x₂, …, xₙ, then xⁿ + a₁xⁿ⁻¹ + … + aₙ ≡ (x-x₁)(x-x₂)…(x-xₙ), the product of the linear (= first degree in x) factors, and hence, on comparing coefficients of like powers of x, we see that aⱼ = (-1)ʲ times the jth elementary symmetric function of the roots, for j = 1, 2, …, n. In particular, aₙ = (-1)ⁿ times the product of all the roots. This frequently is useful in testing for rational roots of an equation whose coefficients are rational numbers; by this means all the rational roots may be found. By the fundamental theorem on symmetric functions it follows that any symmetric polynomial P in the roots x₁, …, xₙ is equal to a polynomial Q in the coefficients a₁, …, aₙ and the coefficients of P; the coefficients of Q are whole numbers. If both the coefficients of the equation and those of P are rational numbers, then Q is a rational number.
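The relation between the coefficients and the elementary symmetric functions of the roots can be checked numerically; the sketch below (illustrative roots 1, 2, 3) expands the product of linear factors and compares each coefficient with the corresponding symmetric function:

```python
from itertools import combinations
from math import prod
from fractions import Fraction

def poly_from_roots(roots):
    """Coefficients [1, a1, ..., an] of (x-x1)...(x-xn), highest power first."""
    coeffs = [Fraction(1)]
    for r in roots:
        # Multiply the running polynomial by (x - r):
        coeffs = [a - r * b
                  for a, b in zip(coeffs + [Fraction(0)], [Fraction(0)] + coeffs)]
    return coeffs

roots = [Fraction(1), Fraction(2), Fraction(3)]
coeffs = poly_from_roots(roots)           # x^3 - 6x^2 + 11x - 6
for j in range(1, len(roots) + 1):
    ej = sum(prod(c) for c in combinations(roots, j))
    assert coeffs[j] == (-1)**j * ej      # a_j = (-1)^j (jth elem. sym. fn.)
print(coeffs)
```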
As several subsequent theorems are considerably simpler for equations having no multiple roots, it is important to refer all cases back to this, as follows: If the highest common factor of f(x) and f'(x) involves x, let it be g(x). Then a root of g(x) = 0, of multiplicity m, is a root of f(x) = 0, of multiplicity m+1; conversely, any root of f(x) = 0, of multiplicity m+1, is a root of g(x) = 0, of multiplicity m. By successive applications of the process for finding the highest common factor, any multiple roots that may be present can be found. If the root a is of multiplicity h, (x-a)ʰ is a factor of f(x), and similarly for all multiple roots. Dividing f(x) by the product of all such (x-a)ʰ, we obtain a polynomial which vanishes only for the simple roots of f(x) = 0. This argument for multiple roots is perfectly general and is not restricted to real equations.
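The highest-common-factor process can be sketched compactly with polynomials represented as coefficient lists (highest power first). This is a minimal illustration in exact arithmetic, not a robust implementation:

```python
from fractions import Fraction

def poly_div(num, den):
    """Divide one polynomial by another; return (quotient, remainder)."""
    num, q = list(num), []
    while len(num) >= len(den):
        factor = num[0] / den[0]
        q.append(factor)
        for i, d in enumerate(den):
            num[i] -= factor * d
        num.pop(0)                      # leading term now cancels
    return q, num

def poly_gcd(a, b):
    """Euclidean algorithm on polynomials; the result is made monic."""
    while any(c != 0 for c in b):
        _, r = poly_div(a, b)
        while r and r[0] == 0:
            r.pop(0)
        a, b = b, r
    return [c / a[0] for c in a]

def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

# f(x) = (x-1)^2 (x-2) = x^3 - 4x^2 + 5x - 2 has the double root 1.
f = [Fraction(c) for c in (1, -4, 5, -2)]
g = poly_gcd(f, derivative(f))
print(g)                 # [1, -1], i.e. g(x) = x - 1: the double root, once
reduced, _ = poly_div(f, g)
print(reduced)           # x^2 - 3x + 2, with each distinct root simple
```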
Now let a, b be real numbers, neither a root of f(x) = 0, and let a < b. To find the number of real roots of f(x) = 0 lying between a and b, put x = a in the Sturm functions, and delete any terms that then vanish. Count the variations of sign (as in Descartes' rule) in the resulting sequence of real numbers. Let there be Va variations. Proceed similarly with b, and obtain Vb. Then the number of real roots between a and b is Va - Vb. In particular, if a = -∞, b = +∞, Sturm's theorem thus gives the total number of real roots when we attend only to the signs attached to the highest powers of x in his functions. (The computations are usually laborious, but awkward fractions can be avoided by multiplying each dividend by a properly chosen positive constant before dividing.) Next, if f(x) = 0 has multiple roots, Va - Vb is still the number of real roots between a and b, provided each multiple root be counted once only. In practice, however, it is simplest to get rid of the multiple roots first, by the method already indicated.
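Sturm's process admits a compact computational sketch: build the sequence by repeated division (negating each remainder), then count sign variations at the two endpoints. The following illustration uses exact arithmetic and a cubic with three real roots:

```python
from fractions import Fraction

def poly_rem(num, den):
    """Remainder on dividing one polynomial by another (lists, highest first)."""
    num = list(num)
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i, d in enumerate(den):
            num[i] -= factor * d
        num.pop(0)
    while num and num[0] == 0:
        num.pop(0)
    return num

def sturm_sequence(f):
    """f, f', then successive negated remainders, down to a constant."""
    n = len(f) - 1
    seq = [f, [c * (n - i) for i, c in enumerate(f[:-1])]]
    while len(seq[-1]) > 1:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            break
        seq.append([-c for c in r])
    return seq

def evaluate(p, x):
    v = Fraction(0)
    for c in p:
        v = v * x + c
    return v

def variations(seq, x):
    """Sign variations of the Sturm functions at x (zero terms deleted)."""
    signs = [s for s in (evaluate(p, x) for p in seq) if s != 0]
    return sum(1 for u, v in zip(signs, signs[1:]) if u * v < 0)

# f(x) = x^3 - 3x + 1 has three real roots (about -1.88, 0.35, 1.53).
f = [Fraction(c) for c in (1, 0, -3, 1)]
seq = sturm_sequence(f)
a, b = Fraction(-3), Fraction(3)
print(variations(seq, a) - variations(seq, b))   # 3 real roots in (-3, 3)
```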
A less powerful theorem by the French physician F. D. Budan (1807), proved by J. B. J. Fourier about 1829, involves less computation than Sturm's, and is often usable. Let f⁽ʲ⁾(x) denote the jth derivative of f(x); replace Sturm's sequence by f(x), f⁽¹⁾(x), f⁽²⁾(x), …, f⁽ⁿ⁾(x), and proceed in this sequence with the same a, b as before, precisely as in calculating Va, Vb for Sturm's. Then, if a root of multiplicity m be now counted as m roots, Va - Vb for this sequence is either the number of real roots of f(x) = 0 between a and b, or exceeds it by a positive even whole number.
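The Budan-Fourier count needs only the successive derivatives, so the sketch is shorter than Sturm's; the same illustrative cubic is used, and here the bound happens to be attained exactly:

```python
from fractions import Fraction

def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]

def evaluate(p, x):
    v = Fraction(0)
    for c in p:
        v = v * x + c
    return v

def fourier_variations(f, x):
    """Sign variations of f, f', f'', ..., f^(n) at the point x."""
    seq, p = [], f
    while p:
        seq.append(evaluate(p, x))
        p = derivative(p)
    signs = [s for s in seq if s != 0]
    return sum(1 for u, v in zip(signs, signs[1:]) if u * v < 0)

# f(x) = x^3 - 3x + 1 again, with all three real roots inside (-3, 3):
f = [Fraction(c) for c in (1, 0, -3, 1)]
count = fourier_variations(f, Fraction(-3)) - fourier_variations(f, Fraction(3))
print(count)   # 3: the theorem's bound, attained exactly for this cubic
```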
It is sometimes convenient to know an upper limit L to the value of the real roots of f(x) = 0. Let G be the greatest of the numerical values of the negative coefficients among c₁, …, cₙ. If the first negative coefficient is preceded by precisely s coefficients that are greater than, or equal to, zero, then L = 1 + ˢ√(G/c₀). Another upper limit is as follows: If c₀ is negative, change all signs in f(x) = 0. Divide then the numerical value of each negative coefficient by the sum of all those positive coefficients that precede it. Let Q be the greatest of these quotients. Then 1+Q is the upper limit in question.
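Both limits are easily computed. A floating-point sketch (the cubic chosen for illustration has largest real root 3, so both bounds should exceed it):

```python
def first_bound(coeffs):
    """1 + (G/c0)^(1/s), with G the numerically greatest negative coefficient
    and s the number of coefficients preceding the first negative one."""
    c0 = coeffs[0]
    negatives = [-c for c in coeffs if c < 0]
    if not negatives:
        return None                     # no negative coefficient, no bound needed
    G = max(negatives)
    s = next(i for i, c in enumerate(coeffs) if c < 0)
    return 1 + (G / c0) ** (1 / s)

def second_bound(coeffs):
    """1 + Q, with Q the greatest quotient of a negative coefficient's
    numerical value by the sum of the positive coefficients preceding it."""
    Q = 0.0
    for i, c in enumerate(coeffs):
        if c < 0:
            Q = max(Q, -c / sum(d for d in coeffs[:i] if d > 0))
    return 1 + Q

# x^3 - 2x^2 - 5x + 6 = (x-1)(x+2)(x-3); the largest real root is 3.
coeffs = [1.0, -2.0, -5.0, 6.0]
print(first_bound(coeffs), second_bound(coeffs))   # both give 6.0, above 3
```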
The essential detail of constructing the equation f₁(x) = 0, whose roots are those of f(x) = 0 each diminished by the positive number h, is performed by synthetic division according to the following theorem, which is an immediate consequence of the expansion of f(x+h) by Taylor's theorem. If f(x) ≡ c₀xⁿ + … + cₙ be divided by x - h, let the quotient be q₁(x) and the remainder r₁. Divide q₁(x) by x - h; let the quotient be q₂(x) and the remainder r₂. Continue thus to n divisions. The last quotient is c₀, and the last remainder is rₙ. Then f₁(x) = c₀xⁿ + rₙxⁿ⁻¹ + … + r₂x + r₁. Many devices for shortening the labour of Horner's method are explained in treatises on equations; in particular the last digit (at least) that is required can, in general, be obtained by simple division. Horner's method is, beyond any question, the most practical yet devised for the numerical solution of equations with numerical coefficients. Other methods of solving numerical equations are an impracticable one by continued fractions, due to Lagrange, and others by expansions in infinite series. The latter has recently been reconsidered by E. T. Whittaker, who expresses the coefficients in the series for a root in terms of determinants that can be easily computed.
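The repeated synthetic division can be sketched in a few lines; the example below carries out two stages of Horner's process on x² - 2 = 0, whose root is √2 = 1.4142…:

```python
from fractions import Fraction

def diminish(coeffs, h):
    """Repeated synthetic division by x - h: returns the coefficients of
    f1(x) = f(x+h), whose roots are those of f each diminished by h."""
    c = list(coeffs)
    n = len(c) - 1
    for k in range(n):                  # n divisions; remainders accumulate
        for i in range(1, n + 1 - k):
            c[i] += h * c[i - 1]
    return c

f = [Fraction(1), Fraction(0), Fraction(-2)]   # x^2 - 2
g = diminish(f, Fraction(1))            # x^2 + 2x - 1: roots diminished by 1
g = diminish(g, Fraction(4, 10))        # diminish by 0.4, the next digit
print(g)                                # [1, 14/5, -1/25]: constant term near 0
```

The shrinking constant term (-1/25 here) signals that the accumulated diminutions 1 + 0.4 are closing in on the root.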
The general biquadratic may be taken in the form x⁴ + a₁x³ + a₂x² + a₃x + a₄ = 0.
Its solution, due to Ludovico Ferrari, but first published in the Ars Magna, was found by adding a suitable quadratic expression to both sides and imposing the condition that the new left-hand member be identically the square of x² + a₁x/2 + h, where h is to be found. By comparison of coefficients in the assumed identity, and subsequent elimination of m, a cubic for h is obtained. This cubic is called the resolvent, or reducing, cubic; its 3 roots enable us to find m and b. The solution of the biquadratic is thus finally reduced to that of the two quadratics x² + a₁x/2 + h = ±(mx+b), whose 4 roots are those required. The ultimate formulae, with the value of h inserted, are too complicated to be usable. Solutions by means of elliptic functions are known, but they also are mere algebraic curiosities.
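Ferrari's reduction can be followed numerically. The sketch below assumes a particular quartic with known roots 1, 2, 3, 4, and takes h = 7, which direct substitution confirms to be a root of the resolvent cubic; the coefficient comparisons written in the comments are one standard way of carrying out the elimination described above:

```python
from math import sqrt

# Ferrari's reduction for f(x) = x^4 - 10x^3 + 35x^2 - 50x + 24,
# whose roots are 1, 2, 3, 4.  Comparing coefficients in
#   (x^2 + a1*x/2 + h)^2 - f(x) = (m*x + b)^2
# gives m^2 = a1^2/4 + 2h - a2, 2mb = a1*h - a3, b^2 = h^2 - a4;
# eliminating m and b yields the resolvent cubic for h.
a1, a2, a3, a4 = -10.0, 35.0, -50.0, 24.0

def resolvent(h):
    return (a1*h - a3)**2 - 4 * (a1**2/4 + 2*h - a2) * (h**2 - a4)

h = 7.0                              # a root of the resolvent cubic
assert abs(resolvent(h)) < 1e-9
m = sqrt(a1**2/4 + 2*h - a2)         # m = 2
b = (a1*h - a3) / (2*m)              # b = -5
# The quartic splits into the two quadratics x^2 + a1*x/2 + h = +-(m*x + b):
roots = []
for sign in (1, -1):
    p = a1/2 - sign * m              # each side gives x^2 + p*x + q = 0
    q = h - sign * b
    d = sqrt(p*p - 4*q)
    roots += [(-p + d) / 2, (-p - d) / 2]
print(sorted(roots))                 # [1.0, 2.0, 3.0, 4.0]
```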
There is a vast literature on cubic and biquadratic equations. Little of it is to-day of any vital mathematical consequence, and the most of it is of no practical value. Nevertheless, the incessant activity of nearly three centuries reflected in this accumulation of algebraic lore was not wholly futile, for without at least some of it, the sure clue to the maze could probably not have been discovered. To appreciate the true magnitude of this early work, the modern algebraist should attempt to restore it for himself, with but such tools and notations as its creators had. He will rise from his efforts with a new respect for his forbears.
A new era began with Lagrange in 1770. In an illuminating critique of his predecessors' solutions of the general cubic and biquadratic, he observed that the solution by radicals of any algebraic equation can be made to depend upon that of another, now called the resolvent, which may or may not be easier to solve. Thus, for equations of degree 5, the resolvent is of degree 6. The roots of a resolvent equation are rational functions of those of the original. By such considerations, and others arising naturally from them, Lagrange transposed the problem of solution by radicals to a profound study of rational functions of the roots of equations, and in particular to an investigation of the number of distinct values, which such functions of the roots, considered as independent variables, assume under permutations of the roots. This work contains a germ of the modern theory founded by Galois, in which the theory of substitution groups plays a central part.
An idea of Lagrange's attack can be gained from a brief reconsideration of the cubic x³ + px + q = 0, which will also prepare the way for the Galois theory. Let the roots be x₁, x₂, x₃. The discriminant of this equation is the square of (x₁-x₂)(x₁-x₃)(x₂-x₃).
Let ω be an imaginary root of x³ - 1 = 0, and write F = x₁ + ωx₂ + ω²x₃, G = x₁ + ω²x₂ + ωx₃, D = (x₁-x₂)(x₁-x₃)(x₂-x₃). The three functions F³, G³, D have the significant property that any substitution on x₁, x₂, x₃ which leaves one of these functions unchanged, leaves also the other two unchanged; for example, the cyclic substitution replacing x₁, x₂, x₃ by x₂, x₃, x₁ leaves all three unchanged. The totality of substitutions on n independent variables which leave a given function of those variables unchanged, form a group; the function is said to belong to the group. Lagrange proved that, if several functions belong to the same group, any one of them is a rational function of each of the others. Hence each of F³, G³ is a rational function of D, and therefore each is rational in the square root of a polynomial in p, q, since D², being the discriminant, is rational and integral in the coefficients of the equation. The coefficient of x² being zero in x³ + px + q = 0, we have x₁ + x₂ + x₃ = 0, which, with the values of F, G just indicated, gives three equations of degree 1 to solve for x₁, x₂, x₃ in terms of algebraic functions of p, q. Since the determinant of the set does not vanish, a solution exists, and it is thus known a priori that the general cubic is solvable by radicals. When elaborated, this method yields the roots explicitly. The biquadratic is treated in a similar manner; the general equation of degree 5 cannot be so solved, for a reason that will appear in the concluding sections.
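The two properties on which the argument turns, that FG and F³ + G³ are rational in p, q (in fact FG = -3p and F³ + G³ = -27q), can be verified numerically; the sketch below uses the illustrative cubic x³ - 7x + 6 = 0 with roots 1, 2, -3:

```python
import cmath

# Lagrange's resolvents for x^3 + px + q = 0, with roots x1, x2, x3 and
# w an imaginary cube root of 1:
#   F = x1 + w*x2 + w^2*x3,   G = x1 + w^2*x2 + w*x3.
# Then F*G = -3p and F^3 + G^3 = -27q, so F^3, G^3 satisfy a quadratic
# with rational coefficients, and the cubic is solvable by radicals.
x1, x2, x3 = 1, 2, -3                  # roots of x^3 - 7x + 6 = 0 (p=-7, q=6)
p, q = -7, 6
w = cmath.exp(2j * cmath.pi / 3)
F = x1 + w * x2 + w**2 * x3
G = x1 + w**2 * x2 + w * x3
print(abs(F * G - (-3 * p)))           # ~0: F*G = 21
print(abs(F**3 + G**3 - (-27 * q)))    # ~0: F^3 + G^3 = -162
print(abs((F + G) / 3 - x1))           # ~0: x1 recovered as (F + G)/3
```

The last line shows how the roots are recovered: x₁ + x₂ + x₃ = 0, F and G give three linear equations in x₁, x₂, x₃.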
As Lagrange's great work was but a preliminary to that of the modern school, further discussion of it may be omitted. Contemporaneously with Lagrange, an Italian physician, Paolo Ruffini, began issuing in 1799 a series of memoirs containing incidentally theorems which would now be restated in terms of substitution groups on five letters, in an attempt to prove the general equation of the fifth degree algebraically unsolvable. He almost succeeded, but his projected proof, like Abel's, is incomplete, and the whole matter is to-day surveyed from the higher point of view of Galois. Before passing to this we examine a general process of which several applications have occurred in what preceded.
Transformations.-The importance of transforming a given equation into another, whose roots are particular functions of those of the original, was seen in the preceding sketches of the solutions, numerical or algebraic, of equations. Thus, an important detail of Horner's method was equivalent to the linear transformation x = y+k; the roots of the y-equation were those of the x-equation each diminished by k. This is one of the simplest examples of a more general transformation introduced in 1683 by Ehrenfried Walther Tschirnhausen (or Tschirnhaus), who attempted to solve all equations by reducing them to the form yⁿ = A, an impossible project. Let x₁, …, xₙ be the roots of f(x) = 0. On dividing any polynomial P in a given root, say x₁, by f(x₁), the polynomial is reduced to another of degree n-1 at most in x₁. Tschirnhaus transformations are of the type y = P(x)/Q(x), where P, Q are polynomials of degree < n, and Q(x) vanishes for no root of f(x) = 0; Q(x) may reduce to a numerical constant, in particular to 1. This transforms f(x) = 0 into an equation in y whose roots are P(xⱼ)/Q(xⱼ) (j = 1, 2, …, n). The y-equation can be obtained by elimination; the details usually are tedious.
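A small worked case shows the elimination in action. For the transformation y = x² applied to a cubic f(x) = 0, the y-equation can be obtained from f(√y)·f(-√y) = 0, which is a polynomial in y; the sketch below expands that product once for all in terms of the cubic's coefficients and checks it on a cubic with known roots:

```python
# Tschirnhaus transformation y = x^2 applied to x^3 + a1*x^2 + a2*x + a3 = 0.
# Writing f(x) = x(x^2 + a2) + (a1*x^2 + a3), the product f(sqrt(y))*f(-sqrt(y))
# equals (a1*y + a3)^2 - y*(y + a2)^2; made monic in y, this gives:
def squares_equation(a1, a2, a3):
    """Return (b1, b2, b3) with y^3 + b1*y^2 + b2*y + b3 = 0 satisfied
    by the squares y = x^2 of the roots of the given cubic."""
    b1 = 2 * a2 - a1 * a1
    b2 = a2 * a2 - 2 * a1 * a3
    b3 = -a3 * a3
    return b1, b2, b3

# Roots 1, 2, 3 of x^3 - 6x^2 + 11x - 6 = 0; their squares 1, 4, 9 should
# satisfy the transformed equation.
b1, b2, b3 = squares_equation(-6.0, 11.0, -6.0)
for y in (1.0, 4.0, 9.0):
    assert abs(y**3 + b1 * y**2 + b2 * y + b3) < 1e-9
print(b1, b2, b3)   # -14.0 49.0 -36.0, i.e. (y-1)(y-4)(y-9) = 0
```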
The use of such transformations is evident from the following specimens. By a transformation y = x² + px + q, where p, q do not involve x, the general cubic is reducible to y³ = A, where A depends only on p, q and the coefficients of the cubic. More generally, by a linear transformation the general equation of degree n can be reduced to an equation of degree n in y lacking the term in yⁿ⁻¹, or, by a Tschirnhaus transformation whose coefficients involve only 1 square root, to one lacking the terms in yⁿ⁻¹, yⁿ⁻². Or again, by a Tschirnhaus transformation whose coefficients involve only 1 cube root and 3 square roots, the general equation of degree n is reducible to an equation in y of degree n lacking the terms in yⁿ⁻¹, yⁿ⁻², yⁿ⁻³. The last is of capital importance for the general equation of degree 5, which is thus reducible to y⁵ + py + q = 0, a result obtained by E. S. Bring about 1786, and independently by G. B. Jerrard in 1827. This is one point of departure for the solution in terms of elliptic functions, as a similar equation appears naturally in the construction of elliptic functions whose periods are fifths of those of given functions.
The degree g of this resolvent equation does not exceed n!; its g roots can be derived from any one of them by the substitutions of a group G, the so-called group of f(x) = 0 for the field of its coefficients. Every rational function of the roots which is unchanged by all the substitutions of G is rationally known; every rationally known rational function of the roots of f(x) = 0 is unchanged by all the substitutions of G; moreover, G is the smallest group having the first property, and the largest having the second. The group of the general equation of degree n is the symmetric group on the roots. If to the field of the coefficients of f(x) = 0 there be adjoined a rational function of the roots, giving an enlarged field which contains the original, and if this function belongs to a subgroup of G, the group of the equation is reduced to the subgroup.
Solvability by Radicals.-A group is simple if its only invariant subgroups are itself and the identity; non-simple groups are called composite. A subgroup of G other than G is called proper; an invariant proper subgroup not a subgroup of a larger invariant proper subgroup is called maximal. Let H be a maximal invariant proper subgroup of any group G; let K be a maximal invariant proper subgroup of H, and so on, till the identity group I is reached. Then G, H, K, …, L, I is called a series of composition of G. Let the respective orders of these groups be g, h, k, …, l, 1. Then g/h, h/k, …, l are integers. They are the same, except for order, for all series of composition for G, and are called the factors of composition of G, which is called solvable if and only if its factors of composition are all primes.
The crown of Galois's theory is the beautiful theorem that an algebraic equation is solvable by radicals if and only if its group for the field of its coefficients is solvable. By the known properties of the symmetric groups on n letters, it follows at once that when n> 4, the general equation of degree n is not solvable by radicals.
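The group-theoretic criterion can be checked directly with a computer algebra system; the sketch below assumes the sympy library is available and uses its permutation-group machinery:

```python
from sympy.combinatorics.named_groups import SymmetricGroup

# The group of the general equation of degree n is the symmetric group on
# n letters; by Galois's criterion the equation is solvable by radicals
# exactly when this group is solvable.
results = {n: SymmetricGroup(n).is_solvable for n in range(2, 6)}
print(results)   # degrees 2, 3, 4 solvable (the classical formulas); 5 not
```

For n = 4 the factors of composition of the symmetric group are 2, 3, 2, 2, all primes; for n = 5 the alternating group of order 60 is simple and non-prime in order, which is precisely what blocks a solution by radicals.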