FREQUENCY-DISTRIBUTION OF ERRORS

14. Frequency-distribution and Frequency-polygon. Suppose that, when an event happens, it may happen in either of two ways, which, in G. U. Yule's notation, we will call $A$ and $a$; so that $a$ is an abbreviation for "not-$A$." Let the event happen $n$ times, and let the result be $A$ in $m$ cases and $a$ in $n-m$ cases. Then this distribution of the $n$ cases as $m$ $A$'s and $n-m$ $a$'s is called a frequency-distribution. More generally, suppose that the event may be classed under the $k+1$ heads $A_0, A_1, A_2, \ldots, A_k$, and that, out of the total $n$, the numbers coming under these $k+1$ heads are respectively $m_0, m_1, m_2, \ldots, m_k$; then the distribution of the $n$ in this way is called a frequency-distribution. The number of heads is taken to be $k+1$ because the sum of the numbers $m_0, m_1, \ldots, m_k$ is necessarily $n$, so that if we know $k$ of the $k+1$ numbers we know the remaining one. The numbers $m_0, m_1, \ldots$ are the frequencies of $A_0, A_1, \ldots$; their ratios to $n$, i.e., the ratios $m_0/n, m_1/n, \ldots$, are the relative frequencies (sec. 4).
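The definitions of sec. 14 may be illustrated by a short computation. The following is only a sketch: the list of outcomes is hypothetical data invented for the illustration, and the function names `frequency_distribution` and `relative_frequencies` are not taken from the text.

```python
from collections import Counter

# Hypothetical record of n = 10 happenings of an event, each classed
# under one of the heads A0, A1, A2 (illustrative data only).
outcomes = ["A0", "A1", "A0", "A2", "A1", "A0", "A0", "A2", "A1", "A0"]


def frequency_distribution(outcomes):
    """Return {head: m_i}, the number of cases falling under each head."""
    return dict(Counter(outcomes))


def relative_frequencies(freqs):
    """Return {head: m_i / n}, where n is the total number of cases."""
    n = sum(freqs.values())
    return {head: m / n for head, m in freqs.items()}


freqs = frequency_distribution(outcomes)
rel = relative_frequencies(freqs)
print(freqs)
print(rel)
```

Since the frequencies necessarily sum to n, any one of them could be recovered from the others, which is the reason given above for counting the heads as k+1.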
15. Representative Distribution. Let the probability of occurrence of $A_0$ considered alone, i.e., its probability when the division of cases is into $A_0$ and not-$A_0$, be $P_0$. Then, by the definition of probability, we should expect that in a very large number $N$ of trials the number of cases of $A_0$ would be about $NP_0$. Provisionally (for reasons which will be seen later, sec. 18) we might similarly expect that when the smaller number $n$ of trials is made the number of cases of $A_0$ would be about $nP_0$; and similarly for $A_1, A_2, \ldots$ with probabilities $P_1, P_2, \ldots$ The frequency-distribution which gives the numbers under the heads $A_0, A_1, A_2, \ldots$ as $nP_0, nP_1, nP_2, \ldots$ is called a representative distribution. The definition applies whether these numbers are integers or not. The differences between these numbers and the
actual numbers $m_0, m_1, m_2, \ldots$ which occur when $n$ individuals are taken at random are the errors, or errors of random sampling, of $m_0, m_1, m_2, \ldots$ We might have defined a representative distribution as a distribution in which the numbers under the different heads are proportional to the numbers which would "in the long run" come under these heads.
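A representative distribution and the corresponding errors of random sampling may be computed as in the sketch below. The probabilities and observed frequencies are hypothetical values chosen for the illustration, and the sign convention (actual number minus representative number) is an assumption, since the text speaks only of "differences."

```python
def representative_distribution(n, probs):
    """Return the numbers n*P_0, n*P_1, ... (they need not be integers)."""
    return [n * p for p in probs]


def sampling_errors(observed, probs):
    """Return the differences m_i - n*P_i for each head.

    Assumed sign convention: actual number minus representative number.
    """
    n = sum(observed)
    representative = representative_distribution(n, probs)
    return [m - r for m, r in zip(observed, representative)]


# Hypothetical probabilities P_0, P_1, P_2 and observed frequencies
# m_0, m_1, m_2 for a sample of n = 50 (illustrative values only).
probs = [0.5, 0.3, 0.2]
observed = [27, 14, 9]

representative = representative_distribution(sum(observed), probs)
errors = sampling_errors(observed, probs)
print(representative)
print(errors)
```

Because the observed frequencies and the representative numbers both sum to n, the errors so defined sum to zero, apart from floating-point rounding.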
16. Law of Frequency of Error (Simplest Case). Denoting, as before, "not-$A$" by $a$, suppose that the probability of $A$ is $p$, and that that of $a$ is $q = 1 - p$. Then as the result of $n$ trials the number of $A$'s might be any number from $n$ to $0$, the number of $a$'s being the remainder out of the $n$. What are the respective probabilities of these different numbers? Consider the probability of $m$ $A$'s and $n-m$ $a$'s. For a simple example, take $n=6$, $m=4$. Then the probability of 4 $A$'s and 2 $a$'s occurring in the order $AAAAaa$ is $ppppqq = p^4q^2$, and the probability of their occurring in any other specified order is similarly $p^4q^2$. But there are ${}_6C_4$ orders in which they may occur; ${}_nC_m$ denoting the number of combinations of $n$ things $m$ together. Hence the total probability of 4 $A$'s and 2 $a$'s is ${}_6C_4\,p^4q^2$. Similarly the probability of $m$ $A$'s and $n-m$ $a$'s is ${}_nC_m\,p^mq^{n-m}$. Taking the values of $m$ from $n$ to $0$, we see that the probabilities of $n, n-1, n-2, \ldots, 0$ $A$'s (and $0, 1, 2, \ldots, n$ $a$'s) are the successive terms in the expansion

$$(p+q)^n = p^n + {}_nC_{n-1}\,p^{n-1}q + {}_nC_{n-2}\,p^{n-2}q^2 + \ldots + q^n. \tag{16.1}$$

If ${}_nK_m$ is the probability that in $n$ trials there will be $m$ $A$'s and $n-m$ $a$'s, then

$${}_nK_m = {}_nC_m\,p^mq^{n-m}. \tag{16.2}$$

In a representative distribution of $N$ sets of $n$ trials the numbers of cases in the $n+1$ categories ($n$ $A$'s; $n-1$ $A$'s and 1 $a$; $\ldots$) would be found by multiplying the terms in (16.1) by $N$; i.e., the number of cases of $m$ $A$'s and $n-m$ $a$'s would be $N \cdot {}_nK_m$. Suppose, for example, that $p = 0.6$, $q = 0.4$, $n = 6$. Then it will be found that in a representative distribution of 1,000,000 cases the numbers of cases in which there are 6, 5, 4, 3, 2, 1, 0 $A$'s would be 46656, 186624, 311040, 276480, 138240, 36864, 4096. The respective probabilities (relative frequencies in a representative distribution) are the ratios of these numbers to 1,000,000.

A distribution of the above kind, showing the frequencies (theoretical or actual) with which an event happens $n, n-1, n-2, \ldots$ times out of $n$, is called a binomial distribution.
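The numerical example of sec. 16 (p = 0.6, n = 6, N = 1,000,000) may be checked with a short computation. This is a sketch only: the function names `binomial_probability` and `representative_counts` are introduced here for illustration, and the results are rounded to the nearest integer to absorb floating-point error.

```python
from math import comb


def binomial_probability(n, m, p):
    """Probability of m A's and n-m a's in n trials: C(n, m) * p^m * q^(n-m)."""
    q = 1 - p
    return comb(n, m) * p**m * q ** (n - m)


def representative_counts(N, n, p):
    """Numbers of cases with n, n-1, ..., 0 A's among N sets of n trials."""
    return [N * binomial_probability(n, m, p) for m in range(n, -1, -1)]


counts = representative_counts(1_000_000, 6, 0.6)
for m, c in zip(range(6, -1, -1), counts):
    # Round to the nearest integer; here the exact values are whole numbers.
    print(f"{m} A's: {round(c)}")
```

The probabilities for m = 0, 1, ..., n sum to (p+q)^n = 1, so the seven counts together account for all 1,000,000 cases.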