Stations of the Cross

Groups That Are Not Homogeneous.

A case that may be mentioned is that in which the measurements obtained relate, not to a homogeneous group, but to a mixture of two or more groups. The manner in which such a mixture might affect the characteristics of a distribution may be illustrated from the figures given below for the heights of a particular group of men. Suppose two such groups were found differing in the heights of the individuals composing them, but in such a way that the numbers in one of them whose heights were between any two specified limits were exactly twice those for limits an inch higher in the other group. We might have, for example:

The numbers in the last row represent a compound of two groups as numerous as the original group, the individuals of each of which were half an inch shorter than those of the original group, and another group, equal in number to the original group, the individuals of which were half an inch taller than those of the original group. If the distribution of the original group were such as could be precisely represented by an appropriate theoretical formula, the compound group would not be, in general, represented by the same formula. In some instances, it has been found possible to show that an observed group was so distributed as to be consistent with the assumption that it was made up of two groups of different characteristics, the members of which were included within one and the same series of observations. Such an analysis can clearly be of considerable importance with reference to the deductions to be drawn from the examination of the observed data.
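The effect of such a mixture can be checked numerically. The following is a minimal sketch in Python, assuming purely for illustration that the original group follows a normal curve with a mean of 67 inches and a standard deviation of 2.5 inches; neither figure comes from the text, and the names `MEAN`, `SD`, `compound_pdf` and `central_moments` are hypothetical. The compound is built from two groups half an inch shorter and one group half an inch taller, each as numerous as the original.

```python
import math

# Assumed parameters of the original group (illustrative only, not from the text).
MEAN = 67.0  # inches
SD = 2.5     # inches


def normal_pdf(x, mean, sd):
    """Density of the normal curve with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def compound_pdf(x):
    """Two groups half an inch shorter plus one group half an inch taller."""
    shorter = normal_pdf(x, MEAN - 0.5, SD)
    taller = normal_pdf(x, MEAN + 0.5, SD)
    return (2 * shorter + taller) / 3


def central_moments(pdf, lo=40.0, hi=95.0, steps=20000):
    """Mean, variance and third central moment by the midpoint rule."""
    dx = (hi - lo) / steps
    xs = [lo + (i + 0.5) * dx for i in range(steps)]
    ws = [pdf(x) * dx for x in xs]
    total = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws)) / total
    m2 = sum((x - mean) ** 2 * w for x, w in zip(xs, ws)) / total
    m3 = sum((x - mean) ** 3 * w for x, w in zip(xs, ws)) / total
    return mean, m2, m3


mean, variance, third_moment = central_moments(compound_pdf)
skewness = third_moment / variance ** 1.5

print(f"mean     = {mean:.4f}")
print(f"variance = {variance:.4f}")
print(f"skewness = {skewness:.5f}")
```

Under these assumptions the compound has a small but non-zero skewness, whereas a normal curve has none, so no normal curve reproduces the compound exactly. With shifts of only half an inch the departure is slight, which suggests why such mixtures can be hard to detect in observed data.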

Three Variable Elements.

In what precedes, the statistical material considered has consisted of series of pairs of quantities, the values of the two quantities in each pair being related; for example, where one of the quantities is the height of a man, the other the relative frequency with which men of that height were observed. We pass now to the consideration of series of three quantities, for example, where two of the quantities express the measures of different phenomena (which may be connected in some way, or may be independent) and the third expresses the relative frequency with which any combination of the other two occurs. Such a series is represented in the table given below, which shows the proportions in which, over a period of years, in a particular country, marriages in which both bride and bridegroom had been married previously were distributed according to the ages of the brides and of the bridegrooms. The particulars might advantageously be set out in fuller detail, but the condensed table will serve quite well to illustrate the nature of the problems connected with the analysis of data of this kind.

The table is to be read as follows: Of each thousand women married for the second (or later) time to men who had been married before, 447 were between 25 and 35 years old, and of these 20 married men under 25 years of age, 175 married men between 25 and 35, 167 married men between 35 and 45 and 17 men between 45 and 55. The other rows of figures will be read similarly, and the columns are to be read in corresponding manner, with the words "husband" and "wife" interchanged.
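The reading of a single row can be set out as a short calculation. The sketch below uses only the figures quoted in the text for wives aged 25 to 35; the variable names are hypothetical. The four quoted counts do not add up to the row total of 447, and treating the balance as belonging to husbands' age groups that the sentence does not itemise is an assumption.

```python
# Figures quoted in the text for wives aged 25 to 35 (per thousand remarriages).
row_total = 447
husbands_by_age = {
    "under 25": 20,
    "25-35": 175,
    "35-45": 167,
    "45-55": 17,
}

# Part of the row accounted for by the four quoted age groups.
itemised = sum(husbands_by_age.values())

# Balance of the row; ASSUMED to fall in age groups the text does not itemise.
not_itemised = row_total - itemised

# Share of these wives whose husbands fall in each quoted age group.
shares = {age: count / row_total for age, count in husbands_by_age.items()}

for age, share in shares.items():
    print(f"husband {age:>8}: {share:.3f}")
print(f"not itemised in the text: {not_itemised} of {row_total}")
```

Dividing each count by the row total gives the distribution of husbands' ages among wives of this one age group, which is the sense in which each row is a distribution in its own right.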

The columns, and also the rows, representing the age distributions of particular sections of the wives and of the husbands, are examples of skew distribution of numbers. It is clear, from the clustering of numbers in the neighbourhood of a band crossing the table from its upper left side towards its lower right side, that there is some association between the ages of the wives and the husbands, which are more often relatively high together, or relatively low together, than in other relations. The average age of the husbands is near 41 years and that of the wives somewhat in excess of 34 years.

Correlation.

If the average ages of the husbands shown in each row of the table be computed, and also those of the wives shown in each column, the two series of figures thus obtained show the variation of the average ages of corresponding groups of husbands and wives. They furnish the measures of what are known as the regression of the ages of husbands on those of wives, and the regression of the ages of wives on those of husbands. The two regressions would be complementary if the distribution represented complete linear correlation of husbands' ages and wives' ages. When this complete correlation is not shown, a measure of the degree in which the distribution diverges from that corresponding to complete correlation is afforded by the so-called coefficient of correlation. Between two series of quantities $x_1, x_2, x_3, \ldots$ and $y_1, y_2, y_3, \ldots$, if the sums of the squares of each series, $S(x^2)$ and $S(y^2)$, be calculated, and the sum of the pairs of products $x_1 y_1, x_2 y_2, x_3 y_3, \ldots$, i.e., $S(x \cdot y)$, be also calculated, the fraction $S(x \cdot y)/\sqrt{S(x^2)\,S(y^2)}$ gives the coefficient of correlation.
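The two regressions and the coefficient can be sketched in Python as follows. The frequency table here is invented for illustration and is not the table referred to in the text; the age values stand for assumed midpoints of the age groups. The quantities x and y are taken as deviations from the averages of their own series, which is the usual convention but is not stated in the text, so it is an assumption of this sketch.

```python
import math

# HYPOTHETICAL frequency table (not the source's figures).
# Rows: wives' age groups; columns: husbands' age groups.
wife_ages = [20, 30, 40, 50]      # assumed midpoints of the row groups
husband_ages = [20, 30, 40, 50]   # assumed midpoints of the column groups
table = [
    [30, 40, 10, 0],
    [20, 150, 120, 20],
    [5, 60, 200, 90],
    [0, 10, 80, 165],
]


def weighted_mean(values, weights):
    """Average of values, each counted as often as its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Regression of husbands' ages on wives': average husband's age in each row.
husband_mean_by_wife_age = [weighted_mean(husband_ages, row) for row in table]

# Regression of wives' ages on husbands': average wife's age in each column.
columns = [list(col) for col in zip(*table)]
wife_mean_by_husband_age = [weighted_mean(wife_ages, col) for col in columns]


def correlation_from_table(row_values, col_values, freq):
    """S(xy) / sqrt(S(x^2) S(y^2)), with x and y taken as deviations from means."""
    row_totals = [sum(row) for row in freq]
    col_totals = [sum(col) for col in zip(*freq)]
    mean_row = weighted_mean(row_values, row_totals)
    mean_col = weighted_mean(col_values, col_totals)
    s_xy = s_xx = s_yy = 0.0
    for i, row in enumerate(freq):
        x = row_values[i] - mean_row
        for j, count in enumerate(row):
            y = col_values[j] - mean_col
            s_xy += count * x * y
            s_xx += count * x * x
            s_yy += count * y * y
    return s_xy / math.sqrt(s_xx * s_yy)


r = correlation_from_table(wife_ages, husband_ages, table)
print("husbands' average age by wife's age:", husband_mean_by_wife_age)
print("wives' average age by husband's age:", wife_mean_by_husband_age)
print("coefficient of correlation:", round(r, 3))
```

In this invented table both series of averages rise with the age of the other partner, and the coefficient falls between 0 and 1, as would be expected when the entries cluster along the diagonal band without lying exactly upon it.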

In the case represented by the final row and the final column of the last table, the calculation gives a fraction exceeding .9. The highest possible value for the coefficient is 1. The coefficient is a measure of the divergence between the two lines of regression in cases like that illustrated, those two lines coalescing when the correlation is complete. If the series compared are such that decreases in the one correspond to increases in the other, the calculated coefficient is negative. The range of values for the coefficient lies between 0 and 1 for positive correlations and between 0 and -1 for negative correlations. Small values of the coefficient mean that the connection between the series is slight, and, in view of the extent to which, in actual observations, the records obtained are affected by disturbances having no relation to the matter under examination, no great significance can generally be attached to the occurrence of correlation coefficients of small magnitude.
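The limits of the coefficient can be illustrated with three short series. This is a minimal sketch with invented figures; the function name `correlation` is hypothetical, and the quantities are again taken as deviations from their own averages, which is an assumption rather than something stated in the text.

```python
import math


def correlation(xs, ys):
    """S(xy) / sqrt(S(x^2) S(y^2)) on deviations from each series' average."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    s_xy = sum(a * b for a, b in zip(dx, dy))
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 3 for x in xs]    # increases exactly in step with xs
falling = [10 - 3 * x for x in xs]  # decreases exactly as xs increases
loose = [3, 1, 4, 1, 5]             # invented series, only loosely related to xs

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_loose = correlation(xs, loose)

print("complete positive correlation:", round(r_rising, 6))
print("complete negative correlation:", round(r_falling, 6))
print("loose connection:", round(r_loose, 3))
```

The first two series reach the limits of the range, while the third gives an intermediate positive value. With only five pairs of observations, an intermediate value of this kind could easily arise from unrelated disturbances, in line with the caution expressed above.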
