
One-way analysis of variance, correlation table. Multivariate analysis of variance and structural equation modeling

The one-factor analysis of variance model has the form

$$x_{ij} = \mu + F_i + \varepsilon_{ij},$$

where $x_{ij}$ is the value of the variable under study obtained at the $i$-th level of the factor ($i = 1, 2, \ldots, m$) with serial number $j$ ($j = 1, 2, \ldots, n$); $\mu$ is the overall mean; $F_i$ is the effect due to the influence of the $i$-th level of the factor; $\varepsilon_{ij}$ is the random component, or perturbation, caused by the influence of uncontrollable factors, i.e. the variation of the variable within a single level.

A factor level is understood as some measure or state of the factor, for example the amount of fertilizer applied, the type of metal smelting, or the batch number of parts.

Basic prerequisites for analysis of variance.

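1. The mathematical expectation of the perturbation $\varepsilon_{ij}$ is zero for any $i$, i.e. $M(\varepsilon_{ij}) = 0$.

2. The perturbations $\varepsilon_{ij}$ are mutually independent.

3. The variance of the perturbation $\varepsilon_{ij}$ (or of the variable $x_{ij}$) is constant for any $i$, $j$, i.e. $D(\varepsilon_{ij}) = \sigma^2$.

4. The perturbation $\varepsilon_{ij}$ (or the variable $x_{ij}$) has a normal distribution law; for the perturbation it is $N(0; \sigma^2)$.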
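To make the prerequisites concrete, here is a minimal sketch that draws data satisfying them, assuming the model is written as $x_{ij} = \mu + F_i + \varepsilon_{ij}$ with $\mu$ an overall mean. The function name `simulate_one_way` and all numeric values are illustrative assumptions, not taken from the text.

```python
import random


def simulate_one_way(mu, effects, n, sigma, seed=0):
    """Draw x[i][j] = mu + effects[i] + eps, with independent eps ~ N(0, sigma^2).

    Every perturbation has zero mean and the same variance at every level,
    which is what prerequisites 1-4 require.
    """
    rng = random.Random(seed)
    return [[mu + f_i + rng.gauss(0.0, sigma) for _ in range(n)]
            for f_i in effects]


# Hypothetical setup: overall mean 5, three factor levels, six observations each.
data = simulate_one_way(mu=5.0, effects=[0.5, 0.0, -0.5], n=6, sigma=1.0)
for i, row in enumerate(data, start=1):
    print(f"level {i}: group mean = {sum(row) / len(row):.2f}")
```

With `sigma=0.0` every observation collapses to `mu + effects[i]`, which shows that all within-level variation in this model comes from the perturbation term.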

The influence of factor levels can be either fixed, or systematic (model I), or random (model II).

Suppose, for example, that we need to find out whether there are significant differences between batches of products in terms of some quality indicator, i.e. to check the impact of a single factor, the batch of products, on quality. If all batches of raw materials are included in the study, the influence of the level of such a factor is systematic (model I), and the findings apply only to those individual batches that were involved in the study; if only a randomly selected part of the batches is included, the influence of the factor is random (model II). In multifactor designs, a mixed model III is possible, in which some factors have random levels while others are fixed.

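Let us consider this problem in more detail. Let there be $m$ batches of products. From each batch, $n_1, n_2, \ldots, n_m$ products are selected, respectively (for simplicity, we assume that $n_1 = n_2 = \ldots = n_m = n$). We represent the values of the quality index of these products as a matrix of observations

$$\begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \ldots & \ldots & \ldots & \ldots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \end{pmatrix} = (x_{ij}), \quad i = 1, 2, \ldots, m; \; j = 1, 2, \ldots, n.$$
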
It is necessary to check the significance of the influence of batches of products on their quality.

If we assume that the row elements of the observation matrix are numerical values (realizations) of random variables $X_1, X_2, \ldots, X_m$, which express the quality of the products and have a normal distribution law with mathematical expectations $a_1, a_2, \ldots, a_m$, respectively, and equal variances $\sigma^2$, then the task comes down to testing the null hypothesis $H_0: a_1 = a_2 = \ldots = a_m$, which is carried out in the analysis of variance.

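Let us denote averaging over an index by an asterisk (or a dot) in place of that index. Then the average quality of products of the $i$-th batch, or the group mean for the $i$-th level of the factor, takes the form

$$\bar{x}_{i*} = \frac{1}{n} \sum_{j=1}^{n} x_{ij},$$

and the overall mean is

$$\bar{x}_{**} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} = \frac{1}{m} \sum_{i=1}^{m} \bar{x}_{i*}.$$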

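Consider the sum of the squared deviations of the observations $x_{ij}$ from the overall mean $\bar{x}_{**}$:

$$Q = \sum_{i=1}^{m} \sum_{j=1}^{n} (x_{ij} - \bar{x}_{**})^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ (\bar{x}_{i*} - \bar{x}_{**}) + (x_{ij} - \bar{x}_{i*}) \right]^2$$

$$= \sum_{i=1}^{m} \sum_{j=1}^{n} (\bar{x}_{i*} - \bar{x}_{**})^2 + \sum_{i=1}^{m} \sum_{j=1}^{n} (x_{ij} - \bar{x}_{i*})^2 + 2 \sum_{i=1}^{m} \sum_{j=1}^{n} (\bar{x}_{i*} - \bar{x}_{**})(x_{ij} - \bar{x}_{i*}),$$

or $Q = Q_1 + Q_2 + Q_3$. The last term

$$Q_3 = 2 \sum_{i=1}^{m} (\bar{x}_{i*} - \bar{x}_{**}) \sum_{j=1}^{n} (x_{ij} - \bar{x}_{i*}) = 0,$$

since the sum of the deviations of the values of a variable from its mean, i.e. $\sum_{j=1}^{n} (x_{ij} - \bar{x}_{i*})$, is equal to zero.

The first term can be written as

$$Q_1 = n \sum_{i=1}^{m} (\bar{x}_{i*} - \bar{x}_{**})^2.$$

As a result, we get the following identity:

$$Q = Q_1 + Q_2,$$

where $Q = \sum_{i=1}^{m} \sum_{j=1}^{n} (x_{ij} - \bar{x}_{**})^2$ is the general, or complete, sum of squared deviations; $Q_1 = n \sum_{i=1}^{m} (\bar{x}_{i*} - \bar{x}_{**})^2$ is the sum of squared deviations of the group means from the overall mean, or the between-group (factor) sum of squared deviations; $Q_2 = \sum_{i=1}^{m} \sum_{j=1}^{n} (x_{ij} - \bar{x}_{i*})^2$ is the sum of squared deviations of the observations from their group means, or the within-group (residual) sum of squared deviations.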
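The decomposition of the total sum of squares can be checked numerically. The sketch below computes the total sum $Q$, the between-group sum $Q_1$ and the within-group sum $Q_2$ for a small observation matrix with equal group sizes and confirms that $Q = Q_1 + Q_2$. The function name and the data are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def sums_of_squares(x):
    """Return (Q, Q1, Q2) for an m-by-n observation matrix with equal group sizes."""
    m, n = len(x), len(x[0])
    group_means = [sum(row) / n for row in x]          # mean of each row (factor level)
    overall_mean = sum(group_means) / m                # valid because all rows have n items
    q_total = sum((v - overall_mean) ** 2 for row in x for v in row)
    q_between = n * sum((g - overall_mean) ** 2 for g in group_means)
    q_within = sum((v - g) ** 2 for row, g in zip(x, group_means) for v in row)
    return q_total, q_between, q_within


# Hypothetical data: m = 3 batches (rows), n = 4 items per batch.
x = [
    [5.0, 6.0, 7.0, 6.0],
    [4.0, 5.0, 5.0, 6.0],
    [3.0, 4.0, 4.0, 5.0],
]
q, q1, q2 = sums_of_squares(x)
print(f"Q = {q}, Q1 = {q1}, Q2 = {q2}")
assert abs(q - (q1 + q2)) < 1e-9   # the cross term vanishes
```

For these numbers the group means are 6, 5 and 4 and the overall mean is 5, so the between-group sum is 8, the within-group sum is 6, and the total is 14.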

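The significance of the factor's influence is judged by Fisher's F-criterion, the ratio of the between-group variance estimate to the within-group one; under the null hypothesis it follows an F-distribution with $k - 1$ and $k(n - 1)$ degrees of freedom, where $k$ is the number of factor levels and $n$ is the number of observations at each level.

Example 6.1. Test the assumption that the speed at which words are presented affects how well they are reproduced (data in the table in Fig. 8.1). Solution sequence: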

o Formulation of hypotheses.

$H_0$: the influence of the speed factor is no more pronounced than random; $H_1$: the influence of the speed factor is more pronounced than random.

o Checking assumptions: the parameter under study is normally distributed; the samples are independent and of equal size; measurements are on a ratio scale.

o Definition of the empirical criterion. $F_{emp}$ is based on comparing the sums of the squared column totals with the sum of the squares of all empirical values. Each column represents a sample and corresponds to a certain gradation of the speed factor.

o Designations introduced:

n = 6 - number of observations (rows);

k = 3 - number of gradations of the factor (columns);

nk = 6 · 3 = 18 - total number of individual values;

j - row index, which runs from 1 to n (j = 1, 2, ..., n);

i - column index, which runs from 1 to k (i = 1, 2, ..., k).

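o Mathematical calculations (see Figs. 6.1 and 6.2). In this example $Q_1$, $Q_2$, $Q_3$ denote auxiliary computational sums, distinct from the terms of the identity derived earlier:

$$Q_1 = \sum_{i=1}^{k} \sum_{j=1}^{n} x_{ij}^2, \qquad Q_2 = \frac{1}{n} \sum_{i=1}^{k} \Big( \sum_{j=1}^{n} x_{ij} \Big)^2, \qquad Q_3 = \frac{1}{kn} \Big( \sum_{i=1}^{k} \sum_{j=1}^{n} x_{ij} \Big)^2.$$

Here $Q_1 = 6^2 + 7^2 + 6^2 + 5^2 + \ldots + 5^2 + 5^2 = 432$; $Q_2 = \frac{1}{6}(34^2 + 29^2 + 23^2) = 421$; $Q_3 = \frac{1}{3 \cdot 6}(34 + 29 + 23)^2 = 410.89$.

The empirical criterion is then

$$F_{emp} = \frac{(Q_2 - Q_3)/(k - 1)}{(Q_1 - Q_2)/(k(n - 1))} = \frac{(421 - 410.89)/2}{(432 - 421)/15} \approx 6.89.$$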

Fig. 6.1. Results of analysis of variance

Fig. 6.2. Calculation formulas for one-way analysis of variance

o The critical value $F_{cr}$ can be obtained using the function RDISP() for the significance level $\alpha = 0.05$ (0.01) and the numbers of degrees of freedom $k - 1 = 3 - 1 = 2$ and $k(n - 1) = 3(6 - 1) = 15$: $F_{0.05} \approx 3.68$ and $F_{0.01} \approx 6.36$.

o Decision making. Since $F_{emp} > F_{0.01}$ (6.89 > 6.36), the null hypothesis $H_0$ is rejected at a significance level of 0.01.

o Formulation of conclusions. Differences in the volume of reproduced words between gradations of the speed factor are more pronounced than random. This relationship is shown graphically in Fig. 6.3.

Fig. 6.3. Dependence of the average volume of reproduced words on the speed of presentation
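The arithmetic of Example 6.1 can be retraced from the summary figures quoted in the text alone (sum of squares of all values 432, column totals 34, 29 and 23). The sketch below is an illustration, not the author's spreadsheet: it uses the computational sums $\sum\sum x_{ij}^2$, $\frac{1}{n}\sum_i(\sum_j x_{ij})^2$ and $\frac{1}{kn}(\sum\sum x_{ij})^2$ and forms the F ratio from their differences. The helper `f_critical_2df` is a hypothetical name; it relies on the closed-form survival function $(1 + 2x/d_2)^{-d_2/2}$, which holds only when the numerator has exactly 2 degrees of freedom, as it does here.

```python
n, k = 6, 3                     # observations per column, number of columns
column_sums = [34, 29, 23]      # column totals quoted in the text
sum_of_squares_all = 432        # sum of squares of all 18 values, quoted in the text

q1 = sum_of_squares_all
q2 = sum(t ** 2 for t in column_sums) / n
q3 = sum(column_sums) ** 2 / (n * k)

df_between, df_within = k - 1, k * (n - 1)
f_emp = ((q2 - q3) / df_between) / ((q1 - q2) / df_within)


def f_critical_2df(alpha, df2):
    """Upper-alpha critical value of F(2, df2).

    Closed form obtained by inverting P(F > x) = (1 + 2x/df2) ** (-df2/2);
    valid only for 2 numerator degrees of freedom.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


f_05 = f_critical_2df(0.05, df_within)
f_01 = f_critical_2df(0.01, df_within)
print(f"F_emp = {f_emp:.2f}, F_0.05 = {f_05:.2f}, F_0.01 = {f_01:.2f}")
print("reject H0 at 0.01" if f_emp > f_01 else "do not reject H0 at 0.01")
```

For a design with a different number of factor gradations the closed form no longer applies, and a statistical library or table would be needed for the critical value.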

Calculations for the one-factor model can be carried out using the "Data Analysis" package, section "One-factor analysis of variance" (Fig. 6.4).

Fig. 6.4. Package menu "Data analysis"

After entering the appropriate parameters (Fig. 6.5), you can get the results of one-way analysis of variance (Fig. 6.6).

Fig. 6.5. Dialog window

Fig. 6.6. Results of one-way analysis of variance (α = 0.05)

The computer package "Data Analysis" performs calculations of basic statistics (sums, averages, variances, the value of empirical and theoretical criteria, etc.), which gives the researcher grounds for statistical conclusions.
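As a rough illustration of the kind of summary such a package reports, the sketch below computes per-group sums, means and sample variances with Python's standard `statistics` module and derives the F ratio from them. It assumes equal group sizes, in which case the within-group mean square is the average of the group variances and the between-group mean square is $n$ times the variance of the group means. The function name and the data are hypothetical and do not reproduce the package's actual output.

```python
from statistics import mean, variance


def anova_summary(groups):
    """Per-group (sum, mean, sample variance) rows and the F ratio, equal group sizes."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]   # sample variances, divisor n - 1
    ms_within = mean(variances)                 # pooled within-group variance
    ms_between = n * variance(means)            # between-group variance estimate
    rows = [(sum(g), m, v) for g, m, v in zip(groups, means, variances)]
    return rows, ms_between / ms_within


# Hypothetical data: three groups of four observations each.
groups = [
    [5.0, 6.0, 7.0, 6.0],
    [4.0, 5.0, 5.0, 6.0],
    [3.0, 4.0, 4.0, 5.0],
]
rows, f_value = anova_summary(groups)
for total, avg, var in rows:
    print(f"sum = {total}, mean = {avg}, variance = {var:.3f}")
print(f"F = {f_value:.2f}")
```

The empirical F value obtained this way is then compared with the theoretical (critical) value for the chosen significance level, as in the decision step of the worked example.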