Biographies Characteristics Analysis

Find the matrix of sample pair correlation coefficients. Definition of multiple correlation coefficient in MS Excel

Z 1 (t)

Z 2 (t)

t

y(t)

Z 1 (t)

Z 2 (t)

t

y(t)

The main task when choosing the factors included in the correlation model is to include in the analysis all the main factors that affect the level of the phenomenon under study. However, it is not advisable to introduce a large number of factors into the model; it is more correct to select only a relatively small number of the main factors that are presumably in correlation with the selected functional indicator.

This can be done using the so-called two-stage selection. In accordance with it, all pre-selected factors are included in the model. Then among them, on the basis of a special quantitative assessment and an additional qualitative analysis, insignificantly influencing factors are identified, which are gradually discarded until there are those regarding which it can be argued that the available statistical material is consistent with the hypothesis of their joint significant effect on the dependent variable with the chosen form of connection.

Two-stage selection received its most complete expression in the method of the so-called multi-stage regression analysis, in which the elimination of insignificant factors occurs on the basis of their significance indicators, in particular, on the basis of the value of t f - the calculated value of the Student's criterion.

Calculate t f according to the found pair correlation coefficients and compare them with t critical for 5% significance level (two-sided) and 18 degrees of freedom (ν = n-2).

where r is the value of the pair correlation coefficient;

n – number of observations (n=20)

When comparing t f for each coefficient with t kr = 2,101 we obtain that the found coefficients are recognized as significant, since t f > t cr.

t f for r yx 1 = 2, 5599 ;

t f for r yx 2 = 7,064206 ;

t f for r yx 3 = 2,40218 ;

t f for r x1 x 2 = 4,338906 ;

t f for r x1 x 3 = 15,35065;

t f for r x2 x 3 = 4,749981

When selecting factors included in the analysis, they are subject to specific requirements. First of all, indicators expressing these factors must be quantifiable.

The factors included in the model should not be in a functional or close relationship with each other. The presence of such bonds is characterized by multicollinearity.

Multicollinearity indicates that some factors characterize the same side of the phenomenon under study. Therefore, their simultaneous inclusion in the model is impractical, since they duplicate each other to a certain extent. If there are no special assumptions speaking in favor of one of these factors, preference should be given to one of them, which is characterized by a large coefficient of pair (or partial) correlation.

It is believed that the limiting value is the value of the correlation coefficient between the two factors, equal to 0.8.

Multicollinearity usually leads to the degeneracy of the matrix of variables and, consequently, to the fact that the main determinant decreases its value and, in the limit, becomes close to zero. Estimates of the coefficients of the regression equation become highly dependent on the accuracy of finding the initial data and change their values ​​dramatically when the number of observations changes.

Task 2

1. Build a matrix of paired correlation coefficients. Check for multicollinearity. Justify the selection of factors in the model.

2. Build an equation multiple regression in linear form with selected factors.

3. Rate statistical significance regression equation and its parameters using Fisher's and Student's criteria.

4. Build a regression equation with statistically significant factors. Assess the quality of the regression equation using the coefficient of determination R 2 . Assess the accuracy of the constructed model.

5. Estimate the forecast for the volume of output, if the forecast values ​​of the factors are 75% of their maximum values.

Task conditions (Option 21)

According to the data presented in Table 1 (n = 17), we study the dependence of the volume of output Y (million rubles) on following factors(variables):

X 1 - the number of industrial and production personnel, people.

X 2 - the average annual cost of fixed assets, million rubles.

X 3 - depreciation of fixed assets,%

X 4 - electric power, kWh.

X 5 - technical equipment of one worker, million rubles.

X 6 - production of marketable products per worker, rub.

Table 1. Production data

Y x1 x2 x3 x4 x5 x6
39,5 4,9 3,2
46,4 60,5 20,4
43,7 24,9 9,5
35,7 50,4 34,7
41,8 5,1 17,9
49,8 35,9 12,1
44,1 48,1 18,9
48,1 69,5 12,2
47,6 31,9 8,1
58,6 139,4 29,7
70,4 16,9 5,3
37,5 17,8 5,6
62,0 27,6 12,3
34,4 13,9 3,2
35,4 37,3 19,0
40,8 55,3 19,3
48,1 35,1 12,4


Build a matrix of paired correlation coefficients. Check for multicollinearity. Justify the selection of factors in the model

Table 2 presents pair correlation coefficient matrix for all variables involved in the consideration. Matrix obtained using the tool Correlation from the package Data analysis in Excel.

Table 2. Matrix of pair correlation coefficients

Y X1 X2 X3 X4 X5 X6
Y
X1 0,995634
X2 0,996949 0,994947
X3 -0,25446 -0,27074 -0,26264
X4 0,12291 0,07251 0,107572 0,248622
X5 0,222946 0,166919 0,219914 -0,07573 0,671386
X6 0,067685 -0,00273 0,041955 -0,28755 0,366382 0,600899

Visual analysis of the matrix allows you to establish:

1) At has rather high pairwise correlations with variables X1, X2 (>0,5) and low with variables X3, X4, X5, X6 (<0,5);

2) The analysis variables X1, X2 demonstrate rather high pairwise correlations, which necessitates checking the factors for the presence of multicollinearity between them. Moreover, one of the conditions of the classical regression model is the assumption of independence of the explanatory variables.

To identify the multicollinearity of factors, we perform Farrar-Glouber test by factors X1, X2, X3,X4,X5,X6.

Verification of the Farrar-Glouber test for multicollinearity of factors includes several stages.

1) Checking for multicollinearity of the entire array of variables .

One of the conditions of the classical regression model is the assumption that the explanatory variables are independent. To identify multicollinearity between factors, the matrix of interfactorial correlations R is calculated using the Data Analysis Package (Table 3).

Table 3. Interfactor correlation matrix R

X1 X2 X3 X4 X5 X6
X1 0,994947 -0,27074 0,07251 0,166919 -0,00273
X2 0,994947 -0,26264 0,107572 0,219914 0,041955
X3 -0,27074 -0,26264 0,248622 -0,07573 -0,28755
X4 0,07251 0,107572 0,248622 0,671386 0,366382
X5 0,166919 0,219914 -0,07573 0,671386 0,600899
X6 -0,00273 0,041955 -0,28755 0,366382 0,600899

There is a strong relationship between factors X1 and X2, X5 and X4, X6 and X5 (>0.5).

The determinant det (R) = 0.001488 is calculated using the MOPRED function. The determinant of the matrix R tends to zero, which allows us to make an assumption about the general multicollinearity of the factors.

2) Checking for multicollinearity of each variable with other variables:

Calculate the inverse matrix R -1 using the Excel MINF function (Table 4):

Table 4 inverse matrix R-1

X1 X2 X3 X4 X5 X6
X1 150,1209 -149,95 3,415228 -1,70527 6,775768 4,236465
X2 -149,95 150,9583 -3,00988 1,591549 -7,10952 -3,91954
X3 3,415228 -3,00988 1,541199 -0,76909 0,325241 0,665121
X4 -1,70527 1,591549 -0,76909 2,218969 -1,4854 -0,213
X5 6,775768 -7,10952 0,325241 -1,4854 2,943718 -0,81434
X6 4,236465 -3,91954 0,665121 -0,213 -0,81434 1,934647

· Calculation of F-criteria , where are the diagonal elements of the matrix , n=17, k = 6 (Table 5).

Table 5. F-criteria values

F1 (Х1) F2 (Х2) F3 (X3) F4 (X4) F5 (x5) F6 (X6)
89,29396 89,79536 0,324071 0,729921 1,163903 0,559669

The actual values ​​of the F-criteria are compared with table value F table = 3.21(FDISP(0.05;6;10)) with n1= 6 and n2 = n - k – 1=17-6-1=10 degrees of freedom and significance level α=0.05, where k is the number of factors.

· The values ​​of F-criteria for factors X1 and X2 are larger than the table value, which indicates the presence of multicollinearity between these factors. Factor X3 has the least effect on the overall multicollinearity of factors.

3) Checking for Multicollinearity for Each Pair of Variables

Calculate the partial correlation coefficients using the formula , where are the elements of the matrix (Table 6)

Table 6. Matrix of coefficients of partial correlations

X1 X2 X3 X4 X5 X6
X1
X2 0,996086
X3 -0,22453 0,197329
X4 0,093432 -0,08696 0,415882
X5 -0,32232 0,337259 -0,1527 0,581191
X6 -0,24859 0,229354 -0,38519 0,102801 0,341239

· Calculation t- criteria according to the formula (table 7)

n - number of data = 17

K - number of factors = 6

Table 7.t-tests for partial correlation coefficients

X1 X2 X3 X4 X5 X6
X1
X2 35,6355
X3 -0,72862 0,636526
X4 0,296756 -0,27604 1,446126
X5 -1,07674 1,13288 -0,4886 2,258495
X6 -0,81158 0,745143 -1,31991 0,326817 1,147999

t table \u003d STUDRIVE (0.05; 10) \u003d 2.23

The actual values ​​of t-criteria are compared with the table value at degrees of freedom n-k-1 = 17-6-1=10 and significance level α=0.05;

t21 > ttable

t54 > ttable

Tables 6 and 7 show that two pairs of factors X1 and X2, X4 and X5 have a high statistically significant partial correlation, that is, they are multicollinear. In order to get rid of multicollinearity, one of the variables of the collinear pair can be eliminated. In a pair of X1 and X2 we leave X2, in a pair of X4 and X5 we leave X5.

Thus, as a result of checking the Farrar-Glouber test, the following factors remain: X2, X3, X5, X6.

Completing the procedures correlation analysis, it is advisable to look at the partial correlations of the selected factors with the result Y.

Let's build a matrix of paired correlation coefficients based on the data in Table 8.

Table 8. Output data with selected factors X2, X3, X5, X6.

No. of observation Y x2 x3 x5 x6
39,5 3,2
46,4 20,4
43,7 9,5
35,7 34,7
41,8 17,9
49,8 12,1
44,1 18,9
48,1 12,2
47,6 8,1
58,6 29,7
70,4 5,3
37,5 5,6
12,3
34,4 3,2
35,4
40,8 19,3
48,1 12,4

The last column of Table 9 shows the t-test values ​​for column Y.

Table 9. Matrix of coefficients of partial correlation with the result Y

Y X2 X3 X5 X6 t criterion (t tab (0.05; 11) = 2.200985
Y 0,996949 -0,25446 0,222946 0,067685
X2 0,996949 -0,26264 0,219914 0,041955 44,31676
X3 -0,25446 -0,26264 -0,07573 -0,28755 0,916144
X5 0,222946 0,219914 -0,07573 0,600899 -0,88721
X6 0,067685 0,041955 -0,28755 0,600899 1,645749

Table 9 shows that the variable Y has a high and at the same time statistically significant partial correlation with X2 factor.

1. CONSTRUCT A MATRIX OF COEFFICIENTS OF PAIR CORRELATION.

To do this, we calculate the pair correlation coefficients using the formula:

The necessary calculations are presented in Table 9.

-

the relationship between the revenue of the enterprise Y and the volume of capital investments X 1 is weak and direct;

-

there is practically no connection between the revenue of enterprise Y and fixed production assets X 2;

-

the connection between the volume of capital investments X 1 and fixed production assets X 2 is close and direct;

Table 9

Auxiliary table for calculating pair correlation coefficients

t Y X1 X2

(y-yavg)*
(x1-x1sr)

(y-yavg)*
(x2-x2sr)

(х1-х1ср)*
(x2-x2sr)

1998 3,0 1,1 0,4 0,0196 0,0484 0,0841 0,0308 0,0406 0,0638
1999 2,9 1,1 0,4 0,0576 0,0484 0,0841 0,0528 0,0696 0,0638
2000 3,0 1,2 0,7 0,0196 0,0144 1E-04 0,0168 -0,0014 -0,0012
2001 3,1 1,4 0,9 0,0016 0,0064 0,0441 -0,0032 -0,0084 0,0168
2002 3,2 1,4 0,9 0,0036 0,0064 0,0441 0,0048 0,0126 0,0168
2003 2,8 1,4 0,8 0,1156 0,0064 0,0121 -0,0272 -0,0374 0,0088
2004 2,9 1,3 0,8 0,0576 0,0004 0,0121 0,0048 -0,0264 -0,0022
2005 3,4 1,6 1,1 0,0676 0,0784 0,1681 0,0728 0,1066 0,1148
2006 3,5 1,3 0,4 0,1296 0,0004 0,0841 -0,0072 -0,1044 0,0058
2007 3,6 1,4 0,5 0,2116 0,0064 0,0361 0,0368 -0,0874 -0,0152
Σ 31,4 13,2 6,9 0,684 0,216 0,569 0,182 -0,036 0,272
Avg. 3,14 1,32 0,69

Also, the matrix of pair correlation coefficients can be found in the Excel environment using the DATA ANALYSIS add-in, the CORRELATION tool.

The matrix of pair correlation coefficients is:

Y X1 X2
Y 1
X1 0,4735 1
X2 -0,0577 0,7759 1

The matrix of paired correlation coefficients shows that the effective attribute y (revenue) has weak connection with the volume of capital investments x 1, and there is practically no connection with the size of the OPF. The relationship between the factors in the model is estimated as close, which indicates their linear dependence, multicollinearity.

2. BUILD A LINEAR MULTIPLE REGRESSION MODEL

We will find the model parameters using the least squares method. To do this, we will create a system normal equations.

Calculations are presented in table 10.

Let's solve the system of equations using Cramer's method:

Table 10

Auxiliary Calculations for Finding the Parameters of a Linear Multiple Regression Model

y
3,0 1,1 0,4 1,21 0,44 0,16 3,3 1,2
2,9 1,1 0,4 1,21 0,44 0,16 3,19 1,16
3,0 1,2 0,7 1,44 0,84 0,49 3,6 2,1
3,1 1,4 0,9 1,96 1,26 0,81 4,34 2,79
3,2 1,4 0,9 1,96 1,26 0,81 4,48 2,88
2,8 1,4 0,8 1,96 1,12 0,64 3,92 2,24
2,9 1,3 0,8 1,69 1,04 0,64 3,77 2,32
3,4 1,6 1,1 2,56 1,76 1,21 5,44 3,74
3,5 1,3 0,4 1,69 0,52 0,16 4,55 1,4
3,6 1,4 0,5 1,96 0,7 0,25 5,04 1,8
31,4 13,2 6,9 17,64 9,38 5,33 41,63 21,63

The linear model of multiple regression has the form:

If the volume of capital investments is increased by 1 million rubles, then the company's revenue will increase by an average of 2.317 million rubles. with fixed sizes of fixed production assets.

If the main production assets are increased by 1 million rubles, then the company's revenue will decrease by an average of 1.171 million rubles. with the same amount of investment.

3. WE CALCULATE:

determination coefficient:

67.82% of the change in the company's revenue is due to a change in the volume of capital investments and fixed production assets, by 32.18% - the influence of factors not included in the model.

F - Fisher criterion

Let's check the significance of the equation

The tabular value of F is a criterion at a significance level of α = 0.05 and the number of degrees of freedom d.f. 1 = k = 2 (number of factors), number of degrees of freedom d.f. 2 \u003d (n - k - 1) \u003d (10 - 2 - 1) \u003d 7 will be 4.74.

Since F calc. = 7.375 > F tab. = 4.74, then the regression equation as a whole can be considered statistically significant.

The calculated indicators can be found in the Excel environment using the DATA ANALYSIS add-in, the REGRESSION tool.


Table 11

Auxiliary Calculations for Finding the Average Relative Approximation Error

y BUT
3,0 1,1 0,4 2,97 0,03 0,010
2,9 1,1 0,4 2,97 -0,07 0,024
3,0 1,2 0,7 2,85 0,15 0,050
3,1 1,4 0,9 3,08 0,02 0,007
3,2 1,4 0,9 3,08 0,12 0,038
2,8 1,4 0,8 3,20 -0,40 0,142
2,9 1,3 0,8 2,96 -0,06 0,022
3,4 1,6 1,1 3,31 0,09 0,027
3,5 1,3 0,4 3,43 0,07 0,019
3,6 1,4 0,5 3,55 0,05 0,014
0,353

average relative approximation error

On average, the calculated values ​​differ from the actual ones by 3.53%. The error is small, the model can be considered accurate.

4. Build a power model of multiple regression

To build this model, we take the logarithm of both sides of the equality

lg y = lg a + β 1 ∙ lg x 1 + β 2 ∙ lg x 2 .

Let's make the change Y = lg y, A = lg a, X 1 = lg x 1 , X 2 = lg x 2 .

Then Y = A + β 1 ∙ X 1 + β 2 ∙ X 2 is a linear two-factor regression model. MNC can be applied.

Calculations are presented in table 12.

Table 12

Auxiliary calculations for finding the parameters of the power model of multiple regression

y lg y
3,0 1,1 0,4 0,041 -0,398 0,477 0,002 -0,016 0,020 0,158 -0,190
2,9 1,1 0,4 0,041 -0,398 0,462 0,002 -0,016 0,019 0,158 -0,184
3,0 1,2 0,7 0,079 -0,155 0,477 0,006 -0,012 0,038 0,024 -0,074
3,1 1,4 0,9 0,146 -0,046 0,491 0,021 -0,007 0,072 0,002 -0,022
3,2 1,4 0,9 0,146 -0,046 0,505 0,021 -0,007 0,074 0,002 -0,023
2,8 1,4 0,8 0,146 -0,097 0,447 0,021 -0,014 0,065 0,009 -0,043
2,9 1,3 0,8 0,114 -0,097 0,462 0,013 -0,011 0,053 0,009 -0,045
3,4 1,6 1,1 0,204 0,041 0,531 0,042 0,008 0,108 0,002 0,022
3,5 1,3 0,4 0,114 -0,398 0,544 0,013 -0,045 0,062 0,158 -0,217
3,6 1,4 0,5 0,146 -0,301 0,556 0,021 -0,044 0,081 0,091 -0,167
31,4 13,2 6,9 1,178 -1,894 4,955 0,163 -0,165 0,592 0,614 -0,943

We solve the system of equations using the Cramer method.

The power model of multiple regression has the form:

AT power function the coefficients at the factors are the coefficients of elasticity. The coefficient of elasticity shows how many percent the value of the effective attribute y will change on average if one of the factors is increased by 1% with the value of other factors unchanged.

If the volume of capital investments is increased by 1%, then the company's revenue will increase by an average of 0.897% with the same size of fixed production assets.

If the fixed production assets are increased by 1%, then the company's revenue will decrease by 0.226% with unchanged capital investments.

5. WE CALCULATE:

multiple correlation coefficient:

The connection between the enterprise's revenue and the volume of capital investments and fixed production assets is close.

Table 13

Auxiliary calculations for finding the coefficient of multiple correlation, the coefficient of determination, the average relative error of approximation of the power model of multiple regression

Y

(Y-Y calc.) 2

A
3,0 1,1 0,4 2,978 0,000 0,020 0,007
2,9 1,1 0,4 2,978 0,006 0,058 0,027
3,0 1,2 0,7 2,838 0,026 0,020 0,054
3,1 1,4 0,9 3,079 0,000 0,002 0,007
3,2 1,4 0,9 3,079 0,015 0,004 0,038
2,8 1,4 0,8 3,162 0,131 0,116 0,129
2,9 1,3 0,8 2,959 0,003 0,058 0,020
3,4 1,6 1,1 3,317 0,007 0,068 0,024
3,5 1,3 0,4 3,460 0,002 0,130 0,012
3,6 1,4 0,5 3,516 0,007 0,212 0,023
31,4 13,2 6,9 0,198 0,684 0,342

determination coefficient:

71.06% of the change in the company's revenue in the power-law model is due to a change in the volume of capital investments and fixed production assets, by 28.94% - the influence of factors not included in the model.

F - Fisher criterion

Let's check the significance of the equation

The tabular value of F is a criterion at a significance level of α = 0.05 and the number of degrees of freedom d.f. 1 = k = 2, number of degrees of freedom d.f. 2 \u003d (n - k - 1) \u003d (10 - 2 - 1) \u003d 7 will be 4.74.

Since F calc. = 8.592 > F tab. = 4.74, then the power regression equation as a whole can be considered statistically significant.

Landing is impossible, in which case the fuel consumption is less. Get the program optimal control, when until a certain moment t1 there is no control u*=0, and starting from t=t1, the control is equal to its maximum value u*=umax, which corresponds to the minimum fuel consumption. 6.) Solve the canonical system of equations, considering it for cases where and control ...

To the compilation of mathematical models. If a mathematical model is a diagnosis of a disease, then an algorithm is a method of treatment. The following main stages of operational research can be distinguished: observation of the phenomenon and collection of initial data; formulation of the problem; construction mathematical model; model calculation; testing the model and analyzing the output data. If the results are not satisfactory...

Mathematical constructions by analogy with reveals in a flat approximation the longitudinal-scalar electromagnetic wave with electrical - (28) and magnetic (29) common-mode components. The mathematical model of irrotational electrodynamics is characterized by the scalar-vector structure of its equations. The fundamental equations of irrotational electrodynamics are summarized in Table 1. Table 1, ...

OPTION 5

The dependence of the average life expectancy on several factors is studied according to the data for 1995, presented in Table. 5.

Table 5

Mozambique

……………………………………………………………………………………..

Switzerland

Designations adopted in the table:

· Y-- average life expectancy at birth, years;

· X 1 -- GDP in purchasing power parities;

· X 2 -- chain pace population growth, %;

· X 3 -- chain labor force growth rate, %;

· X 4 -- infant mortality rate, % .

Required:

1. Make a matrix of paired correlation coefficients between all the variables under study and identify collinear factors.

2. Construct a regression equation that does not contain collinear factors. Check the statistical significance of the equation and its coefficients.

3. Build a regression equation containing only statistically significant and informative factors. Check the statistical significance of the equation and its coefficients.

Items 4 - 6 refer to the regression equation built when performing item 3.

4. Assess the quality and accuracy of the regression equation.

5. Give an economic interpretation of the coefficients of the regression equation and a comparative assessment of the strength of the influence of factors on the resulting variable Y.

6. Calculate the predicted value of the resulting variable Y, if the predicted values ​​of the factors amount to 75% of their maximum values. Plot the confidence interval of the prediction of the actual value Y with 80% reliability.

Decision. To solve the problem, we use spreadsheet processor EXCEL.

1. Using the add-on "Data analysis ... Correlation" we build a matrix of paired correlation coefficients between all the variables under study (menu "Tools" "Data analysis..." "Correlation"). On fig. Figure 1 shows the correlation analysis panel with filled fields. To copy a window snapshot to the WINDOWS data clipboard, use the Alt+Print Screen key combination (on some keyboards, Alt+PrtSc). The correlation analysis results are shown in Appendix. 2 and transferred to table. one.

rice. 1. Correlation analysis panel

Table 1

Matrix of pairwise correlation coefficients

Analysis interfactorial correlation coefficients shows that the value of 0.8 exceeds in absolute value correlation coefficient between a pair of factors X 2 -X 3 (highlighted in bold). Factors X 2 -X 3 are thus recognized as collinear.

2. As shown in paragraph 1, the X2-X3 factors are collinear, which means that they actually duplicate each other, and their simultaneous inclusion in the model will lead to an incorrect interpretation of the corresponding regression coefficients. It can be seen that the X2 factor has a higher absolute correlation coefficient with the result Y than the X3 factor: ry,x2=0.72516; ry,x3=0.53397; |ry,x2|>|ry,x3| (see Table 1). This indicates more strong influence factor X2 to change Y. Factor X3 is thus excluded from consideration.

To construct the regression equation, the values ​​of the variables used ( Y,X 1 , X 2 , X 4) copy to a blank worksheet ( adj. 3). We build the regression equation using the add-on " Data Analysis… Regression» (menu « Service" « Data analysis…» « Regression"). The regression analysis panel with filled fields is shown in rice. 2.

The results of the regression analysis are given in adj. 4 and transferred to tab. 2. The regression equation has the form (see " Odds» in tab. 2):

y = 75.44 + 0.0447 ? x 1 - 0.0453 ? x2 - 0.24? x4

The regression equation is recognized as statistically significant, since the probability of its random formation in the form in which it was obtained is 1.04571 × 10 -45 (see Fig. "F Significance" in tab. 2), which is significantly lower than the accepted significance level =0.05.

Probability of random formation of coefficients at the factor X 1 below the accepted level of significance =0.05 (see “ P-Value" in tab. 2), which indicates the statistical significance of the coefficients and the significant impact of these factors on the change in annual profit Y.

Probability of random formation of coefficients at factors X 2 and X 4 exceeds the accepted level of significance =0.05 (see “ P-Value" in tab. 2), and these coefficients are not considered statistically significant.

rice. 2. Model regression analysis panel Y(X 1 ,X 2 ,X 4 )

table 2

Y(X 1 , X 2 , X 4 )

Analysis of variance

Significance F

Regression

Regression Equation

Odds

standard error

t-statistic

P-Value

bottom 95%

Top 95%

Lower 95.0%

Top 95.0%

Y-intersection

3. Based on the results of checking the statistical significance of the coefficients of the regression equation, carried out in the previous paragraph, we build a new regression model containing only informative factors, which include:

factors, the coefficients for which are statistically significant;

factors whose coefficients t _statistics modulo exceeds one (in other words, absolute value coefficient greater than standard error).

The first group includes the factor X 1 to second -- factor X 4 . Factor X 2 is excluded from consideration as non-informative, and finally regression model will contain factors X 1 , X 4 .

To build a regression equation, copy the values ​​of the variables used to a blank worksheet ( adj. 5) and perform a regression analysis ( rice. 3). Its results are given in adj. 6 and transferred to tab. 3. The regression equation looks like:

y = 75.38278 + 0.044918 ? x 1 - 0.24031 ? x4

(cm. " Odds» in table 3).

rice. 3. Panel regression analysis of the model Y(X 1 , X 4 )

Table 3

Model Regression Analysis Results Y(X 1 , X 4 )

Regression statistics

Multiple R

R-square

Normalized R-square

standard error

Observations

Analysis of variance

Significance F

Regression

Regression Equation

Odds

standard error

t-statistic

P-Value

Y-intersection

The regression equation is statistically significant: the probability of its random formation is below the acceptable significance level = 0.05 (see " Significance F" in table 3).

The coefficient at the factor is also recognized as statistically significant X 1 the probability of its random formation is below the acceptable significance level = 0.05 (see “ P-Value" in tab. 3). This indicates a significant impact of GDP in purchasing power parities X 1 per change in annual profit Y.

Coefficient at the factor X 4 (annual infant mortality rate) is not statistically significant. However, this factor can still be considered informative, since t _his odds statistic exceeds modulo unit, although further conclusions regarding the factor X 4 should be treated with some caution.

4. Evaluate the quality and accuracy of the last regression equation using some statistical characteristics obtained during the regression analysis (see " regression statistics» in the table. 3):

multiple coefficient of determination

R2 = _ i=1 ____________ =0.946576

R 2 = shows that the regression model explains 94.7% of the variation in average life expectancy at birth Y, and this variation is due to a change in the factors included in the regression model X 1 , X 4 ;

regression standard error

shows that the values ​​predicted by the regression equation for average life expectancy at birth Y differ from the actual values ​​by an average of 2.252208 years.

Medium relative error approximation is determined by the approximate formula:

E rel? 0.8 ? -- ? 100%=0.8 ? 2.252208/66.9? 100%?2.7

where thousand rubles. -- life expectancy value (determined using the built-in function " AVERAGE»; adj. one).

E rel shows that the values ​​of annual profit predicted by the regression equation Y differ from the actual values ​​by an average of 2.7%. The model has high precision(at - the accuracy of the model is high, at - good, at - satisfactory, at - unsatisfactory).

5. For the economic interpretation of the coefficients of the regression equation, we tabulate the average values ​​and standard deviations variables in the initial data (Table 4). Average values ​​were determined using the built-in function "AVERAGE", standard deviations - using the built-in function "STDEV" (see Appendix 1).

In the territories of the South federal district The Russian Federation provides data for 2011

Territories of the federal district

Gross regional product, billion rubles, Y

Investments in fixed capital, billion rubles, X1

1. Rep. Adygea

2. Rep. Dagestan

3. Rep. Ingushetia

4. Kabardino-Balkarian Republic

5. Rep. Kalmykia

6. Karachay-Cherkess Republic

7. Rep. North Ossetia Alania

8. Krasnodar region)

9. Stavropol Territory

10. Astrakhan region

11. Volgograd region

12. Rostov region

  • 1. Calculate the matrix of paired correlation coefficients; evaluate the statistical significance of the correlation coefficients.
  • 2. Build the correlation field of the resulting feature and the most closely related factor.
  • 3. Calculate the parameters of the linear pair regression for each factor X..
  • 4. Evaluate the quality of each model through the coefficient of determination, the average approximation error and Fisher's F-test. Choose the best model.

will be 80% of maximum value. Present graphically: actual and model values, forecast points.

  • 6. Using stepwise multiple regression (exclusion method or inclusion method), build a model of apartment price formation due to significant factors. Give an economic interpretation of the coefficients of the regression model.
  • 7. Evaluate the quality of the built model. Has the quality of the model improved compared to the one-factor model? Give an assessment of the influence of significant factors on the result using elasticity coefficients, in - and -? coefficients.

When solving this problem, calculations and construction of graphs and diagrams will be carried out using the setting Excel Analysis data.

1. Calculate the matrix of paired correlation coefficients and evaluate the statistical significance of the correlation coefficients

In the Correlation dialog box, in the Input interval field, enter the range of cells containing the source data. Since we also selected the column headings, we check the Labels in the first row checkbox.

We got the following results:

Table 1.1 Matrix of pairwise correlation coefficients

An analysis of the matrix of pair correlation coefficients shows that the dependent variable Y, i.e. the gross regional product, has a closer relationship with X1 (investment in fixed capital). The correlation coefficient is 0.936. This means that the dependent variable Y (gross regional product) is 93.6% dependent on X1 (investment in fixed assets).

The statistical significance of the correlation coefficients will be determined using Student's t-test. The table value is compared with the calculated values.

Let's calculate the table value using the STUDRIST function.

t table = 0.129 at confidence level equal to 0.9 and degrees of freedom (n-2).

The X1 factor is statistically significant.

2. Let's construct the field of correlation of the effective feature (gross regional product) and the most closely related factor (investment in fixed capital)

To do this, we will use the tool for constructing a scatter plot in Excel.

As a result, we obtain the field of correlation of the price of the gross regional product, billion rubles. and investments in fixed capital, billion rubles. (Figure 1.1.).

Figure 1.1

3. Calculate the parameters of linear pair regression for each factor X

To calculate the parameters of a linear pairwise regression, we will use the Regression tool included in the Data Analysis setting.

In the Regression dialog box, in the Input interval Y field, enter the address of the range of cells that represents the dependent variable. In field

Input interval X we enter the address of the range that contains the values ​​of the independent variables. Let's calculate the pairwise regression parameters for factor X.

For X1, the following data were obtained, presented in Table 1.2:

Table 1.2

The regression equation for the dependence of the price of the gross regional product on investment in fixed capital has the form:

4. Let's evaluate the quality of each model through the coefficient of determination, the average approximation error and Fisher's F-criterion. Let's find out which model is the best.

The coefficient of determination, the average approximation error, we obtained as a result of the calculations carried out in paragraph 3. The data obtained are presented in the following tables:

Data for X1:

Table 1.3a

Table 1.4b

A) The coefficient of determination determines what proportion of the variation of the attribute Y is taken into account in the model and is due to the influence of the factor X on it. more value coefficient of determination, the closer connection between features in the constructed mathematical model.

AT Excel program denoted R-square.

Based on this criterion, the most adequate model is the regression equation of the dependence of the price of the gross regional product on investment in fixed assets (X1).

B) Calculate the average approximation error using the formula:

where the numerator is the sum of the squared deviations of the calculated values ​​from the actual ones. In the tables, it is in the SS column, Residuals row.

We calculate the average value of the price of an apartment in Excel using the AVERAGE function. = 24.18182 billion rubles

When conducting economic calculations, the model is considered sufficiently accurate if mean error approximation is less than 5%, the model is considered acceptable if the average approximation error is less than 15%.

According to this criterion, the most adequate is the mathematical model for the regression equation of the dependence of the price of the gross regional product on investment in fixed assets (X1).

C) An F-test is used to test the significance of the regression model. For this, a comparison is also made of the critical (tabular) values ​​of Fisher's F-test.

The calculated values ​​are given in tables 1.4b (indicated by the letter F).

The table value of Fisher's F-test is calculated in Excel using the FDISP function. We take the probability equal to 0.05. Received: = 4.75

The calculated values ​​of Fisher's F-test for each factor are comparable with the table value:

71.02 > = 4.75 the model is adequate according to this criterion.

After analyzing the data for all three criteria, we can conclude that the best is the mathematical model built for the gross regional product factor, which is described by the linear equation

5. For the chosen model of the dependence of the price of the gross regional product

we will predict the average value of the indicator at the level of significance if the predicted value of the factor is 80% of its maximum value. Let's represent graphically: actual and model values, forecast points.

Calculate the predicted value of X, according to the condition, it will be 80% of the maximum value.

Calculate X max in Excel using the MAX function.

0,8 *52,8 = 42,24

To obtain predictive estimates of the dependent variable, we substitute the obtained value of the independent variable into the linear equation:

5.07 + 2.14 * 42.24 \u003d 304.55 billion rubles.

Let us determine the confidence interval of the forecast, which will have the following boundaries:

To calculate confidence interval for the predicted value, we calculate the deviation from the regression line.

For a paired regression model, the deviation value is calculated:

those. the standard error value from Table 1.5a.

(Since the number of degrees of freedom is equal to one, the denominator will be equal to n-2). correlation pairwise regression prediction

To calculate the coefficient, we use Excel function STUDRASPOBR, we take the probability equal to 0.1, the number of degrees of freedom is 38.

The value will be calculated from using Excel, we get 12294.


Let's define the upper and lower bounds of the interval.

  • 304,55+27,472= 332,022
  • 304,55-27,472= 277,078

Thus, the forecast value = 304.55 thousand dollars will be between the lower limit, equal to 277.078 thousand dollars. and an upper limit equal to 332.022 billion rubles. Rub.

Actual and model values, forecast points are presented graphically in Figure 1.2.


Figure 1.2

6. Using stepwise multiple regression (exclusion method), we will build a model for the formation of the price of the gross regional product due to significant factors

To build a multiple regression, we will use the Excel Regression function, including all factors in it. As a result, we obtain result tables, from which we need Student's t-test.

Table 1.8a

Table 1.8b

Table 1.8c.

We get the view model:

Insofar as< (4,75 < 71,024), уравнение регрессии следует признать адекватным.

Let's choose the smallest modulo value of Student's t-test, it is equal to 8.427, compare it with the tabular value that we calculate in Excel, take the significance level equal to 0.10, the number of degrees of freedom n-m-1=12-4=8: =1.8595

Since 8.427>1.8595 the model should be recognized as adequate.

7. To evaluate the significant factor of the obtained mathematical model, we calculate the coefficients of elasticity, and - the coefficients

The elasticity coefficient shows how many percent the resultant sign will change when the factor sign changes by 1%:

E X4 \u003d 2.137 * (10.69 / 24.182) \u003d 0.94%

That is, with an increase in investment in fixed capital by 1%, the cost increases by an average of 0.94%.

The coefficient shows by what part of the value of the standard deviation the average value of the dependent variable changes with a change in the independent variable by one standard deviation.

2,137* (14.736/33,632) = 0,936.

Mean data standard deviations taken from tables obtained with tools Descriptive statistics.

Table 1.11 Descriptive statistics (Y)

Table 1.12 Descriptive statistics (X4)

The coefficient determines the share of the influence of the factor in the total influence of all factors:

To calculate the pair correlation coefficients, we calculate the matrix of pair correlation coefficients in the Excel program using the Correlation tool of the Data Analysis settings.

Table 1.14

(0,93633*0,93626) / 0,87 = 1,00.

Conclusion: Based on the calculations obtained, we can conclude that the effective attribute Y (gross regional product) is highly dependent on factor X1 (investment in fixed capital) (by 100%).

Bibliography

  • 1. Magnus Ya.R., Katyshev P.K., Peresetsky A.A. Econometrics. Starting Course. Tutorial. 2nd ed. - M.: Delo, 1998. - p. 69 - 74.
  • 2. Workshop on econometrics: Textbook / I.I. Eliseeva, S.V. Kurysheva, N.M. Gordeenko and others 2002. - p. 49 - 105.
  • 3. Dougerty K. Introduction to econometrics: Per. from English. - M.: INFRA-M, 1999. - XIV, p. 262 - 285.
  • 4. Aivyzyan S.A., Mikhtiryan V.S. Applied Mathematics and fundamentals of econometrics. -1998., pp. 115-147.
  • 5. Kremer N.Sh., Putko B.A. Econometrics. -2007. from 175-251.