
Estimation of the regression equation. Assessment of the significance of linear regression parameters and the entire equation as a whole

Paired regression is a regression between two variables, y and x, i.e. a model of the form y = a + bx + ε, where y is the effective feature (the dependent variable) and x is the factor feature (the independent variable).

Linear regression reduces to finding an equation of the form ŷ = a + bx (or y = a + bx + ε).

An equation of this form allows one, given values of the factor x, to obtain theoretical values of the effective feature by substituting the actual values of the factor x into it.

Building a linear regression reduces to estimating its parameters a and b. Estimates of the linear regression parameters can be found by different methods; the most common is the method of least squares (OLS), described below.

The parameter b is called the regression coefficient. Its value shows the average change in the result when the factor changes by one unit.

Formally, a is the value of y at x = 0. If the factor feature does not and cannot have a zero value, then this interpretation of the free term a does not make sense. The parameter a may have no economic content, and attempts to interpret it economically can lead to absurdity, especially when a < 0.

Only the sign of the parameter a can be interpreted. If a > 0, the relative change in the result is slower than the change in the factor.

After the equation is found, the quality of the found parameters and of the entire model as a whole is checked:

- assessment of the significance of the regression coefficient b and of the correlation coefficient;
- assessment of the significance of the entire regression equation (the coefficient of determination).

The regression equation is always supplemented with an indicator of the closeness of the relationship. With linear regression, this indicator is the linear correlation coefficient r_xy. There are different modifications of the formula for the linear correlation coefficient.

The linear correlation coefficient lies within the limits −1 ≤ r_xy ≤ 1. The closer r is to 0, the weaker the correlation, and vice versa: the closer r is to 1 or −1, the stronger the correlation, i.e. the dependence of x and y is close to linear. If r is exactly 1 or −1, all points lie on one straight line.

If the regression coefficient b > 0, then 0 ≤ r_xy ≤ 1; conversely, for b < 0, −1 ≤ r_xy ≤ 0. The correlation coefficient reflects only the degree of linear dependence between the values; a pronounced dependence of another kind may be present even when r is small.

To assess the quality of the fit of the linear function, the square of the linear correlation coefficient, r²_xy, called the coefficient of determination, is used. The coefficient of determination characterizes the proportion of the variance of the resulting feature y explained by the regression. The corresponding value 1 − r²_xy characterizes the share of the variance of y caused by the influence of other factors not accounted for in the model.

OLS yields estimates of the parameters a and b for which the sum of squared deviations of the actual values of the resulting feature y from the calculated (theoretical) values ŷ is minimal:

Σ(y − ŷ)² → min.

In other words, out of the entire set of lines, the regression line on the chart is chosen so that the sum of the squares of the vertical distances between the points and this line is minimal.

To find the minimum, the system of normal equations is solved:

Σy = n·a + b·Σx,
Σxy = a·Σx + b·Σx².
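As an illustration of the OLS procedure just described, here is a minimal Python sketch that solves the two normal equations for a and b. The data are made-up numbers for illustration, not taken from the text.

```python
# Minimal OLS sketch for paired linear regression y = a + b*x,
# solving the normal equations directly.

def ols_pair(x, y):
    """Return (a, b) solving the normal equations for y = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Normal equations:
    #   sy  = n*a  + b*sx
    #   sxy = a*sx + b*sxx
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

x = [1, 2, 3, 4, 5]          # hypothetical factor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical result values
a, b = ols_pair(x, y)
```

For these numbers the fitted line is approximately ŷ = 0.05 + 1.99x.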

ESTIMATION OF SIGNIFICANCE OF LINEAR REGRESSION PARAMETERS.

The significance of the regression equation as a whole is assessed using Fisher's F-criterion. The null hypothesis is that the regression coefficient equals zero, i.e. b = 0, and hence the factor x has no influence on the result y.

The direct calculation of the F-criterion is preceded by an analysis of variance. Central to it is the decomposition of the total sum of squared deviations of the variable y from the average value ȳ into two parts, "explained" and "unexplained":

Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σ(y − ŷ)²,

where Σ(y − ȳ)² is the total sum of squared deviations, Σ(ŷ − ȳ)² is the sum of squared deviations explained by the regression, and Σ(y − ŷ)² is the residual sum of squared deviations.

Any sum of squared deviations is associated with a number of degrees of freedom, i.e. the number of independent variations of the feature. The number of degrees of freedom is related to the number of units in the population n and to the number of constants determined from it. For the problem under study, the number of degrees of freedom shows how many independent deviations out of the n possible are required to form a given sum of squares.

Dividing each sum of squares by its number of degrees of freedom gives the dispersion per one degree of freedom, D.

The F-ratio (F-criterion) is the ratio of the factor dispersion to the residual dispersion per one degree of freedom: F = D_fact / D_resid.

If the null hypothesis is true, the factor and residual variances do not differ from each other. To refute H0, the factor variance must exceed the residual one several times. The statistician Snedecor developed tables of critical values of the F-ratio at different significance levels of the null hypothesis and various degrees of freedom. The table value of the F-criterion is the maximum value of the ratio of variances that can occur through random divergence at a given probability level of the null hypothesis. The calculated value of the F-ratio is recognized as reliable if it is greater than the table value. In that case the null hypothesis about the absence of a relationship between the features is rejected and a conclusion is made about the significance of this relationship: if F_fact > F_table, H0 is rejected.

If the actual value is less than the tabular one (F_fact < F_table), then the probability of the null hypothesis is above the given level, and it cannot be rejected without a serious risk of drawing a wrong conclusion about the presence of a relationship. In this case, the regression equation is considered statistically insignificant, and H0 is not rejected.



After the linear regression equation has been found, the significance of both the equation as a whole and its individual parameters is assessed. To check the significance of the regression equation means to determine whether the mathematical model expressing the relationship between the variables fits the experimental data, and whether the explanatory variables (one or several) included in the equation are sufficient to describe the dependent variable. To form a general judgment about the quality of the model from the relative deviations for each observation, the average approximation error is determined: Ā = (1/n) Σ |(y − ŷ)/y| · 100%. The average approximation error should not exceed 8-10%.
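The average approximation error can be sketched as follows; the numbers are hypothetical, and the 8-10% threshold is the rule of thumb mentioned above.

```python
# Sketch of the average approximation error
# A = (1/n) * sum(|(y - yhat) / y|) * 100%.

def mean_approx_error(y, yhat):
    """Average relative deviation of fitted from actual values, in percent."""
    n = len(y)
    return 100.0 / n * sum(abs((yi - yh) / yi) for yi, yh in zip(y, yhat))

y    = [10.0, 12.0, 15.0, 20.0]   # hypothetical actual values
yhat = [10.5, 11.5, 15.6, 19.0]   # hypothetical fitted values
A = mean_approx_error(y, yhat)    # about 4.54 percent here
```

Since A is below 8%, a model with these deviations would pass the quality rule of thumb.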

The assessment of the significance of the regression equation as a whole is given by Fisher's F-criterion, preceded by an analysis of variance. According to the main idea of the analysis of variance, the total sum of squared deviations of the variable y from the average ȳ is decomposed into two parts, "explained" and "unexplained": the total sum of squared deviations equals the sum of squared deviations explained by the regression (the factor sum of squares) plus the residual sum of squared deviations, which characterizes the influence of the factors not accounted for in the model. Determining the dispersion per one degree of freedom brings the dispersions to a comparable form. Comparing the factor and residual dispersions per one degree of freedom, we obtain the value of Fisher's F-criterion. The actual value of the F-criterion is compared with the table value F_table(α; k1; k2) at significance level α and degrees of freedom k1 = m and k2 = n − m − 1. If the actual value of the F-criterion is greater than the tabular one, then the statistical significance of the equation as a whole is recognized.

For paired linear regression m = 1, so the value of the F-criterion is related to the coefficient of determination R² and can be calculated by the formula F = R²/(1 − R²) · (n − 2).
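The formula relating F to R² can be applied directly; the R² and n below are invented illustrative values.

```python
# For paired regression (m = 1) the F-criterion follows directly
# from the coefficient of determination: F = R^2 / (1 - R^2) * (n - 2).

def f_from_r2(r2, n):
    """F-statistic of a paired regression from R-squared and sample size."""
    return r2 / (1.0 - r2) * (n - 2)

F = f_from_r2(0.64, 30)   # hypothetical R^2 = 0.64, n = 30 observations
```

Here F ≈ 49.8, far above typical table values for (1, 28) degrees of freedom, so such an equation would be recognized as significant.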

In paired linear regression, the significance of not only the equation as a whole but also of its individual parameters is assessed. For this purpose, a standard error is determined for each of the parameters: m_b and m_a. The standard error of the regression coefficient is determined by the formula m_b = √(S²_resid / Σ(x − x̄)²), where S²_resid = Σ(y − ŷ)²/(n − 2) is the residual variance per one degree of freedom.

The value of the standard error, together with Student's t-distribution at n − 2 degrees of freedom, is used to test the significance of the regression coefficient and to calculate its confidence interval. To assess the significance of the regression coefficient, its value is compared with its standard error, i.e. the actual value of Student's t-test, t_b = b/m_b, is determined and then compared with the table value at a certain significance level α and n − 2 degrees of freedom. The confidence interval for the regression coefficient is defined as b ± t_table · m_b. Since the sign of the regression coefficient indicates growth of the effective feature y with an increase of the factor feature x (b > 0), a decrease of the effective feature with an increase of the factor feature (b < 0), or its independence from the independent variable (b = 0), the boundaries of the confidence interval for the regression coefficient should not contain contradictory results, for example −1.5 ≤ b ≤ 0.8. Such a record would indicate that the true value of the regression coefficient simultaneously admits positive, negative, and even zero values, which cannot be.

The standard error of the parameter a is determined by the formula m_a = m_b · √(Σx²/n). The procedure for assessing the significance of this parameter does not differ from that considered above for the regression coefficient: the t-criterion t_a = a/m_a is computed and its value is compared with the table value at n − 2 degrees of freedom.
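The significance test for b can be sketched end to end as follows. The data are hypothetical, and t_tab = 2.776 is the two-sided 5% critical value of Student's distribution for n − 2 = 4 degrees of freedom.

```python
# Sketch: standard error m_b, Student's t-statistic for the regression
# coefficient, and the confidence interval b +/- t_tab * m_b.
import math

x = [1, 2, 3, 4, 5, 6]                     # hypothetical factor values
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]        # hypothetical result values
n = len(x)
xm, ym = sum(x) / n, sum(y) / n
sxx = sum((xi - xm) ** 2 for xi in x)
b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
a = ym - b * xm
rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
m_b = math.sqrt(rss / (n - 2) / sxx)       # standard error of b
t_b = b / m_b                              # actual value of the t-test
t_tab = 2.776                              # table value, alpha = 0.05, df = 4
ci = (b - t_tab * m_b, b + t_tab * m_b)    # confidence interval for b
```

For these nearly linear data t_b is far above t_tab and the interval does not include zero, so b would be recognized as significant.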


TOPIC 4. STATISTICAL METHODS FOR STUDYING RELATIONSHIPS

The regression equation is an analytical representation of the correlation dependence. The regression equation describes a hypothetical functional relationship between the conditional average value of the effective feature and the value of the factor feature (factors), i.e. the underlying trend of the dependence.

Pair correlation dependence is described by the pair regression equation, multiple correlation dependence - by the multiple regression equation.

The result attribute in the regression equation is the dependent variable (response, the variable being explained), and the attribute factor is the independent variable (the argument, the explanatory variable).

The simplest type of regression equation is the equation of a paired linear relationship:

where y is the dependent variable (effective feature); x is the independent variable (factor feature); a and b are the parameters of the regression equation; ε is the estimation error.

Various mathematical functions can be used as a regression equation. The equations of linear dependence, parabola, hyperbola, power function, etc. find frequent practical application.

As a rule, the analysis begins with a linear relationship, since the results are easy to interpret meaningfully. The choice of the type of relationship equation is a rather important step in the analysis. In the "pre-computer" era, this procedure involved certain difficulties and required the analyst to know the properties of mathematical functions. At present, specialized programs make it possible to quickly construct a set of relationship equations and, based on formal criteria, select the best model (although the mathematical literacy of the analyst has not lost its relevance).

A hypothesis about the type of correlation dependence can be put forward based on the results of constructing the correlation field (see lecture 6). Based on the nature of the location of the points on the graph (the coordinates of the points correspond to the values ​​of the dependent and independent variables), the trend of the relationship between the signs (indicators) is revealed. If the regression line passes through all points of the correlation field, then this indicates a functional relationship. In the practice of socio-economic research, such a picture cannot be observed, since there is a statistical (correlation) dependence. Under the conditions of correlation dependence, when drawing a regression line on a scatterplot, a deviation of the points of the correlation field from the regression line is observed, which demonstrates the so-called residuals or estimation errors (see Figure 7.1).

The presence of an equation error is due to the fact that:

§ not all factors influencing the result are included in the regression equation;

§ the form of the relationship (the type of regression equation) may be chosen incorrectly.

To construct a regression equation means to calculate the values of its parameters. The regression equation is built from the actual values of the analyzed features. The parameters are usually calculated using the method of least squares (OLS).

The essence of OLS is that it yields such values of the parameters of the equation at which the sum of the squared deviations of the theoretical values of the effective feature (calculated from the regression equation) from its actual values is minimized:

S = Σ(yᵢ − ŷᵢ)² → min,

where yᵢ is the actual value of the effective feature for the i-th unit of the population, and ŷᵢ is the value of the effective feature for the i-th unit obtained from the regression equation.

Thus, the problem is solved as an extremum problem: it is necessary to find the values of the parameters at which the function S reaches a minimum.

Differentiating and equating the partial derivatives to zero yields:

b = (x̄y̅ − x̄ · ȳ) / σ²ₓ, (7.3)

a = ȳ − b · x̄, (7.4)

where x̄y̅ is the average product of the factor and result values; x̄ is the average value of the factor feature; ȳ is the average value of the effective feature; σ²ₓ is the variance of the factor feature.
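Formulas (7.3) and (7.4) can be computed directly, as in the following sketch; the data are invented for illustration.

```python
# Sketch of formulas (7.3)-(7.4):
#   b = (mean(x*y) - mean(x)*mean(y)) / var(x),  a = mean(y) - b*mean(x),
# using the population variance var(x) = sum((x - xm)^2) / n.

x = [2, 4, 6, 8]            # hypothetical factor values
y = [3.0, 5.0, 6.8, 9.2]    # hypothetical result values
n = len(x)
xm = sum(x) / n
ym = sum(y) / n
xym = sum(xi * yi for xi, yi in zip(x, y)) / n   # average product of x and y
var_x = sum((xi - xm) ** 2 for xi in x) / n      # variance of the factor
b = (xym - xm * ym) / var_x                      # formula (7.3)
a = ym - b * xm                                  # formula (7.4)
```

For these numbers b = 1.02 and a = 0.9; the same values would come out of the normal equations, since (7.3)-(7.4) are their closed-form solution.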

The parameter b in the regression equation characterizes the slope of the regression line on the graph. This parameter is called the regression coefficient, and its value shows by how many units of its measurement the effective feature changes when the factor feature changes by one unit of its measurement. The sign of the regression coefficient reflects the direction of the dependence (direct or inverse) and coincides with the sign of the correlation coefficient (under paired dependence).

Within the framework of the example under consideration, the STATISTICA program calculated the parameters of the regression equation that describes the relationship between the level of average per capita monetary income of the population and the value of the gross regional product per capita in the regions of Russia, see Table 7.1.

Table 7.1 - Calculation and evaluation of the parameters of the equation describing the relationship between the level of average per capita cash income of the population and the value of the gross regional product per capita in the regions of Russia, 2013

Column "B" of the table contains the values of the parameters of the paired regression equation, so we can write: ŷ = 13406.89 + 22.82x. This equation describes the trend of the relationship between the analyzed characteristics. The parameter b is the regression coefficient. In this case it equals 22.82 and is interpreted as follows: with an increase in GRP per capita by 1 thousand rubles, average per capita cash incomes increase on average (as indicated by the "+" sign) by 22.82 rubles.

The parameter a of the regression equation in socio-economic studies is, as a rule, not meaningfully interpreted. Formally, it reflects the value of the effective feature when the factor feature equals zero. The parameter a characterizes the location of the regression line on the graph, see Figure 7.1.

Figure 7.1 - Correlation field and regression line, reflecting the dependence of the level of average per capita monetary income of the population in the regions of Russia and the value of GRP per capita

The value of the parameter a corresponds to the point of intersection of the regression line with the Y-axis at X = 0.

The construction of the regression equation is accompanied by an assessment of the statistical significance of the equation as a whole and of its parameters. The need for such procedures is associated with the limited amount of data, which may prevent the law of large numbers from operating and thus hinder the identification of the true trend in the relationship of the analyzed indicators. In addition, any studied population can be considered a sample from the general population, and the characteristics obtained during the analysis as estimates of the general parameters.

Assessing the statistical significance of the parameters and of the equation as a whole substantiates whether the constructed relationship model can be used for making managerial decisions and for forecasting (modeling).

The statistical significance of the regression equation as a whole is estimated using Fisher's F-test, which is the ratio of the factor and residual variances calculated per one degree of freedom:

F = [Σ(ŷᵢ − ȳ)² / k] / [Σ(yᵢ − ŷᵢ)² / (n − k − 1)],

where Σ(ŷᵢ − ȳ)²/k is the factor variance of the effective feature; k is the number of degrees of freedom of the factor variance (the number of factors in the regression equation); ȳ is the mean value of the dependent variable; ŷᵢ is the theoretical (obtained from the regression equation) value of the dependent variable for the i-th unit of the population; Σ(yᵢ − ŷᵢ)²/(n − k − 1) is the residual variance of the effective feature; n is the volume of the population; n − k − 1 is the number of degrees of freedom of the residual variance.

The value of Fisher's F-test, according to the formula, characterizes the ratio between the factor and residual variances of the dependent variable, demonstrating, in essence, how many times the value of the explained part of the variation exceeds the unexplained one.

Fisher's F-test is tabulated; the inputs to the table are the numbers of degrees of freedom of the factor and residual variances. Comparing the calculated value of the criterion with the tabular (critical) one answers the question: is the part of the variation of the effective feature that can be explained by the factors included in an equation of this type statistically significant? If F_calc > F_table, the regression equation is recognized as statistically significant and, accordingly, the coefficient of determination is also statistically significant. Otherwise (F_calc < F_table), the equation is statistically insignificant, i.e. the variation of the factors taken into account in the equation does not explain a statistically significant part of the variation of the effective feature, or the relationship equation is not chosen correctly.

The statistical significance of the parameters of the equation is assessed on the basis of t-statistics, calculated as the ratio of the absolute value of a parameter of the regression equation to its standard error (m_a, m_b):

t_b = |b| / m_b, where m_b = (σ_y/σ_x) · √((1 − R²)/(n − 2)); (7.6)

t_a = |a| / m_a, where m_a = m_b · √(Σx²/n); (7.7)

where σ_x and σ_y are the standard deviations of the factor feature and the effective feature, and R² is the coefficient of determination.

In specialized statistical programs, the calculation of parameters is always accompanied by the calculation of their standard (root-mean-square) errors and t-statistics (see Table 7.1). The calculated value of the t-statistic is compared with the tabular one: if the volume of the studied population is less than 30 units (a definitely small sample), the Student t-distribution table should be used; if the population is large, the normal distribution table (the Laplace probability integral) should be used. A parameter of the equation is considered statistically significant if t_calc > t_table.

Estimation of parameters on the basis of t-statistics is, in essence, a test of the null hypothesis that the general parameters equal zero (H0: a = 0; H0: b = 0), that is, that the parameters of the regression equation are statistically insignificant. The significance level of the hypothesis is usually taken as α = 0.05. If the calculated significance level is less than 0.05, then the null hypothesis is rejected and the alternative one is accepted: the parameter is statistically significant.

Let us continue with the example. In Table 7.1, column "B" shows the values of the parameters; column Std.Err.ofB shows the values of the standard errors of the parameters (m_a, m_b); column t(77) (77 is the number of degrees of freedom) shows the t-statistics calculated with the number of degrees of freedom taken into account. To assess the statistical significance of the parameters, the calculated t-statistics must be compared with the table value. The given significance level (0.05) corresponds to t = 1.96 in the normal distribution table. Since 18.02 > 1.96 and 10.84 > 1.96, i.e. t_calc > t_table, we should recognize the statistical significance of the obtained parameter values: these values are formed under the influence of non-random factors and reflect the trend of the relationship between the analyzed indicators.

To assess the statistical significance of the equation as a whole, we turn to the value of Fisher's F-test (see Table 7.1). The calculated value of the F-criterion is 117.51; the table value of the criterion, based on the corresponding numbers of degrees of freedom (d.f. = 1 for the factor variance, d.f. = 77 for the residual variance), is 4.00 (see Appendix .....). Thus F_calc > F_table, so the regression equation as a whole is statistically significant. In this situation we can also speak of the statistical significance of the coefficient of determination: about 60 percent of the variation in average per capita incomes of the population in the regions of Russia can be explained by the variation in the volume of gross regional product per capita.

By assessing the statistical significance of the regression equation and its parameters, we can get a different combination of results.

· Equation by F-test is statistically significant and all parameters of the equation by t-statistics are also statistically significant. This equation can be used both for making managerial decisions (which factors should be influenced in order to obtain the desired result), and for predicting the behavior of the result attribute for certain values ​​of the factors.

· According to the F-criterion, the equation is statistically significant, but the parameters (parameter) of the equation are insignificant. The equation can be used to make management decisions (concerning those factors for which the statistical significance of their influence has been confirmed), but the equation cannot be used for forecasting.

· According to the F-criterion, the equation is not statistically significant. The equation cannot be used. The search for significant factor features, or for an analytical form of the relationship between the argument and the response, should be continued.

If the statistical significance of the equation and its parameters is confirmed, a so-called point forecast can be made, i.e. an estimate of the value of the effective feature y can be obtained for given values of the factor x.

It is quite obvious that the predicted value of the dependent variable calculated from the relationship equation will not coincide with its actual value. Graphically this is confirmed by the fact that not all points of the correlation field lie on the regression line; only with a functional relationship would the regression line pass through all points of the scatter diagram. The discrepancies between the actual and theoretical values of the dependent variable are primarily due to the very essence of the correlation dependence: many factors affect the result at the same time, of which only a part can be taken into account in a specific relationship equation. In addition, the form of the relationship between the result and the factor (the type of regression equation) may be chosen incorrectly. This raises the question of how informative the constructed relationship equation is. Two indicators answer this question: the coefficient of determination (already discussed above) and the standard error of estimation.

The differences between the actual and theoretical values of the dependent variable are called deviations, errors, or residuals. Based on these values, the residual variance is calculated. The square root of the residual variance is the root-mean-square (standard) error of estimation:

S = √( Σ(yᵢ − ŷᵢ)² / (n − k − 1) ). (7.8)

The standard error of the equation is measured in the same units as the predicted indicator. If the equation errors follow a normal distribution (with large amounts of data), then about 95 percent of the values should lie within a distance of 2S from the regression line (a property of the normal distribution). The value of the standard error of estimation is used in calculating confidence intervals when predicting the value of the effective feature for a specific unit of the population.

In practical research it is often necessary to predict the average value of the effective feature for a particular value of the factor feature. In this case, the calculation of the confidence interval for the mean value of the dependent variable takes into account the average error:

m_ŷ = S / √n. (7.9)

The use of different error values ​​is explained by the fact that the variability of the levels of indicators for specific units of the population is much higher than the variability of the mean value, therefore, the forecast error of the mean value is smaller.

The confidence interval of the forecast of the mean value of the dependent variable is

ŷ_p ± Δ, where Δ = t · m_ŷ, (7.10)

where Δ is the marginal estimation error (see sampling theory) and t is the confidence coefficient, whose value is found in the corresponding table based on the probability level adopted by the researcher and the number of degrees of freedom (see sampling theory).

The confidence interval for the predicted value of the effective feature can also be calculated with a correction for the shift of the regression line. The correction factor is determined as

K = √( 1 + 1/n + (x_p − x̄)² / Σ(xᵢ − x̄)² ), (7.11)

where x_p is the value of the factor feature from which the value of the effective feature is predicted.

It follows that the more x_p differs from the average value of the factor feature, the greater the correction factor and the greater the forecast error. With this coefficient taken into account, the confidence interval of the forecast is calculated as ŷ_p ± t · S · K.

The accuracy of the forecast based on the regression equation can be affected by various reasons. First of all, it should be taken into account that the evaluation of the quality of the equation and its parameters is based on the assumption of a normal distribution of random residuals. Violation of this assumption may be due to the presence of sharply different values ​​in the data, with non-uniform variation, with the presence of a non-linear relationship. In this case, the quality of the forecast is reduced. The second point to keep in mind is that the values ​​of the factors taken into account when predicting the result should not go beyond the range of variation in the data on which the equation is built.
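The forecast-interval machinery described above can be sketched as follows. The data are invented, and t_tab = 2.776 is the two-sided 5% critical value of Student's distribution for n − 2 = 4 degrees of freedom.

```python
# Sketch: standard error of estimation S (7.8), the shift correction
# factor (7.11), and the forecast interval yhat_p +/- t * S * K.
import math

x = [1, 2, 3, 4, 5, 6]                    # hypothetical factor values
y = [2.2, 3.8, 6.1, 7.9, 10.2, 11.8]      # hypothetical result values
n = len(x)
xm, ym = sum(x) / n, sum(y) / n
sxx = sum((xi - xm) ** 2 for xi in x)
b = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
a = ym - b * xm
# Standard error of estimation, paired case (k = 1): formula (7.8).
S = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
xp = 7.0                                  # factor value for the forecast
K = math.sqrt(1 + 1.0 / n + (xp - xm) ** 2 / sxx)   # correction factor (7.11)
t_tab = 2.776
yp = a + b * xp                           # point forecast
interval = (yp - t_tab * S * K, yp + t_tab * S * K)
```

Since xp lies outside the observed range of x, K is noticeably above 1, widening the interval exactly as the text warns.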


We will check the significance of the regression equation using Fisher's F-test.

The value of Fisher's F-test can be found in the Analysis of Variance table of the Excel output. The tabular value of the F-criterion at significance level α = 0.05 and degrees of freedom v1 = k = 2 and v2 = n − k − 1 = 50 − 2 − 1 = 47 is approximately 3.2.

Since Fcalc > Ftabl, the regression equation should be recognized as significant, that is, it can be used for analysis and forecasting.

The assessment of the significance of the coefficients of the obtained model, using the results of the Excel report, can be done in three ways.

The coefficient of the regression equation is recognized as significant if:

1) the observed value of Student's t-statistic for this coefficient is greater than the critical (tabular) value of Student's statistic (for a given significance level, e.g. α = 0.05, and the number of degrees of freedom df = n − k − 1, where n is the number of observations and k is the number of factors in the model);

2) Student's t-statistic p-value for this coefficient is less than the significance level, for example, α = 0.05;

3) the confidence interval for this coefficient, calculated with a certain confidence probability (for example, 95%), does not contain zero inside itself, that is, the lower 95% and upper 95% boundaries of the confidence interval have the same signs.

Let us check the significance of the coefficients a1 and a2 by the second and third methods:

p-value (a1) = 0.00 < 0.01 < 0.05;

p-value (a2) = 0.00 < 0.01 < 0.05.

Therefore, the coefficients a1 and a2 are significant at the 1% level, and all the more so at the 5% significance level. The lower and upper 95% boundaries of the confidence interval have the same signs, so the coefficients a1 and a2 are significant.
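The three checks listed above can be applied mechanically to the values a report provides. The numbers below are hypothetical stand-ins for an Excel protocol, and t_crit is an assumed table value.

```python
# Sketch of the three equivalent significance checks for one coefficient,
# using report-style values (all numbers here are hypothetical).

alpha = 0.05
t_crit = 2.01            # assumed table value for the relevant df
coef = {"t_stat": 5.4, "p_value": 0.000003,
        "ci_low": 0.14, "ci_high": 0.31}

check1 = abs(coef["t_stat"]) > t_crit         # 1) |t| exceeds the critical value
check2 = coef["p_value"] < alpha              # 2) p-value below the significance level
check3 = coef["ci_low"] * coef["ci_high"] > 0 # 3) CI bounds share the same sign
significant = check1 and check2 and check3
```

For a genuinely significant coefficient the three checks always agree, since they are three views of the same t-test.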

Determining the explanatory variable on which the variance of the random disturbances may depend

Checking the fulfillment of the homoscedasticity condition for the residuals using the Goldfeld–Quandt test

When testing the OLS premise of homoscedasticity of the residuals in a multiple regression model, one should first determine for which of the factors the variance of the residuals is most disturbed. This can be done by visually examining the residual plots for each of the factors included in the model. The explanatory variable on which the variance of the random disturbances depends most strongly is the one by whose increasing actual values the observations are ordered in the Goldfeld–Quandt test. The plots are easy to obtain in the report generated by the Regression tool of the Data Analysis package.

Graphs of residuals for each of the factors of the two-factor model

From the plots presented, it can be seen that the variance of the residuals is most disturbed with respect to the factor Short-term receivables.

Let us check the presence of homoscedasticity in the residuals of the two-factor model based on the Goldfeld–Quandt test.

1. Sort the variables Y and X2 in ascending order of the factor X4 (in Excel: Data - Sort, ascending by X4). The sorted data are then used in the following steps.

2. Remove C = n/4 = 50/4 = 12.5 ≈ 12 values from the middle of the ordered set. As a result, we obtain two populations, with small and with large values of X4, respectively.

3. For each population, perform the calculations:

(Table: the squared residuals and their sums for the two populations.)

Equations for the two populations:

Y = -27275.746 + 0.126X2 + 1.817X4

Y = 61439.511 + 0.228X2 + 0.140X4

The results in this table were obtained by applying the Regression tool in turn to each of the obtained populations.

4. Find the ratio of the resulting residual sums of squares (the larger sum must be in the numerator): R = RSS_larger / RSS_smaller.

5. The conclusion about the homoscedasticity of the residuals is made using Fisher's F-test at significance level α = 0.05 and two equal numbers of degrees of freedom k1 = k2 = (n − C)/2 − p = 17, where p is the number of parameters of the regression equation:

F_table(0.05; 17; 17) = 9.28.

Since F_table > R, homoscedasticity of the residuals of the two-factor regression is confirmed.
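The Goldfeld–Quandt steps above can be sketched for a paired model as follows. The data and the fitted model are invented, and F_tab = 9.28 is the table value quoted above.

```python
# Goldfeld-Quandt sketch: sort by the suspect factor, drop the middle
# quarter, fit OLS to each tail, and compare the residual sums of squares.

def fit_rss(pts):
    """OLS fit y = a + b*x on (x, y) pairs; return residual sum of squares."""
    n = len(pts)
    xm = sum(p[0] for p in pts) / n
    ym = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - xm) ** 2 for p in pts)
    b = sum((p[0] - xm) * (p[1] - ym) for p in pts) / sxx
    a = ym - b * xm
    return sum((p[1] - (a + b * p[0])) ** 2 for p in pts)

# Hypothetical data already sorted by x; the noise amplitude grows
# with x, mimicking heteroscedastic disturbances.
data = [(i, 2.0 * i + (-1) ** i * 0.1 * i) for i in range(1, 21)]
n = len(data)
C = n // 4                     # observations dropped from the middle
k = (n - C) // 2               # size of each tail population
rss_low, rss_high = fit_rss(data[:k]), fit_rss(data[-k:])
R = max(rss_low, rss_high) / min(rss_low, rss_high)   # larger sum on top
homoscedastic = R < 9.28       # compare with F_table(0.05; k1; k2)
```

A large R signals that the residual variance differs between the small-x and large-x populations, i.e. heteroscedasticity.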

Estimation of the significance of the parameters of the regression equation

The significance of the parameters of the linear regression equation is estimated using Student's t-test:

if t_calc > t_cr, the null hypothesis H0 (that the parameter equals zero) is rejected, which indicates the statistical significance of the regression parameters;

if t_calc < t_cr, the null hypothesis is not rejected, which indicates the statistical insignificance of the regression parameters.

where m a , m b are the standard errors of the parameters a and b:

m a = √( S²res · Σx² / ( n · Σ(x − x̄)² ) ), (2.19)

m b = √( S²res / Σ(x − x̄)² ), where S²res = Σ(y − ŷ)² / (n − 2). (2.20)

The critical (tabular) value of the criterion is found from the statistical tables of the Student's distribution (Appendix B) or in Excel (the "Statistical" section of the Function Wizard):

t cr = TINV( α = 1 − P; k = n − 2 ), (2.21)

where k = n − 2 is the number of degrees of freedom.
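A sketch of this parameter significance check in Python, using the standard-error formulas (2.19) and (2.20). The data are illustrative, not taken from the text:

```python
# t-test for the parameters a and b of a paired linear regression y = a + b*x.
import numpy as np
from scipy import stats

x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2])
n = len(x)

# OLS estimates of the slope b and intercept a.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)
s2 = np.sum(resid ** 2) / (n - 2)                 # residual variance S^2_res

m_b = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))   # (2.20)
m_a = m_b * np.sqrt(np.sum(x ** 2) / n)           # (2.19)

t_a, t_b = abs(a) / m_a, abs(b) / m_b
# Excel's TINV(0.05; n-2) equals this two-sided critical value.
t_cr = stats.t.ppf(1 - 0.05 / 2, n - 2)
print(t_b > t_cr)  # True: the slope b is statistically significant here
```

A parameter is recognized as significant when its |t| statistic exceeds t cr.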

The estimate of statistical significance can also be applied to the linear correlation coefficient:

t r = r yx / m r , (2.22)

where m r is the standard error of the correlation coefficient r yx :

m r = √( (1 − r²yx) / (n − 2) ). (2.23)
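The significance check for the correlation coefficient can be sketched as follows, using formula (2.23) for its standard error (the data are again illustrative):

```python
# Significance test for the linear correlation coefficient r_yx.
import numpy as np
from scipy import stats

x = np.array([1.0, 2, 3, 4, 5, 6, 7, 8])
y = np.array([1.2, 2.1, 2.8, 4.3, 4.9, 6.2, 6.8, 8.1])
n = len(x)

r = np.corrcoef(x, y)[0, 1]
m_r = np.sqrt((1 - r ** 2) / (n - 2))   # (2.23)
t_r = abs(r) / m_r                      # (2.22)
t_cr = stats.t.ppf(0.975, n - 2)
print(t_r > t_cr)  # True: the correlation is statistically significant here
```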

Below are the task options for practical and laboratory work on the topics of this second section.

Questions for self-examination in section 2

1. Specify the main components of the econometric model and their essence.

2. The main content of the stages of econometric research.

3. Essence of approaches to determine the parameters of linear regression.

4. The essence and peculiarity of the application of the least squares method in determining the parameters of the regression equation.

5. What indicators are used to assess the closeness of the relationship of the studied factors?

6. The essence of the linear correlation coefficient.

7. The essence of the coefficient of determination.

8. The essence and main features of the procedures for assessing the adequacy (statistical significance) of regression models.

9. Assessment of the adequacy of linear regression models by the coefficient of approximation.

10. The essence of the approach for assessing the adequacy of regression models by the Fisher criterion. Determination of empirical and critical values ​​of the criterion.

11. The essence of the concept of "dispersion analysis" in relation to econometric studies.

12. The essence and main features of the procedure for assessing the significance of the parameters of the linear regression equation.

13. Features of the application of the Student's distribution in assessing the significance of the parameters of the linear regression equation.

14. What is the task of forecasting single values ​​of the studied socio-economic phenomenon?

1. Build a correlation field and formulate an assumption about the form of the relationship equation of the studied factors;

2. Write down the basic equations of the least squares method, make the necessary transformations, compile a table for intermediate calculations and determine the parameters of the linear regression equation;

3. Verify the correctness of the calculations performed using standard procedures and functions of electronic Excel tables.

4. Analyze the results, formulate conclusions and recommendations.

1. Calculation of the value of the linear correlation coefficient;

2. Construction of a dispersion analysis table;

3. Assessment of the coefficient of determination;

4. Verify the correctness of the calculations performed using standard procedures and functions of Excel spreadsheets.

5. Analyze the results, formulate conclusions and recommendations.

4. Conduct an overall assessment of the adequacy of the chosen regression equation;

1. Assessment of the adequacy of the equation by the values ​​of the approximation coefficient;

2. Assessment of the adequacy of the equation by the values ​​of the coefficient of determination;

3. Assessment of the adequacy of the equation by the Fisher criterion;

4. Conduct a general assessment of the adequacy of the parameters of the regression equation;

5. Verify the correctness of the calculations performed using standard procedures and functions of Excel spreadsheets.

6. Analyze the results, formulate conclusions and recommendations.

1. Using the standard procedures of the Excel Spreadsheet Function Wizard (from the "Mathematical" and "Statistical" sections);

2. Data preparation and features of using the "LINEST" function;

3. Data preparation and features of using the "PREDICTION" function.

1. Using the standard procedures of the Excel spreadsheet data analysis package;

2. Preparation of data and features of the application of the "REGRESSION" procedure;

3. Interpretation and generalization of the regression analysis table data;

4. Interpretation and generalization of the data of the dispersion analysis table;

5. Interpretation and generalization of the data of the table for assessing the significance of the parameters of the regression equation;

When performing laboratory work according to one of the options, it is necessary to perform the following particular tasks:

1. Make a choice of the form of the equation of the relationship of the studied factors;

2. Determine the parameters of the regression equation;

3. Assess the closeness of the relationship of the studied factors;

4. Assess the adequacy of the selected regression equation;

5. Evaluate the statistical significance of the parameters of the regression equation.

6. Verify the correctness of the calculations performed using standard procedures and functions of Excel spreadsheets.

7. Analyze the results, formulate conclusions and recommendations.

Tasks for practical and laboratory work on the topic "Paired linear regression and correlation in econometric studies."

Option 1 Option 2 Option 3 Option 4 Option 5
x y x y x y x y x y
Option 6 Option 7 Option 8 Option 9 Option 10
x y x y x y x y x y