
Estimation of the regression equation. Assessing the Significance of the Multiple Regression Equation

To assess the significance of the correlation coefficient, Student's t-test is used.

The average error of the correlation coefficient is found by the formula:

σ_r = √((1 - r²)/(n - 2)),

and based on this error the t-test is calculated:

t = r/σ_r = r·√(n - 2)/√(1 - r²).

The calculated value of the t-test is compared with the tabular value found in the Student's distribution table at a significance level of 0.05 or 0.01 and the number of degrees of freedom n-1. If the calculated value of the t-test is greater than the tabulated one, then the correlation coefficient is recognized as significant.
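For illustration, here is a minimal Python sketch of this procedure; the data are invented, and the n - 1 degrees of freedom follow the convention used in this text (n - 2 is also common in other sources).

```python
# A minimal sketch of the t-test for the correlation coefficient.
# The data below are illustrative, not from the worked example.
import numpy as np
from scipy import stats

x = np.array([20.1, 22.3, 19.8, 25.0, 23.4, 21.7, 24.2, 20.9, 26.1, 18.7])
y = np.array([2.1, 1.9, 2.2, 1.6, 1.8, 2.0, 1.7, 2.1, 1.5, 2.3])
n = len(x)

r = np.corrcoef(x, y)[0, 1]            # correlation coefficient
se_r = np.sqrt((1 - r**2) / (n - 2))   # average error of r
t_calc = r / se_r                      # calculated t-test

# Two-sided critical value at significance level 0.05, df = n - 1
# (the convention used in this text).
t_crit = stats.t.ppf(1 - 0.05 / 2, n - 1)

print(f"r = {r:.3f}, t = {t_calc:.2f}, t_crit = {t_crit:.2f}")
print("correlation coefficient significant:", abs(t_calc) > t_crit)
```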

With a curvilinear relationship, the F-test is used to assess the significance of the correlation relationship and of the regression equation. It is calculated by the formula:

F = (η²/(1 - η²)) · ((n - m)/(m - 1)),

or, equivalently, as the ratio of the factor variance to the residual variance per one degree of freedom,

where η is the correlation ratio; n is the number of observations; m is the number of parameters in the regression equation.

The calculated value of F is compared with the tabular value for the accepted significance level α (0.05 or 0.01) and the numbers of degrees of freedom k1 = m - 1 and k2 = n - m. If the calculated value of F exceeds the tabulated one, the relationship is recognized as significant.

The significance of a regression coefficient is established using Student's t-test, which is calculated by the formula:

t_{a_i} = a_i / σ_{a_i},

where σ²_{a_i} is the variance of the regression coefficient.

It is calculated by the formula:

σ²_{a_i} = σ²_resid / Σ(x - x̄)², with σ²_resid = Σ(y - ŷ)² / (n - k - 1),

where k is the number of factor features in the regression equation.

The regression coefficient is recognized as significant if t_{a_1} ≥ t_cr, where t_cr is found in the table of critical points of Student's distribution at the accepted significance level and the number of degrees of freedom k = n - 1.

4.3 Correlation-regression analysis in Excel

Let's carry out a correlation-regression analysis of the relationship between yield and labor costs per 1 quintal of grain. To do this, open an Excel sheet and enter in cells A1:A30 the values of the factor attribute (grain yield) and in cells B1:B30 the values of the effective attribute (labor costs per 1 quintal of grain). From the Tools menu, select Data Analysis, and in the list that opens choose the Regression tool. Click OK, and the Regression dialog box appears on the screen. In the Input Range Y field, enter the values of the resulting attribute (select cells B1:B30); in the Input Range X field, enter the values of the factor attribute (select cells A1:A30). Set the confidence level to 95%, select New Worksheet, and click OK. The RESULTS table appears on the worksheet; it contains the parameters of the regression equation, the correlation coefficient, and the other indicators needed to assess the significance of the correlation coefficient and of the parameters of the regression equation.

RESULTS

Regression statistics: Multiple R; R-square; Adjusted (normalized) R-square; Standard error; Observations.

Analysis of variance: Regression and Residual rows, with the F value and Significance F.

Coefficients table, with a row for the Y-intercept and for Variable X 1, and columns: Coefficients, Standard error, t-statistic, P-value, Lower 95%, Upper 95%, Lower 95.0%, Upper 95.0%.

In this table, "Multiple R" is the correlation coefficient and "R-square" is the coefficient of determination. Under "Coefficients", the "Y-intercept" row gives the free term of the regression equation, 2.836242, and the "Variable X 1" row gives the regression coefficient, -0.06654. The table also contains the value of Fisher's F-test, 74.9876, of Student's t-test, 14.18042, and the standard error, 0.112121, which are needed to assess the significance of the correlation coefficient, of the parameters of the regression equation, and of the equation as a whole.
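Outside Excel, the same RESULTS table can be reproduced in Python; a sketch with the statsmodels package is below. The data are simulated stand-ins for the A1:A30 and B1:B30 ranges, so the printed numbers will only be close to, not identical with, the figures quoted here.

```python
# A sketch reproducing the Excel "Regression" output with statsmodels.
# yield_x plays the role of A1:A30, labour_y of B1:B30 (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
yield_x = rng.uniform(15, 35, 30)                            # grain yield
labour_y = 2.84 - 0.067 * yield_x + rng.normal(0, 0.11, 30)  # labour costs

X = sm.add_constant(yield_x)        # adds the "Y-intercept" column
model = sm.OLS(labour_y, X).fit()

print(model.summary())              # R-square, F, t-statistics, 95% bounds
print("intercept and slope:", model.params)
print("R^2 =", model.rsquared, "  F =", model.fvalue)
```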

Based on the data in the table, we construct the regression equation: ŷ = 2.836 - 0.067x. The regression coefficient a1 = -0.067 means that with an increase in grain yield by 1 quintal/ha, labor costs per 1 quintal of grain decrease by 0.067 man-hours.

The correlation coefficient r = 0.85 > 0.7; therefore, the relationship between the studied features in this population is close. The coefficient of determination r² = 0.73 shows that 73% of the variation of the effective feature (labor costs per 1 quintal of grain) is caused by the action of the factor feature (grain yield).

From the table of critical points of the Fisher–Snedecor distribution, we find the critical value of the F-test at significance level 0.05 and degrees of freedom k1 = m - 1 = 2 - 1 = 1 and k2 = n - m = 30 - 2 = 28; it equals 4.21. Since the calculated value of the criterion is greater than the tabular one (F = 74.9876 > 4.21), the regression equation is recognized as significant.

To assess the significance of the correlation coefficient, we calculate Student's t-test:

t = r·√(n - 2)/√(1 - r²) = 0.85·√28/√(1 - 0.7225) ≈ 8.5.

In the table of critical points of Student's distribution, we find the critical value of the t-test at significance level 0.05 and n - 1 = 30 - 1 = 29 degrees of freedom; it equals 2.0452. Since the calculated value is greater than the tabulated one, the correlation coefficient is significant.

After the linear regression equation has been found, the significance is estimated both of the equation as a whole and of its individual parameters.

To check the significance of the regression equation means to determine whether the mathematical model expressing the relationship between the variables is consistent with the experimental data, and whether the explanatory variables (one or several) included in the equation are sufficient to describe the dependent variable.

Significance testing is based on analysis of variance.

According to the idea of analysis of variance, the total sum of squared deviations of y from its mean value is decomposed into two parts, explained and unexplained:

Σ(y - ȳ)² = Σ(ŷ - ȳ)² + Σ(y - ŷ)²,

or, respectively: the total sum of squares = the factor (explained) sum of squares + the residual sum of squares.

There are two extreme cases here: when the total sum of squares is exactly equal to the residual one, and when the total sum of squares is equal to the factor one.

In the first case, the factor x has no effect on the result: the entire variance of y is due to the influence of other factors, the regression line is parallel to the Ox axis, and the equation takes the form ŷ = ȳ.

In the second case, other factors do not affect the result: y is related to x functionally, and the residual sum of squares is zero.

However, in practice both terms are present on the right-hand side. The suitability of the regression line for forecasting depends on what part of the total variation of y falls on the explained variation. If the explained sum of squares is much greater than the residual one, then the regression equation is statistically significant and the factor x has a significant effect on the result y. This is equivalent to the coefficient of determination approaching unity.

The number of degrees of freedom (df, degrees of freedom) is the number of independently varying values of a feature.

The total sum of squares requires (n - 1) independent deviations, and the factor sum of squares has one degree of freedom. Thus, we can write the balance:

n - 1 = 1 + (n - 2),

from which we determine that the residual sum of squares has n - 2 degrees of freedom.

Dividing each sum of squares by its number of degrees of freedom, we obtain the mean square deviations, or variances per one degree of freedom: the total variance Σ(y - ȳ)²/(n - 1), the factor variance Σ(ŷ - ȳ)²/1, and the residual variance Σ(y - ŷ)²/(n - 2).
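The decomposition and the degrees-of-freedom balance are easy to verify numerically; a small sketch on invented data:

```python
# Checking TSS = ESS + RSS and the variances per one degree of freedom.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])
n = len(x)

b, a = np.polyfit(x, y, 1)             # slope b, intercept a
y_hat = a + b * x

tss = np.sum((y - y.mean())**2)        # total, df = n - 1
ess = np.sum((y_hat - y.mean())**2)    # factor (explained), df = 1
rss = np.sum((y - y_hat)**2)           # residual, df = n - 2

print(f"TSS = {tss:.4f},  ESS + RSS = {ess + rss:.4f}")  # balance holds
print("variances per df:", tss / (n - 1), ess / 1, rss / (n - 2))
```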

Analysis of the statistical significance of linear regression coefficients

Although the theoretical values of the coefficients of the linear dependence equation are assumed to be constants, the estimates a and b of these coefficients, obtained in the course of constructing the equation from random sample data, are random variables. If the regression errors have a normal distribution, then the coefficient estimates are also normally distributed and can be characterized by their mean values and variances. Therefore, the analysis of the coefficients begins with the calculation of these characteristics.

The coefficient variances are calculated by the following formulas.

Variance of the regression coefficient b:

S_b² = S_resid² / Σ(x - x̄)²,

where S_resid² is the residual variance per one degree of freedom.

Variance of the parameter a:

S_a² = S_resid² · Σx² / (n · Σ(x - x̄)²).

Hence, the standard error of the regression coefficient is m_b = √S_b², and the standard error of the parameter a is m_a = √S_a².

They serve to test the null hypotheses that the true value of the regression coefficient b, or of the intercept a, is zero: H0: b = 0 (and, respectively, H0: a = 0).

The alternative hypothesis has the form H1: b ≠ 0 (respectively, H1: a ≠ 0).

The t-statistics t_a = a/m_a and t_b = b/m_b have Student's t-distribution with n - 2 degrees of freedom. From Student's distribution tables, for a chosen significance level α and n - 2 degrees of freedom, the critical value t_cr is found.

If |t| > t_cr, then the null hypothesis must be rejected and the coefficient is considered statistically significant.

If |t| ≤ t_cr, then the null hypothesis cannot be rejected. (If the coefficient b is statistically insignificant, the equation takes the form ŷ = ȳ, which means that there is no relationship between the features. If the coefficient a is statistically insignificant, it is recommended to estimate a new equation of the form ŷ = bx.)

Interval estimates of the coefficients of the linear regression equation:

confidence interval for a: a ± t_cr · m_a;

confidence interval for b: b ± t_cr · m_b.

This means that with the given reliability 1 - α (where α is the significance level) the true values of a and b lie within the indicated intervals.

The regression coefficient has a clear economic interpretation, so the confidence limits of the interval should not contain contradictory results: for example, they should not include zero.
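A sketch putting the coefficient tests and confidence intervals together, under the formulas above (the data are invented; m_a and m_b follow the paired-regression formulas):

```python
# Significance tests and confidence intervals for a and b in y = a + b*x.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.0, 2.8, 4.1, 4.9, 6.2, 6.8, 8.1, 8.8])
n = len(x)

b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)
s2 = np.sum(resid**2) / (n - 2)                # residual variance per df

sxx = np.sum((x - x.mean())**2)
m_b = np.sqrt(s2 / sxx)                        # standard error of b
m_a = np.sqrt(s2 * np.sum(x**2) / (n * sxx))   # standard error of a

t_crit = stats.t.ppf(1 - 0.05 / 2, n - 2)
print(f"t_a = {a / m_a:.2f},  t_b = {b / m_b:.2f},  t_crit = {t_crit:.2f}")
print("CI for a:", (a - t_crit * m_a, a + t_crit * m_a))
print("CI for b:", (b - t_crit * m_b, b + t_crit * m_b))
```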

Analysis of the statistical significance of the equation as a whole.

The Fisher distribution in regression analysis

The significance of the regression equation as a whole is assessed using Fisher's F-test. In this case, the null hypothesis is put forward that all regression coefficients, except the free term a, are equal to zero and, consequently, that the factor x has no effect on the result y (H0: b = 0).

The value of the F-test is related to the coefficient of determination. For multiple regression:

F = (R²/(1 - R²)) · ((n - m - 1)/m),

where m is the number of independent variables.

For paired regression, the formula of the F-statistic takes the form:

F = (r²/(1 - r²)) · (n - 2).

When finding the tabular value of the F-test, a significance level is set (usually 0.05 or 0.01) together with two numbers of degrees of freedom: k1 = m and k2 = n - m - 1 in the case of multiple regression, and k1 = 1 and k2 = n - 2 for paired regression.

If F > F_cr, then H0 is rejected and a conclusion is made about the significance of the statistical relationship between y and x.

If F < F_cr, then the null hypothesis is not rejected and the regression equation is considered statistically insignificant.

Comment. In paired linear regression t_b² = F, and likewise t_r² = F; thus, testing the hypotheses about the significance of the regression and correlation coefficients is equivalent to testing the hypothesis about the significance of the linear regression equation.
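A sketch of the F-test via R²; the values r² = 0.73 and n = 30 are taken from the worked example above (rounding of r² explains the small difference from the F = 74.99 reported by Excel), and the last line checks the paired-regression identity F = t².

```python
# F-test of the equation as a whole, computed from R^2.
import numpy as np
from scipy import stats

r2, n, m = 0.73, 30, 1                    # R^2, observations, regressors
F = (r2 / (1 - r2)) * (n - m - 1) / m
F_crit = stats.f.ppf(1 - 0.05, m, n - m - 1)

print(f"F = {F:.2f}, F_crit = {F_crit:.2f}, significant: {F > F_crit}")
print("t for the slope in the paired case:", np.sqrt(F))   # t = sqrt(F)
```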

The Fisher distribution can be used to test not only the hypothesis that all linear regression coefficients are simultaneously equal to zero, but also the hypothesis that some of these coefficients are equal to zero. This is important in the development of a linear regression model, as it allows assessing the validity of excluding individual variables or groups of variables from the explanatory variables, or, conversely, of including them.

Suppose, for example, that a multiple linear regression with m explanatory variables was first estimated over n observations, with coefficient of determination R1²; the last k variables are then excluded from the list of explanatory variables, and an equation is estimated for which the coefficient of determination is R2² (R2² ≤ R1², because each additional variable explains a portion, however small, of the variation of the dependent variable).

To test the hypothesis that all the coefficients of the excluded variables are simultaneously equal to zero, the value

F = ((R1² - R2²)/k) / ((1 - R1²)/(n - m - 1))

is calculated, which has a Fisher distribution with k and n - m - 1 degrees of freedom.

From Fisher's distribution tables, at the given significance level, the critical value F_cr is found. If F > F_cr, the null hypothesis is rejected; in this case it is incorrect to exclude all k variables from the equation.

Similar reasoning can be carried out about the validity of including one or several (k) new explanatory variables in the regression equation.

In this case the same F-statistic is calculated,

F = ((R1² - R2²)/k) / ((1 - R1²)/(n - m - 1)),

where R1² now belongs to the equation with the new variables included and R2² to the equation without them; it has the same Fisher distribution. If it exceeds the critical value, then the inclusion of the new variables explains a significant part of the previously unexplained variance of the dependent variable (i.e., the inclusion of the new explanatory variables is justified).
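A sketch of this nested-model F-test; all numbers (n, m, k and the two R² values) are assumed purely for illustration.

```python
# F-test for excluding k explanatory variables from a model with m of them.
from scipy import stats

n, m, k = 50, 5, 2
R2_full, R2_reduced = 0.80, 0.77     # before and after dropping k variables

F = ((R2_full - R2_reduced) / k) / ((1 - R2_full) / (n - m - 1))
F_crit = stats.f.ppf(1 - 0.05, k, n - m - 1)

print(f"F = {F:.2f}, F_crit = {F_crit:.2f}")
# If F > F_crit, the dropped variables carried real explanatory power,
# so excluding all k of them is not justified.
print("exclusion justified:", F <= F_crit)
```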

Remarks. 1. It is advisable to include new variables one at a time.

2. When considering the inclusion of explanatory variables in the equation, it is desirable to base the F-statistic on the coefficient of determination adjusted for the number of degrees of freedom.

Fisher's F-statistic is also used to test the hypothesis that the regression equations coincide for individual groups of observations.

Suppose there are two samples containing n1 and n2 observations, respectively, and that a regression equation of the same form was estimated for each of them. Let the residual sums of squared deviations from the regression line equal S1 and S2 for them, respectively.

The null hypothesis is tested that all the corresponding coefficients of these equations are equal to each other, i.e., that the regression equation is the same for both samples.

Let a regression equation of the same form be estimated over all n1 + n2 observations at once, with residual sum of squares S.

Then the F-statistic is calculated by the formula:

F = ((S - S1 - S2)/p) / ((S1 + S2)/(n1 + n2 - 2p)),

where p is the number of parameters of the equation. It has a Fisher distribution with p and n1 + n2 - 2p degrees of freedom. The F-statistic will be close to zero if the equation is the same for both samples, because then S ≈ S1 + S2; so if F < F_cr, the null hypothesis is accepted.

If F > F_cr, the null hypothesis is rejected, and a single regression equation cannot be constructed.
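A sketch of this test (the Chow test) on two simulated sub-samples; p = 2 parameters per paired equation, and the data are invented.

```python
# Chow test: compare pooled vs separate paired regressions via their RSS.
import numpy as np
from scipy import stats

def rss(x, y):
    """Residual sum of squares of a fitted paired regression."""
    b, a = np.polyfit(x, y, 1)
    return np.sum((y - (a + b * x))**2)

rng = np.random.default_rng(1)
x1 = np.arange(1.0, 11.0);  y1 = 1.0 + 0.5 * x1 + rng.normal(0, 0.2, 10)
x2 = np.arange(1.0, 13.0);  y2 = 1.1 + 0.5 * x2 + rng.normal(0, 0.2, 12)

n1, n2, p = len(x1), len(x2), 2
s1, s2 = rss(x1, y1), rss(x2, y2)
s = rss(np.concatenate([x1, x2]), np.concatenate([y1, y2]))  # pooled fit

F = ((s - s1 - s2) / p) / ((s1 + s2) / (n1 + n2 - 2 * p))
F_crit = stats.f.ppf(1 - 0.05, p, n1 + n2 - 2 * p)
print(f"F = {F:.2f}, F_crit = {F_crit:.2f}")
print("a single pooled equation is acceptable:", F <= F_crit)
```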

After the regression equation has been built and its accuracy estimated using the coefficient of determination, the question remains open of how this accuracy was achieved and, accordingly, whether the equation can be trusted. The point is that the regression equation was built not from the general population, which is unknown, but from a sample drawn from it. Points from the general population fall into the sample randomly; therefore, in accordance with probability theory, it is possible, among other cases, that a sample from a "broad" general population turns out to be "narrow" (Fig. 15).

Fig. 15. A possible variant of how sample points may be drawn from the general population.

In this case:

a) the regression equation built on the sample may differ significantly from the regression equation for the general population, which will lead to forecast errors;

b) the coefficient of determination and other characteristics of accuracy will be unreasonably high and will give a misleading picture of the predictive qualities of the equation.

In the limiting case, it cannot be excluded that, from a general population that is a cloud with its main axis parallel to the horizontal axis (i.e., with no relationship between the variables), random selection will produce a sample whose main axis turns out to be inclined to that axis. Thus, attempts to predict the next values of the general population from its sample data are fraught not only with errors in assessing the strength and direction of the relationship between the dependent and independent variables, but also with the danger of finding a relationship between variables where there is actually none.

In the absence of information about all the points of the general population, the only way to reduce errors in the first case is to use, when estimating the coefficients of the regression equation, a method that ensures their unbiasedness and efficiency. The probability of the second case can be significantly reduced because one property of a general population with two mutually independent variables is known a priori: it is precisely a relationship that is absent in it. This reduction is achieved by checking the statistical significance of the resulting regression equation.

One of the most commonly used verification options is as follows. For the resulting regression equation, the F-statistic is determined: a characteristic of the accuracy of the regression equation equal to the ratio of that part of the variance of the dependent variable which is explained by the regression equation to the unexplained (residual) part of the variance. In the case of multivariate regression the F-statistic is determined as

F = (ESS/m) / (RSS/(n - m - 1)),

where:

ESS is the explained sum of squares, the part of the variance of the dependent variable Y that is explained by the regression equation;

RSS is the residual sum of squares, the part of the variance of the dependent variable Y that is not explained by the regression equation and whose presence is a consequence of the action of the random component;

n is the number of points in the sample;

m is the number of variables in the regression equation.

As can be seen from the above formula, the variances are defined as the quotient of the corresponding sum of squares divided by its number of degrees of freedom. The number of degrees of freedom is the minimum number of values of the dependent variable that are sufficient to obtain the desired sample characteristic and that can vary freely, given that all the other quantities used to calculate this characteristic are known for the sample.

To obtain the residual variance, the coefficients of the regression equation are needed. In the case of paired linear regression there are two coefficients, so, in accordance with the formula (taking m = 1), the number of degrees of freedom is n - 2. This means that to determine the residual variance it is sufficient to know the coefficients of the regression equation and only n - 2 values of the dependent variable from the sample; the remaining two values can be calculated from these data and are therefore not freely variable.

To calculate the explained variance, the values of the dependent variable are not required at all, since it can be computed knowing the regression coefficients of the independent variables and the variance of the independent variable; to see this, it suffices to recall the expression given earlier. Therefore, the number of degrees of freedom of the explained variance is equal to the number of independent variables in the regression equation (one for paired linear regression).

As a result, the F-test for the paired linear regression equation is determined by the formula:

F = (r²/(1 - r²)) · (n - 2).

In probability theory it has been proven that the F-statistic of a regression equation obtained for a sample from a general population in which there is no relationship between the dependent and independent variables has a Fisher distribution, which has been studied quite thoroughly. Thanks to this, for any value of the F-statistic it is possible to calculate the probability of its occurrence and, conversely, to determine the value of the F-statistic that cannot be exceeded with a given probability.

To carry out the statistical check of the significance of the regression equation, a null hypothesis is formulated that there is no relationship between the variables (all coefficients of the variables are equal to zero), and a significance level α is selected.

The significance level is the acceptable probability of committing a Type I error: rejecting a true null hypothesis as a result of the check. In this case, committing a Type I error means concluding from the sample that there is a relationship between the variables in the general population when in fact there is none.

The significance level is usually taken to be 5% or 1%. The smaller α is, the higher the test reliability level 1 - α, i.e., the greater the chance of avoiding the sampling error of asserting a relationship between population variables that are actually unrelated. But as the significance level is lowered, the risk of committing a Type II error grows: failing to reject a false null hypothesis, i.e., failing to notice in the sample an actual relationship of the variables in the general population. Therefore, depending on which error has the graver negative consequences, one or the other significance level is chosen.

For the selected significance level, a tabular value F_cr is determined from the Fisher distribution: the value whose probability of being exceeded, in a sample of size n drawn from a general population with no relationship between the variables, does not exceed the significance level. It is compared with the actual value of the F-statistic of the regression equation.

If the condition F > F_cr is met, then an erroneous detection of a relationship, with a value of the F-statistic equal to or greater than this, in a sample from a general population with unrelated variables would occur with a probability below the significance level. Following the principle that "very rare events do not happen", we conclude that the relationship between the variables established from the sample is also present in the general population from which it was obtained.

If F < F_cr, then the regression equation is not statistically significant; in other words, there is a real probability that a relationship which does not exist in reality has been found in the sample. An equation that fails the test for statistical significance is treated like an expired drug: such medicines are not necessarily spoiled, but since there is no confidence in their quality, they are preferred not to be used. This rule does not protect against all errors, but it allows the grossest ones to be avoided, which is also quite important.

The second verification option, more convenient when using spreadsheets, is to compare the probability of occurrence of the obtained value of the F-statistic (the "Significance F" value in the output table) with the significance level. If this probability is below the significance level α, then the equation is statistically significant; otherwise it is not.
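A sketch of this second variant: instead of looking up F_cr, compute the tail probability of the observed F (this is what Excel reports as "Significance F") and compare it with α.

```python
# p-value check of the regression equation's F-statistic.
from scipy import stats

F_obs, m, n = 74.99, 1, 30                  # values from the example above
p_value = stats.f.sf(F_obs, m, n - m - 1)   # upper-tail probability
alpha = 0.05
print(f"p = {p_value:.2e}, significant: {p_value < alpha}")
```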

After the statistical significance of the regression equation has been checked, it is generally useful, especially for multivariate dependencies, to check also the statistical significance of the obtained regression coefficients. The ideology of this check is the same as for the equation as a whole, but Student's t-test is used as the criterion, determined by the formulas:

t_a = a / m_a and t_b = b / m_b,

where t_a and t_b are the values of Student's criterion for the coefficients a and b, respectively; m_a and m_b are their standard errors, computed from the residual variance of the regression equation; n is the number of points in the sample; and m is the number of variables in the regression equation (m = 1 for paired linear regression).

The obtained actual values of Student's criterion are compared with the tabular values t_cr obtained from Student's distribution. If |t| > t_cr, then the corresponding coefficient is statistically significant; otherwise it is not. The second option for checking the statistical significance of the coefficients is to determine the probability of occurrence of Student's t-statistic (its p-value) and compare it with the significance level α.

Variables whose coefficients are not statistically significant most likely have no effect at all on the dependent variable in the general population. Therefore, one should either increase the number of points in the sample (then the coefficient may become statistically significant, and its value will be refined at the same time) or find, as independent variables, others that are more closely related to the dependent variable. In either case, the forecasting accuracy will increase.

As an express method for assessing the significance of the coefficients of a regression equation, the following rule can be applied: if Student's criterion exceeds 3, such a coefficient, as a rule, turns out to be statistically significant. In general, it is considered that, for a regression equation to be statistically significant, a certain relationship between the sample size and the number of estimated parameters must also hold.

The standard error of forecasting the unknown value y_p from the obtained regression equation, for a known x_p, is estimated (for paired linear regression) by the formula:

S_pred = S_resid · √(1 + 1/n + (x_p - x̄)²/Σ(x - x̄)²).

Thus, a forecast with a confidence level of 68% can be represented as ŷ(x_p) ± S_pred.

If a different confidence level is required, then for the corresponding significance level α it is necessary to find Student's criterion t_α, and the confidence interval for a forecast with reliability 1 - α will be ŷ(x_p) ± t_α · S_pred.
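A sketch of the forecast interval for one new point x_p, under the paired-regression error formula assumed above (invented data):

```python
# Forecast and confidence interval for a new observation x_p.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.0, 2.8, 4.1, 4.9, 6.2, 6.8, 8.1, 8.8])
n = len(x)

b, a = np.polyfit(x, y, 1)
s2 = np.sum((y - (a + b * x))**2) / (n - 2)       # residual variance
sxx = np.sum((x - x.mean())**2)

x_p = 9.0
y_p = a + b * x_p                                 # point forecast
se_pred = np.sqrt(s2 * (1 + 1 / n + (x_p - x.mean())**2 / sxx))

t_crit = stats.t.ppf(1 - 0.05 / 2, n - 2)         # 95% reliability
print(f"forecast: {y_p:.2f} +/- {t_crit * se_pred:.2f}")
print(f"68% band (one standard error): {y_p:.2f} +/- {se_pred:.2f}")
```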

Prediction of multidimensional and non-linear dependencies

If the predicted value depends on several independent variables, then there is a multivariate regression of the form:

y = a + b1·x1 + b2·x2 + … + bm·xm,

where b1, …, bm are regression coefficients describing the influence of the variables x1, …, xm on the predicted value.

The technique for determining the regression coefficients does not differ from that of paired linear regression, especially when using a spreadsheet, since the same function is used there for both paired and multivariate linear regression. It is desirable that there be no relationships between the independent variables, i.e., that changing one variable not affect the values of the others. But this requirement is not mandatory; what matters is that there be no functional linear dependencies among the variables. The procedures described above for checking the statistical significance of the resulting regression equation and of its individual coefficients, and the assessment of forecasting accuracy, remain the same as in the case of paired linear regression. At the same time, the use of multivariate regression instead of paired regression, with an appropriate choice of variables, usually allows a significant improvement in the accuracy of describing the behavior of the dependent variable, and hence in forecasting accuracy.

In addition, equations of multivariate linear regression make it possible to describe a non-linear dependence of the predicted value on the independent variables. The procedure of reducing a non-linear equation to linear form is called linearization. In particular, if the dependence is described by a polynomial of degree different from 1, then, by replacing the variables whose degrees differ from unity with new first-degree variables, we obtain a multivariate linear regression problem instead of a non-linear one. So, for example, if the influence of the independent variable is described by a parabola of the form

y = a + b1·x + b2·x²,

then the replacement x1 = x, x2 = x² allows us to transform the non-linear problem into a multidimensional linear one of the form

y = a + b1·x1 + b2·x2.

Non-linear problems in which the non-linearity arises because the predicted value depends on a product of independent variables are converted just as easily: to account for this effect, it is enough to introduce a new variable equal to that product.
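A sketch of linearization by substitution: the parabola fit below is solved as an ordinary multivariate linear regression with columns x and x² (simulated data).

```python
# Fitting y = a + b1*x + b2*x^2 as a linear problem in (a, b1, b2).
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 5.0, 20)
y = 1.0 + 2.0 * x - 0.3 * x**2 + rng.normal(0, 0.1, 20)

X = np.column_stack([np.ones_like(x), x, x**2])   # new variables x1=x, x2=x^2
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print("a, b1, b2 =", coef)
```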

In those cases where the non-linearity is described by more complex dependencies, linearization is possible through a transformation of coordinates. For this, transformed values of the variables are calculated, and graphs of the initial points are built in various combinations of the transformed variables. The combination of transformed coordinates, or of transformed and untransformed coordinates, in which the dependence is closest to a straight line suggests the change of variables that will reduce the non-linear dependence to linear form. For example, a non-linear power dependence of the form

y = a·x^b

turns into a linear one in logarithmic coordinates:

ln y = ln a + b·ln x.

The regression coefficients obtained for the transformed equation remain unbiased and efficient, but the equation and its coefficients cannot be tested for statistical significance.

Checking the validity of applying the least squares method

The use of the least squares method ensures the efficiency and unbiasedness of the estimates of the coefficients of the regression equation provided that the following (Gauss–Markov) conditions are met:

1. the mathematical expectation of the residuals is zero;

2. the variance of the residuals is constant;

3. the residual values do not depend on each other;

4. the residual values do not depend on the independent variables.

The easiest way to check whether these conditions are met is to plot the residuals against ŷ and then against the independent variable(s). If the points on these graphs lie in a corridor symmetric about the x-axis and no regularities are seen in their arrangement, then the Gauss–Markov conditions are met and there is no scope for improving the accuracy of the regression equation. Otherwise the accuracy of the equation can be improved significantly, and for this one should turn to the specialized literature.
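A sketch of this graphical check: plot the residuals against the fitted values and against x, and look for a symmetric, patternless band around zero (simulated data).

```python
# Residual plots for an eyeball check of the Gauss-Markov conditions.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
x = np.linspace(1.0, 10.0, 40)
y = 2.0 + 0.8 * x + rng.normal(0, 0.3, 40)

b, a = np.polyfit(x, y, 1)
fitted = a + b * x
resid = y - fitted

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].scatter(fitted, resid)
axes[0].set_xlabel("fitted"); axes[0].set_ylabel("residual")
axes[1].scatter(x, resid)
axes[1].set_xlabel("x")
for ax in axes:
    ax.axhline(0.0, linestyle="--")   # the band should straddle this line
plt.tight_layout()
plt.show()
```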

Final tests in econometrics

1. The assessment of the significance of the parameters of the regression equation is carried out on the basis of:

A) t - Student's criterion;

b) the Fisher–Snedecor F-test;

c) mean square error;

d) average approximation error.

2. The regression coefficient in the equation characterizing the relationship between the volume of sales (million rubles) and the profit of enterprises in the automotive industry for the year (million rubles) means that with an increase in the volume of sales by 1 million rubles, profit increases by:

d) 0.5 million rub.;

c) 500 thousand rub.;

D) 1.5 million rubles

3. The correlation ratio (correlation index) measures the degree of closeness of the relationship between X and Y:

a) only with a non-linear form of dependence;

B) with any form of dependence;

c) only with a linear relationship.

4. By the direction of the relationship, one distinguishes:

a) moderate;

B) direct;

c) rectilinear.

5. Based on 17 observations, a regression equation was built. To check the significance of the equation, the observed value of the t-statistic was calculated: 3.9. Conclusion:

A) The equation is significant at α = 0.05;

b) The equation is insignificant at α = 0.01;

c) The equation is insignificant at α = 0.05.

6. What are the consequences of violating the OLS assumption that "the expectation of the regression residuals equals zero"?

A) Biased estimates of regression coefficients;

b) Efficient but inconsistent estimates of regression coefficients;

c) Inefficient estimates of regression coefficients;

d) Inconsistent estimates of regression coefficients.

7. Which of the following statements is true in case of heteroskedasticity of residuals?

A) Conclusions on t and F-statistics are unreliable;

d) Estimates of the parameters of the regression equation are biased.

8. What is Spearman's rank correlation test based on?

A) On the use of t-statistics;

c) On the use of …;

9. What is the White test based on?

b) On the use of F-statistics;

B) On the use of …;

d) On the graphical analysis of the residuals.

10. What method can be used to eliminate autocorrelation?

11. What is the violation of the assumption of the constancy of the variance of residuals called?

a) Multicollinearity;

b) Autocorrelation;

B) Heteroskedasticity;

d) Homoscedasticity.

12. Dummy variables are introduced into:

a) only in linear models;

b) only in multiple non-linear regression;

c) only in non-linear models;

D) both linear and non-linear models reduced to a linear form.

13. If in the matrix of paired correlation coefficients there are …, then this shows:

A) About the presence of multicollinearity;

b) About the absence of multicollinearity;

c) About the presence of autocorrelation;

d) About the absence of heteroscedasticity.

14. Which measure cannot be used to get rid of multicollinearity?

a) Increasing the sample size;

D) Transformation of the random component.

15. If … and the rank of matrix A is less than (K - 1), then the equation is:

a) over-identified;

B) not identified;

c) accurately identified.

16. The regression equation looks like:

A) …;

b) …;

c) ….

17. What is the problem of model identification?

A) obtaining uniquely defined parameters of the model given by the system of simultaneous equations;

b) selection and implementation of methods for statistical estimation of unknown parameters of the model according to the initial statistical data;

c) checking the adequacy of the model.

18. What method is used to estimate the parameters of an over-identified equation?

C) DMNK, KMNK;

19. If a qualitative variable has k alternative values, then the simulation uses:

A) (k - 1) dummy variables;

b) k dummy variables;

c) (k + 1) dummy variables.

20. The analysis of the closeness and direction of the relationship between two features is carried out on the basis of:

A) pair correlation coefficient;

b) coefficient of determination;

c) multiple correlation coefficient.

21. In the linear regression equation y = a0 + a1x, the regression coefficient shows:

a) the closeness of the connection;

b) proportion of variance "Y" dependent on "X";

C) how much "Y" will change on average when "X" changes by one unit;

d) correlation coefficient error.

22. What indicator is used to determine the part of the variation due to a change in the value of the factor under study?

a) coefficient of variation;

b) correlation coefficient;

C) coefficient of determination;

d) coefficient of elasticity.

23. The coefficient of elasticity shows:

A) by what percentage the value of y will change when x changes by 1%;

b) by how many units of its measurement the value of y will change when x changes by 1%;

c) by what percentage the value of y will change when x changes by one unit of its measurement.

24. What methods can be applied to detect heteroscedasticity?

A) Goldfeld–Quandt test;

B) Spearman's rank correlation test;

c) Durbin-Watson test.

25. What is the Goldfeld–Quandt test based on?

a) On the use of t-statistics;

B) On the use of F - statistics;

c) On the use of …;

d) On the graphical analysis of the residuals.

26. What methods cannot be used to eliminate the autocorrelation of residuals?

a) Generalized method of least squares;

B) Weighted least squares method;

C) the maximum likelihood method;

D) Two-step method of least squares.

27. What is the violation of the assumption of independence of residuals called?

a) Multicollinearity;

B) Autocorrelation;

c) Heteroskedasticity;

d) Homoscedasticity.

28. What method can be used to eliminate heteroscedasticity?

A) Generalized method of least squares;

b) Weighted least squares method;

c) The maximum likelihood method;

d) Two-step least squares method.

30. If, by the t-test, most of the regression coefficients are statistically significant, while the model as a whole is insignificant by the F-test, this may indicate:

a) Multicollinearity;

B) On the autocorrelation of residuals;

c) On heteroscedasticity of residues;

d) This option is not possible.

31. Is it possible to get rid of multicollinearity by transforming variables?

a) This measure is effective only when the sample size is increased;

32. What method can be used to find estimates of the parameters of the linear regression equation:

A) the least squares method;

b) correlation and regression analysis;

c) analysis of variance.

33. A multiple linear regression equation with dummy variables has been constructed. To check the significance of individual coefficients, the distribution used is:

a) Normal;

b) Student;

c) Pearson;

d) Fisher–Snedecor.

34. If … and the rank of matrix A is greater than (K - 1), then the equation is:

A) over-identified;

b) not identified;

c) accurately identified.

35. To estimate the parameters of an exactly identified system of equations, the following are used:

a) DMNK, KMNK;

b) DMNK, MNK, KMNK;

36. Chow's criterion is based on the application of:

A) F - statistics;

b) t - statistics;

c) Durbin-Watson criteria.

37. Dummy variables can take on the following values:

d) any values.

39. Based on 20 observations, a regression equation was built. To check the significance of the equation, the value of the statistic was calculated: 4.2. Conclusions:

a) The equation is significant at α = 0.05;

b) The equation is not significant at α = 0.05;

c) The equation is not significant at α = 0.01.

40. Which of the following statements is not true if the residuals are heteroscedastic?

a) Conclusions on t and F statistics are unreliable;

b) Heteroskedasticity manifests itself through a low value of the Durbin–Watson statistic;

c) With heteroscedasticity, estimates remain effective;

d) Estimates are biased.

41. The Chow test is based on a comparison:

A) dispersions;

b) coefficients of determination;

c) mathematical expectations;

d) means.

42. If in the Chow test …, then it is considered:

A) that partitioning into subintervals is useful from the point of view of improving the quality of the model;

b) the model is statistically insignificant;

c) the model is statistically significant;

d) that it makes no sense to split the sample into parts.

43. Dummy variables are variables:

a) qualitative;

b) random;

B) quantitative;

d) logical.

44. Which of the following methods cannot be used to detect autocorrelation?

a) Series method;

b) Durbin-Watson test;

c) Spearman's rank correlation test;

D) White's test.

45. The simplest structural form of the model is:

A) …;

b) …;

c) …;

d) ….

46. What measures can be taken to get rid of multicollinearity?

a) Increasing the sample size;

b) Exclusion of variables highly correlated with the rest;

c) Change of model specification;

d) Transformation of the random component.

47. If … and the rank of matrix A is (K - 1), then the equation is:

a) over-identified;

b) not identified;

B) accurately identified;

48. A model is considered identified if:

a) among the equations of the model there is at least one normal one;

B) each equation of the system is identifiable;

c) among the model equations there is at least one unidentified one;

d) among the equations of the model there is at least one overidentified.

49. What method is used to estimate the parameters of an unidentified equation?

a) DMNK, KMNK;

b) DMNK, MNK;

C) the parameters of such an equation cannot be estimated.

50. At the junction of what areas of knowledge did econometrics arise:

A) economic theory, economic and mathematical statistics;

b) economic theory, mathematical statistics and probability theory;

c) economic and mathematical statistics, probability theory.

51. In the multiple linear regression equation, confidence intervals are built for the regression coefficients using the distribution:

a) Normal;

B) Student;

c) Pearson;

d) Fischer-Snedekor.

52. Based on 16 observations, a paired linear regression equation was constructed. To check the significance of the regression coefficient, the computed value is t_b = 2.5.

a) The coefficient is insignificant at α = 0.05;

b) The coefficient is significant at α = 0.05;

c) The coefficient is significant at α = 0.01.

53. It is known that there is a positive relationship between the quantities X and Y. Within what limits does the pairwise correlation coefficient lie?

a) from -1 to 0;

b) from 0 to 1;

C) from -1 to 1.

54. The multiple correlation coefficient is 0.9. What percentage of the dispersion of the resultant attribute is explained by the influence of all the factor attributes?

55. Which of the following methods cannot be used to detect heteroscedasticity?

A) Goldfeld–Quandt test;

b) Spearman's rank correlation test;

c) series method.

56. The reduced form of the model is:

a) a system of non-linear functions of the exogenous variables in terms of the endogenous ones;

B) a system of linear functions of the endogenous variables in terms of the exogenous ones;

c) a system of linear functions of the exogenous variables in terms of the endogenous ones;

d) a system of normal equations.

57. Within what limits does the partial correlation coefficient calculated by recursive formulas change?

a) from -∞ to +∞;

b) from 0 to 1;

c) from 0 to +∞;

D) from -1 to +1.

58. Within what limits does the partial correlation coefficient calculated through the coefficient of determination change?

a) from -∞ to +∞;

B) from 0 to 1;

c) from 0 to +∞;

d) from -1 to +1.

59. Exogenous variables are:

a) dependent variables;

B) independent variables;

61. When another explanatory factor is added to the regression equation, the multiple correlation coefficient:

a) will decrease;

b) will increase;

c) will retain its value.

62. A hyperbolic regression equation was built: Y = a + b/X. To check the significance of the equation, the distribution used is:

a) Normal;

B) Student;

c) Pearson;

d) Fisher–Snedecor.

63. For what types of systems can the parameters of individual econometric equations be found using the traditional least squares method?

a) a system of normal equations;

B) a system of independent equations;

C) a system of recursive equations;

D) a system of interdependent equations.

64. Endogenous variables are:

A) dependent variables;

b) independent variables;

c) variables dated to previous points in time.

65. Within what limits does the coefficient of determination change?

a) from 0 to +∞;

b) from -∞ to +∞;

C) from 0 to +1;

d) from -1 to +1.

66. A multiple linear regression equation has been built. To check the significance of individual coefficients, the distribution used is:

a) Normal;

b) Student;

c) Pearson;

D) Fisher–Snedecor.

67. When another explanatory factor is added to the regression equation, the coefficient of determination:

a) will decrease;

B) will increase;

c) will retain its value;

d) will not decrease.

68. The essence of the least squares method is that:

A) the estimate is determined from the condition of minimizing the sum of squared deviations of the sample data from the determined estimate;

b) the estimate is determined from the condition of minimizing the sum of deviations of sample data from the determined estimate;

c) the estimate is determined from the condition of minimizing the sum of squared deviations of the sample mean from the sample variance.

69. To what class of non-linear regressions does the parabola belong?

73. To what class of non-linear regressions does the exponential curve belong?

74. To what class of non-linear regressions does a function of the form ŷ = … belong?

A) regressions that are non-linear with respect to the variables included in the analysis, but linear with respect to the estimated parameters;

b) non-linear regressions on the estimated parameters.

78. To what class of non-linear regressions does a function of the form ŷ = … belong?

a) regressions that are non-linear with respect to the variables included in the analysis, but linear with respect to the estimated parameters;

B) non-linear regressions on the estimated parameters.

79. In a regression equation in the form of a hyperbola, ŷ = a + b/x, if b > 0, then:

A) with an increase in the factor feature x, the value of the resultant feature y decreases slowly, and as x → ∞ the average value of y will be equal to a;

b) the value of the resultant feature y increases with slow growth as the factor feature x increases, and as x → ∞ …

81. The coefficient of elasticity is determined by the formula … for a regression model in the form of:

A) a linear function;

b) a parabola;

c) a hyperbola;

d) an exponential curve;

e) a power function.

82. The coefficient of elasticity is determined by the formula … for a regression model in the form of:

a) a linear function;

B) a parabola;

c) a hyperbola;

d) an exponential curve;

e) a power function.

86. The equation … is called:

A) a linear trend;

b) a parabolic trend;

c) a hyperbolic trend;

d) an exponential trend.

89. The equation … is called:

a) a linear trend;

b) a parabolic trend;

c) a hyperbolic trend;

D) an exponential trend.

90. A system of the form … is called:

A) a system of independent equations;

b) a system of recursive equations;

c) a system of interdependent (simultaneous, simultaneous) equations.

93. Econometrics can be defined as:

A) an independent scientific discipline that combines a set of theoretical results, techniques, methods, and models designed to give a specific quantitative expression to the general (qualitative) patterns established by economic theory, drawing on economic theory, economic statistics, and mathematical-statistical tools;

B) the science of economic measurements;

C) statistical analysis of economic data.

94. The tasks of econometrics include:

A) forecast of economic and socio-economic indicators characterizing the state and development of the analyzed system;

B) simulation of possible scenarios for the socio-economic development of the system to identify how the planned changes in certain manageable parameters will affect the output characteristics;

c) testing of hypotheses according to statistical data.

95. By their nature, relationships are divided into:

A) functional and correlational;

b) functional, curvilinear and rectilinear;

c) correlation and inverse;

d) statistical and direct.

96. With a direct relationship, as the factor feature increases:

a) the resultant feature decreases;

b) the resultant feature does not change;

C) the resultant feature increases.

97. What methods are used in statistics to identify the presence, nature, and direction of a relationship?

a) average values;

B) comparison of parallel series;

C) the analytical grouping method;

d) relative values;

D) the graphical method.

98. What method is used to identify the forms of influence of some factors on others?

a) correlation analysis;

B) regression analysis;

c) index analysis;

d) analysis of variance.

99. What method is used to quantify the strength of the impact of some factors on others:

A) correlation analysis;

b) regression analysis;

c) the method of averages;

d) analysis of variance.

100. Which indicators, by their magnitude, lie in the range from minus one to plus one:

a) coefficient of determination;

b) correlation ratio;

C) linear correlation coefficient.

101. The regression coefficient for a one-factor model shows:

A) how many units the function changes when the argument changes by one unit;

b) how many percent the function changes per unit change in the argument.

102. The coefficient of elasticity shows:

a) by how many percent does the function change with a change in the argument by one unit of its measurement;

B) by how many percent does the function change with a change in the argument by 1%;

c) by how many units of its measurement the function changes with a change in the argument by 1%.

105. A value of the correlation index equal to 0.087 indicates:

A) a weak relationship;

b) a strong relationship;

c) errors in calculations.

107. A value of the pair correlation coefficient equal to 1.12 indicates:

a) a weak relationship;

b) a strong relationship;

C) about errors in calculations.

109. Which of the given numbers can be the values ​​of the pair correlation coefficient:

111. Which of the given numbers can be the values ​​of the multiple correlation coefficient:

115. Mark the correct form of the linear regression equation:

a) ŷ = …;

b) ŷ = …;

c) ŷ = …;

D) ŷ = ….