
Automatic calculation of Student's t-test. Data requirements

Student's distribution table

Probability integral tables are used for large samples drawn from an infinitely large population. But already at n < 100 there is a discrepancy between the tabulated data and the limiting probability, and at n < 30 the error becomes significant. The discrepancy is caused mainly by the way the units of the general population are distributed. With a large sample, the shape of the distribution in the general population does not matter, since the distribution of deviations of the sample indicator from the general characteristic always turns out to be normal for a large sample.

In small samples (n < 30), the shape of the general population's distribution affects the distribution of sampling errors. Therefore, to calculate the sampling error with a small number of observations (already fewer than 100 units), the sample must be drawn from a population that has a normal distribution. The theory of small samples was developed by the English statistician W. Gosset (who wrote under the pseudonym Student) at the beginning of the 20th century.

In 1908, he constructed a special distribution that makes it possible, even with small samples, to relate t and the confidence probability F(t). For n > 100, Student distribution tables give the same results as the Laplace probability integral tables; for 30 < n < 100 the differences are minor. Therefore, in practice, small samples are those with fewer than 30 units (and, of course, a sample of more than 100 units is considered large).

The use of small samples is in some cases dictated by the nature of the surveyed population. Thus, in breeding work, a "clean" experiment is easier to achieve on a small number of plots. A production or economic experiment, which involves economic costs, is also carried out with a small number of trials. As already noted, in the case of a small sample, both the confidence probabilities and the confidence limits of the general mean can be calculated only for a normally distributed population.

The probability density of Student's distribution is described by the function

$$ f(t, n) = B_n \left(1 + \frac{t^2}{n - 1}\right)^{-n/2} $$

where t is the current variable, n is the sample size, and B_n is a value that depends only on n.

Student's distribution has only one parameter: d.f., the number of degrees of freedom (sometimes denoted by k). Like the normal distribution, it is symmetric about the point t = 0, but it is flatter. As the sample size, and consequently the number of degrees of freedom, increases, Student's distribution quickly approaches the normal. The number of degrees of freedom is equal to the number of individual values of the feature that must be available to determine the desired characteristic. Thus, to calculate the variance, the mean must already be known. Therefore, when calculating the variance, d.f. = n - 1 is used.
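As a rough illustration of the density described above, the following Python sketch evaluates f(t, n) and compares it with the standard normal density at the centre and in the tail. The text only says that B depends on n; the explicit gamma-function form of B_n used here is the standard normalizing constant and is an assumption added for the example, as are the function names.

```python
import math

def student_density(t, n):
    """Density of Student's distribution for sample size n (n - 1 degrees of freedom)."""
    df = n - 1
    # log of B_n = Gamma(n/2) / (sqrt(pi * (n - 1)) * Gamma((n - 1)/2));
    # lgamma is used so that large n does not overflow.
    log_b = math.lgamma(n / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi * df)
    return math.exp(log_b - (n / 2) * math.log1p(t * t / df))

def normal_density(t):
    """Density of the standard normal distribution."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

# Peak (t = 0) and tail (t = 3) for growing sample sizes, then the normal values.
for n in (5, 30, 1000):
    print(n, round(student_density(0.0, n), 4), round(student_density(3.0, n), 4))
print("normal", round(normal_density(0.0), 4), round(normal_density(3.0), 4))
```

For small n the peak is lower and the tail heavier than for the normal curve, which is what "flatter" means here; as n grows the two sets of values converge.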

Student distribution tables are published in two versions:

1. similarly to the tables of the probability integral, the values of t and the corresponding cumulative probabilities F(t) are given for different numbers of degrees of freedom;

2. the values of t are given for the most commonly used confidence probabilities 0.70; 0.75; 0.80; 0.85; 0.90; 0.95 and 0.99, or for 1 - 0.70 = 0.3; 1 - 0.80 = 0.2; … 1 - 0.99 = 0.01, with different numbers of degrees of freedom. Such a table is given in the appendix (Table 1 - 20), as well as the value of t (Student's test) at a significance level of 0.7.

​ Student's t-test is a general name for a class of methods for statistical testing of hypotheses (statistical tests) based on the Student's distribution. The most common cases of applying the t-test are related to checking the equality of the means in two samples.

1. History of the development of the t-test

This test was developed by William Gosset to assess the quality of beer at Guinness. Because of his obligations to the company not to disclose trade secrets, Gosset's article was published in 1908 in the journal Biometrika under the pseudonym "Student".

2. What is the Student's t-test used for?

Student's t-test is used to determine the statistical significance of differences between means. It can be used both to compare independent samples (for example, a group of patients with diabetes mellitus and a group of healthy subjects) and to compare related samples (for example, the mean heart rate in the same patients before and after taking an antiarrhythmic drug).

3. When can the Student's t-test be used?

To apply Student's t-test, the original data must have a normal distribution. When a two-sample test is applied to independent samples, the condition of equality (homoscedasticity) of variances must also be satisfied.

If these conditions are not met, analogous methods of nonparametric statistics should be used to compare sample means. The best known of these are the Mann-Whitney U test (a two-sample test for independent samples) and the sign test and the Wilcoxon test (used for dependent samples).

4. How to calculate Student's t-test?

To compare means, Student's t-test is calculated using the following formula:

$$ t = \frac{M_1 - M_2}{\sqrt{m_1^2 + m_2^2}} $$

where M_1 is the arithmetic mean of the first compared population (group), M_2 is the arithmetic mean of the second compared population (group), m_1 is the average (standard) error of the first arithmetic mean, and m_2 is the average (standard) error of the second arithmetic mean.

5. How to interpret the value of Student's t-test?

The resulting value of Student's t-test must be interpreted correctly. To do this, we need to know the number of subjects in each group (n_1 and n_2). We find the number of degrees of freedom f using the following formula:

f = (n_1 + n_2) - 2

After that, we determine the critical value of Student's t-test for the required significance level (for example, p = 0.05) and the given number of degrees of freedom f from the table (see below).

We compare the critical and calculated values of the test:

  • If the calculated value of Student's t-test is equal to or greater than the critical value found in the table, we conclude that the differences between the compared values are statistically significant.
  • If the calculated value of Student's t-test is smaller than the tabulated value, the differences between the compared values are not statistically significant.

6. An example of calculating the Student's t-test

To study the effectiveness of a new iron preparation, two groups of patients with anemia were selected. In the first group, patients received the new drug for two weeks, and in the second group they received a placebo. After that, the level of hemoglobin in peripheral blood was measured. In the first group, the average hemoglobin level was 115.4±1.2 g/l, and in the second 103.7±2.3 g/l (data are presented in the format M±m); the compared populations have a normal distribution. The first group comprised 34 patients and the second 40. It is necessary to draw a conclusion about the statistical significance of the observed differences and the effectiveness of the new iron preparation.

Solution: To assess the significance of the differences, we use Student's t-test, calculated as the difference between the means divided by the square root of the sum of the squared errors:

$$ t = \frac{115.4 - 103.7}{\sqrt{1.2^2 + 2.3^2}} = 4.51 $$

The calculated value of the t-test is 4.51. We find the number of degrees of freedom as (34 + 40) - 2 = 72. We compare the obtained value of Student's t-test, 4.51, with the critical value at p = 0.05 given in the table: 1.993. Since the calculated value of the test is greater than the critical value, we conclude that the observed differences are statistically significant (significance level p < 0.05).
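The calculation in this example can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, the absolute value is added so that the order of the groups does not matter, and the critical value is simply the tabulated figure quoted in the text rather than something computed here.

```python
import math

def t_from_means(mean1, err1, mean2, err2):
    """Student's t from two M±m summaries: |M1 - M2| / sqrt(m1^2 + m2^2)."""
    return abs(mean1 - mean2) / math.sqrt(err1 ** 2 + err2 ** 2)

# Hemoglobin example: 115.4±1.2 g/l (n = 34) versus 103.7±2.3 g/l (n = 40)
t_value = t_from_means(115.4, 1.2, 103.7, 2.3)
df = (34 + 40) - 2
critical = 1.993  # tabulated value for f = 72 at p = 0.05, as quoted in the text

print(f"t = {t_value:.2f}, f = {df}, significant: {t_value >= critical}")
```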

The Fisher distribution is the distribution of the random variable

$$ F = \frac{X_1 / k_1}{X_2 / k_2} $$

where the random variables X_1 and X_2 are independent and have chi-square distributions with k_1 and k_2 degrees of freedom respectively. The pair (k_1, k_2) is the pair of "numbers of degrees of freedom" of the Fisher distribution: k_1 is the number of degrees of freedom of the numerator, and k_2 is the number of degrees of freedom of the denominator. The distribution of the random variable F is named after the great English statistician R. Fisher (1890-1962), who actively used it in his work.

The Fisher distribution is used to test hypotheses about the adequacy of a model in regression analysis, about the equality of variances, and in other problems of applied statistics.
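The definition of F as a ratio of two scaled chi-square variables can be illustrated by simulation. The sketch below is not from the original text: it assumes the standard facts that a chi-square variable with k degrees of freedom is a gamma variable with shape k/2 and scale 2, and that the mean of F is k_2 / (k_2 - 2) for k_2 > 2. The degrees of freedom, seed, and sample count are arbitrary illustrative choices.

```python
import random

def fisher_sample(k1, k2, rng):
    """Draw one value of F = (X1 / k1) / (X2 / k2) with chi-square X1, X2."""
    # random.gammavariate(alpha, beta) takes shape alpha and scale beta.
    x1 = rng.gammavariate(k1 / 2, 2.0)
    x2 = rng.gammavariate(k2 / 2, 2.0)
    return (x1 / k1) / (x2 / k2)

rng = random.Random(0)   # fixed seed, chosen arbitrarily for repeatability
k1, k2 = 5, 20           # illustrative degrees of freedom
draws = [fisher_sample(k1, k2, rng) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)

print("simulated mean:", round(sample_mean, 3))
print("theoretical mean k2 / (k2 - 2):", round(k2 / (k2 - 2), 3))
```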

Table of critical values of Student's t-test

Number of degrees of freedom, f    Student's t-test value at p = 0.05
1          12.706
2          4.303
3          3.182
4          2.776
5          2.571
6          2.447
7          2.365
8          2.306
9          2.262
10         2.228
11         2.201
12         2.179
13         2.160
14         2.145
15         2.131
16         2.120
17         2.110
18         2.101
19         2.093
20         2.086
21         2.080
22         2.074
23         2.069
24         2.064
25         2.060
26         2.056
27         2.052
28         2.048
29         2.045
30         2.042
31         2.040
32         2.037
33         2.035
34         2.032
35         2.030
36         2.028
37         2.026
38         2.024
40-41      2.021
42-43      2.018
44-45      2.015
46-47      2.013
48-49      2.011
50-51      2.009
52-53      2.007
54-55      2.005
56-57      2.003
58-59      2.002
60-61      2.000
62-63      1.999
64-65      1.998
66-67      1.997
68-69      1.995
70-71      1.994
72-73      1.993
74-75      1.993
76-77      1.992
78-79      1.991
80-89      1.990
90-99      1.987
100-119    1.984
120-139    1.980
140-159    1.977
160-179    1.975
180-199    1.973
200        1.972
∞          1.960
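Values in a table of this kind can be reproduced numerically. The sketch below is one possible approach, not the method used to build the table: it integrates the density of Student's distribution with Simpson's rule and finds by bisection the t for which the central probability equals 1 - p. The function names, step count, and iteration count are illustrative assumptions.

```python
import math

def central_probability(t, df, steps=1000):
    """P(|T| <= t) for Student's distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(math.pi * df)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Composite Simpson's rule on [0, t]; steps must be even.
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 2 * total * h / 3  # doubled because the density is symmetric

def critical_t(df, p=0.05):
    """Two-tailed critical value: the t with P(|T| <= t) = 1 - p."""
    lo, hi = 0.0, 1.0
    while central_probability(hi, df) < 1 - p:  # widen the bracket
        hi *= 2
    for _ in range(40):                         # bisection
        mid = (lo + hi) / 2
        if central_probability(mid, df) < 1 - p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 9, 72):
    print(df, round(critical_t(df), 3))
```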

One of the best-known statistical tools is Student's t-test. It is used to measure the statistical significance of differences between paired quantities. Microsoft Excel has a special function for calculating this indicator. Let's learn how to calculate Student's t-test in Excel.

But first, let's find out what Student's test is in general. This indicator is used to check the equality of the mean values of two samples. That is, it determines whether the differences between two groups of data are statistically reliable. A whole set of methods is used to determine this test. The indicator can be calculated with a one-tailed or a two-tailed distribution.

Calculation of the indicator in Excel

Now let's move on to how to calculate this indicator in Excel. It can be done with the function T.TEST (СТЬЮДЕНТ.ТЕСТ in the Russian-language version of Excel). In Excel 2007 and earlier versions, it was called TTEST. That name has been kept in later versions for compatibility, but it is still recommended to use the more modern T.TEST in them. This function can be used in three ways, which are outlined below.

Method 1: Function Wizard

The easiest way to calculate this indicator is through the Function Wizard.

The calculation is performed, and the result is displayed on the screen in a pre-selected cell.

Method 2: Working with the Formulas Tab

The T.TEST function can also be called by going to the "Formulas" tab and using a special button on the ribbon.

Method 3: manual input

The T.TEST formula can also be entered manually into any cell on the worksheet or into the formula bar. Its syntax looks like this:

T.TEST(Array1,Array2,Tails,Type)

Array1 and Array2 are the two ranges of data being compared. Tails specifies a one-tailed (1) or two-tailed (2) distribution. Type selects the kind of test: paired (1), two-sample with equal variances (2), or two-sample with unequal variances (3). These values should be substituted into the function.

After the data is entered, press Enter to display the result on the screen.

As you can see, Student's test is calculated in Excel very simply and quickly. The main thing is that the user who performs the calculations must understand what the test is and which input data is responsible for what. The program performs the calculation itself.

In the course of the example, we will use fictitious information so that the reader can make the necessary transformations on their own.

For example, suppose that in the course of research we studied the effect of drug A on the content of substance B (in mmol/g) in tissue C and on the concentration of substance D in the blood (in mmol/l) in patients divided according to some criterion E into 3 groups of equal size (n = 10). The results of this fictitious study are shown in the table:

[Table columns: substance B content, mmol/g; substance D, mmol/l; concentration increase]


We would like to warn you that samples of size 10 are considered by us for ease of presentation of data and calculations; in practice, such a sample size is usually not enough to form a statistical conclusion.

As an example, consider the data of the 1st column of the table.

Descriptive statistics

Sample mean

The arithmetic mean, which is very often referred to simply as the "average", is obtained by adding all the values and dividing this sum by the number of values in the set. This can be shown using an algebraic formula. A set of n observations of a variable x can be represented as x_1, x_2, x_3, ..., x_n.

The formula for the arithmetic mean of the observations (pronounced "x-bar"):

$$ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} $$

$$ \bar{x} = \frac{12 + 13 + 14 + 15 + 14 + 13 + 13 + 10 + 11 + 16}{10} = 13.1 $$

Sample variance

One way to measure the scatter of the data is to determine how far each observation deviates from the arithmetic mean. Obviously, the greater the deviation, the greater the variability of the observations. However, we cannot use the average of these deviations as a measure of scatter, because positive deviations cancel out negative ones (their sum is zero). To solve this problem, we square each deviation and find the average of the squared deviations; this quantity is called the variance (dispersion). Take n observations x_1, x_2, x_3, ..., x_n with mean \bar{x}. We calculate the variance of these observations, usually denoted s^2, dividing by n - 1 because the mean is itself estimated from the sample:

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$

The sample variance of this indicator is s^2 = 3.2.
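A short Python sketch can confirm the mean and variance for the first column of the example. It also shows what dividing by n instead of n - 1 would give, to make clear that the value 3.2 corresponds to the n - 1 divisor; the variable names are illustrative.

```python
import statistics

# First column of the example (substance B content, mmol/g)
values = [12, 13, 14, 15, 14, 13, 13, 10, 11, 16]

mean = statistics.mean(values)
squared_deviations = [(x - mean) ** 2 for x in values]

variance_n_minus_1 = sum(squared_deviations) / (len(values) - 1)  # sample variance
variance_n = sum(squared_deviations) / len(values)                # divisor n, for comparison

print(mean, round(variance_n_minus_1, 2), round(variance_n, 2))
```

The first variance agrees with `statistics.variance(values)`, which also uses the n - 1 divisor.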

Standard deviation

The standard (root mean square) deviation is the positive square root of the variance. For n observations, it looks like this:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

We can think of the standard deviation as a sort of mean deviation of the observations from the mean. It is expressed in the same units as the original data.

s = sqrt(s^2) = sqrt(3.2) = 1.79

The coefficient of variation

If you divide the standard deviation by the arithmetic mean and express the result as a percentage, you get the coefficient of variation.

CV = (1.79 / 13.1) * 100% = 13.7%

Standard error of the sample mean

s / sqrt(n) = 1.79 / sqrt(10) = 0.57
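The standard deviation, the coefficient of variation, and the standard error of the mean for the same column can be sketched in Python as follows. The variable names are illustrative, and `statistics.stdev` is used because it takes the square root of the n - 1 variance.

```python
import math
import statistics

values = [12, 13, 14, 15, 14, 13, 13, 10, 11, 16]

mean = statistics.mean(values)
sd = statistics.stdev(values)             # square root of the n - 1 variance
cv_percent = sd / mean * 100              # coefficient of variation, in percent
std_error = sd / math.sqrt(len(values))   # standard error of the sample mean

print(round(sd, 2), round(cv_percent, 1), round(std_error, 2))
```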

Student's coefficient t (one-sample t-test)

It is used to test the hypothesis that the mean value differs from some known value m:

$$ t = \frac{|\bar{x} - m|}{s / \sqrt{n}} $$

The number of degrees of freedom is calculated as f = n - 1.

In this case, the confidence interval for the mean lies between the limits 11.82 and 14.38.

At the 95% confidence level the limiting values are m = 11.82 and m = 14.38, i.e. |\bar{x} - m| = |13.1 - 11.82| = |13.1 - 14.38| = 1.28.

Accordingly, in this case, for f = 10 - 1 = 9 degrees of freedom and a confidence level of 95%, t = 2.26.
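The limits of the confidence interval follow from the sample mean, the standard error, and the tabulated t. A minimal Python sketch, assuming the tabulated value 2.262 for f = 9 at p = 0.05 and using illustrative variable names:

```python
import math
import statistics

values = [12, 13, 14, 15, 14, 13, 13, 10, 11, 16]
t_critical = 2.262  # tabulated value for f = 9 at p = 0.05

mean = statistics.mean(values)
std_error = statistics.stdev(values) / math.sqrt(len(values))

half_width = t_critical * std_error          # largest |mean - m| not rejected
lower, upper = mean - half_width, mean + half_width

print(round(lower, 2), round(upper, 2), round(half_width, 2))
```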

The Basic Statistics and Tables dialog

In the Basic Statistics and Tables module, choose Descriptive statistics.

The Descriptive statistics dialog box will open.

In the Variables field, choose Group 1.

After pressing OK, we obtain results tables with descriptive statistics for the selected variables.

The One-sample t-test dialog box will open.

Suppose we know that the average content of substance B in tissue C is 11.

The results table with descriptive statistics and Student's t-test is as follows:

We have to reject the hypothesis that the average content of substance B in tissue C is 11.

Since the calculated value of the test is greater than the tabulated value (2.26), the null hypothesis is rejected at the chosen significance level, and the difference between the sample mean and the known value is recognized as statistically significant. Thus, the conclusion about the existence of a difference, made using Student's test, is confirmed by this method.
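The same one-sample test can be sketched without a statistics package. The Python example below is an illustration under stated assumptions: it uses the first-column data from the descriptive statistics section, the hypothesized value 11 from the text, and the tabulated critical value 2.262 for f = 9. It is not a reproduction of the program's output, and the variable names are hypothetical.

```python
import math
import statistics

values = [12, 13, 14, 15, 14, 13, 13, 10, 11, 16]
known_value = 11      # hypothesized mean content of substance B
t_critical = 2.262    # tabulated value for f = 9 at p = 0.05

mean = statistics.mean(values)
std_error = statistics.stdev(values) / math.sqrt(len(values))

t_value = abs(mean - known_value) / std_error
reject_null = t_value > t_critical

print(round(t_value, 2), reject_null)
```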