
Finding the confidence interval for the mathematical expectation. Example problems for finding a confidence interval

All sample statistics are estimates of their theoretical analogues, which could be obtained if the entire general population, rather than a sample, were available. But alas, surveying the general population is very expensive and often impossible.

The concept of interval estimation

Any sample estimate has some spread, because it is a random variable that depends on the values in a particular sample. Therefore, for more reliable statistical conclusions, one should know not only the point estimate but also an interval that covers the estimated parameter θ (theta) with a high probability γ (gamma).

Formally, these are two values (statistics) $T_1(X)$ and $T_2(X)$, with $T_1 < T_2$, for which, at a given probability level γ, the following condition holds:

$$P\bigl(T_1(X) \le \theta \le T_2(X)\bigr) \ge \gamma.$$

In short, with probability γ or more the true parameter lies between the points $T_1(X)$ and $T_2(X)$, which are called the lower and upper bounds of the confidence interval.

One of the requirements when constructing a confidence interval is that it be as narrow as possible. The wish is quite natural, because the researcher tries to localize the unknown parameter as precisely as possible.

It follows that the confidence interval must cover the region where the probability density of the distribution is highest, and the estimate itself should be at its center.

That is, the probability that the true parameter deviates upward from the estimate equals the probability that it deviates downward. Note also that for asymmetric distributions the part of the interval to the right of the estimate is not equal to the part to the left.

The figure above clearly shows that the greater the confidence probability, the wider the interval - a direct relationship.

This was a short introduction to the theory of interval estimation of unknown parameters. Let's move on to finding confidence limits for the mathematical expectation.

Confidence interval for mathematical expectation

If the original data are normally distributed, then the mean is also a normal variable. This follows from the rule that a linear combination of normal variables also has a normal distribution. Therefore, to calculate the probabilities we could use the mathematical apparatus of the normal distribution law.

However, this requires knowing two parameters, the expectation and the variance, which are usually unknown. One can, of course, use estimates instead of the parameters (the arithmetic mean and the sample variance), but then the distribution of the standardized mean is no longer exactly normal: it is slightly flatter, with heavier tails. This fact was cleverly noted by William Gosset, who worked in Ireland and published his discovery in the March 1908 issue of the journal Biometrika. For purposes of secrecy, Gosset signed himself Student. This is how the Student t-distribution appeared.

However, normally distributed data, which K. Gauss used in analyzing the errors of astronomical observations, are extremely rare in everyday life, and normality is quite difficult to establish (high precision requires about 2 thousand observations). Therefore, it is best to drop the assumption of normality and use methods that do not depend on the distribution of the original data.

The question arises: what is the distribution of the arithmetic mean if it is calculated from data with an unknown distribution? The answer is given by the central limit theorem (CLT), well known in probability theory. In mathematics there are several versions of it (the formulations have been refined over the years), but all of them, roughly speaking, boil down to the statement that the sum of a large number of independent random variables approximately follows the normal distribution law.

Calculating the arithmetic mean involves a sum of random variables. It follows that the arithmetic mean is approximately normally distributed, with expectation equal to the expectation of the original data and variance equal to the variance of the original data divided by the sample size, $\sigma^2/n$.

Smart people know how to prove the CLT, but we will verify it with an experiment conducted in Excel. Let's simulate a sample of 50 uniformly distributed random values (using the Excel function RANDBETWEEN). Then we will generate 1000 such samples and calculate the arithmetic mean of each. Let's look at their distribution.

It can be seen that the distribution of the mean is close to the normal law. If the sample size and the number of samples are increased, the similarity becomes even closer.
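The same experiment can be sketched outside Excel. The snippet below is only an illustration in Python: the range 1..100, the seed and the function name `simulate_sample_means` are assumptions made for the example, not values from the article.

```python
import random
import statistics

def simulate_sample_means(n_samples=1000, sample_size=50, low=1, high=100, seed=42):
    """Draw n_samples samples of uniform integers and return the mean of each."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.randint(low, high) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

means = simulate_sample_means()

# Theory for a discrete uniform variable on 1..100:
# expectation (low + high) / 2, variance ((high - low + 1)**2 - 1) / 12.
expected_mean = (1 + 100) / 2
expected_se = ((100 ** 2 - 1) / 12 / 50) ** 0.5

print(f"mean of sample means: {statistics.fmean(means):.2f} (theory {expected_mean})")
print(f"std of sample means:  {statistics.stdev(means):.2f} (theory {expected_se:.2f})")

# Under a normal law about 95% of the means fall within 1.96 standard errors.
inside = sum(abs(m - expected_mean) <= 1.96 * expected_se for m in means) / len(means)
print(f"share within 1.96 standard errors: {inside:.3f}")
```

If the CLT holds here, the spread of the simulated means should be close to the theoretical standard error, and roughly 95% of them should fall within 1.96 standard errors of the true expectation.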

Now that we have seen the validity of the CLT with our own eyes, we can use the normal distribution to calculate confidence intervals for the arithmetic mean that cover the true mean, or mathematical expectation, with a given probability.

To set the upper and lower limits, you need to know the parameters of the normal distribution. As a rule they are unknown, so estimates are used: the arithmetic mean and the sample variance. I repeat, this method gives a good approximation only with large samples. When samples are small, it is often recommended to use the Student distribution. Don't believe it! The Student distribution for the mean arises only when the original data are normally distributed, that is, almost never. Therefore, it is better to set a minimum bar for the amount of required data right away and use asymptotically correct methods. They say 30 observations are enough. Take 50 and you won't go wrong.

$$T_{1,2} = \bar{x} \mp c_\gamma \frac{s_0}{\sqrt{n}}$$

where

$T_{1,2}$ are the lower and upper limits of the confidence interval;

$\bar{x}$ is the sample arithmetic mean;

$s_0$ is the standard deviation of the sample (unbiased);

$n$ is the sample size;

γ is the confidence probability (usually equal to 0.9, 0.95 or 0.99);

$c_\gamma = \Phi^{-1}((1+\gamma)/2)$ is the value of the inverse of the standard normal distribution function. Simply put, this is the number of standard errors from the arithmetic mean to the lower or upper bound (the three probabilities above correspond to values of 1.64, 1.96 and 2.58).

The essence of the formula is that the arithmetic mean is taken and a certain number ($c_\gamma$) of standard errors ($s_0/\sqrt{n}$) is set aside from it in each direction. Everything is known: just plug in the numbers and calculate.
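As a rough sketch of this calculation, the function below computes the mean plus or minus a quantile times the standard error, using only the Python standard library. The function name and the sample data are hypothetical and serve only to make the example runnable.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(data, gamma=0.95):
    """Asymptotic (normal) confidence interval for the mean of `data`."""
    n = len(data)
    x_bar = fmean(data)
    s0 = stdev(data)                                  # unbiased: divides by n - 1
    c_gamma = NormalDist().inv_cdf((1 + gamma) / 2)   # about 1.64, 1.96, 2.58
    half_width = c_gamma * s0 / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical data: 60 made-up measurements, not taken from the article.
data = [20 + (i * 7) % 13 for i in range(60)]
low, high = mean_confidence_interval(data, gamma=0.95)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The interval is symmetric around the sample mean, and its half-width shrinks in proportion to the square root of the sample size.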

Before personal computers became widespread, tables were used to obtain the values of the normal distribution function and its inverse. They are still used today, but it is more efficient to turn to ready-made Excel formulas. All the elements of the formula above can easily be calculated in Excel. But there is also a ready-made function for calculating the confidence interval: CONFIDENCE.NORM. Its syntax is as follows.

CONFIDENCE.NORM(alpha;standard_dev;size)

alpha – the significance level, which in the notation adopted above equals 1 − γ, i.e. the probability that the mathematical expectation lies outside the confidence interval. With a confidence level of 0.95, alpha is 0.05, and so on.

standard_dev – the standard deviation of the sample data. There is no need to calculate the standard error; Excel itself divides by the square root of n.

size – the sample size (n).

The result of the CONFIDENCE.NORM function is the second term of the formula for the confidence interval, i.e. the half-width of the interval. Accordingly, the lower and upper limits are the mean ± the obtained value.

Thus, it is possible to construct a universal algorithm for calculating confidence intervals for the arithmetic mean that does not depend on the distribution of the original data. The price of universality is its asymptotic nature, i.e. the need to use relatively large samples. However, in the age of modern technology, collecting the required amount of data is usually not difficult.

Testing statistical hypotheses using confidence intervals


One of the main problems solved in statistics is hypothesis testing. Its essence, briefly, is as follows. It is assumed, for example, that the expectation of the general population equals some value. Then the distribution of sample means that could be observed under this expectation is constructed. Next, one looks at where the actual sample mean falls in this conditional distribution. If it lies beyond the acceptable limits, then such a mean is very unlikely to appear, and in a single experiment it is almost impossible; this contradicts the hypothesis, which is therefore rejected. If the mean does not go beyond the critical level, the hypothesis is not rejected (but it is not proven either!).

So, confidence intervals, in our case for the expectation, can also be used to test some hypotheses. It is very easy to do. Let's say the arithmetic mean of a certain sample is 100. The hypothesis being tested is that the expected value is, say, 90. Put primitively, the question is: can it be that, when the true mean is 90, the observed mean turned out to be 100?

To answer this question, you also need the standard deviation and the sample size. Let's say the standard deviation is 30 and the number of observations is 64 (so that the root is easy to extract). Then the standard error of the mean is 30/8, or 3.75. To calculate a 95% confidence interval, you set aside two standard errors on either side of the mean (more precisely, 1.96 each). The confidence interval is approximately 100 ± 7.5, or from 92.5 to 107.5.

The further reasoning is as follows. If the value being tested falls within the confidence interval, the data do not contradict the hypothesis, because the difference lies within the limits of random fluctuations (with a probability of 95%). If the value falls outside the confidence interval, the probability of such an event is very small, in any case below the acceptable level. This means that the hypothesis is rejected as contradicting the observed data. In our case the hypothesized expected value lies outside the confidence interval (the tested value of 90 is not within 100 ± 7.5), so the hypothesis should be rejected. Answering the primitive question above: no, it cannot, or at any rate this happens extremely rarely. Often the specific probability of erroneously rejecting the hypothesis (the p-value) is reported rather than the preset level at which the confidence interval was constructed, but more on that another time.
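The check described here can be sketched in a few lines of Python. The function name `rejects_hypothesis` is made up for this illustration; the numbers are the ones from the example (mean 100, standard deviation 30, 64 observations, hypothesized value 90).

```python
from math import sqrt
from statistics import NormalDist

def rejects_hypothesis(sample_mean, std_dev, n, hypothesized_mean, gamma=0.95):
    """Return (reject, low, high); reject is True when the hypothesized mean
    lies outside the gamma confidence interval around the sample mean."""
    c_gamma = NormalDist().inv_cdf((1 + gamma) / 2)
    half_width = c_gamma * std_dev / sqrt(n)
    low, high = sample_mean - half_width, sample_mean + half_width
    return not (low <= hypothesized_mean <= high), low, high

reject, low, high = rejects_hypothesis(100, 30, 64, 90)
print(f"interval: ({low:.2f}, {high:.2f}), reject H0: {reject}")
```

With the exact multiplier 1.96 the interval comes out slightly narrower than the rounded 100 ± 7.5 (about 92.65 to 107.35), and the value 90 still falls outside it.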

As you can see, constructing a confidence interval for the mean (or mathematical expectation) is not difficult. The main thing is to grasp the essence, and the rest will follow. In practice a 95% confidence interval is used in most cases; it extends approximately two standard errors on either side of the mean.

That's all for now. All the best!

Confidence interval: the limiting values of a statistical quantity which, with a given confidence probability γ, will lie in this interval when a larger sample is taken. It is denoted P(θ − ε < x < θ + ε) = γ. In practice the confidence probability γ is chosen from values quite close to one: γ = 0.9, γ = 0.95, γ = 0.99.


Example No. 1. On a collective farm, 100 sheep out of a total herd of 1000 underwent sample control shearing. As a result, an average wool clip of 4.2 kg per sheep was established. Determine, with a probability of 0.99, the standard error of the sample in estimating the average wool clip per sheep and the limits within which the clip value lies, if the variance is 2.5. The sample is non-repeated.
Example No. 2. From a batch of imported products at the Moscow Northern Customs post, 20 samples of product "A" were taken by random repeated sampling. The check established the average moisture content of product "A" in the sample, which turned out to be 6% with a standard deviation of 1%.
Determine with probability 0.683 the limits of the average moisture content of the product in the entire batch of imported products.
Example No. 3. A survey of 36 students showed that the average number of textbooks they read per academic year was 6. Assuming that the number of textbooks read by a student per semester follows the normal law with a standard deviation of 6, find: A) an interval estimate, with reliability 0.99, for the mathematical expectation of this random variable; B) the probability with which the average number of textbooks read by a student per semester, calculated from this sample, deviates from the mathematical expectation by no more than 2 in absolute value.

Classification of confidence intervals

By type of parameter being assessed:

By sample type:

  1. Confidence interval for an infinite sample;
  2. Confidence interval for a finite sample.
A sample is called repeated if the selected object is returned to the population before the next one is selected. A sample is called non-repeated if the selected object is not returned to the population. In practice we usually deal with non-repeated samples.

Calculation of the average sampling error for random sampling

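The discrepancy between the values of indicators obtained from the sample and the corresponding parameters of the general population is called the representativeness error.
Designations of the main parameters of the general and sample populations: N is the size of the general population, n is the sample size, σ² is the variance, w is the sample share.
Average sampling error formulas:
Repeated selection, for the mean: $\mu = \sqrt{\sigma^2/n}$; for the share: $\mu = \sqrt{w(1-w)/n}$.
Non-repeated selection, for the mean: $\mu = \sqrt{\frac{\sigma^2}{n}\left(1-\frac{n}{N}\right)}$; for the share: $\mu = \sqrt{\frac{w(1-w)}{n}\left(1-\frac{n}{N}\right)}$.
The relationship between the marginal sampling error (Δ), guaranteed with some probability P(t), and the average sampling error has the form Δ = t·μ, where t is the confidence coefficient, determined from the probability level P(t) according to the table of the Laplace integral function.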
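A minimal sketch of these quantities, assuming the standard formulas μ = √(σ²/n) for repeated selection and μ = √(σ²/n · (1 − n/N)) for non-repeated selection, and taking t from the normal law instead of a table. The function names are hypothetical; the numbers are those of Example No. 1 (100 of 1000 sheep, mean 4.2 kg, variance 2.5, probability 0.99).

```python
from math import sqrt
from statistics import NormalDist

def mean_sampling_error(variance, n, population_size=None):
    """Average sampling error of the mean; pass population_size for
    non-repeated selection to apply the (1 - n/N) correction."""
    error_sq = variance / n
    if population_size is not None:
        error_sq *= 1 - n / population_size
    return sqrt(error_sq)

def marginal_error(mu, probability):
    """Marginal error delta = t * mu, with t taken from the normal law."""
    t = NormalDist().inv_cdf((1 + probability) / 2)
    return t * mu

mu = mean_sampling_error(2.5, 100, population_size=1000)
delta = marginal_error(mu, 0.99)
print(f"mu = {mu:.3f}, delta = {delta:.3f}, "
      f"limits: {4.2 - delta:.2f} .. {4.2 + delta:.2f}")
```

Under these assumptions the average error is 0.15 kg and the limits are roughly 3.81 to 4.59 kg; a table value of t rounded to 2.58 would give practically the same result.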

Formulas for calculating the sample size using a purely random sampling method

Let a sample be taken from a general population that follows the normal law, $X \sim N(m; \sigma)$. This basic assumption of mathematical statistics rests on the central limit theorem. Let the general standard deviation $\sigma$ be known, while the mathematical expectation $m$ of the theoretical distribution (the mean value) is unknown.

In this case the sample mean $\bar{x}$ obtained in the experiment (section 3.4.2) is also a random variable, $\bar{x} \sim N(m; \sigma/\sqrt{n})$. Then the "normalized" deviation
$$u = \frac{\bar{x} - m}{\sigma/\sqrt{n}} \sim N(0;1)$$
is a standard normal random variable.

The task is to find an interval estimate for $m$. Let us construct a two-sided confidence interval for $m$ such that the true mathematical expectation belongs to it with a given probability (reliability) $\gamma$.

To set such an interval for the quantity $u$ means to find its maximum value $u_\gamma$ and its minimum value $-u_\gamma$, which are the boundaries of the critical region:
$$P(-u_\gamma \le u \le u_\gamma) = \gamma.$$

Since this probability equals $2\Phi(u_\gamma)$, the root $u_\gamma$ of the equation $2\Phi(u_\gamma) = \gamma$ can be found from tables of the Laplace function (Table 3, Appendix 1).

Then with probability $\gamma$ it can be asserted that $\left|\dfrac{\bar{x} - m}{\sigma/\sqrt{n}}\right| \le u_\gamma$, that is, the desired general mean belongs to the interval
$$\bar{x} - u_\gamma\frac{\sigma}{\sqrt{n}} \le m \le \bar{x} + u_\gamma\frac{\sigma}{\sqrt{n}}. \qquad (3.13)$$

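The quantity
$$\varepsilon = u_\gamma\frac{\sigma}{\sqrt{n}} \qquad (3.14)$$

is called the accuracy of the estimate.

The number $u_\gamma$, a quantile of the normal distribution, can be found as the argument of the Laplace function (Table 3, Appendix 1), taking into account the relation $2\Phi(u) = \gamma$, i.e. $\Phi(u) = \gamma/2$.

Conversely, from a given deviation $\varepsilon$ one can find the probability with which the unknown general mean belongs to the interval $(\bar{x} - \varepsilon;\ \bar{x} + \varepsilon)$. To do this, calculate

$$\gamma = 2\Phi\!\left(\frac{\varepsilon\sqrt{n}}{\sigma}\right). \qquad (3.15)$$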
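The reverse calculation, from a given deviation to the reliability, can be sketched as follows. The symbols `epsilon` (deviation), `sigma` (known standard deviation) and the function name are assumptions of this illustration. Note that Python's `NormalDist().cdf` is the full distribution function rather than the Laplace function integrated from 0, so the central probability is written as 2·cdf(u) − 1. The numbers are those of Example No. 3 (B).

```python
from math import sqrt
from statistics import NormalDist

def reliability(epsilon, sigma, n):
    """Probability that the sample mean deviates from the true mean
    by at most epsilon, for known sigma and sample size n."""
    u = epsilon * sqrt(n) / sigma
    # P(|U| <= u) for a standard normal U equals 2 * cdf(u) - 1.
    return 2 * NormalDist().cdf(u) - 1

# n = 36 students, sigma = 6, deviation of at most 2.
print(f"{reliability(2, 6, 36):.4f}")
```

Here u = 2, so the probability is about 0.9545.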

Let a random sample be drawn from the general population by the repeated selection method. From the equation $\varepsilon = u_\gamma\,\sigma/\sqrt{n}$ one can find the minimum repeated-sample size $n$ needed for the accuracy of the confidence interval with a given reliability $\gamma$ not to exceed a preset value $\varepsilon$. The required sample size is estimated by the formula

$$n \ge \left(\frac{u_\gamma\,\sigma}{\varepsilon}\right)^2. \qquad (3.16)$$

Let us examine the accuracy of the estimate $\varepsilon = u_\gamma\,\sigma/\sqrt{n}$:

1) As the sample size $n$ increases, $\varepsilon$ decreases, and therefore the accuracy of the estimate increases.

2) As the reliability $\gamma$ of the estimate increases, the value of the argument $u$ increases (because $\Phi(u)$ increases monotonically), and therefore $\varepsilon$ increases. In this case the gain in reliability reduces the accuracy of the estimate.

The estimate
$$\varepsilon = t\,\frac{\sigma}{\sqrt{n}} \qquad (3.17)$$

is called classical (where $t$ is a parameter depending on $\gamma$ and $n$), because it characterizes the most frequently encountered distribution laws.

3.5.3 Confidence intervals for estimating the mathematical expectation of a normal distribution with an unknown standard deviation 

Let it be known that the population follows the normal law $X \sim N(m; \sigma)$, where the standard deviation $\sigma$ is unknown.

To construct a confidence interval for estimating the general mean in this case, the statistic
$$T = \frac{\bar{x} - m}{s/\sqrt{n}}$$
is used, which has a Student distribution with $k = n - 1$ degrees of freedom. This follows from the fact that $\dfrac{\bar{x} - m}{\sigma/\sqrt{n}} \sim N(0;1)$ (see section 3.5.2), that $\dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)$ (see section 3.5.3), and from the definition of the Student distribution (part 1, section 2.11.2).

Let us find the accuracy of the classical estimate for the Student distribution, i.e. find $t$ from formula (3.17). Let the probability of the inequality $|T| \le t$ be given by the reliability $\gamma$:

$$P\!\left(\left|\frac{\bar{x} - m}{s/\sqrt{n}}\right| \le t\right) = \gamma. \qquad (3.18)$$

Since $T \sim St(n-1)$, it is obvious that $t$ depends on $\gamma$ and $n$, so one usually writes $t = t_{\gamma, n-1}$:

$$2F_{n-1}(t_{\gamma, n-1}) - 1 = \gamma, \qquad (3.19)$$

where $F_{n-1}$ is the Student distribution function with $n-1$ degrees of freedom.

Solving the inequality in (3.18) for $m$, we get the interval $\bar{x} - t_{\gamma, n-1}\dfrac{s}{\sqrt{n}} < m < \bar{x} + t_{\gamma, n-1}\dfrac{s}{\sqrt{n}}$, which with reliability $\gamma$ covers the unknown parameter $m$.

The quantity $t_{\gamma, n-1}$, used to determine the confidence interval of the random variable $T(n-1)$ that follows the Student distribution with $n-1$ degrees of freedom, is called Student's coefficient. It is found for the given values of $n$ and $\gamma$ from the tables "Critical points of the Student distribution" (Table 6, Appendix 1), which contain the solutions of equation (3.19).

As a result, we get the following expression for the accuracy $\varepsilon$ of the confidence interval for estimating the mathematical expectation (general mean) when the variance is unknown:

$$\varepsilon = t_{\gamma, n-1}\frac{s}{\sqrt{n}}. \qquad (3.20)$$

Thus, there is a general formula for constructing confidence intervals for the mathematical expectation of the population:

$$\bar{x} - \varepsilon < m < \bar{x} + \varepsilon,$$

where the accuracy $\varepsilon$ of the confidence interval is found from formula (3.14) when the variance is known and from formula (3.20) when it is unknown.

Problem 10. Some tests were carried out, the results of which are listed in the table:

x i

It is known that they follow the normal law with a known standard deviation. Find the estimate m* of the mathematical expectation m and construct a 90% confidence interval for it.

Solution:

So, $m \in (2.53; 5.47)$.

Problem 11. The depth of the sea is measured by a device whose systematic error is 0 and whose random errors are normally distributed with a standard deviation σ = 15 m. How many independent measurements must be made to determine the depth with an error of no more than 5 m at a confidence level of 90%?

Solution:

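According to the conditions of the problem we have $X \sim N(m; \sigma)$, where $\sigma = 15$ m, $\varepsilon = 5$ m, $\gamma = 0.9$. Let us find the sample size $n$.

1) For the given reliability $\gamma = 0.9$ we find from Table 3 (Appendix 1) the argument of the Laplace function $u = 1.65$.

2) Knowing the specified accuracy of the estimate $\varepsilon = u\,\sigma/\sqrt{n} = 5$, we find $n$. We have

$$n = \left(\frac{u\,\sigma}{\varepsilon}\right)^2 = \left(\frac{1.65 \cdot 15}{5}\right)^2 \approx 24.5.$$
Therefore the number of measurements is $n \ge 25$.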
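The same calculation can be sketched in Python, with the quantile computed from the normal law instead of being read from a table. The function name `required_sample_size` and its parameter names are assumptions of this illustration.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, epsilon, gamma):
    """Smallest n with u * sigma / sqrt(n) <= epsilon at reliability gamma."""
    u = NormalDist().inv_cdf((1 + gamma) / 2)
    return ceil((u * sigma / epsilon) ** 2)

# Problem 11: sigma = 15 m, accuracy 5 m, reliability 0.9.
print(required_sample_size(sigma=15, epsilon=5, gamma=0.9))
```

The exact quantile is about 1.645 rather than the table value 1.65, but after rounding up the answer is the same: 25 measurements.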

Problem 12. A sample of temperatures t for the first 6 days of January is presented in the table:

Find the confidence interval for the mathematical expectation m of the population with the given confidence probability and estimate the general standard deviation s.

Solution:


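1) The sample means are $\bar{x}_1 = -175/6 \approx -29.2$ and $\bar{x}_2 = -192/6 = -32$.

2) The unbiased estimate $s$ is found by the formula $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$:

$\sum x_{1i} = -175$, $\sum (x_{1i} - \bar{x}_1)^2 = 234.84$, so $s_1^2 = 234.84/5 \approx 46.97$ and $s_1 \approx 6.85$;

$\sum x_{2i} = -192$, $\sum (x_{2i} - \bar{x}_2)^2 = 116$, so $s_2^2 = 116/5 = 23.2$ and $s_2 \approx 4.8$.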

3) Since the general variance is unknown, but its estimate is known, then to estimate the mathematical expectation m we use the Student distribution (Table 6, Appendix 1) and formula (3.20).

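Since $n_1 = n_2 = 6$, the Student coefficient $t_{\gamma, 5}$ is taken from the table; with $\bar{x}_1 = -29.2$ and $s_1 = 6.85$ we have $\varepsilon_1 = t_{\gamma, 5}\, s_1/\sqrt{6} \approx 4.1$, hence $-29.2 - 4.1 < m_1 < -29.2 + 4.1$.

Therefore $-33.3 < m_1 < -25.1$.

Similarly, with $\bar{x}_2 = -32$ and $s_2 = 4.8$ we have $\varepsilon_2 \approx 2.9$, so

$-34.9 < m_2 < -29.1$. Then the confidence intervals take the form $m_1 \in (-33.3; -25.1)$ and $m_2 \in (-34.9; -29.1)$.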

In applied sciences, for example, in construction disciplines, confidence interval tables are used to assess the accuracy of objects, which are given in the relevant reference literature.


Confidence intervals: theory and problems

Understanding Confidence Intervals

Let us briefly introduce the concept of a confidence interval, which
1) estimates some parameter of a numerical sample directly from the data of the sample itself,
2) covers the value of this parameter with probability γ.

The confidence interval for a parameter X (with probability γ) is an interval of the form $(X_1; X_2)$ such that $P(X_1 < X < X_2) = \gamma$, where the values $X_1$ and $X_2$ are calculated in some way from the sample.

Usually in applied problems the confidence probability is taken to be γ = 0.9, 0.95 or 0.99.

Consider a sample of size n drawn from a general population that is presumably normally distributed. Let us show which formulas are used to find confidence intervals for the distribution parameters: the mathematical expectation and the variance (standard deviation).

Confidence interval for mathematical expectation

Case 1. The variance of the distribution is known and equal to $\sigma^2$. Then the confidence interval for the parameter $a$ has the form:
$\bar{x} - t\dfrac{\sigma}{\sqrt{n}} < a < \bar{x} + t\dfrac{\sigma}{\sqrt{n}}$, where $\bar{x}$ is the sample mean and $t$ is determined from the Laplace function table according to the relation $2\Phi(t) = \gamma$.

Case 2. The variance of the distribution is unknown; a point estimate $s^2$ of the variance is calculated from the sample. Then the confidence interval for the parameter $a$ has the form:
$\bar{x} - t\dfrac{s}{\sqrt{n}} < a < \bar{x} + t\dfrac{s}{\sqrt{n}}$, where $\bar{x}$ is the sample mean and the parameter $t$ is determined from the Student distribution table.

Example. Based on 7 measurements of a certain quantity, the average of the measurement results was found to be 30 and the sample variance to be 36. Find the boundaries within which the true value of the measured quantity is contained with a reliability of 0.99.

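Solution. First we find the Student coefficient $t$ from the table for reliability 0.99 and sample size $n = 7$. Then the confidence limits of the interval containing the true value of the measured quantity are found by the formula
$\bar{x} - t\dfrac{s}{\sqrt{n}} < a < \bar{x} + t\dfrac{s}{\sqrt{n}}$, where $\bar{x}$ is the sample mean and $s^2$ is the sample variance. Substituting $\bar{x} = 30$, $s = 6$, $n = 7$ and $t \approx 3.71$ (6 degrees of freedom), we get approximately $21.59 < a < 38.41$.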
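For readers without a Student table at hand, the coefficient can be approximated numerically. The sketch below uses only the Python standard library: it integrates the Student density with Simpson's rule and solves for t by bisection. All function names are made up for this illustration, and treating the sample variance 36 as the unbiased estimate with n − 1 = 6 degrees of freedom is an assumption.

```python
import math

def t_pdf(x, k):
    """Density of Student's t distribution with k degrees of freedom."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + x * x / k) ** (-(k + 1) / 2)

def central_probability(t, k, steps=2000):
    """P(|T| <= t) via Simpson's rule on [0, t], doubled by symmetry."""
    h = t / steps
    total = t_pdf(0.0, k) + t_pdf(t, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 2 * total * h / 3

def student_coefficient(gamma, k):
    """Solve P(|T| <= t) = gamma for t by bisection (replaces the table lookup)."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_probability(mid, k) < gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def student_interval(x_bar, s, n, gamma):
    """Confidence interval for the mean when the variance is estimated."""
    eps = student_coefficient(gamma, n - 1) * s / math.sqrt(n)
    return x_bar - eps, x_bar + eps

# Example from the text: 7 measurements, mean 30, sample variance 36.
n, x_bar, s = 7, 30.0, math.sqrt(36)
t = student_coefficient(0.99, n - 1)
low, high = student_interval(x_bar, s, n, 0.99)
print(f"t = {t:.3f}, interval: ({low:.2f}, {high:.2f})")
```

Under these assumptions the coefficient is close to the table value 3.71 and the interval is roughly (21.6; 38.4). A statistics library would normally be used for the quantile; the numerical integration here only keeps the example dependency-free.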

Confidence interval for variance

We assume that, generally speaking, the mathematical expectation is unknown and only the unbiased point estimate of the variance is known. Then the bounds of the confidence interval are expressed through quantiles of the χ² distribution, which are determined from tables.

Example. Based on the data of 7 tests, the estimate of the standard deviation was found to be s = 12. Find, with probability 0.9, the width of the confidence interval constructed to estimate the variance.

Solution. The confidence interval for the unknown population variance can be found using the formula:

We substitute and get:


Then the width of the confidence interval is 465.589-71.708=393.881.

Confidence interval for probability (proportion)

Case 1. Let the sample size $n$ and the sample share $w$ (relative frequency) be known in the problem. Then the confidence interval for the general share $p$ (true probability) has the form:
$w - t\sqrt{\dfrac{w(1-w)}{n}} < p < w + t\sqrt{\dfrac{w(1-w)}{n}}$, where the parameter $t$ is determined from the Laplace function table using the relation $2\Phi(t) = \gamma$.

Case 2. If the total size $N$ of the population from which the sample was taken is additionally known in the problem, the confidence interval for the general share (true probability) can be found using the adjusted formula:
$w - t\sqrt{\dfrac{w(1-w)}{n}\left(1 - \dfrac{n}{N}\right)} < p < w + t\sqrt{\dfrac{w(1-w)}{n}\left(1 - \dfrac{n}{N}\right)}$.
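Both cases can be sketched with one function, assuming the usual normal-approximation formulas w ± t·√(w(1 − w)/n), with the factor (1 − n/N) under the root when the population size is known. The function name and the input numbers below are hypothetical, not data from the text.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(w, n, gamma, population_size=None):
    """Normal-approximation interval for a share; population_size enables
    the finite-population correction (Case 2)."""
    t = NormalDist().inv_cdf((1 + gamma) / 2)
    error_sq = w * (1 - w) / n
    if population_size is not None:
        error_sq *= 1 - n / population_size
    half_width = t * sqrt(error_sq)
    return w - half_width, w + half_width

# Hypothetical numbers: sample share 0.3 in a sample of 400, population of 5000.
print(proportion_interval(0.3, 400, 0.95))
print(proportion_interval(0.3, 400, 0.95, population_size=5000))
```

The corrected interval is always somewhat narrower, and the difference becomes noticeable only when the sample is a sizeable fraction of the population.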

Example. It is known that Find the boundaries within which the general share is likely to be contained.

Solution. We use the formula:

Let's find the parameter from the condition , we get Substitute into the formula:



Let the random variable X of the population be normally distributed, and let the variance and standard deviation s of this distribution be known. It is required to estimate the unknown mathematical expectation a using the sample mean. In this case the task comes down to finding a confidence interval for the mathematical expectation with reliability b. Given the value of the confidence probability (reliability) b, the probability that the unknown mathematical expectation falls into the interval can be written using formula (6.9a):

$$P(|\bar{x} - a| < e) = 2\Phi(t) = b,$$

where Ф(t) is the Laplace function (5.17a).

As a result, we can formulate an algorithm for finding the boundaries of the confidence interval for the mathematical expectation when the variance D = s² is known:

  1. Set the reliability value b.
  2. From (6.14) express Ф(t) = 0.5·b. Select the value of t from the table of the Laplace function based on the value Ф(t) (see Appendix 1).
  3. Calculate the deviation e using formula (6.10).
  4. Write down the confidence interval using formula (6.12), such that with probability b the inequality holds:

$$\bar{x} - e < a < \bar{x} + e.$$

Example 5.

The random variable X has a normal distribution. Find the confidence interval for estimating, with reliability b = 0.96, the unknown mathematical expectation a, given:

1) general standard deviation s = 5;

2) sample mean $\bar{x} = 30$;

3) sample size n = 49.

In formula (6.15) for the interval estimate of the mathematical expectation a with reliability b, all quantities except t are known. The value of t can be found using (6.14): b = 2Ф(t) = 0.96, so Ф(t) = 0.48.

Using the table in Appendix 1 for the Laplace function with Ф(t) = 0.48, we find the corresponding value t = 2.06. Hence $e = t\,s/\sqrt{n} = 2.06 \cdot 5/\sqrt{49} \approx 1.47$. Substituting the calculated value of e into formula (6.12) gives the confidence interval 30 − 1.47 < a < 30 + 1.47.

The required confidence interval for estimating the unknown mathematical expectation with reliability b = 0.96 is 28.53 < a < 31.47.
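The four steps of the algorithm can be sketched in Python, with the table lookup replaced by a numerical solution of Ф(t) = b/2. The Laplace function is expressed through `math.erf`; the function names are assumptions of this illustration, and the deviation is taken as e = t·s/√n.

```python
from math import erf, sqrt

def laplace(t):
    """Laplace function: integral of the standard normal density from 0 to t."""
    return 0.5 * erf(t / sqrt(2))

def solve_t(b):
    """Step 2: find t with laplace(t) = b / 2 by bisection instead of a table."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if laplace(mid) < b / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def interval_known_sigma(x_bar, s, n, b):
    t = solve_t(b)               # step 2
    e = t * s / sqrt(n)          # step 3: deviation
    return x_bar - e, x_bar + e  # step 4

# Example 5: s = 5, n = 49, b = 0.96, sample mean 30.
low, high = interval_known_sigma(x_bar=30, s=5, n=49, b=0.96)
print(f"{low:.2f} < a < {high:.2f}")
```

The numerically found t is about 2.054 rather than the table value 2.06, but after rounding to two decimals the interval agrees with the one obtained above.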