
The Law of Large Numbers and Limit Theorems

4.1. The Law of Large Numbers in Chebyshev Form

If the phenomenon of the stability of averages does take place in reality, then the mathematical model with which we study random phenomena must contain a theorem reflecting this fact.
In the conditions of this theorem we impose the following restrictions on the random variables X_1, X_2, …, X_n:

a) each random variable X_i has the same mathematical expectation,

M(X_i) = a;

b) the variance of each random variable is finite, or, more precisely, the variances are bounded from above by one and the same number C, i.e.

D(X_i) < C, i = 1, 2, …, n;

c) the random variables are pairwise independent, i.e. any two X_i and X_j with i ≠ j are independent.

Then, obviously,

D(X_1 + X_2 + … + X_n) = D(X_1) + D(X_2) + … + D(X_n).

Let us formulate the law of large numbers in the Chebyshev form.

Chebyshev's theorem: with an unlimited increase in the number n of independent trials, "the arithmetic mean of the observed values of a random variable converges in probability to its mathematical expectation", i.e. for any positive ε

lim_{n→∞} P(|X̄ − a| < ε) = 1, (4.1.1)

where X̄ = (X_1 + X_2 + … + X_n)/n denotes the arithmetic mean.

The meaning of the expression "the arithmetic mean X̄ converges in probability to a" is that the probability that X̄ differs from a by an arbitrarily small amount approaches 1 without limit as the number n grows.

Proof. For a finite number n of independent trials we apply the Chebyshev inequality to the random variable X̄ = (X_1 + X_2 + … + X_n)/n:

P(|X̄ − M(X̄)| < ε) ≥ 1 − D(X̄)/ε². (4.1.2)

Taking into account restrictions a)–c), we calculate M(X̄) and D(X̄):

M(X̄) = M((X_1 + X_2 + … + X_n)/n) = (1/n)·(M(X_1) + M(X_2) + … + M(X_n)) = na/n = a;

D(X̄) = D((X_1 + X_2 + … + X_n)/n) = (1/n²)·(D(X_1) + D(X_2) + … + D(X_n)) < nC/n² = C/n.

Substituting M(X̄) and D(X̄) into inequality (4.1.2), we obtain

P(|X̄ − a| < ε) ≥ 1 − C/(nε²).

If in this inequality we take an arbitrarily small ε > 0 and let n → ∞, then we get

lim_{n→∞} P(|X̄ − a| < ε) = 1,

which proves Chebyshev's theorem.

An important practical conclusion follows from this theorem: we have the right to replace the unknown value of the mathematical expectation of a random variable by the arithmetic mean obtained from a sufficiently large number of experiments. Moreover, the more experiments are used in the calculation, the higher the probability (reliability) that the error of this replacement (X̄ − a) will not exceed a given value ε.
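
As a quick numerical illustration of this conclusion, here is a minimal Python sketch; the exponential distribution with mean a = 2 is an arbitrary illustrative choice, not part of the text. It shows the arithmetic mean settling near the mathematical expectation as n grows.

    import random

    random.seed(1)
    a = 2.0  # mathematical expectation of the chosen exponential law (illustrative)

    for n in (10, 100, 1_000, 10_000, 100_000):
        sample = [random.expovariate(1 / a) for _ in range(n)]
        mean = sum(sample) / n
        print(f"n = {n:>6}: arithmetic mean = {mean:.4f}, deviation from a = {abs(mean - a):.4f}")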

In addition, other practical problems can be solved: for example, given the probability (reliability) R = P(|X̄ − a| < ε) and the maximum admissible error ε, determine the required number of experiments n; given R and n, determine ε; given ε and n, determine the probability of the event |X̄ − a| < ε.

A special case. Suppose that in n trials n values of a random variable X are observed, where X has mathematical expectation M(X) and variance D(X). The obtained values can be regarded as random variables X_1, X_2, X_3, …, X_n. This should be understood as follows: the series of n trials is carried out repeatedly, so that as a result of the i-th trial, i = 1, 2, 3, …, n, in each series of trials one or another value of the random variable X appears, not known in advance. Consequently, the i-th value x_i of the random variable, obtained in the i-th trial, changes randomly if we pass from one series of trials to another. Thus every value x_i can be regarded as a random variable X_i.


Assume that the trials satisfy the following requirements:

1. The trials are independent. This means that the results X_1, X_2, X_3, …, X_n of the trials are independent random variables.

2. The trials are carried out under identical conditions. From the point of view of probability theory this means that each of the random variables X_1, X_2, X_3, …, X_n has the same distribution law as the original variable X, so that M(X_i) = M(X) and D(X_i) = D(X), i = 1, 2, …, n.

Under these conditions we obtain

P(|X̄ − M(X)| < ε) ≥ 1 − D(X)/(nε²). (4.1.3)

Example 4.1.1. The variance of a random variable X is equal to 4. How many independent experiments are required so that, with probability at least 0.9, the arithmetic mean of this random variable will differ from its mathematical expectation by less than 0.5?

Solution. By the condition of the problem, ε = 0.5 and P(|X̄ − M(X)| < 0.5) ≥ 0.9. Applying formula (4.1.3) to the random variable X, we get

P(|X̄ − M(X)| < ε) ≥ 1 − D(X)/(nε²).

From the relation

1 − D(X)/(nε²) = 0.9

we find

n = D(X)/(0.1·ε²) = 4/(0.1·0.25) = 160.

Answer: it is required to make 160 independent experiments.

Assuming instead that the arithmetic mean X̄ is normally distributed, we get

P(|X̄ − a| < ε) = 2Φ(ε√n/σ) ≥ 0.9,

where σ = √D(X) = 2 and Φ is the Laplace function. Using the table of the Laplace function, we get ε√n/σ ≥ 1.645, i.e. √n ≥ 1.645·2/0.5 = 6.58, so n ≥ 44.
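
For comparison, a small Python sketch of both calculations in Example 4.1.1; the numbers D(X) = 4, ε = 0.5, the reliability 0.9 and the quantile 1.645 are those used above, nothing else is assumed.

    import math

    D, eps, reliability = 4.0, 0.5, 0.9

    # Chebyshev bound (4.1.3): 1 - D/(n*eps^2) >= reliability  =>  n >= D / ((1 - reliability) * eps^2)
    n_chebyshev = D / ((1 - reliability) * eps ** 2)

    # Normal approximation: eps*sqrt(n)/sigma >= 1.645 (the Laplace-table quantile for 2*Phi(t) = 0.9)
    sigma = math.sqrt(D)
    n_normal = (1.645 * sigma / eps) ** 2

    print(f"Chebyshev bound:      n >= {n_chebyshev:.0f}")      # 160
    print(f"Normal approximation: n >= {math.ceil(n_normal)}")  # 44

The comparison shows how conservative the Chebyshev estimate is relative to the normal approximation.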

Example 4.1.2. The variance of a random variable X is D(X) = 5. 100 independent experiments were carried out, and from them the arithmetic mean X̄ was computed. Instead of the unknown value of the mathematical expectation a, the value X̄ is adopted. Determine the maximum error allowed in this case with probability at least 0.8.

Solution. By the conditions of the problem, n = 100 and P(|X̄ − a| < ε) ≥ 0.8. We apply formula (4.1.3):

P(|X̄ − a| < ε) ≥ 1 − D(X)/(nε²).

From the relation

1 − D(X)/(nε²) = 0.8

we determine ε:

ε² = D(X)/(0.2·n) = 5/20 = 0.25.

Hence ε = 0.5.

Answer: the maximum error is ε = 0.5.
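
The same rearrangement of formula (4.1.3), sketched in Python with the data of Example 4.1.2 (D(X) = 5, n = 100, reliability 0.8):

    import math

    D, n, reliability = 5.0, 100, 0.8

    # From 1 - D/(n*eps^2) = reliability:  eps = sqrt(D / (n*(1 - reliability)))
    eps = math.sqrt(D / (n * (1 - reliability)))
    print(f"maximum admissible error eps = {eps}")  # 0.5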

4.2. The Law of Large Numbers in Bernoulli Form

Although the concept of probability is the basis of any statistical inference, there are only a few cases in which we can determine the probability of an event directly. Sometimes this probability can be established from considerations of symmetry, equal possibility, etc., but there is no universal method that would allow one to indicate the probability of an arbitrary event. Bernoulli's theorem makes it possible to estimate the probability approximately if repeated independent trials can be carried out for the event A of interest to us. Let n independent trials be performed, in each of which the probability of occurrence of some event A is constant and equal to p.

Bernoulli's theorem. With an unlimited increase in the number n of independent trials, the relative frequency m/n of occurrence of the event A converges in probability to the probability p of occurrence of the event A, i.e.

lim_{n→∞} P(|m/n − p| < ε) = 1, (4.2.1)

where ε is an arbitrarily small positive number.

For finite n, provided the trials are independent and the probability p of the event A is constant, the Chebyshev inequality for the random variable m/n takes the form

P(|m/n − p| < ε) ≥ 1 − pq/(nε²), where q = 1 − p. (4.2.2)

Proof. We apply Chebyshev's theorem. Let X_i be the number of occurrences of the event A in the i-th trial, i = 1, 2, …, n. Each of the quantities X_i can take only two values:

X_i = 1 (the event A occurred) with probability p,

X_i = 0 (the event A did not occur) with probability q = 1 − p.

Let Y_n = (X_1 + X_2 + … + X_n)/n. The sum X_1 + X_2 + … + X_n is equal to the number m of occurrences of the event A in n trials (0 ≤ m ≤ n), so Y_n = m/n is the relative frequency of occurrence of the event A in the n trials. The mathematical expectation and variance of X_i are equal, respectively, to

M(X_i) = 1·p + 0·q = p,

D(X_i) = (1 − p)²·p + (0 − p)²·q = pq.

Thus the variables X_i are independent, have the same mathematical expectation p and variances bounded by pq ≤ 1/4, so Chebyshev's theorem applies to Y_n = m/n; this yields (4.2.1), and for finite n the Chebyshev inequality yields (4.2.2).
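
The convergence asserted by Bernoulli's theorem is easy to watch in a short simulation; a minimal Python sketch, with the value p = 0.3 chosen purely for illustration:

    import random

    random.seed(2)
    p = 0.3  # probability of the event A in a single trial (illustrative value)

    occurrences = 0
    for n in range(1, 100_001):
        occurrences += random.random() < p  # X_i = 1 if A occurred in the i-th trial, else 0
        if n in (100, 1_000, 10_000, 100_000):
            print(f"n = {n:>7}: relative frequency m/n = {occurrences / n:.4f}")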

Probability theory studies the regularities inherent in mass random phenomena. Like any other science, probability theory is intended to predict the result of a particular phenomenon or experiment as accurately as possible. If the phenomenon is of an isolated nature, probability theory is able to predict the probability of the outcome only within very wide limits. Regularities appear only for a large number of random phenomena occurring under homogeneous conditions.

The group of theorems establishing the correspondence between the theoretical and experimental characteristics of random variables and random events for a large number of trials, as well as the theorems concerning limit distribution laws, are united under the general name of limit theorems of probability theory.

There are two types of limit theorems: the law of large numbers and the central limit theorem.

The law of large numbers, which occupies an important place in probability theory, is the link between probability theory as a mathematical science and the regularities of random phenomena observed in mass observations of them.

The law plays a very important role in the practical applications of probability theory to natural phenomena and technical processes associated with mass production.

Limit distribution laws are the subject of a group of theorems constituting the quantitative form of the law of large numbers. That is, the law of large numbers is a series of theorems, each of which establishes the fact that the average characteristics of a large number of trials approach certain constants, i.e. establishes convergence in probability of certain random variables to constants. Such are the theorems of Bernoulli, Poisson, Lyapunov, Markov, and Chebyshev.

a) Bernoulli's theorem, the law of large numbers (formulated and proved earlier in Section 3 of § 6 when considering the Moivre-Laplace integral limit theorem).

With an unlimited increase in the number of homogeneous independent experiments, the frequency of an event will differ arbitrarily little from the probability of the event in a single experiment. In other words, the probability that the relative frequency m/n of the event A deviates from the constant probability p of the event A by less than any ε > 0 tends to 1 as n → ∞:

lim_{n→∞} P(|m/n − p| < ε) = 1.

b) Chebyshev's theorem.

With an unlimited increase in the number of independent trials, the arithmetic mean of the observed values of a random variable with finite variance converges in probability to its mathematical expectation. In other words, if X_1, X_2, …, X_n, … are independent identically distributed random variables with mathematical expectation M(X_i) = a and bounded variance D(X_i) ≤ C, then for any ε > 0 the following holds:

lim_{n→∞} P(|(X_1 + X_2 + … + X_n)/n − a| < ε) = 1.

Chebyshev's theorem (generalized). If the random variables X_1, X_2, …, X_n, … in the sequence are pairwise independent and their variances satisfy the condition D(X_i) ≤ C, i = 1, 2, …, then for any positive ε > 0 the statement holds:

lim_{n→∞} P(|(X_1 + X_2 + … + X_n)/n − (M(X_1) + M(X_2) + … + M(X_n))/n| < ε) = 1,

or, what is the same,

lim_{n→∞} P(|(X_1 + X_2 + … + X_n)/n − (M(X_1) + M(X_2) + … + M(X_n))/n| ≥ ε) = 0.

c) Markov's theorem (the law of large numbers in a general formulation).

If the variances of the arbitrary random variables X_1, X_2, …, X_n, … in the sequence satisfy the condition D(X_1 + X_2 + … + X_n)/n² → 0 as n → ∞, then for any positive ε > 0 the statement of Chebyshev's theorem holds:

lim_{n→∞} P(|(X_1 + X_2 + … + X_n)/n − (M(X_1) + M(X_2) + … + M(X_n))/n| < ε) = 1.

d) Poisson's theorem.

With an unlimited increase in the number of independent experiments carried out under variable conditions, the relative frequency of the event A converges in probability to the arithmetic mean of its probabilities in these trials.

Remark. In none of the forms of the law of large numbers do we deal with the laws of distribution of the random variables themselves. The question of finding the limit distribution law of the sum X_1 + X_2 + … + X_n when the number of terms increases without limit is considered by the central limit theorem. If the terms are identically distributed, then we arrive at the Moivre-Laplace integral theorem (Section 3 of § 6), which is the simplest particular case of the central limit theorem.

At the beginning of the course we already said that the mathematical laws of probability theory are obtained by abstracting the real statistical regularities inherent in mass random phenomena. The presence of these regularities is connected precisely with the mass character of the phenomena, that is, with the large number of homogeneous experiments performed or with the large number of random influences that, in their totality, generate a random variable subject to a well-defined law. The property of stability of mass random phenomena has been known to mankind since ancient times. In whatever area it appears, its essence amounts to the following: the specific features of each individual random phenomenon have almost no effect on the average result of a mass of such phenomena; the random deviations from the average, inevitable in each individual phenomenon, cancel out and level off in the mass. It is this stability of the averages that constitutes the physical content of the "law of large numbers", understood in the broad sense of the word: for a very large number of random phenomena, their average result practically ceases to be random and can be predicted with a high degree of certainty.

In the narrow sense of the word, the "law of large numbers" in probability theory is understood as a series of mathematical theorems, in each of which, under certain conditions, the fact is established that the average characteristics of a large number of experiments approach certain definite constants.

In 2.3 we have already formulated the simplest of these theorems, J. Bernoulli's theorem. It states that for a large number of trials the frequency of an event approaches (more precisely, converges in probability to) the probability of this event. Other, more general forms of the law of large numbers will be introduced in this chapter. All of them establish the fact, and the conditions, of convergence in probability of certain random variables to constant, non-random quantities.

The law of large numbers plays an important role in the practical applications of probability theory. The property of random variables under certain conditions to behave practically as non-random ones allows us to confidently operate with these quantities, to predict the results of mass random phenomena with almost complete certainty.

The possibilities of such predictions in the field of mass random phenomena are further widened by the existence of another group of limit theorems, which concern not the limit values of random variables but limit distribution laws. This is the group of theorems known as the "central limit theorem". We have already said that when a sufficiently large number of random variables are summed, the distribution law of the sum approaches the normal law without limit, provided certain conditions are met. These conditions, which can be formulated mathematically in various ways (in a more or less general form), essentially amount to the requirement that the influence of the individual terms on the sum be uniformly small, i.e. that the sum should not include terms that clearly dominate the totality of the rest in their influence on the variance of the sum. The various forms of the central limit theorem differ from one another in the conditions under which this limit property of the sum of random variables is established.

Various forms of the law of large numbers, together with various forms of the central limit theorem, form a set of so-called limit theorems of probability theory. Limit theorems make it possible not only to make scientific forecasts in the field of random phenomena, but also to evaluate the accuracy of these forecasts.

In this chapter, we consider only some of the simplest forms of limit theorems. First, theorems related to the "law of large numbers" group will be considered, then - theorems related to the "central limit theorem" group.

It is quite natural to want to quantify the statement that in "large" series of trials the frequency of occurrence of an event is "close" to its probability. One must clearly realize the certain delicacy of this task. In the cases most typical for probability theory, the situation is such that in arbitrarily long series of trials both extreme values of the frequency remain theoretically possible:

\frac{\mu}{n}=\frac{n}{n}=1 \quad\text{and}\quad \frac{\mu}{n}=\frac{0}{n}=0.

Therefore, whatever the number of trials n, it is impossible to assert with complete certainty that, say, the inequality

\left|\frac{\mu}{n}-p\right|<\frac{1}{10}

will be satisfied.

For example, if the event A consists in throwing a six when throwing a die, then after n throws, with probability \left(\frac{1}{6}\right)^n>0 we will obtain only sixes, i.e. with probability \left(\frac{1}{6}\right)^n the frequency of appearance of sixes will be equal to one, while with probability \left(1-\frac{1}{6}\right)^n>0 the six does not come up even once, i.e. the frequency of appearance of sixes will be equal to zero.

In all such problems, any non-trivial estimate of the proximity between frequency and probability does not operate with complete certainty, but only with some probability less than unity. It can be proved, for example, that in the case of independent trials with a constant probability p of the occurrence of an event, the inequality

\left|\frac{\mu}{n}-p\right|<0.02 \qquad (7)

for the frequency \frac{\mu}{n} will be satisfied for n=10\,000 (and any p) with probability

P>0.9999. \qquad (8)

Here we first of all want to emphasize that, in the above formulation, the quantitative estimate of the closeness of the frequency \frac{\mu}{n} to the probability p is associated with the introduction of a new probability P.

The real meaning of estimate (8) is as follows: if we make N series of n tests and count the number M of series in which inequality (7) is satisfied, then for a sufficiently large N, approximately

\frac{M}{N}\approx P>0.9999. \qquad (9)

But if we want to refine relation (9) both in terms of the degree of closeness of \frac{M}{N} to the probability P and in terms of the reliability with which it can be asserted that such closeness will take place, then we will have to turn to considerations similar to those we have already applied to the closeness of \frac{\mu}{n} and p. If desired, such reasoning can be repeated an unlimited number of times, but it is quite clear that this will not allow us to free ourselves completely from the need to turn, at the last stage, to probabilities in the primitive, rough sense of the term.

It should not be thought that such difficulties are some peculiarity of the theory of probability. In the mathematical study of real phenomena, we always schematize them. Deviations of the course of real phenomena from the theoretical scheme can, in turn, be subjected to mathematical study. But for this, these deviations themselves must be placed in a certain scheme, and this latter should be used already without a formal mathematical analysis of deviations from it.

Note, however, that in the actual application of the estimate

P\!\left\{\,\left|\frac{\mu}{n}-p\right|<0.02\right\}>0.9999 \qquad (10)


for a single series of n trials, we also rely on certain considerations of symmetry: inequality (10) indicates that for a very large number N of series, relation (7) will be satisfied in at least 99.99% of cases; it is natural to expect with great certainty that, in particular, inequality (7) will hold in the particular series of n trials that interests us, if we have reason to believe that this series occupies an ordinary position among the other series and is not marked out in any way.

The probabilities that one decides to neglect differ in different practical situations. It has already been noted above that in rough calculations of the consumption of shells guaranteeing the fulfilment of a task, one is satisfied with a norm of shell consumption for which the task is solved with probability 0.95, i.e. one neglects probabilities not exceeding 0.05. This is explained by the fact that a transition to calculations based on neglecting, say, only probabilities below 0.01 would lead to a large increase in the norms of shell consumption, i.e. in many practical cases to the conclusion that it is impossible to complete the task in the short time available, or with the stock of shells that can actually be used.

Sometimes, even in scientific research, one restricts oneself to statistical methods designed on the basis of neglecting probabilities of 0.05. But this should be done only when collecting more extensive material is very difficult. Consider the following problem as an example of such methods. Suppose that under certain conditions a commonly used drug for the treatment of some disease gives a positive result in 50% of cases, i.e. with probability 0.5. A new drug is proposed and, to test its advantage over the old one, it is planned to use it in ten cases, selected impartially from among patients in the same position as those for whom the old drug was found to be 50% effective. It is agreed in advance that the advantage of the new drug will be considered proven if it gives a positive result in at least eight cases out of ten. It is easy to calculate that such a decision involves neglecting the probability of reaching an erroneous conclusion (i.e. the conclusion that the advantage of the new drug is proven, while in fact it is equivalent to or even worse than the old one) of just about 0.05. Indeed, if in each of the ten trials the probability of a positive outcome equals p, then the probabilities of getting 10, 9, or 8 positive outcomes in ten trials are equal, respectively, to

P_{10}=p^{10},\qquad P_9=10p^9(1-p),\qquad P_8=45p^8(1-p)^2.

In sum, for the case p=\frac{1}{2} we get P=P_{10}+P_9+P_8=\frac{56}{1024}\approx0.05.

Thus, assuming that the new drug is in fact exactly equivalent to the old one, we run the risk of drawing the erroneous conclusion that the new drug is superior to the old one with probability of about 0.05. To reduce this probability to about 0.01 without increasing the number of trials n=10, one would have to establish that the advantage of the new drug will be considered proven only if its use gives a positive result in at least nine cases out of ten. If this requirement seems too harsh to the proponents of the new drug, then the number of trials n will have to be made significantly larger than 10. If, for example, for n=100 it is established that the advantage of the new drug will be considered proven when \mu>65, then the probability of error will be only P\approx0.0015.
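
The figure 56/1024 ≈ 0.05 for the ten-trial design can be reproduced directly from the binomial formula; a minimal Python sketch (the threshold of eight positive outcomes out of ten is the one fixed in the text, and p = 0.5 is the assumption that the new drug is no better than the old one):

    from math import comb

    p = 0.5  # the new drug is assumed to be exactly as effective as the old one
    n = 10
    # probability of wrongly "proving" the advantage: 8, 9 or 10 positive outcomes out of 10
    error_prob = sum(comb(n, m) * p ** m * (1 - p) ** (n - m) for m in range(8, 11))
    print(error_prob, 56 / 1024)  # both are about 0.0547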

While the norm of 0.05 is clearly insufficient for serious scientific research, an error probability of 0.001 or 0.003 is for the most part neglected even in such academic and painstaking investigations as the processing of astronomical observations. However, sometimes scientific conclusions based on the application of probabilistic laws have much greater reliability as well (that is, they are built on the neglect of much smaller probabilities). More will be said about this later.

In the considered examples, we have repeatedly used special cases of the binomial formula (6)

P_m=C_n^m\,p^m(1-p)^{n-m}

for the probability P_m to get exactly m positive outcomes in n independent trials, in each of which a positive outcome has a probability p. Let us use this formula to consider the question posed at the beginning of this section about the probability

P=\mathbf{P}\!\left\{\,\left|\frac{\mu}{n}-p\right|<\varepsilon\right\}, \qquad (11)

where \mu is the actual number of positive outcomes. Obviously, this probability can be written as the sum of those P_m for which m satisfies the inequality

\left|\frac{m}{n}-p\right|<\varepsilon, \qquad (12)


that is, in the form

P=\sum_{m=m_1}^{m_2}P_m, \qquad (13)

where m_1 is the smallest of the values of m satisfying inequality (12), and m_2 the largest of such m.

Formula (13) is of little use for direct calculations when n is large. Therefore the discovery by Moivre, for the case p=\frac{1}{2}, and by Laplace, for arbitrary p, of an asymptotic formula that makes it very easy to find and study the behavior of the probabilities P_m for large n was of great importance. This formula has the form

P_m\sim\frac{1}{\sqrt{2\pi np(1-p)}}\exp\!\left[-\frac{(m-np)^2}{2np(1-p)}\right]. \qquad (14)

If p is not too close to zero or one, then it is sufficiently accurate already for n of the order of 100. If we put

t=\frac{m-np}{\sqrt{np(1-p)}}, \qquad (15)

then formula (14) takes the form

P_m\sim\frac{1}{\sqrt{2\pi np(1-p)}}\,e^{-t^2/2}. \qquad (16)
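
A short Python sketch comparing the exact binomial probability P_m with approximation (16); the values n = 100, p = 0.3, m = 35 are arbitrary illustrative choices, not taken from the text:

    from math import comb, exp, pi, sqrt

    n, p, m = 100, 0.3, 35  # illustrative values
    exact = comb(n, m) * p ** m * (1 - p) ** (n - m)

    t = (m - n * p) / sqrt(n * p * (1 - p))
    approx = exp(-t * t / 2) / sqrt(2 * pi * n * p * (1 - p))

    print(f"exact P_m          = {exact:.5f}")
    print(f"approximation (16) = {approx:.5f}")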


From (13) and (16) we can derive an approximate representation of the probability (11)

P\sim\frac{1}{\sqrt{2\pi}}\int\limits_{-T}^{T}e^{-t^2/2}\,dt=F(T), \qquad (17)


where

T=\varepsilon\sqrt{\frac{n}{p(1-p)}}. \qquad (18)

The difference between the left-hand and right-hand sides of (17), for fixed p different from zero and one, tends to zero as n\to\infty uniformly with respect to \varepsilon. Detailed tables have been compiled for the function F(T). Here is a short excerpt from them:

\begin{array}{c|c|c|c|c} T & 1 & 2 & 3 & 4\\ \hline F & 0.68269 & 0.95450 & 0.99730 & 0.99993 \end{array}
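
Since F(T) coincides with erf(T/√2), the table entries can be reproduced in a couple of lines; a minimal Python sketch:

    from math import erf, sqrt

    for T in (1, 2, 3, 4):
        F = erf(T / sqrt(2))  # the integral of the standard normal density over [-T, T]
        print(f"T = {T}: F(T) = {F:.5f}")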


As T\to\infty, the value of the function F(T) tends to unity.

Let us use formula (17) to estimate the probability

P=\mathbf{P}\!\left\{\,\left|\frac{\mu}{n}-p\right|<0.02\right\}\approx F\!\left(\frac{2}{\sqrt{p(1-p)}}\right) for n=10\,000 and \varepsilon=0.02, since T=\frac{2}{\sqrt{p(1-p)}}.

Since the function F(T) increases monotonically with increasing T, for an estimate of P from below that does not depend on p we must take the smallest possible (over the various p) value of T. Such a smallest value is obtained for p=\frac{1}{2}, and it equals 4. Therefore, approximately,

P\geqslant F(4)=0.99993. \qquad (19)

Inequality (19) does not take into account the error due to the approximate nature of formula (17). By estimating the error associated with this circumstance, one can in any case establish that P>0.9999.

In connection with this example of the application of formula (17), it should be noted that the estimates of the remainder term of formula (17) given in theoretical works on probability theory long remained unsatisfactory. Therefore applications of formula (17) and similar ones to calculations for not very large n, or for probabilities p very close to 0 or 1 (and such probabilities are in many cases especially important), were often based only on the experience of checking such results on a limited number of examples, rather than on well-established estimates of the possible error. A more detailed study showed, moreover, that in many practically important cases the above asymptotic formulas need not only an estimate of the remainder term but also a refinement (since without such a refinement the remainder term is too large). In both directions the most complete results are due to S. N. Bernshtein.

Relations (11), (17), and (18) can be rewritten as

\mathbf{P}\!\left\{\,\left|\frac{\mu}{n}-p\right|<t\sqrt{\frac{p(1-p)}{n}}\right\}\approx F(t). \qquad (20)

For sufficiently large t, the right-hand side of formula (20), which does not contain n, is arbitrarily close to unity, i.e. to the value of probability that corresponds to complete certainty. We see, therefore, that, as a rule, the deviations of the frequency \frac{\mu}{n} from the probability p are of order \frac{1}{\sqrt{n}}. Such proportionality of the accuracy of probabilistic regularities to the square root of the number of observations is typical for many other questions as well. Sometimes one even speaks, by way of a somewhat simplified popularization, of the "law of the square root of n" as the basic law of probability theory. This idea gained full clarity thanks to the introduction by the great Russian mathematician P. L. Chebyshev of the systematic use of the method of reducing various probabilistic problems to calculations of "mathematical expectations" and "variances" of sums and arithmetic means of "random variables".

A random variable is a quantity that, under given conditions S, can take different values with certain probabilities. It suffices for us to consider random variables that can take only a finite number of different values. To indicate, as one says, the probability distribution of such a random variable \xi, it suffices to indicate its possible values x_1,x_2,\ldots,x_s and the probabilities

P_r=\mathbf{P}\{\xi=x_r\},\qquad r=1,2,\ldots,s.


Summed over all the different possible values of \xi, these probabilities are always equal to one:

\sum_{r=1}^{s}P_r=1.

An example of a random variable is the number \mu of positive outcomes studied above in n trials.

The mathematical expectation of the quantity \xi is the expression

M(\xi)=\sum_{r=1}^{s}P_r x_r,


and the variance of the quantity \xi is the mean value of the squared deviation \xi-M(\xi), i.e. the expression

D(\xi)=\sum_{r=1}^{s}P_r(x_r-M(\xi))^2.


The square root of the variance,

\sigma_{\xi}=\sqrt{D(\xi)}=\sqrt{\sum_{r=1}^{s}P_r(x_r-M(\xi))^2},

is called the standard deviation (of the quantity \xi from its mathematical expectation M(\xi)).

The simplest applications of variances and standard deviations are based on the famous Chebyshev inequality

\mathbf{P}\{|\xi-M(\xi)|\leqslant t\sigma_{\xi}\}\geqslant1-\frac{1}{t^2}.

It shows that deviations of the random variable \xi from its mathematical expectation M(\xi) that are many times larger than the standard deviation \sigma_{\xi} are rare.
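
A minimal numerical check of the Chebyshev inequality for a small discrete random variable; the distribution below is an arbitrary illustrative choice:

    from math import sqrt

    # an illustrative discrete distribution: values x_r with probabilities P_r
    values = [0, 1, 2, 10]
    probs = [0.4, 0.3, 0.2, 0.1]

    m = sum(p * x for p, x in zip(probs, values))               # M(xi)
    var = sum(p * (x - m) ** 2 for p, x in zip(probs, values))  # D(xi)
    sigma = sqrt(var)

    t = 2.0
    left = sum(p for p, x in zip(probs, values) if abs(x - m) <= t * sigma)
    print(f"P(|xi - M(xi)| <= {t}*sigma) = {left:.3f} >= {1 - 1 / t ** 2:.3f}")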

When forming sums of random variables \xi=\xi^{(1)}+\xi^{(2)}+\cdots+\xi^{(n)}, for their mathematical expectations the equality

M(\xi)=M(\xi^{(1)})+M(\xi^{(2)})+\cdots+M(\xi^{(n)})

always holds.


The analogous equality for the variances,

D(\xi)=D(\xi^{(1)})+D(\xi^{(2)})+\cdots+D(\xi^{(n)}), \qquad (23)

is true only under certain restrictions. For equality (23) to be valid it suffices, for example, that the quantities \xi^{(i)} and \xi^{(j)} with different indices are not, as one says, "correlated" with each other, i.e. that for i\ne j

M\Bigl\{\bigl(\xi^{(i)}-M(\xi^{(i)})\bigr)\bigl(\xi^{(j)}-M(\xi^{(j)})\bigr)\Bigr\}=0. \qquad (24)

The correlation coefficient between the random variables \xi^{(i)} and \xi^{(j)} is the expression

R=\frac{M\Bigl\{\bigl(\xi^{(i)}-M(\xi^{(i)})\bigr)\bigl(\xi^{(j)}-M(\xi^{(j)})\bigr)\Bigr\}}{\sigma_{\xi^{(i)}}\,\sigma_{\xi^{(j)}}}.

If \sigma_{\xi^{(i)}}>0 and \sigma_{\xi^{(j)}}>0, then condition (24) is equivalent to R=0.

The correlation coefficient R characterizes the degree of dependence between random variables. Always |R|\leqslant1, and R=\pm1 only in the presence of a linear relation

\eta=a\xi+b\quad(a\ne0).

For independent quantities R=0.

In particular, equality (24) is satisfied if the quantities \xi^{(i)} and \xi^{(j)} are independent of each other. Thus for mutually independent terms equality (23) always holds. For the arithmetic means

\zeta=\frac{1}{n}\Bigl(\xi^{(1)}+\xi^{(2)}+\cdots+\xi^{(n)}\Bigr)

it follows from (23) that

D(\zeta)=\frac{1}{n^2}\Bigl(D(\xi^{(1)})+D(\xi^{(2)})+\cdots+D(\xi^{(n)})\Bigr). \qquad (25)

Let us now assume that the variances of all the terms do not exceed some constant:

D(\xi^{(i)})\leqslant C^2.

Then by (25)

D(\zeta)\leqslant\frac{C^2}{n},


and, by virtue of the Chebyshev inequality, for any t

\mathbf{P}\!\left\{|\zeta-M(\zeta)|\leqslant\frac{tC}{\sqrt{n}}\right\}\geqslant1-\frac{1}{t^2}. \qquad (26)

Inequality (26) contains the so-called law of large numbers in the form established by Chebyshev: if the quantities \xi^{(i)} are mutually independent and have bounded variances, then as n increases their arithmetic means \zeta deviate less and less noticeably from their mathematical expectations M(\zeta).

More precisely, one says that the sequence of random variables

\xi^{(1)},\,\xi^{(2)},\,\ldots,\,\xi^{(n)},\,\ldots

obeys the law of large numbers if, for the corresponding arithmetic means \zeta and for any constant \varepsilon>0,

\mathbf{P}\{|\zeta-M(\zeta)|\leqslant\varepsilon\}\to1\quad(n\to\infty). \qquad (27)

To obtain the limit relation (27) from inequality (26), it suffices to take

t=\varepsilon\,\frac{\sqrt{n}}{C}.

A large number of investigations by A. A. Markov, S. N. Bernstein, A. Ya. Khinchin and others are devoted to the question of possible widening of the conditions of applicability of the limit relation (27), i.e. of the conditions of applicability of the law of large numbers. These investigations are of fundamental importance. However, even more important is an exact study of the probability distribution of the deviations \zeta-M(\zeta).

The great merit of the Russian classical school in probability theory is the establishment of the fact that, under very broad conditions, the asymptotic equality

\mathbf{P}\!\left\{t_1\sigma_{\zeta}<\zeta-M(\zeta)<t_2\sigma_{\zeta}\right\}\sim\frac{1}{\sqrt{2\pi}}\int\limits_{t_1}^{t_2}e^{-t^2/2}\,dt \qquad (28)

holds.

Chebyshev gave an almost complete proof of this formula for the case of independent and bounded terms. Markov filled in the missing link in Chebyshev's reasoning and expanded the conditions for the applicability of formula (28). Even more general conditions were given by Lyapunov. The question of extending formula (28) to sums of dependent terms was studied with special completeness by S. N. Bernshtein.

Formula (28) covered such a large number of particular problems that for a long time it was called the central limit theorem of probability theory. Although, with the latest development of the theory of probability, it turned out to be included in a number of more general laws, its importance cannot be overestimated even today.


If the terms are independent and their variances are the same and equal, D(\xi^{(i)})=\sigma^2, then, taking relation (25) into account, it is convenient to give formula (28) the form

\mathbf{P}\!\left\{\frac{t_1\sigma}{\sqrt{n}}<\zeta-M(\zeta)<\frac{t_2\sigma}{\sqrt{n}}\right\}\sim\frac{1}{\sqrt{2\pi}}\int\limits_{t_1}^{t_2}e^{-t^2/2}\,dt\,. \qquad (29)

Let us show that relation (29) contains the solution of the problem of deviations of the frequency \frac{\mu}{n} from the probability p which we dealt with earlier. To do this we introduce random variables \xi^{(i)}, defining them by the following condition:

\xi^{(i)}=0 if the i-th trial had a negative outcome,

\xi^{(i)}=1 if the i-th trial had a positive outcome.

It is easy to check that then

M(\xi^{(i)})=p,\qquad D(\xi^{(i)})=p(1-p),\qquad \zeta=\frac{\mu}{n},\qquad M(\zeta)=p,

and formula (29) gives

\mathbf{P}\!\left\{t_1\sqrt{\frac{p(1-p)}{n}}<\frac{\mu}{n}-p<t_2\sqrt{\frac{p(1-p)}{n}}\right\}\sim\frac{1}{\sqrt{2\pi}}\int\limits_{t_1}^{t_2}e^{-t^2/2}\,dt,
which for t_1=-t,~t_2=t again leads to formula (20).

Chebyshev's lemma. If a random variable X, for which the mathematical expectation M[X] exists, can take only non-negative values, then for any positive number a we have the inequality

P(X ≥ a) ≤ M[X]/a. (1)

Chebyshev's inequality. If X is a random variable with mathematical expectation M[X] and variance D[X], then for any positive ε we have the inequality

P(|X − M[X]| < ε) ≥ 1 − D[X]/ε². (2)

Chebyshev's theorem (the law of large numbers). Let X_1, X_2, …, X_n, … be a sequence of independent random variables with the same mathematical expectation m and with variances bounded by the same constant c. Then for any ε > 0 we have the limit equality

lim_{n→∞} P(|(X_1 + X_2 + … + X_n)/n − m| < ε) = 1. (3)

The proof of the theorem is based on the inequality

P(|(X_1 + X_2 + … + X_n)/n − m| < ε) ≥ 1 − c/(nε²), (4)

which follows from the Chebyshev inequality. From Chebyshev's theorem one can obtain, as a corollary,

Bernoulli's theorem. Let n independent experiments be carried out, in each of which some event A may occur with probability p, and let ν_n be the random variable equal to the number of occurrences of the event A in these n experiments. Then for any ε > 0 we have the limit equality

lim_{n→∞} P(|ν_n/n − p| < ε) = 1. (5)

Note that inequality (4), applied to the conditions of Bernoulli's theorem, gives

P(|ν_n/n − p| < ε) ≥ 1 − pq/(nε²), where q = 1 − p. (6)

Chebyshev's theorem can be formulated in a somewhat more general form:

Generalized Chebyshev's theorem. Let X_1, X_2, …, X_n, … be a sequence of independent random variables with mathematical expectations M[X_1] = m_1, M[X_2] = m_2, … and with variances bounded by the same constant c. Then for any positive number ε we have the limit equality

lim_{n→∞} P(|(X_1 + X_2 + … + X_n)/n − (m_1 + m_2 + … + m_n)/n| < ε) = 1. (7)

Example. Let X be the number of occurrences of six points in 3600 throws of a die. Then M[X] = 3600·(1/6) = 600. Let us now use inequality (1) for a = 900: P(X ≥ 900) ≤ 600/900 = 2/3.
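
Inequality (1) is a very crude bound; a small Monte Carlo sketch of this example suggests how far the actual probability P(X ≥ 900) lies below 2/3:

    import random

    random.seed(3)
    trials, count = 1_000, 0
    for _ in range(trials):
        sixes = sum(random.randint(1, 6) == 6 for _ in range(3600))
        count += sixes >= 900
    print(f"estimated P(X >= 900) = {count / trials}")  # essentially 0, far below the crude bound 2/3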

Example. We use inequality (6) for n = 10000, p = 1/6, q = 5/6. Then for any ε > 0

P(|ν_n/n − 1/6| < ε) ≥ 1 − (5/36)/(10000·ε²).

Example.

The probability of occurrence of event A in each of 1000 independent experiments is 0.8. Find the probability that the number of occurrences of event A in these 1000 experiments will deviate from its mathematical expectation in absolute value by less than 50.

Let X be the number of occurrences of event A in the indicated 1000 experiments. Then M[X] = 1000·0.8 = 800 and D[X] = 1000·0.8·0.2 = 160. Now inequality (2) gives

P(|X − 800| < 50) ≥ 1 − 160/50² = 1 − 0.064 = 0.936.
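
The Chebyshev bound 0.936 can be compared with the exact binomial probability; a short Python sketch (the exact value turns out to be much closer to 1, which shows how conservative the bound is):

    from math import comb

    n, p = 1000, 0.8
    # exact probability that the number of occurrences deviates from 800 by less than 50,
    # i.e. that it lies between 751 and 849 inclusive
    exact = sum(comb(n, m) * p ** m * (1 - p) ** (n - m) for m in range(751, 850))
    print(f"exact P(|X - 800| < 50) = {exact:.4f}  (Chebyshev bound: 0.936)")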


Example.

The variance of each of 1000 independent random variables x k (k = 1, 2,..., 1000) is 4. Estimate the probability that the deviation of the arithmetic mean of these variables from the arithmetic mean of their mathematical expectations in absolute value will not exceed 0.1.

According to inequality (4), for c = 4 and ε = 0.1 we have

P(|X̄ − m̄| < 0.1) ≥ 1 − c/(nε²) = 1 − 4/(1000·0.01) = 0.6,

where X̄ is the arithmetic mean of the variables and m̄ is the arithmetic mean of their mathematical expectations.