Numerical Characteristics of Random Variables

What is the initial moment of the second order. Initial and central moments

3.4. Moments of a random variable.

Above, we became acquainted with the exhaustive characteristics of a random variable (RV): the distribution function and the distribution series for a discrete RV, and the distribution function and probability density for a continuous RV. These characteristics, which are pairwise equivalent in information content, are functions and describe an RV completely from a probabilistic point of view. However, in many practical situations it is either impossible or unnecessary to characterize a random variable exhaustively. It is often sufficient to specify one or more numerical parameters that describe the main features of the distribution to some extent; and sometimes finding exhaustive characteristics, although desirable, is mathematically too difficult, so by operating with numerical parameters we restrict ourselves to an approximate but simpler description. These numerical parameters are called numerical characteristics of a random variable and play an important role in applications of probability theory to various fields of science and technology, facilitating the solution of problems and allowing the results to be presented in a simple and visual form.

The most commonly used numerical characteristics can be divided into two types: moments and position characteristics. There are several types of moments, of which two are used most often: initial and central. Other types of moments, such as absolute moments and factorial moments, are not considered here. To avoid using a generalization of the integral, the so-called Stieltjes integral, we give the definitions of the moments separately for discrete and continuous RVs.

Definitions. 1. The initial moment of order k of a discrete RV is the quantity

\alpha_k = \sum_i x_i^k p_i, \qquad (3.4.1)

where x_i are the possible values of the RV and p_i are the corresponding probabilities.

2. The initial moment of order k of a continuous RV is the quantity

\alpha_k = \int_{-\infty}^{\infty} x^k f(x)\,dx, \qquad (3.4.2)

where f(x) is the probability density of the given RV.

3. The central moment of order k of a discrete RV is the quantity

\mu_k = \sum_i (x_i - m_x)^k p_i, \qquad (3.4.3)

4. The central moment of order k of a continuous RV is the quantity

\mu_k = \int_{-\infty}^{\infty} (x - m_x)^k f(x)\,dx, \qquad (3.4.4)

where m_x is the mathematical expectation (first initial moment) of the RV.

In cases where several RVs are under consideration at the same time, it is convenient, in order to avoid misunderstandings, to indicate which RV a moment belongs to; we will do this by writing the designation of the corresponding RV in brackets, for example α_k(X), μ_k(Y), etc. This notation should not be confused with function notation, nor the letter in brackets with a function argument. The sums and integrals on the right-hand sides of equalities (3.4.1)-(3.4.4) may converge or diverge depending on the value of k and the specific distribution. If they converge, the moment is said to exist (converge); if they diverge, the moment does not exist (diverges). If a discrete RV has a finite number of finite values (N finite), then all its moments of finite order k exist. For infinite N, starting from some k and for all higher orders, the moments of a discrete RV (both initial and central) may fail to exist. The moments of a continuous RV, as can be seen from the definitions, are expressed by improper integrals, which can diverge starting from some k and for all higher orders (both initial and central). Zero-order moments always converge.

Let us consider in more detail first the initial and then the central moments. From a mathematical point of view, the initial moment of order k is the "weighted average" of the k-th powers of the values of the RV; in the case of a discrete RV the weights are the probabilities of the values, and in the case of a continuous RV the weight function is the probability density. Operations of this kind are widely used in mechanics to describe the distribution of masses (static moments, moments of inertia, etc.); the analogies that arise in this connection are discussed below.
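As a quick illustration of these definitions, the following minimal Python sketch computes initial and central moments directly from formulas (3.4.1)-(3.4.4); the die and uniform-density examples are ours, chosen only so that the output is easy to check by hand.

import numpy as np

def initial_moment_discrete(values, probs, k):
    # alpha_k = sum_i x_i^k p_i, formula (3.4.1)
    values, probs = np.asarray(values, float), np.asarray(probs, float)
    return np.sum(values**k * probs)

def central_moment_discrete(values, probs, k):
    # mu_k = sum_i (x_i - m_x)^k p_i, formula (3.4.3)
    values, probs = np.asarray(values, float), np.asarray(probs, float)
    m = np.sum(values * probs)            # first initial moment m_x
    return np.sum((values - m)**k * probs)

def initial_moment_continuous(pdf, k, lo, hi, n=100_000):
    # alpha_k = integral of x^k f(x) dx, formula (3.4.2), approximated by a midpoint Riemann sum
    h = (hi - lo) / n
    x = np.linspace(lo, hi, n, endpoint=False) + h / 2
    return np.sum(x**k * pdf(x)) * h

# A fair die (discrete) and the uniform density on [0, 1] (continuous).
die_vals, die_probs = [1, 2, 3, 4, 5, 6], [1/6] * 6
print(initial_moment_discrete(die_vals, die_probs, 1))    # 3.5
print(central_moment_discrete(die_vals, die_probs, 2))    # 35/12, about 2.9167
print(initial_moment_continuous(lambda x: np.ones_like(x), 2, 0.0, 1.0))  # about 1/3

For the continuous case the improper integral is replaced by a sum over a finite grid, which is adequate whenever the density is negligible outside the chosen interval.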

For a better understanding of the initial moments, we consider them separately for particular values of k. In probability theory the moments of the lowest orders are the most important, i.e. those with small k, so we proceed in ascending order of k. The initial moment of zero order is

\alpha_0 = \sum_i x_i^0 p_i = \sum_i p_i = 1, for a discrete RV;

\alpha_0 = \int_{-\infty}^{\infty} x^0 f(x)\,dx = \int_{-\infty}^{\infty} f(x)\,dx = 1, for a continuous RV,

i.e. for any RV it is equal to the same value, one, and therefore carries no information about the statistical properties of the RV.

The initial moment of the first order (or the first initial moment) is

\alpha_1 = \sum_i x_i p_i, for a discrete RV;

\alpha_1 = \int_{-\infty}^{\infty} x f(x)\,dx, for a continuous RV.

This moment is the most important numerical characteristic of any RV, for several interrelated reasons. First, according to the Chebyshev theorem (see Section 7.4), with an unlimited number of trials on an RV, the arithmetic mean of the observed values tends (in a certain sense) to this moment. Second, for a continuous RV it is numerically equal to the x-coordinate of the center of gravity of the curvilinear trapezoid bounded by the curve f(x) (a similar property also holds for a discrete RV), so this moment could be called the "center of gravity of the distribution". Thirdly, this moment has remarkable mathematical properties that will become clear during the course; in particular, its value enters the expressions for the central moments (see (3.4.3) and (3.4.4)).
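The first of these reasons is easy to observe numerically. The short simulation below is a sketch with a made-up three-point distribution (not an example from the text): it draws ever larger samples and shows the arithmetic mean approaching the first initial moment.

import random

# Hypothetical discrete RV; the values and probabilities are invented for illustration.
values = [0, 1, 5]
probs = [0.6, 0.3, 0.1]
m_x = sum(v * p for v, p in zip(values, probs))   # first initial moment = 0.8

random.seed(1)
for n in (10, 1_000, 100_000):
    sample = random.choices(values, weights=probs, k=n)
    print(n, sum(sample) / n)   # the arithmetic mean approaches m_x = 0.8 as n grows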

The importance of this moment for theoretical and practical problems of probability theory and its remarkable mathematical properties have led to the use, alongside the designation and name "first initial moment", of other designations and names in the literature that are more or less convenient and reflect the properties mentioned. The most common names are: mathematical expectation, expected value, mean; the most common notations are m, M[X], etc. We will most often use the term "expectation" and the notation m; if there are several RVs, we will use a subscript indicating which RV the expectation belongs to, for example m_x, m_y, etc.

The initial moment of the second order (or the second initial moment) is

\alpha_2 = \sum_i x_i^2 p_i, for a discrete RV;

\alpha_2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx, for a continuous RV;

it is sometimes called the mean square of the random variable and denoted M[X^2].

The initial moment of the third order (or the third initial moment) is

\alpha_3 = \sum_i x_i^3 p_i, for a discrete RV;

\alpha_3 = \int_{-\infty}^{\infty} x^3 f(x)\,dx, for a continuous RV;

it is sometimes called the mean cube of the random variable and denoted M[X^3].

It makes no sense to go on listing the initial moments. Let us dwell instead on an important interpretation of the moments of order k > 1. Suppose that along with the RV X there is also an RV Y, and Y = X^k (k = 2, 3, ...). This equality means that the random variables X and Y are deterministically related in the sense that when the RV X takes the value x, the RV Y takes the value y = x^k (such relations between RVs will be considered in more detail later). Then, according to (3.4.1) and (3.4.2),

\alpha_k(X) = m_y, \qquad k = 2, 3, \ldots,

i.e. the k-th initial moment of an RV is equal to the mathematical expectation of the k-th power of this random variable. For example, the third initial moment of the edge length of a random cube is equal to the mathematical expectation of the cube's volume. The possibility of interpreting moments as mathematical expectations is another facet of the importance of the concept of mathematical expectation.
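A sketch of the cube example, with a hypothetical (invented) distribution of the edge length, shows that the third initial moment of the edge and the expectation of the volume Y = X^3 are the same number:

import numpy as np

# Hypothetical edge-length distribution (the numbers are for illustration only).
edge = np.array([1.0, 2.0, 3.0])
probs = np.array([0.5, 0.3, 0.2])

alpha_3 = np.sum(edge**3 * probs)   # third initial moment of the edge length
volume = edge**3                    # values of Y = X^3, the cube's volume
m_y = np.sum(volume * probs)        # expectation of the volume
print(alpha_3, m_y)                 # both equal 8.3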

Let us move on to the central moments. Since, as will become clear somewhat later, the central moments are uniquely expressed in terms of the initial moments and vice versa, the question arises why central moments are needed at all and why the initial moments are not enough. Consider an RV X (continuous or discrete) and another RV Y related to the first by Y = X + a, where a ≠ 0 is a non-random real number. Each value x of the random variable X corresponds to the value y = x + a of the random variable Y, hence the distribution of Y has the same shape (expressed by the distribution polygon in the discrete case or by the probability density in the continuous case) as the distribution of X, but shifted along the abscissa axis by a. Therefore the initial moments of Y differ from the corresponding moments of X. For example, as is easy to see, m_y = m_x + a (moments of higher order are related by more complex relations). So we have established that the initial moments are not invariant under a shift of the distribution as a whole. The same result is obtained if we shift not the distribution but the origin of the abscissa axis by -a; that is, the equivalent conclusion also holds: the initial moments are not invariant with respect to a horizontal shift of the origin of the abscissa axis.

The central moments are free from this shortcoming; they are intended to describe those properties of distributions that do not depend on a shift of the distribution as a whole. Indeed, as can be seen from (3.4.3) and (3.4.4), when the distribution is shifted as a whole by a, or, what is the same, when the origin of the abscissa axis is shifted by -a, all values x, for the same probabilities (in the discrete case) or the same probability density (in the continuous case), change by a, but the value m changes by the same amount, so the parentheses on the right-hand sides of the equalities do not change. Thus, the central moments are invariant with respect to a shift of the distribution as a whole or, what is the same, with respect to a horizontal shift of the origin of the abscissa axis. These moments received the name "central" at a time when the first initial moment was called the "center". It is useful to note that the central moments of an RV X can be understood as the corresponding initial moments of the RV X^0 equal to

X^0 = X - m_x.

The RV X^0 is called centered (with respect to the RV X), and the operation leading to it, i.e. subtracting its mathematical expectation from a random variable, is called centering. As we shall see later, this concept and this operation are useful throughout the course. Note that the central moment of order k > 1 can be regarded as the mathematical expectation (mean) of the k-th power of the centered RV: μ_k(X) = α_k(X^0) = M[(X^0)^k].
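The remark that central moments of X are initial moments of the centered variable X^0 is easy to verify numerically; the sketch below uses an arbitrary three-point distribution invented for the purpose.

import numpy as np

values = np.array([2.0, 4.0, 7.0])   # hypothetical discrete RV, for illustration only
probs = np.array([0.2, 0.5, 0.3])

m_x = np.sum(values * probs)         # mathematical expectation
centered = values - m_x              # values taken by the centered RV X^0

k = 3
mu_k_of_X = np.sum((values - m_x)**k * probs)   # k-th central moment of X
alpha_k_of_X0 = np.sum(centered**k * probs)     # k-th initial moment of X^0
print(mu_k_of_X, alpha_k_of_X0)                 # identical, as claimed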

Let us consider separately the central moments of the lowest orders. The central moment of zero order is

\mu_0 = \sum_i (x_i - m_x)^0 p_i = 1, for a discrete RV;

\mu_0 = \int_{-\infty}^{\infty} (x - m_x)^0 f(x)\,dx = 1, for a continuous RV;

i.e. it equals one for any RV and carries no information about the statistical properties of that RV.

The first-order central moment (or first central moment) is

\mu_1 = \sum_i (x_i - m_x) p_i = 0, for a discrete RV;

\mu_1 = \int_{-\infty}^{\infty} (x - m_x) f(x)\,dx = 0, for a continuous RV;

i.e. it equals zero for any RV and carries no information about the statistical properties of that RV.

The second-order central moment (or second central moment) is

\mu_2 = \sum_i (x_i - m_x)^2 p_i, for a discrete RV;

\mu_2 = \int_{-\infty}^{\infty} (x - m_x)^2 f(x)\,dx, for a continuous RV.

As will become clear below, this moment is one of the most important in probability theory, since it serves as a measure of the spread (or scattering) of the values of an RV; it is therefore often called the variance (dispersion) and denoted D_x. Note that it can be understood as the mean square of the centered RV.

The central moment of the third order (third central moment) is equal to

\mu_3 = \sum_i (x_i - m_x)^3 p_i, for a discrete RV;

\mu_3 = \int_{-\infty}^{\infty} (x - m_x)^3 f(x)\,dx, for a continuous RV.

Consider a discrete random variable given by the distribution law:

The expected value equals:

We see that M(X²) is much larger than M(X). This is explained by the fact that the value x = -150, which differs greatly from the other values, increased sharply when squared; the probability of this value is small (0.02). Thus, the transition from M(X) to M(X²) made it possible to take better account of the influence on the mathematical expectation of those values of the random variable that are large in absolute value but have small probability. Of course, if the quantity had several large and unlikely values, the transition to the quantity X², and still more to X³, X⁴, etc., would "strengthen the role" of these large but unlikely possible values even further. That is why it turns out to be appropriate to consider the mathematical expectation of a positive integer power of a random variable, not only a discrete but also a continuous one.
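The distribution table of this example is not reproduced above, so the sketch below uses a hypothetical distribution that keeps only the detail mentioned in the text (the value -150 with probability 0.02); the remaining values are invented. It shows how a rare, large value barely moves M(X) but dominates M(X²).

# Hypothetical distribution law: only the value -150 with probability 0.02 comes from
# the text; the other values and probabilities are invented for illustration.
values = [-150, 1, 2, 3]
probs = [0.02, 0.40, 0.38, 0.20]

m1 = sum(x * p for x, p in zip(values, probs))     # M(X)
m2 = sum(x**2 * p for x, p in zip(values, probs))  # M(X^2)
print(m1)   # -1.24: the rare value contributes only -150 * 0.02 = -3
print(m2)   # 453.72: the term (-150)^2 * 0.02 = 450 dominates the sum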

Definition 6.10. The initial moment of the k-th order of a random variable X is the mathematical expectation of the value X^k:

\nu_k = M(X^k).

In particular,

\nu_1 = M(X), \qquad \nu_2 = M(X^2).

Using these moments, the formula for calculating the variance can be written differently:

D(X) = M(X^2) - [M(X)]^2 = \nu_2 - \nu_1^2.

In addition to the moments of the random variable X itself, it is advisable to consider the moments of the deviation X - M(X).

Definition 6.11. The central moment of the k-th order of a random variable X is the mathematical expectation of the value (X - M(X))^k:

\mu_k = M[(X - M(X))^k]. \qquad (6.23)

In particular,

\mu_1 = M[X - M(X)] = 0, \qquad \mu_2 = M[(X - M(X))^2] = D(X).

It is easy to derive relations connecting the initial and central moments. Comparing the expressions for the variance and for μ_2 obtained above, we get:

\mu_2 = \nu_2 - \nu_1^2.

It is not difficult to prove the following relation:

\mu_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3.

Similarly:

\mu_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4.
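These three relations can be checked numerically on any distribution; the sketch below does so for an arbitrary four-point distribution whose values are invented purely for the check.

import numpy as np

# Hypothetical discrete distribution, used only to verify the moment relations numerically.
x = np.array([-1.0, 0.0, 2.0, 5.0])
p = np.array([0.1, 0.4, 0.3, 0.2])

nu = {k: np.sum(x**k * p) for k in range(1, 5)}            # initial moments
mu = {k: np.sum((x - nu[1])**k * p) for k in range(2, 5)}  # central moments

print(np.isclose(mu[2], nu[2] - nu[1]**2))
print(np.isclose(mu[3], nu[3] - 3*nu[1]*nu[2] + 2*nu[1]**3))
print(np.isclose(mu[4], nu[4] - 4*nu[1]*nu[3] + 6*nu[1]**2*nu[2] - 3*nu[1]**4))
# All three lines print True.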

Moments of higher orders are rarely used. In defining the central moments, the deviations of the random variable from its mathematical expectation (the "center") are used; that is why these moments are called central.

In defining the initial moments, deviations of the random variable are also used, but not from the mathematical expectation: they are taken from the point whose abscissa is zero, i.e. from the origin. That is why these moments are called initial.

In the case of a continuous random variable, the initial moment of the k-th order is calculated by the formula:

\nu_k = \int_{-\infty}^{\infty} x^k f(x)\,dx, \qquad (6.27)

The central moment of the k-th order of a continuous random variable is calculated by the formula:

\mu_k = \int_{-\infty}^{\infty} (x - M(X))^k f(x)\,dx. \qquad (6.28)

Assume that the distribution of a random variable is symmetrical with respect to the mathematical expectation. Then all central moments of odd order are equal to zero. This is explained by the fact that for each positive value of the quantity X - M(X) there exists (owing to the symmetry of the distribution with respect to M(X)) a negative value equal to it in absolute value, and the probabilities of these values are the same.
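A minimal numerical check of this statement, using an invented five-point distribution that is symmetric about its mean:

import numpy as np

# A distribution symmetric about its mean (hypothetical values, for illustration).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

m = np.sum(x * p)                        # the mean is 0 here
for k in (1, 3, 5):
    print(k, np.sum((x - m)**k * p))     # each odd central moment equals 0.0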



If a central moment of odd order is not equal to zero, this indicates asymmetry of the distribution, and the larger the moment, the greater the asymmetry. Therefore, it is most reasonable to take some odd central moment as a characteristic of the distribution's asymmetry. Since the central moment of the first order is always equal to zero, it is advisable to use the central moment of the third order for this purpose.

Definition 6.12. The asymmetry (skewness) coefficient is the value:

A_s = \frac{\mu_3}{\sigma^3}.

If the asymmetry coefficient is negative, this indicates a large influence of negative deviations on its value. In this case the distribution curve is flatter to the left of M(X) (Fig. 6.1a). If the coefficient is positive, the influence of positive deviations prevails, and the distribution curve is flatter to the right.

As is known, the second central moment (the variance) serves to characterize the dispersion of the values of a random variable around its mathematical expectation. If this moment is large for some random variable, i.e. the dispersion is large, then the corresponding distribution curve is flatter than the distribution curve of a random variable with a smaller second-order moment. However, the normalized moment μ_2/σ² cannot serve as a characteristic of the peakedness of a distribution, because μ_2/σ² = 1 for any distribution.

In this case, the central moment of the fourth order is used.

Definition 6.13. The kurtosis is the value:

E_k = \frac{\mu_4}{\sigma^4} - 3.

For the normal distribution law, the most common in nature, the ratio μ_4/σ⁴ = 3, so its kurtosis is zero. Therefore, the kurtosis serves to compare a given distribution with the normal one (Fig. 6.1b).
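A small Python sketch of both shape characteristics for a discrete distribution given by a table; the right-skewed example distribution is invented for illustration.

import numpy as np

def skewness_kurtosis(x, p):
    # Returns (A_s, E_k) for a discrete distribution with values x and probabilities p.
    x, p = np.asarray(x, float), np.asarray(p, float)
    m = np.sum(x * p)
    mu2 = np.sum((x - m)**2 * p)
    mu3 = np.sum((x - m)**3 * p)
    mu4 = np.sum((x - m)**4 * p)
    sigma = np.sqrt(mu2)
    return mu3 / sigma**3, mu4 / sigma**4 - 3

# Hypothetical right-skewed distribution (illustrative values only).
print(skewness_kurtosis([0, 1, 2, 10], [0.4, 0.3, 0.25, 0.05]))   # positive A_s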

Initial and central moments are used to characterize various properties of random variables.

The initial moment of the k-th order of a random variable X is the mathematical expectation of the k-th power of this variable:

\alpha_k = M[X^k].

For a discrete random variable

\alpha_k = \sum_i x_i^k p_i.

A centered random variable is the deviation of a random variable from its mathematical expectation:

X^0 = X - M[X].

Let us agree to mark a centered r.v. with a small zero at the top, as in X^0.

The central moment of the S-th order is the mathematical expectation of the S-th power of the centered random variable:

\mu_S = M[(X - m_x)^S].

For a discrete random variable

\mu_S = \sum_i (x_i - m_x)^S p_i.

For a continuous random variable

\mu_S = \int_{-\infty}^{\infty} (x - m_x)^S f(x)\,dx.

Properties of the Moments of Random Variables

The initial moment of the first order is equal to the mathematical expectation (by definition):

\alpha_1 = M[X] = m_x.

The central moment of the first order is always equal to zero (we prove it for the example of a discrete r.v.):

\mu_1 = M[(X - m_x)^1] = \sum_i (x_i - m_x) p_i = \sum_i x_i p_i - m_x \sum_i p_i = m_x - m_x \cdot 1 = 0.

The central moment of the second order characterizes the spread of a random variable around its mathematical expectation.

The second-order central moment is called the variance of the r.v. and is denoted D[X] or D_x.

The variance has the dimension of the square of the random variable.

The standard deviation is σ_x = \sqrt{D_x}.

σ_x, like D_x, characterizes the spread of the random variable around its mathematical expectation, but it has the dimension of the random variable itself.

The second initial moment α_2 characterizes the degree of spread of the random variable around its mathematical expectation, as well as the displacement of the random variable along the real axis.

The relationship between the first and second initial moments and the variance (for the example of a continuous r.v.):

D_x = \alpha_2 - m_x^2.
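For completeness, the derivation behind this relation can be written out for a continuous r.v. (a routine expansion of the square under the integral):

\begin{aligned}
D_x &= \int_{-\infty}^{\infty} (x - m_x)^2 f(x)\,dx
     = \int_{-\infty}^{\infty} x^2 f(x)\,dx
       - 2 m_x \int_{-\infty}^{\infty} x f(x)\,dx
       + m_x^2 \int_{-\infty}^{\infty} f(x)\,dx \\
    &= \alpha_2 - 2 m_x \cdot m_x + m_x^2 \cdot 1
     = \alpha_2 - m_x^2 .
\end{aligned}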

The third central moment characterizes the degree of spread of the random variable around the mathematical expectation, as well as the degree of asymmetry of the distribution of the random variable.


For symmetric distribution laws μ_3 = 0.

To characterize only the degree of asymmetry, the so-called asymmetry coefficient is used:

S_k = \frac{\mu_3}{\sigma_x^3}.

For a symmetric distribution S_k = 0.

The fourth central moment characterizes the degree of spread of the random variable around the mathematical expectation, as well as the degree of peakedness of the distribution law.

Of particular importance for characterizing the distribution of a random variable are numerical characteristics called initial and central moments.

The initial moment of the k-th order α_k(X) of a random variable X is the mathematical expectation of the k-th power of this quantity, i.e.

\alpha_k(X) = M(X^k). \qquad (6.8)

Formula (6.8), by virtue of the definition of the mathematical expectation for different kinds of random variables, takes its own form; namely, for a discrete random variable with a finite set of values

\alpha_k(X) = \sum_i x_i^k p_i, \qquad (6.9)

and for a continuous random variable

\alpha_k(X) = \int_{-\infty}^{\infty} x^k f(x)\,dx, \qquad (6.10)

where f(x) is the distribution density of the random variable X.

The improper integral in formula (6.10) turns into a definite integral over a finite interval if the continuous random variable takes values only in that interval.

One of the numerical characteristics introduced earlier, the mathematical expectation, is nothing other than the initial moment of the first order or, as they say, the first initial moment:

M(X) = \alpha_1(X).

In the previous subsection the notion of a centered random variable X - M(X) was introduced. If this quantity is considered as the main one, then initial moments can be found for it as well. For the original value X these moments are called central.

The central moment of the k-th order μ_k(X) of a random variable X is the mathematical expectation of the k-th power of the centered random variable, i.e.

\mu_k(X) = M[(X - M(X))^k]. \qquad (6.11)

In other words, the central moment of the k-th order is the mathematical expectation of the k-th power of the deviation.

The central moment of the k-th order for a discrete random variable with a finite set of values is found by the formula:

\mu_k(X) = \sum_i (x_i - M(X))^k p_i, \qquad (6.12)

and for a continuous random variable by the formula:

\mu_k(X) = \int_{-\infty}^{\infty} (x - M(X))^k f(x)\,dx. \qquad (6.13)

In what follows, when it is clear which random variable is being discussed, we will omit it in the notation of the initial and central moments, i.e. instead of α_k(X) and μ_k(X) we will simply write α_k and μ_k.

Obviously, the central moment of the first order is equal to zero, since it is nothing other than the mathematical expectation of the deviation, which, as proved earlier, is equal to zero: μ_1 = M[X - M(X)] = 0.

It is easy to see that the central moment of the second order of a random variable X coincides with the variance of the same random variable, i.e. μ_2 = D(X).

In addition, the following formulas relate the initial and central moments:

\mu_2 = \alpha_2 - \alpha_1^2, \qquad \mu_3 = \alpha_3 - 3\alpha_1\alpha_2 + 2\alpha_1^3, \qquad \mu_4 = \alpha_4 - 4\alpha_1\alpha_3 + 6\alpha_1^2\alpha_2 - 3\alpha_1^4.

So, the moments of the first and second orders (the mathematical expectation and the variance) characterize the most important features of the distribution: its position and the degree of spread of its values. Higher-order moments serve for a more detailed description of the distribution. Let us show this.

Assume that the distribution of a random variable is symmetrical with respect to its mathematical expectation. Then all central moments of odd order, if they exist, are equal to zero. This is explained by the fact that, owing to the symmetry of the distribution, for each positive value of the quantity X - M(X) there is a negative value equal to it in absolute value, and the probabilities of these values are equal. Consequently, the sum in formula (6.12) consists of pairs of terms equal in absolute value but opposite in sign, which cancel each other out during summation. Thus, the entire sum, i.e. the central moment of any odd order of a discrete random variable, is equal to zero. Similarly, the central moment of any odd order of a continuous random variable is equal to zero, being the integral of an odd function over symmetric limits.

It is natural to expect that if a central moment of odd order is different from zero, then the distribution is not symmetrical with respect to its mathematical expectation; moreover, the more the central moment differs from zero, the greater the asymmetry of the distribution. Let us therefore take a central moment of odd order as a characteristic of asymmetry. Since the central moment of the first order is equal to zero for random variables with any distribution, it is better to use the central moment of the third order for this purpose. However, this moment has the dimension of the cube of the random variable. To get rid of this shortcoming and pass to a dimensionless quantity, the central moment is divided by the cube of the standard deviation.

The asymmetry coefficient A_s, or simply the asymmetry, is the ratio of the central moment of the third order to the cube of the standard deviation, i.e.

A_s = \frac{\mu_3}{\sigma^3}. \qquad (6.14)

Sometimes asymmetry is called "skewness" and is denoted S_k, which comes from the English word skew ("oblique").

If the asymmetry coefficient is negative, its value is strongly influenced by negative deviations; the distribution then has left asymmetry, and the distribution curve (graph) is flatter to the left of the mathematical expectation. If the coefficient is positive, the asymmetry is right-sided, and the curve is flatter to the right of the mathematical expectation (Fig. 6.1).



As was shown, the second central moment serves to characterize the spread of the values of a random variable around its mathematical expectation, i.e. the dispersion. If this moment has a large numerical value, the random variable has a large spread of values, and the corresponding distribution curve has a flatter shape than the curve of a variable for which the second central moment is smaller. Therefore, the second central moment characterizes, to some extent, how "flat-topped" or "peaked" the distribution curve is. However, this characteristic is not very convenient. The central moment of the second order has a dimension equal to the square of the dimension of the random variable, and if we try to obtain a dimensionless quantity by dividing the moment by the square of the standard deviation, then for any random variable we get μ_2/σ² = 1. Thus, this ratio cannot serve as a characteristic of the distribution of a random variable: it is the same for all distributions. In this case a fourth-order central moment can be used.

The kurtosis E_k is the value determined by the formula

E_k = \frac{\mu_4}{\sigma^4} - 3. \qquad (6.15)

Kurtosis is used mainly for continuous random variables and serves to characterize the so-called "steepness" of the distribution curve or, as already mentioned, how flat-topped or peaked the curve is. The normal distribution curve is taken as the reference (it will be discussed in detail in the next chapter). For a random variable distributed according to the normal law, the equality μ_4/σ⁴ = 3 holds. Therefore, the kurtosis given by formula (6.15) serves to compare a given distribution with the normal one, for which the kurtosis is equal to zero.

If a positive kurtosis is obtained for some random variable, then the distribution curve of this variable is more peaked than the normal distribution curve. If the kurtosis is negative, the curve is flatter than the normal one (Fig. 6.2).
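This sign convention is easy to see in a simulation: the sketch below estimates the excess kurtosis from large samples of a normal, a uniform and a Laplace distribution (the choice of these three laws is ours, for illustration).

import numpy as np

def excess_kurtosis(sample):
    # Sample estimate of E_k = mu_4 / sigma^4 - 3.
    d = sample - sample.mean()
    return (d**4).mean() / (d**2).mean()**2 - 3

rng = np.random.default_rng(0)
n = 1_000_000
print(excess_kurtosis(rng.normal(size=n)))    # about 0    (the reference normal law)
print(excess_kurtosis(rng.uniform(size=n)))   # about -1.2 (flatter than normal)
print(excess_kurtosis(rng.laplace(size=n)))   # about +3   (more peaked than normal)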



Let us now turn to specific types of distribution laws for discrete and continuous random variables.

Central moments of a distribution are those moments in whose calculation the deviations of the variants from the arithmetic mean of the given series are used as the starting values.

1. Calculate the central moment of the first order by the formula:

\mu_1 = \frac{\sum_i (x_i - \bar{x}) f_i}{\sum_i f_i}, \qquad (7.1)

2. Calculate the central moment of the second order by the formula:

\mu_2 = \frac{\sum_i (x_i - \bar{x})^2 f_i}{\sum_i f_i}, \qquad (7.2)

where x_i is the value of the middle of an interval, \bar{x} is the weighted arithmetic mean, and f_i is the number of values (frequency) in the interval.

3. Calculate the central moment of the third order by the formula:

\mu_3 = \frac{\sum_i (x_i - \bar{x})^3 f_i}{\sum_i f_i}, \qquad (7.3)

where x_i, \bar{x} and f_i have the same meaning as above.

4. Calculate the central moment of the fourth order by the formula:

\mu_4 = \frac{\sum_i (x_i - \bar{x})^4 f_i}{\sum_i f_i}, \qquad (7.4)

where x_i, \bar{x} and f_i have the same meaning as above.
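The four formulas above translate directly into a short routine; the interval series used here is hypothetical (the actual tables 3.2, 3.4 and 3.6 are not reproduced in this text).

import numpy as np

def grouped_central_moments(midpoints, freqs, orders=(1, 2, 3, 4)):
    # Central moments of an interval series: weighted by frequencies f_i,
    # with deviations taken from the weighted mean of the interval midpoints.
    x = np.asarray(midpoints, float)
    f = np.asarray(freqs, float)
    xbar = np.sum(x * f) / np.sum(f)          # weighted arithmetic mean
    mu = {k: np.sum((x - xbar)**k * f) / np.sum(f) for k in orders}
    return xbar, mu

# Hypothetical interval series (midpoints and frequencies invented for illustration).
midpoints = [10, 20, 30, 40, 50]
freqs = [5, 12, 20, 9, 4]
xbar, mu = grouped_central_moments(midpoints, freqs)
print(xbar, mu)   # mu[1] is ~0 up to rounding; mu[3] and mu[4] feed skewness and kurtosis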

Calculation for table 3.2

Calculation for table 3.4

1. Calculate the central moment of the first order according to the formula (7.1):

2. Calculate the central moment of the second order according to the formula (7.2):

3. Calculate the central moment of the third order according to the formula (7.3):

4. Calculate the central moment of the fourth order according to the formula (7.4):

Calculation for table 3.6

1. Calculate the central moment of the first order according to the formula (7.1):

2. Calculate the central moment of the second order according to the formula (7.2):

3. Calculate the central moment of the third order according to the formula (7.3):

4. Calculate the central moment of the fourth order according to the formula (7.4):






The moments of orders 1-4 have been calculated for the three tasks. The third-order moment is needed to calculate the skewness, and the fourth-order moment is needed to calculate the kurtosis.

CALCULATION OF DISTRIBUTION ASYMMETRY

In statistical practice a variety of distributions are encountered. The following types of distribution curves are distinguished:

unimodal curves: symmetrical, moderately asymmetric and extremely asymmetric;

multimodal (multi-peaked) curves.

Homogeneous populations, as a rule, are characterized by unimodal distributions. A multimodal distribution indicates heterogeneity of the studied population; the appearance of two or more peaks makes it necessary to regroup the data in order to isolate more homogeneous groups.

Finding out the general nature of the distribution involves an assessment of its homogeneity, as well as the calculation of indicators of asymmetry and kurtosis. For symmetrical distributions, the frequencies of any two variants that are equally spaced on both sides of the distribution center are equal to each other. The mean, mode, and median calculated for such distributions are also equal.

In a comparative study of the asymmetry of several distributions with different units of measurement, the relative indicator of asymmetry (A_s) is calculated:

where \bar{x} is the weighted mean; Mo is the mode; σ is the weighted standard deviation; Me is the median.

Its value can be positive or negative. In the first case, we are talking about right-sided asymmetry, and in the second, about left-sided.

With right-sided asymmetry, \bar{x} > Me > Mo. The most widely used indicator of asymmetry is the ratio of the central moment of the third order to the cube of the standard deviation of the series:

A_s = \frac{\mu_3}{\sigma^3}, \qquad (7.5)

where \mu_3 is the central moment of the third order and σ³ is the standard deviation cubed.

The use of this indicator makes it possible not only to determine the magnitude of the asymmetry but also to check its presence in the general population. It is generally accepted that an asymmetry above 0.5 (regardless of sign) is considered significant; if it is less than 0.25, it is insignificant.

The significance assessment is based on the mean square error of the skewness coefficient, σ_As, which depends on the number of observations n and is calculated by the formula:

where n is the number of observations.

If the asymmetry coefficient is large compared with its mean square error, the asymmetry is significant and the distribution of the trait in the general population is asymmetrical. Otherwise, the asymmetry is insignificant and its presence may be caused by random circumstances.

Calculation for table 3.2: population grouping by average monthly salary, rub.

Left-sided, significant asymmetry.

Calculation for table 3.4: grouping of stores by retail turnover, million rubles

1. Determine the asymmetry using formula (7.5):

Right-sided, significant asymmetry.

Calculation for table 3.6: grouping of transport organizations by freight turnover of public transport (mln t·km)

1. Determine the asymmetry using formula (7.5):

Right-sided, slight asymmetry.

CALCULATION OF THE KURTOSIS OF THE DISTRIBUTION

For symmetric distributions the kurtosis indicator (E_k) can be calculated:

E_k = \frac{\mu_4}{\sigma^4} - 3, \qquad (7.7)

where \mu_4 is the central moment of the fourth order and σ⁴ is the standard deviation raised to the fourth power.
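Combining formulas (7.5) and (7.7), a short sketch can compute both indicators from an interval series and phrase the conclusion the way the calculations below do; the frequencies are again hypothetical, not taken from tables 3.2, 3.4 or 3.6.

import numpy as np

def shape_indicators(midpoints, freqs):
    # Skewness A_s = mu_3 / sigma^3 (formula 7.5) and kurtosis E_k = mu_4 / sigma^4 - 3
    # (formula 7.7) for an interval series.
    x, f = np.asarray(midpoints, float), np.asarray(freqs, float)
    w = f / f.sum()
    xbar = np.sum(x * w)
    mu = {k: np.sum((x - xbar)**k * w) for k in (2, 3, 4)}
    sigma = np.sqrt(mu[2])
    return mu[3] / sigma**3, mu[4] / sigma**4 - 3

# Same hypothetical interval series as above.
A_s, E_k = shape_indicators([10, 20, 30, 40, 50], [5, 12, 20, 9, 4])
print("right-sided" if A_s > 0 else "left-sided", "asymmetry, A_s =", round(A_s, 3))
print("peaked" if E_k > 0 else "flat-topped", "distribution, E_k =", round(E_k, 3))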

Calculation for table 3.2: population grouping by average monthly salary, rub.

Calculation for table 3.4: grouping of stores by retail turnover, million rubles

Calculate the kurtosis indicator using the formula (7.7)

Peaked distribution.

Calculation for table 3.6: grouping of transport organizations by freight turnover of public transport (mln t·km)

Calculate the kurtosis indicator using the formula (7.7)

Flat-topped distribution.

ASSESSMENT OF THE HOMOGENEITY OF THE POPULATION

Homogeneity assessment for table 3.2: population grouping by average monthly salary, rub.

It should be noted that although the indicators of asymmetry and kurtosis directly characterize only the form of the distribution of a trait within the studied population, their determination is not purely descriptive: asymmetry and kurtosis often provide guidance for further research of socio-economic phenomena. The result obtained indicates the presence of a significant, negative asymmetry; the asymmetry is left-sided. In addition, the population has a flat-topped distribution.

Homogeneity assessment for table 3.4: grouping of stores by retail turnover, million rubles

The result obtained indicates the presence of a significant, positive asymmetry; the asymmetry is right-sided. In addition, the population has a peaked distribution.

Homogeneity assessment for table 3.6: grouping of transport organizations by freight turnover of public transport (mln t·km)

The result obtained indicates the presence of a small, positive asymmetry; the asymmetry is right-sided. In addition, the population has a flat-topped distribution.