Interval variation series with equal intervals

When processing large amounts of information, which is especially important in modern scientific research, the researcher faces the serious task of grouping the source data correctly. If the data are discrete, then, as we have seen, no problems arise: you only need to count the frequency of each value. If the characteristic under study is continuous (which is more common in practice), then choosing the optimal number of grouping intervals is by no means a trivial task.

To group continuous random variables, the entire range of variation of the characteristic is divided into a number of intervals k.

A grouped interval (continuous) variation series is a set of intervals ranked by the value of the attribute and given together with the corresponding frequencies (m_i), the numbers of observations falling into the i-th interval, or with the relative frequencies:

Characteristic value intervals
Frequency m_i

The histogram and the cumulate (ogive), which we have already discussed in detail, are an excellent means of data visualization that gives a first idea of the data structure. Such graphs (Fig. 1.15) are constructed for continuous data in the same way as for discrete data, except that continuous data completely fill the region of their possible values and can take any value in it.

Fig. 1.15.

Therefore, the bars of the histogram and of the cumulate must touch each other, leaving no gaps within the range of possible values of the attribute (that is, the histogram and the cumulate should have no "holes" along the abscissa axis that contain no values of the variable under study, as in Fig. 1.16). The height of a bar corresponds to the frequency, that is, the number of observations falling within the given interval, or to the relative frequency, that is, the proportion of observations. The intervals must not overlap and are usually of equal width.
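The grouping rules just described (adjacent, non-overlapping intervals of equal width, with bar heights equal to frequencies or relative frequencies) can be sketched in Python. This is only an illustration under stated assumptions: the function name `interval_series`, the sample values and the choice of k = 4 are hypothetical and do not come from the text. Intervals are treated as half-open, except the last one, which also includes the maximum.

```python
def interval_series(values, k):
    """Group values into k equal-width, non-overlapping intervals.

    Returns a list of tuples (lower, upper, frequency, relative_frequency).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; no interval width can be defined")
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        # Index of the interval containing v; the maximum goes into the last one.
        i = min(int((v - lo) / width), k - 1)
        counts[i] += 1
    n = len(values)
    return [(lo + i * width, lo + (i + 1) * width, m, m / n)
            for i, m in enumerate(counts)]


# Hypothetical sample, for illustration only.
sample = [2.0, 2.5, 3.1, 3.9, 4.4, 5.0, 5.2, 6.8, 7.5, 10.0]

for lower, upper, m, w in interval_series(sample, 4):
    print(f"{lower:.1f} to {upper:.1f}: m_i = {m}, relative frequency = {w:.2f}")
```

Because the intervals share their boundaries, the frequencies always sum to the number of observations and the relative frequencies sum to one.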

Fig. 1.16.

The histogram and the polygon are approximations of the probability density curve (differential function) f(x) of the theoretical distribution, which is considered in the course of probability theory. This is why their construction is so important in the primary statistical processing of quantitative continuous data: their appearance suggests a hypothesis about the distribution law.

The cumulate is the curve of accumulated frequencies (or accumulated relative frequencies) of an interval variation series. It is compared with the graph of the cumulative distribution function F(x), also discussed in the probability theory course.
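Accumulated frequencies are simply running sums of the interval frequencies. A minimal sketch, assuming hypothetical frequencies m_i = 4, 3, 2, 1 (these numbers are not from the text); `itertools.accumulate` is part of the Python standard library.

```python
from itertools import accumulate

# Hypothetical interval frequencies m_i, for illustration only.
frequencies = [4, 3, 2, 1]
n = sum(frequencies)

# Accumulated frequencies: observations in all intervals up to and
# including the current one.
accumulated = list(accumulate(frequencies))

# Accumulated relative frequencies: an empirical estimate of F(x)
# at the upper boundary of each interval.
accumulated_relative = [c / n for c in accumulated]

print(accumulated)
print(accumulated_relative)
```

The last accumulated frequency equals the sample size, so the last accumulated relative frequency is always one, as expected for a distribution function.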

In essence, the concepts of histogram and cumulate are associated specifically with continuous data and their interval variation series, since these graphs are empirical estimates of the probability density function and of the distribution function, respectively.

The construction of an interval variation series begins with determining the number of intervals k. This task is perhaps the most difficult, important and controversial part of the problem.

The number of intervals should not be too small, since the histogram then becomes too smooth (oversmoothed) and loses all the features of variability of the original data: the left graph in Fig. 1.17 shows a histogram with a smaller number of intervals built from the same data as the graphs in Fig. 1.15.

At the same time, the number of intervals should not be too large, otherwise we will not be able to estimate the distribution density of the data along the numerical axis: the histogram will be undersmoothed, uneven, and with empty intervals (see Fig. 1.17, right graph).
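The effect of the number of intervals can be seen even without drawing a histogram, by counting how many intervals stay empty. The sketch below is an illustration only: the helper `bin_counts`, the sample and the tried values of k are assumptions, not data from the text.

```python
def bin_counts(values, k):
    """Frequencies of values in k equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; no interval width can be defined")
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        # The maximum is assigned to the last interval.
        counts[min(int((v - lo) / width), k - 1)] += 1
    return counts


# Hypothetical sample, for illustration only.
sample = [2.0, 2.5, 3.1, 3.9, 4.4, 5.0, 5.2, 6.8, 7.5, 10.0]

for k in (2, 4, 40):
    counts = bin_counts(sample, k)
    empty = sum(1 for c in counts if c == 0)
    print(f"k = {k:2d}: largest frequency = {max(counts)}, empty intervals = {empty}")
```

With very few intervals almost all observations merge into one or two bars, while with k much larger than the sample size most intervals are necessarily empty.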

Fig. 1.17.

How to determine the most preferable number of intervals?

Back in 1926, Herbert Sturges proposed a formula for calculating the number of intervals into which the original set of values of the characteristic should be divided. This formula has become extremely popular: most statistics textbooks give it, and many statistical packages use it by default. Whether this is justified in all cases is a serious question.
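In its usual form the Sturges rule reads k = 1 + log2(n), where n is the number of observations. The sketch below is a minimal illustration: the function name `sturges_intervals` is hypothetical, and rounding the logarithm up to an integer is an assumption (some textbooks round to the nearest integer instead).

```python
import math


def sturges_intervals(n):
    """Number of intervals by the Sturges rule, k = 1 + log2(n), rounded up."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 + math.ceil(math.log2(n))


for n in (30, 100, 1000):
    print(n, sturges_intervals(n))
```

The logarithm makes k grow very slowly with n, which is one reason the rule is questioned for large samples.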

So, what is the Sturges formula based on?

Consider the binomial distribution ... the upper limit of which includes the last number of the ranked series.
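The argument usually given for the rule can be sketched as follows. It assumes an idealized histogram with k intervals whose frequencies m_i are the binomial coefficients; the symbols m_i, n and k follow the notation used above, and the indexing from zero is an assumption of this sketch.

```latex
m_i = \binom{k-1}{i}, \quad i = 0, 1, \dots, k-1,
\qquad
n = \sum_{i=0}^{k-1} \binom{k-1}{i} = 2^{k-1}
\;\Longrightarrow\;
k = 1 + \log_2 n \approx 1 + 3.322 \lg n .
```

Since the binomial distribution approaches the normal one, the rule implicitly assumes data that are close to normally distributed.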

We construct the interval series (Table 2.3).

Interval series of the distribution of firms by the average number of managers in one of the regions of the Russian Federation in the first quarter of the reporting year

Conclusion. The largest group of firms is the one with an average number of managers of 25-30 people, which includes 8 firms (27%); the smallest group, with an average number of managers of 40-45 people, includes only one firm (3%).

Using the initial data from Table 2.1, as well as the interval series of the distribution of firms by number of managers (Table 2.3), it is required to build an analytical grouping of the relationship between the number of managers and the sales volume of firms and, based on it, to draw a conclusion about the presence (or absence) of a relationship between these characteristics.

Solution:

An analytical grouping is built on the factor characteristic. In our problem, the factor characteristic (x) is the number of managers, and the resultant characteristic (y) is the sales volume (Table 2.4).

Now let us build the analytical grouping (Table 2.5).

Conclusion. Based on the data of the constructed analytical grouping, we can say that with an increase in the number of sales managers, the average sales volume of the company in the group also increases, which indicates the presence of a direct connection between these characteristics.
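The idea of an analytical grouping (split the firms into intervals of the factor x and average the resultant y within each interval) can be sketched in Python. Everything specific here is an assumption made for illustration: the function name `analytical_grouping`, the interval boundaries and the six (x, y) pairs are hypothetical and are not the values of Table 2.1.

```python
from collections import defaultdict


def analytical_grouping(pairs, edges):
    """Average the resultant characteristic y within intervals of the factor x.

    pairs: iterable of (x, y); edges: ascending interval boundaries.
    Each interval includes its lower boundary; the last one also includes
    its upper boundary.
    Returns a list of tuples (lower, upper, number_of_units, mean_y).
    """
    groups = defaultdict(list)
    last = len(edges) - 2
    for x, y in pairs:
        for i in range(len(edges) - 1):
            below_upper = x < edges[i + 1] or (i == last and x == edges[i + 1])
            if edges[i] <= x and below_upper:
                groups[i].append(y)
                break
    return [(edges[i], edges[i + 1], len(groups[i]), sum(groups[i]) / len(groups[i]))
            for i in sorted(groups)]


# Hypothetical firms: (number of managers, sales volume in million rubles).
firms = [(21, 9.5), (24, 10.1), (26, 10.0), (28, 10.4), (33, 10.6), (37, 11.0)]
edges = [20, 25, 30, 35, 40]

for lower, upper, count, mean_y in analytical_grouping(firms, edges):
    print(f"{lower}-{upper} managers: {count} firms, average sales {mean_y:.2f}")
```

If the group averages rise steadily from one interval of x to the next, as in this made-up example, the grouping points to a direct relationship between the two characteristics.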

Table 2.4

Auxiliary table for constructing an analytical grouping

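Number of managers, people, x	Company number	Sales volume, million rubles, y
[Table body not recovered. Legible values, apparently group averages of sales volume in million rubles: 9.97, 10.22, 10.61; apparently the average over all 30 firms: 10.31.]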

Table 2.5

Dependence of sales volumes on the number of company managers in one of the regions of the Russian Federation in the first quarter of the reporting year

CONTROL QUESTIONS

1. What is the essence of statistical observation?
2. Name the stages of statistical observation.
3. What are the organizational forms of statistical observation?
4. Name the types of statistical observation.
5. What is a statistical summary?
6. Name the types of statistical reports.
7. What is statistical grouping?
8. Name the types of statistical groupings.
9. What is a distribution series?
10. Name the structural elements of a distribution series.
11. What is the procedure for constructing a distribution series?