
Secondary grouping: an example

Groupings built for the same period but for different objects or, conversely, for one object but for two different periods may not be comparable because of a different number of groups or different interval boundaries.

Secondary grouping, or regrouping of already grouped data, is used either to characterize the phenomenon under study more clearly (when the initial grouping does not reveal the nature of the distribution of the population units) or to bring groupings to a comparable form for comparative analysis.

Secondary grouping is an operation that forms new groups on the basis of a previously performed grouping.

There are two ways to form new groups. The first, simplest and most common, is to change (usually enlarge) the initial intervals. The second, called fractional regrouping, forms new groups by assigning a certain share of the population units to each group. Let us illustrate the secondary grouping technique with the following example.

Distribution of company employees by income level

We will regroup the data, forming new groups with intervals of up to 5, 5-10, 10-20, 20-30, over 30 thousand rubles.

The first new group will include the entire first group of employees and part of the second group. To form a group of up to 5 thousand rubles, 1.0 thousand rubles must be taken from the interval of the second group. The width of that interval is 6.0 thousand rubles, so 1/6 (1.0 : 6.0) of it must be taken. The same share of the second group's workers goes to the new first group: 20 x 1/6 ≈ 3 people. The new first group then contains 16 + 3 = 19 people.

The second new group consists of the workers of the old second group minus those moved to the first, that is, 20 - 3 = 17 people. The third new group will include all employees of the old third group and some employees of the fourth. To determine this share, 2.0 thousand rubles must be taken from the interval 18-30 (width 12.0) so that the upper limit of the new interval equals 20.0 thousand rubles. The share of the interval is therefore 2.0 : 12.0 = 1/6. The fourth group has 74 people, so 74 x 1/6 ≈ 12 people are taken. The new third group will include 44 + 12 = 56 people. The new fourth group will include the remaining 74 - 12 = 62 people of the old fourth group. The new fifth group will consist of the workers of the old fifth and sixth groups: 37 + 9 = 46 people.
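The proportional split used in this example can be sketched in a few lines of Python. This is only an illustration: the function name `split_group` is hypothetical, the boundaries 4-10 of the second original group are inferred from the stated width of 6.0 and the 1.0 taken from it, and the split assumes that units are spread evenly within an interval.

```python
def split_group(count, lower, upper, cut):
    """Split a group's count at `cut`, in proportion to interval width.

    Assumes the units are spread evenly between `lower` and `upper`.
    Returns (units below the cut, units at or above the cut).
    """
    if not lower < cut < upper:
        raise ValueError("cut must lie strictly inside the interval")
    share_below = (cut - lower) / (upper - lower)
    below = round(count * share_below)
    return below, count - below


# Second original group: 20 people, interval 4-10 (inferred), cut at 5.
moved_2, left_2 = split_group(20, 4.0, 10.0, 5.0)
# Fourth original group: 74 people, interval 18-30, cut at 20.
moved_4, left_4 = split_group(74, 18.0, 30.0, 20.0)

new_groups = {
    "up to 5": 16 + moved_2,
    "5-10": left_2,
    "10-20": 44 + moved_4,
    "20-30": left_4,
    "over 30": 37 + 9,
}
print(new_groups)
```

The total number of employees is unchanged by the regrouping, which is a quick check that no unit was lost or counted twice.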

Topic 3. STATISTICAL SUMMARY AND GROUPING OF DATA.

Summary objectives and contents

Statistical summary is the scientifically organized processing of statistical observation materials. The purpose of the summary is to obtain, on the basis of the summarized materials, generalized statistical indicators that reflect the essence of socio-economic phenomena.

Statistical summaries differ in several respects:

    By complexity of construction, a summary can be simple or complex. A simple summary gives overall totals for the studied population as a whole, without any preliminary systematization of the collected material. A complex summary is a set of operations that includes grouping the observation units, calculating totals for each group and for the population as a whole, and presenting the grouping and summary results in statistical tables.

    By method of processing, summaries are divided into centralized and decentralized. In a centralized summary, all data are concentrated in one organization and processed according to a single methodology (used for one-time statistical observations). In a decentralized summary, the material is generalized from the bottom up along the management hierarchy and is processed at each level (used for statistical reporting).

    By technique, summaries are divided into mechanized and manual.

Thus, a statistical summary is the systematization and grouping of digital data, the characteristics of formed groups, a system of indicators, the calculation of the corresponding results and the presentation of the results of the summary in the form of tables and graphs.

To carry out the summary, a plan is drawn up that sets out organizational issues: by whom and when all operations will be carried out, the procedure for its implementation, and the composition of information to be published in periodicals.

Grouping method

Initial information at the summary stage is systematized, separate statistical aggregates are formed, i.e. statistical grouping is carried out.

Grouping is the division of a population into groups that are homogeneous with respect to some characteristic.

A special type of grouping is classification. It is based on the most essential characteristics, which change very little (for example, the classification of branches of the national economy or the classification of fixed assets).

Distinctive features of classifications:

    They are based on a qualitative characteristic.

    They are standard.

    They are stable.

That is, a classification is a legally established, generally recognized, normative grouping. Classification is the basis of groupings.

    A grouping characteristic is a characteristic by which individual units of the population are combined into homogeneous groups. Grouping characteristics are either attributive (qualitative) or quantitative.

Classification of grouping characteristics

By form of expression:

    attributive, which have no quantitative expression (profession, education);

    quantitative: 1) discrete (discontinuous), whose values are expressed only in whole numbers (number of rooms, number of children); 2) continuous, whose values can be either whole or fractional.

By nature of variation:

    alternative, which some units possess and others do not (for example, quality);

    those taking many quantitative values.

By role in the relationship between the phenomena being studied:

    factor characteristics, which affect other characteristics;

    resultant characteristics, which are influenced by others.

To find the number of groups, the Sturges formula is used:

n = 1 + 3.322 lg N,

where N is the number of units in the population and lg is the base-10 logarithm.

According to this formula, the choice of the number of groups depends on the size of the population.

The limitation of the formula is that it gives good results only if the population consists of a large number of units and the distribution of units by the grouping characteristic is close to normal.
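As a small illustration, the formula can be evaluated directly. The function name is hypothetical, and rounding the result to the nearest whole number is an assumption, since the text does not say how a fractional n should be handled.

```python
import math


def sturges_group_count(population_size):
    """Number of groups by the Sturges formula n = 1 + 3.322 * lg(N)."""
    if population_size < 1:
        raise ValueError("population_size must be at least 1")
    n = 1 + 3.322 * math.log10(population_size)
    # Rounding to the nearest whole number is an assumption of this sketch.
    return round(n)


for size in (51, 100, 1000):
    print(size, sturges_group_count(size))
```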

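Another way to determine the number of groups is based on the standard deviation ($\sigma$). It is calculated as

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}},$$

where $\bar{x}$ is the average value of the characteristic in the population, determined by the formula

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}.$$

If the interval width is $0.5\sigma$, the population is divided into 12 groups; if the interval width is $\tfrac{2}{3}\sigma$ or $\sigma$, the population is divided into 9 and 6 groups, respectively.

If the population is divided into 6 groups, the following intervals are obtained: from $\bar{x} - 3\sigma$ to $\bar{x} - 2\sigma$; from $\bar{x} - 2\sigma$ to $\bar{x} - \sigma$; from $\bar{x} - \sigma$ to $\bar{x}$; from $\bar{x}$ to $\bar{x} + \sigma$; from $\bar{x} + \sigma$ to $\bar{x} + 2\sigma$; from $\bar{x} + 2\sigma$ to $\bar{x} + 3\sigma$.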

These methods do not guarantee that “empty” or small groups will not be formed. “Empty” groups are considered to be those in which not a single unit of the population is included. The presence of such intervals indicates that the grouping is constructed incorrectly.

Once the number of groups has been determined, the grouping intervals must be determined.

An interval is the range between the maximum and minimum values of the characteristic in a group.

Each interval has its own value, upper and lower boundaries, or at least one of them.

The lower limit of the interval is the smallest value of the characteristic in the interval, and the upper limit is the largest value of the characteristic in it. The interval value is the difference between the upper and lower limits of the interval.

Grouping intervals, depending on their width, can be equal or unequal. Unequal intervals are divided into progressively increasing, progressively decreasing, arbitrary and specialized.

If the variation of a characteristic stays within relatively narrow boundaries and the distribution is more or less uniform, a grouping with equal intervals is built. The interval width is

h = (Xmax - Xmin) / n,

where Xmax and Xmin are the largest and smallest values of the characteristic and n is the number of groups.

Before determining the magnitude of the variation, it is recommended to exclude anomalous observations from the population.

The value obtained from the formula is rounded. It is the interval step.

There are the following rules for determining the interval step.

If the calculated interval width has one significant digit before the decimal point (for example, 0.66; 1.372; 5.8), it is advisable to round it to tenths and use the result as the interval step (0.7; 1.4; 5.8).

When the calculated interval value has two significant figures before the decimal point and several decimal places, then this value must be rounded to

For example, Xmax = 180, Xmin = 80, n = 5.

h = (Xmax - Xmin) / n;

h = (180 - 80) / 5 = 20.

We therefore obtain the following intervals:

80-100; 100-120; 120-140; 140-160; 160-180.
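The same calculation can be sketched in Python. The function name `equal_intervals` is hypothetical, and the sketch does not round the interval step, which the text recommends doing when the result is not a convenient number.

```python
def equal_intervals(x_min, x_max, n_groups):
    """Return the interval width h and the list of (lower, upper) bounds."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    h = (x_max - x_min) / n_groups
    bounds = [x_min + i * h for i in range(n_groups + 1)]
    return h, list(zip(bounds[:-1], bounds[1:]))


h, intervals = equal_intervals(80, 180, 5)
print(h)
print(intervals)
```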

b) unequal, when the width of the interval gradually increases and the upper interval is often not closed at all. Unequal intervals are used more often in economic practice.

c) open, when only the upper or only the lower boundary is given. Open intervals are needed when the values of the characteristic are so widely spread that closing every interval on both sides would require many groups.

d) closed, when both a lower and an upper boundary are given. For a discrete characteristic measured in indivisible units (for example, people), the boundaries do not repeat: 1-3, 4-7, 8-11. For a continuously varying characteristic, the same number serves as the upper boundary of one group and the lower boundary of the next (90-120, 120-150, 150-180).

With such a construction of intervals, the question of which group a unit on the boundary belongs to is solved in practice in two ways: by the “inclusive” principle or by the “exclusive” principle.

Which one applies depends on how the intervals are written, especially those of the first and last groups.

    “180 and more”: exclusive; 180 is included in the last group.

    “over 180”: inclusive; 180 is included in the previous group.

In practice both occur, but preference is given to the “exclusive” principle.
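The two principles differ only in where a unit lying exactly on a shared boundary is placed. The sketch below shows this for the intervals 90-120, 120-150, 150-180; the function name `group_index` and the decision to put out-of-range values into the first or last group are assumptions of this illustration.

```python
from bisect import bisect_left, bisect_right


def group_index(value, boundaries, principle="exclusive"):
    """Return the 0-based index of the interval that receives `value`.

    `boundaries` is a sorted list such as [90, 120, 150, 180], which
    defines the intervals 90-120, 120-150 and 150-180.
    "exclusive": the upper boundary is excluded, so a boundary value
    goes to the next group. "inclusive": the upper boundary is
    included, so a boundary value stays in the previous group.
    """
    if principle == "exclusive":
        idx = bisect_right(boundaries, value) - 1
    elif principle == "inclusive":
        idx = bisect_left(boundaries, value) - 1
    else:
        raise ValueError("principle must be 'exclusive' or 'inclusive'")
    # Values outside the range fall into the first or last group (assumption).
    return min(max(idx, 0), len(boundaries) - 2)


bounds = [90, 120, 150, 180]
print(group_index(120, bounds, "exclusive"))  # index of 120-150
print(group_index(120, bounds, "inclusive"))  # index of 90-120
```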

The midpoint of an interval is determined in several ways.

    For a closed interval, the upper and lower limits are summed and divided by 2.

    For equal intervals, the width of the interval is added to the midpoint of the preceding interval.

    For an open first interval, the width of the second interval is subtracted from the midpoint of the second interval.

    For an open last interval, the width of the penultimate interval is added to the midpoint of the penultimate interval.
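A minimal sketch of these rules follows. It assumes that only the first and last intervals may be open, that an open boundary is written as `None`, and that the neighbouring interval is closed; the function name and the sample intervals are hypothetical.

```python
def interval_midpoints(intervals):
    """Midpoints of (lower, upper) intervals; None marks an open boundary.

    An open first or last interval borrows the width of its closed
    neighbour, as described in the rules above.
    """
    mids = [None] * len(intervals)
    for i, (lower, upper) in enumerate(intervals):
        if lower is not None and upper is not None:
            mids[i] = (lower + upper) / 2
    if intervals[0][0] is None:
        lower, upper = intervals[1]
        mids[0] = mids[1] - (upper - lower)
    if intervals[-1][1] is None:
        lower, upper = intervals[-2]
        mids[-1] = mids[-2] + (upper - lower)
    return mids


# Hypothetical series: "up to 100", 100-120, 120-140, "140 and more".
print(interval_midpoints([(None, 100), (100, 120), (120, 140), (140, None)]))
```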

Types of statistical groupings

    Typological grouping. Its essence is to single out the main types, that is, qualitatively homogeneous groups, from the many characteristics of the phenomena under study. If the grouping characteristic is attributive, the number of groups is determined by the properties of the phenomenon being studied. For example: grouping of the population by gender and age; headcount by year; industrial production personnel (PPP), including workers, apprentices, engineers, office employees and junior service staff. Singling out types by a quantitative characteristic means defining groups on the basis of the values of the characteristic being studied. Example: nursery age 0-2; preschool age 3-6; school age 7-17; working age 16-54 for women and 16-59 for men.

Typological groupings are widely used in the study of socio-economic phenomena and processes.

Grouping by type of ownership in 1998

    Structural grouping. These are groupings used to study the structure of the population being studied. For the most part, structural groupings are made on the basis of the formation of qualitatively homogeneous groups. With the help of such groupings, the following can be studied: the composition of the population by gender, age, place of residence, the composition of enterprises by number of employees, and the value of fixed assets.

Grouping of the Russian population by place of residence, 1959-1994

    Analytical grouping (factorial). It is used to study the relationship between individual characteristics. For example, between work experience and qualifications, worker category and education. Features of the analytical grouping: firstly, it is based on a factor characteristic; secondly, each selected group is characterized by average values ​​of the resulting characteristic.

Grouping of commercial banks in Russia by balance sheet assets

Table columns: group of banks by total balance sheet assets, million rubles; number of banks, units; average per bank: number of employees, people, and balance sheet profit, billion rubles. Last group: 50,000 or more.

    Combined grouping. This is the formation of groups by two or more characteristics taken in a certain combination. Attributive characteristics come first, in a sequence that follows the logic of the relationship between the indicators. For example, groups are formed by form of business organization and then divided into subgroups by level of profitability, labor productivity or capital productivity.

Depending on the number of characteristics underlying them, groupings are divided into:

Simple: a grouping made by one characteristic.

Complex: a grouping made by two or more characteristics.

Secondary grouping

Secondary grouping is the regrouping of already grouped material.

It is resorted to:

    When a smaller number of larger groups must be obtained from a large number of initially formed groups.

    When, for the purpose of comparison, differently grouped material must be brought into a comparable form.

Statistical distribution series

Among simple groupings, distribution series stand out.

A distribution series is an ordered arrangement of the units of the population being studied into groups by a grouping characteristic.

Distribution series formed by qualitative characteristics are called attributive.

Distribution series formed by quantitative characteristics are called variation series.

Variation series are discrete (discontinuous) or interval (continuous).

A variation series consists of two elements: variants and frequencies.

A variant is an individual value of the varying characteristic that it takes in the distribution series.

A frequency is the number of occurrences of an individual variant or the number of units in each group of the variation series.

Frequencies expressed as fractions of a unit or as a percentage of the total are called relative frequencies. The sum of the frequencies is the volume of the distribution series.

Examples: an attributive series and a discrete series (table columns: number of students; % of total).

The nature of the distribution in discrete series is depicted graphically in the form of a distribution polygon.

An example of an interval series: distribution of workers by output (table columns: output, thousand rubles; number of workers; cumulative number).

The interval distribution series is graphically depicted as a histogram.

In practice, there is a need to transform distribution series into cumulative series, built according to accumulated frequencies. With their help, you can determine structural averages that facilitate the analysis of distribution series data.

Cumulative frequencies are determined by successively adding to the frequency (or relative frequency) of the first group the corresponding values of the subsequent groups of the distribution series. Cumulative curves (cumulates) and ogives are used to illustrate distribution series. To construct them, the values of a discrete characteristic (or the ends of the intervals) are marked on the abscissa axis, and the cumulative totals of frequencies (cumulate) or of relative frequencies (ogive) corresponding to these values are marked on the ordinate axis.
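The accumulation step can be sketched as follows. The frequencies 70, 80, 200, 55, 15 are the ones used in the travel-time calculation in this text; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [70, 80, 200, 55, 15]

# Running totals of the frequencies (ordinates of the cumulative curve).
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# The same series expressed as shares of the total.
relative = [f / total for f in frequencies]
cumulative_relative = [c / total for c in cumulative]

print(cumulative)
print([round(c, 3) for c in cumulative_relative])
```

The last cumulative frequency equals the volume of the series, and the last cumulative relative frequency equals 1.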

One of the most important requirements for statistical distribution series is comparability in time and space. Variation series with equal intervals satisfy this condition.

However, the frequencies of unequal intervals are not directly comparable. In such cases, to ensure comparability, the distribution density is calculated, that is, the number of units in each group per unit of interval width.

Table columns: groups of stores by turnover, thousand rubles; number of stores (1); interval width, thousand rubles (2); distribution density, units (1:2).

A comparison of the frequencies of the individual groups shows that stores with a turnover of 250-450 thousand rubles occur most often.

When plotting the distribution variation series with unequal intervals, the height of the rectangles is determined in proportion not to the frequencies, but to the indicators of the density of distribution of the values ​​of the characteristic being studied in the corresponding intervals.
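The density calculation is a simple division of each frequency by its interval width. In the sketch below the store counts and the first two intervals are hypothetical (the original table is not available); only the 250-450 interval is taken from the text.

```python
def distribution_density(frequencies, intervals):
    """Units per unit of interval width for each (lower, upper) interval."""
    if len(frequencies) != len(intervals):
        raise ValueError("frequencies and intervals must have equal length")
    return [f / (upper - lower) for f, (lower, upper) in zip(frequencies, intervals)]


# Hypothetical data: unequal intervals of turnover, thousand rubles.
intervals = [(50, 100), (100, 250), (250, 450)]
stores = [10, 15, 20]

densities = distribution_density(stores, intervals)
for (lower, upper), count, density in zip(intervals, stores, densities):
    print(f"{lower}-{upper}: {count} stores, density {density:.2f}")
```

In this made-up series the widest interval has the largest frequency but not the largest density, which is why histogram heights for unequal intervals are based on density.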

Statistical tables

The results of the summary and grouping of observation materials are presented in the form of statistical tables. They allow you to present the material in the most convenient, compact, visual and rational way.

In statistical tables, a distinction is made between the subject and the predicate. The subject is the object the table refers to; it consists of the groups and subgroups that are characterized by a number of indicators. The predicate of the table is the set of indicators by which the subject is characterized.

Statistical tables can be simple or complex.

Simple tables include list tables, in which the subject is a list of individual objects.

In complex tables, the subject is a population divided into groups by one or more characteristics.

Tables in which the subject is grouped by one characteristic are called group tables.

If the subject contains a grouping by two or more characteristics, the table is called combinational.

Complex tables also include correlation tables and balance tables.

The division of tables into simple, group and combinational is based on the degree of division of the subject. However, the predicate can be presented in different ways.

If all indicators of the predicate characterize the subject separately, independently of each other, then such development of the predicate is called simple. If in a predicate one feature is combined with another, then such a development of the predicate is called complex.

Statistical tables were first used to present statistical data in 1727 in Russia by I.K. Kirilov in his work “The Blooming State of the All-Russian State”

The use of combination tables dates back to a later period (1882).

Technical points when compiling tables include:

    Clarity of headings.

    Units of measurement are indicated in separate columns.

    Repeated terms are placed in general headings.

    Columns and lines must be numbered.

    In group and combination tables, you should always give summary columns and rows.

    Numbers are rounded with equal precision. When one value exceeds another many times over, it is better to express the resulting dynamics indicators not in % but in times. For example, instead of 586%, write 5.9 times.

    In analytical tables, absolute numbers should have as few digits as possible. When multi-digit numbers are needed, the thousands and millions should be separated, counting from the right: for example, 1,458,946 rubles rather than 1458946 rubles. Alternatively, the number can be rounded to 2-3 significant digits: 1.46 million rubles.

    When the table contains information about the calculation procedure along with the reporting data, reservations are made in the form of footnotes.

    If the volume of the population being studied is incomplete or there is no initial data, all terms are first shown in the “general totals” line, and then, after an explanation, their most important components are listed in the “including” line.

    Individual cells may not be filled in for the following reasons:

a) “x” - the cell cannot be filled in at all;

b) “...” - no information;

c) “-” - the phenomenon itself is absent;

d) “0.0” - the value is smaller than the precision used in the table; a significant figure could appear if it were rounded more precisely.

Statistical graphs

Statistical graph is a drawing in which statistical data is depicted using conventional geometric figures (lines, points, symbols).

The founder of the graphical method in statistics is considered to be the English economist W. Playfair (1731-1798). In his work “Commercial and Political Atlas” (1786), methods of graphically depicting statistical data (line, bar, sector and other diagrams) were first used.

Basic elements of a chart include:

    The graph field is the place where the graph is drawn. It is generally accepted that the best field for visual perception is rectangular, with an aspect ratio from 1:1.3 to 1:1.5 (the “golden ratio” rule). Sometimes a square field is also used.

    The graphic image is the set of symbolic signs with which statistical data are depicted.

    Spatial and scale reference points. Spatial reference points determine the placement of graphic images on the graph field. They are specified by a coordinate grid or contour lines. Scale reference points give graphic images quantitative meaning, which is conveyed by a system of scales.

    The explication of the graph is an explanation of its contents; it includes the title of the graph, explanations of the scales and explanations of individual elements of the graphic image.


Posted on http://www.allbest.ru/

Moscow Academy named after. S.Yu. Witte

Faculty of Economics

Test

Work completed:

1st year student,

distance learning

Vislyaeva M.N.

Moscow

In the control task, you must perform a secondary regrouping for a simple example (choose the example yourself) and explain how and under what conditions such a recalculation is valid. Using computer software and a more complex example, also describe the effect and specifics of using IT.

In your written response to the assignment you must:

1. Explain the connection between the formula for adding variances and the correlation relation, explain its statistical meaning.

2. Compare the variation for two different distributions with different means, explain the conditions of comparability when the means differ.

3. Give the most complete explanation of the meaning of the marginal error, connect it with the concept of representativeness of the sample and its required volume.

4. Explain the relationship between estimating unknown parameters using OLS and checking the significance of the results obtained using the criteria for testing statistical hypotheses.

Regrouping previously grouped statistics is called secondary grouping. This method is used in cases where, as a result of the initial grouping, the nature of the distribution of the population being studied is unclear.

In this case, the intervals are enlarged or reduced. Secondary grouping is also used to bring groupings at different intervals into a comparable form for the purpose of comparing them. Let's look at secondary grouping techniques using an example.

Enlarge the intervals based on the data in Table 1:

Table 1

Number of stores

The above grouping is not clear enough because it does not show a clear and strict pattern in the change in turnover by group.

Let us condense the distribution series, forming six groups. The new groups are formed by summing the original groups (Table 2).

Table 2

Columns: groups of stores by turnover for the fourth quarter, thousand rubles; number of stores; turnover for the fourth quarter, thousand rubles; average turnover per store, thousand rubles.

It is absolutely clear that the larger the stores, the higher the level of turnover.

1. Based on the analytical grouping, the strength of the relationship can be measured by the empirical correlation ratio. This indicator is denoted by the Greek letter $\eta$ (eta). It is based on the rule of adding variances, according to which the total variance $\sigma^2$ equals the sum of the within-group and between-group variances.

The variance of the resultant characteristic within a group, where the factor characteristic is relatively constant, arises from other factors. This variance is called residual. It is determined by the formula

$$\sigma_j^2 = \frac{\sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2}{n_j},$$

where $y_{ij}$ is the value of characteristic $y$ for the $i$-th unit in the $j$-th group;

$\bar{y}_j$ is the average value of the characteristic in the $j$-th group;

$n_j$ is the number of units in the $j$-th group;

$j = 1, 2, 3, \ldots, m$.

The within-group variances calculated for the individual groups are combined into the average within-group variance:

$$\overline{\sigma^2} = \frac{\sum_{j=1}^{m} \sigma_j^2 n_j}{\sum_{j=1}^{m} n_j}.$$

Between-group variance is attributed to the factor being studied (and the factors associated with it), so it is called factor variance. It is determined by the formula

$$\delta^2 = \frac{\sum_{j=1}^{m} (\bar{y}_j - \bar{y})^2 n_j}{\sum_{j=1}^{m} n_j},$$

where $\bar{y}$ is the overall average of the characteristic.

The rule for adding variances can be written as

$$\sigma^2 = \overline{\sigma^2} + \delta^2.$$

The empirical correlation ratio measures how much of the overall variability of the resultant characteristic is caused by the factor being studied. Accordingly, it is calculated from the ratio of the factor variance to the total variance of the resultant characteristic:

$$\eta = \sqrt{\frac{\delta^2}{\sigma^2}}.$$

This indicator takes values in the range from 0 to 1: the closer it is to 1, the closer the relationship, and vice versa.
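The rule of adding variances can be checked numerically. The sketch below uses a small made-up data set of two groups (the original Tables 3 and 4 are not available), computes the total variance directly, and compares it with the sum of the average within-group and between-group variances; the function name and the data are hypothetical.

```python
import math


def variance_decomposition(groups):
    """Return (within, between, total, eta) for a list of groups of values.

    within  : average within-group variance, weighted by group size
    between : between-group (factor) variance
    total   : total variance computed directly from all values
    eta     : empirical correlation ratio, sqrt(between / total)
    """
    all_values = [y for group in groups for y in group]
    n = len(all_values)
    overall_mean = sum(all_values) / n

    within = 0.0
    between = 0.0
    for group in groups:
        n_j = len(group)
        mean_j = sum(group) / n_j
        var_j = sum((y - mean_j) ** 2 for y in group) / n_j
        within += var_j * n_j
        between += (mean_j - overall_mean) ** 2 * n_j
    within /= n
    between /= n

    total = sum((y - overall_mean) ** 2 for y in all_values) / n
    eta = math.sqrt(between / total) if total > 0 else 0.0
    return within, between, total, eta


# Hypothetical values of the resultant characteristic in two groups.
within, between, total, eta = variance_decomposition([[2, 4], [6, 8]])
print(within, between, total, round(eta, 3))
```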

Table 3. Initial data

Table 4. Worksheet

Average turnover: $\bar{x} = \sum x f / \sum f = 17370 / 51 = 340.59$ thousand rubles.

The variance is:

$\sigma^2 = \sum f (x - \bar{x})^2 / \sum f = 38682.36 / 51 = 758.48$

Standard deviation:

$\sigma = \sqrt{758.48} = 27.54$ thousand rubles.

The coefficient of variation is:

$V = \sigma / \bar{x} = 27.54 / 340.59 = 0.081$, or 8.1%.

The coefficient of variation is less than 33%, therefore the population is homogeneous.

Table 5. Initial data

1) Average time spent by workers on travel to the place of work:

$\bar{x} = \sum x f / \sum f = (25 \cdot 70 + 35 \cdot 80 + 45 \cdot 200 + 55 \cdot 55 + 65 \cdot 15) / 420 = 41.8$ min.

2) Variance:

$\sigma^2 = \sum f (x - \bar{x})^2 / \sum f = 43160.8 / 420 = 102.8$

The standard deviation is:

$\sigma = \sqrt{102.8} = 10.14$ min.

3) Coefficient of variation:

$V = \sigma / \bar{x} = 10.14 / 41.8 = 0.24$, or 24%.

The coefficient of variation is less than 33%, therefore, the population considered is homogeneous and the average for it is quite typical.
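The travel-time calculation can be reproduced with a short script. The interval midpoints and frequencies are the ones used in the calculation above; the function name is hypothetical. Because the script keeps the unrounded mean, its variance differs slightly in the second decimal place from the hand calculation, which used the rounded mean of 41.8.

```python
import math


def weighted_stats(values, frequencies):
    """Weighted mean, variance, standard deviation and coefficient of variation."""
    total = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / total
    std = math.sqrt(variance)
    return mean, variance, std, std / mean


midpoints = [25, 35, 45, 55, 65]      # minutes
workers = [70, 80, 200, 55, 15]       # number of workers

mean, variance, std, cv = weighted_stats(midpoints, workers)
print(f"mean={mean:.1f} min, variance={variance:.1f}, std={std:.2f}, V={cv:.0%}")
```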

The sample can be formed by a quantitative characteristic or by an alternative (attributive) characteristic. In the first case, the generalizing characteristic of the sample is the sample mean, denoted $\tilde{x}$, and in the second case the sample proportion, denoted $w$. In the general population the corresponding characteristics are the general mean $\bar{x}$ and the general proportion $p$.

The differences $\tilde{x} - \bar{x}$ and $w - p$ are called the sampling error, which is divided into the registration error and the representativeness error. The registration error arises from incorrect or inaccurate information caused by a misunderstanding of the question or by the registrar's inattention when filling out questionnaires, forms, and so on. It is fairly easy to detect and eliminate. The representativeness error arises from a systematic or accidental failure to observe the principle of random selection. It is difficult to detect and eliminate and is much larger than the first, so the main attention is paid to it.

The law of large numbers plays an exceptionally important role in justifying and applying sample observation. Under certain conditions and with a sufficiently large number of observations, the summary characteristics obtained from the sample will differ little from the corresponding characteristics of the general population. Consequently, by increasing the sample size, the limits of possible representativeness errors can be reduced to the smallest possible values. Conversely, knowing the limits of the representativeness errors, one can determine the required sample size.

One of the most important and responsible tasks when organizing and conducting sample observation is to establish the required size of the sample population, i.e. such a number that would ensure the receipt of data that sufficiently correctly reflect the studied properties of the general population.

In this case, the following must be taken into account: 1) with what degree of accuracy should the maximum sampling error be obtained; 2) what should be the probability that the conditioned accuracy of the results of sample observation will be ensured; 3) the degree of fluctuation of the studied properties in the studied population.

This means that the required sample size is set depending on the size of the maximum sampling error, the value of the confidence coefficient (t) and the size of the variance.
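One commonly used relation between these three quantities, for simple random sampling with replacement, is n = t² σ² / Δ², where Δ is the maximum (marginal) error. The text does not give a formula, so this choice and the function name are assumptions of the sketch; the variance 102.8 is taken from the travel-time example, and the values of t and Δ are illustrative.

```python
import math


def required_sample_size(t, variance, max_error):
    """Sample size n = t^2 * variance / max_error^2, rounded up.

    Assumes simple random sampling with replacement; other sampling
    designs use different formulas.
    """
    if max_error <= 0:
        raise ValueError("max_error must be positive")
    return math.ceil(t ** 2 * variance / max_error ** 2)


# Illustrative inputs: t = 2, variance 102.8, marginal error of 1 minute.
print(required_sample_size(t=2, variance=102.8, max_error=1.0))
# Doubling the allowed error cuts the required size roughly fourfold.
print(required_sample_size(t=2, variance=102.8, max_error=2.0))
```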

The method of estimating the parameters of a linear regression that minimizes the sum of squared deviations of the observed values of the dependent variable from the sought linear function is called the least squares method (OLS).

The essence of the method is that the quality criterion for a candidate solution is the sum of squared errors, which is to be minimized. To apply the method, one needs as many measurements of the unknown random variable as possible (the more measurements, the higher the accuracy of the solution) and a set of candidate solutions from which the best one is selected. If the set of solutions is parameterized, the task is to find the optimal values of the parameters.

Least squares is used in mathematics, in particular in probability theory and mathematical statistics. It is most widely used in filtering problems, where the useful signal must be separated from the noise superimposed on it. It is also used in mathematical analysis to approximate a given function by simpler functions. Another application is the solution of systems of equations in which the number of unknowns is smaller than the number of equations.
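For a straight line y = a + b x, the least squares estimates have a closed form. The sketch below computes them and the minimized sum of squared errors; the function name and the sample points are hypothetical and chosen to lie exactly on y = 1 + 2x, so the residual sum is zero.

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) of y = a + b * x and the residual sum of squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical")
    b = s_xy / s_xx                 # slope
    a = mean_y - b * mean_x         # intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


a, b, sse = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b, sse)
```

Testing whether the estimated slope differs significantly from zero is then a hypothesis-testing task of the kind described in the stages listed in this text.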

Stages of testing statistical hypotheses:

1. Formulation of the main hypothesis $H_0$ and the competing hypothesis $H_1$. The hypotheses must be stated precisely in mathematical terms.

2. Setting the probability $\alpha$, called the significance level, which corresponds to errors of the first kind and on the basis of which the conclusion about the truth of the hypothesis will later be made.

3. Calculation of the test statistic $\theta$, such that:

    its value depends on the original sample;

    its value allows conclusions to be drawn about the truth of hypothesis $H_0$;

    the statistic itself follows some known distribution law, since $\theta$ is random because the sample is random.

4. Construction of the critical region. From the range of values of $\theta$, a subset is singled out whose values indicate a significant discrepancy with the assumption. Its size is chosen so that the probability that $\theta$ falls into this subset when $H_0$ is true equals $\alpha$. This set is called the critical region.

5. Conclusion about the truth of the hypothesis. The observed sample values are substituted into the statistic $\theta$, and depending on whether it falls into the critical region, the hypothesis $H_0$ is rejected or accepted.

dispersion correlation variation

Posted on Allbest.ru


Groupings are distinguished as follows:

  1. Primary, compiled on the basis of primary material collected during observation.
  2. Secondary, compiled on the basis of primary groupings and used in two cases:
    • when small groups need to be reorganized into larger ones;
    • when materials collected in different places and by different methods need to be compared.
A grouping based on two or more characteristics is called combinational.
The characteristic by which groups or types of phenomena are distinguished is called the grouping characteristic, or grouping basis. The basis can be quantitative or attributive. An attributive characteristic is one that is expressed as a name (for example, profession: seamstress, teacher, etc.).

Example No. 1. The following data are available on the distribution of trading companies by number of employees in two regions.


Construct a secondary grouping of the data on the distribution of firms by recalculating the data for region 1 in accordance with the grouping used for region 2. In which region is the average number of employees higher?

Solution:
The first group, “Less than 5”, will include 4/5 of the “1-5” group. The number of firms is then 6 × 4/5 = 4.8 ≈ 5.
The “5-10” group includes the whole “6-10” group and the remaining part of the “1-5” group, so the number of firms is 4 + (6 - 5) = 5.
The “11-20” group includes the whole “11-15” group and part of the “16-20” group, namely 1/4 × 50 = 12.5 ≈ 13.
The “21-30” group includes the remainder of the “16-20” group, the “21-25” group and the “over 25” group: (50 - 13) + 20 + 15 = 72.
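The proportional reallocation used in this kind of regrouping can be sketched as follows. This is a minimal illustration, assuming that units are spread evenly within each old interval and that intervals are continuous (the solution above works with integer-labelled groups, which it handles slightly differently). The function name regroup and all intervals and counts below are hypothetical, not the data of Example No. 1.

```python
def regroup(old_bounds, counts, new_bounds):
    """Reallocate counts from old intervals to new ones in proportion to overlap.

    Assumes units are spread evenly within each old interval.
    """
    result = []
    for new_lo, new_hi in new_bounds:
        total = 0.0
        for (lo, hi), count in zip(old_bounds, counts):
            # Length of the part of the old interval lying inside the new one.
            overlap = max(0.0, min(hi, new_hi) - max(lo, new_lo))
            total += count * overlap / (hi - lo)
        result.append(total)
    return result


# Hypothetical intervals and counts, for illustration only.
old_bounds = [(0, 4), (4, 10), (10, 18), (18, 30)]
counts = [16, 20, 44, 74]
new_bounds = [(0, 5), (5, 10), (10, 20), (20, 30)]

new_counts = regroup(old_bounds, counts, new_bounds)
print([round(c) for c in new_counts])  # [19, 17, 56, 62]
```

Rounding is applied only at the end, so the unrounded counts still add up to the original total.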


Find the average number of employees:
For the first region.

Weighted average: x̄ = 1960/105 ≈ 18.67

For the second region.


Weighted average: x̄ = 3502.5/117 ≈ 29.94
Thus, in the second region the average number of employees is higher.
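The weighted average over interval midpoints can be sketched as below. The original tables are not reproduced here, so the region 1 values are inferred from the solution: the count of 10 firms for the “11-15” group (from the total of 105) and the midpoint 28 for the open “over 25” group are assumptions, chosen because they reproduce the stated numerator 1960. The function name grouped_mean is hypothetical.

```python
def grouped_mean(midpoints, counts):
    """Weighted arithmetic mean of interval midpoints, weighted by group size."""
    return sum(m * c for m, c in zip(midpoints, counts)) / sum(counts)


# Region 1, groups "1-5", "6-10", "11-15", "16-20", "21-25", "over 25".
# The third count and the last midpoint are inferred, not taken from a table.
midpoints = [3, 8, 13, 18, 23, 28]
counts = [6, 4, 10, 50, 20, 15]

mean_region1 = grouped_mean(midpoints, counts)
print(round(mean_region1, 2))  # 18.67
```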

Example No. 2.
Distribution of workers by length of service

| Group number | Groups of workers by length of service, years | Number of workers, people | Number of workers, % of total |
|---|---|---|---|
| I | 2-6 | 6 | 30.0 |
| II | 6-10 | 6 | 30.0 |
| III | 10-14 | 5 | 25.0 |
| IV | 14-18 | 3 | 15.0 |
| TOTAL | | 20 | 100.0 |

In the distribution series, for clarity, the group sizes are also expressed as percentages of the total. The primary grouping shows that 60.0% of workers have up to 10 years of experience, split equally between 2-6 years (30%) and 6-10 years (30%), while 40% of workers have from 10 to 18 years of experience.
To study the relationship between work experience and output, an analytical grouping must be built. We base it on the same groups as in the distribution series. The grouping results are presented in Table 2.

Table 2 - Grouping of workers by length of service

| Group number | Groups of workers by years of experience | Number of workers, people | Average work experience, years | Product output, rub.: total | Product output, rub.: per worker |
|---|---|---|---|---|---|
| I | 2-6 | 6 | 3.25 | 1335.0 | 222.5 |
| II | 6-10 | 6 | 7.26 | 1613.0 | 268.8 |
| III | 10-14 | 5 | 11.96 | 1351.0 | 270.2 |
| IV | 14-18 | 3 | 16.5 | 965.0 | 321.6 |
| TOTAL | | 20 | 8.62 | 5264.0 | 263.2 |

To fill in Table 2, a working table (Table 3) must be compiled.

Table 3.

| No. | Groups of workers by length of service, years | Worker number | Experience, years | Output, rub. |
|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 |
| 1 | 2-6 | 1, 2, 3, 4, | 2.0; 2.3; 3.0; 5.0; 4.5; 2.7 | 205, 200, 205, 250, 225, 250 |
| Total for the group | | 6 | 19.5 | 1335 |
| 2 | 6-10 | 5, 6, 8, 13, 17, 19 | 6.2; 8.0; 6.9; 7.0; 9.0; 6.5 | 208, 290, 270, 250, 270, 253 |
| Total for the group | | 6 | 43.6 | 1613 |
| 3 | 10-14 | 9, 12, 15, 16, 18 | 12.5; 13.0; 11.0; 10.5; 12.8 | 230, 300, 287, 276, 258 |
| Total for the group | | 5 | 59.8 | 1351 |
| 4 | 14-18 | 11, 20, 14 | 16; 18; 15.5 | 295, 320, 350 |
| Total for the group | | 3 | 49.5 | 965 |
| Total | | 20 | 172.4 | 5264.0 |

Dividing column 4 by column 3 and column 5 by column 3 in the group total rows of Table 3 gives the data needed to fill in Table 2; the same is done for every group. Filling in Table 2 in this way yields the analytical table.
Once the working table has been calculated, we compare its totals with the data given in the problem statement; they must match. Thus, in addition to constructing groupings and finding averages, we also perform an arithmetic check.
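The division of columns and the arithmetic check can be sketched as follows. The group totals are taken from the “Total for the group” rows of Table 3; the names totals and group_averages are illustrative.

```python
# Group totals from Table 3:
# (number of workers, total experience in years, total output in rub.)
totals = {
    "2-6": (6, 19.5, 1335.0),
    "6-10": (6, 43.6, 1613.0),
    "10-14": (5, 59.8, 1351.0),
    "14-18": (3, 49.5, 965.0),
}


def group_averages(totals):
    """Return (average experience, output per worker) for each group."""
    return {
        group: (experience / n, output / n)
        for group, (n, experience, output) in totals.items()
    }


averages = group_averages(totals)
for group, (avg_exp, avg_out) in averages.items():
    print(f"{group:>6}: {avg_exp:6.2f} years, {avg_out:7.2f} rub. per worker")

# Arithmetic check: overall totals and averages across all groups.
n_total = sum(n for n, _, _ in totals.values())
exp_total = sum(e for _, e, _ in totals.values())
out_total = sum(o for _, _, o in totals.values())
print(n_total, round(exp_total / n_total, 2), round(out_total / n_total, 2))
```

The printed values are rounded to two decimals, so they can differ in the last digit from table entries that appear to be truncated (for example 7.26 and 321.6).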
Analyzing analytical Table 2, we can conclude that the studied characteristics (indicators) depend on each other: as work experience increases, output per worker steadily increases. The output of workers in the fourth group is 99.1 rubles, or 44.5%, higher than in the first. We have considered an example of grouping by one characteristic. In a number of cases, however, such a grouping is insufficient for the problem at hand, and one moves on to grouping by two or more characteristics, i.e. to a combinational grouping. Let us perform a secondary grouping of the data by average production output.
We characterize each group by the number of workers, average work experience, and output, both in total and per worker. The calculations are presented in Table 4.

Table 4 - Grouping of workers by length of service and average output

| No. | Worker groups by experience, years | Worker groups by average output, rub. | Number of workers, people | Average work experience, years | Output, rub.: total | Output, rub.: per worker |
|---|---|---|---|---|---|---|
| 1 | 2-6 | 200.0-250.0 | 4 | 2.5 | 835.0 | 208.75 |
| Total for the group | | | 6 | 3.25 | 1335.0 | 222.5 |
| 2 | 6-10 | 200.0-250.0 | - | - | - | - |
| 3 | 10-14 | 200.0-250.0 | 1 | 12.5 | 230.0 | 230.0 |
| Total for the group | | | 5 | 11.96 | 1351.0 | 270.2 |
| 4 | 14-18 | 200.0-250.0 | - | - | - | - |
| Total for the group | | | 3 | 16.5 | 965.0 | 321.6 |
| Total by group | | 200.0-250.0 | 5 | 3.0 | 1065.0 | 213.0 |
| Total | | | 20 | 8.62 | 5264 | 263.2 |

To construct a secondary analytical grouping by average product output within the initially created groups, we determine the interval of the secondary grouping, distinguishing three groups, i.e. one fewer than in the original grouping.
Then i = (350 - 200)/3 = 50 rub.
Taking more groups makes little sense, because the interval would become very small; fewer groups are possible. The group averages are calculated as the sum of the experience in the group divided by the number of workers: for the first group, 19.5 years divided by 6 workers gives 3.25 years.
The table data shows that product output is directly dependent on work experience.

Sometimes the initial grouping does not clearly reveal the nature of the distribution of population units, or groupings must be brought to a comparable form for comparative analysis. In such cases the existing grouping has to be changed: previously identified, relatively small groups are combined into a smaller number of larger typical groups, or the boundaries of the previous groups are changed to make the grouping comparable with others.

Data are grouped in accordance with the summary program so that the collected information can subsequently be presented clearly.

Grouping is the combining of population units into groups that have their own characteristics, common features and similar values of the studied characteristic.

The grouping results are presented in the form of grouping tables, which make the information easy to survey. A table contains a summary numerical characterization of the population under study according to one or more essential characteristics, interconnected by the logic of the analysis.

Example 5.2. Layout of a grouping table

Table title (general title)

The grouping table contains three types of headings: general, top and side. Table headings should be short and reveal the content of the indicators.

The general heading reflects the contents of the entire table and indicates the place and time it relates to. It is placed above the layout, centered, and is the outer heading. The top headings characterize the content of the columns (headings of the predicate), and the side headings (headings of the subject) characterize the rows. The subject of a statistical table is the object characterized by numbers. The predicate is the system of indicators that characterize the object of study, i.e. the subject. Cells that cannot contain source data should be avoided. In cells where data are missing because the source information is incomplete, special notes are made.

Example 5.3. Grouping table example

Attitude of students of the Faculty of Economics and Economics to a reduction in the amount of the scholarship (based on the results of a study in January 1999)

Thus, grouping is the division of population units into groups according to selected varying characteristics.

Groupings are distinguished by:

Data systematization tasks;

The number of grouping characteristics;

Information used.

According to the tasks of data systematization, groupings are divided into typological, structural and analytical.

Typological groupings are intended to identify qualitatively homogeneous groups of populations, i.e. objects that are close to each other simultaneously according to all grouping characteristics. For example, grouping city enterprises by type of ownership. Typological grouping divides a heterogeneous set of observation units into qualitatively homogeneous groups (classes, types of phenomena). When constructing it, quantitative and attributive characteristics can be used as grouping characteristics.

Structural groupings divide a homogeneous population into groups that characterize its structure according to a certain grouping characteristic. For example, grouping workshop workers by qualification. Another example of a structural grouping is the grouping of economic sectors into fuel and energy, petrochemicals, agro-industrial complex, mining, telecommunications, transport, metallurgy, defense industries, etc. By its nature, a structural grouping is also quite general, although in some cases it is less general than a typological grouping.

Analytical groupings are designed to identify dependencies between characteristics. They are constructed by distinguishing resulting characteristics, i.e. those that change under the influence of factor characteristics, and factor characteristics, i.e. those whose influence on the resulting characteristics is being studied. An analytical grouping has the following features: population units are grouped according to factor characteristics; each selected group is characterized by the average values of the resulting characteristic, and the change in these values indicates the presence of connections and dependencies between the characteristics. Each selected group must contain population units that are statistically homogeneous with respect to the grouping characteristics. The number of units in each group must be sufficient to obtain reliable statistical characteristics of the phenomenon or process being studied.

Based on the information used, primary and secondary groupings are distinguished.

Primary groupings are made on the basis of initial data obtained as a result of statistical observations.

Secondary groupings are the result of combining or splitting primary groupings. They make it possible to overcome the incomparability of the initial data in the primary groupings, to merge them into one common grouping, and to compare the data they present.

When developing a primary grouping, the choice of the number of groups is essential. The number of groups depends on the type of characteristic used as the basis for the grouping, on the size of the population, and on the degree of variation of the characteristic.

When constructing groupings based on a qualitative characteristic, the number of groups corresponds to the number of gradations of the characteristic. When grouping by a quantitative characteristic, the entire set of characteristic values is divided into intervals. In this case, two approaches are possible: grouping with equal intervals and grouping with unequal intervals.

To determine the number of groups n in the first case (equal intervals), Sturges' formula is recommended:

n = 1 + 3.322 × log10(N), (5.1)

where N is the number of observations.

In this case, the interval value is:

I = (Xmax - Xmin)/n. (5.2)
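Formulas (5.1) and (5.2) can be sketched as below. Rounding the number of groups to the nearest integer is an assumption of this sketch (the text does not state a rounding rule), and the function names are hypothetical.

```python
import math


def sturges_groups(n_obs):
    """Number of groups by formula (5.1), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n_obs))


def interval_width(x_max, x_min, n_groups):
    """Width of an equal interval by formula (5.2)."""
    return (x_max - x_min) / n_groups


print(sturges_groups(100))          # 8
print(interval_width(350, 200, 3))  # 50.0
```

The second call reproduces the interval of 50 rub. obtained in Example No. 2 for three groups of output between 200 and 350 rub.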

The main stages of constructing statistical groupings are:

Selection of grouping characteristics;

Determining the required number of groups into which the population under study should be divided;

Setting the boundaries of grouping intervals;

Establishing, for each grouping, the indicators (or system of indicators) that should characterize the selected groups.

Grouping at unequal intervals creates a lot of problems when processing data, so such groupings should be avoided whenever possible.

Self-test questions:

What is a summary?

What is data grouping?

What types of groupings do you know?

What are the characteristics of each type of grouping?

What is the relationship between grouping, table and summary?

What are the features of complex multidimensional groupings?

What does secondary grouping mean?

Why is a secondary grouping needed?