How to find the arithmetic mean in statistics. Average values in statistics

The arithmetic mean (in mathematics and statistics) of a set of numbers is the sum of all the numbers divided by their count. It is one of the most common measures of central tendency.

It was proposed (along with the geometric mean and harmonic mean) by the Pythagoreans.

Special cases of the arithmetic mean are the population mean (of the general population) and the sample mean (of a sample).

Introduction

Denote the data set by \(X = (x_1, x_2, \ldots, x_n)\); the sample mean is then usually denoted by a horizontal bar over the variable, \(\bar{x}\) (pronounced "x bar").

The Greek letter μ denotes the arithmetic mean of the entire population. For a random variable with a defined mean, μ is the probabilistic mean, or mathematical expectation, of that random variable. If the set X is a collection of random numbers with probabilistic mean μ, then for any element \(x_i\) sampled from this collection, \(\mu = E(x_i)\) is the expectation of that element.

In practice, the difference between μ and \(\bar{x}\) is that μ is typically unobservable, because one sees a sample rather than the entire population. Therefore, if the sample is drawn at random (in the sense of probability theory), then \(\bar{x}\) (but not μ) can be treated as a random variable with a probability distribution over samples (the sampling distribution of the mean).

Both of these quantities are calculated in the same way:

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}(x_1 + \cdots + x_n). \]

If X is a random variable, then the mathematical expectation of X can be regarded as the arithmetic mean of the values obtained in repeated measurements of X. This is a manifestation of the law of large numbers. The sample mean is therefore used to estimate the unknown mathematical expectation.

In elementary algebra it is proved that the mean of n + 1 numbers is greater than the mean of n numbers if and only if the new number is greater than the old mean, smaller if and only if the new number is smaller than the old mean, and unchanged if and only if the new number equals the old mean. The larger n is, the smaller the difference between the new and old means.
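The statement about adding one more number can be checked with a small sketch. The function name `updated_mean` and the sample values are illustrative assumptions; the update rule follows directly from the definition of the mean.

```python
def updated_mean(old_mean: float, n: int, new_value: float) -> float:
    """Mean of n + 1 numbers, given the mean of the first n and the new number."""
    # The mean moves toward the new value by 1/(n + 1) of the gap,
    # so the shift shrinks as n grows.
    return old_mean + (new_value - old_mean) / (n + 1)


values = [4.0, 8.0, 6.0]             # illustrative data, not from the text
mean_n = sum(values) / len(values)   # mean of the first n = 3 numbers

higher = updated_mean(mean_n, len(values), 10.0)  # new number above the mean
lower = updated_mean(mean_n, len(values), 2.0)    # new number below the mean
same = updated_mean(mean_n, len(values), 6.0)     # new number equal to the mean

print(mean_n, higher, lower, same)
```

The same rule is what makes it possible to maintain a running mean without storing all earlier values.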

Note that several other "means" exist, including the power mean, the Kolmogorov mean, the harmonic mean, the arithmetic-geometric mean, and various weighted means (e.g., the weighted arithmetic mean, the weighted geometric mean, the weighted harmonic mean).

Examples

  • For three numbers, add them and divide by 3:
\[ \frac{x_1 + x_2 + x_3}{3}. \]
  • For four numbers, add them and divide by 4:
\[ \frac{x_1 + x_2 + x_3 + x_4}{4}. \]

Or, more simply: 5 + 5 = 10 and 10 / 2 = 5. We added two numbers, so we divide by two: the sum is always divided by the count of numbers added.

Continuous random variable

For a continuously varying quantity \(f(x)\), the arithmetic mean on the interval \([a; b]\) is defined via a definite integral:

\[ \overline{f(x)}_{[a;b]} = \frac{1}{b-a}\int_a^b f(x)\,dx \]
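As a rough numerical illustration of the integral definition, the sketch below approximates the mean of a function on an interval with the midpoint rule. The helper name `mean_on_interval`, the step count, and the test function x² on [0; 3] are assumptions chosen for the example.

```python
def mean_on_interval(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate (1 / (b - a)) * integral of f from a to b (midpoint rule)."""
    width = (b - a) / steps
    # Sum f at the midpoint of each sub-interval, then scale by the width.
    integral = sum(f(a + (k + 0.5) * width) for k in range(steps)) * width
    return integral / (b - a)


# Mean of x^2 on [0; 3]; the exact value is (1/3) * (3^3 / 3) = 3.
mean_square = mean_on_interval(lambda x: x * x, 0.0, 3.0)
print(mean_square)
```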

Some problems of using the average

Lack of robustness

Although the arithmetic mean is often used as a measure of central tendency, it is not a robust statistic: it is heavily influenced by "large deviations" (outliers). Notably, for strongly skewed distributions the arithmetic mean may not match the intuitive notion of "average", and robust measures (for example, the median) may describe the central tendency better.

The classic example is the calculation of average income. The arithmetic mean can be mistaken for the median, which may lead to the conclusion that more people have high incomes than actually do. "Mean" income is then read as if most people's incomes were close to this number. In fact this "average" (in the sense of the arithmetic mean) income is higher than the income of most people, because a few very high incomes far from the average pull the arithmetic mean upward (the median income, in contrast, "resists" such a skew). The "average" income says nothing about how many people are near the median income, nor about how many are near the modal income. If the notions of "average" and "majority" are treated loosely, one can wrongly conclude that most people have higher incomes than they really do. For example, a report on the "average" net income in Medina, Washington, computed as the arithmetic mean of all residents' annual net incomes, gives a surprisingly high number because of Bill Gates. Consider the sample (1, 2, 2, 2, 3, 9). The arithmetic mean is 3.17, but five of the six values are below this mean.
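The sample (1, 2, 2, 2, 3, 9) can be checked with Python's standard `statistics` module; this is only a small sketch comparing the mean with the median for that sample.

```python
from statistics import mean, median

sample = [1, 2, 2, 2, 3, 9]

sample_mean = mean(sample)       # pulled upward by the single large value 9
sample_median = median(sample)   # middle of the sorted sample, unaffected by 9

# Count how many observations lie below the arithmetic mean.
below_mean = sum(1 for value in sample if value < sample_mean)

print(round(sample_mean, 2), sample_median, below_mean)
```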

Compound interest

If the numbers are multiplied rather than added, the geometric mean should be used instead of the arithmetic mean. This situation arises most often when calculating the return on investment in finance.

For example, if a stock fell 10% in the first year and rose 30% in the second, it is incorrect to compute the "average" increase over the two years as the arithmetic mean (−10% + 30%) / 2 = 10%; the correct average is given by the compound annual growth rate, which yields an annual growth of only about 8.16653826392% ≈ 8.2%.

The reason is that each percentage has a new starting point: the 30% is taken from a number smaller than the price at the beginning of the first year. If the stock started at $30 and fell 10%, it is worth $27 at the start of the second year. If it then rises 30%, it is worth $35.1 at the end of the second year. The arithmetic mean of these changes is 10%, but since the stock has grown by only $5.1 in 2 years, it is an average annual increase of 8.2% that gives the final result of $35.1:

$30 × (1 − 0.1) × (1 + 0.3) = $35.1 ≈ $30 × (1 + 0.082) × (1 + 0.082). If the arithmetic mean of 10% is used in the same way, the actual value is not obtained: $30 × (1 + 0.1) × (1 + 0.1) = $36.3.

Compound growth at the end of year 2: 90% × 130% = 117%, i.e. a total increase of 17%, and the average annual compound growth factor is \(\sqrt{117\%} \approx 108.2\%\), that is, an average annual increase of 8.2%.
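The stock example can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative, and the yearly factors 0.90 and 1.30 are the ones used in the text.

```python
import math

start_price = 30.0
yearly_factors = [0.90, 1.30]   # a 10% fall, then a 30% rise

# Geometric mean of the growth factors: the n-th root of their product.
total_factor = math.prod(yearly_factors)
geometric_factor = total_factor ** (1 / len(yearly_factors))

# Arithmetic mean of the same factors, for comparison.
arithmetic_factor = sum(yearly_factors) / len(yearly_factors)

final_actual = start_price * total_factor
final_geometric = start_price * geometric_factor ** 2
final_arithmetic = start_price * arithmetic_factor ** 2

print(round(geometric_factor, 4))                      # average annual factor
print(round(final_actual, 2), round(final_geometric, 2), round(final_arithmetic, 2))
```

Only the geometric factor reproduces the actual final price; the arithmetic factor overstates it.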

Directions

Special care is needed when calculating the arithmetic mean of a variable that changes cyclically (for example, a phase or an angle). For example, the average of 1° and 359° would be \(\frac{1^\circ + 359^\circ}{2} = 180^\circ\). This number is incorrect for two reasons.

  • First, angular measures are defined only up to a multiple of 360° (or 2π when measured in radians). Thus the same pair of angles could be written as (1° and −1°) or as (1° and 719°). The averages of each pair will be different: \(\frac{1^\circ + (-1^\circ)}{2} = 0^\circ\), \(\frac{1^\circ + 719^\circ}{2} = 360^\circ\).
  • Second, in this case a value of 0° (equivalent to 360°) is the geometrically best mean, since the numbers deviate less from 0° than from any other value (the value 0° has the smallest variance). Compare:
    • the number 1° deviates from 0° by only 1°;
    • the number 1° deviates from the calculated average of 180° by 179°.

The average of a cyclic variable calculated by the formula above is artificially shifted from the real average toward the middle of the numerical range. For this reason the average is calculated differently: the value with the smallest variance (the center point) is chosen as the average. Also, instead of ordinary subtraction, the modular distance (i.e., the distance along the circle) is used. For example, the modular distance between 1° and 359° is 2°, not 358° (on the circle, it is 1° from 359° to 360° = 0° and another 1° from 0° to 1°, 2° in total).
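The "smallest variance with circular distance" idea can be sketched as a brute-force search over candidate angles. The function names and the 1° search grid are assumptions made for illustration; practical code often uses a vector (sine/cosine) average instead, which is a different technique.

```python
def circular_distance(a: float, b: float) -> float:
    """Shortest distance between two angles along the circle, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def circular_mean_by_search(angles, step: float = 1.0) -> float:
    """Candidate angle (on a grid) minimizing the sum of squared circular distances."""
    candidates = [k * step for k in range(int(360 / step))]
    return min(
        candidates,
        key=lambda c: sum(circular_distance(c, a) ** 2 for a in angles),
    )


angles = [1.0, 359.0]
print(circular_distance(1.0, 359.0))    # modular distance, not 358
print(circular_mean_by_search(angles))  # center point on the 1-degree grid
```

The grid search is slow and only as precise as its step, but it follows the description above literally.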

4.3. Average values. Essence and meaning of averages

In statistics, an average value is a generalizing indicator that characterizes the typical level of a phenomenon under specific conditions of place and time and reflects the magnitude of a varying attribute per unit of a qualitatively homogeneous population. In economic practice, a wide range of indicators calculated as averages is used.

For example, a generalizing indicator of the income of workers in a joint-stock company (JSC) is the average income of one worker, determined by the ratio of the wage fund and social payments for the period under review (year, quarter, month) to the number of workers in the JSC.

Calculating the average is a common generalization technique: the average reflects what is common (typical) to all units of the studied population, while ignoring the differences between individual units. Every phenomenon and its development combines chance and necessity. When averages are calculated, random deviations cancel each other out through the operation of the law of large numbers, so one can abstract from the insignificant features of the phenomenon and from the quantitative values of the attribute in each specific case. The scientific value of averages as summary characteristics of aggregates lies in this ability to abstract from the randomness of individual values and fluctuations.

Where generalization is needed, calculating such characteristics replaces many different individual values of the attribute with an average indicator that characterizes the whole set of phenomena. This makes it possible to identify patterns that are inherent in mass social phenomena but imperceptible in single phenomena.

The average reflects the characteristic, typical, real level of the studied phenomena, characterizes these levels and their changes in time and space.

The average is a summary characteristic of the regularities of the process under the conditions in which it proceeds.

4.4. Types of averages and methods for calculating them

The choice of the type of average is determined by the economic content of the indicator and by the initial data. In each case one of the averages is applied: arithmetic, harmonic, geometric, quadratic, cubic, etc. The listed averages belong to the class of power means.

In addition to power means, statistical practice uses structural averages, namely the mode and the median.

Let us dwell in more detail on power means.

Arithmetic mean

The most common type of average is the arithmetic mean. It is used when the volume of a varying attribute for the entire population is the sum of the attribute values of its individual units. Social phenomena are characterized by the additivity (summation) of the volumes of a varying attribute; this determines the scope of the arithmetic mean and explains its prevalence as a generalizing indicator. For example, the total wage fund is the sum of the wages of all workers, and the gross harvest is the sum of the output from the entire sown area.

To calculate the arithmetic mean, divide the sum of all feature values by their number.

The arithmetic mean is applied in the form of a simple average and a weighted average. The simple average serves as the initial, defining form.

The simple arithmetic mean is equal to the sum of the individual values of the averaged feature divided by the total number of these values (it is used when the individual values of the feature are ungrouped):

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad (4.1) \]

where \(x_i\) are the individual values of the variable (options) and \(n\) is the number of population units.

From here on, summation limits are omitted from the formulas. For example, suppose we need to find the average output of one worker (a fitter), given how many parts each of 15 workers produced, i.e. given a series of individual values of the feature, pcs.:

21; 20; 20; 19; 21; 19; 18; 22; 19; 20; 21; 20; 18; 19; 20.

The simple arithmetic mean is calculated by formula (4.1), pcs.:

\[ \bar{x} = \frac{21 + 20 + \dots + 20}{15} = \frac{297}{15} = 19.8. \]

The average of options that are repeated a different number of times, i.e. that have different weights, is called weighted. The weights are the numbers of units in the different population groups (each group combines identical options).

The weighted arithmetic mean, the average of grouped values, is calculated by the formula:

\[ \bar{x} = \frac{\sum x f}{\sum f}, \qquad (4.2) \]

where \(f\) are the weights (the frequencies with which identical feature values are repeated);

\(\sum x f\) is the sum of the products of the feature values and their frequencies;

\(\sum f\) is the total number of population units.

We illustrate the calculation of the weighted arithmetic mean using the example discussed above. To do this, we group the initial data and place them in Table 4.1.

Table 4.1

Distribution of workers by output of parts

Output of parts per worker, pcs., x | Number of workers (weight), f | x·f
18 | 2 | 36
19 | 4 | 76
20 | 5 | 100
21 | 3 | 63
22 | 1 | 22
Total | 15 | 297

According to formula (4.2), the weighted arithmetic mean is, pcs.:

\[ \bar{x} = \frac{\sum x f}{\sum f} = \frac{297}{15} = 19.8. \]
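The grouping step and the weighted calculation can be sketched in Python using the 15 output values quoted earlier. `Counter` is used here only as a convenient way to build the frequency table; the variable names are illustrative.

```python
from collections import Counter

# Output of each of the 15 workers, pcs. (the series quoted in the text).
parts = [21, 20, 20, 19, 21, 19, 18, 22, 19, 20, 21, 20, 18, 19, 20]

# Group equal options: option x -> frequency f (the weight).
frequencies = Counter(parts)

# Weighted mean: sum of x * f divided by the sum of f.
weighted_mean = sum(x * f for x, f in frequencies.items()) / sum(frequencies.values())

# Simple mean over the ungrouped series, for comparison.
simple_mean = sum(parts) / len(parts)

print(sorted(frequencies.items()))
print(weighted_mean, simple_mean)
```

Both calculations give the same value, since grouping only rearranges the same sum.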

In some cases the weights can be expressed not as absolute values but as relative ones (percentages or fractions of a unit). The formula for the weighted arithmetic mean then takes the form:

\[ \bar{x} = \frac{\sum x d}{\sum d}, \]

where \(d = f / \sum f\) is the relative frequency, i.e. the share of each frequency in the total sum of all frequencies.

If the relative frequencies are expressed as fractions (coefficients), then \(\sum d = 1\), and the formula for the weighted arithmetic mean becomes:

\[ \bar{x} = \sum x d. \]

The weighted arithmetic mean of group means \(\bar{x}_j\) is calculated by the formula:

\[ \bar{x} = \frac{\sum \bar{x}_j f_j}{\sum f_j}, \]

where \(f_j\) is the number of units in each group.

The results of calculating the arithmetic mean of the group means are presented in Table 4.2.

Table 4.2

Distribution of workers by average length of service

In this example the options are not individual data on the length of service of individual workers but the averages for each workshop. The weights f are the numbers of workers in the workshops. Hence the average length of service of workers across the whole enterprise is, in years:

.

Calculation of the arithmetic mean in the distribution series

If the values of the averaged attribute are given as intervals ("from ... to"), i.e. as an interval distribution series, then the midpoints of the intervals are taken as the feature values of the groups when calculating the arithmetic mean, which yields a discrete series. Consider the following example (Table 4.3).

Let us move from the interval series to a discrete one by replacing each interval with its midpoint (the simple average of the interval's lower and upper bounds); the widths of the open intervals (the first and the last) are conventionally taken to be equal to those of the adjacent intervals (the second and the penultimate).

Table 4.3

Distribution of JSC workers by level of monthly wages

Columns: groups of workers by wages, rub. | number of workers, persons, f | midpoint of the interval, rub., X. (Of the rows, only the last group label, "900 and more", survives.)

Calculating the average in this way introduces some inaccuracy, because the units are assumed to be distributed uniformly within each group. The error is smaller, however, the narrower the interval and the more units it contains.

Once the midpoints of the intervals have been found, the calculation proceeds as for a discrete series: the options are multiplied by the frequencies (weights), and the sum of the products is divided by the sum of the frequencies (weights).

Thus, the average wage of the JSC's workers is 729 rubles per month.

The calculation of the arithmetic mean is often associated with a large expenditure of time and labor. However, in some cases, the procedure for calculating the average can be simplified and facilitated by using its properties. Let us present (without proof) some basic properties of the arithmetic mean.

Property 1. If all individual values of the feature (i.e. all options) are decreased or increased by a factor of i, then the mean of the new feature decreases or increases by the same factor i.

Property 2. If all options of the averaged feature are decreased or increased by a number A, then the arithmetic mean decreases or increases by the same number A.

Property 3. If the weights of all averaged options are decreased or increased by a factor of k, the arithmetic mean does not change.
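The three properties can be checked numerically with a short sketch. The helper `weighted_mean`, the grouped data, and the constants 2, 20 and 10 are illustrative assumptions.

```python
def weighted_mean(options, weights):
    """Sum of option * weight divided by the sum of weights."""
    return sum(x * f for x, f in zip(options, weights)) / sum(weights)


options = [18, 19, 20, 21, 22]   # illustrative grouped values
weights = [2, 4, 5, 3, 1]        # illustrative frequencies

base = weighted_mean(options, weights)

# Property 1: dividing every option by i = 2 divides the mean by 2.
scaled = weighted_mean([x / 2 for x in options], weights)

# Property 2: subtracting A = 20 from every option lowers the mean by 20.
shifted = weighted_mean([x - 20 for x in options], weights)

# Property 3: multiplying every weight by k = 10 leaves the mean unchanged.
reweighted = weighted_mean(options, [f * 10 for f in weights])

print(base, scaled, shifted, reweighted)
```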

Relative weights in the overall total (shares or percentages) can be used as the weights of the average instead of absolute values. This simplifies the calculation of the average.

To simplify the calculation of the mean, the values of the options and frequencies are reduced. The greatest simplification is achieved when A is chosen as the value of one of the central options with the highest frequency and i as the width of the interval (for series with equal intervals). The value A is called the origin, so this way of calculating the mean is called the "method of counting from a conditional zero" or the "method of moments".

Suppose all options x are first decreased by the same number A and then reduced by a factor of i. We obtain a new variational distribution series of new options \(x'\).

The new options are then expressed as

\[ x' = \frac{x - A}{i}, \]

and their arithmetic mean \(m_1\), the first-order moment, by the formula

\[ m_1 = \frac{\sum x' f}{\sum f} = \frac{\sum \frac{x - A}{i} f}{\sum f}. \]

It is equal to the mean of the original options, first decreased by A and then reduced by a factor of i.

To obtain the actual mean, the first-order moment \(m_1\) is multiplied by i and A is added:

\[ \bar{x} = m_1 i + A. \]

This way of calculating the arithmetic mean of a variational series is called the "method of moments". It is applied to series with equal intervals.
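A minimal sketch of the method of moments follows. The interval midpoints 15 to 23, the origin A = 19 and the interval width i = 2 match the example in this section, but the frequencies are hypothetical, since the source table's frequencies are not available.

```python
midpoints = [15, 17, 19, 21, 23]    # midpoints of equal intervals of width 2
frequencies = [2, 4, 10, 6, 3]      # hypothetical frequencies, for illustration only

A = 19   # origin: a central option with the highest frequency
i = 2    # interval width

# Reduced options x' = (x - A) / i are small integers: -2, -1, 0, 1, 2.
reduced = [(x - A) / i for x in midpoints]

# First-order moment: the weighted mean of the reduced options.
m1 = sum(r * f for r, f in zip(reduced, frequencies)) / sum(frequencies)

# Back to the original scale: multiply by i and add A.
mean_by_moments = m1 * i + A

# Direct weighted mean, for comparison.
mean_direct = sum(x * f for x, f in zip(midpoints, frequencies)) / sum(frequencies)

print(m1, mean_by_moments, mean_direct)
```

The benefit of the method is purely computational: the reduced options are easier to handle by hand than the original values.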

The calculation of the arithmetic mean by the method of moments is illustrated by the data in Table 4.4.

Table 4.4

Distribution of small enterprises in the region by the value of fixed production assets (OPF) in 2000

Columns: groups of enterprises by value of OPF, thousand rubles (14-16, 16-18, 18-20, 20-22, 22-24) | number of enterprises, f | interval midpoints, x.

First we find the first-order moment \(m_1\). Then, taking A = 19 and i = 2, we calculate the mean (in thousand rubles) as \(\bar{x} = 2 m_1 + 19\).

Types of average values and methods for their calculation

At the stage of statistical processing, a variety of research tasks can be set, and solving them requires choosing the appropriate average. The following rule applies: the quantities that form the numerator and the denominator of the average must be logically related to each other.

Averages fall into two classes:

  • power averages;
  • structural averages.

Let us introduce the following notation:

\(x\): the values for which the average is calculated;

\(\bar{x}\): the average, where the bar indicates that the individual values are averaged;

\(f\): the frequency (how often individual values of the feature are repeated).

Various means are derived from the general power mean formula:

\[ \bar{x} = \sqrt[k]{\frac{\sum x^k}{n}} \qquad (5.1) \]

For k = 1 it gives the arithmetic mean; for k = −1, the harmonic mean; for k → 0, the geometric mean; for k = 2, the root mean square.

Averages are either simple or weighted. Weighted averages take into account that some values of the attribute occur more often than others, so each value is multiplied by the number of times it occurs. In other words, the "weights" are the numbers of population units in the different groups, i.e. each option is "weighted" by its frequency. The frequency f is called the statistical weight, or the weight of the average.

The arithmetic mean is the most common type of average. It is used when the calculation is carried out on ungrouped statistical data and the average summand is required. The arithmetic mean is the value of a feature that, when substituted for every individual value, leaves the total volume of the feature in the population unchanged.

The formula for the (simple) arithmetic mean has the form

\[ \bar{x} = \frac{\sum x}{n}, \]

where n is the population size.

For example, the average salary of an enterprise's employees is calculated as an arithmetic mean: the total wage fund divided by the number of employees.

The determining indicators here are the wages of each employee and the number of employees of the enterprise. When the average is calculated, the total amount of wages stays the same but is, as it were, distributed equally among all workers. For example, it is necessary to calculate the average salary of employees of a small company where 8 people are employed:

When averages are calculated, individual values of the averaged attribute may be repeated, so the average is calculated from grouped data. In this case the weighted arithmetic mean is used, which has the form

\[ \bar{x} = \frac{\sum x f}{\sum f} \qquad (5.3) \]

Suppose we need to calculate the average share price of a joint-stock company on the stock exchange. Transactions were carried out over 5 days (5 transactions), and the number of shares sold at each price was as follows:

1 - 800 shares - 1010 rubles

2 - 650 shares - 990 rubles

3 - 700 shares - 1015 rubles

4 - 550 shares - 900 rubles

5 - 850 shares - 1150 rubles

The initial ratio for determining the average share price is the ratio of the total value of the transactions (OSS) to the number of shares sold (KPA).
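The ratio just described can be worked through with the five transactions listed above. This is a small sketch; the variable names are illustrative.

```python
# (number of shares sold, price per share in rubles) for the five transactions.
transactions = [
    (800, 1010),
    (650, 990),
    (700, 1015),
    (550, 900),
    (850, 1150),
]

# Total value of all transactions: sum of quantity * price.
total_value = sum(quantity * price for quantity, price in transactions)

# Total number of shares sold.
total_shares = sum(quantity for quantity, _ in transactions)

# Weighted average price: total value divided by total quantity.
average_price = total_value / total_shares

# Unweighted average of the five prices, for comparison.
simple_average = sum(price for _, price in transactions) / len(transactions)

print(total_value, total_shares, round(average_price, 2), simple_average)
```

The weighted price differs from the plain average of the five prices because the highest price applies to the largest number of shares.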

The most common type of average is the arithmetic average.

Simple arithmetic mean

The simple arithmetic mean is the average summand: in determining it, the total volume of a given attribute in the data is distributed equally among all units of the population. Thus, the average annual output per worker is the volume of output that would fall to each employee if the entire output were distributed equally among all employees of the organization. The simple arithmetic mean is calculated by the formula:

\[ \bar{x} = \frac{\sum x_i}{n} \]

The simple arithmetic mean is equal to the ratio of the sum of the individual values of a feature to the number of values in the population.

Example 1. A team of 6 workers receives 3, 3.2, 3.3, 3.5, 3.8 and 3.1 thousand rubles per month.

Find the average salary.
Solution: (3 + 3.2 + 3.3 + 3.5 + 3.8 + 3.1) / 6 = 3.32 thousand rubles.

Arithmetic weighted average

If the data set is large and forms a distribution series, a weighted arithmetic mean is calculated. This is how the weighted average price per unit of production is determined: the total cost of production (the sum of the products of quantity and unit price) is divided by the total quantity of production.

We represent this in the form of the following formula:

\[ \bar{x} = \frac{\sum x_i f_i}{\sum f_i} \]

The weighted arithmetic mean is equal to the ratio of the sum of the products of each attribute value and its frequency to the sum of the frequencies of all attribute values. It is used when the options of the studied population occur an unequal number of times.

Example 2. Find the average monthly wage of the shop workers.

The average wage can be obtained by dividing the total wage by the total number of workers:

Answer: 3.35 thousand rubles.

Arithmetic mean for an interval series

When calculating the arithmetic mean for an interval variation series, the average for each interval is first determined as the half-sum of the upper and lower limits, and then the average of the entire series. In the case of open intervals, the value of the lower or upper interval is determined by the value of the intervals adjacent to them.

Averages calculated from interval series are approximate.

Example 3. Determine the average age of students in the evening department.

The degree of approximation of such averages depends on how closely the actual distribution of population units within each interval approaches a uniform one.

When calculating averages, not only absolute but also relative values (relative frequencies) can be used as weights.

The arithmetic mean has a number of properties that reveal its essence more fully and simplify its calculation:

1. The product of the mean and the sum of the frequencies is always equal to the sum of the products of the options and the frequencies, i.e. \(\bar{x}\sum f = \sum x f\).

2. The arithmetic mean of a sum of varying values is equal to the sum of the arithmetic means of these values: \(\overline{x + y} = \bar{x} + \bar{y}\).

3. The algebraic sum of the deviations of the individual attribute values from the mean is zero: \(\sum (x - \bar{x}) f = 0\).

4. The sum of the squared deviations of the options from the mean is less than the sum of the squared deviations from any other value A, i.e. \(\sum (x - \bar{x})^2 f < \sum (x - A)^2 f\) for \(A \ne \bar{x}\).
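Properties 3 and 4 can be illustrated numerically. The sketch below reuses the six wages from Example 1 with unit frequencies; the comparison points 3.0 and `mean + 0.1` are arbitrary choices for illustration.

```python
wages = [3.0, 3.2, 3.3, 3.5, 3.8, 3.1]   # thousand rubles, from Example 1

mean = sum(wages) / len(wages)

# Property 3: deviations from the mean cancel out (up to rounding error).
deviation_sum = sum(w - mean for w in wages)


def squared_deviations(center: float) -> float:
    """Sum of squared deviations of the wages from a given center."""
    return sum((w - center) ** 2 for w in wages)


# Property 4: the sum of squares is smallest when the center is the mean.
at_mean = squared_deviations(mean)
at_other = squared_deviations(3.0)
at_nearby = squared_deviations(mean + 0.1)

print(round(mean, 2), deviation_sum, at_mean, at_other, at_nearby)
```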

Excel offers many functions for finding an average value (whether the data are numeric, textual, percentages or other values). Each has its own characteristics and advantages, since particular conditions can be set for the task.

For example, the average of a series of numbers in Excel is calculated using statistical functions. You can also enter your own formula manually. Let's consider the various options.

How to find the arithmetic mean of numbers?

To find the arithmetic mean, add all the numbers in the set and divide the sum by their count. For example, a student's grades in computer science are 3, 4, 3, 5, 5. The grade for the quarter is 4. We found the arithmetic mean using the formula: =(3+4+3+5+5)/5.

How can this be done quickly using Excel functions? Take, for example, a series of random numbers in a row:

Or: make the cell active and simply manually enter the formula: =AVERAGE(A1:A8).

Now let's see what else the AVERAGE function can do.


Find the arithmetic mean of the first two and the last three numbers. Formula: =AVERAGE(A1:B1,F1:H1). Result:



Average by condition

The condition for finding the arithmetic mean can be a numerical criterion or a text one. We will use the function: =AVERAGEIF().

Find the arithmetic mean of numbers that are greater than or equal to 10.

Function: =AVERAGEIF(A1:A8,">=10")


The result of using the AVERAGEIF function on the condition ">=10":

The third argument, "Averaging range", is omitted. First, it is optional. Second, the range analyzed by the program contains ONLY numeric values. The cells specified in the first argument are searched according to the condition specified in the second argument.

Attention! The search criterion can be placed in a cell and referenced from the formula.
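For readers who want to check such a result outside Excel, the conditional average can be sketched in Python. The list below is a hypothetical stand-in for the contents of A1:A8, since the actual cell values are not given in the text.

```python
# Hypothetical stand-in for the values in cells A1:A8.
cells = [4, 10, 15, 7, 22, 10, 3, 18]

# Keep only the values that meet the condition ">=10".
selected = [value for value in cells if value >= 10]

# Average of the selected values; guard against an empty selection,
# for which there is no average to report.
conditional_average = sum(selected) / len(selected) if selected else None

print(selected, conditional_average)
```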

Let's find the average value of the numbers by the text criterion. For example, the average sales of the product "tables".

The function will look like this: =AVERAGEIF($A$2:$A$12,A7,$B$2:$B$12). The range is the column with product names. The search criterion is a reference to a cell containing the word "tables" (you can put the word "tables" itself in place of the reference A7). The averaging range is the set of cells from which data are taken to calculate the average.

As a result of calculating the function, we obtain the following value:

Attention! For a text criterion (condition), the averaging range must be specified.

How to calculate the weighted average price in Excel?

How do we know the weighted average price?

Formula: =SUMPRODUCT(C2:C12,B2:B12)/SUM(C2:C12).


Using SUMPRODUCT, we find the total revenue from the sale of the entire quantity of goods, and the SUM function adds up the quantity of goods. Dividing the total revenue by the total number of units sold gives the weighted average price. This indicator takes into account the "weight" of each price, i.e. its share in the total.

Standard deviation: formula in Excel

A distinction is made between the standard deviation for the general population and for a sample. In the first case it is the square root of the population variance; in the second, the square root of the sample variance.

To calculate this statistic, a variance formula is set up and its square root is taken. Excel, however, has a ready-made function for finding the standard deviation.


The standard deviation is tied to the scale of the source data, so on its own it does not give an intuitive picture of the variation in the analyzed range. To obtain the relative level of scatter in the data, the coefficient of variation is calculated:

standard deviation / arithmetic mean

The formula in Excel looks like this:

=STDEV(range of values)/AVERAGE(range of values).

The coefficient of variation is calculated as a percentage. Therefore, we set the percentage format in the cell.
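The same ratio can be sketched with Python's `statistics` module. Here `stdev` is the sample standard deviation, which is the assumption matching a sample-based calculation; the data values are hypothetical.

```python
from statistics import mean, stdev

values = [10, 12, 14]   # hypothetical data, for illustration only

# Sample standard deviation (divides by n - 1).
sample_sd = stdev(values)

# Coefficient of variation: standard deviation relative to the mean.
coefficient_of_variation = sample_sd / mean(values)

# Shown as a percentage, as in the percentage cell format.
print(f"{coefficient_of_variation:.1%}")
```

For population data, `statistics.pstdev` would be the corresponding choice.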

5.1. The concept of average

An average value is a generalizing indicator that characterizes the typical level of a phenomenon. It expresses the value of the attribute per unit of the population.

The average always generalizes the quantitative variation of the attribute, i.e. in average values the individual differences between units of the population that are due to random circumstances cancel out. In contrast to the average, an absolute value that characterizes the level of a feature for an individual unit does not allow feature values to be compared across units belonging to different populations. So, if the levels of remuneration of workers at two enterprises need to be compared, one cannot compare two individual employees of the different enterprises. The wages of the workers selected for comparison may not be typical for these enterprises. If the sizes of the wage funds of the two enterprises are compared, the number of employees is not taken into account, so it is impossible to determine where the level of wages is higher. Ultimately only averages can be compared, i.e. how much one worker earns on average at each enterprise. Thus there is a need to calculate the average value as a generalizing characteristic of the population.

Calculating the average is a common generalization technique: the average reflects what is common (typical) to all units of the studied population, while ignoring the differences between individual units. Every phenomenon and its development combines chance and necessity. When averages are calculated, random deviations cancel each other out through the operation of the law of large numbers, so one can abstract from the insignificant features of the phenomenon and from the quantitative values of the attribute in each specific case. The scientific value of averages as generalizing characteristics of aggregates lies in this ability to abstract from the randomness of individual values and fluctuations.

In order for the average to be truly typifying, it must be calculated taking into account certain principles.

Let us dwell on some general principles for the application of averages.
1. The average should be determined for populations consisting of qualitatively homogeneous units.
2. The average should be calculated for a population consisting of a sufficiently large number of units.
3. The average should be calculated for the population, the units of which are in a normal, natural state.
4. The average should be calculated taking into account the economic content of the indicator under study.

5.2. Types of averages and methods for calculating them

Let us now consider the types of averages, the features of their calculation and their areas of application. Averages are divided into two large classes: power averages and structural averages.

The power means include such well-known and widely used types as the geometric mean, the arithmetic mean and the quadratic mean.

The mode and the median are considered structural averages.

Let us dwell on power averages. Depending on how the initial data are presented, power averages can be simple or weighted. The simple average is calculated from ungrouped data and has the following general form:

\[ \bar{X} = \sqrt[m]{\frac{\sum X_i^m}{n}}, \]

where \(X_i\) is the option (value) of the averaged feature;

n is the number of options.

The weighted average is calculated from grouped data and has the general form

\[ \bar{X} = \sqrt[m]{\frac{\sum X_i^m f_i}{\sum f_i}}, \]

where \(X_i\) is the option (value) of the averaged feature or the midpoint of the interval in which the option is measured;
m is the exponent of the mean;
\(f_i\) is the frequency showing how many times the i-th value of the averaged feature occurs.

Let us give as an example the calculation of the average age of students in a group of 20 people:


We calculate the average age using the simple average formula:

Let's group the source data. We get the following distribution series:

As a result of grouping, we get a new indicator - frequency, indicating the number of students aged X years. Therefore, the average age of the students in the group will be calculated using the weighted average formula:
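The table of student ages is not reproduced here, so the sketch below uses hypothetical ages for a group of 20 students. It only illustrates that the simple average over ungrouped data and the weighted average over the grouped distribution series give the same result; the names `ages`, `freq`, `simple_mean` and `weighted_mean` are illustrative.

```python
from collections import Counter

# Hypothetical ages of 20 students (illustrative data, not the original table).
ages = [18, 19, 19, 20, 20, 20, 21, 21, 19, 20,
        22, 19, 19, 20, 21, 20, 19, 18, 20, 21]

# Simple arithmetic mean over the ungrouped data.
simple_mean = sum(ages) / len(ages)

# Group the data: each distinct age X gets a frequency f
# (the number of students aged X years).
freq = Counter(ages)

# Weighted arithmetic mean over the grouped distribution series.
weighted_mean = sum(x * f for x, f in freq.items()) / sum(freq.values())

print(sorted(freq.items()))
print(simple_mean, weighted_mean)
```

Grouping changes only the bookkeeping, not the value of the average, because each product X·f is the sum of f identical terms.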

The general formulas for calculating power averages contain an exponent (m). Depending on the value it takes, the following types of power averages are distinguished:
harmonic mean if m = -1;
geometric mean if m → 0;
arithmetic mean if m = 1;
root mean square if m = 2;
cubic mean if m = 3.

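The power mean formulas are given in Table 5.1.

If all types of averages are calculated for the same initial data, their values will not be the same. Here the rule of majorance of averages applies: as the exponent m increases, the corresponding average value does not decrease (and strictly increases unless all values are equal):

$$\bar{X}_{\text{harm}} \le \bar{X}_{\text{geom}} \le \bar{X}_{\text{arith}} \le \bar{X}_{\text{quad}} \le \bar{X}_{\text{cub}}.$$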

In statistical practice, more often than other types of weighted averages, arithmetic and harmonic weighted averages are used.

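Table 5.1

Types of Power Means

| Type of power mean | Exponent (m) | Simple formula | Weighted formula |
|---|---|---|---|
| Harmonic | -1 | $\bar{X} = \dfrac{n}{\sum \frac{1}{X_i}}$ | $\bar{X} = \dfrac{\sum f_i}{\sum \frac{f_i}{X_i}}$ |
| Geometric | 0 | $\bar{X} = \sqrt[n]{\prod X_i}$ | $\bar{X} = \sqrt[\sum f_i]{\prod X_i^{f_i}}$ |
| Arithmetic | 1 | $\bar{X} = \dfrac{\sum X_i}{n}$ | $\bar{X} = \dfrac{\sum X_i f_i}{\sum f_i}$ |
| Quadratic | 2 | $\bar{X} = \sqrt{\dfrac{\sum X_i^2}{n}}$ | $\bar{X} = \sqrt{\dfrac{\sum X_i^2 f_i}{\sum f_i}}$ |
| Cubic | 3 | $\bar{X} = \sqrt[3]{\dfrac{\sum X_i^3}{n}}$ | $\bar{X} = \sqrt[3]{\dfrac{\sum X_i^3 f_i}{\sum f_i}}$ |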
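The family of power means and the rule of majorance can be checked numerically. The following is a minimal sketch, assuming positive values; the function name `power_mean` and the sample data are illustrative, and the case m = 0 is handled as the limit m → 0 (the geometric mean).

```python
import math

def power_mean(values, m, weights=None):
    """Weighted power mean of positive values with exponent m.

    m = 0 is treated as the limit m -> 0, i.e. the geometric mean.
    """
    if weights is None:
        weights = [1] * len(values)  # simple (unweighted) mean
    total = sum(weights)
    if m == 0:
        # Geometric mean computed through logarithms.
        log_sum = sum(f * math.log(x) for x, f in zip(values, weights))
        return math.exp(log_sum / total)
    return (sum(f * x ** m for x, f in zip(values, weights)) / total) ** (1 / m)

values = [2.0, 4.0, 8.0]  # hypothetical data
exponents = (-1, 0, 1, 2, 3)  # harmonic, geometric, arithmetic, quadratic, cubic
means = [power_mean(values, m) for m in exponents]

for m, mean in zip(exponents, means):
    print(m, round(mean, 4))
```

For these data the printed means grow with the exponent, as the rule of majorance predicts.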

The harmonic mean has a more complex structure than the arithmetic mean. It is used when the weights are not the units of the population (the carriers of the trait) but the products of these units and the values of the trait (i.e. the weights are the products Xf). The simple harmonic mean should be used, for example, to determine the average cost of labor, time or materials per unit of output or per part for two (three, four, etc.) enterprises or workers engaged in the manufacture of the same type of product, part or item.

The main requirement for the formula for calculating the average value is that all stages of the calculation have a real, meaningful justification; the resulting average value should replace the individual values of the attribute for each object without breaking the connection between individual and summary indicators. In other words, the average value should be calculated in such a way that when each individual value of the averaged indicator is replaced by its average value, some final summary indicator, connected in one way or another with the averaged one, remains unchanged. This summary indicator is called the determining indicator, since the nature of its relationship with the individual values determines the specific formula for calculating the average value. Let us illustrate this rule with the example of the geometric mean.

The geometric mean formula

$$\bar{i} = \sqrt[n]{i_1 \cdot i_2 \cdot \ldots \cdot i_n}$$

is most often used when calculating the average value of individual relative values of the dynamics.

The geometric mean is used if a sequence of chain relative values of the dynamics is given, indicating, for example, the growth in the volume of production compared to the level of the previous year: $i_1, i_2, i_3, \ldots, i_n$. Obviously, the volume of production in the last year is determined by its initial level ($q_0$) and the subsequent growth over the years:

$$q_n = q_0 \cdot i_1 \cdot i_2 \cdot \ldots \cdot i_n.$$

Taking $q_n$ as the determining indicator and replacing the individual values of the dynamics indicators with their average $\bar{i}$, we arrive at the relation

$$q_n = q_0 \cdot \bar{i}^{\,n}.$$

From here

$$\bar{i} = \sqrt[n]{\frac{q_n}{q_0}} = \sqrt[n]{i_1 \cdot i_2 \cdot \ldots \cdot i_n}.$$
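The role of the determining indicator can be illustrated numerically. The sketch below uses hypothetical growth factors and a hypothetical initial level; it checks that replacing every yearly factor by the geometric mean leaves the final level $q_n$ unchanged.

```python
import math

# Hypothetical chain growth factors i_1..i_n
# (each year's volume divided by the previous year's volume).
growth = [1.10, 1.05, 0.98, 1.20]
q0 = 100.0  # hypothetical initial level of production

# Determining indicator: q_n = q_0 * i_1 * ... * i_n.
qn = q0 * math.prod(growth)

# Average growth factor: n-th root of the product of the factors.
n = len(growth)
mean_growth = math.prod(growth) ** (1 / n)

# Replacing every factor by the average must reproduce q_n.
qn_from_mean = q0 * mean_growth ** n

print(round(mean_growth, 4), round(qn, 2), round(qn_from_mean, 2))
```

An arithmetic mean of the same factors would not, in general, reproduce $q_n$, which is why the geometric form is the appropriate one here.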

5.3. Structural averages

A special type of average values, structural averages, is used to study the internal structure of the distribution series of attribute values, as well as to estimate the average value (of the power type) when it cannot be calculated from the available statistical data (for example, if the example considered had no data on either the volume of production or the amount of costs by groups of enterprises).

The indicators most often used as structural averages are the mode, the most frequently repeated value of the feature, and the median, the value of the feature that divides the ordered sequence of its values into two parts equal in number. As a result, in one half of the population units the value of the attribute does not exceed the median level, and in the other half it is not less than it.

If the feature under study has discrete values, then there are no particular difficulties in calculating the mode and median. If the data on the values of the attribute X are presented in the form of ordered intervals of its change (an interval series), the calculation of the mode and median becomes somewhat more complicated. Since the median value divides the entire population into two parts equal in number, it falls in one of the intervals of the feature X. Using interpolation, the median value is found within this median interval:

$$Me = X_{Me} + h_{Me} \cdot \frac{\frac{\sum m}{2} - S_{Me-1}}{m_{Me}},$$

where $X_{Me}$ is the lower limit of the median interval;
$h_{Me}$ is its width;
$\frac{\sum m}{2}$ is half of the total number of observations or half of the volume of the indicator that is used as a weight in the formulas for calculating the average value (in absolute or relative terms);
$S_{Me-1}$ is the sum of observations (or the volume of the weighting feature) accumulated before the beginning of the median interval;
$m_{Me}$ is the number of observations or the volume of the weighting feature in the median interval (also in absolute or relative terms).
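The interpolation inside the median interval can be sketched as follows. The interval boundaries and frequencies are hypothetical (the data of the enterprise example are not available here), and `grouped_median` is an illustrative name, not a library function.

```python
def grouped_median(intervals, freqs):
    """Interpolated median of an interval series.

    intervals: list of (lower, upper) bounds in ascending order
    freqs: number of observations (or weight) in each interval
    """
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    half = total / 2
    cumulative = 0  # sum accumulated before the current interval
    for (lower, upper), f in zip(intervals, freqs):
        if f > 0 and cumulative + f >= half:
            # First interval whose accumulated frequency reaches half the total:
            # this is the median interval, so interpolate inside it.
            return lower + (upper - lower) * (half - cumulative) / f
        cumulative += f
    raise ValueError("median interval not found")

# Hypothetical interval series.
intervals = [(110, 115), (115, 120), (120, 125), (125, 130), (130, 135)]
freqs = [4, 6, 8, 14, 8]

median = grouped_median(intervals, freqs)
print(round(median, 2))
```

Passing a different weighting feature as `freqs` (for example, volumes instead of counts) gives a different median, which is how several median values can arise for one grouping.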

In our example, as many as three median values can be obtained, based on the number of enterprises, the volume of production and the total amount of production costs:

Thus, for half of the enterprises, the cost of a unit of production exceeds 125.19 thousand rubles, half of the total volume of production is produced with a level of costs per product of more than 124.79 thousand rubles. and 50% of the total cost is formed at the level of the cost of one product above 125.07 thousand rubles. We also note that there is a certain upward trend in cost, since Me 2 = 124.79 thousand rubles, and the average level is 123.15 thousand rubles.

When calculating the modal value of a feature from the data of an interval series, it is necessary to check that the intervals are of equal width, since the frequency indicator of the values of feature X depends on this. For an interval series with equal intervals, the mode value is determined as

$$Mo = X_{Mo} + h \cdot \frac{m_{Mo} - m_{Mo-1}}{(m_{Mo} - m_{Mo-1}) + (m_{Mo} - m_{Mo+1})},$$

where $X_{Mo}$ is the lower limit of the modal interval;
$m_{Mo}$ is the number of observations or the volume of the weighting feature in the modal interval (in absolute or relative terms);
$m_{Mo-1}$ is the same for the interval preceding the modal one;
$m_{Mo+1}$ is the same for the interval following the modal one;
$h$ is the width of the interval of change of the trait in groups.
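A minimal sketch of the mode calculation for an interval series with equal intervals is given below. The data are hypothetical; treating a missing neighbouring interval as having frequency 0 is an assumption of this sketch, and `grouped_mode` is an illustrative name.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Interpolated mode for an interval series with equal-width intervals."""
    # Index of the modal interval (the one with the largest frequency).
    k = max(range(len(freqs)), key=lambda i: freqs[i])
    # Assumption: a missing neighbour (first or last interval) counts as 0.
    prev_f = freqs[k - 1] if k > 0 else 0
    next_f = freqs[k + 1] if k < len(freqs) - 1 else 0
    d_prev = freqs[k] - prev_f  # excess over the preceding interval
    d_next = freqs[k] - next_f  # excess over the following interval
    if d_prev + d_next == 0:
        raise ValueError("modal interval is not distinguishable from its neighbours")
    return lower_bounds[k] + width * d_prev / (d_prev + d_next)

# Hypothetical interval series with intervals of width 5.
lower_bounds = [110, 115, 120, 125, 130]
freqs = [4, 6, 10, 14, 6]

mode = grouped_mode(lower_bounds, 5, freqs)
print(round(mode, 2))
```

The mode is pulled toward whichever neighbouring interval has the higher frequency; here the preceding interval is denser, so the mode lies in the lower part of the modal interval.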

For our example, three modal values ​​can be calculated based on the signs of the number of enterprises, the volume of production and the amount of costs. In all three cases, the modal interval is the same, since for the same interval both the number of enterprises, the volume of production, and the total amount of production costs turn out to be the largest:

Thus, enterprises with a cost level of 126.75 thousand rubles are most often encountered, products with a cost level of 126.69 thousand rubles are most often produced, and most often production costs are explained by a cost level of 123.73 thousand rubles.

5.4. Variation indicators

The specific conditions in which each of the studied objects is located, as well as the features of their own development (social, economic, etc.), are expressed by the corresponding numerical levels of statistical indicators. Thus, variation, i.e. the discrepancy between the levels of the same indicator in different objects, is objective and helps to understand the essence of the phenomenon under study.

There are several ways to measure variation in statistics.

The simplest is the range of variation H, calculated as the difference between the maximum ($X_{\max}$) and minimum ($X_{\min}$) observed values of the trait:

$$H = X_{\max} - X_{\min}.$$

However, the range of variation shows only the extreme values ​​of the trait. The repeatability of intermediate values ​​is not taken into account here.

More rigorous characteristics are indicators of fluctuation relative to the average level of the attribute. The simplest indicator of this type is the mean linear deviation L, the arithmetic mean of the absolute deviations of a trait from its average level:

$$L = \frac{\sum_{i=1}^{n} \left| X_i - \bar{X} \right|}{n}.$$

When individual values of X are repeated, the weighted arithmetic mean formula is used:

$$L = \frac{\sum_{i} \left| X_i - \bar{X} \right| f_i}{\sum_{i} f_i}.$$

(Recall that the algebraic sum of deviations from the mean level is zero.)
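The range and the mean linear deviation are easy to compute directly. The following sketch uses hypothetical values and also shows why absolute values are needed: the signed deviations from the mean always sum to zero.

```python
# Hypothetical observed values of a trait.
values = [12, 15, 15, 18, 20]

mean = sum(values) / len(values)

# Range of variation: difference between the extreme values.
value_range = max(values) - min(values)

# Signed deviations cancel out, so their sum is zero.
deviations = [x - mean for x in values]
signed_sum = sum(deviations)

# Mean linear deviation: average of the absolute deviations.
mean_linear_deviation = sum(abs(d) for d in deviations) / len(values)

print(value_range, signed_sum, mean_linear_deviation)
```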

The mean linear deviation has found wide application in practice. With its help, for example, the composition of the workforce, the rhythm of production and the uniformity of the supply of materials are analyzed, and systems of material incentives are developed. Unfortunately, this indicator complicates calculations of a probabilistic type and makes it difficult to apply the methods of mathematical statistics. Therefore, in statistical scientific research, variation is most often measured by the variance.

The variance of the feature ($s^2$) is determined on the basis of the quadratic power mean:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}, \qquad \text{or, for grouped data,} \qquad s^2 = \frac{\sum_{i} (X_i - \bar{X})^2 f_i}{\sum_{i} f_i}.$$

The indicator s, equal to $\sqrt{s^2}$, is called the standard deviation.

In the general theory of statistics, the variance indicator is an estimate of the probability theory indicator of the same name and (as the sum of squared deviations) an estimate of the variance in mathematical statistics, which makes it possible to use the provisions of these theoretical disciplines to analyze socio-economic processes.

If the variation is estimated from a small number of observations taken from an unlimited general population, then the average value of the feature is determined with some error, and the calculated value of the variance turns out to be biased downward. To obtain an unbiased estimate, the sample variance obtained from the formulas above must be multiplied by n / (n - 1). As a result, with a small number of observations (< 30), it is recommended to calculate the variance of the feature by the formula

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}.$$

Usually already at n > 15-20 the discrepancy between the biased and unbiased estimates becomes insignificant. For the same reason, bias is usually not taken into account in the formula for adding variances.

If several samples are taken from the general population and each time the average value of the attribute is determined, then the problem of estimating the variability of the averages arises. The variance of the mean value can also be estimated from just one sample observation according to the formula

$$s_{\bar{X}}^2 = \frac{s^2}{n},$$

where n is the sample size; $s^2$ is the variance of the feature calculated from the sample data.

The value $s_{\bar{X}} = \sqrt{s^2 / n} = s / \sqrt{n}$ is called the mean sampling error and characterizes the deviation of the sample mean value of feature X from its true mean value. The mean error indicator is used in assessing the reliability of the results of sample observation.
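The relationship between the biased variance, the unbiased variance and the mean sampling error can be sketched as follows. The sample is hypothetical; the mean sampling error is computed here from the variance with divisor n, following the formulas in the text, which is a choice of this sketch.

```python
import math

# Hypothetical small sample.
sample = [12, 15, 15, 18, 20]
n = len(sample)
mean = sum(sample) / n

# Sum of squared deviations from the sample mean.
squared_deviations = sum((x - mean) ** 2 for x in sample)

biased_variance = squared_deviations / n          # divisor n
unbiased_variance = squared_deviations / (n - 1)  # divisor n - 1
std_dev = math.sqrt(biased_variance)

# Mean sampling error: square root of the variance of the mean, s^2 / n.
mean_sampling_error = math.sqrt(biased_variance / n)

print(biased_variance, unbiased_variance)
print(round(std_dev, 4), round(mean_sampling_error, 4))
```

With only five observations the two variance estimates differ by the factor n / (n - 1) = 1.25; for larger samples the factor approaches 1.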

Relative dispersion indicators. To characterize the degree of fluctuation of the trait under study, fluctuation indicators are calculated in relative terms. They make it possible to compare the nature of dispersion in different distributions (different units of observation of the same trait in two populations, different values of the means, comparisons of different populations). Relative dispersion indicators are calculated as the ratio of an absolute dispersion indicator to the arithmetic mean, multiplied by 100%.

1. The oscillation coefficient reflects the relative fluctuation of the extreme values of the trait around the average:

$$V_H = \frac{H}{\bar{X}} \cdot 100\%.$$

2. The relative linear deviation characterizes the share of the average absolute deviation in the average value of the trait:

$$V_L = \frac{L}{\bar{X}} \cdot 100\%.$$

3. The coefficient of variation

$$V = \frac{s}{\bar{X}} \cdot 100\%$$

is the most common measure of relative variation, used to assess the typicality of averages.

In statistics, populations with a coefficient of variation greater than 30–35% are considered to be heterogeneous.

This method of estimating variation also has a significant drawback. Suppose, for example, that an initial population of workers with an average length of service of 15 years and a standard deviation s = 10 years has "aged" by another 15 years. Now $\bar{X}$ = 30 years, and the standard deviation is still 10. The previously heterogeneous population (10/15 × 100 = 66.7%) thus turns out to be quite homogeneous over time (10/30 × 100 = 33.3%).
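The drawback described above follows from the fact that adding a constant to every value shifts the mean but leaves the standard deviation unchanged. A small sketch with the figures from the text (the function name is illustrative):

```python
def coefficient_of_variation(mean, std_dev):
    """Coefficient of variation in percent: s / mean * 100."""
    return std_dev / mean * 100

# Average length of service 15 years, standard deviation 10 years.
before = coefficient_of_variation(15, 10)

# Fifteen years later the mean is 30, the standard deviation is still 10.
after = coefficient_of_variation(30, 10)

print(round(before, 1), round(after, 1))  # 66.7 33.3
```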

Boyarsky A.Ya. Theoretical research on statistics: Sat. Scientific Proceedings. - M .: Statistics, 1974. pp. 19–57.


In order to analyze and obtain statistical conclusions on the result of the summary and grouping, generalizing indicators are calculated - average and relative values.

The task of averages is to characterize all units of the statistical population with one value of the attribute.

Average values ​​characterize the qualitative indicators of entrepreneurial activity: distribution costs, profit, profitability, etc.

An average value is a generalizing characteristic of the units of the population according to some varying attribute.

Average values ​​make it possible to compare the levels of the same trait in different populations and find the reasons for these discrepancies.

In the analysis of the phenomena under study, the role of average values is enormous. The English economist W. Petty (1623-1687) made extensive use of averages. W. Petty wanted to use average values as a measure of the cost of the average daily subsistence of one worker. The stability of the average value is a reflection of the patterns of the processes under study. He believed that information can be transformed even if there is not enough initial data.

The English scientist G. King (1648-1712) used average and relative values ​​when analyzing data on the population of England.

The theoretical developments of the Belgian statistician A. Quetelet (1796-1874) are based on the contradictory nature of social phenomena: highly stable in the mass, but purely individual.

According to A. Quetelet, permanent causes act in the same way on each phenomenon under study and make these phenomena similar to each other, create patterns common to all of them.

A consequence of the teachings of A. Quetelet was the allocation of average values ​​as the main method of statistical analysis. He said that statistical averages are not a category of objective reality.

A. Quetelet expressed his views on the average in his theory of the average person. An average person is a person who has all qualities in an average measure (average mortality or birth rate, average height and weight, average running speed, average propensity for marriage and suicide, for good deeds, etc.). For A. Quetelet, the average person is the ideal of a person. The untenability of A. Quetelet's theory of the average man was demonstrated in Russian statistical literature at the turn of the 19th and 20th centuries.

The famous Russian statistician Yu. E. Yanson (1835-1893) wrote that A. Quetelet assumes the existence in nature of the type of the average person as something given, from which life has deflected the average people of a given society and a given time, and this leads him to a completely mechanical view of the laws of motion of social life: motion is a gradual increase in the average properties of a person, a gradual restoration of the type, and consequently a leveling of all manifestations of the life of the social body, beyond which any forward movement ceases.

The essence of this theory was further developed in the works of a number of statistical theorists as the theory of true values. Among A. Quetelet's followers was the German economist and statistician W. Lexis (1837-1914), who transferred the theory of true values to the economic phenomena of social life. His theory is known as the stability theory. Another version of the idealistic theory of averages is based on the philosophy of Machism.

Its founder is the English statistician A. Bowley (1869–1957), one of the most prominent theorists of modern times in the field of the theory of averages. His concept of averages is outlined in the book "Elements of Statistics".

A. Bowley considers averages only from the quantitative side, thereby separating quantity from quality. In determining the meaning of average values (or "their function"), A. Bowley puts forward the Machist principle of thinking. A. Bowley wrote that the function of averages is to express a complex group with the help of a few simple numbers. Statistical data should be simplified, grouped and averaged. These views were shared by R. Fisher (1890-1962), G. U. Yule (1871-1951), Frederick S. Mills (1892), and others.

In the 1930s and subsequent years, the average value came to be regarded as a socially significant characteristic whose information content depends on the homogeneity of the data.

The most prominent representatives of the Italian school R. Benini (1862-1956) and C. Gini (1884-1965), considering statistics to be a branch of logic, expanded the scope of statistical induction, but they associated the cognitive principles of logic and statistics with the nature of the studied phenomena, following the traditions of the sociological interpretation of statistics.

In the works of K. Marx and V. I. Lenin, a special role is assigned to average values.

K. Marx argued that individual deviations from the general level are canceled in the average value and the average level becomes a generalizing characteristic of the mass phenomenon. The average value becomes such a characteristic of the mass phenomenon only if a significant number of units are taken and these units are qualitatively homogeneous. Marx wrote that the average value found was the average of "... many different individual values ​​of the same kind."

The average value acquires special significance in a market economy. It helps to determine the necessary and general, the trend of the laws of economic development directly through the individual and random.

Average values are generalizing indicators in which the action of general conditions, the regularity of the phenomenon under study is expressed.

Statistical averages are calculated on the basis of mass data of a statistically correctly organized mass observation. If the statistical average is calculated from mass data for a qualitatively homogeneous population (mass phenomena), then it will be objective.

The average value is abstract, since it characterizes the value of an abstract unit.

The average is abstracted from the diversity of the feature in individual objects. Abstraction is a stage of scientific research. The dialectical unity of the individual and the general is realized in the average value.

Average values ​​should be applied on the basis of a dialectical understanding of the categories of the individual and the general, the individual and the mass.

The average reflects what is common and takes shape in each individual object.

To identify patterns in mass social processes, the average value is of great importance.

The deviation of the individual from the general is a manifestation of the development process.

The average value reflects the characteristic, typical, real level of the phenomena being studied. The purpose of averages is to characterize these levels and their changes in time and space.

The average indicator is an ordinary value, because it is formed in normal, natural, general conditions for the existence of a specific mass phenomenon, considered as a whole.

The average value reflects an objective property of a statistical process or phenomenon.

The individual values ​​of the studied statistical feature are different for each unit of the population. The average value of individual values ​​of one kind is a product of necessity, which is the result of the cumulative action of all units of the population, manifested in a mass of repeating accidents.

Some individual phenomena have signs that exist in all phenomena, but in different quantities - this is the height or age of a person. Other signs of an individual phenomenon are qualitatively different in different phenomena, that is, they are present in some and not observed in others (a man will not become a woman). The average value is calculated for signs that are qualitatively homogeneous and differ only quantitatively, which are inherent in all phenomena in a given set.

The average value is a reflection of the values ​​of the trait being studied and is measured in the same dimension as this trait.

The theory of dialectical materialism teaches that everything in the world changes and develops. And also the signs that are characterized by average values ​​change, and, accordingly, the averages themselves.

Life is a continuous process of creating something new. The bearer of a new quality is single objects, then the number of these objects increases, and the new becomes mass, typical.

The average value characterizes the studied population only on one basis. For a complete and comprehensive presentation of the population under study for a number of specific features, it is necessary to have a system of average values ​​that can describe the phenomenon from different angles.

2. Types of averages

In the statistical processing of material, various problems arise that need to be solved, and therefore various average values are used in statistical practice. Mathematical statistics uses several averages: the arithmetic mean, the geometric mean, the harmonic mean and the root mean square.

In order to apply one of the above types of average, it is necessary to analyze the population under study, determine the material content of the phenomenon under study, all this is done on the basis of conclusions drawn from the principle of meaningfulness of the results when weighing or summing up.

In the study of averages, the following indicators and notation are used.

The feature for which the average is found is called the averaged feature and is denoted by x; the value of the averaged feature for any unit of the statistical population is called its individual value, or variant, and is denoted $x_1, x_2, x_3, \ldots, x_n$; the frequency is the number of times an individual value of the trait is repeated and is denoted by the letter f.

Arithmetic mean

One of the most common types of average is the arithmetic mean, which is calculated when the volume of the averaged attribute is formed as the sum of its values for the individual units of the studied statistical population.

To calculate the arithmetic mean, the sum of all levels of the feature is divided by their number:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$
If some variants occur several times, then the sum of the attribute levels can be obtained by multiplying each level by the corresponding number of population units and then summing the resulting products. The arithmetic mean calculated in this way is called the weighted arithmetic mean.

The formula for the weighted arithmetic mean is as follows:

$$\bar{x} = \frac{\sum_{i} x_i f_i}{\sum_{i} f_i},$$

where $x_i$ are the variants,

$f_i$ are the frequencies or weights.

A weighted average should be used in all cases where the variants occur with different frequencies.

The arithmetic average, as it were, distributes equally among the individual objects the total value of the attribute, which in fact varies for each of them.

Average values can also be calculated from data grouped in the form of interval distribution series, when the variants of the trait from which the average is calculated are presented as intervals (from - to).
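For an interval series, a common approach (and the one assumed in this sketch) is to represent each interval by its middle value and then apply the weighted arithmetic mean. The intervals and frequencies below are hypothetical.

```python
# Hypothetical interval series: (from, to) bounds and frequencies.
intervals = [(10, 20), (20, 30), (30, 40)]
freqs = [5, 12, 3]

# Assumption: each interval is represented by its midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

# Weighted arithmetic mean of the midpoints.
interval_mean = sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)

print(midpoints, interval_mean)
```

The result is an approximation: it is exact only if the values inside each interval are spread symmetrically around the midpoint.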

Properties of the arithmetic mean:

1) the arithmetic mean of the sum of varying values is equal to the sum of their arithmetic means: if $x_i = y_i + z_i$, then

$$\bar{x} = \bar{y} + \bar{z}.$$

This property shows in which cases average values can be summed.

2) the algebraic sum of the deviations of the individual values of the varying attribute from the mean is equal to zero, since the sum of deviations in one direction is offset by the sum of deviations in the other direction:

$$\sum_{i} (x_i - \bar{x}) f_i = 0.$$

This rule demonstrates that the mean is the resultant.

3) if all variants of the series are increased or decreased by the same number a, then the average will increase or decrease by the same number a:

$$\frac{\sum_{i} (x_i \pm a) f_i}{\sum_{i} f_i} = \bar{x} \pm a.$$

4) if all variants of the series are multiplied or divided by A, then the average will also be multiplied or divided by A:

$$\frac{\sum_{i} (A x_i) f_i}{\sum_{i} f_i} = A \bar{x}.$$

5) the average does not depend on the absolute size of the weights, only on the ratio between them. Both relative and absolute values can therefore be taken as weights.

If all the frequencies of the series are divided or multiplied by the same number d, then the average will not change:

$$\frac{\sum_{i} x_i \frac{f_i}{d}}{\sum_{i} \frac{f_i}{d}} = \bar{x}.$$
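The five properties listed above can be checked numerically. The sketch below uses hypothetical variants, frequencies and constants; `wmean` is an illustrative helper, not a library function.

```python
def wmean(xs, fs):
    """Weighted arithmetic mean of variants xs with frequencies fs."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)

# Hypothetical data.
y = [1.0, 2.0, 4.0]
z = [1.0, 3.0, 5.0]
x = [yi + zi for yi, zi in zip(y, z)]  # x_i = y_i + z_i
f = [3, 4, 1]
a, A, d = 1.5, 4.0, 2.0  # hypothetical shift, scale factor, weight divisor

mean_x = wmean(x, f)

# 1) mean of a sum equals the sum of the means
sum_of_means = wmean(y, f) + wmean(z, f)
# 2) weighted deviations from the mean sum to zero
deviation_sum = sum((xi - mean_x) * fi for xi, fi in zip(x, f))
# 3) shifting every variant by a shifts the mean by a
shifted_mean = wmean([xi + a for xi in x], f)
# 4) scaling every variant by A scales the mean by A
scaled_mean = wmean([xi * A for xi in x], f)
# 5) dividing every frequency by d leaves the mean unchanged
rescaled_mean = wmean(x, [fi / d for fi in f])

print(mean_x, sum_of_means, deviation_sum)
print(shifted_mean, scaled_mean, rescaled_mean)
```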


Harmonic mean. In order to determine the arithmetic mean, it is necessary to have a series of variants and frequencies, i.e., the values x and f.

Suppose we know the individual values x of the feature and the products xf, while the frequencies f are unknown. Then, to calculate the average, we denote the product z = xf, whence f = z/x:

$$\bar{x}_{\text{harm. w.}} = \frac{\sum_{i} z_i}{\sum_{i} \frac{z_i}{x_i}}.$$

The average in this form is called the weighted harmonic mean and is denoted $\bar{x}_{\text{harm. w.}}$.

Accordingly, the harmonic mean is identical to the arithmetic mean. It is applicable when the actual weights f are not known but the products fx = z are known.

When the products fx are all equal to each other or equal to one (z = 1), the simple harmonic mean is used, calculated by the formula:

$$\bar{x}_{\text{harm.}} = \frac{n}{\sum_{i} \frac{1}{x_i}},$$

where $x_i$ are the individual variants;

n is their number.
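The identity between the weighted harmonic mean (computed from the products z = xf) and the weighted arithmetic mean (computed from the frequencies f) can be illustrated with a small sketch. The prices and total costs below are hypothetical.

```python
# Hypothetical data: unit prices x and total amounts spent z = x * f.
x = [20.0, 25.0]
z = [400.0, 500.0]

# Weighted harmonic mean: uses only x and the products z.
harmonic_weighted = sum(z) / sum(zi / xi for zi, xi in zip(z, x))

# Recover the hidden frequencies f = z / x and compute the weighted
# arithmetic mean; it must coincide with the harmonic form above.
f = [zi / xi for zi, xi in zip(z, x)]
arithmetic_weighted = sum(xi * fi for xi, fi in zip(x, f)) / sum(f)

# Simple harmonic mean: the special case where all products z are equal.
harmonic_simple = len(x) / sum(1 / xi for xi in x)

print(harmonic_weighted, arithmetic_weighted, round(harmonic_simple, 4))
```

Here the two products differ, so the weighted and simple harmonic means differ as well; they would coincide if both amounts z were the same.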

Geometric mean

If there are n growth coefficients $K_1, K_2, \ldots, K_n$, then the formula for the average coefficient is:

$$\bar{K} = \sqrt[n]{K_1 \cdot K_2 \cdot \ldots \cdot K_n}.$$

This is the geometric mean formula.

The geometric mean is equal to the n-th root of the product of the growth coefficients, each of which characterizes the ratio of the value of a period to the value of the previous one.

If values ​​expressed as square functions are subject to averaging, the root mean square is used. For example, using the root mean square, you can determine the diameters of pipes, wheels, etc.

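The simple root mean square is found by taking the square root of the sum of squares of the individual feature values divided by their number:

$$\bar{x}_{\text{sq}} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n}}.$$

The weighted root mean square is:

$$\bar{x}_{\text{sq}} = \sqrt{\frac{\sum_{i} x_i^2 f_i}{\sum_{i} f_i}}.$$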
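As an illustration of why the root mean square suits quantities that enter through their squares, the sketch below averages hypothetical pipe diameters. It is a minimal example under the assumption that the quantity to be preserved is the total cross-sectional area.

```python
import math

# Hypothetical pipe diameters.
diameters = [10.0, 20.0, 30.0]
n = len(diameters)

# Simple root mean square of the diameters.
rms_diameter = math.sqrt(sum(d ** 2 for d in diameters) / n)

# Mean cross-sectional area of the pipes (area = pi * d^2 / 4).
mean_area = sum(math.pi * d ** 2 / 4 for d in diameters) / n

# Area of a pipe whose diameter equals the root mean square.
area_of_rms_pipe = math.pi * rms_diameter ** 2 / 4

print(round(rms_diameter, 3), round(mean_area, 3), round(area_of_rms_pipe, 3))
```

A pipe with the root-mean-square diameter has exactly the average cross-sectional area, whereas a pipe with the arithmetic-mean diameter would have a smaller one.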

3. Structural averages. Mode and median

To characterize the structure of the statistical population, indicators are used that are called structural averages. These include mode and median.

Mode ($M_o$) is the most common variant. The mode is the value of the feature that corresponds to the maximum point of the theoretical distribution curve.

The mode represents the most frequently occurring or typical value.

The mode is used in commercial practice to study consumer demand and to record prices.

In a discrete series, the mode is the variant with the highest frequency. In an interval variation series, the mode is taken to be the central variant of the interval that has the highest frequency (relative frequency).

Within the interval, it is necessary to find the value of the attribute that is the mode:

$$M_o = x_0 + h \cdot \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})},$$

where $x_0$ is the lower limit of the modal interval;

$h$ is the width of the modal interval;

$f_m$ is the frequency of the modal interval;

$f_{m-1}$ is the frequency of the interval preceding the modal one;

$f_{m+1}$ is the frequency of the interval following the modal one.

The mode depends on the size of the groups, on the exact position of the boundaries of the groups.

The mode is the value that actually occurs most often (it is a definite value), and in practice it has the widest use (for example, the most common type of buyer).

Median ($M_e$) is the value that divides an ordered variation series into two equal parts: in one part the values of the varying feature are smaller than the middle variant, and in the other they are larger.

The median is an element that is greater than or equal to one half of the remaining elements of the distribution series and at the same time less than or equal to the other half.

The property of the median is that the sum of the absolute deviations of the trait values ​​from the median is less than from any other value.

Using the median allows you to get more accurate results than using other forms of averages.

The procedure for finding the median in an interval variation series is as follows: arrange the individual values of the attribute by rank; determine the accumulated frequencies for this ranked series; use the accumulated frequencies to find the median interval; then calculate

$$M_e = x_{Me} + i_{Me} \cdot \frac{\frac{\sum f}{2} - S_{Me-1}}{f_{Me}},$$

where $x_{Me}$ is the lower limit of the median interval;

$i_{Me}$ is the width of the median interval;

$\frac{\sum f}{2}$ is half the sum of the frequencies of the series;

$S_{Me-1}$ is the sum of accumulated frequencies preceding the median interval;

$f_{Me}$ is the frequency of the median interval.

The median divides the series in half; therefore, it lies in the interval where the cumulative frequency reaches half or more of the total sum of frequencies, while the preceding cumulative frequency is less than half the size of the population.