
How to calculate the arithmetic mean formula. Find the total index of the wholesale supply of food products in actual prices

How to calculate the average of numbers in Excel

You can find the arithmetic mean of numbers in Excel using the AVERAGE function.

Syntax AVERAGE

=AVERAGE(number1,[number2],…)

Arguments AVERAGE

  • number1 – the first number or range of numbers for calculating the arithmetic mean;
  • number2 (optional) – the second number or range of numbers for calculating the arithmetic mean. The maximum number of function arguments is 255.

To calculate, do the following steps:

  • Select any cell;
  • Type the beginning of the formula in it: =AVERAGE(
  • Select the range of cells you want to average;
  • Press the "Enter" key on the keyboard.

The function will calculate the average value in the specified range among those cells that contain numbers.
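As a rough illustration of this behavior outside Excel, the following Python sketch averages only the numeric entries of a list of cells. The function name `excel_like_average` and the sample cells are hypothetical, and the rule used here (skip text, empty cells and logical values) is a simplified assumption about how a range is treated, not a full reimplementation of AVERAGE.

```python
from numbers import Real


def excel_like_average(cells):
    """Average only the numeric cells; skip text, blanks (None) and booleans."""
    numbers = [
        c for c in cells
        if isinstance(c, Real) and not isinstance(c, bool)
    ]
    if not numbers:
        # Excel would report #DIV/0! in this situation.
        raise ValueError("no numeric cells to average")
    return sum(numbers) / len(numbers)


# Hypothetical column: numbers mixed with text and an empty cell.
cells = [10, 20, "n/a", None, 30]
print(excel_like_average(cells))  # 20.0
```

Only the three numeric cells take part in the calculation, so the divisor is 3 rather than 5.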

How to find the average value given text

If the data range contains text, the AVERAGEA function can be used instead of AVERAGE: it treats text (including empty strings) as zero, the logical value FALSE as zero, and TRUE as 1.

How to find the arithmetic mean by condition

The AVERAGEIF function calculates the average by a condition or criterion. For example, let's say we have product sales data:

Our task is to calculate the average sales of pens. To do this, we will take the following steps:

  • In cell A13, write the name of the product, "Pens";
  • In cell B13, enter the formula:

=AVERAGEIF(A2:A10,A13,B2:B10)

The cell range A2:A10 points to the list of products in which the function searches for the word "Pens". The argument A13 is a reference to the cell containing the text to search for in the list of products. The cell range B2:B10 holds the product sales data; the function averages the values in this range that correspond to "Pens".
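The same conditional average can be sketched in Python. The helper `average_if` and the product and sales lists below are hypothetical stand-ins for the worksheet ranges; text is compared case-insensitively here, which is an assumption about how the criterion is matched.

```python
def _same(a, b):
    """Compare two cell values; text is compared case-insensitively."""
    if isinstance(a, str) and isinstance(b, str):
        return a.casefold() == b.casefold()
    return a == b


def average_if(criteria_range, criterion, average_range):
    """Average the entries of average_range whose partner matches criterion."""
    matched = [
        value
        for key, value in zip(criteria_range, average_range)
        if _same(key, criterion)
    ]
    if not matched:
        raise ValueError("no cells match the criterion")
    return sum(matched) / len(matched)


# Hypothetical data standing in for the product and sales columns.
products = ["Pens", "Pencils", "Pens", "Erasers", "Pens"]
sales = [120, 80, 150, 40, 90]
print(average_if(products, "Pens", sales))  # 120.0
```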


To analyze the results of summarizing and grouping data and to draw statistical conclusions, generalizing indicators are calculated: average and relative values.

The task of averages is to characterize all units of a statistical population with a single value of the attribute.

Average values characterize the qualitative indicators of entrepreneurial activity: distribution costs, profit, profitability, etc.

The average value is a generalizing characteristic of the units of a population with respect to some varying attribute.

Average values make it possible to compare the levels of the same attribute in different populations and to find the reasons for the discrepancies.

In the analysis of the phenomena under study, the role of average values is enormous. The English economist W. Petty (1623-1687) made extensive use of averages. He wanted to use average values as a measure of the cost of the average daily subsistence of one worker. The stability of the average value is a reflection of the patterns of the processes under study. Petty believed that information can be transformed even if there is not enough initial data.

The English scientist G. King (1648-1712) used average and relative values ​​when analyzing data on the population of England.

The theoretical developments of the Belgian statistician A. Quetelet (1796-1874) are based on the inconsistency of the nature of social phenomena - highly stable in the mass, but purely individual.

According to A. Quetelet, permanent causes act in the same way on each phenomenon under study and make these phenomena similar to each other, create patterns common to all of them.

A consequence of the teachings of A. Quetelet was the allocation of average values ​​as the main method of statistical analysis. He said that statistical averages are not a category of objective reality.

A. Quetelet expressed his views on the average in his theory of the average person. An average person is a person who has all qualities to an average degree (average mortality or birth rate, average height and weight, average running speed, average propensity for marriage and suicide, for good deeds, etc.). For A. Quetelet, the average person is the ideal of a person. The inconsistency of A. Quetelet's theory of the average person was demonstrated in Russian statistical literature at the end of the 19th and the beginning of the 20th century.

The famous Russian statistician Yu. E. Yanson (1835-1893) wrote that A. Quetelet assumes the existence in nature of the type of the average person as something given, from which life has rejected the average people of a given society and a given time, and this leads him to a completely mechanical view of the laws of motion of social life: motion is a gradual increase in the average properties of a person, a gradual restoration of a type; consequently, such a leveling of all manifestations of the life of the social body, beyond which any forward movement ceases.

The essence of this theory has found its further development in the works of a number of statistical theorists as the theory of true values. A. Quetelet had followers - the German economist and statistician W. Lexis (1837-1914), who transferred the theory of true values ​​to the economic phenomena of social life. His theory is known as the stability theory. Another version of the idealistic theory of averages is based on the philosophy

Its founder is the English statistician A. Bowley (1869–1957), one of the most prominent theorists of modern times in the field of the theory of averages. His concept of averages is outlined in the book "Elements of Statistics".

A. Bowley considers averages only from the quantitative side, thereby separating quantity from quality. In determining the meaning of average values (or "their function"), A. Bowley puts forward the Machist principle of thinking. He wrote that the function of averages is to express a complex group with the help of a few simple numbers. Statistical data should be simplified, grouped and averaged. These views were shared by R. Fisher (1890-1968), G. U. Yule (1871-1951), Frederick S. Mills (1892), and others.

In the 1930s and subsequent years, the average value came to be regarded as a socially significant characteristic whose information content depends on the homogeneity of the data.

The most prominent representatives of the Italian school R. Benini (1862-1956) and C. Gini (1884-1965), considering statistics to be a branch of logic, expanded the scope of statistical induction, but they associated the cognitive principles of logic and statistics with the nature of the studied phenomena, following the traditions of the sociological interpretation of statistics.

In the works of K. Marx and V. I. Lenin, a special role is assigned to average values.

K. Marx argued that individual deviations from the general level are canceled in the average value and the average level becomes a generalizing characteristic of the mass phenomenon. The average value becomes such a characteristic of the mass phenomenon only if a significant number of units are taken and these units are qualitatively homogeneous. Marx wrote that the average value found was the average of "... many different individual values ​​of the same kind."

The average value acquires special significance in a market economy. It helps to determine the necessary and general, the trend of the laws of economic development directly through the individual and random.

Average values are generalizing indicators in which the action of general conditions, the regularity of the phenomenon under study is expressed.

Statistical averages are calculated on the basis of mass data of a statistically correctly organized mass observation. If the statistical average is calculated from mass data for a qualitatively homogeneous population (mass phenomena), then it will be objective.

The average value is abstract, since it characterizes the value of an abstract unit.

The average is abstracted from the diversity of the feature in individual objects. Abstraction is a stage of scientific research. The dialectical unity of the individual and the general is realized in the average value.

Average values ​​should be applied on the basis of a dialectical understanding of the categories of the individual and the general, the individual and the mass.

The average reflects what is common to each individual object.

To identify patterns in mass social processes, the average value is of great importance.

The deviation of the individual from the general is a manifestation of the development process.

The average value reflects the characteristic, typical, real level of the phenomena being studied. The purpose of averages is to characterize these levels and their changes in time and space.

The average indicator is an ordinary value, because it is formed in normal, natural, general conditions for the existence of a specific mass phenomenon, considered as a whole.

The average value reflects an objective property of a statistical process or phenomenon.

The individual values ​​of the studied statistical feature are different for each unit of the population. The average value of individual values ​​of one kind is a product of necessity, which is the result of the cumulative action of all units of the population, manifested in a mass of repeating accidents.

Some individual phenomena have signs that exist in all phenomena, but in different quantities - this is the height or age of a person. Other signs of an individual phenomenon are qualitatively different in different phenomena, that is, they are present in some and not observed in others (a man will not become a woman). The average value is calculated for signs that are qualitatively homogeneous and differ only quantitatively, which are inherent in all phenomena in a given set.

The average value is a reflection of the values ​​of the trait being studied and is measured in the same dimension as this trait.

The theory of dialectical materialism teaches that everything in the world changes and develops. And also the signs that are characterized by average values ​​change, and, accordingly, the averages themselves.

Life is a continuous process of creating something new. The bearer of the new quality is single objects, then the number of these objects increases, and the new becomes mass, typical.

The average value characterizes the studied population only on one basis. For a complete and comprehensive presentation of the population under study for a number of specific features, it is necessary to have a system of average values ​​that can describe the phenomenon from different angles.

2. Types of averages

In the statistical processing of the material, various problems arise that need to be solved, and therefore various average values are used in statistical practice. Mathematical statistics uses various averages, such as the arithmetic mean, the geometric mean, the harmonic mean, and the root mean square.

In order to apply one of the above types of average, it is necessary to analyze the population under study, determine the material content of the phenomenon under study, all this is done on the basis of conclusions drawn from the principle of meaningfulness of the results when weighing or summing up.

In the study of averages, the following indicators and notation are used.

The attribute for which the average is found is called the averaged attribute and is denoted by $x$; the value of the averaged attribute for any unit of the statistical population is called its individual value, or variant (option), and is denoted $x_1, x_2, x_3, \ldots, x_n$; the frequency is the number of times an individual value of the attribute is repeated and is denoted by the letter $f$.

Arithmetic mean

One of the most common types of average is the arithmetic mean, which is calculated when the total volume of the averaged attribute is formed as the sum of its values for the individual units of the statistical population under study.

To calculate the arithmetic mean, the sum of all attribute levels is divided by their number:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$


If some variants occur several times, the sum of the attribute levels can be obtained by multiplying each level by the corresponding number of population units and then summing the resulting products. The arithmetic mean calculated in this way is called the weighted arithmetic mean.

The formula for the weighted arithmetic mean is as follows:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$

where $x_i$ are the variants,

$f_i$ are the frequencies, or weights.

A weighted average should be used in all cases where the variants occur with different frequencies.
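A minimal Python sketch of the weighted arithmetic mean is given below. The function name `weighted_mean` and the sample variants and frequencies are illustrative assumptions, not data from the text.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(x_i * f_i) / sum(f_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(x * f for x, f in zip(values, weights)) / total_weight


# Hypothetical series: the variant 2 occurs 5 times, 4 occurs 3 times,
# and 6 occurs twice.
variants = [2, 4, 6]
frequencies = [5, 3, 2]
print(weighted_mean(variants, frequencies))  # (10 + 12 + 12) / 10 = 3.4
```

With equal frequencies the result coincides with the simple arithmetic mean, which is why the weighted form is only needed when the frequencies differ.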

The arithmetic average, as it were, distributes equally among the individual objects the total value of the attribute, which in fact varies for each of them.

Calculation of average values ​​is carried out according to data grouped in the form of interval distribution series, when the trait variants from which the average is calculated are presented in the form of intervals (from - to).

Properties of the arithmetic mean:

1) the arithmetic mean of a sum of varying values is equal to the sum of their arithmetic means: if $x_i = y_i + z_i$, then

$$\bar{x} = \bar{y} + \bar{z}$$

This property shows in which cases average values can be summed.

2) the algebraic sum of the deviations of the individual values of the varying attribute from the mean is equal to zero, since the sum of deviations in one direction is offset by the sum of deviations in the other direction:

$$\sum (x_i - \bar{x}) f_i = 0$$

This rule demonstrates that the mean is the resultant.

3) if all variants of the series are increased or decreased by the same number $a$, then the average will increase or decrease by the same number $a$:

$$\frac{\sum (x_i \pm a) f_i}{\sum f_i} = \bar{x} \pm a$$

4) if all variants of the series are increased or decreased by a factor of $A$, then the average will also increase or decrease by a factor of $A$:

$$\frac{\sum (x_i A) f_i}{\sum f_i} = A \bar{x}$$

5) the average does not depend on the absolute size of the weights, only on the ratio between them. Both relative and absolute values can be taken as weights.

If all the frequencies of the series are divided or multiplied by the same number $d$, then the average will not change:

$$\frac{\sum x_i (f_i / d)}{\sum (f_i / d)} = \bar{x}$$
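Properties 2 to 5 are easy to check numerically. The sketch below does so for one hypothetical series; the function `check_mean_properties`, the shift 10, the factor 3 and the divisor 10 are arbitrary choices made for illustration.

```python
import math


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(x_i * f_i) / sum(f_i)."""
    return sum(x * f for x, f in zip(values, weights)) / sum(weights)


def check_mean_properties(x, f, shift=10.0, factor=3.0, divisor=10.0):
    """Return which of properties 2-5 hold numerically for the series (x, f)."""
    mean = weighted_mean(x, f)
    deviation_sum = sum((xi - mean) * fi for xi, fi in zip(x, f))
    shifted = weighted_mean([xi + shift for xi in x], f)
    scaled = weighted_mean([xi * factor for xi in x], f)
    rescaled = weighted_mean(x, [fi / divisor for fi in f])
    return {
        "2: deviations sum to zero": math.isclose(deviation_sum, 0.0, abs_tol=1e-9),
        "3: shift moves the mean": math.isclose(shifted, mean + shift),
        "4: scaling scales the mean": math.isclose(scaled, mean * factor),
        "5: rescaled weights keep the mean": math.isclose(rescaled, mean),
    }


# Hypothetical series of variants and frequencies.
for name, holds in check_mean_properties([2.0, 4.0, 6.0], [5, 3, 2]).items():
    print(name, holds)
```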


Harmonic mean. To determine the arithmetic mean, it is necessary to have a series of variants and frequencies, i.e., the values $x$ and $f$.

Suppose we know the individual values of the attribute $x$ and the products $xf$, while the frequencies $f$ are unknown. To calculate the average, we denote the product $z = xf$, whence $f = z/x$ and

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{\sum z_i}{\sum \frac{z_i}{x_i}}$$

The average in this form is called the weighted harmonic mean and is denoted $\bar{x}_{\text{harm. weighted}}$.

Accordingly, the weighted harmonic mean is identical to the weighted arithmetic mean. It is applicable when the actual weights $f$ are not known, but the product $fx = z$ is known.

When the products $fx$ are all the same or equal to one ($z = 1$), the simple harmonic mean is used, calculated by the formula:

$$\bar{x}_{\text{harm}} = \frac{n}{\sum \frac{1}{x_i}}$$

where $x_i$ are the individual variants;

$n$ is their number.
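Both forms of the harmonic mean can be sketched in Python as follows. The function names and the price and turnover figures are hypothetical; the example assumes that the products z = x·f (for instance, turnover = price × quantity) are known while the quantities are not.

```python
def weighted_harmonic_mean(values, products):
    """Weighted harmonic mean: sum(z_i) / sum(z_i / x_i), where z_i = x_i * f_i."""
    if len(values) != len(products):
        raise ValueError("values and products must have the same length")
    return sum(products) / sum(z / x for x, z in zip(values, products))


def simple_harmonic_mean(values):
    """Simple harmonic mean: n / sum(1 / x_i)."""
    return len(values) / sum(1 / x for x in values)


# Hypothetical prices and the turnover obtained at each price.
prices = [20, 25, 40]
turnover = [400, 500, 400]
# The implied quantities are 20, 20 and 10, so the mean price is 1300 / 50.
print(weighted_harmonic_mean(prices, turnover))  # 26.0
print(simple_harmonic_mean([1, 2, 4]))
```

The weighted result equals the weighted arithmetic mean of the prices with the implied quantities as weights, which illustrates why the two means are described as identical.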

Geometric mean

If there are $n$ growth factors $K_1, K_2, \ldots, K_n$, then the formula for the average factor is:

$$\bar{K} = \sqrt[n]{K_1 \cdot K_2 \cdot \ldots \cdot K_n}$$

This is the geometric mean formula.

The geometric mean is equal to the $n$-th root of the product of the growth factors, each of which characterizes the ratio of the value in a given period to the value in the previous one.
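As an illustration, the sketch below derives growth factors from a series of levels and averages them geometrically. The function names and the levels 100, 200 and 1600 are hypothetical.

```python
import math


def growth_factors(levels):
    """Ratio of each level to the previous one."""
    return [later / earlier for earlier, later in zip(levels, levels[1:])]


def mean_growth_factor(factors):
    """Geometric mean: the n-th root of the product of n growth factors."""
    if not factors or any(k <= 0 for k in factors):
        raise ValueError("growth factors must be positive and non-empty")
    return math.prod(factors) ** (1 / len(factors))


# Hypothetical levels for three consecutive periods.
levels = [100, 200, 1600]
factors = growth_factors(levels)    # [2.0, 8.0]
print(mean_growth_factor(factors))  # 4.0
```

Applying the average factor 4 twice to the initial level 100 gives 1600, the same final level as the actual factors 2 and 8; the arithmetic mean of the factors (5) would overshoot it.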

If the values to be averaged are expressed as quadratic functions, the root mean square is used. For example, the root mean square can be used to determine the average diameter of pipes, wheels, etc.

The simple root mean square is found by taking the square root of the sum of the squares of the individual attribute values divided by their number:

$$\bar{x}_{\text{sq}} = \sqrt{\frac{\sum x_i^2}{n}}$$

The weighted root mean square is:

$$\bar{x}_{\text{sq}} = \sqrt{\frac{\sum x_i^2 f_i}{\sum f_i}}$$

3. Structural averages. Mode and median

To characterize the structure of the statistical population, indicators are used that are called structural averages. These include mode and median.

The mode ($M_o$) is the most common variant. It is the value of the attribute that corresponds to the maximum point of the theoretical distribution curve.

The mode represents the most frequently occurring, or typical, value.

The mode is used in commercial practice to study consumer demand and to record prices.

In a discrete series, the mode is the variant with the highest frequency. In an interval variation series, the mode is taken to be the central variant of the interval that has the highest frequency (or relative frequency).

Within the modal interval, the value of the attribute that is the mode is found by the formula:

$$M_o = x_0 + h \cdot \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}$$

where $x_0$ is the lower limit of the modal interval;

$h$ is the width of the modal interval;

$f_m$ is the frequency of the modal interval;

$f_{m-1}$ is the frequency of the interval preceding the modal one;

$f_{m+1}$ is the frequency of the interval following the modal one.

The mode depends on the size of the groups, on the exact position of the boundaries of the groups.

The mode is the number that actually occurs most often (it is a definite value); in practice it has the widest use (for example, the most common type of buyer).

The median ($M_e$) is the value that divides an ordered variation series into two equal parts: in one part the values of the varying attribute are smaller than the middle variant, and in the other they are larger.

The median is an element that is greater than or equal to one half of the remaining elements of the distribution series and, at the same time, less than or equal to the other half.

A property of the median is that the sum of the absolute deviations of the attribute values from the median is smaller than from any other value.

When the data contain extreme values, the median can describe the typical level more reliably than other forms of averages.

The procedure for finding the median in an interval variation series is as follows: arrange the individual values of the attribute by rank; determine the cumulative frequencies for this ranked series; use the cumulative frequencies to find the median interval; then calculate the median by the formula:

$$M_e = x_{Me} + i_{Me} \cdot \frac{\frac{\sum f}{2} - S_{Me-1}}{f_{Me}}$$

where $x_{Me}$ is the lower limit of the median interval;

$i_{Me}$ is the width of the median interval;

$\sum f / 2$ is half the sum of the frequencies of the series;

$S_{Me-1}$ is the cumulative frequency of the intervals preceding the median interval;

$f_{Me}$ is the frequency of the median interval.

The median divides the series in half; therefore, it lies in the interval where the cumulative frequency first reaches half or more of the total sum of frequencies, while the preceding cumulative frequency is less than half the size of the population.
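The steps for an interval series can be sketched in Python as below. The function `interval_median` and the sample series are hypothetical, and the sketch assumes intervals of equal width that are already sorted in ascending order.

```python
from itertools import accumulate


def interval_median(lower_bounds, width, freqs):
    """Median of an interval series with equal-width, ordered intervals."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("the frequencies must sum to a positive number")
    half = total / 2
    cumulative = list(accumulate(freqs))
    # Median interval: the first one whose cumulative frequency reaches half.
    m = next(i for i, c in enumerate(cumulative) if c >= half)
    before = cumulative[m - 1] if m > 0 else 0
    return lower_bounds[m] + width * (half - before) / freqs[m]


# Hypothetical series: intervals 0-10, 10-20, 20-30, 30-40.
lower_bounds = [0, 10, 20, 30]
freqs = [4, 10, 6, 2]
# Cumulative frequencies are 4, 14, 20, 22; half of 22 is 11,
# so the median interval is 10-20.
print(interval_median(lower_bounds, 10, freqs))  # 10 + 10 * (11 - 4) / 10 = 17.0
```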

In most cases, the data is concentrated around some central point. Thus, to describe any data set, it is enough to indicate the average value. Consider successively three numerical characteristics that are used to estimate the mean value of the distribution: arithmetic mean, median and mode.

Average

The arithmetic mean (often referred to simply as the mean) is the most common estimate of the mean of a distribution. It is the result of dividing the sum of all observed numerical values by their number. For a sample of numbers $X_1, X_2, \ldots, X_n$, the sample mean (denoted by the symbol $\bar{X}$) equals $\bar{X} = (X_1 + X_2 + \ldots + X_n) / n$, or

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where $\bar{X}$ is the sample mean, $n$ is the sample size, and $X_i$ is the $i$-th element of the sample.


Consider calculating the arithmetic average of the five-year average annual returns of 15 very high-risk mutual funds (Figure 1).

Fig. 1. Average annual return on 15 very high-risk mutual funds

The sample mean is calculated as follows:

This is a good return, especially when compared to the 3-4% return that bank or credit union depositors received over the same time period. If you sort the return values, it is easy to see that eight funds have a return above, and seven - below the average. The arithmetic mean acts as a balance point, so that low-income funds balance out high-income funds. All elements of the sample are involved in the calculation of the average. None of the other estimators of the mean of a distribution have this property.

When to calculate the arithmetic mean. Since the arithmetic mean depends on all elements of the sample, the presence of extreme values ​​significantly affects the result. In such situations, the arithmetic mean can distort the meaning of the numerical data. Therefore, when describing a data set containing extreme values, it is necessary to indicate the median or the arithmetic mean and the median. For example, if the return of the RS Emerging Growth fund is removed from the sample, the sample average of the return of the 14 funds decreases by almost 1% to 5.19%.

Median

The median is the middle value of an ordered array of numbers. If the array does not contain repeating numbers, then half of its elements will be less than and half more than the median. If the sample contains extreme values, it is better to use the median rather than the arithmetic mean to estimate the mean. To calculate the median of a sample, it must first be sorted.

The median is the element at position $(n+1)/2$ of the sorted sample. How this position is read depends on whether the number of elements $n$ is odd or even:

  • If the sample contains an odd number of elements, the median is the $(n+1)/2$-th element.
  • If the sample contains an even number of elements, the median lies between the two middle elements of the sample and is equal to the arithmetic mean calculated over these two elements.
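The two cases can be written out as a short Python sketch. The function name `sample_median` and the sample values are illustrative only.

```python
def sample_median(values):
    """Median of a sample: sort first, then take the middle element(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the sample is empty")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1) / 2-th element (1-based) has 0-based index n // 2.
        return ordered[mid]
    # Even n: arithmetic mean of the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([7, 1, 5]))     # 5
print(sample_median([7, 1, 5, 3]))  # 4.0
```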

To calculate the median for a sample of 15 very high-risk mutual funds, we first need to sort the raw data (Figure 2). Then the median will be opposite the number of the middle element of the sample; in our example number 8. Excel has a special function =MEDIAN() that works with unordered arrays too.

Fig. 2. Median of 15 funds

Thus, the median is 6.5. This means that the returns of half of the very high-risk funds do not exceed 6.5, while those of the other half do. Note that the median of 6.5 is slightly larger than the mean of 6.08.

If we remove the profitability of the RS Emerging Growth fund from the sample, then the median of the remaining 14 funds will decrease to 6.2%, that is, not as significantly as the arithmetic mean (Fig. 3).

Fig. 3. Median of 14 funds

Mode

The term was first introduced by Pearson in 1894. The mode is the number that occurs most often in the sample (the most "fashionable"). The mode describes well, for example, the typical reaction of drivers to a traffic signal to stop. A classic example of the use of the mode is choosing the sizes for a batch of shoes to be produced or the color of wallpaper. If a distribution has several modes, it is said to be multimodal (it has two or more "peaks"). A multimodal distribution provides important information about the nature of the variable under study. For example, in sociological surveys, if a variable represents a preference or attitude towards something, then multimodality could mean that there are several distinctly different opinions. Multimodality is also an indicator that the sample is not homogeneous and that the observations may be generated by two or more overlapping distributions. Unlike the arithmetic mean, the mode is not affected by outliers. For continuously distributed random variables, such as the average annual returns of mutual funds, the mode sometimes does not exist at all (or does not make sense). Since these indicators can take on a wide variety of values, repeated values are extremely rare.
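A small Python sketch shows how the most frequent values can be found, including the case of several equally frequent values. The function `modes` and the shoe sizes are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in ascending order."""
    if not values:
        raise ValueError("the sample is empty")
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


# Hypothetical shoe sizes sold during a day.
print(modes([38, 39, 39, 40, 40, 41]))  # [39, 40]: two modes
print(modes([1, 2, 2, 3]))              # [2]: a single mode
```

When every value in a sample occurs exactly once, as is typical for continuous data, this function returns the whole sample, which is another way of seeing that the mode carries no information in that case.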

Quartiles

Quartiles are measures that are most commonly used to evaluate the distribution of data when describing the properties of large numerical samples. While the median splits the ordered array in half (50% of the array elements are less than the median and 50% are greater), quartiles break the ordered dataset into four parts. The Q1, median and Q3 values are the 25th, 50th and 75th percentiles, respectively. The first quartile Q1 is a number that divides the sample into two parts: 25% of the elements are less than the first quartile, and 75% are greater.

The third quartile Q3 is a number that also divides the sample into two parts: 75% of the elements are less than the third quartile, and 25% are greater.

To calculate quartiles in versions of Excel prior to 2007, the function =QUARTILE(array, part) was used. Starting with Excel 2010, two functions apply:

  • =QUARTILE.INC(array, part)
  • =QUARTILE.EXC(array, part)

These two functions give slightly different values (Fig. 4). For example, when calculating the quartiles for the sample containing data on the average annual return of 15 very high-risk mutual funds, Q1 = 1.8 or -0.7 for QUARTILE.INC and QUARTILE.EXC, respectively. Incidentally, the older QUARTILE function corresponds to the modern QUARTILE.INC function. To calculate quartiles in Excel using these functions, the data array can be left unordered.
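The difference between the two conventions can also be seen in Python, whose standard `statistics.quantiles` function (Python 3.8 and later) offers an "inclusive" and an "exclusive" method. To the best of our understanding these interpolate in the same way as QUARTILE.INC and QUARTILE.EXC, but that correspondence is an assumption of this sketch, and the data below are hypothetical rather than the fund returns from the figure.

```python
import statistics


def quartiles(data, method):
    """Return [Q1, median, Q3] using the given interpolation method."""
    return statistics.quantiles(data, n=4, method=method)


# Hypothetical sample of 15 values; the order of the input does not matter.
data = [15, 3, 9, 1, 12, 7, 5, 14, 2, 10, 8, 4, 13, 6, 11]

print(quartiles(data, "inclusive"))  # [4.5, 8.0, 11.5]
print(quartiles(data, "exclusive"))  # [4.0, 8.0, 12.0]
```

The median is the same under both methods, while Q1 and Q3 differ slightly, which mirrors the behavior described for the two Excel functions.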

Fig. 4. Calculating quartiles in Excel

Let us emphasize again: Excel can calculate quartiles for a univariate discrete series containing the values of a random variable. The calculation of quartiles for a frequency-based distribution is given in the section below.

Geometric mean

Unlike the arithmetic mean, the geometric mean measures how much a variable has changed over time. The geometric mean is the $n$-th root of the product of $n$ values (in Excel, the function =GEOMEAN is used):

$$G = (X_1 \cdot X_2 \cdot \ldots \cdot X_n)^{1/n}$$

A similar parameter, the geometric mean of the rate of return, is determined by the formula:

$$G = [(1 + R_1) \cdot (1 + R_2) \cdot \ldots \cdot (1 + R_n)]^{1/n} - 1$$

where $R_i$ is the rate of return for the $i$-th period of time.

For example, suppose the initial investment is $100,000. By the end of the first year, it drops to $50,000, and by the end of the second year, it recovers to the original $100,000. The rate of return on this investment over the two-year period is equal to 0, since the initial and final amounts are equal. However, the arithmetic average of the annual rates of return is (-0.5 + 1) / 2 = 0.25, or 25%, since the rate of return in the first year is R1 = (50,000 - 100,000) / 100,000 = -0.5, and in the second R2 = (100,000 - 50,000) / 50,000 = 1. At the same time, the geometric mean of the rate of return over the two years is G = [(1 - 0.5) * (1 + 1)]^(1/2) - 1 = 1^(1/2) - 1 = 1 - 1 = 0. Thus, the geometric mean reflects the change (more precisely, the absence of change) in the volume of investments over the two years more accurately than the arithmetic mean.
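The investment example can be reproduced with a few lines of Python. The function names are illustrative; the two annual rates of return, -0.5 and 1, are the ones from the example.

```python
import math


def arithmetic_mean_return(returns):
    """Plain average of the per-period rates of return."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """[(1 + R_1) * ... * (1 + R_n)] ** (1 / n) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# Year 1: 100,000 -> 50,000; year 2: 50,000 -> 100,000.
returns = [-0.5, 1.0]
print(arithmetic_mean_return(returns))  # 0.25
print(geometric_mean_return(returns))   # 0.0
```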

Interesting facts. First, the geometric mean of positive numbers is always less than the arithmetic mean of the same numbers, except when all the numbers are equal to each other. Second, the properties of a right triangle show why this mean is called geometric. The altitude of a right triangle dropped onto the hypotenuse is the mean proportional between the projections of the legs onto the hypotenuse, and each leg is the mean proportional between the hypotenuse and its projection onto the hypotenuse (Fig. 5). This gives a geometric way of constructing the geometric mean of the lengths of two segments: construct a circle with the sum of the two segments as its diameter; the perpendicular erected at the point where the segments join, up to its intersection with the circle, gives the desired value:

Fig. 5. The geometric nature of the geometric mean (figure from Wikipedia)

The second important property of numerical data is their variation characterizing the degree of dispersion of the data. Two different samples can differ both in mean values ​​and in variations. However, as shown in fig. 6 and 7, two samples can have the same variation but different means, or the same mean and completely different variation. The data corresponding to polygon B in Fig. 7 change much less than the data from which polygon A was built.

Fig. 6. Two symmetric bell-shaped distributions with the same spread and different mean values

Fig. 7. Two symmetric bell-shaped distributions with the same mean values and different scatter

There are five estimates of data variation:

  • range,
  • interquartile range,
  • variance,
  • standard deviation,
  • coefficient of variation.

Range

The range is the difference between the largest and smallest elements of the sample:

Range = $X_{max} - X_{min}$

The range of a sample containing the average annual returns of 15 very high-risk mutual funds can be calculated using an ordered array (see Figure 4): range = 18.5 - (-6.1) = 24.6. This means that the difference between the highest and lowest average annual returns for very high risk funds is 24.6%.

The range measures the overall spread of the data. Although the sample range is a very simple estimate of the total spread of the data, its weakness is that it does not take into account exactly how the data is distributed between the minimum and maximum elements. This effect is well seen in Fig. 8 which illustrates samples having the same range. The B scale shows that if the sample contains at least one extreme value, the sample range is a very inaccurate estimate of the scatter of the data.

Fig. 8. Comparison of three samples with the same range; the triangle symbolizes the support of the balance, and its location corresponds to the average value of the sample

Interquartile range

The interquartile, or mean, range is the difference between the third and first quartiles of the sample:

Interquartile range = Q3 - Q1

This value makes it possible to estimate the spread of 50% of the elements and not to take into account the influence of extreme elements. The interquartile range for a sample containing data on the average annual returns of 15 very high-risk mutual funds can be calculated using the data in Fig. 4 (for example, for the function QUARTILE.EXC): Interquartile range = 9.8 - (-0.7) = 10.5. The interval between 9.8 and -0.7 is often referred to as the middle half.

It should be noted that the Q1 and Q3 values, and hence the interquartile range, do not depend on the presence of outliers, since their calculation does not take into account any value that is less than Q1 or greater than Q3. Summary statistics such as the median, the first and third quartiles, and the interquartile range, which are not affected by outliers, are called robust indicators.

While the range and interquartile range provide an estimate of the total and mean scatter of the sample, respectively, neither of these estimates takes into account exactly how the data are distributed. Variance and standard deviation are free from this shortcoming. These indicators allow you to assess the degree of fluctuation of the data around the mean. Sample variance is an approximation of the arithmetic mean calculated from the squared differences between each sample element and the sample mean. For a sample $X_1, X_2, \ldots, X_n$, the sample variance (denoted by the symbol $S^2$) is the sum of the squared differences between the sample elements and the sample mean, divided by a value equal to the sample size minus one:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

where $\bar{X}$ is the arithmetic mean, $n$ is the sample size, and $X_i$ is the $i$-th element of the sample $X$. In Excel before version 2007, the function =VAR() was used to calculate the sample variance; since version 2010, the function =VAR.S() is used.

The most practical and widely accepted estimate of data scatter is the sample standard deviation. This indicator is denoted by the symbol $S$ and is equal to the square root of the sample variance:

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

In Excel before version 2007, the =STDEV() function was used to calculate the standard deviation; from version 2010 the =STDEV.S() function is used. For these functions, the data array can be left unordered.
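For readers who want to see the arithmetic spelled out, here is a minimal Python sketch of the sample variance and sample standard deviation with the divisor n - 1. The function names and the sample values are hypothetical.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical sample with mean 5; the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))  # 32 / 7, about 4.571
print(sample_std(data))       # about 2.138
```

As with the Excel functions, the order of the data does not affect the result.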

Neither the sample variance nor the sample standard deviation can be negative. The only situation in which the indicators S 2 and S can be zero is if all elements of the sample are equal. In this completely improbable case, the range and interquartile range are also zero.

Numeric data is inherently volatile. Any variable can take on many different values. For example, different mutual funds have different rates of return and loss. Due to the variability of numerical data, it is very important to study not only estimates of the mean, which are summative in nature, but also estimates of the variance, which characterize the scatter of the data.

The variance and the standard deviation measure the spread of the data around the mean, in other words, how values larger than the mean fluctuate above it and how smaller values fluctuate below it. The variance has some valuable mathematical properties. However, it is expressed in squared units of measurement: squared percent, squared dollars, squared inches, and so on. The natural measure of scatter is therefore the standard deviation, which is expressed in the original units: percent return, dollars, or inches.

The standard deviation allows you to estimate the amount of fluctuation of the sample elements around the mean value. In almost all situations, the majority of observed values ​​lie within plus or minus one standard deviation from the mean. Therefore, knowing the arithmetic mean of the sample elements and the standard sample deviation, it is possible to determine the interval to which the bulk of the data belongs.

The standard deviation of the returns of the 15 very high-risk mutual funds is 6.6 (Fig. 9). This means that the returns of the bulk of the funds differ from the mean by no more than 6.6% (that is, they lie in the range from $\bar{X} - S$ = 6.2 - 6.6 = -0.4 to $\bar{X} + S$ = 6.2 + 6.6 = 12.8). In fact, the five-year average annual returns of 53.3% of the funds (8 out of 15) fall within this interval.

Fig. 9. Standard deviation

Note that in the process of summing the squared differences, items that are farther from the mean gain more weight than items that are closer. This property is the main reason why the arithmetic mean is most often used to estimate the mean of a distribution.

The coefficient of variation

Unlike the previous scatter estimates, the coefficient of variation is a relative estimate. It is always measured as a percentage, not in the units of the original data. The coefficient of variation, denoted CV, measures the scatter of the data relative to the mean. It is equal to the standard deviation divided by the arithmetic mean and multiplied by 100%:

$$CV = \frac{S}{\bar{X}} \cdot 100\%$$

where $S$ is the sample standard deviation and $\bar{X}$ is the sample mean.

The coefficient of variation lets you compare two samples whose elements are expressed in different units of measurement. For example, the manager of a mail delivery service intends to upgrade the fleet of trucks. When loading packages, there are two kinds of restrictions to consider: the weight (in pounds) and the volume (in cubic feet) of each package. Assume that in a sample of 200 packages the average weight is 26.0 pounds, the standard deviation of the weight is 3.9 pounds, the average volume is 8.8 cubic feet, and the standard deviation of the volume is 2.2 cubic feet. How can the spread of the weight and the volume of the packages be compared?

Since weight and volume are measured in different units, the manager must compare the relative spread of these quantities. The coefficient of variation of the weight is CV_W = 3.9 / 26.0 * 100% = 15%, and the coefficient of variation of the volume is CV_V = 2.2 / 8.8 * 100% = 25%. Thus, the relative scatter of the package volumes is much larger than the relative scatter of their weights.
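The same comparison can be sketched in Python. The function name `coefficient_of_variation` is an illustrative choice, not something from the original text; the input figures are those of the package example.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return the coefficient of variation as a percentage."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100.0

cv_weight = coefficient_of_variation(3.9, 26.0)  # weight, pounds
cv_volume = coefficient_of_variation(2.2, 8.8)   # volume, cubic feet
print(round(cv_weight, 1), round(cv_volume, 1))
```

Because both results are percentages, they can be compared directly even though the original units differ.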

Distribution form

The third important property of a sample is the shape of its distribution, which can be symmetric or asymmetric. To describe the shape, compare the mean with the median. If the two are equal, the variable is considered symmetrically distributed. If the mean is greater than the median, the distribution is positively skewed (Fig. 10). If the median is greater than the mean, the distribution is negatively skewed. Positive skewness arises when unusually high values pull the mean upward; negative skewness arises when unusually low values pull it downward. A variable is symmetrically distributed if it takes no extreme values in either direction, so that large and small values cancel each other out.

Fig. 10. Three types of distributions

The data shown on scale A are negatively skewed. The figure shows a long tail and a skew to the left caused by unusually small values. These extremely small values shift the mean to the left, so that it becomes less than the median. The data shown on scale B are distributed symmetrically. The left and right halves of the distribution are mirror images of each other. Large and small values balance each other, and the mean and the median are equal. The data shown on scale C are positively skewed. The figure shows a long tail and a skew to the right caused by unusually high values. These excessively large values shift the mean to the right, so that it becomes larger than the median.

In Excel, descriptive statistics can be obtained with the Analysis ToolPak add-in. Go to Data → Data Analysis, select Descriptive Statistics in the window that opens, and click OK. In the Descriptive Statistics window, be sure to specify the Input Range (Fig. 11). To see the descriptive statistics on the same sheet as the original data, select the Output Range radio button and specify the cell for the upper-left corner of the output (in our example, $C$1). To send the output to a new sheet or a new workbook, select the corresponding radio button. Check the box next to Summary statistics. Optionally, you can also select Confidence Level for Mean, Kth Smallest, and Kth Largest.

If you do not see the Data Analysis icon in the Analysis group of the Data tab, you must first install the Analysis ToolPak add-in.

Fig. 11. Descriptive statistics of the five-year average annual returns of funds with a very high level of risk, calculated with the Excel Data Analysis add-in

Excel calculates a number of the statistics discussed above: mean, median, mode, standard deviation, variance, range, minimum, maximum, and sample size (count). In addition, Excel calculates some statistics that are new to us: standard error, kurtosis, and skewness. The standard error equals the standard deviation divided by the square root of the sample size. Skewness characterizes the deviation of the distribution from symmetry and depends on the cubed differences between the sample elements and the mean. Kurtosis measures the relative concentration of data around the mean versus the tails of the distribution and depends on the differences between the sample elements and the mean raised to the fourth power.
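To make two of these definitions concrete, here is a minimal Python sketch of the standard error and of a skewness measure. The skewness uses a common bias-adjusted sample formula, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations; that it reproduces the add-in's output exactly is an assumption, and both data sets are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """Standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def sample_skewness(sample):
    """Bias-adjusted sample skewness based on cubed standardized deviations."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)
    cubes = sum(((x - mean) / s) ** 3 for x in sample)
    return n / ((n - 1) * (n - 2)) * cubes

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical data
right_tailed = [1.0, 2.0, 3.0, 4.0, 20.0]  # hypothetical data

print(standard_error(symmetric))
print(sample_skewness(symmetric), sample_skewness(right_tailed))
```

A symmetric sample gives a skewness of about zero, while a sample with one unusually large value gives a positive skewness.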

Calculation of descriptive statistics for the general population

The mean, scatter, and shape of the distribution discussed above are sample-based characteristics. However, if the dataset contains numerical measurements of the entire population, then its parameters can be calculated. These parameters include the mean, variance, and standard deviation of the population.

The population mean (mathematical expectation) is equal to the sum of all values in the population divided by the population size:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

where µ is the population mean, $X_i$ is the i-th observation of the variable X, and N is the population size. In Excel, the population mean is calculated with the same function as the arithmetic mean: =AVERAGE().

The population variance is equal to the sum of the squared differences between the elements of the population and the population mean, divided by the population size:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where σ² is the population variance. In Excel 2007 and earlier, the =VARP() function is used to calculate the population variance; starting with Excel 2010, =VAR.P() is used.

The population standard deviation is equal to the square root of the population variance:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

In Excel 2007 and earlier, the =STDEVP() function was used to calculate the population standard deviation; since Excel 2010, =STDEV.P() is used. Note that the formulas for the population variance and standard deviation differ from the formulas for the sample variance and standard deviation. When calculating the sample statistics S² and S, the denominator of the fraction is n - 1; when calculating the parameters σ² and σ, it is the population size N.
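The difference between the two denominators can be seen in a short Python sketch. The data are hypothetical; `statistics.variance` divides by n - 1, while `statistics.pvariance` and `statistics.pstdev` divide by N.

```python
import statistics

data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical values

# Treating the data as a sample: denominator n - 1.
s2 = statistics.variance(data)

# Treating the same data as the whole population: denominator N.
sigma2 = statistics.pvariance(data)
sigma = statistics.pstdev(data)

print(s2, sigma2, sigma)
```

For the same values the sample variance is always the larger of the two, and the gap shrinks as the number of observations grows.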

Empirical rule

In most situations, a large proportion of the observations are concentrated around the median, forming a cluster. In data sets with positive skewness, this cluster lies to the left of (that is, below) the mean, and in sets with negative skewness it lies to the right of (that is, above) the mean. Symmetric data have the same mean and median, and the observations cluster around the mean, forming a bell-shaped distribution. If the distribution has no pronounced skewness and the data are concentrated around a certain center of gravity, variability can be estimated with the empirical rule (a rule of thumb): if the data have a bell-shaped distribution, approximately 68% of the observations lie within one standard deviation of the mean, approximately 95% lie within two standard deviations, and 99.7% lie within three standard deviations.

Thus, the standard deviation, which estimates the average fluctuation around the mean, helps to understand how the observations are distributed and to identify outliers. It follows from the empirical rule that, for bell-shaped distributions, only one value in twenty differs from the mean by more than two standard deviations. Values outside the interval µ ± 2σ can therefore be considered outliers. In addition, only three observations out of 1000 differ from the mean by more than three standard deviations, so values outside the interval µ ± 3σ are almost always outliers. For distributions that are highly skewed or not bell-shaped, the Bienaymé-Chebyshev rule can be applied.
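A minimal sketch of this outlier check in Python follows. The function name `flag_outliers` and the data are hypothetical, and the check is only meaningful under the bell-shape assumption stated above.

```python
import statistics

def flag_outliers(values, k=2.0):
    """Return the values lying more than k standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [x for x in values if abs(x - mu) > k * sigma]

# Hypothetical data: nine values near 10 and one distant value.
data = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0, 10.0, 30.0]
print(flag_outliers(data))
```

Note that an extreme value inflates both the mean and the standard deviation, so in small data sets it may pass the stricter three-sigma check even though it is clearly unusual.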

More than a hundred years ago, the mathematicians Bienaymé and Chebyshev independently discovered a useful property of the standard deviation. They found that for any data set, regardless of the shape of its distribution, the percentage of observations lying within k standard deviations of the mean is at least (1 - 1/k²) × 100%.

For example, if k = 2, the Bienaymé-Chebyshev rule states that at least (1 - (1/2)²) × 100% = 75% of the observations must lie in the interval µ ± 2σ. The rule holds for any k greater than one. The Bienaymé-Chebyshev rule is very general and is valid for distributions of any kind. It gives the minimum share of observations whose distance from the mean does not exceed a given value. However, if the distribution is bell-shaped, the empirical rule estimates the concentration of data around the mean more accurately.
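The bound is easy to tabulate. The following Python sketch (the function name is illustrative) evaluates 1 - 1/k² for several values of k.

```python
def chebyshev_min_share(k: float) -> float:
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1.0 - 1.0 / k ** 2

for k in (2, 3, 4):
    print(k, round(chebyshev_min_share(k) * 100, 2))
```

For k = 2 the guaranteed share is 75%, noticeably lower than the roughly 95% that the empirical rule gives for bell-shaped data.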

Computing descriptive statistics for a frequency-based distribution

If the original data is not available, the frequency distribution becomes the only source of information. In such situations, it is possible to calculate approximate values ​​of quantitative indicators of the distribution, such as the arithmetic mean, standard deviation, quartiles.

If the sample data are presented as a frequency distribution, an approximate value of the arithmetic mean can be calculated by assuming that all values within each class are concentrated at the midpoint of the class:

$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n}$$

where $\bar{X}$ is the sample mean, $n$ is the number of observations (the sample size), $c$ is the number of classes in the frequency distribution, $m_j$ is the midpoint of the j-th class, and $f_j$ is the frequency of the j-th class.

To calculate the standard deviation from the frequency distribution, it is also assumed that all values ​​within each class are concentrated at the midpoint of the class.
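Both approximations can be sketched in a few lines of Python. The class midpoints and frequencies below are hypothetical, and the n - 1 denominator for the standard deviation is carried over from the sample formula as an assumption.

```python
import math

def grouped_mean_and_std(midpoints, frequencies):
    """Approximate mean and sample standard deviation from class midpoints."""
    n = sum(frequencies)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    squared = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
    return mean, math.sqrt(squared / (n - 1))

# Hypothetical frequency distribution with three classes.
midpoints = [5.0, 15.0, 25.0]
frequencies = [2, 6, 2]
mean, std = grouped_mean_and_std(midpoints, frequencies)
print(mean, std)
```

The results are approximate because every observation is replaced by the midpoint of its class.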

To understand how the quartiles of the series are determined based on frequencies, let us consider the calculation of the lower quartile based on data for 2013 on the distribution of the Russian population by average per capita cash income (Fig. 12).

Fig. 12. Shares of the population of Russia by average monthly per capita cash income, rubles

To calculate the first quartile of an interval variation series, you can use the formula:

$$Q_1 = x_{Q_1} + i \cdot \frac{\frac{1}{4}\sum f - S_{Q_1 - 1}}{f_{Q_1}}$$

where $Q_1$ is the value of the first quartile; $x_{Q_1}$ is the lower limit of the interval containing the first quartile (the interval is identified by the cumulative frequency, the first one to exceed 25%); $i$ is the width of the interval; $\sum f$ is the sum of the frequencies of the entire sample (when the frequencies are expressed as percentages, as here, it equals 100%); $S_{Q_1 - 1}$ is the cumulative frequency of the interval preceding the interval that contains the lower quartile; and $f_{Q_1}$ is the frequency of the interval containing the lower quartile. The formula for the third quartile differs only in that Q3 is used everywhere instead of Q1 and ¾ is substituted for ¼.

In our example (Fig. 12), the lower quartile lies in the interval 7000.1 - 10,000, whose cumulative frequency is 26.4%. The lower limit of this interval is 7000 rubles, the width of the interval is 3000 rubles, the cumulative frequency of the interval preceding the one that contains the lower quartile is 13.4%, and the frequency of the interval containing the lower quartile is 13.0%. Thus: Q1 = 7000 + 3000 * (¼ * 100 - 13.4) / 13 = 9677 rubles.
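The calculation above can be reproduced with a small Python sketch. The function name and parameter names are illustrative; the numbers are those quoted in the example.

```python
def interval_quantile(lower, width, share, total, cum_before, freq):
    """Quantile of an interval series by linear interpolation inside the interval."""
    return lower + width * (share * total - cum_before) / freq

# Values quoted in the example: interval 7000.1-10,000, frequencies in percent.
q1 = interval_quantile(lower=7000, width=3000, share=0.25,
                       total=100, cum_before=13.4, freq=13.0)
print(round(q1))
```

Passing `share=0.75` together with the figures of the interval that contains the third quartile would give Q3 in the same way.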

Pitfalls associated with descriptive statistics

In this note, we looked at how to describe a data set using various statistics that estimate its mean, scatter, and distribution. The next step is to analyze and interpret the data. So far, we have studied the objective properties of data, and now we turn to their subjective interpretation. Two mistakes lie in wait for the researcher: an incorrectly chosen subject of analysis and an incorrect interpretation of the results.

The analysis of the returns of the 15 very high-risk mutual funds is fairly unbiased. It led to completely objective conclusions: all the mutual funds have different returns, the fund returns range from -6.1 to 18.5, and the average return is 6.08. The objectivity of data analysis is ensured by the correct choice of summary measures of the distribution. Several methods for estimating the mean and the scatter of data were considered, and their advantages and disadvantages were indicated. How do you choose the statistics that provide an objective and unbiased analysis? If the data distribution is slightly skewed, should the median be chosen over the arithmetic mean? Which measure characterizes the spread of the data more accurately: the standard deviation or the range? Should the positive skewness of the distribution be reported?

On the other hand, data interpretation is a subjective process. Different people come to different conclusions, interpreting the same results. Everyone has their own point of view. Someone considers the total average annual returns of 15 funds with a very high level of risk to be good and is quite satisfied with the income received. Others may think that these funds have too low returns. Thus, subjectivity should be compensated by honesty, neutrality and clarity of conclusions.

Ethical Issues

Data analysis is inextricably linked to ethical issues. One should be critical of the information disseminated by newspapers, radio, television and the Internet. Over time, you will learn to be skeptical not only about the results, but also about the goals, subject and objectivity of research. The famous British politician Benjamin Disraeli said it best: “There are three kinds of lies: lies, damned lies and statistics.”

As noted above, ethical issues arise when choosing which results to present in a report. Both positive and negative results should be published. In addition, in an oral or written report the results must be presented honestly, neutrally, and objectively. Distinguish between poor presentations and dishonest ones. To do this, it is necessary to determine what the speaker's intentions were. Sometimes the speaker omits important information out of ignorance, and sometimes deliberately (for example, by using the arithmetic mean to estimate the center of clearly skewed data in order to get the desired result). It is also dishonest to suppress results that do not agree with the researcher's point of view.

Based on material from the book: Levine et al., Statistics for Managers. Moscow: Williams, 2004, pp. 178–209.

The QUARTILE function is retained for compatibility with earlier versions of Excel.

Averages are generalizing statistical indicators that give a summary characteristic of mass social phenomena, since they are built from a large number of individual values of a varying attribute. To clarify the essence of an average, it is necessary to consider how the values of the attributes of the phenomena being averaged are formed.

The units of every mass phenomenon have numerous attributes. Whichever attribute we take, its values differ from unit to unit; they change or, as statisticians say, vary. For example, an employee's salary is determined by qualifications, the nature of the work, length of service, and a number of other factors, and therefore varies over a very wide range. The combined influence of all these factors determines the earnings of each employee, yet we can still speak of the average monthly wage of workers in different sectors of the economy. Here we operate with a typical, characteristic value of a varying attribute, referred to one unit of a large population.

The average reflects what is common to all units of the population under study. At the same time, it balances the influence of all the factors that act on the value of the attribute in individual units, as if mutually canceling them. The level (or size) of any social phenomenon is determined by two groups of factors. Some are general and principal; they act constantly, are closely related to the nature of the phenomenon or process being studied, and form what is typical for all units of the population, which is what the average reflects. Others are individual; their action is weaker, episodic, and random. They act in the opposite direction, causing differences between the quantitative characteristics of individual units and tending to change the constant value of the attributes being studied. The action of individual factors is canceled out in the average. The combined influence of typical and individual factors, which is balanced and mutually canceled in generalizing characteristics, is a manifestation of the fundamental law of large numbers.

In the aggregate, the individual values of the attributes merge into a common mass and, as it were, dissolve. Hence the average appears "impersonal": it may deviate from the individual values of the attribute and coincide quantitatively with none of them. The average reflects what is general, characteristic, and typical for the entire population because random, atypical differences between the attributes of individual units cancel each other out in it; its value is determined, as it were, by the common resultant of all causes.

However, in order for the average value to reflect the most typical value of a feature, it should not be determined for any populations, but only for populations consisting of qualitatively homogeneous units. This requirement is the main condition for the scientifically based application of averages and implies a close connection between the method of averages and the method of groupings in the analysis of socio-economic phenomena. Therefore, the average value is a general indicator that characterizes the typical level of a variable trait per unit of a homogeneous population in specific conditions of place and time.

Determining, thus, the essence of average values, it must be emphasized that the correct calculation of any average value implies the fulfillment of the following requirements:

  • qualitative homogeneity of the population on which the average value is calculated. This means that the calculation of average values ​​should be based on the grouping method, which ensures the selection of homogeneous, same-type phenomena;
  • exclusion of the influence on the calculation of the average value of random, purely individual causes and factors. This is achieved when the calculation of the average is based on sufficiently massive material in which the operation of the law of large numbers is manifested, and all accidents cancel each other out;
  • when calculating the average value, it is important to establish the purpose of the calculation and the so-called defining indicator (property) toward which it should be oriented.

The defining indicator can be the sum of the values of the averaged attribute, the sum of their reciprocals, the product of the values, and so on. The relationship between the defining indicator and the average is as follows: if all values of the averaged attribute are replaced by the average, their sum or product leaves the defining indicator unchanged. This relationship yields the initial quantitative relation from which the average is calculated directly. The ability of averages to preserve the properties of statistical populations is called the defining property.

The average value calculated for the population as a whole is called general average; average values ​​calculated for each group - group averages. The general average reflects the general features of the phenomenon under study, the group average gives a description of the phenomenon that develops under the specific conditions of this group.

The calculation methods can be different, therefore, in statistics, several types of average are distinguished, the main of which are the arithmetic average, the harmonic average and the geometric average.

In economic analysis, the use of averages is the main tool for assessing the results of scientific and technological progress, social measures, and the search for reserves for economic development. At the same time, it should be remembered that excessive focus on averages can lead to biased conclusions when conducting economic and statistical analysis. This is due to the fact that average values, being generalizing indicators, cancel out and ignore those differences in the quantitative characteristics of individual units of the population that really exist and may be of independent interest.

Types of averages

In statistics, various types of averages are used, which are divided into two large classes:

  • power averages (harmonic mean, geometric mean, arithmetic mean, mean square, mean cubic);
  • structural averages (mode, median).

To calculate power means, all available values of the attribute must be used. The mode and the median are determined only by the structure of the distribution, which is why they are called structural, or positional, averages. The median and the mode are often used as the average characteristic of populations for which calculating a power mean is impossible or impractical.

The most common type of average is the arithmetic mean. The arithmetic mean is the value of an attribute that each unit of the population would have if the total of all values were distributed evenly among all units. It is calculated by summing all values of the varying attribute and dividing the sum by the total number of units in the population. For example, five workers filled an order for the manufacture of parts: the first produced 5 parts, the second 7, the third 4, the fourth 10, and the fifth 12. Since each value occurs only once in the initial data, the average output per worker is determined with the simple arithmetic mean formula:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

i.e., in our example, the average output of one worker is equal to

$$\bar{x} = \frac{5 + 7 + 4 + 10 + 12}{5} = \frac{38}{5} = 7.6 \text{ parts}$$
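The calculation for the five workers can be checked with a short Python sketch; the variable names are illustrative.

```python
import statistics

parts_per_worker = [5, 7, 4, 10, 12]  # output of the five workers in the example

# Simple arithmetic mean: the sum of the values divided by their number.
average_output = sum(parts_per_worker) / len(parts_per_worker)

# The standard library function gives the same value.
assert average_output == statistics.mean(parts_per_worker)
print(average_output)
```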

Along with the simple arithmetic mean, the weighted arithmetic mean is used. For example, let us calculate the average age of students in a group of 20 people aged from 18 to 22, where $x_i$ are the variants of the averaged attribute and $f_i$ is the frequency, which shows how many times the i-th value occurs in the population (Table 5.1).

Table 5.1

Average age of students

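Applying the weighted arithmetic mean formula

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$

to the data in Table 5.1 gives the average age of the students in the group.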


There is a rule for choosing the weighted arithmetic mean: if there is a series of data on two indicators, for one of which the average must be calculated, and the numerical values of the denominator of its logical formula are known while the values of the numerator are unknown but can be found as the product of these indicators, then the average should be calculated with the weighted arithmetic mean formula.
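A minimal Python sketch of the weighted arithmetic mean follows. The age distribution below is hypothetical: it only mimics a group of 20 students aged 18 to 22 and does not reproduce the figures of Table 5.1.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(x_i * f_i) / sum(f_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * f for x, f in zip(values, weights)) / total_weight

# Hypothetical age distribution for a group of 20 students.
ages = [18, 19, 20, 21, 22]
counts = [2, 5, 8, 3, 2]
print(weighted_mean(ages, counts))
```

When all weights are equal, the result coincides with the simple arithmetic mean.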

In some cases, the nature of the initial data is such that the arithmetic mean loses its meaning, and the only suitable generalizing indicator is another type of average, the harmonic mean. With the widespread use of computers, the computational properties of the arithmetic mean have lost their relevance in the calculation of generalizing statistical indicators. The harmonic mean, which can also be simple or weighted, has acquired great practical importance. If the numerical values of the numerator of the logical formula are known, and the values of the denominator are unknown but can be found as the quotient of one indicator by another, then the average is calculated with the weighted harmonic mean formula.

For example, suppose a car traveled the first 210 km at a speed of 70 km/h and the remaining 150 km at a speed of 75 km/h. The average speed over the entire journey of 360 km cannot be determined with the arithmetic mean formula. Since the variants are the speeds on the individual sections, $x_1$ = 70 km/h and $x_2$ = 75 km/h, and the weights ($f_i$) are the lengths of the corresponding sections, the products of the variants and the weights have neither physical nor economic meaning. What does have meaning is the quotient of each section length by the corresponding speed, that is, the time spent on each section ($f_i / x_i$). If the section lengths are denoted by $f_i$, the entire distance is $\sum f_i$ and the time spent on the entire journey is $\sum f_i / x_i$. The average speed can then be found as the total distance divided by the total time:

$$\bar{x} = \frac{\sum f_i}{\sum \frac{f_i}{x_i}}$$

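In our example, we get:

$$\bar{x} = \frac{210 + 150}{\frac{210}{70} + \frac{150}{75}} = \frac{360}{3 + 2} = 72 \text{ km/h}$$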

If all the weights (f) are equal, the simple (unweighted) harmonic mean can be used instead of the weighted one:

$$\bar{x} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

where $x_i$ are the individual variants and $n$ is the number of variants of the averaged attribute. In the speed example, the simple harmonic mean could be applied if the sections traveled at different speeds were of equal length.
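Both forms of the harmonic mean can be sketched in Python using the figures of the car example. The function names are illustrative.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(f_i) / sum(f_i / x_i)."""
    return sum(weights) / sum(f / x for x, f in zip(values, weights))

def simple_harmonic_mean(values):
    """Unweighted harmonic mean: n / sum(1 / x_i)."""
    return len(values) / sum(1 / x for x in values)

speeds = [70, 75]       # km/h on each section
distances = [210, 150]  # km, used as weights

# Total distance divided by total time over both sections.
print(weighted_harmonic_mean(speeds, distances))

# Applicable only if both sections had the same length.
print(simple_harmonic_mean(speeds))
```

The weighted result is the average speed over the actual route, while the simple form answers a different question and gives a slightly different value.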

Any average value should be calculated so that when it replaces each variant of the averaged feature, the value of some final, generalizing indicator, which is associated with the averaged indicator, does not change. So, when replacing the actual speeds on individual sections of the path with their average value (average speed), the total distance should not change.

The form (formula) of the average value is determined by the nature (mechanism) of the relationship of this final indicator with the averaged one, therefore the final indicator, the value of which should not change when the options are replaced by their average value, is called defining indicator. To derive the average formula, you need to compose and solve an equation using the relationship of the averaged indicator with the determining one. This equation is constructed by replacing the variants of the averaged feature (indicator) with their average value.

In addition to the arithmetic mean and the harmonic mean, other types (forms) of averages are used in statistics. All of them are special cases of the power mean. If all types of power means are calculated for the same data, their values will generally differ; here the majorization rule of means applies: as the exponent of the mean increases, so does the mean itself. The formulas most commonly used in practice for calculating the various power means are presented in Table 5.2.

Table 5.2


The geometric mean is applied when there are n growth factors; the individual values of the attribute are, as a rule, relative measures of dynamics constructed as chain values, that is, as the ratio of each level of a time series to the previous level. The mean thus characterizes the average growth factor. The simple geometric mean is calculated by the formula

$$\bar{x} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$

The weighted geometric mean formula has the following form:

$$\bar{x} = \sqrt[\sum f_i]{\prod x_i^{f_i}}$$

The above formulas are identical, but one is applied at current coefficients or growth rates, and the second - at the absolute values ​​of the levels of the series.
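The simple geometric mean of growth factors can be sketched in Python as follows. The growth factors are hypothetical, and the function name is illustrative.

```python
import math

def geometric_mean(factors):
    """n-th root of the product of n positive growth factors."""
    if any(x <= 0 for x in factors):
        raise ValueError("growth factors must be positive")
    return math.prod(factors) ** (1 / len(factors))

# Hypothetical chain growth factors for three consecutive periods.
growth = [1.10, 1.20, 0.95]
avg = geometric_mean(growth)

# Applying the average factor three times reproduces the total growth.
assert math.isclose(avg ** 3, math.prod(growth))
print(avg)
```

This is the defining property at work: replacing each factor by the average leaves the product, the total growth over the whole period, unchanged.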

The quadratic mean (root mean square) is used in calculations involving squared quantities, in particular to measure the fluctuation of the individual values of an attribute around the arithmetic mean in a distribution series. It is calculated by the formula

$$\bar{x} = \sqrt{\frac{\sum x_i^2}{n}}$$

The weighted quadratic mean is calculated by the formula

$$\bar{x} = \sqrt{\frac{\sum x_i^2 f_i}{\sum f_i}}$$

The cubic mean is used in calculations involving cubed quantities and is calculated by the formula

$$\bar{x} = \sqrt[3]{\frac{\sum x_i^3}{n}}$$

The weighted cubic mean:

$$\bar{x} = \sqrt[3]{\frac{\sum x_i^3 f_i}{\sum f_i}}$$

All of the averages above can be represented by a general formula:

$$\bar{x} = \sqrt[k]{\frac{\sum x_i^k}{n}}$$

where $\bar{x}$ is the average value; $x_i$ is an individual value; $n$ is the number of units in the population under study; $k$ is the exponent that determines the type of average.

For the same source data, the larger k is in the general power mean formula, the larger the mean. It follows that the power means are related in a regular way:

$$\bar{x}_{\text{harm}} \le \bar{x}_{\text{geom}} \le \bar{x}_{\text{arith}} \le \bar{x}_{\text{quad}} \le \bar{x}_{\text{cub}}$$
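The ordering of the power means can be checked numerically. In the Python sketch below, the function name is illustrative, and the case k = 0 is handled as the geometric mean, which is the standard limiting case of the power mean rather than something stated in the text.

```python
import math

def power_mean(values, k):
    """Power mean of positive values; k = 0 is treated as the geometric mean."""
    n = len(values)
    if k == 0:
        return math.exp(sum(math.log(x) for x in values) / n)
    return (sum(x ** k for x in values) / n) ** (1 / k)

data = [5, 7, 4, 10, 12]  # output of the five workers from the earlier example

# Harmonic (k=-1), geometric (k=0), arithmetic (k=1), quadratic (k=2), cubic (k=3).
means = [power_mean(data, k) for k in (-1, 0, 1, 2, 3)]

# The means grow with the exponent k.
assert means == sorted(means)
print([round(m, 3) for m in means])
```

If all values in the data were equal, all five means would coincide.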

The averages described above give a generalized picture of the population under study, and from this point of view their theoretical, applied, and cognitive significance is indisputable. However, the value of an average may not coincide with any of the variants that actually exist. Therefore, in addition to the averages considered, statistical analysis also uses the values of specific variants that occupy a well-defined position in an ordered (ranked) series of attribute values. The most commonly used of these are the structural, or descriptive, averages: the mode (Mo) and the median (Me).

The mode is the value of the attribute that occurs most often in the population. In a variation series, the mode is the most frequent value of the ranked series, that is, the variant with the highest frequency. The mode can be used to determine the most visited stores or the most common price of a product. It shows the size of the attribute characteristic of a significant part of the population and, for an interval series, is determined by the formula

$$Mo = x_0 + h \cdot \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})}$$

where $x_0$ is the lower limit of the modal interval; $h$ is the interval width; $f_m$ is the frequency of the modal interval; $f_{m-1}$ is the frequency of the previous interval; $f_{m+1}$ is the frequency of the next interval.
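A minimal Python sketch of the interval mode, assuming the standard interpolation formula built from the quantities just listed. The function name and the frequencies are hypothetical.

```python
def interval_mode(lower, width, f_modal, f_prev, f_next):
    """Mode of an interval series, interpolated inside the modal interval."""
    denominator = (f_modal - f_prev) + (f_modal - f_next)
    if denominator == 0:
        raise ValueError("modal interval is not distinct from its neighbours")
    return lower + width * (f_modal - f_prev) / denominator

# Hypothetical modal interval 20-30 with frequency 40,
# preceded by an interval with frequency 25 and followed by one with 30.
print(interval_mode(lower=20, width=10, f_modal=40, f_prev=25, f_next=30))
```

The mode is pulled toward the neighbouring interval with the higher frequency, here the following one.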

The median is the variant located at the center of the ranked series. It divides the series into two equal parts, so that the same number of population units lies on either side of it. In one half of the units the value of the varying attribute is less than the median, and in the other half it is greater. In other words, the median is the element whose value is greater than or equal to one half of the elements of the distribution series and, at the same time, less than or equal to the other half. The median gives a general idea of where the values of the attribute are concentrated, that is, where their center lies.

The descriptive nature of the median shows in the fact that it marks the quantitative boundary of the attribute values possessed by half of the population units. Finding the median of a discrete variation series is simple. If all units of the series are given serial numbers, then for an odd number of members n the serial number of the median variant is (n + 1) / 2. If the number of members is even, the median is the average of the two variants with serial numbers n / 2 and n / 2 + 1.
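The rule for a discrete series can be sketched as follows. The function name is an assumption; note that the serial numbers in the text are 1-based, while Python list indices are 0-based.

```python
def median_discrete(values):
    """Median of a discrete series using the (n + 1) / 2 rule."""
    ranked = sorted(values)          # ranking the series
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: serial number (n + 1) / 2, i.e. index n // 2.
        return ranked[mid]
    # Even n: average of serial numbers n / 2 and n / 2 + 1.
    return (ranked[mid - 1] + ranked[mid]) / 2

odd_median = median_discrete([7, 2, 5])
even_median = median_discrete([2, 3, 3, 5, 7, 10])
print(odd_median, even_median)
```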

To determine the median of an interval variation series, first find the interval that contains it (the median interval). This is the first interval whose cumulative frequency equals or exceeds half the sum of all frequencies of the series. The median of an interval variation series is then calculated by the formula

$$Me = x_0 + h \cdot \frac{\frac{1}{2}\sum f - S_{m-1}}{f_m}$$

where $x_0$ is the lower limit of the median interval; $h$ is the interval width; $f_m$ is the frequency of the median interval; $\sum f$ is the number of members of the series; $S_{m-1}$ is the cumulative frequency of the intervals preceding the median interval.
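A sketch of the interval median calculation, assuming equal-width intervals and the usual formula Me = x0 + h(Σf/2 - Sm-1) / fm. The function name and the data are illustrative assumptions.

```python
def interval_median(lower_bounds, h, freqs):
    """Median of an interval series with equal interval width h."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("the series has no observations")
    half = total / 2
    cumulative = 0  # cumulative frequency before the current interval
    for x0, f_m in zip(lower_bounds, freqs):
        # The median interval is the first one whose cumulative
        # frequency reaches half of the total.
        if cumulative + f_m >= half:
            return x0 + h * (half - cumulative) / f_m
        cumulative += f_m

# Illustrative series: intervals 10-20, 20-30, 30-40, 40-50.
me = interval_median([10, 20, 30, 40], 10, [5, 15, 20, 10])
print(me)
```

Here half of the 50 observations is 25; the cumulative frequency first reaches it in the interval 30-40, so the median is interpolated inside that interval.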

Along with the median, other variants that occupy a definite position in the ranked series are used to describe the structure of the population more fully. These include quartiles and deciles. Quartiles divide the series by the sum of frequencies into 4 equal parts, and deciles into 10 equal parts. There are three quartiles and nine deciles.
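Quartiles and deciles can be obtained with Python's standard `statistics.quantiles` function. The data are illustrative; the function's default interpolation method is used here, and other conventions can give slightly different cut points.

```python
import statistics

data = [12, 15, 17, 18, 20, 21, 23, 25, 28, 30, 34]  # illustrative data

quartiles = statistics.quantiles(data, n=4)   # three cut points
deciles = statistics.quantiles(data, n=10)    # nine cut points

print(len(quartiles), len(deciles))
print(quartiles[1], statistics.median(data))  # second quartile vs. median
```

The second quartile coincides with the median, since both split the ranked series in half.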

Unlike the arithmetic mean, the median and mode do not smooth out individual differences in the values of the attribute, and so they are additional and very important characteristics of a statistical population. In practice they are often used instead of the mean or alongside it. Calculating the median and mode is especially useful when the population contains some units with very large or very small values of the attribute. Such values, which are not typical of the population, affect the arithmetic mean but have little or no effect on the median and mode, which makes the latter very valuable for economic and statistical analysis.
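A small illustration of this robustness, using made-up numbers: one extreme value is appended to a series, and the three measures are compared before and after.

```python
import statistics

typical = [20, 21, 22, 22, 23]      # illustrative data
with_outlier = typical + [200]      # one atypically large value

mean_before = statistics.mean(typical)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(typical)
median_after = statistics.median(with_outlier)
mode_before = statistics.mode(typical)
mode_after = statistics.mode(with_outlier)

print(mean_before, mean_after)      # the mean shifts noticeably
print(median_before, median_after)  # the median stays at 22
print(mode_before, mode_after)      # the mode stays at 22
```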

Variation indicators

The purpose of a statistical study is to identify the main properties and patterns of the population under study. When statistical observation data are summarized, distribution series are built. There are two types of distribution series, attributive and variational, depending on whether the attribute used as the basis of grouping is qualitative or quantitative.

A variation series is a distribution series built on a quantitative attribute. The values of a quantitative attribute are not constant across the units of a population; they differ from one another to a greater or lesser degree. This difference in the value of an attribute is called variation. The individual numerical values of the attribute found in the population are called variants. Variation across units arises because a large number of factors influence the level of the attribute. Studying the nature and degree of variation across the units of a population is one of the most important tasks of any statistical study. Variation indicators are used to describe the degree of variability of an attribute.

Another important task of statistical research is to determine the role of individual factors, or groups of factors, in the variation of particular attributes of the population. To solve this problem, statistics uses special methods for studying variation, based on a system of indicators that measure it. In practice, the researcher faces a large number of attribute values, which in raw form give no idea of how the units are distributed by the value of the attribute. To obtain such an idea, all attribute values are arranged in ascending or descending order. This process is called ranking the series. The ranked series immediately gives a general idea of the values the attribute takes in the population.

Because the average alone cannot characterize a population exhaustively, it has to be supplemented with indicators that assess how typical the average is by measuring the variability (variation) of the attribute under study. Using these indicators of variation makes the statistical analysis more complete and meaningful and gives a better understanding of the social phenomena being studied.

The simplest indicators of variation are the minimum and maximum, i.e., the smallest and largest values of the attribute in the population. The number of times an individual attribute value is repeated is called its frequency. Denoting the frequency of the i-th attribute value by $f_i$, the sum of the frequencies, which equals the size n of the population, is

$$\sum_{i=1}^{k} f_i = n$$

where k is the number of distinct attribute values. It is often convenient to replace frequencies with relative frequencies $w_i$. A relative frequency can be expressed as a fraction of one or as a percentage, and it makes it possible to compare variation series with different numbers of observations. Formally,

$$w_i = \frac{f_i}{n}, \qquad \sum_{i=1}^{k} w_i = 1$$
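Frequencies and relative frequencies can be tabulated with `collections.Counter`. The observations below are illustrative, and the variable names are assumptions.

```python
from collections import Counter

observations = [3, 4, 4, 5, 5, 5, 6]   # illustrative data

freq = Counter(observations)           # f_i for each distinct value
n = sum(freq.values())                 # population size
rel_freq = {x: f / n for x, f in freq.items()}  # w_i = f_i / n

for x in sorted(freq):
    print(x, freq[x], round(rel_freq[x], 3))
```

The frequencies add up to the number of observations, and the relative frequencies add up to one, so two series of different sizes can be compared on the same scale.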

Various absolute and relative indicators are used to measure the variation of an attribute. The absolute indicators of variation include the range of variation, the mean linear deviation, the variance, and the standard deviation.

Range of variation (R) is the difference between the maximum and minimum values of the attribute in the population: R = Xmax - Xmin. This indicator gives only the most general idea of the variability of the attribute, because it reflects the difference between the extreme values only. It is unrelated to the frequencies in the variation series, that is, to the shape of the distribution, and its dependence on the extreme values alone can make it unstable and random. The range of variation says nothing about the internal structure of the population and does not allow the typicality of the resulting averages to be assessed. Its use is limited to fairly homogeneous populations; the variation of an attribute is characterized more precisely by indicators that take the variability of all its values into account.
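The limitation of the range can be seen on two made-up series that share the same extreme values but are distributed differently. The helper name `variation_range` is an assumption.

```python
import statistics

def variation_range(values):
    """Range of variation: R = Xmax - Xmin."""
    return max(values) - min(values)

clustered = [1, 5, 5, 5, 9]   # most values sit at the center
spread = [1, 1, 5, 9, 9]      # most values sit at the extremes

r_clustered = variation_range(clustered)
r_spread = variation_range(spread)
sd_clustered = statistics.pstdev(clustered)
sd_spread = statistics.pstdev(spread)

print(r_clustered, r_spread)      # identical ranges
print(sd_clustered, sd_spread)    # different standard deviations
```

The range is the same for both series, while an indicator built on all values (here the standard deviation) tells them apart.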

To characterize the variation of an attribute, the deviations of all its values from some value typical of the population have to be generalized. Indicators of variation such as the mean linear deviation, the variance, and the standard deviation are based on the deviations of the attribute values of individual units from the arithmetic mean.

Mean linear deviation is the arithmetic mean of the absolute deviations of individual variants from their arithmetic mean:

$$\bar{d} = \frac{\sum |x_i - \bar{x}|}{n}, \qquad \bar{d} = \frac{\sum |x_i - \bar{x}|\, f_i}{\sum f_i}$$

where $|x_i - \bar{x}|$ is the absolute value (modulus) of the deviation of a variant from the arithmetic mean, and $f_i$ is the frequency.

The first formula is applied if each variant occurs in the population only once, and the second is applied to series with unequal frequencies.
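Both cases (each variant occurring once, and variants with unequal frequencies) can be covered by one function. This is a sketch under the assumption that the mean linear deviation is Σ|x - mean|·f / Σf, with all frequencies equal to 1 in the simple case; the function name and data are illustrative.

```python
def mean_linear_deviation(values, freqs=None):
    """Mean absolute deviation from the arithmetic mean.

    freqs: optional frequencies; if omitted, every variant counts once.
    """
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum(abs(x - mean) * f for x, f in zip(values, freqs)) / total

simple_d = mean_linear_deviation([2, 4, 6, 8])
weighted_d = mean_linear_deviation([1, 2, 3], [1, 2, 1])
print(simple_d, weighted_d)
```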

There is another way to average the deviations of variants from the arithmetic mean. This method, which is very common in statistics, consists of squaring the deviations of the variants from the mean and then averaging the squares. The result is a new indicator of variation, the variance.

Variance (σ²) is the mean of the squared deviations of the variants from their mean value:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}, \qquad \sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i}$$

The second formula is used if the variants have their own weights (or frequencies of the variation series).

In economic and statistical analysis, the variation of an attribute is most often evaluated using the standard deviation. Standard deviation (σ) is the square root of the variance:

$$\sigma = \sqrt{\sigma^2}$$
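A short sketch of the variance and standard deviation in both the simple and the weighted form. It assumes the population (not sample) definitions, i.e. division by n or by the sum of frequencies; the function names and data are illustrative.

```python
import math

def variance(values, freqs=None):
    """Population variance; freqs are optional frequencies (weights)."""
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    return sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total

def std_dev(values, freqs=None):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values, freqs))

# The same data written out in full and as variants with frequencies.
simple_var = variance([2, 4, 4, 4, 5, 5, 7, 9])
weighted_var = variance([2, 4, 5, 7, 9], [1, 3, 2, 1, 1])
sigma = std_dev([2, 4, 4, 4, 5, 5, 7, 9])
print(simple_var, weighted_var, sigma)
```

Both forms give the same variance because the weighted series is just a compact notation for the full one.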

The mean linear and mean square deviations show how much the value of the attribute fluctuates on average for the units of the population under study, and are expressed in the same units as the variants.

In statistical practice it is often necessary to compare the variation of different attributes. For example, it is of great interest to compare the variation in the age of personnel with the variation in their qualifications, or in length of service and wages. Indicators of absolute variability, such as the mean linear deviation and the standard deviation, are not suitable for such comparisons. It is impossible to compare the variability of work experience, expressed in years, with the variability of wages, expressed in rubles and kopecks.

When comparing the variability of different attributes in a population, it is convenient to use relative indicators of variation. These indicators are calculated as the ratio of an absolute indicator to the arithmetic mean (or median). Taking the range of variation, the mean linear deviation, and the standard deviation as the absolute indicator gives, respectively, the oscillation coefficient, the relative linear deviation, and the coefficient of variation:

$$V_R = \frac{R}{\bar{x}} \cdot 100\%, \qquad V_{\bar{d}} = \frac{\bar{d}}{\bar{x}} \cdot 100\%, \qquad V_\sigma = \frac{\sigma}{\bar{x}} \cdot 100\%$$

The coefficient of variation is the most commonly used indicator of relative variability and characterizes the homogeneity of the population. The population is considered homogeneous if the coefficient of variation does not exceed 33% for distributions close to normal.
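The coefficient of variation and the 33% homogeneity rule can be sketched as follows. The function name, the threshold constant, and the data are illustrative; the population standard deviation is assumed.

```python
import statistics

HOMOGENEITY_LIMIT = 33.0  # percent, the rule of thumb stated above

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the arithmetic mean."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean * 100

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data
cv = coefficient_of_variation(data)
is_homogeneous = cv <= HOMOGENEITY_LIMIT
print(round(cv, 1), is_homogeneous)
```

Because the result is a percentage, it can be compared across attributes measured in different units, such as years of service and wages.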

Suppose you need to find the average number of days employees take to complete tasks, or the average temperature on a particular day over a 10-year period. The average of a series of numbers can be calculated in several ways.

The mean is a measure of central tendency, that is, of the location of the center of a series of numbers in a statistical distribution. The three most common measures of central tendency are the following.

    Average The arithmetic mean, calculated by adding a series of numbers and then dividing the sum by the count of those numbers. For example, the average of 2, 3, 3, 5, 7, and 10 is 30 divided by 6, which is 5.

    Median The middle number of a series of numbers. Half of the numbers have values that are greater than the median, and half of the numbers have values that are less than the median. For example, the median of 2, 3, 3, 5, 7, and 10 is 4.

    Mode The most frequently occurring number in a group of numbers. For example, the mode of 2, 3, 3, 5, 7, and 10 is 3.

For a symmetrical distribution of a series of numbers, these three measures of central tendency coincide. For an asymmetrical distribution, they can differ.
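The three measures for the series 2, 3, 3, 5, 7, 10 used in the examples above can be reproduced outside Excel with Python's standard `statistics` module:

```python
import statistics

numbers = [2, 3, 3, 5, 7, 10]

average = statistics.mean(numbers)     # 30 / 6
median = statistics.median(numbers)    # midpoint between 3 and 5
mode = statistics.mode(numbers)        # 3 occurs twice

print(average, median, mode)
```

The three values differ here because the series is not symmetrical: the large value 10 pulls the mean above the median.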

Calculate the average value of cells located continuously in one row or one column

Do the following.

Calculating the Average of Scattered Cells

To accomplish this task, use the function AVERAGE. Copy the table below onto a blank sheet.

Calculating the weighted average

To accomplish this task, use the SUMPRODUCT and SUM functions. This example calculates the average unit price paid across three purchases, where each purchase is for a different number of units at a different unit price.

Copy the table below onto a blank sheet.
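The same calculation, SUMPRODUCT of prices and quantities divided by the SUM of quantities, can be sketched in Python. The prices and quantities below are illustrative assumptions, not data from the original table.

```python
# Hypothetical purchases: unit price and number of units bought.
unit_prices = [20, 25, 35]
units = [500, 750, 200]

# Equivalent of SUMPRODUCT(prices, units) / SUM(units).
total_paid = sum(p * q for p, q in zip(unit_prices, units))
total_units = sum(units)
weighted_average_price = total_paid / total_units

simple_average_price = sum(unit_prices) / len(unit_prices)
print(round(weighted_average_price, 2), round(simple_average_price, 2))
```

The weighted average differs from the simple average of the three prices because more units were bought at the lower prices.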

Calculating the average value of numbers, ignoring zero values

To accomplish this task, use the AVERAGE and IF functions. Copy the table below; the example may be easier to understand if you copy it onto a blank sheet.
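For comparison, the idea of averaging only the non-zero values can be sketched in Python. The numbers are illustrative assumptions, not the contents of the original table.

```python
values = [10, 0, 7, 0, 27, 2]   # hypothetical data containing zeros

# Keep only non-zero values, then average them.
non_zero = [v for v in values if v != 0]
average_ignoring_zeros = sum(non_zero) / len(non_zero) if non_zero else None

average_with_zeros = sum(values) / len(values)
print(average_ignoring_zeros, average_with_zeros)
```

If every value is zero there is nothing to average, so the sketch returns `None` instead of dividing by zero.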