
Mean (standard) sampling error: an explanation

The mean sampling error shows how much, on average, a parameter of the sample population deviates from the corresponding parameter of the general population. If we average the errors of all possible samples of a given type and size ($n$) drawn from the same general population, we obtain their generalizing characteristic: the mean sampling error ($\mu$).

Sampling theory provides formulas for $\mu$ that differ according to the selection method (repeated or non-repeated), the type of sample used, and the kind of statistical indicator being estimated.

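For example, if repeated random sampling is used, $\mu$ is defined as follows:

- when estimating the mean value of a feature: $\mu_{\tilde{x}} = \sqrt{\dfrac{\sigma^2}{n}}$;

- if the feature is alternative and a share is estimated: $\mu_w = \sqrt{\dfrac{w(1-w)}{n}}$.

Here $\sigma^2$ is the variance of the feature, $w$ is the sample share, $n$ is the sample size and $N$ is the size of the general population.

In the case of non-repeated random selection, the formulas are corrected by the factor $(1 - n/N)$:

- for the mean value of the attribute: $\mu_{\tilde{x}} = \sqrt{\dfrac{\sigma^2}{n}\left(1 - \dfrac{n}{N}\right)}$;

- for a share: $\mu_w = \sqrt{\dfrac{w(1-w)}{n}\left(1 - \dfrac{n}{N}\right)}$.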
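The mean sampling error for repeated and non-repeated selection can be sketched in a few lines of Python. This is a minimal illustration, not part of the source: the function names and the numbers in the usage lines are hypothetical, and passing `N=None` is simply the convention chosen here to mean repeated selection.

```python
import math


def mean_error_of_mean(variance, n, N=None):
    """Mean sampling error of the sample mean.

    N=None means repeated selection; otherwise the finite-population
    correction (1 - n/N) for non-repeated selection is applied.
    """
    mu_squared = variance / n
    if N is not None:
        mu_squared *= 1 - n / N
    return math.sqrt(mu_squared)


def mean_error_of_share(w, n, N=None):
    """Mean sampling error of a share; the variance of an alternative feature is w(1 - w)."""
    return mean_error_of_mean(w * (1 - w), n, N)


# Hypothetical numbers, for illustration only.
mu_repeated = mean_error_of_mean(25.0, 100)            # sqrt(25 / 100)
mu_non_repeated = mean_error_of_mean(25.0, 100, 400)   # same, times sqrt(1 - 100/400)
mu_share = mean_error_of_share(0.5, 100)               # sqrt(0.25 / 100)
```

For the same variance and sample size, the non-repeated value is always the smaller of the two, because the correction factor is below one.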

The probability that the actual sampling error does not exceed the mean error $\mu$ is 0.683. In practice a higher probability is usually preferred, but raising the probability increases the size of the error bound.

The marginal sampling error ($\Delta$) equals $t$ mean sampling errors (in sampling theory the coefficient $t$ is called the confidence coefficient):

$$\Delta = t\mu$$

If the mean error is doubled ($t = 2$), the probability that the sampling error will not exceed this limit (twice the mean error) rises to 0.954. With $t = 3$ the confidence level is 0.997, which is practically certainty.

The size of the marginal sampling error depends on the following factors:

  • the degree of variation of units of the general population;
  • sample size;
  • the selection scheme chosen (non-repeated selection gives a smaller error);
  • confidence level.

If the sample size exceeds 30, the value of $t$ is taken from the normal distribution table; if it is smaller, from the Student's t-distribution table.

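Here are some values of the confidence coefficient from the normal distribution table.

| Confidence coefficient $t$ | 1.00 | 1.96 | 2.00 | 2.58 | 3.00 |
| --- | --- | --- | --- | --- | --- |
| Confidence probability | 0.683 | 0.950 | 0.954 | 0.990 | 0.997 |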
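The link between the confidence coefficient and the confidence probability can be checked with Python's standard library. The sketch below assumes a normal sampling distribution, so the probability is $2\Phi(t) - 1$; the function names are illustrative and do not come from the source.

```python
from statistics import NormalDist


def confidence_probability(t):
    """Probability that the sampling error stays within t mean errors: 2 * Phi(t) - 1."""
    return 2 * NormalDist().cdf(t) - 1


def confidence_coefficient(p):
    """Inverse lookup: the t whose two-sided probability equals p."""
    return NormalDist().inv_cdf((1 + p) / 2)


for t in (1, 2, 3):
    print(t, round(confidence_probability(t), 3))
```

Calling `confidence_coefficient(0.95)` gives the value of about 1.96 that printed tables list for a probability of 0.95.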

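The confidence interval for the mean value of the attribute and for the share in the general population is set as follows:

$$\tilde{x} - \Delta_{\tilde{x}} \le \bar{x} \le \tilde{x} + \Delta_{\tilde{x}}, \qquad w - \Delta_w \le p \le w + \Delta_w,$$

where $\tilde{x}$ and $w$ are the sample mean and share, and $\bar{x}$ and $p$ are the general mean and share.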

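Determining the boundaries of the general mean and share therefore consists of the following steps:

  1. calculate the sample mean (or share) and the sample variance;
  2. calculate the mean sampling error for the selection scheme used;
  3. set the confidence probability and determine the marginal sampling error;
  4. calculate the confidence limits for the general mean (or share).

Sampling errors for various types of selection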

  1. Simple random and mechanical sampling. The mean error of simple random and mechanical samples is found using the formulas presented in Table 11.3.

Example 11.2. To study the level of return on assets, a sample survey of 90 enterprises out of 225 was conducted by repeated random selection, which produced the data presented in the table.

In this example we have a 40% sample (90 : 225 = 0.4, or 40%). Let us determine its marginal error and the boundaries for the mean value of the feature in the general population, following the steps of the algorithm:

  1. Based on the results of the sample survey, we calculate the mean value and variance in the sample population:

Table 11.5.

| Return on assets, rub., $x_i$ (observed) | Number of enterprises, $f_i$ (observed) | Midpoint of the interval, $x_i'$ (calculated) | $x_i' f_i$ (calculated) | $x_i'^2 f_i$ (calculated) |
| --- | --- | --- | --- | --- |
| Up to 1.4 | 13 | 1.3 | 16.9 | 21.97 |
| 1.4-1.6 | 15 | 1.5 | 22.5 | 33.75 |
| 1.6-1.8 | 17 | 1.7 | 28.9 | 49.13 |
| 1.8-2.0 | 15 | 1.9 | 28.5 | 54.15 |
| 2.0-2.2 | 16 | 2.1 | 33.6 | 70.56 |
| 2.2 and up | 14 | 2.3 | 32.2 | 74.06 |
| Total | 90 | - | 162.6 | 303.62 |

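Sample mean:

$$\tilde{x} = \frac{\sum x_i' f_i}{\sum f_i} = \frac{162.6}{90} \approx 1.81 \text{ rub.}$$

Sample variance of the trait under study:

$$\sigma^2 = \frac{\sum x_i'^2 f_i}{\sum f_i} - \tilde{x}^2 = \frac{303.62}{90} - \left(\frac{162.6}{90}\right)^2 \approx 0.1095.$$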

For our data, let us determine the marginal sampling error with, for example, a probability of 0.954. From the table of values of the normal distribution function (see the extract in Appendix 1), we find the confidence coefficient $t$ corresponding to a probability of 0.954: $t = 2$. With the sample variance $\sigma^2 \approx 0.1095$ computed from Table 11.5, the mean error under repeated selection is $\mu_{\tilde{x}} = \sqrt{0.1095/90} \approx 0.035$ rub., so the marginal error is $\Delta = 2 \cdot 0.035 = 0.07$ rub.

Thus, in 954 cases out of 1000, the average return on assets in the general population will be no less than 1.74 rubles and no more than 1.88 rubles.

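Above, a repeated random selection scheme was used. Let us see whether the results of the survey change if we assume that selection was carried out according to the non-repeated scheme. In this case, the mean error is calculated using the formula

$$\mu_{\tilde{x}} = \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)} = \sqrt{\frac{0.1095}{90}\,(1 - 0.4)} \approx 0.027 \text{ rub.}$$

Then, with a probability of 0.954, the marginal sampling error will be:

$$\Delta = t\mu_{\tilde{x}} = 2 \cdot 0.027 = 0.054 \text{ rub.}$$

The confidence limits for the mean value of the feature under non-repeated random selection will be:

$$1.81 - 0.054 \le \bar{x} \le 1.81 + 0.054, \qquad 1.76 \le \bar{x} \le 1.86.$$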

Comparing the results of the two selection schemes, we can conclude that non-repeated random sampling gives more accurate results than repeated selection at the same confidence level. Moreover, the larger the sample size relative to the general population, the more the boundaries of the mean narrow when moving from one selection scheme to the other.
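The calculation in Example 11.2 can be reproduced from the grouped data of Table 11.5. The sketch below is one possible way to do it; the variable names are illustrative, and the variance is computed without intermediate rounding, so the third decimal place may differ slightly from hand calculations.

```python
import math

# Interval midpoints and frequencies from Table 11.5.
midpoints = [1.3, 1.5, 1.7, 1.9, 2.1, 2.3]
freqs = [13, 15, 17, 15, 16, 14]
N = 225   # size of the general population
t = 2     # confidence coefficient for a probability of 0.954

n = sum(freqs)
mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
variance = sum(x * x * f for x, f in zip(midpoints, freqs)) / n - mean ** 2

# Mean error under repeated and non-repeated selection.
mu_repeated = math.sqrt(variance / n)
mu_non_repeated = math.sqrt(variance / n * (1 - n / N))

limits_repeated = (mean - t * mu_repeated, mean + t * mu_repeated)
limits_non_repeated = (mean - t * mu_non_repeated, mean + t * mu_non_repeated)

print(round(mean, 2), [round(v, 2) for v in limits_repeated])
print([round(v, 2) for v in limits_non_repeated])
```

The non-repeated interval lies inside the repeated one, which is the narrowing described in the text.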

Using the same example, let us determine the boundaries of the share of enterprises with a return on assets not exceeding 2.0 rubles in the general population:

  1. Calculate the sample share.

The number of enterprises in the sample with a return on assets not exceeding 2.0 rubles is 60. Then

m = 60, n = 90, w = m/n = 60 : 90 = 0.667;

  2. Calculate the variance of the share in the sample population:

$$\sigma_w^2 = w(1 - w) = 0.667 \cdot 0.333 = 0.222;$$

  3. The mean sampling error under repeated selection will be

$$\mu_w = \sqrt{\frac{w(1-w)}{n}} = \sqrt{\frac{0.222}{90}} \approx 0.05.$$

If we assume that a non-repeated selection scheme was used, then the mean sampling error, taking into account the correction for the finiteness of the population, will be

$$\mu_w = \sqrt{\frac{w(1-w)}{n}\left(1 - \frac{n}{N}\right)} = \sqrt{\frac{0.222}{90}\,(1 - 0.4)} \approx 0.04;$$

  4. Set the confidence probability and determine the marginal sampling error.

With a probability of P = 0.997, the normal distribution table (see the extract in Appendix 1) gives the confidence coefficient t = 3:

$$\Delta_w = t\mu_w = 3 \cdot 0.04 = 0.12.$$

Thus, with a probability of 0.997, it can be asserted that in the general population the share of enterprises with a return on assets not exceeding 2.0 rubles is no less than 54.7% and no more than 78.7%.
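The share calculation can be sketched the same way. One detail is an inference rather than something the source states: the quoted limits of 54.7% and 78.7% are reproduced only if the non-repeated mean error is rounded to 0.04 before being multiplied by $t$, so the sketch shows both the rounded and the unrounded variant. Variable names are illustrative.

```python
import math

m, n, N = 60, 90, 225   # enterprises with the feature, sample size, population size
t = 3                   # confidence coefficient for a probability of 0.997

w = m / n
mu_repeated = math.sqrt(w * (1 - w) / n)
mu_non_repeated = math.sqrt(w * (1 - w) / n * (1 - n / N))

# Unrounded marginal error.
delta_exact = t * mu_non_repeated

# Marginal error with the mean error rounded to two decimals first
# (assumed to be how the figures in the text were obtained).
delta_rounded = t * round(mu_non_repeated, 2)
limits_rounded = (w - delta_rounded, w + delta_rounded)

print(round(delta_exact, 3), [round(v, 3) for v in limits_rounded])
```

Without the intermediate rounding the interval is slightly narrower, which is a reminder that rounding the mean error early changes the reported limits.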

  2. Typical (stratified) sample. In a typical sample, the general population of objects is divided into k groups, so that

N 1 + N 2 + ... + N i + ... + N k = N.

The number of units drawn from each typical group depends on the selection method adopted; their total forms the required sample size

n 1 + n 2 + … + n i + … + n k = n.

There are two ways to organize selection within typical groups: in proportion to the size of the typical groups, and in proportion to the degree of variation of the attribute among the units of observation in the groups. Consider the first, as the most commonly used.

Selection proportional to the size of the typical groups assumes that the following number of units is selected from each of them:

$$n_i = n\,\frac{N_i}{N},$$

where $n_i$ is the number of units drawn for the sample from the i-th typical group;

$n$ is the total sample size;

$N_i$ is the number of units of the general population in the i-th typical group;

$N$ is the total number of units in the general population.

The selection of units within groups occurs in the form of random or mechanical sampling.

Formulas for estimating the mean sampling error for the mean and the share are presented in Table 11.6.

Here, $\overline{\sigma_i^2}$ is the average of the within-group variances of the typical groups.

Example 11.3. A sample survey of students was conducted at one of the Moscow universities to determine the average number of visits to the university library per student per semester. A 5% non-repeated typical sample was used, with typical groups corresponding to the course (year of study). With selection proportional to the size of the typical groups, the following data were obtained:

Table 11.7.

| Course number | Total students, persons, $N_i$ | Surveyed in the sample, persons, $n_i$ | Average number of library visits per student per semester, $\tilde{x}_i$ | Within-group sample variance, $\sigma_i^2$ |
| --- | --- | --- | --- | --- |
| 1 | 650 | 33 | 11 | 6 |
| 2 | 610 | 31 | 8 | 15 |
| 3 | 580 | 29 | 5 | 18 |
| 4 | 360 | 18 | 6 | 24 |
| 5 | 350 | 17 | 10 | 12 |
| Total | 2,550 | 128 | 8 | - |

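The number of students to be surveyed in each course is calculated as follows:

$$n_1 = n\,\frac{N_1}{N} = 128 \cdot \frac{650}{2550} \approx 33 \text{ persons},$$

and similarly for the other groups, giving the values $n_i$ shown in Table 11.7.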
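The allocation in Table 11.7 can be reproduced programmatically. The sketch below rests on two assumptions that are not in the source. First, fractional allocations are turned into whole numbers with the largest-remainder rule, chosen here because it keeps the total at exactly 128 and happens to reproduce the table. Second, the mean error of the typical sample uses the standard non-repeated formula $\sqrt{\overline{\sigma_i^2}/n \cdot (1 - n/N)}$ with the within-group variances weighted by $n_i$, which is assumed to correspond to Table 11.6.

```python
import math

group_sizes = [650, 610, 580, 360, 350]   # N_i from Table 11.7
within_variances = [6, 15, 18, 24, 12]    # sigma_i^2 from Table 11.7
n_total = 128


def proportional_allocation(sizes, n):
    """Split n across groups in proportion to their sizes (largest-remainder rounding)."""
    total = sum(sizes)
    exact = [n * size / total for size in sizes]
    alloc = [math.floor(value) for value in exact]
    shortfall = n - sum(alloc)
    # Hand the remaining units to the groups with the largest fractional parts.
    order = sorted(range(len(sizes)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[:shortfall]:
        alloc[i] += 1
    return alloc


allocation = proportional_allocation(group_sizes, n_total)

# Average of the within-group variances, weighted by the group sample sizes.
avg_within = sum(v * n_i for v, n_i in zip(within_variances, allocation)) / n_total

N = sum(group_sizes)
mu_typical = math.sqrt(avg_within / n_total * (1 - n_total / N))

print(allocation, round(mu_typical, 2))
```

Plain rounding of each $n \cdot N_i / N$ would give 129 students in total, which is why some tie-breaking rule is needed.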

The sample means are normally distributed (or approximately so) for n > 100, regardless of the distribution of the general population. For small samples, however, a different distribution law applies: Student's distribution. In this case the confidence coefficient is found from the Student's t-distribution table, depending on the confidence probability P and the sample size n. Appendix 1 provides a fragment of the Student's t-distribution table, presented as the dependence of the confidence probability on the sample size and the confidence coefficient t.

Example 11.4. Suppose that a sample survey of eight students of the academy showed that, in preparing for a test in statistics, they spent the following numbers of hours: 8.5; 8.0; 7.8; 9.0; 7.2; 6.2; 8.4; 6.6.
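A small-sample interval for these eight observations might be computed as sketched below. The source does not state the confidence probability for this example, so 0.95 is an assumption, and the critical value 2.365 (two-sided, 7 degrees of freedom) is taken from a standard Student's t table rather than computed. The mean error is $s/\sqrt{n}$ with $s$ using the $n - 1$ divisor.

```python
import math
import statistics

hours = [8.5, 8.0, 7.8, 9.0, 7.2, 6.2, 8.4, 6.6]

n = len(hours)
mean = statistics.mean(hours)
s = statistics.stdev(hours)   # sample standard deviation with the n - 1 divisor
mu = s / math.sqrt(n)         # mean error of the small sample

# Assumed: confidence probability 0.95 with n - 1 = 7 degrees of freedom,
# critical value read from a standard Student's t table.
T_CRITICAL = 2.365

delta = T_CRITICAL * mu
limits = (mean - delta, mean + delta)

print(round(mean, 2), round(mu, 3), [round(v, 1) for v in limits])
```

With a different confidence probability only `T_CRITICAL` changes; the rest of the calculation stays the same.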

Example 11.5. Let us calculate how many of 507 industrial enterprises the tax inspectorate should check in order to determine, with a probability of 0.997, the share of enterprises with tax violations. According to a previous similar survey, the standard deviation was 0.15; the sampling error is expected to be no higher than 0.05.

With repeated random selection, it is necessary to check

$$n = \frac{t^2\sigma^2}{\Delta^2} = \frac{3^2 \cdot 0.15^2}{0.05^2} = 81 \text{ enterprises}.$$

With non-repeated random selection, it is necessary to check

$$n = \frac{t^2\sigma^2 N}{\Delta^2 N + t^2\sigma^2} = \frac{3^2 \cdot 0.15^2 \cdot 507}{0.05^2 \cdot 507 + 3^2 \cdot 0.15^2} \approx 70 \text{ enterprises}.$$

As you can see, non-repeated sampling allows fewer objects to be surveyed.

Example 11.6. A survey of wages at the enterprises of an industry is planned, using random non-repeated selection. What should the sample size be if, at the time of the survey, 100,000 people were employed in the industry? The marginal sampling error should not exceed 100 rubles with a probability of 0.954. From previous surveys of wages in the industry, it is known that the standard deviation is 500 rubles.

$$n = \frac{t^2\sigma^2 N}{\Delta^2 N + t^2\sigma^2} = \frac{2^2 \cdot 500^2 \cdot 100\,000}{100^2 \cdot 100\,000 + 2^2 \cdot 500^2} \approx 100.$$

Therefore, to solve the problem, at least 100 people must be included in the sample.
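The sample sizes in Examples 11.5 and 11.6 follow from solving $\Delta = t\mu$ for $n$. A minimal sketch, with a hypothetical function name and the result rounded up to a whole unit:

```python
import math


def required_sample_size(t, sigma, delta, N=None):
    """Sample size that keeps the marginal error within delta.

    N=None gives the repeated-selection formula t^2 sigma^2 / delta^2;
    otherwise the non-repeated formula with population size N is used.
    """
    base = (t * sigma) ** 2
    if N is None:
        n = base / delta ** 2
    else:
        n = base * N / (delta ** 2 * N + base)
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round(n, 9))


n_11_5_repeated = required_sample_size(3, 0.15, 0.05)
n_11_5_non_repeated = required_sample_size(3, 0.15, 0.05, N=507)
n_11_6 = required_sample_size(2, 500, 100, N=100_000)

print(n_11_5_repeated, n_11_5_non_repeated, n_11_6)
```

In Example 11.6 the population is so large relative to the sample that the non-repeated correction changes the result by less than one unit.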

The discrepancy between the values of indicators obtained from the sample and the corresponding parameters of the general population is called the representativeness error. A distinction is made between systematic and random sampling errors.

Random errors are explained by the insufficiently uniform representation in the sample of the different categories of units of the general population.

Systematic errors may be associated with a violation of the selection rules or the conditions for the implementation of the sample.

Thus, for more than 40 years the sampling frame for household budget surveys was built on the territorial-sectoral selection principle, which followed from the main goal of the budget survey: to characterize the standard of living of workers, employees and collective farmers. The sample was distributed among the regions and sectors of the economy of the RSFSR in proportion to the total number of employed; to create the sectoral sample, a typical sample was used with mechanical selection of units within groups.

The main selection criterion was the average monthly salary. The principle of selection ensured proportional representation in the sample set of workers with different levels of wages.

With the emergence of new social groups (entrepreneurs, farmers, the unemployed), the representativeness of the sample was undermined not only by divergence from the structure of the general population, but also by a systematic error arising from the mismatch between the sampling unit (employee) and the observation unit (household). A household with more than one working member was more likely to be selected than a household with one worker. Families not employed in the surveyed sectors (pensioner households, self-employed households, etc.) fell outside the range of selected units. The accuracy of the results (boundaries of confidence intervals, sampling errors) was difficult to assess, since probabilistic models were not used in constructing the sample.

In 1996–1997 a fundamentally new approach to the sampling of households was introduced. The data of the 1994 population microcensus served as its basis. The general population for selection comprised all types of households except collective households, and the sample began to be organized so as to represent the composition and types of households within each subject of the Russian Federation.

The measurement of representativeness errors of sample indicators is based on the assumption that they are randomly distributed over an infinitely large number of samples.

The reliability of a sample indicator is quantified in order to draw conclusions about the corresponding general characteristic. This is done either on the basis of the sample indicator, taking into account its random error, or on the basis of a certain hypothesis (about the mean, the variance, the nature of the distribution, or a relationship) concerning the properties of the general population.

To test the hypothesis, the consistency of empirical data with hypothetical data is evaluated.

The magnitude of the random representativeness error depends on:

  • the sample size;
  • the degree of variation of the studied trait in the general population;
  • the accepted method of forming the sample population.

A distinction is made between the mean (standard) and the marginal sampling error.

The mean error characterizes the measure of deviation of sample indicators from the corresponding indicators of the general population.

The marginal error is taken to be the maximum possible discrepancy between the sample and general characteristics, i.e. the maximum error for a given probability of its occurrence.

From the sample population, various indicators (parameters) of the general population can be estimated. The most commonly used estimates are:

The basic principle of the sampling method is to give all units of the general population an equal chance of being selected into the sample. With this approach, the requirement of random, objective selection is met and, therefore, the sampling error is determined primarily by the sample size ($n$). As the sample size increases, the mean error decreases and the characteristics of the sample population approach those of the general population.

With samples of the same size and other conditions being equal, the sampling error will be smaller for the sample drawn from the general population with less variation in the studied trait. A decrease in the variation of a trait means a decrease in the variance ($\sigma^2$ for a quantitative trait or $w(1-w)$ for an alternative trait).

The dependence of the size of the sampling error on the methods of forming the sample population is determined by the formulas for the average sampling error (Table 5.2).

Let us supplement the indicators of Table 5.2 with the following explanations.

The sample variance ($\sigma^2$) is slightly less than the general variance ($\sigma_0^2$). Mathematical statistics has proved that

$$\sigma_0^2 = \sigma^2\,\frac{n}{n-1}.$$

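Table 5.2

Formulas for calculating the mean sampling error for various selection methods

| Sample type | Repeated selection, for the mean | Repeated selection, for the share | Non-repeated selection, for the mean | Non-repeated selection, for the share |
| --- | --- | --- | --- | --- |
| Simple random | $\sqrt{\dfrac{\sigma^2}{n}}$ | $\sqrt{\dfrac{w(1-w)}{n}}$ | $\sqrt{\dfrac{\sigma^2}{n}\left(1-\dfrac{n}{N}\right)}$ | $\sqrt{\dfrac{w(1-w)}{n}\left(1-\dfrac{n}{N}\right)}$ |
| Serial (with equal series) | $\sqrt{\dfrac{\delta_x^2}{r}}$ | $\sqrt{\dfrac{\delta_w^2}{r}}$ | $\sqrt{\dfrac{\delta_x^2}{r}\left(1-\dfrac{r}{R}\right)}$ | $\sqrt{\dfrac{\delta_w^2}{r}\left(1-\dfrac{r}{R}\right)}$ |
| Typical (in proportion to the size of the groups) | $\sqrt{\dfrac{\overline{\sigma_i^2}}{n}}$ | $\sqrt{\dfrac{\overline{w_i(1-w_i)}}{n}}$ | $\sqrt{\dfrac{\overline{\sigma_i^2}}{n}\left(1-\dfrac{n}{N}\right)}$ | $\sqrt{\dfrac{\overline{w_i(1-w_i)}}{n}\left(1-\dfrac{n}{N}\right)}$ |

Here $\delta_x^2$ and $\delta_w^2$ are the between-series variances, $r$ is the number of selected series and $R$ is the total number of series in the general population.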

If the sample is large (i.e. $n$ is large enough), the ratio $n/(n-1)$ approaches unity and the sample variance practically coincides with the general variance.

A sample is considered unconditionally large when $n > 100$ and unconditionally small when $n < 30$. When evaluating the results of a small sample, this relationship between the sample and general variances should be taken into account.

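The between-series variances can be calculated using the following formulas:

$$\delta_x^2 = \frac{\sum_{i=1}^{r} (\tilde{x}_i - \tilde{x})^2}{r},$$

where $\tilde{x}_i$ is the mean of the i-th series and $\tilde{x}$ is the overall mean for the entire sample;

$$\delta_w^2 = \frac{\sum_{i=1}^{r} (w_i - w)^2}{r},$$

where $w_i$ is the proportion of units of a certain category in the i-th series, $w$ is the share of units of this category in the entire sample, and $r$ is the number of selected series.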
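For a serial sample with equal series, the between-series variance and the resulting mean error might be computed as sketched below. The series means and the number of series in the population are hypothetical illustration values, and the non-repeated formula $\sqrt{\delta^2/r \cdot (1 - r/R)}$ is the standard one, assumed here to match the table referred to in the text.

```python
import math


def between_series_variance(series_values):
    """Average squared deviation of series means (or shares) from the overall value.

    With equal-sized series, the overall mean is the simple mean of the series means.
    """
    r = len(series_values)
    overall = sum(series_values) / r
    return sum((value - overall) ** 2 for value in series_values) / r


def serial_mean_error(delta_squared, r, R=None):
    """Mean error of a serial sample; R=None means repeated selection of series."""
    mu_squared = delta_squared / r
    if R is not None:
        mu_squared *= 1 - r / R
    return math.sqrt(mu_squared)


# Hypothetical data: means of 3 selected series out of 30 series in the population.
series_means = [10.0, 12.0, 14.0]
delta_sq = between_series_variance(series_means)
mu_serial = serial_mean_error(delta_sq, r=len(series_means), R=30)

print(round(delta_sq, 3), round(mu_serial, 3))
```

The same two functions work for shares if `series_values` holds the shares $w_i$ instead of the means.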

4. To determine the mean error of a typical sample when units are selected in proportion to the size of each group, the average of the within-group variances is used ($\overline{\sigma_i^2}$ for a quantitative characteristic, $\overline{w_i(1-w_i)}$ for an alternative characteristic). By the rule of addition of variances, the average of the within-group variances is less than the total variance. Consequently, the mean error of a typical sample is smaller than the error of a simple random sample.

Combined selection is often used: individual selection of units is combined with group selection, and typical selection is combined with serial selection. With any selection method, it can be asserted with a certain probability that the deviation of the sample mean (or share) from the general mean (or share) will not exceed a certain value, which is called the marginal sampling error.

The relationship between the marginal sampling error ($\Delta$), guaranteed with some probability $F(t)$, and the mean sampling error has the form $\Delta_{\tilde{x}} = t\mu_{\tilde{x}}$ or $\Delta_w = t\mu_w$, where $t$ is the confidence coefficient, determined by the level of probability $F(t)$.

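The values of the function $F(t)$ and of $t$ are determined from specially compiled mathematical tables. Here are some of the most commonly used:

| $t$ | 1.00 | 1.96 | 2.00 | 2.58 | 3.00 |
| --- | --- | --- | --- | --- | --- |
| $F(t)$ | 0.683 | 0.950 | 0.954 | 0.990 | 0.997 |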

Thus, the marginal sampling error answers the question of sampling accuracy with a certain probability, whose value depends on the confidence coefficient $t$. For example, at $t = 1$ the probability $F(t)$ that the sample characteristics deviate from the general ones by no more than one mean error is 0.683. Consequently, on average, out of every 1000 samples, 683 will give summary indicators (mean, share) that differ from the general ones by no more than one mean error. At $t = 2$ the probability $F(t)$ is 0.954, which means that out of every 1000 samples, 954 will give summary indicators that differ from the general ones by no more than twice the mean sampling error, and so on.

In addition to the absolute value of the marginal sampling error, the relative error is calculated, defined as the ratio, in percent, of the marginal sampling error to the corresponding characteristic of the sample population:

$$\Delta_{\%} = \frac{\Delta_{\tilde{x}}}{\tilde{x}} \cdot 100\% \quad \text{or} \quad \Delta_{\%} = \frac{\Delta_w}{w} \cdot 100\%.$$

In practice, it is customary to set the value of ∆, as a rule, within 10% of the expected average level of the attribute.

Calculating the mean and marginal sampling errors makes it possible to determine the limits within which the characteristics of the general population lie:

$$\tilde{x} - \Delta_{\tilde{x}} \le \bar{x} \le \tilde{x} + \Delta_{\tilde{x}}, \qquad w - \Delta_w \le p \le w + \Delta_w.$$

The limits within which, with a given probability, the unknown value of the indicator in the general population lies are called the confidence interval, and the probability $F(t)$ is called the confidence probability. The larger the value of ∆, the wider the confidence interval and hence the lower the accuracy of the estimate.

Consider the following example. To determine the average size of a deposit in a bank, 200 foreign currency accounts of depositors were selected by repeated random sampling. The average deposit was found to be 60 thousand rubles, with a variance of 32. In addition, 40 of the accounts turned out to be demand accounts. It is required, with a probability of 0.954, to determine the limits within which the average deposit on foreign currency accounts in the bank and the share of demand accounts lie.

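Calculate the mean error of the sample mean using the formula for repeated selection:

$$\mu_{\tilde{x}} = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{32}{200}} = 0.4 \text{ thousand rubles}.$$

The marginal error of the sample mean with a probability of 0.954 will be

$$\Delta_{\tilde{x}} = t\mu_{\tilde{x}} = 2 \cdot 0.4 = 0.8 \text{ thousand rubles}.$$

Consequently, the average deposit on foreign currency accounts lies within the following limits (thousand rubles):

$$60 - 0.8 \le \bar{x} \le 60 + 0.8, \qquad 59.2 \le \bar{x} \le 60.8.$$

With a probability of 0.954, it can be asserted that the average deposit on foreign currency accounts in the bank ranges from 59,200 to 60,800 rubles.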

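Let us determine the share of demand deposits in the sample population:

$$w = \frac{40}{200} = 0.2.$$

The mean error of the sample share:

$$\mu_w = \sqrt{\frac{w(1-w)}{n}} = \sqrt{\frac{0.2 \cdot 0.8}{200}} \approx 0.028.$$

The marginal error of the share with a probability of 0.954 will be

$$\Delta_w = t\mu_w = 2 \cdot 0.028 = 0.056.$$

Thus, the share of demand accounts in the general population lies within the limits $w \pm \Delta_w$:

$$0.2 - 0.056 \le p \le 0.2 + 0.056, \qquad 0.144 \le p \le 0.256.$$

With a probability of 0.954, it can be asserted that the share of demand accounts in the total number of foreign currency accounts in the bank ranges from 14.4 to 25.6%.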
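The bank example can be checked with a short script. The variable names are illustrative. Note that the script does not round the mean error of the share before multiplying by $t$, so its limits for the share (about 14.3% to 25.7%) differ in the last digit from the hand calculation, which appears to round the mean error to 0.028 first.

```python
import math

n = 200                # accounts in the sample (repeated selection)
t = 2                  # confidence coefficient for a probability of 0.954
mean_deposit = 60.0    # thousand rubles
variance = 32.0
demand_accounts = 40

# Interval for the mean deposit.
mu_mean = math.sqrt(variance / n)
delta_mean = t * mu_mean
mean_limits = (mean_deposit - delta_mean, mean_deposit + delta_mean)

# Interval for the share of demand accounts.
w = demand_accounts / n
mu_share = math.sqrt(w * (1 - w) / n)
delta_share = t * mu_share
share_limits = (w - delta_share, w + delta_share)

# Relative marginal error of the mean, in percent.
relative_error_pct = delta_mean / mean_deposit * 100

print([round(v, 1) for v in mean_limits], [round(v, 3) for v in share_limits])
print(round(relative_error_pct, 2))
```

The relative error of roughly 1.3% is well inside the 10% guideline mentioned earlier in the text.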

In applied research it is important to establish the optimal balance between the reliability of the results and the size of the acceptable sampling error. Accordingly, when a sample observation is organized, the question arises of determining the sample size needed to obtain the required accuracy with a given probability. The required sample size is calculated from the formulas for the marginal sampling error, in accordance with the type and method of selection (Table 5.3).

Table 5.3

Formulas for calculating the sample size under simple random selection

| Selection | For the mean | For the share |
| --- | --- | --- |
| Repeated | $n = \dfrac{t^2\sigma^2}{\Delta^2}$ | $n = \dfrac{t^2 w(1-w)}{\Delta^2}$ |
| Non-repeated | $n = \dfrac{t^2\sigma^2 N}{\Delta^2 N + t^2\sigma^2}$ | $n = \dfrac{t^2 w(1-w) N}{\Delta^2 N + t^2 w(1-w)}$ |

Let's continue the example, which presents the results of a sample survey of personal accounts of bank depositors.

It is required to determine how many accounts must be examined so that, with a probability of 0.977, the error in determining the average deposit amount does not exceed 1.5 thousand rubles. From the formula for the marginal sampling error under repeated selection, $\Delta = t\sqrt{\sigma^2/n}$, we express the sample size:

$$n = \frac{t^2\sigma^2}{\Delta^2},$$

where $\sigma^2 = 32$, $\Delta = 1.5$ thousand rubles, and $t$ is the confidence coefficient corresponding to the given probability.

When the required sample size is determined from the above formulas, the difficulty lies in finding the values of $\sigma^2$ and $w$, since these can be obtained only after a sample survey has been carried out. For this reason, approximate values are substituted for the actual ones, determined from trial sample observations or from analogous previous surveys.

In cases where the statistician knows the average value of the characteristic being studied (for example, from instructions, legislative acts, etc.) or the limits within which it varies, the following approximate formulas can be applied:

$$\sigma \approx \frac{\bar{x}}{3}, \qquad \sigma \approx \frac{x_{\max} - x_{\min}}{6},$$

and the product $w(1 - w)$ should be replaced by the value 0.25 ($w = 0.5$).

To obtain a more reliable result, the maximum possible value of these indicators is taken. If the distribution of a trait in the general population obeys the normal law, the range of variation $R = x_{\max} - x_{\min}$ is approximately equal to $6\sigma$ (the extreme values lie at a distance of $3\sigma$ on either side of the mean). Hence $\sigma \approx R/6$; but if the distribution is obviously asymmetric, then $\sigma \approx R/5$.

With any type of sample, the calculation of its size begins with the formula for repeated selection:

$$n = \frac{t^2\sigma^2}{\Delta^2}.$$

If, as a result of the calculation, the sampling fraction ($n/N$) exceeds 5%, the calculation is repeated using the formula for non-repeated selection.
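The two-stage rule above can be expressed as a small helper. The function name and the `threshold` parameter are hypothetical; the 5% cut-off is the one stated in the text, and the two calls reuse the figures of Examples 11.5 and 11.6.

```python
import math


def plan_sample_size(t, sigma, delta, N, threshold=0.05):
    """Start with the repeated-selection formula; switch to the
    non-repeated formula if the sampling fraction exceeds the threshold."""
    base = (t * sigma) ** 2
    n_repeated = math.ceil(round(base / delta ** 2, 9))
    if n_repeated / N <= threshold:
        return n_repeated, "repeated"
    n_non_repeated = math.ceil(round(base * N / (delta ** 2 * N + base), 9))
    return n_non_repeated, "non-repeated"


# 81 of 507 enterprises is about 16% of the population, so the correction applies.
tax_check = plan_sample_size(3, 0.15, 0.05, N=507)

# 100 of 100,000 employees is 0.1%, so the repeated formula is kept.
wage_survey = plan_sample_size(2, 500, 100, N=100_000)

print(tax_check, wage_survey)
```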

For a typical sample, the total sample size must be divided among the selected types of units. The number of observations taken from each group depends on the organizational forms of the typical sample named earlier.

When units are selected without regard to the size of the groups, the total number of selected units is divided by the number of groups, which gives the number selected from each typical group:

$$n_i = \frac{n}{k},$$

where $k$ is the number of identified typical groups.

When units are selected in proportion to the size of the typical groups, the number of observations for each group is determined by the formula

$$n_i = n\,\frac{N_i}{N},$$

where $n_i$ is the sample size from the i-th group and $N_i$ is the size of the i-th group.

When selection takes the variation of the trait into account, the share of the sample taken from each group should be proportional to the standard deviation in that group ($\sigma_i$). The number of units ($n_i$) is calculated by the formula

$$n_i = n\,\frac{N_i\sigma_i}{\sum_{i=1}^{k} N_i\sigma_i},$$

where, for a share, $\sigma_i$ is replaced by $\sqrt{w_i(1 - w_i)}$.
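The three allocation rules can be compared side by side. In the sketch below, the group sizes and within-group variances are borrowed from Table 11.7 purely for illustration, the function names are hypothetical, and the results are left as unrounded numbers so that each allocation sums exactly to $n$.

```python
import math


def equal_allocation(n, k):
    """Same number of units from each of the k groups."""
    return [n / k] * k


def proportional_allocation(n, sizes):
    """Units in proportion to group sizes N_i."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


def variation_allocation(n, sizes, sigmas):
    """Units in proportion to N_i * sigma_i."""
    weights = [size * sigma for size, sigma in zip(sizes, sigmas)]
    total = sum(weights)
    return [n * weight / total for weight in weights]


sizes = [650, 610, 580, 360, 350]                          # N_i (Table 11.7)
sigmas = [math.sqrt(v) for v in (6, 15, 18, 24, 12)]       # sigma_i from the variances
n = 128

by_size = proportional_allocation(n, sizes)
by_variation = variation_allocation(n, sizes, sigmas)

print([round(v, 1) for v in by_size])
print([round(v, 1) for v in by_variation])
```

The first course has the smallest variance, so allocation by variation assigns it noticeably fewer units than allocation by size.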

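In serial selection, the required number of series is determined in the same way as in simple random selection:

- repeated selection: $r = \dfrac{t^2\delta^2}{\Delta^2}$;

- non-repeated selection: $r = \dfrac{t^2\delta^2 R}{\Delta^2 R + t^2\delta^2}$.

Here $\delta^2$ is the between-series variance and $R$ is the total number of series in the general population. In this case, the variances and sampling errors can be calculated for the mean value or for the share of the trait.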

When sample observation is used, its results can be assessed by comparing the error limits obtained for the sample indicators with the permissible error.

This raises the problem of determining the probability that the sampling error will not exceed the permissible error. Solving it reduces to calculating the quantity $t$ from the formula for the marginal sampling error: $t = \Delta/\mu$.

Continuing the example of the sample survey of personal accounts of bank customers, let us find the probability with which it can be asserted that the error in determining the average deposit size will not exceed 785 rubles. With a mean error of 0.4 thousand rubles (400 rubles):

$$t = \frac{\Delta}{\mu} = \frac{785}{400} \approx 1.96;$$

the corresponding confidence level is 0.95.
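The reverse calculation, from a permissible error to a probability, can be sketched with the standard library. It assumes a normal sampling distribution; the mean error of 400 rubles is the value $\sqrt{32/200} = 0.4$ thousand rubles from the bank example.

```python
from statistics import NormalDist

mu = 400.0                  # mean error of the sample mean, rubles
permissible_error = 785.0   # rubles

t = permissible_error / mu
probability = 2 * NormalDist().cdf(t) - 1

print(round(t, 2), round(probability, 2))
```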

At present, sample-based statistical observations are carried out by:

  • bodies of Rosstat;
  • other ministries and departments (for example, the monitoring of enterprises in the Bank of Russia system).

A well-known generalization of the experience of organizing sample surveys of small enterprises, the population and households is presented in the Methodological Provisions on Statistics. They give a broader concept of sample observation than the one discussed above (Table 5.4).

In statistical practice, all four types of samples presented in Table 5.4 are used. However, preference is usually given to the probabilistic (random) samples described above, which are the most objective, since the accuracy of the results can be assessed from the data of the sample itself.

Table 5.4

Sample types

In samples of the quasi-random type, probabilistic selection is assumed on the grounds that the expert examining the sample considers this acceptable. An example of quasi-random sampling in statistical practice is the "Sample survey of small enterprises to study social processes in small business", conducted in 1996 in some regions of Russia. The observation units (small enterprises) were selected by experts, taking into account the representation of economic sectors, from the already formed sample of the survey of the financial and economic activities of small enterprises (the form "Information on the main indicators of the financial and economic activity of a small enterprise"). When the sample data were summarized, it was assumed that the sample had been formed by simple random selection.

Direct use of expert judgment is the most general method of intentionally including units in the sample. An example of such a selection method is the monographic method, which obtains information from only one unit of observation that the survey organizer, an expert, considers typical.

Samples based on directional selection are implemented using an objective procedure, but without a probabilistic mechanism. The main array method is widely known: the sample includes the largest (most significant) units of observation, which make the main contribution to the indicator, for example to the total value of the feature that is the main purpose of the survey.

In statistical practice, a combined method of statistical observation is often used. The combination of continuous and sample observation methods has two aspects:

  • alternation in time;
  • their simultaneous use (part of the population is observed on a continuous basis, and part - selectively).

Alternating periodic sample surveys with relatively rare continuous surveys or censuses is necessary to clarify the composition of the studied population. This information is then used as the statistical basis for sample observation. Examples are population censuses and the household sample surveys conducted between them.

In this case, the following tasks must be solved:

  • determining the set of attributes of continuous observation that support the organization of the sample;
  • substantiating the periods of alternation, i.e. the point at which the continuous-survey data cease to be relevant and resources must be spent on updating them.

Simultaneous use within the framework of one survey of continuous and sample observations is due to the heterogeneity of the populations encountered in statistical practice. This is especially true for surveys of the economic activity of a set of enterprises, which are characterized by skewed distributions of the characteristics under study, when a certain number of units have characteristics that are very different from the bulk of the values. In this case, such units are observed on a continuous basis, and the other part of the population is observed selectively.

With this organization of observations, the main tasks are:

  • establishing their optimal proportion;
  • developing methods for assessing the accuracy of the results.

A typical example illustrating this aspect of the combined method is the general principle of conducting surveys of enterprises, according to which large and medium-sized enterprises are surveyed mainly by the continuous method, and small enterprises by the sample method.

Further development of the sampling methodology is carried out both in combination with the organization of continuous observation, and through the organization of special surveys, the conduct of which is dictated by the need to obtain additional information to solve specific problems. Thus, the organization of surveys in the field of conditions and living standards of the population is provided for in two aspects:

Mandatory components may be annual surveys of income, expenditure and consumption (similar to household budget surveys), which also include basic indicators of the living conditions of the population. Every year, according to a special plan, the mandatory components must be supplemented by one-time surveys (modules) of the living conditions of the population, aimed at the in-depth study of a social topic chosen from their total number (e.g. household assets, health, nutrition, education, working conditions, housing, leisure, social mobility, safety, etc.), at different frequencies determined by the need for indicators and by resource capabilities.

The concept and calculation of sampling error.

The task of selective observation is to give correct ideas about the summary indicators of the entire population based on some part of them subjected to observation. The possible deviation of the sample share and sample mean from the share and mean in the general population is called sampling error or representativeness error. The greater the value of this error, the more the indicators of sample observation differ from those of the general population.

A distinction is made between:

  • sampling errors;
  • registration errors.

Registration errors occur when a fact is incorrectly established in the process of observation. They are characteristic of both continuous and sample observation, but there are fewer of them in sample observation.

By their nature, errors are:

  • tendentious: deliberate, i.e. either the best or the worst units of the population were selected, in which case the observations lose their meaning;
  • random: the main organizational principle of sample observation is to prevent deliberate selection, i.e. to ensure strict adherence to the principle of random selection.

The general rule of random selection is that individual units of the general population must have exactly the same conditions and opportunities to be included in the sample. This makes the sample result independent of the will of the observer; the will of the observer generates tendentious errors. The sampling error in random selection is random in nature. It characterizes the size of the deviations of the general characteristics from the sample ones.

Because the characteristics vary within the studied population, the composition of the units in the sample may not coincide with the composition of the units of the entire population. This means that the general share $p$ and general mean $\bar{x}$ do not coincide with the sample share $w$ and sample mean $\tilde{x}$. The possible discrepancy between these characteristics is determined by the sampling error, which is given by the formula:

$$\mu = \sqrt{\frac{\sigma_0^2}{n}},$$

where $\sigma_0^2$ is the general variance. It is related to the sample variance by

$$\sigma_0^2 = \sigma^2\,\frac{n}{n-1},$$

where $\sigma^2$ is the sample variance.

This shows that the general variance differs from the sample variance by a factor of $n/(n-1)$.

There is repeated and non-repeated selection. The essence of repeated selection is that each unit in the sample, after observation, is returned to the general population and can be examined again. With repeated selection, the average sampling error is calculated as:

$\mu = \sqrt{\dfrac{\sigma^2}{n}}$

For the share of an alternative attribute, the sample variance is determined by the formula:

$\sigma_W^2 = W(1 - W)$

In practice, repeated selection is rarely used. With non-repeated selection, the population size N decreases as the sample is drawn, and the formula for the average sampling error for a quantitative attribute is:

$\mu = \sqrt{\dfrac{\sigma^2}{n}\left(1 - \dfrac{n}{N}\right)}$

For an alternative attribute $\sigma_W^2 = W(1 - W)$, then

$\mu_W = \sqrt{\dfrac{W(1 - W)}{n}\left(1 - \dfrac{n}{N}\right)}$

The limits within which the general share of the studied attribute may lie are:

$P = W \pm \mu_W$,

where $\mu_W$ is the sampling error of the alternative attribute.
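The formulas for the average sampling error can be sketched in Python as follows. This is only an illustration: the function names and all the numbers (n, N, the variance, and the share) are hypothetical and are not taken from the example in the text.

```python
import math

def mean_error_repeated(variance, n):
    """Average sampling error for repeated selection: sqrt(variance / n)."""
    return math.sqrt(variance / n)

def mean_error_non_repeated(variance, n, N):
    """Average sampling error for non-repeated selection.

    The repeated-selection formula is amended by the factor (1 - n/N).
    """
    return math.sqrt(variance / n * (1 - n / N))

def share_variance(w):
    """Variance of an alternative (yes/no) attribute with sample share w."""
    return w * (1 - w)

# Hypothetical data: a 10% sample (n = 100 out of N = 1000)
n, N = 100, 1000
variance = 4.0   # assumed sample variance of a quantitative attribute
w = 0.9          # assumed sample share of an alternative attribute

mu_mean_rep = mean_error_repeated(variance, n)
mu_mean_non = mean_error_non_repeated(variance, n, N)
mu_share_non = mean_error_non_repeated(share_variance(w), n, N)

print(f"mean, repeated:      {mu_mean_rep:.4f}")
print(f"mean, non-repeated:  {mu_mean_non:.4f}")
print(f"share, non-repeated: {mu_share_non:.4f}")
```

With the same variance and sample size, the non-repeated error is always the smaller of the two, because the factor (1 - n/N) is less than one.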

Example.

In a 10% sample survey of a batch of finished products, carried out by non-repeated selection, the following data on the moisture content of the samples were obtained.

Determine the average moisture content (%), the variance and the standard deviation; with a probability of 0.954, the possible limits within which the average moisture content of all finished products is expected to lie; and, with a probability of 0.987, the possible limits of the share of standard products, given that products with a moisture content below 13% or above 19% are classed as non-standard.

Only with a certain probability can it be asserted that the general share deviates from the sample share, and the general mean from the sample mean, by no more than t times the average sampling error $\mu$.

In statistics, these deviations are called marginal sampling errors and are denoted $\Delta$.

The probability of the judgment can be raised or lowered by changing t. With a probability of 0.683, $\Delta = \mu$; with 0.954, $\Delta = 2\mu$; with 0.987, $\Delta = 2.5\mu$. The indicators of the general population are then determined from the indicators of the sample.
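The link between the confidence coefficient t and the probability can be checked with a short Python sketch. It assumes the sampling error is normally distributed (the large-sample case); the function names and the value of the average error are illustrative only.

```python
import math

def confidence_probability(t):
    """Probability that a normally distributed error stays within
    t average sampling errors: P(|Z| <= t) = erf(t / sqrt(2))."""
    return math.erf(t / math.sqrt(2))

def marginal_error(t, mu):
    """Marginal sampling error: t times the average sampling error."""
    return t * mu

mu = 0.19  # hypothetical average sampling error
for t in (1, 2, 2.5, 3):
    p = confidence_probability(t)
    print(f"t = {t}: probability {p:.4f}, marginal error {marginal_error(t, mu):.3f}")
```

For t = 2.5 the probability is about 0.9876, which corresponds to the value 0.987 used in the text.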

As we already know, representativeness is the property of a sample population to represent the characteristics of the general population. If there is no match, one speaks of a representativeness error: a measure of the deviation of the statistical structure of the sample from the structure of the corresponding general population. Suppose that the average monthly family income of pensioners in the general population is 2 thousand rubles, and in the sample it is 6 thousand rubles. This means that the sociologist interviewed only the affluent part of pensioners, and a representativeness error crept into the study. In other words, the representativeness error is the discrepancy between two populations: the general one, toward which the sociologist's theoretical interest is directed and whose properties he ultimately wants to learn, and the sample one, toward which his practical interest is directed and which serves both as the object of the survey and as a means of obtaining information about the general population.

Along with the term "representativeness error" in the domestic literature, you can find another - "sampling error". Sometimes they are used interchangeably, and sometimes “sampling error” is used instead of “representativeness error” as a quantitatively more accurate concept.

Sampling error is the deviation of the average characteristics of the sample population from the average characteristics of the general population.

In practice, the sampling error is determined by comparing known characteristics of the general population with the sample means. In sociology, surveys of the adult population most often draw on data from population censuses, current statistical records, and the results of previous surveys. Socio-demographic characteristics are usually used as control parameters. Comparing the averages of the general and sample populations, determining the sampling error on that basis, and reducing it is called representativeness control. Since one's own data can be compared with other people's data only at the end of the study, this method of control is called a posteriori, i.e. carried out after the fact.
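A posteriori representativeness control of this kind can be sketched as a comparison of shares by control parameter. The age groups and all shares below are invented for illustration; in a real study the population shares would come from a census or from current statistics.

```python
# Hypothetical shares of age groups in the general population (e.g. census)
population = {"18-29": 0.22, "30-44": 0.28, "45-59": 0.26, "60+": 0.24}
# Hypothetical shares of the same groups among the respondents
sample = {"18-29": 0.18, "30-44": 0.27, "45-59": 0.27, "60+": 0.28}

def deviations(population_shares, sample_shares):
    """Deviation of each sample share from the matching population share."""
    return {g: sample_shares[g] - population_shares[g] for g in population_shares}

dev = deviations(population, sample)
max_abs_dev = max(abs(d) for d in dev.values())

for group, d in dev.items():
    print(f"{group}: {d:+.2%}")
print(f"largest deviation: {max_abs_dev:.2%}")
```

In this made-up case the youngest group is under-represented and the oldest over-represented by four percentage points each.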

In Gallup polls, representativeness is controlled using national census data on the distribution of the population by sex, age, education, income, profession, race, place of residence, and size of locality. The All-Russian Public Opinion Research Center (VTsIOM) uses for this purpose such indicators as gender, age, education, type of settlement, marital status, sphere of employment, and official status of the respondent, which are taken from the State Committee on Statistics of the Russian Federation. In both cases, the characteristics of the general population are known. The sampling error cannot be established if the values of the variable in the sample and in the population are unknown.

During data analysis, VTsIOM specialists carefully repair the sample in order to minimize the deviations that arose during the fieldwork. Particularly strong shifts are observed in terms of gender and age. This is explained by the fact that women and people with higher education spend more time at home and make contact with the interviewer more easily, i.e. they are an easily accessible group compared to men and people who are “uneducated”.
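The text does not say how the sample is repaired. One common technique, shown here purely as an assumption and not as VTsIOM's actual procedure, is post-stratification weighting: each respondent gets a weight equal to the population share of their group divided by the sample share of that group. All figures are hypothetical.

```python
# Hypothetical population shares by gender (known from statistics)
population = {"men": 0.46, "women": 0.54}
# Hypothetical respondent counts: women are over-represented
sample_counts = {"men": 140, "women": 260}
# Hypothetical share answering "yes" to some question within each group
yes_share = {"men": 0.30, "women": 0.40}

n = sum(sample_counts.values())
sample_shares = {g: c / n for g, c in sample_counts.items()}

# Post-stratification weight of a respondent in group g
weights = {g: population[g] / sample_shares[g] for g in population}

unweighted = sum(sample_shares[g] * yes_share[g] for g in population)
weighted = sum(weights[g] * sample_shares[g] * yes_share[g] for g in population)

print(f"weights: {weights}")
print(f"unweighted estimate: {unweighted:.3f}")
print(f"weighted estimate:   {weighted:.3f}")
```

Weighting pulls the estimate toward what a sample with the correct gender structure would have produced. It corrects only for the control parameters that are known for the general population.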

Sampling error is due to two factors: the sampling method and the sample size.

Sampling errors are divided into two types: random and systematic. Random error is the probability that the sample mean will (or will not) fall outside a specified interval. Random errors include the statistical errors inherent in the sampling method itself. They decrease as the sample size increases.

The second type of sampling error is systematic error. If a sociologist sets out to learn the opinion of all residents of a city about the social policy pursued by the local authorities, but interviews only those who have a telephone, then there is a deliberate bias in the sample in favor of the wealthy strata, i.e. a systematic error.

Thus, systematic errors are the result of the activity of the researcher himself. They are the most dangerous, because they lead to quite significant biases in the results of the study. Systematic errors are considered worse than random ones also because they cannot be controlled and measured.

They arise when, for example: 1) the sample does not meet the objectives of the study (the sociologist decided to study only working pensioners, but interviewed everyone in a row); 2) there is ignorance of the nature of the general population (the sociologist thought that 70% of all pensioners do not work, but it turned out that only 10% do not work); 3) only “winning” elements of the general population are selected (for example, only wealthy pensioners).

Attention! Unlike random errors, systematic errors do not decrease with increasing sample size.
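The difference can be illustrated with a small simulation. Everything in it is hypothetical: an artificial population of incomes is generated, and the "biased frame" keeps only the units above 20, standing in for a survey that reaches only the better-off part of the population.

```python
import random

rng = random.Random(0)  # fixed seed so that the run is reproducible

# Artificial general population of 100,000 incomes (arbitrary units)
population = [rng.gauss(20, 5) for _ in range(100_000)]
true_mean = sum(population) / len(population)

# Biased sampling frame: only units with income above 20 can be reached
biased_frame = [x for x in population if x > 20]

def sample_error(frame, n):
    """Sample mean of n units drawn without repetition, minus the true mean."""
    drawn = rng.sample(frame, n)
    return sum(drawn) / n - true_mean

errors = {
    n: (sample_error(population, n), sample_error(biased_frame, n))
    for n in (100, 10_000)
}

for n, (random_err, biased_err) in errors.items():
    print(f"n = {n:>6}: random selection {random_err:+.2f}, biased frame {biased_err:+.2f}")
```

Increasing the sample a hundredfold shrinks the error of the random sample, while the error from the biased frame stays at roughly the same level.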

Summarizing all the cases in which systematic errors occur, methodologists have compiled a register of them. They believe that uncontrolled biases in the distribution of sample observations may arise when:
♦ the methodological and procedural rules for conducting sociological research were violated;
♦ inadequate sampling, data collection, and calculation methods were chosen;
♦ the required units of observation were replaced by others that were more accessible;
♦ coverage of the sample population was incomplete (shortage of questionnaires, incompletely filled-in questionnaires, inaccessibility of observation units).

Sociologists rarely make intentional mistakes. More often than not, errors arise because the sociologist is not well aware of the structure of the general population: the distribution of people by age, profession, income, and so on.

Systematic errors are easier to prevent (compared to random ones), but they are very difficult to eliminate. It is best to prevent systematic errors by accurately anticipating their sources in advance - at the very beginning of the study.

Here are some ways to avoid sampling errors:
♦ each unit of the general population must have an equal probability of being included in the sample;
♦ it is desirable to select from homogeneous populations;
♦ the characteristics of the general population need to be known;
♦ random and systematic errors should be taken into account when compiling the sample.

If the sample population (or simply the sample) is drawn up correctly, the sociologist obtains reliable results that characterize the entire general population. If it is drawn up incorrectly, the error that arose at the sampling stage is multiplied at each subsequent stage of the sociological research and ultimately reaches a magnitude that outweighs the value of the study. Such research is said to do more harm than good.

Such errors can occur only with a sample population. The easiest way to avoid or reduce the probability of error is to increase the sample size (ideally up to the size of the general population: when the two populations coincide, the sampling error disappears altogether). Economically, this approach is not feasible. There is another way: to improve the mathematical methods of sampling. These are applied in practice. This is the first channel through which mathematics enters sociology. The second channel is the mathematical processing of data.

The question of errors becomes especially important in marketing research, where samples are not very large. Usually they comprise several hundred respondents, less often a thousand. Here the starting point for designing the sample is determining its size. The sample size depends on two factors: 1) the cost of collecting information and 2) the degree of statistical reliability of the results that the researcher hopes to achieve. Of course, even people with no experience in statistics or sociology intuitively understand that the larger the sample, i.e. the closer it is to the size of the general population, the more reliable and trustworthy the data obtained. However, we have already mentioned the practical impossibility of complete surveys when the number of objects runs to tens or hundreds of thousands, or even millions. It is clear that the cost of collecting information (including payment for reproducing the survey instruments and for the work of interviewers, field managers, and data-entry operators) depends on the amount that the customer is ready to allocate, and depends little on the researchers. As for the second factor, we will dwell on it in a little more detail.

So, the larger the sample size, the smaller the possible error. It should be noted, however, that to double the accuracy you have to increase the sample not twofold but fourfold. For example, to make an estimate twice as accurate as one obtained by interviewing 400 people, you need to interview not 800 but 1,600 people. However, marketing research hardly needs 100% accuracy. If a brewer needs to find out what proportion of beer consumers prefer his brand to a competitor's, 60% or 40%, then the difference between 57%, 60%, and 63% will not affect his plans.
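The fourfold rule follows from the fact that the sampling error is inversely proportional to the square root of the sample size. A minimal sketch, assuming repeated selection, a share of 60% and t = 2 (all three are illustrative choices):

```python
import math

def marginal_error_share(w, n, t=2.0):
    """Marginal error of a share: t * sqrt(w * (1 - w) / n)."""
    return t * math.sqrt(w * (1 - w) / n)

e400 = marginal_error_share(0.6, 400)
e800 = marginal_error_share(0.6, 800)
e1600 = marginal_error_share(0.6, 1600)

print(f"n =  400: {e400:.4f}")
print(f"n =  800: {e800:.4f}  (error reduced by a factor of {e400 / e800:.2f})")
print(f"n = 1600: {e1600:.4f}  (error reduced by a factor of {e400 / e1600:.2f})")
```

Doubling the sample to 800 reduces the error only by a factor of about 1.41; halving it requires 1,600 respondents.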

The sampling error may depend not only on the sample size but also on the degree of difference between individual units within the general population under study. For example, if we want to know how much beer is consumed, we will find that within our population consumption rates differ significantly from person to person (a heterogeneous general population). In another case, we study the consumption of bread and find that it differs much less from person to person (a homogeneous population). The greater the differences (or heterogeneity) within the population, the greater the possible sampling error. This regularity only confirms what simple common sense tells us. Thus, as V. Yadov rightly states, “the size (volume) of the sample depends on the level of homogeneity or heterogeneity of the objects under study. The more homogeneous they are, the smaller the number that can provide statistically reliable conclusions.”
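The effect of heterogeneity on the required sample size can be sketched by solving the marginal error formula for n, which gives n = (t * sigma / delta)^2 for repeated selection. The standard deviations assigned to beer and bread consumption below are invented purely for illustration.

```python
import math

def required_sample_size(sigma, delta, t=2.0):
    """Smallest n for which t * sigma / sqrt(n) does not exceed delta
    (repeated selection, no finite-population correction)."""
    return math.ceil((t * sigma / delta) ** 2)

delta = 1.0  # allowable marginal error, in the units of the attribute

# Hypothetical standard deviations of consumption
n_beer = required_sample_size(sigma=10.0, delta=delta)   # heterogeneous
n_bread = required_sample_size(sigma=2.5, delta=delta)   # homogeneous

print(f"heterogeneous population (beer): n = {n_beer}")
print(f"homogeneous population (bread):  n = {n_bread}")
```

Because n grows with the square of the standard deviation, a population four times as variable needs a sample sixteen times as large for the same accuracy.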

The determination of the sample size also depends on the confidence level and on the allowable statistical error. Here we mean the so-called random errors, which are inherent in any statistical estimate. V. I. Paniotto gives the following calculations for a representative sample assuming a 5% error:
This means that if, after interviewing, say, 400 people in a district town whose adult solvent population is 100 thousand, you find that 33% of the surveyed buyers prefer the products of a local meat processing plant, then with 95% probability you can say that 33 ± 5% (i.e. from 28% to 38%) of the town's inhabitants are regular buyers of these products.
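The figures in this example can be checked against the marginal error formula for a share. The sketch assumes non-repeated random selection and t = 1.96 for a 95% probability; neither assumption is stated in the text, and the function name is illustrative.

```python
import math

def marginal_error_share(w, n, N, t=1.96):
    """Marginal error of a share under non-repeated selection:
    t * sqrt(w * (1 - w) / n * (1 - n / N))."""
    return t * math.sqrt(w * (1 - w) / n * (1 - n / N))

n, N = 400, 100_000

delta_observed = marginal_error_share(0.33, n, N)  # for the observed share of 33%
delta_worst = marginal_error_share(0.50, n, N)     # worst case, share of 50%

print(f"observed share 33%: +/- {delta_observed:.1%}")
print(f"worst case 50%:     +/- {delta_worst:.1%}")
```

Under these assumptions the worst-case error is just under 5%, which agrees with the 5% figure, and the error for the observed share of 33% is slightly smaller.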

You can also use Gallup's calculations to estimate the ratio of sample sizes and sampling error.