General population and sample

A set of homogeneous objects is often examined in relation to some feature that characterizes them, measured quantitatively or qualitatively.

For example, in a batch of parts, the size of a part as specified by GOST can serve as a quantitative feature, and whether the part meets the standard can serve as a qualitative feature.

When objects must be checked for compliance with standards, a complete survey of every object is sometimes carried out, but in practice this is rare. For example, if the general population contains a very large number of objects, a complete survey is practically impossible. In that case a certain number of objects (elements) are selected from the whole population and examined. This gives rise to the distinction between the general population and the sample population.

The general population is the totality of all objects that are subject to examination or study. As a rule it contains a finite number of elements, but if that number is very large, the population is assumed to be infinite in order to simplify the mathematics.

A sample, or sample population, is the part of the general population selected for examination. Selection can be repeated or non-repeated: in repeated selection each selected element is returned to the general population before the next draw, and in non-repeated selection it is not. In practice, non-repeated random selection is used more often.
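The difference between repeated and non-repeated selection can be sketched in a few lines of Python. This is only an illustration: the batch of 20 numbered parts, the function names and the fixed seed are assumptions made for the example, not part of the original text.

```python
import random

def draw_repeated(population, n, rng):
    """Repeated selection: each drawn element is returned, so it may be drawn again."""
    return [rng.choice(population) for _ in range(n)]

def draw_non_repeated(population, n, rng):
    """Non-repeated selection: a drawn element cannot be drawn again."""
    return rng.sample(population, n)

rng = random.Random(0)      # fixed seed only so the sketch is reproducible
parts = list(range(1, 21))  # hypothetical batch of 20 numbered parts

repeated = draw_repeated(parts, 10, rng)
non_repeated = draw_non_repeated(parts, 10, rng)
print(repeated)      # may contain the same part more than once
print(non_repeated)  # every part appears at most once
```

With repeated selection the sample may even be larger than the population; with non-repeated selection it cannot.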

The sample must be representative of the population. In other words, for the characteristics of the sample to support confident conclusions about the characteristics of the whole population, the elements of the sample must reflect the population as accurately as possible.

A sample is more or less representative if it is drawn at random from a very large population, so that all elements have an equal probability of being included in the sample. This claim rests on the law of large numbers.

There are various methods of selection. In principle, they fall into two groups:

  • Option 1. Elements are selected from the population without dividing it into parts. This group includes simple random repeated and simple random non-repeated selection.
  • Option 2. The general population is first divided into parts, and elements are then selected from those parts. This group includes typical, mechanical and serial selection.

Simple random selection is selection in which elements are drawn one at a time, at random, from the whole population.

Typical selection is selection in which elements are drawn not from the population as a whole but from each of its “typical” parts.

Mechanical selection is selection in which the whole population is divided into as many groups as there should be elements in the sample, and one element is selected from each group. For example, to select 25% of the parts made by a machine, every fourth part is taken; to select 4% of the parts, every twenty-fifth part is taken, and so on. Note that mechanical selection sometimes may not provide sufficient representativeness of the sample.
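The "every fourth part" and "every twenty-fifth part" rule can be written as a short sketch. The run of 100 parts and the function name are hypothetical, and starting from the first part is an assumption; in practice the starting point is often chosen at random within the first interval.

```python
def mechanical_selection(items, step, start=0):
    """Return every `step`-th item, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return items[start::step]

parts = list(range(1, 101))  # hypothetical run of 100 parts in production order

quarter = mechanical_selection(parts, step=4)        # every fourth part: 25%
four_percent = mechanical_selection(parts, step=25)  # every twenty-fifth part: 4%
print(len(quarter), len(four_percent))  # 25 4
```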

Serial selection is selection in which elements are drawn from the population not one at a time but in whole “series”, each of which is examined completely. For example, when parts are produced by a large number of automatic machines, a complete survey is carried out only on the output of several machines. Serial selection is used when the feature under study varies little from one series to another.
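A minimal sketch of serial selection, assuming each machine's output forms one series: a few series are chosen at random and every part in them is examined. The machine names, part labels and seed are invented for illustration.

```python
import random

def serial_selection(series, n_series, rng):
    """Pick whole series at random and keep every element in them."""
    chosen = rng.sample(sorted(series), n_series)
    return {name: series[name] for name in chosen}

# Hypothetical output of five machines (series), four parts each.
output_by_machine = {
    f"machine_{m}": [f"m{m}_part_{p}" for p in range(1, 5)]
    for m in range(1, 6)
}

rng = random.Random(1)  # fixed seed only so the sketch is reproducible
surveyed = serial_selection(output_by_machine, n_series=2, rng=rng)
for machine, parts in surveyed.items():
    print(machine, parts)  # all parts of each chosen machine are examined
```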

To reduce the error, the characteristics of the general population are estimated from the sample. Sample-based control can be single-stage or multi-stage; the latter increases the reliability of the survey.

Population: the totality of all objects (units) about which the researcher intends to draw conclusions when studying a specific problem. The general population consists of all objects that are subject to study, and its composition depends on the objectives of the study. Sometimes the general population is the entire population of a certain region (for example, when the attitude of potential voters toward a candidate is being studied), but more often several criteria are set that define the object of study. For example: women aged 18-29 who use certain brands of hand cream at least once a week and have an income of at least $150 per family member.

Sample: a set of cases (subjects, objects, events, specimens) selected from the general population by a defined procedure to take part in the study.

  1. Sample size;
  2. Dependent and independent samples;
  3. Representativeness:
    1. An example of a non-representative sample;
  4. Types of plan for building groups from samples;
  5. Group Building Strategies:
    1. Randomization;
    2. Pairwise selection;
    3. Stratometric selection;
    4. Approximate modeling.

Sample size: the number of cases included in the sample. For statistical reasons, at least 30-35 cases are recommended.

Dependent and independent samples

When two (or more) samples are compared, an important consideration is whether they are dependent. If a homomorphic pairing can be established between two samples (that is, each case in sample X corresponds to exactly one case in sample Y and vice versa), and the basis of this pairing matters for the feature being measured, the samples are called dependent. Examples of dependent samples: pairs of twins, two measurements of a trait before and after an experimental intervention, husbands and wives.

If there is no such relationship between the samples, then these samples are considered independent, for example: men and women, psychologists and mathematicians.

Accordingly, dependent samples always have the same size, while the size of independent samples may differ.

Samples are compared using various statistical tests:

  • Student's t-test;
  • Wilcoxon T-test;
  • Mann-Whitney U test;
  • Sign test, etc.
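Whether samples are dependent or independent changes how a test statistic is computed. The sketch below contrasts a paired t statistic (for dependent samples) with Welch's form of the t statistic (for independent samples, swapped in here because it allows unequal sizes and variances). The measurements are invented, and only the statistics are computed, not p-values.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(before, after):
    """t statistic for dependent samples, based on within-pair differences."""
    if len(before) != len(after):
        raise ValueError("dependent samples must have the same size")
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def welch_t(x, y):
    """t statistic for independent samples (Welch's form; sizes may differ)."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))

# Hypothetical measurements of one trait before and after an intervention.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 14, 15]

print(paired_t(before, after))  # uses the pairing between cases
print(welch_t(after, before))   # ignores the pairing
```

On these data the paired statistic is considerably larger, because pairing removes the variation between subjects from the comparison.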

Representativeness

The sample may be considered representative or non-representative.

An example of a non-representative sample

In the United States, one of the best-known historical examples of unrepresentative sampling occurred during the 1936 presidential election. The Literary Digest magazine mailed ten million test ballots to its subscribers, to people chosen from phone books all over the country, and to people on car registration lists. Among the 25% of ballots that were returned (nearly 2.5 million), the votes were distributed as follows:

57% preferred Republican candidate Alf Landon

40% preferred the incumbent Democratic president, Franklin Roosevelt

As is well known, Roosevelt won the actual election with more than 60% of the vote. The Literary Digest's mistake was this: knowing that most of its subscribers considered themselves Republicans, the magazine tried to make the sample more representative by adding people selected from phone books and car registration lists. However, it overlooked the realities of the time and in fact recruited even more Republicans: during the Great Depression, it was mainly the middle and upper classes (that is, mostly Republicans rather than Democrats) who could afford to own phones and cars.
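The mechanism behind this kind of error can be shown with a toy calculation. The figures below are invented for illustration and are not historical data: they simply assume that the people reachable through phone and car lists favour one candidate more than the rest of the electorate.

```python
# Hypothetical electorate; the numbers are illustrative, not historical.
groups = {
    # group: (number of voters, number favouring candidate A)
    "owns phone or car": (300, 180),
    "owns neither": (700, 210),
}

def support_share(selected_groups):
    """Share of voters favouring candidate A among the selected groups."""
    voters = sum(groups[g][0] for g in selected_groups)
    supporters = sum(groups[g][1] for g in selected_groups)
    return supporters / voters

true_share = support_share(groups)                  # the whole population
frame_share = support_share(["owns phone or car"])  # only people on the lists
print(f"population: {true_share:.0%}, sampling lists: {frame_share:.0%}")
```

However many ballots are collected from the lists alone, the estimate stays near the share within the lists, not the share in the population.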

Types of plan for building groups from samples

There are several main types of group building plan:

  1. Study with experimental and control groups, which are placed in different conditions;
  2. Study with experimental and control groups using a paired selection strategy;
  3. Study using only one group - experimental;
  4. A study using a mixed (factorial) plan - all groups are placed in different conditions.

Group Building Strategies

Groups for a psychological experiment are selected using various strategies, whose purpose is to preserve internal and external validity as far as possible:

  1. Randomization (random selection);
  2. Pairwise selection;
  3. Stratometric selection;
  4. Approximate modeling;
  5. Engaging real groups.

Randomization

Randomization, or random selection, is used to create simple random samples. The use of such a sample rests on the assumption that each member of the population is equally likely to be included in it. For example, to draw a random sample of 100 university students, you can put slips of paper with the names of all the university's students in a hat and then take 100 slips out of it; this is random selection.

Pairwise selection

Pairwise selection is a strategy for constructing sample groups in which the groups are made up of subjects who are equivalent in the secondary parameters that are significant for the experiment. This strategy is effective for experiments with experimental and control groups; the best option is to recruit pairs of twins (mono- and dizygotic), as it allows you to create.
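One possible way to form equivalent groups is sketched below: subjects are sorted by a matching parameter, neighbours are paired, and the members of each pair are split at random between the experimental and control groups. Matching on age, the subject list and the function name are all assumptions made for the example.

```python
import random

def matched_pairs(subjects, key, rng):
    """Pair subjects with adjacent values of `key`, then split each pair at random."""
    ordered = sorted(subjects, key=key)
    experimental, control = [], []
    # Walk the ordered list two at a time; an unpaired last subject is left out.
    for first, second in zip(ordered[0::2], ordered[1::2]):
        pair = [first, second]
        rng.shuffle(pair)
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control

# Hypothetical subjects as (name, age); age is the assumed matching parameter.
subjects = [("A", 19), ("B", 34), ("C", 20), ("D", 33), ("E", 27), ("F", 26)]
experimental, control = matched_pairs(
    subjects, key=lambda s: s[1], rng=random.Random(2)
)
print(experimental)
print(control)
```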

Stratometric selection

Stratometric selection is randomization combined with the identification of strata (or clusters). With this method of sampling, the general population is divided into groups (strata) that share certain characteristics (gender, age, political preferences, education, income level, etc.), and subjects with the corresponding characteristics are selected.
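A minimal sketch of selection by strata, assuming proportional allocation (the same fraction is drawn at random from each stratum). The population of 60 women and 40 men, the 10% fraction and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 women and 40 men, identified by number.
population = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
sample = stratified_sample(
    population, stratum_of=lambda u: u[0], fraction=0.1, rng=random.Random(3)
)
women = sum(1 for u in sample if u[0] == "F")
men = sum(1 for u in sample if u[0] == "M")
print(women, men)  # 6 4
```

The sample reproduces the 60:40 split of the population exactly, which a simple random sample of ten would do only by chance.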

Approximate modeling

Approximate modeling means drawing limited samples and generalizing the conclusions about such a sample to a wider population. For example, when second-year university students take part in a study, its findings are extended to "people aged 17 to 21 years." The admissibility of such generalizations is extremely limited.

Population: the totality of all objects (units) about which the researcher intends to draw conclusions when studying a specific problem.

The general population consists of all objects that are subject to study, and its composition depends on the objectives of the study. Sometimes the general population is the entire population of a certain region (for example, when the attitude of potential voters toward a candidate is being studied), but more often several criteria are set that define the object of study. For example: men aged 30-50 who use a certain brand of razor at least once a week and have an income of at least $100 per family member.

Sample, or sample population: a set of cases (subjects, objects, events, specimens) selected from the general population by a defined procedure to take part in the study.

Sample characteristics:

  • Qualitative characteristics of the sample: whom exactly we select and which sampling methods we use to do so.
  • Quantitative characteristics of the sample: how many cases we select, in other words, the sample size.

Need for sampling

  • The object of study is very extensive. For example, the consumers of a global company's products are spread across a huge number of geographically dispersed markets.
  • Primary information needs to be collected.

Main sampling methods

Sampling is based first of all on knowledge of the sampling frame, which is the list of all units of the population from which the units of the sample are selected. For example, if the population is all car service workshops in the city of Moscow, then a list of such workshops is needed; this list is the frame within which the sample is formed.

The sampling frame inevitably contains an error, called the sampling frame error, which characterizes the degree to which the frame deviates from the true size of the population. Obviously, there is no complete official list of all car service workshops in Moscow. The researcher must inform the client about the size of the sampling frame error.

When forming a sample, probability (random) and non-probability (non-random) methods are used.

If all sample units have a known chance (probability) of being included in the sample, the sample is called a probability sample. If this probability is unknown, the sample is called a non-probability sample. Unfortunately, in most marketing studies the size of the population cannot be determined accurately, so the probabilities cannot be calculated accurately either. The term "known probability" therefore rests more on the use of certain sampling methods than on knowledge of the exact size of the population.

Probability methods include:

  • simple random selection;
  • systematic selection;
  • cluster selection;
  • stratified selection.

Non-probability methods include:

  • selection based on the principle of convenience;
  • selection based on judgment;
  • formation of the sample during the survey;
  • formation of a sample based on quotas.

Selection based on the principle of convenience means that sampling is carried out in whatever way is most convenient for the researcher, for example in terms of minimal time and effort or the availability of respondents. The study site and the composition of the sample are chosen subjectively; for example, a customer survey is carried out in the store closest to the researcher's home. Obviously, many members of the population have no chance of taking part in such a survey.

Formation of a sample based on judgment relies on the opinion of qualified specialists or experts about the composition of the sample. The composition of a focus group is often determined in this way.

Formation of the sample during the survey expands the number of respondents on the basis of suggestions from respondents who have already taken part. The researcher starts with a sample much smaller than the study requires, and the sample grows as the survey proceeds.
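This growth by referral (often called snowball sampling) can be sketched as a walk over referral lists. The respondent names, the referral lists and the function name are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Grow a sample by following referrals from respondents already surveyed."""
    sample, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < target_size:
        respondent = queue.popleft()
        sample.append(respondent)
        for referred in referrals.get(respondent, []):
            if referred not in seen:  # never contact the same person twice
                seen.add(referred)
                queue.append(referred)
    return sample

# Hypothetical referral lists: whom each respondent suggests contacting next.
referrals = {"Ann": ["Bob", "Cy"], "Bob": ["Dee"], "Cy": ["Ann", "Eve"], "Dee": []}
print(snowball_sample(["Ann"], referrals, target_size=4))
# ['Ann', 'Bob', 'Cy', 'Dee']
```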

Formation of a sample based on quotas (quota selection) involves determining in advance, from the objectives of the study, how many respondents are needed in each group that meets certain requirements (features). For example, for the purposes of a study it may be decided that fifty men and fifty women should be interviewed in a department store. The interviewer continues the survey until each quota is filled.
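The quota rule can be sketched as follows: respondents are taken in the order they arrive until every quota is filled. The stream of shoppers is invented, and the quotas are scaled down from fifty per group to two per group to keep the example short.

```python
def quota_sample(respondents, quotas, group_of):
    """Interview respondents in arrival order until every quota is filled."""
    remaining = dict(quotas)
    interviewed = []
    for person in respondents:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            interviewed.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are met
    return interviewed

# Hypothetical stream of shoppers as (id, sex).
shoppers = [(1, "M"), (2, "M"), (3, "F"), (4, "M"), (5, "F"), (6, "F"), (7, "M")]
print(quota_sample(shoppers, {"M": 2, "F": 2}, group_of=lambda p: p[1]))
# [(1, 'M'), (2, 'M'), (3, 'F'), (5, 'F')]
```

Note that who ends up in each quota depends on arrival order, not on chance, which is why the method is non-probabilistic.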

The distribution of a random variable contains all the information about its statistical properties. How many values of a random variable must be known in order to construct its distribution? To answer this, the general population must be examined.

The general population is the set of all values that a given random variable can take.

The number of units in the general population is called its size N. This value can be finite or infinite. For example, if we study the height of the inhabitants of a certain city, the size of the general population equals the number of inhabitants of the city. If a physical experiment is performed, the size of the general population is infinite, since the number of possible values of any physical parameter is infinite.

Studying the whole general population is not always possible or appropriate. It is impossible if the general population is infinite. Even when it is finite, a complete study is not always justified, since it requires a great deal of time and labor, and absolute accuracy is usually not needed. Less accurate results can be obtained with far less effort and expense by studying only a part of the general population.

Statistical studies carried out on only a part of the general population are called sample studies, and the part studied is called a sample.

Figure 7.2 symbolically shows the population and the sample as a set and its subset.

Figure 7.2 Population and sample

Working with a subset of the general population, often only a small fraction of it, we obtain results that are accurate enough for practical purposes. Examining a larger part of the general population only increases the accuracy; it does not change the substance of the results, provided the sample is drawn correctly from a statistical point of view.

For the sample to reflect the properties of the general population and for the results to be reliable, the sample must be representative.

In some general populations, any part of them is representative by virtue of their nature. However, in most cases special care must be taken to ensure that samples are representative.

One of the main achievements of modern mathematical statistics is considered to be the development of the theory and practice of the random sampling method, which ensures that the selected data are representative.

Sample studies are always less accurate than a study of the entire population. This is acceptable, however, if the magnitude of the error is known. Obviously, the closer the sample size is to the size of the general population, the smaller the error. It follows that the problems of statistical inference become especially relevant when working with small samples (N ≈ 10-50).
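One standard way to quantify how the error shrinks as the sample approaches the population is the standard error of the sample mean with the finite population correction, for non-repeated selection. In the sketch below n denotes the sample size and N the population size; this notation, the population of 1000 units and the standard deviation of 10 are assumptions made for the example.

```python
from math import sqrt

def standard_error_of_mean(sigma, n, N):
    """Standard error of the sample mean for non-repeated sampling of n units from N.

    sigma is the population standard deviation (computed with divisor N).
    """
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return sigma / sqrt(n) * sqrt((N - n) / (N - 1))

N = 1000      # hypothetical population size
sigma = 10.0  # hypothetical population standard deviation
for n in (10, 50, 500, 1000):
    print(n, round(standard_error_of_mean(sigma, n, N), 3))
```

The error falls as n grows and reaches zero when n equals N, that is, when the whole population is examined.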

The entire array of individuals of a certain category is called the general population. The size of the general population is determined by the objectives of the study.

If a species of wild animals or plants is being studied, the general population is all individuals of that species. In this case the general population is very large, and in calculations its size is taken to be infinite.

If the effect of some agent on plants or animals of a certain category is being studied, the general population is all plants or animals of the category (species, sex, age, economic purpose) to which the experimental objects belonged. This is no longer such a large number of individuals, but it is still too large for a complete study.

The general population is not always too large for a complete study. Sometimes small aggregates are studied; for example, the average milk yield or the average wool clip is determined for a group of animals assigned to a particular worker. In such cases the general population is a very small number of individuals, all of which are studied. A small general population also arises when plants or animals held in a collection are studied in order to characterize a particular group within that collection.

Characteristics of group properties (etc.) relating to the entire population are called general parameters.

A sample is a group of objects with three features:

  1. it is part of the general population;
  2. it is selected at random, in a defined way;
  3. it is studied in order to characterize the whole general population.

In order to obtain a fairly accurate characterization of the entire general population from the sample, it is necessary to organize the correct selection of objects from the general population.

Theory and practice have developed several systems for selecting individuals for a sample. All of these systems aim to give every object in the general population the greatest possible chance of being chosen. Bias in the selection of objects for a sample study prevents correct general conclusions from being drawn and makes the results of the study not indicative of the entire population, i.e., unrepresentative.

To obtain a correct, undistorted characterization of the entire general population, one must strive to make it possible for any object from any part of the general population to be selected for the sample. The more variable the trait under study, the more strictly this basic requirement must be met. It is easy to see that when diversity approaches zero, for example when studying the color of the hair or feathers of some species, any method of sampling will give representative results.

In various studies, the following methods of selecting objects for the sample are used.

  • Random repeated selection, in which objects are selected from the general population without regard to the development of the trait under study, that is, in an order that is random with respect to that trait. After selection, each object is studied and then returned to the population, so that any object may be selected again. This method is equivalent to selection from an infinitely large general population, for which the basic relationships between sample and general values have been derived.

  • Random non-repeated selection, in which objects selected at random, as in the previous method, are not returned to the general population and cannot enter the sample again. This is the most common way of organizing a sample. It is equivalent to selection from a large but finite general population, which is taken into account when general indicators are determined from sample ones.

  • Mechanical selection, in which objects are selected from separate parts of the general population, the parts having been marked out mechanically beforehand: by the plots of an experimental field, by arbitrary groups of animals taken from different areas of the population, and so on. Usually as many parts are marked out as there are objects to be studied, so the number of parts equals the sample size. Mechanical selection is sometimes carried out by taking individuals at a fixed interval, for example by passing animals through a chute and selecting every tenth or hundredth one, by taking a cut every 100 or 200 m, or by selecting one object out of every 10, 100, etc. specimens encountered while surveying the whole population.

  • Serial (nested) selection, in which the general population is divided into parts (series), and some of these are studied in their entirety. This method is used successfully when the objects under study are distributed fairly evenly over a certain volume or territory. For example, when studying the contamination of air or water with microorganisms, samples are taken and each is examined completely. In some cases agricultural objects can also be surveyed by the nested method. When studying the yield of meat and other products from beef cattle breeds, the sample can include all animals of a given breed that arrived at two or three meat processing plants. When studying egg size in collective-farm poultry farming, the trait can be studied in the entire population of chickens on several collective farms.

Characteristics of group properties (μ, s etc.) obtained for a sample are called sample indicators.

Representativeness

A direct study of a group of selected objects provides, first of all, the primary material and characteristics of the sample itself.

All sample data and summary indicators are important as primary facts revealed by the study and deserve careful consideration, analysis and comparison with the results of other work. But the process of extracting the information contained in the primary material of the study does not end there.

The fact that the objects were selected in the sample by special methods and in sufficient quantity makes the results of the study of the sample indicative not only for the sample itself, but also for the entire general population from which this sample was taken.

Under certain conditions, the sample becomes a more or less accurate reflection of the entire population. This property of the sample is called representativeness, meaning that the sample represents the population with a certain accuracy and reliability.

Like any property, the representativeness of sample data can be present to a sufficient or an insufficient degree. In the first case the sample yields reliable estimates of the general parameters; in the second, unreliable ones. It is important to remember that obtaining unreliable estimates does not diminish the value of sample indicators for characterizing the sample itself. Obtaining reliable estimates extends the findings of a sample study beyond the sample.