
Statistics in simple terms. Essence and meaning of averages

Fuel consumption norms are set on the basis of an analysis of statistical data on actual specific fuel consumption, as well as of factors affecting changes in normal operating conditions. Multiple regression models are used as the mathematical apparatus.

An analysis of publications on evaluating the economic efficiency of new technology, together with the authors' own research, allowed them to draw a number of conclusions. First of all, the impact of individual factors on the economic efficiency of production when new equipment is used in oil product pipeline transport can be identified on the basis of a large body of actual observations and analysis of statistical data. When determining indicators for assessing economic efficiency, the quantitative values of the measures should reflect the conditions in force in the given period. The standards used in the calculations should fully reflect actual costs, with the cost of producing and using the equipment indexed for inflation.

The history of the development of mankind has shown that without statistical data it is impossible to govern a state, develop individual industries and sectors of the economy, or ensure optimal proportions between them. The need to collect and summarize large amounts of data on a country's population, enterprises, banks, farms, etc. has led to the existence of special statistical services: state statistics institutions. Depending on the industry in which the collection, processing and analysis of statistical data are organized, one distinguishes statistics of population, industry, agriculture, capital construction, finance, etc. All these branches of statistics develop methods for collecting and summarizing data and for constructing summary indicators that reflect processes in the relevant industry. Statistics also calculates general economic indicators: gross national product, gross domestic product, total social product, national income, etc.

The word "statistics" is used in several senses, primarily as a synonym for the word "data": it is in this sense that we speak of the statistics of births and deaths in Russia, or the statistics of crimes. Statistics is the branch of knowledge that combines the principles and methods of working with numerical data characterizing mass phenomena. Statistics is also the name of the branch of practical activity aimed at collecting, processing and analyzing statistical data.

An analysis of the causes of the emergence and course of inflation in the Russian Federation shows their uniqueness and a significant predominance of cost-push inflation over demand-pull inflation. Western anti-inflationary theories are therefore not very suitable for Russian conditions. A coherent, complete domestic theory has not yet been created, just as there are no thick Russian textbooks on fighting inflation; bits of much-needed knowledge are scattered across hundreds of newspapers and magazines. The task is, on the one hand, to clear the backlog of non-payments, which in some cases has already paralyzed production, and on the other, to prevent galloping inflation. These are difficult tasks, but they must be solved. Based on an analysis of statistical data for the last seven years and a study of the publications of leading domestic economists, the author proposes his own solutions.

The task is, on the one hand, to clear the backlog of non-payments, which in some cases has already led to the paralysis of production, and on the other, to prevent galloping inflation. It is time to start suppressing inflation in the normal way: by increasing, in every possible way, the output of products that are in demand. These are most difficult tasks, but they must be solved if we want to survive as a world power rather than as a raw-materials appendage. Based on an analysis of statistical data and the publications of leading domestic economists, the author proposes his own solutions.

Thus, in models with variable parameters, a differentiated approach is needed to establish the ranges of variation of the selection coefficients, based on the analysis of statistical data, the type of technological processes and quality indicators of flows.

Forecasting tax revenues based on macroeconomic indicators determines the strategy for generating tax revenues for the next year and the medium term, but does not solve all the problems of tax planning. Therefore, a necessary component of tax planning is the processing and analysis of statistical data on the accumulation of taxes to the budget over the past period, as well as information on possible changes in tax legislation.

It is necessary to organize the systematic collection and analysis of statistical data characterizing, by year of operation, the dynamics of the volume of products and work performed with the introduced equipment, as well as its cost, labor intensity and material consumption.

Along with the determination by the main selected parameter, the calculated need for particular types of machinery and equipment is adjusted for a number of other factors: changes in the balance of consumption of machinery and equipment across sectors of the national economy, changes in the structure of output, changes in the product range planned in rubles due to the introduction of more progressive, reliable and durable designs, and changes associated with the development of specialization and cooperation that affect the total volume of output, etc.

There is a very close relationship between employment indicators and other important indicators of economic development. Thus, the relationship between unemployment and the change in GDP is characterized by Okun's law, discovered empirically on the basis of an analysis of US statistical data (for the 1950s-80s) and later substantiated theoretically in macroeconomic studies. In its original form, as applied to the United States, Okun's law reads as follows.
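In the commonly cited "gap" form (the coefficient of roughly 2.5 is the value usually quoted for post-war US data, an assumption here rather than something stated above), Okun's law can be written as:

```latex
\frac{Y^{*} - Y}{Y^{*}} = \beta\,(u - u^{*}), \qquad \beta \approx 2.5,
```

where Y is actual output, Y* is potential output, u is the actual unemployment rate, and u* is the natural rate of unemployment.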

For all positive values of x the function y = e^(a − b/x) increases; at x = b/2 the curve has an inflection point: accelerated growth at x < b/2, slowed growth at x > b/2. Functions of this type are used in the analysis of statistical data on consumer budgets, where hypotheses are put forward about the existence of an asymptotic level of expenditure, about a change in the marginal propensity to consume a product, and about the existence of a threshold level of income. In this case, as x → ∞, y → e^a (Fig. 2.5).
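Judging by the inflection point at x = b/2 and the asymptote, the function discussed appears to be y = e^(a − b/x); that form is a reconstruction, not something stated explicitly here. A quick numerical check of its properties:

```python
import math

def y(x, a=1.0, b=4.0):
    # assumed form of the consumption curve: y = e^(a - b/x)
    return math.exp(a - b / x)

def second_derivative(x, a=1.0, b=4.0, h=1e-4):
    # central finite-difference approximation of y''
    return (y(x + h, a, b) - 2 * y(x, a, b) + y(x - h, a, b)) / h**2

# y'' changes sign at x = b/2 = 2: accelerated growth before, slowed after
print(second_derivative(1.5) > 0)   # True
print(second_derivative(2.5) < 0)   # True

# asymptote: y -> e^a as x -> infinity
print(abs(y(1e6) - math.exp(1.0)) < 1e-3)  # True
```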

This formula was applied to the analysis of statistical data.

All sales forecasts are based on the use of three types of information obtained from studying what people say, what people do, and what people have done. Obtaining the first type of information is based on the study of the opinions of consumers and buyers, sales agents and intermediaries. Methods of sociological research and expert methods are used here. Learning what people are doing involves doing market testing. Studying what people have done involves analyzing the statistics of the purchases they have made.

Let us consider the distribution of oil and gas production facilities by the nature of changes in production volume: facilities with growing, stable and declining production. As of January 1, 1972, out of 104 oil and gas production departments (OGPDs) in the industry, 43 (or 41.4%) had growing production and 61 had stable or falling production. An analysis of statistical data for 1970, carried out by the authors for 76 OGPDs, made it possible to identify some common characteristics of the various subgroups, which are given in Table 15.

Hypothesis testing is carried out by means of statistical analysis. Statistical significance is assessed using the P-value: the probability of obtaining a result at least as extreme as the one observed, under the assumption that some statement (the null hypothesis) is true. If the P-value is less than the chosen level of statistical significance (usually 0.05), the experimenter may conclude that the null hypothesis is false and turn to the alternative hypothesis. Using Student's t-test, you can calculate the P-value and determine the significance for two data sets.

Steps

Part 1

Setting up an experiment

    Define your hypothesis. The first step in evaluating statistical significance is to choose the question you want answered and formulate a hypothesis. A hypothesis is a statement about experimental data, their distribution and properties. For any experiment, there is both a null and an alternative hypothesis. Generally speaking, you will have to compare two sets of data to determine if they are similar or different.

    • The null hypothesis (H 0) usually states that there is no difference between the two datasets. For example: those students who read the material before class do not get higher marks.
    • The alternative hypothesis (H a) is the opposite of the null hypothesis and is a statement that needs to be confirmed with experimental data. For example: those students who read the material before class get higher marks.
  1. Set the significance level to determine how much the distribution of the data must differ from the usual one in order to be considered a significant result. The significance level (also called the α-level) is the threshold you define for statistical significance. If the P-value is less than or equal to the significance level, the data are considered statistically significant.

    • As a rule, the significance level (the value of α) is taken to be 0.05; in this case the probability of finding a purely random difference between the data sets is only 5%.
    • The stricter the significance level (and, accordingly, the smaller the required P-value), the more reliable the results.
    • If you want more reliable results, lower the significance level to 0.01. Lower levels are typically used in manufacturing, when it is necessary to detect defects in products and high reliability is required to ensure that all parts work as expected.
    • For most hypothesis-testing experiments, a significance level of 0.05 is sufficient.
  2. Decide whether you will use a one-tailed or a two-tailed test. One of the assumptions of Student's t-test is that the data are normally distributed. The normal distribution is a bell-shaped curve with most of the results in the middle of the curve. Student's t-test is a mathematical method of checking whether the data fall outside the normal distribution (above it, below it, or in both "tails" of the curve).

    • If you're not sure if the data is above or below the control group, use a two-tailed test. This will allow you to determine the significance in both directions.
    • If you know in which direction the data might fall outside of the normal distribution, use a one-tailed test. In the example above, we expect students' grades to go up, so a one-tailed test can be used.
  3. Determine the sample size using statistical power. The statistical power of a study is the probability that, with a given sample size, the expected result will be detected. A common power threshold is 80% (that is, β = 0.2). Power analysis without any preliminary data can be tricky, because it requires some information about the expected means in each data set and their standard deviations. Use an online statistical power calculator to determine the optimal sample size for your data.

    • Typically, researchers conduct a small pilot study that provides data for power analysis and determines the sample size needed for a larger and more complete study.
    • If you do not have the opportunity to conduct a pilot study, try to estimate possible average values ​​based on the literature data and the results of other people. This may help you determine the optimal sample size.

    Part 2

    Compute Standard Deviation
    1. Write down the formula for the standard deviation. The standard deviation indicates how widely the data are spread, and it allows you to judge how close together the values in a particular sample are. At first glance the formula seems rather complicated, but the explanations below will help you understand it. The formula is as follows: s = √( ∑(x_i − µ)² / (N − 1) ).

      • s is the standard deviation;
      • the sign ∑ indicates that all the values obtained in the sample should be summed;
      • x_i corresponds to the i-th value, that is, an individual result;
      • µ is the mean value for the group;
      • N is the total number of values in the sample.
    2. Find the mean in each group. To calculate the standard deviation, you must first find the mean for each study group. The mean is denoted by the Greek letter µ (mu). To find it, simply add up all the obtained values and divide by the number of values (the sample size).

      • For example, to find the average grade in the group of students who study the material before class, consider a small data set. For simplicity, we use a set of five grades: 90, 91, 85, 83 and 94.
      • Add all the values together: 90 + 91 + 85 + 83 + 94 = 443.
      • Divide the sum by the number of values, N = 5: 443/5 = 88.6.
      • Thus, the mean for this group is 88.6.
    3. Subtract the mean from each obtained value. The next step is to calculate the differences (x_i − µ). To do this, subtract the mean you found from each obtained value. In our example, we need to find five differences:

      • (90 - 88.6), (91 - 88.6), (85 - 88.6), (83 - 88.6) and (94 - 88.6).
      • As a result, we get the following values: 1.4, 2.4, -3.6, -5.6 and 5.4.
    4. Square each value obtained and add them together. Each of the quantities just found should be squared. This step will remove all negative values. If after this step you still have negative numbers, then you forgot to square them.

      • For our example, we get 1.96, 5.76, 12.96, 31.36 and 29.16.
      • We add the obtained values: 1.96 + 5.76 + 12.96 + 31.36 + 29.16 = 81.2.
    5. Divide by the sample size minus 1. In the formula the sum is divided by N − 1 because we are working not with the whole population but with a sample of all students.

      • Subtract: N - 1 = 5 - 1 = 4
      • Divide: 81.2/4 = 20.3
    6. Take the square root. After dividing the sum by the sample size minus one, take the square root of the found value. This is the last step in calculating the standard deviation. There are statistical programs that, after entering the initial data, perform all the necessary calculations.

      • In our example, the standard deviation of the marks of those students who read the material before class is s = √20.3 = 4.51.
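The worked example above can be verified in a few lines; Python's standard `statistics` module uses the same sample formula with N − 1 in the denominator:

```python
import math
import statistics

grades = [90, 91, 85, 83, 94]

mean = sum(grades) / len(grades)                  # 443 / 5 = 88.6
squared_devs = [(x - mean) ** 2 for x in grades]
variance = sum(squared_devs) / (len(grades) - 1)  # 81.2 / 4 = 20.3
s = math.sqrt(variance)

print(round(mean, 1))                      # 88.6
print(round(s, 2))                         # 4.51
print(round(statistics.stdev(grades), 2))  # same result: 4.51
```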

    Part 3

    Determine Significance
    1. Calculate the standard error of the difference between the two groups of data. Up to this step we have considered the example for only one group of data. If you want to compare two groups, you obviously need data for both. Calculate the standard deviation for the second group of data, and then find the standard error of the difference between the two experimental groups. It is calculated using the following formula: s_d = √(s_1²/N_1 + s_2²/N_2).
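A minimal sketch of this step, continuing the grades example; the second group's scores are invented here purely for illustration:

```python
import math
import statistics

group1 = [90, 91, 85, 83, 94]   # read the material before class
group2 = [83, 85, 79, 77, 88]   # hypothetical control group

s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
n1, n2 = len(group1), len(group2)

# standard error of the difference between the two means
s_d = math.sqrt(s1**2 / n1 + s2**2 / n2)

# t statistic; its P-value is then read from a t-distribution table or software
t = (statistics.mean(group1) - statistics.mean(group2)) / s_d

print(round(s_d, 2))  # 2.83
print(round(t, 2))    # 2.19
```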

In many cases people's activity involves working with data, and this can mean not only operating on them but also studying, processing and analyzing them: for example, when you need to condense information, find a relationship, or identify structure. For analytics of this kind it is very convenient to apply statistical methods.

A feature of the methods of statistical analysis is their complexity, due to the variety of forms of statistical patterns, as well as the complexity of the process of statistical research. However, we want to talk about exactly such methods that everyone can use, and do it effectively and with pleasure.

Statistical research can be carried out using the following methods:

  • Statistical observation;
  • Summary and grouping of statistical observation materials;
  • Absolute and relative statistical values;
  • Variation series;
  • Sample;
  • Correlation and regression analysis;
  • Series of dynamics.

Statistical observation

Statistical observation is a planned, organized and in most cases systematic collection of information, aimed mainly at the phenomena of social life. This method is implemented through the registration of predetermined most striking features, the purpose of which is to subsequently obtain the characteristics of the studied phenomena.

Statistical observation must be carried out taking into account some important requirements:

  • It should fully cover the studied phenomena;
  • The data received must be accurate and reliable;
  • The resulting data should be uniform and easily comparable.

Also, statistical observation can take two forms:

  • Reporting is a form of statistical observation where information is received by specific statistical units of organizations, institutions or enterprises. In this case, the data is entered into special reports.
  • Specially organized observation - observation, which is organized for a specific purpose, in order to obtain information that is not available in the reports, or to clarify and establish the reliability of the information in the reports. This form includes surveys (for example, polls of people's opinions), population censuses, etc.

In addition, a statistical observation can be categorized on the basis of two features: either on the basis of the nature of the data collection or on the basis of the coverage of the units of observation. The first category includes interviews, documentation and direct observation, and the second category includes continuous and non-continuous observation, i.e. selective.

To obtain data using statistical observation, one can use such methods as questionnaires, correspondent activities, self-calculation (when the observed, for example, fill out the relevant documents themselves), expeditions and reporting.

Summary and grouping of statistical observation materials

Speaking about the second method, first of all it should be said about the summary. A summary is a process of processing certain single facts that form the total set of data collected during observation. If the summary is carried out correctly, a huge amount of single data on individual objects of observation can turn into a whole complex of statistical tables and results. Also, such a study helps to determine the common features and patterns of the studied phenomena.

Depending on the accuracy and depth of the study, simple and complex summaries can be distinguished, but either of them should be based on specific stages:

  • A grouping feature is selected;
  • The order of forming the groups is determined;
  • A system of indicators is developed to characterize the groups and the object or phenomenon as a whole;
  • Table layouts are developed in which the summary results will be presented.

It is important to note that there are different forms of summary:

  • Centralized summary, requiring the transfer of the received primary material to a higher center for further processing;
  • Decentralized summary, where the study of data occurs at several stages in ascending order.

The summary can be performed using specialized equipment, for example, using computer software or manually.

As for grouping, this process consists in dividing the studied data into groups according to certain features. The nature of the tasks set by the statistical analysis determines what kind of grouping it will be: typological, structural or analytical. That is why, for summaries and groupings, one either turns to the services of highly specialized specialists or uses specialized software.

Absolute and relative statistics

Absolute values are considered the very first form in which statistical data are presented. With their help it is possible to give phenomena dimensional characteristics, for example in time, length, volume, area, mass, etc.

If you want to know individual absolute statistical values, you can resort to measuring, evaluating, counting or weighing. And if you need total volume indicators, you should use a summary and grouping. Bear in mind that absolute statistical values always have units of measurement: monetary, labor or natural units.

Relative values, in turn, express quantitative ratios between phenomena of social life. To obtain them, one quantity is always divided by another. The indicator being compared (the numerator) is called the reporting value, and the indicator with which it is compared (the denominator) is called the base of comparison.

Relative values differ depending on their content: there are values of comparison, of the level of development, of the intensity of a particular process, of coordination, of structure, of dynamics, and so on.

To study a population by some varying feature, statistical analysis uses averages: generalized quantitative characteristics of a set of homogeneous phenomena with respect to that varying feature.

An extremely important property of averages is that, as a single number, they speak about the value of a given feature across the whole set. Although individual units may differ quantitatively, the average expresses the general value characteristic of all units of the set under study. It turns out that through the characteristic of one number you can obtain a characteristic of the whole.

It should be borne in mind that one of the most important conditions for using averages in the statistical analysis of social phenomena is the homogeneity of the population for which the average is sought. The formula for determining the average depends on exactly how the initial data for its calculation are presented.
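The last remark can be illustrated: if the data arrive already grouped (each value with its frequency), the simple arithmetic mean gives way to a weighted one. A minimal sketch with made-up numbers:

```python
# raw (ungrouped) data: simple arithmetic mean
raw = [3, 3, 4, 4, 4, 5]
simple_mean = sum(raw) / len(raw)

# the same data presented as a grouped series: value -> frequency
grouped = {3: 2, 4: 3, 5: 1}
weighted_mean = sum(v * f for v, f in grouped.items()) / sum(grouped.values())

# both formulas give the same result, 23/6, because the data are the same
print(simple_mean == weighted_mean)  # True
```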

Variation Series

In some cases, data on the averages of certain studied quantities may not be enough to process, evaluate and in-depth analysis of a phenomenon or process. Then one should take into account the variation or spread of indicators of individual units, which is also an important characteristic of the population under study.

Many factors can affect the individual values, and the phenomena or processes under study can be very diverse, i.e. possess variation (this diversity is what forms a variation series), the causes of which should be sought in the essence of what is being studied.

The absolute indicators mentioned above depend directly on the units in which the features are measured, which complicates the study, evaluation and comparison of two or more variation series. For such comparisons one needs relative indicators, calculated as the ratio of absolute indicators to averages.
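The standard relative indicator here is the coefficient of variation, V = s / mean · 100%; being dimensionless, it lets you compare series measured in different units. A small sketch with illustrative numbers:

```python
import statistics

def coefficient_of_variation(data):
    # V = s / mean * 100%; dimensionless, so series in different units compare
    return statistics.stdev(data) / statistics.mean(data) * 100

heights_cm = [170, 175, 168, 172, 180]  # illustrative data
weights_kg = [65, 72, 60, 68, 75]       # illustrative data

print(round(coefficient_of_variation(heights_cm), 1))  # 2.7
print(round(coefficient_of_variation(weights_kg), 1))  # 8.6: weights vary more
```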

Sample

The meaning of the sampling method (or, more simply, sampling) is that the properties of one part are used to determine the numerical characteristics of the whole (called the general population). The basis of the sampling method is the internal connection that unites the parts and the whole, the individual and the general.

The sampling method has a number of significant advantages over the others: by reducing the number of observations it cuts the amount of work, money and effort expended, and it makes it possible to obtain data on processes and phenomena where a complete study is either impractical or simply impossible.

The correspondence between the characteristics of the sample and the characteristics of the phenomenon or process under study will depend on a set of conditions, and, first of all, on how the sampling method will be implemented in practice. This can be either systematic selection, following a prepared scheme, or unplanned, when the sample is made from the general population.

But in all cases the sample must be typical and meet the criteria of objectivity. These requirements must always be met, because it is on them that the correspondence between the characteristics of the sample and the characteristics of what is subjected to statistical analysis depends.

Thus, before processing the sample material, it is necessary to check it carefully, getting rid of everything unnecessary and secondary. At the same time, when compiling a sample it is imperative to avoid any arbitrariness: in no case should you select only those cases that seem typical and discard all the others.

An effective and high-quality sample must be drawn objectively, i.e. it must be formed in such a way that any subjective influences and preconceived motives are excluded. For this condition to be properly observed, one must resort to the principle of randomization, or, more simply, the principle of random selection of units from the entire population.

This principle underlies the theory of the sampling method, and it must be followed whenever an effective sample population needs to be formed; cases of systematic selection are no exception here.
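In code, randomization simply means letting a pseudo-random generator pick the units instead of choosing "typical" cases by hand. A minimal sketch:

```python
import random

population = list(range(1, 1001))  # the general population: units numbered 1..1000

random.seed(42)                          # fixed seed only to keep the example reproducible
sample = random.sample(population, 50)   # 50 units, each with an equal chance

print(len(sample))       # 50
print(len(set(sample)))  # 50: selection without replacement, no duplicates
```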

Correlation and regression analysis

Correlation analysis and regression analysis are two highly effective methods for analyzing large amounts of data to explore the possible relationship between two or more indicators.

In the case of correlation analysis, the tasks are:

  • Measure the closeness of the relationship between the selected features;
  • Detect previously unknown causal relationships;
  • Assess which factors have the greatest impact on the resulting feature.

And in the case of regression analysis, the tasks are as follows:

  • Determine the form of the relationship;
  • Establish the degree of influence of the independent indicators on the dependent one;
  • Determine the calculated values of the dependent indicator.

To solve all the above problems, it is almost always necessary to apply both correlation and regression analysis in combination.
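A compact sketch of both methods on invented data: the Pearson coefficient measures the closeness of the relationship, and least squares gives its form y = a + b·x:

```python
import math

x = [1, 2, 3, 4, 5]            # e.g. advertising spend (illustrative)
y = [2.1, 3.9, 6.2, 8.0, 9.8]  # e.g. sales (illustrative)

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
b_slope = sxy / sxx              # least-squares regression slope
a_intercept = my - b_slope * mx  # regression intercept

print(round(r, 3))          # 0.999: an almost perfectly linear relationship
print(round(b_slope, 2))    # 1.95
print(round(a_intercept, 2))  # 0.15
```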

Series of dynamics

Using this method of statistical analysis, it is very convenient to determine the intensity or speed with which phenomena develop, to find the trend of their development, to single out fluctuations, to compare the dynamics of development, to find the relationship between phenomena developing over time.

A series of dynamics is a series in which statistical indicators are sequentially located in time, changes in which characterize the process of development of the object or phenomenon under study.

The series of dynamics includes two components:

  • The period or point in time associated with the available data;
  • Level or statistic.

Together, these components represent two terms of a series of dynamics, where the first term (time period) is denoted by the letter "t", and the second (level) - by the letter "y".

Based on the duration of the time intervals to which the levels relate, series of dynamics can be moment series or interval series. Interval series allow you to add up the levels to obtain the total value for consecutive periods; in moment series there is no such possibility, but none is required there.

Time series also exist with equal and unequal intervals, and the meaning of the interval differs between moment and interval series. In the first case, the interval is the span of time between the dates to which the data are linked (such a series is convenient, for example, for determining the number of actions per month, per year, etc.). In the second case, it is the time period to which the aggregated data relate (such a series can be used to determine the volume of the same actions over a month, a year, etc.). Intervals can be equal or different regardless of the series type.
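The simplest indicators computed from such a series are the chain absolute growth and the chain growth rate between consecutive levels; a sketch on an invented interval series of annual output:

```python
# interval series: annual output levels y_t (illustrative numbers)
years = [2018, 2019, 2020, 2021]
levels = [100.0, 110.0, 121.0, 133.1]

# chain absolute growth: y_t - y_(t-1)
abs_growth = [round(b - a, 1) for a, b in zip(levels, levels[1:])]

# chain growth rate: y_t / y_(t-1) * 100%
growth_rate = [round(b / a * 100, 1) for a, b in zip(levels, levels[1:])]

print(abs_growth)   # [10.0, 11.0, 12.1]
print(growth_rate)  # [110.0, 110.0, 110.0]: steady 10% annual growth
```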

Naturally, in order to learn to apply each method of statistical analysis correctly, it is not enough just to know about them: statistics is a whole science that also requires certain skills and abilities. But to make it easier, you can and should train your thinking.

In any case, the research, evaluation, processing and analysis of information are very interesting processes in themselves. Even when they lead to no specific result, you can learn a lot of interesting things along the way. Statistical analysis has found its way into a huge number of areas of human activity, and you can use it in study, work, business and other areas, including child development and self-education.

To obtain data on the state of society, a whole complex of sciences is used. One of them is statistics. What is it?

What is statistics?

This is the name of the branch of knowledge that deals with general questions of collecting, measuring and analyzing mass (quantitative or qualitative) data. Statistics also studies the quantitative side of mass social phenomena in their numerical form. The word comes from the Latin status, which means "state of affairs". Initially, this science was called "state studies".

The term "statistics" was first used in 1746, and this date marks the beginning of statistics as an academic discipline and a science. True, it cannot be said that its actual use began only then, since the accounting, measurement and analysis of data were carried out much earlier. One important parameter is the mode. Something similar may be remembered from school mathematics, but it is not quite the same thing. So what is the mode in statistics? It is the value in a series that occurs most often.
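Python's standard library computes the mode directly, which makes the definition easy to check on made-up data:

```python
import statistics

observations = [2, 3, 3, 5, 3, 4, 2]  # illustrative data

# the mode: the value occurring most often in the series (here 3, which occurs 3 times)
print(statistics.mode(observations))  # 3
```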

Examples

Let's talk about something closer to reality. What are website page statistics? This parameter can be the number of users who visited the resource and were able to view its content. True, from this point of view it is difficult to answer the question of what VKontakte statistics are.

Separate information is not collected for each page, but the number of users who visit per day or per month is counted constantly. This is what statistics means in practice in information technology.

Grouping types

Within this scientific discipline, a population is divided into separate groups that are homogeneous in some respect. When there are no obvious group boundaries, the Sturges formula is often used to calculate the number of intervals:

k = 1 + 3.322 · lg N, where

  • k is the number of intervals;
  • lg is the decimal logarithm;
  • N is the number of observations.
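The formula is straightforward to apply; note that lg denotes the decimal (base-10) logarithm and the result is rounded to a whole number of intervals:

```python
import math

def sturges_intervals(n_observations):
    # Sturges' formula: k = 1 + 3.322 * lg(N), rounded to a whole number
    return round(1 + 3.322 * math.log10(n_observations))

print(sturges_intervals(100))   # 1 + 3.322 * 2 = 7.644 -> 8 intervals
print(sturges_intervals(1000))  # 1 + 3.322 * 3 = 10.966 -> 11 intervals
```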

Depending on the goals, there are three types of groupings: typological, structural and analytical.

A typological group should differ from the others as much as possible and be as homogeneous as possible internally. Groupings are primary and secondary: the former are formed in the course of the primary processing of data, while secondary groupings are made on the basis of data already obtained.

Classification of statistical methods

Statistical methods have found their way almost everywhere, so it is logical to assume that there is no universal tool. Depending on the specifics of the problems involved, the following kinds of data analysis are distinguished:

  • Development and research of general purpose tools that do not take into account the specifics of the application area.
  • Creation and use of statistical models of some real phenomenon or process in a certain field of activity.
  • Development and use of methods and tools to analyze specific data to solve applied problems.

Applied Statistics

This branch of science deals with the processing of data of an arbitrary nature. The mathematical basis of applied statistics and its methods of analysis is probability theory. It all starts with a description of the type of data received and the mechanism of their origin. For this, probabilistic and deterministic methods are used. The latter can be applied only when the researcher has enough data at his disposal (an example is the reports of state statistical bodies, which are based on information provided by enterprises). But the result can be transferred to a larger scale, and prospects evaluated, only with the help of probabilistic-statistical methods.

In the simplest situation, the available data act as values of a certain feature characteristic of the objects under study. The parameters are quantitative or qualitative, depending on the category to which they belong. What if we take several of them, or add quantitative ones? Then we can say that a feature vector of the object has been obtained, which is treated as a new, composite feature. In large-scale studies, samples are drawn from several sets of vectors. It is important to clarify and double-check the information received; for this, resampling is used.
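The most common resampling technique is the bootstrap: drawing many samples with replacement from the observed data and recomputing the estimate each time to gauge its stability. A minimal sketch on made-up data:

```python
import random
import statistics

random.seed(0)  # fixed only to keep the example reproducible

data = [90, 91, 85, 83, 94, 88, 92]  # observed sample (illustrative)

# bootstrap: resample with replacement, recompute the estimate each time
boot_means = [
    statistics.mean(random.choices(data, k=len(data)))
    for _ in range(1000)
]

# the spread of the bootstrap means estimates how variable the sample mean is
spread = statistics.stdev(boot_means)

print(len(boot_means))  # 1000
print(min(data) <= statistics.mean(boot_means) <= max(data))  # True
```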

Conclusion

As you can see, statistics makes it possible to structure large amounts of data in order to provide information about the state of affairs in particular areas. It therefore plays an important role for investors, since it allows them to observe the dynamics of economic growth in different states. Statistics are also of interest to citizens and the authorities, telling them about processes in the country: demographic growth or decline, rising or falling welfare, and so on.