
Reliability of statistical data formula. The concept of statistical significance

Today it's really too easy: you can sit down at a computer and, with little or no knowledge of what you're doing, create sense and nonsense with truly amazing speed. (J. Box)

Basic terms and concepts of medical statistics

In this article, we present some of the key concepts of statistics that are relevant in medical research. The terms are discussed in more detail in the relevant articles.

Variation

Definition. The degree of dispersion of the data (the values of a characteristic) over its range of values.

Probability

Definition. Probability is the degree of possibility that a certain event will occur under certain conditions.

Example. Let us explain the definition using the sentence "The probability of recovery when using the drug Arimidex is 70%". The event is "the patient recovers", the condition is "the patient is taking Arimidex", and the degree of possibility is 70% (roughly speaking, out of 100 people taking Arimidex, 70 recover).

Cumulative Probability

Definition. The cumulative probability of survival at time t is the proportion of patients who are still alive at that time.

Example. If it is said that the cumulative probability of survival after a five-year course of treatment is 0.7, then this means that of the considered group of patients, 70% of the initial number remained alive, and 30% died. In other words, out of every hundred people, 30 died within the first 5 years.

Time to event

Definition. Time to event is the time, expressed in some units, that elapses from some initial moment until the occurrence of the event of interest.

Explanation. The units of time in medical research are days, months, and years.

Typical examples of initial times:

    start of patient follow-up

    surgical treatment

Typical examples of considered events:

    disease progression

    recurrence

    patient death

Sample

Definition. Part of a population obtained by selection.

Based on the results of analyzing the sample, conclusions are drawn about the entire population; strictly speaking, this is valid only if the selection was random. Since truly random selection from a population is practically impossible, one should at least strive for a sample that is representative of the population.

Dependent and independent samples

Definition. Independent samples are samples in which the objects of study were recruited independently of each other. The alternative to independent samples is dependent (connected, paired) samples.

Hypothesis

Two-sided and one-sided hypotheses

Let us first explain the use of the term hypothesis in statistics.

The goal of most research is to test the truth of some statement. The purpose of drug testing is most often to test the hypothesis that one drug is more effective than another (for example, Arimidex is more effective than Tamoxifen).

To make the study rigorous, the statement being verified is expressed mathematically. For example, if A is the number of years a patient on Arimidex will live and T is the number of years a patient on Tamoxifen will live, then the hypothesis being tested can be written as A > T.

Definition. A hypothesis is called two-sided if it states the equality of two quantities.

An example of a two-sided hypothesis: A=T.

Definition. A hypothesis is called one-sided if it states an inequality between two quantities.

Examples of one-sided hypotheses: A > T, A < T.

Dichotomous (binary) data

Definition. Data that can take only two alternative values.

Example: the patient is "healthy" or "sick"; edema is "present" or "absent".

Confidence interval

Definition. The confidence interval for some quantity is a range around the estimated value of the quantity that contains the true value of that quantity (with a certain level of confidence).

Example. Let the quantity under study be the number of patients contacting a clinic per year. On average, their number is 500, and the 95% confidence interval is (350, 900). This means that, most likely (with a probability of 95%), at least 350 and no more than 900 people will contact the clinic during the year.

Designation. A very common abbreviation: 95% CI is a confidence interval with a confidence level of 95%.
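To make this concrete, here is a minimal sketch of a 95% confidence interval for a mean using the normal (z) approximation; the weekly patient counts below are hypothetical illustration data, not from the source.

```python
import math

def mean_ci_95(values):
    """95% confidence interval for the mean (normal approximation).

    A sketch assuming the sample is large enough for the z-based
    interval to be reasonable."""
    n = len(values)
    mean = sum(values) / n
    # sample variance with n - 1 in the denominator
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    se = math.sqrt(var / n)   # standard error of the mean
    z = 1.96                  # z quantile for 95% confidence
    return mean - z * se, mean + z * se

# hypothetical weekly patient counts at a clinic
counts = [480, 510, 530, 470, 520, 495, 505, 490]
low, high = mean_ci_95(counts)
```

The interval straddles the sample mean (here 500); a wider interval reflects more variable data or a smaller sample.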

Reliability, statistical significance (p-value)

Definition. The statistical significance of a result is a measure of confidence in its "truth".

Any research is based on only a part of the objects. The study of the effectiveness of a drug is not carried out on the basis of all patients on the planet in general, but only on a certain group of patients (it is simply impossible to conduct an analysis on the basis of all patients).

Let us assume that some conclusion was drawn from the analysis (for example, that therapy with Arimidex is twice as effective as therapy with Tamoxifen).

The question that needs to be asked is: "How much can you trust this result?".

Imagine that we conducted a study on only two patients. Of course, in this case the results should be treated with caution. If a large number of patients were examined (the numerical value of "a large number" depends on the situation), then the conclusions drawn can already be trusted.

So, the degree of trust is determined by the value of the p-level (p-value).

A higher p-value corresponds to a lower level of confidence in the results obtained from the analysis of the sample. For example, a p-value of 0.05 (5%) shows that there is only a 5% probability that the conclusion drawn from the analysis of the group is merely a chance feature of these particular objects.

In other words, with a very high probability (95%), the conclusion can be extended to all objects.

In many studies, 5% is considered an acceptable significance level. This means that if, for example, p = 0.01, the results can be trusted, but if p = 0.06, they cannot (the result is not considered statistically significant).

Study

A prospective study is a study in which samples are selected based on an input factor, and some resulting factor is analyzed in the samples.

A retrospective study is a study in which samples are selected based on the resulting factor, and some input factor is analyzed in the samples.

Example. The input factor is whether the mother is younger or older than 20 years; the resulting factor is whether the child is lighter or heavier than 2.5 kg. We analyze whether the child's birth weight depends on the mother's age.

If we take two samples, one with mothers younger than 20 and the other with older mothers, and then analyze the birth weight of the children in each group, this is a prospective study.

If we collect two samples, one with mothers who gave birth to children lighter than 2.5 kg and the other with heavier children, and then analyze the mothers' ages in each group, this is a retrospective study (of course, such a study can only be carried out after the experiment is completed, i.e. all the children have been born).

Outcome

Definition. A clinically significant event, laboratory value, or sign that is of interest to the researcher. In clinical trials, outcomes serve as criteria for evaluating the effectiveness of a therapeutic or prophylactic intervention.

Clinical epidemiology

Definition. The science that makes it possible to predict a particular outcome for a particular patient based on the study of the clinical course of the disease in similar cases, using rigorous scientific methods of studying patients to ensure accurate predictions.

Cohort

Definition. A group of participants in a study, united by some common feature at the time of its formation and studied over a long period of time.

Control

Historical control

Definition. The control group formed and examined in the period preceding the study.

Parallel control

Definition. The control group, formed simultaneously with the formation of the main group.

Correlation

Definition. A statistical relationship between two characteristics (quantitative or ordinal) such that, in a certain proportion of cases, larger values of one characteristic correspond to larger values of the other (positive, or direct, correlation) or to smaller values of the other (negative, or inverse, correlation).

Example. A significant correlation was found between the level of platelets and leukocytes in the patient's blood. The correlation coefficient is 0.76.
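A correlation coefficient like the one in the example can be computed as Pearson's r; the sketch below uses hypothetical platelet and leukocyte values for five patients (not data from the source).

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # covariance term and the two standard-deviation terms
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# hypothetical platelet (10^9/L) and leukocyte (10^9/L) counts
platelets  = [150, 200, 250, 300, 350]
leukocytes = [4.0, 5.5, 5.0, 7.0, 8.0]
r = pearson_r(platelets, leukocytes)
```

A value of r near +1 indicates a strong direct correlation, near -1 a strong inverse correlation, and near 0 no linear relationship.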

Risk ratio (RR)

Definition. The risk ratio (hazard ratio) is the ratio of the probability of a certain ("bad") event for the first group of objects to the probability of the same event occurring for the second group of objects.

Example. If non-smokers have a 20% probability of getting lung cancer and smokers have a 100% probability, then the RR is 0.2 (one fifth). In this example, the first group of objects is the non-smokers, the second group is the smokers, and the occurrence of lung cancer is considered the "bad" event.

It is obvious that:

1) if RR = 1, the probability of the event occurring is the same in both groups;

2) if RR > 1, the event occurs more often in objects from the first group than from the second;

3) if RR < 1, the event occurs more often in objects from the second group than from the first.
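The computation itself is a simple ratio of probabilities; a minimal sketch using the smoking example from the text:

```python
def risk_ratio(p_group1, p_group2):
    """Risk ratio: probability of the ("bad") event in group 1
    divided by the probability of the same event in group 2."""
    return p_group1 / p_group2

# the example from the text: 20% risk in non-smokers, 100% in smokers
rr = risk_ratio(0.20, 1.00)   # 0.2, i.e. one fifth
```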

Meta-analysis

Definition. A statistical analysis that summarizes the results of several studies investigating the same problem (usually the effectiveness of methods of treatment, prevention, or diagnosis). Pooling the studies provides a larger sample for analysis and greater statistical power. Meta-analysis is used to strengthen the evidence for, and confidence in, a conclusion about the effectiveness of the method under study.

Kaplan-Meier method (product-limit estimates)

This method was invented by the statisticians E. L. Kaplan and Paul Meier.

The method is used to calculate various quantities related to the time of observation of the patient. Examples of such values:

    chance of recovery within one year when using the drug

    chance of recurrence after surgery within three years after surgery

    cumulative probability of survival at five years among patients with prostate cancer after removal of the organ

Let us explain the advantages of using the Kaplan-Meier method.

In a "conventional" analysis (one that does not use the Kaplan-Meier method), the values are calculated by dividing the considered time span into intervals.

For example, if we examine the probability of a patient dying within 5 years, the time span can be divided into 5 parts (less than 1 year, 1-2 years, 2-3 years, 3-4 years, 4-5 years), into 10 parts (half a year each), or into any other number of intervals. The results will differ for different partitions.

Choosing the most appropriate partition is not an easy task.

Estimates obtained by the Kaplan-Meier method do not depend on how the observation time is divided into intervals; they depend only on the individual follow-up time of each patient.

Therefore, it is easier for the researcher to carry out the analysis, and the results often turn out to be of higher quality than the results of the “ordinary” analysis.

The Kaplan-Meier curve is a graph of the survival curve obtained using the Kaplan-Meier method.
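The product-limit idea can be sketched in a few lines: at each time when a death occurs, the current survival estimate is multiplied by the fraction of at-risk patients who survive that time; censored patients simply leave the risk set. The follow-up data below are hypothetical.

```python
def kaplan_meier(times, events):
    """Minimal product-limit (Kaplan-Meier) estimator sketch.

    times  -- follow-up time for each patient (any units)
    events -- 1 if the event (e.g. death) occurred at that time,
              0 if the patient was censored (left observation alive)
    Returns a list of (time, cumulative survival) steps."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # all patients whose follow-up ends at the same time t
        at_t = [e for tt, e in data if tt == t]
        deaths = sum(at_t)
        if deaths:
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= len(at_t)
        i += len(at_t)
    return curve

# hypothetical follow-up: years observed and whether death occurred
curve = kaplan_meier([1, 2, 3, 3, 5], [1, 0, 1, 1, 0])
```

Plotting these (time, survival) steps yields the Kaplan-Meier curve described above; note how the censored patients (events = 0) reduce the risk set without producing a step.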

Cox model

This model was invented by Sir David Roxby Cox (b. 1924), a famous English statistician, author of over 300 articles and books.

The Cox model is used in situations where the quantities studied in survival analysis depend on functions of time. For example, the probability of recurrence after t years (t = 1, 2, ...) may depend on the logarithm of time, log(t).

An important advantage of the method proposed by Cox is the applicability of this method in a large number of situations (the model does not impose strict restrictions on the nature or form of the probability distribution).

Based on the Cox model, an analysis (called a Cox analysis) can be performed, which results in a risk ratio value and a confidence interval for the risk ratio.

Nonparametric methods of statistics

Definition. A class of statistical methods that are used primarily for the analysis of non-normally distributed quantitative data, as well as for the analysis of qualitative data.

Example. To identify the significance of differences in the systolic pressure of patients depending on the type of treatment, we will use the nonparametric Mann-Whitney test.

Feature (variable)

Definition. A characteristic of the object of study (observation). Characteristics are divided into qualitative and quantitative.

Randomization

Definition. A method of randomly distributing research objects into the main and control groups using special means (tables of random numbers or a random number generator, tossing a coin, and other methods of randomly assigning a group number to an included observation). Randomization minimizes differences between groups in terms of known and unknown characteristics that potentially influence the outcome being studied.
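A sketch of the simplest form of this (unrestricted randomization into two equal groups), with hypothetical patient IDs; real trials typically use more elaborate schemes such as blocked or stratified randomization.

```python
import random

def randomize(patient_ids, seed=None):
    """Randomly split patients into a treatment and a control group.

    A sketch of unrestricted 1:1 randomization; `seed` is only for
    reproducibility of the example."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)          # random order = random group assignment
    half = len(ids) // 2
    return ids[:half], ids[half:]

# hypothetical patient IDs 1..20
treatment, control = randomize(range(1, 21), seed=42)
```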

Risk

Attributable risk is the additional risk of an unfavorable outcome (for example, a disease) due to the presence of a certain characteristic (risk factor) in the object of study. It is the part of the risk of developing the disease that is associated with, and explained by, this risk factor and that can be eliminated if the risk factor is eliminated.

Relative risk is the ratio of the risk of an unfavorable condition in one group to the risk of that condition in another group. It is used in prospective and observational studies, when the groups are formed in advance and the condition under study has not yet occurred.

Jackknife (leave-one-out) validation

Definition. A method for checking the stability, reliability, and validity of a statistical model by successively deleting observations and recalculating the model. The more similar the resulting models are, the more stable and reliable the model is.
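The "successively deleting observations" step can be sketched as follows; the statistic here is a simple mean and the measurements are hypothetical, but any model-fitting function could be passed instead.

```python
def jackknife(sample, statistic):
    """Leave-one-out recomputation: drop each observation in turn
    and recompute the statistic on the remaining data."""
    return [statistic(sample[:i] + sample[i + 1:])
            for i in range(len(sample))]

def mean(xs):
    return sum(xs) / len(xs)

# hypothetical measurements; the closer the leave-one-out values
# are to each other, the more stable the estimate
values = [2.0, 4.0, 6.0, 8.0]
loo_means = jackknife(values, mean)
```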

Event

Definition. The clinical outcome observed in the study, such as the occurrence of complications, relapse, recovery, death.

Stratification

Definition. A sampling method in which the population of all participants meeting the inclusion criteria for a study is first divided into groups (strata) based on one or more characteristics (usually sex and age) that potentially influence the outcome under study, and participants are then recruited independently from each stratum into the experimental and control groups. This allows the researcher to balance important characteristics between the experimental and control groups.

Contingency table

Definition. A table of absolute frequencies (counts) of observations, whose columns correspond to the values of one characteristic and whose rows correspond to the values of another characteristic (in the case of a two-dimensional contingency table). The absolute frequencies are located in the cells at the intersections of rows and columns.

Let us give an example of a contingency table. Aneurysm surgery was performed in 194 patients. For each patient, the severity of edema before the operation and the outcome of the operation were recorded.

Edema \ Outcome      Survived   Died   Total
No edema                20         6      26
Moderate edema          27        15      42
Pronounced edema         8        21      29
Total                   55        42      97

Thus, out of 26 patients without edema, 20 patients survived after the operation, 6 patients died. Out of 42 patients with moderate edema, 27 patients survived, 15 died, etc.

Chi-square test for contingency tables

To determine the significance (reliability) of differences in one characteristic depending on another (for example, the outcome of an operation depending on the severity of edema), the chi-square test for contingency tables is used: χ² = Σ (O − E)² / E, where O is the observed frequency in a cell, E = (row total × column total) / overall total is the expected frequency, and the sum runs over all cells of the table. The statistic is compared with the critical value of the chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom.
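As a sketch, the Pearson chi-square statistic for the edema example can be computed directly from the table of observed counts:

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table of absolute frequencies (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# the edema example: rows = severity, columns = (survived, died)
observed = [[20, 6], [27, 15], [8, 21]]
chi2, df = chi_square(observed)
```

Here chi2 comes out well above the tabulated critical value for df = 2 at the 0.001 level (13.82), so the dependence of outcome on edema severity is statistically significant.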


Chance

Let the probability of some event be equal to p. Then the probability that the event will not occur is 1-p.

For example, if the probability that the patient will still be alive after five years is 0.8 (80%), then the probability that he will die during this time period is 0.2 (20%).

Definition. Chance is the ratio of the probability that an event will occur to the probability that the event will not occur.

Example. In our example (about the patient), the chance is 4, since 0.8/0.2 = 4.

Thus, the probability that the patient survives is 4 times the probability that he dies.

Interpretation of the value.

1) If the chance = 1, then the probability of the event occurring is equal to the probability that it will not occur;

2) if the chance > 1, then the probability of the event occurring is greater than the probability that it will not occur;

3) if the chance < 1, then the probability of the event occurring is less than the probability that it will not occur.

Odds ratio

Definition. The odds ratio is the ratio of the odds for the first group of objects to the odds for the second group of objects.

Example. Let us assume that both men and women undergo some treatment.

The probability that a male patient will still be alive after five years is 0.6 (60%); the probability that he will die during this time period is 0.4 (40%).

Similar probabilities for women are 0.8 and 0.2.

The odds ratio in this example (men as the first group, women as the second) is (0.6/0.4) / (0.8/0.2) = 1.5 / 4 = 0.375.

Interpretation of the value.

1) If the odds ratio = 1, then the odds for the first group are equal to the odds for the second group;

2) if the odds ratio > 1, then the odds for the first group are greater than the odds for the second group;

3) if the odds ratio < 1, then the odds for the first group are less than the odds for the second group.
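The odds and odds-ratio calculations above can be sketched directly from the probabilities in the example:

```python
def odds(p):
    """Odds: probability the event occurs over probability it does not."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds ratio: odds in group 1 divided by odds in group 2."""
    return odds(p1) / odds(p2)

# the example from the text: 5-year survival of 0.6 for men, 0.8 for women
odds_men   = odds(0.6)             # ≈ 1.5
odds_women = odds(0.8)             # ≈ 4.0
or_value   = odds_ratio(0.6, 0.8)  # ≈ 0.375: men's odds are lower
```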

Consider a typical example of the application of statistical methods in medicine. The creators of the drug suggest that it increases diuresis in proportion to the dose taken. To test this assumption, they give five volunteers different doses of the drug.

Based on the results of the observations, diuresis is plotted against dose (Fig. 1.2A). The dependence is visible to the naked eye. The researchers congratulate each other on the discovery, and the world on the new diuretic.

In fact, the data allow us to reliably state only that the dependence of diuresis on the dose was observed in these five volunteers. That this dependence will manifest itself in all people who take the drug is nothing more than an assumption. It cannot be called groundless - otherwise, why experiment?

But now the drug is on the market. More and more people are taking it in the hope of increasing their diuresis. And what do we see? We see Fig. 1.2B, which indicates the absence of any relationship between the dose of the drug and diuresis. The black circles represent the data from the original study. Statistics has methods for estimating the probability of obtaining such an unrepresentative, indeed misleading, sample. It turns out that, in the absence of a relationship between diuresis and dose, the observed "dependence" would be seen in about 5 out of 1000 experiments. So, in this case, the researchers were simply unlucky. Even the most sophisticated statistical methods would not have saved them from this error.

We give this fictitious, but by no means far-fetched, example not to point out the uselessness of statistics. It illustrates something else: the probabilistic nature of statistical conclusions. Applying a statistical method yields not the ultimate truth, but only an estimate of the probability of a particular assumption. In addition, each statistical method is based on its own mathematical model, and its results are correct only to the extent that this model corresponds to reality.


In any scientific or practical experiment (survey), researchers cannot study all people (the general population); they study only a certain sample. Even if we are examining a relatively small group, such as people with a particular disease, it is highly unlikely that we have the resources, or the need, to test every such patient. Instead, a sample of the population is usually tested because it is more convenient and takes less time. In that case, how do we know that the results obtained from the sample represent the whole group? Or, in professional terminology: can we be sure that our study correctly describes the entire population from which the sample was drawn?

To answer this question, it is necessary to determine the statistical significance of the test results. Statistical significance (significance level, abbreviated Sig.), or p-significance level (p level), is the probability that a given result correctly represents the population from which the sample was drawn. Note that this is only a probability: it is impossible to say with absolute certainty that a given study correctly describes the entire population. At best, the significance level allows one to conclude that this is highly probable. Thus, the following question inevitably arises: how high must this probability be for the result to be considered a correct characterization of the population?

For example, at what value of probability are you willing to say that such odds are enough to take a risk? If the chances are 10 out of 100 or 50 out of 100? But what if this probability is higher? What about odds like 90 out of 100, 95 out of 100, or 98 out of 100? For a situation associated with risk, this choice is quite problematic, because it depends on the personal characteristics of a person.

In psychology, it is traditionally believed that 95 or more chances out of 100 mean that the probability of the results being correct is high enough to generalize them to the entire population. This convention was established in the course of scientific and practical work: there is no law according to which it must be chosen as the guideline (and indeed, other sciences sometimes choose other values of the significance level).

In psychology, this probability is handled in a somewhat unusual way. Instead of the probability that the sample represents the population, one reports the probability that the sample does not represent the population - in other words, the probability that the discovered relationship or differences are random and not a property of the population. Thus, instead of saying that the results of a study are correct with 95 chances out of 100, psychologists say there are 5 chances out of 100 that the results are wrong (similarly, 40 out of 100 chances in favor of the results being correct means 60 out of 100 chances in favor of their being wrong). The probability value is sometimes expressed as a percentage, but more often it is written as a decimal fraction: for example, 10 chances out of 100 are written as 0.1; 5 out of 100 as 0.05; 1 out of 100 as 0.01. With this form of recording, the cutoff value is 0.05. For a result to be considered correct, its significance level must be below this number (remember, this is the probability that the result incorrectly describes the population). To finish with the terminology, we add that the "probability of a wrong result" (more correctly called the significance level) is usually denoted by the Latin letter p. The description of experimental results usually includes a summary statement such as "the results were significant at the p < 0.05 level (i.e., less than 5%)".

Thus, the significance level (p) indicates the probability that the results do not represent the population. By tradition in psychology, results are considered to reliably reflect the overall picture if p is less than 0.05 (i.e., 5%). However, this is only a probabilistic statement, not an unconditional guarantee: in some cases, the conclusion may be incorrect, and the significance level itself tells us how often this can happen. At a significance level of 0.05, the results will be incorrect in 5 cases out of 100. At first glance this does not seem too often, but on reflection, 5 chances out of 100 is the same as 1 out of 20: in one of every 20 cases the result will turn out to be wrong. Such odds are not particularly favorable, and researchers should beware of committing errors of the first kind (Type I errors). This is the name of the error that occurs when researchers believe they have found real results when in fact there are none. The opposite errors, in which researchers believe they have found no result when in fact there is one, are called errors of the second kind (Type II errors).

These errors arise because the possibility of incorrect statistical analysis cannot be ruled out. The probability of error depends on the level of statistical significance of the results. We have already noted that in order for the result to be considered correct, the significance level must be below 0.05. Of course, some results are lower, and it's not uncommon to find results as low as 0.001 (a value of 0.001 indicates a 1 in 1000 chance of being wrong). The smaller the p value, the stronger our confidence in the correctness of the results.

Table 7.2 shows the traditional interpretation of significance levels and the corresponding statistical decisions about the presence of a relationship (differences).

Table 7.2

Traditional Interpretation of Significance Levels Used in Psychology

Based on the experience of practical research, it is recommended that, in order to avoid errors of the first and second kind, responsible conclusions and decisions about the presence of differences (relationships) be made with reference to the p-significance level.

A statistical test (statistical criterion) is a tool for determining the level of statistical significance: a decision rule that ensures that a true hypothesis is accepted, and a false one rejected, with high probability.

A statistical criterion also specifies the method of calculating a certain number, and that number itself. All criteria are used with one main goal: to determine the significance level of the data they analyze (i.e., the likelihood that the data reflect a true effect that correctly represents the population from which the sample was drawn).

Some criteria can only be used for normally distributed data (and if the feature is measured on an interval scale) - these criteria are usually called parametric. With the help of other criteria, you can analyze data with almost any distribution law - they are called nonparametric.

Parametric criteria are criteria whose calculation formulas include distribution parameters, i.e. means and variances (Student's t-test, Fisher's F test, etc.).

Nonparametric criteria are criteria whose calculation formulas do not include distribution parameters; they operate on frequencies or ranks (Rosenbaum's Q test, the Mann-Whitney U test, etc.).

For example, when we say that the significance of differences was determined by Student's t-test, we mean that the Student's t-test method was used to calculate the empirical value, which is then compared with the tabular (critical) value.

From the relation between the empirical (calculated) value of the criterion and the critical (tabulated) value, we can judge whether our hypothesis is confirmed or refuted. In most cases, for the differences to be recognized as significant, the empirical value of the criterion must exceed the critical one, although there are criteria (for example, the Mann-Whitney test or the sign test) for which the opposite rule applies.
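The first step (computing the empirical value) can be sketched for Student's t-test; the two groups of systolic pressure readings below are hypothetical, and the resulting |t| would then be compared with the tabulated critical value for the given degrees of freedom.

```python
def students_t(sample_a, sample_b):
    """Student's t statistic for two independent samples
    (equal-variance, pooled form), plus degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    # unbiased sample variances
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (ma - mb) / (pooled * (1 / na + 1 / nb)) ** 0.5
    return t, na + nb - 2

# hypothetical systolic pressure (mmHg) in two treatment groups
a = [120, 125, 130, 128, 122]
b = [135, 140, 138, 132, 141]
t, df = students_t(a, b)
```

Here |t| comes out near 4.9 with df = 8, which exceeds the tabulated critical value 2.306 at the 0.05 level, so the difference would be recognized as significant.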

In some cases, the calculation formula of the criterion includes the number of observations in the study sample, denoted as n. Using a special table, we determine what level of statistical significance of the differences corresponds to a given empirical value. In most cases, the same empirical value of the criterion may turn out to be significant or insignificant depending on the number of observations in the sample (n) or on the so-called number of degrees of freedom, denoted v or df (sometimes d).

Knowing n or the number of degrees of freedom, we can use special tables (the main ones are given in Appendix 5) to determine the critical values of the criterion and compare the obtained empirical value with them. This is usually written as follows: "at n = 22, the critical value of Student's criterion is tSt = 2.07" or "at v (df) = 2, the critical value of Student's criterion is 4.30".

Usually, however, preference is given to parametric criteria, and we adhere to this position. They are considered more reliable and can provide more information and a deeper analysis. As for the complexity of the mathematical calculations, with computer programs this complexity disappears (though some other, quite surmountable, difficulties appear).

In this textbook, we do not deal in detail with the problem of statistical hypotheses (null - H0 - and alternative - H1) and statistical decisions, since psychology students study this separately in the discipline "Mathematical Methods in Psychology". In addition, it should be noted that when preparing a research report (term paper, thesis, or publication), statistical hypotheses and statistical decisions are usually not stated. Usually, when describing results, the criterion used is indicated, the necessary descriptive statistics are given (means, sigmas, correlation coefficients, etc.), along with the empirical values of the criteria, the degrees of freedom, and, necessarily, the p-significance level. Then a meaningful conclusion is formulated with respect to the hypothesis being tested, indicating (usually in the form of an inequality) the level of significance achieved or not achieved.

STATISTICAL RELIABILITY

- English: credibility/validity, statistical; German: Validität, statistische. Consistency, objectivity, and lack of ambiguity in a statistical test or in some set of measurements. Statistical reliability can be tested by repeating the same test (or questionnaire) on the same subject to see whether the same results are obtained, or by comparing different parts of the test that are supposed to measure the same thing.

Antinazi. Encyclopedia of Sociology, 2009


The concept of statistical significance

Statistical reliability is essential in the calculation practice of the FCC (physical culture and sports). As noted earlier, many different samples can be drawn from the same general population:

If the samples are drawn correctly, their means and the corresponding indicators of the general population differ from one another only within the error of representativeness, at the accepted reliability level;

If the samples are drawn from different general populations, the difference between them turns out to be significant. Comparison of samples is a standard task in statistics;

If the samples differ only slightly and non-fundamentally, i.e., they actually belong to the same general population, the difference between them is called statistically non-significant (unreliable).

A statistically significant difference between samples is a difference that is substantial and fundamental, i.e., one where the samples belong to different general populations.
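The idea that correctly drawn samples from one general population differ only within the error of representativeness can be illustrated with a short simulation. This is a minimal sketch with made-up numbers (a population mean of 50 and standard deviation of 10 are assumptions for illustration only):

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical general population: 10 000 measurements, mean ~50, sd ~10
population = [random.gauss(50, 10) for _ in range(10000)]

def mean_and_error(sample):
    """Arithmetic mean and error of representativeness m = s / sqrt(n)."""
    n = len(sample)
    return statistics.mean(sample), statistics.stdev(sample) / n ** 0.5

# Two samples drawn correctly (at random) from the same population
a = random.sample(population, 100)
b = random.sample(population, 100)

mean_a, m_a = mean_and_error(a)
mean_b, m_b = mean_and_error(b)

# The difference of the means is typically on the scale of the combined
# representativeness error, i.e. statistically non-significant
print("means:", round(mean_a, 1), round(mean_b, 1))
print("difference:", round(abs(mean_a - mean_b), 2),
      "error scale:", round((m_a ** 2 + m_b ** 2) ** 0.5, 2))
```

Drawing the two samples from populations with clearly different means would instead give a difference many times larger than the error scale.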

In the FCC, assessing the statistical significance of sample differences arises in many practical problems. For example, the introduction of new teaching methods, programs, sets of exercises, tests, and control exercises requires experimental verification, which should show whether the test group differs fundamentally from the control group. For this, special statistical methods, called criteria of statistical significance, are used to detect the presence or absence of a statistically significant difference between samples.

All criteria are divided into two groups: parametric and non-parametric. Parametric criteria require that the data follow the normal distribution law, which means determining the main indicators of the normal law: the arithmetic mean x̄ and the standard deviation σ. Parametric criteria are the most accurate and correct. Non-parametric criteria are based on rank (ordinal) differences between the elements of the samples.

Here are the main statistical significance criteria used in the practice of the FCC: Student's test, Fisher's test, Wilcoxon's test, White's test, Van der Waerden's test (sign test).
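Non-parametric criteria replace the raw values with their ranks. As a minimal illustration of the rank (ordinal) idea (not tied to any particular test in the list above), here is a small helper that assigns ranks 1..n, giving tied values the average of their ranks, which is the standard mid-rank convention:

```python
from collections import defaultdict

def ranks(values):
    """Rank each value from 1 to n; tied values share the average rank."""
    positions = defaultdict(list)
    for i, v in enumerate(sorted(values), start=1):
        positions[v].append(i)
    return [sum(positions[v]) / len(positions[v]) for v in values]

print(ranks([3.2, 1.5, 4.8, 1.5]))  # the two 1.5s share rank (1 + 2) / 2 = 1.5
```

Rank-based criteria then work with sums or comparisons of these ranks rather than with the raw measurements, which is why they do not require a normal distribution.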

Student's criterion is named after the English scientist W. Gosset (Student was his pseudonym), who discovered this method. Student's criterion is parametric and is used to compare samples by their absolute values (arithmetic means). The samples may differ in size.

Student's criterion is defined as follows.

1. Find Student's criterion t according to the following formula:

t = (x̄₁ − x̄₂) / √(m₁² + m₂²),

where x̄₁, x̄₂ are the arithmetic means of the compared samples, and m₁, m₂ are the representativeness errors determined from the indicators of the compared samples.

2. Practice in the FCC has shown that for sports work it is sufficient to accept the reliability level P = 0.95. For the reliability P = 0.95 (α = 0.05), with the number of degrees of freedom k = n₁ + n₂ − 2, the boundary value of the criterion (t_gr) is found from the table in Appendix 4.

3. Based on the properties of the normal distribution law, the computed value t is compared with the boundary value t_gr.

4. Draw conclusions:

If t > t_gr, the difference between the compared samples is statistically significant;

If t < t_gr, the difference is not statistically significant.
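The four steps above can be sketched in code. The sample data below are invented for illustration, and t_gr = 2.101 is the standard Student-table value for P = 0.95 (α = 0.05) and k = 18; in practice t_gr is taken from the table in Appendix 4 for the accepted P and the computed k:

```python
import statistics

def students_t(sample1, sample2):
    """Step 1: t = (x1 - x2) / sqrt(m1^2 + m2^2), m = representativeness error."""
    x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
    m1 = statistics.stdev(sample1) / len(sample1) ** 0.5
    m2 = statistics.stdev(sample2) / len(sample2) ** 0.5
    return abs(x1 - x2) / (m1 ** 2 + m2 ** 2) ** 0.5

# Illustrative data: e.g. sprint times (s) for control and test groups
control      = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.2, 12.5, 11.7]
experimental = [11.2, 11.5, 11.0, 11.4, 11.6, 11.1, 11.3, 10.9, 11.5, 11.2]

t = students_t(control, experimental)
k = len(control) + len(experimental) - 2   # step 2: k = n1 + n2 - 2
t_gr = 2.101                               # table value for P = 0.95, k = 18

# Steps 3-4: compare t with t_gr and draw the conclusion
if t > t_gr:
    print(f"t = {t:.2f} > {t_gr}: the difference is statistically significant")
else:
    print(f"t = {t:.2f} <= {t_gr}: the difference is not statistically significant")
```

Note that a statistically significant t only answers the first, statistical question; whether the difference matters pedagogically is judged separately, as the text goes on to say.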

For researchers in the field of the FCC, assessing statistical significance is the first step in solving a specific problem: do the compared samples differ fundamentally or not? The next step is to assess this difference from a pedagogical point of view, which is determined by the conditions of the problem.