
Theory of tests and testing of students' physical fitness. Basic concepts of test theory

Fundamentals of test theory

1. Basic concepts of test theory

2. Reliability of tests and ways to determine it

Test questions

1. What is a test?

2. What requirements must a test meet?

3. Which tests are called authentic?

4. What is the reliability of a test?

5. List the reasons for the variation in results on retesting.

6. What is the difference between intraclass and interclass variation?

7. How can the reliability of a test be determined in practice?

8. What is the difference between test consistency and test stability?

9. What is the equivalence of tests?

10. What is a homogeneous test complex?

11. What is a heterogeneous test complex?

12. What are the ways to improve the reliability of tests?

A test is a measurement or trial carried out to determine a person's condition or abilities. Not all measurements can be used as tests, only those that meet special requirements. These include: 1) standardization (the testing procedure and conditions must be the same every time the test is applied); 2) reliability; 3) informativeness; 4) the availability of a grading system.

Test requirements:

    Informativeness: the degree of accuracy with which the test measures the property (quality, ability, characteristic) for which it is used.

    Reliability: the degree of agreement between results when the same people are tested repeatedly under the same conditions.

    Consistency: agreement of results when different people conduct the test with the same devices under the same conditions.

    Standard conditions: the same conditions for repeated measurements.

    A grading system: results can be converted into grades (as with the school marks 5, 4, 3, ...).

Tests that meet the requirements of reliability and informativeness are called good or authentic (Greek authentikos, "in a reliable way").

The process of administering a test is called testing; the numerical value obtained from the measurement is the test result. For example, the 100 m run is a test, the procedure for conducting the races and timing them is testing, and the running time is the test result.

Tests based on motor tasks are called motor (movement) tests. Their results can be either motor achievements (time to cover a distance, number of repetitions, distance covered, etc.) or physiological and biochemical indicators.

Sometimes not one but several tests with a single end goal are used (for example, to assess an athlete's condition during the competition period of training). Such a group of tests is called a test complex or test battery.

The same test applied to the same subjects under the same conditions should give identical results (unless the subjects themselves have changed). However, even with the strictest standardization and precise equipment, test results always vary somewhat. For example, a subject who has just recorded 215 kg in back dynamometry may show only 190 kg on a repeat trial.

Reliability of tests and ways to determine it

The reliability of a test is the degree of agreement between results when the same people (or other objects) are retested under the same conditions.

The variation of results during repeated testing is called intra-individual, intra-group, or intraclass variation. Four main reasons cause this variation:

1. Changes in the state of the subjects (fatigue, warming up, "learning", changes in motivation, concentration, etc.).

2. Uncontrolled changes in external conditions and equipment (temperature, wind, humidity, mains voltage, the presence of outsiders, etc.), i.e. everything covered by the term "random measurement error".

3. A change in the state of the person administering or evaluating the test (and, of course, the replacement of one experimenter or judge by another).

4. Imperfection of the test itself. Some tests are obviously unreliable: for example, when subjects perform free throws into a basketball basket, even a player with a high hit percentage can miss on the first throws by chance.

The true test result is an abstraction: it cannot be measured directly in an experiment. Therefore, indirect methods have to be used. The preferred way to assess reliability is analysis of variance followed by calculation of intraclass correlation coefficients. Analysis of variance decomposes the variation of the test results recorded in an experiment into components due to individual factors.

If you record the subjects' results in a test repeated on different days, with several attempts each day and with the experimenters changed periodically, the results will vary:

    a) from subject to subject;

    b) from day to day;

    c) from experimenter to experimenter;

    d) from attempt to attempt.

Analysis of variance makes it possible to isolate and evaluate these variations.

Thus, to assess the reliability of a test in practice, one must, first, perform an analysis of variance and, second, calculate the intraclass correlation coefficient (reliability coefficient).
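The two steps above can be sketched in Python for the simplest design. This is a minimal sketch, assuming a one-way random-effects model in which only subject-to-subject and trial-to-trial variation are separated, with the same number of trials per subject. Other ICC forms exist for designs that also isolate days or experimenters. The function name `one_way_icc` and the sample numbers are hypothetical.

```python
from statistics import mean


def one_way_icc(scores):
    """Intraclass correlation from a one-way analysis of variance.

    scores: one row per subject, each row holding that subject's k
    repeated trials (the same k for every subject is assumed).
    Returns (icc_single, icc_mean): the estimated reliability of a single
    trial and of the mean of k trials.
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 subjects and >= 2 trials each, equal counts")

    grand_mean = mean(x for row in scores for x in row)
    subject_means = [mean(row) for row in scores]

    # Between-subject (interclass) variation: how much subjects differ.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Within-subject (intraclass) variation: trial-to-trial scatter.
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    if ms_between == 0:
        raise ValueError("subjects do not differ; the ICC is undefined")

    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_mean = (ms_between - ms_within) / ms_between
    return icc_single, icc_mean


# Hypothetical data: three subjects, two trials each.
print(one_way_icc([[10, 12], [20, 22], [30, 28]]))
```

Values close to 1 mean that most of the recorded variation comes from real differences between subjects rather than from trial-to-trial scatter.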

When speaking about the reliability of tests, one must distinguish between their stability (reproducibility), consistency, and equivalence.

Test stability refers to the reproducibility of results when the test is repeated after a certain time under the same conditions. Repeated testing is commonly called a retest. Test consistency is the independence of test results from the personal qualities of the person conducting or evaluating the test.

If all the tests in a test complex are highly equivalent, the complex is called homogeneous. Such a complex as a whole measures one property of human motor ability (for example, a complex of standing long, vertical and triple jumps assesses the level of speed-strength qualities). If the complex contains no equivalent tests, that is, its tests measure different properties, it is called heterogeneous (for example, a complex of back dynamometry, the Abalakov vertical jump, and a 100 m run).

The reliability of tests can be improved to some extent by:

    a) more stringent standardization of testing;

    b) increasing the number of attempts;

    c) increasing the number of evaluators (judges, experimenters) and the consistency of their opinions;

    d) increasing the number of equivalent tests;

    e) better motivation of the subjects.
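The effect of items b) and d) can be estimated with the Spearman-Brown formula from classical test theory, a standard result that the text itself does not name. It predicts the reliability of the mean of k equivalent attempts from the reliability r of one attempt: r_k = k r / (1 + (k - 1) r). The sketch below assumes that the added attempts really are equivalent to the first one. The function name is hypothetical.

```python
def spearman_brown(r_single, k):
    """Predicted reliability of the mean of k equivalent attempts (or tests),
    given the reliability r_single of one attempt (Spearman-Brown formula)."""
    if not 0.0 <= r_single <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r_single / (1 + (k - 1) * r_single)


# A single attempt with reliability 0.6 (hypothetical value):
for k in (1, 2, 3, 5):
    print(k, round(spearman_brown(0.6, k), 2))
```

This prints 0.6, 0.75, 0.82 and 0.88. Each extra attempt helps less than the previous one, so in practice a small number of attempts is usually a reasonable compromise.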

A measurement or trial carried out to determine the condition or ability of an athlete is called a test. Not all measurements can be used as tests, only those that meet special requirements: standardization, a grading system, reliability, informativeness, and objectivity. Tests that meet the requirements of reliability, informativeness and objectivity are called sound.

The process of administering a test is called testing, and the numerical value obtained from the measurement is the test result.

Tests based on motor tasks are called motor tests. Three groups of motor tests are distinguished, depending on the task facing the researcher.

Varieties of motor tests

Test name | Task for the athlete | Test result | Example
Control exercise |  | Motor achievements | 1500 m run time
Standard functional tests | The same for everyone, dosed: 1) by the amount of work performed; 2) by the magnitude of physiological changes | 1) Physiological or biochemical parameters during standard work; 2) Motor parameters at a standard magnitude of physiological changes | 1) Heart rate during standard work of 1000 kgm/min; 2) Running speed at a heart rate of 160 beats/min
Maximum functional tests | Show the maximum result | Physiological or biochemical parameters | Determination of maximum oxygen debt or maximum oxygen consumption

Sometimes not one but several tests with a single end goal are used. Such a group of tests is called a test battery.

It is known that even with the most stringent standardization and precise equipment, test results always vary somewhat. Therefore, one of the important conditions for the selection of good tests is their reliability.

Test reliability is the degree of agreement between the results when the same people are tested repeatedly under the same conditions. There are four main reasons causing intra-individual or intra-group variation in test results:

    change in the state of the subjects (fatigue, change in motivation, etc.);

    uncontrolled changes in external conditions and equipment;

    a change in the state of the person conducting or evaluating the test (health, replacement of the experimenter, etc.);

    imperfection of the test (for example, obviously imperfect and unreliable tests - free throws into the basketball basket before the first miss, etc.).

One criterion of test reliability is the reliability coefficient, calculated as the ratio of the true variance to the variance recorded in the experiment: $r = s^2_{\text{true}} / s^2_{\text{recorded}}$. Here the true variance is the variance that would be obtained with an infinite number of observations under the same conditions, and the recorded variance is the one obtained experimentally. In other words, the reliability coefficient is simply the proportion of true variation in the variation recorded in the experiment.

In addition to this coefficient, the reliability index is used. It is regarded as the theoretical correlation coefficient between the recorded and true values of the same test, and it is the most common criterion for assessing the quality (reliability) of a test.
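In classical test theory the reliability index equals the square root of the reliability coefficient, which links the two indicators. The worked example below uses hypothetical variances and assumes that measurement error is independent of the true scores, so that the recorded variance is the sum of the true and error variances:

```latex
r = \frac{s^2_{\text{true}}}{s^2_{\text{recorded}}}
  = \frac{s^2_{\text{true}}}{s^2_{\text{true}} + s^2_{\text{error}}}
  = \frac{16}{16 + 4} = 0.8,
\qquad
r_{\text{index}} = \sqrt{r} = \sqrt{0.8} \approx 0.894
```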

One characteristic of test reliability is equivalence, which reflects the degree of agreement between the results of different tests measuring the same quality (for example, a physical quality). How equivalence is treated depends on the specific task. On the one hand, if two or more tests are equivalent, using them together increases the reliability of the estimates. On the other hand, it may be enough to use only one of the equivalent tests, which simplifies testing.

If all tests in a battery are highly equivalent, the battery is called homogeneous (for example, standing long, vertical and triple jumps will presumably be homogeneous for assessing jumping ability). Conversely, if the battery contains no equivalent tests (for example, one for assessing general physical fitness), all of its tests measure different properties, i.e. the battery is essentially heterogeneous.

The reliability of tests can be improved to a certain extent by:

    more stringent standardization of testing;

    increasing the number of attempts;

    increasing the number of evaluators and increasing the consistency of their opinions;

    increasing the number of equivalent tests;

    better motivation of the test subjects.

Test objectivity is a special case of reliability: the independence of test results from the person conducting the test.

The informativeness of a test is the degree of accuracy with which it measures the property (quality of an athlete) for which it is used. The same test may have different informativeness in different cases. The question of a test's informativeness splits into two more specific questions:

What does this test measure? How accurately does it measure it?

For example, can an indicator such as MOC (maximum oxygen consumption) be used to judge the preparedness of long-distance runners, and if so, with what accuracy? Can this test be used in the training control process?

If the test is used to determine the athlete's state at the time of the examination, one speaks of the test's diagnostic informativeness. If the test results are used to draw conclusions about the athlete's possible future performance, one speaks of its predictive informativeness. A test may be diagnostically informative but not predictively informative, and vice versa.

The degree of informativeness can be characterized quantitatively, on the basis of experimental data (so-called empirical informativeness), and qualitatively, on the basis of a meaningful analysis of the situation (logical informativeness). In practical work, logical (meaningful) analysis should always precede mathematical analysis. The quantitative indicator of a test's informativeness is the correlation coefficient between the test result and a criterion, where the criterion is an indicator that obviously reflects the property the test is meant to measure.
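Empirical informativeness can be sketched as a plain Pearson correlation between test results and a criterion. The data below are invented for illustration: a hypothetical MOC test (ml/kg/min) against a 5000 m race time used as the criterion. Because a lower time is better, an informative test correlates negatively here. The magnitude of r matters, not its sign. The function name is hypothetical.

```python
from math import sqrt


def pearson_r(test_results, criterion):
    """Pearson correlation between test results and a criterion."""
    n = len(test_results)
    if n != len(criterion) or n < 3:
        raise ValueError("need paired samples of length >= 3")
    mx = sum(test_results) / n
    my = sum(criterion) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(test_results, criterion))
    sxx = sum((x - mx) ** 2 for x in test_results)
    syy = sum((y - my) ** 2 for y in criterion)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical data for five runners.
moc = [55, 60, 62, 68, 72]                # test result
time_5000m = [1140, 1100, 1080, 1020, 990]  # criterion, seconds
print(round(pearson_r(moc, time_5000m), 3))
```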

When a single test is not informative enough, a battery of tests is used. However, even when each test in the battery is highly informative on its own (judging by the correlation coefficients), the battery does not yield a single number. A more complex method of mathematical statistics, factor analysis, can help here. It makes it possible to determine how many and which tests act together on a particular factor and how much each contributes to each factor. It is then easy to choose the tests (or combinations of them) that most accurately assess individual factors.
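Full factor analysis (extracting several factors and rotating them) does not fit in a short example. As a simplified stand-in, the sketch below extracts only the first principal component of a correlation matrix by power iteration. This is a common first step toward factor extraction, not factor analysis proper. Its loadings show how strongly each test is tied to the common factor. The correlation values and test names are invented for illustration, and the function name is hypothetical.

```python
from math import sqrt


def first_factor(corr, iterations=500):
    """Dominant eigenvalue and loadings of a correlation matrix.

    Plain power iteration; a simplified stand-in for factor extraction,
    not full factor analysis (no communality estimates, no rotation).
    """
    n = len(corr)
    v = [1.0] * n  # start vector; assumed not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv gives the eigenvalue for the unit vector v.
    rv = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, rv))
    if sum(v) < 0:  # fix the arbitrary sign so loadings are mostly positive
        v = [-x for x in v]
    # A loading is the correlation of a test with the extracted factor.
    loadings = [x * sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings


# Hypothetical correlations: three jump tests and back dynamometry.
names = ["long jump", "vertical jump", "triple jump", "dynamometry"]
corr = [
    [1.00, 0.75, 0.80, 0.20],
    [0.75, 1.00, 0.70, 0.15],
    [0.80, 0.70, 1.00, 0.25],
    [0.20, 0.15, 0.25, 1.00],
]
eigenvalue, loadings = first_factor(corr)
print(f"share of variance: {eigenvalue / len(corr):.2f}")
for name, load in zip(names, loadings):
    print(f"{name:14s} {load:.2f}")
```

With such data the three jumps would load heavily on one factor and dynamometry weakly, which is the kind of pattern used to pick one representative test per factor.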

1 What is a test?

2 What is testing?

Quantification of an athlete's quality or condition; A measurement or test carried out to determine an athlete's condition or ability; A process of testing in which an athlete's quality or condition is quantified; No definition required

3 What is the result of the test?

Quantification of an athlete's quality or condition; A measurement or test carried out to determine an athlete's condition or ability; A process of testing in which an athlete's quality or condition is quantified; No definition required

4 What type of test is 100 m run?

5 What type of test is carpal dynamometry?

Control exercise; Functional test; Maximum functional test

6 What type of test is the maximum oxygen consumption (MOC) test?

Control exercise; Functional test; Maximum functional test

7 What type of test is a 3-minute run at a metronome-set pace?

Control exercise; Functional test; Maximum functional test

8 What type of test is the maximum number of pull-ups on the bar?

Control exercise; Functional test; Maximum functional test

9 When is the test considered informative?

10 When is a test considered reliable?

The ability of a test to replicate results when retested; The ability of a test to measure the athlete's quality of interest; Independence of test results from the person conducting the test

11 When is a test considered objective?

The ability of a test to replicate results when retested; The ability of a test to measure the athlete's quality of interest; Independence of test results from the person conducting the test

12 What criterion is needed when evaluating a test for informativeness?

13 What criterion is needed when evaluating the reliability of a test?

Student's t-test; Fisher's F-test; Correlation coefficient; Coefficient of determination; Variance

14 What criterion is needed when evaluating the objectivity of a test?

Student's t-test; Fisher's F-test; Correlation coefficient; Coefficient of determination; Variance

15 What is the informativeness of the test called if it is used to assess the degree of fitness of an athlete?

16 What kind of informativeness of control exercises does a coach rely on when selecting children for a sports section?

Logical; Predictive; Empirical; Diagnostic

17 Is correlation analysis necessary to assess the information content of tests?

18 Is factor analysis necessary to assess the information content of tests?

19 Can correlation analysis assess the reliability of a test?

20 Is it possible to assess the objectivity of the test using correlation analysis?

21 Will tests designed to assess general fitness be equivalent?

22 When the same quality is measured with different tests, the tests used are ...

Designed to measure the same quality; Having a high correlation with each other; Having a low correlation with each other

BASICS OF THE THEORY OF EVALUATION

Special score tables are often used to evaluate sports results. Their purpose is to convert a sports result (expressed in objective measures) into conventional points. The rule for converting sports results into points is called a rating scale. The scale can be specified as a mathematical expression, a table or a graph. Four main types of scales are used in sports and physical education:

Proportional scales

Regressive scales

Progressive scales

Sigmoid scales

Proportional scales award the same number of points for an equal improvement in result (for example, 20 points for every 0.1 s of improvement in the 100 m run). Such scales are used in modern pentathlon, speed skating, cross-country skiing, Nordic combined, biathlon and other sports.

Regressive scales award ever fewer points for the same improvement in result as the level of achievement rises (for example, improving the 100 m time from 15.0 to 14.9 s earns 20 points, while a 0.1 s improvement in the 10.0-9.9 s range earns only 15 points).

Progressive scales. Here, the higher the sports result, the more points its improvement earns (for example, improving the running time from 15.0 to 14.9 s earns 10 points, and from 10.0 to 9.9 s earns 100 points). Progressive scales are used in swimming, certain athletics events, and weightlifting.

Sigmoid scales are rarely used in sports but are widely used in assessing physical fitness (for example, the scale of physical fitness standards for the US population has this shape). In these scales, improvements in the very low and very high performance zones are rewarded sparingly; improvements in the middle zone of achievement earn the most points.
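To make the four definitions concrete, here is a minimal Python sketch that names the scale type from a list of points awarded for successive equal improvements in result, ordered from weak results to strong ones. The function name `classify_scale` and the "mixed" fallback label are hypothetical. The usage lines reuse the 100 m examples from the text: the regressive scale gives [20, 15] (15.0→14.9 s, then 10.0→9.9 s) and the progressive scale gives [10, 100].

```python
def classify_scale(increments, tol=1e-9):
    """Name the rating-scale type from the points awarded for successive,
    equal improvements in result, ordered from weak results to strong ones."""
    if len(increments) < 2:
        raise ValueError("need at least two increments")
    diffs = [b - a for a, b in zip(increments, increments[1:])]
    rising = all(d >= -tol for d in diffs)
    falling = all(d <= tol for d in diffs)
    if rising and falling:
        return "proportional"  # equal points for every equal improvement
    if rising:
        return "progressive"   # stronger results earn more points per step
    if falling:
        return "regressive"    # stronger results earn fewer points per step
    # Sigmoid: increments grow up to a peak in the middle zone, then shrink.
    peak = increments.index(max(increments))
    if all(d >= -tol for d in diffs[:peak]) and all(d <= tol for d in diffs[peak:]):
        return "sigmoid"
    return "mixed"


# Points per 0.1 s improvement in the 100 m run, from the examples in the text.
print(classify_scale([20, 20, 20]))  # proportional
print(classify_scale([20, 15]))      # regressive
print(classify_scale([10, 100]))     # progressive
```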

The main tasks of assessment are:

    compare different achievements in the same task;

    compare achievements in different tasks;

    define standards.

In sports metrology, a norm is the boundary value of a result that serves as the basis for assigning an athlete to one of the classification groups. There are three types of norms: comparative, individual, and due (proper) norms.

Comparative norms are based on a comparison of people belonging to the same population. For example, dividing people into subgroups according to the degree of resistance (high, medium, low) or reactivity (hyper-reactive, norm-reactive, hypo-reactive) to hypoxia.

Different gradations of assessments and norms

Verbal grade | Norm (M = mean, σ = standard deviation) | Percentage of subjects (normal distribution)
Very low | below M - 2σ | 2.3
Low | from M - 2σ to M - 1σ | 13.6
Below average | from M - 1σ to M - 0.5σ | 15.0
Average | from M - 0.5σ to M + 0.5σ | 38.3
Above average | from M + 0.5σ to M + 1σ | 15.0
High | from M + 1σ to M + 2σ | 13.6
Very high | above M + 2σ | 2.3
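The seven gradations above are bands whose widths are set in standard deviations (σ) around the mean M. The minimal Python sketch below converts a raw result into a verbal grade and computes the share of subjects expected in each band. It assumes results are roughly normally distributed and that a larger result is better (for times, flip the sign). The function names and band labels in the code are illustrative, not taken from a standard.

```python
from bisect import bisect_right
from statistics import NormalDist

# Band boundaries in standard-deviation units (z-scores) and their labels.
CUTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
LABELS = ["Very low", "Low", "Below average", "Average",
          "Above average", "High", "Very high"]


def grade(result, m, sd):
    """Verbal grade for a result, given the population mean m and SD sd.
    Assumes a larger result is better; a band includes its lower bound."""
    z = (result - m) / sd
    return LABELS[bisect_right(CUTS, z)]


def expected_shares():
    """Percentage of subjects expected in each band if results are normal."""
    nd = NormalDist()  # standard normal: mean 0, SD 1
    edges = [float("-inf")] + CUTS + [float("inf")]
    return [100 * (nd.cdf(hi) - nd.cdf(lo)) for lo, hi in zip(edges, edges[1:])]


for label, pct in zip(LABELS, expected_shares()):
    print(f"{label:14s} {pct:5.1f} %")
```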

These norms characterize only the comparative success of the subjects in a given population, but do not say anything about the population as a whole (or on average). Therefore, comparative norms should be compared with data obtained from other populations and used in conjunction with individual and due norms.

Individual norms are based on comparing the performance of the same athlete in different states. For example, in many sports there is no direct relationship between body weight and athletic performance; instead, each athlete has an individually optimal weight that corresponds to the state of peak sports form. This norm can be monitored at different stages of sports training.

Due norms are based on an analysis of what a person must be able to do to cope successfully with the tasks that life sets. Examples are the standards of physical training programs and the due (proper) values of vital capacity (VC), basal metabolism, body weight and height, etc.

1 Is it possible to directly measure the quality of endurance?

2 Is it possible to directly measure the quality of speed?

3 Is it possible to directly measure the quality of dexterity?

4 Can the quality of flexibility be measured by a direct method?

5 Is it possible to directly measure the strength of individual muscles?

6 Can an assessment be expressed in a qualitative characteristic (good, satisfactory, bad, pass, etc.)?

7 Is there a difference between a measurement scale and a rating scale?

8 What is a rating scale?

The system for measuring sports results; The law of converting sports results into points; The system for evaluating norms

9 A scale awards the same number of points for an equal improvement in results. It is …

10 For the same improvement in results, ever fewer points are awarded as sporting achievements rise. It is …

Progressive scale; Regressive scale; Proportional scale; Sigmoid scale

11 The higher the sports result, the more points its improvement earns. It is …

Progressive scale; Regressive scale; Proportional scale; Sigmoid scale

12 Improvement in the very low and very high performance zones is rewarded sparingly; the most points are earned by improvements in the middle zone of achievement. It is …

Progressive scale; Regressive scale; Proportional scale; Sigmoid scale

13 Norms based on a comparison of people belonging to the same population are called ...

14 Norms based on comparison of performance of the same athlete in different states are called...

Individual norms; Due norms; Comparative norms

15 Norms based on an analysis of what a person should be able to do in order to cope with the tasks assigned to him are called ...

Individual norms; Due norms; Comparative norms

BASIC CONCEPTS OF QUALIMETRY

Qualimetry (Lat. qualitas, quality; Gr. metron, measure) studies and develops quantitative methods for evaluating qualitative features.

Qualimetry is based on several starting points:

Any quality can be measured;

The quality depends on a number of properties that form a “quality tree” (for example, the quality tree of exercises in figure skating consists of three levels - high, medium, low);

Each property is defined by two numbers, a relative index and a weight; the sum of the weights of the properties at each level equals one (or 100%).
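The starting points above can be illustrated with a small recursive calculation. This is a minimal sketch, assuming that the overall quality is the weighted sum of relative indices down the tree, with the weights at each level summing to one. The tree layout, the numbers and the function name are hypothetical.

```python
def quality(node):
    """Aggregate quality of a qualimetric "quality tree".

    A leaf is a relative index in [0, 1]; an inner node is a list of
    (weight, child) pairs whose weights sum to 1 at that level.
    """
    if isinstance(node, (int, float)):
        return float(node)
    total_weight = sum(w for w, _ in node)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights at a level must sum to 1, got {total_weight}")
    return sum(w * quality(child) for w, child in node)


# Hypothetical two-level tree for one exercise: two property groups,
# each split into two properties with relative indices in [0, 1].
routine = [
    (0.6, [(0.5, 0.9), (0.5, 0.7)]),  # e.g. technical properties
    (0.4, [(0.7, 0.8), (0.3, 0.6)]),  # e.g. artistic properties
]
print(round(quality(routine), 3))  # 0.776
```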

The methods of qualimetry fall into two groups:

Heuristic (intuitive), based on expert assessments and questionnaires;

Instrumental.

An expert assessment is an assessment obtained by polling the opinions of specialists. Typical examples are judging in gymnastics and figure skating, a competition for the best scientific work, etc.

An expert assessment includes the following main stages: defining its goal, selecting the experts, choosing the methodology, conducting the survey, and processing the information received, including assessing the consistency of the individual expert assessments. During the assessment, the degree of agreement among the experts is estimated with a rank correlation coefficient (in the case of several experts). Rank correlation underlies the solution of many qualimetry problems because it allows mathematical operations on qualitative features.
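The text does not say which rank correlation coefficient is meant. The sketch below uses Spearman's coefficient, one common choice, to measure the agreement between two experts who score the same athletes. With more than two experts one would compare every pair or use a concordance coefficient such as Kendall's W, which is not shown here. The judges' scores are invented, and the function names are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values (a tie group).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(a, b):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("need two equally long lists of length >= 3")
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        raise ValueError("an expert gave every athlete the same score")
    return cov / (va * vb) ** 0.5


# Hypothetical scores given by two judges to the same five skaters.
judge_1 = [5.8, 5.6, 5.9, 5.4, 5.5]
judge_2 = [5.7, 5.8, 5.9, 5.3, 5.5]
print(round(spearman_rho(judge_1, judge_2), 3))  # 0.9
```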

In practice, an indicator of an expert's qualification is often the deviation of his estimates from the average estimates of a group of experts.
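The deviation measure is not specified in the text. The sketch below assumes the mean absolute deviation of each expert's scores from the panel average over the same performances. The expert names, scores and function name are hypothetical.

```python
def expert_deviations(scores):
    """Mean absolute deviation of each expert from the panel average.

    scores: dict mapping expert name -> list of scores for the same
    performances (same order for every expert).
    Smaller values mean the expert agrees more closely with the group.
    """
    experts = list(scores)
    n_items = len(scores[experts[0]])
    if any(len(scores[e]) != n_items for e in experts):
        raise ValueError("every expert must score the same performances")
    panel_mean = [
        sum(scores[e][i] for e in experts) / len(experts) for i in range(n_items)
    ]
    return {
        e: sum(abs(s - m) for s, m in zip(scores[e], panel_mean)) / n_items
        for e in experts
    }


# Hypothetical 10-point scores of three experts for two performances.
panel = {"A": [8, 6], "B": [9, 7], "C": [10, 5]}
print(expert_deviations(panel))  # {'A': 0.5, 'B': 0.5, 'C': 1.0}
```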

Questioning is the method of collecting opinions by means of questionnaires. Along with interviews and conversations, it belongs to the survey methods. Unlike interviews and conversations, questioning involves written answers by the person filling out the questionnaire (the respondent) to a system of standardized questions. It makes it possible to study motives of behavior, intentions, opinions, etc.

Questionnaires can be used to solve many practical problems in sports: assessing the psychological status of an athlete; his attitude to the nature and direction of training sessions; interpersonal relationships in the team; own assessment of technical and tactical readiness; nutrition assessment and many others.

1 What does qualimetry study?

Studies the quality of tests; Studies the qualitative properties of a feature; Studies and develops quantitative methods for assessing quality

2 What mathematical methods are used in qualimetry?

Pair correlation; Rank correlation; Analysis of variance

3 What methods are used to assess the level of performance?

4 What methods are used to evaluate the diversity of technical elements?

Questionnaire method; Method of expert assessments; Method not specified

5 What methods are used to evaluate the complexity of technical elements?

Questionnaire method; Method of expert assessments; Method not specified

6 What methods are used to evaluate an athlete's psychological condition?

Questionnaire method; Method of expert assessments; Method not specified

The problem of testing human physical fitness has been developed in the theory and methodology of physical education, sports metrology, anthropomotorics, biomechanics, sports medicine and other sciences. Over the roughly 130-140 years of this problem's history, a huge and diverse body of material has accumulated. It has always attracted, and continues to attract, great interest not only among scientists but also among physical education teachers, coaches, students, and their parents.

The first article on this problem is introductory. It presents the basics of the theory of tests and testing; without them, it is difficult for a teacher to apply tests in practice. Here are some of the questions that arise. What is a "test"? How are tests classified? Is it necessary to test students' physical fitness, and why? How can the level (high, medium, low) of development of physical qualities and fitness be determined? What is considered the norm in testing, and how is it set? If a teacher devises a new motor test or test battery to determine children's physical fitness, what should he pay attention to, and which necessary conditions (requirements, criteria) must be met? Testing the physical condition of students requires the teacher to be familiar with elementary methods of mathematical statistics. Which ones?

In our articles, we will also present historical information about the emergence of tests and of the theory of testing human physical fitness: when and where the first tests appeared, including test batteries for assessing physical fitness. What are the most common tests for determining the conditioning abilities (strength, speed, endurance, flexibility) and coordination abilities of school-age children? Which test batteries (programs) for assessing the physical fitness of children and adolescents are the most popular in different countries? We will also discuss an important practical problem: the relationship between test results and grades (marks) in the subject "Physical Culture". More specifically, if a student consistently scores high on tests, does that automatically mean an excellent grade in our subject? And so on.

In this article we will discuss: 1) the tasks of testing; 2) the concept of "test" and the classification of motor tests; 3) quality criteria for motor tests; 4) the organization of physical fitness testing of schoolchildren.

1. Tasks of testing. Testing human motor abilities is one of the most important areas of activity for scientists and teachers in the field of physical culture and sports. It helps solve a number of complex pedagogical problems: identifying the levels of development of conditioning and coordination abilities and evaluating the quality of technical and tactical readiness. Based on test results, it is possible to compare the readiness of individual students and of entire groups of students living in different regions and countries; to select students for a particular sport or for participation in competitions; to exercise fairly objective control over the education (training) of schoolchildren and young athletes; to identify the advantages and disadvantages of the means, teaching methods and forms of organizing classes; and, finally, to substantiate the norms (age, individual) of the physical fitness of children and adolescents.



a) to teach the schoolchildren themselves to determine the level of their physical fitness and plan the complexes of physical exercises necessary for themselves;

b) to encourage students to further improve their physical condition (shape);

c) to know not so much the initial level of motor ability development as its change over a certain time;

d) to encourage students who have achieved high results not so much for the high level of physical fitness they have reached as for achieving the planned increase in their personal results.



Experts emphasize that the traditional approach to testing, in which the result shown is compared with standardized tests and norms, leads many students, especially those with low and medium levels of physical fitness, to develop a negative attitude toward it. Testing, however, should increase schoolchildren's interest, bring them joy, and not lead to an inferiority complex. In this regard, we propose the following approaches:

1) the results of the student's tests are determined not on the basis of comparison with the standards, but on the basis of changes that have occurred over a certain period of time;

2) all components of the test are modified, lighter versions of the exercises are used (the tasks that make up the content of the test must be easy enough so that the probability of their successful completion is high);

3) zero and negative scores are excluded; only positive results count.

So, when testing, it is important to bring together scientific (theoretical) tasks and personally significant, positive motives for the student to participate in this procedure.

2. The concept of "test" and the classification of motor tests. The English word "test" means a trial or examination. Tests are used to solve many scientific and practical problems. Among the methods of assessing a person's physical condition (observation, expert opinion), the test method (in our case, the motor test method) is the main one used in sports metrology and related disciplines such as the science of movement and the theory and methodology of physical education.

A test is a measurement or trial carried out to determine a person's ability or condition. There can be many such measurements, including those based on a wide variety of physical exercises. However, not every physical exercise or trial can be considered a test. Only trials that meet special requirements can serve as tests, namely those for which:

a) the purpose of the test (or tests) has been defined;

b) a standardized method of measuring the results and a testing procedure have been developed;

c) the reliability and informativeness of the tests have been determined;

d) it has been made possible to present the test results in an appropriate grading system.

The system of using tests, including the task, the organization of conditions, the performance of the tests by the subjects, and the evaluation and analysis of the results, is called testing. The numerical value obtained from the measurements is the test result.

For example, the standing long jump is a test; the procedure for conducting jumps and measuring results - testing; jump length - test result.

The tests used in physical education are based on motor actions (physical exercises, motor tasks). Such tests are called motor tests.

Currently, there is no single classification of motor tests. One known classification groups tests by their structure and by what they predominantly measure (see Table 1).

A distinction is made between unit and complex tests. A unit test serves to measure and evaluate one attribute (a coordination or conditioning ability). Since the structure of each coordination or conditioning ability is complex, such a test usually evaluates only one component of that ability (for example, the ability to balance, the speed of a simple reaction, or the strength of the hand muscles).

An educational (practice) test assesses the capacity for motor learning, by the difference between the final and initial scores over a certain period of training in movement technique.

A test series makes it possible to use the same test for a long time, even when the measured ability improves significantly, because the test tasks become progressively more difficult. Unfortunately, this type of unit test is not yet widely used in either science or practice.

A complex test evaluates several attributes or components of different abilities or of the same ability (for example, a vertical jump from a standing position: with arm swing, without arm swing, and to a given height). Such a test yields information on the level of speed-strength abilities (from the jump height) and of coordination abilities (from the accuracy of differentiating force efforts and from the difference in jump height with and without arm swing).

A test profile consists of several separate tests, on the basis of which either several different physical abilities (a heterogeneous test profile) or several manifestations of the same physical ability (a homogeneous test profile) are evaluated. Test results can be presented in the form of a profile, which makes it possible to quickly compare individual and group results.

Table 1. Forms of tests and their possible applications (according to D.-D. Blume, 1987)

Type of test | Measured ability | Structure | Example
Unit tests
Elementary test containing one motor task | (not given) | One test task, one final test score | Balance test, tremometry, coupling test, rhythm test, landing-accuracy jump
Practice test | One ability or aspect (component) of an ability | One or more test tasks, one final test score (over a teaching period) | General practice test
Test series | One ability or aspect (component) of an ability | One test task with variants, or several tasks of increasing difficulty | Coupling test
Complex tests
Complex test containing one task | Several abilities or aspects (components) of one ability | One test task, several final scores | Jump test
Multiple-task test | (not given) | Several test tasks performed in sequence, several final scores | Multiple-task reaction test
Test profile | Several abilities or aspects of the same ability | Several tests, several final scores | "Coordination star"
Test battery | Several abilities or aspects of the same ability | Several tests, one final score | Test battery for assessing the ability to learn movements

A test battery also consists of several separate tests, but their results are combined into one final score expressed on one of the rating scales (more on this in the second article). As with the test profile, homogeneous and heterogeneous batteries are distinguished.

A homogeneous battery or homogeneous profile is used to assess all the components of a complex ability (e.g., reaction ability). In this case, the results of the individual tests should be closely interrelated (correlated).
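The requirement that the tests of a homogeneous battery be closely correlated can be checked by computing every pairwise correlation between them. The Python sketch below is only an illustration under stated assumptions: the test names, the sample data and the cut-off `min_r=0.7` are hypothetical (the text gives no threshold), and all tests are assumed to be scored in the same direction (larger number = better result).

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def intercorrelations(results):
    """Map every pair of test names to the correlation of their results."""
    return {
        (a, b): pearson_r(results[a], results[b])
        for a, b in combinations(results, 2)
    }


def looks_homogeneous(results, min_r=0.7):
    """True if every pair of tests correlates at least min_r (hypothetical cut-off)."""
    return all(r >= min_r for r in intercorrelations(results).values())


# Hypothetical scores of five pupils in three tests (same scoring direction).
battery = {
    "test_a": [12, 15, 11, 18, 14],
    "test_b": [22, 27, 21, 33, 25],
    "test_c": [30, 36, 29, 41, 35],
}
print(looks_homogeneous(battery))  # True
```

A pair with a weak or negative correlation would suggest that the battery mixes different abilities, i.e. that it is heterogeneous.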

A heterogeneous test profile or heterogeneous battery serves to evaluate a set of different motor abilities. For example, the batteries used to assess strength, speed and endurance abilities are batteries of physical fitness tests.

In multiple-task tests, the subjects perform motor tasks in sequence and receive a separate score for each task. These scores may be closely related to one another, and appropriate statistical calculations can yield additional information about the abilities being assessed. An example is the sequence of jump test tasks in Table 2.

The definition of motor tests indicates that they serve to assess motor abilities and, in part, motor skills. Therefore, in the most general terms, conditioning tests, coordination tests and tests of motor skills (movement technique) are distinguished. This systematization, however, is still too general.

The classification of motor tests by their predominant indications follows from the systematization of physical (motor) abilities. Accordingly, a distinction is made between conditioning tests (for assessing strength: maximal strength, speed strength, strength endurance; for assessing endurance; for assessing speed abilities; for assessing flexibility: active and passive) and coordination tests (for assessing coordination abilities related to individual, relatively independent groups of motor actions, which measure special coordination abilities; and for assessing specific coordination abilities: balance, spatial orientation, reaction, differentiation of movement parameters, rhythm, restructuring of motor actions, coordination (coupling), vestibular stability and voluntary muscle relaxation).

A large number of tests have been developed to assess motor skills in various sports. They are given in the relevant textbooks and manuals and are not considered in this article.

Thus, each classification serves as a guideline for choosing (or creating) the type of test that best suits the testing tasks.

3. Quality criteria for motor tests. As noted above, a motor test serves its purpose only if it satisfies the relevant basic criteria (reliability, stability, equivalence, objectivity and informativeness) and the additional criteria (normalization, comparability and economy).

Tests that meet the requirements of reliability and informativeness are called good or authentic (reliable).

Test reliability is understood as the degree of accuracy with which a test evaluates a certain motor ability, regardless of who carries out the evaluation. Reliability shows up as the degree of agreement between results when the same people are retested under the same conditions; it is the stability of an individual's test result over repeated performance of a control exercise. In other words, in repeated testing (for example, of jump results, running time or throwing distance), a schoolchild consistently keeps his rank within the group tested.

Test reliability is determined by correlation-statistical analysis, that is, by calculating a reliability coefficient. Various methods are used for this, and the reliability of the test is judged on their basis.

Test stability is based on the relationship between the results of the first and second attempts, repeated after a certain time under the same conditions by the same experimenter. This method of determining reliability by repeated testing is called the retest. Test stability depends on the type of test, the age and sex of the subjects, and the time interval between the test and the retest. For example, over short intervals the results of conditioning tests and morphological characteristics are more stable than those of coordination tests, and the results of older students are more stable than those of younger ones. The retest is usually carried out no later than one week after the test. At longer intervals (for example, after a month), even tests such as the 1000 m run or the standing long jump become noticeably less stable.
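As an illustration of the retest method, a reliability coefficient can be computed as the Pearson correlation between the first and second testing, and the "keeps his rank in the group" idea can be checked with Spearman's rank correlation. This is a minimal Python sketch under assumptions: the text does not name a specific coefficient, so the choice of Pearson's r and Spearman's rho is ours, and the jump results are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n (1 = smallest value); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1  # average 1-based position of the tie
        i = j + 1
    return result


def spearman_rho(x, y):
    """Rank correlation: do pupils keep their places in the group on retest?"""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical standing long jump results (cm) of six pupils, test and retest.
first = [180, 165, 190, 172, 158, 201]
second = [183, 160, 192, 175, 162, 198]
print(round(pearson_r(first, second), 3), round(spearman_rho(first, second), 3))
```

Here two pupils swap neighbouring places on the retest, so the rank correlation stays high but below 1.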

Test equivalence is the correlation of the test result with the results of other tests of the same type. For example, the equivalence criterion is used when it is necessary to choose which test more adequately reflects speed abilities: running 30, 50, 60 or 100 meters.

How equivalent (homogeneous) tests are treated depends on the purpose. If the reliability of the estimates or conclusions of a study needs to be increased, it is advisable to use two or more equivalent tests. If the task is to create a battery with a minimum number of tests, only one of the equivalent tests should be used.


Table 2. Sequentially performed jump test tasks (according to D.-D. Blume, 1987)

No. | Test task | Result evaluation | Ability
1 | Jump to maximum height without arm swing | Height, cm | Jumping power
2 | Jump to maximum height with arm swing | Height, cm | Jumping power and ability to connect (couple) movements
3 | Jump to maximum height with arm swing and a preliminary jump | Height, cm | Ability to connect (couple) movements and jumping power
4 | 10 jumps with arm swing to a height equal to 2/3 of the maximum jump height, as in task 2 | Sum of deviations from the given mark | Ability to differentiate the force parameters of movements
5 | Difference between the results of tasks 1 and 2 | ... cm | Ability to connect (couple) movements
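The derived scores in Table 2 (tasks 4 and 5) are simple arithmetic on the measured heights. A minimal Python sketch, assuming that the "sum of deviations" in task 4 means the sum of absolute deviations and that the reference maximum height is the task 2 result; the function names and the sample heights are hypothetical.

```python
def arm_swing_gain(no_swing_cm, with_swing_cm):
    """Task 5: gain in jump height from the arm swing (task 2 minus task 1), cm."""
    return with_swing_cm - no_swing_cm


def differentiation_error(max_height_cm, attempts_cm):
    """Task 4: sum of absolute deviations (cm) of the attempts from 2/3 of max height."""
    target = max_height_cm * 2 / 3
    return sum(abs(h - target) for h in attempts_cm)


# Hypothetical results of one pupil.
task1, task2 = 38, 45                              # heights, cm
task4 = [30, 31, 29, 30, 32, 28, 30, 30, 31, 29]   # ten attempts, cm
print(arm_swing_gain(task1, task2))                # 7
print(differentiation_error(task2, task4))         # 8.0
```

A smaller task 4 score means finer differentiation of force efforts; a larger task 5 gain suggests better coupling of the arm swing with the take-off.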

A battery of this kind, as noted, is heterogeneous, since its tests measure different motor abilities. An example of a heterogeneous test battery is the 30 m run, pull-ups on the bar, a forward bend and the 1000 m run. Other examples of such batteries will be presented in a separate publication.

Test reliability is also determined by comparing the average results of the odd and even attempts that make up the test. For example, the average accuracy of ball shots in attempts 1, 3, 5, 7 and 9 is compared with the average accuracy in attempts 2, 4, 6, 8 and 10. This way of assessing reliability is called the split-half (doubling) method; it is used mainly for assessing coordination abilities, and only when the test result is formed from at least six attempts.
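The odd/even comparison can be sketched in Python by correlating, across subjects, each subject's mean of odd attempts with the mean of even attempts. The Spearman-Brown step-up to full test length (2r / (1 + r)) is a standard psychometric correction that the text does not mention; it is included here as an assumption, as are the hypothetical accuracy data.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(attempts_by_subject):
    """Correlate, across subjects, the mean of odd attempts with the mean of even ones."""
    if any(len(a) < 6 for a in attempts_by_subject):
        raise ValueError("at least six attempts per subject are recommended")
    odd = [mean(a[0::2]) for a in attempts_by_subject]   # attempts 1, 3, 5, ...
    even = [mean(a[1::2]) for a in attempts_by_subject]  # attempts 2, 4, 6, ...
    return pearson_r(odd, even)


def spearman_brown_full(r_half):
    """Step a half-test correlation up to full test length (assumed correction)."""
    return 2 * r_half / (1 + r_half)


# Hypothetical accuracy points (0-5) of four pupils over ten shots.
shots = [
    [3, 2, 4, 3, 3, 4, 2, 3, 4, 3],
    [1, 2, 1, 1, 2, 1, 2, 1, 1, 2],
    [5, 4, 4, 5, 5, 4, 5, 5, 4, 4],
    [2, 3, 3, 2, 2, 3, 3, 2, 3, 2],
]
r_half = split_half_reliability(shots)
print(round(r_half, 3), round(spearman_brown_full(r_half), 3))
```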

The objectivity (consistency) of a test is the degree of agreement between the results obtained on the same subjects by different experimenters (teachers, judges, experts). To ensure it, the following testing conditions must be standardized:

a) time of testing, place, weather;

b) uniform equipment and apparatus;

c) psychophysiological factors (volume and intensity of load, motivation);

d) presentation of information (exact verbal statement of the test task, explanation and demonstration).

Compliance with these conditions ensures the objectivity of the test. One also speaks of interpretive objectivity, which concerns the degree to which the interpretation of test results does not depend on the experimenter.

In general, experts note that test reliability can be improved in several ways: stricter standardization of testing (see above), more attempts, better motivation of the subjects, more evaluators (judges, experts) and greater agreement among them, and more equivalent tests.
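The effect of adding attempts or equivalent tests is often estimated with the Spearman-Brown prophecy formula, a standard psychometric formula that the text itself does not name: lengthening a test k times with equivalent parts changes its reliability r to r_k = k*r / (1 + (k - 1)*r). A minimal Python sketch, assuming the added attempts or tests really are equivalent; the numbers are illustrative.

```python
def spearman_brown(r, k):
    """Predicted reliability after lengthening the test k times with equivalent parts."""
    return k * r / (1 + (k - 1) * r)


def lengthening_factor(r, r_target):
    """How many times the test must be lengthened to reach r_target."""
    return r_target * (1 - r) / (r * (1 - r_target))


# Illustrative: a test whose current reliability is r = 0.75.
print(round(spearman_brown(0.75, 2), 3))         # doubling the attempts: 0.857
print(round(lengthening_factor(0.75, 0.90), 2))  # 3.0 times as many attempts
```

The formula shows diminishing returns: each further doubling of the attempts raises reliability by less, which is one reason the number of attempts is balanced against economy.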

There are no fixed standard values of test reliability coefficients. In most cases, the following guideline is used: 0.95-0.99, excellent reliability; 0.90-0.94, good; 0.80-0.89, acceptable; 0.70-0.79, poor; 0.60-0.69, doubtful for individual assessments (the test is suitable only for characterizing a group of subjects).

The informativeness of a test is the degree of accuracy with which it measures the motor ability or skill being assessed. In foreign and domestic literature, the term "validity" (from English) is used instead of "informativeness". In essence, when dealing with informativeness, the researcher answers two questions: what exactly does this test (test battery) measure, and with what accuracy?
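The reliability guideline given above can be encoded as a small lookup. A minimal Python sketch; two details are assumptions, not part of the guideline: values falling between the listed bands (e.g. 0.945) are assigned to the lower band, and coefficients below 0.60 get a separate label because the guideline does not cover them.

```python
def reliability_label(r):
    """Verbal rating of a reliability coefficient following the guideline in the text."""
    if r >= 0.95:
        return "excellent"
    if r >= 0.90:
        return "good"
    if r >= 0.80:
        return "acceptable"
    if r >= 0.70:
        return "poor"
    if r >= 0.60:
        return "doubtful: suitable only for characterizing a group"
    return "below the range covered by the guideline"


for r in (0.97, 0.91, 0.84, 0.73, 0.65):
    print(r, reliability_label(r))
```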

Logical (content), empirical (based on experimental data) and predictive validity are distinguished. More detailed information on this topic can be found in textbooks that have become classics for students of physical education universities (Sports Metrology / Ed. by V.M. Zatsiorsky. M.: FiS, 1982. P. 73-80; Godik M.A. Sports Metrology. M.: FiS, 1988), as well as in a number of modern manuals.

Important additional test criteria, as noted, are normalization, comparability and economy.

The essence of normalization is that norms can be created from the test results; such norms are of particular importance for practice (this will be discussed in a separate article).

The comparability of a test lies in the possibility of comparing the results obtained on one test or on several forms of parallel (homogeneous) tests. In practice, using comparable motor tests reduces the likelihood that regular use of the same test will end up assessing not so much the level of the ability as the skill of performing that particular test. At the same time, comparable test results increase the reliability of the conclusions.

The essence of economy as a quality criterion is that the test does not require much time, large material costs or the participation of many assistants. For example, a teacher with two assistants can administer in one lesson the battery of six physical fitness tests recommended in the "Comprehensive program of physical education for students in grades I-XI" (M.: Prosveshchenie, 2005-2006), examining 25-30 children.

Organization of physical fitness testing of schoolchildren. The second important problem of motor ability testing (recall that the first, the selection of informative tests, was considered earlier) is the organization of their application.

The physical education teacher must decide when testing is best organized, how to carry it out in class and how often to test.

The testing dates are set in accordance with the school curriculum, which requires students' physical fitness to be tested twice a year. It is advisable to conduct the first testing in the second or third week of September (once the educational process has settled into its normal routine) and the second two weeks before the end of the school year (later than that, organizational difficulties may arise because of upcoming exams and vacations).

Knowing the annual changes in the development of schoolchildren's motor abilities allows the teacher to adjust the physical education process for the next school year. However, the teacher can and should test more often, to exercise so-called operational control. It is advisable, for example, to determine how the level of speed, strength abilities and endurance changes under the influence of athletics lessons during the first quarter. Likewise, to identify changes in the development of coordination abilities, the teacher can apply coordination tests at the beginning and at the end of mastering a section of the school curriculum, for example, sports games.

It should be borne in mind that, given the variety of pedagogical tasks, there is no unified testing methodology and no single set of rules for conducting tests and evaluating their results. This requires experimenters (teachers) to act independently in resolving the theoretical, methodological and organizational issues of testing.

Testing in class must be linked to the content of the lesson. In other words, the test (or tests) applied, provided it meets the relevant requirements for a research method, should be organically included in the planned physical exercises. If, for example, the level of development of schoolchildren's speed abilities or endurance is to be determined, the necessary tests should be planned in the part of the lesson that deals with developing the corresponding physical abilities.

The frequency of testing is largely determined by the rate of development of specific physical abilities and by the age-sex and individual features of their development.

For example, a significant increase in speed, endurance or strength requires several months of regular training, whereas a significant increase in flexibility or in individual coordination abilities requires only 4-12 training sessions. When a physical quality is developed from a low starting level, improvement can be achieved in a shorter time; once the quality has reached a high level in a schoolchild, further improvement takes longer. The teacher should therefore study in more depth how various motor abilities develop and improve in children of different ages and sexes.

When assessing students' general physical fitness, as noted, a wide variety of test batteries can be used; the choice depends on the specific testing tasks and the availability of the necessary conditions. However, since test results can be evaluated only by comparison, it is advisable to choose tests that are widely represented in the theory and practice of children's physical education, for example, those recommended in the "Comprehensive program of physical education for students in grades I-XI of a general education school" (M.: Prosveshchenie, 2004-2006).

To compare the general level of physical fitness of a student or a group of students on a set of tests, the test results are converted into points (this will be discussed in more detail in the next article). The change in the sum of points on repeated testing makes it possible to judge the progress of both an individual child and a group of children.
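Converting results into points and comparing the sums can be sketched as follows. This is only an illustration: the linear scales, their end points, the test names and the results are all hypothetical and are not the scoring tables of the program cited above. Each scale awards 0 points at its `worst` value and `max_points` at its `best` value, so it also works for lower-is-better tests such as running time.

```python
from dataclasses import dataclass


@dataclass
class LinearScale:
    """Hypothetical scale: `worst` scores 0 points and `best` scores max_points."""
    worst: float
    best: float
    max_points: float = 10

    def points(self, result):
        # Works when best < worst (e.g. running time), since the ratio stays 0..1.
        share = (result - self.worst) / (self.best - self.worst)
        share = min(max(share, 0.0), 1.0)  # clip results outside the scale
        return round(share * self.max_points, 1)


def total_points(results, scales):
    """Sum of points over all tests in the battery."""
    return sum(scales[test].points(value) for test, value in results.items())


scales = {
    "run_30m_s": LinearScale(worst=7.0, best=5.0),
    "long_jump_cm": LinearScale(worst=120, best=200),
}
autumn = {"run_30m_s": 6.0, "long_jump_cm": 160}
spring = {"run_30m_s": 5.5, "long_jump_cm": 176}
print(total_points(autumn, scales), total_points(spring, scales))  # 10.0 14.5
```

The difference between the two sums (4.5 points here) is the kind of progress indicator the text describes; real point tables are usually non-linear and age- and sex-specific.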

Physical culture at school, 2007, No. 6


Introduction

Relevance. The problem of testing a person's physical fitness is one of the most thoroughly developed in the theory and methodology of physical education. Over recent decades, a huge and diverse body of material has been accumulated: the definition of testing tasks; the dependence of test results on various factors; the development of tests to assess individual conditioning and coordination abilities; and the test programs for assessing the physical fitness of children and adolescents aged 11 to 15 adopted in the Russian Federation, other CIS countries and many foreign countries.

Testing the motor qualities of schoolchildren is one of the most important and basic methods of pedagogical control.

It helps to solve a number of complex pedagogical problems: to identify the levels of development of conditioning and coordination abilities, to evaluate the quality of technical and tactical readiness. Based on the test results, you can:

compare the readiness of both individual students and entire groups living in different regions and countries;

conduct sports selection for practicing a particular sport, for participation in competitions;

exercise largely objective control over the education (training) of schoolchildren and young athletes;

identify the advantages and disadvantages of the means used, teaching methods and forms of organizing classes;

and, finally, substantiate age-related and individual norms of physical fitness for children and adolescents.

Besides these scientific tasks, testing in different countries serves the following practical purposes:

to teach schoolchildren to determine the level of their own physical fitness and to plan the exercise routines they need;

to encourage students to further improve their physical condition (form);

to assess not so much the initial level of motor ability development as its change over a certain period;

to reward students not so much for a high level of results as for the planned improvement in their personal results.

In this work, we rely on the tests recommended in the "Comprehensive program of physical education for students in grades 1-11 of a general education school" prepared by V.I. Lyakh and G.B. Meikson.

The purpose of the study: to substantiate the methodology for testing the physical qualities of primary school students.

Research hypothesis: testing is an accurate and informative method for determining the development of physical qualities.

Object of study: testing as a method of pedagogical control.

Subject of research: testing the physical qualities of students.


Chapter 1. CONCEPTS OF THE THEORY OF PHYSICAL FITNESS TESTS

1.1 Brief historical information about the theory of motor ability testing

People have long been interested in measuring human motor achievements. The first information about measuring the distance of long jumps dates back to 664 BC: at the 29th ancient Olympic Games in Olympia, Chionis of Sparta jumped 52 feet, approximately 16.66 m. Clearly, this must have been a multiple jump.

It is known that one of the founders of physical education, Guts-Muts (J. Ch. F. GutsMuths, 1759-1839), measured the motor achievements of his pupils and kept accurate records of their results; for improved achievements he awarded them "prizes" of oak wreaths (G. Sorm, 1977). In the 1830s, E. Eiselen, a colleague of the famous German teacher F. L. Jahn, compiled a table for assessing jumping achievements on the basis of the measurements he had carried out. It contains three gradations (Table 1).

Table 1. Results in jumps (in cm) for men (source: K. Mekota, P. Blahus, 1983) [columns include: elementary; over the vaulting buck]

Note that as early as the mid-19th century in Germany, it was recommended to take body parameters into account when determining the length or height of a jump.

Precise measurements of sports achievements, including records, have been made since the mid-19th century, and systematically since the first modern Olympic Games in 1896.

People have long tried to measure strength abilities as well. The first curious information on this dates back to 1741, when simple instruments were used to measure the strength of the strongman Thomas Topham, who lifted a weight of more than 830 kg (G. Sorm, 1977). Guts-Muts and Jahn already measured their pupils' strength with simple strength meters. But the first dynamometer, the progenitor of the modern instrument, was designed by Reiniger in France in 1807; F. Amoros used it in 1821 in the physical education of gymnasium students in Paris. In the 19th century, strength was also measured by pull-ups on the bar, by bending and extending the arms in support, and by lifting weights.

The forerunners of modern physical fitness test batteries are the sporting and gymnastic all-around events. The first of these is considered to be the ancient pentathlon, introduced at the 18th ancient Olympic Games in 708 BC. It consisted of discus throwing, javelin throwing, jumping, running and wrestling. The decathlon as we know it was first included in the competition program at the III Olympic Games (St. Louis, USA, 1904), and the modern pentathlon at the V Olympic Games (Stockholm, Sweden, 1912). The events in these competitions are heterogeneous: the athlete must demonstrate preparedness in different disciplines and therefore be physically versatile.

Probably with this idea in mind, sets of exercises that comprehensively determine a person's physical fitness were introduced at about the same time (the beginning of the 20th century) for children, youth and adults. Such complex tests first appeared in Sweden (1906), then in Germany (1913) and later in Austria and the USSR (Russia), with the Ready for Labor and Defense complex (1931).

The forerunners of modern motor tests appeared in the late 19th and early 20th centuries. In particular, D. A. Sargent introduced a "strength test" at Harvard University which, in addition to dynamometry and spirometry, included push-ups and raising and lowering the trunk. From 1890 this test was used at 15 US universities. The Frenchman G. Hebert created a test that was published in 1911. It included 12 motor tasks: running over different distances, standing and running jumps, throwing, repeatedly lifting a 40 kg weight, swimming and diving.

Let us briefly consider the sources that describe the research of physicians and psychologists. Until the late 19th century, medical research focused most often on changes in external morphological data and on identifying asymmetries. Anthropometry, used for this purpose, developed alongside dynamometry. For example, after extensive research the Belgian scientist A. Quetelet published a work in 1838 according to which the average back (spinal) strength of 25-year-old women and men is 53 and 82 kg, respectively. In 1884, the Italian A. Mosso investigated muscle endurance using an ergograph, which allowed him to observe the development of fatigue during repeated flexion of a finger.

Modern ergometry dates back to 1707, when a device was created that made it possible to count the pulse per minute. The prototype of today's ergometer was designed by G. A. Hirn in 1858. Cycle ergometers and treadmills were created later, in 1889-1913.

At the end of the 19th and the beginning of the 20th century, systematic research by psychologists began. Reaction time was studied, and tests were developed to assess movement coordination and rhythm. The concept of "reaction time" was introduced into science by the Austrian physiologist S. Exner in 1873. Students of W. Wundt, the founder of experimental psychology, carried out extensive measurements of simple and complex reaction times in the laboratory he established in Leipzig in 1879. The first tests of motor coordination included tapping and various kinds of aiming. One of the first attempts to study aiming was the test proposed by H. S. Frenkel in 1900: the subject had to hold the index finger in various holes, rings, etc. This is a prototype of modern tests of static and dynamic tremor.

In 1915, trying to assess musical talent, C. E. Seashore studied the sense of rhythm.

The theory of testing, however, originated at the turn of the 20th century. It was then that the foundations of mathematical statistics were laid, without which modern test theory is impossible. Undoubted credit here belongs to the geneticist and anthropologist F. Galton, the mathematicians K. Pearson and G. U. Yule, and the mathematician-psychologist C. Spearman. These scientists created a new branch of biology, biometrics, which is based on measurement and statistical methods such as correlation and regression. Factor analysis, a complex mathematical-statistical method created by Pearson (1901) and Spearman (1904), enabled the English scientist C. Burt to analyze, in 1925, the results of motor tests of pupils in London schools. As a result, such physical abilities as strength, speed, agility and endurance were identified, and a factor called "general physical fitness" also emerged. Somewhat later, one of the best-known works of the American scientist C. H. McCloy, "Measurement of General Motor Capacity" (1934), was published. By the early 1940s, scientists had concluded that human motor abilities have a complex structure. Through the use of various motor tests combined with mathematical models developed in parallel (univariate and multivariate analysis), the concept of five motor abilities became firmly established in testing theory: strength, speed, coordination of movements, endurance and flexibility.

In the former USSR, motor tests were used to develop the control standards of the Ready for Labor and Defense complex (1931). A well-known test of motor abilities (mainly movement coordination) for children and youth was proposed by N.I. Ozeretsky (1923). Works on measuring the motor abilities of children and youth appeared at about the same time in Germany, Poland, Czechoslovakia and other countries.

Significant progress in the theory of testing human physical fitness came in the late 1950s and the 1960s. The founder of this theory is probably the American C. H. McCloy, who in 1954, together with M. D. Young, published the monograph "Tests and Measurements in Health and Physical Education", on which many authors of similar works subsequently relied.

Of great theoretical importance was, and still is, the book "The Structure and Measurement of Physical Fitness" by the well-known American researcher E.A. Fleishman (1964). It not only addresses the theoretical and methodological issues of testing these abilities but also presents specific results, alternative approaches and studies of the reliability and informativeness (validity) of tests, as well as important factual material on the factor structure of motor tests of various motor abilities.

V.M. Zatsiorsky's books "Physical Qualities of an Athlete" (1966) and "Cybernetics, Mathematics, Sports" (1969) are of great importance for the theory of testing physical abilities.

Brief historical information on physical fitness testing in the former USSR can be found in the publications of E.Ya. Bondarevsky, V.V. Kudryavtsev, Yu.I. Sbruev, V.G. Panaev, B.G. Fadeev, P.A. Vinogradov and others.

Three stages of testing in the USSR (Russia) can be tentatively distinguished:

Stage 1 (1920-1940): mass surveys to study the main indicators of physical development and the level of motor fitness, on the basis of which the standards of the Ready for Labor and Defense complex emerged.

Stage 2 (1946-1960): study of motor fitness in relation to morphological and functional features, in order to lay the groundwork for a scientific and theoretical substantiation of their relationship.

Stage 3 (from 1961 to the present): integrated research into the physical condition of the population in relation to the climatic and geographical features of the country's regions.

Studies carried out during this period show that the indicators of physical development and motor fitness of people living in different regions of the country are determined by biological, climatic, geographical, socio-economic and other factors, both constant and variable. In 1981, following a unified comprehensive program consisting of four sections (physical fitness, physical development, functional state of the main body systems, sociological information), a comprehensive survey of the physical condition of the population of different ages and both sexes was carried out in different regions of the USSR.

Somewhat later, our specialists noted that the level of human physical development and fitness had been studied for more than 100 years. However, despite the relatively large number of works in this area, the data obtained cannot be analyzed in depth and comprehensively, since the studies were carried out on different contingents, in different seasons, using different methods, testing programs and methods of mathematical-statistical processing.

Therefore, the main emphasis was placed on developing a methodology, organizing a unified data collection system that takes metrological and methodological requirements into account, and creating a computer data bank.

In the mid-1980s, a mass all-Union survey of about 200,000 people aged 6 to 60 was carried out, which confirmed the conclusions of the previous study.

From the very beginning of the emergence of scientific approaches to testing human physical fitness, researchers have sought to answer two main questions:

what tests should be selected to assess the level of development of a specific motor (physical) ability and the level of physical fitness of children, adolescents and adults;

how many tests are needed to obtain minimal yet sufficient information about a person's physical condition?

No worldwide consensus on these questions has yet been reached. At the same time, the test programs (batteries) for assessing the physical fitness of children and adolescents aged 6 to 17 adopted in different countries are increasingly converging.

1.2 The concept of "test" and the classification of motor tests

The English word "test" means "trial, examination."

Tests are used to solve many scientific and practical problems. Among the ways of assessing a person's physical condition (observation, expert assessment), the test method (in our case, motor tests) is the main method used in sports metrology and other scientific disciplines ("the doctrine of movements", the theory and methodology of physical education).

A test is a measurement or trial carried out to determine a person's ability or condition. There can be many such measurements, including those based on a wide variety of physical exercises. However, not every physical exercise or trial can be considered a test. Only trials that meet special requirements can be used as tests:

the purpose of any test (or tests) should be defined;

a standardized test measurement methodology and test procedure should be developed;

it is necessary to determine the reliability and informativeness of tests;

test results can be presented in an appropriate scoring system.

The system of applying tests in accordance with the task, organizing the conditions, having the subjects perform the tests, and evaluating and analyzing the results is called testing, and the numerical value obtained during measurement is the test result. For example, the standing long jump is a test; the procedure for conducting the jumps and measuring the results is testing; the jump length is the test result.

The tests used in physical education are based on motor actions (physical exercises, motor tasks). Such tests are called motor tests.

Currently, there is no single classification of motor tests. Known classifications group tests by their structure and by their predominant purpose (Table 2).

As the table shows, there are single and complex tests. A single test serves to measure and evaluate one attribute (a coordination or conditioning ability). Since the structure of each coordination or conditioning ability is complex, such a test usually evaluates only one component of the ability (for example, balance, the speed of a simple reaction, or the strength of the hand muscles).

Table 2. Forms of tests and the possibilities of their application (according to D.D. Blume, 1987)

Form of test | Structural feature | Example

Single tests (measure one ability or one aspect/component of an ability):
Elementary test containing one motor task | One test task, one final score | Balance test, tremometry, coupling test, rhythm test
Training test | One or more test tasks, one final score | General training test
Test series | One test task with variants, or several tasks of increasing difficulty | Coupling test

Complex tests (measure several abilities or several aspects/components of one ability):
Complex test containing one task | One test task, several final scores | Jump test
Multiple-task test | Several test tasks performed in sequence, several final scores | Multiple-reaction test
Test profile | Several tests, several final scores | Coordination task
Test battery | Several tests, one final score | Test battery for assessing the ability to learn movements


A training test assesses the capacity for motor learning from the difference between the final and initial scores over a given period of training in movement technique.

A test series makes it possible to use the same test over a long period, even when the measured ability improves significantly, because the tasks of the test become progressively more difficult. Unfortunately, this type of test is not yet used sufficiently in either science or practice.

A complex test evaluates several attributes or components of different abilities or of the same ability, for example, a standing vertical jump (with arm swing, without arm swing, to a given height). This test provides information about the level of speed-strength abilities (from the jump height) and coordination abilities (from the accuracy of differentiating force efforts and from the difference in jump height with and without arm swing).

A test profile consists of individual tests that assess either several different physical abilities (heterogeneous test profile) or different manifestations of the same physical ability (homogeneous test profile). The results can be presented as a profile, which makes it possible to compare individual and group results.

A test battery also consists of several separate tests, but their results are combined into one final score expressed on one of the rating scales (see Chapter 2). As with test profiles, homogeneous and heterogeneous batteries are distinguished. A homogeneous battery (or homogeneous profile) is used to assess all components of a complex ability (e.g., reaction ability); in this case, the results of the individual tests should correlate closely.
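The rating scales of Chapter 2 are not reproduced here, so the sketch below assumes one common way to merge a battery into a single final score: convert each test to z-scores across the group and sum them. The helper `battery_scores`, the test names, and the numbers are hypothetical; tests where a smaller value is better (running times) have their z-scores sign-flipped so that a higher total always means better fitness.

```python
from statistics import fmean, pstdev

def battery_scores(results, lower_is_better=frozenset()):
    """One final score per subject: the sum of z-scores over all tests.

    results: dict mapping a test name to raw results, one per subject,
             with subjects in the same order for every test.
    lower_is_better: names of tests where a smaller value is better
             (e.g. running times); their z-scores are sign-flipped.
    """
    sizes = {len(v) for v in results.values()}
    if len(sizes) != 1:
        raise ValueError("every test needs one result per subject")
    totals = [0.0] * sizes.pop()
    for name, values in results.items():
        mean, sd = fmean(values), pstdev(values)
        if sd == 0:
            raise ValueError(f"{name}: results do not vary")
        sign = -1.0 if name in lower_is_better else 1.0
        for i, v in enumerate(values):
            totals[i] += sign * (v - mean) / sd
    return totals

# Hypothetical heterogeneous battery for four pupils (not real data)
results = {
    "run_30m_s": [4.8, 5.2, 5.5, 6.0],
    "standing_long_jump_cm": [210, 195, 180, 170],
    "pull_ups": [12, 9, 6, 3],
}
totals = battery_scores(results, lower_is_better={"run_30m_s"})
print([round(t, 2) for t in totals])
```

Because every test contributes z-scores with mean zero, the group totals always sum to zero; only the ranking and spacing of subjects carry information. For a homogeneous battery, one would also check that the individual tests correlate closely before summing them.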

In multiple-task tests, the subjects perform motor tasks in sequence and receive a separate score for each task. These scores may be closely related to each other, and appropriate statistical calculations can yield additional information about the abilities being assessed. An example is the sequence of jump test tasks in Table 3.

Table 3. Sequentially solved jump test tasks

Test task | Result evaluation | Ability
1. Maximum vertical jump without arm swing | Jump height | Jumping power
2. Maximum vertical jump with arm swing | Jump height | Jumping power and coupling (connection) ability
3. Maximum vertical jump with arm swing and a jump | Jump height | Coupling (connection) ability and jumping power
4. 10 jumps with arm swing to a height equal to 2/3 of the maximum jump height in task 2 | Sum of deviations from the given mark | Ability to differentiate the force parameters of movements
5. (tasks 1 and 2) | Difference between the results of task 1 and task 2 | Coupling (connection) ability

(according to D.D. Blume, 1987)
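To make the derived scores in Table 3 concrete, here is a minimal sketch that computes them from raw jump heights. The function name and dictionary keys are hypothetical, and summing absolute deviations in task 4 is an assumption (with signed deviations, jumps above and below the mark would cancel out).

```python
def jump_test_scores(no_swing_cm, with_swing_cm, target_jumps_cm):
    """Derive Table 3 scores from raw jump heights (hypothetical helper).

    no_swing_cm:     task 1, maximum jump without arm swing
    with_swing_cm:   task 2, maximum jump with arm swing
    target_jumps_cm: task 4, heights reached in the 10 target jumps
    """
    if not target_jumps_cm:
        raise ValueError("no target jumps recorded")
    target = 2 * with_swing_cm / 3  # the mark: 2/3 of the task 2 maximum
    return {
        "jumping_power": no_swing_cm,                  # task 1 result
        "coupling_gain": with_swing_cm - no_swing_cm,  # task 2 minus task 1
        "target": target,
        # absolute deviations, so jumps above and below the mark do not cancel
        "sum_of_deviations": sum(abs(h - target) for h in target_jumps_cm),
    }

# Hypothetical pupil: 40 cm without swing, 48 cm with swing, target 32 cm
scores = jump_test_scores(40, 48, [30, 33, 31, 35, 32, 29, 34, 32, 31, 33])
print(scores)
```

A smaller `sum_of_deviations` indicates finer differentiation of force; a larger `coupling_gain` indicates better coupling of the arm swing with the take-off.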

The definition of motor tests indicates that they serve to assess motor abilities and, in part, motor skills. In the most general form, there are conditioning tests, coordination tests, and tests for assessing motor skills and abilities (movement technique). This systematization, however, is still too general. The classification of motor tests by their predominant purpose follows from the systematization of physical (motor) abilities.

In this regard, there are:

1) conditioning tests:

to assess strength: maximal strength, speed strength, strength endurance;

to assess endurance;

to assess speed abilities;

to assess flexibility: active and passive;

2) coordination tests:

to assess coordination abilities related to individual, relatively independent groups of motor actions (special coordination abilities);

to assess specific coordination abilities: balance, spatial orientation, reaction, differentiation of movement parameters, rhythm, restructuring of motor actions, coupling (connection), vestibular stability, and voluntary muscle relaxation.

The concept of “tests for assessing motor skills” is not considered in this work. Examples of tests are given in Appendix 2.

Thus, each classification serves as a guideline for choosing (or creating) the tests most relevant to the testing task.

1.3 Quality criteria for motor tests

A motor test fulfils its purpose only when it satisfies the relevant requirements.

Tests that meet the requirements of reliability and informativeness are called good or authentic (reliable).

The reliability of a test is the degree of accuracy with which it evaluates a certain motor ability, regardless of who performs the evaluation. Reliability shows up as the degree of agreement between results when the same people are tested repeatedly under the same conditions; it is the stability of an individual's result when a control exercise is repeated. In other words, on repeated testing (for example, jump performance, running time, throwing distance), a child keeps a stable rank within the group of those tested.

The reliability of a test is determined by correlation analysis, that is, by calculating a reliability coefficient. Several methods are used for this purpose.

Test stability is based on the relationship between the results of a first and a second testing, repeated after a certain time under the same conditions by the same experimenter. Determining reliability by repeated testing is called a retest. Test stability depends on the type of test, the age and sex of the subjects, and the interval between test and retest. For example, at short intervals the results of conditioning tests and morphological measurements are more stable than those of coordination tests, and results are more stable in older children than in younger ones. The retest is usually carried out within a week. At longer intervals (for example, after a month), the stability of even tests such as the 1000 m run or the standing long jump becomes noticeably lower.

Test equivalence is the correlation of a test's results with the results of other tests of the same type (for example, when choosing which of the 30, 50, 60, or 100 m runs best reflects speed abilities).

Whether to use equivalent (homogeneous) tests depends on many factors. If the reliability of the estimates or conclusions of a study needs to be increased, it is advisable to use two or more equivalent tests. If the task is to create a battery with a minimum number of tests, only one of the equivalent tests should be used. Such a battery, as noted, is heterogeneous, since the tests in it measure different motor abilities. An example of a heterogeneous test battery is the 30 m run, pull-ups, forward bend, and 1000 m run.

The reliability of tests is also determined by correlating, across subjects, the mean scores of the odd-numbered and even-numbered attempts that make up the test. For example, the mean target accuracy over attempts 1, 3, 5, 7, and 9 is compared with the mean accuracy over attempts 2, 4, 6, 8, and 10. This method of assessing reliability is called the split-half (odd-even) method. It is used mainly for assessing coordination abilities, and only when the test result is formed from at least six attempts.

The objectivity (consistency) of the test is understood as the degree of consistency of the results obtained on the same subjects by different experimenters (teachers, judges, experts).

To increase the objectivity of testing, it is necessary to comply with the standard test conditions:

testing time, location, weather conditions;

identical equipment and apparatus;

psychophysiological factors (volume and intensity of load, motivation);

presentation of information (exact verbal statement of the test task, explanation and demonstration).

This is what is meant by the objectivity of the testing procedure. One also speaks of interpretive objectivity: the degree to which the interpretation of test results is independent of the experimenter who interprets them.

In general, as experts note, the reliability of tests can be improved in various ways: stricter standardization of testing (see above), more attempts, better motivation of the subjects, more evaluators (judges, experts) and greater agreement among them, and more equivalent tests.

There are no fixed values for test reliability coefficients. In most cases, the following guidelines are used: 0.95 to 0.99, excellent reliability; 0.90 to 0.94, good; 0.80 to 0.89, acceptable; 0.70 to 0.79, poor; 0.60 to 0.69, doubtful for individual assessment (the test is suitable only for characterizing a group of subjects).
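The guideline bands above can be applied mechanically, as in this sketch. The function name is hypothetical, and treating each band's lower bound as a cut-off is an assumption: the published bands leave gaps (for example, between 0.94 and 0.95), and this version assigns such values to the lower band.

```python
def rate_reliability(r):
    """Map a reliability coefficient to the verbal guideline ratings."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient lies in [-1, 1]")
    bands = [  # (lower bound of the band, rating), checked top-down
        (0.95, "excellent"),
        (0.90, "good"),
        (0.80, "acceptable"),
        (0.70, "poor"),
        (0.60, "doubtful: group-level use only"),
    ]
    for lower, label in bands:
        if r >= lower:
            return label
    return "below the tabulated range"

for r in (0.97, 0.86, 0.64):
    print(r, rate_reliability(r))
```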

The informativeness of a test is the degree of accuracy with which it measures the motor ability or skill being assessed. In foreign (and domestic) literature, the term "validity" is used instead of "informativeness" (from English validity: soundness, legitimacy). In essence, when speaking about informativeness, the researcher answers two questions: what does this particular test (or test battery) measure, and how accurately does it measure it?

There are several types of validity: logical (content), empirical (based on experimental data), and predictive (2).

Important additional test criteria are normalization, comparability, and economy.

The essence of normalization is that, based on the test results, it is possible to create norms that are of particular importance for practice.

Comparability of a test is the possibility of comparing the results obtained with one or more forms of parallel (homogeneous) tests. In practical terms, using comparable motor tests reduces the risk that regular use of the same test ends up measuring skill at the test itself rather than the level of the ability. At the same time, comparable test results increase the reliability of the conclusions.

The essence of economy as a test quality criterion is that the test does not require much time, large material costs, or the participation of many assistants.


Conclusion

The forerunners of modern motor tests arose in the late 19th and early 20th centuries. Since 1920, mass surveys have been conducted in our country to study the main indicators of physical development and the level of motor fitness. On the basis of these data, the standards of the Ready for Labor and Defense complex were developed.

The concept of five motor abilities has become firmly established in test theory: strength, speed, coordination of movements, endurance, and flexibility. A whole range of test batteries has been developed to evaluate them.

Among the ways to assess a person's physical condition, the test method is the main one. There are single and complex tests. In addition, following the systematization of physical (motor) abilities, tests are classified into conditioning and coordination tests.

All tests must meet special requirements. The main criteria include: reliability, stability, equivalence, objectivity, informativeness (validity). Additional criteria include: normalization, comparability and economy.

Therefore, when choosing tests, all these requirements must be met. To increase the reliability of tests, one should standardize testing more strictly, increase the number of attempts, motivate the subjects better, increase the number of evaluators (judges, experts) and the agreement among them, and increase the number of equivalent tests.


Chapter 2. Tasks, methods and organization of research

2.1 Research objectives:

1. To study the literature on test theory;

2. To analyze the methodology for testing physical qualities;

3. To compare the motor fitness indicators of students in grades 7a and 7b.

2.2 Research methods:

1. Analysis and generalization of literary sources.

This was carried out throughout the study. At the theoretical level, the research problems were addressed by studying the literature on the theory and methodology of physical education and sports, the development of physical qualities, and sports metrology. In total, 20 literary sources were analyzed.

2. Verbal influence.

There was a briefing on the sequence of performing motor tests and a motivational conversation to set the mood for achieving the best result.

3. Testing of physical qualities.

30 m run (from a high start);

3 x 10 m shuttle run;

standing long jump;

6-minute run (m);

forward bend from a sitting position (cm);

pull-ups on the high bar (girls on the low bar).

4. Methods of mathematical statistics.

Used for the calculations in the comparative analysis of students in grades 7a and 7b.
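The study does not say which statistical criterion was used to compare the two classes. As an illustrative assumption only, the sketch below computes Welch's t statistic for the difference between the class means on one test; the function name and the 30 m times are hypothetical, not the study's data.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference between two group means.

    a, b: results of the two groups on one test (e.g. 7a and 7b).
    Unlike the classic Student test, it does not assume equal variances.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two results")
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))  # SE of the mean difference
    if se == 0:
        raise ValueError("both groups have zero spread")
    return (fmean(a) - fmean(b)) / se

# Hypothetical 30 m run times (s); NOT the study's data
class_7a = [5.4, 5.1, 5.6, 5.3, 5.8, 5.2]
class_7b = [5.6, 5.5, 5.9, 5.4, 6.0, 5.7]
t = welch_t(class_7a, class_7b)
print(f"t = {t:.2f}")
```

For a timed test, a negative t means the first class ran faster on average. Turning t into a significance level needs a t-distribution table (with Welch-Satterthwaite degrees of freedom) or a statistics library, which this sketch deliberately avoids.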

2.3 Organization of the study

At the first stage, in April 2009, the scientific and methodological literature was analyzed:

study of the content of physical education programs for students of general education

The areas of application, goals and objectives of software testing are diverse, so testing is evaluated and explained in different ways. Sometimes even testers find it difficult to explain what software testing is as such. There is confusion.

To unravel this confusion, Alexey Barantsev (a software testing practitioner, trainer and consultant, who comes from the Institute of System Programming of the Russian Academy of Sciences) opens his testing trainings with an introductory video on the basics of testing.

It seems to me that in this talk the lecturer explained "what testing is" in the most adequate and balanced way, from the point of view of both a scientist and a programmer. It is strange that this text has not yet appeared on Habr.

Here is a condensed retelling of the talk. At the end of the text there are links to the full version and to the video mentioned.

Basic provisions of testing

Dear colleagues,

First, let's try to understand what testing is NOT.

Testing is not development,

Testers may know how to program: they write tests (test automation is programming), and they may develop auxiliary programs for their own use.

However, testing is not a software development activity.

Testing is not analysis,

nor is it the activity of collecting and analyzing requirements.

Although, in the testing process, sometimes you have to clarify the requirements, and sometimes you have to analyze them. But this activity is not the main one, rather, it has to be done simply out of necessity.

Testing is not management,

Despite the fact that in many organizations there is such a role as a "test manager". Of course, testers need to be managed. But testing by itself is not management.

Testing is not technical writing,

even though testers do have to document their tests and their work.

Testing cannot be equated with any of these activities merely because testers do this kind of work (development, requirements analysis, writing documentation for their tests): they do it for themselves, not for someone else.

Activity is significant only when it is in demand, that is, testers must produce something “for export”. What do they do "for export"?

Defects, defect descriptions, or test reports? This is partly true.

But this is not the whole truth.

The main activity of testers is to provide software project participants with negative feedback about the quality of the software product.

"Negative feedback" does not carry any negative connotation, and does not mean that the testers are doing something bad, or that they are doing something badly. It's just a technical term that means a fairly simple thing.

But this thing is very significant, and, probably, the single most significant component of the activity of testers.

There is a science called systems theory, which defines the concept of "feedback".

"Feedback" is data, or some part of the data, that returns from the output back to the input. This feedback can be positive or negative.

Both types of feedback are equally important.

In the development of software systems, positive feedback is, of course, some kind of information that we receive from end users. These are requests for some new functionality, this is an increase in sales (if we release a quality product).

Negative feedback can also come from end users, for example in the form of negative reviews, or it can come from testers.

The earlier negative feedback is provided, the less energy is needed to modify that signal. That is why you need to start testing as early as possible, at the earliest stages of the project, and provide this feedback at the design stage, and maybe even earlier, even at the stage of collecting and analyzing requirements.

By the way, this is where the understanding that testers are not responsible for quality comes from. They help those who are responsible for it.

Synonyms for "testing"

From the point of view that testing is the provision of negative feedback, the world-famous abbreviation QA (Quality Assurance) is definitely NOT synonymous with the term “testing”.

Simply providing negative feedback cannot be considered quality assurance, since assurance is a positive action: it implies that we actually ensure quality by taking timely measures to improve the quality of software development.

However, quality control (QC) can be considered, in a broad sense, a synonym for "testing", because quality control is the provision of feedback in its most diverse forms at various stages of a software project.

Sometimes testing is meant as some separate form of quality control.

The confusion comes from the history of testing. At different times, the term "testing" has meant different activities, which can be divided into two large classes: external and internal.

External definitions

The definitions given at different times by Myers, Beizer, Kaner describe testing precisely from the point of view of its EXTERNAL significance. That is, from their point of view, testing is an activity that is intended FOR something, and does not consist of something. All three of these definitions can be summarized as providing negative feedback.

Internal definitions

These are the definitions that are given in the terminology standard used in software engineering, such as the de facto standard called SWEBOK.

Such definitions constructively explain WHAT the testing activity is, but they do not give the slightest idea of WHY testing is done, that is, what all the results of checking the program's actual behavior against its expected behavior will then be used for.

Testing is:

  • verification of the compliance of the program with the requirements,
  • carried out by observing its work
  • in special, artificially created situations, chosen in a certain way.
Henceforth, we will consider this a working definition of "testing".

The general scheme of testing is approximately the following:

  1. The tester receives the program and/or requirements at the input.
  2. He does something with them, observes the work of the program in certain situations artificially created by him.
  3. At the output, he receives information about matches and inconsistencies.
  4. This information is then used to improve the existing program. Or in order to change the requirements for a program that is still being developed.

What is a test

  • This is a special, artificially created situation, chosen in a certain way,
  • and a description of what observations of the operation of the program need to be made
  • to check if it meets some requirement.
It is not necessary to assume that the situation is something one-time. The test can be quite long, for example, when testing performance, this artificially created situation can be a load on the system that continues for quite a long time. And the observations that need to be made in this case are a set of various graphs or metrics that we measure in the process of performing this test.

The test developer selects a limited set of tests from a huge, potentially infinite set.

Well, so we can conclude that the tester does two things in the testing process.

1. First, he controls the execution of the program and creates the artificial situations in which we are going to check its behavior.

2. And secondly, he observes the behavior of the program and compares what he sees with what is expected.

If a tester automates tests, he does not observe the behavior of the program himself; he delegates this task to a special tool or to a program he has written. This tool observes the behavior, compares it with the expected behavior, and gives the tester only the final result: whether the observed behavior matches the expected one or not.
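The working definition (an artificially created situation, the observations to make, and the requirement to check against) and the delegated comparison can be sketched together. Everything here is hypothetical and for illustration only: `Scenario`, `run_scenarios`, and the program under test `parse_age` are not taken from any real testing framework.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Scenario:
    """One test: an artificial situation plus the expected observation."""
    name: str
    situation: Callable[[], Any]  # drives the program and returns what was observed
    expected: Any                 # what the requirement says should be observed

def run_scenarios(scenarios):
    """Stand-in for the automation tool: observe, compare, report."""
    report = {}
    for s in scenarios:
        try:
            observed = s.situation()
        except Exception as exc:  # a crash is an observation too
            report[s.name] = f"mismatch: raised {type(exc).__name__}"
            continue
        report[s.name] = "match" if observed == s.expected else f"mismatch: got {observed!r}"
    return report

def parse_age(text):
    """Hypothetical program under test."""
    return int(text.strip())

report = run_scenarios([
    Scenario("plain number", lambda: parse_age("42"), 42),
    Scenario("surrounding spaces", lambda: parse_age(" 7 "), 7),
    Scenario("word instead of digits", lambda: parse_age("seven"), 7),
])
for name, verdict in report.items():
    print(f"{name}: {verdict}")
```

The tester only reads the final verdicts; the third scenario reports a mismatch, which is the negative feedback passed on to the developers.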

Any program is a mechanism for processing information. The input is information in one form, the output is information in some other form. At the same time, the program can have many inputs and outputs, they can be different, that is, the program can have several different interfaces, and these interfaces can have different types:

  • User Interface (UI)
  • Programming Interface (API)
  • network protocol
  • File system
  • Environment state
  • Events
The most common are user interfaces:
  • graphical,
  • text,
  • console,
  • and speech.
Using all these interfaces, the tester:
  • somehow creates artificial situations,
  • and checks in these situations how the program behaves.

This is what testing is.

Other classifications of types of testing

The most common division is into three levels:
  1. unit testing,
  2. integration testing,
  3. system testing.
Unit testing usually means testing at a fairly low level, that is, testing individual operations, methods, functions.
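For illustration, a unit test in this low-level sense checks a single function in isolation. The sketch uses Python's standard `unittest` module; the function `shuttle_total` is hypothetical.

```python
import unittest

def shuttle_total(splits):
    """Hypothetical function under test: total time of a shuttle run (s)."""
    if not splits:
        raise ValueError("no splits recorded")
    return round(sum(splits), 2)

class ShuttleTotalTest(unittest.TestCase):
    def test_sums_splits(self):
        # One operation, checked directly, with no user interface involved
        self.assertEqual(shuttle_total([2.9, 2.8, 3.1]), 8.8)

    def test_rejects_empty_input(self):
        with self.assertRaises(ValueError):
            shuttle_total([])
```

Saved in a file, these tests can be run with `python -m unittest <file>.py`.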

System testing refers to testing at the user interface level.

Sometimes other terms, such as "component testing", are also used, but I prefer to single out these three. From a technological point of view, the division into unit and system testing does not make much sense: the same tools and the same techniques can be used at different levels. The division is conditional.

Practice shows that the tools that are positioned by the manufacturer as unit testing tools can be used with equal success at the level of testing the entire application as a whole.

Conversely, tools that test the entire application at the user interface level sometimes need to look into a database, for example, or call a separate stored procedure there.

That is, the division into system and unit testing is generally purely conditional, speaking from a technical point of view.

The same tools and the same techniques are used, and this is normal; at each level we can talk about testing of different kinds.

We combine:

That is, we can talk about unit testing of functionality.

We can talk about system testing of functionality.

We can talk about unit testing of, for example, efficiency.

We can talk about system testing of efficiency.

Either we consider the efficiency of a single algorithm, or the efficiency of the system as a whole. That is, the technological division into unit and system testing does not make much sense, because the same tools and techniques can be used at different levels.

Finally, during integration testing, we check whether the modules within a system interact with each other correctly. In effect, we perform the same tests as in system testing, but additionally pay attention to how the modules interact and make some extra checks. That is the only difference.
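A sketch of this idea, assuming two hypothetical modules of one system: the first test is the same end-to-end check a system test would make, and the second adds a check on the interaction itself (the data format passed from one module to the other).

```python
import unittest

# Two hypothetical modules of one system
def parse_results(lines):
    """Module A: turns 'name;seconds' lines into (name, float) pairs."""
    pairs = []
    for line in lines:
        name, seconds = line.split(";")
        pairs.append((name.strip(), float(seconds)))
    return pairs

def rank_fastest(pairs):
    """Module B: orders runners from fastest to slowest."""
    return [name for name, _ in sorted(pairs, key=lambda p: p[1])]

class ResultsIntegrationTest(unittest.TestCase):
    def test_end_to_end_ranking(self):
        # The same check a system test would make...
        lines = ["Ann; 5.4", "Bob; 4.9", "Eve; 5.1"]
        self.assertEqual(rank_fastest(parse_results(lines)), ["Bob", "Eve", "Ann"])

    def test_modules_agree_on_data_format(self):
        # ...plus an extra check on the interaction itself: B must receive
        # numeric times, otherwise "10.0" would sort before "9.5" as text.
        pairs = parse_results(["Ann; 10.0", "Bob; 9.5"])
        self.assertTrue(all(isinstance(t, float) for _, t in pairs))
        self.assertEqual(rank_fastest(pairs), ["Bob", "Ann"])
```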

Let's try again to understand the difference between system testing and unit testing. Since this division is so common, there must be some difference.

This difference shows up when we classify testing not by technology but by its goals.

Classification by goals is conveniently done with the "magic square", originally invented by Brian Marick and later improved by Eri Tennen.

In this magic square, all types of testing are located in four quadrants, depending on what is paid more attention to in these tests.

Vertically: the higher a type of testing is placed, the more attention it pays to the external behavior of the program; the lower it is, the more attention it pays to the program's internal technological structure.

Horizontally: the further left a test is, the more attention is paid to programming it; the further right, the more attention is paid to manual testing and human exploration of the program.

In particular, terms such as acceptance testing and unit testing can easily be placed in this square in the sense in which they are most often used in the literature. Unit testing is low-level testing with an overwhelming share of programming: all tests are programmed and run fully automatically, and attention is paid primarily to the internal structure of the program, namely its technological features.

In the upper right corner we will have manual tests aimed at some external behavior of the program, in particular usability testing, and in the lower right corner we will most likely have tests of various non-functional properties: performance, security, and so on.

So, based on the classification by goals, we have unit testing in the lower left quadrant, and all other quadrants are system testing.

Thank you for your attention.


Key questions: Test as a measurement tool. Basic testing theories. Functions, possibilities and limitations of testing. The use of tests in personnel assessment. Advantages and disadvantages of using tests. Forms and types of test items. Task construction technology. Test quality assessment. Reliability and validity. Test development software.




Test as a measurement tool. The basic concepts of testology are measurement, test, content and form of tasks, and reliability and validity of measurement results. In addition, testology uses statistical concepts such as sample and population, averages, variation, correlation, regression, etc.




A test task is a didactically and technologically effective unit of control material: a part of the test that meets the requirements of subject purity (one-dimensionality) of content, correctness of content and logic, correctness of form, and an acceptable visual layout of the task.




The traditional test is a standardized method for diagnosing the level and structure of preparedness. In such a test, all subjects answer the same tasks at the same time, under the same conditions, and with the same rules for evaluating answers. Countless tests can be created for the same test goal, and all of them may serve it.
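The "same rules for evaluating answers" can be made concrete with a sketch: every answer sheet is scored against one key by one rule. The answer key, subjects, and one-point-per-correct-answer rule are hypothetical.

```python
def score_answers(answers, key):
    """Score an answer sheet with the same key and rule for everyone:
    one point for each answer that matches the key."""
    if len(answers) != len(key):
        raise ValueError("answer sheet and key differ in length")
    return sum(1 for given, correct in zip(answers, key) if given == correct)

KEY = ["b", "a", "d", "c", "a"]  # hypothetical answer key
sheets = {
    "subject 1": ["b", "a", "d", "a", "a"],
    "subject 2": ["c", "a", "b", "c", "a"],
}
scores = {name: score_answers(sheet, KEY) for name, sheet in sheets.items()}
print(scores)
```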


Professiogram (from Latin professio, profession, and Greek gramma, record) is a system of features that describe a particular profession; it also includes a list of norms and requirements for an employee in that profession or specialty. In particular, a professiogram may include a list of psychological characteristics that representatives of specific professional groups must have.


Basic testing theory. The first scientific works on test theory appeared at the beginning of the twentieth century, at the intersection of psychology, sociology, pedagogy and other so-called behavioral sciences. Foreign psychologists call this science psychometrics (Psychometrika), and educators call it educational measurement. Free of ideological and political overtones, the name "testology" has a simple and transparent meaning: the science of tests.


The first stage, prehistory, lasted from antiquity to the end of the 19th century, when pre-scientific forms of checking knowledge and abilities were widespread. The second, classical period lasted from the beginning of the 1920s to the end of the 1960s; during it, classical test theory was created. The third, technological period began in the 1970s: the time of developing methods for adaptive testing and learning, and a methodology for efficiently developing tests and test tasks for the parametric assessment of subjects on the measured latent trait.


Functions, possibilities and limitations of testing. The tests used in selection are designed to obtain a psychological portrait of the candidate and to assess his abilities as well as his professional knowledge and skills. Tests allow candidates to be compared with each other or with standards, that is, with an ideal candidate. Tests are used to measure the qualities a person needs to perform a job effectively. Some tests are designed so that the employer can administer the test and calculate the results; others require experienced consultants to ensure they are applied correctly.


The limitations of testing relate to:
- the high cost of administering tests;
- how suitable a test is for assessing a particular person's abilities;
- the type of work: tests predict success better in jobs made up of short-term professional tasks and are of little use when the tasks take several days or weeks to complete.








2. Choose the terminology to suit the specific target audience. Also exclude redundant items and items that ask two or more questions at once, since they can confuse the respondent and make the answers hard to interpret.


3. To meet all these requirements, go through the entire question bank item by item and analyze what purpose each item serves. For example, if a test is being developed to measure the analytical skills of accounting trainees, consider what exactly the term "analytical skills" means in this context.




5. Once the questions and scoring formats have been chosen, put them into a user-friendly layout with clearly written instructions and sample questions, so that test takers fully understand what is required of them.


6. At this stage of development, the draft test very often contains more questions than necessary: by some estimates, three times as many as will remain in the final test or measurement system. The next step is to try out the draft test on a relatively large sample of current employees to make sure that all questions are easily understood.


7. Knowledge tests usually start with easy questions and become gradually more difficult towards the end. When a test is meant to measure social attitudes and personal characteristics, it can help to alternate negatively and positively worded items, so that respondents do not answer without thinking.


8. The last stage is administering the test to a large representative sample to establish performance norms and to check reliability and validity before the test is used as a selection tool. The fairness of the test must also be established, to make sure that it does not discriminate against any subgroup of the population (e.g., on the basis of ethnicity).
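Steps 6 to 8 all depend on analyzing pilot data. The following is a minimal sketch of one common classical-test-theory way to do that; the source does not prescribe these statistics. Assumptions: items are scored 0/1; the draft pool is trimmed to about one third by the corrected item-total correlation; the kept items are ordered from easy to hard by the proportion of correct answers; norms are reported as z-scores and percentile ranks. All function names and data are hypothetical.

```python
# Sketch of a pilot analysis for steps 6-8 (assumptions listed above).
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation; 0.0 if either variable does not vary."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # an item everyone answers the same way cannot discriminate
    return sxy / sqrt(sxx * syy)


def item_total_correlations(responses):
    """responses[p][i] is person p's 0/1 score on item i."""
    result = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]  # total without item i
        result.append(pearson(item, rest))
    return result


def keep_best_third(responses):
    """Step 6: indices of the most discriminating third of the draft items."""
    r = item_total_correlations(responses)
    k = max(1, len(r) // 3)
    ranked = sorted(range(len(r)), key=lambda i: r[i], reverse=True)
    return sorted(ranked[:k])


def difficulty(responses, items):
    """Proportion of test takers answering each listed item correctly."""
    return {i: sum(row[i] for row in responses) / len(responses) for i in items}


def order_easy_to_hard(responses, items):
    """Step 7: items sorted from highest to lowest proportion correct."""
    p = difficulty(responses, items)
    return sorted(items, key=lambda i: p[i], reverse=True)


def z_score(raw, norm_sample):
    """Step 8: distance of a raw score from the norm mean, in SD units."""
    return (raw - mean(norm_sample)) / stdev(norm_sample)


def percentile_rank(raw, norm_sample):
    """Step 8: percent of the norm group scoring below raw (ties count half)."""
    below = sum(1 for x in norm_sample if x < raw)
    equal = sum(1 for x in norm_sample if x == raw)
    return 100 * (below + 0.5 * equal) / len(norm_sample)


# Hypothetical pilot: 5 employees x 6 draft items.
pilot = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
]
kept = keep_best_third(pilot)
print(kept, order_easy_to_hard(pilot, kept))  # [0, 1] [1, 0]

norm_sample = [10, 12, 14, 16, 18]  # hypothetical raw scores of the norm group
print(z_score(14, norm_sample), percentile_rank(14, norm_sample))  # 0.0 50.0
```

Items that everyone answers identically (items 2 and 3 here) or that run against the total (items 4 and 5) are dropped; of the two kept items, item 1 is easier, so it is placed first. In practice the pilot and norm samples would be far larger than in this toy example.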


Evaluation of test quality

To be effective, selection methods must be reliable and valid. The reliability of a selection method is its freedom from systematic measurement errors, that is, its consistency under different conditions.


In practice, reliable judgments are achieved by comparing the results of two or more similar tests conducted on different days. Another way to increase reliability is to compare the results of several alternative selection methods (e.g., a test and an interview). If the results are similar or identical, they can be considered trustworthy.
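Comparing results from different days is usually summarized as a single test-retest coefficient. The sketch below assumes, as is common but not stated in the source, that this coefficient is Pearson's correlation between the two sessions; the scores are invented.

```python
# Sketch: test-retest reliability as the correlation between two sessions.
# Assumption (not from the source): agreement is measured with Pearson's r.
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples of at least 2 scores")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


day1 = [12, 15, 9, 20, 17]   # five hypothetical people, first administration
day2 = [13, 14, 10, 19, 18]  # the same people, a few days later
retest_r = pearson(day1, day2)
print(f"test-retest r = {retest_r:.2f}")
```

Here r is about 0.97: the raw scores shift slightly between days, but the people keep almost the same rank order, which is what a high reliability coefficient expresses.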


Reliability means that repeated measurements give the same result as earlier ones, that is, the assessment results are not affected by external factors. Validity means that the method measures exactly what it is intended to measure. The maximum accuracy of information obtained in scientific research with specially developed methods is limited by technical factors and does not exceed 0.8.


In personnel selection practice, the reliability coefficients of various assessment methods fall within the following ranges:
- 0.1-0.2: traditional interview;
- 0.2-0.3: references (letters of recommendation);
- 0.3-0.5: professional tests;
- 0.5-0.6: structured interview, competency-based interview;
- 0.5-0.7: cognitive and personality tests;
- 0.6-0.7: competency-based approach (assessment center).


Validity is the degree of accuracy with which a given result, method or criterion "predicts" the future performance of the person being tested. The validity of a method refers to the conclusions drawn from a particular procedure, not to the procedure itself. That is, a selection method may be reliable in itself and still be unsuitable for a specific task, because it measures something other than what is required in that case.
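The distinction between a reliable method and a valid one can be shown numerically. This is a minimal sketch under assumptions not taken from the source: predictive validity is estimated as Pearson's r between test scores and a later criterion (hypothetical supervisor ratings on a 1-5 scale), and all numbers are invented so that the test is highly stable on retest yet unrelated to later performance.

```python
# Sketch: a test can be reliable (stable on retest) but not valid
# (unrelated to the criterion it is supposed to predict).
# Assumption (not from the source): both are measured with Pearson's r.
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


test_day1 = [10, 20, 30, 40, 50]
test_day2 = [11, 19, 31, 39, 50]  # retest: almost the same ranking
performance = [4, 2, 3, 2, 4]     # hypothetical ratings collected months later

reliability = pearson(test_day1, test_day2)  # stability of the test itself
validity = pearson(test_day1, performance)   # how well it predicts the criterion
print(f"reliability = {reliability:.2f}, validity = {validity:.2f}")
# reliability = 1.00, validity = 0.00
```

The test ranks people almost identically on both days, so it is reliable, but its scores say nothing about later job performance, so for this purpose it is not valid.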


Software for developing tests

In Russian practice, several integrated programs include a "Psychodiagnostics" module. One example is "1C: Salary and Personnel Management 8.0", whose "Psychodiagnostics" module was developed jointly with a group of lecturers from the Department of Personality Psychology and General Psychology of the Faculty of Psychology of Lomonosov Moscow State University, under the direction of Doctor of Psychological Sciences, Professor A. N. Guseva. Another example is the training simulator of the Faculty of Psychology of TSU for developing personnel assessment systems and adapting test methods, developed by Personnel Soft, also on the "1C: Enterprise 8.2" platform.

