
Presentation: Fundamentals of the Theory of Tests in Physical Education. Testing Basics

Description of the presentation by individual slides:

Slide 1

Slide description:

Slide 2

Slide description:

Physical qualities are usually defined as innate (genetically inherited) morphological and functional qualities that make physical (materially expressed) human activity possible and that are fully manifested in purposeful motor activity. The main physical qualities are strength, speed, endurance, flexibility and agility.

Slide 3

Slide description:

Motor abilities are individual characteristics that determine the level of a person's motor capabilities (V.I. Lyakh, 1996). Physical qualities form the basis of a person's motor abilities, and motor skills and habits are the form in which they are manifested. Motor abilities include strength, speed, speed-strength and motor-coordination abilities, as well as general and specific endurance.

Slide 4

Slide description:

Scheme of systematization of physical (motor) abilities:
Physical (motor) abilities
- Conditioning (energy) abilities: strength; speed; endurance; flexibility; combinations of conditioning abilities
- Coordination (information) abilities: coordination abilities related to separate groups of motor actions (special coordination abilities); specific coordination abilities; combinations of coordination abilities
- Combinations of conditioning and coordination abilities

Slide 5

Slide description:

Accurate information about the level of development of motor abilities (high, medium, low) can be obtained using tests (control exercises).

Slide 6

Slide description:

Control exercises (tests) make it possible to identify absolute (explicit) and relative (hidden, latent) indicators of these abilities. Absolute indicators characterize the level of development of particular motor abilities without taking into account how they influence one another. Relative indicators make it possible to judge the manifestation of motor abilities with this influence taken into account.

Slide 7

Slide description:

The physical abilities mentioned above can be regarded as existing potentially, that is, before any motor activity begins (potential abilities), and as actually manifested at the start of an activity (including when performing motor tests) and in the course of performing it (current physical abilities).

Slide 8

Slide description:

With a certain degree of convention, one can speak of elementary and complex physical abilities.

Slide 9

Slide description:

Research results make it possible to distinguish the following physical abilities: special, specific and general.

Slide 10

Slide description:

Special physical abilities refer to homogeneous groups of integral motor actions or activities: running, acrobatic and gymnastic exercises on apparatus, throwing motor actions, sports games (basketball, volleyball).

Slide 11

Slide description:

Specific manifestations of physical abilities can be regarded as the components that make up their internal structure.

Slide 12

Slide description:

Thus, the main components of a person's coordination abilities are the abilities of spatial orientation, balance, reaction and differentiation of movement parameters, as well as the sense of rhythm, the ability to rearrange motor actions, vestibular stability and voluntary muscle relaxation. These abilities are specific.

Slide 13

Slide description:

The main components of the structure of speed abilities are considered to be the speed of response, the speed of a single movement, the frequency of movements and the speed manifested in integral motor actions.

Slide 14

Slide description:

Manifestations of strength abilities include static (isometric) strength and dynamic (isotonic) strength, the latter including explosive and shock-absorbing force.

Slide 15

Slide description:

The structure of endurance is very complex. It includes aerobic endurance, which requires oxygen-dependent sources of energy; anaerobic endurance (glycolytic and creatine phosphate energy sources, without the participation of oxygen); endurance of various muscle groups in static postures (static endurance); and endurance in dynamic exercises performed at 20-90% of maximum speed.

Slide 16

Slide description:

The manifestations (forms) of flexibility are less complex: active and passive flexibility are distinguished.

Slide 17

Slide description:

General physical abilities should be understood as the potential and realized capabilities of a person, which determine his readiness to successfully carry out motor actions of various origins and meanings. Special physical abilities are a person’s capabilities that determine his readiness to successfully carry out motor actions of similar origin and meaning. Therefore, tests provide information primarily about the degree of formation of special and specific physical (speed, coordination, strength, endurance, flexibility) abilities.

Slide 18

Slide description:

Special physical abilities are a person’s capabilities that determine his readiness to successfully carry out motor actions of similar origin and meaning. Therefore, tests provide information primarily about the degree of formation of special and specific physical (speed, coordination, strength, endurance, flexibility) abilities.

Slide 19

Slide description:

The objectives of testing are to identify the levels of development of conditioning and coordination abilities and to assess the quality of technical and tactical readiness. Based on the test results, you can: compare the preparedness of individual students and of entire groups living in different regions and countries; conduct sports selection for a particular sport or for participation in competitions; exercise largely objective control over the education (training) of schoolchildren and young athletes; identify the advantages and disadvantages of the means, teaching methods and forms of organizing classes used; and, finally, substantiate the norms (age-specific, individual) for the physical fitness of children and adolescents.

Slide 20

Slide description:

Along with the tasks mentioned above, in the practice of different countries testing tasks come down to the following: teach schoolchildren to determine the level of their physical fitness themselves and to plan the sets of physical exercises they need; encourage students to further improve their physical condition (shape); know not so much the initial level of development of a motor ability as its change over a certain period; and reward students who have achieved high results not so much for the high level itself as for the planned increase in personal results.

Slide 21

Slide description:

A test is a measurement or trial carried out to determine a person's ability or condition.

Slide 22

Slide description:

Only trials that meet special requirements can be used as tests: the purpose of using the test (or tests) must be defined; a standardized measurement methodology and testing procedure must be developed; the reliability and informativeness of the tests must be determined; and it must be possible to present test results in an appropriate evaluation system.

Slide 23

Slide description:

Test, testing, test result. The system of using tests in accordance with the task, organizing the conditions, having the subjects perform the tests, and evaluating and analyzing the results is called testing. The numerical value obtained during measurement is the test result.

Slide 24

Slide description:

The tests used in physical culture are based on motor actions (physical exercises, motor tasks). Such tests are called movement, or motor, tests.

Slide 25

Slide description:

Tests are commonly classified by their structure and by their predominant indications. By structure, single and complex tests are distinguished. A single test is used to measure and evaluate one trait (a coordination or conditioning ability).

Slide 26

Slide description:

Slide 27

Slide description:

A complex test assesses several traits or components of different abilities or of the same ability, for example, the standing vertical jump (with an arm swing, without an arm swing, to a given height).

Slide 28

Slide description:

Slide 29

Slide description:

Tests may be conditioning tests (to assess strength abilities, endurance, speed abilities and flexibility) or coordination tests (to assess coordination abilities related to separate independent groups of motor actions, which measure special coordination abilities, and to assess specific coordination abilities: balance, spatial orientation, reaction, differentiation of movement parameters, rhythm, rearrangement of motor actions, coordination (linking) of movements, vestibular stability and voluntary muscle relaxation).

Slide 30

Slide description:

Each classification serves as a kind of guideline for selecting (or creating) the type of tests that best matches the testing tasks.

Slide 31

Slide description:

Quality criteria for motor tests. A motor test fulfills its purpose when it satisfies the basic criteria of reliability, stability, equivalence, objectivity and informativeness (validity), as well as the additional criteria of standardization, comparability and economy. Tests that meet the requirements of reliability and informativeness are called good, or authentic (reliable).

Slide 32

Slide description:

The reliability of a test is the degree of accuracy with which it assesses a particular motor ability, regardless of the requirements of the person assessing it. Reliability is the extent to which results agree when the same people are tested repeatedly under the same conditions, that is, the stability of an individual's result when the control exercise is repeated. In other words, in repeated testing (for example, of jumping performance, running time or throwing distance), each child in the group of subjects consistently keeps his or her rank. Test reliability is determined by correlation-statistical analysis, by calculating a reliability coefficient; several methods are used for this purpose.

Slide 33

Slide description:

The stability of a test is based on the relationship between the first and second attempts, repeated after a certain time under the same conditions by the same experimenter. This method of determining reliability by repeated testing is called the retest. Test stability depends on the type of test, the age and gender of the subjects, and the time interval between test and retest. For example, results of conditioning tests or morphological traits are more stable over short intervals than results of coordination tests, and results are more stable in older children than in younger ones. A retest is usually carried out no later than one week after the first test. At longer intervals (for example, after a month), the stability of even such tests as the 1000 m run or the standing long jump becomes noticeably lower.
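The text says reliability is judged by a reliability coefficient obtained through correlation analysis, but it does not give a formula. A minimal sketch, assuming the Pearson correlation between test and retest results serves as that coefficient (a common choice); the function name `pearson_r` and the jump data are hypothetical:

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (n >= 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical standing long jump results (cm) of the same six children:
# first test and retest a few days later under the same conditions.
test = [150, 162, 171, 145, 180, 158]
retest = [152, 160, 174, 147, 178, 161]
r = pearson_r(test, retest)
print(f"test-retest reliability r = {r:.2f}")
```

An r close to 1 means the children kept nearly the same rank order on both testings.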

Slide 34

Slide description:

Test equivalence. Test equivalence is the correlation between the result of a test and the results of other tests of the same type. This matters, for example, when choosing which test better reflects speed abilities: the 30, 50, 60 or 100 m run. How equivalent (homogeneous) tests are treated depends on many factors. If the reliability of assessments or of study conclusions needs to be increased, it is advisable to use two or more equivalent tests. If the task is to create a battery with a minimum number of tests, only one of the equivalent tests should be included. Such a battery, as noted, is heterogeneous, since its tests measure different motor abilities. An example of a heterogeneous battery is the 30 m run, pull-ups, forward bend and 1000 m run.

Slide 35

Slide description:

Test reliability is also determined by comparing the average scores of the odd- and even-numbered attempts included in the test. For example, the average accuracy of shots at a target in attempts 1, 3, 5, 7 and 9 is compared with the average accuracy in attempts 2, 4, 6, 8 and 10. This method of assessing reliability is called the doubling, or split-half, method. It is used mainly to assess coordination abilities, and only when the test result is formed from at least six attempts.
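A minimal sketch of the odd/even comparison, assuming each subject's attempts are stored in order and the two halves are compared by a Pearson correlation across subjects. The Spearman-Brown step, which estimates the reliability of the full-length test from the half-test correlation, is a standard addition that the text does not mention; all names and data are hypothetical:

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(attempts_by_subject, min_attempts=6):
    """Correlate each subject's mean over odd attempts (1, 3, 5, ...) with the
    mean over even attempts (2, 4, 6, ...); return (half-test r, corrected r)."""
    for attempts in attempts_by_subject:
        if len(attempts) < min_attempts:
            raise ValueError("the split-half method needs at least six attempts")
    odd = [mean(a[0::2]) for a in attempts_by_subject]   # attempts 1, 3, 5, ...
    even = [mean(a[1::2]) for a in attempts_by_subject]  # attempts 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    # Spearman-Brown correction (assumption: not named in the text)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical target-shooting points, 10 attempts per subject
shots = [
    [8, 9, 7, 8, 9, 8, 7, 9, 8, 8],
    [5, 6, 4, 5, 6, 5, 5, 4, 6, 5],
    [3, 2, 4, 3, 2, 3, 3, 2, 4, 3],
    [7, 6, 7, 7, 6, 6, 7, 7, 6, 7],
]
r_half, r_full = split_half_reliability(shots)
print(f"odd/even r = {r_half:.3f}, Spearman-Brown estimate = {r_full:.3f}")
```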

Slide 36

Slide description:

The objectivity (consistency) of a test is the degree of agreement between the results obtained on the same subjects by different experimenters (teachers, judges, experts). To increase the objectivity of testing, standard test conditions must be observed: testing time, place and weather; uniform equipment and instrumentation; psychophysiological factors (volume and intensity of load, motivation); and presentation of information (a precise verbal statement of the test task, explanation and demonstration). This is the so-called objectivity of testing. Interpretive objectivity is also distinguished: it concerns how independently different experimenters interpret test results.

Slide 37

Slide description:

In general, as experts note, the reliability of tests can be increased in various ways: stricter standardization of testing, more attempts, better motivation of the subjects, more evaluators (judges, experts) and greater agreement among them, and more equivalent tests. There are no fixed values for test reliability indicators. In most cases the following recommendations are used: 0.95-0.99, excellent reliability; 0.90-0.94, good; 0.80-0.89, acceptable; 0.70-0.79, poor; 0.60-0.69, doubtful for individual assessment (the test is suitable only for characterizing a group of subjects).
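The recommended ranges can be expressed as a small lookup. A sketch, assuming each range is applied by its lower bound (so a value such as 0.945 falls into "good") and that values below 0.60, which the recommendations do not cover, are simply flagged; the function name is hypothetical:

```python
def rate_reliability(r):
    """Verbal rating of a reliability coefficient using the recommended ranges.
    Assumption: each range is applied by its lower bound."""
    ranges = [
        (0.95, "excellent"),
        (0.90, "good"),
        (0.80, "acceptable"),
        (0.70, "poor"),
        (0.60, "doubtful for individual assessment; group characterization only"),
    ]
    for lower_bound, rating in ranges:
        if r >= lower_bound:
            return rating
    return "below the recommended ranges"


for coefficient in (0.97, 0.86, 0.64, 0.41):
    print(coefficient, "->", rate_reliability(coefficient))
```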

Slide 38

Slide description:

The validity of a test is the degree of accuracy with which it measures the motor ability or skill being assessed. In foreign (and domestic) literature, the term “validity” (from English, meaning soundness or legitimacy) is used instead of “informativeness”. In essence, when discussing informativeness, the researcher answers two questions: what does this particular test (or battery of tests) measure, and how accurately? There are several types of validity: logical (content), empirical (based on experimental data) and predictive.

Slide 39

Slide description:

Important additional test criteria, as noted, are standardization, comparability and economy. The essence of standardization is that test results can be used to create standards that are of particular value in practice. Test comparability is the ability to compare results obtained with one or more forms of parallel (homogeneous) tests. In practical terms, using comparable motor tests reduces the likelihood that regular use of the same test ends up assessing skill at that particular test rather than the level of the ability. Comparable test results also increase the reliability of the conclusions. The essence of economy as a quality criterion is that the test does not require much time, large material costs or the participation of many assistants.

Slide 40

Slide description:

Organization of testing the readiness of school-age children. The second important problem of testing motor abilities (recall that the first is the selection of informative tests) is the organization of their use. The physical education teacher must determine when it is best to schedule testing, how to conduct it during the lesson, and how often to test. The timing of testing follows the school program, which provides for mandatory testing of students' physical fitness twice a year.

Slide 41

Slide description:

Knowing the annual changes in the development of children's motor abilities allows the teacher to make appropriate adjustments to the physical education process for the next academic year. However, the teacher must and can test more often, carrying out so-called operational control. This is advisable, for example, to determine changes in the level of speed abilities, strength abilities and endurance under the influence of athletics lessons during the first quarter. Likewise, the teacher can use tests of children's coordination abilities at the beginning and at the end of mastering program material, for example in sports games, to identify changes in these abilities.

Slide 42

Slide description:

It should be taken into account that, because of the variety of pedagogical problems being solved, the teacher cannot be given a single testing methodology or uniform rules for conducting tests and evaluating their results. This requires experimenters (teachers) to show independence in resolving theoretical, methodological and organizational testing issues. Testing in a lesson must be linked to its content: the test or tests used, while meeting the appropriate requirements (as a research method), should be organically included in the planned physical exercises. If, for example, the teacher needs to determine the children's level of speed abilities or endurance, the tests should be planned for the part of the lesson in which the development of the corresponding physical abilities is addressed.

Slide 43

Slide description:

The frequency of testing is largely determined by the rate of development of particular physical abilities and by the age, gender and individual characteristics of their development. For example, a significant increase in speed, endurance or strength requires several months of regular exercise (training), whereas a significant increase in flexibility or in individual coordination abilities requires only 4-12 workouts. When starting from a low level, a physical quality can be improved in a shorter period; when a child already has a high level, improving the same quality takes longer. The teacher must therefore study in more depth how various motor abilities develop and improve in children of different ages and genders.

Slide 44

Slide description:

When assessing the general physical fitness of children, a wide variety of test batteries can be used; the choice depends on the specific testing objectives and the availability of the necessary conditions. However, since test results can only be assessed by comparison, it is advisable to choose tests that are widely represented in the theory and practice of children's physical education, for example, those recommended in the physical culture program. To compare the general physical fitness of a student or group of students on a set of tests, the results are converted into points or scores. The change in total points on repeated testing makes it possible to judge the progress of both an individual child and a group of children.
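The text does not say how results are converted into points; physical education programs usually supply their own point tables. As a purely illustrative sketch, the code below assumes a T-scale (50 points at the reference mean, 10 points per standard deviation), takes the reference mean and SD from the first testing so that later totals stay comparable, and flips the sign for timed tests where a smaller result is better. All test names and numbers are hypothetical:

```python
from statistics import mean, stdev


def to_points(results, ref_mean, ref_sd, lower_is_better=False):
    """T-scale points: 50 at the reference mean, +/-10 per reference SD.
    For timed tests a smaller result is better, so the sign is flipped."""
    sign = -1 if lower_is_better else 1
    return [50 + 10 * sign * (x - ref_mean) / ref_sd for x in results]


# Hypothetical battery: (test, September results, May results, lower_is_better)
battery = [
    ("30 m run, s", [5.2, 5.8, 5.0, 6.1], [5.0, 5.6, 4.9, 5.9], True),
    ("pull-ups, reps", [8, 4, 10, 3], [9, 5, 11, 5], False),
    ("standing long jump, cm", [185, 160, 195, 150], [190, 168, 198, 158], False),
]


def total_points(session):
    """Sum each student's points over the battery for session 'sept' or 'may'.
    Reference mean and SD always come from the September testing."""
    n_students = len(battery[0][1])
    totals = [0.0] * n_students
    for _test, sept, may, lower_is_better in battery:
        ref_mean, ref_sd = mean(sept), stdev(sept)
        results = sept if session == "sept" else may
        for i, p in enumerate(to_points(results, ref_mean, ref_sd, lower_is_better)):
            totals[i] += p
    return totals


sept_totals = total_points("sept")
may_totals = total_points("may")
for i, (before, after) in enumerate(zip(sept_totals, may_totals), start=1):
    print(f"student {i}: {before:.1f} -> {after:.1f} points ({after - before:+.1f})")
```

Because the reference values are fixed at the first testing, the group's totals can rise on retesting; recomputing the mean and SD each time would pin the group average at 50 points per test and hide group progress.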

Slide 49

Slide description:

An important aspect of testing is the problem of choosing a test to assess a specific physical ability and general physical fitness.

Slide 50

Slide description:

Practical recommendations and advice. Important: determine (select) the battery (or set) of tests needed, with a detailed description of all the details of their administration; set testing dates (preferably weeks 2-3 of September for the first testing and weeks 2-3 of May for the second); following the recommendations, accurately determine the age of the children on the day of testing and their gender; develop unified data recording protocols (possibly using ICT); determine the circle of assistants and carry out the testing procedure itself; immediately carry out mathematical processing of the testing data, that is, calculate the basic statistical parameters (arithmetic mean, standard error of the mean, standard deviation, coefficient of variation) and assess the significance of differences between means, for example, between parallel classes of the same school or of different schools for children of a given age and gender. One significant stage of the work may be converting test results into points or scores. With regular testing (twice a year for several years), this gives the teacher a picture of the progress in results.
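The basic parameters listed above can be computed with Python's standard `statistics` module. A sketch, assuming the classic t statistic for the difference between two independent means, t = (M1 - M2) / sqrt(m1^2 + m2^2), where m1 and m2 are the standard errors of the means; the text does not spell out this formula, and the value would still have to be compared with a critical value from a Student's t table for the chosen significance level and degrees of freedom. The class data are hypothetical:

```python
from math import sqrt
from statistics import mean, stdev


def describe(sample):
    """Basic parameters: n, mean, SD, standard error of the mean, coefficient of variation (%)."""
    m = mean(sample)
    sd = stdev(sample)           # sample standard deviation (n - 1 in the denominator)
    se = sd / sqrt(len(sample))  # standard error ("error") of the arithmetic mean
    cv = 100 * sd / m            # coefficient of variation, %
    return {"n": len(sample), "mean": m, "sd": sd, "se": se, "cv": cv}


def t_difference(a, b):
    """t = (M1 - M2) / sqrt(m1^2 + m2^2) for two independent samples,
    where m1 and m2 are the standard errors of the means."""
    da, db = describe(a), describe(b)
    return (da["mean"] - db["mean"]) / sqrt(da["se"] ** 2 + db["se"] ** 2)


# Hypothetical 30 m run times (s) in two parallel classes of the same age and gender
class_a = [5.1, 5.4, 5.0, 5.6, 5.3, 5.2, 5.5, 5.1]
class_b = [5.6, 5.9, 5.5, 6.0, 5.7, 5.8, 5.6, 6.1]
for name, sample in (("class A", class_a), ("class B", class_b)):
    d = describe(sample)
    print(f"{name}: mean {d['mean']:.2f} s, SD {d['sd']:.2f}, SE {d['se']:.3f}, CV {d['cv']:.1f}%")
print(f"t = {t_difference(class_a, class_b):.2f}")
```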

Slide 51

Slide description:

Moscow, “Enlightenment”, 2007. The book contains the most common motor tests for assessing the conditioning and coordination abilities of students. The manual enables the physical education teacher to take an individual approach to each student, taking into account his or her age and physique.

The problem of testing human physical fitness has been developed in the theory and methodology of physical education, sports metrology, anthropomotorics, biomechanics, sports medicine and other sciences. Over approximately 130-140 years of the history of this problem, a huge and varied body of material has accumulated, which has always aroused, and continues to arouse, great interest not only among scientists but also among physical education teachers, coaches, students and their parents.

The first article on this subject is introductory. It presents the basics of the theory of tests and testing, without which it is difficult for a teacher to use tests in practice. Here are some of the questions that arise. What is a "test"? How are tests classified? Is it necessary to test students' physical fitness, and why? How can the level (high, medium, low) of development of physical qualities and preparedness be determined? What counts as the norm in testing, and how is it set? If a teacher has devised a new motor test or battery of tests to determine children's physical fitness, what should he or she pay attention to, and what conditions (requirements, criteria) must be met? Testing students' physical condition requires the teacher to be familiar with elementary methods of mathematical statistics. Which ones?

In our articles we will also present historical information about the emergence of tests and of the theory of testing human physical fitness: for example, when and where the first tests appeared, including test batteries for assessing physical fitness. What are the most common tests for determining the conditioning (strength, speed, endurance, flexibility) and coordination abilities of school-age children? Which test batteries (programs) for assessing the physical fitness of children and adolescents are the most popular in different countries? We will also discuss an important practical problem: the relationship between test results and grades in the subject “Physical Education”. More specifically, if a student consistently performs at a high level on tests, does that automatically mean an excellent grade in our subject? And so on.

In this article we will discuss: 1) testing tasks; 2) the concept of “test” and the classification of motor tests; 3) quality criteria for motor tests; 4) organization of testing of the physical fitness of school-age children.

1. Testing tasks. Testing human motor abilities is one of the most important areas of activity of scientists and teachers in the field of physical education and sports. It helps solve a number of complex pedagogical problems in identifying the levels of development of conditioning and coordination abilities, assessing the quality of technical and tactical readiness. Based on the test results, it is possible to compare the preparedness of both individual students and entire groups of students living in different regions and countries; carry out appropriate selection for practicing one or another sport, for participation in competitions; carry out fairly objective control over the education (training) of schoolchildren and young athletes; identify the advantages and disadvantages of the means used, teaching methods and forms of organizing classes; finally, to substantiate the norms (age-specific, individual) for the physical fitness of children and adolescents.

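Along with the tasks listed above, in the practice of different countries the testing tasks come down to the following:
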
a) teach schoolchildren themselves to determine the level of their physical fitness and plan the necessary sets of physical exercises for themselves;

b) encourage students to further improve their physical condition (shape);

c) know not so much the initial level of development of a motor ability as its change over a certain period;

d) reward students who have achieved high results not so much for the high level of physical fitness achieved as for achieving the planned increase in personal results.



Experts emphasize that the traditional approach to testing, in which students' results are compared with standardized test data and norms, causes a negative attitude among many students, especially those with low and average levels of physical fitness. Testing should increase schoolchildren's interest and bring them joy, not foster an inferiority complex. In this regard, we propose the following approaches:

1) the student’s test results are determined not based on comparison with standards, but on the basis of changes that have occurred over a certain period of time;

2) all components of the test are modified and simplified versions of the exercises are used (the tasks that make up the test must be easy enough for the likelihood of their successful completion to be high);

3) zero and negative scores are excluded; only positive results are counted.

So, when testing, it is important to bring together scientific (theoretical) tasks and personally significant, positive motives for the student to participate in this procedure.

2. The concept of “test” and classification of motor tests. The English word “test” means trial or examination. Tests are used to solve many scientific and practical problems. Among the ways of assessing a person's physical condition (observation, expert assessment), the test method (in our case, motor tests) is the main one used in sports metrology and in other scientific disciplines, such as the science of movements and the theory and methodology of physical education.

A test is a measurement or trial carried out to determine a person's ability or condition. There can be many such measurements, based on a wide variety of physical exercises. However, not every physical exercise or trial can be considered a test. Only trials that meet special requirements qualify, namely:

a) the purpose of using the test (or tests) has been defined;

b) a standardized methodology for measuring test results and a testing procedure have been developed;

c) the reliability and informativeness of the tests have been determined;

d) the ability to present test results in the appropriate assessment system has been implemented.

The system of using tests in connection with a given task, organizing the conditions, having the subjects perform the tests, and evaluating and analyzing the results is called testing. The numerical value obtained during measurement is the test result.

For example, the standing long jump is a test; procedure for performing jumps and measuring results - testing; jump length - test result.

The tests used in physical education are based on motor actions (physical exercises, motor tasks). Such tests are called motor tests.

Currently, there is no unified classification of motor tests. A classification of tests by their structure and predominant indications is well known (see Table 1).

Single and complex tests are distinguished. A single test serves to measure and evaluate one trait (a coordination or conditioning ability). Since the structure of each coordination or conditioning ability is complex, such a test usually evaluates only one component of that ability (for example, balance, speed of simple reaction, or arm muscle strength).

A practice (learning) test evaluates the ability for motor learning, based on the difference between the final and initial scores over a certain period of training in movement technique.
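A minimal sketch of how a practice-test score could be computed from the initial and final scores, as described above. The function name and data are hypothetical; the only assumption beyond the text is that, for timed tasks, a decrease in time counts as a gain:

```python
def learning_gains(initial, final, lower_is_better=False):
    """Practice (learning) test score per subject: final minus initial score.
    For timed tasks (lower_is_better=True) the sign is reversed so a faster time is a gain."""
    if len(initial) != len(final):
        raise ValueError("initial and final scores must cover the same subjects")
    if lower_is_better:
        return [before - after for before, after in zip(initial, final)]
    return [after - before for before, after in zip(initial, final)]


# Hypothetical technique scores (points) before and after a block of lessons
before = [12, 15, 9, 14]
after = [16, 17, 13, 15]
gains = learning_gains(before, after)
print(gains)
```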

A test series makes it possible to use the same test over a long period, even when the measured ability improves significantly, because the test tasks become progressively more difficult. Unfortunately, this type of single test is not yet widely used either in research or in practice.

A complex test evaluates several traits or components of different abilities or of the same ability (for example, the standing vertical jump with an arm swing, without an arm swing, and to a given height). Such a test provides information about the level of speed-strength abilities (from the height of the jump) and of coordination abilities (from the accuracy of differentiating muscular effort and from the difference in jump height with and without an arm swing).
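A sketch of how the three jump variants could be reduced to the indicators named above. The dataclass, its field names, and the choice of the higher of the two free jumps as the speed-strength indicator are hypothetical; the text itself only links jump height to speed-strength abilities, and effort accuracy and the arm-swing difference to coordination abilities:

```python
from dataclasses import dataclass


@dataclass
class JumpResult:
    with_swing_cm: float     # standing vertical jump with an arm swing
    without_swing_cm: float  # standing vertical jump without an arm swing
    target_cm: float         # prescribed height in the "jump to a given height" task
    reached_cm: float        # height actually reached in that task


def jump_indicators(j: JumpResult) -> dict:
    return {
        # speed-strength: jump height (assumption: best of the two free jumps)
        "jump_height_cm": max(j.with_swing_cm, j.without_swing_cm),
        # coordination: contribution of the arm swing
        "arm_swing_difference_cm": j.with_swing_cm - j.without_swing_cm,
        # coordination: accuracy of differentiating effort (smaller is more accurate)
        "target_error_cm": abs(j.reached_cm - j.target_cm),
    }


print(jump_indicators(JumpResult(42, 35, 20, 23)))
```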

A test profile consists of several separate tests used to assess several different physical abilities (heterogeneous test profile) or several manifestations of the same physical ability (homogeneous test profile). Test results can be presented as a profile, which makes it possible to quickly compare individual and group results.

Table 1. Forms of tests and possibilities of their use (according to D.-D. Blume, 1987)

Type | Measured ability | Structural feature | Example
Single tests:
Elementary test containing one motor task | One ability or aspect (component) of an ability | One test task, one final score | Balance test, tremometer test, linking test, rhythm test, landing-accuracy jump
Practice test | One ability or aspect (component) of an ability | One or more test tasks, one final score (over a teaching period) | General learning test
Test series | One ability or aspect (component) of an ability | One test task with variants, or several tasks of increasing difficulty | Test for assessing the ability to link movements (coordination)
Complex tests:
Complex test containing one task | Several abilities or aspects (components) of one ability | One test task, several final scores | Jump test
Multiple-task test | Several abilities or aspects (components) of one ability | Several test tasks performed in sequence, several final scores | Multiple-task reaction test
Test profile | Several abilities or aspects of one ability | Several tests, several final scores | "Coordination star"
Test battery | Several abilities or aspects of one ability | Several tests, one final score | Test battery for assessing motor learning ability

A test battery also consists of several separate tests, but their results are combined into one final score, expressed on one of the rating scales (more on this in the second article). As with test profiles, homogeneous and heterogeneous batteries are distinguished.

A homogeneous battery, or homogeneous profile, is used to assess all components of a complex ability (e.g., reaction ability). In this case, the results of the individual tests must be closely interrelated (correlated).
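One way to check that the tests of a homogeneous battery are closely correlated is to compute every pairwise correlation. A sketch, assuming Pearson's r and hypothetical reaction-test data; the text gives no numeric threshold for "closely", so none is imposed here:

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(tests):
    """Pearson r for every pair of tests; each list holds the same subjects in the same order.
    All tests should be scored in the same direction, or the signs of r will be mixed."""
    return {
        (name_a, name_b): pearson_r(tests[name_a], tests[name_b])
        for name_a, name_b in combinations(tests, 2)
    }


# Hypothetical results of five subjects on three reaction tests (lower is better in all three)
battery = {
    "simple reaction, ms": [180, 200, 220, 240, 260],
    "choice reaction, ms": [300, 330, 350, 390, 410],
    "ruler drop, cm": [15, 18, 21, 26, 30],
}
correlations = pairwise_correlations(battery)
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```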

A heterogeneous test profile or a heterogeneous battery serves to assess a complex (set) of various motor abilities. For example, such test batteries are used to assess strength, speed and endurance abilities - these are batteries of physical fitness tests.

In multiple-task tests, subjects perform motor tasks in sequence and receive a separate score for each task. These scores may be closely related to each other, and appropriate statistical calculations can yield additional information about the abilities being assessed. An example is the sequence of jump test tasks (Table 2).

The definition of motor tests states that they assess motor abilities and, in part, motor skills. Therefore, in the most general terms, there are conditioning tests, coordination tests and tests for assessing motor skills (movement technique). This systematization is, however, still too general.

The classification of motor tests by their predominant indications follows from the systematization of physical (motor) abilities. Accordingly, there are conditioning tests (to assess strength: maximal strength, speed strength and strength endurance; to assess endurance; to assess speed abilities; to assess flexibility: active and passive) and coordination tests (to assess coordination abilities related to separate independent groups of motor actions, which measure special coordination abilities; and to assess specific coordination abilities: balance, spatial orientation, reaction, differentiation of movement parameters, rhythm, rearrangement of motor actions, coordination (connection), vestibular stability and voluntary muscle relaxation).

A large number of tests have been developed to assess motor skills in various sports. They are given in the relevant textbooks and manuals and are not discussed in this article.

Thus, each classification serves as a kind of guideline for selecting (or creating) the type of tests that best suits the testing objectives.

3. Quality criteria for motor tests. As noted above, a motor test serves its purpose only if it satisfies the relevant basic criteria (reliability, stability, equivalence, objectivity and information content) as well as the additional criteria of standardization, comparability and economy.

Tests that meet the requirements of reliability and information content are called sound, or authentic (trustworthy), tests.

The reliability of a test is the degree of accuracy with which it assesses a particular motor ability, regardless of who conducts the assessment. Reliability is the extent to which results agree when the same people are tested repeatedly under the same conditions; it is the stability of an individual's test result when a test exercise is repeated. In other words, across repeated testing (for example, of jump results, running time or throwing distance), each student in a group consistently keeps his or her rank position.

The reliability of a test is determined by correlation-statistical analysis, that is, by calculating a reliability coefficient. Several methods are used for this.

Test stability is assessed from the relationship between the first and second attempts, repeated after a certain time under the same conditions by the same experimenter. This method of repeated testing is called retest. Test stability depends on the type of test, the age and gender of the subjects, and the time interval between test and retest. For example, over short intervals, results of conditioning tests and morphological traits are more stable than results of coordination tests, and results of older students are more stable than those of younger ones. A retest is usually carried out no later than one week after the first test. At longer intervals (for example, after a month), the stability of even such tests as the 1000 m run or standing long jump becomes noticeably lower.

Test equivalence is the correlation of a test's results with the results of other tests of the same type. For example, the equivalence criterion is used when one must choose which test best reflects speed abilities: the 30, 50, 60 or 100 m run.

How equivalent (homogeneous) tests are treated depends on the purpose. If the reliability of assessments or research conclusions needs to be increased, it is advisable to use two or more equivalent tests. If the task is to create a battery with a minimum number of tests, only one of the equivalent tests should be used.


Table 2 Sequentially performed tasks of the jump test (according to D.-D. Blume, 1987)

No. | Test task | Result evaluation | Ability
1 | Jump to maximum height without arm swing | Height, cm | Jumping strength
2 | Jump to maximum height with arm swing | Height, cm | Jumping strength and connection ability
3 | Jump to maximum height with arm swing and a hop | Height, cm | Connection ability and jumping strength
4 | 10 jumps with arm swing to a height equal to 2/3 of the maximum jump height obtained in task 2 | Sum of deviations from the set mark | Ability to differentiate force parameters of movements
5 | Difference between the results of tasks 1 and 2 | ... cm | Ability to connect (communication)
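The two derived scores of the jump test (the ten target jumps and the connection score) are simple arithmetic. A minimal sketch, assuming the target jumps are scored as the sum of absolute deviations from 2/3 of the maximum height with arm swing, and the connection score as the height with arm swing minus the height without it. The function name and the sample heights are hypothetical.

```python
def jump_test_scores(h_without_swing, h_with_swing, target_jumps):
    """Derived scores of the sequential jump test (hypothetical helper).

    h_without_swing -- maximum height without arm swing, cm
    h_with_swing    -- maximum height with arm swing, cm
    target_jumps    -- heights (cm) of the 10 jumps aimed at 2/3 of h_with_swing
    """
    if len(target_jumps) != 10:
        raise ValueError("the target task consists of 10 jumps")
    target = h_with_swing * 2 / 3
    # Assumption: deviations are taken as absolute values, so over- and
    # under-shooting count equally; a smaller sum means finer force control.
    deviations = sum(abs(h - target) for h in target_jumps)
    return {
        "jumping_strength_cm": h_without_swing,
        "target_height_cm": target,
        "sum_of_deviations_cm": deviations,
        "connection_gain_cm": h_with_swing - h_without_swing,
    }


scores = jump_test_scores(38, 45, [28, 31, 33, 29, 30, 27, 32, 30, 31, 29])
for name, value in scores.items():
    print(name, value)
```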

Such a battery, as noted, is heterogeneous, since the tests included in it measure different motor abilities. An example of a heterogeneous test battery is the 30 m run, pull-ups, bending forward, and 1000 m run. Other examples of such complexes will be presented in a separate publication.

The reliability of a test is also determined by comparing the average results of the odd and even attempts that make up the test. For example, the average accuracy of ball throws at a target in attempts 1, 3, 5, 7 and 9 is compared with the average accuracy in attempts 2, 4, 6, 8 and 10. This method of assessing reliability is called the split-half (doubling, or splitting) method; it is used mainly when assessing coordination abilities and when the test result is formed from at least six attempts.

The objectivity (consistency) of a test is the degree of agreement between results obtained on the same subjects by different experimenters (teachers, judges, experts). To ensure it, the general testing conditions should be standardized:

a) testing time, place, weather conditions;

b) unified material and hardware support;

c) psychophysiological factors (volume and intensity of load, motivation);

d) presentation of information (precise verbal statement of the test task, explanation and demonstration).

Compliance with these conditions ensures so-called test objectivity. One also speaks of interpretive objectivity: the degree to which different experimenters interpret test results independently of one another.

In general, as experts note, test reliability can be increased in various ways: stricter standardization of testing (see above), more attempts, better motivation of the subjects, more evaluators (judges, experts) and greater agreement among them, and more equivalent tests.

There are no fixed values for test reliability coefficients. In most cases, the following guidelines are used: 0.95-0.99, excellent reliability; 0.90-0.94, good; 0.80-0.89, acceptable; 0.70-0.79, poor; 0.60-0.69, doubtful for individual assessments (the test is suitable only for characterizing a group of subjects).

The information content of a test is the degree of accuracy with which it measures the motor ability or skill being assessed. In foreign and domestic literature, the term "validity" is used instead of "information content". In assessing information content, the researcher answers two questions: what does this particular test (or battery of tests) measure, and how accurately does it measure it?
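The verbal reliability scale quoted above translates directly into a lookup. This is a sketch under two assumptions: each band's lower bound is used as its threshold (so a value falling between the printed bands, such as 0.945, receives the lower grade), and anything below 0.60, for which the text gives no grade, is reported as outside the scale.

```python
def rate_reliability(r):
    """Verbal grade of a reliability coefficient, following the guideline in the text."""
    if r >= 0.95:
        return "excellent"
    if r >= 0.90:
        return "good"
    if r >= 0.80:
        return "acceptable"
    if r >= 0.70:
        return "poor"
    if r >= 0.60:
        return "doubtful: suitable only for characterizing a group"
    return "below the quoted scale"  # assumption: the text gives no grade under 0.60


for r in (0.97, 0.92, 0.85, 0.74, 0.65, 0.41):
    print(r, rate_reliability(r))
```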

Validity may be logical (content), empirical (based on experimental data) or predictive. More detailed information on this topic can be found in the now-classic textbooks for students of physical education universities (Sports Metrology / ed. V. M. Zatsiorsky. - M.: FiS, 1982. - P. 73-80; Godik M. A. Sports Metrology. - M.: FiS, 1988) and in a number of modern manuals.

Important additional test criteria, as noted, are standardization, comparability and economy.

The essence of standardization (norm-setting) is that test results make it possible to create norms, which are of particular importance for practice (this will be discussed in a separate article).

The comparability of a test is the possibility of comparing results obtained from one test or from several forms of parallel (homogeneous) tests. In practical terms, using comparable motor tests reduces the risk that regular use of the same test ends up assessing the degree of skill in performing that test rather than the level of the ability. At the same time, comparable test results increase the reliability of the conclusions.

The essence of economy as a test quality criterion is that conducting the test requires little time, modest material costs and few assistants. For example, the battery of six tests for determining physical fitness recommended in the “Comprehensive program of physical education for students in grades I-XI” (M.: Prosveshcheniye, 2005-2006) can be administered by a teacher with two assistants to 25-30 children in one lesson.

Organization of testing of physical fitness of school-age children. The second important problem of testing motor abilities (recall that the first - the selection of informative tests - was discussed earlier) is the organization of their use.

The physical education teacher must determine when it is best to organize testing, how to carry it out in the classroom, and how often testing should be carried out.

Testing dates are set in accordance with the school curriculum, which provides for mandatory testing of students' physical fitness twice a year. It is advisable to conduct the first testing in the second or third week of September (once the educational process has settled into its normal routine), and the second two weeks before the end of the school year (later dates may cause organizational difficulties because of upcoming exams and holidays).

Knowing the annual changes in schoolchildren's motor abilities allows the teacher to make appropriate adjustments to physical education for the next school year. However, the teacher can and should test more often, exercising so-called operational control. This is advisable, for example, to determine changes in speed, strength abilities and endurance under the influence of athletics lessons during the first quarter. Likewise, the teacher can use tests of coordination abilities at the beginning and at the end of mastering a section of the school curriculum (for example, sports games) to identify changes in these abilities.

It should be taken into account that the variety of pedagogical problems being solved does not make it possible to provide the teacher with a unified testing methodology, the same rules for conducting tests and evaluating test results. This requires experimenters (teachers) to demonstrate independence in solving theoretical, methodological and organizational testing issues.

Testing in class must be linked to the lesson content. In other words, the tests used, provided they meet the requirements for a research method, should be organically included in the planned physical exercises. If, for example, schoolchildren's speed abilities or endurance are to be assessed, the necessary tests should be scheduled in the part of the lesson devoted to developing the corresponding physical abilities.

Test frequency is largely determined by the pace of development of specific physical abilities, age, gender and individual characteristics of their development.

For example, a significant increase in speed, endurance or strength requires several months of regular exercise (training), whereas a significant increase in flexibility or individual coordination abilities takes only 4-12 workouts. From a low starting level, a physical quality can be improved in a shorter time; once it has reached a high level in a student, further improvement takes longer. The teacher should therefore study in more depth how various motor abilities develop and improve in children of different ages and genders.

When assessing the general physical fitness of students, as noted, you can use a wide variety of test batteries, the choice of which depends on the specific testing objectives and the availability of necessary conditions. However, due to the fact that the test results obtained can only be assessed by comparison, it is advisable to choose tests that are widely represented in the theory and practice of physical education of children. For example, rely on those recommended in the “Comprehensive program of physical education for students in grades I-XI of a comprehensive school” (M.: Prosveshcheniye, 2004-2006).

To compare the overall physical fitness of a student or group of students on a set of tests, the test results are converted into points or scores (we will discuss this in more detail in the next article). The change in the total number of points across repeated testing shows the progress of both an individual child and a group of children.
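Because the conversion method is deferred to the next article, the following is only an illustrative sketch. It assumes a hypothetical linear scale per test, fixed by a result worth 0 points and a result worth 100 points and clipped to 0-100; the reference values and the pupil's results are invented and are not the program's norms.

```python
# Hypothetical reference values per test: (result worth 0 points, result worth 100 points).
# For timed tests the 100-point value is smaller, so the same formula works both ways.
SCALES = {
    "30 m run, s": (7.0, 4.5),
    "standing long jump, cm": (120, 220),
    "pull-ups, reps": (0, 15),
}


def points(test, result):
    """Linear conversion of one result into 0-100 points (assumed scale)."""
    zero, hundred = SCALES[test]
    p = 100 * (result - zero) / (hundred - zero)
    return max(0.0, min(100.0, p))  # clip to the 0-100 range


def total_points(results):
    """Sum of points over all tests in the set."""
    return sum(points(test, value) for test, value in results.items())


autumn = {"30 m run, s": 5.5, "standing long jump, cm": 170, "pull-ups, reps": 6}
spring = {"30 m run, s": 5.3, "standing long jump, cm": 178, "pull-ups, reps": 8}
change = total_points(spring) - total_points(autumn)
print(f"autumn {total_points(autumn):.1f}, spring {total_points(spring):.1f}, change {change:+.1f}")
```

A positive change in the total indicates progress between the two testing sessions.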

Physical education at school, 2007, No. 6


Introduction

Relevance. The problem of testing a person's physical fitness is one of the most developed in the theory and methodology of physical education. Over recent decades, a large and varied body of material has accumulated on: the tasks of testing; the dependence of test results on various factors; the development of tests to assess individual conditioning and coordination abilities; and test programs characterizing the physical fitness of children and adolescents aged 11 to 15 adopted in the Russian Federation, in other CIS countries and in many foreign countries.

Testing the motor qualities of schoolchildren is one of the most important and basic methods of pedagogical control.

It helps solve a number of complex pedagogical problems: identify levels of development of conditioning and coordination abilities, assess the quality of technical and tactical readiness. Based on the test results you can:

compare the preparedness of both individual students and entire groups living in different regions and countries;

conduct sports selection for practicing one or another sport, for participation in competitions;

exercise largely objective control over the education (training) of schoolchildren and young athletes;

identify the advantages and disadvantages of the means used, teaching methods and forms of organizing classes;

finally, substantiate norms (age-specific, individual) for the physical fitness of children and adolescents.

Alongside these scientific tasks, in practice the tasks of testing in different countries come down to the following:

teach schoolchildren themselves to determine the level of their physical fitness and plan the necessary sets of physical exercises for themselves;

encourage students to further improve their physical condition (shape);

track not so much the initial level of development of a motor ability as its change over a certain period;

reward students not so much for a high level of results as for a planned improvement in their personal results.

In this work, we rely on the tests recommended in the “Comprehensive program of physical education for students in grades 1-11 of a comprehensive school” prepared by V. I. Lyakh and G. B. Meikson.

Purpose of the study: to substantiate the methodology for testing the physical qualities of primary school students.

Research hypothesis: the use of testing is an accurate, informative method for determining the development of physical qualities.

Object of study: testing as a method of pedagogical control.

Subject of research: testing the qualities of students.


Chapter 1. VIEWS ON THE THEORY OF PHYSICAL FITNESS TESTS

1.1 Brief historical information about the theory of testing motor abilities

People have long been interested in measuring human motor achievements. The first information about measuring the distance of long jumps dates back to 664 BC: at the 29th ancient Olympic Games at Olympia, Chionis of Sparta jumped 52 feet, approximately 16.66 m. Evidently, this was a multiple jump.

It is known that one of the founders of physical education, J. C. F. GutsMuths (1759-1839), measured the motor achievements of his students, kept accurate records of their results and rewarded improvements with “prizes”, oak wreaths (G. Sorm, 1977). In the 1830s, Eiselen, a colleague of the famous German educator F. L. Jahn, compiled a table for assessing jumping achievements based on the measurements taken. It contains three gradations (Table 1).

Table 1. - Results in jumps (in cm) for men (source: K. Mekota, P. Blahus, 1983)

[Table data not preserved; surviving column labels: elementary; vault over the buck]

Note that as early as the mid-19th century in Germany, it was recommended to take body parameters into account when determining the length or height of a jump.

Sporting achievements, including records, have been measured accurately since the mid-19th century, and regularly since 1896, the first Olympic Games of modern times.

Attempts to measure strength abilities also go back a long way. The first interesting information on this dates back to 1741, when simple instruments were used to measure the strength of the wrestler Thomas Topham, who lifted a weight of more than 830 kg (G. Sorm, 1977). GutsMuths and Jahn already measured the strength of their students with simple strength meters. However, the first dynamometer, the progenitor of the modern one, was designed by Reiniger in France in 1807; F. Amoros used it in 1821 in the physical education of gymnasium students in Paris. In the 19th century, strength was also measured by raising the body while hanging on a bar, bending and straightening the arms in support, and lifting weights.

The forerunners of modern batteries of tests to determine physical fitness are sports and gymnastic all-around events. The first is the ancient pentathlon, introduced into practice at the XVIII Olympic Games of antiquity in 708 BC. e. It included discus throwing, javelin throwing, jumping, running and wrestling. The decathlon as we know it was first included in the competition program at the III Olympic Games (St. Louis, USA, 1904), and the modern pentathlon at the V Olympic Games (Stockholm, Sweden, 1912). The composition of exercises in these competitions is heterogeneous; the athlete needs to demonstrate preparedness in different disciplines. So, he must be physically versatile.

Probably, taking this idea into account, around the same time (beginning of the 20th century), sets of exercises were introduced into practice for children, youth and adults, which comprehensively determined a person’s physical fitness. For the first time, such complex tests were introduced in Sweden (1906), then in Germany (1913) and even later - in Austria and the USSR (Russia) - the “Ready for Labor and Defense” complex (1931).

The predecessors of modern motor tests arose in the late 19th and early 20th centuries. In particular, D. A. Sargent introduced a “strength test” at Harvard University which, in addition to dynamometry and spirometry, included push-ups and raising and lowering the body. From 1890 this test was used at 15 US universities. The Frenchman G. Hebert created a test, published in 1911, that included 12 motor tasks: running over different distances, standing and running jumps, throwing, repeated lifting of a 40-kilogram weight, swimming and diving.

Let us briefly review sources describing the scientific research of physicians and psychologists. Until the end of the 19th century, physicians' research most often focused on changes in external morphological data and on identifying asymmetry. Anthropometry used for these purposes went hand in hand with dynamometry. Thus, the Belgian doctor A. Quetelet, having conducted extensive research, published a work in 1838 according to which the average back strength of 25-year-old women and men is 53 and 82 kg, respectively. In 1884, the Italian A. Mosso studied muscle endurance using an ergograph, which allowed him to observe the development of fatigue during repeated flexion of a finger.

Modern ergometry dates back to 1707, when a device was created that made it possible to count the pulse per minute. The prototype of today's ergometer was designed by G. A. Hirn in 1858. Cycle ergometers and treadmills were created later, in 1889-1913.

Systematic research by psychologists began at the end of the 19th and the beginning of the 20th centuries. Reaction time was studied, and tests were developed to determine motor coordination and rhythm. The concept of “reaction time” was introduced into science by the Austrian physiologist S. Exner in 1873. Students of W. Wundt, the founder of experimental psychology, carried out extensive measurements of simple and complex reactions in the laboratory he established in Leipzig in 1879. The first tests of motor coordination included tapping and various kinds of aiming. One of the first studies of aiming was the test proposed by H. Frenkel in 1900, in which the subject held the index finger in various holes, rings, etc. This is the prototype of modern tests of static and dynamic tremor.

Seeking to determine musical talent, C. E. Seashore examined the sense of rhythm in 1915.

The theory of testing itself, however, dates back to the end of the 19th and the beginning of the 20th centuries, when the foundations of mathematical statistics, without which modern test theory is impossible, were laid. Undoubted credit here belongs to the geneticist and anthropologist F. Galton, the mathematicians K. Pearson and G. U. Yule, and the mathematician-psychologist C. Spearman. These scientists created a new branch of biology, biometrics, based on measurement and statistical methods such as correlation and regression. Factor analysis, a complex mathematical-statistical method created by Pearson (1901) and Spearman (1904), allowed the English scientist C. Burt to apply it in 1925 to the results of motor tests of London schoolchildren. As a result, physical abilities such as strength, speed, agility and endurance were identified, and a factor called “general physical fitness” also emerged. Somewhat later, one of the best-known works of the American scientist C. H. McCloy, “Measurement of General Motor Abilities” (1934), was published. By the early 1940s, scientists had concluded that human motor abilities have a complex structure. Using various motor tests together with mathematical models developed in parallel (univariate and multivariate analysis), testing theory firmly established the concept of five motor abilities: strength, speed, coordination, endurance and flexibility.

In the former USSR, motor tests were used to develop the control standards of the “Ready for Labor and Defense” complex (1931). A well-known test of motor abilities (mainly coordination of movements) for children and youth was proposed by N. I. Ozeretsky (1923). Work on measuring the motor abilities of children and youth appeared around the same time in Germany, Poland, Czechoslovakia and other countries.

Significant advances in the theory of testing human physical fitness occurred in the late 1950s and the 1960s. The founder of this theory is most likely the American McCloy who, together with M. D. Young, published in 1954 the monograph “Tests and Measurements in Health and Physical Education”, on which many authors of similar works later relied.

The book “The Structure and Measurement of Physical Fitness” (1964) by the well-known American researcher E. A. Fleishman was, and remains, of great theoretical importance. It not only covers theoretical and methodological issues of testing these abilities, but also presents specific results, alternative approaches, studies of the reliability and information content (validity) of tests, and important factual material on the factor structure of motor tests of various motor abilities.

V. M. Zatsiorsky's books “Physical Qualities of an Athlete” (1966) and “Cybernetics, Mathematics, Sports” (1969) are of great importance for the theory of testing physical abilities.

Brief historical information on physical fitness testing in the former USSR can be found in the publications of E.Ya. Bondarevsky, V.V. Kudryavtsev, Yu.I. Sbrueva, V.G. Panaeva, B.G. Fadeeva, P.A. Vinogradova and others.

Conventionally, three stages of testing in the USSR (Russia) can be distinguished:

Stage 1 - 1920-1940 - a period of mass examinations in order to study the main indicators of physical development and the level of motor readiness, the emergence on this basis of the standards of the “Ready for Labor and Defense” complex.

Stage 2 - 1946-1960 - study of motor readiness depending on morphofunctional characteristics in order to create the prerequisites for a scientific and theoretical substantiation of their relationship.

Stage 3 - from 1961 to the present - a period of comprehensive studies of the physical condition of the population depending on the climatic and geographical characteristics of the country's regions.

Research carried out during this period shows that the indicators of physical development and motor fitness of people living in different regions of the country are determined by the influence of biological, climatic-geographical, socio-economic and other both constant and variable factors. According to the developed unified comprehensive program, consisting of four sections (physical fitness, physical development, functional state of the main body systems, sociological information), in 1981 a comprehensive survey of the physical condition of the population of different ages and genders in various regions of the USSR was carried out.

Somewhat later, domestic experts noted that human physical development and fitness had been studied for more than 100 years. However, despite the relatively large number of works in this area, a deep and comprehensive analysis of the data was not possible, because the studies were carried out with different contingents, in different seasons, and with different methods, testing programs and procedures for mathematical-statistical processing of the information obtained.

In this regard, the main emphasis was placed on developing a methodology and organizing a unified data collection system, taking into account metrological and methodological requirements and creating a data bank on a computer.

In the mid-1980s, a large-scale all-Union survey of about 200,000 people aged 6 to 60 was conducted, which confirmed the conclusions of the previous study.

From the very beginning of the emergence of scientific approaches to testing human physical fitness, researchers have sought to obtain answers to two main questions:

what tests should be selected to assess the level of development of a specific motor (physical) ability and the level of physical fitness of children, adolescents and adults;

how many tests are needed to obtain minimal yet sufficient information about a person's physical condition?

There is still no worldwide consensus on these questions. At the same time, the test programs (batteries) characterizing the physical fitness of children and adolescents aged 6 to 17 that have been adopted in different countries are becoming increasingly similar.

1.2 The concept of “test” and classification of motor tests

The term test translated from English means “sample, test”.

Tests are used to solve many scientific and practical problems. Among other methods of assessing a person's physical condition (observation, expert assessment), the test method (in our case, motor testing) is the main method used in sports metrology and other scientific disciplines (the study of movements, theory and methodology of physical education).

A test is a measurement or trial carried out to determine a person's ability or condition. Many such measurements are possible, including ones based on a wide variety of physical exercises. However, not every physical exercise or trial can be considered a test. Only trials that meet special requirements can be used as tests:

the purpose of any test (or tests) must be defined;

a standardized measurement methodology and testing procedure must be developed;

it is necessary to determine the reliability and information content of the tests;

it must be possible to express test results in an appropriate evaluation system.

The system of using tests in accordance with the task, organization of conditions, performance of tests by subjects, evaluation and analysis of results is called testing, and the numerical value obtained during measurements is the result of testing (test). For example, the standing long jump is a test; jumping procedure and measurement of results - testing; jump length is the test result.
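The three notions defined above (test, testing, test result) can be kept apart in code. A minimal sketch with hypothetical class and function names; recording the best of several attempts is an assumption made only for illustration, since the text does not specify how attempts are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Test:
    """The test itself: a standardized motor task, e.g. the standing long jump."""
    name: str
    unit: str
    higher_is_better: bool


@dataclass
class TestResult:
    """The numerical value obtained by measurement: the test result."""
    test: Test
    subject: str
    value: float


def conduct_testing(test, subject, attempts):
    """Testing: performing the task and measuring it.
    Assumption for illustration: the best measured attempt is recorded."""
    best = max(attempts) if test.higher_is_better else min(attempts)
    return TestResult(test, subject, best)


long_jump = Test("standing long jump", "cm", higher_is_better=True)
result = conduct_testing(long_jump, "pupil 1", [182, 190, 187])
print(f"{result.test.name}: {result.value} {result.test.unit}")
```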

The tests used in physical education are based on motor actions (physical exercises, motor tasks). Such tests are called movement or motor tests.

Currently, there is no unified classification of motor tests. There is a known classification of tests according to their structure and their primary indications (Table 2).

As follows from the table, a distinction is made between single and complex tests. A single test is used to measure and evaluate one trait (coordination or conditioning ability). Since, as we see, the structure of each coordination or conditioning ability is complex, such a test, as a rule, evaluates only one component of such an ability (for example, the ability to balance, the speed of a simple reaction, the strength of the arm muscles).

Table 2. - Forms of tests and possibilities of their use (according to D.D. Blume, 1987)

Form of test | Measurable ability | Structural feature | Example
Single test
Elementary test containing one motor task | One ability or aspect (component) of an ability | One test task, one final test score | Balance test, tremometry, connection-ability test, rhythm test
Training test | One ability or aspect (component) of an ability | One or more test tasks, one final test score | General study test
Test series | One ability or aspect (component) of an ability | One test task with variants, or several tasks of increasing difficulty | Test for assessing the ability to connect (communication)
Complex test
Complex test containing one task | Multiple abilities or aspects (components) of one ability | One test task, multiple final scores | Jump test
Multiple-task test | Multiple abilities or aspects (components) of one ability | Multiple test tasks performed sequentially, multiple final scores | Multiple-reaction test
Test profile | Multiple abilities or aspects (components) of one ability | Multiple tests, multiple final scores | Coordination task
Test battery | Multiple abilities or aspects (components) of one ability | Multiple tests, one final score | Test battery for assessing movement learning ability


A training test assesses motor learning ability (from the difference between the final and initial scores over a certain period of training in movement technique).
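The training-test score reduces to a difference between two scores. A sketch, assuming a sign convention in which a positive value means improvement (so for timed results the order of subtraction is reversed); the function name, pupil names and ratings are hypothetical.

```python
def learning_gain(initial, final, higher_is_better=True):
    """Training-test score: final minus initial result, signed so that + = improvement."""
    return final - initial if higher_is_better else initial - final


# Hypothetical technique ratings (points) before and after a block of lessons.
ratings = {"pupil A": (4, 8), "pupil B": (6, 7), "pupil C": (3, 8)}
gains = {name: learning_gain(before, after) for name, (before, after) in ratings.items()}
for name in sorted(gains, key=gains.get, reverse=True):
    print(name, gains[name])
```

Ranking by gain rather than by final rating highlights pupils who learned the most during the period, in line with the idea of assessing learning ability.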

A test series makes it possible to use the same test over a long period of time, when the ability to be measured improves significantly. At the same time, the test tasks consistently increase in difficulty. Unfortunately, this type of test is not yet sufficiently used both in science and in practice.

A complex test assesses several features or components of different abilities or of the same ability. One example is the standing vertical jump, performed with an arm swing, without an arm swing, and to a given height. This test provides information about speed-strength abilities (from the height of the jump). It also provides information about coordination abilities, from the accuracy of differentiating force efforts and from the difference in jump height with and without an arm swing.

A test profile consists of individual tests that assess either several different physical abilities (heterogeneous test profile) or different manifestations of the same physical ability (homogeneous test profile). Test results can be presented in profile form, which allows individual and group results to be compared.
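One common way to place results from different tests on a single profile is to convert them to z-scores relative to the group. This is an assumption made for illustration, because the text does not name a specific scale. The sketch below uses hypothetical names and data. It flips the sign for tests where a smaller value is better, such as run times, so that a positive value always means "above the group average".

```python
from statistics import mean, pstdev


def z_profile(individual, group, lower_is_better=()):
    """Express one subject's results as z-scores relative to a group.

    individual      -- dict: test name -> the subject's result
    group           -- dict: test name -> list of results of the whole group
    lower_is_better -- names of tests where a smaller value is better (e.g. run times)
    A positive value always means "better than the group mean".
    """
    profile = {}
    for test, value in individual.items():
        m, s = mean(group[test]), pstdev(group[test])
        if s == 0:
            raise ValueError(f"no variation in group results for {test}")
        z = (value - m) / s
        profile[test] = -z if test in lower_is_better else z
    return profile


# Hypothetical group of three pupils and one pupil's results
group = {"run_30m_s": [5.0, 5.4, 5.8], "long_jump_cm": [170, 180, 190]}
pupil = {"run_30m_s": 5.0, "long_jump_cm": 190}
print(z_profile(pupil, group, lower_is_better={"run_30m_s"}))
```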

A test battery also consists of several individual tests. Their results are combined into one final score, expressed on one of the rating scales (see Chapter 2). As with test profiles, homogeneous and heterogeneous batteries are distinguished. A homogeneous battery, or homogeneous profile, is used to assess all components of a complex ability (e.g., reaction ability). In this case the results of the individual tests must be closely interrelated (must correlate).
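The text leaves the choice of rating scale to Chapter 2. Purely as an illustrative assumption, the sketch below converts each test to T-scores (T = 50 + 10z, higher = better) and averages them into one battery score per subject. The function name and data are hypothetical.

```python
from statistics import mean, pstdev


def battery_scores(results, lower_is_better=()):
    """Combine several tests into one battery score per subject.

    results         -- dict: test name -> list of raw results, subjects in the same order
    lower_is_better -- names of tests where a smaller value is better
    Each test is converted to T-scores (T = 50 + 10 z, higher = better);
    the battery score is the subject's mean T-score.
    """
    tests = list(results)
    n = len(results[tests[0]])
    t_scores = {}
    for test in tests:
        values = results[test]
        if len(values) != n:
            raise ValueError("every test needs one result per subject")
        m, s = mean(values), pstdev(values)
        sign = -1 if test in lower_is_better else 1
        t_scores[test] = [50 + 10 * sign * (v - m) / s for v in values]
    return [mean([t_scores[test][i] for test in tests]) for i in range(n)]


# Hypothetical heterogeneous battery for three pupils
scores = battery_scores(
    {"run_30m_s": [5.0, 5.4, 5.8], "pull_ups": [10, 6, 2]},
    lower_is_better={"run_30m_s"},
)
print(scores)
```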

In multi-task tests, subjects perform motor tasks in sequence and receive a separate score for each task. These scores may be closely related to each other. Appropriate statistical calculations can then extract additional information about the abilities being assessed. An example is the sequence of jump test tasks shown in Table 3.

Table 3. Sequentially solved jump test tasks

Test task | Result evaluation | Ability
1. Maximum vertical jump without arm swing | – | Jumping strength
2. Maximum vertical jump with arm swing | – | Jumping strength and connection ability
3. Maximum vertical jump with arm swing and a preliminary jump | – | Connection ability and jumping strength
4. 10 jumps with arm swing, as in task 2, to a height equal to 2/3 of the maximum jump height | Sum of deviations from the given mark | Ability to differentiate force parameters of movements
– | Difference between the results of solving one task and two tasks | Connection ability (coupling)

(according to D.D. Blume, 1987)
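The minimal Python sketch below shows how the jump-test scores in Table 3 might be computed for one subject. It assumes heights in centimetres and takes the given mark for the 10 submaximal jumps to be 2/3 of the maximum height reached in task 2. The arm-swing difference follows the coordination indicator mentioned for the complex jump test (the difference in jump height with and without an arm swing). The function name and all numbers are hypothetical.

```python
def jump_test_indicators(h_no_swing, h_swing, ten_jumps):
    """Indicators from the sequential jump test (heights in cm).

    h_no_swing -- best jump height without arm swing (task 1)
    h_swing    -- best jump height with arm swing (task 2)
    ten_jumps  -- heights of the 10 jumps aimed at 2/3 of the task-2 maximum (task 4)
    """
    if len(ten_jumps) != 10:
        raise ValueError("task 4 consists of 10 jumps")
    target = h_swing * 2 / 3  # the given mark for task 4
    return {
        "jumping_strength": h_no_swing,           # task 1 result
        "arm_swing_gain": h_swing - h_no_swing,   # with swing minus without swing
        "target_height": target,
        "deviation_sum": sum(abs(h - target) for h in ten_jumps),
    }


# Hypothetical results of one subject
ind = jump_test_indicators(40, 48, [30, 33, 31, 34, 32, 29, 32, 35, 31, 32])
print(ind)  # target_height 32.0, deviation_sum 13.0
```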

The definition of motor tests states that they assess motor abilities and partly motor skills. In the most general form, there are conditioning tests, coordination tests and tests for assessing motor abilities and skills (movement techniques). This systematization is, however, still too general. The classification of motor tests according to their primary indications follows from the systematization of physical (motor) abilities.

In this regard, there are:

1) conditioning tests:

to assess strength: maximal strength, speed strength, strength endurance;

to assess endurance;

to assess speed abilities;

to assess flexibility, active and passive;

2) coordination tests:

to assess coordination abilities related to separate independent groups of motor actions, which measure special coordination abilities;

to assess specific coordination abilities: the abilities for balance, spatial orientation, reaction, differentiation of movement parameters, rhythm, restructuring of motor actions, coordination (connection), vestibular stability and voluntary muscle relaxation.

The concept of “tests for assessing motor skills” is not discussed in this work. Examples of tests are given in Appendix 2.

Thus, each classification serves as a kind of guideline for selecting (or creating) the types of tests that best fit the testing tasks.

1.3 Quality criteria for motor tests

A motor test serves its purpose only when it satisfies the relevant requirements.

Tests that meet the requirements of reliability and information content are called good or authentic (reliable).

The reliability of a test is the degree of accuracy with which it assesses a specific motor ability, regardless of the requirements of the person conducting the assessment. Reliability is the extent to which results agree when the same people are tested repeatedly under the same conditions. It is the stability, or constancy, of an individual's test result when the control exercise is repeated. In other words, on repeated testing (for example, of jump performance, running time or throwing distance), each child in a group of subjects consistently keeps his or her place in the ranking.

The reliability of the test is determined using correlation-statistical analysis by calculating the reliability coefficient. In this case, various methods are used to judge the reliability of the test.

Test stability is based on the relationship between the first and second attempts, repeated after a certain time under the same conditions by the same experimenter. Determining reliability by repeated testing is called retesting. The stability of a test depends on the type of test, the age and gender of the subjects, and the time interval between test and retest. For example, over short time intervals, results of conditioning tests and morphological traits are more stable than results of coordination tests. Older children also have more stable results than younger ones. A retest is usually carried out no later than one week after the first test. At longer intervals (for example, after a month), the stability of even such tests as the 1000 m run or the standing long jump becomes noticeably lower.
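Retest stability is usually expressed as a correlation between the first and second testing. The Python sketch below computes a Pearson coefficient. It also checks whether each subject keeps his or her rank, which is the intuitive meaning of reliability given above. The data are hypothetical standing long jump results in cm, and the rank helper ignores ties for brevity.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank positions (1 = best = largest value); ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


# Hypothetical standing long jump results (cm) of five pupils, test and retest
test = [210, 195, 180, 172, 160]
retest = [206, 198, 178, 175, 158]
print(round(pearson_r(test, retest), 3))  # stability coefficient
print(ranks(test) == ranks(retest))       # did every pupil keep his or her rank?
```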

Test equivalence is the correlation of the test result with the results of other tests of the same type. It matters, for example, when one must choose which test better reflects speed abilities: the 30, 50, 60 or 100 m run.

The attitude towards equivalent (homogeneous) tests depends on many reasons. If it is necessary to increase the reliability of assessments or research conclusions, then it is advisable to use two or more equivalent tests. And if the task is to create a battery containing a minimum of tests, only one of the equivalent tests should be used. Such a battery, as noted, is heterogeneous, since the tests included in it measure different motor abilities. An example of a heterogeneous battery of tests is the 30 m run, pull-up, forward bend, and 1000 m run.

The reliability of tests is also determined by comparing the average scores of the odd and even attempts included in the test. For example, the average accuracy of shots at a target in attempts 1, 3, 5, 7 and 9 is compared with the average accuracy in attempts 2, 4, 6, 8 and 10. This method of assessing reliability is called the doubling, or splitting (split-half), method. It is used primarily when assessing coordination abilities, and only when the test result is formed from at least 6 attempts.
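Here is a minimal sketch of the odd/even split, assuming each subject's attempt scores are stored in order. The odd-attempt and even-attempt means are correlated across subjects. Because each half contains only half of the attempts, the half-test correlation is then stepped up with the Spearman-Brown formula, $r = 2r_{AB}/(1 + r_{AB})$. The text does not prescribe this correction for the odd/even method. It is the standard companion of split-half reliability and is included here as an assumption. The data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(attempts):
    """Odd/even split-half reliability with the Spearman-Brown correction.

    attempts -- one list per subject with the scores of all attempts in order
                (the text recommends at least 6 attempts).
    Returns (correlation of the two halves, corrected whole-test estimate).
    """
    odd = [sum(a[0::2]) / len(a[0::2]) for a in attempts]   # attempts 1, 3, 5, ...
    even = [sum(a[1::2]) / len(a[1::2]) for a in attempts]  # attempts 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, 2 * r_half / (1 + r_half)


# Hypothetical target-shooting points of four subjects over six attempts
data = [
    [8, 9, 7, 9, 8, 8],
    [5, 6, 6, 5, 5, 6],
    [3, 2, 4, 3, 3, 3],
    [9, 10, 10, 9, 9, 10],
]
print(split_half_reliability(data))
```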

The objectivity (consistency) of a test is understood as the degree of consistency of results obtained on the same subjects by different experimenters (teachers, judges, experts).

To increase the objectivity of testing, it is necessary to comply with standard test conditions:

testing time, location, weather conditions;

unified material and hardware support;

psychophysiological factors (volume and intensity of load, motivation);

presentation of information (precise verbal statement of the test task, explanation and demonstration).

This is the so-called objectivity of test administration. Experts also speak of interpretive objectivity, which concerns the degree of independence in the interpretation of test results by different experimenters.

In general, as experts note, the reliability of tests can be increased in various ways. These include stricter standardization of testing (see above), more attempts, better motivation of subjects, more evaluators (judges, experts), greater consistency of their opinions, and more equivalent tests.

There are no fixed values for test reliability indicators. In most cases the following guidelines are used: 0.95–0.99, excellent reliability; 0.90–0.94, good; 0.80–0.89, acceptable; 0.70–0.79, poor; 0.60–0.69, doubtful for individual assessments, so the test is suitable only for characterizing a group of subjects.
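The recommended ranges can be expressed as a small lookup function. In this sketch the thresholds are treated as lower bounds, so a value that falls between the listed ranges (for example 0.945) goes to the lower grade. That boundary handling is an assumption of the sketch, not part of the recommendations.

```python
def reliability_grade(r):
    """Verbal grade for a reliability coefficient, using the ranges listed above.

    Each range is treated as starting at its lower bound, so a value such as 0.945
    falls into the lower grade (an assumption of this sketch).
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("a reliability coefficient lies between 0 and 1")
    if r >= 0.95:
        return "excellent"
    if r >= 0.90:
        return "good"
    if r >= 0.80:
        return "acceptable"
    if r >= 0.70:
        return "poor"
    if r >= 0.60:
        return "doubtful: suitable only for characterizing a group"
    return "below the recommended ranges"


for r in (0.97, 0.85, 0.65):
    print(r, reliability_grade(r))
```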

The validity of a test is the degree of accuracy with which it measures the motor ability or skill being assessed. In foreign (and domestic) literature, instead of the word “informativeness”, the term “validity” is used (from the English validity - validity, reality, legality). In fact, when talking about information content, the researcher answers two questions: what does this particular test (battery of tests) measure and what is the degree of accuracy of the measurement?

There are several types of validity: logical (substantive), empirical (based on experimental data) and predictive (2).

Important additional test criteria are standardization, comparability and efficiency.

The essence of standardization is that, based on test results, it is possible to create standards that are of particular importance for practice.

Test comparability is the ability to compare results obtained with one or more forms of parallel (homogeneous) tests. In practical terms, comparable motor tests reduce a specific risk. When the same test is used regularly, what gets assessed may be not so much the level of ability as the degree of skill in performing that particular test. At the same time, comparable test results increase the reliability of the conclusions.

The essence of efficiency (economy) as a criterion of test quality is that conducting the test does not require much time, large material costs or the participation of many assistants.


Conclusion

The predecessors of modern motor tests arose in the late 19th and early 20th centuries. Since 1920, mass examinations have been carried out in our country to study the main indicators of physical development and the level of motor readiness. Based on this data, the standards of the “Ready for Labor and Defense” complex were developed.

Testing theory has firmly adopted the concept of five motor abilities: strength, speed, coordination, endurance and flexibility. A number of different test batteries have been developed to evaluate them.

Among the methods of assessing a person’s physical condition, the test method is the main one. There are single and complex tests. Also, in connection with the systematization of physical (motor) abilities, tests are classified into conditioning and coordination.

All tests must meet specific requirements. The main criteria include: reliability, stability, equivalence, objectivity, information content (validity). Additional criteria include: standardization, comparability and efficiency.

Therefore, when choosing tests, all these requirements must be met. To increase the reliability of tests, testing should be standardized more strictly and the number of attempts increased. The motivation of subjects should be improved, and the number of evaluators (judges, experts) and the consistency of their opinions increased. More equivalent tests should also be used.


Chapter 2. Objectives, methods and organization of the study

2.1 Research objectives:

1. To study information on testing theory from the literature;

2. Analyze the methodology for testing physical qualities;

3. Compare the indicators of motor readiness of students in grades 7a and 7b.

2.2 Research methods:

1. Analysis and synthesis of literary sources.

Carried out throughout the study. These tasks were addressed at the theoretical level by studying the literature on the theory and methodology of physical education and sports, the development of physical qualities, and sports metrology. Twenty literary sources were analyzed.

2. Verbal influence.

Students were instructed on the order in which the motor tests would be performed. A motivational talk was also held to encourage them to achieve their best results.

3. Testing physical qualities.

30 meter run (from a high start),

shuttle run 3 x 10 meters,

standing long jump,

6 minute run (m),

forward bend from a sitting position (cm),

pull-ups on the bar (girls on low).

4. Methods of mathematical statistics.

Used for the calculations underlying the comparative analysis of students in grades 7a and 7b.

2.3 Organization of the study

At the first stage, in April 2009, an analysis of scientific and methodological literature was carried out:

· studying the content of physical education programs for general education students

REPORT

student 137 gr. Ivanova I.

on testing the effectiveness of training methods
using methods of mathematical statistics

Sections of the report are drawn up according to the samples given in this manual at the end of each stage of the game. The completed reports are kept at the Department of Biomechanics until the consultation before the exam. Students who have not reported on the work done and have not submitted the notebook with the report to the teacher are not admitted to the sports metrology exam.


Stage I of the business game
Control and measurement in sports

Objectives:

1. Familiarize yourself with the theoretical foundations of control and measurement in sports and physical education.

2. Acquire skills in measuring speed performance indicators in athletes.

1. Control in physical education and sports

Physical education and sports training is not a spontaneous, but a controlled process. At each moment of time, a person is in a certain physical state, which is determined mainly by health (compliance of vital signs with the norm, the degree of resistance of the body to adverse sudden influences), physique and the state of physical functions.

It is advisable to manage a person’s physical condition by changing it in the right direction. This management is carried out by means of physical education and sports, which, in particular, include physical exercises.

It only seems that the teacher (or coach) controls the physical state, influencing the behavior of the athlete, i.e. offering certain physical exercises, as well as monitoring the correctness of their implementation and the results obtained. In reality, the athlete’s behavior is controlled not by the coach, but by the athlete himself. During sports training, the self-governing system (the human body) is influenced. Individual differences in the condition of athletes do not provide confidence that the same impact will cause the same response. Therefore, the question of feedback is relevant: information about the athlete’s condition received by the coach during control of the training process.

Control in physical education and sports is based on measuring indicators, selecting the most significant ones and their mathematical processing.

Management of the educational and training process includes three stages:

1) collection of information;

2) its analysis;

3) decision making (planning).

Information collection is usually carried out during comprehensive control, the objects of which are:

1) competitive activity;

2) training loads;

3) the athlete’s condition.



V.A. Zaporozhanov distinguishes three types of athlete's states, depending on the length of the interval required for the transition from one state to another.

1. Stage (persistent) state. It persists for a relatively long time, weeks or months. The comprehensive characteristic of an athlete's stage state, reflecting his ability to demonstrate sports achievements, is called preparedness. The state of optimal preparedness (the best for a given training cycle) is called sports form. Obviously, sports form cannot be achieved or lost within one or several days.

2. Current state. It changes under the influence of one or several training sessions. The consequences of participating in competitions, or of the training work performed in a session, often last for several days. In this case the athlete usually notes effects of both an unfavorable nature (for example, muscle pain) and a positive one (for example, a state of increased performance). Such changes are called the delayed training effect.

The current state of the athlete determines the nature of the next training sessions and the magnitude of the loads in them. A special case of the current state, characterized by readiness to perform a competitive exercise in the coming days with a result close to the maximum, is called current readiness.

3. Operational state. It changes under the influence of a single performance of a physical exercise and is temporary. Examples are the fatigue caused by running a distance once, or a temporary increase in performance after a warm-up. The athlete's operational state changes during the training session. It should be taken into account when planning rest intervals between sets and repeated runs, when deciding whether an additional warm-up is advisable, and so on. A special case of the operational state, characterized by immediate readiness to perform a competitive exercise with a result close to the maximum, is called operational readiness.

In accordance with the above classification, there are three main types of monitoring the athlete’s condition:

1) stage control. Its purpose is to assess the stage condition (readiness) of the athlete;

2) current control. Its main task is to determine everyday (current) fluctuations in the athlete’s condition;

3) operational control. Its purpose is a rapid assessment of the athlete’s condition at the moment.

A measurement or trial performed to determine the condition or ability of an athlete is called a test. The measurement or trial procedure is called testing.

Any test involves measurement, but not every measurement serves as a test. Only measurements that satisfy the following metrological requirements can be used as tests:

2) standardization;

3) the presence of a rating system;

4) reliability and information content (quality factor) of tests;

5) type of control (stage-by-stage, current or operational).

A test based on motor tasks is called motor. There are three groups of motor tests:

1. Control exercises, in which the athlete is tasked to show maximum results. The test result is a motor achievement. For example, the time it takes an athlete to run a distance of 100 m.

2. Standard functional tests, during which the task, the same for everyone, is dosed either according to the amount of work performed, or according to the magnitude of physiological changes. The test result is physiological or biochemical indicators during standard work or motor achievements with a standard amount of physiological changes. For example, the percentage increase in heart rate after 20 squats or the speed at which an athlete runs with a fixed heart rate of 160 beats per minute.
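The first example of a standard functional test reduces to simple arithmetic. Here is a minimal sketch with a hypothetical function name and hypothetical heart-rate values:

```python
def hr_increase_percent(hr_rest, hr_after):
    """Percentage increase in heart rate after a standard load (e.g. 20 squats)."""
    if hr_rest <= 0:
        raise ValueError("resting heart rate must be positive")
    return (hr_after - hr_rest) / hr_rest * 100


# Hypothetical subject: 72 beats/min at rest, 90 beats/min right after 20 squats
print(hr_increase_percent(72, 90))  # 25.0
```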

3. Maximum functional tests, during which the athlete must show maximum results. The test result is physiological or biochemical indicators at maximum work. For example, maximum oxygen consumption or maximum oxygen debt.

High quality testing requires knowledge of measurement theory.


Key issues: Test as a measurement tool. Basic testing theories. Functions, capabilities and limitations of testing. Application of tests in personnel assessment. Advantages and disadvantages of using tests. Forms and types of test tasks. Task construction technology. Assessment of test quality. Reliability and validity. Software for test development.




Test as a measurement tool

Basic concepts in testology: measurement, test, content and form of tasks, reliability and validity of measurement results. In addition, testology uses such concepts of statistical science as sample and general population, average indicators, variation, correlation, regression, etc.




A test task is a didactically and technologically effective unit of control material, a part of the test that meets the requirements of substantive purity of content (or one-dimensionality), substantive and logical correctness, correctness of form, and acceptability of the geometric image of the task.




The traditional test is a standardized method for diagnosing the level and structure of preparedness. In such a test, all subjects answer the same tasks, at the same time, under the same conditions and with the same rules for evaluating answers. To achieve a testing goal, countless tests can be created, and all of them can be relevant to achieving the goal.


Professionogram (from Latin professio, "specialty", and Greek gramma, "record") is a system of characteristics that describe a particular profession. It also includes a list of the norms and requirements that this profession or specialty imposes on an employee. In particular, a professionogram may include a list of psychological characteristics that representatives of specific professional groups must have.


Basic testing theories

The first scientific works on test theory appeared at the beginning of the twentieth century, at the intersection of psychology, sociology, pedagogy and other so-called behavioral sciences. Foreign psychologists call this science psychometrics (Psychometrika), and teachers call it pedagogical measurement (Educational measurement). Unclouded by ideology and politics, the interpretation of the name "testology" is simple and transparent: the science of tests.


The first stage is prehistory, from antiquity to the end of the 19th century, when pre-scientific forms of control of knowledge and abilities were widespread. The second, classical period lasted from the early 1920s to the end of the 1960s; classical test theory was created during this time. The third, technological period began in the 1970s. It is the period in which methods of adaptive testing and training were developed, together with methods for the effective construction of tests and test items for the parametric assessment of subjects on the measured latent trait.


Functions, capabilities and limitations of testing

The tests used in selection are designed to obtain a psychological portrait of the candidate and to assess his or her abilities, professional knowledge and skills. Tests allow candidates to be compared with each other or with standards, that is, with the ideal candidate. Tests are used to measure the qualities a person needs to perform a job effectively. Some tests are designed so that the employer administers the test and calculates the results. Others require the services of experienced consultants to ensure proper application.


The limitations of tests relate to their costly administration and to questions about their suitability for assessing human abilities. In addition, tests are more successful in predicting success in work made up of short-term professional tasks. They are less convenient when the tasks solved at work take several days or weeks.








2. The terminology used should be tailored to the specific target audience. It is also necessary to exclude redundant items and items that contain two or more questions, as they can confuse the respondent and make interpretation difficult.


3. To satisfy all these requirements, you should go through the entire question bank item by item and analyze what purpose each item serves. For example, if a test is being developed to measure the analytical skills of trainee accountants, it is worth considering what the term "analytical skills" means.




5. Once questions and scoring formats have been selected, they should be converted into a user-friendly format with clearly written instructions and example questions, so that candidates taking the test fully understand what is required of them.


6. Very often at this stage of development, more questions are included in the test than necessary. By some estimates, the draft contains three times as many as will remain in the final test or measurement system. The next step is to try out the draft test on a relatively broad sample of existing employees to ensure that all questions are easily understood.


7. Knowledge tests usually start with simple questions that become gradually more complex towards the end. When tests are intended to measure social attitudes and personal characteristics, it may be helpful to alternate negatively and positively worded items to avoid thoughtless responses.


8. The final step involves administering the test to a broadly representative sample to establish standards of performance, reliability, and validity before using it as a selection tool. In addition, it is necessary to determine the validity of the test to ensure that it does not discriminate against any subgroups of the population (eg, ethnic differences).


Assessing the quality of the test

For selection methods to be sufficiently effective, they must be reliable and valid. The reliability of a selection method is characterized by its freedom from systematic measurement errors, that is, by its consistency under different conditions.


In practice, reliability of judgments is achieved by comparing the results of two or more similar tests conducted on different days. Another way to increase reliability is to compare the results of several alternative selection methods (for example, a test and an interview). If the results are similar or the same, they can be considered correct.


Reliability means that the measurements taken will give the same result as the previous ones, that is, the assessment results are not influenced by third-party factors. Validity means that the method measures exactly what it is intended to measure. The maximum possible accuracy of information obtained by specially developed methods in scientific research is limited by technical factors and does not exceed 0.8.


In personnel selection practice, the reliability of various assessment methods has been observed to fall within the following ranges:
0.1–0.2: traditional interview;
0.2–0.3: references (recommendations);
0.3–0.5: professional tests;
0.5–0.6: structured interview, competency-based interview;
0.5–0.7: cognitive and personality tests;
0.6–0.7: competency-based approach (assessment center).


Validity refers to the degree of accuracy with which a result, method or criterion "predicts" the future performance of the person being tested. The validity of a method refers to the conclusions drawn from a particular procedure, not to the procedure itself. That is, a selection method may be reliable and still not suit a specific task, because it may not measure what is required in this case.


Software for test development

In domestic practice, various comprehensive programs with a "Psychodiagnostics" module are available. One example is the "1C: Salary and Personnel Management 8.0" program with its "Psychodiagnostics" module. It was developed jointly with a group of teachers from the Department of Personality Psychology and General Psychology of the Faculty of Psychology of Moscow State University named after M.V. Lomonosov, under the guidance of Doctor of Psychological Sciences, Prof. A. N. Gusev. Another example is a training simulator for developing personnel assessment systems and adapting test methods, used at the Faculty of Psychology of TSU. It was also developed on the "1C: Enterprise 8.2" platform, by Personnel Soft.



The first component, test theory, describes statistical models for processing diagnostic data. It includes models for analyzing answers to test items and models for calculating total test scores. Mullenberg (1980, 1990) called this "psychometrics." Classical test theory, modern test theory (item response theory, IRT) and item sampling models constitute the three most important types of test theory models. Psychodiagnostics is concerned with the first two.

Classical test theory. Most intelligence and personality tests have been developed on the basis of this theory. Its central concept is reliability, understood as the consistency of results across repeated assessments. Reference books usually present this concept very briefly and then give a detailed description of the apparatus of mathematical statistics. In this introductory chapter we give a condensed description of its basic meaning. In classical test theory, reliability refers to the repeatability of the results of several measurement procedures (mainly measurements with tests). The concept of reliability involves calculating the measurement error. A result obtained during testing can be represented as the sum of the true score and the measurement error:

$$X_i = T_i + E_i$$

where $X_i$ is the observed score, $T_i$ is the true score and $E_i$ is the measurement error.

The observed score is, as a rule, the number of correct answers to the test items. The true score can be thought of as a true value in the Platonic sense (Gulliksen, 1950). The concept of the expected score is also widespread: the score that would be obtained on average over a large number of repetitions of the measurement procedure (Lord & Novick, 1968). But the same assessment procedure cannot be repeated many times with the same person. Therefore, other ways of solving the problem must be sought (Witlman, 1988).

This concept makes certain assumptions about true scores and measurement errors. Errors are treated as an independent factor. This is quite a reasonable assumption, since random fluctuations in results do not produce covariances: $r_{EE'} = 0$.

It is also assumed that there is no correlation between true scores and measurement errors: $r_{TE} = 0$.


The total error is zero, because the arithmetic mean is taken as the true score: $\sum_i E_i = 0$.

These assumptions ultimately lead to the well-known definition of reliability. It can be written as the ratio of true-score variance to total variance or, equivalently, as 1 minus a ratio whose numerator is the error variance and whose denominator is the total variance:

$$r_{XX'} = \frac{S^2_T}{S^2_X} = 1 - \frac{S^2_E}{S^2_X}$$
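A small simulation can make the classical model concrete. It assumes normally distributed true scores and independent errors, with hypothetical parameters: true-score SD 10 and error SD 5. The theoretical reliability is then 100 / (100 + 25) = 0.8. Both the simulated ratio $S^2_T / S^2_X$ and the correlation between two independent measurements should come out close to that value.

```python
import random
from math import sqrt
from statistics import pvariance


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_reliability(n=20000, sd_true=10.0, sd_error=5.0, seed=1):
    """Simulate X = T + E for n subjects measured twice.

    Errors are drawn independently of the true scores and of each other,
    matching the assumptions r_TE = 0 and r_EE' = 0.
    """
    rng = random.Random(seed)
    true = [rng.gauss(50.0, sd_true) for _ in range(n)]
    x1 = [t + rng.gauss(0.0, sd_error) for t in true]  # first measurement
    x2 = [t + rng.gauss(0.0, sd_error) for t in true]  # repeated measurement
    theoretical = sd_true ** 2 / (sd_true ** 2 + sd_error ** 2)
    variance_ratio = pvariance(true) / pvariance(x1)   # S_T^2 / S_X^2
    retest_r = pearson_r(x1, x2)                       # correlation of the two measurements
    return theoretical, variance_ratio, retest_r


print(simulate_reliability())  # first value is 0.8 by construction
```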

From this definition of reliability it follows that the error variance $S^2_E$ equals the total variance multiplied by $(1 - r_{XX'})$, i.e. $S^2_E = S^2_X (1 - r_{XX'})$. Thus, the standard error of measurement is

$$S_E = S_X \sqrt{1 - r_{XX'}}$$
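In code, the standard error of measurement is a one-liner. The band helper additionally assumes normally distributed errors (z = 1.96 for roughly 95 %). That assumption is not part of the text, and the function names and numbers are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd_total, reliability):
    """S_E = S_X * sqrt(1 - r_XX')."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie in [0, 1]")
    return sd_total * sqrt(1 - reliability)


def score_band(observed, sd_total, reliability, z=1.96):
    """Approximate band around an observed score, assuming normally distributed errors."""
    sem = standard_error_of_measurement(sd_total, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical test with S_X = 10 points and r_XX' = 0.75
print(standard_error_of_measurement(10, 0.75))  # 5.0
print(score_band(60, 10, 0.75))
```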

After a theoretical justification of reliability and its derivatives, it is necessary to determine the reliability index of a particular test. There are practical procedures for assessing test reliability, such as using interchangeable forms (parallel tests), splitting items into two parts, retesting, and measuring internal consistency. Each reference book contains indices of consistency of test results:

$$r_{XX'} = r(x_1, x_2)$$

where $r_{XX'}$ is the stability coefficient and $x_1$ and $x_2$ are the results of the two measurements.

The concept of reliability of interchangeable (parallel) forms was introduced and developed by Gulliksen (1950). This procedure is quite labor-intensive, since it requires creating a parallel series of items:

$$r_{XX'} = r(x_1, x_2)$$

where $r_{XX'}$ is the equivalence coefficient and $x_1$ and $x_2$ are the scores on the two parallel tests.

The next procedure, splitting the test into two parts A and B, is easier to use. The scores obtained on the two parts are correlated, and the reliability of the test as a whole is estimated with the Spearman-Brown formula:

$$r_{XX'} = \frac{2 r_{AB}}{1 + r_{AB}}$$

where $r_{AB}$ is the correlation between the scores on A and B, the two parallel halves of the test.

The next method determines the internal consistency of the test items. It is based on the covariances of individual items: $S^2_g$ is the variance of a randomly selected item, and $S_{gh}$ is the covariance of two randomly selected items. The most commonly used internal-consistency coefficient is Cronbach's alpha. The KR-20 formula and λ-2 (lambda-2) are also used.
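For reference, here is a minimal sketch of two of these coefficients in their usual textbook form. Cronbach's alpha is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_g S^2_g}{S^2_X}\right)$, where k is the number of items, $S^2_g$ are the item variances and $S^2_X$ is the total-score variance. KR-20 replaces each item variance by $p_g q_g$ for items scored 0/1. With population variances, the two coincide on dichotomous data. λ-2 is not covered by this sketch, and the response matrix is hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects x items matrix (list of per-subject lists)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_var_sum = sum(pvariance(item) for item in zip(*scores))  # sum of S_g^2
    total_var = pvariance([sum(row) for row in scores])           # S_X^2
    return k / (k - 1) * (1 - item_var_sum / total_var)


def kr20(scores):
    """Kuder-Richardson formula 20 for items scored 0/1."""
    n, k = len(scores), len(scores[0])
    pq_sum = 0.0
    for item in zip(*scores):
        p = sum(item) / n          # proportion of correct answers on this item
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - pq_sum / total_var)


# Hypothetical answers of five subjects to four items (1 = correct)
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(cronbach_alpha(answers), kr20(answers))
```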

The classical concept of reliability defines measurement errors that arise both in testing and in observation. These errors have different sources: personal characteristics, the testing conditions, and the test items themselves. There are specific methods for calculating errors. We know that our observations may turn out to be erroneous and that our methodological tools are imperfect, just as people themselves are imperfect. (One is reminded of Shakespeare: “Untrustworthy are you, whose name is man.”) That classical test theory makes measurement errors explicit and explains them is an important strength.

Classical test theory has a number of significant features that can also be regarded as its disadvantages. Some of these features are noted in reference books, but their practical importance is rarely emphasized. Nor do these books point out that, from a theoretical or methodological point of view, the features should be regarded as shortcomings.

First. Classical test theory and the concept of reliability focus on total test scores, obtained by adding up the scores on the individual items. Thus, when working


Second. The reliability coefficient depends on the variance of the measured scores. It follows that, other things being equal, the reliability coefficient will be lower in a more homogeneous sample. There is no single coefficient of internal consistency for a set of test items; the coefficient is always “contextual”. Crocker and Algina (1986), for example, propose a special “homogeneous sample correction” formula designed for the highest and lowest scores obtained by test takers. The diagnostician must therefore know how scores vary in the sample at hand. Otherwise he will not be able to use the internal consistency coefficients reported in the manual for the test.
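A small simulation can illustrate why reliability is “contextual”. Under the classical model $X = T + E$, the retest correlation estimates $S^2(T)/(S^2(T) + S^2(E))$. Restricting the sample to people with similar true scores shrinks $S^2(T)$ but leaves the error variance unchanged. The distributions, the seed and the 45-55 band are arbitrary assumptions chosen for the sketch. Selecting on true scores is only possible in a simulation.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

rng = random.Random(42)
true_scores = [rng.gauss(50, 10) for _ in range(20_000)]   # S^2(T) = 100
test1 = [t + rng.gauss(0, 4) for t in true_scores]          # S^2(E) = 16
test2 = [t + rng.gauss(0, 4) for t in true_scores]

r_full = pearson(test1, test2)   # theory: 100 / (100 + 16), about 0.86

# A homogeneous subgroup: only people whose true score lies between 45 and 55.
narrow = [i for i, t in enumerate(true_scores) if 45 <= t <= 55]
r_narrow = pearson([test1[i] for i in narrow], [test2[i] for i in narrow])
print(f"heterogeneous sample: {r_full:.2f}, homogeneous subgroup: {r_narrow:.2f}")
```

The test and its error variance are identical in both groups. Only the spread of the people tested differs.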

Third. The phenomenon of regression toward the mean is a logical consequence of the classical concept of reliability. If a test's scores fluctuate (i.e., the test is not reliable enough), then when the procedure is repeated, subjects with low scores may score higher and, conversely, subjects with high scores may score lower. This artifact of the measurement procedure should not be mistaken for true change or for a manifestation of developmental processes. At the same time, it is not easy to tell the two apart, because the possibility of genuine developmental change can never be ruled out. To be completely sure, a comparison with a control group is necessary.
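The simulation below shows the artifact when nobody's true score changes between occasions. It assumes normally distributed true scores and errors and a reliability of 0.5 (all values hypothetical). Under these assumptions the expected retest score is $\mu + r_{XX'}(X_1 - \mu)$.

```python
import random
from statistics import mean

rng = random.Random(7)
n = 10_000
mu = 100.0
true_scores = [rng.gauss(mu, 10) for _ in range(n)]
# Two parallel measurements with error SD 10, so reliability = 100 / (100 + 100) = 0.5.
first = [t + rng.gauss(0, 10) for t in true_scores]
second = [t + rng.gauss(0, 10) for t in true_scores]   # nobody's true score changes

cutoff = sorted(first)[int(0.9 * n)]
top = [i for i in range(n) if first[i] >= cutoff]       # top 10% on the first occasion
m1 = mean(first[i] for i in top)
m2 = mean(second[i] for i in top)
predicted_m2 = mu + 0.5 * (m1 - mu)                     # E[X2 | X1] = mu + r (X1 - mu)
print(f"top group: first {m1:.1f}, second {m2:.1f}, predicted {predicted_m2:.1f}")
```

The top group's mean drops by roughly half its distance from 100, even though no one has changed. A control group retested under the same conditions would show the same drop.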

The fourth characteristic of tests developed in accordance with the principles of classical theory is the presence of normative data. Knowledge of test norms allows the researcher to interpret the test takers' results adequately; without norms, test scores are meaningless. Developing test norms is a fairly expensive undertaking, since the psychologist must obtain test results from a representative sample.
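The sketch below shows how a raw score acquires meaning only relative to a norm group. It uses two common conventions: T-scores (mean 50, standard deviation 10) and percentile ranks that count ties as half. The tiny normative sample is purely hypothetical; a real norm group must be large and representative.

```python
from bisect import bisect_left, bisect_right
from statistics import mean, pstdev

# Hypothetical normative sample of raw scores.
norm_sample = [10, 12, 13, 14, 15, 15, 16, 17, 18, 20]
NORM_MEAN = mean(norm_sample)
NORM_SD = pstdev(norm_sample)
SORTED_NORMS = sorted(norm_sample)

def z_score(raw: float) -> float:
    """Distance from the norm-group mean in standard deviations."""
    return (raw - NORM_MEAN) / NORM_SD

def t_score(raw: float) -> float:
    """T-score convention: mean 50, standard deviation 10."""
    return 50 + 10 * z_score(raw)

def percentile_rank(raw: float) -> float:
    """Percent of the norm group scoring below raw, counting ties as half."""
    below = bisect_left(SORTED_NORMS, raw)
    ties = bisect_right(SORTED_NORMS, raw) - below
    return 100 * (below + 0.5 * ties) / len(SORTED_NORMS)

for raw in (12, 15, 19):
    print(raw, round(t_score(raw), 1), percentile_rank(raw))
```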

2 Ya ter Laak

Speaking of the shortcomings of the classical concept of reliability, it is appropriate to cite Sijtsma (1992, pp. 123-125). He notes that the first and main assumption of classical test theory is that test scores are measured on an interval scale. However, no studies support this assumption; in essence, it is “measurement according to an arbitrarily established rule.” This puts classical test theory at a disadvantage compared with attitude measurement scales and, of course, compared with modern test theory. Many methods of data analysis (analysis of variance, regression analysis, correlation and factor analysis) rest on the assumption that an interval scale exists, yet this assumption has no solid basis. That the scale of true scores is a scale of values of psychological characteristics (for example, arithmetic ability, intelligence, neuroticism) can only be assumed.

The second remark is that test results are not absolute indicators of a psychological characteristic of the person tested; they should be regarded only as the results of a particular test. Two tests may purport to examine the same psychological characteristic (e.g., intelligence, verbal ability, extraversion), but this does not mean that the two tests are equivalent or have the same capabilities. Comparing the scores of two people tested with different tests is therefore incorrect. The same applies to one test taker who completes two different tests.

The third point concerns the assumption that the standard error of measurement is the same at every level of the ability being measured. This assumption has not been tested empirically. For example, there is no guarantee that a relatively simple arithmetic test measures a person with good mathematical ability precisely. On such a test, a person with low or average ability is also likely to obtain a high score.
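Item response theory makes this point explicit. The sketch below assumes the Rasch model, which is introduced in the next paragraphs. Under that model, test information at trait level $\theta$ is the sum of $P(1 - P)$ over items, and the standard error is $1/\sqrt{\text{information}}$. The “simple test” is a hypothetical set of easy items.

```python
import math

def rasch_p(theta: float, delta: float) -> float:
    """Rasch probability of success for trait level theta and item difficulty delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def standard_error(theta: float, deltas: list[float]) -> float:
    """SE(theta) = 1 / sqrt(test information); Rasch item information is P(1 - P)."""
    info = sum(rasch_p(theta, d) * (1 - rasch_p(theta, d)) for d in deltas)
    return 1.0 / math.sqrt(info)

# Hypothetical "relatively simple" arithmetic test: all item difficulties are low.
easy_test = [-2.0, -1.5, -1.0, -1.0, -0.5]
thetas = (-1.0, 0.0, 1.0, 2.0, 3.0)
ses = [standard_error(t, easy_test) for t in thetas]
for t, se in zip(thetas, ses):
    print(f"theta={t:+.1f}  SE={se:.2f}")
```

The error grows steadily for more able test takers, because easy items carry little information about them.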

Within the framework of modern test theory, or item response theory, a large number of models describe respondents' possible answers to test items. These models differ in their underlying assumptions and in the requirements they place on the data. The Rasch model is often considered synonymous with item response theory (IRT), but in fact it is only one of its models. The formula it uses to describe the characteristic curve of item g is as follows:

$$P_g(\theta) = \frac{\exp(\theta - \delta_g)}{1 + \exp(\theta - \delta_g)}$$

where g is a single test item; $\exp$ is the exponential function (a nonlinear dependence); $\delta_g$ (“delta”) is the difficulty level of item g; and $\theta$ is the test taker's level on the latent trait.

Other test items, e.g. h, also have their own characteristic curves. If $\delta_h > \delta_g$, then h is the more difficult item. Therefore, for any value of θ (“theta”, the latent trait or ability of the test takers), the probability of succeeding on item h is lower. The model is called strict because, at a low level of the trait, the probability of completing an item is close to zero. There is no room for guessing in this model, and for multiple-choice items no assumption about the probability of a lucky guess is made. The model is also strict in the sense that all test items must have the same discriminating power. High discrimination shows up as a steep curve; in the limiting case a Guttman scale can be constructed, in which the probability of success jumps from 0 to 1 at a single point of the trait. Because of this condition, not all items can be included in tests based on the Rasch model.
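A minimal sketch of the Rasch characteristic curve $P = \exp(\theta - \delta)/(1 + \exp(\theta - \delta))$ for two hypothetical items with $\delta_h > \delta_g$. It shows that the harder item has the lower success probability at every θ, and that the probability is exactly 0.5 where θ equals the item's difficulty.

```python
import math

def rasch_icc(theta: float, delta: float) -> float:
    """Rasch item characteristic curve: exp(theta - delta) / (1 + exp(theta - delta))."""
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))

delta_g, delta_h = -0.5, 1.0   # hypothetical difficulties; h is the harder item
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    p_g = rasch_icc(theta, delta_g)
    p_h = rasch_icc(theta, delta_h)
    print(f"theta={theta:+.1f}  P_g={p_g:.3f}  P_h={p_h:.3f}")
```

Because every Rasch curve has the same slope, the curves never cross. This is the “equal discrimination” requirement described above.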

There are several variants of this model (e.g., Birnbaum, 1968; see Lord & Novick). Birnbaum's model allows items to differ in discriminating power.
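The sketch below uses Birnbaum's two-parameter logistic model, $P = 1/(1 + \exp(-a(\theta - b)))$. Each item gets a discrimination parameter a in addition to its difficulty b. Some texts also include a scaling constant D of about 1.7, which is omitted here. With a = 1 for every item, the formula reduces to the Rasch curve. The two items are hypothetical.

```python
import math

def two_pl(theta: float, a: float, b: float) -> float:
    """Birnbaum's two-parameter logistic curve: a = discrimination, b = difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items of equal difficulty (b = 0) but different discrimination.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    flat = two_pl(theta, 0.5, 0.0)
    steep = two_pl(theta, 2.5, 0.0)
    print(f"theta={theta:+.1f}  flat a=0.5: {flat:.3f}  steep a=2.5: {steep:.3f}")
```

Unlike Rasch curves, these curves cross at θ = b. Below that point the steep item is harder; above it, the steep item is easier.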

The Dutch researcher Mokken (1971) developed two models for analyzing responses to test items that are less strict than the Rasch model and therefore perhaps more realistic. As a basic condition, Mokken requires that the characteristic curve of an item rise monotonically, without reversals, and that all test items measure the same psychological characteristic θ. Any form of this dependence is allowed as long as the curve never decreases. Therefore, the shape of the characteristic curve is not fixed by any specific function. This “freedom” allows more test items to be used, but the resulting level of measurement is no higher than ordinal.
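The text does not name a statistic for checking a Mokken scale. A standard one is Loevinger's scalability coefficient H, used in Mokken scale analysis: the sum of the observed inter-item covariances divided by the sum of their maximum values given the item popularities. The sketch below is a minimal version for 0/1 items on hypothetical data. By common convention, H of at least 0.3 is often taken as the minimum for a usable scale.

```python
from itertools import combinations

def scalability_h(scores):
    """Loevinger/Mokken scalability coefficient H for dichotomous (0/1) items."""
    n = len(scores)
    k = len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(k)]   # item popularities
    num = den = 0.0
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in scores) / n
        num += p_ij - p[i] * p[j]                  # observed covariance
        den += min(p[i], p[j]) - p[i] * p[j]       # maximum covariance for these marginals
    return num / den

# A perfect Guttman pattern: anyone who passes a harder item passes all easier ones.
perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
# The same items with several Guttman errors.
noisy = [[0, 0, 0], [0, 1, 0], [1, 1, 0], [1, 0, 1]]
print(scalability_h(perfect), scalability_h(noisy))
```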

The methodology of item response theory (IRT) models differs from that of most experimental and correlational studies. The mathematical models are designed to study behavioral, cognitive and emotional characteristics, as well as developmental phenomena. The phenomena in question are often limited to item responses, which led Mellenbergh (1990) to call IRT a “mini-behavior theory.” To a certain extent, research results can be presented as characteristic curves, especially when theoretical understanding of the characteristics studied is lacking. So far, only a few intelligence, aptitude and personality tests have been built on the many IRT models. Variants of the Rasch model are used more often in developing achievement tests (Verhelst, 1993), while Mokken models are better suited to developmental phenomena (see also Chapter 6).

The test taker's response to a test item is the basic unit of IRT models. The type of response is determined by how strongly the characteristic under study is expressed in the person, for example arithmetic or spatial ability. In most cases this is some aspect of intelligence, of achievement, or of personality. A nonlinear relationship is assumed between a person's position on the range of the characteristic and the probability of successfully completing a particular item. This nonlinearity is, in a sense, intuitive. The well-known sayings “Every beginning is difficult” (a slow, nonlinear start) and “Becoming a saint is not so easy” mean that further improvement beyond a certain level is difficult. The curve approaches a 100% success rate slowly but never quite reaches it.

Some models seem to contradict intuition. Suppose a person with an (arbitrarily chosen) trait level of 1.5 has a 60 percent probability of completing an item. This contradicts our intuition, because one either copes with a task or does not cope with it at all. The probability makes sense, however, in the following case. A person tries 100 times to clear a bar at 1 m 50 cm and succeeds 60 times, so his success rate is 60 percent.

At least two items are required to estimate the level of a characteristic. The Rasch model estimates the level of a characteristic independently of item difficulty. This, too, goes against intuition. Suppose a person has an 80% chance of clearing 1.30 m. According to the item characteristic curve, he then has a 60% chance of clearing 1.50 m and a 40% chance of clearing 1.70 m. Therefore, whatever the value of the independent variable (bar height), the person's high-jumping ability can be estimated.
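The sketch below uses the high-jump example and the Rasch relation $\text{logit}(P) = \ln\frac{P}{1-P} = \theta - \delta$. It places the jumper at θ = 0 (the scale origin is arbitrary) and solves for the bar difficulties. It then checks a property known as specific objectivity: the log-odds gap between two bars is the same for every person, whatever that person's θ. The jumper and the other θ values are hypothetical.

```python
import math

def logit(p: float) -> float:
    """Log-odds of success, ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def rasch_p(theta: float, delta: float) -> float:
    """Rasch probability of success for trait level theta and item difficulty delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

# Success probabilities of one hypothetical jumper (bar height in cm -> P(success)).
observed = {130: 0.80, 150: 0.60, 170: 0.40}

# Under the Rasch model logit(P) = theta - delta. Put this jumper at theta = 0
# and solve for each bar's difficulty.
theta_jumper = 0.0
delta = {h: theta_jumper - logit(p) for h, p in observed.items()}

# Specific objectivity: for any person, logit(P_130) - logit(P_150) = delta[150] - delta[130].
for theta in (-2.0, 0.5, 3.0):
    gap = logit(rasch_p(theta, delta[130])) - logit(rasch_p(theta, delta[150]))
    print(f"theta={theta:+.1f}  logit gap between 1.30 m and 1.50 m = {gap:.3f}")
```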

There are about 50 IRT models (Goldstein & Wood, 1989), because many nonlinear functions can describe (explain) the probability of success on an item or group of items. The requirements and limitations of these models differ, and the differences become clear when the Rasch model is compared with the Mokken scale. The requirements of these models include:

1) the need to determine the characteristic under study and assess the person’s position within the range of this trait;

2) estimation of the order of the items;

3) checking specific models. In psychometrics, many procedures have been developed to test the model.

Some reference books treat IRT as a form of test item analysis (see, for example, Crocker & Algina, 1986). One could argue, however, that IRT is a “mini-theory about mini-behavior.” Proponents of IRT reply that if intermediate-level concepts (models) are imperfect, then what can be said about the more complex constructs of psychology?

Classical and modern test theories. People cannot help comparing things that look almost the same. (Perhaps the everyday equivalent of psychometrics consists mainly of comparing people on significant characteristics and choosing between them.) Each of the two theories presented (the theory of measurement errors and the mathematical model of item responses) has its supporters (Goldstein & Wood, 1986).

Unlike classical test theory, IRT models have not been accused of being “measurement according to an arbitrarily established rule.” IRT models focus on analyzing the characteristics being assessed. Person characteristics and item characteristics are measured on scales (ordinal or interval). Moreover, scores on different tests aimed at similar characteristics can be compared. Finally, reliability is not the same at every point on the scale; scores in the middle are generally more reliable than scores at the low and high ends. Thus, IRT models appear theoretically superior. The practical use of modern and classical test theory also differs (Sijtsma, 1992, pp. 127-130). Modern test theory is more complex than classical theory, so non-specialists use it less often. In addition, IRT places specific requirements on items, and items that do not meet the model's requirements must be excluded from the test. This rule also applies to items from widely used tests built on the principles of classical theory. The test becomes shorter and, consequently, its reliability decreases.
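The general Spearman-Brown formula, $r_{new} = \frac{k\,r}{1 + (k-1)\,r}$, estimates how much reliability is lost when misfitting items are dropped. Here k is the ratio of new to old test length. The formula assumes the removed items were parallel to the remaining ones, which is an idealization. The 40-item test and its reliability of 0.90 are hypothetical.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability after changing test length by the factor k
    (k < 1 shortens the test), assuming the added or removed items are parallel."""
    return k * r / (1 + (k - 1) * r)

# Hypothetical: a 40-item test with r = 0.90 loses 16 items that do not fit the model.
r_after = spearman_brown(0.90, 24 / 40)
print(f"reliability after shortening: {r_after:.2f}")  # about 0.84
```

With k = 2 the same function reproduces the split-half correction $2r/(1 + r)$.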

IRT provides mathematical models for studying real-world phenomena. Models should help us understand the key aspects of these phenomena. Here, however, lies the main theoretical question. Models can be regarded as an approach to studying the complex reality in which we live, but model and reality are not the same thing. According to the pessimistic view, only isolated (and not the most interesting) types of behavior can be modeled. One also encounters the claim that reality cannot be modeled at all, because it obeys more than cause-and-effect laws; at best, individual (idealized) behavioral phenomena can be modeled. There is another, more optimistic, view of the possibilities of modeling. It holds that the position above blocks any deep understanding of the nature of human behavior. The application of any particular model raises some general, fundamental questions. In our opinion, there is no doubt that IRT is theoretically and technically superior to classical test theory.

Whatever their theoretical basis, tests have a practical purpose: to predict significant criteria and, on that basis, to establish the characteristics of particular psychological constructs. Does the IRT model have advantages in this respect as well? Tests based on this model may not predict more accurately than tests based on classical theory, and their contribution to the development of psychological constructs may be no greater. Diagnosticians prefer criteria that relate directly to an individual, institution or community. A more scientifically advanced model does not “ipso facto”* define a more appropriate criterion, and it is to a certain extent limited in explaining scientific constructs. The development of tests based on classical theory will obviously continue, but new IRT models will also be created that extend to more psychological phenomena.

* ipso facto (Lat.): by itself (translator's note).

In classical test theory, the concepts of “reliability” and “validity” are distinguished. Test results must be reliable, i.e. the results of the initial testing and the retest should be consistent. In addition, the results should be free (as far as possible) from measurement errors. Validity is a further requirement for the results obtained. Reliability is regarded as a necessary but not sufficient condition for the validity of a test.
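A classical result makes “necessary but not sufficient” concrete. A test's correlation with any criterion cannot exceed $\sqrt{r_{XX'}\,r_{YY'}}$, the square root of the product of the two reliabilities. Spearman's correction for attenuation divides by the same quantity. A highly reliable test can still have zero validity, so the bound only limits validity from above. All numbers are hypothetical.

```python
import math

def max_validity(r_xx: float, r_yy: float = 1.0) -> float:
    """Upper bound on |r(X, Y)| given the reliabilities of test X and criterion Y."""
    return math.sqrt(r_xx * r_yy)

def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction: estimated correlation between the true scores of X and Y."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical: test reliability 0.64, criterion reliability 0.81, observed validity 0.36.
print(max_validity(0.64, 0.81))
print(correct_for_attenuation(0.36, 0.64, 0.81))
```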

The concept of validity implies that the findings relate to something important in practical or theoretical terms. Conclusions drawn from test scores must be valid. Two types of validity are discussed most often: predictive (criterion) validity and construct validity. There are also other types of validity (see Chapter 3). In addition, validity can be determined for quasi-experiments (Cook & Campbell, 1976; Cook & Shadish, 1994). However, the main type of validity is still predictive validity. It is understood as the ability to predict something significant about future behavior from a test result, and as the possibility of a deeper understanding of a particular psychological property or quality.

The types of validity presented are discussed in every reference book, together with methods for analyzing test validity. Factor analysis is better suited to determining construct validity, while linear regression equations are used to analyze predictive validity. Certain characteristics (academic performance, effectiveness of therapy) can be predicted from one or more scores obtained on intelligence or personality tests. Techniques such as correlation, regression, analysis of variance, and analysis of partial correlations and variances are used to determine the predictive validity of a test.
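The sketch below gives a minimal least-squares version of the regression approach to predictive validity. The validity coefficient is the correlation between test scores and the later criterion, and the fitted line turns a new test score into a predicted criterion value. The aptitude scores and grade-point averages are hypothetical.

```python
import math
from statistics import mean

def fit_line(x, y):
    """Least-squares line y = a + b*x and the validity coefficient r(x, y)."""
    mx, my = mean(x), mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

# Hypothetical data: aptitude test scores and grade-point averages a year later.
test_scores = [20, 25, 30, 35, 40]
later_gpa = [2.0, 2.6, 2.9, 3.1, 3.7]
a, b, r = fit_line(test_scores, later_gpa)
print(f"GPA = {a:.2f} + {b:.3f} * score, validity r = {r:.2f}")
print(f"predicted GPA for a score of 32: {a + b * 32:.2f}")
```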

Content validity is also often described. It assumes that all items and tasks of a test belong to a specific domain (mental properties, behavior, etc.); the concept characterizes how well each test item corresponds to the domain being measured. Content validity is sometimes viewed as part of reliability or “generalizability” (Cronbach, Gleser, Nanda & Rajaratnam, 1972). However, when choosing items for achievement tests in a specific subject area, it is also important to follow the rules for including items in the test.

In classical test theory, reliability and validity are treated relatively independently of each other. But there is another understanding of the relationship between these concepts. Modern test theory is based on the use of models. The parameters are estimated within a certain model. If a task does not meet the requirements of the model, then within the framework of this model it is considered invalid. Construct validation is part of the verification of the model itself. This validation refers primarily to testing the existence of a unidimensional latent trait of interest with known scale characteristics. Scale scores can certainly be used to determine appropriate measures, and they can be correlated with measures of other constructs to gather information about the convergent and divergent validity of the construct.

Psychodiagnostics is similar to a language, which can be described as the unity of four components presented at three levels. The first component, test theory, is analogous to syntax, the grammar of a language. Generative grammar is, on the one hand, an ingenious model and, on the other, a rule-governed system: using these rules, complex sentences are built from simple declarative ones. However, this model leaves aside a description of how communication is organized (what is transmitted and what is perceived) and for what purposes it is carried out. Understanding this requires additional knowledge. The same can be said about test theory: it is necessary in psychodiagnostics, but it cannot explain what a psychodiagnostician does and what his goals are.

1.3.2. Psychological theories and psychological constructs

Psychodiagnostics is always the diagnosis of something specific: personal characteristics, behavior, thinking, emotions. Tests are designed to assess individual differences. There are several concepts of individual differences, each with its own distinctive features. If psychodiagnostics is recognized as not being limited to the assessment of individual differences, then other theories also become essential for it. Examples are theories of differences in mental development and of differences in the social environment. Although the assessment of individual differences is not an indispensable attribute of psychodiagnostics, there are established research traditions in this area. Psychodiagnostics began with the assessment of differences in intelligence. The main purpose of the tests was “to determine the hereditary transmission of genius” (Galton) or to select children for schooling (Binet, Simon). The measurement of IQ received theoretical grounding and practical development in the works of Spearman (Great Britain) and Thurstone (USA). Raymond B. Cattell did the same for the assessment of personality characteristics. Psychodiagnostics became inextricably linked with theories and ideas about individual differences in achievement (assessment of maximum capabilities) and in forms of behavior (level of typical functioning). This tradition remains influential today. In psychodiagnostics textbooks, differences in the social environment are assessed much less often than the characteristics of the development processes themselves, and there is no reasonable explanation for this. On the one hand, diagnostics is not limited to particular theories and concepts. On the other hand, it needs theories, since they define the content being diagnosed (i.e., “what” is diagnosed). For example, intelligence can be considered either as a general characteristic or as the basis of many independent abilities. If psychodiagnostics tries to “escape” theory, then common-sense ideas become the basis of the psychodiagnostic process.
Research uses various methods of data analysis. The general logic of research determines the choice of a particular mathematical model and the structure of the psychological concepts used. Methods of mathematical statistics such as analysis of variance, regression analysis, factor analysis and the calculation of correlations assume linear dependencies. If these methods are used incorrectly, they “introduce” their own structure into the data obtained and the constructs used.

Ideas about differences in the social environment and in personality development have had almost no impact on psychodiagnostics. Textbooks (see, for example, Murphy & Davidshofer, 1988) review classical test theory, discuss the relevant statistical methods and describe well-known tests. They also consider the practical use of psychodiagnostics: in management psychology, in personnel selection, and in assessing a person's psychological characteristics.

Theories of individual differences (as well as ideas about differences between social environments and mental development) are analogous to the study of the semantics of language. This is the study of essence, content, and meaning. Meanings are structured in a certain way (similar to psychological constructs), for example, by similarity or contrast (analogy, convergence, divergence).

1.3.3. Psychological tests and other methodological tools

The third component of the proposed scheme consists of tests, procedures and methodological tools with which information about personality characteristics is collected. Drenth and Sijtsma (1990, p. 31) define tests as follows: “A psychological test is considered as a classification according to a certain system or as a measurement procedure that allows a certain judgment to be made about one or more empirically isolated or theoretically based characteristics of a specific aspect of human behavior (within the test situation). In this case, the responses of respondents to a certain number of carefully selected stimuli are examined, and the responses obtained are compared with test norms.”

Diagnostics requires tests and techniques that collect reliable, accurate and valid information about features and characteristic traits of personality, and about human thinking, emotions and behavior. Besides the development of test procedures, this component also covers the following questions: how tests are created, how items are formulated and selected, how testing proceeds, what requirements apply to testing conditions, how measurement errors are taken into account, and how test results are calculated and interpreted.

The test development process distinguishes between rational and empirical strategies. A rational strategy begins by defining basic concepts (for example, the concept of intelligence or extraversion), and test items are formulated in accordance with these concepts. An example of such a strategy is Guttman's facet theory (1957, 1968, 1978). First, the various aspects of the main constructs are determined; then items are selected so that each of these aspects is covered. In the second strategy, items are selected on an empirical basis. For example, suppose a researcher wanted to create a vocational interest test that differentiates doctors from engineers. Both groups of respondents would answer all test items, and the items showing statistically significant differences between the groups would be included in the final test. If, for example, the groups differ in their responses to the statement “I like to fish,” that statement becomes an item of the test. The central premise of this book is that a test is linked to a conceptual or taxonomic theory that defines the characteristics it measures.
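The empirical strategy can be sketched as a simple item filter. The choice of a two-proportion z-test at the 5% level (|z| > 1.96) is an assumption for illustration; the text only says “statistically significant differences”. The counts and all items except “I like to fish” are hypothetical.

```python
import math

def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> float:
    """z statistic for the difference between two independent proportions (pooled SE)."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical "agree" counts: (doctors agreeing, doctors asked, engineers agreeing, engineers asked).
items = {
    "I like to fish": (62, 100, 35, 100),
    "I enjoy reading": (70, 100, 68, 100),
    "I like repairing machines": (30, 100, 71, 100),
}
keyed = [text for text, counts in items.items() if abs(two_proportion_z(*counts)) > 1.96]
print(keyed)
```

With many candidate items, some will pass this filter by chance. For that reason, empirically keyed items are usually checked again on a new sample.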

The purpose of a test is usually defined in its instructions for use. The test must be standardized so that it assesses differences between individuals rather than differences between testing conditions. There are, however, departures from standardization in procedures called “testing the limits” and “learning potential tests”. In these procedures the respondent is given assistance during testing, and the effect of this assistance on the result is then evaluated. Scoring of item responses is objective, i.e. carried out according to a standard procedure. The interpretation of the results is also strictly defined and is based on test norms.

The third component of psychodiagnostics (psychological tests, instruments and procedures) contains items, which are the smallest units of psychodiagnostics; in this sense, items are similar to the phonemes of a language. The number of possible combinations of phonemes is limited: only certain phonemic structures can form words and sentences that convey information to the listener. The same is true of test items: only in certain combinations with each other can they become an effective means of assessing the relevant construct.