
Method Validity and Types of Validity

After reliability, another key criterion for assessing the quality of a method is its validity. The question of a method's validity is addressed only after sufficient reliability has been established, since an unreliable method cannot be valid. At the same time, even the most reliable method is practically useless if its validity is unknown.

It should be noted that the question of validity remains one of the most difficult. The most established definition of the concept is the one given in the book by A. Anastasi: "The validity of a test is a concept that tells us what the test measures and how well it does so."

Validity is in essence a complex characteristic that includes, on the one hand, information about whether the method is suitable for measuring what it was created to measure and, on the other, information about its effectiveness, efficiency, and practical usefulness.

For this reason, there is no single universal approach to determining validity. Depending on which aspect of validity the researcher wants to examine, different methods of proof are used. In other words, the concept of validity includes different types, each with its own special meaning. Checking the validity of a method is called validation.

Validity in its first sense relates to the method itself: it is the validity of the measuring instrument, and checking it is called theoretical validation. Validity in the second sense refers not so much to the method as to the purpose of its use; checking it is called pragmatic validation.

Summarizing, we can say the following:

» in theoretical validation, the researcher is interested in the property itself that the method measures; this, in essence, means that a properly psychological validation is carried out;

» in pragmatic validation, the essence of what is measured (the psychological property) drops out of view. The main emphasis is on proving that the "something" measured by the method is related to particular areas of practice.

Theoretical validation is sometimes much more difficult to conduct than pragmatic validation. Without going into specifics for now, let us outline in general terms how pragmatic validity is checked: some external criterion independent of the method is selected that determines success in a particular activity (educational, professional, etc.), and the results of the diagnostic method are compared with it. If the relationship between them is judged satisfactory, a conclusion is drawn about the practical significance, efficiency, and effectiveness of the diagnostic method.

For theoretical validity, it is much more difficult to find an independent criterion lying outside the method. Therefore, in the early stages of the development of testology, when the concept of validity was only taking shape, intuitive ideas about what exactly a test measures prevailed:

1) a method was called valid because what it measures seemed simply "obvious";

2) the proof of validity rested on the researcher's confidence that his method allowed him "to understand the subject";

3) a method was considered valid (i.e., the claim was accepted that such-and-such a test measures such-and-such a quality) merely because the theory on which the method was built was "very good".

Accepting claims about a method's validity on faith could not last long. The first manifestations of genuinely scientific criticism debunked this approach, and the search for scientifically sound evidence began.

Thus, to carry out theoretical validation of a method is to prove that the method measures exactly the property or quality that, according to the researcher's intention, it should measure.

If, for example, a test was developed to diagnose children's mental development, it must be analyzed whether it really measures this development and not some other characteristics (for example, personality, character, etc.). Thus the cardinal problem for theoretical validation is the relationship between psychological phenomena and the indicators through which we attempt to know them. Such a check shows how far the author's intention coincides with the results of the method.

It is not so difficult to validate a new method theoretically if a method with proven validity for measuring the given property already exists. A correlation between the new method and a similar, already proven one indicates that the new method measures the same psychological quality as the reference one. And if the new method is at the same time more compact and economical to administer and score, psychodiagnosticians gain the opportunity to use the new instrument instead of the old one. This approach is especially often used in differential psychophysiology when creating methods for diagnosing the basic properties of the human nervous system (see Chapter 16).

But theoretical validity is proved by comparison not only with related indicators but also with indicators where, according to the hypothesis, there should be no significant relationship. Thus, to test theoretical validity it is important, on the one hand, to establish the degree of connection with a related method (convergent validity) and, on the other, the absence of such a connection with methods that have a different theoretical basis (discriminant validity).
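As an illustration, convergent and discriminant correlations can be computed with a plain Pearson coefficient. This is only a sketch: the subject scores below are invented for the example, and only the logic (a high correlation with a kindred, proven method; a near-zero correlation with a theoretically unrelated one) comes from the text.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores of 10 subjects (illustrative numbers only):
new_test  = [12, 15, 11, 18, 9, 14, 16, 10, 17, 13]   # the method being validated
reference = [30, 36, 28, 42, 24, 33, 38, 26, 41, 31]  # a proven kindred method
unrelated = [5, 5, 2, 7, 7, 4, 2, 6, 6, 4]            # a theoretically alien method

r_convergent = pearson(new_test, reference)    # expected to be high
r_discriminant = pearson(new_test, unrelated)  # expected to be near zero
```

With data like these, `r_convergent` comes out close to 1 while `r_discriminant` stays near zero, which is the pattern convergent and discriminant validation looks for.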

It is much more difficult to carry out theoretical validation when such a mode of verification is impossible, and this is the situation the researcher most often faces. In such circumstances, only the gradual accumulation of information about the property under study, the analysis of theoretical premises and experimental data, and considerable experience with the method make it possible to reveal its psychological meaning.

An important role in understanding what a method measures is played by comparing its indicators with practical forms of activity. But here it is especially important that the method be thoroughly worked out theoretically, that is, that it have a solid, well-grounded scientific basis. Then, when the method is compared with an external criterion taken from everyday practice that corresponds to what it measures, information can be obtained that reinforces theoretical ideas about its essence.

It is important to remember that once theoretical validity is proven, the interpretation of the obtained indicators becomes clearer and less ambiguous, and the name of the method corresponds to its sphere of application.

As for pragmatic validation, it means testing the method for practical effectiveness, significance, and usefulness, since a diagnostic method is worth using only when it has been proved that the property being measured manifests itself in particular life situations and kinds of activity. Pragmatic validity is given great importance, especially where questions of selection arise.

If we turn again to the history of testology, we can distinguish a period (the 1920s-1930s) when the scientific content of tests and their theoretical "baggage" were of little interest. What mattered was that the test worked and helped to quickly select the best-prepared people. The empirical criterion for evaluating test items was considered the only true guideline for solving scientific and applied problems.

The use of diagnostic methods with a purely empirical justification, without a clear theoretical basis, often led to pseudoscientific conclusions and unjustified practical recommendations. It was impossible to name precisely the features and qualities the tests revealed. B. M. Teplov, analyzing the tests of that period, called them "blind tests."

This approach to the problem of test validity was typical until the early 1950s, not only in the USA but in other countries as well. The theoretical weakness of empirical validation methods could not fail to draw criticism from scientists who, in test development, called for relying not on "bare" empiricism and practice alone but also on a theoretical concept: practice without theory is blind, and theory without practice is dead. At present, an assessment of method validity that combines the theoretical and the pragmatic is regarded as the most productive.

To conduct pragmatic validation of a method, i.e., to assess its effectiveness, efficiency, and practical significance, an independent external criterion is usually used: an indicator of how the property under study manifests itself in everyday life. Such criteria can be:

1) performance (for learning ability tests, achievement tests, intelligence tests);

2) production achievements (for methods of professional orientation);

3) the effectiveness of real activities - drawing, modeling, etc. (for tests of special abilities);

4) subjective assessments (for personality tests).

The American researchers D. Tiffin and E. McCormick, after analyzing the external criteria used to prove validity, identified four types of them:

1) performance criteria (these may include the amount of work performed, academic performance, time spent in training, rate of growth of qualifications, etc.);

2) subjective criteria (these include various kinds of answers reflecting a person's attitude toward something or someone, his opinions, views, and preferences; subjective criteria are usually obtained through interviews, questionnaires, and surveys);

3) physiological criteria (they are used in studying the influence of the environment and other situational variables on the human body and psyche; the pulse rate, blood pressure, skin electrical resistance, symptoms of fatigue, etc. are measured);

4) accident criteria (applied when the purpose of the study concerns, for example, selecting for a job people who are less prone to accidents).

The external criterion must meet three basic requirements:

1) it must be relevant;

2) it must be free from interference (contamination);

3) it must be reliable.

Relevance refers to the semantic correspondence of the diagnostic instrument to the independent, real-life criterion. In other words, there must be confidence that the criterion involves precisely those features of the individual psyche that the diagnostic method measures. The external criterion and the diagnostic method must stand in internal semantic correspondence with each other and be qualitatively homogeneous in psychological essence.

If, for example, a test measures individual characteristics of thinking and the ability to perform logical operations with certain objects and concepts, then the criterion should reflect precisely these skills. This applies equally to professional activity, which has not one but several goals and tasks, each of them specific and imposing its own conditions of performance. Hence there may be several criteria of professional performance. One should therefore not compare success on diagnostic methods with production efficiency in general; a criterion must be found that, by the nature of the operations involved, is comparable with the method.

If it is not known whether the external criterion is relevant to the measured property, comparing the results of a psychodiagnostic method with it becomes practically useless: it does not allow any conclusions about the method's validity.

The requirement of freedom from interference (contamination) arises because, for example, educational or industrial success depends on two variables: on the person himself, whose individual characteristics the method measures, and on the situation, the conditions of study or work, which can introduce interference and "contaminate" the criterion. To avoid this to some extent, groups of people in more or less identical conditions should be selected for the study. Another method can also be used: correcting for the influence of interference. This correction is usually statistical in nature. For example, productivity should be taken not in absolute terms but relative to the average productivity of workers in similar conditions.
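The statistical correction just mentioned, taking output relative to the average of workers in similar conditions, can be sketched as follows. The shift data are invented for the illustration; the point is only that a condition that scales everyone's absolute output (here, a night shift) drops out of the relative index.

```python
def relative_productivity(output, group_outputs):
    """A worker's output divided by the mean output of peers
    working under the same conditions."""
    return output / (sum(group_outputs) / len(group_outputs))

# Invented example: night-shift conditions halve absolute output,
# but the relative standings are unaffected by this contamination.
day_shift = [52, 48, 60, 40]
night_shift = [26, 24, 30, 20]

rel_day = [relative_productivity(o, day_shift) for o in day_shift]
rel_night = [relative_productivity(o, night_shift) for o in night_shift]
```

The two lists of relative indices come out identical, so workers from the two shifts can now be compared on a criterion freed from the shift effect.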

When it is said that a criterion must be reliable, this means that it must reflect the constancy and stability of the function under study.

The search for an adequate and easily identifiable criterion is one of the most important and difficult tasks of validation. In Western testology, many methods are disqualified only because it was not possible to find a suitable criterion for their verification. In particular, for most questionnaires, the data on their validity are questionable, since it is difficult to find an adequate external criterion that corresponds to what they measure.

The assessment of the validity of methods can be quantitative and qualitative.

To calculate the quantitative indicator, the validity coefficient, the results obtained with the diagnostic method are compared with data on the same persons obtained by the external criterion. Various kinds of linear correlation are used (Spearman's or Pearson's coefficients).

How many subjects are needed to calculate validity?

Practice has shown that there should be no fewer than 50 subjects, and preferably more than 200. The question often arises: how large must the validity coefficient be to be considered acceptable? In general, it is sufficient for the validity coefficient to be statistically significant. A validity coefficient of about 0.20-0.30 is considered low, 0.30-0.50 medium, and over 0.60 high.
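Statistical significance of a validity coefficient can be checked with the usual t test for a correlation coefficient. This formula is standard statistics rather than something given in the text, and the critical value 2.01 used below is the approximate two-tailed 5% point for df = 48 (i.e., the minimum recommended sample of 50 subjects).

```python
import math

def t_for_r(r, n):
    """t statistic for testing that a correlation differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

CRITICAL_T_48 = 2.01  # approximate 5% two-tailed critical value for n = 50

t_medium = t_for_r(0.35, 50)  # a "medium" validity coefficient
t_low = t_for_r(0.20, 50)     # a "low" one
```

With n = 50, a coefficient of 0.35 clears the critical value while 0.20 does not, which is why a "low" coefficient on a small sample may fail to be statistically significant at all.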

But, as A. Anastasi, K. M. Gurevich, and others emphasize, linear correlation is not always the right way to calculate a validity coefficient. It is justified only when it has been proved that success in the activity is directly proportional to success on the diagnostic test. Foreign testologists, especially those dealing with occupational aptitude and selection, most often unconditionally assume that whoever completes the most test tasks is best suited for the profession. But it may also be that success in the activity requires the property only at the level of, say, 40% of the test solved, and a higher test score makes no further difference to the profession. An illustrative example from K. M. Gurevich's monograph: a postman must be able to read, but whether he reads at normal or very high speed no longer matters professionally. With such a relationship between the method's indicators and the external criterion, the most adequate way to establish validity may be the criterion of differences.

Another case is also possible: a level of the property higher than the profession requires interferes with professional success. Thus, at the dawn of the twentieth century the American researcher F. Taylor found that the most intellectually developed production workers had low labor productivity: their high level of mental development prevented them from working highly productively. In this case, analysis of variance or the calculation of correlation ratios is better suited for computing the validity coefficient.
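The correlation ratio mentioned here can be sketched as follows. The productivity figures are invented to mimic the Taylor case: the middle test-score group is the most productive, so the linear (Pearson) coefficient is near zero even though the relationship is real, and the correlation ratio captures it.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_ratio(groups):
    """eta: square root of the between-group share of the criterion's total variance."""
    all_y = [y for g in groups for y in g]
    grand = sum(all_y) / len(all_y)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_total = sum((y - grand) ** 2 for y in all_y)
    return math.sqrt(ss_between / ss_total)

# Productivity of workers grouped by low / medium / high test scores:
groups = [[3, 4, 3], [8, 9, 8], [4, 3, 4]]
scores = [1] * 3 + [2] * 3 + [3] * 3            # group labels as a linear predictor
productivity = [y for g in groups for y in g]

r_linear = pearson(scores, productivity)  # near zero: the relation is not linear
eta = correlation_ratio(groups)           # high: the relation nevertheless exists
```

A nonlinear, inverted-U relationship like this is exactly the situation where a linear validity coefficient would wrongly suggest the test measures nothing of practical relevance.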

As the experience of foreign testologists has shown, no single statistical procedure can fully reflect the diversity of individual assessments. Therefore another model is often used to prove the validity of methods: clinical assessment, which is nothing other than a qualitative description of the essence of the property under study. In this case we are speaking of techniques that do not rely on statistical processing.

There are several types of validity, determined by the peculiarities of diagnostic methods and by the temporal status of the external criterion. The following are the most common.

1. Validity "by content". This approach is used, for example, in achievement tests. Usually achievement tests include not all the material students have covered but a small part of it (3-4 questions). Can we be sure that correct answers to these few questions indicate mastery of all the material? This is what the check of content validity should answer. To do so, success on the test is compared with teachers' expert assessments of the same material. Content validity also applies to criterion-referenced tests. This approach is sometimes called logical validity.

2. Validity "by simultaneity", or concurrent (current) validity, is determined using an external criterion on which information is collected at the same time as the experiments with the method under test. In other words, data relating to the present are gathered: academic performance during the trial period, productivity over the same period, and so on. These are compared with results of success on the test.

3. "Predictive" validity (also called "prognostic" validity). It too is determined by an external criterion, but information on the criterion is collected some time after the test. The external criterion is usually a person's ability, expressed in some assessments, for the kind of activity for which he was evaluated by the diagnostic tests. Although this approach best fits the very task of diagnostic techniques, predicting future success, it is very difficult to apply. The accuracy of the prognosis is inversely related to the time allotted for it: the more time passes after the measurement, the more factors must be taken into account in assessing the prognostic significance of the method, and it is almost impossible to take account of all the factors affecting the prediction.

4. "Retrospective" validity. It is determined on the basis of a criterion reflecting events or the state of the quality in the past, and it can be used to obtain quick information about the predictive capabilities of a method. Thus, to test the extent to which good test scores correspond to rapid learning, one can compare past grades and past expert opinions for individuals with currently high and low diagnostic indicators.

When presenting data on the validity of a developed method, it is important to indicate precisely what type of validity is meant (by content, by simultaneity, etc.). It is also desirable to report the number and characteristics of the individuals on whom validation was carried out. Such information lets a researcher using the method decide how valid it is for the group to which he intends to apply it. As with reliability, it must be remembered that a method may have high validity in one sample and low validity in another. Therefore, if a researcher plans to use the method on a sample of subjects substantially different from the one on which validity was checked, the check must be repeated. A validity coefficient given in a manual applies only to groups of subjects similar to those on which it was determined.

The issue of validity is decided after reliability has been established, since an unreliable method cannot be valid.

Test validity is a concept that tells us what the test measures and how well it does it (A. Anastasi). Validity in its essence is a complex characteristic, including, on the one hand, information about whether the technique is suitable for measuring what it was created for, and on the other hand, what is its effectiveness, efficiency, and practical usefulness.

For this reason, there is no single universal approach to determining validity. Depending on which side of validity the researcher wants to consider, different methods of proof are also used. In other words, the concept of validity includes its different types, which have their own special meaning. Checking the validity of a technique is called validation.

Validity is the conformity of a specific study to accepted standards (the "impeccable experiment").

Validity in its first sense is related to the methodology itself, i.e. is the validity of the measuring instrument. This check is called theoretical validation. Validity in its second sense already refers not so much to the methodology as to the purpose of its use. This is pragmatic validation.

In theoretical validation, the researcher is interested in the very property measured by the technique.

Because it is difficult to find an independent criterion lying outside the method, claims about a method's theoretical validity were formerly taken on faith. Theoretical validation is aimed at proving that the method measures exactly the property it is supposed to measure. Its cardinal problem is the relationship between psychological phenomena and the indicators through which we attempt to know them; the check shows how far the author's intention coincides with the results of the method.

It is not so difficult to theoretically validate a new technique if there is already a technique with known, proven validity to measure a given property. The presence of a correlation between the new and similar old methods indicates that the developed method measures the same psychological quality as the reference one.

To test the theoretical validity, it is important, on the one hand, to establish the degree of connection with a related technique (convergent validity), and on the other hand, the absence of this connection with methods that have a different theoretical basis (discriminant validity).

An important role in understanding what the methodology measures is played by the comparison of its indicators with practical forms of activity. It is important that the methodology be worked out in theoretical terms.

Pragmatic Validation

Pragmatic validation checks the practical effectiveness, significance, and usefulness of a method, since a method can be used only when it has been proved that the property being measured manifests itself in certain types of activity.

To test pragmatic validity, an independent external criterion is used: an indicator of how the property under study manifests itself in everyday life. Such a criterion can be performance (for tests of learning abilities, achievement tests, intelligence tests), production achievements (for methods of professional orientation), the effectiveness of real activities such as drawing or modeling (for tests of special abilities), or subjective assessments (for personality tests).

American researchers D. Tiffin and E. McCormick identified four types of external criteria:

  • 1) performance criteria (amount of work performed, academic performance, time spent in training, rate of growth of qualifications);
  • 2) subjective criteria (various kinds of answers reflecting a person's attitude toward something, his opinions and views);
  • 3) physiological criteria (used in studying the influence of the environment on the body and psyche);
  • 4) accident criteria (for example, when the goal is to select for a job people who are less prone to accidents).

The external criterion must meet three main requirements:

1) It must be relevant: there must be confidence that the criterion involves precisely those features of the individual psyche that the diagnostic method measures, and the external criterion and the diagnostic method must be in internal semantic correspondence with each other.

2) It must be free from interference (contamination): groups of people in more or less identical conditions should be selected for the study.

3) It must be reliable, reflecting the constancy and stability of the function under study.

The assessment of the validity of the methodology can be quantitative and qualitative.

To calculate a quantitative indicator (the validity coefficient), the results obtained with the diagnostic method are compared with data on the same persons obtained by an external criterion. Various kinds of linear correlation are used (Spearman's or Pearson's coefficients).

A qualitative assessment is a qualitative description of the essence of the measured property; no statistical processing is used here. There are several types of validity, determined by the peculiarities of the diagnostic method and by the temporal status of the external criterion:

1) Validity "by content" (used in achievement tests): can 3-4 questions from a large topic show a student's true knowledge? To check, the diagnostic results are compared with teachers' expert assessments.

2) Validity "by simultaneity", or current validity: data relating to the present are collected (academic performance, productivity, etc.) and correlated with success on the test.

3) "Predictive" (prognostic) validity: determined by a reliable external criterion on which information is collected some time after the test. The accuracy of the prognosis is inversely related to the time allotted for it.

4) "Retrospective" validity: determined on the basis of a criterion reflecting events or the state of the quality in the past; it can be used to obtain quick information about a method's predictive capabilities.

Before psychodiagnostic methods can be used for practical purposes, they must be tested according to a number of formal criteria that prove their high quality and effectiveness. The main criteria for evaluating psychodiagnostic methods are reliability and validity.

A great contribution to the development of these concepts was made by foreign psychologists (A. Anastasi, E. Ghiselli, J. Guilford, L. Cronbach, R. Thorndike, E. Hagen, and others). They developed a formal-logical and mathematical-statistical apparatus (above all, the correlation method and factor analysis) to substantiate the degree to which methods meet these criteria.

In traditional testology, the term "reliability" means the relative constancy, stability, consistency of the test results during its initial and repeated use on the same subjects.

Reliability of a technique is a criterion that indicates the accuracy of psychological measurements; that is, it allows one to judge how trustworthy the results are.

This is the consistency of subjects' test results at different points in time, on initial and repeated testing, and when using equivalent sets of tasks of similar content. Reliability characterizes tests of properties, not of states. Its aspects are:

  1. Reproducibility of research results.
  2. Measurement accuracy.
  3. Stability of results.

The degree of reliability of methods depends on many factors. Among the negative factors, the following are mentioned most often:

  1. instability of the diagnosed property;
  2. imperfection of the diagnostic method (a carelessly drawn-up instruction, tasks heterogeneous in character, unclear guidelines for administering the method to subjects, etc.);
  3. the changing situation of the examination (different times of the day when experiments are carried out, different illumination of the room, the presence or absence of extraneous noise, etc.);
  4. differences in the experimenter's behavior (presenting the instructions differently from one experiment to another, stimulating task performance in different ways, etc.);
  5. fluctuations in the functional state of the subject (in one experiment, good health is noted, in another - fatigue, etc.);
  6. elements of subjectivity in the methods of evaluating and interpreting the results (when the subjects' answers are recorded, the answers are evaluated according to the degree of completeness, originality, etc.).

One of the most important means of increasing the reliability of a method is uniformity of the examination procedure and its strict regulation: an identical environment, the same type of instructions, the same time limits for everyone, the same methods and features of contact with the subjects, and so on.

The sample studied has a great influence on the reliability characteristics of a method. It can either lower or inflate this indicator; for example, reliability can be artificially high if there is little scatter in the sample's results, i.e., if the results cluster closely together. For this reason, the manual usually describes the sample on which the reliability of the method was determined.

At present, reliability is increasingly being determined on the most homogeneous samples, i.e. on samples similar in gender, age, level of education, professional training, etc.

There are as many varieties of reliability as there are conditions affecting the results of diagnostic tests. Since all kinds of reliability reflect the degree of consistency between two independently obtained series of indicators, the mathematical-statistical technique for establishing reliability is correlation (Pearson's or Spearman's). Reliability is higher the closer the obtained correlation coefficient approaches unity, and vice versa.

K.M. Gurevich proposed to interpret reliability as:

  1. reliability of the measuring tool itself (reliability factor);
  2. stability of the trait under study (stability coefficient);
  3. constancy, i.e. relative independence of the results from the personality of the experimenter (constancy coefficient).

The indicator characterizing the measuring instrument is called the reliability coefficient; the indicator characterizing the stability of the measured property, the stability coefficient; and the indicator assessing the influence of the experimenter's personality, the constancy coefficient. It is recommended to check a method in precisely this order: first the measuring instrument; if the data obtained are satisfactory, then the stability of the measured property; and after that, if necessary, the criterion of constancy. (Forms of reliability: test-retest, parallel forms, split-half, internal consistency, factor-and-variance methods.)

Determining the reliability of the measuring instrument. The accuracy and objectivity of measurement depend on how the method is constructed, how correctly the tasks are selected, and how homogeneous it is.

To check the reliability of the measuring instrument with respect to its uniformity (homogeneity), the split-half method is used. The tasks are divided into even- and odd-numbered ones (all tasks must have been completed), and the results of the two halves are correlated with each other. If the method is homogeneous, success on the two halves will not differ greatly and the coefficient will be high. The test can be split into other parts, but even-odd splitting is better because it does not depend on practice effects, fatigue, and the like.

A method is considered reliable if the coefficient is not lower than 0.75-0.85; 0.90 and above is better.
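The even-odd split described above can be sketched like this. The Spearman-Brown step-up formula used in the last line is standard in testology, though the text does not name it: the correlation of two half-tests underestimates the reliability of the full-length test and must be corrected. The item data are invented for the example.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one list of item scores (0/1) per subject.
    Correlate odd-item and even-item half-totals, then apply the
    Spearman-Brown correction to estimate full-test reliability."""
    odd = [sum(s[0::2]) for s in item_scores]
    even = [sum(s[1::2]) for s in item_scores]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)

# Six hypothetical subjects, eight items each:
scores = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
]
reliability = split_half_reliability(scores)
```

On consistent response patterns like these, the corrected coefficient lands above the 0.75-0.85 threshold the text cites for an acceptable instrument.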

Determining the stability of the property under study. It is also necessary to establish how stable the trait the researcher intends to measure is. The trait may change over time, but its fluctuations should not be unpredictable.

For this check, the test-retest method is used: the subjects are re-examined with the same method. Stability is judged by the correlation coefficient between the results of the first and second examinations, which indicates whether each subject retains his rank order in the sample.
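Rank-order preservation on retest can be checked with Spearman's coefficient. The rank-difference formula below assumes no tied scores in either series, and the two score lists are invented for the illustration.

```python
def spearman(x, y):
    """Spearman rank correlation via the rank-difference formula.
    Assumes no tied values in either series."""
    rank_x = {v: i + 1 for i, v in enumerate(sorted(x))}
    rank_y = {v: i + 1 for i, v in enumerate(sorted(y))}
    n = len(x)
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))

# The same eight subjects tested twice, a few weeks apart:
first = [14, 18, 11, 16, 9, 13, 20, 10]
second = [15, 17, 12, 18, 8, 14, 19, 11]

r_stability = spearman(first, second)  # close to 1: rank order largely preserved
```

Raw scores may drift between administrations, but as long as subjects keep roughly the same serial numbers in the sample, the stability coefficient stays close to unity.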

The degree of stability is affected by variability in the testing conditions, so uniformity of the examination procedure must be maintained.

When determining the stability of a trait, the time interval between the first and second examinations is of great importance: the shorter the interval, the greater the chance that the trait retains the level shown at the first testing, so retesting is advisable soon after the initial testing. The experimenter sets this period himself, but the psychological literature most often indicates an interval of several months (no more than six). The question of the stability of a measured property is not always resolved in the same way; the decision depends on the nature of the diagnosed trait.

If the measured property is already formed, then the coefficient must be at least 0.80.
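Since stability is judged by whether each subject keeps his rank order between the two administrations, a rank (Spearman) correlation is a natural way to illustrate the stability coefficient. Everything below, scores included, is an invented sketch.

```python
# Test-retest stability: Spearman rank correlation between two
# administrations of the same technique on the same subjects.
# The scores are illustrative, invented data.

def ranks(values):
    # Assign 1-based ranks, averaging ranks over ties.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman's rho = Pearson correlation computed on the ranks.
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

first = [12, 15, 9, 20, 17]    # first administration
second = [14, 13, 10, 19, 18]  # retest some time later
s = spearman(first, second)
print(round(s, 2))  # 0.9: rank order almost fully preserved
```

A coefficient near 1 means the subjects keep their relative positions; here two subjects swap places, giving 0.9, which still clears the 0.80 threshold mentioned above.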

Definition of constancy, i.e. the relative independence of the results from the personality of the experimenter. Since a technique is developed for subsequent use by other psychodiagnosticians, it is necessary to determine the extent to which its results are influenced by the experimenter's personality. The constancy coefficient is determined by correlating the results of two examinations conducted on the same sample by different experimenters; it should not be lower than 0.80.

The issue of validity is decided after reliability has been established, since an unreliable method cannot be valid.

The validity of a test is "a concept that tells us what the test measures and how well it does it" (A. Anastasi). Validity is in essence a complex characteristic that includes, on the one hand, information about whether the technique is suitable for measuring what it was created for and, on the other, information about its effectiveness, efficiency, and practical usefulness.

For this reason, there is no single universal approach to determining validity. Depending on which side of validity the researcher wants to consider, different methods of proof are also used. In other words, the concept of validity includes its different types, which have their own special meaning. Checking the validity of a technique is called validation.

Validity is the compliance of a particular study with accepted standards (an impeccable experiment).

Validity in its first sense relates to the methodology itself, i.e. it is the validity of the measuring instrument; this check is called theoretical validation. Validity in its second sense refers not so much to the methodology as to the purpose of its use; this is pragmatic validation.

In theoretical validation, the researcher is interested in the very property measured by the technique.

It is difficult to find an independent criterion lying outside the methodology for theoretical validation, so in the past claims about a methodology's validity were often taken on faith. Theoretical validation aims to prove that the technique measures exactly the property it is supposed to measure. Its cardinal problem is the relationship between psychological phenomena and the indicators by means of which we try to know them: validation shows that the author's intention and the actual results of the technique coincide.

It is not so difficult to theoretically validate a new technique if there is already a technique with known, proven validity to measure a given property. The presence of a correlation between the new and similar old methods indicates that the developed method measures the same psychological quality as the reference one.

To test the theoretical validity, it is important, on the one hand, to establish the degree of connection with a related method (convergent validity), and on the other hand, the absence of this connection with methods that have a different theoretical basis (discriminant validity).

An important role in understanding what the methodology measures is played by the comparison of its indicators with practical forms of activity. It is important that the methodology be worked out in theoretical terms.

Pragmatic Validation

The practical effectiveness, significance, and usefulness of the methodology are checked, since a methodology can be used only when it has been proved that the measured property manifests itself in certain types of activity.

To test pragmatic validity, an independent external criterion is used: an indicator of the manifestation of the studied property in everyday life. Such a criterion can be academic performance (for learning-ability tests, achievement tests, and intelligence tests), production achievements (for methods of professional orientation), the effectiveness of real activity such as drawing or modeling (for tests of special abilities), or subjective assessments (for personality tests).

American researchers Tiffin and McCormick identified four types of external criteria:

  1. Performance criterion (amount of work performed, academic performance, time, rate of qualification growth).
  2. Subjective criteria (include various types of answers that reflect a person's attitude to something, his opinions, views).
  3. Physiological criterion (used in the study of the influence of the external environment that affects the body and psyche).
  4. Accident criterion (used, for example, when the goal is to select for a job those persons who are less prone to accidents).

An external criterion must meet three main requirements:

  1. It must be relevant: there must be confidence that the criterion involves precisely those features of the individual psyche that are measured by the diagnostic technique. The external criterion and the diagnostic technique must be in internal semantic correspondence.
  2. It must be free from interference (contamination): groups of people in more or less identical conditions should be selected for the study.
  3. It must be reliable, i.e. show constancy and stability of the function under investigation.

The assessment of the validity of the methodology can be quantitative and qualitative.

To calculate a quantitative indicator (the validity coefficient), the results obtained with the diagnostic methodology are compared with data on the same persons obtained by the external criterion, using linear correlation (Spearman's or Pearson's coefficient).
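Computing the validity coefficient is then a plain correlation between test scores and the external criterion. Below is a minimal Pearson sketch with invented data (test scores against grade-point averages as the criterion); the Spearman variant differs only in correlating ranks instead of raw scores.

```python
# Validity coefficient: Pearson correlation between diagnostic test
# scores and an external criterion (here, invented GPA figures).

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

test_scores = [55, 60, 45, 70, 65, 50]     # learning-ability test scores
gpa = [3.1, 3.4, 2.8, 3.9, 3.6, 3.0]       # external criterion: academic performance

v = pearson(test_scores, gpa)  # the validity coefficient
print(round(v, 2))
```

The closer the coefficient is to 1, the better the "something" the test measures predicts standing on the criterion; in real validation studies its statistical significance would also be checked.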

A qualitative assessment describes the essence of the measured property; no statistical processing is used here.

There are several types of validity, determined by the peculiarities of the diagnostic technique and by the temporal status of the external criterion:

  1. Content validity (used in achievement tests): three or four questions from a large topic can reveal the student's true knowledge. The diagnostic results are compared with teachers' expert assessments.
  2. Concurrent (current) validity: data relating to the present (academic performance, productivity, etc.) are collected and correlated with success on the test.
  3. Predictive validity: determined by a reliable external criterion on which information is collected some time after testing. The accuracy of the forecast is inversely related to the length of the forecasting interval.
  4. Retrospective validity: determined on the basis of a criterion reflecting events or the state of the quality in the past; it can be used to obtain quick information about the predictive capabilities of the technique.

Validity (from the English valid: "sound, suitable, applicable") is a complex characteristic of a methodology (test) that includes information about the area of the studied phenomena and the representativeness of the diagnostic procedure with respect to them.

In its simplest and most general formulation, the validity of a test is "a concept that tells us what the test measures and how well it does it." In the standard requirements for psychological and educational tests, validity is defined as a set of information about which groups of psychological properties of a person can be concluded using the methodology, as well as the degree of validity of the conclusions when using specific test scores or other forms of assessment. In psychodiagnostics, validity is an obligatory and most important part of the information about the methodology, including (along with the above) data on the degree of consistency of test results with other information about the person being studied, obtained from various sources (theoretical expectations, observations, expert assessments, results of other methods, the reliability of which has been established, etc.), a judgment about the validity of the forecast for the development of the quality under study, the connection of the studied area of ​​behavior or personality traits with certain psychological constructs. Validity also describes the specific orientation of the methodology (the contingent of subjects by age, level of education, socio-cultural affiliation, etc.) and the degree of validity of the conclusions in the specific conditions of using the test. The totality of information characterizing the validity of the test contains information about the adequacy of the applied activity model in terms of reflecting the studied psychological characteristics in it, about the degree of homogeneity of tasks (subtests) included in the test, their comparability in the quantitative assessment of test results as a whole.

The most important component of validity - the definition of the area of ​​the studied properties - is of fundamental theoretical and practical importance in choosing a research methodology and interpreting its data. The information contained in the name of the test, as a rule, is insufficient to judge the scope of its application. This is just a designation, the “name” of a particular research procedure.

Types of test validity. Methods for determining validity

According to the definition of the American psychologist A. Anastasi, "the validity of a test is a concept that tells us what the test measures and how well it does it." Validity indicates whether the technique is suitable for measuring certain qualities and features and how effectively it does so. The most common way to establish the theoretical validity of a test (method) is convergent validation, that is, comparing the given technique with authoritative related methods and demonstrating significant relationships with them.

Comparison with methods that have a different theoretical basis, and consistent absence of significant relationships with them, is called discriminant validity. Another type, pragmatic validity, is the testing of a methodology in terms of its practical significance, efficiency, and usefulness. For such testing, so-called independent external criteria are generally used: a source of information, independent of the test, about the manifestation of the measured mental property in people's real life and activity. Such external criteria may include academic performance, professional achievements, success in various activities, and subjective assessments (or self-assessments). If, for example, a methodology measures the development of professionally important qualities, then for the criterion one must find an activity or individual operations in which these qualities are realized.

To check the validity of a test, one can use the method of known (contrasted) groups: people are recruited whose standing on the criterion is already known (for example, a group of "highly successful, disciplined students" as the high-criterion group and a group of "poor, undisciplined students" as the low-criterion group, with students of average standing not taking part), testing is conducted, and the correlation between the test results and the criterion is found.

The results can be summarized in a 2×2 table, where a is the number of subjects who fall into the high group both on the test and on the criterion, b the number high on the test but low on the criterion, c the number high on the criterion but with low test results, and d the number low on both. If the test were perfectly valid, the elements b and c would equal zero. The degree of agreement between the extreme groups on the test and on the criterion is estimated using Guilford's phi coefficient.

There are many different ways to prove the validity of a test; a test is said to be valid if it measures what it is intended to measure.

External validity, in relation to psychodiagnostic methods, means the correspondence of the results of psychodiagnostics carried out by the given method to external, method-independent signs attributable to the subject of the examination. It means roughly the same as empirical validity, with the difference that here we are talking about the relationship between the indicators of the methodology and the most important, key external features of the subject's behavior. A psychodiagnostic technique is considered externally valid if, for example, it is used to assess an individual's character traits and his externally observed behavior is consistent with the results of the testing.
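Under the 2×2 conventions above (a = high on test and criterion, b = high test/low criterion, c = low test/high criterion, d = low on both), Guilford's phi coefficient is (ad − bc) / √((a+b)(c+d)(a+c)(b+d)). A sketch with invented counts:

```python
# Guilford's phi coefficient for a 2x2 test-vs-criterion table.
# a: high on test and criterion; b: high on test, low on criterion;
# c: low on test, high on criterion; d: low on both.

def phi(a, b, c, d):
    num = a * d - b * c
    den = ((a + b) * (c + d) * (a + c) * (b + d)) ** 0.5
    return num / den

# Invented example: 40 subjects, 18 agree "high", 16 agree "low",
# 6 fall in the disagreement cells b and c.
print(round(phi(18, 4, 2, 16), 2))  # about 0.70
```

When b and c are both zero the coefficient equals 1, matching the remark that a perfectly valid test leaves the disagreement cells empty.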

Internal validity, in relation to psychodiagnostic methods, means the correspondence of the tasks and subtests a technique contains, and of the results of psychodiagnostics carried out with it, to the definition of the psychological property being evaluated that is used in the technique itself. A methodology is considered internally invalid or insufficiently valid when all or some of the questions, tasks, or subtests included in it do not measure what the methodology requires. Face (apparent) validity describes the perception of the test that forms in the subject: the test should be perceived by the subject as a serious tool for understanding his personality. Face validity is of particular importance today, when the public's idea of tests is shaped by numerous publications in popular newspapers and magazines of what can be called quasi-tests, with whose help the reader is invited to determine anything from intelligence to compatibility with a future spouse.

Concurrent (competitive) validity is assessed by the correlation of the developed test with others whose validity with respect to the measured parameter has already been established. P. Kline notes that data on concurrent validity are useful when existing tests for some variable are unsatisfactory and new ones are created to improve the quality of measurement; but this raises the question: if an effective test already exists, why create a new one for the same purpose? Predictive validity is established by the correlation between test scores and some criterion that characterizes the measured property at a later time. For example, the predictive validity of an intelligence test can be shown by correlating scores obtained from a subject at age 10 with academic performance at the end of high school. L. Cronbach considers predictive validity the most convincing evidence that a test measures exactly what it was intended to measure. The main problem facing a researcher trying to establish the predictive validity of a test is the choice of the external criterion. This most often concerns the measurement of personality variables, where selecting an external criterion is extremely difficult and requires considerable ingenuity. The situation is somewhat simpler for cognitive tests, though even here the researcher has to "turn a blind eye" to many problems: academic performance is traditionally used as the external criterion for validating intelligence tests, yet it is well known that academic achievement is far from the only evidence of high intelligence. Incremental validity is of limited value and refers to the case where one test in a battery may correlate weakly with the criterion but not overlap with the other tests in the battery; such a test has incremental validity, which can be useful in professional selection using psychological tests. Differential validity can be illustrated with tests of interests: interest tests usually correlate with academic performance, but differently for different disciplines. The value of differential validity, like that of incremental validity, is limited.

Content validity is defined as confirmation that the test items reflect all aspects of the area of behavior being studied. It is usually determined for achievement tests (where the meaning of the measured parameter is entirely clear), which, as already mentioned, are not strictly psychological tests. In practice, to determine content validity, experts are selected who indicate which areas of behavior are most important (for example, for musical ability); test items are then generated on this basis and again evaluated by the experts. The construct validity of a test is demonstrated by describing as completely as possible the variable the test is intended to measure. In effect, construct validity embraces all the approaches to determining validity listed above. Cronbach and Meehl, who introduced the concept of construct validity into psychodiagnostics, tried to solve the problem of selecting criteria for test validation; they emphasized that in many cases no single criterion can serve to validate an individual test. One may assume that deciding on the construct validity of a test means answering two questions: 1) does the given property really exist; 2) does the test reliably measure individual differences in this property. The problem of objectivity in interpreting the results of construct-validity research is clearly connected with construct validity, but it is a general psychological problem that goes beyond validity itself.


2. The most important components of validity

The most important component of validity, the definition of the area of studied properties, is of fundamental theoretical and practical importance when choosing a research methodology and interpreting its data. An example is the well-known correction (proof-reading) test. The area of properties it studies includes stability and concentration of attention and psychomotor mobility. The technique yields estimates of how pronounced these psychological qualities are in the subject, agrees well with indicators obtained by other methods, and therefore has high validity for them. At the same time, the results of the correction test are influenced by many other factors (neurodynamic features, characteristics of short-term and working memory, individual tolerance of monotony, development of reading skills, visual characteristics, etc.) with respect to which the technique is not specific; if the correction test is used to measure these, its validity will be low or doubtful.

Thus, in outlining the scope of a methodology, validity also reflects how well-founded the measurement results are. Obviously, when few concomitant factors affect the result of the study, and their impact on the test result is therefore insignificant, the reliability of test scores will be higher. To an even greater extent, the trustworthiness of test data is determined by the set of measured properties, their significance for the diagnosed complex activity, and the completeness and significance with which the subject of measurement is reflected in the test material. Thus, to meet the requirements of validity, a diagnostic technique intended for professional selection should cover a wide range of indicators, often different in nature, that matter most for success in the given profession (level of attention, memory characteristics, psychomotor skills, emotional stability, interests, inclinations, etc.). As can be seen, the concept of validity covers a large amount of highly diverse information about a test; the various categories of this information and the ways of obtaining it form the types of validity.

Diagnostic (concurrent) validity reflects the ability of the test to differentiate subjects according to the trait being studied. Analyzing diagnostic validity means establishing how well the test indicators correspond to the real state of the subject's psychological characteristics at the time of the examination. An example of determining this type of validity is a contrasted-groups study: administering an intelligence test to normally developing children and to their peers with intellectual disabilities reveals profound quantitative and qualitative differences in task performance between the groups. The reliability with which the test data differentiate the children of the two groups characterizes the diagnostic validity of the assessment of mental development obtained with this technique.

Information characterizing how validly and statistically reliably a methodology forecasts the development of the studied psychological feature in the future constitutes its predictive validity. A conclusion about this type of validity can be obtained, for example, by comparing test scores in the same group of subjects after a certain time. The basis of predictive validity is determining how important the studied trait is for the subject's future activity, taking into account regularly changing circumstances and the transition to another level of development.

Most methods, especially aptitude and intelligence tests, are investigated for diagnostic and predictive validity. These two types are often combined under the concept of empirical validity, which emphasizes the common approach to their determination: statistical correlation of test scores with indicators on an external parameter chosen as the validation criterion (see criterion validity). The validation criterion acts as a measure, an indicator, of the studied psychological characteristics. For example, tests of special abilities are checked against learning outcomes in the relevant subjects and against achievements in music, art, and so on; tests of general intellectual ability are validated against even broader measures of school achievement (general academic performance, mastery of complex systems of knowledge and skills). A validation criterion is a test-independent measure of immediate value for particular areas of practice: in educational psychology, academic performance; in labor psychology, productivity; in medical psychology, the state of health; and so on. Expert assessments and characterizations of the examined persons given by teachers, colleagues, and managers are also often used as direct criteria.

In many cases it is difficult or impossible to find an adequate validation criterion. Then the set of characteristics comprising theoretical (construct) validity acquires particular importance. When developing and using a test, a number of hypotheses can be formulated about how it will correlate with other tests measuring related or opposite psychological characteristics of the subjects. These hypotheses are put forward on the basis of theoretical ideas about the measured properties as a psychological construct. Confirmation of the hypotheses indicates the theoretical validity of the methodology, i.e., the degree of its construct validity. This type of validity is the most complex and comprehensive; a wide variety of information, including that related to other types of validity, is used to confirm that the results obtained with the test conform to theoretical expectations and regularities.

Content validity (internal, logical) is a body of information about the representativeness of the test items with respect to the measured properties and features. One of the main requirements here is that the content of the test reflect the key aspects of the studied psychological phenomenon. If the area of behavior or the feature is very complex, content validity requires that all its most important constituent elements be represented in the test tasks. Thus, when developing a verbal intelligence test, it is necessary to include groups of tasks (subtests), quite heterogeneous in operational composition, to test writing and reading skills.

Along with the main types of validity listed above (content, criterion, and construct), factorial, cross (convergent), and discriminant validity are also encountered in practice.

3. The main types of validity (diagnostic, prognostic, empirical, criterion, construct, content). Classification of types of validity

The classification of validity types is rather conditional, since, on the one hand, common methods of determination are often used for various validity criteria, and, on the other hand, the same source data can be interpreted from the point of view of different types of validity.

Classification of types of validity:

1) construct;

2) differential;

3) convergent;

4) discriminant;

5) factor;

6) validity by age differentiation;

7) criterion;

8) diagnostic (competitive);

9) current;

10) prognostic;

11) incremental;

12) synthetic;

13) retrospective;

14) empirical;

15) face (obvious).

Other types of validity:

1) illusory;

2) ecological.

4. Relativity of the division of validity into types. The concept of the validity complex. Rationale for the need for periodic validation of psychodiagnostic methods

In psychological diagnostics there is no universal approach to characterizing validity: different types of validity may be used to validate each kind of psychodiagnostic procedure and each individual test. The information included in the validity complex can be assessed qualitatively and quantitatively (using the validity coefficient), and is often simply described. However, because validity is complex, composite, and situational with respect to the specific conditions in which a methodology is applied, it cannot be measured as a whole; it can only be judged.

Real validity is revealed only as significant experience with a test accumulates, and new, expanded validity data can radically change the idea of a methodology's scope and effectiveness. Thus, some methods developed to diagnose verbal factors of intelligence turned out to reflect, with sufficient validity, only the level of awareness. The scope of a test can, on the contrary, expand during long-term validation: Raven's Progressive Matrices, for example, were designed to study certain aspects of perceptual activity but turned out to be highly saturated with the factor common to intelligence tests (see factor g). The actual validity of many psychodiagnostic methods, especially intelligence tests, achievement tests, professional-aptitude tests, and personality questionnaires, changes over time. This is due to the obsolescence of age-related statistical norms and to changes in social norms and patterns of behavior, teaching methods and task content, and occupational requirements. This circumstance makes periodic monitoring of the validity of methods necessary.