
Correlation analysis according to the Spearman method. Rank Correlation and Spearman's Rank Correlation Coefficient

When two series of values can be ranked, it is reasonable to calculate Spearman's rank correlation.

Such series can be:

  • a pair of features measured in the same group of objects under study;
  • a pair of individual feature hierarchies determined in 2 objects for the same set of features;
  • a pair of group feature hierarchies;
  • an individual and a group feature hierarchy.

The method involves ranking the indicators separately for each feature.

The smallest value receives the smallest rank.

This is a non-parametric statistical method designed to establish whether the studied phenomena are related:

  • determining the actual degree of parallelism between the two series of quantitative data;
  • assessing, in quantitative terms, the closeness of the relationship found.

Correlation analysis

Correlation analysis is a statistical method designed to detect whether a relationship exists between 2 or more random variables, and how strong it is.

The name comes from correlatio (Latin for "ratio").

When using it, the following scenarios are possible:

  • the presence of a correlation (positive or negative);
  • no correlation (zero).

If a relationship between the variables is established, they are said to be correlated. In other words, when the value of X changes, the value of Y tends to change in a consistent way, although not necessarily in every single observation.

Various measures of association (coefficients) are used as tools.

Their choice is influenced by:

  • the scale on which the random variables are measured;
  • the nature of the relationship between the random variables.

A correlation can be shown graphically (as a plot) or numerically (as a coefficient).

Correlation is characterized by the following features:

  • strength of the relationship (a coefficient from ±0.7 to ±1 is strong; from ±0.3 to ±0.699, medium; from 0 to ±0.299, weak);
  • direction of the relationship (direct or inverse).
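The thresholds above can be expressed as a small helper. This is only a sketch: the function names are illustrative, and the cut-off points are the ones quoted in the list, not a universal standard.

```python
def correlation_strength(r: float) -> str:
    """Label the strength of a correlation using the thresholds quoted in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    magnitude = abs(r)  # strength ignores the sign
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.3:
        return "medium"
    return "weak"


def correlation_direction(r: float) -> str:
    """Label the direction of a correlation from the sign of the coefficient."""
    if r > 0:
        return "direct"
    if r < 0:
        return "inverse"
    return "none"


print(correlation_strength(-0.92), correlation_direction(-0.92))  # strong inverse
```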

Goals of correlation analysis

Correlation analysis does not allow establishing a causal relationship between the studied variables.

It is carried out in order to:

  • establish whether the variables depend on each other;
  • obtain information about one variable from another variable;
  • determine the closeness of this dependence;
  • determine the direction of the established relationship.

Methods of correlation analysis


This analysis can be done using:

  • the method of squares (Pearson);
  • the rank method (Spearman).

The Pearson method is used when the strength of the relationship between variables must be determined precisely. The features studied with it must be expressed quantitatively.

The Spearman method, or rank correlation, places no strict requirements on how the features are expressed: they can be quantitative or attributive (qualitative). It yields an approximate, indicative estimate of the strength of the relationship rather than an exact one.

The variation series may contain open-ended categories, for example when work experience is expressed as "up to 1 year", "more than 5 years", and so on.

Correlation coefficient

A statistic that characterizes how two variables change together is called the correlation coefficient, or pairwise correlation coefficient. It ranges from -1 to +1.

The most common coefficients are:

  • Pearson, for variables measured on an interval scale;
  • Spearman, for variables measured on an ordinal scale.

Limitations on the use of the correlation coefficient

The correlation coefficient may be unreliable when:

  • there are too few observations (25-100 pairs are usually considered sufficient);
  • the relationship between the studied variables is not linear, for example quadratic;
  • the data contain more than one observation per case;
  • the variables contain abnormal values (outliers);
  • the data under study consist of well-defined subgroups of observations.

In addition, the presence of a correlation does not show which of the variables is the cause and which is the consequence.

Correlation Significance Test

To evaluate statistics, the notion of significance (reliability) is used: it characterizes the probability that a value at least as extreme as the one observed arises by chance.

The most common way to assess the significance of a correlation is Student's t-test.

The calculated value is compared with the tabulated value for n - 2 degrees of freedom, where n is the number of paired observations. If the calculated value of the criterion exceeds the tabulated one, the correlation coefficient is significant.

In economic calculations, a significance level of 0.05 (95% confidence) or 0.01 (99% confidence) is considered sufficient.
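The text does not spell out the test statistic. The sketch below assumes the commonly used form t = r·sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom; for a rank coefficient this is an approximation. The function name is illustrative, and the tabulated critical value still has to be looked up separately.

```python
import math


def correlation_t_statistic(r: float, n: int) -> tuple[float, int]:
    """Return (t, degrees of freedom) for a correlation r from n paired observations."""
    if n < 3:
        raise ValueError("at least 3 pairs are needed so that n - 2 is positive")
    if not -1.0 < r < 1.0:
        raise ValueError("t is undefined when |r| is 1 or more")
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, df


t, df = correlation_t_statistic(0.5, 27)
print(round(t, 3), df)  # 2.887 25
```

The absolute value of t is then compared with the tabulated Student's t for the chosen significance level and the returned degrees of freedom.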

Spearman ranks

Spearman's rank correlation coefficient makes it possible to establish statistically whether phenomena are related. Its calculation involves assigning each value a serial number, its rank. Ranks can be assigned in ascending or descending order.

Any number of values can be ranked, but ranking by hand is laborious, which limits their number in practice. Difficulties begin at around 20 values.

The Spearman coefficient is calculated by the formula:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where:

n is the number of ranked features (paired values);

d is the difference between the ranks of a pair of values in the two variables;

∑d² is the sum of the squared rank differences.

Application of correlation analysis in psychology

Statistical support makes psychological research more objective and more representative. Statistical processing of the data obtained in psychological experiments helps extract as much useful information as possible.

Correlation analysis is the most widely used method for processing such results.

It is appropriate to apply correlation analysis to research results on:

  • anxiety (tests by R. Temml, M. Dorca, V. Amen);
  • family relationships (the “Analysis of family relationships” (DIA) questionnaire by E.G. Eidemiller, V.V. Yustitskis);
  • the level of internality-externality (questionnaire by E.F. Bazhin, E.A. Golynkina and A.M. Etkind);
  • the level of emotional burnout among teachers (questionnaire by V.V. Boyko);
  • connections between the elements of verbal intelligence in students of different educational profiles (method of K.M. Gurevich and others);
  • the relationship between the level of empathy (method of V.V. Boyko) and satisfaction with marriage (questionnaire by V.V. Stolin, T.L. Romanova, G.P. Butenko);
  • links between the sociometric status of adolescents (test by Jacob L. Moreno) and the style of family upbringing (questionnaire by E.G. Eidemiller, V.V. Yustitskis);
  • the structure of life goals of adolescents brought up in two-parent and single-parent families (questionnaire by Edward L. Deci, Richard M. Ryan).

Brief instructions for conducting correlation analysis according to the Spearman criterion

Correlation analysis using the Spearman method is performed according to the following algorithm:

  • the paired features being compared are arranged in 2 series, denoted X and Y;
  • the values of the X series are arranged in ascending or descending order;
  • the values of the Y series are arranged so that each stays paired with its X value;
  • each value in the X series is assigned a rank, a serial number from the minimum value to the maximum;
  • each value in the Y series is ranked in the same way (from minimum to maximum);
  • the difference d between the ranks of X and Y is calculated for each pair: d = rank(X) - rank(Y);
  • each difference is squared;
  • the squared rank differences are summed;
  • the coefficient is calculated by the formula:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
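The steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names are made up for this example, the formula is assumed to be the standard rs = 1 - 6∑d² / (n(n² - 1)), equal values share the mean of their ranks, and no further tie correction is applied.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values from smallest (rank 1) to largest; equal values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient from paired observations, without tie correction."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    n = len(x)
    if n < 2:
        raise ValueError("at least two paired observations are required")
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Made-up illustrative data: only one pair of neighbours swaps places.
print(round(spearman_rs([10, 20, 30, 40], [1, 3, 2, 4]), 3))  # 0.8
```

Because each value keeps its original position in the returned rank list, the X series does not actually need to be sorted first; the sorting step in the manual procedure only makes the worksheet easier to read.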

Spearman Correlation Example

It is necessary to establish whether the length of service and the number of injuries are correlated, given the data in the worksheet below.

The most appropriate method of analysis is the rank method, because one of the features is expressed in open-ended categories: work experience up to 1 year and work experience of 7 years or more.

The solution begins with ranking the data. The ranks are collected in a worksheet and can be assigned manually, because the data set is small:

| Work experience, years | Number of injuries | Rank x (experience) | Rank y (injuries) | Rank difference d = x - y | d² |
|---|---|---|---|---|---|
| up to 1 | 24 | 1 | 5 | -4 | 16 |
| 1-2 | 16 | 2 | 4 | -2 | 4 |
| 3-4 | 12 | 3 | 2.5 | +0.5 | 0.25 |
| 5-6 | 12 | 4 | 2.5 | +1.5 | 2.25 |
| 7 or more | 6 | 5 | 1 | +4 | 16 |

Σd² = 38.5

Fractional ranks appear in the column because, when the same value occurs more than once, each occurrence is assigned the arithmetic mean of the ranks involved. In this example the injury count 12 occurs twice and occupies ranks 2 and 3, so we take the arithmetic mean (2 + 3) / 2 = 2.5 and enter it in the worksheet for both values.
Substituting the obtained values into the working formula and performing simple calculations gives a Spearman coefficient of -0.92.
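The substitution can be written out explicitly. The lines below assume the standard form of Spearman's formula, with n = 5 pairs and ∑d² = 38.5 taken from the worksheet:

```latex
r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
    = 1 - \frac{6 \cdot 38.5}{5\,(5^2 - 1)}
    = 1 - \frac{231}{120}
    = -0.925 \approx -0.92
```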

The negative value of the coefficient indicates an inverse relationship between the features and allows us to state that short work experience is accompanied by a large number of injuries. Moreover, the relationship between these indicators is quite strong.
The next stage of the calculation is to determine the reliability of the obtained coefficient: its error and Student's t-criterion are calculated.

When the characteristics are measured on an ordinal scale, or the form of the relationship differs from linear, the relationship between two random variables is studied using rank correlation coefficients. Consider Spearman's rank correlation coefficient. To calculate it, the sample values must be ranked (ordered). Ranking is the arrangement of experimental data in a definite order, either ascending or descending.

The ranking operation is carried out according to the following algorithm:

1. A lower value is assigned a lower rank. The smallest value is assigned a rank of 1, and the highest value is assigned a rank equal to the number of ranked values. For example, if n=7, the highest value receives rank 7, except as provided in the second rule.

2. If several values are equal, each is assigned the average of the ranks they would have received had they not been equal. As an example, consider a sample of 7 elements in ascending order: 22, 23, 25, 25, 25, 28, 30. The values 22 and 23 occur once, so their ranks are R22=1 and R23=2. The value 25 occurs 3 times. If these values did not repeat, their ranks would be 3, 4 and 5. Therefore their rank R25 is the arithmetic mean of 3, 4 and 5: R25 = (3 + 4 + 5) / 3 = 4. The values 28 and 30 do not repeat, so their ranks are R28=6 and R30=7. Finally, we have the following correspondence:

Value: 22, 23, 25, 25, 25, 28, 30
Rank: 1, 2, 4, 4, 4, 6, 7

3. The total sum of the ranks must match the calculated one, which is determined by the formula:

$$\sum R_i = \frac{n(n + 1)}{2}$$

where n is the total number of ranked values.

A discrepancy between the actual and calculated sums of ranks indicates an error made in assigning the ranks or in summing them. In this case, the error must be found and corrected.
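The ranking rules and the rank-sum check can be tried on the 7-element sample from rule 2. This is a small sketch with illustrative function names; it assumes the check sum is n(n + 1) / 2, the sum of the integers from 1 to n.

```python
def average_ranks(values):
    """Assign rank 1 to the smallest value; equal values share the mean of their ranks."""
    sorted_values = sorted(values)
    rank_of = {}
    for v in set(sorted_values):
        first = sorted_values.index(v) + 1         # first rank the value would occupy
        last = first + sorted_values.count(v) - 1  # last rank it would occupy
        rank_of[v] = (first + last) / 2
    return [rank_of[v] for v in values]


def rank_sum_is_consistent(ranks):
    """Check that the ranks add up to n(n + 1) / 2."""
    n = len(ranks)
    return abs(sum(ranks) - n * (n + 1) / 2) < 1e-9


sample = [22, 23, 25, 25, 25, 28, 30]
ranks = average_ranks(sample)
print(ranks)                          # [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0]
print(rank_sum_is_consistent(ranks))  # True
```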

Spearman's rank correlation coefficient makes it possible to determine the strength and direction of the relationship between two features or two feature hierarchies. Its use has a number of limitations:

  • a) The expected correlation should be monotonic.
  • b) The size of each sample must be at least 5. The upper limit of the sample size is set by the tables of critical values (Table 3 of the Appendix); the maximum value of n in the table is 40.
  • c) The analysis may produce a large number of identical ranks. In this case a correction must be made. The most favorable case is when both samples are sequences of non-repeating values.

To conduct a correlation analysis, the researcher must have two samples that can be ranked, for example:

  • two features measured in the same group of subjects;
  • two individual feature hierarchies identified in two subjects for the same set of features;
  • two group hierarchies of features;
  • an individual and a group hierarchy of features.

The calculation begins with ranking the studied indicators separately for each feature.

Let us analyze the case of two features measured in the same group of subjects. First the individual values obtained by the subjects on the first feature are ranked, and then the individual values on the second feature. If low ranks on one indicator correspond to low ranks on the other, and high ranks to high ranks, the two features are positively related. If high ranks on one indicator correspond to low ranks on the other, the two features are negatively related. To find rs, we determine the difference between the ranks (d) for each subject. The smaller the differences between the ranks, the closer the rank correlation coefficient rs is to "+1". If there is no relationship, the ranks show no correspondence, and rs is close to zero. The greater the differences between the subjects' ranks on the two variables, the closer rs is to "-1". Thus, the Spearman rank correlation coefficient is a measure of any monotonic relationship between the two features under study.
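A short derivation shows why the coefficient reaches exactly these extremes. It assumes the standard formula without ties, with ranks running from 1 to n on both features:

```latex
% Identical rankings: d_i = 0 for every subject
r_s = 1 - \frac{6 \cdot 0}{n(n^2 - 1)} = +1

% Fully reversed rankings: the subject with rank i on x has rank n + 1 - i on y
d_i = 2i - (n + 1), \qquad \sum_{i=1}^{n} d_i^2 = \frac{n(n^2 - 1)}{3}

r_s = 1 - \frac{6}{n(n^2 - 1)} \cdot \frac{n(n^2 - 1)}{3} = 1 - 2 = -1
```

The factor 6 in the formula is what maps the largest possible ∑d² onto -1.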

Consider the case of two individual feature hierarchies identified in two subjects for the same set of features. Here the individual values obtained by each of the two subjects on a certain set of features are ranked. The feature with the lowest value is assigned the first rank, the feature with the next higher value the second rank, and so on. Special attention should be paid to ensuring that all features are measured in the same units. For example, indicators cannot be ranked if they are expressed in points of different “weight”, since it is impossible to determine which factor ranks first in severity until all values are brought to a single scale. If features that have low ranks in one subject also have low ranks in the other, and vice versa, the individual hierarchies are positively related.

In the case of two group feature hierarchies, the group mean values obtained in two groups of subjects are ranked over the same set of features. The rest of the procedure follows the algorithm given for the previous cases.

Consider the case of an individual and a group feature hierarchy. The individual values of the subject and the group mean values are ranked separately over the same set of features. The group means are computed without the subject in question, who is excluded from the group hierarchy because his individual hierarchy will be compared with it. Rank correlation makes it possible to assess the degree of agreement between the individual and the group hierarchy of features.

Let us consider how the significance of the correlation coefficient is determined in the cases listed above. In the case of two features, it is determined by the sample size. In the case of two individual feature hierarchies, it depends on the number of features in the hierarchy. In the last two cases, significance is determined by the number of features studied, not by the number of subjects in the groups. Thus, in all cases the significance of rs is determined by the number of ranked values n.

To check the statistical significance of rs, tables of critical values of the rank correlation coefficient are used, compiled for various numbers of ranked values and various significance levels. If the absolute value of rs reaches or exceeds the critical value, the correlation is significant.

When considering the first option (a case with two features measured in the same group of subjects), the following hypotheses are possible.

H0: The correlation between variables x and y is not different from zero.

H1: The correlation between variables x and y is significantly different from zero.

If we work with any of the three remaining cases, another pair of hypotheses is needed:

H0: The correlation between the x and y hierarchies is not different from zero.

H1: The correlation between x and y hierarchies is significantly different from zero.

The sequence of actions in calculating the Spearman rank correlation coefficient rs is as follows.

  • Determine which two features or two feature hierarchies will be compared as the variables x and y.
  • Rank the values of the variable x, assigning rank 1 to the smallest value, according to the ranking rules. Enter the ranks in the first column of the table in the order of the subject or feature numbers.
  • Rank the values of the variable y. Enter the ranks in the second column of the table in the order of the subject or feature numbers.
  • Calculate the differences d between the ranks of x and y for each row of the table. Enter the results in the third column of the table.
  • Calculate the squared differences (d²). Enter the obtained values in the fourth column of the table.
  • Calculate the sum of the squared differences ∑d².
  • If identical ranks occur, calculate the corrections:

$$T_x = \frac{\sum (t_x^3 - t_x)}{12}, \qquad T_y = \frac{\sum (t_y^3 - t_y)}{12}$$

where tx is the size of each group of identical ranks in sample x;

ty is the size of each group of identical ranks in sample y.

Calculate the rank correlation coefficient depending on the presence or absence of identical ranks. In the absence of identical ranks, the rank correlation coefficient rs is calculated by the formula:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

In the presence of identical ranks, the rank correlation coefficient rs is calculated by the formula:

$$r_s = 1 - \frac{6\left(\sum d^2 + T_x + T_y\right)}{n(n^2 - 1)}$$

where ∑d² is the sum of the squared differences between the ranks;

Tx and Ty are the corrections for identical ranks;

n is the number of subjects or features that participated in the ranking.
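The correction for identical ranks can be sketched as follows. The code assumes the commonly cited form T = ∑(t³ - t) / 12 for each sample and rs = 1 - 6(∑d² + Tx + Ty) / (n(n² - 1)); the function and parameter names are illustrative.

```python
def tie_correction(group_sizes):
    """Correction T for one sample: sum of (t**3 - t) / 12 over groups of t equal ranks."""
    return sum((t ** 3 - t) / 12 for t in group_sizes)


def spearman_rs_with_ties(sum_d2, n, ties_x=(), ties_y=()):
    """Spearman coefficient with the correction for identical ranks.

    sum_d2 is the sum of squared rank differences, n the number of ranked values,
    ties_x / ties_y list the sizes of the groups of equal ranks in each sample.
    """
    if n < 2:
        raise ValueError("at least two ranked values are required")
    t_x = tie_correction(ties_x)
    t_y = tie_correction(ties_y)
    return 1 - 6 * (sum_d2 + t_x + t_y) / (n * (n ** 2 - 1))


# Work-experience example: sum of d squared is 38.5, n is 5,
# and the injury counts contain one group of 2 equal ranks.
print(round(spearman_rs_with_ties(38.5, 5, ties_y=[2]), 3))  # -0.95
```

With no groups of equal ranks both corrections are zero and the function reduces to the ordinary formula.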

Determine the critical values of rs from Table 3 of the Appendix for the given number of subjects n. The correlation coefficient differs significantly from zero if the absolute value of rs is not less than the critical value.

The discipline of higher mathematics provokes some resistance, since not everyone finds it easy to understand. But those who do study this subject and solve problems using various equations and coefficients can boast a solid command of it. Psychology has not only a humanities side but also formulas and methods for mathematically testing the hypotheses put forward in the course of research. Various coefficients are used for this.

Spearman's correlation coefficient

This is a common measure of the closeness of the relationship between two features. It is a non-parametric method and expresses a statistical relationship. For example, we may know that aggression and irritability in a child are interconnected; the Spearman rank correlation coefficient expresses the relationship between these two features numerically.

How is the ranking coefficient calculated?

Naturally, every mathematical quantity has a formula by which it is calculated, and the Spearman correlation coefficient is no exception. Its formula is the following:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

At first glance the formula may not seem entirely clear, but on closer inspection everything is easy to calculate:

  • n is the number of features or indicators that are ranked.
  • d is the difference between the two ranks that each subject receives on the two variables.
  • ∑d² is the sum of the squared rank differences, where each difference is squared separately.

Scope of the mathematical measure of connection

To apply the rank coefficient, the quantitative data for each feature must be ranked, that is, each value must be assigned a number according to its position in the ordered series. Two series of features expressed in numerical form run parallel to each other to some degree. Spearman's rank correlation coefficient determines the degree of this parallelism, that is, the closeness of the relationship between the features.

To perform the calculation and determine the relationship between features using this coefficient, the following steps are needed:

  1. Each value of a subject or phenomenon is assigned a number in order, its rank. Ranks can follow the values in ascending or descending order.
  2. Next, the ranks of the values in the two quantitative series are compared to determine the difference between them.
  3. In a separate column of the table, the square of each difference is recorded, and the results are summed below.
  4. After these steps, the formula for the Spearman correlation coefficient is applied.

Properties of the correlation coefficient

The main properties of the Spearman coefficient include the following:

  • It takes values between -1 and 1.
  • The sign of the coefficient shows only the direction of the relationship and does not affect the assessment of its strength.
  • The closeness of the relationship is judged by the absolute value: the higher it is, the closer the relationship.

How to check the received value?

To check the relationship between signs, you must perform certain actions:

  1. The null hypothesis (H0), which is the main one, is put forward, and then an alternative hypothesis (H1) is formulated. The null hypothesis states that the Spearman correlation coefficient equals 0, meaning there is no relationship. The alternative states that the coefficient is not equal to 0, meaning there is a relationship.
  2. Next, the observed value of the criterion is found, using the basic formula for the Spearman coefficient.
  3. Then the critical values of the criterion are found. This is done using a special table that gives values for the given parameters: the significance level (α) and the number of ranked values (n).
  4. Now the two values, observed and critical, must be compared. To do this, a critical region is constructed: draw a straight line and mark on it the critical value of the coefficient with a "-" sign and with a "+" sign. The critical regions lie to the left and to the right of these points and are drawn as semicircles. The middle interval between the two values is the region where the null hypothesis is accepted, also marked with a semicircle.
  5. Finally, a conclusion is drawn about the closeness of the relationship between the two features.
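The comparison in step 4 amounts to checking whether the observed value falls in the critical region on either side. The sketch below assumes a two-sided test; the function name is illustrative, and the critical value must be taken from a published table for the chosen significance level and n.

```python
def decide_significance(rs_observed: float, rs_critical: float) -> str:
    """Compare the observed coefficient with a tabulated critical value (two-sided)."""
    if not 0 < rs_critical <= 1:
        raise ValueError("the critical value must lie between 0 and 1")
    if abs(rs_observed) >= rs_critical:
        # The observed value lies in the left or right critical region.
        return "reject H0: the correlation differs significantly from zero"
    # The observed value lies in the middle interval between -critical and +critical.
    return "retain H0: the correlation does not differ significantly from zero"


# 0.6 is a placeholder for illustration, not a value taken from a table.
print(decide_significance(-0.75, 0.6))
```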

Where is the best place to use this value?

Psychology was the first science in which this coefficient was actively used. It is not a science built on numbers, yet statistical confirmation is required to support important hypotheses about the development of relationships, people's character traits, or students' knowledge. The coefficient is also used in economics, in particular in foreign exchange transactions, where features are assessed without knowing their statistical distribution. Spearman's rank correlation coefficient is convenient here because the assessment does not depend on the distribution of the variables, since the values are replaced by ranks. The Spearman coefficient is actively used in banking. Sociology, political science, demography and other sciences also use it in their research. Results are obtained quickly and accurately.

Spearman's correlation coefficient can be calculated conveniently and quickly in Excel, which has functions that help obtain the necessary values.

What other correlation coefficients exist?

Besides the Spearman coefficient, there are other correlation coefficients that make it possible to measure and evaluate relationships between qualitative features, between quantitative features, and between features presented on a rank scale. These include the biserial, rank-biserial, contingency and association coefficients, among others. The Spearman coefficient shows the closeness of the relationship very accurately.

Purpose of the rank correlation coefficient

Spearman's rank correlation method makes it possible to determine the closeness (strength) and direction of the correlation between two features or two profiles (hierarchies) of features.

Description of the method

To calculate rank correlation, two series of values that can be ranked are needed. These series can be:

1) two features measured in the same group of subjects;

2) two individual feature hierarchies identified in two subjects for the same set of features (for example, personality profiles from R. B. Cattell's 16-factor questionnaire, value hierarchies by R. Rokeach's method, sequences of preferences among several alternatives, etc.);

3) two group feature hierarchies;

4) an individual and a group feature hierarchy.

First, the indicators are ranked separately for each feature. As a rule, a lower value of a feature is assigned a lower rank.

Consider case 1 (two features). Here the individual values obtained by different subjects on the first feature are ranked, and then the individual values on the second feature.

If two features are positively related, subjects who have low ranks on one of them will have low ranks on the other, and subjects who have high ranks on one feature will also have high ranks on the other. To calculate rs, it is necessary to determine the differences (d) between the ranks obtained by each subject on the two features. These values of d are then transformed in a certain way and subtracted from 1. The smaller the differences between the ranks, the larger rs will be and the closer to +1.

If there is no correlation, all the ranks will be mixed and there will be no correspondence between them. The formula is designed so that in this case rs is close to 0.

In the case of a negative correlation, low ranks of the subjects on one feature will correspond to high ranks on the other, and vice versa.

The greater the discrepancy between the subjects' ranks on the two variables, the closer rs is to -1.

Consider case 2 (two individual profiles). Here the individual values obtained by each of the 2 subjects on a certain set of features (the same for both) are ranked. The feature with the lowest value receives the first rank, the feature with the next higher value the second rank, and so on. Obviously, all features must be measured in the same units, otherwise ranking is impossible. For example, indicators from the Cattell Personality Questionnaire (16 PF) cannot be ranked if they are expressed in "raw" scores, since the ranges of values differ between factors: from 0 to 13, from 0 to 20 and from 0 to 26. We cannot say which factor ranks first in severity until all the values are brought to a single scale (most often the sten scale).

If the individual hierarchies of two subjects are positively related, the features that have low ranks for one of them will have low ranks for the other, and vice versa. For example, if factor E (dominance) has the lowest rank for one subject, it should also have a low rank for the other; if factor C (emotional stability) has the highest rank for one subject, the other subject must also have a high rank on this factor, and so on.

Consider case 3 (two group profiles). Here the group mean values obtained in 2 groups of subjects are ranked over a certain set of features, the same for both groups. The line of reasoning then follows the previous two cases.

Consider case 4 (individual and group profiles). Here the individual values of the subject and the group mean values are ranked separately over the same set of features. The group means are obtained, as a rule, with this individual subject excluded: he does not take part in the group mean profile with which his individual profile will be compared. Rank correlation makes it possible to check how consistent the individual and group profiles are.

In all four cases, the significance of the obtained correlation coefficient is determined by the number of ranked values N. In the first case, this number coincides with the sample size n. In the second case, the number of observations is the number of features that make up the hierarchy. In the third and fourth cases, N is likewise the number of features compared, not the number of subjects in the groups. Detailed explanations are given in the examples.

If the absolute value of rs reaches or exceeds the critical value, the correlation is significant.

Hypotheses

There are two possible versions of the hypotheses. The first applies to case 1, the second to the other three cases.

The first version of the hypotheses

H0: The correlation between variables A and B does not differ from zero.

H1: The correlation between variables A and B is significantly different from zero.

The second version of the hypotheses

H0: The correlation between hierarchies A and B does not differ from zero.

H1: The correlation between hierarchies A and B is significantly different from zero.

Graphical representation of the rank correlation method

Most often, a correlation is represented graphically in the form of a cloud of points or in the form of lines reflecting the general trend in the placement of points in the space of two axes: the axes of feature A and feature B (see Fig. 6.2).

Let us depict rank correlation as two series of ranked values connected pairwise by lines (Fig. 6.3). If the ranks on feature A and feature B coincide, the line between them is horizontal; if the ranks differ, the line is slanted. The greater the rank mismatch, the steeper the slant. The left part of Fig. 6.3 shows the highest possible positive correlation (rs = +1.0), which in practice looks like a "ladder". The center shows zero correlation, a braid with irregular weaving: all the ranks are mixed up. The right part shows the highest negative correlation (rs = -1.0), a web with a regular interweaving of lines.

Fig. 6.3. Graphical representation of rank correlation:

a) high positive correlation;

b) zero correlation;

c) high negative correlation

Restrictionsrank coefficientcorrelations

1. At least 5 observations must be submitted for each variable. The upper limit of the sample is determined by the available tables of critical values ​​(Table XVI of Appendix 1), namely N40.

2. Spearman's rank correlation coefficient r s with a large number of identical ranks for one or both of the compared variables gives coarsened values. Ideally, both correlated series should be two sequences of mismatched values. If this condition is not met, it is necessary to make an adjustment for the same ranks. The corresponding formula is given in example 4.

Example 1 - Correlation between two features

In a study simulating the work of an air traffic controller (Oderyshev B.S., Shamova E.P., Sidorenko E.V., Larchenko N.N., 1978), a group of subjects, students of the Faculty of Physics of Leningrad State University, underwent training before starting work on the simulator. The subjects had to solve problems of choosing the optimal type of runway for a given type of aircraft. Is the number of errors made by the subjects in the training session related to the indicators of verbal and non-verbal intelligence measured by D. Wechsler's method?

Table 6.1

Indicators of the number of errors in the training session and indicators of the level of verbal and non-verbal intelligence among physics students (N=10)

Columns: subject; number of errors; verbal intelligence score; non-verbal intelligence score

First, let's try to answer the question of whether the indicators of the number of errors and verbal intelligence are related.

Let's formulate hypotheses.

H 0: The correlation between the number of errors in the training session and the level of verbal intelligence does not differ from zero.

H1 : The correlation between the indicator of the number of errors in the training session and the level of verbal intelligence is statistically significantly different from zero.

Next, we need to rank both indicators, assigning a lower rank to the smaller value, then calculate the differences between the ranks that each subject received on the two variables (features), and square these differences. We will make all the necessary calculations in a table.

In Table 6.2, the first column on the left gives the values for the number of errors; the next column gives their ranks. The third column from the left presents the values for verbal intelligence; the next column gives their ranks. The fifth column shows the differences d between the rank on variable A (number of errors) and the rank on variable B (verbal intelligence). The last column shows the squared differences, d².
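The layout of such a calculation table can be sketched in Python. The values below are hypothetical (they are not the data of Table 6.1), and the helper `ranks_ascending` assumes there are no tied values.

```python
def ranks_ascending(values):
    """Assign rank 1 to the smallest value; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Hypothetical values for five subjects (not the data of the study).
errors = [12, 5, 9, 20, 7]
verbal = [110, 125, 118, 102, 130]

rank_a = ranks_ascending(errors)   # [4, 1, 3, 5, 2]
rank_b = ranks_ascending(verbal)   # [2, 4, 3, 1, 5]
d = [a - b for a, b in zip(rank_a, rank_b)]
d_squared = [x * x for x in d]

for row in zip(errors, rank_a, verbal, rank_b, d, d_squared):
    print(*row, sep="\t")
print("sum of d^2 =", sum(d_squared))  # 38
```

A useful check when filling in the table by hand: the differences d always sum to zero when both series are ranked from 1 to N.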

Table 6.2

Calculation of d² for Spearman's rank correlation coefficient r_s when comparing the number of errors and verbal intelligence among physics students (N=10)

Columns: subject; variable A, number of errors (individual values, rank); variable B, verbal intelligence (individual values, rank); d (rank A - rank B); d²

Spearman's rank correlation coefficient is calculated by the formula:

$$r_s = 1 - \frac{6 \sum d^2}{N (N^2 - 1)}$$

where d is the difference between the ranks on the two variables for each subject;

N is the number of ranked values, in this case the number of subjects.
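As a rough illustration, the formula r_s = 1 - 6Σd² / (N(N² - 1)) for series without identical ranks can be written as a one-line Python function. The name `spearman_rs` and the numbers in the usage line are assumptions made for the example, not results from the study.

```python
def spearman_rs(sum_d_squared, n):
    """r_s = 1 - 6 * sum(d^2) / (N * (N^2 - 1)); valid when there are no tied ranks."""
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical input: sum of squared rank differences 8 for N = 5 ranked values.
print(round(spearman_rs(8, 5), 2))  # 0.6
```

When all ranks coincide, Σd² is 0 and the function returns exactly 1.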

Let's calculate the empirical value of r s:

The obtained empirical value of r_s is close to 0. Nevertheless, we determine the critical values of r_s at N=10 from Table XVI of Appendix 1:

Answer: H0 is accepted. The correlation between the number of errors in the training session and the level of verbal intelligence does not differ from zero.

Now let's try to answer the question of whether the indicators of the number of errors and non-verbal intelligence are related.

Let's formulate hypotheses.

H 0: The correlation between the number of errors in the training session and the level of non-verbal intelligence does not differ from 0.

H 1: The correlation between the number of errors in the training session and the level of non-verbal intelligence is statistically significantly different from 0.

The results of ranking and comparison of ranks are presented in Table 6.3.

Table 6.3

Calculation of d² for Spearman's rank correlation coefficient r_s when comparing the number of errors and non-verbal intelligence among physics students (N=10)

Columns: subject; variable A, number of errors (individual values, rank); variable B, non-verbal intelligence (individual values, rank); d (rank A - rank B); d²

Recall that when determining the significance of r_s, its sign does not matter; only its absolute value is important. In this case:

|r_s emp| < r_s cr

Answer: H0 is accepted. The correlation between the number of errors in the training session and the level of non-verbal intelligence is random; r_s does not differ from 0.

However, we can note a certain trend toward a negative relationship between these two variables. Perhaps we could confirm it at a statistically significant level if we increased the sample size.

Example 2 - correlation between individual profiles

In a study devoted to the problems of value reorientation, hierarchies of terminal values were identified by the method of M. Rokeach in parents and their adult children (Sidorenko E.V., 1996). The ranks of terminal values obtained when examining a mother-daughter pair (mother 66 years old, daughter 42 years old) are presented in Table 6.4. Let's try to determine how these value hierarchies correlate with each other.

Table 6.4

Ranks of terminal values according to the list of M. Rokeach in the individual hierarchies of mother and daughter

Columns: terminal values; rank of the value in the mother's hierarchy; rank of the value in the daughter's hierarchy; d; d²

1 Active life

2 Life wisdom

3 Health

4 Interesting work

5 Beauty of nature and art

7 Financially secure life

8 Having good and loyal friends

9 Public recognition

10 Cognition

11 Productive life

12 Development

13 Entertainment

14 Freedom

15 Happy family life

16 The happiness of others

17 Creativity

18 Self-confidence

Let's formulate hypotheses.

H0: The correlation between the terminal value hierarchies of mother and daughter does not differ from zero.

H 1: The correlation between the terminal value hierarchies of mother and daughter is statistically significantly different from zero.

Since the ranking of values is built into the research procedure itself, we only need to calculate the differences between the ranks of the 18 values in the two hierarchies. Table 6.4 presents the differences d and their squares d² in its last two columns.

We determine the empirical value of r_s by the formula:

$$r_s = 1 - \frac{6 \sum d^2}{N (N^2 - 1)}$$

where d is the difference between the ranks for each of the variables, in this case for each of the terminal values;

N is the number of variables forming the hierarchy, in this case the number of values.

For this example:

From Table XVI of Appendix 1 we determine the critical values:

Answer: H0 is rejected. H1 is accepted. The correlation between the hierarchies of terminal values of mother and daughter is statistically significant (p<0.01) and is positive.

From Table 6.4 we can see that the main differences concern the values "Happy family life", "Public recognition" and "Health", while the ranks of the other values are quite close.

Example 3 - Correlation between two group hierarchies

Joseph Wolpe, in a book written jointly with his son (Wolpe J., Wolpe D., 1981), gives an ordered list of the most common fears of modern people that he calls "useless": fears that carry no signal value and only interfere with living and acting fully. In a Russian study conducted by M.E. Rakhova (1994), 32 subjects had to rate on a 10-point scale how relevant each type of fear from Wolpe's list was for them. The sample consisted of students of the Hydrometeorological and Pedagogical Institutes of St. Petersburg: 15 young men and 17 young women aged 17 to 28, with an average age of 23.

The data obtained on the 10-point scale were averaged over the 32 subjects, and the averages were ranked. Table 6.5 presents the rankings obtained by J. Wolpe and M.E. Rakhova. Do the rank sequences of the 20 types of fear coincide?

Let's formulate hypotheses.

H0: The correlation between the ordered lists of types of fear in the American and Russian samples does not differ from zero.

H 1: The correlation between the ordered lists of types of fear in the American and Russian samples is statistically significantly different from zero.

All calculations involved in finding and squaring the rank differences for the different types of fear in the two samples are presented in Table 6.5.

Table 6.5

Calculation of d² for Spearman's rank correlation coefficient when comparing ordered lists of fear types in the American and Russian samples

Columns: types of fear; rank in the American sample; rank in the Russian sample

Fear of public speaking

Fear of flying

Fear of making a mistake

Fear of failure

Fear of disapproval

Fear of rejection

Fear of evil people

Fear of being alone

Fear of blood

Fear of open wounds

Fear of dentists

Fear of injections

Fear of taking tests

Fear of the police (militia)

Fear of heights

Fear of dogs

Fear of spiders

Fear of crippled people

Fear of hospitals

Fear of the dark

We determine the empirical value of r_s:

From Table XVI of Appendix 1 we determine the critical values of r_s at N=20:

Answer: H0 is accepted. The correlation between the ordered lists of types of fear in the American and Russian samples does not reach the level of statistical significance, i.e., it does not differ significantly from zero.

Example 4 - Correlation between individual and group mean profiles

A sample of St. Petersburg residents aged 20 to 78 (31 men, 46 women), balanced by age so that people over 55 made up 50% of it, was asked to answer the question: "What level of development of each of the following qualities is necessary for a deputy of the City Assembly of St. Petersburg?" (Sidorenko E.V., Dermanova I.B., Anisimova O.M., Vitenberg E.V., Shulga A.P., 1994). The ratings were given on a 10-point scale. In parallel, a sample of deputies and candidates for deputy of the City Assembly of St. Petersburg (n=14) was examined. Individual diagnostics of the politicians and candidates was carried out using the Oxford system of express video diagnostics on the same set of personal qualities that was presented to the sample of voters.

Table 6.6 shows the average values obtained for each of the qualities in the sample of voters (the "reference series") and the individual values of one of the deputies of the City Assembly.

Let's try to determine how the individual profile of Deputy K-v correlates with the reference profile.

Table 6.6

Averaged reference ratings of voters (n=77) and individual indicators of Deputy K-v on 18 personal qualities of express video diagnostics

Columns: name of quality; averaged reference ratings of voters; individual indicators of Deputy K-v

1. General level of culture

2. Learnability

4. Ability to create something new

5. Self-criticism

6. Responsibility

7. Autonomy

8. Energy, activity

9. Purposefulness

10. Endurance, self-control

11. Fortitude

12. Personal maturity

13. Integrity

14. Humanism

15. Ability to communicate with people

16. Tolerance for other people's opinions

17. Flexibility of behavior

18. Ability to make a favorable impression

Table 6.7

Calculation of d² for Spearman's rank correlation coefficient between the reference and individual profiles of a deputy's personal qualities

Columns: name of quality; Row 1: rank of the quality in the reference profile; Row 2: rank of the quality in the individual profile; d²

1 Responsibility

2 Integrity

3 Ability to communicate with people

4 Endurance, self-control

5 General level of culture

6 Energy, activity

8 Self-criticism

9 Autonomy

10 Personal maturity

11 Purposefulness

12 Learnability

13 Humanism

14 Tolerance for other people's opinions

15 Fortitude

16 Flexibility of behavior

17 Ability to make a favorable impression

18 Ability to create something new

As can be seen from Table 6.6, the voters' ratings and the deputy's individual indicators vary over different ranges. Indeed, the voters' ratings were obtained on a 10-point scale, whereas the individual indicators of express video diagnostics are measured on a 20-point scale. Ranking lets us convert both measurement scales to a single scale in which the unit of measurement is 1 rank and the maximum value is 18 ranks.

Ranking, as we remember, must be done separately for each series of values. In this case, it is advisable to assign a lower rank to a higher value, so that you can immediately see in what place in terms of significance (for voters) or in terms of severity (for a deputy) this or that quality is located.
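Ranking in this direction (rank 1 for the highest value) can be sketched in Python as follows. The function name `ranks_descending` and both data series are hypothetical; they only illustrate how ratings on a 10-point scale and scores on a 20-point scale end up on the same rank scale.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; tied values share the mean of their positions."""
    ranks = []
    for v in values:
        greater = sum(1 for other in values if other > v)
        equal = sum(1 for other in values if other == v)
        # Tied values occupy positions greater+1 .. greater+equal; take their mean.
        ranks.append(greater + (equal + 1) / 2)
    return ranks


voter_means = [9.1, 7.4, 8.3, 7.4]  # hypothetical averages on a 10-point scale
deputy_scores = [17, 12, 19, 15]    # hypothetical scores on a 20-point scale

print(ranks_descending(voter_means))    # [1.0, 3.5, 2.0, 3.5]
print(ranks_descending(deputy_scores))  # [2.0, 4.0, 1.0, 3.0]
```

Each series is ranked separately, so the different original scales no longer matter once the ranks are compared.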

The ranking results are presented in Table 6.7. The qualities are listed in a sequence that reflects the reference profile.

Let's formulate hypotheses.

H0: The correlation between the individual profile of Deputy K-v and the reference profile built from the voters' ratings does not differ from zero.

H1: The correlation between the individual profile of Deputy K-v and the reference profile built from the voters' ratings is statistically significantly different from zero.

Since both compared rank series contain groups of identical ranks, before calculating the rank correlation coefficient it is necessary to compute the corrections for identical ranks T_a and T_b:

$$T_a = \frac{\sum (a^3 - a)}{12}, \qquad T_b = \frac{\sum (b^3 - b)}{12}$$

where a is the size of each group of identical ranks in rank series A,

b is the size of each group of identical ranks in rank series B.

In this case, series A (the reference profile) contains one group of identical ranks: the qualities "Learnability" and "Humanism" share the rank 12.5; hence a=2.

T_a = (2³ - 2)/12 = 0.50.

Series B (the individual profile) contains two groups of identical ranks, with b_1 = 2 and b_2 = 2.

T_b = [(2³ - 2) + (2³ - 2)]/12 = 1.00
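The two corrections can be checked with a short Python sketch. It assumes the correction has the form T = Σ(t³ - t)/12 over the sizes t of the groups of identical ranks, which agrees with the arithmetic above; the function name `tie_correction` is an illustrative choice.

```python
def tie_correction(group_sizes):
    """T = sum(t^3 - t) / 12 over the sizes t of each group of identical ranks."""
    return sum(t ** 3 - t for t in group_sizes) / 12


t_a = tie_correction([2])     # one group of two identical ranks in series A
t_b = tie_correction([2, 2])  # two groups of two identical ranks in series B
print(t_a, t_b)  # 0.5 1.0
```

A series without identical ranks has no groups, so its correction is 0.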

To calculate the empirical value of r_s, we use the formula

$$r_s = 1 - \frac{6 \left( \sum d^2 + T_a + T_b \right)}{N (N^2 - 1)}$$

In this case:

Note that if we had not introduced the correction for identical ranks, the value of r_s would be only slightly (by 0.0002) higher:

With a large number of identical ranks, the changes in r_s can turn out to be much more substantial. The presence of identical ranks means a lower degree of differentiation of the ordered variables and, consequently, less opportunity to assess the degree of connection between them (Sukhodolsky G.V., 1972, p. 76).

From Table XVI of Appendix 1 we determine the critical values of r_s at N=18:

Answer: H0 is rejected. The correlation between the individual profile of Deputy K-v and the reference profile reflecting the voters' requirements is statistically significant (p<0.05) and is positive.

From Table 6.7 it can be seen that Deputy K-v has a lower rank on the scale Ability to communicate with people and higher ranks on the scales Purposefulness and Fortitude than the voters' standard prescribes. These discrepancies mainly explain the somewhat reduced value of r_s obtained.

Let us formulate a general algorithm for calculating r_s.

Spearman's rank correlation method allows you to determine the tightness (strength) and direction of the correlation between two features or two profiles (hierarchies) of features.

To calculate the rank correlation, it is necessary to have two series of values that can be ranked. These series of values can be:

1) two features measured in the same group of subjects;

2) two individual hierarchies of features identified in two subjects for the same set of features;

3) two group hierarchies of features;

4) an individual and a group hierarchy of features.

First, the indicators are ranked separately for each of the features.

As a rule, a lower value of a feature is assigned a lower rank.

In the first case (two features), the individual values on the first feature obtained by the different subjects are ranked, and then the individual values on the second feature.

If two features are positively related, then subjects with low ranks on one of them will have low ranks on the other, and subjects with high ranks on one feature will also have high ranks on the other. To calculate r_s, it is necessary to determine the difference (d) between the ranks obtained by a given subject on the two features. These values of d are then transformed in a certain way and subtracted from 1. The smaller the differences between the ranks, the larger r_s will be and the closer it will be to +1.

If there is no correlation, all the ranks will be mixed up and there will be no correspondence between them. The formula is designed so that in this case r_s will be close to 0.

In the case of a negative correlation, low ranks of subjects on one feature will correspond to high ranks on the other feature, and vice versa. The greater the discrepancy between the ranks of the subjects on the two variables, the closer r_s is to -1.
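These three situations can be illustrated with a small Python sketch that applies the formula r_s = 1 - 6Σd² / (N(N² - 1)) to ready-made ranks without ties. The rank series are hypothetical and the helper name `spearman_from_ranks` is an assumption made for the example.

```python
def spearman_from_ranks(rank_a, rank_b):
    """r_s = 1 - 6 * sum(d^2) / (N * (N^2 - 1)) for two rank series without ties."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


ranks = [1, 2, 3, 4, 5]
identical = spearman_from_ranks(ranks, ranks)           # ranks coincide
opposite = spearman_from_ranks(ranks, ranks[::-1])      # ranks fully reversed
mixed = spearman_from_ranks(ranks, [2, 5, 3, 1, 4])     # ranks mixed up

print(identical, opposite, mixed)  # 1.0 -1.0 0.0
```

Coinciding ranks give +1, fully reversed ranks give -1, and this particular mixed ordering gives exactly 0.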

In the second case (two individual profiles), the individual values obtained by each of the 2 subjects on a certain set of features (the same for both) are ranked. The feature with the lowest value receives the first rank, the feature with the next higher value receives the second rank, and so on. Obviously, all features must be measured in the same units; otherwise ranking is impossible. For example, indicators on the Cattell Personality Questionnaire (16PF) cannot be ranked if they are expressed in "raw" scores, since the ranges of values differ across factors: from 0 to 13, from 0 to 20, and from 0 to 26. We cannot say which factor takes first place in terms of expression until we bring all the values to a single scale (most often the sten scale).

If the individual hierarchies of two subjects are positively related, then the features that have low ranks for one of them will have low ranks for the other, and vice versa. For example, if factor E (dominance) has the lowest rank for one subject, it should also have a low rank for the other; if factor C (emotional stability) has the highest rank for one subject, it should also have a high rank for the other, and so on.

In the third case (two group profiles), the group mean values obtained in 2 groups of subjects on a certain set of features, the same for both groups, are ranked. From here on, the line of reasoning is the same as in the previous two cases.

In the fourth case (individual and group profiles), the individual values of the subject and the group mean values on the same set of features are ranked separately. The group means are obtained, as a rule, with this individual subject excluded: he does not contribute to the group mean profile with which his individual profile will be compared. Rank correlation makes it possible to check how consistent the individual and group profiles are.

In all four cases, the significance of the obtained correlation coefficient is determined by the number of ranked values ​​N. In the first case, this number will coincide with the sample size n. In the second case, the number of observations will be the number of features that make up the hierarchy. In the third and fourth cases, N is also the number of compared features, and not the number of subjects in the groups. Detailed explanations are given in the examples. If the absolute value of rs reaches or exceeds a critical value, the correlation is significant.

Hypotheses.

There are two possible hypotheses. The first refers to case 1, the second to the other three cases.

The first version of hypotheses

H0: The correlation between variables A and B is not different from zero.

H1: The correlation between variables A and B is significantly different from zero.

The second version of the hypotheses

H0: Correlation between hierarchies A and B is not different from zero.

H1: The correlation between hierarchies A and B is significantly different from zero.

Limitations of the rank correlation coefficient

1. At least 5 observations must be available for each variable. The upper limit of the sample is determined by the available tables of critical values.

2. When one or both of the compared variables contain a large number of identical ranks, Spearman's rank correlation coefficient r_s gives coarsened values. Ideally, both correlated series should consist of non-repeating values. If this condition is not met, a correction for identical ranks must be made.

Spearman's rank correlation coefficient is calculated by the formula:

$$r_s = 1 - \frac{6 \sum d^2}{N (N^2 - 1)}$$

If both compared rank series contain groups of identical ranks, then before calculating the rank correlation coefficient it is necessary to compute the corrections for identical ranks T_a and T_b:

$$T_a = \frac{\sum (a^3 - a)}{12}, \qquad T_b = \frac{\sum (b^3 - b)}{12}$$

where a is the size of each group of identical ranks in rank series A, and b is the size of each group of identical ranks in rank series B.

To calculate the empirical value of r_s, use the formula:

$$r_s = 1 - \frac{6 \left( \sum d^2 + T_a + T_b \right)}{N (N^2 - 1)}$$

Calculation of Spearman's rank correlation coefficient rs

1. Determine which two features or two hierarchies of features will take part in the comparison as variables A and B.

2. Rank the values of variable A, assigning rank 1 to the smallest value, in accordance with the ranking rules (see A.2.3). Enter the ranks in the first column of the table in the order of the numbers of the subjects or features.

3. Rank the values of variable B in accordance with the same rules. Enter the ranks in the second column of the table in the order of the numbers of the subjects or features.

4. Calculate the difference d between the ranks of A and B in each row of the table and enter it in the third column.

5. Square each difference: d². Enter these values in the fourth column of the table.

6. Calculate the sum of the squares, Σd².

7. If there are identical ranks, calculate the corrections:

$$T_a = \frac{\sum (a^3 - a)}{12}, \qquad T_b = \frac{\sum (b^3 - b)}{12}$$

where a is the size of each group of identical ranks in rank series A, and b is the size of each group of identical ranks in rank series B.

8. Calculate the rank correlation coefficient r_s by the formula:

a) in the absence of identical ranks

$$r_s = 1 - \frac{6 \sum d^2}{N (N^2 - 1)}$$

b) in the presence of identical ranks

$$r_s = 1 - \frac{6 \left( \sum d^2 + T_a + T_b \right)}{N (N^2 - 1)}$$

where Σd² is the sum of squared differences between ranks; T_a and T_b are the corrections for identical ranks; N is the number of subjects or features that participated in the ranking.

9. Determine from the table (see Appendix 4.3) the critical values of r_s for the given N. If r_s is greater than or equal to the critical value, the correlation is significantly different from 0.
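The calculation steps above (ranking, differences, squares, corrections for identical ranks, and the final formula) can be put together in one short Python sketch. It is an illustration under stated assumptions, not the author's implementation: tied values are assumed to share the mean of their positions, the function names are hypothetical, and the final comparison with the tabulated critical value is left to the reader.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; tied values share the mean of the positions they occupy."""
    ranks = []
    for v in values:
        smaller = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)
        ranks.append(smaller + (equal + 1) / 2)
    return ranks


def tie_correction(ranks):
    """T = sum(t^3 - t) / 12 over groups of t identical ranks."""
    return sum(t ** 3 - t for t in Counter(ranks).values() if t > 1) / 12


def spearman_rs(a, b):
    """r_s = 1 - 6 * (sum(d^2) + T_a + T_b) / (N * (N^2 - 1))."""
    if len(a) != len(b):
        raise ValueError("both series must have the same length")
    n = len(a)
    rank_a, rank_b = average_ranks(a), average_ranks(b)          # ranking
    sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b))  # sum of d^2
    t_a, t_b = tie_correction(rank_a), tie_correction(rank_b)    # corrections
    return 1 - 6 * (sum_d2 + t_a + t_b) / (n * (n ** 2 - 1))


# Hypothetical data: series A contains one pair of equal values.
print(round(spearman_rs([1, 2, 2, 3], [1, 2, 3, 4]), 2))  # 0.9
```

When neither series contains identical ranks, both corrections are 0 and the function reduces to the formula for case a).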

Example 4.1. To determine how alcohol consumption affects the oculomotor reaction, data were obtained in the test group before and after drinking alcohol. Does the subject's reaction depend on the state of intoxication?

Experiment results:

Before: 16, 13, 14, 9, 10, 13, 14, 14, 18, 20, 15, 10, 9, 10, 16, 17, 18.

After: 24, 9, 10, 23, 20, 11, 12, 19, 18, 13, 14, 12, 14, 7, 9, 14.

Let's formulate hypotheses:

H0: the correlation between the oculomotor reaction indicators before and after drinking alcohol does not differ from zero.

H1: the correlation between the oculomotor reaction indicators before and after drinking alcohol is significantly different from zero.

Table 4.1. Calculation of d² for Spearman's rank correlation coefficient r_s when comparing the parameters of the oculomotor reaction before and after the experiment (N=17)

Since we have repeated ranks, in this case we apply the formula with the correction for identical ranks:

T_a = ((2³ - 2) + (3³ - 3) + (2³ - 2) + (3³ - 3) + (2³ - 2) + (2³ - 2))/12 = 6

T_b = ((2³ - 2) + (2³ - 2) + (3³ - 3))/12 = 3
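Both corrections can be reproduced from the listed measurements with a short Python sketch, since equal values receive equal ranks. The function name is illustrative. Note that the "After" series as printed contains 16 values although N=17 is stated; the sketch uses the series exactly as printed, and the groups of repeated values it finds agree with the corrections above.

```python
from collections import Counter


def tie_correction(values):
    """T = sum(t^3 - t) / 12 over every group of t equal values."""
    return sum(t ** 3 - t for t in Counter(values).values()) / 12


before = [16, 13, 14, 9, 10, 13, 14, 14, 18, 20, 15, 10, 9, 10, 16, 17, 18]
after = [24, 9, 10, 23, 20, 11, 12, 19, 18, 13, 14, 12, 14, 7, 9, 14]  # as printed

print(tie_correction(before))  # 6.0
print(tie_correction(after))   # 3.0
```

Values that occur only once contribute 1³ - 1 = 0, so they do not affect the result.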

Find the empirical value of the Spearman coefficient:

r_s = 1 - 6·(767.75 + 6 + 3)/(17·(17² - 1)) = 0.05
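The arithmetic can be verified with a few lines of Python. All numbers are taken from the calculation above; only the variable names are assumptions.

```python
sum_d2 = 767.75  # sum of squared rank differences, as given in the text
t_a, t_b = 6, 3  # corrections for identical ranks, as given in the text
n = 17           # number of ranked values

rs = 1 - 6 * (sum_d2 + t_a + t_b) / (n * (n ** 2 - 1))
print(round(rs, 3))  # 0.048
print(round(rs, 2))  # 0.05
```

The unrounded value is about 0.048, which rounds to the 0.05 reported in the text.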

From the table (Appendix 4.3) we find the critical values of the correlation coefficient:

0.48 (p ≤ 0.05)

0.62 (p ≤ 0.01)

We get

r_s = 0.05 < r_cr(0.05) = 0.48

Conclusion: hypothesis H1 is rejected and H0 is accepted, i.e., the correlation between the oculomotor reaction indicators before and after alcohol consumption does not differ from zero.
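The decision rule used in this conclusion (compare the absolute value of r_s with the tabulated critical values) can be sketched in Python. The function name `decide` and its return strings are hypothetical; the critical values 0.48 and 0.62 are the ones quoted above for this example.

```python
def decide(rs, critical_05, critical_01):
    """Compare |r_s| with the tabulated critical values for the given N."""
    magnitude = abs(rs)  # the sign of r_s does not affect significance
    if magnitude >= critical_01:
        return "H0 rejected (p <= 0.01)"
    if magnitude >= critical_05:
        return "H0 rejected (p <= 0.05)"
    return "H0 accepted"


print(decide(0.05, 0.48, 0.62))  # H0 accepted
```

The critical values depend on N and must be looked up in the table each time; the sketch does not compute them.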