
Statistical psychology. Methods of mathematical statistics in psychology

The word "statistics" is often associated with the word "mathematics", and this intimidates students who associate this concept with complex formulas that require a high level of abstraction.

However, as McConnell says, statistics is primarily a way of thinking, and all you need to use it is a little common sense and a knowledge of basic mathematics. In our daily life we are constantly engaged in statistics without realizing it. Whether we want to plan a budget, calculate a car's gasoline consumption, estimate the effort required to master a course given the marks obtained so far, predict the likelihood of good or bad weather from a forecast, or simply estimate how a particular event will affect our personal or collective future - we constantly have to select, classify, and organize information and connect it with other data so as to draw conclusions that allow us to make the right decision.

All these activities differ little from the operations that underlie scientific research: synthesizing data obtained on various groups of objects in an experiment, comparing these groups to find the differences between them, comparing them to identify indicators that change in the same direction, and, finally, predicting certain facts from the conclusions the results lead to. This is precisely the purpose of statistics in the sciences in general, and especially in the humanities. Nothing in the latter is absolutely certain, and without statistics the conclusions would in most cases be purely intuitive and could not form a solid basis for interpreting data obtained in other studies.

In order to appreciate the enormous benefits that statistics can provide, we will try to follow the progress of deciphering and processing the data obtained in the experiment. Thus, based on the specific results and the questions that they pose to the researcher, we will be able to understand the various methods and simple ways to apply them. However, before embarking on this work, it will be useful for us to consider in the most general terms the three main branches of statistics.

1. Descriptive statistics, as the name suggests, allows one to describe, summarize, and reproduce in the form of tables or graphs the data of a given distribution, and to calculate the distribution's mean, range, and dispersion.

2. The task of inductive statistics is to check whether the results obtained on a given sample can be extended to the entire population from which the sample was drawn. In other words, the rules of this branch of statistics make it possible to find out to what extent a regularity discovered by studying a limited group of objects in an observation or experiment can be generalized, by induction, to a larger number of objects. Thus, with the help of inductive statistics, conclusions and generalizations are drawn from the data obtained by studying the sample.

3. Finally, the measurement of correlation allows us to find out how closely two variables are related, so that we can predict the possible values of one of them if we know the other.

There are two types of statistical methods, or tests, for generalizing or for calculating the degree of correlation. The first type comprises the most widely used parametric methods, which rely on parameters such as the mean or the variance of the data. The second comprises nonparametric methods, which provide an invaluable service when the researcher is dealing with very small samples or with qualitative data; these methods are very simple in terms of both calculation and application. As we become familiar with the different ways of describing data and move on to their statistical analysis, we will look at both varieties.

As already mentioned, in order to understand these various areas of statistics, we will try to answer the questions that arise from the results of a particular study. As an example we will take one experiment: a study of the effect of marijuana consumption on oculomotor coordination and reaction time. The methodology used in this hypothetical experiment, as well as the results we could obtain from it, are presented below.

If you wish, you can replace some specific details of this experiment with others - for example, replace marijuana use with alcohol consumption or sleep deprivation - or, better still, substitute data you have actually obtained in your own research for these hypothetical data. In any case, you will have to accept the "rules of our game" and perform the calculations required here; only under this condition will the essence of the subject "get through" to you, if that has not already happened before.

An important note. In the sections on descriptive and inductive statistics we will consider only the experimental data relevant to the dependent variable "targets hit". We will turn to an indicator such as reaction time only in the section on calculating correlation. However, it goes without saying that from the very beginning the values of this indicator should be treated in the same way as the variable "targets hit". We leave it to readers to do this on their own with pencil and paper.

Some basic concepts. Population and sample

One of the tasks of statistics is to analyze data obtained from a part of a population in order to draw conclusions about the population as a whole.

A population in statistics does not necessarily mean a group of people or a natural community; the term refers to all the beings or objects that form the overall set under study, whether they are atoms or students visiting a particular cafe.

A sample is a relatively small number of elements selected by scientific methods so that it is representative, i.e. reflects the population as a whole.

(In the Russian-language literature the terms "general population" and "sample population", respectively, are more common. - Translator's note.)

Data and its varieties

Data in statistics are the basic elements to be analyzed. Data can be any quantitative results, properties inherent in certain members of a population, a place in a particular sequence - in general, any information that can be classified or categorized for processing.

"Data" should not be confused with the "values" that data can take. In order to always distinguish between them, Chatillon (1977) recommends remembering the following phrase: “Data often takes on the same values” (so if we take, for example, six data - 8, 13, 10, 8, 10 and 5, they take only four different values ​​- 5, 8, 10 and 13).

Constructing a distribution means dividing the primary data obtained from the sample into classes or categories in order to obtain a generalized, ordered picture that allows them to be analyzed.

There are three types of data:

1. Quantitative data, obtained through measurements (for example, data on weight, dimensions, temperature, time, or test results). They can be distributed along a scale with equal intervals.

2. Ordinal data, corresponding to the places of elements in a sequence obtained by arranging them in ascending order (1st, ..., 7th, ..., 100th, ...; A, B, C, ...).

3. Qualitative data, representing some properties of the elements of the sample or population. They cannot be measured, and their only quantitative assessment is the frequency of occurrence (the number of persons with blue or green eyes, smokers and non-smokers, tired and rested, strong and weak, etc.).

Of all these types of data, only quantitative data can be analyzed using methods based on parameters (such as the arithmetic mean). But even for quantitative data, such methods can be applied only if there are enough of these data to show a normal distribution. So, in principle, three conditions are necessary for the use of parametric methods: the data must be quantitative, there must be enough of them, and their distribution must be normal. In all other cases it is recommended to use nonparametric methods.
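
As an illustration of these conditions, here is a minimal Python sketch using scipy; the sample data, the minimum-size threshold, and the choice of the Shapiro-Wilk test as a normality check are all assumptions for illustration, not prescribed by the text:

```python
from scipy import stats

sample = [12, 9, 11, 8, 10, 13, 9, 10, 11, 12, 10, 9]  # hypothetical quantitative data

stat, p = stats.shapiro(sample)      # Shapiro-Wilk test of normality
if len(sample) >= 10 and p > 0.05:   # size and normality checks, illustrative thresholds
    print("Parametric methods look admissible")
else:
    print("Prefer nonparametric methods")
```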

Multivariate statistical methods make it possible, among the many possible probabilistic-statistical models, to choose on sound grounds the one that best corresponds to the initial statistical data characterizing the real behavior of the set of objects under study, and to assess the reliability and accuracy of conclusions drawn from limited statistical material. The manual discusses the following methods of multivariate statistical analysis: regression analysis, factor analysis, and discriminant analysis. The structure of the application software package "Statistica" is outlined, along with the implementation of these methods in the package.

Release year : 2007
Author : Bureeva N.N.
Genre : Tutorial
Publisher: Nizhny Novgorod


The textbook discusses the use of the STATISTICA application software package to implement statistical methods for analyzing empirical distributions and for conducting sample statistical observation in an amount sufficient to solve a wide range of practical problems. It is recommended for day and evening students of the Faculty of Economics and Management studying the discipline "Statistics". The manual can also be used by undergraduate and graduate students, scientists, and practitioners who need statistical methods for processing initial data. It contains information on the STATISTICA package that has not been published in Russian.

Release year : 2009
Author : Kuprienko N.V., Ponomareva O.A., Tikhonov D.V.
Genre : Help
Publisher: SPb.: Izd-vo Politekhn. university


The book is a first step toward getting acquainted with the STATISTICA program for statistical data analysis in the Windows environment. STATISTICA (from StatSoft Inc., USA) holds a steady leading position among statistical data processing programs and has more than 250,000 registered users worldwide.

Using simple examples accessible to everyone (descriptive statistics, regression, discriminant analysis, etc.) drawn from various spheres of life, the book shows the system's data-processing capabilities. The appendix contains brief materials on the toolbar, the STATISTICA BASIC language, etc. The book is addressed to the widest range of readers working on personal computers and is accessible even to high school students.


The official manual for the STATISTICA 6 program. Very large and detailed, it is useful as a reference and can also serve as a textbook. For serious work with STATISTICA the manual is indispensable.
Volume I: Basic Agreements and Statistics I
Volume II: Graphics
Volume III: Statistics II
Details in the file with a table of contents.


The manual contains a complete description of the STATISTICA® system.
The guide consists of five volumes:
Volume I: AGREEMENTS AND STATISTICS I
Volume II: GRAPHICS
Volume III: STATISTICS II
Volume IV: INDUSTRIAL STATISTICS
Volume V: LANGUAGES: BASIC and SCL
The distribution includes the first three volumes.


The book presents neural network methods of data analysis based on the Statistica Neural Networks package (from StatSoft), fully adapted for the Russian user. It sets out the foundations of the theory of neural networks and pays much attention to solving practical problems, comprehensively covering the methodology and technology of research with Statistica Neural Networks - a powerful tool for data analysis and forecasting with wide applications in business, industry, management, and finance. The book contains many examples of data analysis and practical recommendations for analysis, forecasting, classification, pattern recognition, and production process control using neural networks.

For a wide range of readers engaged in research in banking, industry, economics, business, exploration, management, transport, and other areas.


The book is devoted to the theory and practice of studying the foundations of mathematical statistics and to the pedagogical problems that arise in the learning process. Experience in using information technologies in the study of this discipline is presented.

The publication may be useful to students, graduate students and teachers of medical colleges and universities.


The book covers the most important elements of probability theory, the basic concepts of mathematical statistics, some sections of experiment planning, and applied statistical analysis in the sixth version of the Statistica program. A large number of examples promotes more effective assimilation of the material and the acquisition of skills in working with the Statistica package.
The publication has practical significance: it supports the educational process and research work at the university at a level corresponding to modern information technologies, and it provides fuller, more effective assimilation of knowledge in applied statistical data analysis, which helps improve the quality of the educational process in higher education.

Addressed to students, graduate students, researchers, teachers of medical universities, biological faculties. It will be useful and interesting for representatives of other natural science and technical specialties.


This tutorial describes the Russian version of STATISTICA.

In addition to the general principles of working in the system and evaluating the statistical characteristics of indicators, the manual discusses in detail the stages of correlation, regression, and variance analyses, as well as multivariate classification. The description is accompanied by step-by-step instructions and illustrative examples, which makes the material accessible even to insufficiently trained users.

The textbook is intended for students, graduate students and researchers interested in statistical computer research.


The book describes practical methods and techniques of forecasting in the STATISTICA system under Windows, together with the theoretical foundations, supplemented by a variety of practical examples. In the second edition (the first appeared in 1999), Part 1 has been substantially revised: all dialog boxes related to forecasting in STATISTICA 6.0 have been recreated and described, and the automation of decisions using the STATISTICA Visual Basic language is shown. Part 2 presents the basics of the statistical theory of forecasting.

For students, analysts, marketers, economists, actuaries, financiers, scientists who use forecasting methods in their daily activities.


The book is a teaching aid on probability theory, statistical methods, and operations research. It provides the necessary theoretical information and considers in detail the solution of applied statistics problems using the Statistica package. The basics of the simplex method are outlined, and the solution of operations-research problems with the Excel package is considered. Assignment variants and methodological guidance on the main sections of statistics and operations research are given.

The book is addressed to everyone who needs to apply statistical methods in their work, as well as to teachers and students studying statistics and methods of operations research.

Mathematical methods in psychology are used to process research data and to establish patterns between the phenomena under study. Even the simplest research is not complete without mathematical data processing.

Data processing can be carried out manually or with special software. The final result may take the form of a table; the methods also allow the data obtained to be displayed graphically. Different assessment tools are used for different types of data (quantitative, qualitative, and ordinal).

Mathematical methods in psychology include both methods for establishing numerical dependencies and methods of statistical processing. Let us take a closer look at the most common of them.

To measure data, it is first necessary to determine the scale of measurement. Here such mathematical methods as registration and scaling are used, which consist in expressing the phenomena under study in numerical terms. There are several types of scales, but only some of them are suitable for mathematical processing. The main one is the quantitative scale, which allows one to measure the degree to which specific properties are expressed in the objects under study and to express the differences between them numerically. The simplest example is the measurement of intelligence quotient. The quantitative scale allows the operation of ranking data (see below). When data from a quantitative scale are ranked, they are converted into a nominal scale (for example, a low, medium, or high value of the indicator), and the reverse transition is no longer possible.

Ranking is the arrangement of data in descending (or ascending) order of the feature being evaluated, using a quantitative scale. Each value is assigned a rank (the indicator with the minimum value receives rank 1, the next value rank 2, and so on), after which the values can be transferred from the quantitative scale to the nominal one. For example, suppose the measured indicator is the level of anxiety: 100 people are tested, the results are ranked, and the researcher sees how many people have a low, medium, or high score. However, this way of presenting data entails a partial loss of information about each respondent.
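
A minimal Python sketch of this procedure, with ten hypothetical anxiety scores instead of a hundred and purely illustrative cut-offs for the low/medium/high categories:

```python
from scipy.stats import rankdata

scores = [34, 21, 45, 21, 50, 38, 27, 33, 41, 29]   # hypothetical anxiety scores
ranks = rankdata(scores)             # rank 1 = minimum value; ties are averaged

def category(score, low=30, high=40):  # cut-offs chosen purely for illustration
    if score < low:
        return "low"
    return "high" if score > high else "medium"

labels = [category(s) for s in scores]
print(list(zip(scores, ranks, labels)))  # per-respondent detail is lost in the labels
```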

Correlation analysis establishes a relationship between phenomena. It measures how one indicator changes when the indicator related to it changes. Correlation is considered in two aspects: strength and direction. It can be positive (as one indicator increases, so does the second) or negative (as the first increases, the second decreases: for example, the higher an individual's level of anxiety, the less likely he is to take a leading position in the group). The relationship can be linear or, more often, curvilinear. The relationships that correlation analysis helps to establish may not be obvious at first glance when other methods of mathematical processing are used; this is its main merit. Its disadvantages include high labor intensity, owing to the considerable number of formulas and careful calculations required.
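
The sketch below shows such an analysis on hypothetical data, pairing anxiety scores with a leadership rating so that the expected correlation is negative; scipy's pearsonr carries out the calculations that would otherwise be laborious by hand:

```python
from scipy.stats import pearsonr

anxiety    = [34, 21, 45, 50, 38, 27, 33, 41, 29, 25]   # hypothetical scores
leadership = [ 5,  8,  3,  2,  4,  7,  5,  3,  6,  8]   # hypothetical ratings

r, p = pearsonr(anxiety, leadership)
print(f"r = {r:.2f}, p = {p:.3f}")   # sign gives direction, |r| gives strength
```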

Factor analysis is another method, one that makes it possible to predict the likely influence of various factors on the process under study. All factors of influence are initially taken as being of equal value, and the degree of their influence is then calculated mathematically. Such an analysis makes it possible to establish a common cause for the variability of several phenomena at once.
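
A minimal sketch of factor analysis using scikit-learn on synthetic data; the single common factor, the loadings, and the noise level are assumptions made purely for illustration:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
factor = rng.normal(size=(100, 1))                 # one hidden common cause
loadings = rng.normal(size=(1, 4))                 # how it drives 4 indicators
X = factor @ loadings + 0.5 * rng.normal(size=(100, 4))  # observed variables

fa = FactorAnalysis(n_components=1).fit(X)
print(fa.components_)   # estimated loadings: the factor's influence on each variable
```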

To display the data obtained, tabulation (creating tables) and graphic construction (diagrams and graphs that not only give a visual representation of the results but also allow the course of the process to be predicted) can be used.

The main conditions under which these mathematical methods ensure the reliability of a study are a sufficient sample, accurate measurements, and correct calculations.

Statistics in psychology

The first use of statistics in psychology is often associated with the name of Sir Francis Galton. In psychology, "statistics" refers to the use of quantitative measures and methods to describe and analyze the results of psychological research. Statistics is necessary to psychology as a science. The recording, description, and analysis of quantitative data allow valid comparisons based on objective criteria. Statistics as used in psychology usually comprises two branches: descriptive statistics and the theory of statistical inference.

Descriptive statistics.

Descriptive statistics includes methods for organizing, summarizing, and describing data. Descriptive measures allow large datasets to be represented quickly and efficiently. The most commonly used descriptive methods are frequency distributions, measures of central tendency, and measures of relative position. Regression and correlation are used to describe relationships between variables.

A frequency distribution shows how many times each qualitative or quantitative indicator (or interval of such indicators) occurs in the data. In addition, relative frequencies - the percentage of responses of each type - are often given. A frequency distribution provides a quick insight into the structure of the data that would be difficult to achieve by working directly with the raw data. Various types of graphs are often used to visualize frequency data.
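
A minimal Python sketch of a frequency distribution with relative frequencies, using hypothetical survey responses:

```python
from collections import Counter

responses = ["yes", "no", "yes", "yes", "no", "undecided", "yes", "no"]
freq = Counter(responses)
n = len(responses)

for value, count in freq.most_common():
    print(f"{value:10s} {count:3d} {100 * count / n:5.1f}%")  # absolute and relative
```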

Measures of central tendency are summary statistics that describe what is typical of a distribution. The mode is defined as the most frequently occurring observation (value, category, etc.). The median is the value that bisects the distribution, so that one half includes all values above it and the other half all values below it. The mean is calculated as the arithmetic average of all observed values. Which of the measures - mode, median, or mean - best describes a distribution depends on its shape. If the distribution is symmetric and unimodal (having one mode), the mean, median, and mode simply coincide. The mean is particularly affected by "outliers", which shift its value toward the extremes of the distribution, making the arithmetic mean the least useful measure for highly skewed distributions.

Other useful descriptive characteristics of distributions are measures of variability, i.e., the extent to which the values of a variable in a series differ. Two distributions can have the same means, medians, and modes yet differ significantly in the degree of variability of their values. Variability is assessed by two statistics: the variance and the standard deviation.
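
The measures described in the last two paragraphs can be computed with Python's standard library; the data are hypothetical:

```python
import statistics

data = [10, 12, 9, 11, 10, 13, 10, 8, 11, 10]   # hypothetical observations

print("mode     =", statistics.mode(data))
print("median   =", statistics.median(data))
print("mean     =", statistics.mean(data))
print("variance =", statistics.variance(data))   # sample variance
print("std dev  =", statistics.stdev(data))      # sample standard deviation
```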

Measures of relative position include percentiles and normalized scores, used to describe the location of a particular value of a variable relative to the rest of the values in the distribution. Welkowitz et al. define a percentile as "a number indicating the percentage of cases in a particular reference group with equal or lower scores." Thus a percentile provides more accurate information than simply reporting that a given value of a variable lies above or below the mean, median, or mode.
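
A minimal sketch of a percentile rank in the sense just quoted, computed for one score within a hypothetical reference group:

```python
from scipy.stats import percentileofscore

reference = [55, 60, 62, 65, 68, 70, 72, 75, 80, 85]   # hypothetical scores
# kind="weak" counts cases with equal or lower scores, matching the definition
print(percentileofscore(reference, 72, kind="weak"))   # 80.0
```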

Normalized scores (commonly called z-scores) express deviation from the mean in units of standard deviation (σ). Normalized scores are useful because they can be interpreted relative to the standardized normal distribution (z-distribution) - a symmetrical bell-shaped curve with known properties: a mean of 0 and a standard deviation of 1. Since a z-score carries a sign (+ or -), it immediately shows whether the observed value lies above or below the mean (m). And since the normalized score expresses values of the variable in units of standard deviation, it shows how rare each value is: approximately 34% of all values fall in the interval from m to m + 1σ and 34% in the interval from m to m - 1σ; 14% each in the intervals from m + 1σ to m + 2σ and from m - 1σ to m - 2σ; and 2% each in the intervals from m + 2σ to m + 3σ and from m - 2σ to m - 3σ.
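
The interval percentages quoted above can be recovered from the standard normal distribution; the sketch below also converts one hypothetical IQ-style score into a z-score:

```python
from scipy.stats import norm

m, sigma, x = 100, 15, 115        # hypothetical scale: mean 100, sd 15
z = (x - m) / sigma
print("z =", z)                   # +1.0: one standard deviation above the mean

print(norm.cdf(1) - norm.cdf(0))  # ~0.34: share of values between m and m + 1*sigma
print(norm.cdf(2) - norm.cdf(1))  # ~0.14
print(norm.cdf(3) - norm.cdf(2))  # ~0.02
```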

Relationships between variables. Regression and correlation are among the methods most often used to describe relationships between variables. Two different measurements obtained for each sample element can be displayed as points in a Cartesian coordinate system (x, y) - a scatterplot, which is a graphical representation of the relationship between the measurements. Often these points form an almost straight line, indicating a linear relationship between the variables. To obtain the regression line - the mathematical equation of the line of best fit through the scatterplot points - numerical methods are used. Once the regression line is derived, it becomes possible to predict the values of one variable from known values of the other and, moreover, to evaluate the accuracy of the prediction.

The correlation coefficient (r) is a quantitative index of the closeness of the linear relationship between two variables. Methods for calculating correlation coefficients eliminate the problem of comparing different units of measurement. Values of r range from -1 to +1. The sign reflects the direction of the relationship: a negative correlation means an inverse relationship, in which the values of one variable decrease as the values of the other increase, while a positive correlation indicates a direct relationship, in which the values of both variables increase together. The absolute value of r shows the strength (closeness) of the relationship: r = ±1 means a perfect linear relationship, and r = 0 indicates the absence of a linear relationship. The value of r² shows the proportion of variance in one variable that can be explained by variation in the other. Psychologists use r² to evaluate the predictive utility of a particular measure.
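
Both ideas come together in scipy's linregress, which fits the regression line and reports r in a single call; the data below are hypothetical:

```python
from scipy.stats import linregress

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # hypothetical predictor
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]      # hypothetical outcome

fit = linregress(x, y)
print(f"y = {fit.slope:.2f}x + {fit.intercept:.2f}")        # regression line
print(f"r = {fit.rvalue:.3f}, r^2 = {fit.rvalue**2:.3f}")   # strength, variance explained

print("prediction at x = 7:", fit.slope * 7 + fit.intercept)
```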

Pearson's correlation coefficient (r) is intended for interval data obtained on variables assumed to be normally distributed. For other types of data there are a number of other correlation measures, for example the point-biserial correlation coefficient, the φ (phi) coefficient, and Spearman's rank correlation coefficient (ρ). Correlations are often used in psychology as a source of hypotheses for experimental research. Multiple regression, factor analysis, and canonical correlation form a related group of more advanced methods made available to practitioners by advances in computing technology. These methods allow relationships among a large number of variables to be analyzed.

Theory of statistical inference

This branch of statistics comprises methods for drawing conclusions about large groups (populations) from observations made on smaller groups, called samples. In psychology, statistical inference serves two main purposes: 1) to estimate the parameters of the population from sample statistics; 2) to assess the chances of obtaining a certain pattern of research results given the characteristics of the sample data.

The mean is the most commonly estimated population parameter. Because of the way the standard error is calculated, larger samples usually give smaller standard errors, which makes statistics calculated from larger samples somewhat more accurate estimates of population parameters. Using the standard error of the mean and normalized (standardized) probability distributions (such as the t-distribution), one can construct confidence intervals - ranges of values with known chances of containing the true population mean.
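
A minimal sketch of such a confidence interval, built from a hypothetical sample with the t-distribution:

```python
import statistics
from scipy import stats

sample = [10, 12, 9, 11, 10, 13, 10, 8, 11, 10]   # hypothetical observations
n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / n ** 0.5          # standard error of the mean

lo, hi = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
print(f"95% CI for the population mean: [{lo:.2f}, {hi:.2f}]")
```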

Evaluation of research results. The theory of statistical inference can be used to estimate the probability that particular samples belong to a known population. The process of statistical inference begins with formulating a null hypothesis (H0), which assumes that the sample statistics are drawn from a certain population. The null hypothesis is retained or rejected depending on how likely the obtained result is. If the observed differences are large relative to the amount of variability in the sample data, the researcher will usually reject the null hypothesis and conclude that there is very little chance that the observed differences are due to chance: the result is statistically significant. Computed test statistics with known probability distributions express the relationship between observed differences and variability.

Parametric statistics. Parametric statistics can be used when two requirements are met: 1) the variable under study is known, or can at least be assumed, to be normally distributed; 2) the data are interval or ratio measurements.

If the population mean and standard deviation are known (at least tentatively), the exact probability of obtaining an observed difference between the known population parameter and the sample statistic can be determined. The normalized deviation (z-score) is found and compared against the standardized normal curve (also called the z-distribution).

Because researchers often work with small samples, and because population parameters are rarely known, the standardized Student's t-distribution is used more often than the normal distribution. The exact form of the t-distribution varies with the size of the sample (more precisely, with the number of degrees of freedom, i.e., the number of values free to vary in the sample). A family of t-distributions can be used to test the null hypothesis that two samples were drawn from the same population. This null hypothesis is typical of studies with two groups of subjects, e.g., experimental and control groups.
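
A minimal sketch of this two-group test on hypothetical experimental and control scores:

```python
from scipy import stats

experimental = [12, 14, 11, 15, 13, 14, 12, 16]   # hypothetical scores
control      = [10, 11,  9, 12, 10, 11, 10, 12]

t, p = stats.ttest_ind(experimental, control)      # independent-samples t-test
print(f"t = {t:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Reject H0: the two samples likely come from different populations")
```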

When more than two groups are involved in a study, analysis of variance (the F-test) can be applied. F is a general test that evaluates the differences among all possible pairs of study groups simultaneously, comparing the variance within groups with the variance between groups. Many post hoc techniques exist for identifying which pairs account for a significant F-test.
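
A minimal sketch of a one-way analysis of variance across three hypothetical groups; f_oneway compares between-group and within-group variance:

```python
from scipy import stats

group_a = [12, 14, 11, 15, 13]    # hypothetical group scores
group_b = [10, 11,  9, 12, 10]
group_c = [16, 15, 17, 14, 16]

f, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f:.2f}, p = {p:.4f}")   # a post hoc test would locate which pairs differ
```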

Nonparametric statistics. When the requirements for a proper application of parametric tests cannot be met, or when the collected data are ordinal (rank) or nominal (categorical), nonparametric methods are used. These methods parallel parametric ones in application and purpose. Nonparametric alternatives to the t-test include the Mann-Whitney U test, the Wilcoxon (W) test, and, for nominal data, the χ² test. Nonparametric alternatives to analysis of variance include the Kruskal-Wallis and Friedman tests and the χ² test. The logic of applying each nonparametric test is the same: the corresponding null hypothesis is rejected if the computed value of the test statistic falls outside the specified critical region (i.e., proves less likely than expected).
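
A minimal sketch of two of the nonparametric alternatives named above, applied to the same kind of hypothetical data as the parametric examples:

```python
from scipy import stats

experimental = [12, 14, 11, 15, 13, 14, 12, 16]   # hypothetical scores
control      = [10, 11,  9, 12, 10, 11, 10, 12]

u, p = stats.mannwhitneyu(experimental, control)   # alternative to the t-test
print(f"U = {u:.1f}, p = {p:.4f}")

h, p = stats.kruskal([12, 14, 11], [10, 11, 9], [16, 15, 17])  # alt. to ANOVA
print(f"H = {h:.2f}, p = {p:.4f}")
```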

Since all statistical inferences are based on probability estimates, two erroneous outcomes are possible: Type I errors, in which a true null hypothesis is rejected, and Type II errors, in which a false null hypothesis is retained. The former lead to erroneous confirmation of the research hypothesis; the latter, to a failure to recognize a statistically significant result.

See also ANOVA, Central trend measures, Factor analysis, Measurement, Multivariate analysis methods, Null hypothesis testing, Probability, Statistical inference

A. Myers

See what "Statistics in psychology" is in other dictionaries:

    Contents 1 Biomedical and Life Sciences (Biomedical and Life Sciences) 2 Z ... Wikipedia

    This article contains an unfinished translation from a foreign language. You can help the project by translating it to the end. If you know what language the fragment is written in, please indicate it in this template ... Wikipedia