
Theory of pattern recognition. Methods based on assumptions about the class of decision functions

Pattern recognition theory is a field that develops the principles and methods of classifying and identifying objects, phenomena, processes, signals, and situations that can be described by a finite set of properties and features. Such problems are solved quite often, for example, when crossing a street or driving through an intersection controlled by traffic lights: recognizing the color of the lit traffic light and knowing the rules of the road allows one to make the right decision about whether or not to cross the street at the moment.

In the course of biological evolution, many animals learned to solve pattern recognition problems well enough with the help of their visual and auditory apparatus. Creating artificial pattern recognition systems remains a difficult theoretical and technical problem. The need for such recognition arises in a wide variety of areas, from military affairs and security systems to the digitization of all kinds of analog signals.

Traditionally, image recognition tasks are included in the scope of artificial intelligence tasks.

Directions in pattern recognition

There are two main directions:

  • The study of the recognition abilities possessed by living beings, their explanation and modeling;
  • Development of the theory and methods for constructing devices designed to solve individual recognition problems in particular applied areas.

Formal statement of the problem

Pattern recognition is the assignment of initial data to a certain class by highlighting essential features that characterize these data from the total mass of non-essential data.

When formulating recognition problems, one tries to use mathematical language, aiming, in contrast to the theory of artificial neural networks, where the result is obtained by experiment, to replace experiment with logical reasoning and mathematical proof.

Most often, monochrome images are considered in pattern recognition problems, which makes it possible to treat an image as a function on a plane. If we consider the set of points of a plane T, with a function f(x,y) expressing at each point of the image some characteristic of it, such as brightness, transparency, or optical density, then such a function is a formal record of the image.

The set of all possible functions f(x,y) on the plane T is a model of the set of all images X. By introducing a notion of similarity between images, one can pose the recognition problem. The specific form of such a statement depends strongly on the subsequent stages of recognition, in accordance with one approach or another.

Pattern recognition methods

For optical image recognition, one can apply the method of enumerating the object's appearance at different angles, scales, offsets, and so on. For letters, one must enumerate the typeface, its properties, and so on.

The second approach is to find the contour of the object and examine its properties (connectivity, presence of corners, etc.)

Another approach is to use artificial neural networks. This method requires either a large number of examples of the recognition task (with correct answers), or a special neural network structure that takes into account the specifics of this task.

Perceptron as a method of pattern recognition

F. Rosenblatt, introducing the concept of a model of the brain whose task is to show how psychological phenomena can arise in a physical system with known structure and functional properties, described the simplest discrimination experiments. These experiments relate entirely to pattern recognition methods, but differ in that the solution algorithm is not deterministic.

The simplest experiment, on the basis of which it is possible to obtain psychologically significant information about a certain system, boils down to the fact that the model is presented with two different stimuli and is required to respond to them in different ways. The purpose of such an experiment may be to study the possibility of their spontaneous discrimination by the system in the absence of intervention from the experimenter, or, conversely, to study forced discrimination, in which the experimenter seeks to teach the system to carry out the required classification.

In a learning experiment, the perceptron is usually presented with a certain sequence of images, which includes representatives of each of the classes to be distinguished. According to some memory modification rule, the correct choice of reaction is reinforced. Then the control stimulus is presented to the perceptron and the probability of obtaining the correct response for stimuli of this class is determined. Depending on whether the selected control stimulus matches or does not match with one of the images that were used in the training sequence, different results are obtained:

  1. If the control stimulus does not coincide with any of the learning stimuli, then the experiment involves not only pure discrimination but also includes elements of generalization.
  2. If the control stimulus excites a set of sensory elements entirely different from those activated by previously presented stimuli of the same class, then the experiment is a study of pure generalization.

Perceptrons do not have the capacity for pure generalization, but they function quite satisfactorily in discrimination experiments, especially if the control stimulus matches closely enough with one of the patterns about which the perceptron has already accumulated some experience.
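The forced-discrimination experiment described above can be sketched in code. Below is a minimal illustration, assuming a Python/NumPy setting and a synthetic two-class sensory field (all names and data are hypothetical, not taken from Rosenblatt's work): correct reactions are reinforced by the classical error-correction rule, and the probability of a correct response is then estimated on control stimuli.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_stimulus(cls, n_sensors=20, noise=0.1):
    """Stimulus on a sensory field: class 0 lights the left half, class 1 the right half."""
    base = np.zeros(n_sensors)
    base[: n_sensors // 2] = 1.0 if cls == 0 else 0.0
    base[n_sensors // 2 :] = 1.0 if cls == 1 else 0.0
    return np.clip(base + noise * rng.standard_normal(n_sensors), 0, 1)

# training sequence that includes representatives of both classes
X_train = np.array([make_stimulus(c) for c in (0, 1) * 50])
y_train = np.array([c for c in (0, 1) * 50])

w, b, lr = np.zeros(X_train.shape[1]), 0.0, 0.1
for epoch in range(20):
    for x, y in zip(X_train, y_train):
        pred = 1 if w @ x + b > 0 else 0
        # reinforce the correct reaction: error-correction (perceptron) rule
        w += lr * (y - pred) * x
        b += lr * (y - pred)

# control stimuli that did not appear in the training sequence
X_test = np.array([make_stimulus(c) for c in (0, 1) * 20])
y_test = np.array([c for c in (0, 1) * 20])
pred = (X_test @ w + b > 0).astype(int)
print("probability of correct response on control stimuli:", (pred == y_test).mean())
```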

Examples of pattern recognition problems

  • Letter recognition.
  • Barcode recognition.
  • License plate recognition.
  • Face recognition.
  • Speech recognition.
  • Image recognition.
  • Recognition of local areas of the earth's crust in which mineral deposits are located.


Links

  • Yuri Lifshits. Course "Modern Problems of Theoretical Informatics" - lectures on statistical methods of pattern recognition, face recognition, text classification
  • Journal of Pattern Recognition Research

Literature

  • David A. Forsyth, Jean Ponce. Computer Vision: A Modern Approach. M.: Williams, 2004. 928 p. ISBN 0-13-085198-1.
  • George Stockman, Linda Shapiro. Computer Vision. M.: Binom. Knowledge Laboratory, 2006. 752 p. ISBN 5947743841.
  • A.L. Gorelik, V.A. Skripkin. Recognition Methods. M.: Higher School, 1989.
  • Sh.-K. Cheng. Design Principles of Visual Information Systems. M.: Mir, 1994.


Overview of existing pattern recognition methods

L.P. Popova, I.O. Datiev

The ability to "recognize" is considered the main property of human beings, as, indeed, of other living organisms. Pattern recognition is a branch of cybernetics that develops principles and methods for classifying and identifying objects, phenomena, processes, signals, situations - all those objects that can be described by a finite set of some features or properties that characterize an object.

An image is a description of an object. Images have a characteristic property, which manifests itself in the fact that acquaintance with a finite number of phenomena from the same set makes it possible to recognize an arbitrarily large number of its representatives.

There are two main directions in the theory of pattern recognition:

    the study of the powers of recognition possessed by human beings and other living organisms;

    development of the theory and methods for constructing devices designed to solve individual problems of pattern recognition in certain application areas.

Further, the article describes the problems, principles and methods for implementing pattern recognition systems related to the development of the second direction. The second part of the article discusses neural network methods of pattern recognition, which can be attributed to the first direction of pattern recognition theory.

Problems of building image recognition systems

The tasks that arise in the construction of automatic pattern recognition systems can usually be grouped into several main areas. The first of them is related to the representation of the initial data obtained as measurement results for the object to be recognized. This is the sensing problem. Each measured value is some characteristic of an image or an object. Suppose, for example, that the images are alphanumeric characters. In this case, a measuring retina similar to the one shown in Fig. 1(a) can be successfully used in the sensor. If the retina consists of n elements, then the measurement results can be represented as a measurement vector, or image vector, x = (x1, x2, ..., xn),

where each element xi takes, for example, the value 1 if the image of the symbol passes through the i-th cell of the retina, and the value 0 otherwise.

Consider Fig. 1(b). In this case, the images are continuous functions (such as sound signals) of a variable t. If the function values are measured at discrete points t1, t2, ..., tn, then the image vector can be formed by taking x1 = f(t1), x2 = f(t2), ..., xn = f(tn).

Figure 1. Measuring retina
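As a minimal illustration of the two sensing situations just described, the sketch below (Python/NumPy, hypothetical data) forms an image vector from a binary measuring retina and from a continuous signal sampled at discrete points t1, ..., tn.

```python
import numpy as np

# (a) a binary "measuring retina": 5x5 cells, xi = 1 where the symbol covers cell i
retina = np.array([
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
])                                   # a crude digit "2"
x_image = retina.flatten()           # measurement vector x = (x1, ..., x25)

# (b) a continuous signal f(t) sampled at discrete points t1, ..., tn
f = lambda t: np.sin(2 * np.pi * t) * np.exp(-t)
t = np.linspace(0.0, 2.0, 16)        # t1, ..., t16
x_signal = f(t)                      # xi = f(ti)

print(x_image.shape, x_signal.shape)  # (25,) (16,)
```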

The second problem of pattern recognition is related to the selection of characteristic features or properties from the obtained initial data and to reducing the dimension of the pattern vectors. This problem is often referred to as the problem of preprocessing and feature selection.

Features of a class of images are characteristic properties common to all images of a given class. Features that characterize the differences between individual classes can be interpreted as interclass features. Intraclass features common to all the classes under consideration carry no useful information from the point of view of recognition and may be disregarded. The choice of features is considered one of the important tasks associated with the construction of recognition systems. If the measurement results make it possible to obtain a complete set of distinguishing features for all classes, the actual recognition and classification of patterns will not cause any particular difficulties: automatic recognition would then be reduced to a simple matching process or procedures such as table lookups. In most practical recognition problems, however, determining a complete set of distinguishing features is extremely difficult, if not impossible. From the original data, it is usually possible to extract some of the distinguishing features and use them to simplify the process of automatic pattern recognition. In particular, the dimension of the measurement vectors can be reduced using transformations that minimize the loss of information.

The third problem associated with the construction of pattern recognition systems is finding the optimal decision procedures necessary for identification and classification. After the data collected about the patterns to be recognized are represented by points or measurement vectors in the pattern space, the machine is left to determine which class of patterns this data corresponds to. Let the machine be designed to distinguish M classes, denoted w1, w2, ..., wM. In this case, the pattern space can be considered to consist of M regions, each of which contains the points corresponding to images of one class. The recognition problem can then be viewed as constructing the boundaries of the decision regions separating the M classes on the basis of the registered measurement vectors. Let these boundaries be defined, for example, by decision functions d1(x), d2(x), ..., dM(x). These functions, also called discriminant functions, are scalar and single-valued functions of the image x. If di(x) > dj(x) for all j not equal to i, then the image x belongs to class wi; in other words, the decision function with the highest value determines the class. An illustration of such an automatic classification scheme, based on this decision-making process, is shown in Fig. 2 (in the scheme, "GR" is the generator of decision functions).

Figure 2. Scheme of automatic classification.

Decision functions can be obtained in a number of ways. In those cases where there is complete a priori information about the recognized patterns, the decision functions can be determined exactly on the basis of this information. If only qualitative information is available about the patterns, reasonable assumptions can be made about the form of the decision functions. In the latter case, the boundaries of the decision regions can deviate significantly from the true ones, and therefore it is necessary to create a system capable of arriving at a satisfactory result through a series of successive adjustments.
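A minimal sketch of the decision scheme of Fig. 2, assuming linear discriminant functions with hypothetical, hand-picked weights (in practice they would come from a priori information or training): the object is assigned to the class whose decision function di(x) is largest.

```python
import numpy as np

# hypothetical linear decision (discriminant) functions d_i(x) = w_i . x + b_i for M = 3 classes
W = np.array([[ 1.0,  0.0],
              [-1.0,  1.0],
              [ 0.0, -1.0]])
b = np.array([0.0, 0.5, -0.2])

def classify(x):
    d = W @ x + b               # values d_1(x), ..., d_M(x)
    return int(np.argmax(d))    # class whose decision function is largest

print(classify(np.array([2.0, 0.5])))   # -> 0
print(classify(np.array([-1.0, 2.0])))  # -> 1
```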

Objects (images) to be recognized and classified using an automatic pattern recognition system must have a set of measurable characteristics. When for a whole group of images the results of the corresponding measurements turn out to be similar, it is considered that these objects belong to the same class. The purpose of the pattern recognition system is to determine, on the basis of the collected information, a class of objects with characteristics similar to those measured for recognizable objects. The correctness of recognition depends on the amount of distinguishing information contained in the measured characteristics, and the efficiency of using this information.

      Basic Methods for Implementing Pattern Recognition Systems

Pattern recognition is the task of constructing and applying formal operations on numerical or symbolic representations of objects of the real or ideal world, such that the results of the solutions reflect equivalence relations between these objects. Equivalence relations express the belonging of the evaluated objects to certain classes, regarded as independent semantic units.

When constructing recognition algorithms, the equivalence classes can be specified by a researcher who uses his own meaningful ideas or external additional information about the similarity and difference of objects in the context of the problem being solved. Then one speaks of "supervised recognition." Otherwise, i.e., when an automated system solves the classification problem without involving external training information, one speaks of automatic classification, or "unsupervised recognition." Most pattern recognition algorithms require very significant computing power, which can only be provided by high-performance computer technology.

Various authors (Yu.L. Barabash, V.I. Vasiliev, A.L. Gorelik, V.A. Skripkin, R. Duda, P. Hart, L.T. Kuzin, F.I. Peregudov, F.P. Tarasenko, F.E. Temnikov, V.A. Afonin, V.I. Dmitriev, J. Tou, R. Gonzalez, P. Winston, K. Fu, Ya.Z. Tsypkin and others) give different typologies of pattern recognition methods. Some authors distinguish between parametric, non-parametric and heuristic methods, while others single out groups of methods based on historical schools and trends in this field.

At the same time, well-known typologies do not take into account one very significant characteristic, which reflects the specifics of the way knowledge about the subject area is represented using any formal pattern recognition algorithm. D.A. Pospelov identifies two main ways of representing knowledge:

    Intensional representation - in the form of a diagram of relationships between attributes (features).

    Extensional representation - with the help of concrete facts (objects, examples).

It should be noted that the existence of these two groups of recognition methods: those operating with features and those operating with objects, is deeply natural. From this point of view, none of these methods, taken separately from the other, makes it possible to form an adequate reflection of the subject area. Between these methods there is a relation of complementarity in the sense of N. Bohr, therefore, promising recognition systems should provide the implementation of both of these methods, and not just any one of them.

Thus, the classification of recognition methods proposed by D.A. Pospelov is based on the fundamental laws that underlie the human way of cognition in general, which puts it in a very special (privileged) position compared to other classifications, which, against this background, look more lightweight and artificial.

Intensional Methods

A distinctive feature of intensional methods is that they use different characteristics of features and their relationships as elements of operations in the construction and application of pattern recognition algorithms. Such elements can be individual values ​​or intervals of feature values, average values ​​and variances, feature relationship matrices, etc., on which actions are performed, expressed in an analytical or constructive form. At the same time, the objects in these methods are not considered as integral information units, but act as indicators for assessing the interaction and behavior of their attributes.

The group of intensional pattern recognition methods is extensive, and its division into subclasses is somewhat arbitrary:

– methods based on estimates of the distribution densities of feature values

– methods based on assumptions about the class of decision functions

– logical methods

– linguistic (structural) methods.

Methods based on estimates of the distribution densities of feature values. These pattern recognition methods are borrowed from the classical theory of statistical decisions, in which the objects of study are considered as realizations of a multidimensional random variable distributed in the feature space according to some law. They are based on the Bayesian decision-making scheme, which appeals to a priori probabilities of objects belonging to one or another recognizable class and conditional distribution densities of the feature vector values. These methods are reduced to determining the likelihood ratio in various areas of the multidimensional feature space.

The group of methods based on estimating the distribution densities of feature values is directly related to the methods of discriminant analysis. The Bayesian approach to decision making belongs to the most developed, so-called parametric methods of modern statistics, for which the analytical expression of the distribution law (in this case, the normal law) is considered known and only a small number of parameters (mean vectors and covariance matrices) need to be estimated.

This group also includes a method for calculating the likelihood ratio for independent features. This method, with the exception of the assumption of the independence of features (which is practically never fulfilled in reality), does not imply knowledge of the functional form of the distribution law. It can be attributed to non-parametric methods.
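A minimal sketch of the parametric Bayesian scheme described above (Python/NumPy, synthetic data): normal class-conditional densities whose mean vectors and covariance matrices are estimated from the sample, combined with a priori class probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)
# two classes with normally distributed features (parameters estimated from the sample)
X0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], 200)
X1 = rng.multivariate_normal([2.0, 2.0], [[1.0, -0.2], [-0.2, 1.0]], 200)

priors = np.array([0.5, 0.5])                               # a priori class probabilities
params = [(X.mean(axis=0), np.cov(X.T)) for X in (X0, X1)]  # mean vectors, covariance matrices

def normal_pdf(x, m, C):
    d = x - m
    norm = np.sqrt((2 * np.pi) ** len(m) * np.linalg.det(C))
    return np.exp(-0.5 * d @ np.linalg.solve(C, d)) / norm

def bayes_classify(x):
    # Bayes rule: choose the class with the largest prior * conditional density
    post = [p * normal_pdf(x, m, C) for p, (m, C) in zip(priors, params)]
    return int(np.argmax(post))

print(bayes_classify(np.array([0.2, -0.1])), bayes_classify(np.array([2.1, 1.8])))  # -> 0 1
```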

A special position is occupied by other non-parametric methods, used when the shape of the distribution density curve is unknown and no assumptions about its nature can be made at all. These include the well-known method of multidimensional histograms, the k-nearest-neighbors method, the Euclidean distance method, the method of potential functions, and others, whose generalization is the method of Parzen estimates. These methods formally operate with objects as integral structures, but depending on the type of recognition task they can act in both intensional and extensional roles.

Nonparametric methods analyze the relative numbers of objects falling into the given multidimensional volumes and use various distance functions between the objects of the training sample and the recognized objects. For quantitative features, when their number is much less than the sample size, operations with objects play an intermediate role in estimating local distribution densities of conditional probabilities, and objects do not carry the semantic load of independent information units. At the same time, when the number of features is commensurate or greater than the number of objects under study, and the features are of a qualitative or dichotomous nature, then there can be no talk of any local estimates of the probability distribution densities. In this case, the objects in these non-parametric methods are considered as independent information units (holistic empirical facts) and these methods acquire the meaning of assessments of the similarity and difference of the objects under study.

Thus, the same technological operations of non-parametric methods, depending on the conditions of the problem, serve either as local estimates of the probability distribution densities of feature values or as estimates of the similarity and difference of the objects.

In the context of the intensional representation of knowledge, the first aspect of non-parametric methods is considered here: their use as estimates of probability distribution densities. Many authors note that non-parametric methods such as Parzen estimates work well in practice. The main difficulties in applying these methods are the need to store the entire training sample in order to compute local estimates of the probability distribution densities, and high sensitivity to the non-representativeness of the training sample.
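A minimal sketch of a Parzen estimate for a one-dimensional feature (Python/NumPy, synthetic sample). Note how the whole training sample has to be kept in order to evaluate the density at a new point, which is exactly the practical difficulty mentioned above.

```python
import numpy as np

rng = np.random.default_rng(2)
sample = rng.normal(loc=1.0, scale=0.8, size=300)     # training sample of one class

def parzen_density(x, sample, h=0.3):
    """Parzen estimate with a Gaussian kernel: the average of kernels centred
    at every training object, so the whole sample must be kept in memory."""
    u = (x - sample[:, None]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return k.mean(axis=0) / h

xs = np.array([0.0, 1.0, 2.5])
print(parzen_density(xs, sample))       # estimated density at three query points
```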

Methods based on assumptions about the class of decision functions. In this group of methods, the general form of the decision function is considered known and its quality functional is given. Based on this functional, the best approximation of the decision function is sought for the training sequence. The most common are representations of decision functions in the form of linear and generalized nonlinear polynomials. The quality functional of the decision rule is usually associated with the classification error.

The main advantage of methods based on assumptions about the class of decision functions is the clarity of the mathematical formulation of the recognition problem as a problem of finding an extremum. The solution to this problem is often achieved using some kind of gradient algorithms. The variety of methods of this group is explained by the wide range of used decision rule quality functionals and extremum search algorithms. A generalization of the considered algorithms, which include, in particular, Newton's algorithm, perceptron-type algorithms, etc., is the method of stochastic approximation. Unlike parametric recognition methods, the success of this group of methods does not depend so much on the mismatch of theoretical ideas about the laws of distribution of objects in the feature space with empirical reality. All operations are subordinated to one main goal - finding the extremum of the quality functional of the decision rule. At the same time, the results of the parametric and considered methods may be similar. As shown above, parametric methods for the case of normal distributions of objects in different classes with equal covariance matrices lead to linear decision functions. We also note that the algorithms for selecting informative features in linear diagnostic models can be interpreted as particular variants of gradient algorithms for searching for an extremum.

The possibilities of gradient algorithms for finding an extremum, especially in the group of linear decision rules, have been studied quite well. The convergence of these algorithms has been proved only for the case when the recognizable classes of objects are displayed in the feature space by compact geometric structures. However, the desire to achieve sufficient quality of the decision rule can often be satisfied with the help of algorithms that do not have a rigorous mathematical proof of the convergence of the solution to the global extremum.
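A minimal sketch of this group of methods, under the assumption of a linear decision function and a smooth (logistic) quality functional standing in for the classification error; the extremum is sought by ordinary gradient descent on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(3)
# two compact classes in the feature space
X = np.vstack([rng.normal([-1, -1], 0.5, (100, 2)), rng.normal([1, 1], 0.5, (100, 2))])
y = np.hstack([np.zeros(100), np.ones(100)])

w, b, lr = np.zeros(2), 0.0, 0.5
for step in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # linear decision function through a sigmoid
    grad_w = X.T @ (p - y) / len(y)          # gradient of the (logistic) quality functional
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

pred = (X @ w + b > 0).astype(int)
print("training error:", (pred != y).mean())
```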

Such algorithms include a large group of heuristic programming procedures representing the direction of evolutionary modeling. Evolutionary modeling is a bionic method borrowed from nature. It is based on the use of known mechanisms of evolution in order to replace the process of meaningful modeling of a complex object with phenomenological modeling of its evolution.

A well-known representative of evolutionary modeling in pattern recognition is the group method of data handling (GMDH). GMDH is based on the principle of self-organization, and GMDH algorithms reproduce the scheme of mass selection. In GMDH algorithms, the members of a generalized polynomial, often called the Kolmogorov-Gabor polynomial, are synthesized and selected in a special way. This synthesis and selection are carried out with increasing complexity, and it is impossible to predict in advance what final form the generalized polynomial will have. First, simple pairwise combinations of the initial features are usually considered, from which equations of decision functions are composed, as a rule of order no higher than the second. Each equation is analyzed as an independent decision function, and the values of the parameters of the composed equations are found in one way or another from the training sample. Then, from the resulting set of decision functions, a subset that is best in some sense is selected. The quality of individual decision functions is checked on a control (test) sample, which is sometimes called the principle of external addition. The selected partial decision functions are then treated as intermediate variables that serve as initial arguments for a similar synthesis of new decision functions, and so on. The process of such hierarchical synthesis continues until the extremum of the decision function quality criterion is reached, which in practice manifests itself as a deterioration of this quality when an attempt is made to further increase the order of the polynomial members relative to the original features.

The principle of self-organization underlying GMDH is called heuristic self-organization, since the whole process is based on introducing external additions chosen heuristically. The result can depend significantly on these heuristics: the resulting diagnostic model depends on how the objects are divided into training and testing samples, how the recognition quality criterion is defined, how many variables are passed to the next selection layer, and so on.
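A strongly simplified sketch of the GMDH selection scheme on synthetic data (the partial models, the number of selection layers, and the number of retained candidates are arbitrary choices of this illustration): pairwise second-order models are fitted on the training part, ranked on a separate control part (the external addition), and the best outputs become the arguments of the next layer.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 5))
y = 2 * X[:, 0] * X[:, 1] + X[:, 2] + 0.1 * rng.standard_normal(200)   # unknown dependence

train, ctrl = slice(0, 120), slice(120, 200)    # external addition: separate control sample

def design(a, b):
    # second-order partial model: y ~ c0 + c1*a + c2*b + c3*a*b + c4*a^2 + c5*b^2
    return np.column_stack([np.ones_like(a), a, b, a * b, a * a, b * b])

features = X.copy()
for layer in range(2):                          # a couple of selection layers
    candidates = []
    for i, j in combinations(range(features.shape[1]), 2):
        F = design(features[:, i], features[:, j])
        c, *_ = np.linalg.lstsq(F[train], y[train], rcond=None)   # fit on the training part
        out = F @ c
        err = np.mean((out[ctrl] - y[ctrl]) ** 2)                 # quality on the control part
        candidates.append((err, out))
    candidates.sort(key=lambda t: t[0])
    best = candidates[:5]                       # keep the best partial decision functions
    print(f"layer {layer}: best control MSE = {best[0][0]:.4f}")
    features = np.column_stack([out for _, out in best])          # they become new arguments
```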

These features of GMDH algorithms are also characteristic of other approaches to evolutionary modeling. But we note here one more aspect of the methods under consideration: their substantive side. Using methods based on assumptions about the class of decision functions (evolutionary and gradient), it is possible to build diagnostic models of high complexity and obtain practically acceptable results. At the same time, the achievement of practical goals in this case is not accompanied by the extraction of new knowledge about the nature of the recognizable objects. The possibility of extracting this knowledge, in particular knowledge about the mechanisms of interaction of attributes (features), is fundamentally limited here by the given structure of such interaction, fixed in the chosen form of the decision functions. Therefore, the most that can be said after constructing a particular diagnostic model is to list the combinations of features, and the features themselves, that are included in the resulting model. But the meaning of the combinations, reflecting the nature and structure of the distributions of the objects under study, often remains undiscovered within this approach.

Logical methods. Logical methods of pattern recognition are based on the apparatus of the algebra of logic and allow operating with information contained not only in individual features, but also in combinations of feature values. In these methods, the values of any attribute are treated as elementary events.

In the most general form, logical methods can be characterized as a kind of search through the training sample for logical patterns and the formation of a certain system of logical decision rules (for example, in the form of conjunctions of elementary events), each of which has its own weight. The group of logical methods is diverse and includes methods of varying complexity and depth of analysis. For dichotomous (boolean) features, the so-called tree-like classifiers, the dead-end (irredundant) test method, the Kora algorithm, and others are popular. More complex methods are based on the formalization of J.S. Mill's inductive methods. The formalization is carried out by constructing a quasi-axiomatic theory and is based on many-sorted many-valued logic with quantifiers over variable-length tuples.

The Kora algorithm, like other logical methods of pattern recognition, is quite laborious, since a complete enumeration is necessary when selecting conjunctions. Therefore, when applying logical methods, high requirements are placed on the efficient organization of the computational process, and these methods work well with relatively small dimensions of the feature space and only on powerful computers.
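A toy sketch of a Kora-style logical method on dichotomous features (synthetic data; for brevity, only conjunctions voting for one class are collected): short conjunctions of elementary events are enumerated, those that are frequent and almost pure for the class are kept as weighted rules, and recognition is done by voting.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
# boolean (dichotomous) features; class 1 tends to have features 0 AND 2 set
X = rng.integers(0, 2, size=(200, 6))
y = ((X[:, 0] & X[:, 2]) | (rng.random(200) < 0.1)).astype(int)

def conj_mask(X, idxs, vals):
    """Elementary conjunction: the features in idxs take the values in vals."""
    return np.all(X[:, idxs] == vals, axis=1)

rules = []                                           # (idxs, vals, weight) voting for class 1
for idxs in combinations(range(X.shape[1]), 2):      # enumerate short conjunctions
    for vals in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        m = conj_mask(X, list(idxs), vals)
        if m.sum() < 10:                             # too rare to be a reliable pattern
            continue
        purity = y[m].mean()                         # how specific the conjunction is to class 1
        if purity > 0.9:
            rules.append((list(idxs), vals, purity))

def classify(x):
    votes = sum(w for idxs, vals, w in rules if np.all(x[idxs] == vals))
    return int(votes > 0)

test = np.array([1, 0, 1, 0, 0, 0])
print(len(rules), classify(test))    # number of rules found; the test object is voted into class 1
```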

Linguistic (syntactic or structural) methods. Linguistic methods of pattern recognition are based on the use of special grammars that generate languages with whose help a set of properties of recognizable objects can be described. A grammar is understood as the rules for constructing objects from non-derived (primitive) elements.

If the description of images is made with the help of non-derived elements (sub-images) and their relationships, then a linguistic or syntactic approach using the principle of commonality of properties is employed to build automatic recognition systems. An image can be described by a hierarchical structure of sub-images similar to the syntactic structure of a language. This circumstance makes it possible to apply the theory of formal languages to solving pattern recognition problems. It is assumed that the grammar of images contains finite sets of elements called variables, non-derived elements, and substitution rules. The nature of the substitution rules determines the type of the grammar. Among the most studied grammars are regular, context-free, and phrase-structure (immediate-constituent) grammars. The key points of this approach are the choice of the non-derived elements of the image, the combination of these elements and the relations connecting them into image grammars, and, finally, the implementation of the analysis and recognition processes in the corresponding language. This approach is especially useful when working with images that either cannot be described by numerical measurements or are so complex that their local features cannot be identified and one has to appeal to the global properties of the objects.

For example, E.A. Butakov, V.I. Ostrovsky, I.L. Fadeev propose the following system structure for image processing (Fig. 3), using a linguistic approach, where each of the functional blocks is a software (microprogram) complex (module) that implements the corresponding functions.

Figure 3. Structural diagram of the recognizer

Attempts to apply the methods of mathematical linguistics to the problem of image analysis lead to the need to solve a number of problems related to the mapping of a two-dimensional image structure onto one-dimensional chains of a formal language.

Extensional Methods

In the methods of this group, in contrast to the intensional direction, each studied object is given an independent diagnostic value to a greater or lesser extent. At their core, these methods are close to the clinical approach, which considers people not as a chain of objects ranked according to one or another indicator, but as integral systems, each of which is individual and has a special diagnostic value. Such a careful attitude to the objects of study does not allow one to exclude or lose information about each individual object, which occurs when applying the methods of intensional direction, using objects only to detect and fix the patterns of behavior of their attributes.

The main operations in pattern recognition using the discussed methods are the operations of determining the similarity and difference of objects. Objects in the specified group of methods play the role of diagnostic precedents. At the same time, depending on the conditions of a particular task, the role of an individual precedent can vary within the widest limits: from the main and defining to very indirect participation in the recognition process. In turn, the conditions of the problem may require the participation of a different number of diagnostic precedents for a successful solution: from one in each recognizable class to the entire sample size, as well as different ways to calculate the measures of similarity and difference of objects. These requirements explain the further division of extensional methods into subclasses:

    prototype comparison method;

    k-nearest neighbor method;

    teams of decision rules.

Prototype comparison method. This is the simplest extensional recognition method. It is used, for example, when the recognized classes are displayed in the feature space in compact geometric groupings. In this case, the center of the geometric grouping of the class (or the object closest to the center) is usually chosen as the prototype point.

To classify an unknown object, the prototype closest to it is found, and the object belongs to the same class as this prototype. Obviously, no generalized class images are formed in this method.

Various types of distances can be used as a measure of proximity. Often for dichotomous features, the Hamming distance is used, which in this case is equal to the square of the Euclidean distance. In this case, the decision rule for classifying objects is equivalent to a linear decision function.

This fact should be especially noted. It clearly demonstrates the connection between prototype and indicative representation of information about the data structure. Using the above representation, for example, any traditional measuring scale, which is a linear function of the values ​​of dichotomous features, can be considered as a hypothetical diagnostic prototype. In turn, if the analysis of the spatial structure of the recognized classes allows us to conclude that they are geometrically compact, then it is enough to replace each of these classes with one prototype, which is actually equivalent to a linear diagnostic model.

In practice, of course, the situation is often different from the idealized example described. A researcher who intends to apply a recognition method based on comparison with the prototypes of diagnostic classes faces difficult problems. This is, first of all, the choice of a proximity measure (metric), which can significantly change the spatial configuration of the distribution of objects. And, secondly, an independent problem is the analysis of multidimensional structures of experimental data. Both of these problems are especially acute for the researcher under conditions of high dimension of the feature space, which is typical for real problems.
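A minimal sketch of the prototype comparison rule for dichotomous features (hypothetical data): the prototype is the object closest to the geometric centre of its class, and proximity is measured by the Hamming distance, which for 0/1 features coincides with the squared Euclidean distance.

```python
import numpy as np

# dichotomous feature descriptions of the training objects of two classes (hypothetical)
class_objects = {
    "A": np.array([[1, 1, 0, 0, 1], [1, 1, 1, 0, 1], [1, 0, 0, 0, 1]]),
    "B": np.array([[0, 0, 1, 1, 0], [0, 1, 1, 1, 0], [0, 0, 1, 1, 1]]),
}

# prototype = the object closest to the geometric centre of its class
prototypes = {}
for cls, objs in class_objects.items():
    centre = objs.mean(axis=0)
    prototypes[cls] = objs[np.argmin(((objs - centre) ** 2).sum(axis=1))]

def hamming(a, b):
    return int(np.sum(a != b))    # for 0/1 features this equals the squared Euclidean distance

def classify(x):
    return min(prototypes, key=lambda cls: hamming(x, prototypes[cls]))

print(classify(np.array([1, 1, 0, 0, 0])))   # -> "A"
```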

Method of k-nearest neighbors. The k-nearest neighbor method for solving discriminant analysis problems was first proposed back in 1952. It is as follows.

When classifying an unknown object, a given number (k) of other objects geometrically closest to it in the feature space (nearest neighbors) with already known belonging to recognizable classes is found. The decision to assign an unknown object to a particular diagnostic class is made by analyzing information about this known membership of its nearest neighbors, for example, using a simple vote count.

Initially, the k-nearest neighbor method was considered as a non-parametric method for estimating the likelihood ratio. For this method, theoretical estimates of its effectiveness are obtained in comparison with the optimal Bayesian classifier. It is proved that the asymptotic error probabilities for the k-nearest neighbor method exceed the errors of the Bayes rule by no more than two times.
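A minimal k-nearest-neighbors sketch on synthetic two-class data, with the class decided by a simple vote count among the k geometrically closest objects of the training sample.

```python
import numpy as np

rng = np.random.default_rng(6)
# training sample with known class membership
X_train = np.vstack([rng.normal([0, 0], 0.7, (50, 2)), rng.normal([3, 3], 0.7, (50, 2))])
y_train = np.hstack([np.zeros(50, int), np.ones(50, int)])

def knn_classify(x, k=5):
    dist = np.linalg.norm(X_train - x, axis=1)     # geometric proximity in the feature space
    neighbours = y_train[np.argsort(dist)[:k]]     # classes of the k nearest neighbours
    return np.bincount(neighbours).argmax()        # simple vote count

print(knn_classify(np.array([0.3, -0.2])), knn_classify(np.array([2.8, 3.1])))   # -> 0 1
```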

As noted above, in real problems it is often necessary to operate with objects that are described by a large number of qualitative (dichotomous) features. At the same time, the dimension of the feature space is commensurate with or exceeds the volume of the sample under study. Under such conditions, it is convenient to interpret each object of the training sample as a separate linear classifier. Then this or that diagnostic class is represented not by one prototype, but by a set of linear classifiers. The combined interaction of linear classifiers results in a piecewise linear surface that separates the recognizable classes in the feature space. The type of dividing surface, consisting of pieces of hyperplanes, can be varied and depends on the relative position of the classified aggregates.

Another interpretation of k-nearest neighbor classification mechanisms can also be used. It is based on the idea of ​​the existence of some latent variables, abstract or related by some transformation with the original feature space. If pairwise distances between objects in the space of latent variables are the same as in the space of initial features, and the number of these variables is much less than the number of objects, then the interpretation of the k-nearest neighbors method can be considered from the point of view of comparing nonparametric estimates of conditional probability distribution densities. The concept of latent variables presented here is close in nature to the concept of true dimensionality and other representations used in various dimensionality reduction methods.

When using the k-nearest-neighbors method for pattern recognition, the researcher has to solve the difficult problem of choosing a metric for determining the proximity of the diagnosed objects. In conditions of high dimensionality of the feature space, this problem is greatly aggravated by the high computational cost of the method, which becomes significant even for high-performance computers. Therefore, here, just as in the prototype comparison method, it is necessary to solve the creative problem of analyzing the multidimensional structure of the experimental data in order to minimize the number of objects representing the diagnostic classes.

Algorithms for calculating estimates (voting). The principle of operation of estimate calculation algorithms (ABO) is to compute similarity scores characterizing the "proximity" of the recognized object and the reference objects over a system of feature ensembles, that is, a system of subsets of the given set of features.

Unlike all previously considered methods, algorithms for calculating estimates operate with object descriptions in a fundamentally new way. For these algorithms, objects exist simultaneously in very different subspaces of the feature space. The ABO class brings the idea of ​​using features to its logical end: since it is not always known which combinations of features are the most informative, in ABO the degree of similarity of objects is calculated by comparing all possible or certain combinations of features included in the descriptions of objects.
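A toy sketch of the ABO idea (hypothetical reference objects; the support sets, tolerance, and voting scheme are arbitrary choices of this illustration): the similarity of a recognized object to each reference object is counted over a system of feature subsets, and the votes are accumulated per class.

```python
import numpy as np
from itertools import combinations

# reference (training) objects with class labels; hypothetical integer features
refs = np.array([[1, 4, 2], [2, 4, 2], [7, 0, 5], [8, 1, 5]])
labels = np.array([0, 0, 1, 1])

# the system of feature ensembles: here, all subsets of two features
support_sets = list(combinations(range(refs.shape[1]), 2))
eps = 1.0    # tolerance of coincidence on a feature

def class_scores(x):
    scores = np.zeros(labels.max() + 1)
    for ref, lab in zip(refs, labels):
        # the reference object "votes" once for every support set on which it is close to x
        votes = sum(np.all(np.abs(x[list(s)] - ref[list(s)]) <= eps) for s in support_sets)
        scores[lab] += votes
    return scores

x = np.array([2, 3, 2])
print(class_scores(x))   # the larger score indicates greater similarity to that class's references
```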

Teams of decision rules. These methods use a two-level recognition scheme. At the first level, particular recognition algorithms work, and their results are combined at the second level in a synthesis block. The most common methods of such combination are based on identifying areas of competence of a particular algorithm. The simplest way to find areas of competence is to partition the feature space a priori based on the professional considerations of a particular science (for example, stratifying the sample according to some feature). A separate recognition algorithm is then built for each of the selected areas. Another method is based on using formal analysis to determine local areas of the feature space as neighborhoods of recognizable objects for which the success of some particular recognition algorithm has been proven.

The most general approach to constructing a synthesis block considers the resulting indicators of partial algorithms as initial features for constructing a new generalized decision rule. In this case, all the above methods of intensional and extensional directions in pattern recognition can be used. Effective for solving the problem of creating a set of decision rules are logical algorithms of the “Kora” type and algorithms for calculating estimates (ABO), which form the basis of the so-called algebraic approach, which provides research and a constructive description of recognition algorithms, within which all existing types of algorithms fit.

Neural network methods

Neural network methods are methods based on the use of various types of neural networks (NN). The main areas of application of various NNs for pattern and image recognition:

    application for extracting key characteristics or features of given images,

    classification of the images themselves or the characteristics already extracted from them (in the first case, the extraction of key characteristics occurs implicitly within the network),

    solution of optimization problems.

Multilayer neural networks. The architecture of a multilayer neural network (MNN) consists of sequentially connected layers, where a neuron of each layer is connected by its inputs to all neurons of the previous layer and by its outputs to those of the next.

The simplest application of a single-layer NN (called an auto-associative memory) is to train the network to reconstruct the input images. By feeding a test image to the input and evaluating the quality of the reconstructed image, one can estimate how well the network recognized the input image. The positive property of this method is that the network can restore distorted and noisy images, but it is not suitable for more serious purposes.

An MNN is also used for direct classification of images: the input is either the image itself in some form or a set of previously extracted key characteristics of the image, and at the output the neuron with maximum activity indicates membership in the recognized class (Fig. 4). If this activity is below a certain threshold, it is considered that the submitted image does not belong to any of the known classes. The learning process establishes the correspondence between input images and membership in a certain class; this is called supervised learning. This approach is good for access control tasks involving a small group of people: it provides a direct comparison of the images themselves by the network, but as the number of classes increases, the training and operation time of the network grows exponentially. Therefore, tasks such as searching for a similar person in a large database require extracting a compact set of key characteristics on which to base the search.
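A minimal sketch of this classification scheme: a small two-layer network computes output activities, the most active output neuron names the class, and an activity below a threshold means the image is rejected as belonging to none of the known classes. The weights here are random placeholders standing in for weights that would be obtained by supervised training.

```python
import numpy as np

rng = np.random.default_rng(7)

# a tiny MNN: 100 inputs (e.g. a 10x10 image) -> 16 hidden -> 3 class outputs;
# the random weights below stand in for weights obtained by supervised training
W1, b1 = rng.standard_normal((16, 100)) * 0.1, np.zeros(16)
W2, b2 = rng.standard_normal((3, 16)) * 0.1, np.zeros(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(image, threshold=0.6):
    h = np.tanh(W1 @ image + b1)
    out = softmax(W2 @ h + b2)      # activities of the output (class) neurons
    cls = int(np.argmax(out))       # the most active neuron indicates the class
    if out[cls] < threshold:        # too little activity: none of the known classes
        return None
    return cls

print(classify(rng.random(100)))    # likely None with these untrained weights
```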

A classification approach using the frequency characteristics of the entire image is described in . A single-layer NN based on multivalued neurons was used.

In , the use of an NN for image classification is shown in which the network input receives the results of decomposing the image by the principal component method.

In the classical MNN, the interlayer connections are fully connected, and the image is represented as a one-dimensional vector, although it is two-dimensional. The architecture of the convolutional neural network aims to overcome these shortcomings. It uses local receptive fields (providing local two-dimensional connectivity of neurons), shared weights (providing the detection of certain features anywhere in the image), and a hierarchical organization with spatial subsampling. The convolutional NN (CNN) provides partial resistance to scale changes, shifts, rotations, and distortions.

MNNs are also used to detect objects of a certain type. In addition to the fact that any trained MNN can to some extent determine whether images belong to "its own" classes, it can be specially trained to reliably detect certain classes. In this case, the output classes are classes that do and do not belong to the given image type. A neural network detector was used to detect a face image in the input image. The image was scanned with a 20x20-pixel window, which was fed to the input of the network, which decides whether the given area belongs to the class of faces. Training was done using both positive examples (various images of faces) and negative examples (images that are not faces). To increase the reliability of detection, a team of NNs trained with different initial weights was used; as a result, the NNs made mistakes in different ways, and the final decision was made by a vote of the whole team.

Figure 5. Principal components (eigenfaces) and decomposition of the image into principal components

NNs are also used to extract key characteristics of an image, which are then used for subsequent classification. In , a neural network implementation of the principal component analysis method is shown. The essence of the principal component analysis method is to obtain maximally decorrelated coefficients that characterize the input patterns. These coefficients are called principal components and are used for statistical image compression, in which a small number of coefficients represent the entire image. An NN with one hidden layer containing N neurons (which is much smaller than the image dimension), trained by error backpropagation to restore the input image at the output, forms the coefficients of the first N principal components at the outputs of the hidden neurons, and these are used for comparison. Typically, 10 to 200 principal components are used. As the component number increases, its representativeness decreases greatly, and it makes no sense to use components with large numbers. When nonlinear activation functions of the neural elements are used, a nonlinear decomposition into principal components is possible; the nonlinearity allows variations in the input data to be reflected more accurately. Applying principal component analysis to the decomposition of face images, we obtain principal components called eigenfaces, which also have a useful property: there are components that mainly reflect such essential characteristics of a face as gender, race, and emotions. When reconstructed, the components have a face-like appearance, with the first ones reflecting the most general shape of the face and the last ones representing various small differences between faces (Fig. 5). This method is well suited for searching for similar face images in large databases. The possibility of further reducing the dimension of the principal components with the help of an NN has also been shown. By evaluating the quality of the reconstruction of the input image, one can very accurately determine whether it belongs to the class of faces.
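A minimal sketch of the principal component idea on synthetic "images" (NumPy SVD instead of a neural network implementation): the data are projected onto the first N components and reconstructed, and the reconstruction error serves as an indicator of whether an image belongs to the modelled class, as described above for faces.

```python
import numpy as np

rng = np.random.default_rng(8)
# synthetic "images": 200 samples of dimension 256 lying near a hidden 10-dimensional subspace
basis = rng.standard_normal((10, 256))
X = rng.standard_normal((200, 10)) @ basis + 0.05 * rng.standard_normal((200, 256))
mean = X.mean(axis=0)

# principal components = right singular vectors of the centred data matrix
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
N = 20
components = Vt[:N]                      # the first N "eigen-images"

def reconstruct(x):
    coeffs = components @ (x - mean)     # N principal-component coefficients
    return mean + components.T @ coeffs

x_in = X[0]                              # an image of the modelled class
x_out = rng.standard_normal(256)         # an arbitrary image
err_in = np.linalg.norm(x_in - reconstruct(x_in))
err_out = np.linalg.norm(x_out - reconstruct(x_out))
print(f"reconstruction error: in-class {err_in:.2f}, random image {err_out:.2f}")
```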

Neural networks of high order. High-order neural networks (HNNs) differ from MNNs in that they have only one layer, but the inputs of neurons also receive high-order terms that are the product of two or more components of the input vector . Such networks can also form complex dividing surfaces.

Hopfield neural networks. The Hopfield NN is single-layer and fully connected (there are no connections of neurons to themselves); its outputs are connected to its inputs. Unlike the MNN, the Hopfield network is a relaxation network, i.e., once set to an initial state, it evolves until it reaches a stable state, which is its output value. To search for a global minimum in optimization problems, stochastic modifications of the Hopfield network are used.

Using the Hopfield network as an associative memory allows one to accurately restore the images the network has been trained on when a distorted image is fed to the input. In doing so, the network "remembers" the closest image (in the sense of a local energy minimum) and thus recognizes it. Such functioning can also be thought of as a sequential application of the auto-associative memory described above, with the difference that the Hopfield network restores the image exactly. To avoid spurious minima and to increase the network capacity, various methods are used.
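A minimal Hopfield-style associative memory sketch (synthetic bipolar patterns; Hebbian storage and asynchronous updates are one standard choice): a distorted pattern is fed in and the network relaxes toward the nearest stored image.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 64
patterns = np.where(rng.random((3, n)) < 0.5, -1, 1)   # stored bipolar images

# Hebbian storage rule; no self-connections (zero diagonal)
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0.0)

def recall(state, sweeps=10):
    """Relaxation: asynchronous updates until the state is (practically) stable."""
    state = state.copy()
    for _ in range(sweeps):
        for i in rng.permutation(n):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

distorted = patterns[0].copy()
flip = rng.choice(n, size=8, replace=False)
distorted[flip] *= -1                                  # noise: invert a few elements
restored = recall(distorted)
print(int(np.sum(restored != patterns[0])), "elements differ from the stored image")
```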

Kohonen self-organizing neural networks. Kohonen self-organizing neural networks provide a topological ordering of the input image space. They allow a topologically continuous mapping of the input n-dimensional space into an output m-dimensional space, m < n.

Cognitron. The cognitron is similar in its architecture to the structure of the visual cortex; it has a hierarchical multilayer organization in which neurons between layers are connected only locally. It is trained by competitive learning (without a teacher). Each layer implements a different level of generalization: the input layer is sensitive to simple patterns, such as lines and their orientation in certain areas of the visual field, while the response of other layers is more complex, abstract, and independent of the pattern's position. Similar functions are implemented in the cognitron by modeling the organization of the visual cortex.

The neocognitron is a further development of the cognitron idea and reflects the structure of the visual system more accurately; it allows recognizing images regardless of shifts, rotations, distortions, and changes of scale.

The cognitron is a powerful image recognition tool; however, it requires high computational costs, which are currently unattainable.

The considered neural network methods provide fast and reliable image recognition, but when using these methods, problems arise in the recognition of three-dimensional objects. However, this approach has many advantages.

      Conclusion

Currently, there are a fairly large number of automatic pattern recognition systems for various applied problems.

Pattern recognition by formal methods as a fundamental scientific direction is inexhaustible.

Mathematical methods of image processing have a wide variety of applications: science, technology, medicine, social sphere. In the future, the role of pattern recognition in human life will increase even more.

Neural network methods provide fast and reliable image recognition. This approach has many advantages and is one of the most promising.

Literature

    D.V. Brilyuk, V.V. Starovoitov. Neural network methods of image recognition.

    Kuzin L.T. Fundamentals of Cybernetics: Fundamentals of Cybernetic Models. Vol. 2. M.: Energiya, 1979. 584 p.

    Peregudov F.I., Tarasenko F.P. Introduction to System Analysis: Textbook. M.: Higher School, 1997. 389 p.

    Temnikov F.E., Afonin V.A., Dmitriev V.I. Theoretical Foundations of Information Technology. M.: Energiya, 1979. 511 p.

    Tou J., Gonzalez R. Pattern Recognition Principles. Translated from English. M.: Mir, 1978. 410 p.

    Winston P. Artificial Intelligence. Translated from English. M.: Mir, 1980. 520 p.

    Fu K. Structural Methods in Pattern Recognition. Translated from English. M.: Mir, 1977. 320 p.

    Tsypkin Ya.Z. Fundamentals of the Information Theory of Identification. M.: Nauka, 1984. 520 p.

    Pospelov G.S. Artificial Intelligence as the Basis of New Information Technology. M.: Nauka, 1988. 280 p.

    Yu. Lifshits. Statistical methods of pattern recognition ///modern/07modernnote.pdf

    Bohr N. Atomic Physics and Human Knowledge. Translated from English. M.: Mir, 1961. 151 p.

    Butakov E.A., Ostrovsky V.I., Fadeev I.L. Image Processing on a Computer. 1987. 236 p.

    Duda R., Hart P. Pattern Classification and Scene Analysis. Translated from English. M.: Mir, 1978. 510 p.

    Duke V.A. Computer Psychodiagnostics. St. Petersburg: Bratstvo, 1994. 365 p.

    Aizenberg I.N., Aizenberg N.N., Krivosheev G.A. Multi-valued and Universal Binary Neurons: Learning Algorithms, Applications to Image Processing and Recognition. Lecture Notes in Artificial Intelligence: Machine Learning and Data Mining in Pattern Recognition, 1999, pp. 21-35.

    Ranganath S., Arun K. Face recognition using transform features and neural networks. Pattern Recognition, 1997, Vol. 30, pp. 1615-1622.

    Golovko V.A. Neurointelligence: Theory and Applications. Book 1. Organization and Training of Neural Networks with Forward and Feedback Connections. Brest: BPI, 1999. 260 p.

    Vetter T., Poggio T. Linear Object Classes and Image Synthesis From a Single Example Image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, Vol. 19, pp. 733-742.

    Golovko V.A. Neurointelligence: Theory and Applications. Book 2. Self-Organization, Fault Tolerance and Application of Neural Networks. Brest: BPI, 1999. 228 p.

    Lawrence S., Giles C.L., Tsoi A.C., Back A.D. Face Recognition: A Convolutional Neural Network Approach. IEEE Transactions on Neural Networks, Special Issue on Neural Networks and Pattern Recognition, pp. 1-24.

    Wasserman P. Neurocomputing Technology: Theory and Practice. 1992. 184 p.

    Rowley H.A., Baluja S., Kanade T. Neural Network-Based Face Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, Vol. 20, pp. 23-37.

    Valentin D., Abdi H., O'Toole A.J., Cottrell G.W. Connectionist models of face processing: a survey. Pattern Recognition, 1994, Vol. 27, pp. 1209-1230.


Chapter 3: Analytical Review of Pattern Recognition and Decision Making Methods

Pattern recognition theory and control automation

The main tasks of adaptive pattern recognition

Recognition is an information process implemented by some information converter (intelligent information channel, recognition system) that has an input and an output. The input of the system is information about what features the presented objects have. The output of the system displays information on which classes (generalized images) the recognizable objects are assigned to.

When creating and operating an automated pattern recognition system, a number of tasks are solved. Let us briefly and simply consider these tasks. Note that both the formulations of these tasks and their set vary from author to author, since they depend to a certain extent on the specific mathematical model on which a particular recognition system is based. In addition, some tasks have no solution in certain recognition models and, accordingly, are not posed.

The task of formalizing the subject area

In fact, this task is a task of coding. A list of generalized classes is compiled, which may include specific implementations of objects, as well as a list of features that these objects, in principle, can have.

The task of forming a training sample

The training sample is a database containing descriptions of specific implementations of objects in the feature language, supplemented with information about the belonging of these objects to certain recognition classes.

The task of training the recognition system

The training sample is used to form generalized images of recognition classes based on the generalization of information about what features the objects of the training sample have that belong to this class and other classes.
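A minimal sketch of this step is given below; the data layout, the class names, and the choice of per-class feature frequencies as the "generalized image" are illustrative assumptions, not the specific model described in the text.

```python
from collections import defaultdict

# A hypothetical training sample: each object is a tuple of binary features
# plus a class label (this layout is invented for illustration).
training_sample = [
    ((1, 0, 1, 1), "class_A"),
    ((1, 1, 1, 0), "class_A"),
    ((0, 1, 0, 1), "class_B"),
    ((0, 0, 0, 1), "class_B"),
]

def build_generalized_images(sample):
    """Form a 'generalized image' of each class as the frequency of every feature."""
    sums, counts = {}, defaultdict(int)
    for features, label in sample:
        counts[label] += 1
        acc = sums.setdefault(label, [0] * len(features))
        sums[label] = [s + f for s, f in zip(acc, features)]
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

print(build_generalized_images(training_sample))
# {'class_A': [1.0, 0.5, 1.0, 0.5], 'class_B': [0.0, 0.5, 0.0, 1.0]}
```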

The problem of feature space dimension reduction

After training the recognition system (obtaining statistics on the distribution of feature frequencies by class), it becomes possible to determine for each feature its value for solving the recognition problem. After that, the least valuable features can be removed from the feature system. Then the recognition system must be retrained, since as a result of the removal of some features, the distribution statistics of the remaining features by class changes. This process can be repeated, i.e. be iterative.
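Continuing the sketch above, one crude way to implement such an iterative reduction is shown below; the feature-value measure used here (the spread of class frequencies) is an assumption made for brevity, not the informativeness measure meant in the text.

```python
def feature_value(images, j):
    """Crude value of feature j: the spread of its class frequencies."""
    freqs = [img[j] for img in images.values()]
    mean = sum(freqs) / len(freqs)
    return sum((f - mean) ** 2 for f in freqs)

def prune_least_valuable(sample, keep):
    """Repeatedly retrain and drop the least valuable feature until `keep` remain."""
    active = list(range(len(sample[0][0])))
    while len(active) > keep:
        reduced = [([obj[j] for j in active], label) for obj, label in sample]
        images = build_generalized_images(reduced)   # retrain on the reduced feature set
        scores = [feature_value(images, k) for k in range(len(active))]
        del active[scores.index(min(scores))]        # remove the least valuable feature
    return active                                    # indices of the surviving features

print(prune_least_valuable(training_sample, keep=2))   # e.g. [0, 2]
```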

Recognition task

Objects of a recognizable sample are recognized, which, in particular, may consist of one object. The recognizable sample is formed similarly to the training one, but does not contain information about the belonging of objects to classes, since this is precisely what is determined in the recognition process. The result of the recognition of each object is a distribution or a list of all recognition classes in descending order of the degree of similarity of the recognized object with them.
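Continuing the same toy example, recognition can return all classes ranked by a similarity measure; the negated squared distance to each generalized image used here is again an illustrative choice.

```python
def recognize(obj, images):
    """Rank all classes by similarity of `obj` to their generalized images."""
    def similarity(features, image):
        return -sum((f - p) ** 2 for f, p in zip(features, image))
    return sorted(((label, similarity(obj, image)) for label, image in images.items()),
                  key=lambda pair: pair[1], reverse=True)

images = build_generalized_images(training_sample)
print(recognize((1, 0, 1, 0), images))   # class_A first, with the higher score
```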

Recognition quality control task

After recognition, its adequacy can be established. For objects of the training sample, this can be done immediately, since for them it is simply known to which classes they belong. For other objects, this information can be obtained later. In any case, the actual average error probability for all recognition classes can be determined, as well as the error probability when assigning a recognized object to a particular class.

The recognition results should be interpreted taking into account the available information about the recognition quality.
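Still within the same running example, a minimal quality check computes the actual error rate per class on objects whose true classes are known:

```python
def error_rates(sample, images):
    """Share of misclassified objects for each recognition class."""
    errors, totals = defaultdict(int), defaultdict(int)
    for features, label in sample:
        totals[label] += 1
        if recognize(features, images)[0][0] != label:   # top-ranked class is wrong
            errors[label] += 1
    return {label: errors[label] / totals[label] for label in totals}

print(error_rates(training_sample, images))   # {'class_A': 0.0, 'class_B': 0.0}
```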

Adaptation task

If, as a result of the quality control procedure, it is found that it is unsatisfactory, then the descriptions of incorrectly recognized objects can be copied from the recognizable sample to the training one, supplemented with adequate classification information, and used to reshape the decision rules, i.e. taken into account. Moreover, if these objects do not belong to the already existing recognition classes, which could be the reason for their incorrect recognition, then this list can be expanded. As a result, the recognition system adapts and begins to adequately classify these objects.

Inverse recognition problem

The task of recognition is that for a given object, according to its known features, the system establishes its belonging to some previously unknown class. In the inverse recognition problem, on the contrary, for a given recognition class, the system determines which features are most characteristic of objects of this class and which are not (or which objects of the training sample belong to this class).

Tasks of cluster and constructive analysis

Clusters are groups of objects, classes, or features such that the elements within each cluster are as similar to one another as possible, while elements of different clusters are as different as possible.

A construct (in the context considered in this section) is a system of opposite clusters. Thus, in a certain sense, constructs are the result of a cluster analysis of clusters.

In cluster analysis, the degree of similarity and difference of objects (classes, features) is quantitatively measured, and this information is used for classification. The result of cluster analysis is the very classification of objects by clusters. This classification can be represented in the form of semantic networks.

The task of cognitive analysis

In cognitive analysis, information about the similarity and difference of classes or features is of interest to the researcher in itself, and not in order to use it for classification, as in cluster and constructive analysis.

If two recognition classes are characterized by the same feature, this contributes to the similarity of the two classes; if the feature is uncharacteristic for one of them, this contributes to their difference.

If two features correlate with each other, then in a certain sense they can be regarded as a single feature, and if they are anticorrelated, then as different ones. Taking this circumstance into account, the presence of different features in different classes also makes a certain contribution to their similarity and difference.

The results of cognitive analysis can be presented in the form of cognitive diagrams.

Pattern recognition methods and their characteristics

Principles of classification of pattern recognition methods

Pattern recognition is the task of constructing and applying formal operations on numerical or symbolic representations of real or ideal world objects, the solution results of which reflect the equivalence relations between these objects. Equivalence relations express the belonging of the evaluated objects to some classes, considered as independent semantic units.

When constructing recognition algorithms, equivalence classes can be set by a researcher who uses his own meaningful ideas or uses external additional information about the similarity and difference of objects in the context of the problem being solved. Then one speaks of "recognition with the teacher". Otherwise, i.e. when an automated system solves a classification problem without involving external training information, one speaks of automatic classification or “unsupervised recognition”. Most pattern recognition algorithms require very significant computing power, which can only be provided by high-performance computer technology.

Various authors (Yu.L. Barabash, V.I. Vasiliev, A.L. Gorelik, V.A. Skripkin, R. Duda, P. Hart, L.T. Kuzin, F.I. Peregudov, F.P. Tarasenko, F.E. Temnikov, J. Tou, R. Gonzalez, P. Winston, K. Fu, Ya.Z. Tsypkin and others) give different typologies of pattern recognition methods. Some distinguish between parametric, nonparametric, and heuristic methods, while others single out groups of methods based on historical schools and trends in the field. For example, one academic overview of recognition methods uses the following typology of pattern recognition methods:

  • methods based on the principle of separation;
  • statistical methods;
  • methods built on the basis of "potential functions";
  • methods for calculating grades (voting);
  • methods based on the propositional calculus, in particular on the apparatus of the algebra of logic.

This classification is based on the difference in formal methods of pattern recognition, and therefore the consideration of the heuristic approach to recognition, which has received full and adequate development in expert systems, is omitted. The heuristic approach is based on hard-to-formalizable knowledge and intuition of the researcher. At the same time, the researcher himself determines what information and how the system should use to achieve the desired recognition effect.

A similar typology of recognition methods with varying degrees of detail is found in many works on recognition. At the same time, the well-known typologies do not take into account one very significant characteristic, which reflects the specifics of the way knowledge about the subject area is represented using any formal pattern recognition algorithm.

D.A. Pospelov (1990) identifies two main ways of representing knowledge:

  • intensional, in the form of a scheme of connections between attributes (features).
  • extensional, with the help of specific facts (objects, examples).

The intensional representation captures the patterns and relationships that explain the structure of the data. With regard to diagnostic tasks, such fixation consists in determining operations on the attributes (features) of objects that lead to the required diagnostic result. Intensional representations are implemented through operations on attribute values ​​and do not imply operations on specific information facts (objects).

In turn, extensional representations of knowledge are associated with the description and fixation of specific objects from the subject area and are implemented in operations, the elements of which are objects as integral systems.

It is possible to draw an analogy between intensional and extensional representations of knowledge and the mechanisms underlying the activity of the left and right hemispheres of the human brain. If the right hemisphere is characterized by a holistic prototypical representation of the surrounding world, then the left hemisphere operates with patterns that reflect the connections of the attributes of this world.

The two fundamental ways of knowledge representation described above allow us to propose the following classification of pattern recognition methods:

  • intensional methods based on operations with attributes.
  • extensional methods based on operations with objects.

It is necessary to emphasize that the existence of these two (and only two) groups of recognition methods: those operating with features and those operating with objects, is deeply natural. From this point of view, none of these methods, taken separately from the other, makes it possible to form an adequate reflection of the subject area. According to the authors, between these methods there is a relation of complementarity in the sense of N. Bohr, therefore, promising recognition systems should ensure the implementation of both of these methods, and not just any one of them.

Thus, the classification of recognition methods proposed by D. A. Pospelov is based on the fundamental laws that underlie the human way of cognition in general, which puts it in a very special (privileged) position compared to other classifications, which look more lightweight and artificial.

Intensional Methods

A distinctive feature of intensional methods is that they use different characteristics of features and their relationships as elements of operations in the construction and application of pattern recognition algorithms. Such elements can be individual values ​​or intervals of feature values, average values ​​and variances, feature relationship matrices, etc., on which actions are performed, expressed in an analytical or constructive form. At the same time, the objects in these methods are not considered as integral information units, but act as indicators for assessing the interaction and behavior of their attributes.

The group of intensional pattern recognition methods is extensive, and its division into subclasses is somewhat arbitrary.

Methods based on estimates of the distribution densities of feature values

These pattern recognition methods are borrowed from the classical theory of statistical decisions, in which the objects of study are considered as realizations of a multidimensional random variable distributed in the feature space according to some law. They are based on the Bayesian decision-making scheme, which appeals to a priori probabilities of objects belonging to one or another recognizable class and conditional distribution densities of the feature vector values. These methods are reduced to determining the likelihood ratio in various areas of the multidimensional feature space.

The group of methods based on the estimation of the distribution densities of feature values ​​is directly related to the methods of discriminant analysis. The Bayesian approach to decision making is one of the most developed in modern statistics, the so-called parametric methods, for which the analytical expression of the distribution law (in this case, the normal law) is considered to be known and only a small number of parameters (mean vectors and covariance matrices) need to be estimated.

The main difficulties in applying these methods are the need to remember the entire training sample to calculate estimates of local probability distribution densities and high sensitivity to the non-representativeness of the training sample.
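A minimal sketch of the Bayesian scheme for a single feature, assuming normal class-conditional densities with already estimated parameters (the classes, priors, and numbers below are invented):

```python
import math

classes = {
    "healthy": {"prior": 0.7, "mean": 10.0, "std": 2.0},
    "ill":     {"prior": 0.3, "mean": 14.0, "std": 3.0},
}

def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def bayes_decision(x):
    """Assign x to the class with the largest posterior, proportional to prior * likelihood."""
    posteriors = {name: p["prior"] * normal_pdf(x, p["mean"], p["std"])
                  for name, p in classes.items()}
    return max(posteriors, key=posteriors.get)

print(bayes_decision(11.0))   # "healthy"
print(bayes_decision(15.5))   # "ill"
```

Comparing the two posteriors is equivalent to comparing the likelihood ratio with a threshold determined by the priors, which is exactly the decision scheme described above.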

Methods based on assumptions about the class of decision functions

In this group of methods, the general form of the decision function is considered known and its quality functional is given. Based on this functional, the best approximation of the decision function is found from the training sequence. The most common are representations of decision functions in the form of linear and generalized nonlinear polynomials. The quality functional of the decision rule is usually associated with the classification error.

The main advantage of methods based on assumptions about the class of decision functions is the clarity of the mathematical formulation of the recognition problem as a problem of finding an extremum. The variety of methods of this group is explained by the wide range of used decision rule quality functionals and extremum search algorithms. A generalization of the considered algorithms, which include, in particular, Newton's algorithm, perceptron-type algorithms, etc., is the method of stochastic approximation.
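As an illustration of this group, a perceptron-type stochastic update of a linear decision function can be sketched as follows; the toy data and learning rate are arbitrary assumptions.

```python
import random

random.seed(0)

# Toy two-class sample: (feature vector, label in {-1, +1}).
data = [((2.0, 1.0), +1), ((1.5, 2.0), +1), ((-1.0, -0.5), -1), ((-2.0, 0.2), -1)]

def train_linear(sample, epochs=20, lr=0.1):
    """Stochastic, perceptron-style fitting of the linear decision function w·x + b."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        random.shuffle(sample)
        for x, y in sample:
            if y * (w[0] * x[0] + w[1] * x[1] + b) <= 0:   # object is misclassified
                w = [w[0] + lr * y * x[0], w[1] + lr * y * x[1]]
                b += lr * y
    return w, b

print(train_linear(data))   # a separating (w, b) for this linearly separable toy set
```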

The possibilities of gradient algorithms for finding an extremum, especially in the group of linear decision rules, have been studied quite well. The convergence of these algorithms has been proved only for the case when the recognizable classes of objects are displayed in the feature space by compact geometric structures.

Sufficiently high quality of the decision rule can be achieved using algorithms that have no rigorous mathematical proof of convergence of the solution to the global extremum. Such algorithms include a large group of heuristic programming procedures that represent the direction of evolutionary modeling. Evolutionary modeling is a bionic method borrowed from nature; it is based on using known mechanisms of evolution in order to replace the meaningful modeling of a complex object with phenomenological modeling of its evolution. A well-known representative of evolutionary modeling in pattern recognition is the group method of data handling (GMDH, known by the Russian acronym MGUA). GMDH is based on the principle of self-organization, and its algorithms reproduce the scheme of mass selection.

However, the achievement of practical goals in this case is not accompanied by the extraction of new knowledge about the nature of recognizable objects. The possibility of extracting this knowledge, in particular knowledge about the mechanisms of interaction of attributes (features), is fundamentally limited here by the given structure of such interaction, fixed in the chosen form of decisive functions.

Logical (Boolean) Methods

Logical methods of pattern recognition are based on the apparatus of logical algebra and allow to operate with information contained not only in individual features, but also in combinations of feature values. In these methods, the values ​​of any attribute are considered as elementary events.

In the most general form, logical methods can be characterized as a search through the training sample for logical patterns and the formation of a certain system of logical decision rules (for example, in the form of conjunctions of elementary events), each of which has its own weight. The group of logical methods is diverse and includes methods of varying complexity and depth of analysis. For dichotomous (boolean) features, the so-called tree-like classifiers, the dead-end test method, the Kora algorithm, and others are popular.

The Kora algorithm, like other logical methods of pattern recognition, is quite laborious in terms of computation, since a complete enumeration is required when selecting conjunctions. Therefore, when applying logical methods, high requirements are placed on the efficient organization of the computational process, and these methods work well with relatively small dimensions of the feature space and only on powerful computers.
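A toy version of such a conjunction search over boolean features is sketched below; the selection rule used here (a conjunction must fire on every object of one class and on none of the other) is a deliberately simplified stand-in for the actual Kora criteria, and the data are invented.

```python
from itertools import combinations

class_a = [(1, 1, 0, 1), (1, 1, 1, 0), (1, 1, 0, 0)]   # toy boolean descriptions
class_b = [(0, 1, 1, 0), (1, 0, 1, 1), (0, 0, 1, 1)]

def find_conjunctions(positive, negative, size=2):
    """Full enumeration of feature conjunctions of the given size."""
    rules = []
    for subset in combinations(range(len(positive[0])), size):
        fires = lambda obj: all(obj[j] == 1 for j in subset)
        if all(fires(o) for o in positive) and not any(fires(o) for o in negative):
            rules.append(subset)
    return rules

print(find_conjunctions(class_a, class_b))   # [(0, 1)]: features 0 and 1 jointly separate the classes
```

The nested full enumeration is exactly what makes these methods computationally heavy as the number of features grows.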

Linguistic (structural) methods

Linguistic methods of pattern recognition are based on the use of special grammars that generate languages ​​that can be used to describe a set of properties of recognizable objects.

For different classes of objects, non-derivative (atomic) elements (subimages, signs) and possible relationships between them are distinguished. Grammar refers to the rules for constructing objects from these non-derived elements.

Thus, each object is a collection of non-derivative elements "connected" to one another in some way, or, in other words, a "sentence" of some "language". The conceptual significance of this idea deserves particular emphasis.

By syntactic analysis (parsing) of a "sentence", its syntactic "correctness" is determined or, equivalently, whether some fixed grammar describing a class can generate the existing description of an object.

However, the task of restoring (defining) grammars from a certain set of statements (sentences - descriptions of objects) that generate a given language is difficult to formalize.
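A minimal structural-recognition sketch: a recursive-descent recognizer for an invented grammar P -> 'u' P 'd' | 'u' 'd', describing a "peak" built from n rising strokes followed by n falling strokes. The stroke alphabet and the grammar are purely illustrative.

```python
def is_peak(strokes):
    """Return True if the stroke sequence is generated by P -> 'u' P 'd' | 'u' 'd'."""
    def parse(i):
        # Try to derive one P starting at position i; return the position after it, or None.
        if i < len(strokes) and strokes[i] == "u":
            inner = parse(i + 1)                                   # P -> 'u' P 'd'
            if inner is not None and inner < len(strokes) and strokes[inner] == "d":
                return inner + 1
            if i + 1 < len(strokes) and strokes[i + 1] == "d":     # P -> 'u' 'd'
                return i + 2
        return None
    return parse(0) == len(strokes)

print(is_peak(list("uuuddd")))    # True: a well-formed "peak"
print(is_peak(list("uuudddd")))   # False: one falling stroke too many
```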

Extensional Methods

In the methods of this group, in contrast to the intensional direction, each studied object is given an independent diagnostic value to a greater or lesser extent. At their core, these methods are close to the clinical approach, which considers people not as a chain of objects ranked according to one or another indicator, but as integral systems, each of which is individual and has a special diagnostic value. Such a careful attitude to the objects of study does not allow one to exclude or lose information about each individual object, which occurs when applying the methods of intensional direction, using objects only to detect and fix the patterns of behavior of their attributes.

The main operations in pattern recognition using the discussed methods are the operations of determining the similarity and difference of objects. Objects in the specified group of methods play the role of diagnostic precedents. At the same time, depending on the conditions of a particular task, the role of an individual precedent can vary within the widest limits: from the main and defining to very indirect participation in the recognition process. In turn, the conditions of the problem may require the participation of a different number of diagnostic precedents for a successful solution: from one in each recognizable class to the entire sample size, as well as different ways to calculate the measures of similarity and difference of objects. These requirements explain the further division of extensional methods into subclasses.

Prototype comparison method

This is the simplest extensional recognition method. It is used, for example, in the case when the recognized classes are displayed in the feature space by compact geometric groupings. In this case, the center of the geometric grouping of the class (or the object closest to the center) is usually chosen as the prototype point.

To classify an unknown object, the prototype closest to it is found, and the object belongs to the same class as this prototype. Obviously, no generalized class images are formed in this method.

Various types of distances can be used as a measure of proximity. Often for dichotomous features, the Hamming distance is used, which in this case is equal to the square of the Euclidean distance. In this case, the decision rule for classifying objects is equivalent to a linear decision function.

This fact should be especially noted. It clearly demonstrates the connection between prototype and indicative representation of information about the data structure. Using the above representation, for example, any traditional measuring scale, which is a linear function of the values ​​of dichotomous features, can be considered as a hypothetical diagnostic prototype. In turn, if the analysis of the spatial structure of the recognized classes allows us to conclude that they are geometrically compact, then it is enough to replace each of these classes with one prototype, which is actually equivalent to a linear diagnostic model.
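A nearest-prototype classifier for binary features takes only a few lines; the prototypes below are invented, whereas in practice they would be the class centers computed from the training sample.

```python
def hamming(a, b):
    """Hamming distance; for 0/1 features it equals the squared Euclidean distance."""
    return sum(x != y for x, y in zip(a, b))

prototypes = {"class_A": (1, 1, 1, 0), "class_B": (0, 0, 1, 1)}   # hypothetical prototypes

def nearest_prototype(obj):
    return min(prototypes, key=lambda c: hamming(obj, prototypes[c]))

print(nearest_prototype((1, 0, 1, 0)))   # "class_A"
```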

In practice, of course, the situation is often different from the described idealized example. A researcher who intends to apply a recognition method based on comparison with the prototypes of diagnostic classes faces difficult problems.

First, there is the choice of a proximity measure (metric), which can significantly change the spatial configuration of the distribution of objects. Second, an independent problem is the analysis of multidimensional structures of experimental data. Both problems become especially acute for the researcher under the conditions of a high-dimensional feature space that are typical of real problems.

k nearest neighbor method

The k nearest neighbors method for solving discriminant analysis problems was first proposed back in 1952. It is as follows.

When classifying an unknown object, a given number (k) of other objects geometrically closest to it in the feature space (nearest neighbors) with already known belonging to recognizable classes is found. The decision to assign an unknown object to a particular diagnostic class is made by analyzing information about this known membership of its nearest neighbors, for example, using a simple vote count.
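A minimal k nearest neighbors sketch with a simple vote count; the sample, the value of k, and the squared Euclidean metric are illustrative assumptions.

```python
from collections import Counter

labeled = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
           ((3.0, 3.2), "B"), ((3.1, 2.9), "B"), ((2.8, 3.0), "B")]

def knn(query, k=3):
    """Classify `query` by a vote among its k nearest neighbours in the labeled sample."""
    by_distance = sorted(labeled,
                         key=lambda item: sum((q - x) ** 2 for q, x in zip(query, item[0])))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn((1.1, 1.1)))   # "A"
print(knn((2.9, 3.1)))   # "B"
```

Note that every query forces a pass over the whole labeled sample, which is exactly the computational burden discussed below.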

Initially, the k nearest neighbors method was considered as a nonparametric method for estimating the likelihood ratio. For this method, theoretical estimates of its effectiveness are obtained in comparison with the optimal Bayesian classifier. It is proved that the asymptotic error probabilities for the k nearest neighbor method exceed the errors of the Bayes rule by no more than two times.

When using the k nearest neighbors method for pattern recognition, the researcher has to solve the difficult problem of choosing a metric to determine the proximity of the diagnosed objects. In a high-dimensional feature space this problem is sharply aggravated by the computational cost of the method, which becomes significant even for high-performance computers. Therefore, here, just as in the prototype comparison method, it is necessary to solve the creative problem of analyzing the multidimensional structure of the experimental data in order to minimize the number of objects representing the diagnostic classes.

The need to reduce the number of objects in the training sample (diagnostic precedents) is a disadvantage of this method, as it reduces the representativeness of the training sample.

Algorithms for calculating estimates ("voting")

The principle of operation of estimate calculation algorithms (ABO) is to compute priorities (similarity estimates) that characterize the "proximity" of the recognized object to the reference objects over a system of feature ensembles, i.e. a system of subsets of the given set of features.

Unlike all previously considered methods, algorithms for calculating estimates operate with object descriptions in a fundamentally new way. For these algorithms, objects exist simultaneously in very different subspaces of the feature space. The ABO class brings the idea of ​​using features to its logical end: since it is not always known which combinations of features are the most informative, in ABO the degree of similarity of objects is calculated by comparing all possible or certain combinations of features included in the descriptions of objects.

The used combinations of attributes (subspaces) are called support sets or sets of partial descriptions of objects. The concept of generalized proximity between the recognized object and objects of the training sample (with a known classification), which are called reference objects, is introduced. This proximity is represented by a combination of the proximity of the recognized object with the reference objects calculated on sets of partial descriptions. Thus, ABO is an extension of the k nearest neighbors method, in which the proximity of objects is considered only in one given feature space.

Another extension of the ABO is that in these algorithms the problem of determining the similarity and difference of objects is formulated as a parametric one and the stage of setting the ABO according to the training sample is selected, at which the optimal values ​​of the entered parameters are selected. The quality criterion is the recognition error, and literally everything is parameterized:

  • rules for calculating the proximity of objects by individual features;
  • rules for calculating the proximity of objects in feature subspaces;
  • the degree of importance of a particular reference object as a diagnostic precedent;
  • the significance of the contribution of each reference set of features to the final assessment of the similarity of the recognized object with any diagnostic class.

The ABO parameters are specified as threshold values and (or) as weights of the listed components.
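A deliberately stripped-down sketch of the estimate-calculation ("voting") idea: similarity is accumulated over all feature subsets of a fixed size (support sets), while the weights and thresholds that a full ABO would tune are simply omitted; the reference objects are invented.

```python
from itertools import combinations

references = [((1, 1, 0, 1), "A"), ((1, 0, 0, 1), "A"),
              ((0, 1, 1, 0), "B"), ((0, 0, 1, 0), "B")]

def abo_scores(obj, support_size=2):
    """Cast one vote for a reference's class whenever the recognized object
    coincides with that reference on a whole support set of features."""
    scores = {}
    for subset in combinations(range(len(obj)), support_size):
        for ref, label in references:
            if all(obj[j] == ref[j] for j in subset):
                scores[label] = scores.get(label, 0) + 1
    return scores

print(abo_scores((1, 1, 0, 0)))   # {'A': 4, 'B': 1}
```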

The theoretical possibilities of ABO are at least not lower than those of any other pattern recognition algorithm, since with the help of ABO all conceivable operations with the objects under study can be implemented.

But, as it usually happens, the expansion of potentialities encounters great difficulties in their practical implementation, especially at the stage of constructing (tuning) algorithms of this type.

Separate difficulties were noted earlier when discussing the k nearest neighbors method, which could be interpreted as a truncated version of ABO. It can also be considered in a parametric form and reduce the problem to finding a weighted metric of the selected type. At the same time, complex theoretical questions and problems associated with the organization of an efficient computational process already arise here for high-dimensional problems.

For ABO, if you try to use the capabilities of these algorithms in full, these difficulties increase many times over.

The noted problems explain the fact that in practice the use of ABO for solving high-dimensional problems is accompanied by the introduction of any heuristic restrictions and assumptions. In particular, there is an example of the use of ABO in psychodiagnostics, in which a version of ABO was tested, which is actually equivalent to the k nearest neighbors method.

Decision Rule Collectives

To conclude this review of pattern recognition methods, let us dwell on one more approach: the so-called decision rule collectives (DRC).

Since different recognition algorithms behave differently on the same sample of objects, the question naturally arises of a synthetic decision rule that adaptively uses the strengths of these algorithms. The synthetic decision rule uses a two-level recognition scheme. At the first level, private recognition algorithms work, the results of which are combined at the second level in the synthesis block. The most common methods of such a combination are based on the allocation of areas of competence of a particular algorithm. The simplest way to find areas of competence is to a priori split the space of attributes based on professional considerations of a particular science (for example, stratification of the sample according to some attribute). Then, for each of the selected areas, its own recognition algorithm is built. Another method is based on the use of formal analysis to determine local areas of the feature space as neighborhoods of recognizable objects for which the success of any particular recognition algorithm has been proven.

The most general approach to constructing a synthesis block considers the resulting indicators of partial algorithms as initial features for constructing a new generalized decision rule. In this case, all the above methods of intensional and extensional directions in pattern recognition can be used. Effective for solving the problem of creating a set of decision rules are logical algorithms of the "Kora" type and algorithms for calculating estimates (ABO), which form the basis of the so-called algebraic approach, which provides research and a constructive description of recognition algorithms, within which all existing types of algorithms fit.
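A two-level scheme of this kind can be sketched as follows; the two private classifiers, their rules, and the competence boundary are all arbitrary illustrative choices.

```python
def classifier_low(x):
    """Private algorithm assumed competent in the region x[0] < 2.0."""
    return "A" if x[1] < 1.5 else "B"

def classifier_high(x):
    """Private algorithm assumed competent in the region x[0] >= 2.0."""
    return "B" if x[0] + x[1] > 5.0 else "A"

def collective(x):
    """Second-level synthesis block: delegate to the algorithm whose area of
    competence contains the object."""
    return classifier_low(x) if x[0] < 2.0 else classifier_high(x)

print(collective((1.0, 0.5)))   # handled by classifier_low  -> "A"
print(collective((3.0, 2.5)))   # handled by classifier_high -> "B"
```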

Comparative analysis of pattern recognition methods

Let us compare the pattern recognition methods described above and evaluate the degree of their adequacy to the requirements formulated in Section 3.3.3 for SDA models for adaptive automated control systems for complex systems.

Among the methods of the intensional direction, practical value for solving real problems lies with parametric methods and with methods based on assumptions about the form of decision functions. Parametric methods form the basis of the traditional methodology for constructing indicators. Applying these methods to real problems involves imposing strong restrictions on the data structure, which lead to linear diagnostic models with very approximate estimates of their parameters. When using methods based on assumptions about the form of decision functions, the researcher is also forced to turn to linear models. This is due to the high dimension of the feature space typical of real problems: as the degree of a polynomial decision function increases, the number of its terms grows enormously, while the accompanying gain in recognition quality is questionable. Thus, projecting the area of potential application of intensional recognition methods onto real problems, we obtain a picture that corresponds to the well-established traditional methodology of linear diagnostic models.

The properties of linear diagnostic models, in which the diagnostic indicator is represented by a weighted sum of initial features, are well studied. The results of these models (with appropriate normalization) are interpreted as distances from the objects under study to some hyperplane in the feature space or, equivalently, as projections of the objects onto some straight line in the given space. Therefore, linear models are adequate only for simple geometric configurations of feature space regions into which objects of different diagnostic classes are mapped. With more complex distributions, these models fundamentally cannot reflect many features of the experimental data structure. At the same time, such features can provide valuable diagnostic information.

At the same time, the appearance in any real problem of simple multidimensional structures (in particular, multidimensional normal distributions) should be regarded as an exception rather than a rule. Often, diagnostic classes are formed on the basis of complex external criteria, which automatically entails the geometric heterogeneity of these classes in the feature space. This is especially true for the real-life criteria most frequently encountered in practice. Under such conditions, the use of linear models captures only the "roughest" patterns in the experimental information.

The use of extensional methods is not associated with any assumptions about the structure of the experimental information, apart from the requirement that within the recognized classes there must be one or more groups of objects that are somewhat similar, and that objects of different classes must differ from each other in some way. Obviously, for any finite size of the training sample (and it cannot be otherwise) this requirement is always met, simply because there are random differences between objects. Various measures of proximity (distance) of objects in the feature space are used as similarity measures. Therefore, the effective use of extensional pattern recognition methods depends on how well these proximity measures are determined, as well as on which objects of the training sample (objects with a known classification) play the role of diagnostic precedents. Successful solution of these problems gives a result approaching the theoretically achievable limits of recognition efficiency.

The advantages of extensional methods of pattern recognition are opposed, first of all, by the high technical complexity of their practical implementation. For high-dimensional feature spaces, the seemingly simple task of finding pairs of nearest points turns into a serious problem. Also, many authors note as a problem the need to remember a sufficiently large number of objects representing recognizable classes.

In itself, this is not a problem, but it is perceived as a problem (for example, in the k nearest neighbors method) for the reason that when recognizing each object, a complete enumeration of all objects in the training sample occurs.

Therefore, it is advisable to apply a recognition system model in which the problem of a complete enumeration of the training sample during recognition is removed, because this enumeration is carried out only once, when the generalized images of the recognition classes are formed. During recognition itself, the identified object is compared only with the generalized images of the recognition classes, the number of which is fixed and does not depend at all on the size of the training sample. This approach allows the training sample to be enlarged until the required quality of the generalized images is achieved, without any fear that this will lead to an unacceptable increase in recognition time (since in this model the recognition time does not depend on the size of the training sample at all).

The theoretical problems of applying extensional recognition methods are related to the problems of searching for informative groups of features, finding optimal metrics for measuring the similarity and difference of objects, and analyzing the structure of experimental information. At the same time, the successful solution of these problems allows not only designing effective recognition algorithms, but also making the transition from extensional knowledge of empirical facts to intensional knowledge about the patterns of their structure.

The transition from extensional knowledge to intensional knowledge occurs at the stage when a formal recognition algorithm has already been constructed and its effectiveness has been demonstrated. Then the study of the mechanisms by which the obtained efficiency is achieved is carried out. Such a study, associated with the analysis of the geometric structure of data, can, for example, lead to the conclusion that it is enough to replace the objects representing a particular diagnostic class with one typical representative (prototype). This is equivalent, as noted above, to setting a traditional linear diagnostic scale. It is also possible that it is enough to replace each diagnostic class with several objects that are meaningful as typical representatives of some subclasses, which is equivalent to constructing a fan of linear scales. There are other options, which will be discussed below.

Thus, a review of recognition methods shows that a number of different methods of pattern recognition have been theoretically developed at present. The literature provides a detailed classification of them. However, for most of these methods, their software implementation is absent, and this is deeply natural, one might even say “predetermined” by the characteristics of the recognition methods themselves. This can be judged by the fact that such systems are little mentioned in the specialized literature and other sources of information.

Consequently, the question of the practical applicability of certain theoretical recognition methods for solving practical problems with real (i.e., quite significant) data dimensions and on real modern computers remains insufficiently developed.

The above circumstance can be understood if we recall that the complexity of the mathematical model exponentially increases the complexity of the software implementation of the system and to the same extent reduces the chances that this system will work in practice. This means that only software systems based on fairly simple and “transparent” mathematical models can be implemented on the market. Therefore, a developer interested in replicating his software product approaches the issue of choosing a mathematical model not from a purely scientific point of view, but as a pragmatist, taking into account the possibilities of software implementation. He believes that the model should be as simple as possible, which means it should be implemented at lower cost and with better quality, and it should also work (be practically effective).

In this regard, the task arises of implementing in recognition systems a mechanism for generalizing the descriptions of objects belonging to the same class, i.e. a mechanism for forming compact generalized images. Obviously, such a generalization mechanism makes it possible to "compress" a training sample of any size into a base of generalized images whose size is known in advance. This also makes it possible to pose and solve a number of problems that cannot even be formulated in such recognition methods as the prototype comparison method, the k nearest neighbors method, and ABO.

These are the tasks:

  • determining the information contribution of features to the information portrait of a generalized image;
  • cluster-constructive analysis of generalized images;
  • determination of the semantic load of the feature;
  • semantic cluster-constructive analysis of features;
  • a meaningful comparison of generalized class images with each other and features with each other (cognitive diagrams, including Merlin diagrams).

The method that made it possible to solve these problems also distinguishes the promising system based on it from other systems, much as compilers differ from interpreters: owing to the formation of generalized images, recognition time in such a system does not depend on the size of the training sample. It is precisely this dependence that leads to practically unacceptable computation time for recognition in methods such as the k nearest neighbors method, ABO, and DRC at training sample sizes large enough to speak of sufficient statistics.

In conclusion of a brief review of recognition methods, we present the essence of the above in a summary table (Table 3.1), containing a brief description of the various methods of pattern recognition in the following parameters:

  • classification of recognition methods;
  • areas of application of recognition methods;
  • classification of limitations of recognition methods.
Intensional recognition methods

  • Methods based on estimates of the distribution densities of feature values (or of similarities and differences between objects)
    Application area: problems with a known distribution law, usually the normal one; large statistics must be collected.
    Limitations: the need to enumerate the entire training set during recognition; high sensitivity to non-representativeness of the training set and to artifacts.

  • Methods based on assumptions about the class of decision functions
    Application area: the classes should be well separable, and the feature system should be orthonormal.
    Limitations: the form of the decision function must be known in advance; the impossibility of taking new knowledge about correlations between features into account.

  • Logical (Boolean) methods
    Limitations: selecting logical decision rules (conjunctions) requires a complete enumeration; high computational complexity.

  • Linguistic (structural) methods
    Application area: problems with a feature space of small dimension.
    Limitations: the task of restoring (defining) a grammar from a set of statements (descriptions of objects) is difficult to formalize; unresolved theoretical problems.

Extensional recognition methods

  • Prototype comparison method
    Application area: problems with a feature space of small dimension.
    Limitations: high dependence of the classification results on the distance measure (metric); the optimal metric is unknown.

  • k nearest neighbor method
    Limitations: high dependence of the classification results on the distance measure (metric); the need for a complete enumeration of the training sample during recognition; computational complexity.

  • Algorithms for calculating estimates (voting, ABO)
    Application area: problems of small dimension in terms of the number of classes and features.
    Limitations: dependence of the classification results on the distance measure (metric); the need for a complete enumeration of the training sample during recognition; high technical complexity of the method.

  • Decision rule collectives (DRC)
    Application area: problems of small dimension in terms of the number of classes and features.
    Limitations: very high technical complexity of the method; a number of unresolved theoretical problems, both in determining the areas of competence of the particular methods and in the particular methods themselves.

Table 3.1 - Summary table of classification of recognition methods, comparison of their areas of application and limitations

The role and place of pattern recognition in the automation of complex systems management

The automated control system consists of two main parts: the control object and the control system.

The control system performs the following functions:

  • identification of the state of the control object;
  • development of a control action based on the goals of management, taking into account the state of the control object and the environment;
  • providing a control effect on the control object.

Pattern recognition is nothing more than the identification of the state of some object.

Therefore, the possibility of using the pattern recognition system at the stage of identifying the state of the control object seems quite obvious and natural. However, this may not be necessary. Therefore, the question arises in which cases it is advisable to use the recognition system in the automated control system, and in which it is not.

According to the literature data, in many previously developed and modern automated control systems in the subsystems for identifying the state of the control object and generating control actions, deterministic mathematical models of “direct counting” are used, which unambiguously and quite simply determine what to do with the control object if it has certain external parameters.

At the same time, the question of how these parameters are related to particular states of the control object is neither raised nor resolved. This position reflects the point of view that a one-to-one correspondence between them is assumed "by default". Therefore, the terms "parameters of the control object" and "state of the control object" are treated as synonyms, and the concept of "state of the control object" is not explicitly introduced at all. However, it is obvious that, in the general case, the relationship between the observed parameters of the control object and its state is dynamic and probabilistic.

Thus, traditional automated control systems are essentially parametric control systems, i.e. systems that manage not the states of the control object, but only its observable parameters. The decision on the control action is taken in such systems as if "blindly", i.e. without forming a holistic image of the control object and the environment in their current state, as well as without predicting the development of the environment and the reaction of the control object to certain control actions on it, acting simultaneously with the predicted influence of the environment.

From the positions developed in this paper, the term "decision making" in its modern sense is hardly applicable to traditional automated control systems. Decision making, at a minimum, presupposes a holistic vision of the object in its environment, not only in their current state but also in their dynamics and in their interaction both with each other and with the control system; it presupposes considering various alternative options for the development of this entire system, as well as narrowing the diversity of these alternatives (reduction) on the basis of certain target criteria. Obviously, none of this is present in traditional automated control systems, or it is present only in a simplified form.

Of course, the traditional method is adequate and its application is quite correct and justified in cases where the control object is indeed a stable and rigidly determined system, and the influence of the environment on it can be neglected.

However, in other cases this method is ineffective.

If the control object is dynamic, then the models underlying its control algorithms quickly become inadequate, as the relationship between input and output parameters changes, as well as the set of essential parameters itself. In essence, this means that traditional automated control systems are able to control the state of the control object only near the equilibrium point by means of weak control actions on it, i.e. by the method of small perturbations. Far from the state of equilibrium, from the traditional point of view, the behavior of the control object looks unpredictable and uncontrollable.

If there is no unambiguous relationship between the input and output parameters of the control object (that is, between the input parameters and the state of the object), in other words, if this relationship has a pronounced probabilistic nature, then deterministic models, in which the result of measuring a certain parameter is assumed to be simply a number, are not applicable in principle. In addition, the form of this relationship may simply be unknown, and then it is necessary to proceed from the most general assumption: that it is probabilistic, or not defined at all.

An automated control system built on traditional principles can work only with parameters whose interrelation patterns are already known, studied, and reflected in the mathematical model. In this study, the task was set to develop methods for designing automated control systems that can themselves identify and select the most significant parameters and determine the nature of the links between them and the states of the control object.

In this case, it is necessary to apply more developed and adequate measurement methods to the real situation:

  • classification or pattern recognition (learning based on a training sample, adaptability of recognition algorithms, adaptability of sets of classes and studied parameters, selection of the most significant parameters and reduction of description dimension while maintaining a given redundancy, etc.);
  • statistical measurements, when the result of measuring a certain parameter is not a single number, but a probability distribution: a change in a statistical variable does not mean a change in its value in itself, but a change in the characteristics of the probability distribution of its values.

As a result, automated control systems based on the traditional deterministic approach practically do not work with complex dynamic multi-parameter weakly determined control objects, such as, for example, macro- and micro-socio-economic systems in a dynamic economy of the “transition period”, hierarchical elite and ethnic groups, society and electorate, human physiology and psyche, natural and artificial ecosystems and many others.

It is very significant that in the mid-1980s the school of I. Prigogine developed an approach according to which, in the development of any system (including a person), periods alternate during which the system behaves either as "mostly deterministic" or as "mostly random". Naturally, a real control system must stably manage the control object not only during the "deterministic" stretches of its history, but also at points where its further behavior becomes highly uncertain. This alone means that it is necessary to develop approaches to managing systems in whose behavior there is a large element of randomness (or of what is currently described mathematically as "randomness").

Therefore, the composition of promising automated control systems that provide control of complex dynamic multi-parameter weakly deterministic systems, as essential functional links, will apparently include subsystems for identifying and predicting the states of the environment and the control object, based on artificial intelligence methods (primarily pattern recognition), methods of supporting decision making and information theory.

Let us briefly consider the use of pattern recognition systems for making a decision on a control action (this issue will be discussed in more detail later, since it is key for this work). If the target and other states of the control object are taken as recognition classes, and the factors influencing it as features, then a measure of the relationship between factors and states can be formed in the pattern recognition model. Given a particular state of the control object, this makes it possible to obtain information about the factors that promote or hinder its transition to that state and, on this basis, to develop a decision on the control action.
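A crude sketch of such a factor-state relationship measure; the observation format and the share-of-occurrence measure are assumptions made for illustration, not the measure used in the actual model.

```python
from collections import defaultdict

# Hypothetical observations: (set of active factors, resulting state of the object).
history = [({"f1", "f2"}, "target"), ({"f1"}, "target"),
           ({"f3"}, "other"), ({"f2", "f3"}, "other")]

def factor_state_link(history):
    """For each state, the share of its observations in which each factor was present."""
    per_state = defaultdict(list)
    for factors, state in history:
        per_state[state].append(factors)
    return {state: {f: sum(f in obs for obs in cases) / len(cases)
                    for f in set().union(*cases)}
            for state, cases in per_state.items()}

print(factor_state_link(history))
# e.g. {'target': {'f1': 1.0, 'f2': 0.5}, 'other': {'f2': 0.5, 'f3': 1.0}}
```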

Factors can be divided into the following groups:

  • characterizing the prehistory of the control object;
  • characterizing the current state of the control object;
  • environmental factors;
  • technological (managed) factors.

Thus, image recognition systems can be used as part of an automated control system: in subsystems for identifying the state of a control object and generating control actions.

This is useful when the control object is a complex system.

Making a decision on the control action in the automated control system

The solution of the problem of synthesis of adaptive automated control systems by complex systems is considered in this paper, taking into account numerous and deep analogies between the methods of pattern recognition and decision making.

On the one hand, the task of pattern recognition is a decision about the belonging of a recognizable object to a certain recognition class.

On the other hand, the authors propose to consider the problem of decision making as an inverse problem of decoding or an inverse problem of pattern recognition (see Section 2.2.2).

The commonality of the basic ideas underlying the methods of pattern recognition and decision making becomes especially obvious when considering them from the standpoint of information theory.

Variety of decision-making tasks

Decision making as the realization of a goal

Definition: making a decision (“choice”) is an action on a set of alternatives, as a result of which the original set of alternatives narrows, i.e. it is reduced.

Choice is an action that gives purposefulness to all activity. It is through acts of choice that the subordination of all activity to a specific goal or a set of interrelated goals is realized.

Thus, in order for the act of choice to become possible, the following is necessary:

  • generation or discovery of a set of alternatives on which to make a choice;
  • determination of the goals for the achievement of which the choice is made;
  • development and application of a method for comparing alternatives with each other, i.e. determination of a preference rating for each alternative according to certain criteria, allowing indirectly assessing how each alternative meets the goal.

Modern work in the field of decision support has revealed a characteristic situation, which consists in the fact that the complete formalization of finding the best (in a certain sense) solution is possible only for well-studied, relatively simple problems, while in practice, weakly structured problems are more common, for which completely formalized algorithms have not been developed (except for exhaustive enumeration and trial and error). However, experienced, competent and capable professionals often make choices that turn out to be quite good. Therefore, the current trend in decision-making practice in natural situations is to combine the ability of a person to solve non-formalized problems with the capabilities of formal methods and computer modeling: interactive decision support systems, expert systems, adaptive human-machine automated control systems, neural networks and cognitive systems.

Decision making as removal of uncertainty (information approach)

The process of obtaining information can be considered as a decrease in uncertainty as a result of receiving a signal, and the amount of information - as a quantitative measure of the degree of removal of uncertainty.

But as a result of choosing some subset of alternatives from the set, i.e. as a result of making a decision, the same thing happens (decrease in uncertainty). This means that every choice, every decision generates a certain amount of information, and therefore can be described in terms of information theory.
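A standard illustration (not drawn from the text itself): if a choice narrows a set of $N$ equally likely alternatives down to $M$, the uncertainty removed, and hence the information generated, is

$$I = \log_2 \frac{N}{M},$$

so selecting a single alternative out of eight equally likely ones generates $\log_2 8 = 3$ bits of information.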

Classification of decision-making problems

The multiplicity of decision-making tasks is due to the fact that each component of the situation in which decision-making is carried out can be implemented in qualitatively different options.

Here are just a few of these options:

  • the set of alternatives, on the one hand, can be finite, countable or continual, and on the other hand, it can be closed (that is, completely known) or open (including unknown elements);
  • alternatives can be evaluated according to one or more criteria, which, in turn, can be quantitative or qualitative;
  • the selection mode can be single (one-time) or multiple, repetitive, including feedback on the results of the selection, i.e. allowing decision-making algorithms to learn by taking the consequences of previous choices into account;
  • the consequences of choosing each alternative can be precisely known in advance (choice under certainty), have a probabilistic nature when the probabilities of possible outcomes are known after the choice is made (choice under risk) or have an ambiguous outcome with unknown probabilities (choice under uncertainty);
  • responsibility for the choice may be absent, be individual or group;
  • the degree of consistency of goals in a group choice can vary from the complete coincidence of the interests of the parties (cooperative choice) to their opposite (choice in a conflict situation). Intermediate options are also possible: a compromise, a coalition, a growing or fading conflict.

Various combinations of these options lead to numerous decision-making problems that have been studied to varying degrees.

Languages ​​for describing decision-making methods

One and the same phenomenon can be spoken about in different languages ​​of varying degrees of generality and adequacy. To date, there have been three main languages ​​for describing choice.

The simplest, most developed and most popular is the criterion language.

Criteria language

The name of this language is associated with the basic assumption that each individual alternative can be evaluated by some specific (one) number, after which the comparison of alternatives is reduced to a comparison of their corresponding numbers.

Let, for example, $X$ be a set of alternatives and $x$ some particular alternative belonging to this set, $x \in X$. It is then assumed that for every $x$ a function $q(x)$ can be defined, called a criterion (quality criterion, objective function, preference function, utility function, etc.), with the property that if alternative $x_1$ is preferable to $x_2$ (denoted $x_1 > x_2$), then $q(x_1) > q(x_2)$.

In this case, the choice is reduced to finding an alternative with the highest value of the criterion function.

However, in practice, the use of only one criterion for comparing the degree of preference of alternatives turns out to be an unjustified simplification, since a more detailed consideration of alternatives leads to the need to evaluate them not according to one, but according to many criteria that can be of a different nature and qualitatively different from each other.

For example, when choosing the most acceptable type of aircraft for passengers and the operating organization on certain types of routes, the comparison is carried out simultaneously according to many groups of criteria: technical, technological, economic, social, ergonomic, etc.

Multicriteria problems do not have a unique general solution. Therefore, many ways are proposed to give a multicriteria problem a particular form that allows a single general solution. Naturally, these solutions are generally different for different methods. Therefore, perhaps the main thing in solving a multicriteria problem is the justification of this type of its formulation.

Various options for simplifying the multicriteria selection problem are used. Let's list some of them.

  1. Conditional maximization (not the global extremum of the integral criterion is sought, but the conditional extremum of the main particular criterion).
  2. Search for an alternative with given properties.
  3. Finding the Pareto set.
  4. Reduction of a multi-criteria problem to a single-criteria one by introducing an integral criterion.

Let us consider in more detail the formal formulation of the method of reducing a multi-criteria problem to a single-criteria one.

We introduce the integral criterion q0(x) as a scalar function of a vector argument:

q0(x) = q0(q1(x), q2(x), ..., qn(x)).

The integral criterion makes it possible to order the alternatives by q0, thereby highlighting the best one (in the sense of this criterion). The form of the function q0 is determined by how specifically we imagine the contribution of each criterion to the integral criterion. Usually additive and multiplicative functions are used:

q0 = Σ ai·qi/si

1 - q0 = Π(1 - bi·qi/si)

The coefficients si provide:

  1. Dimensionlessness, or a single dimension, of the quantity ai·qi/si (different particular criteria may have different dimensions, in which case arithmetic operations cannot be performed on them and they cannot be combined into an integral criterion).
  2. Normalization, i.e. the condition bi·qi/si < 1.

The coefficients ai and bi reflect the relative contribution of the particular criteria qi to the integral criterion.

So, in the multi-criteria setting, the problem of choosing one of the alternatives reduces to maximizing the integral criterion:

x* = arg max (q0(q1(x), q2(x), ..., qn(x)))
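For illustration, here is a minimal numerical sketch of the additive variant; all criterion values, weights ai and normalizers si are hypothetical:

```python
# Minimal sketch: additive integral criterion with normalization, hypothetical data.
import numpy as np

# Rows are alternatives, columns are particular criteria q_i(x) (made-up values).
Q = np.array([
    [120.0, 0.8, 3.0],   # alternative 0
    [ 90.0, 0.9, 4.0],   # alternative 1
    [150.0, 0.6, 2.0],   # alternative 2
])

s = Q.max(axis=0)               # normalizing coefficients s_i (here: column maxima)
a = np.array([0.5, 0.3, 0.2])   # relative contributions a_i, chosen by experts

q0 = (a * Q / s).sum(axis=1)    # q0 = sum_i a_i * q_i / s_i for each alternative
best = int(np.argmax(q0))       # x* = arg max q0
print(q0, "best alternative:", best)
```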

The main problem in the multicriteria formulation of the decision-making problem is that it is necessary to find an analytical form of the coefficients ai and bi that provides the following properties of the model:

  • a high degree of adequacy to the subject area and to the experts' point of view;
  • minimal computational difficulties in maximizing the integral criterion, i.e. in computing it for different alternatives;
  • stability of the results of maximizing the integral criterion under small perturbations of the initial data.

Stability of the solution means that a small change in the initial data should lead to a small change in the value of the integral criterion and, accordingly, to a small change in the decision being made. Thus, if the initial data are practically the same, the decision made should be either the same or very close.

Language of binary relations

The language of binary relations is a generalization of the multicriteria language and is based on the fact that when we evaluate an alternative, this evaluation is always relative: explicitly or, more often, implicitly, other alternatives from the set under study or from the general population are used as the base or frame of reference for comparison. Human thinking is based on the search for and analysis of opposites (constructs), so it is always easier for us to choose one of two opposite options than one option from a large, unordered set.

Thus, the main assumptions of this language boil down to the following:

  • a single alternative is not evaluated, i.e. the criterion function is not introduced;
  • for each pair of alternatives, it can be established in some way that one of them is preferable to the other, or that they are equivalent or incomparable;
  • the preference relation in any pair of alternatives does not depend on the other alternatives presented for choice.

There are various ways to specify binary relations: direct, matrix, using preference graphs, the method of sections, etc.

Relations between alternatives of one pair are expressed through the concepts of equivalence, order and dominance.
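A minimal sketch of this language, assuming a hypothetical pairwise preference matrix and selecting the non-dominated alternatives:

```python
# Minimal sketch of the binary-relation language: P[i][j] = 1 means
# alternative i is preferred to alternative j (matrix values are made up).
import numpy as np

P = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
])

# An alternative is non-dominated if no other alternative is preferred to it.
non_dominated = [i for i in range(len(P)) if not P[:, i].any()]
print("non-dominated alternatives:", non_dominated)
```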

Generalized language of choice functions

The language of choice functions is based on set theory and allows one to operate with mappings of sets to their subsets corresponding to different choices without the need to enumerate elements. This language is very general and potentially allows any choice to be described. However, the mathematical apparatus of generalized choice functions is currently being developed and tested mainly on problems that have already been solved using the criterion or binary approaches.

Group choice

Let there be a group of people who have the right to take part in collective decision-making. Suppose this group is considering some set of alternatives, and each member of the group makes his choice. The task is to develop a solution that in a certain way coordinates individual choices and in some sense expresses the "general opinion" of the group, i.e. taken as a group choice.

Naturally, different group decisions will correspond to different principles for coordinating individual decisions.

The rules for coordinating individual decisions in a group choice are called voting rules. The most common is the "majority rule", in which the group decision is taken by the alternative that receives the most votes.

It must be understood that such a decision only reflects the prevalence of different points of view in the group, and not a truly optimal option, for which no one may have voted at all. "Truth is not determined by voting."

In addition, there are so-called "voting paradoxes", the most famous of which is Arrow's paradox.

These paradoxes can, and sometimes do, lead to very unpleasant features of the voting procedure: for example, there are cases when the group cannot reach a decision at all (there is no quorum, or everyone votes for their own unique option, etc.), and sometimes (in multistage voting) the minority can impose its will on the majority.
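A small illustration of such a paradox: with the hypothetical preference profiles below, pairwise majority voting produces a cycle (the classical Condorcet situation), so no consistent group ranking exists:

```python
# Minimal sketch of a voting paradox: three voters, pairwise majority comparison
# yields a cycle, so "majority rule" cannot produce a consistent group ranking.
from itertools import combinations

voters = [
    ["A", "B", "C"],   # voter 1: A > B > C
    ["B", "C", "A"],   # voter 2: B > C > A
    ["C", "A", "B"],   # voter 3: C > A > B
]

def prefers(ranking, x, y):
    return ranking.index(x) < ranking.index(y)

for x, y in combinations("ABC", 2):
    votes_x = sum(prefers(r, x, y) for r in voters)
    winner = x if votes_x > len(voters) - votes_x else y
    print(f"{x} vs {y}: majority prefers {winner}")
# A beats B, B beats C, yet C beats A, i.e. a cycle.
```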

Choice under Uncertainty

Certainty is a special case of uncertainty, namely: it is an uncertainty close to zero.

In modern choice theory, it is believed that there are three main types of uncertainty in decision-making problems:

  1. Informational (statistical) uncertainty of initial data for decision-making.
  2. Uncertainty of the consequences of decision-making (choice).
  3. Vagueness in the description of the components of the decision-making process.

Let's consider them in order.

Informational (statistical) uncertainty in initial data

The data obtained about the subject area cannot be considered as absolutely accurate. In addition, it is obvious that these data are not of interest to us in themselves, but only as signals that, perhaps, carry certain information about what we are really interested in. Thus, it is more realistic to consider that we are dealing with data that is not only noisy and inaccurate, but also indirect, and possibly incomplete. In addition, these data do not concern the entire (general) population under study, but only a certain subset of it, about which we were actually able to collect data, but at the same time we want to draw conclusions about the entire population, and we also want to know the degree of reliability of these conclusions.

Under these conditions, the theory of statistical decisions is used.

There are two main sources of uncertainty in this theory. First, it is not known which distribution the original data obey. Second, it is not known which distribution the set (the general population) about which we want to draw conclusions from the subset forming the initial data obeys.

Statistical procedures are the decision-making procedures that remove both of these types of uncertainty.

It should be noted that there are a number of points that, if ignored, lead to the incorrect application of statistical methods:

  • statistical inferences, like any other, always have some definite reliability or certainty. But, unlike many other cases, the reliability of statistical findings is known and determined in the course of statistical research;
  • the quality of the solution obtained as a result of applying the statistical procedure depends on the quality of the initial data;
  • data that does not have a statistical nature should not be subjected to statistical processing;
  • it is necessary to use statistical procedures that correspond to the level of a priori information about the population under study (for example, one should not apply analysis-of-variance methods to non-Gaussian data). If the distribution of the original data is unknown, one must either establish it or use several different procedures and compare the results, as in the sketch below. If the results differ greatly, this indicates that some of the procedures used are inapplicable.
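A minimal sketch of that advice, using SciPy on synthetic (deliberately non-Gaussian) data; substitute your own samples in practice:

```python
# If the distribution of the data is in doubt, run both a parametric and a
# non-parametric procedure and compare the conclusions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample_a = rng.exponential(scale=1.0, size=40)   # clearly non-Gaussian data
sample_b = rng.exponential(scale=1.5, size=40)

t_stat, p_t = stats.ttest_ind(sample_a, sample_b)       # parametric test
u_stat, p_u = stats.mannwhitneyu(sample_a, sample_b)    # non-parametric test

print(f"t-test p={p_t:.3f}, Mann-Whitney p={p_u:.3f}")
# If the two p-values lead to different conclusions, the parametric assumptions
# (normality, equal variances) deserve a closer look.
```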

Uncertainty of Consequences

When the consequences of choosing an alternative are uniquely determined by the alternative itself, we need not distinguish between an alternative and its consequences: it is understood that by choosing an alternative we actually choose its consequences.

However, in real practice, one often has to deal with a more complex situation, when the choice of one or another alternative ambiguously determines the consequences of the choice made.

In the case of a discrete set of alternatives and outcomes of their choice, provided that the set of possible outcomes is common to all alternatives, we can assume that different alternatives differ from each other in the distribution of outcome probabilities. These probability distributions, in the general case, may depend on the results of the choice of alternatives and the outcomes actually resulting from this. In the simplest case, the outcomes are equally likely. The outcomes themselves usually have the meaning of gains or losses and are quantified.

If the outcomes are equal for all alternatives, then there is nothing to choose. If they are different, then alternatives can be compared by introducing certain quantitative estimates for them. The variety of problems in game theory is associated with a different choice of numerical characteristics of losses and gains as a result of the choice of alternatives, different degrees of conflict between the parties choosing alternatives, etc.
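A minimal sketch of comparing alternatives under risk by expected gain; the outcomes and probabilities are made up for illustration:

```python
# Each alternative is a probability distribution over a common set of outcomes
# (gains/losses); a simple quantitative estimate is the expected gain.
import numpy as np

outcomes = np.array([100.0, 20.0, -50.0])          # common outcomes
alternatives = {
    "cautious":   np.array([0.2, 0.7, 0.1]),       # probability of each outcome
    "aggressive": np.array([0.5, 0.1, 0.4]),
}

for name, probs in alternatives.items():
    print(name, "expected gain:", float(probs @ outcomes))
```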

Vagueness (fuzzy uncertainty)

Any choice problem is a purposeful narrowing of the set of alternatives. Both the formal description of the alternatives (the list itself, the list of their attributes or parameters) and the description of the rules for comparing them (criteria, relations) are always given in terms of one or another measurement scale (even when the person doing this is unaware of it).

It is known that all scales are blurred, but to varying degrees. "Blurredness" is the property of a scale whereby it is always possible to present two alternatives that are distinguishable (different) on one scale and indistinguishable (identical) on another, more blurred one. The fewer gradations a scale has, the more blurred it is.

Thus, we can clearly see the alternatives and at the same time vaguely classify them, i.e. be ambiguous as to which classes they belong to.

Already in their first work on decision making in a fuzzy situation, Bellman and Zadeh put forward the idea that both goals and constraints should be represented as fuzzy sets on the set of alternatives.
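A minimal sketch in the spirit of that idea: the goal and the constraint are given as fuzzy membership values over the alternatives (hypothetical numbers), the fuzzy decision is their pointwise minimum, and the alternative maximizing it is chosen:

```python
# Fuzzy decision = intersection (pointwise minimum) of the fuzzy goal and the
# fuzzy constraint; choose the alternative with the highest membership.
import numpy as np

alternatives = ["x1", "x2", "x3", "x4"]
goal       = np.array([0.2, 0.9, 0.7, 0.4])   # membership in the fuzzy goal
constraint = np.array([0.8, 0.5, 0.6, 0.9])   # membership in the fuzzy constraint

decision = np.minimum(goal, constraint)
best = alternatives[int(np.argmax(decision))]
print(decision, "choose:", best)
```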

On some limitations of the optimization approach

In all the selection problems and decision-making methods considered above, the problem was to find the best ones in the initial set under given conditions, i.e. optimal alternatives in a certain sense.

The idea of optimality is the central idea of cybernetics and has firmly entered the practice of designing and operating technical systems. At the same time, this idea requires a cautious attitude when we try to transfer it to the field of managing complex, large and weakly determined systems, such as, for example, socio-economic systems.

There are good reasons for this conclusion. Let's consider some of them:

  1. The optimal solution often turns out to be unstable, i.e. minor changes in the conditions of the problem, input data or constraints can lead to the selection of significantly different alternatives.
  2. Optimization models are developed only for narrow classes of fairly simple tasks, which do not always adequately and systematically reflect real control objects. Most often, optimization methods make it possible to optimize only fairly simple and well-formalized subsystems of large and complex systems, i.e. they allow only local optimization. However, if each subsystem of a large system works optimally, this does not at all mean that the system as a whole will also work optimally. Therefore, optimizing a subsystem does not necessarily produce the behaviour required of it when the system as a whole is optimized. Moreover, local optimization can sometimes have negative consequences for the system as a whole. Therefore, when optimizing subsystems and the system as a whole, it is necessary to determine the tree of goals and subgoals and their priorities.
  3. Often, maximizing the optimization criterion according to some mathematical model is considered to be the goal of optimization, but in reality the goal is to optimize the control object. Optimization criteria and mathematical models are always related to the goal only indirectly, i.e. more or less adequate, but always approximate.

Thus, the idea of optimality, which is extremely fruitful for systems that lend themselves to adequate mathematical formalization, must be transferred to complex systems with caution. Of course, mathematical models that can sometimes be proposed for such systems can be optimized. However, one should always take into account the strong simplification of these models, which in the case of complex systems can no longer be neglected, as well as the fact that the degree of adequacy of these models in the case of complex systems is actually unknown. Therefore, it is not known what purely practical significance this optimization has. The high practicality of optimization in technical systems should not give rise to the illusion that it will be just as effective in optimizing complex systems. Meaningful mathematical modeling of complex systems is very difficult, approximate and inaccurate. The more complex the system, the more careful one should be about the idea of its optimization.

Therefore, when developing control methods for complex, large, weakly determined systems, the authors consider the main thing not only the optimality of the chosen approach from a formal mathematical point of view, but also its adequacy to the goal and the very nature of the control object.

Expert Selection Methods

In the study of complex systems, problems often arise that, for various reasons, cannot be rigorously posed and solved using the currently developed mathematical apparatus. In these cases, the services of experts (system analysts) are resorted to, whose experience and intuition help to reduce the complexity of the problem.

However, it must be taken into account that experts themselves are highly complex systems, and their activities also depend on many external and internal conditions. Therefore, in the methods of organizing expert assessments, much attention is paid to creating favorable external and psychological conditions for the work of experts.

The following factors influence the work of an expert:

  • responsibility for the use of the results of the examination;
  • knowing that other experts are involved;
  • availability of information contact between experts;
  • interpersonal relations of experts (if there is information contact between them);
  • personal interest of the expert in the results of the assessment;
  • personal qualities of experts (self-esteem, conformity, will, etc.)

Interaction between experts can either stimulate or inhibit their activity. Therefore, in different cases, different methods of examination are used, which differ in the nature of the interaction of experts with each other: anonymous and open surveys and questionnaires, meetings, discussions, business games, brainstorming, etc.

There are various methods of mathematical processing of expert opinions. Experts are asked to evaluate various alternatives either by one or by a system of indicators. In addition, they are asked to evaluate the degree of importance of each indicator (its "weight" or "contribution"). The experts themselves are also assigned a level of competence corresponding to the contribution of each of them to the resulting opinion of the group.
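A minimal sketch of such processing, with hypothetical scores, indicator weights and expert competence coefficients:

```python
# Aggregating expert scores: each expert rates each alternative on each indicator,
# indicators have weights, experts have competence coefficients (all values made up).
import numpy as np

# scores[e, a, i]: score given by expert e to alternative a on indicator i
scores = np.array([
    [[7, 5], [6, 8]],     # expert 0
    [[8, 4], [5, 9]],     # expert 1
    [[6, 6], [7, 7]],     # expert 2
], dtype=float)

indicator_weights = np.array([0.6, 0.4])      # "weight" of each indicator
competence = np.array([0.5, 0.3, 0.2])        # competence of each expert

per_expert = scores @ indicator_weights       # weighted score per expert and alternative
group_score = competence @ per_expert         # competence-weighted group opinion
print(group_score, "best alternative:", int(np.argmax(group_score)))
```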

A developed method of working with experts is the "Delphi" method. The main idea of this method is that criticism and argumentation have a beneficial effect on the expert if his self-esteem is not affected and conditions are provided that exclude personal confrontation.

It should be emphasized that there is a fundamental difference in how expert methods are used in expert systems and in decision support: in the first case experts are required to formalize the way decisions are made, while in the second only the decision itself is required, as such.

Since experts are involved in the implementation of precisely those functions that are currently either not provided by automated systems at all, or are performed worse than by humans, a promising direction in the development of automated systems is the maximum automation of these functions.

Automated decision support systems

A person has always used assistants in making decisions: both simple providers of information about the control object and consultants (advisers) who propose decision options and analyze their consequences. The decision maker has always made decisions in a certain information environment: for a military commander it is the headquarters, for a rector the academic council, for a minister the collegium.

In our time, the decision-making information infrastructure is unthinkable without automated systems for evaluating decisions, and especially without decision support systems (DSS, Decision Support Systems), i.e. automated systems specifically designed to prepare the information a person needs in order to make a decision. The development of decision support systems is carried out, in particular, within the framework of an international project under the auspices of the International Institute for Applied Systems Analysis in Laxenburg (Austria).

The choice in real situations requires the performance of a number of operations, some of which are performed more efficiently by a person, and others by a machine. Effective combination of their advantages with simultaneous compensation of shortcomings is embodied in automated decision support systems.

A person makes decisions better than a machine under conditions of uncertainty, but to make the right decision he also needs adequate (complete and reliable) information characterizing the subject area. However, it is known that a person copes poorly with large volumes of "raw", unprocessed information. Therefore, the machine's role in decision support may be to pre-process information about the control object and uncontrollable factors (the environment), to help examine the consequences of particular decisions, and to present all this information in a form that is visual and convenient for decision making.

Thus, automated decision support systems compensate for the weaknesses of a person, freeing him from the routine preliminary processing of information, and provide him with a comfortable information environment in which he can better show his strengths. These systems are focused not on automating the functions of the decision maker (and, as a result, alienating these functions from him, and hence responsibility for the decisions made, which is often generally unacceptable), but on providing him with assistance in finding a good solution.


For a long time I wanted to write a general article containing the very basics of image recognition, a kind of guide to the basic methods, telling when to apply them, what tasks they solve, what can be hacked together in an evening, and what is better not even to think about without a team of twenty people.

I have been writing articles on optical recognition for a long time, so a couple of times a month various people write to me with questions on the topic. Sometimes you get the feeling that you live in different worlds. On the one hand, you understand that the person is most likely a professional in a related field, but knows very little about optical recognition methods. And the most annoying thing is that he tries to apply a method from a neighbouring field of knowledge, which is logical but does not quite work in image recognition, does not understand this, and is very offended if you start telling him something from the very basics. And given that explaining from the basics takes a lot of time, which is often not available, it all becomes even sadder.

This article is designed so that a person who has never dealt with image recognition methods can, within 10-15 minutes, create in his head a certain basic picture of the world corresponding to the topic, and understand in which direction he should dig. Many of the methods described here are applicable to radar and audio processing.
I'll start with a couple of principles that we always begin by telling a potential customer, or a person who wants to start doing optical recognition:

  • When solving a problem, always start from the simplest approach. It is much easier to hang an orange tag on a person than to track the person by picking them out with cascade detectors. It is much easier to take a camera with a higher resolution than to develop a super-resolution algorithm.
  • A strict problem statement is orders of magnitude more important in optical recognition than in systems programming: one extra word in the spec can add 50% to the work.
  • There are no universal solutions in recognition problems. You cannot make an algorithm that will simply "recognize any inscription". A sign on the street and a sheet of text are fundamentally different objects. It is probably possible to make a general algorithm (a good example from Google), but it will require enormous work by a large team and consist of dozens of different subroutines.
  • OpenCV is a bible: it has many methods and can cover 50% of the volume of almost any problem, but OpenCV is only a small part of what can actually be done. In one study the conclusions stated: "The problem is not solved by OpenCV methods, therefore it is unsolvable." Try to avoid this; don't be lazy, and soberly evaluate each new task from scratch, without relying on OpenCV templates.
It is very difficult to give universal advice or to describe a structure around which a solution to arbitrary computer vision problems can be built. The purpose of this article is to structure what can be used. I will try to break the existing methods into three groups. The first group is pre-filtering and image preparation. The second group is the logical processing of the filtering results. The third group is decision-making algorithms based on that logical processing. The boundaries between the groups are very arbitrary. To solve a problem it is far from always necessary to apply methods from all groups; sometimes two are enough, and sometimes even one.

The list of methods presented here is not complete. I propose to add in the comments critical methods that I did not write and attribute 2-3 accompanying words to each.

Part 1. Filtering

In this group, I placed methods that allow you to select areas of interest in images without analyzing them. Most of these methods apply some kind of uniform transformation to all points in the image. At the filtering level, the image is not analyzed, but the points that are filtered can be considered as areas with special characteristics.
Threshold binarization, histogram area selection
The simplest transformation is the binarization of the image by the threshold. For RGB and grayscale images, the threshold is the color value. There are ideal problems in which such a transformation is sufficient. Suppose you want to automatically select items on a white sheet of paper:




The choice of the threshold by which binarization takes place largely determines the result. In this case, the image was binarized by the average colour. Usually binarization is done with an algorithm that adaptively selects a threshold: such an algorithm can choose the expected value or the mode of the histogram, or its largest peak.

Binarization can give very interesting results when working with histograms, including the situation when we consider an image not in RGB but in HSV: for example, to segment colours of interest. On this principle one can build both a tag detector and a human skin detector.
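A minimal OpenCV sketch of both ideas; the file name is a placeholder and the HSV range is a rough, untuned skin-tone-like range chosen purely for illustration:

```python
# Threshold binarization (Otsu) and HSV colour segmentation with OpenCV.
import cv2

img = cv2.imread("sheet.png")                     # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Adaptive choice of the global threshold via Otsu's method.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Histogram-style colour selection in HSV: keep only pixels inside a colour range.
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))

cv2.imwrite("binary.png", binary)
cv2.imwrite("mask.png", mask)
```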
Classical filtering: Fourier, LPF, HPF
Classical filtering methods from radar and signal processing can be successfully applied to many pattern recognition tasks. The traditional method in radar, almost never used on images in its pure form, is the Fourier transform (more precisely, the FFT). One of the few exceptions where the one-dimensional Fourier transform is used is image compression. For image analysis a one-dimensional transform is usually not enough; a much more resource-intensive two-dimensional transform is needed.

Few people actually compute it; usually it is much faster and simpler to convolve the region of interest with a ready-made filter tuned to high (HPF) or low (LPF) frequencies. Such a method, of course, does not allow spectrum analysis, but in a specific video-processing task it is usually a result that is needed, not an analysis.


The simplest examples of filters that emphasize low frequencies (Gaussian filter) and high frequencies (Gabor filter).
For each image point, a window is selected and multiplied with a filter of the same size. The result of such a convolution is the new value of the point. When implementing LPF and HPF, images of this type are obtained:
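A minimal OpenCV sketch of LPF and HPF by convolution with ready-made kernels; the image path and kernel parameters are placeholders:

```python
# Low-pass and high-pass filtering by convolution.
import cv2
import numpy as np

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # placeholder path

low_pass = cv2.GaussianBlur(gray, (9, 9), sigmaX=2.0)  # suppresses high frequencies

laplacian_kernel = np.array([[0,  1, 0],
                             [1, -4, 1],
                             [0,  1, 0]], dtype=np.float32)
high_pass = cv2.filter2D(gray, cv2.CV_32F, laplacian_kernel)  # emphasizes edges

cv2.imwrite("lpf.png", low_pass)
cv2.imwrite("hpf.png", cv2.convertScaleAbs(high_pass))
```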



Wavelets
But what if we use some arbitrary characteristic function for convolution with the signal? Then it will be called a "wavelet transform". This definition of wavelets is not strictly correct, but traditionally in many teams wavelet analysis means the search for an arbitrary pattern in an image by convolution with a model of this pattern. There is a set of classical functions used in wavelet analysis, including the Haar wavelet, the Morlet wavelet, the Mexican hat wavelet, and so on. The Haar primitives, which several of my previous articles discussed, are such functions for two-dimensional space.


Above are 4 examples of classical wavelets. 3D Haar wavelet, 2D Meyer wavelet, Mexican Hat wavelet, Daubechies wavelet. A good example of using the extended interpretation of wavelets is the problem of finding a glint in the eye, for which the glint itself is a wavelet:

Classical wavelets are usually used for , or for their classification (to be described below).
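A minimal sketch of this extended interpretation: searching for a bright blob-like pattern (such as a glint) by convolving the image with a hand-built Mexican-hat-like kernel; the path, kernel size and sigma are illustrative assumptions:

```python
# Pattern search by convolution with a Mexican-hat-like (Laplacian-of-Gaussian) kernel.
import cv2
import numpy as np

def mexican_hat(size=15, sigma=2.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = (xx**2 + yy**2) / (2 * sigma**2)
    kernel = (1 - r2) * np.exp(-r2)          # "Mexican hat" shape
    return (kernel - kernel.mean()).astype(np.float32)

gray = cv2.imread("eye.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # placeholder
response = cv2.filter2D(gray, cv2.CV_32F, mexican_hat())
y, x = np.unravel_index(np.argmax(response), response.shape)
print("strongest response (candidate glint) at:", (x, y))
```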
Correlation
After such a liberal interpretation of wavelets on my part, it is worth mentioning the actual correlation that underlies them. It is an indispensable tool when filtering images. A classic application is correlating a video stream to find shifts or optical flow. The simplest shift detector is also, in a sense, a difference correlator. Where the images do not correlate, there was movement.
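A minimal sketch of shift estimation between two frames by correlation in the frequency domain (phase correlation); the frame paths are placeholders:

```python
# Shift detection between two frames via phase correlation (FFT-based).
import cv2
import numpy as np

f1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # placeholders
f2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
cross_power = F1 * np.conj(F2)
cross_power /= np.abs(cross_power) + 1e-9
corr = np.fft.ifft2(cross_power).real            # correlation surface

dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
if dy > f1.shape[0] // 2: dy -= f1.shape[0]      # wrap large indices to negative shifts
if dx > f1.shape[1] // 2: dx -= f1.shape[1]
print("estimated shift (dx, dy):", (dx, dy))
```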

Function filtering
An interesting class of filters is the filtering of functions. These are purely mathematical filters that allow a simple mathematical function (a line, parabola, circle) to be detected in an image. An accumulator image is built, in which for each point of the original image the set of functions generating it is drawn. The most classical transformation is the Hough transform for lines: for each point (x;y), the set of points (a;b) of the line y=ax+b for which the equality holds is drawn. The result is beautiful pictures:


(the first plus to whoever first finds the catch in the picture and in this definition and explains it; the second plus to whoever first says what is shown here)
The Hough transform allows any parameterizable functions to be found, for example circles. There is a modified transformation that allows any shape to be searched for. Mathematicians are terribly fond of this transformation. But in image processing it, unfortunately, does not always work: very slow, very sensitive to the quality of binarization. Even in ideal situations I preferred to get by with other methods.
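A minimal OpenCV sketch of the Hough transform for lines on a pre-binarized edge map; the path and thresholds are illustrative:

```python
# Line detection with the probabilistic Hough transform.
import cv2
import numpy as np

gray = cv2.imread("road.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
edges = cv2.Canny(gray, 50, 150)

lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=80, minLineLength=30, maxLineGap=5)
out = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
    cv2.line(out, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)
cv2.imwrite("hough_lines.png", out)
```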
The counterpart of the Hough transform for lines is the Radon transform. It is calculated through the FFT, which gives a performance gain in a situation where there are a lot of points. In addition, it can be applied to a non-binarized image.
Contour filtering
A separate class of filters is border and contour filtering. Contours are very useful when we want to move from working with an image to working with the objects in it. When an object is quite complex but well distinguished, often the only way to work with it is to extract its contour. There are a number of algorithms that solve the contour-filtering problem:

The most commonly used is the Canny detector, which works well and has an implementation in OpenCV (Sobel is there too, but it finds contours worse).
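A minimal OpenCV sketch of Canny edge filtering, with a Sobel gradient for comparison; the path and thresholds are illustrative:

```python
# Edge filtering with Canny and, for comparison, a Sobel gradient.
import cv2

gray = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)   # placeholder path

edges = cv2.Canny(gray, threshold1=50, threshold2=150)
sobel_x = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)

cv2.imwrite("canny.png", edges)
cv2.imwrite("sobel_x.png", cv2.convertScaleAbs(sobel_x))
```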



Other filters
Above are the filters whose modifications help to solve 80-90% of tasks. Besides them, there are rarer filters used in local tasks. There are dozens of such filters and I will not list them all. Of interest are iterative filters, as well as the ridgelet and curvelet transforms, which are a fusion of classical wavelet filtering and analysis in the Radon transform domain. The beamlet transform works beautifully on the border between wavelet transforms and logical analysis, allowing contours to be highlighted:

But these transformations are very specific and tailored for rare tasks.

Part 2. Logical processing of filtering results

Filtering gives a set of data suitable for processing. But often you cannot simply take this data and use it without further processing. This section covers several classical methods that allow you to go from an image to the properties of objects, or to the objects themselves.
Morphology
In my opinion, the transition from filtering to logic is marked by the methods of mathematical morphology. Essentially these are the simplest operations of dilating and eroding binary images. They allow noise to be removed from a binary image by growing or shrinking the existing elements. There are contouring algorithms based on mathematical morphology, but usually some kind of hybrid algorithms, or algorithms used in combination, are employed.
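A minimal OpenCV sketch of morphological noise removal on a binary image (erosion, dilation and opening); the path is a placeholder:

```python
# Basic mathematical morphology on a binary image.
import cv2

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)        # placeholder path
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))

eroded  = cv2.erode(binary, kernel)                             # shrink elements, drop specks
dilated = cv2.dilate(binary, kernel)                            # grow elements, fill small holes
opened  = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)      # erosion followed by dilation
cv2.imwrite("opened.png", opened)
```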
Contour analysis
Algorithms for obtaining boundaries were already mentioned in the section on filtering. The resulting boundaries are quite simply converted into contours. For the Canny algorithm this happens automatically; other algorithms require additional binarization. A contour can be obtained from a binary image, for example, with the beetle (border-following) algorithm.
The contour is a unique characteristic of an object. Often this allows an object to be identified by its contour. There is a powerful mathematical apparatus that allows this to be done, called contour analysis.

To be honest, I have never managed to apply contour analysis in real problems. Too ideal conditions are required: either there is no border, or there is too much noise. But if you need to recognize something under ideal conditions, contour analysis is a great option: it works very fast, with beautiful mathematics and understandable logic.
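A minimal OpenCV sketch of extracting contours and comparing them with a reference shape via Hu moments; the paths are placeholders and the reference image is assumed to contain exactly one object:

```python
# Contour extraction and shape comparison.
import cv2

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)             # placeholder paths
template = cv2.imread("template_binary.png", cv2.IMREAD_GRAYSCALE)

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
ref, _ = cv2.findContours(template, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Lower matchShapes value = more similar contours (comparison via Hu moments).
best = min(contours, key=lambda c: cv2.matchShapes(c, ref[0], cv2.CONTOURS_MATCH_I1, 0))
print("best match score:", cv2.matchShapes(best, ref[0], cv2.CONTOURS_MATCH_I1, 0))
```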
Singular points
Keypoints are unique characteristics of an object that allow the object to be matched with itself or with similar object classes. There are dozens of ways to select such points. Some methods find points that are stable across neighbouring frames, some across long time spans and lighting changes, and some find points that remain stable even when the object rotates. Let us start with methods that find points that are less stable but quickly computed, and then proceed in order of increasing complexity:
First class. Singular points that are stable over seconds. Such points are used to track an object between adjacent video frames or to register images from neighbouring cameras. They include local maxima of the image, corners in the image (the best of the detectors is perhaps the Harris detector), points at which maxima of variance are reached, certain gradients, etc.
Second class. Singular points that are stable under lighting changes and small movements of the object. Such points serve primarily for training and the subsequent classification of object types. For example, a pedestrian classifier or a face classifier is the product of a system built on exactly such points. Some of the previously mentioned wavelets may serve as the basis for such points: Haar primitives, glint search, search for other specific features. These points include those found by the method of histograms of oriented gradients (HOG).
Third class. Stable points. I know of only two methods that give complete stability, and of their modifications: SIFT and SURF. They allow key points to be found even when the image is rotated. Computing such points takes longer than the other methods, but within reasonable time. Unfortunately, these methods are patented. Although, in Russia it is impossible to patent algorithms, so use them for the domestic market.
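A minimal sketch of first-class points: Harris-based corner detection with OpenCV; the path and parameters are illustrative:

```python
# Corner (keypoint) detection with the Harris-based detector behind goodFeaturesToTrack.
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01,
                                  minDistance=10, useHarrisDetector=True)
print("found", 0 if corners is None else len(corners), "corner points")
```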

Part 3. Training

The third part of the story will be devoted to methods that do not work with the image directly, but that allow decisions to be made. These are mainly various methods of machine learning and decision making. Recently Yandex posted a lecture course on Habr on this topic; it is a very good selection, and there is a text version. For a serious study of the subject I strongly recommend looking at it. Here I will try to outline several basic methods used specifically in pattern recognition.
In 80% of situations, the essence of learning in the recognition problem is as follows:
There is a training sample containing several classes of objects, say the presence or absence of a person in a photo. For each image there is a set of features extracted by some feature detector, be it Haar, HOG, SURF or some wavelet. The learning algorithm must build a model from which it will be able to analyze a new image and decide which of the objects is in it.
How is it done? Each of the training images is a point in feature space; its coordinates are the weights of each feature in the image. Let our features be: "presence of eyes", "presence of a nose", "presence of two hands", "presence of ears", etc. We extract all these features with the detectors we have, trained on body parts similar to human ones. For a person, the correct point in such a space has all these features present; a monkey or a horse gives other combinations. The classifier is trained on a sample of examples. But not all the photographs showed hands, in others there were no eyes, and in a third the monkey got a human nose because of a detector error. The trainable human classifier automatically partitions the feature space so as to say: if the features fall within certain ranges, then this is a person. In essence, the purpose of the classifier is to draw in feature space the regions characteristic of the objects being classified. This is what the successive approximation to the answer looks like for one of the classifiers (AdaBoost) in two-dimensional space:


There are many classifiers. Each of them works better in some of its tasks. The task of selecting a classifier for a specific task is largely an art. Here are some nice pictures on the topic.
Simple case, one-dimensional separation
Let us take the simplest case of classification, when the feature space is one-dimensional and we need to separate two classes. The situation occurs more often than it might seem: for example, when you need to distinguish two signals or compare a pattern with a sample. Suppose we have a training sample. We get an image where the X axis is the measure of similarity and the Y axis is the number of events with that measure. When the desired object is similar to the sample, the left Gaussian is obtained; when it is not, the right one. The value X=0.4 separates the samples so that the probability of making a wrong decision is minimized. The search for such a separator is precisely the task of classification.
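A minimal sketch of this one-dimensional case on synthetic data: two roughly Gaussian classes and a brute-force search for the threshold that minimizes the total number of errors:

```python
# One-dimensional two-class separation: find the threshold minimizing total error.
import numpy as np

rng = np.random.default_rng(1)
genuine  = rng.normal(0.2, 0.08, 1000)   # "left Gaussian" class
impostor = rng.normal(0.6, 0.08, 1000)   # "right Gaussian" class

thresholds = np.linspace(0, 1, 501)
errors = [(genuine > t).sum() + (impostor <= t).sum() for t in thresholds]
best_t = thresholds[int(np.argmin(errors))]
print("threshold minimizing total error:", round(float(best_t), 3))
```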


A small note. The criterion that minimizes the error will not always be optimal. The following graph is from an actual iris recognition system. For such a system, the criterion is chosen so as to minimize the probability of falsely admitting an outsider to the object. Such a probability is called a "type I error", "probability of false alarm" or "false positive"; in the English literature, "False Acceptance Rate".
AdaBoost is one of the most common classifiers; the Haar cascade, for example, is built on it. It is usually used when binary classification is needed, but nothing prevents training it for more classes.
SVM is one of the most powerful classifiers, with many implementations. In principle, on the learning tasks I encountered it worked similarly to AdaBoost. It is considered quite fast, but its training is more difficult than AdaBoost's and requires choosing the right kernel.
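A minimal scikit-learn sketch comparing the two on a synthetic two-class problem; the features merely stand in for Haar/HOG-style descriptors:

```python
# AdaBoost vs. SVM on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for clf in (AdaBoostClassifier(n_estimators=100), SVC(kernel="rbf", C=1.0)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, "accuracy:", round(clf.score(X_te, y_te), 3))
```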

There are also neural networks and regression. But to briefly classify them and show how they differ, an article much larger than this is needed.
________________________________________________
I hope I was able to give a quick overview of the methods used without diving into the mathematics and descriptions. Maybe it will help someone. Although, of course, the article is incomplete: there is not a word about working with stereo images, about least squares with the Kalman filter, or about the adaptive Bayesian approach.
If you like the article, I will try to make a second part with a selection of examples of how existing image recognition problems are solved.

And finally

What to read?
1) At one time I really liked the book "Digital Image Processing" by B. Jähne, which is written simply and clearly, yet gives almost all the mathematics. Good for getting acquainted with existing methods.
2) The classic of the genre is R. Gonzalez and R. Woods, "Digital Image Processing". For some reason it was harder going for me than the first one. Much less mathematics, but more methods and pictures.
3) "Image Processing and Analysis in Machine Vision Problems" is written on the basis of a course taught at one of the departments of PhysTech. There are a lot of methods with detailed descriptions. But in my opinion the book has two big minuses: it is strongly tied to the software package supplied with it, and too often the description of a simple method turns into a mathematical jungle from which it is difficult to extract the structural diagram of the method. But the authors have made a convenient site where almost all the content is presented: wiki.technicalvision.ru

Iteration method. In this method, a comparison is made with a certain database in which, for each object, various modifications of its appearance are stored. For example, for optical image recognition you can iterate over different angles, scales, offsets, deformations, etc.; for letters, over the font or its properties. In the case of sound pattern recognition, the comparison is with known patterns (for example, a word spoken by many people). Next, a deeper analysis of the image characteristics is performed: in optical recognition this may be the determination of geometric characteristics, while a sound sample is subjected to frequency and amplitude analysis.

The next method is the use of artificial neural networks (ANNs). It requires either a huge number of examples of the recognition task, or a special neural network structure that takes the specifics of the task into account. Nevertheless, this method is characterized by high efficiency and productivity.

Methods based on estimates of the distribution densities of feature values. These are borrowed from the classical theory of statistical decisions, in which the objects of study are considered as realizations of a multidimensional random variable distributed in feature space according to some law. They are based on the Bayesian decision-making scheme, which appeals to the prior probabilities of objects belonging to a particular class and to the conditional feature distribution densities.

The group of methods based on estimating the distribution densities of feature values is directly related to discriminant analysis methods. The Bayesian approach to decision making is one of the most developed parametric methods in modern statistics, for which the analytical expression of the distribution law (the normal law) is considered known and only a small number of parameters (mean vectors and covariance matrices) need to be estimated. The main difficulties of this method are considered to be the need to remember the entire training sample to compute the density estimates and high sensitivity to the training sample.
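A minimal sketch of this family of methods using quadratic discriminant analysis in scikit-learn (each class is modelled by a normal distribution whose mean vector and covariance matrix are estimated from the training sample); the data are synthetic:

```python
# Density-based (Bayesian, normal-law) classification via QDA.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

qda = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)   # estimates means and covariances
print("accuracy:", round(qda.score(X_te, y_te), 3))
```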

Methods based on assumptions about the class of decision functions. In this group, the type of the decision function is considered to be known and its quality functional is given. Based on this functional, the optimal approximation to the decision function is found from the training sequence. The decision rule quality functional is usually associated with an error. The main advantage of the method is the clarity of the mathematical formulation of the recognition problem. The possibility of extracting new knowledge about the nature of an object, in particular, knowledge about the mechanisms of interaction of attributes, is fundamentally limited here by a given structure of interaction, fixed in the chosen form of decision functions.

Prototype comparison method. This is the simplest extensional recognition method in practice. It is applied when the recognizable classes appear as compact geometric groupings in feature space. The centre of the geometric grouping (or the object closest to the centre) is then chosen as the prototype point.

To classify an unknown object, the prototype closest to it is found, and the object is assigned to the same class. Obviously, no generalized class images are formed in this method. Various types of distances can be used as the measure.

Method of k nearest neighbours. The method consists in finding, when classifying an unknown object, a given number (k) of its geometrically nearest neighbours in feature space whose class membership is already known. The decision on assigning the unknown object is made by analyzing information about these nearest neighbours. The need to reduce the number of objects in the training sample (diagnostic precedents) is a disadvantage of this method, since it reduces the representativeness of the training sample.
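A minimal scikit-learn sketch of the k nearest neighbours method on synthetic data; k and the metric are exactly the parameters the result depends on:

```python
# k nearest neighbours classification.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(X_tr, y_tr)
print("accuracy:", round(knn.score(X_te, y_te), 3))
```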

Since different recognition algorithms behave differently on the same sample, the question arises of a synthetic decision rule that would use the strengths of all the algorithms. For this purpose there are synthetic methods, or sets of decision rules, that combine the strongest aspects of each of the methods.

In conclusion of the review of recognition methods, we present the essence of the above in a summary table, adding some other methods used in practice.

Table 1. Classification of recognition methods, comparison of their areas of application and limitations

| Classification of recognition methods | Application area | Limitations (disadvantages) |
|---|---|---|
| Intensive recognition methods | | |
| Methods based on density estimates | Problems with a known (normal) distribution; the need to collect large statistics | The need to enumerate the entire training set during recognition; high sensitivity to non-representativeness of the training set and to artifacts |
| Methods based on assumptions about the class of decision functions | Classes should be well separable | The form of the decision function must be known in advance; impossibility of taking into account new knowledge about correlations between features |
| Boolean methods | Problems of small dimension | Selecting logical decision rules requires a complete enumeration; high labour intensity |
| Linguistic methods | | The task of determining the grammar for a certain set of statements (descriptions of objects) is difficult to formalize; unresolved theoretical problems |
| Extensional recognition methods | | |
| Prototype comparison method | Problems of small dimension of feature space | High dependence of the classification results on the metric; the optimal metric is unknown |
| k nearest neighbours method | | High dependence of the classification results on the metric; the need for a complete enumeration of the training sample during recognition; computational complexity |
| Estimate calculation algorithms (AVO) | Problems of small dimension in terms of the number of classes and features | Dependence of the classification results on the metric; the need for a complete enumeration of the training sample during recognition; high technical complexity of the method |
| Collective decision rules (CDR), a synthetic method | Problems of small dimension in terms of the number of classes and features | Very high technical complexity of the method; a number of unresolved theoretical problems, both in determining the areas of competence of the particular methods and in the particular methods themselves |