Biographies Characteristics Analysis

Asymptotic properties of characterization-based symmetry and agreement tests. Asymptotically optimal Criteria based on the number of cells in generalized layouts

The asymptotic behavior (or asymptotics) of a function in the vicinity of a certain point a (finite or infinite) is understood as the nature of the change of the function as its argument x tends to this point. They usually try to represent this behavior using another, simpler and studied function, which in the vicinity of point a with sufficient accuracy describes the change in the function we are interested in or evaluates its behavior from one side or another. In this regard, the problem arises of comparing the nature of the change of two functions in the vicinity of point a, associated with the consideration of their quotient. Of particular interest are cases when, for x a, both functions are either infinitesimal (infinitesimal) or infinitely large (infinitely large). 10.1. Comparison of infinitesimal functions The main purpose of comparing b.m. functions consists in comparing the nature of their approach to zero at x a, or the speed of their approach to zero. Let b.m. for x a the functions a(i) and P(x) are non-zero in some punctured neighborhood (a) of the point a, and at the point a they are equal to zero or are not defined. Definition 10.1. The functions a(x) and 0(x) are called b.m. of the same order for a and write og(a:) = in O (/?(«)) (the symbol O is read “O big”), if at x a there is a non-zero finite limit of the ratio a(x)//?( i), i.e. Obviously, then, according to (7.24), ΒΒ R\(0), and the notation X^a0[a(x)) is valid, i.e. in fact, taking into account Definition 10.1 and the property of the product of functions (see (7.23)) having finite (in this case non-zero) limits, we obtain ASYMPTOTIC BEHAVIOR OF FUNCTIONS. Definition 10.2. are called b.m. of a higher order of smallness compared to (3(x) (or relative to /3(x)) for x a and write) (the symbol o is read io small if the limit of the ratio a exists and is equal to zero. In this case also they say that the function is of a lower order of smallness compared to a(x) for x a, and the word smallness is usually omitted (as in the case of a higher order in Definition 10.2). This means that if lim (then). the function /)(x) is, according to Definition 10.2, b.m. higher order compared to a(x) for x a and a(i) are b.m. lower order compared to /3(x) for x a, because in this case lijTi (fi(x)/ot(x)) . So we can write According to Theorem 7.3 about the connection between a function, its limit and b.m. functions from (10.3) it follows that ot) is a function, b.m. at. Hence a(x), i.e. values ​​|a(z)| for x close to a, much less than the values ​​of \0(x)\. In other words, the function a(x) tends to zero faster than the function /?(x). Theorem 10.1. The product of any b.m. for x a the functions a(x) and P(x)) are different from zero in some punctured neighborhood of the point a, there are for x-¥a b.m. a function of higher order compared to each of the factors. Indeed, according to the definition of 10.2 b.m. higher order (taking into account Definition 7.10 b.m. functions), equalities mean the validity of the theorem. Equalities containing the symbols O and o are sometimes called asymptotic estimates. Definition 10.3. The functions ot(x) and /3(x) are called incomparable b.m. for x -¥ a, if there is neither a finite nor an infinite limit to their ratio, i.e. if $ lim a(x)/0(x) (p £ as well as $ lim 0(x)/a(x)). Example 10.1. A. The functions a(x) = x and /?(x) = sin2ar by definition 10.1 - b.m. of the same order at x 0, since taking into account (b. The function a(x) = 1 -coss, by definition 10.2, is b.m. of a higher order compared to 0(x) = x at x 0, since with taking into account c. The function a(z) = \/x is of a lower order compared to fl(x) = x for x 0, since g. The functions a(s) = = x according to Definition 10.3 are incomparable b.m. at x 0, since the limit ASYMPTOTIC BEHAVIOR OF FUNCTIONS. Comparison of infinitesimal functions does not exist (neither finite nor infinite - see example 7.5). The power function x11 with exponent n 6 N, n > 1, is at x a b.m. of a higher order compared to xn~1) i.e. yapa = ao(a:n"*1), since lim (xL/xn"1) = If a more accurate comparative description of the behavior of b.m. is necessary. functions for x - and one of them is chosen as a kind of standard and called the main one. Of course, the choice of the main b.m. to a certain extent arbitrary (they try to choose a simpler one: x for x -*0; x-1 for x -41; 1/x for x ->oo, etc.). From the degrees 0k(x) the main b.m. functions /)(x) with different exponents k > 0 (for k ^ 0 0k(x) is not a b.m.) constitute a comparison tie for estimating a more complex b.m. functions a(z). Definition 10.4. The function a(z) is called b.m. kth order of smallness relative to (3(x) for x a, and the number k is of order of smallness if the functions a(z) and /Zk(x) are of the same order for x a) i.e. if the word “smallness” is usually omitted in this case. Note: 1) the order k of one b.m. function relative to another can be any positive number; 2) if the order of the function a(x) relative to /3(x) is equal to k, then the order of the function P(x) relative to a(x) is equal to 1/k; 3) not always for the b.m. function a(x), even comparable to all powers of /? *(x), you can specify a specific order k. Example 10.2. A. Function cosx, according to definition 10.4, - b.m. order k = 2 relative to 0(x) = x for x 0, since taking into account b. Let's look at the functions. Let us show that for any Indeed, according to (7.32). Thus, b.m. for x -»+0 the function a1/1 is comparable to xk for any k > 0, but for this function it is not possible to indicate the order of smallness with respect to x. # Determine the order of one b.m. functions relative to another is not always easy. We can recommend the following procedure: 1) write the relation a(x)/0k(x) under the limit sign; 2) analyze the written relation and try to simplify it; 3) based on known results, make an assumption about the possible value of k) at which a non-zero finite limit will exist; 4) check the assumption by calculating the limit. Example 10.3. Let us determine the order of b.m. functions tgx - sin x relative to x for x -» 0, i.e. Let's find a number k > O such that we have ASYMPTOTIC BEHAVIOR OF THE FUNCTIONS. Comparison of infinitesimal functions. At this stage, knowing that for x 0, according to (7.35) and (7.36), (sinx)/x 1 and cosx -> 1, and taking into account (7.23) and (7.33), we can determine that condition (10.7) will be fulfilled at k = 3. Indeed, direct calculation of the limit at k = 3 gives the value A = 1/2: Note that for k > 3 we obtain an infinite limit, and at the limit it will be equal to zero.

1 Entropy and information distance

1.1 Basic definitions and notations.

1.2 Entropy of discrete distributions with limited mathematical expectation.

1.3 Logarithmic generalized metric on a set of discrete distributions.

1.4 Compactness of functions with a countable set of arguments

1.5 Continuity of information distance Kullback - Leibler - Sanov

1.6 Conclusions.

2 Probabilities of large deviations

2.1 Probabilities of large deviations of functions from the number of cells with a given filling.

2.1.1 Local limit theorem.

2.1.2 Integral limit theorem.

2.1.3 Information distance and probabilities of large deviations of separable statistics

2.2 Probabilities of large deviations of separable statistics that do not satisfy the Cramer condition.

2.3 Conclusions.

3 Asymptotic properties of goodness-of-fit criteria

3.1 Consent criteria for selection without return scheme

3.2 Asymptotic relative efficiency of goodness-of-fit criteria.

3.3 Criteria based on the number of cells in general layouts.

3.4 Conclusions.

Recommended list of dissertations

  • Asymptotic efficiency of goodness-of-fit tests based on characterization properties of distributions 2011, Candidate of Physical and Mathematical Sciences Volkova, Ksenia Yurievna

  • Large deviations and limit theorems for some random walk functionals 2011, candidate of physical and mathematical sciences Shklyaev, Alexander Viktorovich

  • Limit theorems and large deviations for random walk increments 2004, candidate of physical and mathematical sciences Kozlov, Andrey Mikhailovich

  • On the rate of convergence of statistics of goodness-of-fit tests with power measures of divergence to the chi-square distribution 2010, candidate of physical and mathematical sciences Zubov, Vasily Nikolaevich

  • Probabilities of large deviations of asymptotically homogeneous ergodic Markov chains in space 2004, Doctor of Physical and Mathematical Sciences Korshunov, Dmitry Alekseevich

Introduction of the dissertation (part of the abstract) on the topic “Asymptotic properties of goodness-of-fit criteria for testing hypotheses in a selection scheme without returning, based on filling cells in a generalized placement scheme”

Object of research and relevance of the topic. In the theory of statistical analysis of discrete sequences, a special place is occupied by goodness-of-fit tests for testing a possibly complex null hypothesis, which is that for a random sequence such that

Xi e hi,i = 1, ,n, where hi = (0,1,. ,M), for any i = 1,.,n, and for any k £ 1m the probability of the event

Xi = k) does not depend on r. This means that the sequence is in some sense stationary.

In a number of applied problems, the sequence (Xr-)™ = 1 is considered to be a sequence of colors of balls when choosing without returning until exhaustion from an urn containing n - 1 > 0 balls of color k, k € 1m - We will denote the set of such selections O(n0 - 1, .,pm - 1). Let there be a total of n - 1 balls in the urn, m k=0

Let us denote by r(k) (fc) Jk) rw - Г! , . . . , sequence of numbers of balls of color A; in the sample. Consider the sequence where k)

Kk-p-GPk1.

The sequence h^ is defined using the distances between the locations of adjacent balls of color k in such a way that

Pk Kf = p. 1>=1

The set of sequences h(fc) for all k £ 1m uniquely determines the sequence. Sequences hk for different k are dependent on each other. In particular, any of them is uniquely determined by all the others. If the cardinality of the set 1m is 2, then the sequence of colors of balls is uniquely determined by the sequence of distances between the places of neighboring balls of the same fixed color. Let there be N - 1 balls of color 0 in an urn containing n - 1 balls of two different colors. We can establish a one-to-one correspondence between the set ffl(N- l,n - N) and the set 9\n,N vectors h(n, N ) = (hi,., hjf) with positive integer components such that K = P. (0.1)

The set 9П)дг corresponds to the set of all different partitions of a positive integer n into N ordered terms.

Having specified a certain probability distribution on the set of vectors £Hn,dr, we obtain the corresponding probability distribution on the set Wl(N - 1,n - N). A set is a subset of a set of vectors with non-negative integer components satisfying (0.1). Distributions of the form will be considered as probability distributions on a set of vectors in the dissertation work

P(%,N) = (n,.,rN)) = P(£„ = ru,v = l,.,N\jr^ = n), (0.2) where. ,£dr - independent non-negative integer random variables.

Distributions of the form (0.2) in /24/ are called generalized schemes for placing n particles in N cells. In particular, if the random variables £b. ,£lg in (0.2) are distributed according to Poisson’s laws with parameters Ai,., Ldr respectively, then the vector h(n,N) has a polynomial distribution with the probabilities of outcomes

Ri = . , L" ,V = \,.,N.

L\ + . . . + AN

If the random variables £ь >&v in (0-2) are identically distributed according to the geometric law where p is any in the interval 0< р < 1, то, как отмечено в /25/,/26/, получающаяся обобщенная схема размещения соответствует равномерному распределению на множестве В силу взаимнооднозначного соответствия между множеством dft(N - 1 ,п - N) и множеством tRn,N получаем равномерное распределение на множестве выборов без возвращения. При этом, вектору расстояний между местами шаров одного цвета взаимно однозначно соответствует вектор частот в обобщенной схеме размещения, и, соответственно, числу расстояний длины г - число ячеек, содержащих ровно г частиц. Для проверки по единственной последовательности гипотезы о том, что она получена как результат выбора без возвращения, и каждая такая выборка имеет одну и ту же вероятность можно проверить гипотезу о том, что вектор расстояний между местами шаров цвета 0 распределен как вектор частот в соответствующей обобщенной схеме размещения п частиц по N ячейкам.

As noted in /14/, /38/, a special place in testing hypotheses about the distribution of frequency vectors h(n, N) = (hi,., /gdr) in generalized schemes for placing n particles in N cells is occupied by criteria based on based on statistics of the form 1 m(N -l,n-N)\ N

LN(h(n,N))=Zfv(hv)

Фн = Ф(-Т7, flQ Hi II-

0.4) where fu, v = 1,2,. and φ - some real-valued functions, N

Mr = E = r), r = 0.1,. 1/=1

The quantities in /27/ were called the number of cells containing exactly g particles.

Statistics of the form (0.3) in /30/ are called separable (additively separable) statistics. If the functions /„ in (0.3) do not depend on u, then such statistics were called in /31/ symmetric separable statistics.

For any r the statistic /xr is a symmetric separable statistic. From equality

E DM = E DFg (0.5) it follows that the class of symmetric separable statistics of hv coincides with the class of linear functions of fir. Moreover, the class of functions of the form (0.4) is wider than the class of symmetric separable statistics.

But = (#o(n, N)) is a sequence of simple null hypotheses that the distribution of the vector h(n, N) is (0.2), where the random variables are,. in (0.2) are identically distributed and k) = pk,k = 0,1,2,., the parameters n, N change in the central region.

Consider some P £ (0,1) and a sequence of, generally speaking, complex alternatives

H = (H(n, N)) such that exists - the maximum number for which, for any simple hypothesis H\ € H(n, N), the inequality holds

РШ > an,N(P)) > Р

We will reject the hypothesis Hq(ti,N) if fm > asm((3). If there is a limit

Шп ~1пР(0н > an,N(P))=u(p,Н), where the probability for each N is calculated under the hypothesis Нк(п, N), then the value ^(/З, Н) is named in /38/ index of the criterion φ at the point (j3, H). The last limit may, generally speaking, not exist. Therefore, in the dissertation work, in addition to the criterion index, the value is considered

Ish (~1pP(fm > al(/?)))

JV->oo N-ooo mean, respectively, the lower and upper limits of the sequence (odg) for N -> oo,

If a criterion index exists, then the criterion's subscript coincides with it. The lower index of the criterion always exists. The higher the value of the criterion index (subscript of the criterion), the better the statistical criterion in the sense under consideration. In /38/, the problem of constructing goodness-of-fit criteria for generalized layouts with the highest value of the criterion index in the class of criteria that reject the hypothesis Ho(n,N) at /MO Ml Mt MS iV" iV""""" ~yv" " was solved ^ "where m > 0 is some fixed number, the sequence of constant edg is selected based on the given value of the power of the criterion for the sequence of alternatives, ft is a real function of m + 1 arguments.

The criterion indices are determined by the probabilities of large deviations. As was shown in /38/, the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations of separable statistics when Cramer’s condition is satisfied for the random variable /(ξ) is determined by the corresponding Kull-Bak-Leibler-Sanov information distance (the random variable rj satisfies the condition Cramer, if for some R > 0 the moment generating function Metr] is finite in the interval \t\< Н /28/).

The question of the probabilities of large deviations of statistics from an unlimited number of fir, as well as arbitrary separable statistics that do not satisfy the Cramer condition, remained open. This did not allow us to finally solve the problem of constructing criteria for testing hypotheses in generalized placement schemes with the highest rate of tending to zero of the probability of an error of the first kind with non-approaching alternatives in the class of criteria based on statistics of the form (0.4). The relevance of the dissertation research is determined by the need to complete the solution to the specified problem.

The purpose of the dissertation work is to construct goodness-of-fit criteria with the highest value of the criterion index (subscript of the criterion) for testing hypotheses in a selection scheme without return in the class of criteria that reject the hypothesis U(n, N) for $.<>,■ ■)><*. (0-7) где ф - функция от счетного количества аргументов, и параметры п, N изменяются в центральной области.

In accordance with the purpose of the study, the following tasks were set:

Investigate the properties of entropy and information distance Kull-Bak - Leibler - Sanov for discrete distributions with a countable number of outcomes;

Investigate the probabilities of large deviations of statistics of the form (0.4);

Investigate the probabilities of large deviations of symmetric separable statistics (0.3) that do not satisfy the Cramer condition;

Find a statistic such that the goodness-of-fit criterion constructed on its basis for testing hypotheses in generalized layouts has the highest index value in the class of criteria of the form (0.7).

Scientific novelty:

Scientific and practical value. The work solves a number of questions about the behavior of the probabilities of large deviations in generalized placement schemes. The results obtained can be used in the educational process in the specialties of mathematical statistics and information theory, in the study of statistical procedures for the analysis of discrete sequences, and were used in /3/, /21/ to justify the security of one class of information systems. Provisions for defense:

Reducing the problem of testing, based on a single sequence of ball colors, the hypothesis that this sequence is obtained as a result of a choice without returning until the balls are exhausted from an urn containing balls of two colors, and each such choice has the same probability, to the construction of goodness-of-fit criteria for testing hypotheses in the corresponding generalized layout;

Continuity of the entropy and Kullback-Leibler-Sanov information distance functions on an infinite-dimensional simplex with the introduced logarithmic generalized metric;

A theorem on the rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations of symmetric separable statistics that do not satisfy the Cramer condition in the generalized placement scheme in the semi-exponential case;

Theorem on rough (up to logarithmic equivalence) asymptotics of the probabilities of large deviations for statistics of the form (0.4);

Construction of a goodness-of-fit criterion for testing hypotheses in generalized layouts with the highest index value in the class of criteria of the form (0.7).

Approbation of work. The results were presented at seminars of the Department of Discrete Mathematics of the Mathematical Institute named after. V. A. Steklov RAS, information security department of ITM&VT named after. S. A. Lebedev RAS and at:

Fifth All-Russian Symposium on Applied and Industrial Mathematics. Spring session, Kislovodsk, May 2 - 8, 2004;

Sixth International Petrozavodsk Conference "Probabilistic methods in discrete mathematics" June 10 - 16, 2004;

Second International Conference "Information Systems and Technologies (IST" 2004)", Minsk, November 8-10, 2004;

International conference "Modern Problems and new Trends in Probability Theory", Chernivtsi, Ukraine, June 19 - 26, 2005.

The main results of the work were used in the research work "Apology", carried out by ITMiVT RAS. S. A. Lebedev in the interests of the Federal Service for Technical and Export Control of the Russian Federation, and were included in the report on the implementation of the research stage /21/. Some results of the dissertation were included in the research report "Development of mathematical problems of cryptography" of the Academy of Cryptography of the Russian Federation for 2004 /22/.

The author expresses deep gratitude to the scientific supervisor, Doctor of Physical and Mathematical Sciences A. F. Ronzhin and the scientific consultant, Doctor of Physical and Mathematical Sciences, Senior Researcher A. V. Knyazev. The author expresses gratitude to Doctor of Physical and Mathematical Sciences, Professor A. M. Zubkov and Candidate of Physical and Mathematical Sciences Mathematical Sciences I. A. Kruglov for his attention to the work and a number of valuable comments.

Structure and content of the work.

The first chapter examines the properties of entropy and information distance for distributions on the set of non-negative integers.

In the first paragraph of the first chapter, notations are introduced and the necessary definitions are given. In particular, the following notation is used: x = (xq,x\, . ) - an infinite-dimensional vector with a countable number of components;

H(x) - -Ex^oXvlnx,-, truncm(x) = (x0,x1,.,xm,0,0,.)] f2* = (x, xi > 0, zy = 0.1,. , Oh"< 1}; Q = {х, х, >0,u = 0.1,., o xv = 1); = (x G O, ££L0 = 7);

Ml = o Ue>1|5 € o< Ml - 7МГ1 < 00}. Понятно, что множество £1 соответствует семейству вероятностных распределений на множестве неотрицательных целых чисел, П7 - семейству вероятностных распределений на множестве неотрицательных целых чисел с математическим ожиданием 7.

If y 6E P, then for e > 0 the set will be denoted by Oe(y)

Oe(y) - (x^< уие£ для всех v = 0,1,.}.

In the second paragraph of the first chapter, a theorem on the boundedness of the entropy of discrete distributions with limited mathematical expectation is proved.

Theorem 1. On the boundedness of the entropy of discrete distributions with bounded mathematical expectation.

For any 6 P7

H(x)

If x € fly corresponds to a geometric distribution with a mathematical definition of 7, that is, 7 x„ = (1- р)р\ v = 0.1,., where р = --,

1 + 7 then the equality holds

H(x) = F(<7).

The statement of the theorem can be viewed as the result of a formal application of the Lagrange method of conditional multipliers in the case of an infinite number of variables. The theorem that the only distribution on the set (k, k + 1, k + 2,.) with a given mathematical expectation and maximum entropy is a geometric distribution with a given mathematical expectation is given (without proof) in /47/. The author, however, has given strict proof.

The third paragraph of the first chapter gives the definition of a generalized metric - a metric that allows infinite values.

For x,y € Q the function p(x,y) is defined as the minimal e > O with the property yie~£<хи< уиее для всех и = 0,1,. Если такого е не существует, то полагается, что р(х,у) = оо.

It is proved that the function p(x,y) is a generalized metric on the family of distributions on the set of non-negative integers, as well as on the entire set Cl*. Instead of e in the definition of the metric p(x,y), you can use any other positive number other than 1. The resulting metrics will differ by a multiplicative constant. Let us denote by J(x, y) the information distance

00 £ J(x,y) = E In-.

Here and below it is assumed that 0 In 0 = 0.0 In jj = 0. The information distance is defined for such x, y that x„ = 0 for all and such that y = 0. If this condition is not met, then we will assume J(x,ij) = oo. Let L SP. Then we will denote

J (A Y) = |nf J(x,y).

The fourth paragraph of the first chapter gives the definition of compactness of functions defined on the set Q*. The compactness of a function with a countable number of arguments means that with any degree of accuracy the value of the function can be approximated by the values ​​of this function at points where only a finite number of arguments are non-zero. The compactness of the entropy and information distance functions is proved.

1. For any 0< 7 < оо функция Н(х) компактна на

2. If for some 0< 70 < оо

R e then for any 0<7<оо,г>0 the function x) = J(x,p) is compact on the set

The fifth paragraph of the first chapter discusses the properties of the information distance defined on an infinite-dimensional space. Compared to the finite-dimensional case, the situation with the continuity of the information distance function changes qualitatively. It is shown that the information distance function is not continuous on the set in any of the metrics

Pl&V) = E\Xi~Y»\, u=0

E (xv - Ui)2 v=Q

Рз(х,у) = 8Up\xu-yv\. v

The validity of the following inequalities is proved for the entropy functions H(x) and information distance J(x,p):

1. For any x, x" € fi

N(x) - N(x")\< - 1){Н{х) + Н{х")).

2. If for some x,p e P there exists e > 0 such that x 6 0 £(p), then for any x" £ Q J(x,p) - J(x",p)|< (е"М - 1){Н{х) + Н{х") + ееН(р)).

From these inequalities, taking into account Theorem 1, it follows that the entropy and information distance functions are uniformly continuous on the corresponding subsets of Q in the metric p(x,y)t, namely,

1. For any 7 such that 0< 7 < оо, функция Н(х) равномерно непрерывна на Г2 в метрике р(ж,у);

2. If for some 70, 0< 70 < оо

TO for any 0<7<оои£>0 function

L p(x) = J(x,p) is uniformly continuous on the set Π Oe(p) in the metric p(x,y).

A definition of non-extremal function is given. The non-extremal condition means that the function does not have local extrema, or the function takes the same values ​​at local minima (local maxima). The non-extrema condition weakens the requirement of the absence of local extrema. For example, the function sin x on the set of real numbers has local extrema, but satisfies the non-extremal condition.

Let for some 7 > 0, the region A is given by the condition

A = (x € VLv4>(x) > a), (0.9) where φ(x) is a real-valued function, a is some real constant, inf φ(x)< а < inf ф(х).

The question was studied under what conditions on the function φ when changing the parameters n,N in the central region, ^ -; 7, for all sufficiently large values ​​there are non-negative integers ko, k\,., kn such that k0 + ki + . + kn = N, k\ + 2k2. + control panel - N and

F(ko k\ kp

-£,0,0 ,.)>a.

It is proved that for this it is sufficient to require that the function φ be non-extremal, compact and continuous in the metric p(x,y), and also that for at least one point x satisfying (0.9), for some e > 0 there exists a finite moment degrees 1 + e and x„ > 0 for any v = 0.1.

In the second chapter, we study the rough (up to logarithmic equivalence) asymptotics of the probability of large deviations of functions from D = (^0) ■ ) Ts "n, 0, .) - the number of cells with a given filling in the central region of change of parameters N, n. Rough The asymptotics of the probabilities of large deviations are sufficient to study the indices of the agreement criteria.

Let the random variables ^ in (0.2) be identically distributed and

P(z) - generating function of a random variable - converges in a circle of radius 1< R < оо. Следуя /38/, для 0 < z < R обозначим через £(z) случайную величину такую, что

Ml+£ = £ i1+ex„< 00.

0.10) k] = Pk, k = 0.1,.

Let's denote

If there is a solution to the equation m Z(z) = ъ then it is unique /38/. Throughout what follows we will assume that pk > O,A; = 0.1,.

The first paragraph of the first paragraph of the second chapter contains the asymptotics of logarithms of probabilities of the form

1pP(/x0 = ko,.,tsp = kp).

The following theorem is proved.

Theorem 2. Rough local theorem on the probabilities of large deviations. Let n, N -» oo so that jj ->7.0<7 < оо, существует z7 - корень уравнения M£(z) = 7, с. в. £(г7) имеет положительную дисперсию. Тогда для любого k G Cl(n,N)

1nP(D = k) = JftpK)) + O(^lniV).

The statement of the theorem follows directly from the formula for the joint distribution fii,. fin in /26/ and the following estimate: if non-negative integer values ​​, Нп satisfy the condition

Hi + 2d2 + + PNn = n, then the number of non-zero values ​​among them is 0(l/n). This is a rough estimate and does not claim to be new. The number of non-zero CGs in generalized layout schemes does not exceed the value of the maximum filling of cells, which in the central region, with a probability tending to 1, does not exceed the value O(lnn) /25/, /27/. Nevertheless, the resulting estimate 0(y/n) is satisfied with probability 1 and is sufficient to obtain rough asymptotics.

In the second paragraph of the first paragraph of the second chapter, the value of the limit is found where adg is a sequence of real numbers converging to some a G R, φ(x) is a real-valued function. The following theorem is proved.

Theorem 3. Rough integral theorem on the probabilities of large deviations. Let the conditions of Theorem 2 be satisfied, for some r > 0, C > 0 the real function φ(x) is compact and uniformly continuous in the metric p on the set

A = 0r+<;(p(z7)) П Ц7+с] и удовлетворяет условию неэкстремальности на множестве fly. Если для некоторой константы а такой, что inf ф(х) < а < sup ф(х). xeily существует вектор ра € fi7 П 0r(p(z7)); такой, что

Ф(ra) > a and j(( (x) >a,xe P7),p(2;7)) = 7(pa,p(*y)) mo for any sequence a^ converging to a,

Jim -vbPW%%,.)>aN) = J(pa,p(2h)). (0.11)

With additional restrictions on the function φ(x), the information distance J(pa,p(z7)) in (2.3) can be calculated more specifically. Namely, the following theorem is true. Theorem 4. On information distance. Let for some 0< 7 < оо для некоторвх г >0, C > 0, the real function φ(x) and its first-order partial derivatives are compact and uniformly continuous in the generalized metric p(x, y) on the set p G

A = Og(p) P %+c] there exist T > 0, R > 0 such that for all \t\<Т,0 < z < R,x е А

E^exp^-f(x))< оо,

0(a;)exp(t-< со, i/=o oxv 0X1/ для некоторого е >O oo Q pvv1+£zu exp(t-ph(x))< оо, (0.13) и существует единственный вектор x(z,t), удовлетворяющий системе уравнений xv(z, t) = pvzv ехр {Ь-ф(х(г, t))}, v = 0,1,. функция ф(х) удовлетворяет на множестве А условию неэкстремальности, а - некоторая константа, ф(р) < а < sup ф(:x)(z,t),

0

00 vpv(za,ta) = 7, 1/=0

0(p(*aL)) = a, where

Then p(za, ta) € and

J((x e А,ф(х) = а),р) = J(p(za, ta),p)

00 d 00 d = l\nza + taYl ir- (x(za,ta)) - In E^r/exp(ta-z- (p(zatta))). j/=0 C^i/ t^=0

If the function f(x) is a linear function, and the function f(x) is defined using equality (0.5), then condition (0.12) turns into Cramer’s condition for the random variable f(£(z)). Condition (0.13) is a form of condition (0.10) and is used to prove the presence in domains of the form (x G f(x) > a) of at least one point from 0(n, N) for all sufficiently large n, N.

Let ^)(n, N) = (hi,., /gdr) be the frequency vector in the generalized layout scheme (0.2). As a corollary of Theorems 3 and 4, the following theorem is formulated.

Theorem 5. Rough integral theorem on the probabilities of large deviations of symmetric separable statistics in a generalized allocation scheme.

Let n, N -» oo so that ^ - 7, 0< 7 < оо, существует z1 - корень уравнения М£(,г) = 7, с. в. £(27) имеет положительную дисперсию и максимальный шаг распределения 1, а - некоторая константа, f(x) - действительная функция, а < Mf(^(z1)), существуют Т >0,R > 0 such that for all |t|<Т,0 < z < R,

00 oo, u=0 there are such ta\

E vVi/("01 ta) = b where f(v)p"(za,ta) = a, 1/=0

Then for any sequence adg converging to a,

Jim - - InF»(- £ f(h„) > aN) = J(p(za,ta),p(z7))

00 7 In 2a + taa - In £ p^/e^M i/=0

This theorem was first proven by A.F. Ronzhin in /38/ using the saddle point method.

In the second paragraph of the second chapter, the probabilities of large deviations of separable statistics in generalized cxj^iax placements are studied in the case of failure to satisfy the Cramer condition for the random variable f(€(z)). Cramer's condition for the random variable f(£(z)) is not satisfied, in particular, if £(z) is a Poisson random variable and f(x) is x2. Note that Cramer's condition for the separable statistics themselves in generalized allocation schemes is always satisfied, since for any fixed n, N the number of possible outcomes in these schemes is finite.

As noted in /2/, if the Cramer condition is not satisfied, then in order to find the asymptotics of the probabilities of large deviations of sums of identically distributed random variables, additional ones are required. f

V and. . I conditions of correct change on the distribution of the term. In progress j

O, 5 the case corresponding to the fulfillment of condition (3) in /2/ is considered, that is, the seven-exponential case. Let P(£i = k) > 0 for all k = 0,1. and the function p(k) = -\nP(^ = k), can be extended to a function of continuous argument - a regularly varying function of order p, 0< р < со /45/, то есть положительной функции такой, что при t ->oo p(tx) xr.

Let the function f(x) for sufficiently large values ​​of the argument be a positive strictly increasing, regularly varying function of order. Let us define the function cp(x) by setting for sufficiently large x φ) = p(Γ\x)).

On the rest of the numerical axis, ip(x) can be specified in an arbitrary limited measurable way.

Then s. V. /(£i) has moments of any order and does not satisfy the Cramer condition, p(x) = o(x) as x -> ω, and the following Theorem 6 is valid. Let the function ip(x) be monotonically nondecreasing for sufficiently large x, fg^ction does not increase monotonically, n, N -> oo so that jj - A, 0< Л < оо; гд - единственный корень уравнения M^i(^) = Л, тогда для любого с >b(z\), where b(z) = M/(£i(.z)), there is a limit CN) = -(c - b(z\))4.

It follows from Theorem b that if Cramer’s condition is not met, the limit lim 1 InP(LN(h(n, N)) > cN) = 0, ^ ^ iv-too iv which proves the validity of the hypothesis stated in /39/. Thus, the value of the index of the agreement criterion in generalized placement schemes and failure to fulfill Cramer’s condition is always equal to zero. In this case, in the class of criteria, when Cramer’s condition is satisfied, criteria with a non-zero index value are constructed. From this we can conclude that using criteria whose statistics do not satisfy the Cramer condition, for example, the chi-square test in a polynomial scheme, to construct goodness-of-fit tests for testing hypotheses for non-converging alternatives in the indicated sense is asymptotically ineffective. A similar conclusion was made in /54/ based on the results of a comparison of chi-square and maximum likelihood ratio statistics in a polynomial scheme.

The third chapter solves the problem of constructing goodness-of-fit criteria with the largest value of the criterion index (the largest value of the subscript of the criterion) to test hypotheses in generalized placement schemes. Based on the results of the first and second chapters on the properties of the entropy functions, information distance and probabilities of large deviations, in the third chapter a function of the form (0.4) is found such that the goodness-of-fit criterion constructed on its basis has the largest value of the exact subscript in the class of criteria under consideration. The following theorem is proved.

Theorem 7. On the existence of an index. Let the conditions of Theorem 3 be satisfied: 0< /3 < 1, Н = Hp(i),Hp(2>,. is a sequence of alternative distributions, а,ф((3, N) is the maximum number for which, under the hypothesis Нр<ло выполнено неравенство существует предел lim^-оо о>φ(P, N) - a. Then at the point (/3, H) there is a criterion index φ

Zff, H) = 3((φ(x) > a, x £ ^.PW).

Shy)<ШН)>where w/fo fh h v^l ^

The Conclusion sets out the results obtained in their relation to the general goal and specific tasks posed in the dissertation, formulates conclusions based on the results of the dissertation research, indicates the scientific novelty, theoretical and practical value of the work, as well as specific scientific tasks identified by the author and the solution of which seems relevant .

Brief review of the literature on the research topic. The thesis examines the problem of constructing agreement criteria in generalized placement schemes with the highest value of the criterion index in the class of functions of the form (0.4) with non-converging alternatives.

Generalized layout schemes were introduced by V.F. Kolchin in /24/. The quantities in the polynomial scheme were called the number of cells with g pellets and were studied in detail in the monograph by V. F. Kolchin, B. A. Sevastyanov, V. P. Chistyakov /27/. The values ​​of fir in generalized layouts were studied by V.F. Kolchin in /25/, /26/. Statistics of the form (0.3) were first considered by Yu. I. Medvedev in /30/ and were called separable (additively separable) statistics. If the functions /„ in (0.3) do not depend on u, such statistics were called in /31/ symmetric separable statistics. The asymptotic behavior of the moments of separable statistics in generalized allocation schemes was obtained by G. I. Ivchenko in /9/. Limit theorems for a generalized layout scheme were also considered in /23/. Reviews of the results of limit theorems and agreement criteria in discrete probabilistic schemes of type (0.2) were given by V. A. Ivanov, G. I. Ivchenko, Yu. I. Medvedev in /8/ and G. I. Ivchenko, Yu. I. Medvedev , A.F. Ronzhin in /14/. Agreement criteria for generalized layouts were considered by A.F. Ronzhin in /38/.

A comparison of the properties of statistical criteria in these works was carried out from the point of view of relative asymptotic efficiency. The case of converging (contigual) hypotheses was considered - efficiency in the sense of Pitman and non-converging hypotheses - efficiency in the sense of Bahadur, Hodges - Lehman and Chernov. The relationship between different types of relative performance statistical tests is discussed, for example, in /49/. As follows from the results of 10. I. Medvedev in /31/ on the distribution of separable statistics in a polynomial scheme, the greatest asymptotic power under convergent hypotheses in the class of separable statistics on the frequencies of outcomes in a polynomial scheme has a criterion based on the chi-square statistic. This result was generalized by A.F. Ronzhin for circuits of type (0.2) in /38/. I. I. Viktorova and V. P. Chistyakov in /4/ constructed an optimal criterion for a polynomial scheme in the class of linear functions of /xr. A.F. Ronzhin in /38/ constructed a criterion that, given a sequence of alternatives that are not close to the null hypothesis, minimizes the logarithmic rate at which the probability of an error of the first kind tends to zero, in the class of statistics of the form (0.6). A comparison of the relative performance of the chi-square and maximum likelihood ratio statistics under approaching and non-approximating hypotheses was carried out in /54/.

The thesis considered the case of non-converging hypotheses. Studying the relative statistical effectiveness of criteria under non-converging hypotheses requires studying the probabilities of extremely large deviations - of the order of 0(i/n). For the first time, such a problem for a polynomial distribution with a fixed number of outcomes was solved by I. N. Sanov in /40/. The asymptotic optimality of goodness-of-fit tests for testing simple and complex hypotheses for a multinomial distribution in the case of a finite number of outcomes with non-converging alternatives was considered in /48/. The properties of information distance were previously considered by Kullback, Leibler /29/,/53/ and I. II. Sanov /40/, as well as Hoeffding /48/. In these works, the continuity of information distance was considered on finite-dimensional spaces in the Euclidean metric. A number of authors considered a sequence of spaces with increasing dimension, for example, in the work of Yu. V. Prokhorov /37/ or in the work of V. I. Bogachev, A. V. Kolesnikov /1/. Rough (up to logarithmic equivalence) theorems on the probabilities of large deviations of separable statistics in generalized allocation schemes under the Cramer condition were obtained by A.F. Ronzhin in /38/. A. N. Timashev in /42/,/43/ obtained exact (up to equivalence) multidimensional integral and local limit theorems on the probabilities of large deviations of the vector fir^n, N),., iir.(n,N), where s, r\,., rs - fixed integers,

ABOUT<П < .

The study of the probabilities of large deviations when the Cramer condition is not met for the case of independent random variables was carried out in the works of A. V. Nagaev /35/. The method of conjugate distributions is described by Feller /45/.

Statistical problems of testing hypotheses and estimating parameters in a selection scheme without return in a slightly different formulation were considered by G. I. Ivchenko, V. V. Levin, E. E. Timonina /10/, /15/, where estimation problems were solved for a finite population, when the number of its elements is an unknown quantity, the asymptotic normality of multivariate S - statistics from s independent samples in a selection scheme without reversion was proved. The problem of studying random variables associated with repetitions in sequences of independent trials was studied by A. M. Zubkov, V. G. Mikhailov, A. M. Shoitov in /6/, /7/, /32/, /33/, /34/ . An analysis of the main statistical problems of estimation and testing of hypotheses within the framework of the general Markov-Pólya model was carried out by G. I. Ivchenko, Yu. I. Medvedev in /13/, a probabilistic analysis of which was given in /11/. A method for specifying non-uniform probability measures on a set of combinatorial objects, which is not reducible to the generalized placement scheme (0.2), was described in G. I. Ivchenko, Yu. I. Medvedev /12/. A number of problems in probability theory, in which the answer can be obtained as a result of calculations using recurrent formulas, were indicated by A. M. Zubkov in /5/.

Inequalities for the entropy of discrete distributions were obtained in /50/ (cited from the abstract of A. M. Zubkov in RZhMat). If (pn)^Lo is the probability distribution, oo

Рп = Е Рк, к=тг

A = supp^Pn+i< оо (0.14) п>0 and

F(x) = (x + 1) In (x + 1) - x In x, then for the entropy I of this probability distribution

00 i = - 5Z Рк^Рк к=0 the inequalities are valid -L 1 00 00 Р

I + (In -f-) £ (Arn - Rn+1)< F(А) < Я + £ (АРп - P„+i)(ln

L D p=P -t p.4-1 and inequalities turn into equalities if

Рп= (xf1)n+vn>Q. (0.15)

Note that the extremal distribution (0.15) is a geometric distribution with mathematical expectation A, and the function F(A) of the parameter (0.14) coincides with the function of the mathematical expectation in Theorem 1.

Similar dissertations in the specialty "Probability Theory and Mathematical Statistics", 01/01/05 code VAK

  • Asymptotic efficiency of scale-parameter-free exponential tests 2005, Candidate of Physical and Mathematical Sciences Chirina, Anna Vladimirovna

  • Some problems in probability theory and mathematical statistics related to the Laplace distribution 2010, Candidate of Physical and Mathematical Sciences Lyamin, Oleg Olegovich

  • Limit theorems in problems of dense embedding and dense series in discrete random sequences 2009, Candidate of Physical and Mathematical Sciences Mezhennaya, Natalya Mikhailovna

  • Limit theorems for the number of intersections of a strip by random walk trajectories 2006, candidate of physical and mathematical sciences Orlova, Nina Gennadievna

  • Optimization of the structure of moment estimates of the accuracy of normal approximation for distributions of sums of independent random variables 2013, Doctor of Physical and Mathematical Sciences Shevtsova, Irina Gennadievna

Conclusion of the dissertation on the topic “Probability Theory and Mathematical Statistics”, Kolodzei, Alexander Vladimirovich

3.4. Conclusions

In this chapter, based on the results of previous chapters, it was possible to construct a goodness-of-fit criterion for testing hypotheses in generalized placement schemes with the highest logarithmic rate of tending to zero probabilities of errors of the first type, with a fixed probability of an error of the first kind and non-approaching alternatives. ~"

Conclusion

The purpose of the dissertation work was to construct goodness-of-fit criteria for testing hypotheses in a selection scheme without returning from an urn containing balls of 2 colors. The author decided to study statistics based on the frequencies of distances between balls of the same color. In this formulation, the problem was reduced to the task of testing hypotheses in a suitable generalized layout.

The dissertation work included

The properties of entropy and information distance of discrete distributions with an unlimited number of outcomes and limited mathematical expectation have been studied;

A rough (up to logarithmic equivalence) asymptotic behavior of the probabilities of large deviations of a wide class of statistics in a generalized placement scheme is obtained;

Based on the results obtained, a criterion function with the highest logarithmic rate of tending to zero of the probability of an error of the first type with a fixed probability of an error of the second type and non-converging alternatives was constructed;

It has been proven that statistics that do not satisfy the Cramer condition have a lower rate of convergence to zero of the probabilities of large deviations compared to statistics that satisfy this condition.

The scientific novelty of the work is as follows.

The concept of a generalized metric is given - a function that admits infinite values ​​and satisfies the axioms of identity, symmetry and triangle inequality. A generalized metric is found and sets are indicated on which the entropy and information distance functions, defined on a family of discrete distributions with a countable number of outcomes, are continuous in this metric;

In the generalized placement scheme, a rough (up to logarithmic equivalence) asymptotics was found for the probabilities of large deviations of statistics of the form (0.4) satisfying the corresponding form of Cramer’s condition;

In a generalized placement scheme, a rough (up to logarithmic equivalence) asymptotics is found for the probabilities of large deviations of symmetric separable statistics that do not satisfy the Cramer condition;

In the class of criteria of the form (0.7), a criterion with the highest value of the criterion index is constructed.

The work solves a number of questions about the behavior of the probabilities of large deviations in generalized placement schemes. The results obtained can be used in the educational process in the specialties of mathematical statistics and information theory, in the study of statistical procedures for the analysis of discrete sequences, and were used in /3/, /21/ to justify the security of one class of information systems.

However, a number of questions remain open. The author limited himself to considering the central zone of changes in parameters n, N of generalized schemes for placing n particles in N cells. If the carrier of the distribution of random variables generating the generalized arrangement scheme (0.2) is not a set of the form r, r + 1, r + 2,., then when proving the continuity of the information distance function and studying the probabilities of large deviations, it is necessary to take into account the arithmetic structure of such a carrier that was not considered in the author's work. For the practical application of criteria built on the basis of the proposed function with the maximum index value, it is necessary to study its distribution both under the null hypothesis and under alternatives, including converging ones. It is also of interest to transfer the developed methods and generalize the results obtained to other probabilistic schemes other than generalized placement schemes.

If - frequencies of distances between outcome numbers 0 in a binomial scheme with probabilities of outcomes po> 1 - Po, then it can be shown that in this case

Pb = kh.t fin = kn) = I(± iki = n)(kl + --, (3.3) v=\ K\ \ . Kn\ where

O* = Po~1(1 ~Po),v =

From the analysis of the formula for the joint distribution of values ​​of cg in a generalized arrangement scheme, proven in /26/, it follows that distribution (3.3), generally speaking, cannot be represented in the general case as a joint distribution of values ​​of cg in any generalized arrangement of particles by cells. This distribution is a special case of distributions on the set of combinatorial objects introduced in /12/. It seems an urgent task to transfer the results of the dissertation work for generalized placement schemes to this case, which was discussed in /52/.

If the number of outcomes in a selection without return or polynomial placement scheme is more than two, then the joint frequency distribution of distances between adjacent identical outcomes can no longer be represented in such a simple way. So far it is possible to calculate only the mathematical expectation and dispersion of the number of such distances /51/.

List of references for dissertation research Candidate of Physical and Mathematical Sciences Kolodzei, Alexander Vladimirovich, 2006

1. Bogachev V.I., Kolesnikov A.V. Nonlinear transformations of convex measures and entropy of Radon-Nikodym densities // Reports of the Academy of Sciences. - 2004. - T. 207. - 2. - P. 155 - 159.

2. Vidyakin V.V., Kolodzei A.V. Statistical detection of covert channels in data transmission networks // Proc. report II International conf. "Information systems and technologies IST" 2004" (Minsk, October 8-10, 2004) Minsk: BSU, 2004. - Part 1. - pp. 116 - 117.

3. Viktorova I. I., Chistyakov V. P. Some generalizations of the empty box criterion // Theory Probab. and its applications. - 1966. - T. XI. - 2. P. 306-313.

4. Zubkov A. M. Recurrent formulas for calculating functionals of ods of discrete random variables // Review of Appl. and industrial math. 1996. - T. 3. - 4. - P. 567 - 573.

5. G. Zubkov A. M., Mikhailov V. G. Limit distributions of random variables associated with long repetitions in a sequence of independent tests // Theory Probab. and its applications. - 1974. - T. XIX. 1. - pp. 173 - 181.

6. Zubkov A. M., Mikhailov V. G. On repetitions of s - chains in a sequence of independent quantities // Theory Probab. and its application. - 1979. T. XXIV. - 2. - P. 267 - 273.

7. Ivanov V. A., Ivchenko G. I., Medvedev Yu. I. Discrete problems in probability theory // Results of Science and Technology. Ser. theory of probability, mathematics. stat., theor. cybern. T. 23. - M.: VINITI, 1984. P. 3 -60.

8. Ivchenko G. I. On moments of separable statistics in a generalized allocation scheme // Mat. notes. 1986. - T. 39. - 2. - P. 284 - 293.

9. Ivchenko G. I., Levin V. V. Asymptotic normality in a selection scheme without return // Theory Probab. and it is applied. - 1978.- T. XXIII. 1. - pp. 97 - 108.

10. Ivchenko G.I., Medvedev Yu.I. On the Markov-Polya urn scheme: from 1917 to the present day // Review applied. and industrial math. - 1996.- T. 3. 4. - P. 484-511.

11. Ivchenko G. I., Medvedev Yu. I. Random combinatorial objects // Reports of the Academy of Sciences. 2004. - T. 396. - 2. - P. 151 - 154.

12. Ivchenko G. I., Medvedev Yu. I. Statistical problems associated with the organization of control over the processes of generating discrete random sequences // Diskretn. math. - 2000. - T. 12. - 2. S. 3 - 24.

13. Ivchenko G. I., Medvedev Yu. I., Ronzhin A. F. Separable statistics and goodness-of-fit criteria for polynomial samples // Proceedings of Mathematics. Institute of the USSR Academy of Sciences. 1986. - T. 177. - P. 60 - 74.

14. Ivchenko G. I., Timonina E. E. On estimation when choosing from a finite population // Mat. notes. - 1980. - T. 28. - 4. - P. 623 - 633.

15. Kolodzei A. V. Theorem on the probabilities of large deviations for separable statistics that do not satisfy the Cramer condition // Diskretn. math. 2005. - T. 17. - 2. - P. 87 - 94.

16. Kolodzei A. V. Entropy of discrete distributions and the probability of large deviations of functions from filling cells in generalized layouts // Review of Appl. and industrial math. - 2005. - T. 12. 2. - P. 248 - 252.

17. Kolodzey A. V. Statistical criteria for identifying hidden channels based on changing the order of messages // Research work "Apology": Report / FSTEC of the Russian Federation, Head A. V. Knyazev. Inv. 7 chipboards - M., 2004. - P. 96 - 128.

18. Kolodzei A.V., Ronzhin A.F. About some statistics related to checking the homogeneity of random discrete sequences // Research work "Development of mathematical problems of cryptography" N 4 2004.: Report / AK RF, - M., 2004 .

19. Kolchin A. V. Limit theorems for a generalized layout scheme // Diskretn. math. 2003. - T. 15. - 4. - P. 148 - 157.

20. Kolchin V.F. One class of limit theorems for conditional distributions // Lit. math. Sat. - 1968. - T. 8. - 1. - P. 111 - 126.

21. Kolchin V. F. Random graphs. 2nd ed. - M.: FIZMATLIT, 2004. - 256 p.

22. Kolchin V. F. Random mappings. - M.: Nauka, 1984. - 208 p.

23. Kolchin V.F., Sevastyanov B.A., Chistyakov V.P. Random placements. M.: Nauka, 1976. - 223 p.

24. Kramer G. // Uspekhi Mat. science. - 1944. - high. 10. - pp. 166 - 178.

25. Kulbak S. Information theory and statistics. - M.: Nauka, 1967. - 408 p.

26. Medvedev Yu. I. Some theorems on the asymptotic distribution of the chi-square statistic // Dokl. Academy of Sciences of the USSR. - 1970. - T. 192. 5. - P. 997 - 989.

27. Medvedev Yu. I. Separable statistics in a polynomial scheme I; II. // Theory Prob. and its use. - 1977. - T. 22. - 1. - P. 3 - 17; 1977. T. 22. - 3. - P. 623 - 631.

28. Mikhailov V. G. Limit distributions of random variables associated with multiple long repetitions in a sequence of independent tests // Theory Probab. and its applications. - 1974. T. 19. - 1. - P. 182 - 187.

29. Mikhailov V. G. Central limit theorem for the number of incomplete long repetitions // Theory Probab. and its applications. - 1975. - T. 20. 4. - P. 880 - 884.

30. Mikhailov V. G., Shoitov A. M. Structural equivalence of s - chains in random discrete sequences // Discrete. math. 2003. - T. 15, - 4. - P. 7 - 34.

31. Nagaev A.V. Integral limit theorems taking into account probabilities of large deviations. I. // Theory Probab. and it is applied. -1969. T. 14. 1. - pp. 51 - 63.

32. Petrov V. V. Sums of independent random variables. - M.: Nauka, 1972. 416 p.

33. Prokhorov Yu. V. Limit theorems for sums of random vectors whose dimension tends to infinity // Theory Probab. and its applications. 1990. - T. 35. - 4. - P. 751 - 753.

34. Ronzhin A.F. Criteria for generalized particle placement schemes // Theory Probab. and its applications. - 1988. - T. 33. - 1. - P. 94 - 104.

35. Ronzhin A.F. Theorem on the probabilities of large deviations for separable statistics and its statistical application // Mat. notes. 1984. - T. 36. - 4. - P. 610 - 615.

36. Sanov I. N. On the probabilities of large deviations of random variables // Mat. Sat. 1957. - T. 42. - 1 (84). - S.I - 44.

37. Seneta E. Correctly changing functions. M.: Nauka, 1985. - 144 p.

38. Timashev A. N. Multidimensional integral theorem on large deviations in an equiprobable placement scheme // Diskret, Mat. - 1992. T. 4. - 4. - P. 74 - 81.

39. Timashev A. N. Multidimensional local theorem on large deviations in an equiprobable placement scheme // Diskretn. math. - 1990. T. 2. - 2. - P. 143 - 149.

40. Fedoryuk M.V. Pass method. M.: Nauka, 1977. 368 p.

41. Feller V. Introduction to probability theory and its applications. T. 2. - M.: Mir, 1984. 738 p.

42. Shannon K. Mathematical theory of communication // Works on information theory and cybernetics: Trans. from English / M., IL, 1963, p. 243 - 332.

43. Conrad K. Probability Distribution and Maximum Entropy // http://www.math.uconn.edu/~kconrad/blurbs/entropypost.pdf

44. Hoeffding W. Asymptotically optimal tests for multinomial distribution // Ann. Math. Statist. 1965. - T. 36. - pp. 369 - 408.

45. Inglot T,. Rallenberg W. S. M., Ledwina T. Vanishing shortcoming and asymptotic relative efficiency // Ann. Statist. - 2000. - T. 28. - P. 215 238.

46. ​​Jurdas C., Pecaric J., Roki R., Sarapa N., On an inequality for the entropy of probability distribution // Math. Inequal. and Appl. - 2001. T. 4. - 2. - P. 209 - 214. (RZhMat. - 2005. - 05.07-13B.16).

47. Kolodzey A. V., Ronzhin A. F., Goodness of Fit Tests for Random Combinatoric Objects // Proc. report intl. conf. Modern Problems and new Trends in Probability Theory, (Chernivtsi, June 19 - 26, 2005) - Kyiv: Institute of Mathematics, 2005. Part 1. P. 122.

48. Kullback S. and Leibler R. A. On information and sufficiency // Ann. Math. Statist. 1951. - T. 22. - pp. 79 - 86.

49. Quine M.P., Robinson J. Efficiency of chi-square and likelihood ratio goodness of fit tests // Ann. Statist. 1985. - T. 13. - 2. - pp. 727 -742.

Please note that the scientific texts presented above are posted for informational purposes only and were obtained through original dissertation text recognition (OCR). In this connection, they may contain errors associated with imperfect recognition algorithms. There are no such errors in the PDF files of dissertations and abstracts that we deliver.

Thesis

Therefore, one of the ways to develop the testing of statistical hypotheses was the path of “empirical” construction of criteria, when the constructed statistics of the criterion are based on a certain principle, an ingenious idea or common sense, but its optimality is not guaranteed. In order to justify the use of such statistics when testing hypotheses against a certain class of alternatives, most often by the method...

  • 1. Supporting information
    • 1. 1. Information from the theory of C/- and V-statistics
    • 1. 2. Definition and calculation of Bahadur efficiency
    • 1. 3. On large deviations of II- and V-statistics
  • 2. Baringhouse-Hentze symmetry criteria
    • 2. 1. Introduction
    • 2. 2. Statistics
    • 2. 3. Statistics
  • 3. Exponentiality criteria
    • 3. 1. Introduction
    • 3. 2. Statistics I
    • 3. 3. Statistics n
  • 4. Normality criteria
    • 4. 1. Introduction
    • 4. 2. Statistics B^
    • 4. 3. Statistics V^n
    • 4. 4. Statistics V|)P
  • 5. Criteria for agreement with Cauchy's law
    • 5. 1. Introduction
    • 5. 2. Statistics
    • 5. 3. Statistics

Asymptotic properties of symmetry and agreement criteria based on characterizations (essay, coursework, diploma, test)

This dissertation constructs and studies goodness-of-fit and symmetry criteria based on the characterization properties of distributions, and also calculates their asymptotic relative efficiency for a number of alternatives.

The construction of statistical criteria and the study of their asymptotic properties is one of the most important problems of mathematical statistics. When testing a simple hypothesis against a simple alternative, the problem is solved using the Neyman-Pearson lemma, which, as is known, gives the optimal (most powerful) criterion in the class of all criteria of a given level. This is the likelihood ratio test.

However, for more difficult and practical hypothesis testing problems involving either testing complex hypotheses or considering complex alternatives, uniformly most powerful tests rarely exist, and the role of the likelihood ratio test changes significantly. The likelihood ratio statistic usually cannot be calculated explicitly; it loses its optimality property, and its distribution is unstable to changes in the statistical model. Moreover, the statistician often cannot determine the type of alternative at all, without which the construction of parametric criteria becomes meaningless.

Therefore, one of the ways to develop the testing of statistical hypotheses was the path of “empirical” construction of criteria, when the constructed statistics of the criterion are based on a certain principle, an ingenious idea or common sense, but its optimality is not guaranteed.

Typical examples of such statistics are the sign statistic, Pearson's x2 statistic (1900), the Kolmogorov statistic (1933), which measures the uniform distance between the empirical and true distribution function, the Kendall rank correlation coefficient (1938), or the Bickel-Rosenblatt statistic (1973), based on the quadratic risk of nuclear density assessment. Currently, mathematical statistics has many dozens of “empirical” statistics for testing the hypotheses of agreement, symmetry, homogeneity, randomness and independence, and more and more statistics of this type are constantly being proposed in the literature. A huge literature is devoted to the study of their exact and limit distributions, estimates of the rate of convergence, large deviations, asymptotic expansions, etc.

In order to justify the use of such statistics when testing hypotheses against a certain class of alternatives, their power is most often calculated using statistical modeling. However, for any consistent criterion, the power tends to unity as the sample size increases, and therefore is not always informative. A deeper analysis of the comparative properties of statistics can be carried out based on the concept of asymptotic relative efficiency (ARE). Various approaches to calculating AOE were proposed by E. Pitman, J. Hodges and E. Lehman, R. Bahadur, G. Chernov and W. Kallenberg in the mid-20th century; the results of the development of AOE theory by the mid-90s were summarized in the monograph. There is a generally accepted opinion that the synthesis of new criteria should be accompanied not only by an analysis of their properties, but also by the calculation of AOE in order to assess their quality and give informed recommendations for their use in practice.

This paper uses the idea of ​​constructing criteria based on characterizing distributions by the equidistribution property. Characterization theory originates from the work of D. Polya, published in 1923. Then it was developed in the works of I. Martsinkevich, S. N. Bernstein, E. Lukach, Yu. V. Linnik, A.A. Singer, J. Darmois, V.P. Skitovich, S.R. Pao, A.M. Kagan, J. Galambos, S. Kotz, L. B. Klebanov and many other mathematicians. The literature on this subject is large, and there are currently several monographs devoted to characterizations, for example, , , , , , , .

The idea of ​​constructing statistical criteria based on characterizations by the equidistribution property belongs to Yu. V. Linnik. At the end of his extensive work, he wrote: “. one can raise the question of constructing criteria for the agreement of a sample with a complex hypothesis, based on the identical distribution of the two corresponding statistics gi (xi> .xr) and g2(x, ¦¦¦xr) and thus reducing the question to the criterion of homogeneity.”

Let's return to the classic Polya's theorem to explain with a concrete example how this approach can work. In its simplest form, this theorem is formulated as follows.

Polya's theorem. Let X and Y be two independent and identically distributed centered s. V. Then s. V. (X + Y)//2 and X are identically distributed if and only if the distribution law of X is normal.

Suppose we have a sample of centered independent observations Xi, ., Xn and want to test the (complex) null hypothesis that the distribution of this sample is normal with mean 0 and some variance. Using our sample, let us construct the usual empirical distribution function (d.f.) n

Fn (t) = n-^VD

Gn(t) = n~2? VD + Xj< iv^}, t <= R1. i, j=l

By virtue of the Glivenko-Cantelli theorem, which is also valid for V-statistical empirical d.f. , for large n the function Fn(t) uniformly approaches the d.f. F (t) = P (X< t), а функция Gn (t) равномерно сближается с G (t) = ЦХ + У < tV2). Поскольку при нулевой гипотезе F = G, то Fn (t) близка к Gn (t), и критерий значимости можно основывать на подходящем функционале Тп от разности Fn (t) — Gn (t). Напротив, при альтернативе (то есть при нарушении нормальности) по теореме Пойа F ф G, что приводит к большим значениям Тп и позволяет отвергнуть нулевую гипотезу, обеспечивая состоятельность критерия.

However, this design, based on the idea of ​​Yu. V. Linnik, received almost no development, perhaps due to technical difficulties in constructing and analyzing the resulting criteria. Another reason is probably that characterizations of distributions by the equidistribution property are few and far between.

We know of only a few works devoted to one degree or another to the development of the idea of ​​Yu. V. Linnik. These are the works of Baringhouse and Henze and Muliere and Nikitin, which will be discussed below. There are also works in which goodness-of-fit criteria for specific distributions are also constructed on the basis of characterizations, but not on the basis of equidistribution, for example, , , , , , , , .

The most common use in the literature is to characterize the exponential distribution using various variants of the no-memory property , , , , , , .

It should be noted that in almost all of these works (except perhaps) the AOE of the criteria under consideration is not calculated and not discussed. In this thesis, we not only study the asymptotic properties of the known and our proposed characterization-based criteria, but also calculate their local exact (or approximate) AOE according to Bahadur.

Let us now define the concept of AOE. Let (Tn) and (1^) be two sequences of statistics constructed from a sample X,., Xn with distribution Pd, where in € 0 C R1, and the null hypothesis Ho is tested: 9 € in C against alternative A: in € ©-x = ©-6o. Let Mm (a, P,0) be the minimum sample size X[,., Xn, for which the sequence (Tn) with a given significance level, a > 0 reaches power /3< 1 при альтернативном значении параметра в € (c)1- Аналогично вводится в). Относительной эффективностью критерия, основанного на статистике Тп, по отношению к критерию, основанному на Уп, называется величина равная обратному отношению указанных выборочных объемов:

Since relative efficiency as a function of three arguments cannot be calculated explicitly even for the simplest statistics, it is customary to consider limits:

Ptet, y (a,/?, 0), Ntet, y (a,/3,0).

In the first case, the AOE according to Bahadur is obtained, the second limit determines the AOE according to Hodges-Lehman, and the third leads to the determination of the AOE according to Pitman. Since in practical applications it is the cases of low significance levels, high powers and close alternatives that are most interesting, all three definitions seem reasonable and natural.

In this work, to compare criteria, we will use AOE according to Bahadur. There are several reasons for this. Firstly, the Pitman efficiency is suitable mainly for asymptotically normal statistics, and under this condition coincides with the local Bach-Dur efficiency , . We consider not only asymptotically normal statistics, but also statistics of quadratic type, for which the limit distribution under the null hypothesis differs sharply from normal, so that Pitman efficiency is not applicable. Secondly, the Hodges-Lehman AOE is unsuitable for studying two-sided criteria, since they all turn out to be asymptotically optimal, and for one-sided criteria this AOE usually locally coincides with the Bahadur AOE. Third, significant progress has recently been made in the area of ​​large deviations for test statistics, which is crucial when calculating the Bahadur AOE. We are referring to the large deviations of the U- and V-statistics described in recent works and.

Let us now move on to an overview of the contents of the dissertation. The first chapter is of an auxiliary nature. It sets out the necessary theoretical and technical information from the theory of 11-statistics, the theory of large deviations and the theory of asymptotic efficiency according to Bahadur.

Chapter 2 is devoted to the construction and study of criteria for testing the symmetry hypothesis. Baringhouse and Henze proposed the idea of ​​constructing symmetry criteria based on the following elementary characterization.

Let X and Y be n.o.s.v.s having a continuous d.f. Then |X| and |max (X, Y)| identically distributed if and only if X and Y are symmetrically distributed around zero.

We use this characterization to construct new symmetry criteria. Let us recall that several classical symmetry criteria (see, Chapter 4) are based on characterizing symmetry by the even simpler property of equidistribution of X and -X.

Let us return to the Baringhouse-Hentze characterization. Let X, ., Xn observations having a continuous d.f.<7. Рассмотрим проверку гипотезы симметрии:

H0: OD = 1 —<3(-:г) V я (Е Я1. Это сложная гипотеза, поскольку вид С? не уточняется. В качестве альтернатив мы рассмотрим параметрическую альтернативу сдвига, т. е. G (x-0) = F (x — в), в >0-skew alternative, i.e. d(x-v) = 2f(x)F ($x), c > 0-Leman alternative, i.e. G(x-, 6) = F1+ e (x), 6 > 0 and the pollution alternative, i.e. G(x-6) = (1 - 6) F(x) + 6Fr+1(x), in > 0, r > 0, where F (x) and f (x) are d.f. and the density of some symmetric distribution.

In accordance with the above characterization, an empirical df is constructed based on |Xj|,., Xn, n

Hn (t) = n~2 J2 Tmax (X^Xk)<г}. На основе этих функций составляются статистики: лоо ):

Let X uY be non-negative and non-degenerate n.o.s.v.s having a d.f. differentiable at zero. F, and let 0< а < 1. Тогда X и min (^, —) одинаково распределены тогда и только тогда, когда F есть ф.р. экспоненциального закона.

In addition to constructing the agreement criterion itself and studying its asymptotic properties, it is of interest to calculate the AOE of a new criterion and study its dependence on the parameter a.

The second generalization of this characterization belongs to Des. We formulate it based on more recent work:

Let Xi, ., Xm, m ^ 2 be non-negative and non-degenerate i.s. r.v.s having a d.f. differentiable at zero. F. Then the statistics X and m minpfi, ., Xm) are identically distributed if and only if F is a d.f. exponential law.

Let Xx,., Xn be independent observations having d.f. Based on the characterizations formulated above, we can test the exponential hypothesis Ho, which consists in the fact that (7 is the d.f. of the exponential law. P, against the alternative H, which consists in the fact that C f? under weak additional conditions.

In accordance with these characterizations, an empirical df is constructed. p = pVD< О (°-0−3) 1 и -статистические ф.р. п-2 ± (* ^ < 4} + ^{тш (?, < «}), 1 П

We propose to base the criteria for checking exponentiality on statistics: pkp = - c&bdquo-(*)] aop(1).

As alternatives, we choose the standard alternatives used in the literature on exponential testing: the Weibull alternative with d(x) = (β + 1)xx(-x1+β), x ^ 0- the Makehama alternative with d(x) = ( 1 + 0(1 - exp (-x))) exp (-x - 0(exp (-x) - 1 + x)), x ^ 0 - an alternative to the linearity of the failure rate function with d (x) = (1 + bx) exp[—x—^bx2], x^O.

For the two statistics proposed above, the limit distributions under the null hypothesis are written:

Theorem 3.2.1 For the statistics Uε for n -* oo, the relation holds: where Dz(a) is defined in (3.2.2). Theorem 3.3.1 For the statistics n as n -> oo the relation holds

U0,(t + 1)2A1(t)), where D4 (t) is defined in (3.3.6).

Since both statistics depend on the parameters a and m, we establish at what parameter values ​​the AOE according to Bahadur reach their maximum and find these values. In addition, we construct an alternative in which the maximum is achieved at the point and φ ½.

The fourth chapter is devoted to testing the normality hypothesis. There are many characterizations of the normal law as one of the central laws of probability theory and mathematical statistics, and two monographs devoted exclusively to this issue. We will consider a slightly simplified version of the well-known characterization of and:

Let Xr, X2, ., Xm be centered n.o.s.v.s having d.f. o constants a, a-2,., am are such that 0< а* < 1 и = 1. Тогда статистики Х и одинаково распределены тогда и только тогда, когда F (x) = Ф (х/а), то есть F — ф.р. нормального закона с нулевым средним и некоторой дисперсией, а > 0.

Let X, ., Xn be a sample with d.f. G. Based on this characterization, we can test the main hypothesis R0, which is that G is a d.f. the normal law Fa (x) = Ф (x/a), against the alternative Hi, which is that G φ Fa. The usual empirical df is constructed. Gn and V-statistical d.f. n^

Bm, n (t) = n~t (E 1 + - +< *}),

1.¿-t=1 s

Hereinafter, the symbol a means summation over all permutations of indices. Criteria for testing normality can be based on the following statistics:

B, n = Г dGn (t), J -00 oo

BmAt)-Gn (t)]dGn (t), oo

Bin = Г straight lines of this direction intersecting the line; we find the midpoints of the chords to be cut off; draw a straight line through the middle - this is the diameter).

Discuss:

1. Why in determining the diameter is a vector of a non-asymptotic direction taken. If they cannot answer, then ask them to construct the diameter, for example, for a parabola.

2. Does any second-order line have at least one diameter? Why?

3. During the lecture it was proven that diameter is a straight line. The midpoint of which chord is point M in the figure?


4. Look at the parentheses in equation (7). What do they remind you of?

Conclusion: 1) each center belongs to each diameter;

2) if there is a line of centers, then there is a single diameter.

5. What direction do the diameters of a parabolic line have? (Asymptotic)

Proof (probably in lecture).

Let the diameter d, given by equation (7`), be conjugate to a vector of non-asymptotic direction. Then its direction vector

(-(), ). Let us show that this vector has an asymptotic direction. Let us use the criterion of the asymptotic direction vector for a line of parabolic type (see (5)). Let’s substitute and make sure (don’t forget that .

6. How many diameters does a parabola have? Their relative position? How many diameters do the remaining parabolic lines have? Why?

7. How to construct the total diameter of some pairs of second-order lines (see questions 30, 31 below).

8. We fill out the table and be sure to make drawings.

1. . Write an equation for the set of midpoints of all chords parallel to the vector

2. Write the equation for the diameter d passing through the point K(1,-2) for the line.

Solution steps:

1st method.

1. Determine the type (to know how the diameters of this line behave).

In this case, the line is central, then all diameters pass through center C.

2. We compose the equation of a straight line passing through two points K and C. This is the desired diameter.

2nd method.

1. We write the equation for diameter d in the form (7`).

2. Substituting the coordinates of point K into this equation, we find the relationship between the coordinates of the vector conjugate to the diameter d.

3. We set this vector, taking into account the found dependence, and compose an equation for diameter d.

In this problem, it is easier to calculate using the second method.

3. . Write an equation for the diameter parallel to the x-axis.

4. Find the midpoint of the chord cut off by the line

on the straight line x + 3y – 12 =0.

Directions to the solution: Of course, you can find the points of intersection of the straight line and line data, and then the middle of the resulting segment. The desire to do this disappears if we take, for example, a straight line with the equation x +3y – 2009 =0.