Error-Free Training via Information Structuring in the Classification Problem

The present paper solves the training problem that comprises the initial phase of the classification problem using the data matrix invariant method. The method amounts to an approximate "slicing" of the information contained in the problem, which leads to its structuring. According to this method, the range of values of each feature is divided into an equal number of intervals, and lists of the objects falling into these intervals are constructed. Objects are then identified by a set of interval numbers, i.e., indices, one for each feature. Assuming that the feature values within any interval are approximately the same, we calculate frequency features for objects of different classes that are equal to the frequencies of the corresponding indices. These features allow us to estimate the frequency of any object class as the sum of the frequencies of the indices, and for any number of intervals, the maximum of this frequency indicates the object's class. If the features do not contain repeated values, the training error rate tends to zero as the number of intervals tends to infinity. If this condition is not fulfilled, a preliminary randomization of the features should be carried out.


Introduction
The classification problem addresses numerous questions in science and technology that involve the use of massive amounts of accumulated information in analyses of new data. These data play the role of incoming information in various models of formalized complex systems. The accumulated information is represented by objects, whose features form vectors, usually in Euclidean space.

Assumptions and Preliminary Observations
The solution is sought in accordance with the following algorithm. The range of values of each feature is divided into an equal number of intervals, and the number of the interval into which the object's feature value falls, called the index, is determined. The objects are then described by a set of indices, and the data matrix is transformed into an index matrix. For each feature, it is possible to find a subset of objects with the same index; these subsets are referred to as granules. The class frequencies associated with a given object can then be calculated from the composition of the granules. The sum of the frequencies of the granules that correspond to individual features is approximately equal to the frequency of occurrence of an object of a certain class. The maximum of this frequency gives an estimate of the object's class for any number of intervals.
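As a minimal sketch of this algorithm, assuming equal-width intervals and a plain frequency count (all names here, such as build_index_matrix, granule_frequencies and classify, are hypothetical and introduced only for illustration):

import numpy as np

def build_index_matrix(X, n_intervals):
    # Split the range of each feature (column of X) into n_intervals
    # equal parts and replace every value by the number of the interval
    # it falls into, turning the data matrix into an index matrix.
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    width = (hi - lo) / n_intervals
    width[width == 0] = 1.0                 # guard constant features
    idx = np.floor((X - lo) / width).astype(int)
    return np.clip(idx, 0, n_intervals - 1)

def granule_frequencies(idx, y, n_intervals, n_classes):
    # For each feature k and interval m (a "granule"), count how often
    # objects of each class fall into that granule, normalized by class size.
    n, K = idx.shape
    freq = np.zeros((K, n_intervals, n_classes))
    for s in range(n):
        for k in range(K):
            freq[k, idx[s, k], y[s]] += 1
    sizes = np.bincount(y, minlength=n_classes).astype(float)
    sizes[sizes == 0] = 1.0
    return freq / sizes

def classify(x_idx, freq):
    # Sum, over all features, the class frequencies of the object's
    # indices and return the class with the maximum summed frequency.
    scores = sum(freq[k, x_idx[k]] for k in range(len(x_idx)))
    return int(np.argmax(scores))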
The proposed algorithm is conceptually based on the observation that uncertain quantities are those that yield different quantitative information about a single entity. On this basis, the assumption is made here that the given feature values are random and that values within a certain neighborhood are equally probable. Under this assumption, the probability distribution of each feature is piecewise constant. Therefore, each feature value of an object can be assigned an index, and for each class, the frequencies of the indices over the training sample are calculated.
The probability of a particular object class is composed of a complete group of independent events, namely the appearance of the indices of all features within that class. The process of classification is then reduced to a comparison, across classes, of the frequencies associated with the object's feature indices. The law of total probability allows us to estimate the probability of an object belonging to each class, and its maximum defines the object's class.
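In notation introduced here only for illustration (not taken from the paper), let m_k(s) denote the interval index of feature k for object s, and let ν_c(k, m) denote the training-sample frequency with which objects of class c receive index m on feature k. The decision rule just described can then be written as

\hat{c}(s) = \arg\max_{c} \sum_{k=1}^{M_k} \nu_c\bigl(k,\, m_k(s)\bigr),

i.e., the estimated class is the one whose index frequencies, summed over all M_k features, are largest.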
In general, the compositions of granules (in terms of their feature values and object classes) are highly uncertain, and it is advisable to reduce this uncertainty to eliminate possible classification errors. This objective can be partly achieved by increasing the number of intervals to infinity, in which case each granule is composed only of objects with the same feature value.
Since this uncertainty is measured by the information entropy, the entropy of the information contained in the data matrix must be increased. From the properties of entropy, it follows that such a result can be achieved by increasing the number of distinct variants among the values of each feature. Practically, this means that it is advisable to reduce the number of repeated values of each feature.
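To make the entropy argument concrete: if a feature takes the value v with relative frequency p_v over the n sample objects, its entropy is

H = -\sum_{v} p_v \log p_v \;\le\; \log n,

with equality exactly when all n values are distinct (p_v = 1/n for every value). Repeated values therefore lower the entropy of the feature, which is why reducing them increases the diversity available to the method.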
This conclusion is consistent with the law of necessary diversity [8], which states that an organism's assessment of its environmental impact and adaptation requires the perception of the maximum number of variants of information values. Therefore, this study proposes randomizing the features by increasing their values by a uniformly distributed random variable that is the same for all features of the object. All of the feature values then become distinct, and the training will be asymptotically error-free. An analysis of the solutions for a variety of training sample models shows that this conclusion is valid for any real problem.
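A minimal sketch of this randomization, assuming a small uniform offset drawn once per object and added to all of its features (the function name and the choice of eps are ours, not the paper's):

import numpy as np

def randomize_features(X, eps, rng=None):
    # Add one uniformly distributed random offset per object, the same
    # offset for every feature of that object, so that repeated feature
    # values across objects become distinct almost surely. eps should be
    # small relative to the interval width so that the piecewise-constant
    # distribution assumption is preserved.
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    offsets = rng.uniform(0.0, eps, size=(X.shape[0], 1))
    return X + offsets                      # broadcast across features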
The invariant method represents a model of processes in the sensory systems of animals and is intended to carry out class recognition by searching for prototype objects in the information stored in an animal's brain [9]. The fact that a single algorithm implements both the method and the model points to the biological roots of the proposed approach and confirms its applicability.

Statement of the Problem
The basis of this work is the data matrix invariant method. This method allows the establishment of a function that divides the objects in a training sample into classes. The effectiveness of the method has been demonstrated using 9 databases from a repository [10]. Table 1, where M_i = 3 and M_k = 4, corresponds to the Iris database [10]. The distribution of objects by class (5%, 35% and 60%) indicates that the data are "unbalanced", which complicates the classification problem [11].

Within the framework of the method, we consider the vector of object features to be an ordered set of feature values to which segments of the coordinate axes correspond. The object itself is considered to represent a certain point in the multidimensional coordinate system. The totality of such points for all sampled objects determines the feature space. This space is not Euclidean, as is customary in machine learning, because the additional assumption of the existence of a scalar product is not made. Here, we use only the structure of the space, whose points differ in terms of their numbers and related information.

Multidimensional Intervals
Let us divide the objects into groups in which the values of each feature lie close to one another. For this purpose, we use multidimensional intervals that break the range of values of each feature into an equal number of intervals. For any k, we then obtain the intervals of values of the feature q_k. The second indexing option assumes that the values q_k^s are renumbered: each object s receives a number t(s) such that the values q_k^t are arranged in nondecreasing order, q_k^1 ≤ q_k^2 ≤ … ≤ q_k^n. The index m = 1 is assigned to the one or several objects (with numbers t) whose values fall within the first of these intervals. If, with a further increase in t, this relation is not satisfied for some t, the index is incremented to m = 2, and the procedure is repeated until every object has received an index.
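One plausible reading of this renumbering scheme is sketched below in Python; the function name rank_based_indices and all variable names are hypothetical, and the sketch assumes equal-width intervals traversed in sorted order.

import numpy as np

def rank_based_indices(q, n_intervals):
    # q: one-dimensional array of values q_k^s of a single feature k.
    # Renumber the objects so that the values appear in nondecreasing
    # order, then walk the sorted sequence and start a new index m
    # whenever the next value leaves the current interval.
    order = np.argsort(q, kind="stable")          # the renumbering t(s)
    span = float(q.max() - q.min())
    width = span / n_intervals if span > 0 else 1.0
    idx = np.empty(len(q), dtype=int)
    m, start = 1, q[order[0]]                     # indices begin at m = 1
    for t in order:
        if q[t] - start > width:                  # relation fails: open next interval
            m, start = m + 1, q[t]
        idx[t] = m
    return idx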

Granulation of the Given Data
Given the uncertainty of the data, we introduce the assumption that all of the values of the features of the objects fall within certain intervals and include random errors of equal magnitude. For each indexing option, we find the data matrix mapping for several samples; the results at a value of n = 3000 are illustrated by the curves shown in Figure 2. For all of the curves, a notation system is adopted in which the symbol γ is followed by a string of characters that identifies the sample, and the indexing option, i1 or i2, is then indicated. The graphs depict the convergence of the sequence {γ_n} and show that the convergence rate for option i1 is higher than that for i2.
The calculations show that the correlation coefficients between the features of individual classes and the corresponding indices i1 and i2 do not differ significantly.

Randomization of the Data
The question naturally arises of how the given data limit the solution of the training problem using the method considered. In this connection, we note that curves analogous to those in Figure 2 can be constructed. However, they will concern an analogous set made up entirely or partially of other objects, because two objects with the same feature values cannot exist in the sample. All of the summands that determine the value of g_{k,m}^i for the object s will be greater than zero. Since M ≪ n, most of the analogous summands for the objects of the indicated sets will differ from zero only for some values of k and i. Therefore, given repeated values of the attributes, training errors occur. A feature whose values coincide for all objects carries no information and is not actually taken into account, although it is "claimed" to be one of the features; it is thus a "degenerate" feature [5].
For real datasets, we can assume that the invariant method leads to practically error-free training.
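As a rough end-to-end illustration of this claim on synthetic data, reusing the hypothetical sketches above (not the authors' code or data):

import numpy as np

rng = np.random.default_rng(0)
n, K, C, M = 300, 4, 3, 200             # objects, features, classes, intervals
y = rng.integers(0, C, size=n)
X = rng.normal(loc=y[:, None], scale=1.0, size=(n, K))   # class-shifted features

Xr = randomize_features(X, eps=1e-9, rng=1)      # break any repeated values
idx = build_index_matrix(Xr, M)
freq = granule_frequencies(idx, y, M, C)
pred = np.array([classify(idx[s], freq) for s in range(n)])
print("training error:", float(np.mean(pred != y)))   # approaches 0 as M grows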

Conclusions
This paper presents a solution to the problem of training on sample models by employing the data matrix invariant method, which was developed to solve the classification problem. The solution is limited to finding a function according to which the objects in a training sample are divided into classes and to studying the influence of various data features on the accuracy of training.
The key idea of the method is structuring the information contained in the problem by introducing multidimensional intervals that allow us to roughly break up the given dataset into its component parts and to compute the information granules for each feature. The granules possess an important property: given an infinite number of intervals, each granule contains only objects with the same feature value.
The principle of the information-structuring scheme is to divide the whole, represented by multilevel data, into parts such that the whole can be reproduced from the lower levels by calculating the frequencies of these parts. This "whole" successively represents the summary information of the sample, the classes, the objects, the features and the indices of the features. Multidimensional intervals therefore allow us to compare objects in a probability space, where they play the role that the concept of distance plays in determining the positions of objects in Euclidean space. The advantage is that evaluating the probability of an object is incomparably closer to solving the problem than evaluating the position of an object in a metric space.
In this paper, it is shown that the data matrix invariant method yields practically error-free training for any data matrix on the basis of a simple and universal algorithm. This method is of independent importance because it offers a new way of analyzing multidimensional data that does not rely on the concept of distance between objects.

Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.