A Study on Computer Consciousness on Intuitive Geometry Based on Mathematics Experiments and Statistical Analysis

In this paper, we present our research on building machine consciousness of intuitive geometry based on mathematical experiments and statistical inference. The investigation consists of the following five steps. First, we select a set of geometric configurations, and for each configuration we construct a large amount of geometric data as observation data using dynamic geometry programs together with a pseudo-random number generator. Second, we refer to the geometric predicates in the algebraic method of machine proof of geometric theorems to construct statistics suitable for measuring the approximate geometric relationships in the observation data. Third, we propose a geometric relationship detection method based on the similarity of data distributions, in which pre-searching reduces the search space to small batches of data for efficiency, and hypothesis tests are performed on the possible geometric relationships in the search results. Fourth, we additionally explore integer relations among the line segment lengths in the geometric configuration. Finally, we carry out numerical experiments on the pre-selected geometric configurations to verify the effectiveness of our method. The results show that a computer equipped with the above procedures can find the hidden geometric relations in randomly generated data of the related geometric configurations, and in this sense, computing machines can attain a certain consciousness of intuitive geometry, as early civilized humans did in ancient Mesopotamia.


Introduction
Intuitive geometric knowledge is one of the origins of human civilization. The Plimpton 322 tablet shows that people in the Old Babylonian period (between −1900 and −1600) already knew the rule of the right triangle, i.e., the Pythagorean theorem, through various instances of right triangles, almost one thousand years before a proof was given in Greek times. By analogy, machine consciousness would best be built by starting from recognizing geometric configurations, forming geometric concepts, and discovering geometric properties by observing sufficiently many examples of geometric configurations without human interference, followed by automated verification (or proof) of the observed geometric theorems. Indeed, machine proof of geometric theorems has been regarded as an essential subject of artificial intelligence research since the inception of the field. In the past few decades, researchers have made significant progress in using computers to prove geometric theorems. Research on computer proof of geometric theorems has developed mainly in three directions: 1) algebraic calculation methods based on coordinates; 2) point elimination methods based on geometric invariants; 3) deductive database search methods that prove theorems by simulating human reasoning.
The machine proof of geometric theorems originated in the 1950s. Tarski [1] showed that most of the decision problems in elementary algebra and elementary geometry can be decided by an algebraic method. Among many implementations, great progress was made by Wu Wen-Tsün in the 1970s. Inspired by ancient Chinese mathematics, Wu proposed an algebraic method for the machine proof of geometric theorems, known as "Wu's method" [2] [3] [4]. Its basic idea is to transform a geometric problem into a system of algebraic equations, and then verify (prove or disprove) the geometric theorem by computing with the resulting system of algebraic equations. Wu's method has been successfully used for the mechanized proof of geometric theorems, along with the rapid development of computer algebra systems like Reduce, Derive, Mathematica, Maple, and so on. Soon after Wu's initial work, the Gröbner basis method, which was developed by Buchberger in the 1960s for processing polynomial systems, was also widely used in the field of geometric theorem proving [5] [6]. Both Wu's method and the Gröbner basis method essentially verify algebraic identities in some constrained variables subject to a set of polynomial constraint equations.
Starting from the fact that the lower and upper bounds of a polynomial equation can be determined by its coefficients, Hong [7] proposed the "one-example illustration method", which can verify the correctness of a geometric theorem via a single instance of the related geometric statement. Furthermore, based on the observation that a multivariate polynomial that vanishes on a sufficiently large grid is identically zero, Zhang et al. proposed the "numerical parallel method" [8], which was run at a certain scale on small computers (such as the Sharp PC1500) at the end of the 1980s. When the above-mentioned algebraic methods are used to prove geometric theorems, they usually involve large-scale, complicated polynomial calculations whose geometric meaning generally cannot be understood by humans, and it is also too difficult for a human to check the correctness of the machine computation by hand. Therefore, such proofs are called "not human-readable". Zhang et al. [9] proposed using the area method to prove geometric theorems and realized readable proofs for the first time. Zhou et al. [10] introduced the Pythagorean difference into the proof of non-Euclidean geometric theorems. Similar to the area method and the Pythagorean difference method, a generalized vector method was suggested in [11]. These methods are collectively referred to as the "geometric invariant method" [12] [13].
Another category, the "deductive reasoning method" based on database searching, simulates the way humans prove geometric theorems, namely, using known hypotheses and standard axioms to perform inference searches on geometric propositions; it can be traced back to 1960. Gelernter et al. [14] proposed a method that combined backward reasoning with depth-first search and implemented a program based on it on the computer. Nevins [15] combined forward and backward reasoning to prove geometric theorems. Zhang et al. [16] gave a more effective method based on a geometric deduction database system. Based on the idea of a structured database, the amount of calculation in the inference process was significantly reduced, and the proofs generated for geometric propositions are generally readable. It is worth noting that, together with dynamic geometry programs (like Geometer's Sketchpad), the deductive reasoning method has been widely used for developing educational software in China. Nevertheless, there has been no report yet on studies that enable computers to acquire intuitive graphical analysis capabilities for elementary geometry.
Intuitive knowledge of geometry has played an essential role in the development of human intelligence, both for humankind and for human individuals. It is therefore natural to expect that computing machines able to see or understand certain geometric meanings, such as three points being collinear, four points lying on the same circle, or the square of one edge equaling the sum of the squares of the other two edges in certain triangles, would eventually lead to a higher stage of machine intelligence, namely ASI (Artificial Super Intelligence). Sun recently studied in his Master's thesis [17] the problem of training intelligent agents such as computers to "observe" a large number of intuitive geometric configurations and to combine this with the powerful algebraic computing and data storage capabilities of machines, so as to understand and master the intuitive geometric analysis capabilities of humans, as a long-term goal of AI. That work implemented a symbolic computation program in Maple that mimics dynamic geometry to randomly generate geometric configurations in batches, and designed several statistical formulas to discover latent geometric relationships from a suitable amount of graphic data, thereby exhibiting a potential for the conscious evolution of the computing machine species.
As an English translation of one part of the thesis, this paper focuses on establishing statistics of geometric relations in graphic data and establishing a quantitative method for comparing the similarities between the distributions of graphic data.
The rest of this paper is organized as follows. In Section 2, we introduce the methods of machine proof of geometric theorems related to the content of this paper. In Section 3, we propose the geometric relationship detection method based on distribution similarity. In Section 4, we conduct numerical experiments and compare the results under different observation error levels. In the final section, we draw a short conclusion.

Related Methods of Mechanical Geometry Theorem-Proving

Wu's Method
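Let $F$ and $G$ be two multivariate polynomials in the variables $x_1, \ldots, x_n$, let the class of $F$ be $k$, and let the highest degrees of $F$ and $G$ in $x_k$ be $d$ and $s$, respectively. Arrange $F$ and $G$ in descending order of the variable $x_k$ and write them in the following form:

$$F = a_d x_k^d + a_{d-1} x_k^{d-1} + \cdots + a_0, \qquad G = b_s x_k^s + b_{s-1} x_k^{s-1} + \cdots + b_0. \tag{1}$$

Then there must be a non-negative integer $t$ and polynomials $T$ and $R$, where the highest degree of $R$ in $x_k$ is less than $d$ or $R = 0$, which satisfy:

$$a_d^t G = T F + R. \tag{2}$$

In Equation (2), $R$ is the pseudo remainder of polynomial $G$ with respect to polynomial $F$, denoted as

$$R = \operatorname{prem}(G, F). \tag{3}$$

Assuming that $TS = \{T_1, T_2, \ldots, T_s\}$ is a triangular polynomial group, the remainder of polynomial $G$ with respect to $TS$ can be obtained by the following successive pseudo-division:

$$\operatorname{prem}(G, TS) = \operatorname{prem}(\cdots \operatorname{prem}(\operatorname{prem}(G, T_s), T_{s-1}), \cdots, T_1). \tag{4}$$

Further, the remainder formula Equation (2) can be extended to the following form:

$$I_1^{t_1} I_2^{t_2} \cdots I_s^{t_s} G = \sum_{i=1}^{s} C_i T_i + R. \tag{5}$$

Here, $I_i$ is the initial of $T_i$ and the $C_i$ are polynomials.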
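The pseudo-division step can be illustrated on a small case. The following is a minimal sketch, assuming univariate polynomials with integer coefficients stored as coefficient lists in ascending order; the function names `degree` and `prem` are illustrative and are not taken from the thesis or from any computer algebra system.

```python
def degree(p):
    """Degree of a polynomial given as [c0, c1, ...]; -1 for the zero polynomial."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return -1


def prem(g, f):
    """Pseudo-divide g by f.

    Returns (R, t) such that lc(f)**t * g = T*f + R for some polynomial T,
    with deg R < deg f (R is [] when the remainder is zero).
    """
    d = degree(f)
    if d < 0:
        raise ValueError("divisor must be a nonzero polynomial")
    lead = f[d]
    r = list(g)
    t = 0
    while degree(r) >= d:
        k = degree(r)
        top = r[k]
        # r <- lead * r - top * x^(k-d) * f, which cancels the leading term of r
        r = [lead * c for c in r]
        for i in range(d + 1):
            r[i + k - d] -= top * f[i]
        t += 1
    return r[: degree(r) + 1], t


# G = x^3 + 1, F = 2x + 1: three steps are needed, so 2^3 * G = T*F + 7
remainder, steps = prem([1, 0, 0, 1], [1, 2])
```

Multiplying by the leading coefficient before each subtraction keeps every intermediate coefficient an integer, which is the reason pseudo-division is used instead of ordinary division.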
The general procedure of Wu's method to prove a geometric theorem is as follows:
1) The geometric theorem is algebraized: the known hypotheses are transformed into a polynomial group H, and the theorem's conclusion is transformed into a polynomial g.
2) The polynomial group H is sorted according to the Wu-Ritt principle [2] [18], and the ascending sequence CS is obtained.
3) Perform successive pseudo-division of the conclusion polynomial g by the ascending sequence to get $R = \operatorname{prem}(g, CS)$, and judge whether the remainder R is 0. If $R = 0$, then according to Equation (5) it is easy to get the equation $I_1^{t_1} \cdots I_s^{t_s} g = \sum_i C_i T_i$, so $g = 0$ holds whenever the hypotheses hold and the initials $I_i$ are nonzero.

Numerical Parallel Method
The single-example illustration method opened up a new idea for the machine proof of geometric theorems, but it has not been put into practice due to its high computational complexity. Zhang et al. [8] proposed the numerical parallel method, inspired by Wu's method.
Suppose $F$ is a polynomial in the variables $x_1, \ldots, x_n$. If Equation (6) holds, that is, if $F$ vanishes on a sufficiently large grid of sample points, then F is identically 0. The conclusion can be drawn from the above: to verify whether an n-ary polynomial is identically zero, it suffices to check it on a finite set of numerical instances.
3) According to Equation (6), construct the set of instances to be tested and substitute these instances into TS one by one, solve for the specific values of the constrained variables, and then substitute them into g. If $g = 0$, the instance is consistent with the theorem; otherwise, the geometric theorem is not generally valid.
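The grid principle behind this method can be sketched in a few lines. The example below assumes the standard form of the result: a polynomial whose degree in $x_i$ is at most $d_i$ and that vanishes on a grid with $d_i + 1$ distinct values per coordinate is identically zero. The exact grid prescribed by Equation (6) is not reproduced here, and the function name `vanishes_on_grid` is hypothetical.

```python
from itertools import product


def vanishes_on_grid(f, degree_bounds):
    """Check f on the integer grid {0..d_1} x ... x {0..d_n}.

    If f is a polynomial of degree at most d_i in its i-th argument,
    a True result implies that f is identically zero.
    """
    grids = [range(d + 1) for d in degree_bounds]
    return all(f(*point) == 0 for point in product(*grids))


# (x + y)^2 - x^2 - 2xy - y^2 has degree at most 2 in each variable
identity = lambda x, y: (x + y) ** 2 - x * x - 2 * x * y - y * y
# x*y - x has degree at most 1 in each variable and is not identically zero
not_identity = lambda x, y: x * y - x

is_zero = vanishes_on_grid(identity, (2, 2))
is_not_zero = vanishes_on_grid(not_identity, (1, 1))
```

Integer grid points keep the arithmetic exact, so no numerical tolerance is needed in this toy setting.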

Intuitive Geometry Based on Experimental Mathematics and Statistical Analysis

Data and Statistics
When algebraic methods such as Wu's method, the single-example illustration method, and the numerical parallel method are used to prove geometric theorems, the theorems must first be algebraized. We propose instead to work with the numerical values of instances of the geometric configuration without algebraic processing, which requires generating a large number of instances of the configuration. Data can be generated by varying the free points in the geometric configuration. We use Maple to write a dynamic geometry subroutine module, similar to Geometer's Sketchpad and Super Sketchpad, to generate the data of the geometric configuration.
When the algebraic method proves a geometric theorem, a polynomial equation $f(x_1, x_2, \ldots, x_n) = 0$ expresses the geometric relationship after appropriate coordinates have been selected. Our Maple program simulates an intelligent subject observing the geometric configuration intuitively by adding slight disturbances to the data and rounding the coordinates of the points. As a result, the polynomial can no longer be used directly to express the geometric relationship. In analogy to the geometric predicates in the algebraic method, we construct statistics to express the geometric relationship.
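The observation model described above (free points, a construction, a small disturbance, then rounding) can be sketched as follows. This is a minimal Python illustration, not the Maple module of the thesis; the midpoint configuration, the uniform noise model, and all names and default values (`observe`, `midpoint_configuration`, `generate`, `delta`, `digits`, `size`) are assumptions chosen for the example.

```python
import random


def observe(point, delta, digits, rng):
    """Perturb each coordinate by uniform noise in [-delta, delta], then round."""
    return tuple(round(c + rng.uniform(-delta, delta), digits) for c in point)


def midpoint_configuration(rng, delta=0.01, digits=2, size=10.0):
    """One noisy instance: free points A and B, and the constructed midpoint M."""
    a = (rng.uniform(-size, size), rng.uniform(-size, size))
    b = (rng.uniform(-size, size), rng.uniform(-size, size))
    m = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    points = (("A", a), ("B", b), ("M", m))
    return {name: observe(p, delta, digits, rng) for name, p in points}


def generate(n, seed=0, **kwargs):
    """Generate n observed instances with a reproducible pseudo-random stream."""
    rng = random.Random(seed)
    return [midpoint_configuration(rng, **kwargs) for _ in range(n)]


sample = generate(100, seed=1)
```

Because every observed coordinate is perturbed and rounded, the exact relation "M is the midpoint of AB" holds only approximately in `sample`, which is the situation the statistics are designed for.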
The construction of statistics satisfies the following three principles:
1) $f = 0$ if and only if a particular geometric relationship holds exactly in the numerical sense, and the degree to which the relationship holds approximately is measured by the deviation of $f$ from 0.
2) Statistics should eliminate the influence of dimensions (units and scale).
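To make these principles concrete, the sketch below gives two candidate statistics that are zero exactly when the relationship holds and that do not change when the whole figure is rescaled. They are illustrative assumptions (a normalized cross product and a normalized dot product), not necessarily the statistics defined in the thesis.

```python
import math


def _vec(p, q):
    return (q[0] - p[0], q[1] - p[1])


def collinearity_stat(a, b, c):
    """|AB x AC| / (|AB| |AC|): 0 iff A, B, C are collinear (|sin| of the angle at A)."""
    u, v = _vec(a, b), _vec(a, c)
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        raise ValueError("points must be distinct from A")
    return abs(u[0] * v[1] - u[1] * v[0]) / norm


def perpendicularity_stat(a, b, c, d):
    """|AB . CD| / (|AB| |CD|): 0 iff AB is perpendicular to CD (|cos| of the angle)."""
    u, v = _vec(a, b), _vec(c, d)
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        raise ValueError("segments must have nonzero length")
    return abs(u[0] * v[0] + u[1] * v[1]) / norm


exact = collinearity_stat((0, 0), (1, 1), (2, 2))
noisy = collinearity_stat((0, 0), (1.01, 0.99), (2, 2))
```

Dividing by the segment lengths is what removes the dependence on units: both statistics take values in [0, 1] regardless of the size of the figure.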

Distribution Similarity Geometric Relationship Detection
In this section, we propose a geometric relationship detection method based on distribution similarity. Before that, we introduce the methods of measuring the similarity of distributions and the nonparametric test methods used in this paper.
To measure the similarity of two probability distributions P and Q, the Kullback-Leibler divergence (KL divergence) in Equation (11) and the Jensen-Shannon divergence (JS divergence) in Equation (12) can be used.
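For reference, the standard definitions of these two divergences for discrete distributions are given below; the notation $M$ for the mixture distribution is introduced here for convenience.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)},
\qquad
D_{\mathrm{JS}}(P \,\|\, Q) = \frac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \frac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M),
\quad M = \frac{1}{2}(P + Q).
```

If the supports of $P$ and $Q$ are disjoint, $D_{\mathrm{KL}}$ is infinite and $D_{\mathrm{JS}}$ equals the constant $\log 2$ no matter how far apart the two distributions are, so neither value reflects the actual distance between them.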
When the supports of the two distributions P and Q do not overlap, or overlap only slightly, it is difficult for the KL divergence and the JS divergence to quantify the similarity between the distributions. In recent years, the Wasserstein distance in Equation (13) has been used to measure the similarity of distributions. In the one-dimensional case it takes the form of Equation (14). In this way, the calculation of the p-Wasserstein distance is simplified. In the previous section, we constructed the statistics of geometric relations, which map the observation data to one dimension, so the 1-Wasserstein distance in Equation (15) can be used to measure similarity.
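For one-dimensional samples the 1-Wasserstein distance is simple to compute. The sketch below assumes two empirical samples of equal size, in which case the distance is the mean absolute difference between the sorted values; the function name `wasserstein_1` is illustrative.

```python
def wasserstein_1(xs, ys):
    """1-Wasserstein distance between two equal-size one-dimensional samples.

    For empirical distributions with the same number of points, the optimal
    coupling matches the i-th smallest value of xs to the i-th smallest of ys.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)


shifted = wasserstein_1([0, 1, 2], [1, 2, 3])   # every point moves by 1
disjoint = wasserstein_1([0, 0], [5, 5])        # supports do not overlap
```

Unlike the KL and JS divergences, the value stays informative for samples with disjoint supports: it grows with the distance between them.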
In statistics, hypothesis testing is often used for statistical inference, that is, for inferring hypotheses about a population from empirical data. It can also be used to test whether two samples come from the same distribution. In this paper, we use two non-parametric tests. One is the Kolmogorov-Smirnov (K-S) test, which uses the K-S statistic in Equation (16) or Equation (17) to accept or reject the null hypothesis.
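The two-sample K-S statistic is the largest vertical gap between the two empirical distribution functions. A minimal sketch, assuming plain Python lists as input and using the illustrative name `ks_statistic`, is:

```python
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Two-sample K-S statistic: sup over v of |F_n(v) - G_m(v)|.

    Both empirical distribution functions are step functions that change only
    at sample points, so it suffices to evaluate the gap at those points.
    """
    if not xs or not ys:
        raise ValueError("samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    return max(
        abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        for v in xs + ys
    )


overlap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

This sketch computes only the statistic. Turning it into an accept or reject decision requires the null distribution of the statistic; library routines such as `scipy.stats.ks_2samp` return a p-value directly.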
The other is the permutation test based on the 2-Wasserstein distance used in Matsui et al. [19] and Schefzik et al. [20]. When $d = 1$ and $p = 2$, the Wasserstein distance can be decomposed into three parts [21], as in Equation (18).
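A commonly quoted form of this decomposition, given here as an assumption about what Equation (18) expresses, follows from writing the squared distance in terms of the quantile functions $F_P^{-1}$ and $F_Q^{-1}$. The symbols $\mu$, $\sigma$ and $\rho^{P,Q}$ (the correlation between the two quantile functions) are notation introduced for this sketch.

```latex
W_2^2(P, Q) = \int_0^1 \left( F_P^{-1}(u) - F_Q^{-1}(u) \right)^2 \, du
            = (\mu_P - \mu_Q)^2 + (\sigma_P - \sigma_Q)^2 + 2 \sigma_P \sigma_Q \left( 1 - \rho^{P,Q} \right).
```

The identity is the expansion of $\mathbb{E}[(X - Y)^2]$ for $X = F_P^{-1}(U)$ and $Y = F_Q^{-1}(U)$ with $U$ uniform on $[0, 1]$, using $\sigma_P^2 + \sigma_Q^2 - 2\sigma_P\sigma_Q\rho^{P,Q} = (\sigma_P - \sigma_Q)^2 + 2\sigma_P\sigma_Q(1 - \rho^{P,Q})$.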
Here, the three terms on the right-hand side correspond to the differences in mean, variance, and shape of the two distributions, respectively.
The geometric relationship detection method based on distribution similarity mainly includes the following steps:
Step 1: Call the Maple subroutine to generate instances of the corresponding geometric configuration, and generate a large sample of data by dynamic geometry.
Step 2: Randomly select a batch of samples, and measure the similarity according to Equation (15). Under a fixed disturbance $\delta$, construct the standard [...]
The complete process can be found in Algorithm 1.
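The permutation test mentioned above can be sketched as follows. This is a generic illustration under stated assumptions, not a transcription of Algorithm 1: both samples have equal size, the test statistic is the squared 2-Wasserstein distance between the empirical samples, and the names `w2_squared`, `permutation_test`, `n_perm` and `seed` are hypothetical.

```python
import random


def w2_squared(xs, ys):
    """Squared 2-Wasserstein distance between equal-size 1-D samples."""
    pairs = zip(sorted(xs), sorted(ys))
    return sum((x - y) ** 2 for x, y in pairs) / len(xs)


def permutation_test(xs, ys, n_perm=999, seed=0):
    """Return (observed statistic, permutation p-value).

    Under the null hypothesis both samples come from one distribution, so
    relabelling the pooled data should give statistics as large as the
    observed one reasonably often.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    rng = random.Random(seed)
    observed = w2_squared(xs, ys)
    pooled = list(xs) + list(ys)
    n = len(xs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if w2_squared(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


base = [i / 50 for i in range(50)]
moved = [x + 1.0 for x in base]
stat_shift, p_shift = permutation_test(base, moved, n_perm=199)
stat_same, p_same = permutation_test(base, base, n_perm=199)
```

A small p-value, as for `p_shift`, indicates that the two batches are unlikely to share a distribution; a large one, as for `p_same`, gives no evidence against it.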

Integral Coefficient Invariant Discovery
In Section 3.2, we proposed a geometric relationship detection method based on distribution similarity to explore the deterministic perpendicular and collinear relationships in geometric configurations. In this section, we explore integer coefficient relationships between the lengths of geometric quantities. This is an integer relation detection problem; a PSLQ algorithm with empirical data as input was studied in [22]. Our data is observational, and its accuracy does not meet the requirements of such algorithms, so we approach the problem in the following way. First, we convert the sample data so that the generated data is uniformly expressed as an array of points P, as in Equation (23). Usually, in plane geometry, the integer coefficients between geometric quantities are relatively small. Given this prior knowledge, we add a regularization term to Equation (27) to obtain Equation (28).
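The idea of preferring small integer coefficients can be illustrated with a brute-force search. The sketch below is a stand-in technique, not Equation (28): it enumerates small integer vectors, scores each by a scale-free mean squared residual plus an L1 penalty, and returns the best one. The data model (noisy right triangles, squared side lengths) and all names and parameter values are assumptions made for the example.

```python
import math
import random
from functools import reduce
from itertools import product


def noisy_right_triangles(n, delta=0.01, digits=2, seed=0):
    """Rows (a^2, b^2, c^2) from perturbed, rounded sides of right triangles."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a, b = rng.uniform(1, 10), rng.uniform(1, 10)
        sides = (a, b, math.hypot(a, b))
        observed = [round(s + rng.uniform(-delta, delta), digits) for s in sides]
        rows.append(tuple(s * s for s in observed))
    return rows


def find_integer_relation(rows, max_coeff=3, lam=1e-3):
    """Search primitive integer vectors c minimising residual + lam * |c|_1."""
    dim = len(rows[0])
    best, best_score = None, math.inf
    for c in product(range(-max_coeff, max_coeff + 1), repeat=dim):
        first = next((v for v in c if v != 0), 0)
        # skip the zero vector, sign duplicates, and multiples of smaller vectors
        if first <= 0 or reduce(math.gcd, c) != 1:
            continue
        residual = 0.0
        for q in rows:
            scale = max(abs(v) for v in q)  # removes the dependence on units
            residual += (sum(ci * qi for ci, qi in zip(c, q)) / scale) ** 2
        score = residual / (len(rows) * sum(v * v for v in c))
        score += lam * sum(abs(v) for v in c)
        if score < best_score:
            best, best_score = c, score
    return best, best_score


rows = noisy_right_triangles(200, seed=1)
coeffs, score = find_integer_relation(rows)
```

On this data the search is expected to recover the coefficients of $a^2 + b^2 - c^2 = 0$. Exhaustive enumeration only scales to a few quantities and small coefficient bounds, which is one reason a regularized formulation is attractive.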

Numerical Experiment
In this section, we construct observation data for some geometric theorems and [...] Table 1.
Since there are many collinear and perpendicular relationships, listing them all would be too verbose; we give examples of each type of geometric relationship instead.
The results are shown in Table 2. Finally, in the numerical experiment on the Candy theorem, the theorem's conclusion [...] of the solution vector x increases greatly, and the error also accumulates due to the multiplication of the terms.
The comparative experimental results of the other two groups of different disturbances can be seen in Table 3 and Table 4. The error of the integral coefficient invariant relationship in Table 4 is taken from the average of the results of 10 experiments, and "−" means that the result of the theorem is not obtained.

Conclusion
In this paper, we construct statistics that measure approximate geometric relationships in inaccurate observation data and use these statistics to map the observation data to one dimension. Using distribution similarity measured by the Wasserstein distance, we propose a method for detecting geometric relationships. The method has been successfully applied for checking the following [...] We will try to overcome this difficulty by controlling error accumulation in the numerical analysis. An interesting problem is to train computers to find latent inequalities from configuration data. A very simple and famous example of this kind is Euler's inequality: the distance $d$ between the incenter and the circumcenter of a triangle with inradius $r$ and circumradius $R$ satisfies $d^2 = R(R - 2r)$, and therefore $R \geq 2r$. Since almost all interesting theorems involving equalities in Euclidean geometry have been well established over the past three thousand years, a prospective application of machine intelligence would be the automated discovery of geometric inequalities through the analysis of big data of geometric configurations.
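Euler's relation is easy to check numerically on random triangles, which gives a flavor of the kind of configuration data such a discovery procedure would consume. The sketch below is only a verification of the known formula, not a discovery method; the function names, the coordinate range, and the area threshold used to skip nearly degenerate triangles are assumptions.

```python
import math
import random


def euler_quantities(a, b, c):
    """Return (R, r, d): circumradius, inradius, incenter-circumcenter distance."""
    la, lb, lc = math.dist(b, c), math.dist(c, a), math.dist(a, b)
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    area = abs(cross) / 2
    if area == 0:
        raise ValueError("degenerate triangle")
    big_r = la * lb * lc / (4 * area)
    small_r = 2 * area / (la + lb + lc)
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    dd = 2 * cross
    ox = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / dd
    oy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / dd
    per = la + lb + lc
    ix = (la * a[0] + lb * b[0] + lc * c[0]) / per
    iy = (la * a[1] + lb * b[1] + lc * c[1]) / per
    return big_r, small_r, math.hypot(ox - ix, oy - iy)


def max_relative_gap(n, seed=0, min_area=0.1):
    """Largest |d^2 - R(R - 2r)| / R^2 over n random non-degenerate triangles."""
    rng = random.Random(seed)
    worst, done = 0.0, 0
    while done < n:
        a, b, c = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(3)]
        cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        if abs(cross) / 2 < min_area:
            continue  # skip nearly degenerate triangles
        big_r, small_r, d = euler_quantities(a, b, c)
        worst = max(worst, abs(d * d - big_r * (big_r - 2 * small_r)) / big_r ** 2)
        done += 1
    return worst


R, r, d = euler_quantities((0, 0), (4, 0), (0, 3))
gap = max_relative_gap(200, seed=2)
```

For the 3-4-5 triangle the quantities are $R = 2.5$, $r = 1$ and $d^2 = 1.25$, in agreement with $d^2 = R(R - 2r)$.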