
This paper models a biological brain—excluding motivation (e.g., emotions)—as a Finite Automaton in Developmental Network (FA-in-DN), but such an FA emerges incrementally in DN. In artificial intelligence (AI), there are two major schools: symbolic and connectionist. Weng 2011 [1] proposed three major properties of the Developmental Network (DN) which bridged the two schools: 1) From any complex FA that demonstrates human knowledge through its sequence of the symbolic inputs-outputs, a Developmental Program (DP) incrementally develops an emergent FA itself inside through naturally emerging image patterns of the symbolic inputs-outputs of the FA. The DN learning from the FA is incremental, immediate and error-free; 2) After learning the FA, if the DN freezes its learning but runs, it generalizes optimally for infinitely many inputs and actions based on the neuron’s inner-product distance, state equivalence, and the principle of maximum likelihood; 3) After learning the FA, if the DN continues to learn and run, it “thinks” optimally in the sense of maximum likelihood conditioned on its limited computational resource and its limited past experience. This paper gives an overview of the FA-in-DN brain theory and presents the three major theorems and their proofs.

Our computational theory [

The computation in the former (A) is carried out by target-precise neuron-to-neuron signal transmissions. Weng & Luciw 2012 [

The computation in the latter (B) is based on target-imprecise diffusion of neural transmitters that diffuse across brain tissue. Weng et al. 2013 [

This paper will not further discuss the motivation part of a biological brain and will instead concentrate on the former―(A) the basic brain circuits and functions. In other words, the theory below models any emotion-free brain. DNs with emotion such as pain avoidance and pleasure seeking will be only briefly discussed in Section 9.

This theory does not claim that the FA-based brain model is complete for an emotion-free brain, because there is no widely accepted and rigorous definition of a natural phenomenon such as a brain; therefore, any theory has some limitation in explaining a natural phenomenon. As such, any theory can only approximate a natural phenomenon and can never exhaust such an approximation. Newtonian physics is a good example, because it was refined by relativity theory.

All computational networks fall into two categories: Symbolic Networks (SNs) and Emergent Networks. The former category uses symbolic representations and the latter uses emergent representations. See the review for symbolic models and emergent models in Weng 2012 [

The class of SN [

The class of Emergent Network includes all neural networks that use exclusively emergent representations, such as Feed-forward Networks, Hopfield Networks, Boltzmann Machines, Restricted Boltzmann Machines, Liquid State Machines, and Reservoir Computing, and the newer Developmental Networks (DNs) [

The major differences between a Symbolic Network (SN) and a Developmental Network (DN) are illustrated in

Marvin Minsky 1991 [

Computationally, feed-forward connections serve to feed sensory features [

The work of Finite Automata (FA) played a major role in our theory about the brain. The work of Weng 2011 [

The above studies aimed to compute, and successfully computed, the state transition function using a programmed network, but they do not generate emergent representations, do not learn from observations of FA operations, do not deal with natural input images (or patterns), do not deal with natural motor images (or patterns), and do not learn incrementally.

As far as we know, the DN in Weng 2011 [

1) uses fully emergent representations,

2) allows natural sensory firing patterns,

3) allows the motor area to have subareas where each subarea represents either an abstract concept (location, type, scale, etc.) or natural muscle actions (e.g., driving a car or riding a bicycle),

4) uses a general-purpose and unified area function that does not need iterative approximation and does not have local minima in its high-dimensional, nonlinear but non-iterative approximation,

5) learns incrementally, taking one pair of sensory pattern and motor pattern at a time to update the network and discarding the pair immediately after, and

6) uses an optimization scheme in which every update of the network realizes the maximum likelihood estimate of the network, conditioned on the limited computational resources in the network and the limited learning experience in the network’s “life time”.

Explained in Weng 2012 [

In the following, we analyze how the DN theory bridges the symbolic school and the connectionist school in artificial intelligence (AI). First, Section 2 presents the algorithm for the Developmental Program (DP) of the DN. Section 3 gives a temporal formulation of FA to facilitate understanding the brain theory. Then, Section 4 proposes that the framework for FA is complete. All FAs in this paper are Deterministic FA. So, we call them simply FAs. How a DN learns incrementally from an FA is discussed in Section 5. The three theorems are presented and proved in Section 6 through Section 8.

Theorem 1 states that for any FA that operates in real time, there is an emergent DN that learns the FA incrementally. It observes one state-and-input pair from the FA at a time, learns immediately, and becomes error-free for all the FA transitions that it has learned, regardless of how many times a transition has been observed (one observation is sufficient, but more observations result in better optimality in the real world). The DN is equivalent to the part of the FA that corresponds to all transitions that have been demonstrated so far.

Theorem 2 establishes that if the FA-learned DN is frozen―computing responses only but not updating its adaptive parts, the frozen DN is optimal in the sense of maximum likelihood when it takes inputs from infinitely many possible cases in the world.

Theorem 3 asserts that the FA-learned DN, if it is allowed to continue to learn from infinitely many possible cases in the world, is optimal in the sense of maximum likelihood.

Section 9 briefly discusses experiments of DN. Section 10 provides concluding remarks and discussion.

The small DP algorithm self-programs logic of the world into a huge DN based on experiences in its physical activities. A DN has its area Y as a “bridge” for its two banks, X and Z, as illustrated in

Biologically, a DP algorithm models the collective effects of some genome properties of the cells of the nervous system―neurons and other types of cells in the nervous system [

In artificial intelligence, a DP algorithm is the result of human understanding of the development of natural intelligence, followed by a human DP design based on such understanding. This approach, known as the developmental approach [

Some parameters of DP (e.g., the number of cells in Y) could be experimentally selected by a genetic algorithm, but the DP as a whole seems to be extremely expensive for any artificial genetic algorithm to reach without handcrafting (e.g., see the handcrafted area function below).

Human design of DP algorithm [

The quality of a human-designed DP, when the DP is widely used in the future, will greatly affect all the capabilities of the developmental robots and computers that use the DP.

In the DN, if Y is meant for modeling the entire brain, X consists of all receptors and Z consists of all effectors (muscle neurons and glands). Additionally, the Y area of the DN can also model any Brodmann area in the brain; if so, X and Z correspond, respectively, to the bottom-up areas and top-down areas of the Brodmann area. From the analysis below, we can also see that the Y area of the DN can model any closely related set of neurons: a Brodmann area, a subset, or a superset.

The most basic function of an area Y seems to be prediction―predict the signals in its two vast banks X and Z through space and time.

Algorithm 1 (DP) Input areas: X and Z. Output areas: X and Z. The dimension and representation of the X and Z areas are hand designed based on the sensors and effectors of the species (or result from evolution in biology). Y is skull-closed (inside the brain) and not directly accessible from the outside.

1) At time

2) At time

a) Every area A performs mitosis-equivalent if it is needed, using its bottom-up and top-down inputs

b) Every area A computes its area function

where

c) For every area A in

The DN must update at least twice for the effects of each new signal pattern in X and Z, respectively, to go through one update in Y and then one update in Z to appear in X and Z.

In the remaining discussion, we assume that Y models the entire brain. If X is a sensory area,

The area function

which measures the degree of match between the directions of

To simulate lateral inhibitions (winner-take-all) within each area A, only top-

The area dynamically scales the top-k winners so that the top-k respond with values in

All the connections in a DN are learned incrementally based on Hebbian learning, that is, cofiring of the presynaptic activity

where

The simplest version of

where

The initial condition is as follows. The smallest

In other words, any initialization of weight vectors will only determine which neurons win (i.e., which newly born neurons take the current role), but the initialization will not affect the distribution of weights at all. In this sense, all random initializations of synaptic weights will work equally well, all resulting in weight distributions that are computationally equivalent. Biologically, we do not care which neurons (in a small 3-D neighborhood) take the specific roles, as long as the distribution of the synaptic weights of these neurons leads to the same computational effect. This neuronal learning model leads to the following conjecture.

Conjecture 1 In a small 3-D neighborhood (e.g., of a hundred nearby neurons), neural circuits are so different across different biological brains that mapping the detailed neuron wiring of a brain is not informative at the level of individual neurons.

The NIH Connectome program aims to “map the neural pathways ... about the structural and functional connectivity of the human brain. ... resulting in improved sensitivity, resolution, and utility, thereby accelerating progress in the emerging field of human connectomics”. The DN theory and the above conjecture predict that such an NIH program is not as scientifically useful as the NIH program hoped in terms of understanding how the brain works and future studies of abnormal brain circuits. For the brain, “more detailed connectomics data” seems to be not as productive as more complete and clear theories.

In this section, we present an FA as a temporal machine, although traditionally an FA is a logic machine driven by discrete input events.

As we need a slight deviation from the standard definition of FA, let us look at the standard definition first.

Definition 1 (Language acceptor FA) A finite automaton (FA) M is a 5-tuple

This classical definition is for a language acceptor, which accepts all strings x from the alphabet Σ that belongs to a language L. It has been proved [

We need to extend the definition of FA for agents that run at discrete times as follows.

Definition 2 (Agent FA) A finite automaton (FA) M for a finite symbolic world is a 4-tuple

The inputs to an FA are symbolic. The input space is denoted as

The outputs (actions) from a language acceptor FA are also symbolic,

An agent FA is an extension of the corresponding language FA, in the sense that it outputs the state, not only the acceptance property of the state. The meanings of each state, which are handcrafted by the human programmer but are not part of the formal FA definition, are only in the mind of the human programmer. Such meanings can indicate whether a state is an accepting state or not, along with many other meanings associated with each state, as our later example will show. However, such concepts are only in the mind of the human system designer, not something that the FA is “aware” of. This is a fundamental limitation of all symbolic models. The Developmental Network (DN) described below does not use any symbols, but instead uses (image) vectors from the real-world sensors and real-world effectors. As illustrated in

Without loss of generality, we can consider that an agent FA simply outputs its current state at any time, since the state is uniquely linked to a pair of the cognition set and the action set, at least in the mind of the human designer.
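The agent FA described above can be illustrated with a minimal Python sketch. The class name `AgentFA`, the two-state parity example, and the state labels are illustrative assumptions, not part of the formal definition; the sketch only shows that the machine reads one symbol per discrete time step and outputs its current state, while the meaning of each state label exists only for the human designer.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

State = Hashable
Symbol = Hashable


class AgentFA:
    """Deterministic agent FA: reads one symbol per time step, outputs its state."""

    def __init__(self, transitions: Dict[Tuple[State, Symbol], State],
                 initial: State) -> None:
        self.transitions = transitions
        self.state = initial

    def step(self, symbol: Symbol) -> State:
        # Deterministic transition: (current state, input symbol) -> next state.
        self.state = self.transitions[(self.state, symbol)]
        return self.state

    def run(self, symbols: Iterable[Symbol]) -> List[State]:
        # The output at each time is simply the current state.
        return [self.step(s) for s in symbols]


# Hypothetical example: the state records the parity of the number of 'a' inputs.
# The labels "even" and "odd" are meaningful only to the human designer.
transitions = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}
fa = AgentFA(transitions, initial="even")
outputs = fa.run("abaa")
print(outputs)
```

Running the example on the input string `abaa` yields the state sequence odd, odd, even, odd.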

The FA-in-DN framework is useful for understanding how a DN works. However, the FA itself is handcrafted by a human teacher; in other words, it represents the behaviors of an autonomously developed human teacher.

It has been proved [

From the above discussion, we can see that the key power of an FA is to lump very complex equivalent

A Turing Machine (TM) [_{0} is the initial state, and

where

1) Define

Each state in

2) Let

The above transition function

Therefore, the controller of any TM is an FA. A grounded DN can learn the FA perfectly. It takes input

The completeness of agent FA-in-DN can be described as follows. Given a vocabulary S' representing the elements of a symbolic world, a natural language L is defined in terms of S', where the meanings of all sentences (or events) in L are defined by the set of equivalence classes determined by Q' of FA-in-DN. When the number of states is sufficiently large, a properly learned FA-in-DN can sufficiently characterize the cognition and behaviors of an agent living in the real physical world with vocabulary S'.

This argument is based on the following observation: as long as the context state

generate the next state:

As a simple example, an FA-in-DN can accept the context-free language

strings that consist of n a’s followed by the same number of b’s, by simulating how a TM works on a tape to accept the language L.
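The idea that a finite controller plus a tape suffices for this language can be sketched in Python. The sketch below is one standard textbook construction, not necessarily the one the author has in mind: the table `DELTA` is a finite transition function (the FA controller), and the tape plays the role of the external world it reads and writes. The state names, the marker symbols `X` and `Y`, and the assumption n >= 1 are all illustrative choices.

```python
BLANK = "_"

# Finite controller: (state, symbol read) -> (next state, symbol written, head move).
# Any missing entry means the machine halts and rejects.
DELTA = {
    ("q0", "a"): ("q1", "X", +1),   # mark the leftmost unmarked a
    ("q0", "Y"): ("q3", "Y", +1),   # no a left: verify only marked b's remain
    ("q1", "a"): ("q1", "a", +1),   # scan right for the first unmarked b
    ("q1", "Y"): ("q1", "Y", +1),
    ("q1", "b"): ("q2", "Y", -1),   # mark that b
    ("q2", "a"): ("q2", "a", -1),   # scan left back to the last marked a
    ("q2", "Y"): ("q2", "Y", -1),
    ("q2", "X"): ("q0", "X", +1),
    ("q3", "Y"): ("q3", "Y", +1),
    ("q3", BLANK): ("accept", BLANK, +1),
}


def accepts(string: str) -> bool:
    """Return True if string is n a's followed by n b's (n >= 1 assumed)."""
    if set(string) - {"a", "b"}:
        return False  # the input alphabet is assumed to be {a, b}
    tape = list(string) + [BLANK]
    state, head = "q0", 0
    while state != "accept":
        key = (state, tape[head])
        if key not in DELTA:
            return False
        state, tape[head], move = DELTA[key]
        head += move
        if head == len(tape):
            tape.append(BLANK)  # extend the tape on demand
    return True


for s in ["ab", "aaabbb", "aab", "abab", ""]:
    print(repr(s), accepts(s))
```

The controller never needs more than five states, regardless of n; the unbounded counting is carried by the tape, which is the point of treating the FA as the controller of a TM.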

The Chomsky hierarchy [

In particular, it is important to note that a state can remember very early event [

But FA-in-DN goes beyond symbolic AI, because it automatically develops internal representations that are emergent.

Next, let us consider how a DN learns from any FA. First we consider the mapping from symbolic sets S and q to vector spaces X and Z.

Definition 3 (Symbol-to-vector mapping) A symbol-to-vector mapping m is a one-to-one mapping

A binary vector of dimension d is such that all its components are either 0 or 1. It simulates that each neuron, among d neurons, either fires with a spike

rate around

Let

Definition 4 (Binary-p mapping) Let

The larger the p, the more symbols the space of Z can represent. However, through a binary-p mapping, each symbol

Suppose that a DN is taught by supervising binary-p codes at its exposed areas, X and Z. When the motor area Z is free, the DN performs, but the output from Z is not always exact due to (a) the DN outputs in real numbers instead of discrete symbols and (b) there are errors in any computer or biological system. The following binary conditioning can prevent error accumulation by suppressing noise and normalizing the spikes as 1, which the brain seems to use through spikes.

Definition 5 (Binary conditioning) For any vector from

The binary conditioning must be used during autonomous performance as long as the Z representations use spikes instead of firing rates. Machine zeros are noise from the finite precision with which a computer represents a number. The binary conditioning suppresses the accumulation of such computer-generated round-off errors. Because the Z representation is binary by definition, the binary conditioning forces the real numbers to become 0 or 1 only. However, the actual value of machine zero is computer dependent, depending on the word length used to represent a real number. In particular, the case of a constant Z vector of all ones will not appear incorrectly, because all noise components that are meant to be 0 are set back to 0.

The output layer Z that uses binary-p mapping must use the binary conditioning, instead of top-k competition with a fixed k, as the number of firing neurons ranges from 1 to p.
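As a minimal illustration, assume that binary conditioning maps every component above a small machine-zero threshold to 1 and every other component to 0. The threshold `MACHINE_ZERO` and the function name `binary_condition` below are illustrative assumptions; as noted above, the actual value of machine zero is computer dependent.

```python
import sys
from typing import List, Sequence

# Hypothetical stand-in for "machine zero": a small multiple of the
# floating-point epsilon of the host computer.
MACHINE_ZERO = 1e3 * sys.float_info.epsilon


def binary_condition(z: Sequence[float], eps: float = MACHINE_ZERO) -> List[int]:
    """Force each component of a real-valued Z response to 0 or 1."""
    # Components at or below eps are treated as round-off noise and reset to 0;
    # every remaining positive component is normalized to a full spike (1).
    return [1 if value > eps else 0 for value in z]


noisy_z = [0.97, 2e-16, 0.0, 0.4]
print(binary_condition(noisy_z))
```

Unlike top-k competition with a fixed k, this rule places no constraint on how many components fire, which is why it suits a binary-p code whose number of firing neurons ranges from 1 to p.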

Algorithm 2 (DP for GDN) A GDN is a DN that uses the following specific way of initialization. It starts from pre-specified dimensions for the X and Z areas, respectively. X represents receptors and is totally determined by the current input. However, the GDN incrementally generates neurons in Y from an empty Y (computer programming may use dynamic memory allocation). Each neuron in Z is initialized with a synaptic vector v of dimension 0 and age 0. Suppose

1) Increment the number of neurons

2) Add a new Y neuron. Set the weight vector

The response value of each Z neuron is determined by the starting state (e.g., background class). As soon as the first Y neuron is generated, every Z neuron will add the first dimension in its synaptic vector in the following DN update. This way, the dimension of its weight vector continuously increases together with the number c of Y neurons.
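The growth process described above can be sketched in Python as a toy model. This is a simplified illustration under stated assumptions, not the DP itself: symbols and states are coded as one-hot (binary-1) vectors, the Y match is the average of the bottom-up and top-down inner products, a new Y neuron is generated whenever the best match is not perfect, and each firing Z neuron updates its weights with a 1/n incremental average. The names `TinyGDN`, `EPS`, `x_of`, `z_of`, and the parity FA are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

EPS = 1e-9  # hypothetical stand-in for machine zero


def one_hot(index: int, dim: int) -> List[float]:
    """Binary-1 code with a single firing component (a simplifying assumption)."""
    return [1.0 if i == index else 0.0 for i in range(dim)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(u * v for u, v in zip(a, b))


class TinyGDN:
    """Toy generative DN: Y grows one neuron per newly observed (x, z) context."""

    def __init__(self, z_dim: int) -> None:
        self.y_weights: List[Tuple[List[float], List[float]]] = []  # starts empty
        self.z_weights: List[List[float]] = [[] for _ in range(z_dim)]  # dimension 0
        self.z_ages: List[int] = [0] * z_dim

    def winner(self, x, z) -> Tuple[Optional[int], float]:
        """Top-1 competition in Y by averaged inner-product match."""
        best, best_match = None, 0.0
        for j, (vx, vz) in enumerate(self.y_weights):
            match = 0.5 * (dot(x, vx) + dot(z, vz))
            if best is None or match > best_match:
                best, best_match = j, match
        return best, best_match

    def learn(self, x, z, z_next) -> None:
        """Observe one transition: context (x, z), supervised next pattern z_next."""
        j, match = self.winner(x, z)
        if j is None or match < 1.0 - EPS:
            # New context: add a Y neuron that matches (x, z) exactly and give
            # every Z neuron a zero-weight link from it.
            self.y_weights.append((list(x), list(z)))
            for w in self.z_weights:
                w.append(0.0)
            j = len(self.y_weights) - 1
        for i, fires in enumerate(z_next):
            if fires > 0:
                # Incremental average over the firings of Z neuron i.
                self.z_ages[i] += 1
                n = self.z_ages[i]
                self.z_weights[i] = [
                    (n - 1) / n * w + (1.0 / n if k == j else 0.0)
                    for k, w in enumerate(self.z_weights[i])
                ]

    def act(self, x, z) -> List[float]:
        """Free Z: binary-conditioned response to the winning Y neuron."""
        j, _ = self.winner(x, z)
        if j is None:
            return [0.0] * len(self.z_weights)
        return [1.0 if w[j] > EPS else 0.0 for w in self.z_weights]


# Hypothetical FA to be learned: parity of the number of 'a' symbols seen so far.
STATES, SYMBOLS = ["even", "odd"], ["a", "b"]
FA = {("even", "a"): "odd", ("even", "b"): "even",
      ("odd", "a"): "even", ("odd", "b"): "odd"}


def x_of(symbol: str) -> List[float]:
    return one_hot(SYMBOLS.index(symbol), len(SYMBOLS))


def z_of(state: str) -> List[float]:
    return one_hot(STATES.index(state), len(STATES))


gdn = TinyGDN(z_dim=len(STATES))
for (state, symbol), nxt in FA.items():
    gdn.learn(x_of(symbol), z_of(state), z_of(nxt))  # one observation each

z, trace = z_of("even"), []
for symbol in "abaa":
    z = gdn.act(x_of(symbol), z)  # Z is free: the DN supplies its own context
    trace.append(STATES[z.index(1.0)])
print(len(gdn.y_weights), trace)
```

In this toy run each transition is shown once, Y ends with one neuron per learned transition, and the free-running network reproduces the state sequence of the FA. An input that deviates from the training patterns, such as `[0.9, 0.2]` in place of the code for `a`, is simply assigned to the best-matching Y neuron.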

In this section, we establish the most basic theorem of the three, Theorem 1. First, we give an overview. Next, we establish a lemma to facilitate the proof of Theorem 1. Then, we present Theorem 1. Finally, we discuss grounded DN.

We first give an overview to facilitate the understanding of the proofs.

Although an FA is a temporal machine, the classic way is to run an FA at discrete events that correspond to the times when the FA receives a symbolic input [

An FA, such as the one in

Because the three areas X, Y, and Z in the DN all compute in parallel in Algorithm 1, we have two parallel computation flows in

1) The first flow corresponds to

2) The second flow has y in the first column,

Both flows satisfy the real-world events, but for the FA logic here we let the second flow simply repeat (retain) the first flow. Therefore, due to these two flows, the DN must update at least twice for each pair

The X area is always supervised by x as the binary pattern of σ.

The number of Z firing neurons depends on the number of different physical patterns required for Z, but we assume that the Z area uses binary representations. Each firing Z neuron, supervised by the current q from the FA as vector z, accumulates the firing frequency of the current single firing Y neuron as the corresponding Y-to-Z synaptic weight of the Z neuron. The incremental average in the Hebbian learning of Equation (4) is exactly what is needed to compute this firing frequency. This firing frequency is equal to the discrete probability required for the optimality in the later Theorems 2 and 3.
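The link between incremental averaging and firing frequency can be checked numerically. The sketch below assumes the simplest learning rate, 1/n at age n, so that the update is a plain recursive average; the function name and the sample firing sequence are illustrative.

```python
from typing import Iterable


def incremental_average(samples: Iterable[float]) -> float:
    """Recursive average: w_n = ((n - 1) / n) * w_{n-1} + (1 / n) * y_n."""
    weight, age = 0.0, 0
    for y in samples:
        age += 1  # the age advances each time the post-synaptic neuron fires
        weight = (age - 1) / age * weight + y / age
    return weight


# Hypothetical record: 1 where a given Y neuron fired together with the Z neuron.
firings = [1, 0, 1, 1, 0, 1, 0, 1]
weight = incremental_average(firings)
frequency = sum(firings) / len(firings)
print(weight, frequency)
```

The recursively updated weight agrees with the batch frequency up to round-off, without storing any past sample, which is what allows each pattern pair to be discarded immediately after the update.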

The Y area of the GDN is empty to start with. Whenever there is a new input

With this overview, we are ready for Lemma 1.

This subsection is a little long because of the detailed and complete proof, but I use the top-level Case 1 (new Y input) and Case 2 (old Y input) in the proof to organize the material. Each Case first considers Y and then Z. When we consider Z, we have Case

Lemma 1 (Properties of a GDN) Suppose a GDN simulates any given FA using top-1 competition for Y, binary-p mapping, and binary conditioning for Z, and updates at least twice in each unit time. Each input

1) The winner Y neuron matches perfectly with input

2) All the synaptic vectors in Y are unit and they never change once initialized, for all times up to t. They only advance their firing ages. The number of Y neurons c is exactly the number of learned state transitions up to time t.

3) Suppose that the weight vector v of each Z neuron is

rate straight recursive average

4) Suppose that the FA makes transition

Proof. The proof below is a constructive proof, instead of an existence one. To facilitate understanding, the main ideas are illustrated in

We prove it using induction on integer t.

Basis: When

Hypothesis: We hypothesize that the above four properties are true up to integer time t. In the following, we prove that the above properties are true for t + 1.

Induction step: During t to t + 1, suppose that the FA makes transition

At the next DN update, there are two cases for Y. Case 1: the transition is observed by the DN for the first time. Case 2: the DN has already observed the transition.

Case 1: new Y input. First consider Y. As the input

As DN updates at least twice in the unit time, Y area is updated again for the second DN update. But X and Z retain their values within each unit time, per simulation rule. Thus, the Y winner is still the same new neuron and its vector still does not change as the above expression is still true. Thus, properties 1 and 2 are true for the first two DN updates within

Next consider Z. Z retains its values in the first DN update, per hypothesis. For the second DN update, the response of Z is regarded as the DN’s Z output for this unit time, which uses the above Y response as illustrated in

We can see that Equations (7) and (8) are the same Hebbian learning, but the former is for Y and the latter is for Z. Note that Z has only bottom input

Subcase (1.a): the Z neuron should fire. All Z neurons that should fire, up to p of them, are supervised to fire for the second DN update by the Z area function. Suppose that a supervised-to-fire Z neuron has a synapse vector

which is the correct count for the new

for all

Subcase (1.b): the Z neuron should not fire. All Z neurons that should not fire must be supervised to be zero (not firing). All such Z neurons could not be linked with the new Y neuron because the new Y neuron was not present until now. However, in computer programming or hardware circuits, each non-firing Z neuron must add a zero-weight link from this new Y neuron. Otherwise, the Z neuron never “sees” the new Y neuron and can never link from it when the Z neuron fires in the future. All these non-firing neurons keep their counts and ages unchanged. As Y response does not change for more DN updates within

The binary conditioning for Z makes sure that all the Z neurons that have a positive pre-response fire fully. That is, the properties 3 and 4 are true for the first two DN updates within

Case 2: old Y input. First consider Y. To Y,

Next consider Z. Z retains its previous vector values in the first DN update, per hypothesis. In the second DN update, because the transition is not new, we show that Z does not need to be supervised during the unit time

where

(occurrences) of Y neuron

For sub-case (2.a) where the Z neuron should fire, we have

because the Z neuron has been supervised at least the first time for this transition and thus

which is the correct count for the

for all

Next consider sub-case (2.b) where the Z neuron should not fire. Similarly we have

Combining the sub-cases (2.a) and (2.b), all the Z neurons act perfectly and the properties 3 and 4 are true for the first two DN updates. We have proved for Case 2, old Y input.

Therefore, the properties 1, 2, 3, 4 are true for first two DN updates. If DN has time to continue to update before time

According to the principle of induction, we have proved that the properties 1, 2, 3 and 4 are all true for all t.

Using the above lemma, we are ready to prove:

Theorem 1 (Simulate any FA as scaffolding) The general-purpose DP incrementally grows a GDN to simulate any given FA

Proof. Run the given FA and the GDN at discrete time t,

If the training data set is finite and consistent (the same

Definition 6 (Grounded DN) Suppose that the symbol-to-vector mapping for the DN is consistent with the real sensor of a real-world agent (robot or animal), namely, each symbol

For a grounded DN, the SN is a human knowledge abstraction of the real world. After training, a grounded DN can run in the real physical world, at least in principle. However, as we discussed above, the complexity of symbolic representation for S and Q is exponential in the number of concepts. Therefore, it is intractable for any SN to sufficiently sample the real world, since the number of symbols required is too large for a realistic problem. The fact that there are not enough symbols to model the real world causes the symbolic system to be brittle. All the probability variants of FA can only adjust the boundaries between any two nearby symbols, but the added probability cannot resolve the fundamental problem of the lack of a sufficient number of symbols.

The next theorem states how the frozen GDN generalizes for infinitely many sensory inputs.

Theorem 2 (DN generalization while frozen) Suppose that after having experienced all the transitions of the FA from time

1) freezes: It does not generate new Y neurons and does not update its adaptive part.

2) generalizes: It continues to generate responses by taking sensory inputs not restricted to the finite ones for the FA.

Then the DN generates the Maximum Likelihood (ML) action

where the probability density

Proof. Reuse the proof of the lemma. Case 1 does not apply since the DN does not generate new neurons. Only Case 2 applies.

First consider Y. Define

Given observation

According to the dependence of parameters in DN, first consider c events for area Y:

where

Note that

since finding the ML estimator

Next, consider Z. The set of all possible binary-1 Y vectors and the set of producible binary-p Z vectors have a one-to-one correspondence:

for every

There seems to be no more proper term than “think” to describe the nature of the DN operation. The thinking process by the current basic version of DN seems similar to, but not exactly the same as, that of the brain. At least, the richness of the mechanisms in the DN has not yet been demonstrated experimentally to be close to that of the brain.

Theorem 3 (DN generalization while updating) Suppose that after having experienced all the transitions of the FA from time

1) fixes its size: It does not generate new Y neurons.

2) adapts: It updates its adaptive part

3) generalizes: It continues to generate responses by taking sensory inputs not restricted to the finite ones for the FA.

Then the DN “thinks” (i.e., learns and generalizes) recursively and optimally: for all integer

where

The firing of each Z neuron has the freedom to choose a binary conditioning method to map the above pre-response vector

Proof. Again, reuse the proof of the lemma with the synaptic vectors of Y to be

First consider Y. Equation (16) is still true as this is what DN does but V is now adapting. The probability density in Equation (15) is the currently estimated version based on past experience but V is now adapting. Then, when

Next, consider Z. From the proof of the Lemma 1, the synaptic weight between the j-th Y neuron and the n-th Z neuron is

The total pre-response for the n-th neuron is

since the j-th neuron is the only firing Y neuron at this time. The above two expressions give Equation (18).

The last sentence in the theorem gives the freedom for Z to choose a binary conditioning method but a binary conditioning method is required in order to determine which Z neurons fire and all other Z neurons do not. In the brain, neural modulation (e.g., expected punishment, reward, or novelty) discourages or encourages the recalled components of z to fire.

The adaptive mode after learning the FA is autonomous inside the DN. A major novelty of this theory of thinking is that the structure inside the DN is fully emergent, regulated by the DP (i.e., nature) and indirectly shaped (i.e., nurture) by the external environment.

The neuronal resources of Y gradually re-distribute according to the new observations in

However, an adaptive DN does not simply repeat the function of the FA it has learned. Its new thinking experience includes those that are not applicable to the FA. The following cases are all allowed in principle:

1) Thinking with a “closed eye”: A closed eye sets

2) Thinking with an “open eye”: The sensory input x is different from any prior input.

3) Inconsistent experience: from the same

The neuronal resources of Y gradually re-distribute according to the new context-motor experience in

In the developmental process of a DN, there is no need for a rigid switch between FA and the real-world learning. The mitosis-equivalent of Y neurons is gradually realized by gradual mitosis and cell death, neuronal migration and connection, neuronal spine growth and death, and other neuronal adaptation. DN can also switch between neuronal initialization and adaptation smoothly. The rigid switches between neuronal initialization and neuronal adaptation and between FA learning and the real-world learning above are meant to facilitate our understanding and analysis only.

The binary conditioning is suited only when Z is supervised according to the FA to be simulated. As the “thinking” of the DN is not necessarily correct, it is not desirable to use the binary conditioning for Z neurons. For example, a dynamic threshold can be used for

The thinking process by the current basic version of DN seems similar to, but not exactly the same as, that of the brain. At least, the richness of the mechanisms in an experimental DN is not yet close to that of an adult brain. For example, the DN here does not use neuromodulators so it does not prefer any signals from receptors (e.g., sweet vs bitter).

Due to the focused theoretical subject here and the space limitation, detailed experimental results of DN are not included here. The DN has had several versions of experimental embodiments, called Where-What Networks (WWNs), from WWN-1 [

A learned WWN can simultaneously detect and recognize learned 3-D objects from new unobserved cluttered natural scenes [

The function of this space-time machine DN differs depending on the context information in its Z area [

A WWN can also perform autonomous attention. If the DN suppresses the firing neuron that represents an object type in TM, the WWN switches attention from one object type to another object type that barely lost in the previous Y competition―feature-based autonomous attention. If the DN suppresses the firing neuron in LM, the WWN switches attention from one object location to another object location that barely lost in the previous Y competition―location-based autonomous attention.

The WWN has also performed language acquisition for a subset of natural language and also generalized and predicted [

The WWNs have versions that are motivated, for example by pain avoidance and pleasure seeking, so that their learning does not need to be supervised [

However, the experimental results from such DN experiments are difficult to understand, and such DNs are difficult to train, without the clear theoretical framework presented here, which links DNs with the well-known automata theory, and without the mathematical properties presented as the three theorems proved here.

Proposed first in Weng 2011 [

Therefore, it appears that a valid brain model at least should not assume a static existence of―genome rigidly specified―Brodmann areas. This static existence has been prevailing in almost all existing biologically inspired models for sensorimotor systems. Instead, a brain model should explain the emergence and known plasticity of brain areas. DP enables areas to emerge in DN and adapt. The genome provides the power of cells to move and connect. The genome also plays a major role in early and coarse connections of a brain. However, fine connections in the brain seem to be primarily determined by the statistics of activities from the conception of the life all the way up to the current life time.

In conclusion, this paper provides an overarching theory of the brain and mind, although the complexity of the mind is left to the richness of the environment and the activities of DN―task nonspecific [

The author would like to thank Hao Ye at Fudan University, who carefully proofread the proofs presented here and pointed out two gaps that I have since filled. The author would also like to thank Z. Ji, M. Luciw, K. Miyan and other members of the Embodied Intelligence Laboratory at Michigan State University, and Q. Zhang, Yuekai Wang, Xiaofeng Wu and other members of the Embodied Intelligence Laboratory at Fudan University, whose work has provided experimental support for the theory presented here.