Applying DNA Computation to Error Detection Problem in Rule-Based Systems

As rule-based systems (RBS) technology gains wider acceptance, the need to create and maintain large knowledge bases assumes greater importance. Demonstrating that a rule base is free from error remains one of the obstacles to the adoption of this technology. Over the past several years, a substantial body of research has developed graphical techniques, such as Petri nets, to analyze structural errors in rule-based systems that use propositional logic. Four typical errors in rule-based systems are redundancy, circularity, incompleteness, and inconsistency.


Introduction
Adoption of expert systems in real-world applications has greatly increased. In past years, much effort has been devoted to analyzing different aspects of rule-based systems such as knowledge representation, reasoning, and verification [1]-[5]. A rule base, the central part of an expert system, codifies the knowledge of domain experts in the form of inference rules. Often these inference rules are built into a rule base incrementally over years and are subject to frequent refinement. Due to this construction process, and to the different and even conflicting views provided by domain experts, a rule base can contain many structural errors. According to [1]-[4] [6], and several other studies, the four typical types of structural errors are inconsistency (conflicting rules), incompleteness (missing rules), redundancy (redundant rules), and circularity (circularly dependent rules). Many different techniques have been developed to detect these errors. Earlier work mainly focused on detecting structural errors by checking rules pairwise [7]-[9]. Recent work aims at detecting structural errors caused by applying multiple rules in longer inference chains. The majority of recent verification techniques use graphical notations such as graphs [10]-[12] and Petri nets [2] [3] [6] [13]-[15].
DNA computation has emerged in recent years as an exciting new research field at the intersection of computer science, biology, engineering, and mathematics. There exist two main barriers to the continued development of traditional silicon-based computers [16]. The invention of silicon integrated circuits and advances in miniaturization have led to incredible increases in processor speed and memory access time. However, there is a limit to how far this miniaturization can go. Eventually chip fabrication will hit the wall imposed by the Heisenberg Uncertainty Principle (HUP) [16]. DNA computing, based on its complementary characteristics and massive parallelism (when a step is performed in an experiment, the operation is performed in parallel on all molecules in the tube), has the potential to solve complex problems such as NP-complete ones [16]. The physicist Richard Feynman first proposed the idea of using living cells and molecular complexes to construct sub-microscopic computers [17]. After Feynman's proposal, there has been an explosion of interest in performing computations at the molecular level. Adleman, who used DNA strands to solve a directed Hamiltonian path problem, indicated the feasibility of a molecular approach to solving combinatorial problems [18]. Subsequently, by solving the satisfiability problem (SAT), Lipton demonstrated the advantage of using the massive parallelism inherent in DNA-based computing.
The authors in [1] proposed algorithms that utilize DNA computing to render an error-free rule base for rule-based systems. The algorithms proposed in [1] lack generality, since they work only for special cases of rule bases, and there are cases in which structural errors are not removed correctly by their algorithms. Rules in a rule-based system are typically formed as X → Y, in which X is an antecedent node (AN) and Y is a conclusion node (CN). Two kinds of nodes, atomic and compound, exist in rule bases. We are interested in the problem domain of Horn clauses, as addressed in [2]-[4], which allow only one conclusion part in each rule; compound antecedents in rules are presented only in conjunctive form. So, before we transform rules to their corresponding rule paths, a normalization should be carried out in order to obtain the Horn clause form of the rules [3] [4]. In this paper, we are interested in finding structural errors and the set of rules causing these errors. Structural errors may be due to conflicting rules, mismatched conditions and conclusions, and circular and redundant rules [3] [13] [15]. Inconsistent rules result in conflict, which is the direct source of incorrect rule derivation. Redundant rules increase the size of the rule base and cause unnecessary reasoning. Incomplete rules prevent the rule base from activating certain normal rule derivations. Circularly dependent rules will force the rule base to run into an infinite loop of reasoning.
In this study, algorithms are developed to cope with all cases. Our algorithms have the ability to detect structural errors in any form in which they may occur in a rule base. As a result, DNA computing as an alternative means to verify structural errors in rule-based systems gains more generality. The remainder of this paper is organized as follows. Structural errors are briefly described in Section 2. In Section 3, we outline DNA computation and introduce our DNA-based algorithms to detect structural errors in rule-based systems. We analyze the complexity of our algorithm, and conclusions are presented in Section 4.

Typical Structural Errors in Rule Bases
• Redundancy. Redundancy occurs when unnecessary rules exist in the rule base. These rules not only increase the size of the rule base but may also cause additional useless inferences. Redundancy is a potential source of inconsistency when knowledge is updated [4]. A rule is redundant with respect to the conclusion if two rules have identical conditions and conclusions (identical rules), or if two rules have identical conclusions while the condition of one rule is either a generalization or a special case of the condition of the other [3] [4]. The rule with the more general condition subsumes (is stronger than) the other rule. Logical redundancy implies operational redundancy [4]. Thus, subsumed (less general) rules can be eliminated, which has no influence on logical inference capability [4].
• Incompleteness. Incompleteness occurs when there are missing rules in a rule base. Except for the rules representing facts and goal nodes, a rule is called a useless rule if its condition (conclusion) cannot be matched by other rules' conclusions (conditions). Unmatched conditions are called dangling conditions, while unmatched conclusions are called dead-end conclusions. Mostly, useless rules are due to some missing rules.
• Circularity. Circularity occurs when two or more rules have a circular dependency. Circularly dependent rules can cause infinite reasoning and must be broken.
• Inconsistency. Since inconsistent rules result in conflicting facts, inconsistency must be resolved for the correct functioning of an expert system. Two rules r1 and r2 (whose conclusions are not compatible) are inconsistent if there exists a state such that both antecedents (pre-conditions) of r1 and r2 can be fired simultaneously [4].
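The subsumption test for redundancy amounts to a set-inclusion check on Horn-clause antecedents. A minimal sketch in Python (the `Rule` type, node names, and sample rules are illustrative, not part of the paper's DNA encoding):

```python
from typing import FrozenSet, NamedTuple

class Rule(NamedTuple):
    antecedent: FrozenSet[str]  # conjunction of condition nodes (Horn clause)
    conclusion: str             # single conclusion node

def subsumes(r1: Rule, r2: Rule) -> bool:
    """r1 subsumes r2 if both have the same conclusion and r1's
    condition set is a (not necessarily proper) subset of r2's."""
    return r1.conclusion == r2.conclusion and r1.antecedent <= r2.antecedent

# r1: X1 -> Y is more general than r2: X1 ^ X2 -> Y, so r2 is redundant.
r1 = Rule(frozenset({"X1"}), "Y")
r2 = Rule(frozenset({"X1", "X2"}), "Y")
```

Here r2 could be eliminated without affecting inference capability, since r1 already fires whenever r2 would.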

The Structure of DNA and Basic Denaturing and Annealing Operations
DNA (deoxyribonucleic acid) encodes the genetic information of cellular organisms [16]. It consists of polymer chains, commonly referred to as DNA strands. Each strand may be viewed as a chain of nucleotides, or bases, attached to a sugar-phosphate backbone. The four DNA nucleotides are Adenine, Guanine, Cytosine, and Thymine, commonly abbreviated to "A", "G", "C", and "T" respectively. Each strand, according to chemical convention, has a 5ʹ and a 3ʹ end; thus, any single strand has a natural orientation. This orientation is due to the fact that one end of the single strand has a free 5ʹ phosphate group, and the other end has a free 3ʹ deoxyribose hydroxyl group. "A" bonds with "T" and "G" bonds with "C". The pairs (A, T) and (G, C) are therefore known as complementary base pairs. The two bases of a pair form hydrogen bonds between each other. Double-stranded DNA may be dissolved into single strands (denatured) by heating the solution to a temperature determined by the composition of the strand [19]. Heating breaks the hydrogen bonds between complementary strands. Annealing is the reverse of denaturing, whereby a solution of single strands is cooled, allowing complementary strands to bind together (Figure 1). Figure 1 demonstrates the annealing of the 5ʹ end of one single-stranded DNA to the 3ʹ end of another in the presence of DNA ligase and a splint. In this figure the splint has 20 base pairs and consists of the complement of the 10 nucleotides at the 3ʹ end of one strand and the complement of the 10 nucleotides at the 5ʹ end of the other.
In order to anneal the 5ʹ end of a single-stranded DNA to the 3ʹ end of another, we hybridize, in the presence of DNA ligase, a set of specific splint oligos of length 20. Each splint consists of the complement of the 10 nucleotides at the 3ʹ end of one strand and the complement of the 10 nucleotides at the 5ʹ end of the other.
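The splint construction can be sketched in a few lines of Python. This is a string-level illustration that mirrors the description above (complement of the last ten bases of one strand, then the complement of the first ten of the other) and deliberately ignores physical orientation details; the sequences are placeholders:

```python
_COMP = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Watson-Crick complement of a sequence (A<->T, G<->C)."""
    return seq.translate(_COMP)

def splint(strand_a: str, strand_b: str, k: int = 10) -> str:
    """A 20-mer splint: the complement of the k nucleotides at the 3' end
    of strand_a followed by the complement of the k nucleotides at the
    5' end of strand_b, bridging the two strands for ligation."""
    return complement(strand_a[-k:]) + complement(strand_b[:k])

# Placeholder strands: the splint spans the C^10 / G^10 junction.
bridge = splint("A" * 10 + "C" * 10, "G" * 10 + "T" * 10)
```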

Initial Set Construction
Oligonucleotides uniquely encoding each node and splint are assigned. As proposed by Braich [20], there are constraints on strand library design: sequences must be designed so that strands have little secondary structure, in order to prevent unintended probe-library hybridization. Thus, short oligonucleotides uniquely encoding each node and splint must be used. Then, splints are used to join their complementary sequences. Hence, with the defined polarity, the short single-stranded oligos representing nodes covalently join and create longer single-stranded DNA molecules. This procedure enables us to encode each rule path, including the appropriate nodes, in the form of a long single strand of DNA.

Operations Description
We use the basic operations on strands defined by Adleman [18], the Parallel Filtering Model of Amos [16], and an alternative filtering-style model, the Stickers Model, developed by Roweis [21]. In all filtering models a computation consists of a sequence of operations on finite multi-sets of strands. It is normally the case that a computation begins and terminates with a single multi-set of strands. An initial solution consists of strands of length O(n), where n is the problem size. The initial solution should include all possible solutions (each encoded by a strand) to the problem to be solved. The point here is that the initial set in any implementation of the model is assumed to be relatively easy to generate as a starting point for the computation. The computation then proceeds by filtering out strands which encode illegal solutions and cannot be the result. The operations defined in the parallel filtering models and Adleman's experiments are as follows. The implementation of these operations can be found in detail in [16] [18] [21] and [22].
• Separate (T, s, k, Ton, Toff): This operation separates strands that contain sequence "s" starting at position "k" into Ton; otherwise, into Toff [21].
• Extract (T, s, T+, T−): Given a tube T and a sub-strand "s", this operation creates two new sets T+ and T−, where T+ includes all strands in T containing "s", and T− includes all strands in T that do not contain "s" [18].
• Union (T, T1, T2, …, Tn): This operation creates set T, which is the set union of T1, T2, …, Tn [16].
• Detect (T): Given a set T, this operation returns true (Y) if T contains at least one DNA strand; otherwise, it returns false (N) [22].
• Read (T): This operation describes each DNA strand in set T [22].
• Remove (T): This operation removes all strands in tube T [22].
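The filtering operations above can be simulated over tubes modeled as Python lists of strings; a sketch under that abstraction (the sequences in the example are arbitrary placeholders):

```python
def separate(T, s, k):
    """Separate(T, s, k, Ton, Toff): split by whether s occurs at position k."""
    T_on = [x for x in T if x[k:k + len(s)] == s]
    T_off = [x for x in T if x[k:k + len(s)] != s]
    return T_on, T_off

def extract(T, s):
    """Extract(T, s, T+, T-): split tube T by whether strands contain s."""
    return [x for x in T if s in x], [x for x in T if s not in x]

def union(*tubes):
    """Union(T, T1, ..., Tn): pool the contents of several tubes."""
    return [x for t in tubes for x in t]

def detect(T):
    """Detect(T): true iff the tube contains at least one strand."""
    return len(T) > 0

T = ["AACC", "GGTT", "AAGG"]
plus, minus = extract(T, "AA")
```

In the laboratory each of these is a physical protocol (affinity purification, gel separation, pooling, PCR detection); here they are ordinary list filters executed "in parallel" over all strands.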

Encoding Inference Rules by DNA Strands
For a general inference rule with compound antecedent (R: (X1 Λ … Λ Xn) → Y), each antecedent node (AN) and conclusion node (CN) is encoded by a 20-mer DNA strand. To encode the relations "Λ" and "→", two tetra-nucleotide sequences, "AAAA" and "CCCC" (and their complements), are used respectively. Thus, by creating 24-nucleotide-long splints, which contain the appropriate tetra-nucleotide, the relations "Xi Λ Xj" and "Xi → Xj" are enforced [1]. The "Xi → Xj" splint is a strand whose sequence in the 3ʹ-5ʹ direction is the concatenation of the complement of the 10 nucleotides at the 3ʹ end of the strand for node Xi, the four nucleotides "CCCC", and the complement of the 10 nucleotides at the 5ʹ end of the strand for Xj. Similarly, "Xi Λ Xj" is a splint whose sequence is the concatenation of the complement of the 10 nucleotides at the 3ʹ end of the strand Xi, the four nucleotides "AAAA", and the complement of the 10 nucleotides at the 5ʹ end of strand Xj. All strands representing starting nodes are designed so that they have the common sub-strand "TTTTTTTTTT" at the 5ʹ end, and all strands representing goal nodes are designed so that they contain the common sub-strand "GGGGGGGGGG" at the 3ʹ end. Thus, the sequences "AAAAAAAAAA" and "CCCCCCCCCC" are needed in our algorithm to distinguish these nodes, as explained in Section 3.6.1. Finally, in order to make sure that strands representing the complements of "TTTTTTTTTT" and "GGGGGGGGGG" will bind to the starting nodes and goal nodes respectively and nowhere else, all strands representing the other nodes should be designed so that they do not have successive "G"s or "T"s at the 3ʹ or 5ʹ ends of their strands. Since permutations of a k-node compound antecedent create the same antecedent (k! strands representing the same AN), for each rule with compound antecedent, one of the k! strands is chosen randomly and poured into tube Tr1 in order to detect redundancy and circularity [1]. We encode all permutations of the antecedents of rules with compound AN and pour them into tube TΛ. In order to detect subsumed rules, special tubes TΛsk are created as described below. The antecedents of all rules with compound AN comprised of k starting nodes are poured into tube TΛsk. The strands are labeled ki. For instance, TΛs1 is comprised of the antecedents of all rules with an atomic AN of a starting node, labeled as 1i strands (the i-th strand of tube TΛs1), and TΛs2 is comprised of the antecedents of all rules with a compound AN of two starting nodes, labeled as 2i strands. Then tubes TΛskc, comprised of many copies of the complements of the strands in tubes TΛsk, are created. These are used in the Detect Redundancy algorithm described in Section 3.6.4.
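One plausible reading of the assembled rule-path strand can be sketched symbolically: 20-mer node codes joined by a four-base filler lying opposite each splint's tetra-nucleotide ("TTTT" opposite "AAAA", as in the pre-steps below; "GGGG" opposite "CCCC" is our assumption). The node sequences here are hypothetical and chosen only to satisfy the end-of-strand design constraints:

```python
# Hypothetical 20-mer node codes for the rule X1 ^ X2 -> Y (real
# libraries are screened for secondary structure; these are illustrative).
NODE = {
    "X1": "TTTTTTTTTT" + "ACACACACAC",  # starting node: T^10 at the 5' end
    "X2": "CA" * 10,                    # internal node: no G/T runs at its ends
    "Y":  "ACACACACAC" + "GGGGGGGGGG",  # goal node: G^10 at the 3' end
}

LINK_AND = "TTTT"  # filler opposite the splint's "AAAA" (see Pre-Step 2)
LINK_IMP = "GGGG"  # assumed filler opposite the splint's "CCCC"

def encode_rule_path(antecedents, conclusion):
    """Assemble the long single strand for (A1 ^ ... ^ An) -> C."""
    head = LINK_AND.join(NODE[a] for a in antecedents)
    return head + LINK_IMP + NODE[conclusion]

path = encode_rule_path(["X1", "X2"], "Y")
```

Note how the common end markers make a complete path recognizable by probing alone, which is exactly what the later extraction steps rely on.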

Generating the Solution Space
An essential difficulty in all filtering models is that the initial multi-set of strands generally has a quantity which is exponential in the problem size [16]. What is done in practice is that an initial set is constructed containing a polynomial number of distinct strands. The design of these strands ensures that the exponentially large initial set of the system (rule paths) can be generated automatically [16]. A sequence of pre-steps is carried out in order to create the initial solution space containing the rule paths. An example of creating the initial set for a simple rule base is depicted in Figure 3. These pre-steps are similar to those presented in [1], as follows.
Pre-Step 1. Tubes TΛ, T→, Tr1, Tr2, Tr3, Tc, and Ts are needed at this stage. Strands representing the splints "X Λ Y" and "X → Y" are poured into tubes TΛ and T→ respectively. Starting nodes are not conclusions of any rule, and goal nodes are not antecedent parts (conditions) of any rule. Thus, in order to eliminate these kinds of rules and prevent circularity from occurring at starting nodes (in our sample rule base, X1) and goal nodes (in our sample rule base, X6), copies of strands designed as "GGGG" followed by the 10 nucleotides at the 5ʹ end of the starting nodes, and the 10 nucleotides at the 3ʹ end of the goal nodes followed by "GGGG", are poured into tube T→. Thus, any splint in T→ that anneals to the above strands is removed. We may have rules in which goal nodes are included in compound antecedents. In order to eliminate these kinds of rules, copies of strands designed as the 10 nucleotides at the 3ʹ end of the goal nodes followed by "TTTT", and "TTTT" followed by the 10 nucleotides at the 5ʹ end of the goal nodes, are poured into tube TΛ. Thus, any splint in tube TΛ that anneals to the above strands is eliminated (assuming Y is a goal node, splints formed as Xi Λ Y and Y Λ Xi are eliminated). Next, the CN of rules with compound AN are poured into tube Tr1, and strands representing the CN of each atomic AN are poured into tube Tr2. In order to identify which CN have more than one rule leading to them, the strands in Tr2 are poured into tube Tr1, so that Tr1 contains the CN of all rules. Then only one copy of the complement of each CN is poured into Tr1. Any CN in Tr1 that does not anneal to its complement represents a CN with more than one rule leading to it and is poured into Tr3. Next, copies of all AN are poured into Tc.
Pre-Step 2. Splints in TΛ and copies of the strand "TTTT" are poured into tube Tc and DNA ligation is allowed to occur. By means of the splints, each set of compound AN sticks together and creates a double-stranded region the length of the splint. Then the single strands are separated from Tc into Ts.

Removing Incomplete Rule Paths
In order to remove incomplete rule paths, which do not start with starting nodes or do not lead to the goal node, the algorithm below is performed [1].
All strands containing at least one starting node in their sequence are extracted from T into tube Ty; otherwise, into tube Tincomp at line 2 (multiple starting nodes can be at the beginning of rule paths in the form of the compound antecedent of the first rule). This extract operation is carried out by pouring many copies of the strand "AAAAAAAAAA" into tube T. This strand anneals only to single strands containing "TTTTTTTTTT". As explained in Section 3.4, only strands representing starting nodes are designed so that all of them have this sub-strand. At line 3, strands containing goal nodes are extracted from Ty and poured into tube T; otherwise, into tube Tincomp. This extract operation is carried out by pouring many copies of the strand "CCCCCCCCCC" into tube T. This strand anneals only to single strands containing "GGGGGGGGGG". As explained in Section 3.4, only strands representing goal nodes are designed so that all of them have this sub-strand. At the end of the algorithm, strands in tube Tincomp represent incomplete rule paths and should be removed. By performing this algorithm just once, all complete rule paths with different starting nodes and goal nodes are extracted. Thus, there is no need to perform the algorithm for all starting nodes and goal nodes repeatedly. Assuming that Xi is a starting node and Yi is a goal node, it should be noted that there is no splint to complement the 10 nucleotides at the 5ʹ end of Xi; therefore, in all strands containing Xi, it has to be located at the front of the strand (in the case of starting nodes in the form of a compound AN, there is no splint to complement the 10 nucleotides at the 5ʹ end of the first starting node). Similarly, there is no splint to complement the 10 nucleotides at the 3ʹ end of Yi. Thus, in all strands containing Yi, it has to be located at the end of the strand.
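This filter can be mimicked directly in software: a path is complete iff its strand contains both end markers. A sketch with symbolic strand contents:

```python
START_MARK = "TTTTTTTTTT"  # common 5'-end sub-strand of starting nodes
GOAL_MARK = "GGGGGGGGGG"   # common 3'-end sub-strand of goal nodes

def remove_incomplete(T):
    """Keep only rule paths that begin at a starting node and reach a goal
    node, mirroring the two Extract steps (probe A^10 binds T^10 at line 2,
    probe C^10 binds G^10 at line 3)."""
    T_y, T_incomp = [], []
    for strand in T:
        (T_y if START_MARK in strand else T_incomp).append(strand)
    complete = []
    for strand in T_y:
        (complete if GOAL_MARK in strand else T_incomp).append(strand)
    return complete, T_incomp

tube = [START_MARK + "ACAC" + GOAL_MARK,  # complete path
        START_MARK + "ACAC",              # dead-end: no goal node
        "ACAC" + GOAL_MARK]               # dangling: no starting node
complete, incomplete = remove_incomplete(tube)
```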

General Algorithm to Detect Circularity
The algorithm proposed in [1] aims to remove all circularly dependent rules that may exist by removing strands in which the same node appears in at least two locations. Only strands containing the nodes that exist in Tr3 are likely to have a circularity problem. Let z represent the number of nodes in Tr3. We modify the algorithm presented in [1] and call it the "Detect Circularity Part 0" algorithm, as follows.
Detect Circularity Part 0. Tube Tif is composed of strands containing any goal node at position q * 24, and thus these need not be compared. At the end of this algorithm, all rule chains in which node Xi appears at least twice in the strand are poured into tubes Ticir. We remove these strands from T. There are some cases in which this algorithm is unable to remove the circularity error, so that after applying the algorithm, the circularity error remains in the rule base. Suppose that after performing the above algorithm, Xi and Xj are found to be circular nodes, and there exist at least two paths between nodes Xi and Xj, or more precisely, two rules or chains of rules acting in reverse between these two nodes (e.g. Xi → Xj and Xj → Xi), besides one or more distinct chains of rules from a starting node (the starting nodes can be distinct) leading to nodes Xi and Xj, in addition to distinct chains of rules from these nodes leading to the goal node. In such a situation, this algorithm is unable to remove the circularity error. That is, by removing rule paths which have at least two occurrences of nodes Xi and Xj, the circularity error will not be removed. In order to clarify these situations, take the simple rule base shown in Figure 4 as an example, assuming X1 and X4 are the starting and goal nodes respectively. The resulting directed graph made by these rules and all the paths starting from node X1 and leading to X4 are depicted in Figure 4.
By performing "Detect Circularity Part 0" for nodes X2 and X3, strands number 5 and 6 are detected to have two occurrences of these nodes respectively. These strands are poured into tubes T2cir and T3cir. Next, we remove these strands from T. Now, if we establish the directed graph made by the rules embedded in the remaining paths, we see that removing paths 5 and 6 does not result in the elimination of any of the rules causing the circularity error. That is, these rules (R4: X2 → X3, R5: X3 → X2) exist in other rule paths. Thus, this error remains and the algorithm is unable to remove it. As a matter of fact, this is the case for every rule base with rules or chains of rules acting in reverse between circular nodes, in addition to rules or chains of rules from a starting node leading to each of these circular nodes, and, from each circular node, paths (more precisely, chains of rules) leading to the goal node. Thus, by means of this algorithm, the circularity error cannot be eliminated in these cases. To make these statements clearer, another instance of a rule base and its corresponding directed graph is depicted in Figure 5. Assume nodes X1 and X5 to be the starting node and goal node respectively. As is obvious from the directed graph of this rule base, there exists a cycle in this rule base comprised of rules R5, R6, and R7; these rules are circularly dependent. As in the previous case, we attempt to remove this error by means of the above algorithm. All the complete paths starting from node X1 and leading to X5 are as follows.
By executing the algorithm explained above for the nodes present in Tr3 (X2, X3, and X4), the circular paths {4, 8, 12} are found, in which nodes X2, X3, and X4 appear twice respectively. We take into account the remaining paths and establish the directed graph constructed by the rules embedded in them (rule paths {1, 2, 3, 5, 6, 7, 9, 10, 11}). Obviously, the directed graph made by these rules is the same as the directed graph of the original rule base. We notice that all circularly dependent rules (R5, R6, R7) still exist in the remaining paths (and consequently in the rule base) and none of them is removed. Thus, after performing the algorithm, the circularity error still exists in the rule base. It should be noted that, as is obvious from our examples, in such a situation there exist complete rule paths having two occurrences of circular node Xi in which circular node Xj is located between the two positions of Xi, and complete rule paths having two occurrences of circular node Xj in which circular node Xi is located between the two positions of Xj (in our example, paths 4, 8, and 12 have this property for nodes X2, X3, and X4).
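Part 0's behavior, and its insufficiency, can be reproduced with paths modeled as node lists instead of 24-base blocks. The hypothetical three-path rule base below shows the rule X3 → X2 surviving in a kept path, so removing the circular strand alone does not remove the circularity:

```python
def detect_circularity_part0(paths, suspects):
    """Pour every path in which a suspect node (from Tr3) occurs at least
    twice into that node's tube Ticir; keep the rest."""
    t_cir = {x: [p for p in paths if p.count(x) >= 2] for x in suspects}
    kept = [p for p in paths if all(p.count(x) < 2 for x in suspects)]
    return kept, t_cir

# Hypothetical tiny rule base with start X1 and goal X4.
paths = [
    ["X1", "X2", "X4"],
    ["X1", "X3", "X2", "X4"],        # uses X3 -> X2, no repeated node
    ["X1", "X2", "X3", "X2", "X4"],  # X2 occurs twice: circular strand
]
kept, t_cir = detect_circularity_part0(paths, ["X2", "X3"])
```

Only the third path is filtered out; both circular rules still appear among the kept paths' edges, which is exactly the failure mode discussed above.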
In this section we propose an algorithm which can completely remove circularity errors. This algorithm comprises three parts performed for each pair of circular nodes (Xi, Xj) found by the previous algorithm. It should be noted that all paths at this stage start from starting nodes, and there are no rules from which starting nodes are inferred. Thus, in performing the different parts of our Detect Circularity algorithm, we do not need to check positions 0 to 24 of the paths. That is, in all parts of the Detect Circularity algorithm, q is initially equal to one (q = 1). First, Detect Circularity Part 0 is performed. This algorithm separates the paths in which node Xi appears in at least two positions (i.e. circular nodes are found) in the strands and pours them into tube Ticir. Next, for instance, if we assume that three circular nodes X2, X3, and X4 are found, three pairs (X2, X3), (X2, X4), and (X3, X4) should be analyzed in the subsequent parts of our algorithm. Using k to represent the number of pairs of circular nodes, the other parts of our algorithm are presented in Table 1.
Detect Circularity Part 1: In this part of our algorithm, for each pair of circular nodes the algorithm is performed in parallel as follows. Three extra tubes (Tija, Tijb, Tijc) are necessary for each pair (Xi, Xj). Initially, k copies of Ticir are created as tubes Tij in parallel. Lines 4 to 9 are carried out until there is no strand left in tube Tij. At line 6, strands from Tij having node Xi at position q * 24 are extracted and poured into tube Tija. At line 7, strands from Tijb (all of which contain an occurrence of Xi) are extracted and poured into tube Tijc if they have Xj located after the first position of Xi. Strands in Tijc represent paths having Xj located between the first and last positions of Xi (strands in Tij include at least two occurrences of node Xi). Within the q-th iteration of the algorithm, line 5 checks whether Tijc contains any strands. If so, the algorithm has established that there exist paths in which Xj is located between the first and last positions of node Xi, and part one of the algorithm finishes; otherwise, it continues until tube Tij contains no strands. If, after performing this part, tube Tijc contains no strands, this means that there are no rules or chains of rules acting in reverse between these two circular nodes (which would otherwise cause circularity to remain in the rule base after performing Detect Circularity Part 0). Thus, by removing the strands in Ticir and Tjcir from tube T, the remaining paths contain no circularly dependent rules caused by these two nodes, and performing Detect Circularity Part 0 is enough to remove the circularity. The next part of the algorithm is performed only if tube Tijc contains any strand; in that case, the second part proceeds in parallel as follows.
Detect Circularity Part 2: In this part of our algorithm, for each pair of circular nodes the algorithm is performed in parallel as follows. Three extra tubes (Tjia, Tjib, Tjic) are necessary for each pair (Xi, Xj). Initially, k copies of Tjcir are created as tubes Tji in parallel. In this part, between lines 4 and 9, for all pairs of circular nodes in parallel, any strand in Tji which has a sub-strand representing Xi located between two positions of the sub-strand representing Xj is poured into tube Tjic. If there is no strand in Tjic, there are no rules or chains of rules acting in reverse between circular nodes Xi and Xj. In this situation, by removing the strands in tubes Ticir and Tjcir from T, we can be sure that there is no circularity error involving nodes Xi and Xj; otherwise, we put into practice the third part of our algorithm for each pair of circular nodes for which both previous parts have been fulfilled, as follows. Detect Circularity Part 3: In this part of the algorithm, four extra tubes are necessary for each pair (Xi, Xj) of circular nodes. Initially, k copies of T are generated as tubes Ti. In this final part of our Detect Circularity algorithm, for each pair of circular nodes (Xi, Xj), Xi or Xj is selected (here Xi is chosen). Then, between lines 4 and 9, all paths in which node Xj is located after Xi are extracted from Ti and poured into Ti2 in parallel. At the end of this part of our algorithm, the tubes Ti2 are merged into tube T2. We then extract the strands of T2 from tube T. Thus, all complete paths in T in which node Xj is located after node Xi are removed from T, and we make sure that after performing this part, there are no rules or chains of rules from Xi leading to Xj in tube T. Consequently, one of the rules or chains of rules causing circularity between Xi and Xj is removed. In the end, the resulting rule base will be free of any form of circularly dependent rules.
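At the node-sequence level, the core tests of Parts 1-3 reduce to simple position checks; a sketch, where first-occurrence semantics is a simplification of the position-q*24 scans described above:

```python
def xj_between(path, xi, xj):
    """Part 1/2 test: does xj occur strictly between the first and last
    occurrences of xi in this path? (Assumes xi occurs in the path.)"""
    first = path.index(xi)
    last = len(path) - 1 - path[::-1].index(xi)
    return xj in path[first + 1:last]

def part3_remove(paths, xi, xj):
    """Part 3: discard every complete path in which xi occurs before xj,
    cutting all chains of rules from xi leading to xj."""
    def xi_before_xj(p):
        return xi in p and xj in p and p.index(xi) < p.index(xj)
    return [p for p in paths if not xi_before_xj(p)]
```

Part 1 runs `xj_between` on the Ticir strands, Part 2 runs it with the roles of the two nodes swapped on the Tjcir strands, and only if both tests succeed does Part 3 prune the tube.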
In order to demonstrate the effectiveness of the algorithm, we perform it on the rule base shown in Figure 5. As stated above, in this rule base, rules {R5, R6, R7} cause circularity between nodes {X2, X3, X4}. After performing Detect Circularity Part 0, each tube Ticir contains the strands shown below.
These strands are removed from T. In order to clarify the procedure of our algorithm, we perform it for each pair of circular nodes {(X2, X3), (X2, X4), (X3, X4)} successively. In Detect Circularity Part 1, tubes T23, T24, and T34 are created for these pairs of circular nodes in parallel at line 3. Tubes T32, T42, and T43 are created for these pairs in parallel at line 3 of Detect Circularity Part 2. First consider nodes (X2, X3). By performing Detect Circularity Part 1, in T2cir (T23) we find X3 located between two positions of X2. Thus, Detect Circularity Part 2 is performed, and X2 is found to be located between two positions of X3 in T3cir (T32). Thus, there exist rules (chains of rules) acting in reverse between these nodes (i.e. {R5, (R6, R7)}). In Detect Circularity Part 3, we choose X2 and extract from T1 all strands in which X2 is located before X3, and pour them into T12. We remove these strands from T. Paths 2, 3, and 7 are removed from T, and the remaining paths are as follows.
1: R2 R8: X1 → X2 → X5
5: R3 R9: X1 → X4 → X5
6: R3 R7 R8: X1 → X4 → X2 → X5
9: R4 R10: X1 → X3 → X5
10: R4 R6 R9: X1 → X3 → X4 → X5
11: R4 R6 R7 R8: X1 → X3 → X4 → X2 → X5
Now, we perform the algorithm for nodes (X2, X4). We perform Detect Circularity Part 1 for node X2 and, in tube T2cir (T24), we find X4 located between two positions of node X2; in addition, in Detect Circularity Part 2 we find (in tube T4cir (T42)) X2 located between two positions of node X4. In Detect Circularity Part 3, we choose node X2 and extract from tube T2 all strands having X2 located before X4 in their sequence. These strands (if any exist) should be removed from T. Finally, we perform the algorithm for nodes (X3, X4). Since in Detect Circularity Part 1 we find paths (in tube T3cir (T34)) in which X4 is located between two positions of node X3, in addition to finding paths in which X3 is located between two positions of X4 in tube T4cir (T43) in Detect Circularity Part 2, we choose one of these nodes (here X4) and remove all strands in which X4 is located before X3. Consequently, the remaining paths and the directed graph made by them are shown in Figure 6.
As is obvious, rule R5 is removed and the circularity error is eliminated from the rule base. Consequently, there is no cycle among the rules in the resulting rule base. It should be noted that Detect Circularity Part 3 (in lines 5 and 6), depending on the selection of the node that is to be located before the other in the strands, extracts strands in which the selected circular node is located before the other circular node (for instance, X2 is located before X3 in our example). Therefore, at least one of the rule chains (rules) causing circularity between the circular nodes is removed, and at most, all the rule chains (rules) causing circularity between the circular nodes are removed (in our example, all rules {R5, R6, R7} at most). This removal depends on the node selection in Detect Circularity Part 3. However, all pairs of circular nodes which fulfill Detect Circularity Part 1 and Part 2 have distinct rules or chains of rules from starting nodes leading to each of them. Thus, our algorithm does not cause incompleteness in the rule base in any case.

Inconsistency Detection of Rule Bases
Conflicts are known conditions in the system. Thus, we define conflicting nodes as pairs (Xi, Xj). If there exists a physical state for which the rules resulting in conflicting nodes can be fired simultaneously, one can say that the rules are physically (practically) conflicting [4]. Our Detect Conflict algorithm is performed for each pair of conflicting nodes in parallel as follows (assuming k is the number of conflicting node pairs).

Detect Inconsistency
For each pair of conflicting nodes (Xi, Xj) in parallel, strands containing node Xi are extracted from Tz and poured into tube Tz+; the remainder are poured into tube Tz− (line 3). Strands containing Xj are extracted from Tz+ and poured into tube Tz2; the remainder into tube Tz1 (line 4). Strands containing Xj are extracted from Tz− and poured into Tz3; the remainder into Tz4 (line 5). Strands in Tz1 and Tz3 contain Xi and Xj within their sequences, respectively. Strands in tube Tz2 contain both conflicting nodes Xi and Xj in their chain and are invalid; these tubes are merged into tube Tr to be discarded. According to the definition of inconsistency [4], two rules resulting in inconsistent nodes (Xi, Xj) (and consequently their corresponding rule paths in tubes Tz1 and Tz3) are inconsistent if there exists a state such that both rules can be fired simultaneously. Although this is only a potential inconsistency, the possibility exists, and such rules should be further analyzed by domain experts. If there exists a physical state for which the rules can be fired simultaneously, they should be analyzed, modified, or one of them eliminated. Modification aims to arrive at pre-conditions (antecedents) for these rules that cannot be satisfied at the same time. Selecting a single rule to fire is the problem of the so-called conflict resolution mechanism [4], and conflict situations can be resolved with an appropriate inference control mechanism [4]. If one wants to keep both rules, this can be done by controlling the facts and the inference control so as to avoid generating inconsistency, by selecting only one of the rules (never firing the second rule if the other was fired, e.g. by a priority mechanism) [4]. If instead it is decided to eliminate one of the rules resulting in the inconsistent nodes (Xi, Xj), this can be done by removing the strands in one of these tubes (Tz1 or Tz3) from tube T, based on the domain experts' decision. In this way, inconsistency is eliminated and the resultant rule base is free of inconsistency errors.
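The extract-and-pour steps of Detect Inconsistency amount to a four-way partition of the strands. The sketch below is a hypothetical software analogue of lines 3-5 (the function name and list representation are ours, not the paper's), with the list `tube` playing the role of Tz.

```python
def partition_for_conflict(tube, xi, xj):
    """Simulate the Detect Inconsistency tube partition for one
    conflicting pair (xi, xj); returns the contents of Tz1..Tz4."""
    tz_plus  = [s for s in tube if xi in s]        # extracted into Tz+
    tz_minus = [s for s in tube if xi not in s]    # remainder into Tz-
    tz2 = [s for s in tz_plus  if xj in s]         # both nodes: invalid
    tz1 = [s for s in tz_plus  if xj not in s]     # contains only xi
    tz3 = [s for s in tz_minus if xj in s]         # contains only xj
    tz4 = [s for s in tz_minus if xj not in s]     # contains neither
    return tz1, tz2, tz3, tz4
```

Tz1 and Tz3 then hold the rule paths leading to Xi and Xj respectively, ready for the experts' analysis described above, while Tz2 collects the invalid strands to be discarded.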

Redundancy Detection of Rule Bases
According to the definition of redundancy, a rule is redundant with respect to the conclusion if two rules have identical conditions and conclusions (identical rules), or if two rules have identical conclusions while the condition of one rule is either a generalization or a special case of the condition of the other [3] [4]. The rule with the more general condition subsumes (is stronger than) the other rule. Suppose that the rule base has more than one starting node and that the starting nodes are logically independent, so that none of them implies another. For instance, consider the simple rule base and the directed graph established for it in Figure 7. Assume nodes X1, X2, X3, and X4 are starting nodes and X8 is the goal node.
The complete paths from the starting nodes leading to the goal node are as follows.
According to the definition of redundancy, paths 1, 2, and 4 are not redundant with respect to path 3. Thus, in order to maintain the completeness of the rule base, both rule paths 1 and 3 (rules R5 and R8) are required by the system and should remain in the rule base (since these rules are not subsumed by any other rule in this rule base). However, the algorithm proposed in [1] considers these rule paths redundant with respect to each other. In the subsequent section we propose an alternative. It should be noted that rules in the middle of the rule paths with atomic or compound ANs do not influence the redundancy detection procedure, even if there exist nodes in the middle of the paths that are inferred from distinct starting nodes. To make this clearer, take the rule base depicted in Figure 8(b) as an example. Assume that X1 and X2 are starting nodes and X7 is the goal node. The complete rule paths for this rule base are shown in Figure 8(b).
By performing the Detect Redundancy algorithms, rule paths (1, 2) and (3, 4) are redundant with respect to each other. Removing one of the redundant rule paths does not cause incompleteness in the rule base, because any system-required rule eliminated by removing one of the paths remains in the other rule paths. For the special case depicted in Figure 8(b), however, we must be careful not to eliminate redundant rule paths in a way that removes a system-required rule from the rule base as a result of eliminating redundant rule paths with different starting nodes (for example, the rules X3 Λ X4 → X7 and X4 Λ X3 → X7 would both be removed by eliminating rule paths 1 and 3).
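The key correction argued for above is that paths from distinct, logically independent starting nodes must never be flagged as redundant. A minimal software sketch of this grouping step (our own illustration, with a hypothetical function name; subsumption checking within a group is omitted):

```python
from collections import defaultdict

def redundancy_candidates(paths):
    """Group complete rule paths by (starting node, goal node). Only paths
    within the same group are candidates for redundancy; paths inferred
    from different starting nodes are never flagged, matching the
    behavior argued for above. Each path is a node sequence."""
    groups = defaultdict(list)
    for p in paths:
        groups[(p[0], p[-1])].append(p)
    # keep only groups holding more than one path to the same goal
    return {key: grp for key, grp in groups.items() if len(grp) > 1}
```

Under this grouping, two rule paths with the same conclusion but different starting nodes, such as paths 1 and 3 of Figure 7, land in different groups and are both retained.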

Conclusions
Different techniques have been developed to represent rule-based systems and to detect structural errors in them. Nazareth proposed an approach based on Petri nets to verify rule-based systems [13]. Zhang and Nguyen proposed a tool based on Pr/T nets to automatically detect potential errors in a rule-based system [23]. Agarwal and Tanniru utilized the incidence matrix of Petri nets for detecting structural errors in rule bases [14]. An approach based on hyper-graphs to verify rule-based systems was proposed by Ramaswamy et al. [10], which utilizes directed hyper-graphs to model a rule-based system's graph and transforms the hyper-graph into an adjacency matrix. He et al. [6] utilized a special class of low-level Petri nets (ω-nets) to detect structural errors in rule-based systems. In their approach, transitions represent rules, and places of the Petri net represent conditions and conclusions. Rules whose firing does not result in a marking of the ω-net are considered either redundant or circular; the algorithms presented in [6] do not distinguish between the redundancy and circularity problems. Algorithms based on DNA computing were proposed in [1] to detect the four structural errors in rule-based systems. That paper presents algorithms that are able to detect structural errors only for special cases with one starting node and one goal node. For a rule base that contains more than one starting node and goal node, structural errors are not removed correctly by those algorithms. By virtue of the special strand design that we use to encode starting nodes and goal nodes, all complete paths can be extracted from the initial solution by performing our Detect Incompleteness algorithm just once.
Thus, in the case of multiple starting nodes and goal nodes, there is no need to repeat the algorithm. Our algorithm efficiently removes all circularity errors in chains of rules in any form in which they may occur. In our approach, for each pair of inconsistent nodes (Xi, Xj), the rule paths that lead to these nodes are extracted from T and categorized into distinct tubes in parallel. These rule paths, and the rules resulting in the inconsistent nodes, should then be further analyzed, modified, or eliminated based on the domain experts' decision. If there exists a state for which the rules resulting in the inconsistent nodes (Xi, Xj) can be fired simultaneously, and it is decided that one of these rules should be removed, this can be done by removing the strands in one of the tubes containing Xi or Xj, again based on the domain experts' decision. The Detect Redundancy algorithm assumes starting nodes that are logically independent, with no implied relationship between them. Hence, two rules with the same conclusions that are inferred under different conditions (different starting nodes) are not considered redundant with respect to each other. The Detect Redundancy algorithm is also able to detect subsumed rule paths.
Efforts have been made to characterize DNA computation using traditional measures of complexity such as time and space. Most existing models determine the time complexity of DNA-based algorithms by counting the number of biological steps they take to solve the given problem. We use the strong model of DNA computation for parallel filtering models. This model assumes that a basic operator needs time dependent on the problem size, rather than constant time [16]. The operation times of some of the operators utilized in this paper are presented in [16]. For instance, Union(T, T1, …, Tn) and Copy(T, T1, …, Tn) take O(n) time rather than constant time, where n is the problem size.
Our algorithm comprises eight parts: Detect Completeness, Detect Circularity Parts 0 to 3, Detect Conflict, Detect Redundancy Part 1, and Detect Redundancy Part 2. We assume that the initial library (solution) is already constructed; issues concerning the construction of the initial library can be found in [16]. The operations of our algorithm are as follows. The Detect Completeness algorithm includes two Extract and one Remove operations and takes O(n) time. Detect Circularity Part 0 includes one Copy, (q − 1) Detect, 3(q − 1) Separate, (q − 1) Union, and one Remove operation, and takes O(n * q) time. Detect Circularity Part 1 consists of one parallel Copy, 2(q − 1) parallel Detect, 2(q − 1) parallel Separate, (q − 1) parallel Union, and one Remove operation, and takes O(n * q) time. The number of operations in Detect Circularity Part 2 is exactly the same as in Part 1, and it likewise takes O(n * q) time. Detect Circularity Part 3 consists of one Copy, (q − 1) parallel Detect, 3(q − 1) parallel Separate, (q − 1) parallel Union, one Remove, and one Union operation, and takes O(n * q) time. The Detect Conflict algorithm consists of one parallel Copy, three parallel Extract, and one parallel Union operation, and takes O(n) time. The Detect Redundancy Part 1 algorithm consists of one Copy and (K + 2) parallel Extract operations, and takes O(K * n) time, where K is the number of tubes TΛsk. The Detect Redundancy Part 2 algorithm consists of (2 * z) parallel Extract and z parallel Detect operations, and takes O(z * n) time, where z denotes the maximum number of distinct sub-strands in the tubes TΛsk. Finally, one Read operation is performed, which takes O(1) time.
Using the strong parallel model of DNA computation, and according to the above complexity analysis, the number of biological operations of our algorithm in the worst case is O(20q + K + 3z + 3), where q is the number of rules in the longest inference chain, K is the number of tubes {TΛs1, …, TΛsk}, and z denotes the maximum number of distinct sub-strands in the tubes TΛsk. If only Detect Circularity Parts 0 and 1 are performed, the complexity of our algorithm is O(10q + K + 3z − 1). The time complexity of our algorithm in all cases is O(n * Max{q, K, z}). The applicability of DNA computing to the verification of rule-based systems was first shown in [1]; however, the proposed algorithms were not general, and there are many cases in which the errors cannot be removed correctly by those algorithms. In this paper, the deficiencies of the algorithms presented in [1] are outlined, and we have proposed algorithms that eliminate these deficiencies. The proposed algorithms are able to detect the four structural errors in a rule base in any form in which they may occur. Our algorithms exhibit an entirely linear increase of computation, and for a rule base with n nodes the time complexity is O(n * Max{q, K, z}), almost the same as the time complexity achieved in [1]. The number of biological operations used in our algorithm in the worst case, in which all parts of the Detect Circularity algorithm are performed, is O(22q + K + 3z + 3). Our algorithm uses somewhat more operations than the algorithm proposed in [1]; however, this small number of added operations yields correct behavior across the different cases and thus strengthens the DNA computing approach to verifying rule-based systems. In the future, we plan to investigate the application of sparse Bayesian models to the classification of errors in rule-based systems [24]- [26]. Additionally, we plan to investigate the application of system dynamics modeling to the implementation and verification of rule-based systems [27].

Figure 1. Annealing and denaturing, and using annealing to form a long single-stranded DNA.
For instance, Figure 2(a) shows three distinct strands representing nodes.

Figure 2(b) shows two distinct strands representing the "Λ" operator and two distinct strands representing the "→" operator. The resulting strands are shown in Figure 2(c).

Figure 3. Initial set generation.

Pre-Step 3. To process the situation in which one of the compound ANs of a rule is the CN of another rule, a population of strands from T→ is poured into tube Tc. Splints in T→ bond to the strands mentioned above, and long double-stranded DNAs, which are subsets of rule chains, are formed. Pre-Step 4. At this stage, copies of "GGGG" and the strands from Ts are poured into Tc to form rule-chain subsets containing atomic ANs. Each long strand created at this step corresponds to one set of possible inference rule paths, which may contain any of the typical structural errors. Pre-Step 5. At this stage, the double-stranded DNAs from Tc are denatured and poured into tube T. Finally, all possible rule sequences are represented in T.
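The strand population that the pre-steps assemble in tube T corresponds to the set of all rule chains reachable from the starting nodes. The following sketch is a hypothetical software analogue of that enumeration (function name and representation are ours; it handles atomic antecedents only and bounds chain length so cycles terminate):

```python
def all_rule_paths(rules, start_nodes, max_len):
    """Enumerate the node sequences (rule chains) that the strand
    population in tube T would represent. `rules` maps an antecedent
    node to its consequent nodes; `max_len` bounds chain length."""
    paths = []
    def extend(path):
        paths.append(path)
        if len(path) >= max_len:
            return
        for nxt in rules.get(path[-1], []):
            # each extension mimics a splint joining two rule strands
            extend(path + [nxt])
    for s in start_nodes:
        extend([s])
    return paths
```

Because cyclic rule chains are included up to the length bound, the enumeration, like tube T itself, may contain strands exhibiting any of the four structural errors, which the detection algorithms then filter out.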