
As rule-based systems (RBS) technology gains wider acceptance, the need to create and maintain large knowledge bases assumes greater importance. Demonstrating that a rule base is free from error remains one of the obstacles to the adoption of this technology. Over the past several years, a substantial body of research has developed graphical techniques, such as Petri nets, to analyze structural errors in rule-based systems that employ propositional logic. Four typical errors in rule-based systems are redundancy, circularity, incompleteness, and inconsistency. Recently, a DNA-based computing approach to detecting these errors has been proposed. That paper presents algorithms which detect structural errors only for special cases: for a rule base containing multiple starting nodes and goal nodes, structural errors are not removed correctly by the algorithms proposed there, and the algorithms lack generality. In this study we present algorithms, based mainly on Adleman's operations, that detect structural errors in any form in which they may arise in a rule base. The potential of applying our algorithm is promising, given its operational time complexity of O(n*(Max{q, K, z})), in which n is the number of fact clauses; q is the number of rules in the longest inference chain; K is the number of tubes containing antecedents comprised of distinct numbers of starting nodes; and z denotes the maximum number of distinct antecedents comprised of the same number of starting nodes.

Adoption of expert systems in real-world applications has greatly increased. In past years, much effort has been devoted to analyzing different aspects of rule-based systems, such as knowledge representation, reasoning, and verification [

DNA computation has emerged in recent years as an exciting new research field at the intersection of computer science, biology, engineering, and mathematics. There exist two main barriers to the continued development of traditional silicon-based computers [

Authors in [

In this study, algorithms are developed to cope with all cases. Our algorithms are able to detect structural errors in any form in which they may occur in a rule base. As a result, DNA computing as an alternative for verifying structural errors in rule-based systems gains more generality. The remainder of this paper is organized as follows. Structural errors are briefly described in Section 2. In Section 3, we outline DNA computation and introduce our DNA-based algorithms to detect structural errors in rule-based systems. We analyze the complexity of our algorithm and present the conclusion in Section 4.

・ Redundancy. When unnecessary rules exist in the rule base, redundancy occurs. These rules not only increase the size of the rule base but also may cause additional useless inferences. Redundancy is a potential source of inconsistency when knowledge is updated [

・ Incompleteness. When there are missing rules in a rule base, incompleteness occurs. Except for rules representing facts and goal nodes, a rule is called useless if its condition (conclusion) cannot be matched by another rule's conclusion (condition). The unmatched conditions are called dangling conditions, while the unmatched conclusions are called dead-end conclusions. Useless rules are mostly due to missing rules.

・ Circularity. When two or more rules have a circular dependency, circularity occurs. Circularly dependent rules can cause infinite reasoning and must be broken.

・ Inconsistency. Since inconsistent rules result in conflicting facts, inconsistency must be resolved for the correct functioning of an expert system. Two rules r_{1} and r_{2} whose conclusions are not compatible are inconsistent if there exists a state in which the antecedents (pre-conditions) of both r_{1} and r_{2} can be fired simultaneously [
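To make the four error types concrete, here is a minimal in-silico sketch (ours, not the paper's DNA procedure) that checks a toy propositional rule base for identical rules, cycles, and dangling conditions; all rule and node names are invented for illustration.

```python
# Toy rule base: each rule maps a set of antecedent nodes to one conclusion.
rules = {
    "R1": ({"X1"}, "X2"),
    "R2": ({"X1"}, "X2"),   # identical to R1 -> redundancy
    "R3": ({"X2"}, "X3"),
    "R4": ({"X3"}, "X2"),   # R3 and R4 depend on each other -> circularity
    "R5": ({"X9"}, "X4"),   # X9 is never concluded -> dangling condition
}

def redundant(rules):
    """Pairs of rules with identical antecedents and conclusion."""
    seen, dup = {}, []
    for name, (ants, con) in rules.items():
        key = (frozenset(ants), con)
        if key in seen:
            dup.append((seen[key], name))
        seen.setdefault(key, name)
    return dup

def on_cycle(rules):
    """Nodes that can derive themselves via the implication graph."""
    graph = {}
    for ants, con in rules.values():
        for a in ants:
            graph.setdefault(a, set()).add(con)
    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            for n in graph.get(stack.pop(), ()):
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen
    return sorted(n for n in graph if n in reachable(n))

def dangling(rules, facts=frozenset({"X1"})):
    """Antecedents that no rule concludes and that are not given facts."""
    concluded = {con for _, con in rules.values()} | set(facts)
    return sorted(a for ants, _ in rules.values() for a in ants
                  if a not in concluded)

assert redundant(rules) == [("R1", "R2")]
assert on_cycle(rules) == ["X2", "X3"]
assert dangling(rules) == ["X9"]
```

Inconsistency needs a notion of incompatible conclusions (e.g. X and ¬X) and is omitted from this sketch.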

DNA (deoxyribonucleic acid) encodes the genetic information of cellular organisms [

In order to anneal the 5ʹ end of one single-stranded DNA to the 3ʹ end of another, we hybridize, in the presence of DNA ligase, a set of specific splint oligos of length 20. Each splint consists of the complement of the 10 nucleotides at the 3ʹ end of one strand and the complement of the 10 nucleotides at the 5ʹ end of the other.

Oligonucleotides uniquely encoding each node and splint are assigned. As proposed by Barich [

some constraints for strand library design, indicating that sequences must be designed so that strands have little secondary structure, in order to prevent unintended probe-library hybridization. Thus, short oligonucleotides uniquely encoding each node and splint must be used. The splints then join to their complementary sequences. Hence, with the defined polarity, the short single-stranded oligos representing nodes covalently join and create longer single-stranded DNA molecules. This procedure enables us to encode each rule path, including the appropriate nodes, in the form of a long single-stranded DNA.

We use the basic operations on strands that are defined by Adleman [

・ Separate (T, s, k, T_{on}, T_{off}): This operation separates strands that contain sequence “s” starting from position “k”, into T_{on}; otherwise, into T_{off} [

・ Extract (T, s, T^{+}, T^{−}): Given a tube T and a sub-strand “s”, this operation creates two new sets T^{+} and T^{−}, where T^{+} includes all strands in T containing “s”, and T^{−} includes all strands in T that do not contain “s” [

・ Union (T, T_{1}, T_{2}, …, T_{n}): This operation creates set T, which is the set union of the T_{1}, T_{2}, …, T_{n} [

・ Copy (T, T_{1}, T_{2}, …, T_{n}): This operation produces copies T_{1}, T_{2}, …, T_{n} of the set T [

・ Detect (T): Given a set T, this operation returns true (Y) if T contains at least one DNA strand; otherwise, it returns false (N) [

・ Read (T): This operation describes each DNA strand in set T [

・ Remove (T): This operation removes all strands in tube T [
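The tube operations above can be mimicked in software. The following sketch is our abstraction, with tubes modeled as multisets of strings; it is only meant to fix the semantics used by the algorithms below, not to suggest an implementation of the wet-lab steps.

```python
from collections import Counter

def extract(T, s):
    """Extract: split T into (T_plus, T_minus) by presence of sub-strand s."""
    T_plus, T_minus = Counter(), Counter()
    for strand, n in T.items():
        (T_plus if s in strand else T_minus)[strand] += n
    return T_plus, T_minus

def separate(T, s, k):
    """Separate: strands with s starting at position k go to T_on."""
    T_on, T_off = Counter(), Counter()
    for strand, n in T.items():
        (T_on if strand.startswith(s, k) else T_off)[strand] += n
    return T_on, T_off

def union(*tubes):
    """Union: merge the contents of several tubes."""
    out = Counter()
    for t in tubes:
        out.update(t)
    return out

def copy(T, n):
    """Copy: produce n copies of tube T."""
    return [Counter(T) for _ in range(n)]

def detect(T):
    """Detect: true iff the tube holds at least one strand."""
    return sum(T.values()) > 0

T = Counter({"ATTGCC": 2, "GGGTAC": 1})
plus, minus = extract(T, "TTG")
assert plus == Counter({"ATTGCC": 2}) and union(plus, minus) == T
on, _ = separate(T, "GGT", 1)
assert detect(on) and on == Counter({"GGGTAC": 1})
```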

For a general inference rule with a compound antecedent (R: (X_{1} Λ … Λ X_{n}) → Y), each antecedent node (AN) and conclusion node (CN) is encoded by a 20-mer DNA strand. To encode the relations “Λ” and “→”, two tetra-nucleotide sequences, “AAAA” and “CCCC” (and their complements), are used respectively. Thus, by creating 24-nucleotide-long splints containing the appropriate tetra-nucleotide, the relations “X_{i} Λ X_{j}” and “X_{i} → X_{j}” are enforced [. “X_{i} → X_{j}” is a splint whose sequence in the 3ʹ-5ʹ direction is the concatenation of the complement of the 10 nucleotides at the 3ʹ end of the strand for node X_{i}, the four nucleotides “CCCC”, and the complement of the 10 nucleotides at the 5ʹ end of the strand for X_{j}. Similarly, “X_{i} Λ X_{j}” is a splint whose sequence is the concatenation of the complement of the 10 nucleotides at the 3ʹ end of the strand X_{i}, the four nucleotides “AAAA”, and the complement of the 10 nucleotides at the 5ʹ end of the strand X_{j}. All strands representing starting nodes are designed so that they share the common sub-strand “TTTTTTTTTT” at their 5ʹ ends, and all strands representing goal nodes are designed so that they contain the common sub-strand “GGGGGGGGGG” at their 3ʹ ends. The sequences “AAAAAAAAAA” and “CCCCCCCCCC” are then needed in our algorithm to distinguish these nodes, as explained in Section 3.6.1. Finally, to ensure that strands representing the complements of “TTTTTTTTTT” and “GGGGGGGGGG” bond only to the starting nodes and goal nodes respectively and nowhere else, all strands representing the other nodes are designed so that they have no successive “G”s or “T”s at the 3ʹ or 5ʹ ends of their strands. For instance, tube T_{r1} is created in order to detect redundancy and circularity, as is tube T_{Λ}. In order to detect subsumed rules, special tubes T_{Λsk} are

created as described below. The antecedents of all rules with a compound AN comprised of k starting nodes are poured into tube T_{Λsk}. The strands are labeled k_{i}. For instance, T_{Λs1} comprises the antecedents of all rules with an atomic AN consisting of a starting node, labeled as 1_{i} strands (the i^{th} strand of tube T_{Λs1}), and T_{Λs2} comprises the antecedents of all rules with a compound AN of two starting nodes, labeled as 2_{i} strands. Tubes T_{Λsk}^{c}, comprised of many copies of the complements of the strands in tubes T_{Λsk}, are also created. These are used in the Detect Redundancy algorithm described in Section 3.6.4.
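The splint design described above can be sketched as follows; the two 20-mer node sequences are made up for illustration, and only the plain (antiparallel) complement is shown.

```python
COMP = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Base-wise Watson-Crick complement (no reversal, per the 3'-5' reading)."""
    return seq.translate(COMP)

def splint(x_i: str, x_j: str, relation: str) -> str:
    """24-nt splint: complement of the 3'-end 10-mer of x_i, the relation
    code ('AAAA' for conjunction, 'CCCC' for implication), then the
    complement of the 5'-end 10-mer of x_j."""
    code = {"AND": "AAAA", "IMPLIES": "CCCC"}[relation]
    return complement(x_i[-10:]) + code + complement(x_j[:10])

# Hypothetical 20-mer node encodings (illustrative only).
X1 = "ATCGGATACCTTAGCAGACA"
X2 = "CAGTCCATGGATTACAGGCA"

s = splint(X1, X2, "IMPLIES")
assert len(s) == 24 and s[10:14] == "CCCC"
```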

An essential difficulty in all filtering models is that the initial multisets of strands generally have a quantity that is exponential in the problem size [

Pre-Step 1. Tubes T_{Λ}, T_{→}, T_{r1}, T_{r2}, T_{r3}, T_{c}, and T_{s} are needed at this stage. Strands representing the splints “X Λ Y” and “X → Y” are poured into tubes T_{Λ} and T_{→} respectively. Starting nodes are not the conclusion of any rule, and goal nodes are not part of the antecedent (condition) of any rule. Thus, in order to eliminate such rules and prevent circularity involving starting nodes (in our sample rule base, X_{1}) and goal nodes (in our sample rule base, X_{6}), copies of strands designed as “GGGG” followed by the 10 nucleotides at the 5ʹ end of the starting nodes, and the 10 nucleotides at the 3ʹ end of the goal nodes followed by “GGGG”, are poured into tube T_{→}. Any splint in T_{→} that anneals to these strands is removed. We may have rules in which goal nodes are included in compound antecedents. In order to eliminate such rules, copies of strands designed as the 10 nucleotides at the 3ʹ end of the goal nodes followed by “TTTT”, and “TTTT” followed by the 10 nucleotides at the 5ʹ end of the goal nodes, are poured into tube T_{Λ}. Any splint in tube T_{Λ} that anneals to these strands is eliminated (assuming Y is a goal node, splints of the forms X_{i} Λ Y and Y Λ X_{i} are eliminated). Next, the CN of rules with a compound AN are poured into tube T_{r1}, and the strands representing the CN of each atomic AN are poured into tube T_{r2}. In order to identify which CN has more than one rule leading to it, the strands in T_{r2} are poured into tube T_{r1}, so that T_{r1} contains the CN of all rules. Then only one copy of the complement of each CN is poured into T_{r1}. Any CN in T_{r1} that does not anneal to its complement represents a CN with more than one rule leading to it and is poured into T_{r3}. Next, copies of all AN are poured into T_{c}.

Pre-Step 2. Splints in T_{Λ} and copies of the strand “TTTT” are poured into tube T_{c} and DNA ligation is allowed to occur. By means of the splints, each set of compound AN sticks together, creating DNA that is double-stranded along the length of the splint. Single strands are then separated from T_{c} into T_{s}.

Pre-Step 3. To handle the situation in which one of the compound AN of a rule is the CN of another rule, a population of strands from T_{→} is poured into tube T_{c}. Splints in T_{→} bond to the strands mentioned above, and long double-stranded DNAs, which are subsets of rule chains, are formed.

Pre-Step 4. At this stage, copies of “GGGG” and the strands from T_{s} are poured into T_{c} to form rule-chain subsets containing atomic AN. Each long strand created at this step corresponds to one possible inference rule path, which may contain any of the typical structural errors.

Pre-Step 5. At this stage, double-stranded DNAs from T_{c} are denatured and poured into tube T. Finally, all possible rule sequences are represented in T.
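The net effect of Pre-Steps 1-5 is that tube T holds one long strand per possible rule path. The following sketch reproduces that enumeration symbolically; it is our in-silico analogue of the wet-lab assembly, not part of the paper's protocol, and for concreteness it uses the rule names of the circularity example discussed later (atomic antecedents only).

```python
# Sample rule base (atomic antecedent -> conclusion), named as in the paper.
RULES = {
    "R2": ("X1", "X2"), "R3": ("X1", "X4"), "R4": ("X1", "X3"),
    "R5": ("X2", "X3"), "R6": ("X3", "X4"), "R7": ("X4", "X2"),
    "R8": ("X2", "X5"), "R9": ("X4", "X5"), "R10": ("X3", "X5"),
}

def rule_paths(rules, start, goal):
    """All chains from start to goal; each rule may fire once per chain
    (nodes may still repeat, which is how circular paths show up)."""
    paths = []
    def extend(node, chain):
        if node == goal:
            paths.append(chain)
            return
        for name, (ant, con) in rules.items():
            if ant == node and name not in chain:
                extend(con, chain + [name])
    extend(start, [])
    return paths

paths = rule_paths(RULES, "X1", "X5")
assert len(paths) == 12                      # the 12 complete paths in the text
assert ["R2", "R8"] in paths                 # X1 -> X2 -> X5
assert ["R4", "R6", "R7", "R8"] in paths     # X1 -> X3 -> X4 -> X2 -> X5
```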

In order to remove incomplete rule paths, which do not start with a starting node or do not lead to a goal node, the algorithm below is performed [

Detect completeness algorithm

1) Input (T);

2) Extract (T, “TTTTTTTTTT”, T_{y}, T_{incomp});

3) Extract (T_{y}, “GGGGGGGGGG”, T, T_{incomp});

4) Remove (T_{incomp}).

All strands containing at least one starting node in their sequence are extracted from T into tube T_{y}; otherwise, into tube T_{incomp} at line 2 (multiple starting nodes can be at the beginning of rule paths in the form of a compound antecedent of the first rule). This extract operation is carried out by pouring many copies of the strand “AAAAAAAAAA” into tube T. This strand only anneals to single strands containing “TTTTTTTTTT”. As explained in Section 3.4, only strands representing starting nodes are designed so that all of them have this sub-strand. At line 3, strands containing goal nodes are extracted from T_{y} and poured into tube T; otherwise, into tube T_{incomp}. This extract operation is carried out by pouring many copies of the strand “CCCCCCCCCC” into tube T. This strand only anneals to single strands containing “GGGGGGGGGG”. As explained in Section 3.4, only strands representing goal nodes are designed so that all of them have this sub-strand. At the end of the algorithm, strands in tube T_{incomp} represent incomplete rule paths and should be removed. By performing this algorithm just once, all complete rule paths with different starting nodes and goal nodes are extracted. Thus, there is no need to perform the algorithm repeatedly for all starting nodes and goal nodes. Assuming that X_{i} is a starting node and Y_{i} is a goal node, it should be noted that there is no splint to complement the 10 nucleotides at the 5ʹ end of X_{i}; therefore, in all strands containing X_{i}, it has to be located at the front of the strand (in the case of starting nodes in the form of a compound AN, there is no splint to complement the 10 nucleotides at the 5ʹ end of the first starting node). Similarly, there is no splint to complement the 10 nucleotides at the 3ʹ end of Y_{i}; thus, in all strands containing Y_{i}, it has to be located at the end of the strand.
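In our in-silico abstraction, the Detect Completeness algorithm amounts to two partitioning steps on the marker sub-strands; the strand contents below are invented for illustration.

```python
def extract(T, s):
    """Partition a list of strands by presence of sub-strand s."""
    return [x for x in T if s in x], [x for x in T if s not in x]

def detect_completeness(T):
    """Lines 2-4: keep strands carrying both markers, discard the rest."""
    T_y, incomp1 = extract(T, "TTTTTTTTTT")        # starting-node marker
    complete, incomp2 = extract(T_y, "GGGGGGGGGG")  # goal-node marker
    return complete  # incomp1 + incomp2 correspond to Remove(T_incomp)

strands = [
    "TTTTTTTTTTACGTGGGGGGGGGG",  # complete rule path
    "TTTTTTTTTTACGT",            # no goal node -> incomplete
    "ACGTGGGGGGGGGG",            # no starting node -> incomplete
]
assert detect_completeness(strands) == ["TTTTTTTTTTACGTGGGGGGGGGG"]
```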

The algorithm proposed in [ applies to the nodes in T_{r3}, which are likely to have the circularity problem. Let z represent the number of nodes in T_{r3}. We modify the algorithm presented in [

Detect Circularity Part 0

Tube T_{i}^{f} is composed of strands containing a goal node at position q*24, which therefore need not be compared. At the end of this algorithm, all rule chains in which node X_{i} appears at least twice in the strand are poured into tubes T_{i}^{cir}. We remove these strands from T. There are some cases in which this algorithm is unable to remove the circularity error, so that after applying the algorithm, the circularity error remains in the rule base. Suppose that, after performing the above algorithm, X_{i} and X_{j} are found to be circular nodes and there exist at least two paths between nodes X_{i} and X_{j}; more precisely, there exist two rules or chains of rules acting in reverse between these two nodes (e.g. X_{i} → X_{j} and X_{j} → X_{i}), besides one or more distinct chains of rules from a starting node (the starting nodes can be distinct) leading to nodes X_{i} and X_{j}, in addition to distinct chains of rules from these nodes leading to the goal node. In such a situation, this algorithm is unable to remove the circularity error. That is, by removing the rule paths which have at least two occurrences of nodes X_{i} and X_{j}, the circularity error is not removed. In order to clarify these situations, consider the simple rule base shown in . X_{1} and X_{4} are the starting and goal nodes respectively. The resulting directed graph made by these rules, and all the paths starting from node X_{1} and leading to X_{4}, are depicted in
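On symbolic paths, Detect Circularity Part 0 reduces to flagging every complete path in which some node occurs at least twice, as in this sketch (the node sequences are invented):

```python
def circular_paths(paths):
    """Map each node that repeats within a path to the indices of those
    paths (the in-silico analogue of the tubes T_i^cir)."""
    flagged = {}
    for idx, path in enumerate(paths):
        for node in set(path):
            if path.count(node) >= 2:
                flagged.setdefault(node, []).append(idx)
    return flagged

paths = [
    ["X1", "X2", "X3", "X4", "X2", "X5"],  # X2 appears twice -> circular
    ["X1", "X3", "X5"],                    # no repeats
]
assert circular_paths(paths) == {"X2": [0]}
```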

By performing Detect Circularity Part 0 for nodes X_{2} and X_{3}, strands number 5 and 6 are detected to have two occurrences of these nodes respectively. These strands are poured into tubes T_{1}^{cir} and T_{2}^{cir}. Next, we remove these strands from T. Now, if we establish the directed graph made by the rules embedded in the remaining paths, we see that removing paths 4 and 5 does not eliminate any of the rules causing the circularity error. That is, these rules (R_{4}: X_{2} → X_{3}, R_{5}: X_{3} → X_{2}) exist in other rule paths. Thus, the error remains and the algorithm is unable to remove it. As a matter of fact, this is the case for every rule base with rules or chains of rules acting in reverse between circular nodes, in addition to rules or chains of rules from a starting node leading to each of these circular nodes, and, from each circular node, paths (more precisely, chains of rules) leading to the goal node. Thus, by means of this algorithm, the circularity error cannot be eliminated in these cases. To make these statements clearer, another instance of a rule base and its corresponding directed graph is depicted in , with X_{1} and X_{5} the starting node and goal node respectively. As is obvious from the directed graph of this rule base, there exists a cycle in this rule base comprised of rules R_{5}, R_{6}, and R_{7}; these rules are circularly dependent. As in the previous section, we attempt to remove this error by means of the above algorithm. All the complete paths starting from node X_{1} and leading to X_{5} are as follows.

By executing the algorithm explained above for the nodes present in T_{r3} (X_{2}, X_{3}, and X_{4}), circular paths {4, 8, 12} are found, in which nodes X_{2}, X_{3}, and X_{4} appear twice respectively. We take into account the remaining paths and establish the directed graph constructed by the rules embedded in them (rule paths {1, 2, 3, 5, 6, 7, 9, 10, 11}). Obviously, the directed graph made by these rules is the same as the directed graph of the original rule base. We notice that all circularly dependent rules (R_{5}, R_{6}, R_{7}) still exist in the remaining paths (and consequently in the rule base) and none of them is removed. Thus, after performing the algorithm, the circularity error still exists in the rule base. It should be noted that, as is obvious from our examples, in such a situation there exist complete rule paths having two occurrences of circular node X_{i} in which circular node X_{j} is located between the two positions of X_{i}, and complete rule paths having two occurrences of circular node X_{j} in which circular node X_{i} is located between the two positions of X_{j} (in our example, paths 4, 8, and 12 have this property for nodes X_{2}, X_{3}, and X_{4}).

In this section we propose an algorithm which can completely remove circularity errors. This algorithm comprises three parts, performed for each pair of circular nodes (X_{i}, X_{j}) found by the previous

algorithm. It should be noted that all paths at this stage start from starting nodes, and there are no rules from which starting nodes are inferred. Thus, in performing the different parts of our Detect Circularity algorithm, we do not need to check positions 0 to 24 of the paths; that is, in all parts of the Detect Circularity algorithm, q is initially equal to one (q = 1). First, Detect Circularity Part 0 is performed. This algorithm separates the paths in which node X_{i} appears in at least two positions (i.e. circular nodes are found) and pours them into tube T_{i}^{cir}. Next, for instance, if we assume that three circular nodes X_{2}, X_{3}, and X_{4} are found, three pairs (X_{2}, X_{3}), (X_{2}, X_{4}), and (X_{3}, X_{4}) should be analyzed in the subsequent parts of our algorithm. Using k to represent the number of pairs of circular nodes, the other parts of our algorithm are represented in

Detect Circularity Part 1: In this part of our algorithm, for each pair of circular nodes, the algorithm is performed in parallel as follows. Three extra tubes (T_{ij}^{a}, T_{ij}^{b}, T_{ij}^{c}) are necessary for each pair (X_{i}, X_{j}). Initially, k copies of T_{i}^{cir} are created as tubes T_{ij} in parallel. Lines 4 to 9 are carried out until there is no strand left in tube T_{ij}. At line 6, strands from T_{ij} having node X_{i} at position q*24 are extracted and poured into tube T_{ij}^{a}. At line 7, strands from T_{ij}^{b} (all of which contain an occurrence of X_{i}) are extracted and poured into tube T_{ij}^{c} if they have X_{j} located after the first position of X_{i}. Strands in T_{ij}^{c} represent paths having X_{j} located between the first and last positions of X_{i} (strands in T_{ij} include at least two occurrences of node X_{i}). Within the q^{th} iteration of the algorithm, line 5 checks whether T_{ij}^{c} contains any strands. If so, the algorithm has established that there exist paths in which X_{j} is located between the first and last positions of node X_{i}, and part one of the algorithm finishes; otherwise, it continues until tube T_{ij} contains no strands. If tube T_{ij}^{c} contains no strands after performing this part, there are no rules or chains of rules acting in reverse between these two circular nodes (which is what would cause circularity to remain in the rule base after performing Detect Circularity Part 0). Thus, by removing the strands in T_{i}^{cir} and T_{j}^{cir} from tube T, the remaining paths contain no circularly dependent rules caused by these two nodes, and performing Detect Circularity Part 0 is enough to remove the circularity. If tube T_{ij}^{c} does contain strands, the second part of the algorithm is performed in parallel as follows.

Detect Circularity Part 2: In this part of our algorithm, for each pair of circular nodes, the algorithm is performed in parallel as follows. Three extra tubes (T_{ji}^{a}, T_{ji}^{b}, T_{ji}^{c}) are necessary for each pair (X_{i}, X_{j}). Initially, k copies of T_{j}^{cir} are created as tubes T_{ji} in parallel. In this part, between lines 4 and 9, for all pairs of circular nodes in parallel, any strand in T_{ji} that has the sub-strand representing X_{i} located between two positions of the sub-strand representing X_{j} is poured into tube T_{ji}^{c}. If there is no strand in T_{ji}^{c}, there are no rules or chains of rules acting in reverse between circular nodes X_{i} and X_{j}. In this situation, by removing the strands in tubes T_{i}^{cir} and T_{j}^{cir} from T, we can be sure that there is no circularity error involving nodes X_{i} and X_{j}; otherwise, we carry out the third part of our algorithm for each pair of circular nodes for which both previous parts have been fulfilled, as follows.

| Detect Circularity Part 1 | Detect Circularity Part 2 | Detect Circularity Part 3 |
|---|---|---|
| k: number of pairs of circular nodes in which X_{i} is the first node | k: number of pairs of circular nodes, defined in the previous part, in which X_{j} is the second node | k: number of pairs of circular nodes obtained from the previous parts |

Detect Circularity Part 3: In this part of the algorithm, four extra tubes are necessary for each pair (X_{i}, X_{j}) of circular nodes. Initially, k copies of T are generated as tubes T_{i}. In this final part of our Detect Circularity algorithm, for each pair of circular nodes (X_{i}, X_{j}), either X_{i} or X_{j} is selected (here X_{i} is chosen). Then, between lines 4 and 9, all paths in which node X_{j} is located after X_{i} are extracted from T_{i} and poured into T_{i2} in parallel. At the end of this part, the tubes T_{i2} are merged into tube T^{2}. We then extract the strands of T^{2} from tube T. Thus, all complete paths in T in which node X_{j} is located after node X_{i} are removed, ensuring that after this part there are no rules or chains of rules from X_{i} leading to X_{j} in tube T. Consequently, one of the rules or chains of rules causing circularity between X_{i} and X_{j} is removed. In the end, the resulting rule base is free of any form of circularly dependent rules.
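The pairwise logic of Parts 1-3 can be sketched on node sequences as follows: Part 1 asks whether some circular path has X_{j} between two occurrences of X_{i}, Part 2 asks the symmetric question, and Part 3 removes every complete path in which X_{j} appears after the chosen node X_{i}. The paths below are invented for illustration.

```python
def between_repeats(paths, outer, inner):
    """Parts 1/2: True if some path has `inner` strictly between the
    first and last occurrences of a repeated node `outer`."""
    for p in paths:
        if p.count(outer) >= 2:
            first = p.index(outer)
            last = len(p) - 1 - p[::-1].index(outer)
            if inner in p[first + 1:last]:
                return True
    return False

def remove_ordered(paths, xi, xj):
    """Part 3: drop every path in which xj occurs after xi."""
    keep = []
    for p in paths:
        if xi in p and xj in p[p.index(xi) + 1:]:
            continue
        keep.append(p)
    return keep

paths = [
    ["X1", "X2", "X3", "X5"],               # X3 after X2 -> removed
    ["X1", "X3", "X2", "X5"],               # kept
    ["X1", "X2", "X3", "X4", "X2", "X5"],   # circular in X2; X3 in between
]
assert between_repeats(paths, "X2", "X3")
assert remove_ordered(paths, "X2", "X3") == [["X1", "X3", "X2", "X5"]]
```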

In order to demonstrate the effectiveness of the algorithm, we perform it on the rule base shown in . Rules {R_{5}, R_{6}, R_{7}} cause circularity between nodes {X_{2}, X_{3}, X_{4}}. After performing Detect Circularity Part 0, each tube T_{i}^{cir} contains the strands shown below.

T_{2}^{cir} = {R_{1}R_{4}R_{5}R_{6}R_{7}}
T_{3}^{cir} = {R_{3}R_{5}R_{6}R_{4}R_{9}}
T_{4}^{cir} = {R_{2}R_{6}R_{4}R_{5}R_{8}}

These strands are removed from T. In order to clarify the procedure of our algorithm, we perform it for each pair of circular nodes {(X_{2}, X_{3}), (X_{2}, X_{4}), (X_{3}, X_{4})} successively. In Detect Circularity Part 1, tubes T_{23}, T_{24}, and T_{34} are created for these pairs of circular nodes in parallel at line 3. Tubes T_{32}, T_{42}, and T_{43} are created for these pairs in parallel at line 3 of Detect Circularity Part 2. First, consider nodes (X_{2}, X_{3}). By performing Detect Circularity Part 1 on T_{2}^{cir} (T_{23}), we find X_{3} located between two positions of X_{2}. Thus, Detect Circularity Part 2 is performed, and X_{2} is found to be located between two positions of X_{3} in T_{3}^{cir} (T_{32}). Thus, there exist rules (chains of rules) acting in reverse between these nodes (i.e. {R_{5}, (R_{6}, R_{7})}). In Detect Circularity Part 3, we choose X_{2}, extract all strands in which X_{2} is located before X_{3} from T_{1}, and pour them into T_{12}. We remove these strands from T. Paths 2, 3, and 7 are removed from T and the remaining paths are as follows.

1: R_{2}R_{8}: X_{1}→X_{2}→X_{5}

5: R_{3}R_{9}: X_{1}→X_{4}→X_{5}

6: R_{3}R_{7}R_{8}: X_{1}→X_{4}→X_{2}→X_{5}

9: R_{4}R_{10}: X_{1}→X_{3}→X_{5}

10: R_{4}R_{6}R_{9}: X_{1}→X_{3}→X_{4}→X_{5}

11: R_{4}R_{6}R_{7}R_{8}: X_{1}→X_{3}→X_{4}→X_{2}→X_{5}

Now, we perform the algorithm for nodes (X_{2}, X_{4}). We perform Detect Circularity Part 1 for node X_{2} and, in tube T_{2}^{cir} (T_{24}), find X_{4} located between two positions of node X_{2}; likewise, in Detect Circularity Part 2, we find (in tube T_{4}^{cir} (T_{42})) X_{2} located between two positions of node X_{4}. In Detect Circularity Part 3, we choose node X_{2} and extract from tube T_{2} all strands having X_{2} located before X_{4} in their sequence. These strands (if any exist) should be removed from T. Finally, we perform the algorithm for nodes (X_{3}, X_{4}). In Detect Circularity Part 1 we find paths (in tube T_{3}^{cir} (T_{34})) in which X_{4} is located between two positions of node X_{3}, and in Detect Circularity Part 2 we find paths in which X_{3} is located between two positions of X_{4} in tube T_{4}^{cir} (T_{43}). We therefore choose one of these nodes (here, X_{4}) and remove all strands in which X_{4} is located before X_{3}. Consequently, the remaining paths and the directed graph made by them are shown in

As is obvious, rule R_{4} is removed and the circularity error is eliminated from the rule base; consequently, there is no cycle among the rules in the resultant rule base. It should be noted that Detect Circularity Part 3 (in lines 5 and 6), depending on the selection of which node is located before the other in the strands, extracts the strands in which the selected circular node is located before the other circular node (for instance, X_{2} is located before X_{3} in our example). Therefore, at least one of the rule chains (rules) causing circularity between the circular nodes is removed and

at most, all of the rule chains (rules) causing circularity between the circular nodes are removed (in our example, all of the rules {R_{4}, R_{5}, R_{6}} at most). This removal depends on the node selection in Detect Circularity Part 3. However, all pairs of circular nodes which fulfill Detect Circularity Part 1 and Part 2 have distinct rules or chains of rules from starting nodes leading to each of them. Thus, our algorithm does not cause incompleteness in the rule base in any case.

Conflicts are known conditions in the system. Thus, we define conflicting nodes as pairs (X_{i}, X_{j}). If there exists a physical state in which the rules resulting in conflicting nodes can be fired simultaneously, one can say that the rules are physically (practically) conflicting [

Detect Inconsistency

For each pair of conflicting nodes (X_{i}, X_{j}) in parallel, strands containing node X_{i} are extracted from T_{z} and poured into tube T_{z}^{+}; otherwise, into tube T_{z}^{−} (line 3). Strands containing X_{j} are extracted from T_{z}^{+} and poured into tube T_{z2}; otherwise, into tube T_{z1} (line 4). Strands containing X_{j} are extracted from T_{z}^{−} and poured into T_{z3}; otherwise, into T_{z4} (line 5). Strands in T_{z1} and T_{z3} contain X_{i} and X_{j} within their sequences respectively. Strands in tube T_{z2} contain both conflicting nodes X_{i} and X_{j} in their chain and are invalid. These tubes are merged into tube T_{r} to be discarded. According to the definition of inconsistency [ , conflicting nodes (X_{i}, X_{j}) (and consequently their corresponding rule paths in tubes T_{z1}, T_{z3}) are inconsistent if there exists a state such that both rules can be fired simultaneously. Although this is a potential inconsistency, such a possibility exists, and these kinds of rules should be further analyzed by domain experts. If there exists a physical state in which the rules can be fired simultaneously, they should be analyzed, modified, or one of them eliminated. Modification is carried out to arrive at pre-conditions (antecedents) for these rules that cannot be fired at the same time. It is the task of the so-called conflict-resolution mechanism to select a single rule to be fired [ . For each pair (X_{i}, X_{j}), this can be done by removing the strands in one of these tubes (the strands in tube T_{z1} or T_{z3}) from tube T, based on the domain experts' decision. In this way, inconsistency can be eliminated and the resultant rule base will be free of the inconsistency error.
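On symbolic paths, the partitioning performed by Detect Inconsistency looks as follows; the tube names in the comments match the text, while the node and path contents are invented.

```python
def detect_inconsistency(paths, xi, xj):
    """Partition complete paths for one pair of conflicting nodes:
    paths with only xi (T_z1), only xj (T_z3), or both (T_z2, invalid)."""
    with_xi = [p for p in paths if xi in p and xj not in p]   # T_z1
    with_xj = [p for p in paths if xj in p and xi not in p]   # T_z3
    invalid = [p for p in paths if xi in p and xj in p]       # T_z2
    return with_xi, with_xj, invalid

paths = [
    ["X1", "Xi", "X5"],
    ["X1", "Xj", "X5"],
    ["X1", "Xi", "Xj", "X5"],   # carries both conflicting nodes -> invalid
]
t_z1, t_z3, t_z2 = detect_inconsistency(paths, "Xi", "Xj")
assert t_z2 == [["X1", "Xi", "Xj", "X5"]]
assert len(t_z1) == 1 and len(t_z3) == 1
```

Whether the paths left in T_{z1} and T_{z3} are actually inconsistent still requires the domain-expert check described above; the code only performs the tube partitioning.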

According to the definition of redundancy, a rule is redundant with respect to its conclusion in the event that two rules have identical conditions and conclusions (identical rules), or two rules have identical conclusions while the condition of one rule is either a generalization or a special case of the condition of the other [ . In the rule base considered here, X_{1}, X_{2}, X_{3}, and X_{4} are starting nodes and X_{8} is the goal node.

Complete paths from the starting nodes leading to the goal node are as follows.

According to the definition of redundancy, paths 1, 2, and 4 are not redundant with respect to path 3. Thus, in order to maintain the completeness of the rule base, both rule paths 1 and 3 (rules R_{5} and R_{8}) are required by the system and should remain in the rule base (since these rules are not subsumed by any other rule in this rule base). However, the algorithm proposed in [

algorithm to detect redundancy, in which we take starting nodes into account and assume independence of starting nodes when detecting the redundancy error. Our algorithm finds redundant rules in the form of identical and subsumed rules as follows. The Detect Redundancy algorithm is represented in

Initially, z copies of tube T are generated as tubes T_{i}. At line 2, in parallel for each redundant node, we extract all strands containing the redundant node X_{i}. Next, strands containing “→X_{i}” are extracted from T_{i}^{+} and poured into tube T_{i}^{Red}; otherwise, into tube T_{i}^{Req} (line 3). It should be noted that if a rule path contains the redundant node X_{i} but lacks a node that directly infers X_{i}, then X_{i} must be part of a compound AN in the rule path. Thus, this rule path needs to exist when considering redundant node X_{i} and is poured into tube T_{i}^{Req} (line 3). Next, we categorize the strands in tube T_{i}^{Red} by the length of the antecedent of the first rule in each strand. To do so, many copies of the strands representing the complements of the strands in tube T_{Λsk} are poured into tube T_{i}^{Red}. For correct extraction of strands, this line is first carried out for the tube containing paths with the longest AN of first rules, and is repeated down to the tube containing paths with the shortest AN of first rules. Any strand in tube T_{i}^{Red} that anneals to the above strands is extracted and poured into tube T_{ik}^{Red} (tube T_{Λsk} is comprised of strands representing k starting nodes in conjunction form). Thus, the AN of the first rule of each strand in tube T_{ik}^{Red} is comprised of k starting nodes, if any strand of this form exists.

Strands in each tube T_{ik}^{Red} whose first-rule ANs are the same (or permutations of one another) are redundant with each other (the antecedent of the first rule in every strand in tube T_{ik}^{Red} has k starting nodes). The remaining strands in tubes T_{ik}^{Red} represent rule paths with distinct starting nodes, or rule paths in which at least one starting node of the first rule differs from those of the other rule paths. These strands are not redundant with each other.

In order to determine subsumed rules, we compare strands in each tube T_{ik}^{Red} with strands in all other tubes {T_{ij}^{Red}, …}, for all j > k, generated by Detect Redundancy Part 1. Our Detect Redundancy Algorithm Part 2 is described below. For each tube T_{ik}^{Red}, this algorithm is performed in parallel as follows. Each strand in tube T_{Λsk} is denoted by k_{i}; thus, k_{z} denotes strand number z of tube T_{Λsk} (explained in Section 3.4). At line 3, strands containing k_{z} are extracted from T_{ik}^{Red} and poured into tube T_{ik}^{+}. At line 4, if there exists any strand in tube T_{ik}^{+}, the first rule of this strand subsumes all rule paths in tubes T_{ij}^{Red} (j > k) which contain k_{z}. Thus, all rule paths containing k_{z} are extracted from all tubes T_{ij}^{Red} (j > k) in parallel and poured into tubes T_{ij}R^{+} (line 5). Consequently, all strands in tubes T_{ij}R^{+} are subsumed rule paths.

In order to show the effectiveness of our algorithm, we perform it for the rule base depicted in . X_{6} has more than one rule leading to it; thus, it is a candidate for redundancy error. Paths {1, 2, 3, 4} contain the sub-strand “→X_{6}” and are extracted from T_{6} and poured into T_{6}^{Red} by Detect Redundancy Part 1. Next, strands in tube T_{6}^{Red} which contain strands in tube T_{Λs3} (i.e., X_{1} Λ X_{2} Λ X_{4}) are extracted from T_{6}^{Red} and poured into tube T_{63}^{Red}. At the next iteration of the while-loop, strands in tube T_{6}^{Red} which contain sub-strands in tube T_{Λs2} (i.e., X_{1} Λ X_{2} and X_{2} Λ X_{3}) are extracted from T_{6}^{Red} and poured into tube T_{62}^{Red}. Finally, strands containing the sub-strand in tube T_{Λs1} (i.e., X_{1}) are extracted and poured into tube T_{61}^{Red}. In Detect Redundancy Part 2, lines 3 to 5 are performed in parallel for all tubes T_{61}^{Red}, T_{62}^{Red}, and T_{63}^{Red}. We describe the process for tube T_{61}^{Red}.
At line 3, strands in tube T_{61}^{Red} which contain the sub-strand in tube T_{Λs1} (X_{1}) are extracted from T_{61}^{Red} and poured into tube T_{61}^{+}. Since tube T_{61}^{+} is not empty, line 5 of the algorithm is performed, and strands containing the sub-strand X_{1} are extracted from T_{62}^{Red} and T_{63}^{Red} and poured into tubes T_{62}R^{+} and T_{63}R^{+}, respectively. Finally, the strands in tubes T_{62}R^{+} and T_{63}R^{+} are redundant (subsumed) rule paths. It should be noted that the algorithm is performed for all tubes T_{ik}^{Red} (in our example, T_{61}^{Red}, T_{62}^{Red}, and T_{63}^{Red}). Thus, in lines 2 to 5, the algorithm will find any other redundant rule paths, if they exist, by checking tubes T_{62}^{Red} and T_{63}^{Red}. The process of our algorithm for the example explained above is depicted in
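The subsumption step can be simulated in the same abstract style. In the sketch below (our own representation, with a rule path as a list of (antecedent frozenset, consequent) pairs), the buckets T_{ik}^{Red} are visited in ascending order of k: if some path fires from a set of starting nodes k_{z}, every path in a higher bucket whose first-rule antecedent contains k_{z} is reported as subsumed.

```python
def detect_redundancy_part2(t_ik_red):
    """Sketch of Detect Redundancy Part 2: collect subsumed rule paths.
    t_ik_red maps k -> list of rule paths whose first-rule antecedent
    consists of k starting nodes (the tubes T_ik^Red)."""
    subsumed = []
    ks = sorted(t_ik_red)                     # ascending k, as in the algorithm
    for idx, k in enumerate(ks):
        # The distinct first-rule antecedents play the role of the
        # strands k_z of tube T_Lambda_sk (lines 2-3).
        for k_z in {p[0][0] for p in t_ik_red[k]}:
            # Lines 4-5: paths in tubes T_ij^Red (j > k) containing k_z
            # are extracted into tubes T_ij^R+ -- they are subsumed.
            for j in ks[idx + 1:]:
                for p in t_ik_red[j]:
                    if k_z <= p[0][0] and p not in subsumed:
                        subsumed.append(p)
    return subsumed
```

With hypothetical buckets holding a path starting from X_{1} (k = 1) and a path starting from X_{1} Λ X_{2} (k = 2), the latter is reported as subsumed by the former.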

[Table: pseudocode of Detect Redundancy Algorithm Part 1 and Detect Redundancy Algorithm Part 2. Note: k = number of the last tube from tubes T_{Λsk}, arranged in ascending order of k.]

It should be noted that rules in the middle of rule paths, whether with atomic or compound ANs, do not influence the redundancy detection procedure, even if there exist nodes in the middle of the paths that are inferred from distinct starting nodes. To make this clearer, consider the rule base depicted in , in which X_{1} and X_{2} are starting nodes and X_{7} is the goal node. The complete rule paths for this rule base are shown in

By performing the Detect Redundancy algorithms, rule paths (1, 2) and (3, 4) are found to be redundant with each other. Removing one of the redundant rule paths does not introduce incompleteness into the rule base, because any system-required rule eliminated by removing one of the paths remains in the other rule paths. For the special case depicted in , rules X_{3} Λ X_{4} → X_{7} and X_{4} Λ X_{3} → X_{7} will be removed by the elimination of rule paths 1 and 3.
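This completeness argument can be checked mechanically. The sketch below is our own illustration: representing each rule as an (antecedent frozenset, consequent) pair makes X_{3} Λ X_{4} → X_{7} and X_{4} Λ X_{3} → X_{7} compare equal, and a redundant path is safe to remove exactly when each of its rules still occurs in some remaining path.

```python
def safe_to_remove(paths, idx):
    """True iff removing paths[idx] loses no rule: every rule of the
    removed path still occurs in another path. Antecedents are
    frozensets, so permuted antecedents count as the same rule."""
    rest = [rule for i, p in enumerate(paths) if i != idx for rule in p]
    return all(rule in rest for rule in paths[idx])

# Hypothetical redundant pair: both paths derive X7 via the same rules.
paths = [
    [(frozenset({"X1"}), "X3"), (frozenset({"X3", "X4"}), "X7")],
    [(frozenset({"X1"}), "X3"), (frozenset({"X4", "X3"}), "X7")],
]
# safe_to_remove(paths, 0) is True: every rule of path 0 survives in path 1.
```

By contrast, for two paths that share no rules, removing either one would lose a system-required rule and `safe_to_remove` returns False.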

Different techniques have been developed in order to represent rule-based systems and detect structural errors in them. Nazareth proposed an approach based on Petri nets to verify rule-based systems [

Thus, in the case of multiple starting nodes and goal nodes, there is no need to repeat the algorithm. Our algorithm efficiently removes all circularity errors in chains of rules, in any form in which they may occur. In our approach, for each pair of inconsistent nodes (X_{i}, X_{j}), the rule paths which lead to these nodes are extracted from T and categorized into distinct tubes in parallel. These rule paths, and the rules resulting in the inconsistent nodes, should then be further analyzed, modified, or eliminated based on the domain experts' decision. If there exists a state in which the rules resulting in the inconsistent nodes (X_{i}, X_{j}) can be fired simultaneously, and it is decided that one of the rules resulting in these nodes should be removed, this can be done by removing the strands in one of the tubes containing X_{i} or X_{j}, again based on the domain experts' decision. The Detect Redundancy algorithm considers starting nodes to be logically independent, with no implied relationship between them. Hence, two rules with the same conclusions that are inferred under different conditions (different starting nodes) are not considered redundant with each other. Moreover, the Detect Redundancy algorithm is able to detect subsumed rule paths.

Efforts have been made to characterize DNA computation using traditional measures of complexity such as time and space. Most existing models determine the time complexity of DNA-based algorithms by counting the number of biological steps it takes to solve the given problem. We use the strong model of DNA computation for parallel filtering models. This model considers that a basic operation needs time dependent on the problem size, rather than constant time, to be carried out [ ]; for example, multi-tube operations such as Copy(T, T_{1}, …, T_{n}) take O(n) time rather than constant time, where n is the problem size.
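To illustrate this accounting (a sketch under our own assumptions, not the paper's exact operation trace), one can charge each multi-tube operation a cost proportional to the number of tubes it touches, and each single-tube operation a unit cost:

```python
def strong_model_cost(trace):
    """Strong-model cost of a sequence of biological operations.
    trace: list of (operation_name, number_of_tubes) pairs.
    Single-tube operations (Detect, Read) cost 1; multi-tube operations
    (e.g. Copy over n tubes, parallel Extract) cost n."""
    single_tube = {"detect", "read"}
    return sum(1 if op in single_tube else n_tubes
               for op, n_tubes in trace)

# Illustrative trace: one Copy over 5 tubes, one parallel Extract over
# 5 tubes, one Detect, one Read -> 5 + 5 + 1 + 1 = 12 time units.
trace = [("copy", 5), ("extract", 5), ("detect", 1), ("read", 1)]
```

Under the weak (unit-cost) model the same trace would cost only 4 steps; the strong model is the more conservative measure used in the analysis below.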

Our algorithm comprises eight parts: Detect Completeness; Detect Circularity Parts 0 to 3; Detect Conflict; and Detect Redundancy Parts 1 and 2. We assume that the initial library (solution) is already constructed; issues concerning the construction of the initial library can be found in [ ]. The Detect Redundancy Part 2 algorithm consists of (2*z) parallel Extract operations and z parallel Detect operations and takes O(z*n) time, in which z denotes the maximum number of distinct sub-strands in tubes T_{Λsk}. In the end, one Read operation is performed, which takes O(1) time.

Using the strong parallel model of DNA computation, and according to the above complexity analysis, the number of biological operations of our algorithm in the worst case is O(20q + K + 3z + 3), in which q is the number of rules in the longest inference chain, K is the number of tubes {T_{Λs1}, …, T_{Λsk}}, and z denotes the maximum number of distinct sub-strands in tubes T_{Λsk}. If only Detect Circularity Parts 0 and 1 are performed, the complexity of our algorithm is O(10q + K + 3z − 1). Since each operation takes time proportional to the problem size n under the strong model, the time complexity of our algorithm in all cases is O(n*(Max{q, K, z})). The applicability of DNA computing to the verification of rule-based systems was first shown in [