Applying surface-based DNA computing for solving the dominating set problem

Surface-based DNA computing is a DNA computing method that uses DNA strands immobilized on a solid surface. In this paper, we applied surface-based DNA computing to solve the dominating set problem. In the first step, a surface-based DNA solution space was constructed from appropriate DNA strands. Then, by applying a parallel DNA algorithm, the dominating set problem was resolved in polynomial time.


INTRODUCTION
DNA molecules are the genetic material of organisms and are located in the cell nucleus. The unique and specific structure of DNA makes it one of the favorite candidates for computing purposes. In comparison with traditional electronics-based computers, DNA computers are massively parallel. Although a single DNA molecule carries out computation slowly, DNA computers can perform a very large number of calculations simultaneously. A DNA computer can perform 10^9 calculations per mL of DNA per second.
DNA computing was initially developed by Leonard Adleman in 1994 [1]. Adleman succeeded in solving a seven-point Hamiltonian path problem solely by manipulating DNA molecules and suggested that DNA could be used to solve complex mathematical problems.
Surface-based DNA computing was introduced by Liu et al. in 1996 [2]. This model uses DNA molecules attached to a solid surface, instead of DNA molecules floating in a solution. This method greatly reduces losses of DNA molecules during the different steps of computation.
In this paper, we applied the surface-based model to solve the dominating set problem, which is one of the NP-complete problems. The dominating set problem is widely used in network routing, city planning, and the design and construction of health services in appropriate places. The paper is organized as follows. Section 2 introduces the DNA structure and various DNA computing models, and discusses surface-based DNA computing and the biological operations used in the surface-based model. Section 3 introduces a DNA-based algorithm for solving the dominating set problem in the surface-based model.
DNA computing was initially developed by Leonard Adleman in 1994 [1]. Adleman resolved an instance of the Hamiltonian path problem just by handling DNA molecules. In 1995, Lipton [3] presented a method for solving the satisfiability (SAT) problem. The Adleman-Lipton model can be used to solve different NP-complete problems. In the Adleman-Lipton model, DNA splints are used for construction of the solution space. Adleman [4,5] also presented a molecular algorithm for solving the 3-coloring problem. Chang and Guo [6][7][8] showed that the DNA operations in the Adleman-Lipton model could be used for developing DNA algorithms to resolve the dominating set problem, the vertex cover problem, the maximal clique problem and the independent set problem.

BASICS OF DNA COMPUTING
The surface-based model was introduced by Liu et al. in 1996 [2]. This model uses DNA molecules attached to a solid surface, instead of DNA molecules floating in a solution. Liu et al. also proposed a surface-based DNA algorithm for solving the satisfiability problem.
In 1996, Roweis et al. [9] introduced the sticker-based DNA computing model and applied it to solving the minimal set cover problem. Perez-Jimenez and Sancho-Caparrini [10] used sticker-based DNA computing to resolve the knapsack problem, and this model was also applied to breaking the Data Encryption Standard (DES) [11,12]. In our previous work, we also applied the sticker model to solving the independent set problem [13].
Besides the Adleman-Lipton, surface-based and sticker-based models, various other models have been proposed in DNA computing. Ouyang et al. [14] solved the maximal clique problem using DNA molecules and restriction endonuclease enzymes. Amos et al. [15,16] described a DNA computation model using restriction endonuclease enzymes instead of successive cycles of separation by DNA hybridization, which can reduce the error rate of computation. Hagiya et al. [17] proposed a new method of DNA computing that involves a self-acting DNA molecule containing the input, the program, and the working memory. In this method, a single-stranded DNA molecule consists of an input segment on the 5'-end, followed by a formula (program) segment, followed by a spacer, and finally a "head" on the 3'-end that moves and performs the computation. Another method for DNA computation is "computation by self-assembly". Winfree et al. [18][19][20] introduced linear and two-dimensional self-assembly models.
Computing by blocking was introduced by Rozenberg et al. [21]. This model uses a novel approach to filter the DNA molecules: instead of separating the DNA strands into distinct tubes, or destroying and removing the DNA molecules that do not contribute to finding a solution, it blocks (inactivates) them in such a way that the blocked strands can be considered non-existent during the subsequent steps of computation.

General Aspects of Surface-Based Model
The surface-based model was introduced by Liu et al. in 1996 [2]. This model uses DNA molecules attached to a solid surface, instead of DNA molecules floating in a solution. The solution set of DNA strands is initially attached to a surface (for example, glass, silicon, or gold). The immobilized DNA strands are then subjected to biological operations such as hybridization or exonuclease degradation, in order to extract the desired DNA strands. This model greatly reduces losses of DNA molecules during the different steps of computation. Briefly, the basic operations in the surface-based model are as follows: selectively mark strands, destroy either marked or unmarked strands, and unmark all marked strands. Another feature of the surface-based model is the use of single-base mismatch discrimination in hybridization as a basis for selectively marking DNA strands, which allows a high density of information per nucleotide.

Biological Operations in Surface-Based Model
For simplicity, let the solution space be the set S of binary strings of length n. The following operations may be performed on S [2].
1) Mark (i, b): this marks all strings in which the i-th bit has value b. This operation is performed by annealing specific probes to the desired DNA strands.
3) Destroy-marked: removes all marked strands (double-stranded DNA molecules) from the solution space. This is performed by specific enzymes which selectively destroy double-stranded DNA molecules.
4) Destroy-unmarked: removes all unmarked strands (single-stranded DNA molecules) from the solution space. This is performed by specific enzymes which selectively destroy single-stranded DNA molecules.
5) Unmark: this unmarks all marked strands in the solution space (it dissociates the probes from the immobilized DNA strands and converts double-stranded molecules back to single-stranded DNA molecules).
6) Test-if-empty: this operation determines whether the set S is empty or not. It is usually executed at the end of the computation.
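The operations listed above can be modelled in software as set manipulations. The following Python sketch is only an in-silico illustration, not a description of the laboratory procedure: the class name `Surface` and its method names are our own hypothetical choices, each strand is represented as a tuple of bits, and bit positions are 1-indexed as in Mark (i, b).

```python
from itertools import product


class Surface:
    """Toy model of a surface; each immobilized strand is a tuple of bits."""

    def __init__(self, strands):
        self.strands = set(strands)  # strands attached to the surface
        self.marked = set()          # strands currently hybridized to a probe

    def mark(self, i, b):
        """Mark (i, b): mark every strand whose i-th bit (1-indexed) is b."""
        self.marked |= {s for s in self.strands if s[i - 1] == b}

    def destroy_marked(self):
        """Destroy-marked: remove all marked (double-stranded) strands."""
        self.strands -= self.marked
        self.marked = set()

    def destroy_unmarked(self):
        """Destroy-unmarked: remove all unmarked (single-stranded) strands."""
        self.strands &= self.marked

    def unmark(self):
        """Unmark: dissociate all probes; the strands themselves are kept."""
        self.marked = set()

    def test_if_empty(self):
        """Test-if-empty: report whether any strand is left on the surface."""
        return not self.strands


# Usage: keep only the 3-bit strings whose first bit is 1.
surface = Surface(product((0, 1), repeat=3))
surface.mark(1, 1)
surface.destroy_unmarked()
surface.unmark()
print(sorted(surface.strands))
```

Repeated calls to `mark` before a destroy step accumulate marks, which mirrors the way several Mark operations can precede a single Destroy-unmarked.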

Designing of Appropriate DNA Strands and Construction of Solution Space
First of all, it is essential to represent all possible binary strings of length n as DNA strands. To synthesize DNA strands attached to a solid surface, the desired DNA molecule is synthesized nucleotide by nucleotide on a support particle in sequential coupling steps. With this method, we can produce combinatorial sets of molecules by using a mixture of nucleotides at each coupling step. For example, if two nucleotides are used together in each of four coupling steps, 2^4 = 16 different DNA strands will be produced on the solid support.
In this article, we use one base to represent one bit of the binary strings, while keeping the GC content of the strand constant (about 50%). For example, we can use A or T in half of the positions to represent 0 and 1 respectively, and G or C in the remaining positions to represent 0 and 1 respectively. This is an important rule in designing DNA strands, because GC content has a very strong effect on DNA hybridization reactions. In addition, all DNA strands immobilized on the solid surface have markers at each end; these will be used as binding sites for the primers of the PCR reaction. The length of the primer binding sites is about 20 nucleotides. PCR reactions will be used for amplifying the desired DNA molecules.
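The encoding rule described above can be illustrated with a short Python sketch. The text only states that half of the positions use A/T and the other half use G/C; the alternating layout (A/T at even offsets, G/C at odd offsets) and the function names `encode` and `gc_content` are assumptions made for this illustration.

```python
from itertools import product

# Bit-to-base maps as described in the text: 0 -> A or G, 1 -> T or C.
AT = {0: "A", 1: "T"}
GC = {0: "G", 1: "C"}


def encode(bits):
    """Encode a bit sequence with one base per bit.

    Assumption: A/T positions and G/C positions alternate, so that half of
    the positions are always G or C regardless of the bit values.
    """
    return "".join((AT if k % 2 == 0 else GC)[bit] for k, bit in enumerate(bits))


def gc_content(sequence):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)


# Every 4-bit string is encoded with exactly 50% GC content.
for bits in product((0, 1), repeat=4):
    strand = encode(bits)
    print(bits, strand, gc_content(strand))
```

Because the G/C positions are fixed in advance, the GC content does not depend on the encoded value, which is the property the design rule aims for. For an odd number of bits the content can only be approximately 50%.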

In Vitro Implementation of Operations
We now describe how each of the operations of the surface-based model can be performed on surfaces.
Mark (i, b): in order to mark the DNA strands which represent binary strings in which the i-th bit is b, first, the DNA probes that are complementary to these strands are synthesized. Then, each of these probes hybridizes to its complementary DNA strand on the surface. Thus, the marked DNA strands will be double-stranded whereas the unmarked strands remain single-stranded.
In this method, single-base mismatch is used to discriminate between marked and unmarked strands, and it was determined that excellent discrimination based on single-base mismatch is obtained using 15-mer sequences.
Destroy-marked, destroy-unmarked: either marked (double-stranded) or unmarked (single-stranded) DNA molecules may be selectively destroyed by using exonuclease enzymes.
Unmark: this operation is performed simply by washing the surface in distilled water. Distilled water is a low-salt (hypotonic) solution which denatures double-stranded DNA molecules, so the probes dissociate from the immobilized DNA strands and are washed away, leaving only the original single-stranded DNA attached to the surface.
Test-if-empty: as mentioned before, this operation is usually executed at the end of the computation. In the final step, the remaining DNA molecules may be marked (double-stranded) or unmarked (single-stranded). If the remaining DNA molecules are unmarked (single-stranded), we can cleave them from the surface, amplify them by PCR, and detect whether there is any product. If the remaining DNA molecules on the surface are marked (double-stranded), we should first convert them to unmarked strands by removing the complementary probes from the immobilized DNA strands, and then cleave them from the surface, amplify them by PCR, and detect whether there is any product.

Definition of the Dominating Set Problem
In graph theory, a dominating set of a graph G = (V, E), where V is the set of vertices and E is the set of edges, is a subset V_1 ⊆ V such that every vertex in V − V_1 is adjacent to at least one vertex in V_1. The size of a dominating set is the number of vertices it contains. The dominating set problem is to find a minimum-size dominating set in G. The dominating set problem has been proved to be NP-complete. For example, the graph in Figure 2 includes 7 vertices and 6 edges.
It is clear that the minimum-size dominating set for our graph is {V_4, V_5}; therefore, the size of the minimum dominating set in our graph is 2.
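For small graphs, the definition can be checked directly by exhaustive search. The Python sketch below is a conventional brute-force reference, not the DNA algorithm of this paper. The edge list is a hypothetical example chosen to have 7 vertices and 6 edges; the edges of the graph in Figure 2 are not listed in the text, so this graph should not be read as a transcription of that figure.

```python
from itertools import combinations

# Hypothetical 7-vertex, 6-edge graph (an assumption, not taken from Figure 2).
EDGES = [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6), (5, 7)]


def build_adjacency(n_vertices, edges):
    """Return {vertex: set of neighbours} for vertices 1..n_vertices."""
    adjacency = {v: set() for v in range(1, n_vertices + 1)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def is_dominating(subset, adjacency):
    """True if every vertex is in the subset or adjacent to a member of it."""
    chosen = set(subset)
    return all(v in chosen or adjacency[v] & chosen for v in adjacency)


def minimum_dominating_set(adjacency):
    """Try subsets in order of increasing size and return the first hit."""
    vertices = sorted(adjacency)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_dominating(subset, adjacency):
                return set(subset)
    return set()


adjacency = build_adjacency(7, EDGES)
print(minimum_dominating_set(adjacency))
```

The search examines up to 2^N subsets one after another, which is the exponential cost that the DNA approach replaces with a parallel population of 2^N strands.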

Construction of the Surface-Based Solution Space for Dominating Set Problem
First of all, it is essential to generate the surface-based DNA solution space of our problem. Then, basic biological operations will be used to select legal strands and remove illegal strands from the solution space. It is obvious that a graph with N vertices has 2^N subsets of vertices, or 2^N candidate dominating sets. Furthermore, each candidate dominating set can be represented by an N-digit binary number. Also suppose that V_1 is a dominating set of G. If the i-th bit in an N-digit binary number is set to 1, it represents that the i-th vertex is in V_1. If the i-th bit in an N-digit binary number is set to 0, it represents that the i-th vertex is not in V_1.
Our graph has 7 vertices and 128 candidate dominating sets. Accordingly, the solution space has 128 distinct molecules, which were designed and synthesized as shown in Figure 3. The length of these DNA molecules is 55 nucleotides, consisting of a unique 20-nucleotide sequence at each end (binding sites for PCR primers) and a 15-nucleotide hybridization sequence in the middle. As discussed before, excellent specificity and discrimination based on single-base mismatch is obtained using 15-mer sequences; for this reason, we chose 15 nucleotides for the hybridization sequence. The graph of our problem has 7 vertices; thus, the 7 central nucleotides in the 15-mer hybridization sequence were synthesized as a combinatorial set with two possibilities at each position, representing 2^7 = 128 distinct molecules in the solution space. The adjacent 4 nucleotides at the two ends of the hybridization sequence have unique and constant sequences in order to limit the size of the solution space to 128 molecules.
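The strand layout described above (20 + 4 + 7 + 4 + 20 = 55 nucleotides) can be sketched in Python as follows. The actual primer-site and flank sequences are not given in the text, so they are replaced here by the placeholder base "N"; the A/T and G/C choice at each variable position follows the encoding rule stated earlier, and the alternating arrangement of those positions is an assumption.

```python
from itertools import product

# Placeholder constant regions (assumptions): "N" marks an unspecified base.
LEFT_PRIMER_SITE = "N" * 20
RIGHT_PRIMER_SITE = "N" * 20
LEFT_FLANK = "N" * 4
RIGHT_FLANK = "N" * 4

AT = {0: "A", 1: "T"}
GC = {0: "G", 1: "C"}


def variable_region(bits):
    """One base per vertex bit; A/T and G/C positions alternate (assumed)."""
    return "".join((AT if k % 2 == 0 else GC)[bit] for k, bit in enumerate(bits))


def build_solution_space(n_vertices=7):
    """Map every n-bit subset encoding to its full strand sequence."""
    space = {}
    for bits in product((0, 1), repeat=n_vertices):
        space[bits] = (
            LEFT_PRIMER_SITE
            + LEFT_FLANK
            + variable_region(bits)
            + RIGHT_FLANK
            + RIGHT_PRIMER_SITE
        )
    return space


space = build_solution_space()
print(len(space), "strands of length", len(space[(0,) * 7]))
```

Only the 7 central positions vary between strands, so the constant flanks and primer sites do not enlarge the solution space beyond 2^7 molecules.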

DNA Algorithm for Solving the Dominating Set Problem
The following algorithm is proposed for solving the dominating set problem:
1) Prepare the solution space by designing and synthesizing appropriate DNA strands, which are immobilized on a surface.
2) For i = 1 to n, where n is the number of vertices in the graph G:
   a) Mark (i, 1);
   b) For each vertex V_j which is adjacent to V_i:
   c) Mark (j, 1);
   d) Destroy-unmarked;
   e) Unmark.
3) Cleave the remaining immobilized DNA strands from the surface.
4) Amplify by PCR.
5) Input the DNA molecules to tube T_0.
6) For i = 0 to n − 1:
   For j = i down to 0:
      Separate (T_j, i + 1) → (T_(j+1)', T_j)
      Combine (T_(j+1), T_(j+1), T_(j+1)')
7) Read T_1; if it is empty, then read T_2; if it is empty, then read T_3; ...; if it is empty, then read T_(n−1); if it is empty, then read T_n.
According to the steps in the algorithm, the dominating set problem can be resolved by surface-based DNA computation in polynomial time.
Step 2 of the algorithm is executed n times (n is 7 in our graph). Step 2a marks the DNA strands which contain the vertex V_i; the DNA strands which do not contain the vertex V_i remain unmarked. From the definition of a dominating set, the unmarked DNA strands represent subsets V_1 for which V_i lies in V − V_1, that is, subsets which do not contain the vertex V_i. If there is no vertex adjacent to V_i, then step 2d will destroy the unmarked DNA strands. Otherwise, step 2c will be executed z times, where z is the number of vertices adjacent (directly connected by an edge) to V_i. Each time step 2c is executed, it marks the DNA strands which contain V_j (subsets which contain a vertex adjacent to V_i). The remaining unmarked DNA strands are therefore exactly those which contain neither V_i nor any vertex V_j adjacent to V_i; in other words, V_i is not dominated in the subsets they represent. Thus, the unmarked DNA strands are illegal strands and should be destroyed. The same processing is performed for all vertices; therefore, at the end of step 2, the illegal strands will have been destroyed and only the legal strands (representing dominating sets) will remain in the solution space.
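The filtering performed by step 2 can be simulated in software with plain sets. The Python sketch below is an illustration under stated assumptions: strands are tuples of bits, the function name `filter_dominating_sets` is hypothetical, and the edge list is a hypothetical 7-vertex, 6-edge graph rather than a transcription of Figure 2.

```python
from itertools import product

# Hypothetical 7-vertex, 6-edge graph (an assumption, not taken from Figure 2).
EDGES = [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6), (5, 7)]


def filter_dominating_sets(n, edges):
    """Simulate step 2: keep only strands that encode dominating sets."""
    adjacency = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    surface = set(product((0, 1), repeat=n))  # step 1: all 2^n strands
    for i in range(1, n + 1):
        # Step 2a: Mark (i, 1).
        marked = {s for s in surface if s[i - 1] == 1}
        # Steps 2b and 2c: Mark (j, 1) for every V_j adjacent to V_i.
        for j in adjacency[i]:
            marked |= {s for s in surface if s[j - 1] == 1}
        # Step 2d: Destroy-unmarked, so only marked strands survive.
        surface = marked
        # Step 2e: Unmark. Marks are recomputed in the next iteration.
    return surface


legal = filter_dominating_sets(7, EDGES)
print(len(legal), "legal strands remain")
print(min(legal, key=sum))
```

Each pass of the loop removes the strands in which V_i is neither selected nor adjacent to a selected vertex, so a strand survives all n passes only if every vertex is dominated.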
During steps 3 to 5, the legal DNA strands which remain on the surface at the end of step 2 are cleaved from the surface, amplified by PCR, and finally all transferred to tube T_0.
In step 6 of the algorithm, we applied two operations: separation and combination. Here, we briefly discuss these operations. By the execution of step 6, the DNA strands which represent the empty subset are placed in tube T_0, the DNA strands which represent subsets containing only one vertex are placed in tube T_1, the DNA strands which represent subsets containing two vertices are placed in tube T_2, the DNA strands which represent subsets containing three vertices are placed in tube T_3, and so on.
In step 7, all of the tubes (from T_1 to T_n) are evaluated for the presence of DNA strands, and the first tube which is not empty contains the DNA strands that represent the minimum-size dominating sets. In our example, tube T_1 is empty and devoid of any DNA strands. The first tube which contains DNA molecules is tube T_2, which represents the subset {V_4, V_5}. Hence, the size of the minimum dominating set in our graph is 2.
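Steps 6 and 7 can likewise be simulated with sets standing in for tubes. In the Python sketch below, the function names `sort_into_tubes` and `read_first_nonempty` are hypothetical, and the three input strands are illustrative values rather than the output of an experiment. The temporary tube T_(j+1)' of the algorithm corresponds to the local variable `ones`.

```python
def sort_into_tubes(strands, n):
    """Simulate step 6: tube k ends up holding the strands with k ones."""
    tubes = [set() for _ in range(n + 1)]
    tubes[0] = set(strands)                # step 5: everything starts in T_0
    for i in range(n):                     # i = 0 .. n - 1
        for j in range(i, -1, -1):         # j = i down to 0
            # Separate (T_j, i + 1): strands whose bit i + 1 is 1 leave T_j.
            ones = {s for s in tubes[j] if s[i] == 1}
            tubes[j] -= ones
            # Combine: pour the separated strands into T_(j+1).
            tubes[j + 1] |= ones
    return tubes


def read_first_nonempty(tubes):
    """Simulate step 7: return (size, strands) for the first non-empty tube."""
    for size in range(1, len(tubes)):
        if tubes[size]:
            return size, tubes[size]
    return None


# Illustrative input: three strands assumed to have survived step 2.
survivors = {
    (0, 0, 0, 1, 1, 0, 0),
    (0, 0, 1, 1, 1, 0, 0),
    (1, 1, 1, 1, 1, 1, 1),
}
tubes = sort_into_tubes(survivors, 7)
print(read_first_nonempty(tubes))
```

Processing j in descending order matters: a strand moved from T_j to T_(j+1) for bit i + 1 is not examined again for the same bit, so each strand advances by at most one tube per bit.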

CONCLUSIONS
In this paper, surface-based DNA computing was used for solving the dominating set problem. This method could be used for solving other NP-complete problems.
The loss of DNA molecules is one of the major problems in the Adleman-Lipton and sticker models, but the surface-based model greatly reduces losses of DNA molecules during the different steps of computation. One of the major limitations of this model is that the length of the hybridization sequence is restricted to 15 nucleotides, because selectively marking DNA strands is based on single-base mismatch discrimination in hybridization. For solving large-scale NP-complete problems, it is essential to design oligonucleotides longer than 15 nucleotides.
Finally, to improve the efficiency and capabilities of the surface-based model, other operations should be added to this model.

2.1. Structure of DNA and DNA Computing Models
DNA consists of two long polymers of simple units called nucleotides. Nucleotides are the building blocks of DNA, and each of them contains three components: a sugar, a phosphate group and a nitrogenous base. Four different nitrogenous bases contribute to the DNA structure: thymine (T) and cytosine (C), which are called pyrimidines, and adenine (A) and guanine (G), which are called purines. The nucleotides are linked together by phosphodiester bonds and form single-stranded DNA (ssDNA). Two ssDNA molecules join together to form double-stranded DNA (dsDNA) based on the complementarity rule: "A" always pairs with "T", and likewise "C" pairs with "G". A schematic picture of a nucleotide is shown in Figure 1.

2) Mark ((i_1, b_1), (i_2, b_2), ..., (i_k, b_k)): this is an extension of Mark (i, b) which marks a desired string based on the values of multiple bits. This operation is also performed by annealing specific probes to the desired DNA strands.

Figure 2. The graph of our problem.

Separate (T_a, i) → (T_b, T_c): this operation creates two new tubes T_b and T_c. T_b contains the DNA molecules in which the i-th bit has value 1 (T_b = +(T_a, i)), and T_c contains the DNA molecules in which the i-th bit has value 0 (T_c = −(T_a, i)).
Combine (T_a, T_b, T_c): the DNA molecules from the tubes T_b and T_c are combined to form a new tube T_a; the contents of T_b and T_c are simply poured into tube T_a (T_a = T_b ∪ T_c).

Figure 3. Representation of the solution space for our problem.