Cryptanalysis of a Substitution-Permutation Network Using Gene Assembly in Ciliates

In this paper, we provide a novel approach for breaking a significant class of block ciphers, the so-called SPN ciphers, using the process of gene assembly in ciliates. Our proposed scheme is the first to apply the Turing-powerful gene assembly procedure of ciliated protozoa to real-world computations, and it requires fewer steps than other proposed schemes for breaking a cipher. We elaborate notions of formal language theory based on AIR systems, which can be thought of as a modified version of the intramolecular scheme for modeling ciliate bio-operations, to construct the building blocks necessary for breaking the cipher. Based on these nature-inspired constructions, which are as powerful as Turing machines, we propose a theoretical approach for breaking SPN ciphers. We then simulate our proposed plan on a sample block cipher based on this structure. Our results show that the proposed scheme requires 138 steps to attack 16-round DES, which is about 51.5 percent of the 268 steps needed by the best previously proposed nature-inspired scheme (a reduction of roughly 48.5 percent).


Introduction
L. Adleman, in a laboratory experiment, was the first to demonstrate the potential of DNA molecules to solve computationally hard problems [1]. His revolutionary paper started the interdisciplinary area of DNA computing. Since then, a large body of research has concentrated on the theoretical ability of DNA strands to solve hard problems. In particular, formal language theory has helped researchers find mathematical constructions for building computing machines from biomolecules represented as words. Natural computing, which exploits the potential of biomolecules in their living environments (i.e., cells), is of special interest. In this respect, Kari et al. [2,3] considered the gene assembly process in ciliates and demonstrated that it has the same computational power as Turing machines. Their findings sparked an active line of research in cellular computing.
Ciliates are single-celled eukaryotes with special features that make them appealing and distinctive. They possess cilia, which they use for motion and for creating a current of water that sweeps bacteria and other nutrients into their oral cavities. In addition, they have two different kinds of nuclei: a diploid micronucleus and a polyploid macronucleus. The former is the germline nucleus, which is activated only during the sexual process of conjugation and remains dormant during the vegetative cycle. The latter is the somatic (housekeeping) nucleus, responsible for producing the RNA transcripts that the cell needs throughout its life cycle. A species of ciliated protozoa, Oxytricha trifallax, is shown in Figure 1.
Ciliates perform sorting, inversion, and excision of their DNA sequences. We adopt the strategy of encoding all candidate solutions into a micronuclear gene, assembling the gene using the intramolecular model [4], and finally filtering the results and checking them against the cipher to find the right key.
In [5], Adleman et al. proposed a scheme to break the Data Encryption Standard (DES) using DNA molecules manipulated with molecular biology tools.
In this paper, we replace these biological operations with ciliate bio-operations for the cryptanalysis of SPN ciphers. To this end, we use language-theoretic notions to describe the cryptanalysis process. By encoding the words of our construction into the micronuclear (MIC) genes of a hypothetical ciliated protozoan from the stichotrich family, as shown in Figure 2, we design AIR systems that simulate the blocks necessary for the cryptanalysis of a large class of block ciphers, the substitution-permutation networks. Using these Turing-machine constructions, we then simulate our theoretical attack on this structure, which is defined by specific, predefined blocks.
The rest of this paper is organized as follows. Section 2 introduces substitution-permutation networks, which constitute a large class of block ciphers. Section 3.1 briefly introduces splicing schemes, which are needed to understand AIR systems, a variant of the intramolecular operations used to model ciliate bio-operations. Section 3.2 defines accepting intramolecular recombination systems (AIR systems), on which our scheme for breaking the cipher is based. In Section 4, we build the blocks needed for the cryptanalysis of the cipher. In Section 5, based on the AIR systems designed earlier, we devise a theoretical approach to attacking the cipher. In Section 6, we evaluate the performance of the proposed scheme and derive the total number of bio-steps needed to mount the attack. Section 7 discusses our simulations and reports the results. Finally, Section 8 summarizes the paper, draws conclusions, and outlines plans for future research based on this work.

Substitution-Permutation Networks
Many modern block ciphers are built using an iterative structure called a substitution-permutation network, or SPN for short. AES (Rijndael), Shark, Khazad, and Anubis are good examples of SPN ciphers [6]. The example SPN we selected is shown in Figure 3, and we focus our discussion on this network. As Figure 3 shows, the block size of our cipher is 16 bits, and each plaintext block is processed by repeating the basic operations of a round: substitution, permutation, and key mixing. From the viewpoint of these basic operations, our scheme resembles many modern block ciphers, including Rijndael, and gives insight into the cryptanalysis of real-world block ciphers using natural computing methods.
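As a concrete reference point, the following Python sketch implements a 16-bit SPN of this shape: each round mixes in a subkey, applies four identical 4-bit S-boxes in parallel, and permutes the bits, and a final subkey is mixed in after the last round. The S-box and permutation are illustrative assumptions, not necessarily those of Tables 1 and 2: the S-box is the first row of DES S-box S1, and the permutation is a 4 × 4 bit transposition. The names `SBOX`, `PERM`, `substitute`, `permute`, and `spn_encrypt` are hypothetical.

```python
# Illustrative 4-bit S-box: first row of DES S-box S1 (an assumption, not Table 1).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

# Illustrative permutation (an assumption, not Table 2): PERM[i] is the 1-indexed
# output position of input bit i + 1, where bit 1 is the leftmost bit.
# This is the 4x4 transposition: bit 4*r + c + 1 moves to position 4*c + r + 1.
PERM = [4 * (i % 4) + (i // 4) + 1 for i in range(16)]


def substitute(block: int) -> int:
    """Apply the same S-box to each of the four 4-bit nibbles of a 16-bit block."""
    out = 0
    for shift in (12, 8, 4, 0):
        out |= SBOX[(block >> shift) & 0xF] << shift
    return out


def permute(block: int) -> int:
    """Move input bit i (1 = leftmost) to output position PERM[i - 1]."""
    out = 0
    for i in range(16):
        bit = (block >> (15 - i)) & 1
        out |= bit << (16 - PERM[i])
    return out


def spn_encrypt(plaintext: int, subkeys: list[int]) -> int:
    """n rounds of (key mixing, substitution, permutation), then a final key mixing.

    Expects len(subkeys) == n + 1.
    """
    state = plaintext
    for k in subkeys[:-1]:
        state = permute(substitute(state ^ k))
    return state ^ subkeys[-1]
```

For example, `spn_encrypt(0x0000, [0x0000, 0x0000])` runs one round with all-zero subkeys; every zero nibble maps to 0xE, and transposing 0xEEEE gives 0xFFF0.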

Preliminary Definitions
Our proposed scheme for breaking the cipher is based on accepting intramolecular recombination systems (AIR systems), a variant of the intramolecular models of gene assembly in ciliated protozoa. In this section, we recall the basic notions and notation needed to follow the attack procedure.


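A splicing scheme [7] is defined as a pair $R = (\Sigma, \sim)$, where $\Sigma$ is an alphabet and $\sim$ is a binary relation that specifies which contexts of pointer occurrences may recombine.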

AIR Systems
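An accepting intramolecular recombination system is defined as a quadruple $G = (\Sigma, \sim, \alpha_0, w_t)$, where $R = (\Sigma, \sim)$ is the splicing scheme and $\alpha_0 \in \Sigma^{+}$ and $w_t \in \Sigma^{+}$ are the input (start) and target words, respectively. Given a splicing scheme $R$, we define the contextual intramolecular operations of translocation, $\mathrm{trl}$, and deletion, $\mathrm{del}$, which generalize the dlad and ld intramolecular operations, respectively, as follows [8]. Assuming $p, q \in \Sigma^{+}$, the $\mathrm{trl}$ operation with respect to $R$ is defined as

$$\mathrm{trl}_{p,q}(x\,p\,u\,q\,y\,p\,v\,q\,z) = x\,p\,v\,q\,y\,p\,u\,q\,z,$$

provided that the contexts of the corresponding occurrences of $p$ and $q$ are in the relation $\sim$. Therefore, the $\mathrm{trl}_{p,q}$ operation swaps the strings $u$ and $v$, each of which is flanked by the pointers $p$ and $q$. The $\mathrm{del}$ operation with respect to $R$ is defined as

$$\mathrm{del}_{p}(x\,p\,u\,p\,y) = x\,p\,y,$$

provided that the contexts of the two occurrences of $p$ are in the relation $\sim$. Intuitively, $\mathrm{del}_p$ removes the string $u$ flanked by two occurrences of $p$. We can now define the set of all contextual intramolecular operations under the guidance of $\sim$ as

$$\mathrm{Op}_R = \{\mathrm{trl}_{p,q},\ \mathrm{del}_{p} \mid p, q \in \Sigma^{+}\}.$$

Accordingly, the language accepted by the AIR system $G$ consists of all words $w$ for which consecutive application of any number of operations from $\mathrm{Op}_R$ to $\alpha_0 w$ yields the target word $w_t$:

$$L(G) = \{\, w \in \Sigma^{*} \mid \alpha_0\, w \Rightarrow^{*}_{\mathrm{Op}_R} w_t \,\}.$$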
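To illustrate the two operations and the acceptance condition, the following Python sketch represents words as tuples of symbols. It implements deletion as the rewriting x p u p y → x p y and translocation as x p u q y p v q z → x p v q y p u q z, and checks acceptance by breadth-first search from the word formed by the start word followed by the input. It assumes, for simplicity, that pointers are single symbols and that the relation ~ relates every pair of contexts, so context conditions are ignored. The names `del_op`, `trl_op`, and `accepts` are hypothetical. The search terminates because deletion shortens the word and translocation preserves its length.

```python
from collections import deque
from itertools import combinations

Word = tuple[str, ...]


def del_op(word: Word, p: str) -> set[Word]:
    """All results of del_p: x p u p y -> x p y, for every pair of occurrences of p."""
    hits = [i for i, s in enumerate(word) if s == p]
    return {word[:i + 1] + word[j + 1:] for i, j in combinations(hits, 2)}


def trl_op(word: Word, p: str, q: str) -> set[Word]:
    """All results of trl_{p,q}: x p u q y p v q z -> x p v q y p u q z."""
    ps = [i for i, s in enumerate(word) if s == p]
    qs = [i for i, s in enumerate(word) if s == q]
    results: set[Word] = set()
    for i1, i2 in combinations(ps, 2):
        for j1 in (j for j in qs if i1 < j < i2):
            for j2 in (j for j in qs if j > i2):
                u, y, v = word[i1 + 1:j1], word[j1 + 1:i2], word[i2 + 1:j2]
                results.add(word[:i1 + 1] + v + (q,) + y + (p,) + u + word[j2:])
    return results


def accepts(start: Word, w: Word, target: Word, pointers: list[str]) -> bool:
    """Is target reachable from start + w by any sequence of del/trl steps?"""
    begin = start + w
    seen = {begin}
    queue = deque([begin])
    while queue:
        word = queue.popleft()
        if word == target:
            return True
        successors: set[Word] = set()
        for p in pointers:
            successors |= del_op(word, p)
            for q in pointers:
                if q != p:
                    successors |= trl_op(word, p, q)
        for nxt in successors - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False
```

For example, with start word `("b", "p")` and input `("x", "p", "e")`, a single deletion on pointer `p` reaches the target `("b", "p", "e")`, so the input is accepted.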

Constructing Necessary Building Blocks for Attacking the Cipher
Our proposed scheme for breaking the SPN family of block ciphers uses the modified intramolecular-recombination approach to modeling the gene assembly process in ciliates, and applies it to construct the building blocks necessary for breaking the cipher. Our attack is a brute-force attack: we assume that we possess a (plaintext, ciphertext) pair and, by exhaustive search over all possible keys, aim to find the correct key of the cipher. Therefore, in the first step, we produce all genes of a hypothetical ciliate, each of which codes for an individual key of the cipher. Then, using the gene assembly process that naturally occurs in ciliated protozoa, we construct Turing machines that imitate the main operations needed in the cryptanalysis procedure, such as substitution, permutation, and logical XOR. Ultimately, we can find the key that, when mixed with the plaintext, yields the ciphertext, provided that at each step of the computation the micronuclear (MIC) genes are assembled into the expected macronuclear (MAC) genes. In the next section, we introduce the main operations needed for cryptanalysis and build Turing machines that imitate them.

Generation of All Possible Keys
To generate all genes that code for all possible keys, we use the graph of Figure 4, in which every path that starts at $b$ and terminates at $e$ codes for a different $n$-bit key. We use the intramolecular model of gene assembly in ciliates. Therefore, beginning with a single MIC gene pattern in which some pointers occur more than twice, we can assemble different MAC gene patterns that code for different keys of the cipher: applying different assembly strategies to the same MIC gene pattern yields different MAC genes. Denoting the graph of Figure 4 by $G = (V, E)$, where $V$ and $E$ are the sets of vertices and edges, respectively, we define an encoding of $G$ in the MIC gene pattern in terms of MDS descriptors as follows: we associate a pointer $p$ with each vertex of $G$, and with each directed edge $(p, q) \in E$ we associate the MDS $[p, q]$. Therefore, a path $p_1 p_2 \cdots p_k$ of $G$ can be encoded by an intermediate MDS $[p_1, p_k]$ once the MDSs of all of its edges have been spliced. Note that MIC MDSs are spliced on their common pointers. Our graph representing all possible keys (Figure 4) yields $2^n$ possible keys. According to the universality result of [9] for intramolecular operations, each path of $G$ can be assembled using the intramolecular ciliate operations ld, hi, and dlad. Since we are looking for all paths that start at $b$ and end at $e$, the desired assembled MAC gene has the form $b\,u\,e$, in which $u$ is any path that contains all of the $a_i$'s and, for each index $i$, exactly one of $b_i$ or $\bar{b}_i$. Therefore, if each edge of $G$ is encoded as a MIC MDS, one can produce all possible keys of the cipher using the ld operation alone, starting from a single string of MDS descriptors that represents all edges of $G$, as stated in Theorem 1. We call such a string a descriptor that encodes all edges of $G$.

Theorem 1. Any successfully assembled MAC gene in which all of the $a_i$'s appear, together with a total of $n$ of the $b_i$'s or $\bar{b}_i$'s (one for each $i$), represents a path of $G$, and such assemblies produce every one of the $2^n$ possible keys of the cipher.
The implementation of logical XOR using ciliate bio-operations is explained in Section 4.2.
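The path-to-key correspondence can be checked in software under one plausible reading of Figure 4 (the figure itself is not reproduced here): vertices b = a0, a1, ..., an = e, with two alternative vertices between a(i-1) and a(i), one for each value of bit i. The sketch below, with hypothetical names `key_graph`, `paths`, `splice`, and `path_to_key`, enumerates all b-to-e paths, reads a key off each, and confirms that splicing the edge MDSs of any path on their common pointers yields the single MDS [b, e].

```python
from typing import Iterator


def key_graph(n: int) -> dict[str, list[str]]:
    """Hypothetical reading of Figure 4: b = a0 -> {z1, o1} -> a1 -> ... -> {zn, on} -> an = e.

    Vertex zi (oi) stands for 'bit i of the key is 0 (1)'.
    """
    def a(i: int) -> str:
        return "b" if i == 0 else ("e" if i == n else f"a{i}")

    graph: dict[str, list[str]] = {a(i): [] for i in range(n + 1)}
    for i in range(1, n + 1):
        for v in (f"z{i}", f"o{i}"):
            graph[a(i - 1)].append(v)   # edge a(i-1) -> v
            graph[v] = [a(i)]           # edge v -> a(i)
    return graph


def paths(graph: dict[str, list[str]], start: str = "b", end: str = "e") -> Iterator[list[str]]:
    """Enumerate every start -> end path by depth-first search (the graph is acyclic)."""
    if start == end:
        yield [end]
        return
    for nxt in graph[start]:
        for rest in paths(graph, nxt, end):
            yield [start] + rest


def splice(mdss: list[tuple[str, str]]) -> tuple[str, str]:
    """Splice MDSs [p, q][q, r]... on their common pointers into one MDS."""
    left, right = mdss[0]
    for p, q in mdss[1:]:
        if p != right:
            raise ValueError("consecutive MDSs must share a pointer")
        right = q
    return (left, right)


def path_to_key(path: list[str]) -> str:
    """Read the key bits off the bit vertices (z = 0, o = 1) of a path."""
    return "".join("0" if v[0] == "z" else "1" for v in path if v[0] in "zo")
```

With `n = 16`, the same construction yields the 65,536 keys of the 16-bit cipher.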

An Intramolecular Model for Computing Logical XOR
We define logical XOR as the satisfiability of the Boolean relationship in Equation (4):

$$x \oplus y = (x \wedge \neg y) \vee (\neg x \wedge y). \qquad (4)$$
Here, we provide a ciliate solution for Equation (4) using the intramolecular model, which computes the result of XOR in polynomial time. For this purpose, we construct an AIR system that evaluates the output of Equation (4). The circuit that implements XOR is depicted in Figure 5.
As Figure 5 shows, the circuit is composed of three layers, and we denote its output using the notation of Equation (5).

 
In this notation, each gate is denoted by $\gamma_{ij}$, in which $i$ is the layer number and $j$ is the number of the gate within that layer, and $\wedge$, $\vee$, and $\neg$ denote the logical AND, OR, and NOT gates, respectively. In this circuit, the output is given by the single gate of the third layer, $\gamma_{31}$. We now construct an AIR system $G_c$ for the Boolean circuit of Figure 5, defined over the set of inputs of the circuit. Given the definitions in Equations (6) and (7), $G_c$ accepts the input string, producing the axiom YES, if the result of XOR is 1. Therefore, if the result of XOR equals 1, $G_c$ accepts the input string in 4 steps. In the following, we construct the splicing schemes necessary for computing XOR in the proposed AIR system.
The words computed at each step of the computation are given in Equations (30)-(33). In the next step, a deletion operation is applied to the resulting word; if the result of the computation equals one, the axiom YES is produced.
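The XOR of Equation (4) can be checked with a small software model of a three-layer circuit. The sketch below assumes that layer 1 holds the NOT gates, layer 2 the AND gates, and layer 3 the single OR gate, so that x XOR y = (x AND NOT y) OR (NOT x AND y); since Figure 5 is not reproduced here, this gate assignment is an assumption, and `xor_circuit` is a hypothetical name. The dictionary keys `g11` through `g31` mirror the layer/gate indexing of the circuit.

```python
from itertools import product


def xor_circuit(x: bool, y: bool) -> tuple[bool, dict[str, bool]]:
    """Evaluate x XOR y layer by layer; key 'gij' is gate j of layer i."""
    g: dict[str, bool] = {}
    g["g11"] = not x                 # layer 1: NOT gates
    g["g12"] = not y
    g["g21"] = x and g["g12"]        # layer 2: AND gates
    g["g22"] = g["g11"] and y
    g["g31"] = g["g21"] or g["g22"]  # layer 3: OR gate, the circuit output
    return g["g31"], g


if __name__ == "__main__":
    for x, y in product([False, True], repeat=2):
        out, _ = xor_circuit(x, y)
        print(int(x), int(y), int(out))
```

Each layer corresponds to one step of the evaluation, and the final deletion that produces YES adds a fourth, which is consistent with the 4-step count given for $G_c$.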

Simulation of S-Box with AIR Systems
We take the word of Equation (34) as the input to the S-boxes of Figure 3. The following splicing rules guide the recombinations of our AIR system for simulating the S-boxes.
After applying these splicing rules, we obtain the following word, in which we assume that the 4 input bits of S-box i are all mapped to logical one (T). Other mappings can be defined according to the look-up table of each S-box.
Then, the following splicing scheme is applied to the resultant word of Equation (51).

   
After applying the above rule, a deletion rule (which is based on ld operation) is applied as follows.
Applying this deletion rule, Equation (53), produces the axiom OK!, which indicates that the S-box has been computed.
The start sequence for calculation of the substituted words for computing S-box can be written as shown in Equation (54), in which p and   are assumed blank symbols.
Applying the splicing rules of Equations (35)-(50) to this start sequence, we obtain the word given in Equation (55).
Now, we can obtain the output of S-boxes by applying a deletion rule which is guided by the splicing scheme which is written in Equation (56).
In Equation (56), one symbol denotes the blank, and each symbol with subscript $ji$ denotes output bit $j$ of S-box $i$. Therefore, after applying the above rules, we can write the output of the S-boxes as shown in Equation (57).
where each term indexed by $i$ is the output of S-box number $i$.

Simulation of P-Box with AIR Systems
In this section, we build a Turing machine based on AIR systems that simulates the P-boxes of Figure 3. To this end, we write the input word and the splicing schemes for the P-box as shown in Equation (58).
In the above equation, which describes the input to the P-boxes of Figure 3, we further assume that all symbols indexed by $i$ are distinct and that $p$ is the blank symbol. We can therefore write the splicing schemes as shown in Equation (59).
After applying trl rules in parallel, guided by the splicing rules of Equation (59), we obtain the word shown in Equation (60).
Then, a deletion rule, Equation (61), guided by the splicing rule of Equation (62), produces the output of the P-boxes. The output word of the P-boxes is given in Equation (63).
In Equation (63), $P$ denotes the output of the P-box.
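Because trl exchanges two segments, a P-box can be viewed as a sequence of swaps. As an illustration only (this is a standard cycle-based decomposition, not the construction of Equations (59)-(62), which applies trl rules in parallel), the sketch below decomposes a bit permutation into position swaps; a cycle of length k needs k - 1 swaps. Positions are 0-indexed, and the names `to_swaps`, `apply_swaps`, and `apply_perm` are hypothetical.

```python
def to_swaps(perm: list[int]) -> list[tuple[int, int]]:
    """Swaps (i, j) that, applied in order, move the element at position i to perm[i]."""
    n = len(perm)
    target = [0] * n
    for i, dest in enumerate(perm):
        target[dest] = i              # label of the element that must end up at dest
    arr = list(range(n))              # labels in their current positions
    where = list(range(n))            # where[label] = current position of label
    swaps: list[tuple[int, int]] = []
    for k in range(n):
        if arr[k] != target[k]:
            j = where[target[k]]
            swaps.append((k, j))
            arr[k], arr[j] = arr[j], arr[k]
            where[arr[k]], where[arr[j]] = k, j
    return swaps


def apply_swaps(items, swaps):
    """Apply the swaps in order to a copy of items."""
    out = list(items)
    for i, j in swaps:
        out[i], out[j] = out[j], out[i]
    return out


def apply_perm(items, perm):
    """Directly place items[i] at position perm[i]."""
    out = [None] * len(items)
    for i, dest in enumerate(perm):
        out[dest] = items[i]
    return out
```

For a 4 × 4 bit transposition, which has four fixed points and six 2-cycles, the decomposition consists of six swaps.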

The Proposed Attack Plan
To present our algorithm, we use the attack graph of Figure 6, each node of which performs part of the attack. Below, we explain the operations carried out at each node of this graph.
In node $N_{\mathrm{in}}$, all possible keys are generated, as explained in Section 4.1. In node $N_{11}$, all keys generated in $N_{\mathrm{in}}$ are bitwise XORed with the given plaintext using the AIR system of Section 4.2, guided by the splicing rules of Equations (8)-(28). The generated strings are then forwarded to S-boxes $S_1$-$S_4$ of Figure 3, and in node $N_{12}$ the S-boxes are applied in parallel to the output of node $N_{11}$, according to the splicing relations of Section 4.3. In node $N_{13}$, the generated strings are permuted as dictated by the cipher, according to the splicing relations of Equations (59) and (62). Mixing with the subkey and applying the substitution and permutation operations continue iteratively in the subsequent nodes, with the next subkey mixed in at each stage. Eventually, in node $N_{\mathrm{out}}$, the generated bits are compared with the given ciphertext, and the key of the cipher is found when they are equal.

Performance Evaluation of the Proposed Scheme
Since the operations of each node take place concurrently for all strings in that node, we can count the number of steps needed to find the key of the cipher. In node $N_{\mathrm{in}}$, all possible keys of the cipher are produced by applying different combinations of consecutive ciliate bio-operations to a single initial word; these are applied in parallel and take one step. In node $N_{11}$, the XOR operation is applied between the words generated in node $N_{\mathrm{in}}$ and the plaintext; since the XOR circuit has a depth of three, four steps are required to evaluate the XOR of two bits (three layers plus the deletion that produces the output). In node $N_{12}$, two steps are required to produce the resulting words, and in node $N_{13}$, two more steps are needed to evaluate the P-boxes. Each subsequent round requires the same number of steps, i.e., $4 + 2 + 2 = 8$ steps per round. We further assume that there are $n$ rounds of substitutions and permutations. The operations of the last node, $N_{\mathrm{out}}$, are carried out as follows: the generated bits are XORed with the corresponding bits of the given ciphertext, which takes 4 steps, and then the AND operation is applied to the results of the XORs, which takes one step. Counting one step for key generation, $8n$ steps for the rounds, 4 steps for the final subkey mixing, and 5 steps for the last node, the attack requires altogether

$$1 + 8n + 4 + 5 = 8n + 10$$

steps, where $n$ is the number of SP rounds.

Suppose we use our proposed scheme to break 16-round DES ($n = 16$), whose Feistel structure we treat as a variant of SPN ciphers. By the above analysis, we need $8 \cdot 16 + 10 = 138$ steps of computation. This compares favorably with the proposed schemes for breaking DES using a DNA computer [10], which requires 916 steps, membrane computing [11], which requires 278 steps, and networks of evolutionary processors with parallel string rewriting rules (NEPPS), which require 268 steps [12]. Furthermore, unlike [11], which needs exponential space, our scheme does not need exponential space, and for a specific set of instructions of a given cipher, it uses a constant number of splicing rules. Figure 7 compares the performance of our scheme for cracking 16-round DES with that of the previously proposed schemes.
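The step tally can be written out as a small function. The per-node costs (1 step for key generation, 4 for XOR, 2 for the S-boxes, 2 for the P-boxes, and 4 + 1 for the output node) come from the text; the additional 4 steps for mixing in a final subkey after the last round are an assumption that reconciles the tally with the reported 138 steps for n = 16. The names `attack_steps` and `PREVIOUS` are hypothetical.

```python
def attack_steps(n: int) -> int:
    """Total bio-steps for n SP rounds under the per-node costs listed above."""
    key_generation = 1        # key-generation node: all keys in one parallel step
    xor_steps = 4             # three circuit layers plus the deletion producing YES
    sbox_steps = 2
    pbox_steps = 2
    final_key_mixing = 4      # assumed extra XOR with the last subkey
    output_node = 4 + 1       # XOR with the ciphertext, then AND of the results
    per_round = xor_steps + sbox_steps + pbox_steps
    return key_generation + n * per_round + final_key_mixing + output_node


# Step counts reported for earlier schemes against 16-round DES.
PREVIOUS = {"DNA computer [10]": 916, "membrane computing [11]": 278, "NEPPS [12]": 268}

if __name__ == "__main__":
    ours = attack_steps(16)
    for name, steps in PREVIOUS.items():
        print(f"{name}: {steps} steps; ours: {ours} = {100 * ours / steps:.1f}% of that")
```

Against the 268 steps of NEPPS, 138 steps is about 51.5 percent of the previous count, i.e., a reduction of about 48.5 percent.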

Simulation of the Proposed Attack
We conduct a computer simulation to test our theoretical scheme by breaking a sample cipher based on the SPN of Figure 3. In this section, we first introduce the parameters of the sample cipher, then explain the steps of our simulation, and finally present the simulation results.

Parameters of the Considered Cryptosystem
In what follows, we introduce the cryptosystem considered in our simulation. As Figure 3 shows, the input block size of the cryptosystem is 16 bits, and the same round operation is repeated for n rounds. Each round consists of substitution, permutation, and key mixing. This structure resembles the one used in DES and in other modern block ciphers such as Rijndael [13]. The blocks of the cipher are defined as follows.

Substitution Block
In the cipher of Figure 3, the 16-bit input block is split into four 4-bit sub-blocks, each of which is the input to one S-box (a 4-bit-to-4-bit S-box). Each S-box can be realized as a look-up table of sixteen 4-bit values, indexed by the integer value of the input bits as shown in Table 1. The S-box is nonlinear: its output bits cannot be written as linear combinations of its input bits. Furthermore, we assume that all four substitution boxes, $S_1$ through $S_4$, are identical. The substitution box used in the cipher is shown in Table 1.
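The nonlinearity requirement can be checked mechanically: an output bit is affine if it equals the parity of some subset of the input bits, possibly complemented. The sketch below tests each output bit of a 4-bit S-box against all 32 affine functions of four inputs. Since Table 1 is not reproduced here, the S-box used is an illustrative assumption (the first row of DES S-box S1); the names `parity` and `affine_output_bits` are hypothetical, and the check covers single output bits only, not linear combinations of output bits.

```python
# Illustrative S-box: first row of DES S-box S1 (an assumption, not Table 1).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def parity(v: int) -> int:
    """1 if v has an odd number of set bits, else 0."""
    return bin(v).count("1") & 1


def affine_output_bits(sbox: list[int]) -> list[int]:
    """Indices j (0 = least significant) of output bits equal to parity(a & x) ^ c
    for some input mask a and constant c."""
    affine = []
    for j in range(4):
        column = [(sbox[x] >> j) & 1 for x in range(16)]
        if any(all(column[x] == parity(a & x) ^ c for x in range(16))
               for a in range(16) for c in (0, 1)):
            affine.append(j)
    return affine


if __name__ == "__main__":
    assert sorted(SBOX) == list(range(16)), "S-box must be a bijection"
    print("affine output bits:", affine_output_bits(SBOX))
```

For this S-box no output bit is affine, whereas the identity mapping fails the check on all four bits.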

Permutation Block
The permutation used in the cipher of Figure 3 is shown in Table 2. The numbers in this table give bit positions within the block, where 1 denotes the leftmost bit and 16 the rightmost bit of the input block.
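The table convention can be captured in a few lines of Python operating on bit strings such as "0000000000000000". The sketch assumes that entry i of the table gives the output position of input bit i (1 = leftmost), validates that the table is a permutation, and computes its inverse, which decryption would need. The names `check_perm`, `apply_table`, and `invert_table` are hypothetical.

```python
def check_perm(table: list[int]) -> None:
    """Raise ValueError unless table is a permutation of 1..n."""
    if sorted(table) != list(range(1, len(table) + 1)):
        raise ValueError("table is not a permutation of 1..n")


def apply_table(bits: str, table: list[int]) -> str:
    """Send input bit i (1 = leftmost) to output position table[i - 1]."""
    check_perm(table)
    if len(bits) != len(table):
        raise ValueError("block and table lengths differ")
    out = [""] * len(bits)
    for i, dest in enumerate(table):
        out[dest - 1] = bits[i]
    return "".join(out)


def invert_table(table: list[int]) -> list[int]:
    """Table that undoes apply_table(., table)."""
    check_perm(table)
    inv = [0] * len(table)
    for i, dest in enumerate(table, start=1):
        inv[dest - 1] = i
    return inv
```

For example, with the 4-bit table `[2, 3, 4, 1]`, "1000" becomes "0100", and applying the inverse table `[4, 1, 2, 3]` restores "1000".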

Mixing with Subkey
We define key mixing simply as the bitwise XOR of the round-key bits with the input block of the round.
In what follows, we design the sub-systems necessary for implementing our proposed theoretical attack on the cipher.
Considering the above parameters of the cipher of Figure 3 and using the splicing rules derived in Section 4, we simulate the proposed attack on the cipher with the Turing Machine Simulator software [14]. To this end, based on the splicing rules of Section 4, we wrote programs that simulate the main blocks of the cryptanalysis according to the cipher parameters (generation of all keys, XOR, S-boxes, and P-boxes). Each program performs computations on the words written on the tape of the Turing machine, simulating the AIR systems designed in Sections 4.1-4.4. We then combined all of these programs according to the attack graph of Figure 6. In our simulation, we make the following assumptions: the cipher has one round, so there is a single 16-bit subkey, which is the same as the main key. We further assume that the plaintext is "0000000000000000" and the ciphertext is "0101111001100110". Based on our AIR system design, the algorithm applied for cryptanalysis is as follows.
Initially, by applying different splicing rules to an initial sequence, we produce all possible keys on the tape and place these strings on different segments of the tape, a fixed distance apart. Then, according to the splicing rules of Equations (8)-(28), we apply the bitwise XOR operation between the bits of each key and the corresponding plaintext bits, and place the outputs elsewhere on the tape. Next, using the splicing rules of Equations (35)-(50) and (56), we apply the substitution boxes of Table 1 to the derived words in parallel and write the resulting words over the originals. The P-box of Table 2 is then applied to the derived words in parallel, using the splicing rules of Equations (59) and (62), and the results are again written over the previously derived words. Then, we apply the logical XOR operation between the bits of the resulting words and the bits of the corresponding key (whose locations on the tape are known), which yields the enciphered message. Finally, in accordance with the operations of the last node of the attack graph of Figure 6, the resulting strings are compared with the given ciphertext; if all corresponding bits are equal, the key used is identified as the cipher key.
We wrote programs to carry out the above instructions and successfully recovered the secret key "0001001000110100", as expected.
Breaking the cipher of Figure 3 with more rounds can be done in the same way. However, to demonstrate our algorithm, and because of the long execution time of the program, we tested it on the cipher with one round.
The code in Appendix 1 consists of the programs we wrote to simulate the Turing machines for the building blocks of the cryptanalysis procedure described in Section 4; these programs are intended for execution on the Turing Machine Simulator software [14].
Note that all of these programs halt and generate the appropriate output strings in a finite number of iterations. The results of executing the code of Appendix 1 on the cipher defined in Section 7.1 are shown in Tables 3-6.
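In ordinary software, the search performed by the simulation reduces to trying every 16-bit key. The sketch below mirrors the steps listed above for the one-round cipher (mix the key, apply the S-boxes and the P-box, mix the key again, compare with the ciphertext), but it runs sequentially rather than in parallel. Its S-box (the first row of DES S-box S1) and permutation (a 4 × 4 bit transposition) are illustrative assumptions rather than Tables 1 and 2, so its ciphertexts need not match those of the simulation. The names `s_layer`, `p_layer`, `encrypt_one_round`, and `recover_keys` are hypothetical.

```python
# Illustrative components (assumptions, not Tables 1 and 2).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
PERM = [4 * (i % 4) + (i // 4) for i in range(16)]  # 0-indexed destination of bit i (0 = leftmost)


def s_layer(block: int) -> int:
    """Apply SBOX to each 4-bit nibble of a 16-bit block."""
    return sum(SBOX[(block >> s) & 0xF] << s for s in (12, 8, 4, 0))


def p_layer(block: int) -> int:
    """Move bit i (0 = leftmost) to position PERM[i]."""
    out = 0
    for i, dest in enumerate(PERM):
        out |= ((block >> (15 - i)) & 1) << (15 - dest)
    return out


def encrypt_one_round(plaintext: int, key: int) -> int:
    """Key mixing, S-boxes, P-box, then key mixing with the same 16-bit key."""
    return p_layer(s_layer(plaintext ^ key)) ^ key


def recover_keys(plaintext: int, ciphertext: int) -> list[int]:
    """Exhaustive search: keep every key whose encryption matches the ciphertext."""
    return [k for k in range(1 << 16) if encrypt_one_round(plaintext, k) == ciphertext]


if __name__ == "__main__":
    target = encrypt_one_round(0x0000, 0xBEEF)
    for key in recover_keys(0x0000, target):
        print(f"{key:016b}")
```

A single (plaintext, ciphertext) pair may be consistent with more than one key, which is why the search returns every match rather than stopping at the first.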

Conclusion
In this paper, we proposed a language-theoretic approach to breaking the SPN class of block ciphers, based on the gene assembly process that naturally occurs in ciliated protozoa when scrambled MIC genes are converted into MAC genes. Our scheme uses AIR systems, which include two modified versions of the intramolecular ciliate bio-operations, $\mathrm{del}_p$ and $\mathrm{trl}_{p,q}$, that give them the computational power of Turing machines. If we use our scheme to break 16-round DES ($n = 16$), whose Feistel structure we treat as a variant of SPN ciphers, the analysis of Section 6 shows that we need 138 steps of computation. This compares favorably with the proposed schemes for breaking DES using a DNA computer [10], which requires 916 steps, membrane computing [11], which requires 278 steps, and networks of evolutionary processors with parallel string rewriting rules (NEPPS), which require 268 steps [12]. Furthermore, unlike [11], which needs exponential space, our scheme does not need exponential space, and for a specific set of instructions of a given cipher, it uses a constant number of splicing rules. Nature-inspired computational models, such as L-systems, modified versions of P systems, or the model used here, seem promising for developing efficient computational models that simulate universal Turing machines; in future work, other computational problems could be solved using these models.

Figure 3. The basic substitution-permutation network.

Copyright © 2012 SciRes. IJCNS A. KARIMI ET AL.

Figure 4. The graph for generation of all possible keys.

Figure 6. The attack graph for breaking the cipher.

Figure 7. Comparison graph for performance evaluation.
