KAMAR: A Lightweight Feistel Block Cipher Using Cellular Automata

Wireless Multimedia Sensor Network (WMSN) is an advancement of Wireless Sensor Network (WSN) that encapsulates WSN with multimedia information like image and video. The primary factors considered in the design and deployment of WSN are low power consumption, high speed and memory requirements. Security is indeed a major concern, in any communication system. Consequently, design of compact and high speed WMSN with cryptography algorithm for security, without compromising on sensor node performance is a challenge and this paper proposes a new lightweight symmetric key encryption algorithm based on 1 D cellular automata theory. Simulations are performed using MatLab and synthesized using Xilinx ISE. The proposed approach supports both software and hardware implementation and provides better performance compared to other existing algorithms in terms of number of slices, throughput and other hardware utilization.


Introduction
Security is a critical factor in every communication system in this world; may it be a simple short distance communication or communication between large servers that deal with a large volume of data.One of the recent trends of WSNs is transfer of multimedia information like video, still images, audio etc., through self-organized networks.Such networks are called as Wireless Multimedia Sensor Networks (WMSNs) [1] [2].Conventional wireless sensor nodes are limited by processor speed to handle multimedia information.Hence, the challenge is to design a sensor node prototype with low power consumption and high speed processors O (GHz).Some of such high speed and low power processor hardware specifications available in literature [3] are listed below in Table 1.
In this paper, a new fast symmetric key encryption named as KAMAR (the name of the algorithm derived from KAmaraj and K. J. J. kuMAR) based on the cellular automata theory, suitable for WMSN applications is proposed.The objective therefore is to design and develop an encryption algorithm that occupy less slices/area, consumes minimum power and offers high throughput.
A cellular automaton is finite or infinite dimension grid of cells arranged regularly and can have finite number of states.In information theory, the cell states lie in Galois field, GF (2).For each cell, states depend on the "states of group of cells" (including the cell itself) and it is called as neighbourhood.An initial state (time 0 t = ) is selected by assigning a state for each cell [4]- [7].A new generation is created according to some mathematical fixed rule called local update rule function or simply rule function.This local update rule function is applied to the whole grid simultaneously and the new state of each cell in terms of the current state of the cell and the states of the cells in its neighbourhood is determined [8]- [10].
The simplest form of CA is one dimensional with two possible states.A cell and its two neighbours form a neighbourhood of 3 cells, so there are 3  2 8 = possible configurations for a neighbourhood.A rule is a deciding function for each configuration, for the cell to have state 1 or a 0 in the next generation.Therefore, the total number of possible rules is 8  2 256 = [11].These 256 CAs invented by wolfram [6] are generally referred as Wolfram code, who numbered each rule from 0 to 255.The rule 30, 45 and rule 110 CAs are predominantly attractive in applications of cryptography [9] [10].These rules are of attractive because they generate composite and random patterns.Rule 30 is universally used as random number generator in Wolfram's program Mathematic for cryptographic applications [6].
Reversible Cellular Automata (RCA) is defined as the high order CA in which the future ( ) C and past ( ) C − configuration of the cells.Generally, second order CA is used to construct local transition rule function.A second order local transition rule function is defined as in Equation ( 1) , RCA rule functions RCA rule-30 and RCA rule-45 are used to construct S-Box and key scheduling algorithm for KAMAR block cipher.
In general, the RCA rule-30 function is defined as in Equations ( 2) and (3) And the evaluated function for RCA rule-45 is given in Equations ( 4) and ( 5) In the above equations, the states x t x t x t + + denote next state, current state and previous state of cells respectively.The paper is organized as follows: Section 2 discusses related works; Section 3 illustrates the design specification and procedure of algorithm; Section 4 discusses the security analysis of the proposed KAMAR algorithm; Section 5 discusses the algorithm performances in hardware implementation.

Related Works
Seung-Jo Han et al. (1996) discussed Data Encryption Standard (DES) called the Improved-DES.The Improved-DES is stronger than the DES against differential cryptanalysis for cryptographic security.The authors proposed the improved DES by dividing one data block (96 bits) into 3 sub-blocks of 32 bits and then perform different f functions on each of the 3 sub-blocks, and then increase the S1-S8 of the S-boxes to S1-S16, satisfying the Strict Avalanche Criterion (SAC) and the correlation coefficient.Finally the key length is increased to 112 bits.The analysis showed that the Unicity Distance (UD) in the Improved-DES is increased than the DES's UD [12].However, the cost of hardware implementation is extensively large.Blowfish et al. (2008) describes that the Blowfish cryptosystem is a very fast encryption algorithm consisting of two different modules, a sub-key & S-box generation phase, and an encryption phase.The author presents the encryption scheme and demonstrates how fast the encryption scheme is, as compared to the sub key and S-Box generation.The secrecy of the encryption routine is explained by using several test files of different types, as well as study of the security with respect to the number of rounds [13].
Ren Fang et al. (2009) described KASUMI is a block cipher with the Feistel network.The small and efficient hardware of the KASUMI block cipher is the core of the 3GPP confidentiality algorithm-f8, and the 3GPP integrity algorithm-f9.In designing the hardware, they focused on optimizing the implementation of FO/FI functions that are the major components of KASUMI.They proposed three methods for this optimization: using a loop-structure in the implementation to reduce the number of the FO/FI function, realizing S7 and S9-boxes in combinational logic and optimization of extended key's generation [14].
P. Israsena et al. (2006) proposed and efficient implementation of algorithm for persistent, ubiquitous applications employing RFID devices, low-cost and secure RFIDs tags.The ICs for such systems have stringent requirements in terms of cost related to area and power consumption, creation conventional encryption unsuitable.The author discusses the potential of employing the TEA algorithm for medium secure systems.It is found that using the implementation style wished-for, TEA based encryption hardware can be made to meet the necessities.The potential usage of low-cost secure RFID for applications such as secure device tracking is also discussed [15].The author discusses the potential of employing the operationally-lean XTEA-based hash function core for encryption/authentication in medium secure systems.It is found that with the parallel implementation style proposed, the core to be compact and faster [16].

Algorithmic Description
In cryptography, a Feistel cipher is a symmetric structure used in the construction of block ciphers, it is also commonly known as a Feistel network.A large proportion of block ciphers like DES, Blowfish, Kasumi, TEA, XTEA, Camellia uses this scheme.The Feistel structure has the advantage that encryption and decryption operations are very similar, even identical in some cases, requiring only a reversal of the key schedule.Also, the interesting feature of the Feistel based block cipher is that the round functions are iterated functions and works for half of the input bits.Thus reduces the computational time of the algorithm.Therefore, the size of the code or circuitry required to implement such a cipher is nearly halved.
The proposed algorithm structure is Feistel network.The algorithm takes plaintext of 128-bit and key input of 128-bit, 192-bit and 256-bit.The fundamental nature of the feistel cipher is that a single round offers inadequate security but that multiple rounds offer increased security.Based on the statistical analysis of the proposed algorithm simulated in matlab, the number of rounds "n" is calculated using the empirical formula as given below in Equation ( 6).
( ) where l K is the key length and l b is the block length.The number of rounds r n for 128-bit block length and 128-bit, 192-bit, 256-bit key length are evaluated as 16, 20, and 32 respectively based on statistical analysis.The basic statistical analysis is evaluating correlation factor between the each iterative round output and its input.It is found that after certain number of iterations the correlation factor remained almost constant around 0.03 approx.
The complete operation of KAMAR is described below as: 1) The plaintext block is divided into two equal halves, ( ) where ¬ : represents inversion operation ( ) , .
3) Decryption of a ciphertext ( ) , is accomplished in reverse direction by computing for ; 1, , 0 i n n = −  and ( ) Then ( ) 0 0 , L R is the plaintext again.

Single Round Function
The single round function is described in the

Key Scheduling Algorithm
The key scheduling algorithm takes 128-bit, 192-bit or 256-bit as input and generates n 64-bit sub-keys for each round of encryption with respect to the key length.
The complete operation of key scheduling algorithm to generate key space is described below as: 1) The initial key block is divided into two equal halves, ( ) where C is 64-bit constant of hex value ( )

Differential Cryptanalysis
Differential cryptanalysis attack is one of the widely known attacks against the block cipher.Biham and Shamir introduced cryptanalysis against DES block cipher.In this cryptanalysis, the difference propagation from the plaintext to cipher-text is exploited.These difference propagations are assigned with probabilities to the possible keys and used to determine the most probable one.A cipher is said to be resistance against this type of cryptanalysis only if the maximum differential probability is small.In this proposed algorithm, to measure this probability, 8-bit RCA cell-based S-Boxes (8-bit input to 8bit output) are considered and 16 S-boxes arranged parallel to each other.The differential probability of KAMAR is calculated by using the following theorem described in [9].
Theorem 1: -If d P is the maximum differential probability of all S-Boxes and D be the minimum number of active S-Boxes, then the maximum differential characteristic probability P is bounded by D d P [9].Hence, searching worst case assumption from the complete entries made in the distribution table for the RCA rule 30 based S-Box operations and its differential characteristic probability is 2 −2 .Also, the difference on the input bit affects different bytes at end of each round process.Therefore, 7 active S-boxes exist after a single round of KAMAR.Using theorem 1, 12 round KAMAR will have differential probability of   Thus, for each round in the KAMAR, the minimum number of active S-Box is observed to be 7 and, therefore, the correlation probability of one round KAMAR is Hence, the proposed algorithm is resistant to linear cryptanalysis.

Hardware Implementation and Synthesis Results
In the hardware implementation (FPGA), the HDL module of the proposed KAMAR algorithm is synthesized, and its equivalent hardware circuitry of the algorithm is extracted.The well-organized FPGA implementation results were extracted after place and route with the ISE 12.1i tool from Xilinx on a VIRTEX-4, XC4VL25-10 ff668 platform with speed grade of 12.The Mentor Graphics Modelsim SE PLUS 6.0c is used to post map simulate and verify the hardware module functionality.
The iterative loop architecture for KAMAR shown in Figure 3 consists of a round function on the right hand side and a key scheduling function on the left hand side.In this design, resource consuming blocks are the 1-D RCA Rule-45 based key generation module, XOR, RCA Rule-30 based S-Box and Shifter.According to the specifications, the key scheduling arrangements consists of two multiplexers that allows switching the right and left half of the round key in the algorithm using appropriate control Switch.The roundHalfKey control to the multiplexer provides the round function with the right part of the round key for the first round execution and the left part of the round key for the second round function.The select control to the multiplexer selects the initial/ feedback data block and key to the round function.The loop is executed repeatedly until final round execution of encryption/decryption is completed.
For comparative discussions, a few implementations results of block ciphersL2DCASKE [10], SEA [19] [20], AES [17] [18] [21], DESXL [22], PRESENT [23], XTEA [24] are presented in Table 2.The symbol # denotes the block ciphers that are implemented straightforwardly in to the same FPGA device to aid precise comparative analysis.The summary of the results presents the area requirements (in slices), the work frequency, and the throughput of block cipher implementation.It is observed that the work frequency of KAMAR is higher compared to other cipher implementation.The bit/slice is a normalized parameter to denote the area efficiency of the various block cipher implementation.From the results, KAMAR is signified as area efficient block cipher with acceptable throughput.

Conclusions
This paper presented a new Symmetric key feistel block cipher based on reversible CA.The encryption algorithm is based on a particular class of reversible CA.One dimensional CA using radius 1 rule is used.The CA rule 30 and 45 S-box function with 8-bit input and 8-bit output is designed to operate over 128 bit data.The same operation in reverse order is performed during decryption.Due to a huge key space a brute-force attack appears practically impossible.The algorithm can be easily extended by using larger block size.Because of the parallel nature of CA, this algorithm can be implemented on a massively parallel platform and ensures high encryption/decryption speed.An optimized and Synthesizable VHDL code is developed for the FPGA implementation of KAMAR and other block ciphers.
KAMAR executes one round per six clock cycle, computes the round and the key round in parallel and supports both encryption and decryption at minimal cost.Compared to other recent block ciphers, KAMAR offers very less area utilization and nominal increase in throughput.Consequently, this algorithm can be considered as an interesting alternative for Wireless Multimedia Sensor Networks.Scope of further research includes low power ASIC implementations as well as further cryptanalysis and security evaluations.

Figure 1 . 2 .
The round function takes right half (64-bit) as an input from the 128-bit plaintext or ciphertext.This 64-bit is circular left shifted by 9 bits and then divided into 8 words of 8-bit each.The internal function of RCA rule-30 based S-Box is shown in Figure The right hand side of the Figure 1 presents the key scheduling round function and encryption/decryption algorithm at the left hand side.The 8 bit output of each S-box is the function of preloaded value and the input bits.The value ( ) 6363636363636363 H is used as the preloaded value in the RCA rule-30 S-Box function.The outputs of all the eight S-box are concatenated and XOR-ed with 64-bit sub-key i K .Let S-Box input set

Figure 1 .
Figure 1.Encryptions/decryption round function and key scheduling of KAMAR.
Thus, KAMAR is effectively resistant to differential cryptanalysis attack.

Figure 2 .
Figure 2. Internal structure of RCA rule-30 based S-BOX.

4. 2 .
Linear Cryptanalysis KAMAR is designed to provide security by focusing on minimizing the largest bias and finding different structures to increase the number of active S-Boxes in the round function.As discussed earlier, the 16 S-Boxes are arranged in parallel.The maximum probability of 11 16 defines the magnitude of the correlation between the output of a linear expression and the nonlinear function.The linear probability bias ε is the difference between the 1/2 and the probability of a linear expression.

Table 1 .
List of MHz to GHz processors for WSN.

Table 2 .
Comparison of Implementation results with other block ciphers.