Algebraic Cryptanalysis of GOST Encryption Algorithm

This paper observes approaches to algebraic analysis of GOST 28147-89 encryption algorithm (also known as simply GOST), which is the basis of most secure information systems in Russia. The general idea of algebraic analysis is based on the representation of initial encryption algorithm as a system of multivariate quadratic equations, which define relations between a secret key and a cipher text. Extended linearization method is evaluated as a method for solving the nonlinear sys-tem of equations.


Introduction
The general idea of algebraic cryptanalysis is finding equations that describe nonlinear transformations of Sboxes followed by finding solution of these equations and obtaining the secret key.This method of cryptanalysis belongs to the class of attacks with known plaintext.It is enough to have a single plaintext/ciphertext pair for the success.Algebraic methods of cryptanalysis contain the following stages: • Creation of the system of equations that describe transformations in non-linear cryptographic primitives of the analyzed cipher (i.e., S-boxes for most symmetric ciphers); • Finding solution of this system.
The idea to describe an encryption algorithm as system of linear equations originated quite long time ago.However, until recent years it was a purely theoretical idea because it demanded huge memory capacities for storing variables and supplementary data.Nowadays, just a few research results are available that demonstrated the possibility to attack a full-round encryption algorithm.Analysis of full-round GOST is declared in paper [1].However this paper does not contain any detailed description of the approach, only a brief outline.We will discuss in detail the basic mechanisms of algebraic analysis with regard to GOST.

GOST Encryption Algorithm
GOST encryption algorithm has four operation modes: simple substitution mode, stream mode, stream mode with feedback and authentication mode.Simple substitution mode is the basic one and all other modes contain it in their structure.We will consider only this mode in the chapter.
GOST algorithm is a symmetric block cipher, which conforms to Feistel scheme.64-bit blocks of data are submitted to the input and converted into 64-bit blocks of encrypted data by 256-bit key.In each round the right side of plain text messages is processed by function F, which converts data with three cryptographic operations: adding data and subkey modulo 2 32 , substitution of data using S-boxes, and left cyclic shift by 11 positions.Output of F-function is added modulo 2 to the left part of the plaintext, then right and left sides are swapped for next round.The algorithm has 32 rounds.In the last round of encryption right and left parts are not swapped.The overall dataflow diagram of GOST is shown in Figure 1.

Algebraic Cryptanalysis
Claude Shannon [2] assumed that cracking strong encryption algorithm requires "as much work as for solving simultaneous equations with a large number of the unknown variables".While analyzing encryption algorithms much attention used to be directed to statistical analysis methods and algebraic attacks describing holistic approaches to the problem of cipher strength analysis.
The use of cipher internal structure is a characteristic property of algebraic analysis unlike statistical methods.Cryptanalyst reports enciphering transformation in terms of simultaneous equations and tries to solve the present simultaneous equation to obtain the encryption key.Algebraic attacks can be set as two stages.The first stage is to present an encryption algorithm and some extra algorithm data in a form of polynomial equation set over GF(2) field or other finite field.The second stage is to solve the simultaneous equations and obtain the encryption key.Cryptanalyst should generally know μ pair, (P 1 ,•••, P μ ) plain text and (C 1 ,•••, C μ ) cipher text, when E(P i ) = C i condition holds for any i∈{1, μ}.One or several solutions can be obtained in the course of algebraic analysis.In the simultaneous equation, set for the analyzed encryption algorithm, single unknown value corresponds to every key bit.Furthermore, there could be other unknown values connecting input and output encryption round values in the simultaneous equations.Cryptanalyst's task is to construct a sufficient number of equations from algebraic analysis in order to reduce the number of solutions of simultaneous equations to only one correct solution, but yet simultaneous equations should be solved within acceptable period of time.Fulfillment of the above conditions means that algebraic analysis attack will be performed faster than brute force attack.Different cryptanalysis approaches were developed to solve nonlinear systems of Boolean equations.Practical aspects of cryptanalysis state that methods based on linear approximations of initial system are the most efficient.
One of the first algebraic methods used in cryptography is based on determining Grӧbner basis.One can take a close look at Buchberger algorithm for determining Grӧbner basis in [3][4][5].Later Jean-Charles Faugère [6] recommended using more efficient F4 and F5 algorithms for determining Gröbner basis.
In cryptography the solution of system of nonlinear equations is NP complex task and requires exponential time.At the same time efficient solution algorithms of simultaneous linear equations can be used within polynomial time.Thus, if the number of unique monomials in the system is more or equal to the number of linearly independent equations, initial nonlinear system can be put in linear form with linearization method [7].The linearization method is based upon the substitution of all nonlinear elements for new variables.The linearization method can be represented in the following form.Suppose simultaneous linear equations are over GF (2) field with the equations are given by , , , , where x i , x j -variables; Solution of the present simultaneous linear equations involves substitution of each nonlinear monominal term (x i x j ) for new unknown u g and solution of the resulting simultaneous linear equations with respect to new variables, with further determination of the solutions for the initial simultaneous linear equations by solving x i x j = u g , special systems for each solution of simultaneous linear equations.
If nonlinearity degree for each equation is not more than d value, the number of variables in the resulting simultaneous linear equations is not more than + .Complexity of determining solutions for simultaneous linear equations when using Gaussian algorithm and neglecting the complexity of special systems, is evaluated at O(t) 3 .Cryptanalyst should generate additional equations using mathematics, i.e. when variable products are substituted, the list of the system unknown increases (all the products are taken as self-unknowns).
In [8] devised an advanced relinearization method.Aware that there are efficient solution algorithms of linear systems, Kipnis and Adi Shamir suggested reducing MQ-task to linear form by relinearization, and then obtaining additional system equations with the help of connection between new variables.Authors illustrated that for overdefined system of equations relinearization algorithm will be executed within polynomial time.In relinearization method all nonlinear elements are substituted for new variables as is in the case of linearization, and then the relations between the unknowns produce additional equations.For example, one can use the commutative law of multiplication.Let initial nonlinear system possesses {x 1 , x 2 , x 3 , x 4 } variables, then their products can be substituted in the following way: x 3 •x 4 = y 34 , then the additional equations come out right: New y i y j monomials will appear in the system, they can be substituted for new variables and linear system can be solved.The present method made possible to solve many systems which were not solvable by linearization method.The weak point of relinearization method is its complexity.The complexity is difficult to estimate cor-rectly and depends on the properties of the initial system.
Extended Linearization method (XL) [9] can be evolution of relinearization method.The basis of XL method is to solve system of equations by linearization with linearly independent equations produced preliminarily.Additional equations are formed with multiplication system initial equations by monomials of the specified degree.Let us assume that initial system results from m equations with n unknowns connecting encryption key and known data.Then eXtended Linearization algorithm is given by the following: • Evaluating of D parameter of eXtended Linearization method by: • Multiplication of all initial m system equations for every monomials to ≤D-2 degree.
• Including additional equations, obtained on stage 2, into the initial system.
• Substitution of all unknown products for new variables (system is reduced to a linear form) by linearization.
• Solution of the obtained linear system by Gaussian elimination method.
The attack complexity by eXtended Linearization corresponds to the complexity of finding the solution of linear system.When using Gaussian elimination method for solution of linear system, XL complexity is calculated by , / where ω ≤ 3-exponent of Gauss transformation.
Extended Sparse Linearization algorithm (XSL) was developed in [10].This method is used for the solution of sparse systems, i.e. systems of equations lacking many possible unknown products.XSL method includes 5 steps: • Analysis of initial system of equations and selection of a fixed list of monomials and equations for further usage.
• Selection of Р parameter value and multiplication of the equations selected on the previous step and (Р-1) product results of the selected monomials.It is the principle step of XSL method where it necessary to obtain sufficient number of linearly independent equations.• Т' method is additionally executed, where some selected equations multiplied by single variables.
• The use of linearization method by representing each monomial in the form of a new variable.
• Execution of Gaussian elimination method to find one or several solutions of the equation system.
The vital difference between XL and XSL methods lies in that in XL method initial system equations are multiplied by all possible monomials to the degree not more than D-2 (where D-attack property depending on the number of equations and the unknowns of the initial system).In XSL method several equations should be multiplied by the monomials selected earlier.
The research of applying algebraic attacks to block ciphers can be found in the following papers.In [11] proposed to represent the block cipher algorithm, Advanced Encryption Standard (AES), in the form of one equation with a great number of unknown values over GF(2) field.For 128-bit AES cipher this equation included 2 50 unknowns and didn't practically bring the threat to cipher strength.The obtain results showed that representing the block cipher algorithm in the form of polynomial equation doesn't mean its algebraic attack vulnerabilities.In [12,13] there are further investigations of AES standard analysis.Thus in [14] authors described the attacks on simplified Serpent, Present and AES cipher algorithms, and illustrated potential vulnerability to algebraic attacks.Thesis work [13] described the analysis results of full version of AES-128 and Cryptomeria algorithms by finding Grӧbner basis.Article [15] describes attack on Data Encryption Standard (DES) cipher algorithm.In the present research the authors executed the attack on 6 full rounds of DES encryption with the only known open text/ciphertext pair.Book [16] allows taking a close look at algebraic attack on Keeloq block cipher.Article [17] describes the analysis results by SAT-solving method, 8 rounds of PRINTCipher-48 block cipher algorithm with two texts known and analysis of 9 rounds with processing some assumptions.
Today, despite the growing interest to the algebraic attacks, the use of such attacks in Russian symmetric en-cryption standard GOST is poorly reviewed.When reviewing GOST analysis, we should mark the work [1] it executed the attack 2 39 times as fast than brute force attack with known 2 64 texts.However, in this work the author doesn't reveal GOST attack algorithm, which is why further research of GOST attack will be relevant.

GOST Algebraic Cryptanalysis
Let's review the use of algebraic attack on GOST with eXtended Linearization method.From the beginning the authors considered the attack on the light version of GOST.The present algorithm was chosen in view of simplifying of addition modulo 2 32 with round key in the system of equations.Thereafter when passing to GOST analysis addition modulo 2 32 with round key should be presented as additional equations as described in [14].
Basing on previous research of attack applicability to block ciphers, generation of equation system connecting cipher key and the known data is executed for nonlinear substitution transformation in GOST, presented by S-boxes.GOST substitution boxes are known to be additional secret encryption element, and accordingly the authors considered the implementation of attack preparatory stage directed to calculation of secret substitution tables [18].The complexity of preparatory stage is not more than 2 32 encryption operations for unequivocal finding of substitution table.For recovery of substitution tables, as a part of study, required 2 30 encryptions and 26 minutes to calculate with IntelCore 2 Duo T8100 and memory capacity 2 GB.Algorithm of equation system generation for a fixed substitution table was illustrated in [19].For the substitution boxes of 4 × 4 bit, described in [20], 2 37 possible equations were tested, and 2 21 equations appeared to be correct transformation of substitution block.21 linearly independent equations were found for every block.Transformations in one encryption round were produced by the system which included 168 linearly independent equations with 64 unknowns and 288 various monomials.Linearly independent equations for the first substitution box under research are given by: For two GOST rounds showed in Figure 2, the initial system depicting encryption transformation includes 336 equations, 128 unknowns and 576 various monomials.Using linearization method without additional reducing will not allow finding the solution as the number of equations is less than the number of monomials.Using GOST structure one can substitute output bits of substitution boxes for evaluated values by formulas: Y 1 = (P L ⊕ C R )>>>11, for the first round, Y 2 = (P R ⊕ C L )>>>11, for the second round.After substitution the system will obtain 64 unknowns instead of 128 and the number of monomials in the equations will be reduced to 160.The requirement of linearization applicability will be fulfilled, thus, we will apply linearization method to find encryption key for two GOST rounds in the second attack stage.As it was mentioned above, the attack complexity defines the complexity of linear system solution, so the cryptanalysis complexity of two GOST rounds is 160 3 ≈ 2 22 , which is in 2 42 times less than the complexity of brute force attack (2 64 ).
The attack on two rounds of GOST algorithm was similar to the attack on two rounds of GOST encryption algorithm.The system of equations remains the same for GOST algorithm but round key calculation with the obtained input values of substitution boxes was affected.Being aware that addition is performed according to modulo 2 32 , translation between digits was verified when sum calculation.For example, let us take original attack data as following: Plain text 58b7e5e8a13a Cipher text c3a09d847e1aa55e The input value of substitution boxes was calculated: X 1 = 95771861 X 2 = 2DA91C85 If we know substitution box inputs, we can calculate round encryption keys by formulas: K = P R + X, if X > P R K = 2 32 − P R + X, if X ≤ P R We have found the following keys for two rounds: K 1 = AF8E7727 K 2 = AF8E7727 When passing to the analysis of three GOST encryption rounds illustrated in Figure 3, the system with 504 equations, 192 unknowns and 864 various monomials.
Reduction of the number of the system unknowns for three rounds was not successful and searching of the solution was executed with XL method.When applying eXtended Linearization method the authors consider every substitution box individually.In this case D parameter of XL method is equal to 3, i.e. each of 21 equations for substitution box is multiplied by the unknown to 1 extent.Then we obtain 189 equations for one substitution box, the majority of these equations are linearly independent.For three encryption rounds after resulting in additional equations, the system includes 4536 equations with 192 unknowns and 2208 various monomials.
The system can be solved by linearization.Attack complexity for GOST encryption rounds will be 2208 3 ≈ 2 34 that is in 2 62 times less than complexity of finding round keys with brute force method.
For 32-round GOST encryption algorithm the system will contain 48,384 equations with 2048 unknowns and 23,552 various monomials.Attack complexity will be 23,552 3 ≈ 2 44 .
Note that implementation of GOST attack algorithms mentioned above with GOST ⊕ cipher attack is com- plicated by translation between digits when addition modulo 2 32 with round key.Implementation of unfixed substitution tables is the difference between attack on GOST algorithms and GOST in comparison with other box ciphers, that every time provokes the necessity to restore substitution tables and generate system of equations.It is noteworthy that there is a possibility of efficient parallelization of equation system generation for substitution boxes and solution of simultaneous linear equations.
Nowadays the implementation of eXtended Linearization algorithms for GOST full-function 32-round algorithm is still under investigation.

Figure 2 .
Figure 2. Two GOST rounds in ordinary substitution mode.