Local Search Heuristics for NFA State Minimization Problem

In the present paper we introduce new heuristic methods for the state minimization of nondeterministic finite automata. These methods combine the classical Kameda-Weiner algorithm with local search heuristics, such as stochastic hill climbing and simulated annealing. The proposed methods are described and the results of numerical experiments are provided.


Introduction
Finite automata (FA) are widely used in various fields and especially in the theory of formal languages. We suppose that the reader is familiar with the basics of automata theory (see, for example, [1]) and provide only some necessary definitions.
Let $\Sigma$ be an alphabet and $w = a_1 a_2 \ldots a_n$, where $a_i \in \Sigma$, be a word. A set of words $L \subseteq \Sigma^*$ is called a language. The nondeterministic finite automaton (NFA) is a 5-tuple $A = (Q, \Sigma, \delta, I, F)$, where $Q$ is a finite set of states, $\Sigma$ is a finite alphabet, $\delta \subseteq Q \times \Sigma \times Q$ is a transition relation, $I \subseteq Q$ and $F \subseteq Q$ are respectively the sets of initial and final states. Transitions of the automaton $A$ are often described by the transition function $\delta: Q \times \Sigma \to 2^Q$. The NFA is called deterministic (DFA) iff $|I| = 1$ and $|\delta(q, a)| \le 1$ for all $q \in Q$, $a \in \Sigma$. Finite automata may be used to recognize and define languages. Two automata are called equivalent if they recognize one and the same language. For each NFA the equivalent DFA may be constructed using the powerset construction process (each state of such a DFA is a subset of states of the original NFA).
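The powerset construction mentioned above can be illustrated with a minimal Python sketch. The function name `determinize`, the dictionary encoding of the transition function and the example NFA are assumptions made for illustration only; they are not taken from the paper or from any library.

```python
from collections import deque


def determinize(alphabet, delta, initial, final):
    """Powerset construction: every DFA state is a frozenset of NFA states.

    delta maps (state, symbol) to a set of successor states (hypothetical encoding).
    Only subsets reachable from the initial subset are generated.
    """
    start = frozenset(initial)
    dfa_delta = {}
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for a in alphabet:
            # Union of the successors of all NFA states in the current subset.
            target = frozenset(q2 for q in subset for q2 in delta.get((q, a), ()))
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_final = {s for s in seen if s & set(final)}
    return seen, dfa_delta, start, dfa_final


# Hypothetical NFA over {a, b}: accepts words whose second-to-last symbol is 'a'.
nfa_delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "a"): {2},
    (1, "b"): {2},
}
states, dfa_delta, start, dfa_final = determinize("ab", nfa_delta, {0}, {2})
```

For this three-state NFA the construction reaches four subsets, which shows on a small scale why the determinization step may blow up exponentially in the worst case.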
For a word $w = a_1 a_2 \ldots a_n$ the reverse word is $w^R = a_n \ldots a_2 a_1$. For a given language $L$, a DFA which recognizes it and has the minimum possible number of states is called the canonical automaton, and a DFA which recognizes the reverse language $L^R = \{w^R \mid w \in L\}$ and has the minimum possible number of states is called the reverse canonical automaton (these automata are unique for $L$ up to isomorphism).
The NFA state minimization problem is formulated as follows: for a given NFA A find an automaton which is equivalent to it and has the minimum possible number of states. Note that the solution of this problem may not be unique. As shown in [2], the state minimization problem for NFA is PSPACE-complete. The worst case complexity of the same problem for DFA is $O(n \log n)$. All known exact NFA state minimization methods use different types of exhaustive search and are computationally hard. Very often they become impractical even for relatively small automata. This is one of the reasons why they are not implemented in software tools that deal with finite automata and related structures, such as AMoRE [3], FSM [4], Vaucanson [5], JFlap [6]. Moreover, only a few of these tools provide heuristic NFA state minimization algorithms.
In the present paper we propose new heuristic methods for the NFA state minimization problem which are based on the classical Kameda-Weiner algorithm [7] and well-known local search heuristics (metaheuristics). The novelty of these methods is that the most time-consuming part of the exact algorithm is replaced with fast heuristic procedures. The obtained methods are not exact, but they make it possible to reduce the minimization time.
The remainder of the paper has the following structure. In Sections 2 and 3 a brief description of the Kameda-Weiner algorithm and local search heuristics is given, in Section 4 the proposed algorithm is described, and in Section 5 the results of some numerical experiments are provided.

Kameda-Weiner Algorithm
Let us consider a brief description of the Kameda-Weiner algorithm (for a detailed description see [7]).
Suppose that an NFA A is given. The algorithm searches for the minimum state NFA(s) equivalent to A using a binary matrix RAM (Reduced Automaton Matrix), which is constructed as follows.
First, the canonical automaton B and the reverse canonical automaton C for A are constructed. Note that each state of these automata is a subset of states of the automaton A. Then, for each nonempty state $p_i$ of B and each nonempty state $q_j$ of C, the element $r_{ij}$ of the RAM is defined by the following formula:

$$r_{ij} = \begin{cases} 1, & p_i \cap q_j \neq \emptyset, \\ 0, & \text{otherwise.} \end{cases}$$

Let X be a subset of rows and Y a subset of columns of the RAM. The pair (X, Y) is called a (prime) grid if it satisfies the following conditions: 1) all intersections of its rows and columns contain 1s; and 2) neither X nor Y can be enlarged without violating the first condition.
A set of grids covers the RAM if each 1 in it belongs to at least one grid in the set. A minimum cover of the RAM is a cover which consists of the minimum number of grids.
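The notions of RAM, prime grid and cover can be sketched in Python as follows. This is only an illustrative brute-force sketch: it assumes the standard Kameda-Weiner rule that a RAM entry is 1 when the corresponding states of B and C share a state of A, and the state sets `b_states` and `c_states` below are hypothetical, not the automata of the paper's example.

```python
from itertools import combinations


def build_ram(b_states, c_states):
    """RAM[i][j] = 1 iff state i of B and state j of C share an NFA state."""
    return [[1 if p & q else 0 for q in c_states] for p in b_states]


def prime_grids(ram):
    """Brute-force enumeration of prime grids (maximal all-ones submatrices)."""
    n_rows, n_cols = len(ram), len(ram[0])
    grids = set()
    for k in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), k):
            # Largest column set whose entries are 1 in every chosen row.
            cols = frozenset(j for j in range(n_cols) if all(ram[i][j] for i in rows))
            if not cols:
                continue
            # Enlarge the row set as far as those columns allow.
            closed = frozenset(i for i in range(n_rows) if all(ram[i][j] for j in cols))
            grids.add((closed, cols))
    return grids


def is_cover(ram, grids):
    """True iff every 1 of the RAM lies in at least one of the given grids."""
    covered = {(i, j) for rows, cols in grids for i in rows for j in cols}
    return all((i, j) in covered
               for i, row in enumerate(ram) for j, v in enumerate(row) if v)


# Hypothetical nonempty states of B and C, given as subsets of NFA states {1, 2, 3}.
b_states = [frozenset({1}), frozenset({2, 3}), frozenset({1, 3})]
c_states = [frozenset({1, 2}), frozenset({3})]
ram = build_ram(b_states, c_states)
grids = prime_grids(ram)
```

The enumeration iterates over all row subsets, so it is exponential in the number of rows; it is meant to make the definitions concrete rather than to be an efficient grid search.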
Let us consider the NFA A with transition table shown in Table 1 (the example is taken from [7]).
Tables 2 and 3 show the canonical automaton B and the reverse canonical automaton C of A, respectively. The RAM of A is presented in Table 4.

There are 4 grids in the RAM. It is easy to see that the first two grids form the minimum cover of the RAM.
Given a cover of the RAM, one can construct an NFA which may be equivalent to the original NFA A (in this case the cover is called legitimate). This is done by means of the special intersection rule. The number of states in the constructed NFA equals the number of grids in the cover. In the considered example the minimum cover of the RAM is legitimate and yields the minimum state NFA shown in Table 5.
The general schema of the Kameda-Weiner minimization technique is described by Algorithm 1. Note that steps 1, 3 and 4 of this algorithm theoretically have exponential complexity. In practice, the construction of the canonical automata is usually performed rather quickly, and the most time-consuming parts of the algorithm are steps 3 and 4.

Local Search Heuristics
Local Search (LS) is a group of metaheuristic optimization techniques which are widely used, especially in combinatorial optimization. Its general schema is described by Algorithm 2.
Each LS algorithm starts from some initial solution (line 1) and then iteratively updates it (lines 2-4) until the stop condition is satisfied (in the simplest case it is the maximum number of steps). The Neighbor() function finds neighbors of the current solution Solution and the Update() function changes the current solution depending on the found neighbors. The quality of the solutions is compared using the special Cost() function.
The simplest LS algorithm is called Hill Climbing (HC). In HC the Update() function always selects the best neighbor of the current solution (the steepest move). Stochastic Hill Climbing (SHC) is a variant of HC which randomly chooses one of the neighbors and decides whether to move to it or to consider another neighbor.
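A minimal Python sketch of stochastic hill climbing for a minimization problem is given below. The acceptance rule (move to a random neighbor when it is not worse) is one common choice and is an assumption here; the function names, the step limit and the toy bit-vector problem are hypothetical and are not the HeO library interface.

```python
import random


def stochastic_hill_climbing(initial, neighbor, cost, max_steps=1000, rng=None):
    """Repeatedly draw a random neighbor and move to it if it is not worse."""
    rng = rng or random.Random(0)
    solution = initial
    for _ in range(max_steps):  # stop condition: maximum number of steps
        candidate = neighbor(solution, rng)
        if cost(candidate) <= cost(solution):
            solution = candidate
    return solution


def flip_one(bits, rng):
    """Toy neighbor function: invert one randomly chosen bit."""
    k = rng.randrange(len(bits))
    return bits[:k] + (1 - bits[k],) + bits[k + 1:]


# Toy problem: minimize the number of 1s in a bit vector.
start_vector = (1,) * 8
result = stochastic_hill_climbing(start_vector, flip_one, sum, rng=random.Random(0))
```

Because worse neighbors are never accepted, the cost of the returned solution can never exceed the cost of the initial one, which is also why the method may stop in a local optimum on harder problems.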
The disadvantage of the HC algorithms is that they can easily get stuck in a local optimum. Simulated Annealing (SA) is a more complicated LS algorithm which tries to avoid this problem. Algorithm 3 shows in detail the minimization process using SA.
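The distinctive feature of the algorithm is the usage of the control parameter T (temperature), which slowly decreases as the number of iterations k increases. The Neighbor() function generates a neighbor of the current solution and the Update() function accepts it with the probability

$$P = \begin{cases} 1, & \Delta C \le 0, \\ e^{-\Delta C / T}, & \Delta C > 0, \end{cases} \qquad (1)$$

where $\Delta C = \mathrm{Cost}(\mathit{NewSolution}) - \mathrm{Cost}(\mathit{Solution})$.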
As follows from (1), the algorithm always accepts the generated neighbor if it is not worse than the current solution. If the neighbor is worse than the current solution, it is accepted with some probability which decreases as the temperature decreases. So, worse moves are made more often at the beginning of the optimization process. In classical SA the best solution is not stored (lines 2, 3, 8-12 are missing) and the algorithm returns the current solution.
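The acceptance rule and the variant of SA that stores the best solution can be sketched in Python as follows. The sketch assumes the usual Metropolis form of the acceptance probability (1 for a neighbor that is not worse, exp(-ΔC/T) otherwise, with ΔC the cost difference between the neighbor and the current solution) and a geometric cooling schedule; the parameters `t0` and `alpha` and all function names are hypothetical.

```python
import math
import random


def acceptance_probability(delta_c, temperature):
    """1 for a neighbor that is not worse, exp(-delta_c / T) for a worse one."""
    return 1.0 if delta_c <= 0 else math.exp(-delta_c / temperature)


def simulated_annealing(initial, neighbor, cost, t0=1.0, alpha=0.95,
                        max_steps=1000, rng=None):
    """SA for minimization that also remembers the best solution seen so far."""
    rng = rng or random.Random(0)
    solution = best = initial
    temperature = t0
    for _ in range(max_steps):
        candidate = neighbor(solution, rng)
        delta_c = cost(candidate) - cost(solution)
        if rng.random() < acceptance_probability(delta_c, temperature):
            solution = candidate
        if cost(solution) < cost(best):
            best = solution  # store the best solution (absent in classical SA)
        temperature *= alpha  # assumed geometric cooling schedule
    return best


def flip_one(bits, rng):
    """Toy neighbor function: invert one randomly chosen bit."""
    k = rng.randrange(len(bits))
    return bits[:k] + (1 - bits[k],) + bits[k + 1:]


# Toy problem: minimize the number of 1s in a bit vector.
start_vector = (1,) * 8
best_vector = simulated_annealing(start_vector, flip_one, sum, rng=random.Random(0))
```

Storing the best solution guarantees that the returned result is never worse than the initial one, even though the current solution may temporarily move to worse neighbors while the temperature is high.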
More details on heuristic optimization algorithms can be found in [8,9].

Combining Kameda-Weiner Algorithm with Local Search Heuristics
First of all let us consider in more detail the last step of the Kameda-Weiner algorithm, i.e. the exhaustive search for minimum legitimate covers (see Algorithm 4). Here M is a set of minimum state NFAs. The IsLegitimateCover() function tests whether the set of grids is a legitimate cover and IntersectionRule() constructs an NFA using the cover. The bounds $i_{\min}$ and $i_{\max}$ of the main loop are calculated from $N_G$, the number of grids in RAM, $|A|$, the number of states in A, and $|B|$, the number of states in B.
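The structure of the exhaustive search (an outer loop over the cover size i and an inner loop over all i-combinations of grids) can be sketched in Python as below. The legitimacy test is passed in as a callback because the intersection rule is not reproduced in this text; the stand-in `lambda cover: True`, the loop bounds and the example RAM and grids are hypothetical placeholders.

```python
from itertools import combinations


def covers_ram(ram, cover):
    """True iff every 1 of the RAM lies in at least one grid of the cover."""
    covered = {(i, j) for rows, cols in cover for i in rows for j in cols}
    return all((i, j) in covered
               for i, row in enumerate(ram) for j, v in enumerate(row) if v)


def exhaustive_minimum_covers(ram, grids, i_min, i_max, is_legitimate):
    """Try cover sizes i_min..i_max; return all legitimate covers of the first size that works."""
    for i in range(i_min, i_max + 1):          # outer loop over the cover size
        found = [cover for cover in combinations(grids, i)   # all i-combinations
                 if covers_ram(ram, cover) and is_legitimate(cover)]
        if found:
            return found
    return []


# Hypothetical RAM and its two prime grids, each given as (rows, columns).
ram = [[1, 0], [1, 1], [1, 1]]
grids = [({0, 1, 2}, {0}), ({1, 2}, {0, 1})]
# Placeholder legitimacy test: accepts every cover (assumption for illustration).
minimum_covers = exhaustive_minimum_covers(ram, grids, 1, 2, lambda cover: True)
```

For each size i the inner loop inspects every i-element subset of the grids, so the amount of work grows combinatorially with the number of grids, which is the cost the heuristic methods aim to avoid.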
For each step of the outer loop, the inner loop (lines 5-10) has to analyze all possible i-combinations of grids. The idea of the heuristic methods proposed in this paper is to replace this computationally hard process with non-exhaustive LS procedures. So, we use LS in the Kameda-Weiner algorithm to find minimum covers of RAM and then analyze their size and legitimacy. Now let us consider the details of this process.
The solution of both LS methods (SHC and SA) for the considered problem is a cover which is coded by a binary vector, where 1 in the i-th position means that the i-th grid is included in the cover and 0 means that it is not included. In the considered example the vector $(1, 1, 0, 0)$ means that only the first two grids are included in the cover.
To start the search, LS algorithms need an initial solution. The simplest way to set up the initial solution is to use the trivial solution with all bits set to 1. To obtain a nontrivial initial solution Algorithm 6 may be used.
The Cost() function simply counts the number of 1s in the vector and the Neighbor() function inverts several bits in it. After creating a neighbor of the current solution we need to check its feasibility (i.e. to check whether the obtained set of grids covers all 1s in RAM). If the constructed neighbor is not a feasible solution, then we add one or several 1s to it using an algorithm similar to Algorithm 6 (e.g., in the considered example a vector whose grids leave some 1s of RAM uncovered is not feasible and needs to be corrected). To ensure the diversity of minimum covers one may run LS several times or use parallel versions of LS where each thread starts from its own initial solution. Note also that if no minimum legitimate covers are found, then both exact and heuristic methods return the canonical automaton B if it has fewer states than the given automaton A.

Algorithm 5. Heuristic search for minimum legitimate covers.
1: Calculate $i_{\min}$ and $i_{\max}$
2: Find minimum cover(s) of RAM using LS
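The binary encoding of covers, the cost function, the feasibility check and the repair of infeasible neighbors can be sketched in Python as follows. This is a sketch under stated assumptions: the repair step adds randomly chosen grids that contain an uncovered 1 (the paper's Algorithm 6 is not reproduced here, so this is only a plausible stand-in), and the RAM, the grid list and all function names are hypothetical.

```python
import random


def cost(vector):
    """Number of grids included in the cover."""
    return sum(vector)


def uncovered_ones(ram, grids, vector):
    """Positions of 1s in the RAM not covered by the selected grids."""
    covered = {(i, j) for bit, (rows, cols) in zip(vector, grids) if bit
               for i in rows for j in cols}
    return [(i, j) for i, row in enumerate(ram) for j, v in enumerate(row)
            if v and (i, j) not in covered]


def repair(ram, grids, vector, rng):
    """Add grids until the vector is feasible (assumes all grids together cover the RAM)."""
    vector = list(vector)
    missing = uncovered_ones(ram, grids, vector)
    while missing:
        i, j = missing[0]
        candidates = [k for k, (rows, cols) in enumerate(grids)
                      if not vector[k] and i in rows and j in cols]
        vector[rng.choice(candidates)] = 1
        missing = uncovered_ones(ram, grids, vector)
    return tuple(vector)


def neighbor(ram, grids, vector, rng, flips=1):
    """Invert `flips` random bits, then repair the result if it is infeasible."""
    vector = list(vector)
    for k in rng.sample(range(len(vector)), flips):
        vector[k] ^= 1
    return repair(ram, grids, vector, rng)


# Hypothetical RAM and grids, each grid given as (rows, columns).
ram = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
grids = [({0}, {0, 1}), ({1}, {1, 2}), ({2}, {0, 2}),
         ({0, 2}, {0}), ({0, 1}, {1}), ({1, 2}, {2})]

rng = random.Random(0)
current = (1,) * len(grids)          # trivial initial solution: all bits set to 1
for _ in range(200):                 # simple SHC-style loop over repaired neighbors
    candidate = neighbor(ram, grids, current, rng)
    if cost(candidate) <= cost(current):
        current = candidate
```

Repairing every neighbor keeps the search inside the set of feasible covers, so the cost function can remain a plain count of 1s without any penalty term. The legitimacy of the covers found this way still has to be checked separately.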

Numerical Experiments
We have implemented the proposed NFA minimization methods in the ReFaM project. ReFaM (Rational Expressions and Finite Automata Minimization) is a part of the HeO (Heuristic Optimization) library. This library is a cross-platform open source project written in C++ that provides several parallel metaheuristic optimization methods such as Genetic Algorithm (GA), Simulated Annealing (SA), Stochastic Hill Climbing (SHC) and Branch and Bound (BnB). These methods are implemented as algorithmic skeletons using metaprogramming, design patterns and different parallelization techniques (OpenMP and MPI). The latest version of the library may be obtained via SVN (the homepage of the project: http://code.google.com/p/heo/).
In the exact version of the Kameda-Weiner algorithm the search for grids of the RAM and the search for minimum legitimate covers were parallelized using OpenMP and MPI techniques. In the heuristic versions of this algorithm the parallel versions of the SHC and SA algorithms of the HeO library were used. Each version of the algorithm is implemented in a separate solver.
Let us compare the performance of the exact and heuristic solvers on a random sample of 100 pairwise inequivalent trim NFAs. Since LS is used only at the last stage of the Kameda-Weiner algorithm, the first 4 columns of Table 6 are the same for the heuristic solvers, and in Tables 7 and 8 we replace them with the following columns: T, the total number of minimum (reduced) state NFAs found for the sample; U, the number of unminimized automata; T, the average minimization time in seconds (mean values are provided for the columns).
As can be seen from these tables, the total number of the minimum state NFAs increases and the number of unminimized automata decreases as the number of threads grows. The average minimization time is very small and remains almost constant up to 4 threads, and then increases proportionally to the number of threads, because the hardware used for the experiments supports simultaneous execution of only 4 threads.

Conclusion
In the present paper we have considered new heuristic algorithms for the NFA state minimization problem, which is known to be computationally hard. These algorithms are a combination of the classical Kameda-Weiner algorithm and local search heuristics, which are widely used in combinatorial optimization. The essential feature of the proposed algorithms is that the most time-consuming part of the exact algorithm is replaced with fast local search procedures. Numerical experiments have shown that such a combination is much less time-consuming than the exact algorithm.

Table 5. Minimum state NFA equivalent to A.

Algorithm 1. Kameda-Weiner algorithm.
Require: NFA A
1: Construct canonical automata B and C
2: Construct RAM
3: Find all grids of RAM
4: Find minimum legitimate cover(s) of RAM and construct minimum state NFA(s) using intersection rule
Ensure: Minimum state NFA(s) equivalent to A

Steps 3 and 4 perform the exhaustive search for grids and covers, respectively.

Copyright © 2012 SciRes. IJCNS

Algorithm 4. Exhaustive search for minimum legitimate covers.
Replacing the exhaustive search by non-exhaustive LS procedures may, of course, result in obtaining approximate solutions, i.e. reduced NFAs. This approach is described by Algorithm 5.

Table 6.
Here, m and n are the number of rows and the number of columns in RAM, respectively; d is the density of ones in RAM; $N_G$ is the number of grids in RAM; $N_M$ is the number of the minimum state NFAs; $Q_M$ is the number of states in the minimum state NFAs; min and max are the minimal and maximal values; $\mu$ is the mean value; $\sigma$ is the standard deviation. The results were obtained using 4 threads. The average minimization time is 1.1 seconds and the total number of the minimum state automata found for the whole sample is 268; 9 automata have no equivalent minimum state NFA. As can be seen from the table, some of the automata in the sample have multiple minimum state NFAs (the mean value is 2.95), and the average number of states in the minimum state NFAs is 3.52. Now let us consider the results of the heuristic solvers obtained with different numbers of threads N = 1, 2, 4, 8, 16, 32, 64, which are presented in Tables 7 and 8.

Table 8. SHC algorithm results.
In the future we plan to concentrate on the other time consuming part of the Kameda-Weiner algorithm, i.e. the exhaustive search for grids of the RAM.