A Comparison between Evolutionary Metaheuristics and Mathematical Optimization to Solve the Wells Placement Problem

The Wells Placement Problem (WPP) consists in choosing well locations within an oil reservoir grid to maximize the reservoir total oil production, subject to distance threshold between wells and number of wells cap constraints. A popular approach to WPP is Genetic Algorithms (GA). Alternatively, WPP has been approached in the literature through Mathematical Optimization. Here, we conduct a computational study of both methods and compare their solutions and performance. Our results indicate that, while GA can provide near-optimal solutions to instances of WPP, typically Mathematical Optimization provides better solutions within less computational time.


Introduction
Consider an oil reservoir for which oil production totals over the entire reservoir area are given (or estimated).We are interested in the Wells Placement Problem (WPP), i.e., the problem of maximizing the total oil production of the reservoir by selecting locations to build at most a certain number of vertical wells at a given minimum distance apart from one another.
WPP (and variations of it) has been the subject of a number of studies in the last decades [1][2][3][4][5][6][7][8][9].The main approaches used to find solutions have been Genetic Algorithms (GA), which are a special type of Evolutionary Metaheuristics, and Integer Programming (IP), a subclass of Mathematical Optimization.
To the best of our knowledge, the earliest study of wells placement optimization using IP was the work conducted in 1974 by Rosenwald and Green [8], who developed a numerical optimization framework to select optimal positions of wells.Their objective was to minimize the difference between actual and desired flow curves by choosing optimum well locations from predefined location sets.In 1997, Vasantharajan and Cullick [9] introduced the term reservoir quality to represent reservoir connectivity adjusted by tortuosity, which was the surface area surrounding the flow by the volume contained.The well site selection process was performed through IP, with the goal of maximizing reservoir quality.Later, in 1999, Cullick et al. [4] improved the formulation of the problem by reducing its number of constraints.
On the other side, GA is the most widely used method to solve WPP optimization [1].Guyaguler et al. [6] developed a hybrid method of GA linked with polytope search and coupled with a proxy generated by kriging.The objective function in this work was to maximize net present value by finding the optimum injection well locations.The output of this work indicates that using kriging as a proxy with the hybrid optimization framework reduced the computational expense considerably.This work was extended by Badru and Kabir [2] by including horizontal wells and linking the hybrid GA with experimental design to find the impact of uncertainty on recovery and the number of wells required.In 2006, Bangerth et al. [3] compared GA with four algorithms designed to solve WPP.GA was also compared with Covariance Matrix Adaptation-Evolution Strategy by Ding [5] to optimize well locations for different types of wells with calculated productivity index as the objective function.Onwunalu et al. [7] used GA combined with statistical proxy to establish a relationship between the variables and the objective function using unsupervised learning techniques.Some of the articles cited above report computational results using either GA or IP, but we are not aware of a study that compares both.In this paper, we use GA and IP to solve instances of WPP, and report the computational results of those tests.
Specifically, our test reservoir sits on an m n  grid, where each of the sites in the grid represents a possible location for a well.For each , where is associated with the site, and each pair of sites chosen needs to be at least grid units apart from each other.Finally, the maximum number of well locations selected, , is predetermined.

Evolutionary Metaheuristics
Evolutionary Metaheuristics use ideas of biology and Darwin's theory of evolution, such as reproduction, mutation, and "survival of the fittest", to provide solutions to problems that arise in other fields of study.Because one of the most used types of evolutionary metaheuristics in petroleum engineering is genetic algorithms, we developed a GA to tackle our problem of interest (for details on the structure and use of GA in general, see [10]).

Definition of Terms and Variables
Our GA starts with an initial population that is improved through crossover, local search, mutation, and the addition of new members to the population, which we call intruders.After a generation, the best fit individuals of the population are selected, the others are discarded, and a new generation begins.The algorithm runs for a predetermined number of generations, g n , and returns the fittest individuals of the final population.The fitness of an individual, here, is the total oil production associated with it.
An individual z y is represented by an -array of ordered pairs (which we call coordinates), and denoted as , , , , , , where   p is the total oil production that results from choosing this individual, given by Equation (1).
We denote the production totals matrix by , i.e., an element of row i and column of is   , i j , and we define (0,0) We let be the sorted vector of the elements of and its corresponding coordinates: , , , , , , , , , mn mn mn q q r s q r s q r s   where and , , , , mn mn r s r s I J    .

The Genetic Algorithm
The high level structure of our GA is as follows: At each generation g , p n is the number of individuals in the initial population, c is the number of crossover operations performed, and i is the number of intruders created.The value is used in the sub-routine LOCAL-SEARCH, and will be explained later.
Lines 1 -5 initialize the variables of the algorithm, which are the individuals of the population, at zero.Line 6 creates the initial population, and lines 7 -12 perform evolutionary sub-routines for g n generations.Finally, line 13 returns the best fit individual found.
Throughout the algorithm, and for all individuals considered, it is verified whether the distance of all locations in the grid associated with the individual satisfy the minimum distance constraint.When it does, we say that the individual is feasible.This verification is done using the following sub-routine.We note that, for large values of , it may be hard, and perhaps even impossible, to locations on the grid such that the minimum ween each pair of locations is Our algorit kes atmpts to find such locations, and, in t o where the e P matrix.Thus, this routine brings to the popultion not only variability, but also members with high fitness.
Our final sub-routine selects  , then there will not be another generation, and WPP-GA will return 1 individual with the best fitness found after y , the g n generations.) A careful analysis of WPP-GA and its sub-routines shows that the algorithm terminates, but not necessarily ith an optimal solution to the problem, i.e., an individual whose fitness is the highest possible.

Mathematical Optimization
Mathematical Optimization approaches quantitative problems using tools such as linear algebra, calculus, and graph theory.In a sense, it is a more sophisticated method than Evolutionary Metaheuristics, and, in practice, sually requires bigger and more structured algorithms to case, IP, is tha i It is common practice, in IP defining an objective to be m ized ( nimized fine the problem i question.Here, the for n of WPP is as fo w u solve a given problem.
The main advantage of using mathematical optimization, and, in our particular t a solution may be proven to be optimal, as opposed to GA, which does not guarantee opt mality of the best solution found.
, to formulate problems by function axim or mi-), subject to constraints that de n mulatio llow: where a variable   , represents the location   , i j in the oil reservoir grid, and is equal to 1 if the location is chosen, or 0 otherwise.Hence, a feasible soluti this h shown.Note that EX 12 with default settings, except for the search strategy, which was set to Traditional Branch-and-Cut, and the relative and absolute gap tolerances, which were set to zero.Every test had a time limit of 3600 seconds.
Our test grid has , and we set the values of to h of these values, we tried to solve set to , up to the nt at limi et on to at satisfies all problem is a mn -vector of 0's and 1's t constraints defined in the formulation this is a completely different way of representing a solution, in comparison with our GA.Nonetheless, the values of the objective function, here, and the fitness of the individuals, in the GA, are comparable, since they have the same meaning.
To solve our instances with the IP approach, we use a commercial solver based on branch-and-cut that can handle large-scale optimization problems.For details on branch-and-cut, the reader is referred to [11].

Computational Tests
We ran our computational tests in the Texas Tech High Performance Computing Center's 3.0 GHz CPUs with 16 GB RAM nodes.For the tests using IP, we used CPL for all tests.The values of the P matrix were obtained with the commercial reservoir simulator Eclipse, using the Quality Maps approach (see [12] for details).Some main characteristics of our heterogeneous and anisotropic reservoir are: initial pressure of 4000 psi , porosity of 22%, and horizontal 10  perm of with st eability average 175 mD andard deviation of 91.1 mD .The distance between two closest grid points is 300 ft , and the thickness of the reservoir is 75 ft .
Table 1 shows the results obtained using both GA and displays the value of the ethods.(Note: in GA, this nality of the solution found, i.e., t ated wi so een the G IP.The columns "Best Sol' n" best solutions found by both m value is called the fitness of the best individual, whereas in IP it is called the objective function value of the best solution.We will use the latter expression in our analysis.)"Time" shows the computational time, in seconds, required for the test to finish.We note that, because of the settings used in CPLEX, all solutions obtained using IP are provably optimal precisely when the time is less than 3600 seconds.The column "Card" shows the cardihe number of well locations associ th that lution.Finally, "Gap" refers to the relative percentage difference betw A and IP best solutions, and is computed using Equation (2).From Table 1, it is clear that the problem gets harder as the values of D and N increase.On the IP section of the table, this is reflected in the time required to solve the instances.On the GA side, both the computational time and the cardinality of solution show the increase in the difficulty of the problem.For instance, for 8 D t  and 120 N  , the best solution that GA can find has cardinality 117, althoug tions with greater cardinality (and better objective h solu function value) do exist, as th al software developers who worked on it for decades.The seemingly small values in the last columns reflect this contrast of algorithms: a gap as small as 0.1% can very

5.
wells in an e IP results show.Overall, IP was capable of finding better solutions than GA, and, most of the time, was faster.In every test, the best solution found by IP had an objective function value at least as good as the one found by the GA.In 8 cases, both methods found an optimal solution to the problem, and in 11 other cases, the gap between the two methods was less than 1%.This shows that GA can be effective for instances that are less challenging, but, for the harder instances, IP was notably more successful than GA.
The merit of being faster and finding better solutions is due not only to the differences in approaching the problem (IP vs. GA), but also a result of the algorithms used to solve the instances.While our GA has less than 500 lines of code, CPLEX is the product of profession be mal difficult to close, and thus finding a sub-opti solution can be a much easier task than finding an optimal one.Moreover, the fact that CPLEX can guarantee the optimality of a solution is a feature that a GA does not have.It is only in comparison studies such as the present study, that one can visualize the effectiveness of a GA, in finding optimal solutions.Finally, we note that CPLEX is a multi-purpose solver that can handle a vast number of different problems, while our GA was developed specifically to tackle WPP.

Conclusions and Further Research
In this study, we compared the performance of GA and IP to solve instances of the problem of placing vertical m n  grid that sits on a reservoir, with the

ty
This routine takes as input an individual, , and returns 1 if the individual is feasible, or 0 ot ise.The procedure simply computes the Euclidea between every pair of coordinates that define the individual, and checks whether the distance is at least herw n distance .D The routine INITIAL-POPULATION can be divided into two main blocks: lines 1 -12 and 13 -29.The first block creates a "greedy" solution to the problem by choosing the coordinates of 1 y , the first individual of the population, based on th of sorted values of .The second block creates the rest of the initial po lation by selecting ran on the grid (usi the function e vector dom q locations P pu ng rand( ) A , which returns a random ele ent of a finite set m A ), with the condition that each individual created is feasible.
is te n t successful, some of the coordinates of the individual being created remain as   0, 0 .The sub-routine CROSSOVER performs the reproduction of individuals.In this routine, two individuals of the population 1 , , p n y y  are selected at random (lines 2 -3), and the crossover point, represented by d , is also selected at random (line 4).In lines 5 -12, the crossover occurs, and two new individuals are created.These new individuals are then c for feasibility, and, in the case that they are not feasible, they are "erased" from the population by setting all of th hecked eir coordinates to   0, 0 (li nes 13 -21).The procedure is repeated c n times, and the routine creates 2 c n new individuals.n the dual of Our GA then s a local search in the grid, i geographical sense, to try to improve ever indivi the population.perform y In line 2, a coordinate of individual  z y is chosen at random, and, in case it is not , a local search in the reservoir grid, within a s area and replaces it with another co picked at random the entire grid.If the replacement keeps ordinate in z y feasible and improves the value of z p , then z y is updated; otherwise, z y remains unchanged.Our last genetic operator, INTRUDERS, creates i n individuals that enter the population " utside".The intention of this proc fr is to add om o e dure variability to the population thus far created.This routine is similar to INITIAL-POPULATION.choice o the coordinates of the new individuals more ective: they are chosen a ng the top 10N valueThe main differences are in 6 the initial population of eration.the next gen-The sorting procedure in line 1 is, in fact, a renumbering of the individuals 2 surviving individuals that will start a new generation.(In case g g n

Table 1 . Summary of results using GA and IP.
pa meters on CPLEX.T ma a proaches s mple y, w chose no to ary those ar formulatio for ur I approach.rtherresea ch, we su gest the study o non-cone i involves wells at e not nece sari ve tical and thu may requ D-g d as the se of p ssible lothe wells.