Parallelization of a Branch and Bound Algorithm on Multicore Systems


The general m-machine permutation flowshop problem with the total flow-time objective is known to be NP-hard for m ≥ 2. The only practical method for finding optimal solutions has been branch-and-bound algorithms. In this paper, we present an improved sequential algorithm which is based on a strict alternation of Generation and Exploration execution modes as well as Depth-First/Best-First hybrid strategies. The experimental results show that the proposed scheme exhibits improved performance compared with the algorithm in [1]. More importantly, our method can be easily extended and implemented with lightweight threads to reduce execution time. Good speedups can be obtained on shared-memory multicore systems.

Chung, C. , Flynn, J. and Sang, J. (2012) Parallelization of a Branch and Bound Algorithm on Multicore Systems. Journal of Software Engineering and Applications, 5, 621-629. doi: 10.4236/jsea.2012.58071.

1. Introduction

In the permutation flowshop problem, each of n jobs has to be processed on machines 1, …, m in that order. The processing times of each job on each machine are known. At any time, each machine can process at most one job and each job can be processed on at most one machine. Once the processing of a job on a machine has started, it must be completed without interruption. The usual objectives are the minimization of the make-span, flow time, tardiness, lateness, and the number of late jobs. For a review of the general flowshop problem, see [2], and more recently [3]. Applications of flowshop scheduling research can be found in areas such as the chemical process industry and manufacturing systems, especially flexible transfer or assembly lines in which a wide range of parts are manufactured [4].
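The constraints above lead to a simple recurrence for completion times: a job can start on a machine only when the machine has finished the previous job and the job has finished on the previous machine. The following is a minimal sketch, assuming all jobs are available at time 0 (so a job's flow time equals its completion time on the last machine). The names `completion_times`, `total_flow_time`, and the sample matrix `p` are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def completion_times(order: Sequence[int], p: List[List[int]]) -> List[List[int]]:
    """Return C[pos][k]: completion time of the job in position pos on machine k.

    p[j][k] is the processing time of job j on machine k (0-based indices).
    """
    m = len(p[0])
    C: List[List[int]] = []
    for pos, job in enumerate(order):
        row: List[int] = []
        for k in range(m):
            machine_ready = C[pos - 1][k] if pos > 0 else 0  # machine k is free
            job_ready = row[k - 1] if k > 0 else 0           # job left machine k-1
            row.append(max(machine_ready, job_ready) + p[job][k])
        C.append(row)
    return C


def total_flow_time(order: Sequence[int], p: List[List[int]]) -> int:
    """Sum of completion times on the last machine (jobs released at time 0)."""
    return sum(row[-1] for row in completion_times(order, p))


# Hypothetical 3-job, 2-machine instance.
p = [[3, 2], [1, 4], [2, 1]]
print(total_flow_time([0, 1, 2], p))  # 24
print(total_flow_time([1, 2, 0], p))  # 19
```

On this small instance, changing only the job order reduces the total flow time from 24 to 19, which is the effect the search for an optimal permutation exploits.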

Schedules in which the jobs are processed in the same order at every machine are called permutation schedules. When m ≤ 2, the restriction to permutation schedules is harmless; however, when m ≥ 3, there may exist a schedule whose total flow time is strictly less than the total flow time of any permutation schedule. Finding such a schedule among the (n!)^m possible schedules is often computationally impractical [3]. Therefore, most approaches to the m-machine flowshop problem restrict attention to permutation schedules.
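To give a sense of scale, the short sketch below compares the number of permutation schedules, n!, with the (n!)^m schedules obtained when each machine may use its own job order. The function name `schedule_counts` is illustrative.

```python
import math


def schedule_counts(n: int, m: int):
    """Return (permutation schedules, general schedules) for n jobs, m machines."""
    permutation = math.factorial(n)   # one job order shared by all machines
    general = permutation ** m        # an independent job order per machine
    return permutation, general


for n, m in [(5, 3), (10, 3)]:
    perm, gen = schedule_counts(n, m)
    print(f"n={n}, m={m}: {perm} permutation schedules, {gen} general schedules")
```

For n = 5 and m = 3 this gives 120 permutation schedules against 1,728,000 general schedules.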

The general m-machine permutation flowshop problem with the total flow-time objective is known to be NP-hard for m ≥ 2. Flow time measures the time a job stays in the system. Minimizing it amounts to maximizing the utilization of resources. Because it is highly unlikely that a polynomial-time algorithm exists for the problem, researchers have focused on the use of branch and bound algorithms to find an optimal schedule.

In this paper, we propose an improved branch and bound algorithm which is based on the existing algorithm [1] for m-machine permutation flowshop problems. To find the optimal schedule with the minimum total flow time efficiently, the existing algorithm adopts a dominance test and a lower bound to fathom nodes. Our approach to improving the algorithm is based on a strict alternation of Generation and Exploration execution modes as well as Depth-First/Best-First hybrid strategies. The algorithm starts in the Generation mode by using depth-first search to traverse the tree from the root down to a predetermined level. The nodes reached at this level are inserted into a work pool. When the pool is full, the algorithm switches to the Exploration mode, which takes the node with the minimum (i.e., the best) lower bound in the pool as the root of a subtree and then performs a local depth-first search on the corresponding subtree. After the subtree has been explored, the execution changes back to the Generation mode to find another candidate and insert it into the work pool. Our method selects the best node from the pool fairly because the nodes in the pool are all at the same level. Furthermore, the size of the pool is bounded by a predetermined value, so it will not grow arbitrarily large as in the best-first approach. Our empirical results show that the proposed method exhibits improved performance compared with the existing algorithm.

Considering the lengthy process in the search for lower bounds, the use of parallelism has a better chance of speeding up the execution of the algorithm and has emerged as an attractive way of solving larger problems [5]. One of the current trends in microprocessor architecture design is continually increasing chip-level parallelism. Multi-core processors, providing 2 - 16 scalar cores, are now commonplace and affordable. Software designers often use processes or threads to exploit the power of multicore processors. Our improved sequential algorithm can be easily extended and implemented on shared-memory multicore platforms by allowing several worker threads to explore the subtrees concurrently. We have conducted several experiments on a multicore system and the results indicate that almost linear speedups can be obtained.
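The idea of several worker threads exploring subtrees concurrently can be sketched as follows. This is only a structural illustration under stated assumptions: the pool is filled in advance, `explore` is a caller-supplied placeholder that returns the best value found in a subtree, and the names `parallel_explore` and `subtree_values` are hypothetical. Note also that threads in CPython do not run CPU-bound code in parallel, so this sketch shows the synchronization pattern rather than an actual speedup.

```python
import heapq
import threading


def parallel_explore(pool_items, explore, num_workers=4):
    """Workers repeatedly take the node with the smallest lower bound.

    pool_items: iterable of (lower_bound, node) pairs.
    explore(node, incumbent): returns the best objective value in node's subtree.
    """
    heap = list(pool_items)
    heapq.heapify(heap)
    lock = threading.Lock()
    best = [float("inf")]  # shared upper bound (incumbent), guarded by lock

    def worker():
        while True:
            with lock:
                if not heap:
                    return
                lb, node = heapq.heappop(heap)
                incumbent = best[0]
            if lb >= incumbent:
                continue  # this subtree cannot improve on the incumbent
            value = explore(node, incumbent)
            with lock:
                best[0] = min(best[0], value)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return best[0]


# Toy stand-in for a real subtree search: each subtree has a known best value.
subtree_values = {"A": 14, "B": 13, "C": 20}
result = parallel_explore(
    [(10, "A"), (12, "B"), (15, "C")],
    lambda node, incumbent: subtree_values[node],
)
print(result)  # 13
```

The lock protects both the pool and the incumbent, while the subtree exploration itself runs outside the lock so that workers do not serialize on it.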

Related Work

Sequential and parallel branch-and-bound algorithms have been widely studied over the past several decades. Regarding the permutation flowshop problems, the paper [6] developed a branch and bound algorithm for the two-machine case and the other paper [7] extended it to the m-machine case. A new machine-based lower bound for the m-machine case was derived in [8]. For the search strategies, the paper [9] evaluated the depth-first search and best-first search branch and bound algorithms and suggested a best-first search should be used when it expands a much smaller number of subproblems than that of a depth-first search. Otherwise, a depth-first search should be considered. It also pointed out that the choice of a search method is problem dependent.

The paper [10] presented a Java-based software system for large-scale, fault-tolerant, adaptive parallel computing. A master/worker-based branch and bound computation was used as an example for solving the Traveling Salesman Problem on a cluster of workstations. In the area of computational biology, a recent paper [11] implemented a parallel branch and bound algorithm for constructing minimum ultrametric trees. The algorithm was designed for distributed-memory multiprocessors using the master/worker model.

If no master is used for generating tasks, the decomposition of the problem tree relies on all cooperating processes. A decomposition method for parallel depth-first search can be found in [12]. Upon request, the work in a donor’s stack is split into two stacks, one of which is given to the requester. In the paper [13], a parallel decomposite best-first search branch-and-bound algorithm for MIN-based multiprocessor systems was proposed. A probabilistic model is used to estimate the number of evaluated nodes for a serial best-first search and to predict the speed-up of the parallel branch-and-bound algorithm. The paper [14] used a simple but elegant method to generate the subtree tasks in a parallel depth-first branch and bound algorithm for solving the quadratic assignment problem on shared-memory multiprocessors. In their method, when a process gets a node, say X, at the cutoff level, this process generates a sibling node of X for other processes before exploring the subtree rooted at node X. In effect, this method works similarly to our model with the pool size equal to one.
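The stack-splitting idea can be illustrated with a very small sketch. The splitting rule used here (alternate entries go to the requester) is an assumption made for illustration; the text above does not state which rule [12] uses, and the name `split_stack` is hypothetical.

```python
def split_stack(donor_stack):
    """Split a donor's stack of unexplored nodes into two stacks.

    Returns (kept, given): the donor keeps every other entry and hands the
    rest to the requester, so both receive nodes from all depths of the stack.
    """
    kept = donor_stack[0::2]
    given = donor_stack[1::2]
    return kept, given


kept, given = split_stack(["a", "b", "c", "d", "e"])
print(kept)   # ['a', 'c', 'e']
print(given)  # ['b', 'd']
```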

An alternative way of dealing with permutation flowshop problems is to develop effective heuristics to obtain approximate solutions. The evaluation and comparison of several heuristic approaches for permutation flowshops can be found in [15,16]. Recently, metaheuristics, such as genetic algorithms, simulated annealing, tabu search, etc., have been successfully applied to many combinatorial optimization problems. See [17,18] for reviews of the literature on metaheuristics.

2. The Hybrid Model

Assume that there are n jobs, which will be processed in the same order by m machines. Consider the search tree whose root node represents the null schedule. Every other node represents a partial schedule that specifies which job occupies the jth position on each machine, for 1 ≤ j ≤ s, where s is the number of jobs in the partial schedule and 1 ≤ s ≤ n. Any permutation of the set of unscheduled jobs then extends the partial schedule to a complete schedule. Placing any unscheduled job i in position s + 1 produces a descendant node.
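The tree structure described above can be sketched in a few lines. Here a node is represented as a tuple of job indices (0-based), with the empty tuple as the null schedule; the names `children` and `count_leaves` are illustrative.

```python
def children(partial, n):
    """Descendant nodes: place each unscheduled job in position s + 1."""
    scheduled = set(partial)
    return [partial + (job,) for job in range(n) if job not in scheduled]


def count_leaves(partial, n):
    """Number of complete schedules below a node (n - s)! for a node with s jobs."""
    if len(partial) == n:
        return 1
    return sum(count_leaves(child, n) for child in children(partial, n))


root = ()  # the null schedule
print(children(root, 3))      # [(0,), (1,), (2,)]
print(children((1,), 3))      # [(1, 0), (1, 2)]
print(count_leaves(root, 4))  # 24
```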

Consider the lower bound on the total flow time at a node. It starts from the total flow time of the jobs in the partial schedule, and U denotes the set of jobs that are not included in the partial schedule. The bound is based on estimates of the earliest possible start time for any job in U assigned to position t, where s + 1 ≤ t ≤ n. The idea behind these estimates is that a job can be processed at a machine only when both the job and the machine are ready. Let p_jk denote the processing time of job j on machine k. The lower bound LB on the total flow time at the node is computed from the flow time of the partial schedule, the partial processing times of the jobs in U on machines u···v, and E_tk, an underestimate of the earliest start time of the tth job on machine k, for t ≥ s + 1. The details of computing E_tk can be found in [1].
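The bound of [1] is not reproduced here. As an illustration of the same principle (a job can start on a machine only when both the job and the machine are ready), the sketch below computes a much simpler, weaker bound: every unscheduled job must wait on machine k until the partial schedule has released that machine, and must then still be processed on machines k through m. This substitute bound and the names `evaluate` and `simple_lower_bound` are assumptions for illustration, not the paper's formula.

```python
def evaluate(partial, p):
    """Return (ready, flow): machine release times and total flow time
    of the jobs in the partial schedule (all jobs available at time 0)."""
    m = len(p[0])
    ready = [0] * m
    flow = 0
    for job in partial:
        prev = 0  # completion time of this job on the previous machine
        for k in range(m):
            ready[k] = max(ready[k], prev) + p[job][k]
            prev = ready[k]
        flow += ready[-1]
    return ready, flow


def simple_lower_bound(partial, p):
    """A valid but weak lower bound on the total flow time below a node."""
    n, m = len(p), len(p[0])
    ready, flow = evaluate(partial, p)
    scheduled = set(partial)
    bound = flow
    for j in range(n):
        if j in scheduled:
            continue
        # Job j cannot start on machine k before ready[k], and afterwards
        # still needs its processing times on machines k..m.
        bound += max(ready[k] + sum(p[j][k:]) for k in range(m))
    return bound


p = [[3, 2], [1, 4], [2, 1]]  # hypothetical instance
print(simple_lower_bound((), p))         # 13
print(simple_lower_bound((1,), p))       # 18
print(simple_lower_bound((1, 2, 0), p))  # 19, the exact flow time of a leaf
```

At a leaf the bound equals the total flow time of the complete schedule, which is the property a branch and bound search relies on when it compares a leaf's value with the upper bound.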

Note that some partial sequences can be pruned without checking lower bounds if they are dominated by others. Specifically, let σ1 and σ2 be two partial schedules for the same set of jobs S. We say that σ1 dominates σ2 if, for every permutation σ of the jobs in N − S (the jobs not in S), the total flow time of the complete schedule σ1σ is no greater than that of σ2σ. If the inequality is strict, we say that σ1 strictly dominates σ2. Hence, during the branch and bound search, a dominated partial schedule σ2 can be pruned.
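The definition of dominance can be checked directly on very small instances by enumerating every completion. The sketch below does exactly that; it is a brute-force illustration of the definition, not the efficient dominance test used in [1], and the names `total_flow_time` and `dominates` are illustrative.

```python
from itertools import permutations


def total_flow_time(order, p):
    """Total flow time of a job order (all jobs available at time 0)."""
    ready = [0] * len(p[0])
    flow = 0
    for job in order:
        prev = 0
        for k in range(len(ready)):
            ready[k] = max(ready[k], prev) + p[job][k]
            prev = ready[k]
        flow += ready[-1]
    return flow


def dominates(s1, s2, p):
    """True if s1 is at least as good as s2 under every common completion."""
    if sorted(s1) != sorted(s2):
        raise ValueError("both partial schedules must contain the same jobs")
    rest = [j for j in range(len(p)) if j not in s1]
    return all(
        total_flow_time(s1 + tail, p) <= total_flow_time(s2 + tail, p)
        for tail in permutations(rest)
    )


p = [[3, 2], [1, 4], [2, 1]]  # hypothetical instance
print(dominates((1, 0), (0, 1), p))  # True
print(dominates((0, 1), (1, 0), p))  # False
```

Because the check enumerates all permutations of the remaining jobs, it is only practical for a handful of jobs; its purpose is to make the definition concrete.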

A branch and bound algorithm has been developed to solve the m-machine permutation flowshop problem [1]. The basic structure of the algorithm is similar to the one developed in [19]. It employs an adaptive depth-first search (DFS) strategy, which differs from the breadth-first strategy used in [7]. The advantages of a depth-first search with backtracking are: 1) the number of active nodes is always ≤ n; 2) the bottom of the tree is reached faster, so a feasible solution can be found earlier; 3) a stack can be used to reduce computations. The program uses a heuristic method to compute an upper bound Z* from an initial feasible solution. The search strategy incorporates a dominance rule and a best-bound rule. Given the current node σr, let Ur denote the set of all jobs not in σr, and for each job j in Ur consider the lower bound for the child node σrj. Let i denote the last job of σr, so that σi ≡ σr; σij then denotes the child obtained by appending job j, and σji denotes the schedule over the same jobs with i and j interchanged in the last two positions. For each j in Ur, the algorithm proceeds as follows. If node σij has not been previously fathomed, use the dominance test mentioned earlier to determine whether σji strictly dominates σij. If so, fathom σij and assign a lower bound to it; otherwise, compute a lower bound for σij. After calculating all of these lower bounds, branch on the job j with the smallest lower bound. When the search reaches a leaf node, a complete schedule has been found. If the leaf node’s lower bound value is less than the global upper bound Z*, the value and the schedule replace the current Z* and σ*, respectively.
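A depth-first search with a best-bound branching rule can be sketched as follows. This is a simplified illustration, not the algorithm of [1]: it omits the dominance test, uses a simple stand-in lower bound, and obtains the initial upper bound from a hypothetical heuristic (jobs sorted by total processing time). All function names are illustrative.

```python
def evaluate(partial, p):
    """Return (machine release times, total flow time) of a partial schedule."""
    ready = [0] * len(p[0])
    flow = 0
    for job in partial:
        prev = 0
        for k in range(len(ready)):
            ready[k] = max(ready[k], prev) + p[job][k]
            prev = ready[k]
        flow += ready[-1]
    return ready, flow


def lower_bound(partial, p):
    """Stand-in bound (an assumption, weaker than the bound in [1])."""
    ready, flow = evaluate(partial, p)
    return flow + sum(
        max(ready[k] + sum(p[j][k:]) for k in range(len(ready)))
        for j in range(len(p)) if j not in partial
    )


def dfs_branch_and_bound(p):
    n = len(p)
    # Hypothetical heuristic for the initial feasible solution.
    start = tuple(sorted(range(n), key=lambda j: sum(p[j])))
    best = [evaluate(start, p)[1], start]  # upper bound and its schedule

    def visit(partial):
        if len(partial) == n:  # leaf: a complete schedule
            value = evaluate(partial, p)[1]
            if value < best[0]:
                best[0], best[1] = value, partial
            return
        kids = [partial + (j,) for j in range(n) if j not in partial]
        # Best-bound rule: branch on the child with the smallest lower bound.
        for lb, child in sorted((lower_bound(c, p), c) for c in kids):
            if lb < best[0]:  # otherwise the child is fathomed
                visit(child)

    visit(())
    return best[0], best[1]


p = [[3, 2], [1, 4], [2, 1]]  # hypothetical instance
value, schedule = dfs_branch_and_bound(p)
print(value)  # 19
```

The recursion depth never exceeds n, which mirrors the first advantage listed above: only the nodes on the current path are active at any time.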

Our improved branch and bound algorithm is based on the existing algorithm in [1]. The new algorithm consists of two modes: Generation and Exploration. In the Generation mode, the program searches the upper part of the B&B tree from the root down to a node at level i and then inserts the node, along with its information such as the lower bound and the partial sequence, into a work pool. In the current implementation, the level i and the pool size psize are determined when the program starts and never change afterwards. The nodes in the work pool are ordered by using the lower bound as the key. Initially, the program keeps generating candidates until the pool is full. After that, the program switches to the Exploration mode. It extracts the node with the minimum lower bound from the pool and performs a local depth-first search on the subtree rooted at that level-i node. When the exploration of the subtree is complete, the program changes back to the Generation mode.

Figure 1 shows a simple example with level i = 3 and pool size psize = 6. Note that the algorithm generates the subtree roots at level i from left to right (labeled A to F in Figure 1(a)), following the existing algorithm. Since node A has the minimum lower bound, it is explored first, as can be seen in Figure 1(b). Then the algorithm generates the next candidate G and inserts it into the work pool (Figure 1(c)). As mentioned before, the algorithm in [1] generates the nodes under the same parent one by one in ascending order of their lower bound values, namely A < B < C < D and E < F < G. This does not guarantee that D has a smaller lower bound than E, however, because they have different parent nodes. This is why our new algorithm uses a pool to adjust the search order where needed. For example, if E has the minimum value in the pool, the subtree rooted at E will be explored next (see Figure 1(d)).
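The effect of the pool on the search order can be reproduced with a priority queue. In the sketch below, the lower bound values assigned to nodes A to G are hypothetical; they were chosen only to satisfy A < B < C < D and E < F < G, with E smaller than B. The function name `exploration_order` is illustrative.

```python
import heapq
from collections import deque


def exploration_order(generation_order, bounds, psize):
    """Order in which subtree roots are explored for a given pool size."""
    pending = deque(generation_order)  # nodes in left-to-right generation order
    pool = []                          # min-heap keyed by lower bound
    order = []
    while pending or pool:
        # Generation mode: refill the pool until it is full.
        while pending and len(pool) < psize:
            name = pending.popleft()
            heapq.heappush(pool, (bounds[name], name))
        # Exploration mode: take the node with the minimum lower bound.
        order.append(heapq.heappop(pool)[1])
    return order


# Hypothetical lower bounds for the nodes of Figure 1.
bounds = {"A": 10, "B": 12, "C": 14, "D": 18, "E": 11, "F": 13, "G": 16}
print(exploration_order("ABCDEFG", bounds, psize=6))
# ['A', 'E', 'B', 'F', 'C', 'G', 'D']
print(exploration_order("ABCDEFG", bounds, psize=1))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

With a pool of size one, the order degenerates to the plain left-to-right order of the existing algorithm; with a pool of size six, E is explored right after A, as in the scenario described for Figure 1(d).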

In general, the algorithm runs by strictly alternating between the Generation and Exploration modes. When the upper tree has been traversed completely, i.e., no new node will be generated into the work pool, all of the remaining nodes in the work pool are explored in order of their lower bound values. Note that the pool size should be reasonably large, so that it can accommodate as many subtree candidates as possible. The nodes with smaller lower bound values at level i then have a higher chance of being explored early, and hence the optimal solution could be found earlier.
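The alternation between the two modes can be sketched compactly with a generator for the Generation mode and a heap for the work pool. This is a simplified illustration under stated assumptions: it uses a simple stand-in lower bound instead of the bound in [1], omits the dominance test, and starts without a heuristic upper bound. The names `hybrid_branch_and_bound`, `generate`, and `explore` are illustrative.

```python
import heapq


def evaluate(partial, p):
    """Return (machine release times, total flow time) of a partial schedule."""
    ready = [0] * len(p[0])
    flow = 0
    for job in partial:
        prev = 0
        for k in range(len(ready)):
            ready[k] = max(ready[k], prev) + p[job][k]
            prev = ready[k]
        flow += ready[-1]
    return ready, flow


def lower_bound(partial, p):
    """Stand-in bound (an assumption, weaker than the bound in [1])."""
    ready, flow = evaluate(partial, p)
    return flow + sum(
        max(ready[k] + sum(p[j][k:]) for k in range(len(ready)))
        for j in range(len(p)) if j not in partial
    )


def hybrid_branch_and_bound(p, level, psize):
    n = len(p)
    if psize < 1 or not 0 <= level <= n:
        raise ValueError("need psize >= 1 and 0 <= level <= n")
    best = [float("inf"), None]  # upper bound and its schedule

    def ordered_children(partial):
        kids = [partial + (j,) for j in range(n) if j not in partial]
        return sorted((lower_bound(c, p), c) for c in kids)

    def generate(partial=()):
        """Generation mode: DFS over the upper tree, yielding level-i nodes."""
        if len(partial) == level:
            yield lower_bound(partial, p), partial
            return
        for lb, child in ordered_children(partial):
            if lb < best[0]:
                yield from generate(child)

    def explore(partial):
        """Exploration mode: local DFS below one subtree root."""
        if len(partial) == n:
            value = evaluate(partial, p)[1]
            if value < best[0]:
                best[0], best[1] = value, partial
            return
        for lb, child in ordered_children(partial):
            if lb < best[0]:
                explore(child)

    pool, candidates, exhausted = [], generate(), False
    while not exhausted or pool:
        while not exhausted and len(pool) < psize:  # refill the pool
            item = next(candidates, None)
            if item is None:
                exhausted = True
            else:
                heapq.heappush(pool, item)
        if pool:
            lb, node = heapq.heappop(pool)          # best node in the pool
            if lb < best[0]:
                explore(node)
    return best[0], best[1]


p = [[3, 2], [1, 4], [2, 1]]  # hypothetical instance
print(hybrid_branch_and_bound(p, level=1, psize=2)[0])  # 19
```

Because the generator is resumed only when the pool has room, the Generation mode automatically benefits from any upper bound found by earlier explorations, and the pool never holds more than psize nodes.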

Figure 2 shows the pseudocode of the new algorithm, which consists of two major functions, GENERATE() and EXPLORE(). Almost all of the code in lines 1 - 19 and lines 25 - 36 of Figure 2 is reused from the algorithm in [1], except that the code in the function EXPLORE() has been modified slightly. In the original program, variables such as Z, U, s, etc. are declared as global, so many computation functions can access them easily. If such a program were executed without modification, the code in both the Generation and Exploration modes would access the same copy of these global variables. Instead of introducing different names or copies of the global variables, our solution uses a pointer t that points to the corresponding working data area, so the function EXPLORE() can explore its assigned subtree using its own dedicated space.

Conflicts of Interest

The authors declare no conflicts of interest.


[1] C. Chung, J. Flynn and O. Kirca, “A Branch and Bound Algorithm to Minimize the Total Flow Time for m-Machine Permutation Flowshop Problems,” International Journal of Production Economics, Vol. 79, No. 3, 2002, pp. 185-196. doi:10.1016/S0925-5273(02)00234-7
[2] K. R. Baker, “Introduction to Sequencing and Scheduling,” Wiley, New York, 1974.
[3] J. N. D. Gupta and E. F. Stafford Jr., “Flowshop Scheduling Research after Five Decades,” European Journal of Operational Research, Vol. 169, No. 3, 2006, pp. 699-711. doi:10.1016/j.ejor.2005.02.001
[4] R. A. Dudek, S. S. Panwalkar and M. L. Smith, “The Lessons of Flowshop Scheduling Research,” Operations Research, Vol. 40, No. 1, 1992, pp. 7-13.
[5] B. Gendron and T. G. Crainic, “Parallel Branch-and-Bound Algorithms: Survey and Synthesis,” Operations Research, Vol. 42, No. 6, 1994, pp. 1042-1066.
[6] E. Ignall and L. Schrage, “Application of the Branch and Bound Technique to Some Flow-Shop Scheduling Problems,” Operations Research, Vol. 13, No. 3, 1965, pp. 400-412.
[7] S. P. Bansal, “Minimizing the Sum of Completion Times of n Jobs Over m Machines in a Flowshop—A Branch and Bound Approach,” AIIE Transactions, Vol. 9, 1977, pp. 306-311. doi:10.1080/05695557708975160
[8] R. H. Ahmadi and U. Bagchi, “Improved Lower Bounds for Minimizing the Sum of Completion Times of n Jobs Over m Machines in a Flow Shop,” European Journal of Operational Research, Vol. 44, 1990, pp. 331-336. doi:10.1016/0377-2217(90)90244-6
[9] C.-F. Yu and B. W. Wah, “Efficient Branch-and-Bound Algorithms on a Two-Level Memory System,” IEEE Transactions on Software Engineering, Vol. 14, No. 9, 1988, pp. 1342-1356. doi:10.1109/32.6177
[10] M. O. Neary and P. Cappello, “Advanced Eager Scheduling for Java-Based Adaptive Parallel Computing,” Concurrency and Computation: Practice and Experience, Vol. 17, No. 7-8, 2005, pp. 797-819.
[11] K. Yu, J. Zhou, C. Lin and C. Tang, “Efficient Parallel Branch-and-Bound Algorithm for Constructing Minimum Ultrametric Trees,” Journal of Parallel and Distributed Computing, Vol. 69, No. 11, 2009, pp. 905-914.
[12] V. N. Rao and V. Kumar, “Parallel Depth-First Search on Multiprocessors—Part I: Implementation,” International Journal of Parallel Programming, Vol. 16, No. 6, 1987, pp. 479-499. doi:10.1007/BF01389000
[13] M. K. Yang and C. R. Das, “Evaluation of a Parallel Branch-and-Bound Algorithm on a Class of Multiprocessors,” IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 1, 1994, pp. 74-86. doi:10.1109/71.262590
[14] B. Mans, T. Mautor and C. Roucairol, “A Parallel Depth First Search Branch and Bound Algorithm for the Quadratic Assignment Problem,” European Journal of Operational Research, Vol. 81, No. 3, 1995, pp. 617-628. doi:10.1016/0377-2217(93)E0334-T
[15] J. Framinan, R. Leisten and R. Ruiz-Usano, “Comparison of Heuristics for Flow Time Minimisation in Permutation Flowshops,” Computers & Operations Research, Vol. 32, No. 5, 2005, pp. 1237-1254.
[16] Y. D. Kim, “Heuristics for Flowshop Scheduling Problems Minimizing Mean Tardiness,” Journal of the Operational Research Society, Vol. 44, No. 1, 1993, pp. 19-28.
[17] C. Reeves, “Genetic Algorithms for the Operations Researcher,” INFORMS Journal on Computing, Vol. 9, No. 3, 1997, pp. 231-250.
[18] R. Ruiz and C. Maroto, “A Comprehensive Review and Evaluation of Permutation Flowshop Heuristics,” European Journal of Operational Research, Vol. 165, No. 2, 2005, pp. 479-494. doi:10.1016/j.ejor.2004.04.017
[19] C. N. Potts, “An Adaptive Branching Rule for the Permutation Flow-Shop Problem,” European Journal of Operational Research, Vol. 5, No. 1, 1980, pp. 19-25.
[20] B. Nichols, D. Buttlar and J. P. Farrell, “Pthreads Programming,” 1st Edition, O’Reilly & Associates, Inc., Sebastopol, 1996.
[21] K. A. Robbins and S. Robbins, “Practical UNIX Programming,” Prentice-Hall, Englewood Cliffs, 1996.
[22] E. Taillard, “Benchmarks for Basic Scheduling Problems,” European Journal of Operational Research, Vol. 64, No. 2, 1993, pp. 278-285.
[23] W. Press, S. Teukolsky, W. Vetterling and B. Flannery, “Numerical Recipes in C: The Art of Scientific Computing,” 2nd Edition, Chapter 7, Cambridge University Press, Cambridge, 1992.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.