The Sliding Gradient Algorithm for Linear Programming

The existence of a strongly polynomial algorithm for linear programming (LP) has been widely sought after for decades. Recently, a new approach called the Gravity Sliding algorithm [1] has emerged. It is a gradient descent method whereby the descending trajectory slides along the inner surfaces of a polyhedron until it reaches the optimal point. In ℝ^3, a water droplet pulled by gravitational force traces the shortest path to descend to the lowest point. As the Gravity Sliding algorithm emulates the water droplet trajectory, it exhibits strongly polynomial behavior in ℝ^3. We believe that it could be a strongly polynomial algorithm for linear programming in ℝ^n too. In fact, our algorithm can solve the Klee-Minty deformed cube problem in only two iterations, irrespective of the dimension of the cube. The core of the Gravity Sliding algorithm is the calculation of the projection of the gravity vector g onto the intersection of a group of facets, which is disclosed in the same paper [1]. In this paper, we introduce a more efficient method to compute the gradient projections on complementary facets and, in view of the new projection calculation, rename the method the Sliding Gradient algorithm.


Introduction
The simplex method developed by Dantzig [2] has been widely used to solve many large-scale optimization problems with linear constraints. Its practical performance has been good, and researchers have found that the expected number of iterations exhibits polynomial complexity under certain conditions [3] [4] [5] [6]. However, Klee and Minty in 1972 gave a counterexample showing that its worst-case complexity is exponential.

Cone-Cutting Principle
The cone-cutting theory [16] offers a geometric interpretation of a set of inequality constraints. Instead of considering the full set of constraint equations in an LP problem, the cone-cutting theory enables us to consider a subset of equations and examine how an additional constraint reshapes the feasible region. This geometric insight forms the basis of our algorithm development.

Cone-Cutting Principle
In an m-dimensional space ℝ^m, a hyperplane τ^T y = c divides ℝ^m into two half spaces. Here τ is the normal vector of the hyperplane and c is a constant. We denote the positive half space {y | τ^T y ≥ c} the accepted zone of the hyperplane, and the negative half space {y | τ^T y < c} the rejected zone. Note that the normal vector τ points into the accepted zone, and we call a hyperplane with such an orientation a facet α: (τ, c). When there are m facets in ℝ^m whose normal vectors are linearly independent, the corresponding set of linear equations has a unique solution, which is a point V in ℝ^m. Geometrically, the m facets α_1, α_2, …, α_m form a cone and V is the vertex of the cone. We now give a formal definition of a cone, which is taken from [1].
Definition 1. Given m hyperplanes α_j: (τ_j, c_j) in ℝ^m with rank(τ_1, …, τ_m) = m, the intersection of their accepted zones C = C(V; α_1, …, α_m) is called a cone in ℝ^m, and the intersection of the accepted zones of the facets is called the accepted zone of C. The point V is the vertex and α_j is the facet plane, or simply the facet, of C.
A cone C also has m edge lines, each formed by the intersection of (m − 1) facets. Hence, a cone can also be defined as follows.
Definition 2. Given m rays R_j^+ = {V + t r_j | t ≥ 0} (j = 1, …, m) shooting from a point V, with rank(r_1, …, r_m) = m, the convex closure of the m rays is called a cone in ℝ^m. Here R_j^+ is the edge, r_j the edge direction, and R_j = {V + t r_j | t ∈ ℝ} the edge line of the cone C.
The two definitions are equivalent. Furthermore, P. Z. Wang [11] has observed that R_i^+ and α_i are opposite to each other for i = 1, …, m. Edge-line R_i^+ is the intersection of all C-facets except α_i, while facet α_i is bounded by all C-edges except R_i^+. This is the duality between facets and edges. For i ∈ {1, …, m}, (α_i, R_i^+) is called a pair of the cone C.
It is obvious that τ_i^T r_j = 0 (for i ≠ j), since r_j lies on α_i. Moreover, we have τ_i^T r_i > 0 for i = 1, …, m.
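To make the facet/edge duality concrete, the following short numerical check (a sketch in Python/NumPy with arbitrary illustrative facet normals; the authors' own implementation is in MATLAB) verifies that the columns of T^{-1} are edge directions satisfying τ_i^T r_j = 0 for i ≠ j and τ_i^T r_i > 0:

```python
import numpy as np

# Rows of T are the facet normals tau_1..tau_m of a cone in R^3
# (arbitrary illustrative values, chosen so that T is invertible).
T = np.array([[1.0, 0.2, 0.1],
              [0.0, 1.0, 0.3],
              [0.2, 0.0, 1.0]])

# Columns of R = T^{-1} are edge directions r_1..r_m: by construction
# tau_i . r_j = delta_ij, so r_j lies on every facet alpha_i except alpha_j,
# and tau_j . r_j = 1 > 0 (each edge points into the accepted zone).
R = np.linalg.inv(T)

print(np.round(T @ R, 12))   # identity matrix: the duality relations hold
```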

Cone Cutting Algorithm
Consider a linear programming (LP) problem and its dual:

max c^T x subject to A x ≤ b, x ≥ 0 (primal);  min b^T y subject to A^T y ≥ c, y ≥ 0 (dual).

In the following, we focus on solving the dual LP problem. The standard simplex tableau can be obtained by appending an m × m identity matrix I_{m×m}, which represents the slack variables. From it we can construct a facet tableau whereby each column is a facet, denoted α_j: (τ_j, c_j) for j = 1, …, m + n. When a cone C(V; α_1, …, α_m) is intersected by another facet α_j, the i-th edge of the cone is intersected by α_j at a certain point q_ij. We say that α_j cuts the cone C, and the cut points q_ij can be obtained from the equations

q_ij = V + t_i r_i,  where t_i = (c_j − τ_j^T V) / (τ_j^T r_i).

The intersection is called real if t_i ≥ 0 and fictitious if t_i < 0. Cone cutting greatly alters the accepted zone, as can be seen from the simple 2-dimensional example shown in Figures 1(a)-(e). In 2 dimensions, a facet α: (τ, c) is a line. The normal vector τ is perpendicular to this line and points into the accepted zone of the facet. Furthermore, a cone in 2 dimensions is formed by two non-parallel facets. Figure 1(a) shows such a cone C(V; α_1, α_2). The accepted zone of the cone is the intersection of the two accepted zones of facets α_1 and α_2.
This is represented by the shaded area A in Figure 1(a). In Figure 1(b), a new facet α_3 intersects the cone at two cut points q_13 and q_23; they are both real cut points. Since the arrow of the normal vector τ_3 points in the same general direction as the cone, V lies in the rejected zone of α_3 and we say that α_3 rejects V. Moreover, the accepted zone of α_3 intersects the accepted zone of the cone, so the overall accepted zone is reduced to the shaded area marked B. In Figure 1(c), τ_3 points in the opposite direction: α_3 accepts V, and the overall accepted zone is confined to the area marked C. As the dual feasible region D of an LP problem must satisfy all the constraints, it must lie within area C. In Figure 1(d), α_3 cuts the cone at two fictitious points. Since τ_3 points in the same direction as the cone, V is accepted by α_3. However, the accepted zone of α_3 covers that of the cone. As a result, α_3 does not contribute to any reduction of the overall accepted zone, so it can be deleted from further consideration without affecting the LP solution. In Figure 1(e), τ_3 points in the direction opposite to the cone. The intersection between the accepted zone of α_3 and that of the cone is an empty set. This means that the dual feasible region D is empty and the LP is infeasible; this is, in fact, one of the criteria that can be used for detecting infeasibility.
Figure 1. Accepted zone area of a cone, before and after it is cut by a facet.
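As an illustration of the cut-point equations above, the following sketch computes the displacements t_i and cut points q_ij of a cutting facet against all edges of a cone; the function name and argument layout are our own, not from [1]:

```python
import numpy as np

def cut_points(V, R, tau_j, c_j):
    """V: cone vertex, shape (m,); R: edge directions as columns of an
    (m, m) array; (tau_j, c_j): the cutting facet alpha_j.
    Returns the displacements t (t_i >= 0 marks a real cut, t_i < 0 a
    fictitious one) and the cut points q_ij as columns of Q.
    Assumes no edge is parallel to the facet (tau_j . r_i != 0)."""
    t = (c_j - tau_j @ V) / (tau_j @ R)   # t_i = (c_j - tau_j.V)/(tau_j.r_i)
    Q = V[:, None] + R * t                # q_ij = V + t_i * r_i, column-wise
    return t, Q
```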
Based on this cone-cutting idea, P. Z. Wang [16] [17] developed a cone-cutting algorithm to solve the dual LP problem. Each cone is a combination of m facets selected from the (m + n) available ones. Let ∆ denote the index set of facets of C. The algorithm starts with an initial coordinate cone C_0, then finds a facet α_in with in ∉ ∆ to replace one of the existing facets α_out, thus forming a new cone. This process is repeated until an optimal point is found. The cone-cutting algorithm is summarized in Table 1 below. Steps 2 and 3 find, among all facets not in C that reject V, the one that rejects V the least; this cutting facet cuts the edges of the cone at m points. Steps 4 and 5 identify the real cut point q_I* that is closest to the vertex V; it becomes the vertex of a new cone.

Table 1. The cone-cutting algorithm.
Input: A, b and c. Output: either V as the optimal point and V^T b as the optimal value, or a declaration that the LP problem is infeasible.
1. Initialize with the coordinate cone C_0: edge directions (1,0,…,0), (0,1,…,0), …, (0,0,…,1).
2. Among all facets not in C that reject V, find one that rejects V the least; if no facet rejects V, stop: V is optimal.
3-4. Compute the cut points q_i of this facet with the edges of C.
5. Among the real cuts q_i, find the q_I* that is closest to V; I* is the facet index to leave.
6. Form the new cone C_{k+1} by updating V_{k+1}, the edge vectors and the facets; go to step 2.

This new cone retains all the facets of the original cone except that the cutting facet replaces the facet corresponding to edge I*. The edge I* itself is retained, but the rest of the edges must be recomputed, as shown in step 6. Remarkably, P. Z. Wang shows that when b > 0, this algorithm produces exactly the same result as the original simplex algorithm proposed by Dantzig [2].
Hence, the cone-cutting theory offers a geometric interpretation of the simplex method. More significantly, it inspired the authors to explore a new approach to tackling the LP problem.

Sliding Gradient Algorithm
Expanding on the cone-cutting theory, the Gravity Sliding algorithm [1] was developed to find the optimal solution of the LP problem starting from a point within the feasible region D. Since then, several refinements have been made, and they are presented in the following sections.

Determining the General Descending Direction
The feasible region D is a convex polyhedron formed by the constraints τ_j^T y ≥ c_j, and the optimal feasible point is at one of its vertices. Let 𝒱 be the set of feasible vertices. The dual LP problem (3) can then be stated as: find V* = argmin_{V ∈ 𝒱} b^T V; that is, among the vertices V, the optimal vertex V* is the one that yields the lowest inner-product value b^T V. Thus we can set the principal descending direction g_0 to be the opposite of the b vector (i.e., g_0 = −b); this is referred to as the gravity vector. The descending path then descends along this principal direction inside D until it reaches the lowest point of D viewed along the direction of b. This point is the optimal vertex V*.

Circumventing Blocking Facets
The basic principle of the new algorithm is illustrated in Figure 2. Notice that in 2 dimensions a facet is a line. In this figure, the facets (lines) form a closed polyhedron, which is the dual feasible region D. The initial point P_0 is inside D. From P_0, the path attempts to descend along the g_0 = −b direction. It can go as far as P_1, which is the point of intersection between the ray R = P_0 + t g_0 and the facet α_1. In essence, α_1 blocks this ray, and hence it is called the blocking facet relative to this ray. In order not to penetrate the boundary of D, the descending direction needs to change from g_0 to g_1 at P_1 and slide along g_1 until it hits the other blocking facet α_2 at P_2. Then it needs to change course again and slide along the direction g_2 until it hits P_3. In this figure, P_3 is the lowest point of the dual feasible region D and hence it is the optimal point V*.
It can be observed from Figure 2 that g_1 is the projection of g_0 onto α_1, and g_2 is the projection of g_0 onto α_2. Thus from P_1 the descending path slides along α_1 to reach P_2, and then slides along α_2 to reach P_3. Hence we call this algorithm the Sliding Gradient algorithm. The basic idea is to compute a new descending direction that circumvents the current blocking facets, and to advance until the next blocking facet is found, repeating until the path reaches the bottom vertex viewed along the direction of b.
Let σ_t denote the set of blocking facets at the t-th iteration. From an initial point P_0 and a gradient descent vector g_0, the algorithm iteratively performs the following steps (in this example the initial set σ_0 is empty):
1) compute a gradient direction g_t based on σ_t;
2) move P_t to P_{t+1} along g_t, where P_{t+1} is a point on the first blocking facet;
3) incorporate the newly encountered blocking facet into σ_t to form σ_{t+1};
4) go back to step 1.
The algorithm stops when it cannot find any direction to descend in step 1. This is discussed in detail in Section 3.6, where a formal stopping criterion is given.
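A minimal Python sketch of this outer loop is given below. The helpers `select_gradient` and `max_descend` are placeholders for the routines sketched later in this section (the selection rule of Equation (9) and Proposition 3, respectively), and the stopping test anticipates Theorem 3; this is an illustrative reconstruction, not the authors' MATLAB implementation:

```python
import numpy as np

def sliding_gradient(P0, b, taus, cs, m, max_iter=1000):
    """taus: list of all facet normals; cs: matching offsets c_j.
    Assumes the dual feasible region is non-empty and bounded along g."""
    P, sigma = P0.astype(float), []        # current point, blocking index set
    g0 = -b                                # gravity vector
    for _ in range(max_iter):
        g, eff = select_gradient(g0, [taus[i] for i in sigma])     # step 1
        sigma_eff = [sigma[i] for i in eff]    # map back to facet indices
        if len(sigma_eff) == m:            # Theorem 3: nowhere left to descend
            return P, b @ P
        t, j = max_descend(P, g, taus, cs, skip=sigma)             # step 2
        P = P + t * g                      # advance to the first blocking facet
        sigma = sigma_eff + [j]            # step 3; loop back to step 1 (step 4)
    raise RuntimeError("iteration limit reached")
```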

Minimum Requirements for the Gradient Direction g_t
For the first step, the gradient descent vector g_t needs to satisfy the following requirements.
Proposition 1. g_t must satisfy g_t^T g_0 ≥ 0, so that the dual objective function b^T y is non-increasing when y moves from P_t to P_{t+1} = P_t + t g_t (t ≥ 0).

Proof. Since g_0 = −b, we have b^T P_{t+1} = b^T P_t + t b^T g_t = b^T P_t − t g_0^T g_t ≤ b^T P_t whenever g_t^T g_0 ≥ 0; that is, g_t keeps a non-negative component along the principal direction g_0, and the objective is non-increasing. END

Proposition 2. g_t must satisfy τ_{σ_t(j)}^T g_t ≥ 0 for every facet α_{σ_t(j)} ∈ σ_t; this ensures that P_{t+1} remains dual feasible (i.e., P_{t+1} ∈ D).

Proof. If, for some j, τ_{σ_t(j)}^T g_t < 0, then g_t has a component opposite to the normal vector of facet α_{σ_t(j)}, so a ray Q = P_t + t g_t will eventually penetrate this facet for some positive value of t. This means that Q will be rejected by α_{σ_t(j)}, and hence Q is no longer a dual feasible point. END
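The two propositions translate directly into a validity test for a candidate direction. The sketch below is our own helper, with a small tolerance `eps` in place of exact zero tests:

```python
import numpy as np

def is_valid_direction(g_t, g0, taus, eps=1e-12):
    """Check Propositions 1 and 2 for a candidate direction g_t.
    taus: normals of the facets currently in the blocking set sigma."""
    non_increasing = (g_t @ g0) >= -eps                       # Proposition 1
    non_piercing = all(tau @ g_t >= -eps for tau in taus)     # Proposition 2
    return non_increasing and non_piercing
```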

Maximum Descent in Each Iteration
Proposition 3. To ensure that P_{t+1} stays inside the dual feasible region D, the displacement from P_t along g_t is t* = min{ t_j | t_j ≥ 0, α_j ∉ σ_t }, where

t_j = (c_j − τ_j^T P_t) / (τ_j^T g_t).   (6)

Proof. The equation of a line passing through P along the direction g is P + t g. If this line is not parallel to the facet plane α_j: (τ_j, c_j) (i.e., τ_j^T g ≠ 0), it will intersect the facet at the point Q = P + t_j g satisfying τ_j^T Q = c_j, which gives t_j = (c_j − τ_j^T P) / (τ_j^T g). Among the displacements with t_j ≥ 0, t* is the minimum of this group, so moving by t* along g_t reaches the first blocking facet without crossing any other. If τ_j^T g = 0, the line is parallel to α_j and that facet imposes no bound on the displacement. Hence, unless all facets are parallel to g_t, Proposition 3 can always find the next descent point P_{t+1} = P_t + t* g_t. END
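A direct implementation of Proposition 3 might look as follows (a sketch; `taus` and `cs` hold the normals and offsets of all facets, and `skip` is the index set σ of facets the point already lies on):

```python
import numpy as np

def max_descend(P, g, taus, cs, skip, eps=1e-12):
    """Smallest non-negative displacement t_j over facets not in `skip`,
    per Proposition 3. Returns (t*, index of the first blocking facet)."""
    best_t, best_j = np.inf, None
    for j in range(len(cs)):
        d = taus[j] @ g
        if j in skip or abs(d) < eps:     # already on it, or parallel to g
            continue
        t = (cs[j] - taus[j] @ P) / d     # P + t*g lies on facet j
        if 0 <= t < best_t:
            best_t, best_j = t, j
    return best_t, best_j
```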

Gradient Projection
We now show that the projection of g_0 onto the set of blocking facets σ_t satisfies the requirements of Propositions 1 and 2. Before we do so, we first discuss projection operations in subspaces.

Projection in Subspaces
Projection is a basic concept defined on vector spaces. Since we are only interested in the gradient descent direction g_t and not in the actual location of the projection, we can ignore the constant c in the hyperplane {y | τ^T y = c}; in other words, we focus on the subspace V(τ) spanned by τ, and on its null space N(τ), rather than on the affine space of the hyperplane itself. More generally, for normal vectors τ_1, …, τ_k, the span V(τ_1, …, τ_k) and the null space N(τ_1, …, τ_k) = {y | τ_i^T y = 0 for i = 1, …, k} form an orthogonal decomposition of the whole space Y, so a vector g in ℝ^m can be decomposed into two components: the projection of g onto V(τ_1, …, τ_k) and the projection of g onto N(τ_1, …, τ_k). We write g↓V(τ_1, …, τ_k) and g↓N(τ_1, …, τ_k) for these components, and call them the direct projection and the null projection, respectively.
The following definition and theorem were first presented in [1] and are repeated here for completeness.
Let 𝒮 denote the set of all subspaces of Y = ℝ^m, and let {0} stand for the 0-dimensional subspace; we now give an axiomatic definition of projection.
Definition 3. A projection on the vector space Y is a mapping ↓: (g, X) ↦ g↓X, where g is a vector in Y and X is a subspace in 𝒮, satisfying axioms (N.1) (reflexivity) through (N.5); the full list of axioms is given in [1]. One property used repeatedly below is additivity over orthogonal subspaces: for any g ∈ Y and subspaces X, Z ∈ 𝒮 that are orthogonal to each other, g↓(X + Z) = g↓X + g↓Z, where X + Z is the direct sum of X and Z. In particular, this applies to the pair V(τ_1, …, τ_k) and N(τ_1, …, τ_k). We now show another approach that is more suitable to our overall algorithm.

Theorem 1. For any g ∈ Y,

g↓N(τ_1, …, τ_k) = g − Σ_{i=1}^{k} (g^T o_i) o_i,   (8)

where {o_1, …, o_k} is an orthonormal basis of the subspace V(τ_1, …, τ_k).

Proof. The summation Σ_i (g^T o_i) o_i is exactly g↓V(τ_1, …, τ_k). Subtracting it from g removes the component of g in V(τ_1, …, τ_k) and leaves the component orthogonal to every τ_i, which is the null projection. Hence (8) is true. END

The following theorem shows that the projection of g_0 onto the set of all blocking facets σ_t always satisfies Propositions 1 and 2. First, let us simplify the notation: in the following we write σ for σ_t, and g_0↓σ for g_0↓N(τ_σ(1), …, τ_σ(k)), where k = |σ| is the number of elements in σ.
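Theorem 1 reduces the null projection to a Gram-Schmidt computation. A minimal sketch, assuming the usual Euclidean inner product:

```python
import numpy as np

def null_projection(g, taus, eps=1e-12):
    """g "down" N(tau_1..tau_k) via Equation (8): subtract the components
    of g along an orthonormal basis of span(tau_1..tau_k).
    taus: list of 1-D numpy arrays (the facet normals)."""
    onb = []
    for tau in taus:                        # Gram-Schmidt on the normals
        v = tau - sum((tau @ o) * o for o in onb)
        n = np.linalg.norm(v)
        if n > eps:                         # skip linearly dependent normals
            onb.append(v / n)
    return g - sum((g @ o) * o for o in onb)
```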
Theorem 2. g_0↓σ satisfies Propositions 1 and 2.

Proof. Since g_0↓σ lies in the null space N(τ_σ(1), …, τ_σ(k)), we have τ_j^T (g_0↓σ) = 0 for every α_j ∈ σ, so Proposition 2 holds with equality. According to (N.5), (g_0↓σ)^T g_0 ≥ 0, so it satisfies Proposition 1 too. END

As such, g_0↓σ, the projection of g_0 onto all the blocking facets, can be adopted as the next gradient descent vector g_t.

Selecting the Sliding Gradient
In this section, we explore other projection vectors that also satisfy Propositions 1 and 2. Let the j-th complementary blocking set σ_j^c be the blocking set σ excluding its j-th element, i.e., σ_j^c = σ \ {σ(j)}. The projection g_0↓σ_j^c satisfies τ_{σ(i)}^T (g_0↓σ_j^c) = 0 for every i ≠ j; if in addition τ_{σ(j)}^T (g_0↓σ_j^c) ≥ 0, it satisfies Proposition 2 and hence is a candidate for consideration. For all the candidates that satisfy this proposition, including g_0↓σ itself, we can compute the inner product of each candidate with the initial gradient descent vector g_0, i.e., (g_0↓σ_j^c)^T g_0, and select the maximum. This inner product is a measure of how close a candidate is to g_0, so taking the maximum yields the steepest descent gradient. Notice that if a particular g_0↓σ_j^c is selected as the next gradient descending vector, the corresponding α_j is no longer a blocking facet in computing g_0↓σ_j^c; thus α_j needs to be removed from σ_t to form the set of effective blocking facets σ_t*. The set of blocking facets for the next iteration, σ_{t+1}, is σ_t* plus the newly encountered blocking facet. In summary, the next gradient descent vector is

g_t = argmax { g^T g_0 : g ∈ {g_0↓σ} ∪ {g_0↓σ_j^c | τ_{σ(j)}^T (g_0↓σ_j^c) ≥ 0} },   (9)

and the effective blocking set is

σ_t* = σ_t \ {σ(j)} if g_t = g_0↓σ_j^c, and σ_t* = σ_t otherwise.   (10)

At first sight, this seems to increase the computation load substantially. However, we now show that once g_0↓σ is computed, each g_0↓σ_j^c can be obtained efficiently.
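Putting the pieces together, the selection rule (9) can be sketched as follows, reusing `null_projection` and `is_valid_direction` from the earlier sketches (the return convention is our own):

```python
import numpy as np

def select_gradient(g0, taus):
    """Rule (9): among g0 projected onto the full blocking set and onto
    each complementary set sigma_j^c, keep the valid candidate with the
    largest inner product with g0. taus: list of blocking-facet normals.
    Returns the direction and the indices of the effective set, rule (10)."""
    cands = [(null_projection(g0, taus), None)]    # g0 down sigma (Theorem 2)
    for j in range(len(taus)):
        gj = null_projection(g0, taus[:j] + taus[j+1:])   # drop facet j
        if is_valid_direction(gj, g0, taus):       # Propositions 1 and 2
            cands.append((gj, j))
    g, j = max(cands, key=lambda c: c[0] @ g0)     # steepest descent candidate
    sigma_eff = [i for i in range(len(taus)) if i != j]   # (10)
    return g, sigma_eff
```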

Computing the Gradient Projection Vectors
This section discusses an efficient method of computing g_0↓σ_j^c. An orthonormal basis {o_1, …, o_k} of V(τ_σ(1), …, τ_σ(k)) can be obtained from the Gram-Schmidt procedure: introducing the notation a↓b for the projection of vector a onto vector b, each o_i is formed by subtracting from τ_σ(i) its projections onto o_1, …, o_{i−1} and normalizing the remainder. Thus, from (8), after evaluating g↓σ we can find g↓σ_j^c by a similar expansion, which splits into two summations. The first summation consists of projections of g onto the existing orthonormal basis vectors o_i; each of its terms has already been computed and hence is readily available. The second summation, however, consists of projections onto new basis vectors o_i^(j), each of which must be recomputed because the facet α_j is skipped in σ_j^c.

To compute the o_i^(j), some of the intermediate results obtained while constructing the original orthonormal basis can be reused. Since the Gram-Schmidt procedure is sequential, the basis vectors o_1, …, o_{j−1} are unaffected when τ_σ(j) is removed, so o_i^(j) = o_i for i < j, and only the vectors from position j onward need to be recomputed. By reusing these intermediate results, the computation load can be reduced substantially.
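Because Gram-Schmidt processes the normals sequentially, dropping the j-th normal leaves the first j − 1 basis vectors untouched, so only the tail has to be rebuilt. A sketch of this reuse (assuming the normals are linearly independent, so that `onb` has one entry per facet):

```python
import numpy as np

def complementary_projection(g, taus, j, onb, eps=1e-12):
    """g down sigma_j^c, reusing the orthonormal basis `onb` already built
    for the full list `taus`. o_1..o_{j-1} are reused as-is; the basis
    vectors from position j+1 onward are recomputed with tau_j skipped."""
    kept = list(onb[:j])                   # unaffected by dropping tau_j
    for tau in taus[j+1:]:                 # rebuild only the tail
        v = tau - sum((tau @ o) * o for o in kept)
        n = np.linalg.norm(v)
        if n > eps:
            kept.append(v / n)
    return g - sum((g @ o) * o for o in kept)
```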

Termination Criterion
When a new blocking facet is encountered, it is added to the existing set of blocking facets. Hence both σ_t and σ_t* typically grow at each iteration, unless one of the g_0↓σ_j^c is selected as the next gradient descent vector, in which case the corresponding facet is deleted from σ_t* according to (10). The following theorem, first presented in [1], shows that the algorithm can stop once the number of effective blocking facets reaches m.

Theorem 3 (Stopping criterion). Assume that the dual feasible region D is non-empty, let P_t ∈ D be a point descending along the initial direction g_0 = −b, and let |σ_t*| be the number of effective blocking facets in σ_t* at the t-th iteration. If |σ_t*| = m, then P_t is a lowest point of the dual feasible region D.

Proof. When |σ_t*| = m, the m facets of σ_t* form a cone C whose vertex V is P_t. Since the rank of the facet normals is m, the corresponding null space contains only the zero vector, so g_0↓σ_t* = 0. Recalling the facet/edge duality of Section 2, for j = 1, …, m the edge line R_j^+ is the intersection of all C-facets except α_j; that is, R_j^+ lies in the null space of the complementary set σ_j^c. Since an edge line is 1-dimensional, the projection of g_0 onto it equals ±r_j, and hence g_0↓σ_j^c is a multiple of r_j. If g_0^T r_j > 0 for some j, that candidate would have been accepted and |σ_t*| would be less than m; this contradiction shows that g_0^T r_j ≤ 0 for every j, i.e., every edge r_j points in a direction opposite to g_0. As this is true for all edges, there is no path for g_t to descend further from this vertex, and V is the lowest point of C when viewed in the b direction.

Since P_t is dual feasible, V is a vertex of D, and the cone C coincides with the dual feasible region D in a neighborhood of V, it follows that P_t is the lowest point of D when viewed in the b direction. END

In essence, when the optimal vertex V* is reached, all the edges of the cone point in directions opposite to the gradient vector g_0 = −b. There is no path to descend further, so the algorithm terminates.

The Pseudo Code of the Sliding Gradient Algorithm
The entire algorithm is summarized as follows in Table 2.
Step 0 is the initialization step that sets up the tableau and the starting point P.
Step 2 finds the set of initial blocking facets σ in preparation for step 4. In the inner loop, step 4 calls the Gradient Select routine: it computes g_0↓σ and the complementary projections g_0↓σ_j^c using Equations (11) to (21), and selects the best gradient vector g according to (9). This routine returns not only g but also the effective blocking facets σ* and the projections g_0↓σ_j^c for subsequent use. Theorem 3 states that when the size of σ* reaches m, the optimal point has been reached; when it does, step 5 returns the optimal point and the optimal value to the calling routine.
Step 6 finds the closest blocking facet according to (6). Because P lies on every facet of σ, the corresponding displacements are zero; hence we only need to compute those t_j with j ∉ σ. The newly found blocking facet is then included in σ in step 7, and the inner loop is repeated until the optimal vertex is found.

Table 2. The sliding gradient algorithm (steps 0-7 as described above). Input: A, b and c, and an initial point P inside the dual feasible region D. Output: the optimal point and the optimal value.

Experiment on the Klee-Minty Problem
We use the Klee-Minty example presented in [18] to walk through the algorithm in this section (other variants of the Klee-Minty formulation have also been tested, with the same results). For the standard simplex method, solving this problem requires visiting all 2^{m−1} vertices. Here we show that, with a specific choice of initial point P_0, the Sliding Gradient algorithm can find the optimal solution in two iterations, no matter what the dimension m is.
To apply the Sliding Gradient algorithm, we first construct the facet tableau. For the example with m = 5, notice that α_5 and α_10 have the same normal vector (i.e., τ_5 = τ_10), so we can ignore α_10 in what follows; this is true for all values of m.
If we choose P_0 = M b, where M is a positive number chosen sufficiently large, it can be shown that P_0 is inside the dual feasible region. The initial gradient descent vector is g_0 = −b.
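For concreteness, the sketch below builds one common textbook variant of the Klee-Minty cube (the exact coefficients of the variant in [18] may differ) and forms the initial conditions P_0 = M b and g_0 = −b; the value M = 10^6 is an arbitrary illustrative choice:

```python
import numpy as np

def klee_minty(m):
    """One widely used Klee-Minty variant: max c^T x s.t. Ax <= b, x >= 0,
    with A[i,j] = 2*10^(i-j) for j < i, A[i,i] = 1, b[i] = 100^i,
    c[j] = 10^(m-1-j) (0-indexed)."""
    A = np.zeros((m, m)); b = np.zeros(m); c = np.zeros(m)
    for i in range(m):
        A[i, i] = 1.0
        A[i, :i] = 2.0 * 10.0 ** np.arange(i, 0, -1)
        b[i] = 100.0 ** i
        c[i] = 10.0 ** (m - 1 - i)
    return A, b, c

A, b, c = klee_minty(5)
P0 = 1e6 * b     # initial interior point P0 = M*b; M = 1e6 is illustrative
g0 = -b          # initial gradient (gravity) vector
```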
With P_0 and g_0 as initial conditions, the algorithm proceeds to find the first blocking facet using (6). The displacement t_j for each facet is given by

t_j = (c_j − τ_j^T P_0) / (τ_j^T g_0) = M − c_j / (τ_j^T b).   (22)

We now show that the minimum of all displacements is t_m. The second term of Equation (22) has denominator τ_j^T b, an inner product of strictly positive quantities, since all the elements of the b vector and of the normal vectors τ_j are positive; a comparison of the second terms then shows that t_m is the smallest displacement. For the case m = 5, the values are shown in the first row (first iteration) of Table 3. Thus α_m is the closest blocking facet, and the algorithm moves to P_1 = P_0 + t_m g_0 with blocking set σ_1 = {α_m}.

The normal vector τ_m is a unit vector with only one non-zero entry, at the m-th element, and is therefore already an orthonormal basis; by (8),

g_1 = g_0 − (g_0^T τ_m) τ_m.

In other words, g_1 is the same as −b except that the last element is zeroed out. Using P_1 and g_1, the algorithm proceeds to the next iteration and evaluates the displacements t_j again. The facets α_{m+1} to α_{2m−1} all yield the same displacement, while for 1 ≤ j < m the displacements are strictly larger. Hence the next point P_2 reaches a vertex of a cone and, according to Theorem 3, the algorithm stops; the optimal value is the last element of the b vector. Thus, with the specific choice of initial point P_0 = M b, the Sliding Gradient algorithm solves the Klee-Minty LP problem in two iterations, independent of m.

Issues in Algorithm Implementation
The Sliding Gradient algorithm has been implemented in MATLAB and tested on Klee-Minty problems as well as on self-generated LP problems with random coefficients. Since a real number can only be represented with finite precision on a digital computer, care must be taken to deal with round-off issues. For example, when a point P lies on a plane τ^T y = c, the residual τ^T P − c should be exactly zero; in an actual implementation, however, it may be a very small positive or negative number. Hence in step 2 of the aforementioned algorithm, we need to set a threshold δ so that if |d| < δ, we regard point P as lying on the plane. Likewise for the Klee-Minty problem, the algorithm relies on the fact that in the second iteration, the displacement values t_i for i = m + 1 to 2m − 1 should all be equal, and they should all be smaller than the values t_j for j = 1 to m − 1. Due to round-off errors, we need to set a tolerance level to treat the first group as equal; yet if this tolerance is set too high, it cannot exclude members of the second group. The issue becomes more acute as m increases: higher and higher precision is required in setting the tolerance to distinguish the two groups.
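In code, the on-facet test therefore compares the residual against the threshold δ rather than against zero; a minimal sketch:

```python
import numpy as np

def on_facet(P, tau, c, delta=1e-9):
    """Treat P as lying on the facet tau^T y = c when the residual is
    within delta; exact zero is unattainable in floating point."""
    return abs(tau @ P - c) < delta
```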

Conclusions and Future Work
We have presented a new approach to tackling the linear programming problem in this paper. It is based on the gradient descent principle: starting from any initial point inside the feasible region, the path passes through the interior of the feasible region to reach the optimal vertex. This is made possible by projecting the gravity vector onto a set of blocking facets and using that projection as the descending vector in each iteration.
In fact, the descending trajectory is a sequence of line segments that hug either a single blocking facet or an intersection of several, and each line segment advances towards the optimal point. It should be noted that this algorithm has no parameters (such as a step size) to tune, although one needs to take care of numerical round-off issues in an actual implementation.
This work opens up many areas of future research. On the practical side, we are extending the algorithm to relax the requirement of starting from a point inside the feasible region. Promising progress has been made in this area, though more thorough testing on difficult cases is needed.
On the theoretical front, we are encouraged that, in the walk-through on the Klee-Minty example, the algorithm exhibits strongly polynomial complexity characteristics: its complexity does not appear to depend on the bit size of the LP coefficients. However, a rigorous proof is needed, and we are working towards this goal.