
The existence of a strongly polynomial algorithm for linear programming (LP) has been widely sought after for decades. Recently, a new approach called the Gravity Sliding algorithm [1] has emerged. It is a gradient-descent method whereby the descending trajectory slides along the inner surfaces of a polyhedron until it reaches the optimal point. In ℝ^3, a water droplet pulled by gravitational force traces the shortest path to descend to the lowest point. As the Gravity Sliding algorithm emulates the water droplet trajectory, it exhibits strongly polynomial behavior in ℝ^3. We believe that it could be a strongly polynomial algorithm for linear programming in ℝ^n as well. In fact, our algorithm can solve the Klee-Minty deformed cube problem in only two iterations, irrespective of the dimension of the cube. The core of the Gravity Sliding algorithm is the calculation of the projection of the gravity vector **g** onto the intersection of a group of facets, which is disclosed in the same paper [1]. In this paper, we introduce a more efficient method to compute the gradient projections on complementary facets, and we rename the method the Sliding Gradient algorithm to reflect the new projection calculation.

The simplex method developed by Dantzig [

Is it possible to develop a strongly polynomial algorithm to solve the linear programming problem, where the number of iterations is a polynomial function of only the number of constraints and the number of variables? The work by Barasz and Vempala sheds some light on this question. Their AFFINE algorithm [

In certain aspects, the Gravity Sliding algorithm [

This paper is organized as follows: Section 2 gives an overview of the Cone-Cutting Theory [

The cone-cutting theory [

In an m-dimensional space ℝ^m, a hyperplane y^T τ = c cuts ℝ^m into two half-spaces. Here τ is the normal vector of the hyperplane and c is a constant. We call the positive half-space { y | y^T τ ≥ c } the accepted zone of the hyperplane and the negative half-space { y | y^T τ < c } the rejected zone. Note that the normal vector τ points into the accepted zone, and we call a hyperplane with such an orientation a facet α: (τ, c). When there are m facets in ℝ^m and { τ_1, τ_2, ⋯, τ_m } are linearly independent, this set of linear equations has a unique solution, which is a point V in ℝ^m. Geometrically, { α_1, α_2, ⋯, α_m } form a cone and V is the vertex of the cone. We now give a formal definition of a cone, which is taken from [

Definition 1. Given m hyperplanes in ℝ^m with rank r(α_1, ⋯, α_m) = m and intersection V, C = C(V; α_1, ⋯, α_m) = α_1 ∩ ⋯ ∩ α_m is called a cone in ℝ^m. The area { y | y^T τ_i ≥ c_i (i = 1, 2, ⋯, m) } is called the accepted zone of C. The point V is the vertex and α_j is a facet plane, or simply a facet, of C.

A cone C also has m edge lines. They are formed by the intersection of (m − 1) facets. Hence, a cone can also be defined as follows.

Definition 2. Given m rays R_j = { V + t r_j | 0 ≤ t < +∞ } (j = 1, ⋯, m) shooting from a point V with rank r(r_1, ⋯, r_m) = m, C = C(V; r_1, ⋯, r_m) = c[R_1, ⋯, R_m], the convex closure of the m rays, is called a cone in ℝ^m. R_j is an edge, r_j the edge direction, and R_j^+ = { V + t r_j | −∞ < t < +∞ } the edge line of the cone C.

The two definitions are equivalent. Furthermore, P.Z. Wang [

It is obvious that r_j^T τ_i = 0 (for i ≠ j) since r_j lies on α_i. Moreover, we have

r_i^T τ_i ≥ 0 (for i = 1, ⋯, m) (1)

Consider a linear programming (LP) problem and its dual:

(Primal): max { c̃^T x | A x ≤ b; x ≥ 0 } (2)

(Dual): min { y^T b | y^T A ≥ c̃^T; y ≥ 0 } (3)

In the following, we focus on solving the dual LP problem. The standard simplex tableau can be obtained by appending an m × m identity matrix I_{m×m}, which represents the slack variables, as shown below:

[ a_11 ⋯ a_1n | 1 ⋯ 0 | b_1 ]
[  ⋮   ⋱  ⋮  | ⋮ ⋱ ⋮ |  ⋮  ]
[ a_m1 ⋯ a_mn | 0 ⋯ 1 | b_m ]
[ c̃_1  ⋯ c̃_n | 0 ⋯ 0 |  0  ]

We can construct a facet tableau whereby each column is a facet denoted as α_i: (τ_i, c_i), where τ_i = (a_{1i}, a_{2i}, ⋯, a_{mi})^T for 1 ≤ i ≤ n (the identity columns supply τ_i for n < i ≤ m + n) and

c_i = { c̃_i for 1 ≤ i ≤ n
      { 0   for n < i ≤ m + n (4)

The facet tableau is depicted as follows. The last column (b_1, b_2, ⋯, b_m, 0)^T is not represented in this tableau.

  α_1  α_2  ⋯  α_{m+n}
[ τ_1  τ_2  ⋯  τ_{m+n} ]
[ c_1  c_2  ⋯  c_{m+n} ]

When a cone C = C(V; α_1, ⋯, α_m) = C(V; r_1, ⋯, r_m) is intersected by another facet α_j, the i-th edge of the cone is intersected by α_j at a certain point q_{ij}. We say that α_j cuts the cone C, and the cut points q_{ij} can be obtained by the following equation:

q_{ij} = V + t_i r_i, where t_i = (c_j − V^T τ_j) / (r_i^T τ_j) (5)

The intersection is called real if t_i ≥ 0 and fictitious if t_i < 0.
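As a concrete illustration, the cut-point computation in (5) is a few lines of vectorized code. The sketch below is ours (Python/NumPy; the function name and array layout are illustrative assumptions, not from the original paper):

```python
import numpy as np

def cut_points(V, R, tau_j, c_j):
    """Cut a cone by facet (tau_j, c_j) per Equation (5).

    V: (m,) cone vertex; R: (m, m) edge directions r_1..r_m as columns;
    tau_j: (m,) facet normal; c_j: facet constant.
    Returns the displacements t_i and cut points q_ij (columns of Q).
    """
    t = (c_j - V @ tau_j) / (R.T @ tau_j)  # t_i = (c_j - V^T tau_j) / (r_i^T tau_j)
    Q = V[:, None] + R * t                 # q_ij = V + t_i r_i, one column per edge
    return t, Q                            # t_i >= 0: real cut; t_i < 0: fictitious
```

Edges with r_i^T τ_j = 0 are parallel to the cutting facet and would need to be excluded in a robust implementation.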

Cone cutting greatly alters the accepted zone, as can be seen from the simple 2-dimensional example shown in Figures 1(a)-(e). In two dimensions, a facet α: (τ, c) is a line. The normal vector τ is perpendicular to this line and points into the accepted zone of the facet. Furthermore, a cone in two dimensions is formed by two non-parallel facets α_1 and α_2; this is represented by the shaded area A in Figure 1(a). In Figure 1(b), a facet α_3 intersects the cone at two cut points q_13 and q_23. They are both real cut points. Since the arrow of the normal vector τ_3 points in the same general direction as the cone, V lies in the rejected zone of α_3 and we say α_3 rejects V. Moreover, the accepted zone of α_3 intersects the accepted zone of the cone, so the overall accepted zone is reduced to the shaded area marked B. In Figure 1(c), α_3 accepts V and the overall accepted zone is confined to the area marked C. As the dual feasible region 𝒟 of an LP problem must satisfy all the constraints, it must lie within area C. In Figure 1(d), α_3 cuts the cone at two fictitious points. Since τ_3 points in the same direction as the cone, V is accepted by α_3. However, the accepted zone of α_3 covers that of the cone. As a result, α_3 does not contribute any reduction of the overall accepted zone, so it can be removed from further consideration without affecting the LP solution. In Figure 1(e), the intersection of the accepted zone of α_3 and that of the cone is an empty set. This means that the dual feasible region 𝒟 is empty and the LP is infeasible. This is in fact one of the criteria that can be used for detecting infeasibility.

Based on this cone-cutting idea, P.Z. Wang proposed an algorithm that starts from an initial cone (the coordinate cone) with vertex V_0, then finds a facet to replace one of the existing facets α_out, thus forming a new cone. This process is repeated until an optimal point is found. The cone-cutting algorithm is summarized in the table below.

This algorithm finds the facet that rejects V the least and uses it as the cutting facet in steps 2 and 3. This facet cuts the edges of the cone at m points. In steps 4 and 5, the real cut point q_{I*} that is closest to the vertex V is identified. This becomes the

steps | Input: A, b and c̃
---|---
 | Output: either V as the optimal point and V^T b as the optimal value, or a declaration that the LP problem is infeasible.
0 | V = V_0 = (0, 0, ⋯, 0)^T % the coordinate cone; r_1 = (1, 0, ⋯, 0)^T, r_2 = (0, 1, ⋯, 0)^T, ⋯, r_m = (0, 0, ⋯, 1)^T; Δ = {n+1, ⋯, n+m} % facets of the coordinate cone
1 | while (true)
2 | % Among all facets not in C that reject V, find the one that rejects V the least. Δ̄ = {1 : (m+n)} \ Δ; % Δ̄ are the facets not in C. j* = arg min_j { e_j | e_j = V^T τ_{Δ̄(j)} − c_{Δ̄(j)}, j ∈ Δ̄ }
3 | if e_{j*} ≥ 0, return (V, V^T b) as the optimal vertex and optimal value; else J* = Δ̄(j*) % J* is the facet index to enter
4 | t_i = (c_{J*} − V^T τ_{J*}) / (r_i^T τ_{J*}), i = 1, ⋯, m; q_i = V + t_i r_i
5 | if (t_i < 0 ∀ i = 1, ⋯, m) return "LP is infeasible"; else % among all the real cuts q_i, find the q_{I*} closest to V: I* = arg min_i { q_i^T b | t_i > 0 } % I* is the facet index to leave
6 | % Form the new cone C_{k+1} by updating the vertex, edge vectors and facets: V_{k+1} = q_{I*}; r_i = r_i if i = I*, r_i = sign(t_i)(q_i − V_{k+1}) if i ≠ I*; Δ(I*) = J*
7 | end

vertex of a new cone. This new cone retains all the facets of the original cone except that the cutting facet replaces the facet corresponding to the edge I*. The edge I* itself is retained, but the rest of the edges must be recomputed as shown in step 6. Remarkably, P.Z. Wang shows that when b > 0, this algorithm produces exactly the same result as the original simplex algorithm proposed by Dantzig [
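For readers who prefer code, the following is our Python/NumPy rendering of the cone-cutting table above. It is a sketch under the stated assumptions (dense data, non-degenerate cuts), not the authors' reference implementation:

```python
import numpy as np

def cone_cutting(A, b, c_tilde, max_iter=1000):
    """Sketch of the cone-cutting algorithm (our rendering of the table above).

    Dual problem: min { y^T b | y^T A >= c_tilde, y >= 0 }.  Facet j has
    normal tau[:, j] and constant c[j]; starts from the coordinate cone.
    Assumes non-degenerate data (no zero denominators in step 4).
    """
    m, n = A.shape
    tau = np.hstack([A, np.eye(m)])            # facet normals, slacks appended
    c = np.concatenate([c_tilde, np.zeros(m)])
    V, R = np.zeros(m), np.eye(m)              # step 0: vertex and edge columns
    delta = list(range(n, n + m))              # facet indices of the current cone
    for _ in range(max_iter):                  # step 1
        outside = [j for j in range(n + m) if j not in delta]
        e = [V @ tau[:, j] - c[j] for j in outside]     # step 2
        if min(e) >= 0:                                 # step 3: no facet rejects V
            return V, V @ b
        J = outside[int(np.argmin(e))]                  # facet index to enter
        t = (c[J] - V @ tau[:, J]) / (R.T @ tau[:, J])  # step 4: cut displacements
        if np.all(t < 0):                               # step 5: all cuts fictitious
            raise ValueError("LP is infeasible")
        Q = V[:, None] + R * t                          # cut points, one per edge
        real = [i for i in range(m) if t[i] > 0]
        I = min(real, key=lambda i: Q[:, i] @ b)        # real cut closest w.r.t. b
        V_new = Q[:, I]
        for i in range(m):                              # step 6: rebuild edges
            if i != I:
                R[:, i] = np.sign(t[i]) * (Q[:, i] - V_new)
        V = V_new
        delta[I] = J                                    # cutting facet replaces facet I
    raise RuntimeError("iteration limit reached")
```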

Expanding on the cone-cutting theory, the Gravity Sliding Algorithm [

The feasible region 𝒟 is a convex polyhedron formed by the constraints (y^T τ_j ≥ c_j; 1 ≤ j ≤ n + m), and the optimal feasible point is at one of its vertices. Let Ω = { V_i | V_i^T τ_j ≥ c_j; j = 1, ⋯, n + m } be the set of feasible vertices. The dual LP problem (3) can then be stated as min { V_i^T b | V_i ∈ Ω }. As V_i^T b is the inner product of vertex V_i and b, the optimal vertex V* is the vertex that yields the lowest inner-product value. Thus we can set the principal descending direction g_0 to be the opposite of the b vector (i.e. g_0 = −b); this is referred to as the gravity vector. The descending path then descends along this principal direction inside 𝒟 until it reaches the lowest point in 𝒟 viewed along the direction of b. This point is then the optimal vertex V*.

The basic principle of the new algorithm is illustrated in Figure 2. The starting point P_0 is inside 𝒟. From P_0, the path attempts to descend along the direction g_0 = −b. It can go as far as P_1, which is the point of intersection between the ray R = P_0 + t g_0 and the facet α_1. In essence, α_1 is blocking this ray and hence is called the blocking facet relative to this ray. In order not to penetrate 𝒟, the descending direction needs to change from g_0 to g_1 at P_1, and the path slides along g_1 until it hits the next blocking facet α_2 at P_2. Then it needs to change course again and slides along the direction g_2 until it hits P_3. In this figure, P_3 is the lowest point in the dual feasible region 𝒟 and hence it is the optimal point V*.

It can be observed from Figure 2 that g_1 is the projection of g_0 onto α_1 and g_2 is the projection of g_0 onto α_2. Thus from P_1, the descending path slides along α_1 to reach P_2 and then slides along α_2 to reach P_3. Hence we call this algorithm the Sliding Gradient algorithm. The basic idea is to compute a new descending direction to circumvent the blocking facets, and to advance to the next one until the path reaches the bottom vertex viewed along the direction of b.

Let σ_t denote the set of blocking facets at the t-th iteration. From an initial point P_0 and a gradient descent vector g_0, the algorithm iteratively performs the following steps:

1) Compute a gradient direction g_t based on σ_t. In this example, the initial set of blocking facets σ_0 is empty and g_0 = −b.

2) Move P_t to P_{t+1} along g_t, where P_{t+1} is a point on the first blocking facet.

3) Incorporate the newly encountered blocking facet into σ_t to form σ_{t+1}.

4) Go back to step 1.

The algorithm stops when it cannot find any direction to descend in step 1). This is discussed in detail in Section 3.6, where a formal stopping criterion is given.
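In outline form, the loop looks as follows. This is a structural sketch only; `project_onto_blockers` and `first_blocking_facet` are hypothetical stand-ins for the routines developed in the rest of this section:

```python
def sliding_gradient_outline(P, g0, facets):
    """Steps 1)-4) above as a loop skeleton (illustrative only)."""
    sigma = set()                                     # sigma_0 is empty
    while True:
        g = project_onto_blockers(g0, sigma, facets)  # step 1: g_t from sigma_t
        if g is None:                                 # no direction left to descend
            return P                                  # (stopping criterion, Sec. 3.6)
        P, j = first_blocking_facet(P, g, facets)     # step 2: advance to the blocker
        sigma.add(j)                                  # step 3: form sigma_{t+1}
```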

For the first step, the gradient descent vector g_t needs to satisfy the following requirements.

Proposition 1. g_t must satisfy (g_t)^T g_0 ≥ 0 so that the dual objective function y^T b is non-increasing when y moves from P_t to P_{t+1} along the direction of g_t.

Proof. Since (g_t)^T g_0 ≥ 0, g_t aligns with the principal direction g_0. As P_{t+1} = P_t + t g_t, P_{t+1} moves along the principal direction of g_0 when t > 0.

Since P_{t+1}^T b = P_t^T b + t (g_t)^T b = P_t^T b − t (g_t)^T g_0, we have P_{t+1}^T b ≤ P_t^T b when (g_t)^T g_0 ≥ 0. END

This means that if (g_t)^T g_0 ≥ 0, then P_{t+1} is "lower than" P_t when viewed along the b direction.

Proposition 2. If P_0 ∈ 𝒟, g_t must satisfy (τ_{σ_t(j)})^T g_t ≥ 0 for all j ∈ σ_t to ensure that P_{t+1} remains dual feasible (i.e. P_{t+1} ∈ 𝒟).

Proof. If (τ_{σ_t(j)})^T g_t < 0 for some j, then g_t points against the normal vector of facet α_{σ_t(j)}, so a ray Q = P_t + t g_t will eventually penetrate this facet for some positive value of t. This means that Q will be rejected by α_{σ_t(j)} and hence Q is no longer a dual feasible point. END

To ensure that P t + 1 ∈ D , we need to make sure that it won’t advance too far. The following proposition stipulates the requirement.

Proposition 3. Assume that 𝒟 is non-empty and P_0 ∈ 𝒟. If g_t satisfies Propositions 1 and 2, and not all g_t^T τ_j = 0 for j = 1, ⋯, m + n, then P_{t+1} ∈ 𝒟, provided that the next point P_{t+1} is determined according to (6) below:

P_{t+1} = P_t + t_{j*} g_t (6)

where j* = arg min_j { t_j | t_j = (c_j − P_t^T τ_j) / (g_t^T τ_j); t_j > 0; j = 1, ⋯, m + n }.

Proof. The equation of a line passing through P along the direction g is P + t g. If this line is not parallel to a facet α: (τ, c) (i.e. g^T τ ≠ 0), it intersects the facet at a point Q according to the following equation:

Q = P + t g, where t = (c − P^T τ) / (g^T τ). (7)

We call t the displacement from P to Q. So t_j = (c_j − P_t^T τ_j) / (g_t^T τ_j) is the displacement from P_t to α_j. The condition t_j > 0 ensures that P_{t+1} moves along the direction g_t and not in the opposite direction. t_{j*} is the smallest of all the displacements, so α_{j*} is the first blocking facet, the one closest to P_t.

To show that P_{t+1} ∈ 𝒟, we need to show P_{t+1}^T τ_j − c_j ≥ 0 for j = 1, ⋯, m + n. Note that

P_{t+1}^T τ_j − c_j = (P_t + t_{j*} g_t)^T τ_j − c_j = (P_t^T τ_j − c_j) + t_{j*} g_t^T τ_j.

Since P_t ∈ 𝒟, (P_t^T τ_j − c_j) ≥ 0 for j = 1, ⋯, m + n, so it suffices to show that the second term t_{j*} g_t^T τ_j never drives the expression negative. (For facets parallel to g_t, the second term is zero and the inequality holds trivially; if t_j = 0, then c_j − P_t^T τ_j = 0, the facet is a blocking facet, and Proposition 2 gives g_t^T τ_j ≥ 0.)

The remaining displacements t_j can be split into two groups. For those displacements where t_j < 0, we have (c_j − P_t^T τ_j) / (g_t^T τ_j) = t_j < 0, so g_t^T τ_j = (c_j − P_t^T τ_j) / t_j = −(c_j − P_t^T τ_j) / k_1, where k_1 = −t_j is a positive constant. Since t_{j*} > 0,

t_{j*} g_t^T τ_j = −(t_{j*} / k_1)(c_j − P_t^T τ_j) = k_2 (P_t^T τ_j − c_j) ≥ 0, since P_t ∈ 𝒟 and k_2 = t_{j*} / k_1 > 0.

For those displacements where t_j > 0, t_{j*} is the minimum of all t_j in this group. Let k_3 = t_{j*} / t_j be the ratio between t_{j*} and t_j; obviously k_3 ≤ 1. Then

(P_t^T τ_j − c_j) + t_{j*} g_t^T τ_j = (P_t^T τ_j − c_j) + k_3 t_j g_t^T τ_j = (P_t^T τ_j − c_j) + k_3 (c_j − P_t^T τ_j) = (1 − k_3)(P_t^T τ_j − c_j) ≥ 0.

So P_{t+1} ∈ 𝒟. END

If g_t^T τ_j = 0, then g_t is parallel to α_j. Unless all facets are parallel to g_t, Proposition 3 can still find the next descent point P_{t+1}. If all facets are parallel to g_t, the facets are linearly dependent and the LP problem is not well formulated.

We now show that the projection of g_0 onto the set of blocking facets σ_t satisfies the requirements of Propositions 1 and 2. Before we do so, we first discuss projection operations in subspaces.

Projection is a basic concept defined on a vector space. Since we are only interested in the gradient descent direction g_t and not the actual location of the projection, we can ignore the constant c in the hyperplane { y | y^T τ = c }. In other words, we focus on the subspace V(τ) spanned by τ and its null space N(τ), rather than the affine space spanned by the hyperplanes.

Let Y be the vector space ℝ^m, V(τ) = { y | y = t τ; t ∈ ℝ }, and let its corresponding null space be N(τ) = { x | x^T y = 0 for x ∈ Y and y ∈ V(τ) }. Extending to k hyperplanes, we have V(τ_1, τ_2, ⋯, τ_k) = { y | y = ∑_{j=1}^k t_j τ_j; t_j ∈ ℝ } and the null space N(τ_1, τ_2, ⋯, τ_k) = { x | x^T y = 0 for x ∈ Y and y ∈ V(τ_1, τ_2, ⋯, τ_k) }. It can be shown that N(τ_1, τ_2, ⋯, τ_k) = N(τ_1) ∩ N(τ_2) ∩ ⋯ ∩ N(τ_k). Since V(τ_1, τ_2, ⋯, τ_k) and N(τ_1, τ_2, ⋯, τ_k) form an orthogonal decomposition of the whole space Y, a vector g in ℝ^m can be decomposed into two components: the projection of g onto V(τ_1, τ_2, ⋯, τ_k) and the projection of g onto N(τ_1, τ_2, ⋯, τ_k). We use the notations g ↓ [τ_1, τ_2, ⋯, τ_k] and g ↓ α_1 ∩ ⋯ ∩ α_k to denote them; they are called the direct projection and the null projection respectively.

The following definition and theorem were first presented in [

Let 𝒩 be the set of all subspaces of Y = ℝ^m, and let 𝒪 stand for the 0-dimensional subspace. We now give an axiomatic definition of projection.

Definition 3. The projection defined on a vector space Y is a mapping

∗ ↓ # : Y × 𝒩 → Y

where ∗ is a vector in Y and # is a subspace X in 𝒩, satisfying:

(N.1) (Reflectivity).

For any g ∈ Y , g ↓ Y = g ;

(N.2) (Orthogonal additivity).

For any g ∈ Y and subspaces X, Z ∈ 𝒩, if X and Z are orthogonal to each other, then g ↓ X + g ↓ Z = g ↓ (X + Z), where X + Z is the direct sum of X and Z.

(N.3) (Transitivity).

For any g ∈ Y and subspaces X, Z ∈ 𝒩, (g ↓ X) ↓ X ∩ Z = g ↓ X ∩ Z.

(N.4) (Attribution).

For any g ∈ Y and subspace X ∈ 𝒩, g ↓ X ∈ X. In particular:

(N.5) For any g ∈ Y and subspace X ∈ 𝒩, g^T (g ↓ X) ≥ 0.

A conventional approach to finding g ↓ α_1 ∩ ⋯ ∩ α_k is to compute it directly from the null space N(τ_1, τ_2, ⋯, τ_k). We now show another approach that is more suitable for our overall algorithm.

Theorem 1. For any g ∈ Y, we have

g ↓ α_1 ∩ ⋯ ∩ α_k = g − ∑_{i=1}^k (g^T o_i) o_i (8)

where { o_1, ⋯, o_k } is an orthonormal basis of the subspace V(τ_1, τ_2, ⋯, τ_k).

Proof. Since α_1 ∩ ⋯ ∩ α_k and [τ_1, τ_2, ⋯, τ_k] form an orthogonal decomposition of Y, according to (N.2) and (N.1) we have

g = g ↓ [τ_1, ⋯, τ_k] + g ↓ α_1 ∩ ⋯ ∩ α_k.

According to (N.2) and (N.4), the first term becomes

g ↓ [τ_1, ⋯, τ_k] = (g^T o_1) o_1 + ⋯ + (g^T o_k) o_k.

Hence (8) is true. END
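In code, Theorem 1 amounts to orthonormalizing the facet normals and subtracting the direct projection. A minimal sketch (ours, using NumPy's QR factorization to produce an orthonormal basis o_i):

```python
import numpy as np

def null_projection(g, T):
    """g ↓ (alpha_1 ∩ ... ∩ alpha_k) via Equation (8).

    T: (m, k) matrix whose columns are tau_1..tau_k (assumed full column rank).
    """
    if T.shape[1] == 0:
        return g.copy()                  # no facets: the projection is g itself
    O, _ = np.linalg.qr(T)               # columns of O: orthonormal basis o_1..o_k
    return g - O @ (O.T @ g)             # g - sum_i (g^T o_i) o_i
```

QR is numerically more stable than classical Gram-Schmidt; Section 3.5 uses the Gram-Schmidt form instead because its intermediate terms can be reused across the complementary sets.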

The following theorem shows that the projection of g_0 onto the set of all blocking facets σ_t always satisfies Propositions 1 and 2. First, let us simplify the notation: in the following we use σ to represent σ_t, and g_0 ↓ σ to stand for g_0 ↓ α_{σ(1)} ∩ ⋯ ∩ α_{σ(k)}, where k = |σ| is the number of elements in σ.

Theorem 2. (τ_{σ(j)})^T (g_0 ↓ σ) = 0 for all j = 1, ⋯, k, and g_0^T (g_0 ↓ σ) ≥ 0.

Proof. Since g_0 ↓ σ lies on the intersection α_{σ(1)} ∩ ⋯ ∩ α_{σ(k)}, it lies on each facet α_{σ(j)} for j = 1, ⋯, k. Thus g_0 ↓ σ is perpendicular to the normal vector of α_{σ(j)} (i.e. (τ_{σ(j)})^T (g_0 ↓ σ) = 0), so it satisfies Proposition 2.

According to (N.5), g_0^T (g_0 ↓ σ) ≥ 0, so it satisfies Proposition 1 too. END

As such, g_0 ↓ σ, the projection of g_0 onto all the blocking facets, can be adopted as the next gradient descent vector g_t.

In this section, we explore other projection vectors which also satisfy Propositions 1 and 2. Let the j-th complement blocking set σ_j^c be the blocking set σ excluding its j-th element, so that the corresponding intersection is α_{σ(1)} ∩ ⋯ ∩ α_{σ(j−1)} ∩ α_{σ(j+1)} ∩ ⋯ ∩ α_{σ(k)}. We examine the projection g_0 ↓ σ_j^c for j = 1, ⋯, k. Obviously, g_0^T (g_0 ↓ σ_j^c) ≥ 0 according to (N.5), as g_0 ↓ σ_j^c is a projection of g_0. So if (τ_{σ(j)})^T (g_0 ↓ σ_j^c) ≥ 0 (the conditions for the other facets in σ_j^c hold automatically by orthogonality), it satisfies Proposition 2 and hence is a candidate for consideration. For all the candidates, including g_0 ↓ σ, which satisfy this proposition, we can compute the inner product of each candidate with the initial gradient descent vector g_0 (i.e. g_0^T (g_0 ↓ σ_j^c)) and select the maximum. This inner product is a measure of how close a candidate is to g_0, so taking the maximum means taking the steepest descent gradient. Notice that if a particular g_0 ↓ σ_j^c is selected as the next gradient descent vector, the corresponding α_{σ(j)} is no longer a blocking facet in computing g_0 ↓ σ_j^c. Thus α_{σ(j)} needs to be removed from σ_t to form the set of effective blocking facets σ_t*. The set of blocking facets for the next iteration, σ_{t+1}, is σ_t* plus the newly encountered blocking facet. In summary, the next gradient descent vector g_t is:

g_t = arg max_{g ∈ { g_0 ↓ σ, g[1], g[2], ⋯, g[k] }} g_0^T g (9)

where g[j] = g_0 ↓ σ_j^c for those j ∈ σ with (τ_{σ(j)})^T (g_0 ↓ σ_j^c) ≥ 0, and k = |σ|.

The effective blocking set σ_t* is

σ_t* = { σ_t                 if g_t = g_0 ↓ σ
       { σ_t \ {α_{σ(j)}}    if g_t = g_0 ↓ σ_j^c. (10)

At first, this seems to increase the computation load substantially. However, we now show that once g 0 ↓ σ is computed, g 0 ↓ σ j c can be obtained efficiently.
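Before turning to the efficient recursion, here is a direct sketch of the selection rule (9)-(10). For clarity it recomputes each complementary projection from scratch with the `null_projection` routine above; the recursion of Section 3.5 avoids that cost (variable names are ours):

```python
import numpy as np

def gradient_select(g0, sigma, tau):
    """Select the next descent vector per (9) and the effective set per (10).

    sigma: list of blocking-facet indices; tau: (m, m+n) facet normals as columns.
    """
    best = null_projection(g0, tau[:, sigma])          # g0 ↓ sigma
    best_ip, sigma_star = g0 @ best, list(sigma)
    for pos, j in enumerate(sigma):
        comp = sigma[:pos] + sigma[pos + 1:]           # complement set sigma_j^c
        gj = null_projection(g0, tau[:, comp])         # g0 ↓ sigma_j^c
        # candidate only if facet j still accepts the direction (Proposition 2)
        if tau[:, j] @ gj >= 0 and g0 @ gj > best_ip:
            best, best_ip, sigma_star = gj, g0 @ gj, comp
    return best, sigma_star
```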

This section discusses a method of computing g ↓ σ and g ↓ σ_j^c for any vector g. According to (8), g ↓ σ = g ↓ α_1 ∩ ⋯ ∩ α_k = g − ∑_{i=1}^k (g^T o_i) o_i. The orthonormal basis { o_1, o_2, ⋯, o_k } can be obtained from the Gram-Schmidt procedure as follows:

o_1 = τ_1; o_1 = o_1 / |o_1| (11)

o_2 = τ_2 − (τ_2^T o_1) o_1; o_2 = o_2 / |o_2| (12)

Let us introduce the notation a ↓ b to denote the projection of vector a onto vector b: a ↓ b = (a^T b / b^T b) b. Then o_2 = τ_2 − τ_2 ↓ o_1, as o_1^T o_1 = 1. Likewise,

o_j = τ_j − ∑_{i=1}^{j−1} τ_j ↓ o_i; o_j = o_j / |o_j|. (13)

Thus from (8),

g ↓ σ = g − ∑_{i=1}^k (g^T o_i) o_i = g − ∑_{i=1}^k g ↓ o_i. (14)

After evaluating g ↓ σ, we can find g ↓ σ_j^c backward from j = k to 1. Firstly,

g ↓ σ_k^c = g − ∑_{i=1}^{k−1} g ↓ o_i = g ↓ σ + g ↓ o_k. (15)

Likewise, it can be shown that

g ↓ σ_j^c = g − ∑_{i=1}^{j−1} g ↓ o_i − ∑_{i=j+1}^k g ↓ o_i^{(j)} for j = 1, ⋯, k − 1. (16)

The first summation consists of projections of g onto the existing orthonormal basis vectors o_i. Each term in this summation has already been computed and hence is readily available. However, the second summation consists of projections onto new basis vectors o_i^{(j)}. Each of these basis vectors must be re-computed because the facet α_j is skipped in σ_j^c. Let

T_k = g ↓ σ + g ↓ o_k; S_k = 0 (17)

T_j = T_{j+1} + g ↓ o_j; S_j = ∑_{i=j+1}^k g ↓ o_i^{(j)}. (18)

Then we can obtain g ↓ σ_j^c recursively for j = k, k − 1, ⋯, 1 by:

g ↓ σ_j^c = T_j − S_j. (19)

To compute o i ( j ) , some of the intermediate results in obtaining the orthonormal basis can also be reused.

Let μ_{j,0} = 0 for all j = 2, ⋯, k and μ_{j,i} = μ_{j,i−1} + τ_j ↓ o_i for i = 1, ⋯, j − 1. Then we have

o_j = τ_j − μ_{j,j−1}; o_j = o_j / |o_j| for j = 2, ⋯, k. (20)

The intermediate terms μ_{m,i} can be reused in computing o_m^{(j)} as follows:

o_m^{(j)} = τ_m − μ_{m,j−1} − ∑_{i=j+1}^{m−1} τ_m ↓ o_i^{(j)}; o_m^{(j)} = o_m^{(j)} / |o_m^{(j)}| for m = j + 1, ⋯, k. (21)

By using these intermediate results, the computation load can be reduced substantially.
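The recursion (15)-(21) can be sketched as follows. For readability this version re-derives the tail basis o_i^{(j)} directly instead of caching the μ_{j,i} sums of (20)-(21); the caching variant has the same structure (the sketch is ours, assuming linearly independent normals):

```python
import numpy as np

def vproj(a, b):
    """a ↓ b: projection of vector a onto vector b."""
    return (a @ b) / (b @ b) * b

def complementary_projections(g, taus):
    """Compute g ↓ sigma per (13)-(14) and every g ↓ sigma_j^c per (16).

    taus: list of the k facet normals tau_1..tau_k (linearly independent).
    """
    k = len(taus)
    O = []                                            # orthonormal basis o_1..o_k
    for t in taus:                                    # Gram-Schmidt, Equation (13)
        o = t - sum((vproj(t, oi) for oi in O), np.zeros_like(t))
        O.append(o / np.linalg.norm(o))
    g_sigma = g - sum((vproj(g, oi) for oi in O), np.zeros_like(g))   # (14)
    g_comp = []
    for j in range(k):
        # first sum in (16): projections onto o_1..o_{j-1} are reused as-is
        res = g - sum((vproj(g, oi) for oi in O[:j]), np.zeros_like(g))
        # second sum in (16): re-orthogonalize the tail with tau_j skipped
        Oj = list(O[:j])
        for t in taus[j + 1:]:
            o = t - sum((vproj(t, oi) for oi in Oj), np.zeros_like(t))
            Oj.append(o / np.linalg.norm(o))
            res = res - vproj(g, Oj[-1])
        g_comp.append(res)                            # g ↓ sigma_j^c, Equation (19)
    return g_sigma, g_comp
```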

When a new blocking facet is encountered, it is added to the existing set of blocking facets. Hence both σ_t and σ_t* will typically grow in each iteration, unless one of the g_0 ↓ σ_j^c is selected as g_t, in which case α_{σ(j)} is deleted from σ_t according to (10). The following theorem, which was first presented in [

Theorem 3 (Stopping criterion). Assume that the dual feasible region 𝒟 is non-empty, let P_t ∈ 𝒟 be a point descending along the initial direction g_0 = −b, and let |σ_t*| be the number of effective blocking facets in σ_t* at the t-th iteration. If |σ_t*| = m and the rank r(σ_t*) = m, then P_t is a lowest point in the dual feasible region 𝒟.

Proof. If |σ_t*| = m and the rank r(σ_t*) = m, then the m facets in σ_t* form a cone C with vertex V = P_t. Since the rank is m, the corresponding null space contains only the zero vector, so g_0 ↓ σ = g_0 ↓ α_{σ(1)} ∩ ⋯ ∩ α_{σ(m)} = 0.

As mentioned in the discussion of facet/edge duality in Section 2, for i = 1, ⋯, m, the edge line R_i^+ is the intersection of all C-facets except α_i; that is, R_i^+ corresponds to σ_i^c. Since an edge line is a 1-dimensional line, the projection of g_0 onto R_i^+ is a multiple of r_i, and hence g_0 ↓ σ_i^c = g_0 ↓ R_i^+ = ±r_i (up to a positive scale factor). Since each g_0 ↓ σ_i^c is a projection of g_0, according to (N.5), g_0^T (g_0 ↓ σ_i^c) ≥ 0.

Since |σ_t*| = m, none of the g_0 ↓ σ_i^c satisfies Proposition 2 for i = 1, ⋯, m. Otherwise, one of them would have been selected as the next gradient descent vector and, according to (10), the corresponding facet would have been deleted from σ_t*, making |σ_t*| less than m. This means that for each i, at least one j ∈ σ has τ_{σ(j)}^T (g_0 ↓ σ_i^c) < 0. However, for all k ≠ i, g_0 ↓ σ_i^c lies on α_{σ(k)}, so τ_{σ(k)}^T (g_0 ↓ σ_i^c) = 0. This leaves τ_i^T (g_0 ↓ σ_i^c) < 0. If g_0 ↓ σ_i^c = r_i, then τ_i^T (g_0 ↓ σ_i^c) = τ_i^T r_i < 0, which contradicts the fact that τ_i^T r_i ≥ 0 in (1). Therefore g_0 ↓ σ_i^c = −r_i. Since g_0^T (g_0 ↓ σ_i^c) ≥ 0, we get g_0^T (−r_i) = (−g_0)^T r_i ≥ 0.

(−g_0)^T r_i ≥ 0 means that the edge r_i points away from the descent direction g_0. As this is true for all edges, there is no path for g_t to descend further from this vertex, and the vertex V is the lowest point of C when viewed in the b direction.

Since P_t is dual feasible, V is a vertex of 𝒟, and the cone C coincides with the dual feasible region 𝒟 in a neighborhood of V, it follows that P_t is the lowest point of 𝒟 when viewed in the b direction. END

In essence, when the optimal vertex V* is reached, all the edges of the cone point opposite to the gravity vector g_0 = −b. There is no path to descend further, so the algorithm terminates.

The entire algorithm is summarized in the table below.

Step 0 is the initialization step that sets up the tableau and the starting point P. Step 2 finds a set of initial blocking facets σ in preparation for step 4. In the inner loop, step 4 calls the Gradient Select routine. It computes g_0 ↓ σ and g_0 ↓ σ_j^c in view of σ using Equations (11) to (21) and selects the best gradient vector g according to (9). This routine returns not only g but also the effective blocking facets σ* and g_0 ↓ σ_j^c for subsequent use. Theorem 3 states that when the size of σ* reaches m, the optimal point is reached. So when it does, step 5 returns the optimal point and the optimal value to the calling routine. Step 6 finds the closest blocking facet according to (6). Because P lies on every facet of σ, t_j = 0 for j ∈ σ; hence we only need to compute those t_j where j ∉ σ. The newly found blocking facet is then included in σ in step 7, and the

steps | Input: A, b and c̃; 𝒟 is non-empty and P_0 is inside 𝒟
---|---
 | Output: OptPt & OptVal
0 | Construct the facet tableau [τ] from A, b and c̃; P = P_0; g_0 = −b
1 | d_j = P^T τ_j − c_j for j = 1, ⋯, m + n
2 | σ = { j | |d_j| < δ } % δ is a small constant; σ* = σ % σ* is the set of effective blocking facets
3 | while (true)
4 | if σ ≠ ∅: [g, g_0 ↓ σ_j^c, σ*] = GradientSelect(g_0, σ, [τ], P, c) % compute g and g_0 ↓ σ_j^c, and update σ*; else g = g_0
5 | if |σ*| == m: OptPt = P; OptVal = P^T b; return (OptPt, OptVal); else σ = σ*
6 | t_j = (c_j − P^T τ_j) / (g^T τ_j) for j ∈ {1, 2, ⋯, m + n} \ σ; j* = arg min_j { t_j | t_j > 0 }; Q = P + t_{j*} g; P = Q
7 | σ = σ* ∪ { i | |t_i − t_{j*}| < ε } % update blocking facets
8 | end

inner loop is repeated until the optimal vertex is found.
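Putting the pieces together, the table translates into roughly the following Python/NumPy sketch. This is our rendering, not the authors' MATLAB implementation; `gradient_select` and `null_projection` are the routines sketched earlier, and the tolerances δ and ε mirror steps 2 and 7:

```python
import numpy as np

def sliding_gradient(A, b, c_tilde, P0, delta=1e-9, eps=1e-9, max_iter=1000):
    """Sketch of the Sliding Gradient table above.

    Assumes the dual feasible region is non-empty and P0 lies inside it.
    """
    m, n = A.shape
    tau = np.hstack([A, np.eye(m)])                   # step 0: facet tableau
    c = np.concatenate([c_tilde, np.zeros(m)])
    P, g0 = P0.astype(float), -b.astype(float)
    d = P @ tau - c                                   # step 1
    sigma = [j for j in range(m + n) if abs(d[j]) < delta]   # step 2
    for _ in range(max_iter):                         # step 3
        if sigma:                                     # step 4
            g, sigma_star = gradient_select(g0, sigma, tau)
        else:
            g, sigma_star = g0, []
        if len(sigma_star) == m:                      # step 5: Theorem 3
            return P, P @ b
        sigma = list(sigma_star)
        ts = {}
        for j in range(m + n):                        # step 6: closest blocker
            if j in sigma:
                continue
            denom = g @ tau[:, j]
            if denom != 0:                            # skip facets parallel to g
                tj = (c[j] - P @ tau[:, j]) / denom
                if tj > 0:
                    ts[j] = tj
        t_star = min(ts.values())
        P = P + t_star * g
        sigma = sigma + [j for j, tj in ts.items()    # step 7: add new blockers
                         if abs(tj - t_star) < eps]
    raise RuntimeError("iteration limit reached")
```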

^{1}Other derivations of the Klee-Minty formulas have also been tested and the same results are obtained.

We use the Klee-Minty example presented in [ ]^{1} to walk through the algorithm in this section. The Klee-Minty polytope problem is shown below:

max 2^{m−1} x_1 + 2^{m−2} x_2 + ⋯ + 2 x_{m−1} + x_m.

For the standard simplex method, it needs to visit all 2^m − 1 vertices to find the optimal solution. Here we show that, with a specific choice of initial point P_0, the Sliding Gradient algorithm can find the optimal solution in two iterations, no matter what the dimension m is.

To apply the Sliding Gradient algorithm, we first construct the tableau. For an example with m = 5 , the simplex tableau is:

The b vector is b = [ 5 , 25 , 125 , 625 , 3125 ] T . After adding the slack variables, the facet tableau becomes:

Firstly, notice that α_5 and α_10 have the same normal vector (i.e. τ_5 = τ_10), so we can ignore α_10 in further consideration. This is true for all values of m.

If we choose P_0 = M b, where M is a positive number (e.g. M = 100), it can be shown that P_0 is inside the dual feasible region. The initial gradient descent vector is g_0 = −b.


With P_0 and g_0 as initial conditions, the algorithm proceeds to find the first blocking facet using (6). The displacement t_j for each facet can be found by:

t_j = (c_j − P_0^T τ_j) / (g_0^T τ_j) = (c_j − M b^T τ_j) / (−b^T τ_j) = −c_j / (b^T τ_j) + M = M − c_j / (b^T τ_j). (22)

We now show that the minimum of all displacements is t m .

First of all, at j = m, τ_m = [0, ⋯, 0, 1]^T, c_m = 1 and b_m = 5^m, so t_m = M − 5^{−m}.

For m < j ≤ 2m − 1, c_j = 0, so t_j = M > t_m.

For 1 ≤ j < m, c_j = 2^{m−j}, and the elements of τ_j are:

τ_i^j = { 0 if i < j
        { 1 if i = j
        { 2^{i−j+1} if j < i ≤ m.

The second term of Equation (22) can be re-written as:

c_j / (b^T τ_j) = 1 / (b^T (τ_j / c_j)) = 1 / (b^T (τ_j / 2^{m−j})).

The inner product in the denominator is:

b^T (τ_j / 2^{m−j}) = ∑_{i=1}^m b_i (τ_i^j / 2^{m−j}) = ∑_{i=1}^{m−1} b_i (τ_i^j / 2^{m−j}) + b_m (2^{m−j+1} / 2^{m−j}) = ∑_{i=1}^{m−1} b_i (τ_i^j / 2^{m−j}) + 2 b_m.

Since all the elements of the b vector are positive and those of τ_j are non-negative, the summation is a positive number. Thus

b^T (τ_j / 2^{m−j}) = ∑_{i=1}^{m−1} b_i (τ_i^j / 2^{m−j}) + 2 b_m > b_m.

Since the value of the denominator is bigger than b_m = 5^m, we have

1 / (b^T (τ_j / 2^{m−j})) < 5^{−m}.

So

t_j = M − c_j / (b^T τ_j) = M − 1 / (b^T (τ_j / 2^{m−j})) > M − 5^{−m} = t_m.

Hence t_m is the smallest displacement. For the case of m = 5, the displacement values are shown in the first row (first iteration) of the table below.

Thus α_m is the closest blocking facet, and σ_1* = σ_1 = {α_m}. For the next iteration,

P_1 = P_0 + t_m g_0 = M b + (M − 5^{−m})(−b) = M b − M b + 5^{−m} b = 5^{−m} b.

The gradient vector g_1 is g_0 projected onto α_m. Because τ_m = [0, ⋯, 0, 1]^T is already an orthonormal vector, we have, according to (8),

g_1 = g_0 − (g_0^T τ_m) τ_m = g_0 − [0, ⋯, 0, −5^m]^T = [−5, −5^2, ⋯, −5^{m−1}, 0]^T.

In other words, g_1 is the same as −b except that the last element is zeroed out. Using P_1 and g_1, the algorithm proceeds to the next iteration and evaluates the displacements t_j again. For j = m + 1 to 2m − 1, since c_j = 0 and τ_j is a unit vector whose only non-zero entry is at the (j − m)-th element,

t_j = −P_1^T τ_j / (g_1^T τ_j) = −5^{−m} b^T τ_j / (g_1^T τ_j) = −5^{−m} b_{j−m} / (−b_{j−m}) = 5^{−m} for m + 1 ≤ j ≤ 2m − 1.

Thus the displacements t_{m+1} to t_{2m−1} all have the same value 5^{−m}.

For 1 ≤ j < m, we have:

t_j = (c_j − P_1^T τ_j) / (g_1^T τ_j) = (c_j − 5^{−m} b^T τ_j) / (g_1^T τ_j) = 5^{−m} (5^m c_j − b^T τ_j) / (g_1^T τ_j).

As mentioned before, g_1 is the same as −b except that the last element is zero, so we can express b^T τ_j in terms of g_1^T τ_j as follows:

b^T τ_j = −g_1^T τ_j + b_m τ_m^j.

The numerator then becomes:

5^m c_j − b^T τ_j = 5^m c_j + g_1^T τ_j − b_m τ_m^j.

Since c_j = 2^{m−j}, b_m = 5^m and τ_m^j = 2^{m−j+1}, substituting these values into the above equation, the numerator becomes

5^m c_j − b^T τ_j = 5^m 2^{m−j} − 5^m 2^{m−j+1} + g_1^T τ_j = g_1^T τ_j − 5^m 2^{m−j}.

Thus

t_j = 5^{−m} (5^m c_j − b^T τ_j) / (g_1^T τ_j) = 5^{−m} (1 − 5^m 2^{m−j} / (g_1^T τ_j)).

iteration | t_1 | t_2 | t_3 | t_4 | t_5 | t_6 | t_7 | t_8 | t_9
---|---|---|---|---|---|---|---|---|---
1 | 99.9999 | 99.9999 | 99.9999 | 99.9998 | 99.9997 | 100 | 100 | 100 | 100
2 | 0.0018 | 0.0018 | 0.0018 | 0.0035 | 0 | 0.00032 | 0.00032 | 0.00032 | 0.00032

Notice that all elements of g_1 are negative (except the last, which is zero) while those of τ_j are non-negative, so the inner product g_1^T τ_j is a negative number. As a result, the last term inside the bracket is positive, which makes the whole value inside the bracket bigger than one; hence t_j > 5^{−m} for 1 ≤ j < m. Moreover, t_m is zero, as g_1 lies on α_m. The actual displacement values for the case of m = 5 are shown in the second row of the table above.

Since t_{m+1} to t_{2m−1} share the same lowest displacement value, all of them are blocking facets, so σ_2* = σ_2 = {α_m} ∪ {α_{m+1}, ⋯, α_{2m−1}} = {α_m, α_{m+1}, ⋯, α_{2m−1}}. Also,

P_2 = P_1 + t_{m+1} g_1 = 5^{−m} b + 5^{−m} [−5, −5^2, ⋯, −5^{m−1}, 0]^T = [0, 0, ⋯, 0, 1]^T.

Now |σ_2*| = m, so P_2 has reached the vertex of a cone. According to Theorem 3, the algorithm stops. The optimal value is P_2^T b = 5^m, which is the last element of the b vector.

Thus, with the specific choice of initial point P_0 = M b, the Sliding Gradient algorithm solves the Klee-Minty LP problem in two iterations, independent of m.
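The walk-through is easy to verify numerically. The snippet below (ours) builds the Klee-Minty facet data from the formulas above and replays the two iterations for m = 5, confirming P_2 = [0, ⋯, 0, 1]^T and the optimal value 5^m:

```python
import numpy as np

def klee_minty(m):
    """Structural facet data per this section: tau_i^j = 0 (i<j), 1 (i=j),
    2^{i-j+1} (j<i<=m); b_i = 5^i; c_j = 2^{m-j}."""
    A = np.zeros((m, m))
    for j in range(m):
        A[j, j] = 1.0
        A[j + 1:, j] = 2.0 ** (np.arange(j + 1, m) - j + 1)
    b = 5.0 ** np.arange(1, m + 1)
    c = 2.0 ** np.arange(m - 1, -1, -1)
    return A, b, c

m, M = 5, 100.0
A, b, c = klee_minty(m)
P0, g0 = M * b, -b
# iteration 1: displacements (22) over the structural facets (slack facets give M)
t = (c - P0 @ A) / (g0 @ A)
assert np.argmin(t) == m - 1                 # alpha_m blocks first: t_m = M - 5^{-m}
P1 = P0 + t[m - 1] * g0                      # equals 5^{-m} b
g1 = g0.copy(); g1[m - 1] = 0.0              # g0 projected onto alpha_m, per (8)
# iteration 2: slack facets alpha_{m+1}..alpha_{2m-1} all block at t = 5^{-m}
P2 = P1 + 5.0 ** (-m) * g1
print(P2, P2 @ b)                            # -> [0 ... 0 1] and 5^m = 3125
```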

The Sliding Gradient Algorithm has been implemented in MATLAB and tested on the Klee-Minty problems as well as self-generated LP problems with random coefficients. As a real number can only be represented with finite precision in a digital computer, care must be taken to deal with round-off issues. For example, when a point P lies on a plane y^T τ = c, the value d = P^T τ − c should be exactly zero, but in an actual implementation it may be a very small positive or negative number. Hence in step 2 of the aforementioned algorithm, we need to set a threshold δ so that if |d| < δ, we regard the point P as lying on the plane. Likewise for the Klee-Minty problem, the algorithm relies on the fact that in the second iteration the displacement values t_i for i = m + 1 to 2m − 1 are the same, and that they are all smaller than the values of t_j for j = 1 to m − 1. Due to round-off errors, we need to set a tolerance level within which the first group is treated as equal; yet if this tolerance level is set too high, it cannot exclude members of the second group. The issue becomes more acute as m increases: it requires higher and higher precision in setting the tolerance level to distinguish these two groups.

We have presented a new approach to tackle the linear programming problem in this paper. It is based on the gradient-descent principle. From any initial point inside the feasible region, it passes through the interior of the feasible region to reach the optimal vertex. This is made possible by projecting the gravity vector onto a set of blocking facets and using that as the descending vector in each iteration. In fact, the descending trajectory is a sequence of line segments that hug either a single blocking facet or an intersection of several, and each line segment advances towards the optimal point. It should be noted that there are no parameters (such as a step size) to tune in this algorithm, although one needs to take care of numerical round-off issues in an actual implementation.

This work opens up many areas of future research. On the practical side, we are extending this algorithm to relax the constraint of starting from a point inside the feasible region. Promising progress has been made in this area, though more thorough testing on obscure cases needs to be carried out.

On the theoretical front, we are encouraged that, from the algorithm walk-through on the Klee-Minty example, this algorithm exhibits strongly polynomial complexity characteristics. Its complexity does not appear to depend on the bit sizes of the LP coefficients. However, a more rigorous proof is needed, and we are working towards this goal.

The authors wish to thank all their friends for their valuable criticism and comments on this research. Special thanks go to Prof. Yong Shi and Prof. Sizong Guo for their support. This study is partially supported by grants (Nos. 61350003, 70621001, 70531040, 90818025) from the Natural Science Foundation of China and a grant (No. L2014133) from the Department of Education of Liaoning Province.
