A Dimensional Reduction Approach Based on Essential Constraints in Linear Programming

This paper presents a new dimension reduction strategy for medium and large-scale linear programming problems. The proposed method uses a sub-set of the original constraints and combines two algorithms: the weighted average and the cosine simplex algorithm. The first approach identifies binding constraints by using the weighted average of each constraint, whereas the second algorithm is based on the cosine similarity between the vector of the objective function and the constraints. These two approaches are complementary, and when used together, they locate the essential subset of initial constraints required for solving medium and large-scale linear programming problems. After reducing the dimension of the linear programming problem using the subset of the essential constraints, the solution method can be chosen from any suitable method for linear programming. The proposed approach was applied to a set of well-known benchmarks as well as more than 2000 random medium and large-scale linear programming problems. The results are promising, indicating that the new approach contributes to the reduction of both the size of the problems and the total number of iterations required. A tree-based classification model also confirmed the need for combining the two approaches. A detailed numerical example, the general numerical results, and the statistical analysis for the decision tree procedure are presented.


Introduction
The popularity of linear programming can be attributed to many factors, in-cluding its capacity to model large and complex problems and the ability to solve such problems in polynomial time [1] using effective algorithms.However, many large-scale real-life problems almost always contain a significant number of redundant constraints and variables, increasing their complexity and impacting post-optimal analysis [2].Redundancy in linear programming is expected because of the complete lack of knowledge about the constraints and the desire to formulate the problem without omitting essential elements.Thus, researchers tend to include all the information given in the form of binding and non-binding constraints.Even including non-binding constraints does not alter the optimum solution; it may require additional iterations and increase the computational difficulties [3] [4].Furthermore, the optimal strategy is to remove the types of redundancy, which reduces the total solution time [5].Because they do not affect the solution structure, redundant or excessive constraints can be removed from the problem and not included in the formulation [2].As a result, the constraints used should ideally be less than the original ones.This paper aims to present a method that uses a subset of the original constraints considered essential to the solution.It combines the cosine simplex algorithm [6] with a recently proposed method based on the weighted average of a linear programming problem [7] and eliminates redundant constraints while improving existing solving methods.

Literature Review
The contributions to algorithms and approaches for linear programming (LP) problems are reviewed in this section.The most popular techniques for solving LP problems are provided first.These techniques are used to characterize binding and redundant restrictions in addition to solving LP problems.The literature study also includes algorithms for eliminating redundant constraints, identifying binding constraints, or contributing to the dimension reduction of the problems.The first effective solution technique in linear programming was the Simplex method proposed by Dantzig [8] [9] [10], which is the most popular method that allows efficient post-optimality analysis; however, it is very computationally expensive for large-scale problems [11] [12].Other computational methods for maximizing a linear function subject to linear inequalities were proposed [13] [14], and solving problems in polynomial time; nevertheless, they performed poorly in practice [15].Corley et al. [6] presented the cosine simplex algorithm, an improvement of the Simplex algorithm that decreases the number of simplex iterations and calculations each iteration.The cosine criteria utilized detect active constraints faster than the usual Simplex method.A polynomial projection approach [16], the Exterior Point Simplex Algorithm (EPSA) [17] [18], and improved Simplex algorithms [19] [20] [21] [22] have also been studied.
In addition to methods for solving LP problems, other algorithms have been developed to identify and remove redundancies.According to Terlakey [23], the ultimate goal of any technique is to identify nonessential constraints, and works suggest that redundant constraints should be identified and trivial constraints should be deleted [24] [25] [26].Several techniques dealing with redundancy in chart problems.Algorithms for determining irrelevant constraints in systems of linear inequalities and identifying redundant constraints in a system of linear constraints have also been developed [28]- [34] in primal and dual-form LP problems, while methods for classifying linear constraints as redundant or necessary were also developed [35] [36].Presolving heuristic algorithms have been used for identifying redundancy [5] [37] [38], and routines have been developed for large and sparse LP problems prior to solving them with an interior point-based optimizer [39].Stojković and Stanimirović [40] proposed two direct methods for identifying redundant constraints in linear programming based on cosine similarity and game theory, respectively.Bradley et al. [41] proposed several heuristic algorithms for detecting and exploiting structural redundancy in large-scale mathematical programming models, real-life linear programming, and mixed-integer models.New approaches have been developed to preprocess nonnegative large-scale problems to reduce their dimension by identifying and removing redundant constraints and variables [42] [43] [44].Nikolopoulou et al. [7] proposed a method for identifying binding constraints in LP problems using the weighted average of each constraint.The method was accurate in problems without excessive or superfluous constraints, while in general LP problems, the accuracy of the proposed algorithm was tested using Type I and Type II errors, as described in a previous procedure [32].
Real-life problems are often formulated as LP problems, including all the operative information given as constraints to avoid missing essential elements.This constraint inclusion leads to augmented problems, which are often hard to solve.To address this issue, we propose a method that uses the subset of binding constraints, forming a reduced-size problem equivalent to the original one using the outcomes from identifying binding constraints of two existing methods [6] and [7].Consequently, combining two methods leads to a significantly reduced size of LP problems when applied to medium and large-scale LP problems.The new method is based on a geometric approach to identifying the subset of binding constraints; it eliminates the need for artificial variables and is not computationally demanding.Moreover, it introduces new notions in LP programming: the sufficient and the minimal efficient problem.

Organization of the Paper
The rest of the paper is organized as follows: Section 2 discusses the new approach using the combination of the weighted average method and the cosine simplex method.The proposed algorithm is presented in Section 3. Section 4 presents the numerical results of the algorithm tested on a collection of well-known benchmarks and random LP problems.Further statistical analysis using decision trees is presented in Section 5. Finally, Section 6 discusses the results and remarks on the proposed approach and possible extensions.

Description of the Problem
Consider a linear programming problem with a nonempty convex feasible region: where x is the n-dimensional vector of decision variables, is the righthanded vector, and Large-scale LP problems almost always contain a significant number of redundant and excessive constraints and variables.Unfortunately, no easy method exists to identify constraints except in two-dimensional problems [4].

Definitions and Assumptions
In this section, five definitions are given.The first two refer to the types of constraints in LP problems; the next introduces a new notion regarding the subset of constraints required to solve an LP problem; and the fourth refers to the LP problem that is produced by a subset of constraints of the original problem.Finally, the last definition refers to LP problems formulated only by the binding constraints.
Consider a linear programming problem (1) with a nonempty, convex feasible region.Definition 2.1: A constraint is called "binding" or "active" if it is satisfied as an equality in the optimal solution.Otherwise, the constraint is called "redundant".Definition 2.2: If the plane of a redundant constraint contains no feasible points (it lies beyond the feasible region), then the constraint is called "excessive" or "superfluous".
, , , m S s s s =  be the set of constraints of problem (1).

S s s s s s s
Furthermore: • The S′ subset is called "essential", and a constraint that belongs to the S′ subset is called "essential".• In the special case where the number of constraints is less than the number of variables, then u m ≤ .
Definition 2.4: A linear programming problem consisting of several essential constraints is called a sufficient problem.Definition 2.5: A linear programming problem consisting of binding constraints is called a minimal efficient problem.
Consequently, the following corollaries are formulated: Corollary 2.1: The sufficient problem is equivalent to the original problem and has the same solution.
Corollary 2.2: There could be more than one sufficient problem equivalent to the original one having the same solution.
Therefore, the new problem with the nonempty convex feasible region can be formulated as where x is the n-dimensional vector of decision variables, is the right-handed vector, where  (2) are reduced compared to those in the original LP (1).Thus, the sufficient problem (2) is reduced in size compared to the original problem (1).The main purpose of this paper is to locate the essential constraints that form a sufficient problem, that is, the , , , u S s s s ′ =  subset.

Weighted Average Method & Cosine Simplex Algorithm
In this section, the main idea of the two methods that are used to form the proposed algorithm is presented.
Consider the linear programming problem (1).Assume that in this problem there are no excessive constraints and the feasible region is convex.Let ( )  be the optimal solution to the problem (1).Binding constraints hold equality in the system of inequalities in problem (1), thus * x is a solution for each binding constraint of the LP problem (1).The goal of the method is to identify the constraints that hold equality in the optimal solution * x .
The key idea behind the method is that binding constraints can be predicted if the solution * x can be estimated.For this purpose, is considered the weighted average of the i-constraint, x to the problem (1); therefore, the constraint corresponding to point M could be considered essential to solving the LP problem [45].
Consider the linear programming problem (1).Let  , be the cosine of the angle i θ between the vectors i a and the vector c of the objective function.For a given problem (1), this method begins by solving a relaxed problem of the original objective function subject to a single constraint, yielding a nonempty, bounded feasible region.At each iteration of the algorithm, the constraint that is most parallel to the objective function among those violated by the solution to the current relaxed problem is appended to it.The algorithm adds active constraints until the Kuhn-Tucker conditions [11] and [1], both necessary and sufficient for (1), are satisfied.When no constraints are violated, the solution to the current relaxed problem is optimal for the original problem.

Related Work and Motivation
The main idea is based on a recently proposed notion: the segmentation of the area that is prescribed by the problem's constraints into two parts: the high-chance area of nonbinding constraints and the area of potential binding constraints.For this purpose, a heuristic procedure using cosine similarity and the weighted average is applied.These two methods were chosen since they are both based on methods for potential binding constraints.As a result, a segmentation rule is formed based on the outcomes from the graphical plot between the cosine and the weighted average method [46].The outcome of the segmentation rule was that the essential constraints should be searched among those that are more parallel to the objective function and those that correspond to small weighted average values.
An example of a linear programming problem (1), where the number of constraints is much larger than the number of variables, was used to form the heuristic segmentation rule.In this problem, which constraints are binding, redundant, or excessive is unknown.The example was considered a random problem, created using a jitter function, and implemented in R [47].At first, a vector that was considered a solution to the problem was chosen.Then, the coefficient matrix was formed, and the coefficients ij a , 1, 2, ,  , were generated independently and randomly from the uniform distribution.The vector is formed by multiplying the above matrix A with the consi- dered solution and adding random noise to this vector.To form the objective coefficient vector , the coefficients were generated independently and randomly, and  .Consider a random 750 × 50 LP problem (1).The binding constraints in this example are 18.The optimal value of the objective function is equal to 28.065, and the number of total iterations is 146.Consider each constraint's cosine (4) and weighted average (3).A graphical plot (Figure 1) is used between the cosine and the The black dotted vertical lines refer to the quartiles of the cosine and the , on the plot.In Figure 2, the axis that corresponds to  axis that corresponds to cos i θ , 1, 2, ,750 i =  , is represented by cos.The binding constraints are represented as dark blue dots, while the redundant or excessive constraints are represented as light grey dots.According to Figure 1 there are no binding constraints in the area that is defined below the value of the first quartile of cosine.
Therefore, the points in this area can be removed from the original problem and can be considered points referring to redundant or excessive constraints.This area can be considered a high-chance area of redundant or excessive constraints, while the complementary area can be considered an area where potential binding constraints can be found.The constraints considered essential to solving the problem are located in the complementary area.
After removing the points in the high-chance area of excessive constraints, the constraints of the new problem are 562, which is equal to a 25% reduction of the original number of constraints.The optimal value of the objective function of the new, reduced-sized problem is equal to 28.065, which is equal to the value of the objective function of the original problem.In the reduced problem, the binding constraints are the same as those of the original problem; however, the total iterations are 111, which is a 23.972% reduction of the original number of iterations.In Figure 2, where the same procedure is used, the axis that corres-American Journal of Operations Research ponds to , is represented by 2 1 l , and the axis that cor- , is represented by cos.The binding constraints are represented as dark blue dots, while the redundant or excessive constraints are represented as light grey dots.Ιn the new problem, it is observed that three binding constraints are in the new high-chance area of redundant or excessive constraints.Furthermore, there are no binding constraints in the lower left corner area of the plot, which consists of dots that are below the line referring to the new first quartile of cosine and the line referring to the new first quartile of In this area, 39 constraints can be removed from the problem.
The constraints of the new problem are now 523, which means that the constraint reduction of the original problem is 30.267%.The optimal value of the new, reduced-sized problem equals 28.065; the binding constraints and the nonzero variables are the same as those of the original problem, and the number of total iterations is 109.The final number of iterations is reduced by 25.343%.
The proposed two-step segmentation rule is summarized as follows: Two-step segmentation rule input: The number of decision variables (n), the number of constraints (m), the coefficient matrix of the problem (A), the vector of the right-hand side coefficients (b), the vector of the objective coefficients (c).Output: (Initialization) Compute: Remove the constraints that: 1c Q′ , the first quartile of the values of , the first quartile of the values of The rule's effectiveness was proved by an application to 1000 random medium to large-scale LP problems, where the number of constraints was about 10 -15 times larger than the number of variables.More specifically, the rule was applied in 500 random LP problems, where n = 50 and m = 500, and in 500 random LP problems, where n = 50 and m = 750.The problems were implemented according to the procedure described in this section regarding a random 750 × 50 problem using R. To check the effectiveness of the segmentation rule, the binding constraints of the original problems and the binding constraints of the problems after the first and second segmentations were identified using the Simplex method in R.
After the first segmentation, the binding constraints were identical in 98% of the 500 × 50 problems.After the second segmentation, the binding constraints were identical in 87.2% of the problems, while in 99.2% of the problems, the number of binding constraints differed by one at most.These results are reported in Table 1.In Table 2, the first and the second column refer to the statistics regarding the percentage of the difference between the original objective value and the objective value after the first segmentation (p.obj1) and the percentage of the difference between the original objective value and the objective value after the second segmentation (p.obj2); the third and the fourth column refer to the statistics of the percentage of the difference between the original variables and the variables after the first segmentation (p.sol1) and the percentage of the difference between the original variables and the variables after the second segmentation (p.sol2) and the last column refers to the statistics of the percentage of the total constraint reduction (p.reduced).As is observed from Table 3, the two segmentations did not affect the value of the objective function significantly.
In 94% of the 750 × 50 problems, the binding constraints were the same after the first segmentation, and in 99.8% of the problems, the number of binding constraints differed by one at most.After the second segmentation, the binding constraints were identical in 82% of the problems, while in 97.2% of the problems, the number of binding constraints differed by one at most.These results are reported in Table 3.Finally, Table 4 reports the statistics regarding the  4; in this case, the two segmentations did not significantly affect the objective function's value.To conclude, the essential constraints should be searched among the constraints more parallel to the objective function and those corresponding to small weighted average values.

Description of the Method
For a given LP problem (1), where m > n, a combination of the cosine simplex algorithm and the weighted average algorithm is used to reduce the dimension of the problem.The main objective of the reduction is to keep the essential constraints and subtract the nonessential ones to determine a sufficient problem.To assure the effectiveness of the proposed method regarding the essential constraints, we calculate the percentage of binding constraints after the reduction that is identical to the binding ones in the original problem.For the implementation of the proposed method, the i-constraints, are considered binding.The original and reduced problem can be solved using any LP method; for the proposed procedure, the Simplex method was used.
The two algorithms, the cosine simplex and the weighted average algorithm, are combined and use the essential constraints proposed by the mentioned me-thods individually.Applying this procedure leads to a relaxed problem consisting of the original objective function subject to a subset of the original set of constraints, forming a nonempty, bounded feasible region.Then, using a ranking rule, the proposed method adds essential constraints until the Karush-Kuhn-Tucker conditions [1] [11], which are both necessary and sufficient for the problem (1), are satisfied to form this subset.The ranking rule is related to the cosine simplex and the weighted average algorithm.It can be summarized as follows: The problem's constraints are sorted in ascending order by the weighted average criterion and in descending order by the cosine criterion.After locating the subset of the essential constraints, any LP method can solve the given LP problem.
Regarding the weighted average criterion, a random integer number 1 r : This number is considered the threshold for the weighted average of the constraints.That is, the constraints whose rank (in ascending order) of  , is above this threshold should be used to solve the original problem (1).Let m 1 be the subset of these constraints.This consideration is equivalent to selecting the constraints whose rank (in ascending order) of i λ , for  , is below the threshold.Thus, it is ensured that the con- straint related to min λ is used.Regarding the maximum cosine criterion, a random integer number 2 r : is also chosen, and it is considered a threshold for the cosines of the constraints.Let m 2 be the subset of these constraints.
In this case, the constraints whose cosines (in ascending order) are above this threshold should be used to solve the original problem (1).This consideration ensures that the constraint with the maximum cosine closest to the objective function is used.Constraints that form a minimum angle with the objective function are considered active [40].However, if the system of inequalities in problem (1) contains several excessive constraints, some of the minimal angles derived from the cosine similarity (4) may be determined by excessive constraints [40].
The two criteria could lead to the same or a different subset of the original constraints.Then, the new method uses k constraints, where k is less than or at most equal to the original constraints, n k m ≤ ≤ , to solve the problem (1).The k constraints are a combination of the constraints corresponding to both the ascending order of the weighted average values  , and the descending order of cosine values.More specifically, the constraints satisfying the two criteria are essential for the original problem.The set of constraints proposed by the two approaches-which is the sum of the constraints proposed by the weighted average and cosine method-is used to avoid a potentially non-feasible area.In the event that the constraints chosen do not lead to a feasible solution, the problem cannot be solved, and the procedure of setting new random numbers based on the weighted average and cosine criteria is repeated.More precisely, the new thresholds that are used to choose the two subsets of constraints m 1 and m 2 , should have a lower value than the previous ones.The proposed algorithm terminates when a feasible solution with fewer constraints than the original problem is obtained (k < m) or all the constraints of the original problem are used (k = m).
Combining the two methods leads to a subproblem equivalent to the original one since it consists of essential constraints.Using fewer of the original constraints may lead to a near-optimum because binding constraints may not be included in the subset.However, applying the proposed method to several known benchmarks has shown that this probability is extremely low (see Table 5).Consequently, the new method could use a subset of the original constraints to solve the problem.For large-scale problems where the number of constraints is much larger than the number of variables, e.g., 10 -15 times larger, the number of essential constraints could be much smaller than the number of original ones.Therefore, the dimension reduction can be quite significant.Apart from the dimension reduction, one advantage of the proposed method is that only a subset of the original constraints is included in the overall calculations using constraint selection.Another advantage is that no artificial variables except those used to solve the problem are used.The number of artificial variables is also reduced when a reduced subset of constraints is used.Moreover, in the subset of constraints, there is at least one constraint that forms the feasible region and is proposed by the weighted average method, and at least one constraint that is considered active and is proposed by the weighted average method and/or the cosine method.

The Pseudocode of the Proposed Algorithm
A flow diagram of the weighted average and cosine algorithm (w.a.co) to provide an optimal solution for LP is shown in Figure 3.A formal description of the proposed algorithm is presented below: Algorithm The weighted average and cosine algorithm (w.a.co) input: The number of decision variables (n), the number of constraints (m), the coefficient matrix of the problem (A), the vector of the right-hand side coefficients (b), the vector of the objective coefficients (c), two random integer numbers : r n r m < ≤ .
output: The percentage of binding constraints after the reduction that are identical to the binding ones in the original problem (p.bind), the total used constraints (used), the constraints of the weighted average method (m1), the constraints of the cosine similarity method (m2), the common constraints ( increasing or 1 1 der rank 1 Select: m1 = the constraints of rnd1 set that their rank is

An Example of a Linear Programming Problem
In this section, we solve an LP problem using the proposed method.
Consider the following LP problem (6): max According to the definitions given in Section 2.2, the original set of constraints is { } , , , S s s s =  .The main purpose is to find the S′ subset of con- straints using the proposed approach.LP problem ( 6) is first solved using Simplex in R to validate the effectiveness of the proposed procedure.The optimal value of the objective function is equal to 11.545.The first and fifth (the last) constraints are binding.Consequently, the essential constraints can be located according to the criteria proposed in Section 3. The values for , is referring to the third constraint.The ascending order of the values of cos i θ , , is 2, 1, 5, 4, 3. Using the cosine similarity method, 2 r is randomly chosen as 4.Then, the constraints having the four bigger values of cos i θ , 1, 2,3, 4,5 i = are chosen.In this case, the subset of essential constraints that are finally used are the first, the third, the fourth, and the fifth.The second constraint is not used since it does not satisfy either the weighted average or the cosine criterion.The second constraint is not binding, and its removal does not affect the optimal solution to the problem.
Therefore, the sufficient problem (7) consists of the four constraints is presented below: max LP problem ( 7) is solved again using Simplex in R language.The optimal value of the objective function of the reduced problem is equal to 11.545, which is the same as the optimal value of the objective function of the original problem (6).In the final problem (7), the first and fourth constraints are binding; the binding constraints of LP problem (6) are the same as the binding constraints of LP problem (7).According to the definitions given in Section 2.2, in LP problem (7), the subset of the essential constraints is { } , , , S s s s s ′ = while in LP problem (6), the original set of constraints is { } , , , , S s s s s s = .The constraint re-American Journal of Operations Research duction for the problem is 20%.

Numerical Results
In this section, the numerical results of the proposed algorithm are presented.At first, we apply the proposed method to several well-known benchmarks and random LP problems.Then, the proposed procedure was implemented in R for more than 2000 LP problems.Finally, after applying the procedure, the whole set of the reduced LPs, consisting of several well-known benchmarks and random LP problems, was solved using Simplex in R.

Application of the w.a.co. Algorithm in Netlib Problems
The proposed algorithm was applied to well-known benchmarks in the literature as netlib problems [48].The netlib library includes medium and large-scale LP problems where some of them refer to real-life scenario problems; for example, blend is a variant of the oil refinery problem; boeing1 and boeing2 refer to flap settings on aircraft for economical operations; dfl001 is a real-world airline schedule planning (fleet assignment) problem; finnis is a model for the selection of alternative fuel types; lotfi involves audit staff scheduling; and pilot is one of the economic models developed by George Dantzig's group.In these problems, the original constraints are the given constraints.The main goal was to determine the essential constraints that form the sufficient problem according to the definitions given in Section 2.2, that is, the , , , u S s s s ′ =  subset.However, only 16.45% of the set of problems (13 out of 79 problems) have constraints greater than the variables.Consequently, to deal with this particularity, some steps of the proposed algorithm were partially adapted: Moreover, we should mention that LP problems where the number of variables is greater than the number of constraints can be solved in dual form [27] apart from using the adapted algorithm.Table 5 reports the numerical results of applying the algorithm to a set of netlib problems.Since the number of constraints is less than the number of variables in 83.3% of netlib problems, in some problems there is no significant constraint reduction.More specifically, the number of essential constraints was less than the original constraints in 50% of the problems and equaled that in 50% of the problems.We also observe that the two procedures chose different sets of constraints (m 1 and m 2 ).In some problems, the average weighted method proposed more constraints than the cosine similarity method; in others, the cosine similarity method proposed more constraints than the weighted average method; however, it is unclear which procedure is preferred.The weighted method selects fewer constraints than the cosine similarity method in 48.7% of the netlib problems.The detailed numerical results are summarized in Table 5.
Regarding the names of the columns in Table 5, n refers to the number of variables, m to the number of constraints, used refers to the number of essential constraints proposed by the proposed method, n.bind is the number of binding constraints according to (5), p.bind is the percentage of binding constraints located by the proposed method to the binding constraints of the original LP problem, m 1 is the set of constraints proposed by the weighted average method, m 2 is the set of constraints proposed by the cosine similarity method, and common refers to the common constraints between m 1 and m 2 .According to the definitions given in Section 2.2, the numbers in column "used" refer to the subset , , , u S s s s ′ =  of the essential constraints, while the numbers in column "m" refer to the set , , , m S s s s =  of the original constraints.Regarding the number of iterations, the required number of iterations using the Simplex method is, on average, ( ) [49].Therefore, it is obvious that even when the constraint reduction is small using the proposed method for some netlib problems, the number of iterations and operations is significantly reduced.
Descriptive statistics regarding the application of the algorithm are reported in Table 6.The first column refers to the percentage of the binding constraints (p.bind) after the reduction that are identical to the binding ones in the original problem; the second is the percentage of the used constraints (used_m), while the third and fourth columns are the percentages of m 1 constraints (m1_m) and m 2 constraints (m2_m) to the original ones, respectively.The following three columns are: the percentage of the m 1 constraints regarding the essential ones (m1_used), the percentage of the m 2 constraints regarding the essential ones (m2_used) and finally, the ratio of the m 1 constraints to the m 2 ones (m1_m2).
The last column (c.used) is the percentage of the common constraints between the two subsets m 1 and m 2 and the essential ones.
The method used 95% on average of the original constraints, while the minimum percentage of used constraints is 55.3%, and the maximum percentage is 100%.The median percentage of correct binding constraints for the problems equals 100% (99.9% on average) (Table 6).According to Table 5, the correctly binding constraints are 100% in 87.3% of the netlib problems, and in 12.7% of the problems, the percentage of correctly binding constraints is between 98.4% and 99.8%.Therefore, there is a slight chance, 0.09%, of not including a binding constraint using the proposed method.In terms of average, the percentage of the constraints proposed from the average weighted method and the cosine similarity method was 66.8% and 65.2% of the original constraints of the problem, respectively, and the constraints proposed from the average weighted procedure and the cosine procedure were 70.4% and 67.8% of the essential constraints of the problem, respectively (Table 6).The constraint reduction can be calculated up to 64% (initial constraints minus used ones).Since the mean percentage of the common constraints of the two algorithms to the essential ones is 36.9%,there is a need to exploit the proposed union of constraints between the two algorithms (Table 6).

Application of the w.a.co. Algorithm in Random LP Problems
Apart from netlib problems, the proposed method was also applied to random LPs.We considered applying it to random LPs to exploit the proposed algorithm and prove its efficiency.The problems were created using the same R procedure presented in Section 2.4.The method was applied to 1000 random medium-scale LP problems, where [ ] .The number of constraints was set to 10 -15 times larger than the number of variables.After the application, the random LP problems were solved using Simplex.The proposed algorithm contributed to reducing the number of original constraints and performed fewer iterations when solving the reduced LP problems.The results are summarized in Table 7, where the column names are the same as those described in Table 5.Furthermore, in Table 7, the last column, iter_red, refers to the percentage of the iteration reduction.The method used 85.9% on average of the original constraints in medium-scale problems, while the minimum percentage of used constraints equals 36% and the maximum percentage is 100%.The percentage of binding constraints in the problems is equal to 93.3% on average.In 25% of the random problems, the percentage is 100%, and in 75%, the percentage is between 77.78% and 100%.In terms of the average percentage of constraints, the constraints proposed from the weighted average method and the cosine similarity method were 60.5% and 60.2% of the original ones, respectively, and the constraints proposed from the weighted average method and the cosine similarity method were 69.4% and 69.4% of the essential ones, respectively.The constraint reduction is 14.1% on average and can be calculated up to 64% (initial constraints minus used ones).The ratio of the common constraints of the two methods to the essential ones is 42.3% on average, which means there is a need to exploit the proposed union of constraints between the two methods.
Our method performed fewer iterations on the reduced problems than on the original ones.The number of iterations demanded was 19.6% less on average than the original.The iteration reduction was calculated at 73.7%.As was mentioned in the netlib problems, the two procedures chose a different set of constraints (m 1 and m 2 ).According to Table 8, the weighted method selects more constraints than the cosine similarity method in 51.1% of the random medium-scale problems, while in large-scale problems, it selects more constraints than the cosine similarity method in 48.6% of the random large-scale problems.
Moreover, in large-scale problems, the method used 86.9% on average of the original constraints.The minimum percentage of essential constraints is equal to 55%, and the maximum percentage is equal to 100%.The percentage of binding constraints is 92.5% on average; in 25% of random problems, the percentage is 100%, and in the remaining 87%, the percentage is between 76.5% and 98.3%.In terms of the average percentage of constraints, the constraints proposed from the weighted average method and the cosine similarity method were 62.5% and 60.7% of the original constraints of the problem, respectively, and the constraints proposed from the weighted method and the cosine similarity method were 70.9% and 69% of the essential constraints, respectively.The constraint reduction is 13.1% on average and can be calculated up to 44.5% (initial constraints minus used ones).The percentage of the common constraints of the two methods compared to the essential ones is 40%, which means there is a need to exploit the union of the proposed constraints between the two methods.These results are summarized in Table 9.
The names of the columns in Table 9 are the same as those in Table 7.In terms of the average number of iterations, the number is 18.9% less than those in the original problem.The iteration reduction was calculated at 77.3%, and in this case, the two methods chose a different set of constraints (m 1 and m 2 ).

Statistical Analysis
In this section, the results of a statistical analysis that was conducted using the characteristics of the problems related to the two subsets of constraints, m 1 and m 2 are presented.At first, statistical analysis was used to confirm the need to use a combination of constraints m 1 and m 2 , as proposed by the two procedures, the weighted average and the cosine similarity procedure.Furthermore, statistical analysis was used to identify the most critical factors that may affect the accuracy of the solution that was obtained using the proposed method and describe relationships between these factors regarding the accuracy of finding the optimal solution.These factors arise from the characteristics of the problems that are related to the new approach and its sub-methods.The percentage of correct binding constraints in LP problems using the proposed method was determined as a factor to specify the accuracy of the approach.

Decision Trees for Medium and Large-Scale Problems
The further goal of the statistical analysis was to predict the characteristics of the problem with respect to the two methods and confirm that it is necessary to use both methods to find an optimal solution.According to the numerical results in Section 4.2, the mean percentage of the common constraints of the two algorithms to the essential ones is 42.3% on average in random LPs.Therefore, we conducted further analysis to explain and determine the proposed union of constraints between the two methods.More specifically, our main goal was to study which method should be used in different types of LPs to achieve better accuracy in finding the subset of binding constraints.For this purpose, a tree-based classification model was performed.Decision trees are techniques for determining which subsets of explanatory variables are most relevant to predicting a response variable.A tree structure is generated by recursively dividing the sample into a series of groups.A problem in tree construction is determining the splitting of the initial information into smaller and smaller pieces.The fundamental idea is to select each subset split so that the data in each descendant subset is "purer" than the data in the parent subset.Each subdivision is made such that the difference in the response variable in the resulting two groups is maximized [50] [51].In addition, this technique does not require distributional assumptions; it is more resistant to the effects of outliers, and no advanced statistical knowledge is required to apply or interpret decision trees [50].Moreover, decision trees are data-driven and not model-driven.Therefore, this technique has been preferred in statistical analysis over other classification methods or models, such as discriminant analysis and logistic regression, which have their common origin in the general linear model [50] [51].Two tree-based classification models were constructed from the two datasets of 1000 random medium and 1000 random large-scale problems.
To perform a tree-based classification analysis, we considered the following variables according to the proposed method: • The variable p. bind_95 refers to the percentage of correctly binding constraints.The result is measured in two categories: the percentage that is less than 95% accurate and the percentage that is more than 95% accurate.It was considered a nominal bivariate variable.• P.bind_95 is a variable that we wanted to predict, and it was considered a measure of the accuracy of the proposed method.• Variables m1_m_cat and m2_m_cat refer to the percentage of binding constraints used by the weighted average method and the cosine similarity method, respectively, in the total binding constraints of the problems.These percentages are considered nominal bivariate variables with two options: ○ the number of binding constraints (m_1) that is less than 50% (median) of the original number of constraints and the number of binding constraints (m_1) that is more than 50% (median) of the original number of constraints, ○ or the number of binding constraints (m_2) that is less than 50% (median) of the original number of constraints and the number of binding constraints (m_2) that is more than 50% (median) of the original number of constraints.
• The variable m1_m2 is the ratio between the constraints from the weighted mean method and those from the cosine similarity method.
The ratio was considered a nominal binary variable with two options: the first is when more constraints from the weighted average method are used, and the second is when more constraints from the cosine similarity method are used.
We use the tree-based classification model in medium and large problems, where p.bind_95 is assumed to be the dependent variable and the other variables are independent.According to the performance measures used, the methods QUEST [52] [53], CHAID/exhausted CHAID [54] [55], and CART [56] [57] constructed the same trees.Moreover, the risk of the method and the percentage of correct classification were the same for the three methods [58] [59].Therefore, without loss of generality, the CHAID growing method is used.This method builds a predictive model or tree to help determine how variables best merge to explain the outcome in the given dependent variable p.bind_95.More specifically, we use the CHAID method to predict the correct binding constraints using the constraints proposed by the weighted average method and/or the cosine Simplex algorithm.Furthermore, we want to predict which method should be used more to form the subset of essential constraints.For statistical analysis, we used IBM SPSS Statistics 28.

Statistical Analysis and Numerical Results
The classification trees for medium and large-scale problems are illustrated in Figure 4 and Figure 5 respectively.For each node, the number of problems and the percent of binding constraints, which is less or more than 95%, are given.The splits occur in order of importance.For example, m2_m_cat was the most significant factor regarding the percentage of binding constraints.The "parent" node is the accuracy percent of binding constraints and contains splits into two "child" nodes, one containing the percentage of the constraints suggested by the cosine similarity method, which is less than 50% (median), and the other containing the percentage, which is more than 50% (median).In medium-scale problems, there are eight termination nodes; in large-scale problems, there are six termination nodes.The classification in Figure 4 and Figure 5 is highlighted in gray.For example, in the medium-scale LP tree-based model that is described in Figure 4, the binding constraints in node 5 are classified in the first category (the percentage is less than 95%).
For both medium and large LP classification models, a suitable proposal for the two subsets of constraints, m 1 and m 2 can lead to more than 95% accuracy in constraint binding.More precisely, the constraints from the weighted average method should be less than half of the original ones to achieve more than 95% accuracy in binding constraints.However, the constraints proposed by the cosine method should be less than or more than half of the original constraints.If the constraints proposed by the weighted average method are more than half of the original constraints, the accuracy of the binding constraints is less than 95%.This consideration is described in Figure 4 and Figure 5.The tree-based classification model calculates a measure of prediction accuracy (risk) to indicate the proportion of cases misclassified by the proposed classification.For medium-sized problems, the risk for resubstitution and cross-validation is 0.275 (standard error = 0.014), whereas for large problems, the risk for substitution is 0.266, and for cross-validation is 0.275 (standard error = 0.014).Table 10 reports these results.For medium-scale problems, the model correctly classified 71.2% of the first class of binding constraints (percentage of correct identification < 95%) and 73.8% of the second class of binding constraints (percentage of correct identification > 95%), while the overall percentage of correct classification was 72.5% (Table 11).For large-scale problems, the model correctly classified  71.2% of the first class of binding constraints (percentage of correct identification < 95%) and 73.8% of the second class of binding constraints (percentage of correct identification > 95%), while the overall percentage of correct classification was 73.4%.These results are summarized in Table 11.Consequently, for medium and large problems, there is a 72.5% and 73.4% chance of achieving accuracy greater than 95% for binding constraints using the characteristics of the problems concerning the two subsets of constraints, respectively.
The tree-based classification model supported the combination of the two methods.According to the classification model, the constraints proposed by the weighted average method should be less than 50% of the original ones, while the constraints proposed by the cosine method should be more than 50% of the original ones to achieve accuracy in solving LP problems.

Conclusions
Researchers tend to include constraints that are not binding to the optimal solution when formulating linear programming problems for fear of excluding necessary constraints.This inclusion does not affect the optimal solutions; nonetheless, it may necessitate more iterations and raise the computing difficulty.Thus, algorithms inevitably need to be developed to eliminate redundancy using necessary constraints and reduce the dimension of the problem under study.
For this purpose, this paper proposes a method for dimension reduction of medium and large-scale linear programming problems using a subset of the original constraints considered essential.This subset includes the binding constraints and forms a sufficient problem equivalent to the original one.It is chosen from two complementary procedures: the weighted average procedure and the cosine procedure.The weighted average procedure ensures that the proposed method uses at least one constraint that forms the feasible region and is potentially binding, while the cosine similarity procedure uses at least one constraint with a high probability of binding.According to the numerical results, the combination of the two algorithms mentioned seems to be promising.The accuracy of identifying significant constraints using this method was tested on a collection of known benchmarks and 2000 random medium and large-scale LP problems.
Statistical analysis showed that neither method should be preferred but should be used in a complementary manner.More specifically, the constraints proposed by the weighted average method should be less than 50% of the initial ones, while the constraints suggested by the cosine method should be more than 50% to achieve accuracy in solving LP problems.The reduction of constraints was calculated up to 44.7% in netlib problems, 64% in medium-scale problems, and 44.5% in large-scale problems, and the iteration reduction was calculated up to 73.7% and 77.3% in medium and large-scale problems, respectively.
Future work includes extending the computational studies to a larger number of tested problems from available benchmark collections and in integer or mixed-integer linear programming and improving the reduction and accuracy level by implementing the algorithms to use an appropriate subset of the original constraints.In addition, the quality of the linear model can be improved by checking if the proposed subsets contain constraints that are not necessary by using specific properties as used in constraint satisfaction problems.The proposed method can be applied to large-scale real-life problems like logistics and warehouse LP-formulated problems, transportation and transshipment problems, scheduling problems, product mix problems, airline operations problems, and problems related to personnel allocation and the public sector.Using the proposed method, we can also have information about the subset of constraints that can be used to solve the problem and consider alternative scenarios in case the optimal solution cannot be obtained for real-life problems for unpredictable reasons.
and 1, 2, , j n =  .The coefficient function of the problem is ( ) T = z x c x .All types of constraints are included in problem formulation since it is not known a priori which constraints are binding, redundant, or excessive.


. The coefficient function of the problem is ( ) T = z x c x .The con- straints in the sufficient problem

Figure 2 .
Figure 2. Second segmentation of the random LP problem.

Figure 3 .
Figure 3.A flow diagram of the weighted average and cosine algorithm (w.a.co).
solve the problem (1) using an LP method -STOP end first if Print p.bind, used, m1, m2, common end while = the constraints of rnd1 set that their rank is 1 r ≥ Select: m2 = the constraints of rnd2 set that their rank is solve the problem (1) using an LP method -STOP end first if Print p.bind, used, m1, m2, common end while

Figure 4 .
Figure 4. Tree-based classification model in medium-scale random LPs.

Figure 5 .
Figure 5. Tree-based classification model in large-scale LPs.

Table 1 .
Binding constraints after the first and the second segmentation in 500 × 50 LPs.

Table 2 .
Statistics after the first and the second segmentation in 500 × 50 LPs.

Table 3 .
Binding constraints after the first and the second segmentation in 750 × 50 LPs.

Table 4 .
Statistics after the first and the second segmentation in 750 × 50 LPs.

Table 5 .
Computational results on a selection of well-known benchmark LPs.

Table 6 .
Statistics on a selection of well-known benchmark LPs.

Table 7 .
Statistics on medium-scale random LPs.

Table 8 .
Constraints m1 and m2 in medium and large-scale random LPs.

Table 9 .
Statistics on large-scale random LPs.

Table 10 .
Classification risk in medium and large-scale random LPs.

Table 11 .
Classification in medium and large-scale random LPs.