A Perspective on Stochastic Search Efficiency via Quasigradient Techniques in Constrained Models

Gilberto P&#233; rez-Lechuga

doi:10.4236/ajor.2025.156010

American Journal of Operations Research > Vol.15 No.6, November 2025

A Perspective on Stochastic Search Efficiency via Quasigradient Techniques in Constrained Models

Gilberto Pérez-Lechuga
División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Ciudad Madero, Tecnológico Nacional de México, Ciudad Madero, Tamaulipas, México.
DOI: 10.4236/ajor.2025.156010 PDF HTML XML 30 Downloads 155 Views

Abstract

This article examines some of the properties of quasi-Fejer sequences when used in quasi-gradiental techniques as an alternative to stochastic search techniques for optimizing unconstrained mathematical programming models. The convergence and efficiency of the method are analyzed, and its potential use as an interior-point algorithm for optimizing integer linear programming models is explored, ensuring the feasibility of the solution at each stage of the search. To achieve this, it is proposed to remain within the feasible region by using small perturbations around the points found until convergence is reached. This alternative is compared with the traditional Branch and Bound method using software programs available for this purpose. The results obtained suggest that the technique, applied to models with few variables, is inefficient but is practical for large-scale models, since simple changes in the components of the located points generate a feasible sequence that almost always converges.

Keywords

Fejer Successions, Integer Programming, Stochastic Convergence Techniques, Random Search

Share and Cite:

Pérez-Lechuga, G. (2025) A Perspective on Stochastic Search Efficiency via Quasigradient Techniques in Constrained Models. American Journal of Operations Research, 15, 195-221. doi: 10.4236/ajor.2025.156010.

1. Introduction

Generally, Random Search Techniques (RSM) are methods used for the optimization of some mathematical programming models using approximation sequences that improve in each iteration of the algorithm based on simple changes. Similarly, Stochastic Quasi-Gradient (SQG) methods are stochastic algorithmic procedures for solving general constrained optimization problems with non-differentiable and non-convex functions. In deterministic models, this technique offers extensive possibilities for implementing alternative heuristics that seek to minimize classical computational complexity in models such as integer linear programming. This class of algorithms defines an exploration sequence similar to sequential adaptive learning and improves decisions based on data and simulations, which are known as Adaptive Monte Carlo Optimization [1].

The computational complexity of an integer linear programming model lies in the integrality restrictions imposed on the model, which force it to explore a discrete region with a number of combinations that grow exponentially as the model increases the number of variables. This means searching for solutions in a non-convex and non-continuous region, which greatly complicates the problem. Hence, they are classified as NP-hard problems, in which the time to convergence also grows exponentially. Methods for solving instances of integer programming models use exact algorithms and/or approximation methods that generally work using simple rules, almost always obtaining an approximate solution. In the first case, exact algorithms obtain exact solutions in reasonably finite times by reducing the search space, called cuts. Unfortunately, these methods work well for small problems, but are of no practical use since real problems require an enormous amount of time (exponential) to achieve convergence. Problems related to manufacturing, logistics, facility location, routing and more require millions of variables, almost always integers, to provide a satisfactory solution to the model [2]-[6]. Heuristics for integer programming are sometimes used with relative success because their approximation is not good, but they solve large-scale problems in reasonable times. Among the most popular are Local Branching, Relaxation-Induced Neighborhood Search and Variants.

An integer programming problem can be presented in several forms. In its purest form, all variables involved in the model must be integers; that is, it is a Pure Integer Linear Programming (PILP) model. In other cases, binary variables are required for yes-no decisions, in which case, the model is called Binary Linear Programming (BLP). Finally, a mixture of integer, continuous, and even binary variables in a single model produces a Mixed Integer Linear Programming (MILP) model. It has been intensively demonstrated that the computational complexity of an integer programming model (in any of its meanings) is classified as NP since the algorithm evolves in pseudo-polynomial time for any number of constraints [7].

Some approaches to the problem are based on the use of evolutionary algorithms that make use of selected nodes of a branching tree whose basic elements are the population, combination, mutation and selection [8] [9]. In some cases, integer optimization is used in novel areas such as bilevel optimization, where some variables are constrained to be the solution of another optimization problem [10]. Recently, learning techniques (supervised learning) have been introduced into this algorithm to improve its critical components [11]. These techniques are being considered as an alternative to solve combinatorial optimization (CO) problems and for the moment, they represent a promising alternative idea to NP-complex problems. In this paper, we revisit a concept that has remained classic: the techniques developed by Ermoliev et al. [12] for optimization problems without the need for continuity or differentiability constraints, and orient them toward optimization models with integrity characteristics.

In the practice of engineering and science, it is frequently required to solve optimization models whose decision variables must be integers. This situation arises when non-fractional quantities are handled in decision-making models, such as the number of vehicles exported by a company, the number of people assigned to a manufacturing operation, or the number of meals to be served in a restaurant during a given period.

This paper analyzes and evaluates the alternative of using a perturbation and bounding technique at feasible points in an integer linear programming model. This allows the method to identify how a multivariate function changes in a specific direction, not just along the coordinate axes. Thus, using this rate of change, it is possible to find a direction of descent, allowing the procedure to remain within the search region while maintaining the feasibility of the solution.

Therefore, using a random search method from the first iteration, it is possible to introduce changes into the model at regular intervals driven by a known probability distribution.

Given the nature of the problem analyzed, the properties of quasi-Fejer sequences are useful to determine the convergence of the technique via the monotonicity properties of such sequences.

The problem is initially approached from the perspective of constructing a convex and bounded set from which discrete candidate points can be generated that can be evaluated to locate a descent direction. The $ρ$ -dimensional sphere is ideal for this purpose. A hypersphere, or more commonly called an $ρ$ -dimensional sphere, is a generalization of the circle (called a 2-sphere) and the usual sphere (called a 3-sphere) to dimensions $ρ \geq 4$ . Therefore, the $ρ$ -sphere is defined as the set of $ρ$ -tuples of points $(x_{1}, x_{2}, \dots, x_{ρ})$ such that $x_{1}^{2} + x_{2}^{2} + \dots + x_{ρ}^{2} = R^{2}$ . In optimization theory, an $ρ$ -dimensional sphere takes on special importance because it defines a perfect convex and compact set.

For the above reasons, we take up the ideas developed in [13] [14] and extend them to the solution of cases of entire problems, we show the underlying theory in the proposal and we illustrate the results with a numerical example.

For this exploration, the following research questions will be answered:

1) Can an alternative metaheuristic be constructed that allows an approximation to the optimal value of an integer linear programming model using techniques?

2) What should be the shape of the descent direction?

3) How can a vector perturbation of them be achieved in order to find a sequence of integer points that satisfactorily converges to the desired optimum?

4) Would it be possible to apply the investigated method to large-scale models commonly required in practical engineering and science applications while minimizing its inherent computational complexity?

To address this problem, the document has been organized as follows. In Section 2, the problem to be addressed is formally presented and its notation is defined. Section 3 illustrates how to apply the alternative studied and compares the results in simple visualization examples. A discussion of the findings is presented in Section 4; finally, Section 5 presents the conclusions of this work.

Below, we describe the sequence of steps used to solve the proposed instance.

2. Statement of the Problem

In mathematical programming, a heuristic is a technique used to approximate solutions to complex problems using simple rules that do not guarantee finding the optimal solution, but are good enough to achieve within a reasonable timeframe [15]. When a heuristic can be implemented as a computational algorithm, then it is called a metaheuristic [16].

In this part of the document, we are interested in analyzing an alternative method (heuristic) to optimize models of the following type

$Minimize g (X) = c_{1} x_{1} + c_{2} x_{2} + \dots + c_{n} x_{n} = \sum_{i = 1}^{n} c_{i} x_{i}$ (1)

where

$g (X)$ is the objective function or utility function of the problem.
$c_{i}$ , $i = 1, 2, \dots, n$ define the cost coefficients of the utility function.
$x_{i} \in ℤ^{+} \cup {0}$ , $i = 1, 2, \dots, n$ are the decision variables (positive integers) of the problem

The restrictive set is defined by the matrix system given by

$(\begin{matrix} a_{11} & a_{12} & \dots & a_{1 n} \\ a_{21} & a_{22} & \dots & a_{2 n} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ a_{m 1} & a_{m 2} & \dots & a_{m n} \end{matrix}) (\begin{matrix} x_{1} \\ x_{2} \\ ⋮ \\ x_{n} \end{matrix}) = (\begin{matrix} b_{1} \\ b_{2} \\ ⋮ \\ b_{m} \end{matrix})$ (2)

where $a_{i j}$ are called the technological coefficients and $b_{i}$ are the available resources. For simplicity, in the remainder of this document, the problem (PILP) will be represented in the following equivalent form

$Minimice g (X) = {C X^{t} | A X = b, X \in D \subset ℤ^{+} \cup {0}} .$ (3)

here, $D$ is the feasible region where all the restrictions are met. $A$ is a matrix of size $m \times n$ , and $b$ is a vector $m \times 1$ . An overview of quasi-Fejer sequences is provided below.

A quasi-Fejer sequence is a sequence in a finite-dimensional Hilbert space that satisfies a Fejer monotonicity property, plus an additional error. Such sequences approach an accumulation point by dragging along an error that decreases with time. Formally, a sequence ${X_{k}}$ in a Hilbert space is Fejer monotone with respect to $D$ , if for each $X \in D$ [17].

${‖ X_{k + 1} - X_{k} ‖}^{2} \leq {‖ X_{k} - X ‖}^{2} + ϵ_{k}$

where $\sum ϵ_{k} < \infty$ .

Similarly, the concept of a quasi-gradient will be formally addressed as a statistical estimate of a true gradient. Thus, when searching within $D$ , an estimated approximation toward a downward direction in the first two search values will be used. The search is the updated based on the newly acquired information.

2.1. The Concept of Quasi-Gradient and Its Construction

Let $g (X)$ be a convex function not necessarily differentiable. The subgradient vector at the point $X = (x_{1}, \dots, x_{n})$ is any $\tilde{\nabla} g (X)$ vector that satisfies the inequality

$g (Y) - g (X) \geq 〈 \tilde{\nabla} g (X), Y - X 〉$ (4)

For any arbitrary, $Y \in ℝ^{n}$ . The vector $\tilde{\nabla} g (X)$ forms a right angle with the normal to the supporting hyperplane of the set ${Y | g (Y) < g (X)}$ , so if $g (X)$ is differentiable, then $\tilde{\nabla} g (X)$ coincides with the gradient of $g$ in $X$ , $\nabla g (X)$ . Analogously, if $f (X)$ is convex, then $g$ is a quasi-gradientin $X$ if

$g^{T} (Y - X) \geq 0 \Rightarrow f (Y) \geq f (X)$

Geometrically, $g$ defines a supporting hyperplane to the sublevel set ${x | f (X) \leq f (X_{0})}$ . In this case, the set of quasigradients at $X_{0}$ forms a cone [18].

Random search techniques work from a sequence of random variables $X$ defined on $D$ that force it towards a limit point $X^{*}$ . The randomness of the search consists of proposing an estimator of the subgradient vector that serves to obtain a direction of descent. Such estimator can be constructed using the Monte Carlo method, where the most important thing is to demonstrate that the proposed subgradient estimator is an expression of the type

$Ξ (X) = E [\nabla g (X)] = c_{k} \tilde{\nabla} g (X_{k}) + Θ_{k}, k = 0, 1 \dots$ (5)

where $c_{k}$ is a non-negative number and $Θ_{k}$ is a vector dimensionally compatible with the subgradient $\tilde{\nabla} g (x_{k})$ .

In this analysis, it is possible to use a variant of the technique to build at least an initial iteration to start the descent sequence, approaching the optimal point $X^{*}$ step by step in such a way that in the $k$ -th iteration, the point $X_{k}$ is known and therefore, the next point $X_{k + 1}$ will be achieved through the classic approximation given by

$X_{k + 1} = X_{k} + α_{k} \tilde{\nabla} g (X_{k}), α > 0, X_{k} \in ℤ^{+}, k = 1, 2, \dots$ (6)

Because of the way in which these types of algorithms approach a solution, the criterion for stopping the search here is based on convergence in probability, that is,

$lim_{n \to \infty} P [‖ X_{k + 1} - X_{k} ‖ \geq ϵ] = 0, \forall ϵ > 0.$ (7)

This possibility is explored below.

2.2. The Mathematical Procedure

Consider a vector $θ = (θ_{1}, \dots, θ_{n})$ whose components are independent and uniformly distributed random variables in $[- c_{1}, c_{2}]$ . In this paper, a discrete uniform distribution is used to obtain a finite set of integer values uniformly distributed over the surface of the hypersphere.

Since there is no a priori information about the probability density that defines the search region, we will consider that the movement can occur in any direction with the same probability. Suppose that in iteration $k$ , $ρ_{k}$ samples of size $s$ are available

$θ_{k_{1}}^{T} = (θ_{k_{1}}^{1}, \dots, θ_{k_{1}}^{s})$

$⋮$

$θ_{k_{i}}^{T} = (θ_{k_{i}}^{1}, \dots, θ_{k_{i}}^{s})$

$θ_{k_{ρ_{k}}}^{T} = (θ_{k_{ρ_{k}}}^{1}, \dots, θ_{k_{ρ_{k}}}^{s})$

If $X_{k}$ is a vector given at iteration $k$ with $Δ_{k} > 0$ , then an estimator $ξ (X_{k})$ of the subgradient of $g$ in $X_{k}$ can be written as

$\begin{matrix} ξ (X_{k}) = \sum_{i = 1}^{ρ_{k}} \frac{g (X_{k} + Δ_{k} θ_{k_{i}}^{T}) - g (X_{k})}{Δ_{k}} θ_{k_{i}} \\ = \frac{g (X_{k} + Δ_{k} θ_{k_{1}}^{T}) - g (X_{k})}{Δ_{k}} θ_{k_{1}} + \dots + \frac{g (X_{k} + Δ_{k} θ_{k_{ρ_{k}}}^{T}) - g (X_{k})}{Δ_{k}} θ_{k_{ρ_{k}}}, \end{matrix}$ (8)

thus, assuming convexity in $g$ , we have that

$\frac{g (X_{k} + Δ_{k} θ_{k_{i}}^{T} - g (X_{k}))}{Δ_{k}} θ_{k_{i}} \geq 〈 \tilde{\nabla} g (X_{k}), θ_{k_{i}}^{T} 〉 θ_{k_{i}} .$

So, applying the mathematical expectation operator, we have to

$\begin{matrix} E {\frac{g (X_{k} + Δ_{k} θ_{k_{i}}^{T} - g (X_{k}))}{Δ_{k}} θ_{k_{i}}} \geq E {〈 \tilde{\nabla} g (X_{k}), θ_{k_{i}}^{T} 〉 θ_{k_{i}} | X_{k}} \\ = E {〈 θ_{k_{i}}^{T}, θ_{k_{i}} 〉 \tilde{\nabla} g (X_{k}) | X_{k}} \\ = \frac{n}{12} {(c_{2} - c_{1})}^{2} \tilde{\nabla} g (X_{k}) \end{matrix}$

Therefore,

$E {ξ (X_{k}) | X_{k}} = \frac{n}{12} {(c_{2} - c_{1})}^{2} \tilde{\nabla} g (X_{k}) + W_{k}$ (9)

From the above, it follows that if, $Y = {(θ_{k_{i}}^{j})}^{2}$ $j = 1, \dots, n$ , then $Y$ has a density given by

$f_{Y} (y) = {\begin{array}{l} \frac{1}{(c_{2} - c_{1}) \sqrt{y}}, & if 0 \leq y \leq \frac{1}{4} \\ 0, & otherwise \end{array}$

and from the previous assumption, it follows that $W_{k}$ is uniformly bounded, i.e., $‖ W_{k} ‖ \leq ϵ$ , $ϵ > 0$ .

To facilitate the analysis process, we now simplify the search space and define the projection operator. Let us define the set as closed and convex $D = {X | a \leq X \leq b}$ . Let $π (X)$ be the projection operator on $D$ ; that is, for any $X \in ℝ^{n}$ , $π (X) \in D$ and

$‖ X - π_{D} (X) ‖ = min_{Y \in D} ‖ X - Y ‖$

Let the random sequence of points $X_{k}$ be defined as

$X_{k + 1} = Π_{D} (X_{k} - α_{k} γ_{k} ξ_{k}), k = 1, \dots,$ (10)

where $X_{0}$ is an arbitrary point for which $E {{‖ X_{0} ‖}^{2}} = cte < \infty$ , $α_{k}$ is the step length, $γ_{k}$ is a normalization factor and $ξ_{k} = (ξ_{k_{1}}, \dots, ξ_{k_{n}})$ is a random vector whose conditional mathematical expectation is given by

$E {ξ_{k} | X_{0}, \dots, X_{k}} = c_{k} \tilde{\nabla} g (X_{k}) + Θ_{k}, k = 1, \dots,$ (11)

here, $c_{k}$ is a non-negative number, $Θ_{k} = (θ_{k_{1}}, \dots, θ_{k_{n}})$ is a vector, $\tilde{\nabla} g (X)$ is a subgradient, that is, the vector $ξ_{k}$ satisfies a relation of the form

$E {ξ (X) | H (X)} = c \tilde{\nabla} g (X) + Θ .$ (12)

Notice that, when $D = ℝ^{n}$ and $π (X) = X$ , Equation (10) can be used to optimize models of the type (1) and the method is called the generalized stochastic quasi-gradient method. The results presented below are based on the iconic work of Ermoliev [19].

Lemma 1 (Convergence). Suppose that the values of $h_{k}$ are known such that $E {{‖ ξ_{k} ‖}^{2} | X_{0}, \dots, X_{k}} \leq h_{k}^{2} \leq M_{B} < \infty$ , $i = 1, \dots, k$ , and also $‖ X_{k} ‖ \leq B < \infty$ , $i = 1, \dots, k$ . Let be the normalization factor $γ_{k}$ that satisfies the equation

$0 \leq γ_{k} (τ_{k} ‖ X_{k} ‖ + h_{k}) < \infty,$

where $h_{k} (X_{0}, \dots, X_{k})$ , $τ_{k} = 1$ if $‖ Θ_{k} ‖ > 0$ , and $τ_{k} = 0$ , if $‖ Θ_{k} ‖ = 0$ . Let the quantities

$α_{k} \geq 0, c_{k} \geq 0, \sum_{k = 0}^{\infty} α_{k} r_{k} < \infty, \sum_{k = 0}^{\infty} α_{k}^{2} r_{k} < \infty,$ (13)

then, the sequence of points defined by Equation (10) is a quasi-Fejer sequence with respect to the set $D$ . Even more, if it satisfied

$\sum_{k = 0}^{\infty} α_{k} l_{k} = \infty$ (14)

then, the sequence ${X_{k}}$ converges globally to the solution of the problem

${Minimize}_{X} E_{W} {ψ (X, w) | X \in D} .$ (15)

Proof. Let $X^{*}$ be an arbitrary solution to the problem (15), we have to

$\begin{matrix} {‖ X^{*} - X_{k + 1} ‖}^{2} = {‖ X^{*} - Π_{D} (X_{k} - α_{k} γ_{k} ξ_{k}) ‖}^{2} \\ \leq {‖ X^{*} - X_{k} + α_{k} γ_{k} ξ_{k} ‖}^{2} \\ = {‖ X^{*} - X_{k} ‖}^{2} + 2 α_{k} γ_{k} 〈 ξ_{k}, X^{*} - X_{k} 〉 + α_{k}^{2} γ_{k}^{2} {‖ ξ_{k} ‖}^{2} . \end{matrix}$

Taking the mathematical expectation on both sides of the equality, we have

$\begin{array}{l} E {{‖ X^{*} - X_{k + 1} ‖}^{2} | X_{0}, \dots, X_{k}} \\ \leq {‖ X^{*} - X_{k} ‖}^{2} + 2 α_{k} γ_{k} c_{k} 〈 \tilde{\nabla} g (X_{k}), X^{*} - X_{k} 〉 + 2 α_{k} γ_{k} 〈 Θ_{k}, X^{*} - X_{k} 〉 \\ + α_{k}^{2} γ_{k}^{2} E {{‖ ξ_{k} ‖}^{2} | X_{0}, \dots, X_{k}}, \end{array}$

where

$\begin{matrix} E {‖ \bar{X} ‖} = E {‖ {({\bar{X}}_{1} (ω), \dots, {\bar{X}}_{n} (ω))}^{T} ‖} \\ = \int_{Ω} \sqrt{{\bar{X}}_{1}^{2} (ω), \dots, {\bar{X}}_{n}^{2} (ω)} P (d ω) . \end{matrix}$

here, Ω is sample space corresponding to the probability space $(Ω, F, P)$ . Applying the Cauchy-Schwarz inequality and considering that $g (X^{*} - g (X_{k})) \leq 0$ , we have to

$\begin{array}{l} E {{‖ X^{*} - X_{k + 1} ‖}^{2} | X_{0}, \dots, X_{k}} \\ \leq {‖ X^{*} - X_{k} ‖}^{2} + 2 α_{k} γ_{k} c_{k} [g (X^{*}) - g (X_{k})] + 2 α_{k} γ_{k} ‖ Θ_{k} ‖ ‖ X^{*} - X_{k} ‖ \\ + α^{2} γ^{2} E {{‖ ξ_{k} ‖}^{2} | X_{0}, \dots, X_{k}} \\ \leq {‖ X^{*} - X_{k} ‖}^{2} + 2 α_{k} γ_{k} ‖ Θ_{k} ‖ [‖ X^{*} ‖ + ‖ X^{*} ‖] + α^{2} γ^{2} E {{‖ ξ_{k} ‖}^{2} | X_{0}, \dots, X_{k}} \\ \leq {‖ X^{*} - X_{k} ‖}^{2} + 2 α_{k} γ_{k} [γ^{*} ‖ X^{*} ‖ + γ_{k} ‖ X_{k} ‖] + α_{k}^{2} γ_{k}^{2} M_{B} \\ \leq {‖ X^{*} - X_{k} ‖}^{2} + 2 α_{k} γ_{k} [γ^{*} ‖ X^{*} ‖ + γ^{*} B] + α^{2} γ^{*} M_{B} . \end{array}$

The inequalities found and the conditions defined in Equation (13) prove the first part of the theorem. Now, it will be proven that if the conditions of Equation (14) are met, then one of the limit points of the succession ${X_{s} (ω)}$ , for almost all, $ω$ belongs to the set of solutions to the problem (1). Applying mathematical expectation again, we have to

$\begin{matrix} E {{‖ X^{*} - X_{k + 1} ‖}^{2}} \leq {‖ X^{*} - X_{k} ‖}^{2} + 2 \sum_{k = 0}^{s} α_{k} l_{k} E {γ_{k} 〈 \tilde{\nabla} g (X_{k}), X^{*} - X_{k} 〉} \\ + 2 [γ_{k}^{*} ‖ X^{*} ‖ + γ^{*} B] \sum_{k = 0}^{s} α_{k} r_{k} + \sum_{k = 0}^{s} α_{k}^{2} γ^{*} M_{B} \end{matrix}$ (16)

From (16), it follows that $E {{‖ X^{*} - X_{k + 1} ‖}^{2}}$ is uniformly bounded and

$\sum_{k = 0}^{\infty} α_{k} l_{k} E {γ_{k}, 〈 \tilde{\nabla} g (X_{k}), X^{*} - X_{k} 〉} \geq - \infty .$

Since that $\sum_{k = 0}^{\infty} α_{k} l_{k} = \infty$ , we have to $E {γ_{k} 〈 \tilde{\nabla} g (X_{k}), X^{*} - X_{k} 〉} \to 0$ when $k \to \infty$ . Note also that there exists a subsequence ${s_{t}}, t = 0, 1, \dots$ for which

$γ_{s t} (ω) 〈 \tilde{\nabla} g (X_{s t} (ω)), X^{*} - X_{s t} (ω) 〉 \to 0.$

with probability one, according to $t \to 1$ . It is concluded then that, for almost all $ω$ , the sequence ${X_{k} (ω)}$ is bounded, that is, for almost all $ω$ $〈 \tilde{\nabla} g (X_{k_{t}} (ω)), X - X_{k_{t}} (ω) 〉 \to 0$ . Then, as $t \to \infty$ , the sequence $X_{k_{t}} (ω)$ converges to the solution of the problem (15), the theorem is proven. ☐

2.3. Generating a Descent Trajectory

A first approach to what could be a search method using the technique is to use a variant of Equation (11) by constructing a dome around the point $X_{k}$ ; that is, create a hypersphere of dimension $ρ$ centered on it and generate a sample of points uniformly distributed on the surface of the hypersphere of unit radius and use the distance between the center and the surface of the hypersphere as the difference between $X_{k}$ and $X_{k + 1}$ to obtain an approximation to Equation (6). The formal ideas are discussed below.

Let’s consider a hypersphere of dimension $ρ$ in which we will obtain a number of points uniformly distributed over its surface (Figure 1).

Figure 1. $ρ$ discrete points uniformly distributed over the surface of a hypersphere in $ℝ^{ρ}$ .

Let $X = (X_{1}, X_{2}, \dots, X_{ρ})$ be a random vector where $X_{1}, X_{2}, \dots, X_{ρ}$ are independent and continuous random variables defined over a probability space $(Ω, F, S)$ with joint density function given by

$f_{X_{1}, X_{2}, \dots, X_{ρ}} (x_{1}, x_{2}, \dots, x_{ρ}) = \prod_{i = 1}^{ρ} f_{i} (x_{i}),$ (17)

By the continuity hypothesis of $X$ , $F_{X_{1}, X_{2}, \dots, X_{ρ}} (x_{1}, x_{2}, \dots, x_{ρ})$ is a non-decreasing function and therefore, for each $X_{i}$ there is the inverse function $ξ_{i} = F_{X_{i}}^{- 1} (U)$ defined for any value of $U \in U ~ [0, 1]$ such that

$ξ_{i} = F_{X_{i}}^{- 1} (U) = inf {x : F_{X_{i}} (x) \geq U},$ (18)

The generation of a random vector $Y \in ℝ^{ρ}$ on the surface of a unit hypersphere in the same dimension is given by Algorithm 1 shown below [20].

Thus, given an initial value $X_{k}$ (which is the center of the unitary hypersphere), generate $ρ$ points uniformly distributed on its surface and apply the following criterion to select the consecutive value (applicable to the minimization case).

Algorithm 1. Algorithm for generating vectors on an $ρ$ -dimensional unit hypersphere.

Figure 2. Trajectory search method using a unit hypersphere.

This is, the new value $X_{k + 1}$ becomes the center of the new hypersphere and the process is repeated (Figure 2). The following sequencing process is then generated

$X_{k + 1} \leftarrow X_{k} if [g (X_{k + 1}) < g (X_{k})], k = 1, \dots,$ (19)

Otherwise, the value of $X_{k}$ is retained.

It can be shown that the acceptance-rejection method to generate the corresponding lotteries is highly inefficient. Its effectiveness is given by

$η (ρ) = \frac{Volume of the hypersphere}{Volume of the hypercube} = \frac{1}{ρ 2^{ρ - 1}} \frac{π^{ρ / 2}}{Γ (ρ / 2)}$

where $Γ (\cdot)$ is the gamma function. It is easy to verify that for $ρ > 4$ , the algorithm becomes inefficient and therefore impractical for large-scale problems (Table 1). For this reason, it is of practical use to locate at most $ρ = 4$ points uniformly distributed on the surface of the hypersphere.

Table 1. Efficiency of the algorithm as a function of the number of points required.

$ρ$	1	2	3	4	5	6	7	8
$η (ρ)$	1.0000	0.7854	0.5236	0.3084	0.1645	0.0807	0.0369	0.0159

Thus, it will be perturbing just some of the components $x_{k}$ of $X_{k}$ in the following manner

${\hat{x}}_{k} = x_{k} \pm β_{k}, for some x_{k} \in X_{k}, β_{k} ~ ⌊ U [- 1, 2] ⌋, k = 0, 1, \dots$ (20)

where $⌊ r ⌋$ means the largest integer less than or equal to $r$ , and the draw of the random variable $β_{k}$ provides the lotteries of the integer values −1, 0, 1, expanding the search in the neighborhood given by Equation (20). Therefore, we now have the perturbed sequence given by

${\hat{X}}_{k + 1} = ⌊ {\hat{X}}_{k} - α_{k} \tilde{\nabla} g ({\hat{X}}_{k}) ⌋, k = 0, 1, \dots$ (21)

with $X_{0} \in D$ known, ${\hat{X}}_{k}$ is the new perturbed vector containing one or more perturbed components ${\hat{x}}_{k}$ . This means that the search should be focused on the direction where $g (X_{k})$ changes value as quickly as possible. The guideline for selecting the appropriate component is to perturb the value of $x_{k}$ satisfying the requirements of Equation (19) and using ${\hat{X}}_{k}$ instead of $X_{k}$ .

Algorithm 2. Pseudocode associated with the proposal.

In this, $α_{k}$ satisfies the following conditions:

$α_{k} = \frac{γ_{k}}{{‖ g (X_{k}) ‖}^{2}}, or equivalently {‖ X_{k + 1} - X_{k} ‖}^{2} = γ_{k}$ (22)

$α_{k} \geq 0, \sum_{k = 1}^{\infty} α_{k}^{2} < \infty, \sum_{k = 1}^{\infty} α_{k} = \infty .$ (23)

and the quasi-gradient estimator is given by

$\tilde{\nabla} g (X_{k}) = \frac{g ({\hat{X}}_{k + 1}) - g ({\hat{X}}_{k})}{{‖ ({\hat{X}}_{k + 1}) - ({\hat{X}}_{k}) ‖}^{2}}$ (24)

Equations (22) and (23) show the conditions that must be imposed on the components of the estimator in order to achieve their concurrence at a minimum value. Such values are necessary in the development of a Fejer sequence to guarantee the monotonicity of its convergence and have been widely demonstrated in [21] and Theorem 1 of this document.

Algorithm 2 and the pseudocode shown in Figure 3 illustrate the steps followed in this process.

Once this outline is complete, we now proceed to test the proposal in the next section.

3. Numerical Results

To illustrate the use of the algorithm, the corresponding pseudocode and the algorithm associated with the proposal are shown below in Figure 3.

Figure 3. Pseudocode associated with the proposal.

The use of the proposal is illustrated below with a numerical example.

- Model 1

$\begin{matrix} Minimize g (X) = 2 x_{1} - x_{2} - 6 x_{3} + 8 x_{4} + 15 x_{5} + 17 x_{6} - 21 x_{7} + 14 x_{8} + 16 x_{9} \\ + 6 x_{10} - 6 x_{11} + 2 x_{12} + 7 x_{13} + 16 x_{14} + 11 x_{15} - 5 x_{16} - 3 x_{17} \\ - 2 x_{18} - 9 x_{19} + 16 x_{20} + 11 x_{21} + 21 x_{22} + 14 x_{23} - 5 x_{24} + 5 x_{25} \\ + 4 x_{26} + 2 x_{27} + 8 x_{28} - 7 x_{29} + 3 x_{30} \end{matrix}$

Subject to:

$\begin{array}{l} 12 x_{2} + 6 x_{3} - 8 x_{4} + x_{8} + 11 x_{9} + 4 x_{10} + 7 x_{11} - 4 x_{12} + x_{13} + 5 x_{15} \\ - 4 x_{17} + 2 x_{21} + x_{22} - 2 x_{23} + 4 x_{24} + 6 x_{26} - x_{28} + 8 x_{29} + 3 x_{30} \geq 750 \\ x_{1} - x_{2} + 4 x_{3} + 7 x_{4} + 2 x_{5} + x_{6} - 8 x_{7} + 11 x_{14} + 9 x_{16} + 11 x_{18} \\ + 21 x_{22} - x_{25} + 8 x_{27} + x_{29} \leq 1200 \\ x_{1} + x_{2} + x_{3} + x_{4} + x_{5} + x_{6} + x_{7} + x_{8} + x_{9} + x_{10} + x_{11} + x_{12} \\ + x_{13} + x_{14} + x_{15} + x_{16} + x_{17} + x_{18} + x_{19} + x_{20} - x_{21} - 4 x_{22} \\ - 9 x_{23} - 11 x_{24} - x_{25} - x_{26} - 2 x_{27} - 8 x_{28} + x_{29} + 2 x_{30} \leq 0 \\ x_{2} + x_{5} + x_{8} + x_{11} + x_{14} + x_{17} + x_{20} - x_{23} - x_{26} - x_{29} \geq - 1 \\ 4 x_{4} + 8 x_{8} + 12 x_{12} + 16 x_{16} + 20 x_{20} + 24 x_{24} + 28 x_{28} - 30 x_{30} \leq 1350 \\ x_{1} \leq 4; x_{2} \leq 6; x_{3} \leq 8; x_{4} \leq 10; x_{5} \leq 14; x_{6} \leq 18; x_{7} \leq 22; x_{8} \leq 26 \\ x_{9} \leq 20; x_{10} \leq 12; x_{11} \leq 16; x_{13} \leq 18; x_{14} \leq 16; x_{15} \leq 18; x_{16} \leq 20 \\ x_{17} \leq 22; x_{18} \leq 26; x_{19} \leq 24; x_{20} \leq 24 \\ x_{i j} \in ℤ^{+} \forall i, j \end{array}$

The exact solution to this instance is shown in Table 2. This is reached after 63 iterations using a standard scientific method of Integer Linear Programming (ILP) via LINGO [22], with $g (X^{*}) = - 1816$ .

Table 2. Exact solution of the proposed instance.

$x_{1}$	$x_{2}$	$x_{3}$	$x_{4}$	$x_{5}$	$x_{6}$	$x_{7}$	$x_{8}$	$x_{9}$	$x_{10}$
2	6	8	0	0	0	22	0	0	0
$x_{11}$	$x_{12}$	$x_{13}$	$x_{14}$	$x_{15}$	$x_{16}$	$x_{17}$	$x_{18}$	$x_{19}$	$x_{20}$
16	0	0	0	0	20	22	26	24	0
$x_{21}$	$x_{22}$	$x_{23}$	$x_{24}$	$x_{25}$	$x_{25}$	$x_{27}$	$x_{28}$	$x_{29}$	$x_{30}$
0	0	0	136	0	0	0	0	45	75

To illustrate the use of this algorithm, the initial steps of the algorithm applied to the previous example are indicated below (see Table A1: Initial solutions).

1) Let $X_{A}$ and $X_{B} \in D$ be defined as follows, with $g (X_{A}) = 2069$ and $g (X_{B}) = 2022$ . Thus, for $ϵ = 0.01$ and $k = 0$ , we have

$\begin{matrix} X_{A} = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \\ 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30 \\ X_{B} = 1,2,3,4,5,6,6,7,8,10,11,11,13,14,15 \\ 15,17,17,18,19,21,22,22,24,24,25,26,27,29,29 \end{matrix}$

2) From the above, it is verified that $γ_{k} = {‖ X_{b} - X_{a} ‖}_{2} = 3.7417$ .

3) Similarly, $\tilde{\nabla} (X_{0}) \approx \frac{2069 - 2022}{3.7417} = 12.5611$ .

4) It is also verified that $min {g (X_{A}), g (X_{B})} = 2069$ , therefore $X_{1} = X_{A}$ , and

$α_{k} = \frac{γ_{k}}{{‖ X_{1} ‖}_{2}} = \frac{3.7417}{97.2368} = 0.0384$

5) Therefore, for $k = 2$ , we obtain $X_{2} \in D$ given by

$X_{2} = ⌊ X_{1} - 0.0384 (12.5611) ⌋, with g (X_{2}) = 1936$

6) The incremental value obtained is $ϵ = {‖ X_{2} - X_{1} ‖}_{2} = 4$ . Because $ϵ > 0.01$ , the specified the process continues.

7) What if $X_{k} \notin D$ ? Proceed to the bounding phase as follows: get the random variable $β_{k} = 3 U - 1$ such that $U ~ U (0, 1)$ and modify $X_{k}$ as shown below

${\hat{X}}_{k} = ⌊ β_{k} * X_{k} ⌋$

until ${\hat{X}}_{k} \in D$ . Replace ${\hat{X}}_{k}$ by $X_{k + 1}$ and continue with the process.

The quantity $(3 U - 1)$ allows us to recognize the neighborhood around $X_{k}$ and to carefully advance in the region, avoiding falling into points outside the boundary of $D$ . Although slow, this procedure allows a safe advance towards an approximate convergence. When the sequence approaches the boundary of $D$ , it is highly probable to generate sequences of infeasible points, having to further reduce the size of the search. Finally, an experimental strategy found suggests perturbing only some components of $X_{k}$ as a directional derivative. The way of choosing the components to be perturbed obeys the criterion ${\hat{x}}_{s} = ⌊ β_{k} * x_{s} ⌋$ , such that $g (⌊ {\hat{X}}_{k} ⌋) < g (⌊ X_{k} ⌋)$ .

Clearly, if the sequence ${α_{k}}_{k}$ is such that ${α_{k}}_{k} < 1$ , then it satisfies the conditions imposed in Equations (22) and (23).

The rest of the solutions and the convergence to the optimal solution are shown in Table A1 of Appendix.

The search behavior and its convergence are shown in Figure 4. It shows how quickly the algorithm progresses in its first attempts to locate solutions better than the original. However, as the algorithm progresses, the search slows, and convergence encounters increasing difficulties in locating a new transfer point. This is because both the feasibility and convergence conditions must be met. In particular, the algorithm slows down when approaching points located on or near the boundary of $D$ , since neighborhoods are often located in infeasible zones. However, once the subsequent points enter $D$ , the algorithm appears to advance more quickly. This also explains why it is slow when approaching the optimal value, since, as is known, it lies in a corner of the simplex formed by its constraints.

Using the described technique, the convergence of the method required 116 iterations. Figure 5 shows the graph of the norm ${‖ {\hat{X}}_{k + 1} - {\hat{X}}_{k} ‖}^{2}$ recorded during the iterations of the algorithm.

Finally, Figure 6 shows the behavior of the $α (k)$ value throughout the evolution of the search.

Figure 4. Required iterations and speed of convergence of the objective function.

Figure 5. Difference in norms during the convergence process.

Figure 6. Values of the parameter $α_{k}$ as a function of the number of iterations.

4. Discussion and Statistical Analysis of the Results

The results found suggest that the proposed method is feasible and constitutes an alternative for large-scale integer models. However, there is still considerable work to be done in the form of selection and creation of the ${\hat{x}}_{k}$ sequences since a direct comparison between $g ({\hat{X}}_{k + 1}) - g ({\hat{X}}_{k})$ is inefficient. So far, ensuring that ${\hat{X}}_{k} \in D$ is only achieved through a feasibility test. Therefore, another line of research in this regard consists of developing a method in which the ${{\hat{X}}_{k}}$ sequence remains feasible at all times.

A simple way to evaluate the efficiency of the method is to determine the mathematical expectation of the differences in increments in a single step $Δ_{k} = g (X_{k + 1}) - g (X_{k})$ and obtain the estimator

$η_{k} = - \frac{E {Δ_{k}}}{E {N_{k}}},$ (25)

where $N_{K}$ is the number of points that must be evaluated to locate a descent direction at iteration $k$ . The approach of the research would be oriented to maximize the value of $η_{k}$ .

An approximation to the efficiency function is as follows. Note that $X_{k + 1} = X_{k} + Δ X_{k}$ , therefore

$g (X_{k + 1}) = g (X_{k}) + Δ X_{k}$

Then, by the convexity of $g$ , we have that

$\begin{matrix} g (X_{k + 1}) = g (X_{k}) + Δ X_{k} \\ = g (X_{k}) + 〈 Δ X_{k}, \tilde{\nabla} g (X_{k}) 〉 + δ (Δ X_{k}) \\ = g (X_{k}) + ‖ Δ X_{k} ‖ ‖ \tilde{\nabla} g (X_{k}) ‖ cos (φ) + δ (Δ X_{k}) \end{matrix}$ (26)

where $cos (φ)$ is the cosine of the angle formed by the unit sphere by the vectors $Δ X_{k}$ and $\tilde{\nabla} g (X_{k})$ . These vectors are unitary and start from the center of the sphere, generating points on its surface. Similarly, the function $δ (Δ X_{k})$ is such that $δ (Δ X_{k}) \to 0$ when $(Δ X_{k}) \to 0$ .

Then

$Δ X_{k} = X_{k + 1} - X_{k} = - α \tilde{\nabla} g (X_{k})$

Thus, by conveniently substituting in Equation (26), we have that

$g (X_{k + 1}) = g (X_{k}) + α {‖ \tilde{\nabla} g (X_{k}) ‖}_{2} cos (φ) - δ (α \tilde{\nabla} g (X_{k})) .$

where for a small $Δ X_{k}$ , we have that

$Δ g_{k} = g (X_{k + 1}) - g (X_{k}) = α {‖ \tilde{\nabla} g (X_{k}) ‖}_{2} cos (φ) .$

Thus, by Equation (25), it is concluded that

$\begin{matrix} η = - \frac{E {Δ g_{k}}}{E {N_{k}}} = - \frac{E {α {‖ \tilde{\nabla} g (X_{k}) ‖}_{2} \cos (φ)}}{E {m}} \\ = - \frac{α}{m} {‖ \tilde{\nabla} g (X_{k}) ‖}_{2} E [\cos (φ)], \end{matrix}$ (27)

where $m$ is the number of points evaluated before finding a descent direction, and $- π / 2 \leq φ \leq π / 2$ . It can be shown that the density of the random angle $φ$ for an n-dimensional hypersphere is given by [23].

$ζ_{n} (φ) = \frac{\sin^{n - 2} (φ)}{\int_{0}^{π} \sin^{n - 2} (φ) d φ} = B_{n} \sin^{n - 2} (φ)$

where

$B_{n} = \frac{Γ (n / 2)}{\sqrt{π} Γ [(n + 1) / 2]},$

and $Γ (\cdot)$ is the gamma function. Thus:

$\begin{matrix} η = - \frac{α}{m} {‖ \tilde{\nabla} g (X_{k}) ‖}_{2} \int_{0}^{π / 2} \cos (φ) \sin^{n - 2} (φ) d φ \\ = \frac{2 α B_{n}}{m (n - 1)} {‖ \tilde{\nabla} g (X_{k}) ‖}_{2} . \end{matrix}$

The $η$ function decreases rapidly for large values of $n$ , remaining insensitive to changes in $m$ but is highly influenced by the value of alpha (Figure 7).

Figure 7. Speed of decline of the efficiency function as a function of n.

Another promising line is to find new ways to construct the subgradients that define the search direction of the heuristic. Here, the technique’s competitive advantage lies in the convexity of $g (X)$ . Experimenting with various subgradients and step sizes is also an option to improve the method’s efficiency.

There are extensive studies on the efficiency of search methods that point to the magnitude of the complexity using this approach [24]; however, this framework constitutes a fascinating alternative for study due to the almost unpredictable nature of the method.

Comparative Aspects of the Alternative Method

Comparing two or more algorithms to evaluate their efficiency is an extensive task that involves several performance-related criteria. Below is an empirical analysis on the efficiency of the proposal and its comparison with other alternatives, the analysis is based on the suggestions given in [25].

In general, given two algorithms, AL1 and AL2, AL1 is said to be more efficient than AL2 if the following relationship holds.

$\frac{η_{A L 1}}{η_{A L 2}} > 1$

Likewise, the variance of the magnitude $Δ g_{k}$ constitutes another criterion for metric algorithm efficiency.

In the proposal presented in this document, the heuristic requires two initial feasible solutions to trigger the first point of the process using a estimator. The process then progresses by perturbing some of the components of the points already evaluated, retaining the best values of the objective function and eliminating those that do not contribute to minimizing it.

This approach has the advantage of providing great numerical stability, allowing the objective function value to be reduced via feasible solutions at each iteration.

Regarding solution quality, the results suggest a good approximation at the beginning of the search, with the decline slowing as the method approaches the optimal solution. For practical purposes, its implementation is relatively simple because once the first two values of the method are obtained and the first point is reached by approximation, the rest of the process consists of perturbing the components of the last point using a random search process. This is the part of the method that takes the most time because, according to the graph, after four points on the sphere, the efficiency decreases significantly.

Table 3 shows the results obtained when comparing our heuristic (which we present as OH) with LINGO, AMPL, and GAMS. The following comparative models (including Model 1, described above) were used to perform these runs [26] [27]. The presented mathematical models were coded in the LINGO optimization software (which uses B&B as the default solver) and run on an Apple computer with an M2-pro chip, 16 GHb of memory, and macOS Sequoia.

- Model 2: A model for locating warehouses in a logistics system with fixed costs, 3 warehouses, 3 consumption centers with the following data:

1) Capacity: 300, 525, 325.

2) Demand: 100, 200, 125, 225.

3) Variable costs:

10	5	12	3
4	9	15	6
15	8	6	11

4) Fixed costs: 125,000, 185,000, 100,000.

- Model 3: A multimodal transport model with 3 origins, 4 destinations and 2 modes of transport.

Table 3. Comparisons of the exact method versus the heuristic.

Model	Running using LINGO					Running using OH
Model	NV	NC	RI	OG	CPU	NV	NV	RI	OG	CPU
1	30	26	63	−1816	0.09	30	26	122	−1816	0.45
2	24	37	28	289,100	0.05	24	37	75	289,000	0.42
3	288	55	9	3425	0.05	24	37	38	3400	0.40
4	3	3	20	0.9808	0.34	3	3	32	0.9800	1.03
5	18	13	242	324,760	0.07	18	13	325	32450	1.27
6	65	72	147	5732	1.03	65	72	290	5730	1.63
Model	Running using AMPL					Running using GAMS
Model	NV	NC	RI	OG	CPU	NV	NV	RI	OG	CPU
1	30	26	80	−1816	0.11	30	26	80	−1816	0.24
2	24	37	32	289,100	0.09	24	37	33	289,000	0.14
3	288	55	14	3425	0.06	288	55	15	3400	0.11
4	3	3	24	0.9808	0.38	3	3	24	0.9800	0.39
5	18	13	248	324,760	1.01	18	13	250	324,760	1.00
6	65	72	152	5732	1.07	65	72	152	5732	1.08

Where NV denotes the number of variables involved, NC is the number of model constraints, RI represents the number of iterations required for convergence, OG is the global optimal value, CPU is the time in seconds required by the computer.

1) Capacity: 200, 150, 300.

2) Demand: 100, 200, 125, 225.

3) Variable costs:

10.8	5.4	12.14	3.4
4.6	9.8	15.12	6.5
15.17	8.9	6.9	11.8

- Model 4: A reliability nonlinear model

$Maximize R = (1 - {(1 - 0.65)}^{4 d + 1}) (1 - {(1 - 0.55)}^{3 + d_{2}}) (1 - {(1 - 0.70)}^{4 + d_{3}})$

Subject to:

$16 d_{1} + 12 d_{2} + 13 d_{3} \leq 75, 2 \leq d_{1} \leq 3, 2 \leq d_{2} \leq 2, 2 \leq d_{3} \leq 4$

$d_{1}, d_{2}, d_{3} \in Z^{+}$

- Model 5: A production planning model with multiple processes and multiple products [28].

$Minimize \sum_{t = 1}^{T} \sum_{i = 1}^{N} \sum_{j = 1}^{m_{j}} (C_{i j t}^{p} P_{i j t} + C_{i t}^{I} I_{i t})$

Subject to

$\sum_{i = 1}^{N} \sum_{j = 1}^{m_{j}} a_{i j k} P_{i j t} \leq A_{k t}, t = 1, 2, \dots, T; k = 1, 2, \dots, K$

$I_{i t} = I_{i t - 1} + \sum_{j = 1}^{m_{j}} P_{i j t} - D_{i t}, t = 1, 2, \dots, T; i = 1, 2, \dots, N$

$P_{i j t} I_{i t} \in Z^{+}, t = 1, 2, \dots, T; i = 1, 2, \dots, N; j = 1, 2, \dots, m_{j}$

With the following instance,

$\begin{matrix} Minimize g (X) = 5 I_{11} + 6 I_{12} + 6 I_{21} + 7 I_{22} + 7 I_{23} + 72 P_{111} + 80 P_{121} + 85 P_{211} \\ + 90 P_{221} + 74 P_{112} + 78 P_{122} + 88 P_{212} + 95 P_{222} + 75 P_{113} \\ + 78 P_{123} + 4 P_{213} + 92 P_{223} \end{matrix}$

Subject to:

$\begin{array}{l} 5 p_{111} + 4 p_{121} + 8 p_{211} + 6 p_{221} \leq 8600 \\ 10 p_{111} + 8 p_{121} + 12 p_{211} + 9 p_{221} \leq 17000 \\ 5 p_{112} + 4 p_{122} + 8 p_{212} + 6 p_{222} \leq 8500 \\ 10 p_{112} + 8 p_{122} + 12 p_{212} + 9 p_{222} \leq 16600 \\ 5 p_{113} + 4 p_{123} + 8 p_{213} + 6 p_{223} \leq 8800 \\ 10_{p 113} + 8 p_{123} + 12 p_{213} + 9 p_{223} \leq 18200 \\ I_{11} = 100 + P_{111} + P_{121} - 1000 \\ I_{12} = I_{11} + p_{112} + p_{122} - 1050 \\ I_{13} = I_{12} + P_{113} + P_{123} - 1100 \\ I_{21} = 50 + P_{211} + P_{221} - 500 \\ I_{22} = I_{21} + P_{212} + P_{222} - 600 \\ I_{23} = I_{22} + P_{213} + P_{223} - 550 \end{array}$

$P_{i j t} I_{i t} \in Z^{+}$

- Model 6: A vehicle routing model visiting eight cities with a load capacity of 18 tons. The following distance and demand matrix is as follows.

1) Demand: 0, 6, 3, 7, 7, 18, 4, 5,

2) Dist. matrix:

996	2162	1067	499	2054	2134	2050
0	1167	1019	596	1059	1227	1055
1167	0	1747	1723	214	168	250
1019	1747	0	710	1538	1904	1528
596	1723	710	0	1589	1827	1579
1059	214	1538	1589	0	371	36
1227	168	1904	1827	371	0	407
1055	250	1528	1579	36	407	0

Statistical Analysis

In order to develop comparative statistics of the execution times of the proposed models, the results obtained by applying three non-parametric tests designed specifically for such purposes are shown below [29].

- Friedman Test: As a first test, the non-parametric Friedman test is used to compare related groups since the data do not meet the assumptions of normality. The Friedman test is used to demonstrate that there are significant differences in the sample data in Table 3. The null hypothesis indicates that all algorithms behave similarly. Table 4 shows the ranges obtained in relation to the execution time variable.

Table 4. Range analysis for the Friedman statistic.

LINGO	OH	AMPL	GAMS
0.09 (1)	0.45 (4)	0.11 (2)	0.24 (3)
0.05 (1)	0.42 (4)	0.09 (2)	0.14 (3)
0.05 (1)	0.40 (4)	0.06 (2)	0.11 (3)
0.34 (1)	1.03 (4)	0.38 (2)	0.39 (3)
0.07 (1)	1.27 (4)	1.01 (3)	1 (2)
0.03 (1)	1.63 (4)	1.07 (2)	1.08 (3)

Then, for $n = 6$ and $k = 3$ , we have that the Friedman statistic is equal to

$F_{F} = \frac{12}{n k (k + 1)} [\sum_{j} R_{j}^{2} - 3 n (k + 1)] = 166.26$ (28)

Thus, for a critical value of 0.10, $F_{F} = 4.60$ and the null hypothesis is rejected and there are large significant differences between the running times of the algorithms.

- Multiple Sign Test: This test uses LINGO as a study control method. This test is an extension of the traditional method. Its objective is to determine the direction of the differences (signs) rather than their magnitude. The results are shown in Table 5.

Table 5. Comparative results for the sign test.

LINGO	OH	AMPL	GAMS
0.09	+	+	+
0.05	+	+	+
0.05	+	+	+
0.34	+	+	+
0.07	+	+	+
0.03	+	+	+
Number of minus signs	6	6	6
Number of plus signs	0	0	0
$r_{j}$	0	0	0

Where the respective medians of each method are given by: $M e d_{L i n g o} = 0.08$ , $M e d_{O H} = 0.74$ , $M e d_{A M P L} = 0.245$ and $M e d_{G A M S} = 0.31$ . Thus, for an $α = 0.05$ , the hypothesis $H_{0} : M_{L i n g o} \leq M_{j}$ is accepted since the number of plus signs is less than the critical value $ℛ = 6$ , with $ρ = 24$ and $k - 1 = 3$ .

- Kruskal-Wallis Test: The Kruskal-Wallis test is a non-parametric test that will be used to compare means and medians of independent groups, considering the time taken by each algorithm to solve the proposed instances as a variable of interest. This test uses the following statistic to maintain the null hypothesis of equality between the average times for all the algorithms tested (Table 3). Table 6 shows the summary of calculations for this test.

(29)

where ${\bar{R}}_{L} = 6.4166$ , ${\bar{R}}_{A} = 10$ , ${\bar{R}}_{O} = 16.666$ , ${\bar{R}}_{G} = 11.2500$ , $E [R] = \frac{ρ + 1}{2} = 12.5$ and $σ_{R}^{2} = \frac{ρ^{2} - 1}{12} = 52$ . Thus, for $ρ = \sum_{i = 1}^{24} ρ_{i} = 24$ , we get that $K_{W} = 8.8692$ . Thus, for $χ_{0.05, 3}^{2} = 7.8147$ the null hypothesis is rejected.

Table 6. Ranges associated with the times used by the algorithms.

Alg.	$L$	$L$	$A$	$L$	$L$	$A$	$G$	$A$	$G$	$G$	$L$	$A$
Time	0.05	0.05	0.06	0.07	0.09	0.09	0.11	0.11	0.14	0.24	0.34	0.38
R	1.5	1.5	3	4	5.5	5.5	6.5	6.5	7	8	9	10
Alg.	$G$	$O$	$O$	$O$	$G$	$A$	$L$	$O$	$A$	$G$	$O$	$O$
Time	0.39	0.40	0.42	0.45	1	1.01	1.03	1.03	1.07	1.08	1.27	1.63
R	11	12	13	14	15	16	17	18	19	20	21	22

From the above results, it can be inferred that, although the proposed heuristic appears promising, its computational efficiency is currently low compared to the most common software programs on the market. Therefore, research into how to improve the perturbation direction of the vector $X_{k}$ would have a strong impact on the speed of convergence. Finally, converting the heuristic into a metaheuristic would be the ideal alternative for implementation in engineering models that frequently involve millions of integer variables to address real-world problems.

Like any heuristic, our proposal almost always provides convergent solutions; however, in comparison, it is not very competitive with the branch-and-bound method for cases with few variables. However, as mentioned at the beginning of this presentation, this proposal is an alternative approach to traditional techniques, such as cutting planes. As part of future research, it is recommended to evaluate larger instances and refine the perturbation method to improve search efficiency.

5. Conclusions and Suggestions

This article presents an alternative heuristic to traditional scientific methods for optimizing an integer linear programming model.

The method consists of proposing two initial feasible points and, from there, using a quasi-gradient search method to determine the direction of movement and regulate the step size to remain within the set of integer values.

The novelty of this proposal lies in the fact that the method begins its search within the feasible region and remains there, selecting movement directions by means of slight perturbations to the components of the points that are not feasible, always searching for a downward direction.

The results suggest the viability of the method for use in large-scale models since, despite the enormous computational effort required to locate new points in the sequence, the search for a downward direction becomes rapid at the beginning of the process, but slows down as the method approaches the optimal solution or when the support point in iteration $k$ is on the boundary of the set $D$ .

The results shown for locating points on a hypersphere suggest that a maximum of four perturbed components maintains a good efficiency for the method. Although this result is not conclusive, the alternative is to suggest other types of surfaces for searching for alternate candidates in the descent sequence.

The experience gained in this research suggests expanding research into construction methods, search step sizes, and the selection of the best candidate components to be perturbed during the process. Future avenues in this regard remain open to provide more efficient methods based on the aforementioned conditions.

Appendix: Convergence of Solutions of the Proposed Method

Table A1 and Table A2 show the initial and final results of the process. Convergence is achieved after 121 iterations.

Table A1. Initial solution and first five iterations of the process.

	$g (X_{0}) = 2069$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
16	17	18	19	20	21	22	23	24	25	26	27	28	29	30
	$g (X_{0}) = 2022$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
1	2	3	4	5	6	6	7	8	10	11	11	13	14	15
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
15	17	17	18	19	21	22	22	24	24	25	26	27	29	29
	$g (X_{0}) = 1936$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
0	1	2	3	4	5	6	7	8	9	10	11	12	13	14
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
15	16	17	18	19	20	21	22	23	24	25	26	27	28	29
	$k = 4$ , $g ({\hat{X}}_{k}) = 1932$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
1	2	3	3	4	5	6	6	8	10	11	10	12	14	15
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
15	16	16	18	19	20	22	21	24	24	24	25	26	28	29
	$k = 5$ , $g ({\hat{X}}_{k}) = 1927$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
1	2	3	3	4	5	6	6	8	10	11	10	12	14	15
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
15	16	16	18	19	20	22	21	24	24	24	25	26	28	29
	$k = 5$ , $g ({\hat{X}}_{k}) = 1926$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
0	7	8	7	4	7	6	6	7	8	14	10	14	12	16
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
15	16	14	18	19	21	22	21	27	23	24	32	26	28	32
	$k = 5$ , $g ({\hat{X}}_{k}) = 1905$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
0	2	2	2	4	4	6	6	9	10	11	9	11	13	17
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
14	16	16	14	19	19	21	20	23	23	24	22	25	28	29

Table A2. Last seven iterations of the process.

	$g (X_{0}) = - 1782$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
1	6	8	4	0	0	22	0	0	0	16	0	0	0	0
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
20	22	26	24	0	0	0	0	136	0	0	0	0	45	75
	$g (X_{0}) = - 1790$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
1	6	8	3	0	0	22	0	0	0	16	0	0	0	0
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
20	22	26	24	0	0	0	0	136	0	0	0	0	45	75
	$g (X_{0}) = - 1798$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
1	6	8	2	0	0	22	0	0	0	16	0	0	0	0
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
20	22	26	24	0	0	0	0	136	0	0	0	0	45	75
	$g (X_{0}) = - 1806$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
1	6	8	1	0	0	22	0	0	0	16	0	0	0	0
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
20	22	26	24	0	0	0	0	136	0	0	0	0	45	75
	$g (X_{0}) = - 1808$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
1	6	7	0	0	0	22	0	0	0	16	0	0	0	0
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
20	22	26	24	0	0	0	0	136	0	0	0	0	45	75
	$g (X_{0}) = - 1814$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
1	6	8	0	0	0	22	0	0	0	16	0	0	0	0
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
20	22	26	24	0	0	0	0	136	0	0	0	0	45	75
	$g (X_{0}) = - 1816$
$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{12}$	$X_{13}$	$X_{14}$	$X_{15}$
0	6	8	0	0	0	22	0	0	0	16	0	0	0	0
$X_{16}$	$X_{17}$	$X_{18}$	$X_{19}$	$X_{20}$	$X_{21}$	$X_{22}$	$X_{23}$	$X_{24}$	$X_{25}$	$X_{26}$	$X_{27}$	$X_{28}$	$X_{29}$	$X_{30}$
20	22	26	24	0	0	0	0	136	0	0	0	0	45	75

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1]	Ermolieva, T., Ermoliev, Y., Obersteiner, M. and Rovenskaya, E. (2021) Chapter 4 Two-Stage Nonsmooth Stochastic Optimization and Iterative Stochastic Quasigradient Procedure for Robust Estimation, Machine Learning and Decision Making. In: Roberts, F.S. and Sheremet, I.A., Eds., Resilience in the Digital Age, Springer, 45-74.[CrossRef]
[2]	Pérez Lechuga, G. (2018) Optimal Logistics Strategy to Distribute Medicines in Clinics and Hospitals. Journal of Mathematics in Industry, 8, Article No. 2.[CrossRef]
[3]	Pérez-Lechuga, G., Aguilar-Velázquez, S.L., Cisneros-López, M.A. and Martínez, F.V. (2019) A Model for the Location and Scheduling of the Operation of Second-Generation Ethanol Biorefineries. Journal of Mathematics in Industry, 9, Article No. 3.[CrossRef]
[4]	Pérez-Lechuga, G., Venegas-Martínez, F. and Martínez-Sánchez, J.F. (2021) Mathematical Modeling of Manufacturing Lines with Distribution by Process: A Markov Chain Approach. Mathematics, 9, Article 3269.[CrossRef]
[5]	Pérez-Lechuga, G., Venegas-Martínez, F., Montufar-Benítez, M.A. and Mora-Vargas, J. (2022) On the Dynamics in Decoupling Buffers in Mass Manufacturing Lines: A Stochastic Approach. Mathematics, 10, Article 1686.[CrossRef]
[6]	Pérez-Lechuga, G., Martínez-Sánchez, J.F., Venegas-Martínez, F. and Madrid-Fernández, K.N. (2024) A Routing Model for the Distribution of Perishable Food in a Green Cold Chain. Mathematics, 12, Article 332.[CrossRef]
[7]	Papadimitriou, C.H. (1981) On the Complexity of Integer Programming. Journal of the ACM, 28, 765-768.[CrossRef]
[8]	Rothberg, E. (2007) An Evolutionary Algorithm for Polishing Mixed Integer Programming Solutions. INFORMS Journal on Computing, 19, 534-541.[CrossRef]
[9]	Fischetti, M. and Lodi, A. (2010) Heuristics in Mixed Integer Programming. In: James, J., Ed., Wiley Encyclopedia of Operations Research and Management Science, John Wiley & Sons, Inc, 1-6. https://homepages.cwi.nl/~dadush/workshop/discrepancy-ip/papers/heuristics-survey-fischetti-lodi-11.pdf
[10]	Kleinert, T., Labbé, M., Ljubić, I. and Schmidt, M. (2021) A Survey on Mixed-Integer Programming Techniques in Bilevel Optimization. EURO Journal on Computational Optimization, 9, Article ID: 100007.[CrossRef]
[11]	Huang, L.Y., Chen, X.M., Huo, W., Wang, J.Z., Zhang, F., Bai, B. and Shi, L. (2021) Branch and Bound in Mixed Integer Linear Programming Problems: A Survey of Techniques and Trends. arXiv: 2111.06257.[CrossRef]
[12]	Ermoliev, Y.M. and Gaivoronski, A.A. (1992) Stochastic Quasigradient Methods for Optimization of Discrete Event Systems. Annals of Operations Research, 39, 1-39.[CrossRef]
[13]	Pérez-Lechuga, G. (1993) bibinfotitleUn algoritmo para la optimización estocástica de algunos modelos dinámicos. Ph.D. Thesis, Universidad Nacional Autónoma de México.
[14]	Ermol’ev, Y.M. (1972) On the Method of Generalized Stochastic Gradients and Quasi-Féjer Sequences. Cybernetics, 5, 208-220.[CrossRef]
[15]	Ball, M.O. (2011) Heuristics Based on Mathematical Programming. Surveys in Operations Research and Management Science, 16, 21-38. https://www.researchgate.net/publication/229415600
[16]	Borne, P., Popescu, D., Filip, F.G. and Stefanoiu, D. (2014) Optimization in Engineering Sciences. John Wiley and Sons, 1-30.
[17]	Combettes, P.L. (2001) Quasi-Fejérian Analysis of Some Optimization Algorithms. Studies in Computational Mathematics, 8, 115-152.[CrossRef]
[18]	Boyd, S., Duchi, J., Pilanci, M. and Vandenberghe, L. (2022) Subgradients. https://web.stanford.edu/class/ee364b/lectures/subgradients_notes.pdf
[19]	Ermoliev, Y.M. (2025) Stochastic Quasigradient Methods and their Application in Systems Optimization. https://pure.iiasa.ac.at/id/eprint/1759/7/WP-81-002.pdf
[20]	Rubinstein, Y. and Reuben, L. (1981) Simulation and the Monte Carlo Method. John Wiley & Sons, Inc.
[21]	Svaiter, B.F. (2025) Fejer-Convergent Algorithms Which Accept Summable Errors, Approximated Resolvents and the Hybrid Proximal-Extragradient Method. https://webdoc.sub.gwdg.de/ebook/serien/e/IMPA_A/715.pdf
[22]	LINGO (2025) Software for Mathematical Optimization. Integer Programming, Linear Programming, Nonlinear Programming, Stochastic Programming, Global Optmization. https://www.lindo.com/
[23]	Rubinstein, R.Y. (1982) Generating Random Vectors Uniformly Distributed inside and on the Surface of Different Regions. European Journal of Operational Research, 10, 205-209.[CrossRef]
[24]	Pérez-Lechuga, G., Tuoh-Mora, J., Morales-Sánchez, E. and Suárez-Álvarez, M. (2005) On the Efficiency of a Random Search Method. https://www.researchgate.net/publication/241769436_ON_THE_EFFICIENCY_OF_A_RANDOM_SEARCH_METHOD
[25]	(2025) How to Compare Two Algorithms Empirically? https://www.baeldung.com/cs/compare-algorithms-performance#::text=Choosing
[26]	Liberatore, M. and Nydick, R. (2003) Decision Technology: Modeling, Soft-Ware, and Applications. John Wiley & Sons, Inc.
[27]	Gupta, N. and Ali, I. (2021). Optimization with LINGO-18 Problems and Applications. CRC Press. [Google Scholar] [CrossRef]
[28]	Sipper, D. and Bulfin, R. (1997) Production: Planning, Control, and Integration. McGraw-Hill College.
[29]	García, S., Fernández, A., Luengo, J. and Herrera, F. (2010) Advanced Nonparametric Tests for Multiple Comparisons in the Design of Experiments in Computational Intelligence and Data Mining: Experimental Analysis of Power. Information Sciences, 180, 2044-2064.[CrossRef]

Journals Menu

Follow SCIRP

	customer@scirp.org
	+86 18163351462(WhatsApp)
	1655362766

	Paper Publishing WeChat

Journals Menu

Home

About SCIRP

Service

Policies