Avoiding the Use of Lagrange Multipliers. II. Constrained Extrema of Functionals and the Evaluation of Constrained Derivatives

Abstract

A method for determining the extrema of a real-valued and differentiable function whose dependent variables are subject to constraints, and that avoids the use of Lagrange multipliers, was previously presented (Corti and Fariello, Op. Res. Forum 2 (2021) 59). The method made use of projection matrices, and a corresponding Gram-Schmidt orthogonalization process, to identify the constrained extrema. Furthermore, information about the second derivatives of the given function with constraints was generated, from which the nature of the constrained extrema could be determined, again without knowledge of the Lagrange multipliers. Here, the method is extended to the case of functional derivatives with constraints. In addition, constrained first-order and second-order derivatives of the function are generated, in which the derivatives with respect to a given variable are obtained while, concomitantly, the effects of the variations of the remaining chosen set of dependent variables are strictly accounted for. These constrained derivatives are valid not only at the extremum points but also provide another, equivalent route for the determination of the constrained extrema and their nature.

Share and Cite:

Corti, D. and Fariello, R. (2024) Avoiding the Use of Lagrange Multipliers. II. Constrained Extrema of Functionals and the Evaluation of Constrained Derivatives. Journal of Applied Mathematics and Physics, 12, 2764-2788. doi: 10.4236/jamp.2024.128165.

1. Introduction

Recently, the authors presented [1] a method for determining the extrema of real-valued and differentiable functions whose dependent variables are subject to constraints, one that avoids the use of Lagrange multipliers. The method made use of projection matrices, and a corresponding Gram-Schmidt orthogonalization process, to identify the constrained extrema. Although Lagrange multipliers were not required, a comparison of this approach to the method of Lagrange multipliers nevertheless yielded expressions for the Lagrange multipliers in terms of the gradients of the given function and the additional functions that represent the constraints. Furthermore, information about the second derivatives of the given function with constraints was generated, from which the nature (i.e., maxima or minima) of the constrained extrema was determined, again without knowledge of the Lagrange multipliers.

Other methods for avoiding the use of Lagrange multipliers have been developed for constrained extrema problems. For example, one prior approach utilized wedge products [2], though it only considered first-order variations of the given function. Another approach employed the chain rule and the implicit function theorem [3], and also yields information about second-order derivatives. The nature of the constrained extrema can also be found from the introduction of an extended Hessian matrix [4] [5]. Yet, the values of the Lagrange multipliers are still required when evaluating these matrices, and the general procedure for the case of multiple constraints was not provided. Our proposed method, with its use of well-known aspects of linear algebra and a relatively straightforward incorporation of multiple constraints, may prove computationally more convenient than these prior approaches for some optimization problems with a large number of constraints.

We again note that this method followed some of the ideas, as well as some of the convenient notation, provided in the work of Gál [6]-[11] for the evaluation of first- and second-order functional derivatives with constraints (and also without the use of Lagrange multipliers). To a certain extent, our approach for analyzing constrained extrema without Lagrange multipliers is the discrete version (i.e., the number of variables is finite) of one of the continuum versions (i.e., an “infinite” variable space requiring functional derivatives) presented by Gál. Some aspects of our current discrete approach were therefore hinted at in [11], though not fully developed (at least for multiple constraints).

Besides the direct benefit of solving constrained problems without the need for Lagrange multipliers, which also potentially simplifies the determination of the nature of the extrema, our approach may, upon further exploration, be of additional utility. In various scientific problems of interest that are formulated as constrained problems, the Lagrange multipliers typically have physical meanings. For example, in statistical physics, the maximization of the Gibbs entropy formula subject to certain constraints provides another route to generating the canonical and grand canonical ensembles. In both cases, one of the Lagrange multipliers is associated with the inverse temperature of the thermal reservoir, while in the latter case, another Lagrange multiplier is associated with the chemical potential of the particle reservoir [12]. Here, by direct comparison to the method of Lagrange multipliers, our approach leads to general expressions for the Lagrange multipliers themselves [1]. Given the often found connection between the Lagrange multipliers and physical quantities, these new expressions may yield some different and useful insights into various constrained problems of physical interest [13]-[16].

We do not explore these connections further here, but instead consider other aspects of our approach that can be applied to other important constrained problems of interest. In particular, we present the extension of our approach to the case of functional derivatives with constraints. In addition, the explicitly constrained first-order and second-order derivatives of the given function (for the discrete variable set) are obtained, in which derivatives with respect to a given variable are generated while, concomitantly, the effects of the variations of the remaining chosen set of dependent variables are strictly accounted for. These constrained derivatives are valid not only at the extremum points, thereby providing information that may be of use for various problems of interest, and may also be used for the determination of the constrained extrema and their nature.

In the remainder of this paper, we first present a summary of the method developed in [1], in which local extrema and their nature are determined with constraints, again without the use of Lagrange multipliers. We then present the extension of our approach to the continuum limit (i.e., functional derivatives with constraints), some of the results of which are consistent with the analysis provided in [6]-[11]. Finally, we consider the determination of constrained first- and second-order derivatives of the given function (for the discrete variable set), i.e., derivatives for which the impact of the constraints is explicitly accounted for. These derivatives, several of which have not been obtained previously, are valid not just at the extremum points.

2. Review of the Method Avoiding the Use of Lagrange Multipliers

Consider the real-valued function $f$ that depends upon the $n$ (initially) independent variables $x = x_1, x_2, \ldots, x_n$. In the multi-dimensional space $x$, the (column) vector $\delta x = \delta x_1, \ldots, \delta x_n$ describes a general variation of all the $n$ (initially) independent variables, such that the first-order variation of $f$, or $\delta f$, is generated from $\delta f = \delta x^T \nabla f$ (in which, for notational convenience, the corresponding transposed row vector is denoted by $(\delta x)^T \equiv \delta x^T$ and $\nabla = \partial/\partial x_1, \ldots, \partial/\partial x_n$). This first-order variation follows from the projection of $\nabla f$ onto the direction of $\delta x$, which is just the directional derivative of $f$ along $\delta x$.

Let there now be $m < n$ constraints on $x$, which are assumed to be represented by the following set of differentiable functions:

$$g_1(x) = K_1, \quad g_2(x) = K_2, \quad \ldots, \quad g_m(x) = K_m,$$

and for which $K_1, K_2, \ldots, K_m$ are constants. These constraints describe $m$ constraint surfaces, in which the vectors $\nabla g_1, \nabla g_2, \ldots, \nabla g_m$ are normal to their corresponding surfaces. To find a direction that is orthogonal to these resulting normal vectors, and as such resides in all of the corresponding $m$ constraint surfaces, a Gram-Schmidt orthogonalization process was invoked [1], which generated the following projection matrix $Q_m$ that can be represented by the following "formal" determinant

$$Q_m = \frac{1}{D_m} \begin{vmatrix} \nabla g_1^T \nabla g_1 & \cdots & \nabla g_1^T \nabla g_m & \nabla g_1^T \\ \vdots & \ddots & \vdots & \vdots \\ \nabla g_m^T \nabla g_1 & \cdots & \nabla g_m^T \nabla g_m & \nabla g_m^T \\ \nabla g_1 & \cdots & \nabla g_m & I \end{vmatrix} \qquad (1)$$

which is evaluated along the bottom row. In Equation (1), $I$ is the identity matrix and

$$D_m \equiv \begin{vmatrix} \nabla g_1^T \nabla g_1 & \cdots & \nabla g_1^T \nabla g_m \\ \vdots & \ddots & \vdots \\ \nabla g_m^T \nabla g_1 & \cdots & \nabla g_m^T \nabla g_m \end{vmatrix} \qquad (2)$$

which is also the $m \times m$ upper-left minor of the matrix in Equation (1). (We are assuming that $D_m \neq 0$, which again follows from the assumed linear independence of the various $\nabla g_i$.) For a given vector, $Q_m$ projects that vector onto the $K_1, K_2, \ldots, K_m$-constraint surfaces, or equivalently onto the $K_1, K_2, \ldots, K_m$-conserving direction. As required, Equation (1) indicates that $Q_m \nabla g_k = 0$ for $k = 1, \ldots, m$, since $Q_m \nabla g_k$ yields a determinant in which the $k$th and last columns are identical. (When $Q_m$ operates on a column vector, only the last column of Equation (1) operates on that column vector.)

An alternative expression for $Q_m$ was also obtained [1]:

$$Q_m = I - \frac{1}{D_m} \sum_{i=1}^{m} \sum_{j=1}^{m} \nabla g_i \nabla g_j^T A_{ji}^m \qquad (3)$$

where $A_{ji}^m$ is the co-factor of element $ji$ in the $m \times m$ matrix used to determine $D_m$ in Equation (2). With Equation (3), one can show, as required for a projection matrix, that $Q_m$ is idempotent, with $Q_m^T = Q_m$, $\det(Q_m) = 0$, and $\mathrm{Tr}(Q_m) = n - m$ (which also equals the number of non-zero eigenvalues of $Q_m$, each being equal to one).
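For readers who wish to check these properties numerically, note that Equation (3) is equivalent to the standard closed form $Q_m = I - G (G^T G)^{-1} G^T$, where $G$ is the $n \times m$ matrix whose columns are the constraint gradients (the inverse of the Gram matrix supplies the co-factors $A_{ji}^m / D_m$). The following sketch (ours, written in Python with NumPy; not part of the original development) constructs $Q_m$ this way and verifies the stated properties:

import numpy as np

def projection_matrix(grads):
    """Q_m = I - G (G^T G)^{-1} G^T: the projector onto the intersection
    of the tangent spaces of the constraint surfaces. `grads` holds the
    constraint gradients (each of length n) evaluated at a point; they
    must be linearly independent so that D_m = det(G^T G) != 0."""
    G = np.column_stack(grads)               # n x m matrix of gradients
    return np.eye(G.shape[0]) - G @ np.linalg.solve(G.T @ G, G.T)

# Example: grad g1 = (2, 1, 0) and grad g2 = (1, 0, 1), used again below.
Q = projection_matrix([np.array([2.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])])
n, m = 3, 2
assert np.allclose(Q @ Q, Q)                 # idempotent
assert np.allclose(Q, Q.T)                   # symmetric, Q_m^T = Q_m
assert abs(np.linalg.det(Q)) < 1e-12         # det(Q_m) = 0
assert np.isclose(np.trace(Q), n - m)        # Tr(Q_m) = n - m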

Now, the projection of $\delta x$ onto the $K_1, K_2, \ldots, K_m$-conserving direction, or in the direction residing along the various constraint surfaces, is labeled as $\delta_{K_1 K_2 \cdots K_m} x$, or for convenience $\delta_K x$, and is given by $\delta_K x = Q_m \delta x$. As noted earlier, the first-order (unconstrained) variation of $f$, or $\delta f$, is generated from $\delta f = \delta x^T \nabla f$, which follows from the projection of $\nabla f$ onto the direction of $\delta x$, i.e., the directional derivative of $f$ along $\delta x$. Analogously, a first-order variation of $f$ that maintains the constraints, denoted as $\delta_{K_1 \cdots K_m} f$, or $\delta_K f$, can be generated by projecting $\nabla f$ onto the direction of $\delta_K x$. Hence, the directional derivative of $f$ residing along the constraint surfaces, or the $K_1, K_2, \ldots, K_m$-conserving first-order variation of $f$, is given by $\delta_K f = \delta_K x^T \nabla f$, where $\delta_K x^T \equiv (\delta_K x)^T$. This first-order variation can, however, be generated in a different way. To do so, we define the $K_1 K_2 \cdots K_m$-conserving gradient of $f$, or what we label as $\nabla_K f$, to be the vector that yields, by construction, the $K_1 K_2 \cdots K_m$-conserving first-order variation of $f$ when evaluated along the (unconstrained) variation $\delta x$. In other words, $\nabla_K f$ is defined via $\delta_K f \equiv \delta x^T \nabla_K f$. Therefore,

$$\delta_K f = \delta x^T \nabla_K f = \delta_K x^T \nabla f = (Q_m \delta x)^T \nabla f \qquad (4)$$

Given that $(Q_m \delta x)^T = \delta x^T Q_m^T = \delta x^T Q_m$, then

$$\delta_K f = \delta x^T \nabla_K f = \delta x^T Q_m \nabla f \qquad (5)$$

or

$$\nabla_K f = Q_m \nabla f = \frac{1}{D_m} \begin{vmatrix} \nabla g_1^T \nabla g_1 & \cdots & \nabla g_1^T \nabla g_m & \nabla g_1^T \nabla f \\ \vdots & \ddots & \vdots & \vdots \\ \nabla g_m^T \nabla g_1 & \cdots & \nabla g_m^T \nabla g_m & \nabla g_m^T \nabla f \\ \nabla g_1 & \cdots & \nabla g_m & \nabla f \end{vmatrix} \qquad (6)$$

(which is again evaluated along the bottom row). From the above, we conclude that $\nabla_K f$ is the projection of $\nabla f$ onto the various constraint surfaces. Note that each component of $\nabla_K f$ includes contributions from all the $n$ variables, since $\nabla_K f$ accounts implicitly for how all the variables must change due to the constraints. Furthermore, Equation (6) implies that the $K_1 K_2 \cdots K_m$-conserving gradient operator, $\nabla_K$, is equal to $Q_m \nabla$.

At an extremum, $\nabla_K f = Q_m \nabla f = 0$, so we have that

$$\begin{vmatrix} \nabla g_1^T \nabla g_1 & \cdots & \nabla g_1^T \nabla g_m & \nabla g_1^T \nabla f \\ \vdots & \ddots & \vdots & \vdots \\ \nabla g_m^T \nabla g_1 & \cdots & \nabla g_m^T \nabla g_m & \nabla g_m^T \nabla f \\ \nabla g_1 & \cdots & \nabla g_m & \nabla f \end{vmatrix} = 0 \qquad (7)$$

a condition for a constrained extremum that avoids the use of Lagrange multipliers. Equation (7) was shown to be equivalent to what follows from the method of Lagrange multipliers [1]. A comparison of Equation (7) to the method of Lagrange multipliers provides an immediate general expression for the Lagrange multipliers, showing that they are related to the co-factors of $\nabla g_1$, $\nabla g_2$, etc. in the bottom row of the matrix in Equation (7), divided by $D_m$. Furthermore, due to various properties of the determinant (e.g., row and column interchanges), the choices of which constraints are labeled 1, 2, etc. are arbitrary (as is to be expected), and will not affect the results presented above. In addition, $\nabla_K f$ is, of course, by construction orthogonal to the vectors $\nabla g_1$, $\nabla g_2$, etc.: given Equation (6), $\nabla g_k^T \nabla_K f$ (for $k = 1, \ldots, m$) generates a determinant in which the $k$th and last rows are identical, and so $\nabla g_k^T \nabla_K f = 0$. Finally, the introduction of the matrix $\tilde{Q}_m \equiv D_m Q_m$ proves convenient in various applications, where $\tilde{Q}_m^2 = D_m \tilde{Q}_m$, $\mathrm{Tr}(\tilde{Q}_m) = (n - m) D_m$, $\det(\tilde{Q}_m) = 0$, and at an extremum $\tilde{Q}_m \nabla f = 0$.

The nature of an extremum (i.e., maximum, minimum, or saddle point) can be determined from an appropriate matrix of the corresponding second-order derivatives. For the unconstrained case, the second-order variation of $f$, or $\delta^2 f$, is generated from $\delta^2 f = \delta x^T (\nabla \nabla^T f) \delta x = \delta x^T F \delta x$, where $F \equiv \nabla \nabla^T f$ is the matrix of all the (unconstrained) second-order derivatives of $f$. This result can be obtained in the following manner. As noted previously, the unconstrained first-order variation of $f$ is given by $\delta f = \delta x^T \nabla f$, which implies that one can introduce the (scalar) "variation operator" $\delta \equiv \delta x^T \nabla$. Hence, the second-order variation operator (another scalar) is equal to

$$\delta^2 = \delta \delta = (\delta x^T \nabla)(\delta x^T \nabla) = (\delta x^T \nabla)(\nabla^T \delta x) = \delta x^T (\nabla \nabla^T) \delta x,$$

and so $\delta^2 f = \delta x^T F \delta x$. (Note that when $\delta^2$ operates on $f$, $\nabla \nabla^T$ does not act on $\delta x$, but on $f$ only.)

Now, similar to the previous derivation of the $K_1 \cdots K_m$-conserving gradient, we likewise generate two different expressions for the second-order $K_1 \cdots K_m$-conserving variation of $f$, or $\delta_K^2 f$. First, a new matrix of constrained second-order derivatives, $F_K$, is introduced such that, by construction, when twice operated on by $\delta x$, one obtains $\delta_K^2 f$. Hence, $F_K$ is defined in the following manner

$$\delta_K^2 f = \delta x^T F_K \delta x \qquad (8)$$

On the other hand, from Equation (4), $\delta_K f = \delta_K x^T \nabla f = \delta x^T Q_m \nabla f$, which suggests that we introduce the $K_1 \cdots K_m$-conserving variation operator $\delta_K \equiv \delta_K x^T \nabla = \delta x^T Q_m \nabla$. Therefore, we also have that

$$\delta_K^2 f = \delta_K \delta_K f = (\delta x^T Q_m \nabla)(\delta x^T Q_m \nabla f) = (\delta x^T Q_m \nabla)(Q_m \nabla f)^T \delta x = \delta x^T Q_m \nabla (Q_m \nabla f)^T \delta x \qquad (9)$$

where $(Q_m \nabla f)^T$ is a row vector. Comparing Equations (8) and (9) implies that

$$F_K = Q_m \nabla (Q_m \nabla f)^T = Q_m \nabla (\nabla f^T Q_m) \qquad (10)$$

in which $\nabla (\nabla f^T Q_m) = \nabla (Q_m \nabla f)^T$. Since the elements of $Q_m$ may depend upon the variables $x$, the order of operation of the gradient operator on $Q_m$ and $f$ needs to be maintained in Equation (10). Equation (10) can also be obtained in another way. Starting with the definition $F_K \equiv \nabla_K \nabla_K^T f$ [7], and noting that $\nabla_K = Q_m \nabla$, we have that

$$F_K = (Q_m \nabla)(Q_m \nabla)^T f = Q_m \nabla (Q_m \nabla f)^T \qquad (11)$$

When determining $F_K$, first evaluating $(Q_m \nabla f)^T$ and then operating on this row vector, initially with $\nabla$ (to obtain a matrix) and subsequently with $Q_m$, proves to be convenient. Nonetheless, $F_K$ can be expanded to yield alternative forms, as shown in [1]. Since Equation (11) shows that $F_K$ is the product of $Q_m$ and another matrix, $\det(F_K) = 0$, which is again consistent with the fact that while $F_K$ includes derivatives with respect to all $n$ variables, ultimately not all of these $K_1 \cdots K_m$-conserving variations are independent due to the constraints (we now expect $m$ eigenvalues to be equal to zero due to the $m$ constraints). Finally, $F_K$ can also be expressed as [1]

$$F_K = \frac{1}{D_m^2} \tilde{Q}_m \nabla (\tilde{Q}_m \nabla f)^T \qquad (12)$$

an alternative form that proves convenient in some cases, and which is only appropriate at the constrained extrema, for which $\tilde{Q}_m \nabla f = 0$. Ultimately, the nature of the constrained extrema can be obtained from the signs of the eigenvalues of $F_K$.

For certain problems, this new method does yield a decrease in the number of steps needed to identify the constrained extrema, compared to the method of Lagrange multipliers, at least once the projection matrix $Q_m$ has been determined [1]. Moreover, the new method arguably provides a clear advantage over the method of Lagrange multipliers when determining the nature of the constrained extrema. As an example to illustrate the potential benefits of the new method, consider the following sample problem [1], in which we are interested in finding the extremum of the function $f = x_1^2 + x_2^2 + x_3^2$, subject to the two constraints $g_1 = 2x_1 + x_2 = 1$ and $g_2 = x_1 + x_3 = 2$. To begin, we first note that

$$Q_2 = \frac{1}{6} \begin{vmatrix} 5 & 2 & \nabla g_1^T \\ 2 & 2 & \nabla g_2^T \\ \nabla g_1 & \nabla g_2 & I \end{vmatrix} = \frac{1}{6} \begin{pmatrix} 1 & -2 & -1 \\ -2 & 4 & 2 \\ -1 & 2 & 1 \end{pmatrix} \qquad (13)$$

Therefore,

$$\nabla_{K_1 K_2} f = Q_2 \nabla f = \frac{1}{3} \left( x_1 - 2x_2 - x_3, \; -2x_1 + 4x_2 + 2x_3, \; -x_1 + 2x_2 + x_3 \right) \qquad (14)$$

At the constrained extremum, in which $\nabla_{K_1 K_2} f = 0$, we find with the constraints that $x_1 = 2/3$, $x_2 = -1/3$, and $x_3 = 4/3$. To determine the nature of the extremum, we note for the two given constraints that each element of $Q_2$ is a constant, and so $\nabla Q_2 = 0$. Therefore, from Equation (11), $F_{K_1 K_2} = Q_2 (\nabla \nabla^T f) Q_2 = Q_2 F Q_2$. Now, for the given function $f$,

$$F = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} = 2I \qquad (15)$$

Hence, $F_{K_1 K_2} = 2 Q_2$. The rank of $Q_2$, and so of $F_{K_1 K_2}$, is 1, in which there are two zero eigenvalues (i.e., two constraints). With the trace of $F_{K_1 K_2}$ equaling 2, the only non-zero eigenvalue is then equal to 2. Thus, the extremum corresponds to a minimum. All of these results can be verified by direct substitution of the constraints into the function $f$. With 3 variables and 2 constraints, there is only a single degree of freedom for this problem (i.e., only one of the variables can be varied independently of the other two). So, for example, we note that

$$f = x_1^2 + (1 - 2x_1)^2 + (2 - x_1)^2 = 6 x_1^2 - 8 x_1 + 5.$$

This function of a single variable has an extremum at $x_1 = 2/3$ and a second derivative equal to 12, confirming that this stationary point is a minimum.
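As a numerical cross-check of this example (our sketch, using NumPy; not part of the original paper), the projector, the vanishing of the constrained gradient, and the eigenvalues of $F_{K_1 K_2}$ can all be confirmed in a few lines:

import numpy as np

# Columns are grad g1 = (2, 1, 0) and grad g2 = (1, 0, 1).
G = np.array([[2.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q2 = np.eye(3) - G @ np.linalg.solve(G.T @ G, G.T)   # reproduces Equation (13)

x_star = np.array([2/3, -1/3, 4/3])                  # candidate extremum
grad_f = 2.0 * x_star                                # f = x1^2 + x2^2 + x3^2
print(np.allclose(Q2 @ grad_f, 0))                   # True: Eq. (14) vanishes

F_K = 2.0 * Q2                                       # F_K = Q2 (2I) Q2 = 2 Q2
print(np.round(np.linalg.eigvalsh(F_K), 10))         # [0, 0, 2]: a minimum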

3. Extrema for Functional Derivatives with Constraints

In the calculus of variations, which has important applications in mathematics and various physical problems, one attempts to find the particular function that yields an extremum of a given quantity. The mapping that assigns a particular value to each inputted function is called a functional [17]. The functional derivative indicates how the functional changes upon an infinitesimal change of the function upon which the functional depends. When there are additional constraints on the input functions, the extrema of the functionals can be obtained through the use of functional derivatives and the method of Lagrange multipliers. Yet, formally determining the nature of these extrema with this method is unclear, particularly for the case of functionals that depend upon several functions with multiple constraints. Hence, as noted for the extrema of functions with constraints, avoiding the use of Lagrange multipliers when considering the extrema of functionals with constraints, particularly when evaluating the nature of these extrema, may prove beneficial for various problems of interest.

The extension of our method to the case of finding extrema for functional derivatives with constraints can be obtained by analogy, or by considering the limit of $\delta x$ when there is an "infinite number of (continuous) variables". Thus, wherever a $\delta x$ appears, it is replaced by $\delta y(x)$, where $y$ is a function of the now continuous variable set $x$. So, for $f$ that is now a functional of $y$, or $f[y(x)]$, we have that

$$\nabla f \to \frac{\delta f}{\delta y(x)}$$

$$\nabla g_i \to \frac{\delta g_i}{\delta y(x)}$$

$$\nabla_K f \to \frac{\delta f}{\delta_K y(x)} \qquad (16)$$

where each of the above can be thought of as a “column vector of infinite size”. Therefore, we also have, for example, that

$$\nabla g_i^T \nabla g_j \to \int dx \, \frac{\delta g_i}{\delta y(x)} \frac{\delta g_j}{\delta y(x)}$$

$$\nabla g_i^T \nabla f \to \int dx \, \frac{\delta g_i}{\delta y(x)} \frac{\delta f}{\delta y(x)} \qquad (17)$$

The last two expressions can then be used to evaluate the corresponding matrix elements in Equation (1). In addition, we have the following orthogonality condition

$$\nabla g_i^T \nabla_K f = 0 = \int dx \, \frac{\delta g_i}{\delta y(x)} \frac{\delta f}{\delta_K y(x)} \qquad (18)$$

Hence, for the case of a single constraint, g, we have that

$$\frac{\delta f}{\delta_K y(x)} = \frac{\delta f}{\delta y(x)} - \frac{\delta g}{\delta y(x)} \, \frac{\displaystyle \int dx' \, \frac{\delta g}{\delta y(x')} \frac{\delta f}{\delta y(x')}}{\displaystyle \int dx' \, \frac{\delta g}{\delta y(x')} \frac{\delta g}{\delta y(x')}} \qquad (19)$$

and so at an extremum

$$\frac{\delta f}{\delta y(x)} = \frac{\delta g}{\delta y(x)} \, \frac{\displaystyle \int dx' \, \frac{\delta g}{\delta y(x')} \frac{\delta f}{\delta y(x')}}{\displaystyle \int dx' \, \frac{\delta g}{\delta y(x')} \frac{\delta g}{\delta y(x')}} \qquad (20)$$

As the two integrals are the same for each $x$, they are constants. Therefore, Equation (20) yields a result similar to what would be obtained with the use of Lagrange multipliers (though, in contrast, we again get an immediate expression for the multiplier with the new method). Note that Equations (19) and (20) are different from what follows from the analysis of Gál [6]-[11]. Gál hinted at their derivation [11], but did not explicitly include them, as it was noted that in some cases the integrals may be unbounded (which may occur when the integrals are not evaluated over a finite range of $x$). Thus, this direct extension from the discrete case may not be valid in all situations, which was one of the motivations for Gál to obtain different, and more generally applicable, expressions for the functional derivatives with constraints. Nonetheless, at least for the case of a single constraint, the results following from Equation (20) (assuming the integrals are bounded) are equivalent to what follows from Gál's approach (as they should be, since both the above and Gál's expression yield a constant multiplying the functional derivative of the constraint, just as appears in the method of Lagrange multipliers applied to functional derivatives).
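Equation (19) is easy to test on a grid: discretizing $y(x)$ turns functional derivatives into ordinary arrays and the integrals into quadratures. In the sketch below (our illustration; the test functionals $f[y] = \int y^2 dx$ and $g[y] = \int y \, dx = K$ are assumptions chosen for simplicity, with the integrals over a finite range so that they remain bounded), both Equation (19) and the orthogonality condition of Equation (18) are verified:

import numpy as np

N, (a, b) = 400, (0.0, 1.0)
x = np.linspace(a, b, N)
y = np.sin(2 * np.pi * x) + 0.3          # an arbitrary test function

# For f[y] = int y^2 dx and g[y] = int y dx: df/dy(x) = 2 y(x), dg/dy(x) = 1.
df = 2 * y
dg = np.ones_like(y)

# Equation (19): subtract the projection along dg/dy.
df_K = df - dg * np.trapz(dg * df, x) / np.trapz(dg * dg, x)

ybar = np.trapz(y, x) / (b - a)          # here df/d_K y(x) = 2 y(x) - 2 <y>
assert np.allclose(df_K, 2 * y - 2 * ybar)
assert abs(np.trapz(dg * df_K, x)) < 1e-10   # orthogonality, Equation (18)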

We also consider the case of second-order functional derivatives. The matrix $F$ of discrete (unconstrained) second derivatives becomes the following in the continuum or functional limit

$$F = \frac{\partial^2 f}{\partial x_i \partial x_j} \to \frac{\delta^2 f}{\delta y(x) \delta y(x')} \qquad (21)$$

(where $i$ denotes the row and $j$ the column of $F$ for the discrete case). The determination of $F_K$ in the same functional limit can be obtained, for example, from Equation (12), in which the discrete result is first generated, followed by taking the continuum limit. The result is in general complicated, and is not provided here. But a significant simplification does arise for the case of a single constraint whose second functional derivative is always equal to zero. For this case, one finds that (as shown in the Appendix)

$$\begin{aligned} F_K \to \frac{\delta^2 f}{\delta_K y(x) \, \delta_K y(x')} = {} & \frac{\delta^2 f}{\delta y(x) \delta y(x')} - \frac{1}{D_1} \frac{\delta g}{\delta y(x)} \int dx'' \, \frac{\delta g}{\delta y(x'')} \frac{\delta^2 f}{\delta y(x'') \delta y(x')} \\ & - \frac{1}{D_1} \frac{\delta g}{\delta y(x')} \int dx'' \, \frac{\delta g}{\delta y(x'')} \frac{\delta^2 f}{\delta y(x) \delta y(x'')} \\ & + \frac{1}{D_1^2} \frac{\delta g}{\delta y(x)} \frac{\delta g}{\delta y(x')} \int \!\! \int dx'' \, dx''' \, \frac{\delta g}{\delta y(x'')} \frac{\delta g}{\delta y(x''')} \frac{\delta^2 f}{\delta y(x'') \delta y(x''')} \end{aligned} \qquad (22)$$

in which

$$D_1 = \int dx' \, \frac{\delta g}{\delta y(x')} \frac{\delta g}{\delta y(x')} \qquad (23)$$

Through the use of dummy integration variables, each term in Equation (22) retains its dependence on the functional derivatives with respect to $y(x)$ and $y(x')$. The corresponding eigenvalue equation for the second functional derivatives is given by [17]

$$K \psi = \lambda \psi \qquad (24)$$

where

$$K \psi(x) = \int dx' \, \psi(x') \frac{\delta^2 f}{\delta_K y(x) \, \delta_K y(x')} \qquad (25)$$

This is an integral equation, and gives rise to a continuum of eigenvalues (which can again be used to test the nature of the extrema).
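The integral eigenvalue problem of Equations (24) and (25) can likewise be explored numerically. For the simple test case $f[y] = \int y^2 dx$ with the linear constraint $g[y] = \int y \, dx = K$ (our choice, not from the text), the kernel of Equation (22) reduces to $2\delta(x - x') - 2/(b - a)$: constant functions are eigenfunctions with eigenvalue 0 (the direction removed by the constraint), while zero-mean functions have eigenvalue 2. A discretized check (ours, in NumPy):

import numpy as np

N, (a, b) = 200, (0.0, 1.0)
dx = (b - a) / N

F = 2.0 * np.eye(N) / dx            # kernel 2*delta(x - x') on the grid
dg = np.ones(N)                     # dg/dy(x) = 1
D1 = dg @ dg * dx                   # D1 = b - a, Equation (23)

Fg = (F @ dg) * dx                  # int dx'' F(x, x'') dg/dy(x'')
gFg = dg @ Fg * dx                  # double integral of dg * F * dg

# Equation (22), with the second functional derivative of g equal to zero:
F_K = (F - np.outer(dg, Fg) / D1 - np.outer(Fg, dg) / D1
         + gFg * np.outer(dg, dg) / D1**2)

eig = np.sort(np.linalg.eigvalsh(F_K * dx))   # spectrum of the operator
print(eig[0], eig[-1])                        # ~0 (once) and ~2 (the rest)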

In addition, there are cases for which $f$ and the various constraints are functionals of multiple functions where, for example, we have $f[y_1(x), \ldots, y_n(x)]$ subject to the constraints $g_i[y_1(x), \ldots, y_n(x)] = K_i$ for $i = 1, \ldots, m$. Here, we can treat the various results as arising from the following "extended vectors", in which

$$\delta y(x) \to \delta y_1(x); \, \delta y_2(x); \, \ldots; \, \delta y_n(x)$$

$$\nabla f \to \frac{\delta f}{\delta y_1(x)}; \, \frac{\delta f}{\delta y_2(x)}; \, \ldots; \, \frac{\delta f}{\delta y_n(x)}$$

$$\nabla g_i \to \frac{\delta g_i}{\delta y_1(x)}; \, \frac{\delta g_i}{\delta y_2(x)}; \, \ldots; \, \frac{\delta g_i}{\delta y_n(x)} \qquad (26)$$

The above are again “column vectors”, with each set of functional derivatives appearing one after the other in succession and each spanning the “same height” within the column. So then, we have, for example, that

$$\nabla g_i^T \nabla g_j \to \sum_{k=1}^{n} \int dx \, \frac{\delta g_i}{\delta y_k(x)} \frac{\delta g_j}{\delta y_k(x)}$$

$$\nabla g_i^T \nabla f \to \sum_{k=1}^{n} \int dx \, \frac{\delta g_i}{\delta y_k(x)} \frac{\delta f}{\delta y_k(x)} \qquad (27)$$

Second functional derivatives can also be obtained, but the resulting expressions become complicated very quickly.

Finally, let us consider an example of the application of some of the above results. For a given curve in the $y$-$x$ plane, the differential arc length $ds$ is $(1 + (dy/dx)^2)^{1/2} dx$. Given two points $a$ and $b$ on the x-axis, the area enclosed by the curve and the x-axis is given by $\int_a^b y \, dx$. We can then ask the following question: what is the shape of the curve of fixed arc length $L$ joining the given points which, along with the x-axis, encloses the largest area? In other words, we want to maximize $f = \int_a^b y \, dx$ subject to the constraint

$$g = \int_a^b \left[ 1 + (y')^2 \right]^{1/2} dx = L,$$

where $y' = dy/dx$. The shape should be that of an arc of a circle that passes through the two points on the x-axis.

Now, with

$$\frac{\delta f}{\delta y(x)} = 1$$

$$\frac{\delta g}{\delta y(x)} = -\frac{y''}{\left[ 1 + (y')^2 \right]^{3/2}} = -\frac{d}{dx} \left( \frac{y'}{\left[ 1 + (y')^2 \right]^{1/2}} \right) \qquad (28)$$

then, according to Equation (20), at an extremum,

$$\frac{\delta f}{\delta y(x)} = \frac{\delta g}{\delta y(x)} \, \frac{\displaystyle \int dx' \, \frac{\delta g}{\delta y(x')} \frac{\delta f}{\delta y(x')}}{\displaystyle \int dx' \, \frac{\delta g}{\delta y(x')} \frac{\delta g}{\delta y(x')}} \qquad (29)$$

Instead of attempting to directly evaluate the above two integrals, we note that they have the same values for each $x$ (as they are evaluated over the same stationary curve $y$ and the same limits of integration, $x' = a$ to $x' = b$). Their ratio is therefore equal to some constant $\sigma$ (which is simply the Lagrange multiplier for this problem), and so Equations (28) and (29) indicate that

$$1 = -\frac{d}{dx} \left( \frac{\sigma y'}{\left[ 1 + (y')^2 \right]^{1/2}} \right) \qquad (30)$$

Integrating both sides, which introduces the integration constant $c_1$, leads upon rearrangement to

$$y' = \pm \frac{x + c_1}{\left[ \sigma^2 - (x + c_1)^2 \right]^{1/2}} \qquad (31)$$

An additional integration, and further rearrangements with a new constant of integration $c_2$, leads to the final result of

$$(y - c_2)^2 + (x + c_1)^2 = \sigma^2 \qquad (32)$$

which is, as expected, the equation of a circle with radius $\sigma$. If so desired, we can determine $\sigma$ using the constraint, which will relate it to $L$, $a$, and $b$. Three points are needed to specify uniquely a given circle, or equivalently, two points and the arc length. If $L$ is not much larger than $b - a$, the center of the circle will reside below the x-axis. If $L$ is much larger than $b - a$, the center of the circle will reside above the x-axis. (By symmetry, the center of the circle should reside halfway between $a$ and $b$, so that $c_1 = -(a + b)/2$.) Finally, this method provides some insight into why $\sigma$, or the Lagrange multiplier, is the radius of the circle. The second expression in Equation (28) indicates that the functional derivative of $g$ is the negative of the curvature of the curve $y$, or the negative inverse of the radius of curvature [18]. Hence, with Equation (29), the Lagrange multiplier is related to a particular weighted average of the curvature of the stationary curve. Since the curvature of a circle is constant, the Lagrange multiplier is seen (ex post facto) to be equal to the radius of the circle.
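As a quick numerical sanity check (ours; the circle parameters below are arbitrary sample values), an arc of the circle of Equation (32) can be confirmed to satisfy the stationarity condition of Equation (30) by finite differences:

import numpy as np

sigma, c1, c2 = 2.0, -1.0, 1.0                  # sample values of sigma, c1, c2
x = np.linspace(-0.9, 2.9, 2001)                # interior of the arc, |x + c1| < sigma
y = c2 + np.sqrt(sigma**2 - (x + c1)**2)        # upper arc of Equation (32)

yp = np.gradient(y, x)                          # y'
lhs = -np.gradient(sigma * yp / np.sqrt(1 + yp**2), x)   # LHS of Equation (30)

# Away from the endpoints the finite-difference error is small:
print(np.max(np.abs(lhs[50:-50] - 1.0)))        # a small number (of order 1e-4)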

4. Determination of the First- and Second-Order Explicitly Constrained Derivatives

Returning to the discrete case, or the finite variable set $x = x_1, x_2, \ldots, x_n$, we again note that the $K_1 K_2 \cdots K_m$-conserving gradient of $f$, or $\nabla_K f$, is the projection of the unconstrained gradient, $\nabla f$, onto the intersection of all the constraint surfaces. As a result, each one of the $n$ components of $\nabla_K f$ is, in general, not equal to zero, even though the $n$ variables cannot be independently varied. Since the effect of the constraints is implicitly, and not explicitly, accounted for in $\nabla_K f$, the specific meaning of its components is unclear (besides simply being the projection of $\nabla_K f$ onto each of the separate $n$ initially independent directions). While it is tempting to represent the $i$th component of $\nabla_K f$ as $(\partial f/\partial x_i)_K$, this partial derivative cannot be meaningful as written. Unlike what is implied for $\partial g/\partial x_i$, $(\partial f/\partial x_i)_K$ cannot also imply that all variables besides $x_i$ are held fixed, since otherwise the $K$-conserving constraints cannot be satisfied. Such a $K$-conserving partial derivative, if it is to be meaningfully written, must indicate explicitly which variables are held fixed and which variables are still allowed to vary. For the case of $n$ variables with a single constraint, only $n - 2$ variables can be held constant upon partial differentiation, with the variations of the remaining 2 variables being coupled via the constraint. For example, a meaningful constrained first derivative in this case can be represented by $(\partial f/\partial x_i)_{K, x[i,n]}$, in which we have arbitrarily chosen $x_n$ to be the additional parameter that is allowed to vary, and $x[i,n]$ denotes that all $x$ are to be held fixed except $x_i$ and $x_n$ (with $i \neq n$). There are of course several choices for the dependent or "floating" variable in these partial derivatives. But as presently derived, $\nabla_K f$ does not immediately provide information about any one of these possible derivatives $(\partial f/\partial x_i)_{K, x[i,n]}$.

4.1. Case of a Single Constraint

When there is a single constraint, the derivation of $(\partial f/\partial x_i)_{K, x[i,n]}$ is straightforward. Using the chain rule, we have that

$$\left( \frac{\partial f}{\partial x_i} \right)_{K, x[i,n]} = \left( \frac{\partial f}{\partial x_i} \right)_{x[i]} + \left( \frac{\partial f}{\partial x_n} \right)_{x[n]} \left( \frac{\partial x_n}{\partial x_i} \right)_{K, x[i,n]} \qquad (33)$$

where, for example, $(\partial f/\partial x_i)_{x[i]} = \partial f/\partial x_i$ is the partial derivative in which all variables are held fixed except $x_i$ (i.e., the standard unconstrained partial derivative). Noting for the single constraint that $dg = \nabla g^T \delta x = 0$, then

$$\left( \frac{\partial x_n}{\partial x_i} \right)_{K, x[i,n]} = -\frac{\partial g / \partial x_i}{\partial g / \partial x_n} \qquad (34)$$

a result that is only meaningful if $x_n$ appears explicitly within the constraint, i.e., $\partial g / \partial x_n \neq 0$. Equation (34) also follows from the implicit function theorem, or implicit differentiation [19]. Equations (33) and (34) can be combined to evaluate the constrained first-order derivative for any appropriate choice of $i$ and $n$.
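A short symbolic sketch (ours; the toy choices $f = x_1 x_2 x_3$ and $g = x_1 + x_2 + x_3 = K$ are not from the text) illustrates how Equations (33) and (34) combine, with the result cross-checked by eliminating the floating variable directly:

import sympy as sp

x1, x2, x3, K = sp.symbols('x1 x2 x3 K')
f = x1 * x2 * x3                          # toy function
g = x1 + x2 + x3                          # toy constraint, g = K

# Equation (34): (dx3/dx1)_{K, x[1,3]} = -(dg/dx1)/(dg/dx3)
dx3_dx1 = -sp.diff(g, x1) / sp.diff(g, x3)

# Equation (33): x2 held fixed, x3 floating
df_dx1_K = sp.diff(f, x1) + sp.diff(f, x3) * dx3_dx1
print(sp.factor(df_dx1_K))                # x2*(x3 - x1)

# Cross-check by substituting x3 = K - x1 - x2 before differentiating:
check = sp.diff(f.subs(x3, K - x1 - x2), x1).subs(K, x1 + x2 + x3)
print(sp.simplify(check - df_dx1_K))      # 0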

Since $Q_m$ is the projection matrix onto the $K$-conserving direction, and thus contains information on variations that reside within the intersection of all of the constraint surfaces, Equations (33) and (34) can likewise be obtained using various properties of $Q_m$. We therefore repeat the derivation of these equations, along with generating some additional new relations, making use of $Q_m$. While for a single constraint this approach is admittedly less straightforward than how Equation (33) was already derived, generating these relations in another way nevertheless proves to be useful. For the case of multiple constraints, the use of $Q_m$ and its various properties greatly simplifies the derivation of the explicitly constrained first- and second-order derivatives.

To begin, and to illustrate the approach, we again consider the case in which there is a single constraint, $g$. Now, a set of vectors that reside within, or are tangent to, the constraint surface can be generated from the eigenvectors of $Q_1$. If $u^T = [u_1, u_2, \ldots, u_n]$ denotes an eigenvector of $Q_1$ corresponding to one of its $n - 1$ eigenvalues with value 1, then by definition $Q_1 u = u$, i.e., $u$ already resides within the constraint surface. With Equation (1), $Q_1 u = u$ reduces to $\nabla g (\nabla g^T u) = 0$, which yields $n$ identical equations, all of which are given by

$$\left( \frac{\partial g}{\partial x_1} \right) u_1 + \left( \frac{\partial g}{\partial x_2} \right) u_2 + \cdots + \left( \frac{\partial g}{\partial x_n} \right) u_n = 0 \qquad (35)$$

Since $u$, as an eigenvector of $Q_1$, is orthogonal to $\nabla g$, the vector normal to the constraint surface, we have that $\nabla g^T u = 0$. To generate a set of $n - 1$ eigenvectors that all correspond to Equation (35), we need to set the values of $n - 2$ components of $u$ in the above equation.

For example, if we are interested in the relative variations of, say, $x_1$ and $x_n$, such that $x_2$ through $x_{n-1}$ are constant and the constraint is maintained, then an eigenvector that corresponds to such a variation is obtained from Equation (35) upon setting $u_2$ through $u_{n-1}$ equal to zero, or $(\partial g/\partial x_1) u_1 = -(\partial g/\partial x_n) u_n$. If we are further interested in the resulting variation in $x_n$ relative to a given variation in $x_1$, then we set $u_1 = 1$ and solve for $u_n = -(\partial g/\partial x_1)/(\partial g/\partial x_n)$, which corresponds to Equation (34). The resulting eigenvector that is in the direction of this variation is $u^T = [1, 0, 0, \ldots, -(\partial g/\partial x_1)/(\partial g/\partial x_n)]$. More formally, the variation of $x_1$ and $x_n$ (based on a change in $x_1$), such that $x_2$ through $x_{n-1}$ are constant and the constraint is maintained, is given by $\delta_{K,[1,n]} x = \delta x_1 u$. Now, the $K$-conserving first-order variation of $f$ such that $x_2$ through $x_{n-1}$ are also constant, or $\delta_{K,[1,n]} f$, is generated by projecting $\nabla f$ onto the direction of this eigenvector. Hence,

$$\delta_{K,[1,n]} f = \delta x_1 \left( \frac{\partial f}{\partial x_1} \right)_{K, x[1,n]} = \delta_{K,[1,n]} x^T \nabla f = \delta x_1 u^T \nabla f \qquad (36)$$

or $(\partial f/\partial x_1)_{K, x[1,n]} = u^T \nabla f$, a result which upon evaluation matches Equations (33) and (34).

By again selecting $x_n$ to be the variable that depends upon the variation of one of the other variables, we can generate the remaining $n - 2$ eigenvectors corresponding to Equation (35) in exactly the same manner. The full set of $n - 1$ eigenvectors, for the arbitrary choice of $x_n$ as the dependent variable, is $u_1^T = [1, 0, 0, \ldots, -(\partial g/\partial x_1)/(\partial g/\partial x_n)]$, $u_2^T = [0, 1, 0, \ldots, -(\partial g/\partial x_2)/(\partial g/\partial x_n)]$, $\ldots$, $u_{n-1}^T = [0, 0, 0, \ldots, 1, -(\partial g/\partial x_{n-1})/(\partial g/\partial x_n)]$. Note that these $n - 1$ eigenvectors are linearly independent, since $c_1 u_1 + c_2 u_2 + \cdots + c_{n-1} u_{n-1} = 0$ requires each of the constants to equal zero, or $c_1 = c_2 = \cdots = c_{n-1} = 0$. Hence, these eigenvectors form a basis set to describe all other variations that reside in the constraint surface. (There are of course other linearly independent basis sets that can be generated, based on the choice of which variable is selected as the common dependent variable.) The projection of $\nabla f$ onto each of these eigenvectors generates in turn the following $n - 1$ constrained first-order derivatives of $f$: $(\partial f/\partial x_1)_{K, x[1,n]}$, $(\partial f/\partial x_2)_{K, x[2,n]}$, $\ldots$, $(\partial f/\partial x_{n-1})_{K, x[n-1,n]}$.

With $n$ choices for the first-order variation of interest, followed by $n - 1$ additional choices for the dependent variable, there are $n(n-1)$ $K$-conserving first-order derivatives of $f$ that can be generated. Nevertheless, Equations (33) and (34) imply, for example, that $(\partial g/\partial x_n)(\partial f/\partial x_i)_{K, x[i,n]} + (\partial g/\partial x_i)(\partial f/\partial x_n)_{K, x[i,n]} = 0$. Hence, not all of these derivatives are independent. The generation of a linearly independent set of $n - 1$ eigenvectors indicates that only $n - 1$ of these derivatives are independent (as the eigenvector corresponding to any one of the $n(n-1)$ derivatives can be obtained via a linear combination of the chosen $n - 1$ basis set).

Thus, at an extremum, each one of these linearly independent $K$-conserving first-order derivatives should vanish. Unlike the extremal condition based on $\nabla_K f = 0$, which corresponds to $n$ equations, the use of, for example, $(\partial f/\partial x_i)_{K, x[i,n]} = 0$ ($i = 1, \ldots, n-1$) at an extremum corresponds to only $n - 1$ equations (which is consistent with the fact that there are only $n - 1$ degrees of freedom for this system with a single constraint). Although there is a reduction in the number of conditions to solve at an extremum, additional work is required for this approach. The application of $\nabla_K f = 0$ is rather straightforward, and still seems to be the preferred approach, as it avoids the additional required step of generating the eigenvectors of $Q_1$ (although the eigenvectors for the case of a single constraint are rather straightforward to obtain).

All of these $n - 1$ independent variations can be represented in the following manner. Let $U = [u_1 \, u_2 \cdots u_{n-1}]$ be the $n \times (n-1)$ matrix whose columns are the linearly independent eigenvectors obtained earlier. Hence, $\delta_{K,[i,n]} x_{n-1} = U \delta x_{n-1}$, where $\delta x_{n-1}$ indicates that we are now only dealing with the first $n - 1$ variations, and does not include the variation of the $n$th variable. The constrained first-order variation of $f$ based on these linearly independent $n - 1$ variations is therefore given by

$$\delta_K f = \delta_{K,[i,n]} x_{n-1}^T \nabla f = \delta x_{n-1}^T U^T \nabla f \qquad (37)$$

Hence, $\nabla_{K,[i,n]} f = U^T \nabla f$, where the $i$th component of $\nabla_{K,[i,n]}$ is $(\partial/\partial x_i)_{K, x[i,n]}$. Note that, just by considering their dimensions, $\nabla_K f = Q_1 \nabla f \neq U^T \nabla f = \nabla_{K,[i,n]} f$. But a connection between these two gradients can be obtained. Since the columns of $U$ are eigenvectors of $Q_1$, then $Q_1 U = U$, and so (after transposing both sides) $U^T Q_1 = U^T$. Therefore,

$$\nabla_{K,[i,n]} f = U^T \nabla f = U^T Q_1 \nabla f = U^T \nabla_K f \qquad (38)$$

which indicates that the individual terms of $\nabla_{K,[i,n]} f$ are the further projections of $\nabla_K f$ onto each of the linearly independent eigenvectors of $Q_1$, e.g., $\nabla_{K,[1,n]} f = (\partial f/\partial x_1)_{K, x[1,n]} = u_1^T \nabla_K f = u_1^T \nabla f$.

Now, by the construction of the eigenvectors, we have that

$$\left( \frac{\partial g}{\partial x_i} \right) \left( \frac{\partial g}{\partial x_n} \right) u_i^T \nabla f = \left( \frac{\partial g}{\partial x_i} \right) \left( \frac{\partial g}{\partial x_n} \right) \left( \frac{\partial f}{\partial x_i} \right) - \left( \frac{\partial g}{\partial x_i} \right)^2 \left( \frac{\partial f}{\partial x_n} \right) = \left( \frac{\partial g}{\partial x_i} \right) \left( \frac{\partial g}{\partial x_n} \right) \left( \frac{\partial f}{\partial x_i} \right)_{K, x[i,n]} \qquad (39)$$

Summing the above from $i = 1, \ldots, n-1$, and then comparing with $\nabla_K f = Q_1 \nabla f$, we find after rearrangement that the $n$th component of $\nabla_K f$ is given by

$$\left( \frac{\partial f}{\partial x_n} \right)_K \equiv \frac{\partial f}{\partial x_n} - \frac{1}{D_1} \left( \frac{\partial g}{\partial x_n} \right) \left( \nabla g^T \nabla f \right) = -\frac{1}{D_1} \left( \frac{\partial g}{\partial x_n} \right) \sum_{i=1}^{n-1} \left( \frac{\partial g}{\partial x_i} \right) \left( \frac{\partial f}{\partial x_i} \right)_{K, x[i,n]} \qquad (40)$$

which provides a connection between the components of $\nabla_K f$ and the various constrained first-order derivatives. Equation (40) applies to any component of $\nabla_K f$, or to whichever variable is allowed to "float", simply by replacing $n$ with $j$.

Explicitly constrained second-order derivatives of $f$, such as $(\partial^2 f/\partial x_1^2)_{K, x[1,n]}$, can also be obtained, and are best generated by using the operator formalism and the repeated application of the chain rule. For example, Equations (33) and (36) imply that

$$\left( \frac{\partial}{\partial x_1} \right)_{K, x[1,n]} = \frac{\partial}{\partial x_1} + \left( \frac{\partial x_n}{\partial x_1} \right)_{K, x[1,n]} \frac{\partial}{\partial x_n} = u_1^T \nabla \qquad (41)$$

Thus, by making sure to maintain the repeated and correct ordering of the gradient operator, one can show that

$$\left( \frac{\partial^2}{\partial x_1^2} \right)_{K, x[1,n]} = u_1^T \nabla \left( u_1^T \nabla \right) = u_1^T \left( \nabla u_1^T \right) \nabla + u_1 u_1^T : \nabla \nabla^T \qquad (42)$$

where the second term on the far-right side corresponds to the Frobenius inner product of the two matrices. Consequently,

$$\left( \frac{\partial^2 f}{\partial x_1^2} \right)_{K, x[1,n]} = u_1^T \left( \nabla u_1^T \right) \nabla f + u_1 u_1^T : F = u_1^T \left( \nabla u_1^T \right) \nabla f + u_1^T F u_1 \qquad (43)$$

Similar operations can be performed to obtain the mixed second-order derivatives, e.g., first applying $(\partial/\partial x_1)_{K, x[1,n]}$ followed by $(\partial/\partial x_2)_{K, x[2,n]}$. In general, the result becomes

$$\left( \frac{\partial}{\partial x_i} \right)_{K, x[i,n]} \left( \frac{\partial f}{\partial x_j} \right)_{K, x[j,n]} = u_i^T \left( \nabla u_j^T \right) \nabla f + u_i u_j^T : F = u_i^T \left( \nabla u_j^T \right) \nabla f + u_i^T F u_j \qquad (44)$$

As these are derivatives that explicitly maintain the constraint (for an arbitrary subset of variables), Equation (44) indicates that these mixed second derivatives ($i \neq j$) are not necessarily equal, since in general $u_i^T (\nabla u_j^T) \nabla f \neq u_j^T (\nabla u_i^T) \nabla f$. There are various types of constraints, or forms of the constraint function $g$, for which these terms are equal, and so the symmetric nature of the mixed second derivatives is recovered in those cases.

A matrix of these explicit $K$-conserving second-order derivatives of $f$ can be obtained from Equation (44). For example, the $K$-conserving second-order variation of $f$ based on the chosen linearly independent $n - 1$ variations is, as before, given by

$$\delta_K^2 f = \delta_K \delta_K f = \left( \delta_{K,[i,n]} x_{n-1}^T \nabla \right) \left( \delta_{K,[i,n]} x_{n-1}^T \nabla f \right) = \left( \delta x_{n-1}^T U^T \nabla \right) \left( \delta x_{n-1}^T U^T \nabla f \right) = \delta x_{n-1}^T U^T \nabla \left( U^T \nabla f \right)^T \delta x_{n-1} \qquad (45)$$

which allows us to introduce the following new matrix

$$F_{K,n-1} = U^T \nabla \left( U^T \nabla f \right)^T, \qquad \left( F_{K,n-1} \right)_{ij} = u_i^T \left( \nabla u_j^T \right) \nabla f + u_i^T F u_j \qquad (46)$$

in accord with Equation (44). The above also could have been obtained by defining $F_{K,n-1} \equiv \nabla_{K,[i,n]} \nabla_{K,[i,n]}^T f$ and again noting that $\nabla_{K,[i,n]} f = U^T \nabla f$. This is an $(n-1) \times (n-1)$ matrix that is not necessarily symmetric. The eigenvalues of $F_{K,n-1}$ can likewise be used to determine the nature of the extrema, and given its smaller size, $F_{K,n-1}$ may be more convenient to evaluate than $F_K$.

As an illustration of the new approach, consider $f = x_1 x_2$ subject to the constraint $g = x_1^2 + x_2^2 = 1$. Upon direct substitution using the constraint, $f = \pm x_1 (1 - x_1^2)^{1/2}$ (where the $+$ sign corresponds to $x_2 > 0$ and the $-$ sign to $x_2 < 0$). Therefore, $(df/dx_1)_{K, x[1,2]} = \pm (1 - 2x_1^2)/(1 - x_1^2)^{1/2} = (x_2^2 - x_1^2)/x_2$ and $(d^2 f/dx_1^2)_{K, x[1,2]} = \pm x_1 (2x_1^2 - 3)/(1 - x_1^2)^{3/2} = x_1 (x_1^2 - x_2^2 - 2)/x_2^3$. This function has several constrained extrema [1], at which $(\partial f/\partial x_1)_{K, x[1,2]} = 0$ (taking $x_2 \neq 0$), which implies that $x_2^2 = x_1^2$. With the constraint, there are then four extrema, occurring at $x_1 = \pm 1/\sqrt{2}$ and $x_2 = \pm 1/\sqrt{2}$.

Now, for this problem, and using Equation (1),

$$Q_1 = \begin{pmatrix} 1 - x_1^2 & -x_1 x_2 \\ -x_1 x_2 & 1 - x_2^2 \end{pmatrix} = \begin{pmatrix} x_2^2 & -x_1 x_2 \\ -x_1 x_2 & x_1^2 \end{pmatrix} \qquad (47)$$

Thus, $U^T = u_1^T = (1, -x_1/x_2)$, and so

$$\left( \frac{\partial f}{\partial x_1} \right)_{K, x[1,2]} = u_1^T \nabla f = \frac{\partial f}{\partial x_1} - \frac{x_1}{x_2} \frac{\partial f}{\partial x_2} = x_2 - \frac{x_1^2}{x_2} = \frac{x_2^2 - x_1^2}{x_2} \qquad (48)$$

matching the result obtained by direct substitution. In addition, from Equation (43)

$$\begin{aligned} \left( \frac{\partial^2 f}{\partial x_1^2} \right)_{K, x[1,2]} &= \left( 1, -\frac{x_1}{x_2} \right) \begin{pmatrix} 0 & -\dfrac{1}{x_2} \\ 0 & \dfrac{x_1}{x_2^2} \end{pmatrix} \begin{pmatrix} x_2 \\ x_1 \end{pmatrix} + \begin{pmatrix} 1 & -\dfrac{x_1}{x_2} \\ -\dfrac{x_1}{x_2} & \dfrac{x_1^2}{x_2^2} \end{pmatrix} : \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \\ &= -\frac{x_1}{x_2} \left( 1 + \frac{x_1^2}{x_2^2} \right) - \frac{2 x_1}{x_2} = -\frac{x_1 (1 + 2 x_2^2)}{x_2^3} = \frac{x_1 (x_1^2 - x_2^2 - 2)}{x_2^3} \end{aligned} \qquad (49)$$

again matching the previous result. In addition, at the extrema, Equation (49) indicates that $(\partial^2 f/\partial x_1^2)_{K, x[1,2]} = -2 x_1 / x_2^3 = -4 x_1 / x_2$. Hence, the two extrema in which $x_1$ and $x_2$ have the same sign correspond to maxima, while those with opposite signs correspond to minima.
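The same bookkeeping can be delegated to a computer algebra system. The sketch below (ours, using SymPy; not part of the original paper) reproduces Equations (48) and (49) from the eigenvector $u_1$:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1 * x2
u = sp.Matrix([1, -x1 / x2])                      # u_1, first component set to 1
grad_f = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2)])

df_K = sp.simplify((u.T * grad_f)[0])
print(df_K)                                       # (x2**2 - x1**2)/x2, Equation (48)

# grad u^T: entry (k, j) = d u_j / d x_k, as in Equation (49)
grad_uT = sp.Matrix([[sp.diff(u[j], v) for j in range(2)] for v in (x1, x2)])
F = sp.hessian(f, (x1, x2))
d2f_K = sp.simplify((u.T * grad_uT * grad_f)[0] + (u.T * F * u)[0])
print(d2f_K)                                      # -x1*(x1**2 + 3*x2**2)/x2**3,
# which equals x1*(x1**2 - x2**2 - 2)/x2**3 once x1**2 + x2**2 = 1 is imposed.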

4.2. Case of Multiple Constraints

Meaningful constrained first-order derivatives of $f$ for the case of multiple constraints can be obtained in a manner similar to what was done for a single constraint. With $n$ total variables and $m$ constraints ($m < n$), there are now only $n - m$ degrees of freedom. Hence, we can determine an explicitly constrained first-order derivative of $f$ with respect to a variation in $x_i$, with $m$ additional variables also allowed to vary because of the constraints, and with the remaining $n - (m + 1)$ variables held fixed. If, for example, $x_{n-(m-1)}$ through $x_n$ are chosen to be the $m$ dependent or "floating" parameters, then via the chain rule we have that

$$\left( \frac{\partial f}{\partial x_i} \right)_{K, x[i, n-(m-1), \ldots, n]} = \frac{\partial f}{\partial x_i} + \left( \frac{\partial f}{\partial x_{n-(m-1)}} \right) \left( \frac{\partial x_{n-(m-1)}}{\partial x_i} \right)_{K, x[i, n-(m-1), \ldots, n]} + \cdots + \left( \frac{\partial f}{\partial x_n} \right) \left( \frac{\partial x_n}{\partial x_i} \right)_{K, x[i, n-(m-1), \ldots, n]} \qquad (50)$$

in which $x[i, n-(m-1), \ldots, n]$ again denotes the variables that are not held constant in the derivatives, $K \equiv K_1, \ldots, K_m$, and, for example, $\partial f/\partial x_i = (\partial f/\partial x_i)_{x[i]}$. Unlike the case of a single constraint, due to the complicated and possibly coupled nature of the multiple constraints, a direct determination of, for example, $(\partial x_n/\partial x_i)_{K, x[i, n-(m-1), \ldots, n]}$ may be cumbersome. But as before, information about such derivatives is contained within $Q_m$, or more precisely within the eigenvectors of $Q_m$.

Again, a set of vectors that reside within, or are tangent to, the constraint surfaces can be generated from the eigenvectors of $Q_m$ that have an eigenvalue equal to one, for which $Q_m u = u$. Since these $n - m$ eigenvectors are already in the $K$-conserving direction, they are of course also orthogonal to all the gradient vectors of the various constraint surfaces. In other words, this eigenvalue equation is equivalent to the following set of orthogonality conditions, $\nabla g_k^T u = 0$ for $k = 1, \ldots, m$, which proves more convenient for generating the various eigenvectors. (With $u^T Q_m = u^T$ and $Q_m \nabla g_k = 0$, then $u^T \nabla g_k = u^T Q_m \nabla g_k = 0$; equivalently, $\nabla g_k^T Q_m u = \nabla g_k^T u$, but $\nabla g_k^T Q_m = 0$, so $\nabla g_k^T u = 0$. Again, there is a connection between these eigenvectors and the implicit function theorem for these multiple constraints.) We therefore have $m$ separate equations with $n$ variables, for a total of $n - m$ degrees of freedom. For each of the possible $n - m$ eigenvectors, we therefore set the value of one component equal to 1 (the variable $x_i$ whose derivative we are evaluating in Equation (50)), while setting equal to zero the values of $n - (m + 1)$ other components of the eigenvector (those held strictly fixed in Equation (50)). The orthogonality conditions are then solved to obtain the relative variations of the remaining $m$ dependent variables (again all relative to changes in $x_i$).

If any one of these eigenvectors is denoted as $u_i$, then as before

$$\left( \frac{\partial f}{\partial x_i} \right)_{K, x[i, n-(m-1), \ldots, n]} = u_i^T \nabla f \qquad (51)$$

Generating the $n - m$ eigenvectors in this way again yields a set of linearly independent vectors, which will therefore form a basis set to describe other variations that reside in all of the constraint surfaces. And as before, all of these $n - m$ relevant variations can be represented through the use of the matrix $U$, which is now the $n \times (n - m)$ matrix whose columns are the linearly independent eigenvectors just obtained. Hence, as before,

$$\nabla_{K,[i, n-(m-1), \ldots, n]} f = U^T \nabla f = U^T Q_m \nabla f = U^T \nabla_K f \qquad (52)$$

Furthermore, explicitly constrained second-order derivatives of $f$ can again be generated. If, for example, $x_{n-(m-1)}$ through $x_n$ are the $m$ "floating" parameters, then

$$\left( \frac{\partial}{\partial x_i} \right)_{K, x[i, n-(m-1), \ldots, n]} \left( \frac{\partial f}{\partial x_j} \right)_{K, x[j, n-(m-1), \ldots, n]} = u_i^T \left( \nabla u_j^T \right) \nabla f + u_i u_j^T : F = u_i^T \left( \nabla u_j^T \right) \nabla f + u_i^T F u_j \qquad (53)$$

Again, these mixed second derivatives ($i \neq j$) are not necessarily equal, since in general $u_i^T (\nabla u_j^T) \nabla f \neq u_j^T (\nabla u_i^T) \nabla f$.

Finally, we introduce the following matrix

$$F_{K,n-m} = U^T \nabla \left( U^T \nabla f \right)^T, \qquad \left( F_{K,n-m} \right)_{ij} = u_i^T \left( \nabla u_j^T \right) \nabla f + u_i^T F u_j \qquad (54)$$

which is formally identical to Equation (46). The eigenvalues of $F_{K,n-m}$ can again be used to determine the nature of the extrema and, given its smaller size, may again be more convenient to evaluate.

As an example of the application of some of the above results, consider the function $f = x_1^2 + x_2^2 + x_3^2$ subject to the two constraints $g_1 = 2 x_1 + x_2 = 1$ and $g_2 = x_1 + x_3 = 2$. From direct substitution of the two constraints into the function, $f = x_1^2 + (1 - 2 x_1)^2 + (2 - x_1)^2 = 6 x_1^2 - 8 x_1 + 5$, and so $df/dx_1 = 12 x_1 - 8$ and $d^2 f/dx_1^2 = 12$. For this problem [1], a constrained extremum exists at $x_1 = 2/3$, where $df/dx_1 = 12 x_1 - 8 = 0$, and it is a minimum.

Now, for this problem,

$$Q_2 = \frac{1}{6} \begin{vmatrix} 5 & 2 & \nabla g_1^T \\ 2 & 2 & \nabla g_2^T \\ \nabla g_1 & \nabla g_2 & I \end{vmatrix} = \frac{1}{3} \nabla g_1 \nabla g_2^T + \frac{1}{3} \nabla g_2 \nabla g_1^T - \frac{1}{3} \nabla g_1 \nabla g_1^T - \frac{5}{6} \nabla g_2 \nabla g_2^T + I = \frac{1}{6} \begin{pmatrix} 1 & -2 & -1 \\ -2 & 4 & 2 \\ -1 & 2 & 1 \end{pmatrix} \qquad (55)$$

The single eigenvector of $Q_2$ with eigenvalue of 1 can also be obtained from the two conditions, $\nabla g_1^T u = 0$ and $\nabla g_2^T u = 0$, or $2 u_1 + u_2 = 0$ and $u_1 + u_3 = 0$. If we are interested in obtaining the explicitly constrained derivative with respect to $x_1$ (i.e., $u_1 = 1$), the eigenvector satisfying these two conditions is $u^T = (1, -2, -1)$. (Note that $g_1$ represents a plane parallel to the $x_3$-direction with a normal vector $(2, 1, 0)$, while $g_2$ represents a plane parallel to the $x_2$-direction with a normal vector $(1, 0, 1)$. The vector residing along the line resulting from the intersection of these two constraint planes must be orthogonal to both normal vectors, which in this case can also be obtained from their cross product, $\nabla g_1 \times \nabla g_2 = (1, -2, -1)$, and is identical to $u$.) Thus,

$$\left( \frac{\partial f}{\partial x_1} \right)_{K, x[1,2,3]} = u^T \nabla f = 2 x_1 - 4 x_2 - 2 x_3 = 12 x_1 - 8$$

(upon the subsequent use of the constraints). In addition,

$$\left( \frac{\partial^2 f}{\partial x_1^2} \right)_{K, x[1,2,3]} = u^T \left( \nabla u^T \right) \nabla f + u u^T : F = 12$$

(also noting that $\nabla u^T = 0$), as found before. We can equivalently generate, for this problem, the now $1 \times 1$ matrix from Equation (54), $F_{K,n-2} = u^T F u = 12$, again confirming that the extremum is a minimum.
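Numerically (our sketch; NumPy again), the eigenvector can be taken from the cross product of the two constraint gradients, after which the constrained first and second derivatives follow immediately:

import numpy as np

g1, g2 = np.array([2.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])  # grad g1, grad g2
u = np.cross(g1, g2)                     # (1, -2, -1) spans the K-conserving direction
u = u / u[0]                             # normalize so that u1 = 1

x_star = np.array([2/3, -1/3, 4/3])      # the constrained extremum
grad_f = 2.0 * x_star
print(u @ grad_f)                        # ~0: (df/dx1)_K vanishes at the extremum

F = 2.0 * np.eye(3)                      # Hessian of f
print(u @ F @ u)                         # 12.0 = F_{K,n-2} > 0, so a minimum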

Finally, again consider finding the extrema of $f = x_1^2 + x_2^2 + x_3^2$, but now subject to the two constraints $g_1 = x_1 - x_2 + x_3 = 2$ and $g_2 = x_1^2 + x_2^2 = 4$. (For this problem, we are identifying those points along the intersection of a plane and a cylinder, or an ellipse oriented in three-dimensional space, that are closest to and furthest from the origin.) As shown in Ref. [1], using Equations (7) and (12), there are four extrema, located at $(0, -2, 0)$, $(2, 0, 0)$, $(\sqrt{2}, -\sqrt{2}, 2 - 2\sqrt{2})$, and $(-\sqrt{2}, \sqrt{2}, 2 + 2\sqrt{2})$, for which $(\pm\sqrt{2}, \mp\sqrt{2}, 2 \mp 2\sqrt{2})$ correspond to local maxima of the distance from the origin, while the other two extremal points are the two closest points to the origin (minima).

The single eigenvector of $Q_2$ with eigenvalue of 1 can be obtained from the two conditions, $\nabla g_1^T u = 0$ and $\nabla g_2^T u = 0$, or $u_1 - u_2 + u_3 = 0$ and $x_1 u_1 + x_2 u_2 = 0$. If we are interested in obtaining the explicitly constrained derivative with respect to $x_1$ (i.e., $u_1 = 1$), the eigenvector satisfying these two conditions is

$$u^T = \left( 1, \; -\frac{x_1}{x_2}, \; -\frac{x_1 + x_2}{x_2} \right).$$

(Again note that the vector residing in the direction along the intersection of these two constraint surfaces is proportional to $\nabla g_1 \times \nabla g_2 = (-2 x_2, 2 x_1, 2 x_1 + 2 x_2)$, which is identical to $-2 x_2 u$.) Thus,

$$\left( \frac{\partial f}{\partial x_1} \right)_{K, x[1,2,3]} = u^T \nabla f = -\frac{2 x_3 (x_1 + x_2)}{x_2} \qquad (56)$$

which also follows from the direct substitution of the constraints into $f$. At an extremum, this derivative is equal to zero, which can be satisfied for either $x_3 = 0$ or $x_1 + x_2 = 0$ (with $x_2 \neq 0$). For $x_1 + x_2 = 0$, or $x_1 = -x_2$, the second constraint indicates that $x_1 = \pm\sqrt{2}$, and so $x_2 = \mp\sqrt{2}$, which then requires that $x_3 = 2 \mp 2\sqrt{2}$ upon using the first constraint. For $x_3 = 0$, the first constraint reduces to $x_1 - x_2 = 2$, which when substituted into the second constraint leads to the following two possible cases: $x_1 = 0$, $x_2 = -2$, and $x_1 = 2$, $x_2 = 0$. While the latter case is one of the extrema identified in Ref. [1], this result must nevertheless be ignored here, since it corresponds to $x_2 = 0$, where $u$ is not defined. Hence, for this problem, Equation (56) can be used to identify three of the four previously determined extremal points. In addition,

$$\left( \frac{\partial^2 f}{\partial x_1^2} \right)_{K, x[1,2,3]} = u^T \left( \nabla u^T \right) \nabla f + u u^T : F = \frac{2 \left[ x_2 (x_1 + x_2)^2 - x_3 (x_1^2 + x_2^2) \right]}{x_2^3} \qquad (57)$$

At $(0, -2, 0)$, $(\partial^2 f/\partial x_1^2)_{K, x[1,2,3]} > 0$, which corresponds to a local minimum. At $(\sqrt{2}, -\sqrt{2}, 2 - 2\sqrt{2})$ and $(-\sqrt{2}, \sqrt{2}, 2 + 2\sqrt{2})$, both result in $(\partial^2 f/\partial x_1^2)_{K, x[1,2,3]} < 0$, indicating that these are local maxima (all results that were determined in Ref. [1]).

(Note that the method in Ref. [1] makes use of a vector in the direction along the intersection of the two constraints that always has a finite magnitude. Hence, such a vector remains well-behaved all along the ellipse, even at $x_1 = 0$ or $x_2 = 0$, so that Equations (7) and (12) are themselves able to identify all four extremal points. While the eigenvector $u$ also points in the same direction along the intersection of these two constraints, it is not necessarily required to have a finite magnitude. Since these explicitly constrained derivatives consider how the dependent, or "floating", variables change due to a "unit variation" of the chosen independent variable, $u$ is instead required to always have a finite value (equal to 1) for one of its components. In order to keep $u$ pointing in the correct direction around the ellipse, some of the other components of $u$ may therefore diverge, as occurs, for example, when $x_2 = 0$. Thus, these explicitly constrained derivatives may not, in general, be able to identify all relevant extremal points.)

To identify the other extremal point, we evaluate the explicitly constrained derivatives with respect to $x_2$. With $u_2 = 1$, the appropriate eigenvector is now given by $u^T = \left( -\frac{x_2}{x_1}, \; 1, \; \frac{x_1 + x_2}{x_1} \right)$, which leads to

$$\left( \frac{\partial f}{\partial x_2} \right)_{K, x[1,2,3]} = u^T \nabla f = \frac{2 x_3 (x_1 + x_2)}{x_1} \qquad (58)$$

Equation (58) again identifies the two extremal points corresponding to $x_1 + x_2 = 0$. But now, with $x_1 \neq 0$, the case of $x_3 = 0$ can be used to identify the fourth extremal point, at $x_1 = 2$, $x_2 = 0$. Moreover,

$$\left( \frac{\partial^2 f}{\partial x_2^2} \right)_{K, x[1,2,3]} = \frac{2 \left[ x_1 (x_1 + x_2)^2 + x_3 (x_1^2 + x_2^2) \right]}{x_1^3} \qquad (59)$$

which indicates that $(2, 0, 0)$ corresponds to a local minimum. These constrained derivatives allow for the straightforward determination of both the extrema and their nature, all of which were obtained without any knowledge of the Lagrange multipliers.

Finally, for completeness, we also consider the explicitly constrained derivatives with respect to $x_3$. Now, with $u^T = \left( -\frac{x_2}{x_1 + x_2}, \; \frac{x_1}{x_1 + x_2}, \; 1 \right)$, then $(\partial f/\partial x_3)_{K, x[1,2,3]} = 2 x_3$ and $(\partial^2 f/\partial x_3^2)_{K, x[1,2,3]} = 2$. (Both of these results directly follow from the substitution of the second constraint into the function, or $f = x_1^2 + x_2^2 + x_3^2 = 4 + x_3^2$.) These constrained derivatives only provide information about the two extremal points at which $x_3 = 0$.
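The algebra for this example can again be checked symbolically (our sketch, using SymPy), including the evaluation of Equation (57) at two of the extrema:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 + x2**2 + x3**2
u = sp.Matrix([1, -x1/x2, -(x1 + x2)/x2])          # eigenvector with u1 = 1 (x2 != 0)
grad_f = sp.Matrix([sp.diff(f, v) for v in (x1, x2, x3)])

print(sp.simplify((u.T * grad_f)[0]))              # -2*x3*(x1 + x2)/x2, Equation (56)

grad_uT = sp.Matrix([[sp.diff(u[j], v) for j in range(3)] for v in (x1, x2, x3)])
F = sp.hessian(f, (x1, x2, x3))
d2f_K = sp.simplify((u.T * grad_uT * grad_f)[0] + (u.T * F * u)[0])
print(sp.factor(d2f_K))                            # Equation (57)

for p in [(0, -2, 0), (sp.sqrt(2), -sp.sqrt(2), 2 - 2*sp.sqrt(2))]:
    print(p, sp.simplify(d2f_K.subs(dict(zip((x1, x2, x3), p)))))
# (0, -2, 0) gives 2 > 0 (a minimum); the second point gives 4*sqrt(2) - 8 < 0 (a maximum)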

5. Conclusions

Through the use of linear algebra, and in particular the Gram-Schmidt orthogonalization process, the local extrema of functions whose independent variables are also subject to constraints were determined without the use of Lagrange multipliers. This approach was extended to the continuum limit, thereby obtaining the extrema of functionals subject to one or more constraints. The direct determination of the explicitly constrained first- and second-order derivatives was also considered, and various additional expressions were obtained for generating information about the nature of the extrema (again without the need to evaluate any Lagrange multipliers).

As noted previously, avoiding the use of Lagrange multipliers may simplify the analysis of various problems of interest. As our method can also be used to obtain immediate expressions for the Lagrange multipliers (in terms of the gradients of the function and constraints), the relations provided here may yield new and useful insights into several physical problems of longstanding interest that also require the imposition of various constraints. Such claims remain to be tested, and future work should be directed towards the application of the results provided here to standard problems of interest (e.g., the use of the Gibbs entropy formula along with appropriate constraints to generate the probability distributions for the various ensembles of statistical mechanics), as well as to some new ones.

Appendix. Derivation of the Functional Derivatives by Analogy

Starting with

$$\tilde{Q}_m = \begin{vmatrix} \nabla g_1^T \nabla g_1 & \cdots & \nabla g_1^T \nabla g_m & \nabla g_1^T \\ \vdots & \ddots & \vdots & \vdots \\ \nabla g_m^T \nabla g_1 & \cdots & \nabla g_m^T \nabla g_m & \nabla g_m^T \\ \nabla g_1 & \cdots & \nabla g_m & I \end{vmatrix} \qquad (60)$$

and

$$F_K = \frac{1}{D_m^2} \tilde{Q}_m \nabla \left( \tilde{Q}_m \nabla f \right)^T \qquad (61)$$

which is only appropriate at the constrained extrema, we note for the case of a single constraint that

$$\tilde{Q}_1 = D_1 I - \nabla g \nabla g^T \qquad (62)$$

where $D_1 = \nabla g^T \nabla g$. Thus, $(\tilde{Q}_1 \nabla f)^T = D_1 \nabla f^T - (\nabla g^T \nabla f) \nabla g^T$. Hence,

$$\nabla \left( \tilde{Q}_1 \nabla f \right)^T = (\nabla D_1) \nabla f^T + D_1 \nabla \nabla f^T - \left[ \nabla (\nabla g^T \nabla f) \right] \nabla g^T - (\nabla g^T \nabla f) \nabla \nabla g^T \qquad (63)$$

With $F = \nabla \nabla^T f$ and now defining $G = \nabla \nabla^T g$, and also noting that $\nabla D_1 = 2 G \nabla g$ and $\nabla (\nabla g^T \nabla f) = G \nabla f + F \nabla g$, Equation (63) can also be expressed as

$$\nabla \left( \tilde{Q}_1 \nabla f \right)^T = D_1 F - F \nabla g \nabla g^T + 2 G \nabla g \nabla f^T - (\nabla g^T \nabla f) G - G \nabla f \nabla g^T \qquad (64)$$

So, with $D_1^2 F_K = \tilde{Q}_1 \nabla (\tilde{Q}_1 \nabla f)^T$, then

$$\begin{aligned} D_1^2 F_K = {} & D_1^2 F - D_1 F \nabla g \nabla g^T - D_1 \nabla g \nabla g^T F + \nabla g \nabla g^T F \nabla g \nabla g^T + 2 D_1 G \nabla g \nabla f^T \\ & - D_1 (\nabla g^T \nabla f) G - D_1 G \nabla f \nabla g^T + (\nabla g^T \nabla f) \nabla g \nabla g^T G - 2 \nabla g \nabla g^T G \nabla g \nabla f^T + \nabla g \nabla g^T G \nabla f \nabla g^T \end{aligned} \qquad (65)$$

Since, for example, $\nabla g^T F \nabla g$ is a scalar, the above can also be rewritten as

$$\begin{aligned} D_1^2 F_K = {} & D_1^2 F - D_1 F \nabla g \nabla g^T - D_1 \nabla g \nabla g^T F + (\nabla g^T F \nabla g) \nabla g \nabla g^T + 2 D_1 G \nabla g \nabla f^T \\ & - D_1 (\nabla g^T \nabla f) G - D_1 G \nabla f \nabla g^T + (\nabla g^T \nabla f) \nabla g \nabla g^T G - 2 (\nabla g^T G \nabla g) \nabla g \nabla f^T + (\nabla g^T G \nabla f) \nabla g \nabla g^T \end{aligned} \qquad (66)$$

Finally, for the case in which $G = 0$, Equation (66) reduces to

$$D_1^2 F_K = D_1^2 F - D_1 F \nabla g \nabla g^T - D_1 \nabla g \nabla g^T F + (\nabla g^T F \nabla g) \nabla g \nabla g^T \qquad (67)$$
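Equation (67) is straightforward to verify numerically (a sketch of ours): for $G = 0$ the projection matrix is constant, so $F_K = Q_1 F Q_1$, and the right-hand side of Equation (67) is simply $\tilde{Q}_1 F \tilde{Q}_1$ written out term by term:

import numpy as np

rng = np.random.default_rng(0)
n = 5
F = rng.normal(size=(n, n)); F = F + F.T         # arbitrary symmetric Hessian
g = rng.normal(size=n)                           # grad g of a linear constraint (G = 0)

D1 = g @ g
Q1 = np.eye(n) - np.outer(g, g) / D1

lhs = D1**2 * (Q1 @ F @ Q1)                      # D1^2 F_K with F_K = Q1 F Q1
rhs = (D1**2 * F
       - D1 * F @ np.outer(g, g)
       - D1 * np.outer(g, g) @ F
       + (g @ F @ g) * np.outer(g, g))           # right side of Equation (67)
print(np.allclose(lhs, rhs))                     # True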

Now, the matrix $F$ of discrete (unconstrained) second derivatives becomes the following in the continuum or functional limit

$$F = \frac{\partial^2 f}{\partial x_i \partial x_j} \to \frac{\delta^2 f}{\delta y(x) \delta y(x')} \qquad (68)$$

(where $i$ denotes the row and $j$ the column of $F$ for the discrete case). Thus, within Equation (67),

$$F \nabla g = \sum_j F_{ij} \frac{\partial g}{\partial x_j} \to \int dx' \, \frac{\delta^2 f}{\delta y(x) \delta y(x')} \frac{\delta g}{\delta y(x')} \qquad (69)$$

$$\nabla g^T F = \sum_i \frac{\partial g}{\partial x_i} F_{ij} \to \int dx \, \frac{\delta g}{\delta y(x)} \frac{\delta^2 f}{\delta y(x) \delta y(x')} \qquad (70)$$

$$\nabla g^T F \nabla g = \sum_i \sum_j \frac{\partial g}{\partial x_i} F_{ij} \frac{\partial g}{\partial x_j} \to \int \!\! \int dx \, dx' \, \frac{\delta g}{\delta y(x)} \frac{\delta^2 f}{\delta y(x) \delta y(x')} \frac{\delta g}{\delta y(x')} \qquad (71)$$

The other terms appearing in Equation (66) can be evaluated similarly.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Corti, D.S. and Fariello, R. (2021) Avoiding the Use of Lagrange Multipliers: I—Evaluating the Constrained Extrema of Functions with Projection Matrices. Operations Research Forum, 2, Article No. 59.
https://doi.org/10.1007/s43069-021-00100-0
[2] Zizza, F. (1998) Differential Forms for Constrained Max-Min Problems: Eliminating Lagrange Multipliers. The College Mathematics Journal, 29, 387-396.
https://doi.org/10.1080/07468342.1998.11973974
[3] Gigena, S. (2013) Constrained Local Extrema without Lagrange Multipliers and the Higher Derivative Test.
[4] Spring, D. (1985) On the Second Derivative Test for Constrained Local Extrema. The American Mathematical Monthly, 92, 631-643.
https://doi.org/10.1080/00029890.1985.11971702
[5] Nerenberg, M.A.H. (1991) The Second Derivative Test for Constrained Extremum Problems. International Journal of Mathematical Education in Science and Technology, 22, 303-308.
https://doi.org/10.1080/0020739910220215
[6] Gál, T. (2001) Differentiation of Density Functionals That Conserves the Normalization of the Density. Physical Review A, 63, Article 022506.
https://doi.org/10.1103/physreva.63.022506
[7] Gál, T. (2002) Functional Differentiation under Conservation Constraints. Journal of Physics A: Mathematical and General, 35, 5899-5905.
https://doi.org/10.1088/0305-4470/35/28/309
[8] Gál, T. (2007) Functional Differentiation under Simultaneous Conservation Constraints. Journal of Physics A: Mathematical and Theoretical, 40, 2045-2052.
https://doi.org/10.1088/1751-8113/40/9/010
[9] Gál, T. (2007) Differentiation of Functionals with Variables Coupled by Conservation Constraints: Analysis through a Fluid-Dynamical Model. Journal of Mathematical Physics, 48, Article 053520.
https://doi.org/10.1063/1.2737265
[10] Gál, T. (2010) Stability of Equilibrium under Constraints: Role of Second-Order Constrained Derivatives. Journal of Physics A: Mathematical and Theoretical, 43, Article 425208.
https://doi.org/10.1088/1751-8113/43/42/425208
[11] Gál, T. (2012) On Constrained Second Derivatives.
[12] Chandler, D. (1987) Introduction to Modern Statistical Mechanics. Oxford University Press.
[13] Baxley, J.V. and Moorhouse, J.C. (1984) Lagrange Multiplier Problems in Economics. The American Mathematical Monthly, 91, 404-412.
https://doi.org/10.1080/00029890.1984.11971446
[14] Zheng, Y. and Meng, Z. (2017) A New Augmented Lagrangian Objective Penalty Function for Constrained Optimization Problems. Open Journal of Optimization, 6, 39-46.
https://doi.org/10.4236/ojop.2017.62004
[15] Cacuci, D.G. (2023) Fourth-Order Predictive Modelling: I. General-Purpose Closed-Form Fourth-Order Moments-Constrained Maxent Distribution. American Journal of Computational Mathematics, 13, 413-438.
https://doi.org/10.4236/ajcm.2023.134024
[16] Jarab’ah, O.A. (2023) Lagrangian Formulation of Fractional Nonholonomic Constrained Damping Systems. Advances in Pure Mathematics, 13, 552-558.
https://doi.org/10.4236/apm.2023.139037
[17] Davis, H.T. (1996) Statistical Mechanics of Phases, Interfaces, and Thin Films. Wiley-VCH.
[18] Kreyszig, E. (1991) Differential Geometry. Dover Publications.
[19] Boas, M.L. (1983) Mathematical Methods in the Physical Sciences. 2nd Edition, Wiley.
