We prove that the density function of the gradient of a sufficiently smooth function S, obtained via a random variable transformation of a uniformly distributed random variable, is increasingly closely approximated by the normalized power spectrum of ϕ = exp ( i S / τ ) as the free parameter τ → 0. The frequencies act as gradient histogram bins. The result is shown using the stationary phase approximation and standard integration techniques and requires a proper ordering of limits. We highlight a relationship with the well-known characteristic function approach to density estimation, and detail why our result is distinct from this method. Our framework for computing the joint density of gradients is extremely fast and straightforward to implement, requiring a single Fourier transform operation without explicitly computing the gradients.

Density estimation methods provide a faithful estimate of a non-observable probability density function based on a given collection of observed data [

In computer vision parlance—a popular application area for density estimation—these gradient density functions are popularly known as the histogram of oriented gradients (HOG) and are primarily employed for human and object detection [

In our earlier effort [

We introduce a new approach for computing the density of Y, where we express the given function S as the phase of a wave function ϕ , specifically ϕ ( x ) = exp ( i S ( x ) / τ ) for small values of τ , and then consider the normalized power spectrum (the squared magnitude of the Fourier transform) of ϕ [

lim τ → 0 ∫ N η ( u 0 ) ¯ P τ ( u ) d u = ∫ N η ( u 0 ) ¯ P ( u ) d u

where N η ( u 0 ) ¯ is a small neighborhood around u 0 . We would like to emphasize that our work is fundamentally different from estimating the gradient of a density function [

As mentioned before, the main objective of our current work is to generalize our effort in [

• One of the foremost advantages of our wave function approach is that it recovers the joint gradient density function of S without explicitly computing its gradient. Since the stationary points capture gradient information and map them into the corresponding frequency bins, we can directly work with S without the need to compute its derivatives.

• The significance of our work is highlighted when we deal with the more practical finite sample-set setting, wherein the gradient density is estimated from a finite, discrete set of samples of S rather than assuming the availability of the complete description of S on Ω . Given N samples of S on Ω , it is customary to characterize the approximation error of a proposed density estimation method as N → ∞ . In [

• Furthermore, obtaining the gradient density using our framework in the finite N sample setting is simple, efficient, and computable in O ( N log N ) time as elucidated in the last paragraph of Section 4.
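To make the O ( N log N ) claim concrete, the following is a minimal one-dimensional sketch of the pipeline (our own illustrative construction, not the authors' released code): sample S on a regular grid, form the wave function exp ( i S ( x ) / τ ), take a single FFT, and rescale the power spectrum into a density over gradient bins. The test function S ( x ) = x², the grid size, and τ are hypothetical choices; for this S the true gradient density is 1/4 on (−2, 2).

```python
import numpy as np

# Illustrative 1D sketch (assumed parameters): estimate the density of
# S'(X), X ~ Uniform(Omega), from samples of S alone via one FFT.
a, b = -1.0, 1.0                  # Omega = [a, b], mu(Omega) = b - a
N = 2**16                         # grid size (hypothetical choice)
tau = 1e-3                        # small free parameter
x = np.linspace(a, b, N, endpoint=False)
dx = (b - a) / N

S = x**2                          # toy function; S'(x) = 2x
phi = np.exp(1j * S / tau)        # wave function exp(iS/tau)
F = np.fft.fft(phi)               # the single O(N log N) transform

# DFT bin k corresponds to the gradient value u_k = 2*pi*tau*k/(N*dx);
# rescaling |F|^2 by dx^2 / (2*pi*tau*mu(Omega)) turns it into a density in u.
u = 2 * np.pi * tau * np.fft.fftfreq(N, d=dx)
du = 2 * np.pi * tau / (N * dx)   # width of one gradient bin
p = np.abs(F) ** 2 * dx**2 / (2 * np.pi * tau * (b - a))

# p integrates to 1 over the u bins (a discrete Parseval identity),
# and averages to about 1/4 inside (-2, 2) for this choice of S.
```

Note that no derivative of S is ever formed: the stationary points of the phase route the gradient information into the frequency bins automatically.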

Our wave function method is motivated by the classical-quantum relation, wherein classical physics is expressed as a limiting case of quantum mechanics [

We begin with a compact measurable subset Ω of ℝ d on which we consider a smooth function S : Ω → ℝ . We assume that the boundary of Ω is smooth and the function S is well-behaved on the boundary as elucidated in Appendix B. Let H x denote the Hessian of S at a location x ∈ Ω and let det ( H x ) denote its determinant. The signature of the Hessian of S at x , defined as the difference between the number of positive and negative eigenvalues of H x , is represented by σ x . In order to exactly determine the set of locations where the joint density of the gradient of S exists, consider the following three sets:

A u = { x : ∇ S ( x ) = u } , (2.1)

B = { x : det ( H x ) = 0 } , (2.2)

and

C = { ∇ S ( x ) : x ∈ B ∪ ∂ Ω } . (2.3)

Let N ( u ) = | A u | . We employ a number of useful lemmas, stated here and proved in Appendix A.

Lemma 2.1. [Finiteness Lemma] A u is finite for every u ∉ C .

As we see from Lemma 2.1 above, for a given u ∉ C , there is only a finite collection of x ∈ Ω that maps to u under the function ∇ S . The inverse map ∇ S ( − 1 ) ( u ) , which identifies the set of x ∈ Ω that map to u under ∇ S , is ill-defined as a function since it is a one-to-many mapping. The objective of the following lemma (Lemma 2.2) is to define appropriate neighborhoods such that the inverse function ∇ S ( − 1 ) , required in the proof of our main Theorem 3.2, is well-defined when restricted to those neighborhoods.

Lemma 2.2. [Neighborhood Lemma] For every u 0 ∉ C , there exists a closed neighborhood N η ( u 0 ) ¯ around u 0 such that N η ( u 0 ) ¯ ∩ C is empty. Furthermore, if | A u 0 | > 0 , N η ( u 0 ) ¯ can be chosen such that we can find a closed neighborhood N η ( x ) ¯ around each x ∈ A u 0 satisfying the following conditions:

1) ∇ S ( N η ( x ) ¯ ) = N η ( u 0 ) ¯ .

2) det ( H y ) ≠ 0, ∀ y ∈ N η ( x ) ¯ .

3) The inverse function ∇ S x ( − 1 ) ( u ) : N η ( u 0 ) ¯ → N η ( x ) ¯ is well-defined.

4) For y , z ∈ N η ( x ) ¯ , σ y = σ z .

Lemma 2.3. [Density Lemma] Given X ∼ U N I ( Ω ) , the probability density of Y = ∇ S ( X ) on ℝ d − C is given by

P ( u ) = 1 μ ( Ω ) ∑ k = 1 N ( u ) 1 | det ( H x k ) | (2.4)

where x k ∈ A u , ∀ k ∈ { 1,2, ⋯ , N ( u ) } and μ is the Lebesgue measure.
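As a quick sanity check of (2.4), consider the hypothetical example S ( x ) = x² on Ω = [ − 1, 1 ]. For | u | < 2 the set A u is the single point x = u / 2, the Hessian is the constant S ″ ( x ) = 2, and (2.4) gives P ( u ) = 1 / ( μ ( Ω ) | S ″ | ) = 1 / 4, which can be confirmed by directly sampling Y = ∇ S ( X ):

```python
import numpy as np

# Monte Carlo check of the Density Lemma (2.4) for the toy choice
# S(x) = x^2 on Omega = [-1, 1]; then Y = S'(X) = 2X for X ~ Uniform(Omega).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=1_000_000)
Y = 2.0 * X

# Empirical density of Y near u0 = 1 (any |u0| < 2 behaves the same way).
u0, h = 1.0, 0.05
density_est = np.mean(np.abs(Y - u0) < h) / (2 * h)
# Formula (2.4): P(u0) = 1/(mu(Omega) * |det H|) = 1/(2 * 2) = 0.25
```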

From Lemma 2.3, it is clear that the existence of the density function P at a location u ∈ ℝ d necessitates a non-vanishing Hessian determinant ( det ( H x ) ≠ 0 ) ∀ x ∈ A u . Since we are interested in the case where the density exists almost everywhere on ℝ d , we impose the constraint that the set B in (2.2), comprising all points where the Hessian determinant vanishes, has zero Lebesgue measure. It follows that μ ( C ) = 0 . Furthermore, the requirement regarding the smoothness of S ( S ∈ C ∞ ( Ω ) ) can be relaxed to functions S in C ⌈ d / 2 ⌉ + 1 ( Ω ) , where d is the dimensionality of Ω , as we will see in Section 3.2.2.

Define the function F τ : ℝ d → ℂ as

F τ ( u ) = ( 2 π τ ) − d / 2 μ ( Ω ) − 1 / 2 ∫ Ω exp ( ( i / τ ) [ S ( x ) − u ⋅ x ] ) d x (3.1)

for τ > 0 . F τ is very similar to the Fourier transform of the function exp ( i S ( x ) / τ ) . The normalizing factor in F τ comes from the following lemma (Lemma 3.1) whose proof is given in Appendix A.

Lemma 3.1. [Integral Lemma] F τ ∈ L 2 ( ℝ d ) and ‖ F τ ‖ 2 = 1 .

The power spectrum defined as

P τ ( u ) ≡ F τ ( u ) F τ ( u ) ¯ (3.2)

equals the squared magnitude of the Fourier transform. Note that P τ ≥ 0 . From Lemma 3.1, we see that ∫ P τ ( u ) d u = 1 . Our fundamental contribution lies in interpreting P τ ( u ) as a density function and showing its equivalence to the density function P ( u ) defined in (2.4). Formally stated:

Theorem 3.2. For u 0 ∉ C ,

lim α → 0 1 μ ( N α ( u 0 ) ) lim τ → 0 ∫ N α ( u 0 ) P τ ( u ) d u = P ( u 0 )

where N α ( u 0 ) is a ball around u 0 of radius α .

Before embarking on the proof, we would like to emphasize that the ordering of the limits and the integral as given in the theorem statement is crucial and cannot be arbitrarily interchanged. To press this point home, we show below that after solving for P τ , the limit lim τ → 0 P τ ( u ) does not exist. Hence, the order of the integral followed by the limit τ → 0 cannot be interchanged. Furthermore, when we swap the limits of

which also does not exist. Hence, the theorem statement is valid only for the specified sequence of limits and the integral.
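The role of the limit ordering can also be seen numerically. The sketch below (an illustrative construction of our own, for the toy choice S ( x ) = x² on [ − 1, 1 ], where P ( u 0 ) = 1 / 4 for | u 0 | < 2) averages P τ over a small fixed neighborhood of u 0 before shrinking τ; the pointwise values of P τ oscillate at any fixed τ, but the neighborhood averages settle near P ( u 0 ):

```python
import numpy as np

# Hypothetical numerical illustration of the limit ordering in Theorem 3.2.
# S(x) = x^2 on [-1, 1]; true gradient density P(u) = 1/4 on (-2, 2).
a, b, N = -1.0, 1.0, 2**16
x = np.linspace(a, b, N, endpoint=False)
dx = (b - a) / N
u0, alpha = 1.0, 0.1              # neighborhood N_alpha(u0)

def window_average(tau):
    """Average of P_tau over (u0 - alpha, u0 + alpha), via one FFT."""
    F = np.fft.fft(np.exp(1j * x**2 / tau))
    u = 2 * np.pi * tau * np.fft.fftfreq(N, d=dx)
    p = np.abs(F) ** 2 * dx**2 / (2 * np.pi * tau * (b - a))
    return p[np.abs(u - u0) < alpha].mean()

# Integrating over the neighborhood FIRST, then letting tau -> 0,
# drives the average toward P(u0) = 0.25.
avgs = [window_average(t) for t in (1e-2, 3e-3, 1e-3)]
```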

To understand the result in simpler terms, let us reconsider the definition of the scaled Fourier transform given in (3.1). The first exponential

The approximation is increasingly tight as

We wish to compute the integral

at small values of

case (i): We first consider the case where

Lemma 3.3. Fix

Proof. To improve readability, we prove Lemma 3.3 first in the one-dimensional setting and separately offer the proof for multiple dimensions.

Let s denote the derivative (1D gradient) of S. The bounded closed interval

Here

It follows that

As

is well-defined. Choose

and

Using the equality

where

which is similar to (3.4).

We would like to add a note on the differentiability of S which we briefly mentioned after Lemma 2.3. The divergence theorem is applied

The additional complication of the d-dimensional proof lies in resolving the geometry of the terms in the second line of (3.6). Here, a typical integral (out of the n integrals in the summation) in (3.6) is

We then get

Since the cardinality

case (ii): For

where

and the domain

Firstly, note that the set K contains no stationary points by construction. Secondly, the boundaries of K can be classified into two categories: those that overlap with the sets

To compute G we leverage case (i), which also includes the contribution from the boundary

To evaluate the remaining integrals over

where

for a continuous bounded function

Coupling (3.7), (3.8), and (3.10) yields

where

As

where

and

Observe that the term on the right side of the first line in (3.13) matches the anticipated expression for the density function

Lemma 3.4. [Cross Factor Nullifier Lemma] The integral of the cross term in the second line of (3.13) over the closed region

The proof is given in Appendix A. Combining (3.14) and (3.15) yields

Equation (3.16) demonstrates the equivalence of the cumulative distributions corresponding to the densities

Taking a mild digression from the main theme of this paper, in the next section (Section 4), we build an informal bridge between the commonly used characteristic function formulation for computing densities and our wave function method. The motivation behind this section is merely to provide an intuitive reason behind our Theorem 3.2, where we directly manipulate the power spectrum of

The characteristic function

Here

The inverse Fourier transform of a characteristic function also serves as the density function of the random variable under consideration [
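For concreteness, the characteristic-function route can be sketched as follows. This is a generic textbook illustration with a Gaussian toy target, none of it specific to this paper: estimate the empirical characteristic function from samples of the random variable and numerically invert the Fourier transform to recover the density.

```python
import numpy as np

# Empirical characteristic function (ECF) density estimate, as a generic
# illustration of the inversion identity p(u) = (1/2pi) * int e^{-itu} phi(t) dt.
rng = np.random.default_rng(1)
Y = rng.normal(0.0, 1.0, size=200_000)    # toy target: standard normal

t = np.linspace(-8.0, 8.0, 401)           # frequency grid; ECF ~ exp(-t^2/2)
dt = t[1] - t[0]
ecf = np.array([np.exp(1j * s * Y).mean() for s in t])

u0 = 0.0                                  # evaluate the density at u0
p_est = (np.exp(-1j * t * u0) * ecf).real.sum() * dt / (2 * np.pi)
# true value at u0 = 0 is 1/sqrt(2*pi), approximately 0.3989
```

Note the contrast with the wave function method: the ECF requires a pass over all N samples for every frequency t, whereas the power spectrum is obtained from one transform.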

Having set the stage, we can now proceed to highlight the close relationship between the characteristic function formulation of the density and our formulation arising from the power spectrum. For simplicity, we choose to consider a region

Define the following change of variables,

Then

where ^{th} component of

where

The mean value theorem applied to

where

with

Again we would like to drive the following point home. We do not claim that we have formally proved the above approximation. On the contrary, we believe that it might be an onerous task to do so as the mean value theorem point

Furthermore, note that the integral range for

where

By defining

Since both

approaches zero as

as

This form exactly coincides with the expression given in (4.2) obtained through the characteristic function formulation.

The approximations given in (4.6) and (4.8) cannot be proven easily as they involve limits of integration which directly depend on

As we remarked before, the characteristic function and our wave function methods should not be treated as mere reformulations of each other. This distinction is further emphasized when we find our method to be computationally more efficient than the characteristic function approach in the finite sample-set scenario where we estimate the gradient density from N samples of the function S. Given these N sample values

We would like to emphasize that our wave function method for computing the gradient density is very fast and straightforward to implement as it requires computation of a single Fourier transform. We ran multiple simulations on many different types of functions to assess the efficacy of our wave function method. Below we show comparisons with the standard histogramming technique where the functions were sampled on a regular grid between
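A minimal reconstruction of such a histogramming baseline (the grid, bin count, and test function S ( x ) = x² are our own illustrative choices) explicitly computes gradients by finite differences and then bins them, in contrast to the single-FFT route:

```python
import numpy as np

# Standard histogramming baseline (illustrative): finite-difference the
# sampled S, then bin the gradients. For S(x) = x^2 on [-1, 1], the true
# gradient density is the constant 1/4 on (-2, 2).
N = 100_000
x = np.linspace(-1.0, 1.0, N)
S = x**2

g = np.gradient(S, x)                     # explicit gradient estimates
hist, edges = np.histogram(g, bins=80, range=(-2.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
# interior bins sit near the constant density 0.25
```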

Observe that the integrals

give the interval measures of the density functions

spectrum of the wave function

the power spectrum of

As mentioned earlier, in [

estimated from a finite, discrete set of samples, instead of assuming that the function is fully described over a compact set

This research work benefited from the support of the AIRBUS Group Corporate Foundation Chair in Mathematics of Complex Systems established in ICTS-TIFR.

This research work benefited from the support of NSF IIS 1743050.

The authors declare no conflicts of interest regarding the publication of this paper.

Gurumoorthy, K.S., Rangarajan, A. and Corring, J. (2019) Gradient Density Estimation in Arbitrary Finite Dimensions Using the Method of Stationary Phase. Advances in Pure Mathematics, 9, 1034-1058. https://doi.org/10.4236/apm.2019.912051

1) Proof of Finiteness Lemma

Proof. We prove the result by contradiction. Observe that

where the linear operator

Since

2) Proof of Neighborhood Lemma

Proof. Observe that the set

Since

3) Proof of Density Lemma

Proof. Since the random variable X is assumed to have a uniform distribution on

For the sake of completeness we explicitly prove the well-known result stated in Integral Lemma 3.1.

4) Proof of Integral Lemma

Proof. Define a function

Let

Letting

As

By noting that

the result follows.

5) Proof of Cross Factor Nullifier Lemma

Proof. Let

Its gradient with respect to

where

Since

where

Here

One of the foremost requirements for Theorem 3.2 to be valid is that the function

Let the location

Stationary points of the second kind occur at locations

This leads us to define the notion of a well-behaved function on the boundary.

Definition: A function S is said to be well-behaved on the boundary provided (B.1) is satisfied only at a finite number of boundary locations for almost all

The definition immediately raises the following questions: 1) Why is the assumption of a well-behaved S weak? and 2) Can the well-behaved condition imposed on S be easily satisfied in all practical scenarios? Recall that the finiteness of premise (B.1) entirely depends on the behavior of the function S on the boundary

To streamline our discussion, we consider the special case where the boundary

The boundary condition is best exemplified with a 2D example. Consider a line segment on the boundary