Efficiency Analysis of the Autofocusing Algorithm Based on Orthogonal Transforms

The efficiency of autofocusing algorithm implementations based on various orthogonal transforms is examined. The algorithm uses the variance of an image acquired by a sensor as a focus function. To compute the estimate of the variance we exploit the equivalence between that estimate and the image orthogonal expansion. The energy consumption of three implementations, each exploiting one of the following fast orthogonal transforms: the discrete cosine, the Walsh-Hadamard, and the Haar wavelet one, is evaluated and compared. Furthermore, it is conjectured that the computation precision can be considerably reduced if the image is heavily corrupted by noise, and a simple problem of optimal word bit-length selection with respect to the noise variance is analyzed.


Introduction
We say that an image is sharp (in focus) when it is the most detailed representation of the actual scene seen through a lens. This intuitive observation has led to many heuristic sharpness measures, for instance (see e.g. [1-3]):
• the variance of the image,
• the sum of the squared Laplacian of the image, or
• the sum of squared values of the image transformed by selected edge-detection algorithms.
In this work we consider the first measure for the following reasons:
• it can be shown that the variance of the image produced by the lens is a unimodal function which attains its maximum when the image is in focus; moreover,
• the variance can be estimated effectively even in the presence of random noise.
Automatic focusing is not only a desired feature of consumer electronic devices like digital cameras and camcorders, but also an important tool in security and industrial applications (such as surveillance or microscopy; cf. [1,4]). The focusing algorithm whose efficiency we examine is passive (that is, it does not require additional equipment) and operates on the data acquired by the image sensor; see e.g. [5]. We use the variance of the image data as the focus function and, to find its (single) maximum (i.e. to get the sharpest image), we employ the golden-section search (GSS) algorithm; see [6]. Note that fast yet precise focusing in both surveillance and microscopy is rather cumbersome, since we have to deal there with a thin depth-of-focus (DOF): in the former it stems from the use of large-aperture (fast) lenses, while in the latter it is a consequence of the short distances between the scene and the lens. A thin DOF makes the maximum-search problem much harder, since it typically implies a flat-tailed focus function with a steeply-sloped peak whose unimodality is easily violated in a noisy environment.
We begin with a problem statement and a focus function formula. Next, we present the focusing algorithm and three implementations of the variance estimate computation routine, based on:
• the discrete cosine,
• the Walsh-Hadamard, or
• the Haar wavelet
orthogonal transforms, respectively.
Then, we experimentally establish the energy efficiency of each implementation using an ARM processor simulator. Finally, we briefly examine the minimum word-length selection as a denoising algorithm in the presence of thermal (Gaussian) noise.

Autofocusing Algorithm
The proposed algorithm can be considered as a solution to a stochastic approximation problem. The captured scene, the lens system and the image sensor are modelled as follows (see Figure 1 and cf. [5]):
1) The scene is a 2D homogeneous second-order stationary process with unknown distribution and unknown correlation function; cf. [9].
2) The lens has a circular aperture system that satisfies the first-order optics laws, and is represented by a centered moving-average filter of order proportional to the distance $|v - s|$ of the sensor from the image plane and to the size $D$ of the lens aperture; cf. [10].
3) The image sensor acts as an impulse sampler on the lens-produced process.
The autofocusing algorithm simply seeks the maximum of the focus function, which in our case is the variance of the image produced by the lens for a given scene. The assumptions above allow us to demonstrate the unimodality of this focus function and hence enable the application of the simple golden-section search algorithm to find the maximum. Since, for a given scene, lens and sensor, the number of steps performed by the GSS algorithm is the same, it suffices to compare the effectiveness of the variance estimation implementations in order to assess the effectiveness of the whole AF algorithm.
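The maximum search described above can be sketched in one dimension as follows. This is a minimal illustration, not the paper's implementation: the function name `golden_section_max` and the tolerance handling are our own, and the objective passed in stands in for the variance computed from sensor data at a given lens position.

```python
import math

def golden_section_max(f, a, b, tol=1e-6):
    """Golden-section search for the maximum of a unimodal function f
    on the interval [a, b] (e.g. focus-function value vs. lens position)."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, approximately 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            # The maximum lies in [a, d]; the old c becomes the new d.
            b = d
            c, d = b - inv_phi * (b - a), c
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a = c
            c, d = d, a + inv_phi * (b - a)
    return (a + b) / 2
```

For a unimodal focus function each iteration shrinks the search interval by the constant factor 1/φ ≈ 0.618, so the number of steps depends only on the initial interval and the tolerance, as used in the comparison argument above.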
We will now present the equivalence between the variance estimate and the orthogonal expansion of the image data. Thanks to this equivalence we can not only compute the estimate (which can clearly be computed directly from the standard variance estimate formula), but we can also interpret the image as a regression function (which can in turn be estimated separately in order to remove the noise from the image). For simplicity we examine the one-dimensional case (which can be justified by the symmetry argument since, by Assumption 2, the lens aperture is circular).

Let $X = [x_1, x_2, \ldots, x_N]^T$ be the vector representing the raw image (i.e. a sample function of the process produced by the lens). The variance of such an image can be estimated by the standard formula

$\widehat{\operatorname{var}} X = \frac{1}{N}\sum_{n=1}^{N} x_n^2 - \Bigl(\frac{1}{N}\sum_{n=1}^{N} x_n\Bigr)^2.$  (1)

Also, the vector $X$ can be expanded in a discrete orthogonal series, $X = \sum_{k=1}^{N} \alpha_k \varphi_k$, with the coefficients $\alpha_k = \varphi_k^T X$. By Parseval's identity, we have that $\sum_{n=1}^{N} x_n^2 = \sum_{k=1}^{N} \alpha_k^2$.

Hence, if the first term $\varphi_1$ of the orthogonal series expansion is a constant function (as is the case in all three considered series), i.e. $\varphi_1 = N^{-1/2}[1, \ldots, 1]^T$, then the squared value of the first expansion coefficient, $\alpha_1^2 = \frac{1}{N}\bigl(\sum_{n=1}^{N} x_n\bigr)^2$, divided by $N$ equals the second term in the variance estimate (1). As a result, we obtain the equivalent variance estimation formula

$\widehat{\operatorname{var}} X = \frac{1}{N}\sum_{k=2}^{N} \alpha_k^2.$  (2)
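The equivalence between the two variance estimates can be verified numerically. The sketch below is our own illustration: it uses the orthonormal DCT-II matrix (with 0-based indexing, so the constant function is row 0) as the expansion basis and computes the variance estimate both directly and from the expansion coefficients with the constant-function coefficient discarded.

```python
import math

def dct_matrix(N):
    """Orthonormal DCT-II matrix; its first row is the constant function."""
    rows = []
    for k in range(N):
        c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        rows.append([c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                     for n in range(N)])
    return rows

def variance_direct(x):
    """Standard variance estimate: mean of squares minus squared mean."""
    N = len(x)
    mean = sum(x) / N
    return sum(v * v for v in x) / N - mean * mean

def variance_via_expansion(x):
    """Equivalent estimate: (1/N) * sum of squared coefficients with the
    constant-function (first) coefficient dropped."""
    N = len(x)
    A = dct_matrix(N)
    alpha = [sum(A[k][n] * x[n] for n in range(N)) for k in range(N)]
    return sum(a * a for a in alpha[1:]) / N
```

Both routines agree to machine precision for any input vector, since the DCT matrix is orthonormal and its first basis vector is constant.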

Orthogonal Transforms
Here we recollect some basic properties of the considered transforms. Each of them has the corresponding matrix representation

$\alpha = A X,$

where $A$ is a square $N \times N$ orthogonal matrix (i.e. a matrix of the unit spectral norm, $A^T A = I$) and $\alpha = [\alpha_1, \ldots, \alpha_N]^T$ is the vector collecting the transformed image (i.e. the transform coefficients).

DCT Transform Matrix
The matrix of the discrete cosine transform (DCT) consists of the cosine basis functions sampled on the uniform grid. The matrix is orthogonal, with the entries (see e.g. [12,13])

$a_{k,n} = c_k \cos\Bigl(\frac{\pi (2n-1)(k-1)}{2N}\Bigr), \quad k, n = 1, \ldots, N,$

where $c_1 = \sqrt{1/N}$ and $c_k = \sqrt{2/N}$ for $k > 1$. Observe that:
• The matrix dimension $N$ is an arbitrary natural number, i.e. it is not restricted to powers of two like in the remaining transforms (this is because the orthogonal systems of the trigonometric functions remain orthogonal on a uniform discrete grid [14,15]).
• The matrix entries are real numbers, that is, in the transform computations their truncated (approximated) values are used.

Walsh-Hadamard Transform Matrix
The matrix of the Walsh-Hadamard (W-H) transform has the following properties; see e.g. [11,16]:
• The matrix dimension N is a dyadic natural number (i.e. it is a power of two).
• The entries are equal to either 1 or −1, i.e. there are no multiplications in the transform algorithm.

Haar Transform Matrix
The matrix corresponding to the Haar wavelet transform algorithm is somewhat unique when compared to the former two. In particular:
• The Fast Haar Transform requires merely $O(N)$ operations, i.e., it can be computed in linear time (its lifting version can, furthermore, be computed in situ). Nevertheless,
• its dimension is an integral power of two (like in the W-H case, in order to preserve orthogonality), and
• its entries are not integers (as in the DCT case); they are integer powers of $\sqrt{2}$ instead.
For example, the matrix of the four-point Haar transform has the following entries (see e.g. [12]):

$A = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ \sqrt{2} & -\sqrt{2} & 0 & 0 \\ 0 & 0 & \sqrt{2} & -\sqrt{2} \end{bmatrix}.$
Note that the Haar transform matrices become sparser (i.e. they have more and more zero entries) as their dimension grows.
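The linear-time claim can be illustrated by the following sketch (our own, not the FXT implementation): each decomposition level halves the number of processed samples, so the total work is N + N/2 + N/4 + ... = O(N).

```python
import math

def fht(x):
    """Full orthonormal Haar decomposition of x (length a power of two).
    Level sizes N, N/2, N/4, ... sum to O(N) operations in total."""
    y = list(x)
    s = math.sqrt(2.0)
    n = len(y)
    while n > 1:
        half = n // 2
        # Scaled pairwise sums (approximation) and differences (detail).
        sums = [(y[2 * i] + y[2 * i + 1]) / s for i in range(half)]
        diffs = [(y[2 * i] - y[2 * i + 1]) / s for i in range(half)]
        y[:n] = sums + diffs
        n = half
    return y
```

The first output coefficient is the scaled sum of all pixels (the constant-function term), and the transform preserves energy (Parseval), so its output can directly feed the equivalent variance estimate.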

Energy Consumption Evaluation
From the formal point of view, all implementations are equivalent, that is, given the same image data, they always yield the same value of the variance estimate.However, different numerical properties of the transforms suggest that the energy consumption of each implementation may vary significantly.
In order to verify this conjecture, we measured experimentally the energy consumption of the variance calculation. The experiments were run on two sample pictures (Figure 2). The pictures were scaled to three sizes: 32 × 32, 64 × 64, and 128 × 128, and converted to 8-bit grayscale bitmaps. We used the DCT, Walsh-Hadamard and Haar transforms implemented in the FXT library [17], compiled with the standard GCC compiler. For our energy consumption assessment we employed the Panalyzer simulator of the ARM processor, which is a popular SimpleScalar simulator augmented with the power models of this processor. The calculations were performed using double-precision floating-point numbers. Additionally, in an attempt to exploit the specifics of the Walsh-Hadamard transform, we also calculated the variance using a fixed-point implementation based on a separate integer-number routine. In all simulations, we measured the total energy consumption of the processor microarchitecture.
The results are shown in Figure 3. Clearly, the energy consumption does not depend on the particular image and grows with the image size. Moreover:
• The implementation based on the Haar transform was the most efficient energy consumption-wise.
• Next in order was the implementation based on the DCT.
• The least efficient was the Walsh-Hadamard-based implementation, in both the floating- and fixed-point versions.
While it seems obvious that the Haar transform implementation has the best efficiency (a result of the transform's linear computational complexity), the smaller energy consumption of the DCT implementation relative to the Walsh-Hadamard one is somewhat surprising and requires further study (it may be related to, e.g., a better low-level optimization of the DCT routine in the FXT library). In turn, the almost identical results of the fixed- and floating-point Walsh-Hadamard implementations suggest that only a very small fraction of the energy is consumed by the arithmetic instructions relative to the whole algorithm.

Word-Length Selection and Denoising
In this section we consider the word-length problem (i.e. the required precision of the data representation) in the presence of noise. The stochastic nature of the noise creates random 'fake' local maxima in the focus function estimate, in which the GSS algorithm can easily get stuck (see, however, the analysis of the GSS algorithm in [18], where the probability of such an event is shown to vanish with the growing number of pixels). The simplest way to attenuate the influence of the noise is to average multiple pictures taken at each focus distance. This solution, however, results in a slower and more power-consuming focusing routine. In embedded systems, one should therefore prefer more energy-efficient, single-image approaches based on e.g. nonparametric regression estimation; see [19,20].
Inspired by one of the most prominent estimation techniques, wavelet shrinkage, we examine here a simple noise-reducing algorithm based on the solution of the word-length selection problem; cf. e.g. [21]. We assume here that the additive random noise $\varepsilon_n$ is i.i.d. and has the Gaussian distribution $N(0, \sigma^2)$. Thus, each pixel value can be described as $x_n = u_n + \varepsilon_n$, where $u_n$ is the noiseless pixel value. We select the smallest number of fractional bits $M$ for which the quantization error variance, $2^{-2M}/12$ for the uniform quantizer with step $2^{-M}$, does not exceed the noise variance $\sigma^2$, which guarantees that the quantization-induced inaccuracy does not exceed (in the MSE sense) the noise-induced inaccuracy.
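This selection rule can be sketched as follows. The helper below is hypothetical (our own naming), and the $2^{-2M}/12$ quantization-MSE model for a uniform quantizer with step $2^{-M}$ is a standard textbook assumption rather than a formula quoted verbatim from the paper.

```python
import math

def min_fractional_bits(sigma):
    """Smallest M such that the uniform-quantization MSE, (2**-M)**2 / 12,
    does not exceed the noise variance sigma**2 (standard MSE model)."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    # 2**(-2M) / 12 <= sigma**2  <=>  M >= -0.5 * log2(12 * sigma**2)
    return max(0, math.ceil(-0.5 * math.log2(12.0 * sigma * sigma)))
```

For example, for σ = 0.01 this gives M = 5 fractional bits (2⁻¹⁰/12 ≈ 8.1·10⁻⁵ ≤ 10⁻⁴, while M = 4 would not suffice); for large σ no fractional bits are needed at all.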
Remark 1: Since the noise "occupies" the least significant bits, which are truncated by the quantization, one can expect that for the selected M the noise will also be (partially) shrunk (this aspect, however, needs a more careful further examination). Alternatively, one can also consider performing the classic shrinkage algorithm on the transformed image (prior to the variance calculation) in order to remove the noise [19,20].

Final Remarks
In the paper we presented three implementations of the image variance estimate evaluation, which is the core and the most computationally demanding part of the autofocusing algorithm. We provided experimental evidence that the implementation based on the fast Haar transform has a much better energy efficiency than the remaining two implementations based on the discrete cosine and Walsh-Hadamard transforms. Somewhat unexpectedly, the experiments revealed that there is no advantage in using the integer-number Walsh-Hadamard transform over the cosine one. Finally, with an ASIC implementation of the algorithm in mind, we also proposed the word-length selection algorithm, which determines the required precision of the image data with respect to the size of the image and the variance of the noise present in the data. The actual benefit of this algorithm, however, needs to be verified experimentally.
The Fast Walsh Transform is an in situ algorithm and requires $O(N \log N)$ operations, i.e., it can also be computed in linearithmic (i.e. quasilinear) time.

Figure 2. The two sample images I and II.

Figure 3. Energy consumption of the three implementations measured in mJ for the processor clocked at 500 MHz.

Here $\{u_n\}$ are the actual (noiseless) raw image pixels. The uniformly quantized (i.e. finite-precision) versions of $x_n$, for $M$ fractional bits, can be represented as

$x_n^{(M)} = 2^{-M} \lfloor 2^M x_n \rfloor,$

where $\lfloor \cdot \rfloor$ denotes the standard floor function. To establish the minimum number of fractional bits $M$ such that the inaccuracy in the final (transformed) image imposed by the quantization error does not exceed the error introduced by the noise $\varepsilon_n$, we consider the following mean squared error (MSE) of the transformed image:

$\mathrm{MSE} = E \sum_{k=1}^{N} \bigl(\alpha_k - \hat{\alpha}_k^{(M)}\bigr)^2,$

where $\{\alpha_k\}$ are the pixels of the transformed image computed from the noiseless raw image pixels $\{u_n\}$, $\{\hat{\alpha}_k\}$ are the pixels of the transformed image computed in exact arithmetic from the noisy pixels $\{x_n\}$, and $\{\hat{\alpha}_k^{(M)}\}$ are their quantized versions. For simplicity we examine the above error for the Walsh-Hadamard transform only. Exploiting the orthonormality property of the transform matrix $A$ and the independence of the noise $\varepsilon_n$, the noise contribution to the MSE equals $N\sigma^2$, while the contribution of the uniform quantization error is of order $N 2^{-2M}$; comparing the two yields the word-length selection rule stated earlier in this section.