Enhanced Frequency Resolution in Data Analysis

We present a numerical study of the resolution power of Padé Approximations to the Z-transform, compared to the Fou-rier transform. As signals are represented as isolated poles of the Padé Approximant to the Z-transform, resolution depends on the relative position of signal poles in the complex plane i.e. not only the difference in frequency (separation in angular position) but also the difference in the decay constant (separation in radial position) contributes to the resolution. The frequency resolution increase reported by other authors is therefore an upper limit: in the case of signals with different decay rates frequency resolution can be further increased.


Introduction
It is known that Padé approximants to the Z-transform of a time series allow "super resolution" of signals in low noise: for instance, when the signal to noise ratio (SNR) is high enough (more than 10 5 ) we can resolve frequencies separated by 4 max 10 f   with less than 100 data points: better than discrete Fourier Transform (DFT) in the same noise conditions.This property has been observed both by P. Barone [1,2] and D. Belkich [3].Up to now no extensive study has been made of this remarkable property of Padé approximants, not only concerning the limits of super resolution but also of the implications of why it is so.
The key point we want to address in the present note is that resolution for Padé approximants is in the complex plane, not in frequency alone as is in the case of DFT.This allows not only even better frequency resolutions of those reported above, when the decay rates of neighbouring peaks are different, but also resolution of wide overlapping peaks.
Given a data series 0 1 2 , , , , N s s s s  , its "Z-transform" is: DFT is clearly the "Z-transform" calculated on the roots of unity.This means that DFT has two intrinsic limitations.

N 
1) There is a resolution limit, due to the time of observation: Let T be the total sampling time and N the number of sampled points; the time step will be T N   ; the maximum detectable frequency will therefore be max 1 2 2 f N T    and the frequency step (resolution) 1 1  .Scaling to the unit circle where the roots of unity reside, max f becomes 1 2 the frequency step is therefore max . .
This is the well known Nyquist limit [4].The Padé approximation to the "Z-transform" is supperior to it through nonlinearity: because of its discreteness, the average density of peaks is the same as for the DFT; but while the local density in DFT is the same as the average density, it can be very different when a Padé analysis is used, since its peaks are not bound to fall on the regular lattice of the roots of unity.This is the source of "super resolution" for constant amplitude signals reported in the papers mentioned above.
2) Damped signals have a natural width on the unit circle.
The peak for a damped signal is a Lorentzian of the form where I is the height of the peak,  is the decay factor and 2 f    .The half height width is therefore This means that, when using the Fourier Transform, increasing numerical resolution (sampling rate) beyond the natural linewidth of the signals involved is not much use in separating neighbouring peaks.
When using the Padé Approximant approach, things are very different: signals are represented by poles in the complex plane and all poles are by definition singularities of Z(z) and therefore sharp.The basic point is that poles corresponding to damped signals are off the unit circle and are sharp only if we look at them in the complex plane; if we only look on the unit circle, as it is the case when using DFT, we do not see the singularity itself but the profile of its tail as the intersection of Z(z) with the unit circle.
This has two consequences: 1) as we are looking at the poles themselves, there are no tails of strong wide peaks that can hide nearby peaks.
2) Since what counts is the distance of neighbouring peaks in the complex plane, damped signals can be even closer in frequency than reported above if their damping constants (radial positions) are different.

Summary of the Method
Given a data series 0 1 2 , , , , N s s s s  , we build its generating function, or "Z-transform" Equation 1 and construct its diagonal Padé Approximant, i.e. a rational function with the numerator and denominator having the same degree and whose Taylor expansion equals the Z-transform up to order .The aim is to try and predict the "Z-transform" for .The choice of a diagonal rational approximation is the best for both signal and noise because of the following considerations.
For a finite ensemble of damped oscillators, the Ztransform tends, when the number of data points goes to infinity, to a   n n rational function in z, with 2 n N  equal to twice the number of oscillators [5].A diagonal Padé Approximant therefore has the right structure for the signal.
For pure noise, the organization of poles and zeros in Froissart doublets [6][7][8][9] is again best approximated by a   n n rational function in z .Most data analysts stopped using Padé Approximants because of instabilities due to the fact that for a pure signal, singularities appear when one tries to construct a Padé Approximant of order higher than   n n .The problem is conveniently solved by the presence of noise whose Froissart doublets act as additional "signals".
To numerically calculate poles and zeros of the Padé Approximant of the Z-transform of a finite time series, we build directly from the time series two tridiagonal Hilbert space operators, called J-Matrices, one for the numerator and one for the denominator.The eigenvalues of these matrices readily provide the desired zeros and poles.Details of our method can be found in [5].Knowledge of the positions of all poles and zeros also gives us the residues for all poles and therefore the amplitudes and phases of the signal oscillations.

Results and Sensitivity of the Method
When dealing with resolution, key parameters for both Padé Approximant and Fourier Transform are: 1) the angular distance of the two signal poles, i.e. the frequency difference scaled to the maximum detectable frequency: this is the resolution itself.
2) The radial position  of the two signal poles, i.e. the decay factor ln     of each of the signals.
3) The relative amplitude of the two signals.
4) The signal to noise ratio of the smaller of the two signals, or equivalently, its precision in number of digits.
5) The number of data points.These are the factors that in practice can limit resolution, which assuming infinite data precision and arbitrarily small noise has no limitation for Padé Approximants.
We now pass to look at a few cases that can help clarify how parameters 2, 3, 4 and 5 affect resolution.

Two Equal Peaks on the Unit Circle
As a first example, let us consider two peaks on the unit circle separated by along the circumference, i.e. two constant amplitude waves whose frequency difference is .Using the DFT to resolve them we need a frequency step at least half the distance, i.e. , which means a number of data points .
Figure 1 shows what can be seen with 256 N  data points.By increasing N to (Figure 2) a decent resolution can be obtained.

N 
Assuming the noise average amplitude to be 10 -5 times the amplitude of each of the two signals, and assuming the data to be in double precision, the diagonal Padé Approximant instead needs only 40 data points to resolve the two signals with reasonable accuracy; a reduction by two orders of magnitude.Figure 3 shows the positions and residues of the reconstructed signal poles for 8 different realizations of the noise: there is some spread in position and amplitude (pole residue) which completely disappears by doubling the number of data points, but the 16 poles are clearly grouped around the positions of the two input poles marked by large red dots.8 zeros fall between the two groups of poles.The resolution transition between 1 and 2 signals takes place as follows.For low only 8 poles with sizable residues are visible, one for each noise realization, clustered halfway between the positions of the two signal poles; Figure 4 shows the  case for .The noise doublets are spread around the unit circle away from the signal poles; we discussed this repulsion in [10].By increasing , we see that 8 doublets, one for each data sequence, approach the signal poles, see Figure 5.

N
Close to the signal poles the doublets split while the central cluster of 8 poles also breaks up; finally the poles regroup in the two clusters visible in Figure 3 with the zeros halfway between them.Figure 6 shows this sequence of events.Already for all 16 poles are visible, even if the two clusters visible in Figure 3 are not yet fully formed: see Figure 7.

N 
The case we have shown is that of equal phases of the two signals; for different phases the zeros are on the straight line perpendicular to the line connecting the two poles and crossing it halfway between the two poles.Figure 8 shows the case when the signal residues have    and phase.In this case, we have early indications of a second pole since even with , we can see all the 16 poles: as in Figure 10.

Two Unequal Peaks on the Unit Circle
If the residues of the two poles are unequal, there is an obvious migration of the intervening zero from equidistant to the two poles toward the weaker of the two poles.When the signals also have different phases, the zero is on a circle of radius Resolution in this case depends on the SNR of the smaller of the two signals only: increasing the residue of the larger of the two poles does not alter the spread of the poles of the weaker one.
Figure 11 shows an example where only the residue of larger of the two signals is increased while all the other parameters are kept fixed.We keep the same noise realization, so as to do not move the poles of the weaker of the two signals.Two effects are clearly visible: the reduction of the spread of the poles of the stronger of the two signals and the migration of the zeros of toward the weaker of the two signals.

Two Equal Peaks off the Unit Circle
The case of signal poles on the unit circle is a special one: each new data point gives information with the same precision (assuming a constant noise level).Increasing the number of data points (at constant sampling rate) will therefore always improve resolution, even if more and more slowly.If it is possible to increase the sampling rate at will, then-for any given decay time-the signal poles can be moved as close as we want to the unit circle and we are back to the unit circle case.
If instead the sampling rate is limited, then-for any given distance between signal poles-we'll have to search for the optimal number of data points as a function not only of the SNR but also of the data poles distance from the unit circle.
Signal poles off the unit circle correspond to damped signals; again assuming constant noise amplitude (and data precision) each new point will have lower and lower precision.One might therefore expect (for a given noise level and data precision) the resolution to first increase and then decrease when increasing the number of data points.We do not have evidence of this kind of behaviour.
What we instead see is: 1) Compared to the unit circle case, a very slow resolution increase with the number of points: for  , noise has to be reduced from 3) The relevant distance is not the frequency one, but the one in the complex plane.For example, for , two peaks, radial position ρ = 0.950, angle φ = 1.000, separation , and N = 300, there is no relevant difference in the spread of the signal poles

Four Unequal Peaks
To show that the presence of more signals does not alter the above picture, we now present results for a tight cluster of 4 poles with different residues and decay rates.
Figure 15 shows poles and zeros for an example where the four signal poles are at (ρ, φ) = (0.95, 1.00), (0.90, 1.01), (0.85, 1.02), (0.80, 1.05) with residues  respectively.Noise amplitude is unity.We use 8 samples of 300 points each.The picture is quite clear (large red dots indicate the positions of the signal poles): only one reconstructed pole appears to fall on the signal pole in the foreground (the one with the longer decay time); in effect it's 8 poles superimposed, as they are almost identical.When we look at the position of poles and zeros (Figure 16), we see that again zeros appear between the signal poles, as the phases of the residues of the poles are equal.
In this case we did not plot the residues of the poles; to make the picture more evident and distinguish signal poles (poles from different data sequences form clusters) L. PEROTTI ET AL.Extending the samples to 8192 points does not reveal any additional structure as can be seen in Figure 17(b): here the vertical scale is logarithmic and only the region around the center of the peak has been plotted to better check for the presence of structures.
Of course, the DFT performance is also somehow improved but only two of the four peaks are now clearly visible in Figure 19.

Conclusions
We have thus extended and generalized the remarks previously made in the literature [1][2][3] about the superresolution properties of the Padé Approximations to the Z-transform of a signal, stressing the point that resolution needs to be considered in the complex plane and not only in the frequency domain.We also investigated the effect   of noise, or equivalently of the number of significant digits of the input data.
In passing, let us note that super resolution is not unique to Padé approximants to the Z-transform: see e.g.[11] and references therein.The advantages of Padé approximants are that 1) they only require knowledge that the spectrum is made up of a finite number of damped oscillators; 2) they are stable with respect to the presence of small amounts of noise [5,10].

Figure 1 .Figure 2 .
Figure 1.FFT for a signal corresponding to two peaks on the unit circle distant using 256 data points.The noise average amplitude is 10 −5 times the amplitude of each of the two signals.

Figure 3 .
Figure 3. Residues of the poles of the Padé Approximant to the Z-transform of the same signal as in Figure 1 using 40 data points.Large red dots indicate the position of the signal poles.

Figure 4 .Figure 5 .Figure 6 .
Figure 4. Residues of the poles of the Padé Approximant to the Z-transform of the same signal as in Figure 1 using only 10 data points.Large red dots indicate the position of the signal poles.opposite sign: no zero is present near the signal poles as in Figure 8(a); all the zeros are clustered at the origin as in Figure 8(b).8

Figure 7 .Figure 8 .
Figure 7. Residues of the poles of the Padé Approximant to the Z-transform of the same signal as in Figure 1 using 30 data points.Large red dots indicate the position of the signal poles.

Figure 9 .
Figure 9. Poles (black dots) and zeros (red dots) of the Padé Approximant to the Z-transform of the same signal as in Figure 1 using 30 data points; the noise average amplitude is now reduced to 10 −7 times the amplitude of each of the two signals.Crosses inscribed in circles indicate the position of the signal poles.

2 k
from their middle point, where k is the ratio of the magnitudes of the two poles.

Figure 10 .
Figure 10.Residues of the poles of the Padé Approximant to the Z-transform of the same signal as in Figure 9 using 10 data points.Large red dots indicate the position of the signal poles.
decrease of resolution when moving off the unit circle: in the case of two peaks, angle 1 comparable resolution with 300 data points: see Figure13.

Figure 11 .
Figure 11.Poles (black dots) and zeros (red dots) of the Padé Approximant to the Z-transform of two signals on the unit circle distant using 50 data points.The noise average amplitude is 10 −5 times the amplitude of the residue of the weaker pole.(a) Equal residues; (b) Second pole twice as big as the first one; (c) Second pole 100 times bigger than the first one; Crosses inscribed in circles indicate the position of the signal poles.

Figure 12 .
Figure 12.Residues of the poles of the Padé Approximant to the Z-transform of the signal generated by 2 poles in (ρ, φ) = (0.950, 1.000) and (0.950, 1.004) with residues r = 10 5 .Noise amplitude is unitary.We use 8 data samples of (a) 100 and (b) 200 points each.Large red dots indicate the position of the signal poles.between the case of two poles having the same radial position (decay rate) and different angles (frequencies), Figure 14(a) and the case where the two poles have different radial positions and the same angle, Figure 14(b).

Figure 16 .
Figure 16.Poles (black dots) and zeros (red dots) of the Padé Approximant to the Z-transform of the same signal as in Figure 15.We use the 163 combinations of 4 or more of the 8 samples available, each consisting of 300 data points.Crosses inscribed in circles mark the positions of the signal poles.from noise ones (which do not form clusters), we took advantage of the nonlinearity of the Padé Approximants and calculated them for a number of linear combinations of the available data sequences [10], as in Figure where we used the 163 combinations of 4 or more of the 8 samples available.None of this structure is visible in the DFT. Figure 17(a) shows the relevant section of the DFT over points: the red lines mark the four signals but only a single wide peak is visible.Extending the samples to 8192 points does not reveal any additional structure as can be seen in Figure17(b): here the vertical scale is logarithmic and only the region around the center of the peak has been plotted to better check for the presence of structures.Figures18 and 19show a less extreme case where the average separation of the poles is increased so that the four signal poles are at (ρ, φ) = (0.95, 0.85), (0.90, 0.90), (0.95, 1.00), (0.90, 1.05) with residues r = 10 3 , 5 × 10 3 , 10 3 , 5 × 10 3 respectively.Again, noise amplitude is unitary and we use 8 samples of 300 points each.The residue picture, Figure18, is very clear.Of course, the DFT performance is also somehow improved but only two of the four peaks are now clearly visible in Figure19.

Figure 17 .
Figure 17.FFT for the same signal as in Figure 15 using (a) 256 data points and (b) 8192 data points.

Figure 19 .
Figure 19.FFT for the same signal as in Figure 18 using 256 data points.