Xilinx System Generator® Based Implementation of a Novel Method of Extraction of Nonstationary Sinusoids

This work presents a model-based implementation, using Xilinx System Generator, of a novel nonlinear adaptive filter for the extraction of time-varying sinusoids. The practicality and performance of this filter make it a strong candidate for estimation and extraction in nonlinear systems on reconfigurable hardware such as FPGAs. A design implementation and verification approach that makes the implementation more efficient is discussed. Timing and power analyses were performed, and the architecture was optimized for speed and power so that it operates at a higher clock frequency when integrated on a Xilinx FPGA. The proposed hardware-oriented architecture has been successfully implemented and simulated. Simulation results for tracking a noisy input are also shown to demonstrate the performance of the developed hardware-based architecture.


Introduction
Field programmable gate arrays (FPGAs) are among the fastest-growing technologies today, and the need for reconfigurable, compatible designs is increasing for system integration in computationally demanding environments. Adaptive algorithms are also widely used in DSP and control systems for their performance, so there is a need to develop FPGA-based adaptive algorithms to meet future demand. A versatile adaptive filter algorithm, based on nonlinear differential equations, that tracks the amplitude, frequency and phase of a time-varying sinusoidal input [1][2][3] is taken as the base model. Its fixed-point hardware model has been created and successfully implemented using the schematic design environment of Xilinx System Generator (XSG) [4]. Hand-written VHDL/Verilog development is undesirable in most cases because the complexity involved is very high and a small design mistake can take days of design time to debug. Even when a design is successfully implemented, optimizing it to meet specific area or speed requirements remains a major task. The Xilinx blockset for System Generator provides a Simulink-like schematic environment in which DSP designs can be created, converted, debugged, optimized and implemented on the target FPGA device easily and quickly [5]. XSG has been used successfully in various domains, including LMS adaptive filters, to design hardware-oriented architectures that meet system performance demands [6]. Figure 1 shows the basic continuous-time architecture of the filter. Its performance in nonlinear applications compares favorably with mainstream nonlinear adaptive filters such as the extended Kalman filter (EKF), which makes it a suitable candidate for such applications, and its simple structure is easy to understand and debug for performance-related issues [2].
The discretized equations used in the experimental verification [7] were taken and implemented in Simulink to create a reference model for the System Generator design. The design was then implemented in System Generator module by module and block by block, during which a number of implementation-specific design problems were encountered and solved. The final design was optimized to minimize the critical-path delay, making it viable for implementation on reconfigurable hardware in real-time applications [8][9][10] and other time-sensitive systems.

Computer Simulations
The computer simulations are based on the set of equations (1)-(5) from [7], which were implemented in Simulink to observe the tracking performance of the method. Equations (1)-(5) give the tracked amplitude, tracked frequency, tracked phase, tracked output and estimation error, respectively.
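Equations (1)-(5) are not reproduced in this section, so the following is only a minimal Python sketch of how such a discretized tracker can be simulated. It assumes the continuous-time updates dA/dt = 2µ1·e·sin φ, dω/dt = 2µ2·e·A·cos φ and dφ/dt = ω + µ3·dω/dt, with y = A·sin φ and e = u - y, discretized by forward Euler with step Ts. The gains 2µ1 = 500 and 2µ2 = 8,000 are the values quoted for the fixed-point simulation; reading 2µ3 = 0.02 as µ3 = 0.01 in the phase equation, the 50 Hz test tone, Ts = 1e-4 and the initial estimates are assumptions, and the sketch is not necessarily the exact discretization used in [7].

```python
import math

# 2*mu1 and 2*mu2 are the values quoted for the fixed-point simulation.
TWO_MU1 = 500.0
TWO_MU2 = 8000.0
# ASSUMPTION: the quoted 2*mu3 = 0.02 is read as mu3 = 0.01 in phi' = w + mu3 * w'.
MU3 = 0.02 / 2


def track(u, ts, a_init=0.0, w_init=2 * math.pi * 48.0, phi_init=0.0):
    """Forward-Euler sketch of the ASSUMED filter equations.

    Returns one (A[n], w[n], phi[n], y[n], e[n]) tuple per input sample.
    """
    a, w, phi = a_init, w_init, phi_init
    history = []
    for u_n in u:
        y = a * math.sin(phi)                    # tracked output y[n]
        e = u_n - y                              # estimation error e[n]
        w_dot = TWO_MU2 * e * a * math.cos(phi)  # frequency update term (dw/dt)
        history.append((a, w, phi, y, e))
        a_next = a + ts * TWO_MU1 * e * math.sin(phi)
        w_next = w + ts * w_dot
        phi_next = phi + ts * (w + MU3 * w_dot)
        # Wrap phase to [-pi, pi): sin/cos are unchanged, the state stays bounded.
        phi = (phi_next + math.pi) % (2 * math.pi) - math.pi
        a, w = a_next, w_next
    return history


# Usage (assumed test case): unit-amplitude 50 Hz tone sampled at 10 kHz for 3 s.
TS = 1e-4
W_TRUE = 2 * math.pi * 50.0
u = [math.sin(W_TRUE * n * TS) for n in range(30000)]
estimates = track(u, TS)
a_end, w_end, _, _, e_end = estimates[-1]
print(f"A = {a_end:.4f}, f = {w_end / (2 * math.pi):.3f} Hz, e = {e_end:.2e}")
```

With these assumed settings the amplitude and frequency estimates settle close to 1 and 2π·50 rad/s well within the run; other tones, gains or sample times may need retuning.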

Laboratory Verification
A Texas Instruments™ TMS320C6711 floating-point DSP was used in the laboratory verification of the adaptive algorithm [7]. Equations (1)-(5) were converted to embedded C and then to DSP assembly using the DSP's integrated development environment. The results confirmed successful estimation of the desired input's amplitude and phase.

Simulink Based Implementation
The first step of the design process was to create an equivalent simulation model of the algorithm using the most basic blocks available in Simulink. Although this had already been done in [7], System Generator supports only discrete sample times, so the model had to be rebuilt with a discrete sample time Ts in equations (1)-(3), where n is a positive integer.
It was observed experimentally that the discrete model could be implemented successfully only for Ts ≤ 0.01. A solution was developed in the form of a custom discrete integrator in Simulink, which simulates the system at the required Ts while producing the desired results.

Implementing a Discrete Integrator
As can be seen in Figure 1, the only component affected by Ts is the integrator, because it represents a continuous-time function. During its discretization, a step size µs was defined with the same value as the desired Ts for the system. Figure 3 shows the custom discrete integrator implemented in Simulink using the simplest blocks, namely a constant multiplier, a delay and a register to store the value; equivalent blocks are available in System Generator. The constant multiplier sets the step size, a small value µs ≤ 0.001.
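The custom integrator can be modeled at sample level as a gain feeding an accumulator register. The Python sketch below assumes that the register output is the block output, which gives forward-Euler integration with a one-sample delay; the class and method names are illustrative, not taken from the design.

```python
class DiscreteIntegrator:
    """Gain + accumulator register: y[n] = y[n-1] + mu_s * x[n-1].

    The output is read from the register before it is updated, so the block
    has a one-sample delay (forward-Euler integration).
    """

    def __init__(self, step_size, initial=0.0):
        self.step_size = step_size  # constant multiplier: the step size mu_s
        self.register = initial     # stored (accumulated) value

    def step(self, x):
        y = self.register                                   # registered output y[n]
        self.register = self.register + self.step_size * x  # value for the next sample
        return y


# Usage: integrate a constant input of 1.0 with mu_s = 0.001.
integ = DiscreteIntegrator(step_size=0.001)
ramp = [integ.step(1.0) for _ in range(1000)]
```

The first output is the initial register value, so a constant input produces a ramp that lags the ideal integral by one sample.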

Converting the model to fixed point
The Simulink Fixed-Point Tool was used to collect the min/max values of the model signals and to propose fraction lengths. Using these values, the floating-point model was converted to a fixed-point model.
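The range-to-format step can be illustrated with a simplified rule: given a signal's observed min/max and a word length, reserve one sign bit and just enough integer bits to cover the largest magnitude, and give the remaining bits to the fraction. This is a conservative approximation written for illustration, not the Fixed-Point Tool's actual proposal algorithm; `propose_fraction_length` is a hypothetical helper.

```python
import math


def propose_fraction_length(min_val, max_val, word_length, signed=True):
    """Largest fraction length whose range covers [min_val, max_val]
    (simplified, conservative rule; not the Fixed-Point Tool's algorithm)."""
    max_abs = max(abs(min_val), abs(max_val))
    if max_abs == 0:
        int_bits = 0
    else:
        # Smallest non-negative int_bits with 2**int_bits > max_abs.
        int_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    sign_bits = 1 if signed else 0
    return word_length - sign_bits - int_bits


# Usage: a signal seen in [-1.5, 1.5] with a 15-bit word needs 1 integer bit,
# leaving 13 fractional bits (a 15_13 format).
print(propose_fraction_length(-1.5, 1.5, 15))
```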

Verifying Simulation Results
The fixed-point model with the discrete integrator was simulated, and its output, amplitude, phase and frequency results were verified. Figure 2 shows the results. The parameter values 2µ1 = 500, 2µ2 = 8,000 and 2µ3 = 0.02 were used.

System Generator Based Implementation
After the Simulink-based design was successfully implemented for a sampling time of '1', the System Generator design was implemented on an equivalent basis. Even for simple designs, a System Generator implementation is much trickier than a Simulink one. A block-by-block implementation and verification technique was therefore used to avoid wasting time on debugging, by eliminating most potential causes of problems at the design stage. A mixed fixed-point architecture using the formats 15_15, 15_13, 20_5, 15_6, 15_7 and 14_13 (total bits_fractional bits) was developed to allocate resources efficiently.
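To see what each format means numerically, the sketch below quantizes a value to a signed fixed-point type. It assumes the pairs above follow XSG's Fix_N_B convention (N total bits, B fractional bits, two's complement), truncation toward minus infinity, and either saturation or wrap-around on overflow; `to_fix` and `fix_range` are hypothetical helpers, not XSG functions.

```python
import math


def to_fix(x, word_length, frac_bits, overflow="saturate"):
    """Quantize x to a signed Fix_<word_length>_<frac_bits> value.

    Truncates toward -inf (drops low-order bits); on overflow either
    saturates to the representable range or wraps around (two's complement).
    """
    scale = 2 ** frac_bits
    lo = -(2 ** (word_length - 1))
    hi = 2 ** (word_length - 1) - 1
    raw = math.floor(x * scale)  # integer code after truncation
    if overflow == "saturate":
        raw = min(max(raw, lo), hi)
    elif overflow == "wrap":
        raw = (raw - lo) % (2 ** word_length) + lo
    else:
        raise ValueError(f"unknown overflow mode: {overflow}")
    return raw / scale


def fix_range(word_length, frac_bits):
    """(min, max, lsb) of a signed Fix_<word_length>_<frac_bits> type."""
    lsb = 2.0 ** -frac_bits
    return (-(2 ** (word_length - 1)) * lsb, (2 ** (word_length - 1) - 1) * lsb, lsb)


# Usage: Fix_15_13 covers [-2, 2 - 2**-13] in steps of 2**-13.
print(fix_range(15, 13))
print(to_fix(3.0, 15, 13), to_fix(2.0, 15, 13, overflow="wrap"))
```

Saturation costs extra logic in hardware, so wrap-around is only safe where the chosen range guarantees no overflow.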

Block-by-Block Implementation and Verification
Each block of the Simulink reference model was designed separately for the System Generator model by studying the outputs the block generated when specific inputs were applied. With this approach, the full system implementation can be divided among several designers working at the same time, shortening product development time. The technique also helps to eliminate design compatibility bugs at the implementation stage.
The technique works by solving one problem at a time, and it can be applied successfully to most designs with feedback behavior, as is the case with this design. Figure 6 shows the flowchart for the technique. Verification is performed to check that the outputs match those required by the system.
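One iteration of the block-by-block loop amounts to driving the reference block and its candidate replacement with the same stimulus and comparing outputs sample by sample. The Python sketch below does this and reports the first mismatching sample; the two integrator models are hypothetical stand-ins (a floating-point reference and a version whose accumulator is truncated to 20 fractional bits), and the tolerance is an assumption that would in practice follow from the chosen fixed-point formats.

```python
import math


def verify_block(reference, candidate, stimulus, tol):
    """Run both block models on the same stimulus.

    Returns (True, None) if every output sample agrees within tol,
    otherwise (False, index of the first mismatching sample).
    """
    ref_out = reference(stimulus)
    cand_out = candidate(stimulus)
    if len(ref_out) != len(cand_out):
        return False, min(len(ref_out), len(cand_out))
    for n, (r, c) in enumerate(zip(ref_out, cand_out)):
        if abs(r - c) > tol:
            return False, n
    return True, None


def float_integrator(xs, mu_s=1e-3):
    """Hypothetical reference block: registered forward-Euler integrator."""
    acc, out = 0.0, []
    for x in xs:
        out.append(acc)
        acc += mu_s * x
    return out


def fixed_integrator(xs, mu_s=1e-3, frac_bits=20):
    """Hypothetical candidate block: same integrator, state truncated to 2**-frac_bits."""
    scale = 2 ** frac_bits
    acc, out = 0.0, []
    for x in xs:
        out.append(acc)
        acc = math.floor((acc + mu_s * x) * scale) / scale
    return out


stimulus = [math.sin(2 * math.pi * n / 100) for n in range(500)]
print(verify_block(float_integrator, fixed_integrator, stimulus, tol=1e-3))
```

Returning the first bad index keeps the focus on one problem at a time: the earliest divergence is usually the cause of the later ones in a feedback design.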

Calculating Sine and Cosine Function
DDS Compiler 4.0, followed by output buffer registers, was used in Sin_Cos_Lut mode to calculate the sine and cosine of the tracked phase input. It is the most suitable block for generating the required outputs over our phase input range, -1 < Φ[n] < 1.
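A sine/cosine lookup driven by a normalized phase can be sketched as one table read twice, the second time with a quarter-cycle offset to obtain the cosine. This is a plain table-lookup model, not the DDS Compiler core's internal architecture; the 10-bit table address, the truncation of the phase, and the mapping of -1 < Φ[n] < 1 onto -π to π are assumptions.

```python
import math

PHASE_BITS = 10  # ASSUMPTION: table address width (2**10 entries per cycle)
N_ENTRIES = 2 ** PHASE_BITS

# Precomputed sine over one full cycle, as a ROM would hold it.
SIN_TABLE = [math.sin(2 * math.pi * k / N_ENTRIES) for k in range(N_ENTRIES)]


def sin_cos_lut(phase_norm):
    """(sin, cos) of pi * phase_norm by table lookup.

    ASSUMPTION: phase_norm in (-1, 1) maps to (-pi, pi). The phase is
    truncated to the table resolution, so each output is within
    2*pi / N_ENTRIES of the exact value.
    """
    idx = math.floor(phase_norm * N_ENTRIES / 2) % N_ENTRIES  # half a cycle = N/2 entries
    sin_val = SIN_TABLE[idx]
    cos_val = SIN_TABLE[(idx + N_ENTRIES // 4) % N_ENTRIES]   # cos(x) = sin(x + pi/2)
    return sin_val, cos_val


# Usage: phase 0.5 corresponds to pi/2.
print(sin_cos_lut(0.5))
```

Sharing one table for both outputs halves the memory compared with separate sine and cosine tables; the cost is a second read port or a second read per sample.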

Parallel Path Balancing
Two parallel paths with different latencies were identified and balanced using a delay element so that the block produces the required outputs. For example, the custom discrete integrator being used has a unit delay, while the parallel path to the adder has only combinational delay, so a unit delay was inserted in the parallel path to balance both paths and obtain the desired output for the block. All other blocks are readily available in the XSG environment and were used directly with appropriate settings to implement the design.
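The balancing argument can be checked on a toy example in which the same signal reaches an adder through a registered path and through a purely combinational path. The Python sketch below models registers as list shifts; the signals and the helper names `delay` and `add` are illustrative only.

```python
def delay(xs, k=1, init=0.0):
    """Model k cascaded registers (reset to init) acting on a sampled signal."""
    return [init] * k + xs[:len(xs) - k]


def add(a, b):
    """Combinational adder applied sample by sample."""
    return [p + q for p, q in zip(a, b)]


x = [1.0, 2.0, 3.0, 4.0]
ideal = add(x, x)                   # intended result: 2*x[n], ignoring latency
unbalanced = add(delay(x), x)       # registered path meets combinational path
balanced = add(delay(x), delay(x))  # unit delay inserted in the short path
print(ideal, unbalanced, balanced)
```

Without the extra register the adder combines x[n-1] with x[n]; with it, the output is the intended sum delayed by one sample.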

Optimization
The DDS Compiler sine and cosine outputs were buffered to prevent the unknown state 'X' from propagating during the initialization phase. Figure 9 shows the enable-controlled integrator2, which avoids phase jitter during initialization. The phase register was set to the initial value '-0.5'.
After the design was successfully implemented in XSG, timing and power analyses of the architecture were performed. The post-place-and-route timing report generated for the XSG design showed a maximum clock frequency of about 40.912 MHz, which may not be sufficient for some timing-critical applications, so the architecture was optimized to increase its overall clock speed. The critical path was identified by studying the path delays in the timing report and was partially pipelined; this reduced the critical-path delay and raised the operating frequency to 100.482 MHz, which is much better suited to time-sensitive real-time systems. Figure 5 shows the partially pipelined, optimized architecture for the algorithm [2] implemented using XSG. Figure 8 shows the path-delay histogram after the critical path was optimized.
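The reported clock figures translate into clock periods and a speed-up as computed below. The second part illustrates, with hypothetical stage delays rather than values from this design, why inserting a pipeline register on the critical path raises fmax; it ignores register setup, clock-to-out and skew, which is a simplifying assumption.

```python
def period_ns(f_mhz):
    """Clock period in ns for a clock frequency in MHz."""
    return 1e3 / f_mhz


def fmax_mhz(stage_delays_ns):
    """Clock-frequency bound (MHz) set by the slowest register-to-register stage.

    Simplification: register setup, clock-to-out and clock skew are ignored.
    """
    return 1e3 / max(stage_delays_ns)


# Post-place-and-route figures reported in the text.
F_BEFORE_MHZ = 40.912
F_AFTER_MHZ = 100.482
speedup = F_AFTER_MHZ / F_BEFORE_MHZ  # roughly 2.46x

# Hypothetical illustration (NOT this design's delays): one 24 ns combinational
# path, then the same logic split by a register into 13 ns + 11 ns stages.
unpipelined = fmax_mhz([24.0])      # about 41.7 MHz
pipelined = fmax_mhz([13.0, 11.0])  # about 76.9 MHz, plus one cycle of latency

print(f"{period_ns(F_BEFORE_MHZ):.3f} ns -> {period_ns(F_AFTER_MHZ):.3f} ns, x{speedup:.3f}")
```

Assuming one input sample is accepted per clock, throughput in Msample/s equals fmax, and dividing it by the occupied slice count gives the throughput-per-slice measure used in the results.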

Results
The optimized, partially pipelined architecture was simulated, and the results confirm the successful implementation of the model. Figure 7 shows the internal signal waveforms generated by WaveScope for the implemented architecture. Table 1 shows the maximum operating frequency and power consumption before and after optimization with partial pipelining in the critical path. The throughput of the design was also calculated and is shown along with the throughput per slice as a design performance measure. Figure 10 shows the power analysis. The designed architecture uses a total power of 0.071 W, and the junction temperature is close to room temperature at 27.0 °C. These power and temperature figures indicate that the design is suitable for use in portable devices. The resource utilization is shown in Table 1, depicting reasonable usage in Spartan-6 based devices for devel-

Conclusions
The XSG-based architecture for the novel adaptive filter was designed and implemented successfully, with promising results. This filter block can be integrated within larger systems [11] to meet the design requirements of their XSG-based implementation and, ultimately, of their hardware development for real-time, user-based applications [8], [9], [10]. The developed design can serve as a reference model for further improvement of this design or for future XSG-based development of similar models.

Figure 2. Simulation results for tracked amplitude (A[n]), frequency (ω[n]) and phase (Φ[n]) along with the test input and tracked output.

Figure 3. Simulink block diagram for the discrete integrator.

Figure 4. Top level module diagram for Xilinx system generator.

Figure 6. Flow chart for block-by-block implementation and verification.

Figure 7. Wave scope view for Xilinx System Generator results.

Figure 11. Simulation results for noisy input tracking performance.

Figure 4 represents the test system for the developed XSG block. The tracking performance of the developed hardware-based architecture was validated by tracking a noisy input sinusoid. Figure 11 shows the simulation results for the noisy input and the tracked output generated by the implemented filter. The sampled noisy input sinusoid is filtered to obtain the source signal.