Area and Timing Estimation in Register Files Using Neural Networks

The increase in issue width and instruction window size in modern processors demands larger register files with more ports. Bigger register files imply higher power consumption and longer access delays. Models that help estimate the size and timing of a register file early in the design cycle are critical to the time budget allocated to a processor design and to its performance. In this work, we discuss a Radial Basis Function (RBF) Artificial Neural Network (ANN) model for predicting the timing and area of standard cell register files designed using optimized Synopsys DesignWare components and a UMC 130 nm library. The ANN model predictions were compared against experimental results (obtained using detailed simulation) and against a nonlinear regression-based model. The ANN model is accurate, particularly for area, and outperformed the nonlinear model on several statistical measures. Using the trained ANN model, a parametric study was carried out to examine the effect of the number of words in the file (D), the number of bits in one word (W), and the total number of read and write ports (P) on the latency and area of standard cell register files.


Introduction
The access time, energy, and area of a register file are critical to the performance of modern processors. The access time and size of the register files in wide-issue processors often play a critical role in determining the cycle time, because such files must be large to support many in-flight instructions and multiported to avoid stalling wide issue. Large multiported register files also significantly increase processor power consumption. For example, in the Alpha 21464, the 512-entry register file with 16 read and 8 write (16-r/8-w) ports consumed more power and was larger than the 64 KB primary caches. To reduce the cycle time impact, it was implemented as two split 8-r/8-w register files [1,2].
Register files are heavily ported RAM structures. A processor capable of issuing eight integer instructions each cycle may need an integer register file with sixteen read ports (two source operands per instruction) and eight write ports. It was reported in [3] that the access time of an 80-entry, 24-ported register file can exceed 1.5 ns at 0.18 μm technology, potentially placing it on the critical path that determines the cycle time.
Although the adverse delay effects can be alleviated by pipelining the register file access, doing so complicates the bypass logic.
In addition, a deeper pipeline increases the branch misprediction penalty, lowering IPC (instructions completed per cycle). It is therefore difficult to remove the adverse effects of a large register file completely, and it is important to optimize the register file size without degrading performance [4].
The access time of a register file consists of two distinct components: the wire propagation delay and the fan-in/fan-out delay. Register files typically contain long word-lines and bit-lines, which can take a long time to propagate a signal across their length. A bigger register file with more ports results in a taller register file layout, which translates into longer word-lines and bit-lines [5] and thereby increases the wire propagation delay. Moreover, wire delays do not improve with technology scaling the way transistor delays do, so as register files grow while transistors become faster (smaller feature sizes), their delay problem only worsens. A circuit diagram of a three-port register file is shown in Figure 1.
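The claim that more ports make the layout taller and the wires longer can be illustrated with a simplified first-order model, which is an assumption made here for illustration only: each port adds one word-line track to the height of every storage cell and one bit-line track to its width. The base cell size and track pitch below are hypothetical placeholders, not UMC 130 nm values.

```python
# First-order sketch (assumption): each port adds one word-line track to the
# cell height and one bit-line track to the cell width. All dimensions are
# hypothetical placeholders, not UMC 130 nm library data.
BASE_CELL_W_UM = 1.0   # hypothetical cell width without port wiring
BASE_CELL_H_UM = 1.0   # hypothetical cell height without port wiring
TRACK_PITCH_UM = 0.5   # hypothetical pitch of one wiring track


def cell_dims(ports):
    """Return (width, height) of one storage cell in micrometres."""
    width = BASE_CELL_W_UM + ports * TRACK_PITCH_UM   # one bit line per port
    height = BASE_CELL_H_UM + ports * TRACK_PITCH_UM  # one word line per port
    return width, height


def wire_lengths(depth, width_bits, ports):
    """Word lines span the W cells of a row; bit lines span the D cells of a column."""
    cell_w, cell_h = cell_dims(ports)
    word_line = width_bits * cell_w
    bit_line = depth * cell_h
    return word_line, bit_line


def array_area(depth, width_bits, ports):
    """Area of the D x W cell array (peripheral circuits ignored)."""
    cell_w, cell_h = cell_dims(ports)
    return depth * width_bits * cell_w * cell_h


if __name__ == "__main__":
    for p in (3, 6, 12):
        wl, bl = wire_lengths(64, 32, p)
        print(f"P={p:2d}: word line {wl:6.1f} um, bit line {bl:6.1f} um, "
              f"area {array_area(64, 32, p):9.1f} um^2")
```

Because both cell dimensions grow linearly with P in this model, the array area grows roughly quadratically with the number of ports. The distributed RC delay of an unrepeated wire also grows with the square of its length, since both its resistance and its capacitance are proportional to length.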
Additionally, the physical dimensions of a register file play an important role in determining its power consumption. They influence power consumption in more than one way: 1) they determine the length of the wires in the file, and hence directly affect power consumption by determining the capacitance of the nodes; and 2) they impose pipelining constraints, indirectly affecting power by introducing additional power-consuming nodes. It is therefore critical to have a good model that helps designers estimate the physical dimensions of these files [7].
Models that can be used to evaluate architectural alternatives in register file design, and assist in making informed decisions prior to the back-end design phase are essential to realizing efficient designs in terms of area, delay, and power.
In recent years, there has been great advancement in the field of Artificial Neural Networks (ANNs), from both theoretical and application points of view. ANNs have been used in classification, pattern matching, pattern recognition, optimization, and control-related problems. In electrical engineering, neural networks have been used to solve a wide variety of VLSI-related problems [8][9][10][11]. A neural network (NN) approach for modeling the timing characteristics of fundamental gates of digital integrated circuits, including inverter, NAND, NOR, and XOR gates, is discussed in [8]. The modeling approach presented is technology independent, fast, and accurate, which makes it suitable for circuit simulators. The application of an ANN to the study of nanoscale CMOS circuits is presented in [9]. A novel method of testing analog VLSI circuits, using the wavelet transform for analog circuit response analysis and ANNs for fault detection, is proposed in [10]. Neural-network-based estimation of the power consumption of analog components at the system level is discussed in [11]; the proposed method provides estimates of the instantaneous power consumption of analog blocks.
In this work, we propose the use of neural networks to model the timing and area of standard cell based register files designed in 130 nm technology. Three parameters that influence the delay, area, and power of a register file, namely the number of words in the register file (Depth), the number of bits in one word (Width), and the total number of read and write ports (Port), are used as inputs to the ANN. The outputs of the ANN are delay and area estimates for the envisioned design.

Background
Praveen et al. [12] used low-level simulation, which takes into account the layout details as well as the detailed transistor characterization provided by a standard cell library, to collect data on the size and delay of various register file structures. They used optimized Synopsys DesignWare components from the UMC 130 nm library to design the register files. Layouts were generated for register files with the number of ports ranging from 3 to 12, a depth ranging from 4 to 64 words, and a width ranging from 8 to 128 bits. For every combination, the parasitic capacitances in the routing wires and the gate capacitances of each transistor were extracted from the layouts, and the extracted netlist was then simulated using ModelSim. After more than 100 register file designs had been completed for the 130 nm technology node, the timing and area of each design were tabulated. Curve fitting was performed on each output variable using the register file depth, width, number of ports, and activity factor as independent input variables. The designs assume that each port of the register file drives an FO4 (fan-out-of-four) load. Equations (1) and (2) below are the derived model equations for area and timing, respectively; the authors of [12] refer to this model as the Empire model. For a complete description of the steps taken to arrive at this model, readers are referred to [12].
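The exact form of the Empire model is given by Equations (1) and (2), and the sketch below does not reproduce it. It only illustrates the general technique of fitting a closed-form nonlinear model to tabulated simulation results. As a hypothetical example, it assumes a power-law form y = a·D^b·W^c·P^d and fits it by linear least squares on the logarithms. The data are synthetic, generated from made-up exponents, so the fit can be checked against known values.

```python
import itertools
import math


def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_power_law(samples):
    """Fit the hypothetical form y = a * D**b * W**c * P**d on log(y).

    samples: iterable of (D, W, P, y) tuples. Returns (a, b, c, d).
    """
    samples = list(samples)
    rows = [[1.0, math.log(d), math.log(w), math.log(p)] for d, w, p, _ in samples]
    logs = [math.log(s[3]) for s in samples]
    # Normal equations (X^T X) beta = X^T log(y).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * t for r, t in zip(rows, logs)) for i in range(4)]
    log_a, b, c, d = solve(xtx, xty)
    return math.exp(log_a), b, c, d


if __name__ == "__main__":
    # Synthetic, noise-free data from a made-up law; not measured data.
    grid = itertools.product((4, 8, 16, 32, 64), (8, 16, 32, 64), (3, 6, 9, 12))
    data = [(d, w, p, 0.02 * d**0.5 * w**0.3 * p**1.2) for d, w, p in grid]
    print(fit_power_law(data))
```

A log-space fit is only one option. An iterative least-squares fit on the original scale weights large and small designs differently and may suit percentage-error targets better.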

 
In the equations above, D represents the number of words in the file, W represents the number of bits in one word, and P represents the total number of (read and write) ports. To validate the curve-fitted formulae of Equations (1) and (2), Praveen et al. [12] compared them against results from actual implementations and reported that the models exhibit an average error of about 10% relative to the values obtained using detailed simulation.
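The accuracy measures used to compare the models can be computed as below. These include the average percentage error quoted here, the share of predictions within a given tolerance, the NMSE, and the correlation coefficient r. The NMSE is taken here as the mean squared error divided by the variance of the measured values. This is a common definition, and it is an assumption that it matches the normalization used in this study. The sample values are made up for illustration.

```python
import math


def percent_errors(measured, predicted):
    """Absolute error of each prediction as a percentage of the measured value."""
    return [abs(p - m) / abs(m) * 100.0 for m, p in zip(measured, predicted)]


def mean_percent_error(measured, predicted):
    errs = percent_errors(measured, predicted)
    return sum(errs) / len(errs)


def fraction_within(measured, predicted, tolerance_pct):
    """Share of predictions whose error is at most tolerance_pct percent."""
    errs = percent_errors(measured, predicted)
    return sum(e <= tolerance_pct for e in errs) / len(errs)


def nmse(measured, predicted):
    """Mean squared error normalized by the (population) variance of the measurements."""
    n = len(measured)
    mean_m = sum(measured) / n
    mse = sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n
    var = sum((m - mean_m) ** 2 for m in measured) / n
    return mse / var


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    sx = math.sqrt(sum((x - mx) ** 2 for x in measured))
    sy = math.sqrt(sum((y - my) ** 2 for y in predicted))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Made-up example delays in ns, not data from this study.
    measured = [2.0, 3.0, 4.0, 5.0]
    predicted = [2.1, 2.6, 4.2, 5.0]
    print(mean_percent_error(measured, predicted))
    print(fraction_within(measured, predicted, 10.0))
    print(nmse(measured, predicted), pearson_r(measured, predicted))
```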

Neural Network Model and Architecture
The field of Artificial Neural Networks is one of the main branches of artificial intelligence and has found many applications in several engineering disciplines. ANNs consist of processing elements that can learn relationships between inputs and outputs, and they can be used for classification, prediction, clustering, and function approximation, among other tasks. Several neural network architectures, with different learning algorithms such as backpropagation, have been used over the years. In general, an ANN consists of massively parallel computational processing elements (neurons) connected by weighted connections, with a learning capability that simulates the behavior of a brain [13,14]. The network weights and bias (threshold) values are initially set to random values, and new values are computed during the network training phase. The output of each neuron is calculated using Equation (3):
$$y_i = F\left(\sum_{j} w_{ij}\,x_j + b_i\right) \qquad (3)$$
where $y_i$ is the output of neuron $i$, $x_j$ is the output of neuron $j$ in the previous layer (an input to neuron $i$), $w_{ij}$ is the weight of the connection from neuron $j$ to neuron $i$, $b_i$ is the bias that models the threshold of neuron $i$, and $F$ is the transfer (activation) function [13,14]. The transfer function is the part of each processing element where the computation is performed; a bounded activation function maps an unbounded input domain to a bounded output domain. The ANN error $E_i$ for a given training pattern $i$ is given by Equation (4):
$$E_i = \frac{1}{2}\sum_{j}\left(T_{ij} - O_{ij}\right)^2 \qquad (4)$$
where $O_{ij}$ is the output of output node $j$ for pattern $i$ and $T_{ij}$ is the corresponding target. For a thorough discussion of neural network theory and applications, readers are referred to [13].
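Equations (3) and (4), together with a back-propagation weight update with momentum, can be sketched for a single neuron as follows. This is an illustration under stated assumptions, not the training code used in this study. The ½ scaling in the error is the conventional choice, tanh is only a default for F, and the example trains one linear neuron on one pattern. The learning rate of 0.1 and momentum of 0.7 merely echo the output-layer settings reported for this study.

```python
import math


def neuron_output(x, w, b, f=math.tanh):
    """Equation (3): y_i = F(sum_j w_ij * x_j + b_i) for one neuron i."""
    return f(sum(wj * xj for wj, xj in zip(w, x)) + b)


def pattern_error(targets, outputs):
    """Equation (4): half the summed squared error over the output nodes."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))


def momentum_step(params, grad, velocity, lr, momentum):
    """Gradient descent with momentum: v <- m*v - lr*g, then p <- p + v."""
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    return [p + v for p, v in zip(params, velocity)], velocity


def identity(s):
    return s


def fit_single_neuron(x, target, steps=100, lr=0.1, momentum=0.7):
    """Train one linear neuron on one pattern and return its final error E."""
    params = [0.0] * (len(x) + 1)          # weights followed by the bias
    velocity = [0.0] * len(params)
    for _ in range(steps):
        y = neuron_output(x, params[:-1], params[-1], f=identity)
        err = target - y
        # For a linear neuron and E = 0.5 * (T - y)^2:
        # dE/dw_j = -(T - y) * x_j and dE/db = -(T - y).
        grad = [-err * xj for xj in x] + [-err]
        params, velocity = momentum_step(params, grad, velocity, lr, momentum)
    y = neuron_output(x, params[:-1], params[-1], f=identity)
    return pattern_error([target], [y])


if __name__ == "__main__":
    print(fit_single_neuron([0.5, -1.0, 2.0], 1.0))
```

In a multilayer network, the same update is applied to every weight, with the gradients obtained by back-propagating the output error through the layers.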
The Radial Basis Function (RBF) ANN with the Gaussian activation function and the Multi-Layer Perceptron (MLP) with the hyperbolic tangent (tanh) activation function are among the most widely used feed-forward universal approximators. In this study, a hybrid of these two universal approximators is used: an RBF ANN topology with one additional hidden layer, with 15 neurons (processing elements) in the first hidden layer and four processing elements in the second hidden layer. The first hidden layer uses a Gaussian activation function, the additional hidden layer uses a linear hyperbolic tangent (linear tanh) activation function, and the output layer uses a bias axon activation function, as shown in Figure 2. For the data set used in this work, this combination of activation functions outperformed both a standard RBF network and a standard MLP used separately.
Copyright © 2012 SciRes. CS
As depicted in Figure 2, the neural network architecture used in this study has one input layer, two hidden layers, and one output layer. The input layer consists of three nodes, namely the register file depth (D, the number of words in the file), the width (W, the number of bits in one word), and the total number of read and write ports (P). The output layer consists of two nodes, which provide the time and area estimates.
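The layer structure described above can be sketched as a forward pass: three inputs (D, W, P), 15 Gaussian RBF units, 4 linear-tanh units, and 2 outputs (time, area). This is not the trained model. The centres, widths, and weights are random placeholders. The "linear tanh" is approximated here as a linear function clipped to [-1, 1], and the bias-axon output layer is treated as a linear layer with a bias; both are assumptions about those activation functions. The min-max input scaling is also a hypothetical choice, because the scaling used in this study is not stated.

```python
import math
import random

N_IN, N_RBF, N_HID, N_OUT = 3, 15, 4, 2   # (D, W, P) -> (time, area)

rng = random.Random(0)   # fixed seed so the placeholder weights are repeatable


def rand_matrix(rows, cols, lo, hi):
    return [[rng.uniform(lo, hi) for _ in range(cols)] for _ in range(rows)]


# Placeholder (untrained) parameters; training would learn these values.
centres = rand_matrix(N_RBF, N_IN, 0.0, 1.0)
widths = [0.5] * N_RBF
w_hid, b_hid = rand_matrix(N_HID, N_RBF, -0.5, 0.5), [0.0] * N_HID
w_out, b_out = rand_matrix(N_OUT, N_HID, -0.5, 0.5), [0.0] * N_OUT


def gaussian(x, centre, width):
    """Gaussian RBF unit: exp(-||x - c||^2 / (2 * width^2))."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-dist2 / (2.0 * width ** 2))


def linear_tanh(s):
    """Assumed stand-in for 'linear tanh': linear, clipped to [-1, 1]."""
    return max(-1.0, min(1.0, s))


def dense(x, weights, biases, act):
    """Fully connected layer: act(sum_j w_ij * x_j + b_i) for every unit i."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def scale(d, w, p):
    """Hypothetical min-max scaling over D in [4, 64], W in [8, 64], P in [3, 12]."""
    return [(d - 4) / 60, (w - 8) / 56, (p - 3) / 9]


def forward(x):
    """Return [time, area] in scaled units for a scaled input x = (D, W, P)."""
    h1 = [gaussian(x, c, s) for c, s in zip(centres, widths)]  # 15 RBF units
    h2 = dense(h1, w_hid, b_hid, linear_tanh)                   # 4 linear-tanh units
    return dense(h2, w_out, b_out, lambda s: s)                 # 2 bias-axon outputs


if __name__ == "__main__":
    print(forward(scale(32, 32, 6)))
```

Because the RBF layer responds locally around its centres while the second hidden layer mixes those responses linearly with saturation, the hybrid combines the local fitting of an RBF network with the global shaping of an MLP layer.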
To train the neural network, the data collected from the detailed simulation runs in [12] were divided into two sets: a training set and a test set. For both sets, the maximum depth is 64 registers per file and the minimum is 4; the maximum width is 64 bits and the minimum is 8; and the number of ports ranges from 3 to 12. For the training set, the timing ranges from 1.11 ns to 7.55 ns, while for the test set it ranges from 1.81 ns to 4.92 ns. The area ranges from 2512 μm² to 721,383 μm² for the training set and from 4902 μm² to 164,590 μm² for the test set.
The weights of the neural network were initialized with random values, and different learning rates (step sizes) were used for the different layers of the RBF neural network: 1.0 for the first and second hidden layers and 0.1 for the output layer. A momentum factor of 0.7 was used throughout, with the back-propagation learning algorithm. In total, 60 data items were used for training the neural network and 20 for testing it. The network was trained four times, with 2000 epochs in each training run, and the average performance was taken. The average Normalized Mean Square Error (NMSE) for the training data was 0.00494, with a standard deviation of 0.000614. Figure 3 shows the convergence of the four training runs: the NMSE decreases sharply during the first 15 epochs and then remains almost constant as the number of epochs increases.

Figure 3. Training NMSE for the four runs of ANN models.

Results and Discussions
The ANN model, trained on the 60 training samples, was verified on 20 randomly selected test samples. The parameters of the 20 test samples were also used to predict time and area with the Empire model. Tables 1 and 2 show the performance indicators for the 20 test samples. As shown in Tables 1 and 2, the Normalized Mean Square Error (NMSE) of the ANN model is 0.3977 for time and 0.0261 for area, and the correlation coefficient (r) is 0.7983 for time and 0.9872 for area. This indicates that the measured and ANN-predicted values correlate very well for area, and to a lesser extent for time. The Empire model performs slightly better than the ANN model on all performance criteria when predicting time; however, the ANN model outperforms the Empire model by a wide margin on all performance criteria when predicting area.
Table 3 compares the time predictions of the ANN model and the Empire model on the test set against the measured values. In column 1, the case number specifies the depth (D), width (W), and number of ports (P) of each design tested. For time, 55% of the ANN predictions fall within 10% of the measured values, compared with 50% of the Empire model predictions. Within 20% of the measured values, however, the ANN model places 80% of its predictions, compared with 90% for the Empire model. Overall, the Empire model predicts time slightly better than the ANN model, which is consistent with the performance criteria presented earlier.
Table 4 compares the area predictions of the ANN model and the Empire model on the test set against the measured values. It is observed that 45% of the ANN model predictions are within …

Parametric Study
To further compare the Empire and ANN models in predicting time and area, we varied the input parameters (width, ports, and depth) and computed the resulting outputs for six designs. Figures 4 and 5 depict comparative plots of the time and area predictions, respectively, for the varied parameter combinations. Figures 4(c) and (d) show that the ANN predictions are fairly accurate when the number of ports is varied with the depth and width fixed. When the width is increased with the depth and ports fixed, the ANN model underestimates the time (Figure 4(b)), especially for wider designs. Similarly, when the depth is varied with the width and ports fixed (Figures 4(e) and (f)), the ANN predictions deviate above or below the experimental values in a few cases.
In the instances selected for area comparison (Figure 5), both models performed relatively well, and the predicted areas were close to the experimental values obtained from detailed simulation. Overall, however, as the statistical results in Table 2 indicate, the ANN model outperforms the Empire model in area prediction.
From the above analysis of the results and validation of the ANN model, it is evident that the proposed ANN model can provide designers with representative estimates of the time and area of an envisioned register file design before committing to silicon. The time and area estimates for all the register file designs used in this study, at 130 nm technology with a 1.2 V supply voltage, are shown in Figures 6 and 7, respectively.

Figure 4. Comparison of time for selected register files.
Figure 5. Comparison of area for selected register files.

Conclusions
The continued trend in microprocessor design towards wider instruction issue and larger instruction windows implies that register files will have to be designed with larger sizes and more read/write ports. Consequently, these large files will consume additional power and have a noticeable impact on cycle time. Therefore, models and tools that allow designers to predict the area and timing of a design prior to committing to silicon are of great benefit to microprocessor designers. Evaluating architectural tradeoffs early in the design cycle gives designers insight into the performance of a design and shortens time to market. In this paper, we proposed a novel neural network

Figure 6. ANN model for time for all ports.

Figure 7. ANN model for area for all ports.