Communications and Network, 2013, 5, 292-297
http://dx.doi.org/10.4236/cn.2013.53B2054 Published Online September 2013 (http://www.scirp.org/journal/cn)

On the Optimization of Real-Time Performance of Software Defined Radio on Linux OS

Zhen Wang, Limin Xiao, Xin Su, Xin Qi, Xibin Xu
State Key Laboratory on Microwave and Digital Communications, Tsinghua National Laboratory for Information Science and Technology, Research Institute of Information Technology, Tsinghua University, Beijing, China
Email: wangzhen11@mails.tsinghua.edu.cn

Received July, 2013

ABSTRACT

With the evolution of communication standards, Software Defined Radio (SDR) faces an increasingly important problem: balancing ever more complex wireless communication algorithms against the relatively limited processing capability of the hardware. Competition for computing resources exacerbates the problem and increases the time delay of an SDR system. This paper presents an integrated method for optimizing the real-time performance of SDR on the Linux operating system (OS). The method is composed of three parts: a real-time scheduling policy that ensures higher priority for SDR tasks, CGROUPS to manage and redistribute computing resources, and a fine-grain system timer that makes process preemption more accurate. According to the experiments, applying the method reduces the round-trip data transfer latency enough to meet the requirement of TD-SCDMA.

Keywords: SDR; Real-time Priority; CGROUP; System Timer

1. Introduction

Software Defined Radio, which holds the promise of fully programmable wireless communication systems [1], has become increasingly important in the development of wireless communication systems. An SDR based on a general-purpose processor (GPP) enables us to dynamically modify software to realize different communication standards, instead of performing time-consuming hardware redesign, which consequently reduces development time [2]. Compared with a traditional wireless communication system based on DSP or FPGA, a GPP-based SDR system is more flexible and convenient for researchers to develop [3,4], because a high-level programming language (e.g., C/C++) is generally easier to develop and debug with than Verilog or VHDL.

Although SDR offers many benefits to the development of wireless communication systems, it cannot be ignored that the increasing computational expense brought about by updates to wireless communication protocols poses an immense challenge to the limited computing resources of the hardware. Inevitably, the real-time performance of a GPP-based SDR is affected by this problem. Worse still, when a system is composed of many SDR tasks, as in C-RAN [5] (where the whole function of a BTS is implemented on GPPs), the competition for hardware capacity among these tasks can be beyond what multi-core solutions alone can handle. Without efforts to manage and redistribute the computing resources, these problems leave the real-time performance of the SDR system unguaranteed.

As is well known, an important task of the OS in a personal computer is to manage and allocate computing resources to different kinds of processes according to their priorities. Similarly, redistributing and rescheduling processes in an SDR system, which enables us to preserve enough computing resources for time-critical tasks, is a principal method of improving real-time performance.
As the traditional process scheduling mechanism of the Linux kernel, the principles and implementation of scheduling policies on Linux OS have been illustrated in many papers and books, such as Understanding the Linux Kernel [6]. By applying the different scheduling policies offered by the Linux kernel to different processes, tasks with real-time priority are allocated more CPU time than other tasks and take precedence in execution. Considering the strict latency demands of the time-critical processes in an SDR system, adjusting priorities by selecting appropriate process-scheduling strategies, instead of scheduling all processes equally, contributes greatly to improving real-time performance without upgrading the hardware.

Although the scheduling strategy improves the real-time performance of the SDR system, it still leaves much to be desired. For example, setting different
priorities for processes is a relatively imprecise way of allocating CPU time. What is worse, without constraints on CPU consumption, a process may consume extra or even all of the CPU resources when no other process can preempt it. There are many scenarios where this excess CPU share can cause unacceptable utilization or latency [7]. For an SDR system, scheduling latency is introduced whenever a time-critical process is woken up and must wrest CPU resources from the currently running process. Therefore, CGROUPS (Control Groups), a resource management tool provided by the Linux kernel [8], should be taken into application. By imposing caps on the CPU usage time of other tasks, CGROUPS enables us to spare more CPU bandwidth for the SDR tasks and reduce the frequency of process switches. More specifically, the CPU share can be changed dynamically as needed.

As mentioned above, real-time priority guarantees that a time-critical process will occupy the computing resources when competing with other tasks; meanwhile, CGROUPS reduces the occurrence of those competitions by constraining the CPU usage of other tasks. In addition, a high-resolution timer [9] configured in the Linux kernel minimizes scheduling latency by making process switches more immediate and, in consequence, optimizes the overall real-time performance of the SDR system.

In this paper, real-time priority, CGROUPS, and the high-resolution timer are utilized in combination. According to the experiments, this integrated method can constrain the data transfer latency to meet the requirement of 3G standards such as TD-SCDMA.

The rest of this paper is organized as follows. Section 2 describes the system model used to analyze the data transfer latency of an SDR system. Section 3 proposes the integrated method to optimize the real-time performance of SDR based on GPP. We then analyze the latency requirement of TD-SCDMA in Section 4. In Section 5, we design and conduct experiments based on the proposed system model to validate the effectiveness of the optimization method. Finally, in Section 6, we draw conclusions based on analysis of the experimental results.

2. System Model

To validate the effectiveness of the integrated method for optimizing the real-time performance of SDR systems running on Linux OS, a system model based on GPP is designed and implemented.

As shown in Figure 1, the system model is composed of three parts: a baseband processing board, an adapter board, and an RF front end. We choose a PC running Linux OS as the baseband processing platform, which is mainly responsible for data processing. The adapter board, implemented mainly on an FPGA, plays an important role in controlling the RF front end and transferring data between the PC and the RF front end. The RF front end acts as a transceiver.

On the baseband processing board, our SDR task is composed of three main processes. Firstly, the physical-layer process is responsible for PHY processing such as modulation and demodulation. Secondly, the high-layer process implements the algorithms of the Data Link Layer and the Network Layer. Finally, the data interface process interacts with the adapter board through USB 2.0. Besides, to better validate the effectiveness of resource management via the proposed method, some other tasks also run on the PC (more tasks than the PC has CPU cores), which increase the consumption of computing resources.
In this model, the latency of data transfer between the RF front end and the baseband processing board reflects the real-time performance of the SDR system. Three factors have major effects on this latency. Firstly, because of the limited interface bandwidth of USB 2.0, transfer delay between the RF front end and the processing board is inevitable. Secondly, the baseband processing board spends a period of time processing data from the RF front end, which is called the processing time. Finally, scheduling latency under Linux OS is introduced when process switches occur. Taking the data interface process as an example, when data need to be read from the RF front end, the Linux kernel spends some time reallocating computing resources from the process currently running on the CPU to the data interface process.

3. Integrated Optimization Method

As mentioned above, the transfer delay, processing time, and scheduling latency limit the real-time performance of an SDR system. Generally, the transfer delay is fixed because the packet size and interface bandwidth are defined by the transfer protocol and hardware, and the processing time is mainly determined by the algorithmic complexity and computing capacity. Therefore, a general optimization method should aim to redistribute the computing resources so as to reduce the cost of process scheduling and process switches.

Figure 1. System model.
Considering these features, three main methods are taken in combination: a real-time scheduling policy for time-critical tasks, constraints on the CPU consumption of disturbance tasks via CGROUPS, and reconfiguration of a fine-grain timer in the Linux kernel.

3.1. Real-time Priority

On Linux OS, processes are divided into three categories: interactive processes, batch processes, and real-time processes [6]. Correspondingly, there exist three scheduling policies: SCHED_NORMAL, SCHED_RR, and SCHED_FIFO. The SCHED_NORMAL policy is designed to schedule conventional, time-shared processes, while SCHED_RR and SCHED_FIFO are aimed at real-time processes.

In order to reduce process response time and avoid process starvation, the Linux kernel schedules processes based on time-sharing technology [10] and allows processes to be preempted according to priority order. When a process becomes ready to run, the kernel checks whether its priority is greater than that of the currently running process. If so, the execution of the current process is interrupted and the scheduler is invoked to run the higher-priority process. Additionally, real-time processes enjoy higher priority than any ordinary process, so a real-time process will not be interrupted by an ordinary process before it finishes executing, while it can preempt ordinary processes when needed, as shown in Figure 2.

Considering this feature of process scheduling, we can specify a real-time scheduling policy for time-critical processes in order to improve real-time performance. In an SDR system, the data interface process running on the baseband processing board is designed to interact with the RF front end and has a strict latency requirement; that is, it must read or write data in the specified time slots following the air interface specifications. Otherwise, if the data transfer is delayed, the speed and quality of data processing are diminished and, as a result, the real-time performance of the SDR system is degraded.

Figure 2. Process preemption.

Therefore, the data interface process should be specified as a real-time process, which ensures that it has a higher priority than the other ordinary processes running on the same platform. The data transfer will then not be interrupted by other processes; moreover, when data packages need to be read from the RF front end, the data interface process will seize the CPU even if an ordinary process is currently running.
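As an illustration, a minimal sketch of how such a real-time policy could be applied to the data interface process from user space is shown below; the specific priority value of 80 is our assumption for illustration, not a value prescribed by the paper.

```c
/* Sketch: give the calling process (e.g., the data interface
 * process) a SCHED_FIFO real-time priority. Requires root or
 * the CAP_SYS_NICE capability. */
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    struct sched_param param;
    param.sched_priority = 80;   /* illustrative; range is 1 (lowest) to 99 */

    /* pid 0 means "the calling process" */
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
        perror("sched_setscheduler");
        exit(EXIT_FAILURE);
    }

    /* From here on, this process preempts any SCHED_NORMAL task
     * as soon as it becomes runnable. */
    printf("running with SCHED_FIFO priority %d\n", param.sched_priority);
    return 0;
}
```

Equivalently, an already-running process could be switched at run time with the chrt utility, e.g. `chrt -f -p 80 <pid>`.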
3.2. Computing Resource Redistribution

CGROUPS is an abbreviation of Control Groups, which provides a mechanism for aggregating and partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behavior [11]. A set of tasks (processes or threads) is associated with a set of parameters for one or more subsystems. A subsystem is a module that makes use of the task grouping facilities provided by CGROUPS to treat groups of tasks in particular ways and to redistribute computing resources such as CPU, memory, and block device I/O as designed. The five main subsystems are introduced in Table 1.

Table 1. Subsystems of CGROUPS and their functions.

Subsystem   Function
cpu         Allocate the CPU occupancy for a set of tasks
cpuset      Assign CPUs and memory to a set of tasks
memory      Memory resource controller
devices     Device whitelist controller
blkio       Set limits on the I/O of block devices

The initial goal of CGROUPS was to provide a unified framework for resource management, not only integrating the existing subsystems, but also providing an interface for new subsystems that may be developed in the future. Nowadays, CGROUPS has been applied in a variety of scenarios, especially in network services such as Taobao, a leading online C2C auction company in China [12], where it is used as a tool for redistributing computing resources on large servers and for OS-level virtualization [16].

What is important for SDR systems is that we can use the cpu subsystem to assign specific CPU shares to processes running on the same OS and place constraints on the CPU usage of the less important tasks. In the cpu subsystem, there are two main configuration options for allocating CPU resources: cpu.cfs_period_us and cpu.cfs_quota_us [13]. Reconfiguring the CPU bandwidth by writing a value (from 1000 to 1000000 microseconds) to cpu.cfs_quota_us and cpu.cfs_period_us takes effect immediately and efficiently. The cpu.cfs_quota_us option specifies the maximum CPU time that a set of tasks may use within the period set in cpu.cfs_period_us. For example, when cpu.cfs_quota_us is set to 10000 while cpu.cfs_period_us is 100000, the tasks in this group may use at most 10 ms (milliseconds) of CPU time in every 100 ms, which means their CPU usage is limited to 10 percent.
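A minimal sketch of this configuration is shown below. It assumes the cpu cgroup hierarchy is mounted at /sys/fs/cgroup/cpu and that a group named "disturbance" holds the background tasks; both the mount point and the group name are our assumptions for illustration.

```c
/* Sketch: cap a cgroup of disturbance tasks at 10% of one CPU by
 * writing to the cpu subsystem's control files. The group directory
 * is assumed to exist already (created with mkdir) and the
 * disturbance tasks' PIDs to have been written into its "tasks" file. */
#include <stdio.h>
#include <stdlib.h>

static void write_value(const char *path, long value)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); exit(EXIT_FAILURE); }
    fprintf(f, "%ld\n", value);
    fclose(f);
}

int main(void)
{
    const char *base = "/sys/fs/cgroup/cpu/disturbance";  /* assumed path */
    char path[256];

    /* 100 ms scheduling period ... */
    snprintf(path, sizeof(path), "%s/cpu.cfs_period_us", base);
    write_value(path, 100000);

    /* ... of which the group may consume at most 10 ms, i.e. 10%. */
    snprintf(path, sizeof(path), "%s/cpu.cfs_quota_us", base);
    write_value(path, 10000);

    return 0;
}
```

Because writes to cpu.cfs_quota_us take effect immediately, the cap can be retuned at run time without restarting the tasks, which is how the different constraint levels in Section 5 can be applied.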
By establishing constraints on CPU usage via cpu.cfs_quota_us and cpu.cfs_period_us, we can prevent excessive consumption by some processes and spare more resources for the time-critical processes in our SDR system. Using CGROUPS, we can isolate the execution of the tasks in an SDR system better than multi-core solutions alone can, because the competition for computing resources cannot be relieved by adding cores when the number of tasks is greater than the number of cores.

3.3. Fine-grain Timer in the Linux Kernel

The passing of time is important to the Linux kernel, because many kernel functions are time-driven rather than event-driven. These periodic tasks occur on a fixed schedule, depending on the System Timer, a programmable piece of hardware that issues an interrupt at a fixed frequency [10]. The system time is then updated, and those tasks are performed by the interrupt service routine handling this timer.

Indeed, the System Timer plays a role similar to the alarm clock of a microwave oven [10]: the alarm clock makes users aware that the cooking interval has elapsed, while the System Timer reminds the computer that one more time interval has elapsed, at a fixed frequency established by the kernel. The frequency of the system timer is programmed at system boot based on a static preprocessor define, HZ.

The main benefit of a higher HZ is greater accuracy in process preemption; consequently, the scheduling latency decreases, which improves the real-time performance of the SDR system. As mentioned above, the System Timer interrupt is responsible for decrementing the running process's timeslice count. When the count reaches zero, a flag called NEED_RESCHED is set and the kernel runs the scheduler as soon as possible. Now assume that a given process is running and has 2 ms of its timeslice remaining. In 2 ms, the scheduler should preempt the running process and begin executing a new, more important process. Unfortunately, this event does not occur until the next timer interrupt, which might not arrive within 2 ms. At worst, the next timer interrupt might be 1/HZ of a second away. With HZ = 100, a process can get nearly 10 extra ms to run; by contrast, with HZ = 1000, the extra time is limited to under 1 ms.

Owing to this reduction of the latency caused by preemption delay, the real-time performance of the SDR system is improved by the greater accuracy of process preemption. Taking the data interface process as an example, even if it has a higher priority, the delay of data transfer is still out of control when scheduling latency is introduced. In conclusion, a fine-grain system timer makes time-critical processes wait less time to seize the CPU and respond more promptly.
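The worst-case figures above follow directly from the timer period; a tiny sketch of the arithmetic:

```c
/* Sketch: worst-case extra running time a process can get before
 * the scheduler is invoked, for several timer frequencies. */
#include <stdio.h>

int main(void)
{
    const int hz_values[] = {100, 128, 512, 1000, 1024};

    for (int i = 0; i < 5; i++) {
        /* The next timer interrupt is at most one tick (1/HZ s) away. */
        printf("HZ = %4d -> worst-case preemption delay = %.3f ms\n",
               hz_values[i], 1000.0 / hz_values[i]);
    }
    return 0;
}
```

With HZ = 1024, the highest frequency used in Section 5, this bound is roughly 0.98 ms, which matches the "nearly 1 ms" worst-case scheduling delay quoted there.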
4. Latency Requirement for TD-SCDMA

As a 3G standard, TD-SCDMA has a strict demand for data transfer latency. In order to calculate the maximum transfer delay that can be tolerated, some details of the physical channel signal format need to be elaborated in advance.

The radio frame of 1.28 Mcps TD-SCDMA has a length of 10 ms and is composed of two 5 ms subframes. In each subframe, there are seven traffic time slots and three special time slots, as shown in Figure 3. The 5 ms subframe contains 6400 chips, and each traffic time slot is 864 chips long. A physical channel is transmitted as a burst in a particular time slot within the allocated radio frames.

Figure 3. Subframe structure.

Using the subframe structure, TD-SCDMA can operate in both symmetric and asymmetric modes by properly configuring the number of downlink and uplink time slots [15]. Figure 4 takes H-ARQ [15] as an example of the asymmetric mode. The HS-DSCH-related ACK/NACK must be transmitted on the associated HS-SICH in the next but one subframe. The time between the last HS-DSCH and the HS-SICH is spent on baseband processing and uplink transmission. If we take the first downlink time slot as the zero time reference, responses should be ready before 3.45 ms, which is the total time of the downlink (including the DwPTS). This 3.45 ms is the threshold of arrival time.

Figure 4. H-ARQ. The HS-SICH response in subframe #n+1 must be ready within the threshold of arrival time (3.45 ms from the zero time reference), which covers the baseband processing time plus the uplink time.

In other words, if the uplink data are not ready before the uplink time slot transmission begins, the response fails.
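The 3.45 ms threshold can be reproduced from the chip counts above. A short check, under our assumption (consistent with the stated total) that the downlink comprises five traffic time slots plus the 96-chip DwPTS:

```latex
% One traffic slot and the DwPTS, at the 1.28 Mcps chip rate:
t_{\text{slot}}  = \frac{864\ \text{chips}}{1.28\ \text{Mcps}} = 675\ \mu\text{s},
\qquad
t_{\text{DwPTS}} = \frac{96\ \text{chips}}{1.28\ \text{Mcps}} = 75\ \mu\text{s}

% Total downlink time: five traffic slots plus the DwPTS.
t_{\text{threshold}} = 5 \times 675\ \mu\text{s} + 75\ \mu\text{s}
                     = 3450\ \mu\text{s} = 3.45\ \text{ms}
```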
In the next section, experiments are proposed to measure the round-trip latency and to calculate the ratio of latency exceeding 3.45 ms, i.e., the response failure ratio.

5. Experiment Result and Analysis

To validate the effectiveness of the proposed methods, we conduct experiments based on the system model described in Section 2.

To measure the round-trip time of the SDR platform, the whole procedure of data transmission between the PC and the RF front end is illustrated in Figure 5. It begins when the downlink data are received from the air interface and ends when the uplink response is ready to be sent by the RF front end. First, the downlink data arrive at the PC a little later than at the air interface because of the delay of USB 2.0 transmission and process scheduling, as shown in the third and fourth lines of Figure 5. The PC then processes the downlink data and sends the associated response, i.e., ACK/NACK, as shown in the fifth and sixth lines of Figure 5.

Figure 5. Procedure of data transmission in experiments.

In order to quantitatively analyze the influence of the competition for computing resources on the performance of the SDR system, we run our data interface program alongside four infinite loops acting as disturbance programs on the PC (which has two CPU cores). Consequently, the ratio of response failure is 0.7841, in contrast to 0.00024 without any interference on the same platform. This proves that resource competition deteriorates system performance, especially latency, because fewer computing resources are allocated to the data interface process when the number of tasks is greater than the number of CPU cores.

When the data interface process has read the last data packet and is waiting for the next one, the kernel allocates the CPU to the disturbance tasks in order to avoid process starvation. The issue is that, when the data interface process is woken up to read data from the adapter board, the kernel takes a period of time to reallocate the CPU to it, as analyzed in Section 3.3. Therefore, even after we set real-time priority on the data interface process, the ratio of round-trip latency exceeding 3.45 ms is only reduced to 0.582, still far more than the 0.00024 of the ideal environment.

Next, a series of experiments is conducted to validate the effectiveness of the proposed methods. We set real-time priority on the data interface process and then record the response failure ratios under different CPU usage constraints on the disturbance programs (i.e., no constraint, 80%, 60%, 40%, 20%, 10%) by reconfiguring cpu.cfs_quota_us in the cpu subsystem. The system timer frequency is set to 128 Hz, 512 Hz, and 1024 Hz in turn.

As analyzed in Section 4, the round-trip latency must be constrained to at most 3.45 ms; otherwise, some uplink responses will not be ready in the RF front end before the uplink time slots, i.e., the responses fail. In the experiments, the time between the first downlink time slot and the arrival of the uplink response is recorded as the round-trip latency and used to calculate the ratio of response failure.
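As an illustration of how such a round-trip latency could be timestamped on the PC side, a minimal sketch follows. The read_uplink_response() function is a hypothetical placeholder for the real USB 2.0 read from the adapter board, and the real experiment would anchor the start timestamp to the first downlink time slot rather than to a local clock read; both simplifications are ours.

```c
/* Sketch: timestamp one round trip and test it against the 3.45 ms
 * threshold. All names here are illustrative placeholders. */
#include <stdio.h>
#include <time.h>

#define THRESHOLD_NS 3450000L   /* 3.45 ms */

/* Hypothetical stand-in for the real USB 2.0 read; it just sleeps
 * ~2 ms so that this sketch is self-contained and runnable. */
static void read_uplink_response(void)
{
    struct timespec t = {0, 2000000};
    nanosleep(&t, NULL);
}

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);   /* zero time reference */
    read_uplink_response();
    clock_gettime(CLOCK_MONOTONIC, &end);

    long latency_ns = (end.tv_sec - start.tv_sec) * 1000000000L
                    + (end.tv_nsec - start.tv_nsec);
    printf("round-trip latency: %.3f ms (%s)\n", latency_ns / 1e6,
           latency_ns > THRESHOLD_NS ? "response failure" : "ok");
    return 0;
}
```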
The results are shown in Figure 6. The constraints on the CPU usage of the disturbance programs are plotted on the x-axis and the ratios of response failure on the y-axis; the system timer frequencies of 128 Hz, 512 Hz, and 1024 Hz are represented by the blue, green, and red curves respectively.

Figure 6. Experiment result: ratio of response failure (log scale) versus the constraint on the CPU usage of the disturbance programs (percent), for system timers of 128 Hz, 512 Hz, and 1024 Hz.

The System Timer of 1024 Hz performs better than those of 512 Hz and 128 Hz (the default value on the PC), because increasing the timer frequency to 1024 Hz lowers the worst-case scheduling delay to just about 1 ms. Besides, with more stringent constraints on the CPU usage of the disturbance programs, the ratio of response failure decreases markedly, which proves that by using CGROUPS we can spare more computing resources for our SDR tasks. Better yet, when the CPU consumption of the disturbance programs is limited to 10 percent and the System Timer frequency is raised from 128 Hz to 1024 Hz, the round-trip latency achieves a satisfactory response failure ratio of 0.00056, which is of the same order of magnitude as in an environment free of competition and low enough to meet the requirement of TD-SCDMA.

6. Conclusions

In this paper, we use real-time priority, CGROUPS, and a high-resolution system timer in combination, as described above, to optimize the real-time performance of an SDR system. Taking TD-SCDMA as an example, we justify by experiments that the integrated use of these methods can provide a latency guarantee on data transfer that meets the requirement of this 3G system, even under competition for computing resources from other tasks.

7. Acknowledgements

This work was supported by the State Key Laboratory team building project "the key technology research of integrated wireless communication", S&T Major Project (2012ZX03003007-004), and the National Science Foundation of Beijing (4110001).

REFERENCES

[1] K. Tan, J. Zhang, J. Fang, H. Liu, Y. Ye, S. Wang, Y. Zhang, H. Wu, W. Wang and G. M. Voelker, "Sora: High Performance Software Radio Using General Purpose Multi-core Processors," In NSDI 2009.

[2] S. Mamidi, E. R. Blem, M. J. Schulte, et al., "Instruction Set Extensions for Software Defined Radio on a Multithreaded Processor," In Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, September 2005, pp. 266-273.

[3] J. Zhang, K. Tan, S. Xiang, Q. Yin, Q. Luo, Y. He, J. Fang and Y. Zhang, "Experimenting Software Radio with the SORA Platform," In ACM SIGCOMM Computer Communication Review, Vol. 40, No. 4, 2010, pp. 469-470. doi:10.1145/1851275.1851268

[4] P. Guo, X. Qi, L. Xiao and S. Zhou, "A Novel GPP-based Software-Defined Radio Architecture," In Communications and Networking in China (CHINACOM), 2012, 7th International ICST Conference on, pp. 838-842.

[5] C-RAN. http://labs.chinamobile.com/cran/

[6] D. P. Bovet and M. Cesati, "Understanding the Linux Kernel," O'Reilly Media, 3rd Edition.

[7] P. Turner, B. B. Rao and N. Rao, "CPU Bandwidth Control for CFS," In Proceedings of the Ottawa Linux Symposium (OLS), Vol. 10, 2010, pp. 245-254.

[8] H. Ishii, "Fujitsu's Activities for Improving Linux as Primary OS for PRIMEQUEST," Fujitsu Science Technology Journal, Vol. 47, No. 2, 2011, pp. 239-246.

[9] S. T. Dietrich and D. Walker, "The Evolution of Real-Time Linux," In Proc. 7th Real-Time Linux Workshop, 2005, pp. 3-4.

[10] R. Love, "Linux Kernel Development," Addison-Wesley, 3rd Edition.

[11] CGROUPS.
http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt

[12] The Practice of Resource Control Using Cgroup in TaoBao Main Servers. http://wenku.baidu.com/view/19668a5677232f60ddcca113.html

[13] CPU Subsystem. https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html

[14] C. S. Wong, I. K. T. Tan, R. D. Kumari, J. W. Lam and W. Fun, "Fairness and Interactive Performance of O(1) and CFS Linux Kernel Schedulers," In Information Technology, ITSim 2008, International Symposium on, August 2008, Vol. 4, pp. 1-8.

[15] 3GPP TS 25.221 V9.4.0, "Physical Channels and Mapping of Transport Channels onto Physical Channels," November 2011.

[16] M. Rosenblum, "The Reincarnation of Virtual Machines," Queue, Vol. 2, No. 5, p. 34. doi:10.1145/1016998.1017000