Performance Analysis of Accelerator Architectures and Programming Models for Parareal Algorithm Solutions of Ordinary Differential Equations

DOI: 10.4236/jcc.2021.92003   PDF   HTML   XML   31 Downloads   108 Views  

Abstract

Increasing needs for the study of complex dynamical systems require computing solutions of a large number of ordinary and partial differential time-dependent equations in near real-time. Numerical integration algorithms, which are computationally expensive and inherently sequential, are typically used to compute solutions of ordinary and partial differential time-dependent equations. This presents challenges to study complex dynamical systems in near real-time. This paper examines the challenges of computing solutions of ordinary differential time-dependent equations using the Parareal algorithm belonging to the class of parallel-in-time algorithms on various high-performance computing accelerator-based architectures and associated programming models. The paper presents the code refactoring steps and performance analysis of the Parareal algorithm on two accelerator computing architectures: the Intel Xeon Phi CPU and Graphics Processing Unit many-core architectures, and with OpenMP, OpenACC, and CUDA programming models. The speedup and scaling performance analysis are used to demonstrate the suitability of the Parareal to compute the solutions of a single ordinary differential time-dependent equation and a family of interdependent ordinary differential time-dependent. The speedup, weak and strong scaling results demonstrate the suitability of Graphical Processing Units with the CUDA programming model as the most efficient accelerator for computing solutions of ordinary differential time-dependent equations using parallel-in-time algorithms. Considering the time and effort required to refactor the code for execution on the accelerator architectures, the Graphical Processing Units with the OpenACC programming model is the most efficient accelerator for computing solutions of ordinary differential time-dependent equations using parallel-in-time algorithms.

Share and Cite:

Lakshmiranganatha, S. and Muknahallipatna, S. (2021) Performance Analysis of Accelerator Architectures and Programming Models for Parareal Algorithm Solutions of Ordinary Differential Equations. Journal of Computer and Communications, 9, 29-56. doi: 10.4236/jcc.2021.92003.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Alligood, K.T., Sauer, T.D. and Yorke, J.A. (1996) Fractals. In: Alligood, K.T., Sauer, T. and Yorke, J., Eds., Chaos: An Introduction to Dynamical Systems, Springer, Berlin, 149-191.
https://doi.org/10.1007/978-3-642-59281-2_4
[2] Kantz, H. and Schreiber, T. (2004) Nonlinear Time Series Analysis (Vol. 7). Cambridge University Press, Cambridge.
https://doi.org/10.1017/CBO9780511755798
[3] Weather Forecasts.
https://markets.businessinsider.com/news/stocks/ibm-makes-higher-quality-weather-forecasts-available-worldwide-1028689623
[4] Koradi, R., Billeter, M. and Güntert, P. (2000) Point-Centered Domain Decomposition for Parallel Molecular Dynamics Simulation. Computer Physics Communications, 124, 139-147.
https://doi.org/10.1016/S0010-4655(99)00436-1
[5] Xue, W., Shu, J. and Zheng, W. (2004) Parallel Transient Stability Simulation for the National Power Grid of China. In: International Symposium on Parallel and Distributed Processing and Applications, Springer, Berlin, 765-776.
https://doi.org/10.1007/978-3-540-30566-8_89
[6] Esmaeili, S. and Kouhsari, S.M. (2007) A Distributed Simulation Based Approach for Detailed and Decentralized Power System Transient Stability Analysis. Electric Power Systems Research, 77, 673-684.
https://doi.org/10.1016/j.epsr.2006.06.008
[7] Weiss, R.M. and Shragge, J. (2013) Solving 3D Anisotropic Elastic Wave Equations on Parallel GPU Devices. Geophysics, 78, F7-F15.
https://doi.org/10.1190/geo2012-0063.1
[8] Baffico, L., Bernard, S., Maday, Y., Turinici, G. and Zérah, G. (2002) Parallel-in-Time Molecular-Dynamics Simulations. Physical Review E, 66, Article ID: 057701. https://doi.org/10.1103/PhysRevE.66.057701
[9] Staff, G.A. and Rønquist, E.M. (2005) Stability of the Parareal Algorithm. In: Domain Decomposition Methods in Science and Engineering, Springer, Berlin, 449-456.
https://doi.org/10.1007/3-540-26825-1_46
[10] Newnes. Nievergelt, J. (1964) Parallel Methods for Integrating Ordinary Differential Equations. Communications of the ACM, 7, 731-733.
https://doi.org/10.1145/355588.365137
[11] Miranker, W.L. and Liniger, W. (1967) Parallel Methods for the Numerical Integration of Ordinary Differential Equations. Mathematics of Computation, 21, 303-320.
https://doi.org/10.1090/S0025-5718-1967-0223106-8
[12] Burmeister, J. and Horton, G. (1991) Time-Parallel Multigrid Solution of the Navier-Stokes Equations. In: Multigrid Methods III, Birkhäuser, Basel, 155-166.
https://doi.org/10.1007/978-3-0348-5712-3_10
[13] Horton, G. (1992) The Time-Parallel Multigrid Method. Communications in Applied Numerical Methods, 8, 585-595.
https://doi.org/10.1002/cnm.1630080906
[14] Kiehl, M. (1994) Parallel Multiple Shooting for the Solution of Initial Value Problems. Parallel Computing, 20, 275-295.
https://doi.org/10.1016/S0167-8191(06)80013-X
[15] Hackbusch, W. (1985) Parabolic Multigrid Methods. Proceedings of the 6th International Symposium on Computing Methods in Applied Sciences and Engineering, Vol. 6, 189-197.
[16] Lions, J.L., Maday, Y. and Turinici, G. (2001) Résolution d’EDP par un schéma en temps Tpararéel t. Comptes Rendus de l’Académie des Sciences-Series I-Mathematics, 332, 661-668.
https://doi.org/10.1016/S0764-4442(00)01793-6
[17] Farhat, C. and Chandesris, M. (2003) Time-Decomposed Parallel Time-Integrators: Theory and Feasibility Studies for Fluid, Structure, and Fluid-Structure Applications. International Journal for Numerical Methods in Engineering, 58, 1397-1434.
https://doi.org/10.1002/nme.860
[18] Minion, M. (2011) A Hybrid Parareal Spectral Deferred Corrections Method. Communications in Applied Mathematics and Computational Science, 5, 265-301.
https://doi.org/10.2140/camcos.2010.5.265
[19] Emmett, M. and Minion, M. (2012) Toward an Efficient Parallel in Time Method for Partial Differential Equations. Communications in Applied Mathematics and Computational Science, 7, 105-132.
https://doi.org/10.2140/camcos.2012.7.105
[20] Christlieb, A.J., Macdonald, C.B. and Ong, B.W. (2010) Parallel High-Order Integrators. SIAM Journal on Scientific Computing, 32, 818-835.
https://doi.org/10.1137/09075740X
[21] Christlieb, A.J., Haynes, R.D. and Ong, B.W. (2012) A Parallel Space-Time Algorithm. SIAM Journal on Scientific Computing, 34, C233-C248.
https://doi.org/10.1137/110843484
[22] Friedhoff, S., Falgout, R.D., Kolev, T.V., MacLachlan, S. and Schroder, J.B. (2012) A Multigrid-in-Time Algorithm for Solving Evolution Equations in Parallel. No. LLNL-CONF-606952, Lawrence Livermore National Lab. (LLNL), Livermore.
[23] Gander, M.J. and Neumüller, M. (2014) Analysis of a Time Multigrid Algorithm for DG-Discretizations in Time.
[24] Bal, G. and Maday, Y. (2002) A “Parareal” Time Discretization for Nonlinear PDE’s with Application to the Pricing of an American Put. In: Recent Developments in Domain Decomposition Methods, Springer, Berlin, 189-202.
https://doi.org/10.1007/978-3-642-56118-4_12
[25] Maday, Y. and Turinici, G. (2003) Parallel in Time Algorithms for Quantum Control: Parareal Time Discretization Scheme. International Journal of Quantum Chemistry, 93, 223-228.
https://doi.org/10.1002/qua.10554
[26] Staff, G. (2003) Convergence and Stability of the Parareal Algorithm: A Numerical and Theoretical Investigation. No. NTNU-N-2003-2, SIS-2003-312.
[27] Samaddar, D., Newman, D.E. and Sánchez, R. (2010) Parallelization in Time of Numerical Simulations of Fully-Developed Plasma Turbulence Using the Parareal Algorithm. Journal of Computational Physics, 229, 6558-6573.
https://doi.org/10.1016/j.jcp.2010.05.012
[28] Gurrala, G., Dimitrovski, A., Pannala, S., Simunovic, S. and Starke, M. (2015) Parareal in Time for Fast Power System Dynamic Simulations. IEEE Transactions on Power Systems, 31, 1820-1830.
https://doi.org/10.1109/TPWRS.2015.2434833
[29] Duan, N., Dimitrovski, A., Simunovic, S. and Sun, K. (2016) Applying Reduced Generator Models in the Coarse Solver of Parareal in Time Parallel Power System Simulation. 2016 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Ljubljana, 9-12 October 2016, 1-5.
https://doi.org/10.1109/ISGTEurope.2016.7856184
[30] Duan, N., Dimitrovski, A., Simunovic, S., Sun, K., Qi, J. and Wang, J. (2018) Embedding Spatial Decomposition in Parareal in Time Power System Simulation. 2018 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference, Washington DC, 19-22 February 2018, 1-6.
https://doi.org/10.1109/ISGT.2018.8403389
[31] Bal, G. (2005) On the Convergence and the Stability of the Parareal Algorithm to Solve Partial Differential Equations. In: Domain Decomposition Methods in Science and Engineering, Springer, Berlin, 425-432.
https://doi.org/10.1007/3-540-26825-1_43
[32] Gander, M.J. and Hairer, E. (2008) Nonlinear Convergence Analysis for the Parareal Algorithm. In: Domain Decomposition Methods in Science and Engineering XVII, Springer, Berlin, 45-56.
https://doi.org/10.1007/978-3-540-75199-1_4
[33] Nielsen, A.S. (2012) Feasibility Study of the Parareal Algorithm. Doctoral Dissertation, MSc Thesis, Technical University of Denmark, Denmark.
[34] Harden, C.R. (2008) Real Time Computing with the Parareal Algorithm. Doctoral Dissertation, Florida State University, Tallahassee.
[35] Ruprecht, D. and Krause, R. (2012) Explicit Parallel-in-Time Integration of a Linear Acoustic-Advection System. Computers & Fluids, 59, 72-83.
https://doi.org/10.1016/j.compfluid.2012.02.015
[36] Eghbal, A., Gerber, A.G. and Aubanel, E. (2017) Acceleration of Unsteady Hydrodynamic Simulations Using the Parareal Algorithm. Journal of Computational Science, 19, 57-76.
https://doi.org/10.1016/j.jocs.2016.12.006
[37] Bedez, M., Belhachmi, Z., Haeberlé, O., Greget, R., Moussaoui, S., Bouteiller, J.M. and Bischoff, S. (2016) A Fully Parallel in Time and Space Algorithm for Simulating the Electrical Activity of a Neural Tissue. Journal of Neuroscience Methods, 257, 17-25.
https://doi.org/10.1016/j.jneumeth.2015.09.017
[38] Subramaniam, A.S. and Upadrasta, R. (2018) Optimization and Parallelization of Tensor and ODE/PDE Computations on GPU. Doctoral Dissertation, Indian Institute of Technology Hyderabad, Sangareddy District.
[39] Arteaga, A., Ruprecht, D. and Krause, R. (2015) A Stencil-Based Implementation of Parareal in the C++ Domain-Specific Embedded Language STELLA. Applied Mathematics and Computation, 267, 727-741.
https://doi.org/10.1016/j.amc.2014.12.055
[40] Lakshmiranganatha, S. and Muknahallipatna, S.S. (2020) Graphical Processing Unit Based Time-Parallel Numerical Method for Ordinary Differential Equations. Journal of Computer and Communications, 8, 39-63.
https://doi.org/10.4236/jcc.2020.82004
[41] Chapman, B., Jost, G. and Van Der Pas, R. (2007) Using OpenMP. Portable Shared Memory Parallel Programming.
[42] Cheng, J., Grossman, M. and McKercher, T. (2014) Professional CUDA c Programming. John Wiley & Sons, Hoboken.
[43] Farber, R. (2016) Parallel Programming with OpenACC. Elsevier, Amsterdam.
https://doi.org/10.1016/B978-0-12-410397-9.00001-9
[44] Intel Haswell Processor.
https://www.nas.nasa.gov/hecc/support/kb/haswell-processors_492.html
[45] Chen, N. and Johnson, R. (2010) Patterns for Cache Optimizations on Multi-Processor Machines. Proceedings of the 2010 Workshop on Parallel Programming Patterns, March 2010, 1-10.
https://doi.org/10.1145/1953611.1953613
[46] Jeffers, J., Reinders, J. and Sodani, A. (2016) Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition. Morgan Kaufmann, Burlington.
https://doi.org/10.1016/B978-0-12-809194-4.00002-8
[47] Sodani, A., Gramunt, R., Corbal, J., Kim, H.S., Vinod, K., Chinthamani, S., Hutsell, S., Agarwal, R. and Liu, Y.C. (2016) Knights Landing: Second-Generation Intel Xeon Phi Product. IEEE Micro, 36, 34-46.
https://doi.org/10.1109/MM.2016.25
[48] Kuttana, B. (2013) Technology Insight: Intel Silvermont.
[49] AVX512 ISA.
https://software.intel.com/en-us/articles/compiling-for-the-intel-xeon-phi-processor-and-the-intel-avx-512-isa
[50] Data Alignment to Assist Vectorization.
https://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization
[51] Kumar, R., Muknahallipatna, S. and McInroy, J. (2016) An Approach to Parallelization of SIFT Algorithm on GPUs for Real-Time Applications. Journal of Computer and Communications, 4, 18-50.
https://doi.org/10.4236/jcc.2016.417002
[52] Edwards, H.C., Trott, C.R. and Sunderland, D. (2014) Kokkos: Enabling Many-Core Performance Portability through Polymorphic Memory Access Patterns. Journal of Parallel and Distributed Computing, 74, 3202-3216.
https://doi.org/10.1016/j.jpdc.2014.07.003
[53] oneAPI.
https://www.oneapi.com
[54] Gang, Worker, and Vector with Open-ACC.
https://www.microway.com/hpc-tech-tips/accelerating-code-with-openacc-and-nvidia-visual-profiler/gang_worker_vector
[55] Intel Xeon Phi.
https://ark.intel.com/content/www/us/en/ark/products/94033/intel-xeon-phi-processor-7210-16gb-1-30-ghz-64-core.html
[56] Quadro RTX 6000 GPU.
https://www.nvidia.com/en-us/design-visualization/quadro/rtx-6000

  
comments powered by Disqus

Copyright © 2020 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.