Study and Analysis of the High Performance Computing Failures in China Meteorological Field

PP. 28-40
DOI: 10.4236/gep.2017.512002
Author(s)

ABSTRACT

The China Meteorological Administration (CMA) has used High Performance Computing Systems (HPCS) for more than three decades. CMA's HPCS investment provides the reliable HPC capability needed to run Numerical Weather Prediction (NWP) models and climate models, generating millions of weather guidance products daily and supporting the Coupled Model Intercomparison Project Phase 5 (CMIP5). Monitoring the HPCS and analyzing its resource usage can improve performance and reliability for users, but doing so requires a good understanding of failure characteristics. Large-scale studies of failures in real production systems are scarce. This paper collects and analyzes all failures that occurred during the HPC operation period, focusing on the relationship between the HPCS and NWP applications. We also present the challenges in developing a more effective monitoring system and summarize useful maintenance strategies. This work may considerably improve online failure prediction for HPC systems and lead to better performance in the future.

Share and Cite:

Chen, X. and Sun, J. (2017) Study and Analysis of the High Performance Computing Failures in China Meteorological Field. Journal of Geoscience and Environment Protection, 5, 28-40. doi: 10.4236/gep.2017.512002.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.