Anton Pershin

and 4 more

Numerical simulations of weather and climate models are conventionally carried out using double-precision floating-point numbers throughout the vast majority of the code. At the same time, the urgent need for high-resolution forecasts under limited computational resources encourages the development of much more efficient numerical codes. A number of recent studies have suggested the use of reduced numerical precision, including half-precision floating-point numbers increasingly supported by hardware, as a promising avenue. In this paper, we explore the possibility of using half-precision calculations in the radiation scheme ecRad, operationally used in ECMWF's Integrated Forecasting System (IFS). By deliberately mixing half-, single- and double-precision variables, we develop a mixed-precision version of the Tripleclouds solver, the most computationally demanding part of the radiation scheme, in which reduced-precision calculations are emulated with the Fortran reduced-precision emulator rpe. Code variables are assigned particular precision levels with the help of two tools that estimate the dynamic range of model parameters and identify problematic areas of the model code using ensemble statistics. It is demonstrated that heating rates computed by the mixed-precision code are reasonably close to those produced by the double-precision code. Moreover, it is shown that using the mixed-precision ecRad in OpenIFS has a very limited impact on the accuracy of a medium-range forecast in comparison to the original double-precision configuration. These results imply that mixed-precision arithmetic could successfully be used to accelerate the radiation scheme ecRad and, possibly, other parametrization schemes used in weather and climate models without harming forecast accuracy.
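As a rough illustration of the emulation idea (not the authors' Fortran/rpe implementation), the Python sketch below mimics reduced precision by storing the intermediate variables of a toy heating-rate calculation in half precision and comparing the result against a double-precision reference. The function name, flux profile and layer thicknesses are hypothetical stand-ins.

```python
import numpy as np

def heating_rate(net_flux, dp, dtype=np.float64, g=9.80665, cp=1004.64):
    """Toy heating rate [K/day] from a net-flux divergence, with all
    intermediate variables stored in `dtype` to emulate reduced precision.
    A stand-in only; the real ecRad arithmetic is far more involved."""
    f = net_flux.astype(dtype)
    dp = dp.astype(dtype)
    scale = dtype(-(g / cp) * 86400.0)   # K/day per (W m^-2 / Pa)
    dF = np.diff(f)                      # net-flux difference across each layer
    return (scale * dF / dp).astype(dtype)

# Hypothetical net-flux profile [W m^-2] and layer pressure thicknesses [Pa]
levels = 60
net_flux = (np.linspace(340.0, 120.0, levels + 1)
            + 5.0 * np.sin(np.linspace(0.0, 6.0, levels + 1)))
dp = np.full(levels, 1000e2 / levels)

hr64 = heating_rate(net_flux, dp, np.float64)   # double-precision reference
hr16 = heating_rate(net_flux, dp, np.float16)   # emulated half precision
err = np.max(np.abs(hr16.astype(np.float64) - hr64))
print(f"max abs heating-rate difference: {err:.3e} K/day")
```

The printed difference gives a feel for how much accuracy a variable loses when demoted to half precision, which is the kind of information the paper's dynamic-range and ensemble-statistics tools provide per code variable.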

Milan Klöwer

and 4 more

Most Earth-system simulations run on conventional CPUs in 64-bit double-precision floating-point numbers (Float64), although the need for high-precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world’s fastest supercomputer, is based on A64FX microprocessors, which also support the 16-bit low-precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16-bit arithmetic. The model implements techniques that address precision and dynamic range issues in 16 bits. The precision-critical time integration is augmented with compensated summation to minimize rounding errors. Such a compensated time integration is as precise as, but faster than, mixed precision with 16- and 32-bit floats. As subnormals are inefficiently supported on A64FX, the very limited range available in Float16 is 6 × 10⁻⁵ to 65504. We develop the analysis number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of 3.8x on A64FX with Float16. With a compensated time integration the speedup is 3.6x. Although ShallowWaters.jl is simplified compared to large Earth-system models, it shares essential algorithms and therefore shows that 16-bit calculations are indeed a competitive way to accelerate Earth-system simulations on available hardware.
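The compensated summation mentioned above is essentially Kahan summation applied to the time integration: a second low-precision variable carries the rounding error of each update. The minimal Python/NumPy sketch below (not the ShallowWaters.jl Julia code; function names and the forward-Euler toy problem are hypothetical) contrasts a naive Float16 accumulation with a compensated one.

```python
import numpy as np

def euler_naive(u0, dudt, dt, nsteps):
    """Naive Float16 forward Euler: u += dt*dudt each step.
    Small increments are rounded away once u grows, so the result drifts."""
    u = np.float16(u0)
    for _ in range(nsteps):
        u = np.float16(u + np.float16(dt * dudt))
    return u

def euler_compensated(u0, dudt, dt, nsteps):
    """Float16 forward Euler with Kahan compensated summation: a second
    Float16 variable `c` carries the rounding error of every update."""
    u = np.float16(u0)
    c = np.float16(0.0)                              # running compensation
    for _ in range(nsteps):
        incr = np.float16(np.float16(dt * dudt) - c)
        t = np.float16(u + incr)
        c = np.float16(np.float16(t - u) - incr)     # rounding error of the add
        u = t
    return u

u0, dudt, dt, nsteps = 1.0, 1.0, 1e-3, 4000
print("naive Float16:      ", euler_naive(u0, dudt, dt, nsteps))
print("compensated Float16:", euler_compensated(u0, dudt, dt, nsteps))
print("exact:              ", u0 + dudt * dt * nsteps)
```

Running this shows the naive accumulation ending far from the exact value while the compensated version stays within a few Float16 rounding errors of it, which is why the compensated time integration can match mixed 16/32-bit precision at lower cost.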

Patrick Laloyaux

and 3 more

Model error is one of the main obstacles to improved accuracy and reliability in numerical weather prediction (NWP) conducted with state-of-the-art atmospheric models. To deal with model biases, a modification of the standard 4D-Var algorithm, called weak-constraint 4D-Var, has been developed where a forcing term is introduced into the model to correct for the bias that accumulates along the model trajectory. This approach reduced the temperature bias in the stratosphere by up to 50% and is implemented in the ECMWF operational forecasting system. Despite different origins and applications, Data Assimilation and Deep Learning are both able to learn about the Earth system from observations. In this paper, a deep learning approach for model bias correction is developed using temperature retrievals from Radio Occultation (RO) measurements. Neural Networks require a large number of samples to properly capture the relationship between the temperature first-guess trajectory and the model bias. As running the IFS data assimilation system for extended periods of time with a fixed model version and at realistic resolutions is computationally very expensive, we have chosen to train, the initial Neural Networks are trained using the ERA5 reanalysis before using transfer learning on one year of the current IFS model. Preliminary results show that convolutional neural networks are adequate to estimate model bias from RO temperature retrievals. The different strengths and weaknesses of both deep learning and weak constraint 4D-Var are discussed, highlighting the potential for each method to learn model biases effectively and adaptively.
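The pretrain-then-transfer workflow described above can be illustrated with a small convolutional network. The PyTorch sketch below is purely schematic: the model architecture, the 137-level profile shape, the `make_model`/`train` helpers and the synthetic "ERA5-like" and "IFS-like" data are all hypothetical, standing in for the paper's actual training on reanalysis and IFS first-guess departures.

```python
import torch
from torch import nn

LEVELS = 137  # illustrative number of vertical levels for a temperature profile

def make_model():
    """Small 1D CNN mapping a temperature first-guess profile to a bias profile."""
    return nn.Sequential(
        nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
        nn.Conv1d(16, 16, kernel_size=5, padding=2), nn.ReLU(),
        nn.Conv1d(16, 1, kernel_size=1),            # output: bias per level
    )

def train(model, x, y, epochs=50, lr=1e-3):
    """Full-batch MSE training on whichever parameters are still trainable."""
    opt = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Stage 1: pretrain on a large "ERA5-like" synthetic dataset
x_era5 = torch.randn(4096, 1, LEVELS)
y_era5 = 0.1 * x_era5.roll(1, dims=-1)              # stand-in bias signal
model = make_model()
print("pretraining loss:", train(model, x_era5, y_era5))

# Stage 2: transfer learning on a small "current IFS" dataset:
# freeze the first convolutional layer, fine-tune the rest
for p in model[0].parameters():
    p.requires_grad = False
x_ifs = torch.randn(256, 1, LEVELS)
y_ifs = 0.12 * x_ifs.roll(1, dims=-1) + 0.02
print("fine-tuning loss:", train(model, x_ifs, y_ifs, epochs=30))
```

The design point of interest is the second stage: reusing weights learned from plentiful reanalysis data and adapting only part of the network to a much smaller sample from the current model version.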

Milan Klöwer

and 2 more

Jan Ackmann

and 3 more

Semi-implicit time-stepping schemes for atmosphere and ocean models require elliptic solvers that work efficiently on modern supercomputers. This paper reports our study of the potential computational savings when using mixed-precision arithmetic in the elliptic solvers. Precision levels as low as half (16 bits) are used, and a detailed evaluation of the impact of reduced precision on the solver convergence and the solution quality is performed. This study is conducted in the context of a novel semi-implicit shallow-water model on the sphere, purposely designed to mimic numerical intricacies of modern all-scale weather and climate (W&C) models. The governing algorithm of the shallow-water model is based on the non-oscillatory MPDATA methods for geophysical flows, whereas the resulting elliptic problem employs a strongly preconditioned non-symmetric Krylov-subspace solver GCR, proven in advanced atmospheric applications. The classical longitude/latitude grid is deliberately chosen to retain the stiffness of global W&C models. The analysis of the precision reduction is done on a software level, using an emulator, whereas the performance is measured on actual reduced-precision hardware. The reduced-precision experiments are conducted for established dynamical-core test cases, such as the Rossby-Haurwitz wavenumber 4 and a zonal orographic flow. The study shows that selected key components of the elliptic solver, most prominently the preconditioning and the application of the linear operator, can be performed at the level of half precision. For these components, the use of half precision is found to yield a speed-up of a factor of 4 compared to double precision for a wide range of problem sizes.
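To make the idea of "half-precision components inside a double-precision solver" concrete, the sketch below uses a generic mixed-precision iterative refinement with a Jacobi inner iteration on a toy 1D elliptic operator. It is explicitly not the paper's MPDATA/GCR solver: the operator `helmholtz`, the helpers and all parameters are hypothetical, but the split is the same, with the preconditioner and operator applications evaluated in Float16 while residuals and the solution accumulate in Float64.

```python
import numpy as np

def helmholtz(x, gamma, dtype):
    """Toy 1D elliptic operator (I - gamma * Laplacian), Dirichlet boundaries,
    evaluated with `dtype` arithmetic to emulate reduced precision."""
    x = x.astype(dtype)
    g = dtype(gamma)
    y = (dtype(1) + dtype(2) * g) * x
    y[1:] -= g * x[:-1]
    y[:-1] -= g * x[1:]
    return y

def jacobi_solve_half(r, gamma, sweeps=20):
    """Approximate solve of A d = r with Jacobi sweeps in Float16.
    Both the (diagonal) preconditioner and the operator application run at
    half precision; the residual is rescaled first to fit Float16's range."""
    scale = np.linalg.norm(r) + 1e-300
    r16 = (r / scale).astype(np.float16)
    inv_diag = np.float16(1.0 / (1.0 + 2.0 * gamma))
    d = inv_diag * r16
    for _ in range(sweeps):
        d = d + inv_diag * (r16 - helmholtz(d, gamma, np.float16))
    return scale * d.astype(np.float64)

def mixed_precision_solve(b, gamma, outer_iters=8):
    """Outer loop in Float64: high-precision residuals, low-precision corrections."""
    x = np.zeros_like(b)
    for k in range(outer_iters):
        r = b - helmholtz(x, gamma, np.float64)   # residual in double precision
        print(f"iter {k}: ||r|| = {np.linalg.norm(r):.3e}")
        x = x + jacobi_solve_half(r, gamma)       # correction in half precision
    return x

n, gamma = 200, 0.4
x_true = np.sin(np.linspace(0.0, np.pi, n)) ** 2
b = helmholtz(x_true, gamma, np.float64)
x = mixed_precision_solve(b, gamma)
print("final relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

Each outer iteration gains roughly the three decimal digits that Float16 can deliver, so the residual keeps shrinking even though the expensive inner work is done at half precision, which is the mechanism behind the reported factor-of-4 speed-up for those components.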