Based on a proposal by V. Balaji, the following set of metrics for assessing the sustained computational performance of Earth System Models (ESMs) has been defined within IS-ENES2 WP JRA1. The metrics are independent of hardware platform, programming and parallelisation paradigm, and provide a good basis for comparison across different ESMs. Collecting the metrics does not require any specialised measurement tools or hardware counters.
- Resolution: Spatial resolution of the computational grid for each of the physical domains (typically atmosphere and ocean) complemented by the total number of grid points. The resolution can be specified in a domain-specific way, for example, the average horizontal spacing and the number of vertical levels for atmospheric grids.
- Complexity: Several measures were discussed before settling on the number and dimensions of the variables in the ESM's restart files. The assumption is that the restart files represent the internal state of the ESM, so the complexity can be estimated from the size of the internal state space. The main advantage of this measure is that it is easy to obtain from any ESM.
- Simulated years per day (SYPD): The number of years that can be simulated by the ESM in a given configuration on a given computing platform during a 24-hour period, assuming dedicated computing resources. In practice, this number is often extrapolated from shorter test runs.
- Actual simulated years per day (ASYPD): The number of years that can be simulated by the ESM in a given configuration on a given platform in a multi-user environment (i.e. not assuming dedicated resources). This metric is usually measured using a long simulation with restarts, so that it includes the queueing time between chunks and the workflow cost (see "Workflow cost" below).
- Core hours per simulated year (CHPSY): This metric measures the actual computational cost of the ESM simulation. It is usually determined as the product of the model run time for one simulated year (in hours) and the number of cores used.
- Memory bloat: This metric indicates the ratio of actual to ideal memory consumption of the ESM. The ideal memory consumption is deduced from the complexity, as the total memory needed to hold the restart file variables. The actual memory consumption is the only figure that requires a generic measurement tool, usually provided with the scheduler.
- Coupler cost: Ratio of time spent in the coupler doing calculations to the overall run time of the model. This needs either a thorough performance analysis (tracing/profiling) or support in the coupler software.
- Load imbalance: Ratio of the time spent waiting in the coupler for one of the components to the overall run time. Again, this can be obtained by carefully examining messages sent within the coupled model using a tracing tool.
- Data output cost: Extra time that an ESM needs to write the model output to the file system. This is measured as the ratio of the run time for a standard run (including standard model output) to the run time for a run with model output switched off.
- Data intensity: Amount of data that is read or written by an ESM in a given time during a typical run. For global climate models, it is mostly the written data that contributes to the data transfer, which is why this metric may be limited to the output data.
- Workflow cost: Additional time is often needed to process and/or transfer the model data into the form and location that are considered to be the result of the ESM run. The workflow cost is the ratio of this additional time to the run time of the ESM. This metric requires a certain formalisation of the overall workflow, so that the ESM run can be separated from the other workflow steps. Note that some workflow tasks can be carried out concurrently with the ESM run; this metric is concerned only with the additional time spent on workflow tasks that run consecutively to the ESM run.
- Parallelisation: The number of computational units (cores or nodes as applicable) that is used for a certain ESM run. This number can be specified separately for the components of a coupled model and complemented by information about the parallelisation paradigm.
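
Several of the metrics above reduce to simple arithmetic on run times, core counts and memory figures. The following is a minimal Python sketch of how they could be computed, not a reference implementation. All function and parameter names are hypothetical and chosen for illustration, run times are assumed to be wall-clock seconds, and each ratio follows the wording of the corresponding definition in the list.

```python
# Illustrative sketch only: names and units are assumptions, not part of
# the IS-ENES2 metric definitions.

SECONDS_PER_DAY = 86_400.0
SECONDS_PER_HOUR = 3_600.0


def sypd(simulated_years: float, run_seconds: float) -> float:
    """Simulated years per day, extrapolated from a dedicated (test) run."""
    return simulated_years * SECONDS_PER_DAY / run_seconds


def asypd(simulated_years: float, elapsed_seconds: float) -> float:
    """Actual SYPD: same formula, but elapsed_seconds spans the whole
    experiment, including queueing time between chunks and workflow steps."""
    return simulated_years * SECONDS_PER_DAY / elapsed_seconds


def chpsy(cores: int, run_seconds: float, simulated_years: float) -> float:
    """Core hours per simulated year: cores times run time per simulated year."""
    return cores * (run_seconds / SECONDS_PER_HOUR) / simulated_years


def memory_bloat(actual_bytes: float, ideal_bytes: float) -> float:
    """Ratio of measured memory to the memory needed for the restart variables."""
    return actual_bytes / ideal_bytes


def fraction_of_run_time(part_seconds: float, run_seconds: float) -> float:
    """Generic ratio used for coupler cost, load imbalance and workflow cost."""
    return part_seconds / run_seconds


def data_output_cost(seconds_with_output: float, seconds_without_output: float) -> float:
    """Ratio of the run time with standard output to the run time without output."""
    return seconds_with_output / seconds_without_output


if __name__ == "__main__":
    # Hypothetical test run: three simulated months in 90 minutes on 512 cores.
    print("SYPD :", sypd(simulated_years=0.25, run_seconds=5400.0))
    print("CHPSY:", chpsy(cores=512, run_seconds=5400.0, simulated_years=0.25))
    # Hypothetical production experiment: 10 simulated years in 4 days elapsed.
    print("ASYPD:", asypd(simulated_years=10.0, elapsed_seconds=4 * SECONDS_PER_DAY))
```

Keeping SYPD and ASYPD as separate functions with the same formula makes explicit that the two metrics differ only in which time interval is measured: the dedicated run time in one case, the full elapsed time including queueing and workflow in the other.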