ESGF Data Ingestion
Except for replications, which are treated differently, ESGF data ingestion consists of the steps described below.
At the end of the publishing step, the data are visible in the ESGF and can be downloaded from there. For long-term archiving and DataCite DOI assignment, additional ingestion steps have to be appended.
Aggregation, format and unit conversion, generation of metadata and additional data
These steps have to be performed by the data provider.
Data with a time frequency of day, month, season or year are usually aggregated. Depending on the nature of the variable and the rules of the project, an integral (e.g. for precipitation), a mean, a minimum or a maximum has to be calculated. Sometimes more than one aggregation step is necessary. For example, monthly CORDEX sfcWindmax is the mean of the daily maxima.
Metadata are usually collected with a form and are project-dependent. If the grid used is not the usual latitude-longitude grid, additional data describing the grid are needed: rotation angles in the case of rotated poles, or a grid map file in the case of curvilinear coordinates.
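The two-step aggregation mentioned above can be illustrated with a minimal sketch. It assumes sub-daily wind speed samples given as (timestamp, value) pairs; the function names and the sample values are hypothetical and are not taken from any project tool.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_maxima(samples):
    """Step 1: reduce (timestamp, value) samples to one maximum per calendar day."""
    per_day = defaultdict(list)
    for stamp, value in samples:
        per_day[stamp.date()].append(value)
    return {day: max(values) for day, values in per_day.items()}


def monthly_mean_of_daily_maxima(samples):
    """Step 2: average the daily maxima within each (year, month)."""
    per_month = defaultdict(list)
    for day, day_max in daily_maxima(samples).items():
        per_month[(day.year, day.month)].append(day_max)
    return {month: mean(values) for month, values in per_month.items()}


# Illustrative (made-up) wind speeds in m/s, two samples per day.
samples = [
    (datetime(2000, 1, 1, 0), 3.0),
    (datetime(2000, 1, 1, 12), 7.0),
    (datetime(2000, 1, 2, 0), 5.0),
    (datetime(2000, 1, 2, 12), 4.0),
]
result = monthly_mean_of_daily_maxima(samples)
```

Here the daily maxima are 7.0 and 5.0, so the monthly value is 6.0, which differs from the plain mean of all samples (4.75). The order of the aggregation steps therefore matters.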
Rewriting of NetCDF file
NetCDF allows many data structures, variable definitions and attributes. To guarantee uniformity within a project, each project defines rules and translates them into machine-readable tables, the CMOR tables. The program CMOR (Climate Model Output Rewriter) reads these tables and adapts the attributes in the NetCDF header according to the project's rules, i.e. it overwrites them. It can perform unit conversions (if not already done) and can calculate auxiliary variables such as time bounds or grid points from the grid map. It also performs some quality checks, e.g. it can detect gaps in time series.
The original CMOR is a subroutine library. Climate data projects usually develop their own main program and routines for reading the data, or reuse existing software. Usually, the data provider has to perform the rewriting of the data files. IS-ENES offers guidance.
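To make the auxiliary calculations more concrete, the following sketch shows one way time bounds and gaps in a time axis could be derived. It is not the CMOR API; the function names are hypothetical, and it assumes a regular time axis whose values mark the centre of each interval.

```python
def time_bounds(times, step):
    """Return (lower, upper) bounds centred on each time value."""
    half = step / 2
    return [(t - half, t + half) for t in times]


def find_gaps(times, step, tol=1e-6):
    """Return (previous, next) pairs whose spacing differs from the expected step."""
    return [
        (a, b)
        for a, b in zip(times, times[1:])
        if abs((b - a) - step) > tol
    ]


# Daily values in "days since" some reference date; day 3.5 is missing.
times = [0.5, 1.5, 2.5, 4.5]
bounds = time_bounds(times, step=1.0)
gaps = find_gaps(times, step=1.0)
```

With this input, `gaps` contains the single pair (2.5, 4.5), marking the missing day. A real axis with calendar months of varying length would need calendar-aware logic instead of a fixed step.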
Quality assurance is usually done in the data centre, but the quality assurance tool can also be used by the data provider. The tool was developed in the IS-ENES project and is freely available. It can check consistency with the CF standard and with the CMIP5 and CORDEX rules. The checks are not limited to the data itself; the directory structure can also be examined. Which tests are performed is project-dependent. The quality assurance tool issues a warning if a rule is violated. It does not perform corrections. If quality assurance detects inconsistencies with the project's rules, the data are returned to the data provider for adaptation.
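The "warn but do not correct" behaviour can be sketched as follows. This is not the IS-ENES quality assurance tool: the list of required attributes is a hypothetical project rule chosen for illustration, and the header is represented as a plain dictionary rather than read from a NetCDF file.

```python
# Hypothetical project rule: global attributes that must be present and non-empty.
REQUIRED_GLOBAL_ATTRIBUTES = ("Conventions", "institution", "frequency")


def check_attributes(attrs, required=REQUIRED_GLOBAL_ATTRIBUTES):
    """Return a list of warnings; the attributes themselves are left untouched."""
    warnings = []
    for name in required:
        value = attrs.get(name)
        if value is None:
            warnings.append(f"missing global attribute: {name}")
        elif not str(value).strip():
            warnings.append(f"empty global attribute: {name}")
    return warnings


# Made-up header contents for demonstration.
header = {"Conventions": "CF-1.4", "institution": "", "title": "demo"}
report = check_attributes(header)
```

The function only reports problems. Fixing them remains the task of the data provider, in line with the workflow described above.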
Publishing, the last step, can only be launched by an ESGF data node administrator. The ESGF publisher script fills in the data and metadata. It also checks the readability of the files again.