Version 12 (modified by lawrence, 7 years ago)

Metadata in IS-ENES2

This page provides a short list of the key IS-ENES2 launch activities which need to be addressed from a CMS perspective. In the longer term, these activities will be documented on one or both of an IS-ENES2 work tracker (tbd) and the es-doc cog.

Amongst other things, it is intended to help the briefing of CMS staff working on IS-ENES2.

Total metadata resource at CMS (not including NA2, which adds another 6-9pm): 21pm, all associated with task 2.1 of JRA2.

Background

  • The key metadata activity within IS-ENES2 is a follow-on project to Metafor.
    • The key products of the Metafor project (with collaborators) were:
      • The Common Information Model (CIM) for describing models and the simulation workflow. See for example Lawrence et al 2012.
      • Controlled vocabularies for describing aspects of models (to be used in instances of CIM descriptions). See for example Moine et al 2013
  • Currently the Metafor activity has been subsumed into an international project under the es-doc label:

Current Priorities

(These are across the entire IS-ENES2, so CMS doesn't have to do all this, but we probably have to coordinate it.)

Generic: From an IS-ENES2 perspective:

  1. Establish the internal IS-ENES2 metadata organisation
    1. Need to make sure it is cross-cutting wrt IS-ENES2 work packages (JRA, NA etc).
    2. May take some advantage of es-doc cog, and/or telcos, as well as internal mechanisms.
    3. Identify who is the relevant European community - funded within this project or not.
  2. Re-establish the European network to feed into es-doc
  3. Establish and document priority actions for IS-ENES2 activities
  4. Begin a programme of external communication
    1. Identify an external user group and a set of beta testers.
    2. Consider what training is needed.
    3. Formalise links with other projects (EU, US) where necessary (and not already covered by es-doc).
    4. Liaise with CMIP6 preparation.

Metadata Management Issues (these are the most time critical activities):

  1. Quality control the existing CMIP5 documentation.
  2. Update existing CMIP5 documentation in the light of new knowledge acquired for the IPCC AR5.
  3. Communicate with modelling groups accordingly.

Technical Activities:

  1. CIM:
    1. Institute the governance body documented by Metafor
    2. Establish any new requirements from the CMIP5 quality control exercise.
    3. Establish any new requirements from new projects (s2d, obs4mips, CCI, Copernicus, Ermitage).
    4. Revisit CIM 2.0 in the light of new requirements.
  2. Controlled Vocab (CV):
    1. Ensure that the CV is covered by the CIM governance body too (as proposed by Metafor).
    2. Evaluate lessons from CMIP5 for global GCM CV.
      • It has been suggested that some discrimination between core and tiers would be helpful, but consider whether this is an issue for the CV, or the tools (including creators and validators).
    3. Consider how (or whether) to incorporate other CVs (e.g. downscaling, ERMITAGE, s2d, obs etc).
  3. Tools (taking responsibility for technical work at CMS as appropriate):
    1. Evaluate the future of "the questionnaire" vis-à-vis other possible tools (i.e. django-forms and the evolved PIMMS questionnaire).
    2. Help guide the evolution of other tools (including viewers, comparators and validators)
    3. Liaise with work on Annotation (CHARMe)
    4. Scientific and Technical responsibility for repository. Who? Where? How?
      • Who does and validates content migration?

Relevant Kick-Off Meeting Actions

  1. Telco to discuss finalisation of CMIP5 CIM content — Bryan — [5-8 weeks], depends on
    • Resource at Reading - short or long term.
  2. Implementation of CIM questionnaire for SPECS with additional vocabularies [tbd]

Relevant IS-ENES2 DoW Milestones and Deliverables

NA4 states that:

  • The scientific coordination and development of CIM and CV extension will be done in NA4 whereas the technical support for the resulting governance activities will be covered in SA2. NA4 will further organise the community evaluation of CIM generation software, developed both in JRA3 as well as in other projects. A gap and duplication analysis as well as a sharing of best practice and experience with various experimental generation of CIM content will be performed.

This suggests that CMS should *not* be doing the metadata scientific leadership, as it is not funded in the NA4 work package. However, if we do take on the leadership, this might give us an opportunity to suggest that one of the NA4 partners takes on some aspect of the technical work.

In what follows we cover the metadata aspects of the data packages first, then the metadata aspects of the model packages.

NOT ALL DELIVERABLES AND MILESTONES ARE LISTED YET

Metadata in the "Data" Packages

JRA3

JRA3 will enhance the existing data archive services of SA2 by delivering enhancements on four key service packages, one of which is the:

  • Metadata Services Package: The metadata infrastructure necessary for managing petascale climate model archives includes a range of components – many of which are encapsulated in the specifications of the METAFOR Common Information Model (CIM). Software components delivered by the FP7 METAFOR and IS-ENES1 projects will need further development and enhancements to meet ongoing ENES requirements – particularly those captured and delivered in, respectively, the NA4 and SA2 work packages of this project. There will be two major activities:
    1. improving and extending existing tools for metadata capture, and
    2. refactoring and extending tools to provide metadata services (including the repository and services to be deployed in SA2).

This is being delivered in Task 2.1 (21pm with additional 6pm from the UKMO):

  • Tools for metadata capture and generation. The primary tool used thus far for meta-data creation (in the context of IS-ENES support for CMIP5) has been the METAFOR questionnaire. This tool consists of a graphical user interface for answering hundreds of questions about model capability and configuration. The existing tools are heavily customised for CMIP5, and need re-factoring to improve flexibility for alternative users. As CIM capability is extended within NA4, the new tool will be modified accordingly, and deployed in support of SA2. The MetO internal tool for meta-data will also be modified to directly generate CIM content so that alternative approaches to meta-data entry can be compared.
  • This task is associated with:
    • Deliverable D11.4: Report on meta-data services, due in month 39, itself associated with:
      • Milestone: MS113: Revised Generic Questionnaire software package available.
      • Milestone: MS116: MetO Metadata Entry Tool Evaluation, due in month 21.

Note also other relevant tasks:

  • Tools for metadata services (including repository) (CNRS-IPSL, DKRZ) (19 pm)

The existing meta-data repository is based on XML storage, and is not thought to be fit for purpose. A revised storage back-end will be constructed, perhaps based on JSON storage artefacts, and the web service interfaces extended accordingly (XML for transport will still be supported, but additional JSON interfaces will be provided to facilitate CIM navigation by web browsers). Improved support for data quality records and annotation will be provided (supporting task 3 below).
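As a rough illustration of the XML-to-JSON direction described above, the following minimal sketch converts a small XML record into a JSON document using only the Python standard library. The element names (`modelComponent`, `shortName`, `property`) and the output layout are hypothetical and chosen for illustration only; they are not the CIM schema, and this is not the design of the revised repository.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, much-simplified record. Real CIM documents are far richer
# and use their own schema and namespaces.
SAMPLE_XML = """
<modelComponent>
  <shortName>ExampleAtmos</shortName>
  <description>Illustrative atmosphere component</description>
  <property name="TimeStep" units="s">1800</property>
  <property name="Grid">N96</property>
</modelComponent>
"""


def element_to_dict(element):
    """Recursively convert an XML element into a JSON-friendly dict."""
    node = {"tag": element.tag}
    if element.attrib:
        node["attributes"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return an indented JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps(element_to_dict(root), indent=2, sort_keys=True)


if __name__ == "__main__":
    print(xml_to_json(SAMPLE_XML))
```

A generic tag/attributes/children mapping like this loses nothing from the XML, but a JSON form designed around the document types themselves would probably be easier for browser clients to navigate.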

NA4

CMS does not have a formal engagement in NA4, but JRA3 is clearly related to NA4, and given the proposal that CMS takes the lead on metadata, the relevant information about NA4 is included here:

Task 2:

  • Meta-data, interoperability and standardisation (37 pm) (Engaging STFC, CERFACS, CSAG, MF-CNRM, UNIMAN, in collaboration with ESA.)
  • This task focuses on extending the FP7 METAFOR “Common Information Model” (CIM) to support new data and activities, better experiment description, more utility in quality control – and coordinate generation of new content. CIM extensions and new controlled vocabulary will need to be coordinated on a global scale, and notably in direct coordination with the US-led CURATOR and National Unified Operational Prediction Capability (NUOPC) consortiums.
  • This task will establish and manage a community process for identifying and delivering CIM upgrades to improve and extend support for global climate model documentation, including better descriptions of coupling and frameworks, and statistical downscaling methodologies. New support for regional climate models and re-analyses data will also be initiated. This will lead to new CIM controlled vocabularies (CV) to be handed to SA2 for community governance. Working groups will be established for each theme, and these will, in the first 6 months, identify and establish contact with the key community projects (in Europe and elsewhere) in each area.
  • Liaison with similar projects working on remote sensing and ground based observations will be important – particularly in the context of GMES and WMO Climate Service Information System.
    • The DoW then implies that Task 2 itself (rather than one of the data activities) will also
      • (Deliver close collaboration) with ESA to foster the exploitation of European satellite data archive and the new generation of missions (Earth Explorers & Sentinels) and products (in particular from ESA Climate Change Initiative (CCI) when available). More precisely, given the direct involvement of ESA within IS-ENES2, we will ensure interoperability of the ESA CCI database repositories.
        • (This is obviously nothing to do with metadata, but perhaps the DoW means "interoperability of methods for getting to such metadata in these repositories". More realistically, this should probably now read "Work closely with CHARMe on these issues".)
    • The standardisation efforts will be extended to the CORDEX programme statistical downscaling component, with specifications of the experiment design and data format.

The relevant deliverable is:

  • Deliverable D5.4: Report on metadata controlled vocabulary extensions (Month 45): The report will be based on the identification and the delivery of CIM upgrades to improve and extend support for global climate model documentation, including better descriptions of coupling and frameworks, and statistical downscaling methodologies. New draft support for regional climate models and re-analyses data will also be described. This will lead to new CIM controlled vocabularies (CV) that will be described in detail in this report. Results will be handed to SA2 for community governance. Itself associated with:
    • Milestone MS54: Synthesis report on metadata requirements regarding relevant data archives, due at month 42, and the responsibility of STFC.

SA2

CMS does not have a formal engagement with SA2, but JRA3 is clearly related to SA2, and given the proposal that CMS takes the lead on metadata, the relevant information about SA2 is included here:

The ENES Climate Data Infrastructure (ENES-CDI) is a distributed infrastructure accessible via the ENES portal. Operators are located in: Hamburg, Germany (WDCC/DKRZ), Didcot, Great Britain (CEDA/STFC), Paris, France (CNRS-IPSL), De Bilt, The Netherlands (KNMI), Linköping, Sweden (LIU), and Copenhagen, Denmark (DMI).

The relevant part of SA2 is Task 2:

  • Meta-data Services: New developments in the metadata part for CIM related tools are integrated into the metadata services and will become part of the services activities during the course of the project.
    • Meta-data services in SA2 are focussed on support of meta-data access while meta-data services in SA1 are concentrating on population of CIM instances.
    • Activity: CIM Governance (STFC). This task organises the maintenance of CIM schema and related controlled vocabularies with emphasis on model and experiment descriptions. Outputs from NA4 metadata networking task in the form of scientific requirements from CORDEX and the impacts community will be integrated.
    • Activity: CIM Repository (DKRZ, CNRS-IPSL). This task operates the model and experiment metadata repository based on CIM instances from the METAFOR project and CMIP5.
      • Elsewhere a firmer statement is that WDCC/DKRZ will operate the CIM model and experiment repository and provide services for climate model data quality control. I expect this means CNRS-IPSL are maintaining the code for the operational service.

Metadata in the "Model" Packages

NA2

(Future Models)

The relevant part of NA2 is Task 2.1

  • Task 2.1: Model Structure and Code Evaluation. The key aim of this sub-task is to prepare the ground for subsequent community discussion of the different approaches in use in Europe. The first activity will be to establish a consistent methodology for documenting key model components, the exchanges between them, and their scaling properties - across all the major European models. This methodology will exploit the model metadata work in other work packages, and the service documentation activities. It is likely that the resources available will only allow the initial analysis of two models, but this will then feed into a workshop to establish the efficacy for wider usage of the methodology by model groups.
    • Post DoW Technical Detail: It is proposed that we begin with whatever codes are available from JRA1 (high resolution) and JRA2 (benchmarking). The idea is that we will exploit code-parsing to produce a CIM description of the software, which would then be compared and analysed by a human. Rupert Ford (UMAN) will use some of his time in NA4 to help with producing a CIM-outputting code parser, which may need to make use of CIM2.x.
  • This is due in year one of the project!
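The code-parsing idea in the post-DoW technical detail above could be prototyped along the following lines. This is only a sketch under stated assumptions: it scans Fortran-like source with regular expressions rather than a real parser, the sample source and routine names are invented, and the output dictionary is a hypothetical stand-in for a CIM software description, not the CIM structure itself.

```python
import json
import re

# Invented Fortran-like source used purely for illustration.
SAMPLE_SOURCE = """
subroutine atmos_step(state)
  call radiation(state)
  call send_to_coupler(state)
end subroutine atmos_step

subroutine ocean_step(state)
  call recv_from_coupler(state)
end subroutine ocean_step
"""

SUBROUTINE_START = re.compile(r"^\s*subroutine\s+(\w+)", re.IGNORECASE)
SUBROUTINE_END = re.compile(r"^\s*end\s+subroutine\b", re.IGNORECASE)
CALL = re.compile(r"^\s*call\s+(\w+)", re.IGNORECASE)


def describe_source(source):
    """Return a simple description of subroutines and the routines they call."""
    components = []
    current = None
    for line in source.splitlines():
        # Check for the end marker first so it is never read as a new start.
        if SUBROUTINE_END.match(line):
            current = None
            continue
        start = SUBROUTINE_START.match(line)
        if start:
            current = {"name": start.group(1), "calls": []}
            components.append(current)
            continue
        call = CALL.match(line)
        if call and current is not None:
            current["calls"].append(call.group(1))
    return {"components": components}


if __name__ == "__main__":
    print(json.dumps(describe_source(SAMPLE_SOURCE), indent=2))
```

The resulting description lists each routine and its outgoing calls, which is the kind of summary a human could then compare across models. A production parser would need to handle continuation lines, modules, functions and preprocessor directives, which this sketch ignores.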

NA5

(Workflow environments)

The relevant part of NA5 is Task 3:

  • Task 3: Metadata creation and usage. Significant experience has been gained in CMIP5 and related exercises in providing meta-data to describe ESM experiment sets. A number of sites are recognising the need to build meta-data capture into the heart of the ESM experiment process and to drive data provision exercises; this needs to be supported by both software and processes. This networking activity will promote the sharing of experiences and designs in this emerging area through two workshops organised by DKRZ. The aim will be to encourage investment in software and working processes that will allow more comprehensive meta-data to be collected more efficiently. Further, the development of workflow and diagnostic solutions will be influenced by the meta-data requirements. To support the workshop, the Met Office and DKRZ will develop documents that identify key interfaces between the meta-data and the experiment definition and modelling processes, and explore design solutions.
