Data integration for tackling global environmental challenges - Plenary Session

19 February 2015
  • WG BioSharing Registry: connecting data policies, standards & databases in life sciences
  • WG Metadata Standards Directory
  • WG Wheat Data Interoperability
  • IG Agriculture Data Interoperability
  • IG Biodiversity Data Integration
  • IG ELIXIR Bridging Force
  • IG Geospatial
  • IG Marine Data Harmonization
  • IG Metadata


Plenary session objectives and agenda

During previous RDA plenaries, it gradually became clear that the environment-related Interest and Working Groups share concerns and face similar challenges regarding the lifecycle of life science data.

Many of our urgent societal challenges require the effective cross-fertilisation of people, data and processes across multiple biodiversity and environmental disciplines. To address these overarching challenges, we need to promote a targeted dialogue within existing fora and use these discussions to build the road-mapping documents that will drive our efforts in the years to come.

RDA provides a unique opportunity for existing Working and Interest Groups to discuss shared issues and identify common solutions. In this context, we propose to organise a joint meeting of all key RDA groups within the environmental cluster.

Agenda for the plenary session
Convenors: Dimitris Koureas [BDI IG] & Keith Jeffery [Metadata IG] 
  • Helen Glaves - British Geological Survey, UK  

RDA group: Marine Data Harmonisation IG

Title: Common global framework for marine data management

  • Wim Hugo - South African Environmental Observation Network, South Africa

Title: GEO BON Efforts to Establish Components for Global Research Data Infrastructure

  • Rebecca Koskela - University of New Mexico, USA & Keith Jeffery - Keith Jeffery Consultants, UK

RDA group: Metadata Standards Directory IG

Title: Metadata Practices for Biological and Environmental Data

  • Susanna Sansone - University of Oxford, UK

RDA group: Technical Advisory Board & BioSharing Registry WG

Title: Connecting data policies, standards & databases in life sciences

  • Nicky Nicolson - Royal Botanic Gardens, Kew, UK

RDA group: Biodiversity Data Integration IG

Title: If you don't know the names, your knowledge gets lost 

Short abstracts

Helen Glaves - British Geological Survey, UK

Common global framework for marine data management

In recent years, marine research has undergone a paradigm shift, moving from traditional discipline-specific science towards an ecosystem-level approach. This more multidisciplinary approach to ocean science requires large amounts of good-quality, interoperable data to be readily available for use in an increasing range of new and innovative applications.

This requirement for large volumes of readily available marine data has been addressed on a regional scale by the development of e-infrastructures that are responsible for managing and delivering data to the end user. However, each of these initiatives has been developed to address specific regional requirements, independently of those in other regions.

Establishing a common framework for marine data management on a global scale requires interoperability across these existing data infrastructures and active collaboration between the organisations responsible for their management. The Ocean Data Interoperability Platform project, in partnership with the RDA Marine Data Harmonisation IG, is seeking to encourage co-ordination between these regional data infrastructures and to capitalise on the range of expertise available in the Research Data Alliance to support the development of this global marine data infrastructure.

Wim Hugo - South African Environmental Observation Network, South Africa

GEO BON Efforts to Establish Components for Global Research Data Infrastructure

The GEO BON Manifesto was developed and discussed at the GEO BON meeting in Asilomar II, December 2012. From an information technology perspective, the GEO BON Manifesto addresses description, discovery, assessment, access, analysis, and application or reporting, by stating that it is in the interest of any specific community to do the following:

  • Ensure that scientific data and services are described properly, preserved properly, and discoverable;
      ◦ This implies availability of metadata standards, harvesters, brokers, and metadata interoperability.
      ◦ Persistent identifiers are implied.
      ◦ Protocols and standards for data exchange/uploads are implied.
      ◦ Preservation standards and formats are implied.
      ◦ Tools and approaches to make searches more efficient (vocabularies, ontologies, dealing with massive metadata collections, …) are implied.
      ◦ Sustainable data centers and long-term archives are implied.
  • Once discovered, its utility, quality, and scope can be understood, even if the data sets are huge;
      ◦ Implies: visualisations, feedback on quality, quality metrics and standards, viewing search results in relation to referenced spatial, temporal, and ontological/taxonomic coverages, ability to dynamically extract 'thumbnail' views of large datasets, …
  • Once understood, it can be accessed freely and openly;
      ◦ Implies: standardised services, licenses and policies, simplified distribution channels, even if costs are involved, …
  • Once accessed, it can be included in distributed processes, and collated - preferably automatically, and on large scales (the ‘Model Web’);
      ◦ Implies: persistence of mash-ups and mediations, web context documents, web processing services, standards and guidelines for grid computing, ability to construct indicators and standardized, interoperable final products, …
  • Due recognition is afforded to the creators of the data and services;
      ◦ Implies: data publication and citation, linking to scholarly articles, …
  • Once processed, the mediations defined, usefulness, and knowledge gathered can be re-used.
      ◦ Implies: defining and storing templates and examples of finished work, processes, mash-ups, context documents, …

All of this needs to be implemented against the backdrop of

  • The push to extend formal metadata with Linked Open Data;
  • The increased availability of crowd-sourced and citizen contributions;
  • A proliferation of devices and sensors;
  • And the construction of knowledge networks.

GEO BON Workgroup 8 is working towards addressing the gaps within this vision, largely by offering formally published guidance, and by engaging initiatives, including RDA, and programmes that can contribute. The standards and specifications landscape is reviewed in light of GEO BON’s work on Essential Biodiversity Variables. Progress with this is discussed and summarized, defining the state of play in respect of end-to-end interoperability for biodiversity sciences.

Rebecca Koskela - University of New Mexico, USA & Keith Jeffery - Keith Jeffery Consultants, UK

Metadata Practices for Biological and Environmental Data

The scope of biological and environmental research has been changing to focus on long-term, broad-scale, and complex questions that require diverse data collected by interdisciplinary science teams, as well as different approaches for managing, preserving, analyzing, and sharing data. The RDA metadata groups focus on all aspects of metadata for research data, including data discovery, contextualization, validation, analytical processing, and interoperation. Metadata is important not only for documenting data (including rights and provenance as well as the usual descriptive information) and for assessing datasets for relevance and quality (which includes dataset and publication citation) for the (re-)purpose in hand; it is also necessary for interoperation. This presentation will cover best practices for interoperable metadata useful for interdisciplinary research.

Susanna Sansone - University of Oxford, UK

Connecting data policies, standards & databases in life sciences

As part of the growing worldwide movement for reproducible research, the efforts of funding agencies and journal editors are converging to encourage awardees and authors to provide the underlying data together with a description of that data and the methods used to generate it, providing such details in a standardized manner and making them available (publicly or via controlled access) for reuse. In parallel, a growing number of community-based groups are developing standards, including content standards for both data and experimental metadata. As a consequence of this general mobilization to support reproducible research, there are more than 1,000 databases in the life sciences, over 300 terminologies, more than 100 reporting guidelines, over 150 exchange formats, and a growing number of data preservation, management and sharing policies and plans that could help in the annotation, reporting and sharing of life science datasets. But what is relevant to the biodiversity and environmental disciplines?

Since 2011, BioSharing has worked to improve information about content standards and databases (maturity, uptake, implementation) and to provide information to funders and journals about which standards are the appropriate community norms, which databases implement which standards or are appropriate for certain data types, and where data is curated and openly available (or where access is regulated, e.g. for ethical reasons). Improving the quality of lists of databases and standards will allow funder and journal policies to encourage transparent information and recommendation of community norms. Interlinking allows the project to close the loop: here are the databases and standards; here are the policies that refer to them (or not). For example, when standards are mature and appropriate standards-compliant systems become available, these are channeled to the appropriate stakeholder community, who in turn endorse (in policies) or implement (in databases) them, achieving wider harmonization of the data.

Nicky Nicolson - Royal Botanic Gardens, Kew, UK

If you don't know the names, your knowledge gets lost 

Scientific names are used in all domains as entry points into biodiversity datasets, but names are updated over time as we refine our understanding of species diversity. Resolution services are essential for data integration efforts to build the linked open data that researchers require: they allow navigation between old and new names as represented in different taxonomies (organising systems) and thus provide access to all content linked to name variants. Names services, operating on high-quality, expert-curated, structured, linked data representing names and their inter-relationships, allow the transition from static text to actionable data. These services should be usable by any domain of basic or applied science dealing with scientific names of organisms.
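
As a rough illustration of what such a resolution service does, the Python sketch below follows synonym links from an old name to the currently accepted one, and lists every variant that points to an accepted name. It is a minimal sketch under stated assumptions, not a description of any existing names service: the record structure, the function names (build_index, resolve, variants) and the placeholder names "Aus bus", "Aus cus" and "Xus cus" are all hypothetical and do not come from a curated checklist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    """One scientific name; `accepted` points to the replacing name, if any."""
    name: str
    accepted: Optional[str] = None  # None means the name is currently accepted


# Hypothetical placeholder records (not real taxa).
RECORDS = [
    NameRecord("Aus bus"),                      # accepted name
    NameRecord("Aus cus", accepted="Aus bus"),  # synonym of the accepted name
    NameRecord("Xus cus", accepted="Aus cus"),  # older synonym, two hops away
]


def build_index(records):
    """Index records by lower-cased name for case-insensitive lookup."""
    return {r.name.lower(): r for r in records}


def resolve(name, index, max_hops=10):
    """Follow synonym links until an accepted name is reached.

    Returns the accepted name, or None if the name is unknown,
    the chain is cyclic, or it is longer than max_hops.
    """
    key = name.strip().lower()
    seen = set()
    for _ in range(max_hops):
        record = index.get(key)
        if record is None or key in seen:
            return None
        if record.accepted is None:
            return record.name
        seen.add(key)
        key = record.accepted.lower()
    return None


def variants(accepted_name, index):
    """All indexed names (including the accepted one) that resolve to accepted_name."""
    return sorted(
        r.name for r in index.values() if resolve(r.name, index) == accepted_name
    )
```

With an index built from these records, resolve("Xus cus", index) walks two synonym links to reach the accepted name, and variants("Aus bus", index) gathers every name under which linked content might have been filed. A production service would additionally need to handle authorship strings, homonyms and competing taxonomies, which this sketch ignores.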