Data integration for tackling global environmental challenges - Plenary Session

27 January 2015

During the previous RDA plenaries it gradually became obvious that the environment-related Interest and Working Groups share concerns and face similar challenges regarding the lifecycle of Life data.

Many of our urgent societal challenges require the effective cross-fertilisation of people, data and processes across multiple biodiversity and environmental disciplines. To address these overarching challenges we need to promote a targeted dialogue within existing fora and use these discussions to build the road-mapping documents that will drive our efforts in the years to come.

RDA provides a unique opportunity for existing Working and Interest Groups to discuss shared issues and identify common solutions. In this context we propose to organise a joint meeting of all key RDA groups within the environmental cluster.
Scope of the session and next steps
Dimitris Koureas
Natural History Museum London, UK & Biodiversity Data Integration IG, co-Chair
Session co-chair 

With a PhD in plant systematics and biodiversity, Dimitris is currently a biodiversity informatics scientist at the Natural History Museum London. Over the last 10 years he has been developing, contributing to and managing European funded and co-funded research and research infrastructure projects. Dimitris has significant expertise in virtual research environments, and he liaises between different biodiversity research and e-infrastructure teams across Europe, investing actively in capacity and contact building. He also participates in the strategic steering of the Scratchpads platform and is an invited lecturer at the University of Reading, the University of Oxford, and the Aristotle University of Thessaloniki, teaching biodiversity informatics tools. Dimitris is a European representative on the executive of the Biodiversity Information Standards Organisation.
Keith Jeffery
Keith Jeffery Consultants, UK & Metadata Standards Directory IG, co-Chair
Session co-chair 
Keith Jeffery is an independent consultant and past Director of IT at STFC Rutherford Appleton Laboratory, with 360,000 users, 1100 servers and 140 staff. Keith holds 3 honorary visiting professorships, is a Fellow of the Geological Society of London and the British Computer Society, is a Chartered Engineer and Chartered IT Professional and an Honorary Fellow of the Irish Computer Society. Keith is past President of ERCIM and past President of euroCRIS, and serves on international expert groups, conference boards and assessment panels. He has advised government on security and green computing. He chaired the EC Expert Groups on GRIDs and on CLOUD Computing.
Common global framework for marine data management
Helen Glaves, British Geological Survey, UK & Marine Data Harmonisation IG
In recent years marine research has undergone a paradigm shift, moving from traditional discipline-specific science towards a more ecosystem-level approach. This more multidisciplinary approach to ocean science requires large amounts of good quality, interoperable data to be readily available for use in an increasing range of new and innovative applications.
This requirement for large volumes of marine data to be made readily available to users has been addressed on a regional scale by the development of e-infrastructures which are responsible for managing and delivering data to the end user. However, each of these initiatives has been developed to address specific regional requirements, independently of those in other regions.
Establishing a common framework for marine data management on a global scale requires interoperability across these existing data infrastructures and active collaboration between the organisations responsible for their management. The Ocean Data Interoperability Platform project, in partnership with the RDA Marine Data Harmonisation IG, is seeking to encourage co-ordination between these regional data infrastructures and to capitalise on the range of expertise available in the Research Data Alliance to support the development of this global marine data infrastructure.

Helen Glaves is the Senior Data Scientist at the British Geological Survey (BGS) and is directly responsible for the management of the BGS scientific data holdings, ensuring delivery of high quality geoscience data and metadata to support the core science priorities of the organization. Helen is also co-ordinator of the Ocean Data Interoperability Platform (ODIP and ODIP2) projects, funded by the European Commission, the National Science Foundation (USA) and the Australian Government, to support the development of a common global framework for marine data management. She is also currently involved with a number of other national and international initiatives directly related to the sharing, re-use and preservation of earth science data, including ENVRIPLUS, SciDIP-ES and the Belmont Forum e-Infrastructure and Data Management initiative.

GEO BON Efforts to Establish Components for Global Research Data Infrastructure
Wim Hugo, South African Environmental Observation Network, South Africa
The GEO BON Manifesto was developed and discussed at the GEO BON meeting in Asilomar II, December 2012. From an information technology perspective, the GEO BON Manifesto addresses description, discovery, assessment, access, analysis, and application or reporting, by stating that it is in the interest of any specific community to do the following:
•        Ensure that scientific data and services are described properly, preserved properly, and discoverable;
◦          This implies availability of metadata standards, harvesters, brokers, and meta-data interoperability.
◦          Persistent identifiers implied.
◦          Protocols and standards for data exchange/ uploads are implied.
◦          Preservation standards and formats implied.
◦          Tools and approaches to make searches more efficient (vocabularies, ontologies, dealing with massive meta-data collections, …).
◦          Sustainable data centers and long-term archives are implied.
•        Once discovered, its utility, quality, and scope can be understood, even if the data sets are huge;
◦          Implies: Visualisations, feedback on quality, quality metrics and standards, viewing search results in relation to referenced spatial, temporal, and ontological/ taxonomic coverages, ability to dynamically extract 'thumbnail' views of large datasets, …
•        Once understood, it can be accessed freely and openly;
◦          Implies: standardised services, licenses and policies, simplified distribution channels, even if costs are involved, …
•        Once accessed, it can be included into distributed processes, and collated - preferably automatically, and on large scales (the ‘Model Web’);
◦          Implies: persistence of mash-ups and mediations, web context documents, web processing services, standards and guidelines for grid computing, ability to construct indicators and standardized, interoperable final products,  …
•        That due recognition is afforded to the creators of the data and services;
◦          Implies: data publication and citation, linking to scholarly articles, …
•        Once processed, the mediations defined, usefulness, and knowledge gathered can be re-used.
◦          Implies: defining and storing templates and examples of finished work, processes, mash-ups, context documents, … 
All of this needs to be implemented against the backdrop of:
•        The push to extend formal metadata with Linked Open Data;
•        The increased availability of crowd-sourced and citizen contributions;
•        A proliferation of devices and sensors;
•        The construction of knowledge networks.
GEO BON Workgroup 8 is working towards addressing the gaps within this vision, largely by offering formally published guidance and by engaging initiatives (including RDA) and programmes that can contribute. The standards and specifications landscape is reviewed in light of GEO BON’s work on Essential Biodiversity Variables. Progress with this is discussed and summarized, defining the state of play in respect of end-to-end interoperability for biodiversity sciences.

Wim has a master's degree in Chemical Engineering and many years' experience in techno-economic feasibility studies, management consulting, systems engineering and systems architecture. His recent work (the past 5-6 years) has focused on systems architecture and development in support of scientific data management and preservation. His research interests include Knowledge Networks and Certification of Trusted Digital Repositories. He is a member of the ICSU-World Data System Scientific Committee, and co-chair of the GEO BON WG8 (Systems and Architecture) and of the newly constituted collaboration between RDA and WDS on repositories. He is active in CODATA, GEO, and GEOSS.

Metadata Practices for Biological and Environmental Data
Rebecca Koskela, University of New Mexico, USA & Keith Jeffery, Keith Jeffery Consultants, UK & Metadata Standards Directory IG
The scope of biological and environmental research has been changing to focus on long-term, broad-scale, and complex questions that require diverse data collected by interdisciplinary science teams as well as different approaches for managing, preserving, analyzing, and sharing data. The RDA metadata groups focus on all aspects of metadata for research data, including data discovery, contextualization, validation, analytical processing, and interoperation. Metadata is not only important for documenting data (including rights, provenance as well as the usual descriptive information) and assessing datasets for relevance and quality (which includes dataset and publication citation) for the (re-)purpose in hand; it is also necessary for interoperation. This presentation will cover best practices for interoperable metadata useful for interdisciplinary research.
Rebecca Koskela is the Executive Director of DataONE at the University of New Mexico. Prior to this position, Rebecca was the Life Sciences Informatics Manager for Alaska INBRE and the Biostatistics and Epidemiology Core Manager for the Center for Alaska Native Health Research at the University of Alaska Fairbanks. In addition to her bioinformatics experience, Rebecca has over 25 years' experience in high-performance computing, including positions at Sandia National Laboratories, Los Alamos National Laboratory, Cray Research and Intel.
Connecting data policies, standards & databases in life sciences
Susanna Sansone, University of Oxford, UK & Technical Advisory Board & Biosharing Registry WG
As part of the growing worldwide movement for reproducible research, funding agencies and journal editors are converging on encouraging awardees and authors to provide the underlying data together with a description of that data and the methods used to generate it, in a standardized manner, and to make it available (publicly or via controlled access) for reuse. In parallel, a growing number of community-based groups are developing standards, including content standards for both data and experimental metadata. As a consequence of this general mobilization to support reproducible research, there are more than 1,000 databases in the life sciences, over 300 terminologies, more than 100 reporting guidelines, over 150 exchange formats, and a growing number of data preservation, management and sharing policies and plans that could help in the annotation, reporting and sharing of life science datasets. But what is relevant to the biodiversity and environmental disciplines?

Since 2011, BioSharing has worked to improve information about content standards and databases (maturity, uptake, implementation) and to inform funders and journals about which standards are the appropriate community norms, which databases implement which standards or are appropriate for certain data types, and where data is curated and openly available (or where access is regulated, e.g. for ethical reasons). Improving the quality of lists of databases and standards will allow funder and journal policies to encourage transparent information and recommend community norms. Interlinking allows the project to close the loop: here are the databases and standards; here are the policies that refer to them (or not). For example, when standards are mature and appropriate standards-compliant systems become available, these are channeled to the appropriate stakeholder community, which in turn endorses (in policies) or implements (in databases) them, achieving wider harmonization of the data.

Susanna Sansone is Associate Director at the University of Oxford e-Research Centre, a consultant for Nature Publishing Group's Scientific Data, co-chair of the RDA-Force11 BioSharing WG and a member of the RDA TAB. As Principal Investigator at the Centre, Susanna focuses on data curation, management and publication and their pivotal roles in enabling reproducible research and driving science and discoveries. She works in the life science, environmental and biomedical domains, collaborating with data producers, service providers, pre-competitive informatics initiatives, journals and funding agencies to develop software and promote the creation and uptake of community-developed ontologies and standards.
If you don't know the names, your knowledge gets lost 
Nicky Nicolson, Royal Botanic Gardens, Kew, UK & Biodiversity Data Integration IG
Scientific names are used in all domains as entry points into biodiversity datasets, but names are updated over time as we refine our understanding of species diversity. Resolution services are essential for data integration efforts to build the linked open data that researchers require: these allow navigation between old and new names as represented in different taxonomies (organising systems) and thus provide access to all content linked to name variants. Names services, operating on high-quality, expert-curated, structured, linked data representing names and their inter-relationships, allow the transition from static text to actionable data. These services should be usable by any domain of basic or applied science dealing with scientific names of organisms.
Nicky Nicolson is the senior research leader in Kew’s Biodiversity Informatics team, and has over 15 years' experience in biodiversity informatics: the effort to curate, mobilise and exploit global-scale biodiversity data resources gathered over hundreds of years.

Plenary session groups:

  • WG BioSharing Registry: connecting data policies, standards & databases in life sciences
  • WG Metadata Standards Directory
  • WG Wheat Data Interoperability
  • IG Agriculture Data Interoperability
  • IG Biodiversity Data Integration
  • IG ELIXIR Bridging Force
  • IG Geospatial
  • IG Marine Data Harmonization
  • IG Metadata