Research Data Repository Interoperability

19 May 2016

The idea of establishing this working group was first presented during P6 in Paris in the Repository Platforms for Research Data IG session. Shortly after P6, a telephone conference concluded that a case statement should be prepared and then finalized during a BoF session at P7. The initial co-chairs are David Wilcox and Thomas Jejkal. Contact with potential co-chairs from Asia was made during P6 and will be finalized during P7.

For more information please visit the web page of this BoF group:

https://rd-alliance.org/groups/research-data-repository-interoperability-wg-bof.html

Charter

The Research Data Repository Interoperability Working Group will establish standards for interoperability between different research data repository platforms, focusing on machine-to-machine communication. These standards may include (but are not limited to) a generic API specification and import/export formats, summarized in a document that serves as an implementation guide for adoption. The scope of this document and of all the WG’s activities will be defined by the following list of initial use cases:

  • Migration/Replication of a Digital Object between research data repository platforms

    • Platform, data model and/or version may differ between source and destination

  • Retrieval of information related to the platform and/or its contents

    • E.g. to register the system in a (repository) registry or to harvest contents

This initial list might be extended in the first phase of the WG’s operational time.
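
To make the two initial use cases more concrete, the following minimal Python sketch shows what a platform-neutral interface could look like. It is an illustration only and not part of any agreed specification: the names `DigitalObject`, `Repository`, `InMemoryRepository`, `describe`, `export_object`, `import_object` and `migrate` are all hypothetical and were chosen for this example.

```python
import copy
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical platform-neutral exchange form of a digital object."""
    identifier: str
    metadata: dict
    files: dict = field(default_factory=dict)  # file name -> content bytes


class Repository(ABC):
    """Hypothetical minimal interface covering the two initial use cases."""

    @abstractmethod
    def describe(self) -> dict:
        """Use case 2: return information about the platform and its contents."""

    @abstractmethod
    def export_object(self, identifier: str) -> DigitalObject:
        """Use case 1 (source side): hand out an object in the exchange form."""

    @abstractmethod
    def import_object(self, obj: DigitalObject) -> str:
        """Use case 1 (destination side): store an object, return its identifier."""


class InMemoryRepository(Repository):
    """Toy implementation standing in for a real repository platform."""

    def __init__(self, name: str, version: str):
        self.name = name
        self.version = version
        self._objects = {}

    def describe(self) -> dict:
        return {
            "name": self.name,
            "version": self.version,
            "object_count": len(self._objects),
        }

    def export_object(self, identifier: str) -> DigitalObject:
        # Return a copy so the caller cannot modify the stored object.
        return copy.deepcopy(self._objects[identifier])

    def import_object(self, obj: DigitalObject) -> str:
        self._objects[obj.identifier] = copy.deepcopy(obj)
        return obj.identifier


def migrate(source: Repository, destination: Repository, identifier: str) -> str:
    """Replicate one object; platform and version may differ between both ends."""
    return destination.import_object(source.export_object(identifier))


if __name__ == "__main__":
    source = InMemoryRepository("platform-a", "1.0")
    destination = InMemoryRepository("platform-b", "2.3")
    source.import_object(DigitalObject("obj-1", {"title": "Sample dataset"}))
    migrate(source, destination, "obj-1")
    print(destination.describe())
```

The sketch deliberately keeps the exchange form separate from the repositories themselves, so that source and destination only need to agree on the exchange form and not on each other's internal data model.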

In order to cover these use cases, existing standards and technologies will be identified and evaluated in the second phase. Evaluation results will be summarized in a separate deliverable and will form the basis of the final deliverable. During the evaluation phase, the preparatory work of other RDA WGs will be used as far as possible along with experiences gathered by the RDRI WG’s members during their work with and on existing research data repository platforms.

In the final phase the WG will strive for a consensus regarding a generic API specification and/or import/export formats needed for offering the listed functionalities. The final deliverable will then contain this consensus in a form such that it can be used as an implementation guide for later adoption.

Value Proposition

The Research Data Repository Interoperability working group will provide recommendations and implementation guidelines (e.g. for a generic API or import/export formats) for research data repository interoperability that can be integrated by platform developers and service providers. To that end, existing standards and technologies will be evaluated and integrated where possible. Once adopted widely, these outcomes will allow institutions and organizations with research data repositories to deposit, access and share their data in a common way and to disseminate repository resources and contents to clients and services easily. For adopters and their users this means:

Removing Barriers: Defining and implementing interoperability standards for the use cases mentioned above could help researchers to identify and acquire datasets that are stored on other platforms and were not available to them before, in order to enrich their own research.

Easier Collaboration: Having a common way to exchange datasets stored in different research data repository platform instances from different institutions or even disciplines can help to identify new starting points for (inter-)disciplinary collaborations.

Creating Commonalities: Agreeing on and implementing common standards for typical research data repository tasks might bring adopters closer together. In the future, this could result in fruitful collaborations that extend the basic set of functionalities proposed by this WG.

Since the value of this work stands or falls with the adoption of the results, repository platform developers contributing to this group have agreed to implement the results as early adopters.

Engagement with Existing Work

A number of related standardization efforts have already taken place; for example, the OAI Protocol for Metadata Harvesting (OAI-PMH), the SWORD protocol for repository deposits, and the re3data.org schema for collecting information on research data repositories for registration. The Research Data Repository Interoperability WG will review these and other related standards to see how they might be adopted or extended to support our goals. This review period will ensure that we do not duplicate existing efforts.
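
As an illustration of the kind of existing standard under review, the following Python sketch builds requests for the OAI protocol for metadata harvesting and extracts record identifiers from a response. The verbs `Identify` and `ListIdentifiers`, the `metadataPrefix` argument, the `oai_dc` format and the XML namespace come from the protocol; the base URL `https://repository.example.org/oai`, the helper names and the sample response are assumptions made up for this example, and no network access takes place.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical endpoint; replace with the base URL of a real repository.
BASE_URL = "https://repository.example.org/oai"


def build_request(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH GET request URL for the given verb and arguments."""
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"


def record_identifiers(response_xml: str) -> list:
    """Extract all record identifiers from a ListIdentifiers response."""
    root = ET.fromstring(response_xml)
    return [
        element.text
        for element in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]


# Hand-written, abbreviated sample response used instead of a live request.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:repository.example.org:1</identifier>
      <datestamp>2016-05-19</datestamp>
    </header>
    <header>
      <identifier>oai:repository.example.org:2</identifier>
      <datestamp>2016-05-20</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


if __name__ == "__main__":
    # Platform information, e.g. for registering the system in a registry.
    print(build_request(BASE_URL, "Identify"))
    # Content harvesting.
    print(build_request(BASE_URL, "ListIdentifiers", metadataPrefix="oai_dc"))
    print(record_identifiers(SAMPLE_RESPONSE))
```

This covers only the retrieval use case; depositing or migrating objects is outside what this protocol addresses, which is one reason the review also considers deposit-oriented standards such as SWORD.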

Related Work

Related RDA Groups

Work Plan

The work of the proposed group is organized in three phases framed by the RDA plenary meetings beginning with P8.

Timing | Action | Main Participants
September 2016 | Official start of RDRI WG at P8, working session at P8 for analyzing state of the art | Session participants in an open discussion
September – December 2016 | Identification and discussion of additional use cases and adoptable technologies. Mapping of technologies for potential adoption to single functionalities. | Registered members
January – April 2017 | Create a primer document describing all use cases and technologies for potential adoption. The document also points out gaps not covered by existing technologies. | Co-chairs
April 2017 | Session during P9 to present the primer document and to prepare next steps, e.g. identification of functionalities or exchange formats. | WG members
April 2017 – September 2017 | Discussion of functionalities, exchange formats and intended behavior. Create first draft of specification document. | Registered members
September 2017 | Presentation of the specification draft at P10 and identification of open points and potential improvements. | Session participants in an open discussion
September 2017 – March 2018 | Find consensus regarding final specification and write final deliverable serving as implementation/adoption guideline. | Registered members/co-chairs (writing)
March 2018 | Present final results at P11. | Co-chairs

Deliverables

D1. Research Data Repository Interoperability Primer (M6): This document describes targeted use cases, needed functionalities, as well as existing technologies and their feasibility for adoption. Gaps not covered by existing technologies are also described in this document.

D2. Interface Specification Draft (M12): A first draft document of the final specification. The document gives a basic overview of functionalities, exchange formats and intended behavior targeted by the WG to cover the defined use cases. This document will be the basis for finding a consensus between all WG members.

D3. Interface Specification (M18): This specification represents a consensus of all partners regarding an interoperable repository interface. It describes all functionalities provided by this interface including exchange formats and the expected behavior of a repository platform implementing the interface. This document serves as guideline for adopting the results of this working group.

Mode and Frequency of Operation

The Research Data Repository Interoperability WG will primarily communicate asynchronously online using the mailing list functionality provided by RDA. Online voice meetings will be scheduled as needed; likely once per month. When possible, in-person meetings will also be scheduled; these will take place at RDA plenaries and at other conferences where a sufficient number of group members are in attendance.

Addressing Consensus and Conflicts

Group consensus will be achieved primarily through mailing list discussions, where opposing views will be openly discussed and debated amongst members of the group. If consensus cannot be achieved in this manner, the group co-chairs will make the final decision on how to proceed.

The co-chairs will keep the working group on track by setting milestones and reviewing progress relative to these targets. Similarly, scope will be maintained by tying milestones to specific dates, and ensuring that group work does not fall outside the bounds of the milestones or the scope of the working group.

Community Engagement

The working group case statement will be disseminated to mailing lists in communities of practice related to research data and repositories in an effort to cast a wide net and attract a diverse, multi-disciplinary membership. Group activities, where appropriate, will also be published to related mailing lists and online forums to encourage broad community participation.

Adoption Plan

Representatives of several major repository platforms have already joined this working group, including:

These representatives have agreed to consider implementing the standards recommended by the Research Data Repository Interoperability WG in their respective repository platforms. We will continue to seek representatives from a variety of repository platforms and services to ensure that this working group’s deliverables are widely adopted.

Initial Membership

Co-Chairs

Thomas Jejkal

David Wilcox

 

Members

Stefan Funk

Ralph Mueller-Pfefferkorn

Robert Olendorf

Rick Johnson

Ulrich Schwardmann

Ajinkya Prabhune

Andrew Woods 

Wolfram Horstmann

Cynthia Hudson Vitale

Adam Soroka 

Jared Whiklo

Colleen Fallaw

Rainer Stotzka

Stephen Abrams

Eleni Castro

Amy Nurnberger

Andre Schaaff

Christopher Harrison

Holger Mickler

Jibo Xie

Juanle Wang

Muhammad Naveed Tahir

Niclas Jareborg

Shaun de Witt

Volker Hartmann

William Gunn

Wouter Haak

 

  • Author: Eva Méndez

    Date: 20 May, 2016

    I think this group is a very interesting idea. Congratulations, and count me in for P8. However, on reading the Related RDA Groups, I am really missing its relationship with the metadata WG. I think we should look for synergies.

    Looking forward to hearing more...

     

  • Author: Thomas Jejkal

    Date: 20 May, 2016

    Of course, there is also overlap with other RDA groups not explicitly mentioned in the Case Statement, and there are definitely contact points with the metadata groups. Therefore, it would be great if we could stay in contact for information exchange.

  • Author: Elizabeth Griffin

    Date: 20 May, 2016

    What is outlined is obviously an essential stage in realizing All Data for All Researchers, but there is rather more to it than enabling repositories to contact and communicate seamlessly. That is only one small step, and it is a long way down the stream of eventual confluence of 'dissimilar' data-sets. In theory it sounds great, but how is it going to work in practice, and to what uses can merely transported data be put when there are so very many other variables in the works? Different sciences use different interpretations of the same word to describe features of their observations. It isn't as if all researchers use identical computers and identical reduction software or modelling tools. Data formats sound misleadingly alike, but can refuse to conform even within the same science. A simple example: different instruments will deliver either fluxes or intensities, or some machine-uncorrected version of either, and it can be crucial to sort that kind of trivial-sounding matter out before drawing erroneous conclusions. Of course, you will reply, all those things will be properly sorted out in due time. But when, and by whom, and will they all be? It only takes one publication to present wrong conclusions that resulted from not fully understanding the subtle differences between different types of data to place the whole effort in jeopardy. I therefore believe that, while the topic in question is worthy of deep consideration, it does also need to be placed very precisely within its rightful place along the whole chain of actions from inter-departmental agreements on format unification, language unification and metadata unification, via inter-university or country-wide or international agreements of the same kind, with ample trials and feedback at every stage and involving users at every stage, until it could be claimed that the data scientists have done their work thoroughly. 
It will then also take inordinate amounts of dedicated time for the other half of the population, the users, to come up with their own judgements at every step. All of that cannot be swept up in one RDA IG for 'interoperability', though trying to get a full perspective of the total procedure will help to place the intentions of this particular (would-be) IG more nearly into its correct context.

  • Author: Malcolm Wolski

    Date: 24 May, 2016

    I have the same concerns as Elizabeth. To achieve something within a short timeframe you will need to keep the scope narrow and focused. As Elizabeth points out it is a big issue. But we have to start somewhere. Perhaps there are some outputs around general principles and approaches rather than specific solutions for every situation. 

  • Author: Thomas Jejkal

    Date: 17 Jun, 2016

    From your perspective, having the overall goal of "All Data for All Researchers" in mind, I totally agree with you. This is something a single WG cannot possibly achieve. Of course, the proposed WG contributes only a very small piece to the ultimate vision of sharing all data with everybody. However, we think that this small piece is worth tackling and may contribute (on a more technical level) to improving data sharing and exchange.

    All other aspects like format, language and metadata unification are out of the scope of this WG, but if other groups, inside or outside RDA, issue recommendations in these directions, those recommendations will definitely be taken into account as far as possible.

  • Author: Donald Pellegrino

    Date: 20 May, 2016

    It might be useful to reach out to a representative of the iRODS repository platform as well. More information on iRODS can be found at http://irods.org/.

  • Author: Thomas Jejkal

    Date: 17 Jun, 2016

    Of course, having an iRODS partner would be great. Do you have someone in mind?

  • Author: Stefan Kramer

    Date: 03 Jun, 2016

    I believe that this WG's proposed undertaking is a very worthwhile effort, having personally encountered the challenge of "Research Data Repository Interoperability" (or lack thereof) in investigating how to mirror data submitted to a data visualization platform, for interactive access to data (namely, opendata.american.edu) into a data archiving platform (namely, dra.american.edu).

    Disclosure: I am one of the co-chairs of the Repository Platforms for Research Data IG (with David Wilcox & Ralph Müller-Pfefferkorn).

    -- Stefan

    P.S.: there seems to be a glitch in this platform - I posted this comment on June 3, but it was datestamped May 19, the same date that the case statement was posted for review. As are all the other previous comments.

     

  • Author: Tim Smith

    Date: 20 Jun, 2016

    Please could you add Invenio to the list of represented repository platforms (underlies services such as Zenodo, B2SHARE, INSPIRE, etc). I've joined the nascent WG and look forward to meeting at P8.

  • Author: Thomas Jejkal

    Date: 06 Jul, 2016

    Thank you for joining the group. I'll add Invenio to the list.
