RDA WG DMP Common Standards Case Statement
WG Charter: A concise articulation of what issues the WG will address within an 18 month time frame and what its “deliverables” or outcomes will be.
The need for establishing this working group was articulated during the 9th plenary meeting in Barcelona during the Active DMPs IG session. The discussion was framed by a white paper by Simms et al. on machine-actionable data management plans (DMPs). The white paper is based on outputs from the IDCC workshop held in Edinburgh in 2017, which gathered almost 50 participants from Africa, America, Australia, and Europe. It describes eight community use cases which articulate consensus about the need for a common standard for machine-actionable DMPs (where machine-actionable is defined as “information that is structured in a consistent way so that machines, or computers, can be programmed against the structure”).
The specific focus of this working group is on developing a common information model and specifying access mechanisms that make DMPs machine-actionable. The outputs of this working group will help in making systems interoperable and will allow for automatic exchange, integration, and validation of information provided in DMPs, for example, by checking whether a provided PID links to an existing dataset, whether file hashes match their provenance traces, or whether a license was specified. The common information model is NOT intended to be a prescriptive template or questionnaire, but to provide re-usable ways of representing machine-actionable information on themes covered by DMPs.
The vision that this working group will work to realise is one where DMPs are developed and maintained in such a way that they are fully integrated into the systems and workflows of the wider research data management environment. To achieve this vision we will develop a common data model with a core set of elements. Its modular design will allow customisations and extensions using existing standards and vocabularies, following best practices developed in various research communities. We will provide reference implementations of the data model using popular formats, such as JSON, XML, and RDF. This will enable tools and systems involved in processing research data to read and write information to/from DMPs. For example, a workflow engine can add provenance information to the DMP, a file format characterization tool can supplement it with identified file formats, and a repository system can automatically pick suitable content types for submission and later automatically identify applicable preservation strategies.
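To make this concrete, the following is a minimal sketch of how such a tool round-trip might look with a JSON serialisation. All field names here are hypothetical illustrations, not the vocabulary this working group will define:

```python
import json

# Hypothetical DMP record; the field names are illustrative only and
# do not reflect the working group's agreed common model.
dmp = {
    "title": "Example project DMP",
    "contact": {"name": "A. Researcher", "orcid": "0000-0002-1825-0097"},
    "dataset": [
        {
            "title": "Sensor measurements",
            "pid": "doi:10.1234/example",
            "license": "CC-BY-4.0",
        }
    ],
}

# One tool serialises the DMP...
serialized = json.dumps(dmp)

# ...and another tool (e.g. a file-format characterisation service)
# reads it, supplements it with an identified format, and writes it back.
record = json.loads(serialized)
record["dataset"][0]["format"] = "text/csv"
updated = json.dumps(record)

print(json.loads(updated)["dataset"][0]["format"])  # prints "text/csv"
```

The point of the sketch is only that a shared, structured representation lets independent systems add and consume information without bespoke pairwise integrations.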
The deliverables will be publicly available under a CC0 license and will consist of models, software, and documentation. The documentation will describe the functionality and semantics of terms used, the rationale, standard-compliant ways for customisation, and requirements for supporting systems to fully utilise the capabilities of the developed model.
The working group will be open to everyone and will involve all stakeholders representing the whole spectrum of entities involved in research data management, such as: researchers, tool providers, infrastructure operators, repository staff and managers, software developers, funders, policy makers, and research facilitators. We will take into account the requirements of each group. This will likely speed up and increase adoption of the working group outcomes.
The group will predominantly collaborate online, but will use any possibility to meet in person during RDA plenaries, conferences, workshops, hackathons or other events in which their members participate. All meetings in which decisions are made will be documented and their summaries will be circulated using the RDA website.
The work will be performed iteratively and incrementally, following best practices from system and software engineering. We will evaluate preliminary drafts of the model with the community to receive early feedback and to ensure that the developed common model is interoperable and exchangeable across implementations. We will also express existing DMPs using the developed common model and will investigate how to support modification of machine-actionable DMPs by the various tools involved in the data management process, while ensuring that proper provenance and versioning information is stored with them. Finally, we will build prototypes to investigate possible system integrations and to evaluate to what degree the information contained in DMPs can be automatically validated and which actions or alerts can be triggered depending on a DMP's state, e.g. by sending notifications to repositories or funder systems.
During our work we will monitor parallel efforts and engage with various research communities to find candidates for pilot studies and to transfer the acquired know-how. Towards the end of the lifetime of the working group we will launch pilot projects in which the model will be customised to suit the needs of the identified interested communities. Pilot studies will use the models to integrate systems and demonstrate how machine-actionable DMPs can work.
We believe that the outcomes delivered by this group will contribute to improving the quality of research data and research reproducibility, while at the same time reducing the administrative burden for researchers and systems administrators.
Value Proposition: A specific description of who will benefit from the adoption or implementation of the WG outcomes and what tangible impacts should result.
A common data model for machine-actionable DMPs will enable interoperability of systems and will facilitate automation of data collection and validation processes. The common model and accompanying interfaces and libraries are an essential building block for the infrastructure. For some stakeholder groups the developments will (and should) be invisible, but the unification and standardisation of a DMP model will bring benefits to all of them.
Researchers will benefit from having fewer administrative procedures to follow. Machine-actionable DMPs can facilitate the automatic collection of metadata about experiments. They will accompany experiments from the beginning and will be updated over the course of the project. Consecutive tools used during processing can read and write data from machine-actionable DMPs. As a result, parts of the DMPs can be automatically generated and shared with other collaborators or funders. Furthermore, researchers whose data is reused in other experiments will gain recognition and credit because their data can be located, reused, and cited more easily.
Reusing parties will gain trust and confidence that they can build on others’ previous work because of a higher granularity of available information.
Funders and repositories will be able to automatically validate DMPs. For example, they will be able to check whether the specified ORCID iD or e-mail address is correct, whether the data is available at the specified repository, and whether the data checksums are correct – in other words, whether the information provided in a DMP reflects reality.
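As a sketch of what such an automatic check could look like (the function and field names are assumptions for illustration, not part of the common model), a validator might recompute a dataset's checksum and compare it with the value recorded in the DMP:

```python
import hashlib

def checksum_matches(path, recorded_sha256):
    """Check that a file's SHA-256 digest matches the value recorded
    in the DMP, i.e. that the DMP reflects reality."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest() == recorded_sha256

# Illustrative run: create a small data file and validate it against
# the checksum a DMP entry might record for it.
data = b"id,value\n1,42\n"
with open("dataset.csv", "wb") as f:
    f.write(data)

recorded = hashlib.sha256(data).hexdigest()  # value stored in the DMP
print(checksum_matches("dataset.csv", recorded))  # prints True
```

Similar checks could be run for PIDs resolving, licenses being present, or ORCID iDs being well-formed, each producing a pass/fail signal that a funder or repository system can act on.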
Infrastructure providers will get a universal format for the exchange of (meta-)data between the systems involved in data processing and data storage. They will also be able to automate processes associated with DMPs, such as backup, storage provisioning, and granting access permissions.
Society will be better able to safeguard investment made in research and will gain assurance that scientific findings are trustworthy and reproducible, while the underlying data is available and properly preserved.
Author: Wouter Haak
Date: 13 Jul, 2017
Upon request of the RDA OA(B) I have looked deeper into the case statement of this WG and evaluated it.
The objective of this group is very clear and resonates well. I was personally at the Belmont Forum in London (a joint meeting between publishers and funding bodies to discuss data sharing practices). This was a topic of great interest to all parties. Exactly as the case statement describes, many researchers and institutions are currently implementing data management plans. However, to avoid this becoming a 'paper exercise' and to give it real meaning, it would be incredibly helpful to standardize and computerize DMPs (i.e. make them machine readable).
That does not mean we need one standard DMP. Each domain has its own specialties. However, there are many commonalities and all stakeholders benefit from starting from these commonalities first.
At this forum I heard a plea to organize exactly this from two funding bodies (NSF and Wellcome), three publishers (Elsevier, IOP, Nature), and three platforms (Digital Science, Mendeley Data, Scopus). Several attendees, having been made aware of the existence of this RDA working group, expressed interest in attending and contributing. My conclusion was therefore that this WG should get a lot of support.
Author: Lynn Yarmey
Date: 19 Aug, 2017
Many thanks for your comment Wouter. Your points were considered in the final TAB review and we encourage the group to follow up on these opportunities as well.
Many thanks again!
Author: Jean-Yves CHATELIER
Date: 25 Jul, 2019
First of all, congratulations on your initiative and for the work accomplished since the launch of this group.
I think that the DMP is too often perceived as an administrative document, one more to be completed by the researchers.
However, the questionnaire used to develop this document is primarily a methodology to guide the management of research data and to give confidence to sponsors throughout the project, as well as to future researchers who want to reuse the data produced.
Having a representation scheme for the information to be integrated in a DMP seems to me to be really essential at all stages of the project, and not only at the end when the data are already structured.
Even before the data sets are designed, a JSON file containing the information collected according to the schema can serve as a trace of the successive versions of the DMP, independently of the tool used to enter that information (DMPonline, Opidor ...).
This same file can then be synchronized at the end of the project with the infrastructure that stores the data, and become a quality element that is comparable from one project to another.
I think it would be good to present several use cases of the ontology, showing how to manually enter the information that cannot be deduced from an existing system, how to generate several versions of the same file, and how to complete the file with information automatically collected from existing systems. Examples of tools that can help produce the DMP file at each stage of the project lifecycle would, in my opinion, increase the chances of adoption of this format.
It seems that a draft validation tool for such a DMP file is under study; it might be interesting to imagine a parameterization that takes into account the phase the project is in.