Data Versioning WG


Group details

Secretariat Liaison: 
Stefanie Kethers
TAB Liaison: 
Tobias Weigel
WGs Wrapping up (from ~12 months after RDA endorsement)
 

The demand for reproducibility of research results is growing. It will therefore become increasingly important for researchers to be able to cite the exact extract of the data set that was used to underpin their research publication. The capacity of computational hardware infrastructures has grown to the point where online petabyte-scale data stores are now common. This has encouraged the development of concatenated, seamless data sets from which users can select subsets via web services, based on spatial and time queries. Further, the growth in computing power means that higher-level pre-processed data products can be generated in very short time frames.

This means that data sets and data products need a systematic way of referencing the exact version of the data that was used to underpin research findings and/or to generate higher-level products. This was recognised by the RDA Working Group on Data Citation, whose final report recognises the need for data versioning. However, there were no specifics on best practice for data versioning, particularly for large-volume, multi-terabyte and even petabyte-scale data sets. A BoF meeting held at the RDA Plenary in September 2016 in Denver highlighted the fact that there are no recognised best practices for versioning of data.

Versioning procedures and best practices are well established for scientific software and can be used to enable reproducibility of scientific results. The codebase of a very large software project does bear some resemblance to a large dynamic data set. Are these practices suitable for data sets, or do we need a separate suite of practices for data versioning?

Ultimately, versioning concepts developed for research data will need to be brought in line with versioning concepts used in persistent identifier systems.


The BoF initially emerged at Plenary 8 in Denver through the discussion available here:  https://www.rd-alliance.org/data-versioning-rda-8th-plenary-bof-meeting



Recent Activity

26 Jun 2019

Plenary 14 Session Submission deadline is tomorrow!

Dear members of the Data Versioning WG,

With RDA Plenary 14 scheduled for 23-25 October, the deadline for session submission is this Thursday, 27 June at 16:00 UTC. Submissions for meeting sessions are open to working groups, interest groups, joint groups and Birds of a Feather (BoF) meetings.

Please note, however, submissions are accepted from group chairs only. If you submit a session request for a group, please notify the other chairs of that group.

03 Apr 2019

Data Versioning Examples in Break-out Session on Dynamic Data Citation today, 12:00-13:30, Commonwealth B

Dear all,

As there was quite intensive discussion in yesterday's break-out session of how different versions can be identified, some of you might be interested in joining today's break-out session of the WG on Dynamic Data Citation. Several adoption stories will be presented there, showing how versioning (and the identification of arbitrary subsets) is being implemented in settings ranging from relatively small NoSQL databases to large-scale infrastructures for satellite images.

12 Mar 2019

RDA Data Versioning WG

Please note the updated time!
Dear Members of the RDA Data Versioning Working Group,
The RDA Plenary Meeting 13 in Philadelphia is coming up soon. At this Plenary we plan to present a draft of our final report and recommendations on data versioning practices. The Working Group's activities will end in September 2019 and the final report will be presented at the RDA Plenary 14 in Helsinki.