Data Versioning WG


Group details

Secretariat Liaison: Stefanie Kethers
TAB Liaison: Tobias Weigel
WGs Wrapping up (from ~12 months after RDA endorsement)

The demand for reproducibility of research results is growing. It will therefore become increasingly important for researchers to be able to cite the exact extract of the data set that underpins their research publication. The capacity of computational hardware infrastructures has grown to the point where online petabyte data stores are now common. This has encouraged the development of seamless, concatenated data sets, where users can select subsets through web services using spatial and temporal queries. Further, the growth in computing power means that higher-level, pre-processed data products can be generated in very short time frames.

As a result, data sets and data products need a systematic way to reference the exact version of the data that was used to underpin research findings, or to generate higher-level products. The RDA Working Group on Data Citation recognised this, and its final report identifies the need for data versioning. However, the report gave no specific best practices for data versioning, particularly for large multi-terabyte and even petabyte-scale data sets. A Birds of a Feather (BoF) meeting held at RDA Plenary 8 in Denver in September 2016 highlighted that there are no recognised best practices for versioning data.

Versioning procedures and best practices are well established for scientific software and can be used to enable reproducibility of scientific results. The codebase of a very large software project bears some resemblance to a large dynamic data set. Are these practices suitable for data sets, or do we need a separate suite of practices for data versioning?

Ultimately, versioning concepts developed for research data will need to be aligned with the versioning concepts used in persistent identifier systems.

The BoF initially emerged at Plenary 8 in Denver through the discussion available here:

Recent Activity

11 Feb 2019

Registration Now Open for RDA Plenary 13 in Philadelphia

Register today for RDA Plenary 13 (P13), to be held from 2-4 April 2019 in Philadelphia, Pennsylvania, at the Loews Hotel. Early bird pricing is available only until 2 March 2019.


Register today at


21 Sep 2018

Preparing for RDA P12 Gaborone

Dear Members of the RDA Data Versioning WG,
Our next in-person meeting at RDA P12 in Gaborone is not too far away. Since the last plenary we have made great progress with the compilation of use cases, and I would like to thank all of you who have contributed to this now pretty comprehensive collection.
Data versioning use cases:

25 Jul 2018

For review: Data versioning use cases document

Dear Members of the RDA Data Versioning Working Group,
Thanks to the help of the Australian Research Data Commons (ARDC, formerly also known as ANDS), we have made great progress with collecting data versioning use cases. The document I circulated earlier has now been substantially expanded. It is the basis for our input to the W3C Dataset Exchange Working Group, and this input is one of our WG outputs and implementation pathways.

21 Feb 2018

RDA Council Endorsement - Data Versioning WG

Dear Members of the RDA Data Versioning WG,
The RDA Council has approved our Data Versioning WG and has updated our group status to fully Recognized and Endorsed. Our TAB Liaison is Tobias Weigel, and Stefanie Kethers will be our Secretariat Liaison.
February 2018 - WG fully endorsed
March 2019 - 12-month community report at P13
August 2019 - WG ends
September 2019 - Final recommendations to be presented at P14
Work beyond August 2019 will require an extension proposal to the RDA TAB.

25 Jan 2018

Versioning use cases and W3C Data Set Exchange Working Group

Dear Members of the RDA Data Versioning Group,
In preparation for RDA P11 in Berlin, we are revising our collection of data versioning use cases. The use cases are documented here:
Please add comments, or make changes directly in edit mode.

05 Sep 2017

RE: W3C Data eXchange Working Group is also considering versioning

Hi Simon,
Thank you for pointing out this current work in W3C. I have added it to our collection of materials.
Dr Jens Klump
CSIRO ARRC, 26 Dick Perry Avenue, Kensington, WA 6151, Australia

05 Sep 2017

W3C Data eXchange Working Group is also considering versioning

Dear Data Versioners -
You may be interested to know that dataset versioning is one of the topics on the list for consideration by the W3C Dataset Exchange Working Group (DXWG [1]); see
Note that the W3C DXWG is scheduled to deliver by 30 June 2019 [2].
The anticipated products are:
1. A revision of the Data Catalog Vocabulary (DCAT)
2. A standard way to formalize DCAT profiles (for a particular community or application)