Dynamic Data Citation for Frequently Modified High-Resolution Climate Data


13 May 2019

Climate Change Centre Austria (CCCA) Data Centre adopts Research Data Alliance (RDA) Recommendation on Data Citation of Evolving Data

The Climate Change Centre Austria (CCCA) Data Centre expected a comprehensive project outcome: completely new simulated high-resolution climate scenarios for Austria, covering the period from 1965 to 2100 at daily resolution. With 13 model runs, 5 meteorological parameters such as temperature, and 3 emission scenarios, more than 1600 NetCDF files with an average size of 13 GB were produced for consumption. How could we implement proper data management processes for such data packages? We were looking for best practices on persistent identifiers and sub-setting tools for such large data containers. By chance, I met members of the RDA Data Citation Working Group, and the idea of applying the RDA recommendation on dynamic data citation in a pilot, the “NetCDF Pilot Implementation of Climate Scenarios”, was born.

 

High-resolution climate data are modified frequently because of their complex dependencies and the statistical methods used for downscaling. To re-use, share and cite these data and services in a reproducible manner, data analysts and researchers need a way to identify the exact version used.

Chris Schubert, Head of CCCA-Data Centre

 

The technical challenges we had to overcome in our project revolved around the performance of the sub-set services on the large NetCDF files. The real challenge, however, was a social one: we had to make sure our user community accepted and trusted the reference data for the Climate Services we offered. This was certainly the case for our “small” user group of Climate Services in Austria, who re-use the climate scenarios. Beyond trustworthy services, the community's readiness to follow good scientific practice by using sub-set tools with dynamic data citation remains a barrier.

 

The impact of the adoption: 

With Dynamic Data Citation in operational use, the data become significantly more attractive to data analysts. The user receives a dynamically generated citation text containing the original author, the label of the dataset, its version, the selected subsetting parameters and the link to the persistent identifier. For a newly created and published subset, all metadata are inherited from the original dataset and supplemented by the defined arguments, such as the adapted bounding box, the observed parameter and the name of the subset creator.
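To make the inheritance and the citation text more concrete, the following is a minimal Python sketch, not the CCCA implementation. All class, function and field names (DatasetMetadata, SubsetRequest, subset_metadata, citation_text, derived_from) and the sample values are hypothetical and chosen only for illustration; the layout of the citation string is likewise an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DatasetMetadata:
    """Metadata of the original dataset (hypothetical field names)."""
    author: str
    title: str
    version: str
    pid: str  # persistent identifier of the original dataset


@dataclass(frozen=True)
class SubsetRequest:
    """Arguments a user supplies when creating a subset (hypothetical)."""
    bounding_box: Tuple[float, float, float, float]  # west, south, east, north
    parameter: str
    time_range: Tuple[str, str]  # ISO start and end dates
    creator: str


def subset_metadata(parent: DatasetMetadata, request: SubsetRequest,
                    subset_pid: str) -> Dict[str, object]:
    """Inherit all parent metadata, then add the subset arguments."""
    meta: Dict[str, object] = asdict(parent)
    meta.update(
        pid=subset_pid,           # the subset gets its own identifier
        derived_from=parent.pid,  # keep the link to the original dataset
        bounding_box=request.bounding_box,
        parameter=request.parameter,
        time_range=request.time_range,
        subset_creator=request.creator,
    )
    return meta


def citation_text(parent: DatasetMetadata, request: SubsetRequest,
                  subset_pid: str) -> str:
    """Build a citation string from parent metadata and subset arguments."""
    west, south, east, north = request.bounding_box
    start, end = request.time_range
    return (
        f"{parent.author}: {parent.title}, version {parent.version}. "
        f"Subset created by {request.creator} "
        f"(parameter: {request.parameter}; "
        f"bounding box: {west}, {south}, {east}, {north}; "
        f"time range: {start} to {end}). PID: {subset_pid}"
    )


if __name__ == "__main__":
    # Placeholder values only; none of them refer to a real dataset or PID.
    parent = DatasetMetadata("Example Author", "Example Climate Scenario",
                             "v1", "pid:example-parent")
    request = SubsetRequest((9.5, 46.3, 17.2, 49.0), "temperature",
                            ("1965-01-01", "2100-12-31"), "Example User")
    print(citation_text(parent, request, "pid:example-subset"))
```

Keeping the parent identifier in a separate field next to the subset's own identifier is one way to preserve the link back to the original dataset; other layouts would serve equally well.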

If we had not adopted the Dynamic Data Citation Sub-Set Service, our users would be forced to download the data themselves, which would create an unintended first break in the data provenance information. Data would still be prepared on the user’s desktop computer, for example by selecting the area of interest and time range there. Dynamic data citation clearly improves the handling of data quality, because corrections and improvements remain traceable.

 

The CCCA Data Centre software components are open source and published on GitHub.

 

Country of your organisation:
Austria
Stakeholder Classification of your organisation: 
Infrastructure
Funding sources: 
RDA Europe 3.0
RDA output(s) adopted:
Scalable Dynamic-data Citation Methodology