WGDC Pilots
Pilots: Adoption of the WGDC Recommendations
The following table lists pilots implementing the RDA WGDC recommendations. Further details on each pilot can be found by clicking on its name in the table below.
WG Pilots

| Name | Data | Type | Status | Notes |
|---|---|---|---|---|
| CSV-Reference | CSV / SQL | reference | running | Reference implementation |
| Natural History Museum London | RDBMS | operational | finished | |
| TIMBUS | RDBMS | research | finished | Sensor data |
| XML-Reference | XML | research | finished | eXist-DB |
| DEXHELPP | CSV/RDBMS | research | running | Social security data |
| Git-Reference | ASCII | reference | running | Reference implementation |
| VAMDC | SQL/NoSQL/ASCII/XML | deployment | running | Distributed data center |
| CBMI@wustl | RDBMS | deployment | starting | integration into i2b2 |
| CCCA | NetCDF | deployment | finished | climate scenarios data |
| ACDH | RDBMS, LoD | deployment | starting | thesaurus |
| ARGO | NetCDF | deployment | planned | ODIP-II |
| BCO-DMO | CSV | deployment | planned | |
| ENVRIplus | | deployment | running | |
| Ocean Networks Canada | Data streams | deployment | starting | Oceanographic data |
| … | CSV, RDBMS | deployment | planning | Conceptual evaluation, seeking funding |
Short Template
– Pilot name:
– Contact person:
– Type: research pilot / reference implementation / operational system
– Status: finished / active / starting / planned
– Type of data: (RDBMS, XML, CSV, file-based, other)
– Dynamics: (very high-frequency (microseconds) / very frequent (minutes) / frequent (daily/hourly) / sometimes (every other month) / rarely / none)
– Domain:
– Short description:
– Solution / approach:
– Timeline:
– Supplementary material: slides, reports, screenshots, papers, SW, …
Details Template
RDA Data Citation Recommendations and their Application in the CSV Reference Implementation
A. Preparing the Data and the Query Store
- R1 – Data Versioning: To retrieve earlier states of data sets, the data needs to be versioned.
- R2 – Timestamping: Ensure that operations on data are timestamped, i.e. any additions and deletions are marked with a timestamp.
- R3 – Query Store: Provide means to store the queries used to select data and associated metadata.
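One way to picture R1 and R2 is a table in which rows are never physically removed, only closed with a timestamp. The following is a minimal sketch using Python's built-in `sqlite3` module; the table layout (`records`, `valid_from`, `valid_to`) and all function names are assumptions made for illustration and are not taken from any of the pilots.

```python
import sqlite3


def open_versioned_store():
    """Create an in-memory table in which rows are never physically deleted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        " id INTEGER, value TEXT,"
        " valid_from TEXT NOT NULL,"  # timestamp of the insert (R2)
        " valid_to TEXT)"             # timestamp of the delete, NULL while current
    )
    return conn


def insert_record(conn, rec_id, value, ts):
    conn.execute("INSERT INTO records VALUES (?, ?, ?, NULL)", (rec_id, value, ts))


def delete_record(conn, rec_id, ts):
    # Close the current version instead of removing it (R1).
    conn.execute(
        "UPDATE records SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (ts, rec_id),
    )


def update_record(conn, rec_id, value, ts):
    # An update is a timestamped delete followed by a timestamped insert.
    delete_record(conn, rec_id, ts)
    insert_record(conn, rec_id, value, ts)


def snapshot(conn, ts):
    """Return the rows as they existed at timestamp ts, in a fixed order."""
    cursor = conn.execute(
        "SELECT id, value FROM records"
        " WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)"
        " ORDER BY id",
        (ts, ts),
    )
    return cursor.fetchall()
```

The sketch assumes all timestamps are ISO 8601 strings in the same format and time zone, so that plain string comparison orders them correctly. Calling `snapshot` with an earlier timestamp returns the earlier state of the data, which is what a stored, timestamped query relies on.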
B. Persistently Identify Specific Data Sets
When a data set should be persisted, the following steps need to be applied:
- R4 – Query Uniqueness: Re-write the query to a normalised form so that identical queries can be detected. Compute a checksum of the normalised query to efficiently detect identical queries.
- R5 – Stable Sorting: Ensure an unambiguous sorting of the records in the data set.
- R6 – Result Set Verification: Compute a checksum of the query result set to enable verification of the correctness of a result upon re-execution.
- R7 – Query Timestamping: Assign a timestamp to the query either based on the last update to the entire database or the last update to the selection of data affected by the query or the query execution time. This allows retrieving the data as it existed at query time.
- R8 – Query PID: Assign a new PID to the query if either the query is new or if the result set returned from an earlier identical query is different due to changes in the data. Otherwise, return the existing PID.
- R9 – Store Query: Store the query and its metadata (e.g. PID, original and normalised query, query and result set checksums, timestamp, superset PID, data set description and others) in the query store.
- R10 – Citation Text: Provide a recommended citation text and the PID to the user.
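The steps R4 to R9 can be sketched in a few lines of Python. This is an illustrative sketch under strong assumptions, not the approach of any listed pilot: a query is modelled as a list of selected columns plus equality filters over a list of dict rows, the query store lives in memory, and the `example-pid:` prefix is a placeholder rather than a real PID scheme. All names (`normalise`, `result_checksum`, `QueryStore`) are hypothetical.

```python
import hashlib
import json
import uuid


def sha256(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalise(columns, filters):
    """R4: canonical form, so column and filter order do not matter."""
    return json.dumps({"columns": sorted(columns), "filters": filters},
                      sort_keys=True, separators=(",", ":"))


def result_checksum(rows, columns, filters):
    """R5 + R6: select rows, sort them unambiguously, hash the result."""
    cols = sorted(columns)
    lines = sorted(
        json.dumps([row[c] for c in cols])
        for row in rows
        if all(row[k] == v for k, v in filters.items())
    )
    return sha256("\n".join(lines))


class QueryStore:
    """R3/R9: keeps one entry per PID (in memory, for illustration only)."""

    def __init__(self):
        self.entries = {}

    def persist(self, rows, columns, filters, timestamp):
        normalised = normalise(columns, filters)
        query_hash = sha256(normalised)
        result_hash = result_checksum(rows, columns, filters)
        # R8: reuse the PID if an identical query already returned this result.
        for pid, entry in self.entries.items():
            if (entry["query_hash"], entry["result_hash"]) == (query_hash, result_hash):
                return pid
        pid = "example-pid:" + uuid.uuid4().hex  # placeholder, not a real PID scheme
        self.entries[pid] = {
            "original_query": {"columns": list(columns), "filters": dict(filters)},
            "normalised_query": normalised,
            "query_hash": query_hash,
            "result_hash": result_hash,
            "timestamp": timestamp,  # R7: here simply the query execution time
        }
        return pid
```

Two requests that differ only in column order map to the same normalised query and therefore the same PID, as long as the underlying data has not changed. Once a matching row is added, the result checksum differs and a new PID is issued. A real system would normalise an actual query language (SQL, XQuery, ...), which takes considerably more care than sorting keys.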
C. Upon Request of a PID
- R11 – Landing Page: PIDs should resolve to a human-readable landing page of the data set, which provides metadata including a link to the superset (PID of the data source) and a citation text snippet.
- R12 – Machine Actionability: The landing page should be machine-actionable and allow retrieving the data set by re-executing the timestamped query.
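As a rough illustration of R11 and R12, the sketch below builds the metadata a landing page could show and re-executes the stored query on request. The entry fields (`creator`, `title`, `superset_pid`, ...), the citation format, and the `execute_as_of` callback are assumptions made for this example; the recommendations do not prescribe them.

```python
import hashlib
import json


def rows_checksum(rows):
    """Hash the rows in a fixed order so the checksum is reproducible."""
    lines = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def landing_page(entry):
    """R11: metadata for the landing page, incl. superset PID and citation text."""
    citation = (f'{entry["creator"]} ({entry["timestamp"][:4]}): '
                f'{entry["title"]}. {entry["pid"]}')
    return {
        "pid": entry["pid"],
        "superset_pid": entry["superset_pid"],
        "citation": citation,
        "query": entry["query"],
        "timestamp": entry["timestamp"],
        "result_checksum": entry["result_checksum"],
    }


def retrieve(entry, execute_as_of):
    """R12: re-execute the timestamped query and verify the result.

    execute_as_of(query, timestamp) is a hypothetical callback that runs the
    query against the versioned data as it existed at the given timestamp.
    """
    rows = execute_as_of(entry["query"], entry["timestamp"])
    if rows_checksum(rows) != entry["result_checksum"]:
        raise ValueError("re-executed result does not match the stored checksum")
    return rows
```

The same dictionary could be rendered as HTML for human readers and served as JSON for machines, so both views stay in sync with the query store.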
D. Upon Modifications to the Data Infrastructure
- R13 – Technology Migration: When data is migrated to a new representation (e.g. new database system, a new schema or a completely different technology), the queries and associated checksums need to be migrated.
- R14 – Migration Verification: Successful query migration should be verified by ensuring that queries can be re-executed correctly.
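A migration check in the spirit of R13 and R14 might look like the sketch below: every stored query is translated for the new system, re-executed there, and accepted only if the result checksum still matches. The callbacks `translate_query` and `execute_new` are hypothetical, and the sketch assumes the row serialisation used for the checksum is unchanged by the migration; if it is not, the result checksums themselves would have to be recomputed and migrated as well.

```python
import hashlib
import json


def text_checksum(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def rows_checksum(rows):
    """Hash the rows in a fixed order so the checksum is reproducible."""
    lines = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return text_checksum("\n".join(lines))


def verify_migration(entries, translate_query, execute_new):
    """Migrate query store entries and report the PIDs that fail verification.

    translate_query(query) rewrites a stored query for the new system;
    execute_new(query, timestamp) runs it there as of the stored timestamp.
    """
    migrated, failed = [], []
    for entry in entries:
        new_query = translate_query(entry["query"])
        rows = execute_new(new_query, entry["timestamp"])
        if rows_checksum(rows) == entry["result_checksum"]:
            # R13: store the migrated query together with its new checksum.
            migrated.append({**entry,
                             "query": new_query,
                             "query_checksum": text_checksum(new_query)})
        else:
            # R14: the re-executed result differs, so migration is not verified.
            failed.append(entry["pid"])
    return migrated, failed
```

An empty `failed` list is the signal that all persisted data sets can still be reproduced on the new infrastructure.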