WGDC Pilots

    Pilots: Adoption of the WGDC Recommendations


    The following table lists pilots implementing the RDA WGDC recommendations. Further details can be found by clicking on a pilot's name in the table below.

    Name | Data | Type | Status | Notes

    WG Pilots

    CSV-Reference | CSV / SQL | reference | running | Reference implementation
    Natural History Museum London | RDBMS | operational | finished |
    TIMBUS | RDBMS | research | finished | Sensor data
    XML-Reference | XML | research | finished | eXist-DB
    DEXHELPP | CSV / RDBMS | research | running | Social security data
    Git-Reference | ASCII | reference | running | Reference implementation
    VAMDC | SQL / NoSQL / ASCII / XML | deployment | running | Distributed data center
    CBMI@wustl | RDBMS | deployment | starting | Integration into i2b2
    CCCA | NetCDF | deployment | finished | Climate scenario data
    ACDH | RDBMS, LoD | deployment | starting | Thesaurus
    ARGO | NetCDF | deployment | planned | ODIP-II
    BCO-DMO | CSV | deployment | planned |
    ENVRIplus | | deployment | running |
    Ocean Networks Canada | Data streams | deployment | starting | Oceanographic data
    | CSV, RDBMS | deployment | planning | Conceptual evaluation, seeking funding

    Short Template

    – Pilot name:
    – Contact person:
    – Type: research pilot / reference implementation / operational system
    – Status: finished / active / starting / planned
    – Type of data: (RDBMS, XML, CSV, file-based, other)
    – Dynamics: (very high-frequency (microseconds) / very frequent (minutes) / frequent (daily/hourly) / sometimes (every other month) / rarely / none)
    – Domain:
    – Short description:
    – Solution / approach:
    – Timeline:
    – Supplementary material: slides, reports, screenshots, papers, SW, …

    Details Template

    RDA Data Citation Recommendations and their Application in the CSV Reference Implementation

    A.    Preparing the Data and the Query Store

    • R1 – Data Versioning: To retrieve earlier states of data sets, the data needs to be versioned.
    • R2 – Timestamping: Ensure that operations on data are timestamped, i.e. any additions and deletions are marked with a timestamp.
    • R3 – Query Store: Provide means to store the queries used to select data and associated metadata.
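    R1 to R3 can be illustrated with a minimal sketch; this is not the code of any of the pilots above. It assumes an in-memory SQLite database and a hypothetical schema in which every record carries a validity interval (`valid_from`, `valid_to`): an addition sets `valid_from`, a deletion only sets `valid_to`, and an update is a deletion followed by an addition. Any earlier state of the data can then be reconstructed with a timestamp filter. The table names and all function names are illustrative.

```python
import sqlite3

# Hypothetical schema. Each record is valid during [valid_from, valid_to);
# valid_to IS NULL marks the current version (R1 versioning, R2 timestamping).
SCHEMA = """
CREATE TABLE measurements (
    id         INTEGER NOT NULL,
    value      TEXT    NOT NULL,
    valid_from INTEGER NOT NULL,
    valid_to   INTEGER
);
-- R3: a separate table holding the queries used to select data.
CREATE TABLE query_store (
    pid        TEXT PRIMARY KEY,
    query      TEXT    NOT NULL,
    created_at INTEGER NOT NULL
);
"""


def add(conn, row_id, value, ts):
    """Addition: the new record becomes valid at timestamp ts."""
    conn.execute(
        "INSERT INTO measurements VALUES (?, ?, ?, NULL)", (row_id, value, ts)
    )


def delete(conn, row_id, ts):
    """Deletion: close the validity interval instead of erasing the row."""
    conn.execute(
        "UPDATE measurements SET valid_to = ? WHERE id = ? AND valid_to IS NULL",
        (ts, row_id),
    )


def update(conn, row_id, value, ts):
    """Update: a timestamped deletion followed by a timestamped addition."""
    delete(conn, row_id, ts)
    add(conn, row_id, value, ts)


def as_of(conn, ts):
    """Reconstruct the data set as it existed at timestamp ts."""
    return conn.execute(
        "SELECT id, value FROM measurements "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY id",
        (ts, ts),
    ).fetchall()


def store_query(conn, pid, query, ts):
    """R3: keep the query text so the subset can be re-created later."""
    conn.execute("INSERT INTO query_store VALUES (?, ?, ?)", (pid, query, ts))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add(conn, 1, "a", ts=100)
add(conn, 2, "b", ts=100)
update(conn, 1, "a2", ts=200)
delete(conn, 2, ts=300)
store_query(conn, "pid:example", "SELECT id, value FROM measurements", ts=250)

print(as_of(conn, 150))  # [(1, 'a'), (2, 'b')]
print(as_of(conn, 250))  # [(1, 'a2'), (2, 'b')]
print(as_of(conn, 350))  # [(1, 'a2')]
```

    Because a deletion only sets `valid_to`, the table grows with every change; that is the cost of being able to query any past state.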

    B.    Persistently Identify Specific Data Sets
    When a data set is to be persisted, the following steps apply:

    • R4 – Query Uniqueness: Re-write the query to a normalised form so that identical queries can be detected. Compute a checksum of the normalised query to detect identical queries efficiently.
    • R5 – Stable Sorting: Ensure an unambiguous sorting of the records in the data set.
    • R6 – Result Set Verification: Compute a checksum of the query result set to enable verification of the correctness of a result upon re-execution.
    • R7 – Query Timestamping: Assign a timestamp to the query based on either the last update to the entire database, the last update to the selection of data affected by the query, or the query execution time. This allows retrieving the data as it existed at query time.
    • R8 – Query PID: Assign a new PID to the query if either the query is new or if the result set returned from an earlier identical query is different due to changes in the data. Otherwise, return the existing PID.
    • R9 – Store Query: Store the query and its metadata (e.g. PID, original and normalised query, query and result set checksums, timestamp, superset PID, data set description and others) in the query store.
    • R10 – Citation Text: Provide a recommended citation text and the PID to the user.
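    Steps R4 to R10 can be sketched as a single function that persists a subset. This is an illustration under simplifying assumptions, not the WG's reference code: queries are hypothetical dicts of column filters rather than SQL (normalising real SQL would need a parser), the data is an in-memory list of rows with a unique `id`, the query store is a Python dict, and PIDs are random placeholders rather than registered identifiers such as DOIs or Handles. The caller supplies the last-update time of the whole data set, which is one of the timestamp options R7 allows.

```python
import hashlib
import json
import uuid

QUERY_STORE = {}  # R9: in-memory stand-in for the query store, PID -> metadata


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalise(query):
    """R4: canonical form of a (hypothetical) dict query.

    Sorting keys maps logically identical queries, e.g. filters listed
    in a different order, to the same string."""
    return json.dumps(query, sort_keys=True, separators=(",", ":"))


def execute(rows, query):
    """Select matching records and sort them by the unique key (R5)."""
    hits = [r for r in rows
            if all(r.get(col) == val for col, val in query["filters"].items())]
    return sorted(hits, key=lambda r: r["id"])


def result_checksum(result):
    """R6: checksum over the stably sorted result set."""
    return sha256_hex(json.dumps(result, sort_keys=True, separators=(",", ":")))


def persist(rows, query, last_update, superset_pid):
    """Return (pid, citation_text) for the subset selected by query."""
    normalised = normalise(query)
    q_sum = sha256_hex(normalised)                  # R4: query checksum
    r_sum = result_checksum(execute(rows, query))   # R5 + R6

    # R8: an identical query with an unchanged result keeps its PID.
    for pid, rec in QUERY_STORE.items():
        if rec["query_checksum"] == q_sum and rec["result_checksum"] == r_sum:
            return pid, rec["citation"]

    pid = "pid:" + uuid.uuid4().hex  # placeholder, not a registered PID
    # R10: suggested citation text; this format is illustrative only.
    citation = (f"Subset of {superset_pid} selected by {normalised}, "
                f"data as of {last_update}. PID: {pid}")
    QUERY_STORE[pid] = {                  # R9: store query and metadata
        "original_query": query,
        "normalised_query": normalised,
        "query_checksum": q_sum,
        "result_checksum": r_sum,
        "timestamp": last_update,         # R7: last update of the data set
        "superset_pid": superset_pid,
        "citation": citation,
    }
    return pid, citation


data = [
    {"id": 2, "site": "A", "year": 2016, "temp": 11.5},
    {"id": 1, "site": "A", "year": 2016, "temp": 10.9},
    {"id": 3, "site": "B", "year": 2016, "temp": 12.1},
]
q1 = {"table": "obs", "filters": {"site": "A", "year": 2016}}
q2 = {"filters": {"year": 2016, "site": "A"}, "table": "obs"}  # same query

pid1, text1 = persist(data, q1, "2016-05-01T00:00:00Z", "pid:obs-dataset")
pid2, _ = persist(data, q2, "2016-05-01T00:00:00Z", "pid:obs-dataset")
data.append({"id": 4, "site": "A", "year": 2016, "temp": 11.0})
pid3, _ = persist(data, q1, "2016-06-01T00:00:00Z", "pid:obs-dataset")
print(pid1 == pid2, pid1 == pid3)  # True False
print(text1)
```

    The second call returns the same PID because both the normalised query and the result checksum match. After a record is added, the same query yields a different result set and therefore a new PID, as R8 requires.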

    C.    Upon Request of a PID

    • R11 – Landing Page: PIDs should resolve to a human-readable landing page of the data set, which provides metadata including a link to the superset (PID of the data source) and a citation text snippet.
    • R12 – Machine Actionability: The landing page should be machine-actionable and allow retrieving the data set by re-executing the timestamped query.
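    A sketch of R11 and R12, again under assumptions: the versioned records and the query-store entry are hard-coded, hypothetical examples, and machine actionability is modelled as simple content negotiation on an `accept` argument, so the same PID yields HTML for people and JSON for software. `retrieve` re-executes the stored filter against the data as it existed at the query timestamp, so it returns the cited subset even after the data has changed.

```python
import html
import json

# Hypothetical versioned records: valid during [valid_from, valid_to), None = current.
DATA = [
    {"id": 1, "site": "A", "temp": 10.9, "valid_from": 100, "valid_to": None},
    {"id": 2, "site": "A", "temp": 11.5, "valid_from": 100, "valid_to": 300},
    {"id": 3, "site": "A", "temp": 11.7, "valid_from": 300, "valid_to": None},
]

# Hypothetical query-store entry for one PID.
QUERY_STORE = {
    "pid:example-1": {
        "filters": {"site": "A"},
        "timestamp": 200,
        "superset_pid": "pid:obs-dataset",
        "citation": "Observations at site A, as of t=200. PID: pid:example-1",
    }
}


def as_of(ts):
    """Records that were valid at timestamp ts."""
    return [r for r in DATA
            if r["valid_from"] <= ts and (r["valid_to"] is None or r["valid_to"] > ts)]


def landing_page(pid, accept="text/html"):
    """R11/R12: one PID, two renderings of the same metadata."""
    rec = QUERY_STORE[pid]
    meta = {
        "pid": pid,
        "citation": rec["citation"],
        "superset_pid": rec["superset_pid"],
        "query_timestamp": rec["timestamp"],
    }
    if accept == "application/json":
        return json.dumps(meta, sort_keys=True)
    return (
        "<html><body>"
        f"<h1>Data set {html.escape(pid)}</h1>"
        f"<p>Cite as: {html.escape(rec['citation'])}</p>"
        f"<p>Subset of: {html.escape(rec['superset_pid'])}</p>"
        "</body></html>"
    )


def retrieve(pid):
    """R12: re-execute the stored query against the data as of its timestamp."""
    rec = QUERY_STORE[pid]
    rows = [r for r in as_of(rec["timestamp"])
            if all(r[col] == val for col, val in rec["filters"].items())]
    return sorted(rows, key=lambda r: r["id"])


print(landing_page("pid:example-1", accept="application/json"))
print([r["id"] for r in retrieve("pid:example-1")])  # [1, 2]
print([r["id"] for r in as_of(400)])                 # [1, 3] (current state)
```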

    D.    Upon Modifications to the Data Infrastructure

    • R13 – Technology Migration: When data is migrated to a new representation (e.g. new database system, a new schema or a completely different technology), the queries and associated checksums need to be migrated.
    • R14 – Migration Verification: Successful query migration should be verified by ensuring that queries can be re-executed correctly.
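    R13 and R14 can be illustrated with a small, hypothetical migration: records held as Python dicts, with queries stored as filter dicts, move into an SQLite table, and each stored query is rewritten into SQL. Verification re-executes every migrated query and compares its result checksum with the one recorded before the migration. The sketch assumes the checksum is computed over a representation-independent form (rows as sorted `(id, site, temp)` tuples), so the stored checksums can be carried over unchanged; if the checksum depended on the storage format, it would have to be recomputed as part of the migration.

```python
import hashlib
import json
import sqlite3


def checksum(rows):
    """Checksum over a storage-independent form: sorted (id, site, temp) tuples."""
    canonical = json.dumps(sorted(rows), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Old representation (hypothetical): dict records, filter-dict queries.
OLD_DATA = [
    {"id": 1, "site": "A", "temp": 10.9},
    {"id": 2, "site": "B", "temp": 12.1},
    {"id": 3, "site": "A", "temp": 11.5},
]
OLD_QUERY_STORE = {"pid:example-1": {"filters": {"site": "A"}}}


def run_old(filters):
    return [(r["id"], r["site"], r["temp"]) for r in OLD_DATA
            if all(r[c] == v for c, v in filters.items())]


# Checksums recorded when the PIDs were originally assigned (R6).
for rec in OLD_QUERY_STORE.values():
    rec["result_checksum"] = checksum(run_old(rec["filters"]))

# R13: migrate the data to a relational database ...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER PRIMARY KEY, site TEXT, temp REAL)")
conn.executemany("INSERT INTO obs VALUES (:id, :site, :temp)", OLD_DATA)


def migrate_query(filters):
    """... and rewrite each stored query into SQL for the new system.

    Column names come from the fixed schema above; values are bound parameters."""
    cols = sorted(filters)
    sql = "SELECT id, site, temp FROM obs"
    if cols:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in cols)
    return sql + " ORDER BY id", [filters[c] for c in cols]


NEW_QUERY_STORE = {}
for pid, rec in OLD_QUERY_STORE.items():
    sql, params = migrate_query(rec["filters"])
    NEW_QUERY_STORE[pid] = {"sql": sql, "params": params,
                            "result_checksum": rec["result_checksum"]}


def verify_migration():
    """R14: re-execute every migrated query; return PIDs whose result changed."""
    failures = []
    for pid, rec in NEW_QUERY_STORE.items():
        rows = conn.execute(rec["sql"], rec["params"]).fetchall()
        if checksum(rows) != rec["result_checksum"]:
            failures.append(pid)
    return failures


print(verify_migration())  # []
```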
