RDA Data Usage Metrics WG Recommendations

08 Mar 2021

By Daniella Lowenberg


 

Data Usage Metrics WG

Group co-chairs: Daniella Lowenberg, Ian Bruno

Recommendation title: RDA Data Usage Metrics WG Recommendations

Impact: This document outlines next steps and recommendations for widespread adoption of normalized data usage practices, as well as hurdles and limitations to be prioritized going forward. Repositories that utilize these recommendations will help drive a better understanding of data usage and contribute towards the development of research data assessment metrics.

Authors: Daniella Lowenberg, Thomas Jouneau, Ian Bruno

DOI: 10.15497/RDA00062

Citation: Lowenberg, D., Jouneau, T., & Bruno, I. (2021). RDA Data Usage Metrics WG Recommendations. Research Data Alliance. DOI: 10.15497/RDA00062.

 

Abstract:

Research data are increasingly recognized as important outputs of scholarly research, yet there are currently no standardized or comprehensive metrics for research data as there are for articles. This Working Group was founded following a Birds of a Feather session at RDA Plenary 10 hosted by the Make Data Count initiative. Drawing on expertise from various projects and research stakeholders, this WG, a part of the Publishing Data IG, aimed to build community buy-in for standardized approaches to data usage metrics and drive widespread adoption. The first WG meeting, at RDA11, gave an overview of initiatives in the data usage metrics space and spent the majority of the time discussing the scope of the WG. Two virtual meetings before RDA12 focused on refining that scope and defining data usage metrics. The RDA12 session centered on use cases for usage metrics, updates to the COUNTER Code of Practice for Research Data, and a discussion of barriers to adoption of standardized usage metrics. RDA13 had the largest attendance yet, overflowing the room as we presented survey results on current implementations of usage metrics and barriers to adoption. At RDA14, WG members presented on the pitfalls and shortcomings of data usage metrics and further analyses of the survey, and a discussion began about where the WG should head in sharing and developing practices around data usage metrics. The last WG session, at RDA16, gave a mixed audience of new and returning members an opportunity to give input on the proposed recommendations below. The broad takeaway is that community-agreed usage metrics are essential for the future of research data evaluation, but technical, bibliometric, and social infrastructure is required to properly develop indicators.

 

Outputs

 

 

Output Status: Recommendations with RDA Endorsement in Process
Review period: Tuesday, 9 March, 2021 to Friday, 9 April, 2021
Primary WG Focus / Output focus: Domain Agnostic

    Author: Hans Pfeiffenberger

    Date: 07 Apr, 2021

    Premise 6: Using the number of downloads as a proxy for data usage (which is the title of this WG) is a flawed approach and an invitation to a disaster even worse than the h-index pandemic. Should this indicator, downloads, ever be used in evaluations, be it of the performance of individuals or of repositories, it is just too easy to game the system, either by carefully automated downloads or by clever titles and descriptions ("clickbait").

    There is still only one moderately trustworthy and verifiable indicator of actual usage: citation. However, in the current landscape of datasets with and without DOIs, repositories of varying sophistication, and extremely uneven cultural practices of data citation, there are still too many datasets which cannot be properly cited or which, even when they can be, are not.

    For a reasonably important data collection, see our "Twenty-Year Review of GBIF", https://doi.org/10.35035/ctzm-hz97, Full Report, chapters 3 and 4 on "User and Contributor Perspectives" and "GBIF Biodiversity Information Impact and Metrics". Note, particularly, that the impact of GBIF-mediated biodiversity data on such important outputs as the IPCC and IPBES reports is barely, if at all, discernible by any technically or automatically collected metric (because of indirect or missing citation). Also, even if a selected subset of the highly dynamic GBIF collection is properly cited (even implementing the recommendations of the RDA WG on dynamic data citation), it is, in most cases, not possible to give attribution to individual contributing scientists.

     
