
v.2

Last update: 1 April 2021

  1. Charter

The Global Open Research Commons (GORC) is an ambitious vision of a global set of interoperable resources necessary to enable researchers to address societal grand challenges including climate change, pandemics, and poverty. The realized vision of GORC will provide frictionless access to all research artifacts including, but not limited to: data, publications, software and compute resources; and metadata, vocabulary, and identification services to everyone everywhere, at all times.

The GORC is being built by a set of national, pan-national and domain-specific organizations such as the European Open Science Cloud, the African Open Science Platform, and the International Virtual Observatory Alliance (see Appendix A for a fuller list). The GORC IG is working on a set of deliverables to support coordination amongst these organizations, including a roadmap for global alignment to help set priorities for Commons development and integration. In support of this roadmap, this WG will establish benchmarks to compare features across commons. We will not coordinate the use of specific benchmarks by research commons. Rather, we will review and identify features currently implemented by a target set of GORC organizations and determine how they measure user engagement with these features.

First, we will collect and curate a set of benchmarks that will allow Commons developers to compare features across science clouds. For example, we would consider benchmarks such as evidence of the existence of:

        1. A well-defined decision-making process
        2. A consistent and openly available data privacy policy
        3. Federated Authentication and Authorization infrastructure
        4. Community supported and well documented metadata standard(s)
        5. A workflow for adding and maintaining PIDs for managed assets
        6. A mechanism for utilizing vocabulary services
        7. A process to inventory research artefacts and services
        8. An Open Catalogue of these artefacts and services
        9. A proven workflow to connect multiple different research artefact types (e.g. data and publications; data and electronic laboratory notebooks; data and related datasets)
        10. A mechanism to capture provenance for research artefacts
        11. Mechanisms for community engagement and input; an element or scale for inclusion

These benchmarks will be a starting point for what we would expect to find in a mature research commons. We will then review each of the commons in the target list to see if they provide other features that should be included as benchmarks. As part of our review, we will document implementations of features in research commons.

We will collect information about each of the benchmarks we see “in the wild”, but the benchmarks are not intended to be prescriptive regarding implementation. For example, the benchmark “a mechanism for utilizing (or accessing) vocabulary services” is evidenced by the NERC Vocabulary Server (NVS) in the EOSC, and by Research Vocabularies Australia (RVA) in the Australian National Data Service (ANDS). NERC uses the Simple Knowledge Organization System (SKOS) to represent concepts in the vocabulary service and provides access via both SPARQL and SOAP endpoints. ANDS RVA also serves SKOS-encoded vocabularies and provides a SPARQL endpoint, as well as a RESTful API and the option to bulk-download complete vocabularies in a single file for local processing. The benchmark in this case is evidence of the ability to use a vocabulary service, satisfied by both RVA and NVS. Whenever possible we will collect information about the particular implementation of the benchmark or feature as we review the commons, but that is not the primary goal. The WG will collectively decide what constitutes a benchmark. For example, the ANDS RVA service also allows users to self-register and create, edit or upload vocabularies, a function not available in the NVS. In this case, the WG will decide whether the ability to create and edit, not just access, a vocabulary service should constitute a separate benchmark. Whenever possible we will utilize outputs from other RDA groups to identify benchmarks. In particular, the RDA 9 functional requirements for data discovery will be very informative of the benchmarks associated with data repositories.
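As a concrete illustration of the kind of evidence involved, the sketch below shows how a client might query a SKOS vocabulary service over SPARQL and parse the standard SPARQL 1.1 JSON results format. The concept URI and helper names are hypothetical, and real NVS and RVA endpoints differ in their details; this is a minimal sketch, not a description of either service's actual API.

```python
import json

def pref_label_query(concept_uri: str) -> str:
    """Build a SPARQL query for the skos:prefLabel of a concept.

    The concept URI is a placeholder; a real query would use an
    identifier minted by the vocabulary service being benchmarked.
    """
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?label WHERE { <%s> skos:prefLabel ?label }" % concept_uri
    )

def parse_labels(sparql_json: str) -> list:
    """Extract label strings from a SPARQL 1.1 JSON results payload."""
    results = json.loads(sparql_json)
    return [b["label"]["value"] for b in results["results"]["bindings"]]

# The SPARQL JSON response shape is fixed by the W3C specification,
# so the same parser works against any conformant endpoint:
sample_response = json.dumps({
    "head": {"vars": ["label"]},
    "results": {"bindings": [
        {"label": {"type": "literal", "value": "sea surface temperature"}}
    ]},
})

print(parse_labels(sample_response))  # ['sea surface temperature']
```

Because both NVS and RVA expose SPARQL endpoints, the same generic client satisfies the benchmark against either service, which is exactly why the benchmark records capability rather than implementation.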

Second, the WG will collect information about how existing commons measure success, adoption or use of their services within their organization, such as data downloads, contributed software, and similar KPIs and access statistics. The first set of benchmarks concerns the existence of a feature or service and is comparable across organizations. The second set comprises quantitative measures used within an organization to track the uptake or use of a feature or service.

  2. Value Proposition

This WG is motivated by the broader goal of openly sharing data and related services across technologies, disciplines, and countries to address the grand challenges of society. The deliverables of the WG will inform roadmaps for development of the infrastructure necessary to meet that goal, while engagements and relationships formed during the work period will help forge strong partnerships across national, regional and domain-focused members, which are crucial to its success. Identifying observable and measurable benchmarks in pursuit of the global open science commons will help create a tangible path for development and support strategic planning within and across science commons infrastructures. In the future, best practices for commons development will emerge based on the experience of which actions led to successful outcomes. This work will provide a forum for discussion that will allow members to identify the most important features and the minimal elements required to guide their own development and build a commons that is globally interoperable. Building interoperable commons will support many research efforts, including work focused on societal grand challenges and the UN Sustainable Development Goals (SDGs). Finally, it will support developers as they seek resources to build the global commons by helping them respond to funding agencies' requirements for measurable deliverables.

The proposed WG was discussed at the RDA 16 virtual plenary.[1] Participants discussed the initial work packages and agreed during the meeting that this was a worthy goal and an appropriate approach.

 

  3. Engagement with Existing Work

This WG will review all appropriate IG and WG outputs to determine where they intersect with this work, and will engage with those WGs/IGs as appropriate. Some of the relevant efforts are already well known: the GORC IG builds on and incorporates the previous National Data Services IG, which was embarking on a similar exercise when the GORC started, and the Domain Repositories IG, specifically its repository-specific discovery metrics/benchmarks. The Commons investigated by this WG are likely to have considered or implemented outputs from other RDA groups, such as the Data Fabric IG and the Virtual Research Environment IG, to name a few. These groups, and many others outside of RDA, will have recommendations that speak to the functionality and features of various components of Commons; for example, the EOSC FAIR WG and Sustainability WG seek to define the EOSC as a Minimum Viable Product (MVP). We will review these and other related outputs to see if they have identified benchmarks that support our goals. This review period will ensure that we do not duplicate existing efforts. Appendix B of this case statement identifies some of these existing efforts, both within and outside RDA; this list will be expanded and reviewed by the WG members.

 

  4. Work Plan

To create these deliverables, members of the group will:

  1. Create a target list of Commons (Appendix A)
  2. Create a database structure to capture benchmarks
  3. Create an initial list of benchmarks
  4. Create an online form to capture benchmarks
  5. Create task groups within the Benchmarking WG, each responsible for reviewing a subset of the target list and ancillary documents
  6. Each task group reviews public facing documentation of their assigned Commons to extract benchmarking information (both KPIs and feature lists) and reports back to the larger WG.
  7. A separate task group reviews public facing documentation of recommendations and roadmaps from related communities to extract benchmarking information (Appendix B) and reports back to the larger WG. This evaluation phase will include an examination of the outputs from other RDA WGs and position papers available in the wider science infrastructure community, along with experiences gathered by the WG’s members.
  8. Because benchmarking information may not be easily found in public documents we will conduct outreach to Commons representatives and related organizations to ask for additional feedback and information about benchmarks used by their community.  This may include benchmarks already in use, as well as benchmarks that organizations feel would be useful but which are not yet implemented.
  9. Begin drafting the adoption plan
  10. Synthesize and document the benchmarks into 3 deliverables, described below.
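The database structure in step 2 has not yet been designed by the WG. Purely as an illustrative sketch of what one record in such a database might capture, assuming nothing beyond the two kinds of benchmark described in the Charter (feature-existence benchmarks and internal KPIs), the field names below are invented for illustration rather than WG decisions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BenchmarkObservation:
    """One observation of a benchmark at one commons (hypothetical schema)."""
    commons: str                          # e.g. "EOSC", "ARDC"
    benchmark: str                        # e.g. "vocabulary service access"
    kind: str                             # "feature" (D1) or "kpi" (D2)
    evidence_url: Optional[str] = None    # public-facing documentation reviewed
    implementation_notes: str = ""        # how the feature is realized, if known
    metric_value: Optional[float] = None  # only meaningful when kind == "kpi"

# Example record, drawn from the vocabulary-service discussion above:
obs = BenchmarkObservation(
    commons="ARDC",
    benchmark="vocabulary service access",
    kind="feature",
    implementation_notes="SKOS vocabularies via SPARQL, REST API and bulk download",
)
print(obs.kind)  # feature
```

Separating `kind` into feature-existence and KPI records mirrors the split between deliverables D1 and D2, so a single collection form could feed both outputs.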

There are multiple ways for the WG to create task groups. The WG will decide whether to define the task groups according to the deliverables, creating a Commons Internal Benchmarking TG and a Commons External Benchmarking TG, or to subdivide according to a typology of the commons, for example with some members looking at pan-national, national, or domain-specific commons, or by some other division of labor.

The WG will proceed according to the following schedule:

Month          Activity

Jan-Mar 2021   Group formation: agreement on the scope of work and deliverables (broad scope); case statement community review

Apr-Jun 2021   RDA17. Refine scope: agree on the target list of commons and the organizational approach. Begin to define the methodology, especially the form for data collection and the initial set of commons. Begin literature review of public-facing documents from Science Commons and related organizations. Report on progress to the International Symposium on Global Open Science Cloud (June 2021)

Jul-Sep 2021   Recruit additional members to the WG; continue literature review

Oct-Dec 2021   Begin outreach to Science Commons and related organizations. Update at RDA18. Report on progress at International Data Week (https://internationaldataweek.org/, Nov 2021)

Jan-Mar 2022   First draft: External Benchmarks distributed for community review

Apr-Jun 2022   First draft: Internal Benchmarks distributed for community review. Update at RDA19

Jul-Sep 2022   Develop adoption plan

Oct-Dec 2022   Final deliverables. Update at RDA20

 

  5. Deliverables

This group will create Supporting Outputs in furtherance of the goals of the GORC IG. Specifically, three documents:

D1: a list of observable international benchmarks of features, structures and functionality that help define a Commons and that will feed into a roadmap of Commons interoperability. The benchmark criteria need to remain simple and understandable, and must not be skewed towards the particular reality of some commons, so that they do not appear irrelevant or unattainable to commons developers. The list will include a description of implementations observed or planned in the Commons examined in this work.

D2: a non-redundant set of KPIs and success metrics currently utilized, planned or desired by existing science commons, classified by the functional layers defined by the GORC IG; this will help define minimal interoperability.

D3: Adoption plan, described below

  6. Mode and Frequency of Operation

The WG will meet monthly over Zoom, at a time to be determined by the membership. The WG will also communicate asynchronously online using the mailing list functionality provided by RDA and via shared online documents. If post-Covid international travel is restored during the 18-month work period of this WG, we will propose and schedule meetings during RDA plenaries and at other conferences where a sufficient number of group members are in attendance.

  7. Addressing Consensus and Conflicts

The WG will adhere to the stated RDA Code of Conduct and will work towards consensus, which will be achieved primarily through mailing list discussions and online meetings, where opposing views will be openly discussed and debated amongst members of the group. If consensus cannot be achieved in this manner, the group co-chairs will make the final decision on how to proceed.

The co-chairs will keep the working group on track by reviewing progress relative to the deliverables. Any new ideas about deliverables or work that the co-chairs deem to be outside the scope of the WG defined here will be referred back to the GORC IG to determine if a new WG should be formed.

  8. Community Engagement

The working group case statement will be disseminated to RDA mailing lists and to communities of practice related to Commons development identified by the GORC IG, in an effort to cast a wide net and attract a diverse, multi-disciplinary membership. The GORC Benchmarking effort is also being facilitated by the RDA Secretariat, providing a strong intersection with the EOSC community and an additional level of community engagement. Similarly, the CODATA GOSC work, and the associated coordination of both efforts by the Data Together group, will provide additional engagement and outreach to the WDS and GO FAIR communities. When appropriate, draft outputs will also be circulated to relevant stakeholders and mailing lists to encourage broad community feedback; this will include both the GORC WG and GORC IG membership. When appropriate, we will ask members of the WG to reach out to their own networks.

  9. Adoption Plan
    • The Adoption Plan will be detailed in a separate document that provides additional information for the two primary outputs, and will include the following:
      1. Integration of the benchmarks into the Typology and larger GORC roadmap being created by the parent IG. 
      2. Integration/intersections with the CODATA GOSC work, including use cases.
      3. Promotion/testing in additional infrastructures not part of the CODATA GOSC or GORC IG work (some of which are listed in Appendix A).

 

  10. Initial Membership

Co-chairs:

  1. Karen Payne <ito-director@oceannetworks.ca>
  2. Mark Leggott <mark.leggott@rdc-drc.ca>
  3. Andrew Treloar <andrew.treloar@ardc.edu.au>

Current members represent Europe, the U.S., Canada, Australia, and the UK. It is anticipated that additional membership will include colleagues from organizations that were part of the pre-P17 outreach, as well as members of the GORC IG and CODATA GOSC WG. The CODATA-led GOSC Symposium, being planned for September 2021, will also generate additional membership.

Appendix A: List of Commons

Pan-National Commons

  1. European Open Science Cloud
  2. African Open Science Platform
    1. including H3Africa?
  3. Nordic e-Infrastructure Collaboration
  4. Arab States Research and Education Network (ASREN)

 

National Commons

European Roadmaps - The European Commission and European Strategy Forum on Research Infrastructures (ESFRI) encourage Member States and Associated Countries to develop national roadmaps for research infrastructures.

  1. German National Research Data Infrastructure (NFDI)
  2. DANS
  3. GAIA-X (non-member state?; see also) (focused on data sharing in the commercial sector, without excluding research)
  4. UK JISC Open Research Framework

Non-European 

  1. China Science and Technology Cloud (CSTCloud); see also
  2. Australian Research Data Commons
  3. NDRIO (Canada)
  4. NII Research Data Cloud (Japan)
  5. KISTI (South Korea)

 

Domain Commons

  1. International Virtual Observatory Alliance (IVOA) (including SKA?)
  2. Earth Sciences[2]
    1. DataOne Federation
    2. Federation of Earth Science Information Partners (ESIP)
    3. EarthCube
    4. GEO / GEOSS (GEOSS Requirements lists functionality; GEOSS Common Infrastructure - GCI)
    5. Near-Earth Space Data Infrastructure for e-Science (ESPAS, prototype)
    6. Polar
      1. The Arctic Data Committee landscape map of the Polar Community
      2. Polar View - The Canadian Polar Data Ecosystem (includes international initiatives, infrastructure and platforms)
      3. Polar Commons / Polar International Circle (PIC) [not sure if this is active]
      4. PolarTEP
    7. Infrastructure for the European Network for Earth System Modelling (IS-ENES)
    8. Global Ocean Observing Systems (composed of Regional Alliances)
    9. Global Climate Observing System
    10. CGIAR Platform for Big Data in Agriculture
  3. Health and Life Sciences
    1. ELIXIR Bridging Force IG (in the process of being redefined as “Life Science Data Infrastructures IG”)
    2. NIH Data Commons; Office of Data Science Strategy (USA)
    3. AIRR Data Commons
    4. Global Alliance for Genomics and Health (GA4GH)
  4. Social Sciences & Humanities Open Cloud (SSHOC)
  5. DiSSCo (https://www.dissco.eu/): research infrastructure for natural science collections (a commons for specimens and their digital twins)
  6. Datacommons.org - primarily statistics for humanitarian work

 

Appendix B: Draft List of WG/IG, documents, recommendations, frameworks and roadmaps from related and relevant communities to be reviewed during research phase

  1. RDA Outputs and Recommendations Catalogue
  2. RDA Data publishing workflows (Zenodo)
  3. RDA FAIR Data Maturity Model
  4. RDA 9 functional requirements for data discovery
  5. Repository Platforms for Research Data IG
  6. Metadata Standards Catalog WG
  7. Metadata IG
  8. Brokering IG
  9. Data Fabric IG
  10. Vocabulary Services IG
  11. Repository Platform IG
  12. International Materials Resource Registries WG
  13. RDA Collection of Use Cases (see also)
  14. Existing service catalogues (for example the eInfra service description template used in the EOSC)
  15. the Open Science Framework
  16. Matrix of use cases and functional requirements for research data repository platforms.
  17. Activities and recommendations arising from the interdisciplinary EOSC Enhance program
  18. Scoping the Open Science Infrastructure Landscape in Europe
  19. Docs from https://investinopen.org/about/who-we-are/
  20. Monitoring Open Science Implementation in Federal Science-based Departments and Agencies: Metrics and Indicators
  21. Next-generation metrics:Responsible metrics and evaluation for openscience. Report of the European Commission Expert Group on Altmetrics (see also)
  22. Guidance and recommendations arising from EOSC FAIR WG and Sustainability WG
  23. Outputs from the International FAIR Convergence Symposium (Dec 2020) (particularly the session Mobilizing the Global Open Science Cloud (GOSC) Initiative: Priority, Progress and Partnership
  24. The European Strategy Forum on Research Infrastructures (ESFRI) Landscape Analysis “provides the current context of the most relevant Research Infrastructures that are available to European scientists and to technology developers”
  25. NIH Workshop on Data Metrics (Feb 2020)
  26. WMO’s Global Basic Observing Network (GBON) has internationally agreed metrics to guide investments, “using data exchange as a measure of success, and creating local benefits while delivering on a global public good.”
  27. Evolving the GEOSS Infrastructure: discussion paper on stakeholders, user scenarios and capabilities
  28. A national open access policy was released in Ethiopia last year, one of the first in Africa to our knowledge. Part of AOSP?
  29. Briefing Note for CODATA Officers: CAS GOSC (Global Open Science Cloud) Project
  30. UNESCO Open Science Recommendation
  31. Open Science in the ISC Science Action Plan
  32. CODATA: Coordinating Global Open Science Commons Initiatives
  33. CODATA: Policies and Interoperability for Global Big Earth Data: a joint CASEarth and CODATA Workshop Session
  34. CODATA: Building a global network infrastructure for international cooperation on data-intensive science
  35. Outputs from European Plate Observing System (EPOS) under ERI (European Research Infrastructure Consortium) upcoming work package: “Strategy for engagement across solid Earth research infrastructures on a global scale" in the section Key initiative and infrastructure [architecture]
  36. A Research Data Infrastructure for Materials Science
  37. CeNAT (Costa Rica)
  38. Canada’s Roadmap for Open Science
  39.  Are there any ontologies for metrics and measurements we should be aware of?
 

[2] Neither the Earth Sciences nor the Health and Life Sciences groups have an overarching governance structure, and neither is identifiable as a GORC per se. We anticipate creating subgroups that review the interoperability and development plans of these communities.

Review period start:
Wednesday, 7 April, 2021

Updated Case Statement

Review period start:
Thursday, 1 April, 2021

 

Name of Proposed Interest Group: Sensitive Data Interest Group

RDA site: https://www.rd-alliance.org/groups/sensitive-data-interest-group

 

1. Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community)

 

Sensitive Data: A working definition of sensitive data is: information that is regulated by law due to possible risk to plants, animals, individuals and/or communities, and to public and private organisations. Sensitive personal data include information related to racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, and data concerning the health or sex life of an individual. These data could be identifiable and could potentially cause harm through their disclosure. For local and government authorities, sensitive data relates to security (political, diplomatic, military data, biohazard concerns, etc.), environmental risks (nuclear or other sensitive installations, for example) or environmental preservation (habitats and protected fauna or flora, in particular). The sensitive data of a private body concerns in particular strategic elements or elements likely to jeopardise its competitiveness.
Adapted from: David et al., 2020, “Templates for FAIRness evaluation criteria - RDA-SHARC IG”
https://zenodo.org/record/3922069#.YCJU7ehKg2w

 

A range of disciplines collect data which are potentially sensitive, presenting serious barriers to reuse and reproducibility. There are a number of barriers which need to be overcome before sensitive data can be utilised safely and to its best advantage. One major challenge is that not all sensitive data is alike, with significant disciplinary variation in how sensitive data is defined, linked, managed, stored, and reused. Additionally, common approaches to working with, sharing and managing data are not always appropriate for sensitive data. For example, sensitive data exposes the different perspectives underlying the FAIR and CARE principles. Further, sensitive data requires careful stewarding such that it can be disseminated in an ethically and culturally appropriate way. Nonetheless, sensitive data has significant potential to be utilised in the conduct of novel and impactful work. Therefore, it is essential that a set of community standards and best practices be developed for sensitive data usage and management.

 

Issues the IG will address

In addition to issues identified by the RDA community as this IG develops, we envisage this IG will address the following issues:

  1. Data carries with it different levels of sensitivity depending on its context (e.g., research discipline, who the data is about, what the data is being used for). However, it is not always clear how we should assess data for sensitivities in different contexts. A resource is needed for those working with data to allow them to make informed decisions about data sensitivity and, consequently, data governance, management, and usage.
  2. Sensitive data is often de-identified. However, re-identification can be possible and can cause serious harm. Resources are needed on mechanisms of re-identification and the different risks for different types of sensitive data.
  3. Data that has been labeled sensitive is often not shared beyond the team that collected/created it. This means that data collection is sometimes duplicated, and is a challenge for reproducible research. More ethically and culturally safe sharing of sensitive data may also enhance the robustness of research design and development. Resources are needed which provide those working with sensitive data with information about how that data can be shared and reused in a safe and ethical manner.
  4. At times there is a tension between sharing and reusing data in general, and stewarding data in culturally and ethically appropriate ways. This tension is exacerbated in the context of sensitive data due to lower rates of data sharing and increased potential for harm. Guidelines are needed for balancing principles of data sharing and reuse (e.g., FAIR) with ethically and culturally appropriate principles (e.g., CARE) specifically in the context of sensitive data.
  5. Consent is a major consideration when sharing any data, especially sensitive data. However, informed consent can be challenging to obtain, especially when reusing data. This is sometimes a barrier to sharing sensitive data. Guidelines are needed that explore consent models, especially post-hoc consent, for governing the primary and secondary use of sensitive data. 
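To make the re-identification risk in issue 2 concrete, one widely used (though by no means sufficient) disclosure-risk measure is k-anonymity over quasi-identifiers. The sketch below uses invented records and field names purely for illustration; it is a demonstration of the concept, not a complete risk-assessment tool.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest group size when records are grouped by quasi-identifier values.

    A dataset is k-anonymous if every combination of quasi-identifier
    values is shared by at least k records; a small k means some
    individuals can be singled out and potentially re-identified.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Invented example records (not real data):
records = [
    {"postcode": "3000", "birth_year": 1980, "diagnosis": "A"},
    {"postcode": "3000", "birth_year": 1980, "diagnosis": "B"},
    {"postcode": "3001", "birth_year": 1975, "diagnosis": "A"},
]

# The (3001, 1975) combination is unique, so k = 1: that record's subject
# could be re-identified by anyone who knows those two facts about them,
# even though no direct identifier (name, ID number) appears in the data.
print(k_anonymity(records, ["postcode", "birth_year"]))  # 1
```

Which attributes count as quasi-identifiers, and what value of k is acceptable, are exactly the kinds of context-dependent judgments for which the resources described above are needed.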

 

How this IG is aligned with the RDA mission

The RDA Vision: This IG aligns with the RDA vision because it will develop mechanisms for the responsible reuse of sensitive data, a data source that is extremely valuable but also carries many ethical and cultural considerations. Sensitive data will play an increasingly significant role in addressing the grand challenges of the 21st century, such as issues of social and environmental justice. Indeed, the benefits and potential harms of sensitive data are increasingly being discussed in public forums as corporations and private companies leverage such data for profit. As mechanisms for sensitive data reuse become widely available (such as through the work of this IG), new innovation and invention will be fostered. This IG has participants from university and non-university sectors, which strongly positions it to engage with a wide variety of stakeholders.

 

The RDA Mission: This IG aligns with the RDA mission as it develops guidelines for the technical components of working with sensitive data, and for addressing the social aspects of working with sensitive data including fostering discussion around the cultural and ethical considerations of data reuse. This IG is well positioned to meet these challenges given the diverse backgrounds of the initial members. The connection between the technical aspects of working with sensitive data (such as secure virtual environments) and the ethical and cultural aspects (such as consent, disciplinary perspectives and norms, and CARE principles) is a key point of interest for this IG.

 

How this IG would be a value-added contribution to the RDA community

Sensitive data is ubiquitous, but its context varies. For this reason, this IG complements the work of a range of existing IGs and WGs, including:

  • Infectious Diseases Community of Practice (forthcoming)

The aim of the Sensitive Data IG is to provide a space that focuses explicitly on sensitive data. While the scope is interdisciplinary, this IG focuses on sensitive data types. Our planned activities will complement the above IGs as we address sensitive data in domain-specific terms (e.g., sensitive data in the health domains) as well as in general terms (e.g., systems for sharing sensitive data). The Sensitive Data IG already has members from a number of the above IGs, which will aid us in coordinating our activities with these groups. The Sensitive Data co-chairs are collectively members of over 20 RDA groups.

 

All members of the Sensitive Data IG are also active members of the RDA community. We will draw on this to ensure that our efforts take account of previous work in the RDA, and to ensure that our group remains up-to-date on RDA activities.

 

2. User scenario(s) or use case(s) the IG wishes to address
(what triggered the desire for this IG in the first place):

 

We identified the following key reasons for forming this IG. We envisage that additional use cases will be developed through working with the RDA community following endorsement.

  1. There is a lack of guidelines for working with sensitive data both within and between disciplines/research areas. One reason is that sensitive data varies between contexts (e.g., between disciplines). To develop a cohesive but also targeted set of guidelines, a group is needed which comprises members from a range of disciplines with a shared interest in sensitive data.
  2. There is a need for a framework which considers the ethical and cultural aspects of sensitive data alongside the technical aspects. Individuals may want to share their sensitive data and may have put in place all the necessary ethical/cultural safeguards, yet lack an understanding of how sharing can be achieved with the technical resources available to them, which repository or sharing mechanism can handle such data, and how best to access persistent IDs which allow them to track the use of their data. Conversely, individuals may have the ideal technological solution for sharing without an understanding of the ethical/cultural considerations. A group is needed to facilitate a dialogue between the ethical/cultural and technical aspects of sensitive data sharing, and to produce tangible outputs which progress this discussion.
  3. There is a general consensus that sensitive data is highly valuable but that it is not being utilised to its full potential. While there is a range of anecdotal support for this claim, a body of work is needed which explores and documents the state of sensitive data primary and secondary usage, and which examines the underlying causes of sensitive data reuse practices within and between disciplines.
  4. There is a recognition that there are a number of stakeholders with respect to sensitive data assets, and that each stakeholder has different requirements, needs, expectations, and terminology (e.g., in the case of health data: government, hospitals, researchers, community members). A group is needed which can synthesise the main expectations of different stakeholders to develop resources for individuals and organisations to use when engaging with, sharing, and accessing sensitive data (i.e., a resource for a shared language between stakeholders).
  5. There is a need for adequate and specialised resourcing and infrastructure to manage, work with, and share sensitive data. Different data types require different solutions for management, analysis, and sharing. While a range of solutions are available for these different data types, their suitability for sensitive data is not always clear. Work is required to assess solutions for different sensitive data types specifically.
  6. Our era is experiencing the most severe collapse in biodiversity the earth has known. Biodiversity provides many ecosystem services and resources, yet species and habitat diversity is undermined by many human activities. The preservation of both fragile and highly coveted species and resources makes the publication of their geolocations sensitive. Data concerning the characteristics of certain pathogens have also proven to be sensitive.
  7. The humanities and social science disciplines likewise require clear guidance regarding collection, use and reuse of sensitive data. This may encompass specific ethical considerations pertaining to data collection (e.g., balancing FAIR v CARE principles), research data collection methods when working with vulnerable individuals or communities often on sensitive topics, the joining of disparate datasets, and considerations of how long such data should be retained, and where.

 

3. Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place. Articulate how this group is different from other current activities inside or outside of RDA.):

 

  1. Using the definition presented at the top of this document as a starting point, develop a shared understanding and refined definition of sensitive data.
  2. Define various levels of “sensitivity” for data.
  3. Data should be as open as possible and as closed as necessary. Within this context, develop an understanding of how sensitivity relates to openness.
  4. Identify different consent models.
  5. Identify types of sensitive data holdings and resources across various domains.
  6. Identify existing data definitions and standards for different types of sensitive data.
  7. Identify challenges in collecting, using and sharing sensitive data.
  8. Engage with key stakeholders working in the area of sensitive data management/analytics.
  9. Identify existing solutions for sensitive data collection, analysis, storage and dissemination.
  10. Identify differences in how sensitive data is managed between groups and regions.

 

4. Participation (Address which communities will be involved, what skills or knowledge they should have, and how will you engage these communities. Also address how this group proposes to coordinate its activity with relevant related groups.):


 

While the interested participants in this interest group are currently mostly from Australia, we have been working to establish this group as part of a global Community of Practice. We are currently developing a strategy to achieve international engagement.

 

To further this effort, the group has recently added chairs from Europe and the USA. The Social Science Interest Group, which comprises a broad international membership base and chairs from Norway, the USA and Australia, also participates formally in the Sensitive Data interest group.

 

The next phase of this engagement strategy will be specific engagement with RDA groups and other stakeholders covering a range of domains and geographic regions. The specific stakeholders to be approached are still to be determined, but will be drawn from the following target groups:

  • RDA Interest Groups: Social Science IG (established), International Indigenous Data Sovereignty IG (Initial approach made, pending response), Ethics and Social Aspects of Data IG, RDA-COVID19 WG (and the various sub-groups), Reproducible Health Data Services WG, Epidemiology common standard for surveillance data reporting WG, Domain Repositories IG, Health Data Interest Group, RDA/NISO Privacy Implications of Research Data Sets IG, Virtual Research Environment IG, Social Dynamics of Data Interoperability IG
  • Communities outside of RDA: Relevant domain and discipline communities, e.g. the SSHOC and EOSC work programs around sensitive data, US and Canadian networks of Research Data Centres, and international and regional statistical agencies (WHO, UNStat, Eurostat, National Statistical Offices).

 

 

5. Outcomes (Discuss what the IG intends to accomplish. Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

  1. To identify the key expectations of the community and use these to refine the IG's objectives.
  2. List different types of data across disciplines such as health, social sciences, etc., and how different levels of sensitivity apply to those types of data.
  3. Identify best practices in sensitive data management across multiple regions, domains and disciplines, and how to adapt them.
  4. Engage with relevant RDA IGs, WGs and CoPs to identify priorities in the area of sensitive data management.
  5. Gather common guidelines and recommendations for working with sensitive data in different disciplines and in different regions.
  6. Catalogue the ethical, philosophical and cultural principles that underpin the use of sensitive data assets.

 

6. Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

 

The IG will meet every three to four weeks via Zoom. Meeting times will alternate to accommodate as many time zones as possible. Google Docs will be used to develop shared documentation, and email will be used to communicate about meetings and about tasks requiring follow-up between meetings. The current chairs/members of the IG are already successfully using this system to meet and maintain momentum.

 

The IG will also meet regularly at Plenaries to workshop new ideas with the RDA community and foster new engagements. The group will establish an informal communication channel through Slack, or a similar platform, to allow for ongoing conversation, and will organise webinars and information sessions between Plenaries to share ideas and keep group members in touch with the group's activities. The IG will also use its RDA page to share documents and communicate regularly with the RDA community.

 

7. Timeline (Describe draft milestones and goals for the first 12 months.):

 

Initial activities: The group met for the first time as a Birds of a Feather session at RDA 16. Following this, a core group of interested members met to begin drafting the group charter. This group also submitted a proposal for an IG session for RDA 17. The group will send the draft charter for initial TAB review and community consultation in the lead-up to RDA 17. The draft charter will also be sent for feedback specifically to members who have joined the IG page and who attended the BoF session at RDA 16. The draft charter and TAB/community/group feedback will be discussed at the RDA 17 session. Following this, the revised charter will be submitted for formal endorsement.

 

First 12 months: Once the IG is formally endorsed, we will undertake the following activities in the first 12 months:

  1. Formally launch the IG: update our RDA IG site, call for additional co-chairs, share the approved charter with group members, establish a regular meeting time, and establish an RDA mailing list for the IG.
  2. Engage in group consultation to identify the main themes of interest and develop a strategy for establishing working groups/task forces to address these.
  3. Engage with stakeholders for feedback on key sensitive data issues and to develop the IG's networks within and outside of RDA.
  4. Invite the existing RDA IGs identified in section 4 above to provide feedback on, and participate in, working group/task force themes.
  5. Present a webinar/workshop to develop working group/task force topics, and open those topics for group comment through interactive platforms like Google Docs.
  6. Formalise the working groups/task forces, share their goals with the group and RDA more broadly to increase participation, and prepare for RDA 18 as an opportunity to share the progress of the IG and the working groups/task forces.
  7. Prepare reports and outputs from the working groups/task forces, share them with the community, and present a webinar/workshop on the outputs.
  8. Hold an IG meeting to assess the progress from the preceding 12 months and determine the next steps for working groups/task forces.

 

 

8. Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest.)

 

  • People interested in leadership:

Name, email, title/affiliation:

Kristal Spreadborough, kristal.spreadborough@unimelb.edu.au, University of Melbourne, Research Data Specialist
Aleks Michalewicz, aleksm@unimelb.edu.au, University of Melbourne, Research Data Specialist
Priyanka Pillai, priyanka.pillai@unimelb.edu.au, University of Melbourne, Research Data Specialist
Nichola Burton, nichola.burton@ardc.edu.au, ARDC, Data Technologist
Keith Russell, keith.russell@ardc.edu.au, ARDC, Manager (Engagements)
Stefanie Kethers, stefanie.kethers@ardc.edu.au, ARDC, RDA Director of Operations
Steven McEachern, steven.mceachern@anu.edu.au, Australian Data Archive, Director
Romain David, Romain.david@erinha.eu, European Research Infrastructure on Highly Pathogenic Agents (ERINHA), Data Manager and Research Fellow
Dharma Akmon, dharmrae@umich.edu, Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan, Director of Project Management and User Support, Assistant Research Scientist
  • People who have joined the Sensitive Data RDA IG so far (name, country):

Frankie Stevens, Australia
Vince Bayrd, United States
Bénédicte Madon, France
Tiiu Tarkpea, Estonia
Lars Eklund, Sweden
Kristan Kang, Australia
Amy Nurnberger, United States
Su Nee Goh, Singapore
Robert Pocklington, Australia
Genevieve Rosewall, Australia
Graham Smith, United Kingdom

 

  • People who attended the BoF and expressed interest in participating further (name; affiliation and role; email; interested in participating further?):

Marjolaine Rivest-Beauregard; McGill University, MSc student; marjolaine.rivest-beauregard@mail.mcgill.ca; Yes
Kiera McNeice; Cambridge University Press, Research Data Manager; kmcneice@cambridge.org; Yes
Matthew Viljoen; EGI Foundation, Service Delivery and Information Security Lead; matthew.viljoen@egi.eu; Yes
Stephanie Thompson; University of Birmingham, Research Data Management; s.e.m.thompson@bham.ac.uk; Yes
Y. G. Rancourt; Portage Network, Curation Officer; yvette.rancourt@carl-abrc.ca; Yes
Thea Lindquist; University of Colorado Boulder, Center for Research Data and Digital Scholarship, Executive Director; thea.lindquist@colorado.edu
Briana Ezray; Penn State University, Research Data Librarian (STEM); bde125@psu.edu; Yes
Gen Rosewall; AARNet, Agile Business Analyst; gen.rosewall@aarnet.edu.au; Yes
Becca Wilson; University of Liverpool, UK, Research Fellow; becca.wilson@liverpool.ac.uk; Yes
Karen Thompson; University of Melbourne; karen.thompson@unimelb.edu.au; Yes
Jeaneth Machicao; Universidade de São Paulo, Research Fellow; machicao@usp.br
Jules Sekedoua Kouadio; Gustave Eiffel University; jules.kouadio@univ-eiffel.fr; Yes
Mahamat Abdelkerim Issa; Institut national de recherche scientifique (INRS), Québec, CA, PhD student; Mahamat_Abdelkerim.Issa@ete.inrs.ca; Yes
Erin Clary; Portage, Canadian Association of Research Libraries; erin.clary@carl-abrc.ca; Yes
Kylie Burgess; University of New England, Research Data Lead; kburge22@une.edu.au; Yes

 

 

 

Review period start:
Tuesday, 23 February, 2021 to Tuesday, 23 March, 2021

The Data Granularity Task Force of the Data Discovery Paradigms Interest Group (DDPIG) of the Research Data Alliance (RDA) proposes to form an RDA Data Granularity Working Group (WG). This WG would address issues of data granularity in data discovery, access, interoperability, analysis, citation, and more. More efficient and effective reuse of data requires that users can find and access data at various levels of granularity. The WG will explore key questions, collect and share valuable information on how best to support data granularity, and provide guidance to help data professionals determine the best level of granularity for user discovery, access, interoperability and citability.

The activities and final recommendations of the Data Granularity WG will build upon and complement existing and ongoing work of several RDA Working and Interest Groups that touch upon the subject of data granularity. The final deliverable for the WG is a set of collected use cases and a guidance document of data granularity approaches for prioritized use cases, including terminology, methods to evaluate approaches, and a summary of community feedback.

Review period start:
Friday, 5 February, 2021 to Friday, 5 March, 2021

RDA Case Statement

GORC International Benchmarking WG

 

  1. Charter

The Global Open Research Commons (GORC) is an ambitious vision of a global set of interoperable resources necessary to enable researchers to address societal grand challenges including climate change, pandemics, and poverty. The realized vision of GORC will provide frictionless access to all research artifacts including, but not limited to: data, publications, software and compute resources; and metadata, vocabulary, and identification services to everyone everywhere, at all times.

The GORC is being built by a set of national, pan-national and domain-specific organizations such as the European Open Science Cloud, the African Open Science Platform, and the International Virtual Observatory Alliance (see Appendix A for a fuller list). The GORC IG is working on a set of deliverables to support coordination amongst these organizations, including a roadmap for global alignment to help set priorities for Commons development and integration. In support of this roadmap, this WG will develop and collect a set of benchmarks for GORC organizations to measure their user engagement and development internally within the organization, gauge their maturity, and compare features across commons.

In the first case, the WG will collect information about how existing commons are measuring the success, adoption or use of their services within their organization, such as data downloads, contributed software, and similar KPIs and access statistics.

Secondly, we will develop, validate, collect and curate a set of benchmarks that will allow Commons developers to compare features across science clouds. In the latter case, for example, we would consider benchmarks such as evidence for, or the existence of:

  1. A well defined decision making process
  2. A consistent and openly available data privacy policy
  3. Federated Authentication and Authorization infrastructure
  4. Community supported and well documented metadata standard(s)
  5. A workflow for adding and maintaining PIDs for managed assets
  6. A mechanism for utilizing vocabulary services or publishing to the semantic web
  7. A process to inventory research artefacts and services
  8. An Open Catalogue of these artefacts and services
  9. A proven workflow to connect multiple different research artefact types (e.g., data and publications; data and electronic laboratory notebooks; data and related datasets)
  10. A mechanism to capture provenance for research artefacts
  11. Mechanisms for community engagement and input; an element or scale for inclusion

We anticipate that the first set of metrics will be quantitative measures used within an organization, while the second set of benchmarks will be comparable across organizations.
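To make the second, cross-organization comparison concrete, external benchmarks could be recorded as simple presence/absence observations extracted from each Commons' public documentation. The following Python sketch is purely illustrative: the commons names and benchmark labels are hypothetical placeholders, not WG findings or an agreed format.

```python
# Illustrative sketch only: hypothetical commons and benchmark labels,
# not outputs of the GORC Benchmarking WG.
BENCHMARKS = [
    "decision-making process",
    "data privacy policy",
    "federated AAI",
    "metadata standards",
    "PID workflow",
]

# Presence/absence of each benchmark, as might be extracted from
# public-facing documentation during the WG's review phase.
observations = {
    "Commons A": {"decision-making process", "data privacy policy", "PID workflow"},
    "Commons B": {"federated AAI", "metadata standards",
                  "PID workflow", "data privacy policy"},
}

def coverage(observed: dict) -> dict:
    """Fraction of the benchmark list each commons satisfies."""
    return {name: len(feats & set(BENCHMARKS)) / len(BENCHMARKS)
            for name, feats in observed.items()}

if __name__ == "__main__":
    for name, score in coverage(observations).items():
        print(f"{name}: {score:.0%}")
```

A tabulation like this would let developers see at a glance which features are widely implemented and where gaps remain, which is the kind of comparison the external benchmark deliverable is intended to support.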

  2. Value Proposition

This WG is motivated by the broader goal of openly sharing data and related services across technologies, disciplines, and countries to address the grand challenges of society. The deliverables of the WG itself will inform roadmaps for development of the infrastructure necessary to meet that goal, while engagements and relationships formed during the work period will help forge strong partnerships across national, regional and domain-focused members which are crucial to its success. Identifying observable and measurable benchmarks in pursuit of the global open science commons will help create a tangible path for development and support strategic planning within and across science commons infrastructures. In the future, best practices for commons development will emerge based on the experience of what actions led to successful outcomes. This work will provide a forum for discussion that will allow members to identify the most important features and the minimal elements required to guide their own development and build a commons that is globally interoperable. Finally, it will support developers as they seek resources to build the global commons by helping them respond to funding agencies' requirements for measurable deliverables.

The proposed WG was discussed at the RDA 16 virtual plenary.[1] Participants discussed the initial work packages and agreed during the meeting that this was a worthy goal and an appropriate approach.

 

  3. Engagement with Existing Work

The GORC IG builds on, and incorporates, the previous National Data Services IG. The Commons that will be investigated in this WG are likely either to have considered or to have implemented outputs from other RDA groups, such as the Domain Repositories IG, the Data Fabric IG, and the Virtual Research Environment IG, to name a few. These groups and many others outside of RDA have recommendations that speak to the functionality and features of various components of Commons; examples include the re3data.org schema for collecting information on research data repositories for registration, and the EOSC FAIR WG and Sustainability WG that seek to define the EOSC as a Minimum Viable Product (MVP). We will review these and other related outputs to see if they have identified benchmarks that we feel will support our goals. This review period will ensure that we do not duplicate existing efforts. Appendix B of this case statement identifies a few of these existing efforts, both within and outside RDA; this list will be expanded and reviewed by the WG members.

 

  4. Work Plan

To create these deliverables, members of the group will:

  1. Create a target list of Commons (Appendix A).
  2. Review the public-facing documentation of each Commons to extract benchmarking information (both KPIs and feature lists).
  3. Review public-facing documentation of recommendations and roadmaps from related communities to extract benchmarking information (Appendix B). This evaluation phase will include an examination of the outputs from other RDA WGs and position papers available in the wider science infrastructure community, along with experiences gathered by the WG's members.
  4. Because benchmarking information may not be easily found in public documents, we will conduct outreach to Commons representatives and related organizations to ask for additional feedback and information about benchmarks used by their community. This may include benchmarks already in use, as well as benchmarks that organizations feel would be useful but which are not yet implemented.
  5. Synthesize and document the benchmarks into two deliverables, described below.

We anticipate that the WG will create sub-working groups or task groups. The WG will decide if they would rather define the task group according to the deliverables, creating a Commons Internal Benchmarking TG and a Commons External Benchmarking TG, or if they would rather subdivide according to a typology of the commons, for example with some members looking at pan-national, national, or domain specific commons, or by some other subdivision of labor.

The WG will proceed according to the following schedule:

The WG will proceed month by month as follows:

Jan-Mar 2021: Group formation
  1. Agreement on the scope of work and deliverables (broad scope)
  2. Case statement community review
  3. Creation of sub-working groups

Apr-Sept 2021: Begin the literature review of public-facing documents from Science Commons and related organizations. Refine scope: meeting point to consolidate the list of topics to be addressed in the deliverables and to assess the level of resources available to achieve them.

Oct-Dec 2021: Begin outreach to Science Commons and related organizations. Update at RDA 17.

Jan-Mar 2022: First draft: Internal Benchmarks distributed for community review.

Mar-Jun 2022: First draft: External Benchmarks distributed for community review.

July 2022: Final deliverables.

 

  5. Deliverables

This group will create Supporting Outputs in furtherance of the goals of the GORC IG.

Specifically, three deliverables:

D1: a non-redundant set of KPIs and success metrics currently utilized, planned or desired for existing science commons;

D2: a list of observable international benchmarks of features, structures and functionality that can help define a Commons and that will feed into a roadmap of Commons interoperability; and

D3: an adoption plan, described in section 9 below.

  6. Mode and Frequency of Operation

The WG will meet monthly over Zoom, at a time to be determined by the membership. The WG will also communicate asynchronously online using the mailing list functionality provided by RDA and via shared online documents. If post-Covid international travel is restored during the 18-month work period of this WG, we will propose and schedule meetings during RDA plenaries and at other conferences where a sufficient number of group members are in attendance.

  7. Addressing Consensus and Conflicts

The WG will adhere to the stated RDA Code of Conduct and will work towards consensus, which will be achieved primarily through mailing list discussions and online meetings, where opposing views will be openly discussed and debated amongst members of the group. If consensus cannot be achieved in this manner, the group co-chairs will make the final decision on how to proceed.

The co-chairs will keep the working group on track by reviewing progress relative to the deliverables. Any new ideas about deliverables or work that the co-chairs deem to be outside the scope of the WG defined here will be referred back to the GORC IG to determine if a new WG should be formed.

  8. Community Engagement

The working group case statement will be disseminated to RDA mailing lists and communities of practice related to Commons development that are identified by the GORC IG in an effort to cast a wide net and attract a diverse, multi-disciplinary membership. Similarly, when appropriate, draft outputs will also be published to relevant stakeholders and mailing lists to encourage broad community feedback.

  9. Adoption Plan

The WG will create an adoption plan for distributing and maintaining the deliverables. A specific plan will be developed to facilitate adoption or implementation of the WG recommendations and other outcomes within the organizations and institutions represented by WG members. This will include possible strategies for adoption more broadly within the global community, in such a way as to facilitate the interoperability of global infrastructures. Pilot adoptions or implementations would ideally start within the 18-month timeframe before the WG is complete. We envision implementation occurring when developers of commons compare themselves with similar organizations. We also envision that the adoption plan will speak to how we include the benchmarks in the larger GORC roadmap being created by the parent IG.

 

 

  10. Initial Membership

Co-chairs: 

  1. Karen Payne <ito-director@oceannetworks.ca>
  2. Mark Leggott <mark.leggott@rdc-drc.ca>
  3. Andrew Treloar <andrew.treloar@ardc.edu.au>

 

 Appendix A: List of Commons

 

Pan National Commons

  1. European Open Science Cloud
  2. African Open Science Platform
  3. Nordic e-Infrastructure Collaboration
  4. The Arab States Research and Education Network (ASREN)
  5. LIBSENSE (LIBSENSE is a community of practice, not an infrastructure; the infrastructure will be built by the RENs, NRENs and universities)
  6. WACREN  
  7. LA Referencia 

 

National Commons

European Roadmaps: The European Commission and the European Strategy Forum on Research Infrastructures (ESFRI) encourage Member States and Associated Countries to develop national roadmaps for research infrastructures.

  1. German National Research Data Infrastructure (NFDI)
  2. DANS
  3. ATT (Finland) 
  4. GAIA-X (non-member state?; focused on data sharing in the commercial sector, without excluding research)
  5. UK
    1. UK Research and Innovation
    2. JISC
    3. Digital Curation Centre

Non-European 

  1. QNL (Qatar) 
  2. China Science and Technology Cloud (CSTCloud)
  3. Australian Research Data Commons
  4. Canadian National Data Services Framework (in development)
  5. National Research Cloud (US; AI focused)
  6. NII Research Data Cloud (Japan) 
  7. KISTI (South Korea)

 

Domain Commons

  1. International Virtual Observatory Alliance (IVOA)
  2. NIH Data Commons / Office of Data Science Strategy (USA)
  3. NIST RDaF (USA)
  4. Earth Sciences 
    1. DataOne Federation
    2. Federation of Earth Science Information Partners (ESIP)​    
    3. EarthCube
    4. GEO / GEOSS
    5. Near-Earth Space Data Infrastructure for e-Science (ESPAS, prototype)
    6. Polar
      1. The Arctic Data Committee landscape map of the Polar Community 
      2. Polar View - The Canadian Polar Data Ecosystem (includes international initiatives, infrastructure and platforms)
      3. Polar Commons / Polar International Circle (PIC) [not sure if this is active]
      4. PolarTEP
    7. Infrastructure for the European Network for Earth System Modelling (IS-ENES)
  5. Global Ocean Observing Systems (composed of Regional Alliances)
  6. CGIAR Platform for Big Data in Agriculture
  7. Social Sciences & Humanities Open Cloud (SSHOC)
  8. DiSSCo (https://www.dissco.eu/), research infrastructure for natural science collections (a commons for specimens and their digital twins)
  9. ELIXIR Bridging Force IG (in the process of being redefined as “Life Science Data Infrastructures IG”)
  10. Global Alliance for Genomics and Health (GA4GH) 
  11. Datacommons.org (primarily statistics for humanitarian work)

 

Gateway/Virtual Research Environment/Virtual Laboratory communities and other Services

  1. International Coalition on Science Gateways
  2. Data Curation Network
  3. CURE Consortium
  4. OpenAire
  5. RDA VRE IG

 

 Appendix B: Draft List of WG/IG, documents, recommendations, frameworks and roadmaps from related and relevant communities

 

  1. RDA Outputs and Recommendations Catalogue
  2. RDA Data publishing workflows (Zenodo)
  3. RDA FAIR Data Maturity Model
  4. RDA 9 functional requirements for data discovery
  5. Repository Platforms for Research Data IG
  6. Metadata Standards Catalog WG  
  7. Metadata IG
  8. Brokering IG
  9. Data Fabric IG
  10. Repository Platform IG
  11. International Materials Resource Registries WG
  12. RDA Collection of Use Cases (see also)
  13. Existing service catalogues (for example the eInfra service description template used in the EOSC)
  14. the Open Science Framework
  15. Matrix of use cases and functional requirements for research data repository platforms.
  16. Activities and recommendations arising from the interdisciplinary EOSC Enhance program
  17. Scoping the Open Science Infrastructure Landscape in Europe
  18. Docs from https://investinopen.org/about/who-we-are/  
  19. Monitoring Open Science Implementation in Federal Science-based Departments and Agencies: Metrics and Indicators
  20. Next-generation metrics: Responsible metrics and evaluation for open science. Report of the European Commission Expert Group on Altmetrics
  21. Guidance and recommendations arising from the EOSC FAIR WG and Sustainability WG
  22. Outputs from the International FAIR Convergence Symposium (Dec 2020), particularly the session Mobilizing the Global Open Science Cloud (GOSC) Initiative: Priority, Progress and Partnership
  23. The European Strategy Forum on Research Infrastructures (ESFRI) Landscape Analysis "provides the current context of the most relevant Research Infrastructures that are available to European scientists and to technology developers"
  24. NIH Workshop on Data Metrics (Feb 2020)

 

 

 
Review period start:
Friday, 8 January, 2021 to Monday, 8 February, 2021

 

CASE STATEMENT: RDA/CODATA Epidemiology common standard for surveillance data reporting WG


 

 

1. WG CHARTER

A concise articulation of what issues the WG will address within a 12-18 month time frame and what its “deliverables” or outcomes will be.

 

In May 2020, the Organization for Economic Cooperation and Development (OECD) discussed why and how Open Science is critical to preventing and combating pandemics such as COVID-19 caused by the novel coronavirus, SARS-CoV-2 (OECD 2020). Open Science is transparent and accessible knowledge that is shared and developed through collaborative networks (Vicente-Saez and Martinez-Fuentes 2018). FAIR (findable, accessible, interoperable, and reusable) data principles are an integral part of Open Science. FAIR data principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention) (GoFAIR).  

 

However, there is an urgent need to develop a common standard for reporting communicable disease surveillance data, without which Open Science and FAIR data will be difficult to achieve. Limited by antiquated systems and the lack of an established infrastructure, the tempo of the spread of the disease has outpaced our ability to react and adjust (Austin et al. 2020a,b; Gardner et al. 2020).

 

The need for developing a common standard for reporting epidemiology surveillance data was articulated by the RDA COVID-19 Epidemiology work group (WG) in their recommendations and guidelines, and supporting output (RDA COVID-19 WG 2020; RDA COVID-19 Epidemiology WG 2020). 

 

On October 27, 2020, the WHO, UNESCO, HCHR, and CERN issued a Joint Appeal for Open Science, a call on the international community to take all necessary measures to enable universal access to scientific progress and its applications (UNESCO et al. 2020; UNESCO 2020):

 

"The open science movement aims to make science more accessible, more transparent and thereby more effective. A crisis such as the COVID-19 pandemic demonstrates the urgent need to strengthen scientific cooperation and ensure the fundamental right to universal access to scientific progress and its applications. Open Science is about free access to scientific publications, data and infrastructure, as well as open software, open educational resources and open technologies such as tests or vaccines. Open science also promotes trust in science, at a time when rumours and false information abound."

 

Michelle Bachelet, United Nations High Commissioner for Human Rights stated, 

 

"Data are a vital human rights tool."

 

The WG will build upon existing standards and guidelines to develop uniform definitions and data elements to improve data comparability and interoperability. 

 

We will build upon the work begun by the RDA COVID-19 Epidemiology WG, and extend beyond the COVID-19 pandemic to provide an actionable specification for reporting communicable disease surveillance data and metadata, including geospatial data.
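To illustrate what an actionable specification might eventually look like, the sketch below shows a hypothetical minimal record shape for aggregated surveillance reporting. It is purely illustrative: the field names, the jurisdiction coding choice, and the example values are assumptions for the sake of the example, not the WG's specification, which remains to be defined through the consensus process described here.

```python
# Illustrative only: a hypothetical minimal record shape for aggregated
# surveillance reporting; NOT the WG's specification.
from dataclasses import dataclass
from datetime import date


@dataclass
class SurveillanceRecord:
    reporting_agency: str    # who reported (e.g., a national agency)
    jurisdiction_code: str   # geospatial reference, e.g. an ISO 3166-2 code
    disease: str             # uniform disease identifier
    report_date: date        # date the aggregate applies to
    new_cases: int           # case count for the reporting period
    case_definition: str     # which case definition the count follows


# Hypothetical example record, with placeholder values.
record = SurveillanceRecord(
    reporting_agency="Example Health Agency",
    jurisdiction_code="AU-VIC",
    disease="COVID-19",
    report_date=date(2020, 10, 27),
    new_cases=12,
    case_definition="confirmed (laboratory)",
)
print(record.jurisdiction_code, record.new_cases)
```

The point of such a structure is that uniform definitions and shared data elements (jurisdiction coding, case definitions, date semantics) are what make records from different agencies comparable and machine-actionable, which is the gap the proposed common standard aims to close.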

 

This work will be a consensus building effort that contributes to CODATA’s Decadal programme:

  • Enabling Technologies and Good Practice for Data-Intensive Science
  • Mobilising Domains and Breaking Down Silos
  • Advancing Interoperability Through Cross-Domain Case Studies

 

Outcome

A standard specification for reporting communicable disease surveillance data. 

 

2. VALUE PROPOSITION

A specific description of who will benefit from the adoption or implementation of the WG outcomes and what tangible impacts should result.

 

Epidemiology surveillance data will enable governments and public health agencies to detect and respond to newly emergent threats of disease. Early detection may prevent development of epidemics and pandemics. It will also enable them to deliver more effective responses at all stages of the threat, from emergence through containment, mitigation, and reopening of society in the case of pandemics. Epidemiology surveillance data and geospatial data are large and varied. Treated as a strategic asset, they have the potential to support evidence-informed policy, stimulate new research areas, expand collaboration opportunities, and increase the health and economic well-being of society. A common standard for reporting epidemiology surveillance data will support these outcomes by improving data and metadata management and provision of findable, accessible, interoperable, reusable, ethical, and reproducible (FAIRER) data. 

 

The common standard for reporting epidemiology surveillance data is intended for implementation by government and international agencies, policy and decision-makers, epidemiologists and public health experts, disaster preparedness and response experts, funders, data providers, teachers, researchers, clinicians, and other potential users.

 

3. ENGAGEMENT WITH EXISTING WORK IN THE AREA

A brief review of related work and plan for engagement with any other activities in the area.

 

Nature of the problem to be addressed

The World Health Organization (WHO) defines public health surveillance as "an ongoing, systematic collection, analysis and interpretation of health-related data essential to the planning, implementation, and evaluation of public health practice" (WHO 2020a). The WHO is a source of international standardized COVID-19 data and evidence-based guidelines, and is an invaluable source of technical guidance (WHO 2020b). Available instruments include a case-based reporting form, data dictionary, template, and aggregated weekly reporting form (WHO 2020c). There is also a global COVID-19 clinical data platform for clinical characterization and management of hospitalized patients with suspected or confirmed COVID-19 (WHO 2020d). The WHO (2020e) also notes that continued vigilance is needed to detect the emergence of novel zoonotic viruses affecting humans. 

 

Unfortunately, there are inconsistencies in the manner in which agencies in various jurisdictions collect and report their data. This is due to gaps in existing standards and to non-compliance with the standards that do exist. 

 

COVID-19 threat detection has been slow and ineffective, resulting in the rapid development of a pandemic. Countries around the world have implemented a disparate series of public health measures in attempting to suppress and mitigate spread of the disease. The world was not prepared to respond to a novel zoonosis that spreads with the tempo and severity of COVID-19 (Greenfield et al. 2020). The pandemic has resulted in serious health and economic consequences for both High Income Countries (HICs) and Low and Middle Income Countries (LMICs) (Bong et al., 2020). 

 

The RDA COVID-19 WG recommendations, guidelines, and supporting output highlighted discrepancies in COVID-19 incidence and mortality counts across data sources that could be directly attributed to varying definitions and reporting protocols (RDA COVID-19 Epidemiology WG, 2020a,b). For example, mortality data from COVID-19 are frequently not comparable between and within jurisdictions due to varying definitions (Dudel, 2020). Variations resulting from discrepancies in official statistics limit effective disease-specific strategies (Modi et al., 2020; Modig & Ebeling, 2020). Other variables (e.g., confirmed cases, probable cases, probable deaths, negative tests, recoveries, critical cases) are also inconsistently defined (Austin et al. 2020b). For example, while the WHO (2020f) defines a confirmed case as "a person with laboratory confirmation of COVID-19 infection, irrespective of clinical signs and symptoms", other datasets report confirmed cases as the number of both laboratory-positive subjects and probable cases (JHU, 2020). The US CDC (2020) has amended its previous policy and now reports case counts from commercial and reference laboratories, public health laboratories, and hospital laboratories, but still excludes data from other testing sites within a jurisdiction (e.g., point-of-care test sites). In Turkey, the number of cases published until the end of July represented only symptomatic COVID-19 subjects, excluding asymptomatic laboratory-positive individuals (Reuters Editors, 2020). Other issues affecting data accuracy include duplicate event records, laboratory report delays, missing data, and incorrect dates.
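The effect of divergent case definitions can be illustrated with a small, invented line list (all records and field names below are hypothetical, not drawn from any actual dataset):

```python
# Hypothetical illustration: the same line list yields different
# "confirmed case" counts under two reporting definitions.
line_list = [
    {"id": 1, "lab_positive": True,  "symptomatic": True},
    {"id": 2, "lab_positive": True,  "symptomatic": False},
    {"id": 3, "lab_positive": False, "symptomatic": True},  # probable case
]

# WHO-style: laboratory confirmation, irrespective of signs and symptoms.
who_confirmed = sum(1 for c in line_list if c["lab_positive"])

# Aggregator-style: laboratory-positive plus probable (symptomatic) cases.
aggregator_confirmed = sum(
    1 for c in line_list if c["lab_positive"] or c["symptomatic"]
)

print(who_confirmed)         # 2
print(aggregator_confirmed)  # 3
```

The two sources would publish different totals for the same underlying events, which is precisely the comparability problem a common standard is meant to remove.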

 

Much of the developed world has a notifiable disease surveillance system for effective and efficient reporting within national borders, though with varying data elements. There are also a large number of international data standards that should be used when reporting epidemiology surveillance data (Table 1). However, these do not address the specific requirements that would ensure that epidemiology surveillance data are comparable and interoperable. 

 

Table 1. Initial list of data standards useful for notifiable disease surveillance systems (non-exhaustive) [SOURCE: Haghiri et al. 2019]. Each entry lists a format followed by its proposed standard(s).

  • Machine-organizable data: HL7
  • Medical document exchange format: Clinical Document Architecture (CDA), Continuity of Care Document (CCD), and Continuity of Care Record (CCR)
  • Markup language: XML Document Transform (XDT)
  • Classification systems: International Classification of Diseases (ICD, ICD-9, ICD-9-CM); other classification systems (DRG, CPT, ICECI, HCPCS, ICPM, ICF, DSM)
  • Nomenclature systems: LOINC, SNOMED, RxNorm
  • Standard content-maker formats: standard address, contact number, ID, and date format definitions

 

Disease surveillance systems rely on complex hierarchies for data reporting. Raw data are collected at the local level, then anonymized and aggregated as necessary before being sent up a hierarchy with many levels. Even in many of the most developed regions of the world, much of this process continues to be done by hand, although the push to electronic medical records is gaining traction. As a result, most disease surveillance systems across the world experience reporting lags of at least one to two weeks (Fairchild et al. 2018; Janati et al. 2015).
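The local anonymize-and-aggregate step described above can be sketched as follows (records and field names are invented for illustration; real systems are far more involved):

```python
# Minimal sketch of anonymization-plus-aggregation before data are sent
# up the reporting hierarchy. All field names here are hypothetical.
from collections import Counter

raw_records = [
    {"name": "A. Person", "region": "North", "report_date": "2020-07-01"},
    {"name": "B. Person", "region": "North", "report_date": "2020-07-01"},
    {"name": "C. Person", "region": "South", "report_date": "2020-07-02"},
]

def anonymize_and_aggregate(records):
    """Drop direct identifiers; report only counts per region and date."""
    return Counter((r["region"], r["report_date"]) for r in records)

counts = anonymize_and_aggregate(raw_records)
print(counts[("North", "2020-07-01")])  # 2
```

Each level of the hierarchy repeats a step like this, which is one reason detail is lost and lags accumulate as data move upward.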

 

Publicly available data are posted on websites that are often difficult to navigate, making the data and their associated definitions hard to find.

 

Historical data are not fixed at first publication: undetected errors, late or missing data, laboratory delays, and similar issues mean that datasets are updated as information becomes available. This problem, often called "backfill," is due to the complex reporting hierarchy and antiquated systems that disease surveillance systems rely on. Backfill can in some cases drastically affect analyses (Fairchild et al. 2018). The problem is compounded when corrected and missing case counts are added to the date on which the correction was reported, instead of the date on which the event occurred. 
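The difference between attributing a late correction to the event date and to the report date can be sketched as follows (all counts, dates, and parameter names are invented for illustration):

```python
# Sketch of the "backfill" issue: a correction reported on 2020-07-10 for
# events that occurred on 2020-07-03 should revise the 2020-07-03 count,
# not inflate the count on the reporting date.
daily_counts = {"2020-07-03": 100, "2020-07-10": 80}

def apply_correction(counts, event_date, report_date, delta, by_event_date=True):
    """Attribute a late correction to the event date or the report date."""
    key = event_date if by_event_date else report_date
    counts = dict(counts)  # leave the original series untouched
    counts[key] = counts.get(key, 0) + delta
    return counts

# Preferred: the historical series is revised in place.
revised = apply_correction(daily_counts, "2020-07-03", "2020-07-10", 5)
print(revised)  # {'2020-07-03': 105, '2020-07-10': 80}

# Common but misleading: the correction appears as a spike on the report date.
spiked = apply_correction(daily_counts, "2020-07-03", "2020-07-10", 5,
                          by_event_date=False)
print(spiked)  # {'2020-07-03': 100, '2020-07-10': 85}
```

A reporting standard that mandates event-date attribution would make revised series comparable across jurisdictions; the second pattern distorts epidemic curves.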

 

Case definitions used in epidemiological surveillance data are not clearly defined. Fairchild et al. (2018) have highlighted the challenges with data reporting and stressed the importance of explicit and clear case definitions. Even with standardized definitions, regions with little funding support for public health institutions may struggle to adopt a framework of best practices. We will develop guidelines that recognise these limitations and that will support both LMICs and HICs.

 

Engagement with other related activities

The proposed “Epidemiology common standard for surveillance data reporting WG” will address a high-priority challenge based on assessments of public health needs during a pandemic using COVID-19 as a use case.

 

The initial WG membership (see Section 6, below) is well connected to various community-based initiatives and WGs that address similar and other relevant topics. The WG will monitor and align its efforts with other related activities, including:

 

RDA WG and interest groups (IG):

Others:

See also "Actively soliciting WG membership" in Section 6, below.

 

4. WORK PLAN 

A specific and detailed description of how the WG will operate.

 

Deliverables

D1 (months 4-12). Epidemiology common standard for surveillance data reporting.

This deliverable will contain the developed common standard specification for reporting epidemiology surveillance data, including variable names, definitions, and rationale.
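As a purely hypothetical illustration of the kind of variable names, definitions, and controlled vocabularies such a specification might contain (every field name and allowed value below is invented for this sketch, not taken from the WG's work):

```python
# Hypothetical sketch of a standardized surveillance record. The real
# standard (deliverable D1) would define the actual variables and values.
from dataclasses import dataclass

# A controlled vocabulary the standard might define for case status.
ALLOWED_CASE_STATUS = {"confirmed", "probable", "suspected"}

@dataclass
class SurveillanceRecord:
    jurisdiction: str  # e.g., an ISO 3166 country or subdivision code
    event_date: str    # ISO 8601 date on which the event occurred
    report_date: str   # ISO 8601 date on which the event was reported
    case_status: str   # value from the controlled vocabulary above

    def __post_init__(self):
        # Reject values outside the controlled vocabulary.
        if self.case_status not in ALLOWED_CASE_STATUS:
            raise ValueError(f"undefined case_status: {self.case_status}")

rec = SurveillanceRecord("CA-QC", "2020-07-03", "2020-07-05", "confirmed")
print(rec.case_status)  # confirmed
```

Separating event date from report date and constraining status values to a shared vocabulary are the kinds of definitional choices that make data comparable across jurisdictions.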

 

D2 (months 12-16). Guidelines for adopting the common standard.

Guidelines will be based on lessons learned during development of the standard.

 

Milestones

M1 (months 0-1). Engagement of representatives from prominent stakeholders in public health. 

We will seek engagement with the WHO, ECDC, US CDC, ICMR, and other agencies.

 

M2 (months 0-3). Identification of standards gaps and issues concerning data interoperability and comparability across and within jurisdictions. 

  • We will use COVID-19 surveillance data as a use-case to identify issues that can be resolved by implementation of a common standard for reporting communicable disease surveillance data.

  • Identify related standards and guidelines.

  • Identify standards gaps.

 

M3 (months 1-3). Definition of the scope of the standard and detailed objectives.

We will develop a detailed project management and work plan. 

 

M4 (months 3-6). Hackathon.

A hackathon will be conducted for the RDA 17th Plenary in April 2021. The objective will be to combine publicly available COVID-19 related datasets and to present solutions that overcome the barriers encountered. The hackathon will be announced at the 16th Plenary in November 2020 and will open in February 2021. Participants will present their results at the 17th Plenary, at which time judging will take place and winners will be announced. Winners will be offered co-authorship on a peer-reviewed publication. From November to January, we will seek sponsors for cash prizes to be awarded to the 1st, 2nd, and 3rd place winners.

 

M5 (months 3-12). Development of a draft standard for reporting epidemiology surveillance data.

 

M6 (months 12-14). Public review of the draft standard.

 

M7 (months 12-16). Development of guidelines for adoption of the standard.

 

M8 (month 15-17). Finalization of the standard.
 

M9 (month 1-18). Dissemination and Communication.

WG activities and outcomes will be disseminated via the RDA website, preprint(s), submission to a peer-reviewed journal, RDA Plenaries, conference presentations, and social media.

 

Simplified Gantt Chart

[Flattened Gantt chart: the month columns (1-18) could not be reconstructed. Recoverable content: deliverables D1 and D2 each have a draft phase followed by a final phase; milestones M1-M9 follow the month ranges listed above; RDA Plenaries P17, P18, and P19 fall within the 18-month work period.]

 

 

Work space

The WG will use the following platforms for communication and development:

  • RDA website

  • Google drive

    • Working documents will be managed on Google Drive to facilitate open collaboration.

  • GitHub 

    • We will develop a public GitHub repository to host the hackathon material, models, source code, and the proposed common standard, and to raise and resolve issues.

  • Zotero

Tools

We will use a variety of tools, for example:

  • Visualization
    • Mindmapping
    • Infographics
  • Gantt charting for project management
  • Voting and consensus building tools

 

Meetings

  • Meetings will be held weekly.

  • An online platform (e.g., GoToMeeting, Zoom, WebEx, MS Teams, or Google Meet) will be used for meetings. Participants will be asked to activate their video to enhance communication effectiveness. The WG will meet at RDA Plenaries, the first such meeting being at the 16th Plenary on November 12, 2020 at 12:00 - 1:30 AM UTC.

  • Agenda, minutes, and rolling notes will be circulated via Google Docs. 

  • Discussions will be held at the RDA 16th, 17th, and 18th Plenaries, and at other conferences and workshops where possible. 

Consensus

A description of how the WG plans to develop consensus, address conflicts, stay on track and within scope, and move forward during operation.

 

Consensus will be achieved mainly through discussions in our regular weekly meetings, where conflicting viewpoints will be identified and openly discussed and debated by group members. If consensus cannot be reached in this manner, the final decision will be taken by the group co-chairs. By setting realistic deadlines and assessing progress on assigned tasks, the co-chairs will keep the WG on track and within scope.

 

Community engagement

A description of the WG’s planned approach to broader community engagement and participation. 

 

To encourage broader community engagement and participation in the development of a standard, the WG case statement will be circulated to public health organizations and epidemiological societies across the globe, and on social media (LinkedIn and Twitter). Regular updates on events and news related to the epidemiology common standard will be posted on the RDA WG webpage to encourage involvement of specialists in the field. 

 

License

WG outputs will be published under a CC BY-SA license. 

 

5. ADOPTION PLAN

A specific plan for adoption or implementation of the WG outcomes within the organizations and institutions represented by WG members, as well as plans for adoption more broadly within the community. Such adoption or implementation should start within the 12-18 month timeframe before the WG is complete.

 

WG members will be encouraged to implement the new standard and guidelines within their organizations. We will pursue adoption by a variety of stakeholders and research communities, particularly those involved in public health. The standard will be disseminated via RDA webinars, other scientific presentations, and Twitter. We will also seek to publish the final standard and guidelines as an open access peer-reviewed journal article. We will follow up with adoption stories.

 

6. INITIAL WG MEMBERSHIP

A specific list of initial members of the WG and a description of initial leadership of the WG.

 

Co-Chairs: Claire Austin and Rajini Nagrani

 

RDA Liaison: Stefanie Kethers

 

Members/Interested:

Soegianto Ali

Anthony Juehne

Nada El Jundi

Fotis Georgatos

Jitendra Jonnagaddala

Miklós Krész

Gary Mazzaferro

Jiban K. Pal

Carlos Luis Parra-Calderon

Bonnet Pascal

Fotis Psomopoulos

Stefan Sauermann

Henri Tonnang

Marcos Roberto Tovani-Palone    

Anna Widyastuti

Becca Wilson

Eiko Yoneki

 

Current initial membership

The initial WG includes:

  • Cross-domain expertise 

    • biostatistics, clinical informatics, computer engineering, data science, epidemiology, global health, health informatics, health sciences, interoperability, IT architecture, mathematics, open science, pathology, predictive modeling, public health, research data management, software development, veterinary medicine
  • Experience 

    • academia, editing of scientific journals, government, international WG leadership, program direction, research, standards development.
  • Regional representation 

    • Africa (sub-Saharan), Asia (maritime Southeast), Asia (South), Australasia, Europe, North America, and South America.
  • Income groups

    • Two lower-middle income, two upper-middle income, and 15 high-income countries.

 

Initial membership comprises a core group from the RDA-COVID19-Epidemiology WG and new members who bring additional domain-specific expertise. We aim to further strengthen the group by expanding global participation (low income, lower-middle income, and upper-middle income countries), interdisciplinary expertise, and stakeholder representation to address this pressing epidemiology surveillance data challenge across the public health domain. 

 

Actively soliciting WG membership

The initial membership does not currently include any potential adopters. We will be soliciting the active participation in the WG of representatives from key stakeholders, including the following:

Official agencies and funders

  • Official agencies, organizations, and funders having international reach
  • Supranational organizations
  • European Centre for Disease Prevention and Control (ECDC)
  • Global Early Warning System (GLEWS+)
  • Global Health Security Agenda (GHSA)
  • Global Influenza Surveillance and Response System (GISRS)
  • Global Partnership for Sustainable Development Data (GPSDD)
  • GloPID-R
  • Indian Council of Medical Research (ICMR)
  • Observational Health Data Sciences and Informatics (OHDSI)
  • UN Office for Disaster Risk Reduction (UNDRR)
  • United Nations Educational, Scientific and Cultural Organization (UNESCO)
  • U.S. Centers for Disease Control and Prevention (CDC)
  • Wellcome Trust
  • World Data System (WDS)
  • World Health Organization (WHO)
  • World Bank World Development Indicators (WDI)

     

Data aggregators in academia

  • Johns Hopkins University (Killeen et al. 2020)
  • University of California, Berkeley (Altieri et al. 2020)
  • University of Oxford (Roser et al. 2020)

News Outlets

  • The Atlantic

  • The Economist

  • The Financial Times

  • The New York Times

Communications/graphic artist expertise

 

 

7. REFERENCES

Altieri, N., Barter, R. L., Duncan, J., Dwivedi, R., Kumbier, K., Li, X., Netzorg, R., Park, B., Singh, C., Tan, Y. S., Tang, T., Wang, Y., Zhang, C., & Yu, B. (2020). Curating a COVID-19 Data Repository and Forecasting County-Level Death Counts in the United States. Harvard Data Science Review. https://doi.org/10.1162/99608f92.1d4e0dae

 

Austin, Claire C; Nagrani, Rajini; Widyastuti, Anna; El Jundi, Nada (2020a). Global status of COVID-19 data: A cross-jurisdictional and international perspective. Canadian Public Health Association Conference. October 14-16. https://www.cpha.ca/publichealth2020

 

Austin, Claire C; Widyastuti, Anna; El Jundi, Nada; Nagrani, Rajini; and the RDA COVID-19 WG. (2020b). Surveillance Data and Models: Review and Analysis, Part 1 (September 18, 2020). Preprint available at SSRN: http://dx.doi.org/10.2139/ssrn.3695335

 

Bong CL, Brasher C, Chikumba E, McDougall R, Mellin-Olsen J, Enright A (2020). The COVID-19 Pandemic: Effects on Low- and Middle-Income Countries. Anesth Analg, 131:86-92. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7173081/

 

CDC. Coronavirus Disease 2019 (COVID-19) in the U.S. Centers for Disease Control and Prevention. 2020 [cited 2020 Oct 23]. Available from: https://covid.cdc.gov/covid-data-tracker

 

Fairchild G, Tasseff B, Khalsa H, Generous N, Daughton AR, Velappan N, Priedhorsky R, Deshpande A (2018). Epidemiological Data Challenges: Planning for a More Robust Future Through Data Standards. Front Public Health, 6:336. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6265573/

 

Gardner, L., Ratcliff, J., Dong, E., & Katz, A. (2020). A need for open public data standards and sharing in light of COVID-19. The Lancet Infectious Diseases, 0(0). https://doi.org/10.1016/S1473-3099(20)30635-6

 

Greenfield J., Tonnang E.Z., Mazzaferro G., Austin, C.C.; and the RDA-COVID19-WG. (2020). Epi-TRACS: Rapid detection and whole system response for emerging pathogens such as SARS-CoV-2 virus and the COVID-19 disease that it causes. IN: COVID-19 Data sharing in epidemiology, version 0.06b. Research Data Alliance RDA-COVID19-Epidemiology WG. https://doi.org/10.15497/rda00049

 

GLEWS (2013). Global Early Warning System. http://www.glews.net/?page_id=5

 

Haghiri H, Rabiei R, Hosseini A, Moghaddasi H, Asadi F (2019). Notifiable Diseases Surveillance System with a Data Architecture Approach: A Systematic Review. Acta Inform Med, 27:268-277. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7004293/

 

Janati A, Hosseiny M, Gouya MM, Moradi G, Ghaderi E (2015). Communicable Disease Reporting Systems in the World: A Systematic Review. Iran J Public Health, 44:1453-1465. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4703224/

 

JHU (2020). Coronavirus resource center. Johns Hopkins University. https://coronavirus.jhu.edu/

 

Killeen, B. D., Wu, J. Y., Shah, K., Zapaishchykova, A., Nikutta, P., Tamhane, A., Chakraborty, S., Wei, J., Gao, T., Thies, M., & Unberath, M. (2020). A County-level Dataset for Informing the United States’ Response to COVID-19. ArXiv:2004.00756 [Physics, q-Bio]. http://arxiv.org/abs/2004.00756

 

Modig K, Ebeling M (2020). Excess mortality from COVID-19: Weekly excess death rates by age and sex for Sweden. Preprint available at medRxiv: https://doi.org/10.1101/2020.05.10.20096909

 

Norton, A., Pardinz-Solis, R., & Carson, G. (2017). Roadmap for data sharing in public health emergencies. GloPID-R. https://www.glopid-r.org/our-work/data-sharing/

 

OECD (2020). Why open science is critical to combatting COVID-19—OECD. Organisation for Economic Co-Operation and Development, May 12, 2020. https://read.oecd-ilibrary.org/view/?ref=129_129916-31pgjnl6cb&title=Why-open-science-is-critical-to-combatting-COVID-19

 

OHDSI (2020). Observational Health Data Sciences and Informatics. https://ohdsi.github.io/TheBookOfOhdsi/

 

RDA COVID-19 WG (2020). Recommendations and guidelines. Research Data Alliance. https://doi.org/10.15497/rda00052

 

RDA COVID-19 Epidemiology WG (2020). Sharing COVID-19 epidemiology data: Supporting output. Research Data Alliance. https://doi.org/10.15497/rda00049

 

Reuters Editors. Turkey has only been publishing symptomatic coronavirus cases - minister. Reuters. 2020 [cited 2020 Oct 15]; Available from: https://www.reuters.com/article/health-coronavirus-turkey-int-idUSKBN26L3HG

 

Roser, M., Ritchie, H., Ortiz-Ospina, E., & Hasell, J. (2020). Coronavirus Pandemic (COVID-19). Our World in Data. https://ourworldindata.org/coronavirus

 

SDMX (2020). The Business Case for SDMX. SDMX Initiative. https://sdmx.org/?sdmx_news=the-business-case-for-sdmx

 

UN (2018). Overview of standards for data disaggregation. United Nations. https://unstats.un.org/sdgs/files/Overview%20of%20Standards%20for%20Data%20Disaggregation.pdf

 

UN (2020). IAEG-SDGs—Data Disaggregation for the SDG Indicators. United Nations. https://unstats.un.org/sdgs/iaeg-sdgs/disaggregation/

 

UNESCO (2020). Preliminary report on the first draft of the Recommendation on Open Science—UNESCO Digital Library. United Nations Educational, Scientific and Cultural Organization. https://unesdoc.unesco.org/ark:/48223/pf0000374409.locale=en.page=10

 

UNESCO, WHO, HCHR, & CERN (2020, October 27). ​Joint Appeal for Open Science. https://events.unesco.org/event/?id=1522100236

 

Vicente-Saez, R., & Martinez-Fuentes, C. (2018). Open Science now: A systematic literature review for an integrated definition. Journal of Business Research, 88, 428–436. https://doi.org/10.1016/j.jbusres.2017.12.043

 

WHO (2020a). Public health surveillance. United Nations, World Health Organization. https://www.who.int/immunization/monitoring_surveillance/burden/vpd/en/

 

WHO. (2020b). Country & Technical Guidance—Coronavirus disease (COVID-19). World Health Organization. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance-publications?publicationtypes=df113943-c6f4-42a5-914f-0a0736769008

 

WHO. (2020c). Global COVID-19 Clinical Data Platform for clinical characterization and management of hospitalized patients with suspected or confirmed COVID-19. World Health Organization. https://www.who.int/docs/default-source/documents/emergencies/information-sheet-global-covid19-data-platofrm.pdf?sfvrsn=ff1f4e64_2

 

WHO. (2020d). Global COVID-19 Clinical Data Platform. World Health Organization. https://www.who.int/teams/health-care-readiness-clinical-unit/covid-19/data-platform

 

WHO (2020e). Preparing GISRS for the upcoming influenza seasons during the COVID-19 pandemic – practical considerations. United Nations, World Health Organization. https://apps.who.int/iris/bitstream/handle/10665/332198/WHO-2019-nCoV-Preparing_GISRS-2020.1-eng.pdf.

 

WHO (2020f). COVID-19 case definition. https://www.who.int/publications/i/item/WHO-2019-nCoV-Surveillance_Case_Definition-2020.1

 

WHO (2020g). Global Influenza Surveillance and Response System (GISRS). United Nations, World Health Organization. https://www.who.int/influenza/gisrs_laboratory/en/.

 

WHO (2020h). COVID-19 Core Version Case Record Form (CRF). United Nations, World Health Organization. https://media.tghn.org/medialibrary/2020/05/ISARIC_WHO_nCoV_CORE_CRF_23APR20.pdf 

 

WHO (2020i). COVID-19 Rapid Version Case Record Form (CRF). United Nations, World Health Organization. https://media.tghn.org/medialibrary/2020/04/ISARIC_COVID-19_RAPID_CRF_24MAR20_EN.pdf

 

WHO (2020j). WHO Information Network for Epidemics (EPI-WIN). United Nations, World Health Organization. https://www.who.int/teams/risk-communication/about-epi-win

 

 

Review period start:
Wednesday, 28 October, 2020 to Friday, 25 December, 2020

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

Extensive work has been done, and continues to be done, on data interoperability in the technical and information domains. However, a large portion of the challenges in building interoperable information infrastructures results from the interplay between organisations, institutions, economics, and individuals. Collectively, these form the social dynamics that foster or hinder progress towards achieving technical and information interoperability.

These are some of the most difficult challenges to address, and there is currently only a limited body of work on how to address them in a systematic way. In keeping with the mission of the RDA, this group will focus on what is required to build the social bridges that enable open sharing and re-use of data.

The focus of this interest group is to identify opportunities for the development of systematic approaches to address the key social challenges and to build a corpus of knowledge on building and operating interoperable information infrastructures.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

Within Australia, the National Collaborative Research Infrastructure Strategy (NCRIS) set forth the need to establish a National Environmental Prediction System (NEPS). This requires collaboration, coordination, and (most importantly) interoperability between a range of facilities, organisations, and government entities for the system to work effectively. A number of the facilities involved have recently come to the realization that the social dynamics between facilities are a key factor in the success (or failure) of this initiative.

Within the United States, initiatives such as the Pacific Research Platform, the National Research Platform, and the Eastern Regional Network are a few examples of cross-institutional initiatives whose success is dependent as much on social dynamics as on overcoming technical challenges.

The problem exists at smaller scales as well. At the institutional level, the need to drive adoption across IT, IT security, research units, and libraries poses a persistent challenge.

The BoF session held at the 13th Plenary highlighted that similar challenges exist within other research domains.

There are many solutions being applied every day around the world to address these challenges. Many are conceived and developed through the knowledge and experience of the individuals involved. However, at present there is limited systematic knowledge on this topic for practitioners to draw upon.

For example, the RDA itself is an instrument intended to address some of the challenges that exist in the social dynamics across the global research data landscape.  As such it provides both an interesting case study as well as a representative microcosm of the broader challenges in this space.

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

Currently there is no other IG within the RDA with a specific focus on the social dynamics (i.e., the interplay between organisations, institutions, economics, and individuals) relating to interoperable information infrastructure.

The main objectives of this IG are to:

  • Identify organisational, institutional, economic, and individual aspects that increase the friction to achieving information interoperability.
  • Develop a corpus of knowledge, including models, frameworks and patterns that can be applied by practitioners to develop the desired social dynamics that reduce friction and foster information interoperability.
  • Identify and develop case studies of solutions that demonstrate the application of the corpus of knowledge on this topic.  It is acknowledged that often the details of specific case studies could be sensitive and documented case studies may need to be synthesised drawing upon actual cases.

The purpose of this IG is to create the body of knowledge and illustrative case studies for practitioners to be able to equip themselves with the best knowledge to understand the social dynamics that exist in their specific context and to be able to draw on this knowledge to influence positive change. 

                                                                                                   

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

Participation in this IG is open and broad: anyone with an interest in the social dynamics of building interoperable data infrastructures is welcome. Specific skills and knowledge that would be useful for this IG include:

  • Social psychology
  • Organisational behaviour and organisational psychology
  • Economics
  • Legal frameworks
  • Digital anthropology
  • Digital ethnography

It is expected that many of the topics of interest for this IG will overlap to some degree with other IGs and WGs within RDA. This IG will keep those related groups informed of its activity and seek to coordinate with them on topics of common interest. In the future, we could hold joint sessions at plenary events around common topics.

Drawing on the descriptions provided on the RDA website, the following IGs have been identified as potentially having overlapping interests with this IG:

  1. Big Data IG
  2. Biodiversity Data Integration IG
  3. Chemistry Research Data IG
  4. CODATA/RDA Research Data Science Schools for Low and Middle Income Countries
  5. Data Economics IG
  6. Data Fabric IG
  7. Data Foundations and Terminology IG
  8. Data in Context IG
  9. Data policy standardisation and implementation IG
  10. Digital Practices in History and Ethnography IG
  11. Domain Repositories IG
  12. Early Career and Engagement IG
  13. Education and Training on handling of research data IG
  14. ELIXIR Bridging Force IG
  15. Engaging Researchers with Data IG
  16. Ethics and Social Aspects of Data IG
  17. Federated Identity Management
  18. Global Water Information IG
  19. National Data Services IG
  20. Physical Samples and Collections in the Research Data Ecosystem IG
  21. PID IG
  22. Preservation Tools, Techniques, and Policies
  23. RDA/CODATA Legal Interoperability IG
  24. RDA/CODATA Materials Data, Infrastructure & Interoperability IG
  25. RDA/NISO Privacy Implications of Research Data Sets IG
  26. RDA/WDS Certification of Digital Repositories IG
  27. Research Data Architectures in Research Institutions IG

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

There are three primary outcomes of this IG:

  1. Create a community of interest on the social dynamics of interoperable information infrastructures;
  2. Create a corpus of knowledge on the topic;
  3. Identify and develop case studies of solutions that demonstrate the application of the corpus of knowledge on this topic.

Some initial topics that could lead to Working Groups include:

  • Problem and solution patterns in information infrastructure;
  • Governance & participation models;
  • Frameworks for trust;
  • Incentives and disincentives for collaboration and participation;
  • Specific institutional partnerships known to exist, how they came to be, and their varying degrees of success.

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

The group will aim to hold at least one virtual meeting between plenary sessions. It will also establish a mechanism (possibly the mailing list) for offline discussions.

Timeline (Describe draft milestones and goals for the first 12 months):

  • Months 1-6: Research and identify organisational, institutional, economic, and individual challenges to achieving interoperability
  • Months 7-12: Identify case studies
  • Months 12-24: Create the knowledge corpus
  • Months 24+: Apply the knowledge corpus to the case studies

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):

 

FIRST NAME | LAST NAME | EMAIL | TITLE
Kheeran | Dharmawardena | kheerand@cytrax.com.au | Co-Chair
Greg | Madden | gregmadden@psu.edu |
Heidi | Laine | heidi.laine@csc.fi |
Jay | Pearlman | jay.pearlman@fourbridges.org |
Jeremy | Cope | jez.cope@bl.uk |
Kathleen | Gregory | kathleen.gregory@dans.knaw.nl |
Kiera | McNeice | kmcneice@cambridge.org |
Lisa | Raymond | lraymond@whoi.edu |
Maggie | Hellström | margareta.hellstrom@nateko.lu.se |
Stefanie | Kethers | stefanie.kethers@ardc.edu.au |

Review period start:
Tuesday, 20 October, 2020
Body:

Introduction

Data management plans (DMPs) serve as the first step in the research data management (RDM) lifecycle. They aid in recording metadata at various levels during the data description process and are intended to be adapted as a project evolves. Consultations and training on data management plans make clear that research funders and researchers view DMPs very differently. Research funders want to know what happens to the data during and after the project. Researchers, on the other hand, want support in their daily work with data and tend to see the DMP as an additional bureaucratic burden. Creating DMPs is further complicated by the fact that individual disciplines may have very different requirements and challenges for data collection and data management.

 

The full case statement of the WG is attached to this page.

Review period start:
Tuesday, 20 October, 2020 to Friday, 20 November, 2020
