
Please note: The following text is the revised and final Charter dated 10 Jan 2018. It is also attached to this page.

The original Charter can be found at the end of this page.


Name of Proposed Interest Group: Virtual Research Environments IG

                                                                       

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

The vision of the Research Data Alliance (RDA) is of “researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.” The mission of RDA is that it “builds the social and technical bridges that enable open sharing of data.”

Increasingly, researchers who are not co-located are seeking to work together dynamically at various scales, from the local to the global, using the internet to share data, models, workflows, best practices, publications, and the management and administration of their research. The Virtual Research Environments Interest Group (VRE-IG) seeks to build the required technical bridges, skills and social communities that enable global sharing and processing of data across technologies, disciplines and countries through the creation of shared online virtual environments. As these individual VREs grow, they inevitably also need to connect with other major research infrastructures.

 

The goal of the VRE-IG is to identify the technical issues and, where known, share solutions that enable online access to the data and other research assets required to address issues ranging from local challenges (which are also potentially of direct relevance to researchers in other geographical areas or other research domains) to the research grand challenges currently being faced by society on global issues, e.g., societal impacts of climate change; sustainable cities; and environmentally sensitive utilisation of the scarce resources of our planet.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

  1. Domain-specific VREs are being built in individual nationally and regionally funded research projects (e.g., geophysics, environment, hazards mitigation). Although the data sets being accessed are of national extent, can these tools be utilised for development of similar VREs (such as for geophysical inversions, species tracking, flood prediction and mitigation)?
  2. A new group wishes to develop a shared virtual research environment - what are the best practices defined for how to technically build and sustain a VRE?
  3. Building a VRE requires specialised skills - what are those skills and how can they best be shared?
  4. As a VRE grows it will inevitably link with major infrastructure initiatives such as European Open Science Cloud (EOSC), the US Extreme Science and Engineering Discovery Environment (XSEDE) and the Australian National Research Data Cloud (NRDC) – but how to connect to these?
  5. How can a community around online access to and processing of major data resources be built and maintained?
  6. How to access and build gateways to major supercomputer or cloud resources to enable processing of data in data intensive scientific environments?

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

VREs are synonymous with Science Gateways (SGs) in the USA and Virtual Laboratories (VLs) in Australia, and are increasingly being used to support a more dynamic approach to collaborative working across the internet. The proposed VRE-IG will explore all aspects of existing and planned future VRE/SG/VLs with the aim of moving towards common policies and best practices, such as those now being promoted by the European EOSC, the US XSEDE and the Australian NRDC. There is currently no coordination of the development of the underlying architectures, or of the specifications for components and interfaces, in any of these initiatives, nor is there any agreed best-practice way to connect to the major research infrastructures, in particular to connect data to compute resources. Likewise, there is no mechanism for sharing best practice, skills, tools and software that connect tools to data in online environments, which could ultimately allow these individual VREs to interoperate on a global scale. The goal of the VRE IG is to encourage initiatives tasked with developing these technologies to create ‘building blocks’ of common data infrastructures and build specific ‘data bridges’ to enable online sharing and in situ processing of data. The US Science Gateways Community Institute (SGCI, begun in August 2016) is starting to work on these challenges for the US and will closely collaborate with this IG.

 

The VRE IG will aim to act as a longer-term organization responsible for tracking and contributing to the evolution of VRE/SG/VL technologies, particularly as they relate to data access. It will also seek to engage with those making use of these online technologies in an effort to identify the necessary technical aspects, social and community building practices, required skills, as well as governance issues and best practice required to support a more coordinated approach to the development of collaborative environments that enable data sharing and in situ online processing.

 

The proposed VRE-IG is, in effect, an ‘umbrella group’ that brings together:

  1. Those initiatives that are actively developing VRE/SGs/VLs internationally;
  2. Representatives of the common eInfrastructure (eIs) services e.g. EUDAT, EOSC, XSEDE, NRDC, etc.; and
  3. Specific RDA groups (e.g., software citation, metadata IG, Versioning IG, etc.) that are developing outputs which are themselves best practice inputs to research groups developing VREs.

 

The objectives of the VRE-IG are to

  1. Review the state of the art and compare/contrast existing VREs, VLs and SGs;
  2. Ensure associated relevant technologies are highlighted to IG participants so that they are aware of them and understand their potential to enhance their own VRE efforts, particularly those that enhance online access to data and enable in situ processing;
  3. Compare architectures used for VREs that facilitate connecting people to the required resources online (data, tools and compute) (it may be feasible to develop a reference architecture as a dedicated Working Group);
  4. Propose specifications for standard components (software and interfaces) for VRE/SG/VLs;
  5. Propose best practices for VRE/SG/VL development and implementation, in particular definition of best practice for building communities around and sustaining VREs;
  6. Contribute to the SGCI’s scientific software collaborative to build a central information hub for researchers and developers seeking to connect data, tools and compute infrastructures online; and
  7. Suggest policies to stakeholders of VREs in close collaboration with existing foundation projects and initiatives, e.g. VRE4EIC, SGCI, XSEDE, OSG, NRDC, etc.

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

The proposed VRE-IG is domain-agnostic and is relevant to the academic, government and industry sectors. It will bring together experts in data, tools and compute resources. The group already has 92 members, who truly reflect this diversity of interest.

 

The proposed VRE-IG will engage with the relevant IG/WGs including:

  • Software Citation IG
  • Metadata IG: definition of packages of metadata elements appropriate for the VRE/SG/VL
  • Metadata catalogue WG which will potentially provide resources for documenting the metadata used in different VREs
  • Preservation Tools, Techniques and Policies IG
  • Research Data Provenance IG
  • Reproducibility IG
  • Federated Identity Management IG
  • Data Fabric IG
  • Domain groups for use cases, requirements and possible later validation
  • Mapping the Landscape IG

 

In addition, the register of VREs and components of VREs being developed by the SGCI will be entered into the RD-A Mapping the Landscape IG Inventory (https://sciencegateways.org/resources/catalog and https://catalog.sciencegateways.org/#/home).

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

VRE/SG/VLs and associated technologies have matured in the last 10 years as evidenced by the evolution from more one-off, bespoke, single workflow systems developed by a specific set of researchers, to loosely coupled platforms shared by many groups of researchers. If the objectives outlined above for the VRE IG can be achieved it will lead to interoperating VRE/SG/VLs across multiple domains and where feasible, supported by integration of underlying national e-RIs.  The alternative is divergent and heterogeneous systems that will have high maintenance costs and are incapable (or only capable with great effort) of interoperating: these more bespoke, more specialised systems have well known issues of long-term sustainability.

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

The Group will meet twice a year, at each RD-A plenary. Specific VRE sessions will also be held at major domain conferences such as AGU and EGU.

 

Between RDA plenaries the momentum will be sustained via the webpage (https://rd-alliance.org/groups/vre-ig.html) and via teleconferences for specific discussion topics.

 

Timeline (Describe draft milestones and goals for the first 12 months):

The VRE-IG has already met (and has been well attended) at previous plenaries as follows:

 

  1. 7th RD-A Plenary BoF Tokyo: Kick-Off Meeting to establish IG
    Link:
    https://rd-alliance.org/bof-kick-meeting-establish-ig-vre-virtual-research-environment.html.

    Focus: BoF to determine whether we should proceed to an RD-A Interest Group

  2. 8th RD-A Plenary IG Denver: VREs/Virtual Laboratories/Science Gateways - opportunities for developing a more coordinated approach to support interoperability across different systems.
    Link:
    https://rd-alliance.org/ig-virtual-research-environment-rda-8th-plenary-meeting.

    Focus: Discuss Case Statement and present on a variety of VREs

  3. 9th RD-A Plenary IG Barcelona: Virtual Research Environments - coordinating sustainable online research environments across multiple infrastructures
    Link:
    https://www.rd-alliance.org/ig-virtual-research-environment-vre-ig-rda-9th-plenary-meeting.

    Focus: Intercontinental comparison and contrast of VREs/SGs/VLs, particularly with respect to interoperability, community building and sustainability of components of a VRE.

  4. 10th RD-A Plenary IG Montreal: Understanding VREs/SGs/VLs: planning for sustainable collaborative development
    Link: https://www.rd-alliance.org/ig-virtual-research-environment-vre-ig-rda-10th-plenary-meeting.

    Focus: Intercontinental comparison and contrast of VREs/SGs/VLs, particularly with respect to understanding the differences/commonalities of VREs/SGs/VLs and on ensuring sustainability of community VRE platforms once they are built.

 

The format of meetings has been to choose 2 or 3 relevant topics and then present case studies on each topic from European VREs, Australian VLs and North American SGs.

 

For the Berlin Plenary the proposed title is Virtual Research Environments – how do I find them and what skills do I need to build and use them? The focus will be on intercontinental comparison and contrast on (1) preparing catalogs/inventories of VREs and (2) on approaches to developing skills needed to build and to use VREs.

 

At the end of each Plenary session, attendees are asked what their burning issues are for the next Plenary.

 

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):

There are currently 92 members of the VRE IG identified on the RD-A portal (https://www.rd-alliance.org/groups/vre-ig.html). The proposed chairs are the first four members listed below.

 

Current membership includes those directly engaged with the development of VRE/SG/VL technologies but also representatives of those responsible for governance structure of existing individual VRE/SG/VLs and their respective user communities.

 

 

No  First name    Last name       Title
1   Lesley        Wyborn
2   Keith         Jeffery         Prof
3   Sandra        Gesing
4   Helen         Glaves
5   Afonso        Duarte
6   Alessandro    Saretta
7   Alex          Hardisty
8   Anton         Van de Putte
9   Antonio       Rosato
10  Aubert        Landry
11  Ben           Evans
12  Bert          Jagers
13  Brian         Matthews
14  Bridget       Almas
15  Christian     Page
16  Christopher   Brown
17  Clare         Austin
18  Claire        Trenham
19  Cosima        Wagner
20  Daniel        Mietchen
21  Daniele       Bailo
22  Daryl         Grenz
23  David         Morse
24  Denise        Hills
25  Dimitrios     Koureas
26  Ebrahim       Jahanshiri
27  Eva           Mendez
28  Franco        Zoppi
29  Hamish        Holewa
30  Hiela         Pienaar
31  Ingemar       Häggström
32  Johann        Van Wyk
33  Jonathan      Crabtree
34  Jose          Borbinha
35  Julian        Barde
36  Katherine     Lawrence
37  Kheeran       Dharmawardena
38  Lene Krøl     Andersen
39  Leonardo      Candela
40  Leslie        Hsu
41  Luca          Trani
42  Madeline      Huber
43  Maggie        Hellström
44  Malcolm       Wolski
45  Mario J       Silver
46  Mark          Leggott
47  Markus        Stocker
48  Marta         Busse-Wiche
49  Martie        van Deventer
50  Martin        Hammitzsch
51  Massimiliano  Assante
52  Mathew        Fry
53  Merret        Buurman
54  Michael       Jones
55  Michael       Witt
56  Michael       Crusoe
57  Michael       Kahle
58  Michael       Maragakis
59  Michelle      Barker
60  Mingfang      Wu
61  Monique       Crichlow
62  Nancy         Wilkins-Diehr
63  Natalie       Myers
64  Nayiri        Mullinix
65  Oded          Kariti
66  Paolo         Tagliolato
67  Pawel         Ciecieląg
68  Pedro         Goncalves
69  Peter         Fox
70  Plato         Smith
71  Pyrou         Chung
72  Raphael       Levy
73  Raul          Palma
74  Rebecca       Koskela
75  Richard       Grunzke
76  Rob           Hooft
77  Roger         Proctor
78  Roman         Gerlach
79  Rossana       Paciello
80  Sarah         Jones
81  Siddeswara    Guru
82  Silvana       Asteggiante
83  Simone        Mantovani
84  Stephanie     Cheviron
85  Timea         Biro
86  Trudi         Wright
87  Vincent       Smith
88  Weicheng      Huang
89  Yannis        Marketakis
90  Yong          Liu
91  Yulia         Karimova
92  Zhengzhe      Wu


Previous versions of the Charter

  • The original Charter can be found below.
  • Following the initial TAB review of the initial Charter, the group submitted a revised Charter dated July 2017, which can be downloaded here.

 


Original Charter Statement

 

Case Statement

Increasingly researchers who are not co-located are seeking to work dynamically together at various scales from the local to the international. These researchers want to share data, models, workflows, best practice, publications, management and administration of their research etc. This is to address either local challenges which are also potentially of direct relevance to researchers in other geographical areas, or they have a shared interest in addressing a common issue such as the grand challenges currently being faced by society on a global scale e.g. climate change.

Virtual research environments (VREs), synonymous with science gateways in the USA and virtual laboratories in Australia, are increasingly being used to support this more dynamic approach to collaborative working. This has led to a number of regional VRE/SG/VL initiatives such as VRE4EIC, whose goals include to increase the VRE usability for multidisciplinary research and quality of VRE user experiences. Although these systems are seeking to share some of the same resources and common infrastructure services e.g. EUDAT, GEANT, etc., there is no coordination of the development of the underlying architecture that would allow these individual VREs to interoperate.

The proposed VRE IG will explore all aspects of existing and planned future VRE/SG/VLs with the aim of moving towards common policies and best practices, such as those being promoted by the US Science Gateways Community Institute (SGCI) and the Australian Research Data Services (ARDS), and towards common reference architectures as well as specifications for components and interfaces.

Objectives

The proposed VRE interest group would bring together those initiatives actively developing VRE/SGs/VLs and also the representatives of the common infrastructure services e.g. EUDAT, ARDS. It will also seek to engage with those seeking to make use of these technologies in an effort to identify the necessary technical aspects, governance issues and best practice required to support a more coordinated approach to the development of the collaborative environments.

The proposed IG will bring together this experience and evolve towards

  1. Reference architectures for a VRE based on superposition over e-RIs (e-Research Infrastructures) and e-Is (e-Infrastructures);
  2. The definition  of a set of components (software and interfaces) for use in a VRE;
  3. The definition of interfaces between a VRE and e-RIs;
  4. The definition of best practice in constructing VREs; and
  5. Recommendations for policies in e-RIs and e-Is.

 

Value Proposition

VRE/SG/VLs are relatively new concepts and the associated technologies have matured in the last 10 years as evidenced by novel developments of these frameworks.  If the objectives outlined above for the VRE IG can be achieved it will lead to interoperating VRE/SG/VLs (themselves supported by integration of heterogeneous e-RIs that are in turn supported by e-Is).  The alternative is divergent and heterogeneous systems incapable (or only capable with great effort) of interoperating.

 

Activities

The VRE IG will aim to act as a longer-term organization responsible for tracking and contributing to the evolution of VRE/SG/VL technologies. To achieve these objectives the VRE IG will:

  1. Review the state of the art;
  2. Ensure associated relevant technologies are known and understood;
  3. From (1) and (2) propose canonical architectural models for VREs;
  4. Propose specifications for standard components (software and interfaces) for VRE/SG/VLs;
  5. Propose best practices for VRE/SG/VL development and implementation;
  6. Contribute to the SGCI’s scientific software collaborative to build a central information hub for researchers and developers; and
  7. Suggest policies to stakeholders of e-RIs and e-Is in close collaboration with existing projects and initiatives, e.g. VRE4EIC, EVER-EST, SGCI, XSEDE, OSG, ARDS, etc.

 

Relationships with other WG/IGs

The proposed VRE-IG will engage with the relevant IG/WGs that will include:

  • Big Data IG
  • Metadata IG: definition of packages of metadata elements appropriate for the VRE/SG/VL
  • Metadata catalogue WG which will potentially provide resources for documenting the metadata used in different VREs
  • Preservation Tools, Techniques and Policies IG
  • Research Data Provenance IG
  • Reproducibility IG
  • Federated Identity Management IG
  • Data Fabric IG
  • Domain groups for use cases, requirements and possible later validation

 

Participants

There are currently 57 members of the VRE IG identified on RDA portal (https://www.rd-alliance.org/groups/vre-ig.html). Current membership includes those directly engaged with the development of VRE/SG/VL technologies but also representatives of those responsible for governance structure of existing individual VRE/SG/VLs and their respective user communities.

The proposed group is co-chaired by:

  • Keith Jeffery (UK)
  • Helen Glaves (UK)
  • Lesley Wyborn (Australia)
  • Sandra Gesing (USA)

Group Charter versions

For the original version of the Charter, see immediately above.

Following the initial TAB review, the VRE IG submitted a revised consolidated charter (July 2017) - download here  

Final version of the Charter (January 2018) is at the top of the page and can also be downloaded here

Review period start:
Friday, 1 September, 2017

NOTE - Please see the revised version of this Charter Statement attached to the page here. Updated as of 16 June 2017.

 

 

The call for Indigenous data sovereignty (ID-Sov), the right of a nation to govern the collection, ownership, and application of its own data, has grown in intensity and scope over the past five years. To date three national-level Indigenous data sovereignty networks exist: Te Mana Raraunga - Māori Data Sovereignty Network, the United States Indigenous Data Sovereignty Network (USIDSN), and the Maiam nayri Wingara Aboriginal and Torres Strait Islander Data Sovereignty Group in Australia. Similar initiatives are underway in Hawaii and Sweden. Currently, these networks are engaging in an informal and somewhat ad hoc fashion to share information and strategies, hold joint events, and collaborate on research. In the last two years alone this spirit of collaboration has produced four events [1], six joint panel/workshop initiatives [2], and a co-edited book, Indigenous Data Sovereignty: Toward an Agenda. Freely available online, the book had about 2,000 downloads within a month of publication, reflecting the very high level of interest in ID-Sov. These efforts notwithstanding, there are resource and infrastructure constraints to advancing the shared goals and aspirations of these ID-Sov stakeholders. What is needed is a more robust and coherent international collaboration to achieve impactful outcomes at the intersection of Indigenous data sovereignty, Indigenous data governance, and research.

The goals of the International Indigenous Data Sovereignty Interest Group are clearly aligned with the RDA mission of creating a global community to develop and adopt infrastructure that promotes data-sharing, data-driven research, and data use. Those of us already involved in the national-level networks are strong advocates for data-driven research and data use, and are also working in varied ways to build data capabilities beyond academic institutions, so as to benefit Indigenous communities. Through more effective collaboration, we seek to provide a highly visible international platform for ID-Sov that integrates and leverages existing ID-Sov groups to create new opportunities for research and outreach. We also seek to attract new stakeholders beyond our current networks, including researchers, data users and Indigenous communities. To that end all three existing ID-Sov networks have developed strong relationships with Indigenous stakeholders including tribes, non-governmental organisations, Indigenous policy institutes, and researchers.

 

            The International Indigenous Data Sovereignty Interest Group will add value to the RDA and ID-Sov communities through the following objectives:

  1. Serving as a platform that leads to the formation of one or more Working Groups. We envisage that our ID-Sov IG would lead to the establishment of a Working Group, with a focus on co-creating an international indigenous data governance framework founded on ID-Sov principles (see below).
  2. Enabling better communication and coordination across different Working Groups/Interest Groups. One of the important features of ID-Sov is that it has broad relevance and potential for impact across diverse sectors and activities including (but not limited to) agriculture, genetics, archiving, intellectual property rights relating to traditional knowledge, data versioning, and mapping. In addition to sharing strategies and resources within their groups, the IG and WGs will also be in a position to engage with a global community of researchers, policy-makers, and leaders.
  3. Serving to communicate and coordinate the efforts of the national-level Indigenous data sovereignty networks, fostering synergies, bringing new groups/members to RDA and conversely bringing the WGs' activities to the attention of external parties.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

 

     Like other nation states, Indigenous nations need data about their citizens and communities to make informed decisions. However, the information that Indigenous nations have access to is often unreliable, inaccurate, and irrelevant. Federal, state, and local governments have primarily collected these data for their own use. Indigenous nations’ reliance on external data that do not reflect the community’s needs, priorities, and self-conceptions is a threat to self-determination.

The demand for Indigenous data is increasing as Indigenous nations and communities engage in economic, social, and cultural development on an unprecedented level. Given the billions of dollars in research funding spent each year and the increasing momentum of the international big data and open data movements, Indigenous nations and communities are uniquely positioned to claim a seat at the table to ensure Indigenous peoples are directly involved in efforts to promote data equity in Indigenous communities.

The International Indigenous Data Sovereignty Interest Group will provide infrastructure and collaboration to advance the shared goals and aspirations of these ID-Sov stakeholders. In addition, the IG provides a platform at the intersection of Indigenous data sovereignty, Indigenous data governance, and research to educate scholars across disciplines share WG outcomes and outputs with

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

     

Neither an IG nor a WG exists within or external to RDA that focuses on international collaborations on ID-Sov. The International Indigenous Data Sovereignty Interest Group objectives include:

  1. Serving as a platform that leads to the formation of one or more Working Groups. We envisage that our ID-Sov IG would lead to the establishment of a Working Group, with a focus on co-creating an international indigenous data governance framework founded on ID-Sov principles (see below).
  2. Enabling better communication and coordination across different Working Groups/Interest Groups. One of the important features of ID-Sov is that it has broad relevance and potential for impact across diverse sectors and activities including (but not limited to) agriculture, genetics, archiving, intellectual property rights relating to traditional knowledge, data versioning, and mapping. In addition to sharing strategies and resources within their groups, the IG and WGs will also be in a position to engage with a global community of researchers, policy-makers, and leaders.
  3. Serving to communicate and coordinate the efforts of the national-level Indigenous data sovereignty networks, fostering synergies, bringing new groups/members to RDA and conversely bringing the WGs' activities to the attention of external parties.

 

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

The International Indigenous Data Sovereignty Interest Group will link Indigenous data users, leaders, information and communication technology providers, researchers, policymakers and planners, businesses, service providers, and community advocates together to provide a highly visible international platform for ID-Sov that integrates and leverages existing ID-Sov groups to create new opportunities for research and outreach. To that end all three existing ID-Sov networks have developed strong relationships with Indigenous stakeholders including tribes, non-governmental organisations, Indigenous policy institutes, and researchers. We propose to use the RDA Plenary as an opportunity to establish relationships and connections with other IGs. We also seek to attract new stakeholders beyond our current networks, including researchers, data users and Indigenous communities. Note that IG members need not be Indigenous, so long as they are interested in furthering the aims of ID-Sov, data governance toward ID-Sov, and data-driven research.

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

The International Indigenous Data Sovereignty Interest Group envisions three categories of outcomes:

 

  1. Working Groups. We envisage that our ID-Sov IG would lead to the establishment of Working Groups, with a focus on:
    1. Co-creating an international indigenous data governance framework founded on ID-Sov principles and

    2. Establishing an international collaborative funding proposal to Indigenous stakeholders in order to design a clear pathway from research to impact.
  2. Enabling better communication and coordination across different Working Groups/Interest Groups. One of the important features of ID-Sov is that it has broad relevance and potential for impact across diverse sectors and activities including (but not limited to) agriculture, genetics, archiving, intellectual property rights relating to traditional knowledge, data versioning, and mapping. In addition to sharing strategies and resources within their groups, the IG and WGs will also be in a position to engage with a global community of researchers, policy-makers, and leaders.
  3. Serving to communicate and coordinate the efforts of the national-level Indigenous data sovereignty networks, fostering synergies, bringing new groups/members to RDA and conversely bringing the WGs' activities to the attention of external parties.

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

 

The International Indigenous Data Sovereignty Interest Group will use the following mechanisms for communication and collaboration.

  • Monthly virtual meetings via video conference, shared documents, etc.
  • Informal and frequent email contact among chairs and workgroups.
  • Monthly listserv messages to IG members from the chairs about IG updates, WG efforts, etc.
  • Biannual RDA Plenaries
  • Listserv and Facebook group where members may post about Indigenous data sovereignty and data governance resources, current events, and conferences.

 

 

Timeline (Describe draft milestones and goals for the first 12 months):

The International Indigenous Data Sovereignty Interest Group will commence via virtual chair meetings. In addition, varying combinations of IG co-chairs and members have collaborative events [1] and panels [2] planned in March, April, and May 2017. These events will launch the WG1 effort to co-create an Indigenous data governance framework.

 

 

Activity                                4/17   5/17   6/17   7/17   8/17   9/17   10/17  11/17  12/17  1/18   2/18   3/18
Virtual Chair Meetings                  X      X      X      X      X      X      X      X      X      X      X      X
Listserv message                        X      X      X      X      X      X      X      X      X      X      X      X
Workgroup 1: Framework                  -      X      X      X      X      X      X      X      X      X      X      X
Workgroup 2: Research Collaborations    -      -      -      X      X      X      X      X      X      X      X      X
RDA Plenaries                           -      -      -      -      -      X      -      -      -      -      -      -

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):

The International Indigenous Data Sovereignty Interest Group co-chairs include representation from three continents (Europe, North America, Oceania), four countries (Australia, New Zealand, Sweden, United States) and four disciplines (demography, history, public health, sociology). The initial IG members expand the disciplines to include business administration and Indigenous studies.

 

Footnotes:

[1] These are: International Workshop on Data Sovereignty for Indigenous Peoples: Current Practice and Future Needs, Canberra, July 2015 (hosted by John Taylor and Tahu Kukutai, sponsored by Academy of Social Sciences in Australia); Indigenous open data summit, Madrid, October 2016 (hosted by USIDSN, sponsored by the Native Nations Institute and the International Open Data Conference); Indigenous Data Sovereignty Summit, Auckland, November 2016 (hosted by Te Mana Raraunga, sponsored by Swedish Research Council, Wallenberg Academy Fellows, and Ngā Pae o Te Māramatanga Māori Centre of Research Excellence); Indigenous Data Governance, Los Angeles, May 2017 (hosted by USIDSN, sponsored by the Native Nations Institute and the University of California Los Angeles).

 

[2] A joint panel ‘Indigenous Data Governance and Open Data Futures’ at the 3rd International Open Data Conference, Ottawa, May 2015; a joint panel ‘In pursuit of Indigenous data sovereignty: Directions and challenges’ at the Native American Indigenous Studies conference, Hawaii, May 2016; a collaborative workshop ‘Indigenous Data and Information Sovereignty: Making Open Data work for Indigenous Peoples’ at the RDA Eighth Plenary, September 2016; a joint panel ‘Indigenous + Data’ at the 4th International Open Data Conference, Madrid, October 2016; a masterclass on Indigenous Data Sovereignty at the Common Roots Indigenous Governance conference, Brisbane, March 2017; a joint panel ‘Indigenous Nation Data Governance: Data for Nation Rebuilding’ at the National Congress of American Indians mid-year Policy Research Center data pre-conference, Uncasville, CT, June 2017.

 

 

 

 

Review period start:
Sunday, 19 March, 2017
Body:

Case Statement: PID Kernel Information profile management WG

(aka PID Kernel Information WG #2)

 

Co-chairs: Tobias Weigel, Beth Plale, Jens Klump

 

Motivation and scope

The PID Kernel Information WG produced a recommendation that contains i) guiding principles for identifying information appropriate as PID Kernel Information, ii) an exemplar PID Kernel Information profile, iii) several use cases, and iv) architectural considerations. PID Kernel Information is defined as the set of attributes stored within a PID record. It supports smart programmatic decisions that can be accomplished through inspection of the PID record alone.
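As an illustration of a decision made by inspecting the PID record alone, the following is a minimal Python sketch. It is not taken from the WG recommendation: the attribute names (objectType, sizeBytes, profile), the PID strings and the should_fetch helper are all hypothetical and serve only to show the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class PidRecord:
    """A PID together with the Kernel Information attributes stored in its record."""
    pid: str
    kernel_info: Dict[str, str] = field(default_factory=dict)


def should_fetch(record: PidRecord, accepted_types: Set[str], max_bytes: int) -> bool:
    """Decide from the record alone, without touching the object, whether to retrieve it."""
    ki = record.kernel_info
    # Hypothetical attribute names; a real profile would define its own.
    if ki.get("objectType") not in accepted_types:
        return False
    size = ki.get("sizeBytes")
    if size is None:
        # Without a declared size we cannot decide, so be conservative.
        return False
    return int(size) <= max_bytes


# Example record; the PID and profile identifiers are placeholders.
record = PidRecord(
    pid="prefix/example-0001",
    kernel_info={
        "objectType": "netcdf",
        "sizeBytes": "2048",
        "profile": "prefix/example-profile",
    },
)
decision = should_fetch(record, {"netcdf"}, 1_000_000)
```

In this sketch the client never dereferences the object itself; everything it needs is in the record, which is the property that a shared, well-governed profile is meant to guarantee.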

At the RDA P13 PID Kernel Information WG session, attendees expressed enthusiasm over the PID KI work. While the WG feels that the exemplar profile reflects a consensus decision and satisfies the guiding principles, it does not preclude other profiles. However, a global technology, as the PID KI intends to be, will not be adopted outside leading-edge (i.e. research) settings unless it has considered organizational governance and management.

Thus the PID KI WG agreed at the conclusion of the P13 session that there was a need to examine the governance and management of some small number of globally relevant PID KI profiles. Without further guidance on how to manage profiles, there is the risk of wide-scale proliferation of overlapping or incompatible profiles, which could significantly hamper long-term realization of coherent middleware that supports Kernel Information. As the PID KI depends on a type profile to be interpretable, it requires the existence of a Data Type Registry to store the type definition of the PID KI. In fact, both PID KI and DTR are part of the same digital object ecosystem. Thus the issue of governance must be coordinated with the DTR#3 group. A follow-on WG additionally presents an opportunity to further define the boundary conditions for such middleware and foster alignment across disciplines and regions.

The WG lives in the context of

  • Multiple use cases from disciplinary adoption and project work

  • Relevant work inside RDA (other WG/IG) and outside RDA (W3C PROV, SemWeb/LD in general)

 

Objectives
 

The WG will work towards the following objectives:

  1. Life cycle model defined for KI profiles and mechanisms (principles, processes, tools) through which KI profiles can be defined and governed. Define profile metadata and how to encode a profile.

  2. Baseline KI architecture extended to match the needs of profile management, for example, by including profile registries and connectors.

  3. Describe the technical interface for interaction with profile registries, most likely based on the DTR WG recommendation and API. Preferably, a separate new API does not need to be defined.

  4. As the PID KI depends on a type profile to be interpretable, and both PID KI and DTR exist in the same digital object ecosystem, the governance and management of PID KI profiles should be synergistic with the governance and management of Data Type Registries. The objective is coordinated guidance on governance between this WG and the DTR#3 group.

  5. More generally, facilitate coordination with the other RDA groups mentioned further below.

 

Value proposition

 

Cyberinfrastructure providers will benefit from the architectural reference model, which makes collaboration for development and use of shared infrastructure components using PID KI profiles easier. The governance mechanisms provided by the WG are a necessary element for serving KI profiles in operational settings.

Tools and services builders will benefit from the availability of well-defined, well-managed and interoperable KI profiles as they can rely on agreed profiles to underpin specific tasks in data workflows. A common approach for interfacing with profile registries can reduce development costs, for example by sharing software library development for registry clients.

As a result of the combined efforts by cyberinfrastructure providers and service/tool builders, scientific users will benefit from the better availability of information across data life cycle stages through the connections made by the PID KI graph. The adoption of profile management can make the links in the graph more coherent.

Data producers, particularly if organized in research infrastructures, could benefit from having a wide range of KI profiles available, with ensured interoperability between them and related services. Potential needs and usages will differ between research disciplines, but knowing which profiles are supported by which repositories could help in the process of identifying the optimal storage and cataloguing facility for a given dataset.

 

Stakeholders and adoption
 

The topic is of relevance to multiple stakeholders, which will be sought out. Initially, these are:

  • The NSF-funded eRPID project: The predecessor RPID project evaluated and prototyped usage of the early Kernel Information recommendation and its findings within the project context. The eRPID successor project will continue this and provide two-way interaction on practical use cases and needs, and be informed by the PID KI WG outcomes.

  • EUDAT: The operational B2HANDLE service relies on Kernel Information as an essential element to facilitate cross-service integration at e-infrastructure level, largely hidden from users, in line with the KI vision. The latest integration activities (e.g. B2SHARE, OneData) introduce a complexity level where profile governance would be highly beneficial to support long-term stability.

  • Similar to EUDAT, ePIC/GWDG take further interest in the KI concept to underpin identifier-level metadata management with a stable conceptual framework, and participate in the discussion of governance processes from the perspective of a potential adopter of these processes at the ePIC management level.

  • International GeoSample Number (IGSN): IGSN interest in the KI concepts was renewed at and after the P13 session. IGSN may benefit from KI to provide better integration with e-infrastructure services, and a discussion on profile governance may be a key requirement for adoption actions.

  • DiSSCo: As a new research infrastructure in Europe connecting the natural science collections of more than a hundred collection-holding institutes across Europe, DiSSCo has a potential interest in the KI concept to inform its future e-infrastructure strategy. DiSSCo is in conversation with IGSN to develop a joint Handle-based approach for specimen object identifiers, and this has the potential to lead to billions of new Handles. The KI concepts and governance discussions are relevant and crucial for long-term stable e-infrastructure services in this context.

  • Deutsches Klimarechenzentrum (DKRZ): DKRZ has introduced PID management services into the wider IS-ENES/ESGF CMIP6 climate data infrastructure, which are now based on PID Kernel Information and other related RDA outcomes. Notably, the recommendations of the Research Data Collections WG have also been put into practice, and the resulting solution may also inform KI discussions. With the first generation of these services operational, first practical experience indicates that integration beyond the e-infrastructure and/or community boundary requires solid KI profiles, and that such integration may easily fail if governance is not addressed. DKRZ follows these discussions with a specific view on the future IS-ENES infrastructure and potential next generation of services for CMIP7.

To facilitate adoption in practice, the group’s work needs to be informed by specific exemplary cases from these stakeholders, ensuring both that the group’s work is grounded in practical reality and that the adoption barrier is lowered, turning stakeholders into ‘guinea pigs’. The following examples illustrate where the availability of profiles and their management have or can have benefits in actual usage:

  • eRPID example: The RPID demonstrator uses the PID KI exemplar profile already registered in DTR. This will be further extended as part of eRPID. The findings of RPID/eRPID in a prototypical service environment can inform the PID KI discussions for matching practical use cases and infrastructure needs.

  • EOSC example: The constituent technical services of the European Open Science Cloud (EOSC), notably from EUDAT and EGI, could benefit from better structure of PID records through profiles and the organizational streamlining that adherence to profiles fosters. Profiles can be specific and characteristic for different life cycle phases (preliminary data sharing, data archival). In particular, the ECAS data processing service is prototyping basic data lineage tracking through a basic data input-processing-output workflow. The processing environment needs to become aware of an object’s context (e.g., is it shared preliminary data or archived data?) to a) inform the user of any underlying implications for their data analysis; b) record provenance with confidence.

  • Service orchestration example: This is a use case discussed in past iterations of the DTR WG with wider general applicability in service-oriented architectures compliant with the Data Fabric model. If data objects and service objects receive PIDs and PID KI, then it may be required to filter out services from among all objects in order to let an orchestrator determine precisely which objects can be put to which services and how service chains can be generated dynamically. Profiles are an essential ingredient to make the filtering work.
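The filtering step in the service orchestration example can be sketched as follows. This is only an assumption-laden illustration in Python: the profile identifiers, the acceptsTypes attribute and both helper functions are hypothetical and are not defined by the DTR or PID KI recommendations.

```python
from typing import Dict, List

# Hypothetical profile identifiers; real ones would be PIDs registered in a registry.
SERVICE_PROFILE = "hypothetical/service-profile"
DATA_PROFILE = "hypothetical/data-profile"


def group_by_profile(records: List[dict]) -> Dict[str, List[str]]:
    """Group PIDs by the profile declared in each record's Kernel Information."""
    grouped: Dict[str, List[str]] = {}
    for rec in records:
        grouped.setdefault(rec.get("profile", "unknown"), []).append(rec["pid"])
    return grouped


def candidate_services(records: List[dict], data_type: str) -> List[str]:
    """Return PIDs of service objects that declare they accept the given data type."""
    return [
        rec["pid"]
        for rec in records
        if rec.get("profile") == SERVICE_PROFILE
        and data_type in rec.get("acceptsTypes", [])
    ]


# A mixed set of data objects and service objects, as an orchestrator might see them.
records = [
    {"pid": "p/data-1", "profile": DATA_PROFILE, "objectType": "netcdf"},
    {"pid": "p/svc-1", "profile": SERVICE_PROFILE, "acceptsTypes": ["netcdf"]},
    {"pid": "p/svc-2", "profile": SERVICE_PROFILE, "acceptsTypes": ["csv"]},
]
groups = group_by_profile(records)
matches = candidate_services(records, "netcdf")
```

The sketch only works because every record names a profile that tells the orchestrator how to read it, which is why the example above treats profiles as an essential ingredient for the filtering.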

 

Relation to other RDA efforts

 

As reflected in objective #4, the group’s work should be closely coordinated with DTR #3 as this group deals with a specific application case for DTR, and the PID KI and DTR both exist within the same digital object data ecosystem (one that builds upon, but is not limited to, the Handle system for PID resolution).

In addition, the use cases and stakeholder perspectives must be further taken in by connecting to the PIDs for Instruments WG, the emerging group on Interoperability of Observable Property Descriptions, the Biodiversity Data Integration IG and the PID IG. The latter is of specific importance as a forum to engage new adopters, both from use and added-value perspectives and from infrastructure and service provisioning perspectives. In this context, discussions between FREYA and RDA as part of the PID IG group are relevant to the Kernel Information context, and will be actively followed by the group.

Within the Instrument PID group scope, metadata under consideration is primarily based on the DataCite kernel, which is much more extensive and geared towards different usage scenarios than PID KI. However, there is also ongoing discussion about implementing a more “generic Handle-based registry” profile. As such, it may form a candidate profile for getting instruments and their products into the PID graph and enabling filtering per instrument/measurement campaign/sample ‘type’ (physical, virtual, or their subtypes). A discussion along these lines will be taken up during early WG lifetime.

 

Work plan and milestones

 

The WG will first assemble requirements for profiles and supporting services. Then, it will work towards the objectives in parallel.

Assuming a first formal session at P14, the group work aligns with the following milestones:

  • P14: Presentation of group scope and goals to RDA community, engagement with additional use cases, discussion of requirements for profile management.

  • P15: Analysis of use cases and requirements complete and first draft of boundary conditions for profile management presented. Intake of community input to profile management and governance principles, review of possible frameworks.

  • P16: Presentation of first draft of full profile management framework and readily adoptable profiles. Discussion of implementation gap analysis.

  • P17: Delivery of outputs.

In the end, the WG will deliver a recommendation for KI profile management, example profiles defined together with potential adopters, and a profile registry interface specification (if needed).

 

Initial supporters

 

Tobias Weigel, DKRZ
Beth Plale, IU
Larry Lannom, CNRI
Ulrich Schwardmann, GWDG
Jens Klump, CSIRO

Mark Parsons, RPI
Maggie Hellström, Lund University

Review period start:
Tuesday, 30 July, 2019 to Friday, 30 August, 2019
Body:

NOTE - This Case Statement has been deprecated in favor of the revised version attached here (18 Dec 2017).

 

 

Charter

Overview

Tracking provenance for research data is vital to science and scholarship, providing answers to common questions researchers and institutions pose when sharing and exchanging data.

 

The tasks for this Working Group focus on finding, detailing and recommending best practices for provenance representation and management.

 

This group will conduct its work in the manner of a business analysis task: identifying business needs and determining solutions to business problems. Since RDA WGs are not themselves research groups (rather groups of researchers and research agencies), this group will look for existing practice and re-present that for use rather than generate new practice.

 

The six activity areas of the Working Group will be:

  1. Common provenance Use Cases
  2. Provenance design patterns
  3. Sharing provenance
  4. Strategies for enterprise provenance management
  5. Tools for provenance
  6. Provenance data collections

Deliverables

The deliverables for this Working Group are separated into three time-based cohorts as below. Short-term goals are mostly about seeking existing practice. Medium-term goals are about determining possible output forms for the activity areas, long-term goals about delivering those outputs, and after-term goals about ensuring continued custodianship of the outputs, where required.

 

Medium-term (M12)

  1. A provenance use case recording system.
  2. An initial collection of provenance use cases, elicited from other interest groups and working groups.
  3. First documented provenance design patterns generalised from use cases.
  4. A report on investigation of provenance sharing implementations.
  5. A review of existing enterprise provenance management implementations.
  6. A listing of provenance tools compiled from interviews with RDA members and the provenance research community.
  7. A directory of open and non-open provenance data collections.

Long-term (M18)

  1. A taxonomy for provenance use cases.
  2. Recommendations for aligning new use cases with provenance design patterns.
  3. Lessons for provenance sharing and enterprise management implementations.
  4. A synthesis and critical comparison of community recommendations for provenance tool custodianship.
  5. A summary of best practice principles for provenance data collection stewardship.

After-term (M18+)

  1. A sustainability plan for ongoing tool and data collection custodianship.

Value Proposition

Effective provenance management is sought by many members of the RDA and wider science data community. We propose a working group to help those members adopt existing provenance management practice. This help will take the form of documenting provenance use cases (centralising a list of them and generalising them to reveal common ones); documenting existing technical and business processes for provenance management; assisting organisations with sharing provenance; and listing existing sources of real provenance information.

 

We propose a working group on provenance patterns.

  • The patterns should relate to core RDA interests, perhaps data/data and data/people relationships.
  • Provenance vocabularies offer a level of generality/specificity that addresses what we perceive to be implementation gaps.
  • Our goal: constructive engagement with and response to published RDA recommendations.

WG activity points

  1. Common provenance Use Cases
    • Use Cases for provenance data or systems are often articulated in terms understood by a particular community; however, in our group's experience, many provenance Use Cases are differently worded instances of general Cases.
    • The establishment of a published set of UCs would allow people to compare their UCs with known UCs for which recommended implementations and other patterns may already be known. It will also allow people to consider provenance UCs posed by others that may be of future interest to them.
  2. Provenance design patterns
    • Some ways of doing things in provenance are better than others. This activity is to generate provenance design patterns (for any provenance task such as representation, transmission, use etc.) perhaps in response to a series of provenance use cases that we would generate.
    • The patterns should relate to core RDA interests, perhaps data/data and data/people relationships.
  3. Sharing provenance
    • This may only be a single class of provenance Use Cases but it is one that is less maturely answered by the provenance research community than, say, provenance representation. This activity might be to generate requirements for the research community to answer or perhaps find that no more research is needed for sensible recommendations for provenance sharing.
  4. Strategies for enterprise provenance management
    • Some provenance use cases apply to whole organisations (or consortia) and some organisations (or consortia) may already have experience in implementing solutions to them. This activity will list such Use Cases and seek descriptions of implemented or proposed solutions from members.
  5. Tools for provenance
    • In addition to several well-known provenance conceptual models, there are tools to assist with the management of provenance. We will list those tools with comparisons in relation to RDA interests (perhaps taken from IG and other WG members).
    • We will also seek to establish a mechanism to keep these tool lists up-to-date beyond the life of the WG.
  6. Provenance data collections
    • The provenance research community knows, through direct communication and research papers, that provenance ontologies and tools are in use, but it is only anecdotally aware of many current provenance datasets (i.e. whole datasets of provenance information) and has not yet counted datasets linking to standardised provenance information. In order to know the state of operational systems' adoption of provenance models, and to provide access to public provenance data for both education and actual use, we will list as many current provenance datasets as we can find, owned by RDA members and others, as well as catalogues of datasets linking to standardised provenance information.
    • We will also seek to establish a mechanism to keep this listing up-to-date beyond the life of the WG.

Engagement

In addition to serving the RDA community directly, this Working Group aims to serve the immediate interests of existing RDA groups. Provenance is foundational to many other RDA groups' activity and thus maximal impact on the RDA community can be achieved by aligning and assisting work in existing groups. Therefore this working group will engage heavily with other groups and source its primary requirements and exemplars from other groups. Examples of intersections we believe will be productive include the following:

  • Publishing Data Workflows WG: Interest in workflow persistence and quality control, data deposit and citation, reference models and implementation.
  • Dynamic Data Citation WG: Interest in a conceptual model for citation fidelity despite changes over time.
  • PID Information Types: The Use Case "A.10 Provenance tracing."
  • Reproducibility IG: The role of provenance models in support of replication.
  • PID IG: Requirements for PIDs to maintain provenance content.
  • Archives and Records Professionals for Research Data IG: Need for semantic understanding of archived material.
  • Data Discovery IG: Upper ontology elements relevant to data discovery.
  • Preservation e-Infrastructure IG: Semantic content of preserved data holdings.

Work Plan

Timeline

  • Dec 16: Identify initial set of focus areas and discuss.
  • Feb 17: Draft case statements distributed.
  • Apr 17: Discuss draft case statements and formation of WGs at Plenary 9.
  • May 17: Finalize and circulate case statements for WGs.
  • May - Oct 17: Short-term goals.
  • Sep 17: Meeting WGs and summary of activity at Plenary 10.
  • Nov 17 - Apr 18: Medium-term goals.
  • Apr 18: Group health check Plenary 11.
  • May 18 - Oct 18: Long-term goals.
  • Sep 18: Final group Plenary 12.
  • October 18+: After-term goals.

Initial Membership

Co-chairs: Nicholas Car and David Dubin

 

Membership will be sought from the Provenance IG and supplemented with a call to both other RDA groups and known non-RDA provenance communities, such as the provenance research community.

 

Since this group's work is likely to be highly relevant to, or even directed by, other WGs, it may be sensible to have other WG members attend this group's meetings either in a liaison role or as members in their own right.

 

Adoption

Other RDA groups

This WG proposal is engagement-driven, primarily with other RDA groups, thus it is in them that we expect to see initial adoption.

 

Where another RDA group presents us with a provenance use case, we hope to either:

  • associate that use case with a generic use case and thus a pre-made generic resolution
  • provide a direct provenance pattern-based resolution

In either case, we hope to promote a pattern that the RDA group will adopt and promote to its members.

 

Prov WG member institutions

Most current RDA Provenance IG members are likely to become Prov WG members, for their interest in provenance is adoption, and that too is the WG's goal. It is likely that outputs from this group, having been generated by its members in response to their direct needs and similar needs of other RDA groups, will therefore be fed back into their home institutions for adoption there.

 

Non-RDA groups

The international provenance research community is in contact with many potential consumers of provenance patterns due to their profile as experts on provenance. The potential consumers don't always receive the advice they are seeking due to differences between their aims and those of the research community. The research community needs to push the provenance envelope forward and not dwell on previous work, even when that work may contain patterns perfectly suited to the potential consumers' needs.

 

The Provenance IG has, and the WG will have if membership proceeds as expected, good contacts with the international provenance research community. Several IG members have made substantial contributions to provenance research initiatives such as the Open Provenance Model, ProvONE and the PROV W3C standard, and have presented at many recent provenance conferences such as IPAW 2014 & 2016, TaPP 2015, TaPP 2017 (coming), Classification Soc 2017, ISWC 2016 and IDCC 2017.

 

The full listing of the IG's involvement in provenance conferences is available on the RDA Prov IG wiki:

Continued adoption

Some of the outputs from this WG are targeted at continued adoption over time. This proposal includes a deliverable for a "sustainability plan for ongoing tool and data collection custodianship" after having initially established a provenance use case database, listings of provenance tools and provenance datasets. Such a plan is currently missing from the international provenance community despite widespread recognition that it would be useful. This was recognised at IPAW 2016 independently of any RDA involvement.

 

It is expected that at the conclusion of this WG, the current provenance IG will have some role in the custodianship of its outputs.

Review period start:
Monday, 6 March, 2017
Body:

WG Charter:

The Certification and Accreditation for Data Science Training and Education WG will engage in a yearlong exercise to map existing certification and accreditation schemes from around the world that apply to courses of all varieties relating to Data Science and other data-related themes. The survey will encompass undergraduate, postgraduate, professional development and other non-academic pathways. We will include modules, courses and programmes, from both public and private bodies. At the end of 12 months the Working Group will deliver a comprehensive report comparing and analyzing a broad sweep of the options together with recommendations for how best to serve the interests of future Data Science and data-related education and training.

We propose that the WG has a shorter day-to-day title of Certification and Accreditation WG.

 

Value Proposition

Certification and accreditation matter to the RDA community as they have a direct bearing on the career pathways of nearly all members. This is particularly true of this community with its novel and interdisciplinary professional pathways. Not only are Data Science and data-related fields creating new professions but they are necessitating a common classification of competences and skills so that managers and policy makers can make sense of the different traditions that are coming together in today’s hybrid qualifications and professions.

The report that this WG will deliver will therefore be of significant value to managers, leaders, policy makers, educators and trainers as well as, crucially, many RDA members who are themselves pioneering such novel data-centric careers. Ultimately, establishing a fully recognised international Data Science profession would most likely take ten years; it is in the interests of both current and future RDA members and many others that this activity commences as soon as possible.

 

Engagement with existing work in the area

The original idea for this WG came from a series of discussions within the existing Interest Group Education and Training on handling of research data. A further stimulus was the work that is underway within the EU-funded EDISON project, which was established to accelerate the creation of Data Science and other data-related professions. EDISON is particularly committed to promoting certification solutions for Data Science but is also interested in supporting work on promoting course accreditation.

Whilst there are no other RDA Groups looking at this specific topic, we see a great opportunity to connect with other RDA groups representing other traditional employment areas including: libraries, archives, security and specific academic and professional domains.

Beyond the RDA activities we are aware of various initiatives to take these ideas forward. These include the Mozilla Open Badges scheme: https://openbadges.org.

Discussions so far have identified three key areas that need to be addressed:

  1. The challenges presented by data-driven approaches to scientific research are evolving across traditional disciplines:
    1. Interdisciplinary teams will increasingly necessitate complementary skills.
    2. As these skills are cross-cutting and interdisciplinary, traditional qualifications will only go so far.
  2. In the professional world, university diplomas are not always sufficient to prove that a candidate can apply the knowledge in a given working environment.
  3. Who and what bodies will provide the accreditation and certification for these new and evolving courses and modules?
    1. They will need to be trusted and valued
    2. They will need a past and a future

Work Plan:

A work plan, following standard project management procedures, will be developed to deliver and expand the following elements:

Key deliverables:

  1. M1 - updated workplan, target communities that we should consider for connections
  2. M2 - horizon scanning document highlighting professional areas to be covered, related data-science and data-related (e.g. data librarian) professional pathways, and case studies of examples (e.g. badging)
  3. M3 - presentation of progress to date at 9th Plenary, Barcelona, encourage greater engagement
  4. M6 - draft report, progress to date, opportunities/issues identified
  5. M8 - 10th Plenary, Montreal: share preliminary report with other RDA groups and external bodies to gather further input and comment for preparation of final report
  6. M12 - A comprehensive report comprising an extensive catalogue of Data Science courses, modules and programmes capturing their certification schemes and, where applicable, the accreditation of the bodies offering the courses 

Activities:

  • The WG will build a community of individuals with interests in this area but coming from diverse backgrounds both geographically and professionally. We appreciate that not all participants will be able to commit to the fieldwork, but we hope they will provide insight and critical reasoning into the work in progress. The WG will report regularly, informally through social media and blogs, and formally through bi-monthly reports of draft elements of the final report.

  • As this is a fast-track twelve-month taskforce, we will meet face-to-face between Plenaries and also hold bi-monthly telephone conferences in order to maintain momentum.

  • The initial drive for the WG came from the pre-existing work that has started at a smaller scale within the EDISON project. However, the Denver BoF demonstrated that there is a strong broader interest in this topic from both around the world and also across many traditionally different disciplines. Clearly, there are many who now realize that close collaboration and interchangeable professional development is inevitable. We therefore aim to build the number of co-chairs to 4 quite swiftly in order to maintain a collegiate leadership approach.
  • As mentioned, the BoF demonstrated a broad spectrum of interest from others already quite active in other groups. We will build on this and aim to convene a Liaison group within the WG of RDA members who also happen to hold administrative roles within other RDA groups and who can therefore play a communication role between two groups.

Adoption Plan:

The WG will deliver a comprehensive report after 12 months.

We anticipate that the report would include a strong recommendation for a follow-on WG designed to take the work forward, further exploit the lessons learned, and continue to benefit the broad RDA community in terms of their professional development.

Furthermore, part of the legacy planning for the EDISON project involves delivering a functioning community portal offering a number of services to support Data Science and other data-related professions. The report on certification and accreditation would complement such services in a manner that could benefit all parties.

Initial Membership:

The following people attended the BoF meeting held at the 8th Plenary in Denver and indicated an interest in supporting and participating in the proposed WG:

  • Steve Brewer, EDISON, University of Southampton, UK
  • Małgorzata Krakowian, EDISON, EGI-Foundation, NL
  • Hugh Shanahan, Royal Holloway University, UK
  • Freyja van den Boom, FutureTDM,
  • Leighton Christiansen, National Transportation Library, US
  • Natasha Simons, Australian National Data Service, AU
  • Vicky Lucas, Institute for Environmental Analytics, UK
  • Kathrin Beck,
  • Christopher Jung,
  • Pete Pascuzzi, Purdue University Libraries, US

Others interested in participating in a WG are:

  • Prof Jeremy Frey, Chemistry, University of Southampton
  • Edit Herczog
  • Dr Liz Lyon, Visiting Professor, School of Information Sciences (iSchool), University of Pittsburgh

Working Group Chairs:

Co-chairs: Steve Brewer (University of Southampton, EDISON project), Małgorzata Krakowian (EGI Foundation, EDISON project)

We are keen to expand to 4 co-chairs once up and running. We have had interest from a number of other RDA members from regions beyond Europe, but none so far have been able to commit to a co-chair role due to existing demands on their time. Please do get in touch if interested in taking on this role.

Both co-chairs are experienced and qualified project managers and have good experience of the RDA ways of working having attended a number of previous Plenaries over the last few years.

Review period start:
Monday, 30 January, 2017
Custom text:
Body:

Provenance

Research data provenance is: information about the inputs, entities, systems, and processes that influence data of interest. It can be recorded and stored and can provide a historical record of the data and its origins.

 

Group Purpose

The Research Data Provenance Interest Group exists to coordinate activities to do with provenance within the RDA. The group has been in existence for a number of years and has links to many other RDA groups.

 

Group's work in 2017

In 2017, the major undertaking of the RDP IG is to form a provenance working group with specific objectives and an 18-month lifespan. The intention is for that group to be formed at Plenary 9 in April, 2017. The Case Statement for that WG can be found here:

Other goals for the IG in 2017 are:

Review period start:
Tuesday, 10 January, 2017
Custom text:
Body:

NOTE - The following Charter text has been revised, see the attached document - 29 Jan 2018

 

 

RDA Interest Group Draft Charter Template
Name of Proposed Interest Group: Preservation Tools, Techniques, and Policies

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

The Preservation Tools, Techniques, and Policies (PTTP) IG provides a forum to bring together domain researchers, data and informatics experts, and policy specialists to discuss such issues as:

  • What data/software/artifacts/documentation (hereafter referred to as “knowledge products”) should be preserved for sharing, re-use, and reproducibility for a given research domain? For other domains?
  • What tools are available for researchers to preserve these elements in a manner that does not obstruct or hinder their research?
    • What are the strengths and weaknesses of these tools?
    • Are there common features that could allow tools from one domain to be re-used elsewhere?
    • Are there tools that archives/repositories could provide that could make preservation much easier for researchers?
    • What are the longer-term development goals of each of these tools?
  • What preservation policies exist, imposed by government agencies, publishers, or other actors? How are they changing? How are they implemented? What are their strengths and weaknesses?
  • How can preservation policies be implemented in a way that aids research both now and in the future?
    • How does this depend on the tools provided?

Through the course of these discussions, the PTTP IG acts to strengthen the dialogue between domain researchers and the data community by focusing on how researchers are enabled to use previously generated results and preserve new ones. This enhanced engagement amplifies the voice of the research community within the fabric of RDA. The additional focus on policy considerations, by nature nation-, agency-, and organization-specific, serves to illustrate the means by which research preservation can be encouraged (or required) and the implications of these policy decisions.

One must preserve knowledge products before one can (usefully) share them. The mechanisms by which this preservation happens are primarily in the hands of the researcher, and they should be a critically important element of the mission of the RDA. The quality of the data and the information relevant to their creation can only be guaranteed by the researcher who produces the data. Thus, it is in the RDA’s best interest to consider this an integral part of its progress.

This group has obvious synergies with the Reproducibility IG, the Provenance IG, the Active Data Management Plans IG, and the Preservation e-Infrastructure IG, among others. It is entirely complementary in charter and focus to the existing Preservation e-Infrastructure IG.

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):
Largely absent from formal RDA deliberations thus far have been discussions of how researchers can interact with repositories in order to preserve their findings. Most researchers do not consider preservation as part of their research workflow, and, when confronted with an unfamiliar repository interface for data ingestion, are unable to provide the information required. They do not, as a matter of course, use tools that automatically generate the metadata and other information necessary for preserving the knowledge behind their research results. In fact, for many researchers, the situation is represented by the (re-purposed) familiar cartoon, below:

The PTTP IG will operate in the “overlap space” that is (somewhat unfairly) represented as empty in the above figure. The IG will explore how knowledge preservation is currently being done, what tools exist, what are their strengths and shortfalls, and how policy considerations are (if at all) driving preservation strategies, preservation tool development, and preservation tool adoption. These discussions are extremely urgent given the impending implementation of “open data” policies from all US funding agencies and the corresponding move in the EU in this same direction. The knowledge preservation tools for most researchers are either inadequate or woefully under-adopted. This clash between the research enterprise and policy can only be resolved with discussions between the primary stakeholders. RDA is the only global forum that currently provides an opportunity for these discussions. By encouraging increased participation by domain researchers in these important discussions, the PTTP IG can make significant contributions to solving one of the most important issues around data and research.

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place. Articulate how this group is different from other current activities inside or outside of RDA.):

Following from the issues listed above, the PTTP IG will:

  • Catalogue available preservation tools, including capabilities, compatibilities, and rates of adoption, and make this information available to researchers and archivists. This catalogue will serve as a basis for discussion of tool development and deployment in order to better meet the needs of diverse research communities.

  • Undertake outreach activities, including holding sessions at RDA plenaries and conducting outside workshops to engage researchers and archivists in preservation tool specification and, potentially, adoption.

  • Engage related RDA IGs: Metadata, Provenance, Reproducibility, Active Data Management Plans, Preservation e-Infrastructure; and domain-specific IGs in a wider dialogue around preservation needs, tools, and the specifications thereof.

  • Survey preservation policies across countries, funding agencies, and research areas to assemble a comprehensive view of researcher and archive responsibilities.

  • Engage representatives of funding agencies, either at RDA plenaries or at other workshops, in order to involve them in the details of these discussions.

     

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities. Also address how this group proposes to coordinate its activity with relevant related groups.):

The PTTP IG seeks to connect domain researchers with data scientists and data-handling professionals in order to improve, first, the communication between them, and second and more important, the tools the researchers use to preserve the knowledge inherent in their research results. If this is taken as a primary goal, the communities involved are quite broad: all researchers on one hand, and all those whose goal is to provide the platforms for preservation and access on the other. This second community currently makes up the main constituency of RDA and is already quite engaged. Emissaries of the research community are also enthusiastic RDA members. The PTTP IG will prevail upon this smaller cadre for outreach (or in-reach) to other researchers who should be involved in the discussions. Several means will be exploited to achieve additional in-reach or outreach to broader communities, including attendance at scientific society meetings or at domain-specific conferences. Additional, smaller workshops outside of RDA plenaries may be a more targeted way to attract additional participation and dialogue. Extensive networks of interested researchers exist; the task is to bring more of them into the conversation.

In terms of coordination, members of the proposed IG have already coordinated two joint sessions at the 8th RDA Plenary in Denver with two of the related groups, the Reproducibility IG and the Active Data Management Plans IG. IG leaders from the Provenance IG and several of the discipline-specific groups were also attendees at those sessions. Thus, coordination is already occurring. Maintaining open lines of communication and combining meeting sessions when appropriate should be relatively straightforward.

Outcomes (Discuss what the IG intends to accomplish. Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

A. As mentioned above, the PTTP IG intends to produce two catalogues/reviews:

  1. A taxonomy of preservation tools, including their features, strengths and weaknesses, and their rates of adoption in various research domains

  2. A survey of open access policies worldwide

B. A primary outcome of this IG will be better communication between the data scientist/archivist realms and that of the domain researcher

C. A primary outcome of this IG will be wider adoption of preservation tools by domain researchers

D. A primary outcome of this IG will be the delineation of desirable characteristics and features of scientific preservation tools for future tool development

A potential WG project would be to choose a pilot research domain with no good preservation tools and to solve that particular problem in a manner that isn’t completely domain-specific.

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

The PTTP IG will have meetings every month to six weeks, partially reflecting on issues raised by previous plenary meetings, and planning for the next round of plenary sessions. Significant attention will also be devoted to accomplishing the goals of the IG, namely greater researcher participation and the development of the proposed catalogues of tools and policies. Planning of additional workshops, when appropriate, will certainly keep the level of engagement high.

 

Timeline (Describe draft milestones and goals for the first 12 months):

Goals:
1. Organize at least one breakout session at the 9th RDA Plenary in Barcelona, including researchers who have not previously attended RDA.
  a. Timeline: months

2. Compile preliminary list of available preservation tools
  a. Timeline: 3 months

3. Compile preliminary tool taxonomies (features, strengths, weaknesses, adoption)
  a. Timeline: available for 10th Plenary (12 months)

4. Plan for Preservation policy discussion at 10th Plenary
  a. Timeline: begin discussions in Barcelona (6 months), detailed planning finished: 9 months

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):   Visible in the attached PDF

Review period start:
Monday, 9 January, 2017
Custom text:
Body:

Introduction

Small Unmanned Aircraft Systems (sUAS) are rapidly becoming important tools for data capture across many scientific domains, as well as within commercial industry. sUAS have the potential to transform how data are captured in many arenas by offering higher temporal and spatial resolutions, less impact on the environments being monitored, and access to new locations and parameters. In many cases these advantages are further accompanied by lowered costs and increased human safety during data capture.

 

Because sUAS are a new technology, however, there are currently no industry-wide accepted best practices for sUAS sensor and flight data handling and management. There are many reasons why such practices would be beneficial, but three of particular note are:

(1) The creation of standards would lower the barrier to entry and  innovation in terms of what might be monitored with sUAS, by reducing the number of unknowns a new user faces and providing working examples to serve as guides.

(2) With no common standards to build to, the development of mature tools for processing sUAS-captured data and fusing it (with sUAS and other data sources) is currently hampered. As a consequence, each use case generally develops a unique custom pipeline that only sees one-time use.

(3) sUAS captured data is - for the most part - not being managed according to data stewardship best practices, such as would ensure the data is FAIR, as articulated by Force11 (Findable, Accessible, Interoperable, and Re-usable).  

 

This interest group therefore seeks to explore and publish (via the RDA community-based working group model) some best practices for the handling of sUAS-captured sensor and flight data. By publishing these after a broad, cross-community engagement process, it is hoped and expected that they will see adoption by both those already using sUAS for scientific work and those just beginning to explore their possibilities. They will therefore address the three concerns laid out above, with the associated positive consequences for the scientific community. These outcomes also align directly with the RDA’s Vision and Mission focus, namely, promoting the open sharing of data.

 

User scenario(s) or use case(s) the IG wishes to address

There are many possible examples; the following three are selected solely for the broader context they represent:

(1) It is possible to place a temperature sensor on a sUAS. However, there is currently no other equivalent (spatially or temporally) example of capturing temperature data. It is therefore left to each individual researcher to create a sampling protocol, to select a data storage format, to determine which of the many possible metadata parameters are worth storing, to develop a tool for processing the captured data for integration with other data sets, and finally to choose how to publish the captured data and with what metadata.

(2) It is currently a non-trivial task (generally one that requires at least a team including members with electrical, computational, and mechanical engineering expertise, along with the target science expertise) to use a sUAS to capture data in the field. As a result, a new industry is evolving that is able to provide many of the desired data products to a researcher for a fee. If standard practices existed, these providers would firstly be able to utilise them where advantageous to their own models. Secondly, researchers would be able to require that commercial providers adhere to them, so as to ensure that good open data stewardship practices are upheld.

(3) As indicated above, it is currently a non-trivial task to use a sUAS-based sensory system. However, in addition to the industry avenue, and thanks to the long-standing hobbyist Remote Control market, there is already a highly sophisticated and very mature fully open sUAS stack that is also available to researchers. While already mature in fundamental function, this stack is immature in terms of usability and science use case features. It therefore still requires much of the above-mentioned expertise to be successfully utilised. However, many of these remaining challenges could be removed or overcome if the appropriate common standards were in place for developers to build to.

 

Objectives:

  1. Provide a venue for data standards and recommendations comparisons with oceanographic AUVs, and other similar platforms.
  2. Identify common and divergent data needs across sUAS implementations in different domains.
  3. Identify a community aggregation point for others in the field who are currently isolated.
  4. Identify community partnerships, including with industry, tech companies/manufacturers, and computing organizations and infrastructures.
  5. Provide a venue for ongoing community discussion around the legalities, logistics and opportunities governing sUAS use, given that sUAS are a relatively new data collection platform.

 

Participation:

Within RDA:

Agricultural Data IG, Geospatial IG, Metadata IG, Marine Data Harmonization, Vocabulary Services IG, Weather Climate and Air Quality IG

 

External to RDA:

Earth Science Information Partners (ESIP): This group will be closely linked with the Earth Science Informatics community through joint development and continued collaboration with the Federation of Earth Science Information Partners (ESIP). The Drone Cluster (chaired by Lindsay Barbieri and Jane Wyngaard) provides ample opportunity to work closely with Earth Science data practitioners from NASA, NOAA, USGS, USDA and other major sUAS research organizations. Sessions at biannual meetings and monthly telecons have set the stage for collaborative work and can continue to attract sUAS user interest both from the researcher and data practitioner perspective. Additionally, previous collaborations between the ESIP Drone Cluster and the ESIP Education Workgroup have already resulted in sUAS-use education for K-12 teachers, and further workshops for education and implementation activities could be developed.

 

The following is a list of groups with whom Wyngaard and Barbieri have been in contact and which have expressed interest in helping to develop further data and metadata standards and community working relationships:

  • AgGateway: Consortium of over 300 agricultural industry partners (including sUAS companies) for the development of agricultural industry standards. Barbieri has attended their annual meeting, presented during their geospatial working group session, and has garnered interest and support from their UAS precision agriculture community.
  • UAViators, Humanitarian UAV Network: With over 2,500 members in 80+ countries they promote the safe, coordinated and effective use of UAVs for data collection and cargo delivery in a wide range of humanitarian and development settings by developing and championing international guidelines for the responsible use of UAVs. Barbieri has connected with Patrick Meier (director), and had him speak at an ESIP meeting and garnered interest for the continued discussion and community development of UAS data standards.
  • The American Geophysical Union (AGU): Members of the AGU are currently discussing formalizing a UAS Focus Group, or more formalized UAS in Earth Sciences working group. Barbieri has been in communication with them and garnered interest and support for collaboration between AGU Focus Group and an RDA IG.

 

Other organizations we intend to reach out to, with whom we’ve had some communication and collaborative ties, but no direct explicit RDA IG communication yet:

  • OGC, DOT, USGS, W3C, CTEMPS, ASPRS, NOAA

 

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

  1. Provide a discussion venue for sUAS use within many disciplines to distill current data and metadata uses and needs, with a final report on current practices that identifies gaps.
  2. Provide a list of recommended data formats for a relevant range of parameters.
  3. Provide a list of recommended metadata formats for a range of relevant parameters.
  4. Provide a recommended parameter naming convention to be used.
  5. Provide a recommended file naming convention to be used.
  6. Provide an international and transdisciplinary community platform for continued discussion, development, and implementation of sUAS data recommendations.

     

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

  • Regular telecons, potentially subdivided into relevant sections, and as frequent as is relevant for each. For instance, initially there may need to be a weekly telecon for those interested in the broad goal and contributing new insights. This might fade to a monthly telecon. Simultaneously, there may need to be a weekly telecon for those interested in and focused on organising the first kickoff session. Post P9 this may convert into a weekly telecon focused on spinning off a working group.
  • Within the USA, the ESIP drone cluster will support sessions at the bi-annual meetings held in January and July. It is hoped that equivalent local meetings will develop in Europe and elsewhere.
  • The Interest group may potentially support the submission of proposals where the goals of such align with those of this Interest Group.
  • Active documentation of IG activity through use of the Open Science Framework, RDA website, or other web-based project management tool, and possible ongoing collaboration through Slack or other online host.

 

Timeline (Describe draft milestones and goals for the first 12 months):

  • Hold a kick-off session at P9 in April 2017 that sees contributions from as many relevant sectors as possible (sUAS manufacture and data collection-processing industry, various academic and non-academic current sUAS users, data practice experts, RC hobbyist sUAS community members, and experts from relevant analogous fields).
  • Post P9, host continued community discussions to develop a 3-year strategic plan for the sUAS RDA IG, including targeting a specific goal to address via a working group by the end of the first 12 months.
  • Conduct a survey of sUAS users and leaders from a variety of disciplines and sectors to draft a report on current sUAS data and metadata practices and to identify the gap between current practices and ideal data and metadata needs, with the goal of publishing this report and hosting a follow-up workshop.

 

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):

Name                              Title                                               Institution                              

Jane Wyngaard               Data Technologist                        University of Notre Dame

Lindsay Barbieri              Doctoral Student                          University of Vermont Gund Institute

Rob Stevenson               Associate Professor of Biology    University of Massachusetts Boston 

Cynthia Parr                    Technical Information Specialist   United States Department of Agriculture

Vanessa  Raymond         Graduate Research Assistant      Geographic Information Network of Alaska

Bill Teng                          Programme Manager                    National Aeronautics and Space Administration

Karen Anderson              Associate Professor                      Exeter University

Adam Steer                     Earth systems data specialist       National Computational Infrastructure

Charles Vardeman II       Professor                                       University of Notre Dame

Lance Christensen         Researcher                                    Jet Propulsion Laboratory

Sean Barberie                Data Scientist                                 University of Alaska Fairbanks

Stephen Gray                 Senior Research Data Librarian     University of Bristol

 


Review period start:
Wednesday, 21 December, 2016
Custom text:
Body:

Introduction

Increasing the availability of research data for reuse is in part being driven by research data policies, and the number of funders, journals and institutions with some form of research data policy is growing. The research data policy landscape of funders, institutions and publishers is, however, too complex (Ref: http://insights.uksg.org/articles/10.1629/uksg.284/) and the implementation and implications of policies for researchers can be unclear. While around half of researchers share data, their primary motivations are often to carry out and publish good research, and to receive renewed funding, rather than to make data available. Data policies that support publication of research need to be practical and seen in this context to be effective beyond specialist data communities and publications.

 

Use cases and user scenarios

The prevalence of research data policies from institutions and research funders (such as the UK research councils and European Commission) is increasing (Ref: https://riojournal.com/articles.php?id=14673), so publishers and editors are paying more attention to standardisation and the wider adoption of data sharing policies. The International Committee of Medical Journal Editors introduced a data sharing policy; Springer Nature is implementing a standardised research data policy framework with four standard data policy types, each with a defined set of requirements, and is encouraging adoption across all its journals (Ref: https://doi.org/10.1101/122929). More than 1000 journals have adopted one of these policies as of June 2017. This policy framework is available for reuse by others under a Creative Commons license but requires wider debate in the research and publishing communities. We envisage there to be common elements of research data policy shared between all stakeholders, such as support for data repositories and data citation.

Much of this work draws on earlier Jisc activity in examining the potential for a tabulation of publisher research data policies. Naughton and Kernohan (2016) (Ref: http://insights.uksg.org/articles/10.1629/uksg.284/) reported that the journal data policy landscape was not at the required maturity to be comparable or indexable in this way. Jisc is  therefore committed to working with publishers in supporting the standardisation of journal data policies, with an end goal of supporting machine readable policies that would be easier for researchers and research support staff to utilize in selecting a suitable journal for publication, ensuring compliance with journal and funder data requirements.

 

Objectives and Outcomes

  • Help define common frameworks for research data policy allowing for different levels of commitment and requirements and disciplinary differences that could be agreed by multiple stakeholders

  • Identify priority areas/stakeholders where policy frameworks can be defined e.g. beginning with journal/publisher policy, then considering funder policy

  • For these prioritised areas, stimulate creation of Working Groups to:

    • Produce guidance for researchers on complying with and implementing research data policy and the tools to support compliance

  • Facilitate greater understanding of the landscape of research data policies across disciplines, institutions and learned societies

  • Increase adoption of (standardised) research data policies by all stakeholders, in particular journals and publishers

  • Connect stakeholders and broaden a collective understanding of their roles and relationships in data policy implementation

The report from the RDA P9 meeting is here: https://docs.google.com/document/d/1GiJI7kJA3MgDvJyC9zw-zHIbg3n2azhN16W_2Kn1uwM/edit?usp=sharing

Minutes from first informal meeting of this group at RDA 8th Plenary are here: https://docs.google.com/document/d/1jtJyJVNOXyjondprQXvHH9xJfEDeovhchShi...

 

Participation

While the focus of the policies developed by the Group would be on publishing research data, multiple stakeholders (publishers, institutions, repositories, societies, funders) will be included. Common elements of data policy likely exist for all these stakeholders and this will be explored.

The proposed group would complement the Practical Policy WG (https://rd-alliance.org/groups/practical-policy-wg.html) as this proposed group has a specific focus on journals and publishing with a goal of harmonising and standardising policy. These seem to be prerequisites to and would feed into efforts to create machine readable and actionable policies.

The proposed group would also complement efforts aimed at publishing and citing research data, as data policy of publications should help raise awareness of both these activities.

 

Mechanism

Co-chairs will have regular conference calls (every 1-2 months) and communicate updates to group members via the RDA group mailing list and using other RDA communication resources as needed e.g. group wiki, file repository. Group members will be invited to a group/community call that will take place every 2-3 months, after an initial meeting of the group at the RDA plenary - currently scheduled for April 2017.

We will use collaborative editing tools (Google Drive etc) to rapidly share outcomes of calls, key documents and to solicit feedback from group members.

 

Timeline

The first 6-9 months will involve further discussions with members and stakeholders to prioritise the objectives and secure support for delivering them, which might require the creation of sub-groups focused on specific tasks. We envisage our first priority to be the first listed objective, to “Help define a common framework for research data policy allowing for different levels of commitment and requirements and disciplinary differences that could be agreed by multiple stakeholders”, to support academic publishers and others in developing usable and practical research data policies. We will gather requirements in 2017 and present them to group members by September 2017.

Our goal is to evolve from an Interest Group to a Working Group for publisher/journal policy by 2018, in coordination with RDA plenary meetings.

 

Chair/Co-chairs

Chair:
Iain Hrynaszkiewicz (iain.hrynaszkiewicz@nature.com), Springer Nature (group proposer)
Co-Chairs
Natasha Simons, ANDS
Simon Goudie, Wiley
TBC, Jisc

Review period start:
Monday, 19 December, 2016
Body:

WDS/RDA Publishing Data Interest Group

WDS/RDA Certification of Digital Repositories Interest Group

 

Assessment of Data Fitness for Use

 

WG Charter

The increasing availability of research data and its evolving role as a first-class scientific output in scholarly communication require a better understanding of data quality and the ability to assess it. Data quality can in turn be described as conformance of data properties to data usability, or fitness for use. These properties are multifaceted and cover various aspects related to data objects, access services, and data management processes, such as the level of annotation, curation, peer review, and citability or machine readability of datasets. Moreover, the compliance of a data repository or data center providing datasets, for example with certification requirements, could serve as a useful proxy.

Currently, there is a fairly good understanding of how to certify the quality of a data center or repository as a whole, but there is no generally acknowledged concept for assessing the usability (or fitness for use) of individual datasets. Some of the properties describing data usability are not available or not transparent to users, and requirements for other properties cannot be matched with standards. Furthermore, current certifications and accreditations of data repositories allow only limited conclusions on the reusability of individual datasets. Thus, assessing fitness for purpose and deciding whether to reuse a dataset is not straightforward. This situation reduces the chances of shared data being reused and, where data are reused, could decrease the reliability of research results.

Firstly, a concept of data fitness requires deciding which quality criteria to include and how to weight each of them. The process should preferably lead to the development of a corresponding metric. Secondly, we want to find effective ways to expose and communicate this metric, e.g. by using a labelling or tagging system that makes the different usability levels explicit.

The proposed working group would work towards the following deliverables:

  • The definition of criteria and procedures for assessment of fitness for use

  • The development of a system of badges/labels communicating fitness for use of individual datasets

Criteria such as the following would be used:

  • Trustworthiness of the data centers/repositories (such as assessed through existing certifications: DSA-WDS, DIN, ISO 16363 etc.)

  • Data accessibility in terms of discoverability, openness, interoperability etc.

  • Level of curation applied (citability, metadata completeness, data harmonization, machine readability etc.)
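
As an illustration only, the idea of weighting criteria into a metric and mapping that metric to a label could be sketched as follows. The criterion names, the weights, the thresholds and the label names are all hypothetical placeholders chosen for this example; the charter does not define any of them, and the actual metric would be an outcome of the working group.

```python
from __future__ import annotations

# Hypothetical weights for the three criteria groups named in the charter.
# The charter does not specify weights; these values are assumptions.
WEIGHTS = {
    "repository_trustworthiness": 0.3,
    "accessibility": 0.3,
    "curation": 0.4,
}

# Hypothetical label thresholds as (minimum score, label), highest first.
LABELS = [(0.8, "gold"), (0.5, "silver"), (0.0, "bronze")]


def fitness_score(scores: dict[str, float],
                  weights: dict[str, float] = WEIGHTS) -> float:
    """Return the weighted mean of per-criterion scores, each in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"score for {name!r} must be in [0, 1]")
    total_weight = sum(weights.values())
    return sum(weights[n] * scores[n] for n in weights) / total_weight


def fitness_label(score: float) -> str:
    """Map a score in [0, 1] to the first label whose threshold it meets."""
    for threshold, label in LABELS:
        if score >= threshold:
            return label
    raise ValueError("score must be non-negative")


if __name__ == "__main__":
    example = {
        "repository_trustworthiness": 1.0,  # e.g. certified repository
        "accessibility": 0.5,
        "curation": 0.75,
    }
    print(fitness_label(fitness_score(example)))  # prints "silver"
```

A weighted mean is only one possible aggregation; the group might equally settle on minimum requirements per criterion or on separate labels per aspect.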

 

Value Proposition

The following stakeholders would benefit:

  • Researchers who deposit data can visibly improve and communicate the quality of their datasets, thereby increasing reuse and citation, which provides researchers with additional metrics showing their productivity.

  • Researchers who reuse data can more easily assess the quality of a dataset and in particular its fitness for their reuse. This makes reuse of data safer and more efficient.

  • Data centers/repositories can offer better quality data publication services, such as more transparent curation, thus increasing the overall usage of their services, which in turn might improve the facility's financial base.

  • Science publishers can better integrate referenced data into the editorial process and improve the review of articles and related datasets as well as citations and cross-linking of datasets and literature as a result of more transparency about data usability.   

  • Funders can make provisions for funded data archiving and publication services in accordance with their funding requirements and expectations in terms of data fitness for use (and reuse).

Overall impacts:

  • Improved and standardized data publication services

  • Improved communication of data fitness for use

  • Improved reliability and efficiency in the reuse of research data

 

Engagement with existing work in the area

Data fitness for use has been addressed in the literature over the last 20 years, and the topic has received more attention with the general increase in data production. The following is a brief overview of selected publications; it is by no means exhaustive. In 1998, Tayi and Ballou stated that the concept of data quality is relative, with quality being dependent on users and applications. Some authors concentrated on specific aspects, for example the assessment of accuracy of geospatial data (de Bruin 2001) or de-duplication, which is relevant to data mining approaches (Christen & Goiser 2007). A further aspect is preserving the usability of sensitive data (Bhumiratana & Bishop 2009). In 2007, the OECD underlined the importance of efficiency in reusing data (OECD 2007). For example, efficient compilation of data from multiple providers requires harmonized and machine-readable data, in particular for high-volume data. Correspondingly, the FAIR Data Publishing group supplies a set of principles for publishing data and emphasizes machine readability of data as one of the major challenges (Wilkinson et al. 2016). More recently, authors have also started to investigate data usability with respect to big data approaches (Jianzhong 2013). The effect of peer review on data quality and usability was stressed by Lawrence et al. (Lawrence 2011) and in an editorial in the Nature Scientific Data Journal (2016). Costello linked data fitness for use with the data publication concept (Costello 2013). Also worth noting are the ISO/IEC 25012 data quality model (ISO/IEC 2008) and the ISO 8000 Requirements for Quality Data (ISO 2009). The W3C Data on the Web Best Practices Working Group elaborated vocabularies needed to describe data quality and highlighted the importance of data provenance (W3C 2016), which, if applicable, should also include detailed information about physical samples, for example in the case of biocollections (Bishop 2016).
Finally, fitness for use of datasets should be transparent and comprehensible to users. The effectiveness of using badges or labels for this purpose was shown by Kidwell et al. (Kidwell 2016).

In addition to works published in the literature, the WG can build on a wide range of activities that are relevant to the aims and scope of the group. In particular:

  • The Working Group would operate under the umbrella of the RDA/WDS Data Publishing IG and RDA/WDS Certification of Digital Repositories IG

  • This Working Group will follow up on the work of the RDA/WDS Data Publishing Workflows WG and assess the impact of workflows on fitness for use (Austin et al. 2016)

  • This Working Group will follow up on the work of the Repository Audit and Certification DSA–WDS Partnership WG and develop a related certification system for individual datasets

  • The Working Group would incorporate the criteria defined by the FAIR working group (Wilkinson 2016) as a starting point.

  • The Working Group will collaborate with the NIH Commons FAIR metrics group to elaborate on the FAIR criteria (NIH 2016)

  • This Working Group would incorporate the W3C data quality vocabulary to define quality processes (W3C 2016).

 

Work Plan

Work will be along four strands:

  1. Descriptions and definitions of data fitness criteria. As a first step, we will gather literature and initiatives that have addressed the topic. To resolve ambiguities in the definitions of terms relevant to this group, we will collaborate with the CODATA/CASRAI development of an International Research Data Management glossary (IRiDiuM) and maintain consistency with terms in the RDA Term Definition Tool (TeD-T). The selection of data fitness criteria will be put to the wider community before the document is finalized.

  2. Development of a fitness for use label at the level of datasets

    1. Conceptual model

      1. Selection and evaluation/weighting of criteria with respect to the different aspects of fitness for use, such as curation or accessibility

      2. Considerations for adoption by stakeholders (archives/repositories, e.g. built into workflows; science publishers)

    2. Design of label/badge

  3. Development of service components

    1. Investigate how a fitness for use concept can be integrated into current certification procedures for data centers/repositories (WDS/DSA)

    2. Investigate data centers/repositories service components

    3. Setup of a testbed of several data centers/repositories

  4. Governance and sustainability:

    1. Concept for a long-term organizational structure to operate elaborated services successfully and in a way that meets the needs of all stakeholder groups. This stream will also deliver a process through which new organizations can connect to the service.

Deliverables

  • Addition or revision of relevant terms in the IRiDiuM glossary (CODATA/CASRAI)

  • Document defining fitness for use criteria

  • Description and design of fitness for use label (badge system)

  • Concept for a certification procedure including the fitness for use aspect

  • Concept for data center/repository service components

  • Adoption plan including certifying organizations and governance

  • Manuscript for submission to a peer-reviewed journal.

Milestones

  • Fitness for use concept ready

  • Setup of a testbed with several data centers/repositories and science publishers

  • Prototype of fitness for use label available

Mode & frequency of operation

  • Telecons every 4 weeks

  • Face to face meetings during RDA plenaries and at least one additional workshop. RDA plenaries in particular will be used to engage the wider community and coordinate the work with related groups.

  • Additional meetings of subgroups working on particular deliverables including adoption

Timeline

Months | Action | Deliverable
April - July 2017 | Terminology & definition of criteria | Overview of criteria, for discussion at 9th plenary meeting
July - December 2017 | Pilot assessment of criteria | Report on outcomes of pilot, for discussion at 10th plenary meeting
December 2017 - February 2018 | Development/design of badge system and integration with current certification schemes | Guide for repositories
February - August 2018 | Concept for integration of data repository service components. Piloting integration of badge system. | Governance structure and adoption plan
May 2017 - October 2018 | Draft article for peer review | Submission of article to a peer-reviewed journal

 

Adoption Plan

Members of the proposed working group are planning to carry out a pilot during the 12-18 month timeframe, in which they will incorporate the insights that come out of the working group. In this pilot, a first assessment of the fitness for use of individual datasets will be carried out. Running the pilot in parallel will provide the working group with important information about both the benefits and the challenges of adoption, which will make it easier for additional organizations to adopt the outcomes of the working group. The goal is that, at the end of the 18-month timeframe, a first network of adopters will exist.

 

Initial Membership

Claire Austin (Research Data Canada, Co-Chair, claire.austin@gmail.com )

Bradley Wade Bishop (Univ. Tennessee)

Helena Cousijn (Elsevier, Co-Chair, h.cousijn@elsevier.com )

Michael Diepenbroek (PANGAEA, Co-Chair, mdiepenbroek@pangaea.de )

Amy Nurnberger (Columbia University Libraries)

Ingrid Dillo (DANS)

Stephane Pesant (MARUM)

Mustapha Mokrane (ICSU-WDS)

Markus Stocker (PANGAEA)

Rob Hooft (DTL)

Peter Doorn (DANS)

Christina Lohr (Elsevier)

Robert R. Downs (CIESIN, Columbia University)

Daniel Fowler (Open Knowledge International)

Martina Stockhause (WDC Climate, DKRZ)

Ian Bruno (CCDC)

Tim Smith (CERN/Zenodo)

Donna Scott (NSIDC)

Jonathan Petters (Virginia Tech)

Kathleen Gregory (DANS)

 

References

Austin CC, Bloom T, Dallmeier-Tiessen S, Khodiyar V, Murphy F, Nurnberger A, Raymond L, Stockhause M, Tedds J, Vardigan M, & Whyte A (2016). Key components of data publishing: Using current best practices to develop a reference model for data publishing. International Journal on Digital Libraries (IJDL), Research Data Publishing Special Issue. Pages 1-16. DOI 10.1007/s00799-016-0178-2

Bhumiratana B & Bishop M (2009) Privacy aware data sharing: balancing the usability and privacy of datasets, in: Proceedings of the 2nd International Conference on PErvasive Technologies Related to Assistive Environments, https://doi.org/10.1145/1579114.1579187

Bishop, B. W. & Hank, C. F. (2016) Fitness for Use in Data Curation Profiles for Biocollections [Presentation] American Society for Information Science and Technology Annual Meeting, October 2016, Copenhagen, Denmark

de Bruin S, Bregt A, van de Ven M (2001) Assessing fitness for use: the expected value of spatial data sets, International Journal of Geographical Information Science, v15, no5, p457-471

Christen P & Goiser K (2007) Quality and Complexity Measures for Data Linkage and Deduplication, in: Guillot FC & Hamilton HJ (eds) Quality Measures in Data Mining, Studies in Computational Intelligence pp 127-151

Costello M et al (2013) Biodiversity data should be published, cited, and peer reviewed, Trends in Ecology & Evolution, p1-8

International Renewable Energy Agency (2013) Data quality for the Global Renewable Energy Atlas – Solar and Wind, https://goo.gl/a8xr1Q

ISO (2009ff) Data quality, https://en.wikipedia.org/wiki/ISO_8000

ISO/IEC (2008) Data quality model, http://www.iso.org/iso/catalogue_detail.htm?csnumber=35736

Kidwell MC, Lazarević LB, Baranski E, Hardwicke TE, Piechowski S, Falkenberg L-S, et al. (2016) Badges to Acknowledge Open Practices: A Simple, Low-Cost, Effective Method for Increasing Transparency. PLoS Biol 14(5): e1002456. http://doi.org/10.1371/journal.pbio.1002456  

Lawrence, B., Jones, C., Matthews, B., Pepler, S. & Callaghan, S. (2011). Citation and Peer Review of Data: Moving Towards Formal Data Publication. International Journal of Digital Curation 6, 4–37

Li Jianzhong & Liu Xianmin (2013) An important aspect of big data: data usability, Journal of Computer Research and Development, v6

NIH Commons FAIR metrics group (2016) WG interim report, https://goo.gl/n4PpWv

OECD (2007) OECD Principles and Guidelines for Access to Research Data from Public Funding, http://www.oecd.org/sti/sci-tech/38500813.pdf

Scientific Data Journal (2016) Let referees see the data, editorial, Nature Scientific Data Journal, 3, 160033. http://doi.org/10.1038/sdata.2016.33

Tayi GK & Ballou DP (1998) Examining data quality, Communications of the ACM, v41, no2, p54-57

W3C (2016) Data on the Web Best Practices: Data Quality Vocabulary, W3C Working Group Note, https://www.w3.org/TR/vocab-dqv/#mapping-ISOZaveri

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., and Baak, A. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018, http://doi.org/10.1038/sdata.2016.18

Review period start:
Tuesday, 15 November, 2016 to Thursday, 15 December, 2016