
Body:
Review period start:
Thursday, 25 May, 2017
Custom text:
Body:

The purpose of the Early Career and Engagement Interest Group (ECEIG) is to provide a focal point for Early and Mid-Career Researchers and Professionals, including those involved in various RDA-related fellowships and Early Career programs.

While other efforts exist in RDA to support Early Career Researchers and Professionals, such as the RDA-US and RDA-EU fellowship programs, the RDA ECEIG seeks to complement existing efforts by (i) building a peers network, (ii) maintaining a “live” document of advice for Early Career Researchers and Professionals and (iii) creating opportunities for formal and informal mentoring within RDA.

Specifically, objectives of this IG are to:

1. Focus on Early Career Researchers and Professionals because they need the most support

2. Establish a volunteer-based mentoring program

3. Network across domains to establish an interdisciplinary network of peers

4. Provide a space for people who have more experience with RDA to pass on knowledge and lessons learned to Early Career Researchers and Professionals

5. Create a social outlet specifically for Early Career Researchers and Professionals

To find out more, please view the IG Charter and leave your comments below. 

Review period start:
Tuesday, 2 May, 2017
Custom text:
Body:

Version 1.0 14th June 2017

(the draft version 0.1 and final approved version are available as attached documents at the bottom of the page)

 

Introduction

Data are fundamental to the field of linguistics. Examples drawn from natural languages provide a foundation for claims about the nature of human language, and validation of these linguistic claims relies crucially on these supporting data. Yet, while linguists have always relied on language data, they have not always facilitated access to those data. Publications typically include only short excerpts from data sets, and where citations are provided, the connections to the data sets are usually only vaguely identified. At the same time, the field of linguistics has generally viewed the value of data without accompanying analysis with some degree of skepticism, and thus linguists have murky benchmarks for evaluating the creation, curation, and sharing of data sets in hiring, tenure and promotion decisions.

 

This disconnect between linguistics publications and their supporting data results in much linguistic research being unreproducible, either in principle or in practice. Without reproducibility, linguistic claims cannot be readily validated or tested, rendering their scientific value moot. In order to facilitate the development of reproducible research in linguistics, the Linguistics Data Interest Group (LDIG) plans to work towards the discipline-wide adoption of common standards for data citation and attribution. In our parlance, citation refers to the practice of identifying the source of linguistic data, and attribution refers to mechanisms for assessing the intellectual and academic value of data citations. The LDIG is for data at all linguistic levels (from individual sounds or words to video recordings of conversations to experimental data) and data for all of the world’s languages, and acknowledges that many of the world’s languages have high cultural value and are underrepresented with regard to the amount of information that is available about them.

 

This interest group is aligned with the RDA mission to improve open sharing of data through forming transparent discipline-specific data citation and attribution conventions to be adopted by the international research community. This interest group will add value to the RDA community by providing breadth to the current roster of RDA interest groups: linguistics is a discipline that straddles social/behavioral sciences and the humanities, and thus we have a great deal to contribute to the general RDA discussion on a multiplicity of data types. This group ties in with other initiatives in transparent research methods in linguistics at all stages of the workflow, including Open Access data archiving and publishing, reproducible methodologies and critical consideration of data licensing. The LDIG seeks to support these initiatives while focusing on data citation specifically. The LDIG provides an ongoing space for linguists to come together to improve how we manage and cite our data, and how we train linguists in good practice.

 

Who is this group for?

The LDIG is for people who work with linguistic and language data. This work includes, but is not limited to, the collection, management and analysis of linguistic data. We encourage participation from academic and speaker communities.

 

Objectives and outcomes

Our overarching objective is to contribute to a positive culture of linguistic data management and transparency in ways that are in keeping with what is happening in the larger digital data management community. To do this we aim to be a group that is able to provide tangible tools (e.g. guidelines, software) for improving the culture of data citation and attribution within linguistics. This will also involve understanding the breadth of data types linguists work with, and current uses of persistent identifiers. We outline three main objectives. For each objective we also suggest specific outcomes, which would be the focus of shorter term timelines (e.g. Working Groups):

  • Development and adoption of common principles and guidelines for data citation and attribution by professional organizations, such as the Linguistic Society of America and the Societas Linguistica Europaea, academic publishers, and archives for linguistic and language data. Principles and guidelines will follow the recommendations in the Joint Declaration of Data Citation Principles.

    Potential WG topics include:

    • Development of a common stylesheet for citation of linguistic data

    • Adoption of the stylesheet by publishers, archives, organisations and individuals

    • Integrating RIS with linguistic data services like the Open Language Archives Community

  • Education and outreach efforts to make linguists more aware of the principles of reproducible research and the value of data creation methodology, curation, management, sharing, citation and attribution. Practical training also helps make proper data preparation less burdensome for researchers, and normalises this work as an expectation of the discipline. While much of this work will be practical training, outreach also needs to take into account the complex and varying attitudes towards creation of open access data sets across linguistics.
    Potential WG topics include:

    • Development of training modules

    • Delivery of training at conferences and workshops

    • Development of tools for the management of linguistic data

  • Efforts to ensure greater attribution of linguistic data set preparation within the linguistics profession.
    Potential WG topics include:

    • Framework for valuing the development of linguistic data sets in job appointments, tenure and promotion applications and in research degrees and postdoctoral research projects.

It will be up to the LDIG to decide if any of these specific outcomes would be best met by forming short term working groups with specific timelines for the deliverables. Other outcomes may be worked on within the LDIG on a more open timeline. Further goals include fostering greater transparency in research methodology, and data access rights. We expect that other outcomes will be developed as LDIG grows and responds to the changing research environment.

 

Mechanism

The co-chairs will hold a conference call every two months. The wider LDIG will convene quarterly meetings. The timezone spread of LDIG members means that these meetings will be held asynchronously in an editable document. The agenda will be posted with discussion points, and will be open for comment for a week, before actions are decided upon and delegated. We will also host face-to-face meetings at relevant linguistics conferences, such as Societas Linguistica Europaea, Linguistic Society of America, the Australian Linguistics Society, and at the RDA plenaries.

 

Interaction with groups in RDA

The following RDA groups have been identified as having interests that are relevant to LDIG, both in terms of technical and ethical issues in linguistic data management:

While setting up the LDIG we will ask at least four of our members to nominate themselves to participate in one of these other groups and be officially named as our cross-group co-ordinator. This will facilitate cross-group relevance.

Linguists from particular subfields may find that particular interest groups are relevant to particular issues in their area, for example corpus linguists may find that the Big Data IG addresses relevant issues. We encourage LDIG participants to also engage with other interest groups and working groups in the RDA.

 

Related projects and activities

There are also a number of organisations and groups outside the RDA that LDIG will engage with directly as the objectives of the group are addressed.

 

Contributors

Co-Chairs:

Andrea L. Berez-Kroeker, U Hawai‘i at Mānoa

Lauren Gawne, La Trobe University

Helene N. Andreassen, UiT The Arctic University of Norway

Potential members are welcome to sign up to the LDIG or contact the co-chairs for more information. LDIG has been promoted through the LINGUIST List, and we invite any interested party to participate.

 

Timeline

The LDIG aims to be an ongoing group, whose overall aim is to promote better practice in linguistic data management. A general timeline is given; however, some of these responsibilities may be handed over to a working group specifically set up for the delivery of the data citation standards.

Outreach - first 6 months (May-November 2017)

  • April 2017    Draft charter posted

  • May 2017    Group advertised publicly

  • June 2017    Amended charter posted

  • Sept 2017    Attend Montreal RDA plenary and connect with relevant RDA groups

  • Oct 2017    Finalise LDIG structure and communication processes

Groundwork - second 6 months (November 2017-May 2018)

This groundwork helps us expand the reach of the LDIG and ensures that we are as relevant and inclusive as possible. It includes attendance at the April 2018 RDA plenary:

  • Survey of linguists on current data citation practice (individual practice and institutional level training opportunities)

  • Collate possible citation practices

  • Survey of linguists on current practices for academic attribution of curation of linguistic data sets in departmental tenure and promotion

Review period start:
Saturday, 8 April, 2017
Custom text:
Body:

 

The demand for reproducibility of research results is growing, so it will become increasingly important for a researcher to be able to cite the exact version of the data set that was used to underpin their research publication. The capacity of computational hardware infrastructures has grown, and this has encouraged the development of concatenated seamless data sets where users can use web services to select subsets based on spatial and time queries. Further, the growth in computer power has meant that higher level data products can be generated in very short time frames. This means that we need a systematic way to refer to the exact version of a data set or data product that was used to underpin the research findings, or was used to generate higher level products.

Versioning procedures and best practices are well established for scientific software and can be used to enable reproducibility of scientific results. The codebase of very large software projects does bear some resemblance to large dynamic datasets. Are these practices suitable for data sets or do we need different practices for data versioning? The need for unambiguous references to specific datasets was recognised by the RDA Working Group on Data Citation, whose final report recognises the need for systematic data versioning practices.

This gap was discussed at a BoF meeting held at the RDA Plenary in September 2016 in Denver, resulting in the formation of an Interest Group on data versioning. A review of the recommendations by the RDA Data Versioning IG (the precursor to this group) concluded that systematic data versioning practices are currently not available. The Working Group will produce a white paper documenting use cases and recommended practices, and make recommendations for the versioning of research data. To further adoption of the outcomes, the Working Group will contribute the use cases and recommended data versioning practices to other groups in RDA, W3C, and other emerging activities in this field. Furthermore, versioning concepts developed for research data will need to be brought in line with versioning concepts used in persistent identifier systems.

 

Value Proposition

Data versioning is a fundamental element in work related to ensuring the reproducibility of research. Work in other RDA groups on data provenance and data citation, as well as in the W3C Dataset Exchange Working Group, has highlighted that definitions of data versioning concepts and recommended practices are still missing. The outcomes of the Data Versioning Working Group will add a central element to the systematic management of research data at any scale by providing recommendations for standard practices in the versioning of research data. These practice guidelines will be illustrated by a collection of use cases.

Engagement with existing work in the area

A lack of accepted data versioning practices has been recognised in different fields where reproducibility of research is a concern, e.g. data citation, data provenance, and virtual research environments. Versioning procedures and standard practices are well established for scientific software and can be used to facilitate the goals of reproducibility of scientific results. The Working Group will work with other groups, both within RDA and externally, on topics where data versioning is of importance, in order to develop a common understanding of data versioning and standard practices.

Within RDA the Working Group will work together with the Data Citation WG to include its outputs into the collection of use cases, and with the Data Foundations and Terminology IG, the Research Data Provenance IG, the Provenance Patterns WG, and the Software Source Code IG to align data versioning concepts.

The Working Group will work closely with the W3C Dataset Exchange Working Group to introduce the use cases collected by the RDA Data Versioning Working Group into the W3C Working Group’s collection of use cases and align versioning concepts. Additionally, the RDA Versioning Working Group will work closely with the AGU FAIR Data Project, in particular Task Group E on Data Workflows.

Work Plan

The outcome and deliverable of the Data Versioning WG will be a white paper documenting use cases and recommending standard practices for data versioning. The use cases and recommendations will be aligned with the recommendations from other working groups, in RDA and externally, where data versioning is of concern.

Milestones for the development of the document will be aligned with the coming RDA plenaries. The final document will be presented at the RDA Plenary in early 2019.

The Data Versioning WG will meet face-to-face at the RDA plenaries for broader discussions of the group’s findings and recommendations with other relevant RDA Groups. Between plenaries, the group will work online.

Besides sessions at the RDA plenaries, members of the working group will present the working group’s findings and recommendations at disciplinary conferences and in national working groups to achieve a broader community involvement in the development of the recommendations for data versioning.

The work on the data versioning white paper will be coordinated by the chairs of the working group. A collection of use cases will serve to illustrate the recommended practices for data versioning. The outcomes will be contributed as an addendum to the RDA Data Citation Recommendations to resolve differences between file-based and database-based applications.

Use cases collected by the Working Group will be fed into the W3C Dataset Exchange WG. This W3C Working Group has parallel timelines to the proposed RDA Data Versioning WG and will end in July 2019. It is now six months into its two year term.

Adoption Plan

The Working Group will work with existing adopters to support the adoption process and document any successes, failures, and lessons learnt. The Working Group will collect feedback from adopters and make sure it is considered for inclusion in the outputs.

The Working Group will work closely with the W3C Dataset Exchange Working Group to introduce the use cases collected by the RDA Versioning WG into the W3C Working Group’s collection of use cases and align versioning concepts. Initial outcomes will also be exchanged with the AGU FAIR Data Project.

Initial Membership

The initial membership of the Data Versioning WG will be drawn from the membership of the Data Versioning IG. The initial membership will include links to other RDA groups, e.g. Research Data Provenance, Provenance Patterns WG, and Software Source Code IG (Mingfang Wu), to the W3C Dataset Exchange Working Group (Simon Cox), and the AGU FAIR Data Project (Jens Klump).

The Data Versioning WG will initially be led by Jens Klump (CSIRO), Lesley Wyborn (ANU), Robert Downs (Columbia University) and Ari Asmi (University of Helsinki).

 

Review period start:
Tuesday, 9 January, 2018 to Friday, 9 February, 2018
Custom text:
Body:

 

Please note: The following text is the revised and final Charter dated 10 Jan 2018. It is also attached to this page.

The original Charter can be found at the end of this page.


Name of Proposed Interest Group: Virtual Research Environments IG

                                                                       

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

The vision of the Research Data Alliance (RDA) is that “researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.” The Mission of RDA is that it “builds the social and technical bridges that enable open sharing of data.”

Increasingly researchers who are not co-located are seeking to work dynamically together at various scales from the local to global using the internet to share data, models, workflows, best practices, publications, management and administration of their research etc. The Virtual Research Environments Interest Group (VRE-IG) seeks to build the required technical bridges, skills and social communities that enable global sharing and processing of data across technologies, disciplines and countries through the creation of shared online virtual environments. As these individual VREs grow, inevitably they need to also connect with other major research infrastructures.

 

The goal of the VRE-IG is to identify the technical issues and, where known, share solutions that enable online access to data and other research assets required to address issues that can range from local challenges (which are also potentially of direct relevance to researchers in other geographical areas or other research domains), to the research grand challenges currently being faced by society on global issues, e.g., societal impacts of climate change; sustainable cities; and environmentally sensitive utilisation of the scarce resources of our planet.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

  1. Domain specific VREs are being built in individual nationally and regionally funded research projects (e.g., geophysics, environment, hazards mitigation). Although the data sets being accessed are of national extent, can these tools be utilised for development of similar VREs (such as for geophysical inversions, species tracking, flood prediction and mitigation)?
  2. A new group wishes to develop a shared virtual research environment - what are the best practices defined for how to technically build and sustain a VRE?
  3. Building a VRE requires specialised skills - what are those skills and how can they best be shared?
  4. As a VRE grows it will inevitably link with major infrastructure initiatives such as European Open Science Cloud (EOSC), the US Extreme Science and Engineering Discovery Environment (XSEDE) and the Australian National Research Data Cloud (NRDC) – but how to connect to these?
  5. How can a community around online access to and processing of major data resources be built and maintained?
  6. How to access and build gateways to major supercomputer or cloud resources to enable processing of data in data intensive scientific environments?

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

VREs are synonymous with Science Gateways (SGs) in the USA and Virtual Laboratories (VLs) in Australia, and are increasingly being used to support a more dynamic approach to collaborative working across the internet. The proposed VRE-IG will explore all aspects of existing and planned future VRE/SG/VLs with the aim of moving towards common policies and best practices, such as those now being promoted by the European EOSC, the US XSEDE and the Australian NRDC. There is currently no coordination of the development of the underlying architectures, as well as specifications for components and interfaces in any of these initiatives, nor is there any agreed best practice way to connect to the major research infrastructures, in particular data to compute resources. Likewise there is also no mechanism for sharing best practice, skills, tools and software that connect tools to data in online environments that could ultimately allow these individual VREs to interoperate on a global scale. The goal of the VRE IG is to encourage initiatives tasked with developing these technologies to create ‘building blocks’ of common data infrastructures and build specific ‘data bridges’ to enable online sharing and in situ processing of data. The US SGCI (begun in August 2016) is starting to work on these challenges for the US and will closely collaborate with this IG.

 

The VRE IG will aim to act as a longer-term organization responsible for tracking and contributing to the evolution of VRE/SG/VL technologies, particularly as they relate to data access. It will also seek to engage with those making use of these online technologies in an effort to identify the necessary technical aspects, social and community building practices, required skills, as well as governance issues and best practice required to support a more coordinated approach to the development of collaborative environments that enable data sharing and in situ online processing.

 

The proposed VRE-IG group is, in effect, an ‘umbrella group’ that brings together:

  1. Those initiatives that are actively developing VRE/SGs/VLs internationally;
  2. Representatives of the common eInfrastructure (eIs) services e.g. EUDAT, EOSC, XSEDE, NRDC, etc.; and
  3. Specific RDA groups (e.g., software citation, metadata IG, Versioning IG, etc.), which are developing outputs, that are themselves best practice inputs to research groups developing VREs.

 

The objectives of the VRE-IG are to

  1. Review the state of the art and compare/contrast existing VREs, VLs and SGs;
  2. Ensure associated relevant technologies are highlighted to IG participants so that they are aware of them and understand their potential to enhance their own VRE efforts, particularly those that enhance online access to data and enable in situ processing;
  3. Compare architectures used for VREs that facilitate connecting people to the required resources online (data, tools and compute) (it may be feasible to develop a reference architecture as a dedicated Working Group);
  4. Propose specifications for standard components (software and interfaces) for VRE/SG/VLs;
  5. Propose best practices for VRE/SG/VLs development and implementation, in particular definition of best practice for building communities around and sustaining VREs;
  6. Contribute to the SGCI’s scientific software collaborative to build a central information hub for researchers and developers seeking to connect data, tools and compute infrastructures online; and
  7. Suggest policies to VRE stakeholders in close collaboration with existing foundation projects and initiatives e.g. VRE4EIC, SGCI, XSEDE, OSG, NRDC, etc.

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

The proposed VRE-IG is domain-agnostic and is relevant to the academic, government and industry sectors. It will bring together experts in data, tools and compute resources. The group already has 92 members, who truly reflect this diversity of interest.

 

The proposed VRE-IG will engage with the relevant IG/WGs including:

  • Software Citation IG
  • Metadata IG: definition of packages of metadata elements appropriate for the VRE/SG/VL
  • Metadata catalogue WG which will potentially provide resources for documenting the metadata used in different VREs
  • Preservation Tools, Techniques and Policies IG
  • Research Data Provenance IG
  • Reproducibility IG
  • Federated Identity Management IG
  • Data Fabric IG
  • Domain groups for use cases, requirements and possible later validation
  • Mapping the Landscape IG

 

In addition, the register of VREs and components of VREs being developed by the SGCI will be entered into the RD-A Mapping the Landscape IG Inventory (https://sciencegateways.org/resources/catalog and https://catalog.sciencegateways.org/#/home).

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

VRE/SG/VLs and associated technologies have matured in the last 10 years as evidenced by the evolution from more one-off, bespoke, single workflow systems developed by a specific set of researchers, to loosely coupled platforms shared by many groups of researchers. If the objectives outlined above for the VRE IG can be achieved it will lead to interoperating VRE/SG/VLs across multiple domains and where feasible, supported by integration of underlying national e-RIs.  The alternative is divergent and heterogeneous systems that will have high maintenance costs and are incapable (or only capable with great effort) of interoperating: these more bespoke, more specialised systems have well known issues of long-term sustainability.

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

The Group will meet twice a year at each RD-A plenary. Specific VRE sessions will also be held at major domain conferences such as AGU and EGU.

 

Between RDA plenaries the momentum will be sustained via the webpage (https://rd-alliance.org/groups/vre-ig.html ) and via teleconferences for specific discussion topics.

 

Timeline (Describe draft milestones and goals for the first 12 months):

The VRE-IG has already met (and has been well attended) at previous plenaries as follows:

 

  1. 7th RD-A Plenary BoF Tokyo: Kick-Off Meeting to establish IG
    Link:
    https://rd-alliance.org/bof-kick-meeting-establish-ig-vre-virtual-research-environment.html.

    Focus: BoF to determine whether we should proceed to an RD-A Interest Group

  2. 8th RD-A Plenary IG Denver: VREs/Virtual Laboratories/Science Gateways - opportunities for developing a more coordinated approach to support interoperability across different systems.
    Link:
    https://rd-alliance.org/ig-virtual-research-environment-rda-8th-plenary-meeting.

    Focus: Discuss Case Statement and present on a variety of VREs

  3. 9th RD-A Plenary IG Barcelona: Virtual Research Environments - coordinating sustainable online research environments across multiple infrastructures
    Link:
    https://www.rd-alliance.org/ig-virtual-research-environment-vre-ig-rda-9th-plenary-meeting.

    Focus: Intercontinental comparison and contrast of VREs/SGs/VLs, particularly with respect to interoperability, community building and sustainability of components of a VRE.

  4. 10th RD-A Plenary IG Montreal: Understanding VREs/SGs/VLs: planning for sustainable collaborative development
    Link: https://www.rd-alliance.org/ig-virtual-research-environment-vre-ig-rda-10th-plenary-meeting.

    Focus: Intercontinental comparison and contrast of VREs/SGs/VLs, particularly with respect to understanding the differences/commonalities of VREs/SGs/VLs and on ensuring sustainability of community VRE platforms once they are built.

 

The format of meetings has been to choose 2 or 3 relevant topics and then present case studies on the topic from European VREs, Australian VLs and North American SGs.

 

For the Berlin Plenary the proposed title is Virtual Research Environments – how do I find them and what skills do I need to build and use them? The focus will be on intercontinental comparison and contrast on (1) preparing catalogs/inventories of VREs and (2) on approaches to developing skills needed to build and to use VREs.

 

At the end of each Plenary session the attendees are asked what their burning issues are for the next Plenary.

 

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):

There are currently 92 members of the VRE IG identified on RD-A portal (https://www.rd-alliance.org/groups/vre-ig.html). The proposed chairs are listed in bold below.

 

Current membership includes those directly engaged with the development of VRE/SG/VL technologies but also representatives of those responsible for governance structure of existing individual VRE/SG/VLs and their respective user communities.

 

 

No  First name    Last name      Title
1   Lesley        Wyborn
2   Keith         Jeffery        Prof
3   Sandra        Gesing
4   Helen         Glaves
5   Afonso        Duarte
6   Alessandro    Saretta
7   Alex          Hardisty
8   Anton         Van de Putte
9   Antonio       Rosato
10  Aubert        Landry
11  Ben           Evans
12  Bert          Jagers
13  Brian         Matthews
14  Bridget       Almas
15  Christian     Page
16  Christopher   Brown
17  Clare         Austin
18  Claire        Trenham
19  Cosima        Wagner
20  Daniel        Mietchen
21  Daniele       Bailo
22  Daryl         Grenz
23  David         Morse
24  Denise        Hills
25  Dimitrios     Koureas
26  Ebrahim       Jahanshiri
27  Eva           Mendez
28  Franco        Zoppi
29  Hamish        Holewa
30  Hiela         Pienaar
31  Ingemar       Häggström
32  Johann        Van Wyk
33  Jonathan      Crabtree
34  Jose          Borbinha
35  Julian        Barde
36  Katherine     Lawrence
37  Kheeran       Dharmawardena
38  Lene Krøl     Andersen
39  Leonardo      Candela
40  Leslie        Hsu
41  Luca          Trani
42  Madeline      Huber
43  Maggie        Hellström
44  Malcolm       Wolski
45  Mario J       Silver
46  Mark          Leggott
47  Markus        Stocker
48  Marta         Busse-Wiche
49  Martie        van Deventer
50  Martin        Hammitzsch
51  Massimiliano  Assante
52  Mathew        Fry
53  Merret        Buurman
54  Michael       Jones
55  Michael       Witt
56  Michael       Crusoe
57  Michael       Kahle
58  Michael       Maragakis
59  Michelle      Barker
60  Mingfang      Wu
61  Monique       Crichlow
62  Nancy         Wilkins-Diehr
63  Natalie       Myers
64  Nayiri        Mullinix
65  Oded          Kariti
66  Paolo         Tagliolato
67  Pawel         Ciecieląg
68  Pedro         Goncalves
69  Peter         Fox
70  Plato         Smith
71  Pyrou         Chung
72  Raphael       Levy
73  Raul          Palma
74  Rebecca       Koskela
75  Richard       Grunzke
76  Rob           Hooft
77  Roger         Proctor
78  Roman         Gerlach
79  Rossana       Paciello
80  Sarah         Jones
81  Siddeswara    Guru
82  Silvana       Asteggiante
83  Simone        Mantovani
84  Stephanie     Cheviron
85  Timea         Biro
86  Trudi         Wright
87  Vincent       Smith
88  Weicheng      Huang
89  Yannis        Marketakis
90  Yong          Liu
91  Yulia         Karimova
92  Zhengzhe      Wu


Previous versions of the Charter

  • The original Charter can be found below.
  • Following the initial TAB review of the initial Charter, the group submitted a revised Charter dated July 2017, which can be downloaded here.

 


Original Charter Statement

 

Case Statement

Increasingly, researchers who are not co-located are seeking to work together dynamically at scales ranging from the local to the international. These researchers want to share data, models, workflows, best practice, publications, and the management and administration of their research. Their aim is either to address local challenges that are potentially of direct relevance to researchers in other geographical areas, or to address a common issue of shared interest, such as the grand challenges currently facing society on a global scale, e.g. climate change.

Virtual research environments (VREs), synonymous with science gateways (SGs) in the USA and virtual laboratories (VLs) in Australia, are increasingly being used to support this more dynamic approach to collaborative working. This has led to a number of regional VRE/SG/VL initiatives such as VRE4EIC, whose goals include increasing the usability of VREs for multidisciplinary research and improving the quality of the VRE user experience. Although these systems seek to share some of the same resources and common infrastructure services (e.g. EUDAT, GEANT), there is no coordination of the development of the underlying architecture that would allow these individual VREs to interoperate.

The proposed VRE IG will explore all aspects of existing and planned VRE/SG/VLs with the aim of moving towards common policies and best practices, such as those being promoted by the US Science Gateways Community Institute (SGCI) and the Australian Research Data Services (ARDS), as well as common reference architectures and specifications for components and interfaces.

Objectives

The proposed VRE interest group would bring together those initiatives actively developing VRE/SG/VLs and the representatives of the common infrastructure services, e.g. EUDAT, ARDS. It will also seek to engage with those seeking to make use of these technologies, in an effort to identify the technical aspects, governance issues and best practice required to support a more coordinated approach to the development of these collaborative environments.

The proposed IG will bring together this experience and evolve towards:

  1. Reference architectures for a VRE based on superposition over e-RIs (e-Research Infrastructures) and e-Is (e-Infrastructures);
  2. The definition of a set of components (software and interfaces) for use in a VRE;
  3. The definition of interfaces between a VRE and e-RIs;
  4. The definition of best practice in constructing VREs; and
  5. Recommendations for policies in e-RIs and e-Is.

 

Value Proposition

VRE/SG/VLs are relatively new concepts and the associated technologies have matured in the last 10 years as evidenced by novel developments of these frameworks.  If the objectives outlined above for the VRE IG can be achieved it will lead to interoperating VRE/SG/VLs (themselves supported by integration of heterogeneous e-RIs that are in turn supported by e-Is).  The alternative is divergent and heterogeneous systems incapable (or only capable with great effort) of interoperating.

 

Activities

The VRE IG will aim to act as a longer-term organization responsible for tracking and contributing to the evolution of VRE/SG/VL technologies. To achieve these objectives the VRE IG will:

  1. Review the state of the art;
  2. Ensure associated relevant technologies are known and understood;
  3. From (1) and (2) propose canonical architectural models for VREs;
  4. Propose specifications for standard components (software and interfaces) for VRE/SG/VLs;
  5. Propose best practices for VRE/SG/VL development and implementation;
  6. Contribute to the SGCI’s scientific software collaborative to build a central information hub for researchers and developers; and
  7. Suggest policies to stakeholders of e-RIs and e-Is in close collaboration with existing projects and initiatives, e.g. VRE4EIC, EVER-EST, SGCI, XSEDE, OSG, ARDS.

 

Relationships with other WG/IGs

The proposed VRE-IG will engage with the relevant IG/WGs that will include:

  • Big Data IG
  • Metadata IG: definition of packages of metadata elements appropriate for the VRE/SG/VL
  • Metadata catalogue WG, which will potentially provide a resource for documenting the metadata used in different VREs
  • Preservation Tools, Techniques and Policies IG
  • Research Data Provenance IG
  • Reproducibility IG
  • Federated Identity Management IG
  • Data Fabric IG
  • Domain groups for use cases, requirements and possible later validation

 

Participants

There are currently 57 members of the VRE IG identified on RDA portal (https://www.rd-alliance.org/groups/vre-ig.html). Current membership includes those directly engaged with the development of VRE/SG/VL technologies but also representatives of those responsible for governance structure of existing individual VRE/SG/VLs and their respective user communities.

The proposed group is co-chaired by:

  • Keith Jeffery (UK)
  • Helen Glaves (UK)
  • Lesley Wyborn (Australia)
  • Sandra Gesing (USA)

Group Charter versions

For the original version of the Charter, see immediately above.

Following the initial TAB review, the VRE IG submitted a revised consolidated charter (July 2017) - download here  

Final version of the Charter (January 2018) is at the top of the page and can also be downloaded here

Review period start:
Friday, 1 September, 2017
Custom text:
Body:

NOTE - Please see the revised version of this Charter Statement attached to the page here. Updated as of 16 June 2017.

 

 

The call for Indigenous data sovereignty (ID-Sov), the right of a nation to govern the collection, ownership, and application of its own data, has grown in intensity and scope over the past five years. To date three national-level Indigenous data sovereignty networks exist: Te Mana Raraunga - Maori Data Sovereignty Network, the United States Indigenous Data Sovereignty Network (USIDSN), and the Maiamnayri Wingara Aboriginal and Torres Strait Islander Data Sovereignty Group in Australia. Similar initiatives are underway in Hawaii and Sweden. Currently, these networks engage in an informal and somewhat ad hoc fashion to share information and strategies, hold joint events, and collaborate on research. In the last two years alone this spirit of collaboration has produced four events [1], six joint panel/workshop initiatives [2], and a co-edited book, Indigenous Data Sovereignty: Toward an Agenda. Freely available online, the book had about 2,000 downloads within a month of publication, reflecting the very high level of interest in ID-Sov. These efforts notwithstanding, there are resource and infrastructure constraints to advancing the shared goals and aspirations of these ID-Sov stakeholders. What is needed is a more robust and coherent international collaboration to achieve impactful outcomes at the intersection of Indigenous data sovereignty, Indigenous data governance, and research.

The goals of the International Indigenous Data Sovereignty Interest Group are clearly aligned with the RDA mission of creating a global community to develop and adopt infrastructure that promotes data-sharing, data-driven research, and data use. Those of us already involved in the national-level networks are strong advocates for data-driven research and data use, and are also working in varied ways to build data capabilities beyond academic institutions, so as to benefit Indigenous communities. Through more effective collaboration, we seek to provide a highly visible international platform for ID-Sov that integrates and leverages existing ID-Sov groups to create new opportunities for research and outreach. We also seek to attract new stakeholders beyond our current networks, including researchers, data users and Indigenous communities. To that end all three existing ID-Sov networks have developed strong relationships with Indigenous stakeholders including tribes, non-governmental organisations, Indigenous policy institutes, and researchers.

 

The International Indigenous Data Sovereignty Interest Group will add value to the RDA and ID-Sov communities through the following objectives:

  1. Serving as a platform that leads to the formation of one or more Working Groups. We envisage that our ID-Sov IG would lead to the establishment of a Working Group, with a focus on co-creating an international Indigenous data governance framework founded on ID-Sov principles (see below).
  2. Enabling better communication and coordination across different Working Groups/Interest Groups. One of the important features of ID-Sov is that it has broad relevance and potential for impact across diverse sectors and activities including (but not limited to) agriculture, genetics, archiving, intellectual property rights relating to traditional knowledge, data versioning, and mapping. In addition to sharing strategies and resources within their groups, the IG and WGs will also be in a position to engage with a global community of researchers, policy-makers, and leaders.
  3. Serving to communicate and coordinate the efforts of the national-level Indigenous data sovereignty networks, fostering synergies, bringing new groups/members to RDA and conversely bringing the WGs’ activities to the attention of external parties.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

 

     Like other nation states, Indigenous nations need data about their citizens and communities to make informed decisions. However, the information that Indigenous nations have access to is often unreliable, inaccurate, and irrelevant. Federal, state, and local governments have primarily collected these data for their own use. Indigenous nations’ reliance on external data that do not reflect the community’s needs, priorities, and self-conceptions is a threat to self-determination.

The demand for Indigenous data is increasing as Indigenous nations and communities engage in economic, social, and cultural development on an unprecedented level. Given the billions of dollars in research funding spent each year and the increasing momentum of the international big data and open data movements, Indigenous nations and communities are uniquely positioned to claim a seat at the table to ensure Indigenous peoples are directly involved in efforts to promote data equity in Indigenous communities.

The International Indigenous Data Sovereignty Interest Group will provide infrastructure and collaboration to advance the shared goals and aspirations of these ID-Sov stakeholders. In addition, the IG provides a platform at the intersection of Indigenous data sovereignty, Indigenous data governance, and research to educate scholars across disciplines share WG outcomes and outputs with

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

     

Neither an IG nor a WG exists within or external to RDA that focuses on international collaborations on ID-Sov. The International Indigenous Data Sovereignty Interest Group objectives include:

  1. Serving as a platform that leads to the formation of one or more Working Groups. We envisage that our ID-Sov IG would lead to the establishment of a Working Group, with a focus on co-creating an international Indigenous data governance framework founded on ID-Sov principles (see below).
  2. Enabling better communication and coordination across different Working Groups/Interest Groups. One of the important features of ID-Sov is that it has broad relevance and potential for impact across diverse sectors and activities including (but not limited to) agriculture, genetics, archiving, intellectual property rights relating to traditional knowledge, data versioning, and mapping. In addition to sharing strategies and resources within their groups, the IG and WGs will also be in a position to engage with a global community of researchers, policy-makers, and leaders.
  3. Serving to communicate and coordinate the efforts of the national-level Indigenous data sovereignty networks, fostering synergies, bringing new groups/members to RDA and conversely bringing the WGs’ activities to the attention of external parties.

 

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

The International Indigenous Data Sovereignty Interest Group will link Indigenous data users, leaders, information and communication technology providers, researchers, policymakers and planners, businesses, service providers, and community advocates together to provide a highly visible international platform for ID-Sov that integrates and leverages existing ID-Sov groups to create new opportunities for research and outreach. To that end all three existing ID-Sov networks have developed strong relationships with Indigenous stakeholders including tribes, non-governmental organisations, Indigenous policy institutes, and researchers. We propose to use the RDA Plenary as an opportunity to establish relationships and connections with other IGs. We also seek to attract new stakeholders beyond our current networks, including researchers, data users and Indigenous communities. Note that IG members need not be Indigenous, so long as they are interested in furthering the aims of ID-Sov, data governance toward ID-Sov, and data-driven research.

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.): The International Indigenous Data Sovereignty Interest Group envisions three categories of outcomes:

 

  1. Working Groups. We envisage that our ID-Sov IG would lead to the establishment of Working Groups, with a focus on:
    1. Co-creating an international Indigenous data governance framework founded on ID-Sov principles and
    2. Establishing an international collaborative funding proposal to Indigenous stakeholders in order to design a clear pathway from research to impact.
  2. Enabling better communication and coordination across different Working Groups/Interest Groups. One of the important features of ID-Sov is that it has broad relevance and potential for impact across diverse sectors and activities including (but not limited to) agriculture, genetics, archiving, intellectual property rights relating to traditional knowledge, data versioning, and mapping. In addition to sharing strategies and resources within their groups, the IG and WGs will also be in a position to engage with a global community of researchers, policy-makers, and leaders.
  3. Serving to communicate and coordinate the efforts of the national-level Indigenous data sovereignty networks, fostering synergies, bringing new groups/members to RDA and conversely bringing the WGs’ activities to the attention of external parties.

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

 

The International Indigenous Data Sovereignty Interest Group will use the following mechanisms for communication and collaboration.

  • Monthly virtual meetings via video conference, shared documents, etc.
  • Informal and frequent email contact among chairs and workgroups.
  • Monthly listserv messages to IG members from the chairs about IG updates, WG efforts, etc.
  • Biannual RDA Plenaries
  • Listserv and Facebook group where members may post about Indigenous data sovereignty and data governance resources, current events, and conferences.

 

 

Timeline (Describe draft milestones and goals for the first 12 months):

The International Indigenous Data Sovereignty Interest Group will commence via virtual chair meetings. In addition, varying combinations of IG co-chairs and members have collaborative events [1] and panels [2] planned in March, April, and May 2017. These events will launch the WG1 effort to co-create an Indigenous data governance framework.

 

 

Activity                              4/17   5/17   6/17   7/17   8/17   9/17   10/17  11/17  12/17  1/18   2/18   3/18
Virtual Chair Meetings                X      X      X      X      X      X      X      X      X      X      X      X
Listserv message                      X      X      X      X      X      X      X      X      X      X      X      X
Workgroup 1: Framework                -      X      X      X      X      X      X      X      X      X      X      X
Workgroup 2: Research Collaborations  -      -      -      X      X      X      X      X      X      X      X      X
RDA Plenaries                         -      -      -      -      -      X      -      -      -      -      -      -

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):

The International Indigenous Data Sovereignty Interest Group co-chairs include representation from three continents (Europe, North America, Oceania), four countries (Australia, New Zealand, Sweden, United States) and four disciplines (demography, history, public health, sociology). The initial IG members expand the disciplines to include business administration and Indigenous studies.

 

Footnotes:

[1] These are: International Workshop on Data Sovereignty for Indigenous Peoples: Current Practice and Future Needs, Canberra, July 2015 (hosted by John Taylor and Tahu Kukutai, sponsored by Academy of Social Sciences in Australia); Indigenous open data summit, Madrid, October 2016 (hosted by USIDSN, sponsored by the Native Nations Institute and the International Open Data Conference); Indigenous Data Sovereignty Summit, Auckland, November 2016 (hosted by Te Mana Raraunga, sponsored by Swedish Research Council, Wallenberg Academy Fellows, and Ngā Pae o Te Māramatanga Māori Centre of Research Excellence); Indigenous Data Governance, Los Angeles, May 2017 (hosted by USIDSN, sponsored by the Native Nations Institute and the University of California Los Angeles).

 

[2] A joint panel ‘Indigenous Data Governance and Open Data Futures’ at the 3rd International Open Data Conference, Ottawa, May 2015; a joint panel ‘In pursuit of Indigenous data sovereignty: Directions and challenges’ at the Native American Indigenous Studies conference, Hawaii, May 2016; a collaborative workshop ‘Indigenous Data and Information Sovereignty: Making Open Data work for Indigenous Peoples’ at the RDA Eighth Plenary, September 2016; a joint panel ‘Indigenous + Data’ at the 4th International Open Data Conference, Madrid, October 2016; a masterclass on Indigenous Data Sovereignty at the Common Roots Indigenous Governance conference, Brisbane, March 2017; a joint panel ‘Indigenous Nation Data Governance: Data for Nation Rebuilding’ at the National Congress of American Indians mid-year Policy Research Center data pre-conference, Uncasville, CT, June 2017.

 

 

 

 

Review period start:
Sunday, 19 March, 2017
Custom text:
Body:

Case Statement: PID Kernel Information profile management WG

(aka PID Kernel Information WG #2)

 

Co-chairs: Tobias Weigel, Beth Plale, Jens Klump

 

Motivation and scope

The PID Kernel Information WG produced a recommendation that contains i) guiding principles for identifying information appropriate as PID Kernel Information, ii) an exemplar PID Kernel Information profile, iii) several use cases, and iv) architectural considerations. PID Kernel Information is defined as the set of attributes stored within a PID record. It supports smart programmatic decisions that can be accomplished through inspection of the PID record alone.
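As a minimal sketch of what a decision "through inspection of the PID record alone" could look like, assume a PID record is a flat mapping of attribute names to values. The attribute names (`digitalObjectType`, `digitalObjectLocation`, `checksum`), the example identifier and the helper `should_fetch` are illustrative assumptions and are not taken from the WG recommendation.

```python
# Hypothetical PID record: a flat key-value mapping held by the PID service.
# All attribute names and values below are illustrative assumptions.
pid_record = {
    "pid": "21.T00000/example-0001",  # made-up identifier
    "digitalObjectType": "application/x-netcdf",
    "digitalObjectLocation": "https://repository.example.org/objects/0001",
    "checksum": "sha256:placeholder",
}


def should_fetch(record, accepted_types, known_checksums):
    """Decide from the record alone whether the object is worth retrieving."""
    if record.get("digitalObjectType") not in accepted_types:
        return False  # the client cannot process this type of object
    if record.get("checksum") in known_checksums:
        return False  # the client already holds an identical copy
    return True


decision = should_fetch(pid_record, {"application/x-netcdf"}, set())
print(decision)
```

The point of the sketch is that the client never touches the data object or a repository catalogue: the type and checksum held in the record are enough to skip an unnecessary download.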

At the RDA P13 PID Kernel Information WG session, attendees expressed enthusiasm over the PID KI work. While the WG feels that the exemplar profile reflects a consensus decision and at the same time satisfies the guiding principles, it does not preclude other profiles. However, a global technology, as PID KI intends to be, will never be adopted beyond leading-edge (i.e. research) settings without considered organizational governance or management.

Thus the PID KI WG agreed at the conclusion of the P13 session that there was a need to examine the governance and management of a small number of globally relevant PID KI profiles. Without further guidance on how to manage profiles, there is a risk of wide-scale proliferation of overlapping or incompatible profiles, which could significantly hamper the long-term realization of coherent middleware that supports Kernel Information. As the PID KI depends on a type profile to be interpretable, it requires the existence of a Data Type Registry (DTR) to store the type definition of the PID KI. In fact, both PID KI and DTR are part of the same digital object ecosystem. Thus the issue of governance must be coordinated with the DTR#3 group. A follow-on WG additionally presents an opportunity to further define the boundary conditions for such middleware and foster alignment across disciplines and regions.

The WG lives in the context of

  • Multiple use cases from disciplinary adoption and project work

  • Relevant work inside RDA (other WG/IG) and outside RDA (W3C PROV, SemWeb/LD in general)

 

Objectives
 

The WG will work towards the following objectives:

  1. Life cycle model defined for KI profiles and mechanisms (principles, processes, tools) through which KI profiles can be defined and governed. Define profile metadata and how to encode a profile.

  2. Baseline KI architecture extended to match the needs of profile management, for example, by including profile registries and connectors.

  3. Describe the technical interface for interaction with profile registries, most likely based on the DTR WG recommendation and API. Preferably, a separate new API does not need to be defined.

  4. As the PID KI depends on a type profile to be interpretable, and both PID KI and DTR exist in the same digital object ecosystem, the governance and management of PID KI profiles should be synergistic with the governance and management of Data Type Registries. The objective is coordinated guidance on governance between this WG and the DTR#3 group.

  5. Facilitate coordination with other RDA groups mentioned further below more generally.
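To make the idea of an encoded profile with life cycle metadata more concrete, the sketch below treats a KI profile as a small piece of metadata plus a list of attribute definitions, and checks a PID record against it. The field names (`identifier`, `version`, `status`, `attributes`), the life cycle states and the `validate` helper are assumptions made for illustration; the actual encoding and life cycle model are for the WG to define.

```python
from dataclasses import dataclass, field

# Hypothetical life cycle states for a profile (an assumption, not a WG output).
LIFE_CYCLE_STATES = ("draft", "approved", "deprecated")


@dataclass(frozen=True)
class AttributeDef:
    name: str
    mandatory: bool = True


@dataclass
class Profile:
    identifier: str  # PID of the profile itself (made-up value below)
    version: str
    status: str      # one of LIFE_CYCLE_STATES
    attributes: list = field(default_factory=list)

    def validate(self, record):
        """Return the names of mandatory attributes missing from a record."""
        return [a.name for a in self.attributes
                if a.mandatory and a.name not in record]


profile = Profile(
    identifier="21.T00000/profile-example",
    version="1.0",
    status="approved",
    attributes=[
        AttributeDef("digitalObjectType"),
        AttributeDef("digitalObjectLocation"),
        AttributeDef("dateModified", mandatory=False),
    ],
)

# A record that lacks one mandatory attribute.
missing = profile.validate({"digitalObjectType": "text/csv"})
print(missing)
```

In a registry-backed setting, the `Profile` object would be retrieved by its identifier from a profile registry rather than constructed in code; that lookup is left out here because its interface is exactly what objective 3 leaves open.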

 

Value proposition

 

Cyberinfrastructure providers will benefit from the architectural reference model, which makes collaboration for development and use of shared infrastructure components using PID KI profiles easier. The governance mechanisms provided by the WG are a necessary element for serving KI profiles in operational settings.

Tools and services builders will benefit from the availability of well-defined, well-managed and interoperable KI profiles as they can rely on agreed profiles to underpin specific tasks in data workflows. A common approach for interfacing with profile registries can reduce development costs, for example by sharing software library development for registry clients.

As a result of the combined efforts by cyberinfrastructure providers and service/tool builders, scientific users will benefit from the better availability of information across data life cycle stages through the connections made by the PID KI graph. The adoption of profile management can make the links in the graph more coherent.

Data producers, particularly if organized in research infrastructures, could benefit from having a wide range of KI profiles available, with ensured interoperability between them and related services. Potential needs and usages will differ between research disciplines, but knowing which profiles are supported by which repositories could help in identifying the optimal storage and cataloguing facility for a given dataset.

 

Stakeholders and adoption
 

The topic is of relevance to multiple stakeholders, which will be sought out. Initially, these are:

  • The NSF-funded eRPID project: The predecessor RPID project evaluated and prototyped usage of the early Kernel Information recommendation and its findings within the project context. The eRPID successor project will continue this and provide two-way interaction on practical use cases and needs, and be informed by the PID KI WG outcomes.

  • EUDAT: The operational B2HANDLE service relies on Kernel Information as an essential element to facilitate cross-service integration at e-infrastructure level, largely hidden from users, in line with the KI vision. The latest integration activities (e.g. B2SHARE, OneData) introduce a level of complexity where profile governance would be highly beneficial to support long-term stability.

  • ePIC/GWDG: Similar to EUDAT, ePIC/GWDG take further interest in the KI concept to underpin identifier-level metadata management with a stable conceptual framework, and participate in the discussion of governance processes from the perspective of a potential adopter of these processes at the ePIC management level.

  • International GeoSample Number (IGSN): IGSN interest in the KI concepts was renewed at and after the P13 session. IGSN may benefit from KI to provide better integration with e-infrastructure services, and a discussion on profile governance may be a key requirement for adoption actions.

  • DiSSCo: As a new research infrastructure connecting the natural science collections of more than a hundred collection-holding institutes across Europe, DiSSCo has a potential interest in the KI concept to inform its future e-infrastructure strategy. DiSSCo is in conversation with IGSN to develop a joint Handle-based approach for specimen object identifiers, and this has the potential to lead to billions of new Handles. The KI concepts and governance discussions are relevant and crucial for long-term stable e-infrastructure services in this context.

  • Deutsches Klimarechenzentrum (DKRZ): DKRZ has introduced PID management services into the wider IS-ENES/ESGF CMIP6 climate data infrastructure, which are now based on PID Kernel Information and other related RDA outcomes. Notably, the recommendations of the Research Data Collections WG have also been put into practice, and the resulting solution may also inform KI discussions. With the first generation of these services operational, first practical experience indicates that integration beyond the e-infrastructure and/or community boundary requires solid KI profiles, and that such integration may easily fail if governance is not addressed. DKRZ follows these discussions with a specific view on the future IS-ENES infrastructure and a potential next generation of services for CMIP7.

To facilitate adoption in practice, the group’s work needs to be informed by specific exemplary cases from these stakeholders, ensuring both that the group’s work is grounded in practical reality and that the adoption barrier is lowered, turning stakeholders into ‘guinea pigs’. The following examples illustrate where the availability of profiles and their management have or can have benefits in actual usage:

  • eRPID example: The RPID demonstrator uses the PID KI exemplar profile already registered in DTR. This will be further extended as part of eRPID. The findings of RPID/eRPID in a prototypical service environment can inform the PID KI discussions for matching practical use cases and infrastructure needs.

  • EOSC example: The constituent technical services of the European Open Science Cloud (EOSC), notably from EUDAT and EGI, could benefit from better structure of PID records through profiles and the organizational streamlining that adherence to profiles fosters. Profiles can be specific and characteristic for different life cycle phases (preliminary data sharing, data archival). In particular, the ECAS data processing service is prototyping basic data lineage tracking through a basic data input-processing-output workflow. The processing environment needs to become aware of an object’s context (e.g., is it shared preliminary data or archived data?) to a) inform the user of any underlying implications for their data analysis; b) record provenance with confidence.

  • Service orchestration example: This is a use case discussed in past iterations of the DTR WG with wider general applicability in service-oriented architectures compliant with the Data Fabric model. If data objects and service objects receive PIDs and PID KI, then it may be required to filter out services from among all objects in order to let an orchestrator determine precisely which objects can be put to which services and how service chains can be generated dynamically. Profiles are an essential ingredient to make the filtering work.
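The filtering step in the service orchestration example can be sketched as follows, assuming each PID record names the profile it conforms to. The `profile` key, the profile identifiers and the `accepts` attribute are hypothetical and only serve to illustrate the idea; they are not defined by the DTR or PID KI recommendations.

```python
# Hypothetical profile identifiers; real ones would be PIDs registered in a DTR.
SERVICE_PROFILE = "21.T00000/service-profile"
DATA_PROFILE = "21.T00000/data-profile"

# Made-up PID records for one data object and two service objects.
records = [
    {"pid": "21.T00000/obj-1", "profile": DATA_PROFILE,
     "digitalObjectType": "text/csv"},
    {"pid": "21.T00000/svc-1", "profile": SERVICE_PROFILE,
     "accepts": ["text/csv"]},
    {"pid": "21.T00000/svc-2", "profile": SERVICE_PROFILE,
     "accepts": ["application/x-netcdf"]},
]


def split_by_profile(all_records):
    """Separate service objects from data objects using only the profile."""
    services = [r for r in all_records if r["profile"] == SERVICE_PROFILE]
    data = [r for r in all_records if r["profile"] == DATA_PROFILE]
    return services, data


def match(all_records):
    """Map each data object PID to the PIDs of services that accept its type."""
    services, data = split_by_profile(all_records)
    return {
        d["pid"]: [s["pid"] for s in services
                   if d["digitalObjectType"] in s["accepts"]]
        for d in data
    }


plan = match(records)
print(plan)
```

Because the profile tells the orchestrator which attributes to expect in a record, the matching logic can stay generic; without agreed profiles it would need special cases for every record layout it encounters.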

 

Relation to other RDA efforts

 

As reflected in objective #4, the group’s work should be closely coordinated with DTR #3 as this group deals with a specific application case for DTR, and the PID KI and DTR both exist within the same digital object data ecosystem (one that builds upon, but is not limited to, the Handle system for PID resolution).

In addition, the use cases and stakeholder perspectives must be taken further into account by connecting to the PIDs for Instruments WG, the emerging group on Interoperability of Observable Property Descriptions, the Biodiversity Data Integration IG and the PID IG. The latter is of specific importance as a forum to engage new adopters, from the use and added-value perspectives as well as from the infrastructure and service provisioning perspective. In this context, discussions between FREYA and RDA as part of the PID IG are relevant to Kernel Information and will be actively followed up by the group.

Within the scope of the PIDs for Instruments group, the metadata under consideration is primarily based on the DataCite kernel, which is much more extensive and geared towards different usage scenarios than PID KI. However, there is also ongoing discussion about a more “generic Handle-based registry” profile implementation. Such an implementation may be a candidate for a profile that brings instruments and their products into the PID graph and enables filtering per instrument/measurement campaign/sample ‘type’ (physical, virtual, or their subtypes). A discussion along these lines will be taken up early in the WG lifetime.

 

Work plan and milestones

 

The WG will first assemble requirements for profiles and supporting services. Then, it will work towards the objectives in parallel.

Assuming a first formal session at P14, the group work aligns with the following milestones:

  • P14: Presentation of group scope and goals to RDA community, engagement with additional use cases, discussion of requirements for profile management.

  • P15: Analysis of use cases and requirements complete and first draft of boundary conditions for profile management presented. Intake of community input to profile management and governance principles, review of possible frameworks.

  • P16: Presentation of first draft of full profile management framework and readily adoptable profiles. Discussion of implementation gap analysis.

  • P17: Delivery of outputs.

In the end, the WG will deliver a recommendation for KI profile management, example profiles defined together with potential adopters, and a profile registry interface specification (if needed).

 

Initial supporters

 

Tobias Weigel, DKRZ
Beth Plale, IU
Larry Lannom, CNRI
Ulrich Schwardmann, GWDG
Jens Klump, CSIRO
Mark Parsons, RPI
Maggie Hellström, Lund University

Review period start:
Tuesday, 30 July, 2019 to Friday, 30 August, 2019
Custom text:
Body:

NOTE - This Case Statement has been deprecated in favor of the revised version attached here (18 Dec 2017).

 

 

Charter

Overview

Tracking provenance for research data is vital to science and scholarship, providing answers to common questions researchers and institutions pose when sharing and exchanging data.

 

The tasks for this Working Group focus on finding, detailing and recommending best practices for provenance representation and management.

 

This group will conduct its work in the manner of a business analysis task: identifying business needs and determining solutions to business problems. Since RDA WGs are not themselves research groups (rather groups of researchers and research agencies), this group will look for existing practice and re-present that for use rather than generate new practice.

 

The six activity areas of the Working Group will be:

  1. Common provenance Use Cases
  2. Provenance design patterns
  3. Sharing provenance
  4. Strategies for enterprise provenance management
  5. Tools for provenance
  6. Provenance data collections

Deliverables

The deliverables for this Working Group are separated into three time-based cohorts as below. Short-term goals are mostly about seeking existing practice. Medium-term goals are about determining possible output forms for the activity areas, long-term goals are about delivering those outputs, and after-term goals are about ensuring continued custodianship of the outputs, where required.

 

Medium-term (M12)

  1. A provenance use case recording system.
  2. An initial collection of provenance use cases, elicited from other interest groups and working groups.
  3. First documented provenance design patterns generalised from use cases.
  4. A report on investigation of provenance sharing implementations.
  5. A review of existing enterprise provenance management implementations.
  6. A listing of provenance tools compiled from interviews with RDA members and the provenance research community.
  7. A directory of open and non-open provenance data collections.

Long-term (M18)

  1. A taxonomy for provenance use cases.
  2. Recommendations for aligning new use cases with provenance design patterns.
  3. Lessons for provenance sharing and enterprise management implementations.
  4. A synthesis and critical comparison of community recommendations for provenance tool custodianship.
  5. A summary of best practice principles for provenance data collection stewardship.

After-term (M18+)

  1. A sustainability plan for ongoing tool and data collection custodianship.

Value Proposition

Effective provenance management is sought by many members of the RDA and wider science data community. We propose a working group to help those members adopt existing provenance management practice. This help will take the form of documenting provenance use cases (centralising a list of them and generalising them to reveal common ones); documenting existing technical and business processes for provenance management; assisting organisations with sharing provenance; and listing existing sources of real provenance information.

 

We propose a working group on provenance patterns.

  • The patterns should relate to core RDA interests, perhaps data/data and data/people relationships.
  • Provenance vocabularies offer a level of generality/specificity that addresses what we perceive to be implementation gaps.
  • Our goal: constructive engagement with and response to published RDA recommendations.

WG activity points

  1. Common provenance Use Cases
    • Use Cases for provenance data or systems are often articulated in terms understood by a particular community; however, in our group's experience, many provenance Use Cases are differently worded instances of general Cases.
    • The establishment of a published set of UCs would allow people to compare their UCs with known UCs for which recommended implementations and other patterns may already be known. It will also allow people to consider provenance UCs posed by others that may be of future interest to them.
  2. Provenance design patterns
    • Some ways of doing things in provenance are better than others. This activity is to generate provenance design patterns (for any provenance task such as representation, transmission, use, etc.), perhaps in response to a series of provenance use cases that we would generate.
    • The patterns should relate to core RDA interests, perhaps data/data and data/people relationships.
  3. Sharing provenance
    • This may only be a single class of provenance Use Cases but it is one that is less maturely answered by the provenance research community than, say, provenance representation. This activity might be to generate requirements for the research community to answer or perhaps find that no more research is needed for sensible recommendations for provenance sharing.
  4. Strategies for enterprise provenance management
    • Some provenance use cases apply to whole organisations (or consortia) and some organisations (or consortia) may already have experience in implementing solutions to them. This activity will list such Use Cases and seek descriptions of implemented or proposed solutions from members.
  5. Tools for provenance
    • In addition to several well-known provenance conceptual models, there are tools to assist with the management of provenance. We will list those tools with comparisons in relation to RDA interests (perhaps taken from IG and other WG members).
    • We will also seek to establish a mechanism to keep these tool lists up-to-date beyond the life of the WG.
  6. Provenance data collections
    • The provenance research community knows, through direct communication and research papers, that provenance ontologies and tools are in use, but it is only anecdotally aware of many current provenance datasets (i.e. whole datasets of provenance information) and has not yet counted datasets linking to standardised provenance information. In order to know the state of operational systems' adoption of provenance models, and to provide access to public provenance data for both education and actual use, we will list as many current provenance datasets as we can find, owned by RDA members and others, along with catalogues of datasets linking to standardised provenance information.
    • We will also seek to establish a mechanism to keep this listing up-to-date beyond the life of the WG.

Engagement

In addition to serving the RDA community directly, this Working Group aims to serve the immediate interests of existing RDA groups. Provenance is foundational to many other RDA groups' activity and thus maximal impact on the RDA community can be achieved by aligning and assisting work in existing groups. Therefore this working group will engage heavily with other groups and source its primary requirements and exemplars from other groups. Examples of intersections we believe will be productive include the following:

  • Publishing Data Workflows WG: Interest in workflow persistence and quality control, data deposit and citation, reference models and implementation.
  • Dynamic Data Citation WG: Interest in a conceptual model for citation fidelity despite changes over time.
  • PID Information Types: The Use Case "A.10 Provenance tracing."
  • Reproducibility IG: The role of provenance models in support of replication.
  • PID IG: Requirements for PIDs to maintain provenance content.
  • Archives and Records Professionals for Research Data IG: Need for semantic understanding of archived material.
  • Data Discovery IG: Upper ontology elements relevant to data discovery.
  • Preservation e-Infrastructure IG: Semantic content of preserved data holdings.

Work Plan

Timeline

  • Dec 16: Identify initial set of focus areas and discuss.
  • Feb 17: Draft case statements distributed.
  • Apr 17: Discuss draft case statements and formation of WGs at Plenary 9.
  • May 17: Finalize and circulate case statements for WGs.
  • May - Oct 17: Short-term goals.
  • Sep 17: Meeting WGs and summary of activity at Plenary 10.
  • Nov 17 - Apr 18: Medium-term goals.
  • Apr 18: Group health check at Plenary 11.
  • May 18 - Oct 18: Long-term goals.
  • Sep 18: Final group meeting at Plenary 12.
  • Oct 18+: After-term goals.

Initial Membership

Co-chairs: Nicholas Car and David Dubin

 

Membership will be sought from the Provenance IG and supplemented with a call to both other RDA groups and known non-RDA provenance communities, such as the provenance research community.

 

Since this group's work is likely to be highly relevant to, or even directed by, other WGs, it may be sensible to have other WG members attend this group's meetings either in a liaison role or as members in their own right.

 

Adoption

Other RDA groups

This WG proposal is engagement-driven, primarily with other RDA groups, thus it is in them that we expect to see initial adoption.

 

Where another RDA group presents us with a provenance use case, we hope to either:

  • associate that use case with a generic use case and thus a pre-made generic resolution, or
  • provide a direct, provenance pattern-based resolution.

In either case, we hope to promote a pattern that the RDA group will adopt and promote to its members.

 

Prov WG member institutions

Most current RDA Provenance IG members are likely to become Prov WG members, since their interest in provenance lies in adoption, which is also the WG's goal. It is likely that outputs from this group, having been generated by its members in response to their direct needs and the similar needs of other RDA groups, will therefore be fed back into their home institutions for adoption there.

 

Non-RDA groups

The international provenance research community is in contact with many potential consumers of provenance patterns due to its profile as experts on provenance. The potential consumers don't always receive the advice they are seeking, due to differences between their aims and those of the research community. The research community needs to push the provenance envelope forward and not dwell on previous work, even when that work may contain patterns perfectly suited to the potential consumers' needs.

 

The Provenance IG has, and the WG will have if membership proceeds as expected, good contacts with the international provenance research community. Several IG members have made substantial contributions to provenance research initiatives such as the Open Provenance Model, ProvONE and the PROV W3C standard, and have presented at many recent provenance conferences such as IPAW 2014 & 2016, TaPP 2015, TaPP 2017 (coming), Classification Soc 2017, ISWC 2016 and IDCC 2017.

 

The full listing of the IG's involvement in provenance conferences is available on the RDA Prov IG wiki:

Continued adoption

Some of the outputs from this WG are targeted at continued adoption over time. This proposal includes a deliverable for a "sustainability plan for ongoing tool and data collection custodianship" after having initially established a provenance use case database, listings of provenance tools and provenance datasets. Such a plan is currently missing from the international provenance community despite widespread recognition that it would be useful. This was recognised at IPAW 2016 independently of any RDA involvement.

 

It is expected that at the conclusion of this WG, the current provenance IG will have some role in the custodianship of its outputs.

Review period start:
Monday, 6 March, 2017
Custom text:
Body:

WG Charter:

The Certification and Accreditation for Data Science Training and Education WG will engage in a yearlong exercise to map existing certification and accreditation schemes from around the world that apply to courses of all varieties relating to Data Science and other data-related themes. The survey will encompass undergraduate, postgraduate, professional development and other non-academic pathways. We will include modules, courses and programmes, from both public and private bodies. At the end of 12 months the Working Group will deliver a comprehensive report comparing and analyzing a broad sweep of the options together with recommendations for how best to serve the interests of future Data Science and data-related education and training.

We propose that the WG has a shorter day-to-day title of Certification and Accreditation WG.

 

Value Proposition

Certification and accreditation matter to the RDA community as they have a direct bearing on the career pathways of nearly all members. This is particularly true of this community with its novel and interdisciplinary professional pathways. Not only are Data Science and data-related fields creating new professions but they are necessitating a common classification of competences and skills so that managers and policy makers can make sense of the different traditions that are coming together in today’s hybrid qualifications and professions.

The report that this WG will deliver will therefore be of significant value to managers, leaders, policy makers, educators and trainers as well as, crucially, many RDA members who are themselves pioneering such novel data-centric careers. Ultimately, establishing a fully recognised international Data Science profession would most likely take ten years; it is in the interests of both current and future RDA members and many others that this activity commences as soon as possible.

 

Engagement with existing work in the area

The original idea for this WG came from a series of discussions within the existing Interest Group Education and Training on handling of research data. A further stimulus was the work that is underway within the EU-funded EDISON project, which was established to accelerate the creation of Data Science and other data-related professions. EDISON is particularly committed to supporting the promotion of certification solutions for Data Science but is also interested in supporting work in promoting course accreditation.

Whilst there are no other RDA Groups looking at this specific topic, we see a great opportunity to connect with other RDA groups representing other traditional employment areas including: libraries, archives, security and specific academic and professional domains.

Beyond the RDA activities we are aware of various initiatives to take these ideas forward. These include the Mozilla Open Badges scheme: https://openbadges.org.

Discussions so far have identified three key areas that need to be addressed:

  1. The challenges presented by data-driven approaches to scientific research are evolving across traditional disciplines:
    1. Interdisciplinary teams will increasingly necessitate complementary skills
    2. As these skills are cross-cutting and interdisciplinary, traditional qualifications will only go so far.
  2. In the professional world university diplomas are not always sufficient to prove that a candidate can apply the knowledge in a given working environment
  3. Who and what bodies will provide the accreditation and certification for these new and evolving courses and modules?
    1. They will need to be trusted and valued
    2. They will need a past and a future

Work Plan:

A work plan, following standard project management procedures, will be developed to deliver and expand on the following elements:

Key deliverables:

  1. M1 - updated workplan, target communities that we should consider for connections
  2. M2 - horizon scanning document highlighting professional areas to be covered, related data-science and data-related (e.g. data librarian) professional pathways, and case studies of examples (e.g. badging)
  3. M3 - presentation of progress to date at 9th Plenary, Barcelona, encourage greater engagement
  4. M6 - draft report, progress to date, opportunities/issues identified
  5. M8 - 10th Plenary, Montreal: share preliminary report with other RDA groups and external bodies to gather further input and comment for preparation of final report
  6. M12 - A comprehensive report comprising an extensive catalogue of Data Science courses, modules and programmes capturing their certification schemes and, where applicable, the accreditation of the bodies offering the courses 

Activities:

  • The WG will build a community of individuals with interests in this area but coming from diverse backgrounds both geographically and professionally. We appreciate that not all participants will be able to commit to the fieldwork, but we hope they will provide insight and critical reasoning into the work in progress. The WG will report regularly, informally through social media and blogs, and formally through bi-monthly reports of draft elements of the final report.

  • As this is a fast-track twelve-month taskforce, we will meet face-to-face between Plenaries and also hold bi-monthly telephone conferences in order to maintain momentum.

  • The initial drive for the WG came from the pre-existing work that has started at a smaller scale within the EDISON project. However, the Denver BoF demonstrated that there is a strong broader interest in this topic from both around the world and also across many traditionally different disciplines. Clearly, there are many who now realize that close collaboration and interchangeable professional development is inevitable. We therefore aim to build the number of co-chairs to 4 quite swiftly in order to maintain a collegiate leadership approach.
  • As mentioned, the BoF demonstrated a broad spectrum of interest from others already quite active in other groups. We will build on this and aim to convene a Liaison group within the WG of RDA members who also happen to hold administrative roles within other RDA groups and who can therefore play a communication role between two groups.

Adoption Plan:

The WG will deliver a comprehensive report after 12 months.

We anticipate that the recommendations from the report would include a strong recommendation for a follow-on WG designed to take forward the work, further exploit the lessons learned and continue to be of benefit to the broad RDA community in terms of their professional development.

Furthermore, part of the legacy planning for the EDISON project involves delivering a functioning community portal offering a number of services to support Data Science and other data-related professions. The report on certification and accreditation would complement such services in a manner that could benefit all parties.

Initial Membership:

The following people attended the BoF meeting held at the 8th Plenary in Denver and indicated an interest in supporting and participating in the proposed WG:

  • Steve Brewer, EDISON, University of Southampton, UK
  • Małgorzata Krakowian, EDISON, EGI-Foundation, NL
  • Hugh Shanahan, Royal Holloway University, UK
  • Freyja van den Boom, FutureTDM,
  • Leighton Christiansen, National Transportation Library, US
  • Natasha Simons, Australian National Data Service, AU
  • Vicky Lucas, Institute for Environmental Analytics, UK
  • Kathrin Beck,
  • Christopher Jung,
  • Pete Pascuzzi, Purdue University Libraries, US

Others interested in participating in a WG are:

  • Prof Jeremy Frey, Chemistry, University of Southampton
  • Edit Herczog
  • Dr Liz Lyon, Visiting Professor, School of Information Sciences (iSchool), University of Pittsburgh

Working Group Chairs:

Co-chairs: Steve Brewer (University of Southampton, EDISON project), Małgorzata Krakowian (EGI Foundation, EDISON project)

We are keen to expand to 4 co-chairs once up and running. We have had interest from a number of other RDA members from regions beyond Europe, but none so far have been able to commit to a co-chair role due to existing demands on their time. Please do get in touch if interested in taking on this role.

Both co-chairs are experienced and qualified project managers and have good experience of the RDA ways of working, having attended a number of Plenaries over the last few years.

Review period start:
Monday, 30 January, 2017
Custom text:
Body:

Provenance

Research data provenance is information about the inputs, entities, systems, and processes that influence data of interest. It can be recorded and stored and can provide a historical record of the data and its origins.

 

Group Purpose

The Research Data Provenance Interest Group exists to coordinate provenance-related activities within the RDA. The group has been in existence for a number of years and has links to many other RDA groups.

 

Group's work in 2017

In 2017, the major undertaking of the RDP IG is to form a provenance working group with specific objectives and an 18-month lifespan. The intention is for that group to be formed at Plenary 9 in April, 2017. The Case Statement for that WG can be found here:

Other goals for the IG in 2017 are:

Review period start:
Tuesday, 10 January, 2017
Custom text:
