
** Note - the following text has been deprecated in favor of the attached Version 2 Case Statement document as of 21 May 2018.

 

 

 

Case Statement to Create a Working Group On Capacity Development for Agriculture Data

 

Submitted to Research Data Alliance

 

Case Statement for Capacity Development for Agriculture Data WG

Charter

Introduction and Rationale

 

Introduction

 

This case statement highlights the need for capacity development among research data and open data practitioners in agriculture and related sciences, in particular within the existing RDA working groups related to IGAD. It looks broadly at the existing gaps that require urgent attention, so that all stakeholders within the agriculture and nutrition ecosystem have the capacity their roles demand in support of international development goals. It presents diverse use cases and linkages to different sectors of development, and cites resources for more in-depth examination of capacity development issues. It aims to provide strategic, actionable insights into the capacity and capabilities necessary for managing today’s large-scale data assets, to create opportunities for stakeholders to network, and to set methodologies for curriculum development and new approaches to key techniques and tools for agricultural data management. It will provide practitioners with the capacities needed to contribute to and use the vast open data ecosystem in agriculture and nutrition.

 

1. Rationale

 

The aim of this Working Group (WG) is to develop synergies between existing education and training activities and agricultural science needs by performing a landscape assessment to identify existing gaps and training requirements within the WGs related to the Interest Group on Agricultural Data (IGAD). A particular focus will be on sharing knowledge about training initiatives and technologies and on reducing digital divides, so that researchers and practitioners in developing countries can also benefit. It will also strengthen the existing collaboration with GODAN and GODAN Action.

 

 

Since its formation in 2013, the Interest Group on Agricultural Data (IGAD) has grown to become one of the RDA’s most prominent Thematic Groups. IGAD is a domain-oriented group working on all issues related to global agriculture data. It represents stakeholders in managing data for agricultural research and innovation, including producing, aggregating and consuming data. Beyond this, IGAD promotes good practices in research with regard to data sharing policies, data management plans and data interoperability, and it is a forum for sharing experience and providing visibility to research and work in agricultural data.

 

At RDA 9 in Barcelona, more than 100 data experts from more than 35 countries came together for the IGAD pre-meeting, where participants were briefed about ongoing activities of IGAD Working Groups (WGs) and put forward new ideas through proposals for future work. At the RDA 9 pre-meeting for IGAD there was strong interest in developing a WG for building synergies in Capacity Development, in the context of research data in agriculture and its potential contribution to the realization of the Sustainable Development Goals (SDGs).

 

2. Proposal

 

The Capacity Development Working Group aims to:

 

  1. Perform a landscape assessment to identify existing gaps and training requirements within the context of the IGAD network, including GODAN
    • Using existing needs assessment survey data from GODAN Action
    • Literature review of existing training capacity challenges and opportunities, taking into account language barriers
    • Focused around an identified IGAD research data community
  2. Present a report of the landscape assessment to RDA11 and circulate for expert comment.
  3. Develop a case study around an identified community to identify good practices for capacity development and advocacy.
  4. Build synergies with other initiatives, such as GODAN Action, to increase the coherence and impact of training delivery.

 

The working group aims to identify and mitigate the gaps in capacity development, as evidenced by the needs assessment of the different stakeholders in domains under agriculture. The WG also aims to map and list the active initiatives, organisations and agencies already involved in capacity development activities, and to build synergies between them and the evidenced needs of the stakeholders. To this end, we will identify possible stakeholders such as content providers, technology solution providers, the user community (practitioners, researchers, policy makers) and resource persons.

 

3. Value proposition

 

There is a recognised need for capacity development in the field of agricultural data management. A number of different organisations are attempting to address this; however, there does not appear to be a coordinated approach across all agricultural domains that can adequately identify training gaps and make recommendations. Language is a further obstacle to identifying and evaluating existing resources. This working group aims to address this gap by acting as a hub for coordinating the development and provision of training to agriculture practitioners, researchers and policy makers.

This WG’s outputs will be of benefit to a wide range of stakeholders, users and communities in the Agriculture and Nutrition domain.

Key impacts

  • Reduce effort and duplication in development of curricula
  • Support capacity development for individuals, communities, and initiatives that will benefit from utilising Agricultural data outputs
  • Advocate for collaborative efforts among agriculture and nutrition open data initiatives through knowledge sharing across capacity development activities
  • Advocate programs, good practices, and lessons learned that enable the use of open data. Promote capacity development and diversity of open data users for more effective accessibility, use and understanding of, and engagement with, open data.

 

4. Work plan

 

The deliverables for the Capacity Development WG will be:

 

  • Bring multidisciplinary inputs about best practices into the curriculum development process for Agriculture data with the objective to expand the societal impact of open agricultural research.
  • Explore ideas for closer collaboration between RDA/IGAD, the RDA/Geospatial IG, GODAN Capacity Development and CODATA, and for synergies to set up joint education and training activities on Open Data in food and agricultural sciences. We will work with RDA-CODATA Summer Schools, Teaching TDM for Education and skills development initiatives to build synergies. A particular focus will be on sharing knowledge about training programs and platforms for reducing digital divides so that researchers in developing countries can also benefit.

 

The outcomes for the Capacity Development WG will be:

 

  1. Report on the current landscape assessment.
  2. From the landscape assessment, identify and report, with interested stakeholders, existing capacity gaps, including language barriers.
  3. Develop a case study around the identified community to identify good practices for capacity development and advocacy
  4. Initiate ideas for joint training programs in agricultural data (both online and hands-on) in collaboration with interested organisations, for example GODAN, as well as other IGs in RDA (e.g. the Geospatial IG)

 

7. Milestones

 Deliverables and Outcomes

 Engagement

 

IGAD has spearheaded two important capacity development initiatives: two introductory online MOOCs on data management, principally aimed at learners in Latin America, and four Forums on Open Data and Open Science in Agriculture for Africa. The MOOCs offer an introduction to themes of data management using examples from the international sphere, with a particular focus on data management research in fisheries and nutrition. As well as teaching how to define data, the courses look at issues of data storage, sharing and long-term preservation, highlighting the different resources available to researchers in data management. IGAD leads the courses in conjunction with the FAO of the United Nations, Spain’s Polytechnic University of Valencia and the International Food Policy Research Institute (IFPRI).

 

The goal of the forums is to provide a platform for stakeholders in Africa to share knowledge on institutional and national initiatives aimed at enhancing the visibility, accessibility and usability of agricultural data and science, and the potential contribution to the realization of Africa’s sustainable development goals and the SDGs. We aim to reach out to GODAN Capacity Building activities to expand our synergies and work jointly for expanding capacity development in agriculture data globally.

 

Work Plan

A specific and detailed description of how the WG will operate includes:

 

  1. The final deliverables of the Capacity Development in Agriculture Data WG will consist of:
    1. A landscape assessment on the current state of play for agricultural research data with emphasis on a selected case study group
    2. Good practice guidelines for identifying and addressing capacity gaps, derived from the case study (including guidance on advocacy to support adoption)
    3. A capacity building resource hosted on the RDA website

 

  2. Milestones for the WG shall include:

 

  1. November 2017 - Submission of the WG Case Statement to RDA
  2. November 2017 (M1) - Acceptance and endorsement of the WG Case Statement by RDA?
  3. March 2018 (M5) - Identify an agricultural data community for the case study and complete an initial landscape assessment to identify the current state of play for the community
  4. March 2018 (M5) - Present a draft report on the current state of play for the case study community to IGAD @ RDA11
  5. April 2018 (M6) - Circulate report to IGAD community for expert feedback
  6. April 2019 (M18) - Report of good practices and lessons learned
  7. April 2019 (M18) - Handover of developed research data capacity development resource to the wider IGAD/RDA community
  • A description of how the WG plans to develop consensus, address conflicts, stay on track and within scope, and move forward during operation

The WG will hold all its discussions through the IGAD mailing lists to ensure openness and inclusivity in the capacity development training activities. This will ensure all viewpoints are discussed in an open and participatory way.

As contributors will be meeting regularly through the GoToMeeting facility and at various conferences, they will be in an ideal position to let the wider agricultural community know about the importance and the status of the work done within the WG.

 

Adoption Plan

An objective of the WG is to work towards agreement among key agriculture capacity building communities (GODAN, IGAD, CODATA, etc.) by April 2019 to expand synergies in training programs, using the guidelines developed as part of the deliverables.

 

Initial Membership

 

  • Muchiri Nyaggah (Local Development Research Institute, Kenya)
  • Karel Chavat (Czech Center for Science and Society, Czech Republic)
  • Antonio Sanchez-Padial (INIA, Spain)
  • Andreas Kamilaris (IRTA, Spain)
  • Elizabeth Zeitler (AAAS Science and Technology Policy Fellow, USA)
  • Ruthie Musker (GODAN, USA)
  • Imma Subirats (FAO of the UN, Italy)
  • Karna Wegner (FAO of the UN, Germany)
  • Patricia Rocha (Embrapa, Brazil)
  • Devika Madalli (Indian Statistical Institute, UK)
  • David Tarrant (ODI, UK)
  • Isaura Ramos Lopes (Netherlands)
  • Michael Ball (Biotechnology and Biological Sciences Research Council (BBSRC), UK)
  • Shaik N. Meera (Rice Knowledge Management Portal, India)
  • Carme Reverté Reverté (IRTA, Spain)
  • Sonigitu Ekpe (Nigeria)
  • Telemachos Koliopoulos (University of Strathclyde, Greece)
  • Salman Siddiqui (IWMI, Sri Lanka)
  • Sophie Fortuno (CIRAD, France)
  • Marie-Claude Deboin (CIRAD, France)
  • Zhang Xuefu (CAAS, China)
  • Richard Ostler (Rothamsted Research, UK)
  • Boniface Akuku (KALRO, Chair of the CODATA Task Group on Agriculture Data, Knowledge for Learning and Innovation, Kenya)

 

Initial leadership in English (tbc):

 

  1. Suchith Anand (GODAN, UK/India) - Coordination
  2. Chipo Msengezi (CTA, The Netherlands)
  3. Karna Wegner (FAO of the UN, Italy)

 

 

Review period start:
Wednesday, 3 January, 2018 to Saturday, 3 February, 2018

** Please Note - the following text has been deprecated in favor of the revised Charter Statement in the attached document - 7 June 2018 **

 

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

The Earth, space, and environmental science communities are developing, through multiple international efforts, both general and domain-specific leading practices for data management, infrastructure development, vocabularies, and common data/digital services. This Interest Group will work towards coordinating and harmonizing these efforts to reduce possible duplication, increase efficiency, share use cases, and promote partnerships and adoption in the community.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

This Interest Group builds on the BOF specific to EarthCube held at the RDA P10 meeting. Results from that session showed strong interest in international collaboration on Earth, space, and environmental infrastructure concerns that are currently being addressed in RDA across all the sciences and external to RDA across many geographically oriented efforts.

Key participating groups and their use cases include:

1. The Earth Science Information Partners (ESIP) is expanding to include communities outside of the United States.  Specifically the next area of growth is Australia.  ESIP is an independent forum used to address topics of interest to the Earth science data and technology community, such as data management, data citation and documentation. The work of the ESIP community is advanced through our collaboration areas, where participants contribute their expertise toward resolving common problems of the Earth science data and technology community.

2. The Australian AuScope program provides research infrastructure to the Earth and Geospatial Science research communities, with a focus on data discovery, delivery and interoperability and an increasing focus on FAIR data principles. AuScope has been developing related data and interoperability platforms for the last decade and is currently focusing on increased international collaboration. AuScope is federally funded by the National Collaborative Research Infrastructure Strategy (NCRIS) program.

3. The European Union-funded European Plate Observing System (EPOS) is in the midst of its implementation phase, with increasing interest in being interoperable with its international counterparts. EPOS is a long-term plan to facilitate integrated use of data, data products, and facilities from distributed research infrastructures for solid Earth science in Europe.

4. The American Geophysical Union (AGU) is leading an international community-driven effort, Enabling FAIR Data, to adopt existing and develop new leading practices so that data supporting a publication are preserved in an appropriate repository, with proper data citations placed in the references. Data will no longer be in the supplement of the paper. This effort seeks to coordinate across similar and supporting efforts. Partners include RDA, ESIP, ANDS, NCI, AuScope, Nature, Science, PNAS, and the Center for Open Science.

5. The Environmental Data Initiative (EDI) is an NSF-funded project meant to accelerate the curation and archiving of environmental data, emphasizing data from projects funded by the NSF DEB. Programs served include Long Term Research in Environmental Biology (LTREB), Organization for Biological Field Stations (OBFS), Macrosystems Biology (MSB), and Long Term Ecological Research (LTER).

6.  The U. S. National Science Foundation’s EarthCube effort is moving into implementation phase and reaching out internationally to be informed and coordinate efforts. EarthCube, initiated by the National Science Foundation (NSF) in 2011, transforms geoscience research by developing cyberinfrastructure to improve access, sharing, visualization, and analysis of all forms of geosciences data and related resources.  

7. The Open Geospatial Consortium has a series of Domain Working Groups including Earth Systems, Geoscience, Hydrology and Marine. These Domain Working Groups provide a forum for discussion of key interoperability requirements and issues, discussion and review of implementation specifications, and presentations on key technology areas relevant to solving geospatial interoperability issues. However, their supporters come mainly from the government and industry sectors, and it is hoped that linking through this interest group will bring greater connectivity to the equivalent activities in the academic/research sector.

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

This Interest Group will focus on awareness and, where applicable, coordination of dependent efforts across the international Earth, space, and environmental science communities.

 

This group is different from other current activities as it is focused on the scientific domains specific to the Earth, space, and environmental sciences.  Some overlap potentially exists with interdisciplinary work in the biological community and social science community.

 

Within RDA, we will leverage the existing working groups and interest groups to help integrate and coordinate the needs of the Earth, space, and environmental sciences with the objectives and deliverables across those related efforts. 

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

External (to RDA) Communities Include:

EarthCube

Council for Data Facilities (part of EarthCube, specific to digital repositories)

iSamples

AuScope

European Plate Observing System (EPOS)

OGC

Enabling FAIR Project (AGU)

Earth Science Information Partners (co-manage the group)

ICSU CODATA (e.g. Science Unions)

IGSN e.V. (International Geo Sample Number)

Environmental Data Initiative

 

RDA Communities:

RDA Working Groups:

Data Citation WG

Data Usage Metrics WG

DMP Common Standards WG

Metadata Standards Catalog WG

Persistent Identification of Instruments 

RDA/WDS Publishing Data Workflows WG

RDA/WDS Scholarly Link Exchange (Scholix) WG

Research Data Repository Interoperability WG

 

RDA Interest Groups:

Active Data Management Plans IG

Big Data IG

Chemistry Research Data IG

Data Discovery Paradigms IG

Data Foundation and Terminology IG

Data Policy Standardization and Implementation

Disciplinary Collaboration Framework 

Domain Repositories Interest Group

From Observational Data to Information

Geospatial IG

Global Water Information IG

Long tail of research data IG

Mapping the Landscape

Marine Data Harmonization

Physical Samples and Collections in the Research Data Ecosystem

PID IG

Quality of Urban Life IG

RDA/CODATA Legal Interoperability IG

RDA/WDS Certification of Digital Repositories

RDA/WDS Publishing Data IG

Reproducibility IG

Software Source Code IG

Virtual Research Environment IG

Vocabulary Services IG

Weather, Climate and Air Quality IG

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

Improved awareness, sharing, leveraging synergies, and coordination on projects that have dependencies, overlap, or multiple stakeholder communities.

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

 

The group will meet quarterly, with two of the four yearly meetings held at RDA Plenaries. Between plenaries, the IG will maintain momentum with face-to-face meetings at domain events: AGU, EGU, ESIP Winter, ESIP Summer, and EarthCube All Hands. Virtual task force and planning meetings will be held as objectives require.

 

Timeline (Describe draft milestones and goals for the first 12 months):

Milestones and Goals

1.  Communicate with Earth, space, and environmental science communities not yet part of the Interest Group; invite them to join.

2. Establish mechanism to share project information and assess any dependencies, overlaps, and gaps. This needs to include allowing multiple efforts that are similar to proceed.

3. Co-chairs will reach out to RDA Working Groups and Interest Groups to determine potential collaboration.

 

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):

 

FIRST NAME    LAST NAME     TITLE
Erin          Robinson      Proposed Chair
Lesley        Wyborn        Proposed Chair
Danie         Kinkade       Proposed Chair
Shelley       Stall         Proposed Chair
Helen         Glaves        Proposed Chair
Mohan         Ramamurthy
Simon         Cox
Brooks        Hanson
Lynn          Yarmey
Kerstin       Lehnert
Simon         Hodson
Tim           Rawling
Roger         Proctor
Siddeswara    Guru
Kirsten       Elger
Sylvain       Grellet
Scott         Simmons
Greg          Buehler
George        Percivall
Denise        McKenzie
Ben           Evans
Andrew        Treloar
Corrina       Gries

Ari Asmi  
Jens Klump  
Mark  Parsons  
Martina Stockhause  
Steve Diggs  
Tobias Weigel  
Ben Evans  
Jingbo Wang  
Clare Richards  
Ted Haberman  
Jane Wynngaard  
Ruth Duerr  
Denise Hills  
Sophie Hou  


 

 

 

Review period start:
Thursday, 28 December, 2017 to Sunday, 28 January, 2018

Revised version (26/6/2018)

 

Introduction

The Research Data Architectures in Research Institutions Interest Group is primarily concerned with technical architectures for managing research data within universities and other multi-disciplinary research institutions. It provides insight into the approaches being taken to the development and operation of such architectures and their success or otherwise in enabling good practice.

 

An institution’s research data management infrastructure consists of more than just a data repository and discovery mechanisms. It includes the underlying storage technologies, the networking, hardware, system interfaces, authentication mechanisms, data brokers, monitoring platforms, semantic interoperability tools, long-term preservation services, high-performance and high-throughput computing facilities, data science platforms, and potentially many other technologies that process data and control the flows of data and metadata between systems.

 

Seamless data interoperability and movement between different systems both local and in national or disciplinary services is a particular challenge at present, given the need to provide researchers with a smooth and efficient user experience – a key requirement for any research data service. Governance and policies, project management environments, and communications platforms are also vital elements in shaping and informing IT architectures, as is the management of business information associated with research.

 

This IG seeks to understand the various architectures used by institutions globally, identify pain points within those architectures, and learn from those who have overcome or avoided those pain points.
The general approach of this IG is to encourage discussion about architectures and enable interested parties to collaborate and learn from one another. Many institutions are at present planning and working towards overarching data management architectures, and there is a legitimate concern that, without a forum such as this IG, institutions will relive the same experiences and repeat the same mistakes as their peers.

 

User scenario(s) or use case(s) the IG wishes to address

The Research Data Architectures Interest Group addresses the following user scenarios:

  • Knowledge support for research data infrastructure architects, and project and service managers
  • Knowledge sharing regarding enterprise architecture best practices at multi-disciplinary research institutions
  • Development of a greater understanding of objectives and goals between different stakeholders at research institutions, including management, infrastructure developers, research data engineers, IT systems architects, and technology vendors.

 

Objectives

The main themes of the IG are:

  • Exploring how diverse tools, technologies, and services can be integrated to meet the evolving needs of researchers in research institutions.
  • Considering interoperability between institutional research data infrastructures and (inter)national or discipline-based infrastructures
  • Understanding the different institutional approaches to governance structures and business processes in responding to research ICT demands (e.g. capacity planning/forecasting for storage)
  • Sharing case studies of solutions developed by data infrastructure projects in research institutions
  • Presenting technical innovations and ideas that can further the development of integrated research data infrastructures
  • Agreeing best practice relating to research data architectures in research institutions

This group has connections with some other RDA groups, such as:

  • National Data Services IG – research institutions frequently support, use or deliver services and solutions provided by national data service vendors or operators, and these therefore form part of an institution’s architecture
  • Repository Platforms for Research Data IG and Research Data Repository Interoperability WG – repositories are an essential component of research data architectures, but need to be considered as one element of a larger infrastructure
  • Data Fabric IG – technology and interoperability solutions for research institutions are an essential part of the bigger picture
  • Storage Service Definitions WG (formerly “WG QoS-DataLC Definitions”) – storage definitions, vocabularies etc. are important for achieving mutual understanding of research data architectures

The IG differs from these RDA IGs and WGs in the following areas:

  • The scope of the group is institutional-level solutions and architectures rather than national or discipline-specific architectures, although it will be important to consider the relationships between these different levels
  • The Group will discuss technological solutions at research institutions at the enterprise architecture level, not focus on a single element in isolation

 

Participation

The target audience of this IG includes anyone involved in research data infrastructure planning projects or services, as well as researchers with an interest in systems, technologies and data flows at the institutional level. This includes research data infrastructure project managers, ICT architects, senior managers with responsibility for research IT services, developers, data managers, data engineers, and data scientists. Representatives of service or technology vendors and the data industry are also welcome.

 

Outcomes

  • Outcomes of the IG:
    • Shared knowledge about research data tools, infrastructures and architectures
    • Knowledge base for best practices and lessons learned log
  • More specific outputs will be determined by the participants in this IG, but might conceivably include:
    • Shared repository of architectural diagrams
    • List of technologies used for specific purposes, and their interfaces
    • Landscape report and gap analysis

 

Mechanism

The Interest Group will hold regular meetings at RDA plenaries (if accepted in the schedule). Between plenaries the IG will collect case studies and examples of good practice and add these to the open knowledge base. The group will hold one or two web meetings between plenaries (more if needed).

 

Timeline

P10: BoF and start of the review process
P11: Starting the IG; Discussion about tools, best practices, use cases and knowledge base
P12: Templates for the best practices reports
P13: First collection of the knowledge base

 

Review period start:
Thursday, 28 December, 2017 to Sunday, 28 January, 2018

Please note - this text has been revised as part of the group review process. See the attached document for the latest case statement.  4 April 2018

 

Data Usage Metrics WG Case Statement

Research data is increasingly recognized as an important output of scholarly research; however, there are not yet standardized or comprehensive metrics for research data as there are for articles. Data citations are necessary for the advancement of research data recognition, and initiatives in the RDA and scholarly communications space (e.g., the RDA/WDS Scholarly Link Exchange WG) are working to gain community adoption. Complementary to citations are data usage metrics (views, downloads). Neither usage metrics nor data citations are yet counted and aggregated into clear metrics, as is done for articles.
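
As a purely illustrative sketch of what counting and aggregating usage might look like, the Python snippet below tallies views and downloads per dataset. It does not represent any recommendation of this WG: the `UsageEvent` record, the `aggregate_usage` function and the example identifiers are all hypothetical, and a real standard would also need rules this sketch ignores (for example, how to treat automated traffic or repeated requests).

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class UsageEvent(NamedTuple):
    """One recorded interaction with a dataset (hypothetical record shape)."""
    dataset_id: str  # assumed to be a persistent identifier such as a DOI
    kind: str        # "view" or "download"


def aggregate_usage(events: Iterable[UsageEvent]) -> Dict[str, Counter]:
    """Count events per dataset and per kind of usage."""
    totals: Dict[str, Counter] = {}
    for event in events:
        # Create the per-dataset counter on first sight, then increment it.
        totals.setdefault(event.dataset_id, Counter())[event.kind] += 1
    return totals


# Made-up events for two placeholder datasets.
events = [
    UsageEvent("doi:10.0000/example-a", "view"),
    UsageEvent("doi:10.0000/example-a", "view"),
    UsageEvent("doi:10.0000/example-a", "download"),
    UsageEvent("doi:10.0000/example-b", "download"),
]

summary = aggregate_usage(events)
for dataset_id, counts in sorted(summary.items()):
    print(dataset_id, dict(counts))
```

Keeping views and downloads as separate counts, rather than summing them, leaves the choice of how to weight or report each kind of usage to whatever standard the community agrees on.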

 

The “Make Data Count” project hosted a Birds of a Feather session at RDA Plenary 10 (attended by 60 people) and recognized that many other groups are impacted by this initiative and that there is widespread interest in implementing a standardized set of data usage metrics. Drawing on expertise from various projects and research stakeholders, this WG, a part of the Publishing Data IG, aims to harness community input and buy-in for data usage metrics and drive widespread adoption.

 

The WG intends to be international by design. The WG recognizes that for data usage metrics to be adopted at a global scale, representatives from all global regions need to help shape the WG outcomes. Therefore, the WG will actively recruit participants from all regions globally, including international initiatives (e.g., ANDS), and will direct outreach globally through webinars, related conferences, and consistent communication.

 

Value Proposition

Adoption of Data Usage metrics is necessary for the recognition of research data as a first-class research output. Researchers will benefit from data usage metrics by being able to see the value-add of opening up their work, funders will be able to track the impact of their funding, publishers will see the relation and usage of articles with the underlying data, and repositories will be able to better serve their research communities.

 

Engagement with existing work in the area

This WG has members from the Make Data Count project, a Sloan-funded project of the California Digital Library, DataCite, and DataONE to build the technical infrastructure for data-level metrics. For data usage metrics to work, the WG will need to leverage existing initiatives (CrossRef Event Data, DataCite) and work closely with the RDA/WDS Scholarly Link Exchange WG. As part of the Publishing Data IG, the WG will work closely with the repository, publisher, and library communities to ensure data usage metrics have widespread community input and buy-in.

 

Work Plan

 

Goals and Deliverables (18 months):

  1. Community consensus on use cases and priorities for data usage metrics

  2. Development of a standardized recommendation for counting research data usage metrics and citations

  3. Community feedback and testing of data usage metrics in repositories

  4. Global repository adoption of standardized data usage metrics

  5. Researcher education around utilizing data usage metrics

  6. Collection of needs from research stakeholders for future iterations of data metrics

 

Milestones & Timeline:

  • RDA Plenary 11, Spring 2018

    • Share goals and deliverables and receive community input

    • Conversation on Goal 1: community consensus on use cases and priorities for data usage metrics

    • Share out of Goal 2: standardized recommendation for counting research data usage metrics and citations

  • Summer, 2018

    • Goal 3: Community feedback and testing of data usage metrics in repositories

    • Milestone: Adoption in a couple of repositories

  • RDA Plenary 12, Fall 2018

    • Goal 4: Global repository adoption

    • Goal 5: Development of materials and outreach for awareness of data usage metrics

    • Intermediate deliverable: Development of use cases and progress report on adoption in repositories

  • Winter, 2019

    • Goal 4: Global repository adoption

    • Intermediate deliverable: Share out to community of progress in adoption of data usage metrics

    • Intermediate deliverable: Iteration of recommendation around data usage metrics

  • RDA Plenary 13, Spring 2019

    • Goal 6: Collection of needs from research stakeholders for future of data usage metrics

    • Report out to community on next steps and needs for data usage metrics as well as progress in 18 months of developing standards and driving adoption of data usage metrics

 

The WG plans to meet at each RDA Plenary for the next 18 months and to hold monthly web meetings (in months without an RDA Plenary). The WG would maintain consistent communication (via email listserv) throughout the 18 months, with updates on deliverables as well as ongoing work in the data metrics space.

 

While there may be disagreements or difficulty building consensus in an area that does not yet have standards, the WG would be keen to involve as many members from the various communities involved in research data as possible (to illustrate the full landscape) and would work methodically to ensure the community is in agreement about each piece of the deliverables. Standards would be developed with community input. WG Plenary sessions would not be a presentation but rather a working meeting to use the time as a consensus building and feedback session to ensure deliverables are in line with community needs.

 

The RDA community is large and diverse, but this WG would look beyond the WG to ensure researchers (end users) are involved and supportive, and to drive adoption in as many repositories and institutions as possible. Utilizing connections from WG members and displaying successful implementation in WG member repositories, the WG would be focused on mass adoption and community buy-in.

 

Adoption Plan

Data usage metrics would be implemented within repository organizations that are involved in the WG and further pushed out to the larger repository community. Initially, data usage metrics will be implemented on the Dash platform (California Digital Library), DataONE repositories, and Mendeley Data (Elsevier) for early adoption. Implementation in these three spaces will show the broad range of ways to implement the metrics, and documentation about each will aid mass adoption. Institutions in the WG will be able to implement data usage metrics for institutional repositories, and publishers will be able to display data usage metrics in relation to the corresponding article-level metrics. Funders involved in the WG will be able to utilize these data usage metrics and push for grantees to deposit in repositories with these metrics (and to report back on them).

 

Review period start:
Wednesday, 27 December, 2017
Custom text:
Body:

At the GEO-XIV Plenary, GEO Secretariat Director Barbara Ryan stated that the "link from data provider research infrastructures to users will become more important" [1]. This statement opens several interesting questions. Who are the users? What data are provided by research infrastructures (RIs)? What kind of links are there between data provider RIs and users? How do RIs provide the data needed by users? Do users produce data? Are such derived data acquired by RIs, further curated, published, processed and used?

 

Users are of different kinds, ranging from individual researchers and research communities to industry, decision makers and the general public. Data are of different kinds as well, ranging from primary observational, experimental, or computational data to data derived in numerous activities performed by users along contextualized value chains. Data provided by RIs can be primary or derived. While users consume data provided by RIs, many groups of users surely also produce data. Such derived data should be acquired by infrastructures.

 

In this complex landscape, there is an important constant across infrastructures and domains that lies at the core of the OD2I IG. Along value chains, primary data are interpreted for their meaning in determinate contexts of scientific, industrial, or broadly societal relevance. Within the context of a particular value chain, primary data are uninterpreted. In contrast, meaningful data resulting from data interpretation are information, interpreted data. Primary data thus evolve to become contextually meaningful information further used for both scientific and nonscientific purposes.

 

With a primary (but not exclusive) focus on observational data and environmental research infrastructures, the OD2I IG studies this constant. Building on collected use cases and existing conceptual frameworks, the OD2I IG advances understanding of how observational data evolve to information, ultimately integrated into bodies of knowledge about natural and human worlds.

 

In other words, the OD2I IG studies and advances understanding of the relatively unexplored interface between users and infrastructures in the data use phase of the research data lifecycle. It studies and models the roles of, and interactions among, human and computer agents; the data and information consumed and produced by agents in this phase; the performed activities and the systems supporting their execution.

 

The notion that primary data evolve to information (and knowledge) is increasingly common. Research infrastructures emphasize there is knowledge to gain through observation [2]. Earth observation satellites "provide critical information for global food security" [3]. The European Open Science Cloud (EOSC) is envisioned as an environment that enables turning ever increasing amounts of data "into knowledge as renewable, sustainable fuel for innovation in turn to meet global challenges" [4]. At the 2016 Fall Meeting of the American Geophysical Union (AGU), Rebecca Moore (Google) envisioned the possibility of monitoring a changing planet and "generating precise, actionable information and knowledge".

 

Following a proposal by members of the OD2I IG to adopt the Floridi framework [5, 6] with the notion of data interpretation borrowed from Aamodt and Nygård [7], the OD2I IG considers technical aspects of (semantic) information representation in systems and the management of explicit and formal semantics. Connecting data to users relies on systems capable of acquiring, curating and processing the meaning of data generated by users along value chains. A critical aspect is the mechanism for representing information in ways suitable both for machine-to-machine interaction and for presentation to and use by users. Since the ability to exploit any given information demands specific knowledge on the part of the user, presentations need to consider both user type and intended purpose. Data provide a basis for building information that will lead to decision making. The transition from data to information involves processes of interpretation, in which meaning is attached to data. It is information, and its use against a background of prior knowledge, that provides sufficient understanding to allow consequences of decisions to be foreseen.

 

The OD2I IG aligns with the mission and vision of RDA through its specific concern with socio-technical support for the extraction of information from primary observational data, activities that are primarily carried out by research communities as they make use of data in their everyday work. The OD2I IG will add value by working to realize information and knowledge-based systems layered above the current data systems, resulting in improved usability of data as information by both humans and machines. A specific emphasis is on enabling machines to process information automatically. The OD2I IG is committed to making a difference in this regard.

 

Please see the attached From Observational Data to Information Charter Statement for additional information on the Interest Group plans.

 

Review period start:
Wednesday, 27 December, 2017 to Saturday, 27 January, 2018
Custom text:
Body:

The Persistent Identification of Instruments RDA Working Group (PIDINST WG) seeks to propose a community-driven solution for globally unique and unambiguous identification of instrument instances that are operational in the sciences.

 

Please see https://www.rd-alliance.org/sites/default/files/case_statement/rda-wg-pi... for the full case statement for the PIDINST WG.

 

In her recent book, entitled “Big Data, Little Data, No Data” [1], Christine Borgman writes “To interpret a digital dataset, much must be known about the hardware used to generate the data, whether sensor networks or laboratory machines.” Borgman further highlights that “When questions arise [...] about calibration [...], they sometimes have to locate the departed student or postdoctoral fellow most closely involved.” This is a striking account of the role that information about instruments plays in science and of the costs of not being able to find and access such information.

 

The need to uniquely identify an instrument instance is rapidly growing in many research communities. Indeed, persistent identifiers enable unambiguous reference to digital representations of instruments, which has many potential benefits:

  • Metrics that quantify the use of instruments and the rationale for future funding
  • Link data to the instruments that generated them (provenance), improving the interpretation and validity of data
  • Aid equipment logistics and mission planning
  • Facilitate interoperability and open data sharing, especially in advancing technologies that foster sharing of instruments
  • Improve the discoverability and visibility of instruments and their data, published on the web.

 

Currently, there is no universal way to identify instrument instances. As the primary outcome, PIDINST WG contributes to establishing a cross-discipline, operational solution for the unique and lasting identification of active and decommissioned instruments. This case statement outlines the work planned for PIDINST WG.

 

Issues to be addressed

  • Instruments as physical entities - What is an instrument? Implications of identifying the instrument instance as a physical object versus identifying a digital information object (metadata) about the instrument. What do instruments produce, their real-world configurations, their relations to platforms and deployments, and the implications of instrument modifications to identification (new versions).
  • Granularity - Instruments can be parts of other (compound) instruments. For example, instruments can be manufactured with multiple bespoke sensor components, such as modular weather stations that simultaneously measure multiple meteorological variables. The granularity at which to reference and describe instrument instances (compound versus component) can vary for different stakeholders. How can these types of instruments be described in a generic way?
  • Use cases - Support the analysis of community requirements and inform the work carried out by PIDINST WG.
  • Metadata - Explore the types and sources of metadata that could be resolved under a PID and the difference between metadata registered with a PID infrastructure provider (e.g. DataCite, ePIC, Crossref) and metadata held by an institutional instrument database provider. Develop a minimum common metadata schema for the registration of instruments with PID infrastructure providers.
  • Machine readability, interoperability, and provenance - Investigate the need and the requirements involved to make metadata (at the institutional level) machine readable and compatible with existing interoperable technologies. Provenance, in particular the relation between data and instruments that generated them, is another aspect to be addressed.
  • Landscaping - Explore the links, potential relationships and overlaps with instrument manufacturers, institutional instrument database providers, RDA groups and PID infrastructure providers.
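
The granularity issue above can be illustrated with a minimal sketch in Python. This is only an illustration under stated assumptions, not a schema proposed by the WG: the record fields (`pid`, `name`, `components`) and the example identifiers are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instrument:
    """Hypothetical minimal record for an instrument instance."""
    pid: str   # placeholder identifier, not a real registered PID
    name: str
    # Component instruments, each of which may carry its own identifier.
    components: List["Instrument"] = field(default_factory=list)

    def all_pids(self) -> List[str]:
        """Return this instrument's PID followed by the PIDs of all nested components."""
        pids = [self.pid]
        for component in self.components:
            pids.extend(component.all_pids())
        return pids


# A compound instrument (weather station) with two sensor components.
station = Instrument(
    pid="example:station-1",
    name="Modular weather station",
    components=[
        Instrument(pid="example:thermometer-1", name="Air temperature sensor"),
        Instrument(pid="example:anemometer-1", name="Wind speed sensor"),
    ],
)
```

In this sketch a stakeholder interested only in the compound instrument can reference `station.pid`, while one interested in individual sensors can walk `station.all_pids()`; whether components should receive their own PIDs at all is exactly the open question listed above.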

 

Outcomes

The work of the PIDINST WG will contribute to the following outcomes. Note that these are long-term outcomes this WG aims at contributing to. This WG will not build a sustainable infrastructure for the persistent identification of instruments. It will merely contribute to specifying such infrastructure. The concrete deliverables of this WG are presented in the Work Plan.

  • A sustainable infrastructure will support the registration of instrument instances by submitting metadata about them and allowing for minting an instrument instance PID. The PID must follow agreed standards for persistent identifiers, e.g. long-lasting, actionable, descriptive digital identifiers.
  • Improved understanding within research communities of how to describe instrument instances, including relations to other entities such as instrument model (type) or instrument deployment, the issue of identifying physical objects versus digital representations, and other related issues.
  • Collaborations with one or more PID infrastructure providers interested in implementing the approach to persistent identification of instruments proposed by the PIDINST WG.
  • Strong linkages to the activities of the RDA PID IG and other related RDA groups.

 

References

[1] Borgman, C.L. (2015). Big Data, Little Data, No Data. MIT Press.

 


 

Review period start:
Wednesday, 27 December, 2017 to Saturday, 27 January, 2018
Custom text:
Body:

Version 0.9, 9/17/2017

 

Name of Proposed Interest Group:   Physical Samples and Collections in the Research Data Ecosystem

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

Physical samples are a basic element for reference, study, and experimentation in research. Tests and analyses are conducted directly on samples, such as biological specimens, rock or mineral specimens, soil or sediment cores, plants and seeds, water quality samples, archeological artefacts, or DNA and human tissue samples, because they represent a wider population or a larger context. Other physical objects, such as maps or analog images, are also direct objects of study and, if digitized, may become a source of digital data. There is an urgent need to better integrate these physical objects into the digital research data ecosystem, in both a global and an interdisciplinary context, to support search, retrieval, analysis, reuse, preservation and scientific reproducibility. This group aims to facilitate cross-domain exchange and convergence on key issues related to the digital representation of physical samples and collections. These issues include, but are not limited to: the use of globally unique and persistent identifiers for samples to support unambiguous citation and linking of information in distributed data systems and with publications; metadata standards for documenting samples and collections and for landing pages; access policies; and best practices for sample and collection catalogs, covering a broad range of issues from interoperability to persistence.

A growing community of stakeholders, comprising domain scientists, collection curators, information scientists, data managers, all working at the interface with computational science, are developing detailed practices and standards around identifiers, vocabularies, and software interfaces, which are necessary for wider community application. Publishers and funders represent additional stakeholders interested in best practices for sample citation and registration of sample metadata in online catalogs that are fundamental for reproducibility of sample-based data and future use of valuable collection specimens. Currently, these efforts are fragmented, as is the communication of technical solutions and organizational best practices. This IG will support cross-disciplinary and international dialog helping to build technical and social bridges among a broad range of stakeholders to align and coordinate ongoing efforts, strengthen solutions, and broaden their adoption.

Birds of a Feather (BoF) sessions held at RDA Plenary 4 and Plenary 6 have already gathered an international and multi-disciplinary group of stakeholders. A preliminary case statement was reviewed by participants in the P6 BoF and informed the current version.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

Best practices, standards, and infrastructure are needed to properly link physical samples and collections to digital data generated by their study or to features in the real world. Samples need to be cited with globally unique, persistent, and resolvable identifiers in publications to ensure that they can be unambiguously linked to online metadata profiles (landing pages) and to other data generated by other studies of the same sample. Scientists want to search for data for a given sample across the entire literature. This can now be achieved because sample PIDs can be included as related identifiers in the metadata of publication DOIs or data DOIs, where they can be harvested and searched through systems like SCHOLIX. Scientists also want to find out where a given sample can be accessed to reproduce data or add new measurements to the available knowledge about a sample. Both the approaches to, and the maturity of, technical and organizational solutions and infrastructure differ across the many disciplines that work with physical samples. Diverse and uncoordinated practices make it difficult to advance the adoption of best practices that link physical samples to the digital research data ecosystem. Further, commercial software providers for museum and collection catalogs and publishers are reluctant to implement best practices if they are different and incompatible across domains.
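
The search scenario described above (finding all publications and datasets that refer to a given sample) can be sketched in Python. This is a minimal illustration under stated assumptions: the records, DOIs, and sample identifiers are invented placeholders, and the field names are chosen for illustration only rather than taken from any particular registry's API.

```python
from typing import Dict, List

# Hypothetical, simplified metadata records. The DOIs and sample identifiers
# are placeholders, and the field names are assumptions for illustration.
RECORDS: List[Dict] = [
    {
        "doi": "10.0000/example.article.1",
        "relatedIdentifiers": [
            {"relatedIdentifier": "SAMPLE-0001", "relatedIdentifierType": "IGSN"},
        ],
    },
    {
        "doi": "10.0000/example.dataset.2",
        "relatedIdentifiers": [
            {"relatedIdentifier": "SAMPLE-0001", "relatedIdentifierType": "IGSN"},
            {"relatedIdentifier": "SAMPLE-0002", "relatedIdentifierType": "IGSN"},
        ],
    },
    {"doi": "10.0000/example.dataset.3", "relatedIdentifiers": []},
]


def find_dois_for_sample(records: List[Dict], sample_id: str) -> List[str]:
    """Return the DOIs of all records that list sample_id as a related identifier."""
    return [
        record["doi"]
        for record in records
        if any(
            related["relatedIdentifier"] == sample_id
            for related in record.get("relatedIdentifiers", [])
        )
    ]
```

Calling `find_dois_for_sample(RECORDS, "SAMPLE-0001")` returns the article and the dataset that both refer to the same sample, which is the cross-literature lookup that consistent sample citation would make possible.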

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

RDA is a multi-disciplinary and international community engaged in research data management that offers a unique opportunity for the goals of this IG. The objectives of this IG are:

  1. Identify commonalities and diversities across the stakeholders and establish prioritized action items that are appropriate for Working Groups. Relevant issues are: unique sample identifiers; sample documentation including vocabularies and taxonomies and alignment with international metadata standards; sample registration and interoperability of digital online catalogs; policies for sample citation in publications; and access to samples and sample metadata.
  2. Identify and characterize existing systems and solutions relevant to linking physical samples with digital research data; identify gaps and challenges.
  3. Facilitate international cooperation to develop harmonized approaches and best practices for physical object identification and digital curation; enable the facilitation of object and sample identification infrastructure both at the national and international levels.
  4. Build linkages between object repositories and museums, digital data repositories, scientific publications, museum software providers, and science communities.
     

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

Communities that will be involved in this IG range from museum and collection curators, to research data managers, researchers in domain sciences, information sciences, and computer sciences, to publishers and funders. Several workshops over the last few years have brought together stakeholders interested in the topic of Physical Samples in the Digital Research Ecosystem, including:

  • Linking Environmental Data and Samples, CSIRO, Australia, May 2017
  • Physical Samples and Digital Collections, iConference, China, March 2017
  • Physical Samples, Digital Collections, ASIS&T Conference, Denmark, October 2016

In the previous two BoF sessions at RDA P4 and P6 the following communities were represented:

  • Biodiversity
  • Oceanography
  • French science archive
  • Australian meteorology - water quality sampling and provenance
  • German Research Center, library
  • European PID network
  • Geological Society
  • Kew Gardens
  • CDL - neurobiology, Berkeley museum
  • Agricultural research in Italy, soil samples
  • Zoology and environmental science
  • Provenance and workflows, biodiversity workflows
  • Material science, Air Force
  • Ethnography
  • Natural History Museums
  • National Repositories

 

We will facilitate workshops at ASIS&T, JCDL, SPNHC, and domain-specific conferences such as AGU for the Earth and Space Sciences to broaden participation and dissemination of outcomes.

We will work with the following organizations to engage relevant communities.

  1. International Geo Sample Number IGSN e.V. (Kerstin Lehnert, Jens Klump; http://www.igsn.org) - global implementation organization for unique sample identifiers, with members on 5 continents
  2. Global Biodiversity Information Facility (Donald Hobern)
  3. Taxonomic Data Working Group (John Wieczorek, http://www.tdwg.org)
  4. DISSCO (Distributed System of Scientific Collections, http://dissco.eu)
  5. AuScope (Lesley Wyborn, http://www.auscope.org.au/)
  6. EPOS (Kirsten Elger, https://www.epos-ip.org/)
  7. SPNHC (Society For The Preservation of Natural History Collections, http://www.spnhc.org)
  8. DataCite (https://www.datacite.org)
  9. CODATA Task Group on Coordinating Data Standards amongst Scientific Unions (Marshall Ma, http://www.codata.org/task-groups/coordinating-data-standards)
  10. ESIP (Earth Science Information Partners) (Erin Robinson, http://www.esipfed.org)
  11. Scientific Collections International (SciColl, http://scicoll.org)

 

Related RDA groups

  • TAB
  • WG/IG Chairs
  • Biodiversity Data Integration IG
  • Long Tail of Research Data IG
  • PID IG
  • Research Data Provenance
  • RDA / TDWG Metadata Standards for attribution of physical and digital collections stewardship

 

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

  1. A report that synthesizes existing best practices for digital curation and sharing of physical samples from disparate disciplines and institutions.
  2. A journal special volume on sample and collection management in the research data ecosystem (journal TBD).
  3. Creation of RDA Working Groups to develop recommendations for best practices and standards related to sample unique identifiers, sample metadata, and sample citation, such that they can be linked with data and publications derived from them.
  4. Joint sessions with other RDA groups such as Biodiversity Data Integration IG, Long Tail of Research Data IG, PID IG, Research Data Provenance, and others as appropriate for knowledge exchange, to align with emerging relevant standards, and to promote recommendations from the IG.
  5. Facilitation of collaborations that advance interoperability between collection catalogs, sample registries, data repositories, and publications for improved data sharing across disparate disciplines, through e.g., alignment of sample metadata with existing metadata standards.

 

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

  • The primary mechanism for communication will be email, with periodic (quarterly) web conferencing meetings, plus sessions at plenary meetings.
  • We will also leverage other meetings such as EGU, AGU, SciDataCon, and ESIP.
  • Knowledge gathering and capture will be via the RDA IG web site. We may use other collaboration tools as appropriate, e.g. wikis, GitHub, or the Center for Open Science's Open Science Framework.

 

Timeline (Describe draft milestones and goals for the first 12 months):

    

September 2017 - P10 session: BoF session, presentation of Case Statement

December 2017 - AGU meeting and progress report

March 2018 - P11 session, evaluate progress, revisit workplan

September 2018 - P12

 

 

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):  Bold indicates co-chairs

 

FIRST NAME | LAST NAME | EMAIL
Kerstin | Lehnert (co-chair) | lehnert@ldeo.columbia.edu
Lesley | Wyborn (co-chair) | lesley.wyborn@anu.edu.au
Jens | Klump (co-chair) | Jens.Klump@csiro.au
Simon | Cox (co-chair) | Simon.Cox@csiro.au
Helen | Glaves | hmg@bgs.ac.uk
Rowena | Davis | rowenaidavis@email.arizona.edu
Markus | Stocker | mstocker@marum.de
Lindsay | Powers | lpowers@usgs.gov
Christopher | Lenhardt | clenhardt@renci.org
Denise | Hills | dhills@gsa.state.al.us
Dirk | Fleischer | dfleischer@kms.uni-kiel.de
Kirsten | Elger | kelger@gfz-potsdam.de
Wim | Hugo | wim@saeon.ac.za
Colleen | Strawhacker | colleen.strawhacker@colorado.edu
Sarah | Ramdeen | ramdeen@email.unc.edu
John | Wieczorek | tuco@berkeley.edu
Leslie | Hsu | lhsu@usgs.gov
Donald | Hobern | dhobern@gbif.org
Nicky | Nicolson | n.nicolson@kew.org
Unmil | Karadkar | unmil@ischool.utexas.edu
Ashlee | Dere | adere@unomaha.edu
Nicholas | Car | Nicholas.Car@ga.gov.au
Anusuriya | Devaraju | anusuriya.devaraju@csiro.au
Sean | Toczko | sean.jamstec@gmail.com
Lynne | Yarmey | yarmel@rpi.edu
Dawn | Wright | DWright@esri.com
Marshall | Ma | max@uidaho.edu

 

The following are additional potential participants who attended the previous BoF sessions at P4 and P6: 

 

FIRST NAME | LAST NAME | EMAIL | Institution
Aaron | ADDISON | | Washington University in St Louis
Arturo | ARIÑO PLANA | artarip@unav.es | University of Navarra
Toshihiro | ASHINO | ashino@acm.org | Toyo University
Sven | BINGERT | sven.bingert@gwdg.de | GWDG
Daphne | DUIN | daphne.duin@naturalis.nl | Naturalis Biodiversity Center
Ian | FORE | | National Council Institute (NIH)
Kazu | FUKUDA | | JAMSTEC
Bryon | FOSTER | Bryon.Foster@us.af.mil | USAF/AFRL
Margaret | FOTLAND | m.l.fotland@admin.uio.no | University of Oslo
Jason | JACKSON | | Indiana University; Mathers Museum of World Cultures
John | KRATZ | John.Kratz@ucop.edu | California Digital Library
Giovanni | L'ABATE | giovanni.labate@entecra.it | Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria (CRA-ABP) Research centre for agrobiology and pedology
Bertram | LUDAESCHER | ludaesch@gmail.com | University of Illinois, Urbana-Champaign
Paolo | MISSIER | Paolo.Missier@ncl.ac.uk | Newcastle University
Magalie | MOYSAN | magalie.moysan@univ-paris-diderot.fr | Université Paris Diderot
Fiona | MURPHY | fionalm27@gmail.com | Research Consultant
Ritsuko | NAKAJIMA | rnakajim@jst.go.jp | Japan Science and Technology Agency
Nicky | NICOLSON | n.nicolson@kew.org | Royal Botanic Gardens, Kew
Kunihao | NIWA | | Research Organization of Information and Systems (Japan)
Yoshinori | OCHIAI | | JST
Carole | PALMER | | iSchool Washington
Paul | SHEAHAN | sheahanpaul@hotmail.com | Sheahan
Paola | TAROCCO | ptarocco@regione.emilia-romagna.it | Geological, Seismic and Soil Survey. Emilia-Romagna Region (Italy)
Anne | THESSEN | annethessen@gmail.com | The Data Detektiv
Nicholas | WEBER | nmweber@uw.edu | University of Washington
Matt | WOODBURN | | Natural History Museum London
Themis | ZAMANI | sakka@grnet.gr | GRNET
Carlo | ZWÖLF | carlo-maria.zwolf@obspm.fr | Observatoire de Paris

Review period start:
Thursday, 21 September, 2017
Custom text:
Body:

** PLEASE NOTE - The following text has been deprecated in favor of the revised Charter Statement attached to this page - 6 June 2018 **

 

 

CODATA/RDA Research Data Science Schools for Low and Middle Income Countries IG

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

The goals of this RDA Interest Group are to continue the information sharing, targeted outreach and community collaboration with RDA members about the CODATA-RDA Schools for Research Data Science. So far, two very successful schools have been hosted by the International Centre for Theoretical Physics in Trieste, Italy: the first in August 2016 and the second in July 2017. Two further events are planned, one in São Paulo in December 2017 and a third Trieste school in August 2018. We are also looking into an African school in 2018. We plan to continue providing open and evolving curriculum materials, to create a practical framework for hosting regionalized instances of the course, and to focus on train-the-trainer concepts to grow regional capacity. This is aligned with the RDA mission in that it enables data sharing through its training and adds value to the RDA community by teaching some of the outputs of the RDA in Research Data Management.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

 

The school curriculum focuses on Open Data and FAIR practices and ethical data use, and builds a foundation of Data Science skills for early career researchers in all disciplines. Attending researchers are given priority based on a World Bank ranking of Low or Middle Income Countries (LMICs), so the focus is on resource-constrained researchers. This specifically speaks to the RDA Vision of “…researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.” In past schools, RDA as an organization, along with its recommendations and outputs, has been introduced to the students. While the events thus far have targeted LMICs, the curriculum should have universal application for Early Career Researchers (ECRs) worldwide.

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

 

  • Continue to provide successful School for Research Data Science events.
  • Create a framework for hosting regional events.
  • Provide ECRs with a foundation of skills necessary to thrive in an Open Science and Open Data environment.
  • Grow a base of worldwide trainers prepared to lecture, mentor, and provide support to future worldwide and regional events.
  • Continually evolve the foundational Data Science curriculum designed for this course, and make it accessible and reusable for other projects with similar goals.

This is distinct from other groups within the RDA.

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

·      The communities targeted for students are ECRs in LMICs with an interest in, or need of, a foundation in Data Science skills and resources. They should also embrace the concepts of Open Science and Open Data.

·      The communities targeted for lecturers and mentors are any Research Data Science professionals or academics interested in developing the next generation of Data Scientists in LMICs. This includes, but is not limited to, RDA Members.

·      NGO and corporate sponsorship will need to be leveraged for successful hosting of events.

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

·      Continued successful instances of the RDA/CODATA School for Research Data Science.

·      Continued growth of lecturer and mentor base to regionalize training events and reduce the cost per event by leveraging local experience and talent.

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

 

Regular event planning meetings will be held prior to upcoming scheduled events and targeted outreach events. A CODATA Task Group will continue to meet about governance, funding, curriculum evolution and operation of the project. We will leverage the RDA IG as an information sharing and recruiting platform.

 

Timeline (Describe draft milestones and goals for the first 12 months):

September 2017: RDA Plenary Final CODATA/RDA Summer School for Research Data Science WG Session has been accepted.

December 2017: 1st São Paulo, Brazil instance of the school.

August 2018: 3rd Trieste, Italy instance of the school.


CODATA/RDA Interest Group Charter

 

Updated version, dated 15 December 2017, endorsed 6 Feb 2018

The original version of the Charter can be found at the end of this page, and also attached.

 

Name of Proposed Interest Group: Mapping of the Landscape of Research Data Activities

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

The Internet now connects research data, computer resources and software from globally distributed resources in real time. Where on planet Earth these resources are geographically located is irrelevant, but to enable online access to them, there is a rising need for programmatic access both to data and to software, in order to find and process data across institutional, domain and national boundaries. This requires the development of standardized machine-to-machine interfaces that loosely couple data and software through agreed formats, interfaces, vocabularies and ontologies, preferably across multiple domains. The complexity of these online infrastructures requires that they are built by much wider communities, through effective cooperation and governance, to enable new and innovative forms of interdisciplinary science from globally accessible data stores.

 

The time is ripe for identifying the key communities and partnerships within the major scientific domains that are developing digital research infrastructures that enable sharing and processing of scientific data, and for ‘Mapping Of the Landscape’ (MOL) of these activities to further improve collaborations and partnerships, particularly those ‘umbrella’ alliances that are enabling interdisciplinary data sharing.  The key advantage of better Landscape Mapping is that researchers and infrastructure developers will know who is doing what and where, and hopefully avoid unintended duplication. Further, where duplication of activities is discovered, it is hoped that, once groups are aware of equivalent activities, the MOL IG can become a conduit through which these groups can connect, share experiences and learn from each other to improve coordination. The IG will concentrate on efforts applying to digital work products, not physical structures. However, some geographical maps may be used to portray informatic connections between these entities.

 

At RDA Plenary 8 and Plenary 9, sixteen groups were identified as undertaking “MOL” activities across a variety of data infrastructures and organisations. This not only reinforced that it was logical to attempt to coordinate all these MOL activities, but at the same time highlighted that there was no agreed process for undertaking a MOL activity so that outputs could be synthesised and leveraged.

 

Key points identified at the P8 and P9 meetings were:

  1. There were actually a significant number of MOL activities being undertaken;
  2. There was a diversity of research data infrastructures that each activity was trying to map (technology, data/information, computational systems, etc);
  3. There was no agreed vocabulary or ontology to describe, in a consistent way, the research data infrastructures that each MOL is reviewing; and
  4. There was a diversity of tools being used - each had different functionalities and the tool chosen was influenced to some extent by the type of MOL being undertaken.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

 

The MOL IG is surveying the community of landscape mappers to put together a current list of projects and a legend of vocabularies and visualization tools. The primary purpose of these lists is to increase awareness of current projects and existing tools. Additional purposes are to enable current mappers to document and evaluate the differing methodologies, tools and vocabularies used by MOL mappers (MOLers), to share goals, and to start to determine shared practices so that future mapping projects can identify gaps and align their tools and vocabularies to existing projects.

 

Ultimately, if the underlying data sets are sufficiently standardised, it should be possible to crosswalk between and interrogate across multiple MOLs.

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.  Articulate how this group is different from other current activities inside or outside of RDA.):

 

  1. Develop a structured web page with a catalogue of MOL activities related to identifying research data infrastructures: this catalogue can be self-populated by any MOL activity;
  2. Provide a key list of methodologies, tools, workflows, etc. being used; and
  3. Provide a key list of potential map indexes (vocabularies, ontologies, data models).

 

This group was partially informed by the RDA Atlas of Knowledge (AOK) and the RDA Technical Advisory Board (TAB) Landscape Overview Group (LOG) mapping exercises, though in contrast to those activities, the proposed MOL IG will focus on activities external to RDA and at a higher organizational level.

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

Please refer to the MOL spreadsheet (available here); it covers a diversity of data infrastructure mapping projects, standards, repositories and organizations, with mappers and map projects drawn from a diversity of domains, including health, arctic, earth sciences, environment, agriculture, and e-infrastructures. (Note: this spreadsheet also contains the list of tools and vocabularies/ontologies.)

 

Related RDA groups

  • TAB
  • WG/IG Chairs
  • All RDA groups are indirectly related to this project
  • Education and Training on Handling of Research Data IG (MOL outputs could be used as training tools or used to identify tools.  Many of these maps are considered onboarding materials.)
  • Data Foundations and Terminology IG (could advise on foundation vocabularies).

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

  • Generating a dynamic list of MOL activities
  • Developing a portfolio of mapping methodologies and tools to document best practice for those wanting to undertake MOLs
    • Comparing/contrasting strengths and weaknesses of each
  • Developing a list of potential vocabularies/ontologies that can be used as potential legends in MOL exercises
  • Example outcomes:
    • Recommendations for others working on ‘mapping the landscape’ activities to increase alignment and possible future integration.
    • Promotion of knowledge of existing exercises to limit duplicated effort.
    • Identification of knowledge gaps.

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

  • AGU (December) / EGU (April) meetings - both are large international science meetings and attracted many interested MOL individuals this past year.  We plan to continue taking advantage of these gatherings as a feasible in-person discussion venue. These meetings, however, are concentrated on the Geosciences/Environmental/Earth Systems sciences only.
  • Regular telecons and emails.

 

Timeline (Describe draft milestones and goals for the first 12 months):

    

March 2018 - P11 session - introduce new MOLers and road test the tools and vocab resource lists.

September 2018 - P12 session - introduce new MOLers and evaluate if cross MOL mappings can be technically undertaken, and/or automated.

 

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):  Bold indicates co-chairs

 

FIRST NAME    LAST NAME          EMAIL
Rowena        Davis              rowenaidavis@email.arizona.edu
Lesley        Wyborn             lesley.wyborn@anu.edu.au
Ari           Asmi               ari.asmi@helsinki.fi
Steve         Diggs              sdiggs@ucsd.edu
Helen         Glaves             hmg@bgs.ac.uk
Peter         Pulsifer           Peter.Pulsifer@colorado.edu
Lindsay       Powers             lpowers@usgs.gov
Lynn          Yarmey             yarmel@rpi.edu
Colleen       Strawhacker        colleen.strawhacker@colorado.edu
Dawn          Wright             DWright@esri.com
Jonathan      Petters            jpetters@vt.edu
Leslie        Hsu                lhsu@usgs.gov
Rebecca       Koskela            rkoskela@unm.edu
Marshall      Ma                 max@uidaho.edu
Peter         McQuilton          peter.mcquilton@oerc.ox.ac.uk
Danie         Kincade            dkinkade@whoi.edu
Denise        Hills              dhills@gsa.state.al.us
Erin          Robinson           erinrobinson@esipfed.org
Fiona         Murphy             fionalm27@gmail.com
Gary          Berg-Cross         gbergcross@gmail.com
Mustapha      Mokrane            mustapha.mokrane@icsu-wds.org
Leslie        McIntosh-Barelli   borrel2@rpi.edu
Mark          Parsons            parsom3@rpi.edu
Mohan         Rammamurthy        mohan@ucar.edu
Sara          Graves             SGraves@itsc.uah.edu
Simon         Lambert            simon.lambert@stfc.ac.uk
Xin           Mou                mou1609@vandals.uidaho.edu
Sophie        Hou                hou@ucar.edu

 

 


 

PLEASE NOTE - the following text was the original version of the Charter and has been deprecated in favor of the above text (which is also attached to this page).  

 

 

Name of Proposed Interest Group:   Mapping of the Landscape of Research Data Activities

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

The Internet now connects research data, computer resources and software from globally distributed resources in real time. Where on planet Earth these resources are geographically located is irrelevant, but to enable online access to them, there is a rising need for programmatic access both to data and to software, in order to find and process data across institutional, domain and national boundaries. This requires the development of standardized machine-to-machine interfaces that loosely couple data and software through agreed formats, interfaces, vocabularies and ontologies, preferably across multiple domains. The complexity of these online infrastructures requires that they are built by much wider communities, through effective cooperation and governance, to enable new and innovative forms of interdisciplinary science from globally accessible data stores.

 

The time is ripe for identifying the key communities and partnerships within the major scientific domains that are developing infrastructures that enable sharing and processing of scientific data, and for ‘Mapping Of the Landscape’ (MOL) of these activities to further improve collaborations and partnerships, particularly those ‘umbrella’ alliances that are enabling interdisciplinary data sharing.  The key advantage of a better Landscape Map is that researchers will know who is doing what and where, and hopefully avoid unintended duplication. Further, where duplicate activities are discovered, it is hoped that, once groups are aware of equivalent activities, the MOL IG can become a conduit through which these groups can connect, share experiences and learn from each other to improve coordination and avoid any more duplication of effort.

 

At RDA Plenary 8 and Plenary 9, sixteen groups were identified as undertaking “MoL” activities across a variety of data infrastructures and organisations. This not only reinforced that it was logical to attempt to coordinate all these MoL activities, but at the same time highlighted that there was no agreed process for undertaking a MoL activity so that outputs could be synthesised and leveraged.

 

Key points identified at the P8 and P9 meetings were:

  1. There was no agreed vocabulary or ontology to describe, in a consistent way, the research data infrastructures that each MoL is reviewing; and
  2. There was a diversity of infrastructures that each activity was trying to map (technology, data/information, computational systems, etc).

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

 

MOL activities identified so far are both within and across many scientific domains. These have similar goals and host parallel working groups that support the mission of advancing scientific research through data interoperability. Several are looking for common ‘mapping’ methodologies so that ‘maps’ created by multiple groups can be interconnected and results shared.

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

  1. Develop a web page with a catalogue of MOL activities related to identifying research data infrastructures;
  2. Develop a synthesis of existing MoL activities for research data infrastructure activities within and beyond RDA;
  3. Investigate mapping practices including methodologies, tools, workflows, etc. and identifying whether any key pieces are missing; and
  4. Discuss opportunities for collaborations on existing MoL exercises.

 

This group was partially informed by the RDA Atlas of Knowledge and TAB LOG mapping exercises, though in contrast to those activities, the proposed MoL IG will focus on activities external to RDA and at a higher organizational level.

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

  1. Arctic Data Committee Landscape Exercise (Peter Pulsifer, http://arcticdc.org/products/data-ecosystem-map)
  2. EarthCube (http://www.arcgis.com/home/item.html?id=9bde7150da474c828d61a5e67e98855d, http://www.goring.org/resources/neo4j_engagement.html)
  3. ESRI mapping tool (http://dusk.geo.orst.edu/ec-story) - developed by Dawn Wright of ESRI that was used to map the location of, and types of communities within EarthCube
  4. Belmont Forum (Rowena Davis)
  5. Atlas of Knowledge (Simon Lambert, RDA/EU http://core.cloud.dcu.gr/rda_aok/ )
  6. AuScope (Lesley Wyborn)
  7. CODATA Task Group on Coordinating Data Standards amongst Scientific Unions (Marshall Ma, http://www.codata.org/task-groups/coordinating-data-standards )
  8. TAB LOG (Steve Diggs, [link])
  9. RDA Education IG connection?  (Sophie Hou pointed to Amy Nernburger’s education landscape survey as a possible connection point at the AGU in-person meeting)
  10. USGS Community for Data Integration (CDI) (Leslie Hsu, CDI wiki). Current working groups include Tech Stack, Semantic Web, Data Management, Citizen Science, Mobile App, and more. CDI Community can be engaged through Leslie Hsu, who coordinates communication to the 500+ members from within and outside of USGS. We have some initial coordination such as joint Tech Dive monthly calls with ESIP, and are interested in leveraging more opportunities, events, etc. to reduce redundancy and bring information to our members. Can serve as link to USGS data assets.
  11. RISCAPE (European Research infrastructures in the international landscape) (Ari Asmi)
  12. ESIP (Earth Science Information Partners) (Erin Robinson)

 

Related RDA groups

  • TAB
  • WG/IG Chairs
  • Education and Training on Handling of Research Data IG
  • Brokering IG
  • Data Foundations and Terminology IG

 

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

Given that the landscapes of interest are eternally changing, making a map or maps virtually impossible to keep current, this IG will instead focus on more manageable areas of alignment.

  • Example WG topics:
    • Developing a vocabulary/ontology to describe components of research data infrastructures (that this does not exist has been a huge stumbling block for the MoL IG)
    • Mapping methodologies to document best practice for those wanting to undertake MoLs
    • Developing a portfolio of Landscape mapping tools and comparing/contrasting strengths and weaknesses of each
  • Example outcomes:
    • Recommendations for others working on ‘mapping the landscape’ activities to increase alignment and possible future integration.
    • Promotion of knowledge of existing exercises

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

  • ESIP meetings - ESIP runs two meetings each year in the US, one in Summer and one in Winter. Their off-Plenary schedule directly complements the RDA calendar and distance virtual meeting options are supported as part of the meetings.
  • AGU meetings - The AGU Fall Meeting is a large international science meeting and attracted many interested MoL individuals this past year. We plan to continue taking advantage of this gathering as a feasible in-person discussion venue.

 

Timeline (Describe draft milestones and goals for the first 12 months):

    

September 2017 - P10 session: Mini Summit and consolidation of work plan

December 2017 - AGU meeting and progress report

March 2018 - P11 session and revisit workplan

September 2018 - P12

 

Review period start:
Friday, 1 September, 2017

Introduction

This interest group will provide a forum to discuss issues on management, sharing, discovery, archival and provenance of software source code. The group will pay special attention to source code that generates research data and plays an important role in scientific publications. The Research Data Alliance (RDA) mission is to build the social and technical bridges that enable open sharing of data. Software (as source code and executables) and data are intrinsically linked, both to ensure continued creation, analysis and reuse of data and also to preserve the knowledge of the software development, relationships with other assets and the context in which it was created.

This IG adds value to the RDA community by channeling expertise in software development, sharing, management, versioning, reproducibility and preservation into RDA, and into the RDA groups which could benefit from this expertise.
 
User scenario(s) or use case(s) the IG wishes to address

Software source code plays a critical role in all fields of modern research, where it is written and developed to address a variety of needs, such as cleaning, processing and visualising data. Software source code is a necessary component of research reproducibility and reusability. Thus software source code should be properly curated in the same way as other research inputs and outputs, such as research data and paper publications. Software source code developers, and the organisations that sponsor software development, should also be properly credited and attributed.
 
Objectives

This interest group focuses on software source code as a first class citizen in the landscape of scientific research, related to but distinct from research data. The group’s objective is to bring together entities and individuals with complementary expertise and different use cases in order to address the following:

  • Develop a consistent metadata profile for discovery of software, source code, algorithms and other software artefacts
  • Review existing metadata for describing source code, where it is already in place, especially metadata that links source code to data and research publications
  • Investigate if there is a need for additional specific metadata for software in order to make it citable, findable and accessible
  • Review existing schemas for identifying software artefacts
  • Identify and promote an identification schema specifically adapted to track software artefacts
  • Collect and publish use cases of current examples and practices
  • Develop guidelines for managing, describing and publishing software source code
  • Liaise with other groups in RDA which express interest in issues specifically related to software source code

Participation

This group is open to all RDA members to participate. 
This group will interact with the  following relevant RDA IGs/WGs:

  • Research data provenance IG&WG
  • PID kernel information WG
  • Reproducibility IG
  • Metadata IG
  • Preservation Tools, Techniques, and Policies IG
  • Virtual Research Environment IG (VRE-IG)
  • Data versioning IG
  • Data Citation WG

And other IGs/WGs if they become relevant to this group.

The group will also liaise with outside experts on software whose work will be beneficial for RDA, such as WSSSPE, FORCE11 (the software citation work in particular), the Software Sustainability Institute, the Software Heritage initiative, journals that publish software, and relevant national and international initiatives.
 
Outcomes

Provide an extensive background for RDA members on software source code development, sharing, management, versioning, reproducibility and preservation in order to foster the emergence of shared standards across the research community on how to describe, identify, find and attribute software source code.
 
Mechanism

This group will coordinate activities and communicate through the following means:

  • Monthly teleconference to discuss specific issues
  • Asynchronous collaboration through Google docs, RDA mailing list and wikis
  • Inform other relevant RDA IG/WG of the group’s ongoing activities through RDA group mailing lists
  • Hold face-to-face interactions within and across groups at RDA plenaries.      

Timeline
 
In the first year, we plan to set up an active discussion in three key areas: metadata, identifiers, and use cases.
 

Potential Group Members

  • Benoit Baudry
  • Daniel S. Katz
  • Fernando Rios
  • Gribonval Rémi
  • Ian Bruno
  • Jen Martin
  • Jonathan Tedds
  • Julia Collins
  • Lesley Wyborn
  • Martin Hammitzsch
  • Martin Monperrus
  • Michelle Barker
  • Mingfang Wu
  • Morane Gruenpeter
  • Neil Chue Hong (co-chair)
  • Roberto Di Cosmo (co-chair)
  • Sandra Gesing
  • Stefanie Kethers
  • Victoria Stodden
Review period start:
Monday, 28 August, 2017