Body:

Introduction

This interest group will provide a forum to discuss issues on management, sharing, discovery, archival and provenance of software source code. The group will pay special attention to source code that generates research data and plays an important role in scientific publications. The Research Data Alliance (RDA) mission is to build the social and technical bridges that enable open sharing of data. Software (as source code and executables) and data are intrinsically linked, both to ensure continued creation, analysis and reuse of data and also to preserve the knowledge of the software development, relationships with other assets and the context in which it was created.

This IG adds value to the RDA community by channeling expertise in software development, sharing, management, versioning, reproducibility and preservation into RDA, and into the RDA groups which could benefit from this expertise.
 
User scenario(s) or use case(s) the IG wishes to address

Software source code plays a critical role in all fields of modern research, where it is written and developed to address a variety of needs, such as cleaning, processing and visualising data. It is a necessary component of research reproducibility and reusability. Software source code should therefore be properly curated in the same way as other research inputs and outputs, such as research data and publications. Developers of software source code, and the organisations that sponsor software development, should also be properly credited and attributed.
 
Objectives

This interest group focuses on software source code as a first class citizen in the landscape of scientific research, related to but distinct from research data. The group’s objective is to bring together entities and individuals with complementary expertise and different use cases in order to address the following:

  • Develop a consistent metadata profile for discovery of software, source code, algorithms and other software artefacts
  • Review existing metadata for describing source code, especially metadata that link source code to data and research publications
  • Investigate whether additional software-specific metadata are needed to make software citable, findable and accessible
  • Review existing schemas for identifying software artefacts
  • Identify and promote an identification schema specifically adapted to track software artefacts
  • Collect and publish use cases of current examples and practices
  • Develop guidelines for managing, describing and publishing software source code
  • Liaise with other groups in RDA which express interest in issues specifically related to software source code

Participation

Participation in this group is open to all RDA members.
This group will interact with the following relevant RDA IGs/WGs:

  • Research data provenance IG&WG
  • PID kernel information WG
  • Reproducibility IG
  • Metadata IG
  • Preservation Tools, Techniques, and Policies IG
  • Virtual Research Environment IG (VRE-IG)
  • Data versioning IG
  • Data Citation WG

The group will also interact with other IGs/WGs if they become relevant.

The group will also liaise with outside experts on software whose work will be beneficial for RDA, such as WSSSPE, FORCE11 (the software citation work in particular), the Software Sustainability Institute, the Software Heritage initiative, journals that publish software, and relevant national and international initiatives.
 
Outcomes

Provide an extensive background for RDA members on software source code development, sharing, management, versioning, reproducibility and preservation in order to foster the emergence of shared standards across the research community on how to describe, identify, find and attribute software source code.
 
Mechanism

This group will coordinate activities and communicate through the following means:

  • Monthly teleconference to discuss specific issues
  • Asynchronous collaboration through Google Docs, RDA mailing list and wikis
  • Inform other relevant RDA IG/WG of the group’s ongoing activities through RDA group mailing lists
  • Hold face-to-face interactions within and across groups at RDA plenaries

Timeline
 
In the first year, we plan to set up an active discussion in three key areas: metadata, identifiers, and use cases.
 

Potential Group Members

  • Benoit Baudry
  • Daniel S. Katz
  • Fernando Rios
  • Gribonval Rémi
  • Ian Bruno
  • Jen Martin
  • Jonathan Tedds
  • Julia Collins
  • Lesley Wyborn
  • Martin Hammitzsch
  • Martin Monperrus
  • Michelle Barker
  • Mingfang Wu
  • Morane Gruenpeter
  • Neil Chue Hong (co-chair)
  • Roberto Di Cosmo (co-chair)
  • Sandra Gesing
  • Stefanie Kethers
  • Victoria Stodden
Review period start:
Monday, 28 August, 2017
Custom text:
Body:

Name of Proposed Interest Group: SHARC (SHAring Reward & Credit)

 

Introduction 

(A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

Data sharing is widely stated and promoted as a goal, but it remains challenging in practice because many obstacles persist on several fronts. Among these obstacles is the lack of relevant and recognized reward mechanisms for the very specific efforts required to share organized datasets.

 

The prerequisite for data sharing lies in implementing the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), which can add to the workload if done by the researchers themselves; however, this aspect is never accounted for when activity is evaluated by funders or reviewers.

In some cases, resources may come from domains that were not initially developed for research (e.g. museums, clinical care). Sharing data and sharing physical resources each involve very different steps and methods, and diverse communities:

  • Building of a research collection or resource infrastructure according to the FAIR principles (including all necessary steps for data and physical entities repositories)
  • Elaboration of governance and sharing policies for the resource
  • Development of tools to follow up on the use of the resource

Individuals with different expertise may contribute at each of these steps (laboratory technicians, resource managers, researchers, legal experts…).

 

In 2014, the Expert Advisory Group on Data Access (EAGDA) carried out, through UK cohort studies, research into the governance of data access. The aim of their research was to identify the factors that help or hinder individual researchers in making their data (both published and unpublished) available to other researchers, and to examine the potential need for new types of incentives in order to enable data access and sharing. Among their findings:

  • Research culture and environment are not perceived as providing sufficient support or adequate rewards for researchers who generate and share high-quality datasets.
  • Making data accessible to others can carry a significant cost to researchers (both in terms of financial resources and the time it requires).
  • There is typically very little, if any, formal recognition for data outputs in key assessment processes, including funding decisions and academic promotion.
  • Data managers have an increasingly vital role as members of research teams, but are often afforded a low status and few career progression opportunities.

Recommendations:

  • At first, develop mechanisms that encourage and reward good practice, rather than penalising researchers who fail to fulfil their planned approaches for sharing data (the carrot, not the stick).
  • It is vital therefore that funders and research leaders foster an active, on-going dialogue with international partners and work with them to build common incentive structures and effect cultural change.
  • Recognise the contribution of those who generate and share high quality datasets, including as a formal criterion for assessing the track record and achievements of researchers during funding decisions.
  • Form a partnership among funders, research institutions and other stakeholders to establish career paths for data managers.
  • Ensure that the contributions of both early-career researchers and data managers are recognised and valued appropriately, and that the career development of both types of individuals is nurtured.
  • Champion greater recognition of data outputs in the assessment processes to which they contribute.
  • Strengthen career pathways for data managers; and recognise data outputs in performance reviews

(https://wellcome.ac.uk/sites/default/files/establishing-incentives-and-changing-cultures-to-support-data-access-eagda-may14.pdf)

These recommendations have not been widely promoted outside the UK, although they are of great interest for research governance at the international level.

 

Existing initiatives recognise the value of certain steps of the chain towards sharing resources; however, gaps remain to be filled, especially as regards physical resources. As an example, the BRIF initiative (BRIF: Bioresource Research Impact Factor/Framework) has already tackled these issues for human biological samples and data. As a result, the CoBRA guideline has been produced, and work has been performed on unique identifiers and on relevant parameters for specific metrics.

 

RDA IGs could build on those previous results and ideas to further identify such gaps and suggest practical solutions that promote the provision of resources to the community as a genuinely valuable activity in research practice.

 

To our knowledge, the workflow of the entire process, from the production of resources to their impact back on the producer, has not been explored in RDA groups. Furthermore, the RDA community is focused mainly on data. Extending the work to resources that also include physical samples in addition to data would be a value-added contribution. In keeping with the goal of RDA, the IG will work on finding solutions to foster open sharing of resources.

 

User scenario(s) or use case(s) the IG wishes to address

(what triggered the desire for this IG in the first place):

 

Use case 1 - The biomedical community: A growing portion of research relies on sample collections and databases. This is especially true in biological and medical sciences with the development of large-scale biology in the -omics era. High-throughput ‘omics’ platforms require biospecimens, and generate a great amount of data on large numbers of patients and/or healthy individuals. The size and complexity of the collections needed to promote translational research typically extend far beyond the scope of individual research projects, and the need to produce these valuable data is being met by contemporary bioresource facilities. While sharing of such resources is essential for optimizing knowledge production, so far only a very small part of them are shared. A major obstacle lies in the fact that establishing a valuable bioresource requires considerable time and effort. Finding ways to recognize and credit this upstream work is essential.

 

Use case 2 - The Industrial Ecology community: Industrial ecologists rely heavily on data to assess the environmental performance of products over their life cycle. This requires interdisciplinary data from several domains, such as chemistry, ecology, economy, toxicology and climate science, among others. Currently, harmonized datasets for environmental Life Cycle Assessment (LCA) of products are scarce and the existing proprietary databases are incomplete. Sharing research data on a Research Data Infrastructure is an additional, time-consuming effort for researchers that is not acknowledged. Reward mechanisms for sharing data would significantly improve the transparency of such products’ environmental assessments and the accuracy of environmental models. Moreover, shared data would have a high informational value, facilitating responsible consumption and thus increasing the weight of public pressure for significant environmental improvements in activities with high environmental impact.

 

Use case 3 - Data produced by marine and terrestrial biodiversity research projects that evaluate and monitor Good Environmental Status have high potential for use by stakeholders involved in environmental management. However, the accessibility of environmental data, especially in ecology, has never been more problematic. The cost of these data and their heritage value are increasingly highlighted, whereas budgetary constraints limit the resources allocated to their production and availability. Rewarding data sharing could have a beneficial impact on the whole system. As a case in point, the data produced by biodiversity research are heterogeneous and produced by a multitude of entities; standard formats and protocols would therefore allow the interconnection of databases, and semantic approaches could contribute significantly to their interoperability. However, the specific scientific objectives and the logistics of project management and information gathering lead to a decentralised distribution of data, which can hinder environmental research. Moreover, data are often considered a technical end, whereas they should be treated more as a scientific end, an object of study in their own right: beyond the primary analyses tied to the research question for which they were collected, data can be reused, within the limits allowed by their quality, and their exploration by appropriate methods such as graphs may lead to the formulation of new scientific hypotheses. Indeed, the “rising tide of data” requires new approaches to data management and data preservation; access and sharing should be supported in a seamless way. According to the situational analysis of the French landscape of biodiversity research observatories[1], data planning, collection, quality assurance, description, conservation and analysis are mostly led by observatories, whereas data discovery (of potentially useful data) and data integration from varied sources are poorly done.
This case study aims to present the latest trends in data infrastructure and data management solutions for research, and to discuss the progress of the Open Science Cloud and of tools and initiatives for rewarding data sharing in the field of biodiversity and environmental data.

 

A wide range of disciplines face the issue of little or no data sharing, including but not limited to the use cases mentioned above. The following disciplines could also be addressed within the SHARC IG as it develops and its membership grows:

  • Low-temperature physics: cryostats data
  • Earth science: samples and data
  • Materials science: catalysts, microscopy data, etc.
  • Social science: raw data from surveys, interviews, focus groups or case studies
  • Neuroscience: imaging data.

(See Anita de Waard, 0000-0002-9034-4119; VP Research Data Collaborations; Elsevier RDM Services)

 

Objectives

(A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

 

The SHARC IG will have four main objectives:

 

  1. To review the existing reward mechanisms in various communities, as well as their limits, and to identify factors (e.g. tools, incentives, requirements) that could improve the process and optimize the sharing of bioresources, i.e. data and physical samples.
  2. To use this analysis to encourage the inclusion of criteria related to bioresource sharing in the research evaluation process at the European institutional level (i.e. to increase coherence between evaluation and real practice without making this activity mandatory).
  3. To disseminate information and findings to diverse communities of stakeholders.
  4. To develop a process for stepwise adoption of principles and implementation measures adapted to national, local and institutional contexts.

 

Participation

(Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

Currently, seven different communities are represented in the group (details in the table at the end of the document): Biology and Biomedicine (7 people), Information Sciences and Technology (3 people), Geospatial data (1 person), Marine Biology (1 person), Biodiversity (2 people), Industrial Ecology (1 person) and Bioethics (4 people).

Anne Cambon-Thomsen is the initial leader.

Laurence Mabile will dedicate 30% of her workload to the coordination of the group itself. Co-chairs will help interact with the relevant RDA groups and coordinate meetings on their continents.

The different communities will contribute to the white paper detailing the existing and missing reward mechanisms in the sharing process.

 

Three existing RDA groups identified themselves during our BoF session at RDA P9 as having common concerns: the ‘Research data provenance working group’, the ‘RDA / TDWG Metadata Standards for attribution of physical and digital collections stewardship’ group and the ‘RDA/WDS Publishing Data Workflows WG’. The Data Citation WG, the Elixir Bridging Force IG and the Reproducibility IG may have some overlapping interests, too.

Those groups will be contacted via the RDA platform, and virtual meetings will be organized to start with. If relevant, cross-sessions will be organized at RDA plenaries. We also plan to alert them about the events organized by our BoF/IG group.

 

 

Outcomes

(Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

  • White paper/position paper on reward mechanisms (existing and missing) for sharing bioresources and their link to institutional research evaluation, to be published if possible as an RDA-endorsed paper in a high-visibility open access science journal with a science policy section
  • Submission of a session proposal to the EuroScience Open Forum (ESOF) 2018, Toulouse
  • Responding to the next European Community public stakeholder consultation related to the preparation of the EU research programme FP9, and exploring the possibility of including such recognition criteria in FP9 as well as in EU-level strategies that foster implementation at an institutional level (such as the HRS4R, Human Resources Strategy for Researchers, which exists for human resources)
  • Forming RDA working groups, and addressing issues such as whether future working groups will pertain to a specific community (e.g. ecological, biomedical, geospatial), revolve around specific stakeholders across communities (editors, funders, governing bodies of research institutions, research evaluation policy makers), or both

Mechanism

(Describe how often your group will meet and how will you maintain momentum between Plenaries.):

 

  • Virtual web meetings will be organized as often as necessary, with a minimum of once a month for a regular update.
  • Face-to-face meetings will be encouraged at each RDA plenary conference.
  • Regular feedback will be relayed towards all interested RDA groups about relevant meetings and conferences of interest for group members.

 

 

Timeline

(Describe draft milestones and goals for the first 12 months):

  • June 2017: ESOF session proposal

A proposal for a scientific session was submitted to ESOF (EuroScience Open Forum) under the coordination of Fiona Murphy.

The conference will be held in July 2018 in Toulouse, FR.

More info at:

http://www.esof.eu/en/about/programme/call-for-proposals.html

 

  • RDA plenary conference 10, Montreal, 19-21 September 2017:

Attendance by some of the group members; mapping of overlapping topics by other groups and contacting them

 

  • First draft of the white paper: end 2017

 

Potential Group Members

(Include proposed chairs/initial leadership and all members who have expressed interest):

*In bold, co-chairs

FIRST NAME | LAST NAME | INSTITUTION/COUNTRY
Anne | Cambon-Thomsen | Public Health Department, INSERM-University Toulouse III, FR
Laurence | Mabile | Public Health Department, INSERM-University Toulouse III, FR
Rodrigo | Costas-Comesana | Centre for Science and Technology Studies (CWTS), Faculty of Social and Behavioral Sciences, Leiden University
Mogens | Thomsen | Public Health Department, INSERM-University Toulouse III, FR
Michele | De Rosa | Bonsai/Denmark; Aalborg University/Denmark
Laurent | Dollé | Erasme Hospital, ULB, 1070 Brussels, Belgium
Mohamed | Yahia | INIST, CNRS, FR
Fiona | Murphy | MMC Ltd (Research Data/Publishing Consultant); University of Reading
Elena | Bravo | Research Coordination and Support Service, Istituto Superiore di Sanità (National Health Institute), IT
Martina | Zilioli | Institute for Electromagnetic Sensing of Environment (Milan), IT
Sofie | Bekaert | Clinical Research Center of Ghent University Hospital
Romain | David | CNRS, Mediterranean Institute of Biodiversity and Marine and Continental Ecology
Anna | Cohen Nabeiro | Fondation pour la Recherche sur la Biodiversité, ECOSCOPE (Observations et données sur la biodiversité), FR
Alison | Specht | Fondation pour la Recherche sur la Biodiversité, CESAB (Centre de synthèse et d’analyse sur la biodiversité), FR
Jane | Carpenter | NSW Health Pathology - Biobanking Services, Australia
Anne Marie | Tassé | P3G
Gabrielle | Bertier | Centre of Genomics and Policy, McGill University Human Genetics department, Canada; INSERM-University Toulouse III, FR
Jantina | De Vries | Department of Medicine, University of Cape Town, South Africa
Louise | Bezuidenhout | Institute for Science Innovation and Society, University of Oxford

[1] Fondation pour la Recherche sur la Biodiversité (2016), Etat des lieux et analyse du paysage national des observatoires de recherche sur la biodiversité, une étude de l’infrastructure ECOSCOPE. Série FRB, Expertise et synthèse. Ed. Aurélie Delavaud et Robin Goffaux, 72 pp.

 

Review period start:
Monday, 7 August, 2017
Custom text:
Body:

NOTE - This Case Statement has been updated in the revised version attached (3 Jan 2018)

 

 

A variety of stakeholders are showing growing interest in exposing data management plans (*) to other actors (human/machine) in the research lifecycle, beyond their creator and the funder or institution that mandates their production. Interested stakeholders include researchers themselves, funders, institutions, and a variety of service providers and community organisations including repositories, journals, publishers, and providers of tools for writing and maintaining plans. Implementation and adoption are currently hampered by two problems:

  • A lack of standards for expression and interchange of DMPs

  • Insufficient understanding of the needs of users and the benefits and risks of different modes of action

This proposed working group will address the second of these issues; the issue of a standardised form of expression for DMPs is the concern of the proposed DMP Common Standards Working Group. The group’s output will include a reference model and alternative strategies for exposing plans, to best serve community interests in meeting FAIR principles, based on the shared experience of ‘early adopters’ in test implementations. It will be supported by work to gauge user needs and motivations for exposing DMPs as well as perceived risks and disbenefits. (*) Note: our main focus is on Data Management Plans (DMPs), but we will seek examples of Software Management Plans (SMPs) where relevant to the exposure use cases of interest to the Active DMP Interest Group.

The key beneficiaries of the WG outcomes will be stakeholders with a common interest in using Data or Software Management Plans as instruments for demonstrating that research products have been managed according to research community standards and generic principles (e.g. that the research products should be FAIR), and that recognition is given for doing so. 

There is potential value in exposing plans for a variety of stakeholders involved in their production and consumption. These include researchers themselves, funders, institutions, and a variety of service providers and community organisations including repositories, journals, publishers, and providers of tools to help write and maintain plans. The WG will provide a Use Cases Catalogue to describe implementation scenarios and articulate their benefits to researchers and other stakeholders, with case studies of how those benefits have been realised. Through consultation with users of well-established planning tools (DMPTool, DMPonline), the Use Cases Catalogue will also identify the degree of acceptance among researchers for the levels of exposure/publication each use case entails, barriers to realising the benefits, and any concerns about undesirable impacts.

Generalising from the scenarios and examples contained in the Use Cases Catalogue, the WG will produce a Reference Model to document generic components and workflows for exposing plans (and metadata about them), and offer recommendations for further action by each of the relevant stakeholder groups. By gaining endorsements for the Reference Model from relevant stakeholders for each use case, we will provide a community-endorsed approach to using plans to share demonstrable advancement in data sharing practice.

Review period start:
Monday, 24 July, 2017 to Thursday, 24 August, 2017
Custom text:
Body:

Name of Proposed Interest Group:

 

Disciplinary Collaboration Framework (DCF)

(originally introduced as Disciplinary Interoperability Framework)

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

A fragmented landscape or a diverse ecosystem?

 

Over the last couple of years, we have witnessed an increase in the number of Interest and Working Groups operating within RDA. A significant proportion of that increase is due to the creation of disciplinary groups[1]. The operation of such groups in RDA is crucial, as they act as direct channels for communication and collaboration between RDA and their respective scientific communities. As such, they enable the interplay between the RDA outputs and community practices, tools and infrastructures. Depending on the definition used, approximately 20 IGs that can be considered ‘disciplinary’ are currently established and active in the wider RDA ecosystem.

 

However, to benefit fully from the existence of these groups it is vital that the RDA community self-organises its activities, to turn the challenges associated with a fragmented landscape into opportunities derived from the operation of a diverse ecosystem. Arguably, the turning point is the capacity of the RDA community to organically develop interfaces between groups, and streamline the inter-group communication.

 

The need to form a group that will take on the work of strengthening the voice and position of disciplines within RDA was first identified during a panel discussion at SciDataCon, and a subsequent paper was published in the CODATA Data Science Journal (Genova, F. et al. (2017). Building a Disciplinary, World‐Wide Data Infrastructure. Data Science Journal, 16, p.16. DOI: http://doi.org/10.5334/dsj-2017-016).

 

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

 

Issues relating to managing, linking and curating research data are often perceived in different ways within different disciplines. This has led to a challenging landscape that lacks a consistent requirements framework. Such a framework could drive and steer the development of technological solutions and improve their applicability across scientific disciplines. A collaboration and coordination forum where these issues are openly addressed from a discipline-specific perspective is needed. Such a forum, however, needs to be organized and operated by the respective groups themselves, providing them with the flexibility to steer the agenda in an agile and responsive manner according to changing needs.

 

The RDA DCF Interest Group will act as a collaboration and coordination working space, bringing together representatives from communities of practice across scientific disciplines to better organise and drive the discussion for prioritising, harmonising and efficiently articulating communities’ needs. 

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

 

The RDA Disciplinary Collaboration Framework (DCF) sets out with a vision and a clear list of standing objectives in support of its work within the RDA ecosystem.

 

Vision

Strengthen the voice of disciplinary groups and improve the clarity and visibility of discipline-specific data management and stewardship needs within RDA. Work towards the development of a disciplinary interoperability framework.

 

Mission statements

  1. Identify and describe common challenges, needs and objectives of scientific communities of practice relevant to managing and sharing their research data;
  2. Improve communication and interplay between disciplinary groups;
  3. Connect and liaise between disciplinary groups and technical and socio-cultural cross-cutting groups;
  4. Improve visibility and applicability of RDA outputs across disciplines;
  5. Act as a forum for, and represent, discipline-specific communities that are not currently represented by an RDA group.

 

Objectives/Focus areas

 

The quick wins

  • Act as an inter-disciplinary open forum;
  • Act as a forum to introduce and discuss RDA outputs;
  • Perform a gap analysis for disciplinary participation in RDA;
  • Support RDA domain ambassadors and the ambassadors’ scheme;
  • Act as a single authoritative voice in RDA, representing disciplines.

 

The long runs

  • Use the group as a window to RDA for scientific communities that currently do not participate in RDA;
  • Provide authoritative opinions to TAB/OAB/Council as needed on disciplinary engagement and coordination matters;
  • Take actions towards the defragmentation of the disciplinary groups landscape;
  • Identify and prioritise common technical challenges.

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

The DCF is predicated upon strong participation by all co-chairs of discipline-specific Interest Groups, as well as individuals who represent scientific disciplines outside formal RDA groups. DCF will also invite all co-chairs of other cross-cutting groups addressing technical and socio-cultural issues to participate in the DCF meetings.

 

As the group develops its working agenda and selects specific issues to address, it will make calls to specific RDA Interest and Working Groups to participate.

 

Recognising the role of the group in the wider RDA organisation, the group will have an open invitation to members of all the organisational bodies (Secretariat, TAB, OAB and Council members).

 

Sessions and proceedings of the group will be public and subject to community review/comment.

 

Rules of procedure of the group will be further developed and agreed at its inaugural meeting.

 

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

As mentioned above, DCF will act as a collaboration and activity coordination space for disciplinary groups and discipline-representing individuals. Following a prioritisation exercise of the technical and socio-cultural issues that cross-cut disciplinary needs, DCF will:

 

  1. Propose and support joint sessions at RDA plenaries between technical and disciplinary groups;
  2. Propose the formation of new working groups to address specific challenges, which are not otherwise addressed by existing groups;
  3. Support the development of new disciplinary IGs, to address gaps in the scientific coverage;
  4. Organise focused sessions and events to help disciplinary groups navigate and exploit RDA outputs/products;
  5. Support disciplinary ambassadors in their role within their respective communities;
  6. Other outputs as evaluated by the group membership.

 

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

 

  1. Breakout sessions during Plenary meetings (every six months)
  2. Participation in the RDA co-chairs collaboration meetings (every six months) 
  3. Online meetings (on an ad-hoc basis)     

 

By having a staggered meeting schedule, we will ensure that the group will convene quarterly (baseline schedule).

 

Timeline (Describe draft milestones and goals for the first 12 months):

 

Month 6: Report on the gap analysis for disciplinary participation, identifying which disciplines are represented and which key scientific areas are not.

 

Month 12: Communication across the RDA ecosystem of key group statements on urgent issues.

 

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):

 

 

FIRST NAME | LAST NAME | EMAIL | TITLE
Andi | Ogier | | Member
Andrea | Perego | | Member
Claire | Austin | | Member
Dimitrios | Koureas | d.koureas@nhm.ac.uk | Co-chair
David | Schade | david.schade@nrc-cnrc.gc.ca | Co-chair
Francoise | Genova | francoise.genova@astro.unistra.fr | Member
Gail | Clement | | Member
Helen | Glaves | hmg@bgs.ac.uk | Member
Ilya | Zaslavsky | | Member
Rainer | Stotzka | | Member
Rebecca | Koskela | | Member
Rob | Hooft | | Member
Sarah | Ramdeen | | Member
Simon | Hodson | | Member
Tobias | Weigel | | Member
Ian | Bruno | bruno@ccdc.cam.ac.uk | Member
Bridget | Almas | bridget.almas@tufts.edu | Member
Richard | Kidd | kiddr@rsc.org | Member
Martin | Hicks | bridget.almas@tufts.edu | Member
Wenbo | Chu | wchu@geosec.org | Member

 

 

 

 


[1] Disciplinary groups are herein defined as groups that approach research data challenges from the perspective of specific scientific disciplines. Examples of such groups in RDA include:  Agricultural Data Interest Group (IGAD); Biodiversity Data Integration IG; Chemistry Research Data IG; Data for Development IG; Digital Practices in History and Ethnography IG; Education and Training on handling of research data IG; Geospatial IG; Global Water Information IG; Health Data; Linguistics Data IG; Marine Data Harmonization IG; Quality of Urban Life IG; RDA/CODATA Materials Data, Infrastructure & Interoperability IG; Structural Biology IG; Weather, climate and air quality.

Review period start:
Monday, 17 July, 2017
Custom text:
Body:

WG Charter: A concise articulation of what issues the WG will address within an 18 month time frame and what its “deliverables” or outcomes will be.

 

The need for establishing this working group was articulated during the 9th plenary meeting in Barcelona during the Active DMPs IG session. The discussion was framed by a white paper by Simms et al. on machine-actionable data management plans (DMPs). The white paper is based on outputs from the IDCC workshop held in Edinburgh in 2017 that gathered almost 50 participants from Africa, America, Australia, and Europe. It describes eight community use cases which articulate consensus about the need for a common standard for machine-actionable DMPs (where machine-actionable is defined as “information that is structured in a consistent way so that machines, or computers, can be programmed against the structure”).

 

The specific focus of this working group is on developing a common information model and specifying access mechanisms that make DMPs machine-actionable. The outputs of this working group will help make systems interoperable and will allow for automatic exchange, integration, and validation of information provided in DMPs, for example, by checking whether a provided PID links to an existing dataset, whether hashes of files match their provenance traces, or whether a license was specified. The common information models are NOT intended to be prescriptive templates or questionnaires, but to provide re-usable ways of representing machine-actionable information on themes covered by DMPs.
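As a rough illustration of the kind of automatic validation described above, the following Python sketch checks two of the named conditions against a DMP held as a plain dictionary: that a license was specified and that a file's hash matches the recorded one. The key names (`datasets`, `title`, `license`, `files`, `path`, `sha256`) are hypothetical, since the common model does not exist yet, and checking that a PID resolves to a dataset would additionally need a network lookup, which is left out here.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_dmp(dmp, base_dir="."):
    """Return a list of human-readable problems found in a DMP dictionary.

    All keys used here are illustrative assumptions, not part of any standard.
    """
    problems = []
    for dataset in dmp.get("datasets", []):
        name = dataset.get("title", "<untitled>")
        # A missing or empty license is reported once per dataset.
        if not dataset.get("license"):
            problems.append(f"{name}: no license specified")
        # Each listed file must exist and match its recorded digest.
        for entry in dataset.get("files", []):
            target = Path(base_dir) / entry["path"]
            if not target.is_file():
                problems.append(f"{name}: missing file {entry['path']}")
            elif sha256_of(target) != entry["sha256"]:
                problems.append(f"{name}: checksum mismatch for {entry['path']}")
    return problems
```

An empty list from `validate_dmp` means none of the checked conditions failed; it says nothing about conditions the sketch does not cover.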

 

The vision that this working group will work to realise is one where DMPs are developed and maintained in such a way that they are fully integrated into the systems and workflows of the wider research data management environment. To achieve this vision we will develop a common data model with a core set of elements. Its modular design will allow customisations and extensions using existing standards and vocabularies to follow best practices developed in various research communities. We will provide reference implementations of the data model using popular formats, such as JSON, XML, and RDF. This will enable tools and systems involved in processing research data to read information from and write information to DMPs. For example, a workflow engine can add provenance information to the DMP, a file format characterization tool can supplement it with identified file formats, and a repository system can automatically pick suitable content types for submission and later automatically identify applicable preservation strategies.
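The read/write pattern described above might look roughly like the following Python sketch, assuming a DMP serialised as JSON. The `provenance` key and the fields of each record are hypothetical placeholders rather than elements of the eventual model.

```python
import json


def add_provenance(dmp_json, tool, action, timestamp):
    """Parse a JSON DMP, append one provenance record, and re-serialise it.

    The "provenance" list and its fields are illustrative assumptions.
    """
    dmp = json.loads(dmp_json)
    dmp.setdefault("provenance", []).append(
        {"tool": tool, "action": action, "timestamp": timestamp}
    )
    return json.dumps(dmp, indent=2, sort_keys=True)


# A hypothetical workflow engine records what it did in an existing DMP.
original = '{"title": "Example project DMP"}'
updated = add_provenance(
    original, "workflow-engine", "executed analysis step", "2017-07-04T12:00:00Z"
)
print(updated)
```

Because each tool only appends to the document and leaves other keys untouched, several tools can enrich the same DMP in sequence without knowing about each other.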

 

The deliverables will be publicly available under a CC0 license and will consist of models, software, and documentation. The documentation will describe the functionality and semantics of the terms used, the rationale, standards-compliant ways of customising the model, and the requirements that supporting systems must meet to fully utilise the capabilities of the developed model.

 

The working group will be open to everyone and will involve all stakeholders representing the whole spectrum of entities involved in research data management, such as: researchers, tool providers, infrastructure operators, repository staff and managers, software developers, funders, policy makers, and research facilitators. We will take into account the requirements of each group. This will likely speed up and increase adoption of the working group outcomes.

 

The group will predominantly collaborate online, but will use any opportunity to meet in person during RDA plenaries, conferences, workshops, hackathons or other events in which its members participate. All meetings in which decisions are made will be documented and their summaries will be circulated using the RDA website.

 

The work will be performed iteratively and incrementally following best practices from system and software engineering. We will evaluate preliminary drafts of the model with the community to receive early feedback and to ensure that the developed common model is interoperable and exchangeable across implementations. We will also express existing DMPs using the developed common model and will investigate how to support modification of machine-actionable DMPs by various tools involved in the data management process, while ensuring that proper provenance and versioning information is stored with them. Finally, we will build prototypes to investigate possible system integrations and to evaluate to what degree the information contained in the DMPs can be automatically validated and which actions or alerts can be triggered depending on a DMP's state, e.g. by sending notifications to repositories or funder systems.

 

During our work we will monitor parallel efforts and engage with various research communities to find candidates for pilot studies and to transfer the acquired know-how. Towards the end of the lifetime of the working group we will launch pilot projects in which the model will be customised to suit the needs of the identified interested communities. Pilot studies will use the models to integrate systems and demonstrate how machine-actionable DMPs can work.

 

We believe that the outcomes delivered by this group will contribute to improving the quality of research data and research reproducibility, while at the same time reducing the administrative burden for researchers and systems administrators.

 

Value Proposition: A specific description of who will benefit from the adoption or implementation of the WG outcomes and what tangible impacts should result.

 

A common data model for machine-actionable DMPs will enable interoperability of systems and will facilitate automation of data collection and validation processes. The common model and accompanying interfaces and libraries are an essential building block for the infrastructure. Although for some stakeholder groups the developments will be invisible (and should be), the unification and standardisation of a DMP model will bring benefits to all of them.

  • Researchers will benefit from having fewer administrative procedures to follow.  Machine-actionable DMPs can facilitate the automatic collection of  metadata about experiments. They will accompany experiments from the beginning and will be updated over the course of the project. Consecutive tools used during processing can read and write data from machine-actionable DMPs. As a result, parts of the DMPs can be automatically generated and shared with other collaborators or funders. Furthermore, researchers whose data is reused in other experiments will gain recognition and credit because their data can be located, reused, and cited more easily.

  • Reusing parties will gain trust and confidence that they can build on others’ previous work because of a higher granularity of available information.

  • Funders and repositories will be able to automatically validate DMPs. For example, they will be able to check whether the specified ORCID iD or e-mail is correct, whether the data is available at the specified repository, and whether the data checksums are correct – in other words, whether the information provided in a DMP reflects reality.

  • Infrastructure providers will get a universal format for exchange of (meta-)data between the systems involved in data processing and data storage. They could also automate processes associated with DMPs, such as backup, storage provisioning, and granting of access permissions.

  • Society will be better able to safeguard investment made in research and will gain assurance that scientific findings are trustworthy and reproducible, while the underlying data is available and properly preserved.  
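One of the checks mentioned in the list above, whether a specified ORCID iD is correct, can be partly done offline, because the last character of an ORCID iD is an ISO 7064 MOD 11-2 check character computed from the preceding 15 digits. The Python sketch below tests only this well-formedness; the function names are illustrative, and confirming that the iD actually belongs to the named researcher would require a lookup against the ORCID registry, which is not shown.

```python
import re

# Four hyphen-separated groups; the final character may be a digit or "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if the iD has the expected shape and check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]
```

A check like this catches most typing errors before any network request is made, which is why a validator would typically run it first.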


Download the full case statement 

Review period start:
Tuesday, 4 July, 2017 to Friday, 4 August, 2017
Custom text:
Body:

Case Statement for RDA WG DMP Common Standards

 It can also be found as PDF here: https://drive.google.com/open?id=0BwdfVsSKpOzveGJRSWxtaTBWbzA 

Contents

  • WG Charter

  • Value Proposition

  • Engagement with existing work in the area

  • Work Plan

  • Adoption Plan

  • Initial Membership

 

WG Charter: A concise articulation of what issues the WG will address within an 18 month time frame and what its “deliverables” or outcomes will be.

The need for establishing this working group was articulated during the 9th plenary meeting in Barcelona during the Active DMPs IG session. The discussion was framed by a white paper by Simms et al. on machine-actionable data management plans (DMPs). The white paper is based on outputs from the IDCC workshop held in Edinburgh in 2017 that gathered almost 50 participants from Africa, America, Australia, and Europe. It describes eight community use cases which articulate consensus about the need for a common standard for machine-actionable DMPs (where machine-actionable is defined as “information that is structured in a consistent way so that machines, or computers, can be programmed against the structure”).

The specific focus of this working group is on developing a common information model and specifying access mechanisms that make DMPs machine-actionable. The outputs of this working group will help make systems interoperable and will allow for automatic exchange, integration, and validation of information provided in DMPs, for example, by checking whether a provided PID links to an existing dataset, whether hashes of files match their provenance traces, or whether a license was specified. The common information models are NOT intended to be prescriptive templates or questionnaires, but to provide re-usable ways of representing machine-actionable information on themes covered by DMPs.

The vision that this working group will work to realise is one where DMPs are developed and maintained in such a way that they are fully integrated into the systems and workflows of the wider research data management environment. To achieve this vision we will develop a common data model with a core set of elements. Its modular design will allow customisations and extensions using existing standards and vocabularies to follow best practices developed in various research communities. We will provide reference implementations of the data model using popular formats, such as JSON, XML, and RDF. This will enable tools and systems involved in processing research data to read information from and write information to DMPs. For example, a workflow engine can add provenance information to the DMP, a file format characterization tool can supplement it with identified file formats, and a repository system can automatically pick suitable content types for submission and later automatically identify applicable preservation strategies.

The deliverables will be publicly available under a CC0 license and will consist of models, software, and documentation. The documentation will describe the functionality and semantics of the terms used, the rationale, standards-compliant ways of customising the model, and the requirements that supporting systems must meet to fully utilise the capabilities of the developed model.

The working group will be open to everyone and will involve all stakeholders representing the whole spectrum of entities involved in research data management, such as: researchers, tool providers, infrastructure operators, repository staff and managers, software developers, funders, policy makers, and research facilitators. We will take into account the requirements of each group. This will likely speed up and increase adoption of the working group outcomes.

The group will predominantly collaborate online, but will use any opportunity to meet in person during RDA plenaries, conferences, workshops, hackathons or other events in which its members participate. All meetings in which decisions are made will be documented and their summaries will be circulated using the RDA website.

The work will be performed iteratively and incrementally following best practices from system and software engineering. We will evaluate preliminary drafts of the model with the community to receive early feedback and to ensure that the developed common model is interoperable and exchangeable across implementations. We will also express existing DMPs using the developed common model and will investigate how to support modification of machine-actionable DMPs by various tools involved in the data management process, while ensuring that proper provenance and versioning information is stored with them. Finally, we will build prototypes to investigate possible system integrations and to evaluate to what degree the information contained in the DMPs can be automatically validated and which actions or alerts can be triggered depending on a DMP's state, e.g. by sending notifications to repositories or funder systems.

During our work we will monitor parallel efforts and engage with various research communities to find candidates for pilot studies and to transfer the acquired know-how. Towards the end of the lifetime of the working group we will launch pilot projects in which the model will be customised to suit the needs of the identified interested communities. Pilot studies will use the models to integrate systems and demonstrate how machine-actionable DMPs can work.

We believe that the outcomes delivered by this group will contribute to improving the quality of research data and research reproducibility, while at the same time reducing the administrative burden for researchers and systems administrators.

 

Value Proposition: A specific description of who will benefit from the adoption or implementation of the WG outcomes and what tangible impacts should result.

A common data model for machine-actionable DMPs will enable interoperability of systems and will facilitate automation of data collection and validation processes. The common model and accompanying interfaces and libraries are an essential building block for the infrastructure. Although for some stakeholder groups the developments will be invisible (and should be), the unification and standardisation of a DMP model will bring benefits to all of them.

  • Researchers will benefit from having fewer administrative procedures to follow.  Machine-actionable DMPs can facilitate the automatic collection of  metadata about experiments. They will accompany experiments from the beginning and will be updated over the course of the project. Consecutive tools used during processing can read and write data from machine-actionable DMPs. As a result, parts of the DMPs can be automatically generated and shared with other collaborators or funders. Furthermore, researchers whose data is reused in other experiments will gain recognition and credit because their data can be located, reused, and cited more easily.

  • Reusing parties will gain trust and confidence that they can build on others’ previous work because of a higher granularity of available information.

  • Funders and repositories will be able to automatically validate DMPs. For example, they will be able to check whether the specified ORCID iD or e-mail is correct, whether the data is available at the specified repository, and whether the data checksums are correct – in other words, whether the information provided in a DMP reflects reality.

  • Infrastructure providers will get a universal format for exchange of (meta-)data between the systems involved in data processing and data storage. They could also automate processes associated with DMPs, such as backup, storage provisioning, and granting of access permissions.

  • Society will be better able to safeguard investment made in research and will gain assurance that scientific findings are trustworthy and reproducible, while the underlying data is available and properly preserved.  

 

Engagement with existing work in the area: A brief review of related work and plan for engagement with any other activities in the area.

The need for machine-actionable DMPs is recognized by the community and is being discussed within the Research Data Alliance. Participants of the CERN workshop organized in 2016 identified “encodings for exporting DMPs” as one of the next developments needed[1].  Automation and machine actionability are meant to be key factors enabling deployment of the European Open Science Cloud. A workshop on machine-actionable DMPs organized by the Digital Curation Centre and University of California Curation Center at the California Digital Library at IDCC in Edinburgh in 2017 resulted in a white paper that describes the current state of the art and expresses a need for a common standard for machine-actionable DMPs.

As a result of these ongoing discussions the participants of the 9th plenary meeting in Barcelona during the Active DMPs IG session decided to establish specific working groups that address various identified challenges related to DMPs. The proposed group on DMP common standards will address a high-priority challenge based on the most recent assessments of community needs.

Members of the proposed group are well connected to various community-based initiatives and working groups that address similar topics. The group will monitor and align the efforts with others in this area. We will specifically monitor:

  • RDA groups related to DMPs, such as, but not limited to:

    • Active DMPs IG,

    • Research Data Repository Interoperability WG,

    • Reproducibility IG,

    • e-Infrastructure IG,

    • RDA/WDS Certification of Digital Repositories IG,

    • BioSharing Registry: connecting data policies, standards & databases in life sciences WG,

    • Exposing DMPs WG (under review),

  • tools, such as, but not limited to:

    • DMPTool,

    • DMPonline,

    • RDM Organiser,

  • the DMP fora e.g. Force 11 FAIR DMP or Belmont Forum e-Infrastructures and Data Management Collaborative Research Action

  • e-Infrastructure projects e.g. OpenAIRE, EUDAT, European Open Science Cloud (EOSC)

  • W3C,

  • and others.

 

Work Plan: A specific and detailed description of how the WG will operate including:

  • The form and description of final deliverables of the WG,

D1. Common data model for machine-actionable DMPs

This deliverable will contain the developed data model and documentation describing the semantics of the terms used, the rationale, and standards-compliant ways of customising the model.

D2. Reference implementations

Reference implementations of the common data model will provide ready-to-use models in popular formats such as JSON, XML, and RDF. They will also provide example DMPs in each format.

D3. Guidelines for adoption of the common data model

Guidelines will be based on lessons learned from the common model development and prototyping. They will describe requirements for supporting systems to fully utilise the capabilities of the common data model.

  • The form and description of milestones and intermediate documents, code or other deliverables that will be developed during the course of the WG’s work,

 

M1. Requirements and candidate solutions reviewed (M5)

We will analyse existing DMP tools, as well as tools from domains of digital preservation, reproducible research, open science, and data repositories that cover the full data lifecycle. We will look for mappings to popular DMP creation tools, such as checklists, discuss lessons learned, and identify limits of automation and machine actionability. We will also investigate modelling techniques used in model engineering and linked data domains to identify suitable notation and tools for the common model.  Furthermore, we will identify and analyse existing domain-specific standards and evaluate their applicability. Based on this research, we will define requirements for the common model and identify domain-specific models and controlled vocabularies that need to interoperate with the common data model.

 

M2. Common model specification drafted (M10)

We will design a common data model and example expressions in mainstream representation formats (e.g. JSON). The development will be iterative and based on both real and synthetic examples of DMPs. We will develop prototypes to demonstrate how the model works and what its capabilities are.

 

M3. Common model refined (M15)

We will develop further extensions to the core model (the model will likely be modular) to evaluate its scalability and customisability. Furthermore, we will test integrations with existing tools and continue evaluation using sample DMPs. Based on these activities we will introduce necessary refinements to the common data model.

 

M4. Dissemination and pilot studies (M18)

We will formulate guidelines for the adoption of the common model and release final documentation of the developed model and reference implementations. We will disseminate the results of our work through mailing lists, participation in conferences, as well as social media. We will launch pilot studies that implement the working group outcomes. We will facilitate and encourage crowd-sourced descriptions of implementations beyond the direct activities of the working group.

 

  • A description of the WG’s mode and frequency of operation (e.g. on-line and/or on-site, how frequently will the group meet, etc.),

The group will predominantly collaborate online, but will use any opportunity to meet in person during RDA plenaries, conferences, workshops, hackathons or other events in which its members participate. All meetings in which decisions are made will be documented and their summaries will be circulated using the RDA website.

The group will have regular monthly calls to report on progress and discuss open issues. We will also use GitHub to host developed models and source code. We will use issue tracking mechanisms to discuss enhancements, bugs, and other issues. Important updates, such as reaching a milestone, will be communicated through the RDA website.

  • A description of how the WG plans to develop consensus, address conflicts, stay on track and within scope, and move forward during operation, and

Group consensus will be achieved primarily through mailing list discussions, where opposing views will be openly discussed and debated amongst members of the group. If consensus cannot be achieved in this manner, the group co-chairs will make the final decision on how to proceed.

The co-chairs will keep the working group on track by setting milestones and reviewing progress relative to these targets. Similarly, scope will be maintained by tying milestones to specific dates, and ensuring that group work does not fall outside the bounds of the milestones or the scope of the working group.

 

  • A description of the WG’s planned approach to broader community engagement and participation.

The working group case statement will be disseminated to mailing lists in communities of practice related to research data and repositories (e.g. ICSU World Data System) in an effort to cast a wide net and attract a diverse, multi-disciplinary membership. Group activities, where appropriate, will also be published to related mailing lists and online forums to encourage broad community participation.

 

Adoption Plan: A specific plan for adoption or implementation of the WG outcomes within the organizations and institutions represented by WG members, as well as plans for adoption more broadly within the community. Such adoption or implementation should start within the 18 month timeframe before the WG is complete.

 

Representatives of various stakeholder groups who are prominent in the area of DMPs have already joined this working group, including:

  • DMPRoadmap

  • DMPonline / Digital Curation Centre

  • DMPTool / California Digital Library

  • ELIXIR data stewardship wizard

  • RDM Organiser

  • Islandora

  • Phaidra

  • Open Science Framework

  • Data Intensive Research Initiative of South Africa (DIRISA)

  • Belmont Forum e-Infrastructure and Data Management

  • DSA-WDS Core Trustworthy Data Repositories  

  • DMP OPIDoR

  • INESC-ID

  • INIA - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria

  • EDINA

  • Data Archiving and Networked Services (DANS)

These representatives have agreed to consider implementing the standards recommended by the working group in their respective tools. Some of them have already committed to active participation in the group and plan to adopt the outputs. We will continue to seek representatives from a variety of research communities to ensure that this working group’s deliverables are widely adopted.

 

Initial Membership: A specific list of initial members of the WG and a description of initial leadership of the WG.

Leadership:

  • Chair: Tomasz Miksa (SBA Research, Austria)

  • Co-chair: Paul Walk (University of Edinburgh, Great Britain)

  • Co-chair: Peter Neish (University of Melbourne, Australia)

Members/Interested (based on 9th Plenary volunteer list and subsequent calls):

  • Adil Hasan

  • Amir Aryani

  • Andreas Rauber

  • Andrew Janke

  • Anna Dabrowski

  • Antonio Sánchez-Padial

  • Christoph Becker

  • Cristina Ribeiro

  • Daniel Mietchen

  • Dessi Kirilova

  • Fernando Aguilar

  • Heike Görzig

  • Janez Štebe

  • Jens Ludwig

  • Jérôme Perez

  • Joao Aguiar Castro

  • João Cardoso

  • Jonathan Petters

  • Karsten Kryger Hansen

  • Lesley Wyborn

  • Madison Langseth

  • Marie-Christine Jacquemot-Perbal

  • Mark Leggott

  • Mustapha Mokrane

  • Myriam Mertens

  • Natalie Meyers

  • Nobubele Shozi

  • Paolo Budroni

  • Peter Doorn

  • Peter McQuilton

  • Peter Neish

  • Raman Ganguly

  • Rob Hooft

  • Sarah Jones

  • Stephanie Simms

  • Terry Longstreth

  • Timea Biro

  • Wim Hugo

     


[1] CERN workshop on Active DMPs: indico.cern.ch/event/520120/attachments/1302179/2036378/CERN-ADMP-iPRES206.pdf

 

Review period start:
Tuesday, 4 July, 2017
Custom text:
Body:

Interoperability is a wide concept that encompasses the ability of organisations to work together towards mutually beneficial and commonly agreed goals. The working group is using the following definition from the EIF: ‘An interoperability framework is an agreed approach to interoperability for organisations that wish to work together towards the joint delivery of public services. Within its scope of applicability, it specifies a set of common elements such as vocabulary, concepts, principles, policies, guidelines, recommendations, standards, specifications and practices.’

The working group aims to provide a common framework for describing, representing, linking and publishing wheat data with respect to open standards. Such a framework will promote and sustain wheat data sharing, reusability and interoperability. Specifying the wheat linked data framework raises many questions: which (minimal) metadata should describe which type of data? Which vocabularies, ontologies and formats? Which good practices?

Based mainly on the needs of the Wheat Initiative Information System (WheatIS) in terms of functionalities and data types, the working group will identify relevant use cases in order to produce a “cookbook” on how to produce “wheat data” that are easily shareable, reusable and interoperable.

For more details download the Case statement

Review period start:
Tuesday, 25 June, 2013
Custom text:
Body:

Draft Case Statement to Create a Working Group Entitled “On-Farm Data Sharing”

 

 

Submitted to

Research Data Alliance

 

Submitted by

Tom Morris, PhD

Professor

Department of Plant Science and Landscape Architecture

University of Connecticut, Storrs, Connecticut, USA

thomas.morris@uconn.edu

 

Nicolas Tremblay, PhD

Research Scientist

Saint-Jean-sur-Richelieu Research and Development Centre

Agriculture and Agri-Food Canada

Saint-Jean-sur-Richelieu, Quebec, Canada J3B 3E6

nicolas.tremblay@agr.gc.ca

 

 

 

25 May 2017

 

 

Case Statement for On-Farm Data Sharing WG

 

Charter

 

Introduction and Rationale

 

Farmers have capabilities that they have never had before to critically evaluate management practices using field-scale replicated strip trials. Farmers have gained this powerful capability because yield monitors on combines enable accurate measurement of yields. Networks of farmers have been established around the world to exploit the potential of yield monitors to evaluate management practices at the field level. Such networks have become increasingly common because farmers understand the power of evaluating management practices on their fields and across many fields in a similar agroecosystem. Scientists, and then policy makers, can also find value in data coming from a diversity of agroecosystems, as previously unknown G × E × M (genetics × environment × management) interactions (Hatfield and Walthall, 2015) could be derived from contrasting soils, climatic conditions, genotype evaluations, and farming practices.

 

Collection of results from strip trials across many farmers’ fields requires protocols for data stewardship, that is, for data reporting, sharing and archiving. Most farmer networks have developed data stewardship protocols. The protocols, however, vary from network to network, and the protocols are not easily accessible to people outside the networks. Creation of a standardized set of protocols for data stewardship that are publicly available, especially for confidentiality of the data and for sharing of data, would enable the pooling of results from many networks into one secure database. The protocols would be specific to on-farm research performed at a field scale with yields measured by yield monitors. Protocols developed for more general data collection by farmers such as the Thirteen Principles on Data Privacy and Security from the American Farm Bureau Federation, and those developed by the Agricultural Data Coalition will underpin these specific protocols. One big difference in the specific protocols we will create is that our protocols for on-farm research will include minimum data requirements, which other protocols for data stewardship do not include.

 

Topics to address in the protocols include life cycle, data quality, data infrastructure, formats, standards, archives, FAIR principles (Wilkinson et al., 2016), availability, provenance, stewardship, privacy, property rights, laws, confidentiality and governance. Creation of a standardized set of protocols also would promote the formation of new farmer networks and the collection of many more results from on-farm trials, which would greatly increase the value of a secure database. A secure database open to researchers from around the world, based on the guidelines to be created by this WG, would be an enormously valuable resource for farmers, farm advisors and policy makers.

 

As a first step, we aim at combining the results of thousands of field-scale replicated trials completed across a diversity of agroecosystems in the US Corn Belt. The Corn Belt covers much of the 65 million hectares of maize and soybeans planted in the US. This vast dataset would make possible new and previously unavailable analyses to improve productivity, profitability and environmental stewardship. One example of the type of research that could be completed with such a database is from a proposal submitted to the United States Department of Agriculture (USDA) by three farmer networks in the US. The three networks are seeking to develop an interactive, online tool for improved management of nitrogen (N) across the numerous agroecosystems in the Corn Belt of the US. The tool will provide information that farmers need to create locally adapted N recommendations. The tool will have four main components: 1) risk assessment of late-season deficient and excessive maize N status; 2) uncertainty of yield response in individual trials; 3) probability of an economic yield response for different levels of N fertilization, different N timings and fertilizer sources, different cropping systems, and observed rainfall and soil characteristics within fields based on aggregate data; and 4) statistical power analysis to estimate the number of locations and treatment replications needed to detect a specific yield response of interest to a farmer or agronomist.

 

The online tool will be based on the analysis of archived information from two types of data collected by three farmer networks: 1) 5,420 systematic surveys of the N status of maize fields from Ohio, Indiana, and Iowa across 13 years, and 2) 812 field-scale, replicated N rate on-farm trials in maize fields from Iowa, Ohio, Indiana, Illinois, Michigan, and South Dakota across 12 years.
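The fourth component of the tool described above, statistical power analysis, can be illustrated with a minimal sketch. The function name, the normal approximation, and the input values below are assumptions chosen for illustration; they are not taken from the proposal, which does not specify its method. The sketch estimates how many paired strip-trial comparisons are needed to detect a given mean yield response when the standard deviation of the per-trial treatment differences is known.

```python
from math import ceil
from statistics import NormalDist


def trials_needed(min_response, sd_of_differences, alpha=0.05, power=0.80):
    """Approximate number of paired comparisons needed to detect a mean response.

    Normal approximation for a two-sided test on the mean of per-trial
    treatment differences:
        n = ((z_{1 - alpha/2} + z_{power}) * sd / delta) ** 2
    All arguments are hypothetical inputs, in the same yield units.
    """
    if min_response <= 0 or sd_of_differences <= 0:
        raise ValueError("min_response and sd_of_differences must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = ((z_alpha + z_power) * sd_of_differences / min_response) ** 2
    return ceil(n)


# Hypothetical example: detect a 0.3 Mg/ha response when the differences
# between treatments vary across trials with a standard deviation of 0.9 Mg/ha.
print(trials_needed(min_response=0.3, sd_of_differences=0.9))
```

Because the required number grows with the square of the ratio of variability to response size, halving the response of interest roughly quadruples the number of trials, which is one reason pooling trials across networks is attractive.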

 

If the proposal to USDA is funded, this will be the first time that data from different farmer networks are integrated. These data are only a small part of the data that reside in the individual databases of farmer networks in the US. Only three of the six networks in the US were cooperators on this USDA proposal. The other networks were hesitant to contribute their data for several reasons, but the main reason was the lack of guidelines about who would have access to the data, for how long, and for what purposes. The data available in these farmer networks include not only the results of N rate trials but also results from trials on fungicide effectiveness, plant population, the effectiveness of N stabilizers, effects of tillage on yield, and many other topics.

 

One huge advantage of analyzing results from such a large database of fields over many years is that results can be displayed as probabilities. Typical N recommendations for grain crops are made with little to no estimate of the variability in N needs across fields. Because the variability in N needs across fields and years has been shown to be large (Dhital and Raun, 2016), current N recommendations are much less reliable than needed for widespread adoption by farmers.

 

Kyveryga et al. (2013) give an example of how results from large numbers of trials can be used to estimate the probability that a maize field will have deficient or excess N, and to identify some of the factors affecting N status at the field scale. Their data set contained 56 field-scale, replicated, two-treatment studies over 2 years in the state of Iowa, in which the N fertilizer rate was decreased by 56 kg ha⁻¹ compared with the rate normally applied by the farmers who participated in the trials. The intent of the study was to help farmers decide whether they could profitably reduce their N rates by 56 kg ha⁻¹. The results showed that the probability of increased profit with reduced N fertilizer was reduced by 35% when high amounts of rainfall occurred in June, but was increased by 20% when soil organic matter was high.
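The idea of expressing trial results as probabilities can be sketched in a few lines. This is a deliberately simple empirical frequency, not the hierarchical model used by Kyveryga et al. (2013); the record layout, field names, and all values below are hypothetical and exist only to show the calculation.

```python
from collections import defaultdict


def probability_of_gain(profit_changes):
    """Fraction of trials in which the profit change was positive."""
    if not profit_changes:
        raise ValueError("at least one trial is required")
    wins = sum(1 for change in profit_changes if change > 0)
    return wins / len(profit_changes)


def probability_by_group(trials, key):
    """Probability of a profit gain within each level of a field factor."""
    groups = defaultdict(list)
    for trial in trials:
        groups[trial[key]].append(trial["profit_change"])
    return {level: probability_of_gain(values) for level, values in groups.items()}


# Hypothetical trials: profit change per hectare from a reduced N rate,
# labelled with a hypothetical June rainfall class for each field.
trials = [
    {"june_rain": "high", "profit_change": -42.0},
    {"june_rain": "high", "profit_change": -15.0},
    {"june_rain": "high", "profit_change": 8.0},
    {"june_rain": "high", "profit_change": -30.0},
    {"june_rain": "low", "profit_change": 12.0},
    {"june_rain": "low", "profit_change": 25.0},
    {"june_rain": "low", "profit_change": -5.0},
    {"june_rain": "low", "profit_change": 18.0},
]

print(probability_by_group(trials, "june_rain"))
```

With only a handful of trials per group such frequencies are very noisy, which is why a pooled database of many trials matters for this kind of summary.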

 

Another important advantage of combining data from farmer networks is that meta-analysis techniques are not needed, because the data are the raw results from individual strips in the trials. Results from individual strips are preferable to aggregate data because more comprehensive data analyses can be performed to fully understand the treatment effect (Jones et al., 2009). Combined data also are of much greater value to other scientists, such as economists, who analyze data using different techniques and hypotheses than agronomists. In addition, field-scale trials allow measurement of the effect of spatial variability within fields on yield and profit, which small research plot studies are not capable of measuring. Because management practices by farmers are greatly affected by spatial variation of soil properties (including topography) within fields, field-scale trials are needed to measure these effects.

 

Deliverables and Outcomes

 

The deliverables for the On-Farm Data Sharing WG will be:

  1. Minimum data requirements for field-scale, replicated strip trials completed by farmers using GPS-guided equipment including combines with calibrated yield monitors.
  2. Guidelines for collecting, handling, storage and formatting results and metadata from field-scale, replicated strip trials
  3. Guidelines for stewardship of data collected from field-scale, replicated trials completed on production grain fields, which will include guidelines for:
    1. Data accessibility
    2. Licensing options for allowable uses of the data (data sharing)
    3. Curation of the data
    4. Maintaining confidentiality of the data

The outcomes for the On-Farm Data Sharing WG will be:

  1. Agreement among interested farmer networks in the world to place their data in one common database using the guidelines developed as part of the deliverables. The data and metadata managed by the 6 major farmer networks in the US (listed in footnote 1) will be the first to populate the database. Data from other interested networks in the US and in other parts of the world will be added later, especially from countries with many combines equipped with yield monitors, such as Canada, countries in western Europe, Australia, and Argentina.
  2. Submission of a proposal to the U.S. National Science Foundation or other funding organization for funding to clean and collate the data that will be entered into the database, and to create a common, secure database for the results of trials and for the field metadata.

When these outcomes are achieved, the guidelines established will serve as a baseline for other networks representing other agroecosystems to follow suit with their own adapted sets of requirements.

 

1 The 6 major farmer networks in the US are:

 

  1. On-Farm Network managed by the Iowa Soybean Association
  2. Adapt Network managed by Environmental Defense Fund
  3. Infield Advantage managed by the Indiana State Department of Agriculture
  4. On-Farm Research Network managed by the University of Nebraska Extension
  5. New York On-Farm Research Partnership managed by Cornell University
  6. K-State On-Farm Network managed by Kansas State University, Kansas State Research and Extension.

 

Value Proposition

 

Society will be the largest beneficiary of the implementation of the On-Farm Data Sharing WG outcomes. Grain crops will be grown with lower costs and less pollution. Specific beneficiaries will be farmers and farm advisors, scientists and policy makers.

The tangible benefits for each group are:

 

  1. Farmers and farm advisors will obtain more reliable and accurate recommendations for many management practices that are difficult or impossible to evaluate in small-plot research. Examples of management practices that are best evaluated on a field scale include: fertility management, especially N management; pest management; plant population management and interactions with fertility; soil and fertilizer enhancement products such as N stabilizers and products derived from humic acids, etc. Economic analysis of changes in management practices will also be more accurate and realistic with results from field-scale trials.
  2. Scientists will have access to reliable, replicated research results about the effects of changes in management practices on profit and the environment at an unprecedented scale, both geographically and numerically. Given the complexity and diversity of biological and physical conditions in agricultural fields, and the interactions of these conditions with the enormous number of management practices (types and degrees) used by farmers, large data sets of replicated, field-scale trials are needed to categorize practices into probabilities of success. Current methods of research are inadequate to create probability distributions of management practices by environment. The guidelines developed by this WG will enable scientists to publish much more reliable and accurate estimates of which management practices are best used in any environment. Also, a dataset of this quality and magnitude could be analyzed with data mining and machine learning algorithms, leading to further discoveries of potentially applicable decision rules.
  3. Policy makers will benefit by having access to more reliable conclusions about the effect of management practices on profit and the environment. This will enable policy makers to create better informed and effective programs for food production. 

 

Engagement

 

There are state and regional efforts to create databases of results of replicated field-scale trials to improve recommendations for management practices. The Iowa Soybean Association is the leader in this type of effort in the state of Iowa. The Indiana State Department of Agriculture’s INField Advantage program is modeled after the Iowa Soybean Association’s program, and the Environmental Defense Fund’s On-Farm Network is a similar program except their program crosses state borders to include trials from Ohio, Indiana, Michigan and Illinois. Members of these organizations are part of the International Society of Precision Agriculture’s Community entitled “On-Farm Data Sharing”, and representatives of these organizations will be part of the IGAD On-Farm Data Sharing WG.

 

The 4R Research Fund works to create databases of existing research on nutrient management, and to create new research to increase the size of the databases they are creating. This program is organized and run by the International Plant Nutrition Institute (IPNI). A scientist from IPNI is a member of the On-Farm Data Sharing Community and will be a member of the On-Farm Data Sharing WG.

 

One goal of the On-Farm Data Sharing WG will be to seek scientists from around the world who are working with farmers, either informally or formally in organizations, to implement replicated field-scale trials harvested by combines for the purpose of improving management practices.

 

Work Plan

 

A specific and detailed description of how the WG will operate including:

  1. The final deliverables of the On-Farm Data Sharing WG will consist of:
    1. Guidelines for minimum data requirements for field-scale, replicated strip trials completed by farmers using GPS-guided equipment including combines with calibrated yield monitors.
    2. Guidelines for collecting, handling, storage and formatting results and metadata from field-scale, replicated strip trials
    3. Guidelines for stewardship of data collected from field-scale, replicated trials completed on production grain fields, which will include guidelines for:
      1. Who has access to the data
      2. Allowable uses of the data
      3. Curation of the data
      4. Maintaining confidentiality of the data
  2. Milestones for the WG include:
    1. September 2017. Acceptance of the WG Case Statement by the Research Data Alliance
    2. December 2017. Guidelines for minimum data requirements completed.
    3. February 2018. Guidelines for collecting, handling, storage and formatting results and metadata completed.
    4. April 2018. Guidelines for stewardship of data collected from field-scale, replicated trials completed.
    5. June 2018. Proposal submitted to a funder such as the U.S. National Science Foundation for funding to clean existing data, format data, and create a secure database for placement of data from 6 existing farmer networks.
    6. June 2018. Presentation of guidelines at the International Conference on Precision Agriculture.
    7. July 2019. Presentation of guidelines at the European Conference on Precision Agriculture.

As the bulk of the contributors are coming from the crop science sector, it is unlikely that many of them will be attending the RDA plenaries. Our intention is to seize opportunities stemming from agronomic scientific meetings and to work in a collaborative environment such as Dropbox, which allows for fluid comments and changes to documents. For instance, before the launch of the OFDS-WG activities, a poster will be presented at the 11th European Conference on Precision Agriculture (ECPA 2017, July 16 – 20, 2017, Edinburgh, UK). It is expected that contributors will meet in person at the many conferences on crop science, agronomy or precision agriculture that are held several times per year. Examples are:

 

  • 7th Asian-Australasian Conference on Precision Agriculture (October 15 – 20, 2017, Hamilton, New Zealand)
  • American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America 2017 International Annual Meeting (October 22 – 25, 2017, Tampa, Florida)
  • 14th International Conference on Precision Agriculture (June 24 – 27, 2018, Montreal, Canada)

• A description of how the WG plans to develop consensus, address conflicts, stay on track and within scope, and move forward during operation.

 

Consensus will be built by submitting all drafts of the guidelines and proposals to all members of the WG, and by providing sufficient time, usually 3 weeks, for review of the documents. Conflicts will be addressed through discussion and email exchanges. Because members of the WG are located far from each other, discussions will be held using Skype. Face-to-face meetings will be held at RDA plenaries and at meetings such as the American Society of Agronomy’s annual meeting to build consensus. To stay on track and within the scope of the work plan, monthly email exchanges will occur to check on progress of writing the guidelines and proposals.

 

• A description of the WG’s planned approach to broader community engagement and participation.

 

We will attend many conferences on crop science, agronomy and precision agriculture, and we will inform the agronomy community about the importance and status of ongoing and completed work within the WG.

 

Adoption Plan

 

An objective of the WG is agreement among the 6 major farmer networks in the US to place their data in one common database by December 2018, using the guidelines developed as part of the deliverables. The adoption plan also includes the submission of a proposal to the U.S. National Science Foundation or other funding organization for funding to clean and collate the data in each of the 6 major farmer networks in the US, and to create a common, secure database for the results of trials and for the field metadata. Together, these steps form the plan for adoption of the WG outcomes within the organizations and institutions represented by WG members, and for adoption more broadly within the community.

 

Initial Membership

 

Initial leadership:

 

Tom Morris                   U Connecticut                             Thomas.Morris@uconn.edu

Nicolas Tremblay          Agriculture Agri-Food Canada    Nicolas.Tremblay@agr.gc.ca

 

 

Initial members (TBC)

 

Bertin, Patricia              Embrapa, Brazil                          patricia.bertin@embrapa.br

Bonnet, Pascal              CIRAD, France                           pascalbonnet@cirad.fr

Ciampitti, Ignacio         K-State U                                     ciampitti@ksu.edu

Clay, David                    South Dakota State U                 david.clay@sdstate.edu

Craker, Ben                   AGCO                                          ben.craker@AGCOcorp.com

Ekpe, Sonigitu A.          Nigeria                                         sonigitu.ekpe@graduateinstitute.ch

Ferreyra, R. Andres      Ag Connections LLC                   andres.ferreyra@agconnections.com

Gullotta, Gaia               Bioversity International, Italy    

Hatfield, Gary               South Dakota State U                 gary.hatfield@sdstate.edu

Kyveryga, Peter            Iowa Soybean Association         pkyveryga@iasoybeans.com

Murrell, Scott               IPNI                                             smurrell@ipni.net

Neveu, Pascal               INRA                                          Pascal.Neveu@inra.fr

Rabe, Nicole                 Ontario Ministry of Ag                Nicole.rabe@ontario.ca

Reverte, Carmen          IRTA, Spain                                 carme.reverte@irta.cat

Soonho, Kim                 International Food Policy RI       soonho.kim@cgiar.org

Stavrataki, Maritina     Agroknow, Greece                       maritinastavrataki@agroknow.com

Stelford, Mark              Premier Crop                              mstelford@premiercrop.com

Thompson, Laura         U Nebraska-Lincoln                    Laura.thompson@unl.edu

Yost, Matt                     ARS – U Missouri                        Matt.Yost@ARS.USDA.GOV

 

References

 

Dhital, S., and W. R. Raun. 2016. Variability in optimum nitrogen rates for maize. Agronomy J. 108: 2165-2173.

 

Hatfield, J. L., and C. L. Walthall. 2015. Meeting global food needs: Realizing the potential via genetics × environment × management interactions. Agronomy J. 107: 1215-1226.

 

Jones, A. P., R. D. Riley, P. R. Williamson, and A. Whitehead. 2009. Meta-analysis of individual patient data versus aggregate data from longitudinal clinical trials. Clinical Trials 6: 16-27.

 

Kyveryga, P. M., P. C. Caragea, M. S. Kaiser and T. M. Blackmer. 2013. Predicting risk from reducing nitrogen fertilization using hierarchical models and on-farm data. Agronomy J. 105: 85-94.

 

Wilkinson, M. D., M. Dumontier, I. J. Aalbersberg, G. Appleton, B. Mons et al. 2016. The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3: 160018. DOI: 10.1038/sdata.2016.18.

 

Review period start:
Tuesday, 6 June, 2017 to Friday, 30 June, 2017
Custom text:
Body:

This is not a real case statement. 

Review period start:
Tuesday, 30 May, 2017
Custom text:
Body:
Review period start:
Friday, 26 May, 2017
Custom text:
