Body:

Please note - this text has been revised as part of the group review process. See the attached document for the latest case statement.  4 April 2018

 

Data Usage Metrics WG Case Statement

Research data is increasingly recognized as an important output of scholarly research; however, there are not yet standardized or comprehensive metrics for research data as there are for articles. Data citations are necessary for the advancement of research data recognition, and initiatives in the RDA and scholarly communications space (e.g., the RDA/WDS Scholarly Link Exchange WG) are working to gain community adoption. Complementary to citations are data usage metrics (views, downloads). Neither usage metrics nor data citations are yet counted and aggregated into clear metrics, as is done for articles.

 

The “Make Data Count” project hosted a Birds of a Feather session at RDA Plenary 10 (attended by 60 people) and recognized that many other groups are affected by this initiative and that there is widespread interest in implementing a standardized set of data usage metrics. Drawing on expertise from various projects and research stakeholders, this WG, a part of the Publishing Data IG, aims to harness community input and buy-in for data usage metrics and to drive widespread adoption.

 

The WG intends to be international by design. The WG recognizes that for data usage metrics to be adopted at a global scale, representatives from all global regions need to help shape the WG outcomes. Therefore, the WG will actively recruit participants from all regions, including international initiatives (e.g., ANDS), and will direct outreach globally through webinars, related conferences, and consistent communication.

 

Value Proposition

Adoption of data usage metrics is necessary for the recognition of research data as a first-class research output. Researchers will benefit from data usage metrics by being able to see the added value of opening up their work, funders will be able to track the impact of their funding, publishers will see the relation between articles and the underlying data and how both are used, and repositories will be able to better serve their research communities.

 

Engagement with existing work in the area

This WG has members from the Make Data Count project, a Sloan-funded collaboration between the California Digital Library, DataCite, and DataONE to build the technical infrastructure for data-level metrics. For data usage metrics to work, the WG will need to leverage existing initiatives (Crossref Event Data, DataCite) and work closely with the RDA/WDS Scholarly Link Exchange WG. As part of the Publishing Data IG, the WG will work closely with the repository, publisher, and library communities to ensure data usage metrics have widespread community input and buy-in.

 

Work Plan

 

Goals and Deliverables (18 months):

  1. Community consensus on use cases and priorities for data usage metrics

  2. Development of a standardized recommendation for counting research data usage metrics and citations

  3. Community feedback and testing of data usage metrics in repositories

  4. Global repository adoption of standardized data usage metrics

  5. Researcher education around utilizing data usage metrics

  6. Collection of needs from research stakeholders for future iterations of data metrics

 

Milestones & Timeline:

  • RDA Plenary 11, Spring 2018

    • Share goals and deliverables and receive community input

    • Conversation on Goal 1: community consensus on use cases and priorities for data usage metrics

    • Share out of Goal 2: standardized recommendation for counting research data usage metrics and citations

  • Summer, 2018

    • Goal 3: Community feedback and testing of data usage metrics in repositories

    • Milestone: Adoption in a couple of repositories

  • RDA Plenary 12,  Fall 2018

    • Goal 4: Global repository adoption

    • Goal 5: Development of materials and outreach for awareness of data usage metrics

    • Intermediate deliverable: Development of use cases and progress report on adoption in repositories

  • Winter, 2019

    • Goal 4: Global repository adoption

    • Intermediate deliverable: Share out to community of progress in adoption of data usage metrics

    • Intermediate deliverable: Iteration of recommendation around data usage metrics

  • RDA Plenary 13, Spring 2019

    • Goal 6: Collection of needs from research stakeholders for future of data usage metrics

    • Report out to community on next steps and needs for data usage metrics as well as progress in 18 months of developing standards and driving adoption of data usage metrics

 

The WG plans to meet at each RDA Plenary for the next 18 months and to hold monthly web meetings (in months without an RDA Plenary). The WG would maintain consistent communication (via email listserv) throughout the 18 months, with updates on deliverables as well as ongoing work in the data metrics space.

 

While there may be disagreements or difficulty building consensus in an area that does not yet have standards, the WG would be keen to involve as many members as possible from the various communities involved in research data (to illustrate the full landscape) and would work methodically to ensure the community is in agreement about each piece of the deliverables. Standards would be developed with community input. WG Plenary sessions would be working meetings rather than presentations, using the time for consensus building and feedback to ensure deliverables are in line with community needs.

 

The RDA community is large and diverse, but this WG would look beyond its own membership to ensure researchers (end users) are involved and supportive, and to drive adoption in as many repositories and institutions as possible. Utilizing connections from WG members and displaying successful implementation in WG member repositories, the WG would be focused on mass adoption and community buy-in.

 

Adoption Plan

Data usage metrics would be implemented within repository organizations that are involved in the WG and further pushed out to the larger repository community. Initially, data usage metrics will be implemented on the Dash platform (California Digital Library), DataONE repositories, and Mendeley Data (Elsevier) for early adoption. Implementation in these three spaces will show the broad range of ways to implement the metrics, and documentation about each will aid mass adoption. Institutions in the WG will be able to implement data usage metrics for institutional repositories, and publishers will be able to display data usage metrics in relation to the corresponding article-level metrics. Funders involved in the WG will be able to utilize these data usage metrics and push for grantees to deposit in repositories with these metrics (and to report back on them).

 

Review period start:
Wednesday, 27 December, 2017
Body:

At the GEO-XIV Plenary, GEO Secretariat Director Barbara Ryan stated that the "link from data provider research infrastructures to users will become more important" [1]. This statement opens several interesting questions. Who are the users? What data are provided by research infrastructures (RIs)? What kind of links are there between data provider RIs and users? How do RIs provide the data needed by users? Do users produce data? Are such derived data acquired by RIs, further curated, published, processed and used?

 

Users are of different kinds, ranging from individual researchers and research communities to industry, decision makers and the general public. Data are of different kinds as well, ranging from primary observational, experimental, or computational data to data derived in numerous activities performed by users along contextualized value chains. Data provided by RIs can be primary or derived. While users consume data provided by RIs, many groups of users surely also produce data. Such derived data should be acquired by infrastructures.

 

In this complex landscape, there is an important constant across infrastructures and domains that lies at the core of the OD2I IG. Along value chains, primary data are interpreted for their meaning in determinate contexts of scientific, industrial, or broadly societal relevance. Within the context of a particular value chain, primary data are uninterpreted. In contrast, meaningful data resulting from data interpretation are information, interpreted data. Primary data thus evolve to become contextually meaningful information further used for both scientific and nonscientific purposes.

 

With a primary (though not exclusive) focus on observational data and environmental research infrastructures, the OD2I IG studies this constant. Building on collected use cases and existing conceptual frameworks, the OD2I IG advances understanding of how observational data evolve into information, ultimately integrated into bodies of knowledge about natural and human worlds.

 

In other words, the OD2I IG studies and advances understanding of the relatively unexplored interface between users and infrastructures in the data use phase of the research data lifecycle. It studies and models the roles of, and interactions among, human and computer agents; the data and information consumed and produced by agents in this phase; the performed activities and the systems supporting their execution.

 

The notion that primary data evolve to information (and knowledge) is increasingly common. Research infrastructures emphasize there is knowledge to gain through observation [2]. Earth observation satellites "provide critical information for global food security" [3]. The European Open Science Cloud (EOSC) is envisioned as an environment that enables turning ever increasing amounts of data "into knowledge as renewable, sustainable fuel for innovation in turn to meet global challenges" [4]. At the 2016 Fall Meeting of the American Geophysical Union (AGU), Rebecca Moore (Google) envisioned the possibility of monitoring a changing planet and "generating precise, actionable information and knowledge".

 

Following a proposal by members of the OD2I IG that suggests adopting the Floridi framework [5, 6] with the notion of data interpretation borrowed from Aamodt and Nygård [7], the OD2I IG considers technical aspects of (semantic) information representation in systems and the management of explicit and formal semantics. Connecting data to users relies on systems capable of acquiring, curating and processing the meaning of data generated by users along value chains. A critical aspect is the mechanism for representing information in ways suitable both for machine-to-machine interaction and for presentation to and use by users. Since the ability to exploit any given information demands specific knowledge on the part of the user, presentations need to consider both user type and intended purpose. Data provide a basis for building information that will lead to decision making. The transition from data to information involves processes of interpretation, in which meaning is attached to data. It is information, and its use against a background of prior knowledge, that provides sufficient understanding to allow the consequences of decisions to be foreseen.

 

The OD2I IG aligns with the mission and vision of RDA through its specific concern with socio-technical support for the extraction of information from primary observational data, activities that are primarily carried out by research communities as they make use of data in their everyday work. The OD2I IG will add value by working to realize information and knowledge-based systems layered above the current data systems, resulting in improved usability of data as information by both humans and machines. A specific emphasis is on enabling machines to process information automatically. The OD2I IG is committed to making a difference in this regard.

 

Please see the attached From Observational Data to Information Charter Statement for additional information on the Interest Group plans.

 

Review period start:
Wednesday, 27 December, 2017 to Saturday, 27 January, 2018
Body:

The Persistent Identification of Instruments RDA Working Group (PIDINST WG) seeks to propose a community-driven solution for globally unique and unambiguous identification of instrument instances that are operational in the sciences.

 

Please see https://www.rd-alliance.org/sites/default/files/case_statement/rda-wg-pi... for the full case statement for the PIDINST WG.

 

In her recent book, entitled “Big Data, Little Data, No Data” [1], Christine Borgman writes “To interpret a digital dataset, much must be known about the hardware used to generate the data, whether sensor networks or laboratory machines.” Borgman further highlights that “When questions arise [...] about calibration [...], they sometimes have to locate the departed student or postdoctoral fellow most closely involved.” This is a striking account of the role that information about instruments plays in science and of the costs of not being able to find and access such information.

 

The need to uniquely identify an instrument instance is rapidly growing in many research communities. Indeed, persistent identifiers enable unambiguous reference to digital representations of instruments, which has many potential benefits:

  • Provide metrics that quantify the use of instruments and support the rationale for future funding
  • Link data to the instruments that generated them (provenance), improving the interpretation and validity of data
  • Aid equipment logistics and mission planning
  • Facilitate interoperability and open data sharing, especially in advancing technologies that foster sharing of instruments
  • Improve the discoverability and visibility of instruments and their data published on the web

 

Currently, there is no universal way to identify instrument instances. As the primary outcome, PIDINST WG contributes to establishing a cross-discipline, operational solution for the unique and lasting identification of active and decommissioned instruments. This case statement outlines the work planned for PIDINST WG.

 

Issues to be addressed

  • Instruments as physical entities - What is an instrument? What are the implications of identifying the instrument instance as a physical object versus identifying a digital information object (metadata) about the instrument? Further questions concern what instruments produce, their real-world configurations, their relations to platforms and deployments, and the implications of instrument modifications for identification (new versions).
  • Granularity - Instruments can be parts of other (compound) instruments. For example, instruments can be manufactured with multiple bespoke sensor components, such as modular weather stations that simultaneously measure multiple meteorological variables. The granularity at which to reference and describe instrument instances (compound versus component) can vary for different stakeholders. How can these types of instruments be described in a generic way?
  • Use cases - Support the analysis of community requirements and inform the work carried out by PIDINST WG.
  • Metadata - Explore the types and sources of metadata that could be resolved under a PID and the difference between metadata registered with a PID infrastructure provider (e.g. DataCite, ePIC, Crossref) and metadata held by an institutional instrument database provider. Develop a minimum common metadata schema for the registration of instruments with PID infrastructure providers.
  • Machine readability, interoperability, and provenance - Investigate the need and the requirements involved to make metadata (at the institutional level) machine readable and compatible with existing interoperable technologies. Provenance, in particular the relation between data and instruments that generated them, is another aspect to be addressed.
  • Landscaping - Explore the links, potential relationships and overlaps with instrument manufacturers, institutional instrument database providers, RDA groups and PID infrastructure providers.

 

Outcomes

The work of the PIDINST WG will contribute to the following outcomes. Note that these are long-term outcomes to which this WG aims to contribute. This WG will not build a sustainable infrastructure for the persistent identification of instruments. It will merely contribute to specifying such infrastructure. The concrete deliverables of this WG are presented in the Work Plan.

  • A sustainable infrastructure will support the registration of instrument instances by submitting metadata about them and allowing for minting an instrument instance PID. The PID must follow agreed standards for persistent identifiers, e.g. long-lasting, actionable, descriptive digital identifiers.
  • Improved understanding within research communities of how to describe instrument instances, including relations to other entities such as instrument model (type) or instrument deployment, the issue of identifying physical objects versus digital representations, and other related issues.
  • Collaborations with one or more PID infrastructure providers interested in implementing the approach to persistent identification of instruments proposed by the PIDINST WG.
  • Strong linkages to the activities of the RDA PID IG and other related RDA groups.

 

References

[1] Borgman, C.L. (2015). Big Data, Little Data, No Data. MIT Press.

 

 

Review period start:
Wednesday, 27 December, 2017 to Saturday, 27 January, 2018
Body:

Version 0.9, 9/17/2017

 

Name of Proposed Interest Group:   Physical Samples and Collections in the Research Data Ecosystem

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

Physical samples are a basic element for reference, study, and experimentation in research. Tests and analyses are conducted directly on samples, such as biological specimens, rock or mineral specimens, soil or sediment cores, plants and seeds, water quality samples, archeological artefacts, or DNA and human tissue samples, because they represent a wider population or a larger context. Other physical objects, such as maps or analog images, are also direct objects of study and, if digitized, may become a source of digital data. There is an urgent need to better integrate these physical objects into the digital research data ecosystem, in both a global and an interdisciplinary context, to support search, retrieval, analysis, reuse, preservation, and scientific reproducibility. This group aims to facilitate cross-domain exchange and convergence on key issues related to the digital representation of physical samples and collections. These include, but are not limited to: the use of globally unique and persistent identifiers for samples to support unambiguous citation and linking of information in distributed data systems and with publications; metadata standards for documenting samples and collections and for landing pages; access policies; and best practices for sample and collection catalogs, covering a broad range of issues from interoperability to persistence.

A growing community of stakeholders, comprising domain scientists, collection curators, information scientists, data managers, all working at the interface with computational science, are developing detailed practices and standards around identifiers, vocabularies, and software interfaces, which are necessary for wider community application. Publishers and funders represent additional stakeholders interested in best practices for sample citation and registration of sample metadata in online catalogs that are fundamental for reproducibility of sample-based data and future use of valuable collection specimens. Currently, these efforts are fragmented, as is the communication of technical solutions and organizational best practices. This IG will support cross-disciplinary and international dialog helping to build technical and social bridges among a broad range of stakeholders to align and coordinate ongoing efforts, strengthen solutions, and broaden their adoption.

Birds of a Feather (BoF) sessions were held at RDA Plenary 4 and Plenary 6, gathering an international and multi-disciplinary group of stakeholders. A preliminary case statement was reviewed by participants in the P6 BoF and informed the current version.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

Best practices, standards, and infrastructure are needed to properly link physical samples and collections to digital data generated by their study or to features in the real world. Samples need to be cited with globally unique, persistent, and resolvable identifiers in publications to ensure that they can be unambiguously linked to online metadata profiles (landing pages) and to other data generated by other studies of the same sample. Scientists want to search for data for a given sample across the entire literature. This can now be achieved because sample PIDs can be included in the metadata of publication DOIs or data DOIs as related identifiers, which can be harvested and searched through systems like Scholix. Scientists also want to find out where a given sample can be accessed to reproduce data or add new measurements to the available knowledge about a sample. Both the approaches to, and the maturity of, technical and organizational solutions and infrastructure differ across the many disciplines that work with physical samples. Diverse and uncoordinated practices make it difficult to advance the adoption of best practices that link physical samples to the digital research data ecosystem. Further, commercial software providers for museum and collection catalogs, as well as publishers, are reluctant to implement best practices if they are different and incompatible across domains.
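
As an illustration of the scenario above, the following minimal sketch shows how harvested metadata records that list sample PIDs as related identifiers could be indexed by sample, so that all publications and datasets for one sample can be looked up. The record layout is loosely modeled on the related-identifier element of DataCite-style metadata; the field names, DOIs, and sample identifiers are placeholders assumed for this example and should be checked against the actual schema of the registry in use.

```python
from collections import defaultdict

# Hypothetical harvested records (placeholder DOIs and sample identifiers).
records = [
    {
        "doi": "10.5555/example-article",
        "relatedIdentifiers": [
            {"relatedIdentifier": "IGSN:EXAMPLE001",
             "relatedIdentifierType": "IGSN",
             "relationType": "References"},
        ],
    },
    {
        "doi": "10.5555/example-dataset",
        "relatedIdentifiers": [
            {"relatedIdentifier": "IGSN:EXAMPLE001",
             "relatedIdentifierType": "IGSN",
             "relationType": "IsDerivedFrom"},
            {"relatedIdentifier": "IGSN:EXAMPLE002",
             "relatedIdentifierType": "IGSN",
             "relationType": "IsDerivedFrom"},
            # A non-sample link; ignored by the index below.
            {"relatedIdentifier": "10.5555/example-article",
             "relatedIdentifierType": "DOI",
             "relationType": "IsSupplementTo"},
        ],
    },
]


def index_by_sample(records, sample_type="IGSN"):
    """Map each sample identifier to the DOIs of records that relate to it."""
    index = defaultdict(list)
    for record in records:
        for rel in record.get("relatedIdentifiers", []):
            if rel.get("relatedIdentifierType") == sample_type:
                index[rel["relatedIdentifier"]].append(record["doi"])
    return dict(index)


sample_index = index_by_sample(records)
# All harvested outputs that mention the (placeholder) sample EXAMPLE001:
print(sample_index["IGSN:EXAMPLE001"])
```

A lookup keyed on the sample identifier is the simplest design that answers "which outputs used this sample?"; a production service would instead query a link registry rather than hold records in memory.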

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

RDA is a multi-disciplinary and international community engaged in research data management that offers a unique opportunity for the goals of this IG. The objectives of this IG are:

  1. Identify commonalities and diversities across the stakeholders and establish prioritized action items that are appropriate for Working Groups. Relevant issues are: unique sample identifiers; sample documentation including vocabularies and taxonomies and alignment with international metadata standards; sample registration and interoperability of digital online catalogs; policies for sample citation in publications; and access to samples and sample metadata.
  2. Identify and characterize existing systems and solutions relevant to linking physical samples with digital research data; identify gaps and challenges.
  3. Facilitate international cooperation to develop harmonized approaches and best practices for physical object identification and digital curation; enable the facilitation of object and sample identification infrastructure both at the national and international levels.
  4. Build linkages between object repositories and museums, digital data repositories, scientific publications, museum software providers, and science communities.
     

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

Communities that will be involved in this IG range from museum and collection curators, to research data managers, researchers in domain sciences, information sciences, and computer sciences, to publishers and funders. Several workshops over the last few years have brought together stakeholders interested in the topic of Physical Samples in the Digital Research Ecosystem, including:

  • Linking Environmental Data and Samples, CSIRO, Australia, May 2017
  • Physical Samples and Digital Collections, iConference, China, March 2017
  • Physical Samples, Digital Collections, ASIS&T Conference, Denmark, October 2016

In the previous two BoF sessions at RDA P4 and P6 the following communities were represented:

  • Biodiversity
  • Oceanography
  • French science archive
  • Australian meteorology - water quality sampling and provenance
  • German Research Center, library
  • European PID network
  • Geological Society
  • Kew Gardens
  • CDL - neurobiology, Berkeley museum
  • Agricultural research in Italy, soil samples
  • Zoology and environmental science
  • Provenance and workflows, biodiversity workflows
  • Material science, Air Force
  • Ethnography
  • Natural History Museums
  • National Repositories

 

We will facilitate workshops at ASIS&T, JCDL, SPNHC, and domain-specific conferences such as AGU for the Earth and Space Sciences to broaden participation and dissemination of outcomes.

We will work with the following organizations to engage relevant communities.

  1. International Geo Sample Number IGSN e.V. (Kerstin Lehnert, Jens Klump; http://www.igsn.org) - Global implementation organization for unique sample identifiers, with members on 5 continents
  2. Global Biodiversity Information Facility (Donald Hobern)
  3. Taxonomic Databases Working Group (John Wieczorek, http://www.tdwg.org)
  4. DISSCO (Distributed System of Scientific Collections, http://dissco.eu)
  5. AuScope (Lesley Wyborn, http://www.auscope.org.au/)
  6. EPOS (Kirsten Elger, https://www.epos-ip.org/ )
  7. SPNHC (Society For The Preservation of Natural History Collections, http://www.spnhc.org)
  8. DataCite (https://www.datacite.org)
  9. CODATA Task Group on Coordinating Data Standards amongst Scientific Unions (Marshall Ma, http://www.codata.org/task-groups/coordinating-data-standards )
  10. ESIP (Earth Science Information Partners) (Erin Robinson, http://www.esipfed.org)
  11. Scientific Collections International (SciColl, http://scicoll.org)

 

Related RDA groups

  • TAB
  • WG/IG Chairs
  • Biodiversity Data Integration IG
  • Long Tail of Research Data IG
  • PID IG
  • Research Data Provenance
  • RDA / TDWG Metadata Standards for attribution of physical and digital collections stewardship

 

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

  1. A report that synthesizes existing best practices for digital curation and sharing of physical samples from disparate disciplines and institutions.
  2. A journal special volume on sample and collection management in the research data ecosystem (journal TBD).
  3. Creation of RDA Working Groups to develop recommendations for best practices and standards related to sample unique identifiers, sample metadata, and sample citation, such that they can be linked with data and publications derived from them.
  4. Joint sessions with other RDA groups such as Biodiversity Data Integration IG, Long Tail of Research Data IG, PID IG, Research Data Provenance, and others as appropriate for knowledge exchange, to align with emerging relevant standards, and to promote recommendations from the IG.
  5. Facilitation of collaborations that advance interoperability between collection catalogs, sample registries, data repositories, and publications for improved data sharing across disparate disciplines, through e.g., alignment of sample metadata with existing metadata standards.

 

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

  • The primary mechanism for communication will be email, with periodic (quarterly) web-conference meetings, plus sessions at plenary meetings.
  • We will also leverage other meetings such as EGU, AGU, SciDataCon, and ESIP.
  • Knowledge gathering and capture will be via the RDA IG web site. We may use other collaboration tools as appropriate, e.g. wikis, or tools such as GitHub or the Center for Open Science's Open Science Framework.

 

Timeline (Describe draft milestones and goals for the first 12 months):

    

September 2017 - P10 session: BoF session, presentation of Case Statement

December 2017 - AGU meeting and progress report

March  2018 - P11 session, evaluate progress, revisit workplan

September 2018 - P12

 

 

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):  Co-chairs are marked “(co-chair)”.

 

FIRST NAME | LAST NAME | EMAIL
Kerstin | Lehnert (co-chair) | lehnert@ldeo.columbia.edu
Lesley | Wyborn (co-chair) | lesley.wyborn@anu.edu.au
Jens | Klump (co-chair) | Jens.Klump@csiro.au
Simon | Cox (co-chair) | Simon.Cox@csiro.au
Helen | Glaves | hmg@bgs.ac.uk
Rowena | Davis | rowenaidavis@email.arizona.edu
Markus | Stocker | mstocker@marum.de
Lindsay | Powers | lpowers@usgs.gov
Christopher | Lenhardt | clenhardt@renci.org
Denise | Hills | dhills@gsa.state.al.us
Dirk | Fleischer | dfleischer@kms.uni-kiel.de
Kirsten | Elger | kelger@gfz-potsdam.de
Wim | Hugo | wim@saeon.ac.za
Colleen | Strawhacker | colleen.strawhacker@colorado.edu
Sarah | Ramdeen | ramdeen@email.unc.edu
John | Wieczorek | tuco@berkeley.edu
Leslie | Hsu | lhsu@usgs.gov
Donald | Hobern | dhobern@gbif.org
Nicky | Nicolson | n.nicolson@kew.org
Unmil | Karadkar | unmil@ischool.utexas.edu
Ashlee | Dere | adere@unomaha.edu
Nicholas | Car | Nicholas.Car@ga.gov.au
Anusuriya | Devaraju | anusuriya.devaraju@csiro.au
Sean | Toczko | sean.jamstec@gmail.com
Lynne | Yarmey | yarmel@rpi.edu
Dawn | Wright | DWright@esri.com
Marshall | Ma | max@uidaho.edu

 

The following are additional potential participants who attended the previous BoF sessions at P4 and P6: 

 

FIRST NAME | LAST NAME | EMAIL | INSTITUTION
Aaron | ADDISON |  | Washington University in St Louis
Arturo | ARIÑO PLANA | artarip@unav.es | University of Navarra
Toshihiro | ASHINO | ashino@acm.org | Toyo University
Sven | BINGERT | sven.bingert@gwdg.de | GWDG
Daphne | DUIN | daphne.duin@naturalis.nl | Naturalis Biodiversity Center
Ian | FORE |  | National Council Institute (NIH)
Kazu | FUKUDA |  | JAMSTEC
Bryon | FOSTER | Bryon.Foster@us.af.mil | USAF/AFRL
Margaret | FOTLAND | m.l.fotland@admin.uio.no | University of Oslo
Jason | JACKSON |  | Indiana University; Mathers Museum of World Cultures
John | KRATZ | John.Kratz@ucop.edu | California Digital Library
Giovanni | L'ABATE | giovanni.labate@entecra.it | Consiglio per la ricerca in agricoltura e l'analisi dell'economia agraria (CRA-ABP) Research centre for agrobiology and pedology
Bertram | LUDAESCHER | ludaesch@gmail.com | University of Illinois, Urbana-Champaign
Paolo | MISSIER | Paolo.Missier@ncl.ac.uk | Newcastle University
Magalie | MOYSAN | magalie.moysan@univ-paris-diderot.fr | Université Paris Diderot
Fiona | MURPHY | fionalm27@gmail.com | Research Consultant
Ritsuko | NAKAJIMA | rnakajim@jst.go.jp | Japan Science and Technology Agency
Nicky | NICOLSON | n.nicolson@kew.org | Royal Botanic Gardens, Kew
Kunihao | NIWA |  | Research Organization of Information and Systems (Japan)
Yoshinori | OCHIAI |  | JST
Carole | PALMER |  | iSchool Washington
Paul | SHEAHAN | sheahanpaul@hotmail.com | Sheahan
Paola | TAROCCO | ptarocco@regione.emilia-romagna.it | Geological, Seismic and Soil Survey. Emilia-Romagna Region (Italy)
Anne | THESSEN | annethessen@gmail.com | The Data Detektiv
Nicholas | WEBER | nmweber@uw.edu | University of Washington
Matt | WOODBURN |  | Natural History Museum London
Themis | ZAMANI | sakka@grnet.gr | GRNET
Carlo | ZWÖLF | carlo-maria.zwolf@obspm.fr | Observatoire de Paris

Review period start:
Thursday, 21 September, 2017
Body:

** PLEASE NOTE - The following text has been deprecated in favor of the revised Charter Statement attached to this page - 6 June 2018 **

 

 

CODATA/RDA Research Data Science Schools for Low and Middle Income Countries IG

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

The goals of this RDA Interest Group are to continue the information sharing, targeted outreach and community collaboration with RDA members about the CODATA-RDA Schools for Research Data Science. So far, two very successful schools have been hosted by the International Centre for Theoretical Physics (ICTP) in Trieste, Italy: the first in August 2016 and the second in July 2017. Two further events are planned, one in São Paulo in December 2017 and a third Trieste school in August 2018. We are also looking into an African school in 2018. We plan to continue providing open and evolving curriculum materials, to create a practical framework for hosting regionalized instances of the course, and to focus on train-the-trainer concepts to grow regional capacity. This is aligned with the RDA mission in that it enables data sharing through its training, and it adds value to the RDA community by teaching some of the outputs of the RDA in Research Data Management.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

 

The school curriculum focuses on Open Data and FAIR practices and ethical data use, and builds a foundation of Data Science skills for early career researchers in all disciplines. Attending researchers are given priority based on the World Bank ranking of Low or Middle Income Countries (LMICs), so the focus is on resource-constrained researchers. This specifically speaks to the RDA Vision of “…researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society.” In past schools, RDA as an organization, together with its recommendations and outputs, has been introduced to the students. While the events thus far have targeted LMICs, the curriculum should have universal application for Early Career Researchers (ECRs) worldwide.

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

 

·      Continuing to provide successful School for Research Data Science Events.

·      Creating a framework for hosting of regional events.

·      Providing ECRs with a foundation of skills necessary to thrive in an Open Science and Open Data environment.

·      Growing a base of worldwide trainers prepared to lecture, mentor, and provide support to future worldwide and regional events.

·      Continually evolving the foundational Data Science curriculum designed for this course, and making it accessible and reusable for other projects with similar goals.

This is distinct from other groups within the RDA.

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

·      The communities targeted for students are ECRs in LMICs with an interest in, or need for, a foundation in Data Science skills and resources. They should also embrace the concepts of Open Science and Open Data.

·      The communities targeted for lecturers and mentors are any Research Data Science professionals or academics interested in developing the next generation of Data Scientists in LMICs. This includes, but is not limited to, RDA members.

·      NGO and corporate sponsorship will need to be leveraged for successful hosting of events.

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

·      Continued successful instances of the RDA/CODATA School for Research Data Science.

·      Continued growth of lecturer and mentor base to regionalize training events and reduce the cost per event by leveraging local experience and talent.

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

 

Regular event planning meetings will be held prior to upcoming scheduled events and targeted outreach events. A CODATA Task Group will continue to meet about governance, funding, curriculum evolution and operation of the project. We will leverage the RDA IG as an information sharing and recruiting platform.

 

Timeline (Describe draft milestones and goals for the first 12 months):

September 2017: RDA Plenary Final CODATA/RDA Summer School for Research Data Science WG Session has been accepted.

December 2017: 1st São Paulo, Brazil instance of the school.

August 2018: 3rd Trieste, Italy instance of the school.

Body:

CODATA/RDA Interest Group Charter

 

Updated version, dated 15 December 2017, endorsed 6 Feb 2018

The original version of the Charter can be found at the end of this page, and also attached.

 

Name of Proposed Interest Group: Mapping of the Landscape of Research Data Activities

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

The Internet now connects research data, computer resources and software from globally distributed resources in real time. Where on planet Earth these resources are geographically located is irrelevant, but to enable online access to them, there is a rising need for programmatic access both to data and to software to find and process data across institutional, domain and national boundaries. This requires the development of standardized machine-to-machine interfaces that loosely couple data and software through agreed formats, interfaces, vocabularies and ontologies, preferably across multiple domains. The complexity of these online infrastructures requires that they are built by much wider communities, through effective cooperation and governance, to enable new and innovative forms of interdisciplinary science from globally accessible data stores.

 

The time is ripe for identifying the key communities and partnerships within the major scientific domains that are developing digital research infrastructures that enable sharing and processing of scientific data, and for ‘Mapping Of the Landscape’ (MOL) of these activities to further improve collaborations and partnerships, particularly those ‘umbrella’ alliances that are enabling interdisciplinary data sharing. The key advantage of better Landscape Mapping is that researchers and infrastructure developers will know who is doing what and where, and hopefully avoid unintended duplication. Further, where duplication of activities is discovered, it is hoped that once groups are aware of equivalent activities, the MOL IG can become a conduit through which these groups can connect, share experiences and learn from each other to improve coordination. The IG will concentrate on efforts applying to digital work products, not physical structures. However, some geographical maps may be used to portray informatic connections between these entities.

 

At RDA Plenary 8 and Plenary 9, sixteen groups were identified as undertaking “MOL” activities across a variety of data infrastructures and organisations. This not only reinforced that it was logical to attempt to coordinate all these MOL activities, but at the same time highlighted that there was no agreed process for undertaking a MOL activity so that outputs could be synthesised and leveraged.

 

 Key points identified at the P8 and P9 meetings were:

  1. There was a significant number of MOL activities being undertaken;
  2. There was a diversity of research data infrastructures that each activity was trying to map (technology, data/information, computational systems, etc);
  3. There was no agreed vocabulary or ontology to describe, in a consistent way, the research data infrastructures that each MOL is reviewing; and
  4. There was a diversity of tools being used: each had different functionalities, and the tool chosen was influenced to some extent by the type of MOL being undertaken.

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

 

The MOL IG is surveying the community of landscape mappers to put together a current list of projects and a legend of vocabularies and visualization tools. The primary purpose of these lists is to increase awareness of current projects and existing tools. Additional purposes are to enable current mappers to document and evaluate the differing methodologies, tools and vocabularies used by MOL mappers (MOLers), to share goals, and to start determining shared practices so that future mapping projects can identify gaps and align their tools and vocabularies with existing projects.

 

Ultimately, if the underlying data sets are sufficiently standardised, it should be possible to crosswalk between and interrogate across multiple MOLs.

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.  Articulate how this group is different from other current activities inside or outside of RDA.):

 

  1. Develop a structured web page with a catalogue of MOL activities related to identifying research data infrastructures: this catalogue can be self populated by any MOL activity;
  2. Provide a key list of methodologies, tools, workflows, etc. being used; and
  3. Provide a key list of potential map indexes (vocabularies, ontologies, data models).

 

This group was partially informed by the RDA Atlas of Knowledge (AOK) and the RDA Technical Advisory Board (TAB) Landscape Overview Group (LOG) mapping exercises, though in contrast to these activities, the proposed MOL IG will focus on activities external to RDA and at a higher organizational level.

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

Please refer to the MOL spreadsheet (available here). It covers a diversity of data infrastructure mapping projects, standards, repositories and organizations, with mappers and map projects spanning a diversity of domains, including health, Arctic, earth sciences, environment, agriculture, and e-infrastructures. (Note: this spreadsheet also contains the list of tools and vocabularies/ontologies.)

 

Related RDA groups

  • TAB
  • WG/IG Chairs
  • All RDA groups are indirectly related to this project
  • Education and Training on Handling of Research Data IG (MOL outputs could be used as training tools or used to identify tools. Many of these maps are considered onboarding materials.)
  • Data Foundations and Terminology IG (could advise on foundation vocabularies).

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

  • Generating a dynamic list of MOL activities
  • Developing a portfolio of mapping methodologies and tools to document best practice for those wanting to undertake MOLs
    • Comparing/contrasting strengths and weaknesses of each
  • Developing a list of potential vocabularies/ontologies that can be used as potential legends in MOL exercises
  • Example outcomes:
    • Recommendations for others working on ‘mapping the landscape’ activities to increase alignment and possible future integration.
    • Promotion of knowledge of existing exercises and reduction of duplicate efforts.
    • Identification of knowledge gaps.

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

  • AGU (December) / EGU (April) meetings - both are large international science meetings and attracted many interested MOL individuals this past year. We plan to continue taking advantage of these gatherings as a feasible in-person discussion venue. These meetings, however, concentrate on the geosciences, environmental sciences and Earth system sciences only.
  • Regular telecons and emails.

 

Timeline (Describe draft milestones and goals for the first 12 months):

    

March 2018 - P11 session - introduce new MOLers and road test the tools and vocab resource lists.

September 2018 - P12 session - introduce new MOLers and evaluate if cross MOL mappings can be technically undertaken, and/or automated.

 

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):  Bold indicates co-chairs

 

FIRST NAME

LAST NAME

EMAIL

Rowena

Davis

rowenaidavis@email.arizona.edu

Lesley

Wyborn

lesley.wyborn@anu.edu.au

Ari

Asmi

ari.asmi@helsinki.fi

Steve

Diggs

sdiggs@ucsd.edu

Helen

Glaves

hmg@bgs.ac.uk

Peter

Pulsifer

Peter.Pulsifer@colorado.edu

Lindsay

Powers

lpowers@usgs.gov

Lynn

Yarmey

yarmel@rpi.edu

Colleen

Strawhacker

colleen.strawhacker@colorado.edu

Dawn

Wright

DWright@esri.com

Jonathan

Petters

jpetters@vt.edu

Leslie

Hsu

lhsu@usgs.gov

Rebecca

Koskela

rkoskela@unm.edu

Marshall

Ma

max@uidaho.edu

Peter

McQuilton

peter.mcquilton@oerc.ox.ac.uk

Danie

Kinkade

dkinkade@whoi.edu

Denise

Hills

dhills@gsa.state.al.us

Erin

Robinson

erinrobinson@esipfed.org

Fiona

Murphy

fionalm27@gmail.com

Gary

Berg-Cross

gbergcross@gmail.com

Mustapha

Mokrane

mustapha.mokrane@icsu-wds.org

Leslie

McIntosh-Borrelli

borrel2@rpi.edu

Mark

Parsons

parsom3@rpi.edu

Mohan

Ramamurthy

mohan@ucar.edu

Sara

Graves

SGraves@itsc.uah.edu

Simon

Lambert

simon.lambert@stfc.ac.uk

Xin

Mou

mou1609@vandals.uidaho.edu

Sophie

Hou

hou@ucar.edu

 

 


 

PLEASE NOTE - the following text was the original version of the Charter and has been deprecated in favor of the above text (which is also attached to this page).  

 

 

Name of Proposed Interest Group:   Mapping of the Landscape of Research Data Activities

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

The Internet now connects research data, computer resources and software from globally distributed resources in real time. Where on planet Earth these resources are geographically located is irrelevant, but to enable online access to them, there is a rising need for programmatic access both to data and to software to find and process data across institutional, domain and national boundaries. This requires the development of standardized machine-to-machine interfaces that loosely couple data and software through agreed formats, interfaces, vocabularies and ontologies, preferably across multiple domains. The complexity of these online infrastructures requires that they are built by much wider communities, through effective cooperation and governance, to enable new and innovative forms of interdisciplinary science from globally accessible data stores.

 

The time is ripe for identifying the key communities and partnerships within the major scientific domains that are developing infrastructures that enable sharing and processing of scientific data, and for ‘Mapping Of the Landscape’ (MOL) of these activities to further improve collaborations and partnerships, particularly those ‘umbrella’ alliances that are enabling interdisciplinary data sharing. The key advantage of a better Landscape Map is that researchers will know who is doing what and where, and hopefully avoid unintended duplication. Further, where duplicate activities are discovered, it is hoped that once groups are aware of equivalent activities, the MOL IG can become a conduit through which these groups can connect, share experiences and learn from each other to improve coordination and avoid any more duplication of effort.

 

At RDA Plenary 8 and Plenary 9, sixteen groups were identified as undertaking “MoL” activities across a variety of data infrastructures and organisations. This not only reinforced that it was logical to attempt to coordinate all these MoL activities, but at the same time highlighted that there was no agreed process for undertaking a MoL activity so that outputs could be synthesised and leveraged.

 

 Key points identified at the P8 and P9 meetings were:

  1. There was no agreed vocabulary or ontology to describe, in a consistent way, the research data infrastructures that each MoL is reviewing; and
  2. There was a diversity of infrastructures that each was trying to map (technology, data/information, computational systems, etc).

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

 

MOL activities identified so far are both within and across many scientific domains. These have similar goals and host parallel working groups that support the mission of advancing scientific research through data interoperability. Several are looking for common ‘mapping’ methodologies so that ‘maps’ created by multiple groups can be interconnected and results shared.

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

  1. Develop a web page with a catalogue of MOL activities related to identifying research data infrastructures;
  2. Develop a synthesis of existing MoL activities for research data infrastructure activities within and beyond RDA;
  3. Investigate mapping practices including methodologies, tools, workflows, etc. and identifying whether any key pieces are missing; and
  4. Discuss opportunities for collaborations on existing MoL exercises.

 

This group was partially informed by the RDA Atlas of Knowledge and TAB LOG mapping exercises, though in contrast to these activities, the proposed MoL IG will focus on activities external to RDA and at a higher organizational level.

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

  1. Arctic Data Committee Landscape Exercise (Peter Pulsifer, http://arcticdc.org/products/data-ecosystem-map)
  2. EarthCube (http://www.arcgis.com/home/item.html?id=9bde7150da474c828d61a5e67e98855d, http://www.goring.org/resources/neo4j_engagement.html)
  3. ESRI mapping tool (http://dusk.geo.orst.edu/ec-story) - developed by Dawn Wright of ESRI that was used to map the location of, and types of communities within EarthCube
  4. Belmont Forum (Rowena Davis)
  5. Atlas of Knowledge (Simon Lambert, RDA/EU http://core.cloud.dcu.gr/rda_aok/ )
  6. AuScope (Lesley Wyborn)
  7. CODATA Task Group on Coordinating Data Standards amongst Scientific Unions (Marshall Ma, http://www.codata.org/task-groups/coordinating-data-standards )
  8. TAB LOG (Steve Diggs, [link])
  9. RDA Education IG connection? (Sophie Hou pointed to Amy Nurnberger’s education landscape survey as a possible connection point at the AGU in-person meeting)
  10. USGS Community for Data Integration (CDI) (Leslie Hsu, CDI wiki). Current working groups include Tech Stack, Semantic Web, Data Management, Citizen Science, Mobile App, and more. CDI Community can be engaged through Leslie Hsu, who coordinates communication to the 500+ members from within and outside of USGS. We have some initial coordination such as joint Tech Dive monthly calls with ESIP, and are interested in leveraging more opportunities, events, etc. to reduce redundancy and bring information to our members. Can serve as link to USGS data assets.
  11. RISCAPE (European Research infrastructures in the international landscape) (Ari Asmi)
  12. ESIP (Earth Science Information Partners) (Erin Robinson)

 

Related RDA groups

  • TAB
  • WG/IG Chairs
  • Education and Training on Handling of Research Data IG
  • Brokering IG
  • Data Foundations and Terminology IG

 

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

Given that the landscapes of interest are constantly changing, making a map or maps virtually impossible to keep current, this IG will instead focus on more manageable areas of alignment.

  • Example WG topics:
    • Developing a vocabulary/ontology to describe components of research data infrastructures (that this does not exist has been a huge stumbling block for the MoL IG)
    • Mapping methodologies to document best practice for those wanting to undertake MoLs
    • Developing a portfolio of Landscape mapping tools and comparing/contrasting strengths and weaknesses of each
  • Example outcomes:
    • Recommendations for others working on ‘mapping the landscape’ activities to increase alignment and possible future integration.
    • Promotion of knowledge of existing exercises

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

  • ESIP meetings - ESIP runs two meetings each year in the US, one in summer and one in winter. Their off-Plenary schedule directly complements the RDA calendar, and virtual attendance options are supported as part of the meetings.
  • AGU meetings - The AGU Fall Meeting is a large international science meeting and attracted many interested MoL individuals this past year. We plan to continue taking advantage of this gathering as a feasible in-person discussion venue.

 

Timeline (Describe draft milestones and goals for the first 12 months):

    

September 2017 - P10 session: Mini Summit and consolidation of work plan

December 2017 - AGU meeting and progress report

March 2018 - P11 session and revisit workplan

September 2018 - P12

 

Review period start:
Friday, 1 September, 2017
Body:

Introduction

This interest group will provide a forum to discuss issues on management, sharing, discovery, archival and provenance of software source code. The group will pay special attention to source code that generates research data and plays an important role in scientific publications. The Research Data Alliance (RDA) mission is to build the social and technical bridges that enable open sharing of data. Software (as source code and executables) and data are intrinsically linked, both to ensure continued creation, analysis and reuse of data and also to preserve the knowledge of the software development, relationships with other assets and the context in which it was created.

This IG adds value to the RDA community by channeling expertise in software development, sharing, management, versioning, reproducibility and preservation into RDA, and into the RDA groups which could benefit from this expertise.
 
User scenario(s) or use case(s) the IG wishes to address

Software source code plays a critical role in all fields of modern research, where it is written and developed to address a variety of needs, such as cleaning, processing and visualising data. Software source code is a necessary component for research reproducibility and reusability. Software source code should therefore be properly curated in the same way as other research inputs and outputs, such as research data and paper publications. Software source code developers, and the organisations that sponsor software development, should also be properly credited and attributed.
 
Objectives

This interest group focuses on software source code as a first class citizen in the landscape of scientific research, related to but distinct from research data. The group’s objective is to bring together entities and individuals with complementary expertise and different use cases in order to address the following:

  • Develop a consistent metadata profile for discovery of software, source code, algorithms and other software artefacts
  • Review existing metadata for describing source code, where already in place, especially metadata that link source code to data and research publications
  • Investigate if there is a need for additional specific metadata for software in order to make it citable, findable and accessible
  • Review existing schemas for identifying software artefacts
  • Identify and promote an identification schema specifically adapted to track software artefacts
  • Collect and publish use cases of current examples and practices
  • Develop guidelines for managing, describing and publishing software source code
  • Liaise with other groups in RDA which express interest in issues specifically related to software source code

Participation

This group is open to all RDA members to participate. 
This group will interact with the  following relevant RDA IGs/WGs:

  • Research data provenance IG&WG
  • PID kernel information WG
  • Reproducibility IG
  • Metadata IG
  • Preservation Tools, Techniques, and Policies IG
  • Virtual Research Environment IG (VRE-IG)
  • Data versioning IG
  • Data Citation WG

And other IGs/WGs if they become relevant to this group.

The group will also liaise with outside sources of software expertise that will be beneficial for RDA, such as WSSSPE, FORCE11 (the software citation work in particular), the Software Sustainability Institute, the Software Heritage initiative, journals that publish software, and relevant national and international initiatives.
 
Outcomes

Provide an extensive background for RDA members on software source code development, sharing, management, versioning, reproducibility and preservation in order to foster the emergence of shared standards across the research community on how to describe, identify, find and attribute software source code.
 
Mechanism

This group will coordinate activities and communicate through the following means:

  • Monthly teleconference to discuss specific issues
  • Asynchronous collaboration through Google docs, RDA mailing list and wikis
  • Inform other relevant RDA IG/WG of the group’s ongoing activities through RDA group mailing lists
  • Hold face-to-face interactions within and across groups at RDA plenaries.      

Timeline
 
In the first year, we plan to set up an active discussion in three key areas: metadata, identifiers, and use cases.
 

Potential Group Members

  • Benoit Baudry
  • Daniel S. Katz
  • Fernando Rios
  • Rémi Gribonval
  • Ian Bruno
  • Jen Martin
  • Jonathan Tedds
  • Julia Collins
  • Lesley Wyborn
  • Martin Hammitzsch
  • Martin Monperrus
  • Michelle Barker
  • Mingfang Wu
  • Morane Gruenpeter
  • Neil Chue Hong (co-chair)
  • Roberto Di Cosmo (co-chair)
  • Sandra Gesing
  • Stefanie Kethers
  • Victoria Stodden
Review period start:
Monday, 28 August, 2017
Body:

Name of Proposed Interest Group: SHARC (SHAring Reward & Credit)

 

Introduction 

(A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

Data sharing is widely promoted and increasingly a reality, but it remains challenging, especially given the many obstacles that persist on several fronts. Among these obstacles is the lack of relevant and recognized reward mechanisms for the very specific efforts required to share organized datasets.

 

The prerequisite for data sharing lies in implementing the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), which can add to the workload if done by the researchers themselves; however, this aspect is never accounted for when activity is evaluated by funders or reviewers.

In some cases, resources may come from domains that were not initially developed for research (e.g. museums, clinical care). Sharing data and sharing physical resources each involve very different steps and methods, and the involvement of diverse communities:

  • Building of a research collection or resource infrastructure according to the FAIR principles (including all necessary steps for data and physical entities repositories)
  • Elaboration of governance and sharing policies for the resource
  • Development of tools to follow up on the use of the resource

Individuals with different expertise may contribute at each of these steps (laboratory technicians, resource managers, researchers, legal experts…).

 

In 2014, the Expert Advisory Group on Data Access (EAGDA) carried out, through UK cohort studies, research into the governance of data access. The aim of their research was to identify the factors that help or hinder individual researchers in making their data (both published and unpublished) available to other researchers, and to examine the potential need for new types of incentives in order to enable data access and sharing. Among their findings:

  • Research culture and environment are not perceived as providing sufficient support or adequate rewards for researchers who generate and share high-quality datasets;
  • Making data accessible to others can carry a significant cost to researchers (both in terms of financial resources and the time it requires);
  • There is typically very little, if any, formal recognition for data outputs in key assessment processes, including in funding decisions and academic promotion;
  • Data managers have an increasingly vital role as members of research teams, but are often afforded a low status and few career progression opportunities.

Recommendations:

  • At first, develop mechanisms that encourage and reward good practice, rather than penalising researchers who fail to fulfil their planned approaches for sharing data (the carrot, not the stick)
  • It is vital therefore that funders and research leaders foster an active, on-going dialogue with international partners and work with them to build common incentive structures and effect cultural change.
  • Recognise the contribution of those who generate and share high quality datasets, including as a formal criterion for assessing the track record and achievements of researchers during funding decisions.
  • Form a partnership among funders, research institutions and other stakeholders to establish career paths for data managers.
  • Ensure that the contributions of both early-career researchers and data managers are recognised and valued appropriately, and that the career development of both types of individuals is nurtured.
  • Champion greater recognition of data outputs in the assessment processes to which they contribute.
  • Strengthen career pathways for data managers; and recognise data outputs in performance reviews

(https://wellcome.ac.uk/sites/default/files/establishing-incentives-and-changing-cultures-to-support-data-access-eagda-may14.pdf)

These recommendations have not been widely promoted outside the UK, although they are of great interest at the international level of research governance.

 

Existing initiatives recognise the value of certain steps of the chain towards sharing resources; however, gaps remain to be filled, especially as regards physical resources. As an example, the BRIF initiative (Bioresource Research Impact Factor/Framework) has already tackled these issues for human biological samples and data. As a result, the CoBRA guideline has been produced, and work has been performed on unique identifiers and relevant parameters towards specific metrics.

 

RDA IGs could build on those previous results and ideas to further identify such gaps and suggest practical solutions that promote the provision of resources to the community as a genuine and valuable activity in research practice.

 

The workflow of the entire process, from the production of resources to their impact back on the producer, has not been explored in RDA groups, to our knowledge. Furthermore, the RDA community is focused mainly on data. Extending the work to resources that include physical samples in addition to data would be a value-added contribution. As part of its mission, and in keeping with the goal of RDA, the IG will work on finding solutions to foster open sharing of resources.

 

User scenario(s) or use case(s) the IG wishes to address

(what triggered the desire for this IG in the first place):

 

Use case 1 - The biomedical community: A growing portion of research relies on sample collections and databases. This is especially true in the biological and medical sciences, with the development of large-scale biology in the ‘omics’ era. High-throughput ‘omics’ platforms require biospecimens and generate a great amount of data on large numbers of patients and/or healthy individuals. The size and complexity of the collections needed to promote translational research typically extend far beyond the scope of individual research projects, and the need to produce these valuable data is being met by contemporary bioresource facilities. While sharing such resources is essential for optimizing knowledge production, so far only a very small part of them are shared. A major obstacle lies in the fact that establishing a valuable bioresource requires considerable time and effort. Finding ways to recognize and credit this upstream work is essential.

 

Use case 2 - The Industrial Ecology community: Industrial ecologists rely heavily on data to assess the environmental performance of products during their life cycle. This requires interdisciplinary data from several domains, such as chemistry, ecology, economy, toxicology and climate science, among others. Currently, harmonized datasets for environmental Life Cycle Assessment (LCA) of products are scarce and the existing proprietary databases are incomplete. Sharing research data on a Research Data Infrastructure is an additional, time-consuming effort for researchers that is not acknowledged. Reward mechanisms for sharing data would significantly improve the transparency of such products’ environmental assessments and the accuracy of environmental models. Moreover, they would have a high informational value, facilitating responsible consumption and thus increasing the weight of public pressure for significant environmental improvements in activities with high environmental impact.

 

Use case 3 - Data produced by marine and terrestrial biodiversity research projects that evaluate and monitor Good Environmental Status have a high potential for use by stakeholders involved in environmental management. However, the accessibility of environmental data, especially in ecology, has never been more problematic. The cost of these data and their heritage value are increasingly highlighted, whereas, due to budgetary constraints, the resources allocated to their production and availability are limited. Rewarding data sharing could have a beneficial impact on the whole system. As a case in point, the data produced by biodiversity research are heterogeneous and produced by a multitude of entities; standard formats and protocols would therefore allow databases to be interconnected, and semantic approaches could contribute significantly to their interoperability. However, the specific scientific objectives and the logistics of project management and information gathering lead to a decentralised distribution of data, which can hinder environmental research. Moreover, data are often considered a technical end, whereas they should be treated more as a scientific end, as an object of study: by extending the primary analyses made in the context of the research question for which they were collected, data can be reused, within the limits allowed by their quality, and their exploration by appropriate methods such as graphs may lead to the formulation of new scientific hypotheses. Indeed, the “rising tide of data” requires new approaches to data management and data preservation; access and sharing should be supported in a seamless way. According to the situational analysis of the French landscape of biodiversity research observatories[1], data planning, collection, quality assurance, description, conservation and analysis are mostly led by observatories, whereas data discovery (of potentially useful data) and data integration from varied sources are poorly done.
This case study aims to present the latest trends in data infrastructure and data management solutions for research, and to discuss the progress of the Open Science Cloud and of tools and initiatives for rewarding data sharing in the field of biodiversity and environmental data.

 

A wide range of disciplines face the issue of little or no data sharing, including but not limited to the above-mentioned use cases. Other disciplines could be addressed within the SHARC IG as it develops and its membership grows:

  • Low-temperature physics: cryostats data
  • Earth science: samples and data
  • Materials science: catalysts, microscopy data, etc.
  • Social science: raw data from surveys, interviews, focus groups or case studies
  • Neuroscience: imaging data.

(See Anita de Waard, 0000-0002-9034-4119; VP Research Data Collaborations; Elsevier RDM Services)

 

Objectives

(A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

 

The SHARC IG will have four main objectives:

 

  1. To review the existing rewarding mechanisms in various communities, as well as their limits, and to identify factors that could improve the process and optimize the sharing of bioresources, i.e. data and physical samples (e.g. tools, incentives, requirements…).
  2. To use this analysis to encourage the inclusion of criteria related to bioresource sharing in the research evaluation process at the European institutional level (i.e. without making this activity mandatory, to increase coherence between evaluation and real practice).
  3. To disseminate information and findings to diverse communities of stakeholders.
  4. To develop a process for stepwise adoption of principles and implementation measures adapted to national, local and institutional contexts. 

 

Participation

(Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

Currently, seven different communities are represented in the group (details in the table at the end of the document): Biology and Biomedicine (7 ppl.), Information Sciences and Technology (3 ppl.), Geospatial data (1 ppl.), Marine Biology (1 ppl.), Biodiversity (2 ppl.), Industrial Ecology (1 ppl.), Bioethics (4 ppl.).

Anne Cambon-Thomsen is the initial leader.

Laurence Mabile will dedicate 30% of her workload to the coordination of the group itself. Co-chairs will help to interact with the relevant RDA groups and to coordinate meetings on their continents.

The different communities will contribute to the white paper detailing the existing and lacking rewarding mechanisms in the sharing process.

 

Three existing RDA groups identified themselves during our BoF session at RDA P9 as having common concerns: the ‘Research data provenance working group’, the ‘RDA / TDWG Metadata Standards for attribution of physical and digital collections stewardship’ group and the RDA/WDS Publishing Data Workflows WG. The Data Citation WG, Elixir Bridging Force IG and Reproducibility IG may also have some overlapping interests.

Those groups will be contacted via the RDA platform, and virtual meetings will be organized to start with. If relevant, cross-sessions will be organized at RDA plenaries. We also plan to alert them about the events organized by our BoF/IG group.

 

 

Outcomes

(Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

  • White paper/position paper on ‘rewarding’ mechanisms (existing and lacking) for sharing bioresources and their link to institutional research evaluation; to be published, if possible, as an RDA-endorsed paper in a high-visibility open access science journal with a science policy section
  • Submission of a session proposal to the EuroScience Open Forum (ESOF) 2018, Toulouse
  • Responding to the next European Community public stakeholder consultation related to the preparation of the EU research FP9, and exploring the possibility of including such recognition criteria in FP9 as well as in EU-level strategies that foster implementation at an institutional level (such as the existing HRS4R, Human Resources Strategy for Researchers).
  • Forming RDA working groups, after addressing issues such as whether future working groups will pertain to a specific community (ecological, biomedical, geospatial…), revolve around specific stakeholders across communities (editors, funders, governing bodies of research institutions, research evaluation policy makers…), or both.

Mechanism

(Describe how often your group will meet and how will you maintain momentum between Plenaries.):

 

  • Virtual web meetings will be organized as often as necessary, with a minimum of once a month for a regular update.
  • Face-to-face meetings will be encouraged at each RDA plenary conference.
  • Regular updates about relevant meetings and conferences of interest to group members will be relayed to all interested RDA groups.

 

 

Timeline

(Describe draft milestones and goals for the first 12 months):

  • June 2017: ESOF session proposal

A proposal for a scientific session has been submitted to ESOF (EuroScience Open Forum) under the coordination of Fiona Murphy.

The conference will be held in July 2018 in Toulouse, FR.

More info at:

http://www.esof.eu/en/about/programme/call-for-proposals.html

 

  • RDA plenary conference 10, Montreal, 19-21 Sept 2017:

Attendance by some of the group members; mapping of overlapping topics by other groups and contacting them

 

  • First draft of the white paper: end 2017

 

Potential Group Members

(Include proposed chairs/initial leadership and all members who have expressed interest):

*In bold, co-chairs

FIRST NAME | LAST NAME | INSTITUTION/COUNTRY
Anne | Cambon-Thomsen | Public Health Department, INSERM-University Toulouse III, FR
Laurence | Mabile | Public Health Department, INSERM-University Toulouse III, FR
Rodrigo | Costas-Comesana | Centre for Science and Technology Studies (CWTS), Faculty of Social and Behavioral Sciences, Leiden University
Mogens | Thomsen | Public Health Department, INSERM-University Toulouse III, FR
Michele | De Rosa | Bonsai/Denmark; Aalborg University/Denmark
Laurent | Dollé | Erasme Hospital, ULB, 1070 Brussels, Belgium
Mohamed | Yahia | INIST, CNRS, FR
Fiona | Murphy | MMC Ltd (Research Data/Publishing Consultant); University of Reading
Elena | Bravo | Research Coordination and Support Service, Istituto Superiore di Sanità (National Health Institute), IT
Martina | Zilioli | Institute for Electromagnetic Sensing of Environment (Milan), IT
Sofie | Bekaert | Clinical Research Center of Ghent University Hospital
Romain | David | CNRS, Mediterranean Institute of Biodiversity and Marine and Continental Ecology
Anna | Cohen Nabeiro | Fondation pour la Recherche sur la Biodiversité, ECOSCOPE (Observations et données sur la biodiversité), FR
Alison | Specht | Fondation pour la Recherche sur la Biodiversité, CESAB (Centre de synthèse et d’analyse sur la biodiversité), FR
Jane | Carpenter | NSW Health Pathology - Biobanking Services, Australia
Anne Marie | Tassé | P3G
Gabrielle | Bertier | Centre of Genomics and Policy, McGill University Human Genetics department, Canada; INSERM-University Toulouse III, FR
Jantina | De Vries | Department of Medicine, University of Cape Town, South Africa
Louise | Bezuidenhout | Institute for Science Innovation and Society, University of Oxford

[1] Fondation pour la Recherche sur la Biodiversité (2016), Etat des lieux et analyse du paysage national des observatoires de recherche sur la biodiversité, une étude de l’infrastructure ECOSCOPE. Série FRB, Expertise et synthèse. Ed. Aurélie Delavaud et Robin Goffaux, 72 pp.

 

Review period start:
Monday, 7 August, 2017
Body:

NOTE - This Case Statement has been updated in the revised version attached (3 Jan 2018)

 

 

A variety of stakeholders are showing growing interest in exposing data management plans (*) to other actors (human/machine) in the research lifecycle, beyond their creator and the funder or institution that mandates their production. Interested stakeholders include researchers themselves, funders, institutions, and a variety of service providers and community organisations including repositories, journals, publishers, and providers of tools for writing and maintaining plans. Implementation and adoption are currently hampered by two problems:

  • A lack of standards for expression and interchange of DMPs

  • Insufficient understanding of the needs of users and the benefits and risks of different modes of action

This proposed working group will address the second of these issues; the issue of a standardised form of expression for DMPs is the concern of the proposed DMP Common Standards Working Group. The group’s output will include a reference model and alternative strategies for exposing plans, to best serve community interests in meeting FAIR principles, based on the shared experience of ‘early adopters’ in test implementations. It will be supported by work to gauge user needs and motivations for exposing DMPs as well as perceived risks and disbenefits. (*) Note: our main focus is on Data Management Plans (DMPs), but we will seek examples of Software Management Plans (SMPs) where relevant to the exposure use cases of interest to the Active DMP Interest Group.

The key beneficiaries of the WG outcomes will be stakeholders with a common interest in using Data or Software Management Plans as instruments for demonstrating that research products have been managed according to research community standards and generic principles (e.g. that the research products should be FAIR), and that recognition is given for doing so. 

There is potential value in exposing plans for a variety of stakeholders involved in their production and consumption. These include researchers themselves, funders, institutions, and a variety of service providers and community organisations including repositories, journals, publishers, and providers of tools to help write and maintain plans. The WG will provide a Use Cases Catalogue to describe implementation scenarios and articulate their benefits to researchers and other stakeholders, with case studies of how those benefits have been realised. Through consultation with users of well-established planning tools (DMPTool, DMPonline), the Use Cases Catalogue will also identify the degree of acceptance among researchers for the levels of exposure/publication each use case entails, barriers to realising the benefits, and any concerns about undesirable impacts.

Generalising from the scenarios and examples contained in the Use Cases Catalogue, the WG will produce a Reference Model to document generic components and workflows for exposing plans (and metadata about them), and offer recommendations for further action by each of the relevant stakeholder groups. By gaining endorsements for the Reference Model from relevant stakeholders for each use case, we will provide a community-endorsed approach to using plans to share demonstrable advancement in data sharing practice.

Review period start:
Monday, 24 July, 2017 to Thursday, 24 August, 2017
Body:

Name of Proposed Interest Group:

 

Disciplinary Collaboration Framework (DCF)

(originally introduced as Disciplinary Interoperability Framework)

 

Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

 

A fragmented landscape or a diverse ecosystem?

 

Over the last couple of years, we have witnessed an increase in the number of Interest and Working Groups operating within RDA. A significant proportion of that increase is due to the creation of disciplinary groups[1]. The operation of such groups in RDA is crucial, as they act as direct channels for communication and collaboration between RDA and their respective scientific communities. As such, they enable the interplay between RDA outputs and community practices, tools and infrastructures. Depending on the definition used, approximately 20 IGs that can be considered ‘disciplinary’ are currently established and active in the wider RDA ecosystem.

 

However, to benefit fully from the existence of these groups it is vital that the RDA community self-organises its activities, to turn the challenges associated with a fragmented landscape into opportunities derived from the operation of a diverse ecosystem. Arguably, the turning point is the capacity of the RDA community to organically develop interfaces between groups, and streamline the inter-group communication.

 

The need to form a group that will take on the work of strengthening the voice and position of disciplines within RDA was first identified during a panel discussion at SciDataCon, and a subsequent paper was published in the CODATA Data Science Journal (Genova, F. et al. (2017) Building a Disciplinary, World‐Wide Data Infrastructure. Data Science Journal, 16, p.16. DOI: http://doi.org/10.5334/dsj-2017-016).

 

 

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

 

Issues relating to managing, linking and curating research data are often perceived in different ways within different disciplines. This has led to a challenging landscape that lacks a consistent requirements framework. Such a framework could drive and steer the development of technological solutions and improve their applicability across scientific disciplines. A collaboration and coordination forum where these issues are openly addressed from a discipline-specific perspective is needed. Such a forum, however, needs to be organized and operated by the respective groups themselves, providing them with the flexibility to steer the agenda in an agile and responsive manner according to changing needs.

 

The RDA DCF Interest Group will act as a collaboration and coordination working space, bringing together representatives from communities of practice across scientific disciplines to better organise and drive the discussion for prioritising, harmonising and efficiently articulating communities’ needs. 

 

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):

 

The RDA Disciplinary Collaboration Framework (DCF) sets out with a vision and a clear list of standing objectives in support of its work within the RDA ecosystem.

 

Vision

Strengthen the voice of disciplinary groups and improve the clarity and visibility of discipline-specific data management and stewardship needs within RDA. Work towards the development of a disciplinary interoperability framework.

 

Mission statements

  1. Identify and describe common challenges, needs and objectives of scientific communities of practice relevant to managing and sharing their research data;
  2. Improve communication and interplay between disciplinary groups;
  3. Connect and liaise between disciplinary groups and technical and socio-cultural cross-cutting groups;
  4. Improve visibility and applicability of RDA outputs across disciplines;
  5. Act as a forum for, and represent, discipline-specific communities that are not currently represented by an RDA group.

 

Objectives/Focus areas

 

The quick wins

  • Act as an inter-disciplinary open forum;
  • Act as a forum to introduce and discuss RDA outputs;
  • Perform a gap analysis for disciplinary participation in RDA;
  • Support RDA domain ambassadors and the ambassadors’ scheme;
  • Act as a single authoritative voice in RDA, representing disciplines.

 

The long runs

  • Use the group as a window to RDA for scientific communities that currently do not participate in RDA;
  • Provide authoritative opinions to TAB/OAB/Council as needed on disciplinary engagement and coordination matters;
  • Take actions towards the defragmentation of the disciplinary groups landscape;
  • Identify and prioritise common technical challenges.

 

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

 

The DCF is predicated upon strong participation by all co-chairs of discipline-specific Interest Groups, as well as individuals who represent scientific disciplines outside formal RDA groups. DCF will also invite all co-chairs of other cross-cutting groups addressing technical and socio-cultural issues to participate in the DCF meetings.

 

As the group develops its working agenda and selects specific issues to address, it will make calls to specific RDA Interest and Working Groups to participate.

 

Recognising the role of the group in the wider RDA organisation, the group will have an open invitation to members of all the organisational bodies (Secretariat, TAB, OAB and Council members).

 

Sessions and proceedings of the group will be public and subject to community review/comment.

 

Rules of procedure of the group will be further developed and agreed at its inaugural meeting.

 

 

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

 

As mentioned above, DCF will act as a collaboration and activity coordination space for disciplinary groups and discipline-representing individuals. Following a prioritisation exercise of the technical and socio-cultural issues that cross-cut disciplinary needs, DCF will:

 

  1. Propose and support joint sessions at RDA plenaries between technical and disciplinary groups;
  2. Propose the formation of new working groups to address specific challenges, which are not otherwise addressed by existing groups;
  3. Support the development of new disciplinary IGs, to address gaps in the scientific coverage;
  4. Organise focused sessions and events to help disciplinary groups navigate and exploit RDA outputs/products;
  5. Support disciplinary ambassadors in their role within their respective communities;
  6. Other outputs as evaluated by the group membership.

 

 

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

 

  1. Breakout sessions during Plenary meetings (every six months)
  2. Participation in the RDA co-chairs collaboration meetings (every six months) 
  3. Online meetings (on an ad-hoc basis)     

 

By having a staggered meeting schedule, we will ensure that the group will convene quarterly (baseline schedule).

 

Timeline (Describe draft milestones and goals for the first 12 months):

 

Month 6: Report on gap analysis for disciplinary participation: which disciplines are represented and which key scientific areas are not.

 

Month 12: Communication across the RDA ecosystem of key group statements on urgent issues (statements)

 

Potential Group Members (Include proposed chairs/initial leadership and all members who have expressed interest):

 

 

FIRST NAME | LAST NAME | EMAIL | TITLE
Andi | Ogier | | Member
Andrea | Perego | | Member
Claire | Austin | | Member
Dimitrios | Koureas | d.koureas@nhm.ac.uk | Co-chair
David | Schade | david.schade@nrc-cnrc.gc.ca | Co-chair
Francoise | Genova | francoise.genova@astro.unistra.fr | Member
Gail | Clement | | Member
Helen | Glaves | hmg@bgs.ac.uk | Member
Ilya | Zaslavsky | | Member
Rainer | Stotzka | | Member
Rebecca | Koskela | | Member
Rob | Hooft | | Member
Sarah | Ramdeen | | Member
Simon | Hodson | | Member
Tobias | Weigel | | Member
Ian | Bruno | bruno@ccdc.cam.ac.uk | Member
Bridget | Almas | bridget.almas@tufts.edu | Member
Richard | Kidd | kiddr@rsc.org | Member
Martin | Hicks | bridget.almas@tufts.edu | Member
Wenbo | Chu | wchu@geosec.org | Member

 

 

 

 


[1] Disciplinary groups are herein defined as groups that approach research data challenges from the perspective of specific scientific disciplines. Examples of such groups in RDA include:  Agricultural Data Interest Group (IGAD); Biodiversity Data Integration IG; Chemistry Research Data IG; Data for Development IG; Digital Practices in History and Ethnography IG; Education and Training on handling of research data IG; Geospatial IG; Global Water Information IG; Health Data; Linguistics Data IG; Marine Data Harmonization IG; Quality of Urban Life IG; RDA/CODATA Materials Data, Infrastructure & Interoperability IG; Structural Biology IG; Weather, climate and air quality.

Review period start:
Monday, 17 July, 2017