
Body:


Review period start:
Friday, 27 May, 2016
Custom text:
Body:

The initial idea of establishing this working group was presented during P6 in Paris in the Repository Platforms for Research Data IG session. Shortly after P6 a telephone conference was held, which concluded that a case statement should be prepared and then finalized during a BoF session at P7. The initial co-chairs are David Wilcox and Thomas Jejkal. Contact with potential co-chairs from Asia was already made during P6 and will be finalized during P7.

For more information please visit the web page of this BoF group:

https://rd-alliance.org/groups/research-data-repository-interoperability-wg-bof.html

Charter

The Research Data Repository Interoperability Working Group will establish standards for interoperability between different research data repository platforms focusing on machine-machine communication. These standards may include (but are not limited to) a generic API specification and import/export formats summarized in a document serving as an implementation guide for adoption. The scope of this document and all the WG’s activities will be defined by the following list of initial use cases:

  • Migration/Replication of a Digital Object between research data repository platforms

    • Platform, data model and/or version may differ between source and destination

  • Retrieval of information related to the platform and/or its contents

    • E.g. to register the system in a (repository) registry or to harvest contents

This initial list might be extended in the first phase of the WG’s operational time.

In order to cover these use cases, existing standards and technologies will be identified and evaluated in the second phase. Evaluation results will be summarized in a separate deliverable and will form the basis of the final deliverable. During the evaluation phase, the preparatory work of other RDA WGs will be used as far as possible along with experiences gathered by the RDRI WG’s members during their work with and on existing research data repository platforms.

In the final phase the WG will strive for a consensus regarding a generic API specification and/or import/export formats needed for offering the listed functionalities. The final deliverable will then contain this consensus in a form such that it can be used as an implementation guide for later adoption.

Value Proposition

The Research Data Repository Interoperability working group will provide recommendations and implementation guidelines (e.g. for a generic API or import/export formats) for research data repository interoperability that can be integrated by platform developers and service providers. To this end, existing standards and technologies will be evaluated and integrated where possible. Once widely adopted, these outcomes will allow institutions and organizations with research data repositories to deposit, access and share their data in a common way and to disseminate repository resources and contents to clients and services easily. For adopters and their users this means:

Removing Barriers: Defining and implementing interoperability standards for the use cases mentioned above could help researchers identify and acquire datasets stored on other platforms that were previously unavailable to them, enriching their own research.

Easier Collaboration: Having a common way to exchange datasets stored in different research data repository platform instances from different institutions or even disciplines can help to identify new starting points for (inter-)disciplinary collaborations.

Creating Commonalities: Agreeing on and implementing common standards for typical research data repository tasks might bring adopters closer together. In the future this could result in fruitful collaborations that extend the basic set of functionalities proposed by this WG.

Because the success of this work stands or falls with the adoption of its results, repository platform developers contributing to this group have agreed to implement the results as early adopters.

Engagement with Existing Work

A number of related standardization efforts have already taken place; for example, the OAI protocol for metadata harvesting, the SWORD protocol for repository deposits, and the re3data.org schema for collecting information on research data repositories for registration. The Research Data Repository Interoperability WG will review these and other related standards to see how they might be adopted or extended to support our goals. This review period will ensure that we do not duplicate existing efforts.
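As an illustration of the kind of existing standard under review, the following is a minimal sketch of how a client might build a harvesting request for the OAI protocol for metadata harvesting (OAI-PMH). The endpoint URL and the function name are hypothetical and chosen only for this example; the `ListRecords` verb, the `metadataPrefix` and `set` arguments, and the `oai_dc` format come from the OAI-PMH specification. It is not part of the WG's deliverables.

```python
from typing import Optional
from urllib.parse import urlencode


def build_oai_pmh_request(base_url: str,
                          metadata_prefix: str = "oai_dc",
                          set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    # 'verb' and 'metadataPrefix' are required for ListRecords.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # 'set' is optional and restricts harvesting to one set of records.
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Assumed example endpoint; a real repository publishes its own base URL.
print(build_oai_pmh_request("https://repository.example.org/oai"))
```

A harvester would issue an HTTP GET on the resulting URL and parse the XML response; that step is omitted here because it depends on the repository being harvested.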

Related Work

Related RDA Groups

Work Plan

The work of the proposed group is organized in three phases framed by the RDA plenary meetings beginning with P8.

Timing | Action | Main Participants
September 2016 | Official start of RDRI WG at P8; working session at P8 to analyze the state of the art | Session participants in an open discussion
September – December 2016 | Identification and discussion of additional use cases and adoptable technologies. Mapping of technologies for potential adoption to single functionalities. | Registered members
January – April 2017 | Create a primer document describing all use cases and technologies for potential adoption. The document also points out gaps not covered by existing technologies. | Co-chairs
April 2017 | Session during P9 to present the primer document and to prepare next steps, e.g. identification of functionalities or exchange formats. | WG members
April – September 2017 | Discussion of functionalities, exchange formats and intended behavior. Create first draft of specification document. | Registered members
September 2017 | Presentation of the specification draft at P10 and identification of open points and potential improvements. | Session participants in an open discussion
September 2017 – March 2018 | Find consensus regarding final specification and write final deliverable serving as implementation/adoption guideline. | Registered members/co-chairs (writing)
March 2018 | Present final results at P11. | Co-chairs

 

Deliverables

D1. Research Data Repository Interoperability Primer (M6): This document describes targeted use cases, needed functionalities, as well as existing technologies and their feasibility for adoption. Gaps not covered by existing technologies are also described in this document.

D2. Interface Specification Draft (M12): A first draft document of the final specification. The document gives a basic overview of functionalities, exchange formats and intended behavior targeted by the WG to cover the defined use cases. This document will be the basis for finding a consensus between all WG members.

D3. Interface Specification (M18): This specification represents a consensus of all partners regarding an interoperable repository interface. It describes all functionalities provided by this interface including exchange formats and the expected behavior of a repository platform implementing the interface. This document serves as a guideline for adopting the results of this working group.

Mode and Frequency of Operation

The Research Data Repository Interoperability WG will primarily communicate asynchronously online using the mailing list functionality provided by RDA. Online voice meetings will be scheduled as needed, likely once per month. When possible, in-person meetings will also be scheduled; these will take place at RDA plenaries and at other conferences where a sufficient number of group members are in attendance.

Addressing Consensus and Conflicts

Group consensus will be achieved primarily through mailing list discussions, where opposing views will be openly discussed and debated amongst members of the group. If consensus cannot be achieved in this manner, the group co-chairs will make the final decision on how to proceed.

The co-chairs will keep the working group on track by setting milestones and reviewing progress relative to these targets. Similarly, scope will be maintained by tying milestones to specific dates, and ensuring that group work does not fall outside the bounds of the milestones or the scope of the working group.

Community Engagement

The working group case statement will be disseminated to mailing lists in communities of practice related to research data and repositories in an effort to cast a wide net and attract a diverse, multi-disciplinary membership. Group activities, where appropriate, will also be published to related mailing lists and online forums to encourage broad community participation.

Adoption Plan

Representatives of several major repository platforms have already joined this working group, including:

These representatives have agreed to consider implementing the standards recommended by the Research Data Repository Interoperability WG in their respective repository platforms. We will continue to seek representatives from a variety of repository platforms and services to ensure that this working group’s deliverables are widely adopted.

Initial Membership

Co-Chairs

Thomas Jejkal

David Wilcox

 

Members

Stefan Funk

Ralph Mueller-Pfefferkorn

Robert Olendorf

Rick Johnson

Ulrich Schwardmann

Ajinkya Prabhune

Andrew Woods 

Wolfram Horstmann

Cynthia Hudson Vitale

Adam Soroka 

Jared Whiklo

Colleen Fallaw

Rainer Stotzka

Stephen Abrams

Eleni Castro

Amy Nurnberger

Andre Schaaff

Christopher Harrison

Holger Mickler

Jibo Xie

Juanle Wang

Muhammad Naveed Tahir

Niclas Jareborg

Shaun de Witt

Volker Hartmann

William Gunn

Wouter Haak

 

Review period start:
Thursday, 19 May, 2016 to Monday, 20 June, 2016
Custom text:
Body:

RDA Interest Group Charter

Name of Proposed Interest Group:  New Paradigms for Data Discovery

Introduction:

An emerging statement on research data is that it should be FAIR: “Findable, Accessible, Interoperable and Reusable”. To comply with the first of these criteria, being Findable, we need a data infrastructure that supports users in discovering research data regardless of its location or the manner in which it is stored, described and exposed. This is a significant and growing challenge, as the number of research data repositories, and the need for cross-disciplinary data discovery, increases. This interest group aims to explore the common elements and issues shared by those who search for data and those who build systems that enable data search.

User scenarios or use cases the IG wishes to address:

  1. Data search engines are interested in developing components and practices to connect resources and results
  2. Data repositories are interested in improving and expanding search on their platforms
  3. Users are interested in better interfaces and fewer places to look for data
  4. Data creators are interested in a shared set of data metrics for all search engines
  5. Data search builders are interested in sharing knowledge and tools about ranking, relevance and content enrichment.  

Objectives:

The objectives are to provide a forum where representatives from across the spectrum of stakeholders and roles pertaining to data search can discuss issues related to improving data discovery. The goal is to identify concrete deliverables such as a registry of data search engines, common test datasets, usage metrics, and a collection of data search use cases and competency questions.

Related Activities:

  • NASA’s WG on Search Relevancy – focus is on improving search result relevance for EOSDIS data
  • ESIP’s Information Quality Cluster and NASA’s WG on Data Quality are both addressing ways of capturing and conveying quality information
  • W3C’s Best Practices for Spatial Data on the Web aims to improve discoverability and accessibility of geodata

Other RDA IGs whose activities are of interest and who we will interact with:

  • Metadata
  • Registries
  • Brokering
  • PIDs
  • Research data collections

Participation:

The Data Search Birds of a Feather session at RDA7 attracted 41 people, with additional people who could not come to Tokyo expressing interest. Some of the names appearing below come from this initial group, and we expect others will join when an announcement about the formation of the IG is made.

Outcomes:

A detailed set of deliverables is to be discussed when the IG meets, but can include those mentioned under use cases and objectives, above.

Mechanism:

Initially, we plan to meet by video conference at least monthly, with ongoing discussion by email and in-person meetings twice a year at the RDA Plenary meetings.

Timeline:

M0 – Hold kickoff meeting at RDA Plenary 8

M1 – Establish web presence and mailing list, hold initial virtual meeting, prioritize deliverables, distribute workload

M3 – Draft an initial set of use cases; form subgroups around use cases

M6 – Post use cases to workspace and announce on RDA email list

M8 – Subgroups report out on analysis of feedback on use cases (RDA #9)

M9 – Draft case statements for promising use cases

M12 – Finalize use cases, decide on next steps; presentation at RDA Plenary #10

Potential Group Members (the following people have agreed to join. We realize this is a US-only list and aim to rectify this: several non-US members have indicated, but not yet confirmed, their interest)

FIRST NAME | LAST NAME | EMAIL | TITLE
Anita | de Waard | A.dewaard@elsevier.com | Co-Chair
Siri Jodha | Khalsa | sjsk@nsidc.org | Co-Chair
Adrian | Burton | adrianburton@ands.org.au | Member
Ruth | Duerr | ruth.duerr@ronininstitute.org | Member
Rick | Johnson | rick.johnson@nd.edu | Member
Dawei | Lin | dawei.lin@nih.gov | Member
Kerstin | Lehnert | lehnert@ldeo.columbia.edu | Member
Ilya | Zaslavsky | zaslavsk@sdsc.edu | Member

 

Review period start:
Tuesday, 17 May, 2016 to Thursday, 16 June, 2016
Custom text:
Body:

Interpretation and use of scientific datasets by those who are not engaged in the creation or production of those datasets is pivotal for enabling science that is driven by data. RDA has made significant progress in addressing this issue through the initial Data Type Registries (DTR) WG, which has now finished, but much remains to be done, hence this proposal for a follow-on working group. The fundamental effort here is to describe scientific datasets in a human-and-machine-readable fashion, enabling humans and software to parse and understand the semantics, context, and assumptions behind the data. We refer to all such descriptions as “data type records”, regardless of the standard or best practices standing behind those descriptions. Data types complement traditional descriptive metadata records, providing re-usable descriptions of dataset structure and semantics aimed mainly at supporting data processing, while at the same time providing an additional attribute that can be used for a certain kind of discovery. The initial DTR WG focused on developing an infrastructural component that would manage such data type records. The new WG, DTR2, will focus on helping data producers create useful data type records.

Please see the attached PDF for the full case statement.

Review period start:
Wednesday, 30 March, 2016 to Saturday, 30 April, 2016
Custom text:
Body:

REVISED with responses to TAB review comments - updated Charter document attached to this page.

 

 

The objective of this group is to explore the areas where the principles and practices in the information disciplines of archives, records management, and research data curation overlap and where they diverge. Archives and records professionals serve in a range of critical roles: as experts in ensuring access, preservation, and reuse of records and archival collections; as provocateurs for good records curation practices; and as advocates for the construction of sustainable infrastructures for information sharing. Examining alignments in theoretical frameworks, practical implementations, and goals among archivists, records managers and data curators will lead to the development of a vibrant community of information professionals working in a more interoperable way across different domains.

The proposed IG is explicitly aligned with RDA’s goal to build social bridges that enable the open sharing of data. Specifically, the IG will build relationships among professionals who are committed to developing data sharing frameworks and practices that are able to withstand the various challenges introduced by the passage of time: technological obsolescence, loss of contextual understanding of data, and resource constraints that make it impractical to commit to preserving all data forever. By creating a space for archivists, records management professionals, and data curators to come together, this IG has the potential to act as a launching pad for RDA Working Groups that will produce recommendations addressing these challenges by bringing preservation, arrangement and description, and appraisal methodologies from the archives and records professions to the RDA community. This IG also aims to bring the skills and competencies which have long existed in the archives and records management communities to the wider RDA community, many of whom will not be aware of this existing expertise.


Download the full Case Statement and use the comment box below for your comments, questions and suggestions. 

 

 

Review period start:
Thursday, 24 March, 2016 to Sunday, 24 April, 2016
Custom text:
Body:

Mission
To reduce the likelihood of misunderstanding of a research community's storage requirements, or of a storage provider's service.
To facilitate dialogue between a research community and multiple storage providers, and between a storage provider and multiple research communities.
To maximise the scientific output of a research community with a fixed budget by allowing them to use the cheapest storage that supports their requirements and to automate data management tasks that are predictable.

 

Stakeholders
Organisations responsible for procuring storage capacity for researchers who either work with such large amounts of data that storing it themselves is impractical, or who form a distributed community using distributed resources. Such organisations are often universities and other research institutes.
Storage technology providers (vendors and developers) that create storage solutions for research communities.
Storage providers that offer a service to multiple research communities.
Brokers that allow research communities to discover the best storage provider for storing their data.
Agents that provide an enhanced storage service by aggregating resources from multiple storage providers.
Public organisations that procure data storage.
Anyone with archival responsibility.
 

Goals
Provide a common vocabulary that may be used by both research communities and storage providers, or by storage service providers and storage technology providers (vendors and developers) to:
• Describe the Quality of Service the researchers expect and that the storage providers will deliver. The vocabulary may be used both within documents (such as SLAs) and computer interactions (both person-to-computer and computer-to-computer).
• Describe known, predictable transitions that data will undergo throughout its life-cycle so that a research community may delegate responsibility for managing such transitions.


Download the full Case Statement

 

 

 

Review period start:
Wednesday, 30 March, 2016
Custom text:
Body:

Problem Statement and Scope

The exchange of information is based on a very fundamental concept: mutual trust. Trust is difficult to establish and easy to lose. In order to be able to trust each other, research facilities, companies and institutions need to agree on and ensure minimum data protection standards. This requires a verifiable layer of trustworthiness, which goes beyond paying lip service. Trustworthy data exchange is possible only if the partners can agree on data authenticity and integrity standards, can show that the protection of sensitive data meets the policy needs, and can grant and revoke access based on valid criteria.

So far the topics of data security and trust have been dealt with in isolation in the existing RDA working groups, with some essential aspects potentially not covered. What we want is a common arena for understanding and harmonizing various notions of data security and trust across research domains, which should facilitate common agreements on standards, best practices and policies. This, in turn, should allow access to, and/or exchange of, sensitive data without disclosure to unauthorised parties and according to clearly defined and verifiable protocols.

It is clear that the RDA Working Group for Data Security and Trust (WGDST) cannot address all relevant questions from the very start. Our aim is to focus on a few practical topics that allow immediate improvements in security and identify opportunities for the controlled sharing of sensitive research data. These topics initially include:

●  Operational (technical) policies on data access and data release for research data that is deemed sensitive because of privacy or commercial considerations, or other concerns

●  Authentication and authorisation protocols for data access

●  Protocols for data integrity and authenticity

●  Secure and privacy aware data processing

●  Data sharing (processing data offsite) and code exchange (processing onsite)

 

Deliverables and Work Plan
Within the 18 month time frame, the WGDST will produce the following items:

 

D1 - (Virtual) Kick-Off Meeting

In the first meeting we will discuss and agree on a timeline for this WG and set monthly web-conferences.

Output: The minutes and the agreed timeline will be published on the WG wiki.

 

D2 - Investigating the Existing Standards, Recommendations, Best Practices, Processes and Solutions

We will identify the different aspects of research data security and trust and create an overview of the area. We will investigate the existing standards, recommendations and best practices to distill the common questions and issues that are related to the protection and verification of data - mapping them to the different aspects of research data security previously identified. We will identify exemplar projects that demonstrate how research data can be exchanged in a secure way, how trust can be established and maintained and what pitfalls need to be avoided.

This overview not only helps us to limit the scope of the working group but is also an important output, providing insight into the security requirements of the research community.

Output: Make available a summary of D2 findings on the best practices (including examples), standards, recommendations, process and solutions currently available.

 

D3 - Scenarios, Use Cases and Potential Adopters

We will collect use cases and scenarios that can demonstrate the applicability of any guidelines we develop. These use cases will additionally provide insight into current data security and trust practices as well as highlight any potential challenges.

The following set of initial use cases, potential adopters and solutions was presented during the BoF session at P6.

●  Access and Use of Confidential Microdata in Social and Economic Sciences: Mike Priddy (DANS, Netherlands)

●  Big Facilities for Small Science: Vasily Bunakov (STFC, UK)

●  Data Exchange in Biomarker Research: Peter Kieseberg (SBA Research, Vienna, Austria)

●  DataSHIELD: Prof Paul Burton, Dr Becca Wilson (University of Bristol, UK)

●  DEXHELPP: Rudolf Mayer, Stefan Pröll (SBA Research, Vienna, Austria)

During P7 we added a new initial use case:

●  E-infrastructure and Distributed Data: Alessandra Scicchitano (GEANT, Netherlands)

Output: Maintain a list of use cases and potential adopters that will grow through the engagement activities of the working group.

 

D4 - Gap Analysis

Based on the findings in D2 and D3, we will create a gap analysis of the security landscape in relation to research data protection. We aim to identify areas where there is a lack of best practices, protocols and standards. These results may serve as a basis for identifying research questions for further projects and help to raise awareness of potential data leaks and security breaches.

Output: Make available a summary identifying areas lacking best practice, protocols and standards with respect to data security and trust.

 

D5 - Guidelines

Combining the exploratory work from D2-D4 we will produce guidelines - harmonised across domains - to facilitate the sharing of research data in a secure, usable, trustworthy and systematic way. This set of guidelines will enable data creators to look up common data sharing scenarios and identify the best practice processes, solutions and examples accordingly. These guidelines will reassure data creators in their data sharing practices and provide support for those not yet familiar with data security practices.

Output: The WG guidelines for data security and trust in data sharing will be made available for potential adopters.

 

D6 - Final Meeting

We will present the guidelines produced by the working group for each of the treated topics, which help data creators to employ appropriate processes for exchanging data in a secure way with respect to their own requirements. The meeting will serve as a mechanism to identify potential adopters of the guidelines for future evaluation.

The meeting will also discuss any further WG/IG for future RDA activity related to data security/trust including e.g.

●  the long term view of encryption standards

●  security audit of data repositories

●  collaboration in research environments that involve sensitive data

●  the definition, the application and the verification of machine-executable policies on data access and data release

 

Engagement with Existing RDA Work and Groups

Several RDA activities deal with security in a broader context, which highlights the importance of the topic for the research data community. We are currently aware of the following list of related groups under the umbrella of the RDA:

●  RDA/CODATA Legal Interoperability IG

●  RDA/NISO Privacy Implications of Research Data Sets IG

●  Ethics and Social Aspects of Data IG

●  BioSharing Registry: connecting data policies, standards & databases in life sciences WG

●  Health Data Interest Group

We are actively asking for collaboration and looking for synergies to exploit and aim to avoid duplication. Our goal is to link existing work within the groups and collaborate and exchange with other active RDA initiatives in the area of data security and trust. We will also seek to engage further examples of best practice, use case scenarios and potential adopters of our outputs.

 

Value Proposition

We expect that various stakeholders will benefit from the outputs of this proposed working group. Examples include:

Data users: the outputs will work towards making research data available for (re-)use, rather than being siloed.

Data creators, owners and holders: can adopt the WG guidelines to facilitate data security and trust when sharing data, and can also examine exemplar projects that demonstrate best practice.

Research and Research-related communities: can make use of a knowledge base of current processes, protocols, best practice, solutions and challenges surrounding data security and trust in research data sharing.

 

Chairs and Founding Members

Name | Institution
Stefan Pröll | SBA Research, Austria
Rudolf Mayer (co-chair) | SBA Research, Austria
Peter Kieseberg | SBA Research, Austria
Andreas Rauber | TU Wien, Austria
Vasily Bunakov | STFC, UK
Paul Burton | Newcastle University, UK
Mike Priddy | DANS, Netherlands
Becca Wilson (co-chair) | University of Bristol / Newcastle University, UK
Alessandra Scicchitano (co-chair) | GEANT, Netherlands
Mary O'Brien Uhlmansiek (co-chair) | Washington University in St Louis, USA

 

Conclusions

The aim of this WG is to provide usable data security guidelines for data creators, allowing them to exchange data in a secure way. The guidelines provided by this WG are based upon examples of best practice and explained using concrete examples. Besides delivering the necessary security know-how, the group also raises awareness of security measures in the domain of research data.

 

 

Review period start:
Friday, 19 February, 2016
Custom text:
Body:

As a domain, Materials Science and Engineering (MSE) is exceptionally broad and interdisciplinary with its origins most directly from metallurgy, ceramics and polymer science, but also with important ties to other disciplines such as physics, chemistry, chemical engineering, geology, electronics, optics, and biology. As a global community, MSE is expanding rapidly worldwide through the establishment of large, multi-institutional academic research centers, government labs, industrial consortia, and computing facilities. MSE researchers often need to answer complex questions such as “What structural properties and processing methods are required to develop new lightweight materials that significantly improve fuel efficiency yet meet safety standards satisfied by traditional materials in use today?” To this end we have seen the creation of programs such as the Materials Genome Initiative (MGI) in the US that aim to decrease the cost and time to develop new materials by a factor of two through more effective discovery, access, and interoperability of experimental and simulation data. However, finding the latest materials data and resources to answer such questions amidst this rich diversity and accelerated growth is an increasingly difficult and time-consuming endeavor.

In response, the RDA/CODATA Materials Data, Infrastructure, and Interoperability Interest Group (MDII IG; co-chairs James Warren, NIST, and Laura Bartolo, Northwestern University) in collaboration with materials science professional societies, proposes to create its first Working Group focused on developing the metadata standards required to establish a network of International Materials Resource Registries (IMRR) in key sub-domains and regions.

Developing a successful international materials science resource registry requires a combination of technical and political processes. As an outgrowth of discussions held in MDII IG working sessions and based on knowledge of the materials community, MDII IG proposes core members for its Working Group. The core members would comprise “doers” in the materials and cognate communities who can identify those in their organizations who need to be involved. More details are available in the draft Case Statement.

Review period start:
Thursday, 18 February, 2016
Custom text:
Body:

NOTE - This charter statement has been replaced by the attached V2 text.  See the attached PDF for the updated version.  15 May 2017

-------------------------------------------------------------------

1. WG Charter

The empirical humanities include history, folklore, cultural anthropology and other fields in which researchers collect primary data of different types that can be used for cultural analysis. Today, these researchers often need to collaborate to understand phenomena that operate across geographic regions, scale and communities of people. But established research practices and infrastructures in the empirical humanities do not support this. The Empirical Humanities Metadata WG (EHM) will conduct research, develop a statement of best practices and release an adoptable product centered on what needs to be in place (standards, protocols, policies, cultural expectations) to make ethnographic and historical data archivable, discoverable and shareable.

 

In a preliminary phase of this work, we have identified and categorized a broad range of metadata standards and use cases within history and ethnography, surveyed the relevant literature, hosted a number of conversations asking about the many kinds of data (such as recorded interviews, field notes, and photographs, among others) that require metadata to be archived and shared in a number of different user scenarios (born-digital interviews produced by early career researchers, for example, or newly digitized data from more established researchers) and ensured complementarity between the goals of the EHM and existing RDA metadata groups. Building on this exploratory phase researching standards and use-cases, the first phase of this WG will entail facilitating the implementation of new metadata fields within the Platform for Experimental and Collaborative Ethnography (PECE), resulting in a prototype of the suggested metadata fields for about a dozen artifact types common in the empirical humanities. A second phase will document and analyze some of the diverse metadata practices within the empirical humanities through a literature review, an environmental scan of projects and interviews with project leads. This stage will also include working with metadata and provenance experts within and beyond the RDA to better understand existing best practices and recommendations. Finally, we will propose best metadata practices for a variety of use cases/scenarios and facilitate the uptake of these deliverables within a number of projects (see Adoption Plan below), updating our initial metadata fields from phase one of the project. Each phase is expected to take six months. 

 

Confirmed early adopters of this WG’s deliverables will include research groups working with PECE including The Asthma Files and The Disaster-STS Research Network. We are also in contact with a number of other RDA members that have agreed to consider adopting our outputs. Beyond the 18-month timeframe of this WG, lessons learned from these early adopters will feed into the RDA Digital Practices in History and Ethnography (DPHE) Interest Group’s continued dissemination of these best practices. Our deliverables will make digital artifacts in ethnographic and historical research much easier to share, find, use and cite effectively, contributing to the development of a credit/reward structure that would not only reduce barriers, but further incentivize the sharing of data in the digital humanities.

 

1a. Preliminary research (completed) (M0)

In preparation for this Case Statement submission, we have reviewed a wide range of use cases, identifying scenarios that historians and ethnographers (within and beyond RDA) encounter when working with metadata for shared artifacts. This phase has benefitted from multiple existing RDA initiatives including the Metadata IG, the Data Fabric IG and the Repository Platforms IG. We hosted an “issues share” call-in on metadata with representatives from the Metadata IG and other RDA groups and have received positive feedback from leadership within metadata-related RDA groups on this case statement and the complementarity of the EHM and these existing initiatives. Preliminary work conducted within the DPHE suggests that researchers often struggle to develop appropriate metadata practices when digitizing and sharing the following data types especially important to history and ethnography: field notes, interviews (audio and video), grey matter, images, analytic structure, structured annotations, surveys, maps, quantitative data, bibliographies, translations and work flows.

 

1b. Short-term goals (M6)

In the first phase of this WG we will expand the metadata fields within PECE for the following data types: projects, groups, fieldsites, design logics, substantive logics, fieldnotes, texts, PDF documents, images, audio, video, websites, licenses, annotations, structured annotations, events, memos and bibliographies. As we continue our research, and use the platform, throughout the remaining phases of this WG, we will pay attention to (and perhaps go beyond) the limits of our initial choices about which metadata fields to include for these data types.

 

1c. Mid-term goals (M12)

Our second phase will continue to document and analyze the diverse existing metadata practices of researchers in the empirical humanities. In order to scope this project appropriately for our timeline, we will focus primarily on the data types we have identified as priorities in PECE. In our analysis, we will identify best practices that could be codified and distributed.

 

1d. Long-term goals (M18)

In our final phase of work we will facilitate the uptake of our deliverables from the second phase (best practices) in at least two projects using the PECE platform (The Asthma Files and The Disaster-STS Research Network) and then work with other adopters. Users of PECE will be able to use a simple form-based interface to input the relevant metadata for artifacts such as images, documents, audio, video and a variety of file types. In order to capture metadata for various analytic structures, PECE developers will establish micro-attribution vocabularies that capture the complex provenance of a particular analytic. We aim to learn lessons in this implementation phase that will help facilitate the uptake in further platforms and projects beyond the 18-month timeframe, with sustainable support provided by DPHE-IG. We will conduct extensive outreach, through RDA and professional societies, to identify, and work with, adopters of our outputs.

 

1e. Timeframe

Phase one, updating PECE metadata fields, will run from February 2016 through July 2016. From August 2016 through January 2017 we will collect data on the many ways researchers in history and ethnography address metadata issues. Finally, we will codify best practices and work with researchers using the PECE platform, and other research communities, to adopt these deliverables from February through June 2017.

 

2. Value Proposition

Research case: Given the cultural and social complexity (as well as technical, ecological and economic complexity) of many global problems today, collaborative empirical humanities research has renewed urgency. For decades, research in these fields has been an almost entirely individual-centric enterprise. Field notes, found documents, found or researcher-created photographs or recordings and other data used in cultural analysis are very rarely shared, except when reduced or rendered into some form of publication or museum display.

 

One of the primary barriers to sharing data within the empirical humanities is a lack of agreed-upon protocols for metadata standards for user-created primary research data. While there has been a great deal of work in the cultural heritage arena, especially within museums and libraries, and the dilemmas of qualitative data re-use are well documented (see Holstein and Gubrium 2004), the issues associated with preparing data for later use by third parties are yet to be thoroughly conceptualized. In the cultural heritage sector, for example, Jenn Riley has identified 105 metadata standards and notes that “the sheer number of metadata standards in the cultural heritage sector is overwhelming, and their inter-relationships further complicate the situation” (Riley 2009). In contrast, the RDA Metadata Standards Directory WG lists only one standard for heritage studies and one for anthropology.[1] Many researchers find themselves caught in the confusing space between the dizzying proliferation of standards and a one-size-fits-all approach that can miss out on the diversity of data practices within disciplines. Working closely with existing metadata-focused RDA groups (providing feedback on the list of elements in the package presented recently at the WG Chairs Meeting in Gaithersburg, for example), we will produce a simple list of recommended metadata fields for a delimited set of artifact types, analytics and use cases. Once endorsed by the RDA, and taken up by early adopters, these best practices will be a go-to resource for researchers that may then choose to modify (add or subtract) the fields we suggest for their own purposes. Development and uptake of shared metadata practices and tools will make user-created research data more findable and usable within these research traditions. The work of this WG could also contribute to the development of mechanisms providing greater credit and incentives for sharing data.

 

Business Case and Adoption Plan: Building digital infrastructure to support more data sharing and collaboration in the empirical humanities is far from straightforward. Analytic techniques in the empirical humanities differ from those in social science fields that may collect similar data, and are more akin to those used in literary and philosophical research, relying primarily on hermeneutics (interpretation for explanation and evocation rather than representative or statistical sampling for identification and validation). The goal is not to develop a concise and consistent view of an object, but to produce and explore multiple views of an object, leveraging “epistemological pluralism” (Keller 2002; Turkle and Papert 1990). Indeed, providing multiple, different interpretations and ways of representing particular phenomena (the sociocultural causes and impacts of the Fukushima nuclear disaster, for example, or the impact of genetics research on understandings of environmental health) is the key task for humanities researchers. Computational advances that support open-ended, underdetermined engagement with digital content that enables (even encourages) drift and transmutation in the way content is identified and taken up in analysis, are thus required. The standards we develop are likely to be taken up by individual empirical humanities researchers, people working on collaborative projects and institutions.

Metadata is particularly complex and dynamic in the empirical humanities, even more so when research is collaborative. Empirical material often has limited or contested provenance information; the “empirical” itself can shift, in relevance or prevalence, as analytic structures evolve and multiply. Qualitative interviews are not just collected, for example, they are produced, through questions and other elicitation techniques developed by the interviewer (often drawing on complex traditions of thought about language, culture and society). Interviews are then analyzed, again using analytic structures developed within complex traditions of thought. If interviews are analyzed collaboratively, different analytic structures may be used by different researchers, or different researchers may deploy “the same” analytic structure in different ways, and come to different interpretations of what an interview, image or document “says.” It is thus critical to recognize – and make accessible and discoverable (if researchers deem this appropriate) – the analytic structures through which data in the empirical humanities is both produced and interpreted. Metadata functionality thus needs to be in place at many stages in the ethnographic research process, addressing diverse types of “data”—including analytic structures used to produce and interpret empirical data.

Individuals, communities and initiatives that will benefit from the proposed WG:

●      Researchers: by reducing psychological, institutional, political, cultural and technological barriers to digitizing and sharing data, making shared data easier to find and cite and improving mechanisms for credit

●      Collaborative research platform developers: by providing a guide to various metadata options and recommendations on the important fields to include in form-based metadata entry systems.

●      Interlocutors: by providing informed consent forms with a wide range of options for sharing and dissemination of interviews.

●      Collections: better metadata will improve accessibility, raising demand for archived material, helping collections better meet their mission. 

3. Engagement with Existing Work in the Area

Our preliminary research suggests that researchers in history and ethnography can quickly become overwhelmed by the multitude of diverse and somewhat scattered metadata standards and that many researchers have their own ideas about the limits of existing metadata practices and standards. Historian of cartography Pat Seed, for example, is involved in efforts to define best metadata practices for maps and has noted that Dublin Core is far from sufficient for her purposes. Another common standard, the Open Archives Initiative, informs the metadata fields used by many advanced digital projects supporting historical and ethnographic research, but one researcher we interviewed suggested that the OAI standards are “out-of-date” and that “the standard uses older web technology and has not been updated or changed in quite some time… if I were to do it today, first we would want a separation of the data model and the data encoding. For example, many APIs allow you to get results back in JSON, XML, RSS, etc.” This WG will examine the value (and possible limits) of encouraging community-wide compliance with Dublin Core, OAI and many other standards, with a focus on building a list of suggested metadata fields based on an in-depth analysis of diverse empirical humanities research practices.
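
The interviewee's point about separating the data model from the data encoding can be illustrated with a minimal sketch. It is not drawn from OAI, Dublin Core or PECE: the record class, its field names and the sample values are all hypothetical and chosen only for illustration. The sketch defines one metadata record and then produces JSON and XML encodings of it without changing the model itself.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass
class ArtifactRecord:
    """Hypothetical data model for one archived artifact (illustrative fields only)."""
    title: str
    creator: str
    artifact_type: str
    date_created: str


def to_json(record: ArtifactRecord) -> str:
    """Encode the record as JSON; the model itself is untouched."""
    return json.dumps(asdict(record), sort_keys=True)


def to_xml(record: ArtifactRecord) -> str:
    """Encode the same record as XML; only the encoding differs."""
    root = ET.Element("artifact")
    for name, value in asdict(record).items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for the example.
record = ArtifactRecord(
    title="Interview 12",
    creator="Example Researcher",
    artifact_type="audio",
    date_created="2016-02-10",
)
json_text = to_json(record)
xml_text = to_xml(record)
```

Because both encoders read from the same record, adding or renaming a metadata field would be a change to the model alone, and each encoding would follow from it.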

We plan to partner with existing RDA Groups, such as the Metadata IG, the Metadata Standards Directory WG, the Research Data Provenance IG and the Engagement IG. Individual researchers and groups within the RDA working on linked data, preservation, persistent identifiers, dynamic data citation and the long tail of research data will also be key partners. These connections will be strengthened at and beyond the RDA Seventh Plenary.

Beyond the RDA, we will engage with institutions with widely respected standards (e.g. the Smithsonian), initiatives (e.g. Open Folklore) and publishing bodies with a digital presence (e.g. the Journal of Cultural Anthropology).

4. Work Plan

Work Plan Components

1.     Survey of relevant literature and projects in order to develop a list of interviewees and build initial use-case scenarios.

2.     Ethnographic interviews with researchers in history and ethnography on the types of data for which they need metadata practices, the scenarios in which they encounter metadata decisions and (with a focus on interviews and field notes) their practices.

3.     Drafting deliverables in order to codify metadata practices deemed “best” in the context of different scenarios.

4.     Facilitating uptake of deliverables, initially with researchers using PECE.

5.     Reporting on lessons learned in initial uptake, and working with the DPHE-IG to ensure sustainability and evolution of the deliverables and their uptake.

6.     Promoting the deliverables from this WG within the RDA and beyond.

 

WG Operation

The initial core members of this WG will meet regularly to ensure continual development towards the proposed deliverables. Many of the initial WG members have a well-established working relationship, with a record of collaborative peer-reviewed publications and presentations (at the American Anthropological Association, the Society for the Social Studies of Science, the Society for Cultural Anthropology and other conferences) as well as public dissemination of the results of their work (blog posts, university press releases, etc.). Differences of opinion and experience will be viewed as an asset within this WG, and will be resolved through communication and collaborative practice.

 

In the spirit of “user-centered design,” this WG will partner with developers of the PECE platform from the early stages to increase the likelihood of the deliverables meeting user needs. An ongoing series of “project shares” and “issues shares” hosted by the DPHE-IG will also provide frequent opportunities for members of this WG to envision how the deliverables could feed into a wide variety of digital humanities projects. This WG will be a vehicle for the broadly understood need for RDA to continue developing engagement with social science and humanities research communities.

 

Updates to (and input from) the broader community of RDA will be provided at plenaries every six months in the form of poster sessions, breakout groups and Birds of a Feather sessions.

 

Adoption Plan

Specific groups committed to taking up the deliverables of this WG include collaborative research projects on the PECE platform. Two instances of PECE – The Asthma Files (TAF-PECE) and The Disaster-STS Research Network (DSTS-PECE) – will provide initial venues for the implementation of the deliverables proposed here. Both have small but active, cooperative and growing user communities. TAF-PECE is a collaborative research project that currently has approximately ten consistent users in geographically distributed locations, along with many more student-researchers, all likely to be working on the platform on a daily basis. DSTS-PECE is an international research network that will be actively enrolling new members over the next twelve months, in groups of five to ten researchers; this incremental enrollment of new members will provide excellent opportunities to test and improve new, embedded metadata management policies. Technical implementations of new metadata policies in PECE will first run on a PECE test site, then be moved to the TAF and DSTS PECE instances. Embedded metadata policies will be part of the PECE GitHub release. We are aware that the RDA does not promote or endorse specific products and technologies; we aim to use PECE as an initial testing ground and then facilitate adoption (implementation of recommended metadata fields) within a number of additional projects.

One probable adopter of this WG’s outputs is the Northwest Knowledge Network (NKN), which began as a cyberinfrastructure provider for geospatial natural resource data, but is rapidly expanding to host and serve data to the public from a plethora of disciplines, including interviews with Native American communities, farmers and other natural resource stakeholders. They also partner with archaeologists who manage and share their data using the NKN data portal. Using our recommendations as a guide to the many metadata standards used in the empirical humanities, the NKN will be able to better serve their many partners with their diverse needs. We are in close communication with other researchers and project leads who have expressed great interest in integrating our deliverables into their projects, including Julia Collins (working with projects including the National Snow and Ice Data Center, which hosts projects such as Exchange for Local Observations and Knowledge of the Arctic), Jason Baird Jackson (Co-chair of the DPHE-IG and Director of the Mathers Museum of World Cultures at Indiana University), Jarita Holbrook (a co-chair of this proposed EHM WG) and other researchers working within AstroAnthro.net (an umbrella project focused on studying astrophysicists, their culture, diversity and their engagement with big data and big collaborations) and representatives from Digital Research Infrastructure for the Arts and Humanities (DARIAH).

We are in communication with early-career scholars who are interested in how smart metadata practices might affect their collection of born-digital data, for example by developing informed consent forms that allow their interlocutors to make a variety of choices about how interviews will be shared. We have also been in close communication with researchers, such as Sharon Traweek and Michael M.J. Fischer, who have considerable repositories of research material and are awaiting our WG deliverables in order to digitize their research collections and make them shareable.

We will continue to develop relationships with people and projects within and beyond the RDA and aim for a minimum of ten adopters. By connecting to researchers outside of the RDA, a tangential benefit of this WG will be to broaden RDA engagement, especially within the digital humanities. Our leadership currently includes representatives from three continents and we are interested in continuing to broaden the geographic (and other) diversity of our membership and will leverage our connections with the DPHE-IG to enrol Asian partners at the upcoming Plenary 7 meeting in Tokyo.

5. Initial Membership

Leadership (biographic notes in Appendix A)

●      Co-chair: Brandon Costelloe-Kuehn, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Co-chair: Dominic DiFranzo, University of Southampton, Southampton, UK

●      Co-chair: Jarita C. Holbrook, University of the Western Cape, Cape Town, South Africa

●      Co-chair: Lindsay Poirier, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Co-chair: Mike Fortun, Rensselaer Polytechnic Institute, Troy, NY, USA

Initial Members/Interested (based on prior discussions and involvement with the DPHE-IG)

●      Alison Kenner, Drexel University, Philadelphia, PA, USA

●      Brian Callahan, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Bridget Almas, Tufts University, Medford, MA, USA

●      Dan Price, University of Houston, Houston, TX, USA

●      Danah Tonne, Karlsruhe Institute of Technology, DARIAH, Germany

●      Ellen Foster, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Jason Baird Jackson, Indiana University, Bloomington, IN, USA

●      Kim Fortun, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Lindsay Poirier, Rensselaer Polytechnic Institute, Troy, NY, USA

●      Matthew Turner, Northwest Knowledge Network, University of Idaho, ID, USA

●      Rainer Stotzka, Karlsruhe Institute of Technology, DARIAH, Germany

●      Robert R. Downs, Columbia University, New York, NY, USA

●      Sharon Traweek, University of California, Los Angeles, CA, USA

●      Luis Felipe Rosado Murillo, Berkman Center for Internet and Society, Harvard University
 

6. References

Holstein, J.A. and Gubrium, J.F. 2004. “Context: Working it Up, Down and Across,” in C. Seale, G. Gobo, J.F. Gubrium and D. Silverman (eds) Qualitative Research Practice, London: Sage.

Keller, Evelyn Fox. 2002. Making Sense of Life: Explaining Biological Development with Models, Metaphors, and Machines. Cambridge, Massachusetts: Harvard University Press.

Turkle, Sherry, and Seymour Papert. 1990. “Epistemological Pluralism: Styles and Voices within the Computer Culture.” Signs 16 (1): 128–57.

Riley, Jenn. 2009. “Seeing Standards: A Visualization of the Metadata Universe,” available at http://www.dlib.indiana.edu/~jenlrile/metadatamap/ (Accessed 12/9/2015)

7. Appendix A: Leadership Biographical Notes

Brandon Costelloe-Kuehn is an anthropologist and Lecturer in Science & Technology Studies at Rensselaer Polytechnic Institute. Using multi-sited ethnographic methods, his research examines, and participates in, the design of innovative media systems to address the communication and collaboration challenges of politically and scientifically complex environmental issues. He works within a number of collaborative endeavors, including The Asthma Files, PECE and the Multispecies Salon. Brandon was awarded a Summer Internship and then an RDA/US Fellowship to develop the Metadata for Empirical Humanities WG proposal and contribute to a number of ongoing initiatives within the Digital Practices in History and Ethnography Interest Group.

Dominic DiFranzo is a Research Fellow with the Web and Internet Science Group at the University of Southampton in the UK. He currently works in the Engineering and Physical Sciences Research Council (EPSRC) funded project, SOCIAM, which involves researching the nature and development of social machines. His research involves collaborating with colleagues across the social sciences and humanities to translate the tools and methods from data science, e-science and informatics to address their research needs and purposes. This includes working with a wide array of research groups and projects including large-scale social network analysis, experimental ethnography, open government data and web observatories.  He holds a PhD in Computer Science from the Rensselaer Polytechnic Institute and was a member of the Tetherless World Constellation.

Jarita C. Holbrook is an Associate Professor of Physics at the University of the Western Cape, South Africa. She holds a doctorate in Astronomy & Astrophysics from the University of California, Santa Cruz. She was a postdoctoral fellow at the Center for the Cultural Studies of Science, Technology, and Medicine at UCLA, and the Max Planck Institute for the History of Science in Berlin, Germany. She is a cultural astronomer focusing on African indigenous astronomy, the culture of astrophysicists and practices of inclusion and exclusion. RDA DPHE-IG members Jarita Holbrook, Sharon Traweek, Luis Felipe Rosado Murillo, and Reynal Guillen are part of AstroAnthro.net, an umbrella project focused on studying astrophysicists, their culture, diversity and their engagement with big data and big collaborations. Of importance to the group is automating tools for data visualizations characterizing the content of interviews and publicly available data on individual astrophysicists.

 

Lindsay Poirier is a PhD student in the Science and Technology Studies department at Rensselaer Polytechnic Institute. For the past year, she has served as the lead platform architect for PECE - a role that involves translating the theoretical commitments of the empirical humanities into digital infrastructure. Her dissertation work draws on the history of artificial intelligence and leverages ethnographic methods to analyze the design and politics of the World Wide Web. She has contributed to a number of initiatives in the DPHE-IG.

 

Mike Fortun is a historian and anthropologist of the life sciences whose research has focused on the contemporary science, culture and political economy of genomics. His work has covered the policy, scientific and social history of the Human Genome Project in the U.S.; the growth of commercial genomics and bioinformatics in the speculative economies of the 1990s; and the emergence of transdisciplinary research programs in toxicogenomics, addiction and environmental health. Mike Fortun is a co-chair of the DPHE-IG and is a lead developer of PECE.

 

 


[1] http://rd-alliance.github.io/metadata-directory/subjects/

 

Review period start:
Wednesday, 10 February, 2016
Custom text:
Body:

Case Statement/Charter for the establishment of a Joint RDA/TDWG Working Group on Metadata Standards for attribution of physical and digital collections stewardship

Thessen, A.E., Woodburn M., Ariño A., Flann C., Nicolson N., Shorthouse D. and Koureas D.

1. WG Charter

This working group (WG) will address the incomplete standards for giving attribution for the maintenance, curation, and digitization of collections. Within the scope of the WG, collections can include digital data and digital/physical objects. This WG will produce use cases from a variety of disciplines that will be used to create the final deliverable – an attribution metadata schema. Adopters will include stewards of collections (such as the Natural History Museum London) and aggregators of professional research metrics (such as ImpactStory).

 

2. Value Proposition

Research collections are an important tool for understanding the Earth, its systems, and human interaction. These collections are very diverse and can include preserved natural history specimens, archeological artifacts, or historical documents, to name just a few. Maintaining and curating these collections requires a large investment of time and money by institutions and many individuals. Knowledge is created from collections by many individuals over time, building on the work of others. For maximum efficiency, work needs to be shared broadly, recorded permanently, and tasks not repeated unnecessarily. Unfortunately, the current research cyberinfrastructure does not support this level of efficiency. 

 

Despite the importance of collections, many are not maintained or curated as thoroughly as they should be. Part of the reason for this is the lack of professional reward for curatorial actions. Most of the researchers who are qualified to curate a collection are too busy performing activities that will reap professional reward, such as publication and grant writing. Proper methods of attribution (at the individual and institutional level) are very important for incentivizing digitization, mobilization, and sharing of data deriving from collections (physical and digital). One strategy for incentivizing physical and digital collection curation is to create infrastructure for attributing curatorial actions. Several programs exist for aggregating metrics for research products other than publication, such as ImpactStory, OpenVIVO, Collector, and Altmetrics. Thus, there is already infrastructure in place for aggregating these data, if the e-infrastructure for creation of these data is available.

 

Significant investment has been made in creating infrastructure components for data integration across a wide variety of disciplines. Many of these components are lists, repositories, or other structures that must be populated with data either by a person or algorithmically. Even an automatically-created data set will require some degree of human curation to ensure quality. Often, very little can be completed without initial work by a person to create reference material. This human component is a major bottleneck. As a result, existing infrastructure for collective resources is not being populated with data and is therefore not maximally useful. One way to widen the bottleneck is to create professional incentives for researchers to contribute to maintaining and curating collections. If people could get professional credit for improving a classification, for example, it would be much easier for them to dedicate the time required. The problem is that there is no good way to manage information about curatorial actions so that curators can get professional credit.

 

The goal of this WG is to develop an attribution data schema (in collaboration with adopters and with use cases from several disciplines) that can make getting credit for curation, maintenance, and digitization of a collection as easy as getting credit for a publication. The deliverables of this WG will benefit institutions that maintain collections and individuals who curate them and will lead to:

  • Improved recognition of the immense effort required for maintaining, curating, and sharing collections, which is likely to lead to increased funds for these activities
  • Increased efficiency in knowledge generation from collections through the proper documentation of corrections and analyses performed
  • Increased viability of crowdsourcing as a model for building collaborative research resources
  • Increased relevance of existing e-infrastructure that is being stifled by the expert annotation bottleneck

3. Engagement with existing work in the area

Most institutions that maintain collections of physical items employ, or are moving towards, a central Collections Management System (CMS) to support digital object curation. Certain information about personal contribution to digital activities can often be assembled from the generic database audit trail incorporated into these systems. However, the primary function of these structures is to support system and workflow requirements, so they are rarely able to provide complete and accurate attribution metadata, and can only reflect digital rather than physical effort. There is therefore a strong case for an attribution metadata schema which these systems could adopt as part of their data model and workflows. Several vocabularies have been developed specifically for recording provenance information (PROV-O), for recording information about physical samples (IGSN), and for describing contributor roles (TaDiRAH, CRediT, OpenRIF). In addition, several domain-specific standards provide methods for giving attribution for a physical object, data set, or data product (TDWG, ESIP, SESAR, CODATA, COPDESS, etc.). None of these standards provides a method for recording specific curatorial actions on a physical/digital object, digital data set, or data product. All of these existing standards provide pieces of a system that, with some additional work, could make attribution and professional reward for curatorial actions possible. This WG will strive to ensure interoperability between its recommendations and existing schemas.
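
To make the gap concrete, the following is a minimal sketch of what a record of a single curatorial action might contain and how such records could be tallied per contributor. It is an illustration only: it is not the schema this WG will deliver, it does not reproduce any of the vocabularies or standards named above, and every field name, identifier and action label in it is a hypothetical placeholder.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CuratorialAction:
    """Hypothetical record of one curatorial action (all fields are assumptions)."""
    agent_id: str   # identifier of the person or institution doing the work
    object_id: str  # identifier of the physical or digital object acted on
    action: str     # free-text label for the action, e.g. "digitized"
    date: str       # ISO 8601 date on which the action was performed


def actions_per_agent(actions: Iterable[CuratorialAction]) -> Counter:
    """Count recorded actions per agent, the kind of tally a metrics aggregator could use."""
    return Counter(a.agent_id for a in actions)


# Invented sample data; the identifiers do not refer to real people or objects.
log = [
    CuratorialAction("agent-001", "specimen-42", "digitized", "2016-03-01"),
    CuratorialAction("agent-001", "specimen-42", "corrected identification", "2016-03-04"),
    CuratorialAction("agent-002", "specimen-43", "digitized", "2016-03-05"),
]
tally = actions_per_agent(log)
```

The sketch records the action at the level of the individual object, which is the level at which the audit trails described above tend to be incomplete.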

 

The PID (Persistent Identifier) Collections WG, which currently has its case statement in review, is potentially very relevant to the work we propose. Briefly, that group will develop collection-level metadata and specifications for an API. Our WG will focus on metadata for individual objects within a collection rather than collection-level metadata, but we will collaborate with the PID Collections WG to ensure that our schemas are interoperable.

 

The Metadata Standards WG and Interest Group (IG) are very relevant to the goals of this proposed WG. We will provide a use case for these groups and align our schema with the proposed metadata elements.

 

One group with which we will be working very closely is IGSN, an organization that provides a unique identifier for physical samples. This group started working primarily with geological samples, but is now moving toward accommodating biological samples. We feel that their initial metadata schema is a good starting point.

 

Museums, repositories, and other stewards of collections are always working hard to maintain and curate their collections for maximum use. This WG will pursue these institutions as adopters and work closely with them, both to investigate the large-scale viability of solutions they have implemented and to ensure that the WG deliverables will be useful to them.

 

In order to have a true impact on the social aspect of professional reward, the WG deliverables need to ensure that data within the schema can be used by professional metrics aggregators such as ImpactStory. We will work closely with this project to make sure that their system can handle our products. One important difference between this WG and other efforts is the focus on outputs that result in actionable metrics.

 

4. Work plan

The work of this WG will be completed in 18 months. We will split the tasks and milestones into three concurrent work packages (WP).

 

Work package 1: Requirements (M1-M6)

  • Task 1.1. Develop use cases via WG contribution and community engagement

Milestone 1.1: Use cases report (M6)

 

Work package 2: Technical (M4-M16)

  • Task 2.1. Investigate existing schemas/infrastructure

Milestone 2.1: List of relevant references (M6)

  • Task 2.2. Develop attribution metadata standard and schema

Milestone 2.2.1. Draft attribution metadata standard and schema document (M12)

Milestone 2.2.2. Schema review with feedback from case studies (M16)

 

Work package 3: Community establishment (M6-M18)

  • Task 3.2. Initiate process suggesting the schema for ratification as a community standard through TDWG

Milestone 3.2. Process initiated, and acknowledged by TDWG (M17)

  • Task 3.1. Liaise with stakeholders and community actors to establish adoption plans through piloting actions

Milestone 3.1. Report potential adopters and planned actions (M18)

 

WG final deliverable

  • Final attribution metadata standard and schema document (M18)

 

5. Adoption plan

To ensure the eventual functionality and to maximize the usefulness of the schema, the WG will consult with stewards of physical specimens and aggregators of professional metrics. These collaborations will start from the very beginning of the WG. Potential adopters are categorised in the following groups:

  1. Data providers/Data stewards
  2. Aggregators/Repositories
  3. Publishers
  4. Scholarly metrics providers

The following organizations have from the outset expressed interest in the deliverables of the WG:

  • Natural History Museum London, UK (Data provider)
  • Royal Botanic Gardens, Kew, UK (Data provider)
  • Biodiversity Heritage Library (Data provider)
  • Pensoft (Publisher)
  • ImpactStory (Metrics provider)
  • Naturalis (Data provider)

6. Operational Policies

6.1. WG mode and frequency of operation

 

This WG will hold in-person meetings at RDA plenaries as well as TDWG meetings. Virtual meetings will also be held every month; they will be recorded and posted for interested parties who could not attend. Every three months a short report on activity will be requested by the WP leaders and circulated to all members of the WG. All WPs will be supported through a wiki, a developer forum, and mailing lists.

 

6.2. Plans to develop consensus, address conflicts, and stay on track

 

All meetings will be kept on track by having an agenda, action items, and deadlines for those action items. The deadlines will not be flexible. If substantial discussion is still open as a deadline approaches, the state of the discussion will be reported in the corresponding deliverable. Consensus will be reached via open discussion and voting as appropriate. It is the responsibility of the WG leaders to build consensus through structured moderation. If a conflict cannot be resolved within the WG, the RDA Council will be consulted and an independent party will be brought in to mediate. The WG will avoid mission creep by sticking to the project plan as outlined above. Appointed moderators and WG leaders will enforce focused discussion by, for example, splitting forum threads as appropriate.

 

6.3. Broader community engagement and participation plan

 

This WG will hold working meetings and joint meetings at every RDA plenary. The monthly meetings will be open to any interested party regardless of WG membership. Notes, slides, and recorded meetings will be made available on the RDA website. The wikis and forums will be open.

 

7. Initial membership

No. | Name | Affiliation | Country | Role
1. | Agosti D. | Plazi | CH |
2. | Ariño A.H. | University of Navarra | SP |
3. | Flann C. | Species 2000 | NL |
4. | Koureas D.N. | Natural History Museum London | UK | co-chair
5. | Miller C. | | |
6. | Nicolson N. | Royal Botanic Gardens, Kew | UK |
7. | Penev L. | Pensoft | BG |
8. | Piwowar H. | ImpactStory | US |
9. | Priem J. | ImpactStory | US |
10. | Pyle R. | | |
11. | Schentz H. | | |
12. | Shorthouse D. | Université de Montréal | CA |
13. | Thessen A.E. | Ronin Institute for Independent Scholarship and The Data Detektiv | US | co-chair
14. | Patterson D.J. | Plazi | AU |
15. | Woodburn M.S. | Natural History Museum London | UK | co-chair
16. | Kersten Lehnert | Columbia University | US |
17. | Wouter Addink | Naturalis | NL |
18. | Stacy Konkiel | Altmetrics | US |


8. Co-chairs

The WG will initially be co-chaired by Anne Thessen, Matt Woodburn, and Dimitris Koureas.

At the group's kick-off meeting the list of co-chairs will be revisited to ensure balance between technical, outreach and implementation aspects.

 

Acronyms

TDWG - Biodiversity Information Standards
ESIP - Earth Science Information Partners
IGSN - International Geo Sample Number
CODATA - Committee on Data for Science and Technology
COPDESS - Committee on Publishing Data in Earth and Space Science
SESAR - System for Earth Sample Registration

 

Review period start:
Friday, 5 February, 2016
Custom text:
