See revised Case Statement V2 and TAB review response in attached documents.




Proposal for ICSU-WDS/RDA WG on a Framework and Registry for Brokering and Mediation Components

Mediation and Brokering

In an SOA framework, interoperability between software components is achieved by defining and using common protocols. In a web services framework, interoperability protocols are characterized by their interface methods and bindings as well as their payload content. For example, in the case of protocols for data discovery and access, the payload content contains data and metadata encoded using specific models.

Mediation and adaptation modules are often used to map between two different content models, two different interface methods, or two different binding types. Commonly, a mediation module addresses one feature in which two protocols differ, e.g. the payload content model.

Brokering services can be used to implement more advanced and general mediation functionalities. Brokering components address all three protocol heterogeneities: methods, bindings, and payload content models. In addition, they implement many-to-many mediation between different protocols.

In the geospatial information domain, brokering services were successfully introduced to mediate across the protocols of different disciplines and communities for data discovery, access, transformation and (in the most advanced cases) processing.
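
The distinction drawn above between a single-feature mediation module and a many-to-many broker can be illustrated with a minimal sketch. Everything named here (the `Protocol` fields, the `ModelA`/`ModelB` content models, the field mapping, and the `Broker` class) is hypothetical and chosen only for illustration; it is not an existing brokering framework or a WG deliverable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Payload = Dict[str, str]
Mediator = Callable[[Payload], Payload]


@dataclass(frozen=True)
class Protocol:
    """A protocol described by the three features named in the text."""
    name: str
    method: str          # interface method (hypothetical value, e.g. "search")
    binding: str         # binding type (hypothetical value, e.g. "HTTP-GET")
    content_model: str   # payload content model (hypothetical, e.g. "ModelA")


def model_a_to_model_b(payload: Payload) -> Payload:
    """Single-feature mediation: rename ModelA fields to ModelB fields.

    The mapping is invented for this example; unmapped fields pass through.
    """
    mapping = {"title": "citation_title", "creator": "responsible_party"}
    return {mapping.get(key, key): value for key, value in payload.items()}


class Broker:
    """Many-to-many mediation: one mediator per (source, target) protocol pair."""

    def __init__(self) -> None:
        self._mediators: Dict[Tuple[str, str], Mediator] = {}

    def register(self, source: Protocol, target: Protocol, mediator: Mediator) -> None:
        self._mediators[(source.name, target.name)] = mediator

    def translate(self, source: Protocol, target: Protocol, payload: Payload) -> Payload:
        if source == target:
            return dict(payload)  # nothing to mediate
        try:
            mediator = self._mediators[(source.name, target.name)]
        except KeyError:
            raise LookupError(
                f"no mediator registered from {source.name} to {target.name}"
            ) from None
        return mediator(payload)
```

In this sketch the broker only dispatches on the protocol pair; a fuller broker would also translate methods and bindings, which are modelled here as descriptive fields only.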


Problem Description
The scope of the WG covers data sources from research and scholarly communication. Within this domain, the obstacles to wider application of brokering techniques are:

1.    Multiple service protocols for data discovery, access, and application or processing;
2.    Multiple content standards for data and metadata, augmented by community profiles and non-standard implementations;
3.    Multiple vocabularies and ontologies;
4.    Multiple adaptation and mediation modules that are not guaranteed to be compatible.

Research projects and research data infrastructure initiatives often solve problems associated with this diversity as a matter of course, but the knowledge gained and the components developed in the process are neither visible nor useful to others. Furthermore, project life cycle limitations lead to a lack of sustainability and to the loss of expertise, code, and infrastructure.


Address the Need
●    Define a description schema for services, vocabularies, ontologies, content standards, and adaptation components that allows services and clients to be matched, with a mediation component interposed if required.

●    Establish a prototype registry based on the above.

●    Describe a collection of existing mediation and adaptation components that can interoperate through well-defined existing interface specifications and applicable standards.

●    Create a mediation and adaptation components registry. The objective is to support implementation of a more general and agnostic mediation capability.

●    Define a test bed environment for testing interoperability of mediation alternatives leading to recommendations for application areas. The focus will be on metadata and data mediation across data systems that address different disciplines and scopes. 
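
The first items above (a description schema and a registry that matches services to clients, interposing a mediation component when required) can be sketched as follows. This is only an illustration under stated assumptions: the `Endpoint`, `MediationComponent`, and `Registry` classes and all field values are hypothetical and do not represent the schema the WG would define.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (service protocol, content standard, vocabulary): a hypothetical profile
Profile = Tuple[str, str, str]


@dataclass(frozen=True)
class Endpoint:
    """Description of a service or a client (hypothetical schema)."""
    name: str
    protocol: str
    content_standard: str
    vocabulary: str

    def profile(self) -> Profile:
        return (self.protocol, self.content_standard, self.vocabulary)


@dataclass(frozen=True)
class MediationComponent:
    """A registered component that adapts one profile to another."""
    name: str
    service_side: Profile  # profile it consumes from the service
    client_side: Profile   # profile it exposes to the client


@dataclass
class Registry:
    components: List[MediationComponent] = field(default_factory=list)

    def match(self, service: Endpoint, client: Endpoint) -> Optional[List[str]]:
        """Return the mediation components to interpose.

        [] means the pair interoperates directly; None means no match exists.
        """
        if service.profile() == client.profile():
            return []
        for component in self.components:
            if (component.service_side == service.profile()
                    and component.client_side == client.profile()):
                return [component.name]
        return None
```

This sketch matches at most one interposed component; chaining several components would require a graph search over the registered profiles.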


Mediation functions to be supported by the test bed:

1.    Data discovery and access protocols including harvesting and synchronous distribution (subscription/notification).
2.    Data and metadata content transformation (harmonisation).
3.    Metadata content enhancement and Linked Open Data enablement through vocabularies and ontologies.
4.    Application to popular protocols and service definitions.

Read the full case statement 

Review period start:
Thursday, 4 February, 2016
Custom text:


Updated with TAB feedback, 21 June 2016




Given the increased value of research data as a national research asset, many countries are establishing national and regional level support for research data. This interest group targets such national/regional services with broad cross-discipline scope¹.


For members of the RDA community, “how to effectively form and operate such national data services?” is an important question that this interest group aims to shed light on.


For the purposes of this interest group “data services” are broadly inclusive of many possible services such as national data storage services, computational services, analysis services, discovery services, identifier services, support and training services, etc.


The word “national” too is not meant to exclude regional or partially national approaches. The IG is inclusive of and acknowledges a variety of configurations, consortia, and approaches to delivering national multi-discipline data services. Indeed, documenting the various types and configurations of national data services is one of the objectives of the group.


This interest group is intended as a peer support mechanism for exchanging ideas and information between national data services. The peers in this group can learn from each other’s experiences and form task forces on specific issues that can potentially lead to new Working Groups in RDA.


By allowing national data services to learn from each other's experience, the desired outcome is to increase efficiency for the individual operators of national data services and to promote a more coherent, joined-up community of practitioners.


User scenario(s) or use case(s) the IG wishes to address

User story:  “I am part of an organisation that aspires to collaborate and form a consortium with other organisations in my country to receive national research infrastructure funds to provide national data services which will allow researchers in our country to access national-scale services and participate in global initiatives.  Who else has done this elsewhere? How can I learn from what other jurisdictions have tried?  What services can we fruitfully offer? Can we leverage work elsewhere?  Can we solve common issues jointly?”


There are a variety of suggested topics that have arisen from discussions with various potential members of this IG, including:

  • Exploring how one might set up a national data service

  • What kind of collaboration, consortia, and governance models do national data services use to organise themselves?

  • What services do national data services typically provide? e.g. data curation, data storage, data computation, data management, data discovery, data registration, training and education, national data policy advocacy.

  • Technical and social challenges (and solutions) for national data services.

  • Business procedures of national data services



This group aims to:

  • Exchange information among national data services

  • Establish networks among national data services

  • Identify opportunities for collaborative action


The group aims to help aspiring national data services to benefit from the experience of their peers in other jurisdictions. There is no other peer network specifically for national data services outside or inside RDA. The IG will also provide a useful forum for national data service providers as a group to identify common issues to address jointly.


The group will also help RDA achieve its objectives by investigating how national data services can adopt or promote RDA outputs or, in broader terms, help achieve interoperability.



Group membership would be open to those providing national data services or planning to do so.


This IG would interact with other working groups to create synergies and reduce duplication of effort.  The IG has already organised joint sessions with both the Domain Repositories IG, and the Data Fabric IG to coordinate and jointly address shared concerns.  The Data Fabric IG envisages many of the elements of the “data fabric” being delivered through “national/regional data centres”.  Domain repositories share some of the challenges of building national data services.


The proposed IG will also leverage the work of many RDA working groups, validating and applying their outputs through the medium of national data services. For example, the proposed IG will investigate how the outputs of the PID IG can be applied by national-level identifier services (e.g. through national DataCite registration agencies or national ORCID consortia).


The group will actively pursue participation of national data centres that hold large scientific datasets (such as the National Computational Infrastructure in Australia and the equivalent services in other jurisdictions).



  • Natural typology of national data services via the community wiki output:

    • what services/activities do national data services offer?

    • what organisational models do they use?

  • Information exchange sessions at plenaries covering specific topics

  • Discussion forums and informal mentoring networks

  • Identification and definition of joint projects, task forces, or working groups




The group will run sessions at plenaries 7, 8, 9, and 10. Between plenaries, monthly virtual meetings will alternate at times that are convenient for both western and eastern hemispheres.



  • 6 months: Finalise framework to document and categorise:

    • services typically offered by national data services

    • organisational models used to build national scale services

  • 12 months: Create a list of common issues, challenges, and opportunities

  • 18 months: Propose joint activities for informal collaborations or relevant WGs


Potential Group Members


Co-Chairs: Kevin Ashley (DCC), Adrian Burton (ANDS)


The following organisations have expressed an interest in contributing to the establishment of such a group:

  • Australian National Data Service

  • JISC

  • DCC

  • eScience China

  • Portage/Research Data Canada

  • DANS

  • OpenAIRE

  • US NDS


  • NeSI - New Zealand eScience Infrastructure

Current members of the RDA NDS IG site:

Adrian Burton

Amir Aryani

Antonio Jesús Sánchez Padial

Armin Straube

Ayla Stein

Christopher Brown

Damien Lecarpentier

David Medyckyj-Scott

Dejan Vitlacil

Elaine Sedenberg

Eva Méndez

Kathrin Beck

Kevin Ashley

Lesley Wyborn

Linda Naughton

Mark van de Sanden

Natalia Manola

Nick Jones

Rebecca Grant

Solomon Mekonnen Tekle

Stefanie Kethers

Timea Biro

Veerle Van den Eynden

Ville Tenhunen

Wojtek Sylwestrzak


Other members of the RDA community who have contributed to or have been consulted in the course of building of this IG:

Giuseppe Fiameni

John Towns

Ed Seidel

Bob Hanisch

Rob Pennington

Nick Jones

Kathy Fontaine

Ingrid Dillo

Peter Doorn

George Alter

Ruth Duerr



1 Specific domain-focused repositories are the subject of a complementary interest group, “Domain Repositories IG”. The proposed NDS IG focuses rather on more generic, multi-domain national-level services and on a potentially much broader set of services than “repository” (i.e. computation, analysis, integration, identifier, legal, training, policy etc). Nevertheless, the two groups are well aligned and, where appropriate, work together on shared issues; a joint session has already been held at the Paris Plenary.



Review period start:
Tuesday, 19 January, 2016
Custom text:

RDA Health Data Interest Group Charter - Revised Charter taking into account the TAB recommendations


Name of Proposed Interest Group: Health Data Interest Group


Introduction (A brief articulation of what issues the IG will address, how this IG is aligned with the RDA mission, and how this IG would be a value-added contribution to the RDA community):

Here follows a Proposal for an Interest Group on “Health Data” (HD-IG), as a long-term initiative in the framework of RDA. It follows a rather successful BoF Session during the 6th RDA Plenary Meeting in Paris, which was attended by over 35 researchers and professionals from diverse backgrounds, who discussed several relevant issues, expressed significant interest in forming the proposed Interest Group, and helped shape its focus as presented here. The Interest Group will fill a gap in the RDA subject map formed by its current WGs and IGs, as at the moment, there is no RDA group focusing on the intricacies of Health Data, especially as it relates to privacy and security issues in Healthcare. Establishment of this IG will also enrich the set of communities involved in and contributing to RDA, as there are several professions as well as research disciplines that revolve around Health Data.

This proposal to form the HD-IG is rooted in a long series of European, international, and national projects in the area of biomedical informatics, in which the proposers have been involved in the past decade. These include the projects Health-e-Child, Sim-e-Child, MD-Paedigree, p-medicine, Cardioproof, Avicenna and others.
Different techniques of de-identification were adopted (pseudonymisation and anonymisation) and ad-hoc privacy guidelines were developed during these projects, not only to meet the requirements of the legislation in force but also to face future challenges in the possible exploitation of the projects. The scientific and practitioner community developed during these projects is quite extensive, and several of its members are expected to join the HD-IG.
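
The difference between the two de-identification techniques named above can be sketched in code. This is an illustrative example only, not the method used in the projects listed: the field names, the choice of a keyed HMAC-SHA256 token, and the set of direct identifiers are all assumptions. Pseudonymisation replaces an identifier with a stable token that the key holder can reproduce, while the simple anonymisation shown here drops direct identifiers entirely.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"patient_id", "name"}


def pseudonymise(record: Dict[str, str], secret_key: bytes) -> Dict[str, str]:
    """Replace patient_id with a keyed token and drop the name.

    The same patient_id and key always give the same token, so records
    belonging to one patient can still be linked without exposing the id.
    """
    token = hmac.new(
        secret_key, record["patient_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["patient_id"] = token
    return out


def anonymise(record: Dict[str, str]) -> Dict[str, str]:
    """Drop all direct identifiers; no linkable token is kept."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
```

Removing direct identifiers alone does not guarantee anonymity, since combinations of remaining attributes may still single out an individual; the sketch shows only the structural difference between the two approaches.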

User scenario(s) or use case(s) the IG wishes to address (what triggered the desire for this IG in the first place):

Data-based Healthcare characterizes a fundamental shift in the way biomedical data are collected and processed, as well as how biomedical research is performed. The application of data techniques in Healthcare will allow us to capitalise on growing patient and health system data availability and generate healthcare innovation. However, to bring about this revolution in healthcare there are legal, technical and cultural/societal barriers that must be overcome.
The proposed “Health Data” Interest Group (HD-IG) seeks to bring together stakeholders from all relevant sides and provide a forum for discussion on the specific issues that arise when using advanced data management and analytics techniques in a Healthcare setting, particularly (although not exclusively) focusing on the impact of privacy and security concerns.

Bottom-up (evidence-oriented) analysis, seeking to extract useful knowledge by mining the daily routine's streaming data, is of fundamental interest in model-guided personalized medicine. In this context, advanced techniques are applied aiming to identify latent factors (disease signatures) that can explain and predict variability in drug therapies and disease evolution, reveal similarities among patients stratifying patient groups and build patient specific simulation and prediction models. Such an approach goes beyond classical flat file data analysis, batch learning procedures, and simple data analysis techniques commonly focused only on a few variables of interest and a well specified dataset from a specific clinical trial.
On the contrary, Knowledge Discovery and Data Mining (KDD) platforms in this area should be able to handle massive volumes of uncertain, streaming heterogeneous biomedical data, to curate, validate and analyze them in an incremental/on-line fashion from multiple points of view and under different assumptions, as well as to include or exclude dimensions, combine different modalities and incorporate existing knowledge and previous beliefs, all while preserving the privacy of the patients whose data is being analysed.
The still growing potential of modern data management and analysis is today fully acknowledged, but it may remain partly undeveloped or lead to undesirable outcomes or misuses of data if not pursued in parallel with a deeper understanding of the regulatory and legal challenges it poses to patients’ privacy and data protection. At the same time, harming innovation and putting restrictions on research should be avoided. Indeed, as debates and proposals held in different countries show (such as on a “Magna Charta for Big Data”)[1], there is a societal need for a more adequate legislative framework for ethical leveraging of data applications, balancing the needs and rights of data providers and owners.

It must be remarked that privacy and regulatory issues related to the process of “data-intensive scientific discovery” have become a matter of special attention for the EU, in particular after the approval of the General Data Protection Regulation, which establishes an updated legal framework that still needs to be carefully analysed, with the aim of strengthening individuals’ trust and confidence in the digital environment and enhancing legal certainty.
The general debate on health data policies revolves around three core themes: the need to ensure that citizens’ data are adequately protected; the need for Open Access to data for research purposes; and the need to allow a Data Value industry to play a growing role in Health as well. As a consequence, it is necessary to strike the appropriate balance between individual privacy concerns in the healthcare setting and research purposes and innovation, which can greatly benefit patients.

Given the lack of international legal harmonisation and the different national implementations of data protection, different approaches and protocols will be adopted (many of them in accordance with HIPAA, which is still the largest de-identifying constraint expression). Comparing and discussing these approaches is a fundamental need for the improvement of data technology in Healthcare.
The HD-IG will provide its members with a forum to discuss and highlight the legal, technological, ethical and societal challenges to the adoption of advanced data management and analysis techniques in Healthcare, to exchange opinions and compare experiences, and form Working Groups to address these challenges.

Objectives (A specific set of focus areas for discussion, including use cases that pointed to the need for the IG in the first place.   Articulate how this group is different from other current activities inside or outside of RDA.):


The initial focus of the HD-IG includes the following areas:

  • Privacy and Security in Health Data
    • sharing best practice on pseudonymisation, anonymisation, differential privacy, homomorphic encryption and dedicated blockchain applications
    • developing models for dynamic consent that protect patients while enabling research
    • providing a forum for discussing, explaining and responding to data protection regulatory issues
  • Data-based Healthcare for Personalised Medicine
    • analytics applied to highly sensitive health data
    • disease signatures identification and stratification of patient groups
    • patient-specific simulation and prediction
    • exploring the potential of health data usage in in silico drug development and clinical trials
  • Health Data Organisations environment
    • defining a list of organisations dealing with Health Data to cluster liaisons with
    • disseminating HDIG results within other relevant Health Data organisations
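
Of the privacy techniques listed above, differential privacy is the easiest to illustrate briefly. The following is a minimal sketch of the standard Laplace mechanism applied to a counting query, assuming a sensitivity of 1 (adding or removing one patient changes the count by at most 1). The function names and record fields are hypothetical, and the sampling is not hardened against floating-point attacks, so it should not be read as a production implementation.

```python
import random
from typing import Callable, Dict, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(
    records: Iterable[Dict[str, int]],
    predicate: Callable[[Dict[str, int]], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a noisy count of records matching predicate.

    A counting query has sensitivity 1, so Laplace noise with scale
    1/epsilon is added. Smaller epsilon means more noise and more privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A caller might ask, for example, how many patients in a cohort are aged 65 or over with `private_count(cohort, lambda r: r["age"] >= 65, epsilon=0.5)`; repeated queries consume additional privacy budget, which this sketch does not track.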

Regarding other related WG/IG efforts in the area, the following points can be observed:
The discussions during the P6 BoF Session shifted its original focus from “Big Health Data” to simply “Health Data” and intensified the already high importance of privacy and security aspects in the field. This proposal reflects the conclusions of those discussions and concentrates on a rich set of issues that are critical for Health Data and are not covered by any currently active RDA IGs.
Other related WGs/IGs were invited to participate in the P7 BoF, to give short presentations on their focus and to discuss possible overlapping areas to be aware of, but also possible ways of joining efforts and actions on common issues.
In addition to the connections established at the P6 BoF Session in Paris with the following groups: Active Data Management Plans, Big Data, ELIXIR Bridging Force, Ethics and Social Aspects of Data, Long tail of research data, RDA/CODATA Legal Interoperability, and Structural Biology, connections were also established in Tokyo with two newly proposed Working Groups: the Data Security and Trust Working Group (WGDST) and the RDA/NISO Privacy Implications of Research Data Sets WG.
Furthermore, at the Tokyo BoF there were contributions from GeoHealth, i.e. the geospatial dimension in health data and related analyses; Population database - project HIVE, i.e. building a repository for the Gold Coast region in Queensland, starting from hospital data and linking to citizen data; Human stress management and monitoring – MindFlow; Clinical data publication guidelines - Nature - Scientific Data.
This IG is the only one dealing mainly with the vertical of Health Data, while other groups dealing with privacy, security and trust are horizontal with potential use cases from several areas.
Nevertheless, HD-IG will seek to pursue collaboration with those IGs that have affinity to aspects it will address, as well as with external organisations, such as VPH Institute (Virtual Physiological Human), National Association of Health Data Organizations (NAHDO), PerMed (Personalised Medicine), HIMSS (Healthcare Information and Management Systems Society), IMI (Innovative Medicines Initiative).

Participation (Address which communities will be involved, what skills or knowledge should they have, and how will you engage these communities.  Also address how this group proposes to coordinate its activity with relevant related groups.):

HD-IG is open to all RDA members to participate. Particularly, but not exclusively, HD-IG welcomes individuals with the following expertise to actively participate in its activities:

·      Clinicians wanting to use data technology to improve practice

·      Biomedical researchers using data heavy analytical techniques

·      Healthcare data analysts with data mining, machine learning, physiological modelling and image processing expertise

·      HPC and distributed computing experts

·      Policy-makers for Healthcare

·      Health bioinformatics legal experts

·      Healthcare administrators and Health Maintenance Organisations

·      Pharmaceutical industry researchers and manufacturers

·      Medical equipment researchers and manufacturers

·      In silico modelling, testing and clinical trial experts


A quick survey during the P6 and P7 BoF Sessions identified participants with all but the last expertise in the above list, an indication of both the diversity of relevant stakeholders and the strength of current interest in the focus areas of HD-IG.
Nevertheless, the HD-IG will endeavour to reach out to a large number of participants worldwide, focusing especially on those moderately involved in biomedical issues.
In view of P8, an ad-hoc communication strategy will be put in place, to engage experts and people involved in other Health data projects or in the external organizations listed above.

Outcomes (Discuss what the IG intends to accomplish.  Include examples of WG topics or supporting IG-level outputs that might lead to WGs later on.):

Outcomes are expected in the areas identified as part of the Objectives, namely:

  • Privacy and Security in Health Data
    • Best practices on pseudonymisation, anonymisation, differential privacy, homomorphic encryption and dedicated blockchain applications
    • Models for dynamic consent that protect patients while enabling research
    • Recommendations and standards on data protection regulatory issues
  • Data-based Healthcare for Personalised Medicine
    • Analytics applied to highly sensitive health data
    • Disease signatures identification and stratification of patient groups
    • Patient-specific simulation and prediction
    • Exploring the potential of health data usage in in silico drug development and clinical trials
  • Health Data Organisations environment
    • Identify a list of organisations dealing with Health Data to establish liaisons with
    • Address HDIG statement to relevant organisations to invite them to join forces in further developing the IG.

Mechanism (Describe how often your group will meet and how will you maintain momentum between Plenaries.):

The HDIG will meet at the next Plenary (P8) in Denver, Colorado.
At P9, HDIG will organise joint meetings with those related WGs/IGs that have shown interest in such a possibility at P8.
In between plenaries there will be at least one meeting every two months, via teleconference.

Timeline (Describe draft milestones and goals for the first 12 months):

We are looking forward to having the IG established before the 8th RDA Plenary Meeting in September 2016 in Denver, Colorado, so that we can widely publicise it and invite relevant experts to attend the first official meeting of the group there.
At P9, HDIG will have joint meetings with related WGs/IGs, depending on the interest of other groups in collaborating or working together.
The first outcomes will be presented after 12 months at the latest, taking into account the chosen prioritisation among the different areas.

- Avicenna in silico clinical trial
- IMI
- National Association of Health Data Organizations (NAHDO)
- Oxfam
- Pan American Health Organization (PAHO)
- PerMed
- Project HOPE (NGO, USA)
- United States Agency for International Development (USAID)
- VPH Institute
- WHO (World Health Organization)

[1] A “Magna Carta for Data” was proposed and discussed during a seminar hosted by the Insight Centre for Data Analytics titled “Insight: Frontier Data Analytics: Towards a Magna Carta for Data”, held in Brussels on 4 February 2015. The discussion document is available on the following link:


Review period start:
Friday, 8 January, 2016 to Friday, 5 February, 2016
Custom text:

Revised Charter

See link to Version 3 in attachments to this page below, updated as an Interest Group Charter.





[Old Case Statement]


Working Group Charter


This Working Group will explore issues related to scientific research data sets that contain human subject information, as well as related datasets that have the potential to be combined in a way that can expose private information. Over a period of 18 months, the group will develop a framework for how researchers and repositories should appropriately manage human-subject datasets, develop a metadata set to describe the privacy-related aspects of research datasets, compile a bibliography of related resources, and build awareness of the privacy implications of research-data sharing. While privacy is one part of the wider ethical, legal, and data-publishing issues surrounding data management, this working group is focused specifically on privacy-related concerns and will support, where appropriate, the related work of other RDA working groups.


The group will work to achieve the following specific outcomes:

  1. Development of a framework that explains, at a high level, the precautions that data creators, repositories, aggregators and scientists should use in creating, using, preserving, and providing access to research data.
  2. Definitions of key vectors where privacy issues are evident in the ecosystem of data sharing and reuse.
  3. An outline of situations where the privacy principles would be applied.
  4. Identification of key areas of variance in privacy laws or regulations at national and international levels that are significant when sharing data worldwide.
  5. Definition of a set of technical metadata that can be used to describe privacy-related information contained within a data set, parameters for use, and description of where it should be applied.
  6. Compilation and sharing of a bibliography of data-and-privacy-related materials for public use.
  7. Advancement of adoption of the principles through an outreach and communications campaign.


Value Proposition

The information that comes out of this project will enhance the privacy of people worldwide whose personal data become the subject of research, as well as offer guidelines to those involved with the collection, preservation, sharing, use, and re-use of that data. The latter group is very broad, as the number of fields that are using, or could potentially benefit from applying, human subject research data is tremendous. Medicine and psychology are obvious examples, but data science is also being integrated into other fields such as the humanities and social sciences. Disciplines have developed ethics for researchers in these situations and human-subjects review protocols ensure proper treatment of research subjects during studies, but no generalized guidelines yet exist for these same privacy issues in the deposit, preservation, and re-use of such datasets. Institutional Review Boards (IRBs) vet a variety of research processes, including data management and reuse policies, primarily within the United States, but even there a 2014 report by the National Academies recommended a number of changes to the IRB system, changes that will be informed by the recommendations produced by this working group. Other parties that might benefit from this effort are research funding bodies, governments, and academic data repositories.



1)  Improving the understanding of the privacy issues that relate specifically to research data from distinct stakeholder perspectives.

2)  Support of a worldwide dialogue about the privacy issues that surround the sharing, combination, and reuse of research data.

3)  Reduction of the risk of an unintentional release of personally identifiable information through the sharing or reuse of research data.

4)  Reduction, through the creation and adoption of this framework, of the potential risk to scientific discovery writ large that might be caused by the unintentional but significant exposure of personal data.

Engagement with existing work in the area 

NISO and RDA are both involved in related work. Related efforts are also being undertaken by outside organizations, and this working group will in some cases include individuals involved in those endeavors. In other cases, the results of the outside work being undertaken will be studied and, where applicable, applied to this project.


The following are some related projects. 


NISO has completed a project funded by the Mellon Foundation related to privacy of patron data in library, publisher, and software-provider systems. This effort created a high-level set of principles that will provide the scholarly communications community with a benchmark to relate to these issues. The principles were distributed and discussed in late 2015. While that project was explicitly focused on the U.S. market, and on publisher and library end-user services rather than on data, it is related to, and will inform, this work.


The Research Data Alliance has a number of groups that are exploring related issues as well. An interest group within the RDA is focused on Legal Interoperability for data sets. This group has been developing a core set of principles and guidelines that include best practices through which legal interoperability can be achieved. For human subject data, a core component of legal interoperability deals directly with privacy issues. 


A new RDA working group has formed that will explore security and trust as they relate to research data. The group, which has posted its case statement for comment, will be focused primarily on the technological aspects of security and trust building necessary to secure data that would be potentially injurious if released. Certainly, security is a component of protection of privacy and there are many examples of efforts to securely share information, although a significant portion of privacy-related issues are policy focused, not necessarily technology focused.

Yet another intersecting group within RDA is centered on the topic of Ethics and Social Aspects of Data (ESAD). This group is studying a broad set of issues surrounding data sharing and the culture of scientists. It is creating an annotated bibliography and plans to pursue two additional deliverables, producing educational materials and case studies of ethical dilemmas faced by researchers working with data. Privacy is among many concerns that the group is focused on, although many ethical issues extend well beyond privacy. Conversations between ESAD and this group have already begun and the efforts should be complementary.


Several additional working groups have similar connections to privacy. As this project develops, liaisons and points of contact with other groups will be explored and fostered. One such effort is the work by the Data & Society Institute and its Council for Big Data, Ethics, and Society (BDES). Data & Society is “a research institute in New York City that is focused on social, cultural, and ethical issues arising from data-centric technological development.” In 2015, the project produced a survey report entitled Human-Subjects Protections and Big Data: Open Questions and Changing Landscapes, which outlines some of the challenges related to scholarly data resources and privacy. In September of that year, BDES announced a new network to “facilitate information sharing, discussion, and community building among academics, practitioners, researchers, and others who seek to raise important questions, share opportunities, and ask for help navigating complex data ethics issues.” 


There is a significant project led by the Harvard School of Engineering and Applied Sciences entitled Privacy Tools for Sharing Research Data. This effort is part of a larger National Science Foundation Secure and Trustworthy Cyberspace Project that has received additional support from the Sloan Foundation and Google, Inc. That project’s goals are “to help enable the collection, analysis, and sharing of sensitive data while providing privacy for individual subjects.” A good deal of its work has centered on tools to support differential privacy risk assessments as a framework for decision making about the risks and controls necessary to support privacy. The group has developed open course materials, hosted seminars, and produced a variety of papers and presentations. It has also organized a public symposium, Privacy in a Networked World, hosted by the Harvard Institute for Applied Computational Science and held on Friday, January 23, 2015. The symposium included speakers Edward Snowden, Bruce Schneier, John DeLong, John Wilbanks, Lee Rainie, and Cynthia Dwork. This initiative is primarily, though not exclusively, focused on the technological and computational elements of privacy and data protection. NISO has worked closely with several members of this team and plans to include them in this working group.



Work Plan

The group will focus on world-wide legal frameworks and the impacts these frameworks have on data sharing, especially with human-subject data. After gathering these legal strictures and comparing the differences and similarities, the group will begin crafting a set of principles that will provide guidance to the researcher and repository communities on how to manage these data when they are received. Building on these, the group will craft a set of use cases on how the principles will be applied. After these elements are completed, an effort to advance the principles through promotion and community outreach will be developed and executed.                   


This working group will be open to all RDA and NISO members.

Mechanisms and Coordination
The group will meet face-to-face during the RDA Plenaries to craft its work plan, build interest and awareness, and to share its work with other working and interest groups regarding privacy and data sharing.

The group will meet virtually twice per month during the interim periods between plenaries. This will be undertaken using NISO’s virtual meeting service as well as using the RDA’s supporting services, such as the RDA website and wiki.

Meetings and group communications will be coordinated by NISO staff and the other co-chairs.

NISO is planning to host a public symposium on research data and privacy, in coordination with the RDA P8 in Denver, CO in the fall of 2016.

All documents related to this project will be publicly available on the RDA website and mirrored on the NISO website.

A timeline for the project is included within the full Case Statement.


For more details on the group timeline and potential group membership, download the full Case Statement. A revision to the full Case Statement has been posted.



Review period start:
Monday, 4 January, 2016

Global Water Information IG (GWIIG) Charter


Water data is being collected by multiple agencies and research groups around the world, and used across scientific disciplines and management projects. However, efficient sharing, discovery and re-use of water information are hindered by a number of issues. Key themes include development of standards for water data encoding and exchange, creating community models of hydrologic features, managing shared vocabularies of hydrologic terms, development of water data catalogs and data discovery mechanisms, introduction of persistent identifiers, and development of standard scalable mechanisms for linking hydrologic data with research publications. These themes are also actively discussed by the RDA community, as they are common across research domains and not confined to water and hydrology.

At the 6th RDA Plenary, a BoF session on global sharing of water data was held. The main outcome of the discussion was a recommendation to organize a water data interest group within the RDA framework. The BoF was attended by members of the OGC/WMO Hydrology Domain Working Group (HDWG), as well as other RDA members representing hydrology and related fields.

The participants agreed on the critical importance of considering global water data sharing issues in the context of RDA. A key additional issue extensively discussed at the session was the relationship between the new proposed interest group, and the HDWG, a joint working group of OGC and WMO, which has been the key platform for international water data standards activities since 2008. It was determined that opportunities in RDA complement the activities in OGC, and that an RDA interest group be formed. Nevertheless, there is an expectation that the membership, and possibly the leadership, will overlap significantly with the OGC/WMO group. Therefore, ‘dual branding’, or rather ‘triple branding’ (considering involvement of OGC, WMO and RDA), might be considered, with complementary activities in the different forums.

The proposed Global Water Information Interest Group (GWIIG) will serve as a platform leading to the formation of RDA working groups (the initial set of WG topics is outlined below); as a venue for communication among RDA WGs and IGs focused both on technical eInfrastructure aspects (e.g. brokering, metadata and catalogs, dynamic data citation) and on related research domains (e.g. geosciences, agriculture, biodiversity); and as a coordination mechanism across a wide group of water information stakeholders, in particular those not previously involved in the OGC/WMO water standardization work.

While benefitting from RDA’s focus on common eInfrastructure issues, the group will contribute to RDA its experience in developing and implementing standards-based water data exchange, develop use cases for water data use in other domains, and formulate technical requirements for metadata, identifiers, data management policies, catalogs, data citation, and other eInfrastructure components as derived from the practice of hydrologic data management and water informatics.

User Scenarios Addressed by the IG

One of the key areas of interest to IG members is the organization of a collection of water data-related use cases that can be used in a range of standards- and eInfrastructure-development contexts. Several initial use cases addressed by the group include: a) transboundary exchange of surface water and groundwater information, b) integration of standards-based water data exchange into infrastructure for supporting hydrologic forecasting, in particular in cases of disaster management and risk reduction, and c) cross-domain data integration, in agricultural and biodiversity applications.

Value proposition

Participation in RDA as an Interest Group would bring distinct value and expand the ongoing cross-country and cross-disciplinary water data integration and sharing effort within the OGC/WMO HDWG and other groups. In particular, it will help to:

a) facilitate dissemination and implementation of water data standards being developed through HDWG activities;

b) explore and compare eInfrastructures being developed for water data sharing, with related designs in other domains;

c) address issues that are currently outside of the scope of HDWG;

d) facilitate re-use of eInfrastructure foundation and components recommended by RDA or developed by RDA partners;

e) explore policies for water data sharing as related to the general policy agenda being investigated by RDA groups;

f) leverage RDA work on data publication and dynamic data citation (issues directly applicable to management of hydrologic time series);

g) make water data sharing a component of a global data sharing system within the RDA framework;

h) gain insights from experience and best practices of data systems in other domains, in particular as they address common pain points in eInfrastructure development;

i) coordinate with other RDA groups on addressing common deficiencies of existing data publication and exchange frameworks such as lack of policy level enforcement of a global water data exchange scheme, in the context of each country/continent having its own IT/political approach;

j) organize use cases and get feedback from water data users in other domains;

k) attract academic researchers working on water data sharing and related infrastructure issues, as their participation may be better recognized and aligned with the RDA framework rather than with OGC;

l) widen participation in global water eInfrastructure discussion by attracting members from regions not currently engaged in the development of water data exchange standards through HDWG, in particular from developing countries;

m) identify and coordinate requirements relating to water data standards from the wider research community, as input to the HDWG-led standardization process.

While having an umbrella agreement between RDA and OGC is desirable, the discussion at the BoF led to the decision that the IG should proceed without waiting for such institutional alignment to happen, and perhaps serve as a test case for shaping such a future agreement. This has the potential to bring additional value to both the RDA and the OGC and WMO communities.

Scope and Objectives

The Global Water Information IG (GWIIG) will focus on topics related to organization, representation, publication, sharing, discovery, re-use and integration of hydrologic information of different types, leveraging RDA eInfrastructure efforts and coordinating within RDA and with external groups. GWIIG will bring together key stakeholders working with water data, in particular those who are not actively participating in related international communities of practice, such as the OGC/WMO Hydrology Domain Working Group. It will actively engage with other RDA groups, both with technical and with domain focus, and explore best eInfrastructure practices developed within RDA or existing in other domains, as well as take stock of requirements for water information in other research and management domains. While the IG will provide a longer-term framework for discussion of water-related issues, it will also spin off Working Groups to address topics of interest as they emerge.


Communities involved in GWIIG will include: a) researchers and developers interested in standards for hydrologic data, mostly representing research staff of water data agencies from multiple countries; b) practitioners responsible for management of water observations, from governments, commercial companies, and public utilities; c) academic researchers interested in various aspects of water data management and sharing; d) hydrologic modelers and forecasters, from academia, government agencies and research centers; e) experts from related domains interested in using water information to support use cases in their domains; and f) technical experts and eInfrastructure developers interested in information models, identifier systems, dynamic data referencing, metadata, hydrologic feature and observation semantics and other specifics of water data. Communities not previously involved in OGC/WMO water standards work will be engaged through announcements of discussion topics, distributed by email to several target groups (e.g. US academic hydrologists via CUAHSI) and via the group’s web site; joint compilation of use cases; exploration of water-data related projects around the globe and analysis of best practices; and development of working group topics. Two of the initial co-chairs of GWIIG are also co-chairs of the OGC/WMO HDWG, which should make coordination between the two groups straightforward. GWIIG will specifically seek participation from countries in Asia, Africa and Latin America, developing countries in particular.

Activities and Outcomes

GWIIG will initially focus its activities on the following main areas:

●      Use cases related to water data sharing, in particular across different domains.

The group will develop a compendium of use cases, initially starting with use cases developed by HDWG members, and soliciting feedback from other RDA groups, in particular focused on agriculture, biodiversity, marine data, and earth systems science. Additional focus will be on water modeling and scalability of forecast models.

●      Components of eInfrastructure relevant to water information systems.

The focus of this work will be on recent RDA advances in data registries, metadata management, PIDs, data linking infrastructure, data management policies, and big data management. The work will be done in collaboration with respective RDA IGs and WGs.

●      Development of more complete information models and ontologies for the water domain.

This effort will leverage controlled vocabulary, ontology and information model registry work within RDA, and complement work being done within HDWG.

●      Promote water data standards developed within HDWG, explore interoperability with data exchange schemas in other domains, and facilitate global coordination across spatial data infrastructure efforts and stakeholders.

Better coordination between OGC and RDA activities is desirable, and interaction between GWIIG and HDWG will help better identify areas of focus and separation of concerns between the two communities. The developed arrangements can be scaled to the level of OGC-RDA interaction and will help define a formal relationship between the two organizations.

An example of such activity could be to make the standardization effort under way at OGC more visible within the research publication realm (for example, DOIs on OGC specifications, or publication of a ‘water collection standards’).

Additional partner organizations and projects in this effort will include: INSPIRE, W3C, GEO, WMO Commission on Hydrology, NSF EarthCube, ESIP, AGILE, CUAHSI, ESSI section of the AGU, and Belmont Forum.

●      Integrated data publication and dynamic data citation for water data.

This has been a key focus of RDA activities, but hasn’t been addressed within HDWG. Following RDA recommendations on data publication and citation will bring significant benefits to the water data research community.

The above topics will serve as the initial discussion foci within the IG, and potentially form the basis for RDA working groups.



The group will meet at RDA plenaries. Between plenaries, communication will take place via email and the web site, with occasional conference calls and webinars once topics and schedule are finalized.


Before the 7th RDA Plenary:

●      Finalize the charter, get the IG approved and endorsed

●      Coordinate meetings with several RDA IGs to define common interests and use cases; discuss with them a collection of use cases identified so far

●      Start developing communication materials, including GWIIG web site and discussion forum

●      Call for interested WGs to be set up


In the first 12 months:

●      Compile an initial set of use cases to guide GWIIG work and become the basis of WGs

●      Coordinate with relevant funded projects and ongoing eInfrastructure efforts focused on water data management and forecasting

●      Define criteria and start compilation of water eInfrastructure projects and best practices

●      Facilitate creation of first WGs, and help coordinate their work with other relevant RDA WGs

●      Contribute to RDA registry of metadata standards and other relevant registries


GWIIG will have at least three co-chairs, who will coordinate IG activities between RDA plenaries, organize IG sessions at RDA plenaries, and coordinate work with other RDA groups. It is expected that the IG leadership will partially overlap with leadership of the OGC/WMO HDWG, as these two groups are expected to coordinate their activities focused on standards development (HDWG) and on surrounding eInfrastructure technical and cross-domain issues (GWIIG). As specific development projects emerge and are organized as Working Groups, their leads will be selected from GWIIG members.

The initial co-chairs are:

Ilya Zaslavsky: Director, Spatial Information Systems Lab, San Diego Supercomputer Center, University of California San Diego, USA. HDWG Co-chair.

Sylvain Grellet: Scientific Program Coordinator, Scientific Information and Technologies Division of BRGM (Bureau de Recherches Géologiques et Minières), France.

Tony Boston: Branch Head, Environmental Information Management, Bureau of Meteorology, Australia. HDWG Co-chair.

Matthew Fry: Environmental Informatics Manager, Centre for Ecology & Hydrology, UK.

Potential Group Members


The following individuals, from Australia, Europe and North America, participated in the BOF session, supported GWIIG creation, and expressed their intent to participate in future GWIIG sessions and related working groups.




Bruce Simons: CSIRO, Australia

Paul Sheahan: BoM, Australia

David Maidment: UT Austin, USA

Alva Couch

Tony Boston: BoM, Australia

Ilya Zaslavsky

Matt Fry

Filip Kral

Alistair Ritchie: Landcare Research, New Zealand

Brian Gouge: Aquatic Informatics, Canada

Sylvain Grellet: BRGM, France

Quentain Groan: Botanic Garden Meuse, Belgium

Simon Cox: CSIRO, Australia

Ralf Busskamp: BfG, Germany

Varsha Khodiyar: Scientific Data (NPG), UK

Mike Brown

John Watkins

Jesus Marco

Jay Pearlman

Francoise Pearlman: IEEE, USA/France

Francois Robida: BRGM, France

Rufus Pollock: Open Knowledge, UK

Rick Lawford: Lawford Consulting, Canada/Japan

Toshio Koike: Univ of Tokyo, Japan

Rifat Hossain: WHO, Switzerland

Nguyen Hong Quan: Vietnam National University, Vietnam

Le Anh Tuan: CanTho University, Vietnam

Hesham Gaber: Library of Alexandria, Egypt

Salah Soliman: Library of Alexandria, Egypt

Mohamed Khalil: Cairo University, Egypt



Review period start:
Wednesday, 9 December, 2015

Introduction: We are proposing a Chemistry Research Data Interest Group under the auspices of the Research Data Alliance (RDA), to foster diverse professional exchange on issues particular to data originating from the field of chemistry. Chemistry, as one of the central sciences, has a fundamental impact on the fields of health, pharmaceuticals, materials, energy and many other applied sciences. There is a wealth of chemical data in various heterogeneous formats, distributed across a myriad of systems, with great potential for reuse in chemistry research and many related domains. However, many social, technical and administrative factors have limited the opportunities for open sharing and interoperable exchange.

The high reuse value of chemical information has sparked decades of innovative technologies addressing various challenges in handling chemistry-specific data, but very few approaches have persisted, are extensible beyond specific data types and/or are operable at scale. There is a demonstrable need for coordinated development of updated and scaled infrastructures, hard and soft, for enabling chemical data exchange and connecting data providers with data users across sources and applications. The RDA mission is to build the social and technical bridges that enable open sharing of data. Organizing a forum for professional exchange directed at addressing opportunities and challenges for chemistry data management within the RDA framework will support international participation across a broad range of stakeholders and foster connections with data types and user scenarios in many disciplines. Bringing in IUPAC (International Union of Pure & Applied Chemistry) as a co-sponsor of the group would clearly bridge the activities of this group between those of RDA and the responsible standards body for chemistry.

User scenario(s) or use case(s) the IG wishes to address: In response to many scientific, technical, and socioeconomic drivers, research chemists, chemical educators and chemical information specialists are recognizing the necessity to move forward with infrastructures, best practices, and cultural shifts to support consistent data management and sharing practices. Research funding agencies are increasingly requiring openly accessible research data and are looking to the scientific research communities to develop domain-appropriate criteria. Professional societies recognize the benefits in encouraging chemistry professionals to be experts in handling electronic data and documentation, and supporting these skills in professional education. Increasing opportunities for low-barrier technical solutions are opening up the market for electronic based information and data flow through electronic notebooks, automated data collection and analysis, data repositories and citation networks.

The importance of chemical data has long been recognized by science communities, as reflected in centuries-old efforts in indexing and repackaging chemical data from primary literature into expansive collections that support innovation across many disciplines. However, there are many challenges in meeting increasing demands for open research data deposit and in maximizing machine-operable data exchange. Working with chemistry research data often involves extensive consideration of contextual factors and layers of interpretive technologies. Divergent high-touch workflows have evolved to manage data in the existing collections. Long traditions of small laboratory culture and strong proprietary and commercial value impact the overall adoption and incorporation of open data exchange and high performance computing directly in research chemistry outside of a few sub-disciplines (e.g. drug discovery). As already experienced in many networking venues among chemical information professionals, an international Interest Group that spans a range of professional perspectives and expertise can provide a much needed opportunity for fostering convergent and informed discussions.

Objectives: At some level, chemistry information is ubiquitous in every wet science laboratory and in many theoretical research problems as well. The high value and wide applicability of chemical data have ensured a landscape of numerous and scattered, thoughtful and variously adopted “best practices”. Many venerable research and scientific publishing institutions and disciplinary data projects are involved in reviewing and managing data of high utility and have influential roles in long-standing community standards of practice around data use in the discipline. To maximize the knowledge potential of the discipline, we are interested in approaching the functionality of data from several angles, including domain scope, infrastructure, and community practice. Specifically we propose to:

  1. Characterize different chemical data types of interest, identify critical points in the data life-cycle from instrument to publication, compile data management criteria in practice, map gaps in interoperability and opportunity potential for standards and other infrastructures, and prioritize outreach approaches and tools for researchers, primary publishers, data compilers, and others who manage chemistry research data.
  2. Leverage effort from all parties to establish metadata standards, ontologies and other soft infrastructures for chemical data that are adaptable for different application purposes.
  3. Examine current research workflows in various research domains that interact with chemical data to support minimal disruption, encourage development of best practices and lower barriers to adoption. Particular attention will be given to engaging instrument manufacturers in the discussions, as they represent a good target to reduce the barriers to storing both data and metadata early in the research workflow.
  4. Cultivate sharing culture among researchers working in chemistry related fields by demonstrating potential innovations based on reusable chemical data.

Participation: There is increasing interest within RDA to engage with domain-based initiatives and data-driven organizations. The International Union of Pure and Applied Chemistry (IUPAC) is a long-standing professional international organization with vested interest in supporting broad dissemination and usability of chemical data through development of standards and recommended practices. IUPAC engages members from adhering organizations in over 50 countries and is associated with over 30 international scientific organizations. Positioning this initiative as a joint RDA/IUPAC interest group will enable us to leverage the mechanisms and infrastructure of both international working member organizations to facilitate global input, dissemination and practical implementation of initiatives.

Potential Interest Group members hail from a range of professions and sectors that intersect chemistry research data, including experimental and theoretical researchers, educators, data and information scientists, librarians, publishers, database providers, and many others in academic, industrial, private and public sectors worldwide. Many are active in professional groups with expertise in chemistry data, including the American Chemical Society (ACS) Division of Chemical Information (CINF), the Royal Society of Chemistry (RSC) Chemical Information and Computer Applications Group (CICAG), the Chemical Structure Association (CSA Trust), the German Chemical Society, the Chemical Society of Japan, and the Chinese Chemical Society, among others. Opportunities exist to participate regularly in the technical programming and social networks of these organizations to further engage chemistry researchers and information professionals.

Outcomes: Understanding current data management practices (in the broadest sense) and perceived gaps across the chemistry discipline is key for targeted action. Suggested documentation projects of potential interest and value for the community include:

  1. Collect top five priority outcomes from members with rationale; from these identify commonalities and diversities for the group and the chemical information and data management professions writ large; collect at discussion events and consider a survey question for new members
  2. Identify and characterize existing systems and solutions relevant to chemistry and the interests that arise in the survey, including existing disciplinary data repositories, ontologies, and other community data projects
  3. Identify and compare funding agency requirements internationally that potentially involve chemistry data
  4. Determine what chemistry analysis instruments already do when capturing data: which file formats are in use, what metadata commonalities and differences exist, and what proprietary data and format issues arise
  5. Survey top chemistry publishers on the various types of data that are included with manuscripts as supplemental information
  6. Others as they arise from discussion

Mechanism: Discussions of interest initiated at the ACS meetings in March 2015 and August 2015 sparked a proposal for a BoF session at the September RDA Plenary in Paris to seek input on formulating a mechanism for a group. Further international outreach is planned through meetings and technical symposia at the multinational chemical societies’ Pacifichem meeting in December 2015 and the ACS meeting in March 2016. Additional programming will be proposed with other societies and meetings. Monthly virtual meetings and regular inclusive communication channels will be established in the fall. Additional meetings focused on specific outcomes will be scheduled as needed.


Outreach – first 6 months

  1. Discussion with the IUPAC Committee on Publications and Cheminformatics Data Standards – August 2015
  2. BoF session at the RDA Plenary in Paris – September 2015
  3. Reach out to other pertinent RDA groups, such as the RDA/CODATA Materials Data, Infrastructure & Interoperability IG, the Data Citation IG, and others - start at the Paris Plenary, September 2015
  4. Establish communication structure – Fall 2015
  5. Conduct outreach and grow the group member list through planned symposia and networking that highlight a broad range of data initiatives at various other domain meetings – ongoing, started March 2015
  6. Continued brainstorming for issues and outcomes of potential interest for further discussion – ongoing, started March 2015

Roadmap – second 6 months

  1. Focus on 3-5 documentation activities in the first year, primarily focusing on professional and scientific community information gathering to develop a roadmap of challenges and opportunities for chemistry data management
  2. Identify deliverables and establish working groups by the end of the year for 1-3 problems in the community for the common good of all / most stakeholders

Potential Group Members:







Northwestern University, US   



National Institutes of Health (NIH), US



Cambridge Crystallographic Data Centre (CCDC), UK



University of North Florida, US, convening organizer



Molecular Materials Informatics, Canada



University of Southampton, UK



University of Pennsylvania, US



Cambridge Crystallographic Data Centre (CCDC), UK



University of Southampton, UK



Princeton University, US



University of New South Wales, Australia, IUPAC Division V



Royal Society of Chemistry (RSC), UK



Sandia National Laboratories, US



University of Michigan, US






National Institute of Standards and Technology (NIST), US



American Chemical Society (ACS), US



Cornell University, US, convening organizer



Chemical Semantics, Poland



Cambridge Crystallographic Data Centre (CCDC), UK



University of Oregon, US



Environmental Protection Agency (EPA), US



Ohio University, US

NOTE: The convening group is actively pursuing a number of outreach opportunities through connections with IUPAC and other Chemistry Societies to expand membership globally and engage experts across professional and industrial sectors, including research & development, manufacturing & distribution, education, and regulation.

Review period start:
Friday, 30 October, 2015 to Monday, 30 November, 2015
The Array Database Assessment WG will inspect the emerging technology of Array Databases to provide support for technologists and decision makers considering Big Data services in academic and industrial environments (such as in large-scale data centers) by establishing best-practice guidelines on how to optimally serve multi-dimensional gridded Big Data through Array Databases. This will be accomplished through a neutral, thorough hands-on evaluation assessing available Array Database systems and comparable technology
  • based on relevant standards, such as the NIST Big Data Reference Architecture, ISO “Array SQL”, and OGC Web Coverage Processing Service (WCPS) for the geo domain;
  • comparing technical criteria like functionality, thereby eliciting the state of the art;
  • establishing a combination of domain-driven and domain-neutral benchmarks to be run on each platform;
  • as well as real-life, publicly accessible deployments at scale.
The result, consisting of the ADA-WG report together with the open-source benchmarking software and the services established, will provide a hitherto unavailable overview of the state of the art and best use of Array Databases in science, engineering, and beyond.
Review period start:
Monday, 19 October, 2015


PID Collections WG charter

WG candidate co-chairs: Bridget Almas (Tufts/Perseus DL), Tobias Weigel (DKRZ), Tom Zastrow (RZG)

Value Proposition

Several communities have expressed a need to leverage aggregations of objects, with a particular focus on building such aggregations, whether virtual or physical, through PIDs and on providing identifiers for aggregation objects. There is, however, no unified cross-community approach to building and managing such collections and no common model for understanding them. The PID Information Types WG has defined a core model and the central interface for accessing object state information and provided a small number of example types, which were subsequently registered in the Data Type Registry WG prototype. With these tools available to describe essential object information, collections can be described so as to deal with more than a single object at once.

Building collections within diverse domains and then sharing or expanding them across disciplines should enable common tools for end-users and e-infrastructure providers. Individual disciplinary communities can directly benefit if such tools are made widely available, and cross-community data sharing can benefit from increased unification between collection models and implementations. PID providers may benefit from marketing additional services on collections.

Engagement With Existing Work

The WG will examine existing models for identifying and managing collections to surface commonalities and differences across models and to ensure that the output of the WG is general enough to work with these standards. Specific standards that we will investigate include the IETF BagIt Draft specification[1], the CITE Collection Services protocol[2] and OAI-ORE[3]. It is not the intent of the working group to propose an alternative to existing well-established standards for describing and archiving collections, but rather to propose an API and implementation for creation, consumption, distribution and citation of collections and their items that could serve as a unifying layer on top of the existing models.

The WG will observe other developments within and outside of RDA such as the ongoing Type Registry work and similar typing efforts. The later phases of the WG effort may also coincide with concerns within the EUDAT2 project. The notion of collections has also been included in the first model discussions of the Data Fabric IG, and the WG will contribute to these discussions.

Goals and work plan

The WG will start with an assessment of community use cases; some first examples are given further below. From the use cases, a classification scheme or general model should be developed that explains the different approaches and understandings in describing collections, including aspects such as static and dynamic collections. Another important model to recognize during WG work is that of collections based on file system directories, as these represent today’s most common approach to organizing data. Eventually, such models may contribute to a view where digital objects and collections become the equivalent of traditional files and directories.

For a selection of the use cases, the respective collection models should be expressed through PID types and these types should be registered. Other relevant candidate types that go beyond core collection concerns may be discussed as well. Discussions may also cover other methods to relate objects to each other in general object or identifier graphs, building on prior work e.g. in the context of RDF/OWL or FRBR. As part of this discussion, the role of identifier fragments and queries in the collection models should be clarified, and models for fragment services should be discussed. The selected use cases then feed into the formulation of a generic collection API, extending and unifying existing solutions (e.g. from CLARIN or OAI-ORE). Possible themes for the API also include methods to differentiate between nodes and leaves, supported by specialized PID types, and to offer iteration and traversal operations. With respect to such a unifying API and the community use cases, added-value tools should be discussed that offer direct benefits to community end-users. The collection API should be implemented in a small demonstrator project which may also illustrate some tool ideas. To work across identifier systems, the demonstrator should make extensive use of the PID Information Types API. The most essential typing mechanisms that can be used to implement collections should be registered in a Type Registry.
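As an illustration of the node/leaf distinction and the traversal operations mentioned above, the following is a minimal sketch only. The record structure, the type labels "collection" and "object", and the example identifiers are assumptions made for illustration; they are not taken from the PID Information Types API or from any WG output.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class PidRecord:
    """Hypothetical, minimal PID record: an identifier plus a type label."""
    pid: str
    kind: str  # "collection" (node) or "object" (leaf); illustrative labels
    members: List[str] = field(default_factory=list)  # member PIDs, nodes only


def iter_leaves(registry: Dict[str, PidRecord], root: str) -> Iterator[str]:
    """Depth-first traversal yielding each leaf PID reachable from root once.

    A visited set guards against cycles and repeated members, since
    collections may be graph-like rather than strictly hierarchical.
    """
    seen = set()
    stack = [root]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        record = registry[pid]
        if record.kind == "object":
            yield pid
        else:
            # Push in reverse so members are visited in their listed order.
            stack.extend(reversed(record.members))


# Made-up identifiers; the object o1 is shared by two collections.
registry = {
    "hdl:example/c1": PidRecord("hdl:example/c1", "collection",
                                ["hdl:example/c2", "hdl:example/o1"]),
    "hdl:example/c2": PidRecord("hdl:example/c2", "collection",
                                ["hdl:example/o2", "hdl:example/o1"]),
    "hdl:example/o1": PidRecord("hdl:example/o1", "object"),
    "hdl:example/o2": PidRecord("hdl:example/o2", "object"),
}
leaves = list(iter_leaves(registry, "hdl:example/c1"))
```

Keeping the type label on the record itself reflects the idea that specialized PID types could tell a client whether to descend further, without fetching the object.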

The WG aims to have a productive working session at each of the corresponding RDA plenaries. Besides members from infrastructures and PID providers, representatives from user communities are particularly welcome. Between plenaries, WG work will continue in small groups via e-mail and virtual meetings.

Expected concrete outcomes

D1. Collection models (M12). This report summarizes the collection models with detailed descriptions and usage examples and should help communities to understand and refine their collection usage scenarios. Fragment identifier issues will be addressed as well. This should be a step-by-step guide to the what, why and how of collections.

D2. API and demonstrator (M18). This deliverable includes the collection API specification, documentation and a demonstrator that illustrates the added value of unified collections. A final list of suggested PID types should be included. Paper prototypes for tools or other applications within exemplary domain scenarios may also be provided. Although development of the API specification will only be done through detailed analysis of the use cases, we envision that at a minimum the following types of collection operations would be covered:

  • Retrieving/setting/updating collection level metadata
  • Retrieving a list of items (ordered or unordered) in a collection

    ◦ refinements on this will include pagination and filtering by specific criteria
  • Create/Read/Update/Delete operations on collection items

We expect some more advanced requirements to be uncovered as well through the use case analysis, such as capabilities for discovery of fragment identifiers and definition of collection type templates.
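The minimum set of operations listed above could be sketched as follows. This is an in-memory illustration under stated assumptions, not the WG's API: the class name, method names, parameters and example identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Collection:
    """Hypothetical in-memory collection keyed by PID (illustration only)."""
    pid: str
    metadata: Dict[str, str] = field(default_factory=dict)
    items: List[str] = field(default_factory=list)  # ordered member PIDs

    # Collection-level metadata: retrieve and set/update.
    def get_metadata(self) -> Dict[str, str]:
        return dict(self.metadata)

    def update_metadata(self, **changes: str) -> None:
        self.metadata.update(changes)

    # Create/Read/Update/Delete on collection items.
    def add_item(self, item_pid: str) -> None:
        if item_pid in self.items:
            raise ValueError(f"{item_pid} is already a member")
        self.items.append(item_pid)

    def replace_item(self, old_pid: str, new_pid: str) -> None:
        index = self.items.index(old_pid)  # raises ValueError if absent
        self.items[index] = new_pid

    def remove_item(self, item_pid: str) -> None:
        self.items.remove(item_pid)  # raises ValueError if absent

    def list_items(
        self,
        offset: int = 0,
        limit: int = 10,
        predicate: Optional[Callable[[str], bool]] = None,
    ) -> List[str]:
        """Return one page of member PIDs, optionally filtered first."""
        selected = [p for p in self.items if predicate is None or predicate(p)]
        return selected[offset:offset + limit]


collection = Collection("hdl:example/col-1")
collection.update_metadata(title="Example collection")
for n in range(5):
    collection.add_item(f"hdl:example/obj-{n}")
collection.replace_item("hdl:example/obj-1", "hdl:example/obj-1-v2")
collection.remove_item("hdl:example/obj-4")

page = collection.list_items(offset=1, limit=2)
versioned = collection.list_items(predicate=lambda p: "-v" in p)
```

Filtering is applied before pagination here so that page boundaries refer to the filtered list; a real specification would need to state which order applies.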

In accordance with the guidelines of RDA, all outcomes will be provided under open licenses.

Social Deliverables and Sustainability

As described above, the working group plans to deliver an easily adoptable model for identifying and managing collections of data objects via the combination of a clear outline of use case scenarios, a well-defined API for machine driven interaction with the collections, and a reference implementation of that API deployed by projects across several domains.  We hope that this can provide a straightforward solution for many research projects that might otherwise have implemented a closed or idiosyncratic model for their data collections. We expect that this will be a living solution, which is improved over time by the addition of new use cases to make it more robust. A focus over the WG lifetime is to keep the rather abstract API design and the concrete domain use cases closely connected, e.g. by including textual usage scenario descriptions from the individual users’ point of view that show exactly which parts of the API are used (and how they are used) in an exemplary realistic workflow.


M1: BOF at RDA Plenary P6, additional adopters identified and committed

M6: Initial use case descriptions gathered.

M12: Collection models defined. Collection API draft reviewed.

M18: Collection API and demonstrator implemented.

Adoption Plan

The following organizations have expressed a commitment to adopt the outputs of the WG, by implementing and deploying the API for their specific use cases:

·      DKRZ: Collections are useful to bind dataset replicas and versions together and reflect the multi-hierarchical organizational structure of the ESGF dataspace. Such collections are largely static, but highly interconnected with other collections and objects. Implementation continues throughout 2015 and some essential collection tools may be developed in the time afterwards.

·      Perseids Project, Perseus Digital Library: The Perseids Project at the Perseus Digital Library requires an application of collections and fragments for referencing (human & machine), not for object management (moving objects around in an e-infrastructure). Collections are built explicitly via hierarchical PID syntax components which are widely agreed to be static; for machines, a common API would unambiguously expose the hierarchical levels. Annotation types could be expressed through PITs.

Additionally, the following organizations have expressed a commitment to supplying use cases for the WG and to strongly consider adoption of the outputs:

·      BCO-DMO: The use case focuses on cruise data acquired with various instruments used also across several cruises. Users may perform diverse discovery and aggregation tasks, e.g. for data from a single cruise or the same instrument used across several cruises. Data objects are accordingly arranged in collections; sometimes hierarchical but more often graph-like depending on use. WHOI is looking into assigning specific PIDs to cruise data (DOIs) and related concepts (e.g. ORCIDs for person, IGSNs for physical samples) and interconnecting them.

·      SEAD project: The SEAD virtual archive offers several relevant workflows for research object management, including collection and subcollection building and versioning. A demonstrated practical use case showed the importance of building a virtual collection with data objects of mixed type. From the view of the Collection WG, a first opportunity is the common collection API.

·      Coptic SCRIPTORIUM: The Coptic SCRIPTORIUM project has provided a use case centered on identifying, referencing and managing collections of textual and linguistic data objects, including codices, paleographic symbols, manuscript fragments, digital images, annotations, morphemes and word tokens. These data objects are currently managed by the project through spreadsheets without use of persistent identifiers, and a scalable solution is required to manage this data and link it to the PIDs of the source texts.

·      Ocean Data Interoperability Platform (ODIP): For the purposes of data publishing, the ODIP project is creating data set collections using pre-defined criteria such as vocabulary terms or originating repositories. Usually, both collections and their granules bear PIDs.

·      Harvard Astronomy Abstract Service: The use case concerns tables of data values supplied as supplements to articles, and individual values in table rows, with the articles bearing PIDs. PID fragments should then point to individual values, which enables better data discovery. The practical feasibility depends on the uniformity of table encodings.

·      Open Philology Project, University of Leipzig: The Open Philology project has collections of various types of data objects relating to texts (e.g. manuscript images, OCR output, TEI XML, and annotations). They want to be able to apply persistent identifiers to these collections and their objects, as well as to the primary sources to which they refer, throughout the data production, publication and preservation lifecycle. Annotations and derivative versions and analyses which are made on the early versions should be easily and automatically portable to the newer, improved versions as they become available. Citations which reference fragments of the text should be robust and automatically resolvable across versions and archived copies.

Other interested parties:

EUDAT, DARIAH and CLARIN have relevant use cases and have expressed interest in following the activities of the WG. 

Initial Membership


Working Group Co-Chairs:

Tobias Weigel, DKRZ (German Climate Computing Center)

Thomas Zastrow, RZG Max Planck Society

Bridget Almas, Tufts University, Perseus Digital Library, Perseids Project

Use Case Providers, Potential Adopters:

Beth Plale, Indiana University, SEAD Project

Cynthia Hudson Vitale, Washington University St. Louis

Cyndy Chandler, Biological and Chemical Oceanography Data Management Office, Woods Hole Oceanographic Institution (BCO-DMO/WHOI)

Helen Glaves, British Geological Survey, Ocean Data Interoperability Platform project (ODIP)

Caroline T. Schroeder, University of the Pacific, Coptic SCRIPTORIUM Project

Giuseppe Celano, University of Leipzig, Open Philology Project






Review period start:
Friday, 26 June, 2015 to Saturday, 25 July, 2015

WG Charter

Updated 15 January 2016 in response to TAB review

A concise articulation of what issues the WG will address within a 12-18 month time frame and what its “deliverables” or outcomes will be.

The Metadata Standards Catalog (MSC) Working Group will produce a catalog of metadata standards of relevance to research data. Specifically, the catalog system will consist of the following components:

  • A set of records describing metadata standards

By ‘metadata standard’ we mean a defined metadata structure and format used within a community. Some metadata standards have formal standardization status but this is not necessarily an indication of quality or utilization.

  • A user interface for submitting information, searching, browsing and displaying standards information
  • A machine-to-machine interface (API) allowing automated tools to submit information, perform queries and retrieve information from the catalog

In this sense the catalog will be ‘machine readable’. We also intend that the information provided through the API will be structured in such a way that the recipient machine will be able to process and act upon it; in this sense the catalog will also be ‘machine actionable’.

This work builds on the outputs of the Metadata Standards Directory Working Group: both the Metadata Standards Directory (MSD) itself and the set of attendant use cases. The advances to be made by the MSC Working Group beyond that work are improvements to the data structure of the records, an improved user interface, and the addition of an API. The latter development will enable the MSC to participate in and provide services to any autonomic e-research fabric. While the base set of records for the MSC will be derived from those in the MSD, further information will be sought from discipline-specific standards directories and RDA working groups.

Value Proposition

A specific description of who will benefit from the adoption or implementation of the WG outcomes and what tangible impacts should result.

Prediction is very difficult, especially if it's about the future. — Niels Bohr

The primary beneficiaries of this work – being a continuation of work performed by the MSD Working Group and the UK Digital Curation Centre – will be researchers, in the following ways:

  • Early career researchers, or those unused to sharing their data, will be able to consult the MSC – whether directly or indirectly through other tools – and discover an appropriate standard to use when documenting their data. This will help them to comply with funder requirements for data management from the planning stages through to submission to an archive.
  • Having a comprehensive catalogue of metadata standards will make it easier to prevent duplication of standards development effort (in areas where metadata standards already exist) and highlight gaps where standards development activity is needed.
  • Following on from the above, it will therefore be easier for peer researchers to discover, validate and reuse datasets that have been shared, due to the expected metadata being in place.

The group recognizes that there exist catalogs of datasets, repositories, software and publications with limited metadata for each entry. There exists the BioSharing initiative for bioscience and DataONE in environmental sciences, and beyond them many projects and initiatives with some information on relevant datasets and metadata associated with them. While these all contribute to the above benefits, in each case the available information covers only one or a few domains and, in the case of BioSharing, is intended for human rather than machine action. The MSC will address both these shortcomings with respect to metadata standards for research data.

More specific and tangible benefits and impacts of the MSC will be identified and targeted as part of the work plan. Use cases continue to be collected by the existing RDA metadata groups, the Metadata Interest Group (MIG), Metadata Standards Directory Working Group (MSDWG), and Data in Context Interest Group (DICIG). These, along with use cases to be collected directly by MSC Working Group, will be used as a basis for the precise use cases to be addressed by the MSC and the information it will contain. The following are a few examples:

  • Researchers and research support staff will be able to discover standards relevant to the researcher’s discipline.
  • Researchers will be able to browse the catalog for standards in other disciplines that may be appropriate, perhaps to facilitate interdisciplinary or multidisciplinary work.
  • Developers of tools such as DMPTool or DMPonline will be able to use information from the MSC, via the API, to suggest relevant standards to researchers.
  • Systems will be able to look up, via the API, the converters available for unfamiliar standards, to reduce the friction of importing metadata from other systems.
  • In the long term, the MSC may be able to provide information to enable repository software to provide tailored metadata input forms on-the-fly, or to generate a matching between standards that a developer could use as a starting point for a converter.
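To make the machine-to-machine examples above more concrete, here is a hedged sketch of how a tool might look up standards by discipline and find available converters. The record fields, class and method names, and the placeholder entries ("Standard A", "Standard B", "a2b-converter") are assumptions for illustration only; the actual MSC data structure and API are to be defined by the work plan.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StandardRecord:
    """Hypothetical catalog record describing one metadata standard."""
    name: str
    disciplines: List[str]
    # Maps a target standard name to the name of a converter tool.
    converters: Dict[str, str] = field(default_factory=dict)


class Catalog:
    """Illustrative in-memory stand-in for a catalog API."""

    def __init__(self) -> None:
        self._records: Dict[str, StandardRecord] = {}

    def submit(self, record: StandardRecord) -> None:
        self._records[record.name] = record

    def find_by_discipline(self, discipline: str) -> List[str]:
        """Return names of standards tagged with the discipline (case-insensitive)."""
        wanted = discipline.lower()
        return sorted(
            record.name
            for record in self._records.values()
            if any(d.lower() == wanted for d in record.disciplines)
        )

    def converters_for(self, name: str) -> Dict[str, str]:
        """Return known converters for a standard, or an empty dict if unknown."""
        record = self._records.get(name)
        return dict(record.converters) if record else {}


catalog = Catalog()
catalog.submit(StandardRecord("Standard A", ["Oceanography", "Climate"],
                              {"Standard B": "a2b-converter"}))
catalog.submit(StandardRecord("Standard B", ["climate"]))

climate_standards = catalog.find_by_discipline("Climate")
available = catalog.converters_for("Standard A")
```

A data management planning tool could call something like `find_by_discipline` to suggest standards, while an import pipeline could call something like `converters_for` when it meets an unfamiliar standard.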

The medium term goal, however, is to analyze the metadata standards and use cases and generate – working with and through MIG – a set of proposed ‘packages’ of metadata elements for purposes drawn from the wider set of use cases. By ‘packages’ we mean groupings of metadata elements for particular purposes, such as discovery, contextualization, or detailed connection of software to data. For clarity, please note the following:

  • Each element may not be a single-valued attribute but a structure.
  • There are relationships between elements including those carrying referential and functional integrity.
  • Elements may belong to more than one package.

Since the packages represent functions, one possible naming scheme would be Function Package, with a specific function represented as Function Package: Discovery. The packages described here differ from application profiles. Application profiles are a mapping from a given data structure to that required for a particular application (business purpose). These packages are generic. The packages so generated will be useful for those writing converters, designing systems and considering new standards. In the longer term they may form the basis of novel presentations of schemas and specifications for metadata standards.
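The point that an element may belong to more than one package can be illustrated with a small sketch. The element names and the contents of each package below are invented for illustration; only the "Function Package: …" naming pattern comes from the text above.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical packages; element names are illustrative assumptions.
packages: Dict[str, Set[str]] = {
    "Function Package: Discovery": {"title", "creator", "subject"},
    "Function Package: Contextualization": {"creator", "project", "funder"},
}


def packages_by_element(pkgs: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert the mapping: for each element, list the packages containing it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for package_name in sorted(pkgs):
        for element in pkgs[package_name]:
            index[element].append(package_name)
    return dict(index)


membership = packages_by_element(packages)
# Elements that appear in more than one package.
shared = sorted(e for e, names in membership.items() if len(names) > 1)
```

An inverted index of this kind is one way a converter author could see which functions a given element serves before deciding how to map it.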

The information that the catalog will need to hold about each standard will depend on the use cases chosen for the MSC, but we anticipate it will include information on the schema/specification, converters available, associated vocabularies (on which collaboration with the Vocabulary Services Interest Group would be useful), associated tools, examples of services with expertise in the standard, the version history of the standard, and the provenance (source, date last updated, etc.) of the records themselves.

Engagement with existing work in the area

A brief review of related work and plan for engagement with any other activities in the area.

The only source of knowledge is experience. — Albert Einstein

The work of MSC Working Group will build on the outputs of the RDA MSD Working Group, as outlined in the case statement for that group. The group will engage with several actively maintained domain-specific standards directories:

The group will engage with other RDA groups with an interest in metadata standards:

  • Wheat Data Interoperability Working Group
  • Metadata IG
  • Data in Context IG
  • Data Fabric IG
  • Practical Policy WG
  • Brokering Governance WG
  • Brokering IG
  • ELIXIR Bridging Force IG
  • Data Description and Interoperability WG
  • Research Data Provenance IG
  • Domain Repositories IG

The group will engage with other external activities with an interest in identifying and producing metadata standards:

Work Plan

A specific and detailed description of how the WG will operate.

M1-M6: Continue to collect use cases for interaction with the proposed catalog including both human access/interaction and machine access/interaction. Analyze these (using the MIG template presented at RDA Plenary 6) for intersections and synergies leading to a definition of the requirements and technical specification for the catalog.

M6-M12: Carry out initial design and development of the Catalog system prototype based on the use cases already collected and the defined requirements and technical specifications, using best practices in metadata standards to describe metadata standards and their relationships to organisations, persons, software, datasets, etc. Thereafter, taking into account later-arriving use cases, refine the prototype into the production catalog system. Cooperate with other RDA Groups (especially domain groups) in refining the prototype system to production status leading to adoption. Identify potential adopters of the Catalog system.

M12-M18: Evaluate the catalog against requirements and technical specifications with and by application domain communities of RDA and other potential adopters. Validate the Catalog mechanisms for directing users to metadata standard(s) appropriate to their purposes.

M14-M16: During this period the packages will be documented; standards will be entered into the catalog "as they are"; a priority list of standards for mapping will be established by the community; the high-priority standards will be mapped to the packages; the mappings will be stored in the catalog; and a user interface and API to the catalog will be provided.

M18: Report at next RDA Plenary

Final Deliverables

The form and description of final deliverables of the WG.

A catalog for metadata standards, incorporating

  • User interfaces for input/edit/query/reporting
  • Appropriate APIs for software interaction with the catalog
  • Mechanisms for directing users to metadata standards appropriate to their purposes.

Milestones and Intermediate Deliverables

The form and description of milestones and intermediate documents, code or other deliverables that will be developed during the course of the WG’s work

  • One or more white papers to stimulate discussions and steer group activities (M3, M9)
  • Requirements and technical specification for the MSC, to serve as the foundation for software development (M6)
  • Contributions to the MIG objective of defining ‘packages’ of metadata elements for defined purposes (M12)

Mode and Frequency of Operation

A description of the WG’s mode and frequency of operation (e.g. on-line and/or on-site, how frequently will the group meet, etc.).

The WG will provide the usual forum for discussion through teleconferences/Skypes (every three months), face-to-face meetings between plenaries associated with group chair meetings, face-to-face meetings at plenaries (twice a year), meetings at plenaries with other groups (especially the metadata groups but also others) and the RDA website forum as appropriate. However, the major mechanism of operation is the provision of software to encourage input/edit and utilisation of metadata standards.

Operational Policies

A description of how the WG plans to develop consensus, address conflicts, stay on track and within scope, and move forward during operation

The proposed co-chairs have extensive experience of project management generally and also within the RDA context due to participation in other groups. They have already demonstrated their ability to manage groups. Working from the agreed plan, all group members will participate in discussions on realization of the objectives and the means to do so. Consensus will be developed by frequent online (teleconference or email/forum) discussions mediated by one or more of the co-chairs. This mechanism will also achieve conflict resolution by exposing the different viewpoints and – through discussion – converging to consensus. The milestones defined above provide the anchorage to a timeline and the deliverables will be concrete proof of adherence to the timeline.

Community Engagement Plan

A description of the WG’s planned approach to broader community engagement and participation.

Potentially all RDA groups should be engaged and participate by contributing to this WG. The WG plans to provide advice and assistance upon request especially to domain-based groups enquiring about metadata standards. In particular the groups listed above (Engagement with Existing Work) will be involved in shaping the work of the proposed MSC Working Group acting both as requirements stakeholders and validators of the deliverables.

Adoption Plan

A specific plan for adoption or implementation of the WG outcomes within the organizations and institutions represented by WG members, as well as plans for adoption more broadly within the community. Such adoption or implementation should start within the 12-18 month timeframe before the WG is complete.

In order to ensure that the eventual functionality and APIs of the MSC are maximally useful, the WG will consult with potential adopters, including tool developers and research data management support staff, when compiling and evaluating against the requirements and technical specification (see Work Plan above). Entries may also be synchronized with peer standards catalogues, such as the DCC Disciplinary Metadata catalogue.

Initial Membership

A specific list of initial members of the WG and a description of initial leadership of the WG.

In some ways, this is a continuation of MSDWG but with a broader scope focused on one aspect of autonomicity of e-Research. The initial leadership will be as for MSDWG:

There are 45 members of the Working Group as of November 2015.

Review period start:
Monday, 20 April, 2015 to Wednesday, 20 May, 2015

This interest group seeks to create a dedicated space in RDA to discuss ethical and social issues with respect to data archiving, sharing, and reuse. Ethical and social issues occur frequently within the technical and policy work of the rest of RDA. Such issues are complementary to but separate from legal and regulatory aspects which other groups cover, since technology usually outpaces legal precedent, and law itself is underpinned by ethical agreements and social contracts. Our group differentiates itself from other RDA efforts by focusing on the ethical agreements and social contracts that inform and constrain data sharing practices.


Primary points of inquiry include the following:

  1. Understanding the kinds and types of data that can be archived and reused and under what circumstances. How are agreements negotiated and are they upheld throughout the data sharing process?

  2. Educating researchers, respondents, administrators, ethics bodies, etc. in the exigencies of data sharing and community supported ethical standards;

  3. Understanding the grey and difficult areas of research that present unique challenges for data management, such as “dangerous”, legacy, retrospective, medical, or passively collected data;

  4. Developing policies and guiding practices that encourage (or, where appropriate, enforce) the ethical sharing of data that protects both the researcher and the respondent;

  5. Exploring the social obligation to communities where data is collected, whether that be individuals, families, or whole societies. What are the grand social obligations and ethical issues related to data reuse?

  6. Identifying emergent issues with potential implications for data sharing: new technologies and tools, scholarly practices, commercial data use, etc.


The Ethics and Social Aspects of Data Interest Group is open to all RDA members.


The Ethics and Social Aspects of Data group will be considered a success if it generates policies and practice documents providing guidance in challenging ethical situations related to data reuse. Additionally, the group will be a success if it creates a space and continued conversation about the ethical dilemmas that arise from new data sharing practices (both technical and social) within the RDA. This interest group will also, where appropriate, serve as a springboard for one or more working groups focused on specific situations and developing ethical practices and piloting those recommendations (e.g. how to ethically incorporate future data sharing into consent forms for human-subjects research).

Mechanism and Coordination

  1. Face-to-Face: This group will meet at plenaries to finalize the agenda for the coming 6 month period, to share progress with other RDA groups, and to host important conversations about the ethical and social implications of data sharing.

  2. This interest group will take advantage of RDA’s native capabilities to communicate and collaborate. These include monthly or bi-monthly teleconferences (GoToMeeting) with a planned agenda for discussing specific issues/cases and continual collaboration on the group wiki.

All interest group documents will be made publicly available and be promoted to the RDA community.

Timeline (subject to change)

  • March 2014: Kalpana hosted a Birds of a Feather session in Dublin to establish interest; it was positively received.

  • March 2015: Second Birds of a Feather session:

    • found several candidates to co-chair with Kalpana,

    • documented ethical issues that group members are interested in exploring,

    • established the need for a “gaps” analysis to document what is already available and where the broader data science community has overlooked ethical issues, and

    • discussed several IGs that are potential collaborators (Data Fabric, Digital Practices in History and Ethnography, Education and Training on handling of research data, and Engagement interest groups).

  • Spring/Summer 2015:

    • Apply for an RDA Data Share Fellow to work on documenting the state of the field.

    • Submit and have IG charter approved.

  • September 2015: Plenary 6 - First official session as an Interest Group.

    • Share summer work (Data Share Fellows outputs) with members;

    • Host a joint session with another interest group on a topic of mutual interest (e.g. use of commercial software with EULAs that contradict open data principles, or evolving challenges for informed consent forms);

    • Preliminary new Working Group identification.

Potential Group Members

Co-chairs: Kalpana Shankar, Candice Lanius





Dr. Kalpana Shankar


ESAD IG Co-chair; Lecturer in the School of Information and Library Studies, University College Dublin

Candice Lanius


ESAD IG Co-chair; RDA/US Resident, PhD Student Communication & Media, Rensselaer Polytechnic Institute

Andy Turner


Researcher in Computational Geography, University of Leeds, UK

Celia Emmelhainz


Data Librarian, Colby College

Christine Kirkpatrick


CTO/IT Director, San Diego Supercomputer Center

Dessi Kirilova


Fellow at the Moynihan Institute of Global Affairs and the Consortium on Qualitative Research Methods, Syracuse University

Dr. Elizabeth Griffin


Researcher at Dominion Astrophysical Observatory; Chair of IAU’s “Preservation and Digitization of Photographic Plates” group; Chair of CODATA “Data at Risk” group.

Dr. Gilles Adda


Director of IMMI/CNRS Labs; Co-writer of “Ethics and Big Data Charter” initiative in France.

Dr. Inna Kouper


Researcher, Data to Insight Center, Indiana University; Co-chair Engagement IG

Julia Collins


Software Engineer and Metadata Architect, CU/CIRES National Snow and Ice Data Center

Dr. Libby Bishop


Data Archivist/ Manager in the Research Data Management section of the UK Data Archive

Dr. Marco Scarselli


Sociologist, consultant for SMEs (Italy)

Mike Usmar


CEO/ Managing Director at High Tech Youth Network

Dr. Oya Beyan


Researcher in Biomedical Informatics at NUIG @ Insight

Robert Quick


IT Specialist at Distributed High Throughput Computing Operations, Indiana University

Dr. Sarah Olesen


Senior Data Management Specialist with Australian National Data Service; dual appointment at Australian National University

Dr. Stefanie Kethers


Research Data Management at Australian National Data Service

Timea Biro


Communications and Digital Marketing, Trust-IT Services (Italy)

Dr. Heike Felzmann


Lecturer, National University of Ireland-Galway (Ireland)


TAB Interest Group Review:
