Repository Platforms for Research Data IG - TAB Review

TAB Case Statement Review

Working Group Title:  Repository Platforms for Research Data

Proposers:

David Wilcox, DuraSpace  <dwilcox@duraspace.org>

Stefan Kramer, American University Library <skramer@american.edu>

Date Received by TAB: Dec. 31, 2014

TAB Reviewers:  Peter Wittenburg, Rainer Stotzka, Beth Plale

Completeness of Case Statement:  

(Does it include the six requisite components: (1) WG Charter; (2) Value Proposition; (3) Engagement with Existing Work in the Area; (4) Work Plan; (5) Adoption Plan; (6) Initial Membership?): Yes X; No __; Comments: 

 

Charter

The WG proposes to analyze research data use cases in the context of repository platform requirements and generate a matrix of requirements for such platforms and a comparison of systems based on the requirements.

 

Value Proposition

The value proposition is clearly specified and to our knowledge the need for guidance is high.

 

Engagement with Existing Work

Clearly there is a close relationship with the work of other RDA groups, such as the Repository Audit and Certification DSA–WDS Partnership WG, the Data Fabric IG, the Domain Repositories IG, the metadata groups and some of the domain-related groups. In particular, the DFT WG defined requirements for a data model to be supported and also looked at platforms; this should be considered. The domain-related groups, as well as others of the groups mentioned, may provide important use cases. This initiative would need to look at the documents of a number of groups and take up or comment on that work.

 

The case statement relates the proposed work to initiatives outside of RDA; while important, this is insufficient given the activity within RDA.  The group is advised to additionally look for use cases from groups working on big data, noting that platforms there may look different. One such platform candidate is iRODS, which also tackles issues such as federations. Other repository platforms have been developed within large communities, such as CERA within the European climate modeling community (https://www.dkrz.de/daten/cera) or the LAMUS work at the Max Planck Institute for Psycholinguistics (https://tla.mpi.nl/tools/tla-tools/). Interesting developments have also been seen in China, for example.

 

Work Plan

The work plan specifies the major deliverables and milestones, but a more precise work plan that states when milestones will be met is missing.   The proposers will additionally need to state their preference for either 1) completing their work with enough time for community review of their output products before the 18-month deadline (that is, wrapping up by month 15), or 2) using the full 18 months to complete and document their final product, in which case the case statement needs to address how the group will maintain continuity of attention to the output product(s) through the community review.   For the latter, some WGs name individuals who retain responsibility; others form Interest Groups whose role is in part to address items that emerge during community review.  

 

Adoption Plan

The output products of RDA working groups must be actionable (adoptable).   It is not clear how adoptable the product emerging from this effort is.   It appears the intended product is a taxonomy of key functionality, fleshed out with well-chosen examples.  Such an output is better suited to an interest group than a working group. 

 

If adoption is intended to mean that people in the field will read the comparison and make decisions based on it, then adoption in this case means facilitating "wise" decisions.  But those who are going to engage in this "adoption" need to be named in the case statement, and agree to their role as adopters.  

 

The reviewers were of mixed opinion on whether the case statement should indeed be recategorized as an interest group, or whether the proposers can strengthen the output product(s) sufficiently so that it/they stand on par with other working group output products.   It was determined that the case statement be given a conditional accept so the proposers can decide. 

 

 

Initial Membership

The co-chairs are both from the US and neither is from a science lab; a more balanced leadership is urgently required. The membership list includes a variety of well-known persons, which gives reason to expect a good start. As indicated, experts from big data labs should be attracted in order to arrive at a balanced survey.

 

Focus and Fit:  

(Are the Working Group objectives and deliverables aligned with the RDA mission?  Is the scope too large for effective progress, too small for an RDA effort, or not appropriate for the RDA?  Overall, is this a worthwhile effort for the RDA to take on?  Is this an effort that adds value over and above what is currently being done within the community?)

 

The focus of the WG is well aligned with and important for RDA. Many people and communities, as well as repository developers, would like to have guidelines on which repository system to choose and which requirements to fulfill. Much work is currently wasted due to wrong decisions at an early stage. Essentially, the group will produce an enhanced and critical survey with many comparisons.

It will therefore help remove barriers: if this survey helps people choose optimal systems, we would gain a lot in interoperability and efficiency. It could also be the case that software developers will move towards a higher degree of compliance, etc.

 

Work Plan, Deliverables, and Outcomes:

(Are there measurable, practical deliverables and outcomes?  Can the proposed work, outcomes/deliverables, and Work Plan described in the Case Statement be accomplished in 12-18 months?)

 

Other groups, such as DFT, have also made statements about what a core model to be supported should look like, so it would be useful for the proposers to check how far platforms comply with the suggestions from a number of RDA groups.

 

Two concrete suggestions can be made to the group:

- in addition to use cases, which obviously describe usage scenarios, the group should have a wiki page where everyone interested can add a repository platform description and comment on platforms to indicate strong or weak points; all people active in RDA and in infrastructure projects should be invited to participate

- organize a BoF-style session at P5 or P6 where people can make short statements about requirements and discuss them (the group already proposed a session for P5)

 

Capacity:

(Does the initial membership list include sufficient expertise, and disciplinary and international representation?  Are the right people involved in the Working Group to adopt and implement?  What individuals or organizations are missing?)

 

David Wilcox is the Fedora product manager at DuraSpace. The group needs to take care that the discussions are not dominated by product interests. Stefan Kramer is a librarian and a member of the RDA OAB. Both co-chairs are from the US and, it seems, neither of them is from a scientific lab dealing with data. As indicated above, it would be good to ensure regional as well as domain-related balance by identifying another chair.

 

The members listed show a good spectrum, but it is not obvious what kind of commitment has been given, since no early adopters can be identified. As indicated above, the group needs to ensure that experts working with big data are also represented. Here too, a broader group including experts from a variety of backgrounds would be optimal.

 

Impact and Engagement:

(Is it likely that the outcome(s) of the Working Group will be taken up by the intended community?  Is there evidence that the research community wants this?  Will the outcome(s) of the Working Group foster data sharing and/or exchange?)

 

The topic has the potential for highly relevant results, with benefits that include:

  • Important work
  • Will help to remove barriers
  • Will foster interoperability
  • Will help to develop better and maybe also "RDA-compliant" systems

However, the reviewers have reservations that, if not addressed, will limit the impact.  

 

Recommendation:  

Case Statement is Sufficient __; Case Statement Requires Revision X; Case Statement is Rejected __

Comments:

 

TAB recommends that the case statement be conditionally approved, subject to the following changes to the WG charter:

  • "Balanced" co-chairs and membership
  • More attention to output product and adoption plan, including possible reclassification as an interest group
  • If it remains a WG, a more precise work plan including dates
  • Identify related RDA WGs/IGs and integrate their outcomes (where available)
  • Describe how collaboration with existing RDA work could be done
  • Discuss group wrap-up and tending of the output product