WG Charter

The general objective of the Fisheries Data Interoperability Working Group (FDIWG) is to devise a global data exchange and integration framework to support scientific advice on stock status and exploitation that builds on fisheries data. Various fisheries data domains used in such scientific processes are concerned, including data collected for monitoring, control and surveillance; scientific fisheries Data Collection Frameworks; fisheries scientific observer schemes; and statistical or status & trends reporting frameworks. The proposed framework will facilitate the use of de-facto, and preferably open, standards for the identification, description, mapping and publication of fisheries data supporting scientific processes.

More specifically, the Fisheries Data WG will address the (minimal) metadata requirements to describe fisheries data needed to support stock assessment and fisheries management. It will also seek to recommend global data standards for topical vocabularies, domain ontologies, and mapping rules and formats (as done, for example, by the CF Conventions for physical and chemical parameters in oceanography).

Driven by pragmatic considerations, the working group will focus on a few selected priority needs expressed by its invited participants, ranging from filling gaps in selected schemes to applying best practices across schemes, through issues of data transformation and harmonization among schemes. In terms of functionality and data types, the WG will identify several use cases describing realistic scenarios to produce and test fisheries data workflows. The WG recommendations will be captured as a set of best practices.

By including interoperability experts, organisations with standardization initiatives, and standardization bodies, the WG will have the key actors to reflect on and propose future governance of the data framework. The focus of this governance is the efficient delivery of interoperability guidelines.

To organize the collaboration and involvement of the community, the WG co-chair on fisheries data structures will oversee the activity of two topical sub-groups, with one co-chair responsible for the formulation of a framework for structured fisheries data exchange (data structures), and another co-chair responsible for geospatially explicit fisheries data.

To achieve these objectives, the WG will:

  1. Promote existing facilities for data sharing on capture, landing, effort, size classes, VMS and production through sharing of structural data definitions. This promotion will be supported by demonstrations of live examples of data sharing;

  2. Facilitate access to data by recommending standards such as netCDF, SDMX or UN/CEFACT and assist in adoption of tools and facilities;

  3. Recommend existing data tools: Tools for Master Data Management (MDM), database connectors, registries and other assets;

  4. Recommend Master Data Management solutions for classifications and multilingual / multi-locale data: the challenge lies in the variety of languages in which the data are stored, and in locale-specific data types. This also requires mapping between local classifications and regional and global ones.

  5. Connect existing data networking initiatives such as

    1. The FAO FiRMS partnership,

    2. The FAO secretariat to the Coordinating Working Party on Fishery Statistics (CWP), which combines 19 global partners such as ICES and IOTC,

    3. The Tuna Atlas initiative (tuna RFMOs, FAO, IRD), to provide examples for storing datasets from various RFMOs within a single gridded data format;

    4. The effort to extend the CF Conventions to biological and fisheries data,

    5. EU / DG MARE: DCF, the Integrated Fisheries Data Management Programme (FLUX) and the INSPIRE Directive,

    6. The SDMX community, e.g. through Eurostat, FAO, and the World Bank,

    7. UNESCO's International Oceanographic Data and Information Exchange (IODE) of the Intergovernmental Oceanographic Commission (IOC),

    8. Other relevant RDA WGs and IGs, such as (alphabetically) the Agrisemantics WG, Data Citation WG, Agricultural Data IG, Geospatial IG, Marine Data Harmonization IG, and RDA/CODATA Legal Interoperability IG.
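The classification-mapping challenge described under objective 4 can be sketched in a few lines. The local French labels and the `harmonize` helper below are invented for illustration; the 3-alpha target codes follow the style of FAO's ASFIS species list:

```python
# Minimal sketch of master-data mapping between a local species
# classification and global 3-alpha codes in the style of FAO's ASFIS
# list. The local labels and records below are invented examples.
LOCAL_TO_ASFIS = {
    "thon albacore": "YFT",  # Thunnus albacares (yellowfin tuna)
    "listao": "SKJ",         # Katsuwonus pelamis (skipjack tuna)
    "thon obèse": "BET",     # Thunnus obesus (bigeye tuna)
}

def harmonize(record: dict) -> dict:
    """Replace a local species label with its global code, keeping the
    original value as provenance."""
    local = record["species"].strip().lower()
    code = LOCAL_TO_ASFIS.get(local)
    if code is None:
        raise KeyError(f"no global mapping for local label: {record['species']!r}")
    return {**record, "species": code, "species_local": record["species"]}

print(harmonize({"species": "Listao", "catch_t": 12.5}))
```

In practice such lookup tables would themselves be managed as reference data in an MDM registry, so that every scheme resolves local labels against the same agreed code lists.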

 

Value Proposition

The WG will provide a negotiation framework on fishery related standards for data storage and exchange structures to improve data analysis. It will benefit organizations in the fisheries sector by providing a reference interoperability framework based on existing initiatives and formats.

In the longer term, implementing a common framework (however small the scale may be) will help to further cultivate a fisheries data ecosystem, based on common tools and services.

  • The fisheries data managers and data scientists will have a common and global framework to describe, document, and structure their fisheries data.

  • If suitable standards are identified, the WG can propose generic data storage standards (e.g. NetCDF for gridded datasets) and services (e.g. OGC Web Services for the GIS community, facilitating INSPIRE Directive compliance).

  • Fishers, traceability organizations, NGOs, and other data users will have seamless access to a wide range of fisheries data. Data mapping will also ease the emergence of new data analyses and knowledge discovery methodologies.

  • Data managers and scientists of other infrastructures will benefit from a reusable data framework. Researchers working in other domains will easily access, reuse and link up fisheries data with their own data.

  • Development professionals and policy makers will be able to take informed decisions across multiple data providers.

Expected key impacts of the RDA fisheries Data Interoperability Guidelines

  • Reduced costs of reusing data. The incompleteness of standards (or guidelines) has a cost: data structures can vary considerably for similar data, and much time is wasted transforming data from one format into another. Agreeing on a set of standards and writing related guidelines is key.

  • Increased adoption of existing common standards, vocabularies and best practices related to fisheries data management by new communities, such as regional projects, and increased general awareness of open research data and interoperability standards among fisheries organizations.

  • Enhanced access, discovery (metadata) and reuse of fisheries data, and improved visibility.

  • Major fisheries data integration and a more effective measure of the impact of freely sharing fisheries data through data provenance attribution.

  • New opportunities for Data Structure Definition (DSD) and ontology-based knowledge management in the fisheries sector.

Engagement with existing work in the area:

The members of the WG will liaise through their organizations with existing activities in the area of fisheries data exchange and with broader activities that foster data interoperability. This engagement will allow the WG to tap into a wide knowledge base of data exchange specialists and to prepare recommendations that may also be of value to experts beyond the domain of fisheries data exchange, such as legal interoperability and geospatial metadata experts.

  • iMarine / BlueBRIDGE: Tuna Atlas use case for RFMOs datasets,

  • ICCAT BFT-E Stock Assessment working group to facilitate stock assessment datasets sharing,

  • OpenAIRE open data specialists,

  • Agroknow network of expertise on open data sharing,

  • EGI Engage e.g. for legal interoperability,

  • IRD scientific data collection activities,

  • FAO and Eurostat SDMX SEIF initiative,  

  • DG-MARE: FLUX initiative (in particular VMS & elogbooks) and DCF / DCMAP,

  • FAO CWP standards for fisheries reference data,

  • OGC geospatial standards setting organization.

Through the engagement work, a list of potential adopters of the WG products will emerge. Specific statements of interest and priority needs are expected from the invited participants while the WG is established. Examples of interoperable data flows that could benefit from the application of WG reference models and best practices include:

  • FAO Fisheries and Aquaculture department:

    • data ingestion from regional fishery bodies, fisheries organizations or member states; regional databases to support scientific processes;

    • improve the statistical data exchange in line with CWP’s SDMX initiative;

    • improve the geospatial data exchange, building on CWP's geospatial standards working group;

  • IRD:

    • improve fisheries observers’ data flows to support regional fisheries bodies;

    • improve scientific data flows;

    • improve the quality of NetCDF metadata;

  • EU:

    • Ease interoperability between FLUX and SDMX;

    • Ease interoperability between FLUX and FishFrame.

Work Plan

Work plan components

Inventory of existing formats to support solutions (months 1-4)

In the first months after the WG has been established, a consultation on existing formats and activities related to fisheries data will identify:

  1. Data formats, existing and proposed,

  2. Data exchange needs and examples,

  3. Existing data access and storage solutions, and development proposals.

We will evaluate recommended data exchange approaches for several specific scenarios and select pilot candidates for a demonstration. The selection of these candidates will be in close cooperation with stakeholders and data owners. In this phase, a detailed report of the technical aspects of data sharing approaches will be developed.

Examples of scenarios where the WG could propose data interoperability solutions, to be selected for inclusion in the report, include:

  • Interoperability of globally established data frameworks: what are the technical challenges in re-using data collected through e.g. FLUX or SDMX workflows?

  • Improved coverage and re-usability of on-board collected data, such as by-catch reports, by harmonizing reference data through master data management: can the interoperability of collected data be improved by relying on global reference data for e.g. species names, gear classifications, and area references?

  • Legal interoperability requirements: what provisions exist in current data exchange mechanisms to ensure that data are properly described from a legal perspective through descriptive metadata on license, copyright, and ownership?

  • Spatial data interoperability of geospatially explicit fisheries data, such as gridded datasets (Tuna Atlas example), through descriptive metadata;

  • Identification of requirements for additional data formats for activities such as vessel or FAD (Fish Aggregating Device) trajectories.

The WG will not meet physically but will be consulted online, through several online WG meetings.

This report will be the Deliverable of this phase.

 

Defining the reference models (months 3-8)

We will develop technical reference models for data exchange based on the inventory above, possibly including Data Structure Definitions (DSDs) for statistical data and UML models for OGC and ISO standards.

Each model should be open, extensible and, if possible, implementation agnostic. The models define how fisheries data can be structured to facilitate the sharing of datasets and subsets, and how those structured data can be used in interoperable exchanges.

A selected set of DSDs and UML diagrams, or other formalizations of fisheries data for exchange, based on the report of the previous phase, will be this activity's deliverable.

A reference model should address the interoperability issues related to formats, ownership, copyright, data re-use and data quality.
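As an illustration of what such a reference model might look like in its simplest form, the sketch below declares the dimensions a statistical catch dataset must carry, loosely in the spirit of an SDMX Data Structure Definition. It is not an official SDMX artefact; all names and code values are invented for the example:

```python
from dataclasses import dataclass

# Illustrative sketch of a minimal "data structure definition" declaring
# the dimensions and measure a catch dataset must carry. Names and code
# values are invented.
@dataclass
class DataStructureDefinition:
    dimensions: list  # keys whose values should come from agreed code lists
    measure: str      # the observed value, e.g. catch in tonnes

    def validates(self, observation: dict) -> bool:
        """True if the observation carries every dimension and the measure."""
        required = set(self.dimensions) | {self.measure}
        return required <= set(observation)

CATCH_DSD = DataStructureDefinition(
    dimensions=["species", "area", "gear", "year"],
    measure="catch_t",
)

obs = {"species": "YFT", "area": "51", "gear": "PS", "year": 2015, "catch_t": 1040.0}
print(CATCH_DSD.validates(obs))  # → True
```

A real reference model would additionally bind each dimension to a concrete code list (species, area, gear) and carry the legal and quality metadata discussed above; the point here is only that agreement on such a shared structure is what makes two datasets mechanically comparable.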

Improve and test the models iteratively (months 7-12)

The models (structure definitions and UML) developed in the previous step will be evaluated against suitable datasets of considerable size from various research organizations. The consortium partners will be asked to provide their real-world research datasets as a testbed to evaluate each model. This will follow an iterative approach to allow improvements.

After the models have been validated, a reference architecture for fisheries data will be implemented. This reference implementation should be based on open source software in order to be usable and improvable by all participating partners. Several implementations of data architectures already exist, and these could be repurposed to also accept the fisheries data models.

  • For statistical data;

  • For geospatial explicit data;

The implementation has to be generic and flexible enough to be adapted to various purposes. An official release will follow the implementation and iterative improvement phase and will demonstrate interoperability between two systems (a producer and a consumer) with a live example of fisheries data.

The evaluation report on the suitability of existing reference architectures to manage fisheries data will be the deliverable of this phase.

Promotion of the RDA FDI Model and Reference Adoption (months 8-18)

Promotion activities will include internal and external dissemination about the data structures and architecture. The reference implementation will be accompanied by substantial documentation and use case scenarios in order to increase adoption and encourage contributions.

WGFDI operation

Form and description of final deliverables

The deliverables are listed above as activity phase outcomes.

Milestones

No particular milestones are specified. If needed, the Deliverables of the previous section can be used as milestones.

Communication and outreach

The entire process will be supported by dissemination activities and community outreach. The dissemination will rely on RDA tools, and include a wiki, documents, and possibly a demonstration site in an EU infrastructure. No developer forum or mailing lists are foreseen.

The outreach will focus on the initiation and conclusion phases: the announcement of the activity and plans, the installation of the core team and the resource team, and the development of the concrete objectives; and, once results have been obtained, a presentation of progress and of plans for a further development and roll-out phase through the participating members' channels.

 

Initial Membership of the FDIWG

The WG organization is specified in the Case Statement. It will be structured with three Co-chairs. The first Co-chair will retain overall responsibility for progress, deliverables, and communication with RDA, while the other Co-chairs will be responsible for content development and, more broadly, for technical issues and future collaboration.

Leadership

  • Co-chair: Anton Ellenbroek (FAO) - Fisheries data structures

  • Co-chair: Julien Barde  (IRD) - Fisheries and geospatial data management

  • Co-chair: Aymen Charif (FAO) - Statistical data management

Members/Interested (Not formally invited):

  • Marc Taconet - FAO Rome - Data Governance and Global fisheries data interoperability

  • Donatella Castelli - CNR-ISTI, Pisa - Networking and data interoperability

  • Pasquale Pagano - CNR-ISTI, Pisa - Data and infrastructure interoperability

  • Yann Laurent - FAO Consultant - Fisheries data exchange and interoperability expert

  • Neil Holdsworth - ICES Denmark - fisheries data formats and tools

  • Daniel Surany - ESTAT - SDMX Expertise

  • Erik van Ingen - FAO CIO Rome - SDMX Expertise, mainstreaming fisheries data in FAO UN statistical data flows

  • Fabio Carocci and Emmanuel Blondel; FAO Fisheries - Geospatial data standards expertise;

  • FAO ESS - TBC

  • Charalampos Thanopoulos - Agroknow Greece - Expert on data interoperability

  • Imma Subirats / C.Caracciola - FAO OPCC Rome - Data interoperability experts

  • NOAA - TBC

  • NAFO - Through FAO FiRMS partnership and CWP (logbook data models)

  • David Ramm - CCAMLR Hobart - Fisheries data management expert

  • Alicia Mostiero / Dawn Borg Costanzi - FAO Rome - Global Record - Vessel data management expert  (UN/CEFACT - FLUX)

  • DG MARE - FLUX: Thierry Remy / Eric Honoré (UN/CEFACT business layer standardization)

  • DG MARE - DCF: Bas Drukker / Venetia Kostopoulou

  • JRC - TBD

  • VLIZ - WoRMS Marine species master data, marine georeferences

  • Dimitris Gavrilis - Athena RC

Review period start:
Thursday, 5 January, 2017

 

Teaching TDM[1]

 

Introduction

 

In the healthcare sector, 1.3 million new pieces of research related to biomedical science alone are published each year.[2] A typical database search returns about 80,000 hits, and only 4,000 of those are likely to be very relevant to a researcher’s work. Text and Data Mining (TDM) techniques can already be used to zoom in on the top 25% of papers which are most relevant to any given search query. Researchers believe that, with a little more work, it will be possible to use TDM to identify the top 10% of search results. In a similar vein, the quantity of data being created has also grown exponentially, making it difficult to handle and analyse. Data mining techniques are needed to help researchers to spot patterns in large batches of data.
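The "zooming in" described above can be illustrated with a deliberately minimal relevance score: count how often the query terms occur in each abstract and rank the results, keeping only the top fraction. Real TDM systems use far richer models (TF-IDF, embeddings, entity extraction); the corpus below is invented for illustration:

```python
# Deliberately minimal sketch of TDM-style relevance ranking: score each
# abstract by raw counts of the query terms and sort. Real systems use
# richer models (TF-IDF, embeddings); the corpus below is invented.
def relevance(query: str, text: str) -> int:
    terms = query.lower().split()
    words = text.lower().split()
    return sum(words.count(term) for term in terms)

abstracts = {
    "paper-1": "gene expression in tumour cells and gene regulation",
    "paper-2": "survey of bridge maintenance methods",
    "paper-3": "tumour suppressor gene pathways in cancer therapy",
}

query = "tumour gene"
ranked = sorted(abstracts, key=lambda p: relevance(query, abstracts[p]), reverse=True)
print(ranked)  # most relevant paper first
```

Even this toy scorer separates the on-topic papers from the irrelevant one; the skill gap the course addresses is knowing when such simple techniques suffice and when richer tooling is needed.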

TDM was initially defined as “the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources, to reveal otherwise hidden meanings.” Its applicability in all fields of research is growing in this age of information overload.[3]

Recent studies show that the uptake of TDM is lacking.[4] One of the reasons is the lack of awareness and skills amongst researchers, librarians, and industry practitioners.[5] A key conclusion from the Publishing Research Consortium survey on TDM was that ‘Awareness of text mining techniques is still relatively low’.[6] Moreover, the European Communication Monitor identified a ‘gap between training offered and development needs’.[7] Both industry and academia have confirmed a need for education on TDM. Our focus will be on providing basic skills so as to reach the widest audiences. The decision therefore is to establish a Working Group with a clear focus and purpose: to develop a course within the 18-month time frame.

 

Purpose of Initiative

This Working Group aims to address the current skills gap identified with respect to Text and Data Mining (TDM) and help improve the adoption of these practices in a range of research disciplines.

TDM is a cross-cutting skill of value to a wide range of researchers. This working group aims to develop a short module that can plug into existing courses (e.g. the CODATA-RDA School of Research Data Science and existing university research skills courses) to equip researchers and practitioners with basic TDM skills and increase the use of these.

 

Scope of initiative

The Working Group aims to develop a short introductory programme and related content (presentations, exercises and case studies) to introduce researchers[8] to TDM and provide practical experience in applying open source tools to use these skills in their field of research.[9]
The design of the course will be developed based on research and feedback from the stakeholder communities in the upcoming months (see timeline and workplan). More specifically, the content and the proposed duration of the course will be determined after these consultations. For now we envision a 1-2 day modular course for people with no prior knowledge, consisting of stand-alone modules, lessons and elements that can be selected independently depending on the focus and level of knowledge of the participants. The course can be spread out over several days or weeks to fit within existing courses and trainings.

The introductory course will not be discipline-specific, though later iterations could be tailored towards this if needed, for example going into more detail on discipline-related fields of interest and expertise. Although the aim of the 1-2 day course is to address the skills gap for researchers with no prior knowledge, we anticipate that we may need to extend its duration to 4-5 days if we find that we need to include more basic introductory material, for example on the more technical aspects of TDM.

The course and course materials will be made available online in a digital, easy-to-use and modular format, accessible to anyone interested in using and adapting the course to suit their specific level/audience.[10]

 

Background to Initiative

The European projects FutureTDM, FOSTER and EDISON confirm that there is a growing demand for researchers who understand and are able to use TDM and that current education is falling behind in providing people with the skills and knowledge needed both in academia and industry.[11]

At RDA Plenary 8, a discussion on TDM in the IG on Education session confirmed community interest in developing training materials to address the skills gap. This working group therefore aims to look at how education and in-work training can help fill the gap and create enough expert data scientists.[12]

 

Relevance of the Initiative

Taking into account the many benefits of TDM for research and society, this is a topic relevant to RDA. By designing a course to cover TDM skills, developing course materials and making them available to the community, we can contribute to bridging this gap. This will include learning outcomes (essential and desirable) and course content (specific readings, lecture and discussion content, class activities, practical assignments, and graded assignments).
Proposed Outcome
The aim is to develop a generic/adaptable course or training module on TDM skills and knowledge that can then be used by different disciplines.

 

Timeline and Workplan (Term: 12-18 months)

Quarter one - 2017: Requirements gathering phase

This will include identifying survey participants (such as existing course providers, the research community, industry partners, librarians and RDA members) and undertaking a questionnaire to understand what skills need to be covered in an introductory TDM course.

Analysing survey outputs and drafting a course design, learning outcomes and programme for consultation at the RDA plenary in Barcelona.

This work will be conducted via virtual meetings and desk-based research.

Deliverable: Survey and results

Milestone: Preliminary course outline for discussion in Barcelona

Quarter two - 2017: Course development

Development of course content, including specific readings, lecture and discussion content, class activities, practical exercises and graded assignments. For this we will look at existing courses and tutorials and build upon those with input from the TDM community, such as users and tool developers. For example, we will work together with Contentmine, industry partners such as SAS, and at least two universities who have expressed interest in adopting a course.

Establishing an international network of experts and potential TDM trainers. This will build on the initial survey work and contacts developed through the WG and will support roll-out and reuse of the materials.

The majority of this work will be conducted virtually, with OKFN leading. At least one face-to-face meeting will be scheduled to help define the structure of the course and/or develop key components.

Deliverable: A draft set of training materials and user guides ready for testing

Quarter three - 2017: Testing

Liaising with contacts to establish one or two potential opportunities to trial the course. These could be aligned with existing events from partners such as DCC, FutureTDM or institutions who have expressed an interest in hosting events for researchers.

A train-the-trainers style session could be run at RDA Montreal to walk members through the course content and how it should be delivered, and to receive feedback from potential adopters.

This work will require at least two face-to-face sessions to deliver courses in different contexts.

Milestone: Have tested the course and gathered feedback from trainers and pilot participants

Quarter four - 2017: Evaluation and review

Here we will take stock of feedback received during the trial. Particular attention will be paid to which sessions were most effective in addressing the learning outcomes and engaging participants. The time taken to deliver the sessions, any technical issues encountered by trainers, and ideas for reworking content or improving flow will also be addressed.

The course materials will be refined based on the feedback and materials to assist others in reusing the content such as speaker notes will also be improved.

The work will be conducted remotely with regular virtual meetings to support the analysis and review.

Deliverable: a revised set of openly-licensed training materials available online for reuse

Quarter five - 2018: Adoption

The complete course materials will be made available online (github, slideshare, zenodo) together with documentation on how to implement the course module, FAQs and contact details for support. Further events like the train-the-trainers at Montreal could help others to understand and adopt the resources.

Through the DCC, European training initiatives (e.g. Swafs-07) and e-infrastructure projects like OpenAIRE, we will raise awareness of the module and promote adoption in academia.

In addition, the IEA has a number of industrial partners (including Microsoft, Airbus, environmental consultancies and civil engineering companies) and can be used as a route to gaining contact with industry.

This work will involve promoting the outputs at events, as well as specific meetings with key targets (e.g. training departments and Doctoral Training Centres) to promote adoption.

 

WG Communication

 

  • Bi-weekly calls for the Chairs or others engaged in specific activities currently underway

  • Monthly calls to update all members of the Working Group on progress

  • WG email list for discussion and sharing of relevant information

  • Google Drive / GitHub for collaboration on course materials

 

Members

 

(co-)chairs

  • Freyja van den Boom (EU)

  • Sarah Jones (EU)

  • Devan Ray Donaldson (US)

  • Clement E. Onime (TBC)

Members

  • Steve Brewer
  • Vicky Lucas
  • Simon Hodson
  • Amy Nurnberger        
  • Puneet Kishor
  • Baden Appleyard
  • Christoph Bruch
  • Alex Fenlon
  • Jez Cope
  • Hugh Shanahan
  • Małgorzata Krakowian
  • Bridget Almas

Group Email: tdm@rda-groups.org

Secretariat Liaison: Fotis Karayannis

TAB Liaison: Devika Madalli

Engagement with existing work in the area:

Collaborations and opportunities for further engagement include:

http://www.futuretdm.eu/ The FutureTDM project seeks to improve uptake of text and data mining (TDM) in the EU. FutureTDM actively engages with stakeholders such as researchers, developers, publishers and SMEs and looks in depth at the TDM landscape in the EU to help pinpoint why uptake is lower, to raise awareness of TDM and to develop solutions.

http://edison-project.eu/
EDISON is a 2-year project (started September 2015) with the purpose of accelerating the creation of the Data Science profession.

https://ec.europa.eu/research/participants/portal/desktop/en/opportunities/h2020/topics/swafs-07-2016.html

The forthcoming Swafs-07 ‘Training on Open Science in the European Research Area’ project.

http://www.codata.org/working-groups/research-data-science-summer-schools
CODATA-RDA School of Research Data Science

http://www.the-iea.org

Part of the University of Reading, providing training on analytics and producing proof of concept software either by using environmental data or big data for environmental applications.  The IEA is funded until 2019 by the Higher Education Funding Council for England. The IEA recognises that TDM is a growing field for environmental analysis and applications.  The IEA currently has projects using TDM in tweets and text messages and is moving into larger document analysis, specifically environmental impact assessments.

http://www.bfe-inf.org/

The Belmont Forum is a group of national science funders, including NSF (US) and NERC (UK).  The e-infrastructure group is exploring training requirements for research data scientists, including developing a relevant curriculum in 2017.

http://www.dcc.ac.uk/training

 

The UK Digital Curation Centre has delivered training on Research Data Management for several years and is involved in training activities for a number of European projects such as FOSTER, OpenAIRE, EUDAT and the European Open Science Cloud. Through these and participation in the CODATA summer schools, the DCC will help to embed the module in existing courses and encourage broad adoption.

Other possible collaborations:

Academia: we have interest from several universities.

Possible try-outs may be organized alongside the Trieste School, 10-21 July at ICTP in Trieste, followed by São Paulo, Brazil, 4-15 December.

School of Data works to empower civil society organizations, journalists and citizens with the skills they need to use data effectively.

Industry and organisations: Contentmine, SAS

 


[1] Developed during the Plenary in Denver IG session Education and Training on handling of research data

[2] FutureTDM project report D4.3 Compendium of Best Practices and Methodologies available online at http://www.futuretdm.eu/knowledge-library/

[3] For an overview of use examples in the US, see: ‘Why “Big Data” Is a Big Deal: Information science promises to change the world’, Shaw, J., Harvard Magazine, available online at http://harvardmag.com/pdf/2014/03-pdfs/0314-30.pdf

[4] The EU expert report on text and data mining states that Europe is falling behind the US and China with respect to the uptake of TDM; available at http://ec.europa.eu/research/innovation-union/pdf/TDM-report_from_the_ex...

[5] FutureTDM consortium D4.3 Compendium of Best Practices and Methodologies report shows the need for more TDM practitioners in industry as well as a lack of awareness and skill amongst students and researchers in different disciplines.

[6] Key finding from the Publishers community on this issue available here http://publishingresearchconsortium.com/index.php/prc-projects/text-mining-of-journal-literature-2016?platform=hootsuite

[7] As identified in Europe. See European Communication Monitor 2016 http://www.communicationmonitor.eu/

[8] We will initially develop this course aimed for (student) researchers with no or little prior knowledge on TDM. For a second iteration of the course we will also look at industry, librarians and other interested parties to see how the course can be tailored more to specific needs.

[9] The course will be made available under an open access license using open source tools and materials to make sure the course can be adopted by a wide audience.

[10] The content of the course, the course materials, and the best platform to make them available will be looked at in this working group. See the timeline for more detailed information.

[11] FutureTDM Deliverable 2.4 and 4.3 available at http://www.futuretdm.eu/

[12] The UK Royal Society is holding a special conference on this topic see https://royalsociety.org/science-events-and-lectures/2016/11/data-skills-workshop/

 

Review period start:
Friday, 9 December, 2016 to Monday, 9 January, 2017

Scholarly Link Exchange Working Group:

Follow on from: RDA-WDS working group on Data Publishing Services

On Enabling Interlinking of Data and Literature

Charter:

The Scholarly Link Exchange Working group aims to enable a comprehensive global view of the links between scholarly literature and data.  The working group will leverage existing work and international initiatives to work towards a global information commons by establishing:

  • Pathfinder services and enabling infrastructure
  • An interoperability framework with guidelines and standards (see also www.scholix.org)
  • A significant consensus
  • Support for communities of practice and implementation

 

By the end of this 18 month WG period there will be:

  • A critical mass of Scholix conformant hubs providing the enabling infrastructure for a global view of data-literature links
  • Pathfinder services providing aggregations, query services, and analyses
  • Beneficiaries of these services accessing data-literature link information to add value to scholarly journal sites, data centre portals, research impact services, research discovery services, research management software, etc.
  • Operational workflows to populate the infrastructure with data-literature links
  • A better understanding of the current data-literature interlinking landscape, viewed from the perspective of, e.g., disciplines, publishers, and repositories

 

The working group follows on from the RDA/WDS Publishing Data Services WG, https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html. The original working group established demonstrator services and enabling infrastructure; the follow-on working group will support the "hardening" of that infrastructure and those services, as well as an increase in the number of participating hubs and services. The original working group established an interoperability framework; the follow-on group will provide further specification, documentation and profiling of that framework to support adoption by link contributors and consumers. The original working group established a consensus among large infrastructure providers and early adopters; the follow-on group will extend that consensus to the next wave of adopters and to a more diverse set of infrastructure providers. The original working group harnessed the energy and interest of specialists; the follow-on group will support a number of communities and services as they implement and adopt the framework and vision established in the original group.

The working group believes a global system for linking data and literature should be:

  • Cross-disciplinary and global (built for, and aspiring to, comprehensiveness) 
  • Transparent with provenance allowing users to make trust and quality decisions
  • Open and non-discriminatory in terms of content coverage and user access (this also means ranging from formal to informal, and from structured to non-structured content)
  • Standards-based (content standards and exchange protocols)
  • Participatory and adopted, including community buy-in
  • Sustainable
  • An enabling infrastructure, on top of which services can be built (as opposed to a monolithic “one-stop-shop” solution).

Note - This group retains the principles established in its precursor working group (Publishing Data Services).

 

Value Proposition:

The WG aims to oversee and guide the maturation of a distributed global system to collect, normalize, aggregate, store, and share links between research data and the literature. This will build upon the output of the preceding Data Publishing Services Working Group, which delivered a consensus vision and set of guidelines called the Scholix Framework, together with an operational system called the Data-Literature Interlinking (DLI) System, which puts these guidelines into practice as a pathfinder implementation. The WG proposed here will build out these assets into an operational infrastructure and service layer that is to become the de facto go-to place for organizations to deposit or retrieve links between research data and the literature.

 

The value of such a system ultimately rests on the value of links between research data and the literature. The utility of such links is threefold (see also the Case Statement of the Data Publishing Services WG):

  1. They improve the visibility and discoverability of research data (and relevant literature), so that researchers can find relevant material more easily.
  2. They help place research data in the right context, so that researchers can re-use data more effectively.
  3. They support credit attribution mechanisms, which incentivize researchers to share their data in the first place.

 

These value elements are illustrated below, and in more detail in Annex A.

While there is broad support for the value and utility of data-literature links amongst the various stakeholders in research data publishing (including researchers as the ultimate end-users of this information), organizing the associated information space is not an easy feat: there are many disconnected sources with overlapping information, and there is a wide heterogeneity in practices today - both at a technical level (different PID systems, storage systems, etc.) and at a social level (different ways of referencing a data set in the literature, different moments in time to assert a link, etc.). As a consequence, the landscape today is incomplete and patchy, characterized by independent, many-to-many non-standard solutions - for example a bilateral arrangement between a journal publisher and a data center. This is both inefficient and limiting in the value that can be delivered to researchers.

 

The universal linking infrastructure which this WG strives to put in place represents a systemic change. It will offer an overarching, cohesive structure that binds together many of today’s practices into a common interoperability framework - which will ensure that links between research data and the literature can be easily shared, aggregated, and used on a global scale. This will drive a network effect, where the value in the system as a whole is greater than the sum of individual parts: for researchers as end-users, this value lies in the comprehensiveness and quality of link information; for service providers and infrastructure providers (including journal publishers and data centers), the value also lies in simplicity, efficiency, and reduction of friction in the process by being able to work with a single interface to deposit and retrieve links (and, potentially, the possibility to benefit from additional services developed on top of the core infrastructure).

 

Who will benefit and Impact

 

Mapping the value proposition described above to the various stakeholders and actors in research data publishing (copied largely from the Data Publishing Services WG Summary & Recommendations), the benefits and impact may be summarized as follows:

  • For data repositories and journal publishers: linking data and the literature will increase their visibility and usage, and can support additional services to improve the user experience on online platforms (for example, offering links to relevant data sets with articles, or offering links to the literature that will help place data in context). In contrast to the bilateral arrangements that we often see today between data centers and journal publishers, the global linking infrastructure will make the process of linking data sets and research literature a more robust, comprehensive, and scalable enterprise.
  • For research institutes, bibliographic service providers, and funding bodies: the infrastructure will enable advanced bibliographic services and productivity assessment tools that track datasets and journal publications within a common and comprehensive framework.
  • For researchers: firstly, the infrastructure will make the processes of finding and accessing relevant articles and data sets easier and more effective. Secondly, it will

 

 

Engagement with existing work in the area:

  1. Building upon previous work of the RDA/WDS Publishing Data Services WG
  2. RDA/WDS Publishing Data IG
    • RDA/WDS Publishing Data Bibliometrics WG
    • RDA/WDS Publishing Data Workflows WG
  3. Infrastructure providers
  4. Infrastructure projects
  5. Related projects
  6. Data Center Community
    • ICSU WDS
    • DataCite
  7. Publisher Community
  8. Institutional Repository Community
    • OpenAIRE
    • SHARE
  9. Discipline-specific Communities
    • Pangaea (Earth and Environmental Science)
    • EBI-EMBL (Life Sciences)
    • ICPSR (Social Sciences)
    • CERN (High Energy Physics)

 

Adoption Plan:

The Adoption Plan for this Working Group is quite mature, since it builds on a previous working group; includes adopter, outreach, and documentation work packages; targets new hubs; and focuses on benefit realisation.

 

Previous Working Group:  The proposed working group builds directly on the Data Publishing Services Working Group which has a considerable membership with an active core of contributors. The WG is representative of publishers, data centres, research organisations and research information infrastructure services who are the key stakeholder and adopter communities. The existing momentum and buy-in of this group will be leveraged for adoption.

 

Technical Development of Hubs: In a similar vein, the WG activity plan includes targeted activity to extend existing hubs (CrossRef, DataCite, OpenAIRE, RMap) and establish new hubs in new community areas (such as Astronomy, Life Sciences).

 

Implementation Sub-Projects: The working group case statement "Activities" section details a number of adoption sub-projects. The Scholix framework that underpins the WG approach involves content publishers (e.g. journal publishers or data centres) communicating with natural hubs (e.g. CrossRef and DataCite). The WG activity plan includes implementation projects from publisher to hub.

 

Documentation and Support Materials: The WG activity plan includes an extension of the Scholix framework by documenting instantiations of the abstract Scholix information model in various technologies and formats (such as XML, RDF, JSON) and over a number of common protocols (such as open API calls, SPARQL, OAI-PMH, ResourceSync). These specification and implementation materials will also be a product of the development and adoption projects described above.

 

Outreach, Liaison, Collaboration:  This Working Group focuses on a technical solution to the exchange and aggregation of data-literature link information.  Other peak bodies and advocacy groups focus on changing practice and integrating data citation as part of scientific practice.  The WG work plan includes collaboration with those organisations to leverage their established agendas.  Current members of the WG include leaders in these organisations and further such activity is slated in that area of the work plan.

 

Benefit realisation: The sustainable driver of adoption is benefit for the adopter. The overall work plan is underpinned by the objective of delivering benefits to end users, as outlined in the use cases of Annex A.

 

Work Plan:

The work plan will be implemented through a set of interconnected activities outlined below. Categories exist only for planning and pragmatic purposes; they are not independent, and activities will not be siloed. Cross-category contributions by working group members will be the norm.

Stream 1. Technical Development.

The objective of this stream is to put the Scholix framework into practice such that both hubs and services develop operational functionality.

A. Develop Hubs

  1. OpenAIRE
    • Make OpenAIRE APIs compatible with Scholix to export and import links to and from DLI Service
  2. DataCite
    • Further develop standardised interfaces for query and export
  3. CrossRef
    • Further develop standardised interfaces for query and export
  4. New domain-specific hubs, e.g. EMBL-EBI (TBC by opportunity)
  5. Interim hubs (direct feed to DLI): standardisation (using Scholix framework) of feeds from previous working group and improvement of dynamic currency of feeds
    • ANDS to DLI direct (only non-DOI content)
    • ...
  6. Further interoperation of the hubs (extensions to the Scholix conceptual framework during the course of the working group)

B. Develop Services (in relation to the user scenarios defined in previous WG)

  1. DLI aggregation service https://dliservice.research-infrastructures.eu/#/
    • Transition to production at OpenAIRE data centre and infrastructure
    • APIs for PID resolution (Scholix conformant) - Pangaea
    • Improving quality: e.g. de-duplication of objects (datasets and literature)
    • Improving service level: live updates of links
  2. Use of the Scholix framework to access and expose links between articles and data in exemplar end-user services
    • OpenAIRE APIs compatible with Scholix to export and import links to and from DLI Service
    • Data centre/ publisher exemplar projects using DLI as per user scenarios

C. Elaborate the Scholix framework

  1. Create profiles of the information model for use in different technologies
    • XML for OAI-PMH
    • JSON for RESTful APIs
  2. Investigate how best to apply
    • DISCO (through cooperation with RMap)
    • ResourceSync
    • Others? (e.g. RDF for SPARQL)
  3. Provide documentation and support materials for the above
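As a sketch of what such a profile might look like, the snippet below builds one data-literature link as JSON. The field names only approximate the Scholix information model (source, target, relationship type, provenance) and are assumptions for illustration, not the normative schema the group is drafting.

```python
import json

# Illustrative sketch only: these field names approximate the Scholix
# information model (source, target, relationship, provenance) and are
# NOT the normative specification being drafted by the WG.
def make_link_record(source_doi, target_doi, relation, provider):
    """Build one data-literature link as a JSON-serializable dict."""
    return {
        "LinkPublicationDate": "2016-10-07",
        "LinkProvider": [{"Name": provider}],
        "RelationshipType": {"Name": relation},
        "Source": {
            "Identifier": {"ID": source_doi, "IDScheme": "DOI"},
            "Type": {"Name": "literature"},
        },
        "Target": {
            "Identifier": {"ID": target_doi, "IDScheme": "DOI"},
            "Type": {"Name": "dataset"},
        },
    }

record = make_link_record(
    "10.1234/article.5678", "10.5061/dryad.abc123",
    relation="References", provider="ExampleHub")
print(json.dumps(record, indent=2))
```

The same dict could be serialized as XML for OAI-PMH harvesting or as RDF for SPARQL; the point of the profiles above is that all three carry the same underlying information model.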

 

Stream 2. Community Buy-In.

This stream supports buy-in from different communities such that exchange of scholarly link information is implemented and accepted as standard practice.

D. Support Community Adoption:

  1. Create strategies for community adoption:
    • Publishers
    • Data centres
    • Repositories
    • ….
  2. Implement these strategies through:
    • Early adopter groups (e.g. CrossRef early adopters, the Force11 DCIP project, via the THOR project, with COAR)
    • Implementation projects
    • Webinars
    • Presentations
    • Support materials and activities

E. Communicate Broadly

  1. Create communications plans
  2. Implement communications plans

F.  Create Coordination and Governance Materials. Investigate and document issues such as:

  • Quality of data links
  • Requirements to be a hub
  • Access
  • Benefits for contributors
  • Measures of success

 

Key Stakeholder Groups:

The above Activity Plan will be delivered with involvement of the following groups who bring complementary resources, approaches, focus, and expertise.

A. Advocacy and Peak Bodies

  • Force11 (application data citation standards & advise on implementation standards)
  • CODATA (application data citation standards & advise on implementation standards)
  • ICSU World Data System (e.g. get more citations into DataCite)
  • STM (outreach, training, Crossref early adopter project)
  • ESIP / COPDESS
  • FAIR Data

B. Other data literature linkage projects

  • National Data Service
  • RMAP (application of DISCO)
  • SHARE
  • RDA Working Groups (Publishing Data IG, ….)

C. Prospective Hubs

  • BIOCaddie (DataMed)
  • EMBL-EBI/ELIXIR
  • NASA ADS

 

Initial Membership

Initial members come from the existing working group on an opt-out basis; they will be asked again by e-mail whether they want to join this newly formed working group following the RDA

https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html

 

Adrian Burton

George Mbevi

Kathrin Beck

Paul Dlug

Amir Aryani

Håkan Grudd

Kerstin Helbig

Peter Rose

Amye Kenall

Haralambos Marmanis

Kerstin Lehnert*

Peter Fox

Aris Gkoulalas-Divanis

Howard Ratner

Lars Vilhuber

Rabia Khan

Arnold Rots

Hua Xu

Laura Rueda*

Rainer Stotzka

Arthur Smith

Hylke Koers

Laurel Haak

Richard Kidd

Bernard Avril

Iain Hrynaszkiewicz

Leonardo Candela

Rick Johnson

Carly Strasser

Ian Bruno*

Luiz Olavo Bonino da Silva Santos

Robert Arko

Carole Goble

Ingrid Dillo*

Lyubomir Penev

Rorie Edmunds*

Caroline Martin

Jamus Collier

Mark Donoghue

Sarah Callaghan*

Claire Austin

Jeffrey Grethe

Martin Fenner*

Sheila Morrissey

Claudio Atzori

Jingbo Wang

Martina Stockhause*

Siddeswara Guru

Dan Valen

Jo McEntyre

Michael Diepenbroek*

Simon Hodson*

David Martinsen

Joachim Wackerow

Mohan Ramamurthy

Suenje Dallmeier-Tiessen

David Arctur

Johanna Schwarz

Mustapha Mokrane*

Tim DiLauro

Donatella Castelli

John Helly

Natalia Manola

Timea Biro

Eefke Smit*

Jonathan Tedds

Niclas Jareborg

Tom Demeranville

Elise Dunham

Juanle Wang*

Nigel Robinson

Ui Ikeuchi

Elizabeth Moss

Kate Roberts

Paolo Manghi

William Mischo

Francis ANDRE

Katerina Iatropoulou

Patricia Cruse*

Wouter Haak*

 

 

 

Xiaoli Chen

Yolanda Meleco

* Representatives of a WDS member

 

Initial workstream leads and co-chairs:

    • technical specs and docs (Paolo Manghi)
    • hub development and interoperability (Martin Fenner)
    • Scholix service development (Jeff Grethe)
    • publisher (Iain)
    • repository (Ian Bruno)
    • general outreach (Fiona Murphy)
    • WG coordination: program oversight (Wouter Haak); component integration (Adrian Burton)

 

Annex: Use Cases

 

Use Case

Details

Live linking

As a publisher, I want to know about relevant data for an article that I published so that I can present links to such data sets to the users on my platform.

- OR -

As a data center, I want to know about relevant articles for a data set that I published so that I can present links to such articles to the users on my platform.

  • Needs to be an on-demand, real-time query. Performance is critical.
  • The publisher or data center platform should be able to control the UI for smooth platform integration.
  • No need for the service to do any filtering; just return all linked data sets and the client can filter as needed.

 

Overview

As a data center, I want to obtain a full overview of article/data (and data/data) links for the data sets relevant to me so that I can demonstrate the utility of my data.

  • Query should be on-demand, complete, and up-to-date.
  • Precision and comprehensiveness are key.
  • Ideally an on-demand, pull mechanism.

Notification

As a data center, I want to be alerted that an article may be citing/referencing our data so that I can validate that link and then add it to our own database.

  • For an alerting mechanism, recall is more important than precision (since the data center will still validate).
  • Should be push notifications.
  • The data center needs to be able to selectively receive notifications for its own repository only, which requires "data center" metadata.
  • This service is not so sensitive to comprehensive coverage.

Exploration

As a researcher interested in a particular topic of study, I want to be able to explore a relevant article/data graph so that I can find the articles or data sets that I am interested in.

  • General "research" use case; could apply to individual researchers, data repositories, and others.
  • Requires a lot of freedom to explore on the user's terms.
  • The user in this case is expected to be highly tech-savvy and will want to create their own search logic using a minimal "hopping service" that exposes a set of links given an article or data set PID.
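A minimal client for such a hypothetical hopping service might look like the sketch below. The endpoint URL, its `pid` parameter, and the response shape are all assumptions for illustration; only the traversal logic reflects the use case above.

```python
import json
from urllib import request
from urllib.parse import quote

# Hypothetical sketch: the endpoint "api.example.org/links", its "pid"
# parameter, and the {"links": [{"target": ...}]} response shape are
# assumptions for illustration; the WG has not specified such an API.
def get_links(pid, base_url="https://api.example.org/links"):
    """Return the PIDs linked to the given article or data set PID."""
    with request.urlopen(f"{base_url}?pid={quote(pid, safe='')}") as resp:
        return [link["target"] for link in json.load(resp)["links"]]

def hop(seed_pid, depth, fetch=get_links):
    """Breadth-first walk of the link graph, up to `depth` hops out."""
    seen, frontier = {seed_pid}, [seed_pid]
    for _ in range(depth):
        frontier = [t for pid in frontier for t in fetch(pid) if t not in seen]
        seen.update(frontier)
    return seen
```

Because `fetch` is injectable, the same traversal could be pointed at a production aggregation API, a local cache, or a mock during testing; the researcher supplies their own search logic on top.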

 

 

Review period start:
Friday, 7 October, 2016
Custom text:
Body:

Please see attached document.

Review period start:
Thursday, 1 September, 2016 to Friday, 30 September, 2016
Custom text:
Body:

"Semantic Interoperability is usually defined as the ability of services and systems to exchange data in a meaningful/useful way." In practice, achieving semantic interoperability is a hard task, in part because the description of data (their meanings, methodologies of creation, relations with other data etc.) is difficult to separate from the contexts in which the data are produced. This problem is evident even when trying to use or compare data sets about seemingly unambiguous observations, such as the height of a given crop (depending on how height was measured, at which growth phase, under what cultural conditions, ...). Another difficulty with achieving semantic interoperability is the lack of the appropriate set of tools and methodologies that allow people to produce and reuse semantically-rich data, while staying within the paradigm of open, distributed and linked data.

The use and reuse of accurate semantics for describing data, datasets and services, and for providing interoperable content (e.g., column headings and data values), should be supported as community resources at an infrastructural level. Such an infrastructure should enable data producers to find, access and reuse the appropriate semantic resources for their data, and to produce new ones when no reusable resource is available. The Agrisemantics Working Group aims to be a community hub for the diffusion of knowledge and practices related to semantic interoperability in agriculture, and to serve as a common place where the future of data interoperability through semantics can be envisaged.

Review period start:
Thursday, 1 September, 2016 to Saturday, 1 October, 2016
Custom text:
Body:

This WG proposal emerged from the repository registry discussions within the Data Fabric IG. The bootstrapping co-chairs are Michael Witt, Johannes Reetz, Herman Stehouwer and Peter Wittenburg. At P8 we will suggest an election of the co-chairs and present an initial core group covering European, US and Asian experts also including an increased number of other initiatives that are actively building large federations.

 

For background information look at the Repository Registry web-pages in the DFIG realm:

https://rd-alliance.org/node/44520/all-wiki-index-by-group

 

Work Group (WG) Charter

The task of the RCD WG is to analyse the existing mechanisms and schemas by which repositories offer their detailed characteristics to service providers and, based on this analysis, to develop two concrete recommendations:

  1. A set of guidelines that should be followed by digital repositories in presenting their characteristics
  2. A unified yet sufficiently flexible schema that should be used by trustworthy repositories in presenting their characteristics

Since it will not be easy to collect this information from the large group of repositories active in larger federations, the WG may restrict itself to delivering point 1 within the 18-month period, i.e. shift the definition of an agreed schema to a phase-2 group.
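As an illustration of what point 1 might cover, the sketch below shows a minimal repository-characteristics record and a check for mandatory fields. Every attribute name is an assumption; as noted above, no schema has yet been agreed by the WG.

```python
# Illustrative only: every attribute name below is an assumption sketching
# what a unified repository-characteristics schema might cover; no schema
# has yet been agreed by the WG (see point 2 above).
repository_record = {
    "name": "Example Data Repository",
    "url": "https://repo.example.org",
    "identifier_system": "Handle",             # e.g. DOI, Handle, ARK
    "access_protocols": ["OAI-PMH", "REST"],
    "certification": "Data Seal of Approval",  # evidence of trustworthiness
    "content_types": ["dataset", "collection"],
    "contact": "helpdesk@repo.example.org",
}

# A guideline could name a minimal set of mandatory fields.
REQUIRED = {"name", "url", "identifier_system", "access_protocols"}

def validate(record):
    """Raise if a repository description lacks any mandatory field."""
    missing = REQUIRED - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return True

print(validate(repository_record))  # → True
```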

_____________________________

The full case statement can be downloaded here.

Review period start:
Tuesday, 19 July, 2016
Custom text:
Body:

RDA Rice Data Interoperability Working Group Proposal

1.- Rationale

Rice is a staple food for some 4 billion people worldwide, and it provides 27% of the calories in low- and middle-income countries. Just to keep up with population growth, an additional 104 million tons of (milled) rice beyond the expected 2015 harvest of 475 million tons are needed by 2040, with little scope for easy expansion of agricultural land or irrigation—except for some areas in Africa and South America. Rice farming is associated with poverty in many areas. About 900 million of the world’s poor depend on rice as producers or consumers and, out of these, some 400 million poor and undernourished people are engaged in growing rice.

In the future, given declining environmental quality worldwide, rice will also have to be produced, processed, and marketed in more sustainable and environment-friendly ways, despite the diminishing availability of resources (land, water, labor, and energy). Climate change is exacerbating the situation through the effects of higher temperatures, more frequent droughts and flooding, as well as sea-level rise, which threatens rice production in mega-deltas. Nevertheless, the necessary increases in rice production to meet future demand have to come mainly from increases in yield per unit of land and water. (Rice Agri-Food Systems, IRRI 2015, www.grisp.net)

Research and development efforts have generated international public goods as well as locally tailored solutions, such as publicly accessible data and information systems, genes and markers, breeding lines, improved varieties, improved crop management and postharvest technologies, policy briefs, training and dissemination materials, knowledge products and capacity building.

The delivery mechanism for these products and services follows a pipeline approach: upstream research results in discoveries and innovations, which are translated into concrete products; these are introduced, evaluated, improved, and disseminated to intermediate users, and finally adopted by end users, who may number millions of beneficiaries.

While modern rice research and research data date back to the 19th century, the last five decades have seen the successful development of high-throughput technologies that generated large quantities of data in basic, applied and adaptive research in the rice sector. However, using these resources comprehensively and taking advantage of the associated cross-disciplinary research opportunities poses a major challenge to both domain scientists and information technologists. Effective data integration and management allows a broader perspective across many disciplines than is possible from one or a series of individual studies. In the long run, this allows information to be used for purposes other than those for which it was originally intended, to address questions that were unapproachable at the time the data were collected. To this end, the need for umbrella approaches to providing uniform data is a much-discussed topic.

Today, global rice research is an ensemble of Consortium Research Projects rather than a network of stand-alone institutes. The boundaries between national, regional and global players are blurring. Research organizations need to develop specific strategies and measures to create repositories that quickly communicate with each other.

This warrants building a common framework and standard for rice data and information sharing. The Research Data Alliance (RDA), through its Agriculture Data Interoperability Interest Group, provides a space to discuss the need to improve data exchange, enabling data integration in this domain. Keeping in view the complexities of the data-information-knowledge continuum of the rice sector, it becomes imperative to work towards a common framework for rice research data sharing across the globe; hence the Rice Research Data Interoperability (RDI) Working Group.

Context:

The RDI Working Group aims to reinforce synergies between rice research & development organizations to support food security, nutritional value and safety while taking into account societal demands for sustainable and resilient agricultural production systems. Specifically, it will:

  • provide a forum to facilitate communication between research groups and organisations worldwide on effective sharing of rice research data

  • foster communication between the research community, funders and global policy makers at the international level to meet their research and development goals

  • facilitate and ensure the rapid exchange of information and know-how among researchers, and support knowledge transfer to breeders and farmers

At the 2012 G-8 Summit, G-8 leaders committed to the New Alliance for Food Security and Nutrition, the next phase of a shared commitment to achieving global food security. As part of this commitment, they agreed to “share relevant agricultural data available from G-8 countries with African partners and convene an international conference on Open Data for Agriculture, to develop options for the establishment of a global platform to make reliable agricultural and related information available to African farmers, researchers and policymakers, taking into account existing agricultural data systems.”

2. Charter

The aim of the Rice Research Data Interoperability WG is to provide a framework based on community-accepted standards, which ensures data analysis and data integration facilities. Such a framework is a great asset for the rice community, providing the analysis functions and other services expected by researchers. Linking databases, platforms and big data from different stakeholder organizations could help thousands of rice research organizations across the globe, especially in Asia and Africa. The Rice Data Interoperability WG, in collaboration with partners, will work towards bridging the gaps in free data sharing and interoperability of rice research data.

The proposed common framework will help in describing, representing, linking and publishing rice data with respect to open standards. Such a framework will promote and sustain rice data sharing, reusability and interoperability. Like the Wheat WG, the Rice WG will also try to address these questions: which (minimal) metadata should describe rice data? Which vocabularies/ontologies/formats? Which good practices?

With regard to the legal and policy aspects of the underlying data, the proposal will defer to the policies in place in the respective organizations regarding data access: we recognize that private research institutions and companies keep most of their data internal, hence these are never exposed publicly. Nevertheless, private (and for-profit) institutions are encouraged to adopt the proposed framework within their internal systems, with the RDI WG requesting only acknowledgement of the adoption of the Rice RDI framework (and notification to the RDA-RDI WG by e-mail). Most public (or publicly funded) agricultural research organizations are mandated to provide open access to research data (e.g. CGIAR Open Access and Open Data - http://www.cgiar.org/resources/open-access/; the open data charter for agriculture - http://opendatacharter.net/introducing-agricultural-open-data-package-beta-version/), hence all data accessed using the framework are Open Data and should be treated accordingly.

On the matter of interoperability of rice research data, the RDI WG will use the recommendations and outcomes of the Research Data Alliance – CODATA Working Group on Legal Interoperability of Research Data (RDA-CODATA WG) , adapting them accordingly to suit rice research data.

In terms of functionalities and data types, the working group will identify relevant use cases in order to produce a "cookbook" on how to produce "rice data" that are easily shareable, reusable and interoperable. Implementing the framework will help cultivate a rice research ecosystem with people familiar with interoperability, organisations ready to collaborate, and common tools and services. To do so, the WG will focus on:

1.   Sharing heterogeneous research data that could be useful across regional boundaries. Rice data may range from germplasm, pedigree, genetics, genotyping, phenotyping, and varietal data to technological and rice-policy data.

2.   Capturing farmers' management techniques (tacit knowledge) from varied agro-climatic conditions. Data on farm innovations can provide local solutions to global problems, and vice versa.

3.   Assessing the performance of released varieties. It is estimated that about 40,000 varieties of rice are cultivated by farmers. The suitability of varieties in areas other than where they were released can be worked out only if the data on their performance are shared across the board.

4.   Providing data relevant to decision support systems: Decision support systems operate in two tiers, i.e., at the researcher level and at the development-professional level. Finding and comparing experimental and putative rice data from many well-established and emerging sources is a real challenge for informed decision making by researchers. For example, there is a need to consolidate a wide range of public scholarly data into a single search box by extensive data matching and curated cataloguing of disparate sources, giving an overview of rice gene knowledge and side-by-side comparisons for thousands of genes on a single screen. In this way, researchers can gain access to many kinds of rice knowledge by simply entering a keyword or identifier.

5.   Accessing relevant socio-economic data and policies: Global rice trade, policy decisions and socio-economics related to the rice sector will have a bearing on rice research. A prototype framework would go a long way here.

6.   ‘Reviving’ legacy data (ink on paper) through digitization: Millions of legacy data records have been generated through the multi-location testing programs of national systems in the last five decades. These untapped legacy data will bring a data revolution to the rice sector if made available to thousands of researchers. For example, in India, 50 years of legacy data have been made available through 27,000 datasets related to multi-location trials, with effective tagging by discipline, year and season, catering to the data requirements of the country's rice researchers.

7.   Manage the multilingual status of the data: One of the biggest challenges in agricultural data lies in the variety of languages in which the data are stored. Notwithstanding the complexities, an attempt can be made to build and pilot a framework that eventually becomes a model for agriculture as a whole.

8.   Bring the International Rice Informatics Consortium into the RDA Rice working group: The International Rice Informatics Consortium (IRIC - http://iric.irri.org ) aims to provide access to well-organized information about rice and to facilitate communication and collaboration within the rice community, with germplasm diversity as a focal entry point.
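Point 6 above describes tagging digitized legacy trial datasets by discipline, year, and season. Purely as an illustration of what such tagging enables, the sketch below models a trial record with those tags and filters a collection on them; the field names, identifiers, and values are invented for the example, not taken from any actual repository.

```python
from dataclasses import dataclass

# Hypothetical sketch: a minimal record for a digitized multi-location
# trial dataset, tagged the way the Indian legacy archive is described
# (discipline, year, season). Field names are illustrative, not a standard.
@dataclass(frozen=True)
class LegacyTrialRecord:
    dataset_id: str
    discipline: str   # e.g. "plant breeding", "entomology"
    year: int
    season: str       # e.g. "kharif", "rabi"
    location: str

def filter_records(records, discipline=None, year=None, season=None):
    """Return the records matching every tag that was supplied."""
    out = []
    for r in records:
        if discipline is not None and r.discipline != discipline:
            continue
        if year is not None and r.year != year:
            continue
        if season is not None and r.season != season:
            continue
        out.append(r)
    return out

records = [
    LegacyTrialRecord("TRIAL-1975-001", "plant breeding", 1975, "kharif", "Hyderabad"),
    LegacyTrialRecord("TRIAL-1982-014", "entomology", 1982, "rabi", "Cuttack"),
]
hits = filter_records(records, season="kharif")
```

With tags like these attached during digitization, researchers can slice five decades of trials by any combination of discipline, year, and season instead of reading paper archives.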

3. Value proposition

Individuals, communities, and initiatives that will benefit from the Rice Data Interoperability Guidelines

The RDI WG will provide a linked data framework based on community-accepted standards, which ensures data analysis and data integration facilities. Such a framework is a great asset for the Rice Information System in providing the analysis functions and other services expected by researchers. Implementing a common framework (however small the scale may be) will help cultivate a rice research ecosystem with people familiar with interoperability, organisations ready to collaborate, and common tools and services.

  • Rice data managers and data scientists will have a common, global framework to describe, document, and structure their rice research data.
  • Researchers, growers, breeders, and other data users will have seamless access to, use of, and reuse of a wide range of rice data. Data linking will also ease the emergence of new data analyses and knowledge discovery methodologies.
  • Data managers and scientists working on other plants will benefit from a reusable data framework, and researchers working on other plants will be able to more easily access, reuse, and link rice data with their own data.
  • Development professionals and policy makers will be able to take informed decisions with comparative advantages across countries.
  • In terms of scope, more than 132 countries can benefit directly from the free sharing of rice research data.

Key impacts of the RDA Rice Data Interoperability Guidelines

  • Promote adoption of common standards, vocabularies, and best practices for rice data management, and raise general awareness among rice research organizations about openness of data and interoperability standards
  • Facilitate access, discovery, and reuse of rice data, thereby creating evidence-based impacts for the RDA/IGAD/RDI framework
  • Facilitate rice data integration and measure the impacts of free sharing of rice data
  • Create new opportunities for ontology-based knowledge management in the rice sector

4. Engagement with existing work across the globe

The Rice Data Interoperability WG is a working group of the RDA IGAD. The working group will take advantage of the outputs of other RDA working groups. In particular, it will be watchful of working groups concerned with metadata, data harmonization, and data publishing.

The working group will also interact with the experts and other projects from national and international organizations and their initiatives which are built on standard technologies for data exchange and representation.

The Rice Data Interoperability group will exploit existing collaboration mechanisms to get as much stakeholder involvement in the work as possible. The working group will also interact with the Wheat Data Interoperability WG experts and with other plant projects built on standard technologies for data exchange and representation, such as TransPLANT (http://urgi.versailles.inra.fr/Projects/TransPLANT), agINFRA (http://www.aginfra.eu), GOBII (http://gobiiproject.org/), the International Rice Information System (of IRRI), the Integrated Breeding Platform (https://www.integratedbreeding.net), DivSeek (http://www.divseek.org/), and RICE-GRISP (http://www.cgiar.org/about-us/our-programs/rice-grisp/), as well as more generic projects such as Elixir Excelerate (https://www.elixir-europe.org/excelerate).

The work will directly align with the ongoing initiatives of hundreds of rice research organizations (including International Rice Research Institute and Africa Rice).

  • Understand ongoing initiatives - existing work in rice data - through a community survey, and determine what difference WG outputs can make among these initiatives. The survey will try to understand the systems (data content, ontologies/controlled vocabularies, software technologies including APIs), breeding workflows, rice high-throughput genotyping/genomics, avenues for harmonizing semantics for phenotyping and agronomy data, ontology-based production management, and various interoperability issues.
  • Create a prototype data registry for testing, in line with IRRI's ongoing work; this will help provide guidelines for creating data registries for rice research organizations. IRRI and AfricaRice are the lead international rice research institutions with a reach to every NARS partner, so a data registry, once created, can immediately be taken to the national partners. The RDI WG can leverage the strength of these two organizations.
  • Collect semantics and initiate a framework for a rice ontology that aligns existing rice ontologies, thesauri, and controlled vocabularies, and prospect the multilingual conversion of ontologies. Many countries, such as India and Thailand, build their rice programs on semantic portals using a standard rice ontology. The RDI WG will work towards aligning all these ontologies to a common framework and design ways to use ontologies for collective intelligence and for production and pest management.
  • Document best practices for digitization of rice legacy data based on Indian and Thai experiences. Much of the most valuable and reusable data still lies in paper-based documents. While raising awareness among policy makers in rice-growing countries about the need to digitize legacy data, the RDI WG will also work towards developing and documenting best practices for effectively digitizing rice legacy data.
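The ontology-alignment task above amounts to publishing mappings between vocabularies. One common, lightweight way to express such mappings is SKOS; the sketch below emits skos:exactMatch statements in N-Triples form. The vocabulary URIs are invented placeholders, not identifiers from any real rice ontology.

```python
# Illustrative sketch only: expressing alignments between two rice
# vocabularies as SKOS mapping triples. The example.org URIs below are
# invented placeholders, not real ontology identifiers.
SKOS_EXACT = "<http://www.w3.org/2004/02/skos/core#exactMatch>"

alignments = [
    # (term in a national vocabulary, matching term in a common framework)
    ("http://example.org/in-rice/plantHeight",
     "http://example.org/rice-core/plantHeight"),
    ("http://example.org/th-rice/grainYield",
     "http://example.org/rice-core/grainYield"),
]

def to_ntriples(pairs):
    """Serialize each alignment as one skos:exactMatch N-Triples line."""
    return ["<%s> %s <%s> ." % (s, SKOS_EXACT, o) for s, o in pairs]

triples = to_ntriples(alignments)
```

Published as a set of such statements, an alignment lets any linked-data tool treat a national term and its common-framework counterpart as interchangeable, which is the practical payoff of a shared rice ontology.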

5. Work plan

Form and description of final deliverables

  1. A report on the survey of existing standards among rice research and development organizations, focusing on data availability, accessibility, applicability, formats, ontologies, standards, and metadata used, with a complete analysis of interoperability (or the lack thereof) among rice databases and repositories.
  2. A set of recommendations on good practices, ontologies, tools, and examples to create, manage, and share data related to rice. This work will be based on the existing Wheat Data WG guidelines: the WG will identify those relevant to rice data, adopt them, and customize them accordingly. New types of data might be added according to the results of point 1. The expected output is a Rice Data Framework specification (cookbook).
  3. Evaluation of a prototype rice-specific data registry. Recommendations on how to develop this type of tool will be prepared and disseminated as good practices.
  4. Recommendations for a rice ontology that aligns existing rice ontologies, thesauri, and controlled vocabularies. This should be the basis for a prospect on multilingual conversion of ontologies (TH KU/JP NARO/IRRI/IIRR/Bioversity), which will not be covered by this WG as a deliverable.
  5. Good practices for digitization of rice legacy data, in line with India's data repository, that can serve as a model for making Thailand's national legacy data available, and identification of best practices (India - IIRR, TH Rice Dept/Ministry of Agriculture).
  6. An adoption phase for deliverables 2 to 5 is foreseen in the work plan, in two forms: (1) disseminating and creating awareness of the results within the rice research community; (2) preparing use cases in national and international organizations.

6. Milestones

Month 1 to 6: Survey to identify the existing standards and recommendations (including vocabularies, ontologies and data formats), end-user categories, and relevant platforms and tools. Target audience are researchers and data managers.

Month 1 to 12: First version of the Rice Data Framework specification online (cookbook).

Month 6 to 18: First draft of good practices for digitization of rice legacy data ready.

Month 6 to 18: Evaluation of the prototype data registry involving a few partners, coordinated by IRRI.

Month 12 to 18 onwards: Creating general awareness among rice research organizations of common standards, interoperability issues, openness, and data standards. This will be followed by an adoption phase targeting one national and one international rice repository (organizations to be identified).

Month 18 onwards: Creating and measuring the impacts (evidence) of the rice data framework and standards of RDI/IGAD/RDA.

7. Adoption plan

The working group can rely on its initial members to promote broad adoption of the data framework. But the overall aim of the working group is to design a common framework as well as to create awareness and hence encourage the participation of a large number of rice research and development organizations.

The initial work will be undertaken by existing members, each task led by one or two organizations:

  1. Community surveys on ontologies, data needs, and interoperability – co-lead: IIRR/IRD
  2. Adopt Wheat Data Interoperability Guidelines relevant to rice data, and publish the Rice Data Framework specification (cookbook) – lead: IRD
  3. Prototype data registry for testing, and guidelines for creating a data registry – lead: IRRI
  4. Collect semantics and initiate a recommendation on a rice ontology that aligns existing rice ontologies, thesauri, and controlled vocabularies, and prospect the multilingual conversion of ontologies – co-lead: NARO/IRRI/IIRR/Bioversity
  5. Good practices for digitization of rice legacy data – co-lead: IIRR, KU - Thailand

8. Initial membership

Initial member institutions

  • IRRI
  • IRD
  • Bioversity
  • NARO
  • IIRR
  • CIRAD
  • CIAT
  • FAO of the UN
  • INRA
  • PHILRICE
  • International Rice Informatics Consortium
  • AfricaRice

Initial members

  1. Alexandre Guitton
  2. Devika Madalli
  3. Imma Subirats Coll
  4. N Meera Shaik
  5. Pierre Larmande
  6. Ramil Mauleon
  7. Sridhar Gutam
  8. Manuel Ruiz
  9. Laurel Cooper
  10. Elizabeth Arnaud
  11. Vessela Ensberg
  12. Giovanna Zappa
  13. Jeffrey Detras
  14. Muhammad Naveed Tahir
  15. Terry Lee
  16. Vassilis Protonotarios
  17. Xavier Greg I. Caguiat
  18. Ibnou Dieng
  19. Xuefu Zhang

 

Review period start:
Wednesday, 29 June, 2016 to Sunday, 31 July, 2016
Custom text:
Body:

In a field where data volumes and the number of data sources increase exponentially, it becomes critical to find efficient community-level solutions to problems such as storage, indexing, metadata, sharing, and analysis. The global objective of this Interest Group is to explore and discuss the challenges of managing, efficiently analysing, and disseminating the large and diverse datasets generated by the weather, climate, and air quality communities. These communities follow one of the main ideas of data sharing in RDA, as most of the datasets they use are completely open and freely available. Most of the concepts and problems that this IG aims to address correspond to classic RDA issues (metadata, data standards, efficient sharing, etc.), but as these communities already have rules and policies structured in terms of standards and data portals, they need advanced, community-specific solutions to these issues. These communities, which are not (yet) adequately represented in RDA, are among the biggest data producers in terms of volumes stored (for example, the CMIP6 project alone is planned to generate 5 PB of data, and this is only one project for one community), have a well-identified range of users, and can therefore bring important added value to the work of the RDA community.

Download the full Case statement 

 

Review period start:
Tuesday, 14 June, 2016
Custom text:
Body:

The initial idea of establishing this working group was presented during P6 in Paris, in the Repository Platforms for Research Data IG session. Shortly after P6, a telephone conference was held, with the conclusion to prepare a case statement and finalize it during a BoF session at P7. The initial co-chairs are David Wilcox and Thomas Jejkal. Contacts with potential co-chairs from Asia were already made during P6 and will be finalized during P7.

For more information please visit the web page of this BoF group:

https://rd-alliance.org/groups/research-data-repository-interoperability-wg-bof.html

Charter

The Research Data Repository Interoperability Working Group will establish standards for interoperability between different research data repository platforms, focusing on machine-to-machine communication. These standards may include (but are not limited to) a generic API specification and import/export formats, summarized in a document serving as an implementation guide for adoption. The scope of this document and of all the WG’s activities will be defined by the following list of initial use cases:

  • Migration/Replication of a Digital Object between research data repository platforms

    • Platform, data model and/or version may differ between source and destination

  • Retrieval of information related to the platform and/or its contents

    • E.g. to register the system in a (repository) registry or to harvest contents

This initial list might be extended in the first phase of the WG’s operational time.
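To make the two use cases concrete, here is one possible shape such a generic API could take, sketched as an abstract adapter with a platform-neutral package format. Everything here (the method names, the package layout, the in-memory adapter) is an assumption of this illustration, not part of any agreed specification.

```python
from abc import ABC, abstractmethod

# Purely illustrative: one possible shape for the generic API the WG might
# specify. Method names and the "package" concept are assumptions of this
# sketch, not an agreed standard.
class RepositoryAdapter(ABC):
    @abstractmethod
    def describe(self) -> dict:
        """Platform-level information (name, version), e.g. for a registry."""

    @abstractmethod
    def export_object(self, object_id: str) -> dict:
        """Serialize a digital object into a platform-neutral package."""

    @abstractmethod
    def import_object(self, package: dict) -> str:
        """Create an object from a package; return its new identifier."""

def migrate(source: RepositoryAdapter, dest: RepositoryAdapter, object_id: str) -> str:
    """Use case 1: migrate/replicate a digital object between platforms."""
    return dest.import_object(source.export_object(object_id))

# A trivial in-memory adapter, showing that platforms with different
# internals can still interoperate through the neutral package format.
class MemoryRepo(RepositoryAdapter):
    def __init__(self, name):
        self.name, self.store, self._next = name, {}, 0
    def describe(self):
        return {"platform": self.name, "version": "0.1"}
    def export_object(self, object_id):
        return {"metadata": {"source": self.name}, "payload": self.store[object_id]}
    def import_object(self, package):
        self._next += 1
        new_id = "%s-%d" % (self.name, self._next)
        self.store[new_id] = package["payload"]
        return new_id

a, b = MemoryRepo("repoA"), MemoryRepo("repoB")
a.store["obj1"] = b"hello"
new_id = migrate(a, b, "obj1")
```

The point of the sketch is the indirection: neither repository needs to know the other's data model, only the shared package format, which is exactly the property the WG's specification would have to pin down.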

In order to cover these use cases, existing standards and technologies will be identified and evaluated in the second phase. Evaluation results will be summarized in a separate deliverable and will form the basis of the final deliverable. During the evaluation phase, the preparatory work of other RDA WGs will be used as far as possible along with experiences gathered by the RDRI WG’s members during their work with and on existing research data repository platforms.

In the final phase the WG will strive for a consensus regarding a generic API specification and/or import/export formats needed for offering the listed functionalities. The final deliverable will then contain this consensus in a form such that it can be used as an implementation guide for later adoption.

Value Proposition

The Research Data Repository Interoperability working group will provide recommendations and implementation guidelines (e.g. for a generic API or import/export formats) for research data repository interoperability that can be integrated by platform developers and service providers. Therefore, existing standards and technologies will be evaluated and integrated where possible. Once adopted widely, these outcomes will allow institutions and organizations with research data repositories to deposit, access and share their data in a common way and to disseminate repository resources and contents to clients and services easily. For adopters and their users this means:

Removing Barriers: Defining and implementing interoperability standards for the use cases mentioned above can help researchers identify and acquire datasets stored in other platforms that were previously unavailable, enriching their own research.

Easier Collaboration: Having a common way to exchange datasets stored in research data repository platform instances at different institutions, or even in different disciplines, can help identify new starting points for (inter-)disciplinary collaborations.

Creating Commonalities: Agreeing on and implementing common standards for realizing typical research data repository tasks might bring adopters closer together. For the future this could result in fruitful collaborations extending the basic set of functionalities that have been proposed by this WG.

As everything rises and falls with the adoption of the results, repository platform developers contributing to this group have agreed to implement the results as early adopters.

Engagement with Existing Work

A number of related standardization efforts have already taken place; for example, the OAI protocol for metadata harvesting, the SWORD protocol for repository deposits, and the re3data.org schema for collecting information on research data repositories for registration. The Research Data Repository Interoperability WG will review these and other related standards to see how they might be adopted or extended to support our goals. This review period will ensure that we do not duplicate existing efforts.
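Of the standards named above, OAI-PMH is a concrete example of machine-to-machine harvesting: a client issues HTTP GET requests whose arguments (verb, metadataPrefix, set, from, until) are defined by the OAI-PMH 2.0 specification. The sketch below only assembles such a request URL; the endpoint address is a placeholder, and real harvesting would additionally fetch the URL and page through resumptionTokens.

```python
from urllib.parse import urlencode

# Sketch of how a harvesting client assembles an OAI-PMH request URL.
# The argument names come from the OAI-PMH 2.0 specification; the base
# endpoint below is a placeholder, not a real repository.
def oai_request_url(base_url, verb="ListRecords", metadata_prefix="oai_dc",
                    set_spec=None, from_date=None, until_date=None):
    params = {"verb": verb, "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec       # selective harvesting by set
    if from_date:
        params["from"] = from_date     # incremental harvesting window
    if until_date:
        params["until"] = until_date
    return base_url + "?" + urlencode(params)

url = oai_request_url("https://repository.example.org/oai",
                      set_spec="datasets", from_date="2016-01-01")
```

This kind of small, uniform request surface is what made OAI-PMH widely adoptable, and it is a useful reference point when the WG weighs how much machinery its own API specification needs.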

Work Plan

The work of the proposed group is organized in three phases framed by the RDA plenary meetings beginning with P8.

Timing, action, and main participants:

  • September 2016 - Official start of the RDRI WG at P8; working session at P8 for analyzing the state of the art. (Session participants in an open discussion)
  • September – December 2016 - Identification and discussion of additional use cases and adoptable technologies; mapping of technologies for potential adoption to individual functionalities. (Registered members)
  • January – April 2017 - Creation of a primer document describing all use cases and technologies for potential adoption; the document also points out gaps not covered by existing technologies. (Co-chairs)
  • April 2017 - Session during P9 to present the primer document and to prepare next steps, e.g. identification of functionalities or exchange formats. (WG members)
  • April – September 2017 - Discussion of functionalities, exchange formats, and intended behavior; creation of a first draft of the specification document. (Registered members)
  • September 2017 - Presentation of the specification draft at P10 and identification of open points and potential improvements. (Session participants in an open discussion)
  • September 2017 – March 2018 - Finding consensus regarding the final specification and writing the final deliverable serving as an implementation/adoption guideline. (Registered members; co-chairs writing)
  • March 2018 - Presentation of final results at P11. (Co-chairs)

 

Deliverables

D1. Research Data Repository Interoperability Primer (M6): This document describes targeted use cases, needed functionalities, as well as existing technologies and their feasibility for adoption. Gaps not covered by existing technologies are also described in this document.

D2. Interface Specification Draft (M12): A first draft document of the final specification. The document gives a basic overview of functionalities, exchange formats and intended behavior targeted by the WG to cover the defined use cases. This document will be the basis for finding a consensus between all WG members.

D3. Interface Specification (M18): This specification represents a consensus of all partners regarding an interoperable repository interface. It describes all functionalities provided by this interface, including exchange formats and the expected behavior of a repository platform implementing the interface. This document serves as a guideline for adopting the results of this working group.

Mode and Frequency of Operation

The Research Data Repository Interoperability WG will primarily communicate asynchronously online, using the mailing list functionality provided by RDA. Online voice meetings will be scheduled as needed, likely once per month. When possible, in-person meetings will also be scheduled; these will take place at RDA plenaries and at other conferences where a sufficient number of group members are in attendance.

Addressing Consensus and Conflicts

Group consensus will be achieved primarily through mailing list discussions, where opposing views will be openly discussed and debated amongst members of the group. If consensus cannot be achieved in this manner, the group co-chairs will make the final decision on how to proceed.

The co-chairs will keep the working group on track by setting milestones and reviewing progress relative to these targets. Similarly, scope will be maintained by tying milestones to specific dates, and ensuring that group work does not fall outside the bounds of the milestones or the scope of the working group.

Community Engagement

The working group case statement will be disseminated to mailing lists in communities of practice related to research data and repositories in an effort to cast a wide net and attract a diverse, multi-disciplinary membership. Group activities, where appropriate, will also be published to related mailing lists and online forums to encourage broad community participation.

Adoption Plan

Representatives of several major repository platforms have already joined this working group, including:

These representatives have agreed to consider implementing the standards recommended by the Research Data Repository Interoperability WG in their respective repository platforms. We will continue to seek representatives from a variety of repository platforms and services to ensure that this working group’s deliverables are widely adopted.

Initial Membership

Co-Chairs

Thomas Jejkal

David Wilcox

 

Members

Stefan Funk

Ralph Mueller-Pfefferkorn

Robert Olendorf

Rick Johnson

Ulrich Schwardmann

Ajinkya Prabhune

Andrew Woods 

Wolfram Horstmann

Cynthia Hudson Vitale

Adam Soroka 

Jared Whiklo

Colleen Fallaw

Rainer Stotzka

Stephen Abrams

Eleni Castro

Amy Nurnberger

Andre Schaaff

Christopher Harrison

Holger Mickler

Jibo Xie

Juanle Wang

Muhammad Naveed Tahir

Niclas Jareborg

Shaun de Witt

Volker Hartmann

William Gunn

Wouter Haak

 

Review period start:
Thursday, 19 May, 2016 to Monday, 20 June, 2016
Custom text:
