
Relevancy Ranking Task Force Wiki


    Siri Jodha Khalsa
    Participant

    This is the Main Wiki page for the Relevancy Ranking Task Force

    Co-Leads
    Peter Cotroneo
    Mingfang Wu

    Contributors
    Anita de Waard
    Øystein Godøy
    Jeffrey Grethe
    Beth Huffer
    Siri Jodha Khalsa
    Jens Klump
    Lewis McGibbney
    Jun-ichi Onami
    Craig Willis

The survey instrument for current practices in relevancy ranking systems is available here.

A per-question summary and ongoing analysis of the survey data are available here.

    Goals

    Relevancy ranking is one specific feature of a data search system, yet it is an essential component in delivering what a data seeker is looking for. This task force aims to achieve the following goals:

    1. Help people choose appropriate technologies when implementing or improving search functionality at their repositories.
    2. Provide a means or forum for sharing experiences with relevancy ranking.
    3. Capture the aspirations, successes, and challenges reported by repository managers.
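As a toy illustration of the kind of search functionality the first goal refers to, the sketch below ranks dataset descriptions against a query using a basic TF-IDF score. The corpus and the scoring formula are illustrative assumptions for this page only, not a task force recommendation; production repositories would typically rely on an established engine such as Solr or Elasticsearch.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def rank(query, docs):
    """Score each document against the query with a simple TF-IDF
    weighting and return documents sorted by descending relevance."""
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        # Sum tf * idf over query terms present in the document.
        score = sum(
            tf[t] * math.log(1 + n / df[t])
            for t in tokenize(query) if t in tf
        )
        scores.append((score, i))
    return [docs[i] for _, i in sorted(scores, reverse=True)]

# Hypothetical dataset descriptions for illustration only.
datasets = [
    "global sea ice extent monthly climatology",
    "soil moisture measurements australia",
    "sea surface temperature satellite observations",
]
print(rank("sea ice", datasets)[0])
```

Here the description matching both query terms outranks the one matching only "sea", which in turn outranks the unrelated record; real systems layer field boosts, metadata quality signals, and usage statistics on top of such term-based scoring.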

    Progress

    To achieve the above goals, we have carried out the following activities:

    1. Identify the target audiences for the task force's recommendations and outputs:
      • For repositories (primary target) – how to implement, what to implement, and what tools are available in open code repositories.
      • For data producers – relates to choice of metadata standard, etc.  
      • For data searchers – how to formulate queries to get appropriate results ranking.
    2. Identify issues on search ranking from within a repository and evaluation methods.
    3. Identify current practices in relevancy ranking for data search by designing a draft survey questionnaire. It is expected that the aggregated survey results will be used to support the second and third goals.
    4. Explore possible testbeds to address data search challenges. Some possibilities may include:
      • Elsevier can provide AWS EC2 instances for a relevancy test bed. The Elsevier team could probably clone the machines that they used during the recent bioCADDIE Challenge.
      • Discuss with NDS Labs whether they can provide a testbed (NB: Anita can make the connection; discuss in Barcelona?)
      • ANDS can provide a corpus of the Research Data Australia repository.

    Deliverables

    At this stage we have been conducting a scoping study. We hope that after we conduct the survey we will be able to share where the community stands in implementing relevancy ranking, what the benchmarks are, and which common relevancy ranking activities data search implementers would like to initiate and participate in.

    Next Steps

    The planned future activities include:

    • Finalise the survey instrument and conduct the survey.
    • Analyse the survey result to understand current practices on relevancy ranking and prioritise future activities for the group.
    • Experimenting with and evaluating the various factors that affect relevancy ranking takes considerable effort; the task force will therefore collaborate with the data search community (or the search community at large) to explore realistic yet reliable ways for data repositories to carry out such comparison and evaluation tasks.

    Minutes

    The task force has held four meetings so far (roughly every two weeks). Notes from each meeting are available on the task force’s wiki page.
