Joint RDA UK Node and FREYA project workshop highlights the importance of Persistent Identifiers
The RDA UK node held its second workshop on 16 July 2019 in London, kindly hosted by the Wellcome Trust. After running a launch webinar and a half-day workshop to introduce the node, we decided to focus on a specific theme for our second workshop. We chose Persistent Identifiers (PIDs) and ran the workshop jointly with the EU-funded FREYA project, as it has a number of partners who are actively engaged in the RDA.
The day started with a welcome from David Carr who had agreed to host the event. He sees the RDA as crucial to the global research landscape to help deal with shared challenges. The Wellcome Trust is supportive of the UK node to help engage and reach out to promote RDA work. Frances Madden gave an introduction to the FREYA project and the PID Graph. Juan Bicarregui introduced the UK node and described how the day would involve presentations and breakout groups to make it a more interactive event.
The presentations were grouped into a number of areas – the PID landscape, PID provider plans, and PID use cases.
Christopher Brown described why PIDs are important to research projects, particularly within Jisc, and how the RDA supports community work at an international level. The next three presentations focussed on RDA activities around PIDs from the chairs/co-chairs of relevant RDA groups.
Martin Fenner described the Open Science Graphs for FAIR data interest group and how the PID Graph from FREYA is a core component. The goal of the interest group is to build on the outcomes of the DDRI and Scholix RDA Working Groups to investigate the challenges and identify solutions towards achieving interoperability between services and information models of Open Science Graph initiatives. The aim is to improve the FAIRness of research data and, more broadly, of science.
Louise Darroch talked about how PIDs are used at the British Oceanographic Data Centre, where they need instrument IDs to be part of the research object. Instrument PIDs provide a good way of getting metrics for funding, as they show how the funds have been used. Having created the Persistent Identification of Instruments working group, they gathered a number of cross-community use cases, including from the UK Polar Data Centre.
Tobias Weigel works at the DKRZ in Germany and is involved in a number of RDA groups. He described how they assign PIDs in pre-publication workflows, before the data is published and ready to be distributed to others. Tracking of data versions and copies, and referencing before formal publication, are two features requested by their users. Each digital object has an identifier and metadata associated with it. They want to automate the management of data objects and their metadata before publication, and to record connections between (meta)data objects, software, and workflows. The RDA groups and recommendations supporting this work are the Data Fabric IG, PID Kernel Information WG, Data Type Registries WG and Research Data Collections WG.
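To make the idea of versioned, PID-identified digital objects more concrete, here is a minimal sketch in Python. It is not DKRZ's implementation or any RDA recommendation: the class names, fields, and example identifiers are all hypothetical, and the in-memory registry merely stands in for a real PID service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DigitalObject:
    """A data object identified by a PID, with minimal metadata (hypothetical fields)."""
    pid: str
    metadata: Dict[str, str] = field(default_factory=dict)
    previous_version: Optional[str] = None  # PID of the version this one supersedes
    copies: List[str] = field(default_factory=list)  # locations holding replicas
    published: bool = False  # objects can be referenced before formal publication


class Registry:
    """In-memory stand-in for a PID service (assumption: a real one is remote)."""

    def __init__(self) -> None:
        self._objects: Dict[str, DigitalObject] = {}

    def register(self, obj: DigitalObject) -> None:
        if obj.pid in self._objects:
            raise ValueError(f"PID already registered: {obj.pid}")
        self._objects[obj.pid] = obj

    def resolve(self, pid: str) -> DigitalObject:
        return self._objects[pid]

    def version_history(self, pid: str) -> List[str]:
        """Follow previous_version links from the given PID back to the oldest."""
        history: List[str] = []
        current: Optional[str] = pid
        while current is not None:
            history.append(current)
            current = self._objects[current].previous_version
        return history


# Example identifiers are placeholders, not real PIDs.
registry = Registry()
registry.register(DigitalObject("example/run-1", {"title": "Model run, draft 1"}))
registry.register(
    DigitalObject(
        "example/run-2",
        {"title": "Model run, draft 2"},
        previous_version="example/run-1",
        copies=["site-a", "site-b"],
    )
)
history = registry.version_history("example/run-2")
print(history)
```

Because each version points at its predecessor by PID, a draft can be cited before publication and still be traced once a newer version supersedes it.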
PID Provider Plans
In this session we heard of plans from PID providers, such as DataCite (Jez Cope), ORCID (Tom Demeranville) and Crossref (Rachael Lammey).
DataCite was founded in 2009. The British Library is the consortium lead for the UK’s DataCite membership. In 2019 the British Library has 94 UK clients, including 71 HEIs, and 471k UK DOIs. They’re looking to simplify the membership model, make DOIs accessible to smaller institutions, and improve communication between DataCite, the British Library and data centres.
ORCID is working on connecting infrastructure, using ORCID IDs to create a graph of links. Demonstrating the impact of public investment in a facility requires IDs for these resources, and ORCID has been involved in discussions within and outside the RDA. Researchers can now pull metadata from PIDs when manually adding works to ORCID. As workflows may start with PIDs but end with a piece of paper, it’s important to bring our community with us and not get ahead of ourselves. Telling people to use the power of PIDs doesn’t always work. They’re hoping that the PID Graph will solve some of these issues.
Crossref is aiming to make research outputs easy to find, cite, link, assess and reuse. The communities they’re working with are growing a lot. It’s not just for publishers; they are also talking to funders to determine what they need from Crossref. The Open Funder Registry launched in 2012 and is a taxonomy of funding bodies, each with their own DOI. This has grown from 4,000 to 20,000 funders. However, only 3.6M content items (out of ~13M) have some funder information, and 2.6M have a funder DOI. They want to automate the publishing process using a global grant identifier, and the Grant Identifier Metadata Schema has been made available for public comment.
PID Use Cases
It’s important to show how PIDs are being used, and in the afternoon we moved on to a number of case studies from FREYA partners. The first was from Christine Ferguson, who told us that the EBI are indexing preprints in Europe PMC and extending the PID Graph, integrating other metadata and PIDs into these records. Vasily Bunakov, from the STFC, is looking at PhD research, extracting information from the British Library’s e-theses service EThOS and integrating it with information from the repositories of facilities supported by STFC. This enrichment and cross-repository harmonisation of records can be built as a knowledge graph, with as much use of PIDs as possible, to understand the outcomes of the use of the facilities. The STFC are also actively involved in the European Open Science Cloud (EOSC) project, and Juan Bicarregui gave us a personal view on what the EOSC is trying to achieve, what needs to happen to achieve this and how it will happen.
The breakout groups were asked to consider two questions: how do we use PIDs together, and where are the gaps? The following is a summary of the feedback from the groups.
We need to be clear about the problem we are trying to solve. Graphs enable us to see things that we had not seen before by visualising the information. Links need to be defined with a data structure that solutions can be built on top of. The power and value of the Graph need to be demonstrated.
The RDA can help influence other international organisations (Google, schema.org) and should engage with industry and business. It has the ability to get people together in a room to talk, which could happen with universities and funders. There are other PID-related events and groups (PIDapalooza, PIDforum.org, ISO meetings, etc). With so many groups it’s hard to get a handle on what the RDA is and what it does, but it’s good to have UK events to try and make it clearer. Having themes helps, but the RDA can be quite disorientating to navigate because there is so much going on. The RDA has done lots of useful work, but less on producing outputs directly relevant to researchers. Scholix has been useful in bringing publishers along en masse.
There is a proliferation of PID types and we must be careful not to silo them. The key to PID success is adoption. We need to dig down into workflows to see how adoption actually happens. A taxonomy of what is out there would be helpful. Advocacy is required so that people care about PIDs. There’s a need for use cases to show the benefit. A registry of PIDs and PID services would be useful, and there are unknown unknowns about what PIDs are useful for, so we need a “compare the market” for PIDs. Recommendations on what to do with PIDs, and when, would be useful. There are implementation challenges and costs, and it is unclear who is responsible. There is also a lack of incentive structures for researchers (who have to produce papers).
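As an illustration of the feedback that links need a data structure solutions can be built on, the sketch below models a PID graph as typed, directed links between identifiers. It is a toy example under stated assumptions: the `PidGraph` class, the relation names, and the identifiers are all hypothetical, and it does not reflect the FREYA PID Graph's actual schema or API.

```python
from typing import Dict, List, Optional, Set, Tuple


class PidGraph:
    """Directed graph whose nodes are PIDs and whose edges carry a relation type."""

    def __init__(self) -> None:
        # Maps a source PID to a list of (relation, target PID) pairs.
        self._edges: Dict[str, List[Tuple[str, str]]] = {}

    def link(self, source: str, relation: str, target: str) -> None:
        """Record a typed link from one PID to another."""
        self._edges.setdefault(source, []).append((relation, target))

    def neighbours(self, pid: str, relation: Optional[str] = None) -> List[str]:
        """Return direct targets of pid, optionally filtered by relation type."""
        return [
            target
            for rel, target in self._edges.get(pid, [])
            if relation is None or rel == relation
        ]

    def reachable(self, start: str) -> Set[str]:
        """Return every PID reachable from start by following links of any type."""
        seen: Set[str] = {start}
        stack: List[str] = [start]
        while stack:
            node = stack.pop()
            for _, target in self._edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        seen.discard(start)
        return seen


# Placeholder identifiers and relation names, chosen only for illustration.
graph = PidGraph()
graph.link("funder:example-funder", "funded", "doi:example-article")
graph.link("doi:example-article", "cites", "doi:example-dataset")
graph.link("orcid:example-researcher", "authored", "doi:example-article")

cited = graph.neighbours("doi:example-article", relation="cites")
funded_outputs = graph.reachable("funder:example-funder")
print(cited)
print(sorted(funded_outputs))
```

Following links transitively is what lets a question such as "which datasets trace back to this funder?" be answered even when no single record holds that connection.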
Update on RDA Europe
After the breakout sessions the day came to an end with a brief update on RDA Europe. This covered the existing and new nodes, the recent changes to the RDA website to make it easier to use, and upcoming funding calls from RDA Europe. The update finished with thanks to the Wellcome Trust and David Carr for supporting the event by hosting the workshop and helping with planning.
Updates from workshop participants
Participants had the chance to provide updates and news at the end of the day. Paul Walk gave a brief update on recent developments in the DCMI work on expressing PIDs in XML-based Dublin Core metadata.
I’d like to thank all the presenters for their time and effort in making the workshop a success, the 70 people who attended and participated in the breakout groups, and the Wellcome Trust for kindly hosting the event. Juan and I have already started planning the next workshop. This will be held in early 2020 and might be run in collaboration with OpenAIRE. We also plan to run at least one more webinar. If you would like to get involved, or have suggestions for content, please do get in touch.
For more information on the RDA UK node visit the UK node page. The slides are available in the RDA UK node’s file repository, the event page and Zenodo. Recordings of each presentation will be uploaded when they’re available.