Notes from July 2017 Virtual Meeting

24 Aug 2017

Notes from July 18, 2017 DFT Virtual Meeting

This was the second of our organizational virtual meetings held to prepare for P10 in September.

Attendees

1. Xin Mou
2. Rowena Davis
3. Keith Jeffries
4. Rebecca Koskela
5. Raphael Ritz
6. Gary Berg-Cross
7. Thomas Zastow

Objectives of this session included:
Work on the plan for presenting a stable version 1.0 of the vocabulary, including updates from P9, the Gothenburg chairs discussion (such as the DXWG/DCAT and mapping the Landscape presentations) and input from groups in the summer virtual meetings.
Exploration of areas of interest for adding concepts and definitions to the 200+ already in our term tool (TeD-T)

1. Metadata (MD) element discussion
Rebecca described plans for P10 such as joint meetings. Several “Metadata Principles” have been created and endorsed by all the RDA metadata groups.
In addition, progress on the MD element discussion can be followed online; see https://www.rd-alliance.org/groups/metadata-ig.html

At the June Chairs meeting in Gothenburg we heard about “Use rich superset canonical metadata covering existing metadata standards”, “Born-rich metadata”, “Rich multilingual semantics” and “using automated term language translation via ontologies with term relationships to allow super- and sub-terms (and other related terms)”.
Gary thought that all of these would benefit from discussion, followed by definitions and examples of what is meant by each.
Keith suggested that these ideas, along with some from DXWG, will lead first to richer syntax and later to semantics (such as role definitions for MD).

Rebecca was interested in the capability of discussing definitions in the term tool (TeD-T).
Raphael and Thomas walked the group through some of the functionality for this and invited people to register and submit definitions.
Rebecca noted an issue with one of the “data lifecycle” definitions that included the idea of “destruction” of data. Gary noted that this is the older definition from the WG effort and that the idea of destruction goes back to the destruction of paper records.

2. DXWG/DCAT topic discussed at the Chairs meeting

Keith provided some background on the DCAT model, which is used more in Europe (44 uses), and on the motivation to expand it. There seems to be good overlap between this effort and DFT, so we should look forward to future conversations that will add an RDA perspective to it.
Gary noted that the DCAT effort includes such things as classifying datasets by themes, which takes us into skos:ConceptScheme. It also has Catalog Record, Rights Statement, catalog license and distribution rights, which would expand the DFT vocabulary a bit.

3. Landscape mapping

Landscape mapping was discussed at the Chairs meeting as an effort to standardize machine-to-machine interfaces that can loosely couple data and software through agreed formats, interfaces, vocabularies and ontologies, preferably across multiple domains.
The intent of the IG is to “support continued synchronization of RDA conceptualization and enable better understanding within and between RDA groups. In addition it will provide updates on the term tool operation, functionality and use by groups.”

Among the goals of the mapping were:
Track Groups
Discover Gaps
Identify Overlaps and Possible Synergies
Acceleration of Outputs
On-Boarding Resource for those new to RDA, and a Reference for current members

Rowena provided some additional background and noted that mapping started at P8. There are indeed many different types of mapping; some are social, some geographical.

Mapping asks the question, “Can we get a comprehensive view?” The big challenge in current mapping is creating a map index with a useful legend.
See https://docs.google.com/spreadsheets/d/1MIpGocUf4AnJTGk06uVH0ZNKbf9OHmys... and
https://docs.google.com/spreadsheets/d/1SgsQHw1ZQ07rC7sO1HHJ2MObV6i3wy-M...

The group has the beginnings of a vocabulary mapper and is trying to create an ontology mapper. The question was asked whether there are any ontology groups in RDA. Gary noted the presence of ontologies and their discussion in several groups, but no overall group in charge. He discussed some of the history of RDA discussion on ontology and of efforts to start a group. His opinion is that it is partly a matter of making sure the right resources are committed to it across the various parts of RDA. He would be interested in being part of such an effort and has sponsored 2 BoFs at Plenaries to discuss the use of ontologies for domain vocabularies. Follow-up since P8 has included Ontolog-sponsored sessions on Domain Vocabularies, with a Session Overview by Ontolog Board Member Gary Berg-Cross. See
http://ontologforum.org/index.php/DomainVocabularies
Presenters at the first session included:
Mark Fox (University of Toronto): An Upper Level Ontology for Global City Indicators
Torsten Hahmann (University of Maine): Domain Reference Ontologies vs. Domain Ontologies: What's the Difference? Lessons from the Water Domain
Boyan Brodaric (Research Scientist at Natural Resources Canada): What's a river? A foundational approach to a domain reference ontology for water
These may provide some basis for a joint session at P10 on global water and biodiversity.
For an Ontological Engineering & Development 101 briefing developed for the 2nd BoF see https://www.rd-alliance.org/sites/default/files/attachment/Ontology%20En...

Another follow-up was the 2016 VoCamp organized by Gary at the U of MD. Topics included several related to RDA group work:
An RDF vocabulary for Chemical Safety & Chemical Terminology (Leah McEwen)
A pattern to support Materials Research vocabularies (Kimberly Tryka & Alden Dima)
Topography - basic terrain primitives, slope, length, shape, curvature (USGS topic)

In addition, Gary and Rebecca organized a 2015 RDA Outreach effort on improved semantics for metadata. Participants included Michel Dumontier, who discussed efforts in healthcare. See https://www.rd-alliance.org/rda-metadata-semantics-workshop-indianapolis... for the presentations, outputs, and breakout group summaries from the Metadata & Semantics Workshop held in Indianapolis, Indiana, February 23-25, 2015.

Another part of landscape mapping is identifying best practices.
Rowena mentioned that the group is using OIL-E (Open Information Linking for Environmental RIs). For some information see https://confluence.egi.eu/display/EC/OIL-E.

For P10 the ML Interest Group has expressed interest in vocabulary mapping from health groups. Known work includes Mark Musen's work on BioPortal, and there is interest from the neuroscience community.
Gary has been in touch with Michel Dumontier (part of FAIR) about attending P10, which would allow more discussion, but Michel is not yet sure that he will be available.

Xin Mou talked about his RDA US Fellowship project, “Building a catalogue for data standards among scientific disciplines”.
He is currently collecting information on data standards across scientific disciplines, and the intent is to build a catalogue.

4. Gary mentioned a BoF planned for P10. It was initiated by the IRIDIUM data management vocabulary effort, which David Baker discussed in our last meeting.
See: https://www.rd-alliance.org/research-data-management-rdm-vocabularies-rd...

Part of this is the development of a White Paper reviewing data management vocabularies. Gary is participating and has drafted a template and some reviews of extant work. One of these covers the revised version of ISO 5127:2017, the standard vocabulary for information and documentation.

See https://www.iso.org/standard/59743.html and, for browsing:
https://www.iso.org/obp/ui

One approach to this White Paper is to recruit help from people who did the original work, such as on NIH glossaries (e.g. Analysis Data Model (ADaM), CDASH terminology, a subset of SDTM terminology...). Gary hoped that we can recruit such help.

5. Tool update
Thomas had good news regarding adoption from the data collections effort.
They may implement the collection API in Fedora Commons to present RDA work in a practical manner. We also expect some follow-up to discussions on Data Collections and on minting PIDs for vocabulary items.
See https://www.rd-alliance.org/groups/research-data-collections-wg.html

6. Miscellaneous updates
Gary noted that the RDA Provenance profiles WG has use cases related to landscape mapping, such as “find all of the datasets made by a person and further datasets descending from those.”
That work on profiles may go deeper into models of data models and so yield concepts for DFT standardization.

Vocabulary items coming out of the Data Description Registry Interoperability WG may also be entered into TeD-T soon. They have delivered a report that includes key ideas such as a "KnownAs" relationship and a Co-Authorship concept. (Our meeting time is not convenient for Amir Aryani in Australia, but he may provide some examples of concepts.)
The Trusted Repositories group may have something for P10.

The next meeting was set for the same time on Wednesday, August 30th.