Final Notes from August 30, 2017 DFT Virtual Meeting

06 Sep 2017

These are the notes from the August 30, 2017 DFT Virtual Meeting, 4:00 PM - 5:30 PM CEST.
This was our 3rd and final organizational Virtual Meeting before P10 in September.

Participants:

1. Nicholas Car
2. Xin Mou
3. Raphael Ritz
4. Gary Berg-Cross
5. Thomas Zastow
6. Amir Aryani (by Skype on August 5th)
1. Vocabulary Update: Gary noted small updates to our vocabulary coming out of the MIG drafts on MD elements. One example is facility and equipment. It is worth noting that more precise definitions of terms like “facility” or “equipment” are coming from other work, including the Internet of Things.
Metadata (MD) elements may be a major agenda item to discuss at P10. Elements from the MIG site include:

2. Metadata Element Set (provides some input for DFT vocabulary):
Below is the MIG's draft metadata element set. The “comments” or notes from the P9 session in Barcelona are linked for each element.
Unique Identifier (for later use including citation) {http://bit.ly/2ryRr12}
Location (URL) {http://bit.ly/2rujALv}
Description {http://bit.ly/2ss2CwH}
Keywords (terms) {http://bit.ly/2se44QX}
Temporal coordinates {http://bit.ly/2sdVKAR}
Spatial coordinates {http://bit.ly/2ru6kGt}
Originator (organisation(s) / person(s)) {http://bit.ly/2ruFCgZ}
Project {http://bit.ly/2rukIid}
Facility / equipment {http://bit.ly/2sdEj3h}
Quality {http://bit.ly/2svs0Cc}
Availability (licence, persistence) {http://bit.ly/2t56LEy}
Provenance {http://bit.ly/2se59Z1}
Citations {http://bit.ly/2se9efQ}
Related publications (white or grey) {http://bit.ly/2rjHFR5}
Related software {http://bit.ly/2rutPzn}
Schema {http://bit.ly/2srMUl3}
Medium / format {http://bit.ly/2svtEEe}
3. There was no update on the DXWG/DCAT topic, which came up at the Chairs meeting, as none of the reps from that effort were able to make the call.
4. Research Data Provenance (RDP): Nicholas Car provided a rich overview of the provenance profiles.
This effort is more about the process of how to relate things to other things. PROV-O is the starting point and is very general, with 3 basic concepts. It is a generalization of several prior, more detailed provenance models. PROV-O is light but can be specialized a bit, say for a particular science where you need some special things. Nick has done this for the topic of decision making.

The group is just starting but some of the members have worked on use cases.
Provenance solution patterns (for any provenance task such as representation, transmission, use, etc.) involve flowing values through a Prov graph, which shows how an artifact was derived.
RDP would encourage a derived view that includes parent-child relations. The profile property that has been introduced makes sense.
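
The idea of following derivation links through a provenance graph can be sketched as follows. This is only a minimal illustration, not RDP's own method. The three core classes (`prov:Entity`, `prov:Activity`, `prov:Agent`) and the property names `prov:wasDerivedFrom`, `prov:wasGeneratedBy`, and `prov:wasAssociatedWith` come from PROV-O; the `ex:` identifiers and the `derivation_chain` helper are hypothetical names made up for this example.

```python
# Provenance statements as (subject, predicate, object) triples.
# All ex: names are invented for illustration only.
TRIPLES = [
    ("ex:raw_data", "rdf:type", "prov:Entity"),
    ("ex:cleaned_data", "rdf:type", "prov:Entity"),
    ("ex:figure", "rdf:type", "prov:Entity"),
    ("ex:cleaning", "rdf:type", "prov:Activity"),
    ("ex:curator", "rdf:type", "prov:Agent"),
    ("ex:cleaned_data", "prov:wasDerivedFrom", "ex:raw_data"),
    ("ex:figure", "prov:wasDerivedFrom", "ex:cleaned_data"),
    ("ex:cleaned_data", "prov:wasGeneratedBy", "ex:cleaning"),
    ("ex:cleaning", "prov:wasAssociatedWith", "ex:curator"),
]


def derivation_chain(artifact, triples):
    """Return every entity the artifact was derived from, nearest first."""
    # Index child -> list of direct parents.
    parents = {}
    for subject, predicate, obj in triples:
        if predicate == "prov:wasDerivedFrom":
            parents.setdefault(subject, []).append(obj)

    # Breadth-first walk up the parent-child links.
    chain, queue, seen = [], [artifact], {artifact}
    while queue:
        current = queue.pop(0)
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                chain.append(parent)
                queue.append(parent)
    return chain


print(derivation_chain("ex:figure", TRIPLES))
```

Storing the statements as plain triples keeps the sketch close to how a Prov graph is usually serialized, while the walk itself only needs the parent-child links.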

Gary noted that Prov pattern and Prov graph are terms that might be defined for DFT to help others understand this work. It would be nice to have a link to some of this work in our definitions as they proceed.

Example: getting credit for a data item in a digital collection. As part of this process there are curator roles, which fit the PROV agent idea. Such constructs move away from an overly direct object relation; the role becomes an intermediate idea.
Raphael mentioned that the neuroimaging community has been thinking along these lines and may be more advanced, with some extensions.
http://nidm.nidash.org/specs/nidm-overview.html may be an example of what Nick mentioned.
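
The curator-role example above, where a role sits between an agent and what it worked on, can be sketched with PROV-O's qualified association pattern. The terms `prov:qualifiedAssociation`, `prov:Association`, `prov:agent`, and `prov:hadRole` are from PROV-O; the `ex:` identifiers and the `roles_in` helper are hypothetical and only illustrate the shape of the data.

```python
# Direct relation: says who was involved, but not in which role.
direct = [("ex:curation", "prov:wasAssociatedWith", "ex:alice")]

# Qualified relation: an intermediate Association node carries the role.
qualified = [
    ("ex:curation", "prov:qualifiedAssociation", "ex:assoc1"),
    ("ex:assoc1", "rdf:type", "prov:Association"),
    ("ex:assoc1", "prov:agent", "ex:alice"),
    ("ex:assoc1", "prov:hadRole", "ex:curator"),
]


def roles_in(activity, triples):
    """Map each agent to the role it played in the given activity."""
    # Collect the properties of every node for easy lookup.
    facts = {}
    for subject, predicate, obj in triples:
        facts.setdefault(subject, {})[predicate] = obj

    result = {}
    for subject, predicate, obj in triples:
        if subject == activity and predicate == "prov:qualifiedAssociation":
            node = facts.get(obj, {})
            if "prov:agent" in node:
                result[node["prov:agent"]] = node.get("prov:hadRole")
    return result


print(roles_in("ex:curation", qualified))
```

With only the direct relation, `roles_in` finds nothing to report, which is the point of introducing the intermediate node.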

Using PROV-O may allow another domain to understand some general things about a domain like Neuro since a high. Publishing Data Workflows WG is one example.

Research Data Provenance IG has joint meetings at P10 with other groups (e.g., Joint meeting: IG Metadata, WG Metadata Standards Catalog, IG Data in Context, IG Research Data Provenance).
A meeting with legal is another example, but that work is just starting now. In the near future some groups will be contacted to discuss attribution and flow.

Collections discussion (many discussions and several groups): ISO standard 19115 for Geo defines collection objects as sets.
They have a lineage idea with processing steps, for, say, an image.
But such processes are often better handled in another way. You may ask, “What was done in common? What algorithm was used for all these images?”
You can express an answer to these as a knowledge graph. PROV has suggestions on how to do this.
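
As a rough sketch of the question "What algorithm was used for all these images?", the lineage can be held as a small graph and queried for what the members of a collection have in common. This is an assumption-laden illustration, not the approach PROV or the ISO standard prescribes: the `ex:` identifiers, the two lookup tables, and the `common_algorithms` helper are all hypothetical.

```python
# Hypothetical lineage records: each image points at the processing step
# that produced it, and each step names the algorithms it applied.
GENERATED_BY = {
    "ex:image1": "ex:step1",
    "ex:image2": "ex:step2",
    "ex:image3": "ex:step3",
}
ALGORITHMS_USED = {
    "ex:step1": {"ex:cloud_mask", "ex:resample"},
    "ex:step2": {"ex:cloud_mask"},
    "ex:step3": {"ex:cloud_mask", "ex:sharpen"},
}


def common_algorithms(images):
    """Return the algorithms that were applied to every image given."""
    per_image = [
        ALGORITHMS_USED.get(GENERATED_BY.get(image), set())
        for image in images
    ]
    # An empty collection has no algorithms in common.
    return set.intersection(*per_image) if per_image else set()


print(common_algorithms(["ex:image1", "ex:image2", "ex:image3"]))
```

The answer is stated once for the collection rather than repeated in the lineage of each image.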

Raphael mentioned Thomas' Research Data Collections work on this. Thomas clarified that this work is just about the collection itself, not its semantics.

Maybe there are 2 different stories here. As part of the Research Data Collections WG there is API work, which doesn't cover why people are collecting things. Nick saw no conflict between the 2 efforts. There is value if there are causal relations between collections. We can say, “These data items came from these elements for this reason.”

Thomas also noted in passing that the collections group may be coming to an end. There may be some follow up on the Spec. documents, but more than that is not clear.

5. Follow up to discussions of minting PIDs for vocabulary items and versioning. Thomas could add IDs to each term and to a version. He asked if he should create a versioned DFT vocabulary for P10. The consensus was that we should have versioning of DFT.
Based on what happens at P10 and afterwards, we might update to 1.5 between P10 and P11.
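
One way to picture IDs on both terms and versions is sketched below. It is purely illustrative and does not describe how the term tool actually stores or mints identifiers: the `TermVersion` class, the `dft:0001` identifier format, the version `1.0`, and the placeholder labels and definitions are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermVersion:
    term_id: str     # stable ID of the term (format is hypothetical)
    version: str     # vocabulary version this definition belongs to
    label: str
    definition: str

    @property
    def versioned_id(self):
        """ID that pins the term to one vocabulary version."""
        return f"{self.term_id}/v{self.version}"


def latest(entries, term_id):
    """Return the entry for term_id with the highest version, or None."""
    matches = [entry for entry in entries if entry.term_id == term_id]
    return max(
        matches,
        key=lambda entry: tuple(int(part) for part in entry.version.split(".")),
        default=None,
    )


# Placeholder content only; these are not real DFT definitions.
entries = [
    TermVersion("dft:0001", "1.0", "example term", "placeholder definition A"),
    TermVersion("dft:0001", "1.5", "example term", "placeholder definition B"),
]

print(latest(entries, "dft:0001").versioned_id)
```

Keeping the term ID stable and appending the version lets a citation point either at the term in general or at one specific wording of its definition.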

6. Vocabulary services: DFT may get involved again as this group becomes more active. Gary provided some history of the DFT to VSIG interaction, including the idea of testing SKOS for integrating vocabularies.
It is another thing that Nick is interested in.

7. Data management vocabulary BoF
We still plan a BoF on Data Vocabulary efforts at P10. Gary has developed a template for analyzing these vocabularies and drafted 5-6 analyses of candidate data vocabularies to discuss. One is on DFT itself and another is on the ISO 5127 data management vocabulary. Gary asked some of the ISO members of that group to help refine his draft analysis. The other organizers of the BoF have yet to submit their reviews, and the final status of the planned White Paper is uncertain, although Gary is prepared to brief DFT on his analysis.
8. Vocabulary items from the Data Description Registry Interoperability WG.
Gary had a Skype meeting with Amir Aryani on Sept. 6th, and he provided some examples of concepts for discussion at P10.
We started looking at the Term tool, and Amir found terms of interest like data registry and data infrastructure and data citation.
He noted there are things we can do and should do based on what is there. Certainly more on graphs would be of interest.
There are several things to note following the Switchboard project on connecting information. One thing that came out of it was a research graph model that links data across sites using the Switchboard. This was cross-walked with other people.
We now have 20 or so repositories, with 50 million or so objects. See Figure and https://github.com/researchgraph.
There is also a public node that is available for new partners.
Amir may be able to provide a list of missing terms that are of interest to them. For example, Grant.
They would like to link to our terms and they can be part of their graph.
They would be interested in developing a graph for controlled vocabularies.
One thing to discuss is what the relations between terms are. Archived data can be in a repository and related to that or something else.
DFT may try to develop formal relations or a taxonomy to organize terms; this is of common interest and could be used in their graph. This might be something to work on together as a "pilot project" between P10 and P11. The goal would be to develop definitions to explain what objects we have in the graph.
Since P10 is happening soon, we can talk a little about this there with interested parties, but pick this up and define a project after P10.
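
To make the idea of relations between terms concrete, here is a small sketch of vocabulary terms stored as a graph of typed links. The terms are ones mentioned in these notes, but the relation names (`storedIn`, `partOf`, `refersTo`), the specific links, and the `neighbours` helper are invented placeholders, not an agreed DFT or Research Graph schema.

```python
# Typed relations between vocabulary terms, stored as labeled edges.
# Relation names and links are placeholders for discussion only.
RELATIONS = [
    ("archived data", "storedIn", "repository"),
    ("repository", "partOf", "data infrastructure"),
    ("data registry", "partOf", "data infrastructure"),
    ("data citation", "refersTo", "archived data"),
]


def neighbours(term, relations):
    """List the typed links going out of and coming into a term."""
    outgoing = [(rel, target) for source, rel, target in relations if source == term]
    incoming = [(rel, source) for source, rel, target in relations if target == term]
    return {"outgoing": outgoing, "incoming": incoming}


print(neighbours("archived data", RELATIONS))
```

A pilot project could start from a table like this and replace the placeholder relation names with ones the groups agree on.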

Btw, Nick was aware of this work and suggested that Amir is talking about a chain of authorship expressed as a graph. This could align with LOD and ontology work, and PROV-O is something to leverage, since a profile could look at such chains in a general way.
9. Follow-on domain vocabulary work. Gary noted that there will be a joint session at P10 of IG Global Water Information, IG Data Foundations and Terminology, and IG Biodiversity Data Integration.
This will address some common interests and best practices in developing quality domain vocabularies.

Other topics
Nick noted that he was having trouble accessing the TeD-T site. He is on a Mac and may need to try a different browser. But it may be a more general security certificate/verification problem with the server.
Thomas and Raphael noted that some others have reported this problem and they will look into updating the certificate chain.