3rd Plenary - Data in Context IG - Notes

by Alessia Bardi (CNR), Early Career Researchers Programme recipient

 

Thursday, March 27th 15:30-17:00
 
Introduction by Brigitte Joerg
- history / 1st group proposal at 1st plenary
- revised use cases
   - findability
   - funding  
   - provenance
   - interoperability
 
Current Life Cycle Models for Research Data
--> approaches from different directions
 
Models:
- DCC: Curation Lifecycle
- DDI: Data Documentation Initiative 
- Data Asset Framework
- Jisc Research Lifecycle Model
- Tony Hey's Model (presented during RDA keynote)
- Research Objects approach
- Knowledge Networks (liberalised metadata)
 
Collaboration/Overlap with other IG/WG
Use Case Template
Standards for the Representation of the Models
 
---------------------------------------------------------------------------------------------------------------------------
Contributions from WG/IG 
---------------------------------------------------------------------------------------------------------------------------
 
RDA Data Publishing / Workflows: DCC Data Profiles (Angus Whyte)
- four WGs under one umbrella
- Workflows group: for data deposition
- Assigning PIDs, Adding Metadata
Work ongoing with survey involving different stakeholders.
 
DCC Data profiles
- Used for data management requirements
- Less standardised now
 
Data assets framework
- used successfully at institutional level
- re-engineer and create a question bank and identify requirements
 
Long-tail Data IG, Data Publishing IG (Jochen Schirrwagen) 
Long Tail IG
- small research institutions fight for Big Data
- data not well documented and accessible (no time for researcher)
- problems with storage (no data archive available for particular kind of data)
Goal: collect best practices for data management at institutional level
- create examples by collecting datasets and surveying the landscape of practices in the long tail
- what kind of policy is in place?
- who creates metadata?
- how is metadata accessible from the data centers? 
- which are the licenses?
How is work related to DiC IG?
Data archives document data which is preserved.
How can researchers have impact on this in their daily life?
 
Publishing Data IG
- best practices for publishing data
- traditional scholarly communication needs to be changed with new approaches to ensure availability of data
- Services WG: demo for references services
--> link literature to data via data centres and academic libraries
-> how to ensure quality of research data?
 
WDS Knowledge Network activity (Wim Hugo) 
-> quality assurance of data
-> 75 accredited members (data providers mainly)
-> how do we add value to these collections we have available?
-> created knowledge network metadata WG --> conceptualisation
-> Liberalised metadata: network of relationships between entities described by metadata (authors, institutions, projects, etc.)
-> relationships are explicit and are found by mining the metadata
-> stored in triple stores
-> probably many systems interoperate with each other likewise
-> mining the metadata is not scalable without new ways of managing the network
-> it should be easy to update the network through different sources of information
-> topic coverage seems to be of great interest to the funders
-> standards are missing for the knowledge network
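
The idea of making relationships explicit by mining metadata can be illustrated with a minimal Python sketch. It is not the WDS implementation: the record fields, identifiers and predicate names (hasAuthor, heldBy, outputOf) are assumptions made for illustration, and a plain set of tuples stands in for a triple store.

```python
# Hypothetical metadata records; the field names and identifiers are illustrative only.
RECORDS = [
    {"id": "dataset:1", "authors": ["person:a"],
     "institution": "inst:x", "project": "proj:p"},
    {"id": "dataset:2", "authors": ["person:a", "person:b"],
     "institution": "inst:y", "project": "proj:p"},
]


def mine_triples(records):
    """Turn relationships implied by metadata records into explicit triples."""
    triples = set()
    for rec in records:
        for author in rec.get("authors", []):
            triples.add((rec["id"], "hasAuthor", author))
        if "institution" in rec:
            triples.add((rec["id"], "heldBy", rec["institution"]))
        if "project" in rec:
            triples.add((rec["id"], "outputOf", rec["project"]))
    return triples


def neighbours(triples, entity):
    """Return the entities directly linked to `entity`, in either direction."""
    linked = set()
    for subject, _predicate, obj in triples:
        if subject == entity:
            linked.add(obj)
        elif obj == entity:
            linked.add(subject)
    return linked


if __name__ == "__main__":
    network = mine_triples(RECORDS)
    # Datasets connected to one author across both records.
    print(sorted(neighbours(network, "person:a")))
```

Because every record is re-read to rebuild the set, this sketch also hints at the scalability concern noted above: updating the network from many sources needs something more incremental than a full re-mining pass.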
 
Research Objects (Brian Matthews)
-> what do we want to capture of an experiment?
-> it is the whole life cycle from the proposal to the publication
-> does not want to publish data but the experiment/investigation
-> based on what one wants, one needs different context: to share, to reproduce, to preserve
-> ideas taken up from ResearchObject.org: uses PROV, Open Annotation (Uni Manchester) 
Goal: use RDF annotation with free text, generation of packages for preservation purposes
 
Data Description Registry and Interoperability WG (Amir Aryani)
-> problem of metadata exchange, deduplication of records, persons, grants with wrong ids
-> interoperability / engagement
-> don't want to solve, but improve use of existing solutions for problem solving and start collaboration to address the problems at scale to achieve better quality
-> getting hundreds of datasets as a result is not very interesting for researchers
-> the context of datasets is interesting, as it can be used to drill down to the relevant results.
 
Reference Model BoF (Yin Chen)
-> reference model: standard for description of data computation of research infrastructures
-> help the community reach a common vision / common language
-> uniform framework to compare research infrastructures
-> common solution to common problems
-> enable reuse of resources
-> identification of a high level lifecycle for data
-> acquisition, curation/preservation, access, processing, community support
-> Open Distributed Processing for design specifications (standard) based on viewpoints.
 
---------------------------------------------------------------------------------------------------------------------------
DISCUSSION
---------------------------------------------------------------------------------------------------------------------------
 
What is a Data profile?
- interface to define specific forms of data towards interoperation
- might be connected to the new Linked Data/RDF shapes WG at W3C
 
Are we considering context for all steps of the life cycle?
- idea is to get an overall idea of all models - these may complement each other
- and check where the intersections are and understand commonalities
- e.g. it is interesting that also Practical Policy WG refers to a lifecycle
 
DDI is conceptually massive; we should be pragmatic.
- be careful about reusing use cases from other WGs
- they might not be very pragmatic and useful for us
Agreed: We do not want to duplicate work.
 
Who is the community to use the outcome of this group?
- We are the community!
 
 
Friday, March 28th 11:00-12:30
 
Recap by Brigitte Joerg
 
Remarks from Audience
 
- Research data provenance is part of the Context
-> needs cooperation with Provenance WG
 
- What about OAIS lifecycle model? 
-> It is very abstract but we are aware; will be added to the list for investigation.
 
Semantic Interoperability (Gary Berg-Cross)
Data Foundation and Terminology WG
- Metadata is a Data Object (metadata can have metadata)
- Types of data have been defined (definitions available on mediawiki)
Use cases
- Research Objects
- EUDAT cases
 
Semantics is a feature at the application layer of OSI
- Socio tech aspects of Semantic Interoperability
- why are you developing a semantic model?
- what are the questions you want to be answered?
- Semantic Interoperability is the technical analogue of human communication, hence hard
- EarthCube ontology manifesto for integration of earth science
- if we add semantic annotation to metadata we can do horizontal integration and not only vertical integration
- an ontology must include only the parts needed for a sufficient reasoning: do not overcomplicate things
- an entity can be seen according to different perspectives based on what you have in mind
e.g. Grafton street now is different from Grafton street in 1933
--> guide the visualization of ontological entities based on settings
Questions:
1. Is this modelling the terms or the description of the terms?
    -> it is not label oriented, but concept oriented
2. Context as a way to disambiguate?
    -> the settings in this case work as the context
3. Multilingual support?
    -> yes, important, and not easy: term translations are not enough.
    -> W3C groups working on multilinguality
 
Metadata WG/IG (Rebecca Koskela, Keith Jeffery)
Metadata Standards Directory WG
- aims at machine understandable catalogue 
- metadata groups are related
- reference architecture coming by March 2015
 
---------------------------------------------------------------------------------------------------------------------------
DISCUSSION
---------------------------------------------------------------------------------------------------------------------------
 
Is the life cycle approach correct?
- Terminology WG is looking at models and creates definitions 
- Commonalities with practical policies: What metadata to create at which steps?
-> First get an overview of what is going on and from there develop a work plan towards setting up a WG.
 
What do you want to do by analysing life cycles?
- How to define object boundaries
- Context can be infinite
- Needs boundaries through models and use cases
- Needs a better explanation of what case templates / profiles are and how they relate to each other
- Case template is to describe a use case for inclusion of properties 
- Needs assignment to the life cycle reflected from within the use case under analysis
-> A profile contains the metadata applicable to an object. Profiles are built based on use cases. Given a use case, one builds a profile that tells which parts of a standard to apply to accomplish the use case.
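
The relationship between a use case, a profile and a standard can be sketched in Python as follows. This is only an illustration under stated assumptions: the element names, the two example profiles and the helper functions are hypothetical and were not presented in the session.

```python
# Hypothetical element set standing in for a metadata standard.
STANDARD_ELEMENTS = {
    "title", "creator", "identifier", "funder", "license", "provenance",
}

# Hypothetical profiles: for each use case, the parts of the standard to apply.
PROFILES = {
    "findability": {"title", "creator", "identifier"},
    "funding": {"identifier", "funder"},
}


def build_profile(use_case):
    """Return the elements of the standard selected for a use case."""
    elements = PROFILES[use_case]
    unknown = elements - STANDARD_ELEMENTS
    if unknown:
        # A profile may only select elements that the standard defines.
        raise ValueError(f"not part of the standard: {sorted(unknown)}")
    return elements


def missing_elements(record, use_case):
    """List the profile elements that a metadata record does not yet provide."""
    return sorted(e for e in build_profile(use_case) if not record.get(e))


if __name__ == "__main__":
    record = {"title": "Example dataset", "identifier": "id:1"}
    for case in PROFILES:
        print(case, missing_elements(record, case))
```

The same record is checked against different profiles, which reflects the point that the metadata needed depends on the use case rather than on the standard as a whole.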
 
Profiles need to be different also at the domain level
- metadata are added by people, when addressed by users?
- it is also a practical policy issue, and should be considered in the model
- if profiles are good, people will create tools to work with profiles
- enhancement of metadata is a continuous process
- CRISs are a use case, DSpace is a use case - they require different methodologies
 
- Max Planck created a tool which might be of interest
- Naresh is working on a similar tool for his master's thesis
 
Peter Fox (TAB) informs about RDA tools
- use the mailing list 
- use the wiki
- let us know if there are functionalities missing
- let TAB know about WG/IG collaborations
- are there video conferencing tools provided by RDA?
-> yes, but only for less than 10 people