Data Foundation and Terminology: Data Models

23 Jul 2013

Greetings all,

 

I hope that summer is going well and look forward to seeing you all at the 2nd RDA Plenary. The DFT WG is progressing and we will very shortly have a Models paper, led by Peter Wittenburg, out for review.

 

At the Gothenburg meeting the DFT WG discussed soliciting input on terms from other WGs, and at this point this input would be very useful. Our next step will involve analysis of models and terms to bring some things into harmony. We would therefore like to ask if you have some candidate terms/concepts from your preliminary work that you would like us to include in our work.

 

Thoughts and input can be sent to me and Peter W and, of course, posted on the RDA site as you choose.  Discussion is likely to take place there over the next month or so in preparation for the Plenary.

 

Best wishes and thanks in anticipation

 

Gary Berg-Cross, Ph.D.  

 

The file referenced by Gary is now online in the filedepot; you can get it via the link below.

Please discuss

https://www.rd-alliance.org/index.php?q=filedepot_download/385/131

 
  • Author: Reagan Moore

    Date: 23 Jul, 2013

    Gary:
    I read through the data model descriptions, which primarily focused on identification and access methods for digital objects.

    I am interested in a generalization of the data model that is applied within the iRODS data grid.  The challenge is that the environment that is used to manage the digital object is equally important.  We had to consider name spaces for both the environment in which the digital object is managed as well as a name space for the digital object.  The trivial case is the need for a name space for users, which is required if access controls are going to be enforced.

    The generalized model is to consider:

    • Name space (logical construct for registering entities)
    • Entities represented by the name space
    • Properties that are associated with the entities
    • Policies that control the enforcement of the desired properties
    • Operations that are performed upon the entities

    This made it possible to develop name spaces for digital objects, collections, users, storage systems, and policies.  

    We could then impose:

    • Access controls: policies managing relationships between the name space for digital objects and the name space for users
    • Data distribution: policies managing relationships between the name space for digital objects and the name space for storage
    • Data replication
    • Time-dependent access controls
    • Registration of workflows as digital objects
    • etc.
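
One possible reading of the generalized model above, as a minimal Python sketch. All class, function, and field names here (NameSpace, Entity, access_control, the "acl" property) are illustrative assumptions and are not taken from iRODS; the point is only to show name spaces holding entities with properties, and a policy that relates two name spaces to gate an operation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Entity:
    """An entity registered in a name space, with its properties."""
    name: str
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class NameSpace:
    """Logical construct for registering entities of one kind."""
    kind: str
    entities: Dict[str, Entity] = field(default_factory=dict)

    def register(self, name: str, **properties) -> Entity:
        entity = Entity(name, dict(properties))
        self.entities[name] = entity
        return entity


# A policy relates an entity from the user name space to an entity from
# the digital-object name space and decides whether an operation may run.
Policy = Callable[[Entity, Entity, str], bool]


def access_control(user: Entity, obj: Entity, operation: str) -> bool:
    """Hypothetical policy: allow the operation if the object's ACL grants it."""
    acl = obj.properties.get("acl", {})
    return operation in acl.get(user.name, set())


def perform(operation: str, user: Entity, obj: Entity,
            policies: List[Policy]) -> bool:
    """Report whether every policy allows the operation."""
    return all(policy(user, obj, operation) for policy in policies)


# Two name spaces: one for users, one for digital objects.
users = NameSpace("user")
objects = NameSpace("digital_object")

alice = users.register("alice")
bob = users.register("bob")
report = objects.register(
    "report.csv",
    acl={"alice": {"read", "write"}, "bob": {"read"}},
)

can_alice_write = perform("write", alice, report, [access_control])
can_bob_write = perform("write", bob, report, [access_control])
```

In this sketch, data distribution or replication would be further policies relating the digital-object name space to a storage name space in the same way.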

    Reagan Moore

  • Author: Chris Morris

    Date: 26 Jul, 2013

    When a paper is retracted, it would be great if some appropriate markup were supplied to future lookups of the papers that cited it.

    This simple use case is a reminder that the life cycle of research data is a little richer than that of data in general. Some extra information is needed to support the management of good research practice, and the appropriate steps when misconduct is suspected:

    - links from results to researchers' declaration of interests

    - a link to details of the funding source (statistics show that a study of the efficacy of a drug has different significance if publicly funded than if funded by the manufacturer)

    - if there were human subjects, links to the consent given and the report of the ethical review

    - a link to any retraction that applies
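
The links listed above, and the retraction use case, could be sketched roughly as follows. This is an illustration only: the record fields, the lookup function, and the identifiers "paper-A" and "paper-B" are hypothetical and do not come from BioMedBridges or any existing ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResearchRecord:
    """A research result with the extra links suggested above."""
    identifier: str
    cites: List[str] = field(default_factory=list)
    declaration_of_interests: Optional[str] = None
    funding_source: Optional[str] = None
    consent: Optional[str] = None          # only if there were human subjects
    ethical_review: Optional[str] = None   # only if there were human subjects
    retraction: Optional[str] = None       # link to a retraction, if any


def lookup(identifier: str, records: Dict[str, ResearchRecord]) -> dict:
    """Return a record together with retraction markup for it and its citations."""
    record = records[identifier]
    cites_retracted = [
        cited for cited in record.cites
        if cited in records and records[cited].retraction is not None
    ]
    return {
        "record": record,
        "retracted": record.retraction is not None,
        "cites_retracted": cites_retracted,
    }


# Hypothetical data: paper-B cites paper-A, and paper-A is later retracted.
records = {
    "paper-A": ResearchRecord("paper-A", funding_source="funder-1"),
    "paper-B": ResearchRecord("paper-B", cites=["paper-A"]),
}
records["paper-A"].retraction = "retraction-notice-A"

result = lookup("paper-B", records)
```

Because the markup is computed at lookup time, a retraction recorded later is visible on every subsequent lookup of the citing papers without editing those papers' own records.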

     

    BioMedBridges is discussing these questions, but we are some way from a proposed ontology.

     

  • Author: Gary Berg-Cross

    Date: 29 Jul, 2013

    Thank you for this comment, which does extend some of the thinking on documentation and markup/annotation of research data.  My first thought is that the additional "markup" would be an extension of a metadata standard such as the Dublin Core, adding the items you cite such as "declaration of interests."

    This is a topic for that WG to consider and one of us should cross-post it to them, and we can raise it at the 2nd Plenary when cross-group discussions take place.

    Within a standard template like the Dublin Core there is a very general domain category.  Different domains, such as BioMed, may have a need for some specific documentation, per your example. Thus the standard template may need to be further extended in some systematic way for different research fields.

     

    Gary Berg-Cross

    SOCoP

  • Author: Peter Wittenburg

    Date: 25 Aug, 2013

    Over the last few weeks we have worked on an analysis of the models that have been presented so far. We have not yet managed to include Reagan's model ideas; doing so will be the next step in revising DM1 and DM2.

    We are working on the following documents and whenever they are published via this forum, people are welcome to respond.

    Data Models 1: Overview                     uploaded earlier - version 1
    Data Models 2: Analysis (of data models)    now uploaded - version 0.2
    Data Models 3: Analysis of Workflows        to come
    Data Models 4: Synthesis                    to come - will need thorough and open discussion
    Data Models 5: Terminology                  to come

    Everyone interested is invited to comment on the DM 2 document.

    We are planning to discuss the documents

    - at the upcoming DFT virtual session on 27.9.2013 at 4 pm CET

    - at the plenary DFT session

     

    You can find all uploaded DFT documents at the following link:

    https://rd-alliance.org/filedepot/folder/100

     

    best regards

    Peter

     

     

     

  • Author: Gary Berg-Cross

    Date: 27 Aug, 2013

    You can find the latest DFT WG document on Model Analysis in the File Repository.

    The exact link is:

    https://www.rd-alliance.org/filedepot?fid=163

  • Author: Gary Berg-Cross

    Date: 06 Sep, 2013

    We have had some interesting email exchanges about PIDs and related topics.  If interested, you should be able to read these in the DFT email archive at:

     

    http://lists.lists.rd-alliance.org/pipermail/rda-cwg-terminology/

     

    Gary Berg-Cross

     

  • Author: Simon Cox

    Date: 17 Sep, 2013

    Perhaps a review of some of the work on registry-repository models could provide some useful insight. 

    I suggest looking at the 'Procedures for registration' standard from ISO/TC 211

    ISO 19135 http://www.iso.org/iso/catalogue_detail.htm?csnumber=32553 
    http://standards.data.gov.uk/proposal/use-iso-19135-standard-item-registration

    Sorry, this is an ISO document, so you or your library has to purchase it (no different from academic journals :-) ), but there is some good stuff inside so I recommend taking a look.

    And also the OASIS ebXML Registry-Repository model https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=regrep This particular implementation is probably a dead technology now, but the coupled registry-repository theory is good. 

    While each of these is expressed in terms of a detailed object model with many attributes defined, the general principles are similar: description, lifecycle, access.
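
To make the three shared principles concrete, here is a minimal Python sketch of a coupled registry and repository. It is not derived from ISO 19135 or the ebXML RegRep specification: the class names, the status values, and the rule that only valid items can be fetched are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Status(Enum):
    # Hypothetical lifecycle states, not taken from either standard.
    SUBMITTED = "submitted"
    VALID = "valid"
    RETIRED = "retired"


@dataclass
class RegisterItem:
    identifier: str
    description: str   # description: what the item is
    status: Status     # lifecycle: where the item is in its life
    location: str      # access: where the content lives in the repository


class Registry:
    """Holds item descriptions; the content itself stays in the repository."""

    def __init__(self, repository: Dict[str, bytes]):
        self._items: Dict[str, RegisterItem] = {}
        self._repository = repository

    def register(self, identifier: str, description: str,
                 location: str) -> RegisterItem:
        item = RegisterItem(identifier, description, Status.SUBMITTED, location)
        self._items[identifier] = item
        return item

    def set_status(self, identifier: str, status: Status) -> None:
        self._items[identifier].status = status

    def fetch(self, identifier: str) -> bytes:
        """Return the content of an item, but only while it is valid."""
        item = self._items[identifier]
        if item.status is not Status.VALID:
            raise LookupError(f"{identifier} is not valid")
        return self._repository[item.location]


# The repository is modelled as a plain dict from location to content.
repository = {"store/0001": b"example content"}
registry = Registry(repository)
registry.register("item-1", "An example registered item", "store/0001")
registry.set_status("item-1", Status.VALID)
content = registry.fetch("item-1")
```

Keeping the description and status in the registry, separate from the stored content, is what lets an item be retired without deleting the underlying data.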

    For terminology definitions, also look here: http://www.isotc211.org/Terminology.htm

     
