
RE: EU's definition of research data plus addtl comments [rda-legalinterop-ig] Variant definitions of research data


    Hi all,
    Only yesterday did I have time to read carefully the extended exchange of emails on this issue. I am not surprised that this debate is going on. I have suspected for three years that the RDA did not really know what its Alliance was all about. In fact, a month ago at the RDA meeting in Barcelona, I asked Peter Wittenburg whether it covers only data, or also data expressions (scientific articles, biochemical formulas from nature, not synthesized or created ones such as GMOs, and so on, in or through which the data are referred to), as well as software, web services and other things that are processes. I was surprised that his response was that he did not know, and that we should raise the issue in the RDA plenaries because it is not really being addressed yet!
    [See also the two substantive questions I raised to delineate the scope of our Principles, on software and services, in the same email as the one on public domain, which led us, on Bob's recommendation, to say in the intro that the latter two are not included at all for the purposes of our Principles.]
    The debate is very similar to the one we had in the CBD context about what the hell genetic resources are: the resources themselves, the information (e.g. sequenced DNA that can be sent abroad by email from the country of origin while the resource remains untouched), or the knowledge embedded in them and/or obtained from them (e.g. a causal relationship to prior knowledge such as traditional knowledge). It was, and still is, a debate that the Nagoya Protocol has not resolved, 20 years after genetic resources were defined in the CBD. It was also the first time I realized the complexity that digitizing and other technologies (e.g. algorithms expressing whales' vocalizations) imply per se, because of the potential for “copyability” that they introduced. [In fact, an ad hoc think tank meeting to define non-commercial uses of genetic resources, called by the Smithsonian and held at Museum Koenig in Bonn, Germany, on 17-19 November 2008, raised this issue up front but also decided not to address it. I was commissioned to prepare the main information document for that meeting, so I had included it for discussion.]
    In any case, I have no clear answer at all.
    This email is to remark:
    1.- That we may well not be able to resolve this in time (ambiguity could be our premise, with an introductory paragraph acknowledging that, since there is no final authoritative definition, nobody really knows for sure what research data is all about).
    2.- Since somebody asked whether the EU's H2020 has a definition, I add the following exchange of emails that I had with Gail just after one of our teleconferences 3 weeks ago (as if we had intuitively guessed that such a debate would be unavoidable pretty soon; it happened the very next week).
    I asked her for her opinion of the “official” EU H2020 definition for the purposes of open access. Her response convinced me that this definition was not at all workable, and she referred me to the ongoing work by Christine Borgman. Both are relevant to the ongoing debate and I think we should all share them; Gail probably forgot to raise them in (or simply add them to) the ongoing fully open debate.
    I add these elements
    1.- First email (my comment to Gail):
    How do you feel about the research data definition in the EU's H2020?
    ‘Research data’ refers to information, in particular facts or numbers, collected to be examined and considered and as a basis for reasoning, discussion, or calculation. In a research context, examples of data include statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images. The focus is on research data that is available in digital form.
    2.- Gail's response: The concern with definitions such as this is that they presume a certain genre of data (e.g., quantitative/numeric) which suggests a skew toward physical scientific research. But it does not take much to encounter digital data that does not fit: photographic (natural sciences; medical; remote sensing); images (geologic maps, cross sections, deep sea core photos); textual (digital humanities); etc.
    Does a corpus of Spanish literature available for text-mining to study linguistic development fit within the category below? Or what about digital audio of animal calls in the wild? If RDA wants to become as relevant and ubiquitous as possible, I believe we must caution against presuming the interests of scholars concerned with digital data.
    Best regards
    3.- Third email, with references to C.B.'s research on the issue [Bernard, you are right: there is research on what the heck research data is]
    From: Gail Clement [***@***.***] Sent: Sunday, 9 August 2015 15:38
    To: Enrique Alonso García
    Subject: Big Data, Little Data, No Data | The MIT Press
    Dear Enrique,
    UCLA Professor Christine Borgman keynoted RDA Amsterdam last fall.
    Her latest work on Research Data is: https://mitpress.mit.edu/big-data
    All best, Gail
    ________________________________
    From: chris.morris=***@***.***-groups.org [chris.morris=***@***.***-groups.org] On behalf of chrishmorris [***@***.***]
    Sent: Wednesday, 19 August 2015 10:20
    To: ***@***.***; ***@***.***; ***@***.***; ***@***.***; ***@***.***-groups.org
    CC: ***@***.***; ***@***.***
    Subject: Re: [rda-legalinterop-ig] Variant definitions of research data
    Hi,
    It is useful to discuss policies for preserving and sharing physical objects of research importance. In the life sciences we usually call them samples.
    But this isn’t the same discussion as about data. Data can be copied without loss, and copying data is usually cheap. This is a key argument for open data.
    For samples, there are many other considerations. Some study techniques are destructive, and even techniques that are planned to be non-destructive involve risk. For this reason alone some access restrictions are appropriate, e.g. to precious hominid fossils. Some samples are hazardous, e.g. blood samples from Ebola patients. Some samples have unknown and changing privacy implications: how much can you find out about me from 0.5 ml of my cerebrospinal fluid? So the range of law involved is much more than IP law; even within Europe the legal definition of biological hazard is far from uniform.
    Finishing a task involves bounding it. Surely it is useful to produce legal interoperability guidelines about digital data. Once that is done, if anyone wants to convene a workgroup about physical samples of research importance, then good luck to you.
    Regards,
    Chris
    From: A.G.D.Turner=***@***.***-groups.org [mailto:***@***.***-groups.org] On Behalf Of agdturner
    Sent: 19 August 2015 08:46
    To: Herman Stehouwer; puhlir; Repositorian; ***@***.***-groups.org
    Cc: Donat Agosti; MsDrData
    Subject: Re: [rda-legalinterop-ig] Variant definitions of research data
    I have just noticed that the source for the CASRAI Research object definition is RDA: http://smw-rda.esc.rzg.mpg.de/index.php/Main_Page
    I appreciate that in many cases the RDA will concern itself with such digital objects, but the importance of physical data in some academic studies should not be underestimated. I think the RDA should explicitly concern itself with samples (and indeed entire artefacts and all the dirt they have accumulated) and specimens used in research. BTW – a practical way of categorising such physical research data is by the conditions in which they are stored (sometimes they are not, they are just located in situ). Additionally though, some research data objects may be physical bit/byte storage entities – and that edge case is also important, and may complicate definitions. Notwithstanding that most physical research data will have digital object type profiles and that over time, the digital object profiles may persist for longer.
    HTH
    Andy
    http://www.geog.leeds.ac.uk/people/a.turner/index.html
    —– Reply message —–
    From: “agdturner”
    To: “Herman Stehouwer”, “puhlir”, “Repositorian”, “***@***.***-groups.org”
    Cc: “Donat Agosti”, “MsDrData”
    Subject: [rda-legalinterop-ig] Variant definitions of research data
    Date: Wed, Aug 19, 2015 08:08
    FWIW I like the CASRAI Research Data definition. “Digital”, in its sense of “of the fingers”, is a useful term, but some use it outside the context of computers while others regard it as belonging to the realm of codified bits/bytes. The CASRAI definition for “Digital data” is currently not very helpful in this context, but the definition of “Digital object” is more pertinent to this discussion, as it has been mentioned and does refer to bits/bytes:
    http://dictionary.casrai.org/Digital_object
    A digital object is editable, interactive, accessible and modifiable by means of digital objects other than the one governing its behaviour, and is distributed over information infrastructures. It is a machine-independent data structure consisting of one or more elements in digital form that can be parsed by different information systems; the structure helps to enable interoperability among diverse information systems in the Internet.” A digital object is composed of structured sequence of bits/bytes. As an object it is named. The bit sequence realizing the object can be identified and accessed by a unique and persistent identifier or by use of referencing attributes describing its properties. SYNONYM. Digital entity
    Andy
    http://www.geog.leeds.ac.uk/people/a.turner/index.html
    —– Reply message —–
    From: “Herman Stehouwer”
    To: “puhlir”, “Repositorian”, “RDA/CODATA Legal Interoperability IG”
    Cc: “Donat Agosti”, “MsDrData”, “Andy Turner”
    Subject: [rda-legalinterop-ig] Variant definitions of research data
    Date: Wed, Aug 19, 2015 07:44
    Dear all,
    two quick remarks (I do try to follow your discussions, but usually I have nothing to add!).
    1) The DFT group explicitly limits itself to the domain of registered, digital, data. So it is a bit odd to use their definitions as an argument to limit yourself, as I thought the discussion here was broader.
    2) Informally RDA defines research data as “data of interest to researchers”. Which is a bit of a cop-out, but there you go.
    Cheers,
    Herman
    On 18/08/15 23:54, puhlir wrote:
    Sorry to come to this conversation late and thanks for getting the ball rolling (or the definition gelling). I agree that we should try to use any definition that the RDA DFT WG develops for “Research data” and I am surprised that this wasn’t the first term they addressed. If they do not in the near term, we can suggest using the CASRAI formulation, although it is quite long, or some other long-term definition that is from a reputable source. I think we should resist making one up, however.
    Cheers,
    Paul
    On Tue, Aug 18, 2015 at 5:25 PM, Repositorian wrote:
    On the question of whether RDA has its own RDM glossary containing a definition of research data, the answer is “Sort of yes”. Here is what I’ve surfaced so far:
    • The Data Foundations and Terminology (DFT) Working Group of RDA has in their remit the job of devising definitions for use across RDA
    • Their released deliverables to date comprise a set of documents available online at https://rd-alliance.org/group/data-foundation-and-terminology-wg/outcome
    • Of particular relevance to our discussion is RDA Data Foundation and Terminology DFT 3:Snapshot of DFT Core Terms, online at https://rd-alliance.org/system/files/DFT3%20-%20Snapshot%20of%20core%20t
    o They identify core terms and core concepts, using ‘snapshots’ to fix and represent a term/concept that may be differently understood across RDA or may be evolving and still fluid
    o They have defined only those 10 core terms that have been shown to find rough consensus: neither “data” nor “research data” is among those 10 core terms
    o “Data object” is not a core term in the eyes of this WG but it does have a ‘placeholder’ in the Appendix of Additional Terms that have been discussed. The term “Data object” is associated with the following “indication of meaning”: a type of Digital Object containing processible data/information/knowledge. “Digital Object” is a core term and is defined as:
    • 2.2.1 Digital Object (DO)
    • A. Definition
    • A digital object (DO) is represented by a bitstream, is referenced and identified by a persistent identifier and has properties being characterized by metadata.
    Thus it appears for our purposes that analog (non-digital) objects fall outside the scope of our Principles and Guidelines.
    It also appears from Data Foundation and Terminology (DFT) WG webpage that they are aware of the newly introduced CASRAI glossary. A posting to that page on 8-17-2015 reports:
    Announcing a new transdisciplinary Glossary for research data management
    Research Data Canada (RDC) in partnership with the international Consortia Advancing Standards in Research Administration Information (CASRAI)
    is pleased to announce the launch of a PILOT for a new interactive Glossary containing 500+ draft terms and definitions to support work in the field of research data management.
    The glossary is publicly available under a Creative Commons Attribution Only license (CC-BY) at
    http://dictionary.casrai.org/Category:Research_Data_Domain
    Gail P. Clement | Head of Research Services | Caltech Library | Mail Code 1-43 | Pasadena CA 91125-4300 | 626-395-1203
    http://orcid.org/0000-0001-5494-4806 | library.caltech.edu
    From: Donat Agosti [mailto:***@***.***]
    Sent: Tuesday, August 18, 2015 1:46 PM
    To: MsDrData; agdturner; RDA/CODATA Legal Interoperability IG
    Cc: Gail Clement
    Subject: RE: [rda-legalinterop-ig] Variant definitions of research data
    Does RDA have a definition of research data? What about US NSF or Horizon 2020/EU research? I am sure Paul, through his work at the National Academy, has a source or sources? How does research data relate to research results that ought to be open in the US? Though research data is not mentioned in the memo, data is included in research results: https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_publ
    cheers
    donat
    From: lisan=***@***.***-groups.org [mailto:***@***.***-groups.org] On Behalf Of MsDrData
    Sent: Tuesday, August 18, 2015 10:32 PM
    To: agdturner ; RDA/CODATA Legal Interoperability IG
    Cc: Repositorian
    Subject: Re: [rda-legalinterop-ig] Variant definitions of research data
    The research data definition is better from my standpoint and incorporates all the examples I included. And, it covers the other more intentional data collection efforts.
    -Lisa
    On Tue, Aug 18, 2015 at 1:56 PM, agdturner wrote:
    Just to point out that the CASRAI glossary also has a definition for research data that might be useful:
    http://dictionary.casrai.org/Research_data
    Data that are used as primary sources to support technical or scientific enquiry, research, scholarship, or artistic activity, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results. All other digital and non-digital content have the potential of becoming research data. Research data may be experimental data, observational data, operational data, third party data, public sector data, monitoring data, processed data, or repurposed data.
    Andy
    http://www.geog.leeds.ac.uk/people/a.turner/index.html
    From: Gclement=***@***.***-groups.org [mailto:Gclement=***@***.***-groups.org] On Behalf Of Repositorian
    Sent: 18 August 2015 18:23
    To: ‘RDA/CODATA Legal Interoperability IG’
    Subject: [rda-legalinterop-ig] Variant definitions of research data
    Hello RDA Colleagues,
    There seems to be a proliferation of definitions across our domain, including this one from CASRAI as part of their research data glossary initiative. http://dictionary.casrai.org/Data
    CASRAI just put out a call for reviews of their glossary and the one for data looks pretty good.
    Does this look useful to you? Does it align with the RDA research data terminology work?
    Data
    Facts, measurements, recordings, records, or observations about the world collected by scientists and others, with a minimum of contextual interpretation. Data may be in any format or medium taking the form of writings, notes, numbers, symbols, text, images, films, video, sound recordings, pictorial reproductions, drawings, designs or other graphical representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing algorithms, or statistical records
    Gail P. Clement | Head of Research Services | Caltech Library | Mail Code 1-43 | Pasadena CA 91125-4300 | 626-395-1203
    http://orcid.org/0000-0001-5494-4806 | library.caltech.edu

    Full post: https://www.rd-alliance.org/group/rdacodata-legal-interoperability-ig/po
    Manage my subscriptions: https://www.rd-alliance.org/mailinglist
    Stop emails for this post: https://www.rd-alliance.org/mailinglist/unsubscribe/49569

    _______________________________________________
    Lisa Neidert Population Studies Center
    Data Scientist Institute for Social Research
    734-763-2203(P) 426 Thompson, P.O. Box 1248
    734-763-1428(F) Ann Arbor, MI 48106-1248
    ***@***.*** http://www.psc.isr.umich.edu
    Twitter: @msdrdata Skype: MsDrData

    Dr. ir. Herman Stehouwer
    Max Planck Computing and Data Facility (MPCDF)
    RDA Secretariat
    ***@***.*** 0031-619258815
    Skype: herman.stehouwer.mpi
    ________________________________
    This message is intended only for the use of the addressee and contains confidential information. If you are not the intended recipient, dissemination of this documentation is prohibited. If you have received this communication in error, please erase all copies of the message and its attachments and notify us immediately.
    Before printing this email, please consider whether it is really necessary: the environment is everyone's concern.
