Dear members of the FAIR Data Maturity Model Working Group!
The discussion on GitHub has been lively over the last couple of weeks. This
link allows you to see the issues that were recently updated. Below are some
of the comments for which we'd like to gather more views.
Other than that, we would like to ask you to review the proposed priorities
(mandatory, recommended and optional) and let us know if you want to suggest
changes to the proposals.
Indicators prioritisation for Findability
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/30
* Could the indicators for uniqueness and persistence of identifiers
be combined in a single indicator?
* Why are there indicators concerning universally unique, persistent
identifiers for metadata? Are there any examples where metadata (e.g. a
metadata 'record') has its own identifier?
Indicators for F3: metadata clearly and explicitly include the identifier of
the data it describes
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/17
* There are cases where the PID of the data resolves to the metadata.
How could that situation be captured in the indicators?
Indicators for F4: (meta)data are registered or indexed in a searchable
resource
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/18
* The formulation "metadata is harvested by" does not put the
responsibility on the data provider; a provider cannot force anyone to
harvest the metadata. It might be better to change this to "metadata is
offered/made available in such a way that it can be harvested and indexed by
…."
Indicators prioritisation for Accessibility
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/31
(No comments yet)
Indicators for A1: (meta)data are retrievable by their identifier using a
standardised communications protocol
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/19
* A proposal has been made to replace "file" by "digital object" in
indicator A1-03D.
Indicators for A1.1: the protocol is open, free, and universally
implementable
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/20
* "Protocol" should also be understood to include the actions that a
human reuser needs to perform to get access to the data, including, for
example, filling in an application form or calling by telephone. The current
indicators do not include this aspect. Should we add an indicator such as
"Actions to be taken by a reuser to get access to the data are well
documented"?
Indicators prioritisation for Interoperability
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/32
* In the current proposal, there are no mandatory indicators for
interoperability. Should there be?
Indicators for I1: (meta)data use a formal, accessible, shared, and broadly
applicable language for knowledge representation
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/23
* Big data does not use the term "knowledge representation" the same
way. Knowledge about the structured sequences of numbers is contained in the
metadata. The term "knowledge representation" needs to be explained that it
includes such structured sequences or a different term should be used.
Indicators for I3: (meta)data include qualified references to other
(meta)data
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/25
* In the case of Big Data (structured number sequences), no references
to other data will exist. In that case, the formulation is not appropriate.
Indicators prioritisation for Reusability
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/33
* Should there be a separate indicator concerning the provision of
information in the metadata about the technical environment needed to re-use
data?
* The indicator "Data complies with a community standard" is too
vague. Could it be narrowed to "Data format complies with a community
standard" or "Data representation complies with a community standard" or
"Data description complies with a community standard"?
Indicators for R1.3: (meta)data meet domain-relevant community standards
https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/29
* The requirement for using a community standard could be difficult to
meet for new research, because it could be too early for standards to be
established in that area. Could this be handled by understanding 'standard'
in a wider sense, e.g. including less formal, published specifications?
Furthermore, a suggestion has been made to refer to the FAIRsharing registry
of community standards in various places (in particular F2 and R1.3) to help
people find relevant domain/discipline-specific metadata/data standards.
We are hoping to reach consensus on the indicators and the prioritisation by
the end of August, in order to discuss and reach agreement in the online
meeting on 12 September 2019.
We very much welcome your comments and suggestions in the relevant GitHub
issues.
Makx Dekkers and the editorial team
Author: Mark Wilkinson
Date: 19 Aug, 2019
Phew! Lots of things to say...
I'll scatter comments into the text. Sorry if the comments are blunt
and to the point. I'm doing this with my left hand while editing my
student's thesis, so I don't have time for friendly/polite chit-chat ;-)
On 2019-08-19 2:23 p.m., makxdekkers wrote:
> Indicators prioritisation for Findability
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/30
>
> * Could the indicators for uniqueness and persistence of identifiers
> be combined in a single indicator?
>
Uniqueness and persistence are not synonymous. http://cnn.com is
unique, in that it cannot mean anything other than the CNN homepage. It
is not persistent, however, because it does not point at the same
content every time. Persistence is (in part) about whether a GUID gets
re-used to point to another record... as happens all the time with web
pages. (Persistence also involves the longevity of the identifier, but
that's not part of F(indability).)
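A minimal sketch of the distinction, assuming we can retrieve the content behind an identifier more than once. Comparing content digests is only a crude proxy for "still points at the same record"; the function name and the sample data are hypothetical.

```python
import hashlib
from typing import List


def is_persistent(snapshots: List[bytes]) -> bool:
    """True if every retrieval of the identifier returned identical content."""
    digests = {hashlib.sha256(body).hexdigest() for body in snapshots}
    return len(digests) <= 1


# Hypothetical retrievals of one identifier on different days.
# Both identifiers are unique; only the second behaves persistently.
news_homepage = [b"Monday's headlines", b"Tuesday's headlines"]
archived_record = [b"record 42, version 1", b"record 42, version 1"]
```

Under this proxy the homepage fails the persistence check even though its identifier is perfectly unique, which is why the two properties need separate indicators.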
> * Why are there indicators concerning universally unique, persistent
> identifiers for metadata? Are there any examples where metadata
> (e.g. a metadata ‘record’) has its own identifier?
>
Every DOI on earth!
>
> Indicators for F3: metadata clearly and explicitly include the
> identifier of the data it describes
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/17
>
> * There are cases where the PID of the data resolves to the
> metadata. How could that situation be captured in the indicators?
>
The question is internally inconsistent. FAIR requires that the data
and the metadata have distinct identifiers (because they are not the
same thing!), so... (For those who remember history, there was a
protocol in the past, the LSID, that solved this problem by having a
single identifier for the data and the metadata, and then providing them
independently via two distinct Web calls - getData, getMetadata. With
HTTP, we are stuck with GET, so we must have distinct identifiers for
the two kinds of information.) (That is not to preclude other
identifier systems - beyond URIs - that might exist now or in the future;
they should be evaluated by the same rules.)
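The contrast can be sketched in a few lines. This is an illustration only, not the LSID specification or any real resolver: the in-memory stores, the identifiers under example.org and the function names are all hypothetical.

```python
from typing import Dict

# LSID style: one identifier, two distinct calls (hypothetical store).
LSID_STORE: Dict[str, Dict[str, str]] = {
    "urn:lsid:example.org:records:42": {
        "data": "ACGTACGT",
        "metadata": "title=Example sequence",
    }
}


def get_data(lsid: str) -> str:
    return LSID_STORE[lsid]["data"]


def get_metadata(lsid: str) -> str:
    return LSID_STORE[lsid]["metadata"]


# HTTP style: only GET is available, so each kind of information needs
# its own identifier, and the metadata names the data it describes (F3).
DATA_URI = "https://example.org/records/42/data"
METADATA_URI = "https://example.org/records/42/metadata"
HTTP_STORE: Dict[str, str] = {
    DATA_URI: "ACGTACGT",
    METADATA_URI: "title=Example sequence; describes=" + DATA_URI,
}


def http_get(uri: str) -> str:
    return HTTP_STORE[uri]
```

In the HTTP variant, a data PID that "resolves to the metadata" would mean both URIs return the same thing, which is exactly the situation the two-identifier requirement rules out.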
>
> Indicators for F4: (meta)data are registered or indexed in a
> searchable resource
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/18
>
> * The formulation “/metadata/ /is harvested by/” does not put the
> responsibility on the data provider; a provider cannot force
> anyone to harvest the metadata. It might be better to change this
> to “/metadata is offered/made available in such a way that it can
> be harvested and indexed by …/”
>
This is an extremely hard Principle to adhere to at the moment. Most
search engines do not harvest structured metadata at all; some harvest
only certain types of metadata (e.g. schema.org); search engines don't
have a common search interface, meaning we have to code to each one; and
many search engines that DO harvest metadata do not allow automated
access (e.g. Google dataset search). So... I think the
best thing is to simply talk about the problems here, and suggest that
we all find a way to do better. I cannot think of a "good" solution for
this Principle at the moment... (My test uses Bing, only because I am
allowed to automate search against Bing! But it doesn't index metadata
very well; doesn't index DOIs **at all**!!; and it seems to refuse to
index pages that primarily contain structured metadata, like the search
interface to one of my databases, which provides a google-like search box
as its only HTML element, but also provides a 2-page JSON-LD metadata record
about what is in the database! LOL! Bing won't index that page at all!)
So I would suggest that this is "an ongoing problem, with no good
solution to date".
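For readers who have not seen such a page, here is a rough sketch of what "offered in such a way that it can be harvested" might look like from the harvester's side, assuming the metadata is embedded in the HTML as a script element of type application/ld+json. The sample page, the class name and the function name are hypothetical; only the Python standard library is used.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every application/ld+json script."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.records: List[Any] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> List[Any]:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical page: a search box plus an embedded metadata record.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset", "name": "Example database"}
</script></head>
<body><form><input type="search" name="q"></form></body></html>"""
```

Calling extract_jsonld(PAGE) yields one record describing the database. Whether any given search engine actually does something like this is outside the provider's control, which is the point of the proposed rewording.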
>
> Indicators for A1: (meta)data are retrievable by their identifier
> using a standardised communications protocol
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/19
>
> * A proposal has been made to replace “/file/” by “/digital object/”
> in indicator A1-03D.
>
+1
> Indicators for A1.1: the protocol is open, free, and universally
> implementable
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/20
>
> * “Protocol” should also be understood to include the actions that a
> human reuser needs to perform to get access to the data,
> including, for example, filling in an application form or calling
> by telephone. The current indicators do not include this aspect.
> Should we add an indicator such as “/Actions to be taken by a
> reuser to get access to the data are well documented/”?
>
+1
> Indicators prioritisation for Interoperability
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/32
>
> * In the current proposal, there are no mandatory indicators for
> interoperability. Should there be?
>
Not sure. They would be pretty general, if there were, because the
sub-principles define what was intended.
> Indicators for I1: (meta)data use a formal, accessible, shared, and
> broadly applicable language for knowledge representation
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/23
>
> * Big data does not use the term "knowledge representation" the same
> way. Knowledge about the structured sequences of numbers is
> contained in the metadata. The term "knowledge representation"
> needs to be explained that it includes such structured sequences
> or a different term should be used.
>
Michel and I had a long discussion about this at the biohackathon last
year (or maybe the year before?). "knowledge representation",
originally (among our initial FAIR Metrics group) meant that the
language had a BNF. I am personally not happy with that definition
(though we have never created a better one, in a formal way). Lots of
formal data structures have BNFs, but they are not able to
communicate "meaning", only structure (or at best, grammatical "meaning"
- e.g. this is a thing, and this is a property). I think FAIR intended
something much deeper than that! ...but I don't know how to define it.
It must at least be able to represent concepts that are defined
elsewhere (i.e. it must be able to refer outwards to defined concepts,
e.g. in an ontology)
Phew! Lots of things to say...
I'll scatter comments into the text. Sorry if the comments are blunt
and to-the-point. I'm doing this with my left-hand while editing my
student's thesis, so I don't have time for friendly/polite chit-chat ;-)
On 2019-08-19 2:23 p.m., makxdekkers wrote:
> Indicators prioritisation for Findability
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/30
>
> * Could the indicators for uniqueness and persistence of identifiers
> be combined in a single indicator?
>
Uniqueness and persistence are not synonymous. http://cnn.com is
unique, in that it cannot mean anything other than the CNN homepage. It
is not persistent, however, because it is not pointing at the same
content every time. Persistence is (in part) about whether a GUID gets
re-used to point to another record... as happens all the time with web
pages. (Persistence also involves the longevity of the identifier, but
that's not part of F(indability).)
> * Why are there indicators concerning universally unique, persistent
> identifiers for metadata? Are there any examples where metadata
> (e.g. a metadata ‘record’) has its own identifier?
>
Every DOI on earth!
>
> Indicators for F3: metadata clearly and explicitly include the
> identifier of the data it describes
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/17
>
> * There are cases where the PID of the data resolves to the
> metadata. How could that situation be captured in the indicators?
>
The question is internally inconsistent. FAIR requires that the data
and the metadata have distinct identifiers (because they are not the
same thing!), so... (For those who remember history, there was a
protocol in the past, the LSID, that solved this problem by having a
single identifier for the data and the metadata, and then providing them
independently via two distinct Web calls - getData, getMetadata. With
HTTP, we are stuck with GET, so we must have distinct identifiers for
the two kinds of information.) (that is not to preclude that other
identifier systems - beyond URIs - might exist now or in the future, and
should be evaluated by the same rules)
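
A minimal sketch of the distinction, assuming a hypothetical
MetadataRecord type and example.org identifiers (neither comes from the
indicators or from any FAIR tooling): the metadata record carries its own
GUID and explicitly names the GUID of the data it describes, and the two
must differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical record shape, used only for illustration."""
    identifier: str  # GUID of the metadata record itself
    describes: str   # GUID of the data the record is about (F3)
    title: str


def has_distinct_identifiers(record: MetadataRecord) -> bool:
    """True when both GUIDs are present and they are not the same string."""
    return (
        bool(record.identifier)
        and bool(record.describes)
        and record.identifier != record.describes
    )


separate = MetadataRecord(
    identifier="https://example.org/metadata/42",
    describes="https://example.org/data/42",
    title="Hypothetical dataset",
)
conflated = MetadataRecord(
    identifier="https://example.org/data/42",
    describes="https://example.org/data/42",
    title="PID of the data resolves to the metadata",
)

print(has_distinct_identifiers(separate))
print(has_distinct_identifiers(conflated))
```

The second record models the case raised in the question, where one PID
is made to serve as both the data and the metadata identifier.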
>
> Indicators for F4: (meta)data are registered or indexed in a
> searchable resource
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/18
>
> * The formulation “/metadata/ /is harvested by/” does not put the
> responsibility on the data provider; a provider cannot force
> anyone to harvest the metadata. It might be better to change this
> to “/metadata is offered/made available in such a way that it can
> be harvested and indexed by …/”
>
This is an extremely hard Principle to adhere to, at the moment. Most
search engines do not harvest structured metadata at all; some harvest
only certain types of metadata (e.g. schema.org); search engines don't
have a common search interface, meaning we have to code to each one; and
many search engines that DO harvest metadata do not allow automated
access (e.g. Google dataset search). So... I think the
best thing is to simply talk about the problems here, and suggest that
we all find a way to do better. I cannot think of a "good" solution for
this Principle, at the moment... (my test uses Bing, only because I am
allowed to automate search against Bing! But it doesn't index metadata
very well; doesn't index DOIs **at all**!!; and it seems to refuse to
index pages that primarily contain structured metadata, like the search
interface to one of my databases, that provides a google-like search box
as its only HTML element, but provides a 2-page JSON-LD metadata record
about what is in the database! LOL! Bing won't index that page at all!)
So I would suggest that this is "an ongoing problem, with no good
solution to date".
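
For readers unfamiliar with the kind of page described above, here is a
rough sketch of "offering" metadata for harvesting: a schema.org Dataset
record serialised as JSON-LD and embedded in an otherwise empty HTML page.
The function names, field selection and example.org values are
assumptions for illustration only; whether any given search engine will
actually index such a page is exactly the open problem.

```python
import json


def dataset_jsonld(name: str, description: str, identifier: str, url: str) -> dict:
    """Build a small schema.org Dataset description (illustrative fields only)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "url": url,
    }


def embed_in_html(record: dict) -> str:
    """Wrap the JSON-LD record in a script element inside a bare HTML page."""
    payload = json.dumps(record, indent=2)
    return (
        "<html><head>\n"
        '<script type="application/ld+json">\n'
        f"{payload}\n"
        "</script>\n"
        "</head><body></body></html>"
    )


record = dataset_jsonld(
    name="Hypothetical database",
    description="Example record; all values are placeholders.",
    identifier="https://example.org/metadata/42",
    url="https://example.org/search",
)
print(embed_in_html(record))
```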
>
> Indicators for A1: (meta)data are retrievable by their identifier
> using a standardised communications protocol
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/19
>
> * A proposal has been made to replace “/file/” by “/digital object/”
> in indicator A1-03D.
>
+1
> Indicators for A1.1: the protocol is open, free, and universally
> implementable
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/20
>
> * “Protocol” should also be understood to include the actions that a
> human reuser needs to perform to get access to the data,
> including, for example, filling in an application form or calling
> by telephone. The current indicators do not include this aspect.
> Should we add an indicator such as “/Actions to be taken by a
> reuser to get access to the data are well documented/”?
>
+1
> Indicators prioritisation for Interoperability
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/32
>
> * In the current proposal, there are no mandatory indicators for
> interoperability. Should there be?
>
Not sure. They would be pretty general, if there were, because the
sub-principles define what was intended.
> Indicators for I1: (meta)data use a formal, accessible, shared, and
> broadly applicable language for knowledge representation
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/23
>
> * Big data does not use the term "knowledge representation" the same
> way. Knowledge about the structured sequences of numbers is
> contained in the metadata. The term "knowledge representation"
> needs to be explained that it includes such structured sequences
> or a different term should be used.
>
Michel and I had a long discussion about this at the biohackathon last
year (or maybe the year before?). "knowledge representation",
originally (among our initial FAIR Metrics group) meant that the
language had a BNF. I am personally not happy with that definition
(though we have never created a better one, in a formal way). Lots of
formal data structures have BNFs, but they are not able to
communicate "meaning", only structure (or at best, grammatical "meaning"
- e.g. this is a thing, and this is a property). I think FAIR intended
something much deeper than that! ...but I don't know how to define it.
It must at least be able to represent concepts that are defined
elsewhere (i.e. it must be able to refer outwards to defined concepts,
e.g. in an ontology)
>
> Indicators for I3: (meta)data include qualified references to other
> (meta)data
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/25
>
> * In the case of Big Data (structured number sequences), no
> references to other data will exist. In that case, the formulation
> is not appropriate.
>
We could add "where possible". ...but even in Big Data, there are often
references to external concepts. Those concepts should be captured
using the GUID of that external concept, rather than just a "label".
...of course, in many/most Big Data formats, it's too late to change
their structure, so then it falls onto the Metadata to explain what each
external reference in the data "blob" refers to. Not a solved problem
(at least, not in widespread use).
For small-data, the rule should apply, IMO. For Big Data formats, we
should encourage those who are inventing new formats to consider this
FAIR rule when they invent their data structures.
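
As a rough illustration of "GUID rather than label", the sketch below
flags fields in a metadata record whose value is a bare label instead of
a resolvable identifier. It assumes, purely for the example, that a GUID
is an http(s) URI; the field names and example.org concept are
hypothetical.

```python
from urllib.parse import urlparse


def is_guid_reference(value: str) -> bool:
    """Treat a value as a GUID only if it is an absolute http(s) URI (assumption)."""
    parts = urlparse(value)
    return parts.scheme in {"http", "https"} and bool(parts.netloc)


def label_only_references(references: dict) -> list:
    """Return the field names whose value is a plain label, in sorted order."""
    return sorted(
        field for field, value in references.items()
        if not is_guid_reference(value)
    )


references = {
    "measured_property": "https://example.org/ontology/Concept_1",
    "unit": "millimetre",
}
print(label_only_references(references))
```

For legacy Big Data formats, a check like this would have to run against
the metadata that explains the data "blob", not against the blob itself.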
>
> Indicators prioritisation for Reusability
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/33
>
> * Should there be a separate indicator concerning the provision of
> information in the metadata about the technical environment needed
> to re-use data?
>
The word "plurality" was the only word we could find (Michel, actually,
decided on that word, and others at the table agreed with it) that
described the concept of "giving as much metadata as you possibly
could, without presuming who the end user might be, and then giving some
more!". So yes, technical environment would definitely be a part of that.
> * The indicator “/Data complies with a community standard/” is too
> vague. Could it be narrowed to “/Data format complies with a
> community standard/” or “/Data representation complies with a
> community standard/” or “/Data description complies with a
> community standard/”?
>
I think it really meant: if there IS a standard, then be sure you use
it. If there ISN'T a standard, you should consider if the problem can
be standardized, and then create one. --> The more predictable the
structure, the better! And only the community can decide what metadata
elements are really important for reproducibility and reuse of their data.
> Indicators for R1.3: (meta)data meet domain-relevant community standards
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/29
>
> * The requirement for using a community standard could be difficult
> to meet for new research, because it could be too early for
> standards to be established in that area. Could this be handled by
> understanding ‘/standard’/ in a wider sense, e.g. including less
> formal, published specifications?
>
See above - if there isn't a standard, and the data representation could
be standardized, then create and approve a standard (e.g. a Minimal info
model)
Phew! Lots of things to say...
I'll scatter comments into the text. Sorry if the comments are blunt
and to-the-point. I'm doing this with my left hand while editing my
student's thesis, so I don't have time for friendly/polite chit-chat ;-)
On 2019-08-19 2:23 p.m., makxdekkers wrote:
> Indicators prioritisation for Findability
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/30
>
> * Could the indicators for uniqueness and persistence of identifiers
> be combined in a single indicator?
>
Uniqueness and persistence are not synonymous. http://cnn.com is
unique, in that it cannot mean anything other than the CNN homepage. It
is not persistent, however, because it is not pointing at the same
content every time. Persistence is (in part) about avoiding the re-use of a
GUID to point to another record... as happens all the time with web
pages. (persistence also involves the longevity of the identifier, but
that's not part of F(indability))
> * Why are there indicators concerning universally unique, persistent
> identifiers for metadata? Are there any examples where metadata
> (e.g. a metadata ‘record’) has its own identifier?
>
Every DOI on earth!
>
> Indicators for F3: metadata clearly and explicitly include the
> identifier of the data it describes
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/17
>
> * There are cases where the PID of the data resolves to the
> metadata. How could that situation be captured in the indicators?
>
The question is internally inconsistent. FAIR requires that the data
and the metadata have distinct identifiers (because they are not the
same thing!), so... (For those who remember history, there was a
protocol in the past, the LSID, that solved this problem by having a
single identifier for the data and the metadata, and then providing them
independently via two distinct Web calls - getData, getMetadata. With
HTTP, we are stuck with GET, so we must have distinct identifiers for
the two kinds of information.) (that is not to preclude that other
identifier systems - beyond URIs - might exist now or in the future, and
should be evaluated by the same rules)
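A minimal sketch of what "distinct identifiers" could look like in a check, assuming a metadata record is a simple dictionary. The key names ("identifier", "describes") and both example.org URLs are made up for illustration; they are not taken from any indicator or standard.

```python
# Hypothetical metadata record; both URLs are invented for illustration.
metadata_record = {
    "identifier": "https://example.org/metadata/42",  # GUID of the metadata itself
    "describes": "https://example.org/data/42",       # GUID of the data (F3)
    "title": "Example dataset",
}

def check_f3(record, data_key="describes"):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if not record.get("identifier"):
        problems.append("metadata has no identifier of its own")
    if not record.get(data_key):
        problems.append("metadata does not include the data identifier")
    elif record.get(data_key) == record.get("identifier"):
        problems.append("data and metadata share one identifier")
    return problems

print(check_f3(metadata_record))
```

The third branch is the case raised in the question: one PID standing in for both the data and the metadata.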
>
> Indicators for F4: (meta)data are registered or indexed in a
> searchable resource
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/18
>
> * The formulation “/metadata/ /is harvested by/” does not put the
> responsibility on the data provider; a provider cannot force
> anyone to harvest the metadata. It might be better to change this
> to “/metadata is offered/made available in such a way that it can
> be harvested and indexed by …/”
>
This is an extremely hard Principle to adhere to, at the moment. Most
search engines do not harvest structured metadata at all; some harvest
only certain types of metadata (e.g. schema.org); search engines don't
have a common search interface, meaning we have to code to each one; and
many search engines that DO harvest metadata do not allow automated
access (e.g. Google Dataset Search). So... I think the
best thing is to simply talk about the problems here, and suggest that
we all find a way to do better. I cannot think of a "good" solution for
this Principle, at the moment... (my test uses Bing, only because I am
allowed to automate search against Bing! But it doesn't index metadata
very well; doesn't index DOIs **at all**!!; and it seems to refuse to
index pages that primarily contain structured metadata, like the search
interface to one of my databases, that provides a google-like search box
as its only HTML element, but provides a 2-page JSON-LD metadata record
about what is in the database! LOL! Bing won't index that page at all!)
So I would suggest that this is "an ongoing problem, with no good
solution to date"
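For reference, a rough sketch of what "harvesting" structured metadata from such a page involves, assuming the JSON-LD is embedded in a script element of type application/ld+json. The page content and the example.org identifier are invented; this is not how any particular search engine works.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.records.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False

# Invented page: a search box as the only visible element, plus metadata.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example database", "identifier": "https://example.org/dataset/1"}
</script></head>
<body><form><input name="q"></form></body></html>"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)
print(extractor.records[0]["@type"])
```

The provider can only make the metadata available in this form; whether anyone actually runs something like this against the page is outside the provider's control.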
>
> Indicators for A1: (meta)data are retrievable by their identifier
> using a standardised communications protocol
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/19
>
> * A proposal has been made to replace “/file/” by “/digital object/”
> in indicator A1-03D.
>
+1
> Indicators for A1.1: the protocol is open, free, and universally
> implementable
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/20
>
> * “Protocol” should also be understood to include the actions that a
> human reuser needs to perform to get access to the data,
> including, for example, filling in an application form or calling
> by telephone. The current indicators do not include this aspect.
> Should we add an indicator such as “/Actions to be taken by a
> reuser to get access to the data are well documented/”?
>
+1
> Indicators prioritisation for Interoperability
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/32
>
> * In the current proposal, there are no mandatory indicators for
> interoperability. Should there be?
>
Not sure. They would be pretty general, if there were, because the
sub-principles define what was intended.
> Indicators for I1: (meta)data use a formal, accessible, shared, and
> broadly applicable language for knowledge representation
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/23
>
> * Big data does not use the term "knowledge representation" the same
> way. Knowledge about the structured sequences of numbers is
> contained in the metadata. The term "knowledge representation"
> needs to be explained that it includes such structured sequences
> or a different term should be used.
>
Michel and I had a long discussion about this at the biohackathon last
year (or maybe the year before?). "knowledge representation",
originally (among our initial FAIR Metrics group) meant that the
language had a BNF. I am personally not happy with that definition
(though we have never created a better one, in a formal way). Lots of
formal data structures have BNFs, but they are not able to
communicate "meaning", only structure (or at best, grammatical "meaning"
- e.g. this is a thing, and this is a property). I think FAIR intended
something much deeper than that! ...but I don't know how to define it.
It must at least be able to represent concepts that are defined
elsewhere (i.e. it must be able to refer outwards to defined concepts,
e.g. in an ontology)
>
> Indicators for I3: (meta)data include qualified references to other
> (meta)data
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/25
>
> * In the case of Big Data (structured number sequences), no
> references to other data will exist. In that case, the formulation
> is not appropriate.
>
We could add "where possible". ...but even in Big Data, there are often
references to external concepts. Those concepts should be captured
using the GUID of that external concept, rather than just a "label".
...of course, in many/most Big Data formats, it's too late to change
their structure, so then it falls onto the Metadata to explain what each
external reference in the data "blob" refers to. Not a solved problem
(at least, not in widespread use).
For small-data, the rule should apply, IMO. For Big Data formats, we
should encourage those who are inventing new formats to consider this
FAIR rule when they invent their data structures.
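A small sketch of the "metadata explains the blob" idea, assuming a sidecar description in which each column of the data points to the GUID of an externally defined concept. The column names, the dictionary layout and the example.org ontology URLs are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical sidecar metadata for a "blob" of numbers: each column is
# described by the GUID of an external concept, not only by a label.
COLUMN_METADATA = {
    "col0": {"label": "temperature", "concept": "https://example.org/ontology/Temperature"},
    "col1": {"label": "pressure", "concept": "https://example.org/ontology/Pressure"},
    "col2": {"label": "site"},  # label only: no reference to a defined concept
}

def is_guid(value):
    """Very rough test: treat an absolute http(s) URI as a GUID."""
    parts = urlparse(value or "")
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def unqualified_columns(metadata):
    """Return the columns whose external concept is not given as a GUID."""
    return [name for name, entry in metadata.items()
            if not is_guid(entry.get("concept"))]

print(unqualified_columns(COLUMN_METADATA))
```

Here only col2 is reported, because it carries a label but no GUID.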
>
> Indicators prioritisation for Reusability
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/33
>
> * Should there be a separate indicator concerning the provision of
> information in the metadata about the technical environment needed
> to re-use data?
>
The word "plurality" was the only word that we (Michel, actually,
decided on that word, and others at the table agreed with it) could find
that described the concept of "giving as much metadata as you possibly
could, without presuming who the end user might be, and then giving some
more!". So yes, technical environment would definitely be a part of that.
> * The indicator “/Data complies with a community standard/” is too
> vague. Could it be narrowed to “/Data format complies with a
> community standard/” or “/Data representation complies with a
> community standard/” or “/Data description complies with a
> community standard/”?
>
I think it really meant: if there IS a standard, then be sure you use
it. If there ISN'T a standard, you should consider if the problem can
be standardized, and then create one. --> The more predictable
structure, the better! And only the community can decide what metadata
elements are really important for reproducibility and reuse of their data.
> Indicators for R1.3: (meta)data meet domain-relevant community standards
>
> https://github.com/RDA-FAIR/FAIR-data-maturity-model-WG/issues/29
>
> * The requirement for using a community standard could be difficult
> to meet for new research, because it could be too early for
> standards to be established in that area. Could this be handled by
> understanding ‘/standard’/ in a wider sense, e.g. including less
> formal, published specifications?
>
See above - if there isn't a standard, and the data representation could
be standardized, then create and approve a standard (e.g. a Minimal info
model)
> Furthermore, a suggestion has been made to refer to the FAIRsharing
> registry of community standards in various places (in particular F2
> and R1.3) to help people find relevant domain/discipline-specific
> metadata/data standards.
>
+1 +1 +1! And... beyond that... we should encourage FAIRsharing to
expand the metadata elements that they provide for each standard, so
that an agent discovering the standard in FAIRsharing is sufficient for
it then to operate on the (meta)data itself (likely in collaboration
with other, similar projects). This will require resources for
FAIRsharing!! It's not an easy problem! (one simple example is this:
My agent has a GUID. I want to know if it conforms to an existing
standard (e.g. DOI). I would like to be able to find all GUID standards
in FAIRsharing, and then do a test (e.g. regular expression) against my
GUID, to see if it conforms with any of those - I discover that it
pattern-matches with DOI. Then I use the other metadata in FAIRsharing
to determine what the resolution mechanism is for this GUID
(https://doi.org/XXXXX with the Accept header "application/json").
Some of this is captured in the MIRIAM registry, but not all, and not at
the level of granularity that FAIRsharing offers (or could offer!) -
e.g. MIRIAM can tell me how to resolve an ID once it is identified, and
even how to identify it by RegExp, but it cannot tell me if that is, in
fact, a globally-accepted GUID standard.)
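The agent workflow described above can be sketched roughly as follows. The SCHEMES table is a hand-written stand-in for what a registry might provide; it is not an actual FAIRsharing or MIRIAM API. The DOI pattern is simplified, the "EXAMPLE-ID" scheme is invented, and the request is only built, never sent.

```python
import re
import urllib.request

# Hypothetical registry entries; a real registry would supply these.
SCHEMES = {
    "DOI": {
        "pattern": re.compile(r"^10\.\d{4,9}/\S+$"),  # simplified DOI shape
        "resolver": "https://doi.org/{id}",
        "accept": "application/json",
    },
    "EXAMPLE-ID": {  # invented scheme, for contrast only
        "pattern": re.compile(r"^EX-\d+$"),
        "resolver": "https://example.org/id/{id}",
        "accept": "application/json",
    },
}

def identify(guid):
    """Return the names of all schemes whose pattern matches the GUID."""
    return [name for name, s in SCHEMES.items() if s["pattern"].match(guid)]

def build_request(guid, scheme_name):
    """Build (but do not send) an HTTP GET request for resolving the GUID."""
    scheme = SCHEMES[scheme_name]
    url = scheme["resolver"].format(id=guid)
    return urllib.request.Request(url, headers={"Accept": scheme["accept"]})

guid = "10.1234/example.5678"  # made-up, DOI-shaped string
matches = identify(guid)
request = build_request(guid, matches[0])
print(matches, request.full_url, request.get_header("Accept"))
```

What this cannot express is the last point: a pattern match says nothing about whether the scheme is a globally accepted GUID standard, which is the extra metadata a registry would need to carry.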
I hope those comments are helpful! Sorry, at the moment I don't have
time to go into the GitHub and deposit these ideas there!
Cheers all!
Mark