Hi Rolf,
thanks for your thoughts. Of course, we can interpret the XSD as a (too strict) implementation of the standard; the regex may also be an artifact of older versions. I think it is definitely a good idea to file an issue in DataCite's GitHub repository, and I will do so right away. However, since we do not ‘force’ anyone to validate datacite.xml, this should be no showstopper. It’s just not optimal. Regarding the alternate identifier, we can of course also skip recommending any identifier type and let the bag consumer assign an identifier if the bagged object has none.
Regards,
Thomas
—
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)
Dipl. Ing. Thomas Jejkal
Hermann-von-Helmholtz-Platz 1
76344 Eggenstein-Leopoldshafen, Germany
Phone: +49 721 608-24042
E-mail: ***@***.***
Web: http://www.scc.kit.edu
ORCID: http://orcid.org/0000-0003-2804-688X
Registered office: Kaiserstraße 12, 76133 Karlsruhe, Germany
KIT – The Research University in the Helmholtz Association
On 24.01.18, 10:33, “rolf.krahl=***@***.***-groups.org on behalf of rolf.krahl” wrote:
Hi Thomas,
On Wednesday, 24 January 2018, 08:10:41, TJejkal wrote:
>
> David figured out that the SWORD people are also referring to our
> recommendations with the difference that they skip the requirement
> of datacite.xml. After a short discussion in their profile working
> document [1] it seems that the main reason is the necessity of a
> DOI. Obviously, using machine-recognizable codes as stated under
> ‘Guidance for handling missing mandatory property values’ in [2]
> applies to all mandatory properties but the identifier as the schema
> defined a fixed regular expression with the value 10\..+/.+ for this
> element. Thus, datacite documents using placeholders for the
> identifier won’t validate against the schema.
I would say, the authoritative source for the standard is the written
document. The XML Schema Definition file is (or should be) merely an
implementation of this standard. The wording in the standard
document is clear; to quote p. 10:
| 2.3 DataCite Properties
|
| Table 3 provides a detailed description of the mandatory properties,
| which must be supplied with any initial metadata submission to
| DataCite, together with their sub‐properties. If one of the
| required properties is unavailable, please use one of the standard
| (machine‐recognizable) codes listed in Appendix 3, Table 11.
I.e. the standard values for unknown information in Appendix 3, Table
11 are allowed to be used for the mandatory properties listed in Table
3, which includes the Identifier property. It follows that the
regular expression in the XML Schema Definition file is a bug.
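To illustrate the conflict, here is a minimal sketch in Python. It assumes the schema pattern is `10\..+/.+` and uses `(:unav)` and `(:tba)` as examples of the standard codes; both the example DOI and the choice of codes are my assumptions and should be checked against Appendix 3, Table 11.

```python
import re

# Assumed pattern for the identifier element in the DataCite XSD.
# XSD patterns must match the whole value, hence fullmatch() below.
DOI_PATTERN = re.compile(r"10\..+/.+")


def matches_doi_pattern(value: str) -> bool:
    """Return True if the whole value matches the assumed XSD pattern."""
    return DOI_PATTERN.fullmatch(value) is not None


# A made-up DOI passes, the placeholder codes do not.
for candidate in ("10.1234/example", "(:unav)", "(:tba)"):
    print(candidate, matches_doi_pattern(candidate))
```

So any datacite.xml that follows the written guidance for a missing identifier would be rejected by this pattern.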
I would favor:
1. For the time being, we keep the requirement that packages must
contain a datacite.xml file and also that the content of this file
must be valid according to the DataCite standard.
2. We add a note that if the digital object in the package does
not have a DOI or if the DOI is not known, one of the standard
values for unknown information (Appendix 3 of DataCite) MUST be
used in the Identifier property. (I.e. we state explicitly that,
according to our interpretation, DataCite does not imply the
necessity of a DOI.)
We add that an AlternateIdentifier SHOULD be used if a DOI is not
provided. (But I would not require any particular type. DataCite
requires the alternateIdentifierType sub-property to be used with
AlternateIdentifier, but specifies the allowed value only as free
text. We should not go further than that here.)
3. We add a note that if one of the standard values for unknown
information is used for the Identifier property, the datacite.xml
will not validate against the DataCite XML Schema Definition. We add
that we consider this a bug in the XSD and that this fact does not
imply invalidity of the provided metadata. We add that the
receiver of a package MUST NOT reject the package based on a failed
XML Schema validation, if this failure is only due to an unknown
DOI.
4. We contact the DataCite people and ask them to fix the bug in their
XML Schema Definition.
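The rule in item 3 could be sketched on the receiver side roughly as follows. This is only an illustration: the function name, the idea of passing in the names of the elements that failed schema validation, and the set of codes (which should be checked against Appendix 3, Table 11) are all assumptions of mine, not part of any existing tool.

```python
# Assumed set of standard codes for unknown information (DataCite Appendix 3).
UNKNOWN_CODES = {
    "(:unac)", "(:unal)", "(:unap)", "(:unas)", "(:unav)",
    "(:unkn)", "(:none)", "(:null)", "(:tba)", "(:etal)",
}


def may_reject(identifier: str, failing_elements: list[str]) -> bool:
    """Return True if the package may be rejected on schema grounds.

    identifier: text content of the Identifier property.
    failing_elements: hypothetical list of element names for which
        XML Schema validation reported an error.
    """
    uses_placeholder = identifier.strip() in UNKNOWN_CODES
    # Ignore the identifier failure when it is caused by a placeholder code.
    remaining = [
        name for name in failing_elements
        if not (name == "identifier" and uses_placeholder)
    ]
    return bool(remaining)
```

With this, a package whose only validation error is a placeholder identifier is accepted, while any other schema error still counts.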
—
Rolf Krahl
Helmholtz-Zentrum Berlin für Materialien und Energie (HZB)
Albert-Einstein-Str. 15, 12489 Berlin
Tel.: +49 30 8062 12122