Fwd: [rda-agrdatainterop-ig] Advice on new controlled vocabulary

21 Feb 2018

Dear agrisemantics members,
I am forwarding this most interesting discussion, which started on the IGAD mailing list. Apologies for cross-posting.
Sophie Aubin - Inra - DIST
-------- Original message --------
From: ElizabethArnaud <***@***.***>
Date: 21/02/2018 12:59 (GMT+01:00)
To: RichardOstler <***@***.***>, ***@***.***, "simon.cox" <***@***.***>, "Agricultural Data Interest Group (IGAD)" <***@***.***-groups.org>
Cc: valeria pesce <***@***.***>, Jonquet <***@***.***>
Subject: Re: [rda-agrdatainterop-ig] Advice on new controlled vocabulary
Just a quick naïve remark:
Why not submit to the reference Unit Ontology (UO) all the missing concepts that are in the WUR Ontology of Unit of Measurement (UM)? UM terms would then get a UO ID. Additionally, the Unit Ontology is available on the OBO Foundry (http://www.obofoundry.org/ontology/uo.html), which is the reference source of ontologies used by OLS through the API, so the integrated concepts would automatically appear in OLS.
Elizabeth
From: <***@***.***-groups.org> on behalf of RichardOstler <***@***.***>
Date: Wednesday 21 February 2018 12:33
To: "***@***.***" <***@***.***>, "simon.cox" <***@***.***>, "Agricultural Data Interest Group (IGAD)" <***@***.***-groups.org>
Cc: "***@***.***" <***@***.***>, "***@***.***" <***@***.***>
Subject: Re: [rda-agrdatainterop-ig] Advice on new controlled vocabulary
Thanks for the tips.
So far I’ve mostly been using the Ontology Lookup Service (OLS) and AgroPortal, and have been selecting measurement concepts from the Unit Ontology (https://github.com/bio-ontology-research-group/unit-ontology) and the Crop Ontology. However, I have found a few gaps with older imperial measures (e.g. hundredweight) which the Wageningen unit ontology does include, so it’s a shame it isn’t listed/searchable on OLS and AgroPortal. I’ve also checked BioPortal and FAIRsharing and it isn’t listed on those either.
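For reference, the arithmetic behind a unit such as the hundredweight can be sketched as below. This assumes the imperial (long) hundredweight of 112 lb is the one meant, the US (short) hundredweight being 100 lb; the function name `cwt_to_kg` is illustrative only and does not come from any of the ontologies mentioned.

```python
from decimal import Decimal

# Exact definition of the international avoirdupois pound in kilograms.
KG_PER_POUND = Decimal("0.45359237")

# Pounds per hundredweight: the imperial (long) and US (short) values differ,
# so a record must say which one it uses before it can be converted.
POUNDS_PER_CWT = {
    "imperial": Decimal(112),
    "us": Decimal(100),
}


def cwt_to_kg(value, system="imperial"):
    """Convert a mass in hundredweight to kilograms (illustrative helper)."""
    return Decimal(str(value)) * POUNDS_PER_CWT[system] * KG_PER_POUND


print(cwt_to_kg(1))        # one imperial hundredweight, in kg
print(cwt_to_kg(1, "us"))  # one US hundredweight, in kg
```

Which definition applies to a given historical record is something the data owner would have to confirm; the sketch only shows why the unit needs an unambiguous identifier.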
Thanks
Richard
From: ***@***.*** [mailto:***@***.***]
Sent: 21 February 2018 08:20
To: simon.cox <***@***.***>
Cc: valeria pesce <***@***.***>; Richard Ostler <***@***.***>; ***@***.***; ***@***.***-groups.org
Subject: Re: [rda-agrdatainterop-ig] Advice on new controlled vocabulary
You may also be interested in ontologies for units of measurement, like http://www.wurvoc.org/vocabularies/om-1.6/ .
Sent via Webmail interface
________________________________
From: "simon.cox" <***@***.***>
To: "valeria pesce" <***@***.***>, "richard ostler" <***@***.***>, ***@***.***, ***@***.***-groups.org
Sent: Wednesday, 21 February, 2018 4:07:17 AM
Subject: Re: [rda-agrdatainterop-ig] Advice on new controlled vocabulary
Also consider the w3c SSN ontology for sampling and observations.
w3.org/TR/vocab-ssn
From: valeria.pesce=***@***.***-groups.org <***@***.***-groups.org> on behalf of valeriapesce <***@***.***>
Sent: Tuesday, 20 February 2018 11:50:40 AM
To: RichardOstler; Clement Jonquet; Agricultural Data Interest Group (IGAD)
Subject: Re: [rda-agrdatainterop-ig] Advice on new controlled vocabulary
Dear Richard,
Interesting task.
I would second Armando’s recommendation that for your use case you might need an ontology / model / schema for agricultural experiments rather than a thesaurus (you can have a thesaurus for the controlled values of specific properties).
It’s also true that I’ve seen cases in which people use SKOS to create a list of properties that they need to describe something, but I don’t think it’s the best use of SKOS.
And I would also highlight what Armando says about distinguishing between vocabularies and datasets: you will probably need vocabularies for the experiment description structure (a schema or an ontology) and for controlled values, but the data that you expect will be contributed would be the actual instances of the experiments data, which would ideally go into a dataset, with records structured according to the vocabularies you designed.
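The distinction between vocabularies and datasets can be sketched roughly as follows. Everything here is hypothetical (the property names, the controlled values and the records); it only illustrates a schema plus controlled values on one side, and dataset records checked against them on the other.

```python
# Hypothetical schema: the properties every experiment record must carry.
SCHEMA = {"experiment_id", "name", "experiment_type"}

# Hypothetical controlled values (the "thesaurus" part) for one property.
CONTROLLED_VALUES = {
    "experiment_type": {"rotation", "continuous cropping", "grassland"},
}


def validate(record):
    """Return a list of problems found in one dataset record."""
    missing = sorted(SCHEMA - set(record))
    problems = [f"missing property: {prop}" for prop in missing]
    for prop, allowed in CONTROLLED_VALUES.items():
        if prop in record and record[prop] not in allowed:
            problems.append(f"{prop}: '{record[prop]}' is not a controlled value")
    return problems


# The dataset: actual instances, structured according to the vocabularies above.
dataset = [
    {"experiment_id": "exp-001", "name": "Experiment X", "experiment_type": "rotation"},
    {"experiment_id": "exp-002", "name": "Experiment Y", "experiment_type": "orchard"},
]

for record in dataset:
    print(record["experiment_id"], validate(record))
```

The vocabularies change rarely and are curated centrally, whereas records like those in `dataset` are what contributing institutes would supply.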
The only suggestion I would add is to look at what has already been done in a few projects (which you may already know):
- The ICASA data standards for field experiments: https://dssat.net/data/standards_v2. I think the initial standard was a specification with possible serializations in CSV and XML. It provides a model and a list of variables. From what I understand there were plans to render ICASA in RDF in the TERRA-REF project (http://terraref.org/about/) and the ICASA data dictionary is also being mapped to various ontologies as part of the Agronomy Ontology project (http://www.obofoundry.org/ontology/agro.html).
- Projects on crop modelling and sharing experiment data (no RDF yet, but useful models to base your RDF upon): AgMIP (in which they’re reusing the ICASA variables: http://research.agmip.org/display/dev/ICASA+Master+Variable+List), APSIM: http://www.apsim.info/, CGIAR AgTrials (http://www.agtrials.org/).
- Useful RDF vocabularies to build upon or reuse: the Crop Research Ontology: http://agroportal.lirmm.fr/ontologies/CO_715; AgroRDF: http://data.igreen-services.com/agrordf.
Hope this helps a little.
Best regards,
Valeria
From: richard.ostler=***@***.***-groups.org [mailto:***@***.***-groups.org] On Behalf Of RichardOstler
Sent: Tuesday, February 20, 2018 12:45 PM
To: Clement Jonquet <***@***.***>; Agricultural Data Interest Group (IGAD) <***@***.***-groups.org>
Subject: Re: [rda-agrdatainterop-ig] Advice on new controlled vocabulary
Hi Clement,
Thanks for your reply. I think more background would be useful for explaining my motivation.
At Rothamsted we have several agriculture long-term experiments (e.g. Broadbalk winter wheat, Park grass, Hoosfield spring barley and more). Globally, with other agricultural research institutes, we’re starting to identify other long-term experiments and develop a better network for utilising them – there is growing interest in meta-analyses across diverse LTEs for questions on yield sustainability/sustainable intensification and natural capital.
From our initial work it is apparent that there are no metadata standards or recommendations in wide use in this community.
Some defining characteristics of agriculture long-term experiments are:
1. They run for many years (decades) and an experimental plot will receive the same treatment regime over that time
2. The experiments and their plots have names or identifiers which persist over time, however they can be known by different names in the literature.
3. The experiment design can be modified (e.g. plots split or renamed, treatments modified); this is a particular problem for older experiments with less robust/modern statistical designs.
4. Experiments can be used for research beyond their original purpose and new, but related, datasets generated. For example, our primary datasets are yield, but additional datasets on soils, biodiversity and phenotypic traits have been generated. These can all be linked to an experiment plot but currently there isn’t a recommendation on persistent identifiers for doing this.
For point 2 above, a gazetteer-like controlled vocabulary of long-term experiment, field and plot names could provide a resource for consistently naming these things and semantically tagging resources that reference them. It could also capture alternative names and basic relationships between different classes, e.g. Plot A is part of Experiment X. The main classes would be site/field, experiment and plot. Potentially you could expand this into an ontology to capture other relationships, for example that Experiment X is a Rotation Experiment.
In fact I think the experiments, plots and fields should be assigned persistent identifiers and have relevant metadata captured, e.g. location, geography and climate for the site; area, soil characteristics and treatment group for a plot; treatments, standard management and cropping for the experiment.
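A minimal sketch of such a gazetteer is given below, assuming a made-up identifier scheme (`lte:...`) and made-up alternative names. It shows the two things described above: resolving any known name to one persistent identifier, and following "part of" links from plot to experiment to site.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    pid: str                    # persistent identifier (hypothetical scheme)
    kind: str                   # "site", "experiment" or "plot"
    preferred_name: str
    alternative_names: List[str] = field(default_factory=list)
    part_of: Optional[str] = None  # pid of the parent entry, if any


# Hypothetical entries; names and identifiers are for illustration only.
GAZETTEER = [
    Entry("lte:site-1", "site", "Field F"),
    Entry("lte:exp-x", "experiment", "Experiment X", ["Expt X"], part_of="lte:site-1"),
    Entry("lte:plot-a", "plot", "Plot A", ["Plot 1"], part_of="lte:exp-x"),
]

BY_PID = {entry.pid: entry for entry in GAZETTEER}

# Case-insensitive index from every preferred or alternative name to its entry.
INDEX = {
    name.casefold(): entry
    for entry in GAZETTEER
    for name in [entry.preferred_name, *entry.alternative_names]
}


def resolve(name):
    """Return the persistent identifier for a name, or None if unknown."""
    entry = INDEX.get(name.strip().casefold())
    return entry.pid if entry else None


def ancestors(pid):
    """Follow part_of links upwards and return the chain of parent pids."""
    chain = []
    parent = BY_PID[pid].part_of
    while parent is not None:
        chain.append(parent)
        parent = BY_PID[parent].part_of
    return chain


print(resolve("expt x"))
print(ancestors("lte:plot-a"))
```

In practice plot names are unlikely to be unique across experiments, so a real index would probably need to scope names by experiment; the flat index here is a simplification.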
I’ve looked closely at the DEIMS-SDR metadata models and MIAPPE and I don’t think either is quite sufficient to represent metadata for agricultural LTE datasets. I like the MI checklist approach and recommended ontologies taken by MIAPPE and I think existing ontologies can be used for much of the checklist detail – I think what is needed is an MI checklist relevant to agriculture LTEs.
Thanks
Richard
From: Clement Jonquet [mailto:***@***.***] On Behalf Of Clement Jonquet
Sent: 20 February 2018 06:13
To: Richard Ostler <***@***.***>
Cc: Agricultural Data Interest Group (IGAD) <***@***.***-groups.org>
Subject: Re: [rda-agrdatainterop-ig] Advice on new controlled vocabulary
Dear Richard,
All
I will not get into the SKOS or OWL debate. Depending on what you want to model, SKOS is a good start, but ultimately you might need to move to OWL for more expressivity.
Could you tell us more about this:
"... but experiment names, fields and plots are not represented by any controlled vocabulary."
Can you list exactly the terms/concepts that you are interested in modeling? Is the specificity of these experiments only the fact that they are long-term?
I ask because I have the feeling you can certainly attach this to other existing vocabularies out there…
You might consider using the AgroPortal Recommender, which could help identify which ontologies have good coverage of a given list of terms.
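To illustrate what "coverage over a list of terms" means, here is a small sketch. It is not the Recommender's actual algorithm or API; the vocabulary names and label lists are invented, and the score is simply the fraction of input terms that match a label exactly (ignoring case).

```python
def coverage(terms, ontology_labels):
    """Rank vocabularies by the fraction of terms they contain as labels."""
    wanted = {term.casefold() for term in terms}
    if not wanted:
        return []
    scores = {}
    for name, labels in ontology_labels.items():
        known = {label.casefold() for label in labels}
        scores[name] = len(wanted & known) / len(wanted)
    # Highest coverage first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical input terms and hypothetical vocabularies.
terms = ["plot", "field", "experiment", "treatment"]
ontology_labels = {
    "vocab-a": ["Plot", "Treatment", "Yield"],
    "vocab-b": ["Field", "Experiment", "Plot", "Soil"],
}

for name, score in coverage(terms, ontology_labels):
    print(f"{name}: {score:.2f}")
```

A real recommender would also need to handle synonyms and partial matches, which exact string matching does not.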
In case you decide to create your own new URIs (because you really have to say something specific about them), we can help you map your new vocabulary to existing ones in AgroPortal (we now have someone in the team, Elcio, who recently joined us to support users in this process). And of course, we can host it in AgroPortal.
Will be in Berlin too, to discuss.
Clement
-------------------------------------------------------------------------------------------
Dr. Clement JONQUET - PhD in Informatics - Assistant Professor
University of Montpellier
http://www.lirmm.fr/~jonquet
------------------------------------------------------------------------------------------
On 16 Feb 2018, at 01:20, RichardOstler <***@***.***> wrote:
Hi all,
I'm involved in a project to establish a global network for long-term agricultural experiments (experiments running for >10 years). One aim of the project is to improve access to and interoperability of datasets from these experiments. In most cases I think existing ontologies/vocabularies can be used, but experiment names, fields and plots are not represented by any controlled vocabulary. Clearly, creating a concept ID for a name means an experiment, field or plot can be unambiguously referenced, e.g. in DOI metadata.
I think SKOS is an appropriate model since it can capture:
* Preferred and alternative names for experiments, experiment sites/fields and plots
* Relations between a field/site and the experiments conducted there
* Relations between experiment plots and an experiment and field, and relations between plots (where they've merged or been split into sub-plots).
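As a rough sketch of what such SKOS concepts might look like, the snippet below prints Turtle for a plot and its experiment. The `lte:` namespace, the identifiers and the labels are hypothetical, and using `skos:broader` for "plot is part of experiment" and `skos:related` for other links is a modelling assumption rather than an agreed convention.

```python
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix lte: <http://example.org/lte/> .\n"
)


def literal(text):
    """Quote a label as an English-tagged Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def concept(local_id, pref_label, alt_labels=(), broader=None, related=()):
    """Render one skos:Concept as a Turtle block (hypothetical lte: namespace)."""
    lines = [f"lte:{local_id} a skos:Concept"]
    lines.append(f"    skos:prefLabel {literal(pref_label)}")
    for label in alt_labels:
        lines.append(f"    skos:altLabel {literal(label)}")
    if broader:
        # Assumption: part-of is expressed with skos:broader.
        lines.append(f"    skos:broader lte:{broader}")
    for other in related:
        lines.append(f"    skos:related lte:{other}")
    return " ;\n".join(lines) + " .\n"


print(PREFIXES)
print(concept("exp-x", "Experiment X", ["Expt X"], related=["field-f"]))
print(concept("plot-a", "Plot A", ["Plot 1"], broader="exp-x"))
```

Relations such as a plot being split or merged have no dedicated SKOS property, so they would need either `skos:related` or a custom property.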
I would appreciate this group's thoughts on the following questions:
1. Is it better to create a new vocabulary specific for long-term agricultural experiments or add to an existing agriculture domain vocabulary (e.g. Agrovoc, GACS)?
2. If we build a new vocabulary where is the best place to host it? Agroportal?
3. If successful, the vocabulary would have information contributed by many institutes. Any advice on collecting and curating this data?
I will be at the Berlin meeting so happy to discuss this and the network further there.
thanks
Richard
--
Full post: https://www.rd-alliance.org/group/agricultural-data-interest-group-igad/...
Manage my subscriptions: https://www.rd-alliance.org/mailinglist
Stop emails for this post: https://www.rd-alliance.org/mailinglist/unsubscribe/58922
Rothamsted Research is a company limited by guarantee, registered in England at Harpenden, Hertfordshire, AL5 2JQ under the registration number 2393175 and a not for profit charity number 802038.