
Minutes from yesterday’s call

June 1, 2016 at 1:54 pm #122075

Tobias Weigel
Member

Dear all,
here are the merged minutes from yesterday’s group call. The next call
will take place in 2 weeks at the usual timeslot: Tuesday June 14, 13:00
GMT.
Best, Tobias
Attendees: Frederik, Ulrich, Christopher, Tobias
Notes:
* Formal definitions – set theory:
  o potentially use ADT as intermediate step between model and
    implementation
  o Look into Haskell docs, use that to get to a small core def for
    sorted/unsorted, multimembership/unique
* Models are essentially a set of attributes and rules about them
  (plus operations)
  o Traits are then useful because they group things together and
    make it simpler to explain what a certain formal concept (like
    sorting) implies in terms of practically useful properties and
    methods
  o The clone operation (i.e. cache a snapshot of elements in
    recursive collections) is useful because it may solve the
    complexity issues that arise once we have recursion
  o for citation use case: the copy/clone method (i.e. making
    snapshots) may solve the issue that collections whose deep
    membership changes may not remain citable
    + citation use case is present in many communities, and we
      need to cover their differences. Snapshotting may not be
      possible for everyone.
    + but we may figure out a way to compress the snapshotting
      process so we can reconstruct a snapshot on request.
  o for very dynamic data we may want to state that there are
    policies that must be observed (basic versioning)
    + Christopher Harrison: use case with high volume and lots of
      change every day – not possible to statically snapshot it –
      need a UC description and see where the gaps are wrt the
      formal model we end up with
* Collection API – what about a member API? Such an API would answer
  e.g. “which collection(s) does this object belong to?” – different
  from scope of current swagger API
  o Might be solvable through PIT + registered property, but not
    sure whether this is the only action of that API
  o Close to a global search, so costly/difficult – might not be
    implementable by all use cases, but perhaps relevant across
    multiple RDA use cases
  o Ulrich: nice if every member of a collection has a pinned PID to
    its parent; but only feasible for static collections inside a
    repository
    + across repositories, you will need a crawler that then
      creates a graph; the query however then deviates to “which
      subcollection(s) of a given (entry point) collection does
      this object belong to?” which actually will lead to a two
      parameter function
      SubcollectionsMemberBelongsTo(member,collection) – we can
      include this in the hierarchy trait
    + actually, the pointer to parents is a collection in itself
  o Christopher: Subscription model could help with dynamic data as well
  o Frederik: RSS or blog pingback might provide useful ideas and
    infrastructure
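
Editorial illustration: the two-parameter query named in the notes can be sketched as a walk over the collection graph. This is only a minimal Python sketch, assuming an in-memory mapping from collection identifiers to their direct members. The function name follows the notes; the `graph` structure and all identifiers are hypothetical. Across repositories, the graph would have to be built by a crawler, as noted above.

```python
from typing import Dict, Set


def subcollections_member_belongs_to(
    member: str, collection: str, graph: Dict[str, Set[str]]
) -> Set[str]:
    """Return every collection reachable from `collection` (inclusive)
    that lists `member` as a direct member."""
    found: Set[str] = set()
    visited: Set[str] = set()
    stack = [collection]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # guard against cycles in recursive collections
        visited.add(current)
        members = graph.get(current, set())
        if member in members:
            found.add(current)
        # Assumption: only identifiers that are keys of `graph` are collections.
        stack.extend(m for m in members if m in graph)
    return found


# Hypothetical identifiers, for illustration only.
graph = {
    "coll:root": {"coll:a", "coll:b", "obj:1"},
    "coll:a": {"obj:2", "coll:c"},
    "coll:b": {"obj:2"},
    "coll:c": {"obj:3", "coll:a"},  # recursion: points back to coll:a
}
result = subcollections_member_belongs_to("obj:2", "coll:root", graph)
```

The `visited` set is what keeps the walk finite once collections contain each other, which is the recursion problem the clone/snapshot discussion is also concerned with.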
—
Tobias Weigel
Abteilung Datenmanagement
Deutsches Klimarechenzentrum GmbH (DKRZ)
Bundesstraße 45 a • 20146 Hamburg • Germany
Phone: +49 40 460094-104
Email: ***@***.***
URL: http://www.dkrz.de
ORCID: orcid.org/0000-0002-4040-0215
Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784
June 3, 2016 at 1:42 pm #132970

Thomas Zastrow
Member

Dear all,
… as I wasn’t able to participate in the last videomeeting and because
we want some discussion on the mailing list … and weekend is coming …
I have the feeling we try to say too much about the individual items
inside a collection. From my perspective, *anything* which has an address
can be an item inside of a collection. But that means it is difficult to
say anything about the item itself from the collection’s perspective. And
it is not necessary: there is the PIT API or similar approaches.
a)
It is not necessary that an item has the capability of storing its
parent’s information, and the creator of a collection may not even have
write permissions on the items he/she is adding to it. I also don’t know
any programming language where you can ask an object: “Tell me which
collections you belong to”. Even in the closed namespace / memory area
of an application this is not realized, so how should that work on a
higher / global level?
b)
I’m not sure if we really should care about a differentiation between
dynamic / static collections wrt the items a collection contains. A
collection itself can be dynamic or static – I agree. But we can’t say
*anything* about the items inside of a collection. Maybe there are items
which are defined as a rule like “Give me the last measurement”. I don’t
see a way how such an item could tell the collection “Today I’m giving
back different results than yesterday”. It could also be that such an item
doesn’t exist anymore – we don’t have a “wayback” channel and yes, I’m
not talking only about PIDs here. If static just means that the
collection cannot be changed anymore – that’s fine.
c)
I’m not sure if I understood the “trait” thing (and I never heard about
something like that before): is it about collecting properties/functions
into a functional group? Do we really need this level of abstraction?
Best,
Tom
June 6, 2016 at 9:20 am #132965

Tobias Weigel
Member

Hello Tom,
I concur that we may allow anything that has an address to be inside a
collection, however, there may be an important side condition: That
either the agent adding the item to a collection or the one responsible
for managing the collection can make a realistic claim about the item’s
current life cycle state and expected development. There is not a clear
distinction here, which is probably the cause for many of our problems.
Do we want to be totally arbitrary regarding the items? Or should the
benefit of using a collection API rather be that you *can* assume that
some essential information about item status will be available? I don’t
think we have a clear take on this yet.
On your item a) – parts knowing what they are part of – the best example
for such cases are trees, where you would be unable to traverse otherwise.
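
Editorial illustration of the tree case, as a minimal Python sketch: each node holds a reference to its parent, so the chain of enclosing nodes can be recovered by walking upward. The `Node` class and its method names are hypothetical and not taken from any collection API draft.

```python
from typing import List, Optional


class Node:
    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent  # the "part knows what it is part of" link
        self.children: List["Node"] = []
        if parent is not None:
            parent.children.append(self)

    def ancestors(self) -> List[str]:
        """Names of all enclosing nodes, nearest first."""
        names = []
        node = self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


root = Node("root")
branch = Node("branch", parent=root)
leaf = Node("leaf", parent=branch)
path = leaf.ancestors()
```

This only works because whoever creates a node can also write to it, which is exactly the permission that may be missing for items added to a collection by a third party.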
On b) – I think you are right and there may be a line here that we do
not want to cross wrt the “backchannel” from item to collection you
explained. A collection should specify whether its constituency is
dynamic or static, but it is probably too difficult to answer this by
redirecting to individual items and leave the answer up to them.
On the traits: It is in principle as you describe, gathering properties
and methods into flexible “chunks” that can be recombined and their
recombination may give rise to other special methods. I like the model
because it at least circumvents some of the issues with multiple
inheritance. I am currently sticking with it because it is very
flexible, but I am also not sure if this will be reflected in the API at
the end. Traits-based programming [1] is probably not the most
accessible paradigm and I’m not completely sure if this is the right
description for what’s currently in the document.
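
Editorial illustration of the idea, as a minimal sketch using Python mixin classes, which only approximate traits in the sense of [1]. All class and method names here are hypothetical and are not taken from the working group document.

```python
class Collection:
    """Base model: an unordered bag that admits repeated members."""

    def __init__(self):
        self.members = []

    def add(self, item):
        self.members.append(item)
        return True


class OrderedTrait:
    """Properties and methods implied by 'members have a position'."""

    def first(self):
        return self.members[0]

    def insert_at(self, index, item):
        self.members.insert(index, item)
        return True


class UniqueTrait:
    """Properties and methods implied by 'each member at most once'."""

    def add(self, item):
        if item in self.members:
            return False  # reject duplicates
        self.members.append(item)
        return True


class OrderedUniqueCollection(UniqueTrait, OrderedTrait, Collection):
    """Recombining both traits gives rise to a method neither has alone."""

    def insert_at(self, index, item):
        if item in self.members:
            return False  # keep the uniqueness rule for positional inserts
        return super().insert_at(index, item)

    def position_of(self, item):
        # Well defined only because members are both ordered and unique.
        return self.members.index(item)


c = OrderedUniqueCollection()
c.add("pid:1")
c.add("pid:2")
duplicate_accepted = c.add("pid:1")
c.insert_at(0, "pid:0")
```

Whether such groupings would surface in the API or stay a purely explanatory device is the open question raised above.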
Best, Tobias
[1] https://en.wikipedia.org/wiki/Trait_%28computer_programming%29
——– Original Message ——–
Subject: Re: [rda-collection-wg] Minutes from yesterday’s call
From: ThomasZastrow

To: TobiasWeigel , RDA Collections WG

Date: 03 Jun 2016, 15:42
June 6, 2016 at 1:45 pm #132955

Bridget Almas
Member

On 06/06/2016 05:20 AM, TobiasWeigel wrote:
> Hello Tom,
>
> I concur that we may allow anything that has an address to be inside a
> collection, however, there may be an important side condition: That
> either the agent adding the item to a collection or the one
> responsible for managing the collection can make a realistic claim
> about the item’s current life cycle state and expected development.
> There is not a clear distinction here, which is probably the cause for
> many of our problems. Do we want to be totally arbitrary regarding the
> items? Or should the benefit of using a collection API rather be that
> you *can* assume that some essential information about item status
> will be available? I don’t think we have a clear take on this yet.
>> The latter is what I have been assuming, and without it, it would be
>> hard for me to justify the value of the collections API to our use cases.
>
> On your item a) – parts knowing what they are part of – the best example
> for such cases are trees, where you would be unable to traverse
> otherwise.
>
> On b) – I think you are right and there may be a line here that we do
> not want to cross wrt the “backchannel” from item to collection you
> explained. A collection should specify whether its constituency is
> dynamic or static, but it is probably too difficult to answer this by
> redirecting to individual items and leave the answer up to them.
>
> On the traits: It is in principle as you describe, gathering
> properties and methods into flexible “chunks” that can be recombined
> and their recombination may give rise to other special methods. I like
> the model because it at least circumvents some of the issues with
> multiple inheritance. I am currently sticking with it because it is
> very flexible, but I am also not sure if this will be reflected in the
> API at the end. Traits-based programming [1] is probably not the most
> accessible paradigm and I’m not completely sure if this is the right
> description for what’s currently in the document.
>
>> I think the traits are what I had previously been thinking of as
June 6, 2016 at 3:47 pm #132951

Tobias Weigel
Member

Hello Bridget,
well said – the ability to retrieve essential item status and membership
information through the collection API is a deciding feature for many
use case providers, including yours and also ours, actually.
Best, Tobias
——– Original Message ——–
Subject: Re: [rda-collection-wg] Minutes from yesterday’s call
From: balmas
To: ***@***.***-groups.org
Date: 06 Jun 2016, 15:45