
RDA Case Statement Data Foundation and Terminology (DFT)

May 29, 2013 at 9:40 am

RDA Admin

v16

Work Group (WG) Charter

The task of the DFT WG is to describe a basic, abstract data organization model from which a reference data terminology can be derived. This terminology can be used:

• across communities and stakeholders to better synchronize conceptualization,

• to enable better understanding within and between communities, and finally

• to stimulate tool building, such as data services, that supports use of the basic model.

We assume that this abstract data organization model will focus on common building blocks and their characteristics, along with relevant protocols. It will not specify the technologies which may be used for subsequent, rapid innovations based on the model.

The WG will

• write a reference document about DFT,

• create an accompanying abstract data organization model that may also be expressed graphically,

• register the defined terms in an ISO-like concept registry so that everyone can easily refer to them, and

• support the above by seeking to engage many communities and stakeholders in the creation of the document, terminology and model, as well as by establishing contact with relevant existing communities such as W3C and the library community.

This phase of work will be completed in about 15 months. However, ongoing work will be needed to adapt and sharpen the concept definitions in response to continuing DFT discussions, to interactions with new communities interested in this harmonization work, and to any resulting technical innovations.

Value Proposition

Research data communities gain substantial and general value from access to more standardized data vocabularies, in which the same term carries the same definition. This helps ensure that conversations are meaningful and that people are not talking past each other. Shared vocabularies establish common ground for interactions such as the adoption of common data sharing practices and interoperation, and they help avoid duplication of effort. Proper data organization will be enabled by agreeing on a number of basic concepts and their relationships, as well as by explicitly defining and registering appropriate terms along with alternative views of them.
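To make the idea of registered terms with alternative views more concrete, here is a minimal sketch, not a proposed design for the DFT registry. It assumes a hypothetical in-memory `ConceptRegistry` in which each concept has one identifier and one agreed definition, while community-specific alternative labels all resolve to that same concept. The class names, the `dft:` identifier prefix and the example labels are illustrative assumptions only.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One registered concept: a single agreed definition plus alternative views."""
    concept_id: str          # hypothetical identifier scheme, e.g. "dft:0001"
    preferred_term: str
    definition: str
    alt_terms: dict[str, str] = field(default_factory=dict)  # community -> its label


class ConceptRegistry:
    """Hypothetical in-memory registry; a real ISO-like registry needs far more."""

    def __init__(self) -> None:
        self._by_id: dict[str, Concept] = {}
        self._by_label: dict[str, str] = {}  # lower-cased label -> concept_id

    def register(self, concept: Concept) -> None:
        if concept.concept_id in self._by_id:
            raise ValueError(f"duplicate concept id: {concept.concept_id}")
        labels = [concept.preferred_term, *concept.alt_terms.values()]
        # Check every label before inserting anything, so a failed call changes nothing.
        for label in labels:
            bound = self._by_label.get(label.lower())
            if bound is not None and bound != concept.concept_id:
                raise ValueError(f"label {label!r} is already bound to {bound}")
        self._by_id[concept.concept_id] = concept
        for label in labels:
            self._by_label[label.lower()] = concept.concept_id

    def lookup(self, label: str) -> Concept:
        """Resolve any registered label (preferred or alternative) to its one concept."""
        return self._by_id[self._by_label[label.lower()]]


# Illustrative usage; the definition text is a placeholder, not an agreed DFT definition.
registry = ConceptRegistry()
registry.register(Concept(
    concept_id="dft:0001",
    preferred_term="digital object",
    definition="(placeholder definition to be agreed by the WG)",
    alt_terms={"community-A": "data object"},
))
print(registry.lookup("Data Object").preferred_term)  # digital object
```

The registry refuses to bind one label to two different concepts, which is one simple way of enforcing "the same definition for the same term". An actual concept registry would also need versioning, provenance and governed change procedures.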

Taken together, common vocabularies raise awareness of basic considerations and will allow the data community to improve mutual understanding, thus enabling easier data sharing and interoperability. In doing so, they can play a role similar to that of the development of the Internet protocol many years ago, which established a standard language of networking that in turn:

• sharpened global understanding of the need for systematic relations among the various protocol layers, and

• enabled basic protocol notions to be realized in concrete protocols such as IP and TCP.

We expect that systematizing the already large body of definition work on data management terms will foster a common understanding of data organizations and related efforts. This will in turn help the RDA community to identify common building blocks, describe their properties and define the data process protocols related to them. Relevant experience comes, for example, from the cross-disciplinary data infrastructure project EUDAT, which found that iteratively applying a proper conceptualization to develop a joint reference terminology accelerated subsequent discussions that led towards solutions.

As was the case with the Internet and its protocols, we assume that converging on a simple and clear basic model will eventually stimulate new types of services layered on top of it and convince software developers worldwide to add modules that make use of the emerging building blocks. We need to establish trust among all stakeholders involved in data-related activities that we can agree across communities on a few essentials.

When we speak about “data” in this document, we include not only “static data” but also “dynamic data”, where data objects are encapsulated by methods that must be invoked to yield the targeted data. Although this is not the group's first priority, we also need to look at procedures, which are just another type of data object.
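The distinction between static and dynamic data can be illustrated with a small sketch in which both kinds of object share one access interface, so that a client need not know whether the data are stored or computed on request. This is an illustration under stated assumptions, not part of the DFT model: the `DataObject` classes, the `get_data` method, the `pid:` strings and the `daily_mean` procedure are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class DataObject(ABC):
    """Hypothetical common interface: clients request data the same way for every object."""

    def __init__(self, pid: str) -> None:
        self.pid = pid  # illustrative identifier string, not a real PID

    @abstractmethod
    def get_data(self) -> bytes:
        """Return the data this object stands for."""


class StaticDataObject(DataObject):
    """Static data: the stored bit sequence is returned unchanged."""

    def __init__(self, pid: str, content: bytes) -> None:
        super().__init__(pid)
        self._content = content

    def get_data(self) -> bytes:
        return self._content


class DynamicDataObject(DataObject):
    """Dynamic data: an encapsulated method is invoked to yield the targeted data."""

    def __init__(self, pid: str, method: Callable[..., bytes], **params: Any) -> None:
        super().__init__(pid)
        self._method = method
        self._params = params

    def get_data(self) -> bytes:
        # The data only come into being when the method runs with its stored parameters.
        return self._method(**self._params)


def daily_mean(values: list[float]) -> bytes:
    """Hypothetical procedure; it could itself be registered as a data object."""
    return f"{sum(values) / len(values):.2f}".encode()


static = StaticDataObject("pid:static-1", b"1.0,2.0,3.0")
dynamic = DynamicDataObject("pid:dynamic-1", daily_mean, values=[1.0, 2.0, 3.0])
for obj in (static, dynamic):
    print(obj.pid, obj.get_data())
# pid:static-1 b'1.0,2.0,3.0'
# pid:dynamic-1 b'2.00'
```

Keeping one interface for both kinds of object reflects the point that procedures and their outputs can be treated as data objects too; how a real model would identify, type and invoke such methods remains open.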

Engagement with Existing Work

Considerable work on data objects and their conceptualization has been done by various individuals and initiatives, such as those referenced below. In the first phase of the work, more research needs to be done to integrate these efforts and to identify other initiatives that may contribute to this discussion. The following list is therefore only indicative:

• Kahn and Wilensky wrote a paper describing a framework for distributed digital object services. The original version was written in the 1994-1995 time frame and was subsequently re-issued by Springer in 2006 (with an added preface by the authors explaining the history of the paper) as the lead article in a special issue on complex digital objects in the International Journal on Digital Libraries.

• Moore worked on collection properties and on architectures that modify state information.

• Wittenburg, Lautenschlager and Broeder analysed the data organizations of about 15 communities and arrived at some common abstractions.

• Lannom gave a talk about layers at the Copenhagen meeting.

• Kahn gave a talk at the EUDAT conference pointing to abstract building blocks.

• Other groups from the library and information science communities, such as (Palmer et al., 2012), also help to inspire discussion.

• A note called “Data Foundation and Terminology” was recently drafted.

In addition, we need to synchronize our work with that of closely related RDA working groups, such as those working on PIDs, registries and metadata.

Below is an initial and tentative list of relevant contributions to this work.

References:

1. Stephen Abrams, Sheila Morrissey, Tom Cramer, “What? So What: The Next-Generation JHOVE2 Architecture for Format-Aware Characterization”, International Journal of Digital Curation, 2009, Vol. 4, No. 3, pp. 123-136; doi:10.2218/ijdc.v4i3.122

2. Ch. Blanchi, J. Petrone, “An Architecture for Digital Object Typing”, http://www.cnri.reston.va.us/software/repository/repo-whitepaper.pdf

3. Robert Kahn, Robert Wilensky, “A framework for distributed digital object services”, International Journal on Digital Libraries (2006) 6(2): 115–123; doi:10.1007/s00799-005-0128-x; http://www.doi.org/topics/2006_05_02_Kahn_Framework.pdf

4. Robert Kahn, “An Open Architecture for Managing Information in the Internet”, EUDAT Conference, Barcelona, Oct. 23, 2012; http://eudat.eu/system/files/B/Kahn.pdf

5. Robert Kahn, Daan Broeder et al., “Data Foundation and Terminology – Basic Concept Note V2”, December 2012; http://forum.rd-alliance.org/viewtopic.php?f=2&t=20&sid=fbed667acd0fc740…

6. Larry Lannom, “DAITF: Enabling Technologies”, Pre-ICRI DAITF Workshop, Copenhagen, 21 March 2012; http://www.daitf.org/?p=48

7. Moore, R. W., “Automating Data Curation Processes”, NSF workshop on “Curating for Quality”, September 2012, Arlington, VA.

8. Carole L. Palmer, Tiffany C. Chao, Nicholas M. Weber, Simone Sacchi, Karen M. Wickett, Allen H. Renear, Karen Baker, Andrea Thomer, and David Dubin. “Integrating conceptual and empirical studies of data to guide curatorial processes.” Presented at the 2012 ASIS&T Research Data Access and Preservation Summit, March/April 2012. (work within the Data Conservancy – http://dataconservancy.org/)

9. S. Payette, Ch. Blanchi, C. Lagoze, E. Overly, “Interoperability for Digital Objects and Repositories”, http://webdoc.sub.gwdg.de/edoc/aw/d-lib/dlib/may99/payette/05payette.html

10. EPIC/DataCite/Handle Flyer: “High-Availability, Complementary Infrastructures for Persistent & Unique Identifiers for Data Objects & Published Collections based on Handle System”, March 2012

11. Various RDA initiatives: http://forum.rd-alliance.org/viewforum.php?f=2

12. P. Wittenburg, D. Broeder, M. Lautenschlager, “DAITF Preparation-Note”; http://www.daitf.org/wp-content/uploads/2012/06/DAITF-Preparation-Note-v…

13. “Managing Access to Digital Information”, http://www.xiwt.org/documents/ManagAccess.html

14. OAIS Model, http://public.ccsds.org/publications/archive/650x0m2.pdf

15. Two papers from the biochemistry area:

http://www.biomedcentral.com/1471-2105/12/487

http://xpdb.nist.gov/chemblast/pdb.pl