Conceptualizing Metadata for User-Created Data in History & Ethnography

This is a place to paste in references and commentary on metadata models and practices in the humanities and social sciences (with a focus on history and ethnography).  You can add quotes from articles, questions they raise, partially formed analysis; this is mostly meant to spark and store our own thinking, to be cleaned up later and extracted into more polished documents.  Scroll to the bottom and paste in new commentary, or add to existing commentary where appropriate. 

 
Dan Price: I have spent some time trying to think through ways of making more flexible ontologies, so that you're not bound to the initial ontology choices as you build your data archives - but I guess I wanted to think about something even more specific. It seems to me that there's a practical side of the argument (which is how I would try to pitch the idea of flexible metadata) and then there's a critical side (which is, in our age, something like reflexive practice as opposed to simply not doing anything). Under the rubric of reflexive practice, I think you need metadata categories that say something like "used as contesting existing categories"; or "placeholder for something not yet sayable"; or "kinda in between these two known things". I know some folks have talked about that, but most of the OWL semantics still seem to me to be attempts to systematize what has already been said, as opposed to reflecting on what can emerge from trying to speak more attentively to the future context/choices.
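To make this concrete, here is a minimal sketch (just an illustration, not a worked-out proposal) of how such reflexive qualifiers might sit alongside ordinary descriptive metadata as custom RDF annotation properties. The dhpe: namespace and the property names (contestsCategory, placeholderFor, inBetween) are invented for the example; it uses Python's rdflib.

```python
# A minimal sketch of "reflexive" metadata flags as custom RDF annotation
# properties. The dhpe: namespace and property names are hypothetical.
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import DCTERMS

DHPE = Namespace("http://example.org/dhpe/reflexive#")

g = Graph()
g.bind("dhpe", DHPE)
g.bind("dcterms", DCTERMS)

item = URIRef("http://example.org/collection/basket-042")

# Conventional descriptive metadata...
g.add((item, DCTERMS.title, Literal("Coiled basket, highlands (attribution uncertain)")))

# ...plus reflexive qualifiers of the kind described above.
g.add((item, DHPE.contestsCategory, Literal("'craft' vs. 'art' as cataloging categories")))
g.add((item, DHPE.placeholderFor, Literal("a local genre term not yet recorded")))
g.add((item, DHPE.inBetween, Literal("ceremonial object / trade good")))

print(g.serialize(format="turtle"))
```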
 
 
Kendall Roark: Just wanted to follow up on the discussion today around metadata schema for ethnography and history and Dan Price's suggestion that we take a more critical approach focused on controlled vocabularies. After a little bit of thought, I realized that this discussion reminded me of Anne Gilliland-Swetland's 2000 report for CLIR, where she describes the different histories and trajectories of the archival and library traditions in North America. As someone trained in an American four-field anthropology tradition, I believe I am much more comfortable with the archival approach, with its emphasis on context, or the more problematic idea of the "organic nature of records". It seems like this group especially is well placed (drawing upon critical traditions in history, anthropology, and archives) to talk about what metadata is needed to place a digital object in context, alongside what might make it discoverable and available for access/re-use by a secondary researcher. I am starting to see this come up as well in other scientific disciplines that are re-examining annotation practices and metadata for preserving context. A recent example is the LCPD2014 conference presentation by Laura Slaughter et al. on "living systematic reviews" for clinical trials.
1. pub89. Enduring Paradigm, New Opportunities: The Value of the Archival Perspective in the Digital Environment

Anne J. Gilliland-Swetland (2000, 43pp), Washington, D.C., Council on Library and Information Resources, ISBN 1-887334-74-2, Available at: http://www.clir.org/pubs/reports/pub89/

 
2. Enabling Living Systematic Reviews and Clinical Guidelines Through Semantic Technologies, LCPD2014 presentation,
Laura Slaughter, Christopher Friis Berntsen, Linn Brandt and Chris Mavergames (2014), Available at:
http://prezi.com/2stwnxh-qxl9/living-systematic-reviews/ (see also transcription of presentation tab)
 
 

Jason Jackson: Museum people use provenance in a related but not identical way to those seemingly at the center of this work. Museum provenance refers to chains of custody. Here is the definition from Museum Registration Methods (5th edition), the standard source in the English-speaking world for such matters.

"Provenance: For works or art and historical objects, the background" and history of ownership. The more common term for anthropological collections is “provenience” which defines an object in terms of the specific geographic location of origin. In scientific collections, the term “locality,” meaning specific geographic point of origin is more acceptable.” [p. 479]

 

Sample uses made up by me….

 

The provenance file for the painting tells us that John Smith purchased it from the artist in 1945. From his estate, it was inherited by Sally Smith, who then donated it to the museum in 1981.

 

While it is not well documented, the reliable clues regarding the provenience for this basket indicate that it is from a mission-contacted village in the New Guinea highlands. This view is supported by stylistic analysis and identification of the raw materials out of which it is made.

 

The quoted source is:

Buck, Rebecca A., and Jean Allman Gilmore, eds. Museum Registration Methods, 5th ed. AAM Press, 2010.

This relates to the broader interests that the ethnographic wing of DHPE would have. For instance, I know people who have inherited and actively used other people's field notes (etc.), with the notes only entering an archive after the second round of use. This is common when a person chooses a literary executor because that person is best prepared to understand, use, and know how to eventually archive the original materials. In such cases, it would be important to preserve such information, especially if the secondary ethnographer might (as is likely) modify or iterate the documents. For instance, imagine ethnographer A (phonetically) records the local-language names of 50 birds at a time when no standard orthography exists. Ethnographer B (a student of ethnographer A, for instance) inherits these notes and, while doing new work, pencils in the now-standard local-language forms using the now agreed-upon/accepted orthography (and up-to-date linguistic knowledge). All of this matters both in a paper-and-file-folders world and in a digital one.
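A rough sketch of how that kind of chain (custody plus later modification by a second ethnographer) might be recorded as structured metadata; the class and field names here are invented for illustration, not drawn from any existing standard:

```python
# A rough sketch (names and fields are hypothetical) of recording both the
# chain of custody and later modifications of a set of field notes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CustodyEvent:
    year: int
    custodian: str
    note: str = ""

@dataclass
class ModificationEvent:
    year: int
    agent: str
    description: str

@dataclass
class FieldNotes:
    identifier: str
    creator: str
    custody_chain: List[CustodyEvent] = field(default_factory=list)
    modifications: List[ModificationEvent] = field(default_factory=list)

notes = FieldNotes(identifier="bird-names-notebook-01", creator="Ethnographer A")
notes.custody_chain.append(CustodyEvent(1968, "Ethnographer A", "original fieldwork"))
notes.custody_chain.append(CustodyEvent(1995, "Ethnographer B", "inherited as literary executor"))
notes.custody_chain.append(CustodyEvent(2014, "University Archives", "accessioned"))
notes.modifications.append(
    ModificationEvent(1998, "Ethnographer B",
                      "pencilled in bird names using the now-standard orthography")
)
```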

 

A variant is when, as is common with me, I am given parts of someone’s fieldwork materials in a reproduced (duplicated) form and these take a place in my larger research corpus. I curate such materials in paper form, photographs, audio recordings, etc.

 

Also from Jason

Picking up on Kim's thread: I agree that we would do well, as a group, to think about metadata needs and the metadata traditions presently in use. Such a conversation should move up to the IG level as soon as we have a clear idea ourselves as to how to host it. In this getting-our-bearings phase, let me offer a few notes of possible interest.

 

The following applies to the museum-object corner of our space. It will be harder work to do the key task of making implicit practices explicit in other corners of our realm (field photography, found documents, etc.).

 

In the ethnographic museum case, what I describe now could have been true of the other museums where I have worked, but it actually comes out of work currently underway at the MMWC (Mathers Museum of World Cultures).

Like all (or at least most) museums, the MMWC has an idiosyncratic implementation of "customary" cataloging and accessioning practices. Its current catalog system is built upon older iterations of a once-new system. That system was inspired by older museums now unknown to us, but it has its own quirks. Such quirks are usually the product of the original insights of some founding curator, of preconceived notions of what the collection is or will become, and of assumptions about the needs of future users. Many middle-aged museum systems have "smart numbers" and other elements designed to encode extra metadata into simple elements; these are almost all despised by those who inherit them. An example from the Gilcrease Museum, where I once worked, is a prefix in the catalog number that told you whether an object came from east or west of the Mississippi River, which turns out to be wonderful when cataloging objects from, say, Brazil.

At the MMWC, our idiosyncratic system is not too bad as these things go. Our work over the last few years has involved mapping (building crosswalks from) our unique fields into Dublin Core fields. The first place where we have needed to accomplish this task is in our use of Omeka to build digital exhibitions on the basis of our collections. In this case, we have to export data out of our collections database (FileMaker files to comma-separated value (CSV) files, then into Omeka's import tool).

For Dublin Core basics, see: http://dublincore.org/metadata-basics/

For the Omeka tool that makes Dublin Core “Extended” work see: http://omeka.org/codex/Plugins/DublinCoreExtended_2.0 
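A minimal sketch of the FileMaker-to-Omeka crosswalk step described above, assuming a CSV export from the collections database; the in-house field names are invented, and the "Dublin Core:Element" header convention is just one plausible shape for an Omeka CSV import, so treat this as illustrative rather than a recipe:

```python
# A minimal crosswalk sketch: in-house export columns -> Dublin Core headings.
# Field names and the output header convention are hypothetical.
import csv

CROSSWALK = {
    "ObjectName": "Dublin Core:Title",
    "CatalogNumber": "Dublin Core:Identifier",
    "Maker": "Dublin Core:Creator",
    "CultureOfOrigin": "Dublin Core:Coverage",
    "Materials": "Dublin Core:Format",
    "CatalogDescription": "Dublin Core:Description",
}

def crosswalk_csv(infile="filemaker_export.csv", outfile="omeka_import.csv"):
    with open(infile, newline="", encoding="utf-8") as src, \
         open(outfile, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(CROSSWALK.values()))
        writer.writeheader()
        for row in reader:
            writer.writerow({dc: row.get(local, "") for local, dc in CROSSWALK.items()})
```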

When we do this, because of the public nature of the target outcome (an object in our Omeka site), some information is purposefully filtered out en route from our in-house database to the public content space. A clear example would be insurance value or other information of back-of-house relevance that we do not want to share widely. Another kind of transformation is filtering out information in which we do not have sufficient confidence, a speculative attribution for instance. Our database may record that someone "thinks" a pottery bowl is from Uganda or Santa Clara Pueblo on the basis of style or some other curatorial, skills-based analysis, but we might not want to commit to that publicly. We might not export a whole category of data for this reason, we might massage it on a single-object basis, or we might apply a more general description across a group of objects. (If we were confident about Africa generally but not confident about the country, county, city, etc. of origin, we might attend to this as the data moves.)
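The filtering and generalization moves described in that paragraph can also be sketched in the same spirit; the suppression list and the "speculative attribution" rule below are invented examples of the kinds of decisions involved, not actual MMWC practice:

```python
# A rough sketch of suppressing back-of-house fields and generalizing
# speculative place attributions before public export. Rules are invented.
SUPPRESSED_FIELDS = {"InsuranceValue", "StorageLocation", "DonorContact"}

def generalize_place(record):
    """Fall back to a broader place term when the attribution is speculative."""
    if record.get("PlaceAttributionCertainty") == "speculative":
        record["PlaceOfOrigin"] = record.get("BroaderRegion", "")
    return record

def prepare_public_record(record):
    public = {k: v for k, v in record.items() if k not in SUPPRESSED_FIELDS}
    return generalize_place(public)

record = {
    "ObjectName": "Pottery bowl",
    "PlaceOfOrigin": "Santa Clara Pueblo",
    "PlaceAttributionCertainty": "speculative",
    "BroaderRegion": "American Southwest",
    "InsuranceValue": "5000 USD",
}
print(prepare_public_record(record))
```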

This trial-and-error work is easier in the Omeka case because it is done on a small, exhibition-level basis and does not affect us wholesale. It is teaching us things that we will probably go back and deal with systemically in our underlying collections database as a whole, for example when Dublin Core Extended provides a field we now know we need but that was absent from our ancestral database scheme (DC "Rights Holder", for instance).

 

(Something to know about how Omeka works with Dublin Core Extended: it does not present empty fields. This example object http://dlib.indiana.edu/omeka/mathers/items/show/288 shows a small amount of metadata because we did not bring much over. We are not using Omeka for our catalog, just as a means of doing exhibitions; the content in this case is more like an exhibition label and less like a full database record.)

 

We have not gone there yet, but our Omeka site should be harvestable by Open Folklore and cognate efforts.

 

From Brandon:

Metadata research reading, annotations, excursions

 

 

"Seeing Standards: A Visualization of the Metadata Universe" (http://www.dlib.indiana.edu/~jenlrile/metadatamap/), produced by Indiana University Libraries, contains a very interesting poster of the different "cultural heritage" metadata standards:

 

Jenn Riley called herself (or was called) a "metadata librarian" working within the Indiana University Digital Library Program, starting in 2004. In 2010 she became Head of the Carolina Digital Library and Archives at UNC Chapel Hill; she is now Associate Dean of Digital Initiatives in the McGill University Library.

 

Her blog, The Inquiring Librarian, was active from 2005 to 2010.

 

“The sheer number of metadata standards in the cultural heritage sector is overwhelming, and their inter-relationships further complicate the situation. This visual map of the metadata landscape is intended to assist planners with the selection and implementation of metadata standards. Each of the 105 standards listed here is evaluated on its strength of application to defined categories in each of four axes: community, domain, function, and purpose. The strength of a standard in a given category is determined by a mixture of its adoption in that category, its design intent, and its overall appropriateness for use in that category. The standards represented here are among those most heavily used or publicized in the cultural heritage community, though certainly not all standards that might be relevant are included. A small set of the metadata standards plotted on the main visualization also appear as highlights above the graphic. These represent the most commonly known or discussed standards for cultural heritage metadata.”

 

So are we trying to develop our own to add to this huge list? What will our acronym be?

 

 

Mazé, Elinor. "Metadata: Best Practices for Oral History Access and Preservation." Oral History in the Digital Age. Accessed September 15, 2014. http://ohda.matrix.msu.edu/2012/06/metadata/.

 


 

Elinor Maze: Senior Editor at the Baylor University Institute for Oral History

 

The OHA lists principles for oral history and best practices for oral history. Two items (of five in total) stand out:

1.    Interviewers, sponsoring institutions, and institutions charged with the preservation of oral history interviews should understand that appropriate care and storage of original recordings begins immediately after their creation.

2.    Interviewers should document their preparation and methods, including the circumstances of the interviews, and provide that information to whatever repository will be preserving and providing access to the interview. (Note: data about how the data was collected; this gets into provenance too.)

 

It involves caring for oral history interviews in all their forms from the moment of their creation (or even before) into the indefinite future; it involves caring for originals and derivatives, maintaining their integrity and reliability, making them accessible through changes in technology for disparate uses and users, and, increasingly, adding value to them through application of both sophisticated and informal analytic tools.

 

"Understanding metadata challenges oral historians to understand the greatly enhanced opportunities that digital documentation offers to add value to the interviews they record. Broadly understood, metadata makes possible the discovery of themes and meaningful relationships within interviews, among sets of interviews, and with other digitally represented resources."

 


 

Defining sets of terms—metadata elements—to document all of these disparate kinds of objects and their relationships to each other poses significant challenges to oral history curators. It requires access to technical expertise and analytical tools and processes from several disciplines, as well as acquaintance with a broad range of standards and best practices for metadata formulation, collection, and use. At the same time, because oral history must be accessible by users and researchers with a broad range of technical expertise and resources, curators are obligated to make access simple in spite of the complexities they must master to do so.

 

It is frequently asserted that metadata conventions, standards, and best practices are best governed by the communities which need and use the metadata.

 

Because oral history is a practice which spans many communities, both academic and popular, both professional and amateur, and of a wide range of size and resource endowment, generally-agreed-on metadata standards have not evolved.

 

 

“Functions of meta-data for oral history”

 

The first category of function is "Creation, multiversioning, reuse, and recontextualization."

 

Extracts of audio or video recordings, with or without transcripts, may be made accessible on Web pages, or included in documentary films, performances, or art works of various kinds.

 

 

They may be assessed by various linguistic, sociological, or other analytic tools that have little relation to the original historical interests of the interview project.

 

 


 

The third category is validation.

 

 

 

 

Case study: interviewer-generated metadata.

Boyd, D. A. (2012). In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Museum and Library Services. Retrieved from http://ohda.matrix.msu.edu/2012/06/interviewer-generated-metadata/.

 

 

 

Library of Congress METS

Metadata Encoding & Transmission Standard (http://www.loc.gov/standards/mets/)

Used by major libraries, including Indiana University.  

In Riley’s “Seeing Standards” visualization, METS is categorized as follows:

Communities:
Archives: semi-strong
Libraries: strong
Information industry: weak
Museums: weak

Domains:
Cultural objects: strong
Data sets: strong
Geospatial data: semi-strong
Moving images: strong
Musical materials: strong
Scholarly texts: strong
Visual resources: strong

Purpose:
Metadata wrappers: strong
Rights metadata: strong
Structural metadata: semi-strong

Function:
Record format: strong

 

In Riley’s visualization METS is one of about a dozen highlighted metadata standards with this associated word cloud: record format, structure standard, structural metadata, metadata wrappers, museums (very small word), libraries, information industry (small word), archives, cultural objects, datasets, geospatial data, moving images, musical materials, scholarly texts, visual resources.
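To make "metadata wrapper" and "structural metadata" more concrete, here is a rough sketch of a minimal METS document assembled in Python: a descriptive section wrapping a Dublin Core title, a file section pointing at a digital object, and a structural map tying them together. The identifiers, file path, and content are invented, and real METS documents carry much more.

```python
# A rough sketch of a minimal METS wrapper (identifiers and file paths invented).
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
DC = "http://purl.org/dc/elements/1.1/"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("dc", DC), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)

mets = ET.Element(f"{{{METS}}}mets")

# Descriptive metadata section wrapping a Dublin Core record.
dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="dmd1")
wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="DC")
xmldata = ET.SubElement(wrap, f"{{{METS}}}xmlData")
ET.SubElement(xmldata, f"{{{DC}}}title").text = "Oral history interview with J. Smith"

# File section pointing at the digital object itself.
filesec = ET.SubElement(mets, f"{{{METS}}}fileSec")
grp = ET.SubElement(filesec, f"{{{METS}}}fileGrp", USE="preservation")
fl = ET.SubElement(grp, f"{{{METS}}}file", ID="file1", MIMETYPE="audio/wav")
ET.SubElement(fl, f"{{{METS}}}FLocat",
              {f"{{{XLINK}}}href": "interviews/smith_1987.wav", "LOCTYPE": "URL"})

# Structural map tying the description and the file together.
smap = ET.SubElement(mets, f"{{{METS}}}structMap")
div = ET.SubElement(smap, f"{{{METS}}}div", TYPE="interview", DMDID="dmd1")
ET.SubElement(div, f"{{{METS}}}fptr", FILEID="file1")

print(ET.tostring(mets, encoding="unicode"))
```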

 

Note the recent METS announcement of a workshop preparing for the "next gen" standard:

 

Announcing a METS Workshop at the Digital Libraries 2014 Conference in London, September 11-12: METS Now, and Then… Discussions of Current and Future Data Models

In this workshop, participants will develop an understanding of the data models underlying some canonical uses of the existing METS schema as a contextual basis for the description of a next generation METS (2.0) data model, participate in the refinement of the METS 2.0 data model being developed by the METS Editorial Board, and discuss options for serialization of the data model.

 

Read METS editorial board minutes (http://www.loc.gov/standards/mets/mets-boardnotes.html)? Their technical documentation? Example docs? Community building? What do their tools look like (e.g., LIMB and YooLib)? A good place to start: the METS Presentation (overview) prepared by Karin Bredenberg of the Riksarkivet/Swedish National Archives, and the METS Suggested Reading list (background readings).

 

 

 

 

Text Encoding Initiative (http://www.tei-c.org/index.xml)

A self-described consortium that "collectively develops and maintains a standard for the representation of texts in digital form" (a bit like CMAS? See Recursive Publics?).

A working group (consortium), emphasis added: “its chief deliverable is a set of Guidelines [with a capital G!] which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics” (History & Ethnography, specifically?)

 

According to Maze: “schemas define and specify the encoding standards for a particular type of XML document. For example, the TEI (Text Encoding Initiative) schema specifies how to mark up texts in the humanities and social sciences. TEI markup elements can show the structure of texts, identify subjects and themes, index key words and phrases, as well as identify such facts as authorship, publication, revision, and so on. Schemas are formally defined in a document type definition (DTD), a ‘machine-readable set of rules that specify how a particular metadata document such as a TEI or EAD (Encoded Archival Description, described in a following section) XML document – formally called an instance – is to be written’”
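To make the TEI idea concrete, here is a minimal sketch of building a tiny TEI instance (header plus text body) in Python; the content is invented and only the bare skeleton is shown, not the full range of markup the Guidelines provide:

```python
# A minimal TEI skeleton (content invented; not a full schema-valid encoding).
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)

tei = ET.Element(f"{{{TEI_NS}}}TEI")

# teiHeader: bibliographic and source metadata about the encoded text.
header = ET.SubElement(tei, f"{{{TEI_NS}}}teiHeader")
filedesc = ET.SubElement(header, f"{{{TEI_NS}}}fileDesc")
titlestmt = ET.SubElement(filedesc, f"{{{TEI_NS}}}titleStmt")
ET.SubElement(titlestmt, f"{{{TEI_NS}}}title").text = "Interview transcript, 12 June 1987"
pubstmt = ET.SubElement(filedesc, f"{{{TEI_NS}}}publicationStmt")
ET.SubElement(pubstmt, f"{{{TEI_NS}}}p").text = "Unpublished fieldwork transcript."
sourcedesc = ET.SubElement(filedesc, f"{{{TEI_NS}}}sourceDesc")
ET.SubElement(sourcedesc, f"{{{TEI_NS}}}p").text = "Transcribed from a cassette recording."

# text/body: the transcript itself, marked up structurally.
text = ET.SubElement(tei, f"{{{TEI_NS}}}text")
body = ET.SubElement(text, f"{{{TEI_NS}}}body")
ET.SubElement(body, f"{{{TEI_NS}}}p").text = "Interviewer: Can you describe the market?"

print(ET.tostring(tei, encoding="unicode"))
```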

 

Maze's article also states: "There are many resources on the Internet that provide guidance in the use of XML. One excellent general introduction is that provided in 'A Gentle Introduction to XML,' provided online by the Text Encoding Initiative."[18]

 

In the "Seeing Standards" PDF, TEI is listed as one of the key metadata standards and is associated with (roughly strong to weak): libraries, markup language, scholarly texts, archives, record format, technical metadata, structural metadata, descriptive metadata, information industry, content standard, rights metadata, and museums. So, like METS, big in libraries.

 

In the Other News section:

 

Freedman Center for Digital Scholarship Colloquium: Pedagogy and Practices

 

Open Access Repository Ranking

 

(Note the announcement here that Dublin Core will be hosting a "Training the Trainers for Linked Data" workshop: http://dcevents.dublincore.org/IntConf/index/pages/view/train

Abstract: Linked Data has gained momentum, and practitioners are eager to use its principles to derive more value from metadata. Available handbooks and training materials focus on an audience with a computer science background. However, people with a non-technical education find it hard to understand what Linked Data can mean for them. This full-day, hands-on workshop will provide an overview of methods and case studies from the handbook "Linked Data for Libraries, Archives and Museums" (2014, ALA/Neal-Schuman). Using freely available tools and data, this workshop will teach you how to clean, reconcile, enrich, and publish your metadata. Participants will learn about concepts, methods, and tools that they can use on their own, or to teach others within their own institutions, to get more value from metadata.)

 

 

Dublin Core Metadata Initiative (Element Set) http://dublincore.org/

Described by Maze as one of a number of Metadata Systems for Oral History.

Developed in the mid-1990s.

The 15 Dublin Core elements are: creator, contributor, publisher, title, date, language, format, subject, description, identifier, relation, source, type, coverage, rights.

The key to the metadata system's flexibility and adaptability is that the elements are optional (one, all, or a selection may be used) and repeatable.


Proprietary digital content management systems such as CONTENTdm and open-source, freely available ones such as Omeka, as well as such metadata systems and schema as Encoded Archival Description (EAD), Metadata Object Description Schema (MODS), and Text Encoding Initiative (TEI) have all been developed on a Dublin Core framework or are designed to be interoperable with it.
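That optional-and-repeatable quality is easy to picture as a mapping from elements to lists of values; a quick sketch (record content invented for illustration):

```python
# A quick sketch of a Dublin Core record as element -> repeatable values.
# Only the elements we have are present (optional); "subject" repeats.
dc_record = {
    "title": ["Oral history interview with Sally Smith"],
    "creator": ["Smith, Sally (interviewee)", "Jones, Ada (interviewer)"],
    "date": ["1987-06-12"],
    "subject": ["Basket making", "Mission villages", "Museum donations"],
    "rights": ["In copyright; contact the repository for permissions."],
}

# Emit one element=value line per repeated value.
for element, values in dc_record.items():
    for value in values:
        print(f"{element}={value}")
```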

 

 

General questions:

·      Riley’s Viz

o   Are we trying to add another acronym/standard to something like Riley’s visualization?

o   How have the number and categorization of metadata standards (in the cultural heritage arena) changed since Riley's visualization was produced in 2009-2010?

·      TEI

o   Are we, like the TEI, aiming to develop "Guidelines which specify encoding methods for machine-readable texts"?

·      A Batesonian angle on metadata? The idea of scale seems at play, something about a different order of analysis… context. Like deutero-learning somehow? Reflexivity?

·      Pedagogical connections to metadata? Meta-learning? A trajectory of "critical" thinking (in higher ed) starting post-WWII as a response to communism (critiquing the other but leaving our own ideologies unexamined?) and then becoming more reflexive around the Vietnam War era?

·      Are we (do we want to be) developing a standard (Bowker and Star, Sorting Things Out; light structure; etc.)?

o   E.g.: AES standards of particular interest to those working with oral history metadata include AES57-2011, AES standard for audio metadata: Audio object structures for preservation and restoration (printing date: 2011-09-21). Abstract: "This standard provides a vocabulary to be used in describing structural and administrative metadata for digital and analog audio formats for the purpose of enabling audio preservation activities on those objects. Some implementations also refer to this metadata as technical metadata. The characteristics of the audio objects captured by this standard may be of use to audio communities beyond the audio preservation community."[9]