Dataset Profiles

Help us build a set of profiles for datasets that fall within the scope of this project.

Identify a dataset produced at an institution, for which the researcher has archived the data locally or is looking for a place to archive the dataset.

Describe the dataset according to the following elements: domain (research area), format, size, DOI (yes or no), and any access restrictions (e.g. privacy issues or readability).
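As an illustration only, the elements above could be captured in a small record. The following is a minimal sketch, not a schema defined by this project: the class name `DatasetProfile` and all field names are assumptions, and the sample values are taken from the Göttingen profile on this page. The DOI field is left as unknown because the profile does not state it.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DatasetProfile:
    """Hypothetical record for one dataset profile (field names are assumptions)."""
    name: str
    institution: str
    domain: str                       # research area
    file_format: str
    size: str                         # free text, since sizes are often approximate
    has_doi: Optional[bool] = None    # None means "not stated in the profile"
    access_restrictions: str = ""     # e.g. privacy issues or readability


# Example values copied from the Göttingen profile on this page.
profile = DatasetProfile(
    name="SoundEFForTS",
    institution="University of Göttingen",
    domain="life sciences, biodiversity",
    file_format="FLAC",
    size="c. 35 files, c. 60 MB each",
    has_doi=None,
    access_restrictions="login needed to view results",
)

# asdict() turns the record into a plain dict, e.g. for export to JSON or CSV.
print(asdict(profile))
```

Keeping `has_doi` optional makes it possible to distinguish "no DOI" from "not yet recorded", which may matter when profiles are filled in incrementally.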



Dataset from the University of Göttingen

Dataset sampled: ‘SoundEFForTS’, a web sound platform for soundscape biodiversity identification


Sound recordings of Sumatran birds. A subset of the data records the identification of the birds.

Domain: life-sciences, biodiversity

File format: FLAC audio files, played back using HTML5 audio. Circa 35 files to date, each c. 60 MB.
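The figures above imply a rough total size for the collection. The following is a back-of-the-envelope sketch; both inputs are the approximate values quoted in the profile, the function name is hypothetical, and decimal units (1 GB = 1000 MB) are assumed.

```python
def estimate_total_mb(file_count: int, file_size_mb: float) -> float:
    """Return the approximate total size in MB for equally sized files."""
    return file_count * file_size_mb


# Approximate values from the profile: c. 35 files of c. 60 MB each.
total_mb = estimate_total_mb(35, 60)
total_gb = total_mb / 1000  # decimal gigabytes (assumption)

print(f"Approximate total: {total_mb} MB (~{total_gb:.1f} GB)")
# prints "Approximate total: 2100 MB (~2.1 GB)"
```

At roughly 2 GB, the collection is small by storage standards, which is consistent with its description as long-tail data.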

Storage platform:

The sounds are collected using an open source web sound archive. It has been developed with Pumilio, a free and open source PHP/MySQL application for the management of sound archives and the visualization and manipulation of sound files. The system lets the user load sound files in many formats, view the spectrogram of a sound, select regions of the sound for further analysis and insertion into a database, apply filters, and perform many other manipulations.

Functionality includes side-by-side comparison of files and audio clips.

Metadata collected: collection, date, time, site, sampling rate.

At a more granular level, names from a taxonomy of bird names can be allocated to recordings.


Access: not open to all. A login is needed to view results, and changing results is not permitted.

Many researchers contribute to collecting this data. This is based on a ‘crowd-sourcing’ concept.


Of Note:

SoundEFForTS is a DFG-funded ‘web-based information system and research data management’ project.

The University of Göttingen supports the project, and the PI is based at the library. The project goals are:

1. to set up a state-of-the-art information system which serves all scientific projects as the central service point to manage and analyse their data and to potentially reuse other projects’ data,

2. to develop a common data policy and data management plans for individual projects,

3. to offer data publication and long-term preservation options and

4. to support the whole CRC in all data curation issues and to publicly disseminate examples and guidelines for a research data infrastructure based on the project results.

Although the above is long-tail data, it is well supported by the goals of the project.