Big Data Use Case: Research on Data Analytics for Automated Quality Control of Measurement Data
Contributors and drivers: Morris Riedel, Shiraz Memon, Shahbaz Memon – Juelich Supercomputing Centre; Robert Huber – MARUM Bremen
The PANGAEA collection [1] offers large datasets in earth and environmental science, often including measurement data from a variety of instruments deployed in the field. These measurements need to be checked as thoroughly as possible before they are made publicly available. The particular problem in this case study is therefore how to apply ‘anomaly or outlier detection algorithms’ to provide automated quality control, at least to a significant degree. This would help because only limited manpower is available to check and validate the datasets manually, and data analytics techniques, backed by suitable computational and storage resources, may support the data scientists in this task. However, this requires an understanding of the data in question, and of how best to combine the many available ‘anomaly or outlier detection algorithms and software packages’ with a well-chosen mix of underlying resources (high-performance computing, high-throughput computing, storage, databases, etc.). The trial data for the use case is a dataset of underwater measurements from the Koljoefjord cabled observatory in Sweden.
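As a purely illustrative sketch of what combining ‘anomaly or outlier detection algorithms’ can mean for a single measurement series, the Python code below pairs two simple, well-known detectors: a global robust z-score based on the median absolute deviation (MAD), and a local spike test against the median of neighbouring samples. A point is flagged only if enough detectors agree. The function names (mad_outliers, spike_outliers, combine), the thresholds, the window size and the voting rule are all hypothetical choices for illustration; the use case does not say which algorithms or packages are actually applied to the PANGAEA or Koljoefjord data.

```python
import statistics
from typing import List, Sequence


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[bool]:
    """Hypothetical detector 1: flag points whose modified z-score,
    computed from the median and the median absolute deviation (MAD)
    of the whole series, exceeds `threshold`."""
    if not values:
        return []
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: flag anything that differs from it.
        return [v != med for v in values]
    # 0.6745 rescales the MAD so the score is comparable to an ordinary
    # z-score when the data are roughly normally distributed.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


def spike_outliers(values: Sequence[float], window: int = 5,
                   max_jump: float = 1.0) -> List[bool]:
    """Hypothetical detector 2: flag points that differ from the median of
    their neighbours (within `window` samples, excluding the point itself)
    by more than `max_jump`, in the same units as the data."""
    half = window // 2
    flags = []
    for i, v in enumerate(values):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        neighbours = [values[j] for j in range(lo, hi) if j != i]
        local = statistics.median(neighbours) if neighbours else v
        flags.append(abs(v - local) > max_jump)
    return flags


def combine(*flag_lists: Sequence[bool], min_votes: int = 1) -> List[bool]:
    """Flag a sample when at least `min_votes` detectors flagged it."""
    return [sum(votes) >= min_votes for votes in zip(*flag_lists)]


if __name__ == "__main__":
    # Made-up temperature-like readings with one obvious spike at index 4.
    temps = [7.1, 7.2, 7.1, 7.3, 25.0, 7.2, 7.1, 7.2, 7.0, 7.1]
    flags = combine(mad_outliers(temps), spike_outliers(temps), min_votes=2)
    print([i for i, f in enumerate(flags) if f])  # -> [4]
```

Both detectors use medians rather than means, so a single large spike does not inflate the reference value it is compared against. Requiring two votes trades some sensitivity for fewer false alarms; in a real quality-control workflow, flagged samples would presumably still go to a data scientist for review rather than being deleted automatically.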
Contact: Morris Riedel (m.riedel@fz-juelich.de) for more information.
Links: [1] http://www.pangaea.de/