Big Data Use Case: Massive Multi-Dimensional Arrays
-
Discussion
-
Contributor and driver: Peter Baumann, Jacobs University
Summary: Science and engineering data often come as sensor, image, simulation, and statistics data, either digitized or sampled from a natural phenomenon or generated by a simulation. Such data are naturally structured as rasters (“arrays” in programming terms) of some particular dimensionality. In the Earth sciences, for example, we find 1-D sensor time series, 2-D satellite images, 3-D x/y/t image time series and x/y/z geophysical cubes, and 4-D x/y/z/t atmosphere and ocean data. Application domains for array data span the Earth, space, life, and social sciences, and many more. Array Databases are being standardized by ISO (nickname: “Science SQL”).
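As a minimal sketch of how these dimensionalities relate, the following Python snippet builds a toy 3-D x/y/t image time series from nested lists and extracts a 2-D image and a 1-D time series from it. The helper names `make_cube`, `image_at`, and `time_series_at` are hypothetical and chosen only for illustration; they are not part of rasdaman or of any array database interface.

```python
def make_cube(nx, ny, nt):
    """Build a toy x/y/t cube; each cell value encodes its own coordinates."""
    return [[[100 * x + 10 * y + t for t in range(nt)]
             for y in range(ny)]
            for x in range(nx)]


def image_at(cube, t):
    """Fix t: reduce the 3-D cube to a 2-D x/y image."""
    return [[cell_series[t] for cell_series in row] for row in cube]


def time_series_at(cube, x, y):
    """Fix x and y: reduce the 3-D cube to a 1-D time series."""
    return list(cube[x][y])


cube = make_cube(nx=3, ny=2, nt=4)
image = image_at(cube, t=1)               # 2-D slice at one time step
series = time_series_at(cube, x=2, y=1)   # 1-D series at one location
```

Both results are subsets of the same array: fixing one axis lowers the dimensionality by one, and fixing two axes lowers it by two.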
As a distinct data structure, arrays require dedicated handling methods. For example, MapReduce deals with independent data items, whereas array cells have a clearly defined Euclidean neighbourhood, and efficient implementations need to respect this. Array Databases, a dedicated research area within databases, address such challenges and provide solutions, such as rasdaman, that have proven applicable across highly diverse domains.
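The difference between independent items and neighbourhood-aware cells can be sketched in plain Python. The example below is an illustration under simple assumptions (a small 2-D grid held in memory as nested lists), not a description of how rasdaman or any MapReduce framework is implemented; the function names `scale_cells` and `neighbourhood_mean` are hypothetical.

```python
def scale_cells(grid, factor):
    """Per-cell operation: each output cell depends on one input cell only."""
    return [[value * factor for value in row] for row in grid]


def neighbourhood_mean(grid):
    """3x3 mean filter: each output cell depends on its adjacent cells.

    Border cells use only the neighbours that exist inside the grid.
    """
    rows, cols = len(grid), len(grid[0])
    result = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            window = [grid[a][b]
                      for a in range(max(0, i - 1), min(rows, i + 2))
                      for b in range(max(0, j - 1), min(cols, j + 2))]
            out_row.append(sum(window) / len(window))
        result.append(out_row)
    return result


grid = [[0, 0, 0],
        [0, 9, 0],
        [0, 0, 0]]
scaled = scale_cells(grid, 2)
smoothed = neighbourhood_mean(grid)
```

The per-cell scaling could be applied to the cells in any order and on any partitioning of the data. The mean filter could not: if the grid were split into tiles processed independently, cells along tile boundaries would need values from the adjacent tile, which is why the storage and processing layout has to take the neighbourhood into account.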
ISO has embarked on extending the SQL database language with multi-dimensional arrays so that a large part of Big Science data can be managed in an integrated way with their metadata. This extension to the SQL query language is being established as ISO 9075 SQL Part 15: MDA (“Multi-Dimensional Arrays”). See press citations here and here, and this introduction to Array Databases.
RDA Contributions
- Initiating and driving ISO standardization of SQL/MDA
- Continuously advancing Array Database technology through implementation work on rasdaman