GCE Data Structure - Data and Metadata Model

Overview

The GCE Data Structure is a generalized MATLAB specification used by the GCE Data Toolbox software to store tabular data along with quality control information and detailed attribute, documentation and provenance metadata. The toolbox provides command-line API functions and GUI forms for instantiating, validating, populating and managing these structures, so users are not required to know any details about the specific MATLAB implementation. The conceptual diagram and specification overview below are therefore provided for reference only.

Conceptual Diagram

[Figure: conceptual diagram of the GCE Data Structure, version 1.1 (screenshots/gce_data_structure_1_1.png)]

MATLAB Specification (version 1.1, 2001)

Field | Field Type | Description
----- | ---------- | -----------
version | n x 1 cell array of strings | GCE Data Structure version (used for validation)
title | character array | Title of the overall data set
metadata | n x 3 cell array of strings | Parseable array of general metadata information (variable-length array of metadata category names, field names, field values)
datafile | n x 2 cell array | Names and sizes of data files added to the structure
createdate | character array | Date and time the data structure was created
editdate | character array | Date and time the structure was last edited
history | n x 2 cell array of strings | Processing history of the data structure (list of dates and operations performed)
name | 1 x m cell array of strings | Name of each data column
description | 1 x m cell array of strings | Description of each data column
units | 1 x m cell array of strings | Units for values in each data column
datatype | 1 x m cell array of strings | Physical data type of each column (i.e. storage type, e.g. 'f' for floating-point, 'e' for exponential, 'd' for decimal/integer, 's' for string)
variabletype | 1 x m cell array of strings | Variable type (semantic type) of each column, i.e. 'data' = measured data value, 'calculation' = calculated data value, 'nominal' = categorical value (e.g. name, species, site), 'logical' = Boolean or true/false value (e.g. 0 or 1, yes or no), 'datetime' = date and/or time value, 'ordinal' = order or positional value, 'code' = coded value, 'coord' = geographic coordinate value (e.g. latitude or longitude), 'text' = free text (e.g. notes, comments)
numbertype | 1 x m cell array of strings | Numerical type of each data column (e.g. 'continuous', 'discrete', 'angular', 'none')
precision | 1 x m array of integers | Decimal places to display for each column
values | 1 x m cell array | Data values (each cell contains a matching "column" of data, as an n x 1 numerical array or n x 1 cell array of strings)
criteria | 1 x m cell array of strings | Flag criteria expressions for each column, which are evaluated by the 'dataflag' function to generate QA/QC flags
flags | 1 x m cell array | Arrays of flag characters (each cell is empty or contains an n x m character array of flags matching the corresponding value array)
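
Example (illustrative sketch only)

To make the field layout above concrete, the following sketch builds a small two-column, three-row structure by hand using only base MATLAB, looks up one entry in the 'metadata' array, and checks that the per-column fields agree in size. It does not use the toolbox API, which remains the recommended way to create and validate these structures. All data values, the version string, the metadata entries, the file size and the flag character 'Q' are invented placeholders, and the criteria strings are left empty because the expression syntax expected by 'dataflag' is not described in this section.

```matlab
% Hand-built example with m = 2 columns and n = 3 rows.
% Every value below is a made-up placeholder for illustration.
s = struct();
s.version = {'(placeholder version string)'};  % real validation string not shown here
s.title = 'Example water temperature data set';
s.metadata = {'Dataset', 'Title', 'Example water temperature data set'; ...
              'Dataset', 'Investigator', '(hypothetical name)'};
s.datafile = {'example.csv', 3};               % file name and a placeholder size
s.createdate = datestr(now);
s.editdate = s.createdate;
s.history = {s.createdate, 'structure built by hand for illustration'};

% Column attribute fields: one entry per data column (1 x m)
s.name = {'Site', 'Temp_Water'};
s.description = {'Sampling site code', 'Water temperature'};
s.units = {'none', 'degrees C'};
s.datatype = {'s', 'f'};                       % string, floating-point
s.variabletype = {'nominal', 'data'};
s.numbertype = {'none', 'continuous'};
s.precision = [0 2];

% Data columns: n x 1 cell array of strings or n x 1 numeric array
s.values = {{'A'; 'B'; 'C'}, [18.25; 19.10; 45.00]};
s.criteria = {'', ''};                         % criteria syntax not covered here
s.flags = {'', [' '; ' '; 'Q']};               % empty, or one row of flags per value

% Look up a metadata value by category name and field name
idx = strcmp(s.metadata(:, 1), 'Dataset') & ...
      strcmp(s.metadata(:, 2), 'Investigator');
investigator = s.metadata{idx, 3};

% Consistency check: every per-column field must have m entries,
% and every value or non-empty flag array must have n rows
m = numel(s.name);
colfields = {'description', 'units', 'datatype', 'variabletype', ...
             'numbertype', 'precision', 'values', 'criteria', 'flags'};
ok = true;
for k = 1:numel(colfields)
    ok = ok && numel(s.(colfields{k})) == m;
end
n = size(s.values{1}, 1);
for c = 1:m
    ok = ok && size(s.values{c}, 1) == n;
    ok = ok && (isempty(s.flags{c}) || size(s.flags{c}, 1) == n);
end
```

The check at the end covers only array dimensions. It is not a substitute for the toolbox's own validation, which also relies on the 'version' field.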