Data Storage Protocols
Data storage in the GCE Information System is organized to balance analytical, accessibility, security, and archival considerations for each resource. Storage specifications for metadata, digital data, and printed data archives are as follows:
The metadata describe the physical and logical structure of a data set, as well as the hypotheses, methodology and researchers responsible for its creation. The primary repository for GCE metadata is the Metabase, a relational database developed using Microsoft SQL Server® 2008. The Metabase is secured using both network and database security layers, and is accessed primarily through web applications available on the GCE LTER Public Web Site and GCE LTER Project Web Site. Limited write access is provided through data submission forms (project web only), and read access is provided through submission editing forms (project web) and metadata search and display applications (both web sites).
Database files are synchronized between the GCE data management workstations and servers, and regularly backed up to magnetic tape and off-site servers.
The primary repository for digital data (e.g. submissions, processed data, and archived data) is the GCE Information System, consisting of multiple fault-tolerant servers and workstations housed in the Marine Sciences Department at the University of Georgia. Data files are protected by several layers of computer security, mirrored between RAID-5 arrays on multiple servers, and regularly backed up to magnetic tape stored off-site. Multiple data formats are supported.
As data files become candidates for online access, copies are transferred to the GCE Web Server to provide access on the WWW. Access to these files is controlled by network file security protocols, and web-based data access is supported through the GCE LTER Public Web Site and GCE LTER Project Web Site.
Information submitted on paper, such as log files and field data sheets, is also archived in the GCE LTER data management office (110B Marine Sciences Dept., University of Georgia). Paper forms are scanned when practical to provide digital versions for archiving with the corresponding data. Access is controlled by the data manager, using appropriate physical security measures.
Poster presentations, printed reports, news clippings, and other printed matter submitted to the data manager will also be archived in the GCE LTER data management office.
Data File Formats
MATLAB Files (*.mat)
MATLAB binary files are the primary storage format for GCE tabular data sets. Data are organized as two types of structure variables: data structures (named "data") and stat structures (named "stats_all" or "stats_unflagged"). These variable types are described below:
GCE Data Structures
GCE Data Structures are multidimensional MATLAB® 6.x structure variables designed to store fully-documented tabular data sets (specifications). MATLAB functions in the GCE Data Toolbox provide a layer of abstraction, allowing users to work with information in data structures without requiring direct manipulation of the structure itself. Toolbox functions also programmatically preserve row correspondence between data columns, transfer metadata content when creating new structures, and transparently store function processing history information. This allows users to manipulate data structures without compromising their information quality.
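The behavior described above (row correspondence, metadata transfer, and processing history) can be illustrated with a minimal Python sketch. This is not the GCE Data Toolbox and does not reproduce the GCE Data Structure specification; the `DataTable` class, its field names, and the `select_rows` method are hypothetical and exist only to show the idea of manipulating a table through functions rather than by editing the structure directly.

```python
from dataclasses import dataclass, field


@dataclass
class DataTable:
    """Hypothetical stand-in for a documented tabular data set."""
    columns: dict                                   # column name -> list of values
    metadata: dict = field(default_factory=dict)    # descriptive information
    history: list = field(default_factory=list)     # processing steps applied so far

    def __post_init__(self):
        # Every column must describe the same set of rows.
        lengths = {len(values) for values in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")

    def select_rows(self, keep):
        """Return a new table with only the rows where keep[i] is true."""
        n_rows = len(next(iter(self.columns.values()), []))
        if len(keep) != n_rows:
            raise ValueError("row mask length must match the number of rows")
        n_kept = sum(bool(k) for k in keep)
        # The same mask is applied to every column, so rows stay aligned.
        new_columns = {
            name: [v for v, k in zip(values, keep) if k]
            for name, values in self.columns.items()
        }
        return DataTable(
            columns=new_columns,
            metadata=dict(self.metadata),  # metadata is copied to the new table
            history=self.history + [f"select_rows: kept {n_kept} of {n_rows} rows"],
        )


table = DataTable(
    columns={"site": ["A", "B", "A"], "salinity": [30.1, 28.4, 31.0]},
    metadata={"title": "example data set"},
)
subset = table.select_rows([True, False, True])
```

Because the caller never edits `columns` directly, the subset keeps its rows aligned across columns, carries the original metadata, and records how it was derived, while the original table is left unchanged.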
GCE Stat Structures
GCE Stat Structures are multidimensional MATLAB® 5.x structure variables designed to store statistical summary information for a single GCE Data Structure (specifications). This information will be used to summarize data sets and provide authentication information for data documentation.
Column statistics can be performed on all or only unflagged observations, either ungrouped or grouped by the values in one key column. Appropriate statistics are calculated according to the physical and logical data types of each column.
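The four combinations described above (all or unflagged observations, ungrouped or grouped by a key column) can be sketched in Python as follows. This is an illustration only, not the GCE Data Toolbox implementation: the `column_stats` function, the convention that an empty string means "unflagged", and the choice of count, minimum, maximum, and mean are assumptions, and the sketch handles numeric columns only rather than selecting statistics by data type.

```python
from collections import defaultdict
from statistics import mean


def column_stats(values, flags, groups=None, unflagged_only=False):
    """Summarize one numeric column, optionally per group.

    values: list of numbers, one per row
    flags:  list of flag strings, one per row ('' is assumed to mean unflagged)
    groups: optional list of key-column values, one per row
    """
    if groups is None:
        groups = ["all"] * len(values)   # ungrouped: a single bucket
    buckets = defaultdict(list)
    for value, flag, group in zip(values, flags, groups):
        if unflagged_only and flag:
            continue                     # skip flagged observations
        buckets[group].append(value)
    return {
        group: {"n": len(vals), "min": min(vals), "max": max(vals), "mean": mean(vals)}
        for group, vals in buckets.items()
    }


values = [10.0, 12.0, 50.0, 14.0]
flags = ["", "", "Q", ""]                # third observation carries a flag
sites = ["A", "A", "B", "B"]             # key column used for grouping

stats_all = column_stats(values, flags)
stats_unflagged = column_stats(values, flags, groups=sites, unflagged_only=True)
```

Here `stats_all` summarizes all four observations in one group, while `stats_unflagged` drops the flagged value and reports site A and site B separately.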
Text Files (*.txt, *.doc)
Text data files are secondary files generated from MATLAB data files to provide an open standards-based archive of the data and documentation. Data are stored as tab-delimited columns of numbers and text, formatted according to information in the MATLAB structures. Column descriptions, metadata, and summary statistics are provided in separate non-delimited documentation files (*.doc).
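As a rough illustration of this two-file layout, the Python sketch below writes a table as tab-delimited text and builds a separate plain-text documentation string. The `export_text` function, the header row, and the layout of the documentation text are assumptions made for the example; the actual GCE text and *.doc file layouts are not specified here.

```python
import csv
import io


def export_text(columns, units, metadata):
    """Return (data_text, doc_text) for a column-oriented table.

    columns:  dict of column name -> list of values (equal lengths assumed)
    units:    dict of column name -> unit string
    metadata: dict of descriptive key -> value pairs
    """
    names = list(columns)

    # Data file: one header row, then one tab-delimited line per table row.
    data_buf = io.StringIO()
    writer = csv.writer(data_buf, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    writer.writerows(zip(*(columns[name] for name in names)))

    # Documentation file: non-delimited text describing the data set.
    doc_lines = [f"{key}: {value}" for key, value in metadata.items()]
    doc_lines.append("Columns:")
    doc_lines += [f"  {name} ({units.get(name, 'unspecified')})" for name in names]
    return data_buf.getvalue(), "\n".join(doc_lines) + "\n"


data_text, doc_text = export_text(
    columns={"site": ["A", "B"], "salinity": [30.1, 28.4]},
    units={"site": "none", "salinity": "PSU"},
    metadata={"title": "example data set"},
)
```

Keeping the documentation out of the delimited file means the data can be read by any tool that understands tab-separated text, while the description stays readable on its own.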
This material is based upon work supported by the National Science Foundation under grants OCE-9982133, OCE-0620959 and OCE-1237140. Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.