
Data Distribution Formats

Data File Formats

GCE-LTER data sets are generally available in multiple file formats to suit various end-user requirements and software capabilities. The list below summarizes the main features of these formats and recommendations for common types of software. Technical details are described in the next section.

  • Spreadsheet (CSV) format (*.CSV)
    • text file in CSV (comma-separated value) format optimized for spreadsheet compatibility
    • contains a minimal file header listing the title, column names, column units, and column variable types (the metadata is provided separately)
    • recommended for users of spreadsheet and database applications
  • Text File format (*.TXT)
    • tab-delimited text file with a one-line header containing column titles only, as described in the corresponding EML metadata document
    • intended for downloading via data table distribution links in the EML document (i.e. eml/dataset/dataTable/physical/distribution/online/url)
    • also recommended for users of spreadsheet or database programs
  • Text Report format (*.RPT)
    • conventional text file containing a complete metadata header followed by a tab-delimited data table and one or more tab-delimited statistical reports
    • recommended for users of word processor and spreadsheet programs
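  • MATLAB (GCE Toolbox) format (*.MAT)
    • MATLAB binary file containing a GCE Data Structure, with metadata, QA/QC rules and qualifier flags stored alongside the data
    • recommended for users of the GCE Data Toolbox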
  • MATLAB (Variables) format (*-VARS.MAT)
    • conventional MATLAB data files, with data columns as named arrays and formatted metadata in a single character array (i.e. 'metadata')
    • recommended for MATLAB users who have not downloaded the GCE Data Toolbox program
  • Zip archive (*.ZIP)
    • Zip archive containing an ESRI ArcGIS shapefile, raster GIS data or data in other specialized data formats as described
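Because the .TXT format uses a single tab delimiter and a one-line header of column titles, a general-purpose CSV parser can read it directly. The Python sketch below uses the standard `csv` module on a hypothetical three-column table (the column names and values are invented for illustration). Every value comes back as a string, so any numeric conversion should follow the variable types given in the EML metadata.

```python
import csv
import io

def read_gce_txt(text):
    """Parse tab-delimited text whose first line holds the column titles.

    Returns (column_titles, rows), where each row is a dict of strings.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows

# Hypothetical sample; real column titles are listed in the EML metadata.
sample = (
    "Date\tTemp_Water\tSalinity\n"
    "2003-10-01\t24.1\t28.5\n"
    "2003-10-02\t23.8\t29.0\n"
)
columns, rows = read_gce_txt(sample)
print(columns)                     # ['Date', 'Temp_Water', 'Salinity']
print(float(rows[0]["Salinity"]))  # 28.5
```

For a downloaded file, the same `csv.DictReader(..., delimiter="\t")` call can be given a file opened with `open(path, newline="")` instead of a `StringIO` object.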

Technical Format Specifications

Tabular GCE-LTER data sets are primarily stored as structured MATLAB data files with encapsulated documentation (metadata), QA/QC rules and qualifier flags to support data analysis, transformation and formatting. Custom MATLAB programs in the GCE Data Toolbox are used to export the data sets in a wide variety of ASCII and MATLAB formats with documentation formatted in various styles.

Technical specifications for GCE distribution file formats are described in the table below. Contact the GCE Information Manager to request data in a custom format or to request a basic statistical treatment of the data (e.g. statistics for values aggregated by one or more key columns).

  • .CSV
    • File Type: Text (ASCII)
    • File Format: comma-delimited text in comma-separated value (CSV) spreadsheet format, with text containing commas wrapped in double quotes
    • Data Delimiter: single comma
    • Documentation: 5-line file header containing data set title, column titles, units and variable types (complete documentation available in corresponding ESA-FLED metadata file)
    • Comments: CSV format is supported by most spreadsheet programs
  • .TXT
    • File Type: Text (ASCII)
    • File Format: tab-delimited text
    • Data Delimiter: single tab
    • Documentation: 1-line file header containing column titles (complete documentation available in corresponding EML metadata file)
    • Comments: this format is designed to support loading GCE data into database or statistical analysis software that does not support multi-line file headers
  • .RPT
    • File Type: Text (ASCII)
    • File Format: GCE data set report containing a metadata header, tab-delimited data table, and one or more tab-delimited statistical summary reports
    • Data Delimiter: single tab
    • Documentation: GCE metadata in compact format without project-level descriptors (pre-pended to the data table as a file header)
    • Comments: relevant column statistics appended below the data table, preceded by brief descriptive headings
  • .MAT
    • File Type: MATLAB 6.5 binary
    • File Format: GCE Data Structure
    • Data Delimiter: N/A
    • Documentation: metadata is stored in the structure as an nx3 cell array of strings (category, field, value triplets) for automated updating and parsing
    • Comments: this format is documented in the GCE Information System Guide
  • -VARS.MAT
    • File Type: MATLAB 5.3 binary
    • File Format: standard MATLAB arrays
    • Data Delimiter: N/A
    • Documentation: complete GCE metadata in ESA-FLED numbered outline style (stored as formatted text in a character array named 'metadata')
    • Comments: conventional MATLAB data file, with data columns and flag columns as individual numerical arrays or cell arrays of strings (as appropriate)
  • -META.TXT
    • File Type: Text (ASCII)
    • File Format: documentation (metadata) for standard CSV text distribution file
    • Data Delimiter: N/A
    • Documentation: complete GCE metadata in ESA-FLED numbered outline style
    • Comments: file does not contain the data table
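The nx3 (category, field, value) layout used for metadata in .MAT files lends itself to simple programmatic lookup and updating. The sketch below models that layout as a list of Python tuples rather than reading an actual .MAT file; the category and field names, and the helper functions `lookup` and `update`, are hypothetical and are not part of the GCE Data Toolbox.

```python
# Hypothetical triplets mirroring an nx3 (category, field, value) metadata array.
# The category and field names are illustrative only.
metadata = [
    ("Dataset", "Title", "Hypothetical water temperature record"),
    ("Dataset", "Version", "1.1"),
    ("Status", "Update", "Monthly"),
]

def lookup(meta, category, field, default=""):
    """Return the value stored for (category, field), or default if absent."""
    for cat, fld, val in meta:
        if cat == category and fld == field:
            return val
    return default

def update(meta, category, field, value):
    """Return a new triplet list with (category, field) set to value.

    Existing entries are replaced in place; a missing pair is appended.
    """
    out = []
    found = False
    for cat, fld, val in meta:
        if cat == category and fld == field:
            out.append((cat, fld, value))
            found = True
        else:
            out.append((cat, fld, val))
    if not found:
        out.append((category, field, value))
    return out

revised = update(metadata, "Dataset", "Version", "1.2")
print(lookup(revised, "Dataset", "Version"))   # 1.2
print(lookup(metadata, "Dataset", "Version"))  # 1.1 (original list unchanged)
```

Keeping each entry as an independent triplet means a parser never has to understand the metadata's display layout, which is presumably what makes the stored form suitable for automated updating.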

GCE File Versioning Information

All files in the GCE Data Catalog are assigned version numbers to assist users in identifying whether changes have been made to a data set since its original release. Version numbers are assigned according to the following scheme:

[Major Version].[Minor Version]  (with 1.0 as the first release)

'Major Version' is the primary version of the data set, based on composition of the data table and metadata descriptors. Updates to this number indicate significant changes to the data set, such as addition or removal of data columns or significant changes to data values or metadata descriptors; users who have downloaded an earlier version are encouraged to download the update and re-evaluate any analyses based on the data.

Changes to 'Minor Version' usually represent refinement of metadata descriptors or reformatting of data sets and metadata to comply with new display standards. Users are encouraged to look at the data set processing history listed in the updated metadata to determine whether or not the update is important to their work, but re-downloading the data set is usually not necessary.
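The distinction between major and minor updates can be expressed as a simple comparison of two version numbers. This sketch assumes that both parts of a version are non-negative integers compared numerically (so 1.10 would follow 1.9), an assumption the scheme above does not state explicitly; the function names and returned messages are illustrative only.

```python
def parse_version(version):
    """Split a '[Major].[Minor]' string such as '1.1' into integers (1, 1)."""
    major, minor = version.split(".")
    return int(major), int(minor)

def classify_update(old, new):
    """Suggest how to respond when a data set moves from version old to new."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        return "no update"
    if new_v[0] > old_v[0]:
        # Major change: columns, data values or descriptors changed significantly.
        return "major: download the update and re-evaluate analyses"
    # Same major version: usually metadata refinement or reformatting.
    return "minor: review the processing history in the updated metadata"

print(classify_update("1.0", "1.1"))
```

Comparing tuples of integers rather than the raw strings avoids the lexical trap where "1.10" would sort before "1.9".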

As of October 2003, major and minor version numbers are appended to all GCE data and metadata file names (e.g. PHY-GCEM-0310a1_1_1.MAT for PHY-GCEM-0310a1 version 1.1). This change was made to improve data file version control in GCE archives, and to simplify end-user assessment of data set lineage and updates.
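The version-suffixed file names can also be split apart mechanically. The sketch below handles only names of the form shown in the example (data set ID, then `_major_minor`, then an extension); how the version suffix combines with the -VARS.MAT and -META.TXT suffixes is not covered, and the pattern and function name are hypothetical.

```python
import re

# Hypothetical pattern for names like PHY-GCEM-0310a1_1_1.MAT:
# data set ID, then _<major>_<minor>, then a file extension.
VERSIONED_NAME = re.compile(
    r"^(?P<dataset>.+)_(?P<major>\d+)_(?P<minor>\d+)\.(?P<ext>[A-Za-z]+)$"
)

def parse_versioned_name(filename):
    """Return (data_set_id, major, minor, extension) for a versioned file name."""
    m = VERSIONED_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a versioned GCE file name: {filename!r}")
    return m["dataset"], int(m["major"]), int(m["minor"]), m["ext"]

print(parse_versioned_name("PHY-GCEM-0310a1_1_1.MAT"))
# ('PHY-GCEM-0310a1', 1, 1, 'MAT')
```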

Data Update Notification

As a service to project participants and the scientific community, update notifications are sent via email whenever changes are made to a data set listed in the data catalog. Announcements are sent to project participants using the GCE-LTER mailing list and to individual email addresses of public users who have downloaded an earlier version of the data set and requested update notification on the data access form.


This material is based upon work supported by the National Science Foundation under grants OCE-9982133, OCE-0620959 and OCE-1237140. Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.