Data Documentation

Introduction

The usefulness and longevity of any data archive are largely determined by the completeness and consistency of the data documentation, or metadata.  The metadata describe the physical and logical structure of a data set, the hypotheses and methodology behind it, and the researchers responsible for its creation.

Fully documenting a data set is a daunting and time-consuming process, and in some cases may require more effort than producing the data set itself.  Consequently, developing efficient procedures for entering, updating, and organizing metadata remains a top priority in the design and implementation of the GCE Information System.

Submitting Metadata

Web forms for submitting data documentation directly to the GCE Metadata Database ("Metabase") are under development. When completed, these forms will allow users authorized by each investigator to enter and update documentation for all data submissions contributed by that investigator.  To make this process as convenient as possible, users will be able to save and reopen incomplete submissions, and use prior submissions as templates for new submissions.  Optionally, an individual password and a "friendly name" can be assigned to each submission, to improve security and to make the submission easier to pick out from lists.

In the interim, investigators ready to submit data should use the Microsoft Excel template below for data submissions. The template contains fields and instructions for submitting documentation, data tables and details about non-tabular data (e.g. GIS files), and also contains reference material on GCE site standards for variable names and research themes.

GCE Data Submission Template

Investigators who do not have access to Excel should contact the Data Manager to request a text file listing all the metadata fields that the contributor needs to fill in.  Please also consult the GCE Site Standards page to review recommendations for naming variables and reporting units.  Following these guidelines is strongly encouraged to maximize the inter-comparability of data sets within the project.

Metadata Standards

Overview

Guidelines and standards for content and formatting of ecological metadata are constantly changing as the science of ecological informatics evolves and new applications are envisioned. In order to accommodate future changes in metadata standards, GCE metadata content is managed in a normalized relational database structure. Web applications and data processing programs dynamically query this database in order to provide custom-formatted metadata to suit various end-user requirements. Official ecological metadata standards currently supported by the GCE LTER project are listed below.
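The approach described above, storing metadata once in normalized tables and formatting it on demand, can be illustrated with a minimal Python sketch. The table names, column names, accession value and formatter functions below are hypothetical and chosen only for illustration; they do not reflect the actual GCE Metabase schema or software.

```python
import sqlite3
from html import escape


def build_demo_db():
    """Create an in-memory database with a hypothetical normalized layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (id INTEGER PRIMARY KEY, accession TEXT, title TEXT);
        CREATE TABLE descriptor (
            dataset_id INTEGER REFERENCES dataset(id),
            category TEXT, field TEXT, value TEXT
        );
        """
    )
    # Placeholder content only; not real GCE metadata.
    conn.execute("INSERT INTO dataset VALUES (1, 'DEMO-0001', 'Example data set')")
    conn.executemany(
        "INSERT INTO descriptor VALUES (?, ?, ?, ?)",
        [
            (1, "Data Set Descriptors", "Abstract", "Placeholder abstract"),
            (1, "Research Methods", "Method", "Placeholder method"),
        ],
    )
    return conn


def fetch_metadata(conn, accession):
    """Query all descriptors for one data set, in insertion order."""
    return conn.execute(
        """SELECT d.category, d.field, d.value
           FROM descriptor d JOIN dataset s ON s.id = d.dataset_id
           WHERE s.accession = ?
           ORDER BY d.rowid""",
        (accession,),
    ).fetchall()


def format_text(rows):
    """Render the query result as plain text, one descriptor per line."""
    return "\n".join(f"{cat} / {field}: {value}" for cat, field, value in rows)


def format_html(rows):
    """Render the same query result as an HTML list."""
    items = "".join(
        f"<li><b>{escape(field)}</b>: {escape(value)}</li>" for _, field, value in rows
    )
    return f"<ul>{items}</ul>"
```

Because both formatters consume the same query result, supporting a new metadata standard in this sketch would mean adding another formatter rather than restructuring the stored content.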

LTER Metadata Standard (EML)

The LTER Network adopted the Ecological Metadata Language (EML) standard in 2003 to support LTER Network Information System (NIS) objectives, NSF directives, and proposed grid computing initiatives. This standard requires that documentation for both geospatial and non-geospatial data be provided in computer-readable XML format based on the EML XSD schema. In order to support these initiatives, comprehensive metadata in EML 2.1.0 format is available for all data sets in the GCE Data Catalog.
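As a rough illustration of what computer-readable XML metadata looks like, the Python sketch below builds a very small EML-style document with the standard library. It is a simplified example, not GCE's export code: the function name, package identifier and "system" value are hypothetical, only a handful of elements are included, and the output has not been validated against the EML 2.1.0 schema.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by EML 2.1.0 documents.
EML_NS = "eml://ecoinformatics.org/eml-2.1.0"


def build_eml(package_id, title, creator_surname, abstract):
    """Return a minimal EML-style XML string (illustrative, not schema-validated)."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": package_id, "system": "demo"},  # hypothetical values
    )
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = creator_surname
    abstract_el = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract_el, "para").text = abstract
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_eml("demo.1.1", "Example data set", "Doe", "Placeholder abstract"))
```

A complete EML record carries many more elements (contacts, methods, attribute lists and so on), so the sketch should be read only as a picture of the general shape of the format.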

GCE Geospatial Metadata Standard

Metadata from GIS-based research projects will be generated by ESRI ArcInfo according to Federal Geographic Data Committee (FGDC) standards, or created using other FGDC-compliant tools such as NBII MetaMaker.  Geospatial metadata will be provided along with GIS data products in ASCII and HTML format.

GCE Non-geospatial Metadata Standard

Native GCE metadata for non-geospatial data, including most tabular data sets, is based on content and formatting standards recommended by the Ecological Society of America's Committee on the Future of Long-term Ecological Data (ESA-FLED).

The major categories of metadata descriptors are listed in the table below.  Click on hyperlinks to view the list of elements in the corresponding section of the Metadata Reference page.

Metadata Class | Information Contained | Source*
I. Data Set Descriptors | Elements that uniquely identify each data set and provide global search information (e.g. originators, abstract, key words) | C, DB
II. Research Origin Descriptors | Elements that describe the research program under which the data set was produced | DB
   A. Overall Project Description | General information about the affiliations, objectives, and funding of the overall research project | DB
   B. Sub-project Description | Specific information about study site characteristics, research design, and personnel involved in producing the data set | C, DB
      1. Site Description | Description of the study site | C, DB
      2. Experimental or Sampling Design | Details about the overall design of the study | C
      3. Research Methods | Details about the methods used in the study | C
      4. Personnel | List of all the personnel that participated in the study | C
III. Data Set Status and Accessibility | Status and accessibility of the data set | C, DB
   A. Status | Details about the storage and archival history of the data set | DB
   B. Accessibility | Details about accessing and citing the data set | C, DB
IV. Data Structural Descriptors | Details about the physical and logical structure of the data set, including variables, column formats, and code definitions | C, DB
   A. Data Set File | Details about the data file attributes | C, DB
   B. Variable Information | Details about the variables in the data set (i.e. columns) | C, DB
   C. Data Set Anomalies | Description of any anomalies noted in the data set | C
V. Supplemental Descriptors | Supplemental information about data processing software, usage history and resultant publications | C, DB

* Information source:
     C = Contributor
     DB = Database (i.e. automatically filled in from database fields)
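The source codes in the table can also be used to work out which metadata classes a contributor needs to supply. The Python sketch below transcribes the class names and source codes from the table; the dotted section labels (for example "II.B.1.") and the function names are conventions introduced here for illustration only.

```python
# (section label, set of information sources), transcribed from the table above.
# "C" = Contributor, "DB" = Database.
METADATA_CLASSES = [
    ("I. Data Set Descriptors", {"C", "DB"}),
    ("II. Research Origin Descriptors", {"DB"}),
    ("II.A. Overall Project Description", {"DB"}),
    ("II.B. Sub-project Description", {"C", "DB"}),
    ("II.B.1. Site Description", {"C", "DB"}),
    ("II.B.2. Experimental or Sampling Design", {"C"}),
    ("II.B.3. Research Methods", {"C"}),
    ("II.B.4. Personnel", {"C"}),
    ("III. Data Set Status and Accessibility", {"C", "DB"}),
    ("III.A. Status", {"DB"}),
    ("III.B. Accessibility", {"C", "DB"}),
    ("IV. Data Structural Descriptors", {"C", "DB"}),
    ("IV.A. Data Set File", {"C", "DB"}),
    ("IV.B. Variable Information", {"C", "DB"}),
    ("IV.C. Data Set Anomalies", {"C"}),
    ("V. Supplemental Descriptors", {"C", "DB"}),
]


def sections_for(source):
    """Return every section that draws at least partly on the given source."""
    return [name for name, sources in METADATA_CLASSES if source in sources]


def contributor_only():
    """Return sections that rely entirely on the contributor (no database fill-in)."""
    return [name for name, sources in METADATA_CLASSES if sources == {"C"}]


if __name__ == "__main__":
    for name in contributor_only():
        print(name)
```

Listing the contributor-only sections highlights where a submission cannot be completed from database fields and must come from the investigator.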


This material is based upon work supported by the National Science Foundation under grants OCE-9982133, OCE-0620959 and OCE-1237140. Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.