
Data Submission

Introduction

GCE investigators use a wide variety of commercial and custom software applications to process and store the data they collect for their GCE-sponsored research programs. Information required to document the data, such as descriptions of study characteristics, methodology, and data attributes, is also stored and managed in a variety of ways. The primary goal of the data submission process is to convert data and documentation from these various proprietary formats into standard forms, so that all GCE data sets can be managed and distributed in a consistent manner and combined with other data sets to support data syntheses and cross-site comparisons.  This is a formidable task, but data submission templates and data processing programs have been developed for the GCE program to make the process as convenient and efficient as possible.

After this standardization process is complete, data and documentation are managed using a multiple database strategy:

  • Data documentation (i.e. metadata) is primarily stored and organized in a relational database management system to support data set querying and cataloging, and to generate documentation in different formats to suit various requirements.

  • Data sets from individual research studies are primarily stored as descriptively named files in both structured binary and delimited ASCII text formats, with database-generated metadata stored along with the data table.  These files are organized using a file management system approach, with files named as follows:

    [theme code]-[project and study type code]-[year and month].[extension]

    Example:

    File Name:  PLT-GCEM-0101.ASC

    Description:  plant ecology theme, GCE program monitoring study, submitted Jan. 2001, ASCII text

    Serial letters, or letters and numbers, are appended to the base file name to accommodate multiple submissions in the same month (e.g. PHY-GCEM-0111a.ASC).  The file names and locations are stored in the metadata database to support catalog generation and maintain linkages between metadata and data files.

  • Large data sets from long-term monitoring efforts are primarily stored as highly structured MATLAB files to support subsetting and integration, and secondarily as individual documented data sets generated from the database containing subsets of data based on spatial or temporal coverage (e.g. monthly or yearly data sets or summary data sets).
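
The file naming convention described above can be illustrated with a short parsing sketch. This is not part of the GCE software; the function name, the field names, and the exact pattern are assumptions made for illustration. In particular, the guide does not state the allowed lengths of the theme and project codes, so the pattern accepts any run of uppercase letters, and the year is kept as the two digits that appear in the file name.

```python
import re
from typing import NamedTuple, Optional


class GCEFileName(NamedTuple):
    """Parts of a file name such as PLT-GCEM-0101.ASC (field names are hypothetical)."""
    theme: str          # theme code, e.g. "PLT"
    project_study: str  # project and study type code, e.g. "GCEM"
    year2: int          # two-digit year exactly as written in the name
    month: int          # month number, 1 through 12
    serial: str         # optional serial suffix, "" when absent
    extension: str      # file extension, e.g. "ASC"


# Assumed pattern: [theme]-[project and study type]-[YYMM][serial].[extension]
# The serial suffix must start with a letter ("letters or letters and numbers").
_PATTERN = re.compile(
    r"(?P<theme>[A-Z]+)-(?P<project_study>[A-Z]+)-"
    r"(?P<yy>\d{2})(?P<mm>\d{2})(?P<serial>[A-Za-z][A-Za-z0-9]*)?"
    r"\.(?P<ext>[A-Za-z0-9]+)"
)


def parse_gce_file_name(name: str) -> Optional[GCEFileName]:
    """Return the parsed parts of a file name, or None if it does not fit the pattern."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        return None
    month = int(match.group("mm"))
    if not 1 <= month <= 12:
        return None  # reject impossible months such as "13"
    return GCEFileName(
        theme=match.group("theme"),
        project_study=match.group("project_study"),
        year2=int(match.group("yy")),
        month=month,
        serial=match.group("serial") or "",
        extension=match.group("ext"),
    )
```

With this sketch, parse_gce_file_name("PHY-GCEM-0111a.ASC") returns theme "PHY", month 11, and serial "a", while a name that does not follow the convention returns None.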

Note that the GCE Information System and data submission protocols will be revised as NSF and LTER requirements, user expectations, and software tools evolve, so specific standards for the content, formatting, documentation, and submission of GCE data are necessarily subject to change.  Our goal is to make the entire process of data submission and access as convenient and efficient as possible for everyone, and contributor feedback is essential to meeting this goal. If any aspect of the data submission protocols seems inefficient, error-prone, or confusing, please discuss it with the Information Manager.

What to Submit

The primary purpose of the GCE database and data archive is to provide a long-term record of ecological observations to support data discovery and analysis over long temporal scales.  Consequently, data sets should predominantly consist of raw data from direct measurements or counts.  Derived and calculated parameters should only be included when they are essential to the interpretation of the data, such as when the raw data require proprietary calibration steps or the main properties of interest can only be measured indirectly (e.g. by change of an indicator solution or measurement of a reciprocal property like post-combustion mass).  Publication in the conventional scientific literature is usually a more appropriate outlet for detailed calculations and statistical analyses of the raw data.

When derived or calculated parameters are included in data sets, it is imperative that all information relevant to their calculation be included in the data documentation, including equations, descriptions of processing steps, and references.  It is also strongly recommended that initial measurements used in the calculations be included as well, to allow future analysts to reevaluate the data using different criteria or derive secondary information from the data values.

When the basic principles stated above are applied to actual research studies, however, the distinction between 'measured' and 'derived' values is often unclear and open to debate.  This is particularly true in the case of electronic instrument-based measurements, which are increasingly prevalent in modern science.  The ultimate decision about what constitutes 'data' or 'calculations' and what information to submit rests with the investigator, but contributors are encouraged to consult with the Information Manager prior to submitting data for the first time to discuss appropriate strategies for classifying and documenting each parameter. 

When to Submit

Data should always be submitted at the investigator's earliest convenience, to minimize the possibility of information loss as memories fade, data sheets become misplaced, and workers move on to other projects.  Time frames for data release (see Data Access) will be honored regardless of when the data are actually submitted.

As a general guideline, investigators are asked to submit monitoring data as soon as they are obtained and processed, and directed study data within six months of collection and analysis.

How to Submit

Investigators are strongly encouraged to schedule a meeting with the Information Manager prior to preparing their data and documentation for initial submission.  Many potential content and formatting issues can easily be avoided, saving everyone time and trouble. Specific guidelines for preparing data for submission are presented in the Data Format Guidelines and Data Documentation sections of the GCE-IS guide.

Note that, as of October 2018, data set documentation (metadata) must be provided separately online before the data files are submitted. Go to Data Products > Data Submission > Submit Data (https://gce-lter.marsci.uga.edu/private/app/add_dataset.asp) to register the metadata for your data set. Metadata from previously submitted data sets can be used as a template for new submissions from long-term studies to save effort re-entering content. "Copy" links are provided on the View Submissions page (https://gce-lter.marsci.uga.edu/private/app/view_submission.asp), and "Copy as Template" links are provided on the Data Set Summary pages of the GCE Data Catalog for this purpose.

A Microsoft Excel spreadsheet template is provided for formatting and describing tabular and non-tabular data for submission to the GCE Information Management Office for archiving. For tabular data (e.g. spreadsheets, non-digital data sheets, simple comma- or space-delimited logger files), the data values and column information should be entered or pasted into the "Tabular Data" worksheet of the template unless prior arrangements are made with IM staff for parsing data from specialized formats (e.g. raw data logger files, real-time data telemetry, or lab-specific storage formats). For specialized tabular or non-tabular data (e.g. GIS files, raster imagery, genomics data), the data file(s) can be described using the "Non-Tabular Data" worksheet, and the template can be uploaded along with the data files.

Completed templates and data files (if provided separately) should be uploaded to the GCE IM office using the "Add Files" links for the corresponding data set metadata on the Private GCE Web Site (https://gce-lter.marsci.uga.edu/private/app/view_submissions.asp). For large files (>200MB), please contact the GCE IM Office for alternative transfer options, such as SFTP or hard drive shipping.
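
Contributors preparing many files may find it convenient to check sizes before uploading. The following is a minimal sketch, not a GCE tool: the function name is hypothetical, and it assumes that 200MB means 200,000,000 bytes (the guide does not say whether decimal or binary megabytes are intended).

```python
import os
from typing import Iterable, List

# Assumption: 1 MB = 1,000,000 bytes; the guide only states ">200MB".
LIMIT_BYTES = 200 * 1000 * 1000


def files_needing_alternative_transfer(
    paths: Iterable[str], limit_bytes: int = LIMIT_BYTES
) -> List[str]:
    """Return the paths whose size on disk exceeds the upload limit."""
    return [path for path in paths if os.path.getsize(path) > limit_bytes]
```

Any file returned by this check would be a candidate for the alternative transfer options mentioned above rather than the "Add Files" upload.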

» Download the GCE Data Submission Template

After your submission is received, the Information Manager will contact you to confirm any incomplete or missing details, discuss any requests for post-processing analyses, and establish a time frame for returning processed files for review.


This material is based upon work supported by the National Science Foundation under grants OCE-9982133, OCE-0620959 and OCE-1237140. Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.