Data Submission

Introduction

GCE investigators use a wide variety of commercial and custom software applications to process and store the data they collect for their GCE-sponsored research programs. Information required to document the data, such as descriptions of study characteristics, methodology, and data attributes, is also stored and managed in a variety of ways. The primary goal of the data submission process is to convert data and documentation from these various proprietary formats into a standard form, so that all GCE data sets can be managed and distributed consistently and integrated with other data sets to support data syntheses and cross-site comparisons. This is a formidable task, but data submission templates and data processing programs have been developed for the GCE program to make the process as convenient and efficient as possible.

After this standardization process is complete, data and documentation are managed using a multiple database strategy:

  • Data documentation (i.e. metadata) is primarily stored and organized in a relational database management system to support data set querying and cataloging, and to generate documentation in different formats to suit various requirements.

  • Data sets from individual research studies are primarily stored as descriptively named files in both structured binary and delimited ASCII text formats, with database-generated metadata stored along with the data table. These files are organized using a file management system approach, with files named as follows:

    [theme code]-[project and study type code]-[year and month].[extension]

    Example:

    File Name:  PLT-GCEM-0101.ASC

    Description:  plant ecology theme, GCE program monitoring study, submitted Jan. 2001, ASCII text

    Serial letters or letters and numbers are appended to the base file name to accommodate multiple submissions in the same month (e.g. PHY-GCEM-0111a.ASC).  The file names and locations are stored in the metadata database to support catalog generation and maintain linkages between metadata and data files.

  • Large data sets from long-term monitoring efforts are primarily stored as highly structured MATLAB files to support subsetting and integration, and secondarily as individual documented data sets generated from the database containing subsets of data based on spatial or temporal coverage (e.g. monthly or yearly data sets or summary data sets).
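
The file naming convention described above can be illustrated with a short sketch. This is not part of the GCE software; the function names are hypothetical, and the pattern is inferred only from the two example names given in the text (PLT-GCEM-0101.ASC and PHY-GCEM-0111a.ASC). It assumes upper-case theme and project codes, a two-digit year followed by a two-digit month, and an optional lower-case serial suffix.

```python
import re
from typing import NamedTuple

# Pattern inferred from the two example names in the text; the allowed
# characters and code lengths are assumptions, not a published GCE rule.
FILE_NAME_RE = re.compile(
    r"^(?P<theme>[A-Z0-9]+)-(?P<project>[A-Z0-9]+)-"
    r"(?P<yy>\d{2})(?P<mm>\d{2})(?P<serial>[a-z][a-z0-9]*)?"
    r"\.(?P<ext>[A-Za-z0-9]+)$"
)


class DataFileName(NamedTuple):
    theme: str          # e.g. "PLT" (plant ecology)
    project_study: str  # e.g. "GCEM" (GCE program monitoring study)
    year: int           # two-digit year exactly as written in the name
    month: int          # 1-12
    serial: str         # "" when the name has no serial suffix
    extension: str      # e.g. "ASC"


def parse_file_name(name: str) -> DataFileName:
    """Split a data file name into its coded parts (hypothetical helper)."""
    m = FILE_NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognized data file name: {name!r}")
    month = int(m["mm"])
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in data file name: {name!r}")
    return DataFileName(
        m["theme"], m["project"], int(m["yy"]), month, m["serial"] or "", m["ext"]
    )


def build_file_name(theme: str, project_study: str, year: int, month: int,
                    extension: str, serial: str = "") -> str:
    """Assemble a file name from its parts (hypothetical helper)."""
    return f"{theme}-{project_study}-{year % 100:02d}{month:02d}{serial}.{extension}"
```

For example, parse_file_name("PHY-GCEM-0111a.ASC") yields theme "PHY", project and study code "GCEM", year 1, month 11, and serial "a". The century is deliberately not inferred from the two-digit year, because the text does not say how it should be resolved.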

Note that the GCE Information System and data submission protocols will be revised as NSF and LTER requirements, user expectations and software tools evolve, so specific standards for the content, formatting, documentation, and submission of GCE data are necessarily subject to change.  Our goal is to make the entire process of data submission and access as convenient and efficient as possible for everyone, and contributor feedback is absolutely essential to meeting this goal. If any aspect of the data submission protocols seems inefficient, error-prone, or confusing, please discuss it with the Information Manager.

What to Submit

The primary purpose of the GCE database and data archive is to provide a long-term record of ecological observations to support data discovery and analysis over long temporal scales.  Consequently, data sets should predominantly consist of raw data from direct measurements or counts.  Derived and calculated parameters should only be included when they are essential to the interpretation of the data, such as when the raw data require proprietary calibration steps or the main properties of interest can only be measured indirectly (e.g. by change of an indicator solution or measurement of a reciprocal property like post-combustion mass).  Publication in the conventional scientific literature is usually a more appropriate outlet for detailed calculations and statistical analyses of the raw data.

When derived or calculated parameters are included in data sets, it is imperative that all information relevant to their calculation be included in the data documentation, including equations, descriptions of processing steps, and references.  It is also strongly recommended that initial measurements used in the calculations be included as well, to allow future analysts to reevaluate the data using different criteria or derive secondary information from the data values.

When the basic principles stated above are applied to actual research studies, however, the distinction between 'measured' and 'derived' values is often unclear and open to debate. This is particularly true of electronic instrument-based measurements, which are increasingly prevalent in modern science. The ultimate decision about what constitutes 'data' or 'calculations' and what information to submit rests with the investigator, but contributors are encouraged to consult with the Information Manager before submitting data for the first time to discuss appropriate strategies for classifying and documenting each parameter.

When to Submit

Data should always be submitted at the investigator's earliest convenience, to minimize the possibility of information loss as memories fade, data sheets become misplaced, and workers move on to other projects.  Time frames for data release (see Data Access) will be honored regardless of when the data are actually submitted.

As a general guideline, investigators are asked to submit monitoring data as soon as they are obtained and processed, and directed study data within six months of collection and analysis.

How to Submit

Investigators are strongly encouraged to schedule a meeting with the Information Manager prior to preparing their data and documentation for initial submission.  Many potential content and formatting issues can easily be avoided, saving everyone time and trouble.

Specific guidelines for preparing data for submission are presented in the Data Format Guidelines and Data Documentation sections of the GCE-IS guide. Web-based data submission applications may be provided in the future, but spreadsheet-based and plain text data submission templates have been prepared as an immediate solution. Contributors who use Microsoft Excel or Sun OpenOffice are encouraged to prepare submissions with the spreadsheet version of the template (.xls file), which streamlines metadata processing in the IM office.

» Download the GCE Data Submission Template

Once formatted, data files and documentation can be sent to the GCE Information Manager in either of the following ways:

  • Directly uploaded to the GCE server via the file upload form on the private GCE web site
  • Provided on portable storage media (e.g. USB thumb drive, portable hard drive, CD-R, DVD-R)

After your submission is received, the Information Manager will contact you to resolve any incomplete or missing details, discuss any requests for post-processing analyses, and establish a time frame for returning processed files for review.

This material is based upon work supported by the National Science Foundation under grants OCE-9982133, OCE-0620959 and OCE-1237140. Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.