Format Guidelines for Tabular Data

Data Table Formatting

Data tables can be submitted as ASCII text files or spreadsheet files in Microsoft Excel or Lotus 1-2-3 format. The files should be given a descriptive name, such as the investigator's name, study description, and date submitted, with the appropriate extension for the file format (e.g. Sheldon_weather_020101a.txt, Sheldon_hydro_020101.xls).

General Guidelines

The data table must be laid out as a rectangular "matrix" with an equal number of data values in each row.
Any missing observations should be represented by a missing value code, such as NaN for numerical columns and NaN or a user-defined code for text columns.
Each column in the data set should only contain one distinct type of information, so that columns represent variables and rows represent individual observations for each variable.
25-Apr-2008 each row should be separated by a single tab, single comma, or one or more spaces as field delimiters. The same delimiter should be used throughout the data table, and combinations of spaces and tabs or commas are not allowed (e.g. use 32.3,32.5 rather than 32.3, 32.5).
When column format headers are included (see below) different delimiters can be used on each row, but do not use combinations of delimiters within a single row to avoid parsing errors. Also note that 'Descriptions' must be delimited by tabs, commas, or semicolons to preserve spaces between words.
Individual text values should not contain any internal spaces or commas to prevent data parsing errors, although other characters, such as underscores, are allowed (if necessary, placeholder characters can be substituted for commas and spaces temporarily, and then replaced after data processing to restore the original values).
The data file should not contain any header or footer information, except for optional column format headers as described below.
If a word processor is used to generate the file, make sure that word wrap is disabled and that line break characters are included on every row.
Use a file extension of .txt or .asc to distinguish the file as ASCII and prevent confusion.

Spreadsheets

When submitting spreadsheets make sure that columns are formatted consistently and numerical precisions are set appropriately, because data will be processed as formatted (i.e. non-displayed digits will be lost).
It is a good practice to recalculate the spreadsheet then convert all formula cells to values to prevent update anomalies caused by complex formula logic or linked cells (e.g. copy cells, then paste values and formats only).
If date/time information is included in the table, make sure to inform the Data Manager and specify how this data should be processed. Spreadsheet programs often "coerce" date and time information into proprietary numerical date formats during data entry. Numerical date standards differ among programs and platforms so this is a potential source of data contamination. To avoid this problem, manually format date/time information in one of the following formats:

As individual columns containing integer date components (e.g. 02-Feb-2001 06:00PM as columns Year, Month, Day, Time with values 2001, 2, 2, 1800)
As fractional Julian days based on Jan. 1, 0000 as 1 (e.g. 730884.750)
As integer year (e.g. 2001) and Julian date/Year date (i.e. days since Jan. 1 of the same year, e.g. 32.75)

Documentation and column format information can be included in the same workbook as the data table, but generally avoid including extraneous worksheets, plots, graphical elements, or comments interspersed with data when submitting spreadsheets.

Data Column Format Information

Specific format information for each column in the data set must be provided by the contributor at the time of submission. This information is needed to complete the Data Structural Descriptors portion of the metadata, and to support online data analysis and dynamic document generation. Note that some format selections require corresponding information in other sections of the data documentation, as indicated.

The format information can be provided separately in list format, or elements can be included as header rows at the top of each data column with values delimited like the data values. When including header rows, precede the first value in each row with the attribute name and a colon (e.g. Names:Salinity,Temperature,Depth,...), and do not use more than one delimiter between values (e.g. do not include spaces when delimiting by commas). Note that individual column descriptions must be delimited by tabs, commas, or semicolons in order to preserve spaces between words. All attributes are required, except for Descriptions (Names will be substituted if omitted).

Attribute	Description and Formatting	Examples
Names	See Site standards (no internal spaces, symbols are discouraged, mixed-case text and underscores are supported)	Salinity,Stem_Count,Marsh_Zone
Descriptions	Description of each column in the table	Pore-water salinity,Number of Spartina stems in the plot,Marsh zone code
Units	Measurement units for data in each column (no internal spaces, symbols are allowed)	PSU,count,none
DataTypes	Physical data type of each column F = floating-point number I = integer number S = alphanumeric string	F,I,S
VariableTypes	Logical data type of each column Data (measured data values) Calculation (calculated values)¹ Nominal (categorical values)² DateTime (date and/or time values)³ Logical (Boolean values, e.g. 1/0, True/False) Description (textual observation) Code (strings matching items in a list)¹	Data,Data,Code
NumberTypes	Numerical data type of each column Continuous (continuous numbers) Discrete (discontinuous numbers) Angular (angular numbers, in degrees) None or N/A (not applicable, e.g. strings)	Continuous,Discrete,None
Precisions	Decimal places to display for each column (i.e. indicating decimal places to use when exporting values as text files, HTML tables)	2,0,0
FlagCriteria	Flag criteria expression for each column⁴	x<0='?';x>36='H',x<0='?',none

Equations, code lists, and flag character definitions are required in the data documentation. This information will be stored in the following section of the Metadata:

      Metadata Category IV: Data Structural Descriptors
           Equations in: Data_Calculations
           Code lists and flag character definitions in: Data_ValueCodes
           Flag criteria in: Data_FlagCriteria
Categorical values can be numbers or strings that identify some fixed property of the observation, such as site or plot, but aren't data values in themselves and won't be used for calculations or statistics (other than as grouping columns).
Date and time values can be expressed as numerical components (e.g. year, month, day, hour columns) or formatted strings (e.g. 09-Feb-2001) as appropriate for the data set - by default, the datetime variable type designation will prevent these columns from being included in statistical analyses regardless of data type.
Flag criteria are text expressions containing one or more equations per column defining flag characters to associate with numerical values that match the specified criteria. Flagging questionable data values is a powerful quality control technique, so criteria should be defined whenever possible. (Note: value flags are maintained separately from data in the MATLAB data structure, so flagged values are not removed from the data set by default. Criteria can be edited and reevaluated any time, and flags can be suppressed or displayed in various formats when the data set is saved as text or HTML).

Flag criteria syntax is as follows (or contact the Data Manager for assistance):
          x[conditional][value]='[flag character]' , where:
                    [conditional] is <, >, <=, >=, ~= (or <>), == (or =)
                    [value] is a numerical data value
                    [flag character] is any single text character, symbol, or digit

Multiple equations can be specified per column by using a semicolon to separate each one (overlapping criteria are supported, allowing flag characters to be combined when more than one criteria is matched)

For example: x<0='L';x>30='H' generates 'L' flags for negative values, 'H' flags for values over 30

(Note: contact the data manager for assistance assigning criteria for flagging coded values)

Appendix: Data types

Physical data type specifies how the values are represented and stored in the data set, whereas logical data type specifies the information content of the column (i.e. variable type or domain) and numerical data type specifies the numerical characteristics of the column. These three column attributes are used by GCE-LTER Data Structure tools to display and analyze values appropriately (e.g. statistics requiring ratio data, such as mean and standard error, will only be calculated on Floating-point numbers with logical types of Data or Calculation, whereas columns with any combination of data types can be enumerated or used for querying, sorting, or aggregating rows).

Supported combinations of these column attributes are listed in the following table, along with descriptions and common usage examples. Note that some logical data types can be represented by several physical data types, depending on the form of the data and requirements of the investigator.

Data Type Category			Description and Examples
Physical	Logical	Numerical	Description and Examples
Floating-point number	Data	Continuous	measurements of continuous variables (e.g. salinity, temperature, elapsed time)
	Data	Angular	measurements of directional parameters (e.g. wind direction)
	Calculation*	Continuous	calculations on continuous data (e.g. fractions, percentages, rates)
	Calculation*	Angular	calculations on directional parameters (e.g. wind direction)
	DateTime	Continuous	fractional numerical date values (e.g. fractional Julian day)
Integer number	Data	Discrete	count or abundance data
	Calculation*	Discrete	calculations on count data (e.g. min, max, sum, median)
	DateTime	Discrete	integer date components (e.g. year, month, day, hour)
	Nominal**	Discrete	digits representing fixed categories (e.g. sites, plots, zones)
	Logical**	Discrete	binary or numerical Boolean values (e.g. 1 = present, 0 = absent)
Text string	DateTime	NA or blank	formatted date or time strings (e.g. 01-Feb-2001)
	Logical**	NA or blank	Textual Boolean values (e.g. True/False, Present/Absent)
	Description	NA or blank	textual observation, taxonomic names (e.g. color, sex, appearance)
	Code**	NA or blank	alphanumeric code strings (e.g. species code, description code)

* equations must be included in the data documentation (Category IV: Data_Calculations)
** value definitions or code lists must be included in the data documentation (Category IV: Data_ValueCodes)

Data Formatting