Frequently Asked Questions

Installation and Licensing

Do I need to have a license for MATLAB to use this software?

Yes. The GCE Data Toolbox is a library of MATLAB functions and data files, and requires a licensed copy of MATLAB (version 7.9/R2009b or higher) to run.
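
If you are unsure whether your installation meets this requirement, the built-in verLessThan function can check the release, e.g.

if verLessThan('matlab','7.9')
   warning('MATLAB 7.9 (R2009b) or higher is required to run the GCE Data Toolbox')
end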

Can I use an open source MATLAB clone (e.g. Octave) to run the GCE Data Toolbox?

No. The GCE Data Toolbox leverages hundreds of built-in MATLAB functions and libraries that are not available in Octave and other alternative programs that support MATLAB-like syntax.

What kind of computer hardware do I need to run MATLAB and the GCE Data Toolbox?

MATLAB hardware requirements and recommendations are listed on the Mathworks web site. The GCE Data Toolbox does not impose any additional requirements beyond MATLAB. However, memory fragmentation on 32-bit Windows systems can limit the maximum array size available in MATLAB sessions to <100 MB even in systems with >3 GB of RAM, causing certain memory intensive Q/C and geographic operations to fail. Whenever possible, MATLAB should be run on 64-bit operating systems to avoid this fragmentation issue.
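
To see how fragmentation affects your session, MATLAB's built-in memory function (available on Windows only) reports the largest contiguous array that can be allocated, e.g.

userview = memory;                   %query MATLAB memory statistics (Windows only)
userview.MaxPossibleArrayBytes/1e6   %largest contiguous array available, in MB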

Does the GCE Data Toolbox require any specialized MATLAB toolboxes?

No, but it can take advantage of functions in the Database Toolbox to exchange data with relational database management systems, including MySQL, Postgres, Microsoft SQL Server and Oracle. Specialized functions in other toolboxes (e.g. Statistics, Curve Fitting, Signal Processing) could also be called when adding calculated columns, correcting sensor drift, filling missing values or in custom QA/QC rules.
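
For example, a Database Toolbox connection can be used to retrieve tabular query results for subsequent conversion to a GCE Data Structure (the data source name, credentials and query below are placeholders):

conn = database('my_datasource','my_username','my_password');           %connect to a registered ODBC/JDBC data source
records = fetch(conn,'SELECT Date, Temp_Air, Salinity FROM met_data');  %retrieve query results
close(conn)                                                             %close the database connection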

Do I need to add the GCE Data Toolbox directories to the permanent MATLAB path?

No. The needed directories will be added to the run-time path automatically when startup.m is called from the toolbox root directory. In fact, manually registering GCE Data Toolbox directories in the permanent path can lead to problems with future toolbox updates: to prevent duplicate entries, startup.m will not alter the path if any toolbox directories are already registered, so new sub-directories added in a toolbox update will never be registered.
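
If toolbox directories were previously saved to the permanent path, they can be removed so that startup.m can manage the run-time path on its own (the toolbox location below is an example):

rmpath(genpath('C:\MATLAB\GCE_Toolbox'))   %remove the toolbox directories from the search path
savepath                                   %save the revised path as the permanent path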

How can I start up the GCE Data Toolbox without changing my working directory?

The simplest way to accomplish this is to save the working directory in a variable, change to the toolbox directory to run startup.m, then change back to the original directory. These steps can be coded in a MATLAB Desktop shortcut similar to the following:

curpath = pwd;               %cache working directory
cd 'C:\MATLAB\GCE_Toolbox'   %change to toolbox directory
startup                      %run the startup script
cd(curpath)                  %change back to the original directory
clear curpath                %clear the curpath variable from memory
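
An alternative, assuming startup.m only depends on being run from the toolbox directory, is MATLAB's run command, which changes to the script's folder, executes it, and then changes back automatically:

run('C:\MATLAB\GCE_Toolbox\startup.m')   %execute startup.m from its own directory without permanently changing pwd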

Do I have to store my custom workflows in the toolbox /workflows directory?

No. You can store workflow functions or scripts and other toolbox extensions in any directory available in the MATLAB search path (use the path browser GUI or type 'path' in the command window to view the active path). In fact, the best practice is to store all custom and localized toolbox files in a directory outside of the default GCE Data Toolbox directories to streamline updating to new toolbox versions without losing your customizations.

To add local directories to the toolbox automatically on start up, create a text file named 'localpaths.txt' containing fully qualified paths appropriate for your system and save it to your default user path (type 'userpath' in the command window) or the top-level toolbox directory. List one directory per line with no quotes or delimiters, e.g.

C:\Users\Joe\Documents\MATLAB\workflows
C:\Users\Joe\Documents\MATLAB\myfunctions
\\DataServer\harvester\harvester-scripts

Note that any functions and .mat files stored in directories included in localpaths.txt will have higher priority than the default GCE Data Toolbox directories, so use distinct file names unless you intend to override native toolbox files with customized versions.
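
To check which copy of a file MATLAB will actually use when names collide, the which command with the -all option lists every match on the search path in priority order (myfunction below is a placeholder name):

which myfunction -all   %list all copies of myfunction on the search path, highest priority first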

Importing and Loading Data

What's the difference between loading and importing data, or saving and exporting data?

Loading and saving data in the Dataset Editor application (e.g. File > Load Data Structure) pertain to GCE Data Structures, which are the native MATLAB variables used for storing data, Q/C information and metadata (see the data model page for more information). The import and export commands are used to load or save data stored in other formats, such as CSV text files, MATLAB arrays or specialized exchange formats.

How do I import Campbell Scientific CR10x data into the GCE Data Toolbox?

Array-based data from Campbell CR10x loggers require special processing, because the files lack a header and each row may contain a different number of variables representing different measurements. A special function (csi2struct.m) is used to split arrays into separate data sets and then apply appropriate column metadata and QA/QC rules to each data set.

However, unlike other import filters, a special metadata file (csi2struct.mat) must be created prior to parsing the data file. This file is a GCE Data Structure containing metadata descriptions for each array field and boilerplate documentation metadata to apply to each array data set. It can be created using the GCE Data Toolbox (load /userdata/csi2struct.mat and view the data table as an example) or imported in CSV format from a spreadsheet. Detailed instructions will be included in a tutorial that is currently in development; in the meantime, contact Wade Sheldon for more information.
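
For example, the contents of the example metadata file can also be inspected from the command line (the relative path assumes your working directory is the toolbox root directory):

vars = load('userdata/csi2struct.mat');   %load the example array metadata file into a struct
disp(fieldnames(vars))                    %list the stored variable name(s) to identify the metadata structure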

Why do my data structure files take so long to load or save?

Recent versions of MATLAB default to saving .mat files using the version 7 (-v7) or 7.3 (-v7.3) MAT-file format, which compresses arrays and supports Unicode character sets. The system overhead involved in compressing and decompressing arrays can dramatically slow down loading and saving of large data sets. For example, a 20 MB file may actually represent a 300 MB data set once decompressed.

If storage space and network bandwidth are not limiting, you can change the default MAT-file format to Version 5 or later (-v6) by opening the Preferences dialog from the HOME tab of the MATLAB Desktop, then choosing MATLAB > General > MAT-Files and selecting MATLAB Version 5 or later.
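
Alternatively, assuming you only need the faster format for specific files, the format can be specified per call with the save command (the file and variable names below are examples):

save('my_dataset.mat','data','-v6')   %save the variable 'data' in the uncompressed Version 5 MAT-file format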

How do I enable the Data Turbine import tools?

The Data Turbine integration functions and 'File > Import Data > Data Turbine Channel Data (WWW)' import wizard are enabled automatically if the Data Turbine MATLAB support functions are present in the MATLAB search path. See the Data Turbine Integration page for more information.

Using the Toolbox

Do I have to use the GUI applications to process data using the GCE Data Toolbox?

No. In fact the toolbox was originally developed as a library of scriptable command-line functions and the GUI applications were added later to simplify use of the toolbox by non-programmers. Every operation performed using a GUI application can typically be accomplished by calling a single high-level function, and custom functions and scripts can be written that leverage the core toolbox functions as an API to automate routine operations.

The /workflows and /demo directories of the toolbox distribution contain working examples, and a list of commonly used command-line functions is available on the Command-line API wiki page.
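
As a minimal command-line sketch (file and variable names are examples), a saved data structure can be loaded from a .mat file and subset with querydata() without any GUI interaction:

vars = load('met_station1.mat');     %load the saved data structure(s) into a struct of variables
fn = fieldnames(vars);               %look up the stored variable name(s)
s = vars.(fn{1});                    %extract the first stored data structure
s_sub = querydata(s,'Year = 2013');  %subset rows using a query statement (see the querydata examples below)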

My metadata template is not being applied when I import a logger file. What's wrong?

This problem often occurs when there is a mismatch in spelling between the column name in the logger file and the variable entry in the template attribute metadata (e.g. 'WindSpd' vs. 'WindSpeed'), or if the template name is misspelled when the import filter is called in a workflow (e.g. 'MetStation1' vs. 'Met Station 1'). Most import filters will apply a default template or determine the data types and precisions of variables automatically if a template or a variable is not matched, so these spelling mismatches typically do not produce an error.

Can I use the same metadata template for multiple stations with different sensors?

Generally, yes. Variables in raw data files are matched to template attribute metadata by name, so you can include more attributes in a template than exist in any individual data file. All variables with the same name will share the same attributes and QA/QC rules. Variable names with alternative spellings in different logger files can also be "mapped" to the same column name and attribute metadata by copying the relevant row in the template attribute editor spreadsheet, then revising the Variable field to correspond to each logger file variable name variant (e.g. 'AirTemp', 'Temp' and 'Temp_C' all mapped to 'Temperature_Air' in the processed data set with the same attribute characteristics).

The major exception is that if the same variable name is used in data files for columns that have incompatible data types or units, then multiple templates must be used (e.g. 'Date' stored as a string in YYYY-MM-DD hh:mm:ss format in one file and mm/dd/yyyy format in another, or 'Site' stored as a string in one file and an integer in another).

How can I split a large time-series data set into multiple data sets by year?

Data sets can be split into subsets by year or by any combination of column values using the querydata() command-line function or the ui_querybuilder GUI (invoked from the Data Editor using Tools > Filtering > Filter/Subset Data by Column Values...).

The following steps split a time-series data set into yearly subsets using the Query Builder GUI:

  1. If your data set has a numeric Year column, double click on that entry in the column list (or select it and click the "Copy Column..." button)
  2. Select "=" as the comparison qualifier
  3. Type the desired year in the criteria field to the right of the comparison qualifier drop-down (or use the "Pick" button and select the year from the range)
  4. Click on "Evaluate" to open the filtered data in a new Data Editor window, then save the data as a separate file
  5. To split the data set into multiple yearly files, uncheck the "Close dialog after export" option and repeat 1-4 for each year, adjusting the year criteria accordingly

If the data set does not have a Year column, you can also filter on a numeric Date column (i.e. MATLAB serial dates) by adding two entries for the Date column and specifying starting (e.g. >= 1/1/2013) and ending (< 1/1/2014) criteria, respectively. Note that date strings you enter will automatically be converted to MATLAB serial dates by the GUI. You can also filter on string dates using the CONTAINS qualifier (e.g. Date CONTAINS '-2013').

To subset data from the command line or in a workflow you can call querydata() directly using query statements like those generated by the Query Builder, e.g.

data2012 = querydata(data,'Year = 2012');
data2013 = querydata(data,'Year = 2013');

or for numeric serial dates:

data2012 = querydata(data,'Date >= datenum(''1/1/2012'') and Date < datenum(''1/1/2013'')');
...

and for string dates:

data2012 = querydata(data,'contains(Date,''-2012'')');
...
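
To split a multi-year data set into yearly files in a workflow, the single-output querydata() call shown above can be wrapped in a loop (years and file names are examples, and the sketch assumes querydata() returns an empty result when no rows match a given year):

for yr = 2012:2014
   s_yr = querydata(data,sprintf('Year = %d',yr));  %subset rows for one year
   if ~isempty(s_yr)                                %assumes querydata returns empty if no rows match
      save(sprintf('data_%d.mat',yr),'s_yr')        %save the yearly subset to a year-specific .mat file
   end
end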

Data Harvesting

How do I make the XML/HTML output from data_harvester.m appear like my website for production use?

Customizing the XML/HTML output appearance requires 4 steps:

  1. Copy the following files from /demo to a local, web-accessible directory on your web server:
    • harvest_webpage.xsl
    • harvest_webpage.css
    • harvest_info.xsl
    • harvest_details.xsl
    • harvest_plots.xsl
  2. Open harvest_webpage.xsl (the master web scaffolding stylesheet) in a text or XML editor and revise as necessary to match your website. The easiest way to do this is to copy your website HTML into the relevant sections of the <head> and <body> blocks, making sure to include the <xsl:call-template...> instructions in the appropriate portions of <body>, and add your CSS and JavaScript links in the <head> section. Optionally customize harvest_webpage.css to revise coloring and other aspects of the harvest pages to match your website.
  3. Revise the local copies of harvest_info.xsl, harvest_details.xsl and harvest_plots.xsl to include links to your custom harvest_webpage.xsl in the <xsl:import...> elements at the top of the stylesheets.
  4. Add custom profile entries to demo/harvest_info.m and demo/harvest_plot_info.m that include links to your custom versions of the above stylesheets, using the demo profile as a guide.

Help and Support

How do I report a bug or request a new feature?

Unfortunately, the GCE Data Toolbox Trac website is no longer online for adding support tickets, so the best way to report a bug or request a new toolbox feature is to email the lead developer (Wade Sheldon) directly. You can also send an email to the peer-support mailing list for advice from the user community on how to correct or work around the issue.