Document Details

Title: Mining and Integrating Data from ClimDB and USGS using the GCE Data Toolbox

Climate and hydrographic data are critically important for most long-term ecological studies, so integrated climate and hydrography databases such as ClimDB/HydroDB and the USGS National Water Information System (NWIS) represent major resources for ecologists. Both ClimDB/HydroDB and USGS also provide web-based query interfaces that offer convenient access to data and basic plots from many monitoring stations across the country. These centralized databases therefore significantly aid users with the first two phases of any data synthesis project: data discovery and data access. Data synthesis doesn't stop with the initial download, though, and many users I've worked with quickly become frustrated performing the remaining steps that are typically required. Common follow-up tasks include parsing and importing data into spreadsheets or analytical software, assigning or harmonizing attribute names and units, and integrating data from multiple stations for comparative analysis. Automating these operations is highly desirable, but it usually requires custom programming that is not practical for most researchers. Consequently, some students and researchers avoid data synthesis altogether, viewing it as too tedious or too difficult, while others request help with synthesis tasks from information management staff, adding to their workload.
As I've described in several prior DataBits articles (1,2,3), at GCE we have developed a flexible environment for metadata-based data transformation, quality control, analysis and integration using the multi-platform MATLAB programming language: the GCE Data Toolbox. This software was also used to develop an automated USGS data harvesting service for HydroDB that contributes near-real-time hydrographic data on behalf of 10 LTER sites to the ClimDB/HydroDB database on a weekly basis (4).
In the remainder of this article, I describe new data mining features recently added to this toolbox that allow users to interactively retrieve data from any station in ClimDB/HydroDB or the USGS NWIS (using MATLAB 6.5 or higher) and then easily transform and integrate these data sets to perform their own synthesis.
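The follow-up steps named above (harmonizing attribute names and units, then integrating records from multiple stations) can be illustrated with a minimal Python sketch. It is not the GCE Data Toolbox's MATLAB implementation. It assumes each source has already been parsed into a list of daily records, and the field names, alias table, and station labels are all hypothetical. One hypothetical source reports discharge in cubic feet per second, as USGS NWIS commonly does, and the other in cubic meters per second.

```python
from datetime import date

# Exact conversion factor: 1 cubic foot = 0.028316846592 cubic meters
CFS_TO_CMS = 0.028316846592

# Hypothetical mapping from source-specific field names to one common name
NAME_ALIASES = {"Discharge_cfs": "Discharge", "Q_daily": "Discharge"}


def harmonize(records, value_field, units):
    """Return (common_name, {date: value in m^3/s}) for one station's records."""
    if units not in ("ft3/s", "m3/s"):
        raise ValueError(f"unsupported units: {units}")
    factor = CFS_TO_CMS if units == "ft3/s" else 1.0
    name = NAME_ALIASES.get(value_field, value_field)
    return name, {r["date"]: r[value_field] * factor for r in records}


def join_stations(series_by_station):
    """Outer-join {date: value} series into date-sorted rows (None = missing)."""
    all_dates = sorted(set().union(*(s.keys() for s in series_by_station.values())))
    return [
        {"date": d, **{stn: s.get(d) for stn, s in series_by_station.items()}}
        for d in all_dates
    ]


# Usage with made-up records from two hypothetical stations
usgs_like = [
    {"date": date(2006, 1, 1), "Discharge_cfs": 100.0},
    {"date": date(2006, 1, 2), "Discharge_cfs": 120.0},
]
other_like = [{"date": date(2006, 1, 2), "Q_daily": 3.0}]

name_a, station_a = harmonize(usgs_like, "Discharge_cfs", "ft3/s")
name_b, station_b = harmonize(other_like, "Q_daily", "m3/s")
rows = join_stations({"station_A": station_a, "station_B": station_b})
for row in rows:
    print(row)
```

An outer join keeps dates reported by only one station and marks the gap as `None`, so missing values stay visible for comparative analysis instead of being silently dropped.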

Contributor: Wade M. Sheldon

Sheldon, W.M. Jr. 2006. Mining and Integrating Data from ClimDB and USGS using the GCE Data Toolbox. In: DataBits: An electronic newsletter for Information Managers: Spring 2006. Long Term Ecological Research Network, Albuquerque, NM.

Key Words: ClimDB, data integration, data mining, database, GCE Data Toolbox, HydroDB, LTER-IMC, MATLAB, USGS
File Date: 2006
Web Link: PDF file

This material is based upon work supported by the National Science Foundation under grants OCE-9982133, OCE-0620959, OCE-1237140 and OCE-1832178. Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.