GCE Data Search Engine
The GCE Data Search Engine is a GUI application for performing metadata-based searches to identify GCE Data Structures that meet various thematic, temporal, and geospatial criteria. Multiple search criteria can be defined by filling in text boxes, selecting items from lists or dragging rectangular bounds on maps, and then data sets matching the criteria are added to a cumulative search results list. Data sets in the results list can then be examined, loaded into various data analysis and plotting tools, exported in user-specified formats or integrated to form composite data sets.
In order to support searching, data files are first analyzed using a combination of metadata and data mining techniques to generate an optimized search index. GCE Data Structures stored in MATLAB files in any number of local directories can be indexed, and the generated indices can be saved and then re-loaded for subsequent search sessions for immediate start-up. Indices can also be refreshed at any time to remove entries for deleted files, update entries for changed files, and index any new files in previously-indexed directories.
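The refresh behavior described above can be illustrated with a minimal Python sketch. This is not the application's MATLAB implementation: the function name `refresh_index`, the path-keyed dictionary layout, and the use of file modification times to detect changed files are all assumptions made for illustration.

```python
import os


def refresh_index(index, directories):
    """Return a refreshed copy of `index`, a dict mapping file path -> entry.

    Hypothetical layout: each entry records the file's modification time
    under "mtime". A real search index would also hold the mined metadata.
    """
    refreshed = {}
    for directory in directories:
        for name in sorted(os.listdir(directory)):
            if not name.endswith(".mat"):
                continue  # only MATLAB files are candidates for indexing
            path = os.path.join(directory, name)
            mtime = os.path.getmtime(path)
            old = index.get(path)
            if old is not None and old["mtime"] == mtime:
                refreshed[path] = old  # unchanged file: keep the existing entry
            else:
                refreshed[path] = {"mtime": mtime}  # new or changed file: re-index
    # Entries for deleted files are never copied over, so they drop out.
    return refreshed
```

Rebuilding the dictionary from a directory scan, rather than editing the old index in place, handles all three cases (deleted, changed, and new files) in a single pass.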
With MATLAB 6.5 or higher, users can also download pre-generated indices of public data sets in the GCE Data Catalog and GCE Data Portal and merge them with local indices to support simultaneous searches of local and web-based data sets. When data files residing on the GCE web server are selected for analysis, the corresponding data structure is automatically retrieved and cached locally. The application therefore functions as a remote GCE data access client in addition to an end-user data management tool.
Various data set metadata fields can be searched, including title, keywords, abstract, methods, study descriptors, author, and taxonomic names. Searches can also be performed on study dates (either by date range or contained date), parameter names, study sites, and geographic bounding boxes. Negative criteria can be specified for textual search terms (e.g. -PAR) to exclude unwanted matches, and positive and negative criteria can be combined in fields accepting multiple terms (e.g. keywords) to fine-tune results.
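As a rough illustration of how positive and negative terms might combine, the following Python sketch filters a multi-term field such as keywords. The function names `parse_terms` and `matches` are hypothetical, and the matching rules (case-insensitive substring matching, with every positive term required) are assumptions; the application's actual rules may differ.

```python
def parse_terms(query):
    """Split a query string into (required, excluded) lowercase terms.

    A leading "-" marks a negative criterion, e.g. "-PAR".
    """
    required, excluded = [], []
    for token in query.replace(",", " ").split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            required.append(token.lower())
    return required, excluded


def matches(field_values, query):
    """True if every required term occurs in some value and no excluded term occurs in any.

    Assumption: case-insensitive substring matching.
    """
    required, excluded = parse_terms(query)
    values = [value.lower() for value in field_values]

    def found(term):
        return any(term in value for value in values)

    return all(found(t) for t in required) and not any(found(t) for t in excluded)
```

Under these assumptions, the query "salinity -PAR" keeps a data set whose keywords are Salinity and Temperature, and rejects one whose keywords are Salinity and PAR.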
Queries can optionally be saved to a query history list and then reloaded at any time for editing and re-execution. This feature allows users to build up standard queries which can be run against new or updated search indices. The query history window can also be hidden to make more room on screen for search results.
Working with Search Results
After every successful query, new data sets matching the specified criteria are added to a cumulative search result list. All information necessary to retrieve the corresponding data set is stored along with each entry, so search results are completely independent from search indices. In fact, result sets can be generated over multiple sessions using any number of different index files.
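The cumulative behavior can be pictured with a short Python sketch. The entry layout (a dictionary with hypothetical "location" and "title" keys) and the function name `add_results` are assumptions for illustration only. The point is that each entry carries its own retrieval information, so the list remains usable whichever index produced a given match.

```python
def add_results(results, new_matches):
    """Append matches not already in `results`; return the number added.

    Each entry is a dict with hypothetical keys "location" (a file path or
    URL sufficient to retrieve the data set) and "title".
    """
    seen = {entry["location"] for entry in results}
    added = 0
    for entry in new_matches:
        if entry["location"] not in seen:
            results.append(dict(entry))  # copy, so the list does not depend on the index
            seen.add(entry["location"])
            added += 1
    return added
```

Calling `add_results` repeatedly with matches from different index files grows a single list, and data sets that are already listed are skipped.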
Data sets in the search result list can also be opened in the Data Editor application for detailed examination and analysis (e.g. statistical analysis and re-sampling, sub-sampling, value filtering, unit conversions), or loaded into various data plotting and summarization tools. Multiple data sets can also be selected and simultaneously copied or exported in various text and MATLAB formats, or merged to create composite data sets, with user-specified QA/QC-flag handling and metadata format options. The application therefore provides users with convenient batch-processing capabilities that would otherwise require MATLAB scripting.
This material is based upon work supported by the National Science Foundation under grants OCE-9982133, OCE-0620959 and OCE-1237140. Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.