
Documents - Publications - Newsletter Articles

    Sensor and sensor data management best practices released
Abstract - Rapid advances and decreasing costs in sensor technology, wireless communication, data processing speed, and data storage capacity have enabled widespread deployment of automated environmental sensing systems. Basic environmental processes can be monitored continuously in habitats ranging from very remote to urban, providing information at unprecedented temporal and spatial resolution. Although the research questions that may be answered with these data are very diverse (Porter et al. 2009), the design, establishment, and maintenance of most environmental sensor systems, and the resulting data handling, have many commonalities.
(contributed by Corinna Gries, 2014)
PDF file
    Using the GCE Data Toolbox as an EML-compatible workflow engine for PASTA
Abstract - The GCE Data Toolbox for MATLAB was initially developed in 2000 to process, quality control and document environmental data collected at the then-new Georgia Coastal Ecosystems LTER site (Sheldon, 2001). Development of this software framework has continued steadily since then, adding graphical user interface dialogs (Sheldon, 2002), data indexing and search (Sheldon, 2005), web-based data mining (Sheldon, 2006; Sheldon, 2011b), dynamic QA/QC (Sheldon, 2008), and a growing suite of tools for automating data harvesting and publishing (Sheldon et al., 2013; Gries et al., 2013). We began distributing a compiled version of the toolbox to the public in 2002, and in 2010 we released the complete source code under an open source GPL license (Sheldon, 2011a). Today, the GCE Data Toolbox is used at multiple LTER sites and other research programs across the world for a wide variety of environmental data management tasks, and we are actively working to make it a more generalized tool for the scientific community (Chamblee et al., 2013). The toolbox can be leveraged in many ways, but it has proven particularly useful for designing automated data processing, quality control and synthesis workflows (Sheldon et al., 2013; Cary and Chamblee, 2013; Gries et al., 2013). Key factors include broad data format support, a flexible metadata templating system, dynamic rule-based QA/QC, automated metadata generation and metadata-based semantic processing (Fig. 1). Consequently, the GCE Data Toolbox was one of the technologies chosen for a 2012 LTER NIS workshop convened to test the PASTA Framework for running analytical workflows (see http://im.lternet.edu/im_practices/data_management/nis_workflows). The lack of built-in support for EML metadata proved to be a significant barrier to fully utilizing this toolbox for PASTA workflows during the workshop; however, complete EML support has since been implemented.
This article describes how the GCE Data Toolbox can now be used as a complete workflow engine for PASTA and other EML-compatible frameworks.
(contributed by Wade M. Sheldon, 2014)
PDF file
    Integrating Open Source Data Turbine with the GCE Data Toolbox for MATLAB
Abstract - North Temperate Lakes LTER's streaming sensor data are being used as one of three "science experiments" in a NSF Software Infrastructure for Sustained Innovation (SI2) project led by Tony Fountain (CalIT2, UCSD). A major focus of this collaborative project is software integration in complex science environments. This involves strategic software development, systems integration, and testing through demonstration projects (i.e., science experiments). Major requirements for the software developed by this project include performance, usability, interoperability, and cyber-security. In addition to NTL LTER, these software products will be integrated into production research infrastructures at Purdue University, the University of Connecticut, and the Monterey Bay Aquarium Research Institute to answer important science questions, including: (1) What is the impact of uncertainty in the design of civil infrastructure? (2) How sensitive are ocean systems to pH changes? (3) What is the variability of lake metabolic parameters such as gross primary productivity and respiration? One goal of this collaboration is to make integrating the Open Source Data Turbine (OSDT) streaming data middleware with other environmental community software tools more robust and accessible to information managers. In the first project phase, the existing OSDT-MATLAB interface was improved by developing a toolkit (DTMatlabTK) of easy-to-use MATLAB functions for interacting with Data Turbine servers. Building on these improvements, code was developed to directly access data in OSDT using the GCE Data Toolbox for MATLAB (developed at Georgia Coastal Ecosystems LTER) to provide a robust, automated and configurable QA/QC environment for harvesting real-time sensor data.
The GCE Data Toolbox was then used to transform data to be compatible with the CUAHSI Observations Data Model (ODM, see Resources section below for links), and insert processed OSDT data into an ODM database to support an end-to-end workflow from NTL data buoys to a CUAHSI Hydrologic Information Server.
(contributed by Corinna Gries, 2013)
PDF file
    GCE and CWT Host Successful Workshop to Demonstrate, Improve, and Promote the Adoption of the GCE Data Toolbox for Matlab
Abstract - As the volume and diversity of ecological data grows, scientific discovery demands that ecological scientists and anthropologists develop common tools to solve common problems, so that data, as well as published literature, can be used to frame and envision next-generation research. From November 27-30, 2012, the Coweeta (CWT) and Georgia Coastal Ecosystems (GCE) information managers pursued this goal by leading a workshop on the GCE Data Toolbox for MATLAB. The workshop brought together information and data managers from 11 universities and federal agencies, a potentially critical step toward meeting the need for a common set of tools. It was organized so that attendees received not only hands-on instruction providing an introductory framework, but also a considerable amount of unstructured time in which they could interact with the software and its developer, using their own data to solve their own problems.
(contributed by John F. Chamblee, 2013)
PDF file
    Quality of biodiversity, not just quantity, is key: Right mix of species is needed for conservation
Abstract - A new study of biodiversity loss in a salt marsh finds that it's not just the total number of species preserved that matters; it's the number of key species. If humans want to reap the benefits of the full range of functions that salt marshes and other coastal ecosystems provide, we need to preserve the right mix of species.
(contributed by Brian R. Silliman, 2013)
Web link
    Automating Data Harvests with the GCE Data Toolbox
Abstract - As described in the Spring 2013 issue of Databits, infusions of funding from the ARRA award to the LTER Network (Chamblee, 2013a) plus an NSF SI2 grant to Tony Fountain and colleagues (Gries, 2013) allowed us to make quantum leaps in both the functionality and usability of the GCE Data Toolbox for MATLAB software in 2012-2013. Accompanying funding for user training and support also allowed us to introduce more potential users to this software, and to work directly with new and existing users to take full advantage of this tool (Chamblee, 2013a; Henshaw and Gries, 2013; Peterson, 2013). This intensive work on the toolbox not only resulted in major improvements to the software, but allowed us to develop critical user support resources (https://gce-svn.marsci.uga.edu/trac/GCE_Toolbox/wiki/Support) and establish an email list and Wiki pages to encourage ongoing peer support and collaboration. This process also provided the necessary momentum to remove remaining GCE-specific code from the main distribution and open the Subversion repository to public access, completing a 12-year transition from the toolbox being a proprietary GCE-LTER software tool to an open source community software package.
(contributed by Wade M. Sheldon, 2013)
PDF file
    Putting It Out There – Making the Transition to Open Source Software Development
Abstract - I have spent a significant portion of my scientific career developing and customizing computer software, both to process and analyze research data, and to build systems to disseminate these data to others. Throughout this time I did what the majority of scientists do, and kept this code mostly to myself. There were many reasons for my closed development approach, from the practical ("the code isn't sufficiently documented for someone else to use") to the paranoid ("I don't have time to answer questions or help people use it") to the proprietary ("why should I give away my hard work for free"). But looking back, one of the primary drivers for my attitude was a negative experience early in my career when I found myself competing against my own software for salary money, and lost. A former research colleague found it more cost-effective to hire an undergraduate student to run my software (developed for another project and shared) than to include me on the new project as a collaborator. Although that issue was eventually overcome, it had a lasting impact on my attitude regarding giving away source code.
(contributed by Wade M. Sheldon, 2011)
PDF file
    Systems Upgrade through Technology Transfer across LTERs: Who Benefits?
Abstract - In 2009, the Coweeta LTER site began planning a complete web and information system redesign. In an early preparatory step, John Chamblee (CWT IM) and Ted Gragson (CWT LPI) met with Wade Sheldon (GCE Lead IM) to discuss potential use of GCE technology for this effort. Both the CWT and GCE LTER sites are administered at the University of Georgia, and attempts to forge closer ties between CWT and GCE have been underway since GCE was first established in 2000. The need for a system upgrade presented a fresh opportunity to push the effort forward. After discussion and demos of GCE software, and with the approval of both project leaderships, we agreed to collaborate on adapting several GCE databases and web applications for Coweeta’s use, as well as the GCE Data Toolbox software for processing Coweeta field data. Although work continues, initial products of this collaboration are now implemented on the new CWT website, unveiled in April 2011 (http://coweeta.uga.edu).
(contributed by John F. Chamblee, 2011)
PDF file
    Mining Long-term Data from the Global Historical Climatology Network
Abstract - Long-term climate data are critically important for climate change research, but are also needed to parameterize ecological models and provide context for interpreting research study findings. Consequently, climate data are among the most frequently requested data products from LTER sites. This fact was a prime motivating factor for development of the LTER ClimDB database from 1997 to 2002. However, direct climate measurements made at the Georgia Coastal Ecosystems LTER site (GCE) are currently fairly limited, both geographically and temporally, because our monitoring program began in 2001. Therefore, in order to put results from GCE studies into broader historic and geographic context and to support LTER cross-site synthesis projects, we rely on climate data collected near the GCE domain from an array of long-term National Weather Service stations operated under the Cooperative Observer Program. Data from NWS-COOP stations are distributed through the NOAA National Climatic Data Center, so we have periodically requested data from NCDC for these ancillary weather stations to supplement GCE data. Unfortunately, this entire process ground to a halt in April 2011 when NOAA announced that it was abandoning the traditional COOP/Daily data forms, meaning that daily summary data sets would not be available from the existing web application beyond December 2010. We clearly needed to find a new source for NWS-COOP data.
(contributed by Wade M. Sheldon, 2011)
PDF file
    Implementing ProjectDB at the Georgia Coastal Ecosystems LTER
Abstract - Two LTER workshops were convened in 2008-2009 to plan and develop ProjectDB, a cross-site research project description language and database. The first workshop brought together a diverse group of LTER Information Managers to define the scope and use cases for the database (Walsh and Downing, 2008; O'Brien, 2009). A second workshop was convened in April 2009, where a smaller group developed an XML schema for storing the targeted project information (lter-project-2.1.0, based on eml-project) and prototyped XQuery-based web services using eXist, a native XML database system (Gries et al., 2009; Sheldon, 2009). The ProjectDB effort was very effective and serves as a model for collaborative software design in LTER; however, design is not the end-point in the software development process. This article describes taking ProjectDB to the next level at the Georgia Coastal Ecosystems LTER site (GCE), by putting the schema and database into production and integrating it with the rest of our information system.
(contributed by Wade M. Sheldon, 2010)
Web link
    Getting started with eXist and XQuery
Abstract - Two recent LTER workshops were convened to plan and develop ProjectDB, a cross-site research project description language and database (Walsh and Downing, 2008). During the first workshop participants agreed to use eXist, an open source native XML database (http://exist.sourceforge.net/), as the back-end system for storing and retrieving research project documents. This database was primarily chosen to leverage ongoing software development work at CAP LTER that uses eXist, but excellent query performance, built-in support for both REST and SOAP web service interfaces, and simplicity of configuration and administration were also influential factors. The combination of eXist and XQuery (the XML query language used by eXist) proved to be extremely effective for ProjectDB, exceeding all expectations. A working group of six Information Managers and a CAP software developer designed and implemented a complete system for storing, querying, summarizing and displaying research project documents in just a few days, including RESTful web services to support all the primary use cases identified during the planning workshop (Gries et al.). The rapid success of this development effort has sparked interest in eXist and XQuery across the LTER IM community, and this article presents an overview and brief guidelines on how to get started using this new XML technology.
(contributed by Wade M. Sheldon, 2009)
PDF file
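The abstract above mentions eXist's built-in REST interface as one of the reasons it was chosen. As a rough illustration (the server URL, collection path, and document structure here are hypothetical, not taken from the actual ProjectDB schema), one way to assemble an ad hoc REST query for an eXist server using only the Python standard library:

```python
from urllib.parse import urlencode

# Hypothetical eXist server and collection path; adjust to your installation.
BASE = "http://localhost:8080/exist/rest/db/projects"

# A simple XQuery returning titles of project documents mentioning "marsh".
# The element names (project, title) are illustrative only.
xquery = (
    'for $p in collection("/db/projects")//project '
    'where contains($p/title, "marsh") '
    'return $p/title'
)

# eXist's REST interface accepts an ad hoc query via the _query parameter;
# _howmany limits how many results are returned.
params = urlencode({"_query": xquery, "_howmany": "10"})
url = f"{BASE}?{params}"
print(url)
```

Fetching the resulting URL (e.g. with urllib.request) would return an XML fragment wrapping the matched nodes; in practice, stored XQuery scripts are usually preferred over ad hoc `_query` strings for production web services.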
    Connecting Academic Scientists and Coastal Managers in Georgia
Abstract - (none)
(contributed by Merryl Alber, 2009)
Web link
    Developing a Searchable Document and Imagery Archive for the GCE-LTER Web Site
Abstract - In 2007 the Georgia Coastal Ecosystems LTER program began its second cycle of NSF funding, and as part of the transition from GCE-I to GCE-II we conducted a top-to-bottom review of our integrated information system. One major conclusion of this review was that we needed to do a better job of managing the various electronic resource files that are acquired during the course of GCE research and project management activities, including documents (e.g. publication reprints, reports, protocols), imagery (e.g. rendered maps, photos, logos) and other types of static files. During GCE-I, many of these resources were informally organized in server directories using a file system management approach, with online access provided via URL on various public and private GCE web pages. The only effective way to search for some categories of files was using Google Site Search, and many people ignored the web site entirely and contacted GCE IM staff directly for assistance locating specific files. We also noted in our review that several types of files were already being managed effectively, with file information and network paths stored in relational databases. For example, links to both publicly-accessible and private reprints and presentations are stored in the GCE bibliographic database (http://gce-lter.marsci.uga.edu/public/app/biblio_query.asp). In addition, links to organism photos and other relevant files are stored in the GCE taxonomic database (http://gce-lter.marsci.uga.edu/public/app/all_species_lists.asp). Both of these databases are also integrated with the centralized GCE personnel and metadata databases to support cross-referencing and dynamic linking between personnel records, data sets, publications, and species information. Consequently, rather than explore stand-alone file archival systems, we decided to leverage and extend our existing centralized databases and web framework to provide a more integrated solution.
(contributed by Wade M. Sheldon, 2008)
PDF file
    Practical Distributed Computing Approach for Web Enabling Processor-intensive Programs
Abstract - Providing Internet access to processor-intensive programs, such as ecological models and analytical work flows, can present many challenges. If a conventional web application approach is used for hosting the program (e.g. direct access via CGI or indirect access via ASP/PHP scripting) then processor bottlenecks can lead to a denial of service (DOS) condition on the server if too many requests are received over a short period of time. The longer the application takes to complete tasks the more vulnerable the system is to DOS, particularly as users accustomed to immediate feedback press the "Submit" button multiple times, queuing up even more requests. Employing a multi-tiered architecture with the web and application layers residing on different computers can prevent one over-tasked program from locking up the entire web server, but the application server is still subject to process blocking. A different approach is clearly needed to control web access to processor-intensive programs. An ideal solution would be to use a grid computing infrastructure to schedule and execute long-running analyses; however, grid technology is not yet widely accessible and most existing ecological models and analytical programs are not grid-enabled.
(contributed by Wade M. Sheldon, 2007)
PDF file
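The bottleneck described above suggests decoupling request handling from execution: the web layer returns a job ticket immediately, and a background worker runs the processor-intensive task while clients poll for the result. This is a generic sketch of that submit/poll pattern, not the article's actual implementation; all names are illustrative:

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # pending work submitted by the web layer
results = {}           # job_id -> result, filled in by the worker

def worker():
    # A single worker serializes processor-intensive tasks, so a burst of
    # submissions queues up instead of spawning competing processes.
    while True:
        job_id, task, arg = jobs.get()
        results[job_id] = task(arg)   # long-running computation happens here
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(task, arg):
    """Called by the web layer: enqueue the job and return a ticket at once."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, task, arg))
    return job_id

def poll(job_id):
    """Called by the web layer on later requests; None until the job is done."""
    return results.get(job_id)

# Example: a stand-in for a slow model run.
ticket = submit(lambda x: x * x, 12)
jobs.join()            # in a real service the client would poll instead
print(poll(ticket))    # -> 144
```

Because the web request returns as soon as the ticket is issued, repeated "Submit" clicks only lengthen the queue rather than blocking the server; a grid scheduler, as the abstract notes, would play the same role as the worker thread here at much larger scale.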
    Mining and Integrating Data from ClimDB and USGS using the GCE Data Toolbox
Abstract - Climate and hydrographic data are critically important for most long-term ecology studies, so integrated climate and hydrography databases such as ClimDB/HydroDB and the USGS National Water Information System represent major resources for ecologists. Both ClimDB/HydroDB and USGS also have web-based query interfaces, which provide convenient access to data and basic plots from many monitoring stations across the country. These centralized databases therefore significantly aid users with the first two phases of any data synthesis project: data discovery and data access. Data synthesis doesn't stop with the initial download, though, and many users I've worked with quickly become frustrated performing the remaining steps that are typically required. For example, common follow-up tasks include parsing and importing data into spreadsheets or analytical software, assigning or harmonizing attribute names and units, and integrating data from multiple stations for comparative analysis. Automating these operations is highly desirable, but usually requires custom programming and is not practical for most researchers. Consequently some students and researchers avoid data synthesis altogether, viewing it as either too tedious or difficult, while others request help with synthesis tasks from information management staff, adding to their workload. As I've described in several prior Data Bits articles (1,2,3), at GCE we have developed a flexible environment for metadata-based data transformation, quality control, analysis and integration using the multi-platform MATLAB programming language (i.e. GCE Data Toolbox). This software was also used to develop an automated USGS data harvesting service for HydroDB that contributes near-real-time hydrographic data on behalf of 10 LTER sites to the ClimDB/HydroDB database on a weekly basis (4).
In the remainder of this article I describe new data mining features recently added to this toolbox that allow users to interactively retrieve data from any station in ClimDB/HydroDB or the USGS NWIS (using MATLAB 6.5 or higher), and then easily transform and integrate these data sets to perform synthesis on their own.
(contributed by Wade M. Sheldon, 2006)
PDF file

This material is based upon work supported by the National Science Foundation under grants OCE-9982133, OCE-0620959, OCE-1237140 and OCE-1832178. Any opinions, findings, conclusions, or recommendations expressed in the material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.