Innovative Application of AI Gives New Life to Long-term Monitoring Data

By Karla Lant on April 30, 2018
Canoeist's paddle scoops up algae on Santa Fe River, Florida, not far from Lake George. (Credit: By John Moran [Public domain])

Scientists already know that algal blooms can wreak havoc in lake ecosystems. However, it’s not always clear exactly how local people can manage the issues that contribute to the formation of these blooms. New interpretations of ongoing research data from the University of Florida (UF) are shedding light on this question, revealing that managers may have the most success in limiting harmful algal blooms when they limit both phosphorus and nitrogen in local lakes, rather than only one or the other.

Setting the scene at Lake George

Ed Phlips, a professor in fisheries and aquatic sciences from the Institute of Food and Agricultural Sciences (IFAS) at UF, has been working on this issue for more than 25 years. Phlips spoke with EM about his work.

“By 1993, the system had been subject to algae blooms for quite a while,” explains Phlips. “In Florida the issue of eutrophication and blooms has been changing over the last century because of the rapid rate of development. The population growth in Florida has been tremendous since the early 1900s, and as a consequence, the interface between human activities and the ecology of aquatic ecosystems has rapidly been changing over that entire period of time.”

Teamed with scientists from the St. Johns River Water Management District, Phlips has been working to find out how best to limit the nutrients entering the Lake George ecosystem, and how the blooms and the species that produce them affect the water.

“The bloom issues have been becoming more acute in a lot of the systems, and the St. Johns River and Lake George are one such ecosystem,” details Phlips. “We started working based on St. Johns River Water Management District funding in 1993 because they were concerned about algae blooms and they wanted to know more about them.”

The navigable channel for vessels entering Lake George from the south on the St. Johns River in Volusia County, Florida, USA. (Credit: By TampAGS, for AGS Media [CC BY-SA 3.0])

In systems like the St. Johns, much of what Phlips and his team can do in terms of monitoring is defined by how much funding the work receives.

“We do the best job we can of describing the system with the resources we have,” Phlips states. “In the case of the St. Johns River, we’ve been sampling once a month since 1993, at anywhere from six or seven sites up to 12 sites during the year. The information for this particular study was based on the single most represented site from our work in Lake George.”

The work is time-consuming and labor-intensive. Team members collect water and field data monthly from a boat, sending samples out to St. Johns River Water Management District subcontractors for water chemistry data analysis, and keeping phytoplankton analysis in-house.

“We analyze the phytoplankton in terms of species composition, and then we convert the numbers from all the individual taxa or species into biovolume, which is basically a measure of the size of the cell,” Phlips explains. “That provides us with an idea of wet weight, and then we will convert that into carbon equivalents so we can get a better idea of the carbon within that phytoplankton, which is a good measure of biomass.”

The team thus generates abundance, biovolume, and carbon data every month. Each sample takes anywhere from three to five hours of analysis, plus conversion work and recording, and the work requires a great deal of expertise.
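The conversion chain Phlips describes (counts to biovolume, biovolume to wet weight, wet weight to carbon) can be sketched in a few lines of Python. This is only an illustration, not the lab's actual procedure: the `Taxon` record, the example numbers, and especially the `CARBON_FRACTION` placeholder are assumptions made up for this sketch, and real analyses typically use taxon-specific conversion factors.

```python
from dataclasses import dataclass


@dataclass
class Taxon:
    name: str
    cells_per_ml: float      # abundance from the microscope count
    cell_volume_um3: float   # mean volume of one cell, in cubic micrometers


# Assumption: cells are about as dense as water (1 g/cm^3),
# so 1 cubic micrometer of cell weighs about 1 picogram wet.
WET_PG_PER_UM3 = 1.0
# Placeholder only: an illustrative carbon share of wet weight.
CARBON_FRACTION = 0.1


def biovolume_um3_per_ml(taxon):
    """Total cell volume contributed by one taxon per milliliter of water."""
    return taxon.cells_per_ml * taxon.cell_volume_um3


def carbon_pg_per_ml(taxon):
    """Carbon estimate: biovolume -> wet weight -> carbon equivalent."""
    wet_weight_pg = biovolume_um3_per_ml(taxon) * WET_PG_PER_UM3
    return wet_weight_pg * CARBON_FRACTION


def summarize(sample):
    """Return the three monthly quantities for a list of Taxon records."""
    return {
        "abundance_cells_per_ml": sum(t.cells_per_ml for t in sample),
        "biovolume_um3_per_ml": sum(biovolume_um3_per_ml(t) for t in sample),
        "carbon_pg_per_ml": sum(carbon_pg_per_ml(t) for t in sample),
    }


if __name__ == "__main__":
    # Made-up sample: many small cells and fewer large ones.
    sample = [Taxon("small taxon", 1000, 50), Taxon("large taxon", 200, 500)]
    for key, value in summarize(sample).items():
        print(key, value)
```

The sketch makes one point from the quote concrete: a taxon with few but large cells can contribute more biovolume, and therefore more estimated carbon, than a far more numerous small one.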

“The person who is working with me now on phytoplankton analysis has been doing it for 30 years; all the people that have been working on analyses have been in professional phytoplankton analysis for anywhere from five to 30 years,” remarks Phlips. “Phytoplankton analysis requires a lot of experience and a lot of knowledge to be able to do well.”

Seeing the forest and the trees

Identifying what causes the frequent and potentially dangerous algal blooms in the system has always been among the central goals of the research. However, the answer is not simple, and it has challenged the scientists working on the project.

“Over the last couple of years I’ve been focusing more on looking at new ways to use data; I’ve got a lot of data sets, some of them very long, and so I felt like it’s about time that we start looking at different approaches, including quantitative analysis,” states Phlips. “We wanted to look for drivers of phytoplankton; in my case, often it’s phytoplankton dynamics or harmful algae blooms. We know that nutrient concentrations of things like phosphorus and nitrogen are important. We sometimes don’t know how important each one of those variables is in different systems.”

This is where Rafael Muñoz-Carpena, a UF/IFAS professor of agricultural and biological engineering, came into the picture. Muñoz-Carpena and a research team including doctoral student Natalie Nelson analyzed 17 years of the data that Phlips and his team had collected from Lake George. Muñoz-Carpena corresponded with EM about the work, including the team’s application of Random Forest (RF) analysis to the problem.

Rafael Muñoz-Carpena (above), a UF/IFAS professor of agricultural and biological engineering, led a research team, including his doctoral student Natalie Nelson, that reviewed 17 years of data collected by Ed Phlips’ lab from the waters of Lake George. (Credit: Rafael Munoz-Carpena, UF/IFAS)

Random Forest (RF) is a statistical machine learning (ML) method. Like any kind of ML, it uses computing power to automatically analyze huge amounts of measured data in order to identify patterns that may be hidden to human observers, and make reasonably accurate predictions of even complex behaviors or interactions. RF in particular does this by creating “decision trees”—in fact, whole forests of them—to correctly predict some missing variable.

“A simple example of this would be driving on a road to get to point B without a map, with many intersections along the road,” explains Muñoz-Carpena. “At every intersection, we could toss a coin to decide to go left or right, with a 50% probability of getting either, hoping that in the end we will get to point B. If enough different drivers take this road randomly and we collect their blind decisions favoring those that allowed them to arrive best to point B, we can adjust the left or right probabilities at each intersection until we find a final road map (or decision tree) that improves the chances to get to B more efficiently. This is the prediction we are seeking.”

A computer can execute this kind of massive process quickly, automatically repeating it many times at many points in time and space, based on the available data. In this way, it can create a forest of many decision trees and combine their predictions to best match the value of the variable of interest. For Lake George, the variables of interest had to do with environmental conditions leading to algal blooms, such as water temperatures, light levels, nutrient levels, and densities of bottom-feeding aquatic life.
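For readers who want to see the idea in code, here is a minimal sketch of the standard random forest recipe in plain Python: each tree is grown on a random resample of the data, considers a random subset of variables at each split, and the finished trees vote on the prediction. It is a toy under stated assumptions, not the model used in the Lake George study. All function names, the synthetic readings, and the labeling rule in the demo are invented for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for an even two-class mix."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, depth, rng, n_try):
    """Grow one tree; leaves are labels, inner nodes are tuples."""
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    # Each split only looks at a random subset of the variables.
    for f in rng.sample(range(len(rows[0])), n_try):
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return majority(labels)
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       depth - 1, rng, n_try),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       depth - 1, rng, n_try))


def predict_tree(tree, row):
    """Follow the left/right decisions down to a leaf label."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


def build_forest(rows, labels, n_trees=15, depth=3, seed=0):
    rng = random.Random(seed)
    n_try = max(1, round(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Bootstrap: resample the observations with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], depth, rng, n_try))
    return forest


def predict_forest(forest, row):
    """The forest's answer is the majority vote of its trees."""
    return majority([predict_tree(tree, row) for tree in forest])


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, made-up readings: [nitrogen mg/L, phosphorus mg/L, temp C].
    rows = [[rng.uniform(0, 2), rng.uniform(0, 0.1), rng.uniform(15, 32)]
            for _ in range(120)]
    # Made-up rule for the toy labels: a bloom needs both nutrients high.
    labels = [int(n > 1.0 and p > 0.05) for n, p, _ in rows]
    forest = build_forest(rows, labels)
    print(predict_forest(forest, [1.8, 0.09, 28.0]))  # high N, high P
    print(predict_forest(forest, [1.8, 0.01, 28.0]))  # high N, low P
```

Production work would normally rely on an established library rather than hand-rolled trees. The point here is only to show the resampling, the random choice of variables, and the vote.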

The research shows that for the blooms that occur in Lake George, nitrogen and phosphorus levels are both important factors.
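One common way to ask a fitted model which variables matter is permutation importance: shuffle one variable's values, breaking its link to the outcome, and see how much the model's accuracy drops. The sketch below illustrates that general technique in plain Python. It is not necessarily the procedure the UF team followed, and `toy_bloom_model`, its thresholds, and the grid of readings are hypothetical stand-ins invented for this example.

```python
import random


def accuracy(predict, rows, labels):
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop when each column in turn is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for f in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column f scrambled.
            shuffled = [row[:f] + [value] + row[f + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_bloom_model(row):
    """Hypothetical stand-in for a trained model; thresholds are made up."""
    nitrogen, phosphorus, temperature = row
    return nitrogen > 1.0 and phosphorus > 0.05


if __name__ == "__main__":
    # Made-up grid of readings: [nitrogen mg/L, phosphorus mg/L, temp C].
    rows = [[n, p, t]
            for _ in range(25)
            for n in (0.2, 2.0)
            for p in (0.01, 0.1)
            for t in (20.0, 30.0)]
    labels = [toy_bloom_model(row) for row in rows]
    names = ["nitrogen", "phosphorus", "temperature"]
    for name, score in zip(names, permutation_importance(toy_bloom_model,
                                                         rows, labels)):
        print(name, round(score, 3))
```

Because the toy model ignores temperature, scrambling that column cannot change any prediction, while scrambling either nutrient does. That mirrors, in miniature, a finding where both nitrogen and phosphorus register as important.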

Putting it all together

Even generating the data sets used in this research takes a notable amount of expertise; this is a testament to the value of high-quality, long-term monitoring.

“The analysis is done microscopically; we don’t just look at the sample, hold it up to the light and say there’s a lot of algae there,” remarks Phlips. “We actually conduct the quantitative analysis and qualitative analysis using microscopy, and for the small things we’ll use fluorescence microscopy. So we use different techniques but basically it’s all microscopic.”

Once this kind of data is generated, the issue becomes interpretation, and putting the data to use. Applying AI to this kind of data set is an exciting new possibility—but it’s not a one-size-fits-all solution.

“There are many brave new opportunities to look at ever more abundant data to get a glimpse of the complex world around us,” remarks Muñoz-Carpena. “However, we need to be very cautious and understand the limitations of these methods and not fall into the trap of believing that we can completely predict behavior of things that are complex and intrinsically unpredictable. We must remember that while these methods might be efficient at predicting existing and well-established patterns in the data, they do nothing if the patterns are shifting, as is often the case in many real, complex systems. In many cases we can only predict what is expected or known, perhaps influenced by our own biases.”

Muñoz-Carpena is right to point out the complex and dynamic nature of an ecosystem like Lake George. Many driving factors change algae dynamics in such an ecosystem, including nutrients, water residence time or other hydrodynamic conditions, and climatic cycles and weather.

The worst algae bloom that Lake Erie has experienced in decades, photographed in October 2011. Record torrential spring rains washed fertilizer into the lake, promoting the growth of microcystin-producing cyanobacteria blooms. (Credit: By Jesse Allen and Robert Simmon – NASA Earth Observatory, Public Domain)

“We see that many El Niño and La Niña years tend to be higher phytoplankton years, and we also see multidecadal patterns,” remarks Phlips. “We’re concerned about global warming and how that will impact phytoplankton population. The most common harmful algal bloom-forming species in the system are cyanobacteria, which are very competitive at higher temperatures. Future increases in water temperature might bring increased problems with cyanobacteria: not just increased intensity, but also increased seasonal extent of the blooms. In other words, in subtropics and tropics, global climate change may prompt not just higher temperatures in the water during the warm season, but an extension of that warm season further into the fall and later into the spring.”

How can scientists hope to deal with these complex systems and apply things like ML models to them with any predictive reliability? In part, simply by addressing these issues head-on.

“As a community we must be transparent, state our assumptions and limitations, and provide all materials, such as extensive supplementary materials with data, code and all parts of the analysis, needed to ensure the reproducibility of our work so others can judge and improve on our work,” asserts Muñoz-Carpena.

In any case, this research certainly proves that long-term monitoring yields important results—results that are worth supporting.

“There is a dire need to improve the type of monitoring we currently do in many environmental problems,” states Muñoz-Carpena. “Long-term monitoring like that which Dr. Phlips did is expensive, time consuming and often does not yield immediate results amenable to quick publication in the ‘publish or perish’ academic culture of today. It takes a lot of vision and determination by researchers, funders and institutions to sustain long-term and detailed monitoring efforts. Often these efforts are the first ones to go in economic downturns or reprioritization of research, and in academia we do not have a good reward system in place for them. To advance in many of these difficult environmental problems, it is critical that we have available detailed long-term datasets like the one used in this work.”


About Karla Lant

Karla Lant is a professional freelance science writer and a member of the Society of Environmental Journalists. She also covers other scientific and medical stories as well as technology.
