Thanks to Andrew Durso of Utah Public Radio for doing this interview with me. Listen here!
I had a great day a couple weeks ago scoping sites on the Jornada Experimental Range. John Anderson, the site manager for the Jornada, gave us the grand tour of areas that had good biocrusts and areas that had some of the plants that we considered targeting. We had great weather and were really able to benefit from his expert knowledge of all of the plants and all of the experiments that have been set up there over the years. It’s a wonderful site to have near us at UTEP.
The picture above shows Eva wetting up some crusts to see what they do! One interesting thing we learned is that there has been a recent convergence of interest in biocrusts at the Jornada, with now three big projects, one focused on biocrust restoration, one focused on examining biocrust effects on soil stability, and our study of translocation between biocrusts and plants. We are hoping to be able to share insights and findings with these groups and gain an unprecedented look at the role of this component of the Jornada site.
Here’s the ad for the Ph.D. positions in my lab. I really think this will be a cool opportunity for a couple of aspiring scientists so please contact me if you’re interested! Please note that though the application date for the UTEP graduate program has technically passed, I can still accept applications without issue.
The Darrouzet-Nardi lab at the University of Texas at El Paso (UTEP) is recruiting two Ph.D. students to work on a recently funded NSF grant to study interactions among plants, biocrusts, and fungi in the deserts of the Southwestern United States. Three years of full funding (RA support including summers, project supplies, and travel costs) is available. In addition to joining our growing ecology program at UTEP, students will have the opportunity to interact extensively with leading ecologists at both the University of New Mexico and the U.S. Geological Survey in Moab, Utah. The main goal of the project is to test the “fungal loop hypothesis” using isotopic and other biogeochemical techniques. More info on the project goals here. Students will have the opportunity to work at three desert field sites: the Jornada Experimental Range in New Mexico, the Sevilleta National Wildlife Refuge in New Mexico, and a site on the Colorado Plateau near Moab, Utah. The ideal candidate would have some research experience, a published paper from work in any discipline as an undergraduate or M.S. student, strong performance in science courses, and a desire to do field work. Though initially students will work with our team on grant objectives, they will also have considerable opportunity to springboard into projects of their own design. If you love deserts and science, this is a fantastic Ph.D. opportunity. Contact email@example.com if you are interested or have questions.
Field site near Moab for a similar biocrust project
I just wanted to share a more detailed description of our recently funded project for prospective students and anyone else who is interested. This description is similar to NSF’s public abstracts, the first paragraph being for a general audience, and the second more technical for biologists. A key objective of the project is to create a lot of fun opportunities for students at all stages to get involved with Southwestern desert ecology. If you are one of those prospective students, please contact me!
Testing the fungal loop hypothesis for C and N cycling in dryland ecosystems
Understanding the basic biotic relationships and exchanges of energy and nutrients among desert organisms is important because drylands cover about 40% of Earth’s surface and play essential roles in global change phenomena such as rising atmospheric CO2, warming, and dust deposition. Recent evidence suggests that fungi play a critical role in supporting arid ecosystems. Fungi scavenge for nutrients and transport them throughout the soil using their filamentous root-like structures known as hyphae. Fungi may be especially important in areas with extensive biological soil crusts (biocrusts), which consist of surface-layer bacteria, fungi, lichens, and mosses. Biocrusts are common in drylands and confer benefits such as greater soil fertility (some are photosynthetic, turning atmospheric carbon into usable sugars and some can fix nitrogen from the air into forms usable by plants and other organisms) and stability, but how they interact with the dominant plants and how fungi mediate this interaction is not well resolved. This project explores the “fungal loop hypothesis,” which posits that fungal hyphae form a bridge between plants and biocrusts and transport resources such as water and nutrients between plants and biocrusts, conserving the scarce resources. While studies of fungal physiology, genetics, and nutrient cycling have provided support for the fungal loop hypothesis, no comprehensive studies have examined the importance of the fungal loop across multiple dryland sites. To test its importance, researchers on this project will study three different deserts: the Chihuahuan Desert near El Paso, TX, the Colorado Plateau near Moab, UT and a site between those, the Sevilleta National Wildlife Refuge near Socorro, NM across three different years and two seasons. At these sites, they will quantify the movement of resources through fungal hyphae and develop a framework for understanding where and when the fungal loop is most important.
If the broad importance of the fungal loop can be demonstrated, it will represent a fundamental difference between drylands and wetter environments, and lead to a new understanding of what drives ecosystem processes in drylands.
The overall objective of this study is to test the fungal loop hypothesis by studying C and N translocation and retention across representative dryland sites. Using a set of field experiments at three sites, this project will address three questions: (1) How do translocation rates (i.e., transfer of C and N between plants and biocrusts through fungal hyphae) vary among dryland sites, plant and biocrust types, and seasons? (2) Does translocation improve growth, productivity and retention of C and N for plants and biocrusts? And (3) Are translocation rates determined by the stoichiometric requirements of plants and biocrusts? The proposed work will generate a predictive framework for when and where translocation of C and N between plants and biocrusts is greatest by examining translocation rates using isotopic tracers in sites across a latitudinal gradient with multiple biomes, a variety of plant and biocrust functional groups (e.g., C3 and C4 grasses, forbs; light cyanobacterial vs. dark, multi-species communities) and different seasons (spring and monsoonal growing seasons). The work will also examine the importance of translocation by experimentally severing hyphal connections and measuring the effects on plant and biocrust performance as well as retention of C and N in the ecosystem. Finally, to address the mechanism of translocation, the investigators will test the hypothesis that stoichiometric gradients drive C and N movement through fungal hyphae (see figure above) by experimentally manipulating C and N gradients and observing the effects on the horizontal movement of C and N through the soil, also with the use of isotopic tracers. This research approach will allow for an unprecedented evaluation of the extent to which fungi are the key regulators of C and N cycling in dryland soils as suggested by the fungal loop hypothesis.
Just before starting my current position at UTEP, I was at USGS in Moab as a postdoc for about a year and a half. Most of that time was spent working on a fascinating data set that came from a large warming experiment set up in the red rock desert of southeast Utah at a site thick with gorgeous biological soil crusts (biocrusts). I was working with my postdoc advisers Dr. Sasha Reed and Dr. Jayne Belnap, who set up the project, as well as USGS scientist Ed Grote, who was crucial to the technical aspects of the operation for the entire length of the study.
Here’s the paper, which came out in Biogeochemistry last week. The main purpose of the study was to examine the effect of the warming treatment on C exchange in these biological soil crust dominated soils. I think the findings for the crusts themselves were pretty clear. We saw that when they were active, that is when the soils were wet enough for the crusts to be photosynthesizing (~10% of the time), the warming treatment negatively affected the carbon balance (denoted as net soil exchange or ‘NSE’ here) in these soils. The warmed soils lost more or gained less carbon than the control soils.
This overall trend was noisier when we looked at the entire year, and I believe this is because we saw high CO2 loss rates in the spring. The source of these CO2 losses is unclear. Here is the yearly trend in CO2 flux:
and here is a graphic I made for an associated presentation indicating the possible sources of CO2.
The interesting thing to note here is that the soils are losing CO2 almost all of the time, which means that the CO2 source cannot come purely from the biocrusts (otherwise they would rapidly disappear from constantly losing C). In the supplemental material, we discuss how we consider the likely sources aside from biocrusts to be (1) sub-crust microbes, (2) vascular plant roots, and (3) pedogenic carbonates, which would be an interesting inorganic source. My best current guess is that the carbonates are not a big contributor to the annual CO2 fluxes at this site, but honestly it hasn’t been studied thoroughly enough and I think there is a lot more to know about how this inorganic pool interacts with CO2 from plant roots and microbes. Deserts have a huge quantity of inorganic carbon and if even a fraction of it is being actively exchanged it could be a big deal.
Perhaps the coolest thing we saw from taking an exceptionally close look at the high-resolution CO2 exchange numbers was that the biological soil crusts appear to be photosynthesizing under the snow. Check out this event from March of 2006:
The upper part of the graph shows CO2 exchange, with photosynthesis occurring when the points drop below zero (highlighted in green). Precipitation and temperature are shown on the bottom. Here’s a picture of what this precipitation event looked like:
I want to take some biocrusts into the lab and prove that this is occurring and explore the dynamics a bit more, but for now this is some intriguing evidence!
The last contribution I want to mention about this paper is the gap-filling techniques that we employed. I used a super cool R package called missForest, which iteratively fits random forest models to fill in all gaps in a data set. You can feed it any data frame and it will do its best to fill in the blanks. Its best can be surprisingly good, and I’ve been joking that I’m now very good at creating fake data.
I found I got the most impressive results when I fed it three days’ worth of time lags on either end of the missing data point. The flux from yesterday at noon is a great predictor of the flux today or tomorrow at noon. This technique worked great for all of the smaller gaps, and was less good for larger gaps, but we were blessed with a relatively complete flux data set compared to many. That’s a testament to Ed and his team keeping the system up. Anyway, if you want more info on this technique, you can see the code for yourself in the supplement and I’d also be happy to answer questions.
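For a rough idea of what the lagging looks like, here is a minimal sketch. It is not the code from the supplement. The simulated flux series, the 48-steps-per-day spacing, the `shift` helper, and the column names are all assumptions made up for illustration. The only real package call is `missForest()`, whose result stores the filled-in data frame in `$ximp`.

```r
# Simulated half-hourly "flux" series with some gaps (illustrative data only).
set.seed(1)
steps_per_day <- 48
n_days <- 20
t <- seq_len(steps_per_day * n_days)
flux <- sin(2 * pi * t / steps_per_day) + rnorm(length(t), sd = 0.2)
flux[sample(length(flux), 30)] <- NA  # knock out 30 points

# Hypothetical helper: shift a vector by k steps, padding with NA.
# k > 0 gives the value from k steps earlier; k < 0 gives k steps later.
shift <- function(x, k) {
  n <- length(x)
  if (k > 0) {
    c(rep(NA, k), x[seq_len(n - k)])
  } else if (k < 0) {
    c(x[(1 - k):n], rep(NA, -k))
  } else {
    x
  }
}

# Build predictors from the same time of day, 1 to 3 days before and after.
lags <- c(-3:-1, 1:3) * steps_per_day
lagged <- data.frame(flux = flux)
for (k in lags) {
  name <- paste0(if (k > 0) "lag_" else "lead_", abs(k) / steps_per_day, "d")
  lagged[[name]] <- shift(flux, k)
}

# Fill the gaps if the missForest package is installed.
if (requireNamespace("missForest", quietly = TRUE)) {
  filled <- missForest::missForest(lagged)$ximp
  flux_filled <- filled$flux
}
```

Each row of `lagged` pairs a flux value with the flux at the same time of day on neighboring days, so the random forest can lean on that day-to-day similarity when a point is missing.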
There is an interesting post on 538 today showing that most scientists cannot explain intuitively what a p-value really tells you. This does not surprise me, but may surprise some who still think p-values are an acceptable analysis framework. Some can spout the definition but the definition is pretty useless because it relies on epistemologically problematic ideas like separating events into black and white categories of “due to random chance” or “not due to random chance.”
That said, p-values are not devoid of information and the way to interpret them is to imagine what they show about a confidence interval. If your p-value is exactly equal to 0.05, your 95% confidence interval for the parameter of interest will exactly touch zero on one end (or whatever your “null hypothesized” value is). Here’s an example of a t-test where p is really close to 0.05. Note the very close link between the p-value being just under 0.05 (yay, it’s “significant!”) and the confidence interval being barely constrained to not crossing zero. The data underlying this particular test are shown in the graph.
So, the p-value is useful insofar as it gives you a clue about what the confidence interval is for your effect size. Confidence intervals are easily human-understandable (e.g., 95% chance the true value is in this range given assumptions of calculation method) while p-values are not. Why not just report the confidence interval instead? That is a rhetorical question.
To get a better sense of p-values for yourself, here’s some code you can fiddle with. Watch the confidence interval and p-value output from the t-test alongside the graph while tweaking the sample size, variability, and true difference between samples. It would be cool to make this runnable and tweakable right here on the website but that’s maybe a project for another day.
library(lattice) # provides xyplot
n <- 5 # sample size
sd <- 1 # variability
con <- rnorm(n, sd = sd) + 3
trt <- rnorm(n, sd = sd) + 5 # true difference is 5-3 = 2
t.test(con, trt, var.equal = TRUE)
d <- data.frame(values = c(con, trt),
                treatment = factor(rep(c("control", "treatment"), each = n)))
xyplot(values ~ treatment, d, pch = 16, cex = 2,
       col = rep(c("black", "red"), each = n))
This might be made even clearer by looking at a paired or “one sample t-test” type of example where you can literally see that about 95 out of 100 confidence intervals from many simulations will contain the true mean difference and that, again, the confidence interval will just barely graze 0 when p=0.05.
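Here is a minimal sketch of that one-sample version. The sample size, true difference, variability, and number of simulations are arbitrary choices for illustration. It simulates many sets of paired differences, runs `t.test()` on each, and tracks two things: how often the 95% confidence interval covers the true mean difference, and whether "the interval excludes zero" always lines up with "p < 0.05".

```r
set.seed(42)
n <- 10          # paired differences per simulated experiment
true_diff <- 1   # true mean of the paired differences
sigma <- 1       # variability of the differences
n_sims <- 2000   # number of simulated experiments

covered <- logical(n_sims)  # does the CI contain the true mean difference?
agree <- logical(n_sims)    # does "CI excludes 0" match "p < 0.05"?

for (i in seq_len(n_sims)) {
  diffs <- rnorm(n, mean = true_diff, sd = sigma)
  tt <- t.test(diffs)  # one-sample t-test against a null value of 0
  ci <- tt$conf.int
  covered[i] <- ci[1] <= true_diff && true_diff <= ci[2]
  excludes_zero <- ci[1] > 0 || ci[2] < 0
  agree[i] <- excludes_zero == (tt$p.value < 0.05)
}

mean(covered)  # fraction of intervals covering the true value, near 0.95
all(agree)     # the CI and the p-value tell the same story
```

Try shrinking `true_diff` or `n` and watch how often the interval crosses zero while coverage stays near 95%.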
Along these lines of asking scientists to explain commonly used statistics, in a stats class I once saw the professor ask students to draw a standard deviation on a set of points like the ones in the above graph. (This was a radically different kind of stats class and the first place that clued me in to the problems with the p-value paradigm.) It was intriguing and instructive to see how difficult it was to interpret a basic concept like standard deviation intuitively. Not to leave you hanging, ~68% of points should fall within one SD of the mean, at least for roughly normal data.
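The ~68% figure is easy to check for yourself. This is a quick sketch with simulated, normally distributed points; the mean, SD, and sample size are arbitrary, and the fraction will differ for data that are far from normal.

```r
set.seed(7)
x <- rnorm(10000, mean = 4, sd = 1.5)

# Which points lie within one sample SD of the sample mean?
within_one_sd <- abs(x - mean(x)) <= sd(x)
frac_within <- mean(within_one_sd)
frac_within  # close to 0.68 for normally distributed data
```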
I needed to spruce up the bare walls in my office, so I whipped up this crust poster. Fellow crust lovers, if you want one, let me know!
At most large research universities, there are two or more biology departments. At my undergraduate institution UC Berkeley it was Molecular and Cell Biology and Integrative Biology, with many biologists also in the Environmental Science, Policy, and Management Department. At my graduate institution CU Boulder it was Molecular, Cellular, and Developmental Biology and Ecology and Evolutionary Biology. I don’t know the history of these departments, but in some cases I’m sure there were splits, or perhaps in other cases, departments coalesced from older disciplines such as bacteriology and zoology. At my current institution, UTEP, there is still a united Biological Sciences department, which is one of the things that got me thinking about this issue.
Today, these departments are kept apart for a few reasons. First, the gap in the scale of study systems can lead to disinterest in work on the “other side” of biology. If you study plant ecology, it can seem difficult to get anything relevant out of a talk on something like neuroscience. Second, there is a structural difference in funding sources: NIH vs. NSF. If you are doing human biology, you are looking at NIH funding and if you are doing non-human biology, NSF is your go-to agency. The different missions of these agencies are accompanied by cultural differences and career-path differences which can reinforce the gap between molecular/cell biology and ecology/evolutionary biology.
Lately though, I have noticed something interesting happening. As genetic and omics-style techniques become popular in both major branches of biology, the gap between the two biologies has narrowed. Ecologists and evolutionary biologists are reaching for molecular biology and genetics tools more than ever to understand the underpinnings of non-human organisms and ecosystems. Biomedical researchers are realizing that due to the microbiome, the human body is in fact an ecosystem with phylogenetic diversity, ecological interactions, and the exchange of energy and nutrients among microbes.
Both biologies are making extensive use of omics-style methods. As an ecologist, I want to understand which microbes contain genes that code for key enzymes that break down soil organic matter. I also want to get a better chemical sense of aqueous chemistry in soils, an endeavor that may culminate in metabolomics-type approaches. At the same time that soil ecologists are hunting for genes that control biogeochemical transformations in soils, cancer biologists are replacing an organ and tissue-based system of cancer classification with a system based on gene mutations. Using these gene-based approaches, both groups of biologists are trying to process bigger data sets, and are further tied together by bioinformatics.
To me this convergence of questions and methods is profound in that it underscores the importance of the key tenets in biology. The uniting principle of biology is the evolution of life through natural selection of self-replicating genes that are organized into a diverse suite of organisms. This idea and paradigm is so powerful that it can apparently pull diverging fields of inquiry back together after they have drifted apart.
I’m not arguing that subdisciplines won’t remain distinct, or that these departmental boundaries will disappear, or even that the funding rift is going anywhere anytime soon. However, I do believe that now is a great time to walk across campus and check in with your colleagues in the “other side” of biology.
In the early 1980s, advances in analyses of microbial function in ocean water allowed researchers to observe that smaller microbial producers–distinct from well studied photosynthetic dinoflagellates and diatoms–were leaking a substantial amount of dissolved organic carbon (DOC) into the water column. This DOC was then taken up by heterotrophic microbes, who were in turn eaten by phagotrophic protozoa, who were then finally consumed as an additional food source by the primary consumers of the traditional food web. This previously unobserved pathway for transfer of energy and key elements like C and N appeared to operate in parallel to the traditional food web, and was thus named the “microbial loop.” The microbial loop was eventually shown to account for a substantial amount of the C and N cycling and storage in ocean systems, changing our basic understanding of ocean biogeochemistry.
The idea of parallel cycling “loops” has since caught on in biogeochemistry as a way to describe alternative and previously underappreciated cycling pathways. I’ve recently been working with ecologists at the University of New Mexico who have been developing the analogously named “fungal loop hypothesis.” About a decade ago, they began to synthesize evidence suggesting that fungi were an unusually important part of C and N cycling in desert soils. Much like the microbial loop, this non-standard pathway for C and N cycling in soils appeared to represent a fundamentally different cycling pathway in which fungi were key agents of storage, transformation, and translocation for C and N. These functions are normally more closely associated with soil organic matter and heterotrophic bacteria in wetter ecosystems.
I have also heard some rumors that tropical ecologists have been thinking about an alternative “loop” of their own. This makes a lot of sense to me because carbon and nutrient cycling is much more associated with the massive amount of living vegetation in tropical forests than it is in temperate ecosystems. I think this is a really exciting area of biogeochemical theory, with the potential to revolutionize our understanding of how these basic cycles vary among biomes. As more tests of these hypotheses enter our syntheses and the true nature of these alternative loops is understood, we may be partially rewriting and definitely expanding our knowledge of how these key biological elements are cycled in the biosphere.
I’ve always been uncomfortable with drawing insights from big complex ANOVA models, and in searching for a couple stats references for a paper, I found there is a sizable literature on the topic.
Here’s a statement from a book by Robert Rosenthal and colleagues that summarizes exactly how I feel about these models:
The problem is that omnibus tests, although they provide some protection for some investigators from the danger of “data mining” when multiple tests are performed as if each were the only one considered, do not usually tell us anything we really want to know.
I like the term “omnibus tests” as a description for these types of analyses that try to arrange all possible variables into a hard-to-interpret statistical model. I find the interpretation of “interaction effects” in many models I see to be particularly problematic.
As an alternative to these omnibus tests, the authors suggest using “focused contrasts,” which to me, sounds very similar to the “lots of t-tests” approach that I have settled on for many of my analyses. In their book, they present some novel algorithmic approaches to making these contrasts, and while I have not read the whole book to understand what they did, I think that the basic concept of learning about data from many simple analyses instead of one kitchen-sink style analysis is the same.
I also think they are correct that fear of errors from multiple comparisons is a big reason people gravitate toward omnibus tests. My feeling is that the epistemological challenges that multiple tests can create are better dealt with in the interpretation phase rather than the calculation phase of an analysis.
The idea of fertilizing the ocean with iron to capture carbon is one of the more colorful ideas to arise directly from a basic understanding of biogeochemistry. Ocean phytoplankton are iron-limited and alleviating that iron limitation makes them grow enough to cause some of the fixed carbon to get locked away in the deep ocean. There was substantial debate about a decade ago about the feasibility of this idea for fighting climate change and I seem to remember that most of the scientists involved decided that it was an interesting but not practical idea.
A re-promotion of the idea was making the internet rounds today due to this profile of a prominent advocate (practitioner, even) of iron fertilization. I had forgotten the details of the debate so it was fun to revisit them. The profile is written in such a way as to make it seem like iron fertilization is a great idea that is being held back only by environmentalists scared of geo-engineering. Unfortunately for the author–or perhaps to their credit in being objective–the counterarguments to this viewpoint are apparent in the article itself. The author writes:
iron fertilisation could potentially sequester as much as 1 billion metric tons of carbon dioxide annually, and keep it deep in the ocean for centuries. That is slightly more than the CO2 output of the German economy, and roughly one-eighth of humanity’s entire greenhouse gas output.
This sounds good until it is put to a simple cost benefit analysis. The one-eighth figure likely comes from a modeling study, and is also discussed in an editorial in Nature arguing against iron fertilization back in 2009. (I note that the authors here are better described as ‘scientists’ than ‘environmentalists.’ ) In that editorial they write:
A model published in 2008 (K. Zahariev et al. Prog. Oceanogr. 77, 56–82; 2008), which is as convincing as any available, found that even if the entire Southern Ocean were fertilized forever with iron sufficient to eliminate its limitation of phytoplankton production, less than 1 gigatonne of carbon a year of CO2 of probable future emissions (currently about 8 gigatonnes a year) would be sequestered, and that amount for only a few years at best.
So there you have the cost and the benefit. The benefit: we may reduce 1/8th of our emissions, a substantial and impressive amount, but nowhere near enough to stop global warming. The cost: we risk fucking up an entire ocean. The ‘environmentalists’ were ridiculed in the pro-fertilization piece for having that attitude, but it hardly seems outrageous to worry that fertilizing an entire ocean to stop 1/8th of our emissions could have unintended consequences, and the Nature editorial shows that scientists, myself included, feel the same way.
There are too many examples of fragile food webs tied together as trophic cascades and biocontrol agents gone wrong to not worry about unintended consequences of a vast ecological manipulation in a system that is not totally understood. We are still barely getting a handle on the consequences of doubling carbon in the atmosphere. Is it wise to perform a similar experiment with another element in the ocean? The pro-fertilization piece crescendos toward this point at the end:
The ocean is no longer a vast, unknowable wilderness, whose mysterious gods must be placated before it can be crossed. Instead, it’s become the first viable arena for large-scale manipulation of the planetary environment.
What could possibly go wrong?
There is a cool new paper in Nature showing that semiarid ecosystems can have big effects on overall carbon concentrations in the global atmosphere, primarily through enhanced productivity of vegetation in wet years. The authors write:
As the dynamics of dryland systems, which cover 45% of the Earth’s land surface, increase in global importance, more research is needed to identify whether enhanced carbon sequestration in wet years is particularly vulnerable to rapid decomposition or loss through fire in subsequent years, and is thus largely transitory. Such behaviour may already be reflected by the larger-than-average atmospheric growth rate in 2012 (ref. 30) that was associated with a return to near-normal terrestrial land sink conditions.
In other words, the next question is how long can these systems lock up this carbon in the transient vegetation of deserts? I would guess that, as the authors suggest, a string of dry years would cause these gains in carbon storage to be lost, but it might take a couple of decades to work the carbon out of the soil, and increased fertility in the short term can lead to soil stabilization, thus increasing overall ecosystem fertility, particularly with good land management practices. So this could be a good thing for the climate, and an essential role played by dryland ecosystems.
One of Mike’s and my papers just came out this morning in Soil Biology and Biochemistry (link here). We compared nitrogen concentration data from our lysimeter samples of soil pore water and our extracted samples from soil cores and found we were seeing a lot more nitrogen in the soil core samples. This seemed to support the idea from a paper by John and Erik Hobbie a few years ago in which they suggested that soil amino acid concentrations seemed a little too high given the strong uptake capability of microbes, and that thus some of it might not be biologically available. Here’s our key figure showing the relative magnitude of nitrogen pool sizes and suggesting a sizable “inaccessible” pool:
You can see that the cores (red and dark blue lines), even though quite variable, definitely get more nitrogen than the lysimeters (light blue), which spike at the beginning of the season and then drop to low levels. It also looks like the adsorbed pool (mostly ammonium bound to the soil matrix) and the inaccessible pool can change throughout the season, so the inaccessibility isn’t permanent. Overall, a fun comparison of techniques that I think has implications for the way nitrogen is actually cycled in soils.
One of the coolest results so far from the global change experiment I’ve been working on is that mosses that are part of the biological soil crusts have pretty much all died in response to the experimental addition of numerous small rainfall events (Reed et al. 2012). The mosses die because they lose more carbon during the small wet-up events than they can gain from photosynthesis during the short time that they are rehydrated. This finding got us curious about the moss physiology and biochemistry behind why they cannot keep up with the carbon demands and what ultimately causes their death.
To help answer that question, my postdoc adviser Sasha Reed and I visited the University of Missouri last week where we learned some awesome analytical techniques in Mel Oliver’s lab, particularly from his amazingly skilled postdoc Abou Yobi. These guys are experts in plant biochemistry and we had a lot of great conversations about stress biology and the chromatography techniques (one of their cool HPLC units above) that they are using to measure the key metabolites that govern plant responses to stress.
Many mosses, especially in the desert, are desiccation-tolerant, meaning they can completely dehydrate and rehydrate as part of their normal physiological function. There is a ton of fascinating biochemistry here, particularly in understanding how the plant cells respond when they dry out and wet up. Plants have two strategies in surviving desiccation: protection and repair. In practice, most desiccation-tolerant plants are somewhere along this spectrum. They can try to protect their cells in preparation for desiccation by resorbing and preserving themselves as best as possible, or they can devote resources to repairing cells and tissues once they rehydrate.
There are a huge number of metabolites, proteins, enzymes, and other compounds that help these mosses perform the amazing feat of surviving total desiccation and rehydration. There are compounds that help them resist the osmotic shock of drying out and rewetting–osmoprotectants–such as sucrose and some amino acids. One of the interesting facts we learned is that Syntrichia ruralis, a sister species to the one present in our experiment, S. caninervis, is 10% sucrose by weight when it is dried out. This is the first thing we want to look at in our mosses because if they cannot regenerate those sugars, they may not be able to dry out properly, sustaining damage to their membranes without the sugar to mediate the collapse of cell membranes and organelles.
Other potentially important compounds include antioxidants, photoprotective pigments, and a class of proteins called LEA proteins that can prevent reactive oxygen species (ROS) from building up and damaging cells. Mel and Abou have been using “omics” approaches in which a large number of metabolites (metabolomics), expressed genes (transcriptomics), proteins (proteomics) or other compound classes are measured simultaneously to give a more complete biochemical picture of a plant tissue under conditions such as desiccation (example figure above). Perhaps the ultimate way to answer our question would be to use these techniques on mosses over the course of the rapid wet-dry cycles that ultimately result in their death. We won’t go that far at least for now, but it was fascinating to learn what kind of analytical capability is out there. Bringing these types of approaches to global change experiments could be an excellent avenue for future research and collaboration!
Drylands are not known for massive soil carbon stocks like the Arctic tundra or plant carbon storage like tropical forests. However, drylands account for a large fraction of the terrestrial surface so they are still important when considering global carbon.
Many drylands biogeochemistry papers include statements like this one:
Arid, semiarid and dry-subhumid ecosystems (drylands) occupy 41% of the terrestrial surface, and account for ca. 25% of global soil organic carbon (C) reserves.
To put those numbers in context using the aridity index ≤ 0.65 definition of drylands (numbers from the Millennium Ecosystem Assessment):
Dryland: 41% of terrestrial surface
Dry Subhumid: 8.7%
Frozen ground and Permafrost: 36%
Everything else (mostly forests): 23%
Tropical Forests: 5% (used to be 2x more!)
All forests (overlaps with other categories, particularly boreal forest): 28%
When you follow the references back to their initial sources, you find that the carbon number (ca. 25% total global soil carbon) comes from large pedon databases with information on typically 3,000-10,000 soil profiles that were developed over several decades. Here’s an example. There is no reason to doubt these data, but they do have some limitations like several noted in Campbell et al. (2008):
The methods for estimating carbon storage vary widely, and no single method is considered highly accurate…The 1m depth is appropriate for this analysis, but likely underestimates carbon emissions from deeper peatland systems. No global dataset of peat depth is yet available.
This raises an interesting point: the percentage might be a lot smaller if you account for Arctic peat and wetland C. From the most recent IPCC report:
The terrestrial biosphere reservoir contains carbon in organic compounds in vegetation living biomass (450–650 PgC; Prentice et al., 2001) and in dead organic matter in litter and soils (1500–2400 PgC; Batjes, 1996). There is an additional amount of old soil carbon in wetland soils (300–700 PgC; Bridgham et al., 2006) and in permafrost soils (see Glossary) (~1700 PgC; Tarnocai et al., 2009).
So if we count the additional carbon stocks in wetlands or permafrost, it could reduce that percentage of global soil carbon in drylands by about half, to ~13% of the global total.
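As a rough check on that arithmetic, here is a back-of-envelope sketch in R using the ranges from the IPCC quote. Taking the midpoint of each range, and applying the 25% figure to the litter-and-soils pool, are my assumptions for illustration only; the variable names are made up for this example.

```r
# Back-of-envelope check using the IPCC ranges quoted above (all in PgC).
# Using range midpoints is an assumption made only for illustration.
soil_c       <- mean(c(1500, 2400))  # litter and soils (Batjes, 1996)
wetland_c    <- mean(c(300, 700))    # old wetland soil C (Bridgham et al., 2006)
permafrost_c <- 1700                 # permafrost soils (Tarnocai et al., 2009)

# Commonly cited dryland share of soil organic C (assumed to apply to soil_c)
dryland_fraction <- 0.25
dryland_c <- dryland_fraction * soil_c

# Dryland share once wetland and permafrost stocks join the denominator
adjusted_fraction <- dryland_c / (soil_c + wetland_c + permafrost_c)
round(100 * adjusted_fraction, 1)
```

With midpoints this comes out a bit under 12%, and picking the low or high end of every range moves it between roughly 11% and 12.5%, so it lands in the same neighborhood as the "about half" estimate.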
The Tarnocai paper does point out though that 88% of that unaccounted-for carbon is totally frozen right now. Some of that may well defrost and mineralize to CO2, but a lot of it still won’t. How much defrosts remains to be seen as climate change progresses. On the other hand, most C in drylands is very close to the surface and thus potentially vulnerable to loss with climate change. A new study in Nature shows that dryland C is negatively correlated with aridity in semiarid and arid ecosystems, so as desertification continues, drylands seem likely to lose soil carbon:
In conclusion, current estimates of 25% global soil carbon in drylands come from soil profile data and the limitations of those data include (1) they are not necessarily collected with consistent methodology and (2) they generally don’t include carbon deeper than 1 m. Depending on whether you count old carbon in permafrost and wetlands, the total percentage of soil carbon in drylands may be more like 13%, or half of what is often cited. However, because dryland carbon is near the surface, it may still be vulnerable to loss with climate change.
I have enjoyed boning up on my biological soil crust knowledge during my first few weeks here. Here’s a crust picture I took on a trip to the Canyonlands National Park Needles District back in January 2006. This crust probably took decades if not centuries to get to this point. These “pinnacled” crusts only form in dryland areas with cold winters like here on the Colorado Plateau. The structure is created by freeze-thaw cycles.
Here’s the definitive book on the subject that I’ve been poring through:
It’s got tons of great information and hopefully some of my work here will be able to contribute to our more general knowledge of how this cool component of dryland ecosystems might respond to the forces of global change we are inflicting upon them.
When I was looking for the above crust picture, I also came across a panorama of Chesler Park that I had never stitched together. Enjoy:
I’ve just begun a new postdoctoral position at USGS in Moab, Utah. Now that I’ve drawn my Arctic project mostly to a close (a few papers still to finish up), I am really excited to shift from the tundra to the desert. Though the ecosystem is different, the nature of the project is the same as my previous work in which I have applied biogeochemical and ecophysiological methods to understand how global change is affecting ecosystems.
Here in Moab, I will be working with world-renowned soil crust expert Dr. Jayne Belnap as well as my grad school friend and fellow biogeochemist Dr. Sasha Reed. We’ll be leveraging an awesome field experiment that Jayne has been running near Castle Valley, Utah for the last seven years. It’s a DOE-supported project in which they have applied warming (via infrared lamps, see above) and watering treatments to crust-dominated desert soils. It also features an automated CO2 chamber system (lower left on picture) that is producing some sweet data. They have seen some interesting results so far and I am excited to jump in, contribute to, and build upon this project.
It’s fun to be back in the American West and to live in a place that my family used to go on vacation. I first rolled through Moab when I was about 10 in 1991. It’s grown since then but appears so far to be just as much fun! We’ve already been to Arches about 5 times and met lots of nice people. Should be fun!
I just read an interesting article in the New York Times.
Universities can hardly turn out data scientists fast enough. To meet demand from employers, the United States will need to increase the number of graduates with skills handling large amounts of data by as much as 60 percent, according to a report by McKinsey Global Institute. There will be almost half a million jobs in five years, and a shortage of up to 190,000 qualified data scientists, plus a need for 1.5 million executives and support staff who have an understanding of data.
And from the McKinsey report:
To capture the full economic potential of big data, companies and policy makers will have to address the talent gap. New research by the McKinsey Global Institute (MGI) projects that by 2018, the United States alone may face a 50 to 60 percent gap between supply and requisite demand of deep analytic talent, i.e., people with advanced training in statistics or machine learning.
Machine learning techniques were one of the core analyses in my dissertation research (example below). I know that as I went deep into learning and using these methods I was struck by their power and broad generality. I realized, oh, this is what Safeway is doing with all that data they collect from my ‘rewards’ card. These techniques are a huge upgrade from the clunky and drawback-ridden multivariate techniques of the past.
Many businesses are already reaping the rewards of using machine learning and other big data techniques, while I suspect others are just jumping in or are ramping up their operations. At the same time, public consciousness of these methods has been raised with the likes of Moneyball and Nate Silver’s entertaining pwnage of the ‘experts’ and prediction markets in the last election. Everyone is realizing, hey this stuff actually works.
People also just have reams and reams of data these days. It is axiomatic in the world of data that it is easier to collect data than to analyze, visualize, model, and interpret it. Even before interpretation, just organizing data so it can be analyzed (see this for example) is a challenge and an art that requires practice with and knowledge of concepts like normalization and map/reduce.
Once data is organized, for all but the most basic analyses, interpreting data quickly forces you into the realm of epistemology. This is the intellectual challenge that I love about statistics. You can’t just calculate mean±SE and call it a day; instead you have to ask what insights are possible, quantify how certain you are of those insights, and perhaps most importantly, define which insights are not possible. In other words, how can you extract truth from numbers without overreaching? This is a huge challenge and even with the latest techniques, pitfalls abound.
I think this critical thinking aspect of data analysis is not obvious to those with a more surface-level knowledge of statistics. Introductions to statistics tend to focus on how to “test” things, a rampant paradigm that I see as deeply problematic. For universities developing data science courses, this may also be an area where the better programs will instill the critical thinking skills while the weaker programs just teach you the latest algorithms. Either way, the students in the earliest programs seem to be doing pretty well:
North Carolina State University introduced a master’s in analytics in 2007. All 84 of last year’s graduates in the field had job offers, according to Michael Rappa, who conceived and directs the university’s Institute for Advanced Analytics. The average salary was $89,100, and more than $100,000 for those with prior work experience.
Anyway, this is cool stuff. I know I would have loved to take some data science courses when I was in school. It would have been a nice complement to all the self-teaching I ended up doing out of personal interest and necessity for my research. There are so many great topics, from the machine learning algorithms themselves to the Bayesian v. frequentist paradigms to more obscure stuff such as retinal variables and shrinkage.
When I was at AGU last December, I noticed that Facebook was looking to hire scientists to do this kind of work. Since the big tech companies like Facebook are at the cutting edge of implementing these methods, the fact that Facebook was trying to vacuum up talent in this area seemed to be an indicator that other companies would follow. I for one will be continuing to bone up on all of the latest tech in this area.
I just put up a web slideshow version of my dissertation based on my exit talk that I gave upon graduating from CU. I hope it will be useful for future students working on Niwot Ridge and anyone else curious about nitrogen cycling or biogeochemical hot spots. It features some of my best photos and graphics from my work in the Colorado Front Range and is annotated with descriptions of the project.
I want to share some data manipulation and graphing techniques I have found useful for making flexible visual comparisons of time series data in R. These techniques can be pieced together from the documentation of the R packages and various online forums but sometimes it’s helpful to have a fully integrated example based on real data, so that’s what I’m presenting here.
The problem at hand is that we have dozens of variables from our Arctic experiment that we have measured over the same time period, in our case three summer field seasons. We want to be able to compare subsets of the variables or even all of the variables on the same temporal axis so that we can see which events in the data line up temporally.
I took three main steps: (1) melt the data frames that contain all the variables into data frames with a uniform column structure; (2) bind all variables into a giant data frame using rbind; and (3) use the mapping and paneling features of ggplot2 to draw plots.
Step 1: melt
This function is a really nice and generally useful technique to have in your munging arsenal. Basically it reformats your data into a “long” format in which there is only one column called ‘value’ that contains all numeric response variable data and then adds a column called ‘variable’ listing the variable name. The melt function has a complementary function cast that goes in the reverse and casts your melted data frame in whatever “wide” format you want. That’s great for making paired comparisons, though it is not needed for what I’m doing here. Together these are a great way to reshape your data frames to whatever format you need and are a lot better than the confusing reshape function.
The example I’ll show can be run on your own machine after loading the data:
Melt the data frames containing diverse data sets, in this case soil cores (destructive harvests=dh), air temperature, and snow cover. The date is in day of year format (the
dh.m <- melt(dh.blog,id=c("year","doy","block","treatment"))
airtemp.m <- melt(airtemp.blog,id.vars=c("year","doy",
"block","treatment"), measure.vars = "air_temp_mean",na.rm=T)
snowcover.m <- melt(snowcover.blog,id.vars=c("year","doy",
By default it will melt all columns that are not id.vars into a single column. To select only certain columns, identify them as measure.vars (as in the air temperature example above).
This is the easiest step, typically one line of code with maybe a few modifications to the resulting data frame. The order of the data can be relevant for ease of graphing them in a particular order, but there are always ways to change that sort of thing downstream as well. Make sure before binding that the data frames to be stitched together have identical columns and column names (
snow_project <- rbind(snowcover.m, airtemp.m, dh.m)
snow_project$treatment <- ordered(snow_project$treatment,
Step 3: Graph with ggplot2
Let's say we want to compare ammonium and nitrate from the soil cores with air temperature and snow cover. That's all the data I uploaded for the example, but for our actual project there are way more variables. This is the graph shown at the top of the post. Click it for larger size.
aes(doy, value,col=treatment)) +
geom_line(stat = "summary", fun.y = "mean",lwd=1) +
stat_summary(fun.data = "mean_cl_normal",conf.int=0.68,lwd=.2) +
#this takes a sec to print due to the error bar calcs
I subset the massive snow_project data frame to extract the variables I am interested in graphing. The subset can be left out if I want to graph everything all at once. For my current project, I have been having R draw a massive 110×30 inch pdf.
You want to take advantage of (1) the panel system in facet_grid along with (2) what data visualizer Jacques Bertin calls "retinal variables." These are the colors, shapes, symbols, textures, etc. that you use to encode additional data dimensions. Between retinal variables and panels you can show a lot of variables in the same graphic.
In the example, treatment is color coded using the col argument in the main ggplot input function. If you had another variable you needed to show, the shape argument is probably the next most useful after color. Use the show.pch() function in the Hmisc package to see what shapes are easily available.
After deciding what you want each panel to look like, the facet_grid function divides your plots up into separate panels by variable, allowing easy access to any kind of comparison we want. The formula input is vertical~horizontal for panel structure. In this case we have variables stacked on one another and horizontal separation by year. The scales="free_y" argument is essential for this approach because it allows your different variables to each have their own axis scale. For some variables, we might prefer to plot them in the same panel (like say soil and air temperature). In that case, pull variable out of the
facet_grid function and put it in
This series of steps works great for time series data, but the techniques in the three steps can be useful for other non-time series data as well.
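To make the long-format idea behind steps 1 and 2 concrete, here is a minimal base-R sketch on invented toy data. Everything in it is hypothetical: the numbers, the treatment labels, and the to_long helper, which is a hand-rolled stand-in for melt so the example runs without extra packages. It is not the code I used for the real project.

```r
# Toy stand-ins for the real data frames (all values are invented).
soil <- data.frame(year = 2010, doy = c(160, 170), block = 1,
                   treatment = c("control", "early"),
                   nh4 = c(1.2, 0.8), no3 = c(0.3, 0.5))
air  <- data.frame(year = 2010, doy = c(160, 170), block = 1,
                   treatment = c("control", "early"),
                   air_temp_mean = c(4.1, 7.9))

# Hypothetical helper: a base-R stand-in for melt(), not the package function.
# It stacks every non-id column into 'variable' and 'value' columns.
to_long <- function(df, id) {
  measures <- setdiff(names(df), id)
  pieces <- lapply(measures, function(m) {
    cbind(df[id], variable = m, value = df[[m]])
  })
  do.call(rbind, pieces)
}

id_cols <- c("year", "doy", "block", "treatment")
soil.m <- to_long(soil, id_cols)
air.m  <- to_long(air, id_cols)

# Identical column names are what make the rbind step a one-liner.
stopifnot(identical(names(soil.m), names(air.m)))
toy_project <- rbind(air.m, soil.m)

# An ordered factor fixes the treatment order for plotting downstream
# (the level names here are assumptions for the toy data).
toy_project$treatment <- ordered(toy_project$treatment,
                                 levels = c("control", "early"))
str(toy_project)
```

The result has one row per measurement, with the variable name in its own column, which is the shape that the paneling in step 3 expects.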
I keep thinking about this question as I write up the results of our project. I posted about it once before here. On the one hand, warmer temperatures should melt snow earlier. On the other hand, warmer air holds more water, which can increase snowfall and snowpack depth. Deeper snowpacks may melt out later.
A broad summary study by Callaghan et al. reports the recent history:
[Changing temperature and moisture regimes are] driving significant changes in the snow regime particularly during the spring season when snow cover disappeared earlier at an average rate of 3.4 days per decade over the pan-Arctic terrestrial region (excluding Greenland) during 1972–2009.
And on future projections they write:
Projected changes in snow cover from Global Climate Models for the 2050 period indicate increases in maximum SWE of up to 15% over much of the Arctic, with the largest increases (15–30%) over the Siberian sector. In contrast, SCD is projected to decrease by about 10–20% over much of the Arctic, with the smallest decreases over Siberia (<10%) and the largest decreases over Alaska and northern Scandinavia (30–40%) by 2050.
Here’s a graph they included showing spatial variation in the change of timing of spring snowmelt:
Another study in Eurasia appears to support this finding:
The only downside to these analyses is that the underlying data (below shown from the first study, Callaghan et al.) are a bit noisy and a series of late melt years, should they occur, might really change the shape of these curves that underlie each pixel in the above maps:
In contrast to the findings mentioned in the summary study, a simulation modeling study focused specifically on Northern Alaska reports:
Despite warmer near surface atmospheric temperatures, it is found that spring melt is delayed throughout much of the North Slope due to the increased snow pack, and the growing season length is shortened.
Analysis of Northern Hemisphere spring terrestrial snow cover extent (SCE) from the NOAA snow chart Climate Data Record (CDR) for the April to June period (when snow cover is mainly located over the Arctic) has revealed statistically significant reductions in May and June SCE. Successive records for the lowest June SCE have been set each year for Eurasia since 2008, and in 3 of the past 5 years for North America.
Overall, it looks like spring melt will probably be earlier and has been getting earlier as of late. However, there is some disagreement in future projections about whether Northern Alaska (and perhaps other areas) will in fact see earlier snowmelt in spring. The underlying correlations that are driving the appearance of the maps above could turn around if warmer and warmer air leads to bigger and bigger snowpacks. My one personal observation here is that warm snaps in the Arctic spring can melt snow really fast so deeper snow may not in fact melt that much more slowly.
Midweek at AGU I also had a nice discussion with Jay Arnone, who I worked for as an intern at the Desert Research Institute just before starting grad school. Jay has a really amazing and unique facility at the Desert Research Institute that they call the EcoCELLs. The EcoCELLs are large mesocosms in which they can monitor large intact monoliths of soil (several cubic meters).
I told Jay he should pop some Arctic tundra in there since it would allow for some nicely controlled analyses of permafrost melt and other global change effects. Turns out he had already been working on getting this funded. I really think this would be awesome, adding a nice dimension to our understanding of arctic carbon balance that can’t be easily measured with either soil cores in the lab or field experiments. Getting carbon flux numbers at the same resolution as eddy flux on intact plants and soils undergoing temperature manipulations is not possible with other techniques and could provide some awesome data. Hope this happens!
Our snowmelt project group met on Wednesday to summarize our final year of field data and make plans for putting together our three years of results from the experiment. We have lots of data and had a good discussion about how to package it to best convey our main results. Below is the effect of our snowmelt acceleration treatment on the timing of melt over the three years of our project.
We decided that we want to write a mix of papers that focus on single data sets and papers that combine data sets from the different aspects of the project. Some of the data sets are complex and unique, seeming to warrant their own papers while others may be better brought to light in the context of the whole experiment.
One of the challenges of combining data sets from different aspects of the project is that field data sets can be idiosyncratic, with only the data collectors having total understanding of all of the nuances. We had a nice discussion about how to make accurate conclusions when melding the different project parts. I’m greatly looking forward to this integration of data since I’ve been so focused on this project for the last three years.
I really enjoyed participating in and contributing to the oral session I was in, entitled When Winter Changes: Hydrological, Ecological, and Biogeochemical Responses. There were many great talks. I tweeted some highlights from the session.
C Rixen on early snowmelt in Alps. Yet more evidence that tundra plants do not like cold conditions caused by early snowmelt
— A. Darrouzet-Nardi (@anthonydn) December 4, 2012
H. Steltzer on unusual season in CO alpine 2012. Melt 1 month early and little rain. Plants grew later and “missed lunch” of early nutrients
— A. Darrouzet-Nardi (@anthonydn) December 4, 2012
More highlights on my twitter page.
I took these back in September and just got around to stitching them together. Click for the full versions:
Our paper on a popular soil amino acid technique just came out in Soil Biology and Biochemistry. We got interested in improving this method and adapting it for microplates a couple years ago and after quite a bit of lab work this is the result. The abstract:
In studies of soil nitrogen (N) cycling, there is growing demand for accurate high-throughput analyses of amino acids and other small organic N compounds. We adapted an existing fluorometric amino acid method based on o-phthaldialdehyde and β-mercaptoethanol (OPAME) for use in 96-well microplates, and tested it using standards and field samples. While we started with an existing protocol, we made one critical change: instead of using a 1-min incubation period, we used a 1-h incubation period to deal with differences in reaction timing among microplate wells and to reduce interference from ammonium. Our microplate method is similar in sensitivity to existing protocols and able to determine leucine standard concentrations as low as ∼0.5 μM. Finally, we demonstrate that the OPAME reagent fluoresces in the presence of primary amines other than amino acids, such as amino sugars and tyramine. Because of this broad sensitivity to primary amines, descriptions of the measured pool should be revised from total free amino acids (TFAA) to total free primary amines (TFPA).
Reaction kinetics of primary amines and ammonium with OPAME. a. Fluorescence levels of leucine, a mixed amino acid standard, and ammonium over 3 h at 1.5 min intervals. b. Ratio of ammonium:leucine fluorescence for 20 μM standards over 3 h (mean ± 95% CI, n = 5). c. Fluorescence levels of leucine, glucosamine, tyramine, and N-acetylglucosamine over 3 h at 1.5 min intervals.
There was a well reported story on NPR yesterday about recent trends in Arctic snow melt:
Derksen and colleague Ross Brown have produced a study, which has been accepted for publication in Geophysical Research Letters, that documents a dramatic increase in the speed of this snowmelt. It turns out that in May and June, snow across the far north is disappearing fast. “It’s decreasing at a more rapid rate than summer sea ice,” Derksen says. “So the loss of snow cover across the Arctic is really as big an issue as the loss of sea ice.”
That’s snowmelt day of year on the vertical axis and calendar year on the horizontal. So it is good to know that our experiment is accurately simulating a broad Arctic trend even if there has been a recent local effect that is the opposite.
I arrived in Estes Park yesterday to attend the LTER (Long Term Ecological Research) network meeting this week. I like this meeting because there are a lot of ecosystem scientists that do work closely aligned with my interests. These are my people. I’m presenting the above poster on our three summers of field data from the Arctic (click poster for pdf).
Blog’s been on the back burner recently since the birth of my daughter Cara on June 29th:
So far a successful experiment!
I’m at Toolik right now tying up loose ends for the third and final field season of our project. When we arrived on August 11th, I was struck by how green it was here, definitely greener than last year at that time. However, within the first couple of days, the landscape turned from green to yellow and is well on its way now to red and brown. We have been busy taking a final round of soil cores and are now taking the site down.
Just before arriving here, I had a nice trip to ESA in Portland and saw lots of friends and lots of great talks, particularly by some of my fellow postdocs I’ve met at Toolik and elsewhere. I want to post a few observations about the conference soon.
While the tundra is enchanting as always, I can’t wait to get back home to my sweet little baby and her heroic mom.
I know this is for a narrow audience, but I’m always surprised by how much finding random things like this online can help me, so here we go.
The goal is to paste raw microplate data into a spreadsheet and get out the final numbers you need. If you had thousands of microplates, it might be better to write a short program that can process the data, but for everyday lab analyses that change frequently and are implemented by many lab personnel that don’t program stuff, the spreadsheet is a good tool.
I’ve made a lot of these spreadsheets over the last few years to help process enzyme and nutrient data. Here’s one I made recently. It has a few parts in the different worksheets: (1) the blank template in which raw data can be pasted; (2) a pipetting map to be printed so that you know where to put your samples when you are pipetting; (3) simulated data that helps to identify the assumptions you are making about sources of well absorbance (for colorimetric assays) or in this case fluorescence; and (4) a run with the modeled data to help spot errors in the spreadsheet.
Here are some guidelines I have learned over the last couple years that I think make these spreadsheets more useful for our lab group.
(1) Label all the parameters. In particular, formulas should only have references to other cells, not any hard values like mass of soil used or volume of extractant, etc. This will prevent having to search all cells whenever you change one of these variables.
(2) Highlight anything you have to enter when samples are run so you don’t forget anything.
(3) Don’t make overly complicated formulas that are difficult to decipher later. To prevent this, divide up long calculations into two or more steps so each is clearer.
(4) Test the spreadsheet with simulated data that you create to look similar to real data but with nice round values. This will help you identify assumptions in the way calculations are made as well as locate typos in calculations across many spreadsheet cells.
(5) Make the output of your spreadsheet into a well-organized table (below) that has only sample ID information and the final values in preferred units. Then you can grab these values to use for stats and comparison with other assays, leaving all the processing behind.
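For anyone who does want the short-program route, the same guidelines carry over. Here is a minimal R sketch under loud assumptions: the parameter values, the linear standard curve, and all of the names are hypothetical and chosen only to give nice round numbers, in the spirit of guideline (4). It is not the calculation from any particular assay.

```r
# Guideline 1: name every parameter once so no formula hides a hard value.
# Both values below are hypothetical.
params <- list(
  soil_mass_g   = 5,      # soil extracted per sample, in grams (assumed)
  extract_vol_l = 0.025   # extractant volume, in litres (assumed)
)

# Guideline 4: simulated standards with round numbers,
# built so that fluorescence = 100 * concentration + 50.
standards <- data.frame(conc_um = c(0, 10, 20, 40),
                        fluor   = c(50, 1050, 2050, 4050))

# Simulated samples whose true concentrations (5 and 30 uM) are known.
samples <- data.frame(sample_id = c("A", "B"),
                      fluor     = c(550, 3050))

# Guideline 3: break the calculation into small, readable steps.
# Step 1: fit a linear standard curve.
fit <- lm(fluor ~ conc_um, data = standards)
intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])

# Step 2: convert fluorescence to extract concentration (micromolar).
samples$conc_um <- (samples$fluor - intercept) / slope

# Step 3: convert to micromoles per gram of soil.
samples$umol_per_g <- samples$conc_um * params$extract_vol_l / params$soil_mass_g

# Guideline 5: final table with only sample IDs and values in preferred units.
results <- samples[, c("sample_id", "umol_per_g")]
print(results)
```

Because the simulated inputs are round, it is easy to see by eye whether the output is right (5 and 30 µM in the extract here), which is the same error-spotting trick as the simulated-data worksheet.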