Simple and intuitive explanation of ROC curves and AUC

I don’t know why, but it took me a little while to properly make sense of these diagnostics, so I wanted to develop a very simple illustration of the logic behind these concepts. ROC stands for Receiver Operating Characteristic, while AUC is the area under the ROC curve, which is used as a metric for model performance in a classification problem. Performance is measured as the ability to maximise true positives while minimising false positives.
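As a rough sketch of that logic (not the code from the post itself), the ROC curve can be built by sweeping a threshold over the classifier scores and recording the true positive rate and false positive rate at each step. The AUC is then the trapezoidal area under those points. The simulated scores and the helper names `roc_points` and `auc_trapezoid` are assumptions made for illustration.

```r
# Hypothetical data: 50 positive and 50 negative cases with overlapping scores
set.seed(1)
labels <- c(rep(1, 50), rep(0, 50))
scores <- c(rnorm(50, mean = 1), rnorm(50, mean = 0))

# True and false positive rates at every candidate threshold
roc_points <- function(labels, scores) {
  thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE))
  tpr <- sapply(thresholds, function(th) mean(scores[labels == 1] >= th))
  fpr <- sapply(thresholds, function(th) mean(scores[labels == 0] >= th))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the curve by the trapezoid rule
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc <- roc_points(labels, scores)
auc <- auc_trapezoid(roc)
print(auc)
```

An AUC of 0.5 corresponds to a classifier that does no better than chance, while an AUC of 1 means every positive case scores higher than every negative case.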

Getting spatial data on soil profiles in R

Soil structure and health are critical to water availability, nutrient cycling, and plant productivity. By extension, soils will have strong associations with invertebrate community assemblage, diversity, and abundance, and are thus relevant to invertebrate science. Modern spatial datasets on soil are available to help us integrate variation in soil characteristics into the prediction of invertebrate processes. In Australia, one of the newer soil datasets is the Soil and Landscape Grid of Australia.

Modelling proportions - Part 1

Proportions are a funny thing in statistics. Some people just seem to love percentages. But there is a dark side to modelling a response variable as a percentage. For example, I might be tempted to fit a linear model to mortality data on some insects exposed to heat stress for some time. To prove the point I will simulate some data. library(tidyverse) time = rep(0:9, 10) n = 100 a = -1 b = 0.
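A minimal sketch of the problem, under assumed values: only `rep(0:9, 10)` and the group size of 100 come from the excerpt above, while the logistic coefficients (`intercept`, `slope`) are hypothetical, since the original values are cut off. A straight line fitted to the proportions happily predicts mortality above 100%.

```r
set.seed(1)
exposure_time <- rep(0:9, 10)   # ten replicates of exposure times 0 to 9
n_insects <- 100                # insects per replicate

# Assumed (hypothetical) logistic relationship between time and mortality
intercept <- -3
slope <- 1
p_dead <- plogis(intercept + slope * exposure_time)

# Observed proportion dead in each replicate
prop_dead <- rbinom(length(exposure_time), size = n_insects, prob = p_dead) / n_insects

# The tempting (but flawed) linear model on proportions
lin_fit <- lm(prop_dead ~ exposure_time)
pred <- predict(lin_fit, newdata = data.frame(exposure_time = 0:9))
print(round(pred, 2))
```

With these assumed values, the fitted line exceeds 1 at the longest exposure times, which is not a valid proportion.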

Aggregation measures for pests - Taylor's power law and Iwao's patchiness

Aggregation measures for pest abundance have been widely used as summary statistics for aggregation levels as well as in designing surveillance protocols. Taylor’s power law and Iwao’s patchiness are the two most commonly used methods. To be frank, I find the measures a little strange, particularly when they appear in papers as “cookbook statistics” (sometimes incorrectly presented) with little reference to any underpinning theory. But I managed to find some useful sources which helped to clarify things for me.
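For orientation, here is a small sketch of how the two measures are usually estimated, using their standard textbook forms rather than anything specific to the post. Taylor's power law relates the sample variance to the mean as s² = a·m^b, fitted on the log-log scale. Iwao's patchiness regresses Lloyd's mean crowding, m* = m + s²/m − 1, on the mean. The simulated negative binomial counts and all object names are assumptions for illustration.

```r
set.seed(42)

# Hypothetical survey: 20 fields, 30 sample units each, aggregated counts
field_means <- seq(1, 10, length.out = 20)
counts <- lapply(field_means, function(mu) rnbinom(30, size = 2, mu = mu))

m  <- sapply(counts, mean)   # sample mean per field
s2 <- sapply(counts, var)    # sample variance per field

# Taylor's power law: log(s2) = log(a) + b * log(m)
taylor_fit <- lm(log(s2) ~ log(m))
taylor_a <- unname(exp(coef(taylor_fit)[1]))
taylor_b <- unname(coef(taylor_fit)[2])

# Iwao's patchiness: mean crowding regressed on the mean
m_star <- m + s2 / m - 1
iwao_fit <- lm(m_star ~ m)
iwao_alpha <- unname(coef(iwao_fit)[1])
iwao_beta  <- unname(coef(iwao_fit)[2])

print(c(taylor_a = taylor_a, taylor_b = taylor_b,
        iwao_alpha = iwao_alpha, iwao_beta = iwao_beta))
```

In both cases a slope above 1 is conventionally read as aggregation, and a slope near 1 as a random (Poisson-like) pattern.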

Tutorial on quantifying species detection probabilities during surveillance with stan in R

A practical question in species surveillance is “How much search effort is required for detection?”. This can be quantified under controlled conditions where the number and location of target species are known and participants are recruited to see how success rate varies. Let’s use an example of an Easter egg hunt where the adult (the researcher) wants to quantify how much effort it takes a child (the participant) to find an Easter egg.

Accessing SILO climatic data for Australia in R

This short post will describe how to access SILO climatic data for Australia. The data is available as both csv and json, but here we work with the two-dimensional csv format to make use of R’s powerful data table functionality. At the time of writing there were 18937 stations. We can access the metadata on each station (including location and years of available data), which will later become useful for selecting an appropriate weather station.

Generating probability distributions with natural examples

How many probability distributions can we generate by imagining simple natural processes? In this post I use a simple binomial random number generator to produce different random variables with a variety of distributions. Using R’s built-in probability density functions, I show how the simulated data (plot bars) approach the exact probability density (plot lines) and provide an intuitive interpretation of model parameters of commonly encountered distributions. A biological example “Nothing in Biology Makes Sense Except in the Light of Evolution” - Theodosius Dobzhansky, 1973
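To give a flavour of the approach, here is a minimal sketch for the simplest case: summing single Bernoulli draws to build a binomial variable, then comparing the simulated frequencies against `dbinom`. The trial count, success probability, and number of simulations are arbitrary assumptions, and this is not the post's own code.

```r
set.seed(123)
n_trials <- 10      # Bernoulli trials per individual (assumed)
p <- 0.3            # probability of success per trial (assumed)
n_sims <- 10000     # number of simulated individuals

# Each simulated value is the number of successes in n_trials coin flips
successes <- replicate(n_sims, sum(rbinom(n_trials, size = 1, prob = p)))

# Simulated relative frequencies (the "bars") vs the exact density (the "line")
simulated <- as.numeric(table(factor(successes, levels = 0:n_trials))) / n_sims
exact <- dbinom(0:n_trials, size = n_trials, prob = p)

comparison <- data.frame(k = 0:n_trials, simulated = simulated, exact = exact)
print(round(comparison, 3))
```

Increasing `n_sims` brings the simulated column closer to the exact one.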

Bayesian and frequentist approaches to binomial dose responses in R

For a given species, a simple mortality response to environmental conditions can be represented as a probabilistic outcome (death), which occurs with probability (p). This simple process is known as a Bernoulli random variable. A motivating example is how a pest responds to increasing doses of a pesticide. Invertebrate pests cause 10-20% of yield losses in modern food systems. While cultural practices such as crop rotation and biological control through beneficial insects increasingly form a core component of effective and sustainable management, pesticides remain a widely used tool.
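A minimal sketch of the frequentist side only, assuming a logistic dose response on the log10 dose scale: the doses, group size, and true coefficients below are hypothetical values chosen for illustration, and the Bayesian fit is not shown.

```r
set.seed(2024)

# Hypothetical bioassay: 6 doses, 5 replicate groups per dose, 20 insects each
dose <- rep(c(0.1, 0.3, 1, 3, 10, 30), each = 5)
n_exposed <- 20

# Assumed true dose response on the logit scale
true_intercept <- -1
true_slope <- 1.5
p_death <- plogis(true_intercept + true_slope * log10(dose))

# Each insect is a Bernoulli trial, so group totals are binomial
dead <- rbinom(length(dose), size = n_exposed, prob = p_death)

# Binomial GLM with a logit link
fit <- glm(cbind(dead, n_exposed - dead) ~ log10(dose),
           family = binomial(link = "logit"))

# Dose at which predicted mortality is 50%
ld50 <- unname(10^(-coef(fit)[1] / coef(fit)[2]))
print(coef(fit))
print(ld50)
```

Under the assumed coefficients the true LD50 is 10^(1/1.5), roughly 4.6, so the estimate should land in that neighbourhood.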

Prey population growth constrained by predators

What if a growing population gets eaten by another population? In a previous post I showed why we might expect a population to grow exponentially when not resource limited. We then extended this to the case where a population reaches some carrying capacity (using a simple and non-mechanistic logistic function). But population growth can also be curtailed through interactions with another species population, such as a predator. In my area of study, we deal with a lot of herbivorous pests of agricultural systems.
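One common way to formalise this is the classic Lotka-Volterra predator-prey model, sketched below with a simple Euler integration in base R. Whether the post uses exactly this formulation is an assumption on my part. The function name `lotka_volterra` and all parameter values are hypothetical.

```r
# Classic Lotka-Volterra model (assumed form):
#   dN/dt = r * N - a * N * P        prey
#   dP/dt = b * a * N * P - m * P    predator
lotka_volterra <- function(N0, P0, r, a, b, m, dt = 0.001, t_max = 50) {
  steps <- round(t_max / dt)
  N <- numeric(steps + 1)
  P <- numeric(steps + 1)
  N[1] <- N0
  P[1] <- P0
  for (i in seq_len(steps)) {
    dN <- r * N[i] - a * N[i] * P[i]          # growth minus predation
    dP <- b * a * N[i] * P[i] - m * P[i]      # conversion minus mortality
    N[i + 1] <- N[i] + dN * dt                # Euler update
    P[i + 1] <- P[i] + dP * dt
  }
  data.frame(time = (0:steps) * dt, prey = N, predator = P)
}

sim <- lotka_volterra(N0 = 10, P0 = 2, r = 1, a = 0.1, b = 0.5, m = 0.5)
print(range(sim$prey))
print(range(sim$predator))
```

Setting `P0 = 0` removes the predator and recovers plain exponential prey growth. Euler integration is used only to keep the sketch dependency-free, and a dedicated ODE solver would be more accurate over long runs.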

Modelling density dependent population growth (logistic growth)

Let’s derive some more population growth functions! The logistic population growth function In a previous post we derived a function for population growth based on the vital rates of reproduction and mortality. We assumed that the per capita growth rate was constant with respect to the number of individuals in the population or (\frac{dN}{dt} = rN). This led to the unrealistic prediction that populations will grow indefinitely. Of course, populations will eventually run into resource problems (e.
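As a quick sketch of where this leads, the standard logistic model replaces the constant per capita rate with one that declines linearly to zero at a carrying capacity K, giving dN/dt = rN(1 − N/K) with the closed-form solution used below. The symbols K and N0, the function name, and the parameter values are assumptions for illustration rather than the post's own notation.

```r
# Closed-form solution of dN/dt = r * N * (1 - N / K)
logistic_growth <- function(t, N0, r, K) {
  K / (1 + ((K - N0) / N0) * exp(-r * t))
}

# Hypothetical parameter values
N0 <- 10     # initial population size
r  <- 0.5    # intrinsic growth rate
K  <- 1000   # carrying capacity

times <- seq(0, 30, by = 5)
trajectory <- data.frame(time = times,
                         N = logistic_growth(times, N0, r, K))
print(round(trajectory, 1))
```

Early on, while N is much smaller than K, the trajectory is nearly exponential. It then flattens as N approaches K.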