In between finishing up my database coursework, I’m usually doing remote sensing work.
Most of what I’ve been working on lately is an attempt to create crop phenology curves. In a nut shell phenology curves are just a measurement of crop growth throughout the year.
This post is just a quick code-along which shows you how to generate phenology curves using R.
I’ll try to always list any packages or addins I use so that if anyone else stumbles across this, they know what I’m using. Nothing’s more annoying than orphaned functions and having to hunt down packages. Also the link to download the data is public so it should be available to play around with as well.
library(dplyr)
library(magrittr)
library(readr)
library(knitr)
library(kableExtra)
The Data
This past weekend I spent some time writing a python script (more on this later) that calculated the average NDVI value for a few thousand fields in the Mississippi Delta region for 20 discrete days in 2000. NDVI stands for Normalized Difference Vegetation Index - a measurement of vegetation health. The Delta is one of the most agriculturally productive regions in the country, which makes it a good target if we want to look at crop growth over time.
# Read in NDVI data
ndvi_stats_2000 <- readr::read_csv('https://www.dropbox.com/s/rvqgnxc70gv4bhl/ndvi_stats_2000.csv?dl=1', guess_max = 10001)
You might be wondering what guess_max = 10001
is for. By default readr
tries to guess the data type by reading the first 1000 rows. Since I’ve looked at our raw data table before hand, I know that the first few thousand rows are actually no data values represented by ‘0’. This misleads readr
to assume incorrect data types resulting in data integrity issues.
The next thing I wanted to do was filter crops into their own tables based on their crop IDs. These identifiers correspond to the respective identifiers from the Crop Data Layer - a really handy land-cover dataset provided by the USDA.
# Filter by crop type
corn <- filter(ndvi_stats_2000, ndvi_stats_2000$cdl == 1)
cotton <- filter(ndvi_stats_2000, ndvi_stats_2000$cdl == 2)
rice <- filter(ndvi_stats_2000, ndvi_stats_2000$cdl == 3)
soybeans <- filter(ndvi_stats_2000, ndvi_stats_2000$cdl == 5)
fallow <- filter(ndvi_stats_2000, ndvi_stats_2000$cdl == 61)
This particular dataset has quite a few no data values due to clouds. Before we can proceed we need to mask out these values.
# Filter out no data
corn <- filter(corn, corn$avg > 0)
cotton <- filter(cotton, cotton$avg > 0)
rice <- filter(rice, rice$avg > 0)
soybeans <- filter(soybeans, soybeans$avg > 0)
fallow <- filter(fallow, fallow$avg > 0)
If we look at our data, we’ll notice that we have an average NDVI value for each field. If we wanted we could look at the phenology curve for each field, but I’m more interested in what the phenology curve is for each crop.
field_id | doy | avg | max | min | range | std | count | area | cdl |
---|---|---|---|---|---|---|---|---|---|
4369 | 84 | 2257.465 | 3953 | 0 | 3953 | 807.3354 | 101 | 90900 | 1 |
4746 | 84 | 1635.798 | 3673 | 0 | 3673 | 549.2867 | 327 | 294300 | 1 |
82 | 20 | 3218.348 | 4946 | 0 | 4946 | 755.9517 | 164 | 147600 | 1 |
124 | 20 | 6144.165 | 7483 | 0 | 7483 | 572.2665 | 176 | 158400 | 1 |
197 | 20 | 2235.016 | 4798 | 0 | 4798 | 599.0628 | 183 | 164700 | 1 |
247 | 20 | 4121.549 | 7234 | 0 | 7234 | 1424.5496 | 268 | 241200 | 1 |
The Analysis
First we need to group by the day of year an image was collected. Then for each snapshot in time we can calculate aggregate statistics for each crop. What was the mean average NDVI value for every field growing corn on the 4th day of the year for instance.
# Calculate stats on cleaned data
corn %>%
group_by(doy) %>%
summarise(average = mean(avg), maximum = max(avg),
minimum = min(avg), stdev = sd(avg) ) -> corn_stats
cotton %>%
group_by(doy) %>%
summarise(average = mean(avg), maximum = max(avg),
minimum = min(avg), stdev = sd(avg) ) -> cotton_stats
rice %>%
group_by(doy) %>%
summarise(average = mean(avg), maximum = max(avg),
minimum = min(avg), stdev = sd(avg) ) -> rice_stats
soybeans %>%
group_by(doy) %>%
summarise(average = mean(avg), maximum = max(avg),
minimum = min(avg), stdev = sd(avg) ) -> soybeans_stats
fallow %>%
group_by(doy) %>%
summarise(average = mean(avg), maximum = max(avg),
minimum = min(avg), stdev = sd(avg) ) -> fallow_stats
# Gather stats
crops <- list(corn_stats, cotton_stats, rice_stats,
soybeans_stats, fallow_stats)
crop_names <- c('Corn','Cotton','Rice','Soybean','Fallow')
par(mfrow=c(2,3)) # all plots on one page
Now if we want to plot our aggregated averages, we can use a simple for loop to produce a plot for each crop.
c <- 2
for(i in 1:length(crops)){
crop <- crops[[i]]
heading = crop_names[i]
plot(x=crop$doy, y=crop$average, ylim=c(1000, 8500), type="l",
main=heading, col = c, xlab = 'Day of Year', ylab = 'Mean NDVI')
lines(x=crop$doy, y=crop$average, type="l", col = c)
c <- c + 1
}
Pretty neat.
Immediately the Fallow phenology curve stands out, with little response during the year which shouldn’t be surprising since these fields are idle.
Interestingly Cotton and Rice seem to have similarly shaped curves. Rice has a higher plateau near 8000 versus cotton which looks to be closer to 7000. According to the USDA Agricultural Statistics Board, the usual plant date for Mississippi cotton is April 14 and the harvest date ends by November 13. Rice planting starts April 2 and the harvest is typically complete by October 27.
So there’s definitely overlap in their production schedules. My initial thoughts are that the density of the rice plants and consistent irrigation might contribute to the high rice NDVI values as opposed to cotton which may or may not be irrigated depending on the farm thus lowering the overall NDVI average for cotton.
Eventually the idea is to use these phenology curves as references for historical crop classification of NDVI images from before 2000 when field scale crop landcover data doesn’t exist.