Measuring Crop Growth in R

In between finishing up my database coursework, I’m usually doing remote sensing work.

Most of what I’ve been working on lately is an attempt to create crop phenology curves. In a nut shell phenology curves are just a measurement of crop growth throughout the year.

This post is just a quick code-along which shows you how to generate phenology curves using R.

I’ll try to always list any packages or addins I use so that if anyone else stumbles across this, they know what I’m using. Nothing’s more annoying than orphaned functions and having to hunt down packages. Also the link to download the data is public so it should be available to play around with as well.

library(dplyr)
library(magrittr)
library(readr)
library(knitr)
library(kableExtra)

The Data

This past weekend I spent some time writing a python script (more on this later) that calculated the average NDVI value for a few thousand fields in the Mississippi Delta region for 20 discrete days in 2000. NDVI stands for Normalized Difference Vegetation Index - a measurement of vegetation health. The Delta is one of the most agriculturally productive regions in the country, which makes it a good target if we want to look at crop growth over time.

# Read in NDVI data
ndvi_stats_2000 <- readr::read_csv('https://www.dropbox.com/s/rvqgnxc70gv4bhl/ndvi_stats_2000.csv?dl=1', guess_max = 10001)

You might be wondering what guess_max = 10001 is for. By default readr tries to guess the data type by reading the first 1000 rows. Since I’ve looked at our raw data table before hand, I know that the first few thousand rows are actually no data values represented by ‘0’. This misleads readr to assume incorrect data types resulting in data integrity issues.

The next thing I wanted to do was filter crops into their own tables based on their crop IDs. These identifiers correspond to the respective identifiers from the Crop Data Layer - a really handy land-cover dataset provided by the USDA.

# Filter by crop type
corn <- filter(ndvi_stats_2000, ndvi_stats_2000$cdl == 1)
cotton <- filter(ndvi_stats_2000, ndvi_stats_2000$cdl == 2)
rice <- filter(ndvi_stats_2000, ndvi_stats_2000$cdl == 3)
soybeans <- filter(ndvi_stats_2000, ndvi_stats_2000$cdl == 5)
fallow <- filter(ndvi_stats_2000, ndvi_stats_2000$cdl == 61)

This particular dataset has quite a few no data values due to clouds. Before we can proceed we need to mask out these values.

# Filter out no data
corn <- filter(corn, corn$avg > 0)
cotton <- filter(cotton, cotton$avg > 0)
rice <- filter(rice, rice$avg > 0)
soybeans <- filter(soybeans, soybeans$avg > 0)
fallow <- filter(fallow, fallow$avg > 0)

If we look at our data, we’ll notice that we have an average NDVI value for each field. If we wanted we could look at the phenology curve for each field, but I’m more interested in what the phenology curve is for each crop.

field_id	doy	avg	max	range	std	count	area	cdl
4369	84	2257.465	3953	3953	807.3354	101	90900	1
4746	84	1635.798	3673	3673	549.2867	327	294300	1
82	20	3218.348	4946	4946	755.9517	164	147600	1
124	20	6144.165	7483	7483	572.2665	176	158400	1
197	20	2235.016	4798	4798	599.0628	183	164700	1
247	20	4121.549	7234	7234	1424.5496	268	241200	1

The Analysis

First we need to group by the day of year an image was collected. Then for each snapshot in time we can calculate aggregate statistics for each crop. What was the mean average NDVI value for every field growing corn on the 4th day of the year for instance.

# Calculate stats on cleaned data
corn %>%
  group_by(doy) %>%
    summarise(average = mean(avg), maximum = max(avg),
              minimum = min(avg), stdev = sd(avg) ) -> corn_stats

cotton %>%
  group_by(doy) %>%
  summarise(average = mean(avg), maximum = max(avg),
            minimum = min(avg), stdev = sd(avg) ) -> cotton_stats

rice %>%
  group_by(doy) %>%
  summarise(average = mean(avg), maximum = max(avg),
            minimum = min(avg), stdev = sd(avg) ) -> rice_stats

soybeans %>%
  group_by(doy) %>%
  summarise(average = mean(avg), maximum = max(avg),
            minimum = min(avg), stdev = sd(avg) ) -> soybeans_stats

fallow %>%
  group_by(doy) %>%
  summarise(average = mean(avg), maximum = max(avg),
            minimum = min(avg), stdev = sd(avg) ) -> fallow_stats

# Gather stats
crops <- list(corn_stats, cotton_stats, rice_stats,
              soybeans_stats, fallow_stats)
crop_names <- c('Corn','Cotton','Rice','Soybean','Fallow')
par(mfrow=c(2,3)) # all plots on one page

Now if we want to plot our aggregated averages, we can use a simple for loop to produce a plot for each crop.

c <- 2
for(i in 1:length(crops)){

  crop <- crops[[i]]

  heading = crop_names[i]
  plot(x=crop$doy, y=crop$average, ylim=c(1000, 8500), type="l",
       main=heading, col = c, xlab = 'Day of Year', ylab = 'Mean NDVI')
  lines(x=crop$doy, y=crop$average, type="l", col = c)
  c <- c + 1
}

Pretty neat.

Immediately the Fallow phenology curve stands out, with little response during the year which shouldn’t be surprising since these fields are idle.

Interestingly Cotton and Rice seem to have similarly shaped curves. Rice has a higher plateau near 8000 versus cotton which looks to be closer to 7000. According to the USDA Agricultural Statistics Board, the usual plant date for Mississippi cotton is April 14 and the harvest date ends by November 13. Rice planting starts April 2 and the harvest is typically complete by October 27.

So there’s definitely overlap in their production schedules. My initial thoughts are that the density of the rice plants and consistent irrigation might contribute to the high rice NDVI values as opposed to cotton which may or may not be irrigated depending on the farm thus lowering the overall NDVI average for cotton.

Eventually the idea is to use these phenology curves as references for historical crop classification of NDVI images from before 2000 when field scale crop landcover data doesn’t exist.

Measuring Crop Growth in R

The Data

The Analysis

Wes Porter