EXERCISE 2.1: Fitting the growth of French forests

How fast are French forests growing? How much timber can they provide in one hectare? Surely forests provide multiple ecosystem services, but in this (simplified) exercise we focus on the above questions by looking at the so-called "raw data" provided by the French National Forest Inventory, both in terms of individual trees and in terms of inventoried plots, to fit a generic growth model of the forest stands in terms of volumes with respect to the age of the trees.

Skills employed:

download and import data from internet
wrangle tabular data: compute new columns (fields) based on existing one, filter rows (records), join tables
fit a generic model (curve) with data

Instructions

If you have already cloned or downloaded the whole course repository the folder with the exercise is on [REPOSITORY_ROOT]/lessonsMaterial/02_JULIA2/forestExercise. Otherwise download a zip of just that folder here.

In the folder you will find the file forestExercise.jl containing the julia file that you will have to complete to implement and run the model (follow the instructions on that file). In that folder you will also find the Manifest.toml file. The proposal of resolution below has been tested with the environment defined by that file. If you are stuck and you don't want to lookup to the resolution above you can also ask for help in the forum at the bottom of this page. Good luck!

Resolution

Click "ONE POSSIBLE SOLUTION" to get access to (one possible) solution for each part of the code that you are asked to implement.

1) Setting up the environment...

Start by setting the working directory to the directory of this file and activate it. If you have the provided Manifest.toml file in the directory, just run Pkg.instantiate(), otherwise manually add the packages Pipe, HTTP, CSV, DataFrames, LsqFit, StatsPlots.

ONE POSSIBLE SOLUTION

cd(@__DIR__)         
using Pkg             
Pkg.activate(".")   
# If using a Julia version different than 1.10 please uncomment and run the following line (reproductibility guarantee will hower be lost)
# Pkg.resolve()   
Pkg.instantiate() 
using Random
Random.seed!(123)

2) Load the packages

Load the packages Pipe, HTTP, CSV, DataFrames, LsqFit, StatsPlots

ONE POSSIBLE SOLUTION

using Pipe, HTTP, CSV, DataFrames, LsqFit, StatsPlots

3) Load the data

Load from internet or from local files the following data:

ltURL     = "https://github.com/sylvaticus/IntroSPMLJuliaCourse/blob/main/lessonsMaterial/02_JULIA2/forestExercise/data/arbres_foret_2012.csv?raw=true" # live individual trees data
dtURL     = "https://github.com/sylvaticus/IntroSPMLJuliaCourse/blob/main/lessonsMaterial/02_JULIA2/forestExercise/data/arbres_morts_foret_2012.csv?raw=true" # dead individual trees data
pointsURL = "https://github.com/sylvaticus/IntroSPMLJuliaCourse/blob/main/lessonsMaterial/02_JULIA2/forestExercise/data/placettes_foret_2012.csv?raw=true" # plot level data
docURL    = "https://github.com/sylvaticus/IntroSPMLJuliaCourse/blob/main/lessonsMaterial/02_JULIA2/forestExercise/data/documentation_2012.csv?raw=true" # optional, needed for the species label

If you have choosen to download the data from internet, you can make for each of the dataset a @pipe macro starting with HTTP.get(URL).body, continuing the pipe with CSV.File(_) and end the pipe with a DataFrame object.

ONE POSSIBLE SOLUTION

lt     = @pipe HTTP.get(ltURL).body     |> CSV.File(_) |> DataFrame
dt     = @pipe HTTP.get(dtURL).body     |> CSV.File(_) |> DataFrame
points = @pipe HTTP.get(pointsURL).body |> CSV.File(_) |> DataFrame
doc    = @pipe HTTP.get(docURL).body    |> CSV.File(_) |> DataFrame

# Or import from local...
lt     = CSV.read("data/arbres_foret_2012.csv",DataFrame)
dt     = CSV.read("data/arbres_morts_foret_2012.csv",DataFrame)
points = CSV.read("data/placettes_foret_2012.csv",DataFrame)
doc    = CSV.read("data/documentation_2012.csv",DataFrame)

4) Filter out unused information

These datasets have many variable we are not using in this exercise. Out of all the variables, select only for the lt and dt dataframes the columns idp (pixel id), c13 (circumference at 1.30 meters) and v (tree's volume). Then vertical concatenate the two dataset in an overall trees dataset. For the points dataset, select only the variables idp (pixel id), esspre (code of the main forest species in the stand) and cac (age class).

ONE POSSIBLE SOLUTION

lt     = lt[:,["idp","c13","v"]]
dt     = dt[:,["idp","c13","v"]]
trees  = vcat(lt,dt)
points = points[:,["idp","esspre","cac"]]

5) Compute the timber volumes per hectare

As the French inventory system is based on a concentric sample method (small trees are sampled on a small area (6 metres radius), intermediate trees on a concentric area of 9 metres and only large trees (with a circonference larger than 117.5 cm) are sampled on a concentric area of 15 metres of radius), define the following function to compute the contribution of each tree to the volume per hectare:

"""
    vHaContribution(volume,circonference)

Return the contribution in terms of m³/ha of the tree.

The French inventory system is based on a concentric sample method: small trees are sampled on a small area (6 metres radius), intermediate trees on a concentric area of 9 metres and only large trees (with a circonference larger than 117.5 cm) are sampled on a concentric area of 15 metres of radius.
This function normalise the contribution of each tree to m³/ha.
"""
function vHaContribution(v,c13)
    if c13 < 70.5
        return v/(6^2*pi/(100*100))
    elseif c13 < 117.5
        return v/(9^2*pi/(100*100))
    else 
        return v/(15^2*pi/(100*100))
    end
end

Use the above function to compute trees.vHa based on trees.v and trees.c13.

ONE POSSIBLE SOLUTION

trees.vHa  = vHaContribution.(trees.v,trees.c13)

6) Aggregate trees data

Aggregate the trees dataframe by the idp column to retrieve the sum of vHa and the number of trees for each point, calling these two columns vHa and ntrees.

ONE POSSIBLE SOLUTION

pointsVols = combine(groupby(trees,["idp"]) ,  "vHa" => sum => "vHa", nrow => "ntrees")

7) Join datasets

Join the output of the previous step (the trees dataframe aggregated "by point") with the original points dataframe using the column idp.

ONE POSSIBLE SOLUTION

points     = innerjoin(points,pointsVols,on="idp")

8) Filter data

Use boolean selection to apply the following filters:

filter_nTrees           = points.ntrees .> 5 # we skip points with few trees 
filter_IHaveAgeClass    = .! in.(points.cac,Ref(["AA","NR"]))
filter_IHaveMainSpecies = .! ismissing.(points.esspre) 
filter_overall          = filter_nTrees .&& filter_IHaveAgeClass .&& filter_IHaveMainSpecies

ONE POSSIBLE SOLUTION

points                  = points[filter_overall,:]

9) Compute the age class

Run the following command to parse the age class (originally as a string indicating the 5-ages group) to an integer and compute the mid-range of the class in years. For example, class "02" will become 7.5 years.

points.cac              = (parse.(Int64,points.cac) .- 1 ) .* 5 .+ 2.5

10) Define the model to fit

Define the following logistic model of the growth relation with respect to the age with 3 parameters and make its vectorised form:

logisticModel(age,parameters) = parameters[1]/(1+exp(-parameters[2] * (age-parameters[3]) ))
logisticModelVec(age,parameters) = # .... complete here

ONE POSSIBLE SOLUTION

logisticModelVec(age,parameters) = logisticModel.(age,Ref(parameters))

11) Set the initial values for the parameters to fit

Set initialParameters to 1000,0.05 and 50 respectivelly.

ONE POSSIBLE SOLUTION

initialParameters = [1000,0.05,50] #max growth; growth rate, mid age

12) Fit the model

Perform the fitting of the model using the function curve_fit(model,X,Y,initial parameters) and obtain the fitted parameter fitobject.param

ONE POSSIBLE SOLUTION

fitobject         = curve_fit(logisticModelVec, points.cac, points.vHa, initialParameters)
fitparams         = fitobject.param

13) Compute the errors

Compute the standard error for each estimated parameter and the confidence interval at 10% significance level

ONE POSSIBLE SOLUTION

sigma            = stderror(fitobject)
confidence_inter = confidence_interval(fitobject, 0.1) # 10% significance level

14) Plot fitted model

Plot a chart of fitted (y) by stand age (x) (i.e. the logisticModel with the given parameters)

ONE POSSIBLE SOLUTION

x = 0:maximum(points.cac)*1.5
plot(x->logisticModel(x,fitparams),0,maximum(x), label= "Fitted vols", legend=:topleft)

15) Add the observations to the plot

Add to the plot a scatter chart of the actual observed VHa

ONE POSSIBLE SOLUTION

plot!(points.cac, points.vHa, seriestype=:scatter, label = "Obs vHa")

16) [OPTIONAL] Differentiate the model per tree specie

Look at the growth curves of individual species. Try to perform the above analysis for individual species, for example plot the fitted curves for the 5 most common species

ONE POSSIBLE SOLUTION

speciesCount = combine(groupby(points, :esspre), nrow => :count)
sort!(speciesCount,"count",rev=true)

spLabel        = doc[doc.donnee .== "ESSPRE",:]
spLabel.spCode = parse.(Int64,spLabel.code)
speciesCount = leftjoin(speciesCount,spLabel, on="esspre" => "spCode")


# plot the 5 main species separately
for (i,sp) in enumerate(speciesCount[1:5,"esspre"])
    local fitobject, fitparams, x
    spLabel    = speciesCount[i,"libelle"]
    pointsSp   = points[points.esspre .== sp, : ]
    fitobject  = curve_fit(logisticModelVec, pointsSp.cac, pointsSp.vHa, initialParameters)
    fitparams  = fitobject.param
    x = 0:maximum(points.cac)*1.5
    println(i)
    println(sp)
    println(fitparams)
    if i == 1
        myplot = plot(x->logisticModel(x,fitparams),0,maximum(x), label= spLabel, legend=:topleft)
    else
        myplot = plot!(x->logisticModel(x,fitparams),0,maximum(x), label= spLabel, legend=:topleft)
    end
    display(myplot)
end