Including Autocorrelation Effects in GAMs
Source:vignettes/tractable-autocorrelations.Rmd
tractable-autocorrelations.Rmd
This vignette demonstrates the use of an AR1 model within a GAM to account for autocorrelation of the error terms of the model. We will use data from the Sarica dataset (Sarica et al. 2017) to demonstrate the how to include an autocorrelation term and its impact on the model statistics.
We start by loading the tractable
library:
Next, we will use a function that is included in
tractable
to read this dataset directly into memory.
Importantly, both the group (“ALS” or “CTRL”) and the subject identifier
(“subjectID”) need to be factors for subsequent analysis to work
properly.
sarica <- tractable::read_afq_sarica()
sarica$group <- factor(sarica$class)
sarica$subjectID <- unclass(factor(sarica$subjectID))
We will first fit a GAM model without any autocorrelation structure
using the tractable_single_bundle
function. This model will
use “group” and “age” to predict FA, while smoothing over the tract
nodes. We will also use the automated procedure implemented in
tractable_single_bundle
to determine the ideal value for
k
, a parameter used to determine the number of spline
functions. The default behavior for tractable_single_bundle
is to include the AR1 model. To avoid this, we set the parameter
autocor
to FALSE
.
gam_fit_cst_no_ac <- tractable::tractable_single_bundle(
df_afq = sarica,
tract = "Right Corticospinal",
participant_id = "subjectID",
group_by = "group",
covariates = c("age", "group"),
dwi_metric = "fa",
k = "auto",
autocor = FALSE
)
Examining the summary of the resulting GAM fit object shows us that
the k=16
is sufficiently large to describe the spatial
variation of tract profile data. In addition, we see that there is a
statistically significant effect of group (with a p-value of 4.66e-10)
and no statistically significant effect of age (p=0.2748).
summary(gam_fit_cst_no_ac)
#>
#> Family: Beta regression(113.72)
#> Link function: logit
#>
#> Formula:
#> fa ~ age + group + s(nodeID, by = group, k = 16) + s(subjectID,
#> bs = "re")
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.148763 0.080666 -1.844 0.0652 .
#> age 0.001425 0.001305 1.092 0.2748
#> groupCTRL 0.138659 0.022209 6.243 4.66e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(nodeID):groupALS 14.65 14.98 512.68 <2e-16 ***
#> s(nodeID):groupCTRL 14.60 14.97 569.00 <2e-16 ***
#> s(subjectID) 42.22 45.00 15.13 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.793 Deviance explained = 80.1%
#> fREML = 6902.9 Scale est. = 1 n = 4734
To account for potential spatial autocorrelation of FA values along
the length of the tract profile, we can incorporate a AR1 model into our
GAM. Briefly, this AR1 model is a linear model that estimates the amount
of influence of the FA value of the preceding node on the FA value of
the current node (see Van Rij et al. (2019) for an overview of
accounting for autocorrelation using the mgcv
package).
The AR1 model takes a parameter
to estimate autocorrelation effects. We can pass our initial model into
the function itsadug::start_value_rho
to automatically
determine the value of
.
We can also plot the autocorrelation by setting plot=TRUE
within that function (pictured below).
rho = itsadug::start_value_rho(gam_fit_cst_no_ac)
itsadug::start_value_rho(gam_fit_cst_no_ac, plot=T)
#> [1] 0.9255492
By default, the tractable_single_bundle
function will
determine the value of
and incorporate the AR1 model into the GAM estimation.
gam_fit_cst <- tractable::tractable_single_bundle(
df_afq = sarica,
tract = "Right Corticospinal",
participant_id = "subjectID",
group_by = "group",
covariates = c("age", "group"),
dwi_metric = "fa",
k = "auto"
)
Examining the summary of the resulting GAM fit object shows us that the inclusion of the AR1 model changes the resulting statistics of our model. Although there is still a statistically significant effect of group (p=0.044), the value of the t-statistic on this term has changed from 6.243 to 2.015.
summary(gam_fit_cst)
#>
#> Family: Beta regression(61.146)
#> Link function: logit
#>
#> Formula:
#> fa ~ age + group + s(nodeID, by = group, k = 16) + s(subjectID,
#> bs = "re")
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.329785 0.245062 -1.346 0.178
#> age 0.003904 0.003960 0.986 0.324
#> groupCTRL 0.144626 0.071782 2.015 0.044 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(nodeID):groupALS 14.50 14.97 74.217 < 2e-16 ***
#> s(nodeID):groupCTRL 14.38 14.95 85.385 < 2e-16 ***
#> s(subjectID) 37.89 45.00 9.105 0.000674 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.613 Deviance explained = 62.3%
#> fREML = -6443.6 Scale est. = 1 n = 4734