Skip to contents

This vignette demonstrates the use of an AR1 model within a GAM to account for autocorrelation of the error terms of the model. We will use data from the Sarica dataset (Sarica et al. 2017) to demonstrate the how to include an autocorrelation term and its impact on the model statistics.

We start by loading the tractable library:

Next, we will use a function that is included in tractable to read this dataset directly into memory. Importantly, both the group (“ALS” or “CTRL”) and the subject identifier (“subjectID”) need to be factors for subsequent analysis to work properly.

sarica <- tractable::read_afq_sarica()
sarica$group <- factor(sarica$class)
sarica$subjectID <- unclass(factor(sarica$subjectID))

We will first fit a GAM model without any autocorrelation structure using the tractable_single_bundle function. This model will use “group” and “age” to predict FA, while smoothing over the tract nodes. We will also use the automated procedure implemented in tractable_single_bundle to determine the ideal value for k, a parameter used to determine the number of spline functions. The default behavior for tractable_single_bundle is to include the AR1 model. To avoid this, we set the parameter autocor to FALSE.

gam_fit_cst_no_ac <- tractable::tractable_single_bundle(
  df_afq         = sarica,
  tract          = "Right Corticospinal",
  participant_id = "subjectID",
  group_by       = "group",
  covariates     = c("age", "group"),
  dwi_metric     = "fa",
  k              = "auto",
  autocor        = FALSE
)

Examining the summary of the resulting GAM fit object shows us that the k=16 is sufficiently large to describe the spatial variation of tract profile data. In addition, we see that there is a statistically significant effect of group (with a p-value of 4.66e-10) and no statistically significant effect of age (p=0.2748).

summary(gam_fit_cst_no_ac)
#> 
#> Family: Beta regression(113.72) 
#> Link function: logit 
#> 
#> Formula:
#> fa ~ age + group + s(nodeID, by = group, k = 16) + s(subjectID, 
#>     bs = "re")
#> 
#> Parametric coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -0.148763   0.080666  -1.844   0.0652 .  
#> age          0.001425   0.001305   1.092   0.2748    
#> groupCTRL    0.138659   0.022209   6.243 4.66e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>                       edf Ref.df      F p-value    
#> s(nodeID):groupALS  14.65  14.98 512.68  <2e-16 ***
#> s(nodeID):groupCTRL 14.60  14.97 569.00  <2e-16 ***
#> s(subjectID)        42.22  45.00  15.13  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.793   Deviance explained = 80.1%
#> fREML = 6902.9  Scale est. = 1         n = 4734

To account for potential spatial autocorrelation of FA values along the length of the tract profile, we can incorporate a AR1 model into our GAM. Briefly, this AR1 model is a linear model that estimates the amount of influence of the FA value of the preceding node on the FA value of the current node (see Van Rij et al. (2019) for an overview of accounting for autocorrelation using the mgcv package).

The AR1 model takes a parameter ρ\rho to estimate autocorrelation effects. We can pass our initial model into the function itsadug::start_value_rho to automatically determine the value of ρ\rho. We can also plot the autocorrelation by setting plot=TRUE within that function (pictured below).

rho = itsadug::start_value_rho(gam_fit_cst_no_ac)
itsadug::start_value_rho(gam_fit_cst_no_ac, plot=T)

#> [1] 0.9255492

By default, the tractable_single_bundle function will determine the value of ρ\rho and incorporate the AR1 model into the GAM estimation.

gam_fit_cst <- tractable::tractable_single_bundle(
  df_afq         = sarica,
  tract          = "Right Corticospinal",
  participant_id = "subjectID",
  group_by       = "group",
  covariates     = c("age", "group"),
  dwi_metric     = "fa",
  k              = "auto"
)

Examining the summary of the resulting GAM fit object shows us that the inclusion of the AR1 model changes the resulting statistics of our model. Although there is still a statistically significant effect of group (p=0.044), the value of the t-statistic on this term has changed from 6.243 to 2.015.

summary(gam_fit_cst)
#> 
#> Family: Beta regression(61.146) 
#> Link function: logit 
#> 
#> Formula:
#> fa ~ age + group + s(nodeID, by = group, k = 16) + s(subjectID, 
#>     bs = "re")
#> 
#> Parametric coefficients:
#>              Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) -0.329785   0.245062  -1.346    0.178  
#> age          0.003904   0.003960   0.986    0.324  
#> groupCTRL    0.144626   0.071782   2.015    0.044 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>                       edf Ref.df      F  p-value    
#> s(nodeID):groupALS  14.50  14.97 74.217  < 2e-16 ***
#> s(nodeID):groupCTRL 14.38  14.95 85.385  < 2e-16 ***
#> s(subjectID)        37.89  45.00  9.105 0.000674 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.613   Deviance explained = 62.3%
#> fREML = -6443.6  Scale est. = 1         n = 4734

References

Sarica, Alessia, Antonio Cerasa, Paola Valentino, Jason Yeatman, Maria Trotta, Stefania Barone, Alfredo Granata, et al. 2017. “The Corticospinal Tract Profile in Amyotrophic Lateral Sclerosis.” Hum. Brain Mapp. 38 (2): 727–39.
Van Rij, Jacolien, Petra Hendriks, Hedderik Van Rijn, R. Harald Baayen, and Simon N. Wood. 2019. “Analyzing the Time Course of Pupillometric Data.” Trends in Hearing 23 (January): 1–22.