Fitting Generalized Additive Models (GAMs) • tractable

This vignette demonstrates the use of GAMs for statistical analysis of tract profile data. The data we will use here contains tract profiles from diffusion MRI measurements in a group of patients with Amyotrophic Lateral Sclerosis (ALS) and a group of matched controls (Sarica et al. 2017).

We start by loading the tractable library:

library(tractable)

Next, we will use a function that is included in tractable to read this dataset directly into memory. Importantly, both the group (“ALS” or “CTRL”) and the subject identifier (“subjectID”) need to be factors for subsequent analysis to work properly.

sarica <- tractable::read_afq_sarica()
sarica$group <- factor(sarica$class)
sarica$subjectID <- unclass(factor(sarica$subjectID))

First, let’s visualize the data. We use the plot_tract_profiles function, selecting to view both fractional anisotropy (FA) and mean diffusivity profiles in two tracts: the right corticospinal tract (CST) and the right superior longitudinal fasciculus (SLF), which are identified in the “tractID” column of this dataset.

tractable::plot_tract_profiles(
  df          = sarica,
  group_col   = "group",
  n_groups    = 2,
  metrics     = c("fa", "md"),
  bundles     = list("Right Corticospinal", "Right SLF"),
  bundles_col = "tractID"
)
#> Warning: Removed 81 rows containing non-finite outside the scale range
#> (`stat_summary()`).
#> Removed 81 rows containing non-finite outside the scale range
#> (`stat_summary()`).
#> $fa
#> Warning: Removed 81 rows containing non-finite outside the scale range
#> (`stat_summary()`).
#> Removed 81 rows containing non-finite outside the scale range
#> (`stat_summary()`).

#> 
#> $md

We can already see that ALS has a profound effect on the tract profiles of the CST, but does not affect SLF as much. We will use GAMs in order to quantify this in statistical terms. We start by fitting a GAM model to the data from the CST. Using the tractable_single_bundle function, we select the Right CST data, and focus here only on FA. We use “group” and “age” as relevant covariates. Comparing group as a main effect, that will also be used to fit separate smooth functions for each category of subjects. The mgcv GAM functions use a parameter k to determine how many spline functions to use in fitting the smooth change of FA over the length of the tract. We use an automated strategy to find k.

gam_fit_cst <- tractable::tractable_single_bundle(
  df_afq         = sarica,
  tract          = "Right Corticospinal",
  participant_id = "subjectID",
  group_by       = "group",
  covariates     = c("age", "group"),
  dwi_metric     = "fa",
  k              = "auto"
)

Examining the summary of the resulting GAM fit object shows us that the k=16 is sufficiently large to describe the spatial variation of tract profile data. In addition, we see that there is a statistically significant effect of group (with a p-value of 4.66e-10) and no statistically significant effect of age (p=0.2748).

summary(gam_fit_cst)
#> 
#> Family: Beta regression(61.146) 
#> Link function: logit 
#> 
#> Formula:
#> fa ~ age + group + s(nodeID, by = group, k = 16) + s(subjectID, 
#>     bs = "re")
#> 
#> Parametric coefficients:
#>              Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) -0.329785   0.245062  -1.346    0.178  
#> age          0.003904   0.003960   0.986    0.324  
#> groupCTRL    0.144626   0.071782   2.015    0.044 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>                       edf Ref.df      F  p-value    
#> s(nodeID):groupALS  14.50  14.97 74.217  < 2e-16 ***
#> s(nodeID):groupCTRL 14.38  14.95 85.385  < 2e-16 ***
#> s(subjectID)        37.89  45.00  9.105 0.000674 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.613   Deviance explained = 62.3%
#> fREML = -6443.6  Scale est. = 1         n = 4734

Running the same analysis on the data from SLF, we see that there is no significant difference between the groups in this bundle, indicating that the effect observed in CST is rather specific to this bundle.

gam_fit_slf <- tractable::tractable_single_bundle(
  df_afq         = sarica,
  tract          = "Right SLF",
  participant_id = "subjectID",
  group_by       = "group",
  covariates     = c("age","group"),
  dwi_metric     = "fa",
  k              = "auto"
)

summary(gam_fit_slf)
#> 
#> Family: Beta regression(95.864) 
#> Link function: logit 
#> 
#> Formula:
#> fa ~ age + group + s(nodeID, by = group, k = 16) + s(subjectID, 
#>     bs = "re")
#> 
#> Parametric coefficients:
#>               Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) -0.3103302  0.1374484  -2.258    0.024 *
#> age         -0.0006771  0.0022217  -0.305    0.761  
#> groupCTRL    0.0238207  0.0389222   0.612    0.541  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Approximate significance of smooth terms:
#>                          edf Ref.df       F p-value    
#> s(nodeID):groupALS  14.75251  14.99 147.703  <2e-16 ***
#> s(nodeID):groupCTRL 14.75346  14.99 158.462  <2e-16 ***
#> s(subjectID)         0.01141  45.00   0.012  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> R-sq.(adj) =  0.632   Deviance explained = 66.2%
#> fREML = -9537.2  Scale est. = 1         n = 4785

This is in line with other work that we have conducted with this dataset using other methods (Richie-Halford et al. 2021).

If you are interested in exploring the ALS dataset even more, you can also see this data in an AFQ-Browser here (Yeatman et al. 2018).

References

Richie-Halford, Adam, Jason D Yeatman, Noah Simon, and Ariel Rokem. 2021. “Multidimensional Analysis and Detection of Informative Features in Human Brain White Matter.” PLoS Comput. Biol. 17 (6): e1009136.

Sarica, Alessia, Antonio Cerasa, Paola Valentino, Jason Yeatman, Maria Trotta, Stefania Barone, Alfredo Granata, et al. 2017. “The Corticospinal Tract Profile in Amyotrophic Lateral Sclerosis.” Hum. Brain Mapp. 38 (2): 727–39.

Yeatman, Jason D, Adam Richie-Halford, Josh K Smith, Anisha Keshavan, and Ariel Rokem. 2018. “A Browser-Based Tool for Visualization and Analysis of Diffusion MRI Data.” Nat. Commun. 9 (1): 940.