Fitting Generalized Additive Models (GAMs)
Source:vignettes/tractable-single-bundle.Rmd
tractable-single-bundle.Rmd
This vignette demonstrates the use of GAMs for statistical analysis of tract profile data. The data we will use here contains tract profiles from diffusion MRI measurements in a group of patients with Amyotrophic Lateral Sclerosis (ALS) and a group of matched controls (Sarica et al. 2017).
We start by loading the tractable
library:
Next, we will use a function that is included in
tractable
to read this dataset directly into memory.
Importantly, both the group (“ALS” or “CTRL”) and the subject identifier
(“subjectID”) need to be factors for subsequent analysis to work
properly.
sarica <- tractable::read_afq_sarica()
sarica$group <- factor(sarica$class)
sarica$subjectID <- unclass(factor(sarica$subjectID))
First, let’s visualize the data. We use the
plot_tract_profiles
function, selecting to view both
fractional anisotropy (FA) and mean diffusivity profiles in two tracts:
the right corticospinal tract (CST) and the right superior longitudinal
fasciculus (SLF), which are identified in the “tractID” column of this
dataset.
tractable::plot_tract_profiles(
df = sarica,
group_col = "group",
n_groups = 2,
metrics = c("fa", "md"),
bundles = list("Right Corticospinal", "Right SLF"),
bundles_col = "tractID"
)
#> Warning: Removed 81 rows containing non-finite outside the scale range
#> (`stat_summary()`).
#> Removed 81 rows containing non-finite outside the scale range
#> (`stat_summary()`).
#> $fa
#> Warning: Removed 81 rows containing non-finite outside the scale range
#> (`stat_summary()`).
#> Removed 81 rows containing non-finite outside the scale range
#> (`stat_summary()`).
#>
#> $md
We can already see that ALS has a profound effect on the tract
profiles of the CST, but does not affect SLF as much. We will use GAMs
in order to quantify this in statistical terms. We start by fitting a
GAM model to the data from the CST. Using the
tractable_single_bundle
function, we select the Right CST
data, and focus here only on FA. We use “group” and “age” as relevant
covariates. Comparing group as a main effect, that will also be used to
fit separate smooth functions for each category of subjects. The
mgcv
GAM functions use a parameter k
to
determine how many spline functions to use in fitting the smooth change
of FA over the length of the tract. We use an automated strategy to find
k
.
gam_fit_cst <- tractable::tractable_single_bundle(
df_afq = sarica,
tract = "Right Corticospinal",
participant_id = "subjectID",
group_by = "group",
covariates = c("age", "group"),
dwi_metric = "fa",
k = "auto"
)
Examining the summary of the resulting GAM fit object shows us that
the k=16
is sufficiently large to describe the spatial
variation of tract profile data. In addition, we see that there is a
statistically significant effect of group (with a p-value of 4.66e-10)
and no statistically significant effect of age (p=0.2748).
summary(gam_fit_cst)
#>
#> Family: Beta regression(61.146)
#> Link function: logit
#>
#> Formula:
#> fa ~ age + group + s(nodeID, by = group, k = 16) + s(subjectID,
#> bs = "re")
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.329785 0.245062 -1.346 0.178
#> age 0.003904 0.003960 0.986 0.324
#> groupCTRL 0.144626 0.071782 2.015 0.044 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(nodeID):groupALS 14.50 14.97 74.217 < 2e-16 ***
#> s(nodeID):groupCTRL 14.38 14.95 85.385 < 2e-16 ***
#> s(subjectID) 37.89 45.00 9.105 0.000674 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.613 Deviance explained = 62.3%
#> fREML = -6443.6 Scale est. = 1 n = 4734
Running the same analysis on the data from SLF, we see that there is no significant difference between the groups in this bundle, indicating that the effect observed in CST is rather specific to this bundle.
gam_fit_slf <- tractable::tractable_single_bundle(
df_afq = sarica,
tract = "Right SLF",
participant_id = "subjectID",
group_by = "group",
covariates = c("age","group"),
dwi_metric = "fa",
k = "auto"
)
summary(gam_fit_slf)
#>
#> Family: Beta regression(95.864)
#> Link function: logit
#>
#> Formula:
#> fa ~ age + group + s(nodeID, by = group, k = 16) + s(subjectID,
#> bs = "re")
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.3103302 0.1374484 -2.258 0.024 *
#> age -0.0006771 0.0022217 -0.305 0.761
#> groupCTRL 0.0238207 0.0389222 0.612 0.541
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(nodeID):groupALS 14.75251 14.99 147.703 <2e-16 ***
#> s(nodeID):groupCTRL 14.75346 14.99 158.462 <2e-16 ***
#> s(subjectID) 0.01141 45.00 0.012 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.632 Deviance explained = 66.2%
#> fREML = -9537.2 Scale est. = 1 n = 4785
This is in line with other work that we have conducted with this dataset using other methods (Richie-Halford et al. 2021).
If you are interested in exploring the ALS dataset even more, you can also see this data in an AFQ-Browser here (Yeatman et al. 2018).