Use parametric statistics for group comparison¶
As a contrast to the approach presented in other examples, we also present a more "standard" statistical approach to analyze tract profile data. Here, we will conduct a point-by-point group comparison using a linear model fit using the standard ordinary least squares (OLS) method.
This example first fetches the ALS classification dataset from (Sarica et al, 2017). This dataset contains tractometry features from 24 patients with ALS and 24 demographically matched control subjects. It then uses the statsmodels library to compare between the tract profiles of the two groups in one tract (the left corticospinal tract) and in one feature (FA).
import os.path as op
from paths import afq_home
import matplotlib.pyplot as plt
from afqinsight import AFQDataset
from afqinsight.parametric import node_wise_regression
from afqinsight.plot import plot_regression_profiles
Read data from Sarica et al.¶
The nodes.csv
file, which is the input here is the output of pyAFQ processing. The subjects.csv
file contains phenotypic data (in this case, whether they have ALS or not in the variable "class") and uses the same subject identifiers as those that are stored in the pyAFQ output. This allows AFQ-Insight to merge the data between the two files.
afqdata = AFQDataset.from_files(op.join(afq_home, "afq-insight/sarica/nodes.csv"),
op.join(afq_home, "afq-insight/sarica/subjects.csv"),
dwi_metrics=["fa", "md"],
target_cols=["class"],
label_encode_cols=["class"])
/Users/john/AFQ-Insight/afqinsight/transform.py:144: FutureWarning: The previous implementation of stack is deprecated and will be removed in a future version of pandas. See the What's New notes for pandas 2.1.0 for details. Specify future_stack=True to adopt the new implementation and silence this warning. features = interpolated.stack(["subjectID", "tractID", "metric"]).unstack(
Specify the tracts of interest¶
Many times we have a specific set of tracts we want to analyze. We can specify those tracts using a list with each tract of interest
tracts = ["Left Arcuate", "Right Arcuate", "Left Corticospinal", "Right Corticospinal"]
Set up plot, run linear models, and show the results¶
With the next few lines of code, we fit a linear model in order to predict
group differences in the diffusion properties of our tracts of interest. The
function node_wise_regression
does this by looping through each node of a
tract and predicting our diffusion metric, in this case FA, as a function of
group. The initializer OLS.from_formula
takes R-style formulas as its model specification.
Here, we are using the "group"
variable as a categorical variable in the model.
We can also specify linear-mixed effects models for more complex phenotypic data
by passing in the appropriate model formula and setting lme=True
.
Because we conducted 100 comparisons, we need to correct the p-values that
we obtained for the potential for a false discovery. There are multiple
ways to conduct multiple comparison correction, and we will not get into
the considerations in selecting a method here. The function node_wise_regression
uses Benjamini/Hochberg FDR controlling method. This returns a boolean array for
the p-values that are rejected at a specified alpha level (after correction),
as well as an array of the corrected p-values.
num_cols = 2
# Define the figure and grid
fig, axes = plt.subplots(nrows=2, ncols=num_cols, figsize=(10, 6))
# Loop through the data and generate plots
for i, tract in enumerate(tracts):
# fit node-wise regression for each tract based on model formula
tract_dict = node_wise_regression(afqdata, tract, "fa", "fa ~ C(group)")
row = i // num_cols
col = i % num_cols
axes[row][col].set_title(tract)
# Visualize
# ----------
# We can visualize the results with the `plot_regression_profiles`
# function. Each subplot shows the tract profiles of the two groups,
# while controlling for any covariates, with stars indicating the nodes
# at which the null hypothesis is rejected.
plot_regression_profiles(tract_dict, axes[row][col])
# Adjust layout to prevent overlap
plt.tight_layout()
# Show the plot
plt.show()