Use parametric statistics for group comparison¶

As a contrast to the approach presented in other examples, we also present a more "standard" statistical approach to analyze tract profile data. Here, we will conduct a point-by-point group comparison using a linear model fit using the standard ordinary least squares (OLS) method.

This example first fetches the ALS classification dataset from (Sarica et al, 2017). This dataset contains tractometry features from 24 patients with ALS and 24 demographically matched control subjects. It then uses the statsmodels library to compare between the tract profiles of the two groups in one tract (the left corticospinal tract) and in one feature (FA).

In [1]:
import os.path as op
from paths import afq_home

import matplotlib.pyplot as plt

from afqinsight import AFQDataset
from afqinsight.parametric import node_wise_regression
from afqinsight.plot import plot_regression_profiles

Read data from Sarica et al.¶

The nodes.csv file, which is the input here is the output of pyAFQ processing. The subjects.csv file contains phenotypic data (in this case, whether they have ALS or not in the variable "class") and uses the same subject identifiers as those that are stored in the pyAFQ output. This allows AFQ-Insight to merge the data between the two files.

In [2]:
afqdata = AFQDataset.from_files(op.join(afq_home, "afq-insight/sarica/nodes.csv"), 
                                op.join(afq_home, "afq-insight/sarica/subjects.csv"),
                                dwi_metrics=["fa", "md"], 
                                target_cols=["class"], 
                                label_encode_cols=["class"])
/Users/john/AFQ-Insight/afqinsight/transform.py:144: FutureWarning: The previous implementation of stack is deprecated and will be removed in a future version of pandas. See the What's New notes for pandas 2.1.0 for details. Specify future_stack=True to adopt the new implementation and silence this warning.
  features = interpolated.stack(["subjectID", "tractID", "metric"]).unstack(

Specify the tracts of interest¶

Many times we have a specific set of tracts we want to analyze. We can specify those tracts using a list with each tract of interest

In [3]:
tracts = ["Left Arcuate", "Right Arcuate", "Left Corticospinal", "Right Corticospinal"]

Set up plot, run linear models, and show the results¶

With the next few lines of code, we fit a linear model in order to predict group differences in the diffusion properties of our tracts of interest. The function node_wise_regression does this by looping through each node of a tract and predicting our diffusion metric, in this case FA, as a function of group. The initializer OLS.from_formula takes R-style formulas as its model specification. Here, we are using the "group" variable as a categorical variable in the model. We can also specify linear-mixed effects models for more complex phenotypic data by passing in the appropriate model formula and setting lme=True.

Because we conducted 100 comparisons, we need to correct the p-values that we obtained for the potential for a false discovery. There are multiple ways to conduct multiple comparison correction, and we will not get into the considerations in selecting a method here. The function node_wise_regression uses Benjamini/Hochberg FDR controlling method. This returns a boolean array for the p-values that are rejected at a specified alpha level (after correction), as well as an array of the corrected p-values.

In [4]:
num_cols = 2

# Define the figure and grid
fig, axes = plt.subplots(nrows=2, ncols=num_cols, figsize=(10, 6))


# Loop through the data and generate plots
for i, tract in enumerate(tracts):
    # fit node-wise regression for each tract based on model formula
    tract_dict = node_wise_regression(afqdata, tract, "fa", "fa ~ C(group)")

    row = i // num_cols
    col = i % num_cols

    axes[row][col].set_title(tract)

    # Visualize
    # ----------
    # We can visualize the results with the `plot_regression_profiles`
    # function. Each subplot shows the tract profiles of the two groups,
    # while controlling for any covariates, with stars indicating the nodes
    # at which the null hypothesis is rejected.

    plot_regression_profiles(tract_dict, axes[row][col])

# Adjust layout to prevent overlap
plt.tight_layout()

# Show the plot
plt.show()
No description has been provided for this image