How pyAFQ uses BIDS
The pyAFQ API relies heavily on the Brain Imaging Data Structure (BIDS), a widely used standard for organizing and describing neuroimaging data. This means that the software assumes that its inputs are organized according to the BIDS specification, and that its outputs conform to BIDS wherever possible.
Note
Derivatives of diffusion MRI processing are not yet fully described in the existing BIDS specification, but describing them is part of an ongoing effort. Wherever possible, we conform to the draft implementation of the BIDS DWI derivatives available [here](https://bids-specification.readthedocs.io/en/wip-derivatives/05-derivatives/05-diffusion-derivatives.html).
In this example, we will explore the use of BIDS in pyAFQ and see how BIDS allows us to extend and provide flexibility to the users of the software.
import os
import os.path as op
import AFQ.api.bundle_dict as abd
from AFQ.api.group import GroupAFQ
import AFQ.data.fetch as afd
import AFQ.definitions.image as afm
To interact with and query BIDS datasets, we use pyBIDS, which we import here:
import bids
from bids.layout import BIDSLayout
We start with some example data. The data we will use here is
generated from the
Stanford HARDI dataset.
The call below fetches
this dataset and organizes it within the ~/AFQ_data folder in the BIDS
format.
afd.organize_stanford_data(clear_previous_afq="all")
After doing that, we should have a folder that looks like this:
stanford_hardi
├── dataset_description.json
└── derivatives
    ├── freesurfer
    │   ├── dataset_description.json
    │   └── sub-01
    │       └── ses-01
    │           └── anat
    │               ├── sub-01_ses-01_T1w.nii.gz
    │               └── sub-01_ses-01_seg.nii.gz
    └── vistasoft
        ├── dataset_description.json
        └── sub-01
            └── ses-01
                └── dwi
                    ├── sub-01_ses-01_dwi.bvals
                    ├── sub-01_ses-01_dwi.bvecs
                    └── sub-01_ses-01_dwi.nii.gz
The top level directory (stanford_hardi) is our overall BIDS dataset folder.
In many cases, this folder will include folders with raw data for each subject
in the dataset. In this case, we do not include the raw data folders and only
have the outputs of pipelines that were used to preprocess the data (e.g.,
correct the data for subject motion, eddy currents, and so forth).
In general, only the preprocessed diffusion data is required for pyAFQ to run.
See the :doc:"Organizing your data" </howto/data> section of the
documentation for more details.
In this case, one folder contains derivatives of the FreeSurfer software and
another folder contains the DWI data that has been preprocessed with the
Vistasoft software.
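The layout above follows a regular pattern: each pipeline lives under derivatives, and each file sits in a subject/session/datatype folder with its entities repeated in the filename. As a minimal sketch of that pattern, the path of a file can be assembled from its entities. The helper name `derivative_path` is hypothetical and is not part of pyAFQ or pyBIDS; it only covers the subject/session layout used in this example dataset.

```python
from pathlib import PurePosixPath


def derivative_path(root, pipeline, subject, session, datatype, suffix,
                    extension):
    """Assemble the path of a derivative file from its BIDS entities.

    Hypothetical helper for illustration only.
    """
    filename = f"sub-{subject}_ses-{session}_{suffix}{extension}"
    return str(
        PurePosixPath(root) / "derivatives" / pipeline
        / f"sub-{subject}" / f"ses-{session}" / datatype / filename)


# Rebuild the location of the preprocessed DWI file shown in the tree above.
dwi_file = derivative_path(
    "stanford_hardi", "vistasoft", "01", "01", "dwi", "dwi", ".nii.gz")
print(dwi_file)
```

Reading the pattern in this direction makes it easier to spot a file that was placed in the wrong folder or named inconsistently.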
pyAFQ provides facilities to segment tractography results obtained
using other software as well. For example, we often use
qsiprep to preprocess
our data and reconstruct tractographies with software such as
MRtrix. Here, we will demonstrate how to use
these reconstructions in the pyAFQ segmentation and tractometry pipeline.
We fetch this data and add it as a separate pipeline.
The following code will download a previously-created tractography and
organize it by adding it to the BIDS dataset folder and renaming it to be
BIDS-compliant (e.g., sub-01_ses-01-dwi_tractography.trk).
afd.fetch_stanford_hardi_tractography()
bids_path = op.join(op.expanduser('~'), 'AFQ_data', 'stanford_hardi')
tractography_path = op.join(bids_path, 'derivatives', 'my_tractography')
sub_path = op.join(tractography_path, 'sub-01', 'ses-01', 'dwi')
# Segmentation image from the FreeSurfer derivatives
seg_file = op.join(
    afd.afq_home, "stanford_hardi", "derivatives",
    "freesurfer", "sub-01", "ses-01", "anat",
    "sub-01_ses-01_seg.nii.gz")

# Build the PVE images from labels in the segmentation
pve = afm.PVEImages(
    afm.LabelledImageFile(
        path=seg_file,
        inclusive_labels=[0]),
    afm.LabelledImageFile(
        path=seg_file,
        exclusive_labels=[0, 1, 2], combine="and"),
    afm.LabelledImageFile(
        path=seg_file,
        inclusive_labels=[1, 2]))

# Move the downloaded tractography into the new pipeline folder
os.makedirs(sub_path, exist_ok=True)
os.rename(
    op.join(
        op.expanduser('~'),
        'AFQ_data',
        'stanford_hardi_tractography',
        'full_segmented_cleaned_tractography.trk'),
    op.join(
        sub_path,
        'sub-01_ses-01-dwi_tractography.trk'))

# Describe the new pipeline so that it can be found by name
afd.to_bids_description(
    tractography_path,
    **{"Name": "my_tractography",
       "PipelineDescription": {"Name": "my_tractography"},
       "GeneratedBy": [{"Name": "my_tractography"}]})
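Each derivatives pipeline needs its own dataset_description.json. As a rough sketch of the kind of file that the `afd.to_bids_description` call above is meant to produce, the snippet below writes one using only the standard library. It is an assumption that the keyword arguments end up as top-level JSON keys; the `BIDSVersion` value and the helper name `write_description` are chosen for illustration and are not taken from pyAFQ.

```python
import json
import os
import os.path as op
import tempfile


def write_description(pipeline_dir, name):
    """Write a minimal dataset_description.json for a derivatives pipeline.

    Hypothetical stand-in for illustration; not pyAFQ's implementation.
    """
    os.makedirs(pipeline_dir, exist_ok=True)
    description = {
        "Name": name,
        "BIDSVersion": "1.4.0",  # assumed value, adjust to your dataset
        "PipelineDescription": {"Name": name},
        "GeneratedBy": [{"Name": name}],
    }
    desc_file = op.join(pipeline_dir, "dataset_description.json")
    with open(desc_file, "w") as f:
        json.dump(description, f, indent=2)
    return desc_file


# Write into a temporary folder so that no real dataset is touched.
with tempfile.TemporaryDirectory() as tmp:
    desc_file = write_description(
        op.join(tmp, "derivatives", "my_tractography"), "my_tractography")
    with open(desc_file) as f:
        written = json.load(f)

print(written["GeneratedBy"])
```

The pipeline name stored in this file is what the `scope` filter used later in this example refers to.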
After we do that, our dataset folder should look like this:
stanford_hardi
├── dataset_description.json
└── derivatives
    ├── freesurfer
    │   ├── dataset_description.json
    │   └── sub-01
    │       └── ses-01
    │           └── anat
    │               ├── sub-01_ses-01_T1w.nii.gz
    │               └── sub-01_ses-01_seg.nii.gz
    ├── my_tractography
    │   ├── dataset_description.json
    │   └── sub-01
    │       └── ses-01
    │           └── dwi
    │               └── sub-01_ses-01-dwi_tractography.trk
    └── vistasoft
        ├── dataset_description.json
        └── sub-01
            └── ses-01
                └── dwi
                    ├── sub-01_ses-01_dwi.bvals
                    ├── sub-01_ses-01_dwi.bvecs
                    └── sub-01_ses-01_dwi.nii.gz
To explore the layout of these derivatives, we will initialize a
BIDSLayout instance to help us see what is in this dataset:
layout = bids.BIDSLayout(bids_path, derivatives=True)
Because there is no raw data in this BIDS layout (only derivatives), pyBIDS will report that there are no subjects and sessions:
print(layout)
BIDS Layout: ...runner/AFQ_data/stanford_hardi | Subjects: 0 | Sessions: 0 | Runs: 0
But a query on the derivatives will reveal the different derivatives that are stored here:
print(layout.derivatives)
{'derivatives/recobundles': BIDS Layout: ..._hardi/derivatives/recobundles | Subjects: 1 | Sessions: 1 | Runs: 0, 'derivatives/my_tractography': BIDS Layout: ...di/derivatives/my_tractography | Subjects: 1 | Sessions: 1 | Runs: 0, 'derivatives/vistasoft': BIDS Layout: ...rd_hardi/derivatives/vistasoft | Subjects: 1 | Sessions: 1 | Runs: 0, 'derivatives/freesurfer': BIDS Layout: ...d_hardi/derivatives/freesurfer | Subjects: 1 | Sessions: 1 | Runs: 0}
Later on, we will use a bids.BIDSValidator object to make sure that the
files within our dataset are BIDS-compliant. First, we
extract the tractography derivatives part of our layout using:
my_tractography = layout.derivatives["my_tractography"]
This variable is also a BIDS layout object. This object has a get
method, which allows us to query and find specific items within the
layout. For example, we can ask for files that have a suffix consistent
with tractography results:
tractography_files = my_tractography.get(suffix='tractography')
Or ask for files that have a .trk extension:
tractography_files = my_tractography.get(extension='.trk')
In this case, both of these would produce the same result.
tractography_file = tractography_files[0]
print(tractography_file)
<BIDSFile filename='/home/runner/AFQ_data/stanford_hardi/derivatives/my_tractography/sub-01/ses-01/dwi/sub-01_ses-01-dwi_tractography.trk'>
We can also get some more structured information about this file:
print(tractography_file.get_entities())
{'subject': '01', 'session': '01', 'suffix': 'tractography', 'extension': '.trk', 'datatype': 'dwi'}
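These entities are encoded in the filename itself as key-value pairs. The sketch below shows the general idea with a deliberately simplified parser. It is not how pyBIDS works internally (pyBIDS matches configurable patterns), the function name `parse_entities` is hypothetical, and it only handles names whose parts are separated by underscores, such as the preprocessed DWI file in this dataset.

```python
def parse_entities(filename):
    """Split a BIDS-style filename into entities, suffix, and extension."""
    # Everything from the first dot onwards is treated as the extension,
    # so that compound extensions such as ".nii.gz" stay together.
    stem, dot, ext = filename.partition(".")
    parts = stem.split("_")
    entities = {}
    # All parts except the last one are "key-value" pairs.
    for part in parts[:-1]:
        key, _, value = part.partition("-")
        entities[key] = value
    # The last underscore-separated part is the suffix.
    entities["suffix"] = parts[-1]
    entities["extension"] = dot + ext
    return entities


print(parse_entities("sub-01_ses-01_dwi.nii.gz"))
```

Note that this sketch returns the short keys used in filenames (sub, ses), whereas pyBIDS reports the long entity names (subject, session).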
We can use a bids.BIDSValidator instance to validate that
this file is compliant with the specification. Note that the validator
requires that the filename be provided relative to the root of the BIDS
dataset, so we have to split the string that contains the full path
of the tractography to extract only the part that is relative to the
root of the entire BIDS layout object:
tractography_full_path = tractography_file.path
tractography_relative_path = tractography_full_path.split(layout.root)[-1]
validator = bids.BIDSValidator()
print(validator.is_bids(tractography_relative_path))
True
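Splitting on the root string works here, but it depends on the root appearing exactly once in the full path. A slightly more defensive sketch uses `pathlib` to compute the relative part and then restores the leading separator that the split above produces. The helper name `path_relative_to_root` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def path_relative_to_root(full_path, root):
    """Return full_path relative to root, with a leading slash.

    Raises ValueError if full_path is not located under root.
    """
    relative = PurePosixPath(full_path).relative_to(PurePosixPath(root))
    return "/" + str(relative)


example = path_relative_to_root(
    "/data/stanford_hardi/derivatives/my_tractography/sub-01/ses-01/dwi/"
    "sub-01_ses-01-dwi_tractography.trk",
    "/data/stanford_hardi")
print(example)
```

Failing loudly when the file is outside the dataset root is the main difference from the string split, which would silently return the full path in that case.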
Next, we specify the information we need to define the bundles that we are interested in segmenting. In this case, we are going to use a list of bundle names for the bundle info. These names refer to bundles for which we already have clear definitions of the information needed to segment them (e.g., waypoint ROIs and probability maps). For an example that includes custom definition of bundle info, see the plot_callosal_tract_profile example.
bundle_info = abd.default_bd()[
    "Left Inferior Longitudinal",
    "Right Inferior Longitudinal",
    "Left Arcuate",
    "Right Arcuate",
    "Left Corticospinal",
    "Right Corticospinal"]
Now, we can define our GroupAFQ object, pointing to the derivatives of the
'my_tractography' pipeline as inputs. This is done by setting the
import_tract keyword argument. We pass the
bundle_info defined above. We also point to the pipelines that hold the
preprocessed DWI data and the T1-weighted image. Note that the pipeline name
is not necessarily the name of the folder it is in; the pipeline name is
defined in each pipeline's dataset_description.json. These data were
preprocessed with 'vistasoft', so this is the pipeline we'll point to.
If we were using 'qsiprep', this is where we would pass that
string instead. If we did that, AFQ would look for a derivatives
folder called 'stanford_hardi/derivatives/qsiprep' and find the
preprocessed DWI data within it. Finally, to speed things up
a bit, we also sub-sample the provided tractography. This is
done by defining the segmentation_params dictionary input.
To sub-sample to 10,000 streamlines, we set
'nb_streamlines' to 10000.
my_afq = GroupAFQ(
    bids_path,
    dwi_preproc_pipeline='vistasoft',
    t1_preproc_pipeline='freesurfer',
    bundle_info=bundle_info,
    import_tract={
        "suffix": "tractography",
        "scope": "my_tractography"
    },
    pve=pve,
    segmentation_params={'nb_streamlines': 10000})
INFO:AFQ:Using the following files for subject 01 and session 01:
INFO:AFQ: DWI: /home/runner/AFQ_data/stanford_hardi/derivatives/vistasoft/sub-01/ses-01/dwi/sub-01_ses-01_dwi.nii.gz
INFO:AFQ: BVAL: /home/runner/AFQ_data/stanford_hardi/derivatives/vistasoft/sub-01/ses-01/dwi/sub-01_ses-01_dwi.bval
INFO:AFQ: BVEC: /home/runner/AFQ_data/stanford_hardi/derivatives/vistasoft/sub-01/ses-01/dwi/sub-01_ses-01_dwi.bvec
INFO:AFQ: T1: /home/runner/AFQ_data/stanford_hardi/derivatives/freesurfer/sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz
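Because the pipeline name comes from dataset_description.json rather than from the folder name, it can help to list which name each derivatives folder declares. The sketch below does this with the standard library. It assumes that the name is stored under `GeneratedBy` or, failing that, under `PipelineDescription` (the two keys written earlier in this example); how pyBIDS or pyAFQ resolve the name internally may differ. The function name `declared_pipeline_names` and the folder names in the demo are hypothetical.

```python
import json
import os
import tempfile


def declared_pipeline_names(derivatives_dir):
    """Map each derivatives sub-folder to the pipeline name it declares."""
    names = {}
    for folder in sorted(os.listdir(derivatives_dir)):
        desc_file = os.path.join(
            derivatives_dir, folder, "dataset_description.json")
        if not os.path.isfile(desc_file):
            names[folder] = None  # no description file in this folder
            continue
        with open(desc_file) as f:
            desc = json.load(f)
        generated_by = desc.get("GeneratedBy") or [{}]
        names[folder] = (
            generated_by[0].get("Name")
            or desc.get("PipelineDescription", {}).get("Name"))
    return names


# Demo on a throwaway folder whose name differs from its pipeline name.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "preproc_v2"))
    with open(os.path.join(tmp, "preproc_v2",
                           "dataset_description.json"), "w") as f:
        json.dump({"Name": "demo", "GeneratedBy": [{"Name": "vistasoft"}]}, f)
    os.makedirs(os.path.join(tmp, "no_description"))
    names = declared_pipeline_names(tmp)

print(names)
```

A folder that maps to `None` here has no description file, which is one reason a pipeline might not be found by name.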
Finally, to run the segmentation and extract tract profiles, we call
the export_all method. This creates all of the derivative outputs of
AFQ within the 'stanford_hardi/derivatives/afq' folder.
my_afq.export_all()
INFO:AFQ:Calculating _b0ref.nii.gz...
INFO:AFQ:_b0ref.nii.gz completed. Saving to /home/runner/AFQ_data/stanford_hardi/derivatives/afq/sub-01/ses-01/dwi/sub-01_ses-01_b0ref.nii.gz
INFO:AFQ:Calculating _desc-brain_mask.nii.gz...
INFO:AFQ:Calculating _desc-T1w_mask.nii.gz...
INFO:AFQ:Running mindgrab...
A few common issues that can hinder BIDS from working properly are:

- Faulty dataset_description.json file. You need to make sure that the file contains the right names for the pipeline. See above for an example of that.
- File naming convention doesn't uniquely identify a file with BIDS filters.
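The second issue listed above can be checked before running pyAFQ: a filter is only useful if it selects exactly one file for a given subject and session. The sketch below illustrates that check on plain filenames. It is a simplified stand-in for a pyBIDS query (the helper `select_unique` is hypothetical and matches on the filename suffix only), not a description of how pyAFQ applies its filters.

```python
def select_unique(filenames, suffix):
    """Return the single filename whose BIDS suffix matches, else raise."""
    matches = [
        name for name in filenames
        # The suffix is the last underscore-separated part of the name
        # before the extension.
        if name.split(".")[0].split("_")[-1] == suffix
    ]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one file with suffix {suffix!r}, "
            f"found {len(matches)}: {matches}")
    return matches[0]


files = [
    "sub-01_ses-01_dwi.nii.gz",
    "sub-01_ses-01-dwi_tractography.trk",
]
print(select_unique(files, "tractography"))
```

If a second tractography file were added for the same subject and session, the call would raise instead of silently picking one of them, which signals that the filter needs another key to disambiguate.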
The outputs of AFQ are also BIDS-compatible. Here, we demonstrate how to add the AFQ derivatives to a layout and show all files with the key-value pair desc-bundles:
layout = BIDSLayout(bids_path)
layout.add_derivatives(
    f'{bids_path}/derivatives/afq',
    config=['bids', 'derivatives'])
print(layout.get(desc="bundles", return_type="filename"))