How pyAFQ uses BIDS

The pyAFQ API relies heavily on the Brain Imaging Data Standard (BIDS), a widely used standard for organizing and describing neuroimaging data. This means that the software assumes that its inputs are organized according to the BIDS specification and its outputs conform where possible with BIDS.

Note

Derivatives of processing diffusion MRI are not currently fully described in the existing BIDS specification, but describing these is part of an ongoing effort. Wherever possible, we conform with the draft implementation of the BIDS DWI derivatives available [here](https://bids-specification.readthedocs.io/en/wip-derivatives/05-derivatives/05-diffusion-derivatives.html).

In this example, we will explore the use of BIDS in pyAFQ and see how BIDS allows us to extend and provide flexibility to the users of the software.

import os
import os.path as op

import AFQ.api.bundle_dict as abd
from AFQ.api.group import GroupAFQ
import AFQ.data.fetch as afd
import AFQ.definitions.image as afm

To interact with and query BIDS datasets, we use pyBIDS, which we import here:

import bids
from bids.layout import BIDSLayout

We start with some example data. The data we will use here is generated from the Stanford HARDI dataset. The call below fetches this dataset and organizes it within the ~/AFQ_data folder in the BIDS format.

afd.organize_stanford_data(clear_previous_afq="all")

After doing that, we should have a folder that looks like this:

| stanford_hardi
| ├── dataset_description.json
| └── derivatives
|     ├── freesurfer
|     │   ├── dataset_description.json
|     │   └── sub-01
|     │       └── ses-01
|     │           └── anat
|     │               ├── sub-01_ses-01_T1w.nii.gz
|     │               └── sub-01_ses-01_seg.nii.gz
|     └── vistasoft
|         ├── dataset_description.json
|         └── sub-01
|             └── ses-01
|                 └── dwi
|                     ├── sub-01_ses-01_dwi.bvals
|                     ├── sub-01_ses-01_dwi.bvecs
|                     └── sub-01_ses-01_dwi.nii.gz
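To compare a folder on disk against the layout shown here, a few lines of standard-library Python can list its contents as an indented tree. This is only an illustrative sketch: `tree_lines` is a hypothetical helper, not part of pyAFQ or pyBIDS, and it is demonstrated on a throwaway folder that mimics a small part of the dataset.

```python
import os
import os.path as op
import tempfile


def tree_lines(root):
    """Return the folder tree under `root` as a list of indented lines."""
    root = op.abspath(root)
    lines = [op.basename(root)]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so folders are visited in a stable order
        if dirpath == root:
            depth = 0
        else:
            # Depth of this folder below the root (1 for direct children).
            depth = op.relpath(dirpath, root).count(os.sep) + 1
            lines.append("    " * depth + op.basename(dirpath))
        for fname in sorted(filenames):
            lines.append("    " * (depth + 1) + fname)
    return lines


# Demonstrate on a temporary folder that mimics part of the layout.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_root = op.join(tmp_dir, "stanford_hardi")
    os.makedirs(op.join(demo_root, "derivatives", "vistasoft"))
    with open(op.join(demo_root, "dataset_description.json"), "w") as f:
        f.write("{}")
    demo_lines = tree_lines(demo_root)

print("\n".join(demo_lines))
```

To inspect the real dataset, the same helper could be called on `op.join(op.expanduser('~'), 'AFQ_data', 'stanford_hardi')`, assuming the default download location used in this example.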

The top-level directory (stanford_hardi) is our overall BIDS dataset folder. In many cases, this folder will include folders with raw data for each subject in the dataset. In this case, we do not include the raw data folders and only have the outputs of pipelines that were used to preprocess the data (e.g., to correct the data for subject motion, eddy currents, and so forth). In general, only the preprocessed diffusion data is required for pyAFQ to run. See the "Organizing your data" section of the documentation (/howto/data) for more details. In this case, one folder contains derivatives of the Freesurfer software and another folder contains the DWI data that has been preprocessed with the Vistasoft software.

pyAFQ provides facilities to segment tractography results obtained using other software as well. For example, we often use qsiprep to preprocess our data and reconstruct tractographies with software such as MRtrix. Here, we will demonstrate how to use these reconstructions in the pyAFQ segmentation and tractometry pipeline. We fetch this data and add it as a separate pipeline. The following code will download a previously created tractography and organize it by adding it to the BIDS dataset folder and renaming it to be BIDS-compliant (sub-01_ses-01-dwi_tractography.trk, the name used in the code that follows).

afd.fetch_stanford_hardi_tractography()

bids_path = op.join(op.expanduser('~'), 'AFQ_data', 'stanford_hardi')
tractography_path = op.join(bids_path, 'derivatives', 'my_tractography')
sub_path = op.join(tractography_path, 'sub-01', 'ses-01', 'dwi')

seg_file = op.join(afd.afq_home, "stanford_hardi", "derivatives",
                   "freesurfer", "sub-01", "ses-01", "anat",
                   "sub-01_ses-01_seg.nii.gz")
pve = afm.PVEImages(
    afm.LabelledImageFile(
        path=seg_file,
        inclusive_labels=[0]),
    afm.LabelledImageFile(
        path=seg_file,
        exclusive_labels=[0, 1, 2], combine="and"),
    afm.LabelledImageFile(
        path=seg_file,
        inclusive_labels=[1, 2]))

os.makedirs(sub_path, exist_ok=True)
os.rename(
    op.join(
        op.expanduser('~'),
        'AFQ_data',
        'stanford_hardi_tractography',
        'full_segmented_cleaned_tractography.trk'),
    op.join(
        sub_path,
        'sub-01_ses-01-dwi_tractography.trk'))

afd.to_bids_description(
    tractography_path,
    **{"Name": "my_tractography",
       "PipelineDescription": {"Name": "my_tractography"},
       "GeneratedBy": [{"Name": "my_tractography"}]})

After we do that, our dataset folder should look like this:

| stanford_hardi
| ├── dataset_description.json
| └── derivatives
|     ├── freesurfer
|     │   ├── dataset_description.json
|     │   └── sub-01
|     │       └── ses-01
|     │           └── anat
|     │               ├── sub-01_ses-01_T1w.nii.gz
|     │               └── sub-01_ses-01_seg.nii.gz
|     ├── my_tractography
|     │   ├── dataset_description.json
|     │   └── sub-01
|     │       └── ses-01
|     │           └── dwi
|     │               └── sub-01_ses-01-dwi_tractography.trk
|     └── vistasoft
|         ├── dataset_description.json
|         └── sub-01
|             └── ses-01
|                 └── dwi
|                     ├── sub-01_ses-01_dwi.bvals
|                     ├── sub-01_ses-01_dwi.bvecs
|                     └── sub-01_ses-01_dwi.nii.gz
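Each pipeline folder carries its own dataset_description.json. As a rough standard-library illustration of the kind of JSON involved, the sketch below writes the same fields that were passed to `afd.to_bids_description` and reads them back. `write_description` is a hypothetical helper; this is not pyAFQ's implementation, which may write additional fields.

```python
import json
import os.path as op
import tempfile


def write_description(folder, **fields):
    """Write `fields` to <folder>/dataset_description.json and return its path."""
    desc_path = op.join(folder, "dataset_description.json")
    with open(desc_path, "w") as f:
        json.dump(fields, f, indent=2)
    return desc_path


# Use a temporary folder so the sketch does not touch the real dataset.
with tempfile.TemporaryDirectory() as tmp_dir:
    desc_path = write_description(
        tmp_dir,
        Name="my_tractography",
        PipelineDescription={"Name": "my_tractography"},
        GeneratedBy=[{"Name": "my_tractography"}])
    with open(desc_path) as f:
        description = json.load(f)

# The pipeline name is stored inside the file, independent of the folder name.
print(description["PipelineDescription"]["Name"])
```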

To explore the layout of these derivatives, we will initialize a BIDSLayout instance to help us see what is in this dataset:

layout = bids.BIDSLayout(bids_path, derivatives=True)

Because there is no raw data in this BIDS layout (only derivatives), pybids will report that there are no subjects and sessions:

print(layout)

But a query on the derivatives will reveal the different derivatives that are stored here:

print(layout.derivatives)

We can use a bids.BIDSValidator object to make sure that the files within our dataset are BIDS-compliant. To do that, we first extract the tractography derivatives part of our layout:

my_tractography = layout.derivatives["my_tractography"]

This variable is also a BIDS layout object. This object has a get method, which allows us to query and find specific items within the layout. For example, we can ask for files that have a suffix consistent with tractography results:

tractography_files = my_tractography.get(suffix='tractography')

Or ask for files that have a .trk extension:

tractography_files = my_tractography.get(extension='.trk')

In this case, both of these would produce the same result.

tractography_file = tractography_files[0]
print(tractography_file)

We can also get some more structured information about this file:

print(tractography_file.get_entities())
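To build intuition for what entities are, here is a deliberately simplified sketch that splits a BIDS-style file name into key-value pairs, a suffix, and an extension. It is not how pyBIDS works internally: pyBIDS relies on its own configuration and reports longer entity names (such as subject and session). `parse_bids_name` is a hypothetical helper.

```python
def parse_bids_name(filename):
    """Split a BIDS-style file name into key-value entities, suffix and extension."""
    # Everything after the first dot is treated as the extension (e.g. ".nii.gz").
    stem, dot, ext = filename.partition(".")
    parts = stem.split("_")
    entities = {}
    # All underscore-separated parts except the last are "key-value" pairs.
    for part in parts[:-1]:
        key, _, value = part.partition("-")
        entities[key] = value
    # The last part of the stem is the suffix.
    entities["suffix"] = parts[-1]
    entities["extension"] = dot + ext
    return entities


parsed = parse_bids_name("sub-01_ses-01_dwi.nii.gz")
print(parsed)
```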

We can use a bids.BIDSValidator instance to validate that this file is compliant with the specification. Note that the validator requires that the filename be provided relative to the root of the BIDS dataset, so we have to split the string that contains the full path of the tractography to extract only the part that is relative to the root of the entire BIDS layout object:

tractography_full_path = tractography_file.path
tractography_relative_path = tractography_full_path.split(layout.root)[-1]

validator = bids.BIDSValidator()
print(validator.is_bids(tractography_relative_path))
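Splitting on the root string works here, but it could misbehave if the root string happened to occur more than once in the full path. As a hedged alternative, the standard library can compute the relative path directly. In the sketch below, `relative_to_root` is a hypothetical helper, the paths are made-up placeholders, and POSIX-style separators are assumed; the leading "/" is added by hand to match the output of the split.

```python
import posixpath


def relative_to_root(full_path, root):
    """Return `full_path` relative to `root`, with a leading '/'."""
    return "/" + posixpath.relpath(full_path, root)


# Placeholder paths standing in for layout.root and tractography_file.path.
root = "/home/user/AFQ_data/stanford_hardi"
full_path = (root + "/derivatives/my_tractography/sub-01/ses-01/dwi/"
             "sub-01_ses-01-dwi_tractography.trk")

relative_path = relative_to_root(full_path, root)
print(relative_path)

# For these paths, the string split gives the same result.
print(full_path.split(root)[-1] == relative_path)
```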

Next, we specify the information we need to define the bundles that we are interested in segmenting. In this case, we are going to use a list of bundle names for the bundle info. These names refer to bundles for which we already have clear definitions of the information needed to segment them (e.g., waypoint ROIs and probability maps). For an example that includes custom definition of bundle info, see the plot_callosal_tract_profile example.

bundle_info = abd.default_bd()[
    "Left Inferior Longitudinal",
    "Right Inferior Longitudinal",
    "Left Arcuate",
    "Right Arcuate",
    "Left Corticospinal",
    "Right Corticospinal"]

Now, we can define our GroupAFQ object, pointing to the derivatives of the 'my_tractography' pipeline as inputs. This is done by setting the import_tract keyword argument. We pass the bundle_info defined above. We also point to the pipeline that holds the preprocessed data. Note that the pipeline name is not necessarily the name of the folder it is in; the pipeline name is defined in each pipeline's dataset_description.json. These data were preprocessed with 'vistasoft', so this is the pipeline we'll point to. If we were using 'qsiprep', this is where we would pass that string instead. If we did that, AFQ would look for a derivatives folder called 'stanford_hardi/derivatives/qsiprep' and find the preprocessed DWI data within it.

Finally, to speed things up a bit, we also sub-sample the provided tractography. This is done by defining the segmentation_params dictionary input. To sub-sample to 10,000 streamlines, we define 'nb_streamlines' = 10000.

my_afq = GroupAFQ(
    bids_path,
    dwi_preproc_pipeline='vistasoft',
    t1_preproc_pipeline='freesurfer',
    bundle_info=bundle_info,
    import_tract={
        "suffix": "tractography",
        "scope": "my_tractography"
    },
    pve=pve,
    segmentation_params={'nb_streamlines': 10000})
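Conceptually, sub-sampling keeps a random subset of the streamlines so that later steps have less data to process. The sketch below illustrates that idea with the standard library only; it is an assumption about the general technique, not pyAFQ's actual implementation, and `subsample` and the stand-in data are hypothetical.

```python
import random


def subsample(streamlines, nb_streamlines, seed=0):
    """Return at most `nb_streamlines` items drawn without replacement."""
    items = list(streamlines)
    if len(items) <= nb_streamlines:
        return items
    # A seeded generator makes the selection reproducible.
    rng = random.Random(seed)
    return rng.sample(items, nb_streamlines)


# Stand-in data: integers play the role of streamlines here.
fake_streamlines = range(50000)
subset = subsample(fake_streamlines, 10000)
print(len(subset))
```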

Finally, to run the segmentation and extract tract profiles, we call the export_all method. This creates all of the derivative outputs of AFQ within the 'stanford_hardi/derivatives/afq' folder.

my_afq.export_all()

A few common issues that can hinder BIDS from working properly are:

  1. A faulty dataset_description.json file. You need to make sure that the file contains the right names for the pipeline. See above for an example of that.

  2. A file naming convention that doesn't uniquely identify a file with BIDS filters.
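Both issues can be checked with a few lines of standard-library code before running pyAFQ. The sketch below is illustrative only: `pipeline_names` and `matching_files` are hypothetical helpers, the file names are placeholders, and the fields that pyAFQ and pyBIDS actually consult may differ from the two read here (PipelineDescription and GeneratedBy, the ones written in the to_bids_description call in this example).

```python
def pipeline_names(description):
    """Collect pipeline names declared in a dataset_description dictionary."""
    names = set()
    if "PipelineDescription" in description:
        names.add(description["PipelineDescription"].get("Name"))
    for entry in description.get("GeneratedBy", []):
        names.add(entry.get("Name"))
    names.discard(None)  # ignore entries that have no "Name" field
    return names


def matching_files(filenames, suffix, extension):
    """Return the file names that end with `_<suffix><extension>`."""
    return [f for f in filenames if f.endswith(f"_{suffix}{extension}")]


description = {"Name": "my_tractography",
               "PipelineDescription": {"Name": "my_tractography"},
               "GeneratedBy": [{"Name": "my_tractography"}]}

# Issue 1: the expected pipeline name should appear in the description.
name_found = "my_tractography" in pipeline_names(description)
print(name_found)

# Issue 2: a filter should select exactly one file per subject and session.
files = ["sub-01_ses-01_dwi.nii.gz",
         "sub-01_ses-01_desc-b0_dwi.nii.gz"]  # hypothetical file names
matches = matching_files(files, "dwi", ".nii.gz")
print(len(matches))  # more than one match means the filter is ambiguous
```

When a filter matches more than one file, adding another key-value pair to the filter (for example, a desc value) is one way to make the selection unique.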

The outputs of AFQ are also BIDS-compatible. Here we demonstrate how to load the AFQ entities and show all files with the key-value pair desc-bundles:

layout = BIDSLayout(bids_path)
layout.add_derivatives(
    f'{bids_path}/derivatives/afq',
    config=['bids', 'derivatives'])
print(layout.get(desc="bundles", return_type="filename"))