.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/demo_afq_dataset.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_demo_afq_dataset.py: ===================================== Load and interact with an AFQ dataset ===================================== This example loads AFQ data from CSV files and manipulates that data using scikit-learn transformers and estimators. First we fetch the Weston-Havens dataset described in Yeatman et al [1]_. This dataset contains tractometry features from 77 subjects ages 6-50. Next, we split the dataset into a train and test split, impute missing values, and fit a LASSO model, all using :class:`AFQDataset` methods. Predictive performance for the default LASSO model is abysmal. It is only used here to demonstrate the use of scikit-learn estimators. In a research setting, one might use more advanced estimators, such as the SGL [2]_, a gradient boosting machine, or a neural network. Finally, we convert the AFQDataset to a tensorflow dataset and fit a basic one-dimensional CNN to predict age from the features. This last step requires that AFQ-Insight has been installed with:: pip install afqinsight[tf] or that tensorflow has been separately installed with:: pip install tensorflow .. [1] Jason D. Yeatman, Brian A. Wandell, & Aviv A. Mezer, "Lifespan maturation and degeneration of human brain white matter" Nature Communications, vol. 5:1, pp. 4932, 2014 DOI: 10.1038/ncomms5932 .. [2] Adam Richie-Halford, Jason Yeatman, Noah Simon, and Ariel Rokem "Multidimensional analysis and detection of informative features in human brain white matter" PLOS Computational Biology, 2021 DOI: 10.1371/journal.pcbi.1009136 .. GENERATED FROM PYTHON SOURCE LINES 38-51 .. code-block:: Python import os.path as op import tensorflow as tf from sklearn.impute import SimpleImputer from sklearn.linear_model import Lasso from sklearn.metrics import r2_score from sklearn.model_selection import train_test_split import afqinsight.nn.tf_models as nn from afqinsight import AFQDataset from afqinsight.datasets import download_weston_havens .. GENERATED FROM PYTHON SOURCE LINES 52-64 Fetch example data ------------------ The :func:`download_weston_havens` function download the data used in this example and places it in the `~/.cache/afq-insight/weston_havens` directory. If the directory does not exist, it is created. The data follows the format expected by the :func:`load_afq_data` function: a file called `nodes.csv` that contains AFQ tract profiles and a file called `subjects.csv` that contains information about the subjects. The two files are linked through the `subjectID` column that should exist in both of them. For more information about this format, see also the `AFQ-Browser documentation `_ (items 2 and 3). .. GENERATED FROM PYTHON SOURCE LINES 64-67 .. code-block:: Python workdir = download_weston_havens() .. GENERATED FROM PYTHON SOURCE LINES 68-74 Read in the data ---------------- Next, we read in the data. The :func:`AFQDataset.from_files` static method expects a the filenames of a nodes.csv and subjects.csv file, and returns a dataset object. .. GENERATED FROM PYTHON SOURCE LINES 74-82 .. code-block:: Python dataset = AFQDataset.from_files( fn_nodes=op.join(workdir, "nodes.csv"), fn_subjects=op.join(workdir, "subjects.csv"), dwi_metrics=["md", "fa"], target_cols=["Age"], ) .. GENERATED FROM PYTHON SOURCE LINES 83-88 Train / test split ------------------ We can use the dataset in the :func:`train_test_split` function just as we would with an array. .. GENERATED FROM PYTHON SOURCE LINES 88-91 .. code-block:: Python dataset_train, dataset_test = train_test_split(dataset, test_size=1 / 3) .. GENERATED FROM PYTHON SOURCE LINES 92-97 Impute missing values --------------------- Next we train an imputer on the training set and use it to transform the features in both the training and the test set. .. GENERATED FROM PYTHON SOURCE LINES 97-102 .. code-block:: Python imputer = dataset_train.model_fit(SimpleImputer(strategy="median")) dataset_train = dataset_train.model_transform(imputer) dataset_test = dataset_test.model_transform(imputer) .. GENERATED FROM PYTHON SOURCE LINES 103-108 Fit a LASSO model ----------------- Next we fit a LASSO estimator to the training data and print the score of that model on the test dataset. .. GENERATED FROM PYTHON SOURCE LINES 108-116 .. code-block:: Python estimator = dataset_train.model_fit(Lasso()) y_pred = dataset_test.model_predict(estimator) train_score = dataset_train.model_score(estimator) test_score = dataset_test.model_score(estimator) print("LASSO train score:", train_score) print("LASSO test score: ", test_score) .. GENERATED FROM PYTHON SOURCE LINES 117-133 Convert to tensorflow datasets ------------------------------ Next we convert the train and test datasets to tensorflow datasets and use one of AFQ-Insight's built-in one-dimensional CNNs to predict age. This part of the example will only work if you have either installed AFQ-Insight with tensorflow using:: pip install afqinsight[tf] or separately install tensorflow using:: pip install tensorflow This model also performs poorly. It turns out predicting age in this dataset requires a bit more work. .. GENERATED FROM PYTHON SOURCE LINES 133-156 .. code-block:: Python tfset_train = dataset_train.as_tensorflow_dataset() tfset_test = dataset_test.as_tensorflow_dataset() batch_size = 2 tfset_train = tfset_train.batch(8) tfset_test = tfset_test.batch(8) print("CNN Architecture") model = nn.cnn_lenet( input_shape=(100, 40), output_activation=None, n_classes=1, verbose=True ) model.compile( loss="mean_squared_error", optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), metrics=["mean_squared_error"], ) model.fit(tfset_train, epochs=500, validation_data=tfset_test, verbose=0) print() print("CNN R^2 score: ", r2_score(dataset_test.y, model.predict(tfset_test))) .. _sphx_glr_download_auto_examples_demo_afq_dataset.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: demo_afq_dataset.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: demo_afq_dataset.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: demo_afq_dataset.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_