{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# AFQ with HCP data\nThis example demonstrates how to use the AFQ API to analyze HCP data.\nFor this example to run properly, you will need to gain access to the HCP data.\nThis can be done by following the instructions on the webpage\n[here](https://wiki.humanconnectome.org/display/PublicData/How+To+Connect+to+Connectome+Data+via+AWS).\nWe will use the ``Cloudknot`` library to run our AFQ analysis in the AWS\nBatch service (see also\n[this example](http://tractometry.org/pyAFQ/auto_examples/cloudknot_example.html)).\nIn the following, we will use ``Cloudknot`` to run multiple\nconfigurations of pyAFQ on the HCP dataset. Specifically, here we will run\npyAFQ with different tractography seeding strategies.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import cloudknot and set the correct region. The HCP data is stored in `us-east-1`, so it's best\nto analyze it there.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import configparser\nimport itertools\nimport cloudknot as ck\nimport os.path as op\n\nck.set_region('us-east-1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define a function to run. 
This function allows us to pass in the subject ID for the subjects we would\nlike to analyze, as well as strategies for seeding tractography (different masks and/or different\nnumbers of seeds per voxel).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def afq_process_subject(subject, seed_mask, n_seeds,\n                        aws_access_key, aws_secret_key):\n    # define a function that each job will run\n    # In this case, each process does a single subject\n    import logging\n    import s3fs\n    # all imports must be at the top of the function\n    # cloudknot installs the appropriate packages from pip\n    from AFQ.data.fetch import fetch_hcp\n    from AFQ.api.group import GroupAFQ\n    import AFQ.definitions.image as afm\n\n    # set logging level to your choice\n    logging.basicConfig(level=logging.INFO)\n    log = logging.getLogger(__name__)\n\n    # Download the given subject to the AWS Batch machine from s3\n    _, hcp_bids = fetch_hcp(\n        [subject],\n        profile_name=False,\n        study=\"HCP_1200\",\n        aws_access_key_id=aws_access_key,\n        aws_secret_access_key=aws_secret_key)\n\n    # We make a new seed mask for each process based off of the\n    # seed_mask argument, which is a string.\n    # This is to avoid any complications with pickling the masks.\n    if seed_mask == \"roi\":\n        seed_mask_obj = afm.RoiImage()\n    elif seed_mask == \"fa\":\n        seed_mask_obj = afm.ScalarImage(\"dti_fa\")\n    else:\n        seed_mask_obj = afm.FullImage()\n\n    # Determine whether n_seeds is per voxel or not\n    random_seeds = n_seeds > 3\n\n    # set the tracking_params based off our inputs\n    tracking_params = {\n        \"seed_mask\": seed_mask_obj,\n        \"n_seeds\": n_seeds,\n        \"random_seeds\": random_seeds}\n\n    # define the api GroupAFQ object\n    myafq = GroupAFQ(\n        hcp_bids,\n        tracking_params=tracking_params)\n\n    # export_all runs the entire pipeline and creates many useful derivatives\n    myafq.export_all()\n\n    # upload the results to some location on s3\n    myafq.upload_to_s3(\n        s3fs.S3FileSystem(),\n        (f\"my_study_bucket/my_study_prefix_{seed_mask}_{n_seeds}\"\n         f\"/derivatives/afq\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we will process the data from the following subjects:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "subjects = [\"103818\", \"105923\", \"111312\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will test combinations of different conditions:\nsubjects, seed masks, and numbers of seeds.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "seed_mask = [\"fa\", \"roi\"]\nn_seeds = [1, 2, 1000000, 2000000]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following line creates all the combinations of the above lists, such that every subject is\nrun with every mask and every number of seeds.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "args = list(itertools.product(subjects, seed_mask, n_seeds))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We assume that the credentials for HCP usage are stored in the home directory in a\n`~/.aws/credentials` file. This is where these credentials are stored if the AWS CLI is used to\nconfigure the profile. 
We use the standard-library ``configparser`` module\nto get the relevant HCP keys from there.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "CP = configparser.ConfigParser()\n# use a context manager so that the credentials file is closed after reading\nwith open(op.join(op.expanduser('~'), '.aws', 'credentials')) as f:\n    CP.read_file(f)\nCP.sections()\naws_access_key = CP.get('hcp', 'AWS_ACCESS_KEY_ID')\naws_secret_key = CP.get('hcp', 'AWS_SECRET_ACCESS_KEY')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function attaches your AWS keys to each list in a list of lists.\nEach inner list is a list of arguments, and we append the AWS keys to it\nso that they can be passed into the function that runs on AWS Batch,\nwhere they are used to download the data onto the\nAWS Batch machines.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def attach_keys(list_of_arg_lists):\n    new_list_of_arg_lists = []\n    for args in list_of_arg_lists:\n        arg_ls = list(args)\n        arg_ls.extend([aws_access_key, aws_secret_key])\n        new_list_of_arg_lists.append(arg_ls)\n    return new_list_of_arg_lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This calls the function to attach the access keys to the argument list.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "args = attach_keys(args)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the :meth:`Knot` object to run your jobs on. 
See\n[this example](http://tractometry.org/pyAFQ/auto_examples/cloudknot_example.html) for more\ndetails about the arguments to the object.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "knot = ck.Knot(\n    name='afq-hcp-tractography-201110-0',\n    func=afq_process_subject,\n    base_image='python:3.11',\n    image_github_installs=\"https://github.com/tractometry/pyAFQ.git\",\n    pars_policies=('AmazonS3FullAccess',),\n    bid_percentage=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This launches a process for each combination.\nBecause `starmap` is `True`, each list in `args` will be unpacked\nand passed into `afq_process_subject` as arguments.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "result_futures = knot.map(args, starmap=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function can be called repeatedly in a Jupyter notebook\nto view the progress of jobs::\n\n    knot.view_jobs()\n\nYou can also view the status of a specific job::\n\n    knot.jobs[0].status\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When all jobs are finished, remember to clobber the knot to destroy all the resources that were\ncreated in AWS.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "result_futures.result()  # waits for futures to resolve, not needed in notebook\nknot.clobber(clobber_pars=True, clobber_repo=True, clobber_image=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We continue processing by creating another knot, which takes the resulting profiles of each\ncombination and combines them all into one CSV file.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def afq_combine_profiles(seed_mask, n_seeds):\n    from AFQ.api import download_and_combine_afq_profiles\n    download_and_combine_afq_profiles(\n        \"my_study_bucket\", f\"my_study_prefix_{seed_mask}_{n_seeds}\")\n\n\nknot2 = ck.Knot(\n    name='afq_combine_subjects-201110-0',\n    func=afq_combine_profiles,\n    base_image='python:3.11',\n    image_github_installs=\"https://github.com/tractometry/pyAFQ.git\",\n    pars_policies=('AmazonS3FullAccess',),\n    bid_percentage=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The arguments to this call to :meth:`map` are all the different configurations of pyAFQ that we ran.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "seed_mask = [\"fa\", \"roi\"]\nn_seeds = [1, 2, 1000000, 2000000]\nargs = list(itertools.product(seed_mask, n_seeds))\n\nresult_futures2 = knot2.map(args, starmap=True)\nresult_futures2.result()\nknot2.clobber(clobber_pars=True, clobber_repo=True, clobber_image=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.13" } }, "nbformat": 4, "nbformat_minor": 0 }