.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "howto/howto_examples/cloudknot_example.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_howto_howto_examples_cloudknot_example.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_howto_howto_examples_cloudknot_example.py:

==========================================
Using cloudknot to run pyAFQ on AWS batch:
==========================================

One of the purposes of ``pyAFQ`` is to analyze large-scale openly-available
datasets, such as those in the `Human Connectome Project `_. To analyze these
datasets, large amounts of compute are needed. One way to gain access to
massive computational power is by using cloud computing. Here, we will
demonstrate how to use ``pyAFQ`` in the Amazon Web Services cloud. We will
rely on the `AWS Batch Service `_, and we will submit work into AWS Batch
using software that our group developed called `Cloudknot `_.

.. GENERATED FROM PYTHON SOURCE LINES 20-25

Import cloudknot and set the AWS region within which computations will take
place. Setting a region is important, because if the data that you are
analyzing is stored in `AWS S3 `_ in a particular region, it is best to run
the computation in that region as well. That is because AWS charges for
inter-region transfer of data.

.. GENERATED FROM PYTHON SOURCE LINES 25-28

.. code-block:: Python

    import cloudknot as ck
    ck.set_region('us-east-1')
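``Cloudknot`` picks up your AWS credentials the same way other AWS tools do
(for example, from ``~/.aws/credentials`` or environment variables). Before
launching anything costly, it can be worth confirming which region
``Cloudknot`` will use. The sketch below is a minimal check; it assumes that
``ck.get_region``, the read counterpart of ``ck.set_region``, is available
in your installed ``Cloudknot`` version.

.. code-block:: Python

    # Confirm the region that Cloudknot will use for AWS Batch resources.
    # This assumes ck.get_region is available in your Cloudknot version;
    # it should return the value passed to ck.set_region above.
    print(ck.get_region())  # expected: 'us-east-1'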
.. GENERATED FROM PYTHON SOURCE LINES 29-39

Define the function to use
--------------------------

``Cloudknot`` uses the single program multiple data paradigm of computing.
This means that the same function will be run on multiple different inputs.
For example, the same ``pyAFQ`` processing function can be run on multiple
different subjects in a dataset. Below, we define the function that we will
use. Notice that ``Cloudknot`` functions include the import statements of
the dependencies used. This is necessary so that ``Cloudknot`` knows what
dependencies to install into AWS Batch to run this function.

.. GENERATED FROM PYTHON SOURCE LINES 39-90

.. code-block:: Python

    def afq_process_subject(subject):
        # define a function that each job will run
        # In this case, each process does a single subject
        import s3fs
        # all imports must be at the top of the function
        # cloudknot installs the appropriate packages from pip
        from s3bids.utils import S3BIDSStudy
        from AFQ.api.group import GroupAFQ
        import AFQ.definitions.image as afm

        # Download the given subject to your local machine from s3.
        # Subjects are easier to find if they are specified in a
        # BIDS participants.tsv file, even if it is sparse.
        study_ixi = S3BIDSStudy(
            "my_study",
            "my_study_bucket",
            "my_study_prefix",
            subjects=[subject],
            use_participants_tsv=True,
            anon=False)
        study_ixi.download(
            "local_bids_dir",
            include_derivs=["pipeline_name"])

        # You can optionally provide your own segmentation file.
        # In this case, we look for a file with suffix 'seg'
        # in the 'pipeline_name' pipeline, and we consider all
        # non-zero labels to be a part of the brain.
        brain_mask_definition = afm.LabelledImageFile(
            suffix='seg',
            filters={'scope': 'pipeline_name'},
            exclusive_labels=[0])

        # define the api AFQ object
        myafq = GroupAFQ(
            "local_bids_dir",
            preproc_pipeline="pipeline_name",
            brain_mask_definition=brain_mask_definition,
            viz_backend_spec='plotly',  # this will generate both interactive html and GIFs  # noqa
            scalars=["dki_fa", "dki_md"])

        # export_all runs the entire pipeline and creates many useful derivatives
        myafq.export_all()

        # upload the results to some location on s3
        myafq.upload_to_s3(
            s3fs.S3FileSystem(),
            "my_study_bucket/my_study_prefix/derivatives/afq")

.. GENERATED FROM PYTHON SOURCE LINES 91-95

Here we provide a list of subjects that we have selected to process. To
randomly select 3 subjects without replacement, instead use
``subjects = [[1], [2], [3]]``; see the docstring of
``S3BIDSStudy.__init__`` for more information.

.. GENERATED FROM PYTHON SOURCE LINES 95-97

.. code-block:: Python

    subjects = ["123456", "123457", "123458"]

.. GENERATED FROM PYTHON SOURCE LINES 98-114

Defining a ``Knot`` instance
---------------------------------

We instantiate an instance of the :class:`ck.Knot` class. This object will
be used to run your jobs. The object is instantiated with the
`'AmazonS3FullAccess'` policy, so that it can write the results out to S3,
into a bucket that you have write permissions on. Setting the
`bid_percentage` keyword argument makes AWS Batch use
`spot EC2 instances `_ for the computation. This can result in substantial
cost savings, as spot compute instances can cost much less than on-demand
instances. However, note that spot instances can also be evicted, so if
completing all of the work is very time-sensitive, do not set this keyword
argument. Using the `image_github_installs` keyword argument will install
pyAFQ from GitHub. You can also specify other forks and branches to install
from.

.. GENERATED FROM PYTHON SOURCE LINES 114-122

.. code-block:: Python

    knot = ck.Knot(
        name='afq-process-subject-201009-0',
        func=afq_process_subject,
        base_image='python:3.8',
        image_github_installs="https://github.com/tractometry/pyAFQ.git",
        pars_policies=('AmazonS3FullAccess',),
        bid_percentage=100)
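Depending on the dataset, each ``pyAFQ`` job can need a substantial amount
of memory and, when running on spot instances, may occasionally be evicted
and need to be retried. The sketch below shows how you might request more
resources per job. It is a hedged example, not part of the pipeline above:
the ``memory`` (in MiB) and ``retries`` keyword arguments follow our reading
of the ``Cloudknot`` :class:`Knot` API, so verify them against the
documentation of your installed version before relying on them.

.. code-block:: Python

    # A hedged sketch: an alternative to the instantiation above (use one
    # or the other), requesting more memory per job and allowing evicted
    # spot jobs to be retried. The `memory` and `retries` keyword names
    # are assumptions to check against your Cloudknot version.
    knot = ck.Knot(
        name='afq-process-subject-201009-0',
        func=afq_process_subject,
        base_image='python:3.8',
        image_github_installs="https://github.com/tractometry/pyAFQ.git",
        pars_policies=('AmazonS3FullAccess',),
        bid_percentage=100,
        memory=32000,   # MiB of RAM per job (assumed parameter name)
        retries=3)      # retry evicted spot jobs (assumed parameter name)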
.. GENERATED FROM PYTHON SOURCE LINES 123-128

Launching the computation
--------------------------------

The :meth:`map` method of the :class:`Knot` object maps each of the inputs
provided as a sequence onto the function and executes the function on each
one of them in parallel.

.. GENERATED FROM PYTHON SOURCE LINES 128-130

.. code-block:: Python

    result_futures = knot.map(subjects)

.. GENERATED FROM PYTHON SOURCE LINES 131-139

Once computations have started, you can call the following function to view
the progress of jobs::

    knot.view_jobs()

You can also view the status of a specific job::

    knot.jobs[0].status

.. GENERATED FROM PYTHON SOURCE LINES 142-144

When all jobs are finished, remember to use the :meth:`clobber` method to
destroy all of the AWS resources created by the :class:`Knot`.

.. GENERATED FROM PYTHON SOURCE LINES 144-147

.. code-block:: Python

    result_futures.result()
    knot.clobber(clobber_pars=True, clobber_repo=True, clobber_image=True)

.. GENERATED FROM PYTHON SOURCE LINES 148-150

In a second :class:`Knot` object, we use a function that takes the resulting
profiles of each subject and combines them into one CSV file.

.. GENERATED FROM PYTHON SOURCE LINES 150-166

.. code-block:: Python

    def afq_combine_profiles(dummy_argument):
        from AFQ.api import download_and_combine_afq_profiles
        download_and_combine_afq_profiles(
            "my_study_bucket", "my_study_prefix")


    knot2 = ck.Knot(
        name='afq_combine_subjects-201009-0',
        func=afq_combine_profiles,
        base_image='python:3.8',
        image_github_installs="https://github.com/tractometry/pyAFQ.git",
        pars_policies=('AmazonS3FullAccess',),
        bid_percentage=100)

.. GENERATED FROM PYTHON SOURCE LINES 167-171

This knot is called with a dummy argument, which is not used within the
function itself. The `job_type` keyword argument is used to signal to
``Cloudknot`` that only one job is submitted, rather than the default array
of jobs.

.. GENERATED FROM PYTHON SOURCE LINES 171-174

.. code-block:: Python

    result_futures2 = knot2.map(["dummy_argument"], job_type="independent")
    result_futures2.result()
    knot2.clobber(clobber_pars=True, clobber_repo=True, clobber_image=True)


**Estimated memory usage:** 0 MB


.. _sphx_glr_download_howto_howto_examples_cloudknot_example.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: cloudknot_example.ipynb <cloudknot_example.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: cloudknot_example.py <cloudknot_example.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: cloudknot_example.zip <cloudknot_example.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_