Developing Interactive Parallel Workflows in Python using Parsl
Presenter
Kyle Chard
Fellow and Senior Researcher
University of Chicago and Argonne National Laboratory
Bio
Kyle Chard is a Senior Researcher and Fellow in the Computation
Institute at the University of Chicago and Argonne National Laboratory. He
received his Ph.D. in Computer Science from Victoria University of
Wellington in 2011. His research focuses on developing and applying
computational and data-intensive approaches to solve scientific problems.
He leads the development of Parsl, a parallel scripting library for
implementing scalable data-oriented workflows in Python. He is a member of
the Globus leadership team where he co-leads the Globus Labs research group.
He also co-leads projects related to scientific reproducibility, elastic and
cost-aware use of cloud infrastructure, and research automation.
Abstract
Python is quickly becoming the predominant programming language used
in research. However, it is often challenging to execute Python applications
at scale and to develop workflows that integrate a variety of independent
Python functions and external applications. Computations that are simple to
perform at small scales (e.g., on a laptop) can easily become prohibitively
difficult as data sizes and analysis complexity grow, requiring complex
orchestration and management of applications and data as well as
customization for specific execution environments. In this webinar we will
present Parsl (Parallel Scripting Library), a Python library for programming
and executing data-oriented workflows at scale. Parsl is designed to be
simple and intuitive: developers annotate Python functions with Parsl
directives (to wrap either Python functions or external applications); Parsl
then manages the execution of the script, determines dependencies between
functions, orchestrates data movement, and executes functions concurrently
when dependencies are met. Parsl separates code from configuration,
allowing the same script to be seamlessly executed on laptops, clusters,
clouds, grids, and supercomputers.
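The dataflow idea described above (annotated functions return immediately,
and a function runs once the results it depends on are available) can be
illustrated with a minimal sketch that uses only the Python standard library.
This is not Parsl's API or implementation: the `app` decorator, the shared
thread pool, and the `square`, `add`, and `sum_of_squares` functions are
hypothetical names chosen for illustration only.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Assumption: a small local thread pool stands in for a real execution backend.
_pool = ThreadPoolExecutor(max_workers=4)


def app(func):
    """Hypothetical decorator: calling the wrapped function returns a Future."""
    @wraps(func)
    def submit(*args):
        def run():
            # Wait for any Future arguments, then call the original function
            # with their results. Passing a Future expresses a dependency.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)
        return _pool.submit(run)
    return submit


@app
def square(x):
    return x * x


@app
def add(a, b):
    return a + b


def sum_of_squares(x, y):
    # The two square() calls are independent and may run concurrently;
    # add() runs only after both of their results are available.
    return add(square(x), square(y)).result()


if __name__ == "__main__":
    print(sum_of_squares(3, 4))
```

In this sketch the script never states the order of execution explicitly; it
follows from which futures are passed to which functions.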
We will introduce Parsl and demonstrate how it can be used
to write and execute data-oriented workflows on Blue Waters. We will show
how Parsl can be used within a Jupyter notebook to develop scalable parallel
workflows and how these workflows can be executed on arbitrary resources
with simple configurations. Finally, we will demonstrate how Parsl can
automatically stage data using Globus to transparently analyze remotely
accessible data.
The webinar is intended for researchers and developers who are interested in
interactive and parallel computing, and particularly those with an interest
in developing workflows in Python to run on Blue Waters.
Attendees can follow the webinar in a Jupyter notebook or Python script on
Blue Waters. A guide to setting up Jupyter notebooks on Blue Waters is
available on the Blue Waters website:
https://bluewaters-archive.ncsa.illinois.edu/pythonnotebooks.
We will also provide a hosted Jupyter environment for those who wish to try
Parsl without installing any dependencies locally.
Session details
When: 10:00 CST, October 10, 2018
Length of session: 1 hour
Target audience: Researchers, developers, and scientific teams.
Prerequisites: None.
Training and reference materials: