Python's traction in scientific computing used to be ascribed to its effectiveness as a "glue language" (easy interoperability with other languages like C/C++ and Fortran), its rich set of scientific libraries (NumPy, SciPy, SymPy, Matplotlib), and its famously shallow learning curve. Its proclaimed downside was performance: being an interpreted, dynamically typed language meant execution would be much slower than low-level implementations in compiled languages. In the past decade, solutions to this performance penalty have multiplied. At first, researchers would identify performance bottlenecks in their Python applications, rewrite the relevant portions of code in C/C++, and use SWIG to wrap this code and interface it with the main program in Python \cite{Beazley_2003}. Later, Cython became available for accelerating numerical loops that cannot be expressed as NumPy operations: it compiles Python code extended with type declarations, generating code that can take advantage of the optimizations provided by the C compiler and achieve high performance on the hotspots that dominate runtime. Parallel distributed computing with Python programs using message passing became available with the mpi4py package, which supports communication of Python data types and definition of communicator objects according to the MPI specification. A related package, petsc4py, provides access to the algorithms and data structures of the PETSc library (https://www.mcs.anl.gov/petsc/). It allows assembling distributed vectors and sparse matrices, solving linear systems of equations with Krylov iterative methods, and solving nonlinear equations with Newton methods, all core needs in many scientific applications, such as those using finite element methods \cite{Dalcin_2011}. Access to many-core hardware from Python programs was made possible with run-time code generation via PyCUDA and PyOpenCL \cite{Kl_ckner_2012}. These tools pioneered the pursuit of high-performance computing with Python, at the cost of increasingly specialized programmer effort. In the last few years, however, a new wave of innovation in scientific Python has sprung from the widespread use of Python in industry settings. Fortunately, much of this innovation has occurred under the open-source model of development and licensing that pervades the Python ecosystem.
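As a minimal sketch of this message-passing style, the snippet below uses the standard mpi4py interface (MPI.COMM_WORLD, Get_rank, Send, Recv) to pass a NumPy array between two processes; the array contents and tag value are purely illustrative.

\begin{verbatim}
# Minimal mpi4py sketch: pass a NumPy buffer from rank 0 to rank 1.
# Run with, e.g.:  mpiexec -n 2 python send_recv.py
from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD            # default communicator object
rank = comm.Get_rank()

if rank == 0:
    data = numpy.arange(100, dtype='d')
    comm.Send(data, dest=1, tag=13)      # fast path for buffer-like objects
elif rank == 1:
    data = numpy.empty(100, dtype='d')
    comm.Recv(data, source=0, tag=13)
    print("rank 1 received", data.sum())
\end{verbatim}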
New tools to defeat the performance penalty include Numba (https://numba.pydata.org), which accelerates Python code by just-in-time compilation of functions to optimized machine code. The programmer only needs to add a decorator, e.g., @numba.jit(nopython=True), ahead of the function definition; Numba compiles the function at runtime, and subsequent calls run without involving the Python interpreter. Numba can also compile a subset of Python code to CUDA kernels for execution on Nvidia GPUs (possibly the easiest way to exploit GPUs for high-throughput computations). And the new parallel computing library for Python, Dask (https://dask.org), offers distributed data structures that stand in for NumPy arrays and pandas dataframes, scalable machine learning that integrates with scikit-learn, and high-level tools for scheduling and distributing tasks on a cluster. This allows transitioning to parallel and distributed execution with very little code rewriting. Numba and Dask, like NumPy, Matplotlib, SciPy, SymPy, pandas, and Jupyter, are fiscally sponsored projects of NumFOCUS (https://numfocus.org), a 501(c)(3) public charity in the United States (I served on its Board of Directors from 2014 to 2020). This means that they are community-governed projects developing software under a standard public license, and they both raise funds for their development and receive services (e.g., financial administration, legal support) via a non-profit. The core developers, maintainers, and users of these projects come together at conferences where they give technical talks and offer tutorials, participate in online conversations on discussion boards and code-repository issue trackers, and tenaciously build value together.
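For readers who have not seen these tools, a minimal sketch of both programming styles follows; the @jit decorator and the dask.array interface are the libraries' documented entry points, while the toy computations themselves are made up for illustration.

\begin{verbatim}
# Numba: JIT-compile a plain-Python numerical loop to machine code.
import numpy as np
from numba import jit

@jit(nopython=True)          # compiled on first call, bypassing the interpreter
def sum_of_squares(x):
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total

print(sum_of_squares(np.random.rand(10_000_000)))

# Dask: a chunked, lazily evaluated array that stands in for a NumPy array.
import dask.array as da

x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
print(x.mean().compute())    # the task graph is executed across cores/workers
\end{verbatim}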
Technology companies often participate actively and benevolently in this activity. NumFOCUS receives corporate sponsorships that benefit the projects, certainly, but another impactful way companies contribute is by allowing or assigning their paid employees to work on the open-source projects of this large ecosystem. Numba and Dask were started at Anaconda, Inc., an Austin-based software and consulting company that also created the hugely popular Anaconda Python distribution. The explosion of machine learning and AI saw the tech giants developing and releasing open-source Python libraries for these applications, the most notable being Google's TensorFlow (https://www.tensorflow.org) and Facebook's PyTorch (https://pytorch.org). Both libraries provide a Python interface while executing core operations in compiled languages, using CUDA for access to Nvidia GPUs. Google also developed a Jupyter-based cloud notebook, Colab (http://colab.research.google.com/), providing a hosted solution to run TensorFlow with access to GPUs and Google's own TPUs (Tensor Processing Units). The Japanese company Preferred Networks led the development of CuPy, a NumPy-like open-source library of matrix functions for Nvidia GPUs. And Nvidia embraced the PyData ecosystem, creating its RAPIDS AI team (https://rapids.ai) to develop open-source libraries like cuDF, with a pandas-like API for manipulating dataframes on GPUs, and the machine learning library cuML, with a growing set of algorithms from scikit-learn. The data science community can now build high-performance workflows in Python and Jupyter, taking advantage of the latest hardware on cloud resources.
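As a rough sketch of what this looks like in practice (assuming an Nvidia GPU and the CuPy and cuDF packages are installed), the snippet below mirrors familiar NumPy and pandas idioms; the arrays and dataframe are toy examples.

\begin{verbatim}
# CuPy: NumPy-style array operations executed on an Nvidia GPU.
import cupy as cp

a = cp.random.random((4_000, 4_000))   # allocated in GPU memory
n = cp.linalg.norm(a @ a.T)            # product and norm run as CUDA kernels
print(cp.asnumpy(n))                   # copy the scalar result back to the host

# cuDF: pandas-style dataframe manipulation on the GPU.
import cudf

df = cudf.DataFrame({"group": [1, 1, 2], "value": [0.1, 0.2, 0.3]})
print(df.groupby("group").mean())
\end{verbatim}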
Python and Jupyter are also playing an increasingly important role in high-performance computing. Rollin Thomas and Shreyas Cholia, in the previous issue of CiSE, explained how the National Energy Research Scientific Computing Center (NERSC) began its voyage to Jupyter five years ago. They tell us that today about 25% of user interactions with the Cori supercomputer are via JupyterHub, and several scientific workflows have been made more user-friendly while also enjoying parallel speed-ups with Dask. They conclude: "Jupyter is quickly becoming the entry point to HPC for a growing class of users" \cite{Thomas_2021}. In the next issue, CiSE will feature several scientific applications that embody the powerful idea of combining high researcher productivity via Python with high performance through code generation, just-in-time compilation, or advanced Python libraries. John Rice's vision of a scientific problem-solving environment for the 21st century may finally be realized, as long as we continue to engage with and support the thriving communities of practice in the Python/Jupyter ecosystem. Don't miss our next issue!