Portable GPU Acceleration with Standard Parallelism

November 13, 2025

This tutorial teaches you how to accelerate portable HPC applications on CPUs and GPUs using the parallelism and concurrency features of the modern ISO C++ and ISO Fortran standards. You'll find the following content:

Notebooks containing lessons and exercises, intended for self-paced or instructor-led learning, which can be run on NVIDIA Brev or Google Colab.
Slides containing the lecture content for the lessons.
Docker Images and Docker Compose files for creating Brev Launchables or running locally.

Brev Launchables of this tutorial should use:

4xL4, 2xL4, 2xL40S, or 1xL40S instances.
GCP, AWS, or any other provider with Flexible Ports and a Linux kernel version of 6.1.24+, 6.2.11+, or 6.3+ (required for HMM, Heterogeneous Memory Management).

Notebooks

Portable GPU Acceleration of HPC Applications with ISO C++

Notebook	Link
Introduction
Lab 1: DAXPY
Lab 1: Select (optional)
Lab 2: 2D Heat Equation
Lab 3: Parallel Tree Construction
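
To give a flavor of what "standard parallelism" means in the C++ labs, here is a minimal sketch of DAXPY (y = a·x + y) written with a C++17 parallel algorithm. It is not taken from the notebooks: the function name `daxpy` and its signature are illustrative assumptions, and the lab code may be structured differently.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Hypothetical helper (not from the notebooks): y[i] = a * x[i] + y[i].
// Assumption: x and y have the same length; the caller must ensure this.
void daxpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    // par_unseq permits, but does not require, the implementation to run
    // the element-wise operation in parallel and vectorized.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(),  // first input range: x
                   y.begin(),           // second input range: y
                   y.begin(),           // output is written back into y
                   [a](double xi, double yi) { return a * xi + yi; });
}
```

The same source can target different hardware depending on the toolchain. For example, the NVIDIA HPC SDK compiler accepts `nvc++ -stdpar=gpu` to offload parallel algorithms to a GPU, while GCC typically needs to link against TBB (`-ltbb`) for its parallel backend.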

Accelerating Portable HPC Applications with ISO Fortran

Notebook	Link
Introduction
Lab 1: MATMUL
Lab 2: DAXPY
Lab 3: Heat Equation