Syllabus
The course content is divided into five parts, detailed below.
Part 1 : Introduction to GPU basics
- Introduction to trends in graphics processing unit (GPU) hardware.
- Progression of NVIDIA GPUs.
- Background information and history of GPGPU (general-purpose GPU) computing.
- Hardware considerations in GPU design.
- The general-purpose GPU computing community and resources.
- CUDA programming basics.
- CUDA programming model and terminology.
- Asynchronous CPU/GPU compute model.
- Workflow for a GPGPU computation.
- Allocating storage arrays on the GPU device.
- Transferring data between host and device.
- The CUDA thread hierarchy.
- Invoking a CUDA kernel through special syntax.
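The workflow above (allocate device storage, transfer host to device, launch a kernel over the thread hierarchy, transfer back) can be sketched as follows. This is a minimal illustration, not course material; the kernel name and sizes are made up for the example.

```cuda
// Sketch of the basic GPGPU workflow: allocate, copy, launch, copy back.
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernel: each thread squares one array element.
__global__ void square(float *d_x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        d_x[i] = d_x[i] * d_x[i];
}

int main(void)
{
    const int n = 1024;
    float h_x[n];
    for (int i = 0; i < n; ++i) h_x[i] = (float)i;

    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));                              // allocate on device
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);  // host -> device

    // Launch via the special <<<blocks, threads>>> syntax: enough
    // 256-thread blocks to cover n elements.
    square<<<(n + 255) / 256, 256>>>(d_x, n);

    cudaMemcpy(h_x, d_x, n * sizeof(float), cudaMemcpyDeviceToHost);  // device -> host
    cudaFree(d_x);

    printf("h_x[3] = %f\n", h_x[3]);  // expect 9.0
    return 0;
}
```

Note that the kernel launch is asynchronous with respect to the CPU; the subsequent `cudaMemcpy` implicitly synchronizes before reading results.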
Hands-on Lab 1 : Mandelbrot Generator (CUDA)
Part 2 : Memory hierarchy, optimizations and libraries
- A simple CUDA kernel to add two vectors together.
- Catching CUDA errors.
- Timing CUDA kernels.
- How to compile and link CUDA programs using the nvcc compiler.
- Non-uniform memory architecture of GPGPU devices.
- Optimization techniques and case examples.
- Strategies for achieving high performance in CUDA (and OpenCL) kernels.
- Overview of NVIDIA's CUDA Toolkit.
- The nvcc compilation chain and intermediate compiler files.
- Debugging kernels with NVIDIA's CUDA-GDB debugger.
- Profiling CUDA kernels with NVIDIA's Visual Profiler.
- Profiling CUDA kernels from the command line.
- Compiler optimization options (CUDA and OpenCL).
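Several of the topics above (the vector-add kernel, catching CUDA errors, and timing with CUDA events) fit in one short sketch. The error-checking macro and names here are illustrative, not a prescribed course template.

```cuda
// Compile with: nvcc -O2 vecadd.cu -o vecadd
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Check the return code of a CUDA runtime call and abort on failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Simple kernel that adds two vectors element-wise.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));
    CUDA_CHECK(cudaMalloc(&d_c, bytes));
    CUDA_CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice));

    // Time the kernel with CUDA events.
    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));
    CUDA_CHECK(cudaEventRecord(start));
    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
    CUDA_CHECK(cudaGetLastError());            // catch kernel-launch errors
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));

    CUDA_CHECK(cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost));
    printf("c[0] = %.1f, kernel time = %.3f ms\n", h_c[0], ms);  // c[0] = 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```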
Hands-on Lab 2 : Matrix-matrix operation (CUDA)
Part 3 : Programming tools and math libraries
- Building blocks for high-performance computing.
- CUDA Programming Tools.
- Profiling tools.
- Debugging tools and strategies.
- Standard libraries.
- Scripting for GPUs via Python (PyCUDA).
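As a taste of the standard libraries covered here, the following sketch calls cuBLAS to multiply two small matrices, as in Lab 3. The matrix values are illustrative; note that cuBLAS assumes column-major (Fortran-order) storage.

```cuda
// Compile with: nvcc gemm.cu -lcublas -o gemm
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const int n = 2;
    // Column-major: h_A holds A = [1 3; 2 4]; h_B is the 2x2 identity.
    float h_A[] = {1.0f, 2.0f, 3.0f, 4.0f};
    float h_B[] = {1.0f, 0.0f, 0.0f, 1.0f};
    float h_C[4];

    float *d_A, *d_B, *d_C;
    cudaMalloc(&d_A, 4 * sizeof(float));
    cudaMalloc(&d_B, 4 * sizeof(float));
    cudaMalloc(&d_C, 4 * sizeof(float));
    cudaMemcpy(d_A, h_A, 4 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, 4 * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    // C = alpha * A * B + beta * C  (no transposition of A or B)
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, d_A, n, d_B, n, &beta, d_C, n);

    cudaMemcpy(h_C, d_C, 4 * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C = [%.0f %.0f; %.0f %.0f]\n",
           h_C[0], h_C[2], h_C[1], h_C[3]);  // A times identity gives A back

    cublasDestroy(handle);
    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    return 0;
}
```

The point of the lab is that a tuned library call like `cublasSgemm` typically outperforms a hand-written matrix-multiply kernel with far less effort.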
Hands-on Lab 3 : Matrix-matrix operation via cuBLAS (CUDA)
Part 4 : OpenCL
- GPU hardware architectures (NVIDIA and AMD).
- Background to OpenCL.
- The OpenCL standard for heterogeneous computing on multicore architectures.
- CUDA vs. OpenCL (syntax, functionality, terminology, memory models).
- CUDA vs. OpenCL case examples.
- Scripting for GPUs via Python (PyOpenCL).
- Cross-platform performance comparison.
- Porting CUDA to OpenCL using Swan.
Hands-on Lab 4 : Mandelbrot Generator and Matrix-Matrix operation (OpenCL)
Part 5 : GPU-based advanced PDE solvers
- GPU-accelerated discontinuous Galerkin methods.
- The discontinuous Galerkin method for building advanced solvers.
- Scientific computing challenges.
- Why GPUs Matter: trends.
- Parallel partitioning across multiple GPUs.
- High-performance scientific computations.
Hands-on Lab 5 : Initiation of project work (CUDA/OpenCL)
Relevant textbook (background reading):
David B. Kirk, Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. Morgan Kaufmann, 2010.
This book is now available in the bookshop at DTU: Polyteknisk Boghandel in Building 101.