Goals
Lab 8, Exercise 1
Run GPU code on an ICDS Cluster
Accelerate linear algebra computations with a GPU
Recognize what problem sizes are likely to result in acceleration with a GPU for linear algebra
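To preview what Exercise 1 involves, here is a hypothetical sketch (not taken from the lab materials) of offloading a matrix multiply to the GPU with CUDA.jl; the array sizes and variable names are illustrative, and an NVIDIA GPU is required.

```julia
# Sketch: GPU-accelerated linear algebra via CUDA.jl (assumes CUDA.jl is installed
# and a compatible NVIDIA GPU is available, e.g. on an ICDS cluster GPU node).
using CUDA, LinearAlgebra

N = 4096                       # illustrative size; small N may not benefit
A = rand(Float32, N, N)
B = rand(Float32, N, N)

A_d = CuArray(A)               # transfer inputs to GPU memory
B_d = CuArray(B)
C_d = A_d * B_d                # dispatches to cuBLAS on the GPU
C = Array(C_d)                 # transfer the result back to the host

# Host-to-device transfers have a real cost: for small matrices the copy time
# can exceed the compute savings, which is the tradeoff this exercise explores.
```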
Lab 8, Exercise 2
Learn to write a GPU kernel, using KernelAbstractions.jl
Improve performance by reducing memory transfers via GPU reductions
Perform custom scientific computations using a high-level GPU interface, such as Folds.jl with the CUDAEx() executor from FoldsCUDA.jl
Improve performance through reduced memory allocations
Recognize what types of problems and problem sizes are likely to result in acceleration with a GPU when using a high-level programming interface or custom GPU kernel
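As a preview of the kernel-writing goal above, here is a hypothetical sketch (not taken from the lab materials) of a simple element-wise kernel written with KernelAbstractions.jl; the kernel name and sizes are illustrative.

```julia
# Sketch: a backend-agnostic AXPY kernel with KernelAbstractions.jl.
# The same kernel runs on the CPU backend or, if the arrays are CuArrays
# and CUDA.jl is loaded, on the GPU.
using KernelAbstractions

@kernel function axpy_kernel!(y, a, x)
    i = @index(Global)              # this work-item's global index
    @inbounds y[i] = a * x[i] + y[i]
end

x = rand(Float32, 1024)
y = zeros(Float32, 1024)

backend = get_backend(x)            # CPU() here; CUDABackend() for CuArrays
axpy_kernel!(backend)(y, 2.0f0, x; ndrange = length(x))
KernelAbstractions.synchronize(backend)
```

Writing against the KernelAbstractions.jl API rather than raw CUDA keeps one source for CPU and GPU runs, which makes it easier to compare problem sizes across hardware, one of the learning goals above.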
Project
Gain experience parallelizing a real world code
Identify changes needed to achieve a significant performance benefit via parallelization
Readings / Discussions
Describe how a GPU differs from a CPU
Assess the prospects for a given algorithm to achieve a significant speed-up using a GPU
Project
Lab
Lab 8: Parallel Programming III: GPUs & Other Hardware Accelerators (due Nov 7)
Exercise 1: Linear Algebra on GPU
Exercise 2: GPU Kernels & Array Programming
Readings
Best Practices for Scientific Computing: (Sec 2; yes, let's all read it again!)
Best Practices for Scientific Computing: (Reference list on last page, just in case you didn't notice it before!)
Additional Resources
Week 11 Class Discussion: Priorities, Build Systems, Parallel Random Number Generators, Q&A, Autodiff on GPUs