Learning CUDA - Personal Notes
Following the content of the book: Cheng, J., Grossman, M. and McKercher, T., 2014. Professional CUDA C Programming. John Wiley & Sons.
Before starting:
- The book is based on Fermi and Kepler microarchitectures. Other architectures may not share all features mentioned in the book.
- Not all of the content in this book is concise! Sections I found verbose are tagged [unconcise] below.
Contents of the notes:
- Chapter 2 - CUDA programming model
- Chapter 3 - CUDA execution model
- Chapter 4 - Global memory
- Chapter 5 - Shared memory and constant memory
Brief overview of key points:
- Nvidia microarchitectures:
  - 2007 - Tesla
  - 2009 - Fermi
  - 2013 - Kepler
  - 2014 - Maxwell
  - 2016 - Pascal
  - 2017 - Volta
  - 2018 - Turing
- Maximizing Performance
  Reference: CUDA programming guide - Performance Guidelines
  - From the view of instruction execution: the key is to hide latency by exposing more parallelism
  - From the view of memory access: maximize bandwidth utilization
    - Aligned and coalesced memory accesses that reduce wasted bandwidth - Chapter 4, sec 3.1-3.4
    - [unconcise] Sufficient concurrent memory operations to hide memory latency - Chapter 4, sec 3.5
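As a minimal sketch of both points (my own illustration, not code from the book): a grid-stride vector add where consecutive threads access consecutive elements, so each warp's loads and stores are aligned and coalesced, and launching many blocks exposes enough concurrent memory operations to hide latency. The kernel name and sizes are arbitrary.

```cuda
#include <cstdio>

// Grid-stride loop: thread i of the grid handles elements i, i + gridSize, ...
// Within a warp, threads 0..31 touch consecutive floats, so each access
// coalesces into a small number of aligned memory transactions.
__global__ void vecAdd(const float *x, const float *y, float *z, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += gridDim.x * blockDim.x) {
        z[i] = x[i] + y[i];  // coalesced read/read/write
    }
}

int main() {
    const int n = 1 << 20;
    float *x, *y, *z;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&z, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Many more blocks than SMs: while some warps wait on memory,
    // others can issue instructions, hiding the latency.
    vecAdd<<<256, 256>>>(x, y, z, n);
    cudaDeviceSynchronize();

    printf("z[0] = %f\n", z[0]);
    cudaFree(x); cudaFree(y); cudaFree(z);
    return 0;
}
```

By contrast, a strided pattern such as `z[i * 32] = ...` would scatter each warp's accesses across many memory segments, wasting most of the fetched bandwidth.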