Learning CUDA - Personal Notes
Following the content of the book: Cheng, J., Grossman, M. and McKercher, T., 2014. Professional CUDA C Programming. John Wiley & Sons.
Before starting:
- The book is based on Fermi and Kepler microarchitectures. Other architectures may not share all features mentioned in the book.
- Not all of the content in this book is concise! Sections I found verbose are tagged [unconcise] below.
Contents of the notes:
- Chapter 2 - CUDA programming model
- Chapter 3 - CUDA execution model
- Chapter 4 - Global memory
- Chapter 5 - Shared memory and constant memory
Brief overview of key points:
- Nvidia microarchitectures:
  - 2007 - Tesla
  - 2009 - Fermi
  - 2013 - Kepler
  - 2014 - Maxwell
  - 2016 - Pascal
  - 2017 - Volta
  - 2018 - Turing
- Maximizing Performance
  Reference: CUDA programming guide - Performance Guidelines
  - From the view of instruction execution: the key is to hide latency by exposing more parallelism
  - From the view of memory access: maximize bandwidth utilization
    - Aligned and coalesced memory accesses that reduce wasted bandwidth - Chapter 4, sec 3.1-3.4
    - [unconcise] Sufficient concurrent memory operations to hide memory latency - Chapter 4, sec 3.5
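As a minimal sketch of both points (my own illustration, not code from the book): a grid-stride vector add where consecutive threads access consecutive elements, so each warp's loads and stores are aligned and coalesced, and launching many blocks exposes enough concurrent memory operations to hide latency. The kernel name and sizes are arbitrary.

```cuda
#include <cstdio>

// Grid-stride loop: thread i of the grid handles elements i, i + gridSize, ...
// Within a warp, threads 0..31 touch consecutive floats, so each access
// coalesces into a small number of aligned memory transactions.
__global__ void vecAdd(const float *x, const float *y, float *z, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += gridDim.x * blockDim.x) {
        z[i] = x[i] + y[i];  // coalesced read/read/write
    }
}

int main() {
    const int n = 1 << 20;
    float *x, *y, *z;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&z, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Many more blocks than SMs: while some warps wait on memory,
    // others can issue instructions, hiding the latency.
    vecAdd<<<256, 256>>>(x, y, z, n);
    cudaDeviceSynchronize();

    printf("z[0] = %f\n", z[0]);
    cudaFree(x); cudaFree(y); cudaFree(z);
    return 0;
}
```

By contrast, a strided pattern such as `z[i * 32] = ...` would scatter each warp's accesses across many memory segments, wasting most of the fetched bandwidth.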