
Learning CUDA - Personal Notes

These notes follow the book: Cheng, J., Grossman, M., and McKercher, T., 2014. Professional CUDA C Programming. John Wiley & Sons.

Before starting:

  • The book is based on Fermi and Kepler microarchitectures. Other architectures may not share all features mentioned in the book.
  • Not all of the content in this book is concise! (Such sections are tagged [unconcise] in the notes.)

Contents of the notes:

Brief overview of key points:

  1. Nvidia microarchitectures:
  • 2007 - Tesla
  • 2009 - Fermi
  • 2013 - Kepler
  • 2014 - Maxwell
  • 2016 - Pascal
  • 2017 - Volta
  • 2018 - Turing
  2. Maximizing Performance
    See the CUDA Programming Guide, Performance Guidelines.
  • From the view of instruction execution: the key is to hide latency by exposing more parallelism.
  • From the view of memory access: the key is to maximize bandwidth utilization.
    • Aligned and coalesced memory accesses reduce wasted bandwidth - Chapter 4, sec 3.1-3.4 (see the first sketch after this list).
    • [unconcise] Sufficient concurrent memory operations hide memory latency - Chapter 4, sec 3.5 (see the second sketch after this list).
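A minimal sketch (not from the book) contrasting coalesced and strided global-memory access; the kernel names and the `stride` parameter are illustrative. In the coalesced version, consecutive threads of a warp touch consecutive addresses, so the warp's accesses merge into a few wide memory transactions; in the strided version, much of every fetched memory segment is wasted.

```cuda
// Coalesced copy: thread i reads/writes element i, so a warp's 32 accesses
// fall in consecutive addresses and merge into a few memory transactions.
__global__ void copyCoalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided copy: neighbouring threads touch addresses `stride` elements
// apart, so most bytes of each fetched segment are never used.
__global__ void copyStrided(const float *in, float *out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n) out[i] = in[i];
}
```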
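A second sketch (again illustrative, assuming for brevity that `n` is a multiple of `4 * blockDim.x`) of exposing more parallelism to hide memory latency: each thread issues four independent, still-coalesced loads, so more memory operations are in flight per thread.

```cuda
// Each thread copies 4 elements spaced blockDim.x apart. Every one of the
// four loads remains coalesced across the warp, and the four operations are
// independent, which helps hide global-memory latency.
__global__ void copyUnroll4(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x * 4 + threadIdx.x;
    if (i + 3 * blockDim.x < n) {
        out[i]                  = in[i];
        out[i +     blockDim.x] = in[i +     blockDim.x];
        out[i + 2 * blockDim.x] = in[i + 2 * blockDim.x];
        out[i + 3 * blockDim.x] = in[i + 3 * blockDim.x];
    }
}
```

Since each thread now handles four elements, the launch uses a grid one quarter the size, e.g. `copyUnroll4<<<(n / (4 * block.x)), block>>>(in, out, n);`.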