Page Index - Guanmoyu/how-to-optimize-gemm GitHub Wiki
65 page(s) in this GitHub Wiki:
- Home
- Table of contents
- The GotoBLAS/BLIS Approach to Optimizing Matrix-Matrix Multiplication - Step-by-Step
- NOTICE ON ACADEMIC HONESTY
- References
- Set Up
- Step-by-step optimizations
- Computing four elements of C at a time
- Hiding computation in a subroutine
- Computing four elements at a time
- Further optimizing
- Computing a 4 x 4 block of C at a time
- Repeating the same optimizations
- Further optimizing
- Blocking to maintain performance
- Packing into contiguous memory
- Acknowledgement
- Optimization1
- Optimization2
- Optimization_1x4_3
- Optimization_1x4_4
- Optimization_1x4_5
- Optimization_1x4_6
- Optimization_1x4_7
- Optimization_1x4_8
- Optimization_1x4_9
- Optimization_4x4_10
- Optimization_4x4_11
- Optimization_4x4_12
- Optimization_4x4_13
- Optimization_4x4_14
- Optimization_4x4_15
- Optimization_4x4_3
- Optimization_4x4_4
- Optimization_4x4_5
- Optimization_4x4_6
- Optimization_4x4_7
- Optimization_4x4_8
- Optimization_4x4_9
- README
- SetUp