KernelDatabaseFormat - tingxingdong/clBLAS-private GitHub Wiki
Kernel database file format
Common structure
- Header (23 bytes)
- Memory pattern information
- Binary kernels
Header
| Offset | Field name | Size (bytes) |
|---|---|---|
| 0 | File ID ('CBS') | 3 |
| 3 | Version | 4 |
| 7 | Number of OpenCL functions | 4 |
| 11 | Binary data start | 8 |
| 19 | CRC32 | 4 |
The 'Version' field is currently 1. The 'Binary data start' field holds the file offset at which the OpenCL binary kernels begin.
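As a rough illustration of the header layout, the following Python sketch unpacks the 23 bytes with the standard `struct` module. The little-endian byte order, the absence of padding, and the `parse_header` helper name are assumptions; the format description does not state them. The CRC32 is read but not verified, because the bytes it covers are not specified here.

```python
import struct

# Assumption: little-endian and packed with no padding,
# so 3 + 4 + 4 + 8 + 4 = 23 bytes.
HEADER = struct.Struct("<3sIIQI")

def parse_header(buf):
    """Decode the 23-byte header found at the start of buf (hypothetical helper)."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer is shorter than the 23-byte header")
    file_id, version, n_functions, data_start, crc32 = HEADER.unpack_from(buf, 0)
    if file_id != b"CBS":
        raise ValueError(f"unexpected file ID: {file_id!r}")
    return {
        "version": version,
        "function_count": n_functions,
        "binary_data_start": data_start,
        "crc32": crc32,  # read only; not checked
    }

# Usage with a synthetic header (the values are made up for illustration).
sample = HEADER.pack(b"CBS", 1, 2, 4096, 0)
header = parse_header(sample)
print(header["version"], header["binary_data_start"])
```

Rejecting a wrong file ID early keeps later offset arithmetic from running on an unrelated file.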
Memory pattern information
| Field name | Size (bytes) |
|---|---|
| Name length | 4 |
| Name | Variable |
| Number of settings | 4 |
| CRC32 | 4 |
| Settings array | Variable |
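Because the name is variable-length, the fields after it have no fixed offset and must be located by reading 'Name length' first. A minimal sketch of that step follows, assuming little-endian unsigned integers and a name stored as raw bytes without a terminator; `parse_pattern_info` and the sample name are hypothetical.

```python
import struct

def parse_pattern_info(buf, offset=0):
    """Return (record, offset of the settings array). Hypothetical helper."""
    (name_len,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    name = buf[offset:offset + name_len]  # kept as bytes; encoding is not specified
    if len(name) != name_len:
        raise ValueError("truncated name")
    offset += name_len
    n_settings, crc32 = struct.unpack_from("<II", buf, offset)
    offset += 8
    record = {"name": name, "settings_count": n_settings, "crc32": crc32}
    return record, offset

# Usage with a synthetic record (name and counts are made up).
sample = struct.pack("<I", 4) + b"gemm" + struct.pack("<II", 3, 0)
record, settings_offset = parse_pattern_info(sample)
print(record["name"], record["settings_count"], settings_offset)
```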
Settings entry
| Offset | Field name | Size (bytes) |
|---|---|---|
| 0 | Data type | 4 |
| 4 | Kernel flags | 4 |
| 8 | Number of granulations | 4 |
| 12 | CRC32 | 4 |
| 16 | Decompositions array | Variable |
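The fixed 16-byte prefix of a settings entry can be read in one step, after which the decompositions array follows. The sketch below assumes little-endian unsigned 32-bit fields; `parse_settings_prefix` is a hypothetical name, and the data type labels are taken from the identifier list in this document.

```python
import struct

# Data type identifiers as listed in this document.
DATA_TYPES = {0x1: "float", 0x2: "double", 0x3: "float complex", 0x4: "double complex"}

# Assumption: four little-endian unsigned 32-bit fields, 16 bytes in total.
SETTINGS_PREFIX = struct.Struct("<IIII")

def parse_settings_prefix(buf, offset=0):
    """Decode the fixed part of a settings entry (hypothetical helper)."""
    data_type, flags, n_gran, crc32 = SETTINGS_PREFIX.unpack_from(buf, offset)
    return {
        "data_type": DATA_TYPES.get(data_type, f"unknown (0x{data_type:x})"),
        "kernel_flags": flags,
        "granulation_count": n_gran,
        "crc32": crc32,  # read only; not checked
        "decompositions_offset": offset + SETTINGS_PREFIX.size,
    }

# Usage with a synthetic entry (values are made up).
sample = SETTINGS_PREFIX.pack(0x2, 0x11, 5, 0)
entry = parse_settings_prefix(sample)
print(entry["data_type"], hex(entry["kernel_flags"]), entry["decompositions_offset"])
```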
Supported data type identifiers
- Float - 0x1
- Double - 0x2
- Float complex - 0x3
- Double complex - 0x4
Kernel flags
These flags correspond to the KernelExtraFlags enumeration in the source code.
| Name | Value | Description |
|---|---|---|
| KEXTRA_TRANS_A | 0x01 | Matrix A is transposed |
| KEXTRA_CONJUGATE_A | 0x02 | Matrix A conjugated form |
| KEXTRA_TRANS_B | 0x04 | Matrix B is transposed |
| KEXTRA_CONJUGATE_B | 0x08 | Matrix B conjugated form |
| KEXTRA_COLUMN_MAJOR | 0x10 | Matrices are stored in column major format |
| KEXTRA_UPPER_TRIANG | 0x20 | Matrix A is upper triangular |
| KEXTRA_SIDE_RIGHT | 0x40 | Matrix A on the right |
| KEXTRA_SEPARATE_TAILS | 0x80 | Problem tails are processed separately or no tails |
| KEXTRA_BETA_ZERO | 0x800 | Beta multiplier is zero |
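Since the values are distinct bits, a 'Kernel flags' word can be decoded by masking. The sketch below mirrors the table in a Python `IntFlag`; the class is only a stand-in for the C enumeration, and `decode_flags` is a hypothetical helper. Bits that the table does not list (for example 0x100 through 0x400) are reported separately rather than interpreted.

```python
from enum import IntFlag

class KernelExtraFlags(IntFlag):
    """Python mirror of the flag values listed in the table."""
    KEXTRA_TRANS_A = 0x01
    KEXTRA_CONJUGATE_A = 0x02
    KEXTRA_TRANS_B = 0x04
    KEXTRA_CONJUGATE_B = 0x08
    KEXTRA_COLUMN_MAJOR = 0x10
    KEXTRA_UPPER_TRIANG = 0x20
    KEXTRA_SIDE_RIGHT = 0x40
    KEXTRA_SEPARATE_TAILS = 0x80
    KEXTRA_BETA_ZERO = 0x800

# Union of every bit the table documents.
KNOWN_MASK = 0
for _flag in KernelExtraFlags:
    KNOWN_MASK |= _flag.value

def decode_flags(value):
    """Return (names of the set documented flags, remaining undocumented bits)."""
    names = [flag.name for flag in KernelExtraFlags if value & flag.value]
    unknown = value & ~KNOWN_MASK
    return names, unknown

names, unknown = decode_flags(0x811)
print(names, hex(unknown))
# -> ['KEXTRA_TRANS_A', 'KEXTRA_COLUMN_MAJOR', 'KEXTRA_BETA_ZERO'] 0x0
```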
Decomposition entry
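The 'Sizes' field is an array of three 40-byte structures that correspond to the SubproblemDim structure in the source code, except that every field is stored as 8 bytes. The 'Parallelism granularity' field describes the OpenCL work group and corresponds to the PGranularity structure in the code. Every OpenCL solver can provide up to 3 kernels; the 'Kernel start offsets' and 'Kernel binary sizes' fields hold the start offset in the file and the size of each such kernel. The 'Execution time' field holds the best run time measured for the computing kernel, as a double-precision value. Note that the 24-byte size listed for 'Kernel start offsets' does not agree with the listed offsets (136 + 24 = 160, but 'Kernel binary sizes' is listed at offset 148), so check the source code before relying on the layout of the fields after 'Parallelism granularity'.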
| Offset | Field name | Size (bytes) |
|---|---|---|
| 0 | Sizes | 120 |
| 120 | Parallelism granularity | 16 |
| 136 | Kernel start offsets | 24 |
| 148 | Kernel binary sizes | 12 |
| 160 | Execution time | 8 |
| 168 | CRC32 | 4 |
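The first two fields of a decomposition entry have unambiguous offsets, so they can be sketched without guessing at the rest of the layout. The example below reads 'Sizes' as three groups of five 8-byte integers (40 bytes each, 120 in total) and returns 'Parallelism granularity' as 16 raw bytes. Little-endian order, unsigned values, and the `parse_sizes` name are assumptions; the individual SubproblemDim and PGranularity field names are not given here, so the values are left unnamed.

```python
import struct

# Assumption: 3 structures x 5 fields x 8 bytes, little-endian unsigned.
SIZES = struct.Struct("<15Q")
GRANULARITY_SIZE = 16  # taken from the table; left undecoded

def parse_sizes(buf, offset=0):
    """Return (three 5-tuples from 'Sizes', raw 'Parallelism granularity' bytes)."""
    values = SIZES.unpack_from(buf, offset)
    dims = [values[i:i + 5] for i in range(0, 15, 5)]
    start = offset + SIZES.size
    granularity_raw = buf[start:start + GRANULARITY_SIZE]
    if len(granularity_raw) != GRANULARITY_SIZE:
        raise ValueError("truncated decomposition entry")
    return dims, granularity_raw

# Usage with synthetic data (values 0..14 are made up).
sample = SIZES.pack(*range(15)) + bytes(GRANULARITY_SIZE)
dims, granularity_raw = parse_sizes(sample)
print(dims[1], len(granularity_raw))
```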