KernelDatabaseFormat - clMathLibraries/clBLAS GitHub Wiki

Kernel database file format

Common structure

  1. Header (23 bytes)
  2. Memory pattern information
  3. Binary kernels

Header

| Offset | Field name | Size (bytes) |
|---|---|---|
| 0 | File ID ('CBS') | 3 |
| 3 | Version | 4 |
| 7 | Number of OpenCL functions | 4 |
| 11 | Binary data start | 8 |
| 19 | CRC32 | 4 |

The 'Version' field is currently equal to 1. The 'Binary data start' field holds the file offset at which the OpenCL binary kernels begin.
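The header layout above can be read with a fixed-size unpack. The following is a minimal sketch, not the library's own reader: it assumes the fields are stored little-endian with no padding between them (the page does not state the byte order), and the names `KernelDbHeader` and `parse_header` are hypothetical.

```python
import struct
from typing import NamedTuple

# Assumption: little-endian, tightly packed, fields in table order.
# 3s = File ID, I = Version, I = function count, Q = binary data start, I = CRC32
HEADER_FORMAT = "<3sIIQI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 3 + 4 + 4 + 8 + 4 = 23 bytes


class KernelDbHeader(NamedTuple):
    file_id: bytes
    version: int
    function_count: int
    binary_data_start: int
    crc32: int


def parse_header(data: bytes) -> KernelDbHeader:
    """Decode the 23-byte header found at the start of the file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    header = KernelDbHeader(*struct.unpack_from(HEADER_FORMAT, data, 0))
    if header.file_id != b"CBS":
        raise ValueError("not a kernel database file")
    return header
```

The CRC32 value is returned but not verified here, because the page does not say which bytes the checksum covers.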

Memory pattern information

| Field name | Size (bytes) |
|---|---|
| Name Length | 4 |
| Name | Variable length |
| Number of settings | 4 |
| CRC32 | 4 |
| Settings array | Variable length |
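Because the name is variable in length, a reader has to decode the length first and then advance past the name before it reaches the remaining fields. A minimal sketch, assuming little-endian integers (not stated on this page); `parse_pattern_header` is a hypothetical helper, and the name is returned as raw bytes because its encoding and termination are not documented here.

```python
import struct


def parse_pattern_header(data: bytes, offset: int = 0):
    """Decode the fields that precede the settings array.

    Returns (name, setting_count, crc32, next_offset), where next_offset
    is the position of the first byte of the settings array.
    """
    # Assumption: all integers are little-endian and unsigned.
    (name_length,) = struct.unpack_from("<I", data, offset)
    offset += 4

    name = data[offset:offset + name_length]
    if len(name) != name_length:
        raise ValueError("truncated name")
    offset += name_length

    setting_count, crc32 = struct.unpack_from("<II", data, offset)
    offset += 8
    return name, setting_count, crc32, offset
```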

Settings entry

| Offset | Field name | Size (bytes) |
|---|---|---|
| 0 | Data type | 4 |
| 4 | Kernel flags | 4 |
| 8 | Number of granulations | 4 |
| 12 | CRC32 | 4 |
| 16 | Decompositions array | Variable length |

Supported data type identifiers

  • Float - 0x1
  • Double - 0x2
  • Float complex - 0x3
  • Double complex - 0x4
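The fixed 16-byte prefix of a settings entry and the data type identifiers listed above can be combined in one small decoder. This is a sketch under the assumption that the four fields are little-endian unsigned integers; the names `DataType` and `parse_settings_header` are hypothetical and do not come from the clBLAS sources.

```python
import struct
from enum import IntEnum


class DataType(IntEnum):
    """Data type identifiers as listed on this page."""
    FLOAT = 0x1
    DOUBLE = 0x2
    FLOAT_COMPLEX = 0x3
    DOUBLE_COMPLEX = 0x4


# Assumption: little-endian; data type, kernel flags, granulation count, CRC32.
SETTINGS_HEADER_FORMAT = "<IIII"
SETTINGS_HEADER_SIZE = struct.calcsize(SETTINGS_HEADER_FORMAT)  # 16 bytes


def parse_settings_header(data: bytes, offset: int = 0):
    """Decode the fixed fields that precede the decompositions array."""
    raw_type, kernel_flags, granulation_count, crc32 = struct.unpack_from(
        SETTINGS_HEADER_FORMAT, data, offset
    )
    # DataType() raises ValueError for an identifier outside 0x1..0x4.
    return DataType(raw_type), kernel_flags, granulation_count, crc32
```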

Kernel flags

These flags correspond to the values of the KernelExtraFlags enumeration in the source code.

| Name | Value | Description |
|---|---|---|
| KEXTRA_TRANS_A | 0x01 | Matrix A is transposed |
| KEXTRA_CONJUGATE_A | 0x02 | Matrix A is conjugated |
| KEXTRA_TRANS_B | 0x04 | Matrix B is transposed |
| KEXTRA_CONJUGATE_B | 0x08 | Matrix B is conjugated |
| KEXTRA_COLUMN_MAJOR | 0x10 | Matrices are stored in column-major format |
| KEXTRA_UPPER_TRIANG | 0x20 | Matrix A is upper triangular |
| KEXTRA_SIDE_RIGHT | 0x40 | Matrix A is on the right side |
| KEXTRA_SEPARATE_TAILS | 0x80 | Problem tails are processed separately, or there are no tails |
| KEXTRA_BETA_ZERO | 0x800 | Beta multiplier is zero |
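Since each flag occupies its own bit, the 'Kernel flags' field can be decoded by testing the value against each mask. The sketch below mirrors only the values listed in the table above; the Python class and the `describe_flags` helper are illustrative, not part of clBLAS.

```python
from enum import IntFlag
from typing import List


class KernelExtraFlags(IntFlag):
    """Bit masks copied from the table on this page."""
    KEXTRA_TRANS_A = 0x01
    KEXTRA_CONJUGATE_A = 0x02
    KEXTRA_TRANS_B = 0x04
    KEXTRA_CONJUGATE_B = 0x08
    KEXTRA_COLUMN_MAJOR = 0x10
    KEXTRA_UPPER_TRIANG = 0x20
    KEXTRA_SIDE_RIGHT = 0x40
    KEXTRA_SEPARATE_TAILS = 0x80
    KEXTRA_BETA_ZERO = 0x800


def describe_flags(value: int) -> List[str]:
    """Return the names of the documented flags that are set in value."""
    return [flag.name for flag in KernelExtraFlags if value & flag.value]
```

For example, `describe_flags(0x05)` returns `['KEXTRA_TRANS_A', 'KEXTRA_TRANS_B']`. Bits that are not listed in the table are ignored rather than rejected.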

Decomposition entry

The 'Sizes' field is an array of three 40-byte structures that correspond to the SubproblemDim structure in the source code, except that each field is 8 bytes wide. The 'Parallelism granularity' field describes the OpenCL work group and corresponds to the PGranularity structure in the code. Every OpenCL solver can provide up to 3 kernels. The 'Kernel start offsets' and 'Kernel binary sizes' fields hold the start offset in the file and the size of each such kernel. The 'Execution time' field holds the best time, as a double-precision value, in which the computing kernel ran.

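| Offset | Field name | Size (bytes) |
|---|---|---|
| 0 | Sizes | 120 |
| 120 | Parallelism granularity | 16 |
| 136 | Kernel start offsets | 24 |
| 148 | Kernel binary sizes | 12 |
| 160 | Execution time | 8 |
| 168 | CRC32 | 4 |

Note: the size listed for 'Kernel start offsets' (24) does not agree with the offsets that follow it, which advance by 12 bytes. Check the source code before relying on the layout past offset 136.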
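The first two fields of a decomposition entry can be split out as follows. This is a rough sketch that assumes little-endian unsigned 64-bit values; the count of five fields per structure is derived from 40 bytes divided by 8 bytes per field, the meaning of each field is left to the SubproblemDim definition in the source, and the PGranularity bytes are returned undecoded because their internal layout is not described here. `parse_sizes` and `granularity_bytes` are hypothetical names.

```python
import struct

SUBDIM_COUNT = 3          # three structures in the 'Sizes' array
FIELDS_PER_SUBDIM = 5     # 40 bytes per structure / 8 bytes per field
SIZES_LENGTH = 120        # bytes occupied by the 'Sizes' field
GRANULARITY_LENGTH = 16   # bytes occupied by 'Parallelism granularity'


def parse_sizes(entry: bytes, offset: int = 0):
    """Return the 'Sizes' field as three tuples of five integers each."""
    # Assumption: every field is a little-endian unsigned 64-bit integer.
    count = SUBDIM_COUNT * FIELDS_PER_SUBDIM
    values = struct.unpack_from("<%dQ" % count, entry, offset)
    return [
        values[i * FIELDS_PER_SUBDIM:(i + 1) * FIELDS_PER_SUBDIM]
        for i in range(SUBDIM_COUNT)
    ]


def granularity_bytes(entry: bytes, offset: int = 0) -> bytes:
    """Return the raw 16 bytes of the 'Parallelism granularity' field."""
    start = offset + SIZES_LENGTH
    return entry[start:start + GRANULARITY_LENGTH]
```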