Driver interface API - DigitalMediaProfessionals/dv-sdk GitHub Wiki
The complete list of functions and structures can be found in dmp_dv.h and dmp_dv_cmdraw_v0.h.
Context
To work with the device, a context must be created first.
dmp_dv_context is the data type for working with the context.
dmp_dv_context_create
dmp_dv_context dmp_dv_context_create();
Creates a context for working with the AI processor. It is thread-safe.
Returns non-NULL on success, NULL on error.
dmp_dv_context_release
void dmp_dv_context_release(dmp_dv_context ctx);
Releases context for working with AI processor (decreases reference counter).
Should be called when ctx is no longer needed.
It is thread-safe.
- ctx - Context for working with AI processor, when NULL it is ignored.
dmp_dv_context_retain
void dmp_dv_context_retain(dmp_dv_context ctx);
Retains context for working with AI processor (increases reference counter). It is thread-safe.
- ctx - Context for working with AI processor, when NULL it is ignored.
dmp_dv_context_get_info_string
const char *dmp_dv_context_get_info_string(dmp_dv_context ctx);
Returns information about context as human-readable string. It is thread-safe.
dmp_dv_context_get_info
// Structure with information about the context.
struct dmp_dv_info {
uint32_t size; // size of this structure
uint32_t version; // version of this structure
};
// Structure with information about the context (version 0).
struct dmp_dv_info_v0 {
struct dmp_dv_info header; // general structure information
int32_t ub_size; // unified buffer size
int32_t max_kernel_size; // maximum supported convolutional kernel size
int32_t conv_freq; // convolutional block frequency in MHz
int32_t fc_freq; // fully connected block frequency in MHz
int32_t max_fc_vector_size; // fully connected block maximum input vector size in elements
int32_t rsvd; // padding to 64-bits
};
int dmp_dv_context_get_info(dmp_dv_context ctx, struct dmp_dv_info *info);
Fills the structure with information about the context. On return, the version field is set to the highest supported version that is less than or equal to the requested one. The fields of the corresponding structure are filled only if the provided size is large enough. It is thread-safe.
- ctx - Context for working with AI processor, when NULL it is ignored.
- info - Structure to be filled, fields size and version must be set.
Returns 0 on success, non-zero otherwise.
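The size/version handshake described above can be sketched as follows. This is a minimal illustration, not SDK code: the structures are local copies of the ones listed in this section (so the snippet compiles without dmp_dv.h), and init_info_v0 is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Local copies of the structures listed above, so that this sketch compiles
 * without the SDK; real code would include dmp_dv.h instead. */
struct dmp_dv_info {
    uint32_t size;     /* size of this structure */
    uint32_t version;  /* version of this structure */
};

struct dmp_dv_info_v0 {
    struct dmp_dv_info header;   /* general structure information */
    int32_t ub_size;             /* unified buffer size */
    int32_t max_kernel_size;     /* maximum supported convolutional kernel size */
    int32_t conv_freq;           /* convolutional block frequency in MHz */
    int32_t fc_freq;             /* fully connected block frequency in MHz */
    int32_t max_fc_vector_size;  /* maximum FC input vector size in elements */
    int32_t rsvd;                /* padding to 64 bits */
};

/* Hypothetical helper (not part of the SDK): prepares a version 0 request by
 * zeroing the structure and setting the two fields the caller must provide. */
static void init_info_v0(struct dmp_dv_info_v0 *info) {
    memset(info, 0, sizeof(*info));
    info->header.size = (uint32_t)sizeof(*info);
    info->header.version = 0;
}
```

After this initialization, the structure would be passed as dmp_dv_context_get_info(ctx, &info.header). On a zero return value, header.version tells which version was actually filled in.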
Memory management
This section contains functions for allocating and working with memory accessible to the device.
dmp_dv_mem is the data type for memory allocation.
dmp_dv_mem_alloc
dmp_dv_mem dmp_dv_mem_alloc(dmp_dv_context ctx, size_t size);
Allocates a physically contiguous chunk of memory accessible to the device. Memory is allocated using ION with CMA and is not yet mapped to user or kernel address space. It is thread-safe.
- ctx - Context for working with AI processor, when NULL the error is returned.
- size - Memory size in bytes.
Returns handle for the allocated memory or NULL on error.
dmp_dv_mem_release
void dmp_dv_mem_release(dmp_dv_mem mem);
Releases allocated memory (decreases reference counter). Should be called when mem is no longer needed. dmp_dv_mem_unmap() will be called automatically before the memory is returned to the system.
It is thread-safe.
- mem - Handle for the allocated memory, when NULL it is ignored.
dmp_dv_mem_retain
void dmp_dv_mem_retain(dmp_dv_mem mem);
Retains allocated memory (increases reference counter). It is thread-safe.
- mem - Handle for the allocated memory, when NULL it is ignored.
dmp_dv_mem_map
uint8_t *dmp_dv_mem_map(dmp_dv_mem mem);
Maps previously allocated memory to the user address space. The returned memory can be read and written; the executable flag is not set. If the memory was already mapped, the same pointer will be returned. It is thread-safe only on different memory handles.
- mem - Handle to the allocated memory, when NULL the error is returned.
Returns pointer to memory region in user address space or NULL on error.
dmp_dv_mem_unmap
void dmp_dv_mem_unmap(dmp_dv_mem mem);
Unmaps previously allocated and mapped memory from the user address space. Function can be called repeatedly. dmp_dv_mem_sync_end() will be called automatically before unmapping. It is thread-safe only on different memory handles.
- mem - Handle to the allocated memory, when NULL the error is returned.
dmp_dv_mem_sync_start
int dmp_dv_mem_sync_start(dmp_dv_mem mem, int rd, int wr);
Starts Device <-> CPU synchronization of the memory buffer. When called again with the same or a subset of the rd | wr flags already in effect, the function does nothing. It is thread-safe only on different memory handles.
- mem - Handle to the allocated memory, when NULL the error is returned.
- rd - If non-zero, the Device -> CPU synchronization will occur before this function returns.
- wr - If non-zero, the CPU -> Device synchronization will occur on dmp_dv_mem_sync_end().
Returns 0 on success, non-zero otherwise.
dmp_dv_mem_sync_end
int dmp_dv_mem_sync_end(dmp_dv_mem mem);
Finishes the last started Device <-> CPU synchronization. When called a second time before the next call to dmp_dv_mem_sync_start(), the function does nothing. It is thread-safe only on different memory handles.
- mem - Handle to the allocated memory, when NULL the error is returned.
Returns 0 on success, non-zero otherwise.
dmp_dv_mem_get_size
size_t dmp_dv_mem_get_size(dmp_dv_mem mem);
Returns allocated size in bytes for the provided memory handle. It is thread-safe.
- mem - Handle to the allocated memory, when NULL the function will return 0 and the error message is set.
Returns size in bytes (can be greater than requested in dmp_dv_mem_alloc()) or 0 if mem is NULL.
Command lists
A command list contains several commands for execution, e.g. a chain of convolution operations. It packs the commands into a hardware-specific, ready-to-execute format, which allows multiple operations to be executed at once with low latency. dmp_dv_cmdlist is the data type for a command list.
dmp_dv_cmdlist_create
dmp_dv_cmdlist dmp_dv_cmdlist_create(dmp_dv_context ctx);
Creates command list. It is thread-safe.
- ctx - Context for working with AI processor, when NULL the error is returned.
Returns handle to command list or NULL on error.
dmp_dv_cmdlist_release
void dmp_dv_cmdlist_release(dmp_dv_cmdlist cmdlist);
Releases the command list (decreases reference counter). Should be called when cmdlist is no longer needed. It is thread-safe.
- cmdlist - Handle to command list, when NULL it is ignored.
dmp_dv_cmdlist_retain
void dmp_dv_cmdlist_retain(dmp_dv_cmdlist cmdlist);
Retains the command list (increases reference counter). It is thread-safe.
- cmdlist - Handle to command list, when NULL it is ignored.
dmp_dv_cmdlist_commit
int dmp_dv_cmdlist_commit(dmp_dv_cmdlist cmdlist);
Commits the command list, preparing device-specific structures for further execution. It is thread-safe only on different command lists.
- cmdlist - Handle to command list, when NULL the error is returned.
Returns 0 on success, non-zero otherwise.
dmp_dv_cmdlist_exec
int64_t dmp_dv_cmdlist_exec(dmp_dv_cmdlist cmdlist);
Schedules command list for execution. Each context is associated with a single execution queue. It is thread-safe.
- cmdlist - Handle to command list, when NULL the error is returned.
Returns exec_id >= 0 for this execution on success, < 0 on error.
dmp_dv_cmdlist_wait
int dmp_dv_cmdlist_wait(dmp_dv_cmdlist cmdlist, int64_t exec_id);
Waits for the specific scheduled command to be completed. It is thread-safe.
- cmdlist - Handle to command list, when NULL the error is returned.
- exec_id - Id of the scheduled command to wait for completion.
Returns 0 on success, non-zero otherwise.
dmp_dv_cmdlist_add_raw
// Memory buffer specification.
typedef struct dmp_dv_buf_impl {
union {
dmp_dv_mem *mem; // memory handle
uint64_t rsvd; // padding to 64-bit size
};
uint64_t offs; // offset from the start of the buffer, must be 16-bit aligned
} dmp_dv_buf;
// Convolutional device type id.
#define DMP_DV_DEV_CONV 1
// Fully connected device type id.
#define DMP_DV_DEV_FC 2
// Upper bound of different device type ids.
#define DMP_DV_DEV_COUNT 3
// Raw command for execution.
struct dmp_dv_cmdraw {
uint32_t size; // size of this structure
uint8_t device_type; // device type
uint8_t version; // version of this structure
uint8_t rsvd[2]; // padding to 64-bit size
};
// Convolution layer runs.
struct dmp_dv_cmdraw_conv_v0_run {
dmp_dv_buf weight_buf; // Buffer with packed weights
uint32_t conv_pad; // Bits [7:0] = left padding, bits [15:8] = right padding, bits [23:16] = top padding, bits [31:24] = bottom padding
uint32_t pool_pad; // Bits [7:0] = left padding, bits [15:8] = right padding, bits [23:16] = top padding, bits [31:24] = bottom padding
uint16_t m; // Number of Output Channels
uint16_t conv_enable; // 1 = Enabled, 0 = Disabled, 3 = Depthwise
uint16_t p; // Filter Size (bits[7:0] = width, bits[15:8] = height)
uint16_t pz; // Filter Depth (1 in case of 2D convolution)
uint16_t conv_stride; // Bits [7:0] = X stride, bits [15:8] = Y stride
uint16_t conv_dilation; // Bits [7:0] = X dilation, bits [15:8] = Y dilation
uint16_t weight_fmt; // Weights format (0 = random access blocks, 1 = compact stream, 3 = 8-bit quantized stream)
uint16_t pool_enable; // 0 = disabled, 1 = max pooling, 2 = average pooling, 4 = upsampling
uint16_t pool_avg_param; // Usually set to 1/pool_size^2 in FP16 when using average pooling (average pooling assumes square size)
uint16_t pool_size; // Bits [7:0] = width, bits [15:8] = height
uint16_t pool_stride; // Bits [7:0] = X stride, bits [15:8] = Y stride
uint16_t actfunc; // Activation Function: 0 = None, 1 = Tanh, 2 = Leaky ReLU, 3 = Sigmoid, 4 = PReLU, 5 = ELU, 6 = ReLU6
uint16_t actfunc_param; // Leaky ReLU parameter in FP16
uint16_t rectifi_en; // Rectification, i.e. max(0, x) (NOTE: Can be applied after non-ReLU activation function)
uint16_t lrn; // Bits [0]: 1 = LRN enable, 0 = LRN disable, [1]: 1 = incl. power func, 0 = excl., [8:11]: x^2 scale factor log2
uint16_t rsvd; // padding to 64-bit size
};
// Raw command for convolutional block version 0.
struct dmp_dv_cmdraw_conv_v0 {
struct dmp_dv_cmdraw header; // General structure information
dmp_dv_buf input_buf; // Input buffer
dmp_dv_buf output_buf; // Output buffer
dmp_dv_buf eltwise_buf; // Buffer for elementwise add (0 = UBUF Input Buffer)
uint32_t topo; // [31:0] Output Destination of each run, 0 = UBUF, 1 = EXTMEM
uint16_t w; // Input Width
uint16_t h; // Input Height
uint16_t z; // Input Depth
uint16_t c; // Input Channels
uint16_t input_circular_offset; // Input Depth circular offset
uint16_t output_mode; // 0 = concat, 1 = elementwise add
struct dmp_dv_cmdraw_conv_v0_run run[32]; // description of each run
};
// Raw command for fully connected block version 0.
struct dmp_dv_cmdraw_fc_v0 {
struct dmp_dv_cmdraw header; // General structure information
dmp_dv_buf weight_buf; // Buffer with packed weights
dmp_dv_buf input_buf; // Input buffer
dmp_dv_buf output_buf; // Output buffer
uint16_t input_size; // Size of the input in elements
uint16_t output_size; // Size of the output in elements
uint16_t weight_fmt; // Weights format: 0 = half-float unquantized, 1 = 8-bit quantized
uint16_t actfunc; // Activation Function: 0 = None, 1 = ReLU, 2 = Tanh, 3 = Leaky ReLU, 4 = Sigmoid, 5 = PReLU (PReLU must be used with POST-OP=1)
uint16_t actfunc_param; // Leaky ReLU parameter (in FP16 format), 0 = non-leaky
uint16_t rsvd[3]; // padding to 64-bit size
};
// Raw command for convolutional block version 1.
struct dmp_dv_cmdraw_conv_v1 {
struct dmp_dv_cmdraw header; // General structure information
dmp_dv_buf u8tofp16_table; // u8tofp16 conversion table
uint16_t to_bgr; // flag to convert input to BGR format
uint16_t rsvd[3]; // padding
struct dmp_dv_cmdraw_conv_v0 conv_cmd; // Includes version 0 command
};
int dmp_dv_cmdlist_add_raw(dmp_dv_cmdlist cmdlist, struct dmp_dv_cmdraw *cmd);
Adds raw command to the command list. It is thread-safe only on different command lists.
- cmdlist - Handle to command list, when NULL the error is returned.
- cmd - Raw command for execution. See dmp_dv_cmdraw_v0.h for the description of command version 0. See dmp_dv_cmdraw_v1.h for the description of command version 1.
Returns 0 on success, non-zero otherwise. Known error codes:
- EINVAL - Invalid argument such as structure size.
- ENOTSUP - Raw command version is not supported.
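Several fields of dmp_dv_cmdraw_conv_v0_run pack multiple 8-bit values into one integer, as described in the field comments above. The following sketch shows one way to build such values. pack_pad and pack_xy are hypothetical helper names and are not part of the SDK.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the SDK).
 * Packs four 8-bit paddings into the layout documented for conv_pad and
 * pool_pad: bits [7:0] = left, [15:8] = right, [23:16] = top,
 * [31:24] = bottom. */
static uint32_t pack_pad(uint8_t left, uint8_t right,
                         uint8_t top, uint8_t bottom) {
    return (uint32_t)left
         | ((uint32_t)right << 8)
         | ((uint32_t)top << 16)
         | ((uint32_t)bottom << 24);
}

/* Hypothetical helper (not part of the SDK).
 * Packs an X/width value into bits [7:0] and a Y/height value into
 * bits [15:8], the layout documented for p, conv_stride, conv_dilation,
 * pool_size and pool_stride. */
static uint16_t pack_xy(uint8_t x, uint8_t y) {
    return (uint16_t)((uint32_t)x | ((uint32_t)y << 8));
}
```

For example, a 3x3 convolution with stride 1 and padding 1 on every side would use p = pack_xy(3, 3), conv_stride = pack_xy(1, 1) and conv_pad = pack_pad(1, 1, 1, 1).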
Weights packing
This section contains functions for packing layer weights (Convolutional/Fully Connected) to the hardware-specific layout.
dmp_dv_pack_conv_weights
int dmp_dv_pack_conv_weights(
int n_channels, int kx, int ky, int n_kernels,
const uint16_t quant_map[256],
const void *weights, const uint16_t *bias,
uint8_t *packed_weights, size_t *packed_weights_size);
Packs convolution layer weights and biases into output array. It is thread-safe.
- n_channels - Number of input channels.
- kx - Kernel width.
- ky - Kernel height.
- n_kernels - Number of output channels.
- quant_map - Quantization table for weights (but not bias), 256 elements, can be NULL.
- weights - If quant_map is NULL, array of half precision floating point weights in NCHW format (N=output_channel_size), else array of 1-byte indices.
- bias - Array of half precision floating point biases of size n_kernels.
- packed_weights - Output buffer for packed weights information (can be NULL if packed_weights_size is 0).
- packed_weights_size - On input, contains the size of the packed_weights buffer in bytes (can be 0, in such case it will be filled with the required buffer size), on output will contain the required buffer size.
Returns 0 on success, non-zero otherwise.
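The weights and bias arguments are arrays of half precision values stored as uint16_t, so weights held as 32-bit floats need to be converted first. The sketch below is one possible conversion, assuming IEEE 754 binary32 floats and round-to-nearest-even. float_to_fp16 is a hypothetical helper name, not an SDK function. For brevity it flushes values below the FP16 normal range to zero instead of producing subnormals.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the SDK): converts a 32-bit float to the
 * IEEE 754 half precision bit pattern, rounding to nearest even.
 * Simplification: results below the FP16 normal range become signed zero. */
static uint16_t float_to_fp16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));  /* reinterpret without aliasing issues */

    uint32_t sign = (bits >> 16) & 0x8000u;
    uint32_t exp32 = (bits >> 23) & 0xFFu;
    uint32_t mant = bits & 0x7FFFFFu;

    if (exp32 == 0xFFu) {  /* Inf or NaN */
        return (uint16_t)(sign | (mant ? 0x7E00u : 0x7C00u));
    }

    int32_t exp16 = (int32_t)exp32 - 127 + 15;  /* rebias the exponent */
    if (exp16 >= 31) {
        return (uint16_t)(sign | 0x7C00u);  /* too large: infinity */
    }
    if (exp16 <= 0) {
        return (uint16_t)sign;  /* too small: flush to zero */
    }

    uint32_t half = ((uint32_t)exp16 << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;  /* the 13 discarded mantissa bits */
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) {
        half++;  /* a carry into the exponent gives the correct result */
    }
    return (uint16_t)(sign | half);
}
```

The same conversion could be used for FP16 parameters elsewhere in the API, such as pool_avg_param, where 1/pool_size^2 for a 2x2 pool is float_to_fp16(0.25f).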
dmp_dv_pack_dil_weights
int dmp_dv_pack_dil_weights(
int n_channels, int kx, int ky, int n_kernels,
const uint16_t quant_map[256],
const void *weights, const uint16_t *bias,
uint8_t *packed_weights, size_t *packed_weights_size);
Packs dilated convolution layer weights and biases into output array. It is thread-safe.
- n_channels - Number of input channels.
- kx - Kernel width.
- ky - Kernel height.
- n_kernels - Number of output channels.
- quant_map - Quantization table for weights (but not bias), 256 elements, can be NULL.
- weights - If quant_map is NULL, array of half precision floating point weights in NCHW format (N=output_channel_size), else array of 1-byte indices.
- bias - Array of half precision floating point biases of size n_kernels.
- packed_weights - Output buffer for packed weights information (can be NULL if packed_weights_size is 0).
- packed_weights_size - On input, contains the size of the packed_weights buffer in bytes (can be 0, in such case it will be filled with the required buffer size), on output will contain the required buffer size.
Returns 0 on success, non-zero otherwise.
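The convolution packing functions expect the unpacked weights in NCHW order with N equal to the number of output channels. The sketch below shows how a flat index into such an array could be computed, assuming the conventional row-major NCHW layout in which H corresponds to ky and W to kx. The source text does not spell out that mapping, and the helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the SDK): flat element index of the weight
 * for output channel n, input channel c, kernel row y and kernel column x.
 * Assumes conventional row-major NCHW: N = n_kernels, C = n_channels,
 * H = ky, W = kx. */
static size_t nchw_index(size_t n, size_t c, size_t y, size_t x,
                         size_t n_channels, size_t ky, size_t kx) {
    return ((n * n_channels + c) * ky + y) * kx + x;
}

/* Hypothetical helper: total number of weight elements in the input array. */
static size_t nchw_count(size_t n_kernels, size_t n_channels,
                         size_t ky, size_t kx) {
    return n_kernels * n_channels * ky * kx;
}
```

With quant_map set to NULL each element is a 2-byte half precision value, otherwise each element is a 1-byte index, so the byte size of the input array is nchw_count(...) multiplied by 2 or 1 respectively.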
dmp_dv_pack_fc_weights
int dmp_dv_pack_fc_weights(
int c_input, int h_input, int w_input,
int c_output, int h_output, int w_output,
const uint16_t quant_map[256],
const void *weights, const uint16_t *bias,
uint8_t *packed_weights, size_t *packed_weights_size);
Packs fully connected layer weights and biases into output array possibly rearranging them to match input and output shapes. The function packs weights in NCHW format to the AI processor input format WHC8 (n_channels / 8, width, height, 8 channels) with rearranging to produce output in AI processor format WHC8. It is thread-safe.
- c_input - Number of input channels.
- h_input - Input height (set to 1 for 1D input).
- w_input - Input width (set to 1 for 1D input).
- c_output - Number of output channels.
- h_output - Output height (set to 1 for 1D output).
- w_output - Output width (set to 1 for 1D output).
- quant_map - Quantization table for weights (but not bias), 256 elements, can be NULL.
- weights - If quant_map is NULL, array of half precision floating point weights in NCHW format (N=output_channel_size), else array of 1-byte indices.
- bias - Array of half precision floating point biases of size output_size=c_output*h_output*w_output.
- packed_weights - Output buffer for packed weights information (can be NULL if packed_weights_size is 0).
- packed_weights_size - On input, contains the size of the packed_weights buffer in bytes (can be 0, in such case it will be filled with the required buffer size), on output will contain the required buffer size.
Returns 0 on success, non-zero otherwise.
General information functions
This section contains general information functions.
dmp_dv_get_version_string
const char *dmp_dv_get_version_string();
Returns version string of the driver interface. It is thread-safe.
The returned string starts with HW_MAJOR.HW_MINOR.YYYYMMDD for example "7.0.20181214":
- HW_MAJOR - supported hardware revision major,
- HW_MINOR - supported hardware revision minor,
- YYYYMMDD - release date.
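The three leading components of the version string can be extracted with sscanf, as in the minimal sketch below. parse_version is a hypothetical helper name and is not provided by the SDK. Anything that follows the date in the string is ignored.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the SDK): parses the leading
 * HW_MAJOR.HW_MINOR.YYYYMMDD part of a version string.
 * Returns 0 on success, -1 if the string is NULL or does not match. */
static int parse_version(const char *s, int *hw_major, int *hw_minor,
                         int *date) {
    if (s == NULL) {
        return -1;
    }
    /* %8d limits the date to the eight YYYYMMDD digits. */
    if (sscanf(s, "%d.%d.%8d", hw_major, hw_minor, date) != 3) {
        return -1;
    }
    return 0;
}
```

In real code the input would be the pointer returned by dmp_dv_get_version_string(). For the example string "7.0.20181214" the helper yields major 7, minor 0 and date 20181214.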
dmp_dv_get_last_error_message
const char *dmp_dv_get_last_error_message();
Returns the last error message. It is not thread-safe: the returned string may be garbage if functions fail in several threads at the same time as this function is called.
dmp_dv_fpga_device_exists
int dmp_dv_fpga_device_exists(dmp_dv_context ctx, int dev_type_id);
Checks whether the dedicated FPGA block exists.
- ctx - Context for working with AI processor, when NULL -1 is returned.
- dev_type_id - Device type id. This must be one of the DMP_DV_DEV_* constants.
Returns -1 on invalid argument, 1 if the block exists, 0 otherwise.