Driver interface API - DigitalMediaProfessionals/dv-sdk GitHub Wiki
The complete list of functions and structures can be found in dmp_dv.h and dmp_dv_cmdraw_v0.h.
Context
To work with the device, a context must be created first.
dmp_dv_context is the data type for working with the context.
dmp_dv_context_create
dmp_dv_context dmp_dv_context_create();
Creates context for working with AI processor. It is thread-safe.
Returns Non-NULL on success, NULL on error.
dmp_dv_context_release
void dmp_dv_context_release(dmp_dv_context ctx);
Releases context for working with AI processor (decreases reference counter). Should be called when ctx is no longer needed. It is thread-safe.
ctx
Context for working with AI processor, when NULL it is ignored.
dmp_dv_context_retain
void dmp_dv_context_retain(dmp_dv_context ctx);
Retains context for working with AI processor (increases reference counter). It is thread-safe.
ctx
Context for working with AI processor, when NULL it is ignored.
dmp_dv_context_get_info_string
const char *dmp_dv_context_get_info_string(dmp_dv_context ctx);
Returns information about context as human-readable string. It is thread-safe.
dmp_dv_context_get_info
// Structure with information about the context.
struct dmp_dv_info {
uint32_t size; // size of this structure
uint32_t version; // version of this structure
};
// Structure with information about the context (version 0).
struct dmp_dv_info_v0 {
struct dmp_dv_info header; // general structure information
int32_t ub_size; // unified buffer size
int32_t max_kernel_size; // maximum supported convolutional kernel size
int32_t conv_freq; // convolutional block frequency in MHz
int32_t fc_freq; // fully connected block frequency in MHz
int32_t max_fc_vector_size; // fully connected block maximum input vector size in elements
int32_t rsvd; // padding to 64-bits
};
int dmp_dv_context_get_info(dmp_dv_context ctx, struct dmp_dv_info *info);
Fills the structure with information about the context. On return, the version field is set to the highest supported version that is less than or equal to the requested one; the fields of the corresponding structure are filled only if size is large enough. It is thread-safe.
ctx
Context for working with AI processor, when NULL it is ignored.
info
Structure to be filled, fields size and version must be set.
Returns 0 on success, non-zero otherwise.
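The size/version handshake described above can be sketched as follows. The structure definitions are copied from this page so that the snippet compiles on its own; real code would take them from dmp_dv.h. The helper name init_info_v0 is hypothetical and not part of the driver API.

```c
#include <stdint.h>
#include <string.h>

/* Copied from this page so the sketch is self-contained;
   real code would include dmp_dv.h instead. */
struct dmp_dv_info {
  uint32_t size;     /* size of this structure */
  uint32_t version;  /* version of this structure */
};

struct dmp_dv_info_v0 {
  struct dmp_dv_info header;   /* general structure information */
  int32_t ub_size;             /* unified buffer size */
  int32_t max_kernel_size;     /* maximum supported convolutional kernel size */
  int32_t conv_freq;           /* convolutional block frequency in MHz */
  int32_t fc_freq;             /* fully connected block frequency in MHz */
  int32_t max_fc_vector_size;  /* fully connected block maximum input vector size */
  int32_t rsvd;                /* padding to 64-bits */
};

/* Hypothetical helper (not part of the driver API): zeroes the structure and
   sets the two header fields that the caller must fill in before the query. */
static struct dmp_dv_info *init_info_v0(struct dmp_dv_info_v0 *info) {
  memset(info, 0, sizeof(*info));
  info->header.size = (uint32_t)sizeof(*info);
  info->header.version = 0;
  return &info->header;  /* pointer type taken by dmp_dv_context_get_info() */
}
```

With the driver available, the query would presumably be written as dmp_dv_context_get_info(ctx, init_info_v0(&info)), followed by a check that the return value is 0 and that info.header.version is the version the caller can interpret.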
Memory management
This section contains functions for allocating and working with memory accessible to the device.
dmp_dv_mem
is the data type for memory allocation.
dmp_dv_mem_alloc
dmp_dv_mem dmp_dv_mem_alloc(dmp_dv_context ctx, size_t size);
Allocates a physically contiguous chunk of memory accessible to the device. Memory is allocated using ION with CMA and is not yet mapped to user or kernel address space. It is thread-safe.
ctx
Context for working with AI processor, when NULL the error is returned.
size
Memory size in bytes.
Returns handle for the allocated memory or NULL on error.
dmp_dv_mem_release
void dmp_dv_mem_release(dmp_dv_mem mem);
Releases allocated memory (decreases reference counter). Should be called when mem is no longer needed. dmp_dv_mem_unmap() will be called automatically before the memory is returned to the system. It is thread-safe.
mem
Handle for the allocated memory, when NULL it is ignored.
dmp_dv_mem_retain
void dmp_dv_mem_retain(dmp_dv_mem mem);
Retains allocated memory (increases reference counter). It is thread-safe.
mem
Handle for the allocated memory, when NULL it is ignored.
dmp_dv_mem_map
uint8_t *dmp_dv_mem_map(dmp_dv_mem mem);
Maps previously allocated memory to the user address space. The returned memory can be read or written; the executable flag is not set. If the memory was already mapped, the same pointer will be returned. It is thread-safe only on different memory handles.
mem
Handle to the allocated memory, when NULL the error is returned.
Returns pointer to memory region in user address space or NULL on error.
dmp_dv_mem_unmap
void dmp_dv_mem_unmap(dmp_dv_mem mem);
Unmaps previously allocated and mapped memory from the user address space. Function can be called repeatedly. dmp_dv_mem_sync_end() will be called automatically before unmapping. It is thread-safe only on different memory handles.
mem
Handle to the allocated memory, when NULL the error is returned.
dmp_dv_mem_sync_start
int dmp_dv_mem_sync_start(dmp_dv_mem mem, int rd, int wr);
Starts Device <-> CPU synchronization of the memory buffer. When called multiple times with the same or fewer flags (rd | wr), the function does nothing. It is thread-safe only on different memory handles.
mem
Handle to the allocated memory, when NULL the error is returned.
rd
If non-zero, the Device -> CPU synchronization will occur before this function returns.
wr
If non-zero, the CPU -> Device synchronization will occur on dmp_dv_mem_sync_end().
Returns 0 on success, non-zero otherwise.
dmp_dv_mem_sync_end
int dmp_dv_mem_sync_end(dmp_dv_mem mem);
Finishes the last started Device <-> CPU synchronization. When called a second time before the next call to dmp_dv_mem_sync_start(), the function does nothing. It is thread-safe only on different memory handles.
mem
Handle to the allocated memory, when NULL the error is returned.
Returns 0 on success, non-zero otherwise.
dmp_dv_mem_get_size
size_t dmp_dv_mem_get_size(dmp_dv_mem mem);
Returns allocated size in bytes for the provided memory handle. It is thread-safe.
mem
Handle to the allocated memory, when NULL the function will return 0 and the error message is set.
Returns size in bytes (can be greater than requested in dmp_dv_mem_alloc()) or 0 if mem is NULL.
Command lists
A command list contains several commands for execution, e.g. a chain of convolution operations. It packs the commands into a hardware-specific, ready-to-execute format, allowing low-latency execution of multiple operations at once. dmp_dv_cmdlist is the data type for a command list.
dmp_dv_cmdlist_create
dmp_dv_cmdlist dmp_dv_cmdlist_create(dmp_dv_context ctx);
Creates command list. It is thread-safe.
ctx
Context for working with AI processor, when NULL the error is returned.
Returns handle to command list or NULL on error.
dmp_dv_cmdlist_release
void dmp_dv_cmdlist_release(dmp_dv_cmdlist cmdlist);
Releases the command list (decreases reference counter). Should be called when cmdlist is no longer needed. It is thread-safe.
cmdlist
Handle to command list, when NULL it is ignored.
dmp_dv_cmdlist_retain
void dmp_dv_cmdlist_retain(dmp_dv_cmdlist cmdlist);
Retains the command list (increases reference counter). It is thread-safe.
cmdlist
Handle to command list, when NULL it is ignored.
dmp_dv_cmdlist_commit
int dmp_dv_cmdlist_commit(dmp_dv_cmdlist cmdlist);
Commits the command list, preparing device-specific structures for further execution. It is thread-safe only on different command lists.
cmdlist
Handle to command list, when NULL the error is returned.
Returns 0 on success, non-zero otherwise.
dmp_dv_cmdlist_exec
int64_t dmp_dv_cmdlist_exec(dmp_dv_cmdlist cmdlist);
Schedules command list for execution. Each context is associated with a single execution queue. It is thread-safe.
cmdlist
Handle to command list, when NULL the error is returned.
Returns exec_id >= 0 for this execution on success, < 0 on error.
dmp_dv_cmdlist_wait
int dmp_dv_cmdlist_wait(dmp_dv_cmdlist cmdlist, int64_t exec_id);
Waits for the specific scheduled command to be completed. It is thread-safe.
cmdlist
Handle to command list, when NULL the error is returned.
exec_id
Id of the scheduled command to wait for completion.
Returns 0 on success, non-zero otherwise.
dmp_dv_cmdlist_add_raw
// Memory buffer specification.
typedef struct dmp_dv_buf_impl {
union {
dmp_dv_mem *mem; // memory handle
uint64_t rsvd; // padding to 64-bit size
};
uint64_t offs; // offset from the start of the buffer, must be 16-bit aligned
} dmp_dv_buf;
// Convolutional device type id.
#define DMP_DV_DEV_CONV 1
/// Fully connected device type id.
#define DMP_DV_DEV_FC 2
/// Upper bound of different device type ids.
#define DMP_DV_DEV_COUNT 3
/// Raw command for execution.
struct dmp_dv_cmdraw {
uint32_t size; // size of this structure
uint8_t device_type; // device type
uint8_t version; // version of this structure
uint8_t rsvd[2]; // padding to 64-bit size
};
// Convolution layer runs.
struct dmp_dv_cmdraw_conv_v0_run {
dmp_dv_buf weight_buf; // Buffer with packed weights
uint32_t conv_pad; // Bits [7:0] = left padding, bits [15:8] = right padding, bits [23:16] = top padding, bits [31:24] = bottom padding
uint32_t pool_pad; // Bits [7:0] = left padding, bits [15:8] = right padding, bits [23:16] = top padding, bits [31:24] = bottom padding
uint16_t m; // Number of Output Channels
uint16_t conv_enable; // 1 = Enabled, 0 = Disabled, 3 = Depthwise
uint16_t p; // Filter Size (bits[7:0] = width, bits[15:8] = height)
uint16_t pz; // Filter Depth (1 in case of 2D convolution)
uint16_t conv_stride; // Bits [7:0] = X stride, bits [15:8] = Y stride
uint16_t conv_dilation; // Bits [7:0] = X dilation, bits [15:8] = Y dilation
uint16_t weight_fmt; // Weights format (0 = random access blocks, 1 = compact stream, 3 = 8-bit quantized stream)
uint16_t pool_enable; // 0 = disabled, 1 = max pooling, 2 = average pooling, 4 = upsampling
uint16_t pool_avg_param; // Usually set to 1/pool_size^2 in FP16 when using average pooling (average pooling assumes square size)
uint16_t pool_size; // Bits [7:0] = width, bits [15:8] = height
uint16_t pool_stride; // Bits [7:0] = X stride, bits [15:8] = Y stride
uint16_t actfunc; // Activation Function: 0 = None, 1 = Tanh, 2 = Leaky ReLU, 3 = Sigmoid, 4 = PReLU, 5 = ELU, 6 = ReLU6
uint16_t actfunc_param; // Leaky ReLU parameter in FP16
uint16_t rectifi_en; // Rectification, i.e. max(0, x) (NOTE: Can be applied after non-ReLU activation function)
uint16_t lrn; // Bits [0]: 1 = LRN enable, 0 = LRN disable, [1]: 1 = incl. power func, 0 = excl., [8:11]: x^2 scale factor log2
uint16_t rsvd; // padding to 64-bit size
};
// Raw command for convolutional block version 0.
struct dmp_dv_cmdraw_conv_v0 {
struct dmp_dv_cmdraw header; // General structure information
dmp_dv_buf input_buf; // Input buffer
dmp_dv_buf output_buf; // Output buffer
dmp_dv_buf eltwise_buf; // Buffer for elementwise add (0 = UBUF Input Buffer)
uint32_t topo; // [31:0] Output Destination of each run, 0 = UBUF, 1 = EXTMEM
uint16_t w; // Input Width
uint16_t h; // Input Height
uint16_t z; // Input Depth
uint16_t c; // Input Channels
uint16_t input_circular_offset; // Input Depth circular offset
uint16_t output_mode; // 0 = concat, 1 = elementwise add
struct dmp_dv_cmdraw_conv_v0_run run[32]; // description of each run
};
// Raw command for fully connected block version 0.
struct dmp_dv_cmdraw_fc_v0 {
struct dmp_dv_cmdraw header; // General structure information
dmp_dv_buf weight_buf; // Buffer with packed weights
dmp_dv_buf input_buf; // Input buffer
dmp_dv_buf output_buf; // Output buffer
uint16_t input_size; // Size of the input in elements
uint16_t output_size; // Size of the output in elements
uint16_t weight_fmt; // Weights format: 0 = half-float unquantized, 1 = 8-bit quantized
uint16_t actfunc; // Activation Function: 0 = None, 1 = ReLU, 2 = Tanh, 3 = Leaky ReLU, 4 = Sigmoid, 5 = PReLU (PReLU must be used with POST-OP=1)
uint16_t actfunc_param; // Leaky ReLU parameter (in FP16 format), 0 = non-leaky
uint16_t rsvd[3]; // padding to 64-bit size
};
// Raw command for convolutional block version 1.
struct dmp_dv_cmdraw_conv_v1 {
struct dmp_dv_cmdraw header; // General structure information
dmp_dv_buf u8tofp16_table; // u8tofp16 conversion table
uint16_t to_bgr; // flag to convert input to BGR format
uint16_t rsvd[3]; // padding
struct dmp_dv_cmdraw_conv_v0 conv_cmd; // Includes version 0 command
};
int dmp_dv_cmdlist_add_raw(dmp_dv_cmdlist cmdlist, struct dmp_dv_cmdraw *cmd);
Adds raw command to the command list. It is thread-safe only on different command lists.
cmdlist
Handle to command list, when NULL the error is returned.
cmd
Raw command for execution. See dmp_dv_cmdraw_v0.h for the description of command version 0. See dmp_dv_cmdraw_v1.h for the description of command version 1.
Returns 0 on success, non-zero otherwise, known error codes:
EINVAL
Invalid argument such as structure size.
ENOTSUP
Raw command version is not supported.
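Several fields of the run structure pack two or four 8-bit values into one integer. A minimal sketch of helpers that build such fields from the bit layouts given in the structure comments follows; pack_pad and pack_xy are hypothetical names and are not part of the driver API.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of the driver API) that follow the bit
   layouts documented in the comments of dmp_dv_cmdraw_conv_v0_run. */

/* conv_pad / pool_pad:
   bits [7:0] = left, [15:8] = right, [23:16] = top, [31:24] = bottom. */
static uint32_t pack_pad(uint8_t left, uint8_t right,
                         uint8_t top, uint8_t bottom) {
  return (uint32_t)left | ((uint32_t)right << 8) |
         ((uint32_t)top << 16) | ((uint32_t)bottom << 24);
}

/* p, conv_stride, conv_dilation, pool_size, pool_stride:
   bits [7:0] = width or X component, bits [15:8] = height or Y component. */
static uint16_t pack_xy(uint8_t x, uint8_t y) {
  return (uint16_t)((unsigned)x | ((unsigned)y << 8));
}
```

For a 3x3 convolution with stride 1 and one pixel of padding on every side, a run could then be filled with run.p = pack_xy(3, 3), run.conv_stride = pack_xy(1, 1) and run.conv_pad = pack_pad(1, 1, 1, 1).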
Weights packing
This section contains functions for packing layer weights (Convolutional/Fully Connected) to the hardware-specific layout.
dmp_dv_pack_conv_weights
int dmp_dv_pack_conv_weights(
int n_channels, int kx, int ky, int n_kernels,
const uint16_t quant_map[256],
const void *weights, const uint16_t *bias,
uint8_t *packed_weights, size_t *packed_weights_size);
Packs convolution layer weights and biases into output array. It is thread-safe.
n_channels
Number of input channels.
kx
Kernel width.
ky
Kernel height.
n_kernels
Number of output channels.
quant_map
Quantization table for weights (but not bias), 256 elements, can be NULL.
weights
If quant_map is NULL, array of half precision floating point weights in NCHW format (N=output_channel_size), else array of 1-byte indices.
bias
Array of half precision floating point biases of size n_kernels.
packed_weights
Output buffer for packed weights information (can be NULL if packed_weights_size is 0).
packed_weights_size
On input, contains the size of the packed_weights buffer in bytes (can be 0, in such case it will be filled with the required buffer size), on output will contain the required buffer size.
Returns 0 on success, non-zero otherwise.
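The weights and biases above are passed as half precision values stored in uint16_t, and standard C has no portable half type. The following is a minimal sketch of a float to half conversion with round-to-nearest-even, assuming that "half precision" here means IEEE 754 binary16 and that float is IEEE 754 binary32. The name float_to_half is hypothetical and not part of the driver API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the driver API): converts an IEEE 754
   binary32 value to binary16 bits using round-to-nearest-even. */
static uint16_t float_to_half(float f) {
  uint32_t x;
  memcpy(&x, &f, sizeof(x));  /* reinterpret the float bits safely */

  uint32_t sign = (x >> 16) & 0x8000u;
  uint32_t exp = (x >> 23) & 0xFFu;
  uint32_t man = x & 0x7FFFFFu;

  if (exp == 0xFFu) {  /* Inf or NaN: keep NaN-ness with a quiet bit */
    return (uint16_t)(sign | 0x7C00u | (man ? 0x0200u : 0u));
  }

  int32_t e = (int32_t)exp - 127 + 15;  /* rebias the exponent for binary16 */
  if (e >= 31) {
    return (uint16_t)(sign | 0x7C00u);  /* too large: becomes infinity */
  }
  if (e <= 0) {
    if (e < -10) {
      return (uint16_t)sign;  /* too small even for a subnormal: signed zero */
    }
    man |= 0x800000u;  /* make the implicit leading 1 explicit */
    uint32_t shift = (uint32_t)(14 - e);
    uint32_t half_man = man >> shift;
    uint32_t rem = man & ((1u << shift) - 1u);
    uint32_t halfway = 1u << (shift - 1u);
    if (rem > halfway || (rem == halfway && (half_man & 1u))) {
      half_man++;  /* may round up into the smallest normal number */
    }
    return (uint16_t)(sign | half_man);
  }

  uint32_t half = ((uint32_t)e << 10) | (man >> 13);
  uint32_t rem = man & 0x1FFFu;
  if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) {
    half++;  /* a carry into the exponent is the correct rounding result */
  }
  return (uint16_t)(sign | half);
}
```

The same conversion could be used for other FP16 fields on this page, for instance float_to_half(1.0f / 9.0f) as pool_avg_param for 3x3 average pooling.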
dmp_dv_pack_dil_weights
int dmp_dv_pack_dil_weights(
int n_channels, int kx, int ky, int n_kernels,
const uint16_t quant_map[256],
const void *weights, const uint16_t *bias,
uint8_t *packed_weights, size_t *packed_weights_size);
Packs dilated convolution layer weights and biases into output array. It is thread-safe.
n_channels
Number of input channels.
kx
Kernel width.
ky
Kernel height.
n_kernels
Number of output channels.
quant_map
Quantization table for weights (but not bias), 256 elements, can be NULL.
weights
If quant_map is NULL, array of half precision floating point weights in NCHW format (N=output_channel_size), else array of 1-byte indices.
bias
Array of half precision floating point biases of size n_kernels.
packed_weights
Output buffer for packed weights information (can be NULL if packed_weights_size is 0).
packed_weights_size
On input, contains the size of the packed_weights buffer in bytes (can be 0, in such case it will be filled with the required buffer size), on output will contain the required buffer size.
Returns 0 on success, non-zero otherwise.
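Both convolution packing functions share the same signature and the same size-query convention: a first call with a size of 0 reports the required buffer size, and a second call performs the packing. The sketch below wraps that two-call pattern behind a function pointer so that it runs without the driver. pack_alloc and fake_pack are hypothetical names; fake_pack is only a stand-in that copies its inputs and does not reproduce the real hardware layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Signature shared by dmp_dv_pack_conv_weights() and dmp_dv_pack_dil_weights(). */
typedef int (*pack_fn)(int n_channels, int kx, int ky, int n_kernels,
                       const uint16_t quant_map[256],
                       const void *weights, const uint16_t *bias,
                       uint8_t *packed_weights, size_t *packed_weights_size);

/* Hypothetical helper: queries the size, allocates, then packs.
   Returns a malloc'ed buffer (caller frees) or NULL on any failure. */
static uint8_t *pack_alloc(pack_fn pack, int n_channels, int kx, int ky,
                           int n_kernels, const uint16_t quant_map[256],
                           const void *weights, const uint16_t *bias,
                           size_t *out_size) {
  size_t size = 0;
  /* First call: size is 0 and no buffer is given, only the size is reported. */
  if (pack(n_channels, kx, ky, n_kernels, quant_map, weights, bias,
           NULL, &size) != 0 || size == 0) {
    return NULL;
  }
  uint8_t *buf = malloc(size);
  if (buf == NULL) {
    return NULL;
  }
  /* Second call: the buffer is large enough, so the packing takes place. */
  if (pack(n_channels, kx, ky, n_kernels, quant_map, weights, bias,
           buf, &size) != 0) {
    free(buf);
    return NULL;
  }
  *out_size = size;
  return buf;
}

/* Stand-in packer so the sketch runs without the driver. It copies the FP16
   weights followed by the biases; this is NOT the real packed layout. */
static int fake_pack(int n_channels, int kx, int ky, int n_kernels,
                     const uint16_t quant_map[256],
                     const void *weights, const uint16_t *bias,
                     uint8_t *packed_weights, size_t *packed_weights_size) {
  size_t n_w = (size_t)n_channels * (size_t)kx * (size_t)ky * (size_t)n_kernels;
  size_t w_bytes = n_w * sizeof(uint16_t);
  size_t b_bytes = (size_t)n_kernels * sizeof(uint16_t);
  (void)quant_map;  /* quantization is ignored by the stand-in */
  if (*packed_weights_size == 0) {
    *packed_weights_size = w_bytes + b_bytes;  /* size query only */
    return 0;
  }
  if (packed_weights == NULL || *packed_weights_size < w_bytes + b_bytes) {
    return -1;
  }
  memcpy(packed_weights, weights, w_bytes);
  memcpy(packed_weights + w_bytes, bias, b_bytes);
  *packed_weights_size = w_bytes + b_bytes;
  return 0;
}
```

With the driver linked in, dmp_dv_pack_conv_weights or dmp_dv_pack_dil_weights could be passed in place of fake_pack. The destination does not have to come from malloc; a mapped device buffer of sufficient size would follow the same two-step flow.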
dmp_dv_pack_fc_weights
int dmp_dv_pack_fc_weights(
int c_input, int h_input, int w_input,
int c_output, int h_output, int w_output,
const uint16_t quant_map[256],
const void *weights, const uint16_t *bias,
uint8_t *packed_weights, size_t *packed_weights_size);
Packs fully connected layer weights and biases into output array possibly rearranging them to match input and output shapes. The function packs weights in NCHW format to the AI processor input format WHC8 (n_channels / 8, width, height, 8 channels) with rearranging to produce output in AI processor format WHC8. It is thread-safe.
c_input
Number of input channels.
h_input
Input height (set to 1 for 1D input).
w_input
Input width (set to 1 for 1D input).
c_output
Number of output channels.
h_output
Output height (set to 1 for 1D output).
w_output
Output width (set to 1 for 1D output).
quant_map
Quantization table for weights (but not bias), 256 elements, can be NULL.
weights
If quant_map is NULL, array of half precision floating point weights in NCHW format (N=output_channel_size), else array of 1-byte indices.
bias
Array of half precision floating point biases of size output_size=c_output*h_output*w_output.
packed_weights
Output buffer for packed weights information (can be NULL if packed_weights_size is 0).
packed_weights_size
On input, contains the size of the packed_weights buffer in bytes (can be 0, in such case it will be filled with the required buffer size), on output will contain the required buffer size.
Returns 0 on success, non-zero otherwise.
General information functions
This section contains general information functions.
dmp_dv_get_version_string
const char *dmp_dv_get_version_string();
Returns version string of the driver interface. It is thread-safe.
The returned string starts with HW_MAJOR.HW_MINOR.YYYYMMDD, for example "7.0.20181214":
- HW_MAJOR - supported hardware revision major,
- HW_MINOR - supported hardware revision minor,
- YYYYMMDD - release date.
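The three leading components of the version string can be extracted with sscanf. This is a minimal sketch; parse_version is a hypothetical helper name and not part of the driver API, and anything following the date in the string is ignored.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the driver API): parses the leading
   HW_MAJOR.HW_MINOR.YYYYMMDD part of a version string.
   Returns 0 on success, -1 if the string does not start with that pattern. */
static int parse_version(const char *s, int *hw_major, int *hw_minor,
                         int *date) {
  if (s == NULL) {
    return -1;
  }
  /* %8d limits the date to the eight YYYYMMDD digits. */
  return sscanf(s, "%d.%d.%8d", hw_major, hw_minor, date) == 3 ? 0 : -1;
}
```

A caller would pass the result of dmp_dv_get_version_string() and could, for example, compare hw_major against the hardware revision it was built for.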
dmp_dv_get_last_error_message
const char *dmp_dv_get_last_error_message();
Returns the last error message. It may return garbage if several functions fail in different threads while this function is being called.
dmp_dv_fpga_device_exists
int dmp_dv_fpga_device_exists(dmp_dv_context ctx, int dev_type_id);
Returns whether the dedicated FPGA block exists.
ctx
Context for working with AI processor, when NULL -1 is returned.
dev_type_id
Device type id. This must be one of the DMP_DV_DEV_* device type ids.
Returns -1 on invalid argument, 1 if the block exists, 0 otherwise.
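Because the function has three possible results, using its return value directly as a boolean would treat -1 (invalid argument) as "exists". A small sketch of an explicit three-way check follows; describe_device_exists is a hypothetical helper name and not part of the driver API.

```c
/* Hypothetical helper (not part of the driver API): maps the documented
   return values of dmp_dv_fpga_device_exists() to a description. */
static const char *describe_device_exists(int rc) {
  switch (rc) {
    case 1:
      return "present";
    case 0:
      return "absent";
    case -1:
      return "invalid argument";
    default:
      return "unexpected";  /* value not covered by the documentation above */
  }
}
```

With the driver available, it would presumably be called as describe_device_exists(dmp_dv_fpga_device_exists(ctx, DMP_DV_DEV_CONV)).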