Runtime custom layer implementation - DigitalMediaProfessionals/dv-sdk GitHub Wiki
As the final step, implement the custom layer callback function that the FPGA network utility class calls at runtime.
In the network definition source code generated by the conversion tool, the data structure and callback function prototype for each custom layer type are generated in the header file, and the parameters and layer setup for each custom layer are generated in the source file.
The custom layer data structure and callback function prototype look like this in the header for each custom layer type:
struct custom_param_type1
{
    ...
};
void custom_callback_type1(fpga_layer &layer, void *custom_param);
...
The parameters of these layers are generated in the source file and look like this:
void CSSDFacePerson::Layer_XXX()
{
    static custom_param_type1 custom_param = {
        ...
    };
    struct fpga_layer& layer = get_layer(XXX);
    layer.type = LT_CUSTOM;
    ...
    layer.input_dim[0] = XXX;
    layer.input_dim[1] = XXX;
    layer.input_dim[2] = XXX;
    layer.input_dim_size = 3;
    layer.output_dim[0] = XXX;
    layer.output_dim[1] = XXX;
    layer.output_dim[2] = XXX;
    layer.output_dim_size = 3;
    layer.is_output = false;
    layer.is_f32_output = true;
    layer.is_input_hw_layout = true;
    layer.custom_proc_ptr = &custom_callback_type1;
    layer.custom_param = &custom_param;
}//end of Layer_XXX
When this layer is executed at runtime, the FPGA network utility class calls the callback function custom_callback_type1 and passes these parameters to it. Inside the callback, information about the layer is available through the layer parameter and the custom_param parameter.
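The dispatch described above can be sketched as follows. This is only an illustration under stated assumptions: the fpga_layer struct here is a reduced stand-in containing just the fields the sketch touches, and custom_param_scale, element_count, describe and run_custom_layer are hypothetical names that do not come from the SDK or the conversion tool.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Reduced stand-in for the SDK's fpga_layer (assumption: only these fields).
struct fpga_layer {
    int input_dim[3];
    int input_dim_size;
    void (*custom_proc_ptr)(fpga_layer &layer, void *custom_param);
    void *custom_param;
};

// Hypothetical parameter struct, playing the role of custom_param_type1.
struct custom_param_scale {
    float scale;
};

// Number of elements described by a dimension array.
int element_count(const int *dim, int dim_size) {
    assert(dim_size > 0);
    int n = 1;
    for (int i = 0; i < dim_size; i++) n *= dim[i];
    return n;
}

// Summarize what the callback can learn from its two arguments.
std::string describe(const fpga_layer &layer, const custom_param_scale &p) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "inputs=%d scale=%.2f",
                  element_count(layer.input_dim, layer.input_dim_size), p.scale);
    return buf;
}

// The callback recovers its typed parameters from the void pointer.
void custom_callback_scale(fpga_layer &layer, void *custom_param) {
    auto *p = static_cast<custom_param_scale *>(custom_param);
    std::puts(describe(layer, *p).c_str());
}

// Assumed shape of the runtime's dispatch for a custom layer.
void run_custom_layer(fpga_layer &layer) {
    layer.custom_proc_ptr(layer, layer.custom_param);
}
```

Because the generated code stores the address of a static parameter struct in layer.custom_param, the cast inside the callback must name the same struct type that the generated Layer_XXX function used.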
While implementing the callback function, in order to read the input data from the input buffer and write output data to the output buffer, the following functions are provided:
void get_layer_input(fpga_layer& layer, std::vector<float>& layer_input, uint8_t *io_ptr);
void put_layer_output(fpga_layer& layer, std::vector<float>& layer_output, uint8_t *io_ptr, bool is_output_hw_layout = false);
get_layer_input reads the input data for the given layer and returns the result in layer_input.
put_layer_output writes the data in layer_output to the output buffer of the given layer. If is_output_hw_layout is true, the data is converted back to 16-bit floating-point numbers and rearranged into the hardware layout. This parameter must be set to true if the output will be used as the input buffer of another hardware convolution or fully connected layer.
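A typical callback reads its input, computes, and writes its output. The sketch below shows that pattern with stubbed versions of the two helpers so that it compiles on its own. The stubs keep the documented signatures but simply copy vectors, whereas the real functions work on the buffer behind io_ptr. The in_buf, out_buf and out_hw_layout fields, the custom_param_scale struct and the file-scope g_io_ptr are all hypothetical. How a real callback obtains io_ptr is not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduced stand-in for fpga_layer; the buffers are illustrative only.
struct fpga_layer {
    std::vector<float> in_buf;   // pretend input buffer
    std::vector<float> out_buf;  // pretend output buffer
    bool out_hw_layout = false;  // records the flag passed on output
};

// Stub with the documented signature (the real one reads via io_ptr).
void get_layer_input(fpga_layer& layer, std::vector<float>& layer_input,
                     uint8_t *io_ptr) {
    (void)io_ptr;
    layer_input = layer.in_buf;
}

// Stub with the documented signature (the real one converts and stores).
void put_layer_output(fpga_layer& layer, std::vector<float>& layer_output,
                      uint8_t *io_ptr, bool is_output_hw_layout = false) {
    (void)io_ptr;
    layer.out_buf = layer_output;
    layer.out_hw_layout = is_output_hw_layout;
}

// Hypothetical parameters for an element-wise scaling layer.
struct custom_param_scale {
    float scale;
};

// Assumption: the I/O pointer is reachable from the callback somehow.
static uint8_t *g_io_ptr = nullptr;

void custom_callback_scale(fpga_layer &layer, void *custom_param) {
    auto *p = static_cast<custom_param_scale *>(custom_param);

    std::vector<float> in;
    get_layer_input(layer, in, g_io_ptr);

    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); i++) out[i] = in[i] * p->scale;

    // true here assumes the next layer is a hardware convolution or
    // fully connected layer that expects the hardware layout.
    put_layer_output(layer, out, g_io_ptr, true);
}
```

If the next consumer were another software layer or the network output, the last argument would stay at its default of false.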
As an example, for a PriorBox layer the conversion tool generates the parameter structure and the prototype of the callback function in the header as follows:
struct custom_param_PriorBox
{
    int img_size[2];
    float min_size;
    float max_size;
    float aspect_ratios[6];
    float variances[4];
    bool clip;
};
void custom_callback_PriorBox(fpga_layer &layer, void *custom_param);
The config parameters for these layers are generated in the source file as follows:
//Layer_64: Custom Layer
// ->: conv5_mbox_priorbox
void CSSDFacePerson::Layer_64()
{
    static custom_param_PriorBox custom_param = {
        { 300, 300, },
        10.0,
        30.0,
        { 1.0, 1.0, 2, 0.5, 3, 0.3333333333333333, },
        { 0.1, 0.1, 0.2, 0.2, },
        true,
    };
    struct fpga_layer& layer = get_layer(64);
    layer.type = LT_CUSTOM;
    ...
    layer.input_dim[0] = 38;
    layer.input_dim[1] = 38;
    layer.input_dim[2] = 256;
    layer.input_dim_size = 3;
    layer.output_dim[0] = 8664;
    layer.output_dim[1] = 8;
    layer.output_dim_size = 2;
    layer.is_output = false;
    layer.is_f32_output = true;
    layer.is_input_hw_layout = true;
    layer.custom_proc_ptr = &custom_callback_PriorBox;
    layer.custom_param = &custom_param;
}//end of Layer_64
As one can see, the parameters and their data structure are generated automatically by the tool.
The only thing left to do is to implement the custom_callback_PriorBox function, which may look like this:
#include <cassert>
#include <cmath>
#include <cstring>
#include <vector>
using std::vector;

// Clamp a normalized coordinate to the range [0, 1].
inline float box_clip(float x) {
    if (x < 0.f)
        return 0.f;
    else if (x > 1.f)
        return 1.f;
    else
        return x;
}

void custom_callback_PriorBox(fpga_layer &layer, void *param) {
    custom_param_PriorBox *box_param = reinterpret_cast<custom_param_PriorBox*>(param);
    vector<float> boxes_v(layer.output_dim[0] * layer.output_dim[1]);
    // prior_box_t is not shown on this page; it is assumed to hold eight floats
    // (x0, y0, x1, y1, xv, yv, wv, hv), which matches the stride of 8 used below.
    prior_box_t box;
    int box_count = 0;

    // Box widths and heights (half sizes, in image pixels)
    vector<float> box_widths_v;
    vector<float> box_heights_v;
    for (auto ar : box_param->aspect_ratios) {
        if ((ar == 1.0) && box_widths_v.empty()) {
            float sz = 0.5 * box_param->min_size;
            box_widths_v.push_back(sz);
            box_heights_v.push_back(sz);
        } else if ((ar == 1.0) && (box_widths_v.size() > 0)) {
            float sz = 0.5 * sqrt(box_param->min_size * box_param->max_size);
            box_widths_v.push_back(sz);
            box_heights_v.push_back(sz);
        } else if (ar != 1.0) {
            box_widths_v.push_back(0.5 * box_param->min_size * sqrt(ar));
            box_heights_v.push_back(0.5 * box_param->min_size / sqrt(ar));
        }
    }

    // Variances: one shared value or one per coordinate
    int num_variances = sizeof(box_param->variances) / sizeof(box_param->variances[0]);
    assert((num_variances == 1 || num_variances == 4) && "Number of variances must be either 1 or 4.");
    if (num_variances == 1) {
        box.xv = box_param->variances[0];
        box.yv = box_param->variances[0];
        box.wv = box_param->variances[0];
        box.hv = box_param->variances[0];
    } else {
        box.xv = box_param->variances[0];
        box.yv = box_param->variances[1];
        box.wv = box_param->variances[2];
        box.hv = box_param->variances[3];
    }

    // Grid: one set of boxes centered on every input cell
    float step_x = float(box_param->img_size[0]) / float(layer.input_dim[0]);
    float step_y = float(box_param->img_size[1]) / float(layer.input_dim[1]);
    for (int y = 0; y < layer.input_dim[1]; y++) {
        float center_y = step_y * (float(y) + 0.5);
        for (int x = 0; x < layer.input_dim[0]; x++) {
            float center_x = step_x * (float(x) + 0.5);
            for (unsigned int p = 0; p < box_widths_v.size(); p++) {
                // Corner coordinates normalized by the image size
                box.x0 = (center_x - box_widths_v[p]) / float(box_param->img_size[0]);
                box.y0 = (center_y - box_heights_v[p]) / float(box_param->img_size[1]);
                box.x1 = (center_x + box_widths_v[p]) / float(box_param->img_size[0]);
                box.y1 = (center_y + box_heights_v[p]) / float(box_param->img_size[1]);
                if (box_param->clip) {
                    box.x0 = box_clip(box.x0);
                    box.y0 = box_clip(box.y0);
                    box.x1 = box_clip(box.x1);
                    box.y1 = box_clip(box.y1);
                }
                memcpy(&boxes_v[box_count * 8], &box, sizeof(box));
                box_count++;
            }
        }
    }
    put_layer_output(layer, boxes_v);
}
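To make the first loop concrete, the following sketch reproduces only the half-size computation and evaluates it for the Layer_64 parameters (min_size 10, max_size 30, six aspect ratios). The names box_half_size, prior_box_half_sizes and prior_box_count are hypothetical helpers introduced for this illustration and are not part of the generated code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Half width and half height of one prior box, in image pixels.
struct box_half_size {
    float w;
    float h;
};

// Same branching as the callback's first loop:
//  - first ratio of 1: square of side min_size
//  - later ratio of 1: square of side sqrt(min_size * max_size)
//  - other ratios: min_size stretched by sqrt(ar) and 1/sqrt(ar)
std::vector<box_half_size> prior_box_half_sizes(
        const std::vector<float>& aspect_ratios, float min_size, float max_size) {
    std::vector<box_half_size> out;
    for (float ar : aspect_ratios) {
        if (ar == 1.0f && out.empty()) {
            out.push_back({0.5f * min_size, 0.5f * min_size});
        } else if (ar == 1.0f) {
            float s = 0.5f * std::sqrt(min_size * max_size);
            out.push_back({s, s});
        } else {
            out.push_back({0.5f * min_size * std::sqrt(ar),
                           0.5f * min_size / std::sqrt(ar)});
        }
    }
    return out;
}

// Total number of boxes produced for a grid of cells.
int prior_box_count(int grid_w, int grid_h, int boxes_per_cell) {
    assert(grid_w > 0 && grid_h > 0 && boxes_per_cell > 0);
    return grid_w * grid_h * boxes_per_cell;
}
```

With six aspect ratios there are six boxes per cell, so the 38 x 38 grid gives 38 * 38 * 6 = 8664 boxes of 8 floats each, which agrees with output_dim[0] = 8664 and output_dim[1] = 8 in the generated Layer_64.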
Note: the implementation above does not call get_layer_input to read input data, because the result of the PriorBox layer depends only on its parameters. In such cases, one may opt into the optimization of running the layer only once, on the first execution.
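One possible way to get the run-once behavior inside the callback itself is sketched below. This is an assumption about how the optimization could be done on the user side, not a description of an SDK feature, and it is only valid if the layer's output buffer is preserved between network runs. The reduced fpga_layer, compute_constant_output, g_compute_calls and custom_callback_run_once are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Reduced stand-in for fpga_layer; out_buf is illustrative only.
struct fpga_layer {
    int output_dim[3];
    int output_dim_size;
    std::vector<float> out_buf;
};

// Counts how often the expensive computation actually runs.
static int g_compute_calls = 0;

// Placeholder for an input-independent computation such as PriorBox.
std::vector<float> compute_constant_output(const fpga_layer &layer) {
    assert(layer.output_dim_size > 0);
    ++g_compute_calls;
    std::size_t n = 1;
    for (int i = 0; i < layer.output_dim_size; i++) n *= layer.output_dim[i];
    return std::vector<float>(n, 0.0f);
}

void custom_callback_run_once(fpga_layer &layer, void *custom_param) {
    (void)custom_param;
    // Track completion per layer, since several layers may share one callback.
    static std::unordered_set<const fpga_layer *> done;
    if (!done.insert(&layer).second)
        return;  // already computed; assumes the output buffer is still intact
    layer.out_buf = compute_constant_output(layer);
}
```

Keying the set by layer address rather than using a single static flag matters when a network contains more than one layer of the same custom type, as each one needs its own first run.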