Runtime custom layer implementation - DigitalMediaProfessionals/dv-sdk GitHub Wiki

As the final step, one should implement the custom layer callback function, which the FPGA network utility class calls at runtime.

In the network definition source code generated by the conversion tool, the data structure and callback function prototype for each custom layer type are generated in the header file, and the parameters and layer configuration for each custom layer are generated in the source file.

The custom layer data structure and callback function prototype will look like this in the header for each custom layer type:

struct custom_param_type1
{
	...
};

void custom_callback_type1(fpga_layer &layer, void *custom_param);
...

The parameters of these layers are generated in the source file and look like this:

void CSSDFacePerson::Layer_XXX()
{
  static custom_param_type1 custom_param = {
    ...
  };

  struct fpga_layer& layer = get_layer(XXX);
  layer.type = LT_CUSTOM;
    ...
  layer.input_dim[0] = XXX;
  layer.input_dim[1] = XXX;
  layer.input_dim[2] = XXX;
  layer.input_dim_size = 3;
  layer.output_dim[0] = XXX;
  layer.output_dim[1] = XXX;
  layer.output_dim[2] = XXX;
  layer.output_dim_size = 3;
  layer.is_output = false;
  layer.is_f32_output = true;
  layer.is_input_hw_layout = true;
  layer.custom_proc_ptr = &custom_callback_type1;
  layer.custom_param = &custom_param;
}//end of  Layer_XXX

When this layer is executed at runtime, the FPGA network utility class calls the callback function custom_callback_type1 and passes the layer and its custom parameters to it. In the callback function, one can obtain information about the layer from the layer and custom_param parameters.
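
As an illustration of how a callback can read this information, here is a minimal sketch. The fpga_layer struct below is a reduced stand-in that contains only fields shown on this page (the real definition comes from the SDK headers, and the array sizes are assumptions), and custom_param_example, element_count and custom_callback_example are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Stand-in for the SDK's fpga_layer, reduced to fields shown on this page.
// Assumption: the real struct has more members and may use other array sizes.
struct fpga_layer {
  int input_dim[3];
  int input_dim_size;
  int output_dim[3];
  int output_dim_size;
  void *custom_param;
};

// Hypothetical parameter struct, standing in for a generated custom_param_type1.
struct custom_param_example {
  float scale;
};

// Number of elements described by the first `size` entries of `dim`.
std::size_t element_count(const int *dim, int size) {
  std::size_t n = 1;
  for (int i = 0; i < size; i++)
    n *= static_cast<std::size_t>(dim[i]);
  return n;
}

// Callback with the same signature as the generated prototype.
void custom_callback_example(fpga_layer &layer, void *custom_param) {
  assert(custom_param != nullptr);
  // The void pointer is the address of the static parameter struct
  // assigned to layer.custom_param in the generated source file.
  auto *param = static_cast<custom_param_example*>(custom_param);

  std::size_t in_count = element_count(layer.input_dim, layer.input_dim_size);
  std::size_t out_count = element_count(layer.output_dim, layer.output_dim_size);
  std::printf("inputs: %zu, outputs: %zu, scale: %f\n",
              in_count, out_count, static_cast<double>(param->scale));
}
```

The element counts computed this way are the natural sizes for the float buffers that the callback reads and writes.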

Custom layer I/O

While implementing the callback function, in order to read the input data from the input buffer and write output data to the output buffer, the following functions are provided:

void get_layer_input(fpga_layer& layer, std::vector<float>& layer_input, uint8_t *io_ptr);
void put_layer_output(fpga_layer& layer, std::vector<float>& layer_output, uint8_t *io_ptr, bool is_output_hw_layout = false);

get_layer_input reads the input data for the given layer and returns the result in layer_input.

put_layer_output writes the output data in layer_output to the given layer. If is_output_hw_layout is true, it converts the buffer back to 16-bit floating-point numbers and to the hardware layout. If the output is going to be used as the input buffer of other hardware convolution or fully connected layers, this parameter needs to be set to true.
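
Between these two calls, the callback works on plain float vectors. The following sketch shows only that compute step for a hypothetical element-wise layer. The names custom_param_LeakyReLU and leaky_relu are assumptions made for illustration and are not generated by the tool or provided by the SDK. In a real callback, the input vector would be filled by get_layer_input and the returned vector would be passed to put_layer_output.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical parameter struct for an element-wise custom layer.
struct custom_param_LeakyReLU {
  float negative_slope;
};

// Pure compute step: `layer_input` plays the role of the buffer filled by
// get_layer_input, and the return value is what would be handed to
// put_layer_output.
std::vector<float> leaky_relu(const std::vector<float> &layer_input,
                              const custom_param_LeakyReLU &param) {
  std::vector<float> layer_output(layer_input.size());
  for (std::size_t i = 0; i < layer_input.size(); i++) {
    float x = layer_input[i];
    // Positive values pass through; the rest are scaled by the slope.
    layer_output[i] = (x > 0.f) ? x : x * param.negative_slope;
  }
  return layer_output;
}
```

Keeping the computation separate from the buffer I/O in this way also makes it possible to test the layer logic on a host machine without the FPGA.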

Example: PriorBox layer runtime implementation

The conversion tool generates the parameters of PriorBox and the prototype of the callback function in the header as follows:

struct custom_param_PriorBox
{
  int   img_size[2];
  float min_size;
  float max_size;
  float aspect_ratios[6];
  float variances[4];
  bool  clip;
};

void custom_callback_PriorBox(fpga_layer &layer, void *custom_param);

The config parameters for these layers are generated in the source file as follows:

//Layer_64: Custom Layer
//	->: conv5_mbox_priorbox
void CSSDFacePerson::Layer_64()
{
  static custom_param_PriorBox custom_param = {
    { 300, 300,  },
    10.0,
    30.0,
    { 1.0, 1.0, 2, 0.5, 3, 0.3333333333333333,  },
    { 0.1, 0.1, 0.2, 0.2,  },
    true,
  };

  struct fpga_layer& layer = get_layer(64);
  layer.type = LT_CUSTOM;
    ...
  layer.input_dim[0] = 38;
  layer.input_dim[1] = 38;
  layer.input_dim[2] = 256;
  layer.input_dim_size = 3;
  layer.output_dim[0] = 8664;
  layer.output_dim[1] = 8;
  layer.output_dim_size = 2;
  layer.is_output = false;
  layer.is_f32_output = true;
  layer.is_input_hw_layout = true;
  layer.custom_proc_ptr = &custom_callback_PriorBox;
  layer.custom_param = &custom_param;
}//end of  Layer_64

As one can see, the parameters and their data structure are automatically generated by the tool.

The only thing left to do is to implement the custom_callback_PriorBox function, which may look like this:

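#include <cassert>  // assert
#include <cmath>    // sqrt
#include <cstring>  // memcpy
#include <vector>

using std::vector;

// Clamp a normalized coordinate to the range [0, 1].
inline float box_clip(float x) {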
  if (x < 0.f)
    return 0.f;
  else if (x > 1.f)
    return 1.f;
  else
    return x;
}

void custom_callback_PriorBox(fpga_layer &layer, void *param) {
  custom_param_PriorBox *box_param = reinterpret_cast<custom_param_PriorBox*>(param);
  vector<float> boxes_v(layer.output_dim[0] * layer.output_dim[1]);
  prior_box_t box;
  int box_count = 0;

  // Box widths and heights
  vector<float> box_widths_v;
  vector<float> box_heights_v;
  for (auto ar : box_param->aspect_ratios) {
    if ((ar == 1.0) && box_widths_v.empty()) {
      float sz = 0.5 * box_param->min_size;
      box_widths_v.push_back(sz);
      box_heights_v.push_back(sz);
    } else if ((ar == 1.0) && (box_widths_v.size() > 0)) {
      float sz = 0.5 * sqrt(box_param->min_size * box_param->max_size);
      box_widths_v.push_back(sz);
      box_heights_v.push_back(sz);
    } else if (ar != 1.0) {
      box_widths_v.push_back(0.5 * box_param->min_size * sqrt(ar));
      box_heights_v.push_back(0.5 * box_param->min_size / sqrt(ar));
    }
  }

  // Grid
  int num_variances = sizeof(box_param->variances) / sizeof(box_param->variances[0]);
  assert((num_variances == 1 || num_variances == 4) && "Number of variances must be either 1 or 4.");
  if (num_variances == 1) {
    box.xv = box_param->variances[0];
    box.yv = box_param->variances[0];
    box.wv = box_param->variances[0];
    box.hv = box_param->variances[0];
  } else {
    box.xv = box_param->variances[0];
    box.yv = box_param->variances[1];
    box.wv = box_param->variances[2];
    box.hv = box_param->variances[3];
  }
  float step_x = float(box_param->img_size[0]) / float(layer.input_dim[0]);
  float step_y = float(box_param->img_size[1]) / float(layer.input_dim[1]);
  for (int y = 0; y < layer.input_dim[1]; y++) {
    float center_y = step_y * (float(y) + 0.5);
    for (int x = 0; x < layer.input_dim[0]; x++) {
      float center_x = step_x * (float(x) + 0.5);
      for (unsigned int p = 0; p < box_widths_v.size(); p++) {
        box.x0 = (center_x - box_widths_v[p]) / float(box_param->img_size[0]);
        box.y0 = (center_y - box_heights_v[p]) / float(box_param->img_size[1]);
        box.x1 = (center_x + box_widths_v[p]) / float(box_param->img_size[0]);
        box.y1 = (center_y + box_heights_v[p]) / float(box_param->img_size[1]);
        if (box_param->clip) {
          box.x0 = box_clip(box.x0);
          box.y0 = box_clip(box.y0);
          box.x1 = box_clip(box.x1);
          box.y1 = box_clip(box.y1);
        }
        memcpy(&boxes_v[box_count * 8], &box, sizeof(box));
        box_count++;
      }
    }
  }
	
  put_layer_output(layer, boxes_v);
}
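
The callback above uses a prior_box_t type that is not shown on this page. The sketch below gives one layout that is consistent with how the callback uses it: the member names come from the callback, while their order and the absence of other members are assumptions. It also checks the output dimensions of Layer_64 against the grid size and the number of aspect ratios.

```cpp
#include <cstddef>

// Assumed layout of prior_box_t. The callback copies sizeof(box) bytes into
// slots that are 8 floats apart, so the struct should hold exactly 8 floats.
struct prior_box_t {
  float x0, y0, x1, y1;  // box corners, normalized by the image size
  float xv, yv, wv, hv;  // variances for x, y, width and height
};
static_assert(sizeof(prior_box_t) == 8 * sizeof(float),
              "one box must fill exactly one 8-float slot");

// Values taken from Layer_64 above.
constexpr int kGridWidth = 38;     // layer.input_dim[0]
constexpr int kGridHeight = 38;    // layer.input_dim[1]
constexpr int kBoxesPerCell = 6;   // one box per entry of aspect_ratios

// Total number of boxes written by the nested loops in the callback.
constexpr int kBoxCount = kGridWidth * kGridHeight * kBoxesPerCell;
static_assert(kBoxCount == 8664, "must match layer.output_dim[0]");
```

With these values, the 8664 boxes of 8 floats each exactly fill the boxes_v vector, whose size is layer.output_dim[0] * layer.output_dim[1].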

Note: the implementation above does not call get_layer_input to read the input data, because the result of the PriorBox layer depends only on its parameters. In such a case, one may opt to run the layer only once, on the first execution, as an optimization.
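
One way to compute such a result only once by hand is to cache it in a function-local static variable, as in the sketch below. This is not necessarily the mechanism the SDK offers for this optimization. The names compute_boxes, cached_boxes and g_compute_calls are hypothetical, and compute_boxes merely stands in for the box generation loop. Whether the output buffer still has to be rewritten on later runs depends on the SDK and is not specified on this page.

```cpp
#include <vector>

// Counts how often the expensive computation actually runs (for illustration).
int g_compute_calls = 0;

// Hypothetical stand-in for the input-independent box generation.
std::vector<float> compute_boxes() {
  g_compute_calls++;
  return std::vector<float>(8664 * 8, 0.f);
}

// Returns the cached result; the computation runs on the first call only.
const std::vector<float> &cached_boxes() {
  // A function-local static is initialized exactly once (thread-safe since C++11).
  static const std::vector<float> boxes = compute_boxes();
  return boxes;
}
```

A callback could then call cached_boxes() on every execution and pay the computation cost only the first time.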
