Benchmark Variants - fpga-opencl-benchmarks/rodinia_fpga GitHub Wiki

Each benchmark should provide the following variants, which differ in their optimizations and parallelization schemes. Each benchmark should also include a README_fpga file that describes the variants in more detail.

Version 0 (v0)

  • Minimal changes to be compatible with AOCL
    • Ahead-of-time compilation
    • Device selection
    • No struct kernel parameters
    • Aggregating multiple kernel files into a single file
    • restrict qualifier on kernel pointer arguments
  • No optimization
  • NDRange kernels originally developed for GPUs

Version 1 (v1)

  • Unoptimized single work-item kernels, with restrict and ivdep added where necessary to allow correct pipelining
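
As a rough illustration of the v1 style, the sketch below is a plain-C stand-in for a single work-item kernel: one loop walks all elements instead of launching one work-item per element. The function name vector_scale_add and its parameters are hypothetical and do not come from any benchmark in this repository. In an actual AOCL kernel the function would be declared __kernel and the pointers __global.

```c
#include <stddef.h>

/* Plain-C emulation of a single work-item kernel (hypothetical example).
 * The restrict qualifiers promise that the three buffers never overlap,
 * which lets the compiler pipeline the loop without assuming that a store
 * to out[] could feed a later load from a[] or b[]. */
void vector_scale_add(const float *restrict a,
                      const float *restrict b,
                      float *restrict out,
                      float alpha,
                      size_t n)
{
    /* In an AOCL kernel, "#pragma ivdep" would be placed on this loop only
     * if the compiler still reported a loop-carried memory dependency that
     * the programmer knows cannot occur. It is omitted here because this
     * is ordinary host C. */
    for (size_t i = 0; i < n; ++i) {
        out[i] = alpha * a[i] + b[i];
    }
}
```

Calling vector_scale_add(a, b, out, 2.0f, n) writes 2*a[i] + b[i] into out[i] for every i below n.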

Version 2 (v2)

  • NDRange
  • Basic optimizations
    • Work-group size (reqd_work_group_size or max_work_group_size)
    • SIMD lanes (num_simd_work_items)
    • Compute units (num_compute_units)

Version 3 (v3)

  • Single work-item
  • Basic optimizations
    • Shift register for floating-point reduction
    • Unrolling

Version 4 (v4)

  • NDRange
  • Advanced optimizations
    • Kernel rewrite
    • Local memory access reduction

Version 5 (v5)

  • Single work-item
  • Advanced optimizations
    • Kernel rewrite
    • Shift register as on-chip buffer
    • Loop blocking/tiling
    • Loop collapse
    • Exit condition optimization
    • Systolic array
    • etc.
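
To make two of the v5 items concrete, the plain-C sketch below shows loop collapse next to the nested loop it replaces. The collapsed version also uses a single counter compared against a precomputed trip count as its exit condition, which is one possible reading of "exit condition optimization". The function names sum_nested and sum_collapsed are hypothetical and are not taken from the benchmarks.

```c
#include <stddef.h>

/* Reference: ordinary nested loops over a rows x cols matrix. */
long sum_nested(const int *restrict m, int rows, int cols)
{
    long total = 0;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            total += m[(size_t)r * (size_t)cols + (size_t)c];
        }
    }
    return total;
}

/* Collapsed: one loop, with the row and column indices updated by hand. */
long sum_collapsed(const int *restrict m, int rows, int cols)
{
    long total = 0;
    int r = 0;
    int c = 0;

    /* The exit condition depends only on k and this precomputed bound,
     * not on r and c. */
    const long trip_count = (long)rows * (long)cols;

    for (long k = 0; k < trip_count; ++k) {
        total += m[(size_t)r * (size_t)cols + (size_t)c];

        /* Advance the column and wrap to the next row when needed. */
        ++c;
        if (c == cols) {
            c = 0;
            ++r;
        }
    }
    return total;
}
```

Both functions return the same value for any input. The collapsed form gives the compiler a single loop to pipeline rather than an inner loop that is re-entered once per row.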