Benchmark Variants - fpga-opencl-benchmarks/rodinia_fpga GitHub Wiki
Each benchmark should provide the following variants, which differ in their optimizations and parallelization schemes. Each benchmark should also include a README_fpga file that describes the variants in more detail.
Version 0 (v0)
- Minimal changes to be compatible with AOCL
  - Ahead-of-time compilation
  - Device selection
  - No struct kernel parameter
  - Aggregating multiple kernel files into a single file
  - restrict
- No optimization
- Originally developed for GPUs with NDRange kernels
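As a rough illustration of the ahead-of-time compilation item: the host no longer builds kernels from source at run time, but reads an offline-compiled image from disk and passes it to the standard OpenCL call clCreateProgramWithBinary. The sketch below covers only the file-reading half in plain C so that it builds without an OpenCL installation. The function name load_kernel_binary is hypothetical and not taken from the benchmark sources.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a precompiled kernel image fully into memory.
 * Returns a malloc'ed buffer (caller frees) and stores its length in
 * *size_out, or returns NULL if the file is missing, empty, or unreadable. */
unsigned char *load_kernel_binary(const char *path, size_t *size_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return NULL;
    }

    unsigned char *buf = NULL;
    long len = -1;

    /* Find the file length by seeking to the end. */
    if (fseek(fp, 0, SEEK_END) == 0) {
        len = ftell(fp);
    }

    /* Rewind and read the whole image in one call. */
    if (len > 0 && fseek(fp, 0, SEEK_SET) == 0) {
        buf = malloc((size_t)len);
        if (buf != NULL && fread(buf, 1, (size_t)len, fp) != (size_t)len) {
            free(buf);
            buf = NULL;
        }
    }

    fclose(fp);
    if (buf != NULL) {
        *size_out = (size_t)len;
    }
    return buf;
}
```

In a real host program the returned buffer and length would be handed to clCreateProgramWithBinary, followed by clBuildProgram, in place of the clCreateProgramWithSource path that GPU-oriented code typically uses.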
Version 1 (v1)
- Unoptimized single work-item kernels, with restrict and ivdep added where necessary to allow correct pipelining
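A minimal sketch of what such a kernel body can look like, written as plain C so it compiles on a host. In the actual OpenCL source the function would carry __kernel and the pointers __global, which are omitted here. The function name scatter_add and the assumption that idx contains no duplicate entries are illustrative only.

```c
#include <stddef.h>

/* Single work-item style: one loop walks every element instead of
 * launching one work-item per element.
 *
 * restrict promises that the three buffers do not overlap.
 * ivdep promises that the loop has no loop-carried memory dependency,
 * which only holds here under the stated assumption that idx has no
 * duplicate entries. A host C compiler ignores the unknown pragma
 * (possibly with a warning). */
void scatter_add(const int *restrict idx, const float *restrict val,
                 float *restrict out, size_t n)
{
    #pragma ivdep
    for (size_t i = 0; i < n; i++) {
        out[idx[i]] += val[i];
    }
}
```

Without the two hints the compiler has to assume that a later iteration may read a location written by an earlier one, and it serializes the loop accordingly. If idx can contain duplicates, the ivdep promise is false and the pipelined result may be wrong.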
Version 2 (v2)
- NDRange
- Basic optimizations
  - Work-group size (reqd_work_group_size or max_work_group_size)
  - SIMD lanes (num_simd_work_items)
  - Compute units (num_compute_units)
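The three attributes interact, so a small sanity check can help when sweeping parameters. The sketch below is plain host-side C, not kernel code. It assumes the constraints usually stated in the vendor's programming guide: the SIMD width is a power of two no larger than 16, the work-group size is divisible by it, and the global size is divisible by the work-group size. The type ndrange_config and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* In the kernel source the attributes would appear roughly as:
 *
 *   __attribute__((reqd_work_group_size(64, 1, 1)))
 *   __attribute__((num_simd_work_items(4)))
 *   __attribute__((num_compute_units(2)))
 *   __kernel void my_kernel(...)
 */

/* Hypothetical container for one tuning configuration. */
typedef struct {
    size_t   work_group_size; /* reqd_work_group_size, first dimension */
    unsigned simd_lanes;      /* num_simd_work_items */
    unsigned compute_units;   /* num_compute_units */
} ndrange_config;

static bool is_power_of_two(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Returns true if the configuration satisfies the assumed constraints
 * for a 1-D launch of global_size work-items. */
bool ndrange_config_is_valid(const ndrange_config *cfg, size_t global_size)
{
    if (cfg->work_group_size == 0 || cfg->compute_units == 0) {
        return false;
    }
    if (!is_power_of_two(cfg->simd_lanes) || cfg->simd_lanes > 16) {
        return false;
    }
    if (cfg->work_group_size % cfg->simd_lanes != 0) {
        return false;
    }
    return global_size % cfg->work_group_size == 0;
}
```

The check says nothing about whether a configuration fits on the device or performs well. That still has to be read from the compiler's area and timing reports.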
Version 3 (v3)
- Single work-item
- Basic optimizations
  - Shift register for floating-point reduction
  - Unrolling
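The shift-register reduction can be sketched in plain C as follows. A floating-point accumulation into a single variable creates a loop-carried dependency as long as the adder latency. Spreading the sum over several partial sums held in a shift register lets a new addition start every cycle. The depth SR_DEPTH is an assumed value standing in for the adder latency on the target device, and the function name is hypothetical.

```c
#include <stddef.h>

/* Assumed adder latency in cycles; the right value depends on the target. */
#define SR_DEPTH 8

float reduce_shift_register(const float *restrict in, size_t n)
{
    /* One extra slot holds the value written in the current iteration. */
    float shift_reg[SR_DEPTH + 1];

    #pragma unroll
    for (int j = 0; j < SR_DEPTH + 1; j++) {
        shift_reg[j] = 0.0f;
    }

    for (size_t i = 0; i < n; i++) {
        /* Add to the oldest partial sum and write the result to the tail,
         * so each partial sum is touched only once every SR_DEPTH steps. */
        shift_reg[SR_DEPTH] = shift_reg[0] + in[i];

        /* Shift every entry one position toward the head. */
        #pragma unroll
        for (int j = 0; j < SR_DEPTH; j++) {
            shift_reg[j] = shift_reg[j + 1];
        }
    }

    /* Final pass: combine the SR_DEPTH partial sums. */
    float sum = 0.0f;
    #pragma unroll
    for (int j = 0; j < SR_DEPTH; j++) {
        sum += shift_reg[j];
    }
    return sum;
}
```

Because the additions are reordered, the result can differ in the last bits from a strictly sequential sum. A host C compiler ignores the unroll pragmas or treats them as hints; on the FPGA they are what turns the shift loop into registers.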
Version 4 (v4)
- NDRange
- Advanced optimizations
  - Kernel rewrite
  - Local memory access reduction
Version 5 (v5)
- Single work-item
- Advanced optimizations
  - Kernel rewrite
  - Shift register as on-chip buffer
  - Loop blocking/tiling
  - Loop collapse
  - Exit condition optimization
  - Systolic array
  - etc.
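Two of the items above, loop collapse and exit condition optimization, are easiest to see together. The sketch below shows one common reading of them in plain C: a two-level loop nest is merged into a single loop with manually updated indices, and the exit test compares a single counter against a precomputed trip count rather than depending on the index-update logic. The transpose workload and the function name are illustrative choices only.

```c
#include <stddef.h>

/* Transpose a rows x cols row-major matrix into a cols x rows one. */
void transpose_collapsed(const float *restrict in, float *restrict out,
                         int rows, int cols)
{
    /* Exit condition optimization: the trip count is computed once, and
     * the loop test depends only on the counter `it`. */
    const long total = (long)rows * (long)cols;

    /* Loop collapse: r and c replace the induction variables of the
     * original nested loops and are advanced by hand. */
    int r = 0;
    int c = 0;

    for (long it = 0; it < total; it++) {
        out[(size_t)c * rows + r] = in[(size_t)r * cols + c];

        c++;
        if (c == cols) {
            c = 0;
            r++;
        }
    }
}
```

With a single loop the pipeline is filled and drained once instead of once per outer iteration, and keeping the exit test independent of the r and c update chain shortens the logic on that path.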