oneAPI Code Samples Targeted Devices - jimmytwei/oneAPI-samples GitHub Wiki

Device Targets by Sample

Code Sample Name	Supported Intel® Architecture(s)	Description
1D Heat Transfer	['CPU', 'GPU']	The 1D Heat Transfer sample simulates a 1D heat transfer problem using Intel® oneAPI Data Parallel C++ (DPC++)
AWS Pub Sub	['CPU']	This sample uses the Message Broker for AWS* IoT to send and receive messages through an MQTT connection
All Pairs Shortest Paths	['CPU', 'GPU']	All Pairs Shortest Paths finds the shortest paths between pairs of vertices in a graph using a parallel blocked algorithm that enables the application to efficiently offload compute intensive work to the GPU.
Analog In	['CPU']	Demonstrate how to read an analog voltage value from an input pin using the Eclipse* MRAA library
Azure IoTHub Telemetry	['CPU']	Demonstrate how to send messages from a single device to Microsoft Azure IoT Hub via chosen protocol
Base: Vector Add	['CPU', 'GPU', 'FPGA']	This simple sample adds two large vectors in parallel. It provides a ‘Hello World!’-like sample to ensure your environment is set up correctly using simple Intel® oneAPI Data Parallel C++ (DPC++)
Bitonic Sort	['CPU', 'GPU']	Bitonic Sort using Intel® oneAPI Data Parallel C++ (DPC++)
Black Scholes	['CPU', 'GPU']	Black Scholes formula calculation using Intel® oneMKL Vector Math and Random Number Generators
Block Cholesky Decomposition	['CPU', 'GPU']	Block Cholesky Decomposition using Intel® oneMKL BLAS and LAPACK
Block LU Decomposition	['CPU', 'GPU']	Block LU Decomposition using Intel® oneMKL BLAS and LAPACK
Buffered Host-Device Streaming	['FPGA']	An FPGA tutorial demonstrating how to stream data between the host and device with multiple buffers
CMake FPGA	['FPGA']	Project Templates - Linux CMake project for FPGA
CMake GPU	['GPU']	Project Templates - Linux CMake project for GPU
CRR	['FPGA']	This sample shows a Binomial Tree Model for Option Pricing using an FPGA-optimized reference design of the Cox-Ross-Rubinstein (CRR) Binomial Tree Model with Greeks for American exercise options
Census end-to-end workload	['CPU']	This sample illustrates using Modin and daal optimized scikit-learn to build and run an end-to-end machine learning workload
Chapter 01 - Introduction	['CPU', 'GPU']	Chapter 1 source code examples: Introduction
Chapter 02 - Where Code Executes	['CPU', 'GPU']	Chapter 2 source code examples: Where Code Executes
Chapter 03 - Data Management	['CPU', 'GPU']	Chapter 3 source code examples: Data Management
Chapter 04 - Expressing Parallelism	['CPU', 'GPU']	Chapter 4 source code examples: Expressing Parallelism
Chapter 05 - Error Handling	['CPU', 'GPU']	Chapter 5 source code examples: Error Handling
Chapter 06 - Unified Shared Memory	['CPU', 'GPU']	Chapter 6 source code examples: Unified Shared Memory
Chapter 07 - Buffers	['CPU', 'GPU']	Chapter 7 source code examples: Buffers
Chapter 08 - Scheduling Kernels and Data Movement	['CPU', 'GPU']	Chapter 8 source code examples: Scheduling Kernels and Data Movement
Chapter 09 - Communication and Synchronization	['CPU', 'GPU']	Chapter 9 source code examples: Communication and Synchronization
Chapter 10 - Defining Kernels	['CPU', 'GPU']	Chapter 10 source code examples: Defining Kernels
Chapter 11 - Vectors	['CPU', 'GPU']	Chapter 11 source code examples: Vectors
Chapter 12 - Device Information	['CPU', 'GPU']	Chapter 12 source code examples: Device Information
Chapter 13 - Practical Tips	['CPU', 'GPU']	Chapter 13 source code examples: Practical Tips
Chapter 14 - Common Parallel Patterns	['CPU', 'GPU']	Chapter 14 source code examples: Common Parallel Patterns
Chapter 15 - Programming for GPUs	['CPU', 'GPU']	Chapter 15 source code examples: Programming for GPUs
Chapter 16 - Programming for CPUs	['CPU', 'GPU']	Chapter 16 source code examples: Programming for CPUs
Chapter 17 - Programming for FPGAs	['CPU', 'GPU']	Chapter 17 source code examples: Programming for FPGAs
Chapter 18 - Libraries	['CPU', 'GPU']	Chapter 18 source code examples: Libraries
Chapter 19 - Memory Model and Atomics	['CPU', 'GPU']	Chapter 19 source code examples: Memory Model and Atomics
Chapter 20 - Epilogue Future Direction	['CPU', 'GPU']	Epilogue source code examples: Future Direction of DPC++
Complex Mult	['CPU', 'GPU']	This sample computes Complex Number Multiplication
Compute Units	['FPGA']	Intel® FPGA tutorial showcasing a design pattern to enable the creation of compute units
Computed Tomography	['CPU', 'GPU']	Reconstruct an image from simulated CT data with Intel® oneMKL
DB	['FPGA']	An FPGA reference design that demonstrates high-performance Database Query Acceleration on Intel® FPGAs
DPC Reduce	['CPU', 'GPU']	This sample models transform Reduce in different ways showing capability of Intel® oneAPI
DPC++ Essentials Tutorials	['CPU', 'GPU']	DPC++ Essentials Tutorials using Jupyter Notebooks
DPC++ OpenCL Interoperability Samples	['CPU', 'GPU']	Samples showing DPC++ and OpenCL Interoperability
DPCPP Blur	['CPU', 'GPU']	Sample that shows how to use Intel® Video Processing Library (VPL) and Intel® oneAPI Data Parallel C++ (DPC++) to convert an I420 raw video file into BGRA and blur each frame
DPCPP Interoperability	['CPU', 'GPU']	Intel® oneDNN SYCL extensions API programming for both Intel® CPU and GPU
Debugger: Array Transform	['CPU', 'GPU']	A small Intel® oneAPI Data Parallel C++ (DPC++) example that is used in the "Get Started Guide" of the Application Debugger to exercise major debugger functionality
Digital In	['CPU']	Demonstrate how to read a digital value from an input pin using the Eclipse* MRAA library
Digital Out	['CPU']	Demonstrate how to write a digital value to an output pin using the Eclipse* MRAA library
Discrete Cosine Transform	['CPU', 'GPU']	An image processing algorithm as seen in the JPEG compression standard
Double Buffering	['FPGA']	Intel® FPGA tutorial design to demonstrate overlapping kernel execution with buffer transfers and host-processing to improve system performance
Dynamic Profiler	['FPGA']	An Intel® FPGA tutorial demonstrating how to use the Intel® FPGA Dynamic Profiler for Intel® oneAPI Data Parallel C++ (DPC++) to dynamically collect performance data and reveal areas for optimization
Explicit Data Movement	['FPGA']	An Intel® FPGA tutorial demonstrating an alternative coding style, explicit USM, in which all data movement is controlled explicitly by the author
FPGA Compile	['FPGA']	Intel® FPGA tutorial introducing how to compile Intel® oneAPI Data Parallel C++ (DPC++) for Intel® FPGA
FPGA Reg	['FPGA']	An Intel® FPGA advanced tutorial demonstrating how to apply the Intel® oneAPI Data Parallel C++ (DPC++) extension INTEL::fpga_reg
Fast Recompile	['FPGA']	An Intel® FPGA tutorial demonstrating how to separate the compilation of host and device code to save development time
Folder Options DPCT	['CPU']	Multi-folder project that illustrates migration of a CUDA project that has files located in multiple folders in a directory tree. Uses the `--in-root` and `--out-root` options to tell the Intel® Data Parallel C++ (DPC++) Compatibility Tool where to locate source code to be migrated
Fourier Correlation	['CPU', 'GPU']	Compute 1D Fourier correlation with Intel® oneMKL
GZIP	['FPGA']	Reference design demonstrating high-performance GZIP compression on Intel® FPGA
Gamma Correction	['CPU', 'GPU']	Gamma Correction - a nonlinear operation used to encode and decode the luminance of each image pixel
Getting Started	['CPU', 'GPU']	Basic Intel® oneDNN programming model for both Intel® CPU and GPU
Hello Decode	['CPU', 'GPU']	Sample that shows how to use the Intel® oneAPI Video Processing Library (VPL) to perform a simple video decode
Hello Encode	['CPU', 'GPU']	Sample that shows how to use the Intel® oneAPI Video Processing Library (VPL) to perform a simple video encode
Hello IoT World	['CPU']	This is a basic sample that outputs the classic 'Hello World' message along with the compiler identification string
Hello VPP	['CPU', 'GPU']	Sample that shows how to use the Intel® oneAPI Video Processing Library (VPL) to perform simple video processing
Hello World GPU	['GPU']	Template 'Hello World' on GPU
Hidden Markov Models	['CPU', 'GPU']	Hidden Markov Models using Intel® oneAPI Data Parallel C++
Histogram	['CPU', 'GPU']	This sample demonstrates Histogram using Dpstd APIs
Host-Device Streaming using USM	['FPGA']	An FPGA tutorial demonstrating how to stream data between the host and device with low latency and high throughput
IBM Device	['CPU']	This project shows how to develop device code using the Watson IoT Platform iot-c device client library, and how to connect to and interact with the Watson IoT Platform Service
IO streaming with DPC++ IO pipes	['FPGA']	An FPGA tutorial describing how to stream data to and from DPC++ IO pipes.
ISO2DFD DPCPP	['CPU', 'GPU']	The ISO2DFD sample illustrates Intel® oneAPI Data Parallel C++ (DPC++) Basics using 2D Finite Difference Wave Propagation
ISO3DFD DPCPP	['CPU']	The ISO3DFD Sample illustrates Intel® oneAPI Data Parallel C++ (DPC++) using Finite Difference Stencil Kernel for solving 3D Acoustic Isotropic Wave Equation
ISO3DFD OMP Offload	['GPU']	A Finite Difference Stencil Kernel for solving 3D Acoustic Isotropic Wave Equation using OpenMP* (OMP)
Intel® Low Precision Optimization Tool Tensorflow Getting Started	['CPU']	This sample illustrates how to run LPOT to quantize an FP32 model trained with Keras on TensorFlow to an INT8 model to speed up inference.
Intel® Modin Getting Started	['CPU']	This sample illustrates how to use Modin accelerated Pandas functions and notes the performance gain when compared to standard Pandas functions
Intel® PyTorch Getting Started	['CPU']	This sample illustrates how to train a PyTorch model and run inference with Intel® oneMKL and Intel® oneDNN
Intel® Python Daal4py Distributed K-Means	['CPU']	This sample code illustrates how to train and predict with a distributed K-Means model with the Intel® Distribution of Python using the Python API package Daal4py for Intel® oneDAL
Intel® Python Daal4py Distributed Linear Regression	['CPU']	This sample code illustrates how to train and predict with a Distributed Linear Regression model with the Intel® Distribution of Python using the Python API package Daal4py for Intel® oneDAL
Intel® Python Daal4py Getting Started	['CPU']	This sample illustrates how to do Batch Linear Regression using the Python API package Daal4py for Intel® oneDAL
Intel® Python XGBoost Daal4py Prediction	['CPU']	This sample code illustrates how to analyze the performance benefit of porting a pre-trained XGBoost model to daal4py prediction with minimal code changes, for much faster prediction than XGBoost prediction
Intel® Python XGBoost Getting Started	['CPU']	The sample illustrates how to set up and train an XGBoost model on datasets for prediction
Intel® Python XGBoost Performance	['CPU']	This sample code illustrates how to analyze the performance benefit of the optimizations Intel upstreamed to the latest XGBoost, compared to un-optimized XGBoost 0.81
Intel® TensorFlow Horovod Multinode Training	['CPU']	This sample illustrates how to train a TensorFlow model on multiple nodes in a cluster using Horovod
Intel® TensorFlow Model Zoo Inference With FP32 Int8	['CPU']	This code example illustrates how to run FP32 and Int8 inference on Resnet50 with TensorFlow using Intel® Model Zoo
Intel® Tensorflow Getting Started	['CPU']	This sample illustrates how to train a TensorFlow model and run inference with oneMKL and oneDNN.
Interrupt	['CPU']	Demonstrates how to react to an Eclipse* MRAA digital pin event with an ISR (Interrupt Service Routine), which will run independently of the main program flow
Intrinsics	['CPU']	Demonstrates the Intrinsic functions of the Intel® oneAPI C++ Compiler Classic
Jacobi	['CPU', 'GPU']	A small Intel® oneAPI Data Parallel C++ (DPC++) example which solves a hardcoded linear system with Jacobi iteration. The sample includes two versions of the same program: with and without bugs.
Kernel Args Restrict	['FPGA']	Explain the kernel_args_restrict attribute and its effect on the performance of Intel® FPGA kernels
LSU Control	['FPGA']	An Intel® FPGA tutorial demonstrating how to configure the load-store units (LSU) in your Intel® oneAPI Data Parallel C++ (DPC++) program using the LSU controls extension
Lidar Object Detection using PointPillars	['CPU', 'GPU']	Object detection using a LIDAR point cloud as input. This implementation is based on the paper 'PointPillars: Fast Encoders for Object Detection from Point Clouds'
Loop Coalesce	['FPGA']	An Intel® FPGA tutorial demonstrating the loop_coalesce attribute
Loop IVDEP	['FPGA']	An Intel® FPGA tutorial demonstrating the usage of the loop_ivdep attribute
Loop Initiation Interval	['FPGA']	An Intel® FPGA tutorial demonstrating the usage of the initiation_interval attribute
Loop Unroll	['CPU', 'GPU']	Demonstrates the use of loop unrolling as a simple optimization technique to speed up compute and increase memory access throughput.
Loop Unroll	['FPGA']	An Intel® FPGA tutorial design demonstrating the loop_unroll attribute
MVDR Beamforming	['FPGA']	A reference design demonstrating a high-performance streaming MVDR beamformer
Makefile FPGA	['FPGA']	Project Templates - Linux Makefile project for FPGA
Makefile GPU	['GPU']	Project Templates - Linux Makefile project for GPU
Mandelbrot	['CPU', 'GPU']	The Mandelbrot Set - a fractal example in mathematics
Mandelbrot OMP	['CPU', 'GPU']	Calculates the Mandelbrot Set and outputs a BMP image representation using OpenMP* (OMP)
Matrix Mul	['CPU', 'GPU']	This sample Multiplies two large Matrices in parallel using Intel® oneAPI Data Parallel C++ (DPC++) and OpenMP* (OMP)
Matrix Mul MKL	['CPU', 'GPU']	Accelerate Matrix Multiplication with Intel® oneMKL
Matrix Multiply Advisor	['CPU', 'GPU']	Simple program that shows how to improve the Intel® oneAPI Data Parallel C++ (DPC++) Matrix Multiplication program using Intel® VTune™ Profiler and Intel® Advisor
Matrix Multiply VTune Profiler	['CPU', 'GPU']	Simple program that shows how to improve the Intel® oneAPI Data Parallel C++ (DPC++) Matrix Multiplication program using Intel® VTune™ Profiler and Intel® Advisor
Max Interleaving	['FPGA']	An Intel® FPGA tutorial demonstrating the usage of the loop max_interleaving attribute
Memory Attributes	['FPGA']	An Intel® FPGA tutorial demonstrating the use of on-chip memory attributes to control memory structures in an Intel® oneAPI Data Parallel C++ (DPC++) program
Merge SPMV	['CPU', 'GPU']	Sparse Matrix Vector sample provides a parallel implementation of a Merge based Sparse Matrix and Vector Multiplication Algorithm using Intel® oneAPI Data Parallel C++ (DPC++)
Merge Sort	['FPGA']	A Reference design demonstrating merge sort on an Intel® FPGA
MergeSort OMP	['CPU']	Classic OpenMP* (OMP) Mergesort algorithm
Monte Carlo European Opt	['CPU', 'GPU']	Monte Carlo Simulation of European Options pricing with Intel® oneMKL random number generators
Monte Carlo Pi	['CPU', 'GPU']	Monte Carlo procedure for estimating Pi
Monte Carlo Pi	['CPU', 'GPU']	Estimating Pi with Intel® oneMKL random number generators
N-Body	['CPU', 'GPU']	An N-Body simulation is a simulation of a dynamical system of particles, usually under the influence of physical forces, such as gravity. This N-Body sample code is implemented using Intel® oneAPI Data Parallel C++ (DPC++) for CPU and GPU
N-Way Buffering	['FPGA']	Intel® FPGA tutorial design to demonstrate overlapping kernel execution with buffer transfers and multi-threaded host-processing to improve system performance
On-Board Blink	['CPU']	Demonstrates how to blink the on-board LED by writing a digital value to an output pin using the Eclipse* MRAA library
On-Chip Memory Cache	['FPGA']	Intel® FPGA tutorial demonstrating the caching of on-chip memory to reduce loop initiation interval
OpenMP Offload C++ Tutorials	['CPU', 'GPU']	C++ OpenMP Offload Basics using Jupyter Notebooks
OpenMP Offload Feature Samples	['CPU', 'GPU']	Samples showing new OpenMP Offload features supported
OpenMP Offload Fortran Tutorials	['CPU', 'GPU']	Fortran OpenMP Offload Basics using Jupyter Notebooks
OpenMP* Primes	['CPU']	Fortran Tutorial - Using OpenMP* (OMP)
OpenMP* Reduction	['CPU', 'GPU']	This sample models OpenMP* (OMP) Reduction in different ways showing capability of Intel® oneAPI
Optimize Inner Loop	['FPGA']	An Intel® FPGA tutorial design demonstrating how to optimize the throughput of inner loops with low trip counts
Optimize Integral	['CPU']	Fortran Sample - Simple Compiler Optimizations
Optimize TensorFlow pre-trained model for inference	['CPU']	This tutorial guides you through optimizing a pre-trained model for better inference performance and analyzing the model pb files before and after the inference optimizations.
PWM	['CPU']	Demonstrate how to use PWM with an output pin using the Eclipse* MRAA library. If the output is connected to an LED, its brightness will vary depending on the duty cycle
Particle Diffusion	['CPU', 'GPU']	The Particle Diffusion code sample illustrates Intel® oneAPI Data Parallel C++ (DPC++) using a simple (non-optimized) implementation of a Monte Carlo Simulation of the Diffusion of Water Molecules in Tissue
Pipe Array	['FPGA']	An Intel® FPGA tutorial showcasing a design pattern that enables the creation of arrays of pipes
Pipes	['FPGA']	How to use Pipes to transfer data between kernels on an Intel® FPGA
Prefix Sum	['CPU', 'GPU']	Compute Prefix Sum using Intel® oneAPI Data Parallel C++ (DPC++)
Private Copies	['FPGA']	An Intel® FPGA tutorial demonstrating how to use the private_copies attribute to trade off the resource use and the throughput of a DPC++ FPGA program
QRD	['FPGA']	Reference design demonstrating high-performance QR Decomposition (QRD) of complex matrices on an Intel® FPGA
Random Sampling Without Replacement	['CPU', 'GPU']	Multiple simple random sampling without replacement with Intel® oneMKL random number generators
Remove Loop Carried Dependency	['FPGA']	An Intel® FPGA tutorial design demonstrating performance optimization by removing loop carried dependencies
Rodinia NW DPCT	['CPU']	Migrate a CUDA project using the Intel® DPCT intercept-build feature to create a compilation database. The compilation database provides compilation options, settings, macro definitions and include paths that the Intel® Data Parallel C++ (DPC++) Compatibility Tool (DPCT) will use during migration of the project
STREAM	['CPU', 'GPU']	STREAM is a program that measures memory transfer rates in MB/s for simple computational kernels coded in C
Sepia Filter	['CPU', 'GPU']	A program that converts an image to Sepia Tone
Shannonization	['FPGA']	An Intel® FPGA tutorial design that demonstrates an optimization that removes computation from the critical path and improves Fmax/II
Simple Add	['CPU', 'GPU', 'FPGA']	This simple sample adds two large vectors in parallel and provides a ‘Hello World!’-like sample to ensure your environment is set up correctly using Intel® oneAPI Data Parallel C++ (DPC++)
Simple Model	['CPU', 'GPU']	Run a simple CNN on both Intel® CPU and GPU with sample C++ codes
Sparse Conjugate Gradient	['CPU', 'GPU']	Solve Sparse linear systems with the Conjugate Gradient method using Intel® oneMKL sparse BLAS
Speculated Iterations	['FPGA']	An Intel® FPGA tutorial demonstrating the speculated_iterations attribute
Stable Sort By Key	['CPU', 'GPU']	This sample models Stable Sort By Key: during the sorting of 2 sequences (keys and values), only keys are compared, but both keys and values are swapped
Stall Enable	['FPGA']	An Intel® FPGA tutorial demonstrating the use_stall_enable_clusters attribute
Student's T-test	['CPU', 'GPU']	Performing Student's T-test with Intel® oneMKL Vector Statistics functionality
System Profiling	['FPGA']	An Intel® FPGA tutorial demonstrating how to use the OpenCL* Intercept Layer to improve a design with the double buffering optimization
TBB ASYNC SYCL	['CPU', 'GPU']	This sample illustrates how a computational kernel can be split for execution between CPU and GPU using the Intel® oneTBB Flow Graph asynchronous node and functional node. The Flow Graph asynchronous node uses SYCL to implement calculations on the GPU while the functional node does the CPU part of the calculations. This TBB ASYNC SYCL sample code is implemented using C++ and SYCL language for Intel® CPU and GPU
TBB Resumable Tasks SYCL	['CPU', 'GPU']	This sample illustrates how a computational kernel can be split for execution between CPU and GPU using Intel® oneTBB Resumable Task and parallel_for. The Intel® oneTBB resumable task uses SYCL to implement calculations on the GPU while the parallel_for algorithm does the CPU part of the calculations. This TBB Resumable Tasks SYCL sample code is implemented using C++ and SYCL language for Intel® CPU and GPU
TBB Task SYCL	['CPU', 'GPU']	This sample illustrates how 2 Intel® oneTBB tasks can execute similar computational kernels with one task executing SYCL code and another one executing the Intel® oneTBB code. This TBB Task SYCL sample code is implemented using C++ and SYCL language for Intel® CPU and GPU
Triangular Loop	['FPGA']	An Intel® FPGA tutorial demonstrating an advanced optimization technique for triangular loops
Tutorials	['CPU', 'GPU']	Intel® oneCCL Tutorials
Tutorials	['CPU', 'GPU']	Intel® oneDNN Tutorials
UP2 LEDS	['CPU']	This sample shows how to use the LED class and APIs of the Eclipse* MRAA library. It is intended to run on the UP Squared board, and will utilize the 4 built-in color LEDs located under the Ethernet ports. No additional hardware is required
Use Library	['FPGA']	An Intel® FPGA Tutorial demonstrating how to create Intel® FPGA libraries and to incorporate them in an Intel® oneAPI Data Parallel C++ (DPC++) project
Vector Add DPCT	['CPU']	Simple project to illustrate the basic migration of CUDA code. Use this sample to ensure your environment is configured correctly and to understand the basics of migrating existing CUDA projects to Intel® oneAPI Data Parallel C++ (DPC++)
Vectorize VecMatMult	['CPU']	Fortran Tutorial - Using Auto Vectorization
Zero Copy Data Transfer	['FPGA']	An Intel® FPGA tutorial demonstrating zero-copy host memory using the SYCL restricted Unified Shared Memory (USM) model
oneCCL Getting Started	['CPU', 'GPU']	Basic Intel® oneCCL programming model for both Intel® CPU and GPU
Total Samples: 157

Report Generated on: August 02, 2021
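
The device column in the table above is written as a Python-style list literal, so the listing can be filtered by target device with a few lines of script. The following is a minimal sketch, not part of the report itself: it assumes the columns are tab-separated (as they appear in this page's text), the three rows are copied from the table only as sample input, and the helper names `parse_rows`, `samples_for`, and `device_counts` are hypothetical.

```python
import ast
from collections import Counter

# Three rows copied from the table above as sample input (assumed tab-separated).
ROWS = """\
Base: Vector Add\t['CPU', 'GPU', 'FPGA']\tThis simple sample adds two large vectors in parallel.
CMake GPU\t['GPU']\tProject Templates - Linux CMake project for GPU
Pipes\t['FPGA']\tHow to use Pipes to transfer data between kernels on an Intel® FPGA
"""


def parse_rows(text):
    """Split each non-empty line into (name, devices, description)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, devices, description = line.split("\t", 2)
        # The device column is a list literal such as ['CPU', 'GPU'].
        rows.append((name, ast.literal_eval(devices), description))
    return rows


def samples_for(rows, device):
    """Return the names of samples that list the given device."""
    return [name for name, devices, _ in rows if device in devices]


def device_counts(rows):
    """Count how many samples list each device."""
    return Counter(d for _, devices, _ in rows for d in devices)


rows = parse_rows(ROWS)
print(samples_for(rows, "FPGA"))
print(device_counts(rows))
```

Run against the full table, `len(parse_rows(...))` could be compared with the "Total Samples" figure as a quick consistency check.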