StarPU DFT - Song655/SKA-DFT GitHub Wiki
Install StarPU
Install the dependency libraries:
1. Install CUDA and set up the CUDA environment.
2. Install ATLAS to use the ATLAS BLAS library (needed only when testing the Cholesky example):
sudo add-apt-repository universe
sudo add-apt-repository main
sudo apt-get update
sudo apt-get install libatlas-base-dev liblapack-dev libblas-dev
Set the environment (append the following line to the end of ~/.bashrc):
export C_INCLUDE_PATH=/usr/include/atlas:$C_INCLUDE_PATH
3. Install FFTW (we used fftw-3.3.8; needed only when testing the StarPU FFT examples, starpufft).
Build and install FFTW in both double and single precision, running the following from the FFTW source directory:
./configure --enable-shared [ other options ]
make CFLAGS=-fPIC
sudo make install
make clean
./configure --enable-shared --enable-float [ other options ]
make CFLAGS=-fPIC
sudo make install
Install StarPU
Get the StarPU source code (this project used StarPU version 1.3.*).
1. Configure, build, and install StarPU:
./autogen.sh
mkdir build
cd build
../configure --prefix=$HOME/starpu --enable-openmp --disable-opencl --enable-blas-lib=atlas --enable-cuda --enable-starpufft-examples
make
make check
make install
Note: --prefix=$HOME/starpu sets the installation directory; change it to install StarPU elsewhere.
2. Set the environment:
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$STARPU_PATH/lib/pkgconfig
export C_INCLUDE_PATH=$C_INCLUDE_PATH:$STARPU_PATH/include/starpu/1.3
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$STARPU_PATH/lib
Note: replace $STARPU_PATH with your StarPU installation path (the --prefix used above, e.g. $HOME/starpu).
Source Code
Build:
mkdir build
cd build
cmake ..
make
Test:
a. Test the CPU version:
./dft cpu
b. Test the CUDA version:
./dft cuda
c. Test the StarPU version:
STARPU_SCHED=ws STARPU_WORKER_STATS=1 ./dft starpu
You can also set the number of CPU workers and CUDA devices StarPU uses:
STARPU_NCPU=2 STARPU_NCUDA=1 STARPU_SCHED=ws STARPU_WORKER_STATS=1 ./dft starpu
Code Structure
main.cpp: the program entry point.
dft_cuda.cu: the CUDA implementation of the DFT.
dft_cpu.cpp: the serial CPU implementation of the DFT.
dft_starpu.cu: the StarPU implementation of the DFT.
util.cpp: functions for reading and saving files, plus the algorithm configuration (e.g. the number of CUDA threads per block and the number of StarPU tasks).
Note: to support a different input data format, modify the reading function.
dft.h: the header file, where single or double precision is selected.
Apply StarPU
The DFT algorithm is essentially:
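for (int i = 0; i < num_visibilities; i++) {
    sum_source = {0, 0};                  // complex accumulator for visibility i
    for (int s = 0; s < num_source; s++) {
        // theta depends on the coordinates of visibility i
        theta_c = {cos(source[s] * theta), sin(source[s] * theta)};
        sum_source += theta_c;
    }
    visibilities[i] = sum_source;         // store the result for visibility i
}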
Because each visibility is computed independently of all the others, the outer loop can be split into many smaller loops. After splitting it into num_tasks chunks, the algorithm looks like:
Task 1:
for (i = 0; i < num_visibilities / num_tasks; i++)
    for (j = 0; j < num_source; j++)
Task 2:
for (i = num_visibilities / num_tasks; i < 2 * num_visibilities / num_tasks; i++)
    for (j = 0; j < num_source; j++)
Task 3:
for (i = 2 * num_visibilities / num_tasks; i < 3 * num_visibilities / num_tasks; i++)
    for (j = 0; j < num_source; j++)
…
In StarPU, we use starpu_data_partition() with a data filter whose nchildren field is num_tasks (for a vector of visibilities, e.g. starpu_vector_filter_block) to split the data into num_tasks sub-data, and starpu_data_get_sub_data(handle, 1, i) to get the handle of the i-th sub-data. We create one task per sub-data and submit these tasks to StarPU, which schedules them onto CPU workers or CUDA devices according to the scheduling policy selected with STARPU_SCHED (e.g. ws, work stealing).