StarPU DFT - Song655/SKA-DFT GitHub Wiki

Install StarPU

Install the dependency libraries:

1. Install CUDA and set up the CUDA environment.

2. Install ATLAS to use the ATLAS BLAS library (needed when testing the Cholesky example):

sudo add-apt-repository universe 
sudo add-apt-repository main
sudo apt-get update 
sudo apt-get install libatlas-base-dev liblapack-dev libblas-dev

Set the environment (append the following line to the end of ~/.bashrc):

export C_INCLUDE_PATH=/usr/include/atlas:$C_INCLUDE_PATH

3. Install FFTW (we used fftw-3.3.8; needed when testing starpufft).

Build and install FFTW twice, first in double precision (the default) and then in single precision (--enable-float):

./configure --enable-shared [ other options ]
make CFLAGS=-fPIC
sudo make install

make clean
./configure --enable-shared --enable-float [ other options ]
make CFLAGS=-fPIC
sudo make install

Install StarPU

Get the StarPU source code (this project uses version 1.3.*).

1. Configure and install StarPU:

./autogen.sh 
mkdir build 
cd build 
../configure --prefix=$HOME/starpu --enable-openmp --disable-opencl --enable-blas-lib=atlas --enable-cuda --enable-starpufft-examples 
make 
make check 
make install

Note: --prefix=$HOME/starpu specifies the installation path; change it to install StarPU somewhere else.

2. Set the environment:

export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$STARPU_PATH/lib/pkgconfig
export C_INCLUDE_PATH=$C_INCLUDE_PATH:$STARPU_PATH/include/starpu/1.3
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$STARPU_PATH/lib

Note: replace $STARPU_PATH with your StarPU installation path (the value passed to --prefix, e.g. $HOME/starpu).

Source Code

Build:

mkdir build
cd build
cmake ..
make

Test:

a. Test the CPU version:

./dft cpu

b. Test the CUDA version:

./dft cuda

c. Test the StarPU version:

STARPU_SCHED=ws STARPU_WORKER_STATS=1 ./dft starpu

You can also set the number of CPU workers (STARPU_NCPU) and CUDA devices (STARPU_NCUDA) that StarPU uses:

STARPU_NCPU=2 STARPU_NCUDA=1 STARPU_SCHED=ws STARPU_WORKER_STATS=1 ./dft starpu

Code Structure

main.cpp     : the entry point of the program.
dft_cuda.cu  : the CUDA version of the DFT algorithm.
dtf_cpu.cpp  : the serial CPU version of the DFT algorithm.
dft_starpu.cu: the StarPU version of the DFT algorithm.
util.cpp     : functions for reading and saving files, and the configuration of the algorithm, e.g. the number of CUDA threads per block, the number of StarPU tasks, etc.
        Note : for a different input data format, the reading function must be modified.
dft.h        : the header file; single or double precision can be selected in this file.
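
As an illustration of how a header can select the precision at compile time, here is a minimal sketch. The macro DFT_DOUBLE_PRECISION and the type names dft_real and dft_complex are hypothetical and are not taken from dft.h; the actual header may use a different mechanism.

```cpp
#include <complex>

// Hypothetical switch: compile with -DDFT_DOUBLE_PRECISION (or define the
// macro before this point) to use double; otherwise float is used.
#ifdef DFT_DOUBLE_PRECISION
typedef double dft_real;
#else
typedef float dft_real;
#endif

// Complex type that follows the selected precision.
typedef std::complex<dft_real> dft_complex;
```

With a single typedef like this, the CPU, CUDA and StarPU versions can share one definition of the element type.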

Apply StarPU

In outline, the DFT algorithm is:

for (int i = 0; i < num_visibilities; i++) {
    sum_source = {0, 0};    // complex accumulator for visibility i
    for (int s = 0; s < num_source; s++) {
        theta_c = {cos(source[s] * theta), sin(source[s] * theta)};    // complex term of source s
        sum_source += theta_c;
    }
}
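
The loop above can be written as a small C++ function. This is a sketch under assumptions, not the code in dtf_cpu.cpp: it assumes one scalar value per source and one scalar theta per visibility (the pseudocode does not say how theta is obtained), and the name dft_serial is made up for this example.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical inputs: source[s] is one scalar per source and theta[i] is one
// scalar per visibility. The real project reads its data in util.cpp.
std::vector<std::complex<double>> dft_serial(const std::vector<double>& source,
                                             const std::vector<double>& theta) {
    std::vector<std::complex<double>> visibilities(theta.size());
    for (std::size_t i = 0; i < theta.size(); ++i) {
        std::complex<double> sum_source(0.0, 0.0);
        for (std::size_t s = 0; s < source.size(); ++s) {
            const double phase = source[s] * theta[i];
            // Add the complex term (cos(phase), sin(phase)) of source s.
            sum_source += std::complex<double>(std::cos(phase), std::sin(phase));
        }
        visibilities[i] = sum_source;  // store the result for visibility i
    }
    return visibilities;
}
```

Each output element is computed only from the inputs, never from another output element.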

Because the iterations of the outer loop are independent of each other (no visibility depends on another), the outer loop can be split into many smaller loops. After splitting it into num_tasks parts, the algorithm looks like:

Task 1: 
    for (i = 0; i < num_visibilities / num_tasks; i++)
       for (j = 0; j < num_source; j++)
Task 2: 
    for (i = num_visibilities / num_tasks; i < 2 * num_visibilities / num_tasks; i++)
       for (j = 0; j < num_source; j++)
Task 3: 
    for (i = 2 * num_visibilities / num_tasks; i < 3 * num_visibilities / num_tasks; i++)
       for (j = 0; j < num_source; j++)
…
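
One way to compute the loop bounds of each task is sketched below. The helper split_range is hypothetical and not part of the project. Unlike the listing above, it gives the first num_visibilities % num_tasks tasks one extra item, an addition made here so that no visibility is dropped when the count is not a multiple of num_tasks.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Return the half-open index range [begin, end) handled by each task.
// The first (n % num_tasks) tasks receive one extra item.
std::vector<std::pair<std::size_t, std::size_t>> split_range(std::size_t n,
                                                             std::size_t num_tasks) {
    assert(num_tasks > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base = n / num_tasks;
    const std::size_t extra = n % num_tasks;
    std::size_t begin = 0;
    for (std::size_t t = 0; t < num_tasks; ++t) {
        const std::size_t len = base + (t < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}
```

For example, split_range(9, 3) yields the ranges [0, 3), [3, 6) and [6, 9), which match Task 1 to Task 3 above for num_visibilities = 9.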

In StarPU, we can call starpu_data_partition() with a filter that splits the data into num_tasks pieces of sub-data, and starpu_data_get_sub_data() to get the i-th piece. For each piece we create a task and submit it to StarPU. StarPU then schedules the tasks onto CPU workers or CUDA devices according to the scheduling policy selected by STARPU_SCHED.
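
To make this flow concrete without requiring a StarPU installation, the following plain C++ sketch mimics it: partition stands in for starpu_data_partition, picking pieces[i] stands in for starpu_data_get_sub_data, and a simple queue drained in order stands in for the scheduler. All names here (SubData, partition, dft_task, dft_tasks) are hypothetical, the input layout (one scalar per source, one scalar theta per visibility) is an assumption, and nothing below is StarPU API.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical stand-in for a piece of sub-data: a view on part of the arrays.
struct SubData {
    std::complex<double>* out;  // first output element of this piece
    const double* theta;        // matching per-visibility values
    std::size_t count;          // number of visibilities in this piece
};

// Split n elements into num_tasks contiguous pieces.
std::vector<SubData> partition(std::complex<double>* out, const double* theta,
                               std::size_t n, std::size_t num_tasks) {
    assert(num_tasks > 0);
    std::vector<SubData> pieces;
    std::size_t begin = 0;
    for (std::size_t t = 0; t < num_tasks; ++t) {
        const std::size_t len = n / num_tasks + (t < n % num_tasks ? 1 : 0);
        pieces.push_back(SubData{out + begin, theta + begin, len});
        begin += len;
    }
    return pieces;
}

// Work of one task: the DFT inner loop over every visibility of one piece.
void dft_task(const SubData& piece, const std::vector<double>& source) {
    for (std::size_t i = 0; i < piece.count; ++i) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t s = 0; s < source.size(); ++s) {
            const double phase = source[s] * piece.theta[i];
            sum += std::complex<double>(std::cos(phase), std::sin(phase));
        }
        piece.out[i] = sum;
    }
}

// Create one task per piece, "submit" it to a queue, then run all tasks.
std::vector<std::complex<double>> dft_tasks(const std::vector<double>& source,
                                            const std::vector<double>& theta,
                                            std::size_t num_tasks) {
    std::vector<std::complex<double>> out(theta.size());
    const std::vector<SubData> pieces =
        partition(out.data(), theta.data(), theta.size(), num_tasks);
    std::queue<std::function<void()>> submitted;
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        const SubData piece = pieces[i];  // the i-th piece of sub-data
        submitted.push([piece, &source] { dft_task(piece, source); });
    }
    // Serial stand-in for the scheduler: run the tasks in submission order.
    while (!submitted.empty()) {
        submitted.front()();
        submitted.pop();
    }
    return out;
}
```

Because the pieces do not overlap, the tasks could run in any order or in parallel and still produce the same result, which is what lets StarPU distribute them freely between CPU and CUDA workers.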