AMD Hackathon Notes - dawiedotcom/IMEX_SfloW2D_v2 GitHub Wiki

Notes for the EPCC/AMD Hackathon

Build Instructions

These are the current build instructions; they may change soon:

Modules

Load the AMD flang compiler:

$ module load amdflang-new/rocm-afar-6.1.0
$ which amdflang
> /opt/rocmplus-6.4.0/rocm-afar-6.1.0/bin/amdflang

Enable CPU/GPU shared memory

export HSA_XNACK=1
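A missing HSA_XNACK setting is easy to overlook in a fresh shell, so a small guard like the one below can be run before launching. This is only a sketch: the function name xnack_enabled is hypothetical, and it inspects the environment variable only, not whether the GPU or driver actually supports XNACK.

```shell
# Hypothetical helper: succeeds only when HSA_XNACK is set to 1 in this shell.
xnack_enabled() {
    [ "${HSA_XNACK:-0}" = "1" ]
}

if xnack_enabled; then
    echo "HSA_XNACK=1: CPU/GPU shared memory requested"
else
    echo "HSA_XNACK is not 1: run 'export HSA_XNACK=1' before launching" >&2
fi
```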

Get the code and generate the Makefile

$ git clone https://github.com/jklebes/IMEX_SfloW2D_v2 --branch=amd
$ cd IMEX_SfloW2D_v2
$ autoreconf
$ ./configure --prefix=$PWD
$ make FCFLAGS='-fopenmp' FC=amdflang LDFLAGS='' install

Create a python venv

Some of the examples require Python packages for pre- and post-processing:

$ python -m venv .venv
$ . .venv/bin/activate
(.venv) $ pip install numpy pandas netcdf4 matplotlib

Run the Etna example

These are the same instructions as in [/EXAMPLES/EXAMPLE_ETNA/README.txt], run from the EXAMPLES/EXAMPLE_ETNA directory:

(.venv) $ cd DEM
(.venv) $ unzip *.zip
(.venv) $ cd ..
(.venv) $ python create_input_ellipsoid.py
(.venv) $ ../../bin/IMEX_SfloW2D
> ...
> Time taken by iterations is 461.651952007 seconds
> Elapsed real time =  483.532 seconds
> ...
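To compare timings between builds, the elapsed time can be pulled out of a saved log. The sketch below assumes the run output was captured to a file (for example with tee) and that it contains an "Elapsed real time" line in the form shown above; the function name elapsed_seconds and the log file name run.log are hypothetical.

```shell
# Hypothetical helper: print the elapsed wall-clock seconds reported in a saved run log.
# Assumes a line of the form "Elapsed real time =  483.532 seconds".
elapsed_seconds() {
    # The value is the second-to-last whitespace-separated field on that line.
    awk '/Elapsed real time/ { print $(NF-1) }' "$1"
}

# Usage sketch (run.log is an assumed file name):
#   ../../bin/IMEX_SfloW2D | tee run.log
#   elapsed_seconds run.log
```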

Compiling with OpenMP GPU offloading

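I placed the following in src/Makefile.amd and invoke it from src/ with make -f Makefile.amd. FC must resolve to amdflang (for example, FC=amdflang on the command line), and the recipe lines must be indented with a tab rather than spaces: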

TARGET=IMEX_SfloW2D

FCFLAGS=-fopenmp -fopenmp-force-usm --offload-arch=gfx942
LDFLAGS=-lflang_rt.hostdevice 

SRC= parameters_2d.f90 \
    complexify.f90 \
    geometry_2d.f90 \
    constitutive_2d.f90 \
    solver_2d.f90 \
    init_2d.f90 \
    inpout_2d.f90 \
    IMEX_SfloW2D.f90 

OBJ=$(patsubst %.f90, %.o, $(SRC))

$(info $(SRC))
$(info $(OBJ))

all : $(TARGET)

%.o : %.f90
        $(FC) $(FCFLAGS) -c -o $@ $<

$(TARGET) : $(OBJ)
        $(FC) $(FCFLAGS) $(LDFLAGS) -o $@ $(OBJ)

.PHONY: clean
clean:
        rm -rf $(TARGET) *.o *.mod
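When testing an offload build, it can help to make the run fail outright if no GPU is usable, rather than silently falling back to the host. A minimal sketch, using the standard OpenMP environment variable OMP_TARGET_OFFLOAD; the wrapper name run_offload_mandatory is hypothetical, and the binary path in the usage comment assumes the executable was built in src/ by the Makefile above:

```shell
# Hypothetical wrapper: run a command with OpenMP host fallback disabled.
# OMP_TARGET_OFFLOAD=MANDATORY is defined by the OpenMP specification (5.0 and later).
run_offload_mandatory() {
    env OMP_TARGET_OFFLOAD=MANDATORY "$@"
}

# Usage sketch, from an example directory (path is an assumption):
#   run_offload_mandatory ../../src/IMEX_SfloW2D
```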

Profiling

Based on the tutorial in [1].

  • Added roctx.f90 with the roctx module to the IMEX project's src/ directory.
  • Added the following to a .f90 file to include the module:
    USE roctx, ONLY: roctxpop, roctxpush
    USE ISO_C_BINDING,   ONLY: c_null_char
  • Instrumented sections of the code with:
    CALL roctxpush("some label" // c_null_char)
    ! ...
    CALL roctxpop("some label" // c_null_char)
  • Added the following link flags:
    LDFLAGS+=-L${ROCM_PATH}/lib -lrocprofiler-sdk-roctx
  • Compiled the code and ran the profiler (from one of the example directories) with:
    rocprofv3 --sys-trace --marker-trace --output-format pftrace -- <bin_name>
    This produced .pftrace files that can be visualised in ui.perfetto.dev.
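The exact output directory layout used by the profiler is not recorded in these notes, so a simple search is a safe way to locate the trace files after a run. A minimal sketch; the function name list_traces is hypothetical, and it only assumes that the traces end in .pftrace as requested by --output-format pftrace:

```shell
# Hypothetical helper: list all .pftrace files under a directory (default: current directory).
list_traces() {
    find "${1:-.}" -type f -name '*.pftrace'
}

# Usage sketch: run from the example directory after profiling, then
# load the listed files into ui.perfetto.dev.
#   list_traces .
```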

Resources
