QUDA Environment Variables - lattice/quda GitHub Wiki

Below is a list of QUDA-specific environment variables. A list of useful CUDA-specific variables is included at the bottom of the page.

Variable name	Function
`QUDA_RESOURCE_PATH`	Directory where the tune cache and profile files are written
`QUDA_PROFILE_OUTPUT_BASE`	Filename prefix for profile output. Setting this results in the files `$(QUDA_PROFILE_OUTPUT_BASE).tsv` and `$(QUDA_PROFILE_OUTPUT_BASE)_async.tsv` being written out (the default is simply `profile.tsv` and `profile_async.tsv`)
`QUDA_ENABLE_P2P`	Control peer-to-peer (P2P) transfers between GPUs: `QUDA_ENABLE_P2P=0` disables all P2P transfers; `QUDA_ENABLE_P2P=1` enables only copy engines; `QUDA_ENABLE_P2P=2` enables only remote writing; `QUDA_ENABLE_P2P=3` enables both copy engines and remote writing. The default is 3 (copy engines and remote writing)
`QUDA_ENABLE_P2P_MAX_ACCESS_RANK`	Limit which GPUs are connected peer-to-peer (use this to disable low-bandwidth connections); e.g., `QUDA_ENABLE_P2P_MAX_ACCESS_RANK=0` limits P2P to the highest-bandwidth connections only
`QUDA_ENABLE_TUNING`	Enable / disable kernel autotuning. Default is enabled, disable with `QUDA_ENABLE_TUNING=0`
`QUDA_REORDER_LOCATION`	Set where data should be reordered when transferring CPU<->GPU (default is GPU)
`QUDA_RANK_VERBOSITY`	Set which global ranks are active in `printfQuda` calls (default is rank 0)
`QUDA_ENABLE_DEVICE_MEMORY_POOL`	Enable / disable the device memory pool allocator (default is enabled, disable with `QUDA_ENABLE_DEVICE_MEMORY_POOL=0`)
`QUDA_ENABLE_PINNED_MEMORY_POOL`	Enable / disable the pinned (page-locked host) memory pool allocator (default is enabled, disable with `QUDA_ENABLE_PINNED_MEMORY_POOL=0`)
`QUDA_ENABLE_MANAGED_MEMORY`	Enable / disable using managed memory for allocations (default is disabled, enable with `QUDA_ENABLE_MANAGED_MEMORY=1`). Note: managed memory has some limitations for pre-Pascal architectures.
`QUDA_ENABLE_MANAGED_PREFETCH`	Enable / disable explicit managed memory prefetching calls (default is disabled, enable with `QUDA_ENABLE_MANAGED_PREFETCH=1`). Does nothing if `QUDA_ENABLE_MANAGED_MEMORY` isn't enabled.
`QUDA_ENABLE_NUMA`	Enable NUMA placement. Default is enabled if NUMA has been enabled in CMake; disable with `QUDA_ENABLE_NUMA=0`
`QUDA_MILC_HISQ_RECONSTRUCT`	Set the reconstruct type used in the MILC interface for the long links in the HISQ solver. Allowed values are 9, 13, and 18; the default is 18
`QUDA_MILC_HISQ_RECONSTRUCT_SLOPPY`	Set the sloppy reconstruct type used in the MILC interface for the long links in the HISQ solver. Allowed values are 9, 13, and 18; the default is 18
`QUDA_ENABLE_GDR`	Enable GPU-Direct RDMA. Default is disabled, enable with `QUDA_ENABLE_GDR=1`
`QUDA_ENABLE_ZERO_COPY`	Enable zero-copy policies (can be beneficial on systems without performant GDR). Default is disabled, enable with `QUDA_ENABLE_ZERO_COPY=1`
`QUDA_ENABLE_NVSHMEM`	Enable NVSHMEM communication policies if QUDA is built with NVSHMEM support. Default is enabled, set to `0` to disable.
`QUDA_TEST_GRID_SIZE`	Set the process geometry for the unit tests. Overrides the `--gridsize` parameter if set.
`QUDA_TEST_GRID_PARTITION`	Set the process grid partition geometry for the unit tests (for split grid). Overrides the `--grid-partition` parameter if set.
`QUDA_ENABLE_MPS`	Enable support for MPS in QUDA. Generally not recommended except for testing purposes. Default is disabled, enable with `QUDA_ENABLE_MPS=1`
`QUDA_DEVICE_RESET`	Call `cudaDeviceReset` in `endQuda`. This legacy behavior can be useful for profiling, but it destroys the CUDA context used by other CUDA libraries outside of QUDA (e.g., GPU-aware MPI). Default is disabled, enable with `QUDA_DEVICE_RESET=1`
`QUDA_DETERMINISTIC_REDUCE`	Perform all MPI reductions deterministically: with this flag set, QUDA runs completely deterministically, regardless of rank order, once tuning has completed (or when tuning is disabled). Default is disabled, enable with `QUDA_DETERMINISTIC_REDUCE=1`
`QUDA_TUNE_VERSION_CHECK`	Set `QUDA_TUNE_VERSION_CHECK=0` to disable the check that prevents using a `tunecache.tsv` file from a different QUDA version
`QUDA_ENABLE_TUNING_SHARED`	Disable shared memory autotuning. Useful for checking what effect shared memory tuning has.
`QUDA_TUNING_RANK`	Set the global default rank for doing kernel autotuning (default is rank 0)
`QUDA_MAX_MULTI_RHS`	Set the maximum number of right-hand sides (RHS) per kernel. Default is 64 with large kernel arguments, and 16 otherwise.
`QUDA_ENABLE_MONITOR`	Set `QUDA_ENABLE_MONITOR=1` to enable device monitoring during execution. The monitor log is written to `QUDA_RESOURCE_PATH` when `endQuda` is called.
`QUDA_ENABLE_MONITOR_PERIOD`	Set the monitor period in microseconds (default is 1000 microseconds = 1 millisecond)
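As an illustration only, the Python sketch below shows one way a launcher or test script might interpret several of these variables from a given environment, using the defaults listed above. The helper names (`env_int`, `flag_enabled`, `p2p_policy`, `profile_files`) are hypothetical and not part of QUDA. Treating any nonzero value as "enabled" is an assumption, and decoding `QUDA_ENABLE_P2P` as a two-bit mask is simply a compact way to reproduce its four documented values (0 to 3).

```python
import os

# Documented on/off defaults for a few of the variables above (1 = enabled).
DOCUMENTED_DEFAULTS = {
    "QUDA_ENABLE_TUNING": 1,
    "QUDA_ENABLE_DEVICE_MEMORY_POOL": 1,
    "QUDA_ENABLE_PINNED_MEMORY_POOL": 1,
    "QUDA_ENABLE_MANAGED_MEMORY": 0,
    "QUDA_ENABLE_GDR": 0,
    "QUDA_ENABLE_ZERO_COPY": 0,
}


def env_int(name, default, environ=os.environ):
    """Hypothetical helper: integer value of `name`, or `default` if unset or malformed."""
    try:
        return int(environ.get(name, "").strip())
    except ValueError:
        return default


def flag_enabled(name, environ=os.environ):
    """Assumption: any nonzero value enables a flag and 0 disables it."""
    return env_int(name, DOCUMENTED_DEFAULTS[name], environ) != 0


def p2p_policy(environ=os.environ):
    """Decode QUDA_ENABLE_P2P (default 3) as a two-bit mask:
    bit 0 = copy engines, bit 1 = remote writing."""
    mode = env_int("QUDA_ENABLE_P2P", 3, environ)
    return {"copy_engines": bool(mode & 1), "remote_write": bool(mode & 2)}


def profile_files(environ=os.environ):
    """Profile file names derived from QUDA_PROFILE_OUTPUT_BASE (default "profile")."""
    base = environ.get("QUDA_PROFILE_OUTPUT_BASE", "profile")
    return (base + ".tsv", base + "_async.tsv")


# Usage with an explicit environment instead of the real process environment.
env = {"QUDA_ENABLE_P2P": "1", "QUDA_PROFILE_OUTPUT_BASE": "run42"}
print(p2p_policy(env))                          # {'copy_engines': True, 'remote_write': False}
print(profile_files(env))                       # ('run42.tsv', 'run42_async.tsv')
print(flag_enabled("QUDA_ENABLE_TUNING", env))  # True (unset, so the default applies)
```

Passing an explicit dictionary instead of reading `os.environ` directly keeps the helpers easy to test and lets a launcher inspect the environment it is about to hand to a child process.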

CUDA environment variables

Variable name	Function
`CUDA_LAUNCH_BLOCKING`	If set to `1`, all kernels are launched synchronously: each launch returns only after the kernel has completed. The default (`0`, or unset) is asynchronous launches. Synchronous launches ensure that error messages pertain precisely to the last kernel called, which is useful for debugging but slows execution
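Setting `CUDA_LAUNCH_BLOCKING` in the environment of the launched process guarantees it is in place before any CUDA call is made, and limits the slowdown to the run being debugged. A minimal sketch, assuming the application is started from a Python launcher script; `blocking_env`, `run_with_blocking_launches`, and `./my_quda_app` are hypothetical names used only for illustration.

```python
import os
import subprocess


def blocking_env(base=None):
    """Return a copy of `base` (default: the current environment) with
    CUDA_LAUNCH_BLOCKING=1, i.e. synchronous kernel launches."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_blocking_launches(cmd, **kwargs):
    """Run `cmd` (a list of program arguments) with synchronous launches.

    Only the child process sees the variable; the caller's own environment
    is left unchanged. Extra keyword arguments go to subprocess.run.
    """
    return subprocess.run(cmd, env=blocking_env(), **kwargs)


# Hypothetical usage; replace "./my_quda_app" with your own binary:
# run_with_blocking_launches(["./my_quda_app"], check=True)
```

Because `blocking_env` copies the environment rather than modifying `os.environ`, later runs started from the same launcher keep the default asynchronous behavior.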