Training on GPUs

When fitting a model with TensorFlow, the first GPU is selected by default. TensorFlow will then try to allocate memory for the training on that device; if not enough memory is available on this GPU (for example because it is already being used by another process), the following error might occur:

unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
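Before starting a run, it can help to check which GPUs TensorFlow can actually see. A minimal sketch, assuming TensorFlow 2.1 or later (where tf.config.list_physical_devices is available):

import tensorflow as tf

# List the GPUs visible to this process; an empty list means
# TensorFlow could not initialize any CUDA device
gpus = tf.config.list_physical_devices('GPU')
print("GPUs visible to TensorFlow:", gpus)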

If this error occurs, the training can be moved to the second GPU by executing the following:

import os
import tensorflow as tf

# Specify which GPU(s) to use; must be set before any CUDA context exists
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Or "2", "3", etc. other than "0"

# Soft placement falls back to a supported device if an op cannot run on
# the requested one; device placement logging prints where each op runs
config = tf.compat.v1.ConfigProto(allow_soft_placement=True, log_device_placement=True)
# Allocate GPU memory on demand instead of reserving the whole card
config.gpu_options.allow_growth = True
tf.compat.v1.Session(config=config)

Note that setting CUDA_VISIBLE_DEVICES should be the first thing you execute after importing TensorFlow; otherwise the first GPU (gpu:0) will be selected automatically.
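On TensorFlow 2.x, the same effect can also be achieved with the tf.config API instead of the compat.v1 session. A minimal sketch, assuming at least two GPUs are present and no GPU has been initialized yet:

import tensorflow as tf

# Restrict this process to the second physical GPU and enable on-demand
# memory allocation; both calls must run before the GPU is initialized,
# otherwise they raise a RuntimeError
gpus = tf.config.list_physical_devices('GPU')
if len(gpus) > 1:
    tf.config.set_visible_devices(gpus[1], 'GPU')
    tf.config.experimental.set_memory_growth(gpus[1], True)

Inside the process, the selected card then appears as /GPU:0 regardless of its physical index, since logical devices are renumbered from zero.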