I often find myself training a neural network on my machine while simultaneously implementing changes for a different experiment. In that case, I typically run the training on the GPU and do the debugging on the CPU.
I do this primarily to avoid the dreaded out-of-memory error. However, if you're really maxing out memory usage on your GPU, the error may still appear. This is because PyTorch initializes a CUDA context on the default GPU device (usually cuda:0) as soon as CUDA is first used, which consumes some memory there regardless of where you place your tensors and model.
To prevent this, you need to empty the environment variable CUDA_VISIBLE_DEVICES, which you could do in your bashrc or with a manual export:

export CUDA_VISIBLE_DEVICES=""

However, it's simplest to hide your GPU devices only for a single run, by invoking your Python program with the following syntax:
CUDA_VISIBLE_DEVICES= python your_program.py
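If you'd rather keep the invocation plain, the same effect can be achieved from inside the script itself, as long as the variable is set before PyTorch first initializes CUDA. A minimal sketch (the torch behavior described in the comments is an assumption about the usual initialization order, not exercised by this snippet):

```python
import os

# Hide all GPUs from CUDA-based libraries. This must happen
# before torch is imported (or at least before CUDA is first
# initialized); otherwise the CUDA context already exists on
# the default device and the memory is already claimed.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# From this point on, torch.cuda.is_available() should report
# False, and tensors are allocated on the CPU by default.
```

This keeps the GPU-hiding logic in the experiment code itself, so a debugging run behaves the same no matter how it is launched.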