How can I prevent TensorFlow from allocating the totality of a GPU's memory?

I work in an environment in which computational resources are shared, i.e., we have a few server machines, each equipped with a few Nvidia Titan X GPUs.

For small to moderate size models, the 12 GB of the Titan X is usually enough for 2–3 people to run training concurrently on the same GPU. If the models are small enough that a single model does not take full advantage of all the computational units of the GPU, this can actually result in a speedup compared with running one training process after the other. Even in cases where the concurrent access to the GPU does slow down the individual training time, it is still nice to have the flexibility of having multiple users simultaneously train on the GPU.

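The problem with TensorFlow is that, by default, it allocates the full amount of available GPU memory when it is launched. Even for a small two-layer neural network, I see that all 12 GB of the GPU memory is used up.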

Is there a way to make TensorFlow allocate only, say, 4 GB of GPU memory, if one knows that this is enough for a given model?

Answer

Here is an excerpt from the book Deep Learning with TensorFlow:

In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as it is needed by the process. TensorFlow provides two configuration options on the session to control this. The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require: it starts out allocating very little memory, and as sessions get run and more GPU memory is needed, it extends the GPU memory region used by the TensorFlow process.

1) Allow growth (more flexible):

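import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # start small, grow GPU memory on demand
session = tf.Session(config=config)  # other Session arguments omitted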
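To make the growth behavior concrete, here is a toy Python model of it. It is an illustration only, not TensorFlow's actual GPU allocator (whose growth policy is more involved): the reserved region starts small, grows when a request does not fit, and is never shrunk when tensors are freed. The class name `GrowingRegion` and the 256 MB starting size are hypothetical.

```python
class GrowingRegion:
    """Toy model of grow-on-demand GPU memory (hypothetical, for illustration)."""

    def __init__(self, capacity_mb: int, initial_mb: int = 256):
        self.capacity_mb = capacity_mb                    # physical memory on the GPU
        self.reserved_mb = min(initial_mb, capacity_mb)   # region held by the process
        self.in_use_mb = 0                                # memory used by live tensors

    def allocate(self, size_mb: int) -> None:
        needed = self.in_use_mb + size_mb
        if needed > self.capacity_mb:
            raise MemoryError(f"cannot allocate {size_mb} MB")
        if needed > self.reserved_mb:
            # Grow the reserved region just enough to satisfy the request.
            self.reserved_mb = needed
        self.in_use_mb = needed

    def free(self, size_mb: int) -> None:
        # Tensors are freed, but the reserved region is NOT given back.
        self.in_use_mb = max(0, self.in_use_mb - size_mb)


region = GrowingRegion(capacity_mb=12 * 1024)  # a 12 GB Titan X
region.allocate(1000)
region.allocate(3000)
region.free(3000)
print(region.reserved_mb, region.in_use_mb)  # 4000 1000
```

After the peak of 4000 MB, the process keeps holding that much even though only 1000 MB is in use, which is why other users on a shared GPU still see the peak usage.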

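The second method is the per_process_gpu_memory_fraction option, which determines the fraction (between 0 and 1) of the overall amount of memory that each visible GPU should be allocated. Note: TensorFlow does not release GPU memory once it has been allocated, since releasing it can lead to even worse memory fragmentation.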

2) Allocate a fixed amount of memory:

To allocate only 40% of the total memory of each GPU:

import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # cap at 40% of each GPU
session = tf.Session(config=config)  # other Session arguments omitted

Note: This is only useful, though, if you truly want to bound the amount of GPU memory available to the TensorFlow process.
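For a budget like the 4 GB in the question, the fraction is simply the target divided by the card's total memory. A minimal sketch, assuming a 12 GB Titan X; the helper name `memory_fraction` is hypothetical, and since the memory usable by a process is somewhat less than the nominal 12 GB, treat the result as approximate.

```python
def memory_fraction(target_gb: float, total_gb: float) -> float:
    """Value for per_process_gpu_memory_fraction (hypothetical helper)."""
    if not 0 < target_gb <= total_gb:
        raise ValueError("target must be positive and no larger than the GPU's memory")
    return target_gb / total_gb


fraction = memory_fraction(4, 12)  # 4 GB out of a 12 GB Titan X
print(round(fraction, 3))  # 0.333
```

The result would be assigned to config.gpu_options.per_process_gpu_memory_fraction in place of 0.4.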

2018-01-11 18:57:16

Other answers

You can set

TF_FORCE_GPU_ALLOW_GROWTH=true

in your environment variables.
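The variable can also be set from inside a Python script, as long as this happens before TensorFlow sets up its GPU allocator; setting it before importing TensorFlow is the safe choice. A minimal sketch, assuming nothing earlier in the process has already imported TensorFlow.

```python
import os

# Must be set before TensorFlow creates its GPU allocator,
# so do it before `import tensorflow` (or at least before any GPU op runs).
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import only after the variable is set
print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])  # true
```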

In the TensorFlow source code:

bool GPUBFCAllocator::GetAllowGrowthValue(const GPUOptions& gpu_options) {
  const char* force_allow_growth_string =
      std::getenv("TF_FORCE_GPU_ALLOW_GROWTH");
  if (force_allow_growth_string == nullptr) {
    return gpu_options.allow_growth();
  }
  // ... (rest of the function omitted in this excerpt)
}
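The excerpt stops after the case where the variable is unset. As a rough illustration of the precedence it implies, here is a Python sketch. The function name `effective_allow_growth` is hypothetical, and the handling of the values "true" and "false" (with anything else falling back to the config) is an assumption about the omitted part of the C++ function, not a copy of it.

```python
import os
from typing import Mapping, Optional


def effective_allow_growth(config_allow_growth: bool,
                           env: Optional[Mapping[str, str]] = None) -> bool:
    """Approximate how TF_FORCE_GPU_ALLOW_GROWTH overrides gpu_options.allow_growth."""
    env = os.environ if env is None else env
    forced = env.get("TF_FORCE_GPU_ALLOW_GROWTH")
    if forced is None:
        # Variable unset: use the config value, as in the C++ excerpt.
        return config_allow_growth
    if forced == "true":
        return True
    if forced == "false":
        return False
    # Assumption: an unrecognized value falls back to the config setting.
    return config_allow_growth


print(effective_allow_growth(False, {}))                                     # False
print(effective_allow_growth(False, {"TF_FORCE_GPU_ALLOW_GROWTH": "true"}))  # True
```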

2019-06-02 17:15:29

import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

https://github.com/tensorflow/tensorflow/issues/1578
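tf.ConfigProto and tf.Session exist only in TensorFlow 1.x. Under TensorFlow 2.x the same options are still reachable through the tf.compat.v1 module. A minimal sketch, assuming TensorFlow 2.x and TF1-style (graph-mode) code; for eager TF 2.x code the tf.config functions in the later answers are the native route. The function name `make_v1_session` is hypothetical, and it imports TensorFlow only when called.

```python
def make_v1_session(allow_growth: bool = True, memory_fraction: float = 0.0):
    """Create a TF1-style session with the GPU options discussed above (TF 2.x)."""
    import tensorflow as tf  # lazy import: requires TensorFlow 2.x at call time

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction > 0:
        # Hard cap as a fraction of each visible GPU's memory (e.g. 0.4 = 40%).
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return tf.compat.v1.Session(config=config)
```

For example, make_v1_session(memory_fraction=0.4) mirrors the 40% configuration from the first answer.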

2016-05-26 07:43:45

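I tried to train a U-Net on the VOC dataset, but because of the huge image size I ran out of memory. I tried all of the above tips, even a batch size of 1, but nothing improved. Sometimes the TensorFlow version itself can also cause memory issues. Try using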

pip install tensorflow-gpu==1.8.0

2018-10-16 06:05:52

For the TensorFlow 2.0 alpha (docs); later 2.x releases replaced this function with tf.config.experimental.set_memory_growth:

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth(True)

For TensorFlow 2.2+ (docs):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    # Allocate memory on this GPU on demand instead of all at once
    tf.config.experimental.set_memory_growth(gpu, True)
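One practical detail: set_memory_growth has to be called before the GPUs are initialized, otherwise TensorFlow raises a RuntimeError. A minimal sketch that wraps the loop with that check, modeled on the pattern in the TensorFlow GPU guide; the function name `enable_memory_growth` is hypothetical, and TensorFlow is imported only when it is called.

```python
def enable_memory_growth() -> int:
    """Turn on memory growth for every visible GPU; return how many were configured."""
    import tensorflow as tf  # lazy import: requires TensorFlow 2.x at call time

    gpus = tf.config.experimental.list_physical_devices("GPU")
    try:
        # Memory growth should be set the same way on all GPUs.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as err:
        # Raised if the GPUs were already initialized (e.g. a model was built first).
        print(f"Could not enable memory growth: {err}")
        return 0
    return len(gpus)
```

Calling it as the first thing in the program, before building any model, avoids the RuntimeError.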

The docs also list some more methods:

Set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true.
Use tf.config.experimental.set_virtual_device_configuration to set a hard limit on a virtual GPU device.
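The hard-limit route answers the original question most directly, since it caps each GPU at a fixed size such as 4 GB instead of a fraction. A minimal sketch, assuming TensorFlow 2.x; memory_limit is given in megabytes, the helper names are hypothetical, and, like memory growth, it must run before the GPUs are initialized. Newer releases also expose the same function as tf.config.set_logical_device_configuration.

```python
def gigabytes_to_megabytes(gb: float) -> int:
    """VirtualDeviceConfiguration's memory_limit is expressed in MB."""
    return int(gb * 1024)


def cap_gpu_memory(memory_limit_mb: int) -> int:
    """Cap every visible GPU at memory_limit_mb; return how many GPUs were capped."""
    import tensorflow as tf  # lazy import: requires TensorFlow 2.x at call time

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # One virtual device per physical GPU, limited to memory_limit_mb.
        tf.config.experimental.set_virtual_device_configuration(
            gpu,
            [tf.config.experimental.VirtualDeviceConfiguration(
                memory_limit=memory_limit_mb)],
        )
    return len(gpus)
```

Calling cap_gpu_memory(gigabytes_to_megabytes(4)) at the start of a training script would limit each visible GPU to 4096 MB.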

2019-04-05 18:26:37
