How to prevent TensorFlow from allocating the totality of a GPU's memory?

I work in an environment where computational resources are shared: we have a few server machines, each equipped with a few Nvidia Titan X GPUs.

For small to moderate size models, the 12 GB of the Titan X is usually enough for 2–3 people to run training concurrently on the same GPU. If the models are small enough that a single model does not take full advantage of all the computational units of the GPU, this can actually result in a speedup compared with running one training process after the other. Even in cases where the concurrent access to the GPU does slow down the individual training time, it is still nice to have the flexibility of having multiple users simultaneously train on the GPU.

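The problem with TensorFlow is that, by default, it allocates the full amount of available GPU memory when it is launched. Even for a small two-layer neural network, I see that all 12 GB of the GPU memory is used up.

Is there a way to make TensorFlow allocate only, say, 4 GB of GPU memory, if one knows that this is enough for a given model?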

Current answer

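All the answers above refer either to setting the memory to a certain amount in TensorFlow 1.X or to allowing memory growth in TensorFlow 2.X.

The method tf.config.experimental.set_memory_growth does work for allowing dynamic growth during allocation/preprocessing. Nevertheless, one may prefer to allocate a specific upper limit of GPU memory from the start.

The logic behind allocating a specific amount of GPU memory is also to prevent out-of-memory (OOM) errors during training. For example, if one trains while opening Chrome tabs or any other process that consumes video memory, tf.config.experimental.set_memory_growth(gpu, True) can result in OOM errors being thrown, hence the need, in certain cases, to allocate more memory from the start.

The recommended and correct way to allot memory per GPU in TensorFlow 2.X is the following: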

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
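To connect this with the 4 GB figure from the question, here is a minimal sketch that applies the same kind of cap to every visible GPU rather than only the first one. It assumes a recent TensorFlow 2.x release in which `tf.config.list_physical_devices`, `tf.config.set_logical_device_configuration`, and `tf.config.LogicalDeviceConfiguration` are available outside `tf.config.experimental`. The helper name `limit_gpu_memory` and the 4096 MB budget are illustrative choices, not part of the original answer, and the guarded import is there only so the sketch also loads on a machine without TensorFlow.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the sketch load on machines without TensorFlow
    tf = None


def limit_gpu_memory(limit_mb):
    """Cap TensorFlow at limit_mb MB on every visible GPU; return the GPU count."""
    if tf is None:
        return 0
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            tf.config.set_logical_device_configuration(
                gpu,
                [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mb)],
            )
        except RuntimeError as err:
            # Raised when the GPU has already been initialized; the limit
            # has to be set before any op runs on the device.
            print(err)
    return len(gpus)


# Hypothetical budget: 4 GB per GPU, matching the figure in the question.
num_gpus = limit_gpu_memory(4096)
print("GPUs found:", num_gpus)
```

The call has to happen at the very start of the program, before any tensors or models are created, because the limit cannot be changed once the device is initialized.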

2020-08-28 09:54:59

Other answers

For TensorFlow 2.0 and 2.1, use the following snippet:

 import tensorflow as tf
 gpu_devices = tf.config.experimental.list_physical_devices('GPU')
 tf.config.experimental.set_memory_growth(gpu_devices[0], True)

For earlier versions, the following snippet worked for me:

import tensorflow as tf
tf_config=tf.ConfigProto()
tf_config.gpu_options.allow_growth=True
sess = tf.Session(config=tf_config)
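Memory growth can also be requested without touching the TensorFlow configuration objects, through the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable described in the TensorFlow GPU guide. The sketch below sets it from Python; it assumes the variable is honored by the installed TensorFlow build (the guide notes that this setting is platform specific) and that it is set before TensorFlow initializes the GPU.

```python
import os

# Set the variable before TensorFlow is imported, so that it is already in
# place when the GPU is initialized.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TensorFlow only after the variable is set

print(os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
```

On a shared server, the same variable can instead be exported in the shell that launches the training script, which leaves the script itself unchanged.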

2019-12-01 14:47:55

You can set the fraction of GPU memory to be allocated when you construct a tf.Session by passing a tf.GPUOptions as part of the optional config argument:

# Assume that you have 12GB of GPU memory and want to allocate ~4GB:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

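per_process_gpu_memory_fraction acts as a hard upper bound on the amount of GPU memory that the process will use on each GPU on the same machine. Currently, this fraction is applied uniformly to all of the GPUs on the same machine; there is no way to set it on a per-GPU basis.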
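Since the option takes a fraction rather than an absolute size, the value has to be derived from the card's total memory. A small sketch of that arithmetic follows; the helper name `memory_fraction` is hypothetical, and the 12 GB total is the Titan X figure from the question.

```python
def memory_fraction(target_gb, total_gb):
    """Return the share of total_gb that target_gb represents, as a value in (0, 1]."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    if not 0 < target_gb <= total_gb:
        raise ValueError("target_gb must be in (0, total_gb]")
    return target_gb / total_gb


# 4 GB out of a 12 GB Titan X, the case from the question.
fraction = memory_fraction(4, 12)
print(round(fraction, 3))  # 0.333
```

Because the fraction applies uniformly to every GPU on the machine, on a server with mixed cards the same fraction corresponds to a different absolute amount on each card.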

2015-12-10 11:00:19

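All the answers above assume execution with a sess.run() call, which is becoming the exception rather than the rule in recent versions of TensorFlow.

When using the tf.estimator framework (TensorFlow 1.4 and above), the way to pass the fraction along to the implicitly created MonitoredTrainingSession is: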

opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
conf = tf.ConfigProto(gpu_options=opts)
trainingConfig = tf.estimator.RunConfig(session_config=conf, ...)
tf.estimator.Estimator(model_fn=..., 
                       config=trainingConfig)

Similarly, in Eager mode (TensorFlow 1.5 and above),

import tensorflow.contrib.eager as tfe  # TF 1.x only; tf.contrib was removed in TF 2

opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
conf = tf.ConfigProto(gpu_options=opts)
tfe.enable_eager_execution(config=conf)

Edit: 11-04-2018 As an example, if you are using tf.contrib.gan.train, then you can use something similar to the following:

tf.contrib.gan.gan_train(........, config=conf)

2018-02-08 03:25:04

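I tried to train a U-Net on the VOC dataset, but because of the huge image size, I ran out of memory. I tried all of the tips above, and even tried batch size == 1, yet saw no improvement. Sometimes the TensorFlow version also causes memory issues. Try using

pip install tensorflow-gpu==1.8.0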

2018-10-16 06:05:52

