How do I get the currently available GPUs in TensorFlow?

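I plan to use distributed TensorFlow, and I have seen that TensorFlow can use GPUs for training and testing. In a cluster environment, each machine may have zero, one, or several GPUs, and I want to run my TensorFlow graph on the GPUs of as many machines as possible.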

I found that when running tf.Session(), TensorFlow reports information about the GPU in its log messages, like this:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)

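My question is how to get information about the currently available GPUs from TensorFlow. I can read the loaded GPU information from the log, but I would like to do it in a more sophisticated, programmatic way. I may also restrict the GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I am not looking for a way to get GPU information from the OS kernel.

In short, I want a function like tf.get_available_gpu() that returns ['/gpu:0', '/gpu:1'] if two GPUs are available on the machine. How can I implement this?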

Current answer

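In addition to the excellent explanation by Mrry, who suggested using device_lib.list_local_devices(), I can show you how to check GPU-related information from the command line.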
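
For reference, the device_lib.list_local_devices() approach mentioned above can be sketched roughly as follows. The helper names gpu_device_names and get_available_gpus are made up for illustration, and the lazy import is only a convenience so that the filtering logic can be tried without TensorFlow installed. The exact device names returned may differ between TensorFlow versions (for example /gpu:0 in older releases and /device:GPU:0 in later ones).

```python
def gpu_device_names(devices):
    """Return the names of all entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def get_available_gpus():
    """List the GPU devices visible to this TensorFlow process."""
    # Lazy import (an illustrative choice): keeps this module importable
    # on machines where TensorFlow is not installed.
    from tensorflow.python.client import device_lib

    return gpu_device_names(device_lib.list_local_devices())
```

Because TensorFlow honors CUDA_VISIBLE_DEVICES, GPUs hidden through that variable should not appear in the result.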

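Because currently only Nvidia's GPUs work with NN frameworks, this answer covers only them. Nvidia has a page documenting how to use the /proc filesystem interface to obtain run-time information about the driver, any installed NVIDIA graphics cards, and the AGP status.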

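/proc/driver/nvidia/gpus/0..N/information provides information about each installed NVIDIA graphics adapter (model name, IRQ, BIOS version, bus type). Note that the BIOS version is only available while X is running.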

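So you can run cat /proc/driver/nvidia/gpus/0/information from the command line and see information about your first GPU. It is easy to run this from Python, and you can also check the second, third, and fourth GPU until it fails.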
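
As a minimal sketch of doing this from Python, the snippet below reads every information file under the directory and parses its "Key: value" lines. The function name read_nvidia_proc_info and its root parameter are hypothetical. It uses a glob instead of counting from 0 because, as far as I know, some driver versions name the subdirectories after PCI bus addresses rather than plain indices. Note that this reads from the kernel driver, so it does not take CUDA_VISIBLE_DEVICES into account.

```python
import glob
import os


def read_nvidia_proc_info(root="/proc/driver/nvidia/gpus"):
    """Parse every <root>/*/information file into a dict of fields."""
    gpus = []
    for path in sorted(glob.glob(os.path.join(root, "*", "information"))):
        fields = {}
        with open(path) as f:
            for line in f:
                # Each line is expected to look like "Model: <tabs> GeForce ...".
                # Split on the first colon only, so values such as a PCI
                # address (which contains colons) stay intact.
                key, sep, value = line.partition(":")
                if sep:
                    fields[key.strip()] = value.strip()
        gpus.append(fields)
    return gpus


if __name__ == "__main__":
    for info in read_nvidia_proc_info():
        print(info.get("Model", "unknown model"))
```

On a machine without the NVIDIA driver loaded the directory does not exist, so the function simply returns an empty list.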

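Mrry's answer is of course more robust, and I am not sure whether mine works on non-Linux machines, but Nvidia's page provides other interesting information that not many people know about.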

2017-07-29 04:31:12

Other answers

Run the following command in any shell:

python -c "import tensorflow as tf; print(\"Num GPUs Available: \", len(tf.config.list_physical_devices('GPU')))"

2022-04-03 20:48:48

Check all the components this way:

from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds


version = tf.__version__
executing_eagerly = tf.executing_eagerly()
hub_version = hub.__version__
# An empty list means TensorFlow found no GPU.
available = tf.config.experimental.list_physical_devices("GPU")

print("Version: ", version)
print("Eager mode: ", executing_eagerly)
print("Hub Version: ", hub_version)
print("GPU is", "available" if available else "NOT AVAILABLE")

2020-01-16 09:16:48

The following works in TensorFlow 2:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, "  Type:", gpu.device_type)

Starting with 2.1, you can drop experimental:

    gpus = tf.config.list_physical_devices('GPU')

https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices

2019-10-07 03:50:01

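The accepted answer gives you the number of GPUs, but it also allocates all of the memory on those GPUs. You can avoid this by creating a session with a fixed, lower memory limit before calling device_lib.list_local_devices(), although creating that session may be unwanted for some applications.

I ended up using nvidia-smi to get the number of GPUs without allocating any memory on them.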

import subprocess

n = str(subprocess.check_output(["nvidia-smi", "-L"])).count('UUID')
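
The one-liner above works because str() on the returned bytes still contains one 'UUID' per GPU. A slightly more defensive variant, offered only as a sketch, decodes the output and counts the lines that start with "GPU ", and returns 0 when nvidia-smi cannot be run. The function names are hypothetical, and the assumed output format is one line per GPU such as "GPU 0: GeForce GTX 1080 (UUID: GPU-...)". As far as I know, nvidia-smi lists every GPU on the machine regardless of CUDA_VISIBLE_DEVICES, so the count can differ from what TensorFlow sees.

```python
import subprocess


def count_gpus_from_listing(listing):
    """Count lines of `nvidia-smi -L` output that describe a GPU."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def count_gpus_with_nvidia_smi():
    """Return the GPU count reported by nvidia-smi, or 0 if it cannot be run."""
    try:
        out = subprocess.check_output(["nvidia-smi", "-L"])
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or exited with an error (e.g. no driver loaded).
        return 0
    return count_gpus_from_listing(out.decode("utf-8", errors="replace"))
```

Keeping the parsing in its own function makes it possible to check the counting logic against a saved listing on a machine that has no GPU.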

2018-10-12 04:22:29
