How do I get the currently available GPUs in TensorFlow?

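I plan to use distributed TensorFlow, and I have seen that TensorFlow can use GPUs for training and testing. In a cluster environment, each machine may have zero, one, or several GPUs, and I want to run my TensorFlow graph on GPUs on as many machines as possible.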

I found that when running tf.Session(), TensorFlow prints information about the GPU in its log messages, like this:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)

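My question is: how do I get information about the currently available GPUs from TensorFlow? I can read the loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way. I may also deliberately restrict the GPUs with the CUDA_VISIBLE_DEVICES environment variable, so I am not looking for a way to get GPU information from the OS kernel.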

In short, I want a function like tf.get_available_gpu() that returns ['/gpu:0', '/gpu:1'] if the machine has two GPUs available. How can I implement this?

Current answer

The approach recommended in recent versions of TensorFlow:

tf.config.list_physical_devices('GPU')
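To get something shaped like the hypothetical tf.get_available_gpu() from the question, one option is to wrap the logical-device listing. This is only a minimal sketch, assuming a recent TensorFlow 2.x release. The helper names get_available_gpus and to_legacy_names are made up for illustration, and the short '/gpu:N' spelling is simply the format the question asks for (TensorFlow itself reports logical GPUs as '/device:GPU:N').

```python
def to_legacy_names(device_names):
    """Convert names like '/device:GPU:0' to the short '/gpu:0' form."""
    short = []
    for name in device_names:
        # The device index is the part after the last colon.
        index = name.rsplit(":", 1)[-1]
        short.append(f"/gpu:{index}")
    return short


def get_available_gpus():
    """Return the GPUs TensorFlow can use, e.g. ['/gpu:0', '/gpu:1']."""
    # Imported here so to_legacy_names() stays usable without TensorFlow.
    import tensorflow as tf

    # Only GPUs exposed through CUDA_VISIBLE_DEVICES are listed. Listing
    # logical devices initializes the runtime, so any device configuration
    # (memory growth, visible devices) has to happen before this call.
    gpus = tf.config.list_logical_devices("GPU")
    return to_legacy_names(d.name for d in gpus)
```

Calling get_available_gpus() on a machine without a usable GPU should simply return an empty list.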

2021-12-14 10:47:46

Other answers

My machine has an NVIDIA GeForce GTX 1650 Ti GPU, with tensorflow-gpu==2.2.0 installed.

Run the following two lines of code:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

Output:

Num GPUs Available:  1

2020-05-30 10:57:00

You can use the following code snippet to show each device's name, type, memory, and locality.

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

2023-01-13 06:50:12

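The accepted answer gives you the number of GPUs, but it also allocates all the memory on those GPUs. You can avoid this by creating a session with a fixed, lower memory limit before calling device_lib.list_local_devices(), although that may be unwanted for some applications.

I ended up using nvidia-smi to get the number of GPUs without allocating any memory on them.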

import subprocess

# nvidia-smi -L prints one line per GPU, each containing that GPU's UUID
n = subprocess.check_output(["nvidia-smi", "-L"]).decode().count('UUID')
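A slightly more defensive variant of the same idea is sketched below. It is not part of the original answer: the function names parse_gpu_list and count_gpus_via_nvidia_smi are made up, and it assumes that each GPU appears in the `nvidia-smi -L` output as a line starting with "GPU ". It also returns 0 instead of raising when nvidia-smi is missing or fails. Keep in mind that nvidia-smi reports what the driver sees and, as far as I know, does not take CUDA_VISIBLE_DEVICES into account, so the count can differ from what TensorFlow will actually use.

```python
import subprocess


def parse_gpu_list(listing):
    """Return one entry per GPU line found in `nvidia-smi -L` output."""
    lines = (line.strip() for line in listing.splitlines())
    return [line for line in lines if line.startswith("GPU ")]


def count_gpus_via_nvidia_smi():
    """Count GPUs without importing TensorFlow or allocating GPU memory."""
    try:
        out = subprocess.check_output(["nvidia-smi", "-L"])
    except (OSError, subprocess.CalledProcessError):
        # Tool not installed, or it exited with an error: report zero GPUs.
        return 0
    return len(parse_gpu_list(out.decode()))
```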

2018-10-12 04:22:29

In TensorFlow Core v2.3.0, the following code should work.

import tensorflow as tf
visible_devices = tf.config.get_visible_devices()
for device in visible_devices:
  print(device)

Depending on your environment, this code will produce output like the following.

PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')

2020-11-19 07:58:03

You can use the following code to list all devices:

from tensorflow.python.client import device_lib

device_lib.list_local_devices()
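To narrow that list down to GPUs only, the device_type attribute of each returned entry can be filtered. The sketch below assumes the entries expose name and device_type attributes, as the printed output suggests. The helper names gpu_names and list_gpus_with_device_lib are hypothetical, and device_lib lives under tensorflow.python, which is not a public, stable API.

```python
def gpu_names(devices):
    """Keep only the names of devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_gpus_with_device_lib():
    """Return the names of the local GPU devices TensorFlow reports."""
    # Imported lazily so gpu_names() can be used without TensorFlow.
    from tensorflow.python.client import device_lib

    # Note: this call initializes the GPUs and, by default, may reserve
    # memory on them, as another answer in this thread points out.
    return gpu_names(device_lib.list_local_devices())
```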

2017-07-19 06:52:44
