How to compile TensorFlow with SSE4.2 and AVX instructions?

This is the output I get when running a script to check whether TensorFlow is working:

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

I noticed that it mentions SSE4.2 and AVX.

What are SSE4.2 and AVX? How do they improve CPU computations for TensorFlow tasks? And how do I get TensorFlow compiled with them?

Current answer

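These are SIMD vector processing instruction sets.

Using vector instructions is faster for many tasks; machine learning is one such task.

Quoting the TensorFlow installation docs:

To be compatible with as wide a range of machines as possible, TensorFlow defaults to only using SSE4.1 SIMD instructions on x86 machines. Most modern PCs and Macs support more advanced instructions, so if you're building a binary that you'll only be running on your own machine, you can enable these by using --copt=-march=native in your bazel build command.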
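Before rebuilding, it can help to see which of these instruction sets your CPU actually reports. The following is only a sketch: it assumes a Linux machine where /proc/cpuinfo has a "flags" line, and the helper names (parse_cpu_flags, supported_simd) are made up for illustration.

```python
# Sketch: list which SIMD extensions the CPU advertises (Linux only).
# Assumption: /proc/cpuinfo contains a "flags" line; helper names are illustrative.

WANTED = ("sse4_1", "sse4_2", "avx", "avx2", "fma")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_simd(cpuinfo_text, wanted=WANTED):
    """Map each wanted extension name to True/False."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not Linux, or the file is unreadable
    for name, present in supported_simd(text).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

With -march=native the compiler makes this choice for you; a check like this is mainly useful if you want to pass individual flags instead.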

2016-12-29 21:28:01

Other answers

To hide these warnings, you can do this before your actual code.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
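Note that this only hides the messages; it does not make TensorFlow use the instructions. The variable has to be set before TensorFlow is imported. As far as I know the levels are: 0 shows everything, 1 hides INFO, 2 also hides WARNING, 3 also hides ERROR. A small sketch, where quiet_tf_logs is a hypothetical helper and not part of TensorFlow:

```python
import os

# Assumed meaning of TF_CPP_MIN_LOG_LEVEL values:
# 0 = show all, 1 = hide INFO, 2 = hide INFO and WARNING, 3 = hide ERROR too.
_VALID_LEVELS = {0, 1, 2, 3}


def quiet_tf_logs(level=2):
    """Set TF_CPP_MIN_LOG_LEVEL; call this before `import tensorflow`."""
    if level not in _VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(_VALID_LEVELS)}")
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(level)
    return os.environ["TF_CPP_MIN_LOG_LEVEL"]


quiet_tf_logs(2)
# import tensorflow as tf  # import only after the variable is set
```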

2017-08-12 18:44:53

To compile TensorFlow with SSE4.2 and AVX, you can directly use

bazel build --config=mkl --config="opt" --copt="-march=broadwell" --copt="-O3" //tensorflow/tools/pip_package:build_pip_package

Source: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-cpu-mkl

2018-06-16 08:46:23

TensorFlow 2.0 compatible solution:

Run the commands below in a terminal (Linux/macOS) or Command Prompt (Windows) to build TensorFlow 2.0 with Bazel:

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow

#The repo defaults to the master development branch. You can also checkout a release branch to build:
git checkout r2.0

#Configure the Build => Use the Below line for Windows Machine
python ./configure.py 

#Configure the Build => Use the Below line for Linux/MacOS Machine
./configure
#This script prompts you for the location of TensorFlow dependencies and asks for additional build configuration options. 

#Build Tensorflow package

#CPU support
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package 

#GPU support
bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package

2019-11-28 12:04:44

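I put together a small Bash script for Mac (it can easily be ported to Linux) that retrieves all CPU features and applies some of them when building TF. I'm on TF master and use it fairly often (a couple of times a month).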

https://gist.github.com/venik/9ba962c8b301b0e21f99884cbd35082f

2017-08-18 06:04:43

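I'll answer your third question first:

If you want to run a self-compiled version inside a conda env, you can. These are the general instructions I ran to get TensorFlow and install it on my system. Note: this build was for an AMD A10-7850 (check which instructions your CPU supports; it may differ) running Ubuntu 16.04 LTS. I use Python 3.5 inside my conda env. Credit goes to the TensorFlow source install page and the answers above.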

git clone https://github.com/tensorflow/tensorflow 
# Install Bazel
# https://bazel.build/versions/master/docs/install.html
sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
# Create your virtual env with conda.
source activate YOUR_ENV
pip install six numpy wheel packaging appdirs
# Follow the configure instructions at:
# https://www.tensorflow.org/install/install_sources
# Build as shown below. Note: check which instructions your CPU
# supports. If resources are limited, consider adding the flag
# --local_resources 2048,.5,1.0 . This limits how much RAM and other
# local resources the build uses, at the cost of a longer compile.
bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2  -k //tensorflow/tools/pip_package:build_pip_package
# Create the wheel like so:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# Inside your conda env:
pip install /tmp/tensorflow_pkg/NAME_OF_WHEEL.whl
# Then install the rest of your stack
pip install keras jupyter  # and whatever else you need

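Regarding your second question:

In my opinion, a self-compiled version with optimizations is well worth the effort. On my setup, calculations that used to take 560-600 seconds now take only about 300 seconds! The exact numbers will vary, but I think you can expect roughly a 35-50% speed improvement on your particular setup.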
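To put those timings in perspective, here is the arithmetic as a short sketch (the function names are just for illustration). Going from 560-600 seconds to 300 seconds means about 46-50% less wall-clock time, or roughly a 1.9-2x speedup on that particular machine.

```python
def time_saved_fraction(old_seconds, new_seconds):
    """Fraction of wall-clock time removed by the optimized build."""
    return (old_seconds - new_seconds) / old_seconds


def speedup_factor(old_seconds, new_seconds):
    """How many times faster the optimized build runs."""
    return old_seconds / new_seconds


for old in (560, 600):
    saved = time_saved_fraction(old, 300)
    factor = speedup_factor(old, 300)
    print(f"{old}s -> 300s: {saved:.0%} less time, {factor:.2f}x faster")
```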

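Finally, your first question:

A lot of the answer has already been given above. To summarize: AVX, SSE4.1, SSE4.2 and FMA are different kinds of extended instruction sets on x86 CPUs. Many of them contain optimized instructions for matrix or vector operations.

I'll highlight my own misconception in the hope of saving you some time: SSE4.2 is not a newer version of the instructions that supersedes SSE4.1. SSE4 = SSE4.1 (a set of 47 instructions) + SSE4.2 (a set of 7 instructions).

In the context of compiling TensorFlow, if your computer supports AVX2 and AVX as well as SSE4.1 and SSE4.2, you should pass the optimization flags for all of them. Don't do what I did and use only SSE4.2 on the assumption that it is newer and supersedes SSE4.1. That is clearly wrong! I had to recompile because of it, which cost me a good 40 minutes.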
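One way to avoid forgetting a flag is to derive the --copt list from the features your CPU reports. This is a rough sketch, not an official tool: it assumes the feature names are spelled as in /proc/cpuinfo, that the compiler is GCC or Clang (which accept these -m options), and the function name bazel_build_command is invented for the example.

```python
# Sketch: turn a set of CPU feature names into a bazel command line.
# Assumptions: feature names use /proc/cpuinfo spelling; -m flags are GCC/Clang options.

FLAG_FOR_FEATURE = {
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
}

TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def bazel_build_command(cpu_features):
    """Return the bazel argv with one --copt per supported feature."""
    copts = [
        f"--copt={flag}"
        for name, flag in FLAG_FOR_FEATURE.items()
        if name in cpu_features
    ]
    return ["bazel", "build", "-c", "opt", *copts, TARGET]


# Example: a CPU with SSE4.1, SSE4.2 and AVX but no AVX2 or FMA.
print(" ".join(bazel_build_command({"sse4_1", "sse4_2", "avx"})))
```

Both SSE4.1 and SSE4.2 get their own flag here, which is exactly the point made above.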

2017-03-30 03:27:20
