Install TensorFlow @ LN41 - refraction-ray/TH2-demos GitHub Wiki

使用说明：

用户可以通过module 加载或 source xxxx/active . 因为这是GPU 分区，就只配置 GPU 版本的 Module 了，参见Quick Example.

Quick Example

[nscc_ts_gpu@ln41%tianhe2-G ~]$ module avail TensorFlow

------------------------ /BIGDATA/app/modulefiles ---------------------------------------
TensorFlow/1.0.1-gpu-py2   TensorFlow/1.0.1-gpu-py3.6

[nscc_ts_gpu@ln41%tianhe2-G ~]$ module load TensorFlow/1.0.1-gpu-py3.6
[nscc_ts_gpu@ln41%tianhe2-G ~]$ yhrun -n 1 python /BIGDATA/app/TensorFlow/testtf.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x25d7160
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:05:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x25dafa0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x25dede0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:85:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:84:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:85:00.0)
Version of TensorFlow : 1.0.1
b'Hello, TensorFlow!'
[nscc_ts_gpu@ln41%tianhe2-G ~]$

基本版

Py 2.7.9 , CPU ONLY , tensorflow 1.0.1

安装Python

解压python 安装包，配置并安装 ,配置环境变量

$ ./configure --prefix=/BIGDATA/app/Python/2.7.9  --with-ensurepip   --with-threads --enable-shared --enable-unicode=ucs4
$ make -j 12
$ make install
$ export PATH=/BIGDATA/app/Python/2.7.9/bin:$PATH
$ export LD_LIBRARY_PATH=/BIGDATA/app/Python/2.7.9/lib:$LD_LIBRARY_PATH

安装 TensorFlow

安装 virtualenv , 并创建环境并加装

$ pip install virtualenv
$ virtualenv --system-site-packages /BIGDATA/app/TensorFlow/python-venv/py2.9
$ source /BIGDATA/app/TensorFlow/python-venv/py2.9/bin/activate

安装 tensorflow 并测试

(py2.9) $ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.1-cp27-none-linux_x86_64.whl

(py2.9) $ python  /BIGDATA/app/TensorFlow/testtf.py
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Hello, TensorFlow!

可以看出 tensorflow 正常工作了，但 W 也指出了可以通过针对指令集编译 TensorFLow 的库来提高运算性能，这方面需要之后再探究了。

URL 等参考 TensorFlow 官方指南安装过程 URL

其他

第一次安装时直接 ./configure 安装的Python , 导致 tensorflow 安装完出现： undefined symbol: PyUnicodeUCS4_AsUTF8String , 后来参考 centos7.1搭建tensorflow-1.0.1运行环境

GPU 版

Py 2.7.9 , GPU , tensorflow 1.0.1

安装

即使下载了whl文件，安装过程中还是会有网络请求。

$ virtualenv --system-site-packages /BIGDATA/app/TensorFlow/python-venv/py2.9-gpu
$ source /BIGDATA/app/TensorFlow/python-venv/py2.9-gpu/bin/activate
(py2.9-gpu) $ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.1-cp27-none-linux_x86_64.whl

测试

(py2.9-gpu) $ yhrun -n 1 python /BIGDATA/app/TensorFlow/testtf.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: /BIGDATA/app/CUDA/8.0/lib64/stubs:/BIGDATA/app/CUDA/8.0/libnvvp:/BIGDATA/app/CUDA/8.0/libnsight:/BIGDATA/app/CUDA/8.0/lib64:/BIGDATA/app/CUDA/8.0/lib:/BIGDATA/app/Python/2.7.9/lib:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1f489d0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:05:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1f4c350
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1f4fcd0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:85:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:84:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:85:00.0)
('Version of TensorFlow :', '1.0.1')
Hello, TensorFlow!

~~从测试结果可以看出GPU 版本能正常工作，可以提高性能之处除了针对指令集的编译外还有使用 cuDNN DSO .~~ 目前已经在module 中加了 cudnn/5.1-CUDA8.0 ，可以正常利用cuDNN 了。

其他说明

如果需要在环境中添加PYTHON 包可以和我联系
如果需要的PYTHON 包太多，特别是有很多非通用的包的话可以在自己的账号目录下创建 virtualenv 环境
自己的PYTHON 环境需要通过 PIP 安装包时可以联系我使用PROXY.

PY3 版本

安装过程和之前过程一致,安装GPU 版，直接上history ：

$ wget https://www.python.org/ftp/python/3.6.1/Python-3.6.1.tgz
$ tar xvf Python-3.6.1.tgz
$ cd Python-3.6.1/
$ ./configure --prefix=/BIGDATA/app/Python/3.6.1  --with-ensurepip   --with-threads --enable-shared --enable-unicode=ucs4
$ make -j 12
$ make install
$ export PATH=/BIGDATA/app/Python/3.6.1/bin:$PATH
$ export LD_LIBRARY_PATH=/BIGDATA/app/Python/3.6.1/lib:$LD_LIBRARY_PATH
$ which python3
/BIGDATA/app/Python/3.6.1/bin/python3
$ which pip3
/BIGDATA/app/Python/3.6.1/bin/pip3
$ pip3 install pandas scipy scikit-learn virtualenv
$ which virtualenv
/BIGDATA/app/Python/3.6.1/bin/virtualenv
$ mkdir -p /BIGDATA/app/TensorFlow/python-venv/py3.6.1-gpu
$ source /BIGDATA/app/TensorFlow/python-venv/py3.6.1-gpu/bin/activate
(py3.6.1-gpu) $ which python
/BIGDATA/app/TensorFlow/python-venv/py3.6.1-gpu/bin/python
(py3.6.1-gpu) $ python --version
Python 3.6.1
(py3.6.1-gpu) $ which pip
/BIGDATA/app/TensorFlow/python-venv/py3.6.1-gpu/bin/pip
(py3.6.1-gpu) $ pip install  --upgrade  https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.1-cp36-cp36m-linux_x86_64.whl

测试


(py3.6.1-gpu) $ yhrun -n 1 python /BIGDATA/app/TensorFlow/testtf.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1b1e010
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:05:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1b21e50
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1b25c90
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:85:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:84:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:85:00.0)
Version of TensorFlow : 1.0.1
b'Hello, TensorFlow!'
(py3.6.1-gpu) $