H.T.U Tensorflow - refraction-ray/TH2-demos GitHub Wiki
Using TensorFlow on the Tianhe HPC
-- LiJiang 20161215
The latest TensorFlow is now deployed on the GPU partition (LN41).
-- LiJiang 20170623
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ module avail tensorflow
---------------------------- /WORK/app/modulefiles -----------------------------
tensorflow/0.10.0rc0 tensorflow/0.11.0
[nscc-gz_jiangli@ln2%tianhe2-C ~]$
As shown, two versions are currently deployed: tensorflow/0.10.0rc0 and tensorflow/0.11.0.
They are used in slightly different ways.
I put a simple test script at /WORK/app/tensorflow/test/hello.py:
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ cat /WORK/app/tensorflow/test/hello.py
#!/usr/bin/env python
import tensorflow as tf
hello = tf.constant("Hello, TensorFLow")
sess = tf.Session()
print ( sess.run(hello) )
a = tf.constant(10)
b = tf.constant(32)
print ( sess.run(a+b) )
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ module load tensorflow/0.10.0rc0
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ which python
/WORK/app/TensorFlow/anaconda2/bin/python
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ python --version
Python 2.7.12 :: Anaconda 4.2.0 (64-bit)
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ yhrun -n 1 python /WORK/app/tensorflow/test/hello.py
Hello, TensorFLow
42
As shown above, tensorflow/0.10.0rc0 can be used right after module load.
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ module load tensorflow/0.11.0
####################################################################
To use TensorFlow you have to activate the conda environment by executing these two commands:
1. settf
2. source activate tensorflow_0_11_0
(tensorflow)$ # Your prompt should change.
# Run Python programs that use TensorFlow.
# ...
# When you are done using TensorFlow, deactivate the environment.
(tensorflow)$ source deactivate tensorflow_0_11_0
####################################################################
[nscc-gz_jiangli@ln2%tianhe2-C ~]$
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ yhrun -n 1 python /WORK/app/tensorflow/test/hello.py
Traceback (most recent call last):
File "/WORK/app/tensorflow/test/hello.py", line 2, in <module>
import tensorflow as tf
ImportError: No module named tensorflow
yhrun: error: cn11642: task 0: Exited with exit code 1
As shown above, module load alone is not enough for tensorflow/0.11.0; the two extra steps from the module message are also needed:
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ settf
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ source activate tensorflow_0_11_0
(tensorflow_0_11_0) [nscc-gz_jiangli@ln2%tianhe2-C ~]$
(tensorflow_0_11_0) [nscc-gz_jiangli@ln2%tianhe2-C ~]$ yhrun -n 1 python /WORK/app/tensorflow/test/hello.py
Traceback (most recent call last):
File "/WORK/app/tensorflow/test/hello.py", line 2, in <module>
import tensorflow as tf
File "/WORK/app/anaconda/4.2.0/envs/tensorflow_0_11_0/lib/python2.7/site-packages/tensorflow/__init__.py", line 23, in <module>
from tensorflow.python import *
File "/WORK/app/anaconda/4.2.0/envs/tensorflow_0_11_0/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/WORK/app/anaconda/4.2.0/envs/tensorflow_0_11_0/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/WORK/app/anaconda/4.2.0/envs/tensorflow_0_11_0/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /WORK/app/anaconda/4.2.0/envs/tensorflow_0_11_0/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so)
yhrun: error: cn5987: task 0: Exited with exit code 1
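The ImportError above reports that the TensorFlow shared library requires the symbol version GLIBC_2.14, which the C library on that compute node does not provide. To compare nodes, a minimal sketch like the following could be run on each of them. It uses only the Python standard library (platform.libc_ver) and is written to work under the Python 2.7 shipped with the conda environment. The helper names parse_version and glibc_is_at_least are made up for this illustration, and the (2, 14) threshold is taken from the error message.

```python
from __future__ import print_function

import platform

# Minimum glibc version named in the ImportError above (GLIBC_2.14).
REQUIRED = (2, 14)


def parse_version(text):
    """Turn a version string such as '2.12' into a tuple of ints, e.g. (2, 12)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as '-1' or 'el6'
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def glibc_is_at_least(required, found=None):
    """Return True if the glibc version string `found` is >= `required`.

    When `found` is omitted, ask the running interpreter; platform.libc_ver()
    returns an empty version string if it cannot tell, which counts as False.
    """
    if found is None:
        found = platform.libc_ver()[1]
    version = parse_version(found)
    return bool(version) and version >= required


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    print("node:", platform.node())
    print("C library:", lib or "unknown", version or "unknown")
    print("meets GLIBC_2.14:", glibc_is_at_least(REQUIRED))
```

Saved under a hypothetical name such as check_glibc.py, the script could be run once directly and once through yhrun -n 1 python check_glibc.py to see whether the two environments report different glibc versions. This only helps narrow down the failure; it does not by itself explain why the two nodes behave differently.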
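Submitting jobs directly with this version of TensorFlow seems to have problems. However, testing shows that it works after allocating a compute node and logging in to it: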
[nscc-gz_jiangli@ln2%tianhe2-C ~]$ yhalloc
yhalloc: Granted job allocation 3907141
[nscc-gz_jiangli@ln2 ~]$ yhq
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3907141 work bash nscc-gz_jian R 0:02 1 cn5060
[nscc-gz_jiangli@ln2 ~]$ ssh cn5060
Warning: Permanently added 'cn5060' (RSA) to the list of known hosts.
[nscc-gz_jiangli@cn5060%tianhe2-C ~]$ module load tensorflow/0.11.0
####################################################################
To use TensorFlow you have to activate the conda environment by executing these two commands:
1. settf
2. source activate tensorflow_0_11_0
(tensorflow)$ # Your prompt should change.
# Run Python programs that use TensorFlow.
# ...
# When you are done using TensorFlow, deactivate the environment.
(tensorflow)$ source deactivate tensorflow_0_11_0
####################################################################
[nscc-gz_jiangli@cn5060%tianhe2-C ~]$ settf
[nscc-gz_jiangli@cn5060%tianhe2-C ~]$ source activate tensorflow_0_11_0
(tensorflow_0_11_0) [nscc-gz_jiangli@cn5060%tianhe2-C ~]$ python /WORK/app/tensorflow/test/hello.py
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] Couldn't open CUDA library libcuda.so.1. LD_LIBRARY_PATH: /WORK/app/CUDA/8.0/libnvvp:/WORK/app/CUDA/8.0/libnsight:/WORK/app/CUDA/8.0/lib64:/WORK/app/CUDA/8.0/lib:/WORK/app/cudnn/5.1-CUDA8.0/lib64:/WORK/app/gcc/4.9.2/lib64:/WORK/app/gcc/4.9.2/lib:/WORK/app/gcc/4.9.2/libexec:/WORK/app/mpc/0.8.1/lib:/WORK/app/MPFR/2.4.2/lib:/WORK/app/gmp/4.3.2/lib:/opt/intel/mic/coi/host-linux-release/lib
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:160] hostname: cn5060
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:185] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] kernel reported version is: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1080] LD_LIBRARY_PATH: /WORK/app/CUDA/8.0/libnvvp:/WORK/app/CUDA/8.0/libnsight:/WORK/app/CUDA/8.0/lib64:/WORK/app/CUDA/8.0/lib:/WORK/app/cudnn/5.1-CUDA8.0/lib64:/WORK/app/gcc/4.9.2/lib64:/WORK/app/gcc/4.9.2/lib:/WORK/app/gcc/4.9.2/libexec:/WORK/app/mpc/0.8.1/lib:/WORK/app/MPFR/2.4.2/lib:/WORK/app/gmp/4.3.2/lib:/opt/intel/mic/coi/host-linux-release/lib
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1081] failed to find libcuda.so on this system: Failed precondition: could not dlopen DSO: libcuda.so.1; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
E tensorflow/stream_executor/cuda/cuda_driver.cc:491] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:140] kernel driver does not appear to be running on this host (cn5060): /proc/driver/nvidia/version does not exist
Hello, TensorFLow
42
(tensorflow_0_11_0) [nscc-gz_jiangli@cn5060%tianhe2-C ~]$
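The output is rather verbose, but the computation still ran correctly.
- To use the GPU, please use the corresponding partition.
- We hope users can help by contributing performance benchmark data or test cases.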
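The CUDA messages in the log above come from a node without a usable NVIDIA driver: TensorFlow could not read /proc/driver/nvidia/version and fell back to the CPU. Before starting a longer job, a quick check of that same path can show whether the current node looks like a GPU node. The sketch below assumes the path from the log is the right one to test. The function name nvidia_driver_visible is made up for this illustration, and a True result only means the driver file is readable, not that TensorFlow will necessarily use the GPU.

```python
from __future__ import print_function

import os
import platform

# Path that TensorFlow's diagnostics tried to read in the log above.
NVIDIA_VERSION_PATH = "/proc/driver/nvidia/version"


def nvidia_driver_visible(path=NVIDIA_VERSION_PATH):
    """Return True if the driver version file exists and is readable."""
    return os.path.isfile(path) and os.access(path, os.R_OK)


if __name__ == "__main__":
    if nvidia_driver_visible():
        print(platform.node(), "- NVIDIA driver file is readable")
    else:
        print(platform.node(), "- no readable NVIDIA driver file; expect CPU only")
```

On a node like cn5060 in the session above, this would be expected to take the CPU-only branch, since the log reports that the driver version file could not be opened.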