JetsonTX2 Tensorflow - eiichiromomma/CVMLAB GitHub Wiki

Installing TensorFlow 1.2 on the Jetson TX2 (original source)

Preparing the source and dependencies

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/
git checkout v1.2.1
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install zip unzip autoconf automake libtool curl zlib1g-dev maven -y
sudo apt-get install python-numpy swig python-dev python-pip python-wheel -y

Building Bazel

Download the source zip of the latest version (0.5.2 at the time of writing) from the Bazel Releases page. The build takes quite a long time.

cd
mkdir bazel
cd bazel
unzip ../Downloads/bazel-0.5.2-dist.zip
./compile.sh
sudo cp output/bazel /usr/local/bin/

The Bazel sources take up a lot of storage, so delete them afterwards.

cd
rm -rf bazel

Creating a swap file

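The TX2 has only 8 GB of memory, which is not enough for this build, so create a swap file. With it in place the build no longer dies partway through.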

fallocate -l 8G swapfile
chmod 600 swapfile
mkswap swapfile
sudo swapon swapfile
swapon -s
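
As a cross-check on `swapon -s`, the swap size can also be read from `/proc/meminfo`. The following is only a minimal sketch, written for Python 3; the function names `parse_meminfo` and `swap_total_gib` are made up for this example and are not part of any tool used on this page.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (kB for memory fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_total_gib(meminfo_text):
    """Total configured swap in GiB, or 0.0 if SwapTotal is not reported."""
    return parse_meminfo(meminfo_text).get("SwapTotal", 0) / (1024.0 * 1024.0)


def read_swap_total_gib(path="/proc/meminfo"):
    """Convenience wrapper that reads the live file on a Linux system."""
    with open(path) as f:
        return swap_total_gib(f.read())
```

After the `swapon` above, `read_swap_total_gib()` should report roughly 8 GiB if the swap file is active.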

Building TensorFlow

Modifying files

Two files need to be changed: cuda_gpu_executor.cc and workspace.bzl.

diff --git a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
index c1e72bb..a829c81 100644
--- a/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
+++ b/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc
@@ -853,6 +853,8 @@ CudaContext* CUDAExecutor::cuda_context() { return context_; }
 // For anything more complicated/prod-focused than this, you'll likely want to
 // turn to gsys' topology modeling.
 static int TryToReadNumaNode(const string &pci_bus_id, int device_ordinal) {
+LOG(INFO) << "ARM has no NUMA node, hardcoding to return zero";
+return 0;
 #if defined(__APPLE__)
   LOG(INFO) << "OS X does not support NUMA - returning NUMA node zero";
   return 0;
diff --git a/tensorflow/workspace.bzl b/tensorflow/workspace.bzl
index 2a206b0..c060fce 100644
--- a/tensorflow/workspace.bzl
+++ b/tensorflow/workspace.bzl
@@ -150,11 +150,15 @@ def tf_workspace(path_prefix="", tf_repo_name=""):
   native.new_http_archive(
       name = "eigen_archive",
       urls = [
-          "http://mirror.bazel.build/bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
-          "https://bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
-      ],
-      sha256 = "ca7beac153d4059c02c8fc59816c82d54ea47fe58365e8aded4082ded0b820c4",
-      strip_prefix = "eigen-eigen-f3a22f35b044",
+"http://mirror.bazel.build/bitbucket.org/eigen/eigen/get/d781c1de9834.tar.gz",
+"https://bitbucket.org/eigen/eigen/get/d781c1de9834.tar.gz",
+          #"http://mirror.bazel.build/bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
+          #"https://bitbucket.org/eigen/eigen/get/f3a22f35b044.tar.gz",
+      ],
+sha256 = "a34b208da6ec18fa8da963369e166e4a368612c14d956dd2f9d7072904675d9b",
+strip_prefix = "eigen-eigen-d781c1de9834",
+      #sha256 = "ca7beac153d4059c02c8fc59816c82d54ea47fe58365e8aded4082ded0b820c4",
+#      strip_prefix = "eigen-eigen-f3a22f35b044",
       build_file = str(Label("//third_party:eigen.BUILD")),
   )
 

These are the steps I actually followed. I am not sure whether this is the right point at which to run configure.

cd tensorflow
vim tensorflow/stream_executor/cuda/cuda_gpu_executor.cc 
./configure
vim tensorflow/workspace.bzl

Build with bazel. This takes a little over two hours.

bazel build -c opt --local_resources 3072,4.0,1.0 --verbose_failures --config=cuda //tensorflow/tools/pip_package:build_pip_package

Installing TensorFlow

Create the pip package. This finishes quickly.

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
mv /tmp/tensorflow_pkg/tensorflow-1.2.1-cp27-cp27mu-linux_aarch64.whl $HOME/
pip install ../tensorflow-1.2.1-cp27-cp27mu-linux_aarch64.whl 
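
The wheel file name itself says which interpreter and platform the package was built for, which explains why it has to be installed with the Python 2.7 `pip`. As a rough illustration, the sketch below splits a name according to the standard wheel naming convention (distribution, version, optional build tag, Python tag, ABI tag, platform tag). `parse_wheel_filename` and `WheelName` are hypothetical helpers for this example, not pip APIs.

```python
import os
from collections import namedtuple

# Field order follows the wheel file name convention:
# {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
WheelName = namedtuple(
    "WheelName", "distribution version build python abi platform")


def parse_wheel_filename(path):
    """Split a wheel file name into its tags; raise ValueError if malformed."""
    filename = os.path.basename(path)
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None  # the build tag is optional
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected wheel file name: %r" % filename)
    return WheelName(dist, version, build, py, abi, plat)
```

For the file built above, the Python tag comes out as `cp27` (CPython 2.7) and the platform tag as `linux_aarch64`, so the wheel will not install under Python 3 or on an x86 machine.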

Verifying the installation

It looks as though everything should just work, but fetching the MNIST data fails with the following error.

Traceback (most recent call last):
  File "mnist_with_summaries.py", line 211, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "mnist_with_summaries.py", line 186, in main
    train()
  File "mnist_with_summaries.py", line 41, in train
    fake_data=FLAGS.fake_data)
  File "/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py", line 235, in read_data_sets
    SOURCE_URL + TRAIN_IMAGES)
  File "/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py", line 208, in maybe_download
    temp_file_name, _ = urlretrieve_with_retry(source_url)
  File "/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py", line 165, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/nvidia/.local/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py", line 190, in urlretrieve_with_retry
    return urllib.request.urlretrieve(url, filename)
  File "/usr/lib/python2.7/urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.7/urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "/usr/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 443, in open_https
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 1053, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 897, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 859, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1278, in connect
    server_hostname=server_hostname)
  File "/usr/lib/python2.7/ssl.py", line 353, in wrap_socket
    _context=self)
  File "/usr/lib/python2.7/ssl.py", line 601, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 830, in do_handshake
    self._sslobj.do_handshake()
IOError: [Errno socket error] [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

This is a problem with the distribution site rather than with TensorFlow, so fetch the files with wget using the --no-check-certificate option.

cd tensorflow/examples/tutorials/mnist
mkdir MNIST_data
cd MNIST_data
wget --no-check-certificate  https://github.com/HIPS/hypergrad/raw/master/data/mnist/mnist_data.pkl 
wget --no-check-certificate  https://github.com/HIPS/hypergrad/raw/master/data/mnist/t10k-images-idx3-ubyte.gz
wget --no-check-certificate  https://github.com/HIPS/hypergrad/raw/master/data/mnist/t10k-labels-idx1-ubyte.gz
wget --no-check-certificate  https://github.com/HIPS/hypergrad/raw/master/data/mnist/train-images-idx3-ubyte.gz
wget --no-check-certificate  https://github.com/HIPS/hypergrad/raw/master/data/mnist/train-labels-idx1-ubyte.gz
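
Because these files were downloaded without certificate verification, it may be worth a quick sanity check that the `.gz` archives really contain MNIST data. The sketch below reads the IDX header (two zero bytes, a data-type code, the number of dimensions, then each dimension as a big-endian 32-bit integer). It is written for Python 3, and the helper names `parse_idx_header` and `read_idx_header` are assumptions made for this example.

```python
import gzip
import struct


def parse_idx_header(data):
    """Return (dtype_code, dims) parsed from the leading bytes of an IDX file."""
    if len(data) < 4:
        raise ValueError("too short to be an IDX header")
    zero1, zero2, dtype_code, ndim = struct.unpack(">BBBB", data[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError("bad magic number, not an IDX file")
    end = 4 + 4 * ndim
    if len(data) < end:
        raise ValueError("header truncated")
    # Each dimension is stored as a big-endian unsigned 32-bit integer.
    dims = struct.unpack(">%dI" % ndim, data[4:end])
    return dtype_code, dims


def read_idx_header(path):
    """Parse the header of a gzipped IDX file such as the MNIST archives."""
    with gzip.open(path, "rb") as f:
        # 64 bytes cover the header for up to 15 dimensions; MNIST uses 1 or 3.
        return parse_idx_header(f.read(64))
```

For the standard MNIST training images, `read_idx_header("train-images-idx3-ubyte.gz")` should give a data-type code of 8 (unsigned byte) and dimensions `(60000, 28, 28)`; the label files have a single dimension.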

After that, the example can be run by specifying data_dir.

python mnist_deep.py --data_dir=MNIST_data/

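Incidentally, running tegrastats with sudo shows the GPU load, although not in a very friendly form. (It prints something like GR3D 89@1122, which apparently means 89% utilization at 1.122 GHz.)

sudo ./tegrastats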
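
To make that output easier to read, the GR3D field can be pulled out with a small regular expression. This is only a sketch: the exact output format differs between L4T releases, so the pattern below (an optional `_FREQ` suffix and an optional `%` sign) is an assumption, as is reading the second number as MHz. `parse_gr3d` is a hypothetical helper name.

```python
import re

# Matches e.g. "GR3D 89@1122", "GR3D 89%@1122" or "GR3D_FREQ 89%@1122".
# The accepted variants are an assumption, not a documented format.
_GR3D_PATTERN = re.compile(r"GR3D(?:_FREQ)?\s+(\d+)%?@(\d+)")


def parse_gr3d(line):
    """Return (utilization_percent, clock_ghz) from one output line, or None."""
    match = _GR3D_PATTERN.search(line)
    if match is None:
        return None
    utilization = int(match.group(1))
    clock_ghz = int(match.group(2)) / 1000.0  # assumes the value is in MHz
    return utilization, clock_ghz
```

Piping the tool's output through a loop that calls `parse_gr3d` on each line would give a running view of GPU utilization.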