tensorflow-gpu on Azure ~Installing tensorflow-gpu~ - IMACEL Academy - Toward Technical Applications of AI and Image Analysis - | LPixel Inc.

Introduction

This article explains how to install tensorflow-gpu on the virtual machine created in tensorflow-gpu on Azure ~Creating the Virtual Machine~ (https://lp-tech.net/articles/XjVg6).

Installing tensorflow-gpu breaks down into four main steps:
・Install Anaconda (https://bit.ly/2O9zMsK)
・Install CUDA Toolkit 9.0 (https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
・Install cuDNN (https://developer.nvidia.com/cudnn)
・Install tensorflow-gpu (https://www.tensorflow.org/install/install_linux)


Installing Anaconda

First, run the following commands to download and install Anaconda.

$ wget https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
$ sh Anaconda3-5.2.0-Linux-x86_64.sh

#---- Output:

Please, press ENTER to continue
>>> ⏎

Do you accept the license terms? [yes|no]
[no] >>> yes

Anaconda3 will now be installed into this location:
/home/taiki/anaconda3
    - Press ENTER to confirm the location
    - Press CTRL-C to abort the installation
    - Or specify a different location below
[/home/taiki/anaconda3] >>> ⏎

Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/taiki/.bashrc ? [yes|no]
[no] >>> yes

Do you wish to proceed with the installation of Microsoft VSCode? [yes|no]
>>> no

[Figure: Install Anaconda]
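The installer only adds Anaconda to PATH in ~/.bashrc, so the shell you ran it from does not see Anaconda yet; open a new terminal or run source ~/.bashrc. The sketch below is a minimal way to confirm the change took effect, assuming the default install location ~/anaconda3 shown above. path_contains is a hypothetical helper written for this check, not an Anaconda command.

```shell
#!/usr/bin/env bash
# path_contains DIR: succeed if DIR is exactly one of the colon-separated
# entries in $PATH. (Hypothetical helper for illustration; not an Anaconda tool.)
path_contains() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage after installation (assumes the default location ~/anaconda3):
#   source ~/.bashrc
#   path_contains "$HOME/anaconda3/bin" && echo "Anaconda is on PATH"
#   python --version   # should now report Anaconda's Python
```

Matching whole PATH entries rather than substrings avoids false positives on directories that merely share a prefix.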

Installing CUDA Toolkit 9.0

To check whether the virtual machine's GPU supports CUDA, run the command below. If the GPU appears in the list at http://developer.nvidia.com/cuda-gpus, it supports CUDA. The GPU on the virtual machine used here was a Tesla M60.

$ lspci | grep -i nvidia

#---- Output:

18e4:00:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)

[Figure: Verify You Have a CUDA-Capable GPU]
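If you prefer not to scan the lspci output by eye, the model name can be pulled out of the brackets and then looked up in NVIDIA's list. This is a minimal sketch that assumes the bracketed "[Model Name]" format shown in the output above; nvidia_gpu_models is a hypothetical helper.

```shell
#!/usr/bin/env bash
# nvidia_gpu_models: read `lspci` output on stdin and print the bracketed
# model name of every NVIDIA device, e.g. "Tesla M60".
# Assumes one bracketed name per line, as in the default lspci format.
nvidia_gpu_models() {
  grep -i 'nvidia' | sed -n 's/.*\[\(.*\)\].*/\1/p'
}
```

For example, lspci | nvidia_gpu_models should print Tesla M60 on the virtual machine used here.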

The CUDA Toolkit requires the gcc compiler, so run the following command to check whether gcc is installed. In this case, it was not installed.

$ gcc --version

#---- Output:

Command 'gcc' not found, but can be installed with:
sudo apt install gcc

[Figure: Verify the System Has gcc Installed]

Run the following command to install gcc.

$ sudo apt install gcc

[Figure: Install gcc]
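The check and the install can be combined so that gcc is installed only when it is missing. A minimal sketch; has_cmd and ensure_gcc are hypothetical helper names, and the DRY_RUN variable is a convention introduced here for previewing the command before running it.

```shell
#!/usr/bin/env bash
# has_cmd NAME: succeed if NAME resolves to a command on PATH.
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# ensure_gcc: print the installed gcc version, or install gcc via apt if it
# is missing. With DRY_RUN set to a non-empty value, the install command is
# printed instead of executed.
ensure_gcc() {
  if has_cmd gcc; then
    gcc --version | head -n 1
  else
    ${DRY_RUN:+echo} sudo apt install -y gcc
  fi
}
```

Running DRY_RUN=1 ensure_gcc first shows what would happen without touching the system.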

Installing the CUDA driver also requires the kernel headers and development packages for the running kernel, so install them with the following command. In this case, the latest version was already installed.

$ sudo apt-get install linux-headers-$(uname -r)

#---- Output:

Reading package lists... Done
Building dependency tree
Reading state information... Done
linux-headers-4.15.0-1022-azure is already the newest version (4.15.0-1022.23).
linux-headers-4.15.0-1022-azure set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 26 not upgraded.

[Figure: Verify the System has the Correct Kernel Headers and Development Packages Installed]
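To see whether the headers are already present before calling apt-get, you can look for the header directory of the running kernel. A minimal sketch, assuming Ubuntu's linux-headers-* packages install into /usr/src/linux-headers-$(uname -r) (on this machine, /usr/src/linux-headers-4.15.0-1022-azure); headers_present is a hypothetical helper.

```shell
#!/usr/bin/env bash
# headers_present [RELEASE] [BASE]: succeed if BASE/linux-headers-RELEASE
# exists. Defaults: the running kernel (uname -r) and /usr/src, which is
# where Ubuntu's linux-headers-* packages place the headers.
headers_present() {
  local release="${1:-$(uname -r)}"
  local base="${2:-/usr/src}"
  [ -d "${base}/linux-headers-${release}" ]
}
```

A typical use is headers_present || sudo apt-get install linux-headers-$(uname -r), which skips apt-get when nothing needs to be installed.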

Next, run the following commands in order, from top to bottom, to install CUDA and set the paths.

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu1604_9.0.176-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda-9-0
$ export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

[Figure: Install CUDA]
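The two export lines only affect the current shell session, so CUDA drops out of PATH after you log out. One way to make them persistent is to append them to ~/.bashrc; the sketch below does that idempotently, so rerunning it does not add duplicate lines. append_once and persist_cuda_paths are hypothetical helpers, and using ~/.bashrc is an assumption about where you keep your shell settings.

```shell
#!/usr/bin/env bash
# append_once LINE FILE: append LINE to FILE unless an identical line is
# already there, so re-running the setup does not duplicate entries.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# persist_cuda_paths [RCFILE]: write the two CUDA 9.0 exports to RCFILE
# (default ~/.bashrc). Single quotes keep ${PATH...} unexpanded here, so it
# is evaluated each time the rc file is read.
persist_cuda_paths() {
  local rc="${1:-$HOME/.bashrc}"
  append_once 'export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}' "$rc"
  append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' "$rc"
}
```

After running persist_cuda_paths, open a new terminal (or source ~/.bashrc) and nvcc should be found without re-exporting anything.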

To confirm that CUDA was installed correctly, run the following command and check that it reports CUDA 9.0.

$ nvcc -V

#---- Output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

[Figure: Verify the Driver Version]
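When scripting the setup, the release number can be extracted from the nvcc -V output and compared with the expected version. A minimal sketch that assumes the "release X.Y," wording shown above; cuda_release and require_cuda are hypothetical helpers.

```shell
#!/usr/bin/env bash
# cuda_release: read `nvcc -V` output on stdin and print the release number,
# e.g. "9.0" from "Cuda compilation tools, release 9.0, V9.0.176".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# require_cuda EXPECTED: fail with a message unless nvcc reports EXPECTED.
require_cuda() {
  local got
  got="$(nvcc -V 2>/dev/null | cuda_release)"
  [ "$got" = "$1" ] || { echo "expected CUDA $1, found '${got:-none}'" >&2; return 1; }
}
```

For example, require_cuda 9.0 succeeds silently on a correct installation and prints an error otherwise.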

Installing cuDNN

① On your local PC, go to the cuDNN download page (https://developer.nvidia.com/cuda-90-download-archive) and click "Download cuDNN >". Downloading cuDNN requires signing in with an NVIDIA Developer account.

② Next, click "Download cuDNN v7.2.1 (August 7, 2018), for CUDA 9.0".

③ Then click cuDNN v7.2.1 Runtime Library for Ubuntu16.04 (Deb), cuDNN v7.2.1 Developer Library for Ubuntu16.04 (Deb), and cuDNN v7.2.1 Code Samples and User Guide for Ubuntu16.04 (Deb) to start the downloads.

The following three files should now be on your local PC:
libcudnn7_7.2.1.38-1+cuda9.0_amd64.deb
libcudnn7-dev_7.2.1.38-1+cuda9.0_amd64.deb
libcudnn7-doc_7.2.1.38-1+cuda9.0_amd64.deb
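Before uploading, it can help to confirm that all three packages were actually downloaded. A minimal sketch; CUDNN_DEBS and missing_cudnn_debs are hypothetical names, and the file names are the ones listed above.

```shell
#!/usr/bin/env bash
# The three cuDNN v7.2.1 (CUDA 9.0) packages listed above.
CUDNN_DEBS=(
  libcudnn7_7.2.1.38-1+cuda9.0_amd64.deb
  libcudnn7-dev_7.2.1.38-1+cuda9.0_amd64.deb
  libcudnn7-doc_7.2.1.38-1+cuda9.0_amd64.deb
)

# missing_cudnn_debs DIR: print each expected file that is not in DIR and
# succeed only if none are missing.
missing_cudnn_debs() {
  local dir="$1" f missing=0
  for f in "${CUDNN_DEBS[@]}"; do
    if [ ! -f "${dir}/${f}" ]; then
      echo "$f"
      missing=1
    fi
  done
  return "$missing"
}
```

With the files in a (hypothetical) ~/Downloads directory, missing_cudnn_debs ~/Downloads prints nothing and succeeds once all three are present.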

The libcudnn files downloaded to your local PC must be uploaded to the virtual machine, so run the following scp command:

$ scp <directory containing the downloaded files>/libcudnn* taiki@<public IP address>:<destination directory>

[Figure: Send libcudnn files to the virtual machine]
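The same scp command can be wrapped in a function so that the placeholders become arguments. A minimal sketch; upload_cudnn is a hypothetical helper, and the DRY_RUN switch is a convention introduced here so you can check the expanded command before sending anything.

```shell
#!/usr/bin/env bash
# upload_cudnn LOCAL_DIR USER HOST DEST: copy the libcudnn*.deb packages in
# LOCAL_DIR to DEST on the virtual machine with scp. With DRY_RUN set to a
# non-empty value, the command is printed instead of executed.
upload_cudnn() {
  local dir="$1" user="$2" host="$3" dest="$4"
  ${DRY_RUN:+echo} scp "${dir}"/libcudnn*.deb "${user}@${host}:${dest}"
}
```

For example, DRY_RUN=1 upload_cudnn ~/Downloads taiki <public IP address> '~/' prints the scp command without running it; drop DRY_RUN=1 to perform the copy.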

Use cd to move to the directory containing the libcudnn files, then run the following commands to complete the cuDNN installation.

$ sudo dpkg -i libcudnn7_7.2.1.38-1+cuda9.0_amd64.deb
$ sudo dpkg -i libcudnn7-dev_7.2.1.38-1+cuda9.0_amd64.deb
$ sudo dpkg -i libcudnn7-doc_7.2.1.38-1+cuda9.0_amd64.deb

[Figure: Install cuDNN]
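To confirm which cuDNN version is installed, you can read the version macros from the cuDNN header. A minimal sketch, assuming the libcudnn7-dev package exposes the header as /usr/include/cudnn.h; if it lives elsewhere on your system, pass its path as the argument. cudnn_version is a hypothetical helper.

```shell
#!/usr/bin/env bash
# cudnn_version [HEADER]: print MAJOR.MINOR.PATCHLEVEL from the
# CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL #define lines of the header.
# Exits non-zero if CUDNN_MAJOR is not found.
cudnn_version() {
  local header="${1:-/usr/include/cudnn.h}"
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { ma = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { mi = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { pa = $3 }
       END { if (ma == "") exit 1; print ma "." mi "." pa }' "$header"
}
```

On this setup it should print 7.2.1. dpkg -s libcudnn7 is another way to check that the package itself is registered.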

Installing tensorflow-gpu

Finally, run the following command to install tensorflow-gpu.

$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.10.1-cp36-cp36m-linux_x86_64.whl

[Figure: Install tensorflow-gpu]
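The wheel above is built for CPython 3.6 (cp36), so pip must be the one belonging to Anaconda's Python rather than a system pip. A minimal sketch for checking that, assuming the default ~/anaconda3 location; resolves_under is a hypothetical helper.

```shell
#!/usr/bin/env bash
# resolves_under CMD DIR: succeed if CMD, as found on PATH, lives inside DIR.
resolves_under() {
  local path
  path="$(command -v "$1")" || return 1
  case "$path" in
    "$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, resolves_under pip "$HOME/anaconda3" && python -c 'import tensorflow as tf; print(tf.__version__)' should report 1.10.1 after the installation.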

Now verify that tensorflow-gpu was installed correctly. If both the CPU and the GPU are listed, as in the output below, the installation is complete.

$ python3
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())

#---- Output:

2018-09-12 19:09:01.915780: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-12 19:09:03.354013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla M60 major: 5 minor: 2 memoryClockRate(GHz): 1.1775
pciBusID: 22b1:00:00.0
totalMemory: 7.94GiB freeMemory: 7.86GiB
2018-09-12 19:09:03.354060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-12 19:09:03.646371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-12 19:09:03.646429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2018-09-12 19:09:03.646452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2018-09-12 19:09:03.646661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:0 with 7580 MB memory) -> physical GPU (device: 0, name: Tesla M60, pci bus id: 22b1:00:00.0, compute capability: 5.2)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 10924952114733214568
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 7949005620
locality {
    bus_id: 1
    links {
    }
}
incarnation: 8147529738102065137
physical_device_desc: "device: 0, name: Tesla M60, pci bus id: 22b1:00:00.0, compute capability: 5.2"
]

[Figure: Ensure that TensorFlow is running on the GPU]
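The same check can also be run non-interactively, which is convenient in setup scripts. A minimal sketch that counts the device_type: "GPU" entries in the printed device list; count_gpus and check_tf_gpu are hypothetical helpers, and python3 is assumed to be Anaconda's Python as above.

```shell
#!/usr/bin/env bash
# count_gpus: read the printed device list on stdin and count entries whose
# device_type is "GPU". grep -c prints 0 (and exits 1) when none are found.
count_gpus() {
  grep -c 'device_type: "GPU"'
}

# check_tf_gpu: run the device_lib check from above without the REPL and
# fail if TensorFlow sees no GPU. TensorFlow's log lines go to stderr and
# are discarded; the device list itself is printed to stdout.
check_tf_gpu() {
  local n
  n="$(python3 -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())' 2>/dev/null | count_gpus)"
  echo "GPUs visible to TensorFlow: ${n}"
  [ "${n:-0}" -gt 0 ]
}
```

On the virtual machine used here, check_tf_gpu would be expected to report one GPU.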