Environment Config

To configure a computer for deep learning or deep reinforcement learning, we need to install cuda, cudnn, torch and so on.
Problems may come up while installing this software, so I recorded my process of configuring the DL environment. My
computer is a DELL PRECISION TOWER 7810 workstation running Ubuntu 16.04 with a Quadro M5000 GPU.

Anaconda and Pycharm

Conda

Installation

All you need to install conda is here.
This tutorial is in Chinese for your reference.

To speed up conda install, switch conda to a faster download source (channel).

Source Channel

Use conda config --show or conda config --show channels to check the source channels.

Use conda config --remove channels <channel name> to remove a channel.

To add the Tsinghua source channels, run the following commands:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes

Alternatively, you can edit the source channels directly in .condarc. This file usually lives in $HOME. You can find it with
sudo find / -name '.condarc'
After that, conda config --set show_channel_urls yes is needed to show the download URL for every installation.
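
For reference, the conda config commands above end up as a small YAML file. The Python sketch below only renders what such a .condarc could look like. The helper name render_condarc is made up for illustration, and the trailing defaults entry is an assumption about an otherwise untouched conda configuration.

```python
TSINGHUA = "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs"


def render_condarc(channels, show_urls=True):
    """Return .condarc text listing channels from highest to lowest priority."""
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append(f"show_channel_urls: {'true' if show_urls else 'false'}")
    return "\n".join(lines) + "\n"


# conda config --add puts each new channel at the top of the list,
# so the channel added last (main) ends up first.
# "defaults" is an assumption about what was configured before.
channels = [f"{TSINGHUA}/main/", f"{TSINGHUA}/free/", "defaults"]
print(render_condarc(channels))
```

Comparing this output with your own ~/.condarc is a quick way to see whether the mirror is really in use.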

Here is a good tutorial for this work

Create Virtual Environment

Show current environments: conda env list or conda info --envs
Create a new environment: conda create -n <env name> python=3.7
Remove an environment: conda remove -n <env name> --all
Activate an environment: conda activate <env name>
Deactivate the environment: conda deactivate
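
After activating an environment, it can be useful to confirm from inside Python which environment and interpreter are actually in use. The sketch below relies on the CONDA_DEFAULT_ENV variable that conda activate exports. The function name describe_environment is hypothetical.

```python
import os
import sys


def describe_environment(environ=None):
    """Summarise the active conda environment and the running interpreter."""
    environ = os.environ if environ is None else environ
    return {
        # conda activate exports CONDA_DEFAULT_ENV with the environment name;
        # it is absent when no conda environment is active.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "python": "{}.{}".format(*sys.version_info[:2]),
        "executable": sys.executable,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```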

A good blog

Pycharm-community

Official Tutorial.

Torch, TF, CUDA and cudnn

First of all, make sure that the versions of all these packages match each other.
The first step is to check which CUDA version matches your pytorch and tensorflow
versions; see pytorch and tensorflow match.

The second step is to verify that the nvidia driver version is compatible with your CUDA version. See [**CUDA and
nvidia-driver match**](https://docs.nvidia.com/cuda/...

Then, you need to make sure that the cudnn version matches your CUDA version. This can be checked at
cudnn.

The versions on my machine are as follows:

software	version
torch	1.4
CUDA	10.1
nvidia driver	418
cudnn	7.6
tensorflow-gpu	2.1

After checking these, you can start installing them.
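
To make the version check less error-prone, the table can be written down as data and compared against what is actually installed. The sketch below is only an illustration: the names EXPECTED, matches and mismatches are hypothetical, and the found values are made-up sample input, not output from a real machine.

```python
# Versions from the table above; replace them with the pairing you decided on.
EXPECTED = {
    "torch": "1.4",
    "CUDA": "10.1",
    "nvidia driver": "418",
    "cudnn": "7.6",
    "tensorflow-gpu": "2.1",
}


def matches(expected, found):
    """True if found equals expected or only adds further version components."""
    return found == expected or found.startswith(expected + ".")


def mismatches(found_versions, expected=EXPECTED):
    """Return {name: (expected, found)} for every entry that does not match."""
    report = {}
    for name, want in expected.items():
        have = found_versions.get(name)
        if have is None or not matches(want, have):
            report[name] = (want, have)
    return report


# Hypothetical sample input: CUDA 10.2 was installed instead of 10.1.
found = {"torch": "1.4.0", "CUDA": "10.2.89", "nvidia driver": "418.39",
         "cudnn": "7.6.5", "tensorflow-gpu": "2.1.0"}
print(mismatches(found))  # {'CUDA': ('10.1', '10.2.89')}
```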

Torch

The installation command depends on which virtual environment tool you are using. Refer to pytorch
for the exact command.
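
Once torch is installed, a short check from Python shows which CUDA and cudnn versions the build was compiled against and whether a GPU is visible. This is a minimal sketch. The wrapper name torch_report is hypothetical, and it returns None when torch is not installed.

```python
def torch_report():
    """Return torch/CUDA/cudnn details, or None if torch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # CUDA version the binary was built for; None on CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        # cudnn version as an integer; None if cudnn is unavailable.
        "cudnn": torch.backends.cudnn.version(),
        # False usually points at a driver or CUDA mismatch.
        "gpu_available": torch.cuda.is_available(),
    }


print(torch_report())
```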

Tensorflow

I recommend installing tensorflow 2.1. The differences between version 2.0 and version 2.1 are large, and in
general you should use the newest stable version of a tool.

pip install tensorflow==2.1
pip install tensorflow-gpu==2.1

When you import tensorflow, you may see the following warnings:

2020-01-20 11:46:50.881093: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/protobuf/lib:/usr/local/lib
2020-01-20 11:46:50.881169: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/protobuf/lib:/usr/local/lib

2020-01-20 11:46:50.881178: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

These are only warnings about the optional TensorRT libraries and will not affect normal usage.
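
If the log is long, the missing libraries can be pulled out with a regular expression to confirm that only TensorRT files (libnvinfer*) are affected. The sketch below uses abbreviated copies of the warnings above as sample input. The function names are hypothetical.

```python
import re

PATTERN = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names reported as not loadable, in log order."""
    return PATTERN.findall(log_text)


def only_tensorrt(libraries):
    """True if at least one library is listed and all of them are libnvinfer*."""
    return bool(libraries) and all(name.startswith("libnvinfer") for name in libraries)


# Abbreviated copies of the two dso_loader warnings shown above.
log = (
    "W dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: ...\n"
    "W dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: ...\n"
)
libraries = missing_libraries(log)
print(libraries, only_tensorrt(libraries))
```

A missing library outside the libnvinfer family, for example one of the CUDA runtime libraries, would deserve a closer look.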

However, note that tf will not warn you about a version mismatch, whereas torch will. This makes
checking the versions beforehand very important.
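
Because tf stays quiet about a mismatch, it helps to ask it directly whether it can see the GPU. The sketch below assumes tf 2.1 or later, where tf.config.list_physical_devices is available. The wrapper name tensorflow_report is hypothetical, and it returns None when tensorflow is not installed.

```python
def tensorflow_report():
    """Return TensorFlow GPU details, or None if tensorflow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # An empty list means TensorFlow will run on the CPU only.
        "gpus": [device.name for device in gpus],
    }


print(tensorflow_report())
```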

Some blogs you could refer to:

- install tf2

Nvidia driver

You can use the following command to check which driver is recommended for your machine:

ubuntu-drivers devices

Then install the nvidia driver with the following commands:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt install nvidia-418

Some useful commands:

Show the VGA controller info:

lspci | grep VGA

Show the nvidia hardware info:

lspci | grep -i nvidia

CUDA

You can follow the tutorial on the CUDA homepage,
but it only offers the latest version of CUDA.
For older versions, you need to visit history release.
For version 10.1, you can get it here.

Then run the following commands:

sudo dpkg -i cuda-repo-ubuntu1604-10-1-local-10.1.105-418.39_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

Then you need to add CUDA to your environment variables.
In ~/.bashrc, add the following lines.

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda

Then run source ~/.bashrc to apply the changes.
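
A small Python check can confirm that the three variables really took effect in the current shell. This is a sketch that assumes the default /usr/local/cuda location used above. The function name cuda_env_problems is hypothetical.

```python
import os


def cuda_env_problems(environ, cuda_home="/usr/local/cuda"):
    """Return human-readable problems with PATH, LD_LIBRARY_PATH and CUDA_HOME."""
    problems = []
    path_entries = environ.get("PATH", "").split(":")
    lib_entries = environ.get("LD_LIBRARY_PATH", "").split(":")
    if f"{cuda_home}/bin" not in path_entries:
        problems.append(f"PATH does not contain {cuda_home}/bin")
    if f"{cuda_home}/lib64" not in lib_entries:
        problems.append(f"LD_LIBRARY_PATH does not contain {cuda_home}/lib64")
    if environ.get("CUDA_HOME") != cuda_home:
        problems.append(f"CUDA_HOME is not set to {cuda_home}")
    return problems


# An empty list means the three exports are visible to this process.
print(cuda_env_problems(os.environ))
```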

Installing CUDA is normally straightforward. However, I made a mistake along the way:
I tried to install 10.2 first and shut down the machine before the last step, and dpkg kept a record of that
package. To install 10.1 after that, you need to run the following commands first.

sudo dpkg -r cuda-repo-<version>
sudo dpkg -P cuda-repo-<version>

You can check the nvidia driver and GPU status using:

nvidia-smi or watch -n 10 nvidia-smi

If you get the error

Failed to initialize NVML: Driver/library version mismatch

it is because the loaded nvidia kernel module does not match the installed driver version. In that case,
restarting the machine is a good choice.
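
The same check can be scripted, for example at the start of a training job. The sketch below runs nvidia-smi if it is on the PATH and looks for the mismatch message quoted above. The function names are hypothetical.

```python
import shutil
import subprocess

MISMATCH = "Driver/library version mismatch"


def needs_reboot(smi_output):
    """True if nvidia-smi printed the NVML version-mismatch error."""
    return MISMATCH in smi_output


def run_nvidia_smi():
    """Return combined stdout/stderr of nvidia-smi, or None if it is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout


output = run_nvidia_smi()
if output is None:
    print("nvidia-smi not found; is the driver installed?")
elif needs_reboot(output):
    print("Kernel module and driver version disagree; restart the machine.")
else:
    print(output)
```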

After that, nvidia-smi prints the driver version and GPU status. (The version shown in my screenshot was wrong because I could not access my workstation at the time.)

Some useful commands:

Show the version of CUDA:

cat /usr/local/cuda/version.txt
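
The version file can also be read from Python, for example to feed it into a version check. This sketch assumes a toolkit that ships version.txt with a line such as CUDA Version 10.1.105, and it returns None if the file is absent. The function names are hypothetical.

```python
import re
from pathlib import Path


def parse_cuda_version(text):
    """Extract the dotted version from text such as 'CUDA Version 10.1.105'."""
    match = re.search(r"CUDA Version (\d+(?:\.\d+)*)", text)
    return match.group(1) if match else None


def installed_cuda_version(path="/usr/local/cuda/version.txt"):
    """Read version.txt if it exists; return None otherwise."""
    version_file = Path(path)
    if not version_file.is_file():
        return None
    return parse_cuda_version(version_file.read_text())


print(installed_cuda_version())
```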

cudnn

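Installing cudnn is very easy. I recommend installing it from the tar archive rather than the deb package.

First, download it from cudnn.
Then extract the archive, which produces a cuda directory, and run the following commands from the directory that contains it: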

sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Some useful commands:

Show the version of cudnn:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
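
The grep above prints three #define lines. The sketch below joins them into a single version string. It assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers, as the cudnn 7.x cudnn.h does. The function name parse_cudnn_version is hypothetical.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Return 'major.minor.patch' from cudnn.h text, or None if a define is missing."""
    parts = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+CUDNN_{name}\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


header = Path("/usr/local/cuda/include/cudnn.h")
if header.is_file():
    print(parse_cudnn_version(header.read_text(errors="replace")))
else:
    print("cudnn.h not found under /usr/local/cuda/include")
```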

Here are some other methods you could refer to.

DL Dependencies

You need tensorboardX, scikit-image, seaborn, matplotlib and so on. Some of them may already have been installed
together with Torch or Tensorflow; otherwise you need to conda install them manually.
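
To find out which of these are still missing, you can ask the import system without actually importing anything. A minimal sketch follows. Note that scikit-image is imported as skimage. The function name missing_modules is hypothetical.

```python
import importlib.util

# Import names, which can differ from package names (scikit-image -> skimage).
MODULES = ["tensorboardX", "skimage", "seaborn", "matplotlib"]


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Everything printed here still needs to be installed manually.
print(missing_modules(MODULES))
```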

tensorflow-probability

First, check the version match; see tfp.

For tf 2.1, the required tfp version is 0.9.

pip install tensorflow-probability==0.9

ffmpeg

sudo apt-get install ffmpeg

Then add export PATH=/usr/local/ffmpeg/bin:$PATH to ~/.bashrc.

DRL Suites

OpenAI baselines

Clone the source code and follow the tutorial.
Running pip install -e . in the repository root installs baselines.

OpenAI gym

Note that OpenAI gym may already be installed at this point. If so, you don't need to install it again,
because a separate install may cause a version mismatch.

However, you can still follow gym to install it.

Mujoco and mujoco-py

It is also easy to install them if you are lucky.

Mujoco

You can get a 30-day trial license for mujoco for one machine.
One e-mail address can register up to three machines. Starting with the trial is sensible because sometimes mujoco-py cannot be installed at all.
Run getid to obtain the computer id for the license request:

chmod +x getid
./getid

Download the product first. To choose the mujoco version, check which versions are supported by
mujoco-py.

Then

$ mkdir ~/.mujoco 
$ cp mujoco200_linux.zip ~/.mujoco 
$ cd ~/.mujoco 
$ unzip mujoco200_linux.zip
$ cp -r mujoco200_linux mujoco200

The last line is needed because mujoco_py expects the directory name without the _linux suffix.

Copy license

$ cp mjkey.txt ~/.mujoco 
$ cp mjkey.txt ~/.mujoco/mujoco200/bin

Environment variables: edit ~/.bashrc and add the following lines to it. Then run source ~/.bashrc.

export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export MUJOCO_KEY_PATH=~/.mujoco
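
Before testing, it may be worth checking that the files ended up where the steps above put them. The sketch below only looks for the mujoco200 directory and the two copies of mjkey.txt. The function name mujoco_layout_problems is hypothetical.

```python
from pathlib import Path


def mujoco_layout_problems(root):
    """Return the expected paths that are missing under root (normally ~/.mujoco)."""
    root = Path(root)
    expected = [
        root / "mujoco200" / "bin",
        root / "mjkey.txt",
        root / "mujoco200" / "bin" / "mjkey.txt",
    ]
    return [str(path) for path in expected if not path.exists()]


# An empty list means the directory and both license copies are in place.
print(mujoco_layout_problems(Path.home() / ".mujoco"))
```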

Testing

$ cd ~/.mujoco/mujoco200_linux/bin 
$ ./simulate ../model/humanoid.xml

A window showing the humanoid model should open.

On some remote machines you will not see this window because of hardware limits, but on others you can.

mujoco-py

Download the source code: git clone https://github.com/openai/mujoco-py.git

Install patchelf, this is for the lG.

$ sudo curl -o /usr/local/bin/patchelf https://s3-us-west-2.amazonaws.com/openai-sci-artifacts/manual-builds/patchelf_0.9_amd64.elf 
$ sudo chmod +x /usr/local/bin/patchelf

Install gcc dependencies:

sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3

Some other dependencies

$ cd ~/mujoco-py
$ cp requirements.txt requirements.dev.txt ./mujoco_py
$ cd mujoco_py
$ pip install -r requirements.txt
$ pip install -r requirements.dev.txt

Installation

$ cd ~/mujoco-py/vendor 
$ ./Xdummy-entrypoint 
$ cd .. 
$ python setup.py install

Testing: run import mujoco_py. The first import compiles some files. If you hit a gcc error, refer to the
troubleshooting section in mujoco-py. If that does not help, you may need
to switch to another computer.
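
Beyond a bare import, a slightly stronger smoke test is to load a model and step the simulation a few times. The sketch below assumes the humanoid model shipped under ~/.mujoco/mujoco200/model and catches any failure, so it reports the problem and returns None instead of crashing. The function name mujoco_py_smoke_test is hypothetical.

```python
import os


def mujoco_py_smoke_test(model_path, steps=10):
    """Step the simulation a few times; return the sim time, or None on failure."""
    try:
        import mujoco_py  # the first import triggers the compilation step
        model = mujoco_py.load_model_from_path(model_path)
        sim = mujoco_py.MjSim(model)
        for _ in range(steps):
            sim.step()
        return sim.data.time
    except Exception as error:  # missing package, build error, license problem...
        print("mujoco_py is not usable yet:", error)
        return None


model_path = os.path.expanduser("~/.mujoco/mujoco200/model/humanoid.xml")
print(mujoco_py_smoke_test(model_path))
```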

dm-control

dm_control is another control environment; it does not depend on mujoco_py. dm_control looks for mujoco in ~/.mujoco/mujoco200_linux/, so if that directory no longer exists you need to make another copy
of mujoco:

$ cd ~/.mujoco
$ cp -r mujoco200 mujoco200_linux

Then you could install dm_control

$ pip install dm_control

Note that the rendering backend used here is OpenGL EGL.
First, pip install pyopengl. Then export PYOPENGL_PLATFORM=egl.
After that, you can use dm_control.
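
The variable can also be set from Python, as long as this happens before dm_control is imported. PyOpenGL reads its platform from the environment variable named PYOPENGL_PLATFORM. The sketch below loads the cartpole swingup task from the dm_control suite as an example. The task choice and the function name are only illustrative, and the function returns None if dm_control cannot be used.

```python
import os

# Must be set before dm_control (and therefore PyOpenGL) is imported.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("PYOPENGL_PLATFORM", "egl")


def cartpole_observation_keys():
    """Load a suite task and return its observation names, or None on failure."""
    try:
        from dm_control import suite
        env = suite.load(domain_name="cartpole", task_name="swingup")
        time_step = env.reset()
        return sorted(time_step.observation)
    except Exception as error:  # missing package, license, or rendering backend
        print("dm_control is not usable yet:", error)
        return None


print(cartpole_observation_keys())
```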