Environment Config

裂谷

To configure a computer for deep learning or deep reinforcement learning, we need to install cuda, cudnn, torch and so on.
Problems can come up while installing this software, so I recorded my process of configuring the DL environment. My
computer is a Dell Precision Tower 7810 workstation running Ubuntu 16.04, with a Quadro M5000 GPU as the VGA controller.

Anaconda and Pycharm

Conda

Installation

All you need to install conda is here.
This tutorial is in Chinese for your reference.

To speed up conda install, you should switch conda's download source to a faster mirror channel.

Source Channel

Use

conda config --show or conda config --show channels to check the current source channels.

Use conda config --remove channels <channel name> to remove a channel.

To add the Tsinghua source channels, run the following commands:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes

Alternatively, you can edit the source channels in .condarc. This file usually lives in $HOME. You can find it with
sudo find / -name '.condarc'
After that, run conda config --set show_channel_urls yes to show the download URL for every installation.

Here is a good tutorial for this work

Create Virtual Environment
  • Show current environments: conda env list or conda info --envs
  • Create a new environment: conda create -n <env_name> python=3.7
  • Remove an environment: conda remove -n <env_name> --all
  • Activate an environment: conda activate <env_name>
  • Deactivate the current environment: conda deactivate

A good blog

Pycharm-community

Official Tutorial.

Torch, TF, CUDA and cudnn

Before installing anything, make sure the versions of all these packages match each other.
The first step is to check which CUDA version matches your pytorch
and tensorflow versions.

The second step is to verify that the nvidia driver version matches CUDA. See [**CUDA and
nvidia-driver match**](https://docs.nvidia.com/cuda/...

Then, make sure the cudnn version matches CUDA. This can be checked on the
cudnn site.

The versions on my machine are as follows:

software        version
torch           1.4
CUDA            10.1
nvidia driver   418
cudnn           7.6
tensorflow-gpu  2.1

After that, you can start installing them.

Torch

The installation command depends on which virtual environment you are using. Refer to pytorch
for the exact command.

Tensorflow

I recommend installing tensorflow 2.1. The differences between version 2.0 and version 2.1 are large, and you
should generally use the newest stable version of a tool.

pip install tensorflow==2.1
pip install tensorflow-gpu==2.1

When you import tensorflow, you may face the following warning:

2020-01-20 11:46:50.881093: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/protobuf/lib:/usr/local/lib

2020-01-20 11:46:50.881169: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/protobuf/lib:/usr/local/lib

2020-01-20 11:46:50.881178: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

It is only a warning about missing TensorRT libraries and will not affect normal usage.

However, note that tf will not warn you about a version mismatch, whereas torch will. This is why
matching the versions beforehand is so important.
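
Because a mismatch can go unnoticed, it may help to compare what is actually installed against the versions listed earlier. The following is a minimal sketch, not an official tool: EXPECTED simply repeats the versions from this post, and the helper names collect_installed and find_mismatches are made up for illustration. It assumes the cudnn number reported by torch uses the major*1000 + minor*100 + patch encoding of cudnn 7.x.

```python
# Versions from this post; adjust to your own setup.
EXPECTED = {"torch": "1.4", "tensorflow": "2.1", "cuda": "10.1", "cudnn": "7.6"}


def collect_installed():
    """Gather version strings from whichever frameworks can be imported."""
    found = {}
    try:
        import torch
        found["torch"] = torch.__version__        # e.g. "1.4.0"
        found["cuda"] = torch.version.cuda        # CUDA torch was built with, or None
        cudnn = torch.backends.cudnn.version()    # e.g. 7603, or None
        if cudnn is not None:
            # Assumes the cudnn 7.x encoding: major*1000 + minor*100 + patch.
            found["cudnn"] = f"{cudnn // 1000}.{cudnn % 1000 // 100}"
    except ImportError:
        pass
    try:
        import tensorflow as tf
        found["tensorflow"] = tf.__version__      # e.g. "2.1.0"
    except ImportError:
        pass
    return found


def find_mismatches(expected, installed):
    """Return one message per package that is missing or has another version."""
    problems = []
    for name, want in expected.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not found (expected {want})")
        elif not (have == want or have.startswith(want + ".")):
            problems.append(f"{name}: found {have}, expected {want}")
    return problems
```

Calling find_mismatches(EXPECTED, collect_installed()) inside the activated conda environment returns an empty list when everything lines up.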

Some blogs you could refer to:

- install tf2

Nvidia driver

You can use the following command to check the recommended driver for your machine:

ubuntu-drivers devices

Then, install the nvidia driver with the following commands:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt install nvidia-418

Some useful commands:

  • show the VGA controller info:

lspci | grep VGA

  • show the nvidia hardware info:

lspci | grep -i nvidia

CUDA

You can follow the tutorial on the CUDA homepage.
However, the homepage only offers the latest version of CUDA.
For older versions, you need to visit the history release page.
For version 10.1, you can get it here.

Then run the following commands:

sudo dpkg -i cuda-repo-ubuntu1604-10-1-local-10.1.105-418.39_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

Then you need to add CUDA to your environment variables.
Add the following lines to ~/.bashrc:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda

Then, source ~/.bashrc will finish the work.

Installing CUDA should be straightforward. However, I made a mistake during my procedure.
I first tried to install 10.2 and shut down before the last step, but dpkg had already recorded the
package in its database. To install 10.1 after that, you need to remove the recorded package first:

sudo dpkg -r cuda-repo-<version>
sudo dpkg -P cuda-repo-<version>

You can monitor the nvidia driver and GPU using:

nvidia-smi or watch -n 10 nvidia-smi

If you get the error

Failed to initialize NVML: Driver/library version mismatch

it is because the loaded nvidia kernel module does not match the currently installed driver version. In this case,
restarting the machine is a good choice.
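
To see which kernel module version is currently loaded, one option is to read /proc/driver/nvidia/version. The sketch below assumes the classic proprietary driver's line format ("... Kernel Module  418.39  ..."); the function names are hypothetical. If the reported version differs from the driver package you installed, the old module is still loaded and a reboot should bring them back in line.

```python
import re
from pathlib import Path


def loaded_nvidia_module_version(text):
    """Extract the version that follows 'Kernel Module' in the NVRM line."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def read_loaded_version(path="/proc/driver/nvidia/version"):
    """Return the loaded module version, or None if no nvidia module is loaded."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None
    return loaded_nvidia_module_version(proc_file.read_text())
```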

Then you can see the nvidia-smi output (the version shown is wrong because I cannot access my workstation right now).

Some useful commands:

  • see the version of CUDA:

cat /usr/local/cuda/version.txt

cudnn

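Installing cudnn is very easy. I recommend installing cudnn from the tar archive rather than the deb packages.

First, download it from cudnn and extract the archive, which unpacks into a cuda directory.
Then, run the following commands from the directory that contains it: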

sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Some useful commands:

  • see the version of cudnn:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
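
The same check can be scripted. This is a small sketch that reads the three version macros from the header; parse_cudnn_version is a hypothetical helper name, and the default path is the one used in this post. Note that from cudnn 8 onward the macros live in cudnn_version.h instead of cudnn.h.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Return 'major.minor.patch' from the #define lines, or None if any is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#\s*define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


def installed_cudnn_version(header="/usr/local/cuda/include/cudnn.h"):
    """Read the header copied during installation and parse its version."""
    path = Path(header)
    return parse_cudnn_version(path.read_text()) if path.exists() else None
```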

Here are some methods you could refer to.

DL Dependencies

You need tensorboardX, scikit-image, seaborn, matplotlib and so on. Some of them may already have been installed
during the installation of Torch or Tensorflow; otherwise you need to conda install them manually.

tensorflow-probability

First, check the version match on the tfp page.

For tf 2.1, the required tfp version is 0.9.

pip install tensorflow-probability==0.9

ffmpeg

sudo apt-get install ffmpeg

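If ffmpeg lives under /usr/local/ffmpeg (for example, a manual build), add export PATH=/usr/local/ffmpeg/bin:$PATH to ~/.bashrc. The apt package installs into /usr/bin, which is already on PATH.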

DRL Suites

OpenAI baselines

Clone the source code and follow the tutorial.
Running pip install -e . in the cloned directory installs baselines.

OpenAI gym

Note that OpenAI gym may already have been installed along with baselines. In that case you do not need to install it
again, because a separate install could cause a version mismatch.

However, you can still follow gym to install it.

Mujoco and mujoco-py

It is also easy to install them if you are lucky.

Mujoco

You can get a 30-day trial license of mujoco for one machine.
One e-mail address can register three machines. Start with the trial, because sometimes mujoco-py cannot be installed on a machine at all.

Register your computer and get the license. To obtain your computer id, download the getid file and then run:

chmod +x getid
./getid

Download the product first. To choose the mujoco version, check which versions mujoco-py
supports.

Then run:

$ mkdir ~/.mujoco 
$ cp mujoco200_linux.zip ~/.mujoco 
$ cd ~/.mujoco 
$ unzip mujoco200_linux.zip
$ cp -r mujoco200_linux mujoco200

The last line is needed because mujoco_py expects the directory name without the _linux suffix.

Copy license

$ cp mjkey.txt ~/.mujoco 
$ cp mjkey.txt ~/.mujoco/mujoco200/bin

Set the environment variables: edit ~/.bashrc and add the following lines to it, then source ~/.bashrc.

export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export MUJOCO_KEY_PATH=~/.mujoco
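
Before testing, it can be worth confirming that the directory and license copies above ended up where expected. This is only a sketch under the layout described in this post (~/.mujoco/mujoco200 plus two copies of mjkey.txt); missing_mujoco_files is a hypothetical helper.

```python
from pathlib import Path


def missing_mujoco_files(root="~/.mujoco"):
    """List the paths from this post's layout that do not exist under root."""
    root = Path(root).expanduser()
    required = [
        root / "mujoco200",                      # renamed copy used by mujoco_py
        root / "mujoco200" / "bin",              # shared libraries and simulate
        root / "mjkey.txt",                      # license next to the install
        root / "mujoco200" / "bin" / "mjkey.txt",  # license copy for simulate
    ]
    return [str(path) for path in required if not path.exists()]
```

An empty list means the layout matches; otherwise the returned paths show which step to repeat.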

Testing

$ cd ~/.mujoco/mujoco200_linux/bin 
$ ./simulate ../model/humanoid.xml

You will see the simulator window with the humanoid model.

On some remote machines you will not see this because of hardware limits, but on others you will.

mujoco-py

Download the source code: git clone https://github.com/openai/mujoco-py.git

Install patchelf, which is needed for the GL libraries:

$ sudo curl -o /usr/local/bin/patchelf https://s3-us-west-2.amazonaws.com/openai-sci-artifacts/manual-builds/patchelf_0.9_amd64.elf 
$ sudo chmod +x /usr/local/bin/patchelf

Install gcc dependencies:

sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3

Some other dependencies

$ cd ~/mujoco-py
$ cp requirements.txt requirements.dev.txt ./mujoco_py
$ cd mujoco_py
$ pip install -r requirements.txt
$ pip install -r requirements.dev.txt

Installation

$ cd ~/mujoco-py/vendor 
$ ./Xdummy-entrypoint 
$ cd .. 
$ python setup.py install

To test, run import mujoco_py in Python; the first import compiles some files. If you hit a gcc error, refer to the
troubleshooting section in mujoco-py. If that does not help, you may need
to switch to another computer.
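
A slightly fuller smoke test than a bare import is to load a model and step the simulation. The sketch below assumes the ~/.mujoco/mujoco200 layout from this post; humanoid_xml_path and smoke_test are names invented here, while load_model_from_path, MjSim and step are the mujoco_py calls being exercised.

```python
import os


def humanoid_xml_path(mujoco_root="~/.mujoco"):
    """Path of the humanoid model shipped with mujoco200 (layout from this post)."""
    return os.path.join(os.path.expanduser(mujoco_root), "mujoco200", "model", "humanoid.xml")


def smoke_test(steps=10):
    """Load the humanoid model and advance the simulation a few steps."""
    import mujoco_py  # imported here because the first import triggers compilation

    model = mujoco_py.load_model_from_path(humanoid_xml_path())
    sim = mujoco_py.MjSim(model)
    for _ in range(steps):
        sim.step()
    return sim.data.qpos.copy()  # joint positions after stepping
```

If smoke_test() returns an array without raising, both the compiled extension and the license are working.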

dm-control

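dm_control is another control environment, and it does not depend on mujoco_py. dm_control looks for mujoco in ~/.mujoco/mujoco200_linux/, so you need a copy of the mujoco directory
under that name (if you kept the unzipped mujoco200_linux directory from the steps above, it is already in place and this copy is unnecessary):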

$ cd ~/.mujoco
$ cp -r mujoco200 mujoco200_linux

Then you could install dm_control

$ pip install dm_control

Note that the rendering backend used is OpenGL EGL.
First, pip install pyopengl. Then, export PYOPENGL_PLATFORM=egl.
After that, you can use dm_control.
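
The variable can also be set from Python, as long as that happens before dm_control (and therefore OpenGL) is imported. A minimal sketch, assuming the cartpole swingup task from the dm_control suite; configure_headless_rendering and render_one_frame are hypothetical helper names.

```python
import os


def configure_headless_rendering(environ=os.environ):
    """Select EGL for PyOpenGL unless the user already chose another platform."""
    environ.setdefault("PYOPENGL_PLATFORM", "egl")
    return environ["PYOPENGL_PLATFORM"]


def render_one_frame(height=240, width=320):
    """Load a suite task and render a single off-screen frame."""
    configure_headless_rendering()
    from dm_control import suite  # imported late so the variable is set first

    env = suite.load(domain_name="cartpole", task_name="swingup")
    env.reset()
    return env.physics.render(height=height, width=width, camera_id=0)
```

render_one_frame() should return an image array of shape (height, width, 3) when EGL rendering works on the machine.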
