
Tricks Summary 2020

This post records several problems I ran into while coding, along with the fixes that worked for me. I hope it is helpful.

Install Nvidia Driver CUDA and CUDNN on Ubuntu

Official installation tutorial link.

  1. Use Command Line: Tutorial

  2. Search for PPA which can be used to install packages by apt-get: link

    Follow these steps (tutorial):

    
    # add ppa and find suitable nvidia-driver
    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt-get update
    apt-cache search nvidia-driver
    # sudo apt-get install nvidia-driver-version
    sudo apt-get install nvidia-440
    # another way to install nvidia driver
    ubuntu-drivers devices
    sudo ubuntu-drivers autoinstall
       
    # add keys for your specific ubuntu version, be careful about ubuntu1x04
    sudo apt-key adv --fetch-keys  http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
    # if error message:
    # using the command:
    # wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
    # sudo apt-key add 7fa2af80.pub
    sudo bash -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
       
    sudo bash -c 'echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/cuda_learn.list'
       
    sudo apt-get update
    sudo apt install cuda-10-1
    sudo apt install libcudnn7
       
    # add the following codes to ~/.bashrc
    # set PATH for cuda installation
    if [ -d "/usr/local/cuda/bin/" ]; then
        export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
        export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    fi
       
    reboot -i
       
    # check settings
    nvcc --version
    /sbin/ldconfig -N -v $(sed 's/:/ /g' <<< $LD_LIBRARY_PATH) 2>/dev/null | grep libcudnn
    nvidia-smi
    
  3. Reboot system

Then the NVIDIA driver will be installed smoothly. After that, we should install CUDA.

Check CUDA version: cat /usr/local/cuda/version.txt.
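
The version file can also be read from Python. This is only a minimal sketch, assuming the file contains a line such as "CUDA Version 10.1.243" (newer CUDA releases may ship a different file, which this does not handle). The helper name parse_cuda_version is my own, not part of any library.

```python
import re
from pathlib import Path


def parse_cuda_version(text):
    """Return (major, minor, patch) from a 'CUDA Version X.Y.Z' line, or None."""
    match = re.search(r"CUDA Version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


version_file = Path("/usr/local/cuda/version.txt")
if version_file.exists():
    print(parse_cuda_version(version_file.read_text()))
```

Returning a tuple makes it easy to compare versions, for example parse_cuda_version(text) >= (10, 1, 0).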

If we want to download the packages ourselves, we should first check the website to find the right version for our system. The best way to install them is to follow the official guide.

There is also a helpful tutorial on installing CUDA and cuDNN on Ubuntu.

Debian CUDA install

Recently I needed to install CUDA 11.1 and the NVIDIA driver on Debian 11. Here is what worked for me.

The key point is that, in my testing, the CUDA packages from NVIDIA's debian10 repository (used below) work smoothly on Debian 11.

The fastest method is sudo apt install nvidia-cuda-toolkit. However, it can only install the version packaged by the distribution. In my case that was CUDA 11.2, which torch did not support yet.

I then tried several methods from the internet, but the solution that worked is essentially the same as the Ubuntu method above.

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt install nvidia-driver


sudo apt-key adv --fetch-keys  http://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/7fa2af80.pub

sudo bash -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64 /" > /etc/apt/sources.list.d/cuda.list'

sudo apt-get update
sudo apt install cuda-11-1

if [ -d "/usr/local/cuda/bin/" ]; then
    export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
fi

reboot -i

Then just enjoy the new world.

Conda Usage

1. search available python version

conda search "^python$"

2. Create or Remove conda environment

conda remove --name myenv --all
conda create -n env python==3.6

Tensorflow Usage Problem Recording

1. Tensorflow does not use GPU

In my case, CUDA and the NVIDIA driver were already installed. The cause was that I had not added cuda/bin and the related library path to .bashrc.

Adding the following lines to the .bashrc file solves the problem.

export PATH=/usr/local/cuda-10.2/bin/:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Besides, I think this is a useful tutorial on using TensorFlow with a GPU.

TensorFlow 2.1 does not work with CUDA 10.2 because some libraries are missing.

So I reinstalled CUDA 10.1 for tensorflow2.1-gpu.

Test tensorflow for GPU

import tensorflow as tf
tf.test.is_gpu_available(cuda_only=False,min_cuda_compute_capability=None)

2. Install TensorRT for Tensorflow2-GPU version

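Official tutorial. TensorRT is not required for tensorflow-gpu. It is only useful for speeding things up, and since it is an inference optimizer, the gain is mainly at inference time rather than during training.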

3. Change CUDA Version

ll /usr/local/cuda
# repoint the cuda link. usage: ln -sfn source target (-f replaces an existing link, -n does not follow it)
sudo ln -sfn /usr/local/cuda-7.5 /usr/local/cuda

Explanation
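
The same switch can be scripted. Below is a minimal sketch, assuming a POSIX system; switch_symlink is a hypothetical helper name, and the demonstration runs in a temporary directory rather than in /usr/local so it needs no root rights.

```python
import os
import tempfile


def switch_symlink(target, link_path):
    """Point link_path at target, replacing any existing symlink."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    # Renaming over the old link swaps it in a single step.
    os.replace(tmp_link, link_path)


# Demonstration with stand-in directories for two CUDA versions.
with tempfile.TemporaryDirectory() as root:
    for name in ("cuda-10.1", "cuda-10.2"):
        os.mkdir(os.path.join(root, name))
    link = os.path.join(root, "cuda")
    switch_symlink(os.path.join(root, "cuda-10.1"), link)
    switch_symlink(os.path.join(root, "cuda-10.2"), link)
    print(os.path.basename(os.readlink(link)))
```

Creating the new link under a temporary name and renaming it means there is never a moment where the cuda link is missing.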

4. Cannot visit tensorflow.org official website

In this case, the following steps work on a Mac. On Linux, edit the corresponding hosts file (/etc/hosts) instead.

  1. edit /private/etc/hosts
  2. add 64.233.188.121 www.tensorflow.org
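
If the entry has to be added on several machines, the edit can be made idempotent. This is only a sketch that works on the file content as a string; add_hosts_entry is a hypothetical helper, and writing the result back to the hosts file (which needs root) is left out on purpose.

```python
def add_hosts_entry(hosts_text, ip, hostname):
    """Return hosts_text with 'ip hostname' appended unless hostname is already mapped."""
    for line in hosts_text.splitlines():
        # Ignore comments, then look for the hostname after the address field.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip} {hostname}\n"


original = "127.0.0.1 localhost\n"
updated = add_hosts_entry(original, "64.233.188.121", "www.tensorflow.org")
print(updated)
```

Calling the function a second time on its own output changes nothing, so it is safe to rerun.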

5. Tensorflow GPU Allocation Problem

By default, TensorFlow maps nearly all of the memory of every visible GPU in order to work more efficiently. Memory usage can be limited in the two ways shown below. More details are available at the link.

tf.config.experimental.set_memory_growth:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

tf.config.experimental.set_virtual_device_configuration:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

6. Pytorch Jupyter set GPU

print(torch.cuda.device_count()) # number of visible GPUs
device = torch.device("cuda:6" if torch.cuda.is_available() else "cpu") # set GPU for device
model_ft = torch.nn.DataParallel(model_ft, device_ids=[0]) # set GPU for models

7. Data Loader Problem

raw_train_X = next(iter(train_dataloader))[0].numpy() # (100000, 3, 64, 64)
raw_train_Y = next(iter(train_dataloader))[1].numpy() # (100000, )

The code above is wrong: each call to next(iter(train_dataloader)) creates a new iterator and draws a new batch, so when the loader shuffles, X and Y come from different batches and no longer correspond.
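
The fix is to call next() once and unpack both parts of the same batch. The sketch below shows the pattern with a small stand-in class, ShuffledLoader, which is hypothetical and only mimics a shuffling data loader so that the example runs without PyTorch. With a real loader the equivalent line would be X, Y = next(iter(train_dataloader)).

```python
import random


class ShuffledLoader:
    """Hypothetical stand-in for a shuffling data loader: every iter() reshuffles."""

    def __init__(self, xs, ys, batch_size, seed=0):
        self.pairs = list(zip(xs, ys))
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def __iter__(self):
        order = self.pairs[:]
        self.rng.shuffle(order)
        for start in range(0, len(order), self.batch_size):
            batch = order[start:start + self.batch_size]
            yield [x for x, _ in batch], [y for _, y in batch]


xs = list(range(100))
ys = [10 * x for x in xs]  # each label is 10 times its input
loader = ShuffledLoader(xs, ys, batch_size=8)

# One next() call: inputs and labels come from the same batch.
batch_x, batch_y = next(iter(loader))
print(all(y == 10 * x for x, y in zip(batch_x, batch_y)))
```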

8. Check label counts

unique, counts = np.unique(sy_train, return_counts=True)
print(counts)
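
np.unique returns the labels and the counts as two separate arrays, and the snippet above prints only the counts. A small sketch of pairing each label with its count using only the standard library follows; the labels list is made-up illustrative data, not from my dataset.

```python
from collections import Counter

labels = [0, 1, 1, 2, 2, 2]  # made-up labels for illustration
label_counts = Counter(labels)

# Print every label next to how often it occurs.
for label in sorted(label_counts):
    print(label, label_counts[label])
```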

9. Allocation problem

Warning: tensorflow/core/framework/allocator.cc:101] Allocation of X exceeds 10% of system memory

Solution: the main cause is usually that batch_size is too large. Sometimes the cause lies in related settings, such as the shuffle function (for example its buffer size) in tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle().
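
To judge whether a batch size is reasonable, it helps to estimate how many bytes a tensor needs. This is a rough sketch assuming dense float32 data (4 bytes per element); tensor_bytes is a hypothetical helper, the full shape is the one from the data loader comment in item 7, and the batch size of 128 is just an example value.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element corresponds to float32."""
    return prod(shape) * bytes_per_element


full = tensor_bytes((100000, 3, 64, 64))  # the whole training set at once
batch = tensor_bytes((128, 3, 64, 64))    # one example batch
print(f"full set: {full / 2**30:.2f} GiB, batch of 128: {batch / 2**20:.2f} MiB")
```

The estimate covers only the input tensor itself. Activations and gradients come on top of it.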

10. How to use @tf.function [Need Digging into]

When I modified the gradients inside the train_step function in TensorFlow 2, I found that without @tf.function on train_step the accuracy grew very slowly. After adding @tf.function, training behaved normally, regardless of the optimizer or data type. The related code is below.

@tf.function
def fix_train_step(model, images, labels, all_mask):
  with tf.GradientTape() as tape:
    predictions = model(images)
    loss = loss_object(labels, predictions)
  gradients = tape.gradient(loss, model.trainable_variables)
  for i in range(len(all_mask)):
    gradients[i] = gradients[i] * all_mask[i]
  optimizer.apply_gradients(zip(gradients, model.trainable_variables))
  train_loss(loss)
  train_accuracy(labels, predictions)

11. Problem after suspending the Machine

After suspending Ubuntu, I hit the error "failed call to cuInit: CUDA_ERROR_UNKNOWN" and could not use the GPU. Rebooting will probably solve it. In my case, reinstalling the NVIDIA 440 driver solved the problem.

Related link

12. Save Model Problem in Tensorflow2

When I save a Keras subclassed model and then use tf.keras.models.Model to get the outputs of intermediate layers, I get a "no bounded node" error. In TensorFlow 2, a suitable way to save and load a model is as follows.

# save model weights
model.save_weights(MODEL_FILEPATH + 'weight.h5')
model.load_weights(MODEL_FILEPATH + "weight.h5")

# save whole model
tf.keras.models.save_model(model, MODEL_FILEPATH)
model = tf.keras.models.load_model(MODEL_FILEPATH)

13. Check whether a GPU is available in TensorFlow:

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
tf.config.list_physical_devices('GPU')

Version compatibility details for TensorFlow.

Other Problems

1. Install Cudnn on Debian

Installing cuDNN on Debian differs from installing it on Ubuntu, and the CUDA-related libraries live in a different path than on Ubuntu.

Step 1: Download cuDNN from the NVIDIA official website.

Step 2: Add the related libraries to the path ./usr/lib/x86_64-linux-gnu/. A good way to find the path is find . -name libcublas.so.10.

Step 3: Check the result of installation of cudnn.
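
For the check in Step 3, one simple option is to look for the library files, in the same spirit as the find command above. This is a minimal sketch; find_libraries is a hypothetical helper and the demonstration uses a temporary directory with empty placeholder files. On a real system the function would be pointed at the library directory from Step 2.

```python
import fnmatch
import os
import tempfile


def find_libraries(root, pattern="libcudnn*"):
    """Walk root and return the sorted paths whose file name matches pattern."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            hits.append(os.path.join(dirpath, name))
    return sorted(hits)


# Demonstration with empty placeholder files instead of real libraries.
with tempfile.TemporaryDirectory() as root:
    for name in ("libcudnn.so.7", "libcublas.so.10"):
        open(os.path.join(root, name), "w").close()
    print([os.path.basename(p) for p in find_libraries(root)])
```

Finding the files only shows that they are in place. It does not prove that the versions match the installed CUDA.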

2. Use matplotlib to draw 3D gradient descent pictures

import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import *
from mpl_toolkits import mplot3d  # used for drawing 3D plots


# Partial derivatives of the objective function
def gradJ1(theta):
    return 4*theta
def gradJ2(theta):
    return 2*theta

# Objective function
def f(x, y):
    return  2*x**2 +y**2

def ff(x,y):
    return 2*np.power(x,2)+np.power(y,2)

def train(lr,epoch,theta1,theta2,up,dirc):
    t1 = [theta1]
    t2 = [theta2]
    for i in range(epoch):
        gradient = gradJ1(theta1)
        theta1 = theta1 - lr*gradient
        t1.append(theta1)
        gradient = gradJ2(theta2)
        theta2 = theta2 - lr*gradient
        t2.append(theta2)

    plt.figure(figsize=(20,10))     # set the figure size
    x = np.linspace(-3,3,30)
    y = np.linspace(-3,3,30)
    X, Y = np.meshgrid(x, y)
    Z = f(X,Y)
    ax = plt.axes(projection='3d')
    print(t1, t2, ff(t1,t2))
#     ax.scatter(t1, t2, ff(t1,t2), c='black',marker = '*', linewidth=1)

    ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='viridis', edgecolor='none', alpha=0.9) # surface plot
    #ax.plot_wireframe(X, Y, Z, color='c') # wireframe plot
#     ax.contour3D(X, Y, Z, 50, cmap='binary') # contour plot
#     ax.scatter3D(t1, t2, ff(t1,t2), c='black',marker = 'o')
    ax.plot(t1, t2, ff(t1,t2), c='black',marker = 'o', markersize=5, zorder=5)
#     ax.plot_wireframe(t1, t2, ff(t1,t2))
#     ax.plot3D(t1, t2,  ff(t1,t2),'red')
    # Set the viewing elevation (up) and azimuth (dirc), in degrees
    ax.view_init(up, dirc)
    plt.savefig("./temp.png")

# The sliders can be adjusted at any time to see the effect (min, max, step)
@interact(lr=(0, 2, 0.0002),epoch=(1,100,1),init_theta1=(-3,3,0.1),init_theta2=(-3,3,0.1),up=(-180,180,1),dirc=(-180,180,1),continuous_update=False)
# lr: learning rate (step size); epoch: number of iterations; init_theta1, init_theta2: initial parameters; up: vertical view angle; dirc: horizontal view angle
def visualize_gradient_descent(lr=0.05,epoch=10,init_theta1=-2,init_theta2=-3,up=60,dirc=60):
    train(lr,epoch,init_theta1,init_theta2,up,dirc)

3. git add new repo

link

4. Out of memory on Pytorch

During training, GPU memory usage grows as the loss and the outputs are computed. Deleting them once they are no longer needed is a suitable way to reduce memory usage.

del cost, out
print("\nall", torch.cuda.memory_allocated())

5. Mac New Application Damaged Problem

link

  1. sudo spctl --master-disable

    • Enable it again: sudo spctl --master-enable
  2. xattr -r -d com.apple.quarantine <path>

    • Example: xattr -r -d com.apple.quarantine /Applications/PDF\ Expert.app

6. Display plots from the remote Ubuntu server on the local machine

link

  1. Install xquartz on mac from the link.
  2. Use ssh -X <remote server> to enable x11 forwarding
  3. Run python script with matplotlib to build the connection.

Enable OpenCL Problems:

  1. link
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast

Set these environment variables:

export LIBGL_ALWAYS_INDIRECT=1
export LIBGL_DEBUG=verbose
export LIBGL_ALWAYS_SOFTWARE=1

  2. Open3D Problem
[Open3D WARNING] GLFW Error: GLX: Forward compatibility requested but GLX_ARB_create_context_profile is unavailable
[Open3D WARNING] Failed to create window
[Open3D WARNING] [DrawGeometries] Failed creating OpenGL window.

Checking with clinfo and glxinfo shows "OpenGL version string: 1.4 (2.1 INTEL-16.1.7)". Rendering needs to be switched to the NVIDIA driver with these commands:

nvidia-settings
sudo prime-select nvidia

link

opengl version with GPU https://opengl.gpuinfo.org/displayreport.php?id=5738

7. How to install an editable Python package

Detailed Tutorial

Basically, setup.cfg and setup.py are configured for the editable package.

# setup.cfg

[metadata]
name = local_structure
version = 0.1.0

[options]
packages = structure

# setup.py

import setuptools

setuptools.setup()

And then use the following command to install the package locally under the package folder: python -m pip install -e .
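
After installing, it is worth confirming that Python really imports the package from the source folder. This is a small sketch using the standard library; module_origin is a hypothetical helper, "structure" is the package name from the setup.cfg above, and json is included only as a control that is always present.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# For an editable install, the first path should point into the source folder.
print(module_origin("structure"))
print(module_origin("json"))
```

If the first line prints None, the package is not visible to the interpreter that ran the check, which usually means pip installed into a different environment.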

8. File systems and their corresponding operating systems

Linux:

  • Recommended: ext4

  • Not recommended: exFAT, FAT

Mac:

If Homebrew is not installed, install it first:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Install e2fsprogs:

brew install e2fsprogs

Plug the USB drive into the Mac and run:

diskutil list

Find the identifier of your USB drive. In my case it is /dev/disk2s1:

/dev/disk2 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:     FDisk_partition_scheme                        *31.0 GB    disk2
   1:                 DOS_FAT_32 KINGSTON                31.0 GB    disk2s1

Then format it:

diskutil unmountdisk /dev/disk2s1
sudo $(brew --prefix e2fsprogs)/sbin/mkfs.ext4 /dev/disk2s1

After running the command you will be asked for your user password. Then type y to confirm and wait a moment for it to finish.

When we mount an NTFS external disk on Ubuntu, all files are owned by root and the permissions cannot be changed. That leads to permission problems when running software stored on it.

So the most suitable way is using an Ext4 external disk for Ubuntu.

10. LaTeX: a table that spans two columns

link

\usepackage{stfloats}
\usepackage{multirow} % needed for \multirow in the header row

\begin{table*}[tp]
    \centering
    \begin{tabular}{c|ccc|ccc|ccc|ccc}
    \hline
    \multirow{2}{*}{Case} & \multicolumn{3}{c}{PointNet++(Random)} & \multicolumn{3}{c}{PointNet++(\Mname)} & \multicolumn{3}{c}{ResGCN-28(Random)} & \multicolumn{3}{c}{ResGCN-28(\Mname)} \\

    & $L_2$ & Acc & mIoU & $L_2$ & Acc & mIoU  & $L_2$ & Acc & mIoU  & $L_2$ & Acc & mIoU  \\
    \hline
    \hline
    Best    & \\
    Average & \\
    Worst   & \\
    \hline
    \end{tabular}
    \caption{The results of the non-targeted attack.}
    \label{tab:nt-performance}
\end{table*}

11. Oh-my-zsh completions commands with repeated words

link

12. Download Bilibili video automatically

you-get -l https://www.bilibili.com/video/BV1U7411a7xG\?p\=20 --debug

Use the you-get command. Its usage is as follows:

usage: you-get [OPTION]... URL...

A tiny downloader that scrapes the web

optional arguments:
  -V, --version         Print version and exit
  -h, --help            Print this help message and exit

Dry-run options:
  (no actual downloading)

  -i, --info            Print extracted information
  -u, --url             Print extracted information with URLs
  --json                Print extracted URLs in JSON format

Download options:
  -n, --no-merge        Do not merge video parts
  --no-caption          Do not download captions (subtitles, lyrics, danmaku, ...)
  -f, --force           Force overwriting existing files
  --skip-existing-file-size-check
                        Skip existing file without checking file size
  -F STREAM_ID, --format STREAM_ID
                        Set video format to STREAM_ID
  -O FILE, --output-filename FILE
                        Set output filename
  -o DIR, --output-dir DIR
                        Set output directory
  -p PLAYER, --player PLAYER
                        Stream extracted URL to a PLAYER
  -c COOKIES_FILE, --cookies COOKIES_FILE
                        Load cookies.txt or cookies.sqlite
  -t SECONDS, --timeout SECONDS
                        Set socket timeout
  -d, --debug           Show traceback and other debug info
  -I FILE, --input-file FILE
                        Read non-playlist URLs from FILE
  -P PASSWORD, --password PASSWORD
                        Set video visit password to PASSWORD
  -l, --playlist        Prefer to download a playlist
  -a, --auto-rename     Auto rename same name different files
  -k, --insecure        ignore ssl errors

Proxy options:
  -x HOST:PORT, --http-proxy HOST:PORT
                        Use an HTTP proxy for downloading
  -y HOST:PORT, --extractor-proxy HOST:PORT
                        Use an HTTP proxy for extracting only
  --no-proxy            Never use a proxy
  -s HOST:PORT, --socks-proxy HOST:PORT
                        Use an SOCKS5 proxy for downloading

13. Recover deleted files on Ubuntu

testdisk

14. How to use the local Mac as a proxy for the Ubuntu server’s package downloads

Step 1: Enable the SSH service on the Mac and configure forwarding.

Step 2: Set Ubuntu proxy config

# add the following two lines to ~/.bashrc
export https_proxy=127.0.0.1:1234
export http_proxy=127.0.0.1:1234

# set up the ssh tunnel; 7890 is the VPN proxy port and jason@10.177.74.47 is the local machine's ssh service
ssh -N -f -L localhost:1234:localhost:7890 jason@10.177.74.47

# check the port-using process
lsof -ti:1234

# check the vpn service
curl -I https://google.com
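
The tunnel can also be checked from Python before pointing tools at it. This is a minimal sketch assuming the tunnel listens on 127.0.0.1:1234 as configured above; port_is_open is a hypothetical helper, and it only tests that something accepts TCP connections, not that the proxy behind it works.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(port_is_open("127.0.0.1", 1234))
```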

15. A fast way to transfer files between remote servers with progress bar

rsync

rsync -r --info=progress2 <files path> <username@remote server> 

16. Jupyter cannot use the specific environment of conda

A helpful link.

Basically, the main problem is that the system does not use the jupyter command in the conda environment. Instead, it uses the system default version.

We can use the sys.path to check whether we use the correct command or not.

If the result of sys.path is like the following, the environment is correct.

['/home/jxu/random-fourier-features/examples',
 '/home/jxu/miniconda3/envs/rf/lib/python37.zip',
 '/home/jxu/miniconda3/envs/rf/lib/python3.7',
 '/home/jxu/miniconda3/envs/rf/lib/python3.7/lib-dynload',
 '',
 '/home/jxu/miniconda3/envs/rf/lib/python3.7/site-packages',
 '/home/jxu/miniconda3/envs/rf/lib/python3.7/site-packages/IPython/extensions',
 '/home/jxu/.ipython']
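
Instead of reading the whole list by eye, the check can be run in a notebook cell. This is a small sketch assuming the usual conda layout where an environment lives under .../envs/<name>; runs_inside_env is a hypothetical helper, and "rf" is the environment name from the output above.

```python
import sys


def runs_inside_env(env_name, prefix=sys.prefix):
    """Return True if the interpreter prefix ends with .../envs/<env_name>."""
    parts = prefix.replace("\\", "/").rstrip("/").split("/")
    return len(parts) >= 2 and parts[-2] == "envs" and parts[-1] == env_name


print(sys.executable)         # the interpreter the kernel is really using
print(runs_inside_env("rf"))
```

The base environment does not sit under envs/, so for it the function returns False by design.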

I did not dig into the details of the problem this time, but I list the solution here.

I had installed JupyterHub with the command conda install -c conda-forge jupyterhub, from the link.

I then reinstalled it with the commands from the link.

conda install -c conda-forge jupyterlab
conda install -c conda-forge nb_conda_kernels
# conda install -c conda-forge jupyter_contrib_nbextensions

Some useful Jupyter extensions can be found here.

17. Use slack to receive signals or messages from the commands

A helpful link.

Command for the direct message to the user:

curl -X POST --data-urlencode "payload={\"channel\": \"@memberid\", \"username\": \"webhookbot\", \"text\": \"The machine with GTX1080 has been rebooted:)\", \"icon_emoji\": \":ghost:\"}" <link>

Command for the channel message:

curl -X POST --data-urlencode "payload={\"channel\": \"#general\", \"username\": \"webhookbot\", \"text\": \"The machine with GTX1080 has been rebooted:)\", \"icon_emoji\": \":ghost:\"}" <link>

For example, we can send a message to the user when the machine reboots.

We can write the command to the file /etc/rc.local.
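
The same message can be sent from a Python script, which is handy when a training job should report that it has finished. This is a sketch that mirrors the curl commands above with the standard library; build_slack_request is a hypothetical helper and the URL is a placeholder for the real webhook link. The request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_slack_request(webhook_url, channel, text):
    """Build (but do not send) a form-encoded POST carrying a 'payload' field."""
    payload = {
        "channel": channel,
        "username": "webhookbot",
        "text": text,
        "icon_emoji": ":ghost:",
    }
    body = urllib.parse.urlencode({"payload": json.dumps(payload)}).encode()
    return urllib.request.Request(webhook_url, data=body, method="POST")


# Placeholder URL: substitute the real webhook link before sending.
req = build_slack_request(
    "https://example.invalid/webhook",
    "#general",
    "The machine with GTX1080 has been rebooted:)",
)
print(req.get_method(), len(req.data), "bytes")
# To actually send it: urllib.request.urlopen(req)
```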

18. Ubuntu set Default Desktop

sudo update-alternatives --config x-session-manager
sudo dpkg-reconfigure gdm3 # set the default desktop

So far I have tested this on Ubuntu 16.04 and found that gdm3 cannot run while lightdm works well. I am not sure of the reason.

This post is licensed under CC BY 4.0 by the author.
