System Installation

Download the Ubuntu image from the official website: Ubuntu 20.04.1 LTS (Focal Fossa). Choose the Desktop Image version to get the .iso image file.

Black screen: cannot reach the installer

sudo gedit /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset"
sudo update-grub

At the boot menu, highlight the Ubuntu entry and press e as prompted, then edit the kernel parameters so that they read:

rw quiet splash nomodeset

Prevent the graphics driver from updating automatically

Disable nouveau

sudo vim /etc/modprobe.d/blacklist.conf

Append to the end of the file:

blacklist nouveau
options nouveau modeset=0

Update the initramfs

sudo update-initramfs -u
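
After a reboot it can be useful to confirm that the blacklist took effect. The following is a minimal sketch, assuming the kernel exposes its loaded modules in /proc/modules with the module name in the first column; the helper names `loaded_modules` and `nouveau_is_loaded` are made up for this illustration.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first column is the module name
    return names


def nouveau_is_loaded(proc_modules_text):
    """True if the nouveau module appears among the loaded modules."""
    return "nouveau" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        print("nouveau loaded:", nouveau_is_loaded(path.read_text()))
    else:
        print("/proc/modules not found; this check only applies to Linux")
```

If the script still reports nouveau as loaded, the blacklist entries or the initramfs update probably did not apply.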

GRUB

sudo vim /etc/default/grub

Find the line GRUB_HIDDEN_TIMEOUT=0 and comment it out with #, so that it becomes #GRUB_HIDDEN_TIMEOUT=0
Save and exit

sudo update-grub

If that does not work, reinstall GRUB

sudo update-grub
sudo grub-install /dev/sda
sudo reboot # reboot

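About the /etc/default/grub file
GRUB_TIMEOUT=10 (the default is 10 seconds) makes GRUB wait 10 seconds; a negative value makes it wait indefinitely for input.
With this setting the GRUB menu is shown at boot, and if nothing is selected within 10 seconds the system boots automatically.
Shortcut to enter the GRUB menu:
Shift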
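
The GRUB edits described in this post (adding nomodeset, commenting out GRUB_HIDDEN_TIMEOUT, setting GRUB_TIMEOUT) can be sketched as a small text transformation. This is only an illustration under stated assumptions: `patch_grub_defaults` is a hypothetical helper, it works on a string rather than on the real file, and it assumes the kernel command line is written with double quotes on a single line.

```python
import re


def patch_grub_defaults(text, timeout=10):
    """Return /etc/default/grub-style text with the edits from this post applied."""
    patched = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("GRUB_HIDDEN_TIMEOUT="):
            # Comment the line out so the menu is no longer hidden.
            line = "#" + stripped
        elif stripped.startswith("GRUB_TIMEOUT="):
            line = f"GRUB_TIMEOUT={timeout}"
        elif stripped.startswith("GRUB_CMDLINE_LINUX_DEFAULT="):
            match = re.match(r'GRUB_CMDLINE_LINUX_DEFAULT="(.*)"$', stripped)
            if match and "nomodeset" not in match.group(1).split():
                args = (match.group(1) + " nomodeset").strip()
                line = f'GRUB_CMDLINE_LINUX_DEFAULT="{args}"'
        patched.append(line)
    return "\n".join(patched) + "\n"


if __name__ == "__main__":
    sample = (
        "GRUB_DEFAULT=0\n"
        "GRUB_HIDDEN_TIMEOUT=0\n"
        "GRUB_TIMEOUT=5\n"
        'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    )
    print(patch_grub_defaults(sample), end="")
```

The function only returns the new text; writing it back to /etc/default/grub and running sudo update-grub remain manual steps, and a backup of the original file is advisable.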

Configure a mirror in mainland China

sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
sudo vim /etc/apt/sources.list

deb http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ focal-backports main restricted universe multiverse

sudo apt update
sudo apt upgrade
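
The ten mirror entries above follow a regular pattern: one deb line and one deb-src line for each of five suites. As a rough sketch, they could be generated rather than typed by hand; `mirror_entries` is a hypothetical helper, and the suite and component lists are simply copied from the entries in this post.

```python
def mirror_entries(base_url, codename, with_src=True):
    """Build sources.list lines for the given mirror URL and release codename."""
    suites = ["", "-security", "-updates", "-proposed", "-backports"]
    components = "main restricted universe multiverse"
    lines = []
    for suffix in suites:
        lines.append(f"deb {base_url} {codename}{suffix} {components}")
        if with_src:
            lines.append(f"deb-src {base_url} {codename}{suffix} {components}")
    return lines


if __name__ == "__main__":
    for entry in mirror_entries("http://mirrors.aliyun.com/ubuntu/", "focal"):
        print(entry)
```

Redirecting the output to a file and comparing it with /etc/apt/sources.list is a quick way to spot a typo in a hand-edited list.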

Install Python & pip

sudo apt install python3
sudo apt install python3-pip
sudo apt install ssh

Install CUDA

  1. Choose the matching version and download it: [https://developer.nvidia.com/cuda-12-0-1-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=runfile_local]

  2. Follow the prompts to install. If the installer reports "Existing package manager installation of the driver found. It is strongly ", choose continue, then deselect "Driver".

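  3. Set the environment variables

    nano ~/.bashrc
    # Append the following lines to the end of the file:
    export CUDA_HOME=/usr/local/cuda-12.0
    export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}
    export PATH=${CUDA_HOME}/bin:${PATH}
    # Reload the file so the changes take effect
    source ~/.bashrc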
  4. Verify the installation

    nvcc -V
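
If nvcc is not found, the environment variables from step 3 are the usual suspects. The sketch below checks them; it assumes the layout used in this post (a bin and a lib64 directory under CUDA_HOME, with entries separated by colons), and `check_cuda_env` is a hypothetical helper name.

```python
import os
import shutil


def check_cuda_env(env):
    """Report whether CUDA_HOME is set and whether its bin/lib64 dirs are on the search paths."""
    cuda_home = env.get("CUDA_HOME", "").rstrip("/")
    path_dirs = [d.rstrip("/") for d in env.get("PATH", "").split(":")]
    lib_dirs = [d.rstrip("/") for d in env.get("LD_LIBRARY_PATH", "").split(":")]
    has_home = bool(cuda_home)
    return {
        "cuda_home_set": has_home,
        "bin_on_path": has_home and f"{cuda_home}/bin" in path_dirs,
        "lib64_on_ld_library_path": has_home and f"{cuda_home}/lib64" in lib_dirs,
    }


if __name__ == "__main__":
    for name, ok in check_cuda_env(dict(os.environ)).items():
        print(f"{name}: {ok}")
    print("nvcc found at:", shutil.which("nvcc"))
```

Run it in a new terminal (or after source ~/.bashrc) so that it sees the updated environment.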

Install Anaconda

wget https://repo.anaconda.com/archive/Anaconda3-2023.03-Linux-x86_64.sh
bash Anaconda3-2023.03-Linux-x86_64.sh
nano ~/.bashrc

Append to the end:

export PATH="$HOME/anaconda3/bin:$PATH"
source ~/anaconda3/bin/activate

Reload the file:

source ~/.bashrc

The shell now enters the base environment.

Create a new environment

conda create -n pytorch python=3.10.9
conda info --envs
conda activate pytorch

Install PyTorch

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Verify the GPU

import torch
torch.cuda.is_available()
from torch.backends import cudnn
cudnn.is_available()
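
The interactive check above can also be saved as a script that prints a short report. This is a sketch rather than an official diagnostic: `gpu_report` is a made-up helper, and it simply returns early if torch cannot be imported in the active environment.

```python
def gpu_report():
    """Collect basic PyTorch/GPU facts; returns a dict even when torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        # Name of the first visible GPU.
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Run it inside the pytorch conda environment created earlier; otherwise it will most likely report that torch is not installed.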

Install and configure SSH

# Install openssh-server
sudo apt-get install openssh-server
# Check whether the ssh service is running; if the output contains sshd, it started successfully
sudo ps -e | grep ssh
# Start ssh
sudo service ssh start
# Check the ssh status
sudo service ssh status
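
Besides inspecting the service status, one can test whether the SSH port actually accepts connections. The following is a minimal sketch, assuming sshd listens on the default port 22; `port_is_open` is a hypothetical helper, and a successful TCP connection only shows that something is listening, not that login will succeed.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("SSH port reachable on localhost:", port_is_open("127.0.0.1", 22))
```

Replacing 127.0.0.1 with the workstation's LAN address and running the script from another machine also exercises any firewall in between.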

Resolving dependency problems when installing packages

# If apt reports missing dependencies during installation, install with aptitude instead
sudo apt-get update
sudo apt-get install aptitude
sudo aptitude install DEPENDS_NAME

Check GPU status

# Show the status of the GPUs on the server
nvidia-smi
# Refresh the GPU status display periodically
nvidia-smi -l
# Show GPU usage with a chosen refresh interval (seconds)
watch -n 3 nvidia-smi
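
For logging or scripting, nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. The sketch below parses that output; `parse_gpu_csv` and `query_gpus` are hypothetical helpers, the field list is just one possible selection, and the parser assumes GPU names contain no commas.

```python
import shutil
import subprocess

QUERY_FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(csv_text):
    """Turn 'csv,noheader,nounits' output into a list of dicts keyed by field name."""
    gpus = []
    for line in csv_text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the expected layout
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Run nvidia-smi if it is installed; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

With nounits, the memory values are reported in MiB and utilization in percent, so the strings can be converted to numbers directly if needed.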
