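Problem 1: Deep-learning code on a Linux server cannot use the GPU.
Problem 2: Running `nvidia-smi` fails with: Failed to initialize NVML: Driver/library version mismatch
Cause: the system updated the NVIDIA driver automatically.
Analysis: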
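1. Check the installed NVIDIA packages and their versions:
```bash
# Plain pattern: an unquoted nvidia-* could be expanded by the shell as a filename glob
dpkg --list | grep nvidia
```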
2. Check the version of the loaded NVIDIA kernel module:
```bash
cat /proc/driver/nvidia/version
```
3. Check when NVIDIA packages were installed or upgraded:
```bash
grep nvidia /var/log/dpkg.log
```
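The mismatch that steps 1 and 2 look for can also be checked directly: the error means the user-space NVML library (`libnvidia-ml.so`) and the loaded kernel module have different versions. The sketch below compares the two. It assumes the `NVRM version` line format of `/proc/driver/nvidia/version` and the Debian/Ubuntu x86_64 library directory `/usr/lib/x86_64-linux-gnu`; the function names are hypothetical helpers, not part of any NVIDIA tool.

```bash
#!/usr/bin/env bash
# Sketch: compare the loaded NVIDIA kernel module version with the version of
# the user-space NVML library. If they differ, nvidia-smi reports
# "Driver/library version mismatch".

# Print the kernel module version (e.g. 525.125.06), taken from the first
# dotted number on the "NVRM version" line.
kernel_module_version() {
  local file="${1:-/proc/driver/nvidia/version}"
  grep -m1 '^NVRM version' "$file" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1
}

# Print the version suffix of the versioned NVML library file
# (libnvidia-ml.so.<version>). The default directory is an assumption
# (Debian/Ubuntu x86_64 multiarch path).
nvml_library_version() {
  local dir="${1:-/usr/lib/x86_64-linux-gnu}" f
  for f in "$dir"/libnvidia-ml.so.*.*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    echo "${f##*/libnvidia-ml.so.}"
    return 0
  done
  return 1
}

# Print both versions; succeed only if both are known and identical.
compare_driver_versions() {
  local kmod lib
  kmod="$(kernel_module_version "$1")"
  lib="$(nvml_library_version "$2")"
  echo "kernel module: ${kmod:-unknown}, NVML library: ${lib:-unknown}"
  [ -n "$kmod" ] && [ "$kmod" = "$lib" ]
}

# Only run the check when the NVIDIA kernel module is loaded.
if [ -r /proc/driver/nvidia/version ]; then
  compare_driver_versions || echo "Mismatch (or a version could not be read)"
fi
```

On a machine in the state described here, the two printed versions differ until the system is rebooted.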
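Comparing the results showed that the server had updated the driver automatically: the newly installed driver libraries no longer matched the kernel module that was still loaded, so the GPU driver could not be used and `nvidia-smi` failed.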
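To confirm that an automatic update is to blame, the dpkg log from step 3 can be narrowed to NVIDIA package upgrades that happened after the last boot: the new libraries take effect immediately, while the kernel module loaded at boot stays in place until a reboot. A minimal sketch, assuming the `date time upgrade package old-version new-version` layout of `upgrade` lines in `/var/log/dpkg.log` and using `uptime -s` (procps) for the boot time; `nvidia_upgrades_since` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: list NVIDIA package upgrades logged by dpkg after a given time.
# Assumes dpkg.log "upgrade" lines of the form:
#   YYYY-MM-DD HH:MM:SS upgrade <package>:<arch> <old-version> <new-version>
nvidia_upgrades_since() {
  local since="$1" log="${2:-/var/log/dpkg.log}"
  # "YYYY-MM-DD HH:MM:SS" strings sort chronologically, so a plain string
  # comparison in awk is enough.
  awk -v since="$since" '
    $3 == "upgrade" && $4 ~ /nvidia/ && ($1 " " $2) > since {
      printf "%s %s %s: %s -> %s\n", $1, $2, $4, $5, $6
    }' "$log"
}

# Compare against the last boot time ("uptime -s" prints YYYY-MM-DD HH:MM:SS).
# Any output means NVIDIA packages were upgraded after the running kernel
# module was loaded at boot.
if [ -r /var/log/dpkg.log ] && boot="$(uptime -s 2>/dev/null)"; then
  nvidia_upgrades_since "$boot"
fi
```

Older entries may already have been rotated into `/var/log/dpkg.log.1` and compressed files; this sketch only reads the current log.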
Fix: disable automatic updates. Open the unattended-upgrades configuration:
```bash
sudo vim /etc/apt/apt.conf.d/50unattended-upgrades
```
Comment out the following two lines:
```
//"${distro_id}:${distro_codename}";
//"${distro_id}:${distro_codename}-security";
```
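Instead of editing the file in vim, the two entries can be commented out with `sed`. This is a sketch that assumes each entry sits on its own line as `"${distro_id}:${distro_codename}";` and `"${distro_id}:${distro_codename}-security";`, as in the stock Ubuntu file; it keeps a `.bak` copy and leaves lines that are already commented out untouched. The function name `disable_distro_origins` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: comment out the two Allowed-Origins entries
#   "${distro_id}:${distro_codename}";
#   "${distro_id}:${distro_codename}-security";
# in an unattended-upgrades config file. sed keeps a copy as <file>.bak.
# Lines already starting with "//" do not match, so rerunning is harmless.
disable_distro_origins() {
  local conf="${1:-/etc/apt/apt.conf.d/50unattended-upgrades}"
  sed -i.bak -E \
    's|^([[:space:]]*)("\$\{distro_id\}:\$\{distro_codename\}(-security)?";)|\1//\2|' \
    "$conf"
}
```

Run it as root (for example from a `sudo -i` shell after sourcing the snippet), then check the result with `grep distro_codename /etc/apt/apt.conf.d/50unattended-upgrades`.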
Reboot the system so that the kernel module matching the updated driver libraries is loaded.
Run `nvidia-smi` again. It now works normally:
```
(base) dino@jarvis:~$ nvidia-smi
Wed Sep  6 01:43:33 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   57C    P0   107W / 350W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
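After the reboot, the same check can be scripted, for example to catch the problem before a training job starts. The sketch uses `nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader`, which prints one CSV line per GPU, and treats any non-zero exit status from `nvidia-smi` as a failure; `gpu_ready` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: check that nvidia-smi can talk to the driver and print one
# "name, driver_version, memory.total" line per GPU.
gpu_ready() {
  local out
  if ! out="$(nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader 2>&1)"; then
    # Pass nvidia-smi's own message (e.g. an NVML error) through to stderr.
    echo "nvidia-smi failed: $out" >&2
    return 1
  fi
  echo "$out"
}

# Only run the check where nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  gpu_ready || echo "GPU not usable" >&2
fi
```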
Ref: https://zhuanlan.zhihu.com/p/453955370