Operation log after the 1013 repair
First run
I used the account xxxy2080_10.
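First, I connected to the server from my computer with VNC Viewer and found that only part of the screen was displayed. Enlarging the window just scaled up that same part, so I could never see the whole screen. The fix is to start vncviewer with a parameter:
First add VNC Viewer to PATH, then run: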
vncviewer --FullScreen=1
Then I followed the tutorial.
Download Python and Anaconda3
// Create a screen session
screen -S python
// Some of these files were already in my Home folder, so they did not need to be downloaded again
// Download Python 3.7.10
wget -c https://www.python.org/ftp/python/3.7.10/Python-3.7.10.tar.xz
// Download Anaconda3
wget -c https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
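`wget -c` resumes a partial download instead of starting over, which matters for a multi-hundred-MB installer fetched inside a screen session. A minimal Python sketch of the same idea, assuming the server supports HTTP Range requests; the helper names `resume_request` and `download_resume` are hypothetical and not part of any tool used here:

```python
import os
import urllib.request


def resume_request(url: str, dest: str) -> urllib.request.Request:
    """Build a request that asks only for the bytes missing from dest."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url)
    if offset:
        # HTTP Range header: fetch from byte `offset` to the end of the file
        req.add_header("Range", f"bytes={offset}-")
    return req


def download_resume(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Append the remaining bytes of url to dest (roughly what wget -c does)."""
    req = resume_request(url, dest)
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content: the server honoured the Range header, so append.
        # A plain 200 means it sent the whole file, so overwrite instead.
        mode = "ab" if resp.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

If the file is already complete, many servers answer 416 Range Not Satisfiable, which `urlopen` raises as `HTTPError`; the sketch does not handle that case.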
Install Python
// The consecutive lines below extract, build, and install Python
tar vxf Python-3.7.10.tar.xz
cd Python-3.7.10/
// Note: get the value for --prefix below with the pwd command
./configure --prefix=/xxxy2080hppc/xxxy2080_10/Python-3.7.10
make && make altinstall
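// Add the environment variable
vim ~/.bash_profile
// To edit: press i (insert mode), type the text below, press Esc (command mode), then type :wq (save and quit)
// Append the following to the end of the existing PATH=$PATH:... line:
:/xxxy2080hppc/xxxy2080_10/Python-3.7.10/bin
// Make the change take effect
source ~/.bash_profile
python3.7 -V // should print Python 3.7.10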
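The `:/.../bin` line only works because it is appended to an existing `PATH=$PATH:...` line. The same rule as a small Python sketch (`append_to_path` is a hypothetical helper) that computes the resulting PATH and skips the directory if it is already present, which guards against adding it twice when ~/.bash_profile is edited again:

```python
import os


def append_to_path(path_value: str, new_dir: str) -> str:
    """Return path_value with new_dir appended, like PATH=$PATH:new_dir,
    without adding a duplicate entry if new_dir is already present.
    Empty entries (which the shell treats as the current directory) are dropped."""
    entries = [e for e in path_value.split(os.pathsep) if e]
    if new_dir not in entries:
        entries.append(new_dir)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    prefix_bin = "/xxxy2080hppc/xxxy2080_10/Python-3.7.10/bin"
    print(append_to_path(os.environ.get("PATH", ""), prefix_bin))
```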
PyCharm
PyCharm was already on the system PATH, so I tried it.
// The system PATH already contains the following:
PATH=$PATH:$HOME/.local/bin:$HOME/bin:/usr/local/cuda-10.2/bin:/usr/local/TensorRT-7.1.3.4/lib:/xxxy2080hppc/xxxy2080_10/pycharm-community-2021.2.2/bin/:/xxxy2080hppc/xxxy2080_10/Python-3.7.10/bin
The command given in the tutorial is:
cd pycharm-community-2021.2.2/bin/
sh pycharm.sh
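Since pycharm-community-2021.2.2/bin/ is already on PATH, `shutil.which` shows which `pycharm.sh` the shell would pick up. A short sketch (`find_on_path` is a hypothetical wrapper); note that `which`, like the shell, only finds the file if its execute bit is set, whereas `sh pycharm.sh` works either way:

```python
import shutil
from typing import Optional


def find_on_path(program: str, path_value: Optional[str] = None) -> Optional[str]:
    """Full path of `program` from the first PATH entry holding it as an
    executable file, or None. Uses the current $PATH when path_value is None."""
    return shutil.which(program, path=path_value)


if __name__ == "__main__":
    print(find_on_path("pycharm.sh"))
```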
Install Anaconda
// Install Anaconda
// First make the Anaconda3-2021.05-Linux-x86_64.sh script executable.
chmod u+x Anaconda3-2021.05-Linux-x86_64.sh
./Anaconda3-2021.05-Linux-x86_64.sh
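It fails with "Permission denied".
Go to the folder containing the file (mine is in Home), right-click it, open Properties > Permissions, and set the last two permission fields to Read and Write.
Run the commands above again, and this time it works.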
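`chmod u+x` only adds the execute bit for the file's owner and leaves the other bits alone. The Python equivalent as a sketch (`make_user_executable` is a hypothetical name); `os.chmod` raises `PermissionError` when the current user does not own the file:

```python
import os
import stat


def make_user_executable(path: str) -> None:
    """Equivalent of `chmod u+x path`: add the owner-execute bit, keep the rest."""
    mode = os.stat(path).st_mode
    os.chmod(path, stat.S_IMODE(mode) | stat.S_IXUSR)
```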
Please answer 'yes' or 'no':'
>>> yes
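// The next prompt seems to be for the installation directory; the default shown is:
/share/nishome/20070104_5/anaconda3
Error: mkdir: cannot create directory ‘/share’: Permission denied
// The default installs into a shared directory, but this account has no permission to create a folder there, hence the error. What worked was installing under my own directory:
[/xxxy2080hppc/xxxy2080_10/anaconda3] >>> (just press Enter)
(This step did not succeed; it was exploratory.) Getting permissions: see the post below, and read not only the post but also its comments. Be careful!
"Fixing Permission denied" (use 750 permissions with caution)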
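// Get permissions
sudo chmod -R 750 share
// sudo then prints its standard notice:
We trust you have received the usual lecture from the local System Administrator. It usually boils down to these three things:
#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.
// It then asks for a password, and the command still fails with:
xxxy2080_10 is not in the sudoers file. This incident will be reported.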
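Without sudo, the practical route is an install prefix this account can write to, as chosen at the [/xxxy2080hppc/xxxy2080_10/anaconda3] prompt above. A sketch that checks candidate prefixes before running the installer (`first_writable_prefix` is hypothetical; `os.access` checks the real user ID, and for root it returns True almost everywhere):

```python
import os
from typing import Iterable, Optional


def first_writable_prefix(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate directory the current user could create.

    A directory can be created if its nearest existing ancestor is a
    directory that is writable and searchable (W_OK | X_OK) for this user.
    """
    for candidate in candidates:
        parent = os.path.abspath(candidate)
        # Walk up until we hit something that already exists ("/" always does)
        while not os.path.exists(parent):
            parent = os.path.dirname(parent)
        if os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK):
            return candidate
    return None
```

For example, `first_writable_prefix(["/share/nishome/20070104_5/anaconda3", "/xxxy2080hppc/xxxy2080_10/anaconda3"])` would be expected to skip the shared path on this account.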
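Add the Anaconda environment variable:
// Add the environment variable
vim ~/.bash_profile
// To edit: press i (insert mode), type the text below, press Esc (command mode), then type :wq (save and quit)
// Append the following to the end of the existing PATH=$PATH:... line:
:/xxxy2080hppc/xxxy2080_10/anaconda3/bin
// Make the change take effect
source ~/.bash_profile
// The next command must be run right after installing conda; otherwise VNC shows a black screen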
(base) [xxxy2080_10@xxxy2080 ~]$ conda config --set auto_activate_base false
(base) [xxxy2080_10@xxxy2080 ~]$ anaconda -V
anaconda Command line client (version 1.7.2)
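`conda config --set auto_activate_base false` stores the setting in the user's ~/.condarc, and `conda config --show auto_activate_base` reports it. To confirm it before reconnecting over VNC without starting conda, a minimal sketch that reads the file directly (`auto_activate_base` here is a hypothetical helper that only understands the flat `key: value` lines conda writes, not general YAML):

```python
import os
from typing import Optional


def auto_activate_base(condarc_path: str = os.path.expanduser("~/.condarc")) -> Optional[bool]:
    """Return the auto_activate_base value from a .condarc file, or None if unset."""
    if not os.path.exists(condarc_path):
        return None
    with open(condarc_path) as fh:
        for line in fh:
            key, sep, value = line.partition(":")
            if sep and key.strip() == "auto_activate_base":
                return value.strip().lower() == "true"
    return None
```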
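At this point, Python 3.7.10, Anaconda3, and PyCharm are installed!
Install TensorFlow 1 (tf_1)
// You may need to add a mirror channel (the Tsinghua mirror) first; follow the tutorial
// Install TensorFlow 1.15
conda create -n tf_1 python=3.7.10 tensorflow-gpu=1.15.0
After installation, test whether the tf_1 environment works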
(base) [xxxy2080_10@xxxy2080 ~]$ conda env list
# conda environments:
#
base * /xxxy2080hppc/xxxy2080_10/anaconda3
tf_1 /xxxy2080hppc/xxxy2080_10/anaconda3/envs/tf_1
(base) [xxxy2080_10@xxxy2080 ~]$ conda activate tf_1
(tf_1) [xxxy2080_10@xxxy2080 ~]$ python
Python 3.7.10 (default, Jun 4 2021, 14:48:32)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.15.0'
>>> tf.test.is_gpu_available()
2021-10-19 15:53:58.720085: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2021-10-19 15:53:58.749006: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2021-10-19 15:53:58.760656: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a5ff66e410 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-10-19 15:53:58.760754: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-10-19 15:53:58.763249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-10-19 15:53:59.777727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:18:00.0
2021-10-19 15:53:59.779281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:3b:00.0
2021-10-19 15:53:59.780800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:86:00.0
2021-10-19 15:53:59.782295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 3 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:af:00.0
2021-10-19 15:53:59.782762: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-10-19 15:53:59.784861: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-10-19 15:53:59.787020: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-10-19 15:53:59.787481: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-10-19 15:53:59.789899: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-10-19 15:53:59.791726: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-10-19 15:53:59.797119: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-10-19 15:53:59.807201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2, 3
2021-10-19 15:53:59.807265: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-10-19 15:53:59.814790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-10-19 15:53:59.814839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0 1 2 3
2021-10-19 15:53:59.814892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N N N N
2021-10-19 15:53:59.814906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1: N N N N
2021-10-19 15:53:59.814938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 2: N N N N
2021-10-19 15:53:59.814971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 3: N N N N
2021-10-19 15:53:59.822682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 10312 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:18:00.0, compute capability: 7.5)
2021-10-19 15:53:59.826405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:1 with 10312 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:3b:00.0, compute capability: 7.5)
2021-10-19 15:53:59.829359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:2 with 10312 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:86:00.0, compute capability: 7.5)
2021-10-19 15:53:59.832894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:3 with 10312 MB memory) -> physical GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:af:00.0, compute capability: 7.5)
2021-10-19 15:53:59.836824: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a60163e770 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-10-19 15:53:59.836863: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-10-19 15:53:59.836878: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-10-19 15:53:59.836892: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-10-19 15:53:59.836906: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5
True
At this point, TF 1.15.0 is installed successfully and sees all four GPUs.
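The REPL session above can be kept as a script for re-checking after future maintenance. A sketch, assuming it is run inside the activated tf_1 environment; the function names are hypothetical, `tf.test.is_gpu_available` is the TF 1.x call used above, and the parser only reads the "Adding visible gpu devices" line TensorFlow logs:

```python
import re
from typing import List


def visible_gpu_ids(log_text: str) -> List[int]:
    """Extract GPU indices from TensorFlow's 'Adding visible gpu devices:' log line."""
    match = re.search(r"Adding visible gpu devices: ([\d, ]+)", log_text)
    if not match:
        return []
    return [int(x) for x in match.group(1).split(",") if x.strip()]


def check_tf1_gpu() -> bool:
    """Non-interactive version of the REPL check (TF 1.x API)."""
    import tensorflow as tf  # imported lazily so the parser works without TF
    print("TensorFlow", tf.__version__)
    return tf.test.is_gpu_available()
```

TensorFlow writes this log to stderr, so redirecting stderr to a file (`2> tf.log`) captures it for `visible_gpu_ids`. In TensorFlow 2.x, `tf.test.is_gpu_available` is deprecated in favor of `tf.config.list_physical_devices('GPU')`.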
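Transferring files
After creating a new session in Xftp, use these settings:
Connection name and remote host: ask me and I'll tell you
User name and password (the last two fields): your server account and password
Viewing configuration information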
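Operation log before the server repair
Login
Log in over ssh from cmd
Type yes:
Enter the password:
(Optional) Check versions:
Check the NVIDIA version:
Check GPU status: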
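The commands for these checks are not recorded here. One way to do the GPU check from Python, assuming `nvidia-smi` is on the server's PATH (plain `nvidia-smi` also prints the driver version and a status table). The query fields are standard `--query-gpu` fields; the helper names are hypothetical:

```python
import csv
import io
import subprocess
from typing import Dict, List

QUERY = "index,name,driver_version,memory.used,memory.total,utilization.gpu"


def parse_gpu_csv(text: str) -> List[Dict[str, str]]:
    """Turn `nvidia-smi --format=csv,noheader` output into one dict per GPU."""
    fields = QUERY.split(",")
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(fields, row)) for row in rows if row]


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi on the server and parse its CSV output."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gpu_csv(out)
```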
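Copy the local conda environment to the lab GPU server:
References: "Migrating a conda environment to another machine"
"Anaconda tutorial + fixes for problems with directly copying and porting an environment" (see the end of the post)
Export the local environment
On the local machine, first activate the environment you use.
Then export the list of packages installed by conda.
Then export the list of packages installed by pip.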
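The export commands themselves are not recorded. A sketch of one common way to do both exports from inside the activated environment: `conda env export` (which also lists pip packages under a `pip:` key) and `pip freeze` are real commands, while the output file names and helper names are assumptions:

```python
import subprocess
import sys
from typing import List, Tuple


def export_commands() -> List[Tuple[List[str], str]]:
    """(argv, output file) pairs that record the active environment's packages."""
    return [
        # conda's view of the active environment
        (["conda", "env", "export"], "environment.yml"),
        # pip-installed packages, using the active environment's interpreter
        ([sys.executable, "-m", "pip", "freeze"], "requirements.txt"),
    ]


def run_exports() -> None:
    """Run each export command and write its stdout to the matching file."""
    for argv, outfile in export_commands():
        with open(outfile, "w") as fh:
            subprocess.run(argv, check=True, stdout=fh)
```

On the lab server, `conda env create -f environment.yml` and `pip install -r requirements.txt` would then recreate the environment.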