8.1 Environment Setup / Linux Process Management Summary / ArgumentParser / Saving & Loading Modules / Slicing

Article link: https://blog.csdn.net/IsayIwant/article/details/132059436

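This post covers how to set up a development environment, tips for Linux process management, using ArgumentParser to handle command-line arguments, saving and loading models in PyTorch, and how tensor views save memory. It also covers strategies for fine-tuning and for loading parameters across models.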

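1. Environment Setup

For a project from GitHub, you can copy its dependency list into a txt file and then run pip install -r requ.txt
Note that if you already have a different version of a package installed, pip replaces it: it uninstalls the existing version first and then installs the one listed in the file.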

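2. Linux Process Management Summary

When you press Ctrl+Z, the foreground process is suspended (stopped) and added to the shell's job list. It does not keep running in the background; it stays paused until you resume it.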

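jobs: lists all jobs in the current shell, both suspended jobs and jobs running in the background. Type jobs in the terminal and the output shows each job's number and state, including suspended ones.
%N: brings job N to the foreground, where N is the job number shown by jobs (in bash this is equivalent to fg %N).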

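After you press Ctrl+Z, the process no longer runs in the foreground; it stays suspended. In other words, its execution is paused until you take one of the following actions.
1. fg (foreground): resumes a job in the foreground, so that it owns the terminal's input and output and you can interact with it.
2. bg (background): resumes a suspended job in the background, so that it keeps running without holding the terminal and you can go on typing commands.
However, a job resumed with bg still prints to the terminal unless you redirected its output. The reason:
A background job's stdout and stderr are still attached to the terminal. If the job produces a lot of output, that output is interleaved with whatever you are typing and makes the terminal hard to use.
There are several ways to solve this: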

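If you want to run a command without showing its output on screen, redirect the output to /dev/null: command > /dev/null & discards standard output, and the trailing & runs the command in the background. Add 2>&1 (command > /dev/null 2>&1 &) to discard standard error as well.
You can also redirect to a text file: command > process1.txt &
Use a terminal multiplexer such as tmux or screen, which lets you create multiple sessions in one terminal, keep processes running in the background, and switch to the corresponding session when you want to see the output.
BTW, >> appends to the end of the file instead of overwriting it.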
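
The same redirection ideas can be sketched from Python, for cases where the job is launched from a script rather than an interactive shell. This is a minimal sketch and not part of the original notes: subprocess.Popen returns without waiting (roughly like a trailing &), and its stdout/stderr arguments play the role of > redirection. The child command and the temporary location of process1.txt are placeholders chosen for the demo.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Placeholder command: a child Python process that writes to stdout and stderr.
cmd = [sys.executable, "-c",
       "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]

log_path = Path(tempfile.mkdtemp()) / "process1.txt"

# Roughly `command > process1.txt 2>&1 &`: Popen does not wait for the child.
# Opening the file with mode "a" instead of "w" would mirror `>>`.
with open(log_path, "w") as log_file:
    logged = subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)

# Roughly `command > /dev/null 2>&1 &`: discard all output.
silent = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

# Wait here only so that the demo can inspect the results.
logged.wait()
silent.wait()
log_text = log_path.read_text()
print(log_text)
```

Unlike a shell job, these children are not under job control, so fg and bg do not apply to them; the sketch only illustrates where the output goes.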

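3. ArgumentParser

A flag argument is a kind of command-line argument that represents a boolean value, True or False.
A flag takes no value; you only need to give its name on the command line. With action='store_true' the value is False unless the flag is present: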

parser.add_argument('--verbose', action='store_true', help='Flag to enable verbose mode')

If it is defined as an optional argument that takes a value:

parser.add_argument('--finetune', type=str, default="hahaha")

Once you write --finetune on the command line, it must be followed by a value; otherwise you get the following error:

error: argument --finetune: expected one argument
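
The two kinds of argument above can be tried without a real command line by passing a list to parse_args. The sketch below reuses the --verbose and --finetune definitions from this section; the --resume option is a hypothetical addition that shows one way (nargs='?' with const) to let an option appear without a value.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Flag argument: False by default, True when --verbose is present.
    parser.add_argument('--verbose', action='store_true', help='Flag to enable verbose mode')
    # Optional argument that requires a value when it is given.
    parser.add_argument('--finetune', type=str, default="hahaha")
    # Hypothetical option: nargs='?' makes the value optional, and `const`
    # is used when the option is given without one.
    parser.add_argument('--resume', type=str, nargs='?', const='latest', default=None)
    return parser


parser = build_parser()
defaults = parser.parse_args([])
explicit = parser.parse_args(['--verbose', '--finetune', 'ckpt.pt'])
bare_resume = parser.parse_args(['--resume'])

# --finetune without a value: argparse prints the error and exits.
try:
    parser.parse_args(['--finetune'])
    missing_value_exit = None
except SystemExit as exc:
    missing_value_exit = exc.code

print(defaults, explicit, bare_resume, missing_value_exit)
```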

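4. Saving and Loading Models (nn.Module)

Fine-tuning means continuing to train a pretrained model so that it adapts to a specific task or dataset. A pretrained model has usually been trained for a long time on a large dataset and has learned rich feature representations. Fine-tuning adjusts those representations for the target task or dataset so that the model fits it better.
Put plainly, fine-tuning is:
1. Load the saved model --> you need to know exactly what torch.save stored and what torch.load returns
2. Train for a few more epochs
Follow the official pattern: you can save any other items that may aid you in resuming training by simply appending them to the dictionary.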

torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'best_acc': best_acc,
            'loss': loss,
            ...
            }, PATH)

If you have more than one model, you can also store them in a single dictionary, for instance:

torch.save({
            'modelA_state_dict': modelA.state_dict(),
            'modelB_state_dict': modelB.state_dict(),
            }, PATH)

Use this template whenever you load a model:

# Define the model and optimizer first
model = TheModelClass(*args, **kwargs)
optimizer = TheOptimizerClass(*args, **kwargs)
# Load the checkpoint from disk
checkpoint = torch.load(PATH)
# Then restore the state dicts
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# - or -
model.train()

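Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do so will yield inconsistent inference results.
If you wish to resume training, call model.train() to ensure these layers are in training mode.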
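
To make the save and load templates concrete, here is a minimal round trip. It is a sketch under stated assumptions, not the original author's training code: a tiny nn.Linear stands in for TheModelClass, SGD with momentum stands in for TheOptimizerClass, and the checkpoint is written to a temporary directory.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-ins for TheModelClass / TheOptimizerClass (assumptions for this demo).
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One training step so that the optimizer has state (momentum buffers) to save.
x, y = torch.randn(8, 4), torch.randn(8, 2)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save({
    'epoch': 1,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss.item(),  # stored as a plain Python float
}, path)

# Resume: rebuild the objects, then restore their state from the checkpoint.
restored_model = nn.Linear(4, 2)
restored_optimizer = torch.optim.SGD(restored_model.parameters(), lr=0.1, momentum=0.9)

checkpoint = torch.load(path)
restored_model.load_state_dict(checkpoint['model_state_dict'])
restored_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1

restored_model.train()  # call restored_model.eval() instead before inference
```

Restoring the optimizer state matters when resuming training, because optimizers such as SGD with momentum or Adam keep per-parameter buffers that would otherwise restart from zero.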

Warmstarting Model Using Parameters from a Different Model

modelB.load_state_dict(torch.load(PATH), strict=False)
Partially loading a model or loading a partial model are common scenarios when transfer learning or training a new complex model. Leveraging trained parameters, even if only a few are usable, will help to warmstart the training process and hopefully help your model converge much faster than training from scratch.
Whether you are loading from a partial state_dict, which is missing some keys, or loading a state_dict with more keys than the model that you are loading into, you can set the strict argument to False in the load_state_dict() function to ignore non-matching keys.
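That is, when the keys do not match, for example:
[Image: example of mismatched keys]

Setting strict=False ignores the non-matching keys, so only the matching entries are loaded.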
You can simply change the name of the parameter keys in the state_dict that you are loading to match the keys in the model that you are loading into.

e.g. new_state_dict = {k.replace('module.', ''): v for k, v in model_state_dict.items()}
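
A small sketch of both techniques follows. The Small and Large classes are hypothetical stand-ins for modelA and modelB; they only define layers, which is enough to compare state_dict keys. load_state_dict returns an object whose missing_keys and unexpected_keys fields report what was skipped.

```python
import torch
from torch import nn


class Small(nn.Module):  # hypothetical stand-in for modelA
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 3)


class Large(nn.Module):  # hypothetical stand-in for modelB: same backbone plus a head
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 3)
        self.head = nn.Linear(3, 2)


model_a, model_b = Small(), Large()

# Warm start: copy the matching keys and leave the head at its initial values.
result = model_b.load_state_dict(model_a.state_dict(), strict=False)
print(result.missing_keys)     # keys model_b has but the state_dict lacks
print(result.unexpected_keys)  # keys in the state_dict that model_b lacks

# Renaming keys: simulate a state_dict whose keys carry a 'module.' prefix,
# then strip the prefix so that a strict load succeeds.
prefixed = {'module.' + k: v for k, v in model_a.state_dict().items()}
new_state_dict = {k.replace('module.', ''): v for k, v in prefixed.items()}
fresh = Small()
fresh.load_state_dict(new_state_dict)
```

Note that strict=False only tolerates missing or extra key names. A key that matches by name but has a different tensor shape still raises an error.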

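5. Slicing!

Definition: $\text{sequence}[\text{start}:\text{stop}:\text{step}]$
The sequence cannot be a dict: a dict is a mapping indexed by key rather than by position, so it does not support slicing (this holds even though dicts preserve insertion order since Python 3.7).
Some common uses:
a = [0,1,2,3,4,5,6,7]

a[::2] takes every other element: 0,2,4,6
a[1::2] gives 1,3,5,7
a[::-1] gives 7,6,5,4,3,2,1,0
a[::-2] gives 7,5,3,1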
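
The slices above can be checked directly in plain Python. The sketch below also shows two points the list does not spell out: a[start:stop:step] is shorthand for indexing with a slice object, and slicing a list produces a new list rather than a view. The dict d is a made-up example.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7]

every_other = a[::2]     # start at 0, step 2
odd_positions = a[1::2]  # start at index 1, step 2
reversed_copy = a[::-1]  # negative step walks backwards
reversed_step2 = a[::-2]

# The bracket syntax builds a slice object behind the scenes.
same_as_odd = a[slice(1, None, 2)]

# List slicing copies: changing the slice leaves the original untouched.
copy_of_a = a[:]
copy_of_a[0] = 100

# A dict is indexed by key, so a slice is not a valid index for it.
d = {'x': 1, 'y': 2}
try:
    d[0:1]
    dict_slice_failed = False
except (TypeError, KeyError):  # the exception type depends on the Python version
    dict_slice_failed = True

print(every_other, odd_positions, reversed_copy, reversed_step2)
```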

numbers = torch.arange(1, 10) # [1,2,3....9]
evens = numbers[1::2] # [2,4,...8]

The 1 in numbers[1::2] means the slice starts at index 1.

Going further:

PyTorch allows a tensor to be a View of an existing tensor.
A view tensor shares the same underlying data with its base tensor, which avoids copying data and saves memory.

t = torch.rand(4, 4)
b = t.view(2, 8)
t.storage().data_ptr() == b.storage().data_ptr()  # True: t and b share the same underlying data
b[0][0] = 3.14
t[0][0]  # 3.14: modifying b also changed t

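t is a 4x4 random tensor and b is a 2x8 tensor obtained by calling view on t. t and b share the same underlying data, so a modification to b is also reflected in t.
Many everyday operations return views, for example: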
Basic slicing and indexing op, e.g. tensor[0, 2:, 1:7:2] returns a view of base tensor
tensor_split(),T,transpose(),unsqueeze(),squeeze(),detach() etc.
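
One way to tell a view from a copy is to compare the address of the underlying storage. The sketch below is illustrative only: shares_storage is a hypothetical helper, and untyped_storage() assumes PyTorch 2.0 or later. It contrasts basic indexing and transpose (views) with advanced indexing and clone() (copies).

```python
import torch


def shares_storage(x, y):
    """Hypothetical helper: True if both tensors use the same storage."""
    return x.untyped_storage().data_ptr() == y.untyped_storage().data_ptr()


t = torch.zeros(3, 4)

row = t[1]           # basic indexing returns a view
transposed = t.t()   # transpose returns a view
picked = t[[0, 2]]   # advanced (list) indexing returns a copy
cloned = t.clone()   # clone always copies

row[0] = 3.14        # writes through to t
picked[0, 0] = -1.0  # does not affect t

print(shares_storage(t, row), shares_storage(t, transposed))
print(shares_storage(t, picked), shares_storage(t, cloned))
```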

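So when you save a tensor, torch.save stores its storage object and its tensor metadata separately, which means that saving a small view still writes the whole underlying storage.
For example: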

large = torch.arange(1, 1000)
small = large[0:5]
torch.save(small, 'small.pt')
loaded_small = torch.load('small.pt')
loaded_small.storage().size()  # 999
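
If only the five elements are needed, cloning the view before saving gives it a storage of its own. The sketch below compares the two cases using an in-memory buffer instead of a file. storage_elements and roundtrip are hypothetical helpers, and untyped_storage() assumes PyTorch 2.0 or later.

```python
import io

import torch


def storage_elements(tensor):
    """Hypothetical helper: number of elements held by the tensor's storage."""
    return tensor.untyped_storage().nbytes() // tensor.element_size()


def roundtrip(tensor):
    """Save a tensor to an in-memory buffer and load it back."""
    buffer = io.BytesIO()
    torch.save(tensor, buffer)
    buffer.seek(0)
    return torch.load(buffer)


large = torch.arange(1, 1000)
small = large[0:5]

loaded_view = roundtrip(small)           # carries the full storage of `large`
loaded_clone = roundtrip(small.clone())  # carries only the five cloned elements

print(storage_elements(loaded_view), storage_elements(loaded_clone))
```

The trade-off is that the cloned tensor no longer shares memory with large, so later changes to one are not visible in the other.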