AnnData error summary
1 TypeError: Can't implicitly convert non-string objects to strings
Above error raised while writing key 'sens_binary_preds' of <class 'h5py._hl.group.Group'> to g
adata.obs["sens_binary"] = str(binary_output_list)
adata.write("save/adata/"+ args.data_name + '_' + args.para + ".h5ad")
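The error says that while writing the key 'sens_binary_preds' to the .h5ad file, a non-string object could not be converted to a string, even though nothing in the column looked like a non-string at first glance. In my case the error went away after shortening the key from 'sens_binary_preds' to "sens_binary", so I attributed it to the key being too long. Note that the working code above also wraps the value in str(); this error is commonly raised when an obs column holds non-string Python objects, so the conversion to a string is likely what actually matters.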
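Common errors when configuring a Python environment
1 No module named '_sysconfigdata_x86_64_conda_linux_gnu'
ModuleNotFoundError: No module named '_sysconfigdata_x86_64_conda_linux_gnu'
Search the filesystem from the base environment:
sudo find / -name "_sysconfigdata_x86_64*"
The module '_sysconfigdata_x86_64_conda_linux_gnu' itself is missing, but the search turns up a file with a similar name: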
/home/ubuntu/anaconda3/lib/python3.8/_sysconfigdata_x86_64_conda_cos6_linux_gnu.py
Copying this file, under the expected name, into the environment that is missing it solved the problem for me:
cd ~/anaconda3/lib/python3.8
cp _sysconfigdata_x86_64_conda_cos6_linux_gnu.py _sysconfigdata_x86_64_conda_linux_gnu.py
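2 Adding the Tsinghua (TUNA) mirror
(1) Using a domestic mirror for pip
This requires editing the pip configuration file.
On Linux/macOS the configuration file is ~/.pip/pip.conf (create the directory and file if they do not exist):
mkdir -p ~/.pip
Open ~/.pip/pip.conf and edit it as follows: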
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host = pypi.tuna.tsinghua.edu.cn
Check the configured mirror:
$ pip3 config list
global.index-url='https://pypi.tuna.tsinghua.edu.cn/simple'
install.trusted-host='pypi.tuna.tsinghua.edu.cn'
The output shows that the mirror has been changed.
(2) Using a domestic mirror for conda
https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/
vim ~/.condarc # do this from the base environment
Add the following content:
channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch-lts: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  deepmodeling: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/
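Save and quit with :wq!
(3) Always install torch using the commands from the official site: https://pytorch.org/get-started/previous-versions/
(4) Common PyG problem: torch_geometric does not work properly, yet the code raises no error either under the debugger or in a normal run.
In this case, install the four dependency packages (presumably torch-scatter, torch-sparse, torch-cluster and torch-spline-conv) that match your torch and CUDA versions.
For example, for torch 1.12.1 with CUDA 11.3 (cu113), download them from this page: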
https://data.pyg.org/whl/torch-1.12.1%2Bcu113.html
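To avoid typing the page address by hand for another version, the URL can be assembled from the version strings. This is only a sketch: the function name pyg_wheel_index is made up for illustration, and it assumes that other torch/CUDA combinations follow the same URL pattern as the single example above, which should be checked in a browser before relying on it.

```python
from urllib.parse import quote

PYG_WHEEL_ROOT = "https://data.pyg.org/whl"


def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Build the wheel page URL, e.g. for torch '1.12.1' and CUDA tag 'cu113'.

    Hypothetical helper; the URL pattern is inferred from one example.
    """
    # The '+' in '1.12.1+cu113' must be percent-encoded as %2B in the URL.
    build = quote(f"{torch_version}+{cuda_tag}", safe="")
    return f"{PYG_WHEEL_ROOT}/torch-{build}.html"


url = pyg_wheel_index("1.12.1", "cu113")
print(url)
```

With torch installed, torch.__version__ already has the form '1.12.1+cu113', so its two halves can be passed in directly.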
Data loading tips for networks
1. Binarizing labels
pred = (pred > 0.5).astype(int)  # threshold at 0.5, then cast to integer class labels
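A small self-contained check of the line above, using made-up probabilities. The helper name binarize is only for illustration. Note that the comparison is strict, so a value of exactly 0.5 is mapped to class 0.

```python
import numpy as np


def binarize(pred, threshold=0.5):
    """Map predicted probabilities to 0/1 labels with a strict threshold."""
    pred = np.asarray(pred)
    return (pred > threshold).astype(int)


probs = np.array([0.10, 0.50, 0.51, 0.93])  # made-up example values
labels = binarize(probs)
print(labels)  # [0 0 1 1]
```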
2. Shuffling the input array and the label array together
ge_y = np.column_stack((xc_ge_all, y_all))  # append y as the last column of x (4*4)
np.random.shuffle(ge_y)
xc_ge_all = ge_y[:,:-1]
y_all = ge_y[:,-1]
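An alternative worth knowing, sketched below under the assumption that both arrays have the same length along the first axis: draw one random permutation and index both arrays with it. Unlike np.column_stack, this does not cast the labels to the dtype of the features. The name shuffle_together is made up for illustration.

```python
import numpy as np


def shuffle_together(x, y, seed=None):
    """Shuffle x and y with the same random order so rows stay paired."""
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    order = np.random.default_rng(seed).permutation(len(x))
    return x[order], y[order]


# Toy data: the first column of row i equals 3 * i, and the label is i.
x = np.arange(12, dtype=np.float32).reshape(4, 3)
y = np.array([0, 1, 2, 3])
x_shuf, y_shuf = shuffle_together(x, y, seed=0)
print(x_shuf[:, 0], y_shuf)
```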
3. An oversampling method for an imbalanced training set, applied during data preprocessing
import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler
# split into train+validation / test sets
xd_all_trainval, xd_test, xc_ge_all_trainval, xc_ge_test, y_all_trainval, y_test = train_test_split(xd_all, xc_ge_all, y_all, test_size=0.2, random_state=42)
# split train+validation into train / validation sets
xd_train, xd_val, xc_ge_train, xc_ge_val, y_train, y_val = train_test_split(xd_all_trainval, xc_ge_all_trainval, y_all_trainval, test_size=0.2, random_state=42)
rds = RandomOverSampler(random_state=42)
print('Number of sensitive samples before oversampling: ', sum(y_train))
# fit_resample requires array input. A string array and a numeric array have to be
# packed with zip before conversion to one array; np.column_stack on them raises an error.
oversample = list(zip(xd_train, xc_ge_train))
oversample = np.array(oversample)
oversample, y_train = rds.fit_resample(oversample, y_train) # (76736, 2) (76736,) -> (135240) 67704
xd_train, xc_ge_train = zip(*oversample)
xd_train = np.array(xd_train)
xc_ge_train = np.array(xc_ge_train)
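The zip packing can be sidestepped by resampling row indices instead of the data and then indexing every array with the result. The sketch below is a NumPy-only stand-in for random oversampling, not imblearn's RandomOverSampler; the function name oversample_indices and the toy data are made up for illustration.

```python
import numpy as np


def oversample_indices(y, seed=42):
    """Return row indices in which every class appears as often as the largest class."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    picked = [np.arange(len(y))]  # keep every original row once
    for cls, count in zip(classes, counts):
        if count < target:
            pool = np.flatnonzero(y == cls)
            # draw the missing rows of this class with replacement
            picked.append(rng.choice(pool, size=target - count, replace=True))
    return np.concatenate(picked)


# Toy data: string inputs, numeric inputs, and imbalanced labels (4 vs 1).
xd = np.array(["CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCC"])
xc_ge = np.arange(10, dtype=np.float32).reshape(5, 2)
y = np.array([0, 0, 0, 0, 1])

idx = oversample_indices(y)
xd_res, xc_ge_res, y_res = xd[idx], xc_ge[idx], y[idx]
print(np.bincount(y_res))  # [4 4]
```

Because only indices are resampled, the string array and the numeric array never have to share one container.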
Network training tips
1. Max pooling over a graph:
from torch_geometric.nn import global_mean_pool as gap, global_max_pool as gmp
gmp(x, batch)
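Here batch is the batch vector of the graph mini-batch: it has one entry per node, giving the index of the graph that node belongs to. For compound SMILES graphs the nodes are atoms, so its length equals the total number of atoms in the mini-batch.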
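To make the role of batch concrete, here is a NumPy re-implementation of what a global max pool computes. It is a sketch for understanding only, not PyG's code; the name global_max_pool_np and the toy values are made up. It assumes x has shape [num_nodes, num_features] and batch holds graph indices starting at 0.

```python
import numpy as np


def global_max_pool_np(x, batch):
    """Per-graph, per-feature maximum over the nodes assigned to each graph."""
    x = np.asarray(x, dtype=float)
    batch = np.asarray(batch)
    num_graphs = int(batch.max()) + 1
    out = np.full((num_graphs, x.shape[1]), -np.inf)
    # Row i of x is merged into row batch[i] of out with an element-wise maximum.
    np.maximum.at(out, batch, x)
    return out


# Five nodes (atoms) with two features each: three in graph 0, two in graph 1.
x = np.array([[1.0, 5.0], [3.0, 2.0], [0.0, 7.0], [4.0, 4.0], [6.0, 1.0]])
batch = np.array([0, 0, 0, 1, 1])
pooled = global_max_pool_np(x, batch)
print(pooled)  # one row per graph: [[3. 7.], [6. 4.]]
```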
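2. AttributeError: 'function' object has no attribute 'xxx'
Don't panic. First print the object's attributes with dir() to see whether it really lacks 'xxx'. If it does, check the syntax of the line that raised the error and see whether a pair of parentheses () is missing. For example, the following code raises this error because .cuda is never called:
DaNN_model = torch.nn.DataParallel(DaNN_model).cuda
DaNN_model.state_dict()
AttributeError: 'function' object has no attribute 'state_dict'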
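The same mistake can be reproduced without torch. In the sketch below, TinyModel is a made-up stand-in class, not part of PyTorch; it only shows that dropping the parentheses leaves you holding the method rather than the model.

```python
class TinyModel:
    """Hypothetical stand-in for a torch module, for illustration only."""

    def cuda(self):
        return self

    def state_dict(self):
        return {"weight": [0.0]}


wrong = TinyModel().cuda    # missing (): this is the bound method itself
right = TinyModel().cuda()  # with (): this is the model

print(callable(wrong), hasattr(wrong, "state_dict"))  # True False
print(hasattr(right, "state_dict"))                   # True

try:
    wrong.state_dict()
except AttributeError as err:
    message = str(err)
    print(message)
```

A quick callable(obj) check before digging through dir() output often reveals this kind of slip.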
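3. With DataParallel, an error reporting that one object is on cuda:0 and another on cuda:1:
Caught AttributeError in replica 0 on device 0.
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking arugment for argume
indicates that the following code was executed more than once, so the model was wrapped in DataParallel twice: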
torch.nn.DataParallel(net).cuda()
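4. Error when two distributed training scripts run at the same time:
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29500
Fix: both jobs default to port 29500, so give the second job a different master_port, e.g. 29500+1=29501, and so on for further jobs: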
python -m torch.distributed.launch --nproc_per_node 2 --master_port 29501 finetune.py
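Instead of counting up from 29500 by hand, the operating system can be asked for an unused port. This is a standard-library sketch; find_free_port is a made-up helper name, and the port could in principle be taken by another process between this call and the launch.

```python
import socket


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


port = find_free_port()
print(f"--master_port {port}")
```

The printed value can then be passed as --master_port to the launch command above.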
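5. Error caused by an incorrect number of label classes:
./aten/src/ATen/native/cuda/Loss.cu:271: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [29,0,0] Assertion t >= 0 && t < n_classes failed
The assertion means that some target label t lies outside the range [0, n_classes), so either the labels do not start at 0 or the output layer has fewer units than there are classes.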
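Checking the labels on the CPU before training gives a far more readable message than the CUDA assertion. The sketch below uses NumPy only; check_labels and remap_labels are made-up helper names, and it assumes no ignore_index value is in use for the loss.

```python
import numpy as np


def check_labels(labels, n_classes):
    """Raise a readable error if any label falls outside [0, n_classes)."""
    labels = np.asarray(labels)
    bad = np.flatnonzero((labels < 0) | (labels >= n_classes))
    if bad.size:
        raise ValueError(
            f"{bad.size} labels outside [0, {n_classes}); "
            f"first offending value: {labels[bad[0]]}"
        )
    return labels


def remap_labels(labels):
    """Map arbitrary label values onto 0..K-1, returning the original values too."""
    classes, remapped = np.unique(np.asarray(labels), return_inverse=True)
    return classes, remapped


raw = np.array([1, 2, 2, 1])  # labels start at 1, a common cause of the assertion
classes, fixed = remap_labels(raw)
check_labels(fixed, n_classes=len(classes))
print(classes, fixed)  # [1 2] [0 1 1 0]
```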
6. Loss
(1) Calling loss.backward() twice during training raises: RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
The computation graph of the first loss just has to stay alive for the second backward pass.
In my case the fix was to not zero the gradients before the second loss.backward():
### first loss
self.optimizer.zero_grad()
loss_cross.backward()
self.optimizer.step()
self.lr_scheduler.step()
### second loss
torch.autograd.set_detect_anomaly(True)
# self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()
If the error persists, note that the message itself recommends passing retain_graph=True to the first backward() call.
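(2) When summing several losses, first do a trial run with every coefficient set to 1 to see the order of magnitude of each loss and how many times they differ. For example, loss1=0.005 and loss2=5 differ by a factor of 1000, so loss2 should be multiplied by 0.001:
loss = loss1 + 0.001 * loss2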
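The coefficient can be derived from the trial-run values rather than guessed. The sketch below uses plain floats; balance_weight is a made-up helper name, and the weight is meant to be computed once from logged values and then kept as a fixed constant during training.

```python
def balance_weight(reference_loss: float, other_loss: float) -> float:
    """Weight that scales other_loss to the magnitude of reference_loss."""
    if other_loss == 0:
        raise ValueError("other_loss must be non-zero")
    return abs(reference_loss) / abs(other_loss)


# Values logged from a trial run with both coefficients set to 1.
loss1, loss2 = 0.005, 5.0
weight = balance_weight(loss1, loss2)  # roughly 0.001
loss = loss1 + weight * loss2          # both terms now contribute about 0.005
print(weight, loss)
```

Rounding the weight to a power of ten, as in the 0.001 above, is usually precise enough.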
Problem Settings
When a classic trick does not work well, it is usually because it was applied in the wrong place.