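Common Python errors collected in one place: bioinformatics data loading, network training, and parallel/distributed training (continuously updated)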

二又

Last modified: 2024-05-21 09:32:33

Tags: machine learning

First published: 2024-01-16 15:42:25

Article link: https://blog.csdn.net/weixin_44003026/article/details/135251904

AnnData error summary

1 TypeError: Can’t implicitly convert non-string objects to strings

Above error raised while writing key ‘sens_binary_preds’ of <class ‘h5py._hl.group.Group’> to g

adata.obs["sens_binary"] = str(binary_output_list)
adata.write("save/adata/"+ args.data_name + '_' + args.para + ".h5ad")

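This error means that while writing the key 'sens_binary_preds' to the .h5ad file, a non-string object could not be converted to a string. At first glance nothing in the column looked like a non-string value. In my case the error went away after shortening the key from 'sens_binary_preds' to "sens_binary", as in the code above. This error is also commonly raised when an obs column has object dtype and holds values that are not plain strings (for example lists or mixed types), so it is worth checking the column's contents and dtype as well.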

Common errors when setting up a Python environment

1 No module named ‘_sysconfigdata_x86_64_conda_linux_gnu’

"ModuleNotFoundError: No module named '_sysconfigdata_x86_64_conda_linux_gnu'"

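Search the filesystem from the base environment:

sudo find / -name "_sysconfigdata_x86_64*"

This turns up a similarly named file that can stand in for the missing module ‘_sysconfigdata_x86_64_conda_linux_gnu’: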

/home/ubuntu/anaconda3/lib/python3.8/_sysconfigdata_x86_64_conda_cos6_linux_gnu.py

Copying this file, under the missing name, into the environment that lacks it solved the problem for me:

cd ~/anaconda3/lib/python3.8   
cp _sysconfigdata_x86_64_conda_cos6_linux_gnu.py _sysconfigdata_x86_64_conda_linux_gnu.py
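The search step above can also be done from Python. The following is a minimal sketch, not part of the original fix: the helper name find_sysconfigdata is hypothetical, and it locates the standard-library directory via os.__file__ on the assumption that importing sysconfig itself may fail in the broken environment.

```python
import os
from pathlib import Path


def find_sysconfigdata(lib_dir):
    """Return the names of all _sysconfigdata_*.py files directly under lib_dir."""
    return sorted(p.name for p in Path(lib_dir).glob("_sysconfigdata_*.py"))


# Directory that holds the standard library of the running interpreter.
# os.__file__ is used instead of sysconfig because sysconfig is the module
# that tries to import the missing _sysconfigdata file.
stdlib_dir = Path(os.__file__).parent
print(stdlib_dir)
print(find_sysconfigdata(stdlib_dir))
```

If the list contains a file whose name differs from the one in the error message only by the platform tag, that is the candidate to copy under the expected name.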

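2 How to add the Tsinghua mirror

(1) Using a domestic mirror with pip

This requires editing pip's configuration file.

On Linux/macOS the configuration file lives at ~/.pip/pip.conf (create the directory and file if they do not exist):

mkdir -p ~/.pip

Open ~/.pip/pip.conf and edit it as follows: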

[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
[install]
trusted-host = pypi.tuna.tsinghua.edu.cn

trusted-host takes a bare hostname, without the https:// prefix.

Check the configured mirror:

$ pip3 config list
global.index-url='https://pypi.tuna.tsinghua.edu.cn/simple'
install.trusted-host='pypi.tuna.tsinghua.edu.cn'

The output shows that the mirror has been set.
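If you prefer to generate the file instead of editing it by hand, the standard-library configparser module can render the same two sections. This is only a sketch: build_pip_conf is a hypothetical helper, and the host is passed without a scheme because pip's trusted-host option takes a hostname.

```python
import configparser
import io


def build_pip_conf(index_url, trusted_host):
    """Render pip.conf text with a [global] index-url and an [install] trusted-host."""
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    config["install"] = {"trusted-host": trusted_host}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


conf_text = build_pip_conf(
    "https://pypi.tuna.tsinghua.edu.cn/simple",
    "pypi.tuna.tsinghua.edu.cn",
)
print(conf_text)
```

The returned text can then be written to ~/.pip/pip.conf. Nothing here touches the filesystem, so it is safe to run before deciding where the file should go.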

(2) Using a domestic mirror with conda

https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/

vim ~/.condarc             # do this in the base environment

Add the following:

channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch-lts: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  deepmodeling: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/

Save and quit with :wq!

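(3) Always install torch with the commands from the official site: https://pytorch.org/get-started/previous-versions/

(4) A common PyG problem: torch_geometric does not work properly, yet the code raises no error in either debug mode or a normal run: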

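In this case, install the four dependency packages (typically torch-scatter, torch-sparse, torch-cluster, and torch-spline-conv) that match your torch and CUDA versions.
For example, for torch 1.12.1 with CUDA 11.3 (cu113), download them from this page:
https://data.pyg.org/whl/torch-1.12.1%2Bcu113.html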
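The page address follows a visible pattern: the torch version, a URL-encoded plus sign (%2B), and the CUDA tag. A small sketch that builds it, assuming other versions follow the same pattern as the example above (pyg_wheel_index is a hypothetical helper name, and the resulting page is not checked for existence):

```python
from urllib.parse import quote


def pyg_wheel_index(torch_version, cuda_tag):
    """Build the wheel index URL for a torch version and CUDA tag such as 'cu113'."""
    local_version = f"{torch_version}+{cuda_tag}"
    # quote() percent-encodes '+' as %2B, matching the URL shown above.
    return f"https://data.pyg.org/whl/torch-{quote(local_version)}.html"


print(pyg_wheel_index("1.12.1", "cu113"))
```

In an environment where torch is installed, torch.__version__ and torch.version.cuda give the two values to plug in.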

Data loading tips for networks

1. Binarizing labels

pred = (pred > 0.5).astype(int)  # threshold at 0.5, then cast to integer class labels
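As a self-contained illustration of the one-liner above, here is a small sketch with the threshold exposed as a parameter. The function name binarize is hypothetical. A score exactly equal to the threshold maps to 0 because the comparison is strict.

```python
import numpy as np


def binarize(scores, threshold=0.5):
    """Map scores strictly above `threshold` to 1 and everything else to 0."""
    return (np.asarray(scores) > threshold).astype(int)


pred = binarize([0.1, 0.5, 0.73, 0.99])
print(pred)  # [0 0 1 1]
```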

2. Shuffling the input array and the label array together

ge_y = np.column_stack((xc_ge_all, y_all))  # append y as the last column of x, 4*4
np.random.shuffle(ge_y)                     # shuffles rows in place, so x and y stay paired
xc_ge_all = ge_y[:, :-1]
y_all = ge_y[:, -1]
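Stacking x and y into one array forces them to share a dtype. An alternative that avoids this is to draw one random permutation of row indices and apply it to both arrays. This is a sketch of that alternative, not the method used above, and shuffle_together is a hypothetical helper name.

```python
import numpy as np


def shuffle_together(x, y, seed=None):
    """Shuffle x and y with the same random row order and return the shuffled copies."""
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))  # one permutation, reused for both arrays
    return x[order], y[order]


x_demo = np.arange(10).reshape(5, 2)
y_demo = x_demo[:, 0] * 10
x_shuffled, y_shuffled = shuffle_together(x_demo, y_demo, seed=0)
print(x_shuffled, y_shuffled)
```

Because the originals are left untouched, the same permutation idea extends to three or more arrays by indexing each of them with order.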

3. An oversampling method for an imbalanced training set, done during data preprocessing

import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import RandomOverSampler

# split into train+val / test
xd_all_trainval, xd_test, xc_ge_all_trainval, xc_ge_test, y_all_trainval, y_test = train_test_split(xd_all, xc_ge_all, y_all, test_size=0.2, random_state=42)
# split train+val into train / val
xd_train, xd_val, xc_ge_train, xc_ge_val, y_train, y_val = train_test_split(xd_all_trainval, xc_ge_all_trainval, y_all_trainval, test_size=0.2, random_state=42)
rds = RandomOverSampler(random_state=42)
print('Number of sensitive samples before oversampling: ', sum(y_train))
# fit_resample needs an array as input. Strings and arrays have to be paired with zip
# before conversion to an array; calling np.column_stack on them directly raises an error.
oversample = list(zip(xd_train, xc_ge_train))
oversample = np.array(oversample, dtype=object)  # dtype=object: recent NumPy rejects ragged input without it
oversample, y_train = rds.fit_resample(oversample, y_train)    # (76736, 2)  (76736,) -> (135240)    67704
xd_train, xc_ge_train = zip(*oversample)
xd_train = np.array(xd_train)
xc_ge_train = np.array(xc_ge_train)
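The zip/unzip step is only needed because two inputs of different types must be resampled in lockstep. A plain NumPy sketch of the same idea resamples row indices and then applies them to every array. This is not imblearn's implementation, and oversample_indices and the toy data are hypothetical.

```python
import numpy as np


def oversample_indices(y, seed=None):
    """Return row indices in which minority-class rows are repeated at random
    until every class has as many rows as the largest class."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    chosen = [np.arange(len(y))]  # keep every original row once
    for cls, count in zip(classes, counts):
        if count < target:
            pool = np.flatnonzero(y == cls)  # rows belonging to this class
            chosen.append(rng.choice(pool, size=target - count, replace=True))
    return np.concatenate(chosen)


# Toy data: SMILES strings, one feature vector per sample, and binary labels.
smiles = np.array(["C", "CC", "CCC", "CCO", "CN"])
features = np.arange(10, dtype=float).reshape(5, 2)
labels = np.array([0, 0, 0, 0, 1])

idx = oversample_indices(labels, seed=42)
smiles_bal, features_bal, labels_bal = smiles[idx], features[idx], labels[idx]
print(np.bincount(labels_bal))
```

As in the original, oversampling should be applied to the training split only, after the train/validation/test split has been made.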

Network training tips

1. Max pooling over a graph:

from torch_geometric.nn import global_mean_pool as gap, global_max_pool as gmp

gmp(x, batch)

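Here batch is a vector with one entry per node in the mini-batch (for a compound SMILES, one entry per atom). Each entry gives the index of the graph that node belongs to, so its length equals the total number of nodes.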
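To make the role of batch concrete, here is a NumPy sketch of what a per-graph max pooling computes. It illustrates the operation only and is not PyG's implementation. The name global_max_pool_np is hypothetical.

```python
import numpy as np


def global_max_pool_np(x, batch, num_graphs=None):
    """Feature-wise maximum over the nodes of each graph.

    x:     (num_nodes, num_features) node features
    batch: (num_nodes,) graph index of every node
    """
    x = np.asarray(x, dtype=float)
    batch = np.asarray(batch)
    if num_graphs is None:
        num_graphs = int(batch.max()) + 1
    out = np.full((num_graphs, x.shape[1]), -np.inf)
    np.maximum.at(out, batch, x)  # out[g] = max(out[g], x[i]) for every node i in graph g
    return out


# Three nodes: the first two belong to graph 0, the last one to graph 1.
x = [[1, 5], [3, 2], [0, 7]]
batch = [0, 0, 1]
pooled = global_max_pool_np(x, batch)
print(pooled)
```

The output has one row per graph, which is why the pooled tensor can be fed to a graph-level classifier regardless of how many atoms each molecule has.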
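2. AttributeError: ‘function’ object has no attribute "xxx"
Don't panic. First print the object's attributes with dir() to check whether it really lacks "xxx". If it does, check the line where the object was created and see whether the call parentheses () were left out. For example, the following code raises this error:

DaNN_model = torch.nn.DataParallel(DaNN_model).cuda   # missing (): should be .cuda()
DaNN_model.state_dict()
AttributeError: ‘function’ object has no attribute "state_dict"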
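The same mistake can be reproduced without torch. In this sketch the Model class is a hypothetical stand-in with a cuda() method that returns the object itself, as nn.Module.cuda() does.

```python
class Model:
    """Stand-in for a network: cuda() returns the model itself."""

    def cuda(self):
        return self

    def state_dict(self):
        return {"weight": 1.0}


wrong = Model().cuda    # no parentheses: `wrong` is the bound method, not a model
right = Model().cuda()  # parentheses: `right` is the Model instance

print(callable(wrong), hasattr(wrong, "state_dict"))  # True False
print(hasattr(right, "state_dict"))                   # True
```

Checking type(obj) or callable(obj) right after the assignment is a quick way to spot this before the AttributeError appears further down.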

3. With DataParallel, if the error says that one object is on cuda:0 and another is on cuda:1:

Caught AttributeError in replica 0 on device 0.
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking arugment for argume

this suggests that the following line was executed more than once, so the model was wrapped in DataParallel twice:
torch.nn.DataParallel(net).cuda()
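4. Running two distributed-training scripts at the same time fails with:
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29500
Fix: give each additional job its own master_port, e.g. 29500 + 1 = 29501, then 29502, and so on: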

 python -m torch.distributed.launch --nproc_per_node 2 --master_port 29501 finetune.py
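Instead of counting up from 29500 by hand, you can ask the operating system for a port that is currently free. This is a standard-library sketch and find_free_port is a hypothetical helper name. The port could in principle be taken by another process between the check and the launch, so treat it as a convenience, not a guarantee.

```python
import socket


def find_free_port():
    """Ask the OS for a TCP port that is unused right now and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0 lets the OS pick any free port
        return sock.getsockname()[1]


port = find_free_port()
print(f"--master_port {port}")
```

The printed value can be pasted into the launch command above in place of 29501.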

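5. Error caused by an incorrect number of label classes:
./aten/src/ATen/native/cuda/Loss.cu:271: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [29,0,0] Assertion t >= 0 && t < n_classes failed
The assertion t >= 0 && t < n_classes means that at least one target label lies outside the range [0, n_classes). Check that the labels start at 0 and that the output layer has as many units as there are classes.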
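
The CUDA assertion does not say which label is at fault, so it helps to check the labels on the CPU before training. A minimal sketch on plain Python lists follows. The name find_bad_labels is hypothetical. A tensor of labels can be converted first with .tolist().

```python
def find_bad_labels(labels, n_classes):
    """Return (position, label) pairs whose label is outside [0, n_classes)."""
    return [(i, t) for i, t in enumerate(labels) if not 0 <= t < n_classes]


labels = [0, 1, 2, 1, 0]
print(find_bad_labels(labels, n_classes=2))  # [(2, 2)]
```

An empty result means every label is a valid class index for an output layer with n_classes units.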
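
6. Loss
(1) Calling loss.backward() for two losses during training raises: RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

The point is to make sure the computation graph of the first loss has not been freed when the second backward pass runs.
What worked in my code was to skip zeroing the gradients before the second loss.backward(). As the error message itself says, passing retain_graph=True to the first backward() call keeps the graph alive, and summing the losses and calling backward() once avoids the second pass altogether.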

### first loss
self.optimizer.zero_grad()
loss_cross.backward()
self.optimizer.step()
self.lr_scheduler.step()
### second loss
torch.autograd.set_detect_anomaly(True)
# self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()
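For two losses that share part of the computation graph, the two routes named in the error message can be sketched as follows. This is a toy example under stated assumptions, not the training loop above: the linear layer, the two losses, and all variable names are made up for illustration.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(4, 3)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
x = torch.randn(8, 4)

# Option 1: sum the losses and run a single backward pass.
hidden = layer(x)                 # shared by both losses
loss_a = hidden.pow(2).mean()
loss_b = hidden.abs().mean()
optimizer.zero_grad()
(loss_a + loss_b).backward()
grad_sum = layer.weight.grad.clone()

# Option 2: two backward passes, keeping the graph alive after the first one.
hidden = layer(x)
loss_a = hidden.pow(2).mean()
loss_b = hidden.abs().mean()
optimizer.zero_grad()
loss_a.backward(retain_graph=True)  # without retain_graph the next line raises the RuntimeError
loss_b.backward()                   # gradients accumulate into .grad
grad_two = layer.weight.grad.clone()

optimizer.step()
print(torch.allclose(grad_sum, grad_two, atol=1e-6))
```

Both options leave the same accumulated gradient in .grad. Note that neither calls optimizer.step() between the two backward passes, since updating the parameters in between changes tensors that the retained graph still refers to.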

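(2) When adding several losses, first do a trial run with every coefficient set to 1 to see the order of magnitude of each loss and by what factor they differ. For example, loss1 = 0.005 and loss2 = 5 differ by a factor of 1000, so loss2 should be multiplied by 0.001:
loss = loss1 + 0.001 * loss2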
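The rule of thumb above can be written down as a small helper that rounds the ratio of the two trial-run losses to the nearest power of ten. This is a sketch. balancing_weight is a hypothetical name, and the loss values are the ones from the example.

```python
import math


def balancing_weight(reference_loss, other_loss):
    """Power-of-ten factor that brings other_loss to the magnitude of reference_loss."""
    if reference_loss == 0 or other_loss == 0:
        raise ValueError("both losses must be non-zero")
    ratio = abs(reference_loss) / abs(other_loss)
    return 10.0 ** round(math.log10(ratio))


loss1, loss2 = 0.005, 5.0          # values observed in a trial run with coefficient 1
weight = balancing_weight(loss1, loss2)
loss = loss1 + weight * loss2
print(weight, loss)
```

The weight is computed once from the trial run and then kept fixed as a hyperparameter. Recomputing it every step would change what is being optimized.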

Problem Settings

When a classic trick does not work well, it is usually because it has been applied in the wrong place.
