[Code example] Converting labels to a NumPy array and saving them as a pkl file

Latest recommended article published on 2024-11-02 16:28:26

LRJ-jonas

Tags: python, machine learning, deep learning

Article link: https://blog.csdn.net/m0_55097528/article/details/132401167

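This post uses the Python code of the SFcnn scoring function as an example:

First, read the directory paths of the complexes in the training set and the test set, remove the training samples that overlap with the test set, and then shuffle the order.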

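import os
from glob import glob

import numpy as np

# Get the paths of the training and test sets
train_dirs = glob(os.path.join('/home/lrj/Documents/PDB2019/pdbbind_v2019_refined', '*'))
core_dirs = glob(os.path.join('/home/lrj/Documents/CASF-2016/coreset', '*'))
core_dirs.sort()
# The last path component of each directory is the PDB ID
core_id = [os.path.split(i)[1] for i in core_dirs]
# Drop training complexes that also appear in the test set
train_new_dirs = []
for i in train_dirs:
    pdb_id = os.path.split(i)[1]
    if pdb_id not in core_id:
        train_new_dirs.append(i)

# Shuffle the training directories in place
np.random.shuffle(train_new_dirs)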

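The resulting train_new_dirs holds the directory paths of the complexes used for training; the last component of each path is the PDB ID.

The resulting core_dirs holds the directory paths of the complexes used for testing, again with the PDB ID as the last path component.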
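As a small illustration of the filtering step, the sketch below runs the same logic on made-up directory paths. The paths and PDB IDs are hypothetical stand-ins for the results of glob(), and a set is used for the membership test, which is one possible variation on the list used in the post.

```python
import os

# Hypothetical directory paths; the real ones come from glob() in the post.
train_dirs = ['/data/refined/1abc', '/data/refined/2xyz', '/data/refined/3def']
core_dirs = ['/data/coreset/2xyz']

# The last path component is the PDB ID.
core_ids = {os.path.basename(d) for d in core_dirs}

# Keep only training complexes whose ID does not appear in the test set.
train_new_dirs = [d for d in train_dirs if os.path.basename(d) not in core_ids]
train_ids = [os.path.basename(d) for d in train_new_dirs]
print(train_ids)  # ['1abc', '3def']
```

With only a few hundred test complexes the list lookup is fast enough too; the set mainly makes the intent explicit.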

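Next, use the affinity table 'INDEX_general_PL_data.2019' to look up the binding affinity of every complex in train_new_dirs and core_dirs in turn.

Note that each training complex is rotated 9 additional times for data augmentation, so each training label is repeated 10 times (the value itself is not multiplied by 10).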
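The label repetition can be sketched on its own as follows. The affinity values are made up, and the sketch assumes the 10 augmented copies of each complex are stored contiguously, so that the labels line up with the samples. np.repeat is shown as an equivalent alternative to the extend call, not as the method used in the original code.

```python
import numpy as np

n_copies = 10  # 1 original orientation + 9 rotations
affinities = ['6.40', '4.85']  # hypothetical -logKd/Ki values read as strings

# Same pattern as the post: repeat each label once per augmented copy.
train_label = []
for value in affinities:
    train_label.extend([value] * n_copies)
train_label = np.array(train_label, dtype=np.float32)

# Equivalent result with np.repeat on an already converted array.
alt_label = np.repeat(np.array(affinities, dtype=np.float32), n_copies)

print(train_label.shape)  # (20,)
print(np.array_equal(train_label, alt_label))  # True
```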

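# Get the affinity data of proteins and ligands. The -logKd/Ki values are the labels.
affinity = {}
with open('INDEX_general_PL_data.2019', 'r') as f:
    for line in f:
        # Skip comment lines and blank lines
        if line.strip() and not line.startswith('#'):
            fields = line.split()
            affinity[fields[0]] = fields[3]  # PDB ID -> -logKd/Ki
train_label = []
core_label = []
for i in train_new_dirs:
    pdb_id = os.path.split(i)[1]
    # 10 copies per complex: the original plus 9 rotations
    train_label.extend([affinity[pdb_id]] * 10)
for i in core_dirs:
    pdb_id = os.path.split(i)[1]  # renamed so the core_id list is not overwritten
    if not affinity.get(pdb_id):
        print(pdb_id)  # no affinity entry found for this test complex
    else:
        core_label.append(affinity[pdb_id])
# Convert the lists of strings to float32 arrays
train_label = np.array(train_label, dtype=np.float32)
core_label = np.array(core_label, dtype=np.float32)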

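In the end, the labels are collected into lists with extend and append, and the lists are then converted with np.array.

After that, they can be saved as pkl files.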
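The post does not show the saving step itself, so the following is only a minimal sketch using the standard pickle module. The file names train_label.pkl and core_label.pkl, the temporary output directory, and the small stand-in arrays are all assumptions made for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

# Small stand-in arrays; in the post these are train_label and core_label.
train_label = np.array([6.4] * 10, dtype=np.float32)
core_label = np.array([4.85, 7.1], dtype=np.float32)

out_dir = tempfile.mkdtemp()  # hypothetical output location
train_path = os.path.join(out_dir, 'train_label.pkl')  # hypothetical file name
core_path = os.path.join(out_dir, 'core_label.pkl')    # hypothetical file name

# Pickle files must be opened in binary mode.
with open(train_path, 'wb') as f:
    pickle.dump(train_label, f)
with open(core_path, 'wb') as f:
    pickle.dump(core_label, f)

# Load the arrays back to check that the round trip preserves them.
with open(train_path, 'rb') as f:
    loaded_train = pickle.load(f)
with open(core_path, 'rb') as f:
    loaded_core = pickle.load(f)

print(np.array_equal(loaded_train, train_label))  # True
print(loaded_core.dtype)  # float32
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code. For plain arrays, np.save and np.load are a common alternative.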