Graphormer: running custom data and a log of pitfalls

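1. The modified create_customized_dataset

Convert your own data into a data structure similar to the QM9 dataset.
Read the CSV files and build the DGL graphs and the list of labels.
Combine the graphs and labels into a single DGL dataset object.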

from graphormer.data import register_dataset
from dgl.data import DGLDataset
from sklearn.model_selection import train_test_split
import dgl
import pandas as pd
from rdkit import Chem
import torch
import numpy as np

class Custom:
    ...  # implementation omitted

@register_dataset("customized_qm9_dataset")
def create_customized_dataset():
    train_df = pd.read_csv('train.csv')
    val_df = pd.read_csv('valid.csv')
    test_df = pd.read_csv('test.csv')

    data = pd.concat([train_df, val_df, test_df])
    dataset = Custom(data)

    num_graphs = len(dataset)

    train_valid_idx, test_idx = train_test_split(
        np.arange(num_graphs), test_size=num_graphs // 10, random_state=0
    )
    train_idx, valid_idx = train_test_split(
        train_valid_idx, test_size=num_graphs // 5, random_state=0
    )

    return {
        "dataset": dataset,
        "train_idx": train_idx,
        "valid_idx": valid_idx,
        "test_idx": test_idx,
        "source": "dgl"
    }
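Note that the function above concatenates the three CSV files and then re-splits them at random (roughly 70/20/10), so the original train/valid/test assignment in the files is not kept. If that assignment should be preserved, one option is to derive the index ranges from the file lengths instead. The sketch below is only an illustration: split_indices_from_lengths is a hypothetical helper, not part of Graphormer, and it assumes the rows were concatenated in train, valid, test order. Depending on what the loader expects, the lists may need to be wrapped in np.array before being returned.

```python
def split_indices_from_lengths(n_train, n_valid, n_test):
    """Return contiguous index lists matching a train/valid/test concatenation order."""
    train_idx = list(range(0, n_train))
    valid_idx = list(range(n_train, n_train + n_valid))
    test_idx = list(range(n_train + n_valid, n_train + n_valid + n_test))
    return train_idx, valid_idx, test_idx


# Made-up sizes standing in for len(train_df), len(val_df), len(test_df).
train_idx, valid_idx, test_idx = split_indices_from_lengths(7, 2, 1)
```

With this approach the three index sets never overlap and together cover every row of the concatenated data.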

2. Dataset customization:

https://docs.dgl.ai/en/0.6.x/api/python/dgl.data.html#qm9-dataset

3. Errors

3.1 TypeError: 'type' object is not subscriptable

2023-09-03 14:54:58 | WARNING | root | The OGB package is out of date. Your version is 1.3.2, while the latest version is 1.3.6.
Using backend: pytorch
Traceback (most recent call last):
  File "/root/miniconda3/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/root/miniconda3/lib/python3.8/site-packages/fairseq_cli/train.py", line 512, in cli_main
    parser = options.get_training_parser()
  File "/root/miniconda3/lib/python3.8/site-packages/fairseq/options.py", line 38, in get_training_parser
    parser = get_parser("Trainer", default_task)
  File "/root/miniconda3/lib/python3.8/site-packages/fairseq/options.py", line 234, in get_parser
    utils.import_user_module(usr_args)
  File "/root/miniconda3/lib/python3.8/site-packages/fairseq/utils.py", line 497, in import_user_module
    import_tasks(tasks_path, f"{module_name}.tasks")
  File "/root/miniconda3/lib/python3.8/site-packages/fairseq/tasks/__init__.py", line 117, in import_tasks
    importlib.import_module(namespace + "." + task_name)
  File "/root/miniconda3/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/root/autodl-tmp/Graphormer/graphormer/tasks/is2re.py", line 25, in <module>
    class LMDBDataset:
  File "/root/autodl-tmp/Graphormer/graphormer/tasks/is2re.py", line 43, in LMDBDataset
    def __getitem__(self, idx: int) -> dict[str, Union[Tensor, float]]:
TypeError: 'type' object is not subscriptable

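This is a Python version issue: subscripting the built-in dict type in annotations (dict[str, ...]) is only supported from Python 3.9 onward, so the environment needs to be upgraded from Python 3.8 to Python 3.9.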
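If upgrading the interpreter is not an option, an alternative on Python 3.8 is to write the annotation with typing.Dict (or to add "from __future__ import annotations" at the top of the file). The sketch below is a stand-in for the failing method, not the actual code in is2re.py: TinyDataset is a hypothetical class, and Tensor is replaced by int so the snippet runs without PyTorch.

```python
from typing import Dict, Union


class TinyDataset:
    """Hypothetical stand-in for LMDBDataset, only to show the annotation."""

    def __init__(self, values):
        self.values = list(values)

    # typing.Dict[...] is valid on Python 3.8, unlike the built-in dict[...].
    def __getitem__(self, idx: int) -> Dict[str, Union[int, float]]:
        return {"idx": idx, "value": float(self.values[idx])}


item = TinyDataset([3, 4, 5])[1]
```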

3.2 error: unknown file type '.pyx' (from 'fairseq/data/data_utils_fast.pyx')

/usr/bin/ld: warning: /root/miniconda3/envs/Graphormer/lib/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /root/miniconda3/envs/Graphormer/lib/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/ld: warning: /root/miniconda3/envs/Graphormer/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /root/miniconda3/envs/Graphormer/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/ld: warning: /root/miniconda3/envs/Graphormer/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/ld: warning: /root/miniconda3/envs/Graphormer/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
building 'fairseq.data.data_utils_fast' extension
error: unknown file type '.pyx' (from 'fairseq/data/data_utils_fast.pyx')

The pip version is too old and needs to be upgraded with pip install --upgrade pip.
If that does not work, use this instead:


python -m pip install --upgrade "pip==21.1"


3.3 python: can't open file 'setup.py': [Errno 2] No such file or directory

ERROR: Directory '.' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
python: can't open file 'setup.py': [Errno 2] No such file or directory

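The command was run from a directory that contains no setup.py. Change into the fairseq directory and run the following, then continue: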

cd fairseq
pip install . --use-feature=in-tree-build
python setup.py build_ext --inplace

3.4 must be regenerated with protoc >= 3.19.0

If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. If you cannot immediately regenerate your protos, some other possible workarounds are: 1. Downgrade the protobuf package to 3.20.x or lower. 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slow

Solution:

pip install protobuf==3.20.*

3.5 AttributeError: module 'numpy' has no attribute 'float'.

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Solution: downgrade NumPy to a version that still provides np.float:

pip install numpy==1.23.5
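The np.float alias was removed in NumPy 1.24, which is why pinning to 1.23.5 works. A quick way to tell whether an environment is affected is to compare the installed version against that cutoff. The helper below is a hypothetical sketch, not something from Graphormer or NumPy; it takes the version string (for example the value of numpy.__version__) as an argument so that it runs even without NumPy installed.

```python
def has_np_float_alias(numpy_version: str) -> bool:
    """Return True if this NumPy version still ships the deprecated np.float alias."""
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    # The alias was removed in NumPy 1.24.
    return (major, minor) < (1, 24)


pinned_ok = has_np_float_alias("1.23.5")
newer_ok = has_np_float_alias("1.24.0")
```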

3.6 urllib.error.HTTPError: HTTP Error 409: Conflict

Downloading: "https://ml2md.blob.core.windows.net/graphormer-ckpts/checkpoint_best_pcqm4mv1.pt" to /root/.cache/torch/hub/checkpoints/checkpoint_best_pcqm4mv1.pt
Traceback (most recent call last):
  File "/root/miniconda3/envs/Graphormer/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/site-packages/fairseq_cli/train.py", line 528, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/site-packages/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/site-packages/fairseq_cli/train.py", line 94, in main
    model = task.build_model(cfg.model)
  File "/root/autodl-tmp/Graphormer/graphormer/tasks/graph_prediction.py", line 229, in build_model
    model = models.build_model(cfg, self)
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/site-packages/fairseq/models/__init__.py", line 105, in build_model
    return model.build_model(cfg, task)
  File "/root/autodl-tmp/Graphormer/graphormer/models/graphormer.py", line 149, in build_model
    return cls(args, encoder)
  File "/root/autodl-tmp/Graphormer/graphormer/models/graphormer.py", line 43, in __init__
    self.load_state_dict(load_pretrained_model(args.pretrained_model_name))
  File "/root/autodl-tmp/Graphormer/graphormer/pretrain/__init__.py", line 15, in load_pretrained_model
    return load_state_dict_from_url(PRETRAINED_MODEL_URLS[pretrained_model_name], progress=True)["model"]
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/site-packages/torch/hub.py", line 571, in load_state_dict_from_url
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/site-packages/torch/hub.py", line 437, in download_url_to_file
    u = urlopen(req)
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/root/miniconda3/envs/Graphormer/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 409: Conflict

Solution: download the checkpoint manually and move it into the torch hub cache directory:

wget https://ml2md.blob.core.windows.net/graphormer-ckpts/checkpoint_best_pcqm4mv1.pt
mv checkpoint_best_pcqm4mv1.pt /root/.cache/torch/hub/checkpoints/

However, this still did not work afterwards, so I ended up not using the pretrained model.
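For reference, torch.hub.load_state_dict_from_url skips the download when a file with the URL's base name already exists in the hub checkpoints directory. The sketch below reconstructs that path so the target of the mv command can be double-checked. expected_checkpoint_path is a hypothetical helper, and the torch home of /root/.cache/torch is an assumption taken from the log above; it will differ if TORCH_HOME or XDG_CACHE_HOME is set.

```python
import os
from urllib.parse import urlparse


def expected_checkpoint_path(url: str, torch_home: str) -> str:
    """Build <torch_home>/hub/checkpoints/<file name taken from the URL path>."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(torch_home, "hub", "checkpoints", filename)


url = "https://ml2md.blob.core.windows.net/graphormer-ckpts/checkpoint_best_pcqm4mv1.pt"
path = expected_checkpoint_path(url, "/root/.cache/torch")
```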

3.7 TypeError: 'DiskDataset' object is not subscriptable

  File "/root/autodl-tmp/Graphormer/graphormer/data/dgl_datasets/dgl_dataset.py", line 155, in __getitem__
    graph, y = self.dataset[idx]
TypeError: 'DiskDataset' object is not subscriptable

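Remember to update the returned dictionary here; compared with the version in section 1, the source field changes from "dgl" to "smiles":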

    return {
        "dataset": dataset,
        "train_idx": train_idx,
        "valid_idx": valid_idx,
        "test_idx": test_idx,
        "source": "smiles"
    }

3.8 AttributeError: 'int' object has no attribute 'dim'

  File "/root/autodl-tmp/Graphormer/graphormer/data/dgl_datasets/dgl_dataset.py", line 156, in __getitem__
    return self.__preprocess_dgl_graph(graph, y, idx)
  File "/root/autodl-tmp/Graphormer/graphormer/data/dgl_datasets/dgl_dataset.py", line 144, in __preprocess_dgl_graph
    if y.dim() == 0:
AttributeError: 'int' object has no attribute 'dim'

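The label y arrives as a plain Python int, which has no dim() method. Adding the following line before the y.dim() check fixes it: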

y = torch.tensor(y)

3.9 Exception in thread Thread-5:

  File "/root/autodl-tmp/Graphormer/graphormer/criterions/binary_logloss.py", line 103, in forward
    logits_flatten[mask].float(), targets_flatten[mask].float(), reduction="sum"
IndexError: The shape of the mask [16] at index 0 does not match the shape of the indexed tensor [32] at index 0
Exception in thread Thread-5:

If this appears, set num_class to 2.

