[Code Reproduction] BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks

夏莉莉iy

Last modified 2024-04-29 12:58:18


First published 2024-01-14 14:34:32

Article link: https://blog.csdn.net/Sherlily/article/details/135582174


Before we start: the README of this code is already about as clear as it can be, so if anything goes wrong, read the README first.

1. Paper resources

(1) Paper: BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks | IEEE Journals & Magazine | IEEE Xplore

(2) Code: GitHub - HennyJie/BrainGB: Officially Accepted to IEEE Transactions on Medical Imaging (TMI, IF: 11.037) - Special Issue on Geometric Deep Learning in Medical Imaging.

(3) Library website: braingb.us

(4) Paper notes: [Close reading] BrainGB: A Benchmark for Brain Network Analysis With Graph Neural Networks - CSDN blog

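2. Reproduction steps and problems you may run into

2.1. Environment setup

The following is the method the authors give in the README. The repository is recent, so it should mostly just work.

① Clone the repository: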

git clone https://github.com/HennyJie/BrainGB.git

② Then, from inside the repository directory, install the dependencies:

pip install -r requirements.txt

③ Alternatively, install the packages BrainGB depends on yourself:

torch~=1.10.2
numpy~=1.22.2
nni~=2.4
PyYAML~=5.4.1
scikit-learn~=1.0.2
networkx~=2.6.2
scipy~=1.7.3
tensorly~=0.6.0
pandas~=1.4.1
libsvm~=3.23.0.4
matplotlib~=3.4.3
tqdm~=4.62.3
torch-geometric~=2.0.3
h5py~=3.6.0

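④ If none of the above works, pip install whichever package Python reports as missing and ignore the version. If that still fails, ask an AI assistant such as ERNIE Bot (文心).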
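To make the "install whatever is missing" step less trial-and-error, here is a minimal sketch that lists which of the packages above are not installed. It only uses the standard library; the helper name find_missing is my own, and it checks presence only, not versions.

```python
from importlib import metadata

# Distribution names copied from the dependency list above.
REQUIRED = [
    "torch", "numpy", "nni", "PyYAML", "scikit-learn", "networkx", "scipy",
    "tensorly", "pandas", "libsvm", "matplotlib", "tqdm", "torch-geometric", "h5py",
]


def find_missing(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Print one pip command per missing package instead of installing blindly.
    for name in find_missing(REQUIRED):
        print(f"pip install {name}")
```

Running it prints one pip command per missing package, which you can then run one by one.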

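2.2. Running the code

(1) Run 01-fetch_data in BrainGB-master\examples\utils\get_abide directly

① I did not bother checking the path. By default the data ends up in the same root directory as BrainGB-master, in a folder called home\root\ABIDE_pcp. The folder does not exist at first; the script creates it.

② As usual, the download stalls and throws an error when it reaches Caltech. Ignore it and run the script again.

For the record, here is the Caltech error in full: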

Downloaded 229376 of 291775 bytes (78.6%,   10.6s remaining)
Error while fetching file Caltech_0051462_rois_cc200.1D; dataset fetching aborted.
Traceback (most recent call last):
  File "F:\anaconda3\lib\site-packages\urllib3\response.py", line 444, in _error_catcher
    yield
  File "F:\anaconda3\lib\site-packages\urllib3\response.py", line 567, in read
    data = self._fp_read(amt) if not fp_closed else b""
  File "F:\anaconda3\lib\site-packages\urllib3\response.py", line 533, in _fp_read
    return self._fp.read(amt) if amt is not None else self._fp.read()
  File "F:\anaconda3\lib\http\client.py", line 463, in read
    n = self.readinto(b)
  File "F:\anaconda3\lib\http\client.py", line 507, in readinto
    n = self.fp.readinto(b)
  File "F:\anaconda3\lib\socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "F:\anaconda3\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "F:\anaconda3\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\anaconda3\lib\site-packages\requests\models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "F:\anaconda3\lib\site-packages\urllib3\response.py", line 628, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "F:\anaconda3\lib\site-packages\urllib3\response.py", line 593, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "F:\anaconda3\lib\contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "F:\anaconda3\lib\site-packages\urllib3\response.py", line 449, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Read timed out.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\BrainGB-master\examples\utils\get_abide\01-fetch_data.py", line 102, in <module>
    main(args)
  File "F:\BrainGB-master\examples\utils\get_abide\01-fetch_data.py", line 62, in main
    abide = datasets.fetch_abide_pcp(data_dir=root_folder, pipeline=pipeline,
  File "F:\anaconda3\lib\site-packages\nilearn\datasets\func.py", line 1221, in fetch_abide_pcp
    files.append(_fetch_files(data_dir, file_, verbose=verbose)[0])
  File "F:\anaconda3\lib\site-packages\nilearn\datasets\utils.py", line 784, in _fetch_files
    return _fetch_files(
  File "F:\anaconda3\lib\site-packages\nilearn\datasets\utils.py", line 843, in _fetch_files
    dl_file = _fetch_file(
  File "F:\anaconda3\lib\site-packages\nilearn\datasets\utils.py", line 666, in _fetch_file
    _chunk_read_(
  File "F:\anaconda3\lib\site-packages\nilearn\datasets\utils.py", line 171, in _chunk_read_
    for chunk in response.iter_content(chunk_size):
  File "F:\anaconda3\lib\site-packages\requests\models.py", line 822, in generate
    raise ConnectionError(e)

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Read timed out.

The log names the file and subject ID at which the download stalled (here Caltech_0051462_rois_cc200.1D).
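Since the fix is simply "run it again", the rerun can be automated. Below is a minimal sketch of a generic retry helper with exponential backoff. The helper retry and its parameters are my own names, not part of BrainGB or nilearn, and wrapping the fetch call this way is an assumption that has not been tested against the real download.

```python
import time


def retry(func, attempts=5, base_delay=2.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or attempts run out, doubling the wait each time."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as err:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({err!r}); retrying in {delay:.0f}s")
            sleep(delay)


# Hypothetical usage inside 01-fetch_data.py (keep the script's own arguments):
#
#   from nilearn import datasets
#   abide = retry(lambda: datasets.fetch_abide_pcp(...))
```

The timeout in the log above surfaces as requests.exceptions.ConnectionError, so passing exceptions=(ConnectionError,) from requests would narrow the retry to network failures only.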

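(2) When you run 02-process_data.py directly, it expects the dataset folder to be under BrainGB-master\examples\utils\get_abide, that is, ABIDE_pcp must sit inside get_abide.

If it was downloaded somewhere else, just drag it over. (Do not drag the leading home\root along with it.)

① It creates a raw folder inside filt_noglobal, the same as BrainGNN does. The folder is full of .h5 files.

(3) ⭐ What differs is that running their 03-generate_abide_dataset produces abide.npy in the ABIDE_pcp folder.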

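① What abide.npy contains (from the README):

The abide.npy file contains the following:

timeseries: the BOLD time series data for each subject. A numpy array of shape (#sub, #ROI, #timesteps).

label: the Autism Spectrum Disorder diagnosis label for each subject; "0" means negative and "1" means positive. A numpy array of shape (#sub).

corr: the correlation matrix computed from the BOLD time series data. A numpy array of shape (#sub, #ROI, #ROI).

pcorr: the partial correlation matrix derived from the BOLD time series data. A numpy array of shape (#sub, #ROI, #ROI).

site: where each subject's data was collected. A numpy array of shape (#sub).

Important note: label and the corr matrix are the actual inputs to BrainGB. The label is the target outcome we want to predict, usually a subject's diagnosis or condition in a brain study. The corr matrix describes the associated brain network. If you plan to run BrainGB on your own dataset, format label and the corr matrix the same way to ensure compatibility and accurate results: label must be a numpy array of shape (#sub), and the corr matrix must be a numpy array of shape (#sub, #ROI, #ROI).

Place the dataset file abide.npy under the examples folder.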
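To check your own data against the layout described above, here is a minimal sketch that builds a toy file and validates the two arrays BrainGB actually uses. It assumes the file is a pickled dict saved with np.save and that the keys are spelled "label" and "corr"; both are assumptions, so inspect the keys of your real abide.npy first. The function name check_braingb_inputs is my own.

```python
import os
import tempfile

import numpy as np


def check_braingb_inputs(data):
    """Validate label (#sub) and corr (#sub, #ROI, #ROI); return (#sub, #ROI)."""
    label = np.asarray(data["label"])  # key name is an assumption
    corr = np.asarray(data["corr"])    # key name is an assumption
    if label.ndim != 1:
        raise ValueError(f"label must have shape (#sub), got {label.shape}")
    if corr.ndim != 3 or corr.shape[1] != corr.shape[2]:
        raise ValueError(f"corr must have shape (#sub, #ROI, #ROI), got {corr.shape}")
    if corr.shape[0] != label.shape[0]:
        raise ValueError("label and corr disagree on the number of subjects")
    return corr.shape[0], corr.shape[1]


# Synthetic stand-in data: 4 subjects, 6 ROIs, 20 time steps.
rng = np.random.default_rng(0)
timeseries = rng.standard_normal((4, 6, 20))
corr = np.stack([np.corrcoef(ts) for ts in timeseries])  # one ROI x ROI matrix per subject
toy = {"label": np.array([0, 1, 0, 1]), "corr": corr}

path = os.path.join(tempfile.mkdtemp(), "toy.npy")
np.save(path, toy)  # a dict is stored as a pickled 0-d object array
loaded = np.load(path, allow_pickle=True).item()

n_sub, n_roi = check_braingb_inputs(loaded)
print(n_sub, n_roi)
```

For a real file, replace the toy dict with your own arrays and keep the same two checks.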

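(4) Running BrainGB

① The README says to run it directly from the command line, with main.example_main as the module.

For me that first part, main.example_main, raised an error, so I changed it to the following (the author fixed this in the January 2024 GitHub update, so you are unlikely to hit it):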

python -m examples.example_main --dataset_name ABIDE --pooling concat --gcn_mp_type edge_node_concate --hidden_dim 256

② The next error: FileNotFoundError: [Errno 2] No such file or directory: 'F:\\BrainGB-master\\examples\\datasets\\ABIDE/abide.npy'. It means abide.npy is in the wrong place; moving it to that location fixes it.

③ Then another error: IndexError: Only slices (':'), list, tuples, torch.tensor and np.ndarray of dtype long or bool are valid indices (got 'ndarray').

So in example_main, just before the line

train_set, test_set = dataset[train_index], dataset[test_index]

I added two lines (I think the second one alone is enough, but I did not bother deleting the first):

train_index, test_index = np.array(train_index), np.array(test_index) # added by me

train_index, test_index = train_index.tolist(), test_index.tolist() # added by me
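The error message says an ndarray index is only accepted when its dtype is long (int64) or bool. A plausible cause, which I have not confirmed, is that the split indices arrive as 32-bit integers, which is the NumPy 1.x default on Windows. The sketch below shows two equivalent ways to normalize such an index; the function names are my own.

```python
import numpy as np


def to_list_index(index):
    """Return the index as a plain list of Python ints (what the fix above does)."""
    return np.asarray(index).tolist()


def to_int64_index(index):
    """Alternative: keep an ndarray but force the dtype the error message asks for."""
    return np.asarray(index, dtype=np.int64)


# A 32-bit index array standing in for the split indices.
train_index = np.array([0, 2, 3], dtype=np.int32)

print(to_list_index(train_index))
print(to_int64_index(train_index).dtype)
```

Either result should pass the index check quoted in the error message; the list version is the one verified in this post.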

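④ After that there were no more problems, so just let it run.

⑤ On a laptop with an RTX 3060, ABIDE takes about four hours. A result.log file then appears in the BrainGB folder, and opening it shows the results.

Each further run appends another entry.

You do not actually need to run everything. The BrainGB authors have already run and analyzed all of it (although not ABIDE). The effort is probably better spent on how to modify the model.

⑥ In January 2024 the author changed what the code shows while running. It used to print nothing; now it prints output as it runs.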

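2.3. Things you can add yourself

(1) Additional metrics (the code currently reports acc, auc and F1; the paper calls it F1, and I do not know why the code names it macro). If you are not building your own model this hardly seems necessary, since for other people's models the reported results are probably all you need.

① SEN

② SPE

③ Running time

Add these two pieces to the example_main file (lines with a # comment are mine; locate the surrounding lines that are not mine, then add mine around them). The file also needs import time at the top if it is not already there: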

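    end_time = time.time()  # record the end time (added by me)
    elapsed_time = end_time - start_time  # total elapsed seconds (added by me)
    hours = int(elapsed_time / 3600)  # whole hours (added by me)
    minutes = int((elapsed_time % 3600) / 60)  # remaining whole minutes (added by me)
    seconds = int(elapsed_time % 60)  # remaining whole seconds (added by me)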

    result_str = f'(K Fold Final Result)| avg_acc={(np.mean(accs) * 100):.2f} +- {(np.std(accs) * 100): .2f}, ' \
                 f'avg_auc={(np.mean(aucs) * 100):.2f} +- {np.std(aucs) * 100:.2f}, ' \
                 f'avg_macro={(np.mean(macros) * 100):.2f} +- {np.std(macros) * 100:.2f}\n' \
                 f'running_time={hours}h:{minutes}min:{seconds}s\n'  # added by me
    logging.info(result_str)

if __name__ == "__main__":
    start_time = time.time()  # record the start time (added by me)

    parser = argparse.ArgumentParser()
    parser.add_argument('--dataset_name', type=str,
                        choices=['PPMI', 'HIV', 'BP', 'ABCD', 'PNC', 'ABIDE'],
                        default="BP")
....

④ Plots
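For the SEN and SPE items in the list above, here is a minimal sketch of how sensitivity and specificity could be computed from binary labels and predictions. It is plain Python and is not taken from the BrainGB code; the function name and the choice of label 1 as the positive class are assumptions (the README does use "1" for a positive diagnosis).

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (SEN, SPE) = (TP / (TP + FN), TN / (TN + FP)) for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against a fold that contains only one class.
    sen = tp / (tp + fn) if (tp + fn) else float("nan")
    spe = tn / (tn + fp) if (tn + fp) else float("nan")
    return sen, spe


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.75)
```

To report these per fold, you would collect them the same way the script collects accs, aucs and macros, then average them in the final result string.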

(2) Parameters

①dataset_name: ABIDE

②pooling

③gcn_mp_type:

④n_GNN_layers

⑤n_MLP_layers

⑦hidden_dim

⑧epochs

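⑨enable_nni (this flag does not seem to take a value; the author says it enables automatic hyperparameter tuning through NNI, an AutoML tool)

Output with this flag enabled (middle part omitted):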

(base) PS F:\BrainGB-master> python -m examples.example_main --dataset_name ABIDE --pooling concat --gcn_mp_type node_concate --hidden_dim 256 --enable_nni
F:\anaconda3\lib\site-packages\nni\runtime\trial_command_channel\standalone.py:34: RuntimeWarning: Running trial
 code without runtime. Please check the tutorial if you are new to NNI: https://nni.readthedocs.io/en/stable/tutorials/hpo_quickstart_pytorch/main.html
  warnings.warn(warning_message, RuntimeWarning)
Processing...
Done!
seed for seed_everything(): 492724
[2024-01-16 14:31:06] Intermediate result: 0.6659916913508929  (Index 0)
INFO:nni:Intermediate result: 0.6659916913508929  (Index 0)
[2024-01-16 14:31:28] Intermediate result: 0.6954468357075256  (Index 1)
INFO:nni:Intermediate result: 0.6954468357075256  (Index 1)
[2024-01-16 14:31:50] Intermediate result: 0.6954068902791264  (Index 2)
INFO:nni:Intermediate result: 0.6954068902791264  (Index 2)

.......


[2024-01-16 18:03:20] Intermediate result: 1.0  (Index 487)
[2024-01-16 18:07:24] Intermediate result: 1.0  (Index 496)
INFO:nni:Intermediate result: 1.0  (Index 496)
[2024-01-16 18:07:51] Intermediate result: 1.0  (Index 497)
INFO:nni:Intermediate result: 1.0  (Index 497)
[2024-01-16 18:08:18] Intermediate result: 1.0  (Index 498)
INFO:nni:Intermediate result: 1.0  (Index 498)
[2024-01-16 18:08:47] Intermediate result: 1.0  (Index 499)
INFO:nni:Intermediate result: 1.0  (Index 499)
[2024-01-16 18:08:49] Final result: 0.7123405874446097
INFO:nni:Final result: 0.7123405874446097

⑩model_name: a) gcn, b) gat

(3) A parameter whose setting I have not figured out

①node_features

(4) Built-in parameters in the code (modifiable):

    parser.add_argument('--dataset_name', type=str,
                        choices=['PPMI', 'HIV', 'BP', 'ABCD', 'PNC', 'ABIDE'],
                        default="BP")
    parser.add_argument('--view', type=int, default=1)
    parser.add_argument('--node_features', type=str,
                        choices=['identity', 'degree', 'degree_bin', 'LDP', 'node2vec', 'adj', 'diff_matrix',
                                 'eigenvector', 'eigen_norm'],
                        default='adj')
    parser.add_argument('--pooling', type=str,
                        choices=['sum', 'concat', 'mean'],
                        default='concat')
                        
    parser.add_argument('--model_name', type=str, default='gcn')
    # gcn_mp_type choices: weighted_sum, bin_concate, edge_weight_concate, edge_node_concate, node_concate
    parser.add_argument('--gcn_mp_type', type=str, default="weighted_sum") 
    # gat_mp_type choices: attention_weighted, attention_edge_weighted, sum_attention_edge, edge_node_concate, node_concate
    parser.add_argument('--gat_mp_type', type=str, default="attention_weighted") 

    parser.add_argument('--enable_nni', action='store_true')
    parser.add_argument('--n_GNN_layers', type=int, default=2)
    parser.add_argument('--n_MLP_layers', type=int, default=1)
    parser.add_argument('--num_heads', type=int, default=2)
    parser.add_argument('--hidden_dim', type=int, default=360)
    parser.add_argument('--gat_hidden_dim', type=int, default=8)
    parser.add_argument('--edge_emb_dim', type=int, default=256)
    parser.add_argument('--bucket_sz', type=float, default=0.05)
    parser.add_argument('--lr', type=float, default=1e-4)
    parser.add_argument('--weight_decay', type=float, default=1e-4)
    parser.add_argument('--dropout', type=float, default=0.5)

    parser.add_argument('--repeat', type=int, default=1)
    parser.add_argument('--k_fold_splits', type=int, default=5)
    parser.add_argument('--epochs', type=int, default=100)
    parser.add_argument('--test_interval', type=int, default=5)
    parser.add_argument('--train_batch_size', type=int, default=16)
    parser.add_argument('--test_batch_size', type=int, default=16)

    parser.add_argument('--seed', type=int, default=112078)
    parser.add_argument('--diff', type=float, default=0.2)
    parser.add_argument('--mixup', type=int, default=1) #[0, 1]
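Since the gcn_mp_type choices are listed only in a comment, it is easy to mistype one. Here is a minimal sketch that builds one command line per choice from the flags shown above. It only prints the commands; the helper name build_command is my own, and every flag it uses comes from the argument list in this post.

```python
import shlex

# Choices copied from the gcn_mp_type comment in the argument list above.
GCN_MP_TYPES = [
    "weighted_sum", "bin_concate", "edge_weight_concate",
    "edge_node_concate", "node_concate",
]


def build_command(dataset_name, gcn_mp_type, pooling="concat", hidden_dim=256):
    """Return the argument list for one run of examples.example_main with a GCN."""
    return [
        "python", "-m", "examples.example_main",
        "--dataset_name", dataset_name,
        "--model_name", "gcn",
        "--gcn_mp_type", gcn_mp_type,
        "--pooling", pooling,
        "--hidden_dim", str(hidden_dim),
    ]


commands = [build_command("ABIDE", mp_type) for mp_type in GCN_MP_TYPES]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually launch the runs you could pass each list to subprocess.run(cmd, check=True) from the BrainGB-master directory; at roughly four hours per ABIDE run, expect the full sweep to take most of a day.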

2.4. Errors

(1) Running gat raises errors. Below are the errors recorded for each gat_mp_type (after the author's January 2024 code update these no longer occur).

attention_weighted	ValueError: Encountered tensor with size 3200 in dimension 0, but expected size 640000.
attention_edge_weighted	ValueError: Encountered tensor with size 3200 in dimension 0, but expected size 640000.
sum_attention_edge	ValueError: Encountered tensor with size 3200 in dimension 0, but expected size 640000.
edge_node_concate	ValueError: Encountered tensor with size 640000 in dimension 0, but expected size 3200.
node_concate	ValueError: Encountered tensor with size 640000 in dimension 0, but expected size 3200.
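The two sizes in these messages are not random. One plausible reading, which is my own guess and not confirmed by the authors, is that 3200 is the number of nodes in a batch and 640000 is the number of edges in a batch of fully connected graphs, so a node-level tensor and an edge-level tensor were being combined. The arithmetic, assuming 200 ROIs (from the cc200 file names) and the default train_batch_size of 16:

```python
n_roi = 200      # assumption: cc200 atlas, as in the file name rois_cc200
batch_size = 16  # default of --train_batch_size in the argument list

nodes_per_batch = batch_size * n_roi          # one entry per node
edges_per_batch = batch_size * n_roi * n_roi  # one entry per edge of a dense graph

print(nodes_per_batch, edges_per_batch)  # 3200 640000
```

If your own dataset has a different number of ROIs or batch size, the same two products tell you which tensor is node-sized and which is edge-sized in a similar error.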