LLaGA: Large Language and Graph Assistant (a graph foundation model for node classification, link prediction, and graph classification, one model for all tasks)


1. Paper Overview

《LLaGA: Large Language and Graph Assistant》


The LLaGA model has three key properties: versatility, generalizability, and interpretability.


The overall idea of the model:

1. Convert various graph structures into sequences;

2. How the sequence is built: take the center node as the root, expand the graph into a tree, then perform a level-order traversal; the number of neighbors sampled at each level is fixed, and the sequence is padded with [pad] tokens when there are not enough neighbors;

3. As shown in the node-sequence part of Figure 1, the two-hop neighborhood is expanded into a tree, three neighbors are sampled per hop, and [pad] fills in when fewer than three exist;
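The tree-to-sequence step above can be sketched in a few lines of Python. This is a hypothetical illustration of the template, not the repository's actual implementation; the `node_sequence` function, the adjacency dict, and the `[pad]` handling are all assumptions made for the example.

```python
import random

PAD = "[pad]"  # placeholder token for missing neighbors

def node_sequence(adj, center, n_hops=2, n_samples=3, seed=0):
    """Level-order traversal of the sampled computation tree rooted at
    `center`: each node contributes exactly `n_samples` children per hop,
    so every center node yields a fixed-length sequence."""
    rng = random.Random(seed)
    seq = [center]
    frontier = [center]
    for _ in range(n_hops):
        next_frontier = []
        for node in frontier:
            neighbors = adj.get(node, [])  # PAD nodes have no neighbors
            if len(neighbors) >= n_samples:
                sampled = rng.sample(neighbors, n_samples)
            else:
                # fewer than n_samples neighbors: pad to fixed width
                sampled = neighbors + [PAD] * (n_samples - len(neighbors))
            next_frontier.extend(sampled)
        seq.extend(next_frontier)
        frontier = next_frontier
    return seq

# Toy graph: with 2 hops and 3 samples per hop, every sequence has
# 1 + 3 + 9 = 13 slots regardless of the actual node degrees.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
seq = node_sequence(adj, 0)
print(len(seq))  # 13
```

The fixed shape is what makes the sequence consumable by the LLM: every center node maps to the same number of token slots, with `[pad]` marking absent neighbors.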

 

1. Embed the graph's textual features; any off-the-shelf language model (LM) suffices;

2. Align the two kinds of features (the node sequence and the text description);
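The alignment step can be illustrated with a pure-Python sketch: a learned linear projector maps each node embedding (e.g. a 384-d SBERT vector) into the LLM's token-embedding space, so node "tokens" can be interleaved with ordinary text tokens. Dimensions, weights, and function names below are illustrative assumptions; the real projector is trained (e.g. as an MLP in PyTorch).

```python
import random

# Tiny illustrative dimensions; in practice node_dim might be 384 (SBERT)
# and llm_dim ~4096 (a 7B LLM's hidden size).
NODE_DIM, LLM_DIM = 8, 16

def make_projector(in_dim, out_dim, rng):
    # Random weight matrix standing in for a *learned* linear projector.
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]

def project(weights, node_emb):
    # Map one node embedding (in_dim,) to a pseudo-token (out_dim,)
    # living in the LLM's token-embedding space.
    return [sum(w * x for w, x in zip(row, node_emb)) for row in weights]

rng = random.Random(0)
W = make_projector(NODE_DIM, LLM_DIM, rng)
node_emb = [rng.gauss(0.0, 1.0) for _ in range(NODE_DIM)]
token = project(W, node_emb)
print(len(token))  # 16: one pseudo-token per node in the sequence
```

During training, only this projector is optimized while the LLM stays frozen, which is what lets one small module adapt graph inputs to different tasks.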


2. Open-Source Code and Experiment Reproduction

Step 1: Environment Preparation
# create a new environment
conda create -n llaga python=3.10
conda activate llaga

# install pytorch. Modify the command to align with your own CUDA version.
pip3 install torch  --index-url https://download.pytorch.org/whl/cu118

# install related libraries
pip install -r requirements.txt

# install flash-attn
pip install flash-attn --no-build-isolation

# install pyg
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu118.html
Step 2: Data Preparation

Download the datasets from Box (link-prediction data updated 4/11/2024), then move the processed data to ./dataset

.
├── README.md
├── __init__.py
├── doc
├── dataset
│   ├── ogbn-arxiv
│   │   ├── sampled_2_10_test.jsonl
│   │   ├── sampled_2_10_train.jsonl
│   │   ├── edge_sampled_2_10_only_test.jsonl
│   │   ├── edge_sampled_2_10_only_train.jsonl
│   │   ├── processed_data.pt
│   │   ├── roberta_x.pt
│   │   ├── sbert_x.pt
│   │   ├── simteg_*_x.pt
│   ├── ogbn-products
│   │   ├── sampled_2_10_test.jsonl
│   │   ├── sampled_2_10_train.jsonl
│   │   ├── edge_sampled_2_10_only_test.jsonl
│   │   ├── edge_sampled_2_10_only_train.jsonl
│   │   ├── processed_data.pt
│   │   ├── roberta_x.pt
│   │   ├── sbert_x.pt
│   │   ├── simteg_*_x.pt
│   ├── pubmed
│   │   ├── sampled_2_10_test.jsonl
│   │   ├── sampled_2_10_train.jsonl
│   │   ├── edge_sampled_2_10_only_test.jsonl
│   │   ├── edge_sampled_2_10_only_train.jsonl
│   │   ├── processed_data.pt
│   │   ├── roberta_x.pt
│   │   ├── sbert_x.pt
│   │   ├── simteg_*_x.pt
│   ├── cora
│   │   ├── sampled_2_10_test.jsonl
│   │   ├── sampled_2_10_train.jsonl
│   │   ├── edge_sampled_2_10_only_test.jsonl
│   │   ├── edge_sampled_2_10_only_train.jsonl
│   │   ├── processed_data.pt
│   │   ├── roberta_x.pt
│   │   ├── sbert_x.pt
│   │   ├── simteg_*_x.pt
│   ├── laplacian_2_10.pt
│   ├── laplacian_2_20.pt
│   ├── laplacian_2_5.pt
├── eval
├── model
├── requirements.txt
├── scripts
├── train
├── utils
Step 3: Training

To execute the training process, you can run either ./scripts/train.sh or ./scripts/train_deepspeed.sh. The usage instructions are as follows:

# Arguments
# $1 = model type, e.g. vicuna, vicuna_4hop
# $2 = training task; use nc/lp/nd for a single task. To combine multiple tasks, join the abbreviations with '-', e.g. nc-lp
# $3 = dataset: arxiv/products/pubmed/cora. To combine multiple datasets, join them with '-', and use '.n' to repeat a dataset n times, e.g. arxiv-products-pubmed-cora.3 means arxiv+products+pubmed+cora with cora repeated 3 times
# $4 = batch size (default: 16)
# $5 = embedding, e.g. simteg, sbert, roberta


#  training on single GPU
CUDA_VISIBLE_DEVICES=0 ./scripts/train.sh vicuna nc-lp  arxiv-products-pubmed-cora.3 16 simteg

#  training on multiple GPUs
./scripts/train_deepspeed.sh vicuna nc-lp  arxiv-products-pubmed-cora.3 4 simteg
Step 4: Evaluation

You can evaluate LLaGA with the command:

 

model_path="/path/to/projector" # local path or huggingface repo
model_base="lmsys/vicuna-7b-v1.5-16k" #meta-llama/Llama-2-7b-hf
mode="v1" # use 'llaga_llama_2' for llama and "v1" for others
dataset="arxiv" #test dataset
task="nc" #test task
emb="simteg"
use_hop=2 # 2 for ND and 4 for HO
sample_size=10
template="ND"
output_path="/path/to/output"

python eval/eval_pretrain.py \
--model_path ${model_path} \
--model_base ${model_base} \
--conv_mode  ${mode} \
--dataset ${dataset} \
--pretrained_embedding_type ${emb} \
--use_hop ${use_hop} \
--sample_neighbor_size ${sample_size} \
--answers_file ${output_path} \
--task ${task} \
--cache_dir ../../checkpoint \
--template ${template}

To evaluate your predicted results, please run:

python eval/eval_res.py --dataset ${dataset} --task ${task}  --res_path ${output_path}
