xLSTM (Extended Long Short-Term Memory) Technical Documentation


xlstm: Official repository of the xLSTM. Project URL: https://gitcode.com/gh_mirrors/xl/xlstm

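Installation Guide

Minimal Installation

To get started quickly, create a Conda environment and install the required dependencies with the commands below. The conda command reads environment_pt220cu121.yaml, so run it from a directory that contains that file (for example, a checkout of the repository):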

conda env create -n xlstm -f environment_pt220cu121.yaml
conda activate xlstm
pip install xlstm

Alternatively, if you prefer to install from source, use the following steps:

git clone https://github.com/NX-AI/xlstm.git
cd xlstm
pip install -e .

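Note that the project is built on PyTorch and requires version >= 1.8. If you need CUDA support, make sure your GPU has a compute capability of at least 8.0; see NVIDIA's CUDA GPUs page for the capability of each GPU model.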
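
The two thresholds above can be checked before installing. The following is a minimal sketch, not part of xlstm: the helper names `parse_major_minor` and `find_problems` are made up for illustration, and the only PyTorch calls used are `torch.__version__`, `torch.cuda.is_available()`, and `torch.cuda.get_device_capability()`.

```python
import re

MIN_TORCH_VERSION = (1, 8)       # threshold stated in the note above
MIN_COMPUTE_CAPABILITY = (8, 0)  # threshold stated in the note above


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '2.2.0+cu121'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def find_problems(torch_version, capability=None):
    """Return a list of unmet requirements; an empty list means none were found.

    `capability` is a (major, minor) tuple, or None to skip the GPU check.
    """
    problems = []
    if parse_major_minor(torch_version) < MIN_TORCH_VERSION:
        problems.append(f"PyTorch {torch_version} is older than 1.8")
    if capability is not None and tuple(capability) < MIN_COMPUTE_CAPABILITY:
        problems.append(
            f"compute capability {capability[0]}.{capability[1]} is below 8.0"
        )
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        # Only query the GPU when CUDA is actually usable.
        capability = (
            torch.cuda.get_device_capability() if torch.cuda.is_available() else None
        )
        print(find_problems(torch.__version__, capability) or "no problems found")
```

The comparison logic is kept separate from the PyTorch queries so it can be exercised on a machine without a GPU, or without PyTorch installed at all.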

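Usage

Using the Core Modules

xLSTM supports two main usage scenarios: as a component inside a larger model (via xLSTMBlockStack) and as a dedicated language model (via xLSTMLMModel).

xLSTM Block Stack

To integrate xLSTM blocks into an existing architecture, use xLSTMBlockStack. For example: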

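import torch
from xlstm import xLSTMBlockStack, xLSTMBlockStackConfig

# Configure the xLSTMBlockStack
cfg = xLSTMBlockStackConfig(
    num_blocks=7,
    embedding_dim=128,
    context_length=256,
    # ... remaining configuration options
)

# The model and its input must live on the same device
xlstm_stack = xLSTMBlockStack(cfg).to("cuda")
x = torch.randn(4, 256, 128).to("cuda")  # (batch_size, sequence_length, embedding_dim)
output = xlstm_stack(x)  # output has the same shape as x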

xLSTM Language Model (xLSTMLMModel)

For language modeling tasks, xLSTMLMModel wraps the block stack with a token embedding layer and a language model head.

import torch
from dacite import from_dict
from omegaconf import OmegaConf
from xlstm import xLSTMLMModel, xLSTMLMModelConfig

# Configure the xLSTMLMModel
xlstm_lm_cfg = """
vocab_size: 50304
# ... other options, as for xLSTMBlockStack
"""
cfg = OmegaConf.create(xlstm_lm_cfg)
# Convert the parsed config into an xLSTMLMModelConfig instance
cfg = from_dict(data_class=xLSTMLMModelConfig, data=OmegaConf.to_container(cfg))

# The model and its input must live on the same device
xlstm_model = xLSTMLMModel(cfg).to("cuda")
input_ids = torch.randint(0, 50304, size=(4, 256)).to("cuda")
output_logits = xlstm_model(input_ids)  # shape: (batch_size, sequence_length, vocab_size)
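
To make the output shape concrete, the sketch below works out the size of the logits tensor for the example values (batch size 4, sequence length 256, vocabulary size 50304). It uses only the standard library. The helper `logits_footprint` is a made-up name, and the 4 bytes per element is an assumption (float32 logits), not something the model guarantees.

```python
from math import prod


def logits_footprint(batch_size, sequence_length, vocab_size, bytes_per_element=4):
    """Return (shape, element count, size in bytes) of a logits tensor.

    bytes_per_element=4 assumes float32; pass 2 for float16/bfloat16.
    """
    shape = (batch_size, sequence_length, vocab_size)
    n_elements = prod(shape)
    return shape, n_elements, n_elements * bytes_per_element


shape, n_elements, n_bytes = logits_footprint(4, 256, 50304)
print(shape)                          # (4, 256, 50304)
print(n_elements)                     # 51511296
print(f"{n_bytes / 2**20:.1f} MiB")   # 196.5 MiB
```

Because the vocabulary dimension dominates, the logits are often the largest single activation in a forward pass, which is worth keeping in mind when choosing batch size and sequence length.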

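API Reference

xLSTMBlockStack: takes an xLSTMBlockStackConfig object and builds a stack of xLSTM blocks.
xLSTMLMModel: its constructor takes an xLSTMLMModelConfig, which adds parameters such as the vocabulary size, and the model integrates the token embedding and the prediction head.

Configuration Files

Configurations can be written in YAML, parsed with OmegaConf, and then converted into Python dataclass instances with dacite.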
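
The last step of that pipeline, turning a nested dict into nested config objects, can be illustrated with the standard library alone. The sketch below is not dacite and not xlstm code: `DemoBlockConfig`, `DemoStackConfig`, and `build` are hypothetical names, and the `num_heads` field is invented for the example. The `parsed` dict stands in for what `OmegaConf.to_container` would return after parsing the YAML.

```python
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class DemoBlockConfig:  # hypothetical, not an xlstm class
    num_heads: int = 4


@dataclass
class DemoStackConfig:  # hypothetical, not an xlstm class
    num_blocks: int
    embedding_dim: int
    context_length: int
    block: DemoBlockConfig = field(default_factory=DemoBlockConfig)


def build(cls, data):
    """Recursively convert a plain dict into an instance of dataclass `cls`.

    Unknown keys raise KeyError, so typos in a config file fail loudly.
    """
    known = {f.name: f for f in fields(cls)}
    unknown = set(data) - set(known)
    if unknown:
        raise KeyError(f"unexpected keys for {cls.__name__}: {sorted(unknown)}")
    kwargs = {}
    for name, value in data.items():
        field_type = known[name].type
        # Nested dicts become nested dataclass instances.
        if is_dataclass(field_type) and isinstance(value, dict):
            value = build(field_type, value)
        kwargs[name] = value
    return cls(**kwargs)


# Stand-in for the plain dict obtained after parsing the YAML file.
parsed = {
    "num_blocks": 7,
    "embedding_dim": 128,
    "context_length": 256,
    "block": {"num_heads": 4},
}
cfg = build(DemoStackConfig, parsed)
print(cfg)
```

Typed config objects give attribute access and early failure on misspelled keys, which is the main reason to convert the parsed YAML rather than pass raw dicts around.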

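Experiments and Applications

The project ships an experiment script (experiments/main.py) for evaluating xLSTM under different configurations. To run a specific experiment, pass it the corresponding task configuration file.

Citation

If you use this work in your research or project, please cite the original paper: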

@article{xlstm,
  title={xLSTM: Extended Long Short-Term Memory},
  author={Beck, Maximilian and others},
  journal={arXiv preprint arXiv:2405.04517},
  year={2024}
}

That covers the basics of the xLSTM project. Adjust the configuration details to fit your own use case.
