A walkthrough of the paper "LLM4CP: Adapting Large Language Models for Channel Prediction" and its code. The paper uses an LLM to empower a physical-layer task in wireless communication.

Note: this article is based on my own understanding of the paper and code, so errors are inevitable; discussion and corrections are welcome.

Background

Recently, large language models (LLMs) such as GPT-4 and LLaMA have achieved great success in natural language processing and are gradually being applied in specialized fields such as finance, healthcare, and education. Through pre-training on large-scale datasets, LLMs acquire powerful general modeling and generalization capabilities. However, current applications of LLMs to communication tasks are limited to language-form tasks such as protocol understanding, which restricts their use at the physical layer. Can LLMs break through this language barrier and empower non-language physical-layer tasks in wireless communication? To explore this, Prof. Xiang Cheng's team took channel prediction as the entry point and attempted to use a pre-trained LLM to improve prediction accuracy and generalization. However, using a pre-trained LLM to directly process non-language CSI data faces the following challenges:

1) Unlike text data, CSI is high-dimensional structured data with complex space-time-frequency relationships;

2) There is a domain gap between the natural-language domain and the channel domain, which further increases the difficulty of knowledge transfer.

To overcome these challenges, Prof. Xiang Cheng's team proposed a MIMO-OFDM channel prediction scheme based on a pre-trained LLM that is applicable to both TDD and FDD systems. The team built a channel prediction network on top of pre-trained GPT-2, consisting of a preprocessing module, an embedding module, the pre-trained LLM module, and an output module, as shown in Fig. 1. During training, the multi-head attention and feed-forward layers of the pre-trained LLM are kept frozen to preserve its general knowledge. To handle the high dimensionality of the spatial domain, the antenna dimension is processed in parallel, which reduces network overhead while making the task more scalable. To fully capture frequency-domain features and exploit the structured nature of the channel, the delay domain is introduced to directly characterize multipath delays. To effectively extract temporal features, a patching operation is applied, which captures local temporal variations and reduces computational complexity. Finally, to bridge the domain gap, an embedding module is designed to further process the preprocessed features and align them with the feature space of the pre-trained LLM.

Channel Estimation: TDD/FDD

In TDD, the uplink and downlink are reciprocal, so the BS can estimate the downlink CSI from uplink pilots.

In TDD systems, thanks to channel reciprocity, the downlink CSI can be obtained at the BS side by channel estimation on uplink pilots.

In FDD, the uplink and downlink use different frequency bands, so the downlink CSI must be estimated at the user side and fed back to the BS.

In FDD systems where the frequency of the uplink and downlink channels differs, downlink CSI can only be estimated at the user side and then fed back to the BS.

Because the uplink and downlink use different frequency bands in FDD systems, the BS cannot obtain the downlink CSI directly. For the BS to perform downlink resource allocation, beamforming, and similar operations accurately, the user equipment (UE) must estimate the downlink CSI and feed it back to the BS through a feedback mechanism. This allows the BS to learn the downlink channel state and optimize transmission performance.

Although the feedback mechanism lets the BS obtain the downlink CSI, it also introduces problems. First, CSI estimation and feedback incur additional computation and transmission delay; when the channel varies quickly, this leads to channel aging. In addition, the extra downlink pilots occupy time-frequency resources and reduce the spectral efficiency (SE) of FDD systems. These downlink pilots are "extra" relative to TDD systems: in TDD, channel reciprocity allows the BS to estimate the downlink CSI directly from uplink pilots, so no additional downlink pilots are needed; in FDD, downlink pilots must be transmitted to the UE for CSI estimation, consuming time-frequency resources and lowering the SE.

New scheme: the BS side uses previous uplink CSI to predict future downlink CSI.

3.1 Channel Prediction-based Transmission

Traditional downlink CSI acquisition schemes for TDD and FDD systems are illustrated in Fig. 2 (a) and (b), respectively. In TDD systems, thanks to channel reciprocity, the downlink CSI can be obtained at the BS side by channel estimation on uplink pilots. In FDD systems where the frequency of the uplink and downlink channels differs, downlink CSI can only be estimated at the user side and then fed back to the BS. However, there are some shortcomings in existing downlink CSI acquisition methods. First, the CSI estimation and feedback process incur additional computational and transmission time overhead, causing channel aging[18] in high dynamic scenarios. In addition, extra downlink pilots occupy some of the time-frequency resources, reducing the SE of FDD systems. Channel prediction-based transmission scheme provides a promising solution to address the above two drawbacks, as shown in Fig. 2 (c). Specifically, it predicts future downlink CSI sequences based on historical uplink CSI sequences, avoiding the overhead of downlink pilots and feedback delay. For further clarification, the time and frequency relationship between uplink and downlink CSI of the channel prediction-based scheme can be illustrated in Fig. 3 (b). Region A represents the uplink CSI, while regions B and D correspond to the predicted downlink CSI under TDD and FDD modes, respectively. Each time-frequency region consists of multiple time-frequency resource blocks (RBs), and each RB contains a pilot, as shown in Fig. 3 (a). In the following channel prediction process, we only consider the CSI associated with the pilots’ positions, while CSI between pilots can be obtained through interpolation methods. We assume that the uplink and downlink links have the same bandwidth and each covers K resource blocks in the frequency domain. In the time domain, future L RBs are predicted based on historical P RBs. For simplicity, we denote the uplink and downlink CSI of each RB as hu(k,s) and hd(k,s) , where k and s represent the indices of RBs in the frequency domain and time domain, respectively.

Formulating the prediction problem

accurately predict future downlink CSI of K × L RBs based on historical CSI of K × P RBs
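In symbols (using the paper's notation for h_u, h_d, K, P, and L; the mapping Ψ_θ is just my shorthand for the prediction network, not notation from the paper):

\[
\left\{\hat{\mathbf{h}}_d(k, s)\,:\, 1 \le k \le K,\; P+1 \le s \le P+L\right\}
= \Psi_{\theta}\!\left(\left\{\mathbf{h}_u(k, s)\,:\, 1 \le k \le K,\; 1 \le s \le P\right\}\right)
\]

That is, the network Ψ_θ maps the historical uplink CSI of K × P RBs to the predicted downlink CSI of K × L RBs, and training minimizes the error between the prediction and the true downlink CSI.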

 

Overall Network Architecture

CSI prediction with the aid of an LLM

 In order to adapt text-based pre-trained LLM to the complex matrix format of CSI data, specific modules are designed for format conversion and feature extraction, including preprocessor, embedding, backbone, and output.
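As a rough sketch of how these four stages compose in a forward pass (all class and attribute names below are my own placeholders for illustration, not the released code):

import torch.nn as nn

class LLM4CPSketch(nn.Module):
    """Illustrative pipeline only: preprocessor -> embedding -> frozen LLM backbone -> output head."""
    def __init__(self, preprocessor, embedding, backbone, output_head):
        super().__init__()
        self.preprocessor = preprocessor  # delay/frequency-domain branches + patching
        self.embedding = embedding        # aligns CSI features with the LLM feature space (e.g. 768 for GPT-2)
        self.backbone = backbone          # pre-trained GPT-2; attention and FFN weights kept frozen
        self.output_head = output_head    # maps hidden states back to CSI and to the prediction horizon

    def forward(self, csi_history):                          # [B, P, D] historical uplink CSI
        x = self.preprocessor(csi_history)                   # [B, P, D] preprocessed features
        x = self.embedding(x)                                # [B, P, d_model]
        x = self.backbone(inputs_embeds=x).last_hidden_state # assumes a HuggingFace-style GPT-2 interface
        return self.output_head(x)                           # [B, L, D] predicted downlink CSI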

Parameter description

        self.K = K     # number of subcarriers
        self.UQh = UQh # number of antenna elements along the horizontal direction of the BS array
        self.UQv = UQv # number of antenna elements along the vertical direction of the BS array
        self.BQh = BQh # number of antenna elements along the horizontal direction of the user array
        self.BQv = BQv # number of antenna elements along the vertical direction of the user array
        self.Nt = UQh * UQv # corresponds to the uplink CSI
        self.Nr = BQh * BQv

        self.mul = prev_len * K * UQh * UQv * BQh * BQv
        self.enc_in = K * UQh * UQv * BQh * BQv # subcarrier input for each antenna pair
        self.c_out = K * UQh * UQv * BQh * BQv

Preprocessing Module

Antenna-parallel processing (parallelize the processing of antennas): the time-frequency data of the subcarriers is obtained per antenna pair, converted to tensors and normalized, and then split along the time dimension by the patching operation.

Module description

Code walkthrough
Block diagram

Code

B is the batch size

L is the length of the time dimension

D is the feature dimension of each time step; here it contains the unfolded real and imaginary parts of the complex signal (2*K)

  1. Rearrange the tensor dimensions and build a complex-valued tensor
  2. Apply the inverse Fourier transform (moving to the delay domain)
  3. Separate the real and imaginary parts and concatenate them along dim=2 (the k dimension); the resulting tensor x_enc_delay has shape [B, L, 2*k], where 2*k comes from concatenating real and imaginary parts
  4. Reshape the tensor to fit the patch processing
        mean = torch.mean(x_enc)
        std = torch.std(x_enc)
        x_enc = (x_enc - mean) / std
        B, L, enc_in = x_enc.shape  # [B, L, D]

        # process in delay domain
        x_enc_r = rearrange(x_enc, 'b l (k o) -> b l k o', o=2)
        # split the feature dimension D into two separate output dimensions, k and o
        x_enc_complex = torch.complex(x_enc_r[:, :, :, 0], x_enc_r[:, :, :, 1])
        # complex-valued form
        x_enc_delay = torch.fft.ifft(x_enc_complex, dim=2)
        x_enc_delay = torch.cat([torch.real(x_enc_delay), torch.imag(x_enc_delay)], dim=2)
        x_enc_delay = x_enc_delay.reshape(B, L // self.patch_size, self.patch_size, enc_in)
        x_enc_delay = self.patch_layer(x_enc_delay.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)
        # original order: [B, L // self.patch_size, self.patch_size, enc_in] ->
        # permuted to [B, L // self.patch_size, enc_in, self.patch_size] ->
        # passed through the linear layer self.patch_layer = nn.Linear(self.patch_size, self.patch_size) ->
        # then permuted back to the original order
        x_enc_delay = x_enc_delay.reshape(B, L, enc_in)
        x_enc_delay = rearrange(x_enc_delay, 'b l (k o) -> b o l k', o=2)
        x_enc_delay = self.RB_f(x_enc_delay)

        # process in frequency domain
        x_enc_fre = x_enc.reshape(B, L // self.patch_size, self.patch_size, enc_in)
        x_enc_fre = self.patch_layer(x_enc_fre.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)
        x_enc_fre = x_enc_fre.reshape(B, L, enc_in)
        x_enc_fre = rearrange(x_enc_fre, 'b l (k o) -> b o l k', o=2)
        x_enc_fre = self.RB_e(x_enc_fre)

        x_enc = x_enc_fre + x_enc_delay
        x_enc = rearrange(x_enc, 'b o l k -> b l (k o)', o=2)  # [B, L, D]
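A quick shape trace of the delay-domain branch on dummy data (the sizes below are illustrative assumptions; patch_size must divide L):

import torch
from einops import rearrange

B, L, K, patch_size = 8, 16, 48, 4          # illustrative values
x_enc = torch.randn(B, L, 2 * K)            # [B, L, D], real/imag pairs per subcarrier

x_enc_r = rearrange(x_enc, 'b l (k o) -> b l k o', o=2)           # [B, L, K, 2]
x_cplx  = torch.complex(x_enc_r[..., 0], x_enc_r[..., 1])         # [B, L, K] complex
x_delay = torch.fft.ifft(x_cplx, dim=2)                           # IFFT over subcarriers -> delay domain
x_delay = torch.cat([x_delay.real, x_delay.imag], dim=2)          # [B, L, 2K], now in the delay domain
x_delay = x_delay.reshape(B, L // patch_size, patch_size, 2 * K)  # patches along the time axis
print(x_delay.shape)                                              # torch.Size([8, 4, 4, 96])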

Embedding Module

CSI attention modules

CSI attention module is designed for feature analysis

The convolution layers extract temporal and frequency features within each patch and integrate features across different patches.

Code walkthrough
Block diagram

Code
CSI attention module

where Conv(·) represents the 2D convolution operator and ReLU(·) represents the ReLU[37] activation function. The convolution layers extract temporal and frequency features within each patch and integrate features across different patches. 

Network structure definition: RB_e and RB_f are two modules built from several convolution layers and residual blocks; the number of output channels equals the number of input channels (both are 2).

Before the input reaches them, the dimensions have already been rearranged with 'b l (k o) -> b o l k', o=2.

        self.RB_e = nn.Sequential(nn.Conv2d(2, res_dim, 3, 1, 1))
        self.RB_f = nn.Sequential(nn.Conv2d(2, res_dim, 3, 1, 1))
        for i in range(self.res_layers):
            self.RB_e.append(Res_block(res_dim))
            self.RB_f.append(Res_block(res_dim))
        self.RB_e.append(nn.Conv2d(res_dim, 2, 3, 1, 1))
        self.RB_f.append(nn.Conv2d(res_dim, 2, 3, 1, 1))
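A quick shape check of one of these branches (res_dim, res_layers, and the tensor sizes below are illustrative assumptions; Res_block is the residual block shown next):

import torch
import torch.nn as nn

res_dim, res_layers = 64, 4                       # illustrative values
RB_e = nn.Sequential(nn.Conv2d(2, res_dim, 3, 1, 1))
for _ in range(res_layers):
    RB_e.append(Res_block(res_dim))
RB_e.append(nn.Conv2d(res_dim, 2, 3, 1, 1))

x = torch.randn(8, 2, 16, 48)                     # [B, 2, L, K] after the rearrange described above
print(RB_e(x).shape)                              # torch.Size([8, 2, 16, 48]): same shape, 2 channels in and out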

Residual block description

Questions:
  1. In the parameter flow, before entering RB_e the code executes x_enc_fre = rearrange(x_enc_fre, 'b l (k o) -> b o l k', o=2), i.e. the shape becomes (B, 2, L, K), which does not seem to match the patching description (in theory N * P' = L);
  2. moreover, in the sequential block above, the tensor entering Res_block has shape (B, res_dim, L, K);
  3. are N1 and N2 in the block diagram simply the value of self.res_layers?

 

class Res_block(nn.Module):
    def __init__(self, in_planes):
        super(Res_block, self).__init__()

        self.conv1 = nn.Conv2d(in_planes, in_planes, 3, 1, 1)
        self.conv2 = nn.Conv2d(in_planes, in_planes, 3, 1, 1)
        self.ca = ChannelAttention(in_planes=in_planes, ratio=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        rs1 = self.relu(self.conv1(x))   # conv + ReLU
        rs1 = self.conv2(rs1)            # second conv, no activation
        channel_attn = self.ca(rs1)      # per-channel attention weights, shape [B, C, 1, 1]
        output = channel_attn * rs1      # re-weight the channels
        rs = torch.add(x, output)        # residual connection
        return rs
Channel attention description
class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=4):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        self.fc1 = nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False)

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc2(self.relu1(self.fc1(self.avg_pool(x))))  # squeeze (global average pooling) + excitation bottleneck
        max_out = self.fc2(self.relu1(self.fc1(self.max_pool(x))))  # the same bottleneck applied to global max pooling
        out = avg_out + max_out
        return self.sigmoid(out)  # per-channel weights in (0, 1), shape [B, C, 1, 1]
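What the module returns, on dummy data (shapes and channel count are illustrative):

import torch

ca = ChannelAttention(in_planes=64, ratio=4)      # illustrative channel count
x = torch.randn(8, 64, 16, 48)                    # [B, C, H, W]
w = ca(x)
print(w.shape)                                    # torch.Size([8, 64, 1, 1]); per-channel weights in (0, 1),
                                                  # broadcast-multiplied with the feature map inside Res_block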

This corresponds to the SE block in the paper.

 

 x_enc = rearrange(x_enc, 'b o l k -> b l (k o)', o=2)  # [B, L, D]

This line completes the rearrangement. In the paper, the corresponding reshape is from 2K × N × P′ to 2KN × P′; in the code it is

from [B, 2, L, K] to [B, L, D] (D = 2*K).

The patching operation itself was performed earlier, in this line of code:

x_enc_delay = self.patch_layer(x_enc_delay.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)

 enc_out = self.enc_embedding1(x_enc, x_mark_enc)  # [B, L, 768]

This line performs the three embedding operations defined below. TokenEmbedding plays the role of the FC (fully-connected) layer in the block diagram, mapping the input dimension to the feature dimension of the pre-trained network; PositionalEmbedding then adds positional information, and the results are summed.

class DataEmbedding(nn.Module):
    def __init__(self, c_in, d_model, embed_type='fixed', freq='h', dropout=0.1):
        super(DataEmbedding, self).__init__()

        self.value_embedding = TokenEmbedding(c_in=c_in, d_model=d_model)
        self.position_embedding = PositionalEmbedding(d_model=d_model)
        self.temporal_embedding = TemporalEmbedding(d_model=d_model, embed_type=embed_type,
                                                    freq=freq) if embed_type != 'timeF' else TimeFeatureEmbedding(
            d_model=d_model, embed_type=embed_type, freq=freq)
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, x, x_mark):
        if x_mark is None:
            # e.g. value embedding: [2, 25, 512]; positional embedding: [1, 25, 512]
            x = self.value_embedding(x) + self.position_embedding(x)
        else:
            x = self.value_embedding(
                x) + self.temporal_embedding(x_mark) + self.position_embedding(x)
        return self.dropout(x)
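A minimal usage sketch, assuming (as in this channel-prediction pipeline) that no calendar marks are passed, so x_mark is None and only the token and positional embeddings are applied (c_in and the tensor sizes are illustrative):

import torch

emb = DataEmbedding(c_in=96, d_model=768)         # e.g. c_in = 2*K for K = 48 subcarriers (illustrative)
x = torch.randn(8, 16, 96)                        # [B, prev_len, 2*K]
out = emb(x, x_mark=None)                         # value embedding + positional embedding, then dropout
print(out.shape)                                  # torch.Size([8, 16, 768])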
TokenEmbedding

The TokenEmbedding class is used inside DataEmbedding to extract features from and encode the input data.

  1. Initialization

    • The TokenEmbedding class inherits from nn.Module.
    • The constructor defines a 1-D convolution layer self.tokenConv with c_in input channels, d_model output channels, kernel size 3, circular padding, and no bias.
    • The convolution weights are initialized with nn.init.kaiming_normal_.
  2. Forward pass

    • The input x has shape (batch_size, seq_len, c_in).
    • x is permuted to (batch_size, c_in, seq_len) to match the input layout expected by the 1-D convolution (in PyTorch, nn.Conv1d expects inputs of shape (batch_size, channels, sequence_length)).
    • After the convolution self.tokenConv, the output has shape (batch_size, d_model, seq_len).
    • transpose is then used to restore the order to (batch_size, seq_len, d_model) for subsequent processing.
class TokenEmbedding(nn.Module):
    def __init__(self, c_in, d_model):
        super(TokenEmbedding, self).__init__()
        padding = 1 if torch.__version__ >= '1.5.0' else 2
        self.tokenConv = nn.Conv1d(in_channels=c_in, out_channels=d_model,
                                   kernel_size=3, padding=padding, padding_mode='circular', bias=False)
        for m in self.modules():
            if isinstance(m, nn.Conv1d):
                nn.init.kaiming_normal_(
                    m.weight, mode='fan_in', nonlinearity='leaky_relu')

    def forward(self, x):
        x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2)
        return x
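Note that, strictly speaking, this is a 1-D convolution along the time axis with kernel size 3 (circular padding), so each output token also mixes information from neighbouring time steps rather than being a purely per-step FC mapping. A quick shape check with illustrative sizes:

import torch

tok = TokenEmbedding(c_in=96, d_model=768)        # illustrative dimensions
x = torch.randn(8, 16, 96)                        # (batch_size, seq_len, c_in)
print(tok(x).shape)                               # torch.Size([8, 16, 768])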
PositionalEmbedding

self.pe[:, :x.size(1)]: according to the input sequence length, take the first x.size(1) positions of the pre-computed positional encoding.

class PositionalEmbedding(nn.Module):
    def __init__(self, d_model, max_len=5000):
        super(PositionalEmbedding, self).__init__()
        # Compute the positional encodings once in log space.
        pe = torch.zeros(max_len, d_model).float()
        pe.require_grad = False  # note: the original code misspells requires_grad; harmless, since pe is registered as a buffer below

        position = torch.arange(0, max_len).float().unsqueeze(1)  # 5000,1

        div_term = (torch.arange(0, d_model, 2).float()  # 256
                    * -(math.log(10000.0) / d_model)).exp()

        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)

        pe = pe.unsqueeze(0)  # 1,5000,512
        self.register_buffer('pe', pe)

    def forward(self, x):
        return self.pe[:, :x.size(1)]
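The code above implements the standard sinusoidal positional encoding from the Transformer paper:

\[
PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)
\]

where pos is the position index and i indexes the feature pairs; div_term in the code is exactly 10000^(-2i / d_model).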
TemporalEmbedding

TemporalEmbedding embeds time (calendar) information into the input data to help the model understand the temporal characteristics of the sequence.

  • Initialization: depending on the time frequency (freq), embedding layers are built for minute, hour, weekday, day, and month. The time information is provided through x_mark and passed through these embedding layers to obtain the different time features.

  • Forward pass: the time information in x_mark is mapped by the corresponding embedding layers, and the resulting features are summed to form the temporal embedding.

The FixedEmbedding class implements a fixed (non-trainable) embedding layer:

  • c_in: the number of input categories (size of the time-feature vocabulary).
  • d_model: the target embedding dimension.
  • its weight has shape (c_in, d_model)
class TemporalEmbedding(nn.Module):
    def __init__(self, d_model, embed_type='fixed', freq='h'):
        super(TemporalEmbedding, self).__init__()

        minute_size = 4
        hour_size = 24
        weekday_size = 7
        day_size = 32
        month_size = 13

        Embed = FixedEmbedding if embed_type == 'fixed' else nn.Embedding
        if freq == 't':
            self.minute_embed = Embed(minute_size, d_model)
        self.hour_embed = Embed(hour_size, d_model)
        self.weekday_embed = Embed(weekday_size, d_model)
        self.day_embed = Embed(day_size, d_model)
        self.month_embed = Embed(month_size, d_model)

    def forward(self, x):
        x = x.long()
        minute_x = self.minute_embed(x[:, :, 4]) if hasattr(
            self, 'minute_embed') else 0.
        hour_x = self.hour_embed(x[:, :, 3])
        weekday_x = self.weekday_embed(x[:, :, 2])
        day_x = self.day_embed(x[:, :, 1])
        month_x = self.month_embed(x[:, :, 0])

        return hour_x + weekday_x + day_x + month_x + minute_x
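As DataEmbedding.forward above shows, the temporal embedding is only used when x_mark is not None. If calendar marks were supplied, the indexing in the forward pass implies a column layout of [month, day, weekday, hour, minute]. A hypothetical example (values and shapes are illustrative):

import torch

# columns: 0 = month, 1 = day, 2 = weekday, 3 = hour, 4 = minute (inferred from the indexing above)
x_mark = torch.zeros(8, 16, 5, dtype=torch.long)  # [B, seq_len, 5]
temb = TemporalEmbedding(d_model=768, embed_type='fixed', freq='t')
print(temb(x_mark).shape)                         # torch.Size([8, 16, 768])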

 

Questions:

The paper motivates patching as a way to effectively extract temporal features, capture local temporal variations, and reduce computational complexity.

But what is the physical meaning of TemporalEmbedding here? I did not find a corresponding description in the paper.

Backbone Network

Analysis

Without loss of generality, GPT-2[39] is chosen as the LLM backbone in this work. The backbone of GPT-2 is composed of a learnable positional embedding layer and stacked transformer decoders[38], where the number of stacks and feature dimensions can be flexibly adjusted according to the requirements. Each layer consists of self-attention layers, feedforward layers, addition, and layer normalization, as shown in Fig. 4. During the training process, self-attention and feedforward layers are frozen to retain universal knowledge, while addition, layer normalization, and positional embedding are fine-tuned for adapting the LLM to the channel prediction task. It is worth noting that in the proposed method, the GPT2 backbone can be flexibly replaced with other LLM, such as Llama[40]. The selection of the type and size of the LLM needs to consider the trade-off between training costs and performance.
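A minimal sketch of the freezing strategy described above, assuming the HuggingFace GPT2Model is used as the backbone (the parameter names follow the transformers library; whether the released code filters exactly these names is my inference, not a confirmed detail):

import torch
from transformers import GPT2Model

gpt2 = GPT2Model.from_pretrained('gpt2')        # 12 layers, hidden size 768

for name, param in gpt2.named_parameters():
    # 'ln' matches the layer norms (ln_1, ln_2, ln_f); 'wpe' is the learnable positional embedding
    param.requires_grad = ('ln' in name) or ('wpe' in name)

trainable = sum(p.numel() for p in gpt2.parameters() if p.requires_grad)
print(f'trainable params in backbone: {trainable}')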

Code
        dec_out = self.gpt2(inputs_embeds=enc_out).last_hidden_state  # [B, L, 768]
        dec_out = dec_out[:, :, :self.d_ff]

        dec_out = self.out_layer_dim(dec_out)  # linear layer over the feature dimension
        dec_out = self.output_layer_time(dec_out.permute(0, 2, 1)).permute(0, 2, 1)

        dec_out = dec_out * std + mean

        return dec_out[:, -self.pred_len:, :]  # [B, L, D]

where

        self.out_layer_dim = nn.Linear(d_ff, self.c_out * 2)
        self.output_layer_time = nn.Sequential(
            nn.Linear(self.prev_len, self.pred_len)
        )

This maps prev_len to pred_len, i.e., the past prev_len time slots are used to predict the future pred_len time slots.
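A toy illustration of applying a Linear layer along the time axis via the permute trick (prev_len = 16 and pred_len = 4 match the values in the question below; the feature size is arbitrary):

import torch
import torch.nn as nn

prev_len, pred_len, feat = 16, 4, 96
layer = nn.Linear(prev_len, pred_len)
x = torch.randn(8, prev_len, feat)                # [B, prev_len, feat]
y = layer(x.permute(0, 2, 1)).permute(0, 2, 1)    # Linear acts on the last dim, here the time axis
print(y.shape)                                    # torch.Size([8, 4, 96])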

Question:

output_layer_time = nn.Sequential(nn.Linear(self.prev_len, self.pred_len)) with prev_len = 16 and pred_len = 4. The code then executes dec_out = self.output_layer_time(dec_out.permute(0, 2, 1)).permute(0, 2, 1), but dec_out has shape [B, L, 768], and its second dimension is not necessarily 16; can this still run?
