GPIoT Reading Notes

GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development

Abstract

Summary:
Existing code LLMs (e.g., WizardCoder and CodeLlama) cannot handle IoT code-generation tasks well, because they are trained mainly on general programming tasks and IoT-related knowledge makes up only a tiny fraction of their training data. In addition, using cloud LLMs (e.g., GPT-4) for IoT application development raises problems of privacy leakage, unstable networks, and high query costs.
GPIoT, proposed in this paper, consists of three locally deployed small language models: a Task Decomposition SLM (TDSLM), a Requirement Transformation SLM (RTSLM), and a Code Generation SLM (CGSLM). TDSLM decomposes an IoT application into multiple sub-tasks, RTSLM converts the sub-task descriptions into structured specifications, and CGSLM generates code and documentation from the specifications. (So the proposed method is composed of three SLMs.)

1 INTRODUCTION

Many tool-augmented LLMs have appeared recently, as shown in Fig. 1.

Because IoT data makes up so little of their training corpora, general-purpose LLMs assign lower weight to IoT terminologies, so their outputs are ill-suited to IoT. Many LLM+RAG approaches exist, but challenges remain:
1. the model's own capability is limited;
2. the quality of the RAG retrieval must be high;
3. hallucinations and unreality of LLMs — the prompt must enforce the output format.

Advantages of GPIoT:
1) local deployment;
2) the IoT datasets are highly relevant to IoT;
3) the IoT datasets carry their own format, which avoids problem 3.

GPIoT uses three SLMs (small language models) to address these problems: TDSLM for Task Decomposition, RTSLM for Requirement Transformation, and CGSLM for Code Generation.
Of the three, only TDSLM and CGSLM are fine-tuned; RTSLM only performs prompt-based requirement transformation without tuning.

Challenges for GPIoT:
1) Lack of datasets. The authors build two datasets themselves.
2) Domain misalignment between the SLMs. Tasks decomposed by TDSLM may exceed what CGSLM can handle. To address this, "we propose a parameter-efficient co-tuning (PECT) paradigm featuring a multi-path Low-Rank Adaptation (LoRA) pipeline."

PECT in this paper uses a common base model with different adapters, which reduces the misalignment while exploiting shared parameters.

3) Format problems. CoT prompting is used to enforce the output format.

Evaluation:
Conducted on IoTBench.

GPIoT can generate code adopting more IoT-specialized algorithms and outperform SOTA code LLMs in terms of task accuracy (more than 64.7% on average), memory usage (less than 310 MB on average), and user satisfaction.

2 BACKGROUND & MOTIVATION

To demonstrate the importance of tailored IoT-related LLMs, the authors run several preliminary experiments.

2.1 Limitations of Existing Models

Existing LLMs perform well on general, simple programming tasks but fall short in complex IoT application development, because they are trained mainly on general programming tasks and IoT-related knowledge makes up only a small part of their training data. As a result, generic terms end up with higher similarity in the embedding space than domain-specific terms.

As shown in Fig. 2(a), when prompted to design an R-peak detection method for ECG signals, the LLM still prefers a generic peak-detection approach over an ECG-specific method such as Pan-Tompkins.
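As a concrete point of reference for what an "ECG-specialized" method looks like here, below is a minimal Pan-Tompkins-style R-peak detector sketched with numpy/scipy. The sampling rate, filter band, window length, and threshold are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def pan_tompkins_rpeaks(ecg, fs=360):
    """Minimal Pan-Tompkins-style R-peak detector (illustrative sketch only).

    ecg: 1-D raw ECG signal; fs: sampling rate in Hz (360 Hz assumed here).
    """
    # 1) Band-pass filter (5-15 Hz) to suppress baseline wander and high-frequency noise.
    b, a = butter(2, [5 / (fs / 2), 15 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecg)
    # 2) Differentiate to emphasize the steep QRS slopes.
    diff = np.diff(filtered)
    # 3) Square to make all values positive and amplify large slopes.
    squared = diff ** 2
    # 4) Moving-window integration over roughly 150 ms.
    win = int(0.15 * fs)
    integrated = np.convolve(squared, np.ones(win) / win, mode="same")
    # 5) Peak picking with a simple fixed threshold and a 200 ms refractory period
    #    (the real Pan-Tompkins algorithm uses adaptive thresholds).
    peaks, _ = find_peaks(integrated, height=np.mean(integrated), distance=int(0.2 * fs))
    return peaks
```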

LLMs+RAG

LLM+RAG approaches improve correctness by supplying references and by cascading agents, as shown in Fig. 2(b). The paper tests MapCoder [30] and finds that only 28% of the generated code is correct. Reasons: 1. high-precision RAG retrieval is required; 2. cascading multiple agents introduces noise and propagates errors; 3. during generation the LLM still inevitably drifts toward generic methods.

Note that we use Llama2-13b as the default SLM for demonstration purpose.

2.2 Preliminary Experiments

The preliminary experiments demonstrate the following points.
Dataset quality. Four models are compared: GPT-4o, the original SLM, a simply tuned SLM, and an SLM tuned on augmented data. Common data-augmentation methods (e.g., Evol-Instruct) bring only a weak improvement, because they focus on enhancing the linguistic features of the original text and may fail to effectively capture the intricate relationships among IoT terminologies.

GPT-4o scores highest, yet still below 50%.
This motivates us to design an IoT-tailored text augmentation method to enhance the quantity, quality, and diversity of the original dataset.
Domain misalignment. TDSLM and CGSLM are trained on different datasets, so cascading them causes domain misalignment: since the two models are not trained in the same domain, TDSLM's outputs can exceed the range that CGSLM can handle.
We then use CGSLM to synthesize corresponding programs for each sub-task. Surprisingly, we find that only 53.4% of the programs can be successfully executed without bugs and only 10.6% of the programs adopt IoT-specialized algorithms for the IoT tasks.
This motivates us to develop a knowledge-sharing strategy between the two SLMs during tuning so that they can reach a consensus when handling IoT tasks.
Format incompatibility. CGSLM is fine-tuned to accept structured input, whereas TDSLM outputs natural language (Fig. 3(b)); feeding TDSLM's output directly into CGSLM therefore yields an accuracy of only 23.6%. Since fine-tuning TDSLM to directly output structured language would exceed what an SLM can bear, an intermediate model is added.
This motivates us to develop a method to convert the task descriptions in natural language into well-organized specifications.

3 SYSTEM OVERVIEW

GPIoT consists of two stages, an offline tuning stage (Offline Stage) and an online processing stage (Online Stage), as shown in Fig. 4.

Offline Stage.

Two IoT-specific datasets are constructed, and TDSLM and CGSLM are fine-tuned on them.
For the fine-tuning, a RAG agent is first built; IoT data is collected from websites and papers to construct the datasets, which are then expanded with the IoT-oriented augmentation method (§4.1).

RAG is used in the offline stage when building the datasets. As for fine-tuning the two models, the paper states:

we fine-tune two SLMs via our PECT paradigm, where certain model parameters are collaboratively tuned through a multiple-path LoRA pipeline with two projection layers for task decomposition and code generation, respectively

That is, some model parameters are shared and coordinated between the two models via LoRA adapters plus projection layers.

Online Stage.
The process works as follows:
GPIoT first leverages Task Decomposition SLM (TDSLM) to decompose the IoT application into multiple manageable sub-tasks with detailed descriptions (①~②). Next, through CoT-based prompting techniques, the sub-task descriptions will be gradually transformed into well-structured specifications by Requirement Transformation SLM (RTSLM) (③). Next, for each sub-task, Code Generation SLM (CGSLM) accordingly generates a code snippet with documentation (⑤). Users can execute the code sequentially to realize the IoT application based on the instructions from the documentation (⑦).
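A minimal sketch of how this three-stage online pipeline could be chained. The `generate` callable stands in for the locally deployed SLM (in GPIoT the three roles share one foundation model and differ only in adapters); the prompt wording and the blank-line-separated sub-task format are assumptions for illustration, not the authors' actual prompts.

```python
from typing import Callable

def run_gpiot_pipeline(problem_statement: str,
                       generate: Callable[[str], str]) -> list[dict]:
    """Chain TDSLM -> RTSLM -> CGSLM style prompting (illustrative sketch only)."""
    # (1) Task decomposition: sub-tasks are assumed to be blank-line separated.
    decomposition = generate(
        "Decompose this IoT application into manageable sub-tasks, one per "
        f"paragraph:\n{problem_statement}")
    sub_tasks = [t.strip() for t in decomposition.split("\n\n") if t.strip()]

    results = []
    for task in sub_tasks:
        # (2) Requirement transformation: structured spec (target / inputs / outputs).
        spec = generate(
            "Rewrite as a structured specification with target, input and "
            f"output parameters:\n{task}")
        # (3) Code generation: a code snippet plus documentation for each spec.
        code_and_doc = generate(f"Write Python code with documentation for:\n{spec}")
        results.append({"task": task, "spec": spec, "code_and_doc": code_and_doc})
    # The user then executes the snippets sequentially, following the documentation.
    return results
```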

SLM Considerations.

The SLMs are open-source models that can be deployed locally, offering advantages over cloud models in cost, privacy, and independence.

This aligns with the practical constraints of normal users, where local models offer advantages in terms of cost, privacy, and independence from the cloud.

Moreover, the tuning overhead of these models is very small:

Although there are three SLMs working simultaneously, they share the same foundation model and differ only in some additional tunable parameters, which only occupy 1% of all the parameters.

4 SYSTEM DESIGN

4.1 Data Collection & Augmentation

The paper first constructs a Task Decomposition Dataset (TDD) and a Code Generation Dataset (CGD). Both contain textual Q&A pairs, which is fundamentally different from conventional IoT datasets that typically consist of sensor-data/label pairs.

4.1.1 Task Decomposition Dataset.

TDD contains pairs of "problem statement → decomposed tasks" and is used to fine-tune TDSLM.
The dataset is built in three steps: raw IoT-related text data collection, data formatting, and IoT-oriented text data augmentation.
Raw Data Collection. Data is collected from public literature databases; state-of-the-art IoT papers contain many IoT-specific terms.
Data Formatting. The initial idea was to decompose directly in a "system → technical modules" format, but 1) the content is too long and exceeds the SLM's context limit, and 2) the sub-modules are too sophisticated for the model to handle. The paper therefore treats each module as independent, which the SLM can process more easily, producing the "problem statement → decomposed tasks" format. Fig. 5 shows the complete dataset-construction process.

The concrete construction process, as described in the paper:

we first build a RAG agent by combining the downloaded papers with an LLM (GPT-4o). Based on the provided context documents, we then prompt (Fig. 6(a)) the agent to split the proposed system in the paper into multiple technical modules with detailed descriptions. Next, for each technical module, we prompt (Fig. 6(b)) the agent to further decompose it into several sub-tasks with detailed implementation steps.

In this way, for each technical module we obtain a problem statement p_i and its corresponding sub-tasks t_i, forming a Q&A pair Q_i. The raw dataset is

D_t = { Q_i = (p_i, t_i) | i = 1, …, n_t },

where n_t is the total number of technical modules from all the papers.

Fig. 8(a) shows a sample from the dataset, where the sub-tasks t_i are separated by blank lines for easy parsing and further processing.

IoT-Oriented Data Augmentation. As mentioned earlier, LLMs tend to fall back on generic terminology, so the paper proposes an IoT-oriented data augmentation method (Fig. 7(b)) that takes IoT characteristics into account: sensor modality, data representation, and system resource heterogeneity.
To make the decomposition dataset cover these three characteristics, GPT-4o is prompted (Fig. 8(b)) to rewrite and augment each problem statement from D_t.
To this end, a search agent prompts GPT-4o to produce the corresponding decomposed tasks; incorrect results are filtered out manually and the format is checked.

The augmented dataset D̂_t is

D̂_t = { G(A_j(Q_i)) | Q_i ∈ D_t, j = 1, 2, … },

where A_j(·) is the j-th type of augmentation operation and G(·) is the black-box function of GPT-4o.
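A minimal sketch of what such an augmentation loop could look like. The three perturbation axes follow the characteristics named above, but `call_gpt4o`, the prompt wording, and the example phrasings are hypothetical placeholders, not the paper's actual agent.

```python
from typing import Callable

# The three IoT characteristics named by the augmentation method.
AUGMENTATION_AXES = [
    "change the sensor modality (e.g., IMU instead of ECG)",
    "change the data representation (e.g., spectrogram instead of raw waveform)",
    "change the system resource constraints (e.g., target a low-memory MCU)",
]

def augment_tdd(raw_pairs: list[tuple[str, str]],
                call_gpt4o: Callable[[str], str]) -> list[tuple[str, str]]:
    """Rewrite each problem statement along each axis and re-decompose it.

    `call_gpt4o` is a hypothetical wrapper around the GPT-4o API; the manual
    filtering of malformed outputs described in the paper is omitted here.
    """
    augmented = list(raw_pairs)
    for problem, _tasks in raw_pairs:
        for axis in AUGMENTATION_AXES:
            new_problem = call_gpt4o(
                f"Rewrite this IoT problem statement, but {axis}:\n{problem}")
            new_tasks = call_gpt4o(
                "Decompose the following IoT problem into independent sub-tasks, "
                f"separated by blank lines:\n{new_problem}")
            augmented.append((new_problem, new_tasks))
    return augmented
```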

4.1.2 Code Generation Dataset.

CGD contains pairs of "task specification → code & documentation" and is used to fine-tune CGSLM.
CGD is built in two steps: 1) raw data collection and 2) target diversity-aware augmentation.
Raw Data Collection. IoT-related packages and applications are crawled from public websites.

A typical package website has two parts: 1) an API reference (guide information for a series of modules; each module's information is denoted m_i) and 2) an example gallery (usage samples; each sample is denoted u_j).

Combining the m_i and u_j gives the raw dataset

D_c = { m_i | i = 1, …, n_m } ∪ { u_j | j = 1, …, n_u },

where n_m and n_u are the total numbers of modules and usage samples, respectively.
Target Diversity-Aware Augmentation. This step enhances the diversity of the metadata in D_c.
For each module of a package, two tasks are performed: 1) Module Description: providing detailed descriptions of a module, and 2) Module Implementation: writing code & documentation to demonstrate usage samples of the module.
1) Module Description. The prompt is:
"Provide detailed descriptions of <module> from <package>."
This yields a Q&A mapping from a task specification t_i to a module description m_i, i.e., a "task specification → module description" pair, which teaches the model knowledge about the module.

Fig. 10(a) shows an example of a module description, Fig. 10(b) an example of a module implementation, and Fig. 10(c) an example of a task specification.

2) Module Implementation. The prompt is:
"Write some Python code with comments and documentation to perform <target> by using <module> from <package>."
This yields a Q&A mapping from a task specification t_i to the corresponding module implementation (i.e., code c_i and documentation d_i): a "task specification → module implementation" pair.
3) Example Implementation. Analogously, each usage example yields a Q&A mapping from a task specification t_j to its code c_j and documentation d_j.
Ultimately, by concatenating all three augmented datasets, the final CGD D_c is obtained.

4.1.3 IoTBench.

The authors select problems from TDD and CGD and additionally hand-craft tasks, giving 100 test tasks in total; these tasks are excluded from the tuning process.
Public benchmarks are not used because they are unrelated to the IoT domain.

4.2 Parameter-Efficient Co-Tuning (PECT)

With the two augmented datasets (D_t and D_c) in hand, the next step is fine-tuning.

4.2.1 LoRA Tuning.

Fig. 10(d) shows the conventional LoRA tuning process.

each Transformer block in an LLM contains two main components: a self-attention mechanism and a feed-forward network (FFN), both of which are followed by residual connections and layer normalization.
 
The self-attention features three tunable weight matrices (W_q, W_k, and W_v) to capture contextual relationships between input embeddings, while the FFN processes the outputs from the attention mechanism to refine the feature representations.
Conventional LLM fine-tuning updates all of these weight matrices, which exhausts memory and is very costly.
LoRA reduces the number of tunable parameters [15, 18], where two low-rank matrices A and B (i.e., LoRA adapters) are inserted alongside the weight matrix.

Given an input X, the tuning process can be expressed as (using W_v as the example):

V' = (W_v + BA) X = W_v X + BA X

Updating only A and B is enough, which lowers the cost. (Why does only W_v appear here? See the note below.)

The matrices A and B have far fewer parameters than the original weight matrix; how few is mainly determined by the rank r.

Note that the paper does not write the formula out in full: V' stands for the tuned output for W_v, and K' and Q' are obtained in exactly the same way.
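A minimal PyTorch sketch of the idea behind the formula: the base projection (e.g., W_v) stays frozen while only the low-rank factors A and B are trained. The dimensions, scaling, and initialization follow the standard LoRA recipe, not anything GPIoT-specific.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen weight W plus a trainable low-rank update B @ A (rank r)."""

    def __init__(self, d_in: int, d_out: int, r: int = 64, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)              # W (e.g., W_v) stays frozen
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))        # up-projection, zero-init
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # V' = W_v X + B A X : only the low-rank term receives gradients.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```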


However, because of the misalignment problem described in §2.2, TDSLM and CGSLM trained on different datasets cannot cooperate well, so the paper proposes the parameter-efficient co-tuning (PECT) paradigm.

Unlike conventional LoRA tuning that tunes adapters separately, PECT enables collaborative fine-tuning of several SLMs with a shared base model but with different LoRA adapters.
PECT features a Multi-Path LoRA Pipeline (MPLP) and two lightweight projection layers.

4.2.2 Multi-Path LoRA Pipeline.

Pipeline Construction. 
we create three pipelines of LoRA adapters in each Transformer block. Two pipelines (the orange one and the green one) are independently tuned by TDSLM and CGSLM with respect to TDD and CGD . The other pipeline (the gray one) is co-tuned on both TDD and CGD.

For example, given a data sample from TDD, only the orange LoRA adapters and gray LoRA adapters are updated, as shown in Fig. 11.
Note that we only assign the shared adapters beside the key and value weight matrices (W_k, W_v).
The value vector provides the information to be activated based on the key vector [63]. In other words, the mapping from problem statement to decomposed tasks and the mapping from task specification to code & documentation are determined by the key and value vectors in TDSLM and CGSLM, respectively.
Precisely because fine-tuning these two matrices separately causes the misalignment, the paper introduces the shared pipeline.
Co-Tuning.
In Fig. 11, we designate the orange line as the task decomposition path (TDP), through which only data from TDD will pass. The green line is the code generation path (CGP), through which only data from CGD will pass. The grey line represents the co-tuning path, through which all data will pass.

Taking the adapter alongside the projection matrix W_k as an example, the key vectors after projection in the two paths are calculated by (Eq. 9; roughly)

K_1 = (W_k + B_1 A_1 + λ B_c A_c) X,   K_2 = (W_k + B_2 A_2 + λ B_c A_c) X,

where X is the input text embedding, K_1 and K_2 are the key vectors within TDP and CGP, respectively, B_1 A_1 and B_2 A_2 are the parameters of the LoRA adapters independently tuned within the two paths, B_c A_c are the LoRA adapters collaboratively tuned by the two paths, and λ is a hyper-parameter to balance the data flow between the two paths.
During co-tuning, data is first sampled randomly from TDD and CGD. If a sample comes from TDD it passes through the TDP (task decomposition path); otherwise it goes through the CGP. The loss is then computed and the corresponding LoRA adapters are updated according to the source of the data sample.
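A sketch of how the shared adapter B_c A_c could sit beside the path-specific adapters on the key projection, following the formula above; the class layout, λ usage, and initialization are my reading of Fig. 11 and Eq. 9, not the authors' code.

```python
import torch
import torch.nn as nn

class MultiPathLoRAKeyProj(nn.Module):
    """Key projection W_k with one adapter per path (TDP / CGP) plus one adapter
    shared (co-tuned) by both paths, as in the multi-path LoRA pipeline."""

    def __init__(self, d_model: int, r: int = 64, lam: float = 0.5):
        super().__init__()
        self.Wk = nn.Linear(d_model, d_model, bias=False)
        self.Wk.weight.requires_grad_(False)      # frozen base weight
        self.tdp_adapter = self._adapter(d_model, r)     # tuned only on TDD samples
        self.cgp_adapter = self._adapter(d_model, r)     # tuned only on CGD samples
        self.shared_adapter = self._adapter(d_model, r)  # co-tuned on both datasets
        self.lam = lam                             # balances shared vs. path-specific flow

    @staticmethod
    def _adapter(d_model: int, r: int) -> nn.Sequential:
        down = nn.Linear(d_model, r, bias=False)   # A
        up = nn.Linear(r, d_model, bias=False)     # B
        nn.init.zeros_(up.weight)                  # standard LoRA: start as a no-op
        return nn.Sequential(down, up)

    def forward(self, x: torch.Tensor, path: str) -> torch.Tensor:
        own = self.tdp_adapter if path == "tdp" else self.cgp_adapter
        # K_path = (W_k + B_path A_path + lam * B_c A_c) X
        return self.Wk(x) + own(x) + self.lam * self.shared_adapter(x)
```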

4.2.3 Projection Layers. 

To further enhance knowledge sharing between the two SLMs during tuning, two projection layers are created for the two paths.

The authors place the two projection layers in parallel with the FFN inside the Transformer block.

As such, they can serve as extra FFNs that apply non-linear transformations to the attention representations (presumably the representations produced by the self-attention layer).
Taking TDP as an example (upper part of Fig. 11): the value obtained from the LayerNorm in TDP (denoted x_1) is fed into a projection layer L_1(·), whose output is then added to the FFN's output F(x_2) and to x_2 in CGP; the sum is sent to the next Transformer block for further processing. This knowledge-transfer process can be expressed as (Eq. 10; roughly)

x'_2 = F(x_2) + x_2 + γ L_1(x_1),   x'_1 = F(x_1) + x_1 + γ L_2(x_2),

where x'_1 and x'_2 are the final outputs of the two paths, x_1 and x_2 are the input attention representations from TDP and CGP, respectively, F(·) is the FFN layer, L_1(·) and L_2(·) are the projection layers in the two paths, and γ is a hyper-parameter to balance the knowledge sharing between the two paths.
Note that each projection layer has the same architecture as the FFN, consisting of two fully connected layers and a non-linear SwiGLU function [54].
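A sketch of the cross-path knowledge transfer around the FFN, following Eq. 10 above; it is simplified (LayerNorms omitted, the SwiGLU projection reduced to a plain SiLU MLP) and is only meant to show where L_1, L_2, and γ act.

```python
import torch
import torch.nn as nn

class CrossPathFFN(nn.Module):
    """Two parallel FFN paths (TDP and CGP) exchanging information through
    lightweight projection layers, as in Eq. 10 (sketch; LayerNorms omitted)."""

    def __init__(self, d_model: int, d_hidden: int, gamma: float = 0.5):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(d_model, d_hidden), nn.SiLU(),
                                 nn.Linear(d_hidden, d_model))
        self.ffn = mlp()       # F(.): the shared base model's FFN
        self.proj_tdp = mlp()  # L1(.): projects TDP features into CGP
        self.proj_cgp = mlp()  # L2(.): projects CGP features into TDP
        self.gamma = gamma     # balances cross-path knowledge sharing

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # x1: attention representation in TDP, x2: attention representation in CGP
        out_tdp = self.ffn(x1) + x1 + self.gamma * self.proj_cgp(x2)  # x'_1
        out_cgp = self.ffn(x2) + x2 + self.gamma * self.proj_tdp(x1)  # x'_2
        return out_tdp, out_cgp
```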

4.3 Requirement Transformation

RTSLM performs the transformation, converting task descriptions into well-structured specifications. Given the limited capability of an SLM, RAG is used to assist.

RAG Construction.

we first transform all the downloaded papers into a text embedding database. Then, armed with such an IoT knowledge database, we build a RAG agent based on RTSLM to retrieve relevant context for reference.
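A minimal sketch of such an embedding-database retrieval step using sentence-transformers and cosine similarity; the embedding model name and the chunking granularity are assumptions, since the paper only states that the downloaded papers are turned into a text embedding database.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

class PaperRetriever:
    """Toy retrieval over paper chunks: embed once, then cosine-search per query."""

    def __init__(self, chunks: list[str], model_name: str = "all-MiniLM-L6-v2"):
        self.chunks = chunks
        self.model = SentenceTransformer(model_name)  # embedding model is an assumption
        emb = self.model.encode(chunks, convert_to_numpy=True)
        self.emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = self.model.encode([query], convert_to_numpy=True)[0]
        q = q / np.linalg.norm(q)
        scores = self.emb @ q                          # cosine similarity
        top = np.argsort(-scores)[:k]
        return [self.chunks[i] for i in top]

# The retrieved chunks would then be prepended to RTSLM's CoT prompt as context.
```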

CoT Prompting.

As shown in Fig. 10(c), a well-structured task specification for code generation consists of three parts: the task target, and the input and output specifications of the expected code.

For each decomposed task t_i generated by TDSLM:

we first prompt (Fig. 12(a)) the agent to summarize a target for the task. Next, we further instruct (Fig. 12(b)) the agent to generate a list of parameter descriptions for the input and output of the expected code.

Each parameter description item contains the parameter name, the parameter type, and a brief explanation of its meaning, e.g., "signal (numpy.ndarray): raw ECG data collected from a noisy patient." Finally, RTSLM reorganizes and formats the above information into a well-structured task specification, which CGSLM further processes to generate the corresponding code snippet and documentation.
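For concreteness, a well-structured task specification of this kind might look like the following. This is a hypothetical example built around the ECG parameter description quoted above, not a sample reproduced from the paper.

```python
task_specification = {
    "target": "Detect R-peaks in a noisy single-lead ECG recording",
    "inputs": [
        {"name": "signal", "type": "numpy.ndarray",
         "description": "raw ECG data collected from a noisy patient"},
        {"name": "fs", "type": "int",
         "description": "sampling rate of the ECG signal in Hz"},
    ],
    "outputs": [
        {"name": "r_peaks", "type": "numpy.ndarray",
         "description": "sample indices of the detected R-peaks"},
    ],
}
```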

5 EXPERIMENT SETUP

5.1 Implementation

System Configurations.  

  • an RTX 4090 GPU (24 GB)
  • use selenium as a web crawler
  • an agent based on GPT-4o and LangChain to perform data formatting and augmentation
  • an NVIDIA A100 GPU (80 GB) to tune the SLMs in the cloud

Hyper-parameters. 

  • TDD contains 36,098 pairs of "problem statement → decomposed tasks".
  • CGD contains 35,419 pairs of "task specification → code & documentation".
  • Llama2-13b [61] with INT8 quantization serves as the foundation model.
  • LoRA [27] with a rank of 64 and a dropout rate of 0.1.
  • 5 tuning epochs, with an initial learning rate of 0.0001 varied by a cosine learning rate scheduler.
  • The λ in Eq. 9 and the γ in Eq. 10 are both set to 0.5 by default.
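These hyper-parameters map naturally onto a standard HuggingFace peft setup. The sketch below configures plain LoRA on an INT8-quantized Llama2-13b (not the full PECT pipeline); `lora_alpha` and the target modules are my assumptions, while the rank, dropout, and quantization mirror the reported settings.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# INT8-quantized Llama2-13b as the frozen foundation model.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

lora_cfg = LoraConfig(
    r=64,                                 # rank, as reported in the paper
    lora_dropout=0.1,                     # dropout rate, as reported
    lora_alpha=16,                        # assumed; not given in the paper
    target_modules=["k_proj", "v_proj"],  # W_k / W_v, where GPIoT puts its shared adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()        # LoRA adapters are roughly 1% of all parameters
```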

The tuning process takes around 80 GPU hours. Since TDSLM, RTSLM, and CGSLM share the same foundation model, only about 16 GB of GPU memory is needed for the whole system.

5.2 IoT Applications

1) Heartbeat Detection (HD) 
2) Human Activity Recognition (HAR)
3) Multimodal HAR

6 EVALUATION

6.1 Metrics

HD. 1) Precision: the fraction of correctly detected R-peaks out of all detected peaks: \frac{TP}{TP+FP}. 2) Recall rate: the proportion of correctly detected R-peaks out of all actual R-peaks: \frac{TP}{TP+FN}.
HAR. 1) Classification accuracy: the portion of the test data that is correctly classified based on the label. 2) GPU memory usage: the amount of GPU memory used during model inference. 3) Inference time: the time from feeding the data into the code to the generation of the recognition result.
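A small helper showing how the two HD metrics could be computed, under the assumption that a detected peak counts as a true positive when it falls within a fixed tolerance window of a ground-truth R-peak (the tolerance value is an assumption, not from the paper).

```python
import numpy as np

def hd_precision_recall(detected: np.ndarray, truth: np.ndarray, tol: int = 15):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN) for R-peak detection.

    detected / truth: sample indices of detected and ground-truth R-peaks;
    tol: matching tolerance in samples (assumed value, ~40 ms at 360 Hz).
    """
    tp = sum(1 for t in truth if np.any(np.abs(detected - t) <= tol))
    fn = len(truth) - tp
    fp = max(len(detected) - tp, 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```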

6.2 Baselines

The following baselines are compared against GPIoT (GT):
1) GPT-4o (G4)
2) DeepSeek-Coder (DC)
3) CodeLlama-34b (CL)
4) WizardCoder-33b (WC)
5) CodeQwen-7b (CQ)
6) GitHub Copilot (GC)
7) MapCoder (MC)

6.3 Application Evaluation

With the designed two problem statements (Fig. 13) for the three IoT applications, we input them into GPIoT and the baselines to synthesize 20 different programs for each task.

6.3.1 HD.

6.3.2 HAR.

We set the training epochs to 10 and the batch size to 32.

6.3.3 Multimodal HAR.

6.4 Breakdown Evaluation

We separately evaluate TDSLM and CGSLM on IoTBench to explore the effectiveness of fine-tuning in the IoT domain.

6.4.1 Metrics.

TDSLM.
1) BLEU
2) Format Correctness Rate (FCR)
3) Sub-Task Completeness (STC)

CGSLM.
1) Code embedding similarity
2) Pass@k (the standard unbiased estimator is sketched after this list)
3) User Requirement Coverage (URC)
4) Code quality
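For reference, pass@k is usually reported with the unbiased estimator from the HumanEval paper; the GPIoT paper does not spell out its exact computation, so the sketch below shows only the standard formula.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples generated per task, c of them correct.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 20 generations per task (as in the evaluation), 7 passing the tests:
# pass_at_k(20, 7, 1) ≈ 0.35, pass_at_k(20, 7, 5) ≈ 0.92
```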

6.4.2 TDSLM.

We input each problem statement from IoTBench into TDSLM and the baselines to generate 20 different decomposed tasks and calculate the average BLEU score, FCR, and URC.

6.4.3 CGSLM.

We input each task specification from IoTBench into CGSLM and the baselines to generate 20 different code & documentation. We then report the average code embedding similarity, pass@1, pass@5, URC, and the number of various issues detected by SonarQube.

6.5 Ablation Study

6.5.1 IoT-Oriented Data Augmentation. 

We directly tune TDSLM on the raw dataset without our IoT-oriented data augmentation, which contains only 273 data samples. We then evaluate the tuned model on IoTBench and report the average BLEU score, FCR, and STC.
The main reason is that the raw dataset lacks generality and diversity in the IoT domain, which limits the tuned SLM's ability to decompose IoT problems into manageable sub-tasks and occasionally causes hallucinations that lead to incorrect results.

6.5.2 PECT.

We separately fine-tune TDSLM and CGSLM on their own datasets without our PECT paradigm. We then evaluate the performance of the tuned CGSLM on IoTBench and measure the average code embedding similarity, pass@1, and URC.
This is because some of the IoT domain knowledge held by TDSLM cannot be shared with CGSLM. Consequently, CGSLM cannot handle some programming tasks beyond its working scope and produces simplistic programs with degraded performance. Even so, this insufficiently tuned CGSLM still outperforms GPT-4o, which highlights the benefit of fine-tuning for IoT-related code generation. Moreover, without PECT, the final code for the HAR task neither applies data preprocessing nor designs a high-performance neural network, leading to lower classification accuracy.

6.5.3 RTSLM.

We directly feed the natural-language-described decomposed tasks from TDSLM into CGSLM. We then compare the performance of the code generated by GPIoT, GPIoT without RTSLM, and GPT-4o.

6.6 User Study

QUESTION

Section 2.2 (Preliminary Experiments & Findings) points out that a TDSLM tuned with Evol-Instruct performs poorly because it "may fall short of effectively capturing intricate relationships among IoT terminologies." Why does the paper's own tuning method, PECT, overcome this problem?

The authors argue that the Evol-Instruct-tuned TDSLM underperforms mainly because Evol-Instruct focuses on enhancing the linguistic features of the original text data and fails to effectively capture the intricate relationships among IoT terminologies. In other words, Evol-Instruct may not adequately cover IoT-specific terms and concepts, so the resulting solutions are not sufficiently relevant to the IoT domain.

In contrast, the Parameter-Efficient Co-Tuning (PECT) paradigm proposed in the paper addresses this problem in the following ways:

  1. Multi-Path LoRA Pipeline: PECT adopts a multi-path low-rank adaptation (LoRA) pipeline that allows knowledge to be shared between task decomposition and code generation. TDSLM and CGSLM are collaboratively fine-tuned on a shared base model, promoting knowledge transfer between the two stages.

  2. Lightweight Projection Layers: PECT introduces lightweight projection layers that sit in parallel with the FFN in each Transformer block and apply non-linear transformations to enrich the attention representations. This strengthens cross-domain learning, helping the models better capture relationships among IoT terms.

  3. IoT-Oriented Data Augmentation: the paper's augmentation method accounts for unique IoT characteristics such as sensor modality, data representation, and system resource heterogeneity, which improves the quality and diversity of the datasets and hence the SLMs' grasp of IoT knowledge.

Together, these techniques allow PECT to capture and handle the complex relationships of the IoT domain more effectively, improving TDSLM's task decomposition and CGSLM's ability to generate IoT-relevant code.
