(21-6-02) An Intelligent Text Summarization System Based on Gemma 2B: Experiments (2)

9.6.4  Creating a MapReduce Summarization Chain

Use LangChain's load_summarize_chain function to create a MapReduce summarization chain, setting the verbose parameter to True.
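The map_prompt and combine_prompt arguments refer to two prompt templates built earlier in the experiment. For context, a minimal sketch of what they might look like follows: the map prompt matches the one visible in the verbose log below, while the wording of the combine prompt is an illustrative assumption, not the original.

from langchain.prompts import PromptTemplate

# "Map" prompt, applied to each chunk individually
# (same text as the prompt shown in the verbose output below).
map_template = """<bos><start_of_turn>user
Summarize the following text in a technical way. Focus on facts, numbers and strategies used. Divide the summary in chapters, be impersonal and use bullet points:

{text}<end_of_turn>
<start_of_turn>model"""
prompt_init = PromptTemplate.from_template(map_template)

# "Combine" prompt that merges the per-chunk summaries;
# this wording is an assumption, not the author's original.
combine_template = """<bos><start_of_turn>user
Combine the following partial summaries into one final summary. Keep the chapter structure and the bullet points:

{text}<end_of_turn>
<start_of_turn>model"""
combine_prompt = PromptTemplate.from_template(combine_template)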

chain = load_summarize_chain(langchain_hf, chain_type='map_reduce',
                             verbose=True,
                             map_prompt=prompt_init,
                             combine_prompt=combine_prompt)

out_summary = chain.invoke(splits)

Setting verbose=True yields additional information about how the summarization chain executes, which is very useful for debugging and optimizing the summarization process. The extra output shows how the model works through each part of the document step by step and how it arrives at the final summary. Running the code produces:

> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
<bos><start_of_turn>user
Summarize the following text in a technical way. Focus on facts, numbers and strategies used. Divide the summary in chapters, be impersonal and use bullet points:

TLDR
We used an approach similar to audio spectrogram classification using the EfficientNet-B0 model, with numerous augmentations and transformer models such as BERT and DeBERTa as helper models. The final solution consists of one EfficientNet-B0 with an input size of 160x80, trained on a single fold from 8 randomly split folds, as well as DeBERTa and BERT trained on the full dataset. A single fold model using EfficientNet has a CV score of 0.898 and a leaderboard score of ~0.8.  
We used only competition data.<end_of_turn>
<start_of_turn>model
Prompt after formatting:
<bos><start_of_turn>user
Summarize the following text in a technical way. Focus on facts, numbers and strategies used. Divide the summary in chapters, be impersonal and use bullet points:

1. Data Preprocessing
We extracted 18 lip points, 20 pose points (including arms, shoulders, eyebrows, and nose), and all hand points, resulting in a total of 80 points. During training, we applied various augmentations. We implemented standard normalization. Instead of dropping NaN values, we filled them with zeros after normalization. We interpolated the time axis to a size of 160 using 'nearest' interpolation: yy = F.interpolate(yy[None, None, :], size=self.new_size, mode='nearest'). Finally, we obtained a tensor with dimensions 160x80x3, where 3 represents the (X, Y, Z) axes.  
Only 61 points were kept, including 40 lip points and 21 hand points. For left and right hand, the one with less NaN was kept. If right hand was kept, mirror it to left hand.  
Augmentations, normalization and NaN-filling were applied sequentially.  
Sequences longer than 96 were interpolated to 96. Sequences shorter than 96 were unchanged.  
Apart from raw positions, hand-crafted features were also used, including motion, distances, and cosine of angles.  
Motion features consist of future motion and history motion, which can be denoted as:  
$$ Motion_{future} = position_{t+1} - position_{t} $$ $$ Motion_{history} = position_{t} - position_{t-1} $$  
Full 210 pairwise distances among 21 hand points were included.  
There are 5 vertices in a finger (e.g. thumb is [0,1,2,3,4]), and therefore, there are 3 angles: <0,1,2>, <1,2,3>, <2,3,4>. So 15 angles of 5 fingers were included.  
Randomly selected 190 pairwise distances and randomly selected 8 angles among 40 lip points were included.<end_of_turn>
<start_of_turn>model
Prompt after formatting:
<bos><start_of_turn>user
Summarize the following text in a technical way. Focus on facts, numbers and strategies used. Divide the summary in chapters, be impersonal and use bullet points:

2. Augmentation
These augmentations are used in both CNN training and transformer training  
Random affine: Same as @hengck23 shared. In CNN, after global affine, shift-scale-rotate was also applied to each part separately (e.g. hand, lip, body-pose).  
Random interpolation: Slightly scale and shift the time dimension.  
Flip pose: Flip the x-coordinates of all points. In CNN, x_new = x_max - x_old. In transformer, x_new = 2 * frame[:,0,0] - x_old.  
Finger tree rotate: There are 4 root-children pairs in a finger with 5-vertices. E.g. in thumb ([0,1,2,3,4]), these 4 root-children pairs are: 0-[1,2,3,4],1-[2,3,4],2-[3,4],3-[4]. We randomly choose some of these pairs, and rotate the children points around root point with a small random angle.  
[intermediate output omitted]

**Chapter 1: Data Preparation**

- Split the dataset into 8 folds.
- Use a random split for training and the full dataset for validation.
- Implement weighted CrossEntropyLoss for class imbalance.
- Choose an EfficientNet model with 5 blocks.
- Use Ranger optimizer with 60% flat and 40% cosine annealing learning rate schedule.

**Chapter 2: Model Training**

- Train on one fold with a random split (8 folds in total).
- Use Optuna to tune most parameters.
- Implement knowledge distillation in the 3-layer model.

**Chapter 3: Evaluation**

- Evaluate the model on the validation set.
- Tune the learning rate and dropout probability with Optuna.
- Use EarlyStopping to prevent overfitting.

**Chapter 1: Model Conversion and Ensemble**

- Rewrote all models in Keras and transferred PyTorch weights to them.
- Speed boost of 30% for transformer model.
- Rewriting DepthwiseConv2D with a hard-coded way, whose speed is 200%~300% of its original version of tflite DepthwiseConv2D.

**Chapter 2: Ensemble**

- Calculated ensemble weights for models trained on fold 0 using the local fold 0 score.
- Applied these weights to the full dataset models.
- Ensemble included: EfficientNet-B0, fold 0 BERT, full data train DeBERTa, full data train.

**Chapter 3: Results**

- EfficientNet-B0 achieved a leaderboard score of approximately 0.8.
- Transformers improved the score to 0.81.

**Chapter 1: Introduction**

* Depthwise convolution is a type of convolution that is used to extract features from images.
* Traditional CNN and ViT models are both used for depthwise convolution, but they can be computationally expensive.
* In this paper, we propose a new depthwise convolution model that is faster than existing models.

**Chapter 2: Background**

* Depthwise convolution is a type of convolution that is used to extract features from images.
* Traditional CNN and ViT models are both used for depthwise convolution, but they can be computationally expensive.
* One way to reduce the computational cost of depthwise convolution is to use a depthwise convolution kernel.
* Another way to reduce the computational cost of depthwise convolution is to use a group convolution kernel.

**Chapter 3: Proposed Depthwise Convolution Model**

* We propose a new depthwise convolution model that is faster than existing models.
* Our model uses a group convolution kernel that is applied to the input image in a parallel fashion.
* This allows our model to be much faster than traditional depthwise convolution models.

**Chapter 4: Experimental Results**

* We compare our new model to existing depthwise convolution models on a variety of tasks.
* Our model outperforms existing models on all of the tasks that we tested.
* We also compare our model to an EfficientNet model with ONNX, which is a state-of-the-art model for image classification.
* Our model is ~5 times faster than EfficientNet with ONNX.

**Chapter 5: Conclusion**

* Our new depthwise convolution model is faster than existing models on all of the tasks that we tested.
* Our model is also more efficient than EfficientNet with ONNX.
* We believe that our model has the potential to be a significant improvement for computer vision applications.<end_of_turn>
<start_of_turn>model

> Finished chain.

> Finished chain.

9.6.5  Refine Processing

(1) Use LangChain to implement the Refine strategy for text summarization. With the Refine strategy, an initial summary is progressively revised as additional context is taken into account, yielding a more accurate and complete summary. This approach is especially useful when the summary must adapt to evolving information, or when the initial summary may have missed important details.

prompt_template = """<bos><start_of_turn>user
Summarize the following text in a technical way. Focus on facts, numbers and strategies used. Divide the summary in chapters, be impersonal and use bullet points:

{text}<end_of_turn>
<start_of_turn>model"""
prompt_init = PromptTemplate.from_template(prompt_template)

refine_template = """<bos><start_of_turn>user
Produce a final document divided in chapters and bullet points.
You are given a text containing an existing summary to a certain point:

{existing_answer}

You can now refine it (if necessary) with more context below.

{text}

Given the new context, refine the original summary.<end_of_turn>
<start_of_turn>model"""
prompt_refine = PromptTemplate.from_template(refine_template)


chain = load_summarize_chain(langchain_hf, chain_type='refine',
                             return_intermediate_steps=True,
                             input_key='input_documents',
                             output_key='output_text',
                             question_prompt=prompt_init,
                             refine_prompt=prompt_refine)

out_summary = chain.invoke(splits, return_only_outputs=True)

The code above works as follows:

  1. Define the initial summarization prompt template: prompt_template instructs the model to summarize the text in a technical way, focusing on facts, numbers, and strategies, and to divide the summary into chapters with bullet points.
  2. Initialize the prompt template: PromptTemplate.from_template(prompt_template) creates the PromptTemplate instance prompt_init, which is used in the initial summarization stage.
  3. Define the refinement prompt template: refine_template lets the model adjust and improve an existing summary in light of new context.
  4. Initialize the refinement prompt template: PromptTemplate.from_template(refine_template) creates the PromptTemplate instance prompt_refine, which is used in the refinement stage.
  5. Create the Refine summarization chain: load_summarize_chain builds the chain chain, which takes the summary produced so far, combines it with the next piece of context, and generates a refined final summary.
  6. Set the chain parameters: chain_type='refine' selects the Refine strategy; return_intermediate_steps=True returns the results of the intermediate steps; input_key='input_documents' and output_key='output_text' name the input and output keys; question_prompt=prompt_init and refine_prompt=prompt_refine specify the prompt templates for the initial and refinement stages, respectively.
  7. Run the chain: chain.invoke(splits, return_only_outputs=True) runs the Refine chain over the document chunks splits; return_only_outputs=True returns only the final outputs, without intermediate state (a sketch of how splits and langchain_hf might be constructed follows this list).
  8. Store the output summary: the variable out_summary holds the final refined summary.
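Both chains in this experiment also rely on two objects created earlier: langchain_hf, the LangChain wrapper around the Gemma 2B generation pipeline, and splits, the list of document chunks. A minimal sketch of how they might be constructed is shown below; the checkpoint name, generation settings, and chunk sizes are illustrative assumptions rather than the original configuration.

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Wrap a Hugging Face text-generation pipeline so LangChain chains can call Gemma.
model_id = "google/gemma-2b-it"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer,
                max_new_tokens=512, return_full_text=False)
langchain_hf = HuggingFacePipeline(pipeline=pipe)

# Split the source text into overlapping chunks; each chunk becomes one Document.
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
splits = splitter.create_documents([long_text])  # long_text: the raw text to summarize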

(2) The following code displays the text stored under the 'output_text' key of the out_summary dictionary as Markdown in a Jupyter Notebook, stripping '#' characters so the model's output does not clash with Markdown heading syntax.

from IPython.display import Markdown, display  # notebook display helpers
display(Markdown(out_summary['output_text'].replace('#', '')))

Running it produces:

Chapter 1: Introduction

The task is to classify images into different categories. We use an approach similar to audio spectrogram classification. We use multiple models, including EfficientNet-B0 and DeBERTa.

Chapter 2: Model Architecture

EfficientNet-B0 model with input size of 160x80.
Transformer models (BERT and DeBERTa) as helper models.
The final solution consists of one EfficientNet-B0 with an input size of 160x80.

Chapter 3: Training

We use 8 randomly split folds for training.
A single fold model is trained on each fold.
We use a single EfficientNet-B0 model with an input size of 160x80.

Chapter 4: Evaluation

We use a single fold for evaluation.
The model has a CV score of 0.898.
The model has a leaderboard score of ~0.8.

Chapter 5: Refinement and Ensemble

We rewrote all our models in Keras and transferred PyTorch weights to them, resulting in a speed boost of around 30%.
We used ensemble weights for models trained on fold 0 using the local fold 0 score and applied these weights to the full dataset models.
EfficientNet-B0 achieved a leaderboard score of approximately 0.8, and transformers improved the score to 0.81.
The final ensemble included:
EfficientNet-B0, fold 0 BERT, full data train DeBERTa, full data train

(3) The following code prints the intermediate steps generated during the Refine summarization, joined by a separator line. Inspecting each step, including the initial summary and every refinement, is useful for debugging the summarization process, understanding the model's behavior, and tuning the summarization strategy.

print("\n###############################\n".join(out_summary["intermediate_steps"]))

Running it produces:

**Chapter 1: Introduction**

- The task is to classify images into different categories.
- We use an approach similar to audio spectrogram classification.
- We use multiple models, including EfficientNet-B0 and DeBERTa.

**Chapter 2: Model Architecture**

- EfficientNet-B0 model with input size of 160x80.
- Transformer models (BERT and DeBERTa) as helper models.
- The final solution consists of one EfficientNet-B0 with an input size of 160x80.

**Chapter 3: Training**

- We use 8 randomly split folds for training.
- A single fold model is trained on each fold.
- We use a single EfficientNet-B0 model with an input size of 160x80.

**Chapter 4: Evaluation**

- We use a single fold for evaluation.
- The model has a CV score of 0.898.
- The model has a leaderboard score of ~0.8.
###############################
**Chapter 1: Introduction**

[intermediate output omitted]

- We use 8 randomly split folds for training.
- A single fold model is trained on each fold.
- We use a single EfficientNet-B0 model with an input size of 160x80.

**Chapter 4: Evaluation**

- We use a single fold for evaluation.
- The model has a CV score of 0.898.
- The model has a leaderboard score of ~0.8.

**Chapter 5: Refinement and Ensemble**

- We rewrote all our models in Keras and transferred PyTorch weights to them, resulting in a speed boost of around 30%.
- We used ensemble weights for models trained on fold 0 using the local fold 0 score and applied these weights to the full dataset models.
- EfficientNet-B0 achieved a leaderboard score of approximately 0.8, and transformers improved the score to 0.81.
- The final ensemble included:
  - EfficientNet-B0, fold 0 BERT, full data train DeBERTa, full data train
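Because the Refine chain walks through the document chunks sequentially, intermediate_steps should contain one refined summary per processed chunk, with the last entry matching the final summary. A quick sanity check (assuming the splits list from earlier):

# One refined summary per chunk processed; the last entry should equal
# out_summary['output_text'].
print(len(out_summary["intermediate_steps"]), len(splits))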
