Alibaba Tianchi ICDAR 2023 DTT in Images 1: Text Manipulation Classification (10/1267)

Fly-Pluche

Last modified: 2023-09-13 11:20:22

Tags: artificial intelligence, computer vision, deep learning

First published: 2023-04-02 17:20:27

Article link: https://blog.csdn.net/qq_51302564/article/details/129913693

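This post discusses the image tampering detection problem in the Alibaba Cloud Tianchi competition. It analyzes data balance and the characteristics of the image categories, and tries a range of augmentations, optimization strategies, and loss functions. Model ensembling was effective in improving performance, but overfitting remains on the difficult text category. The team found that the models perform very well on the screenshot category and poorly on the text category.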

Challenge interpretation

https://tianchi.aliyun.com/competition/entrance/532048/introduction

Data analysis

The labels are balanced.
The data can be roughly divided into two classes: screenshot and text.
Tampering in the text class seems more obvious than in the screenshot class.
The tampered areas are smaller than in other benchmarks.
The resolution differs between the training data and the test data; the training images are larger.
Most images have a white background.
The aspect ratio of screenshot images is roughly 2:1, while text images are closer to 1:1.

Solution

Augmentation

RandomRotation90
Multi-resolution
Random augmentation
Padding
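
The "Padding" step, combined with resizing in the "padding->resize" order that appears in the ensemble list, might look roughly like the sketch below. It assumes a white fill value (most backgrounds are white), centered padding, and nearest-neighbor resizing on a 2-D image stored as nested lists. `pad_to_square` and `resize_nearest` are hypothetical helper names, not the code used in the competition.

```python
def pad_to_square(img, fill=255):
    """Pad a 2-D image (list of rows) to a square, keeping it centered."""
    h, w = len(img), len(img[0])
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    out = [[fill] * side for _ in range(side)]
    for y in range(h):
        for x in range(w):
            out[top + y][left + x] = img[y][x]
    return out


def resize_nearest(img, size):
    """Resize a 2-D image to size x size with nearest-neighbor sampling."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


# A 2x4 "screenshot-like" image (aspect ratio 2:1); values are intensities.
image = [[0, 10, 20, 30],
         [40, 50, 60, 70]]
squared = pad_to_square(image)        # 4x4, original rows centered vertically
resized = resize_nearest(squared, 2)  # 2x2 stand-in for the model input size
```

Padding before resizing keeps the original aspect ratio of the content, whereas resizing first would distort a 2:1 screenshot into a square.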

Optimization & Learning rate scheduler

AdamW
One Cycle
Early stopping
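
To make the scheduler and stopping rule concrete, here is a minimal, framework-free sketch of a cosine one-cycle learning rate schedule and an early-stopping counter. The hyperparameters (`pct_start`, `div_factor`, `final_div_factor`, `patience`) are illustrative assumptions; the post does not state the values that were actually used.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Cosine one-cycle schedule: warm up to max_lr, then anneal down."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = pct_start * total_steps
    if step < warmup_steps:
        progress = step / warmup_steps
        start, end = initial_lr, max_lr
    else:
        progress = (step - warmup_steps) / (total_steps - warmup_steps)
        start, end = max_lr, min_lr
    # Cosine interpolation from start (progress=0) to end (progress=1).
    return end + (start - end) * (1 + math.cos(math.pi * progress)) / 2


class EarlyStopping:
    """Stop when the validation score has not improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True means "stop now"
```

In a training loop, `one_cycle_lr` would be called once per optimizer step and `EarlyStopping.step` once per epoch with the validation (CV) score.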

Loss

Smoothing BCE loss, weight: 1/(epoch+1)
Focal loss, weight: 1-1/(1+epoch) (this loss may not be necessary, because the classes are not imbalanced)
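
The epoch-dependent weighting above can be sketched as follows for a single predicted probability. This is only an illustration under assumptions: epochs are counted from 0 (so the first epoch uses BCE only), and the smoothing factor `eps` and focal exponent `gamma` are placeholder values not given in the post.

```python
import math


def smoothing_bce(p, y, eps=0.1):
    """Binary cross-entropy with label smoothing on a probability p."""
    y_smooth = y * (1 - eps) + 0.5 * eps
    return -(y_smooth * math.log(p) + (1 - y_smooth) * math.log(1 - p))


def focal_loss(p, y, gamma=2.0):
    """Binary focal loss (without the alpha term) on a probability p."""
    p_t = p if y == 1 else 1 - p
    return -((1 - p_t) ** gamma) * math.log(p_t)


def combined_loss(p, y, epoch):
    """Blend the two losses with the epoch-dependent weights listed above."""
    w_bce = 1 / (epoch + 1)        # starts at 1 and decays
    w_focal = 1 - 1 / (1 + epoch)  # starts at 0 and grows
    return w_bce * smoothing_bce(p, y) + w_focal * focal_loss(p, y)
```

The two weights always sum to 1, so training shifts gradually from the smoothed BCE term toward the focal term as the epoch count grows.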

What did not work

Seg head (adding a segmentation head as an auxiliary loss)
Augmentation: Flip, RandomBrightness, RandomContrast, RandomCrop, RandomAffine, RandomThinPlateSpline, RandomGamma
Sliding-window prediction
TTA (rotation, flip, multi-resolution)
Training on all data (overfit?)
Extra data (some common datasets, datasets from the last competition, and datasets we produced)
Smaller resolution: (512, 512), (768, 768)
Larger resolution: (1536, 1536) (overfit: CV reached 89.2; we tried training at (1024, 1024), then fine-tuning at (1536, 1536))
Other models: MVSS (CV 82, LB 56), SegFormer
K-fold
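
For reference, the rotation and flip part of the TTA listed above (which did not help here) is usually implemented by averaging the model output over transformed views. The sketch below assumes the model is any callable that maps a 2-D image to a tampering probability; `tta_predict` is a hypothetical name, and the multi-resolution views are omitted.

```python
def rot90(img):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def hflip(img):
    """Flip a 2-D image horizontally."""
    return [row[::-1] for row in img]


def tta_predict(model, img):
    """Average model(img) over 4 rotations, each with and without a flip."""
    views = []
    view = img
    for _ in range(4):
        views.append(view)
        views.append(hflip(view))
        view = rot90(view)
    scores = [model(v) for v in views]
    return sum(scores) / len(scores)
```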

Model ensemble

We ensembled five models without K-fold.

Multi-stage EfficientNet-B2 (padding->resize; LB: 83.4467, CV: 85.83749)
Multi-stage EfficientNet-B2 (LB: 82.64, CV: 85.0125)
Multi-feature: RGB + BayerConv + SRMConv2D, backbone: EfficientNet-B2 (padding->resize; LB: 80.68, CV: 80.63)
ConvNeXt-Small (padding->resize; LB: 80.5133, CV: 83.8125)
EfficientNet-B2 (resize->padding; LB: 82.98, CV: 84.475)

The large-scale model ((1536, 1536), LB 82.9) did not contribute much to the ensemble.
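
The post does not say how the five models were combined. One common choice, shown here purely as an assumption, is a (weighted) mean of the per-image tampering probabilities; `ensemble_mean` and the numbers below are hypothetical.

```python
def ensemble_mean(prob_lists, weights=None):
    """Average per-image tampering probabilities across models."""
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models  # plain mean when no weights are given
    total = sum(weights)
    n_images = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_images)
    ]


# Invented outputs of three models on two images, for illustration only.
preds = [[0.9, 0.2], [0.8, 0.4], [0.7, 0.3]]
averaged = ensemble_mean(preds)
```

Averaging helps most when the models make different mistakes, which may be why mixing backbones and preprocessing orders was more useful than adding another model of a similar kind.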

Case Analysis

CAM

The probability of tampered is 0.99:

[Image unavailable]

The probability of untampered is 0.003:

[Image unavailable]

The CAM visualization shows that the model does not focus on the actual tampered region, which suggests overfitting.

Probability

We analyzed the predicted probabilities on the test data and found that the confidence decreases when the background changes.

The confidence is 0.74:

[Image unavailable]

Difficulties in the text category

My teammate split the data into two categories: screenshot and text. I then calculated the score for each of these categories, and the results were a shock. All of our models perform best on screenshots and poorly on text (we suspect poor performance on invoices in particular).

We tested three models. They all scored about 99 on screenshot and between 70 and 80 on text. The overfit model (CV 89.2) scored 99.9 on screenshot and 85 on text.
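
A per-category breakdown like this can be computed by grouping predictions by category before scoring. The sketch below uses plain accuracy at a 0.5 threshold as a stand-in, since the competition metric is not described in this post; `accuracy_by_category` and the toy data are hypothetical.

```python
from collections import defaultdict


def accuracy_by_category(labels, probs, categories, threshold=0.5):
    """Return {category: accuracy in percent} for binary predictions."""
    correct = defaultdict(int)
    count = defaultdict(int)
    for y, p, c in zip(labels, probs, categories):
        count[c] += 1
        correct[c] += int((p >= threshold) == bool(y))
    return {c: 100.0 * correct[c] / count[c] for c in count}


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.7, 0.4, 0.2]
cats = ["screenshot", "screenshot", "screenshot", "text", "text", "text"]
scores = accuracy_by_category(labels, probs, cats)
```

Reporting scores per category rather than one overall number is what exposed the gap between screenshot and text images.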

So we wanted to design a model that performs well on text. We tried fine-tuning the models on the test images, but the score improved only slightly.

Unfortunately, the preliminary round was coming to an end, leaving no more chances or time to try.

Acknowledgments

Thanks to the organizers for running this competition. The tampering task was distinctive and challenging, it inspired research and thinking, and its requirements were closely tied to practical applications. This competition opened my eyes to many situations where conventional tricks do not work. Tianchi provided participants with a platform to practice on and carefully answered everyone's questions, which let us gain a lot of valuable experience. We hope to have the opportunity to take part in such competitions again in the future.

Thanks to my teammate Peng for the lively discussions and for contributing some innovative ideas.