Alibaba Tianchi ICDAR 2023 DTT in Images 1: Text Manipulation Classification (10/1267)

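This post discusses the image tampering detection problem from the Alibaba Cloud Tianchi competition. It analyzes data balance and the characteristics of the image categories, and tries a range of augmentations, optimization strategies, and loss functions. Model ensembling was effective at improving performance, but overfitting persisted on the difficult text category. The team found that the models performed very well on the screenshot category and poorly on the text category.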

Challenge interpretation

https://tianchi.aliyun.com/competition/entrance/532048/introduction

Data analysis

  • The labels are balanced.

  • The data can be divided into two types: screenshot and text.

  • Tampering in the text class seems more obvious than in the screenshot class.

  • The tampered area is smaller than in other benchmarks.

  • The resolution differs between the training and test data; the training images are larger.

  • Most images have a white background.

  • Screenshot images have an aspect ratio of roughly 2:1, while text images are closer to 1:1.

Solution

Augmentation

  1. RandomRotation90
  2. Multi-resolution
  3. Random augmentation
  4. Padding
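As a rough illustration of the first and last items, the sketch below rotates an image by a random multiple of 90 degrees and pads it to a square. The function names, the symmetric padding, and the white fill value (chosen because most images have a white background) are assumptions made for this example, not the team's actual pipeline.

```python
import numpy as np

def random_rot90(image, rng):
    """Rotate an H x W (x C) image by a random multiple of 90 degrees."""
    k = int(rng.integers(0, 4))  # 0, 1, 2, or 3 quarter turns
    return np.rot90(image, k=k, axes=(0, 1)).copy()

def pad_to_square(image, fill=255):
    """Pad the shorter side symmetrically so the image becomes square.

    fill=255 (white) is an assumption for this sketch.
    """
    h, w = image.shape[:2]
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    out = np.full((side, side) + image.shape[2:], fill, dtype=image.dtype)
    out[top:top + h, left:left + w] = image
    return out

rng = np.random.default_rng(0)
img = np.zeros((200, 400, 3), dtype=np.uint8)  # a dummy 2:1 "screenshot"
aug = pad_to_square(random_rot90(img, rng))
print(aug.shape)
```

Padding before resizing keeps the original aspect ratio, which may matter when tampering traces are small.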

Optimization & Learning rate scheduler

  1. AdamW
  2. One Cycle
  3. Early Stop
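To make the scheduler and early stopping concrete, here is a minimal pure-Python sketch. The schedule is loosely modeled on a cosine one-cycle policy; `max_lr`, `pct_start`, `div_factor`, `final_div_factor`, and `patience` are illustrative assumptions, since the post does not give the values used. In practice a framework scheduler would normally be used alongside the AdamW optimizer.

```python
import math

def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at a 0-based step under a cosine one-cycle policy.

    Assumes total_steps is large enough that the warm-up phase spans
    at least one step.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1
    last = total_steps - 1

    def cos_anneal(start, end, pct):
        # Moves from start (pct=0) to end (pct=1) along a half cosine.
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (last - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)

class EarlyStopper:
    """Stop when the validation score has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = -math.inf
        self.bad_epochs = 0

    def should_stop(self, score):
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```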

Loss

  • Smoothing BCE loss, weight: 1/(epoch+1)
  • Focal loss, weight: 1-1/(1+epoch) (this loss may not be necessary because the classes are not imbalanced)
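The epoch-dependent weighting can be sketched as follows for a single predicted probability. The weights come from the list above; the 0-based epoch index, the smoothing factor of 0.1, and the focal `gamma` of 2.0 are assumptions for illustration, and the function names are hypothetical.

```python
import math

EPS = 1e-7  # keeps log() away from 0

def smoothed_bce(p, y, smoothing=0.1):
    """Binary cross-entropy against a label-smoothed target."""
    p = min(max(p, EPS), 1.0 - EPS)
    target = y * (1.0 - smoothing) + 0.5 * smoothing
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def focal_loss(p, y, gamma=2.0):
    """Focal loss without class weighting: down-weights easy samples."""
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def loss_weights(epoch):
    """Weights from the post: BCE 1/(epoch+1), focal 1-1/(1+epoch)."""
    w_bce = 1.0 / (epoch + 1)
    return w_bce, 1.0 - w_bce

def combined_loss(p, y, epoch):
    w_bce, w_focal = loss_weights(epoch)
    return w_bce * smoothed_bce(p, y) + w_focal * focal_loss(p, y)

for epoch in (0, 1, 9):
    print(epoch, loss_weights(epoch))
```

With this schedule, training starts on the smoothed BCE alone and shifts toward the focal term as epochs accumulate.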

What did not work

  1. Seg head (adding a segmentation head as an auxiliary loss)
  2. Augmentation: Flip, RandomBrightness, RandomContrast, RandomCrop, RandomAffine, RandomThinPlateSpline, RandomGamma
  3. Sliding-window prediction
  4. TTA (rotation, flip, multi-resolution)
  5. Training on all data (overfit?)
  6. Extra data (some common datasets, datasets from the last competition, and datasets we produced ourselves)
  7. Smaller resolutions: (512,512), (768,768)
  8. Larger resolution: (1536,1536) (overfit: CV reached 89.2; approach: train at (1024,1024), then fine-tune at (1536,1536))
  9. Other models: MVSS (CV 82, LB 56), SegFormer
  10. K-fold

Model ensemble

We ensembled five models without K-fold.

  1. Multi-stage in EfficientNet-B2 (padding->resize, LB: 83.4467, CV: 85.83749)
  2. Multi-stage in EfficientNet-B2 (LB: 82.64, CV: 85.0125)
  3. Multi-feature: RGB + BayerConv + SRMConv2D, backbone: EfficientNet-B2 (padding->resize, LB: 80.68, CV: 80.63)
  4. ConvNeXt-Small (padding->resize, LB: 80.5133, CV: 83.8125)
  5. EfficientNet-B2 (resize->padding, LB: 82.98, CV: 84.475)

The large-resolution model ((1536,1536), LB 82.9) did not contribute much to the ensemble.
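The post does not say how the model outputs were combined. One common option, assumed here purely for illustration, is a (weighted) average of each model's predicted tampering probability per image; `ensemble_mean` is a hypothetical helper name.

```python
def ensemble_mean(prob_lists, weights=None):
    """Average per-sample probabilities across models.

    prob_lists: one list of probabilities per model, all the same length.
    weights: optional per-model weights; equal weights if omitted.
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]

# Two dummy models scoring the same two images.
model_a = [0.9, 0.2]
model_b = [0.7, 0.4]
print(ensemble_mean([model_a, model_b]))
```

Per-model weights could, for example, be derived from CV scores, though whether that helps would need to be checked on held-out data.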

Case Analysis

CAM

The predicted probability of tampered is 0.99:

[Image]

The predicted probability of untampered is 0.003:

[Image]

The CAM visualization shows that the model does not attend to the truly relevant regions, which indicates overfitting.

Probability

We analyzed the predicted probabilities on the test data and found that the confidence decreases when the background changes.

The confidence is 0.74:

[Image]

Difficulties in the text category

My teammate split the data into two categories: screenshot and text. I tried calculating the score on each of these categories. The results were a shock: all models performed best on screenshots and poorly on text (we suspect poor performance on invoices).

We tested three models. They all scored in the 99s on screenshots and 70~80 on text. The overfit model (CV 89.2) scored 99.9 on screenshots and 85 on text.
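A per-category breakdown like this can be computed by grouping predictions by category before scoring. The sketch below is a minimal version with hypothetical names; plain thresholded accuracy stands in for the competition metric, which is not reproduced here.

```python
from collections import defaultdict

def score_by_category(labels, preds, categories, metric):
    """Apply `metric` separately to the samples of each category."""
    groups = defaultdict(lambda: ([], []))
    for y, p, c in zip(labels, preds, categories):
        groups[c][0].append(y)
        groups[c][1].append(p)
    return {c: metric(ys, ps) for c, (ys, ps) in groups.items()}

def accuracy(ys, ps, threshold=0.5):
    """Stand-in metric (an assumption): accuracy at a fixed threshold."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(ys, ps))
    return hits / len(ys)

labels = [1, 0, 1, 0]
preds = [0.9, 0.1, 0.4, 0.6]
cats = ["screenshot", "screenshot", "text", "text"]
print(score_by_category(labels, preds, cats, accuracy))
```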

So we wanted to design a model that performs well on text. We tried fine-tuning the models on the test images, but the score rose only slightly.

Unfortunately, the preliminary round was coming to an end, and we had no more chances or time to try.

Acknowledgements

Thanks to the organizers for running this competition. The tampering task was challenging, it inspired research and thinking, and its requirements were closely tied to practical applications. This competition opened my eyes to many situations where conventional tricks do not work. Tianchi gave participants a platform to practice on and carefully answered everyone's questions, which let us gain a lot of valuable experience. We hope to have the opportunity to take part in such competitions again in the future.

Thanks to my teammate Peng for the lively discussions and for contributing some innovative ideas.
