Challenge interpretation
https://tianchi.aliyun.com/competition/entrance/532048/introduction
Data analysis
- The labels are balanced.
- The data can generally be divided into two classes: screenshot and text.
- Tampering in the text class seems more obvious than in the screenshot class.
- The tampered areas are smaller than in other benchmarks.
- The resolutions of the training and test data differ: the training images are larger.
- Most images have a white background.
- Screenshot images have an aspect ratio of roughly 2:1, while text images are closer to 1:1.
Solution
Augmentation
- RandomRotation90
- Multi-resolution
- Random augmentation
- Padding
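A minimal sketch of how padding and 90-degree rotation could be combined is given here. It works on a plain 2D list of pixel values so that it stays self-contained. The helper names (`pad_to_square`, `rotate90`, `augment`) are hypothetical, and the white fill value is an assumption based on the observation that most backgrounds are white; the actual pipeline may differ.

```python
import random

WHITE = 255  # assumption: pad with white, since most backgrounds are white


def pad_to_square(img, fill=WHITE):
    """Pad a 2D image (list of rows) symmetrically so that height == width."""
    h, w = len(img), len(img[0])
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    out = [[fill] * side for _ in range(side)]
    for y in range(h):
        for x in range(w):
            out[top + y][left + x] = img[y][x]
    return out


def rotate90(img, k):
    """Rotate a 2D image counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: 90 degrees counter-clockwise.
        img = [list(row) for row in zip(*img)][::-1]
    return img


def augment(img, rng=random):
    """Pad to a square first so that rotation never changes the input shape."""
    return rotate90(pad_to_square(img), rng.randrange(4))
```

Padding before rotating keeps the 2:1 screenshots and the 1:1 text images at the same square shape, so a later resize does not distort the aspect ratio.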
Optimization & Learning rate scheduler
- AdamW
- One Cycle
- Early stopping
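The schedule and stopping rule listed above can be sketched in plain Python. This is a simplified approximation of a cosine one-cycle schedule; the parameter names and default values mirror those of PyTorch's `OneCycleLR`, but the concrete values, the `patience`, and the choice to monitor a validation score are assumptions, not the settings used in this solution.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=1e-3, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a simplified cosine one-cycle schedule."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_steps = pct_start * total_steps
    if step < warmup_steps:
        # Phase 1: warm up from initial_lr to max_lr.
        start, end = initial_lr, max_lr
        pct = step / warmup_steps
    else:
        # Phase 2: anneal from max_lr down to min_lr.
        start, end = max_lr, min_lr
        pct = (step - warmup_steps) / (total_steps - warmup_steps)
    # Cosine interpolation from `start` (pct = 0) to `end` (pct = 1).
    return end + (start - end) * (1 + math.cos(math.pi * pct)) / 2


class EarlyStopping:
    """Stop when the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's score; return True when training should stop."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```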
Loss
- Smoothing BCE loss, weight: 1/(epoch+1)
- Focal loss, weight: 1-1/(1+epoch) (this loss may not be necessary because the classes are not imbalanced)
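The epoch-dependent weighting above can be written out for a single prediction as follows. This is only a sketch: it assumes that epochs are counted from 0, and the smoothing factor (0.1), the focal `gamma` (2.0), and the function names are illustrative assumptions that the write-up does not specify.

```python
import math

EPS = 1e-7  # keeps log() away from 0


def loss_weights(epoch):
    """Weights from the write-up: BCE 1/(epoch+1), focal 1 - 1/(epoch+1)."""
    w_bce = 1.0 / (epoch + 1)
    return w_bce, 1.0 - w_bce


def smoothing_bce(p, y, smoothing=0.1):
    """Binary cross-entropy with the hard label y pulled toward 0.5."""
    p = min(max(p, EPS), 1 - EPS)
    y_s = y * (1 - smoothing) + 0.5 * smoothing
    return -(y_s * math.log(p) + (1 - y_s) * math.log(1 - p))


def focal_loss(p, y, gamma=2.0):
    """Focal loss: down-weights examples that are already well classified."""
    p = min(max(p, EPS), 1 - EPS)
    p_t = p if y == 1 else 1 - p
    return -((1 - p_t) ** gamma) * math.log(p_t)


def total_loss(p, y, epoch):
    """Blend that shifts from smoothing BCE to focal loss as training goes on."""
    w_bce, w_focal = loss_weights(epoch)
    return w_bce * smoothing_bce(p, y) + w_focal * focal_loss(p, y)
```

With this weighting, the first epoch uses only the smoothing BCE term, and the focal term dominates in later epochs.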
Did Not Work
- Seg head (adding a segmentation head as an auxiliary loss)
- Augmentation: Flip, RandomBrightness, RandomContrast, RandomCrop, RandomAffine, RandomThinPlateSpline, RandomGamma
- Sliding-window prediction
- TTA (rotation, flip, multi-resolution)
- Training on all data (overfitting?)
- Extra data (some common datasets, datasets from the last competition, and datasets we produced)
- Smaller resolutions: (512,512), (768,768)
- Larger resolution: (1536,1536) (overfits: CV reaches 89.2; try training at (1024,1024), then fine-tuning at (1536,1536))
- Other models: MVSS (CV 82, LB 56), SegFormer
- K-fold
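For reference, sliding-window prediction from the list above can be sketched as tiling the image and aggregating the per-window probabilities. The window size, the stride, and the max-aggregation rule are assumptions made for illustration; the write-up does not describe the variant that was tried, only that it did not help.

```python
def sliding_windows(width, height, window=512, stride=256):
    """Return (x0, y0, x1, y1) boxes that cover the whole image."""

    def starts(size):
        if size <= window:
            return [0]
        pos = list(range(0, size - window + 1, stride))
        if pos[-1] != size - window:
            pos.append(size - window)  # last window sits flush with the border
        return pos

    return [
        (x, y, min(x + window, width), min(y + window, height))
        for y in starts(height)
        for x in starts(width)
    ]


def aggregate(window_probs):
    """Assumed rule: the image is as suspicious as its most suspicious window."""
    return max(window_probs)
```

For example, `sliding_windows(1024, 512)` yields three overlapping 512x512 boxes along the width.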
Model ensemble
We ensembled five models without K-fold.
- Multi-stage in EfficientNet-B2 (padding->resize, LB: 83.4467, CV: 85.83749)
- Multi-stage in EfficientNet-B2 (LB: 82.64, CV: 85.0125)
- Multi-feature: RGB + BayerConv + SRMConv2D, backbone: EfficientNet-B2 (padding->resize, LB: 80.68, CV: 80.63)
- ConvNeXt-Small (padding->resize, LB: 80.5133, CV: 83.8125)
- EfficientNet-B2 (resize->padding, LB: 82.98, CV: 84.475)
The large-scale model ((1536,1536), LB 82.9) did not contribute much to the ensemble.
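The write-up does not state how the five models were combined. One common choice, shown here purely as an assumption, is a (weighted) average of the per-image probabilities; the function name `ensemble_mean` is hypothetical.

```python
def ensemble_mean(prob_lists, weights=None):
    """Weighted average of per-model probabilities for the same images.

    prob_lists: one list of probabilities per model, all in the same image order.
    weights:    optional per-model weights; defaults to a plain average.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_images = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_images)
    ]
```

Per-model weights could, for instance, be chosen from the CV scores listed above, although whether that helps would need to be checked on validation data.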
Case Analysis
CAM
The probability of being tampered is 0.99:
The probability of being untampered is 0.003:
The CAM visualization shows that the model does not attend to the actual tampered regions, which indicates overfitting.
Probability
We analyzed the predicted probabilities on the test data and found that the confidence decreases when the background changes.
The confidence is 0.74
Difficulties in the text category
My teammate split the data into two categories: screenshot and text. I calculated the score for each of these categories, and the results were a shock: all models perform best on screenshots and poorly on text (we suspect poor performance on invoices).
We tested three models. They all score about 99 on screenshots and 70 to 80 on text. The overfitted model (CV 89.2) scores 99.9 on screenshots and 85 on text.
So we wanted to design a model that performs well on text. We tried fine-tuning the models on the test images, but the score rose only slightly.
Unfortunately, the preliminary round was coming to an end, and we had no more chances or time to try.
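The per-category breakdown described in this section can be reproduced with a small helper. The competition metric is not restated in this write-up, so plain accuracy (in percent) at a 0.5 threshold is used as a stand-in, and the record layout `(category, label, probability)` is a hypothetical format.

```python
from collections import defaultdict


def score_by_category(records, threshold=0.5):
    """Accuracy in percent per category.

    records: iterable of (category, label, probability), where label is 0 or 1
             and probability is the predicted probability of tampering.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for category, label, prob in records:
        pred = 1 if prob >= threshold else 0
        hits[category] += int(pred == label)
        counts[category] += 1
    return {c: 100.0 * hits[c] / counts[c] for c in counts}
```

Splitting the score this way makes it visible when a strong overall number hides a weak category, as happened here with the text images.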
Acknowledgements
Thanks to the organizers for hosting this competition. The tampering task was challenging, inspired research and thinking, and its requirements were closely related to practical applications. This competition opened my eyes to many situations where conventional tricks do not work. Tianchi provided participants with a platform to practice and carefully answered everyone's questions, allowing us to gain a lot of valuable experience. We hope to have the opportunity to participate in such competitions again in the future.
Thanks to my teammate Peng for the lively discussions and for providing some innovative ideas.