Abstract
The proposed ACE loss function exhibits two noteworthy properties:
- it can be applied directly to 2D prediction by flattening the 2D prediction into a 1D sequence as input
- it requires only the characters and their occurrence counts in the sequence annotation for supervision
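Both properties follow from how the loss is computed: the per-position class probabilities are simply averaged (so a 2D map can be flattened first), and the target is only a histogram of character counts. A minimal NumPy sketch, not the authors' reference implementation — the function name, the blank-at-index-0 convention, and the epsilon are my assumptions:

```python
import numpy as np

def ace_loss(probs, label_counts):
    """ACE loss sketch.

    probs: (T, C) array of per-position class probabilities; C includes a
           blank class at index 0 (assumed convention). A 2D prediction of
           shape (H, W, C) can first be flattened to (H*W, C).
    label_counts: dict mapping class index -> occurrence count in the label.
    """
    T, C = probs.shape
    counts = np.zeros(C)
    for k, n in label_counts.items():
        counts[k] = n
    counts[0] = T - counts[1:].sum()   # blank absorbs the remaining positions
    y_bar = probs.mean(axis=0)         # aggregate predictions over positions
    n_bar = counts / T                 # normalized ground-truth counts
    return float(-(n_bar * np.log(y_bar + 1e-10)).sum())
```

Note that the order of rows in `probs` never matters: only the aggregated distribution `y_bar` enters the loss, which is exactly why no sequence-order annotation is needed.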
Regarding the first point: this seems able to handle text in arbitrary layouts. The paper demonstrates both the 1D and 2D cases, overturning the earlier localize-then-recognize pipeline, which does look promising.
Regarding the second point, the weaker supervision raises a question: if the ground truth contains no sequence-order information, can the network still learn it? Does the ACE loss cost the network its ability to model sequential information?
Related Work
- Connectionist temporal classification: CNN-LSTM-CTC
- Attention mechanism: use an attention mechanism to locate each character.
Aggregation Cross-Entropy
For recognition tasks, the loss function can be abstracted into the form below, where $S$ is the annotation, $I$ the input, $\omega$ the network parameters, and $Q$ the training data.
$$
L(\omega) = -\sum_{(I,S)\in Q} \log P(S \mid I; \omega) = -\sum_{(I,S)\in Q} \sum_{l=1}^{L} \log P(S_l \mid I; \omega)
$$
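As a toy illustration of the decomposed form above for a single sample: the per-position probabilities $P(S_l \mid I; \omega)$ below are invented values, not taken from the paper.

```python
import math

# Hypothetical per-position probabilities P(S_l | I; w) for a 3-character label.
char_probs = [0.9, 0.8, 0.95]

# This sample's contribution to L(w): the sum of per-position negative logs.
nll = -sum(math.log(p) for p in char_probs)  # ≈ 0.3798
```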