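UP-DETR key ideas:
(1) To keep query patch detection from destroying the classification features, the pre-trained backbone is frozen, and patch feature reconstruction is used to preserve the feature discrimination of the transformer.
(2) Different object queries focus on different position areas and box sizes. Pre-training starts from a simple single-query version and is then extended to a multi-query version. For multi-query patches, object query shuffle and an attention mask solve the assignment problem between query patches and object queries.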
Independence of Query Patches
The cropped patches are randomly selected and independent of each other. This independence must be preserved throughout the decoder, i.e., object queries assigned to one patch must not interact with object queries from other patches. This can be enforced with an attention mask, which is added to the Q-K similarity scores before the softmax when computing attention. The mask value is -infinity when Q and K belong to different image patches and 0 when they belong to the same image patch, so attention weights across patches become exactly zero.
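
The mask described here can be sketched in a few lines of plain Python. This is only an illustration, not the UP-DETR code: the helper names `build_patch_attention_mask` and `masked_softmax` are hypothetical, and the sketch assumes that object queries are assigned to patches in contiguous, equal-sized groups (for example, 6 queries and 3 patches give groups of 2).

```python
import math

NEG_INF = float("-inf")


def build_patch_attention_mask(num_queries, num_patches):
    """Build an additive (num_queries x num_queries) mask as a list of lists.

    Assumption for this sketch: queries are split into contiguous,
    equal-sized groups, one group per query patch.
    """
    if num_queries % num_patches != 0:
        raise ValueError("num_queries must be divisible by num_patches")
    group_size = num_queries // num_patches
    # group[i] is the index of the patch that query i is assigned to.
    group = [i // group_size for i in range(num_queries)]
    # 0 for pairs in the same patch group, -inf for pairs in different groups.
    return [
        [0.0 if group[q] == group[k] else NEG_INF for k in range(num_queries)]
        for q in range(num_queries)
    ]


def masked_softmax(scores, mask_row):
    """Add the mask to one row of Q-K similarity scores, then apply softmax."""
    logits = [s + m for s, m in zip(scores, mask_row)]
    peak = max(logits)  # finite, because a query always shares a group with itself
    exps = [math.exp(x - peak) for x in logits]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Usage: 6 object queries, 3 patches, so 2 queries per patch.
mask = build_patch_attention_mask(num_queries=6, num_patches=3)
weights = masked_softmax([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], mask[0])
# Query 0 attends only to queries 0 and 1; the other four weights are 0.0.
```

Because the mask is added before the softmax, the -infinity entries turn into zero attention weights, which is what keeps the queries of one patch from seeing the queries of another. In a tensor framework the same mask would typically be built once and passed to the decoder self-attention layers.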