今日arXiv精选 (Today's arXiv Picks) | Pretrained Models & Transformers

About #今日arXiv精选

This is a column from 「AI 学术前沿」 (AI Academic Frontiers): each day, the editors select high-quality papers from arXiv and deliver them to readers.

BloomNet: A Robust Transformer based model for Bloom's Learning Outcome Classification

Comment: Bloom's Taxonomy, Natural Language Processing, Transformer, Robustness and Generalization

Link: http://arxiv.org/abs/2108.07249

Abstract

Bloom's taxonomy is a common paradigm for categorizing educational learning objectives into three domains: cognitive, affective, and psychomotor. To optimize educational programs, it is crucial to design course learning outcomes (CLOs) according to the different cognitive levels of Bloom's taxonomy. Usually, administrators of institutions manually complete the tedious work of mapping CLOs and examination questions to Bloom's taxonomy levels. To address this issue, we propose a transformer-based model named BloomNet that captures linguistic as well as semantic information to classify course learning outcomes (CLOs). We compare BloomNet with a diverse set of basic as well as strong baselines and observe that our model performs better than all the baselines in our experiments. Further, we test the generalization capability of BloomNet by evaluating it on distributions that the model does not encounter during training, and we observe that it is less susceptible to distribution shift than the other considered models. We support our findings with extensive result analysis. In an ablation study, we observe that explicitly encapsulating linguistic information along with semantic information improves the model's IID (independent and identically distributed) performance as well as its OOD (out-of-distribution) generalization capability.
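The abstract includes no code, but the core idea — fusing explicit linguistic features with a transformer's semantic embedding before classification — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the dimensions (768 for the semantic embedding, 32 linguistic features, 6 output classes for the cognitive levels) are assumptions.

```python
# Minimal sketch (not the authors' code): fuse a semantic sentence
# embedding with handcrafted linguistic features before classifying.
import torch
import torch.nn as nn

class FusedCLOClassifier(nn.Module):
    def __init__(self, sem_dim=768, ling_dim=32, n_classes=6):
        super().__init__()
        # classifier head over the concatenated representation
        self.head = nn.Sequential(
            nn.Linear(sem_dim + ling_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, sem_emb, ling_feats):
        # sem_emb:    (batch, sem_dim), e.g. a [CLS] embedding
        # ling_feats: (batch, ling_dim), e.g. POS-tag counts
        return self.head(torch.cat([sem_emb, ling_feats], dim=-1))

model = FusedCLOClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 32))  # shape (4, 6)
```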

Exploring Generalization Ability of Pretrained Language Models on Arithmetic and Logical Reasoning

Comment: Accepted by NLPCC2021

Link: http://arxiv.org/abs/2108.06743

Abstract

To quantitatively and intuitively explore the generalization ability of pre-trained language models (PLMs), we have designed several tasks of arithmetic and logical reasoning. We analyse how well PLMs generalize both when the test data comes from the same distribution as the training data and when it does not; for the latter analysis, we have also designed a cross-distribution test set in addition to the in-distribution test set. We conduct experiments on one of the most advanced publicly released generative PLMs, BART. Our research finds that PLMs generalize easily when the distribution is the same; however, it is still difficult for them to generalize out of distribution.
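As an illustration of the evaluation protocol described above (the paper's exact task construction may differ), here is a hypothetical way to build an in-distribution and a cross-distribution test set for an arithmetic task: the training and ID test sets share an operand range, while the OOD set shifts to longer operands.

```python
# Hypothetical illustration of ID vs. cross-distribution arithmetic
# splits; the actual paper's tasks and ranges may differ.
import random

def make_addition_examples(n, lo, hi, seed=0):
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        examples.append((f"{a} + {b} =", str(a + b)))
    return examples

train_set    = make_addition_examples(10000, 0, 99, seed=0)    # seen range
id_test_set  = make_addition_examples(1000, 0, 99, seed=1)     # same distribution
ood_test_set = make_addition_examples(1000, 100, 9999, seed=2) # shifted range
```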

Flying Guide Dog: Walkable Path Discovery for the Visually Impaired Utilizing Drones and Transformer-based Semantic Segmentation

Comment: Code, dataset, and video demo will be made publicly available

Code: https://github.com/EckoTan0804/flying-guide-dog

Link: http://arxiv.org/abs/2108.07007

Abstract

Lacking the ability to sense ambient environments effectively, blind and visually impaired people (BVIP) face difficulty in walking outdoors, especially in urban areas. Therefore, tools for assisting BVIP are of great importance. In this paper, we propose a novel "flying guide dog" prototype for BVIP assistance using a drone and street-view semantic segmentation. Based on the walkable areas extracted from the segmentation prediction, the drone can adjust its movement automatically and thus lead the user to walk along the walkable path. By recognizing the color of pedestrian traffic lights, our prototype can help the user cross a street safely. Furthermore, we introduce a new dataset named Pedestrian and Vehicle Traffic Lights (PVTL), which is dedicated to traffic-light recognition. The results of our user study in real-world scenarios show that our prototype is effective and easy to use, providing new insight into BVIP assistance.
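The abstract implies a control loop that turns a segmentation mask into drone movement. Below is a minimal sketch of one plausible step, not the authors' method: locate the walkable region closest to the user and derive a left/right steering offset. The class id and image layout are assumptions.

```python
# Hypothetical steering step: find the centroid of walkable pixels in the
# lower half of the mask and steer toward it. Class id 1 is an assumption.
import numpy as np

WALKABLE_ID = 1  # hypothetical class id for "sidewalk" in the mask

def steering_offset(seg_mask: np.ndarray) -> float:
    """Return a value in [-1, 1]: negative = steer left, positive = right."""
    h, w = seg_mask.shape
    lower = seg_mask[h // 2:, :]           # region closest to the user
    xs = np.nonzero(lower == WALKABLE_ID)[1]
    if xs.size == 0:
        return 0.0                         # no walkable pixels: hold course
    centroid_x = xs.mean()
    return float((centroid_x - w / 2) / (w / 2))

mask = np.zeros((480, 640), dtype=np.int64)
mask[240:, 400:600] = WALKABLE_ID          # walkable area to the right
print(steering_offset(mask))               # > 0, i.e. steer right
```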

Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Comment: Technical Report

Link: http://arxiv.org/abs/2108.06932

Abstract

Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features, and 2) designing an effective mechanism for fusing these features. Different from existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the influence of image acquisition and the elusive properties of polyps, we introduce three novel modules: a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features, while the CIM is applied to capture polyp information disguised in low-level features. With the help of the SAM, we extend the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects) than existing methods, and achieves new state-of-the-art performance. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.
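A minimal sketch of what cross-level fusion in the spirit of the CFM might look like, assuming nothing about the authors' internals: reduce and upsample a high-level semantic map to the low-level resolution, then fuse by convolution. All channel sizes are assumptions.

```python
# Illustrative cross-level fusion, not the authors' CFM.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedFusion(nn.Module):
    def __init__(self, high_ch=512, low_ch=128, out_ch=128):
        super().__init__()
        self.reduce = nn.Conv2d(high_ch, low_ch, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(low_ch * 2, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, high, low):
        # high: coarse map with semantic/location cues
        # low:  fine map where camouflaged detail hides
        high = self.reduce(high)
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        return self.fuse(torch.cat([high, low], dim=1))

cfm = CascadedFusion()
out = cfm(torch.randn(1, 512, 16, 16), torch.randn(1, 128, 64, 64))
print(out.shape)  # torch.Size([1, 128, 64, 64])
```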

No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency

Comment: None

Link: http://arxiv.org/abs/2108.06858

Abstract

The goal of No-Reference Image Quality Assessment (NR-IQA) is to estimate perceptual image quality in accordance with subjective evaluations; it is a complex and unsolved problem due to the absence of a pristine reference image. In this paper, we propose a novel model to address the NR-IQA task by leveraging a hybrid approach that benefits from Convolutional Neural Networks (CNNs) and the self-attention mechanism in Transformers to extract both local and non-local features from the input image. We capture local structure information of the image via CNNs; then, to circumvent the locality bias among the extracted CNN features and obtain a non-local representation of the image, we utilize Transformers on the extracted features, modeling them as a sequential input to the Transformer model. Furthermore, to improve the monotonicity correlation between the subjective and objective scores, we utilize the relative distance information among the images within each batch and enforce the relative ranking among them. Last but not least, we observe that the performance of NR-IQA models degrades when we apply equivariant transformations (e.g., horizontal flipping) to the inputs. Therefore, we propose a method that leverages self-consistency as a source of self-supervision to improve the robustness of NR-IQA models. Specifically, we enforce self-consistency between the outputs of our quality assessment model for each image and its transformation (a horizontal flip) to utilize the rich self-supervisory information and reduce the uncertainty of the model. To demonstrate the effectiveness of our work, we evaluate it on seven standard IQA datasets (both synthetic and authentic) and show that our model achieves state-of-the-art results on various datasets.
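The self-consistency idea is concrete enough to sketch: the model's predicted quality for an image and for its horizontal flip should agree. Below is a minimal version of such a loss term; `model` stands for any image-to-score network, and how it is weighted against the supervised loss is an assumption.

```python
# Minimal self-consistency term as described in the abstract:
# penalize disagreement between an image's score and its flip's score.
import torch
import torch.nn.functional as F

def self_consistency_loss(model, images):
    # images: (B, C, H, W)
    scores = model(images)
    flipped_scores = model(torch.flip(images, dims=[-1]))  # horizontal flip
    return F.mse_loss(scores, flipped_scores)

# Usage sketch (lambda_sc is an assumed weighting hyperparameter):
# total = supervised_loss + lambda_sc * self_consistency_loss(model, images)
```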

SOTR: Segmenting Objects with Transformers

Comment: ICCV 2021

Link: http://arxiv.org/abs/2108.06747

Abstract

Most recent transformer-based models show impressive performance on vision tasks, even better than Convolutional Neural Networks (CNNs). In this work, we present a novel, flexible, and effective transformer-based model for high-quality instance segmentation. The proposed method, Segmenting Objects with TRansformers (SOTR), simplifies the segmentation pipeline, building on an alternative CNN backbone appended with two parallel subtasks: (1) predicting per-instance categories via a transformer and (2) dynamically generating segmentation masks with a multi-level upsampling module. SOTR can effectively extract lower-level feature representations and capture long-range context dependencies through a Feature Pyramid Network (FPN) and a twin transformer, respectively. Meanwhile, compared with the original transformer, the proposed twin transformer is time- and resource-efficient since only row and column attention are involved to encode pixels. Moreover, SOTR is easily incorporated with various CNN backbones and transformer variants to considerably improve segmentation accuracy and training convergence. Extensive experiments show that SOTR performs well on the MS COCO dataset and surpasses state-of-the-art instance segmentation approaches. We hope our simple but strong framework can serve as a preferred baseline for instance-level recognition. Our code is available at https://github.com/easton-cau/SOTR.
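The row-and-column ("twin") attention can be sketched directly: attend within each row, then within each column, so the cost scales as O(HW(H+W)) rather than O((HW)^2) for full self-attention over pixels. This is an illustrative module, not SOTR's implementation; the dimensions are assumptions.

```python
# Illustrative twin (row-then-column) attention over a feature map.
import torch
import torch.nn as nn

class TwinAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, H, W, C)
        b, h, w, c = x.shape
        rows = x.reshape(b * h, w, c)                      # each row is a sequence
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c)
        cols = x.permute(0, 2, 1, 3).reshape(b * w, h, c)  # each column is a sequence
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 2, 1, 3)

attn = TwinAttention()
y = attn(torch.randn(2, 16, 16, 64))
print(y.shape)  # torch.Size([2, 16, 16, 64])
```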
