Notes on English Expressions in Deep Learning Papers

  • In this paper, we address xxx problem by introducing a novel xxx that can xxx;
  • xxx attains performance improvement by xxx;
  • we employ xxx as a backbone network;
  • adopt xxx for visibility enhancement;
  • Experimental results show that our xxx achieved 50.84% mean average precision (mAP) on xxx, outperforming many state-of-the-art xxx while maintaining a high speed.
  • to cope with the problem of detecting objects in poor visibility conditions.
  • Quantitative and qualitative evaluation results show that our proposed approach surpasses the accuracy of current state-of-the-art object detectors and the cascade of dehazing and detection models.
  • Section xxx provides/describes/presents xxx.
  • Many multi-task learning methods have been proposed and proven to be effective for various deep learning applications in the computer vision field.
  • The detection subnet employs a CNN, namely, RetinaNet
  • The restoration subnet is designed by attaching a proposed feature recovery (FR) module to the CB module for visibility enhancement, as shown in Fig. 1
  • These two subnetworks share the CB module to ensure that the clean features (fC2) produced at this module can be used in both subnetworks during joint learning.
  • have achieved impressive results
  • However, the performance of these approaches ultimately depends on the realism of the physical model, and many works only concentrate on everyday photography
  • Our approach extends existing learning-based approaches by xxx
  • improve by leveraging
  • the student IN-encoder tries to mimic the behavior of the teacher AN-encoder
  • We disentangle these two tasks
  • diminishes computation cost
  • elevate the model's capability to adapt to general scenes
  • we make comprehensive explorations to excavate SCI’s inherent properties
  • Applications on low-light face detection and nighttime semantic segmentation fully reveal the latent practical values for SCI.
  • To settle the above issues, we develop a novel xxx for xxx.
  • endowing the adaptation ability towards diverse scenes
  • Extensive experiments are conducted to illustrate our superiority against other state-of-the-art methods.
  • It should be noted that
  • they also cause two main issues
  • The convolution filters have static weights at inference, and thereby cannot flexibly adapt to the input content.
  • To deal with the above-mentioned shortcomings, xxx
  • its complexity grows quadratically with the spatial resolution, therefore making it infeasible to apply to high-resolution images.
  • Recently, a few efforts have been made to tailor Transformers for image restoration tasks.
  • To reduce the computational loads
  • It applies SA across feature dimension rather than the spatial dimension
  • MDTA computes cross-covariance across feature channels to obtain an attention map from the (key and query projected) input features.
  • it emphasizes the spatially local context and brings in the complementary strength of the convolution operation within our pipeline.
  • The main contributions of this work are summarized below:
  • To alleviate this issue
  • The key ingredient is to apply SA across channels rather than the spatial dimension, i.e., to compute cross-covariance across channels to generate an attention map encoding the global context implicitly.
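
A minimal sketch of this channel-wise ("transposed") self-attention, assuming PyTorch and a single attention head; the class name and the learnable temperature placement are illustrative, not Restormer's exact MDTA code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSelfAttention(nn.Module):
    """Attention across channels: the attention map is C x C, so the
    cost grows linearly with spatial size instead of quadratically."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.temperature = nn.Parameter(torch.ones(1))

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)   # three (B, C, H, W) maps
        q = F.normalize(q.flatten(2), dim=-1)   # (B, C, H*W), unit rows
        k = F.normalize(k.flatten(2), dim=-1)
        v = v.flatten(2)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (B, C, C)
        attn = attn.softmax(dim=-1)
        return (attn @ v).view(b, c, h, w)
```

Because the attention map is C × C rather than (HW) × (HW), the quadratic-in-resolution cost noted a few bullets above disappears.
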
  • The model trained on mixed-size patches via progressive learning shows enhanced performance at test time where images can be of different resolutions (a common case in image restoration).
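
A hedged sketch of that progressive schedule, assuming a PyTorch-style training loop; the milestone values here are made up for illustration, not the paper's settings:

```python
# Illustrative milestones: (start_epoch, patch_size, batch_size).
# Training starts on small crops and moves to larger ones, shrinking
# the batch so memory use stays roughly constant.
MILESTONES = ((0, 128, 64), (30, 192, 32), (60, 256, 16), (90, 320, 8))

def patch_config(epoch):
    """Return the (patch_size, batch_size) active at this epoch."""
    patch, batch = MILESTONES[0][1], MILESTONES[0][2]
    for start, p, b in MILESTONES:
        if epoch >= start:
            patch, batch = p, b
    return patch, batch

for epoch in range(100):
    patch, batch = patch_config(epoch)
    # rebuild the dataloader with random crops of size `patch` and
    # batch size `batch`, then run one training epoch as usual
```
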
  • Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3× or more) and improve accuracy
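
In MAE the two designs are a high masking ratio (e.g., 75%) and an asymmetric encoder that processes only the visible patches, which is where the speed-up comes from. A minimal sketch of the random-masking step, assuming patch tokens arrive as a (B, N, D) tensor; the function name is ours, not the paper's:

```python
import torch

def random_masking(tokens, mask_ratio=0.75):
    """Keep a random subset of patch tokens per sample; return the
    visible tokens, a binary mask (1 = masked), and the indices
    needed to restore the original order for the decoder."""
    b, n, d = tokens.shape
    n_keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n)            # one random score per token
    shuffle = noise.argsort(dim=1)      # random permutation per sample
    restore = shuffle.argsort(dim=1)    # inverse permutation
    keep = shuffle[:, :n_keep]
    visible = torch.gather(tokens, 1, keep.unsqueeze(-1).expand(-1, -1, d))
    mask = torch.ones(b, n)
    mask.scatter_(1, keep, 0.0)         # 0 = visible, 1 = masked
    return visible, mask, restore
```

Only `visible` (a quarter of the tokens at the default ratio) goes through the large encoder; mask tokens are reintroduced only in the lightweight decoder.
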
  • Our scalable approach allows for learning high-capacity models that generalize well
  • Transfer performance in downstream tasks outperforms supervised pretraining and shows promising scaling behavior
  • Aided by the rapid gains in hardware, models today can easily overfit one million images and begin to demand hundreds of millions of—often publicly inaccessible—labeled images.
  • the progress of autoencoding methods in vision lags behind NLP.
  • In vision, convolutional networks were dominant over the last decade
  • This architectural gap, however, has been addressed with the introduction of Vision Transformers (ViT) and should no longer present an obstacle
  • this task appears to induce sophisticated language understanding
  • xxx results in a large reduction in computation
  • Surprisingly, our MAE behaves decently even if using no data augmentation (only center-crop, no flipping).
  • Self-supervised learning in vision may now be embarking on a similar trajectory as in NLP.
  • Likewise,
  • Nevertheless, we observe (e.g., Figure 4) that our MAE infers complex, holistic reconstructions, suggesting it has learned numerous visual concepts, i.e., semantics.
  • We hypothesize that
  • These issues warrant further research and consideration when building upon this work to generate images.
  • Broader impacts
  • Traditional low light enhancement methods [5, 6] are designed for improving visual quality and therefore cannot fill the semantic gap, as shown in Fig. 1 (b)
  • By brightening the low light images and distorting the normal light images, we build intermediate states that lie between the normal and low light.
  • we design a bidirectional scheme.
  • Existing low light enhancement methods are mainly designed for human vision rather than machine vision.
  • From a perspective on contrastive learning as dictionary look-up, we build a xxx
  • MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks
  • But supervised pre-training is still dominant in computer vision
  • The reason may stem from differences in their respective signal spaces
  • Several recent studies present promising results on unsupervised visual representation learning using approaches related to the contrastive loss
  • From this perspective, we hypothesize that it is desirable to build dictionaries that are: (i) large and (ii) consistent as they evolve during training.
  • Contrastive learning is at the core of several recent works on unsupervised learning [61, 46, 36, 66, 35, 56, 2], which we elaborate on later in context (Sec. 3.1).
  • Our hypothesis is that good features can be learned by a large dictionary that covers a rich set of negative samples, while the encoder for the dictionary keys is kept as consistent as possible despite its evolution.
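
The hypothesis above maps to MoCo's two mechanisms: a queue that keeps the dictionary large, and a momentum update that keeps the key encoder consistent. A hedged sketch of both, with illustrative names and shapes:

```python
import torch

@torch.no_grad()
def momentum_update(key_encoder, query_encoder, m=0.999):
    """Key encoder drifts slowly toward the query encoder, so the keys
    already in the dictionary stay consistent with new ones."""
    for pk, pq in zip(key_encoder.parameters(), query_encoder.parameters()):
        pk.mul_(m).add_(pq, alpha=1 - m)

@torch.no_grad()
def enqueue_dequeue(queue, new_keys):
    """FIFO dictionary: the newest mini-batch of keys replaces the
    oldest. queue: (K, D), new_keys: (B, D) with B << K."""
    return torch.cat([new_keys, queue], dim=0)[: queue.shape[0]]
```
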
  • This training strategy helps Restormer to learn context from large images, and subsequently provides quality performance improvements at test time.
  • These weights are applied to the feature maps U to generate the output of the SE block which can be fed directly into subsequent layers of the network.
  • Finally, since the above three attention mechanisms are applied sequentially, we can nest Equation 2 multiple times to effectively stack multiple πL, πS, and πC blocks together.
  • each pixel value of the enhanced image should fall in the normalized range of [0,1] to avoid information loss induced by overflow truncation;
  • θ(·) is implemented similarly to [3], which first conducts a global average pooling on L × S dimensions to reduce the dimensionality, then uses two fully connected layers and a normalization layer, and finally applies a shifted sigmoid function to normalize the output to [−1, 1].
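
That recipe maps directly to code; a minimal sketch assuming PyTorch, an input laid out as (B, L, S, C), and an illustrative hidden width. It is the same squeeze-then-excite pattern as the SE block a few notes above, except the output is shifted into [−1, 1]:

```python
import torch
import torch.nn as nn

class Theta(nn.Module):
    """GAP over (L, S) -> two FC layers with a norm -> output in [-1, 1]."""
    def __init__(self, channels, hidden=None):
        super().__init__()
        hidden = hidden or max(channels // 4, 4)
        self.fc1 = nn.Linear(channels, hidden)
        self.norm = nn.LayerNorm(hidden)
        self.fc2 = nn.Linear(hidden, channels)

    def forward(self, x):                 # x: (B, L, S, C)
        z = x.mean(dim=(1, 2))            # global average pool over L and S
        z = torch.relu(self.norm(self.fc1(z)))
        z = self.fc2(z)                   # (B, C)
        return 2.0 * torch.sigmoid(z) - 1.0   # shifted sigmoid -> (-1, 1)
```
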
  • Usage of denote:
    • A denote B
    • Let A denote B
    • we denote B by A
    • we use/utilize A to denote B
    • we denote with A B
  • Describing a model as cumbersome:
    • xxx system/model is/becomes cumbersome
    • heavy computation
    • heavy computational resources
  • test dataset, training dataset
  • Meanwhile, adjusting the illumination map to enhance reflectance I_R (scheme two) cannot fully keep the visual naturalness of the predicted normal-light image
  • More specifically, since the degradation additionally damages the texture, color, and contrast of normal-light images, we argue that subtracting the perturbation I_N from its low-light input I_L directly (scheme one) cannot fully recover multiple kinds of fine details and color distribution (refer to MIRNet (Zamir et al. 2020) in Figure 1).
  • refining the color and contrast information by a refinement generator (ReG)
  • xxx takes I_B as input, and applies ReG to estimate the inverse transformation matrix to further refine the color and textural details.
  • project the input into the feature space to extract initial features
  • Such degenerated images severely hinder some downstream tasks from operating smoothly, such as semantic segmentation or object detection in vision-based driving assistance systems.
  • high-quality positive samples without brightness and color defects are challenging to acquire in practice.
  • SCL-LLE casts the image enhancement task as multi-task joint learning, where LLE is converted into three constraints of contrastive learning, semantic brightness consistency, and feature preservation for simultaneously ensuring the exposure, texture, and color consistency.
  • We train SCL-LLE end-to-end while fixing the weights of the semantic segmentation network and the feature extraction network.
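
In PyTorch terms, fixing those weights just means disabling their gradients and keeping the modules in eval mode; a minimal sketch, where `seg_net` and `feat_net` are hypothetical module names standing in for the two fixed networks:

```python
def freeze(module):
    """Exclude a pretrained module from joint training."""
    module.eval()                      # also fixes BatchNorm statistics
    for p in module.parameters():
        p.requires_grad = False

# freeze(seg_net)
# freeze(feat_net)
# optimizer = torch.optim.Adam(enhance_net.parameters(), lr=1e-4)
```
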
  • However, these methods would require a diverse and large collection of paired night images taken with and without light effects, which is intractable to obtain.
  • In this paper, we introduce an unsupervised learning approach that integrates a decomposition network and a light-effects suppression network in a single unified framework.
  • We employ the structure consistency based on the VGG network and utilize the guided filter to obtain high-frequency (HF) features.
  • Hence, even when non-linear images are used in training, applications that are less concerned about physically correct intensity values but suffer from light effects can benefit from our method.
  • To better suppress light effects, we integrate our decomposition network with an unpaired light-effects suppression network.
  • the model learned from them cannot be directly deployed into real-world scenarios due to domain shift
  • achieve a good balance between
  • In this paper, we propose a self-supervised low light image enhancement framework to realize automatic image contrast enhancement and denoising simultaneously.
  • the method proposed in this paper has good generalization ability
  • This paper proposes a deep learning method for low-light image enhancement, which exploits the generation capability of Neural Networks (NNs) while requiring no training samples except the input image itself.
  • while the complex noise with spatially-varying characteristics is handled by an illumination-adaptive self-supervised denoising module.
  • The enhancement is done by jointly optimizing the Retinex decomposition and the illumination adjustment.
  • Appropriate discrepancy between the two untrained NNs is needed for accurately determining the attribution of image gradients.
  • a direct inversion for estimating the reflectance will significantly magnify the measurement noise
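
The reason is visible in one line of the Retinex model; a sketch of the argument with assumed notation (I observation, R reflectance, L illumination, n noise):

```latex
I = R \circ L + n
\quad\Longrightarrow\quad
\hat{R} = I \oslash L = R + n \oslash L
```

In dark regions L → 0, so the residual term n ⊘ L blows up, which is why the direct element-wise division is avoided.
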
  • have the merit of training with both paired and unpaired data.
  • the proposed network is well designed to extract a series of coarse-to-fine band representations, whose estimations are mutually beneficial in a recursive process.
  • It aims to restore an image captured in the low-light condition to a normal one, where visibility, contrast, and noise are expected to be improved, stretched, and suppressed, respectively.
  • First, we propose a novel “generative” strategy for Retinex decomposition, by which the decomposition is cast as a generative problem.
  • As we can see, the overall framework mainly consists of two encoder-decoder networks, i.e., DIP1 and DIP2.
  • The images in X have negligible light effects, while the images in Y have prominent glow, glare, or floodlight light effects.
  • Such photos not only look unpleasing and fail to capture what the user desires, but also challenge many fundamental computer vision tasks, such as segmentation, object detection and tracking, since the underexposed regions have barely-visible details and relatively low contrast, as well as dull colors
  • Further, we adopt bilateral-grid-based upsampling to reduce the computational cost, and design a loss function that adopts various constraints and priors on illumination, so that we can efficiently recover underexposed photos with natural exposure, proper contrast, clear details, and vivid color.
  • We also find that such interventions do improve student accuracy, but there still remains a large discrepancy between the predictive distributions of the teacher and the student.
  • For these large models there is still a significant accuracy gap between student and teacher, so fidelity is aligned with generalization.
  • Recalling that
  • LKD is the added knowledge distillation term that encourages the student to match the teacher.
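
A common instantiation of that LKD term is Hinton-style distillation on temperature-softened logits; a sketch under that assumption (T and alpha are illustrative hyperparameters, not necessarily the paper's):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Cross-entropy to the labels plus a soft-target term that
    encourages the student to match the teacher's distribution."""
    ce = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (T * T)            # rescale so gradients stay comparable across T
    return (1 - alpha) * ce + alpha * soft
```
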
  • Notably, in this case, self-distillation does not improve generalization, since the slight difference between the teacher and student accuracy is explained by variance between trials
  • Miscalibration can be exacerbated by overfitting due to the minimization of the cross-entropy during training, as it promotes the predicted softmax probabilities to match the one-hot label assignments.
  • In order to alleviate the reliance on training data, instead of restricting the output to be consistent with the reference directly, we employ a contrastive loss to distinguish the degraded images and their high-quality counterparts automatically.
  • xxx integrated feature fusion modules into the CNN architecture
  • uneven exposure
  • Please refer to the supplementary materials for more details. In the following section, we focus more on the learning of the sampling coordinates P.
  • Predicting the sampling color coordinates is equivalent to learning the placement of the sampling points in the 3D color space. Although the totally free sampling points placement provides high flexibility, it complicates the lookup procedure and increases the overhead significantly. To this end, we present a simple yet effective way to achieve the so-called constrained sampling point placement.
  • Please refer to Section 4.2 for more implementation details.
  • Such a procedure can be vividly analogized to a rendering process, as illustrated in Figure 2.
  • modify the local contrast and attenuate the high-intensities of the input image.
  • people have different retouching preferences.
  • taking these considerations into account, we propose
  • divide-and-conquer strategy