fuxin607 - CSDN Blog

[Original] Cross-Modal Pre-training Transfer

1. ViLD, Zero-Shot Detection via Vision and Language Knowledge Distillation (code). 2. OVR-CNN, Open-Vocabulary Object Detection Using Captions [CVPR2021] (code). 3. LSeg, Language-driven Semantic Segmentation [ICLR2022] (code). 4. OpenSeg, Open-Vocabulary Image Segmentatio.

2022-02-17 19:50:36 1844

[Original] Cross-Modal Pre-training

1. LXMERT, LXMERT: Learning Cross-Modality Encoder Representations from Transformers [EMNLP2019]. Code: https://github.com/airsplay/lxmert

2021-12-11 17:19:01 3659 1

[Original] Recent Advances in Semantic Segmentation

https://www.jeremyjordan.me/semantic-segmentation/

2020-08-15 10:11:14 792

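[Original] The Long-Tail Problem in Computer Vision

Real-world computer vision problems often exhibit a long-tailed distribution: the amount of data per class is imbalanced, with some classes having very many samples and others very few. 1. Single-label classification. 2. Multi-label classification. 3. Object detection. 4. Semantic segmentation. ...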

2020-04-01 19:13:43 2492

[Translation] How to Do Research

A translation of the essay by MIT Professor Bill Freeman: http://people.csail.mit.edu/billf/publications/How_To_Do_Research.pdf

2020-01-28 12:10:47 1104

[Original] Research Progress in Image Captioning

An overview of several recent image captioning papers and their applications. 1. Google NIC, Show and Tell: A Neural Image Caption Generator [CVPR2015]. 2. Hard(soft)-Attention, Show, Attend and Tell: Neural Image Caption Generation with Visual A...

2019-11-08 12:10:20 684

[Original] Baby Talk and Neural Baby Talk

Baby Talk: Understanding and Generating Image Descriptions; Neural Baby Talk

2019-05-16 21:22:54 516

[Original] New Tasks in Cross-Media Analysis

1. Measuring Depression Symptom Severity from Spoken Language and 3D Facial Expressions, https://arxiv.org/pdf/1811.08592.pdf 2. Composing Text and Image for Image Retrieval - An Empirical Odyssey, ht...

2018-12-20 17:10:26 1037 2

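[Repost] 500 Questions on Deep Learning

"500 Questions on Deep Learning" uses a question-and-answer format to explain frequently asked topics in probability, linear algebra, machine learning, deep learning, computer vision, and related areas, to help the author and any readers who need it. The book has 18 chapters and nearly 300,000 characters. https://github.com/scutan90/DeepLearning-500-questions...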

2018-12-06 11:04:47 622

[Original] Ten Thoughts on Reading the NeurIPS 2018 Accepted Paper List

1. Variational methods are very popular: at least 35 papers are about variational methods.

2018-11-30 16:10:23 2137

[Repost] Text Retrieval

Resources on plain-text retrieval: https://github.com/NTMC-Community/awaresome-neural-models-for-semantic-match

2018-10-31 14:27:29 2180

[Original] How to Write Papers in Computer Vision

Title, Abstract, Introduction, Related Work, Proposed methods, Experiments, Conclusion and Future Work, Acknowledgements

2018-10-23 18:45:13 9105

[Original] Some Resources on GANs

VAE, BIRVAE, NSGAN, MMGAN, WGAN, WGANGP, LSGAN, DRAGAN, BEGAN, RaGAN, InfoGAN, fGAN, FisherGAN pytorch code

2018-09-26 08:05:17 511

[Original] Interesting Papers from ECCV2018

Double JPEG Detection in Mixed JPEG Quality Factors using Deep Convolutional Neural Network; Fighting Fake News: Image Splice Detection via Learned Self-Consistency; Face De-Spoofing: Anti-Spoofing vi...

2018-09-25 14:28:41 2348

[Repost] Natural Language Processing Conferences and Journals

A list for the subfield of natural language processing (NLP). Conferences (C): ACL (Annual Meeting of the Association for Computational Linguistics), NAACL (Annual Conference of the North American Chapter of Association for Computational Linguistic...

2018-05-24 10:10:07 6520

[Original] AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

A CVPR2018 paper on text-to-image synthesis. Paper: https://arxiv.org/abs/1711.10485; code has been released: https://github.com/taoxugit/AttnGAN; author's homepage: https://sites.google.com/view/taoxu ...

2018-05-17 21:08:53 3310

[Original] Progress in Text-Image Cross-Media Retrieval

An overview of nine papers on bidirectional text-image retrieval.

2018-05-15 11:07:54 12576 1

[Original] Finding “It”: Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos

A CVPR2018 Oral paper on weakly-supervised video grounding. Paper: http://ai.stanford.edu/~dahuang/papers/cvpr18-ramil.pdf; author's homepage: http://ai.stanford.edu/~dahuang/; code has not been released yet. What the paper does: Input: ...

2018-05-05 10:51:51 862

[Original] Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

A CVPR2018 paper on tricks for Visual Question Answering; the author is a member of the team that won the 2017 VQA Challenge. Paper: https://arxiv.org/abs/1708.02711; author's homepage: https://www.damienteney.info/adventures. What the paper does: visual question an...

2018-05-04 09:37:39 1054

[Original] Learning Cross-modal Embeddings for Cooking Recipes and Food Images

A CVPR2017 paper on cross-modal retrieval. Paper, data, and code: http://im2recipe.csail.mit.edu/; author's homepage: https://imatge.upc.edu/web/people/amaia-salvador. What the paper does (recipe retrieval): Input: image (sentence) + datas...

2018-05-03 16:48:37 691

[Original] Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings

An ACM SIGIR2018 paper on cross-modal retrieval. Paper: https://arxiv.org/pdf/1804.11146.pdf; the author is a PhD student at Paris 6 University; author's homepage: http://webia.lip6.fr/~carvalho/static/home/; code has not been released yet. What the paper does (recipe ret...

2018-05-02 10:12:36 458

[Original] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

A CVPR2018 Oral paper on Image Captioning and Visual Question Answering. Paper: https://arxiv.org/abs/1707.07998; author's homepage: http://www.panderson.me/; code has been released: https://github.com/peteanderson80/bott...

2018-04-28 16:11:31 982

[Original] Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs

A CVPR2016 paper on Temporal Action Localization. Paper: https://arxiv.org/abs/1601.02129; author's homepage: http://www.columbia.edu/~zs2262/research.html; code has been released: https://github.com/zhengshou/scnn/ ...

2018-04-28 09:30:37 1694

[Original] TALL: Temporal Activity Localization via Language Query

An ICCV2017 Spotlight paper on temporal activity localization via language query in an untrimmed video. Paper: https://arxiv.org/abs/1705.02101; author's homepage: https://jiyanggao.github.io/; code has been released: h...

2018-04-27 08:49:59 1560

[Original] Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

A paper dated April 24, 2018 that uses GANs and reinforcement learning (RL) for poetry generation from images. Paper: https://arxiv.org/abs/1804.08473; the authors' homepage and code have not been found yet. What the paper does: Input: image. Output: poetry. Examples shown in the paper. The paper and...

2018-04-26 20:03:54 785

[Original] Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning

A CVPR2018 Oral paper on Visual Dialog Generation. Paper: https://arxiv.org/abs/1711.07613; author's homepage: http://qi-wu.me/home.html; the first author is an Assistant Professor in Chunhua Shen's group at the University of Adelaide; code has not yet been rel...

2018-04-25 09:30:09 946

[Original] Learning Hierarchical Features for Scene Labeling

A TPAMI2013 paper on scene parsing (semantic segmentation); its conference version was published at ICML2012. Paper: https://ieeexplore.ieee.org/abstract/document/6338939/; author's homepage: http://www.clement.farabet.net/; the author is now VP of AI Infra...

2018-04-24 09:54:15 441

[Original] Efficient Video Object Segmentation via Network Modulation

A CVPR2018 paper on efficient video segmentation. Paper: https://arxiv.org/abs/1802.01218; author's homepage: https://sites.google.com/site/linjieyang89/; TensorFlow code has been released: https://github.com/linjieya...

2018-04-23 08:53:04 780

[Original] An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep RL

A paper on using reinforcement learning (RL) for Natural Language Object Retrieval. Paper: https://arxiv.org/abs/1703.07579; the authors' homepage was not found, but code has been released: https://github.com/jxwufan/NLOR_A3C. What the paper does: Input: te...

2018-04-22 10:29:02 481

[Original] Show, Reward and Tell

An AAAI2018 paper that uses GANs and reinforcement learning (RL) for Photo Stream Story Telling. Paper: https://pdfs.semanticscholar.org/977b/eecdf0b5c3487d03738cff501c79770f0858.pdf; the authors' homepage and code have not been found yet. The paper's title, Show, Reward and ...

2018-04-21 10:03:34 485

[Original] Deep Reinforcement Learning for Dialogue Generation

An EMNLP2016 paper on using reinforcement learning (RL) for dialogue generation. Paper: https://arxiv.org/abs/1606.01541; the first author is again the renowned Jiwei Li (reportedly the first PhD in Stanford CS to graduate in three years), now the founder of Shannon.AI; author's homepage: http://stanford.edu/~jiweil/i...

2018-04-20 09:37:23 1008

[Original] Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

A CVPR2018 paper on cross-media retrieval. Paper: https://arxiv.org/abs/1711.06420; the first author is a PhD student at Nanyang Technological University; author's homepage: http://jxgu.cc/; code has been released: https://github.com/ujiuxiang/NLP_Practice.PyTorch/tree/master/cross_modal_retr...

2018-04-19 08:18:36 2596 12

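[Original] Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

A NAACL2018 paper on video captioning (combining CV and NLP). Paper: https://arxiv.org/abs/1804.05448; the first author is a PhD student at the University of California, Santa Barbara (UCSB); author's homepage: http://www.cs.ucsb.edu/~xwang/; code has not been released (the author does not usually release code). Personal ramblings: looking at...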

2018-04-18 09:12:42 636 1

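[Original] A Hierarchical Neural Autoencoder for Paragraphs and Documents

An ACL2015 paper on natural language generation. Paper: https://arxiv.org/abs/1506.01057; the first author is the renowned Jiwei Li (reportedly the first PhD in Stanford CS to graduate in three years), now the founder of Shannon.AI; author's homepage: http://stanford.edu/~jiweil/index.html; code has been released on GitHub: https://gi...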

2018-04-17 08:10:01 1250

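[Original] Actor and Action Video Segmentation from a Sentence

A CVPR2018 Oral paper on cross-media analysis (combining video and NLP). Paper: https://arxiv.org/abs/1803.07485; the first author is a PhD student at the University of Amsterdam in the Netherlands; author's homepage: https://kgavrilyuk.github.io/; code and datasets have not been released yet. Personal ramblings: this is the first published paper I have seen that uses NLP for video se...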

2018-04-16 08:40:36 1847

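[Original] Margin Sample Mining Loss: A Deep Learning Based Method for Person Re-identification

This paper was posted on arXiv back in 2017. The first author is a student at CMU, and the paper was probably submitted to CVPR2018 and rejected. The author's homepage was not found, but the second author released Keras code on GitHub: https://github.com/michuanhaohao/keras_reid. Personal ramblings: I saw this paper long ago on the WeChat public account PaperWeekly, and this morning I suddenly remembered it, ...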

2018-04-15 10:08:20 899

[Original] Image-to-Image Translation with Conditional Adversarial Networks

A CVPR2017 paper, the famous pix2pix. Paper: https://arxiv.org/abs/1611.07004; the first author is an MIT PhD, now a postdoc at UC Berkeley; author's homepage: https://people.eecs.berkeley.edu/~isola/; code has been released on GitHub: https://github.com/phil...

2018-04-14 09:38:15 1162

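[Original] Text2Colors: Guiding Image Colorization through Text-Driven Palette Generation

A cross-media (combining NLP and CV) paper from Korea University, updated on arXiv on April 13, 2018. The first author is a graduate student; team homepage: http://davian.korea.ac.kr; paper: https://arxiv.org/pdf/1804.04128.pdf. Judging by the format, it appears to be under submission to ECCV2018. The authors have released the PyTorch code and dataset on GitHub: https://git...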

2018-04-13 15:13:39 798

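[Original] 360 Image Stitching

This is a post I wrote long ago under another CSDN account and copied over directly. Definition of image stitching: a technique in which a set of images with overlapping regions is first spatially registered, then image transformation, resampling, and image fusion produce a single wide-angle or 360-degree panoramic image containing the whole image sequence. Image stitching mainly involves three key steps: image preprocessing, image registration, and image fusion. 1. Image preprocessing. Purpose: reduce the difficulty of image registration and improve registration accuracy. 1.1 ...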

2018-04-12 14:56:01 3073
