August 2018 | 黄鑫 (huangxin)

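[Repost] Python Imaging Library (PIL)

1. Basic use of the Image module. To load an image from a file, call the open() function in the Image module:
    >>> from PIL import Image
    >>> im = Image.open("hopper.ppm")
If successful, this function returns an Image object. You can now use instance attributes to examine the file contents:
    >>> from __future__ import print_fun...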

2018-08-28 16:33:21 249

[Original] PyTorch: Weight Initialization

Because a weight is a Variable, you only need to take its data attribute and then process it however you need:
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.normal(m.weight.data)
            nn.init.xavier_normal(m.weight.data)
            nn...
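A minimal sketch of the loop the excerpt describes, under a few assumptions. The small `model` is a hypothetical stand-in, not the post's network. The calls use the in-place `nn.init.*_` functions (trailing underscore), which newer PyTorch releases provide in place of the deprecated `nn.init.normal` and `nn.init.xavier_normal` spellings. The zero biases and `std=0.01` are illustrative choices, not values from the post.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only so there are layers to initialize.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),   # 8x8 input -> 6x6 feature map
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 6 * 6, 10),
)

def init_weights(m):
    """Initialize one module in place, chosen by its layer type."""
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.Linear):
        # std=0.01 is an assumed value for illustration.
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        nn.init.zeros_(m.bias)

# modules() walks every submodule, including nested ones.
for m in model.modules():
    init_weights(m)

output = model(torch.randn(2, 3, 8, 8))
```

The same function could also be passed to `model.apply(init_weights)`, which visits every submodule in the same way.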

2018-08-23 12:03:48 3281

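[Original] PyTorch: Extracting a Specific Layer from a Model

modules() returns an iterator over all modules in the model and reaches the innermost ones, such as self.layer1.conv1. Its counterparts are named_children() and named_modules(), which return not only an iterator over the modules but also the name of each layer. Usage:
    new_model = nn.Sequential(*list(model.children())[:2])
This takes the model's...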
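The difference between these iterators can be sketched as follows. The `Net` class and its layer names are hypothetical, invented only for this illustration; the slicing line is the one quoted in the excerpt.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Hypothetical three-stage model used only for illustration."""
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Conv2d(3, 4, 3, padding=1), nn.ReLU())
        self.layer2 = nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        return self.pool(self.layer2(self.layer1(x)))

model = Net()

# named_children() yields only the direct children, with their names.
child_names = [name for name, _ in model.named_children()]

# named_modules() recurses, so nested layers such as "layer1.0" appear too.
module_names = [name for name, _ in model.named_modules()]

# Keep only the first two children, dropping the pooling stage.
new_model = nn.Sequential(*list(model.children())[:2])
features = new_model(torch.randn(1, 3, 16, 16))
```

Note that wrapping children in `nn.Sequential` only reproduces the original computation when the model's `forward` simply chains its children in definition order, as this toy model does.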

2018-08-22 22:13:07 19573 4

[Original] Cross-Entropy Loss (Softmax) vs. Hinge Loss (SVM)

2018-08-22 17:06:32 561

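[Original] PyTorch: max(), view(), squeeze(), and unsqueeze()

Summary. max(): max(a) is for one-dimensional data and returns the maximum value. max(a, 0) computes the maximum of each column and also returns the row index where it occurs. max(a, 1) computes the maximum of each row and also returns the column index where it occurs. a.view(i, j) reshapes the original matrix to i rows and j columns; a.view(i, -1) fixes i rows and infers the number of columns; a.view(-1, j) fixes j columns and infers the number of rows. a.squeeze(i)...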
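A small worked example of the calls summarized above. The tensor values are arbitrary, and the `squeeze`/`unsqueeze` lines show standard PyTorch behavior rather than the truncated part of the post.

```python
import torch

a = torch.tensor([[1.0, 5.0, 3.0],
                  [4.0, 2.0, 6.0]])

# With no dim argument, max reduces over every element.
overall = torch.max(a)

# dim=0 reduces over rows: one maximum per column, plus its row index.
col_max, row_idx = torch.max(a, 0)

# dim=1 reduces over columns: one maximum per row, plus its column index.
row_max, col_idx = torch.max(a, 1)

# view reshapes without copying; -1 means "infer this dimension".
b = a.view(3, -1)    # shape (3, 2)
c = a.view(-1, 6)    # shape (1, 6)

# squeeze(i) drops dimension i if its size is 1; unsqueeze(i) inserts one.
d = c.squeeze(0)     # shape (6,)
e = d.unsqueeze(1)   # shape (6, 1)
```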

2018-08-22 10:51:30 506

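[Original] "Adversarial Cross-Modal Retrieval": Reading Notes

Paper: https://www.researchgate.net/publication/320541510_Adversarial_Cross-Modal_Retrieval
Source: ACM Multimedia 2017. Author: 王泊锟, an undergraduate of the 2014 cohort at the Yingcai Honors College, University of Electronic Science and Technology of China, published the paper as first author; it won the ACM Multimedia 2017 best...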

2018-08-19 12:44:43 7689 19

[Original] Python: Drawing Scatter Plots with matplotlib

Use scatter() to draw a scatter plot.
    # some arbitrary points
    x_train = np.array([[3.3],[4.4],[5.5],[6.71],[6.93],[4.168],[9.776],[6.182],[7.59],[2.167]], dtype=np.float32)
    y_train = np.array([[1.7],[2.76],[2.09],[3.19],[1.694],[1.573]...

2018-08-18 17:23:20 523

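[Original] Image-Text Matching and VQA: A Summary

Image-text matching and visual question answering (VQA) both fuse the image and text modalities and sit at the intersection of computer vision and natural language processing. Image-text matching: map images and text into a shared semantic space, then judge their similarity by distance. The biggest difference between image-text matching and VQA is that matching requires comparing the distance between the two kinds of features. Attention is applied to the text and to the image separately; DAN computes the similarity between the text and image vectors after each attention step and accumulates these values to obtain the final similarity. VQA: given an image and a...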

2018-08-16 10:08:37 4417 1

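[Original] "Dual Attention Networks for Multimodal Reasoning and Matching"

Dual Attention Networks for Multimodal Reasoning and Matching, CVPR 2017. The ultimate problem in image-text matching is matching the whole text against the whole image, but that is hard to solve directly. A basic idea is therefore to decompose it: a text is made up of words and an image is made up of regions, so if the words of the text can be matched to the regions of the image, the problem will...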

2018-08-16 10:08:25 1739

[Original] "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"

"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering". Source: CVPR 2018. Reference (CSDN blog post): Paper Notes: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering...

2018-08-16 10:08:15 3503

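[Original] "Linking Image and Text with 2-Way Nets"

Linking Image and Text with 2-Way Nets, CVPR 2017. This paper can be viewed as an extension of the Corr-Cross-AE structure from Corr-AE; it also adds many tricks and constraints, each backed by a theoretical justification. Before introducing the paper, first review the Corr-Cross-AE structure. 1. Text and image features are each mapped into a common space by an encoder, and the L2 distance is then used to measure the similarity between text and image,...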

2018-08-08 15:52:00 489

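[Original] "Learning Semantic Concepts and Order for Image and Sentence Matching"

Learning Semantic Concepts and Order for Image and Sentence Matching, CVPR 2018. I. Motivation. One of the main problems in this area is that pixel-level image representations lack high-level semantic information. Earlier approaches extract a single global CNN feature vector, so the most salient content dominates and background content tends to be ignored. This paper proposes a semantic-enhanced image and sentence matching model...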

2018-08-08 15:51:17 1191 1

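[Repost] Triplet Loss

Triplet loss is a loss computed over a triplet made up of an Anchor (A), a Positive (P), and a Negative (N): A is the reference point, P is a matching sample, and N is a non-matching sample. The learning objective of triplet loss can be pictured as in the figure: before the network has been trained, the Euclidean distance between A and P may be large and the distance between A and N may be small, as on the left of the figure; during training, the Euclidean distance between A and P gradually decreases, while the distance between A and N...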
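The objective described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the `margin` parameter and the hinge form `max(d(A, P) - d(A, N) + margin, 0)` are the commonly used formulation and do not appear in the excerpt, and the sample points are made up.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss; margin=1.0 is an assumed default."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

anchor = [0.0, 0.0]

# Untrained case: the positive is far (distance 5), the negative is near (distance 1).
loss_before = triplet_loss(anchor, [3.0, 4.0], [0.0, 1.0])

# Trained case: the positive is near (distance 1), the negative is far (distance 5).
loss_after = triplet_loss(anchor, [0.0, 1.0], [3.0, 4.0])
```

The loss is positive while the negative sits closer to the anchor than the positive plus the margin, and drops to zero once the positive is closer by at least the margin.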

2018-08-08 15:50:36 1899

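[Original] "Stacked Cross Attention for Image-Text Matching"

ECCV 2018. Main idea: apply attention to the text and to the image separately to learn good representations of each, then measure text-image similarity in a shared subspace with a hard triplet loss. Image features: a Faster R-CNN with a ResNet-101 backbone produces k object regions per image; the feature of each detected object is extracted and transformed by an embedding matrix into an h-dimensional vector. Text features: each word of the text yields...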

2018-08-08 15:49:41 4855 7

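[Translation] Cross-media Retrieval

Reference: An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges (covers the concepts, methods, main challenges, and open problems of cross-modal retrieval, including datasets and benchmark experimental results). Main challenge: the media gap. Features of different modalities have inconsistent representations and lie in different feature spaces, so the main challenge is measuring the similarity between them...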

2018-08-08 15:48:25 1155

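[Original] "Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models"

Source: CVPR 2018. I. Introduction. The first paper to use GAN and reinforcement learning (RL) together for cross-media retrieval. The network can handle three cross-media tasks at once: cross-media retrieval, image captioning, and text-to-image synthesis (for the latter two tasks, the paper gives only visualized results, with no quantitative analysis). The paper was published...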

2018-08-08 15:47:43 2372 5

[Original] "Learning Cross-modal Embeddings for Cooking Recipes and Food Images": Reading Notes

Paper: https://www.researchgate.net/publication/320964718_Learning_Cross-Modal_Embeddings_for_Cooking_Recipes_and_Food_Images
Source: CVPR 2017. I. Introduction. The task addressed (recipe retriev...

2018-08-08 15:45:18 1306

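[Original] "Learning Cross-modal Embeddings for Cooking Recipes and Food Images": Reading Notes

Source: CVPR 2017. I. Introduction. The task addressed (recipe retrieval): Input: image (sentence) + dataset. Output: ranked list of sentences (images). The paper introduces the Recipe1M dataset and trains a neural network that jointly embeds recipes and images, applied to the image-recipe retrieval task. It also shows that regularization through an added high-level classification objective improves retrieval performance...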

2018-08-08 15:45:15 1085 3

[Original] Cross-modal Retrieval

Cross-modal retrieval aims at retrieving relevant items that are of a different nature from the query format. Four challenges: 1. representation 2. translation 3. alignment 4. co-learnin...

2018-08-08 15:44:19 2143

[Original] "Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings"

Paper: https://arxiv.org/pdf/1804.11146.pdf
Source: ACM SIGIR 2018 (source code not yet released). I. Introduction. The task addressed (recipe retrieval): Input: image (sentence) + dataset. Output: ranked list of sentences (images). In this paper, we...

2018-08-08 15:43:24 1004
