bow(li feifei)

Two bag-of-words classifiers

ICCV 2005 short courses on
Recognizing and Learning Object Categories



A simple approach to classifying images is to treat them as a collection of regions, describing only their appearance and ignoring their spatial structure. Similar models have been used successfully in the text community for analyzing documents and are known as "bag-of-words" models, since each document is represented by a distribution over a fixed vocabulary. Using such a representation, methods such as probabilistic latent semantic analysis (pLSA) [1] and latent Dirichlet allocation (LDA) [2] are able to extract coherent topics within document collections in an unsupervised manner.

Recently, Fei-Fei et al. [3] and Sivic et al. [4] have applied such methods to the visual domain. The demo code implements pLSA, including all pre-processing stages. For comparison, a Naive Bayes classifier is also provided; unlike pLSA, it requires labelled training data.

The code consists of Matlab scripts (which should run under both Windows and Linux) and a couple of 32-bit Linux binaries for doing feature detection and representation. Hence the whole system will need to be run on Linux. The code is for teaching/research purposes only. If you find a bug, please email me at fergus where csail point mit point edu.

Download
Download the code and datasets (32 MBytes): tgz file zip file
Operation of code
To run the demos:
1. Unpack the .zip file into a new directory (e.g. /home/username/demos)
2. Put the common/ directory in your Matlab path (by doing 'addpath /home/username/demos/common'). This directory holds all the code used in the experiments
3. Move into one of the experiment directories, e.g. 'cd /home/username/demos/experiments/parts_structure'
4. Ensure that the paths at the top of the config_file_1.m file are correct (i.e. swap 'fergus' for 'username' or whatever).
5. Then you are good to go. Type 'do_all('config_file_1')' and it will all run. Since the training is manual, you will need to do some clicking at some point, but the rest is automated.

For an overview of what is going on see the sections below. For a more detailed understanding, read the comments in the .m files.
Description of code

Experiment script
config_file_1.m - Each experiment has its own script. This holds ALL settings required to reproduce the experiment in its entirety. The script should sit in its own subdirectory within experiments/. Each of the 'do_' functions is passed the script. The parameters and settings are grouped into structures for neatness. At the top of the script, the paths to various key directories are set. Before running, ensure that these are correct. At the very top of the script is the EXPERIMENT_TYPE variable, used by do_all.m to call the appropriate sub-stages. For these demos, it should be either 'plsa' or 'naive_bayes'.
Top-level function
do_all.m - Master function that calls all the different sections of the algorithm in turn. The choice of algorithm is made by the EXPERIMENT_TYPE setting in the configuration file.
Stages of the algorithm
do_random_indices.m - Randomly picks the training and test indices and stores them in a file in the current experiment directory.
do_preprocessing.m - Copies and rescales (as specified) all images from the IMAGES_DIR into the images/ directory in the current experiment directory.
do_interest_op.m - Runs a very crude interest operator over the images to gather a set of regions. See Edge_Sampling.m for details of the operator. A Canny edge detector is used, which is implemented as a binary (both Windows and Linux versions are provided).
do_representation.m - Takes the regions produced by do_interest_op.m and characterises their appearance, in this instance using SIFT features. This is done by means of a Linux binary.
do_form_codebook.m - Runs k-means on the SIFT vectors from all training images to build a visual vocabulary. Uses a Linux .mex file, but source code is provided so you can recompile for any platform you wish.
do_vq.m - Vector quantizes the regions from all images using the vocabulary built by do_form_codebook.m.
do_plsa.m - Learns a pLSA model from the training images. The number of topics is specified in Learn.Num_Topics in the configuration file.
do_plsa_evaluation.m - Tests a pLSA model on the testing images. Plots ROC and RPC curves as well as test images with regions overlaid, colored according to their preferred topic.
do_naive_bayes.m - Forms a Naive Bayes classifier from the training images.
do_naive_bayes_evaluation.m - Tests the Naive Bayes classifier on the testing images. Plots ROC and RPC curves as well as test images with regions overlaid, colored according to their preferred topic.
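
As a rough illustration of what the Naive Bayes stage computes, the sketch below trains a multinomial Naive Bayes classifier on visual-word count histograms. It is written in Python rather than the demo's Matlab and is not the code in do_naive_bayes.m: the function names, the Laplace smoothing constant and the toy data are all assumptions made for the example.

```python
import math


def train_naive_bayes(histograms, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on word-count histograms.

    histograms: list of equal-length lists of visual-word counts.
    labels: one class label per histogram.
    alpha: Laplace smoothing constant (an assumption, not taken from the demo).
    """
    n_words = len(histograms[0])
    model = {}
    for c in sorted(set(labels)):
        rows = [h for h, y in zip(histograms, labels) if y == c]
        totals = [sum(col) for col in zip(*rows)]  # per-word counts for class c
        denom = sum(totals) + alpha * n_words
        model[c] = {
            "log_prior": math.log(len(rows) / len(labels)),
            "log_pw": [math.log((t + alpha) / denom) for t in totals],
        }
    return model


def classify(model, histogram):
    """Return the class with the highest log posterior for one histogram."""
    def score(c):
        m = model[c]
        return m["log_prior"] + sum(
            n * lp for n, lp in zip(histogram, m["log_pw"]))
    return max(model, key=score)


# Toy data: a 4-word vocabulary where faces favour words 0-1
# and background images favour words 2-3.
train = [[5, 4, 0, 1], [6, 3, 1, 0], [0, 1, 5, 4], [1, 0, 4, 6]]
labels = ["face", "face", "background", "background"]
model = train_naive_bayes(train, labels)
prediction = classify(model, [4, 5, 1, 0])
```

Working in log space avoids numerical underflow when an image contains many regions, which is the usual reason for summing log probabilities instead of multiplying probabilities.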

Walkthrough of demo


Sample images
This is a sample of the images used for this demo. It contains a mix of faces from the Caltech face dataset and images from the Caltech background datasets. The do_preprocessing.m script should resize all of them to 200 pixels in width.
[Figure: sample of the images used for the demo]

Interest point operator
Running do_interest_op.m calls the crude interest operator, Edge_Sampling.m. This runs a Canny Edge detector over the image and then samples points from the set of edgels. The scale is also determined by randomly sampling over a uniform distribution. The two images below show the set of edgels and sampled regions for a typical image.
[Figure: set of edgels for a typical image]
[Figure: sampled regions for the same image]
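
The sampling step described above can be sketched as follows. This is a Python illustration, not the contents of Edge_Sampling.m: it assumes the edge detector has already produced a binary edge map, and the function name, the scale range and the toy edge map are hypothetical.

```python
import random


def sample_regions(edge_map, n_regions, scale_range=(10.0, 30.0), seed=None):
    """Sample region centres from edge pixels, with uniformly random scales.

    edge_map: 2-D list of 0/1 values, e.g. the output of an edge detector.
    scale_range: (min, max) of the uniform scale distribution (assumed values).
    Returns a list of (x, y, scale) tuples.
    """
    rng = random.Random(seed)
    # Collect the coordinates of every edge pixel (edgel).
    edgels = [(x, y)
              for y, row in enumerate(edge_map)
              for x, value in enumerate(row) if value]
    if not edgels:
        return []
    # Draw distinct edgels as region centres, then a random scale for each.
    n = min(n_regions, len(edgels))
    centres = rng.sample(edgels, n)
    return [(x, y, rng.uniform(*scale_range)) for x, y in centres]


# Toy edge map: a small square outline.
edge_map = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
regions = sample_regions(edge_map, n_regions=4, seed=0)
```

Sampling without replacement keeps the region centres distinct; whether the demo does the same is not stated in the text.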

Representation
Running do_representation.m produces SIFT vectors for each interest point. Once a visual vocabulary or codebook has been built and the do_vq.m procedure has been run, each interest point file should contain the variables shown in the figure below (in this image, there are 154 regions - this number will vary from image to image):
[Figure: variables stored in an interest point file]
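
To make the vector quantization step concrete, here is a minimal Python sketch of assigning each descriptor to its nearest codebook entry and counting the resulting visual words. It is not the code in do_vq.m: the function names are hypothetical, and the toy descriptors are 2-D for readability, whereas SIFT vectors are 128-dimensional.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to descriptor (squared Euclidean)."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(descriptor, centre))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def bag_of_words(descriptors, codebook):
    """Vector-quantize descriptors and count how often each word occurs."""
    histogram = [0] * len(codebook)
    for d in descriptors:
        histogram[nearest_codeword(d, codebook)] += 1
    return histogram


# Toy codebook with three visual words and five region descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, -0.2], [0.9, 1.2], [4.8, 5.1], [5.3, 4.9], [1.1, 0.8]]
histogram = bag_of_words(descriptors, codebook)  # [1, 2, 2]
```

The histogram is the bag-of-words representation of the image: it records how many regions map to each visual word and discards where in the image they occurred.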

Training
Running do_plsa.m should produce output like (n.b. it will not be identical due to the random initialization of the model):
>> do_plsa('config_file_1')
Iteration 1 dLi=0.000000
Iteration 2 dLi=225.753173
Iteration 3 dLi=142.124978
Iteration 4 dLi=114.276770
Iteration 5 dLi=102.580805
Iteration 6 dLi=95.619991
Iteration 7 dLi=89.379039
Iteration 8 dLi=83.222576
Iteration 9 dLi=78.063911
Iteration 10 dLi=74.565845
Iteration 11 dLi=72.679587
Iteration 12 dLi=72.276609
Iteration 13 dLi=73.070117
Iteration 14 dLi=74.539317
...
It should finish after 100 iterations.
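
For readers who want to see the EM updates behind this stage, the following is a small Python sketch of pLSA on a document-by-word count matrix, using the symmetric form P(d, w) = sum_z P(z) P(d|z) P(w|z). It is not the code in do_plsa.m and does not reproduce the dLi values printed above: the function names, the initialization and the toy counts are assumptions.

```python
import math
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to one."""
    total = sum(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM. Returns (p_z, p_d_z, p_w_z, log_likelihoods)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialization, so results vary with the seed.
    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [normalize([rng.random() + 0.1 for _ in range(n_docs)])
             for _ in range(n_topics)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    history = []
    for _ in range(n_iter):
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]
        log_lik = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                joint = [p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                         for z in range(n_topics)]
                total = sum(joint)
                log_lik += n * math.log(total)
                for z in range(n_topics):
                    r = n * joint[z] / total  # E-step: n(d,w) * P(z|d,w)
                    acc_d[z][d] += r
                    acc_w[z][w] += r
        # M-step: re-estimate all distributions from the expected counts.
        p_z = normalize([sum(row) for row in acc_d])
        p_d_z = [normalize(row) for row in acc_d]
        p_w_z = [normalize(row) for row in acc_w]
        history.append(log_lik)  # likelihood under the pre-update parameters
    return p_z, p_d_z, p_w_z, history


# Toy corpus: two documents dominated by words 0-1, two by words 2-3.
counts = [[5, 4, 0, 0], [6, 3, 1, 0], [0, 1, 5, 4], [0, 0, 4, 6]]
p_z, p_d_z, p_w_z, history = plsa(counts, n_topics=2)
```

EM never decreases the log-likelihood, so the values in `history` should be non-decreasing; checking this is a quick sanity test for any pLSA implementation.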

Evaluation
Running do_plsa_evaluation.m will then produce (after a few more EM iterations) several figures. Below are plots of the p(w|z) (on left) and p(d|z) (on right) densities:
[Figure: p(w|z) densities (left) and p(d|z) densities (right)]

Then the ROC and RPC curves are shown. These are for the classification task (face present/absent) only, since bag-of-words models cannot localize.
[Figure: ROC curve and RPC curve]
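
As a reminder of how an ROC curve is built from per-image classifier scores, here is a short Python sketch. It is not taken from the demo's evaluation code: the function names and the toy scores are hypothetical, and for simplicity it assumes that no two images share the same score.

```python
def roc_curve(scores, labels):
    """Return (false_positive_rates, true_positive_rates) over all thresholds.

    scores: higher means more likely positive (e.g. face present).
    labels: 1 for positive images, 0 for negative images.
    Assumes distinct scores; ties would need to be grouped together.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Lower the threshold one image at a time, from highest score to lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


# Toy scores for six test images, three with a face and three without.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve(scores, labels)
auc = area_under_curve(fpr, tpr)  # 8/9: eight of nine pos/neg pairs are ordered correctly
```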

Finally, some example images are plotted with the regions overlaid, coloured according to their preferred topic:
[Figure: example images with regions overlaid, coloured by preferred topic]

References
[1] Hofmann, T., "Probabilistic Latent Semantic Analysis", UAI 1999.
[2] Blei, D., Ng, A. and Jordan, M., "Latent Dirichlet Allocation", Journal of Machine Learning Research, 3:993-1022, January 2003.
[3] Fei-Fei, L. and Perona, P., "A Bayesian Hierarchical Model for Learning Natural Scene Categories", Proc. CVPR, 2005.
[4] Sivic, J., Russell, B., Efros, A., Zisserman, A. and Freeman, W., "Discovering object categories in image collections", Proc. Int'l Conf. Computer Vision, Beijing, 2005.