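Traditional single-label classification (in Chinese this is also translated as 单标记, though I personally think the term should be rendered as a noun) learns from a set of samples, each associated with exactly one label l from a set of mutually exclusive labels L, |L| > 1. In multi-label classification, each sample is associated with a subset of the label set L. In the past, multi-label classification was motivated and driven mainly by text categorization and medical analysis. Today, many modern applications show a steadily growing need for multi-label classification methods, for example protein classification, music categorization, and semantic scene classification.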
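The original text is fairly abstract and its translation is hard to follow, so here are two examples from Tsoumakas's survey "Multi-Label Classification: An Overview" (a good introductory paper). A news article about the Christian church's reaction to the release of the Da Vinci Code film (the book is good too) can be classified under both Society\Religion and Arts\Movies. In semantic scene classification, a photograph can belong to several conceptual classes at once, for example both sunrise and beach.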
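To make the two examples concrete, here is a minimal Java sketch of the idea that a sample carries a subset of L rather than a single label. It is not Mulan code: the class name LabelSetDemo, the method toIndicator, and the four-label set are all made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: LabelSetDemo and its methods are hypothetical names, not part of Mulan.
class LabelSetDemo {
    // The full label set L; a multi-label sample is tagged with any subset of it.
    static final List<String> LABELS =
            Arrays.asList("Society\\Religion", "Arts\\Movies", "Sunrise", "Beach");

    // Encode a subset of L as a 0/1 vector with one position per label in L.
    static int[] toIndicator(Set<String> sampleLabels) {
        if (!LABELS.containsAll(sampleLabels)) {
            throw new IllegalArgumentException("unknown label in " + sampleLabels);
        }
        int[] bits = new int[LABELS.size()];
        for (int i = 0; i < LABELS.size(); i++) {
            bits[i] = sampleLabels.contains(LABELS.get(i)) ? 1 : 0;
        }
        return bits;
    }

    public static void main(String[] args) {
        Set<String> newsArticle =
                new LinkedHashSet<>(Arrays.asList("Society\\Religion", "Arts\\Movies"));
        Set<String> photo = new LinkedHashSet<>(Arrays.asList("Sunrise", "Beach"));
        System.out.println(Arrays.toString(toIndicator(newsArticle))); // [1, 1, 0, 0]
        System.out.println(Arrays.toString(toIndicator(photo)));       // [0, 0, 1, 1]
    }
}
```

A single-label sample would be the special case where exactly one position in the vector is 1.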
Mulan is a package of Java classes for
multi-label classification. Mulan contains several problem
transformation and algorithm adaptation methods for multilabel
classification, an evaluation framework that computes several
multilabel classification evaluation measures and a class providing
data set statistics. Mulan is built on top of Weka
and it therefore requires the indicated version of Weka and Java
v1.5 or later. Mulan accepts input files in ARFF format, structured
as follows: the input attributes must be followed by |L|
attributes, where L is the set of labels. These label attributes take
values from the set {0, 1}. A value of 1 indicates that an instance
belongs to the corresponding label, while a value of 0 indicates that
it does not.
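The layout described above can be sketched in plain Java as follows. This is only an illustration of the file structure under the stated rule (input attributes first, then |L| attributes with values in {0, 1}); it does not call the Mulan or Weka APIs, and the class ArffLayoutDemo, the relation name, and the feature names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: ArffLayoutDemo is a hypothetical helper, not part of Mulan or Weka.
class ArffLayoutDemo {
    // Build a small ARFF document: input attributes first, then one {0,1} attribute per label.
    static String buildArff(String relation, String[] inputs, String[] labels,
                            double[][] x, int[][] y) {
        StringBuilder sb = new StringBuilder("@relation " + relation + "\n\n");
        for (String name : inputs) sb.append("@attribute ").append(name).append(" numeric\n");
        for (String name : labels) sb.append("@attribute ").append(name).append(" {0,1}\n");
        sb.append("\n@data\n");
        for (int i = 0; i < x.length; i++) {
            List<String> cells = new ArrayList<>();
            for (double v : x[i]) cells.add(String.valueOf(v));
            for (int bit : y[i]) cells.add(String.valueOf(bit));
            sb.append(String.join(",", cells)).append("\n");
        }
        return sb.toString();
    }

    // Read back the labels of one data row from its last labels.length columns.
    static List<String> labelsOf(String dataRow, String[] labels) {
        String[] cells = dataRow.trim().split(",");
        List<String> active = new ArrayList<>();
        int offset = cells.length - labels.length;
        for (int j = 0; j < labels.length; j++) {
            if (cells[offset + j].trim().equals("1")) active.add(labels[j]);
        }
        return active;
    }

    public static void main(String[] args) {
        String[] inputs = {"brightness", "blue_ratio"}; // made-up input features
        String[] labels = {"sunrise", "beach"};
        double[][] x = {{0.8, 0.3}, {0.6, 0.7}};
        int[][] y = {{1, 0}, {1, 1}};
        System.out.print(buildArff("scene_demo", inputs, labels, x, y));
        System.out.println(labelsOf("0.6,0.7,1,1", labels)); // [sunrise, beach]
    }
}
```

The second data row ends in 1,1, so that instance belongs to both labels; the first ends in 1,0 and belongs only to sunrise.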
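Steps for running Mulan in Eclipse:
(1) Download Mulan and unpack it;
(2) Start Eclipse and create a New Java Project; any name will do;
(3) Under the src directory, create a New package named mulan;
(4) Select the mulan package, right-click it, choose Import > File System, and select the entire unpacked folder;
(5) Right-click the project, open Properties, go to the Libraries tab (under Java Build Path), and click Add External JARs;
(6) Select weka.jar in the Weka installation directory;
(7) Run examples.CrossValidationExperiment, and that is all.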