Paper notes: Multi-label active learning for image classification

闵帆

Link to this post: https://blog.csdn.net/minfanphd/article/details/119727301

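Original paper: Jian Wu et al. Multi-Label Active Learning Algorithms for Image Classification: Overview and Future Promise, ACM Computing Surveys, Vol. 53, No. 2, Article 28. Publication date: March 2020.
I write this post in a spirit of learning and discussion. My views will inevitably differ from the original authors' in many places, and no disrespect to them is intended.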

Abstract

These algorithms can be categorized into two top groups from two aspects respectively: sampling and annotation.
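When I worked on active learning before, I mostly thought about how to sample: once a sample is sent to an expert for annotation, the correct result is obtained. So I rarely paid attention to the latter aspect (annotation). In other words, I need to find out what variations annotation actually allows.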
… that actively selects the examples with the highest informativeness from an unlabeled data pool, according to various information measures.
From the authors' viewpoint, uncertainty, representativeness, and the like are all (intermediate) measures that exist in order to compute informativeness.

1. Introduction

However, obtaining a large number of labeled images is time-consuming and needs a lot of manpower and resources.
I am not sure whether "manpower" is the standard term; I sometimes use "human endeavor".

2. Problem definition

According to various ways of querying, active learning algorithms can be categorized into three paradigms: membership query synthesis active learning [4, 45, 46, 48, 69], stream-based selective active learning [14, 16, 22, 47, 69, 109], and pool-based active learning [15, 50].
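We usually work on the pool-based paradigm. The stream-based paradigm also has real practical value: objects arrive one at a time, and we must decide immediately whether to have each one annotated.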
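To make the difference between the two paradigms concrete, here is a minimal sketch. The function names, the toy informativeness score, and the fixed threshold used for the stream-based decision are all my own assumptions for illustration; the survey does not prescribe them.

```python
from typing import Callable, Iterable, List, Sequence


def pool_based_query(pool: Sequence[float], info: Callable[[float], float]) -> int:
    """Pool-based: look at the whole pool and return the index of the best item."""
    return max(range(len(pool)), key=lambda i: info(pool[i]))


def stream_based_queries(stream: Iterable[float],
                         info: Callable[[float], float],
                         threshold: float) -> List[float]:
    """Stream-based: each item is seen once and the decision is immediate.

    The fixed threshold is a hypothetical decision rule, not taken from the paper.
    """
    queried = []
    for x in stream:
        if info(x) >= threshold:
            queried.append(x)
    return queried


def toy_info(p: float) -> float:
    """Toy score: how close a predicted probability is to 0.5."""
    return 1.0 - abs(2.0 * p - 1.0)


probs = [0.9, 0.55, 0.1, 0.4]
best_index = pool_based_query(probs, toy_info)
asked = stream_based_queries(probs, toy_info, threshold=0.75)
```

The pool-based call can compare all candidates before committing, while the stream-based loop never revisits an item, which is why it needs some decision rule such as a threshold.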
The mathematical notation in Definition 1 does not match our conventions:
- $\{x_i, Y_i\}$ denotes an object. Braces are normally reserved for sets, so we use parentheses instead;
- since $x_i$ and $Y_i$ both denote vectors, both should be lowercase;
- the notation $y_{i, l_1}$ is too cumbersome; $l_1$ can simply be written as $1$;
- calling $Y_i$ a label set is odd: it is not a set but a 0/1 vector;
- following my own conventions, one could write
  The data is represented by a matrix $\mathbf{X} = [x_{ij}]_{N \times M} \in \mathbb{R}^{N \times M}$ , where $N$ and $M$ are the number of instances and features, respectively. The $i$ -th row, denoted by $\mathbf{x}_i$ , represents an instance. The labels are represented by a matrix $\mathbf{Y} = [y_{il}]_{N \times L} \in \{-1, +1\}^{N \times L}$ , where $L$ is the number of labels. The $i$ -th row, denoted by $\mathbf{y}_i$ , is the label array of $\mathbf{x}_i$ . $y_{il} = +1$ indicates that $\mathbf{x}_i$ has the $l$ -th label, while $y_{il} = -1$ indicates that it does not.
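A small sketch of this representation, assuming the usual convention that $+1$ marks a label that is present and $-1$ one that is absent. The helper name `build_label_matrix` and the 0-based label indices are my own choices for illustration.

```python
from typing import List, Set


def build_label_matrix(label_sets: List[Set[int]], num_labels: int) -> List[List[int]]:
    """Row i is y_i in {-1, +1}^L: +1 if instance i has label l, -1 otherwise."""
    return [[+1 if l in labels else -1 for l in range(num_labels)]
            for labels in label_sets]


# Three instances and four labels; the second instance has no label at all.
label_sets = [{0, 2}, set(), {3}]
Y = build_label_matrix(label_sets, num_labels=4)
```

This also shows why "label set" and "label vector" are different objects: the set `{0, 2}` becomes the row `[1, -1, 1, -1]`.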
In a typical multi-label active learning scenario, there is an example set $X$ including a small labeled training set $L$ and a large unlabeled dataset $U$ .
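This phrasing drags in semi-supervised learning, which is not necessarily appropriate. In active learning we may start without a single label, which can be called cold start. It is not a big problem that the authors insist on "typical", but it makes the notation more complicated.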

3. Overview of topics

3.1 Sampling

Sampling is the focus of active learning.

3.1.1 Sampling granularity

Example-based
- In an example-based algorithm, all the labels of the selected examples are supposed to be annotated simultaneously.
  This scheme is not specific to multi-label data, so it is comparatively less significant.
- Each time, select the most informative sample:
  $\mathbf{x}^* = \argmax_{\mathbf{x} \in \mathbf{U}} Info(\mathbf{x}) \tag{1}$
- If there are several evaluation measures, take their weighted sum:
  $Info(\mathbf{x}) = \sum_s \alpha_s I_s(\mathbf{x}) \textrm{ s.t. } \alpha_s \in [0, 1], \sum_s \alpha_s = 1 \tag{2}$
- They can also be fused in another way:
  $Info(\mathbf{x}) = I_1(\mathbf{x})^{\alpha} I_2(\mathbf{x})^{1 - \alpha} \tag{3}$
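- Train $\Theta^0 = [f_1^0, \dots, f_m^0]$ on the labeled data, i.e., one SVM classifier per label. Each value of $\alpha$ selects a batch of unlabeled data, denoted $\mathbf{S}$ . For any $\mathbf{x}' \in \mathbf{S}$ , predict its labels $\mathbf{y}'$ , add it to the dataset (using these pseudo-labels), and train a new multi-label classifier $\Theta = [f_1, \dots, f_m]$ . The change brought by $\mathbf{x}'$ and its labels is
  $\varepsilon(\mathbf{x}') = \sum_{j = 1}^{n_u} \max_{\mathbf{y}_p' = 1}\left[1 - f_p(\mathbf{x}_j)\right]_+ + \max_{\mathbf{y}_p' = 0}\left[1 + f_p(\mathbf{x}_j)\right]_+ \tag{4}$
  where
  - the summation means that all unlabeled samples are considered;
  - the first $\max$ is the largest deviation among labels predicted as positive, and the second $\max$ is the largest deviation among labels predicted as negative. The meaning of the subscript plus sign has to be checked in the original paper; it is presumably the standard positive-part notation $[z]_+ = \max(0, z)$ . This also shows that the label of a negative example should be $-1$ rather than $0$ .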
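Equations (1) to (3) can be sketched in a few lines. This is only an illustration under my own assumptions: each pool element is a hypothetical pair of precomputed scores (uncertainty, representativeness), and the names `weighted_sum_info`, `product_info`, and `select_example` do not come from the survey.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, float]          # (uncertainty, representativeness), toy data
Measure = Callable[[Example], float]


def weighted_sum_info(x: Example, measures: List[Measure], alphas: List[float]) -> float:
    """Eq. (2): convex combination of the measures I_s(x)."""
    assert all(0.0 <= a <= 1.0 for a in alphas) and abs(sum(alphas) - 1.0) < 1e-9
    return sum(a * m(x) for a, m in zip(alphas, measures))


def product_info(x: Example, i1: Measure, i2: Measure, alpha: float) -> float:
    """Eq. (3): I_1(x)^alpha * I_2(x)^(1 - alpha); both measures must be non-negative."""
    return (i1(x) ** alpha) * (i2(x) ** (1.0 - alpha))


def select_example(pool: Sequence[Example], info: Callable[[Example], float]) -> int:
    """Eq. (1): index of the pool element with the largest Info(x)."""
    return max(range(len(pool)), key=lambda i: info(pool[i]))


def unc(x: Example) -> float:
    return x[0]


def rep(x: Example) -> float:
    return x[1]


pool = [(0.9, 0.1), (0.6, 0.6), (0.2, 0.9)]
balanced = select_example(pool, lambda x: weighted_sum_info(x, [unc, rep], [0.5, 0.5]))
unc_heavy = select_example(pool, lambda x: weighted_sum_info(x, [unc, rep], [0.9, 0.1]))
geometric = select_example(pool, lambda x: product_info(x, unc, rep, 0.5))
```

With equal weights the balanced sample wins under both fusion rules, while shifting the weight toward uncertainty picks the first sample instead. The product form punishes a sample harder when one of its two measures is close to zero.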

Parameter selection then uses
$\alpha^* = \argmin_{\alpha_k \in \mathbf{V}} \sum_{\mathbf{x}' \in \mathbf{S}_k} \varepsilon(\mathbf{x}') \tag{5}$
After all this, it is still just choosing the parameter in (3): pick the best value from $\mathbf{V} = \{0.1, 0.2, \dots, 1.0\}$ .
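The following is a sketch of how (4) and (5) might be computed, under several assumptions of mine: the sum in (4) covers both $\max$ terms, each $\max$ runs over the labels $p$ whose pseudo-label has the stated value, $[z]_+ = \max(0, z)$ , and the decision values $f_p(\mathbf{x}_j)$ of the retrained classifiers are already available as plain numbers. Retraining the SVMs themselves is left out.

```python
from typing import Dict, List


def hinge(z: float) -> float:
    """Positive part [z]_+ = max(0, z) (assumed meaning of the subscript plus)."""
    return max(0.0, z)


def epsilon(decisions: List[List[float]], pseudo_labels: List[int]) -> float:
    """Eq. (4) as read here.

    decisions[j][p] is f_p(x_j) for unlabeled sample j and label p;
    pseudo_labels[p] is the predicted label (1 or 0) of the candidate x'.
    """
    pos = [p for p, y in enumerate(pseudo_labels) if y == 1]
    neg = [p for p, y in enumerate(pseudo_labels) if y == 0]
    total = 0.0
    for f in decisions:
        # Worst hinge loss among pseudo-positive labels, then among pseudo-negative ones.
        total += max((hinge(1.0 - f[p]) for p in pos), default=0.0)
        total += max((hinge(1.0 + f[p]) for p in neg), default=0.0)
    return total


def select_alpha(eps_by_alpha: Dict[float, List[float]]) -> float:
    """Eq. (5): the alpha whose candidate set S_k has the smallest summed epsilon."""
    return min(eps_by_alpha, key=lambda a: sum(eps_by_alpha[a]))


# Two unlabeled samples, two labels; the candidate is pseudo-labeled (1, 0).
eps_value = epsilon([[2.0, -2.0], [0.5, -0.5]], [1, 0])
best_alpha = select_alpha({0.1: [1.0, 2.0], 0.5: [0.5, 0.5], 1.0: [3.0]})
```

In the toy call, the first unlabeled sample lies outside both margins and contributes nothing, while the second contributes 0.5 for each term.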

Example-label-based
$(\mathbf{x}, y)^* = \argmax_{\mathbf{x}_j \in \mathbf{U}, y_{jk} \in UL(\mathbf{x}_j)} Info(\mathbf{x}_j, y_{jk}) \tag{6}$
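where:
- what is selected is an object-label pair, which can also be written as $(j^*, k^*)$ ;
- $y_{jk} \in UL(\mathbf{x}_j)$ means that the corresponding label is unknown. In this sense $\mathbf{x}_j \in \mathbf{U}$ is redundant.
  So I rewrite the formula as:
  $(i^*, k^*) = \argmax_{\mathbf{x}_i \in \mathbf{U}, k \in UL(\mathbf{x}_i)} Info(\mathbf{x}_i, k) \tag{6'}$
  A weighted sum of several measures can still be used:
  $Info(\mathbf{x}_j, y_{jk}) = \sum_s \alpha_s I_s(\mathbf{x}_j, y_{jk}) \textrm{ s.t. } \alpha_s \in [0, 1], \sum_s \alpha_s = 1 \tag{7}$
  Compared with the example-based scheme, the example-label-based scheme saves more annotation effort, so it is the more mainstream of the two. The time complexity of the algorithm must be considered as well, however. The work in this area is well worth learning from, for example:
- Qi et al. [63] consider not only the instance space $\mathbf{X}$ but also the label space $\mathbf{Y}$ . I need to check whether the latter uses matrix factorization;
- Wu et al. [94] were the first to propose example-label uncertainty. This differs fundamentally from single-label instance uncertainty and is specific to multi-label learning;
- Zhang et al. [112] use a batch mode;
- Guo et al. [91] incorporate a low-rank mapping; its relationship to matrix factorization also needs to be checked.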
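The pair selection in (6') can be sketched as follows. The encoding of unknown labels as `None` and the precomputed score table are assumptions made for this example only.

```python
from typing import Callable, List, Optional, Tuple


def select_pair(labels: List[List[Optional[int]]],
                info: Callable[[int, int], float]) -> Tuple[int, int]:
    """Eq. (6'): return (i*, k*) maximizing Info over pairs whose label is unknown.

    labels[i][k] is +1 or -1 when known, and None while k is still in UL(x_i).
    """
    candidates = [(i, k)
                  for i, row in enumerate(labels)
                  for k, y in enumerate(row) if y is None]
    if not candidates:
        raise ValueError("every label is already known")
    return max(candidates, key=lambda pair: info(*pair))


labels = [[1, None, -1],
          [None, None, 1]]
scores = [[0.0, 0.3, 0.0],
          [0.7, 0.2, 0.99]]   # toy Info values; 0.99 belongs to an already known label
best_pair = select_pair(labels, lambda i, k: scores[i][k])
```

Iterating over unknown entries only is what makes the separate condition on the unlabeled pool unnecessary: a fully labeled instance simply contributes no candidates. The loop also makes the cost visible, since every unknown pair has to be scored.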
Mixed-mode-based
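The mixed mode first selects the $n_s$ most representative instances and then selects labels of those instances. The most representative instance is:
$\mathbf{x}^* = \argmax_{\mathbf{x} \in \mathbf{U}} Info_1(\mathbf{x}) \tag{8}$
The most informative label is
$Y_{sub}^* = \argmax_{y_{*, k} \in UL(\mathbf{x}^*)} Info_2(y_{*, k}) \tag{9}$
I rewrite it as:
$y_{sub}^*(\mathbf{x}) = \argmax_{k \in UL(\mathbf{x})} Info_2(\mathbf{x}, k) \tag{9'}$
The mixed mode can be regarded as a special case of the example-label mode, and it takes less time than the latter.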
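A two-stage sketch of the mixed mode, following (8) and (9'). It assumes that both scores are precomputed and that exactly one label is queried per selected instance; the function name `mixed_mode_select` is hypothetical.

```python
from typing import List, Set, Tuple


def mixed_mode_select(info1: List[float],
                      info2: List[List[float]],
                      unknown: List[Set[int]],
                      n_s: int) -> List[Tuple[int, int]]:
    """Stage 1 (Eq. 8): keep the n_s instances with the largest Info_1.
    Stage 2 (Eq. 9'): for each kept instance, pick the unknown label with the largest Info_2.
    """
    ranked = sorted(range(len(info1)), key=lambda i: info1[i], reverse=True)
    chosen = [i for i in ranked if unknown[i]][:n_s]   # skip fully labeled instances
    return [(i, max(unknown[i], key=lambda k: info2[i][k])) for i in chosen]


info1 = [0.2, 0.9, 0.5]
info2 = [[0.1, 0.4], [0.8, 0.3], [0.2, 0.6]]
unknown = [{0, 1}, {0, 1}, {1}]
pairs = mixed_mode_select(info1, info2, unknown, n_s=2)
```

The time saving comes from stage 2 scoring labels only for the $n_s$ selected instances instead of for every unknown pair in the pool.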
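Batch-mode-based
This mode is mainly used to reduce the time annotators spend waiting.
- In [10], the informativeness of an unlabeled sample is defined as the average entropy over its individual labels. The diversity between two unlabeled samples is computed with a matrix. Diversity avoids querying similar samples within the same batch.
- Jiao et al. [38, 39] pre-cluster with kernel k-means and then use a Gaussian-based computation to find the most informative sample in each cluster.
- Zhang et al. [111, 112] propose a scheme based on high-order label correlations and formulate batch selection as an integer programming problem.
- Reyes et al. [68] formulate it as a multi-objective optimization problem (informativeness, representativeness, diversity). This one deserves a careful read.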
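To illustrate how informativeness and diversity interact inside one batch, here is a rough sketch. Only the "average entropy over labels" score follows the description of [10]; the greedy selection rule, the similarity matrix, and the `trade_off` parameter are my own simplifications and not the method of any cited paper.

```python
import math
from typing import List


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def average_label_entropy(probs: List[float]) -> float:
    """Informativeness of one sample: mean entropy over its labels."""
    return sum(binary_entropy(p) for p in probs) / len(probs)


def greedy_batch(probs: List[List[float]],
                 similarity: List[List[float]],
                 batch_size: int,
                 trade_off: float = 1.0) -> List[int]:
    """Greedily add the sample with the best entropy-minus-redundancy score.

    Redundancy is the largest similarity to any sample already in the batch
    (an assumed heuristic that stands in for the diversity term).
    """
    selected: List[int] = []
    remaining = set(range(len(probs)))
    while remaining and len(selected) < batch_size:
        def score(i: int) -> float:
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return average_label_entropy(probs[i]) - trade_off * redundancy
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


probs = [[0.5, 0.5], [0.5, 0.6], [0.9, 0.2]]
similarity = [[1.0, 0.95, 0.1],
              [0.95, 1.0, 0.2],
              [0.1, 0.2, 1.0]]
with_diversity = greedy_batch(probs, similarity, batch_size=2)
without_diversity = greedy_batch(probs, similarity, batch_size=2, trade_off=0.0)
```

Samples 0 and 1 are both highly uncertain but nearly identical, so without the diversity term they fill the batch together; with it, the second slot goes to the less similar sample 2.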

3.1.2 Informativeness Measure

Six measures are summarized: uncertainty, label correlation, representativeness, diversity, noise content, expected error reduction.

3.1.2.A. Uncertainty

$\left\{\begin{array}{ll} \mathbb{R}^M \rightarrow \mathbb{R}, & \textrm{example-based};\\ \mathbb{R}^M \times [1 .. L] \to \mathbb{R}, & \textrm{example-label-based} .\end{array}\right. \tag{10}$
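This formula differs slightly from the one in the paper, but the meaning is the same. The example-based form computes a sample's uncertainty from its conditional attributes (features) alone, whereas the example-label-based form must also specify the label, i.e., it gives the uncertainty of a sample-label pair.
Intuitively, the greater the uncertainty of a sample (or sample-label pair), the more informative it is and the more it helps improve the classifier. Singh et al. [73] use SVMs to predict each label of an object, compute each label's uncertainty from the margin, and then average the uncertainties of the different labels to obtain the sample's uncertainty. My own idea agrees with the first half of this, but when working with example-label pairs the final averaging step is unnecessary.
Reyes et al. [66] combine the margin with ranking,
$mar_{i, k} = \vert p(y_{ik} = 1 \vert \mathbf{x}_i) - p(y_{ik} = 0 \vert \mathbf{x}_i)\vert \tag{11}$
This formula is the absolute value of the difference between two probabilities. The closer the value is to 0, the stronger the uncertainty; the closer it is to 1, the weaker the uncertainty. It is used to build the vector $M(\mathbf{x}_i) = [mar_{i1}, \dots, mar_{iL}]$ .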
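A short sketch of (11) and of the two ways to use it. It assumes that the classifier outputs $p(y_{ik} = 1 \vert \mathbf{x}_i)$ directly and that $p(y_{ik} = 0 \vert \mathbf{x}_i) = 1 - p(y_{ik} = 1 \vert \mathbf{x}_i)$ . Taking $1 - mar$ as the per-label uncertainty for the averaged variant is my own choice, and the ranking step of [66] is not reproduced.

```python
from typing import List


def margin_vector(probs: List[float]) -> List[float]:
    """Eq. (11): mar_{i,k} = |p - (1 - p)| for each label k of one sample."""
    return [abs(p - (1.0 - p)) for p in probs]


def most_uncertain_label(probs: List[float]) -> int:
    """Example-label view: the label with the smallest margin."""
    margins = margin_vector(probs)
    return min(range(len(margins)), key=lambda k: margins[k])


def mean_uncertainty(probs: List[float]) -> float:
    """Example-based view: average the per-label uncertainties (assumed to be 1 - mar)."""
    margins = margin_vector(probs)
    return sum(1.0 - m for m in margins) / len(margins)


probs = [0.9, 0.5, 0.25]          # p(y_ik = 1 | x_i) for three labels
M = margin_vector(probs)          # approximately [0.8, 0.0, 0.5]
k_star = most_uncertain_label(probs)
avg_unc = mean_uncertainty(probs)
```

The averaged score hides the fact that one label is maximally uncertain while another is nearly certain, which is the reason the averaging step can be dropped in the example-label setting.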

To be continued
