Mathematical Expressions, Day 5

伍大大

Article link: https://blog.csdn.net/FERNsummer/article/details/119345203

Decision table

Patient	Headache	Temperature	Lymphocyte	Leukocyte	Eosinophil	Heartbeat	Flu
$x_1$	Yes	High	High	High	High	Normal	Yes
$x_2$	Yes	High	Normal	High	High	Abnormal	Yes
$x_3$	Yes	High	High	High	Normal	Abnormal	Yes
$x_4$	No	High	Normal	Normal	Normal	Normal	Yes
$x_5$	Yes	Normal	Normal	Low	High	Abnormal	Yes
$x_6$	Yes	Normal	Low	High	Normal	Abnormal	Yes
$x_7$	Yes	Low	Low	High	Normal	Normal	Yes

Write out $\mathbf{U}, \mathbf{C}, \mathbf{D}$, and $\mathbf{V}$ for this example. Note:
the last two attributes are decision attributes. How is $I$ represented?
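The question above is left open. As a rough sketch, assuming the usual decision-system form $S = (\mathbf{U}, \mathbf{C}, \mathbf{D}, \mathbf{V}, I)$ with $I: \mathbf{U} \times (\mathbf{C} \cup \mathbf{D}) \to \mathbf{V}$ (that form is an assumption, it is not spelled out here), the components can be read off the table in Python. The names `ATTRIBUTES`, `ROWS`, and `info` are illustrative only, the Lymphocyte value of $x_3$ is taken to be `High`, and $\mathbf{V}$ is built from the values that actually occur in the table.

```python
# Attribute names, in the column order of the table.
ATTRIBUTES = ["Headache", "Temperature", "Lymphocyte", "Leukocyte",
              "Eosinophil", "Heartbeat", "Flu"]

# One row per patient; the Lymphocyte value of x3 is assumed to be "High".
ROWS = {
    "x1": ["Yes", "High", "High", "High", "High", "Normal", "Yes"],
    "x2": ["Yes", "High", "Normal", "High", "High", "Abnormal", "Yes"],
    "x3": ["Yes", "High", "High", "High", "Normal", "Abnormal", "Yes"],
    "x4": ["No", "High", "Normal", "Normal", "Normal", "Normal", "Yes"],
    "x5": ["Yes", "Normal", "Normal", "Low", "High", "Abnormal", "Yes"],
    "x6": ["Yes", "Normal", "Low", "High", "Normal", "Abnormal", "Yes"],
    "x7": ["Yes", "Low", "Low", "High", "Normal", "Normal", "Yes"],
}

U = list(ROWS)        # universe of objects
C = ATTRIBUTES[:-2]   # condition attributes
D = ATTRIBUTES[-2:]   # decision attributes (the last two columns)


def info(x, a):
    """Information function I(x, a): the value of attribute a for object x."""
    return ROWS[x][ATTRIBUTES.index(a)]


# V is the union of the observed value sets V_a of all attributes.
V_BY_ATTRIBUTE = {a: {info(x, a) for x in U} for a in C + D}
V = set().union(*V_BY_ATTRIBUTE.values())

print("U =", U)
print("C =", C)
print("D =", D)
print("V =", sorted(V))
print("I(x4, Headache) =", info("x4", "Headache"))
```

Under this reading, $I$ is simply a table lookup: it takes an object and an attribute and returns the cell where that row and column meet.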

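14.5 Exercise
Define a label distribution system, in which the value of each label is not 0/1 but a real number in the interval [0,1], and the labels of the same object sum to 1.

First, a word on what "distribution" means here: the labels follow a probability distribution, so all decision labels of the same object must sum to 1. Based on this idea, the definition is: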
A multi-label distribution system is a tuple $S = (\mathbf{X}, \mathbf{Y})$ where $\mathbf{X} = [x_{ij}]_{n \times m} \in \mathbb{R}^{n \times m}$ is the data matrix, $\mathbf{Y} = [y_{ik}]_{n \times l} \in [0, 1]^{n \times l}$ is the label matrix satisfying $\sum_{k=1}^{l} y_{ik} = 1$ for every $i \in \{1, \dots, n\}$, $n$ is the number of instances, $m$ is the number of features, and $l$ is the number of labels.
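To make the two conditions on $\mathbf{Y}$ concrete, here is a small Python check. It is only an illustration: the function name `is_label_distribution`, the tolerance `tol`, and the toy matrices are assumptions, not part of the exercise.

```python
import math


def is_label_distribution(Y, tol=1e-9):
    """Return True if every entry of Y is in [0, 1] and every row sums to 1."""
    for row in Y:
        if any(y < 0.0 or y > 1.0 for y in row):
            return False
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            return False
    return True


# Toy label matrix with n = 2 instances and l = 3 labels (made-up numbers).
Y_valid = [[0.2, 0.5, 0.3],
           [1.0, 0.0, 0.0]]
Y_invalid = [[0.6, 0.6, 0.0]]  # entries are in [0, 1] but the row sums to 1.2

print(is_label_distribution(Y_valid))
print(is_label_distribution(Y_invalid))
```

The second matrix shows why the row-sum condition has to be stated separately: membership in $[0, 1]^{n \times l}$ alone does not force each row to sum to 1.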

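15.3 Exercise
Pick a paper from your group and analyze its mathematical expressions in detail, including their meaning, conventions, strengths, and weaknesses.

Many thanks to our teacher for sharing her paper as a model for this analysis: "Noise label learning through label confidence statistical inference".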

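[Image: formula from the paper]

First, $\mathbf{x}_i \in \mathbf{D}_l \setminus \mathbf{T}$ says that the instance $\mathbf{x}_i$ is an element of the training set $\mathbf{D}_l$ with the set of known instances $\mathbf{T}$ removed.
$c$ is the instance confidence function (I am not sure my translation of the term is exact), with signature $\mathbf{D}_l \setminus \mathbf{A} \times 2^{\mathbf{A}} \to [0, 1]$. I first read this as mapping $\mathbf{D}_l$ minus $\mathbf{A} \times 2^{\mathbf{A}}$ into the range 0 to 1, and could not make sense of a Cartesian product of a set $\mathbf{A}$ with its own power set $2^{\mathbf{A}}$. The reading that fits the way $c$ is used is $(\mathbf{D}_l \setminus \mathbf{A}) \times 2^{\mathbf{A}} \to [0, 1]$: $c$ takes a pair made of an instance and a subset of $\mathbf{A}$ (an element of the power set $2^{\mathbf{A}}$) and returns a value between 0 and 1. Accordingly, $c(\mathbf{x}_i, \mathbf{T})$ is the confidence of $\mathbf{x}_i$ with respect to $\mathbf{T} \subseteq \mathbf{A}$.
The term in $(\mathbf{x}_i, \mathbf{x}_j)$ is the distance between $\mathbf{x}_i$ and $\mathbf{x}_j$, and $r$ is the neighborhood radius of the instance.
$s^*$ is the argument at which the confidence function $c(\mathbf{x}_i, \mathbf{T})$ attains its maximum over $\mathbf{x}_i \in \mathbf{D}_l \setminus \mathbf{T}$.
This $s^*$ must also keep the distance between $\mathbf{x}_i$ and $\mathbf{x}_j$ smaller than the neighborhood radius $r$.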
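The two-argument shape of $c$ and the arg max can be mimicked in a few lines of Python. This is only a sketch under loud assumptions: `toy_confidence` is a made-up stand-in and not the paper's confidence function, instances are plain numbers, and the distance constraint involving $r$ is left out because its exact form is not restated in the text.

```python
def toy_confidence(x, trusted):
    """Hypothetical confidence in [0, 1]: closer to the trusted set means higher."""
    if not trusted:
        return 0.0
    nearest = min(abs(x - t) for t in trusted)
    return 1.0 / (1.0 + nearest)


def most_confident(training, trusted, confidence):
    """Return the instance of training minus trusted that maximises confidence."""
    candidates = [x for x in training if x not in trusted]
    return max(candidates, key=lambda x: confidence(x, trusted))


D_l = [0.0, 1.0, 2.5, 4.0, 9.0]   # toy training instances
T = frozenset({1.0, 4.0})          # toy set of known instances, a subset of D_l

s_star = most_confident(D_l, T, toy_confidence)
print(s_star)
```

The point of the sketch is the signature: `confidence` receives one instance together with a whole set, which mirrors a domain of the form (instances) $\times$ (subsets of $\mathbf{A}$).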

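[Image: formula from the paper]
$r_p$ is the neighborhood radius for instance pairs, and $\mathbf{N}_{r_p}$ is the set of neighboring instance pairs with respect to $r_p$. In other words, a pair $\mathbf{x}_i$, $\mathbf{x}_j$ is taken only when the distance between them is at most $r_p$.
$y_i$ and $y_j$ are the class labels of $\mathbf{x}_i$ and $\mathbf{x}_j$.
$f(r_p)$ is the label inconsistency statistic for instance pairs: the number of neighboring pairs $\mathbf{x}_i$, $\mathbf{x}_j$ whose class labels differ, divided by the total number of neighboring pairs.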
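Read this way, the statistic is $f(r_p) = |\{(\mathbf{x}_i, \mathbf{x}_j) \in \mathbf{N}_{r_p} : y_i \neq y_j\}| \, / \, |\mathbf{N}_{r_p}|$. A minimal Python sketch of that ratio follows. It assumes one-dimensional instances with absolute difference as the distance, counts each unordered pair once, and returns 0.0 when no pair falls within $r_p$; all three choices are assumptions made for the illustration.

```python
from itertools import combinations


def label_inconsistency(points, labels, r_p):
    """Fraction of pairs within distance r_p whose class labels differ."""
    neighbor_pairs = [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if abs(points[i] - points[j]) <= r_p
    ]
    if not neighbor_pairs:
        return 0.0  # convention chosen here for an empty pair set
    disagreeing = sum(1 for i, j in neighbor_pairs if labels[i] != labels[j])
    return disagreeing / len(neighbor_pairs)


points = [0.0, 0.5, 1.0, 3.0]   # toy one-dimensional instances
labels = ["a", "a", "b", "b"]   # toy class labels

print(label_inconsistency(points, labels, r_p=0.5))
print(label_inconsistency(points, labels, r_p=1.0))
```

With $r_p = 0.5$ there are two neighboring pairs and one of them disagrees, so the ratio is 0.5; enlarging the radius to 1.0 adds one more disagreeing pair.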
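[Image: formula from the paper]
$\mathbf{N}_r(\mathbf{x}_i)$ is the neighborhood of the instance $\mathbf{x}_i$ with radius $r$, made up of instances in the training set.

$\mathbf{N}_T(\mathbf{x}_i)$ is the trusted neighborhood of the instance $\mathbf{x}_i$ with radius $r$, again within the training set.
Each $\mathbf{x}_j$ in it is both a member of the trusted instance set and a neighbor of $\mathbf{x}_i$.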
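The relation between the two neighborhoods can be sketched as a set intersection. The code below assumes one-dimensional instances, a closed ball (distance at most $r$), and that $\mathbf{x}_i$ is not counted as its own neighbor; the function names are illustrative.

```python
def neighborhood(x_i, training, r):
    """Instances of training within distance r of x_i, excluding x_i itself."""
    return {x for x in training if x != x_i and abs(x - x_i) <= r}


def trusted_neighborhood(x_i, training, trusted, r):
    """Neighbors of x_i that also belong to the trusted set."""
    return neighborhood(x_i, training, r) & set(trusted)


D_l = [0.0, 1.0, 2.5, 4.0, 9.0]   # toy training instances
T = {4.0, 9.0}                     # toy trusted instances

print(sorted(neighborhood(2.5, D_l, r=2.0)))
print(sorted(trusted_neighborhood(2.5, D_l, T, r=2.0)))
```

Here 1.0 is a neighbor of 2.5 but not trusted, and 9.0 is trusted but too far away, so only 4.0 ends up in the trusted neighborhood.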
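[Image: formula from the paper]
For an instance $\mathbf{x}_i$ to be predicted, $\mathbf{x}_j$ is called its nearest trusted neighbor if and only if $\mathbf{x}_j$ belongs to the trusted neighborhood $\mathbf{N}_T(\mathbf{x}_i)$ and the distance between $\mathbf{x}_i$ and $\mathbf{x}_j$ is the smallest within that neighborhood.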
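A short sketch of this selection, assuming that "nearest" means the smallest distance among trusted instances within radius $r$, that instances are one-dimensional, and that `None` is returned when the trusted neighborhood is empty (the last is a convention picked for the example):

```python
def nearest_trusted_neighbor(x_i, trusted, r):
    """Trusted instance within radius r of x_i at the smallest distance, or None."""
    candidates = [t for t in trusted if t != x_i and abs(t - x_i) <= r]
    if not candidates:
        return None
    return min(candidates, key=lambda t: abs(t - x_i))


T = [1.0, 4.0, 9.0]   # toy trusted instances

print(nearest_trusted_neighbor(3.0, T, r=2.5))
print(nearest_trusted_neighbor(20.0, T, r=2.5))
```

For the instance 3.0, both 1.0 and 4.0 lie within the radius, and 4.0 is chosen because it is closer.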

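One question remains: instances such as $\mathbf{x}_i$ are not sets in their own right but elements of a set, so why are they set in bold? I am not sure whether this is my misunderstanding or a small flaw in the notation.
That is all for now. Some of the later formulas are a bit harder, and I will work through them gradually.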
