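Mathematical Expressions: From Fear to Going Head-to-Head (14. Decision Tables)

14. Decision Tables

The decision table is a commonly used concept in machine learning.

14.1 Numerical Data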

If all collected data are real-valued and the label is binary (yes/no, 0/1), the corresponding decision table is called a binary class decision system.
Definition 1. A binary class decision system is a tuple $S = (\mathbf{X}, \mathbf{Y})$ where $\mathbf{X} = [x_{ij}]_{n \times m} \in \mathbb{R}^{n \times m}$ is the data matrix, $\mathbf{Y} = [y_1, y_2, \dots, y_n] \in \{0, 1\}^n$ is the label array, $n$ is the number of instances, and $m$ is the number of features.
Naturally, $y_i$ is the label of instance $\mathbf{x}_i = [x_{i1}, x_{i2}, \dots, x_{im}]$.

  • In some of the literature, $m$ and $n$ may be swapped, that is, each column represents one instance.
  • Discussion: should $n$ and $m$ appear in the tuple?

Java code:

public class BinaryDecisionSystem {
	/**
	 * The data matrix.
	 */
	double[][] data;

	/**
	 * The decision attribute.
	 */
	boolean[] decision;

	/**
	 * Initialize a decision system.
	 */
	public BinaryDecisionSystem(int paraN, int paraM) {
		data = new double[paraN][paraM];
		decision = new boolean[paraN];
	}// Of the constructor
}// Of class BinaryDecisionSystem

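For each of the following definitions, readers are encouraged to write their own version first and then compare it with mine.

If the label is multi-valued (high/medium/low, 1/2/…/$d$), the corresponding decision table is called a multi-class decision system.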

Definition 2. A multi-class decision system is a tuple $S = (\mathbf{X}, \mathbf{Y})$ where $\mathbf{X} = [x_{ij}]_{n \times m} \in \mathbb{R}^{n \times m}$ is the data matrix, $\mathbf{Y} = [y_1, y_2, \dots, y_n] \in \{1, 2, \dots, d\}^n$ is the label array, $n$ is the number of instances, $m$ is the number of features, and $d$ is the number of classes.
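
Following the pattern of the Java code given for Definition 1, one possible way to mirror Definition 2 is sketched below. The class name `MultiClassDecisionSystem`, the field `numClasses`, and the helper `setLabel` are illustrative assumptions rather than part of the original notes. Labels are stored as `int` values in $\{1, 2, \dots, d\}$, and the value 0 is used here, by assumption, to mean "not yet assigned".

```java
/**
 * A sketch of a multi-class decision system (Definition 2).
 * All names in this class are illustrative assumptions.
 */
class MultiClassDecisionSystem {
	/** The data matrix X: n rows (instances) by m columns (features). */
	double[][] data;

	/** The label array Y: a valid entry lies in {1, ..., d}; 0 means unassigned. */
	int[] labels;

	/** The number of classes d. */
	int numClasses;

	/** Allocate storage for n instances, m features, and d classes. */
	MultiClassDecisionSystem(int paraN, int paraM, int paraD) {
		data = new double[paraN][paraM];
		labels = new int[paraN];
		numClasses = paraD;
	}// Of the constructor

	/** Set the label of instance paraI (0-based), rejecting values outside {1, ..., d}. */
	void setLabel(int paraI, int paraLabel) {
		if (paraLabel < 1 || paraLabel > numClasses) {
			throw new IllegalArgumentException("Label out of range: " + paraLabel);
		}// Of if
		labels[paraI] = paraLabel;
	}// Of setLabel
}// Of class MultiClassDecisionSystem
```

Compared with the binary case, the only structural change is that `boolean[]` becomes `int[]` and $d$ has to be stored, since it cannot be recovered from the array sizes.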

If there are multiple labels and each of them is binary, the corresponding decision table is called a multi-label decision system.
Definition 3. A multi-label decision system is a tuple $S = (\mathbf{X}, \mathbf{Y})$ where $\mathbf{X} = [x_{ij}]_{n \times m} \in \mathbb{R}^{n \times m}$ is the data matrix, $\mathbf{Y} = [y_{ik}]_{n \times l} \in \{0, 1\}^{n \times l}$ is the label matrix, $n$ is the number of instances, $m$ is the number of features, and $l$ is the number of labels.
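
A Java counterpart of Definition 3 might look like the sketch below. The class name `MultiLabelDecisionSystem` and the helper `countLabels` are assumptions introduced for illustration only. The label array of Definition 1 simply becomes an $n \times l$ boolean matrix.

```java
/**
 * A sketch of a multi-label decision system (Definition 3).
 * All names in this class are illustrative assumptions.
 */
class MultiLabelDecisionSystem {
	/** The data matrix X: n rows (instances) by m columns (features). */
	double[][] data;

	/** The label matrix Y: labels[i][k] is true iff instance i carries label k. */
	boolean[][] labels;

	/** Allocate storage for n instances, m features, and l labels. */
	MultiLabelDecisionSystem(int paraN, int paraM, int paraL) {
		data = new double[paraN][paraM];
		labels = new boolean[paraN][paraL];
	}// Of the constructor

	/** Count how many of the l labels instance paraI (0-based) carries. */
	int countLabels(int paraI) {
		int count = 0;
		for (boolean tempLabel : labels[paraI]) {
			if (tempLabel) {
				count++;
			}// Of if
		}// Of for
		return count;
	}// Of countLabels
}// Of class MultiLabelDecisionSystem
```

Unlike the multi-class case, an instance here may carry zero, one, or several labels at once, which is why a row of the matrix rather than a single value is needed.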

14.2 Symbolic Data

Table 1. A symbolic decision table.

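| Patient | Headache | Temperature | Lymphocyte | Leukocyte | Eosinophil | Heartbeat | Flu |
|---|---|---|---|---|---|---|---|
| $x_1$ | Yes | High | High | High | High | Normal | Yes |
| $x_2$ | Yes | High | Normal | High | High | Abnormal | Yes |
| $x_3$ | Yes | High | High | High | Normal | Abnormal | Yes |
| $x_4$ | No | High | Normal | Normal | Normal | Normal | No |
| $x_5$ | Yes | Normal | Normal | Low | High | Abnormal | No |
| $x_6$ | Yes | Normal | Low | High | Normal | Abnormal | No |
| $x_7$ | Yes | Low | Low | High | Normal | Normal | Yes |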

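Questions to consider:

  • What should we do when attribute values are symbolic?
  • What should we do when different attributes have different domains?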

Definition 4. A decision system is a 5-tuple $S = (\mathbf{U}, \mathbf{C}, \mathbf{D}, \mathbf{V}, I)$, where

  • $\mathbf{U} = \{x_1, x_2, \dots, x_n\}$ is the set of instances,
  • $\mathbf{C} = \{a_1, a_2, \dots, a_m\}$ is the set of conditional attributes,
  • $\mathbf{D} = \{d_1, d_2, \dots, d_l\}$ is the set of decision attributes,
  • $\mathbf{V} = \bigcup_{a \in \mathbf{C} \cup \mathbf{D}} \mathbf{V}_a$,
  • $\mathbf{V}_a$ is the domain of $a \in \mathbf{C} \cup \mathbf{D}$,
  • $I: \mathbf{U} \times (\mathbf{C} \cup \mathbf{D}) \to \mathbf{V}$ is the information function.

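Notes:

  • $\mathbf{U}$, $\mathbf{C}$, and $\mathbf{D}$ are essentially enumerated types;
  • $\mathbf{V}$ is used for convenience;
  • $\mathbf{C}$ and $\mathbf{D}$ should be kept separate;
  • $I$ is a mapping (function).

In-class exercises:

  • Write out $\mathbf{U}$, $\mathbf{C}$, $\mathbf{D}$, and $\mathbf{V}$ for this example. Note: the last two attributes are decision attributes.
  • How is $I$ represented?

Java code: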

public enum ValuesEnum {
	YES, NO, HIGH, LOW, NORMAL, ABNORMAL;
}//Of enum ValuesEnum
public class DecisionSystem {
	/**
	 * Number of instances.
	 */
	int numInstances;

	/**
	 * Number of conditions.
	 */
	int numConditions;

	/**
	 * Number of decisions.
	 */
	int numDecisions;

	/**
	 * The data.
	 */
	ValuesEnum[][] data;

	/**
	 * Construct the data.
	 */
	public DecisionSystem(int paraN, int paraM, int paraL) {
		numInstances = paraN;
		numConditions = paraM;
		numDecisions = paraL;
		data = new ValuesEnum[numInstances][numConditions + numDecisions];
	}// Of the first constructor
}// Of class DecisionSystem
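
To make the role of the information function $I$ concrete, the sketch below fills a two-dimensional array with the values of Table 1 and reads $I(x, a)$ back as an array lookup. It is self-contained and therefore declares its own types: the names `SymbolicValue`, `FluTable`, `ROWS`, and `info` are assumptions made for this illustration, and parsing the rows from strings is only a convenience to keep the listing short. Following the note in the exercise, the last two columns (Heartbeat and Flu) are treated as decision attributes.

```java
/** The union V of all attribute domains in Table 1. */
enum SymbolicValue {
	YES, NO, HIGH, LOW, NORMAL, ABNORMAL
}// Of enum SymbolicValue

/**
 * Table 1 stored as an array, with I(x, a) implemented as a lookup.
 * All names in this class are illustrative assumptions.
 */
class FluTable {
	/** Column order: Headache, Temperature, Lymphocyte, Leukocyte, Eosinophil, Heartbeat, Flu. */
	static final String[] ROWS = {
		"YES HIGH HIGH HIGH HIGH NORMAL YES",         // x1
		"YES HIGH NORMAL HIGH HIGH ABNORMAL YES",     // x2
		"YES HIGH HIGH HIGH NORMAL ABNORMAL YES",     // x3
		"NO HIGH NORMAL NORMAL NORMAL NORMAL NO",     // x4
		"YES NORMAL NORMAL LOW HIGH ABNORMAL NO",     // x5
		"YES NORMAL LOW HIGH NORMAL ABNORMAL NO",     // x6
		"YES LOW LOW HIGH NORMAL NORMAL YES"          // x7
	};

	/** The first five columns are conditions; the remaining two are decisions. */
	static final int NUM_CONDITIONS = 5;

	/** The parsed table: DATA[i][j] is the value of attribute j on instance i. */
	static final SymbolicValue[][] DATA = new SymbolicValue[ROWS.length][];

	static {
		for (int i = 0; i < ROWS.length; i++) {
			String[] tempTokens = ROWS[i].split(" ");
			DATA[i] = new SymbolicValue[tempTokens.length];
			for (int j = 0; j < tempTokens.length; j++) {
				DATA[i][j] = SymbolicValue.valueOf(tempTokens[j]);
			}// Of for j
		}// Of for i
	}// Of static block

	/** The information function I: both indices are 0-based. */
	static SymbolicValue info(int paraInstance, int paraAttribute) {
		return DATA[paraInstance][paraAttribute];
	}// Of info

	public static void main(String[] args) {
		// Leukocyte (column 3) of patient x5 (row 4).
		System.out.println(info(4, 3));
	}// Of main
}// Of class FluTable
```

Because a single enum holds every value, nothing in this representation prevents storing, say, `ABNORMAL` in the Headache column. That is the price of working with the union $\mathbf{V}$ instead of the individual domains $\mathbf{V}_a$.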

Further thought: can the attributes themselves be written as functions?
Definition 5. A decision system is a quadruple $S = (\mathbf{U}, \mathbf{C}, \mathbf{D}, \mathbf{V})$, where

  • $\mathbf{U} = \{x_1, x_2, \dots, x_n\}$ is the set of instances,
  • $\mathbf{C} = \{a_1, a_2, \dots, a_m\}$ is the set of conditional attributes,
  • $\mathbf{D} = \{d_1, d_2, \dots, d_l\}$ is the set of decision attributes,
  • $\forall a \in \mathbf{C} \cup \mathbf{D}$, $a: \mathbf{U} \to \mathbf{V}_a$ is an information function,
  • $\mathbf{V} = \bigcup_{a \in \mathbf{C} \cup \mathbf{D}} \mathbf{V}_a$.
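
Definition 5 can also be mirrored in Java by storing each attribute as a function object. The sketch below is one possible reading, not the author's implementation: the names `FunctionalDecisionSystem`, `fluExample`, and `observedDomain` are assumptions, instances are identified by 0-based indices, values are plain strings, and only the Headache and Flu columns of Table 1 are included to keep it short. Note that `observedDomain` returns the values an attribute actually takes on $\mathbf{U}$, which may be a proper subset of $\mathbf{V}_a$.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/**
 * A sketch of Definition 5: every attribute is a function from U to V_a.
 * All names in this class are illustrative assumptions.
 */
class FunctionalDecisionSystem {
	/** U is represented by the indices 0, ..., numInstances - 1. */
	final int numInstances;

	/** C: conditional attributes, keyed by name. */
	final Map<String, Function<Integer, String>> conditions = new LinkedHashMap<>();

	/** D: decision attributes, keyed by name. */
	final Map<String, Function<Integer, String>> decisions = new LinkedHashMap<>();

	FunctionalDecisionSystem(int paraN) {
		numInstances = paraN;
	}// Of the constructor

	/** Collect the values that the given attribute takes on U. */
	Set<String> observedDomain(Function<Integer, String> paraAttribute) {
		Set<String> resultValues = new TreeSet<>();
		for (int i = 0; i < numInstances; i++) {
			resultValues.add(paraAttribute.apply(i));
		}// Of for i
		return resultValues;
	}// Of observedDomain

	/** Build a small system from the Headache and Flu columns of Table 1. */
	static FunctionalDecisionSystem fluExample() {
		String[] tempHeadache = { "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes" };
		String[] tempFlu = { "Yes", "Yes", "Yes", "No", "No", "No", "Yes" };
		FunctionalDecisionSystem resultSystem = new FunctionalDecisionSystem(7);
		resultSystem.conditions.put("Headache", i -> tempHeadache[i]);
		resultSystem.decisions.put("Flu", i -> tempFlu[i]);
		return resultSystem;
	}// Of fluExample
}// Of class FunctionalDecisionSystem
```

For example, `FunctionalDecisionSystem.fluExample().decisions.get("Flu").apply(3)` evaluates the attribute Flu on $x_4$. Since the value sets can be computed from the attribute functions, this sketch stores no separate field for $\mathbf{V}$.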

Discussion:

  • Each attribute is a function. Can a function be an element of a set?
  • Can the definition be abbreviated as $S = (\mathbf{U}, \mathbf{C}, \mathbf{D})$?
  • Can the definition be abbreviated as $S = (\mathbf{U}, \mathbf{C} \cup \mathbf{D})$?

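14.4 A Hierarchy of Decision Systems

This part is explained with reference to the paper "A hierarchical model for test-cost-sensitive decision systems".

14.5 Homework

  • Define a label distribution system, in which the value of each label is not 0/1 but a real number in the interval $[0, 1]$, and the labels of each object sum to 1.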