Mathematical Foundations of Machine Learning 2: Alphabets, Binary Trees, Trees

Latest recommended article published on 2023-06-30 14:19:52

闵帆

Category: Mathematical Foundations of Computer Science. Tags: binary tree, data mining

Link to this post: https://blog.csdn.net/minfanphd/article/details/116582678

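This post follows up on the binary tree exercise from the previous post.
a) The left and right subtrees of a binary tree must be distinguished, so it is more reasonable to define a child function. There are two ways: 1) use two functions; 2) use one function and pass the subtree type as a character parameter (with value l or r), whose range is an alphabet. The second option was proposed by the student 彭子峰.
b) To describe a subtree of a subtree, we need to handle strings such as lrrl. This involves the positive closure of an alphabet.
c) The subtrees of a leaf node are left undefined, which is not acceptable. We need to introduce a null node $\phi$ (phi, not emptyset) that is absorbing: both its left and right subtrees are $\phi$ itself.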

1. Alphabets

We first define an alphabet and its closures.

Definition 11. An alphabet $\Sigma$ is a set of characters.

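Common alphabets include $\Sigma = \{0, 1\}$ and $\Sigma = \{\mathrm{a}, \dots, \mathrm{z}\}$. What we need here is $\Sigma = \{\mathrm{l}, \mathrm{r}\}$, where l stands for left and r stands for right. In formulas, letters are set in mathrm style to indicate that they are constants.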

The positive closure of an alphabet is defined as follows:
Definition 12. The positive closure of alphabet $\Sigma$ is given by $\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \dots$ .

If $\Sigma = \{0, 1\}$, then $\Sigma^+ = \{0, 1, 00, 01, 10, 11, 000, \dots \}$. If $\Sigma = \{\mathrm{l}, \mathrm{r}\}$, then $\Sigma^+ = \{\mathrm{l}, \mathrm{r}, \mathrm{ll}, \mathrm{lr}, \mathrm{rl}, \mathrm{rr}, \mathrm{lll}, \dots \}$.
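The enumeration above can be reproduced with a short Python sketch. The function name `positive_closure` and the choice to list strings by increasing length are assumptions made for illustration; the definition itself only specifies a set, not an order.

```python
from itertools import count, islice, product

def positive_closure(alphabet):
    """Yield the strings of Sigma^+ by increasing length: Sigma^1, Sigma^2, ..."""
    symbols = sorted(alphabet)
    for k in count(1):                          # k = 1, 2, 3, ...
        for chars in product(symbols, repeat=k):
            yield "".join(chars)                # one element of Sigma^k

# Sigma^+ is infinite, so only a finite prefix is taken here.
first_seven = list(islice(positive_closure({"l", "r"}), 7))
print(first_seven)
```

Because the generator never terminates, callers have to cut it off themselves, here with `islice`.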

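The mathematician Stephen Kleene also took the empty string $\varepsilon$ (varepsilon) into account, and the resulting closure carries his name.

The Kleene closure of an alphabet is defined as follows:
Definition 13. The Kleene closure of alphabet $\Sigma$ is given by $\Sigma^* = \Sigma^0 \cup \Sigma^+ = \{\varepsilon\} \cup \Sigma^+$ .

The elements of the Kleene closure of an alphabet are called strings.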

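In a given state (a node of a graph or tree, or a state of a finite-state automaton), reading one character triggers a jump to another state. The transition function is defined as:
Definition 14. Let $\bm{S}$ denote the set of states. The state transition function is given by $f: \bm{S} \times \Sigma \to \bm{S}$ .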

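Writing this as a definition is a bit of a stretch, but let us leave it for now.

To describe the jump made when a state reads a whole string, the transition function can be defined as
Definition 15. Let $\bm{S}$ denote the set of states. The positive state transition function is given by $f: \bm{S} \times \Sigma^+ \to \bm{S}$ , where $\forall s \in \bm{S}$ and $a_1 a_2 \dots a_k \in \Sigma^+$ with $k \geq 2$ , $f(s, a_1 a_2 \dots a_k) = f(f(s, a_1), a_2 a_3 \dots a_k)$ ; for $k = 1$ it coincides with the transition function of Definition 14.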

That is: read the first character and jump to a state, then read the second character and jump again, and so on.
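The recursion in Definition 15 translates almost literally into code. The sketch below assumes the single-character transition function is stored as a Python dict keyed by (state, character); the function name `transition` and the parity automaton are hypothetical examples, not part of the definitions above.

```python
def transition(f, state, string):
    """Extended transition: f(s, a1 a2 ... ak) = f(f(s, a1), a2 ... ak)."""
    if not string:
        raise ValueError("expects a non-empty string from Sigma^+")
    next_state = f[(state, string[0])]     # single-character jump (Definition 14)
    if len(string) == 1:
        return next_state                  # base case: k = 1
    return transition(f, next_state, string[1:])

# Hypothetical automaton over {0, 1} that tracks the parity of the number of 1s.
parity = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
result = transition(parity, "even", "1101")
print(result)
```

Reading 1101 from the state even visits odd, even, even, odd in turn, so the final state is odd.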

2. Binary trees

2.1 Initial version

Definition 16. Let $\Sigma = \{\mathrm{l}, \mathrm{r}\}$ be the alphabet and $\phi$ be a null node. A binary tree is a triple $(\bm{V}, r, c)$ , where $\bm{V} = \{v_1, \dots, v_n\}$ is the set of nodes, $r \in \bm{V}$ is the root, and $c: (\bm{V} \cup \{\phi\}) \times \Sigma^+ \to \bm{V} \cup \{\phi\}$ satisfying
a) $c(\phi, \mathrm{l}) = c(\phi, \mathrm{r}) = \phi$ ;
b) $\forall v \in \bm{V} \setminus \{r\}$ , $\exists ! s \in \Sigma^+$ s.t. $c(r, s) = v$ ;
c) $\forall v \in \bm{V}, a \in \Sigma$ , $c(v, a) \neq r$ .

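Remarks:
a) I am not yet sure whether condition a) is redundant; it seems derivable from b) and c).
b) Condition b) states that every other regular node is reachable from the root, and that the path is unique.
c) Condition c) states that the root has no parent.
d) From conditions b) and c) one can derive that the nodes in $\bm{V}$ form no cycle, which can be written as Property 1: $\forall v \in \bm{V}$ , $\not\exists s \in \Sigma^+$ s.t. $c(v, s) = v$ . Here $\not\exists$ means "there does not exist".
e) The previous item can be stated as a separate property. In a theory, which content goes into a definition and which is stated as a property, theorem, or proposition depends both on the content itself and on the researcher's personal preference.
f) In $\exists !$ , the ! expresses uniqueness: "there exists exactly one".
g) $\Sigma$ and $\phi$ are constants that do not depend on any particular binary tree, so they are not listed as separate components of the tuple.

Exercise 9. Taking $\phi$ into account, rewrite Definition 7 to fix its problem; see its discussion d).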

2.2 Polished version

The students finally started discussing with me and pointed out several problems in the previous definition. Time to show what we can really do!

Definition 17. Let $\Sigma = \{\mathrm{l}, \mathrm{r}\}$ be the alphabet and $\phi$ be a null node. A binary tree is a triple $(\bm{V}, r, c)$ , where $\bm{V} = \{v_1, \dots, v_n\}$ is the set of nodes, $r \in \bm{V}$ is the root, and $c: (\bm{V} \cup \{\phi\}) \times \Sigma^* \to \bm{V} \cup \{\phi\}$ satisfying
$\forall v \in \bm{V}$ , $\exists ! s \in \Sigma^*$ s.t. $c(r, s) = v$ .

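Remarks:
a) In Definition 16, conditions a) and c) are both redundant.
b) Keeping only condition b) of Definition 16 introduces a bug. Counterexample: $\bm{V} = \{r\}$ , $c(r, \mathrm{l}) = c(r, \mathrm{r}) = \phi$ , $c(\phi, \mathrm{l}) = c(\phi, \mathrm{r}) = \phi$ . The condition in Definition 17 does not treat $r$ separately, which fixes the bug.
c) $c(r, \varepsilon) = r$ . That is, reading the empty string at $r$ leads back to $r$ itself.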
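For small examples, the uniqueness condition of Definition 17 can be checked by brute force. The following is only a sketch under several assumptions that are not in the definition: the tree is encoded as a dict of dicts, a missing child stands for $\phi$, $\phi$ is treated as absorbing from the start, and the search is cut off at strings of length $2n + 1$ (a bound chosen so that a second string reaching some node, if one exists, is found). The names `PHI`, `child` and `is_binary_tree` are hypothetical.

```python
from itertools import product

PHI = None  # stand-in for the null node phi (assumption of this sketch)

def child(tree, node, string):
    """c(node, string): follow the characters of `string` one at a time."""
    for a in string:
        # phi is treated as absorbing; a missing child also leads to phi.
        node = PHI if node is PHI else tree[node].get(a, PHI)
    return node

def is_binary_tree(tree, root):
    """Check that every node is reached from `root` by exactly one string."""
    max_len = 2 * len(tree) + 1            # assumed search bound, see lead-in
    hits = {v: 0 for v in tree}
    for k in range(max_len + 1):           # k = 0 covers the empty string
        for chars in product("lr", repeat=k):
            v = child(tree, root, chars)
            if v is not PHI:
                hits[v] += 1
    return all(h == 1 for h in hits.values())

tree = {"r": {"l": "a", "r": "b"}, "a": {"r": "c"}, "b": {}, "c": {}}
shared = {"r": {"l": "a", "r": "a"}, "a": {}}      # a has two paths: l and r
cyclic = {"r": {"l": "a"}, "a": {"l": "r"}}        # r is reached by "" and "ll"
print(is_binary_tree(tree, "r"), is_binary_tree(shared, "r"), is_binary_tree(cyclic, "r"))
```

The search visits every string up to the bound, so its cost grows exponentially with the number of nodes; it is meant for checking toy examples, not as a practical algorithm.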

We now discuss several properties of this binary tree.
First, no node of a binary tree (except the null node) has a cycle back to itself.
Property 1. $\forall v \in \bm{V}$ , $\not\exists s \in \Sigma^+$ s.t. $c(v, s) = v$ .
Proof. Suppose that $\exists v_i \in \bm{V}$ and $s' \in \Sigma^+$ s.t. $c(v_i, s') = v_i$ .
According to Definition 17, $\exists s_1 \in \Sigma^*$ s.t. $c(r, s_1) = v_i$ .
Consequently $c(r, s_1 s') = c(c(r, s_1), s') = c(v_i, s') = v_i$ , so the string $s$ with $c(r, s) = v_i$ takes at least two values ( $s_1$ and $s_1 s'$ ) and is not unique.
This contradiction shows that the assumption does not hold.
The proof is finished.

Second, the left and right children of the null node are the null node itself.
Property 2. $c(\phi, \mathrm{l}) = c(\phi, \mathrm{r}) = \phi$ .
Proof. Given any $s = a_1 a_2 \dots a_{n+1} \in \Sigma^*$ , we consider $c(r, s)$ . Let the path corresponding to the calculation of $c(r, s)$ be $v_0' v_1' \dots v_{n+1}'$ where $v_0' = r$ . Since $|\bm{V} \cup \{\phi\}| = n + 1$ while the path contains $n + 2$ nodes, according to the pigeonhole principle, $\exists\, 0 \leq i < j \leq n + 1$ s.t. $v_i' = v_j'$ . In other words, $v_i' \dots v_j'$ is a loop.
According to Property 1, $v_i' \not\in \bm{V}$ , hence $v_i' = v_j' = \phi$ .
Now assume that $\exists i < k < j$ s.t. $v_k' \in \bm{V}$ . We have $c(r, a_1 a_2 \dots a_k) = v_k'$ , and $c(r, a_1 a_2 \dots a_j a_{i+1} a_{i + 2} \dots a_k) = v_k'$ , making the path from $r$ to $v_k'$ not unique.
Hence the assumption does not hold, and $v_i' = v_{i + 1}' = \dots = v_j' = \phi$ .
In other words, any character takes $\phi$ to itself.
This completes the proof.

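Remarks:
a) Pigeonhole principle: if $n + 1$ pigeons fly into $n$ pigeonholes, at least two pigeons share a hole. This is an important theorem in combinatorics.
b) The argument uses ideas from finite-state automata. From any node (state), reading one character leads to the next node. Here $\phi$ is called a trap state.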
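The pigeonhole argument can be observed on a small example. The sketch below makes the same encoding assumptions as any dict-based tree (a missing child stands for $\phi$, and $\phi$ is treated as absorbing), so it illustrates the statement rather than proving it; the names `PHI`, `path` and the three-node tree are hypothetical.

```python
from itertools import product

PHI = "phi"                                    # stand-in for the trap state phi
tree = {"r": {"l": "a"}, "a": {"r": "b"}, "b": {}}
n = len(tree)                                  # n = 3 regular nodes

def path(tree, root, string):
    """Return the states v0', v1', ... visited while reading `string` from root."""
    states = [root]
    for a in string:
        current = states[-1]
        nxt = PHI if current == PHI else tree[current].get(a, PHI)
        states.append(nxt)
    return states

visited = path(tree, "r", "lrlr")              # a string of length n + 1
print(visited)

# For every string of length n + 1, collect the states that occur more than once.
repeats = set()
for chars in product("lr", repeat=n + 1):
    states = path(tree, "r", chars)
    repeats |= {s for s in states if states.count(s) > 1}
print(repeats)
```

Each such path has $n + 2$ entries drawn from $n + 1$ states, so some state must repeat; in this example the only repeated state is the trap state, matching the proof of the property above.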