05. Decision Tree Formula Derivation

ID3 Decision Tree

Information entropy is the most commonly used measure of the purity of a sample set. It is defined as

$$Ent(D) = -\sum_{k=1}^{|\mathcal{Y}|}p_{k}\log_{2}p_{k}$$

where $D=\{(x_{1},y_{1}),(x_{2},y_{2}),\cdots,(x_{n},y_{n})\}$ is the sample set, $|\mathcal{Y}|$ is the total number of classes (2 for binary classification), and $p_{k}$ is the proportion of samples belonging to class $k$, with $0 \leq p_{k} \leq 1$ and $\sum_{k=1}^{|\mathcal{Y}|}p_{k} = 1$. The smaller $Ent(D)$ is, the higher the purity of $D$.
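The definition above can be sketched in Python. `entropy` below is a hypothetical helper (not from the text) that estimates each $p_k$ as the empirical class frequency:

```python
import math
from collections import Counter

def entropy(labels):
    """Ent(D) = -sum_k p_k * log2(p_k), with p_k the empirical
    frequency of class k in `labels` (0 * log2(0) is taken as 0)."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

# A pure set has entropy 0; an even binary split attains log2(2) = 1.
print(entropy(["a", "a", "a", "a"]))  # 0.0
print(entropy(["a", "a", "b", "b"]))  # 1.0
```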

Claim: $0 \leq Ent(D) \leq \log_{2}|\mathcal{Y}|$

Maximum of $Ent(D)$: let $|\mathcal{Y}|=n$ and $p_{k}=x_{k}$, so this is an $n$-class problem and the information entropy $Ent(D)$ can be viewed as a real-valued function of $n$ variables:

$$Ent(D) = f(x_{1},x_{2},\cdots,x_{n}) = -\sum_{k=1}^{n}x_{k}\log_{2}x_{k}$$

with $0 \leq x_{k} \leq 1$ and $\sum_{k=1}^{n}x_{k} = 1$. We now find the extrema of this multivariate function.

If we drop the constraint $0 \leq x_{k} \leq 1$ and keep only $\sum_{k=1}^{n}x_{k} = 1$, maximizing $f(x_{1},x_{2},\cdots,x_{n})$ is equivalent to the minimization problem

$$\min \sum_{k=1}^{n}x_{k}\log_{2}x_{k} \quad \text{s.t.} \quad \sum_{k=1}^{n}x_{k} = 1$$

The objective $\sum_{k=1}^{n}x_{k}\log_{2}x_{k}$ can be viewed as a sum of $n$ copies of $x\log_{2}x$.

Consider one of these terms on its own and write $f(x) = x\log_{2}x$. Then

$$f'(x) = \log_{2}x + x\cdot \frac{1}{x\ln 2} = \log_{2}x + \frac{1}{\ln 2}, \qquad f''(x) = \frac{1}{x\ln 2}$$

For $0 < x \leq 1$ we have $f''(x) > 0$, so $f(x)$ is convex, and the sum $\sum_{k=1}^{n}x_{k}\log_{2}x_{k}$ of $n$ such convex functions is also convex.

For $0 \leq x_{k} \leq 1$ this is therefore a convex optimization problem, and for a convex problem any point satisfying the KKT conditions is an optimal solution. Since this minimization has only an equality constraint, the points at which the first-order partial derivatives of its Lagrangian vanish are exactly the points satisfying the KKT conditions.

By the method of Lagrange multipliers, the Lagrangian of this optimization problem is

$$L(x_{1},\cdots,x_{n},\lambda) = \sum_{k=1}^{n}x_{k}\log_{2}x_{k} + \lambda\left(\sum_{k=1}^{n}x_{k} - 1\right)$$

Take the first-order partial derivatives of the Lagrangian with respect to $x_{1},\cdots,x_{n},\lambda$ and set them equal to 0.

First, set the partial derivative with respect to $x_{1}$ to 0:

$$\begin{aligned} \frac{\partial L(x_{1},\cdots,x_{n},\lambda)}{\partial x_{1}} &= \frac{\partial}{\partial x_{1}}\left[\sum_{k=1}^{n}x_{k}\log_{2}x_{k} + \lambda\left(\sum_{k=1}^{n}x_{k} - 1\right)\right] \\ &= \log_{2}x_{1} + x_{1}\cdot \frac{1}{x_{1}\ln 2} + \lambda \\ &= \log_{2}x_{1} + \frac{1}{\ln 2} + \lambda = 0 \end{aligned}$$

which gives

$$\lambda = -\log_{2}x_{1} - \frac{1}{\ln 2}$$

Taking the partial derivatives with respect to $x_{2},\cdots,x_{n}$ in the same way yields

$$\lambda = -\log_{2}x_{1} - \frac{1}{\ln 2} = -\log_{2}x_{2} - \frac{1}{\ln 2} = \cdots = -\log_{2}x_{n} - \frac{1}{\ln 2}$$
λ \lambda λ求偏导
∂ L ( x 1 , ⋯   , x n , λ ) ∂ λ = ∂ ∂ λ [ ∑ k = 1 n x k log ⁡ 2 x k + λ ( ∑ k = 1 n x k − 1 ) ] = ∑ k = 1 n x k − 1 \frac{\partial L(x_{1},\cdots,x_{n},\lambda )}{\partial \lambda} = \frac{\partial }{\partial \lambda}\left [ \sum_{k=1}^{n}x_{k}\log_{2}x_{k} + \lambda(\sum_{k=1}^{n}x_{k} - 1) \right ] = \sum_{k=1}^{n}x_{k} - 1 λL(x1,,xn,λ)=λ[k=1nxklog2xk+λ(k=1nxk1)]=k=1nxk1
令其等于0得
∑ k = 1 n x k = 1 \sum_{k=1}^{n}x_{k} = 1 k=1nxk=1
Solving this system gives $x_{1} = x_{2} = \cdots = x_{n} = \frac{1}{n}$ (since $x_{1} = x_{2} = \cdots = x_{n}$ and $\sum_{k=1}^{n}x_{k} = 1$).

Since each $x_{k}$ must also satisfy $0 \leq x_{k} \leq 1$, and clearly $0 \leq \frac{1}{n} \leq 1$, the point $x_{1} = x_{2} = \cdots = x_{n} = \frac{1}{n}$ satisfies all the constraints. It is therefore the minimizer of the minimization problem, and hence the maximizer of $f(x_{1},x_{2},\cdots,x_{n})$. Substituting $x_{1} = x_{2} = \cdots = x_{n} = \frac{1}{n}$ into $f$ gives

$$f\left(\frac{1}{n},\frac{1}{n},\cdots,\frac{1}{n}\right) = -\sum_{k=1}^{n}\frac{1}{n}\log_{2}\frac{1}{n} = -n\cdot \frac{1}{n}\cdot \log_{2}\frac{1}{n} = \log_{2}n$$

So the maximum of $f(x_{1},x_{2},\cdots,x_{n})$ under the constraints $0 \leq x_{k} \leq 1$ and $\sum_{k=1}^{n}x_{k} = 1$ is $\log_{2}n$.
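The bound just derived can be checked numerically. This sketch (with a hypothetical helper `entropy_from_probs`) compares the entropy of the uniform distribution against $\log_2 n$, and samples random points on the simplex to confirm none exceeds it:

```python
import math
import random

def entropy_from_probs(p):
    # Ent = -sum_k p_k * log2(p_k); terms with p_k = 0 contribute 0
    return sum(-pk * math.log2(pk) for pk in p if pk > 0)

n = 4
print(entropy_from_probs([1 / n] * n), math.log2(n))  # 2.0 2.0

# No random distribution on the simplex exceeds log2(n).
random.seed(0)
for _ in range(1000):
    raw = [random.random() for _ in range(n)]
    p = [x / sum(raw) for x in raw]
    assert entropy_from_probs(p) <= math.log2(n) + 1e-12
```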

Minimum of $Ent(D)$

If we instead drop $\sum_{k=1}^{n}x_{k} = 1$ and keep only $0 \leq x_{k} \leq 1$, then $f(x_{1},x_{2},\cdots,x_{n})$ can be viewed as a sum of $n$ mutually independent one-variable functions:

$$f(x_{1},x_{2},\cdots,x_{n}) = \sum_{k=1}^{n}g(x_{k})$$

where $g(x_{k}) = -x_{k}\log_{2}x_{k}$ with $0 \leq x_{k} \leq 1$. When each of $g(x_{1}),g(x_{2}),\cdots,g(x_{n})$ attains its minimum, $f(x_{1},x_{2},\cdots,x_{n})$ attains its minimum. Since $g(x_{1}),g(x_{2}),\cdots,g(x_{n})$ all have the same domain and the same expression, finding the minimum of $g(x_{1})$ also gives the minima of $g(x_{2}),\cdots,g(x_{n})$. We now minimize $g(x_{1})$.

First take the first and second derivatives of $g(x_{1})$ with respect to $x_{1}$:

$$g'(x_{1}) = -\log_{2}x_{1} - x_{1}\cdot \frac{1}{x_{1}\ln 2} = -\log_{2}x_{1} - \frac{1}{\ln 2}, \qquad g''(x_{1}) = -\frac{1}{x_{1}\ln 2}$$

Clearly, for $0 < x_{1} \leq 1$, $g''(x_{1}) = -\frac{1}{x_{1}\ln 2}$ is always negative, so $g(x_{1})$ is concave (opens downward) on its domain, and its minimum must be attained on the boundary. Substituting $x_{1} = 0$ and $x_{1} = 1$ into $g(x_{1})$:

$$g(0) = -0\log_{2}0 = 0, \qquad g(1) = -\log_{2}1 = 0$$

(using the convention $0\log_{2}0 := 0$, justified by $\lim_{x\to 0^{+}}x\log_{2}x = 0$).

So the minimum of $g(x_{1})$ is 0, and likewise the minima of $g(x_{2}),\cdots,g(x_{n})$ are 0, hence the minimum of $f(x_{1},x_{2},\cdots,x_{n})$ is 0. However, this minimum was obtained under $0 \leq x_{k} \leq 1$ alone; adding the constraint $\sum_{k=1}^{n}x_{k} = 1$ can only make the minimum of $f(x_{1},x_{2},\cdots,x_{n})$ at least 0. If we set some $x_{k} = 1$, the constraint $\sum_{k=1}^{n}x_{k} = 1$ forces $x_{1} = x_{2} = \cdots = x_{k-1} = x_{k+1} = \cdots = x_{n} = 0$; substituting into $f$ gives

$$f(0,\cdots,0,1,0,\cdots,0) = -0\log_{2}0 - \cdots - \log_{2}1 - \cdots - 0\log_{2}0 = 0$$

So $x_{k} = 1$, $x_{1} = \cdots = x_{k-1} = x_{k+1} = \cdots = x_{n} = 0$ attains the minimum of $f(x_{1},x_{2},\cdots,x_{n})$ under the constraints $0 \leq x_{k} \leq 1$ and $\sum_{k=1}^{n}x_{k} = 1$, and that minimum is 0.
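Likewise for the lower bound: a one-hot distribution attains entropy 0, as a quick sketch confirms:

```python
import math

def entropy_from_probs(p):
    # Ent = -sum_k p_k * log2(p_k); terms with p_k = 0 contribute 0
    return sum(-pk * math.log2(pk) for pk in p if pk > 0)

# All probability mass on one class: Ent(D) = -1 * log2(1) = 0.
print(entropy_from_probs([0.0, 0.0, 1.0, 0.0]))  # 0.0
```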

Conditional entropy: a measure of the purity of a sample set given the value of an attribute $a$:

$$H(D|a) = \sum_{v=1}^{V}\frac{|D^v|}{|D|}Ent(D^v)$$

where $a$ is some attribute of the samples with $V$ possible values $\{a^1,a^2,\cdots,a^V\}$, $D^v$ denotes the subset of samples in $D$ that take value $a^v$ on attribute $a$, and $Ent(D^v)$ is the information entropy of $D^v$. The smaller $H(D|a)$ is, the higher the purity.
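$H(D|a)$ can be sketched by grouping the samples by their value on attribute $a$ and size-weighting the subset entropies. `conditional_entropy` and the toy inputs below are illustrative:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(attr_values, labels):
    """H(D|a) = sum_v |D^v|/|D| * Ent(D^v)."""
    groups = defaultdict(list)
    for v, y in zip(attr_values, labels):
        groups[v].append(y)
    n = len(labels)
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())

# If the attribute separates the classes perfectly, H(D|a) = 0;
# if it is constant, H(D|a) = Ent(D).
labels = ["yes", "yes", "no", "no"]
print(conditional_entropy(["s", "s", "t", "t"], labels))  # 0.0
print(conditional_entropy(["s", "s", "s", "s"], labels))  # 1.0
```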

The ID3 decision tree selects the splitting attribute by information gain:

$$\begin{aligned} Gain(D,a) &= Ent(D) - \sum_{v=1}^{V}\frac{|D^v|}{|D|}Ent(D^v) \\ &= Ent(D) - H(D|a) \end{aligned}$$

The attribute with the largest information gain is chosen as the splitting attribute, because the larger the gain, the larger the "purity improvement" obtained by splitting on that attribute.
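The selection rule can be sketched as follows; `info_gain` and the toy dataset are illustrative, not from the original text:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """Gain(D, a) = Ent(D) - H(D|a)."""
    groups = defaultdict(list)
    for v, y in zip(attr_values, labels):
        groups[v].append(y)
    n = len(labels)
    h = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - h

# Toy dataset: "texture" predicts the label perfectly, "shape" carries
# no information, so ID3 splits on "texture".
data = {"texture": ["clear", "clear", "blurry", "blurry"],
        "shape":   ["round", "flat", "round", "flat"]}
labels = ["good", "good", "bad", "bad"]
best = max(data, key=lambda a: info_gain(data[a], labels))
print(best)  # texture
```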

With information gain as the splitting criterion, the ID3 decision tree is biased toward attributes with many possible values:

$$\begin{aligned} Gain(D,a) &= Ent(D) - \sum_{v=1}^{V}\frac{|D^v|}{|D|}Ent(D^v) \\ &= Ent(D) - \sum_{v=1}^{V}\frac{|D^v|}{|D|}\left(-\sum_{k=1}^{|\mathcal{Y}|}\frac{|D_{k}^{v}|}{|D^v|}\log_{2}\frac{|D_{k}^{v}|}{|D^v|}\right) \end{aligned}$$

where $D_{k}^{v}$ denotes the samples in $D$ that take value $a^{v}$ on attribute $a$ and belong to class $k$. An attribute with many values splits $D$ into many small, nearly pure subsets $D^v$, driving every $Ent(D^v)$, and hence $H(D|a)$, toward 0 and the gain toward its maximum.
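The bias can be demonstrated directly: an ID-like attribute that takes a distinct value on every sample makes each $D^v$ pure, so its gain equals $Ent(D)$, the maximum possible. The data below is hypothetical:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    groups = defaultdict(list)
    for v, y in zip(attr_values, labels):
        groups[v].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(ys) / n * entropy(ys) for ys in groups.values())

labels = ["good", "good", "bad", "bad"]
sample_id = [1, 2, 3, 4]          # unique per sample: every D^v is pure
imperfect = ["a", "a", "a", "b"]  # informative but imperfect attribute

print(info_gain(sample_id, labels))  # 1.0 (= Ent(D), the maximum possible)
print(info_gain(imperfect, labels) < info_gain(sample_id, labels))  # True
```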

C4.5 Decision Tree

The C4.5 decision tree selects the splitting attribute by gain ratio:

$$Gain\_ratio(D,a) = \frac{Gain(D,a)}{IV(a)}$$

where

$$IV(a) = -\sum_{v=1}^{V}\frac{|D^v|}{|D|}\log_{2}\frac{|D^v|}{|D|}$$

is the intrinsic value of attribute $a$; it grows with the number of values $V$, which counteracts ID3's bias toward many-valued attributes.
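A sketch of the gain ratio (all helpers hypothetical). Note that for the ID-like attribute from the previous section, $IV(a) = \log_2|D|$ is large, which shrinks its ratio:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    groups = defaultdict(list)
    for v, y in zip(attr_values, labels):
        groups[v].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(ys) / n * entropy(ys) for ys in groups.values())

def intrinsic_value(attr_values):
    """IV(a) = -sum_v |D^v|/|D| * log2(|D^v|/|D|)."""
    n = len(attr_values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(attr_values).values())

def gain_ratio(attr_values, labels):
    return info_gain(attr_values, labels) / intrinsic_value(attr_values)

labels = ["good", "good", "bad", "bad"]
# The ID-like attribute has gain 1.0 but IV = log2(4) = 2, so ratio 0.5;
# a two-valued perfect splitter has gain 1.0 and IV = 1, so ratio 1.0.
print(gain_ratio([1, 2, 3, 4], labels))          # 0.5
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0
```

A constant attribute has $IV(a) = 0$; a full implementation would guard against dividing by zero.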

CART Decision Tree

The CART decision tree selects the splitting attribute by the Gini index.

Gini value:

$$Gini(D) = \sum_{k=1}^{|\mathcal{Y}|}\sum_{k'\neq k}p_{k}p_{k'} = \sum_{k=1}^{|\mathcal{Y}|}p_{k}\sum_{k'\neq k}p_{k'} = \sum_{k=1}^{|\mathcal{Y}|}p_{k}(1-p_{k}) = 1-\sum_{k=1}^{|\mathcal{Y}|}p_{k}^2$$

Gini index:

$$Gini\_index(D,a) = \sum_{v=1}^{V}\frac{|D^v|}{|D|}Gini(D^v)$$

The smaller the Gini value and the Gini index, the higher the purity of the sample set.
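The two formulas can be sketched as follows (helper names and data are illustrative):

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_index(attr_values, labels):
    """Gini_index(D, a) = sum_v |D^v|/|D| * Gini(D^v)."""
    groups = defaultdict(list)
    for v, y in zip(attr_values, labels):
        groups[v].append(y)
    n = len(labels)
    return sum(len(ys) / n * gini(ys) for ys in groups.values())

labels = ["good", "good", "bad", "bad"]
print(gini(labels))                              # 0.5 (even binary split)
print(gini_index(["a", "a", "b", "b"], labels))  # 0.0 (perfect split)
```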

CART Classification Algorithm

  1. Using the Gini index formula $Gini\_index(D,a) = \sum_{v=1}^{V}\frac{|D^v|}{|D|}Gini(D^v)$, find the attribute $a_{*}$ with the smallest Gini index.
  2. Compute the Gini value $Gini(D^v)$, $v=1,2,\cdots,V$, for every possible value of attribute $a_{*}$, and choose the value $a_{*}^{v}$ with the smallest Gini value as the split point, dividing $D$ into two sets (nodes) $D_{1}$ and $D_{2}$: $D_{1}$ contains the samples with $a_{*}=a_{*}^{v}$, and $D_{2}$ the samples with $a_{*}\neq a_{*}^{v}$.
  3. Repeat steps 1 and 2 on $D_{1}$ and $D_{2}$ until a stopping condition is met.
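A common way to realize these steps is to search attribute and value jointly: score every candidate binary partition "$a = v$ vs. $a \neq v$" by the size-weighted Gini of the two sides and keep the best. This is a sketch, not the text's exact two-stage procedure; all names and data are illustrative:

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_cart_split(data, labels):
    """Scan every (attribute, value) pair; score the binary split
    D1 = {a == v}, D2 = {a != v} by the size-weighted Gini of the
    two parts, and return the pair with the smallest score."""
    n = len(labels)
    best = None
    for attr, values in data.items():
        for v in set(values):
            d1 = [y for x, y in zip(values, labels) if x == v]
            d2 = [y for x, y in zip(values, labels) if x != v]
            score = len(d1) / n * gini(d1) + len(d2) / n * gini(d2)
            if best is None or score < best[0]:
                best = (score, attr, v)
    return best

data = {"texture": ["clear", "clear", "blurry", "blurry"],
        "shape":   ["round", "flat", "round", "flat"]}
labels = ["good", "good", "bad", "bad"]
score, attr, value = best_cart_split(data, labels)
print(score, attr)  # 0.0 texture
```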

CART Regression Algorithm

  1. Find the optimal splitting attribute $a_{*}$ and optimal split point $a_{*}^v$ by

$$a_{*},a_{*}^v = \underset{a,a^v}{\arg\min}\left[\underset{c_{1}}{\min}\sum_{x_{i}\in D_{1}(a,a^v)}(y_{i}-c_{1})^2 + \underset{c_{2}}{\min}\sum_{x_{i}\in D_{2}(a,a^v)}(y_{i}-c_{2})^2\right]$$

  where $D_{1}(a,a^v)$ is the set of samples whose value on attribute $a$ is at most $a^v$, $D_{2}(a,a^v)$ is the set of samples whose value on attribute $a$ exceeds $a^v$, $c_{1}$ is the mean of the outputs of $D_{1}$, and $c_{2}$ is the mean of the outputs of $D_{2}$.

  2. Split $D$ into the two sets (nodes) $D_{1}$ and $D_{2}$ at the split point $a_{*}^v$.

  3. Repeat steps 1 and 2 on $D_{1}$ and $D_{2}$ until a stopping condition is met.
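Step 1's search can be sketched directly from the formula: for each attribute and threshold, partition into $D_1$ ($\le a^v$) and $D_2$ ($> a^v$) and score by the summed squared errors around the subset means. All names and data below are illustrative:

```python
def sse(ys):
    """min_c sum_i (y_i - c)^2 is attained at c = mean(ys)."""
    if not ys:
        return 0.0
    c = sum(ys) / len(ys)
    return sum((y - c) ** 2 for y in ys)

def best_regression_split(data, targets):
    """Return (score, attribute, threshold) minimizing
    SSE(D1) + SSE(D2) over all attributes and thresholds."""
    best = None
    for attr, values in data.items():
        for t in sorted(set(values))[:-1]:  # largest value leaves D2 empty
            d1 = [y for x, y in zip(values, targets) if x <= t]
            d2 = [y for x, y in zip(values, targets) if x > t]
            score = sse(d1) + sse(d2)
            if best is None or score < best[0]:
                best = (score, attr, t)
    return best

# Targets jump from 1 to 9 between x1 = 2 and x1 = 3, so the best
# split is x1 <= 2 with zero squared error on both sides.
data = {"x1": [1.0, 2.0, 3.0, 4.0], "x2": [5.0, 5.0, 6.0, 6.0]}
targets = [1.0, 1.0, 9.0, 9.0]
print(best_regression_split(data, targets))  # (0.0, 'x1', 2.0)
```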
