ML Note 1.2 - Classification

Learning problems in which the target is a discrete random variable are called classification.

Generative Learning Algorithms

In a generative model, we first model the class-conditional probability density $p(x|y)$ and the prior $p(y)$. This step can be done by applying MLE to the joint distribution
$$ l(\theta) = \log\prod\limits_i p(x^{(i)}, y^{(i)}) $$

We then use Bayes' rule to compute the posterior probability
$$ p(y|x) = \frac{p(x|y)p(y)}{p(x)} \propto p(x|y)p(y) $$

In general, prediction only needs to minimize the error rate, i.e. pick the class with the largest (unnormalized) posterior
$$ h(x) = \arg\max\limits_y p(x|y)p(y) $$

Sometimes certain incorrect predictions incur a much larger loss than others. To reduce the loss caused by prediction errors, we can construct a loss table $\lambda(h(x), y)$ giving the loss incurred when the true class is $y$ but the predicted class is $h(x)$. Define the conditional expected loss (written here with the unnormalized posterior $p(x|i)p(i) \propto p(i|x)$, which does not change the minimizer)
$$ R(\alpha|x) = \sum\limits_{i} \lambda(\alpha, i)\, p(x|i)\, p(i) $$

The prediction can then be written as
$$ h(x) = \arg\min\limits_{\alpha} R(\alpha|x) $$
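As an illustration, a minimal MATLAB/Octave sketch of this minimum-risk rule for a single sample; all numbers and variable names here are hypothetical:

% Minimum-risk prediction for one observation x (hypothetical numbers).
lambda = [0 10;                     % lambda(a, i): loss of predicting class a
          1  0];                    % when the true class is i
px_given_y = [0.30; 0.05];          % p(x|y = 1) and p(x|y = 2) evaluated at x
py = [0.7; 0.3];                    % priors p(y = 1), p(y = 2)
R = lambda * (px_given_y .* py);    % R(a|x) = sum_i lambda(a, i) p(x|i) p(i)
[~, h] = min(R);                    % prediction h(x) = argmin_a R(a|x)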

Gaussian Discriminant Analysis

For a binary classification problem, assume
$$ \begin{array}{rcl} y &\sim& Bern(\phi)\\ x|y = 1 &\sim& MVN(\vec\mu_1, \Sigma)\\ x|y = 0 &\sim& MVN(\vec\mu_0, \Sigma) \end{array} $$

MLE gives
$$ \begin{array}{rcl} \phi &=& \frac{1}{m}\sum\limits_{i=1}^m y^{(i)}\\ \vec\mu_0 &=& \frac{\sum\limits_{i=1}^m\left(1-y^{(i)}\right)x^{(i)}}{\sum\limits_{i=1}^m\left(1-y^{(i)}\right)}\\ \vec\mu_1 &=& \frac{\sum\limits_{i=1}^m y^{(i)}x^{(i)}}{\sum\limits_{i=1}^m y^{(i)}}\\ \Sigma &=& \frac{1}{m}\sum\limits_{i=1}^m\left(x^{(i)}-\mu_{y^{(i)}}\right)\left(x^{(i)}-\mu_{y^{(i)}}\right)^T \end{array} $$
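A minimal MATLAB/Octave sketch of these estimates, assuming X is an m-by-n design matrix and y an m-by-1 vector of 0/1 labels (variable names are illustrative):

% GDA maximum-likelihood estimates.
m   = size(X, 1);
phi = mean(y);                          % phi = (1/m) sum_i y^(i)
mu0 = (X' * (1 - y)) / sum(1 - y);      % n-by-1 mean of class 0
mu1 = (X' * y) / sum(y);                % n-by-1 mean of class 1
Mu  = (1 - y) * mu0' + y * mu1';        % m-by-n, row i equals mu_{y^(i)}'
D   = X - Mu;                           % centered samples
Sigma = (D' * D) / m;                   % shared covariance matrix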

The posterior probability of GDA can be written in the form
$$ p(y=1|x;\phi, \Sigma, \mu_0, \mu_1) = \frac{1}{1+\exp\left(-\theta_{\phi, \Sigma, \mu_0, \mu_1}^T x\right)} $$
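Concretely (a standard side computation not spelled out in the note), expanding the two Gaussian densities and cancelling the shared quadratic term $x^T\Sigma^{-1}x$ shows that, for $x$ augmented with a constant $1$ component,
$$ \theta = \Sigma^{-1}(\mu_1 - \mu_0), \qquad \theta_0 = \frac{1}{2}\left(\mu_0^T\Sigma^{-1}\mu_0 - \mu_1^T\Sigma^{-1}\mu_1\right) + \log\frac{\phi}{1-\phi} $$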

Although this has the same form as logistic regression, in general the two models do not produce the same decision boundary. In fact, any exponential-family class-conditional distribution $x|y \sim ExpFamily(\eta)$ leads to a posterior of this logistic form. Logistic regression is therefore more robust and makes weaker assumptions about the model, whereas GDA assumes the data are Gaussian and, when that assumption holds, converges faster (needs less data).

Naive Bayes Classifier

If the feature components $x_i$ are conditionally independent given the class, i.e. they satisfy the NB assumption
$$ p(x_1, \dots, x_n|y) = \prod\limits_{i=1}^n p(x_i|y) $$

then the problem can be modeled with a naive Bayes classifier. Consider the following problem.

Decide whether an email is spam. Assume we have a dictionary $V$ that maps words to integers $\{1, \dots, n\}$.

Take the feature $x \in \{0,1\}^n$, where $x_i$ indicates whether the $i$-th word of the dictionary appears in the email. Applying MLE gives the parameters
$$ \begin{array}{rcccl} \phi_{i|y=1} &\equiv& p(x_i=1|y=1) &=& \frac{\sum\limits_{j=1}^m x_i^{(j)} y^{(j)}}{\sum\limits_{j=1}^m y^{(j)}}\\ \phi_{i|y=0} &\equiv& p(x_i=1|y=0) &=& \frac{\sum\limits_{j=1}^m x_i^{(j)}\left(1-y^{(j)}\right)}{\sum\limits_{j=1}^m\left(1-y^{(j)}\right)}\\ \phi_y &\equiv& p(y=1) &=& \frac{\sum\limits_{j=1}^m y^{(j)}}{m} \end{array} $$
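A minimal MATLAB/Octave sketch of these estimates and of the resulting decision rule, assuming X is an m-by-n 0/1 matrix with X(j, i) = x_i^(j) and y an m-by-1 0/1 label vector (names are illustrative):

% Naive Bayes, multi-variate Bernoulli event model (MLE, no smoothing).
phi_y = mean(y);                        % p(y = 1)
phi1  = (X' * y)       / sum(y);        % n-by-1, phi_{i|y=1} = p(x_i = 1 | y = 1)
phi0  = (X' * (1 - y)) / sum(1 - y);    % n-by-1, phi_{i|y=0} = p(x_i = 1 | y = 0)

% Classify a new binary feature vector x (n-by-1), working in log space:
log_p1 = sum(x .* log(phi1) + (1 - x) .* log(1 - phi1)) + log(phi_y);
log_p0 = sum(x .* log(phi0) + (1 - x) .* log(1 - phi0)) + log(1 - phi_y);
is_spam = log_p1 > log_p0;              % h(x) = argmax_y p(x|y) p(y)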

The model above is called the multi-variate Bernoulli event model, because the feature $x$ is a multivariate Bernoulli variable. Another possible model is the multinomial event model, where the feature $x \in \{1,\dots,n\}^l$ is the transcription of the email as a sequence of word indices, and $l$ is the number of words in the email. Applying MLE gives the parameters
$$ \begin{array}{rcccl} \phi_{k|y=1} &\equiv& p(x_j=k|y=1) &=& \frac{\sum\limits_{i=1}^m \sum\limits_{j=1}^{l_i} y^{(i)} I\{x_j^{(i)}=k\}}{\sum\limits_{i=1}^m y^{(i)} l_i}\\ \phi_{k|y=0} &\equiv& p(x_j=k|y=0) &=& \frac{\sum\limits_{i=1}^m \sum\limits_{j=1}^{l_i} \left(1-y^{(i)}\right) I\{x_j^{(i)}=k\}}{\sum\limits_{i=1}^m \left(1-y^{(i)}\right) l_i}\\ \phi_y &\equiv& p(y=1) &=& \frac{\sum\limits_{i=1}^m y^{(i)}}{m} \end{array} $$

A problem with the algorithm above is that if some word never appears in any spam training example, then any test email containing that word can never be classified as spam, because the corresponding factor in the likelihood is 0. This is fixed by Laplace smoothing, whose idea is to give every word a nonzero default probability of appearing. Suppose the MLE for a random variable $z \sim Multinomial(\phi_1, \phi_2, \dots, \phi_{k-1})$ gives $\phi_j = p / q$ as a ratio of counts. Since $z$ takes one of $k$ candidate values, each value should have a default probability of $1/k$. The modified parameter estimate is
$$ \phi_j = \frac{p+1}{q+k} $$
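In the sketch above this simply means adding 1 to each count and $k = 2$ to each denominator (each $x_i$ is binary), e.g.:

% Laplace-smoothed estimates for the Bernoulli event model (k = 2).
phi1 = (X' * y + 1)       / (sum(y) + 2);
phi0 = (X' * (1 - y) + 1) / (sum(1 - y) + 2);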

Note that the algorithm above only works for discrete random variables $x$. To apply naive Bayes to a continuous random variable, it can first be discretized (e.g. by binning its values).

Discriminative Learning Algorithms

In a discriminative model, we model the posterior probability $p(y|x)$ directly and (for binary classification) take $p(y|x) = 0.5$ as the decision boundary.

Softmax Regression

For a $k$-class decision problem, assume the target variable
$$ y|x;\theta \sim Multinomial(1, \vec\phi) $$

where the class probabilities are
$$ \vec\phi = (\phi_1, \phi_2, \dots, \phi_{k-1}) $$

$\phi_k$ is usually not treated as one of the parameters, because normalization requires
$$ \phi_k = 1-\sum\limits_{i=1}^{k-1}\phi_i $$

We now want to show that the multinomial distribution belongs to the exponential family. In order to have
$$ h(x) = E[T(y)] = \left[\begin{array}{c} \phi_1\\ \phi_2\\ \vdots\\ \phi_{k-1} \end{array}\right] $$

we construct
$$ T(y) = \left[\begin{array}{c} I\{y = 1\}\\ I\{y = 2\}\\ \vdots\\ I\{y = k-1\} \end{array}\right] $$

Then observe that
$$ I\{y = k\} = 1 - \sum\limits_{i = 1}^{k - 1} T_i(y) $$

Since the multinomial distribution satisfies
$$ \begin{array}{rcl} p(y;\phi) &=& \prod\limits_{i=1}^k \phi_i^{I\{y = i\}}\\ &=& \exp\left(\sum\limits_{i = 1}^k I\{y = i\}\log\phi_i\right)\\ &=& \exp\left(\sum\limits_{i = 1}^{k - 1} T_i(y)\log\phi_i + \left(1 - \sum\limits_{i = 1}^{k - 1} T_i(y)\right)\log\phi_k\right)\\ &=& \exp\left(\sum\limits_{i = 1}^{k - 1} T_i(y)\log\frac{\phi_i}{\phi_k} + \log\phi_k\right) \end{array} $$

the exponential-family parameters are
$$ \begin{array}{rcl} \eta &=& \left[\begin{array}{c}\log(\phi_1/\phi_k)\\\log(\phi_2/\phi_k)\\\vdots\\\log(\phi_{k-1}/\phi_k)\end{array}\right]\\ a(\eta) &=& -\log\phi_k\\ b(y) &=& 1 \end{array} $$

Defining $\eta_k \equiv \log\frac{\phi_k}{\phi_k} = 0$, it can be shown that
$$ h(x) = \left[\begin{array}{c} \frac{\exp\eta_1}{\sum\limits_{j=1}^k\exp\eta_j}\\ \frac{\exp\eta_2}{\sum\limits_{j=1}^k\exp\eta_j}\\ \vdots\\ \frac{\exp\eta_{k-1}}{\sum\limits_{j=1}^k\exp\eta_j} \end{array}\right] = \left[\begin{array}{c} \frac{\exp\theta_1^Tx}{\sum\limits_{j=1}^k\exp\theta_j^Tx}\\ \frac{\exp\theta_2^Tx}{\sum\limits_{j=1}^k\exp\theta_j^Tx}\\ \vdots\\ \frac{\exp\theta_{k-1}^Tx}{\sum\limits_{j=1}^k\exp\theta_j^Tx} \end{array}\right] $$
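A minimal MATLAB/Octave sketch of this hypothesis for a single input, assuming Theta is an n-by-k matrix whose j-th column is theta_j, with the k-th column fixed at zero so that eta_k = 0 (names are illustrative):

% Softmax hypothesis h(x) for one input x (n-by-1).
eta = Theta' * x;                   % k-by-1, eta_j = theta_j' * x (last entry is 0)
eta = eta - max(eta);               % shift for numerical stability (optional)
p   = exp(eta) / sum(exp(eta));     % p(j) = exp(eta_j) / sum_j exp(eta_j)
h   = p(1:end-1);                   % h(x): probabilities of classes 1 .. k-1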

Logistic Regression

For a binary classification problem, assume the target variable
$$ y|x;\theta \sim Bern(p) $$

which is the special case of softmax regression with $k = 2$. It follows that
$$ h(x) = \frac{1}{1+e^{-\theta^Tx}} $$

A function of this form is called the logistic or sigmoid function
$$ g(z) = \frac{1}{1+e^{-z}} $$

(Figure: graph of the sigmoid function g(z).)

hence the model is called logistic regression. Applying MLE gives
$$ \begin{array}{rcl} L(\theta) &=& \prod\limits_{i = 1}^m g\left(\theta^Tx^{(i)}\right)^{y^{(i)}} \left(1 - g\left(\theta^Tx^{(i)}\right)\right)^{1 - y^{(i)}}\\ l(\theta) &=& \sum\limits_{i = 1}^m y^{(i)}\ln g\left(\theta^Tx^{(i)}\right) + \left(1 - y^{(i)}\right)\ln\left(1 - g\left(\theta^Tx^{(i)}\right)\right) \end{array} $$

Using the symmetry of the sigmoid function
$$ g(-z) = 1 - g(z) $$

the log-likelihood simplifies to
$$ \begin{array}{rcl} l(\theta) &=& \sum\limits_{i = 1}^m y^{(i)}\ln g\left(\theta^Tx^{(i)}\right) + \left(1 - y^{(i)}\right)\ln g\left(-\theta^Tx^{(i)}\right)\\ &=& \sum\limits_{i = 1}^m y^{(i)}\theta^Tx^{(i)} + \ln g\left(-\theta^Tx^{(i)}\right) \end{array} $$

Since the derivative of the sigmoid satisfies
$$ g'(z) = g(z)(1-g(z)) $$

we have
$$ \begin{array}{rcl} \nabla_\theta l &=& \sum\limits_{i = 1}^m y^{(i)}x^{(i)} + \frac{1}{g\left(-\theta^Tx^{(i)}\right)} \cdot g\left(-\theta^Tx^{(i)}\right) \left(1 - g\left(-\theta^Tx^{(i)}\right)\right)\left(-x^{(i)}\right)\\ &=& \sum\limits_{i = 1}^m y^{(i)}x^{(i)} - g\left(\theta^Tx^{(i)}\right)x^{(i)}\\ &=& \sum\limits_{i = 1}^m \left(y^{(i)} - g\left(\theta^Tx^{(i)}\right)\right)x^{(i)} \end{array} $$

The parameters can then be solved for by gradient ascent on $l(\theta)$ (equivalently, gradient descent on $-l(\theta)$).
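A minimal MATLAB/Octave sketch of batch gradient ascent on this log-likelihood (the learning rate and iteration count are arbitrary illustrative choices; the appendix below instead optimizes a regularized cost with fminunc):

% Batch gradient ascent for logistic regression.
% X: m-by-n design matrix (first column all ones), y: m-by-1 0/1 labels.
g = @(z) 1 ./ (1 + exp(-z));              % sigmoid
theta = zeros(size(X, 2), 1);
alpha = 0.1;                              % learning rate (illustrative)
for iter = 1:1000
    grad  = X' * (y - g(X * theta));      % sum_i (y^(i) - g(theta' x^(i))) x^(i)
    theta = theta + (alpha / size(X, 1)) * grad;   % ascend l(theta)
end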


Appendix

Logistic Regression

logistic_regression

main.m

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contain the X values and the third column
%  contains the label (y).

data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);

% Note that mapFeature also adds a column of ones for us, so the intercept
% term is handled
X = mapFeature(X(:,1), X(:,2));

%% Regularization and Accuracies

initial_theta = zeros(size(X, 2), 1);
lambda = 1;

% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
	fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);

%% Plot Boundary

% Plot Data
figure; hold on;
pos = (y == 1);
neg = (y == 0);

plot(X(pos, 2), X(pos, 3), 'k+');
plot(X(neg, 2), X(neg, 3), 'ko');
hold on

% Here is the grid range
u = linspace(-1, 1.5, 50);
v = linspace(-1, 1.5, 50);

z = zeros(length(u), length(v));
% Evaluate z = theta*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        z(i,j) = mapFeature(u(i), v(j))*theta;
    end
end
z = z'; % important to transpose z before calling contour

% Plot z = 0
% Notice you need to specify the range [0, 0]
contour(u, v, z, [0, 0], 'LineWidth', 2)

title(sprintf('lambda = %g', lambda))

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

legend('y = 1', 'y = 0', 'Decision boundary')
hold off;

%% Compute accuracy on our training set
p = double(logsig(X * theta) >= 0.5);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (with lambda = 1): 83.1 (approx)\n');

costFunctionReg.m

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 

% Initialize some useful values
m = length(y); % number of training examples

% Regularized cost: negative average log-likelihood plus the penalty
% (lambda / (2*m)) * ||theta(2:end)||^2; the intercept theta(1) is not
% regularized. grad is the corresponding gradient of J w.r.t. theta.

J = -1 / m * (sum(y .* log(logsig(X * theta)) + (1 - y) .* log(1 - logsig(X * theta))) - lambda / 2 * norm(theta(2:end))^2);
grad = 1 / m * (X' * (logsig(X * theta) - y) + lambda * [0; theta(2:end)]);


end

mapFeature.m

function out = mapFeature(X1, X2)
% MAPFEATURE Feature mapping function to polynomial features
%
%   MAPFEATURE(X1, X2) maps the two input features
%   to quadratic features used in the regularization exercise.
%
%   Returns a new feature array with more features, comprising
%   X1, X2, X1.^2, X2.^2, X1*X2, X1*X2.^2, etc..
%
%   Inputs X1, X2 must be the same size
%

degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end