[Paper notes & jottings] Training with Weighted Sum of Denoising Score Matching Objectives


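Training uses a weighted sum of denoising score matching objectives. "Denoising" here means that, with the SDE approach, the noise comes from the SDE itself, so there is no need to add noise by hand.

The goal of this note is to explain how the original data is perturbed. Source: https://yang-song.github.io/blog/2021/score/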

I. Theory

First, choose a stochastic process (an SDE) that perturbs the original data distribution $p_0$; the density of the perturbed data at time $t$ is $p_t$.

The SDE chosen here is:
$$d\mathbf{x} = \sigma^t d\mathbf{w}, \quad t\in[0,1]$$
In this case, the perturbed data $\mathbf{x}(t)$, conditioned on the original data $\mathbf{x}(0)$, is distributed as:
$$p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0)) = \mathcal{N}\bigg(\mathbf{x}(t); \mathbf{x}(0), \frac{1}{2\log \sigma}(\sigma^{2t} - 1) \mathbf{I}\bigg)$$
The variance of this kernel, $\frac{1}{2\log \sigma}(\sigma^{2t} - 1)$, is used as the weighting function, i.e. $\lambda(t) = \frac{1}{2 \log \sigma}(\sigma^{2t} - 1)$.
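
The variance above can be checked directly from the SDE. The following is a short worked derivation, not taken from the source; it only assumes that the SDE has no drift term, so that $\mathbf{x}(t)$ is $\mathbf{x}(0)$ plus a stochastic integral, and it uses the Itô isometry for the variance.

```latex
\begin{aligned}
\mathbf{x}(t) &= \mathbf{x}(0) + \int_0^t \sigma^{s}\, d\mathbf{w}(s), \\
\operatorname{Var}\big[\mathbf{x}(t) \mid \mathbf{x}(0)\big]
  &= \int_0^t \sigma^{2s}\, ds \;\mathbf{I}
   = \left[\frac{\sigma^{2s}}{2\log\sigma}\right]_{s=0}^{s=t} \mathbf{I}
   = \frac{1}{2\log\sigma}\big(\sigma^{2t} - 1\big)\,\mathbf{I}.
\end{aligned}
```

The integrand is deterministic, so $\mathbf{x}(t)$ given $\mathbf{x}(0)$ is Gaussian with mean $\mathbf{x}(0)$, which matches the kernel $p_{0t}$ stated above.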

When $\sigma$ is very large, the prior distribution $p_{t=1}$, i.e. the final perturbed data distribution, is approximately a normal distribution:
$$\int p_0(\mathbf{y})\mathcal{N}\bigg(\mathbf{x}; \mathbf{y}, \frac{1}{2 \log \sigma}(\sigma^2 - 1)\mathbf{I}\bigg) d \mathbf{y} \approx \mathcal{N}\bigg(\mathbf{x}; \mathbf{0}, \frac{1}{2 \log \sigma}(\sigma^2 - 1)\mathbf{I}\bigg),$$
Intuitively, this SDE captures a continuum of Gaussian perturbations $\mathbf{x}(t)$ with variance function $\frac{1}{2\log\sigma}(\sigma^{2t}-1)$. This continuum of perturbations gradually transforms the original data distribution $p_0$ into a simple Gaussian distribution $p_1$, the distribution at $t=1$.
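
To make the variance function concrete, here is a minimal sketch that simulates the SDE in one dimension with the Euler-Maruyama scheme and compares the empirical variance at $t=1$ with the closed form. It is only an illustration: the function names, the value `sigma = 5.0`, the step count, and the number of paths are all assumptions chosen for the example, not values from the source.

```python
import math
import random


def closed_form_variance(t, sigma):
    """Variance of x(t) given x(0) for dx = sigma**t dw."""
    return (sigma ** (2.0 * t) - 1.0) / (2.0 * math.log(sigma))


def simulate_endpoint(x0, sigma, rng, n_steps=200):
    """Euler-Maruyama simulation of dx = sigma**t dw from t=0 to t=1."""
    dt = 1.0 / n_steps
    x = x0
    for k in range(n_steps):
        t = k * dt
        # Brownian increment over dt has standard deviation sqrt(dt).
        x += (sigma ** t) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def empirical_variance(sigma, n_paths=4000, seed=0):
    """Sample variance of x(1) over independent paths started at x(0) = 0."""
    rng = random.Random(seed)
    ends = [simulate_endpoint(0.0, sigma, rng) for _ in range(n_paths)]
    mean = sum(ends) / n_paths
    return sum((e - mean) ** 2 for e in ends) / (n_paths - 1)


if __name__ == "__main__":
    sigma = 5.0  # hypothetical value, chosen only for this demo
    print("closed form:", closed_form_variance(1.0, sigma))
    print("simulated:  ", empirical_variance(sigma))
```

The two numbers should agree up to Monte Carlo and discretization error; the left-endpoint Euler step slightly underestimates the variance because $\sigma^{2t}$ is increasing in $t$.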

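II. Code implementation

1) Sample t continuously

 # Sample the time variable t uniformly
 random_t = torch.rand(x.x.shape[0]//30, device=device) * (1. - eps) + eps # eps keeps t away from 0

2) Define the weighting function

The function defined below returns the standard deviation of the perturbation kernel, i.e. the square root of the weighting function $\lambda(t)$ introduced above.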

def marginal_prob_std(t, sigma):
    # t = torch.tensor(t, device=device)
    return torch.sqrt((sigma ** (2 * t) - 1.) / 2. / np.log(sigma))
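
As a quick sanity check, the same formula can be evaluated for scalar `t` with the standard library alone. This is a sketch for illustration: `marginal_prob_std_scalar`, `sigma = 25.0`, and `eps = 1e-5` are assumed names and values, not taken from the code above. It shows that the standard deviation is exactly zero at `t = 0`, which is why `t` is sampled from `[eps, 1)`: at `t = 0` the perturbed sample equals the data and the term `output * std` in the loss vanishes, so that sample carries no training signal.

```python
import math


def marginal_prob_std_scalar(t, sigma):
    """Scalar version of the kernel standard deviation sqrt(lambda(t))."""
    return math.sqrt((sigma ** (2.0 * t) - 1.0) / (2.0 * math.log(sigma)))


sigma = 25.0  # hypothetical value for illustration
eps = 1e-5    # hypothetical lower bound on t

for t in (0.0, eps, 0.5, 1.0):
    print(f"t={t:<8} std={marginal_prob_std_scalar(t, sigma):.6f}")
```

For small `t` the variance is approximately `t` itself, so the standard deviation at `t = eps` is small but strictly positive.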

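3) Perturb the data

# Time variable t, sampled uniformly from [eps, 1)
random_t = torch.rand(batchsize, device=device) * (1. - eps) + eps # eps keeps t away from 0

# Noise with the same shape as the data, drawn from a standard normal distribution
z = torch.randn_like(x.x)

# Standard deviation of the perturbation at each sampled t; it grows with t so that the perturbed data at t=1 is close to a normal distribution. It is repeated 30 times because batch_size = 30 in one training round.
# Note: repeat(1, 30) tiles the whole vector of t values 30 times; if each run of 30 consecutive rows of x.x should share one t, .repeat_interleave(30) in place of .repeat(1, 30) gives that order.
std = marginal_prob_std_func(random_t).repeat(1, 30).view(-1, 1)

# Scale the noise by the standard deviation and add it to a copy of the data
perturbed_x = copy.deepcopy(x)
perturbed_x.x += z * std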

4) Train on the perturbed data

One addition: to train the score function model, the objective now takes the following form:
$$\mathbb{E}_{t\sim \mathcal{U}(0,T)}\mathbb{E}_{p_t(x)}\big[\lambda(t)\,\|\nabla_x \log p_t(x)-s_\theta(x,t)\|_2^2\big]$$

This is the most basic form of the objective:
$$\mathbb{E}_{p(x)}\big[\|\nabla_x \log p(x) - s_\theta(x)\|_2^2\big] = \int p(x)\,\|\nabla_x \log p(x) - s_\theta(x)\|_2^2\,dx.$$
To estimate this objective, we use score matching (Hyvärinen 2005).

The objective below is equivalent to the basic one up to scaling and an additive constant that does not depend on $\theta$. It no longer involves the unknown $\nabla_x \log p(x)$, so it can be estimated from samples:

$$\mathbb{E}_{p_{data}(x)}\Big[\frac12\|s_\theta(x)\|_2^2+\mathrm{trace}(\nabla_x s_\theta(x))\Big]$$

Concretely, in code, the expectation is replaced by an average over $N$ samples:
$$\mathbb{E}_{p_{data}(x)}\Big[\frac12\|s_\theta(x)\|_2^2+\mathrm{trace}(\nabla_x s_\theta(x))\Big] \approx \frac1N \sum_{i=1}^N \Big[\frac12\|s_\theta(x_i)\|_2^2+\mathrm{trace}(\nabla_x s_\theta(x_i))\Big]$$
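
The sample average can be tried out on a toy problem. The sketch below is an illustration under stated assumptions, not the author's code: it uses a hypothetical one-dimensional model $s_\theta(x) = -x/\theta$ (the score of a zero-mean Gaussian with variance $\theta$), for which the trace term is simply $-1/\theta$. Minimizing the sample objective over a grid of $\theta$ should recover the second moment of the data.

```python
import random


def sm_objective(theta, xs):
    """Sample estimate of E[0.5 * s(x)**2 + ds/dx] for s(x) = -x / theta."""
    # In 1-D the trace of the Jacobian is the derivative ds/dx = -1 / theta.
    return sum(0.5 * (x / theta) ** 2 - 1.0 / theta for x in xs) / len(xs)


rng = random.Random(0)
# Toy data: zero-mean Gaussian with standard deviation 2 (variance 4).
xs = [rng.gauss(0.0, 2.0) for _ in range(5000)]
second_moment = sum(x * x for x in xs) / len(xs)

# Grid search over candidate variances theta.
grid = [0.5 + 0.05 * k for k in range(200)]
best_theta = min(grid, key=lambda th: sm_objective(th, xs))

print("sample second moment:", second_moment)
print("theta minimizing the objective:", best_theta)
```

For this model the objective is $\frac{m_2}{2\theta^2} - \frac{1}{\theta}$ with $m_2$ the sample second moment, so its minimizer is $\theta = m_2$, and the grid search lands on the nearest grid point.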

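# Evaluate the score model on the perturbed data
output = model(perturbed_x, random_t)
# Score matching loss. It differs from the formula above because the objective here carries the weight \lambda(t) = std**2:
# the score of the Gaussian kernel is -z / std, so \lambda(t) * ||output + z / std||^2 = ||output * std + z||^2.
# Sum over the features of each sample (dim=-1), then average over the batch.
loss_ = torch.mean(torch.sum(((output * std + z)**2).view(batch_size, -1), dim=-1))
# Return the score matching loss for this training step
return loss_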
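
The behaviour of this loss can be illustrated without a neural network. The following is a minimal one-dimensional sketch under assumptions that are not in the source: the data is drawn from a standard normal distribution, so the perturbed density at time `t` is Gaussian with variance `1 + std(t)**2` and its exact score is known in closed form. `SIGMA`, `true_score`, `zero_score`, and `weighted_dsm_loss` are hypothetical names introduced for the example. The exact score should give a lower loss than a model that always outputs zero.

```python
import math
import random

SIGMA = 25.0  # hypothetical value for illustration


def marginal_prob_std(t, sigma=SIGMA):
    """Standard deviation of the perturbation kernel at time t."""
    return math.sqrt((sigma ** (2.0 * t) - 1.0) / (2.0 * math.log(sigma)))


def weighted_dsm_loss(score_fn, data, eps=1e-5, seed=0):
    """Monte Carlo estimate of E[(score * std + z)**2] over t, z and the data."""
    rng = random.Random(seed)
    total = 0.0
    for x0 in data:
        t = rng.random() * (1.0 - eps) + eps   # t uniform in [eps, 1)
        std = marginal_prob_std(t)
        z = rng.gauss(0.0, 1.0)                # standard normal noise
        x_t = x0 + std * z                     # perturbed sample
        total += (score_fn(x_t, t) * std + z) ** 2
    return total / len(data)


def true_score(x, t):
    """Exact score of p_t when p_0 is a standard normal distribution."""
    return -x / (1.0 + marginal_prob_std(t) ** 2)


def zero_score(x, t):
    """Baseline model that ignores its input."""
    return 0.0


data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(20000)]

loss_true = weighted_dsm_loss(true_score, data)
loss_zero = weighted_dsm_loss(zero_score, data)
print("loss with exact score:", loss_true)
print("loss with zero score: ", loss_zero)
```

With the zero model the loss reduces to the mean of `z**2`, which is close to 1; with the exact score the expected value of each term is `1 / (1 + std(t)**2)`, which is strictly smaller.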

🙋‍♂️ I have a question: how is this objective function derived? 🤔
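
One possible answer, given as a sketch rather than a full proof: denoising score matching replaces the unknown score $\nabla_x \log p_t(x)$ with the score of the Gaussian kernel $p_{0t}$, which changes the objective only by a constant that does not depend on $\theta$. The notation $\mathrm{std}(t) = \sqrt{\lambda(t)}$ and the reparameterization $\mathbf{x}(t) = \mathbf{x}(0) + \mathrm{std}(t)\,\mathbf{z}$ with $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ are introduced here for the derivation.

```latex
\begin{aligned}
\nabla_{\mathbf{x}(t)} \log p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0))
  &= -\frac{\mathbf{x}(t) - \mathbf{x}(0)}{\mathrm{std}(t)^2}
   = -\frac{\mathbf{z}}{\mathrm{std}(t)}, \\
\lambda(t)\,\Big\| s_\theta(\mathbf{x}(t), t)
   - \nabla_{\mathbf{x}(t)} \log p_{0t}(\mathbf{x}(t) \mid \mathbf{x}(0)) \Big\|_2^2
  &= \mathrm{std}(t)^2\,\Big\| s_\theta(\mathbf{x}(t), t) + \frac{\mathbf{z}}{\mathrm{std}(t)} \Big\|_2^2 \\
  &= \big\| \mathrm{std}(t)\, s_\theta(\mathbf{x}(t), t) + \mathbf{z} \big\|_2^2 .
\end{aligned}
```

Taking the expectation over $t$, $\mathbf{x}(0)$ and $\mathbf{z}$ gives the quantity that the code computes as `(output * std + z)**2`, summed per sample and averaged over the batch.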
