Training a DQN with image frames as states in gym's MountainCar environment

Applying DQN to the gym MountainCar-v0 environment

1. Gym Environment

First, start the environment. Taking a random action returns several variables; a minimal code example:

env = gym.make('MountainCar-v1') # open an environment; this is the modified version explained below
env.reset() # reset the environment
action = env.action_space.sample() # sample a random action from the action space
state, reward, done, info = env.step(action) # take the action and receive the state, reward, done flag and extra info
1.1 ACTION SPACE

There are three discrete actions: push left, do nothing, and push right, i.e. action = [0, 1, 2].

1.2 STATE SPACE

The original state has two components, the car's position and velocity: state = [position, velocity]

where position ∈ [-1.2, 0.6] and velocity ∈ [-0.07, 0.07].

The conventional approach updates the Q-value from this explicit state. In this experiment, image frames are used as the state instead: a convolutional neural network takes the frames and outputs Q-values, from which the action is then selected.
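As a quick sketch of what the raw state looks like, a frame can be rendered directly from the environment (using the stock MountainCar-v0 here; the exact frame size depends on the render backend, (400, 600, 3) is typical for classic-control environments but not guaranteed):

import gym

env = gym.make('MountainCar-v0')
env.reset()

# render the current screen as an RGB array instead of opening a window
frame = env.render(mode='rgb_array')
print(frame.shape)  # e.g. (400, 600, 3): height x width x RGB channels
env.close()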

1.3 REWARD

The reward is -1 per time step.

1.4 DONE

Normally done=False; once the task is completed, done=True. However, the built-in version is configured so that if the goal is not reached within 200 steps, the episode ends and the environment has to be reset with env.reset().

❗️Since done is set to True both when the 200-step limit is hit and when the goal is actually reached, and the agent rarely reaches the goal within 200 steps, this setting is unfavourable for training. We therefore change the registration.

The registration file is located at: XXX/anaconda3/envs/py36/lib/python3.6/site-packages/gym/envs/__init__.py

Opening it, you can see the following two registrations, the discrete and the continuous variant.

Copy the first one, change the maximum episode steps to 100000, register it as MountainCar-v1 and save the file.

register(
    id='MountainCar-v0',
    entry_point='gym.envs.classic_control:MountainCarEnv',
    max_episode_steps=200,
    reward_threshold=-110.0,
)
register(
    id='MountainCarContinuous-v0',
    entry_point='gym.envs.classic_control:Continuous_MountainCarEnv',
    max_episode_steps=999,
    reward_threshold=90.0,
)
# register a new environment
register(
    id='MountainCar-v1',
    entry_point='gym.envs.classic_control:MountainCarEnv',
    max_episode_steps=100000,
    reward_threshold=-110.0,
)

Update (2021-03-26): another way to change gym's maximum episode steps:

import gym
env = gym.make("CartPole-v0")
print("The default max episode steps is", env._max_episode_steps)
env._max_episode_steps = 500
print("After changing, the max episode steps is", env._max_episode_steps)

The default max episode steps is 200
After changing, the max episode steps is 500
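Another option, if you would rather not re-register the environment at all, is to distinguish the two done cases directly. In reasonably recent gym versions the TimeLimit wrapper reports truncation through info (whether this key is present depends on the gym version, so treat this as a sketch):

import gym

env = gym.make('MountainCar-v0')
env.reset()
done = False
while not done:
    _, reward, done, info = env.step(env.action_space.sample())

# True only when the episode ended because of the 200-step limit,
# not because the car actually reached the goal
print('time limit hit:', info.get('TimeLimit.truncated', False))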

2. Deep Q-learning

2.1 Preprocess Frame

In practice, such large images are not needed for training. The preprocessing consists of the following steps:

  • Convert the RGB image to grayscale
  • (Optional) crop away useless regions, e.g. a game's scoreboard area
  • Normalize the pixel values to [0, 1] to reduce computation
  • Resize the image to [84, 84].
def preprocess_frame(frame):
    gray = rgb2gray(frame)
    # crop the frame
    # cropped_frame = gray[:,:]
    normalized_frame = gray/255.0
    preprocessed_frame = transform.resize(normalized_frame, [84,84])
    return preprocessed_frame
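A quick usage check, assuming the environment from section 1 has been created and reset (the input frame size does not matter, since the output is always resized to 84×84):

frame = env.render(mode='rgb_array')    # raw RGB frame
processed = preprocess_frame(frame)
print(processed.shape)                  # (84, 84): a single grayscale frame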
2.2 Stack Frames

Every four frames are stacked together, giving a state of shape [84, 84, 4].

Two cases have to be handled. First, create a deque whose slots each hold an [84, 84] frame, with a maximum length of 4:

  • New episode (done=True)

    • Initialize by copying the first frame four times to fill the deque
  • Current episode not finished (done=False)

    • Append the newest frame to the deque; the oldest frame is removed automatically

def stack_frames(stacked_frames, state, is_new_episode):
    # Preprocess frame
    frame = preprocess_frame(state)
    
    if is_new_episode:
        # Clear our stacked_frames
        stacked_frames = deque([np.zeros((84,84), dtype=np.int) for i in range(stack_size)], maxlen=4)
        
        # Because we're in a new episode, copy the same frame 4x
        stacked_frames.append(frame)
        stacked_frames.append(frame)
        stacked_frames.append(frame)
        stacked_frames.append(frame)
        
        # Stack the frames
        stacked_state = np.stack(stacked_frames, axis=2)
        
    else:
        # Append frame to deque, automatically removes the oldest frame
        stacked_frames.append(frame)

        # Build the stacked state (first dimension specifies different frames)
        stacked_state = np.stack(stacked_frames, axis=2) 
    
    return stacked_state, stacked_frames
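A minimal usage sketch of stack_frames, assuming env, stack_size and the deque above are already defined:

frame = env.render(mode='rgb_array')
# at the start of an episode: the deque is filled with four copies of the first frame
state, stacked_frames = stack_frames(stacked_frames, frame, True)
print(state.shape)  # (84, 84, 4)

# on later steps: push the newest frame, the oldest one drops out automatically
frame = env.render(mode='rgb_array')
state, stacked_frames = stack_frames(stacked_frames, frame, False)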

After the frames have been stored, they can also be read back and displayed.

batch_size = 64
# instantiate the DQN class to create a memory
memory = DQNetwork()

replay_batch = memory.sample(batch_size) 

s_batch = [replay[0] for replay in replay_batch][np.random.randint(0,batch_size)] 
# print("the shape of s_batch is",s_batch.shape)
# (s_batch[63]).shape       84*84*4 
next_s_batch = [replay[3] for replay in replay_batch][np.random.randint(0,batch_size)] # (s_batch[63]).shape       84*84*4 
# print("the shape of next_s_batch is",next_s_batch.shape)
# the corresponding image can then be displayed; multiply by 255 and pick one channel.

plt.imshow(255*s_batch[:,:,3])
2.3 Replay Buffer
  • When predicting the Q-value, we do not use the current state directly but states sampled from the replay buffer
  • Three operations are defined in turn:
    • add an experience
    • sample a batch of experiences from the buffer
    • pick a single experience from that sampled batch
# batch_size defines how many experiences are drawn from the replay buffer at a time
    def add(self, experience):
        self.buffer.append(experience)
        
    def sample(self, batch_size):
        buffer_size = len(self.buffer) 
        index = np.random.choice(np.arange(buffer_size), size = batch_size, replace = True) # sample with replacement, since early on the buffer holds fewer than batch_size experiences
        return [self.buffer[i] for i in index]
    
    def train(self, batch_size=64):
        replay_batch = self.sample(batch_size)
        # to keep things simple, after sampling the batch, randomly pick one experience from it
        batch_number = np.random.randint(0, batch_size)
        s_batch = [replay[0] for replay in replay_batch][batch_number]
        next_s_batch = [replay[3] for replay in replay_batch][batch_number]
        

Now let us analyze the dimensions (a quick sanity check follows this list):

  • replay_batch: size 64 × 5, i.e. 64 experiences, each made up of 5 parts
  • replay: size 1 × 5; batch_number decides which of the 64 experiences it is.
    • replay consists of 5 parts: [(84,84,4), action ∈ [0,1,2], reward = -1, (84,84,4), False or True]
  • s_batch: replay[0], of size 84 × 84 × 4, four consecutive frames stacked together.
  • next_s_batch: replay[3], of size 84 × 84 × 4, four consecutive frames stacked together.
  • action: replay[1], one of the actions [0, 1, 2]
  • reward: replay[2], the reward at the current time step
  • Done: replay[4], True or False, whether the current episode has ended.
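A small sanity check of these shapes, assuming memory is the DQNetwork instance from above and its buffer already holds at least 64 experiences of the form (state, action, reward, next_state, done):

replay_batch = memory.sample(64)
state, action, reward, next_state, done = replay_batch[np.random.randint(0, 64)]

print(state.shape)       # (84, 84, 4)
print(action, reward)    # e.g. 1 -1.0
print(next_state.shape)  # (84, 84, 4)
print(done)              # True or False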
2.4 Q-target network and Q network
class DQNetwork:
    def __init__(self):
        self.step = 0
        self.update_freq = 50  # target model update frequency
    # some methods are omitted; the key statements are:
    # every update_freq steps, copy the weights of model into target_model
    def train(self):
        self.step += 1
        if self.step % self.update_freq == 0:
            self.target_model.set_weights(self.model.get_weights())
    # more statements omitted; the key ones are:
        Q = self.model.predict(s_batch.reshape(-1, 84, 84, 4))
        Q_next = self.target_model.predict(next_s_batch.reshape(-1, 84, 84, 4))
  • Every fixed number of steps, the weights of the Q network are copied into the Q-target network.
  • The Q-target network has exactly the same architecture as the Q network.

Here reshape(-1, 84, 84, 4) adds a batch dimension, turning (84, 84, 4) into (1, 84, 84, 4). The network expects input of the form (batch_size, height, width, channels), so even for a single state (batch size 1) the input has to be reshaped.
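A quick check of this reshape in plain NumPy, independent of the network:

import numpy as np

s = np.zeros((84, 84, 4))
print(s.reshape(-1, 84, 84, 4).shape)   # (1, 84, 84, 4): a batch containing one state
print(np.stack([s, s]).shape)           # (2, 84, 84, 4): a batch of two states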

2.5 Network Model
    def create_model(self):

        # this is the network architecture of the original Atari DQN
        inputs = layers.Input(shape=(84, 84, 4,))

        # Convolutions on the frames on the screen
        layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
        layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)
        layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)

        layer4 = layers.Flatten()(layer3)

        layer5 = layers.Dense(512, activation="relu")(layer4)
        action = layers.Dense(3, activation="linear")(layer5)

        model=keras.Model(inputs=inputs, outputs=action)

        return model

Calling model.summary() shows its structure:

Model: "model_18"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_19 (InputLayer)        [(None, 84, 84, 4)]       0         
_________________________________________________________________
conv2d_54 (Conv2D)           (None, 20, 20, 32)        8224      
_________________________________________________________________
conv2d_55 (Conv2D)           (None, 9, 9, 64)          32832     
_________________________________________________________________
conv2d_56 (Conv2D)           (None, 7, 7, 64)          36928     
_________________________________________________________________
flatten_18 (Flatten)         (None, 3136)              0         
_________________________________________________________________
dense_36 (Dense)             (None, 512)               1606144   
_________________________________________________________________
dense_37 (Dense)             (None, 3)                 1539      
=================================================================
Total params: 1,685,667
Trainable params: 1,685,667
Non-trainable params: 0
2.6 Update Q-Value

The main update formula is:

$$Q_{new}=(1-\alpha)\,Q_{old}+\alpha\,(reward+\gamma \max Q_{future})$$

where $\alpha$ is the learning rate and $\gamma$ is the discount factor.

  • $Q_{future}$ is taken from the Q-target network
  • $Q_{old}$ is taken from the Q network

After the update, the value $Q_{new}$ is written back to the Q network.

Q[0][a] = ( 1 - lr) * Q[0][a] + lr * (reward + factor * np.amax(Q_next[0]))
  • For example, Q takes the value:
Q = array([[ 0.2  , -0.525,  0.4  ]])
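As a worked numeric example of the update, take hypothetical values lr = 0.1, factor = 0.95, action a = 2 and max(Q_next[0]) = 0.3:

import numpy as np

Q = np.array([[0.2, -0.525, 0.4]])
Q_next = np.array([[0.1, 0.3, 0.25]])
lr, factor, reward, a = 0.1, 0.95, -1.0, 2

# (1 - 0.1) * 0.4 + 0.1 * (-1 + 0.95 * 0.3) = 0.36 - 0.0715 = 0.2885
Q[0][a] = (1 - lr) * Q[0][a] + lr * (reward + factor * np.amax(Q_next[0]))
print(Q)  # [[ 0.2    -0.525   0.2885]]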

To summarize, the update procedure is as follows:

  1. Initialize Q = Q target

    repeat

    1. Update Q using the formula above, with values from Q and Q target
    2. Every fixed number of steps, copy the weights of Q into Q target to update Q target
2.7 Save and Load Weights
class DQNetwork:
    def save_model(self, file_path='MountainCar-v1-dqn.h5'):
        print('model saved')
        self.model.save(file_path)
agent_test = DQNetwork()
agent_test.model.load_weights(r'/home/shy/桌面/MountainCar-v1-dqn.h5')

How can we confirm that the weights were loaded successfully? In TensorFlow 1, loading them prints the list of stored weight tensors.

In TensorFlow 2 this information may not be shown. After loading, call get_weights(): if a newly created instance ends up with the same weights, the load succeeded; otherwise, since random initialization gives different weights each time, mismatching weights mean the load failed.
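A minimal check along these lines, loading the file saved in section 2.7 twice and comparing the weights (only standard Keras/NumPy calls are used):

agent_a = DQNetwork()
agent_a.model.load_weights('MountainCar-v1-dqn.h5')

agent_b = DQNetwork()
agent_b.model.load_weights('MountainCar-v1-dqn.h5')

# if loading worked, both models hold identical weights;
# two randomly initialized models would almost surely differ
same = all(np.array_equal(a, b) for a, b in zip(agent_a.model.get_weights(),
                                                agent_b.model.get_weights()))
print('weights loaded consistently:', same)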

3. Some Details

3.1 Save Frames
  • In gym, env.render(mode='rgb_array') renders the current state as an image, but this is very CPU-intensive; many people have raised the same issue at https://github.com/openai/gym/issues/659. Garbage collection might help to save some memory.

The classic-control games do not offer an interface that returns the frame without rendering, i.e. env.render(mode='rgb_array', close=True), whereas some Atari games do support this call to save memory.

  • Without saving frames to disk, assigning env.render(mode='rgb_array') to one variable, taking an action, and then assigning env.render(mode='rgb_array') to a second variable gives two variables that point to the same buffer, so both images end up identical.
  • Therefore the frames are saved to disk and read back again to work around this problem.
def save_gym_state(env,i):

    next_state = env.render(mode='rgb_array')

    plt.imsave('/home/shy/state/state{}.png'.format(i),preprocess_frame(next_state)) 
    next_state = plt.imread('/home/shy/state/state{}.png'.format(i))
    
    return next_state
3.2 Train and Test Agent

This is split into three steps:

  • With no pre-trained weights, initialize the weights randomly and train a Q network
  • Load the learned weights and train the Q network further
  • Test the Q network by directly returning the action with the largest Q-value

Because of limited memory, the second step can set episodes to 1000 and be repeated several times to fit a more accurate Q network.

4. Full Code

4.1 Import
import matplotlib.pyplot as plt
import tensorflow as tf
import gym
import scipy
import numpy as np
from skimage import transform # Help us to preprocess the frames
from skimage.color import rgb2gray # Help us to gray our frames
from collections import deque
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
config = ConfigProto()
config.allow_soft_placement=True
config.gpu_options.per_process_gpu_memory_fraction=0.8
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
4.2 Defined Functions and Classes
# helper functions: preprocess a frame, and either save it as an image or return the processed array
def preprocess_frame(frame):
    gray = rgb2gray(frame)
    # crop the frame
    # cropped_frame = gray[:,:]
    normalized_frame = gray/255.0
    preprocessed_frame = transform.resize(normalized_frame, [84,84])
    return preprocessed_frame


stack_size = 4 # We stack 4 frames

# Initialize deque with zero-images one array for each image
stacked_frames  =  deque([np.zeros((84,84), dtype=np.int) for i in range(stack_size)], maxlen=4)

def stack_frames(stacked_frames, state, is_new_episode):
    # Preprocess frame
    frame = preprocess_frame(state)
    
    if is_new_episode:
        # Clear our stacked_frames
        stacked_frames = deque([np.zeros((84,84), dtype=np.int) for i in range(stack_size)], maxlen=4)
        
        # Because we're in a new episode, copy the same frame 4x
        stacked_frames.append(frame)
        stacked_frames.append(frame)
        stacked_frames.append(frame)
        stacked_frames.append(frame)
        
        # Stack the frames
        stacked_state = np.stack(stacked_frames, axis=2)
        
    else:
        # Append frame to deque, automatically removes the oldest frame
        stacked_frames.append(frame)

        # Build the stacked state (first dimension specifies different frames)
        stacked_state = np.stack(stacked_frames, axis=2) 
    
    return stacked_state, stacked_frames

def save_gym_state(env,i):

    next_state = env.render(mode='rgb_array')

    plt.imsave('/home/shy/state/state{}.png'.format(i),preprocess_frame(next_state)) 
    next_state = plt.imread('/home/shy/state/state{}.png'.format(i))
    
    return next_state
class DQNetwork:
    def __init__(self):
        self.step = 0
        self.update_freq = 50  # target model update frequency
        
        self.buffer = deque(maxlen = 200)
        self.model = self.create_model()

        self.target_model = self.create_model()
    def add(self, experience):
        self.buffer.append(experience)
    
    def sample(self, batch_size):
        buffer_size = len(self.buffer)  # note: early in training the buffer may still be almost empty
        
        index = np.random.choice(np.arange(buffer_size),
                                size = batch_size,
                                replace = True)  # sample with replacement, since early on the buffer holds fewer than batch_size experiences
        
        return [self.buffer[i] for i in index]
        
    def create_model(self):

        # this is the network architecture of the original Atari DQN
        inputs = layers.Input(shape=(84, 84, 4,))

        # Convolutions on the frames on the screen
        layer1 = layers.Conv2D(32, 8, strides=4, activation="relu")(inputs)
        layer2 = layers.Conv2D(64, 4, strides=2, activation="relu")(layer1)
        layer3 = layers.Conv2D(64, 3, strides=1, activation="relu")(layer2)

        layer4 = layers.Flatten()(layer3)

        layer5 = layers.Dense(512, activation="relu")(layer4)
        action = layers.Dense(3, activation="linear")(layer5)

        model=keras.Model(inputs=inputs, outputs=action)

        return model

    
    def act(self, state, epsilon=0.1):
        """预测动作"""
        # 刚开始时,加一点随机成分,产生更多的状态
        if np.random.uniform() < epsilon - self.step * 0.0002:
            return np.random.choice([0, 1, 2])
        return np.argmax(self.model.predict(state.reshape(-1, 84, 84, 4)))
                         
    def save_model(self, file_path='MountainCar-v0-dqn.h5'):
        print('model saved')
        self.model.save(file_path)
                         
    def train(self, batch_size=64, lr=0.1, factor=0.95):

        self.step += 1
        # every update_freq steps, copy the weights of model into target_model
        if self.step % self.update_freq == 0:
            self.target_model.set_weights(self.model.get_weights())
        
        replay_batch = self.sample(batch_size) 
        
        # num = np.random.randint(0, batch_size)  # simplest case: randomly pick one sample from the batch as input
        
        s_batch = [replay[0] for replay in replay_batch][np.random.randint(0,batch_size)] 
        #print("the shape of s_batch is",np.array(s_batch).shape)
        # (s_batch[63]).shape       84*84*4   
        next_s_batch = [replay[3] for replay in replay_batch][np.random.randint(0,batch_size)] # shape (84, 84, 4)
        #print("the shape of next_s_batch is",np.array(next_s_batch).shape)

        Q = self.model.predict(s_batch.reshape(-1, 84, 84, 4))
        Q_next = self.target_model.predict(next_s_batch.reshape(-1, 84, 84, 4))

        # update the Q values in the batch using the formula above
        for i, replay in enumerate(replay_batch):
#             print("the shape of replay_batch is",np.array(replay_batch).shape)
#             print("the shape of replay is ",replay[0].shape)
#             print("the action of replay is",replay[1])
#             print("the reward of replay is ",replay[2])
#             print("the shape of next_replay is ",replay[3].shape)
#             print("the last replay is ",replay[4])
            a = replay[1]
            reward = replay[2]
            Q[0][a] = (1 - lr) * Q[0][a] + lr * (reward + factor * np.amax(Q_next[0]))
        
        # feed the batch into the network for training
        self.model.compile(loss='mean_squared_error',
                           optimizer=keras.optimizers.Adam(0.001))
        self.model.fit(s_batch.reshape(-1, 84, 84, 4), Q, verbose=0) 
4.3 Main Function (Random Initialization)
##########################################
# Main function for the initial training #
##########################################
env = gym.make('MountainCar-v1')
episodes = 1000  # number of training iterations
score_list = [] 
agent = DQNetwork()
env.reset()
##########
# 1. After modifying the registration, restart the kernel
# 2. env.render(mode='rgb_array',close=True) can save memory, but it does not work for this environment
# 3. After raising the max steps in the registration, increase episodes; otherwise the default episode ends automatically after 200 steps and a new one starts
###########

#state = env.render(mode='rgb_array')
#plt.imsave('/home/shy/state/state.png',preprocess_frame(state))
#state = plt.imread('/home/shy/state/state.png')

state = save_gym_state(env,'init')

stacked_frames  =  deque([np.zeros((84,84), dtype=np.int) for i in range(stack_size)], maxlen=4)
score = 0
for i in range(episodes):
    #action = env.action_space.sample() # initially, pick a random action from the environment 
    action = agent.act(state)
    _ , reward, done, _ = env.step(action)

    if i % 200 == 0:
        print("# It has finished {} episodes".format(i))
    
    #next_state = env.render(mode='rgb_array')
    #plt.imsave('/home/shy/state/state{}.png'.format(i),preprocess_frame(next_state)) 
    #next_state = plt.imread('/home/shy/state/state{}.png'.format(i))
    next_state = save_gym_state(env,i)
    
    next_state, stacked_frames = stack_frames(stacked_frames, next_state, False)
    if done: # the current episode is over
        next_state = np.zeros([84,84,4]) # the final step of this episode has no future frames, so fill the next state with zeros
        score += reward
        score_list.append(score)
        agent.add((state, action, reward, next_state, done))
        print("# reward of the episode ending at step {}:".format(i), score)
        env.reset() 
        state = env.render(mode='rgb_array') 
        score = 0 # reset the score
        #plt.imsave('/home/shy/state/state{}.png'.format(i),preprocess_frame(state))
        #state = plt.imread('/home/shy/state/state{}.png'.format(i))
        state = save_gym_state(env,i)
        
        state, stacked_frames = stack_frames(stacked_frames, state, True)
    else:
        #print("the shape of state is ",state.shape)
        #print("the shape of next_state is ",next_state.shape)
        agent.add((state, action, reward, next_state, done)) # add the experience to the replay memory
        agent.train() # use the same instance here, otherwise the memory cannot be shared
        score += reward
        score_list.append(score)
        state = next_state
agent.save_model(r'/home/shy/桌面/MountainCar-v1-dqn.h5')
4.4 Main Function (Trained Weights Initialization)
############################################################
# Main function for continuing training with saved weights #
############################################################
env = gym.make('MountainCar-v1')
episodes = 1000  # number of training iterations
score_list = [] 
agent_train = DQNetwork()
agent_train.model.load_weights(r'/home/shy/桌面/MountainCar-v1-dqn.h5')
env.reset()
##########
# 1. After modifying the registration, restart the kernel
# 2. env.render(mode='rgb_array',close=True) can save memory, but it does not work for this environment
# 3. After raising the max steps in the registration, increase episodes; otherwise the default episode ends automatically after 200 steps and a new one starts
###########

#state = env.render(mode='rgb_array')
#plt.imsave('/home/shy/state/state.png',preprocess_frame(state))
#state = plt.imread('/home/shy/state/state.png')

state =  save_gym_state(env,'init')
stacked_frames  =  deque([np.zeros((84,84), dtype=np.int) for i in range(stack_size)], maxlen=4)
score = 0
for i in range(episodes):
    #action = env.action_space.sample() # initially, pick a random action from the environment 
    action = agent_train.act(state)
    _ , reward, done, _ = env.step(action)

    if i % 200 == 0:
        print("# It has finished {} episodes".format(i))
    
    #next_state = env.render(mode='rgb_array')
    #plt.imsave('/home/shy/state/state{}.png'.format(i),preprocess_frame(next_state)) 
    #next_state = plt.imread('/home/shy/state/state{}.png'.format(i))
    next_state = save_gym_state(env,i)
    next_state, stacked_frames = stack_frames(stacked_frames, next_state, False)
    if done: # the current episode is over
        next_state = np.zeros([84,84,4]) # the final step of this episode has no future frames, so fill the next state with zeros
        score += reward
        score_list.append(score)
        agent_train.add((state, action, reward, next_state, done))
        print("# 第{}个episode的reward为".format(i),score)
        env.reset() 
        state = env.render(mode='rgb_array') 
        score = 0 # reset the score
        #plt.imsave('/home/shy/state/state{}.png'.format(i),preprocess_frame(state))
        #state = plt.imread('/home/shy/state/state{}.png'.format(i))
        state =  save_gym_state(env,i)
        
        state, stacked_frames = stack_frames(stacked_frames, state, True)
    else:
        #print("the shape of state is ",state.shape)
        #print("the shape of next_state is ",next_state.shape)
        agent_train.add((state, action, reward, next_state, done)) # add the experience to the replay memory
        agent_train.train() # use the same instance here, otherwise the memory cannot be shared
        score += reward
        score_list.append(score)
        state = next_state
agent_train.save_model(r'/home/shy/桌面/MountainCar-v1-dqn_weights.h5')
4.5 Main Function (Test)
# changes for testing
     action = agent_weight_train.act(state) # directly choose the action with the largest Q-value
# remove the training step and the replay-memory step; only keep stacking the 4 frames into the queue as the network input
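A sketch of a full test loop under these changes (agent_weight_train is assumed to be a DQNetwork whose weights were loaded as in section 2.7; here the action is taken greedily via np.argmax on the model output instead of act(), which still keeps a little randomness):

# minimal test loop (sketch): greedy actions, no training, no replay memory
env = gym.make('MountainCar-v1')
env.reset()

agent_weight_train = DQNetwork()
agent_weight_train.model.load_weights(r'/home/shy/桌面/MountainCar-v1-dqn.h5')

state = save_gym_state(env, 'test_init')
stacked_frames = deque([np.zeros((84, 84), dtype=int) for _ in range(stack_size)], maxlen=4)
state, stacked_frames = stack_frames(stacked_frames, state, True)

done, score = False, 0
while not done:
    # pick the action with the largest predicted Q-value
    action = np.argmax(agent_weight_train.model.predict(state.reshape(-1, 84, 84, 4)))
    _, reward, done, _ = env.step(action)
    score += reward

    next_state = save_gym_state(env, 'test')
    state, stacked_frames = stack_frames(stacked_frames, next_state, False)

print("test episode reward:", score)
env.close()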

Demo: Bilibili
