Designing the single_agent reward function (reward_function) for the citylearn module

Reference:
A Centralised Soft Actor Critic Deep Reinforcement Learning
Approach to District Demand Side Management through
CityLearn 2020:11–4. doi:10.1145/3427773.3427869.

The cumulative reward per day is in the range 0-300.

net_electric_consumption in one day  -13.72085932928827
net_electric_consumption in one day  -87.24712266261261
net_electric_consumption in one day  -101.96857692574041
net_electric_consumption in one day  -97.90874044831011
net_electric_consumption in one day  -122.00274374121156
net_electric_consumption in one day  -123.47533288637986
net_electric_consumption in one day  -20.941267323103407
net_electric_consumption in one day  -14.24673384461663
net_electric_consumption in one day  -81.38051917535762
net_electric_consumption in one day  -113.88382543664981
net_electric_consumption in one day  -135.83601615225706
net_electric_consumption in one day  -147.74054092999444
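The log lines above look like the output of the commented-out `print` in the snippets below, which sums `env.net_electric_consumption[-24:-1]`. One thing worth noting is that this slice stops before the last element, so it covers 23 hourly values rather than 24. The sketch below shows the difference on made-up data; the list `net_electric_consumption` is a hypothetical stand-in for the environment attribute, not real CityLearn output.

```python
# Hypothetical hourly net consumption for two days (48 values); not real CityLearn data.
net_electric_consumption = [float(h) for h in range(48)]

last_23 = net_electric_consumption[-24:-1]  # slice used in the commented-out print
last_24 = net_electric_consumption[-24:]    # all 24 values of the final day

print(len(last_23), sum(last_23))
print(len(last_24), sum(last_24))
```

If a full-day total is what is wanted, `[-24:]` is the slice that includes the final hour.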

1

            if env.time_step % 24 == 0:
                #print('net_electric_consumption in one day ', sum(env.net_electric_consumption[-24:-1]))
                # If discharging during the day, reward_day = 0; otherwise -300
                if np.array(action_day[7:18]).mean() > 2:
                    reward_day = -300
                else:
                    reward_day = 0
                # If charging at night, reward_night = 300; otherwise -300
                if np.array(action_day[0:6]).mean() > 2.1:
                    reward_night = 300
                else:
                    reward_night = -300
            reward = reward + reward_day + reward_night
 
# If net consumption is negative the price is 0; if it is positive the price is positive and the reward is negative
def reward_function_sa(electricity_demand):
    #print('electricity_demand ', electricity_demand)
    # Entries are negated consumption (as the sign flip here implies), so flip the sign to get total demand
    total_energy_demand = -sum(electricity_demand)

    # The price scales with total demand and is clipped at 0
    price = max(total_energy_demand * 0.01, 0)
    #print('price ', price)

    # Weight each entry by the price without modifying the caller's list
    return sum(price * e for e in electricity_demand)
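As an arithmetic check on the function above: writing D for the total demand (the negated sum of the inputs), the price is 0.01 * D when D > 0, so the returned value works out to -0.01 * D^2, and to 0 when D <= 0. The sketch below reproduces that arithmetic on made-up inputs; `price_weighted_reward` is a hypothetical name used only for illustration.

```python
def price_weighted_reward(electricity_demand):
    """Same arithmetic as reward_function_sa, leaving the input list untouched."""
    total_energy_demand = -sum(electricity_demand)
    price = max(total_energy_demand * 0.01, 0)
    return sum(price * e for e in electricity_demand)

# Made-up inputs: negative entries mean the building is consuming.
consuming = [-1.0, -2.0, -3.0]   # D = 6, so the reward is about -0.01 * 36
exporting = [1.0, 2.0]           # D = -3, so the price is clipped to 0

r_consuming = price_weighted_reward(consuming)
r_exporting = price_weighted_reward(exporting)
print(r_consuming)
print(r_exporting)
```

Because the penalty grows with the square of total demand, high-demand hours are punished much more heavily than the same energy spread over several hours.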

Parameters:
MAX_EPISODES = 300
learn_rate = 0.001
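One detail of the snippet above: the line `reward = reward + reward_day + reward_night` is indented at the same level as the `if`, so it runs on every step. Unless `reward_day` and `reward_night` are reset elsewhere in the training loop (not shown in the post), the previous day's bonus would be added again on each following step. If the intent is to apply the bonus once per day, one option is to use a zero bonus on all other steps. The following is a minimal sketch under that assumption; `shaped_reward` and its arguments are hypothetical names, not part of CityLearn.

```python
import numpy as np

def shaped_reward(base_reward, time_step, action_day):
    """Add the daily bonus only on steps where time_step % 24 == 0.

    Thresholds and magnitudes are copied from variant 1 of the post;
    action_day is assumed to hold the actions of the day that just ended.
    """
    bonus = 0
    if time_step % 24 == 0:
        reward_day = -300 if np.mean(action_day[7:18]) > 2 else 0
        reward_night = 300 if np.mean(action_day[0:6]) > 2.1 else -300
        bonus = reward_day + reward_night
    return base_reward + bonus

# Made-up action sequence: high values at night, low values during the day.
actions = [3.0] * 7 + [1.0] * 17
print(shaped_reward(-1.5, 24, actions))  # day boundary: bonus applied
print(shaped_reward(-1.5, 25, actions))  # mid-day step: no bonus
```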


2

            # Encourage charging at night and discharging during the day
            if env.time_step % 24 == 0:
                #print('net_electric_consumption in one day ', sum(env.net_electric_consumption[-24:-1]))
                # If discharging during the day, reward_day = 1; otherwise -10
                if np.array(action_day[7:-1]).mean() > 2.0:
                    reward_day = -10
                else:
                    reward_day = 1
                # If charging at night, reward_night = 10; otherwise -3
                if np.array(action_day[0:7]).mean() > 2.0:
                    reward_night = 10
                else:
                    reward_night = -3
            reward = reward + reward_day + reward_night
            #print('reward: ', reward)
            reward_epoch += reward

![Image](https://img-blog.csdnimg.cn/5eb0bae13a00449ea1223da57250b047.png)

MAX_EPISODES = 100
learn_rate = 0.00001

3

            if env.time_step % 24 == 0:
                #print('net_electric_consumption in one day ', sum(env.net_electric_consumption[-24:-1]))
                # If discharging during the day, reward_day = 10; otherwise -30
                if np.array(action_day[7:-1]).mean() > 2.0:
                    reward_day = -30
                else:
                    reward_day = 10
                # If charging at night, reward_night = 40; otherwise -30
                if np.array(action_day[0:7]).mean() > 2.1:
                    reward_night = 40
                else:
                    reward_night = -30
            reward = reward + reward_day + reward_night
            #print('reward: ', reward)
            reward_epoch += reward

MAX_EPISODES = 100
learn_rate = 0.00001
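The three variants differ only in their hour windows, thresholds, and bonus magnitudes, so they can be put side by side in one parameterized function. The sketch below does that with the values copied from the three snippets; `DailyBonusConfig`, `VARIANTS`, and `daily_bonus` are hypothetical names, and the action sequence is made up, since the post does not describe the action encoding.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DailyBonusConfig:
    day_slice: slice
    night_slice: slice
    day_threshold: float
    night_threshold: float
    day_above: float     # reward_day when the daytime mean exceeds the threshold
    day_below: float     # reward_day otherwise
    night_above: float   # reward_night when the night mean exceeds the threshold
    night_below: float   # reward_night otherwise

# Windows, thresholds, and magnitudes copied from variants 1, 2, and 3.
VARIANTS = {
    1: DailyBonusConfig(slice(7, 18), slice(0, 6), 2.0, 2.1, -300, 0, 300, -300),
    2: DailyBonusConfig(slice(7, -1), slice(0, 7), 2.0, 2.0, -10, 1, 10, -3),
    3: DailyBonusConfig(slice(7, -1), slice(0, 7), 2.0, 2.1, -30, 10, 40, -30),
}

def daily_bonus(action_day, cfg):
    """Return reward_day + reward_night for one day of actions."""
    actions = np.asarray(action_day, dtype=float)
    day_mean = actions[cfg.day_slice].mean()
    night_mean = actions[cfg.night_slice].mean()
    reward_day = cfg.day_above if day_mean > cfg.day_threshold else cfg.day_below
    reward_night = cfg.night_above if night_mean > cfg.night_threshold else cfg.night_below
    return reward_day + reward_night

# Made-up day: high action values at night, low values during the day.
actions = [3.0] * 7 + [1.0] * 17
for key, cfg in VARIANTS.items():
    print(key, daily_bonus(actions, cfg))
```

Laid out this way, it is easy to see that variant 1 leaves hour 6 out of both windows, and that the `[7:-1]` slice in variants 2 and 3 drops the last hour of the day.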
