Reading the citylearn gym module

1. Inspecting observation_space and action_space

citylearn.py line 303

self.buildings, self.observation_spaces, self.action_spaces, \
self.observation_space, self.action_space = building_loader(
    data_path, building_attributes, weather_file, solar_profile,
    building_ids, self.buildings_states_actions)

So the objects to inspect are self.observation_space, self.action_space, self.observation_spaces, and self.action_spaces.

Step into the function building_loader():

return buildings, observation_spaces, action_spaces, observation_space_central_agent, action_space_central_agent
# self.buildings, self.observation_spaces, self.action_spaces, self.observation_space, self.action_space

So the last four return values are what we need to trace inside building_loader(); a quick way to inspect them is sketched below.
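For orientation, here is a minimal sketch of constructing the environment and printing these four attributes. The paths, building ids, and keyword names below are placeholders (assumptions), not the repo's exact values; the constructor arguments mirror the building_loader() call above.

from pathlib import Path
from citylearn import CityLearn  # citylearn.py in the CityLearn repo

# Placeholder data layout (assumption; adjust to your local copy)
data_path = Path("data/Climate_Zone_1")
building_attributes = data_path / "building_attributes.json"
weather_file = data_path / "weather_data.csv"
solar_profile = data_path / "solar_generation_1kW.csv"
building_ids = ["Building_1", "Building_2"]

env = CityLearn(data_path, building_attributes, weather_file, solar_profile,
                building_ids,
                buildings_states_actions="buildings_state_action_space.json")

print(env.observation_space)        # one Box spanning all buildings (central agent)
print(env.action_space)
print(len(env.observation_spaces))  # one Box per building
print(len(env.action_spaces))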

1.1 observation_spaces, action_spaces

#Finding the max and min possible values of all the states, which can
#then be used by the RL agent to scale the states and train any function
#approximators more effectively
s_low.append(min(building.sim_results[state_name]))  #line168
s_high.append(max(building.sim_results[state_name]))

building.set_state_space(np.array(s_high), np.array(s_low))  # line 238
'''
In energy_models.py, class Building:

class Building:
    def set_state_space(self, high_state, low_state):
        # Setting the state space and the lower and upper bounds of each state-variable
        self.observation_space = spaces.Box(
            low=low_state, high=high_state, dtype=np.float32)
'''
observation_spaces.append(building.observation_space)   #line247

action_spaces is traced the same way; a standalone sketch of this min/max-to-Box construction follows.
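The same construction reproduced in isolation, so it can be run on its own (the sim_results values here are toy numbers, an assumption for illustration):

import numpy as np
from gym import spaces

# Toy stand-in for building.sim_results: two state variables, three timesteps
sim_results = {
    "t_out": [1.2, 15.0, 30.5],
    "hour":  [1, 12, 24],
}

s_low, s_high = [], []
for state_name, series in sim_results.items():
    s_low.append(min(series))
    s_high.append(max(series))

# Same construction as Building.set_state_space()
observation_space = spaces.Box(low=np.array(s_low), high=np.array(s_high),
                               dtype=np.float32)
print(observation_space.low, observation_space.high)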

1.2 observation_space_central_agent, action_space_central_agent

These correspond to self.observation_space (and likewise self.action_space) on the citylearn object.

s_low_central_agent.append(min(building.sim_results[state_name]))   #line 181
s_high_central_agent.append(max(building.sim_results[state_name]))

observation_space_central_agent = spaces.Box(
    low=np.array(s_low_central_agent),
    high=np.array(s_high_central_agent), dtype=np.float32)  # line 252
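In other words, the central-agent Box is just the concatenation of every selected building's per-state bounds into one long vector. A toy sketch (the bound values are assumptions):

import numpy as np
from gym import spaces

# Two buildings, each with two states (toy bounds)
per_building_low  = [[0.0, 1.0], [0.0, 1.0]]
per_building_high = [[35.0, 24.0], [40.0, 24.0]]

s_low_central_agent  = [v for bounds in per_building_low  for v in bounds]
s_high_central_agent = [v for bounds in per_building_high for v in bounds]

observation_space_central_agent = spaces.Box(
    low=np.array(s_low_central_agent),
    high=np.array(s_high_central_agent), dtype=np.float32)
print(observation_space_central_agent.shape)  # (4,) = 2 buildings x 2 states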

2. step(actions)

2.1 What are the actions?

In SAC.py, the class SACAgentCore contains:

def select_action(self, ...):                    # (arguments abridged)
    action = self.actor(hidden_state, **kwargs)  # line 344

The shape and dtype of this action, and how it is computed, are not yet clear; a generic sketch of typical SAC action selection is given below for orientation.
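A SAC actor typically samples from a Gaussian whose mean and log-std the actor network outputs, then squashes with tanh into the Box bounds. This is a generic sketch, not the exact code at SAC.py line 344; all names here are illustrative.

import torch

def select_action(actor, state, deterministic=False):
    # assumption: actor(state) returns the mean and log-std of a Gaussian policy
    mu, log_std = actor(torch.as_tensor(state, dtype=torch.float32))
    if deterministic:
        u = mu
    else:
        u = mu + log_std.exp() * torch.randn_like(mu)  # reparameterized sample
    # tanh squashes the sample into [-1, 1], matching a normalized Box action space
    return torch.tanh(u).detach().numpy()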

2.2 if self.central_agent / else

By default central_agent=False, i.e. a decentralized multi-agent controller, which uses reward_function_ma().
If central_agent=True, i.e. a central single-agent controller, reward_function_sa() is used instead.

If the agent is centralized, all the actions for all the buildings are provided as an ordered list of numbers. 
The order corresponds to the order of the buildings as they appear on the file building_attributes.json, 
and only considering the buildings selected for the simulation by the user (building_ids).
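A small sketch of assembling that ordered list (the building ids and per-building action counts are assumptions for illustration):

# Order follows building_attributes.json, restricted to the selected building_ids
building_ids = ["Building_1", "Building_3"]
actions_per_building = {"Building_1": [0.2, -0.1],  # e.g. cooling + DHW storage
                        "Building_3": [0.05]}       # e.g. cooling storage only

# Central agent: one flat, ordered list covering all selected buildings
central_actions = [a for uid in building_ids for a in actions_per_building[uid]]
print(central_actions)  # [0.2, -0.1, 0.05]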

When we set central_agent to True, citylearn can also simulate the electricity use of a single building.

2.3 reward

building_electric_demand = 0 #line 427

# Adding loads from appliances and subtracting solar generation to the net electrical load of each building
building_electric_demand += _electric_demand_cooling + _electric_demand_dhw + _non_shiftable_load - _solar_generation   #line 461

rewards.append(-building_electric_demand)    #line 464

rewards = reward_function_ma(rewards)     # line 508
'''ma: decentralized multi-agent; takes the total net electricity consumption of each building
(< 0 if generation is higher than demand) at every time-step as input and returns a list
with as many rewards as the number of agents'''
self.cumulated_reward_episode += sum(rewards)
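The arithmetic above, reproduced with toy numbers (assumptions) to make the sign convention concrete; the actual reward_function_ma in reward_function.py may apply additional shaping on top of the raw negative demand:

# Toy per-building demand terms, in kWh (assumptions)
_electric_demand_cooling = 3.0
_electric_demand_dhw     = 1.0
_non_shiftable_load      = 2.5
_solar_generation        = 4.0

building_electric_demand = (_electric_demand_cooling + _electric_demand_dhw
                            + _non_shiftable_load - _solar_generation)

rewards = [-building_electric_demand]     # one entry per building
cumulated_reward_episode = sum(rewards)
print(rewards, cumulated_reward_episode)  # [-2.5] -2.5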

2.4 state, done

return (self._get_ob(), rewards, terminal, {})
# self._get_ob() returns self.state
# self._terminal() returns is_terminal = bool(self.time_step >= self.simulation_period[1])
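Putting it together, a standard gym-style loop that consumes this return tuple (env construction is assumed to have happened as in section 1; the random policy is a placeholder):

state = env.reset()
done = False
while not done:
    # decentralized case: one action array per building, here sampled at random
    actions = [space.sample() for space in env.action_spaces]
    state, rewards, done, info = env.step(actions)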