1. Inspecting observation_space and action_space
citylearn.py line 303
self.buildings, self.observation_spaces, self.action_spaces,
self.observation_space, self.action_space = building_loader(
data_path, building_attributes, weather_file, solar_profile,
building_ids, self.buildings_states_actions)
So the objects we need to inspect are self.observation_space, self.action_space and self.observation_spaces, self.action_spaces.
Step into the function building_loader():
return buildings, observation_spaces, action_spaces, observation_space_central_agent, action_space_central_agent
# self.buildings, self.observation_spaces, self.action_spaces, self.observation_space, self.action_space
Hence the last four return values are the ones we need to trace inside building_loader().
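The difference between the plural (per-building) and singular (central-agent) attributes can be sketched without building the full environment. This is a toy illustration with made-up bounds; NumPy arrays stand in for the gym.spaces.Box objects CityLearn actually creates:

```python
import numpy as np

# One (low, high) bound pair per building -> observation_spaces (plural).
per_building_bounds = [
    (np.array([0.0, -1.0]), np.array([24.0, 1.0])),   # hypothetical building 1
    (np.array([0.0, -2.0]), np.array([24.0, 2.0])),   # hypothetical building 2
]

# Central agent: bounds of all buildings concatenated into one flat
# space -> observation_space (singular).
low_central = np.concatenate([low for low, _ in per_building_bounds])
high_central = np.concatenate([high for _, high in per_building_bounds])

print(len(per_building_bounds))  # number of per-building spaces
print(low_central.shape)         # one flat central-agent space
```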
1.1 observation_spaces, action_spaces
#Finding the max and min possible values of all the states, which can
#then be used by the RL agent to scale the states and train any function
#approximators more effectively
s_low.append(min(building.sim_results[state_name])) #line168
s_high.append(max(building.sim_results[state_name]))
building.set_state_space(np.array(s_high), np.array(s_low)) #line238
'''
In class Building in energy_models.py:
class Building:
def set_state_space(self, high_state, low_state):
# Setting the state space and the lower and upper bounds of each state-variable
self.observation_space = spaces.Box(
low=low_state, high=high_state, dtype=np.float32)
'''
observation_spaces.append(building.observation_space) #line247
action_spaces is found in the same way.
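The min/max logic quoted above (lines 168 and 238) can be sketched with a toy sim_results dictionary; only the bound computation mirrors building_loader, the data is made up:

```python
import numpy as np

# Hypothetical simulated time series for one building.
sim_results = {
    "hour": [1, 2, 3, 24],
    "t_out": [-3.0, 10.5, 21.0, 15.2],
}

s_low, s_high = [], []
for state_name, series in sim_results.items():
    # Min/max over the whole simulation period give the bounds the RL
    # agent can later use to scale each state variable.
    s_low.append(min(series))
    s_high.append(max(series))

low_state, high_state = np.array(s_low), np.array(s_high)
print(low_state, high_state)
```

These two arrays are exactly what set_state_space() wraps into a spaces.Box.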
1.2 observation_space_central_agent, action_space_central_agent
These correspond to citylearn's self.observation_space and self.action_space.
s_low_central_agent.append(min(building.sim_results[state_name])) #line 181
s_high_central_agent.append(max(building.sim_results[state_name]))
observation_space_central_agent = spaces.Box(low=np.array(s_low_central_agent),
high=np.array(s_high_central_agent), dtype=np.float32) #line252
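The comment quoted in 1.1 says these bounds exist so the agent can "scale the states". A minimal min-max scaling sketch, with hypothetical bounds and state (the scaling formula itself is a standard choice, not copied from CityLearn):

```python
import numpy as np

# Hypothetical bounds collected as above, and one raw state vector.
low = np.array([0.0, -3.0, 0.0])
high = np.array([24.0, 40.0, 5.0])
state = np.array([12.0, 18.5, 2.5])

# Map each state variable into [0, 1] using the Box bounds.
scaled = (state - low) / (high - low)
print(scaled)
```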
2. step(actions)
2.1 What the actions are
In class SACAgentCore in SAC.py:
def select_action():
action = self.actor(hidden_state, **kwargs) #line344
The shape and dtype of this action, and how it is computed, are not yet clear at this point.
2.2 if self.central_agent / else
By default central_agent=False, i.e. a decentralized multi-agent controller, which uses reward_function_ma().
If central_agent=True, i.e. a central single-agent controller, it uses reward_function_sa().
If the agent is centralized, all the actions for all the buildings are provided as an ordered list of numbers.
The order corresponds to the order of the buildings as they appear on the file building_attributes.json,
and only considering the buildings selected for the simulation by the user (building_ids).
When we set central_agent to True, CityLearn can simulate the electricity use of a single building.
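The "ordered list of numbers" described above can be illustrated with hypothetical per-building actions; the building names and action values here are made up, only the ordering rule (follow building_ids as they appear in building_attributes.json) comes from the quote:

```python
# Hypothetical per-building actions keyed by building id.
actions_by_building = {
    "Building_1": [0.2, -0.1],   # e.g. cooling storage, DHW storage
    "Building_3": [0.0],
    "Building_5": [0.4, 0.3],
}
# Order of the buildings selected for the simulation.
building_ids = ["Building_1", "Building_3", "Building_5"]

# Central agent: all actions flattened into one ordered list.
flat_actions = [a for b in building_ids for a in actions_by_building[b]]
print(flat_actions)  # [0.2, -0.1, 0.0, 0.4, 0.3]
```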
2.3 reward
building_electric_demand = 0 #line 427
# Adding loads from appliances and subtracting solar generation to the net electrical load of each building
building_electric_demand += _electric_demand_cooling + _electric_demand_dhw + _non_shiftable_load - _solar_generation #line 461
rewards.append(-building_electric_demand) #line 464
rewards = reward_function_ma(rewards) #line 508
'''ma:decentralized multi-agent, takes the total net electricity consumption of each building
(< 0 if generation is higher than demand) at every time-step as input and returns a list
with as many rewards as the number of agents '''
self.cumulated_reward_episode += sum(rewards)
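The reward pipeline above (negate each building's net demand at line 464, transform the list at line 508, then sum) can be sketched end to end. The clipping inside this stand-in reward_function_ma is an assumed shaping choice for illustration; the real CityLearn function may transform the values differently:

```python
# Stand-in for reward_function_ma: returns one reward per building.
def reward_function_ma(rewards):
    # Assumed shaping: clip positive values (net generation) to 0 so
    # agents are only penalized for net consumption.
    return [min(r, 0.0) for r in rewards]

net_demand = [5.0, -2.0, 0.5]        # per-building net electric demand
rewards = [-d for d in net_demand]   # line 464: rewards.append(-demand)
rewards = reward_function_ma(rewards)  # line 508
total = sum(rewards)                 # added to cumulated_reward_episode

print(rewards, total)
```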
2.4 state, done
return (self._get_ob(), rewards, terminal, {})
#self._get_ob() return self.state
#self._terminal() return is_terminal = bool(self.time_step >= self.simulation_period[1])
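The return tuple above follows the standard gym interface, so the interaction loop looks like the usual pattern. A self-contained sketch with a dummy environment (CityLearn itself needs data files to construct; the horizon here mimics simulation_period):

```python
# Dummy environment reproducing the step() return shape shown above.
class DummyEnv:
    def __init__(self, horizon):
        self.time_step, self.horizon, self.state = 0, horizon, [0.0]

    def step(self, actions):
        self.time_step += 1
        terminal = bool(self.time_step >= self.horizon)  # cf. _terminal()
        return self.state, [0.0], terminal, {}           # _get_ob(), rewards, done, info

env = DummyEnv(horizon=3)
done, steps = False, 0
while not done:
    state, rewards, done, info = env.step(actions=None)
    steps += 1
print(steps)  # 3
```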