1. Inspecting observation_space and action_space
citylearn.py line 303
self.buildings, self.observation_spaces, self.action_spaces,
self.observation_space, self.action_space = building_loader(
data_path, building_attributes, weather_file, solar_profile,
building_ids, self.buildings_states_actions)
So the objects we need to inspect are self.observation_space, self.action_space and self.observation_spaces, self.action_spaces.
Step into the function building_loader():
return buildings, observation_spaces, action_spaces, observation_space_central_agent, action_space_central_agent
# self.buildings, self.observation_spaces, self.action_spaces, self.observation_space, self.action_space
Hence the last four return values are the ones we need to trace inside building_loader().
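The difference between the plural (per-building) and singular (central-agent) attributes can be sketched without building the full environment. This is a toy illustration with made-up bounds; NumPy arrays stand in for the gym.spaces.Box objects CityLearn actually creates:

```python
import numpy as np

# One (low, high) bound pair per building -> observation_spaces (plural).
per_building_bounds = [
    (np.array([0.0, -1.0]), np.array([24.0, 1.0])),   # hypothetical building 1
    (np.array([0.0, -2.0]), np.array([24.0, 2.0])),   # hypothetical building 2
]

# Central agent: bounds of all buildings concatenated into one flat
# space -> observation_space (singular).
low_central = np.concatenate([low for low, _ in per_building_bounds])
high_central = np.concatenate([high for _, high in per_building_bounds])

print(len(per_building_bounds))  # number of per-building spaces
print(low_central.shape)         # one flat central-agent space
```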
1.1 observation_spaces, action_spaces
#Finding the max and min possible values of all the states, which can
#then be used by the RL agent to scale the states and train any function
#approximators more effectively
s_low.append(min(building.sim_results[state_name])) #line168
s_high.append(max(building.sim_results[state_name]))
building.set_state_space(np.array(s_high), np.array(s_low)) #line238
'''
In class Building in energy_models.py:
class Building:
def set_state_space(self, high_state, low_state):
# Setting the state space and the lower and upper bounds of each state-variable
self.observation_space = spaces.Box(
low=low_state, high=high_state, dtype=np.float32)
'''
observation_spaces.append(building.observation_space) #line247
action_spaces is found in the same way.
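The min/max logic quoted above (lines 168 and 238) can be sketched with a toy sim_results dictionary; only the bound computation mirrors building_loader, the data is made up:

```python
import numpy as np

# Hypothetical simulated time series for one building.
sim_results = {
    "hour": [1, 2, 3, 24],
    "t_out": [-3.0, 10.5, 21.0, 15.2],
}

s_low, s_high = [], []
for state_name, series in sim_results.items():
    # Min/max over the whole simulation period give the bounds the RL
    # agent can later use to scale each state variable.
    s_low.append(min(series))
    s_high.append(max(series))

low_state, high_state = np.array(s_low), np.array(s_high)
print(low_state, high_state)
```

These two arrays are exactly what set_state_space() wraps into a spaces.Box.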
1.2 observation_space_central_agent, action_space_central_agent
These correspond to citylearn's self.observation_space and self.action_space.
s_low_central_agent.append(min(building.sim_results[state_name])) #line 181
s_high_central_agent.append(max(building.sim_results[state_name]))
observation_space_central_agent = spaces.Box(low=np.array(s_low_central_agent),
high=np.array(s_high_central_agent), dtype=np.float32) #line252
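The comment quoted in 1.1 says these bounds exist so the agent can "scale the states". A minimal min-max scaling sketch, with hypothetical bounds and state (the scaling formula itself is a standard choice, not copied from CityLearn):

```python
import numpy as np

# Hypothetical bounds collected as above, and one raw state vector.
low = np.array([0.0, -3.0, 0.0])
high = np.array([24.0, 40.0, 5.0])
state = np.array([12.0, 18.5, 2.5])

# Map each state variable into [0, 1] using the Box bounds.
scaled = (state - low) / (high - low)
print(scaled)
```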
2. step(actions)
2.1 What the actions are
In class SACAgentCore in SAC.py:
def select_action():
action = self.actor(hidden_state, **kwargs) #line344
The shape and dtype of this action, and how it is computed, are not yet clear at this point.
2.2 if self.central_agent / else
By default central_agent=False, i.e. a decentralized multi-agent controller, which uses reward_function_ma().
If central_agent=True, i.e. a central single-agent controller, it uses reward_function_sa().
If the agent is centralized, all the actions for all the buildings are provided as an ordered list of numbers.
The order corresponds to the order of the buildings as they appear on the file building_attributes.json,
and only considering the buildings selected for the simulation by the user (building_ids).
When we set central_agent to True, CityLearn can simulate the electricity use of a single building.
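The "ordered list of numbers" described above can be illustrated with hypothetical per-building actions; the building names and action values here are made up, only the ordering rule (follow building_ids as they appear in building_attributes.json) comes from the quote:

```python
# Hypothetical per-building actions keyed by building id.
actions_by_building = {
    "Building_1": [0.2, -0.1],   # e.g. cooling storage, DHW storage
    "Building_3": [0.0],
    "Building_5": [0.4, 0.3],
}
# Order of the buildings selected for the simulation.
building_ids = ["Building_1", "Building_3", "Building_5"]

# Central agent: all actions flattened into one ordered list.
flat_actions = [a for b in building_ids for a in actions_by_building[b]]
print(flat_actions)  # [0.2, -0.1, 0.0, 0.4, 0.3]
```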
2.3 reward
building_electric_demand = 0 #line 427
# Adding loads from appliances and subtracting solar generation to the net electrical load of each building
building_electric_demand += _electric_demand_cooling + _electric_demand_dhw + _non_shiftable_load - _solar_generation #line 461
rewards.append(-building_electric_demand) #line 464
rewards = reward_function_ma(rewards) #line 508
'''ma:decentralized multi-agent, takes the total net electricity consumption of each building
(< 0 if generation is higher than demand) at every time-step as input and returns a list
with as many rewards as the number of agents '''
self.cumulated_reward_episode += sum(rewards)
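The reward pipeline above (negate each building's net demand at line 464, transform the list at line 508, then sum) can be sketched end to end. The clipping inside this stand-in reward_function_ma is an assumed shaping choice for illustration; the real CityLearn function may transform the values differently:

```python
# Stand-in for reward_function_ma: returns one reward per building.
def reward_function_ma(rewards):
    # Assumed shaping: clip positive values (net generation) to 0 so
    # agents are only penalized for net consumption.
    return [min(r, 0.0) for r in rewards]

net_demand = [5.0, -2.0, 0.5]        # per-building net electric demand
rewards = [-d for d in net_demand]   # line 464: rewards.append(-demand)
rewards = reward_function_ma(rewards)  # line 508
total = sum(rewards)                 # added to cumulated_reward_episode

print(rewards, total)
```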
2.4 state, done
return (self._get_ob(), rewards, terminal, {})
#self._get_ob() return self.state
#self._terminal() return is_terminal = bool(self.time_step >= self.simulation_period[1])
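The return tuple above follows the standard gym interface, so the interaction loop looks like the usual pattern. A self-contained sketch with a dummy environment (CityLearn itself needs data files to construct; the horizon here mimics simulation_period):

```python
# Dummy environment reproducing the step() return shape shown above.
class DummyEnv:
    def __init__(self, horizon):
        self.time_step, self.horizon, self.state = 0, horizon, [0.0]

    def step(self, actions):
        self.time_step += 1
        terminal = bool(self.time_step >= self.horizon)  # cf. _terminal()
        return self.state, [0.0], terminal, {}           # _get_ob(), rewards, done, info

env = DummyEnv(horizon=3)
done, steps = False, 0
while not done:
    state, rewards, done, info = env.step(actions=None)
    steps += 1
print(steps)  # 3
```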