FlexSim Reinforcement Learning


Results

This tutorial reproduces the reinforcement learning example from the official FlexSim 2022 documentation. Before the details, here is the before/after comparison (at the same playback speed): based on a wait-time reward, the agent learns a good order in which to process the products.

FlexSim reinforcement learning, before optimization


FlexSim reinforcement learning, after optimization

Environment Dependencies

  1. FlexSim 2022
  2. Python 3 (a reasonably recent release, as required by Stable-Baselines3), with the following libraries:
    Gym
    Stable-Baselines3
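
A typical way to install the Python dependencies, assuming pip is available (the exact gym and stable-baselines3 versions compatible with FlexSim 2022 may vary):

pip install gym stable-baselines3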

Model Building

Building the FlexSim model

  1. Create a new model. Drag in Source, Queue, Processor, and Sink objects and connect them in order.
  2. In the Toolbox, add a Global Table. In the table's properties, rename it to ChangeoverTimes, set both the rows and the columns to 5, and fill in the values.
    Cell (i, j) of the table is the time consumed when changing over from item type i to item type j. (A hypothetical example of the table is sketched at the end of this subsection.)
  3. Click the Processor to edit its properties. Under Setup Time, choose From/To Lookup Table from the drop-down menu and select ChangeoverTimes as the Table.
  4. Click the Source to edit its properties. Under Triggers, add an On Creation trigger and choose Data > Set Label and Color. Set the value to duniform(1,5,getstream(current)), which randomly generates five product types.
  5. Now save the model as ChangeoverTimesRL.fsm. Running it, you can see the model generating items at random.
  6. In the Toolbox, add Statistics > Model Parameter Table.
  7. Rename the Parameter1 table to Observations, and create a second table named Actions in the same way.
  8. In the Observations table, rename Parameter2 to LastItemType. In that row, set the value type to integer with an upper bound of 5.

  9. In the Actions table, rename Parameter3 to ItemType. Likewise restrict the value to an integer with an upper bound of 5, matching the 5 product types.

  10. Click the Processor. In its properties, click Pull and choose the Pull Best Item option from the Pull Strategy drop-down menu.
  11. In the Label field that appears, choose Custom Value and enter:

item.Type == Model.parameters["ItemType"].value
Save the model and run it; you can see that red items are pulled first.
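
The original values of the ChangeoverTimes table appear only in a screenshot, so the following 5x5 table is a purely hypothetical illustration of its structure: row i, column j holds the changeover time from item type i to item type j, with zeros on the diagonal because no changeover is needed when the type stays the same.

From\To     1    2    3    4    5
   1        0   10   15   20   25
   2       10    0   10   15   20
   3       15   10    0   10   15
   4       20   15   10    0   10
   5       25   20   15   10    0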

Adding reinforcement learning to the model

  1. In the Toolbox, add Connectivity > Reinforcement Learning.

  2. For the Observation Space, choose Discrete and select LastItemType as the observation parameter; for the Action Space, choose Discrete and select ItemType.

  3. Click Apply. Then click the Processor and choose the Code button in the Setup Time picklist; you can see the variable "f_lastlabelval", which is used in the next step.

  4. Return to the Reinforcement Learning properties window, add an On Observation trigger, and select Code Snippet. Change the description text from Code Snippet to Set observation parameter.

  5. Paste the following code into the field:

Model.parameters["LastItemType"].value = getvarnum(Model.find("Processor1"), "f_lastlabelval");
  6. Return to the 3D view and click the Sink. Under Labels, add a Number label and name it LastTime.
  7. Add another label named Reward, check the Automatically Reset box, and save.
  8. Under Triggers, add an On Entry trigger and choose Data > Increment Value. In the Increment drop-down menu, select current.labels["Reward"], and enter 10/(Model.time - current.LastTime) in the By field.
  9. Add a Data > Set Label option: set Object to current, Label to "LastTime", and Value to Model.time. This is our reward calculation (a short worked example follows this list).
  10. Return to the Reinforcement Learning properties and edit the reward function, renaming it from Reward Function to Reward based on throughput. Paste in the code below; it reads and resets the accumulated reward on the Sink, and marks the episode done once the model time exceeds 1000.

double reward = Model.find("Sink1").Reward;
Model.find("Sink1").Reward = 0;
int done = (Model.time > 1000);
return [reward, done];
  11. Under On Request Action, add the Take a Random Action option.
  12. Under Decision Events, add a new event and choose the Pull Strategy option.

Save and run the model.
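
As a quick sanity check on the reward logic: if the previous item entered the Sink at model time 90 and the next item enters at time 100, the On Entry trigger increments Reward by 10/(100 - 90) = 1. Shorter gaps between arrivals give larger increments, so the agent is rewarded for choosing an order that minimizes changeover time and maximizes throughput.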

Python Code

Modify and run the following three scripts in order; they test the interface, train the model, and serve the trained model for inference, respectively. The parts to modify are the FlexSim executable path and model path in each script (shown in red in the original screenshots), which must match your own installation.

flexsim_env.py

import gym
import os
import subprocess
import socket
import json
from gym import error, spaces, utils
from gym.utils import seeding
import numpy as np

class FlexSimEnv(gym.Env):
    metadata = {'render.modes': ['human', 'rgb_array', 'ansi']}

    def __init__(self, flexsimPath, modelPath, address='localhost', port=5005, verbose=False, visible=False):
        self.flexsimPath = flexsimPath
        self.modelPath = modelPath
        self.address = address
        self.port = port
        self.verbose = verbose
        self.visible = visible

        self.lastObservation = ""

        self._launch_flexsim()
        
        self.action_space = self._get_action_space()
        self.observation_space = self._get_observation_space()

    def reset(self):
        self._reset_flexsim()
        state, reward, done = self._get_observation()
        return state

    def step(self, action):
        self._take_action(action)
        state, reward, done = self._get_observation()
        info = {}
        return state, reward, done, info

    def render(self, mode='human'):
        if mode == 'rgb_array':
            return np.array([0,0,0])
        elif mode == 'human':
            print(self.lastObservation)
        elif mode == 'ansi':
            return self.lastObservation
        else:
            super(FlexSimEnv, self).render(mode=mode)

    def close(self):
        self._close_flexsim()
        
    def seed(self, seed=None):
        self.seedNum = seed
        return self.seedNum

    
    def _launch_flexsim(self):
        if self.verbose:
            print("Launching " + self.flexsimPath + " " + self.modelPath)

        args = [self.flexsimPath, self.modelPath, "-training", self.address + ':' + str(self.port)]
        if self.visible == False:
            args.append("-maintenance")
            args.append("nogui")
        self.flexsimProcess = subprocess.Popen(args)

        self._socket_init(self.address, self.port)
    
    def _close_flexsim(self):
        self.flexsimProcess.kill()

    def _release_flexsim(self):
        if self.verbose:
            print("Sending StopWaiting message")
        self._socket_send(b"StopWaiting?")

    def _get_action_space(self):
        self._socket_send(b"ActionSpace?")
        if self.verbose:
            print("Waiting for ActionSpace message")
        actionSpaceBytes = self._socket_recv()
        
        return self._convert_to_gym_space(actionSpaceBytes)

    def _get_observation_space(self):
        self._socket_send(b"ObservationSpace?")
        if self.verbose:
            print("Waiting for ObservationSpace message")
        observationSpaceBytes = self._socket_recv()
        
        return self._convert_to_gym_space(observationSpaceBytes)

    def _reset_flexsim(self):
        if self.verbose:
            print("Sending Reset message")
        resetString = "Reset?"
        if hasattr(self, "seedNum"):
            resetString = "Reset:" + str(self.seedNum) + "?"
        self._socket_send(resetString.encode())

    def _get_observation(self):
        if self.verbose:
            print("Waiting for Observation message")
        observationBytes = self._socket_recv()
        self.lastObservation = observationBytes.decode('utf-8')
        state, reward, done = self._convert_to_observation(observationBytes)

        return state, reward, done
    
    def _take_action(self, action):
        actionStr = json.dumps(action, cls=NumpyEncoder)
        if self.verbose:
            print("Sending Action message: " + actionStr)
        actionMessage = "TakeAction:" + actionStr + "?"
        self._socket_send(actionMessage.encode())


    def _socket_init(self, host, port):
        if self.verbose:
            print("Waiting for FlexSim to connect to socket on " + self.address + ":" + str(self.port))

        self.serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.serversocket.bind((host, port))
        self.serversocket.listen()

        (self.clientsocket, self.socketaddress) = self.serversocket.accept()
        if self.verbose:
            print("Socket connected")
        
        if self.verbose:
            print("Waiting for READY message")
        message = self._socket_recv()
        if self.verbose:
            print(message.decode('utf-8'))
        if message != b"READY":
            raise RuntimeError("Did not receive READY! message")

    def _socket_send(self, msg):
        totalsent = 0
        while totalsent < len(msg):
            sent = self.clientsocket.send(msg[totalsent:])
            if sent == 0:
                raise RuntimeError("Socket connection broken")
            totalsent = totalsent + sent

    def _socket_recv(self):
        # FlexSim terminates each message with a '!' character;
        # keep reading chunks until the terminator arrives.
        chunks = []
        while True:
            chunk = self.clientsocket.recv(2048)
            if chunk == b'':
                raise RuntimeError("Socket connection broken")
            if chunk[-1] == ord('!'):
                chunks.append(chunk[:-1])
                break
            else:
                chunks.append(chunk)
        return b''.join(chunks)


    def _convert_to_gym_space(self, spaceBytes):
        paramsStartIndex = spaceBytes.index(ord('('))
        paramsEndIndex = spaceBytes.index(ord(')'), paramsStartIndex)
        
        type = spaceBytes[:paramsStartIndex]
        params = json.loads(spaceBytes[paramsStartIndex+1:paramsEndIndex])
        
        if type == b'Discrete':
            return gym.spaces.Discrete(params)
        elif type == b'Box':
            return gym.spaces.Box(np.array(params[0]), np.array(params[1]))
        elif type == b'MultiDiscrete':
            return gym.spaces.MultiDiscrete(params)
        elif type == b'MultiBinary':
            return gym.spaces.MultiBinary(params)

        raise RuntimeError("Could not parse gym space string")

    def _convert_to_observation(self, spaceBytes):
        observation = json.loads(spaceBytes)
        state = observation["state"]
        if isinstance(state, list):
            state = np.array(observation["state"])
        reward = observation["reward"]
        done = (observation["done"] == 1)
        return state, reward, done


class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)


def main():

    env = FlexSimEnv(
        flexsimPath = "C:/Program Files/FlexSim 2022/program/flexsim.exe",
        modelPath = "E:/刘一阳资料/Flexsim/demo/ChangeoverTimesRL.fsm",
        verbose = True,
        visible = True
        )

    for i in range(2):
        env.seed(i)
        observation = env.reset()
        env.render()
        done = False
        rewards = []
        while not done:
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            env.render()
            rewards.append(reward)
            if done:
                cumulative_reward = sum(rewards)
                print("Reward: ", cumulative_reward, "\n")
    env._release_flexsim()
    input("Waiting for input to close FlexSim...")
    env.close()


if __name__ == "__main__":
    main()
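
As the code above shows, FlexSimEnv acts as a small socket server: it launches FlexSim with the -training address:port command-line flag, waits for FlexSim to connect and send READY!, and then exchanges plain-text messages in which every command it sends ends with '?' (for example ActionSpace? or Reset?) and every reply from FlexSim ends with '!'.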

flexsim_training.py

import gym
from flexsim_env import FlexSimEnv
from stable_baselines3.common.env_checker import check_env
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

def main():
    print("Initializing FlexSim environment...")

    # Create a FlexSim OpenAI Gym Environment
    env = FlexSimEnv(
        flexsimPath = "C:/Program Files/FlexSim 2022/program/flexsim.exe",
        modelPath = "E:/刘一阳资料/Flexsim/demo/ChangeoverTimesRL.fsm",
        verbose = False,
        visible = False
        )
    check_env(env) # Check that an environment follows Gym API.

    # Training a baselines3 PPO model in the environment
    model = PPO("MlpPolicy", env, verbose=1)
    print("Training model...")
    model.learn(total_timesteps=50000)
    
    # save the model
    print("Saving model...")
    model.save("ChangeoverTimesModel")

    input("Waiting for input to do some test runs...")

    # Run test episodes using the trained model
    for i in range(4):
        env.seed(i)
        observation = env.reset()
        env.render()
        done = False
        rewards = []
        while not done:
            action, _states = model.predict(observation)
            observation, reward, done, info = env.step(action)
            env.render()
            rewards.append(reward)
            if done:
                cumulative_reward = sum(rewards)
                print("Reward: ", cumulative_reward, "\n")
    env._release_flexsim()
    input("Waiting for input to close FlexSim...")
    env.close()


if __name__ == "__main__":
    main()
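
Note that model.save("ChangeoverTimesModel") writes a ChangeoverTimesModel.zip file into the working directory; this is the file that the inference script below loads with PPO.load.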

flexsim_inference.py

import json
from stable_baselines3 import PPO
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import numpy as np

class FlexSimInferenceServer(BaseHTTPRequestHandler):

    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        self._handle_reply(params)

    def do_POST(self):
        content_length = int(self.headers['Content-Length'])
        body = self.rfile.read(content_length)
        params = parse_qs(body)
        self._handle_reply(params)

    def _handle_reply(self, params):
        if len(params):
            observation = []
            # parse_qs yields bytes keys when parsing a POST body and str keys
            # when parsing a GET query string, so handle both cases.
            if b'observation' in params.keys():
                observationBytes = params[b'observation'][0]
                observation = np.array(json.loads(observationBytes))
            elif 'observation' in params.keys():
                observationBytes = params['observation'][0]
                observation = np.array(json.loads(observationBytes))
            if isinstance(observation, list):
                observation = np.array(observation)
            action, _states = FlexSimInferenceServer.model.predict(observation)
            self.send_response(200)
            self.send_header("Content-type", "application/json")
            self.end_headers()
            self.wfile.write(bytes(json.dumps(action, cls=NumpyEncoder), "utf-8"))
            return
      
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(bytes("", "utf-8"))


class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)


def main():
    print("Loading model...")
    model = PPO.load("ChangeoverTimesModel.zip")
    FlexSimInferenceServer.model = model
    
    # Create server object
    print("Starting server...")
    hostName = "localhost"
    serverPort = 8890
    webServer = HTTPServer((hostName, serverPort), FlexSimInferenceServer)
    print("Server started http://%s:%s" % (hostName, serverPort))

    # Start the web server
    try:
        webServer.serve_forever()
    except KeyboardInterrupt:
        pass

    webServer.server_close()
    print("Server stopped.")


if __name__ == "__main__":
    main()
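
A minimal sketch of how a client could query this server once it is running, assuming the localhost:8890 address above; the observation value [3] and the URL format are illustrative only:

# Hypothetical smoke test for the inference server above: send observation [3]
# as a GET query parameter and print the JSON-encoded action chosen by the policy.
import urllib.request

url = "http://localhost:8890/?observation=[3]"
with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))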

Running the last script gives you a local HTTP endpoint. Update the model to use it: in the Reinforcement Learning tool, change the On Request Action option from Take a Random Action (used during training) to query this address. Running the model again will now use the trained policy.
