[Path Planning] Warehouse Robot Patrol Path Planning Based on Bayesian Networks, with MATLAB Code

1 Overview

Introduction

In a search and rescue scenario, a robot must navigate through a disaster area, identifying survivors to be rescued while preserving its own safety. Many tradeoffs must be weighed, such as whether to exploit a known area or explore a new one. Acquired knowledge about the safety levels of certain areas, the likelihood of finding survivors, and the urgency of attention each survivor needs are all factors that must be accounted for in the robot’s route planning.

This applied scenario is essentially a foraging task in which beliefs about foraging sites and the safety of areas can be modeled with Bayesian inference methods. In this project, I simulate small environments with different arrangements and configurations of foraging sites and observe how explorative and exploitative agents search them.

Methods

The project was completed in two stages. The first focuses on getting the agent to effectively learn the changing danger levels in an environment. The second stage adds the element of foraging.

Experiment 1: Inferring the safety levels of different regions of the map

Figure 1

As seen in Figure 1, the environment is simulated as a square 2D grid. Gray squares indicate off-limit areas, while any other colored square is part of the valid path. Squares of increasing redness indicate the danger level of the area, a value ranging from 1 to 100. A random destination, marked by a star, is set for the agent, marked by a dot, which starts out in the home square, marked by a dark outline and star pattern. The left side of the figure represents the true state of the world, whereas the right side represents the agent’s beliefs about the world. As the initial maps show, the agent starts out with no knowledge of the dangers on the map, but it is already aware of the entire physical layout and able to calculate multiple routes to reach the assigned destination.
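The grid encoding implied by this description matches the map set up in the code excerpt of Section 2: NaN marks off-limit squares and nonnegative values are danger ratings. A minimal setup sketch (the zero-initialized belief grid and the example destination are assumptions consistent with that excerpt):

```matlab
% Environment and belief grids: NaN = off-limits, 0 = neutral,
% positive values = danger rating (up to 100).
real_grid   = [ 0   0  50   0   0;
                0 NaN NaN NaN   0;
                0   0   0   0  20;
                0 NaN NaN NaN   0;
               50   0   0   0  30];
belief_grid = zeros(size(real_grid));     % the agent starts with no danger knowledge...
belief_grid(isnan(real_grid)) = NaN;      % ...but already knows the physical layout
home     = [1 1];                         % home square (row, col)
curr_loc = home;                          % the agent starts at home
dest_loc = [5 5];                         % an example randomly assigned destination
```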

Each step in the simulation has a defined sequence of events. First, all possible routes from the current location to the destination are calculated. Then, the accumulated danger of each route is found by summing the danger values of each square along the route. The agent takes a weighted sum of the distance and accumulated danger of each route, chooses the most desirable route, and takes one step along it. In the case of this small map, the danger rating is weighted by 0.1 to achieve a reasonable effect, so that a path of length 3 containing a danger spot of rating 40 amounts to a route desirability score of 7. In the new location, the agent samples the area for danger and updates its belief about the danger rating of that square; it does the same for all neighbors of the current square. If the destination has been reached, a new random destination is assigned and the process starts over. The sequence of random destination points lets the agent roam the entire environment for our observation.
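As a concrete illustration of this scoring rule, here is a minimal sketch; the 0.1 weight matches the worked example above (the code excerpt in Section 2 uses danger_wt = 0.5), and the candidate-route values are made up for illustration.

```matlab
% Route desirability = route length + danger_wt * summed believed danger.
% Lower scores are preferred.
danger_wt    = 0.1;
route_length = [3; 4];      % hypothetical candidate routes (number of squares)
route_danger = [40; 0];     % summed belief danger along each candidate route
desirability = route_length + danger_wt * route_danger;   % gives [7; 4]
[~, best]    = min(desirability);   % the agent steps along route 'best'
```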

For each square, the agent holds a belief about the square’s danger level, which can range from 1 to 100. The prior belief over this value is represented as a normal probability distribution with the danger level as its mean parameter μ.

Belief about μ is updated by Bayes’ rule using a maximum a posteriori (MAP) estimate. The prior is represented as a normal probability distribution:

$$P(\mu) = \frac{1}{\sqrt{2\pi\sigma_0^2}}\exp\!\left(-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right)$$

Initially μ0 is set to 0 and σ0 to 1.

The likelihood is also represented as a normal distribution, with x as a new observation of the danger rating and unit variance:

$$P(x \mid \mu) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2}\right)$$

Hypotheses about μ are updated using Bayes’ rule:

$$P(\mu \mid x) = \frac{P(x \mid \mu)\,P(\mu)}{P(x)} \propto P(x \mid \mu)\,P(\mu)$$

To calculate the maximum a posteriori estimate, we note that the likelihood is a N(μ, 1) distribution while the prior is N(μ0, σ0²). The product of two normal densities results in another normal distribution; for n observations x1, …, xn its parameters are

$$\mu' = \frac{\mu_0/\sigma_0^2 + \sum_{i=1}^{n} x_i}{1/\sigma_0^2 + n}, \qquad \sigma'^2 = \left(\frac{1}{\sigma_0^2} + n\right)^{-1}$$

Thus, the mode of such a distribution is simply μ′ itself, and we can plug in values to determine the updated belief. Setting σ0 to 1 for simplicity and n to 1 (a memoryless strategy), the update reduces to a simple average of the old belief and the new observation of danger:

$$\mu' = \frac{\mu_0 + x}{2}$$
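A minimal sketch of this update (the function name is illustrative; with sigma0 = 1 it reproduces the averaging used in the code excerpt of Section 2):

```matlab
% Normal-normal MAP update of a square's danger belief, assuming the
% likelihood has unit variance and a single new observation (n = 1).
function mu_new = update_danger_belief(mu_old, x, sigma0)
    mu_new = (mu_old / sigma0^2 + x) / (1 / sigma0^2 + 1);
end
% Example: update_danger_belief(40, 60, 1) returns 50, the simple average.
```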

Figure 2

With this simple model of averaging, the agent is able to effectively learn the danger ratings of each area. Figure 2 shows the exploring agent at the initial stages of exploration. It has learned the danger ratings of a few areas.

Figure 3

After more time has passed, the entire map is learned.

Figure 4

The agent can adapt to new dangers in the environment. The addition of a dangerous spot in the center is immediately detected, and its danger rating is eventually learned completely.

Figure 5

The agent also properly strategizes its route. In Figure 5, there are two reasonably short routes from the current location to the destination. After the agent has properly learned the danger mapping, it takes the slightly longer route in order to avoid crossing the highly dangerous red strip. We can see that it effectively balances distance against safety.

Experiment 2: Collecting resources on the map

Figure 6

The next step is to add a foraging component to the search and rescue. The stars on the map now represent resources, with darker shades of blue representing higher reward values in a range of 1 to 100. As with the safety levels, the reward level is sampled when the agent crosses a resource on its path.

Figure 7

The agent’s new goal is to pick its own destinations, whether that means traveling to a new foraging site or returning home. The agent now has a health value that is decremented by 1 for each step taken and further decremented by a sample of the danger in that area. If health falls below zero, the agent dies and starts over at the home square, with no recollection of past observations. The agent can only restore health by returning to the home square, but must also strive to maximize foraged rewards. An exploration parameter is also added. This parameter directly influences three things: the prior belief that the resources have high reward, the minimum health buffer that the agent maintains before heading home, and the selection of a destination. An explorative agent will simply pick a random resource to visit, whereas an exploitative agent will pick the one it has seen to provide the greatest reward.
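A hedged sketch of this health bookkeeping, assuming variable names such as max_health and init_belief_grid that do not appear in the source code:

```matlab
% Health bookkeeping per step: each move costs 1 plus a noisy sample of the
% local danger (clamping the sample at zero is an assumption). Dying resets
% the agent's position, health, and all learned beliefs.
danger_sample = randn + real_grid(curr_loc(1), curr_loc(2));
health = health - 1 - max(danger_sample, 0);
if health < 0
    curr_loc    = home;                  % start over at the home square
    health      = max_health;
    belief_grid = init_belief_grid;      % no recollection of past observations
elseif isequal(curr_loc, home)
    health = max_health;                 % health is restored only at home
end
```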

Rather than a normal distribution, the rewards are sampled from an exponential distribution to ensure that all values are positive. The goal of the agent is then to learn the rate parameter λ of the exponential distribution for each square. For ease of computation, the conjugate prior, a gamma distribution, is used to represent the prior belief about λ.

The likelihood is represented by an exponential distribution:

$$P(x \mid \lambda) = \lambda e^{-\lambda x}$$

Hypotheses about λ are updated using Bayes’ rule:

$$P(\lambda \mid x) \propto P(x \mid \lambda)\,P(\lambda)$$

Noting that our prior has a Gamma(α, β) distribution and our likelihood is Exponential(λ), this conjugate pair results in a Gamma posterior as well, with parameters

$$\alpha' = \alpha + n, \qquad \beta' = \beta + \sum_{i=1}^{n} x_i .$$

The mode of a Gamma(α, β) distribution (for α ≥ 1) is

$$\mathrm{mode} = \frac{\alpha - 1}{\beta} .$$

Therefore, plugging the posterior α′ and β′ into the mode equation results in the MAP estimate

$$\hat{\lambda} = \frac{\alpha + n - 1}{\beta + \sum_{i=1}^{n} x_i} .$$
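A minimal sketch of this conjugate update for a single square’s reward belief (the function name is illustrative):

```matlab
% Gamma-Exponential conjugate update: observing one reward sample x updates
% the Gamma(alpha, beta) belief over the square's rate lambda, and the MAP
% estimate is the posterior mode. A smaller lambda implies a larger expected
% reward, since the mean of an Exponential(lambda) distribution is 1/lambda.
function [alpha, beta, lambda_map] = update_reward_belief(alpha, beta, x)
    alpha = alpha + 1;                  % one new observation (n = 1)
    beta  = beta + x;
    lambda_map = (alpha - 1) / beta;    % mode of the Gamma posterior
end
```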

Exploration is a value in the range (0, 1) that affects three parts of the decision-making process:

1) The initial value of α in the belief. A highly explorative agent starts with an α that shapes the distribution so that its mean is very high; the agent initially assumes all foraging locations are very promising.

2)The safety net value of health that the agent maintains before deciding to head home

3) Picking a destination. A random number is drawn in the range (0, 1); if it is less than the exploration value, the agent picks a random foraging site to head towards rather than a site it already knows to be good, as sketched below.
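A minimal sketch of this destination rule, assuming illustrative variable names (resource_sites, believed_reward) that are not taken from the source code:

```matlab
% Exploration-driven destination choice: with probability 'exploration' pick
% any foraging site at random, otherwise pick the site believed to pay best.
if rand < exploration
    k = randi(size(resource_sites, 1));     % explore: random site
else
    [~, k] = max(believed_reward);          % exploit: best site seen so far
end
dest_loc = resource_sites(k, :);
```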

Results

For each map, five settings of the exploration parameter are tested for a fixed, equal amount of time (50 steps). Then, for each map, the number of deaths that occurred and the reward earned in each life are recorded.
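A sketch of this experimental loop, assuming a hypothetical run_simulation wrapper around the navigation code; the exploration values shown are illustrative, and only the five conditions and the 50-step budget come from the text.

```matlab
% Run each exploration condition on a given map and summarize the outcomes.
exploration_levels = [0.1 0.3 0.5 0.7 0.9];   % five conditions (illustrative)
n_steps = 50;                                  % fixed, equal time per condition
for c = 1:numel(exploration_levels)
    [n_deaths, reward_per_life] = run_simulation(map, exploration_levels(c), n_steps);
    fprintf('exploration %.1f: %d deaths, mean reward per life %.1f\n', ...
            exploration_levels(c), n_deaths, mean(reward_per_life));
end
```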

Map 1 is designed for the benefit of a more explorative agent. Foraging spots of very high value are hidden within dangerous areas that an explorative agent would be more likely to reach. Map 2 is designed more to the benefit of an exploitative agent, as the hidden rewards are of very low value. Map 3 is somewhere in between 1 and 2, with some hidden foraging spots worth a lot and some worth a little.

Map 1: Environment beneficial for exploration

Map 2: Environment beneficial for exploitation

2 Code Excerpt

```matlab
%Route Planning
%Iteration 1: Only learns the safety level of each location
%Runs route planning simulation

%Parameters: sigma, safety_wt, exploration?, exploitation?
%Variables: real grid, belief grid, location, destination
%0:neutral, >0:safe, <0:unsafe, NaN:not a valid location

function bayesianNavigation()
    clear all;

    %ui functions and variables
    speed = .2;
    quit = false;

    function set_real_grid(points)
        points = round(points);
        real_grid(points(1,2),points(1,1)) = str2double(get(danger_edit,'String'));
    end
    function set_speed(sp)
        speed = sp;
    end
    function qt()
        quit = true;
    end

    %Set up figure/UI
    f = figure('Position', [100, 100, 1200, 500]);
    danger_edit = uicontrol('Parent',f,'Style','edit','Position',[60,475,40,23],'String','0');
    danger_text = uicontrol('Parent',f,'Style','text','Position',[10,475,40,23],'String','Danger Rating:');
    qt_ui = uicontrol('Parent',f,'Style','pushbutton','Position',[110,475,40,23],'String','Quit','Callback',@(~,~)qt());
    spd_ui = uicontrol('Parent',f,'Style','slider','Position',[170,473,200,23],'Min',0,'Max',10,'SliderStep',[1 1]./10,...
        'Value',5,'Callback',@(h,~)set_speed(get(h,'Value')));
    health_text = uicontrol('Parent',f,'Style','text','Position',[425,475,40,23],'String','Health: 1000');
    reward_text = uicontrol('Parent',f,'Style','text','Position',[475,475,40,23],'String','Reward: 30');

    %Set parameters
    sigma = 1;
    danger_wt = .5;

    %Set real and belief grid
    %real_grid = [0 0 50; 0 NaN 0; 0 0 0];
    %belief_grid = [0 0 0; 0 NaN 0; 0 0 0];
    real_grid = [0 0 50 0 0; 0 NaN NaN NaN 0; 0 0 0 0 20; 0 NaN NaN NaN 0; 50 0 0 0 30];
    belief_grid = [0 0 0 0 0; 0 NaN NaN NaN 0; 0 0 0 0 0; 0 NaN NaN NaN 0; 0 0 0 0 0];
    %Set home, real resources, belief resources
    home = [1 1];
    real_resources = [1 1 0];
    belief_resources = [1 1 0];
    %Set health counter
    health_counter = 1000;
    %Set reward counter
    reward_counter = 0;
    %Set current location and destination
    %curr_loc = [3 3];
    %dest_loc = [1 1];
    curr_loc = [1 1];
    dest_loc = [5 5];

    %Loop through time until whenever
    while true
        if quit
            break
        end
        %plot and wait
        s1 = subplot(1,2,1);
        imgsc = plot_grid(real_grid,curr_loc,dest_loc,home,real_resources);
        ax = imgca;
        set(imgsc,'ButtonDownFcn',@(~,~)set_real_grid(get(ax,'CurrentPoint')),'HitTest','on')
        s2 = subplot(1,2,2);
        plot_grid(belief_grid, curr_loc, dest_loc, home, belief_resources);
        pause(1-speed/11);

        %check whether goal has been reached
        if isequal(curr_loc,dest_loc)
            %compare journey with optimal journey and record results
            %set new dest
            dest_loc = new_dest(real_grid);
        else
            %make safety observation and update belief grid
            observation = randn + real_grid(curr_loc(1),curr_loc(2));
            curr_belief = belief_grid(curr_loc(1),curr_loc(2));
            belief_grid(curr_loc(1),curr_loc(2)) = (curr_belief+observation)/2; %assume sigma = 1
            %make safety observation and update belief grid for neighbors
            [ neighbors ] = find_neighbors(curr_loc, belief_grid);
            for i = 1:size(neighbors,1)
                observation = randn + real_grid(neighbors(i,1),neighbors(i,2));
                curr_belief = belief_grid(neighbors(i,1),neighbors(i,2));
                belief_grid(neighbors(i,1),neighbors(i,2)) = (curr_belief+observation)/2; %assume sigma = 1
            end
            %take resource and update belief_resources
            %update health counter
            %calculate routes
            routes = calculate_routes(curr_loc,dest_loc,belief_grid);
            %calculate safety belief of each route
            danger_ratings = zeros(1,size(routes,1));
            for i = 1:size(routes,1)
                route = routes{i};
                idx = sub2ind(size(belief_grid), route(:,1), route(:,2));
                danger_ratings(i) = sum(belief_grid(idx));
            end
            %calculate distance of each route
            distances = zeros(1,size(routes,1));
            for i = 1:size(routes,1)
                route = routes{i};
                distances(i) = size(route,1);
            end
            %calculate desired route based on safety and distance
            desirability = danger_wt .* danger_ratings + distances;
            [b,ind] = sort(desirability,'ascend');
            %take a step in the desired direction
            curr_loc = routes{ind(1)}(2,:);
        end
    end

%Plot difference in safety/efficiency of actual vs optimal route taken as a function of time
end
```

3 Simulation Results

4 References

[1]毛蕊蕊. 基于贝叶斯网络的最佳交通路径规划[D]. 西安工程大学.

Blogger profile: experienced in MATLAB simulation across intelligent optimization algorithms, neural network prediction, signal processing, cellular automata, image processing, path planning, UAVs, and other fields; feel free to message me with questions about the related MATLAB code.

Some of the theory cited comes from online literature; please contact the blogger for removal in case of infringement.
