A Collection of MATLAB Examples for Adaptive Dynamic Programming with Multilayer Neural Networks

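This post collects MATLAB examples of adaptive dynamic programming (ADP) with multilayer neural networks: a torsional pendulum system, an affine nonlinear example, and a mass-spring-damper system.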

  1. Torsional pendulum system

    References:

    [1] Liu D, Wei Q. Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems[J]. IEEE Trans Neural Netw Learn Syst, 2014, 25(3): 621-634.

    [2] Mu C, Wang D, He H. Novel iterative neural dynamic programming for data-based approximate optimal control design[J]. Automatica, 2017, 81: 240-252.

System dynamics:
$$
\left\{\begin{array}{l} \frac{d \theta}{d t}=\omega \\ J \frac{d \omega}{d t}=u-M g l \sin \theta-f_{d} \frac{d \theta}{d t} \end{array}\right.
$$
where $M=1/3\ \mathrm{kg}$ and $l=2/3\ \mathrm{m}$ are the mass and length of the pendulum bar, respectively. The system states are the current angle $\theta$ and the angular velocity $\omega$. Let $J=4/3\, M l^{2}$ and $f_{d}=0.2$ be the rotary inertia and frictional factor, respectively. Let $g=9.8\ \mathrm{m}/\mathrm{s}^{2}$ be the gravitational acceleration. Discretization of the system function and performance index function using Euler and trapezoidal methods with the sampling interval $\Delta t=0.1\ \mathrm{s}$ leads to
$$
\left[\begin{array}{c} x_{1(k+1)} \\ x_{2(k+1)} \end{array}\right]=\left[\begin{array}{c} 0.1 x_{2 k}+x_{1 k} \\ -0.49 \times \sin \left(x_{1 k}\right)-0.1 \times f_{d} \times x_{2 k}+x_{2 k} \end{array}\right]+\left[\begin{array}{c} 0 \\ 0.1 \end{array}\right] u_{k}
$$
or, alternatively,
$$
x_{t+1}=\left[\begin{array}{c} x_{1 t}+0.1 x_{2 t} \\ 0.2\left(-0.49 \sin \left(x_{1 t}\right)-0.2 x_{2 t}+x_{2 t}\right) \end{array}\right]+\left[\begin{array}{c} 0 \\ 0.02 \end{array}\right] u_{t}
$$
The initial state is $x_{0}=[1,-1]^{T}$.

Simulation results: ResultsCollation1.m
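
The first discretized model above can be exercised directly. The following is a minimal sketch, not the author's ResultsCollation1.m: it steps the model forward from the stated initial state with zero input (an assumption made only to check the model) and computes the spectral radius of the Jacobian at the origin. The names `F`, `N`, `X`, `Alin`, and `rho` are hypothetical.

```matlab
% Sketch only: this is not the author's ResultsCollation1.m.
fd = 0.2;                                  % frictional factor from the text

% First discretized model from the text: x(k+1) = F(x(k), u(k))
F = @(x, u) [0.1*x(2) + x(1); ...
             -0.49*sin(x(1)) - 0.1*fd*x(2) + x(2)] + [0; 0.1]*u;

% ASSUMPTION: zero input, used only to exercise the model.
N = 100;                                   % number of steps (hypothetical)
X = zeros(2, N + 1);
X(:, 1) = [1; -1];                         % initial state from the text
for k = 1:N
    X(:, k + 1) = F(X(:, k), 0);
end

% Jacobian of the zero-input map at the origin (sin(x1) is replaced by x1)
Alin = [1, 0.1; -0.49, 1 - 0.1*fd];
rho = max(abs(eig(Alin)));                 % spectral radius of the Jacobian

fprintf('state after one step: [%.4f, %.4f]\n', X(1, 2), X(2, 2));
fprintf('spectral radius at the origin: %.4f\n', rho);
```

The Jacobian has determinant $1.029$ and a complex eigenvalue pair, so its spectral radius is $\sqrt{1.029} > 1$: the origin is an unstable equilibrium of the zero-input discretized model, and a stabilizing control law is needed to drive the state to zero.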

  2. Affine nonlinear example

    References:

    [1] Wang F Y, Jin N, Liu D, et al. Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound[J]. IEEE Trans Neural Netw, 2011, 22(1): 24-36.

    [2] Zhang H, Wei Q, Luo Y. A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm[J]. IEEE Transactions on Systems Man & Cybernetics Part B, 2008, 38(4): 937-942.

    [3] Liu D, Wei Q. Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems[J]. IEEE Trans Neural Netw Learn Syst, 2014, 25(3): 621-634.

    We consider the following nonlinear system:
    $$
    x_{k+1}=f\left(x_{k}\right)+g\left(x_{k}\right) u_{k}
    $$
    where $x_{k}=[x_{1k}, x_{2k}]^{T}$ and $u_{k}$ are the state and control variables, respectively. The system functions are given as
    $$
    f\left(x_{k}\right)=\left[\begin{array}{c} 0.2 x_{1 k} \exp \left(x_{2 k}^{2}\right) \\ 0.3 x_{2 k}^{3} \end{array}\right], \quad g\left(x_{k}\right)=\left[\begin{array}{c} 0 \\ -0.2 \end{array}\right]
    $$
    The initial state is $x_{0}=[2,-1]^{T}$.

    Simulation results: ResultsCollation2.m
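
To show how a control policy for this system can be scored, here is a hedged sketch, not the author's ResultsCollation2.m, that rolls the system out under a given policy and accumulates a quadratic cost. The weights `Q = I` and `R = 1`, the horizon `N`, and the placeholder `policy` (zero control) are assumptions for illustration; the source does not state them.

```matlab
% Sketch only: this is not the author's ResultsCollation2.m.
% System functions from the text
f = @(x) [0.2*x(1)*exp(x(2)^2); 0.3*x(2)^3];
g = @(x) [0; -0.2];

% ASSUMPTIONS: quadratic utility U = x'*Q*x + u'*R*u with Q = I, R = 1,
% and a placeholder policy that always applies zero control.
Q = eye(2);
R = 1;
policy = @(x) 0;                 % hypothetical; swap in a trained action network

N = 50;                          % rollout horizon (hypothetical)
X = zeros(2, N + 1);
X(:, 1) = [2; -1];               % initial state from the text
J = 0;                           % accumulated cost along the rollout
for k = 1:N
    x = X(:, k);
    u = policy(x);
    J = J + x'*Q*x + u'*R*u;     % add the utility of this step
    X(:, k + 1) = f(x) + g(x)*u; % advance the system one step
end

fprintf('state after one step: [%.4f, %.4f]\n', X(1, 2), X(2, 2));
fprintf('accumulated cost over %d steps: %.4f\n', N, J);
```

Replacing `policy` with the output of an ADP action network and comparing the resulting `J` values gives a simple way to check whether an iteration actually reduced the cost from this initial state.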


3. Mass-spring-damper system

Reference:

Winston Alexander Baker. Observer incorporated neoclassical controller design: A discrete perspective[J]. Dissertations & Theses - Gradworks, 2010.
$$
\left[\begin{array}{l} x_{1}(k+1) \\ x_{2}(k+1) \end{array}\right]=\left[\begin{array}{c} 0.0099 x_{2 k}+0.9996 x_{1 k} \\ -0.0887 x_{1 k}+0.97 x_{2 k} \end{array}\right]+\left[\begin{array}{c} 0 \\ 0.0099 \end{array}\right] u(k)
$$
The initial state vector is set as $x_{0}=[-1,1]^{T}$.

Simulation results: ResultsCollation3.m
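
Because this model is linear, value iteration with a quadratic value function $V_i(x)=x^{T} P_i x$ reduces to a Riccati recursion, which gives a baseline for checking a neural-network ADP result. The sketch below is not the author's ResultsCollation3.m. The weights `Q = I` and `R = 1`, the tolerance, and the iteration and rollout limits are assumptions; the source does not state them.

```matlab
% Sketch only: this is not the author's ResultsCollation3.m.
% Linear model from the text: x(k+1) = A*x(k) + B*u(k)
A = [0.9996, 0.0099; -0.0887, 0.97];
B = [0; 0.0099];

% ASSUMPTION: quadratic utility U = x'*Q*x + u'*R*u with Q = I, R = 1.
Q = eye(2);
R = 1;

P = zeros(2);                          % value iteration starts from V_0(x) = 0
tol = 1e-10;                           % stopping tolerance (hypothetical)
maxIter = 5000;                        % iteration cap (hypothetical)
for i = 1:maxIter
    K = (R + B'*P*B) \ (B'*P*A);       % greedy feedback gain, u = -K*x
    Pnext = Q + A'*P*A - A'*P*B*K;     % Riccati value-iteration update
    Pnext = (Pnext + Pnext')/2;        % keep P symmetric against round-off
    delta = norm(Pnext - P, 'fro');
    P = Pnext;
    if delta < tol, break; end
end
K = (R + B'*P*B) \ (B'*P*A);           % gain for the converged P

% Closed-loop rollout from the initial state given in the text
N = 500;                               % number of steps (hypothetical)
X = zeros(2, N + 1);
X(:, 1) = [-1; 1];
for k = 1:N
    X(:, k + 1) = (A - B*K)*X(:, k);
end

fprintf('value iteration stopped after %d iterations\n', i);
fprintf('closed-loop spectral radius: %.4f\n', max(abs(eig(A - B*K))));
```

Under these assumed weights, the quadratic form `x0'*P*x0` is the optimal cost from `x0`, so a trained critic network can be compared against it at sampled states.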


The code is available for a fee; send a private message if you need it.
