Autonomous Driving: LQR, iLQR, and DDP Principles, Derivations, and Code Demonstrations (Part 5: Iteration Termination Criteria)

(5) iLQR Iteration Termination Criteria

The termination criteria for iLQR can be chosen according to the application; the most common are checks on the change in the control input \delta u or the change in the cost function J. Choosing the threshold \epsilon and the maximum number of iterations sensibly ensures convergence while avoiding wasted computation.

1. The change in the control input \delta u is small

A common convergence criterion is that the update of the control input becomes very small, i.e. the change \delta u at each iteration drops below a preset threshold. Define a small threshold \epsilon_u; if the control update \|\delta u_k\| is smaller than this threshold at every time step, the algorithm is considered converged:

\|\delta u_k\| < \epsilon_u \quad \forall k

This criterion ensures that the change in the control policy keeps shrinking, indicating that the solution is close to the optimum.
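
A minimal sketch of this check, assuming du is the array of per-step control updates \delta u_k produced by the backward pass and epsilon_u is the chosen tolerance (both names are illustrative):

import numpy as np

def control_update_converged(du, epsilon_u):
    # The condition ||du_k|| < epsilon_u for all k is equivalent to
    # checking the largest per-step update against the threshold
    # (scalar control here, as in the demo code later in this post).
    return np.max(np.abs(du)) < epsilon_u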

2. The change in the cost function J is small

Another approach is to monitor the change in the cost function between iterations. The iteration stops when the difference in the cost J between two consecutive iterations is smaller than a threshold:

|J_{\text{new}} - J_{\text{old}}| < \epsilon_J

This criterion guarantees that the algorithm ends up near an optimum, because a cost change approaching zero means there is little room left for further improvement.
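
A minimal sketch of this check, assuming J_new and J_old are the total costs of the current and previous iterations and epsilon_J is the tolerance (names are illustrative); a relative-tolerance variant is also common when cost magnitudes vary widely:

def cost_converged(J_new, J_old, epsilon_J):
    # Absolute change of the total cost between two consecutive iterations
    return abs(J_new - J_old) < epsilon_J

def cost_converged_relative(J_new, J_old, epsilon_J):
    # Relative variant: scale the tolerance by the magnitude of the previous cost
    return abs(J_new - J_old) < epsilon_J * max(1.0, abs(J_old))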

3. The change in the state trajectory is small

The change in the state trajectory can also serve as a convergence criterion. When the difference between the new state trajectory and the previous one falls below a preset threshold \epsilon_x, the system is considered converged:

\|x_k^{\text{new}} - x_k^{\text{old}}\| < \epsilon_x \quad \forall k

This criterion is suitable when the state must track a target accurately.
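
A minimal sketch, assuming x_new and x_old are the state trajectories (length N+1 arrays) of the current and previous iterations and epsilon_x is the tolerance:

import numpy as np

def trajectory_converged(x_new, x_old, epsilon_x):
    # ||x_k_new - x_k_old|| < epsilon_x for all k (scalar state, as in the demo below)
    return np.max(np.abs(x_new - x_old)) < epsilon_x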

4. Fixed number of iterations

For computation-time reasons, iLQR is often also given a maximum iteration count N_{\text{max}}. If none of the other convergence criteria is satisfied before this limit is reached, the iteration is forced to stop:

\text{iteration count} > N_{\text{max}}

5. Simultaneous convergence of the control and state increments

Some implementations monitor both the control increment \delta u and the state increment \delta x, and stop the iteration only when both satisfy their respective convergence criteria, as sketched below.
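
A minimal sketch of the combined check, assuming du is the array of control updates, x_new and x_old are the current and previous state trajectories, and epsilon_u, epsilon_x are the two tolerances; this variant is not enabled in the demo code below:

import numpy as np

def both_converged(du, x_new, x_old, epsilon_u, epsilon_x):
    du_small = np.max(np.abs(du)) < epsilon_u              # control increment criterion
    dx_small = np.max(np.abs(x_new - x_old)) < epsilon_x   # state increment criterion
    return du_small and dx_small                           # stop only when both are satisfied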

Python code example

As before, we use a simple nonlinear system whose state x_k and control input u_k obey the following nonlinear dynamics:

x_{k+1} = x_k + \sin(u_k)

The nonlinearity here is that the control input enters the system through a sine function. We want the control input u_k to drive the state x_k to the target value 0 quickly while keeping the control effort as small as possible, so we use the quadratic cost function:

J = \sum_{k=0}^{N-1} \left( \frac{1}{2} x_k^2 + \frac{1}{2} u_k^2 \right)

import numpy as np

# System dynamics (nonlinear)
def system_dynamics(x, u):
    return x + np.sin(u)

# Cost function for a single step
def cost_function(x, u):
    return 0.5 * (x**2 + u**2)

# Derivative of the cost function w.r.t. control input u (l_u)
def cost_u(u):
    return u

# Derivative of the cost function w.r.t. state x (l_x)
def cost_x(x):
    return x

# Second derivative of the cost function w.r.t. control input u (l_uu)
def cost_uu():
    return 1

# Second derivative of the cost function w.r.t. state x (l_xx)
def cost_xx():
    return 1


# Function to calculate the initial state trajectory based on control sequence
def compute_initial_trajectory(x0, u):
    x = np.zeros(N+1)
    x[0] = x0
    for k in range(N):
        x[k+1] = system_dynamics(x[k], u[k])
    return x

# iLQR algorithm with different stopping conditions
def ilqr_with_conditions(x, u, iterations, epsilon_u, epsilon_J, epsilon_x):
    prev_cost = np.inf
    for i in range(iterations):
        # Backward pass
        V_x = np.zeros(N+1)
        V_xx = np.zeros(N+1)
        V_x[-1] = x[-1]  # Terminal value for V_x (gradient of an implicit terminal cost 0.5 * x_N^2)
        V_xx[-1] = 1  # Terminal value for V_xx (Hessian of the quadratic terminal cost)

        du = np.zeros(N)  # Control updates

        # Backward pass: compute Q function and control update
        for k in range(N-1, -1, -1):
            # Compute Q-function terms
            f_u = np.cos(u[k])  # Derivative of system dynamics w.r.t. u
            Q_u = cost_u(u[k]) + f_u * V_x[k+1]  # Q_u = l_u + f_u^T * V_x(k+1)
            Q_uu = cost_uu() + f_u**2 * V_xx[k+1]  # Q_uu = l_uu + f_u^T * V_xx(k+1) * f_u
            Q_x = cost_x(x[k]) + V_x[k+1]  # Q_x = l_x + f_x * V_x(k+1), with f_x = d(x + sin(u))/dx = 1
            Q_xx = cost_xx() + V_xx[k+1]  # Q_xx = l_xx + f_x^2 * V_xx(k+1), again with f_x = 1

            # Update control input
            du[k] = -Q_u / Q_uu  # Control update
            V_x[k] = Q_x + Q_uu * du[k]  # Update value function gradient
            V_xx[k] = Q_xx  # Update value function Hessian (V_xx)

        # Forward pass: update trajectory using the new control inputs
        x_new = np.zeros(N+1)
        u_new = np.zeros(N)
        x_new[0] = x0

        for k in range(N):
            u_new[k] = u[k] + du[k]  # Update control
            x_new[k+1] = system_dynamics(x_new[k], u_new[k])  # Update state

        
        # Compute the total cost for the current trajectory
        current_cost = np.sum([cost_function(x_new[k], u_new[k]) for k in range(N)])

        # 1. Stop based on control input change
        if np.max(np.abs(du)) < epsilon_u:
            print(f"Stopped due to control input convergence at iteration {i}")
            break

        # # 2. Stop based on cost function change
        # if np.abs(current_cost - prev_cost) < epsilon_J:
        #     print(f"Stopped due to cost function convergence at iteration {i}")
        #     break

        # # 3. Stop based on state trajectory change
        # if np.max(np.abs(x_new - x)) < epsilon_x:
        #     print(f"Stopped due to state trajectory convergence at iteration {i}")
        #     break

        # Update for next iteration
        x = x_new
        u = u_new
        prev_cost = current_cost

    return x, u, i

if __name__ == "__main__":

    # iLQR parameters
    N = 3  # Number of time steps (used as a module-level global by the functions above)
    x0 = 1  # Initial state (also used as a global inside ilqr_with_conditions)
    iterations = 50  # Maximum number of iterations
    epsilon_u = 1e-3  # Tolerance for control input changes
    epsilon_J = 1e-4  # Tolerance for cost function change
    epsilon_x = 1e-4  # Tolerance for state trajectory change

    # Initialize control sequence and state trajectory
    u = np.zeros(N)  # Initial control sequence
    x = np.zeros(N+1)  # State trajectory
    x[0] = x0


    # Compute initial trajectory
    x_initial = compute_initial_trajectory(x0, u)

    # Run iLQR with stopping conditions
    x_final, u_final, num_iterations = ilqr_with_conditions(x_initial, u, iterations, epsilon_u, epsilon_J, epsilon_x)

    # Output the final results and number of iterations
    print(x_final, u_final, num_iterations)

By enabling the different convergence checks in the code, the results are as follows:

Control-input change threshold:

Stopped due to control input convergence at iteration 11
[1.         0.44125242 0.18222331 0.0923361 ] [-0.59287488 -0.26201687 -0.0900087 ] 11

Cost-function change threshold:

Stopped due to cost function convergence at iteration 7
[1.         0.44820357 0.18851501 0.08514765] [-0.58451676 -0.26269969 -0.10355232] 7

State-trajectory change threshold:

Stopped due to state trajectory convergence at iteration 16
[1.         0.44158355 0.18146959 0.09103245] [-0.59247567 -0.26314022 -0.09056088] 16
