Autonomous Driving: LQR, iLQR, and DDP Principles, Derivations, and Code Demonstrations (Part 5: Iteration Termination Criteria)

(5) iLQR Iteration Termination Criteria

The termination criteria for iLQR can be chosen according to the application; the most common are checks on the change in the control input \delta u or the change in the cost function J. Choosing the threshold \epsilon and the maximum number of iterations sensibly ensures convergence while avoiding wasted computation.

1. The change in the control input \delta u is small

A common convergence criterion is that the update of the control input becomes very small, i.e. the change \delta u at each iteration drops below a preset threshold. Define a small threshold \epsilon_u; if the control update \|\delta u_k\| is smaller than this threshold at every time step, the algorithm is considered converged:

\|\delta u_k\| < \epsilon_u \quad \forall k

This criterion ensures that the change in the control policy keeps shrinking, indicating that the solution is close to the optimum.
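
A minimal sketch of this check, assuming du is the array of per-step control updates \delta u_k produced by the backward pass and epsilon_u is the chosen tolerance (both names are illustrative):

import numpy as np

def control_update_converged(du, epsilon_u):
    # The condition ||du_k|| < epsilon_u for all k is equivalent to
    # checking the largest per-step update against the threshold
    # (scalar control here, as in the demo code later in this post).
    return np.max(np.abs(du)) < epsilon_u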

2. The change in the cost function J is small

Another approach is to monitor the change in the cost function between iterations. The iteration stops when the difference in the cost J between two consecutive iterations is smaller than a threshold:

|J_{\text{new}} - J_{\text{old}}| < \epsilon_J

This criterion guarantees that the algorithm ends up near an optimum, because a cost change approaching zero means there is little room left for further improvement.
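
A minimal sketch of this check, assuming J_new and J_old are the total costs of the current and previous iterations and epsilon_J is the tolerance (names are illustrative); a relative-tolerance variant is also common when cost magnitudes vary widely:

def cost_converged(J_new, J_old, epsilon_J):
    # Absolute change of the total cost between two consecutive iterations
    return abs(J_new - J_old) < epsilon_J

def cost_converged_relative(J_new, J_old, epsilon_J):
    # Relative variant: scale the tolerance by the magnitude of the previous cost
    return abs(J_new - J_old) < epsilon_J * max(1.0, abs(J_old))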

3. The change in the state trajectory is small

The change in the state trajectory can also serve as a convergence criterion. When the difference between the new state trajectory and the previous one falls below a preset threshold \epsilon_x, the system is considered converged:

\|x_k^{\text{new}} - x_k^{\text{old}}\| < \epsilon_x \quad \forall k

This criterion is suitable when the state must track a target accurately.
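
A minimal sketch, assuming x_new and x_old are the state trajectories (length N+1 arrays) of the current and previous iterations and epsilon_x is the tolerance:

import numpy as np

def trajectory_converged(x_new, x_old, epsilon_x):
    # ||x_k_new - x_k_old|| < epsilon_x for all k (scalar state, as in the demo below)
    return np.max(np.abs(x_new - x_old)) < epsilon_x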

4. Fixed number of iterations

For computation-time reasons, iLQR is often also given a maximum iteration count N_{\text{max}}. If none of the other convergence criteria is satisfied before this limit is reached, the iteration is forced to stop:

\text{iteration count} > N_{\text{max}}

5. Simultaneous convergence of the control and state increments

Some implementations monitor both the control increment \delta u and the state increment \delta x, and stop the iteration only when both satisfy their respective convergence criteria, as sketched below.
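
A minimal sketch of the combined check, assuming du is the array of control updates, x_new and x_old are the current and previous state trajectories, and epsilon_u, epsilon_x are the two tolerances; this variant is not enabled in the demo code below:

import numpy as np

def both_converged(du, x_new, x_old, epsilon_u, epsilon_x):
    du_small = np.max(np.abs(du)) < epsilon_u              # control increment criterion
    dx_small = np.max(np.abs(x_new - x_old)) < epsilon_x   # state increment criterion
    return du_small and dx_small                           # stop only when both are satisfied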

Python code example

As before, we use a simple nonlinear system whose state x_k and control input u_k obey the following nonlinear dynamics:

x_{k+1} = x_k + \sin(u_k)

The nonlinearity here is that the control input enters the system through a sine function. We want the control input u_k to drive the state x_k to the target value 0 quickly while keeping the control effort as small as possible, so we use the quadratic cost function:

J = \sum_{k=0}^{N-1} \left( \frac{1}{2} x_k^2 + \frac{1}{2} u_k^2 \right)

import numpy as np

# System dynamics (nonlinear)
def system_dynamics(x, u):
    return x + np.sin(u)

# Cost function for a single step
def cost_function(x, u):
    return 0.5 * (x**2 + u**2)

# Derivative of the cost function w.r.t. control input u (l_u)
def cost_u(u):
    return u

# Derivative of the cost function w.r.t. state x (l_x)
def cost_x(x):
    return x

# Second derivative of the cost function w.r.t. control input u (l_uu)
def cost_uu():
    return 1

# Second derivative of the cost function w.r.t. state x (l_xx)
def cost_xx():
    return 1


# Function to calculate the initial state trajectory based on control sequence
def compute_initial_trajectory(x0, u):
    x = np.zeros(N+1)
    x[0] = x0
    for k in range(N):
        x[k+1] = system_dynamics(x[k], u[k])
    return x

# iLQR algorithm with different stopping conditions
def ilqr_with_conditions(x, u, iterations, epsilon_u, epsilon_J, epsilon_x):
    prev_cost = np.inf
    for i in range(iterations):
        # Backward pass
        V_x = np.zeros(N+1)
        V_xx = np.zeros(N+1)
        V_x[-1] = x[-1]  # Terminal value for V_x (gradient of an implicit terminal cost 0.5 * x_N^2)
        V_xx[-1] = 1  # Terminal value for V_xx (Hessian of the quadratic terminal cost)

        du = np.zeros(N)  # Control updates

        # Backward pass: compute Q function and control update
        for k in range(N-1, -1, -1):
            # Compute Q-function terms
            f_u = np.cos(u[k])  # Derivative of system dynamics w.r.t. u
            Q_u = cost_u(u[k]) + f_u * V_x[k+1]  # Q_u = l_u + f_u^T * V_x(k+1)
            Q_uu = cost_uu() + f_u**2 * V_xx[k+1]  # Q_uu = l_uu + f_u^T * V_xx(k+1) * f_u
            Q_x = cost_x(x[k]) + V_x[k+1]  # Q_x = l_x + f_x * V_x(k+1), with f_x = d(x + sin(u))/dx = 1
            Q_xx = cost_xx() + V_xx[k+1]  # Q_xx = l_xx + f_x^2 * V_xx(k+1), again with f_x = 1

            # Update control input
            du[k] = -Q_u / Q_uu  # Control update
            V_x[k] = Q_x + Q_uu * du[k]  # Update value function gradient
            V_xx[k] = Q_xx  # Update value function Hessian (V_xx)

        # Forward pass: update trajectory using the new control inputs
        x_new = np.zeros(N+1)
        u_new = np.zeros(N)
        x_new[0] = x0

        for k in range(N):
            u_new[k] = u[k] + du[k]  # Update control
            x_new[k+1] = system_dynamics(x_new[k], u_new[k])  # Update state

        
        # Compute the total cost for the current trajectory
        current_cost = np.sum([cost_function(x_new[k], u_new[k]) for k in range(N)])

        # 1. Stop based on control input change
        if np.max(np.abs(du)) < epsilon_u:
            print(f"Stopped due to control input convergence at iteration {i}")
            break

        # # 2. Stop based on cost function change
        # if np.abs(current_cost - prev_cost) < epsilon_J:
        #     print(f"Stopped due to cost function convergence at iteration {i}")
        #     break

        # # 3. Stop based on state trajectory change
        # if np.max(np.abs(x_new - x)) < epsilon_x:
        #     print(f"Stopped due to state trajectory convergence at iteration {i}")
        #     break

        # Update for next iteration
        x = x_new
        u = u_new
        prev_cost = current_cost

    return x, u, i

if __name__ == "__main__":

    # iLQR parameters
    N = 3  # Number of time steps (used as a module-level global by the functions above)
    x0 = 1  # Initial state (also used as a global inside ilqr_with_conditions)
    iterations = 50  # Maximum number of iterations
    epsilon_u = 1e-3  # Tolerance for control input changes
    epsilon_J = 1e-4  # Tolerance for cost function change
    epsilon_x = 1e-4  # Tolerance for state trajectory change

    # Initialize control sequence and state trajectory
    u = np.zeros(N)  # Initial control sequence
    x = np.zeros(N+1)  # State trajectory
    x[0] = x0


    # Compute initial trajectory
    x_initial = compute_initial_trajectory(x0, u)

    # Run iLQR with stopping conditions
    x_final, u_final, num_iterations = ilqr_with_conditions(x_initial, u, iterations, epsilon_u, epsilon_J, epsilon_x)

    # Output the final results and number of iterations
    print(x_final, u_final, num_iterations)

By enabling the different convergence checks in the code, the results are as follows:

Control-input change threshold:

Stopped due to control input convergence at iteration 11
[1.         0.44125242 0.18222331 0.0923361 ] [-0.59287488 -0.26201687 -0.0900087 ] 11

Cost-function change threshold:

Stopped due to cost function convergence at iteration 7
[1.         0.44820357 0.18851501 0.08514765] [-0.58451676 -0.26269969 -0.10355232] 7

State-trajectory change threshold:

Stopped due to state trajectory convergence at iteration 16
[1.         0.44158355 0.18146959 0.09103245] [-0.59247567 -0.26314022 -0.09056088] 16
