Optimization Theory and Autonomous Driving (8): iLQR Regularization and Line Search

(6) iLQR Regularization and Line Search

1. iLQR Regularization

In iLQR, we typically linearize the system dynamics and use a second-order approximation of the objective function. In the backward pass we need to invert Q_{uu} (the matrix of second derivatives of the objective with respect to the control), in order to compute the control update. In some cases, however, Q_{uu} may be singular or nearly singular, which makes the inversion numerically unstable. To avoid this, a regularization term can be added so that Q_{uu} becomes better conditioned.

Method 1: Gradient-Descent-Style Regularization

A common regularization method is to add a diagonal term to Q_{uu}; this plays a role similar to the learning rate in gradient descent.

Let the regularization matrix be \lambda I, where I is the identity matrix and \lambda is the regularization coefficient. We then adjust Q_{uu} to:

Q_{uu}^{\text{reg}} = Q_{uu} + \lambda I

In this way, even if Q_{uu} is not positive definite or is close to singular, we can still invert Q_{uu}^{\text{reg}} and carry out the control update.
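
A minimal sketch of this idea, assuming NumPy (the helper name regularized_inverse and the default value of \lambda are illustrative, not taken from the article's code):

import numpy as np

def regularized_inverse(Q_uu, lam=1e-3):
    # Add lambda * I to the diagonal so the matrix is safely invertible
    Q_uu_reg = Q_uu + lam * np.eye(Q_uu.shape[0])
    return np.linalg.inv(Q_uu_reg)

# Example: a nearly singular Q_uu still yields a usable control increment
Q_uu = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-9]])
Q_u = np.array([0.2, 0.1])
du = -regularized_inverse(Q_uu) @ Q_u  # du = -Q_uu_reg^{-1} * Q_u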

Method 2: Trust-Region Regularization

Another regularization method is based on the idea of trust-region optimization: when optimizing the control increment, we limit its magnitude. Specifically, the control increment \delta u_k is constrained to a fixed range:

\|\delta u_k\| \leq \Delta

By bounding the update step, this method ensures that no single update is too large, which improves the robustness of the optimization.
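
The demo in Section 3 applies this idea by clipping a scalar step with np.clip; for a vector-valued control increment, a sketch of the same constraint (the helper name clip_control_increment is illustrative) could be:

import numpy as np

def clip_control_increment(du, delta):
    # Scale du down if its norm exceeds the trust-region radius delta
    norm = np.linalg.norm(du)
    if norm > delta:
        du = du * (delta / norm)
    return du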

Effects of Regularization on the Algorithm

Convergence: Appropriate regularization improves the stability of DDP convergence, but an overly large regularization parameter \lambda can slow the algorithm down.

Step-size control: In some algorithms, the regularization coefficient \lambda is adjusted dynamically based on how well the current iteration performs. If the iteration makes good progress, \lambda is decreased to speed up convergence; if progress is poor, \lambda is increased to ensure stability.
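
A minimal sketch of such a schedule, in the Levenberg-Marquardt style (the function name, bounds, and factor of 10 are chosen for illustration, not taken from the article):

def update_regularization(lam, improved, lam_min=1e-6, lam_max=1e10, factor=10.0):
    # Decrease lambda after a successful iteration, increase it after a failed one
    if improved:
        return max(lam / factor, lam_min)
    return min(lam * factor, lam_max)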

2. Line Search

In iLQR, line search is a common technique for further improving numerical stability and convergence when updating the control sequence. In the backward pass, the new control increment is computed from a locally linearized system model and a quadratic approximation of the objective function. Because these approximations are not exact, directly applying the resulting control update may increase the objective value, especially when the system is highly nonlinear. To prevent the objective from getting worse, line search adjusts the step size so that the new control inputs actually improve performance.

Basic Steps of Line Search

The core idea of line search is to shrink the step size along a given direction until a step is found that improves the objective. The specific steps are as follows (a standalone code sketch of the procedure follows the list):

  1. Initial control increment: The backward pass yields a control increment \delta u_k, and the control update is assumed to be:

    u_k^{\text{new}} = u_k + \alpha \delta u_k

    where \alpha is the step-size coefficient (initially set to 1).

  2. Checking for improvement: For each \alpha, we run a forward pass to compute the new objective value J_{\text{new}} and compare it with the value before the update, J_{\text{old}}:

    J_{\text{new}} \leq J_{\text{old}}

    If the objective value decreases, the current step size \alpha is accepted and the control sequence is updated.

  3. Shrinking the step size: If the new control increment makes the objective worse, i.e. J_{\text{new}} > J_{\text{old}}, shrink the step size, for example to half its current value (or by some other ratio), i.e. \alpha = \beta \alpha with 0 < \beta < 1, and recompute the objective until a suitable step size is found.

  4. Repeat: Repeat the forward simulation and objective comparison until a suitable \alpha is found or the step size falls below a given threshold.
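
A minimal standalone sketch of steps 1-4, assuming a user-supplied rollout_cost function that runs the forward pass and returns the total cost J (the function names and default values are illustrative, not part of the article's demo):

def backtracking_line_search(u, du, rollout_cost, alpha=1.0, beta=0.5, max_steps=10):
    # rollout_cost(u) simulates the system forward and returns the total cost J
    J_old = rollout_cost(u)
    for _ in range(max_steps):
        u_new = u + alpha * du
        if rollout_cost(u_new) <= J_old:
            return u_new, alpha  # improving step found, accept it
        alpha *= beta  # shrink the step and try again
    return u, 0.0  # no improving step found; keep the old controls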

3. Python Code Demo

import numpy as np

# System dynamics (nonlinear)
def system_dynamics(x, u):
    return x + np.sin(u)

# Cost function for a single step
def cost_function(x, u):
    return 0.5 * (x**2 + u**2)

# Derivative of the cost function w.r.t. control input u (l_u)
def cost_u(u):
    return u

# Derivative of the cost function w.r.t. state x (l_x)
def cost_x(x):
    return x

# Second derivative of the cost function w.r.t. control input u (l_uu)
def cost_uu():
    return 1

# Second derivative of the cost function w.r.t. state x (l_xx)
def cost_xx():
    return 1

# Terminal cost for the final state
def terminal_cost(x):
    return 0.5 * x**2

# Derivative of terminal cost w.r.t state (l_x at final step)
def terminal_cost_x(x):
    return x

# Function to calculate the initial state trajectory based on control sequence
def compute_initial_trajectory(x0, u):
    x = np.zeros(N+1)
    x[0] = x0
    for k in range(N):
        x[k+1] = system_dynamics(x[k], u[k])
    return x

# iLQR algorithm with gradient descent, trust-region regularization, and line search
def ilqr_with_regularization_and_linesearch(x0, u, iterations, epsilon_u, epsilon_J, epsilon_x, max_step_size, line_search_steps=10):
    x = compute_initial_trajectory(x0, u)  # Compute initial trajectory
    prev_cost = np.inf
    regularization = 1e-3  # Regularization term added to Q_uu to keep it well conditioned and invertible

    for i in range(iterations):
        # Backward pass
        V_x = np.zeros(N+1)
        V_xx = np.zeros(N+1)
        V_x[-1] = terminal_cost_x(x[-1])  # Terminal value for V_x
        V_xx[-1] = 1  # Terminal value for V_xx (quadratic cost on terminal state)

        du = np.zeros(N)  # Control updates

        # Backward pass: compute Q function and control update
        for k in range(N-1, -1, -1):
            # Compute Q-function terms
            f_u = np.cos(u[k])  # Derivative of system dynamics w.r.t. u
            Q_u = cost_u(u[k]) + f_u * V_x[k+1]  # Q_u = l_u + f_u^T * V_x(k+1)
            Q_uu = cost_uu() + f_u**2 * V_xx[k+1] + regularization  # Add regularization to avoid singularity
            Q_x = cost_x(x[k]) + V_x[k+1]  # Q_x = l_x + f_x^T * V_x(k+1)
            Q_xx = cost_xx() + V_xx[k+1]  # Q_xx = l_xx + f_x^T * V_xx(k+1) * f_x

            # Update control input with trust-region regularization
            step = -Q_u / Q_uu  # Newton-style control increment: du = -Q_uu^{-1} * Q_u
            step = np.clip(step, -max_step_size, max_step_size)  # Apply trust-region constraint (bound the step size)
            du[k] = step
            V_x[k] = Q_x + Q_uu * du[k]  # Simplified value-function gradient update (the Q_ux cross term is omitted in this demo)
            V_xx[k] = Q_xx  # Value-function Hessian, likewise without the Q_ux correction

        # Forward pass: update trajectory using the new control inputs and line search
        x_new = np.zeros(N+1)
        u_new = np.zeros(N)
        x_new[0] = x0

        alpha = 1.0  # Starting step size for line search
        for search_step in range(line_search_steps):
            for k in range(N):
                u_new[k] = u[k] + alpha * du[k]  # Update control with line search step
                x_new[k+1] = system_dynamics(x_new[k], u_new[k])  # Update state

            # Compute the total cost for the current trajectory
            current_cost = np.sum([cost_function(x_new[k], u_new[k]) for k in range(N)]) + terminal_cost(x_new[-1])

            # Check if this step size improves the cost
            if current_cost < prev_cost:
                break
            alpha *= 0.5  # Reduce step size if no improvement

        # 1. Stop based on control input change
        if np.max(np.abs(du)) < epsilon_u:
            print(f"Stopped due to control input convergence at iteration {i}")
            break

        # 2. Stop based on cost function change (relative)
        if np.abs(prev_cost - current_cost) / np.abs(prev_cost) < epsilon_J:
            print(f"Stopped due to cost function convergence at iteration {i}")
            break

        # 3. Stop based on state trajectory change
        if np.max(np.abs(x_new - x)) < epsilon_x:
            print(f"Stopped due to state trajectory convergence at iteration {i}")
            break

        # Update for next iteration
        x = x_new
        u = u_new
        prev_cost = current_cost

    else:
        print("Reached maximum number of iterations")

    return x, u, i

if __name__ == "__main__":

    # iLQR parameters
    N = 3  # Number of time steps
    x0 = 1  # Initial state
    iterations = 50  # Maximum number of iterations
    epsilon_u = 1e-3  # Tolerance for control input changes
    epsilon_J = 1e-4  # Tolerance for cost function change (relative)
    epsilon_x = 1e-4  # Tolerance for state trajectory change
    max_step_size = 1.0  # Trust region step size for control updates

    # Initialize control sequence (the state trajectory is computed inside the iLQR routine)
    u = np.zeros(N)  # Initial control sequence

    # Run iLQR with stopping conditions
    x_final, u_final, num_iterations = ilqr_with_regularization_and_linesearch(x0, u, iterations, epsilon_u, epsilon_J, epsilon_x, max_step_size)

    # Output the final results and number of iterations
    print("Final state trajectory: ", x_final)
    print("Final control inputs: ", u_final)
    print("Total iterations: ", num_iterations)

The output is as follows:

Stopped due to cost function convergence at iteration 8
Final state trajectory:  [1.         0.44370578 0.17737126 0.0834065 ]
Final control inputs:  [-0.58991961 -0.26958819 -0.09410359]
Total iterations:  8
