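(5) Termination Criteria for iLQR Iterations
The termination criteria for iLQR can be chosen to suit the application. The most common ones test the change in the control input or the change in the cost function J to decide whether the iteration has converged. A sensible choice of thresholds and of the maximum iteration count lets the algorithm converge to the desired tolerance while avoiding unnecessary computation.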
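1. The change in the control input is small
A common convergence criterion is that the control update becomes small, i.e. the change made in an iteration falls below a preset threshold. In practice a small threshold $\epsilon_u$ is chosen, and the algorithm is considered converged once the largest control update is below it: $\max_k |\delta u_k| < \epsilon_u$.
Shrinking control updates indicate that the control sequence is settling and is therefore likely close to a (local) optimum.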
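A minimal sketch of this check, assuming the update is stored as a NumPy array `du` and that the largest absolute entry is the measure of change. The helper name `control_converged` is hypothetical and the sample numbers are made up for illustration:

```python
import numpy as np

def control_converged(du, epsilon_u):
    """Return True when the largest control update is below epsilon_u."""
    return float(np.max(np.abs(du))) < epsilon_u

# A small update passes the test, a larger one does not.
print(control_converged(np.array([1e-4, -5e-4, 2e-4]), 1e-3))  # True
print(control_converged(np.array([1e-4, -5e-3, 2e-4]), 1e-3))  # False
```

Using the maximum rather than the mean means that a single time step with a large update is enough to keep the iteration going.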
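2. The change in the cost function J is small
Another approach monitors the change in the cost function between iterations. The iteration stops when the difference in J between two consecutive iterations falls below a threshold: $|J^{(i)} - J^{(i-1)}| < \epsilon_J$.
A cost change approaching zero means that further iterations can hardly improve the solution, which suggests the algorithm is close to a (local) optimum.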
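The cost test can be sketched as below. The function name `cost_converged` and the optional `relative` flag are assumptions added for illustration. The relative variant, which scales the change by the previous cost, is one common way to make the threshold independent of the magnitude of J:

```python
import math

def cost_converged(prev_cost, current_cost, epsilon_J, relative=False):
    """Return True when the cost change between two iterations is below epsilon_J."""
    if math.isinf(prev_cost):
        return False  # first iteration: nothing to compare against yet
    change = abs(current_cost - prev_cost)
    if relative:
        # Assumption: scale by the previous cost, guarded against division by zero.
        change /= max(abs(prev_cost), 1e-12)
    return change < epsilon_J

print(cost_converged(math.inf, 1.50, 1e-4))     # False
print(cost_converged(0.70012, 0.70008, 1e-4))  # True
print(cost_converged(0.70012, 0.69000, 1e-4))  # False
```

Initializing the previous cost to infinity, as the listing in this section does, makes the test fail on the first iteration, so at least two iterations always run.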
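3. The change in the state trajectory is small
The change in the state trajectory can also serve as a convergence test. When the difference between the new state trajectory and the previous one falls below a preset threshold, the iteration can be considered converged: $\max_k |x_k^{\text{new}} - x_k| < \epsilon_x$.
This criterion suits problems in which the state must approach its target precisely.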
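The example in this section has a scalar state, but the same test extends to vector-valued states. The sketch below assumes the trajectory is stored with one row per time step and measures the change at each step with the Euclidean norm. The name `state_converged` and the sample data are hypothetical:

```python
import numpy as np

def state_converged(x_old, x_new, epsilon_x):
    """Return True when no time step's state moved by epsilon_x or more."""
    old = np.asarray(x_old, dtype=float)
    new = np.asarray(x_new, dtype=float)
    # Reshape to (steps, state_dim) so scalar and vector states share one code path.
    diff = (new - old).reshape(len(new), -1)
    per_step = np.linalg.norm(diff, axis=1)  # Euclidean norm at each time step
    return float(per_step.max()) < epsilon_x

# Scalar state: the largest change is 3e-5.
print(state_converged([1.0, 0.5, 0.2], [1.0, 0.50002, 0.19997], 1e-4))  # True

# Two-dimensional state: the largest per-step change has norm 5e-5.
x_old = np.zeros((3, 2))
x_new = np.array([[0.0, 0.0], [3e-5, 4e-5], [0.0, 0.0]])
print(state_converged(x_old, x_new, 1e-5))  # False
```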
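4. A fixed number of iterations
To bound the computation time, iLQR is often also given a maximum iteration count. If none of the other convergence criteria is met before the maximum is reached, the iteration is stopped regardless.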
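An iteration budget is usually combined with one of the tolerance tests. The sketch below shows the control flow only. `iterate_with_budget` and the toy `make_step` are hypothetical helpers, with `step` standing in for one full iLQR iteration that reports whether it has converged:

```python
def iterate_with_budget(step, max_iterations):
    """Call step(i) until it reports convergence or the budget is spent.

    Returns (converged, iterations_used).
    """
    for i in range(max_iterations):
        if step(i):
            return True, i + 1
    return False, max_iterations

# Toy stand-in for an optimizer: an error that halves on every iteration.
def make_step(tol):
    state = {"err": 1.0}
    def step(i):
        state["err"] *= 0.5
        return state["err"] < tol
    return step

print(iterate_with_budget(make_step(1e-3), 50))  # (True, 10)
print(iterate_with_budget(make_step(1e-3), 5))   # (False, 5)
```

Returning the `converged` flag alongside the count lets the caller tell a genuine convergence apart from a run that merely hit the limit.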
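5. Control and state increments converge together
Some implementations monitor the control increment and the state increment together and stop only when both satisfy their convergence criteria.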
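A joint test could look like the following sketch, which assumes separate tolerances for the two increments and uses the largest absolute change for both. The function name and the sample values are illustrative only:

```python
import numpy as np

def jointly_converged(du, x_old, x_new, epsilon_u, epsilon_x):
    """Return True only when both the control and the state increments are small."""
    control_ok = np.max(np.abs(du)) < epsilon_u
    dx = np.asarray(x_new, dtype=float) - np.asarray(x_old, dtype=float)
    state_ok = np.max(np.abs(dx)) < epsilon_x
    return bool(control_ok and state_ok)

du = np.array([2e-4, -1e-4])
x_old = np.array([1.0, 0.5, 0.2])
x_new = np.array([1.0, 0.50002, 0.19997])

print(jointly_converged(du, x_old, x_new, 1e-3, 1e-4))  # True
print(jointly_converged(du, x_old, x_new, 1e-3, 1e-5))  # False: state change too large
```

Requiring both conditions is stricter than either one alone, so it typically needs at least as many iterations as the slower of the two.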
Python code example
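As before, we use a simple nonlinear system whose state x and control input u obey the nonlinear dynamics $x_{k+1} = x_k + \sin(u_k)$.
The nonlinearity arises because the control input enters the system through a sine function. We want the control to drive the state quickly to the target value 0 while keeping the control effort small, so we use the quadratic stage cost $\ell(x_k, u_k) = \tfrac{1}{2}(x_k^2 + u_k^2)$, summed along the trajectory.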
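Before the full listing, a quick sanity check of the model can be useful. The sketch below rolls the dynamics forward for a given control sequence and accumulates the stage cost over k = 0..N-1, the same summation range the listing uses. The helper name `rollout_cost` is hypothetical:

```python
import numpy as np

def rollout_cost(x0, u):
    """Simulate x_{k+1} = x_k + sin(u_k) and sum 0.5*(x_k^2 + u_k^2) over the controls."""
    x = float(x0)
    total = 0.0
    for u_k in u:
        total += 0.5 * (x**2 + u_k**2)  # stage cost at step k
        x = x + np.sin(u_k)             # advance the state
    return float(total), float(x)

# With zero control the state stays at x0 = 1, so three steps cost 3 * 0.5 = 1.5.
cost, x_final = rollout_cost(1.0, np.zeros(3))
print(cost, x_final)  # 1.5 1.0
```

This is the cost of the initial guess `u = 0` used in the listing, and any useful iteration should end up below it.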
import numpy as np


# System dynamics (nonlinear)
def system_dynamics(x, u):
    return x + np.sin(u)


# Cost function for a single step
def cost_function(x, u):
    return 0.5 * (x**2 + u**2)


# Derivative of the cost function w.r.t. control input u (l_u)
def cost_u(u):
    return u


# Derivative of the cost function w.r.t. state x (l_x)
def cost_x(x):
    return x


# Second derivative of the cost function w.r.t. control input u (l_uu)
def cost_uu():
    return 1


# Second derivative of the cost function w.r.t. state x (l_xx)
def cost_xx():
    return 1


# Function to calculate the initial state trajectory based on control sequence
# (N is a module-level global defined in the __main__ block below)
def compute_initial_trajectory(x0, u):
    x = np.zeros(N+1)
    x[0] = x0
    for k in range(N):
        x[k+1] = system_dynamics(x[k], u[k])
    return x


# iLQR algorithm with different stopping conditions
# (N and x0 are module-level globals defined in the __main__ block below)
def ilqr_with_conditions(x, u, iterations, epsilon_u, epsilon_J, epsilon_x):
    prev_cost = np.inf
    for i in range(iterations):
        # Backward pass
        V_x = np.zeros(N+1)
        V_xx = np.zeros(N+1)
        V_x[-1] = x[-1]  # Terminal value for V_x
        V_xx[-1] = 1     # Terminal value for V_xx (quadratic cost on terminal state)
        du = np.zeros(N)  # Control updates

        # Backward pass: compute Q function and control update.
        # Note: this is a simplified recursion. It does not use the cross term
        # Q_ux or a feedback gain, both of which appear in a full iLQR backward pass.
        for k in range(N-1, -1, -1):
            # Compute Q-function terms (f_x = 1 for these dynamics)
            f_u = np.cos(u[k])  # Derivative of system dynamics w.r.t. u
            Q_u = cost_u(u[k]) + f_u * V_x[k+1]       # Q_u = l_u + f_u^T * V_x(k+1)
            Q_uu = cost_uu() + f_u**2 * V_xx[k+1]     # Q_uu = l_uu + f_u^T * V_xx(k+1) * f_u
            Q_x = cost_x(x[k]) + V_x[k+1]             # Q_x = l_x + f_x^T * V_x(k+1)
            Q_xx = cost_xx() + V_xx[k+1]              # Q_xx = l_xx + f_x^T * V_xx(k+1) * f_x

            # Update control input
            du[k] = -Q_u / Q_uu          # Control update
            V_x[k] = Q_x + Q_uu * du[k]  # Update value function gradient
            V_xx[k] = Q_xx               # Update value function Hessian (V_xx)

        # Forward pass: update trajectory using the new control inputs
        x_new = np.zeros(N+1)
        u_new = np.zeros(N)
        x_new[0] = x0
        for k in range(N):
            u_new[k] = u[k] + du[k]                           # Update control
            x_new[k+1] = system_dynamics(x_new[k], u_new[k])  # Update state

        # Compute the total cost for the current trajectory (stage costs k = 0..N-1)
        current_cost = np.sum([cost_function(x_new[k], u_new[k]) for k in range(N)])

        # 1. Stop based on control input change
        if np.max(np.abs(du)) < epsilon_u:
            print(f"Stopped due to control input convergence at iteration {i}")
            break

        # # 2. Stop based on cost function change
        # if np.abs(current_cost - prev_cost) < epsilon_J:
        #     print(f"Stopped due to cost function convergence at iteration {i}")
        #     break

        # # 3. Stop based on state trajectory change
        # if np.max(np.abs(x_new - x)) < epsilon_x:
        #     print(f"Stopped due to state trajectory convergence at iteration {i}")
        #     break

        # Update for next iteration
        x = x_new
        u = u_new
        prev_cost = current_cost

    return x, u, i


if __name__ == "__main__":
    # iLQR parameters
    N = 3              # Number of time steps
    x0 = 1             # Initial state
    iterations = 50    # Maximum number of iterations
    epsilon_u = 1e-3   # Tolerance for control input changes
    epsilon_J = 1e-4   # Tolerance for cost function change
    epsilon_x = 1e-4   # Tolerance for state trajectory change

    # Initialize control sequence and state trajectory
    u = np.zeros(N)      # Initial control sequence
    x = np.zeros(N+1)    # State trajectory
    x[0] = x0

    # Compute initial trajectory
    x_initial = compute_initial_trajectory(x0, u)

    # Run iLQR with stopping conditions
    x_final, u_final, num_iterations = ilqr_with_conditions(x_initial, u, iterations, epsilon_u, epsilon_J, epsilon_x)

    # Output the final results and number of iterations
    print(x_final, u_final, num_iterations)
Enabling each stopping condition in the code in turn gives the following results:
Control-change threshold:
Stopped due to control input convergence at iteration 11
[1. 0.44125242 0.18222331 0.0923361 ] [-0.59287488 -0.26201687 -0.0900087 ] 11
Cost-change threshold:
Stopped due to cost function convergence at iteration 7
[1. 0.44820357 0.18851501 0.08514765] [-0.58451676 -0.26269969 -0.10355232] 7
State-change threshold:
Stopped due to state trajectory convergence at iteration 16
[1. 0.44158355 0.18146959 0.09103245] [-0.59247567 -0.26314022 -0.09056088] 16