Convex Functions -- Part 1

xy_optics

Last modified: 2024-10-28 12:00:30

First published: 2024-10-28 11:54:06

Article link: https://blog.csdn.net/xy_optics/article/details/143292150

1. Definition of a convex function

Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be a function. If its domain $\text{dom} \ f$ is a convex set and, for all $x, y \in \text{dom} \ f$ and all $0 \leq \theta \leq 1$, the following inequality holds:
$f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)$
then $f$ is called a convex function.

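Geometrically, the inequality means that the line segment joining $(x, f(x))$ and $(y, f(y))$, called a chord, lies on or above the graph of $f$. The function $f$ is strictly convex if the inequality holds strictly whenever $x \neq y$ and $0 < \theta < 1$.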

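Concave functions: $f$ is concave if $- f$ is convex.
Affine functions: an affine function (a linear function plus a constant offset) is both convex and concave, because it satisfies the defining inequality with equality.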

Example 1: a simple convex function is the quadratic $f(x) = x^2$. Its domain is the set of real numbers $\mathbb{R}$, and for all $x, y \in \mathbb{R}$ and $0 \leq \theta \leq 1$ it satisfies the defining inequality:
$f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)$
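As a worked check of the definition for $f(x) = x^2$, subtracting the left-hand side of the inequality from the right-hand side gives
$\theta x^2 + (1 - \theta) y^2 - (\theta x + (1 - \theta) y)^2 = \theta (1 - \theta)(x - y)^2 \geq 0$
for $0 \leq \theta \leq 1$, so the inequality holds. The difference is strictly positive when $x \neq y$ and $0 < \theta < 1$, so $f(x) = x^2$ is in fact strictly convex.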

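The MATLAB code below plots the convex function $f(x) = x^2$ and illustrates its geometric meaning: the chord between two points on the curve lies above the graph.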

% MATLAB Code for Plotting a Convex Function

% Define the function f(x) = x^2
f = @(x) x.^2;

% Define a range of x values
x = linspace(-2, 2, 100);

% Compute f(x) values
y = f(x);

% Plot the function
figure;
plot(x, y, 'LineWidth', 2);
hold on;

% Choose two points on the curve
x1 = -1;
x2 = 1.5;
y1 = f(x1);
y2 = f(x2);

% Plot the points
plot([x1 x2], [y1 y2], 'ro', 'MarkerSize', 8, 'MarkerFaceColor', 'r');

% Plot the chord connecting the two points
theta = linspace(0, 1, 100);
x_chord = theta * x1 + (1 - theta) * x2;
y_chord = theta * y1 + (1 - theta) * y2;
plot(x_chord, y_chord, '--g', 'LineWidth', 2);

% Add labels and title
xlabel('x');
ylabel('f(x)');
title('Convex Function: f(x) = x^2');
legend('f(x)', 'Points on curve', 'Chord between points');
grid on;

hold off;

Example 2: the exponential function $f(x) = e^x$ is also convex.
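The defining inequality can also be spot-checked numerically for $f(x) = e^x$. The script below is a minimal sketch, not a proof: the sampling range $[-2, 2]$, the sample count `N`, the seed, and the tolerance `1e-12` are all illustrative assumptions, and passing the check only shows that no violation was found among the sampled points.

```matlab
% Spot-check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y) for f(x) = exp(x)
rng(0);                          % fixed seed so the run is repeatable (assumed value)
f = @(x) exp(x);

N = 1000;                        % number of random samples (assumed value)
x = 4*rand(1, N) - 2;            % random points in [-2, 2]
y = 4*rand(1, N) - 2;
theta = rand(1, N);              % random weights in (0, 1)

lhs = f(theta.*x + (1 - theta).*y);
rhs = theta.*f(x) + (1 - theta).*f(y);

% Count samples where the inequality fails beyond a small rounding tolerance
num_violations = sum(lhs > rhs + 1e-12);
fprintf('Violations of the inequality: %d out of %d samples\n', num_violations, N);
```

Replacing `f` with a non-convex function such as `@(x) sin(x)` should produce a nonzero violation count on the same interval.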


2. Properties of convex functions

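A function $f$ is convex if and only if its restriction to every line that intersects its domain is convex. Concretely, for any $x \in \text{dom} \ f$ and any direction $v \in \mathbb{R}^n$, define $g(t) = f(x + tv)$ on the set $\{t \in \mathbb{R} : x + tv \in \text{dom} \ f\}$. Then $f$ is convex if and only if $g$ is convex for every such $x$ and $v$.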

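This property is what makes line restrictions useful in practice: to check the convexity of a multivariate function, restrict it to an arbitrary line, which yields a function of one variable, and check that the restriction is convex.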

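Example: suppose we restrict a multivariate function $f(x)$ along a direction $v$ and define a new function $g(t)$, where $t$ is the parameter along $v$:
$g (t) = f (x + t v)$
Here $x \in \mathbb{R}^n$ is a base point, $v$ is a fixed direction vector, and $t$ is a scalar. The new function $g(t)$ is one-dimensional, so we only need to check whether the restricted function $g(t)$ is convex.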

Consider the two-dimensional function $f(x_1, x_2) = x_1^2 + x_2^2$, a typical convex function.


% MATLAB Code for Plotting a 3D Convex Function f(x1, x2) = x1^2 + x2^2

% Define the function f(x1, x2) = x1^2 + x2^2
f = @(x1, x2) x1.^2 + x2.^2;

% Create a grid of x1 and x2 values
[x1, x2] = meshgrid(linspace(-2, 2, 100), linspace(-2, 2, 100));

% Compute the function values for f(x1, x2)
z = f(x1, x2);

% Create a 3D plot of the surface
figure;
surf(x1, x2, z, 'EdgeColor', 'none');

% Add labels and title
xlabel('x1');
ylabel('x2');
zlabel('f(x1, x2)');
title('3D Plot of f(x1, x2) = x1^2 + x2^2');
grid on;

% Adjust view angle for better visualization
view(45, 30);

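We can choose the direction vector $v = [1; 1]$, which restricts $f$ to the 45-degree line through the base point. Other directions $v$ can be chosen as well, and the restriction is still convex.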


% MATLAB Code for Testing Convexity of a Function Along a Line

% Define a convex function f(x1, x2) = x1^2 + x2^2
f = @(x1, x2) x1.^2 + x2.^2;

% Define a direction vector v = [1, 1] (along the 45-degree line)
v = [1; 1];

% Define a range of t values (the parameter along the direction)
t = linspace(-2, 2, 100);

% Define an initial point x0 = [0, 0] for the line restriction
x0 = [0; 0];

% Compute the function values along the line x = x0 + t * v
g = @(t) f(x0(1) + t*v(1), x0(2) + t*v(2));

% Compute g(t) values
g_values = g(t);

% Plot the restricted function g(t)
figure;
plot(t, g_values, 'LineWidth', 2);
xlabel('t');
ylabel('g(t)');
title('Restricted Convex Function g(t) Along the Line');
grid on;

% Check convexity by plotting the chord between two points
t1 = -1;
t2 = 1.5;
g1 = g(t1);
g2 = g(t2);

% Plot the points
hold on;
plot([t1 t2], [g1 g2], 'ro', 'MarkerSize', 8, 'MarkerFaceColor', 'r');

% Plot the chord connecting the two points
theta = linspace(0, 1, 100);
t_chord = theta * t1 + (1 - theta) * t2;
g_chord = theta * g1 + (1 - theta) * g2;
plot(t_chord, g_chord, '--g', 'LineWidth', 2);

legend('g(t)', 'Points on curve', 'Chord between points');
hold off;
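To try other base points and directions, as suggested above, the same restriction can be checked in a loop. The sketch below is an illustration under assumptions: the number of lines, the seed, the grid on $t$, and the tolerance are arbitrary choices. It uses the fact that a convex function sampled on a uniform grid has nonnegative second differences, so a negative second difference would expose a non-convex restriction; passing the check is evidence, not a proof.

```matlab
% Check the restriction g(t) = f(x0 + t*v) along many random lines
rng(1);                                  % fixed seed (assumed value)
f = @(x1, x2) x1.^2 + x2.^2;

t = linspace(-2, 2, 201);                % uniform grid on the line parameter
num_lines = 50;                          % number of random lines (assumed value)
min_second_diff = Inf;

for k = 1:num_lines
    x0 = randn(2, 1);                    % random base point
    v  = randn(2, 1);                    % random direction
    g  = f(x0(1) + t*v(1), x0(2) + t*v(2));

    % Second differences g(i+2) - 2*g(i+1) + g(i); nonnegative for convex g
    d2 = diff(g, 2);
    min_second_diff = min(min_second_diff, min(d2));
end

fprintf('Smallest second difference over %d lines: %.3e\n', num_lines, min_second_diff);
```

For this particular $f$, $g(t) = \|x_0\|^2 + 2 t \, x_0^T v + t^2 \|v\|^2$, so every second difference equals $2 \|v\|^2 h^2$ with $h$ the grid spacing, which is why the printed value is positive.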

3. Extended-value extension of a convex function

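For convenience, a convex function $f$ is often extended to all of $\mathbb{R}^n$ by assigning it the value infinity outside its domain. This extended-value function $\hat{f}$ is defined as:
$\hat{f}(x) =\begin{cases} f(x), & \text{if} \ x \in \text{dom} \ f \\ \infty, & \text{if} \ x \notin \text{dom} \ f \end{cases}$
With this extension, the convex function $f$ is defined on all of $\mathbb{R}^n$, and we no longer need to state its domain every time we use it: whenever $x \notin \text{dom} \ f$, the function value is automatically infinity. This simplifies notation; for example, inequalities can be written over the whole space.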

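Consider the basic inequality that defines convexity:
$f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)$
It can be applied to the extended-value function for all $x, y \in \mathbb{R}^n$: if $x$ or $y$ lies outside the domain, the right-hand side is infinite (for $0 < \theta < 1$), so the inequality still holds.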
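The extended-value convention maps naturally onto MATLAB's `Inf` arithmetic. The sketch below uses $f(x) = -\log x$ with $\text{dom} \ f = \{x > 0\}$ as an assumed example (it is not taken from the text above), and builds $\hat{f}$ with a trick specific to this function: `max(x, 0)` sends every $x \leq 0$ to $0$, and `-log(0)` evaluates to `Inf`. The weights are drawn from the open interval $(0, 1)$ because `0*Inf` is `NaN` in floating-point arithmetic, whereas convex analysis treats $0 \cdot \infty$ as $0$.

```matlab
% Extended-value version of f(x) = -log(x), dom f = {x > 0} (assumed example)
rng(2);                                  % fixed seed (assumed value)
f_hat = @(x) -log(max(x, 0));            % -log(x) for x > 0, Inf for x <= 0

N = 1000;                                % number of samples (assumed value)
x = 4*rand(1, N) - 2;                    % points in [-2, 2], many outside dom f
y = 4*rand(1, N) - 2;
theta = rand(1, N);                      % weights in the open interval (0, 1)

lhs = f_hat(theta.*x + (1 - theta).*y);
rhs = theta.*f_hat(x) + (1 - theta).*f_hat(y);

% Inf <= Inf is true, so points outside the domain satisfy the inequality automatically
all_hold = all(lhs <= rhs + 1e-12);
fprintf('Inequality holds on all samples: %d\n', all_hold);
fprintf('Samples with x or y outside dom f: %d\n', sum(x <= 0 | y <= 0));
```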

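The indicator function $I_C(x)$ of a convex set $C$ is a special example of a convex function. It is defined as:
$I_C(x) = \begin{cases} 0, & \text{if} \ x \in C \\ \infty, & \text{if} \ x \notin C \end{cases}$
Here the set $C$ is the domain of the indicator function, that is, the set on which it is finite. The indicator function restricts the feasible region of an optimization problem. For example, to minimize a function $f$ over a set $C$, we can instead minimize $f + I_C$ over all of $\mathbb{R}^n$; the objective is finite only inside $C$.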

Example: suppose we have the convex function $f(x) = x^2$ and want to minimize it only over the interval $C = [0, 1]$. The indicator function $I_C(x)$ expresses this constraint.

We want to minimize the following objective:
$\min_{x} \ f(x) + I_C(x)$
where
$f(x) = x^2$
and the indicator function $I_C(x)$ is defined as:
$I_C(x) = \begin{cases} 0, & \text{if} \ x \in [0, 1] \\ \infty, & \text{if} \ x \notin [0, 1] \end{cases}$

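Inside the set $C$ (that is, for $x \in [0, 1]$), the objective equals $f(x) = x^2$, which is convex.
Outside the set $C$ (that is, for $x \notin [0, 1]$), the indicator function $I_C(x)$ is infinite, so these points are infeasible and never need to be considered as solutions.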
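The example above can be sketched numerically with a brute-force grid search. This is only an illustration: the grid and the way `I_C` is built (dividing by a 0/1 mask so that points outside $C$ become `Inf`) are assumptions made for the sketch, and a real solver would handle the constraint directly rather than by enumeration.

```matlab
% Minimize f(x) + I_C(x) for f(x) = x^2 and C = [0, 1] by grid search
f   = @(x) x.^2;
I_C = @(x) 1 ./ double(x >= 0 & x <= 1) - 1;   % 0 on C, Inf outside C

x   = (-200:200) / 100;          % grid on [-2, 2] that contains 0 and 1 exactly
obj = f(x) + I_C(x);             % extended-value objective

[min_val, idx] = min(obj);       % Inf entries can never be the minimum
x_star = x(idx);

fprintf('Minimizer on the grid: x = %.2f, objective = %.2f\n', x_star, min_val);
fprintf('Grid points excluded by the indicator: %d\n', sum(isinf(obj)));
```

Because the unconstrained minimizer $x = 0$ already lies in $C$, the constraint is inactive here; changing the mask to, say, `x >= 0.5 & x <= 1` moves the minimizer to the boundary of the new set.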

4. First-order condition for convexity

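Suppose $f$ is differentiable, i.e. its gradient $\nabla f$ exists at every point of its domain $\text{dom} \ f$. Then $f$ is convex if and only if $\text{dom} \ f$ is convex and, for all $x, y \in \text{dom} \ f$, the following inequality holds:
$f(y) \geq f(x) + \nabla f(x)^T (y - x)$
Geometrically, the inequality says that at any point $x$, the affine approximation built from $f(x)$ and the gradient $\nabla f(x)$ (the first-order Taylor approximation of $f$ at $x$) is a global underestimator of $f$.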


% MATLAB Code to Illustrate First-Order Condition of Convexity

% Define the convex function f(x) = x^2
f = @(x) x.^2;

% Define its derivative (gradient) df(x) = 2x
df = @(x) 2*x;

% Define a range of x values
x = linspace(-2, 2, 100);

% Compute f(x) values
y = f(x);

% Choose a point x0 for the tangent line
x0 = 1;
f_x0 = f(x0);
df_x0 = df(x0);

% Compute the tangent line at x0: f(x0) + df(x0)*(x - x0)
tangent_line = f_x0 + df_x0 * (x - x0);

% Plot the function
figure;
plot(x, y, 'b-', 'LineWidth', 2);
hold on;

% Plot the tangent line
plot(x, tangent_line, 'r--', 'LineWidth', 2);

% Plot the point (x0, f(x0))
plot(x0, f_x0, 'ro', 'MarkerSize', 8, 'MarkerFaceColor', 'r');

% Add labels and title
xlabel('x');
ylabel('f(x)');
title('First-Order Condition: Tangent Line as a Global Underestimator');
legend('f(x) = x^2', 'Tangent Line at x_0', 'Point (x_0, f(x_0))');
grid on;

hold off;

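The inequality shows that local information (the value and derivative of the function at one point) yields global information (a global lower bound). This is one of the most important properties of convex functions: the behavior of $f$ at a single point already constrains $f$ over its entire domain.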
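The first-order inequality can also be checked numerically in more than one dimension. The sketch below reuses $f(x_1, x_2) = x_1^2 + x_2^2$ from Section 2 with its gradient $\nabla f(x) = 2x$; the sample count, seed, and tolerance are illustrative assumptions. For this $f$ the gap has a closed form, $f(y) - f(x) - \nabla f(x)^T (y - x) = \|y - x\|^2$, which the script also compares against.

```matlab
% Check f(y) >= f(x) + grad_f(x)'*(y - x) for f(x) = x1^2 + x2^2 on random pairs
rng(3);                                  % fixed seed (assumed value)
f      = @(X) sum(X.^2, 1);              % each column of X is one point
grad_f = @(X) 2*X;                       % gradient, column by column

N = 1000;                                % number of random pairs (assumed value)
X = randn(2, N);
Y = randn(2, N);

% Gap between f(y) and the first-order approximation at x, evaluated at y
gap = f(Y) - (f(X) + sum(grad_f(X) .* (Y - X), 1));

% For this f the gap should equal ||y - x||^2 up to rounding
closed_form = sum((Y - X).^2, 1);

fprintf('Smallest gap: %.3e\n', min(gap));
fprintf('Largest deviation from ||y - x||^2: %.3e\n', max(abs(gap - closed_form)));
```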

For example, if $\nabla f(x) = 0$ for some $x \in \text{dom} \ f$, then $f(y) \geq f(x)$ holds for all $y \in \text{dom} \ f$, which shows that $x$ is a global minimizer of $f$.

If $f$ is strictly convex, the inequality holds strictly:
$f(y) > f(x) + \nabla f(x)^T (y - x)$
for all $x, y \in \text{dom} \ f$ with $x \neq y$.

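Through the first-order condition, local gradient information yields a global statement about $f$. This is one of the most useful properties of convex functions, especially in optimization: using only local information (the derivative and the function value), we can certify that a point with zero gradient is a global minimizer.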

5. Second-order condition for convexity

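Suppose $f$ is twice differentiable, i.e. its Hessian matrix (in one dimension, its second derivative) exists at every point of $\text{dom} \ f$. Then $f$ is convex if and only if $\text{dom} \ f$ is convex and its Hessian is positive semidefinite, that is, for all $x \in \text{dom} \ f$:
$\nabla^2 f(x) \succeq 0$
Here $\nabla^2 f(x)$ denotes the Hessian matrix of $f$, which collects its second-order derivative information.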

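For a function of one variable, $f: \mathbb{R} \to \mathbb{R}$, the condition reduces to $f''(x) \geq 0$: the second derivative is nonnegative, so geometrically the graph of the function curves upward.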
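For instance, the two one-dimensional examples from Section 1 pass this test directly:
$f(x) = x^2 \ \Rightarrow \ f''(x) = 2 > 0, \qquad f(x) = e^x \ \Rightarrow \ f''(x) = e^x > 0$
Both second derivatives are positive on all of $\mathbb{R}$, which is a convex domain, so both functions are convex.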

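If the Hessian is positive definite everywhere, i.e. $\nabla^2 f(x) \succ 0$ for all $x \in \text{dom} \ f$, then $f$ is strictly convex. The converse does not hold: $f(x) = x^4$ is strictly convex even though $f''(0) = 0$.
Similarly, $f$ is concave if and only if $\text{dom} \ f$ is convex and $\nabla^2 f(x) \preceq 0$ for all $x \in \text{dom} \ f$.
A negative definite Hessian, $\nabla^2 f(x) \prec 0$ for all $x \in \text{dom} \ f$, is sufficient for strict concavity.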

Example: a quadratic function
$f(x) = \frac{1}{2} x^T P x + q^T x + r$

where $P$ is a symmetric $n \times n$ matrix, $q \in \mathbb{R}^n$, and $r \in \mathbb{R}$.

The Hessian of this quadratic function is $\nabla^2 f(x) = P$. Therefore $f$ is convex when $P \succeq 0$ (that is, $P$ is positive semidefinite), strictly convex when $P \succ 0$ (that is, $P$ is positive definite), and concave when $P \preceq 0$.
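For a quadratic, the classification above reduces to the signs of the eigenvalues of $P$. The sketch below shows one way to test this numerically; the matrix `P` and the tolerance `tol` are made-up values for illustration, and the tolerance is needed because computed eigenvalues of a semidefinite matrix can come out as tiny negative numbers.

```matlab
% Classify f(x) = 0.5*x'*P*x + q'*x + r by the eigenvalues of P
P = [2.0 0.5;
     0.5 1.0];                   % example symmetric matrix (assumed values)
P = (P + P') / 2;                % symmetrize to guard against rounding asymmetry

lambda = eig(P);                 % eigenvalues of a symmetric matrix are real
tol = 1e-10;                     % numerical tolerance (assumed value)

is_convex          = all(lambda >= -tol);   % P positive semidefinite
is_strictly_convex = all(lambda >  tol);    % P positive definite
is_concave         = all(lambda <=  tol);   % P negative semidefinite

fprintf('Eigenvalues of P: %s\n', mat2str(lambda', 4));
fprintf('convex: %d, strictly convex: %d, concave: %d\n', ...
        is_convex, is_strictly_convex, is_concave);
```

The vector $q$ and the scalar $r$ do not appear in the test because they do not affect the Hessian.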

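The requirement that $\text{dom} \ f$ be convex cannot be dropped from the second-order condition. For example, the function $f(x) = 1/x^2$ has domain $\mathbb{R} \setminus \{0\}$, and its second derivative satisfies $f''(x) > 0$ everywhere on that domain, yet the function is not convex because its domain is not a convex set.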
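A concrete pair of points makes the failure visible. In the sketch below the points $x = -1$, $y = 2$ and the weight $\theta = 0.5$ are chosen for illustration: they lie on opposite sides of the hole at $0$, the second derivative $f''(x) = 6/x^4$ is positive at all three points involved, and the defining inequality is still violated.

```matlab
% f(x) = 1/x^2 has f''(x) > 0 on its domain but violates the convexity inequality
f   = @(x) 1 ./ x.^2;
d2f = @(x) 6 ./ x.^4;            % second derivative of 1/x^2

x = -1;  y = 2;  theta = 0.5;    % points on opposite sides of 0 (assumed values)
z = theta*x + (1 - theta)*y;     % convex combination, here z = 0.5

lhs = f(z);                              % f at the combination
rhs = theta*f(x) + (1 - theta)*f(y);     % value of the chord at the same weight

fprintf('f''''(x), f''''(y), f''''(z) = %g, %g, %g (all positive)\n', d2f(x), d2f(y), d2f(z));
fprintf('f(z) = %g, chord value = %g, inequality holds: %d\n', lhs, rhs, lhs <= rhs);
```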

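The second-order condition gives a way to determine convexity from the second derivative (or Hessian matrix) of a function. A twice-differentiable function on a convex domain is convex exactly when its Hessian is positive semidefinite everywhere, and a positive definite Hessian guarantees strict convexity. The condition is especially convenient for quadratic functions, where the definiteness of the constant Hessian $P$ directly determines whether the function is convex or concave.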

6. How to determine convexity from the Hessian matrix?

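Suppose $f: \mathbb{R}^n \to \mathbb{R}$ is twice differentiable. Its Hessian matrix $\nabla^2 f(x)$ is then an $n \times n$ symmetric matrix (symmetry holds when the second partial derivatives are continuous) whose entries are the second-order partial derivatives of $f$: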
$\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$
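When the second partial derivatives are tedious to derive by hand, the matrix above can be approximated numerically and its eigenvalues inspected. The sketch below is one possible approach under stated assumptions, not a definitive procedure: it uses central finite differences with an assumed step `h`, evaluates the Hessian only at a single assumed point `x0`, and reuses $f(x_1, x_2) = x_1^2 + x_2^2$ from Section 2. A positive semidefinite Hessian at one point does not establish convexity; the condition must hold at every point of the domain.

```matlab
% Approximate the Hessian of f at x0 by central differences and check its eigenvalues
f  = @(x) x(1)^2 + x(2)^2;       % example function; x is a column vector
x0 = [0.3; -0.7];                % evaluation point (assumed value)
n  = numel(x0);
h  = 1e-4;                       % finite-difference step (assumed value)
E  = eye(n);                     % columns are the coordinate directions

H = zeros(n);
for i = 1:n
    for j = 1:n
        ei = h * E(:, i);
        ej = h * E(:, j);
        % Central-difference estimate of the (i, j) second partial derivative
        H(i, j) = (f(x0 + ei + ej) - f(x0 + ei - ej) ...
                 - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * h^2);
    end
end
H = (H + H') / 2;                % enforce symmetry before calling eig

lambda = eig(H);
disp(H);
fprintf('Smallest eigenvalue at x0: %.6f\n', min(lambda));
```

For this function the exact Hessian is $2I$ at every point, so the printed matrix should be close to `[2 0; 0 2]`, with small deviations caused by rounding in the finite differences.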