Matrix calculus

Latest recommended article published on 2019-07-15 17:43:44

narutojxl

Reposted from: https://en.wikipedia.org/wiki/Matrix_calculus

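Premise: if $\mathbf{x}$ is a vector, it is taken to be a column vector by default, and $\mathbf{x}^T$ is a row vector. Both layouts discussed below follow this convention.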

Types of Matrix Derivatives
Types	Scalar	Vector	Matrix
Scalar	$\frac{\partial y}{\partial x}$	$\frac{\partial \mathbf{y}}{\partial x}$	$\frac{\partial \mathbf{Y}}{\partial x}$
Vector	$\frac{\partial y}{\partial \mathbf{x}}$	$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$
Matrix	$\frac{\partial y}{\partial \mathbf{X}}$

Notation:

Let M(n,m) denote the space of real n×m matrices with n rows and m columns. Such matrices will be denoted using bold capital letters: $\mathbf{A}$, $\mathbf{X}$, $\mathbf{Y}$, etc. An element of M(n,1), that is, a column vector, is denoted with a boldface lowercase letter: $\mathbf{a}$, $\mathbf{x}$, $\mathbf{y}$, etc. An element of M(1,1) is a scalar, denoted with lowercase italic typeface: a, t, x, etc. $\mathbf{X}^T$ denotes matrix transpose, tr($\mathbf{X}$) is the trace, and det($\mathbf{X}$) is the determinant.

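The discussion below of the six derivative forms in the table above follows the numerator layout convention. Some papers and books derive their formulas in denominator layout, and some, for convenience, even mix both layouts within a single paper. The numerator-layout result and the denominator-layout result are transposes of each other.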
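As a small worked illustration of the transpose relationship (the function $y = \mathbf{a}^T\mathbf{x}$ and the constant vector $\mathbf{a}$ are chosen here for illustration and do not come from the source), the partial derivatives are the same in both layouts; only their arrangement differs:

$y = \mathbf{a}^T\mathbf{x} = \sum_{i=1}^{n} a_i x_i, \qquad \frac{\partial y}{\partial x_i} = a_i$

$\text{numerator layout: } \frac{\partial y}{\partial \mathbf{x}} = \left[a_1 \ \ a_2 \ \ \cdots \ \ a_n\right] = \mathbf{a}^T, \qquad \text{denominator layout: } \frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix}a_1 \\a_2 \\\vdots \\a_n \\\end{bmatrix} = \mathbf{a}$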

Vector-by-scalar

The derivative of a vector $\mathbf{y} =\begin{bmatrix}y_1 \\y_2 \\\vdots \\y_m \\\end{bmatrix}$ , by a scalar x is written (in numerator layout notation) as

$\frac{\partial \mathbf{y}}{\partial x} =\begin{bmatrix}\frac{\partial y_1}{\partial x}\\\frac{\partial y_2}{\partial x}\\\vdots\\\frac{\partial y_m}{\partial x}\\\end{bmatrix}.$

In vector calculus the derivative of a vector $\mathbf{y}$ with respect to a scalar x is known as the tangent vector of the vector $\mathbf{y}$, $\frac{\partial \mathbf{y}}{\partial x}$. Notice here that $\mathbf{y}: \mathbb{R}^1 \rightarrow \mathbb{R}^m$.

Example: a simple instance is the velocity vector in Euclidean space, which is the tangent vector of the position vector (considered as a function of time). Likewise, the acceleration is the tangent vector of the velocity.
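The velocity example can be checked numerically. The sketch below is only an illustration under stated assumptions: the helix `position`, the helper `vector_by_scalar`, and the step size `h` are hypothetical names and choices, and the derivative is approximated by central differences rather than computed symbolically.

```python
import math

def position(t):
    """Position on a helix, r(t) = (cos t, sin t, t) (hypothetical example)."""
    return [math.cos(t), math.sin(t), t]

def vector_by_scalar(y, x, h=1e-6):
    """Central-difference estimate of dy/dx for a vector-valued y(x).

    The returned list has m entries and is read as an m x 1 column,
    which is the numerator-layout shape.
    """
    y_plus, y_minus = y(x + h), y(x - h)
    return [(p - m) / (2 * h) for p, m in zip(y_plus, y_minus)]

t0 = 0.5
velocity = vector_by_scalar(position, t0)      # tangent vector of position
exact = [-math.sin(t0), math.cos(t0), 1.0]     # analytic derivative of the helix
```

Applying `vector_by_scalar` to the velocity function in turn would give an estimate of the acceleration.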

Scalar-by-vector

The derivative of a scalar y by a vector $\mathbf{x} =\begin{bmatrix}x_1 \\x_2 \\\vdots \\x_n \\\end{bmatrix}$ , is written (in numerator layout notation) as

$\frac{\partial y}{\partial \mathbf{x}} =\left[\frac{\partial y}{\partial x_1} \ \ \frac{\partial y}{\partial x_2} \ \ \cdots \ \ \frac{\partial y}{\partial x_n}\right].$

In vector calculus, the gradient of a scalar field y in the space $\mathbb{R}^n$ (whose independent coordinates are the components of $\mathbf{x}$) is the transpose of the derivative of a scalar by a vector. In physics, the electric field is the negative vector gradient of the electric potential.

The directional derivative of a scalar function f(x) of the space vector x in the direction of the unit vector u is defined using the gradient as follows.

$\nabla_{\mathbf{u}}{f}(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{u}$ (the dot product, i.e. inner product, of two vectors)

Using the notation just defined for the derivative of a scalar with respect to a vector we can re-write the directional derivative as $\nabla_\mathbf{u} f = \frac{\partial f}{\partial \mathbf{x}}\mathbf{u}.$ This type of notation will be nice when proving product rules and chain rules that come out looking similar to what we are familiar with for the scalar derivative.
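The row-times-column form of the directional derivative can be sketched as follows. This is a minimal illustration, not a reference implementation: the scalar field `f`, the point `x0`, the direction `u`, and the helper `scalar_by_vector` are all assumptions introduced here, and derivatives are estimated by central differences.

```python
import math

def f(x):
    """Scalar field f(x) = x1^2 + 3*x2 (hypothetical example)."""
    return x[0] ** 2 + 3.0 * x[1]

def scalar_by_vector(f, x, h=1e-6):
    """Central-difference estimate of df/dx as a 1 x n row (numerator layout)."""
    row = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        row.append((f(xp) - f(xm)) / (2 * h))
    return row

x0 = [1.0, 2.0]
u = [1 / math.sqrt(2), 1 / math.sqrt(2)]      # unit vector

row = scalar_by_vector(f, x0)                 # approximately [2, 3]
directional = sum(r * ui for r, ui in zip(row, u))   # (1 x n) times (n x 1)

# Cross-check: differentiate f along the line x0 + s*u at s = 0.
h = 1e-6
x_fwd = [a + h * b for a, b in zip(x0, u)]
x_bwd = [a - h * b for a, b in zip(x0, u)]
along_line = (f(x_fwd) - f(x_bwd)) / (2 * h)
```

Because the derivative is stored as a row, the product with the column `u` needs no explicit transpose, which is the convenience the numerator layout offers here.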

Vector-by-vector

Each of the previous two cases can be considered as an application of the derivative of a vector with respect to a vector, using a vector of size one appropriately. Similarly we will find that the derivatives involving matrices will reduce to derivatives involving vectors in a corresponding way.

The derivative of a vector function (a vector whose components are functions) $\mathbf{y} =\begin{bmatrix}y_1 \\y_2 \\\vdots \\y_m \\\end{bmatrix}$ , with respect to an input vector, $\mathbf{x} =\begin{bmatrix}x_1 \\x_2 \\\vdots \\x_n \\\end{bmatrix}$ , is written (in numerator layout notation) as

$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =\begin{bmatrix}\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\\vdots & \vdots & \ddots & \vdots\\\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}\\\end{bmatrix}.$

In vector calculus, the derivative of a vector function y with respect to a vector x whose components represent a space is known as the pushforward (or differential), or the Jacobian matrix.
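To make the m×n shape of the numerator-layout Jacobian concrete, here is a small numerical sketch. The map `y` from $\mathbb{R}^2$ to $\mathbb{R}^3$ and the helper `jacobian` are hypothetical, chosen only for illustration, and the entries are central-difference estimates.

```python
import math

def y(x):
    """Map from R^2 to R^3 (hypothetical example)."""
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2, math.sin(x1)]

def jacobian(y, x, h=1e-6):
    """Central-difference Jacobian in numerator layout: J[i][j] = dy_i/dx_j."""
    m, n = len(y(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        yp, ym = y(xp), y(xm)
        for i in range(m):
            J[i][j] = (yp[i] - ym[i]) / (2 * h)
    return J

x0 = [1.0, 2.0]
J = jacobian(y, x0)   # 3 rows (components of y), 2 columns (components of x)

# Analytic Jacobian [[x2, x1], [1, 2*x2], [cos(x1), 0]] evaluated at x0.
exact = [[2.0, 1.0], [1.0, 4.0], [math.cos(1.0), 0.0]]
```

Transposing `J` would give the n×m denominator-layout result.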

Derivatives with matrices

Matrix-by-scalar

The derivative of a matrix function Y by a scalar x is known as the tangent matrix and is given (in numerator layout notation) by

$\frac{\partial \mathbf{Y}}{\partial x} =\begin{bmatrix}\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x}\\\frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x}\\\vdots & \vdots & \ddots & \vdots\\\frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x}\\\end{bmatrix}.$
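As a short worked example (the 2×2 rotation matrix is chosen here for illustration and is not part of the source), differentiating each entry with respect to the scalar x keeps the shape of $\mathbf{Y}$:

$\mathbf{Y}(x) =\begin{bmatrix}\cos x & -\sin x\\\sin x & \cos x\\\end{bmatrix}, \qquad \frac{\partial \mathbf{Y}}{\partial x} =\begin{bmatrix}-\sin x & -\cos x\\\cos x & -\sin x\\\end{bmatrix}.$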

Scalar-by-matrix

The derivative of a scalar function y of a p×q matrix X of independent variables, with respect to the matrix X, is given (in numerator layout notation) by

$\frac{\partial y}{\partial \mathbf{X}} =\begin{bmatrix}\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}}\\\frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}}\\\vdots & \vdots & \ddots & \vdots\\\frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}}\\\end{bmatrix}.$

Important examples of scalar functions of matrices include the trace of a matrix and the determinant.

In analogy with vector calculus, this derivative is often written as follows.

$\nabla_\mathbf{X} y(\mathbf{X}) = \frac{\partial y(\mathbf{X})}{\partial \mathbf{X}}$
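The trace mentioned above gives a convenient check of the q×p shape in numerator layout. In the sketch below, the function $y = \mathrm{tr}(\mathbf{A}\mathbf{X})$, the matrices `A` and `X`, and the helper names are assumptions made for illustration; the derivative is estimated by central differences and, for this particular function, comes out equal to `A` itself.

```python
def trace_of_product(A, X):
    """tr(A X) for A (q x p) and X (p x q), both given as nested lists."""
    q, p = len(A), len(X)
    return sum(A[i][k] * X[k][i] for i in range(q) for k in range(p))

def scalar_by_matrix(f, X, h=1e-6):
    """Central-difference dy/dX in numerator layout.

    X is p x q; the result is q x p, with entry [j][i] = dy/dx_ij.
    """
    p, q = len(X), len(X[0])
    D = [[0.0] * p for _ in range(q)]
    for i in range(p):
        for j in range(q):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            D[j][i] = (f(Xp) - f(Xm)) / (2 * h)
    return D

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # q x p = 2 x 3
X = [[0.5, -1.0], [2.0, 0.0], [1.5, 3.0]]     # p x q = 3 x 2

D = scalar_by_matrix(lambda M: trace_of_product(A, M), X)   # q x p, close to A
```

Storing the estimate at `D[j][i]` rather than `D[i][j]` is what makes this numerator layout; swapping the indices would produce the p×q denominator-layout result, which for this function is the transpose of `A`.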

Summary table

Result of differentiating various kinds of aggregates with other kinds of aggregates
	Scalar y		Vector y (size m)		Matrix Y (size m×n)
	Notation	Type	Notation	Type	Notation	Type
Scalar x	$\frac{\partial y}{\partial x}$	scalar	$\frac{\partial \mathbf{y}}{\partial x}$	(numerator layout) size-m column vector; (denominator layout) size-m row vector	$\frac{\partial \mathbf{Y}}{\partial x}$	(numerator layout) m×n matrix
Vector x (size n)	$\frac{\partial y}{\partial \mathbf{x}}$	(numerator layout) size-n row vector; (denominator layout) size-n column vector	$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$	(numerator layout) m×n matrix; (denominator layout) n×m matrix	$\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}$
Matrix X (size p×q)	$\frac{\partial y}{\partial \mathbf{X}}$	(numerator layout) q×p matrix; (denominator layout) p×q matrix	$\frac{\partial \mathbf{y}}{\partial \mathbf{X}}$		$\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$