Properties and Applications of Adjoints

First, we need to clarify the definition of the adjoint operator $A^*$:

$$\int u^* A u \, dx = \int u \, A^* u^* \, dx$$

In general, with non-vanishing boundary conditions, and with the inner product denoted $\langle \cdot, \cdot \rangle$:

$$\langle u^*, Au \rangle = \langle u, A^* u^* \rangle + \text{B.T.},$$

where B.T. denotes the boundary terms.
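In the discrete setting, this identity reduces to the definition of the matrix transpose. A minimal numpy sketch, with an arbitrary matrix standing in for $A$ and the Euclidean inner product standing in for the integral:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))      # stands in for the operator A
u = rng.standard_normal(4)
u_star = rng.standard_normal(4)

# <u*, A u> == <u, A^T u*>: the discrete adjoint is the transpose.
lhs = u_star @ (A @ u)
rhs = u @ (A.T @ u_star)
assert np.isclose(lhs, rhs)
```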

Note:

  • $A^*$ is a linear operator.
  • It is not straightforward to define the adjoint of a nonlinear operator. It can, however, be introduced based on the theory of linear operators: one must first linearise the nonlinear operator and then define the adjoint of the linearisation.

Properties

The adjoint equation is counterintuitive: it runs backwards in time and reverses the propagation of information. The essence of both properties is the transposition of the underlying matrix.

Reverses the propagation of information

A simple advection example

Suppose we are solving a one-dimensional advection-type equation on a mesh with three nodes, at $x_0 = 0$, $x_1 = 0.5$, and $x_2 = 1$.

The velocity goes from left to right, and so we impose an inflow boundary condition at the left-most node $x_0$.

A simple sketch of the linear system that might describe this configuration could look as follows:

$$\left[ \begin{matrix} 1 & 0 & 0 \\ a & b & 0 \\ c & d & e \end{matrix} \right] \left[ \begin{matrix} u_0 \\ u_1 \\ u_2 \end{matrix} \right] = \left[ \begin{matrix} 1 \\ 0 \\ 0 \end{matrix} \right],$$

where $a$, $b$, $c$, $d$, and $e$ are some coefficients of the matrix arising from a discretisation of the equation.

The equation for $u_1$ does not depend on $u_2$, as information is flowing from left to right. The structure of the matrix dictates the propagation of information through the system: first $u_0$ is set to the boundary condition value, then $u_1$ may be computed, and then finally $u_2$.

The lower-triangular nature of the matrix reflects the rightward propagation of information.

Notice that $u_0$ is prescribed: that is, the value of $u_0$ does not depend on the values at any other nodes; all off-diagonal entries on the row for $u_0$ are zero. Notice further that the value $u_2$ is diagnostic: no other nodes depend on its value; all off-diagonal entries on its column are zero.

Now suppose that we take the adjoint of this system with respect to some functional $J(u)$. The operator is linear (no entry in the matrix depends on $u$), and so the adjoint of this system is just its transpose:

$$\left[ \begin{matrix} 1 & a & c \\ 0 & b & d \\ 0 & 0 & e \end{matrix} \right] \left[ \begin{matrix} \lambda_0 \\ \lambda_1 \\ \lambda_2 \end{matrix} \right] = \left[ \begin{matrix} \partial J / \partial u_0 \\ \partial J / \partial u_1 \\ \partial J / \partial u_2 \end{matrix} \right],$$

where $\lambda$ is the adjoint variable corresponding to $u$. Observe that transposing the forward system yields an upper-triangular adjoint system: the adjoint propagates information from right to left, in the opposite sense to the propagation of the forward system. To solve this system, one would first solve for $\lambda_2$, then compute $\lambda_1$, and finally $\lambda_0$.

Further notice that $\lambda_2$ is now prescribed: it can be computed directly from the data, with no dependencies on the values of other adjoint variables; all of the off-diagonal entries in its row are zero. $\lambda_0$ is now diagnostic: no other variables depend on its value; all off-diagonal entries in its column are zero.
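To make the ordering concrete, here is a minimal numpy sketch of the two solves; the values chosen for $a$ to $e$ are made up for illustration, not derived from any particular discretisation:

```python
import numpy as np

# Illustrative coefficients; any values give the same triangular structure.
a, b, c, d, e = -1.0, 1.0, 0.5, -1.5, 1.0

A = np.array([[1.0, 0.0, 0.0],
              [a,   b,   0.0],
              [c,   d,   e  ]])

# Forward system: lower triangular, so the solve proceeds left to right
# (u0 from the boundary condition, then u1, then u2).
u = np.linalg.solve(A, np.array([1.0, 0.0, 0.0]))

# Adjoint system: the transpose is upper triangular, so the solve proceeds
# right to left (lambda2, then lambda1, then lambda0). Here J(u) = u2.
dJdu = np.array([0.0, 0.0, 1.0])
lam = np.linalg.solve(A.T, dJdu)
```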

Note:

  If the forward equation contains the advection term $u \cdot \nabla T$, where $u$ is the advecting velocity and $T$ is the advected tracer, then its corresponding adjoint term is $-u \cdot \nabla \lambda$. The adjoint advection equation is itself an advection equation, with the reverse of the forward velocity.

  Variables that are prescribed in the forward model are diagnostic in the adjoint; variables that are diagnostic in the forward model are prescribed in the adjoint.

Linear

The operator of the tangent linear system is the linearisation of the operator about the solution $u$; the adjoint operator is its transpose, so the adjoint system is always linear in $\lambda$.

This has two major effects:

  • Beneficial effect: the computation time of the adjoint run.
  • Adverse effect: the storage requirements of the adjoint run.

Beneficial effect

the computation time of the adjoint run

The forward model may be nonlinear, but the adjoint is always linear, and so it can be much cheaper to solve than the forward model.

e.g.:

  If the forward model employs a Newton solver for the nonlinear problem that uses on average $5$ linear solves to converge to machine precision, then a rough estimate for the adjoint computation is that it will take $\frac{1}{5}$ the runtime of the forward model.
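A sketch of this cost asymmetry on a small algebraic system; the cubic nonlinearity and the functional are made up for illustration:

```python
import numpy as np

# Forward problem F(u) = A u + u**3 - f = 0, solved by Newton's method.
def F(A, u, f):
    return A @ u + u**3 - f

def jacobian(A, u):
    # dF/du = A + diag(3 u^2): the tangent linear operator about u.
    return A + np.diag(3.0 * u**2)

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
f = np.array([1.0, 2.0])

# Forward: several Newton iterations, each one a linear solve.
u = np.zeros(2)
for _ in range(20):
    du = np.linalg.solve(jacobian(A, u), -F(A, u, f))
    u += du
    if np.linalg.norm(du) < 1e-12:
        break

# Adjoint of J(u) = u0 + u1: a single linear solve with the transposed
# Jacobian, assembled about the converged forward solution u.
dJdu = np.ones(2)
lam = np.linalg.solve(jacobian(A, u).T, dJdu)
```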

Adverse effect

the storage requirements of the adjoint run

The adjoint operator is a linearisation of the nonlinear operator about the solution $u$; therefore:

  • if the forward model is nonlinear, the forward solution must be available to assemble the adjoint system;
  • if the forward model is steady, this is not a significant difficulty;
  • if the forward model is time-dependent, the entire solution trajectory through time must be available (see the sketch below).
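A minimal sketch of this storage pattern for a time-dependent scalar model; the update rule is made up for illustration:

```python
def step_forward(u, dt):
    # Hypothetical nonlinear update rule, purely for illustration.
    return u + dt * (1.0 - u**2)

dt, n_steps = 0.1, 10
u = 0.5
trajectory = [u]                 # every forward state must be stored
for _ in range(n_steps):
    u = step_forward(u, dt)
    trajectory.append(u)

# The adjoint run walks the stored trajectory backwards in time; each
# step is linearised about the corresponding forward state.
lam = 1.0                        # dJ/du at the final time, e.g. J = u(T)
for u_n in reversed(trajectory[:-1]):
    # d(step)/du = 1 - 2*dt*u_n is the tangent linear step; its
    # transpose (here, the same scalar) propagates lam backwards.
    lam = (1.0 - 2.0 * dt * u_n) * lam
# lam is now dJ/du(0): the sensitivity to the initial condition.
```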

Applications

  • PDE-constrained optimisation
  • Sensitivity analysis
  • Data assimilation
  • Inverse problems
  • Generalised stability theory
  • Error estimation

Here we only aim to get the basic idea of each application across.

PDE-constrained optimisation

Firstly, adjoints form the core technique for efficiently computing the gradient $\frac{dJ(u,m)}{dm}$ of a functional $J$ to be minimised. This is usually essential for solving such optimisation problems in practice:

Gradient-free optimisation algorithms typically take orders of magnitude more iterations to converge; since each iteration involves a PDE solve, minimising the number of iterations taken is crucial.
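A minimal sketch of the pattern, assuming a steady constraint $Au = f(m)$ and a quadratic misfit (all names and values here are illustrative): the gradient passed to the optimiser costs one extra linear solve per iteration.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[4.0, 1.0], [1.0, 3.0]])
u_target = np.array([1.0, 2.0])

def J_and_grad(m):
    # Forward solve, with a hypothetical source term f(m) = m.
    u = np.linalg.solve(A, m)
    J = 0.5 * np.sum((u - u_target)**2)
    # Adjoint solve: A^T lam = dJ/du; then dJ/dm = lam^T df/dm = lam.
    lam = np.linalg.solve(A.T, u - u_target)
    return J, lam

# jac=True tells scipy that J_and_grad returns (value, gradient).
res = minimize(J_and_grad, x0=np.zeros(2), jac=True, method="BFGS")
```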

Sensitivity analysis

Occasionally, the gradient of a functional $J$ with respect to some parameter $m$ is not merely required as an input to an optimisation algorithm, but rather is of scientific interest in its own right.

  • Adjoint computations can tease apart hidden influences and teleconnections;
  • Such computations can also inform scientists regarding which variables matter the least, which is often important for deriving approximate models;
  • Parameters with little impact on the simulation can be ignored.

This process is also often undertaken in advance of solving an optimisation problem:

By discarding parameters which do not significantly influence the functional, the dimension of the parameter space may be systematically reduced.
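A sketch of such a screening step, continuing the steady setting above with a vector of parameters $m$ (the Jacobian of the source term is made up, with one parameter deliberately irrelevant): one adjoint solve yields the entire gradient, and its small entries flag the parameters that can be discarded.

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 1.0])              # dJ/du for J(u) = u0 + u1
m = np.array([2.0, 0.3, 1.0])

def dfdm(m):
    # Hypothetical Jacobian of the source term f(m); the zero column
    # means the functional is entirely insensitive to m[2].
    return np.array([[1.0, 0.0,        0.0],
                     [0.0, 2.0 * m[1], 0.0]])

lam = np.linalg.solve(A.T, g)         # one adjoint solve, regardless of len(m)
grad = lam @ dfdm(m)                  # dJ/dm, one entry per parameter

# Keep only parameters with non-negligible influence on J.
keep = np.abs(grad) > 1e-8 * np.abs(grad).max()
```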

Data assimilation

A forward model requires data on which to operate.

We will illustrate this with a simple example: weather forecasting.

e.g.:

  To start a weather forecast, knowledge of the entire state of the atmosphere at some point in time is required as an initial condition from which to begin the simulation: start from the wrong initial condition, and you will get the wrong weather.

Analysis:

  The problem is that, in practice, the initial condition is unknown.

  Instead, observations of the state of the atmosphere are available, some available at the initial time, and some taken at later times. The goal of data assimilation is to systematically combine observations and computations, usually with the intention of acquiring the best possible estimate of the unknown initial condition, so that a forecast may be run.

There are two major approaches to data assimilation.

  1. Kalman filter algorithm
  2. Variational data assimilation

Kalman filter algorithm

The most popular approach to computing the amplitude of the update

In a sequential algorithm, the forward system is timestepped until an observation is available, at which point the model is instantaneously “updated” to incorporate the information contained in the observation. The model is then continued from this updated state until all of the observations are used.

Drawback:

  The observation only influences the state at times later than the observation time. In other words, its temporal influence only propagates forward, not backwards in time.
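A minimal one-dimensional sketch of the sequential update; the model coefficient, noise variances, and observations are all made up:

```python
# Scalar Kalman filter: timestep the model, and update the state
# whenever an observation is available.
a, q, r = 0.9, 0.01, 0.1           # model coefficient, model/obs noise variances
x, p = 0.0, 1.0                    # state estimate and its variance
observations = {3: 1.2, 7: 0.8}    # timestep -> observed value

for n in range(10):
    x, p = a * x, a * p * a + q    # forecast step
    if n in observations:
        k = p / (p + r)                      # Kalman gain
        x = x + k * (observations[n] - x)    # blend forecast and observation
        p = (1.0 - k) * p
    # The observation at step 3 cannot influence the estimates at
    # steps 0-2: information only propagates forward in time.
```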

Variational data assimilation

a special case of PDE-constrained optimisation

In this approach, a functional $J$ is chosen to represent the misfit between the observations and the computations, weighted by the uncertainties in each.

The initial condition is treated as a control parameter $m$, and chosen to minimise the misfit $J$.

The data assimilation community tends to place significant emphasis on modelling the uncertainties in both the computations and observations, as this is key to extracting the maximal amount of information out of both.
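A minimal sketch of the variational approach for a scalar model $x_{n+1} = a\,x_n$, treating the initial condition as the control (same made-up observations as above, and no uncertainty weighting, for brevity); the gradient of the misfit is computed by an adjoint sweep backwards through the stored trajectory:

```python
a, n_steps = 0.9, 10
observations = {3: 1.2, 7: 0.8}    # timestep -> observed value

def misfit_and_grad(x0):
    # Forward sweep: store the trajectory.
    xs = [x0]
    for _ in range(n_steps):
        xs.append(a * xs[-1])
    J = 0.5 * sum((xs[n] - y)**2 for n, y in observations.items())
    # Adjoint sweep: lam accumulates dJ/dx_n backwards in time.
    lam = (xs[n_steps] - observations[n_steps]) if n_steps in observations else 0.0
    for n in reversed(range(n_steps)):
        lam = a * lam
        if n in observations:
            lam += xs[n] - observations[n]
    return J, lam                  # lam is now dJ/dx0

# Recover the initial condition by gradient descent on the misfit.
x0 = 0.0
for _ in range(200):
    J, g = misfit_and_grad(x0)
    x0 -= 0.5 * g
```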

Inverse problems

Data assimilation can be seen as a particular kind of inverse problem, as it focuses on obtaining the best estimate for the system state at some point in the past.

In fact, the essence of inverse problems is that we gain information about unobservable system parameters from observable system outputs.

This again reflects the property of adjoint equations of reversing the propagation of information: the adjoint carries information from the observed outputs back to the unobserved parameters.

Generalised stability theory

When the linearised operator is nonnormal, the usual stability theory fails.

Note:

  usual stability theory: if the real component of every eigenvalue is negative, the state is stable, and the associated eigenmode will vanish in the limit as $t \to \infty$; while if any eigenvalue has a positive real part, the state is unstable, and the associated eigenmode will grow in amplitude.

Process:

  Instead of focusing on the eigenvalues of the operator linearised about some steady state, generalised stability theory analyses the generalised eigenvalues associated with the propagator of the system, which maps perturbations in initial conditions to perturbations in the final state. Essentially, the propagator is the inverse of the tangent linear operator. By examining these values, such an analysis can describe and predict the perturbations that will grow maximally over finite time windows. In order to compute these generalised eigenvalues of the system propagator, both the tangent linear and adjoint operators must be repeatedly solved.
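A sketch with an explicitly available propagator matrix $L$ (a made-up nonnormal example): power iteration on $L^T L$ finds the leading singular value of $L$, the maximal perturbation growth, and each sweep requires one application of the tangent linear propagator ($L$) and one of its adjoint ($L^T$).

```python
import numpy as np

# A made-up nonnormal propagator: its eigenvalues (0.9 and 0.5) lie inside
# the unit circle, yet perturbations can still grow over a finite window.
L = np.array([[0.9, 5.0],
              [0.0, 0.5]])

# Power iteration on L^T L: one tangent linear and one adjoint
# application per sweep.
v = np.array([1.0, 1.0])
for _ in range(100):
    w = L.T @ (L @ v)
    v = w / np.linalg.norm(w)

# Leading singular value: the maximal amplification over the window.
growth = np.linalg.norm(L @ v)     # > 1 despite the stable eigenvalues
```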

Error estimation

Goal-based error estimation, and the related computational technique of goal-based adaptivity.

The fundamental theorem of error estimation, due to Rannacher and co-workers, states that:

$$J(u) - J(u_h) = \frac{1}{2}\langle \lambda - \lambda_h, \rho_u \rangle + \frac{1}{2}\langle u - u_h, \rho_\lambda \rangle + R_h^{(3)},$$

where $u_h$ and $\lambda_h$ are the computed approximations to the forward and adjoint solutions, $\rho_u$ and $\rho_\lambda$ are the residuals of the forward and adjoint equations, and $R_h^{(3)}$ is a remainder term that is cubic in the errors.

Based on the examples provided by Rannacher and co-workers, we find that:

  • a computation that employs goal-based adaptivity is dramatically faster at computing the functional to within a certain tolerance than the corresponding computation on a fixed mesh or with heuristically-driven adaptivity;

  • it raises the possibility of reliable automated computation: not only can the discretisation of the differential equation be automated with the FEniCS system, it can be automated to reliably and efficiently compute desired quantities to within a specified accuracy.
