FOURIER NEURAL OPERATOR FOR PARAMETRIC PARTIAL DIFFERENTIAL EQUATIONS
Peng Jian
1.Abstract
Neural operators learn mappings between function spaces. This paper formulates a new neural operator by parameterizing the integral kernel directly in Fourier space (frequency space):
$f:\mathbb{R}\to\mathbb{R},\quad \mathcal{F}:f\mapsto g,\quad g:k\ (\text{frequency})\to\mathbb{C}$
2.Introduction
Two mainstream neural-network approaches for PDEs: finite-dimensional operators and Neural-FEM.
The integral operator is restricted to a convolution, and instantiated through a linear transformation in the Fourier domain
The proposed framework can approximate complex operators arising in PDEs that are highly non-linear, with high-frequency modes and slow energy decay.
Advantages:
It learns an entire family of PDEs, in contrast to classical methods, which solve one instance of the equation.
It is three orders of magnitude faster than traditional PDE solvers.
It achieves superior accuracy compared to previous learning-based solvers at a fixed resolution.
3.Learning Operators
3.1 Notation
Bounded domain: $D\subset\mathbb{R}^d$
Separable Banach spaces of functions: $\mathcal{A}=\mathcal{A}(D;\mathbb{R}^{d_a})$ and $\mathcal{U}=\mathcal{U}(D;\mathbb{R}^{d_u})$
Observed function pairs: $\{a_j,u_j\}_{j=1}^N$, where $a_j\sim\mu$ is an i.i.d. sequence
learns a mapping between two infinite dimensional spaces from a finite collection of observed input-output pairs
Non-linear map: $G^\dagger:\mathcal{A}\to\mathcal{U}$, with $u_j=G^\dagger(a_j)$
We aim to build an approximation of $G^\dagger$ by constructing a parametric map:
$G:\mathcal{A}\times\Theta\to\mathcal{U}$, or equivalently, $G_\theta:\mathcal{A}\to\mathcal{U}$, $\theta\in\Theta$
Cost function: $C:\mathcal{U}\times\mathcal{U}\to\mathbb{R}$
We seek a minimizer of the problem: $\min_{\theta\in\Theta}\mathbb{E}_{a\sim\mu}\left[C(G(a,\theta),G^\dagger(a))\right]$
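As a minimal illustration of this empirical-risk setup, the sketch below fits a one-parameter family $G_\theta(a)=\theta\,a$ to a made-up ground-truth operator (pointwise doubling); the operator, the family, and all values are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth operator G† (here: pointwise doubling) and
# i.i.d. input functions a_j sampled on a fixed grid.
def G_dagger(a):
    return 2.0 * a

n, N = 64, 100                           # grid points, number of samples
A = rng.standard_normal((N, n))          # a_j restricted to the grid
U = np.array([G_dagger(a) for a in A])   # u_j = G†(a_j)

# Parametric family G_θ(a) = θ·a with squared-L2 cost C(u, u') = ||u − u'||².
# The empirical risk (1/N) Σ_j ||θ a_j − u_j||² is minimized in closed form:
theta = np.sum(A * U) / np.sum(A * A)
print(theta)  # recovers θ = 2 exactly for this linear family
```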
3.2 Discretization
Since our data $a_j, u_j$ are, in general, functions, to work with them numerically we assume access only to point-wise evaluations.
Let $D_j=\{x_1,\ldots,x_n\}\subset D$ be an $n$-point discretization of the domain $D$, so that
$a_j|_{D_j}\in\mathbb{R}^{n\times d_a},\quad u_j|_{D_j}\in\mathbb{R}^{n\times d_v}$
To be discretization-invariant, the neural operator must produce an answer $u(x)$ for any $x\in D$, potentially $x\notin D_j$.
4.Neural operator
The essence of the neural operator: an iterative architecture
$v_0\mapsto v_1\mapsto\ldots\mapsto v_T$
where $v_j$ for $j=0,1,\ldots,T-1$ is a sequence of functions, each taking values in $\mathbb{R}^{d_v}$.
Higher-dimensional representation of $a(x)$: $v_0(x)=P(a(x))$, where $P$ is a fully connected neural network.
Output: $u(x)=Q(v_T(x))$, where $Q:\mathbb{R}^{d_v}\to\mathbb{R}^{d_u}$.
The power of the neural operator lies in combining a linear, global integral operator (via the Fourier transform) with a non-linear, local activation function, analogous to how standard neural networks approximate highly non-linear functions by composing linear maps with non-linear activations.
Iterative update $v_t\mapsto v_{t+1}$:
$v_{t+1}(x)=\sigma\!\left(Wv_t(x)+\int_D\kappa\big(x,y,a(x),a(y);\phi\big)v_t(y)\,\mathrm{d}y\right)$
This update rule corresponds to the architecture diagram in the paper.
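One layer of this update can be sketched in NumPy by replacing the integral with a Riemann sum on a uniform grid; the Gaussian kernel $\kappa$, the matrix $W$, and $\sigma=\tanh$ below are illustrative placeholders, not the paper's trained components:

```python
import numpy as np

# One layer v_{t+1}(x) = σ(W v_t(x) + ∫_D κ(x, y) v_t(y) dy) on D = [0, 1],
# with the integral discretized as a Riemann sum over n grid points.
n, d_v = 32, 3
x = np.linspace(0.0, 1.0, n, endpoint=False)
dy = 1.0 / n

W = 0.1 * np.eye(d_v)                       # local linear transform (placeholder)
def kappa(xi, yj):                          # stand-in kernel: scalar Gaussian times identity
    return np.exp(-(xi - yj) ** 2) * np.eye(d_v)

sigma = np.tanh                             # pointwise nonlinearity
v_t = np.sin(2 * np.pi * x)[:, None] * np.ones(d_v)

# (K v_t)(x_i) ≈ Σ_j κ(x_i, y_j) v_t(y_j) Δy
Kv = np.stack([sum(kappa(xi, yj) @ v_t[j] * dy for j, yj in enumerate(x))
               for xi in x])
v_next = sigma(v_t @ W.T + Kv)
print(v_next.shape)  # (32, 3)
```

This direct quadrature costs $O(n^2)$ kernel evaluations per layer, which is exactly what the Fourier parameterization in the next section avoids.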
5.Fourier Neural operator
Further improvement: replace the kernel integral operator in the neural operator with a convolution operator (via the Fourier transform); correspondingly, the domain changes (the kernel integral operator acts in the variable $x$, while the convolution operator acts in the frequency variable $k$).
Fourier Transform:
$(\mathcal{F}f)_j(k)=\int_D f_j(x)\,e^{-2i\pi\langle x,k\rangle}\,\mathrm{d}x$
inverse:
$(\mathcal{F}^{-1}f)_j(x)=\int_D f_j(k)\,e^{2i\pi\langle x,k\rangle}\,\mathrm{d}k$
The improvement: let the kernel in the integral operator satisfy $\kappa_\phi(x,y,a(x),a(y))=\kappa_\phi(x-y)$
Then $\left(\mathcal{K}_\phi v_t\right)(x)=(\kappa_\phi*v_t)(x)$
By the convolution theorem: $\mathcal{F}(\kappa_\phi*v_t)=\mathcal{F}(\kappa_\phi)\cdot\mathcal{F}(v_t)$
Hence: $\kappa_\phi*v_t=\mathcal{F}^{-1}\mathcal{F}(\kappa_\phi*v_t)=\mathcal{F}^{-1}\big(\mathcal{F}(\kappa_\phi)\cdot\mathcal{F}(v_t)\big)$
Finally: $\left(\mathcal{K}(a;\phi)v_t\right)(x)=\mathcal{F}^{-1}\left(\mathcal{F}(\kappa_\phi)\cdot\mathcal{F}(v_t)\right)(x),\quad\forall x\in D$
We therefore directly parameterize $\kappa_\phi$ in Fourier space and let $R_\phi=\mathcal{F}(\kappa_\phi)$
Thus: $v_{t+1}(x)=\sigma\big(Wv_t(x)+\mathcal{F}^{-1}\left(R_\phi\cdot(\mathcal{F}v_t)\right)(x)\big)$
The convolution operator's argument is $k$ (the frequency mode),
so $(\mathcal{F}v_t)(k)\in\mathbb{C}^{d_v}$ and $R_\phi(k)\in\mathbb{C}^{d_v\times d_v}$ (cf. the Abstract)
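The convolution-theorem step above can be checked numerically for the discrete (circular) case, where $\mathcal{F}$ becomes the DFT; the kernel and signal below are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
kappa = rng.standard_normal(n)   # translation-invariant kernel κ(x − y), sampled
v = rng.standard_normal(n)       # current function v_t on the same periodic grid

# Direct circular convolution: (κ * v)(x_i) = Σ_j κ((i − j) mod n) v(y_j)
direct = np.array([sum(kappa[(i - j) % n] * v[j] for j in range(n))
                   for i in range(n)])

# Via the convolution theorem: κ * v = F⁻¹(F(κ) · F(v))
spectral = np.fft.ifft(np.fft.fft(kappa) * np.fft.fft(v)).real

print(np.allclose(direct, spectral))  # True
```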
As assumed above, $\kappa$ is periodic, so it admits a Fourier series expansion, and we may work with the discrete modes $k\in\mathbb{Z}^d$.
Truncating the Fourier series:
We pick a finite-dimensional parameterization by truncating the Fourier series at a maximal number of modes $k_{\max}=|Z_{k_{\max}}|$, where $Z_{k_{\max}}\doteq\{k\in\mathbb{Z}^d:|k_j|\leq k_{\max,j},\ \text{for }j=1,\ldots,d\}$
Since $R_\phi(k)\in\mathbb{C}^{d_v\times d_v}$ at each retained mode, $R_\phi(Z_{k_{\max}})\in\mathbb{C}^{k_{\max}\times d_v\times d_v}$; we therefore parameterize $R_\phi$ directly as a complex-valued $(k_{\max}\times d_v\times d_v)$-tensor, and the symbol $\phi$ can be dropped.
Summary: periodic function → Fourier expansion → discrete frequency modes for the frequency components → fix a maximal number of modes → truncate the Fourier series → an approximation of the periodic function → hence $R_\phi$ → a complex-valued tensor (the parameterization).
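To make the mode count concrete (the cutoff 12 and width $d_v=32$ below are illustrative choices, not the paper's reported hyperparameters): each $k_j$ ranges over $\{-k_{\max,j},\ldots,k_{\max,j}\}$, so

```python
# |Z_{k_max}| for per-dimension cutoffs k_max,j: each dimension
# contributes 2·k_max,j + 1 admissible integer frequencies.
def num_modes(k_max_per_dim):
    total = 1
    for k in k_max_per_dim:
        total *= 2 * k + 1
    return total

print(num_modes([12, 12]))              # 625 retained modes for d = 2, k_max,j = 12
d_v = 32
print(num_modes([12, 12]) * d_v * d_v)  # entries in R: 625 · 32 · 32 = 640000
```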
Conjugate symmetry: since the kernel $\kappa$ is real-valued, its Fourier transform $R_\phi$ is conjugate-symmetric; the coefficient at a negative frequency mode equals the complex conjugate of the coefficient at the corresponding positive mode. In practice this symmetry reduces the computation: we only compute the Fourier coefficients at the non-negative modes and recover the negative-mode coefficients by conjugation.
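This symmetry is what real-input FFT routines exploit; a quick check with NumPy (any real signal works, the random one here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(16)          # a real-valued signal, like κ or v_t

full = np.fft.fft(v)                 # all 16 complex coefficients
half = np.fft.rfft(v)                # only the 9 non-negative-frequency ones

# Conjugate symmetry of the real DFT: coefficient at −k is the conjugate
# of the one at +k, so the negative-frequency half is redundant.
assert np.allclose(full[1:8].conj(), full[-1:-8:-1])
assert np.allclose(full[:9], half)
print(half.shape)  # (9,) instead of (16,)
```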
5.1 The discrete case and the FFT
Assume the domain $D$ is discretized with $n$ points. Then $v_t\in\mathbb{R}^{n\times d_v}$ and $\mathcal{F}(v_t)\in\mathbb{C}^{n\times d_v}$. Since we convolve $v_t$ with a function that has only $k_{\max}$ Fourier modes, we may truncate the higher modes and obtain $\mathcal{F}(v_t)\in\mathbb{C}^{k_{\max}\times d_v}$.
Hence $\left(R\cdot(\mathcal{F}v_t)\right)_{k,l}=\sum_{j=1}^{d_v}R_{k,l,j}(\mathcal{F}v_t)_{k,j}$, for $k=1,\ldots,k_{\max}$ and $l=1,\ldots,d_v$, which yields a $k_{\max}\times d_v$ matrix; after the inverse transform, each row supplies one input to the activation in $v_{t+1}(x)=\sigma\big(Wv_t(x)+\big(\mathcal{K}(a;\phi)v_t\big)(x)\big)$.
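A minimal NumPy sketch of one discrete Fourier layer on a 1-D grid; the randomly initialized $R$ and $W$ stand in for learned parameters, and this only illustrates the shapes and operations, not the paper's PyTorch implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_v, k_max = 64, 8, 12

v_t = rng.standard_normal((n, d_v))                 # v_t on a uniform 1-D grid
R = (rng.standard_normal((k_max, d_v, d_v))
     + 1j * rng.standard_normal((k_max, d_v, d_v))) # spectral weights (random stand-in)
W = 0.1 * rng.standard_normal((d_v, d_v))           # local linear transform

# FFT along the grid, keep the lowest k_max modes, mix channels per mode.
vhat = np.fft.rfft(v_t, axis=0)                     # (n//2 + 1, d_v), complex
out_hat = np.zeros_like(vhat)
out_hat[:k_max] = np.einsum('klj,kj->kl', R, vhat[:k_max])  # (R·(F v_t))_{k,l}
Kv = np.fft.irfft(out_hat, n=n, axis=0)             # back to the grid, real-valued

v_next = np.tanh(v_t @ W.T + Kv)                    # v_{t+1} = σ(W v_t + K v_t)
print(v_next.shape)  # (64, 8)
```

Because the learned weights live on the fixed mode set rather than on the grid, the same `R` can be applied at any resolution `n ≥ 2·k_max`, which is the discretization-invariance claimed in Section 3.2.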
When the discretization is uniform with resolution $s_1\times\cdots\times s_d=n$, $d=\dim(D)$, $\mathcal{F}$ can be replaced by the Fast Fourier Transform.
The fast Fourier transform: for $f\in\mathbb{R}^{n\times d_v}$, $k=(k_1,\ldots,k_d)\in\mathbb{Z}_{s_1}\times\cdots\times\mathbb{Z}_{s_d}$, and $x=(x_1,\ldots,x_d)\in D$, the FFT $\hat{\mathcal{F}}$ and its inverse $\hat{\mathcal{F}}^{-1}$ are defined as
$(\hat{\mathcal{F}}f)_l(k)=\sum_{x_1=0}^{s_1-1}\cdots\sum_{x_d=0}^{s_d-1}f_l(x_1,\ldots,x_d)\,e^{-2i\pi\sum_{j=1}^d\frac{x_jk_j}{s_j}},\qquad(\hat{\mathcal{F}}^{-1}f)_l(x)=\sum_{k_1=0}^{s_1-1}\cdots\sum_{k_d=0}^{s_d-1}f_l(k_1,\ldots,k_d)\,e^{2i\pi\sum_{j=1}^d\frac{x_jk_j}{s_j}}$
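This definition can be checked against a library FFT on a small $d=2$ grid, since `np.fft.fftn` uses the same sign and normalization convention:

```python
import numpy as np

rng = np.random.default_rng(0)
s1, s2 = 4, 6
f = rng.standard_normal((s1, s2))   # one channel f_l on a uniform s1 × s2 grid

# Evaluate (F̂ f)(k) directly by the double sum in the definition above.
def dft_by_sum(f):
    s1, s2 = f.shape
    out = np.zeros((s1, s2), dtype=complex)
    for k1 in range(s1):
        for k2 in range(s2):
            for x1 in range(s1):
                for x2 in range(s2):
                    out[k1, k2] += f[x1, x2] * np.exp(
                        -2j * np.pi * (x1 * k1 / s1 + x2 * k2 / s2))
    return out

print(np.allclose(dft_by_sum(f), np.fft.fftn(f)))  # True
```

The direct sum costs $O(n^2)$ while the FFT costs $O(n\log n)$, which is where the speedup over the generic kernel integral comes from.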
In this case, $R$ is treated as an $(s_1\times\cdots\times s_d\times d_v\times d_v)$-tensor.