Hyperbolic Representation Learning

Manifolds: A Gentle Introduction

Manifolds

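  • The first thing that comes to mind about manifolds is probably the Manifold Hypothesis: real-world high-dimensional data usually lie on a low-dimensional manifold embedded in the high-dimensional space, which explains why ML can extract useful low-dimensional features from high-dimensional data. But what exactly is a manifold?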

Manifolds are studied in two branches of mathematics: topology and differential geometry.


Circles and Spheres as Manifolds

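  • A manifold is a topological space that is locally homeomorphic to Euclidean space, where a homeomorphism is a continuous bijection with a continuous inverse, so it preserves topological properties (Euclidean space itself is of course a manifold). An $n$-dimensional manifold is therefore a space that locally looks like $\R^n$. The figure below shows several one-dimensional manifolds embedded in two-dimensional space: every small arc of each curve looks like a one-dimensional line segment. A curve in the plane that crosses itself (e.g. a figure 8) is not a one-dimensional manifold, because no neighborhood of the crossing point looks like a line segment.
    [Figure: one-dimensional manifolds embedded in two-dimensional space]
    The next figure shows several two-dimensional manifolds embedded in three-dimensional space. Similarly, any 2D surface (including a plane) that doesn't self-intersect is also a 2D manifold.
    [Figure: two-dimensional manifolds embedded in three-dimensional space]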

A (slightly) More Formal Look at Manifolds


  • As shown in the figure below, let $X$ be a manifold and $U_\alpha, U_\beta$ two different "patches" or domains. Each domain is mapped into a low-dimensional Euclidean space by a chart, $\varphi_\alpha$ or $\varphi_\beta$. A collection of charts whose domains cover the whole manifold is an atlas. The maps $\varphi_{\alpha \beta}=\varphi_\beta \circ \varphi_\alpha^{-1}$ and $\varphi_{\beta \alpha}=\varphi_\alpha \circ \varphi_\beta^{-1}$, defined on the images of the overlap $U_\alpha \cap U_\beta$, are the transition maps. If the transition maps are $k$ times differentiable, the manifold is a differentiable manifold; if they are infinitely differentiable, it is a smooth manifold.
    [Figure: two overlapping chart domains $U_\alpha, U_\beta$ on $X$ with their charts and transition maps]

Example: A 1D Manifold with Multiple Charts

  • The unit circle is a one-dimensional manifold. As shown in the figure below, we can define 4 charts $M\rightarrow\R$ that map it to one-dimensional space, using the polar angle $\theta$ (taken modulo $2\pi$, with $r=1$) as the coordinate. Several charts are needed because every chart domain must be an open set.
    $$\begin{aligned} \varphi_1(r, \theta)=\theta &\quad \theta \in\left(-\frac{\pi}{3}, \frac{\pi}{3}\right) \\ \varphi_2(r, \theta)=\theta &\quad \theta \in\left(\frac{\pi}{6}, \frac{5 \pi}{6}\right) \\ \varphi_3(r, \theta)=\theta &\quad \theta \in\left(\frac{2 \pi}{3}, \frac{4 \pi}{3}\right) \\ \varphi_4(r, \theta)=\theta &\quad \theta \in\left(\frac{7 \pi}{6}, \frac{11 \pi}{6}\right) \end{aligned}$$
    [Figure: four overlapping chart domains covering the unit circle]
  • We can also define charts by stereographic projection. The "north" and "south" poles $(0,1)$ and $(0,-1)$ give two charts, each projecting every point of the circle except its own pole onto the $x$-axis. By similar triangles,
    $$\begin{aligned} u_1&:=\varphi_1(p)=\frac{\varphi_1(p)}{1}=\frac{x_p}{1-y_p}\\ u_2&:=\varphi_2(q)=\frac{\varphi_2(q)}{1}=\frac{x_q}{1+y_q} \end{aligned}$$
    [Figure: stereographic projection of the unit circle from the north and south poles]
    Combining these with $x^2+y^2=1$ gives the inverse maps
    $$\begin{aligned} x_p&=\frac{2 u_1}{u_1^2+1}, & y_p&=\frac{u_1^2-1}{u_1^2+1} \\ x_q&=\frac{2 u_2}{u_2^2+1}, & y_q&=\frac{1-u_2^2}{u_2^2+1} \end{aligned}$$
    from which we obtain the transition map $\varphi_{\alpha\beta}$ (for $u_1 \neq 0$):
    $$\begin{aligned} u_2 & =\varphi_{\alpha \beta}\left(u_1\right) \\ & =\varphi_2 \circ \varphi_1^{-1}\left(u_1\right) \\ & =\varphi_2\left(\varphi_1^{-1}\left(u_1\right)\right) \\ & =\varphi_2\left(\left(\frac{2 u_1}{u_1^2+1}, \frac{u_1^2-1}{u_1^2+1}\right)\right) \\ & =\frac{\frac{2 u_1}{u_1^2+1}}{1+\frac{u_1^2-1}{u_1^2+1}} \\ & =\frac{1}{u_1} \end{aligned}$$
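The two stereographic charts and their transition map can be checked numerically. The following is a minimal sketch; the function names (`phi_north`, `phi_south`, `phi_north_inv`, `transition`) are illustrative choices, not part of the original notes.

```python
import math

def phi_north(x, y):
    """Chart from the north pole (0, 1): u_1 = x / (1 - y)."""
    return x / (1.0 - y)

def phi_south(x, y):
    """Chart from the south pole (0, -1): u_2 = x / (1 + y)."""
    return x / (1.0 + y)

def phi_north_inv(u):
    """Inverse of the north-pole chart: coordinate u -> point (x, y) on the circle."""
    d = u * u + 1.0
    return 2.0 * u / d, (u * u - 1.0) / d

def transition(u1):
    """Transition map phi_south o phi_north^{-1}, defined for u1 != 0."""
    return phi_south(*phi_north_inv(u1))

for u1 in (0.5, 2.0, -3.0):
    x, y = phi_north_inv(u1)
    print(u1, (x, y), transition(u1))
```

For each sampled `u1`, the recovered point lies on the unit circle and `transition(u1)` agrees with $1/u_1$ up to floating-point error.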

Example: Stereographic Projections for $S^n$

  • The $n$-dimensional sphere $S^n$ is a manifold embedded in $(n+1)$-dimensional Euclidean space. The figure below shows stereographic projection mapping $S^2$ to the plane.
    [Figure: stereographic projection of $S^2$ onto a plane]
    The construction generalizes to $S^n$.
    [Figure: stereographic projection of $S^n$ onto the hyperplane $z=0$]
    For any point $p=(\mathbf x,z)\in S^n$ other than the north pole, projecting from the north pole onto the hyperplane $z=0$ gives the point $(\mathbf u_N,0)$, where
    $$\begin{aligned} \mathbf{u}_{\mathbf{N}}&:=\varphi_N(p)=\frac{\mathbf{x}}{1-z} \\ \mathbf{x}&=\frac{2 \mathbf{u}_{\mathbf{N}}}{\left\|\mathbf{u}_{\mathbf{N}}\right\|^2+1} \\ z&=\frac{\left\|\mathbf{u}_{\mathbf{N}}\right\|^2-1}{\left\|\mathbf{u}_{\mathbf{N}}\right\|^2+1} \end{aligned}$$
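As a sanity check on the formulas above, here is a small NumPy sketch of the north-pole chart on $S^n$ and its inverse. The names `stereo_north` and `stereo_north_inv` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def stereo_north(p):
    """Map p = (x, z) on S^n (z != 1) to u_N = x / (1 - z) in R^n."""
    p = np.asarray(p, dtype=float)
    return p[:-1] / (1.0 - p[-1])

def stereo_north_inv(u):
    """Map u in R^n back to the point (x, z) on S^n."""
    u = np.asarray(u, dtype=float)
    s = np.dot(u, u)                      # ||u||^2
    return np.append(2.0 * u / (s + 1.0), (s - 1.0) / (s + 1.0))

rng = np.random.default_rng(0)
u = rng.normal(size=3)                    # a point of R^3, i.e. a chart coordinate on S^3
p = stereo_north_inv(u)
print(np.linalg.norm(p))                  # should be 1 up to rounding
print(stereo_north(p) - u)                # should be ~0
```

The round trip `stereo_north(stereo_north_inv(u))` returns `u`, and the recovered point has unit norm, which is exactly what the two formulas claim.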

Tangent Spaces

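  • Define the tangent space $T_xM$ of a manifold $M$ at $x$ as the space spanned by the tangent vectors $v$ of all curves $\gamma(t)$ passing through $x$. It can be viewed as the first-order (linear) approximation of $M$ at $x$.
    [Figure: tangent space $T_xM$ at a point $x$ of a manifold $M$]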

Tangent Spaces as the Velocity of Curves

  • Let $M$ be a smooth manifold and $p\in M$. Choose a chart $\varphi:U\rightarrow\R^n$, where $U$ is an open subset of $M$ containing $p$. Next define a smooth parametric curve $\gamma:[a,b]\rightarrow M$ that traces a curve on $M$ through $p$, so that $\varphi\circ\gamma:[a,b]\rightarrow\R^n$ maps $t$ directly to a point $x(t)$ in Euclidean space. At $t=t_0$ (corresponding to the point $p$) we have
    $$\text{"velocity" at } p=\left.\frac{d (\varphi \circ \gamma)(t)}{d t}\right|_{t=t_0}=\left.\left[\frac{d x^1(t)}{d t}, \ldots, \frac{d x^n(t)}{d t}\right]\right|_{t=t_0}$$
    This velocity vector can be regarded as a tangent vector $v$. The space $T_pM$ spanned by all such tangent vectors is the tangent space; in coordinates it can be identified with $\R^n$, so the usual Euclidean vector-space operations apply.

Basis of the Tangent Space

  • First take an arbitrary test function $f:M\rightarrow\R$ and a smooth parametric curve $\gamma:[a,b]\rightarrow M$ with $\gamma(t_0)=p$. The "velocity" relative to this test function is
    $$\begin{aligned} {\bf v} f&=\frac{d (f \circ \gamma)(t)}{dt}\Big|_{t=t_0} \\ &= \frac{d(f \circ \varphi^{-1} \circ \varphi \circ \gamma)(t)}{dt}\Big|_{t=t_0} \\ &= \frac{d((f \circ \varphi^{-1}) \circ (\varphi \circ \gamma))(t)}{dt}\Big|_{t=t_0} \\ &= \sum_i \frac{\partial (f \circ \varphi^{-1})(x)}{\partial x^i}\Big|_{x=\varphi \circ \gamma(t_0)} \frac{d(\varphi \circ \gamma)^i(t)}{dt}\Big|_{t=t_0} && \text{chain rule} \\ &= \sum_i \frac{\partial (f \circ \varphi^{-1})(x)}{\partial x^i}\Big|_{x=\varphi(p)} \frac{d(\varphi \circ \gamma)^i(t)}{dt}\Big|_{t=t_0} && \text{since }p = \gamma(t_0) \\ &= \sum_i (\text{basis for component }i)(\text{"velocity" of component } i \text{ w.r.t. } \varphi) \end{aligned}$$
  • In a vector space, a vector is the sum of its coordinate components weighted by the corresponding basis vectors; in Euclidean space the usual basis is $e_1,e_2,\ldots$. In the expression above we can read $\frac{d(\varphi \circ \gamma)^i(t)}{dt}\Big|_{t=t_0}$ as the coordinate components and $\frac{\partial (f \circ \varphi^{-1})(x)}{\partial x^i}\Big|_{x=\varphi(p)}$ as the corresponding basis, abbreviated as
    $$\Big(\frac{\partial}{\partial x^i}\Big)_p (f) := \frac{\partial (f \circ \varphi^{-1})(\varphi(p))}{\partial x^i}$$
    Since $f$ is arbitrary, we can even omit it. For any tangent vector ${\bf v}\in T_pM$,
    $$\begin{aligned} {\bf v} &= \sum_{i=1}^n v(x^i) \cdot \Big(\frac{\partial}{\partial x^i}\Big)_p \\ &= \sum_{i=1}^n \frac{d(\varphi \circ \gamma)^i(t)}{dt}\Big|_{t=t_0} \cdot \Big(\frac{\partial}{\partial x^i}\Big)_p \end{aligned}$$

Change of Basis for Tangent Vectors

  • On a $d$-dimensional manifold $M$, define two charts:
    $$\begin{aligned} \varphi(p) &= (x^1(p), \ldots, x^d(p)) \\ \vartheta(p) &= (y^1(p), \ldots, y^d(p)) \end{aligned}$$
    We now ask how to convert a tangent vector expressed in the tangent-space basis of one chart into the basis of the other chart:
    $$\begin{aligned} {\bf v} f &= \sum_iv(x^i) \cdot \Big(\frac{\partial}{\partial x^i}\Big)_p f \\ &= \sum_iv(x^i) \cdot \partial_{x^i} (f \circ \varphi^{-1})(\varphi(p)) && \text{by definition} \\ &= \sum_iv(x^i) \cdot \partial_{x^i} (f \circ \vartheta^{-1} \circ \vartheta \circ \varphi^{-1})(\varphi(p)) && \text{introduce } \vartheta \text{ with identity trick}\\ &=\sum_i v(x^i) \cdot \partial_{x^i} ((f \circ \vartheta^{-1}) \circ (\vartheta \circ \varphi^{-1}))(\varphi(p)) \\ &=\sum_i v(x^i) \cdot\sum_j \partial_{x^i} (\vartheta \circ \varphi^{-1})^j(\varphi(p)) \cdot \partial_{y^j} (f \circ \vartheta^{-1})(\vartheta \circ \varphi^{-1}(\varphi(p))) && \text{chain rule} \\ &=\sum_i v(x^i) \cdot \sum_j \partial_{x^i} (\vartheta \circ \varphi^{-1})^j(\varphi(p)) \cdot \partial_{y^j} (f \circ \vartheta^{-1})(\vartheta(p)) && \text{simplifying} \\ &=\sum_i v(x^i) \cdot \sum_j \partial_{x^i} (\vartheta \circ \varphi^{-1})^j(\varphi(p)) \cdot \Big(\frac{\partial}{\partial y^j}\Big)_p f && \text{by definition} \\ &=\sum_i v(x^i) \cdot \sum_j \frac{\partial y^j}{\partial x^i}\Big|_{x=\varphi(p)} \cdot \Big(\frac{\partial}{\partial y^j}\Big)_p f && \text{writing } y^j(x) := (\vartheta \circ \varphi^{-1})^j(x) \\ &=\sum_j\sum_i v(x^i) \cdot \frac{\partial y^j}{\partial x^i}\Big|_{x=\varphi(p)} \cdot \Big(\frac{\partial}{\partial y^j}\Big)_p f\\ {\bf v} f &=\sum_j v(y^j) \cdot \Big(\frac{\partial}{\partial y^j}\Big)_p f \end{aligned}$$
    where $\partial_x f := \frac{\partial f}{\partial x}$. Comparing the last two lines gives $v(y^j)=\sum_i \frac{\partial y^j}{\partial x^i}\big|_{x=\varphi(p)} v(x^i)$, so the change-of-basis matrix is the Jacobian ${\bf J_y}$:
    $$\begin{aligned} {\bf v(y)} = \begin{bmatrix} v(y^1) \\ \vdots \\ v(y^d) \end{bmatrix} = {\bf J_y} {\bf v(x)} = \begin{bmatrix} \frac{\partial y^1}{\partial x^1}\big|_{x=\varphi(p)} & \cdots & \frac{\partial y^1}{\partial x^d}\big|_{x=\varphi(p)} \\ \vdots & \ddots & \vdots \\ \frac{\partial y^d}{\partial x^1}\big|_{x=\varphi(p)} & \cdots & \frac{\partial y^d}{\partial x^d}\big|_{x=\varphi(p)} \end{bmatrix} \begin{bmatrix} v(x^1) \\ \vdots \\ v(x^d) \end{bmatrix} \end{aligned}$$

Example: Tangent Vectors on a Sphere

  • Consider the unit sphere below and define the curve $\gamma(t)$ (the boundary between the red and blue regions) at latitude $\theta=\pi/4$:
    $$\gamma(t) = \Big(\cos \frac{\pi}{4}\cos\pi t, \cos \frac{\pi}{4}\sin\pi t, \sin \frac{\pi}{4}\Big), \quad t \in [-1, 1]$$
    [Figure: unit sphere with the curve $\gamma(t)$ at latitude $\pi/4$]
    We now compute the tangent vector at $p = \gamma(t_0 = 0) = (x, y, z) = (\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}})$.
  • From "Example: Stereographic Projections for $S^n$" above, projecting the unit sphere from the north pole onto the hyperplane $z=0$ gives the coordinates $(u_1,u_2)$, i.e. the chart $\varphi$ is
    $$\begin{aligned} u_1(x, y, z) &= \frac{x}{1-z} \\ u_2(x, y, z) &= \frac{y}{1-z} \end{aligned}$$
    The coordinates of the tangent vector at $p$ are therefore
    $$\begin{aligned} \frac{d (\varphi \circ \gamma)(t)}{dt}\Big|_{t=t_0} &= \Big[ \frac{d u_1(\cos \frac{\pi}{4}\cos\pi t, \cos \frac{\pi}{4}\sin\pi t, \sin \frac{\pi}{4})}{dt}, \frac{d u_2(\cos \frac{\pi}{4}\cos\pi t, \cos \frac{\pi}{4}\sin\pi t, \sin \frac{\pi}{4})}{dt} \Big]\Big|_{t=t_0} \\ &= \Big[ \frac{d \big((\sqrt{2} + 1)\cos\pi t\big)}{dt}, \frac{d \big((\sqrt{2} + 1)\sin\pi t\big)}{dt} \Big]\Big|_{t=t_0} \\ &= \Big[ (\sqrt{2}+1)\pi(-\sin\pi t), (\sqrt{2}+1)\pi\cos\pi t \Big]\Big|_{t=t_0} \\ &= (0, (\sqrt{2}+1)\pi) \end{aligned}$$
    Attaching the basis gives the tangent vector
    $${\bf T_{\varphi}} = 0 \cdot \Big(\frac{\partial}{\partial u_1} \Big)_p + (\sqrt{2} + 1)\pi \cdot \Big(\frac{\partial}{\partial u_2} \Big)_p$$
  • Next we convert this tangent vector to the chart $\vartheta$ defined by the south pole. The inverse $\varphi^{-1}$ is
    $$\begin{aligned} x &= \frac{2u_1}{u_1^2 + u_2^2 + 1} \\ y &= \frac{2u_2}{u_1^2 + u_2^2 + 1} \\ z &= \frac{u_1^2 + u_2^2 - 1}{u_1^2 + u_2^2 + 1} \end{aligned}$$
    and $\vartheta$ is
    $$\begin{aligned} w_1(x, y, z) &= \frac{x}{1+z} \\ w_2(x, y, z) &= \frac{y}{1+z} \end{aligned}$$
    so $w_i$ can be written in terms of $u_1, u_2$:
    $$\begin{aligned} w_i(u_1, u_2) &= w_i \circ \varphi^{-1}(u_1, u_2) \\ &= w_i\Big(\frac{2u_1}{u_1^2 + u_2^2 + 1}, \frac{2u_2}{u_1^2 + u_2^2 + 1}, \frac{u_1^2 + u_2^2 - 1}{u_1^2 + u_2^2 + 1} \Big) \\ &= \frac{u_i}{u_1^2+u_2^2} \end{aligned}$$
    Substituting into the change-of-basis formula, with $\varphi(p) = (\sqrt{2}+1, 0)$, gives
    $$\begin{aligned} {\bf v(w) } &= {\bf J_w} {\bf v(u)} \\ &= \begin{bmatrix} \frac{\partial w_1}{\partial u_1}\big|_{u=\varphi(p)} & \frac{\partial w_1}{\partial u_2}\big|_{u=\varphi(p)} \\ \frac{\partial w_2}{\partial u_1}\big|_{u=\varphi(p)} & \frac{\partial w_2}{\partial u_2}\big|_{u=\varphi(p)} \end{bmatrix} \begin{bmatrix} 0 \\ (\sqrt{2} + 1)\pi \end{bmatrix}\\ &= \begin{bmatrix} \frac{u_2^2 - u_1^2}{(u_1^2+u_2^2)^2}\big|_{u=\varphi(p)} & -\frac{2u_1u_2}{(u_1^2+u_2^2)^2}\big|_{u=\varphi(p)} \\ -\frac{2u_1u_2}{(u_1^2+u_2^2)^2}\big|_{u=\varphi(p)} & \frac{u_1^2 - u_2^2}{(u_1^2+u_2^2)^2}\big|_{u=\varphi(p)} \end{bmatrix} \begin{bmatrix} 0 \\ (\sqrt{2} + 1)\pi \end{bmatrix}\\ &= \begin{bmatrix} \frac{- (\sqrt{2} + 1)^2}{(\sqrt{2} + 1)^4} & 0 \\ 0 & \frac{(\sqrt{2} + 1)^2}{(\sqrt{2} + 1)^4} \end{bmatrix} \begin{bmatrix} 0 \\ (\sqrt{2} + 1)\pi \end{bmatrix}\\ &= \begin{bmatrix} 0 \\ (\sqrt{2} - 1)\pi \end{bmatrix} \end{aligned}$$
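The worked example can be reproduced numerically. The sketch below uses central finite differences in place of the symbolic derivatives; the helper names (`phi`, `theta`, `phi_inv`, `gamma`, `ddt`, `jacobian`) and the step size `h` are assumptions made for illustration, not something the original notes specify.

```python
import numpy as np

def phi(p):
    """North-pole chart: (x, y, z) -> (u_1, u_2)."""
    x, y, z = p
    return np.array([x / (1 - z), y / (1 - z)])

def theta(p):
    """South-pole chart: (x, y, z) -> (w_1, w_2)."""
    x, y, z = p
    return np.array([x / (1 + z), y / (1 + z)])

def phi_inv(u):
    """Inverse of the north-pole chart."""
    s = u @ u
    return np.array([2 * u[0], 2 * u[1], s - 1]) / (s + 1)

def gamma(t):
    """Curve on the unit sphere at latitude pi/4."""
    c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
    return np.array([c * np.cos(np.pi * t), c * np.sin(np.pi * t), s])

def ddt(f, t, h=1e-6):
    """Central-difference derivative of a vector-valued function of t."""
    return (f(t + h) - f(t - h)) / (2 * h)

def jacobian(f, u, h=1e-6):
    """Central-difference Jacobian of f at u; column i is df/du_i."""
    cols = []
    for i in range(len(u)):
        e = np.zeros(len(u))
        e[i] = h
        cols.append((f(u + e) - f(u - e)) / (2 * h))
    return np.stack(cols, axis=1)

v_u = ddt(lambda t: phi(gamma(t)), 0.0)              # components in the u-chart
J_w = jacobian(lambda u: theta(phi_inv(u)), phi(gamma(0.0)))
v_w_jac = J_w @ v_u                                  # change of basis via the Jacobian
v_w_direct = ddt(lambda t: theta(gamma(t)), 0.0)     # differentiate in the w-chart directly
print(v_u, v_w_jac, v_w_direct)
```

Both routes to the $w$-chart components agree with each other and with $(0, (\sqrt{2}-1)\pi)$ up to finite-difference error.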

Metric Tensor

Covariant vs. Contravariant Tensors

  • Let $\bf v$ be a basis of a vector space (written as a row of basis vectors). A vector $\bf u$ can be expressed as $\bf u=\bf v\bf c$, where the column $\bf c$ holds its coordinates. Now change to a new basis $\bf \tilde v$ via the change-of-basis matrix $T$, i.e. $\mathbf{\tilde v} = \mathbf v T$, so that $\bf u=\bf\tilde v \bf\tilde c$.
  • A contravariant vector is one whose coordinates transform with the inverse $T^{-1}$ of the basis change, i.e. $\mathbf{\tilde c} = T^{-1} \bf c$. It is usually written as a column vector with upper indices:
    $$v^\alpha = \begin{bmatrix} v^0 \\ v^1 \\ v^2 \end{bmatrix}$$
    For example, a geometric vector in Euclidean space is a contravariant vector. Proof:
    $$\begin{aligned} \mathbf{\tilde v} \mathbf{\tilde c}&=\mathbf{ v}\mathbf{c}\\ \Rightarrow\mathbf{v} T\mathbf{\tilde c}&=\mathbf{ v}\mathbf{c} \\ \Rightarrow \mathbf{\tilde c}&=T^{-1}\mathbf{c} \end{aligned}$$
  • A covariant vector is one whose coordinates transform with the basis change $T$ itself. It is usually written as a row vector with lower indices:
    $$u_\alpha = [ u_0, u_1, u_2 ]$$
    For example, the coefficients of a linear function $f(\mathbf x)=\mathbf c\mathbf x$ form a covariant vector ($\mathbf c$ is a row vector, $\mathbf x$ a column vector). Proof:
    $$\begin{aligned} \mathbf{\tilde c}\mathbf{\tilde x}&=\mathbf c \mathbf x\\ \mathbf{\tilde c}(T^{-1}\mathbf x)&=\mathbf c\mathbf x && \text{after the basis change, } \mathbf{\tilde x}= T^{-1}\mathbf x\\ \Rightarrow\mathbf{\tilde c}&=\mathbf c T \end{aligned}$$
  • We usually speak of an $(n,m)$-tensor, where $n$ is the number of contravariant components and $m$ is the number of covariant components. The rank is then the sum $n+m$. Therefore a contravariant vector is a $(1,0)$-tensor and a covector is a $(0,1)$-tensor.
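The two transformation rules are easy to see on concrete numbers. In the sketch below the matrices and component values are arbitrary illustrative choices; the only requirement is that `T` is invertible.

```python
import numpy as np

V = np.eye(3)                              # old basis vectors as columns
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])            # change-of-basis matrix (det = 7)
V_new = V @ T                              # new basis: v~ = v T

# Contravariant components: c~ = T^{-1} c
c = np.array([1.0, -2.0, 0.5])
c_new = np.linalg.solve(T, c)

# Covariant components (a row vector acting on columns): a~ = a T
a = np.array([0.3, 1.0, -1.0])
a_new = a @ T

print(V @ c, V_new @ c_new)                # same geometric vector in both bases
print(a @ c, a_new @ c_new)                # same scalar f(x) in both bases
```

The geometric vector `V @ c` and the scalar `a @ c` are unchanged by the basis change, because the $T$ acting on the basis (or covector) cancels the $T^{-1}$ acting on the contravariant components.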

Linear Transformations as Tensors


  • Let $L$ be the matrix of a linear transformation, and consider the form $\tilde L$ it takes after a change of basis. Suppose $\mathbf v$ is a geometric vector and $\mathbf {u}=L\mathbf {v}$ holds the coordinates after the transformation. Then
    $$\begin{aligned} &L\mathbf {v}=\mathbf {u}\\ \Rightarrow\ & L (T \mathbf {\tilde v}) =T \mathbf {\tilde u}\\ \Rightarrow\ & \mathbf {\tilde u}=T^{-1}LT\mathbf {\tilde v} \end{aligned}$$
    Since $\mathbf {\tilde u}=\tilde L\mathbf {\tilde v}$, it follows that
    $$\tilde L=T^{-1}LT$$
    so a linear transformation is a $(1,1)$-tensor.

Bilinear Forms


  • A bilinear form can be written in matrix form:
    $$B({\bf u}, {\bf v}) = {\bf u^T}A{\bf v}$$
  • Now consider the form $\tilde A$ takes after a change of basis. Suppose $\mathbf u,\mathbf v$ are geometric vectors, so that $\mathbf{\tilde u}=T^{-1}\mathbf u$ and $\mathbf{\tilde v}=T^{-1}\mathbf v$. Then
    $$\begin{aligned} &{\bf u^T}A{\bf v}={\bf \tilde u^T}\tilde A{\bf \tilde v}\\ \Rightarrow\ & {\bf u^T}A{\bf v}=({\bf u^T}T^{-T})\tilde A(T^{-1}{\bf v})\\ \Rightarrow\ &A=T^{-T}\tilde AT^{-1}\\ \Rightarrow\ &\tilde A=T^{T}AT \end{aligned}$$
    so a bilinear form is a $(0,2)$-tensor.
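A short NumPy sketch makes the contrast between the $(1,1)$ rule $\tilde L=T^{-1}LT$ and the $(0,2)$ rule $\tilde A=T^{T}AT$ concrete. All matrices and vectors here are arbitrary illustrative values.

```python
import numpy as np

T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])            # invertible change-of-basis matrix
L = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])            # a linear map, a (1,1)-tensor
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 4.0]])            # a symmetric bilinear form, a (0,2)-tensor

u = np.array([1.0, 0.0, -1.0])
v = np.array([0.5, 2.0, 1.0])
u_new = np.linalg.solve(T, u)              # contravariant: u~ = T^{-1} u
v_new = np.linalg.solve(T, v)

L_new = np.linalg.solve(T, L @ T)          # L~ = T^{-1} L T
A_new = T.T @ A @ T                        # A~ = T^T A T

print(L_new @ v_new, np.linalg.solve(T, L @ v))   # both are the new coordinates of L v
print(u @ A @ v, u_new @ A_new @ v_new)           # the bilinear form value is invariant
```

With the identity matrix in place of `A`, `A_new` becomes $T^TT$, which is how the Euclidean dot product looks in a non-orthonormal basis.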

The Metric Tensor


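  • A metric tensor is a special kind of bilinear form that can be used to define lengths of, and distances and angles between, vectors. For example, the dot product in Euclidean space $\R^n$ corresponds to the metric tensor $I_n$, i.e. $\mathbf u\cdot \mathbf v={\bf u^T}I_n{\bf v}$. Using the change-of-basis result for bilinear forms, the same inner product can be computed from the coordinates in a new basis as ${\bf \tilde u^T}(T^{T}I_nT){\bf \tilde v}$.
  • When $g_p(v,v)>0$ for every nonzero vector $v$, the metric tensor is positive definite (its matrix is a positive-definite matrix).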

Riemannian Manifolds

Riemannian Metric (Tensor)

  • For a real manifold $M$, define a metric tensor $g_p$ that maps two tangent vectors at any point $p$ to a real number. If this map is bilinear, symmetric, and positive-definite, then the metric tensor $g_p$ is a Riemannian metric (a family of inner products):
    $$g_p: T_pM \times T_pM \rightarrow \mathbb{R}, \quad p \in M$$

Riemannian Manifolds

  • For a real manifold $M$, if (1) a Riemannian metric $g$ is defined at every point of $M$ (we have a different tensor for every point on the manifold), and (2) the map $p\rightarrow g_p(X(p),Y(p))$ is a smooth function of $p$, where $X(p),Y(p)$ are tangent vectors at $p$ given by smooth vector fields $X, Y$, then $(M,g)$ is called a Riemannian manifold (a manifold $M$ equipped with a Riemannian metric $g$).
  • So on a Riemannian manifold, although the tangent spaces at neighboring points may differ (the manifold curves differently, so the tangent spaces differ too), the inner product on the tangent spaces changes smoothly from point to point. Intuitively, Riemannian manifolds have all the nice "smoothness" properties we would want, which makes our lives a lot easier.
  • With the Riemannian metric $g$ we can define the distance between two points:
    [Figure: definition of the distance $d(x,y)$ between two points]
    where $\gamma\in\mathcal C^\infty([0,1],M)$ is a smooth curve (a geodesic when it attains the distance) with $\gamma(0)=x,\gamma(1)=y$, and $\dot{\gamma}(t)$ is the first derivative of $\gamma$ with respect to $t$.
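For reference, the standard definition of the Riemannian distance, written with the same symbols $g$, $\gamma$, and $\dot\gamma$, is given below. That it matches the original figure term by term is an assumption.

```latex
% Standard Riemannian distance: infimum of curve lengths over smooth curves
% from x to y (assumed to be the formula the figure displayed).
d(x, y) = \inf_{\gamma} \int_0^1 \sqrt{ g_{\gamma(t)}\big(\dot{\gamma}(t), \dot{\gamma}(t)\big) } \, dt,
\qquad \gamma \in \mathcal{C}^\infty([0,1], M),\ \gamma(0) = x,\ \gamma(1) = y
```

The integrand is the length of the velocity vector $\dot\gamma(t)$ measured by the metric at the point $\gamma(t)$, so the integral is the length of the curve.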

Induced Metric Tensors

  • A natural way to define the metric tensor is to take our n n n dimensional manifold M M M embedded in n + k n+k n+k dimensional Euclidean space, and use the standard Euclidean metric tensor in n + k n+k n+k space but transformed to a local coordinate system on M M M. That is, we’re going to define our family of Riemannian metric tensors using the metric tensor from the embedded Euclidean space. This guarantees that we’ll have this nice smoothness property because we’re inducing it from the standard Euclidean metric in the embedded space.
  • Consider a tangent vector $\bf v$ at the point $p$. Let $x$ denote the coordinates of the $(n+k)$-dimensional embedding space and $y$ the $n$-dimensional local coordinate system of the chart $\varphi$. The corresponding tangent-space basis is $\Big(\frac{\partial}{\partial y^i}\Big)_p$ and the coordinates are $v(y^i)$. We can change basis to find the coordinates of $\bf v$ in the tangent space of the identity chart $\vartheta$; these are the coordinates of $\bf v$ in the embedding space:
    $$\begin{aligned} {\bf v} &= \sum_i v(y^i) \cdot \Big(\frac{\partial}{\partial y^i}\Big)_p \\ &= \sum_i v(y^i) \cdot \frac{\partial (\square \circ \varphi^{-1})}{\partial y^i}\Big|_{y=\varphi(p)} \\ &= \sum_i v(y^i) \cdot \sum_j \frac{\partial x^j}{\partial y^i}\Big|_{y=\varphi(p)} \frac{\partial \square}{\partial x^j}\Big|_{x=p} \\ &= \sum_i v(y^i) \cdot \sum_j \frac{\partial x^j}{\partial y^i}\Big|_{y=\varphi(p)} \Big(\frac{\partial}{\partial x^j}\Big)_p \\ &= \sum_j \left(\sum_i v(y^i) \cdot \frac{\partial x^j}{\partial y^i}\Big|_{y=\varphi(p)}\right) \Big(\frac{\partial}{\partial x^j}\Big)_p \end{aligned}$$
    where $\square$ marks the slot for the test function. Hence the $j$-th coordinate of $\bf v$ in the embedding space is $\sum_i v(y^i) \cdot \frac{\partial x^j}{\partial y^i}\Big|_{y=\varphi(p)}$.
  • Let ${\bf v_M}, {\bf w_M}$ be tangent vectors expressed in the embedding Euclidean space, and ${\bf v_U}, {\bf w_U}$ the same tangent vectors expressed in the local coordinate system. With $n$ local coordinates $y^i$ and $n+k$ ambient coordinates $x^j$, we get
    $$\begin{aligned} g_M({\bf v_M},{\bf w_M}) &= {\bf v_M} \cdot {\bf w_M} && \text{Euclidean inner product}\\ &= \begin{bmatrix} \sum_{i=1}^n v(y^i) \cdot \frac{\partial x^1}{\partial y^i}\Big|_{y=\varphi(p)} & \ldots & \sum_{i=1}^n v(y^i) \cdot \frac{\partial x^{n+k}}{\partial y^i}\Big|_{y=\varphi(p)} \end{bmatrix} \begin{bmatrix} \sum_{i=1}^n w(y^i) \cdot \frac{\partial x^1}{\partial y^i}\Big|_{y=\varphi(p)} \\ \vdots \\ \sum_{i=1}^n w(y^i) \cdot \frac{\partial x^{n+k}}{\partial y^i}\Big|_{y=\varphi(p)} \end{bmatrix} \\ &= \begin{bmatrix} v(y^1) & \ldots & v(y^n) \end{bmatrix} \begin{bmatrix} \frac{\partial x^1}{\partial y^1}\Big|_{y=\varphi(p)} & \ldots & \frac{\partial x^{n+k}}{\partial y^1}\Big|_{y=\varphi(p)} \\ \vdots & \ddots & \vdots \\ \frac{\partial x^1}{\partial y^n}\Big|_{y=\varphi(p)} & \ldots & \frac{\partial x^{n+k}}{\partial y^n}\Big|_{y=\varphi(p)} \end{bmatrix} \begin{bmatrix} \frac{\partial x^1}{\partial y^1}\Big|_{y=\varphi(p)} & \ldots & \frac{\partial x^1}{\partial y^n}\Big|_{y=\varphi(p)} \\ \vdots & \ddots & \vdots \\ \frac{\partial x^{n+k}}{\partial y^1}\Big|_{y=\varphi(p)} & \ldots & \frac{\partial x^{n+k}}{\partial y^n}\Big|_{y=\varphi(p)} \end{bmatrix} \begin{bmatrix} w(y^1) \\ \vdots \\ w(y^n) \end{bmatrix} \\ &= {\bf v_U}^T{\bf J_{x}}^T{\bf J_{x}}{\bf w_U} \\ &= g({\bf v_U}, {\bf w_U}) \end{aligned}$$
    where ${\bf J_x}$ is the $(n+k)\times n$ Jacobian with entries $\frac{\partial x^j}{\partial y^i}\big|_{y=\varphi(p)}$. The induced metric tensor is therefore
    $$g = {\bf J_{x}}^T{\bf J_{x}}$$
  • With a metric tensor we can compute the length of a tangent vector, $\|\mathbf u\|=\sqrt{g(\mathbf u,\mathbf u)}$, the angle between two tangent vectors, $\cos\theta=\frac{g(\mathbf u,\mathbf v)}{\|\mathbf u\|\|\mathbf v\|}$, or the surface area of the manifold.
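To make $g = {\bf J_x}^T{\bf J_x}$ concrete, the sketch below computes the induced metric on the unit sphere $S^2\subset\R^3$. Spherical coordinates $(\theta,\phi)$ are used as the local coordinates here, an assumption chosen for simplicity (the text above works with a generic chart), and the Jacobian is approximated by central differences.

```python
import numpy as np

def embed(y):
    """Local coordinates y = (theta, phi) -> point of the unit sphere in R^3."""
    th, ph = y
    return np.array([np.sin(th) * np.cos(ph),
                     np.sin(th) * np.sin(ph),
                     np.cos(th)])

def jacobian(f, y, h=1e-6):
    """Central-difference Jacobian J_x with entries dx^j/dy^i (shape 3 x 2)."""
    y = np.asarray(y, dtype=float)
    cols = []
    for i in range(y.size):
        e = np.zeros_like(y)
        e[i] = h
        cols.append((f(y + e) - f(y - e)) / (2 * h))
    return np.stack(cols, axis=1)

def induced_metric(y):
    """Induced metric g = J_x^T J_x at local coordinates y."""
    J = jacobian(embed, y)
    return J.T @ J

def length(g, v):
    """Length of a tangent vector v (local components) under the metric g."""
    return np.sqrt(v @ g @ v)

y = np.array([np.pi / 3, 0.7])
g = induced_metric(y)
v, w = np.array([1.0, 0.0]), np.array([0.0, 1.0])
cos_angle = (v @ g @ w) / (length(g, v) * length(g, w))
print(g)            # close to diag(1, sin(theta)^2)
print(cos_angle)    # close to 0: coordinate directions are orthogonal
```

For this parametrization the result matches the well-known round metric $\mathrm{diag}(1, \sin^2\theta)$, so a unit step in $\phi$ is shorter near the poles than at the equator.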

Hyperbolic Geometry and Poincaré Embeddings

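Curvature

  • Curvature measures how much a surface or curve deviates from a flat plane or a straight line: the larger the curvature, the larger the deviation. Take curves as an example. A circle of radius $r$ has curvature $\kappa=\frac{1}{r}$. Three points on a curve determine a circle; as the three points approach a single point $x_0$, the limiting circle is the osculating circle, the best circular approximation of the curve near $x_0$. The curvature of the osculating circle is the curvature of the curve at that point.
    [Figure: osculating circle of a curve at $x_0$]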

Gaussian Curvature

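  • Gaussian curvature is a measure of curvature for surfaces (2D manifolds). The Gaussian curvature at a point is the product of the principal curvatures $\kappa_{max}$ and $\kappa_{min}$ at that point. The principal curvatures are the maximum curvature $\kappa_{max}$ among the normal-section curves through the point and the minimum curvature $\kappa_{min}$, which occurs in the plane perpendicular to that of the maximum. The figure below shows negative, zero, and positive curvature, respectively.

[Figure: surfaces with negative, zero, and positive Gaussian curvature]

  • Gaussian curvature applies only to 2D surfaces. For higher-dimensional manifolds, we can get a feel for curvature by observing how the properties of certain geometric objects on the surface change relative to flat space. The figure below shows the sum of the interior angles of a triangle on different surfaces.

[Figure: interior-angle sums of triangles on surfaces of different curvature]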


Parallel Transport, Riemannian Curvature Tensor and Sectional Curvature

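  • Parallel Transport. As shown in the figure below, suppose we stand at point $A$ on the Earth and walk due north to the point $N$. The figure shows a tangent vector at $A$; as we move, we keep the direction of this tangent vector unchanged. Still keeping its direction unchanged, we then move from $N$ to $B$ and finally back to $A$, where we find that the direction of the tangent vector has changed. Although we feel as if we are moving on a plane, we are actually moving on a curved surface, and this is what rotates the tangent vector after one loop. On a plane (zero curvature) this would not happen.
    [Figure: parallel transport of a tangent vector around the loop $A \rightarrow N \rightarrow B \rightarrow A$ on a sphere]
  • Riemannian Curvature Tensor. We can use the Riemannian curvature tensor to measure the deviation of parallel transport. At each point of a smooth manifold it gives a $(1, 3)$-tensor: given two linearly independent vectors in the tangent space at that point, it returns a linear transformation of the tangent space that measures how much parallel transport changes the direction of tangent vectors.
  • Sectional Curvature. We can use the Riemannian curvature tensor to measure the curvature at a point $P$:
    $$K(u, v) = \frac{\langle R(u, v)v, u\rangle }{\langle u, u\rangle \langle v, v\rangle - \langle u, v\rangle ^2}$$
    where $u,v$ are two linearly independent vectors in the tangent space at $P$ and $R$ is the Riemannian curvature tensor.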

Euclidean and Non-Euclidean Geometries

Manifolds with Constant Sectional Curvature

  • Riemannian manifolds with constant sectional curvature at every point are a special class of curved spaces. There are three cases:
    • Constant Positive Curvature: Elliptic geometry (e.g. hypersphere)
    • Constant Zero Curvature: Euclidean geometry (e.g. Euclidean space)
    • Constant Negative Curvature: Hyperbolic geometry

Euclidean geometry

  • Euclidean geometry satisfies the following five postulates, which define an axiomatic system: (1) A straight line segment can be drawn joining any two points. (2) Any straight line segment can be extended indefinitely in a straight line. (3) Given any straight line segment, a circle can be drawn having the segment as radius and one endpoint as center. (4) All right angles are congruent. (5) If two lines are drawn which intersect a third in such a way that the sum of the inner angles on one side is less than two right angles, then the two lines inevitably must intersect each other on that side if extended far enough (aka the parallel postulate).
    [Figure: illustration of the five postulates]
  • Euclidean space is one model of Euclidean geometry. A model for an axiomatic system is a well-defined set that assigns meaning to the undefined terms of the system in a way that is consistent with the relations defined in the system. For example, Euclidean geometry talks about points without saying what a point is, whereas Euclidean space defines a point as $x\in\R^n$. Once the abstract notions of Euclidean geometry are given concrete definitions, we obtain a model of Euclidean geometry.

Elliptic Geometry

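  • Elliptic geometry replaces the fifth postulate (the parallel postulate) with "Two lines perpendicular to a given line must intersect."
    [Figure: two lines perpendicular to a given line meeting on a sphere]
  • One model of elliptic geometry is the manifold defined by the surface of a sphere.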

Hyperbolic Geometry

  • Hyperbolic geometry replaces the fifth postulate (the parallel postulate) with "For any given line $R$ and point $P$ not on $R$, in the plane containing both line $R$ and point $P$ there are at least two distinct lines through $P$ that do not intersect $R$."
    Figure: Lines $x$ and $y$ intersecting at $P$ never pass through line $R$, although it is possible that they can asymptotically approach it.

This figure is not a great visualization because, as we’ll mention below, you can’t really intuitively represent 2D hyperbolic geometry in 2D Euclidean space. The real tough part is that even for the 2D hyperbolic plane, we cannot embed it into 3D Euclidean space (Hilbert’s theorem). This makes it hard to visualize, and results in a more complex model than the other two geometries.

Hyperbolic Space


  • As noted above, we cannot embed the 2D hyperbolic plane in 3D Euclidean space, but we can embed it in 3D Minkowski space (a pseudo-Euclidean space and a type of pseudo-Riemannian manifold).

Minkowski Space


  • Even though Minkowski space is defined as a vector space, we can regard it as basically having $n$ real dimensions (just like $\R^n$) but with a special type of metric tensor: the Minkowski metric.
    $$\begin{aligned} g_E({\bf u, v}) &= u_1 v_1 + u_2 v_2 + \ldots + u_n v_n && \text{Euclidean Metric} \\ g_M({\bf u, v}) &= \pm [u_1 v_1 - u_2 v_2 - \ldots - u_n v_n] && \text{Minkowski Metric} \end{aligned}$$
    Either sign may be chosen in front of the Minkowski metric; below we take the positive sign. Note that one dimension of Minkowski space is treated differently from the rest: in special relativity this dimension represents time, and the remaining dimensions represent space.

Hyperboloid

  • The surface generated by rotating a hyperbola about one of its axes of symmetry is a hyperboloid. Here we focus on the two-sheet hyperboloid
    $$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = -1$$
    where the part with $z>0$ is the "forward sheet".
    [Figure: two-sheet hyperboloid]
  • The two-sheet hyperboloid has the parametrization
    $$\begin{aligned} x &= a\sinh t \cos\theta \\ y &= b\sinh t \sin\theta \\ z &= \pm c\cosh t \end{aligned}$$
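A quick numerical check confirms that this parametrization satisfies the implicit equation. The function names and the sample values of $a, b, c, t, \theta$ below are illustrative assumptions.

```python
import numpy as np

def two_sheet_point(t, theta, a=1.0, b=1.0, c=1.0, sign=1.0):
    """Point on the two-sheet hyperboloid; sign=+1 selects the forward sheet (z > 0)."""
    return np.array([a * np.sinh(t) * np.cos(theta),
                     b * np.sinh(t) * np.sin(theta),
                     sign * c * np.cosh(t)])

def implicit_lhs(p, a=1.0, b=1.0, c=1.0):
    """Left-hand side x^2/a^2 + y^2/b^2 - z^2/c^2, which should equal -1."""
    x, y, z = p
    return (x / a) ** 2 + (y / b) ** 2 - (z / c) ** 2

for t, theta in [(0.0, 0.0), (0.8, 1.3), (2.0, 4.0)]:
    p = two_sheet_point(t, theta, a=2.0, b=0.5, c=3.0)
    print(p, implicit_lhs(p, a=2.0, b=0.5, c=3.0))
```

The identity behind the check is $\sinh^2 t\,(\cos^2\theta+\sin^2\theta)-\cosh^2 t=-1$.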

Models for the Hyperbolic Geometry

  • Hyperbolic geometry has four common models: the Klein model, the Poincaré disk model, the Lorentz (hyperboloid/Minkowski) model, and the Poincaré half-plane model. The rest of this section covers the Lorentz model and the Poincaré disk model.

Lorentz Model

Hyperboloid / Minkowski / Lorentz Model

  • The hyperboloid model is a model of $n$-dimensional hyperbolic geometry in which points are represented on the forward sheet of a two-sheeted hyperboloid in $(n+1)$-dimensional Minkowski space:
    $$
    x^2 + y^2 - z^2 = -1 \quad \text{where } z>0
    $$
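  • We can now visualize geometric notions such as points, lines, and circles on the hyperboloid. A geodesic generalizes the straight line to curved space: it is the shortest path between two points, and is defined as "a curve where you can parallel transport a tangent vector without deformation". In the hyperboloid model, a geodesic is the curve where the hyperboloid meets the plane through the two points (that define the line) and the origin. The shortest path between two points therefore defies Euclidean intuition: it first goes down and then back up, as the brown curve in the accompanying figure shows.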
  • We can go on to define the distance between two points, which is obtained by integrating the arc length of tangent vectors under the Minkowski metric. The distance between $\bf u$ and $\bf v$ is
    $$
    d({\bf u, v}) = \text{arcosh}(g_M({\bf u, v}))
    $$
    where $g_M({\bf u, v}) = u_1 v_1 - u_2 v_2 - \ldots - u_n v_n$ is the Minkowski metric.
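  • With a distance in hand, we can define circles in the hyperboloid model. The simplest circle is the one centered at $(0,0,1)$: it lies in a plane parallel to the plane $z=0$ and has the same shape as a circle in Euclidean space. For a circle whose center is not $(0,0,1)$, the hyperbolic center does not sit in the middle of the slice but at an off-center point, placed so that every point on the circle is at the same distance from it.
    Figure: Visualization of a hyperboloid circle as a "slice" of the forward sheet of a hyperboloid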
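The distance formula and the "simplest circle" can be sketched in a few lines of Python. Points are written as $(x, y, z)$ on $x^2 + y^2 - z^2 = -1$, so this sketch assumes $z$ is the coordinate that carries the positive sign in $g_M$. The function names, the radius `0.8`, and the clamp inside `lorentz_distance` (a guard against rounding slightly below 1) are illustrative choices, not part of the original text.

```python
import math


def g_M(u, v):
    """Minkowski metric for points written (x, y, z), with z the special coordinate."""
    return u[2] * v[2] - u[0] * v[0] - u[1] * v[1]


def lorentz_distance(u, v):
    """d(u, v) = arcosh(g_M(u, v)) for points on the forward sheet."""
    # Clamp to 1.0 so rounding such as 0.9999999999 does not break acosh.
    return math.acosh(max(1.0, g_M(u, v)))


def point(t, theta):
    """Point on the forward sheet x^2 + y^2 - z^2 = -1."""
    return (math.sinh(t) * math.cos(theta),
            math.sinh(t) * math.sin(theta),
            math.cosh(t))


center = (0.0, 0.0, 1.0)
radius = 0.8  # hypothetical hyperbolic radius
circle = [point(radius, 2 * math.pi * k / 8) for k in range(8)]

distances = [lorentz_distance(center, p) for p in circle]
heights = [p[2] for p in circle]
print(distances)  # every entry should be close to the chosen radius
print(heights)    # every entry should be close to cosh(radius)
```

All eight points share the same height $z = \cosh(0.8)$, which is the horizontal "slice" described above for a circle centered at $(0,0,1)$.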

Example: Calculating the Arc Length of a Geodesic In Hyperbolic Space

  • We now compute the arc length between two points on the hyperbolic plane embedded in 3D Minkowski space. Suppose the curve between the two points is
    $$
    \begin{aligned}
    x &= \sinh t \\
    y &= 0 \\
    z &= \cosh t
    \end{aligned}
    $$
    where $t\in[t_a,t_b]$.
  • We can integrate directly to get the arc length. Here $z$ is the time-like coordinate. Tangent vectors to the hyperboloid are space-like, so with the positive sign convention $g_M$ is negative on them and the integrand is $\sqrt{-g_M}$:
    $$
    \begin{aligned}
    d(A, B) &= \int_{t_a}^{t_b} \sqrt{ -g_M\Big( \big(\tfrac{dx(t)}{dt}, \tfrac{dy(t)}{dt}, \tfrac{dz(t)}{dt}\big), \big(\tfrac{dx(t)}{dt}, \tfrac{dy(t)}{dt}, \tfrac{dz(t)}{dt}\big) \Big)} \, dt \\
    &= \int_{t_a}^{t_b} \sqrt{ \Big(\frac{dx(t)}{dt}\Big)^2 + \Big(\frac{dy(t)}{dt}\Big)^2 - \Big(\frac{dz(t)}{dt}\Big)^2 } \, dt \\
    &= \int_{t_a}^{t_b} \sqrt{ \Big(\frac{d \sinh t}{dt}\Big)^2 - \Big(\frac{d \cosh t}{dt}\Big)^2 } \, dt \\
    &= \int_{t_a}^{t_b} \sqrt{ \cosh^2 t - \sinh^2 t } \, dt \\
    &= \int_{t_a}^{t_b} \sqrt{ 1 } \, dt \\
    &= t_b - t_a
    \end{aligned}
    $$
  • We can also apply the distance formula directly, which gives the same result as the integration:
    $$
    \begin{aligned}
    d(A, B) &= \text{arcosh}\Big(g_M\big( (\sinh t_a, 0, \cosh t_a), (\sinh t_b, 0, \cosh t_b) \big) \Big) \\
    &= \text{arcosh}(\cosh t_a \cosh t_b - \sinh t_a \sinh t_b) \\
    &= \text{arcosh}(\cosh(t_b - t_a)) && \text{hyperbolic identity} \\
    &= t_b - t_a
    \end{aligned}
    $$
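Both routes to the arc length can be compared numerically. The sketch below approximates the integral for the curve $(\sinh t, 0, \cosh t)$ with a simple midpoint rule and compares it with the arcosh formula. The integrand is $\sqrt{(dx/dt)^2 + (dy/dt)^2 - (dz/dt)^2}$, which is positive for tangent vectors to the hyperboloid. The endpoints `0.3` and `1.7`, the step count, and the function names are assumptions made for illustration.

```python
import math


def speed(t):
    """Length of the tangent vector of (sinh t, 0, cosh t) at parameter t."""
    dx, dy, dz = math.cosh(t), 0.0, math.sinh(t)
    return math.sqrt(dx * dx + dy * dy - dz * dz)


def arc_length(t_a, t_b, steps=1000):
    """Midpoint-rule approximation of the arc-length integral."""
    h = (t_b - t_a) / steps
    return sum(speed(t_a + (k + 0.5) * h) for k in range(steps)) * h


def closed_form(t_a, t_b):
    """arcosh(g_M(A, B)) with z as the positively signed coordinate."""
    a = (math.sinh(t_a), 0.0, math.cosh(t_a))
    b = (math.sinh(t_b), 0.0, math.cosh(t_b))
    g = a[2] * b[2] - a[0] * b[0] - a[1] * b[1]
    return math.acosh(max(1.0, g))  # clamp guards against rounding below 1


t_a, t_b = 0.3, 1.7  # hypothetical endpoints
print(arc_length(t_a, t_b), closed_form(t_a, t_b), t_b - t_a)
```

Up to rounding, all three printed values agree, which matches the conclusion that the parameter $t$ measures hyperbolic arc length along this curve.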

Poincaré Ball Model

  • The Poincaré ball model is a model of $n$-dimensional hyperbolic geometry whose points lie inside the $n$-dimensional unit ball (or in a circle in the 2D case, which is called the Poincaré disk model).
  • The Poincaré ball model can be derived directly from the hyperboloid model by stereographic projection, which maps points of the hyperboloid onto the unit disk in the plane $z=0$.
    Under this projection, a point $P=(x_1, \ldots, x_n, z)$ on the hyperboloid and its image $Q=(y_1, \ldots, y_n)$ in the unit disk satisfy
    $$
    \begin{aligned}
    y_i &= \frac{x_i}{z + 1} \\
    (z, x_i) &= \frac{(1 + \sum y_i^2, 2y_i)}{1 - \sum y_i^2}
    \end{aligned}
    $$
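  • Poincaré disk model. Projecting a geodesic of the hyperboloid onto the unit disk gives a line of the Poincaré disk model, which appears as a curve.
    Note that:
    (1) The curve approaches the boundary of the unit disk arbitrarily closely but never reaches it. From the projection, a curve nearing the boundary corresponds to one heading off to infinity on the hyperboloid.
    (2) The previous point implies that "distances at the edge of the circle grow exponentially as you move toward the edge of the circle (compared to their Euclidean distances)."
    (3) The curve meets the boundary at a 90-degree angle.
    (4) The three curves in the figure above appear to diverge from each other but are in fact parallel lines; this is the consequence of altering Euclid's fifth postulate. The figure below shows this more clearly: through a single point we can find infinitely many lines (black) parallel to another line (blue).
    Figure: Hyperbolic parallel lines that do not intersect on the Poincaré disk.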
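  • Metric tensor of the Poincaré disk model. In its standard curvature $-1$ form, the metric at a point $y$ is the Euclidean metric $g_E$ rescaled by a conformal factor:
    $$
    g_y = \left(\frac{2}{1-\lVert y\rVert^2}\right)^2 g_E
    $$
    Note that the metric tensor varies smoothly with $y$, and the farther $y$ is from the origin, the larger the computed distances become.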
  • Distance between two points $x,y$ on the Poincaré ball (this can be derived from the hyperboloid-model distance formula using the coordinate transformation between the two models):
    $$
    d_{\mathbb D}(x,y) = \text{arcosh}\left(1 + 2\frac{\lVert x-y \rVert^2}{(1-\lVert x\rVert^2)(1-\lVert y\rVert^2)}\right)
    $$
    Poincaré norm:
    $$
    \|x\|_{\mathbb{D}} := d_{\mathbb{D}}(0, x) = 2 \tanh^{-1}(\|x\|)
    $$
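  • Every Euclidean circle inside the unit disk is also a hyperbolic circle. Counterintuitively, though, its hyperbolic center is not its Euclidean center but sits at an off-center position (except for circles centered at $(0, 0)$).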
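The relations in this section can be tied together with a short numerical sketch: map two points of the disk up to the hyperboloid, then compare the Poincaré distance with the hyperboloid distance $\text{arcosh}(g_M)$, and compare the Poincaré norm with the distance to the origin. Hyperboloid points are written $(x_1, \ldots, x_n, z)$ with $z$ carrying the positive sign in $g_M$. The function names and the two sample points are illustrative assumptions.

```python
import math


def to_poincare(p):
    """Hyperboloid point (x_1, ..., x_n, z) -> ball point y_i = x_i / (z + 1)."""
    *xs, z = p
    return tuple(x / (z + 1.0) for x in xs)


def to_hyperboloid(y):
    """Ball point (y_1, ..., y_n) -> hyperboloid point (x_1, ..., x_n, z)."""
    s = sum(c * c for c in y)
    return tuple(2.0 * c / (1.0 - s) for c in y) + ((1.0 + s) / (1.0 - s),)


def lorentz_distance(u, v):
    """arcosh(g_M(u, v)) with the last coordinate positively signed."""
    g = u[-1] * v[-1] - sum(a * b for a, b in zip(u[:-1], v[:-1]))
    return math.acosh(max(1.0, g))  # clamp guards against rounding below 1


def poincare_distance(x, y):
    """Distance between two points of the open unit ball."""
    diff = sum((a - b) ** 2 for a, b in zip(x, y))
    nx = sum(a * a for a in x)
    ny = sum(b * b for b in y)
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nx) * (1.0 - ny)))


def poincare_norm(x):
    """Poincaré norm 2 * artanh(||x||), the distance from the origin."""
    return 2.0 * math.atanh(math.sqrt(sum(a * a for a in x)))


# Hypothetical points inside the unit disk.
x, y = (0.1, 0.2), (-0.5, 0.6)
u, v = to_hyperboloid(x), to_hyperboloid(y)

print(poincare_distance(x, y), lorentz_distance(u, v))      # should agree
print(poincare_norm(y), poincare_distance((0.0, 0.0), y))   # should agree
print(to_poincare(u))                                       # recovers x up to rounding
```

The agreement of the first pair of values illustrates that the two models describe the same geometry in different coordinates.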

[NeurIPS 2017] Poincaré Embeddings for Learning Hierarchical Representations

[NeurIPS 2018] Hyperbolic neural networks

[PMLR 2018] Hyperbolic entailment cones for learning hierarchical embeddings
