Fundamentals of Tensor Network Algorithms (9: Tensor Network Machine Learning, Part II)

Summer vacation is drawing to a close, and so is this series; this is the final post. Let's pick up where the previous post left off!

1. Supervised/Unsupervised Tensor Network Machine Learning

$\left| \psi \right\rangle$ is an $L$-qubit quantum state, and its parameter complexity grows exponentially with the number of features $L$. One of the central ideas of tensor network machine learning is to represent $\left| \psi \right\rangle$ as a tensor network, which reduces the parameter complexity to polynomial order.

Given $N$ training samples $\left\{ X^{[n]} \right\}$, we can train the quantum state so that it satisfies the equal-probability assumption:

$$P\left(X^{[1]}\right)=P\left(X^{[2]}\right)=\cdots$$

This is known as MPS-based unsupervised machine learning.

Define the cross-entropy (negative log-likelihood) loss function

$$f\left(\left\{X^{[n]}\right\}\right)=-\frac{1}{N} \sum_{n=1}^{N} \ln P\left(X^{[n]}\right)=-\frac{1}{N} \sum_{n=1}^{N} \ln \left(\prod_{\otimes l=1}^{L}\left|\left\langle x_{l}^{[n]} \mid \psi\right\rangle\right|\right)^{2}$$

where $N$ is the number of training samples; $f$ reaches its minimum if and only if $P\left(X^{[1]}\right)=P\left(X^{[2]}\right)=\cdots$.
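To make this concrete, here is a minimal NumPy sketch (my own, not from the original post) of evaluating this loss for an MPS Born machine: the amplitude $\langle X|\psi\rangle$ is the left-to-right contraction of the MPS with the sample's basis states, and $\langle\psi|\psi\rangle$ is divided out explicitly because the sketch does not keep the state normalized. All function names and toy dimensions are illustrative assumptions.

```python
import numpy as np

def mps_amplitude(tensors, sample):
    """<x_1 ... x_L | psi> for an MPS: contract left to right,
    fixing each physical index to the sample value."""
    vec = np.ones(1)
    for A, x in zip(tensors, sample):     # A has shape (D_left, d, D_right)
        vec = vec @ A[:, x, :]
    return vec.item()

def mps_norm_sq(tensors):
    """<psi|psi>, contracted exactly (the sketch does not assume normalization)."""
    E = np.ones((1, 1))
    for A in tensors:
        E = np.einsum('ab,aic,bid->cd', E, A, A)
    return E.item()

def nll_loss(tensors, samples):
    """f = -(1/N) sum_n ln P(X^[n]),  with  P(X) = <X|psi>^2 / <psi|psi>."""
    Z = mps_norm_sq(tensors)
    logp = [np.log(mps_amplitude(tensors, s) ** 2 / Z) for s in samples]
    return -np.mean(logp)

# toy usage: L = 6 binary features, bond dimension 4, N = 20 random samples
rng = np.random.default_rng(0)
L, d, D = 6, 2, 4
tensors = [rng.normal(size=(1 if l == 0 else D, d, 1 if l == L - 1 else D))
           for l in range(L)]
samples = rng.integers(0, d, size=(20, L))
print(nll_loss(tensors, samples))
```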

In the MPS representation, the loss function is shown in the figure below:

[Figure: the loss function in MPS form]
Having defined the loss function, we can update the tensors in the network by gradient descent, driving the loss down to a (local) minimum. The gradient update rule is

$$A^{(l)} \leftarrow A^{(l)}-\eta \frac{\partial f}{\partial A^{(l)}}$$

When $\left| \psi \right\rangle$ is represented as an MPS, the tensors can be updated one by one using the central-orthogonal form of the MPS. The steps are as follows (a minimal numerical sketch follows):
a) when updating the $l$-th tensor $A^{(l)}$, move the orthogonality center to that tensor;
b) use the differentiation rules for tensor networks to compute the gradient of the loss function with respect to $A^{(l)}$, and apply the update $A^{(l)} \leftarrow A^{(l)}-\eta \frac{\partial f}{\partial A^{(l)}}$.
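Here is a minimal numerical sketch of this sweep, reusing `nll_loss`, `tensors`, and `samples` from the snippet above. It approximates $\partial f/\partial A^{(l)}$ by finite differences rather than by the tensor-network differentiation rules and the central-orthogonal form described in the text, so it only illustrates the update rule itself, not the efficient algorithm.

```python
def numerical_grad(tensors, samples, l, eps=1e-6):
    """Finite-difference estimate of df/dA^(l) (illustration only; the post's
    method uses tensor-network differentiation and the orthogonality center)."""
    A = tensors[l]
    grad = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        A[idx] += eps
        f_plus = nll_loss(tensors, samples)
        A[idx] -= 2 * eps
        f_minus = nll_loss(tensors, samples)
        A[idx] += eps                      # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

def sweep(tensors, samples, eta=0.05, n_sweeps=3):
    """Update the tensors one by one: A^(l) <- A^(l) - eta * df/dA^(l)."""
    for _ in range(n_sweeps):
        for l in range(len(tensors)):
            tensors[l] -= eta * numerical_grad(tensors, samples, l)
        print('loss:', nll_loss(tensors, samples))
    return tensors

sweep(tensors, samples)   # continues the toy example above
```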

[Figure: diagrammatic representation of the gradient for an MPS tensor]
Take a careful look at the figure above; it uses material covered earlier in this series. Forgot it? No problem, go back and review the gradient update of tensor networks.

[Figure: review of tensor-network gradient updates]

2. Image Generation and Compression with Tensor Networks

Once the quantum state $\left| \psi \right\rangle$ has been obtained, we can compute the joint probability distribution of the pixels as well as conditional probabilities. Denote the known pixels of an image by $\left\{ x_{m}^{[A]} \right\}$ and the remaining unknown pixels by $\left\{ x_{n}^{[B]} \right\}$. The distribution of the unknown pixels is given by the conditional probability:

$$P\left(\left\{x_{n}^{[B]}\right\} \mid\left\{x_{m}^{[A]}\right\}\right)=\left(\prod_{\otimes n}\left\langle x_{n}^{[B]} \mid \tilde{\psi}\right\rangle\right)^{2}$$

where the quantum state $|\tilde{\psi}\rangle$ is obtained from $\left| \psi \right\rangle$ by projective measurement:

$$|\tilde{\psi}\rangle=\frac{1}{Z} \prod_{\otimes m}\left\langle x_{m}^{[A]} \mid \psi\right\rangle$$

where $Z$ is a normalization factor, as shown in the figure below:

[Figure: projecting $|\psi\rangle$ onto the known pixels to obtain $|\tilde{\psi}\rangle$]
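As a quick sanity check (my own sketch, using a small dense state rather than an MPS), the projection formula indeed yields a proper conditional probability, $P(\{x^{[B]}\}\mid\{x^{[A]}\})=P(\{x^{[A]}\}\cup\{x^{[B]}\})/P(\{x^{[A]}\})$; the pixel positions and values below are arbitrary.

```python
import numpy as np

# Dense 4-pixel state; pixels 0,1 are known (set A), pixels 2,3 unknown (set B).
rng = np.random.default_rng(1)
L = 4
psi = rng.normal(size=(2,) * L) + 1j * rng.normal(size=(2,) * L)
psi /= np.linalg.norm(psi)

xA = (1, 0)                                   # values of the known pixels
psi_t = psi[xA[0], xA[1], :, :]               # apply <x_0^A| <x_1^A| to |psi>
psi_t = psi_t / np.linalg.norm(psi_t)         # the 1/Z normalization -> |psi~>

xB = (0, 1)
lhs = np.abs(psi_t[xB[0], xB[1]]) ** 2                 # (prod_n <x_n^B|psi~>)^2
joint = np.abs(psi[xA[0], xA[1], xB[0], xB[1]]) ** 2   # P(x_A, x_B)
marg = np.sum(np.abs(psi[xA[0], xA[1], :, :]) ** 2)    # P(x_A)
print(np.isclose(lhs, joint / marg))                   # True
```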
The above definition of the conditional probability is self-consistent with the subsystem probability formula introduced earlier. By the law of total probability:

$$P\left(\left\{x_{n}^{[B]}\right\}\right)=\sum_{\left\{x_{m}^{[A]}\right\}} P\left(\left\{x_{m}^{[A]}\right\} \cup\left\{x_{n}^{[B]}\right\}\right)=\sum_{\left\{x_{m}^{[A]}\right\}} P\left(\left\{x_{n}^{[B]}\right\} \mid\left\{x_{m}^{[A]}\right\}\right) P\left(\left\{x_{m}^{[A]}\right\}\right)$$

Here is the proof of the exercise left at the end of the previous post!
[Figure: diagrammatic form of the proof]

Assuming the prior distribution over the known pixels is (approximately) uniform, we have:

$$P\left(\left\{x_{n}^{[B]}\right\}\right)=\frac{1}{Z} \sum_{\left\{x_{m}^{[A]}\right\}} P\left(\left\{x_{n}^{[B]}\right\} \mid\left\{x_{m}^{[A]}\right\}\right)$$

Since

$$P\left(\left\{x_{n}^{[B]}\right\} \mid\left\{x_{m}^{[A]}\right\}\right)=\left(\prod_{\otimes n}\left\langle x_{n}^{[B]} \mid \tilde{\psi}\right\rangle\right)^{2}$$

it follows that

$$P\left(\left\{x_{n}^{[B]}\right\} \mid\left\{x_{m}^{[A]}\right\}\right)=\frac{1}{Z}\left(\prod_{\otimes n}\left\langle x_{n}^{[B]}\right| \prod_{\otimes m}\left\langle x_{m}^{[A]} \mid \psi\right\rangle\right)^{2}$$

Substituting this into the total-probability expression above (and absorbing all normalization factors) gives:

$$P\left(\left\{x_{n}^{[B]}\right\}\right)=\prod_{\otimes n} \operatorname{Tr}_{\left\{x_{m}^{[A]}\right\}}\left\langle x_{n}^{[B]} \mid \psi\right\rangle\left\langle\psi \mid x_{n}^{[B]}\right\rangle=\prod_{\otimes n}\left\langle x_{n}^{[B]}\left|\operatorname{Tr}_{\left\{x_{m}^{[A]}\right\}}\right| \psi\right\rangle\left\langle\psi \mid x_{n}^{[B]}\right\rangle$$

And since the reduced density matrix of subsystem B is

$$\hat{\rho}^{[B]}=\operatorname{Tr}_{\left\{x_{m}^{[A]}\right\}}|\psi\rangle\langle\psi|$$

the probability formula from the previous post, $P\left(\left\{x_{n}^{[B]}\right\}\right)=\prod_{\otimes n}\left\langle x_{n}^{[B]}\left|\hat{\rho}^{[B]}\right| x_{n}^{[B]}\right\rangle$, follows, which completes the proof.
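The identity just proved can also be checked numerically on a small dense state (my own sketch; a 4-pixel state with $A=\{0,1\}$ and $B=\{2,3\}$ is assumed purely for illustration):

```python
import numpy as np

# Check: sum_{x_A} P(x_A, x_B)  ==  <x_B| rho^[B] |x_B>,  rho^[B] = Tr_A |psi><psi|
rng = np.random.default_rng(2)
psi = rng.normal(size=(2, 2, 2, 2)) + 1j * rng.normal(size=(2, 2, 2, 2))
psi /= np.linalg.norm(psi)

xB = (1, 0)
lhs = np.sum(np.abs(psi[:, :, xB[0], xB[1]]) ** 2)      # marginal over the A pixels

rho_B = np.einsum('abij,abkl->ijkl', psi, psi.conj())   # trace out pixels 0 and 1
rhs = rho_B[xB[0], xB[1], xB[0], xB[1]].real            # <x_B| rho^[B] |x_B>
print(np.isclose(lhs, rhs))                             # True
```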

To avoid exponential complexity, the pixels are generated one at a time. Starting from the conditional probability

$$P\left(\left\{x_{n}^{[B]}\right\} \mid\left\{x_{m}^{[A]}\right\}\right)=\frac{1}{Z}\left(\prod_{\otimes n}\left\langle x_{n}^{[B]}\right| \prod_{\otimes m}\left\langle x_{m}^{[A]} \mid \psi\right\rangle\right)^{2}$$

the steps are as follows (a dense-state sketch is given after the list):

  1. From $\left| \psi \right\rangle$ and the known pixels $\left\{ x_{m}^{[A]} \right\}$, use the projection formula $|\tilde{\psi}\rangle=\frac{1}{Z} \prod_{\otimes m}\left\langle x_{m}^{[A]} \mid \psi\right\rangle$ to compute the quantum state describing the unknown pixels, denoted $\left| \tilde{\psi}^{0} \right\rangle$;
  2. From $\left| \tilde{\psi}^{t-1} \right\rangle$, compute the reduced density matrix $\hat{\rho}^{[t-1]}$ of the qubit corresponding to the $t$-th unknown pixel $x_{t}^{[B]}$, obtain the pixel's probability distribution $P(x)=\left\langle x\left|\hat{\rho}^{[t-1]}\right| x\right\rangle$, and sample from it to generate the pixel $x_{t}^{[B]}$;
  3. If unknown pixels remain, project $\left| \tilde{\psi}^{t-1} \right\rangle$ onto the newly generated pixel to obtain $\left| \tilde{\psi}^{t} \right\rangle =\frac{1}{Z}\left\langle x_{t}^{[B]} \mid \tilde{\psi}^{(t-1)} \right\rangle$, and return to step 2.
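Below is a minimal sketch of these three steps. For readability it stores the state as a dense vector (exponential in $L$), whereas the method in the text would keep everything in MPS form; the binary pixel alphabet, function names, and toy example are assumptions of the sketch.

```python
import numpy as np

def generate_pixels(psi, known, L, rng):
    """Pixel-by-pixel generation (steps 1-3 above), using a dense state vector
    instead of an MPS, so it is only feasible for small L."""
    t = np.asarray(psi, dtype=complex).reshape([2] * L)
    remaining = list(range(L))
    # step 1: project onto the known pixels {x_m^[A]} to get |psi~^0>
    for q, v in known.items():
        t = np.take(t, v, axis=remaining.index(q))
        remaining.remove(q)
    t /= np.linalg.norm(t)
    out = dict(known)
    # steps 2-3: sample the unknown pixels one at a time
    for q in list(remaining):
        ax = remaining.index(q)
        M = np.moveaxis(t, ax, 0).reshape(2, -1)
        rho = M @ M.conj().T                  # reduced density matrix of pixel q
        p = np.real(np.diag(rho))
        p /= p.sum()                          # P(x) = <x|rho|x>
        v = int(rng.choice(2, p=p))           # sample the pixel value
        out[q] = v
        t = np.take(t, v, axis=ax)            # project |psi~^{t-1}> -> |psi~^t>
        remaining.remove(q)
        if remaining:
            t /= np.linalg.norm(t)
    return out

# toy usage: a random 5-pixel state, pixels 0 and 3 already known
rng = np.random.default_rng(0)
L = 5
psi = rng.normal(size=2**L) + 1j * rng.normal(size=2**L)
psi /= np.linalg.norm(psi)
print(generate_pixels(psi, {0: 1, 3: 0}, L, rng))
```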

Tip: the material above is a bit dense. If something is unclear, scroll back up to the corresponding formula; that usually makes it much easier to follow.

If only $\left| \psi \right\rangle$ is known and no pixels are given, $\left\{ x_{m}^{[A]} \right\}$ can be taken to be the empty set, and an image can still be generated pixel by pixel. Each time the probability distribution of a single pixel is obtained from the reduced density matrix, that pixel can be generated by random sampling. Alternatively, after obtaining the single-pixel distribution, one can take the most probable value as the generated pixel; with this method, each quantum state $\left| \psi \right\rangle$ generates exactly one most-probable image, called the quantum average image.
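(In the dense-state sketch above, this simply corresponds to passing `known={}` and replacing the sampling step `rng.choice(2, p=p)` with the deterministic choice `np.argmax(p)`.)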

Tensor network compressed sensing: using the most-probable generation method, the image is sampled compressively; that is, by keeping as few pixels as possible, the original image is reconstructed from the quantum state. The sampling scheme is called the entanglement-ordered sampling protocol (EOSP); its core idea is to use the entanglement entropy to measure how much information each pixel carries, and to sample (keep) the pixels with large entanglement entropy.
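To illustrate just this core idea (not the full EOSP protocol), the sketch below ranks pixels by the von Neumann entropy of their single-pixel reduced density matrices, reusing `psi` and `L` from the previous snippet; a dense state is again assumed.

```python
def pixel_entropies(psi, L):
    """Von Neumann entropy of each pixel's reduced density matrix; pixels with
    larger entropy carry more information and would be kept first."""
    t = np.asarray(psi, dtype=complex).reshape([2] * L)
    entropies = []
    for q in range(L):
        M = np.moveaxis(t, q, 0).reshape(2, -1)
        rho = M @ M.conj().T
        w = np.linalg.eigvalsh(rho)
        w = w[w > 1e-12]
        entropies.append(float(-(w * np.log(w)).sum()))
    order = np.argsort(entropies)[::-1]     # most informative pixels first
    return order, entropies

order, ent = pixel_entropies(psi, L)        # reuses psi, L from the sketch above
print(order, np.round(ent, 3))
```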


3. Supervised Tensor Network Machine Learning

Taking classification as an example, we need a tensor network that realizes the mapping $f$ from data $\{x\}$ to a class label $\kappa$. For a probabilistic model, this mapping can be given by a conditional probability $P$:

$$f:\{x\} \rightarrow \kappa \quad \Rightarrow \quad P(\kappa \mid\{x\})$$

A common way to build this conditional probability with a tensor network is to obtain $\left| \breve{\psi} \right\rangle$ by unsupervised learning and compute the joint probability distribution, and then obtain the conditional probability by projection. $\left| \breve{\psi} \right\rangle$ contains $(L+1)$ qubits: $L$ qubits correspond to the image pixels and one qubit corresponds to the class label, so the MPS has $(L+1)$ physical indices.
[Figure: MPS with $(L+1)$ physical indices, one of which carries the class label]
The concrete procedure for supervised learning (a dense-state sketch follows the steps):

  1. Treat the class label $\kappa$ as an additional feature and perform unsupervised learning on the training set to train the quantum state $\left| \breve{\psi} \right\rangle$;
  2. Compute the conditional probability by projection, $P(\kappa \mid \{x\})=\left(\langle\kappa| \prod_{\otimes n}\left\langle x_{n} \mid \breve{\psi}\right\rangle\right)^{2}$ (up to normalization); the classification result is the most probable label, $\operatorname{argmax}_{\kappa} P(\kappa \mid\{x\})$.
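A minimal dense-state sketch of step 2 (the label qubit is placed on axis 0, pixels are binary, and all names and dimensions are illustrative assumptions):

```python
import numpy as np

def classify(psi_label, x, L, n_labels=2):
    """argmax_kappa P(kappa | {x}): project the (L+1)-qubit state onto the pixel
    values and read off the distribution of the label qubit."""
    t = np.asarray(psi_label, dtype=complex).reshape([n_labels] + [2] * L)
    for v in reversed(range(L)):            # project pixel axes back-to-front
        t = np.take(t, int(x[v]), axis=v + 1)
    p = np.abs(t) ** 2
    p /= p.sum()                            # P(kappa | {x})
    return int(np.argmax(p)), p

# toy usage: a random (L+1)-qubit "trained" state and one binary sample
rng = np.random.default_rng(0)
L, n_labels = 4, 2
psi_label = rng.normal(size=n_labels * 2**L) + 1j * rng.normal(size=n_labels * 2**L)
psi_label /= np.linalg.norm(psi_label)
label, probs = classify(psi_label, [0, 1, 1, 0], L)
print(label, np.round(probs, 3))
```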

Note: the normalization condition of the quantum state is the key to the quantum-probabilistic interpretability of tensor networks.

I am a tiny rookie who keeps learning and hopes to become a proper beginner one day. If you spot any mistakes, please point them out, and if you enjoyed this post, please give it a like!
