Notes on "Towards Logical Specification of Statistical Machine Learning"

Contents

  • Preliminaries

    Preliminary background; I didn't fully understand it, but it doesn't seem to affect the later sections.

  • Techniques for Conditional Indistinguishability

    • Counterfactual Epistemic Operators

      Introduces two operators, mainly in preparation for formalizing the fairness property.

    • Conditional Indistinguishability via Counterfactual Knowledge

      How to express "conditional indistinguishability" using the two operators described above.

  • Formal Model for Statistical Classification

    • Statistical Classification Problems

      Gives several definitions:

      • $C: D \rightarrow L$: the classifier, where $L$ is a finite set of class labels and $D$ is the finite set of input data (called feature vectors) that we want to classify.
      • $f: D \times L \rightarrow \mathbb{R}$: a scoring function that gives a score $f(v, \ell)$ for predicting the class of an input datum (feature vector) $v$ as a label $\ell$.
      • $H(v) = \ell$: represents that the label $\ell$ maximizes $f(v, \ell)$.
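The definitions above can be sketched in code. Everything here (the labels, the scoring function) is a hypothetical toy for illustration, not from the paper:

```python
# A toy instance of the paper's setup: L is a finite label set,
# f: D x L -> R is a scoring function, and H(v) = argmax_l f(v, l).
LABELS = ["panda", "gibbon"]  # L: finite set of class labels

def f(v, label):
    """Hypothetical scoring function f: D x L -> R on a 1-D feature v."""
    return v if label == "panda" else 1.0 - v

def H(v):
    """H(v) = l means the label l maximizes f(v, l)."""
    return max(LABELS, key=lambda label: f(v, label))

print(H(0.9))  # panda: f(0.9, "panda") = 0.9 > f(0.9, "gibbon") = 0.1
```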
    • Modeling the Behaviours of Classifiers

      Gives two formulas:

      $\begin{array}{ll} s \models \psi(x, y) & \text{iff } C(\sigma_{s}(x)) = \sigma_{s}(y) \\ s \models h(x, y) & \text{iff } H(\sigma_{s}(x)) = \sigma_{s}(y) \end{array}$

      $\psi(x, y)$ represents that $C$ classifies a given input $x$ as a class $y$.

      $h(x, y)$ represents that $y$ is the actual class of an input $x$.
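As a toy illustration of the two predicates. The classifier `C`, the ground-truth labeling `H`, and the state's assignment `sigma_s` are all hypothetical:

```python
# A state s assigns values to variables via sigma_s; psi(x, y) holds iff
# C(sigma_s(x)) = sigma_s(y), and h(x, y) holds iff H(sigma_s(x)) = sigma_s(y).
def C(v):   # hypothetical trained classifier
    return "panda" if v >= 0.5 else "gibbon"

def H(v):   # hypothetical ground-truth labeling
    return "panda" if v >= 0.4 else "gibbon"

sigma_s = {"x": 0.45, "y": "panda"}  # one state's variable assignment

psi = C(sigma_s["x"]) == sigma_s["y"]  # does s |= psi(x, y) hold?
h   = H(sigma_s["x"]) == sigma_s["y"]  # does s |= h(x, y) hold?
print(psi, h)  # False True: this state is a false negative
```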

  • Formalizing the Classification Performance

    • Formalizing correctness


    • true positive: $s \models \psi_{\ell}(x) \wedge h_{\ell}(x)$.

    • the precision being within an interval $I$ is given by:

      $\Pr[v \stackrel{\$}{\leftarrow} \sigma_{w_{\mathrm{real}}}(x) : H(v)=\ell \mid C(v)=\ell] \in I$, or equivalently $\Pr[s \stackrel{\$}{\leftarrow} w_{\mathrm{real}} : s \models h_{\ell}(x) \mid s \models \psi_{\ell}(x)] \in I$.

    • $\text{Precision}_{\ell, I}(x) \stackrel{\text{def}}{=} \psi_{\ell}(x) \supset \mathbb{P}_{I}\, h_{\ell}(x)$, corresponding to $\text{precision} = \frac{tp}{tp + fp}$.

    • $\text{Recall}_{\ell, I}(x) \stackrel{\text{def}}{=} h_{\ell}(x) \supset \mathbb{P}_{I}\, \psi_{\ell}(x)$, corresponding to $\text{recall} = \frac{tp}{tp + fn}$.

    • $\text{Accuracy}_{\ell, I}(x) \stackrel{\text{def}}{=} \mathbb{P}_{I}(\mathrm{tp}(x) \vee \mathrm{tn}(x))$, corresponding to $\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$.
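As a sanity check on the count-based formulas above, a small sketch computing precision, recall, and accuracy from hypothetical toy predictions:

```python
# Hypothetical toy data: predicted labels C(v) and actual labels H(v)
# for a task whose positive label is "panda".
preds  = ["panda", "panda", "gibbon", "panda", "gibbon"]
actual = ["panda", "gibbon", "gibbon", "panda", "panda"]

tp = sum(p == "panda" and a == "panda" for p, a in zip(preds, actual))
fp = sum(p == "panda" and a != "panda" for p, a in zip(preds, actual))
fn = sum(p != "panda" and a == "panda" for p, a in zip(preds, actual))
tn = sum(p != "panda" and a != "panda" for p, a in zip(preds, actual))

precision = tp / (tp + fp)          # tp / (tp + fp)
recall    = tp / (tp + fn)          # tp / (tp + fn)
accuracy  = (tp + tn) / len(preds)  # (TP + TN) / (TP + TN + FP + FN)

print(precision, recall, accuracy)  # precision = recall = 2/3, accuracy = 0.6
```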

  • Formalizing the Robustness of Classifiers

    • Probabilistic Robustness against Targeted Attacks
      • Definition: when a robustness attack aims at misclassifying an input as a specific target label, it is called a targeted attack.
      • $K_{\varepsilon}^{D} \varphi$ represents that the classifier $C$ is confident that $\varphi$ is true as far as it classifies test data perturbed by a level $\varepsilon$ of noise.
      • $D$ is defined by $D(\sigma_{w}(x) \| \sigma_{w'}(x)) = \max_{v, v'} \|v - v'\|_{p}$, where $v$ and $v'$ range over the datasets $\mathrm{supp}(\sigma_{w}(x))$ and $\mathrm{supp}(\sigma_{w'}(x))$ respectively.
      • The following formulas are given:
      • $h_{\text{panda}}(x) \supset K_{\varepsilon}^{D} \mathbb{P}_{0}\, \psi_{\text{gibbon}}(x)$, which represents that a panda's photo $x$ will not be recognized as a gibbon at all after the photo is perturbed by noise.
      • $\text{TargetRobust}_{\text{panda}, \delta}(x, \text{gibbon}) \stackrel{\text{def}}{=} K_{\varepsilon}^{D}(h_{\text{panda}}(x) \supset \mathbb{P}_{[0, \delta]}\, \psi_{\text{gibbon}}(x))$.
    • Probabilistic Robustness against Non-Targeted Attacks
      • $\text{TotalRobust}_{\ell, I}(x) \stackrel{\text{def}}{=} K_{\varepsilon}^{D}(h_{\ell}(x) \supset \mathbb{P}_{I}\, \psi_{\ell}(x)) = K_{\varepsilon}^{D}\, \text{Recall}_{\ell, I}(x)$.
      • Conclusions:
      • $\text{TotalRobust}_{\text{panda}, I}(x)$ implies $\text{TargetRobust}_{\text{panda}, \delta}(x, \text{gibbon})$.
      • Robustness can be regarded as recall in the presence of noise perturbation.
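The robustness notions above can be probed empirically. A minimal Monte-Carlo sketch, assuming a hypothetical 1-D classifier and uniform ε-bounded noise (both assumptions are mine, not the paper's):

```python
# Estimate the probability that an epsilon-perturbed input is classified as
# the target label; TargetRobust asks this probability to lie in [0, delta].
import random

def classify(v):
    """Hypothetical binary classifier on a 1-D feature."""
    return "panda" if v >= 0.5 else "gibbon"

def estimate_target_prob(x, eps, target, trials=10_000):
    """Monte-Carlo estimate of Pr[C(x + noise) = target], |noise| <= eps."""
    random.seed(0)  # fixed seed for reproducibility
    hits = sum(classify(x + random.uniform(-eps, eps)) == target
               for _ in range(trials))
    return hits / trials

# An input far from the decision boundary is robust against the target attack:
p = estimate_target_prob(x=0.9, eps=0.1, target="gibbon")
print(p)  # 0.0: no perturbation within eps crosses the decision boundary
```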
  • Formalizing the Fairness of Classifiers

    • Notation:
      • $s \models \eta_{G}(x)$ iff $\sigma_{s}(x) \in G$.
      • $w \models \xi_{d}$ iff $\sigma_{w}(x) = d$.
    • Group Fairness (Statistical Parity)
      • Definition: the property that the output distributions of the classifier are identical for different groups.
      • $\mathcal{R}_{\varepsilon} \stackrel{\text{def}}{=} \{(w, w') \in \mathcal{W} \times \mathcal{W} \mid D(\sigma_{w}(y) \| \sigma_{w'}(y)) \leq \varepsilon\}$.
      • $\mathfrak{M}, w \models \overline{\mathrm{P}_{\varepsilon}}\, \varphi$ iff there exists a $w'$ s.t. $(w, w') \notin \mathcal{R}_{\varepsilon}$ and $\mathfrak{M}, w' \models \varphi$.
      • $\text{GrpFair}(x, y) \stackrel{\text{def}}{=} (\eta_{G_0}(x) \wedge \psi(x, y)) \supset \neg \overline{\mathrm{P}_{\varepsilon}^{\mathrm{tv}}}\, \mathbb{P}_{1}(\xi_{d} \wedge \eta_{G_1}(x) \wedge \psi(x, y))$.
    • Individual Fairness (as Lipschitz Property)
      • the property that the classifier outputs similar labels given similar inputs.
      • $\mathcal{R}_{\varepsilon}^{r, D} \stackrel{\text{def}}{=} \{(w, w') \in \mathcal{W} \times \mathcal{W} \mid D(\sigma_{w}(y) \| \sigma_{w'}(y)) \leq \varepsilon \cdot r(v, v') \text{ for } v \in \mathrm{supp}(\sigma_{w}(x)),\, v' \in \mathrm{supp}(\sigma_{w'}(x))\}$.
      • $\text{IndFair}(x, y) \stackrel{\text{def}}{=} \psi(x, y) \supset \neg \overline{\mathrm{P}_{\varepsilon}^{r, D}}\, \mathbb{P}_{1}(\xi_{d} \wedge \psi(x, y))$.
    • Equal Opportunity
      • the property that the recall (true positive rate) is the same for all the groups.
      • $\mathrm{EqOpp}(x) \stackrel{\text{def}}{=} (\eta_{G}(x) \wedge \psi(x, y)) \supset \neg \overline{\mathrm{P}_{0}^{\mathrm{tv}}}\, \mathbb{P}_{1}(\xi_{d} \wedge \neg \eta_{G}(x) \wedge \psi(x, y))$.
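The group-fairness notions above compare the classifier's output distributions under total variation distance. A small sketch with hypothetical group outputs and labels (none of this data is from the paper):

```python
# Statistical parity check: the total variation distance between the
# classifier's output distributions for two groups should be at most epsilon.
from collections import Counter

def output_dist(labels):
    """Empirical distribution over output labels."""
    n = len(labels)
    counts = Counter(labels)
    return {label: counts[label] / n for label in counts}

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in support)

# Hypothetical classifier outputs for members of groups G0 and G1:
g0_out = ["hire", "hire", "reject", "hire"]
g1_out = ["hire", "reject", "reject", "hire"]

d = tv_distance(output_dist(g0_out), output_dist(g1_out))
print(d)         # 0.25
print(d <= 0.3)  # True: statistical parity holds for epsilon = 0.3
```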

Reference

Statistical machine learning

The symbol |= (a vertical bar followed by an equals sign): the satisfaction ("models") relation, ⊨

Total variation distance
