Conformal Prediction

Conformal Prediction in Classification

Conformal Coverage Guarantee

Given the calibration data set $\left\{ \left( X_i, Y_i^{*} \right) \right\}_{i=1}^{n}$ and a pretrained model $\hat{f}(\cdot)$ with $\hat{f}(X_i) \in [0,1]^{K}$, the probability (or confidence) assigned to the true label is $\hat{f}(X_i)_{Y_i^{*}}$.

Calculate and sort the conformal scores: $s_i = s\left( X_i, Y_i^{*} \right) = 1 - \hat{f}(X_i)_{Y_i^{*}}$, indexed so that $s_1 \leq \cdots \leq s_n$.

Obtain the $\frac{\lceil (n+1)(1-\alpha) \rceil}{n}$ quantile of $\left\{ s_i \right\}_{i=1}^{n}$: $\hat{q} = \inf \left\{ q : \frac{\left| \left\{ i : s_i \leq q \right\} \right|}{n} \geq \frac{\lceil (n+1)(1-\alpha) \rceil}{n} \right\} = s_{\lceil (n+1)(1-\alpha) \rceil}$.

Construct the prediction set of $\left( X_{test}, Y_{test}^{*} \right)$: $\mathcal{C}\left( X_{test} \right) = \left\{ y : \hat{f}\left( X_{test} \right)_{y} \geq 1 - \hat{q} \right\} = \left\{ y : s\left( X_{test}, y \right) \leq \hat{q} \right\}$.
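The calibration and prediction-set steps can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names (`conformal_quantile`, `calibrate`, `prediction_set`) are made up for this note, the model outputs are assumed to be given as an array of class probabilities, and the choice to return an infinite threshold when $\lceil (n+1)(1-\alpha) \rceil > n$ is an assumption of the sketch.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Assumption of this sketch: with too few calibration points,
        # fall back to an infinite threshold (every label is kept).
        return float("inf")
    return float(scores[rank - 1])

def calibrate(probs, labels, alpha):
    """probs: (n, K) predicted probabilities; labels: (n,) true class indices."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # s_i = 1 - f(X_i)_{Y_i*}
    scores = 1.0 - probs[np.arange(len(labels)), labels]
    return conformal_quantile(scores, alpha)

def prediction_set(test_probs, q_hat):
    """All labels y whose score 1 - f(X_test)_y is at most q_hat."""
    test_probs = np.asarray(test_probs, dtype=float)
    return [int(y) for y in np.flatnonzero(1.0 - test_probs <= q_hat)]

# Toy calibration set with n = 4 and K = 3 (values are illustrative only).
cal_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
cal_labels = [0, 1, 2, 1]
q_hat = calibrate(cal_probs, cal_labels, alpha=0.5)
test_set = prediction_set([0.5, 0.4, 0.1], q_hat)
```

With $n = 4$ and $\alpha = 0.5$ the rank is $\lceil 5 \times 0.5 \rceil = 3$, so `q_hat` is the third-smallest calibration score.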

The event $\left\{ Y_{test}^{*} \in \mathcal{C}\left( X_{test} \right) \right\}$ is equivalent to $\left\{ s\left( X_{test}, Y_{test}^{*} \right) \leq \hat{q} \right\}$.

By the exchangeability of $\left( X_1, Y_1^{*} \right), \cdots, \left( X_n, Y_n^{*} \right), \left( X_{test}, Y_{test}^{*} \right)$, the rank of $s_{test} = s\left( X_{test}, Y_{test}^{*} \right)$ among the $n+1$ scores is uniform. For the sorted calibration scores (assuming no ties), we therefore have $\mathcal{P}\left( s_{test} \leq s_i \right) = \frac{i}{n+1}$.

Then we get the probability of conformal coverage: $\mathcal{P}\left( Y_{test}^{*} \in \mathcal{C}\left( X_{test} \right) \right) = \mathcal{P}\left( s_{test} \leq \hat{q} \right) = \frac{\lceil (n+1)(1-\alpha) \rceil}{n+1}$.
The lower bound is $1-\alpha$, and the upper bound is $1-\alpha+\frac{1}{n+1}$.
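For completeness, the two bounds follow from the elementary inequality $x \leq \lceil x \rceil < x + 1$ applied to $x = (n+1)(1-\alpha)$. The short derivation below, together with a numeric check for the illustrative values $n = 1000$ and $\alpha = 0.1$ (chosen here only as an example), is a sketch of that step:

```latex
\begin{align*}
(n+1)(1-\alpha) \;\leq\; \lceil (n+1)(1-\alpha) \rceil \;&<\; (n+1)(1-\alpha) + 1 \\
\Longrightarrow\quad
1-\alpha \;\leq\; \frac{\lceil (n+1)(1-\alpha) \rceil}{n+1} \;&<\; 1-\alpha+\frac{1}{n+1} \\[4pt]
\text{Example } (n = 1000,\ \alpha = 0.1):\quad
\lceil 1001 \times 0.9 \rceil = \lceil 900.9 \rceil = 901,
\quad \frac{901}{1001} &\approx 0.9001 \in \left[ 0.9,\ 0.9 + \tfrac{1}{1001} \right]
\end{align*}
```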

Classification with Adaptive Prediction Set

Let $\left\{ \pi_k\left( X_i \right) \right\}_{k=1}^{K}$ be the permutation of $\left\{ k \right\}_{k=1}^{K}$ that sorts $\hat{f}\left( X_i \right)$ in descending order, i.e., $\hat{f}\left( X_i \right)_{\pi_1\left( X_i \right)} \geq \cdots \geq \hat{f}\left( X_i \right)_{\pi_K\left( X_i \right)}$.

Calculate the conformal scores: $s_i = s\left( X_i, Y_i^{*} \right) = \sum_{j=1}^{k} \hat{f}\left( X_i \right)_{\pi_j\left( X_i \right)}$, where $k$ is the rank of the true label, i.e., $\pi_k\left( X_i \right) = Y_i^{*}$.

Obtain the $\frac{\lceil (n+1)(1-\alpha) \rceil}{n}$ quantile of $\left\{ s_i \right\}_{i=1}^{n}$: $\hat{q}$.

Prediction set: $\mathcal{C}\left( X_{test} \right) = \left\{ \pi_1\left( X_{test} \right), \cdots, \pi_k\left( X_{test} \right) \right\}$, where $k = \sup \left\{ k^{\prime} : \sum_{j=1}^{k^{\prime}} \hat{f}\left( X_{test} \right)_{\pi_j\left( X_{test} \right)} < \hat{q} \right\} + 1$.
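The adaptive score and the resulting prediction set can be sketched as follows. This is a hedged illustration under a few assumptions: the names `aps_score` and `aps_prediction_set` are hypothetical, ties in the probabilities are broken by class index, the supremum over an empty set is taken to be 0, and $k$ is capped at $K$ when $\hat{q}$ exceeds the total probability mass.

```python
import numpy as np

def aps_score(probs, label):
    """Cumulative probability of all classes ranked at or above the true label."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs, kind="stable")       # pi_1, ..., pi_K (descending)
    cumulative = np.cumsum(probs[order])
    k = int(np.flatnonzero(order == label)[0])      # 0-based rank of the true label
    return float(cumulative[k])

def aps_prediction_set(probs, q_hat):
    """Top-k classes with k = sup{k' : sum_{j<=k'} f_{pi_j} < q_hat} + 1."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(-probs, kind="stable")
    cumulative = np.cumsum(probs[order])
    # The cumulative sums are non-decreasing, so counting the entries below
    # q_hat gives the supremum (0 if no prefix sum is below q_hat).
    k = min(int(np.sum(cumulative < q_hat)) + 1, probs.size)
    return [int(y) for y in order[:k]]

# Toy example: class 1 is most likely, then class 2, then class 0.
probs = [0.1, 0.6, 0.3]
score_for_label_2 = aps_score(probs, label=2)       # 0.6 + 0.3
set_at_07 = aps_prediction_set(probs, q_hat=0.7)
```

Because the set always includes one class past the last prefix sum below $\hat{q}$, it is never empty, and it grows when the model spreads its probability mass over many classes.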

(proof of bound not completed)

LLMs with Conformal Factuality Guarantees

Given an input $X_{test} \in \mathcal{X}$, we get an output $\mathcal{L}\left( X_{test} \right) \in \mathcal{Y}$. The goal is $\mathcal{P}\left( \mathcal{L}\left( X_{test} \right) \; \text{is correct} \right) \geq 1-\alpha$.

The correctness of $\mathcal{L}\left( X_{test} \right)$ is equivalent to the entailment relation $Y_{test}^{*} \Rightarrow \mathcal{L}\left( X_{test} \right)$.

Define the entailment set of $\mathcal{L}\left( X_{test} \right)$: $\mathcal{E}\left( \mathcal{L}\left( X_{test} \right) \right) = \left\{ y \in \mathcal{Y} : y \Rightarrow \mathcal{L}\left( X_{test} \right) \right\}$. Then $\left\{ Y_{test}^{*} \Rightarrow \mathcal{L}\left( X_{test} \right) \right\}$ is equivalent to $\left\{ Y_{test}^{*} \in \mathcal{E}\left( \mathcal{L}\left( X_{test} \right) \right) \right\}$.

Construct $\left\{ \mathcal{F}_t\left( X_i \right) \right\}_{t \in \mathcal{T}}$ satisfying the nested property (i.e., $\forall t_1, t_2 \in \mathcal{T},\ t_1 \leq t_2 \rightarrow \mathcal{F}_{t_1}\left( X_i \right) \subseteq \mathcal{F}_{t_2}\left( X_i \right)$), where $\mathcal{F}_t\left( X_i \right)$ is the entailment set of $F_t\left( X_i, \mathcal{L}\left( X_i \right) \right)$ (i.e., $\mathcal{F}_t\left( X_i \right) = \mathcal{E}\left( F_t\left( X_i, \mathcal{L}\left( X_i \right) \right) \right)$). Here $F_t\left( X_i, \mathcal{L}\left( X_i \right) \right)$ is the output obtained from the base output $\mathcal{L}\left( X_i \right)$ by the "back-off" function $F_t(\cdot)$ with threshold $t$, which removes unreliable sub-claims. At the two extremes, $F_{\sup \mathcal{T}}\left( X_i, \mathcal{L}\left( X_i \right) \right) = \varnothing$ and $F_0\left( X_i, \mathcal{L}\left( X_i \right) \right) = \mathcal{L}\left( X_i \right)$.

Define the conformal scores: $r\left( X_i, Y_i^{*} \right) = \inf \left\{ t : \forall j \geq t,\ Y_i^{*} \in \mathcal{F}_j\left( X_i \right) \right\} = \inf \left\{ t : \forall j \geq t,\ Y_i^{*} \Rightarrow F_j\left( X_i, \mathcal{L}\left( X_i \right) \right) \right\}$
(the minimum safe threshold whose entailment set contains the true label).

Obtain the $\frac{\lceil (n+1)(1-\alpha) \rceil}{n}$ quantile of $\left\{ r\left( X_i, Y_i^{*} \right) \right\}_{i=1}^{n}$: $\hat{q} = r_{\lceil (n+1)(1-\alpha) \rceil}$ (after sorting $r_1 \leq \cdots \leq r_n$).

The event $\left\{ r_{test} \leq \hat{q} \right\}$ is equivalent to $\left\{ Y_{test}^{*} \Rightarrow F_{\hat{q}}\left( X_{test}, \mathcal{L}\left( X_{test} \right) \right) \right\}$ (i.e., $\hat{q}$ is a safe threshold).

Then $\mathcal{P}\left( F_{\hat{q}}\left( X_{test}, \mathcal{L}\left( X_{test} \right) \right) \; \text{is correct} \right) = \mathcal{P}\left( r_{test} \leq \hat{q} \right) = \frac{\lceil (n+1)(1-\alpha) \rceil}{n+1} \in \left[ 1-\alpha,\ 1-\alpha+\frac{1}{n+1} \right]$.

We can evaluate the entailment of the output at any threshold $t$ by evaluating each sub-claim of the base output only once and then computing the supremum (safe threshold) over the sub-claims within the current output.

For sub-claims $\left\{ c_m \right\}_{m=1}^{M_{it}}$ and $Y_i^{*} \in \mathcal{Y}$, $\left\{ Y_i^{*} \Rightarrow F_t\left( X_i, \mathcal{L}\left( X_i \right) \right) \right\} \Leftrightarrow \left\{ \forall m \in \{1, \cdots, M_{it}\},\ Y_i^{*} \Rightarrow c_m \right\}$, where $M_{it}$ is the number of sub-claims of the $i$-th output that are kept (accepted) at threshold $t$.

Then the conformal scores are: $r\left( X_i, Y_i^{*} \right) = \inf \left\{ t : \forall j \geq t,\ \forall c \in F_j\left( X_i, \mathcal{L}\left( X_i \right) \right),\ Y_i^{*} \Rightarrow c \right\}$.

As for the entailment set, $\mathcal{E}\left( F_t\left( X_i, \mathcal{L}\left( X_i \right) \right) \right) = \bigcap_{m=1}^{M_{it}} \mathcal{E}\left( c_m \right)$, and then the conformal scores are $r\left( X_i, Y_i^{*} \right) = \inf \left\{ t : \forall j \geq t,\ Y_i^{*} \in \bigcap_{m=1}^{M_{ij}} \mathcal{E}\left( c_m \right) \right\}$.
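To make the sub-claim formulation concrete, here is a minimal sketch under one specific, assumed back-off rule: every sub-claim carries a confidence value in $(0, 1]$, and $F_t$ keeps the sub-claims whose confidence is strictly greater than $t$. The notes above do not fix a particular back-off function, so this rule, the dictionary layout (`text`, `conf`, `entailed`), and the function names are all illustrative assumptions. The `entailed` flag stands for the one-time annotation of whether $Y_i^{*} \Rightarrow c_m$ holds.

```python
import math

def back_off(claims, t):
    """Hypothetical F_t: keep the sub-claims whose confidence exceeds t."""
    return [c for c in claims if c["conf"] > t]

def factuality_score(claims):
    """r(X_i, Y_i*) under the back-off rule above.

    Every output kept at j >= t is fully entailed exactly when t is at least
    the largest confidence among the non-entailed sub-claims (0.0 if none).
    """
    return max((c["conf"] for c in claims if not c["entailed"]), default=0.0)

def calibrate_threshold(scores, alpha):
    """q_hat = the ceil((n+1)(1-alpha))-th smallest calibration score."""
    scores = sorted(scores)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    # Assumption of this sketch: return infinity (empty output) if rank > n.
    return scores[rank - 1] if rank <= n else float("inf")

# One calibration example with three annotated sub-claims (toy values).
claims = [
    {"text": "claim a", "conf": 0.9, "entailed": True},
    {"text": "claim b", "conf": 0.4, "entailed": False},
    {"text": "claim c", "conf": 0.7, "entailed": True},
]
r = factuality_score(claims)
kept = back_off(claims, r)
q_hat = calibrate_threshold([0.4, 0.0, 0.8, 0.2], alpha=0.5)
```

In this toy example the only non-entailed sub-claim has confidence 0.4, so filtering at `r` removes it and keeps the two entailed sub-claims.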

Instead of guaranteeing full factuality, we may only require a fraction $a \in [0,1]$ of the accepted sub-claims (i.e., those in $F_t\left( X_i, \mathcal{L}\left( X_i \right) \right)$) to be factual. Partial entailment keeps the minimum safe threshold small, which mitigates the issue of $\hat{q}$ being so large that the accepted sub-claims are uninformative or even empty.

Then the conformal scores with acceptable entailment level $a \in [0,1]$ are: $r_a\left( X_i, Y_i^{*} \right) = \inf \left\{ t \in \mathcal{T} : \forall j \geq t,\ \mathcal{M}_{Y_i^{*}}\left( F_j\left( X_i, \mathcal{L}\left( X_i \right) \right) \right) \geq a \right\}$, where $\mathcal{M}_{Y_i^{*}}\left( F_j\left( X_i, \mathcal{L}\left( X_i \right) \right) \right) = \frac{1}{M_{ij}} \sum_{m=1}^{M_{ij}} \mathbf{1}_{Y_i^{*} \Rightarrow c_m}$, and $M_{ij}$ is the number of sub-claims of the $i$-th output that are kept at threshold $j$.

The event $\left\{ r_a\left( X_{test}, Y_{test}^{*} \right) \leq \hat{q} \right\}$ implies $\left\{ \mathcal{M}_{Y_{test}^{*}}\left( F_{\hat{q}}\left( X_{test}, \mathcal{L}\left( X_{test} \right) \right) \right) \geq a \right\}$ (but the two events are not equivalent).
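The partial-factuality score can be sketched in the same spirit. The code assumes a hypothetical back-off rule in which $F_t$ keeps the sub-claims whose confidence is strictly greater than $t$, with $\mathcal{T} = [0, 1]$, and it adopts the convention that an empty output has entailed fraction 1. Neither choice comes from the notes; both are labeled in the comments. Under this rule the kept set only changes at the observed confidence values, so it is enough to scan those values from the top down.

```python
def entailed_fraction(claims, t):
    """M_{Y*}(F_t): share of kept sub-claims that are entailed by Y*."""
    kept = [c for c in claims if c["conf"] > t]   # hypothetical back-off rule
    if not kept:
        return 1.0  # assumed convention: an empty output counts as factual
    return sum(c["entailed"] for c in kept) / len(kept)

def partial_factuality_score(claims, a):
    """r_a: smallest t such that the entailed fraction is >= a for every j >= t."""
    candidates = sorted({0.0} | {c["conf"] for c in claims}, reverse=True)
    r = 1.0  # top of the assumed threshold range T = [0, 1]
    for t in candidates:
        if entailed_fraction(claims, t) < a:
            break  # thresholds at or below this one violate the "for all j >= t" condition
        r = t
    return r

# Toy example: four sub-claims, one of which is not entailed.
claims = [
    {"conf": 0.9, "entailed": True},
    {"conf": 0.7, "entailed": False},
    {"conf": 0.5, "entailed": True},
    {"conf": 0.3, "entailed": True},
]
r_full = partial_factuality_score(claims, a=1.0)
r_half = partial_factuality_score(claims, a=0.5)
r_06 = partial_factuality_score(claims, a=0.6)
```

The case $a = 0.6$ illustrates the "for all $j \geq t$" condition: the fraction recovers to $2/3$ at $t = 0.3$, but it drops to $1/2$ at $t = 0.5$, so the score stays at 0.7. This is consistent with the remark that the event $r_a \leq \hat{q}$ implies, but is not equivalent to, the fraction at $\hat{q}$ being at least $a$.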
