2021-10-12 RecSys 2021: Mitigating Confounding Bias in Recommendation via Information Bottleneck (paper translation)

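Author: 袁冬至
WeChat: DataGap
WeChat official account: 救命的药
Research area: recommender systems
Feel free to get in touch to discuss and learn together!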

Mitigating Confounding Bias in Recommendation via Information Bottleneck: paper translation

Another new paper on bias.

ABSTRACT
How to effectively mitigate the bias of feedback in recommender systems is an important research topic. In this paper, we first describe the generation process of the biased and unbiased feedback in recommender systems via two respective causal diagrams, where the difference between them can be regarded as the source of bias. We then define this difference as a confounding bias, which can be regarded as a collection of some specific biases that have previously been studied. For the case with biased feedback alone, we derive the conditions that need to be satisfied to obtain a debiased representation from the causal diagrams. Based on information theory, we propose a novel method called debiased information bottleneck (DIB) to optimize these conditions and then find a tractable solution for it. In particular, the proposed method constrains the model to learn a biased embedding vector with independent biased and unbiased components in the training phase, and uses only the unbiased component in the test phase to deliver more accurate recommendations. Finally, we conduct extensive experiments on a public dataset and a real product dataset to verify the effectiveness of the proposed method and discuss its properties.

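Abstract (translation)

How to effectively reduce feedback bias in recommender systems is an important research topic. In this paper, we first use two separate causal diagrams to describe how biased and unbiased feedback are generated in a recommender system; the difference between the two diagrams can be seen as the source of bias. We then define this difference as confounding bias, which can be viewed as a collection of several specific biases studied previously. For the setting where only biased feedback is available, we derive from the causal diagrams the conditions that must hold to obtain a debiased representation. Drawing on information theory, we propose a new method, the debiased information bottleneck (DIB), to optimize these conditions, and then find a tractable solution for it. In particular, during training the method constrains the model to learn a biased embedding vector made up of independent biased and unbiased components, and at test time it uses only the unbiased component to give more accurate recommendations. Finally, we run extensive experiments on a public dataset and a real product dataset to verify the method's effectiveness and discuss its properties.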

Summary: the paper uses an information bottleneck method to remove confounding bias, and validates it experimentally.
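
For orientation, the classical information bottleneck objective that the method's name refers to can be written as below. This is the standard formulation from the information-theory literature, not the DIB objective itself, which the excerpt translated here does not state. The symbols are notation chosen for this note: $x$ is the input, $y$ the label, $z$ the learned representation, $I(\cdot\,;\cdot)$ mutual information, and $\beta > 0$ a trade-off weight.

```latex
\min_{p(z \mid x)} \; I(x; z) \;-\; \beta \, I(z; y)
```

The first term pushes the representation to compress the input, while the second rewards keeping whatever is predictive of the label.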

CCS CONCEPTS
• Information systems → Recommender systems.
KEYWORDS
Confounding bias, Causal diagrams, Recommender systems, Information bottleneck

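CCS concepts (translation)

• Information systems → Recommender systems.

Keywords (translation)

Confounding bias, causal diagrams, recommender systems, information bottleneck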

1 INTRODUCTION
As a feedback loop system, a recommender system is associated with various biases during the interaction between the user and the system, such as position bias [3,38], selection bias [27,32] and popularity bias [1,6]. Ignoring these biases will cause a recommendation model to converge to a biased sub-optimal solution, and have harmful effects on the recommender system and the users, such as filter bubbles [16], echo chambers [11] and unfairness [10]. Therefore, how to effectively alleviate the bias of the feedback data collected in a recommender system is an important problem.

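1 Introduction (translation)

As a feedback-loop system, a recommender system picks up various biases as users interact with it, for example position bias [3,38], selection bias [27,32] and popularity bias [1,6]. Ignoring these biases makes the recommendation model converge to a biased, sub-optimal solution and harms both the system and its users, through effects such as filter bubbles [16], echo chambers [11] and unfairness [10]. How to effectively mitigate the bias in the feedback data a recommender system collects is therefore an important problem.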

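Comment: recommender systems are subject to many kinds of bias. Ignoring them leads to a sub-optimal solution and harms users in several ways, so mitigating the bias in the feedback data the system collects matters.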

The previous works solving the bias problem in recommender systems mainly include the following four lines, i.e., heuristic-based methods [25,43], inverse propensity score-based methods [32,44,45], unbiased data augmentation methods [5,23,39,46], and some theoretical tools-based methods [30,31]. The first line assumes that user feedback depends on certain specific factors and models this relationship, such as item features [12,20] and public opinions [22,24]. The second line uses the inverse propensity score as the sample weight to adjust the biased feedback distribution. The third line introduces a special uniform data as an unbiased target data to guide the training of the biased feedback. The last line aims to couple certain theoretical tools with the bias problem, and uses these theoretical tools to design some debiasing models, such as information bottleneck and causal inference techniques [37,40–42]. However, most methods ignore the bias generation process, and thus may only be applicable to a certain type of bias problem.

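Previous work on the bias problem in recommender systems falls mainly into four lines: heuristic-based methods [25,43], inverse propensity score-based methods [32,44,45], unbiased data augmentation methods [5,23,39,46], and methods based on theoretical tools [30,31]. The first line assumes that user feedback depends on certain specific factors, such as item features [12,20] and public opinions [22,24], and models that relationship. The second uses the inverse propensity score as a sample weight to adjust the biased feedback distribution. The third introduces special uniform data as unbiased target data to guide training on the biased feedback. The last couples certain theoretical tools with the bias problem and uses them to design debiasing models, for example information bottleneck and causal inference techniques [37,40–42]. However, most of these methods ignore how the bias is generated, so they may apply only to one particular type of bias problem.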

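Comment: this paragraph is a good summary. It lays out the existing approaches to debiasing, points out their shortcomings, and leads into the authors' own method.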
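
The second line of work above, inverse propensity scoring, reweights each observed sample by the inverse of its estimated probability of being observed. Below is a minimal sketch in plain Python, assuming a squared-error loss and a floor on the propensities to limit variance. The function name, the `min_propensity` parameter and its default value are illustrative choices, not taken from the paper or from the cited methods.

```python
def ips_weighted_squared_error(labels, predictions, propensities, min_propensity=0.05):
    """Average squared error with each sample weighted by 1 / propensity.

    propensities[i] is the (estimated) probability that sample i was observed.
    Values below `min_propensity` are raised to that floor (an assumed,
    illustrative safeguard against exploding weights).
    """
    if not (len(labels) == len(predictions) == len(propensities)):
        raise ValueError("all inputs must have the same length")
    if not labels:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for y, y_hat, p in zip(labels, predictions, propensities):
        p = max(p, min_propensity)       # floor tiny propensities
        total += (y - y_hat) ** 2 / p    # rarely observed samples count more
    return total / len(labels)


if __name__ == "__main__":
    # The second sample is half as likely to be observed, so it gets twice the weight.
    print(ips_weighted_squared_error([1.0, 0.0], [0.5, 0.5], [0.5, 0.25]))
```

With all propensities equal to 1.0 the function reduces to the ordinary mean squared error, which is a quick way to sanity-check the weighting.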

In this paper, inspired by [13,18], we first describe the generation process of the biased feedback and unbiased feedback in recommender systems via two respective causal diagrams, where the difference between them can be regarded as the source of bias. We define this difference as a confounding bias, which can be regarded as a collection of some specific biases that have been studied in previous works. To simplify and match the main models in the recommendation field, we generally assume that the confounding bias will be reflected in the embedding representation of a recommendation model trained with a biased feedback data. Moreover, we propose a debiased information bottleneck (DIB) objective function to alleviate the confounding bias in the biased feedback without an unbiased data.

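In this paper, inspired by [13,18], we first use two separate causal diagrams to describe how biased and unbiased feedback are generated in recommender systems; the difference between them can be regarded as the source of bias. We define this difference as a confounding bias, which can be seen as a collection of specific biases studied in previous work. To simplify matters and match the mainstream models in the recommendation field, we generally assume that the confounding bias is reflected in the embedding representation of a recommendation model trained on biased feedback data. In addition, we propose a debiased information bottleneck (DIB) objective function that mitigates the confounding bias in biased feedback without requiring unbiased data.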

Specifically, the proposed method is based on our observations in the causal diagrams of the feedback generation process described above. In the training phase, we constrain the model to learn a special biased embedding vector, including a biased component responsible for the effect of the confounding bias, and an unbiased component responsible for the effect of the user’s true preference. To remove the influence of the confounding bias in the test phase, we only retain the unbiased component in the embedding vector in the process of recommending items, i.e., a debiased embedding vector. The proposed method has better interpretability because it is directly derived from the causal diagram of the bias generation process. In addition, the proposed method can be used to solve a more general bias problem because the confounding bias is essentially a fusion of some specific biases. Finally, we conduct extensive experiments on a public dataset and a real product dataset to verify the effectiveness of the proposed method, including standard unbiased tests, ablation studies, and some in-depth analysis of the proposed method.

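Specifically, the proposed method builds on what we observe in the causal diagrams of the feedback generation process described above. During training, we constrain the model to learn a special biased embedding vector consisting of a biased component, which accounts for the effect of the confounding bias, and an unbiased component, which accounts for the effect of the user's true preference. To remove the influence of the confounding bias at test time, we keep only the unbiased component of the embedding vector when recommending items, that is, a debiased embedding vector. The method is more interpretable because it is derived directly from the causal diagram of the bias generation process. It can also address a more general bias problem, because confounding bias is essentially a fusion of several specific biases. Finally, we run extensive experiments on a public dataset and a real product dataset to verify the method's effectiveness, including standard unbiased tests, ablation studies, and some in-depth analysis of the method.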

7 CONCLUSIONS AND FUTURE WORK
In this paper, we describe the generation process of the biased and unbiased feedback in recommender systems via two respective causal diagrams, and then define a new bias based on the difference between them, which is called confounding bias. When only the biased feedback is available, we analyze the conditions that need to be met to alleviate the confounding bias, and propose a debiased information bottleneck (DIB) method to perform this optimization process based on the guidance of information theory. Moreover, we also derive a tractable solution for the proposed method. We verify the effectiveness of the proposed method on a public dataset and a real product dataset. In addition, we also include some ablation studies and deep analysis of the proposed method.

For future works, we plan to extend the proposed method to scenarios where more than one biased data is available. We are also interested in further relaxing the independent assumptions of the unbiased and biased components, that is, there may be a special mixed component entangled with the biased or unbiased component in some tasks.

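7 Conclusions and future work (translation)

In this paper, we use two separate causal diagrams to describe how biased and unbiased feedback are generated in recommender systems, and then define a new bias, called confounding bias, based on the difference between them. For the case where only biased feedback is available, we analyze the conditions that must be met to mitigate the confounding bias and, guided by information theory, propose a debiased information bottleneck (DIB) method to carry out this optimization. We also derive a tractable solution for the method. We verify its effectiveness on a public dataset and a real product dataset, and include ablation studies and in-depth analysis of the method.

For future work, we plan to extend the method to scenarios where more than one biased dataset is available. We are also interested in further relaxing the independence assumption between the unbiased and biased components; that is, in some tasks there may be a special mixed component entangled with the biased or the unbiased component.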

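Comment: the paper defines a new bias, confounding bias, and proposes a debiased information bottleneck (DIB) optimization method. It validates the method on datasets, and future work will extend the method to scenarios where more than one biased dataset is available.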

Why can't I follow these RecSys '21 papers? They feel so deep. Maybe I just haven't read enough, or haven't read carefully enough.
