Analysis of recommendation algorithms for e-commerce

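These are my notes from reading the paper Analysis of Recommendation Algorithms for E-Commerce. Nothing here is original; it is only a collection of scattered points from the paper. If anything is wrong, corrections and suggestions from readers are very welcome!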


       Analysis of Recommendation Algorithms for E-Commerce (2000)

ABSTRACT

Recommender systems apply statistical and knowledge discovery techniques to the problem of making product recommendations during a live customer interaction, and they are achieving widespread success in E-Commerce nowadays. In this paper, we investigate several techniques for analyzing large-scale purchase and preference data for the purpose of producing useful recommendations to customers.

 

1. INTRODUCTION

The largest E-commerce sites offer millions of products for sale. Choosing among so many options is challenging for consumers. Recommender systems have emerged in response to this problem.

One of the earliest and most successful recommender technologies is collaborative filtering. Collaborative filtering works by building a database of preferences for products by consumers.

 

However, there remain important research questions in overcoming two fundamental challenges for collaborative filtering recommender systems.

 

The first challenge is to improve the scalability of the collaborative filtering algorithms. These algorithms are able to search tens of thousands of potential neighbors in real-time, but the demands of modern E-commerce systems are to search tens of millions of potential neighbors. Further, existing algorithms have performance problems with individual consumers for whom the site has large amounts of information.

tens of thousands of: on the order of 10^4

tens of millions of: on the order of 10^7

 

The second challenge is to improve the quality of the recommendations for the consumers. Consumers need recommendations they can trust to help them find products they will like. If a consumer trusts a recommender system, purchases a product, and finds out he does not like the product, the consumer will be unlikely to use the recommender system again. Recommender systems, like other search systems, have two types of characteristic errors: false negatives, which are products that are not recommended, though the consumer would like them, and false positives, which are products that are recommended, though the consumer does not like them. In the E-commerce domain the most important errors to avoid are false positives, since these errors will lead to angry consumers, and since there are usually many products on an E-commerce site that a consumer will like to purchase, so there is no reason to risk recommending one she will not like.
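To make the two error types concrete, here is a minimal sketch that splits a recommendation list into hits, false positives, and false negatives. The function name `recommendation_errors` and the sample products are made up for illustration and do not come from the paper.

```python
def recommendation_errors(recommended, liked):
    """Split a recommendation list into hits and the two error types."""
    recommended, liked = set(recommended), set(liked)
    return {
        # Recommended and actually liked.
        "hits": recommended & liked,
        # Recommended, but the consumer does not like them.
        "false_positives": recommended - liked,
        # Liked, but never recommended.
        "false_negatives": liked - recommended,
    }


# Hypothetical data: three recommendations against three liked products.
errors = recommendation_errors(
    recommended=["camera", "tripod", "lens"],
    liked=["camera", "lens", "flash"],
)
```

Here `tripod` is the false positive (the costly error in the E-commerce setting described above) and `flash` is the false negative.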

 

1.1 Problem Statement

The focus of this paper is two-fold. First, we provide a systematic experimental evaluation of different techniques for recommender systems, and second, we present new algorithms that are particularly suited for sparse data sets. These algorithms have characteristics that make them likely to be faster in online performance than many previously studied algorithms.

 

1.2 Contributions

This paper has three primary research contributions:

1. An analysis of the effectiveness of recommender systems on actual customer data from an e-commerce site.

2. A comparison of the performance of several different recommender algorithms, including original collaborative filtering algorithms, algorithms based on dimensionality reduction, and classical data mining algorithms.

3. A new approach to forming recommendations that has online efficiency advantages versus previously studied algorithms, and that also has quality advantages in the presence of very sparse data sets, such as is common with E-commerce purchase data.

 

1.3 Organization

 

2. RELATED WORK

Recommender Systems.

Tapestry is one of the earliest implementations of collaborative filtering based recommender systems.

 

pseudonymous collaborative filtering: collaborative filtering in which users take part under pseudonyms

 

Personalization in E-Commerce.

In recent years, with the advent of E-Commerce, the need for personalized services has been emphasized.

Business researchers have advocated the need for one-to-one marketing.

 

Knowledge Discovery in Databases (KDD)

KDD techniques [10], also known as data mining, usually refer to extraction of implicit but useful information from databases. Two main goals of these techniques are to save money by discovering the potential for efficiencies, or to make more money by discovering ways to sell more products to customers.

 

In recommender systems, one of the best known data mining techniques is the discovery of association rules. The main goal of these rules is to find association between two sets of products in the transaction database such that the presence of products in one set implies the presence of the products from the other set.

 

Dimensionality Reduction

There has been substantial research work done in the area of dimensionality reduction.

 

3. RECOMMENDER SYSTEMS

Recommender systems have evolved in the extremely interactive environment of the web.

They apply data analysis techniques to the problem of helping customers find which products they would like to purchase at E-Commerce sites by producing a list of top N recommended products for a given customer.

3.1 Traditional Data Mining: Association Rules

The Knowledge Discovery in Databases (KDD) community has long been interested in devising methods for making product recommendations to customers based on different techniques. One of the most commonly used data mining techniques for E-commerce is finding association rules between a set of co-purchased products. Essentially, these techniques are concerned with discovering association between two sets of products such that the presence of some products in a particular transaction implies that products from the other set are also present in the same transaction.

The quality of association rules is commonly evaluated by looking at their support and confidence.

With association rules it is common to find rules having support and confidence higher than a user-defined minimum. A rule that has a high confidence level is often very important, because it provides an accurate prediction of the outcome in question. The support of a rule is also important, since rules with very low support are often uninteresting, since they do not describe sufficiently large populations, and may be artifacts.
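As a minimal sketch of the two measures, using the standard definitions: the support of a rule X => Y is the fraction of transactions that contain every product in X and Y, and its confidence is that support divided by the support of X alone. The function names and the sample baskets are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X and Y together) / support(X) for the rule X => Y."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical transaction database of four baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

# Rule: bread => butter
rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

Bread and butter appear together in 2 of the 4 baskets, so the support is 0.5; bread appears in 3 baskets, so the confidence is 2/3. A user-defined minimum on each value then filters out weak rules.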

Association rules can be used to develop top-N recommender systems.
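One plausible way to turn mined rules into a top-N list is sketched below: keep the rules whose left-hand side the customer has fully purchased, collect right-hand-side products the customer does not own yet, and rank them by the best confidence seen. The rule format (antecedent tuple, consequent, confidence), the function name, and the sample values are assumptions for illustration, not taken from the notes.

```python
def top_n_from_rules(rules, purchased, n):
    """Rank not-yet-purchased consequents of the rules this customer satisfies."""
    purchased = set(purchased)
    best = {}
    for antecedent, consequent, conf in rules:
        # A rule applies only if the whole left-hand side was purchased.
        if set(antecedent) <= purchased and consequent not in purchased:
            # If several rules predict the same product, keep the highest confidence.
            best[consequent] = max(conf, best.get(consequent, 0.0))
    # Sort by confidence (descending), breaking ties by name for determinism.
    ranked = sorted(best, key=lambda item: (-best[item], item))
    return ranked[:n]


# Hypothetical rules: (antecedent, consequent, confidence).
rules = [
    (("bread",), "butter", 0.67),
    (("bread", "milk"), "jam", 0.80),
    (("milk",), "butter", 0.40),
    (("eggs",), "bacon", 0.90),
]

picks = top_n_from_rules(rules, purchased={"bread", "milk"}, n=2)
```

The `eggs => bacon` rule is ignored because the customer never bought eggs, so `picks` is `["jam", "butter"]`.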

3.2 Recommender Systems Based on Collaborative Filtering

Collaborative filtering (CF) [21, 17] is the most successful recommender system technology to date, and is used in many of the most successful recommender systems on the Web.

CF systems recommend products to a target customer based on the opinions of other customers. These systems employ statistical techniques to find a set of customers known as neighbors, that have a history of agreeing with the target user (i.e., they either rate different products similarly or they tend to buy similar sets of products). Once a neighborhood of users is formed, these systems use several algorithms to produce recommendations.

 

The "representation" task deals with the scheme used to model the products that have already been purchased by a customer. The "neighborhood formation" task focuses on the problem of how to identify the other neighboring customers. Finally, the "recommendation generation" task focuses on the problem of finding the top-N recommended products from the neighborhood of customers.

Representation:

Sparsity

Accordingly, a recommender system based on nearest neighbor algorithms may be unable to make any product recommendations for a particular user. This problem is known as reduced coverage. Furthermore, the accuracy of recommendations may be poor. An example of a missed opportunity for quality is the loss of neighbor transitivity.

Scalability

Nearest neighbor algorithms require computation that grows with both the number of customers and the number of products. With millions of customers and products, a typical web-based recommender system running existing algorithms will suffer serious scalability problems.

 

Synonymy

In real-life scenarios, different product names can refer to similar objects. Correlation-based recommender systems can't find this latent association and treat these products differently.

 

The most important step in CF-based recommender systems is that of computing the similarity between customers, as it is used to form a proximity-based neighborhood between a target customer and a number of like-minded customers. The neighborhood formation process is in fact the model-building or learning process for a recommender system algorithm.

The proximity between two customers is usually measured using either the correlation or the cosine measure.
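Both proximity measures can be sketched in a few lines. This is a simplified illustration: each customer is a dense vector over the same four products, and the correlation is computed over all products rather than only the co-rated ones. The names `cosine`, `pearson`, and the sample vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two customer vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pearson(a, b):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical ratings of three customers for the same four products.
alice = [5, 3, 0, 1]
bob = [4, 2, 0, 1]
carol = [1, 0, 5, 4]

alice_bob = cosine(alice, bob)
alice_carol = cosine(alice, carol)
```

With these vectors Alice comes out much closer to Bob than to Carol under both measures, and the correlation between Alice and Carol is negative because their tastes run in opposite directions.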

 

Different Neighborhood Types.

After computing the all-to-all proximity between customers, the next task is to actually form the neighborhood.

Here we discuss two schemes:

Center-based;

Aggregate Neighborhood.
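As a rough sketch of the center-based scheme, assuming it simply picks the k customers most similar to the target customer: the function name `center_based_neighborhood`, the use of cosine similarity, and the purchase vectors are assumptions for illustration. The aggregate scheme is not shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two purchase vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def center_based_neighborhood(target, customers, k):
    """Return the k customers closest to `target`, most similar first."""
    scores = [
        (cosine(customers[target], vector), name)
        for name, vector in customers.items()
        if name != target
    ]
    # Highest similarity first; ties broken by name for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scores[:k]]


# Hypothetical binary purchase vectors over four products.
customers = {
    "alice": [1, 1, 0, 0],
    "bob": [1, 1, 1, 0],
    "carol": [0, 0, 1, 1],
    "dave": [1, 0, 0, 0],
}

neighbors = center_based_neighborhood("alice", customers, k=2)
```

Carol shares no purchases with Alice, so the neighborhood of size 2 is Bob followed by Dave.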

 

Neighborhood Formed in Low-dimensional Space. The fact that the low dimensional space is less sparse than its high dimensional counterpart led us to form the neighborhood in reduced space.
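As a hedged sketch of what forming the neighborhood in reduced space can look like, assume the customer-product matrix R (n customers by m products) is factored with a rank-k truncated singular value decomposition. The symbols R, U_k, S_k, V_k, k, and the choice of the rows of U_k S_k^{1/2} as customer coordinates are assumptions for illustration; these notes do not spell them out.

```latex
R \approx R_k = U_k S_k V_k^{\top}, \qquad k \ll \min(n, m)

\hat{c}_i = \text{row } i \text{ of } U_k S_k^{1/2} \in \mathbb{R}^{k}

\mathrm{sim}(i, j) = \frac{\hat{c}_i \cdot \hat{c}_j}{\lVert \hat{c}_i \rVert \, \lVert \hat{c}_j \rVert}
```

Each customer is then a dense k-dimensional vector, so two customers can be compared even when they share no purchased products in the original sparse matrix.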

 

The final step of a CF-based recommender system is to derive the top-N recommendations from the neighborhood of customers. We present two different techniques for performing the task:

Most-frequent Item Recommendation;

Association Rule-based Recommendation.
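Most-frequent item recommendation can be sketched as a frequency count over the neighbors' purchases. This assumes the technique counts how many neighbors bought each product, skips products the target customer already owns, and returns the N most frequent; the function name and the sample baskets are made up for illustration.

```python
from collections import Counter


def most_frequent_items(target_purchases, neighbor_purchases, n):
    """Return the n products bought by the most neighbors, excluding owned ones."""
    owned = set(target_purchases)
    counts = Counter(
        item
        for basket in neighbor_purchases
        for item in set(basket)  # count each neighbor at most once per product
        if item not in owned
    )
    # Most frequent first; ties broken by name for determinism.
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return ranked[:n]


# Hypothetical neighborhood of three customers.
recs = most_frequent_items(
    target_purchases={"camera"},
    neighbor_purchases=[
        {"camera", "tripod", "lens"},
        {"tripod", "flash"},
        {"lens", "tripod"},
    ],
    n=2,
)
```

The tripod was bought by all three neighbors and the lens by two, so `recs` is `["tripod", "lens"]`; the camera is excluded because the target customer already owns it.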

 

4. EXPERIMENTAL EVALUATION

4.3 Experimental Results

Our main goal is to explore the possibilities of combining different subtasks to formulate an efficient recommendation algorithm. As the combination of different parameters and tasks is enormous, we experimentally evaluate each parameter by making reasonable choices for the rest.

 

A number of interesting observations can be made by looking into the results of Figure 6 and Figure 7. First, the CF-based techniques do better than the traditional rule-based approach, and for certain density levels the difference is dramatic. Second, as was expected, as the density decreases the quality of the recommendation decreases as well. Third, the lower dimensional representations do better for ML, but worse for EC, compared to the CF-based schemes that use the original representations.

 

5. CONCLUSION

Recommender systems are a powerful new technology for extracting additional value for a business from its customer databases. These systems help customers find products they want to buy from a business. Recommender systems benefit customers by enabling them to find products they like. Conversely, they help the business by generating more sales. Recommender systems are rapidly becoming a crucial tool in E-commerce on the Web. Recommender systems are being stressed by the huge volume of customer data in existing corporate databases, and will be stressed even more by the increasing volume of customer data available on the Web. New technologies are needed that can dramatically improve the scalability of recommender systems.

In this paper we presented and experimentally evaluated various algorithmic choices for CF-based recommender systems. Our results show that dimensionality reduction techniques hold the promise of allowing CF-based algorithms to scale to large data sets and at the same time produce high-quality recommendations. Future work is required to understand exactly why low dimensional representation works well for some recommender applications, and less well for others.

 

