Analysis of recommendation algorithms for e-commerce

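These are the notes I took after reading the paper "Analysis of recommendation algorithms for e-commerce". They are by no means original work, just a collection of scattered points from the paper. If anything here is off, I welcome corrections from fellow bloggers!

Analysis of recommendation algorithms for e-commerce (2000)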


Recommender systems apply statistical and knowledge discovery techniques to the problem of making product recommendations during a live customer interaction, and they are achieving widespread success in E-Commerce nowadays. In this paper, we investigate several techniques for analyzing large-scale purchase and preference data for the purpose of producing useful recommendations to customers.




The largest E-commerce sites offer millions of products for sale. Choosing among so many options is challenging for consumers. Recommender systems have emerged in response to this problem.


One of the earliest and most successful recommender technologies is collaborative filtering. Collaborative filtering works by building a database of preferences for products by consumers.



However, there remain important research questions in overcoming two fundamental challenges for collaborative filtering recommender systems.



The first challenge is to improve the scalability of the collaborative filtering algorithms. These algorithms are able to search tens of thousands of potential neighbors in real-time, but the demands of modern E-commerce systems are to search tens of millions of potential neighbors. Further, existing algorithms have performance problems with individual consumers for whom the site has large amounts of information.


The second challenge is to improve the quality of the recommendations for the consumers. Consumers need recommendations they can trust to help them find products they will like. If a consumer trusts a recommender system, purchases a product, and finds out he does not like the product, the consumer will be unlikely to use the recommender system again. Recommender systems, like other search systems, have two types of characteristic errors: false negatives, which are products that are not recommended, though the consumer would like them, and false positives, which are products that are recommended, though the consumer does not like them. In the E-commerce domain the most important errors to avoid are false positives, since these errors will lead to angry consumers, and since there are usually many products on an E-commerce site that a consumer will like to purchase, there is no reason to risk recommending one she will not like.
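To make the two error types concrete, here is a minimal Python sketch that counts them for a single customer. The function name `count_errors` and the sample product IDs are illustrative assumptions and do not come from the paper; it also assumes the set of products the customer really likes is known, for example from held-out purchases.

```python
def count_errors(recommended, liked):
    """Count the two characteristic errors for one customer.

    recommended: products the system showed to the customer
    liked: products the customer actually likes (assumed to be known)
    """
    recommended, liked = set(recommended), set(liked)
    false_positives = recommended - liked  # recommended, but not liked
    false_negatives = liked - recommended  # liked, but never recommended
    return len(false_positives), len(false_negatives)


# Hypothetical data: three recommendations, three liked products.
fp, fn = count_errors(recommended=["A", "B", "C"], liked=["B", "C", "D"])
print(fp, fn)  # one false positive (A) and one false negative (D)
```

Under the paper's argument, a system tuned for E-commerce would try to drive the first number down even if the second one grows.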

1.1 Problem Statement

The focus of this paper is two-fold. First, we provide a systematic experimental evaluation of different techniques for recommender systems, and second, we present new algorithms that are particularly suited for sparse data sets. These algorithms have characteristics that make them likely to be faster in online performance than many previously studied algorithms.



1.2 Contributions

This paper has three primary research contributions:

1. An analysis of the effectiveness of recommender systems on actual customer data from an e-commerce site.

2. A comparison of the performance of several different recommender algorithms, including original collaborative filtering algorithms, algorithms based on dimensionality reduction, and classical data mining algorithms.

3. A new approach to forming recommendations that has online efficiency advantages versus previously studied algorithms, and that also has quality advantages in the presence of very sparse data sets, such as is common with E-commerce purchase data.


1.3 Organization



Recommender Systems.

Tapestry is one of the earliest implementations of collaborative filtering based recommender systems.



Personalization in E-Commerce.

In recent years, with the advent of E-Commerce, the need for personalized services has been emphasized.


Business researchers have advocated the need for one-to-one marketing.



Knowledge Discovery in Databases (KDD)

KDD techniques [10], also known as data mining, usually refer to extraction of implicit but useful information from databases. Two main goals of these techniques are to save money by discovering the potential for efficiencies, or to make more money by discovering ways to sell more products to customers.



In recommender systems, one of the best known data mining techniques is the discovery of association rules. The main goal of these rules is to find associations between two sets of products in the transaction database such that the presence of products in one set implies the presence of the products from the other set.



Dimensionality Reduction

There has been substantial research work done in the area of dimensionality reduction.




Recommender systems have evolved in the extremely interactive environment of the web.


They apply data analysis techniques to the problem of helping customers find which products they would like to purchase at E-Commerce sites by producing a list of top-N recommended products for a given customer.


3.1 Traditional Data Mining: Association Rules

The Knowledge Discovery in Databases (KDD) community has long been interested in devising methods for making product recommendations to customers based on different techniques. One of the most commonly used data mining techniques for E-commerce is finding association rules between a set of co-purchased products. Essentially, these techniques are concerned with discovering associations between two sets of products such that the presence of some products in a particular transaction implies that products from the other set are also present in the same transaction.


The quality of association rules is commonly evaluated by looking at their support and confidence.


With association rules it is common to find rules having support and confidence higher than a user-defined minimum. A rule that has a high confidence level is often very important, because it provides an accurate prediction of the outcome in question. The support of a rule is also important, since rules with very low support are often uninteresting, since they do not describe sufficiently large populations, and may be artifacts.
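The notes do not spell out how support and confidence are computed, so the following Python sketch uses the standard textbook definitions: the support of a rule X => Y is the fraction of transactions containing both X and Y, and its confidence is that support divided by the support of X alone. The function names and the sample baskets are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every product in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the share that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


# Hypothetical transaction database of four baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
print(rule_support, rule_confidence)
```

A rule miner would keep only the rules whose two values exceed the user-defined minimums mentioned above.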


Association rules can be used to develop top-N recommender systems.
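One plausible way to turn rules into a top-N list is sketched below in Python. It assumes the rules have already been mined and are given as (antecedent, consequent, confidence) tuples. A rule applies when the customer has bought everything in its antecedent, each candidate product is scored by the highest confidence among the rules predicting it, and the N best candidates are returned. The function name, the tuple layout, and the confidence values are illustrative assumptions.

```python
def recommend_from_rules(rules, purchased, n):
    """Return up to n products predicted by the rules the customer supports.

    rules: list of (antecedent_set, consequent_set, confidence) tuples
    purchased: set of products the customer has already bought
    """
    purchased = set(purchased)
    best = {}  # product -> highest confidence of any applicable rule predicting it
    for antecedent, consequent, conf in rules:
        if not antecedent <= purchased:
            continue  # customer has not bought the whole left-hand side
        for product in consequent - purchased:
            best[product] = max(best.get(product, 0.0), conf)
    # Rank by confidence (descending); break ties by name for a stable order.
    ranked = sorted(best, key=lambda p: (-best[p], p))
    return ranked[:n]


# Hypothetical rules with made-up confidence values.
rules = [
    ({"bread"}, {"butter"}, 0.6),
    ({"bread", "milk"}, {"eggs"}, 0.8),
    ({"milk"}, {"butter"}, 0.7),
    ({"tea"}, {"sugar"}, 0.9),
]
print(recommend_from_rules(rules, {"bread", "milk"}, n=2))  # ['eggs', 'butter']
```

The tea rule is ignored because the customer never bought tea, even though it has the highest confidence.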

3.2 Recommender Systems Based on Collaborative Filtering

Collaborative filtering (CF) [21, 17] is the most successful recommender system technology to date, and is used in many of the most successful recommender systems on the Web.


CF systems recommend products to a target customer based on the opinions of other customers. These systems employ statistical techniques to find a set of customers, known as neighbors, that have a history of agreeing with the target user (i.e., they either rate different products similarly or they tend to buy similar sets of products). Once a neighborhood of users is formed, these systems use several algorithms to produce recommendations.



The "representation" task deals with the scheme used to model the products that have already been purchased by a customer. The "neighborhood formation" task focuses on the problem of how to identify the other neighboring customers. Finally, the "recommendation generation" task focuses on the problem of finding the top-N recommended products from the neighborhood of customers.




Accordingly, a recommender system based on nearest neighbor algorithms may be unable to make any product recommendations for a particular user. This problem is known as reduced coverage. Furthermore, the accuracy of recommendations may be poor. An example of a missed opportunity for quality is the loss of neighbor transitivity.



Nearest neighbor algorithms require computation that grows with both the number of customers and the number of products. With millions of customers and products, a typical web-based recommender system running existing algorithms will suffer serious scalability problems.




In real-life scenarios, different product names can refer to similar objects. Correlation-based recommender systems can't find this latent association and treat these products differently.



The most important step in CF-based recommender systems is that of computing the similarity between customers, as it is used to form a proximity-based neighborhood between a target customer and a number of like-minded customers. The neighborhood formation process is in fact the model-building or learning process for a recommender system algorithm.


The proximity between two customers is usually measured using either the correlation or the cosine measure.
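The two proximity measures named above can be sketched in a few lines of Python. This is a minimal illustration and not the paper's implementation: customers are represented as `{product: value}` dicts (an assumed layout), cosine is taken over the full vectors with missing products treated as zero, and the Pearson correlation is computed only over the products both customers have rated. The names `alice` and `bob` and their values are made up.

```python
from math import sqrt


def cosine(a, b):
    """Cosine of the angle between two customer vectors given as {product: value} dicts."""
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson(a, b):
    """Pearson correlation over the products both customers have rated."""
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return 0.0  # not enough overlap to measure agreement
    mean_a = sum(a[p] for p in common) / len(common)
    mean_b = sum(b[p] for p in common) / len(common)
    cov = sum((a[p] - mean_a) * (b[p] - mean_b) for p in common)
    var_a = sum((a[p] - mean_a) ** 2 for p in common)
    var_b = sum((b[p] - mean_b) ** 2 for p in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hypothetical ratings for two customers.
alice = {"p1": 5, "p2": 3, "p3": 1}
bob = {"p1": 4, "p2": 2, "p3": 0, "p4": 5}
print(cosine(alice, bob), pearson(alice, bob))
```

In this example the two customers rank the shared products identically, so the correlation is perfect, while the cosine is lower because `bob` also has a product that `alice` lacks.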




Different Neighborhood Types.

After computing the all-to-all proximity between customers, the next task is to actually form the neighborhood.

Here we discuss two schemes:


Aggregate Neighborhood



Neighborhood Formed in Low-dimensional Space. The fact that the low-dimensional space is less sparse than its high-dimensional counterpart led us to form the neighborhood in reduced space.
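The notes only name the aggregate neighborhood scheme, so the Python sketch below is based on my reading of it and should be taken as an assumption: the first neighbor is the customer closest to the target, and each later neighbor is the customer closest to the centroid of the neighbors chosen so far. Cosine similarity stands in for "closest", and the function names and vectors are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def aggregate_neighborhood(target, others, size):
    """Greedily pick up to `size` neighbors for `target`.

    others: {customer_id: vector}. After the first pick, candidates are
    compared against the centroid of the current neighborhood, not the target.
    """
    remaining = dict(others)
    chosen = []
    reference = target
    while remaining and len(chosen) < size:
        # sorted() makes ties resolve to the smallest customer id.
        best = max(sorted(remaining), key=lambda c: cosine(reference, remaining[c]))
        chosen.append(best)
        del remaining[best]
        # Recompute the centroid of the neighbors chosen so far.
        vectors = [others[c] for c in chosen]
        reference = [sum(col) / len(vectors) for col in zip(*vectors)]
    return chosen


# Hypothetical purchase vectors over three products.
target = [1, 0, 0]
others = {"a": [1, 1, 0], "b": [1, 0, 1.1], "c": [0.5, 1, 0]}
print(aggregate_neighborhood(target, others, size=2))  # ['a', 'c']
```

Ranking by similarity to the target alone would pick `a` and `b` here; the aggregate scheme picks `c` second because it lies closer to the neighborhood's centroid.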



The final step of a CF-based recommender system is to derive the top-N recommendations from the neighborhood of customers. We present two different techniques for performing the task:

Most-frequent Item Recommendation;

Association Rule-based Recommendation.
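As a rough illustration of the first technique, the Python sketch below counts how often each product appears in the neighbors' purchase histories, drops anything the target customer already owns, and returns the N most frequent products. The function name, argument layout, and sample data are assumptions made for this example.

```python
from collections import Counter


def most_frequent_items(neighbor_purchases, target_purchases, n):
    """Top-n products bought most often in the neighborhood.

    neighbor_purchases: list of product collections, one per neighbor
    target_purchases: products the target customer has already bought
    """
    owned = set(target_purchases)
    counts = Counter()
    for basket in neighbor_purchases:
        counts.update(set(basket) - owned)  # each neighbor counts a product once
    # Sort by frequency (descending), then by name so ties are deterministic.
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    return ranked[:n]


# Hypothetical neighborhood of three customers.
neighbors = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "d", "e"}]
print(most_frequent_items(neighbors, target_purchases={"c"}, n=2))  # ['b', 'd']
```

Product `c` is the most common in the neighborhood but is excluded because the target customer has already bought it.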




4.3 Experimental Results

Our main goal is to explore the possibilities of combining different subtasks to formulate an efficient recommendation algorithm. As the number of combinations of different parameters and tasks is enormous, we experimentally evaluate each parameter by making reasonable choices for the rest.



A number of interesting observations can be made by looking into the results of Figure 6 and Figure 7. First, the CF-based techniques do better than the traditional rule-based approach, and for certain density levels the difference is dramatic. Second, as was expected, as the density decreases the quality of the recommendation decreases as well. Third, the lower-dimensional representation does better for ML (the MovieLens data set), but worse for EC (the e-commerce data set), compared to the CF-based schemes that use the original representations.




Recommender systems are a powerful new technology for extracting additional value for a business from its customer databases. These systems help customers find products they want to buy from a business. Recommender systems benefit customers by enabling them to find products they like. Conversely, they help the business by generating more sales. Recommender systems are rapidly becoming a crucial tool in E-commerce on the Web. Recommender systems are being stressed by the huge volume of customer data in existing corporate databases, and will be stressed even more by the increasing volume of customer data available on the Web. New technologies are needed that can dramatically improve the scalability of recommender systems.

In this paper we presented and experimentally evaluated various algorithmic choices for CF-based recommender systems. Our results show that dimensionality reduction techniques hold the promise of allowing CF-based algorithms to scale to large data sets and at the same time produce high-quality recommendations. Future work is required to understand exactly why low-dimensional representation works well for some recommender applications, and less well for others.




