Fundamentals (Part 1)

Great, I've collected some more good material (2014-11-05)

1 http://www.huxiu.com/article/6550/1.html

2 http://blog.csdn.net/lzt1983/article/details/7696578

3 https://code.google.com/p/recsyscode/

4 http://www.lifecrunch.biz/

5 http://iamcaihuafeng.blog.sohu.com/150048878.html

6 我爱自然语言 (I Love Natural Language)

 

KDD Cup 2012

 

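Track 1 task: personalized recommendation in a social network

Given user attributes (user profile) on Tencent Weibo, SNS social relationships, interaction records in the social network (retweet, comment, at-mention), and the item recommendation history of the past 30 days, predict the list of recommended items the user is most likely to accept next.

Track 2 task: pCTR (predicted click-through rate) estimation for a search advertising system

Given the query a user issued on Tencent search, the information of the ads displayed (ad title, description, URL, etc.), the relative position of each ad (its rank among the ads shown), the user's click behavior, and the attributes of advertisers and users, predict whether users will click on ads in a subsequent period.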

 

Dataset: http://www.kddcup2012.org/c/kddcup2012-track1/data

Papers: http://www.kddcup2012.org/workshop

 

KDD Cup 2011

 

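Track 1 task: music rating prediction

Given users' historical ratings of items on Yahoo! Music, predict their ratings of other items (songs, albums, etc.). Predictions are scored by the RMSE (root mean squared error) between predicted and actual ratings. The album, artist, and genre of each song are also provided.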
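
As a rough illustration of the RMSE metric named above, here is a minimal sketch in Python. The function name `rmse` and the sample ratings are made up for this example; they are not taken from the competition's evaluation code or data.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted and actual ratings (invented for illustration).
predicted = [80.0, 50.0, 30.0]
actual = [90.0, 50.0, 20.0]
print(rmse(predicted, actual))
```

A lower RMSE means the predicted ratings are closer to the actual ones; because errors are squared before averaging, a few large misses cost more than many small ones.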

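Track 2 task: identifying whether a song was rated by the user

Each user is given 6 candidate songs: 3 that the user has rated, and 3 that the user has not rated but that are drawn from songs rated highly by users overall. Song attributes (album, artist, genre, etc.) are provided as well. Participants submit a binary (0/1) classification, and the final ranking is computed from overall accuracy.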
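
The overall accuracy described above can be sketched as follows. This is only an illustration under assumptions: the function name `accuracy`, the flat list layout (6 labels per user, concatenated), and the sample labels are all hypothetical, not the organizers' scoring script.

```python
def accuracy(predicted, actual):
    """Fraction of 0/1 predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Two hypothetical users, 6 candidate songs each (1 = rated, 0 = not rated).
actual = [1, 1, 1, 0, 0, 0,   1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0,   1, 1, 1, 0, 0, 0]
print(accuracy(predicted, actual))  # 10 of the 12 labels match
```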

 

Dataset: http://kddcup.yahoo.com/datasets.php#

Papers: http://kddcup.yahoo.com/workshop.php

 

KDD Cup 2009

 

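Orange, the French telecom operator, has accumulated behavior records for a large number of customers in its large-scale data. Contestants must design a good customer relationship management (CRM) system that uses fast, stable methods to predict three customer attributes: 1. Loyalty: the likelihood that the customer switches operators (churn); 2. Appetite: the likelihood that the customer buys a new service (appetency); 3. Added value: the likelihood that the customer upgrades or buys additional high-margin products (up-selling). Results are evaluated by AUC (area under the ROC curve).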
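
To make the AUC metric concrete, here is a minimal sketch that computes it in its pairwise form: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. This is equivalent to the area under the ROC curve. The function name `auc` and the sample labels and scores are invented for illustration and do not come from the Orange data.

```python
def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical churn labels (1 = churned) and model scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
print(auc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly
```

The double loop is quadratic in the number of examples, which is fine for a sketch; on data at the scale of the competition a sort-based computation would be the usual choice.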

 

Dataset: http://www.sigkdd.org/kddcup/index.php

Papers: http://jmlr.csail.mit.edu/proceedings/papers/v7/

 

 

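Below are the reference links I collected, mostly in the format 'URL + resource title + page in the book where it appears'. Some links may require getting past the 'wall' (the Great Firewall), and I did not repeat links for sources that are cited more than once.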
   
   
  http://en.wikipedia.org/wiki/Information_overload 
   P1 
   
  http://www.readwriteweb.com/archives/recommender_systems.php 
  (A Guide to Recommender System) P4 
   
  http://en.wikipedia.org/wiki/Cross-selling 
   (Cross Selling) P6 
   
  http://blog.kiwitobes.com/?p=58 , http://stanford2009.wikispaces.com/ 
  (Course: Data Mining and E-Business: The Social Data Revolution) P7 
   
   http://thesearchstrategy.com/ebooks/an%20introduction%20to%20search%20engines%20and%20web%20navigation.pdf 
  (An Introduction to Search Engines and Web Navigation) P7 
   
  http://www.netflixprize.com/ 
  P8 
   
  http://cdn-0.nflximg.com/us/pdf/Consumer_Press_Kit.pdf 
  P9 
   
  http://stuyresearch.googlecode.com/hg-history/c5aa9d65d48c787fd72dcd0ba3016938312102bd/blake/resources/p293-davidson.pdf 
  (The YouTube Video Recommendation System) P9 
   
  http://www.slideshare.net/plamere/music-recommendation-and-discovery 
  (PPT: Music Recommendation and Discovery) P12 
   
  http://www.facebook.com/instantpersonalization/ 
  P13 
   
   http://about.digg.com/blog/digg-recommendation-engine-updates 
   (Digg Recommendation Engine Updates) P16 
   
   http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36955.pdf 
  (The Learning Behind Gmail Priority Inbox) P17 
   
  http://www.grouplens.org/papers/pdf/mcnee-chi06-acc.pdf 
  (Being Accurate is Not Enough: How Accuracy Metrics have hurt Recommender Systems) P20 
   
  http://www-users.cs.umn.edu/~mcnee/mcnee-cscw2006.pdf 
  (Don’t Look Stupid: Avoiding Pitfalls when Recommending Research Papers) P23 
   
  http://www.sigkdd.org/explorations/issues/9-2-2007-12/7-Netflix-2.pdf 
  (Major components of the gravity recommender system) P25 
   
  http://cacm.acm.org/blogs/blog-cacm/22925-what-is-a-good-recommendation-algorithm/fulltext 
  (What is a Good Recommendation Algorithm?) P26 
   
  http://research.microsoft.com/pubs/115396/evaluationmetrics.tr.pdf 
  (Evaluating Recommendation Systems) P27 
   
  http://mtg.upf.edu/static/media/PhD_ocelma.pdf 
  (Music Recommendation and Discovery in the Long Tail) P29 
   
  http://ir.ii.uam.es/divers2011/ 
  (International Workshop on Novelty and Diversity in Recommender Systems) P29 
   
  http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/RN_11_21.pdf 
  (Auralist: Introducing Serendipity into Music Recommendation) P30 
   
  http://www.springerlink.com/content/978-3-540-78196-7/#section=239197&page=1&locus=21 
  (Metrics for evaluating the serendipity of recommendation lists) P30 
   
  http://dare.uva.nl/document/131544 
  (The effects of transparency on trust in and acceptance of a content-based art recommender) P31 
   
  http://brettb.net/project/papers/2007%20Trust-aware%20recommender%20systems.pdf 
   (Trust-aware recommender systems) P31 
   
  http://recsys.acm.org/2011/pdfs/RobustTutorial.pdf 
  (Tutorial on robustness of recommender systems) P32 
   
  http://youtube-global.blogspot.com/2009/09/five-stars-dominate-ratings.html 
   (Five Stars Dominate Ratings) P37 
   
  http://www.informatik.uni-freiburg.de/~cziegler/BX/ 
  (Book-Crossing Dataset) P38 
   
  http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html 
  (Lastfm Dataset) P39 
   
  http://mmdays.com/2008/11/22/power_law_1/ 
  (A Brief Look at the Power Law Phenomenon on the Internet) P39 
   
  http://www.grouplens.org/node/73/ 
  (MovieLens Dataset) P42 
   
  http://research.microsoft.com/pubs/69656/tr-98-12.pdf 
  (Empirical Analysis of Predictive Algorithms for Collaborative Filtering) P49 
   
  http://vimeo.com/1242909 
  (Digg Video) P50 
   
  http://glaros.dtc.umn.edu/gkhome/fetch/papers/itemrsCIKM01.pdf 
   (Evaluation of Item-Based Top-N Recommendation Algorithms) P58 
   
  http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf 
  (Amazon.com Recommendations Item-to-Item Collaborative Filtering) P59 
   
  http://glinden.blogspot.com/2006/03/early-amazon-similarities.html 
   (Greg Linden Blog) P63 
   
  http://www.hpl.hp.com/techreports/2008/HPL-2008-48R1.pdf 
  (One-Class Collaborative Filtering) P67 
   
  http://en.wikipedia.org/wiki/Stochastic_gradient_descent 
  (Stochastic Gradient Descent) P68 
   
  http://www.ideal.ece.utexas.edu/seminar/LatentFactorModels.pdf 
   (Latent Factor Models for Web Recommender Systems) P70 
   
  http://en.wikipedia.org/wiki/Bipartite_graph 
  (Bipartite Graph) P73 
   
  http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4072747&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4072747 
  (Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation) P74 
   
  http://www-cs-students.stanford.edu/~taherh/papers/topic-sensitive-pagerank.pdf 
  (Topic Sensitive Pagerank) P74 
   
  http://www.stanford.edu/dept/ICME/docs/thesis/Li-2009.pdf 
  (FAST ALGORITHMS FOR SPARSE MATRIX INVERSE COMPUTATIONS) P77 
   
  https://www.aaai.org/ojs/index.php/aimagazine/article/view/1292 
   (LIFESTYLE FINDER: Intelligent User Profiling Using Large-Scale Demographic Data) P80 
   
  http://research.yahoo.com/files/wsdm266m-golbandi.pdf 
  (Adaptive Bootstrapping of Recommender Systems Using Decision Trees) P87 
   
  http://en.wikipedia.org/wiki/Vector_space_model 
  (Vector Space Model) P90 
   
  http://tunedit.org/challenge/VLNetChallenge 
  (Competition on the cold-start problem) P92 
   
  http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf 
   (Latent Dirichlet Allocation) P92 
   
  http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence 
   (Kullback–Leibler divergence) P93 
   
  http://www.pandora.com/about/mgp 
  (About The Music Genome Project) P94 
   
  http://en.wikipedia.org/wiki/List_of_Music_Genome_Project_attributes 
  (Pandora Music Genome Project Attributes) P94 
   
  http://www.jinni.com/movie-genome.html 
  (Jinni Movie Genome) P94 
   
  http://www.shilad.com/papers/tagsplanations_iui2009.pdf 
   (Tagsplanations: Explaining Recommendations Using Tags) P96 
   
  http://en.wikipedia.org/wiki/Tag_(metadata) 
  (Tag Wikipedia) P96 
   
  http://www.shilad.com/shilads_thesis.pdf 
  (Nurturing Tagging Communities) P100 
   
  http://www.stanford.edu/~morganya/research/chi2007-tagging.pdf 
  (Why We Tag: Motivations for Annotation in Mobile and Online Media) P100 
   
  http://www.google.com/url?sa=t&rct=j&q=delicious%20dataset%20dai-larbor&source=web&cd=1&ved=0CFIQFjAA&url=http%3A%2F%2Fwww.dai-labor.de%2Fen%2Fcompetence_centers%2Firml%2Fdatasets%2F&ei=1R4JUKyFOKu0iQfKvazzCQ&;usg=AFQjCNGuVzzKIKi3K2YFybxrCNxbtKqS4A&cad=rjt 
  (Delicious Dataset) P101 
   
  http://research.microsoft.com/pubs/73692/yihgoca-www06.pdf 
   (Finding Advertising Keywords on Web Pages) P118 
   
  http://www.kde.cs.uni-kassel.de/ws/rsdc08/ 
  (Tag-based recommender system competition) P119 
   
  http://delab.csd.auth.gr/papers/recsys.pdf 
  (Tag recommendations based on tensor dimensionality reduction) P119 
   
  http://www.l3s.de/web/upload/documents/1/recSys09.pdf 
  (Latent Dirichlet Allocation for Tag Recommendation) P119 
   
  http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5271&rep=rep1&type=pdf 
  (Folkrank: A ranking algorithm for folksonomies) P119 
   
  http://www.grouplens.org/system/files/tagommenders_numbered.pdf 
   (Tagommenders: Connecting Users to Items through Tags) P119 
   
  http://www.grouplens.org/system/files/group07-sen.pdf 
  (The Quest for Quality Tags) P120 
   
  http://2011.camrachallenge.com/ 
  (Challenge on Context-aware Movie Recommendation) P123 
   
  http://bits.blogs.nytimes.com/2011/09/07/the-lifespan-of-a-link/ 
  (The Lifespan of a link) P125 
   
  http://www0.cs.ucl.ac.uk/staff/l.capra/publications/lathia_sigir10.pdf 
   (Temporal Diversity in Recommender Systems) P129 
   
  http://staff.science.uva.nl/~kamps/ireval/papers/paper_14.pdf 
   (Evaluating Collaborative Filtering Over Time) P129 
   
  http://www.google.com/places/ 
  (Hotpot) P139 
   
  http://www.readwriteweb.com/archives/google_launches_recommendation_engine_for_places.php 
  (Google Launches Hotpot, A Recommendation Engine for Places) P139 
   
  http://xavier.amatriain.net/pubs/GeolocatedRecommendations.pdf 
   (geolocated recommendations) P140 
   
  http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html 
  (A Peek Into Netflix Queues) P141 
   
  http://www.cs.umd.edu/users/meesh/420/neighbor.pdf 
  (Distance Browsing in Spatial Databases) P142 
   
  http://www.eng.auburn.edu/~weishinn/papers/MDM2010.pdf 
   (Efficient Evaluation of k-Range Nearest Neighbor Queries in Road Networks) P143 
   
   
  http://blog.nielsen.com/nielsenwire/consumer/global-advertising-consumers-trust-real-friends-and-virtual-strangers-the-most/ 
  (Global Advertising: Consumers Trust Real Friends and Virtual Strangers the Most) P144 
   
  http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36371.pdf 
  (Suggesting Friends Using the Implicit Social Graph) P145 
   
  http://blog.nielsen.com/nielsenwire/online_mobile/friends-frenemies-why-we-add-and-remove-facebook-friends/ 
  (Friends & Frenemies: Why We Add and Remove Facebook Friends) P147 
   
  http://snap.stanford.edu/data/ 
  (Stanford Large Network Dataset Collection) P149 
   
  http://www.dai-labor.de/camra2010/ 
  (Workshop on Context-awareness in Retrieval and Recommendation) P151 
   
  http://www.comp.hkbu.edu.hk/~lichen/download/p245-yuan.pdf 
  (Factorization vs. Regularization: Fusing Heterogeneous Social Relationships in Top-N Recommendation) P153 
   
  http://www.infoq.com/news/2009/06/Twitter-Architecture/ 
  (Twitter, an Evolving Architecture) P154 
   
  http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CGQQFjAB&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.165.3679%26rep%3Drep1%26type%3Dpdf&ei=dIIJUMzEE8WviQf5tNjcCQ&usg=AFQjCNGw2bHXJ6MdYpksL66bhUE8krS41w&sig2=5EcEDhRe9S5SQNNojWk7_Q 
  (Recommendations in taste related domains) P155 
   
  http://www.ercim.eu/publication/ws-proceedings/DelNoe02/RashmiSinha.pdf 
  (Comparing Recommendations Made by Online Systems and Friends) P155 
   
  http://techcrunch.com/2010/04/22/facebook-edgerank/ 
  (EdgeRank: The Secret Sauce That Makes Facebook's News Feed Tick) P157 
   
  http://www.grouplens.org/system/files/p217-chen.pdf 
  (Speak Little and Well: Recommending Conversations in Online Social Streams) P158
   
  http://blog.linkedin.com/2008/04/11/learn-more-abou-2/ 
  (Learn more about “People You May Know”) P160 
   
  http://domino.watson.ibm.com/cambridge/research.nsf/58bac2a2a6b05a1285256b30005b3953/8186a48526821924852576b300537839/$FILE/TR%202009.09%20Make%20New%20Frends.pdf 
  (“Make New Friends, but Keep the Old” – Recommending People on Social Networking Sites) P164 
   
  http://www.google.com.hk/url?sa=t&rct=j&q=social+recommendation+using+prob&source=web&cd=2&ved=0CFcQFjAB&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.141.465%26rep%3Drep1%26type%3Dpdf&ei=LY0JUJ7OL9GPiAfe8ZzyCQ&usg=AFQjCNH-xTUWrs9hkxTA8si5fztAdDAEng 
  (SoRec: Social Recommendation Using Probabilistic Matrix Factorization) P165 
   
  http://olivier.chapelle.cc/pub/DBN_www2009.pdf 
  (A Dynamic Bayesian Network Click Model for Web Search Ranking) P177 
   
  http://www.google.com.hk/url?sa=t&rct=j&q=online+learning+from+click+data+spnsored+search&source=web&cd=1&ved=0CFkQFjAA&url=http%3A%2F%2Fwww.research.yahoo.net%2Ffiles%2Fp227-ciaramita.pdf&ei=HY8JUJW8CrGuiQfpx-XyCQ&usg=AFQjCNE_CYbEs8DVo84V-0VXs5FeqaJ5GQ&cad=rjt 
  (Online Learning from Click Data for Sponsored Search) P177 
   
  http://www.cs.cmu.edu/~deepay/mywww/papers/www08-interaction.pdf 
  (Contextual Advertising by Combining Relevance with Click Feedback) P177 
   
  http://tech.hulu.com/blog/2011/09/19/recommendation-system/ 
  (Hulu recommendation system architecture) P178 
   
  http://mymediaproject.codeplex.com/ 
  (MyMedia Project) P178 
   
  http://www.grouplens.org/papers/pdf/www10_sarwar.pdf 
  (item-based collaborative filtering recommendation algorithms) P185 
   
  http://www.stanford.edu/~koutrika/Readings/res/Default/billsus98learning.pdf 
  (Learning Collaborative Information Filters) P186 
   
  http://sifter.org/~simon/journal/20061211.html 
  (Simon Funk Blog: Funk SVD) P187 
   
  http://courses.ischool.berkeley.edu/i290-dm/s11/SECURE/a1-koren.pdf 
  (Factor in the Neighbors: Scalable and Accurate Collaborative Filtering) P190 
   
  http://nlpr-web.ia.ac.cn/2009papers/gjhy/gh26.pdf 
  (Time-dependent Models in Collaborative Filtering based Recommender System) P193 
   
  http://sydney.edu.au/engineering/it/~josiah/lemma/kdd-fp074-koren.pdf 
  (Collaborative filtering with temporal dynamics) P193 
   
  http://en.wikipedia.org/wiki/Least_squares 
  (Least Squares Wikipedia) P195 
   
  http://www.mimuw.edu.pl/~paterek/ap_kdd.pdf 
  (Improving regularized singular value decomposition for collaborative filtering) P195 
   
  http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf 
  (Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model) P195 

 

 

Where to Learn Deep Learning – Courses, Tutorials, Software

Deep Learning is a very hot Machine Learning technique that has been achieving remarkable results recently. We give a list of free resources for learning and using Deep Learning.

By Gregory Piatetsky,  @kdnuggets, May 26, 2014. 

Deep Learning is a very hot area of Machine Learning research, with many remarkable recent successes, such as 97.5% accuracy on face recognition, nearly perfect German traffic sign recognition, or even Dogs vs Cats image recognition with 98.9% accuracy. Many winning entries in recent Kaggle Data Science competitions have used Deep Learning. 

The term "deep learning" refers to the method of training multi-layered neural networks, and became popular after papers by Geoffrey Hinton and his co-workers which showed a fast way to train such networks. 

Yann LeCun, a former postdoc of Geoff Hinton, also developed a very effective algorithm for deep learning, called ConvNet, which was successfully used in the late 1980s and early 1990s for automatic reading of amounts on bank checks. 

See more on ConvNet and the factors that enabled the recent success of Deep Learning in my exclusive interview with Yann LeCun.

In May 2014, Baidu, the Chinese search giant, hired Andrew Ng, a leading Machine Learning and Deep Learning expert (and co-founder of Coursera), to head its new AI Lab in Silicon Valley, setting up an AI & Deep Learning race with Google (which hired Geoff Hinton) and Facebook (which hired Yann LeCun to head Facebook AI Lab). 

Here are some useful and free (!) resources for learning and using Deep Learning:
 
The packages which support Deep Learning include
  • Torch7, an extension of the LuaJIT language which includes an object-oriented package for deep learning and computer vision. The main advantage of Torch7 is that LuaJIT is extremely fast and very flexible.
  • Theano + Pylearn2, which has the advantage of using Python (widely used), and the disadvantage of using Python (slow for big data).
  • cuda-convnet, a high-performance C++/CUDA implementation of convolutional neural networks, based on Yann LeCun's work.

 