Mahout
ylzhjlinux
Item-based recommendation
User-based: Who is similar to the boy, and what do they like? Item-based: What is similar to what the boy likes? The algorithm. The difference between user-based and item-based: Slope-one... Original · 2014-04-30 11:33:14 · 180 views · 0 comments
Mahout: Topic modeling using latent Dirichlet allocation (LDA)
Introduction: To find these topics in a particular set of documents, we’d modify our clustering code to work with word vectors instead of the document vectors we’ve been using so far. A word vector i... Original · 2014-06-12 14:46:25 · 254 views · 0 comments
Mahout: Batch and online clustering
Online news clustering: Cluster one million articles, as shown below, and save the cluster centroids for all clusters. Periodically, for each new article, use canopy clustering to assign it t... Original · 2014-06-13 10:47:41 · 216 views · 0 comments
Mahout: CWSS
jcseg: http://www.oschina.net/p/jcseg http://technology.chtsai.org/mmseg/ scws: http://www.ftphp.com/scws/demo/v48.php http://www.ftphp.com/scws/docs.php#instscws http://www.350351.com/... Original · 2014-06-13 14:39:16 · 250 views · 0 comments
Mahout: Integrate jcseg with mahout seq2sparse
Google global sites URL: https://github.com/justjavac/Google-IPs JCSEG: http://www.oschina.net/p/jcseg MMSEG: http://technology.chtsai.org/mmseg/ // convert Maven project to Eclipse project... Original · 2014-06-16 18:30:06 · 92 views · 0 comments
Mahout: CVB
Running cvb fails with this error: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable. Solution: the new LDA requires SequenceFile<IntWritable, VectorWritable> as input... Original · 2014-06-19 18:32:25 · 161 views · 0 comments
Solr: Deploy Solr to Tomcat
Install tomcat7: # sudo apt-get update; # sudo apt-get install tomcat7; # sudo apt-get install tomcat7-admin; http://localhost:8080/; # sudo vi /etc/tomcat7/tomcat-users.xml; <tomcat-users> ... Original · 2014-07-09 17:41:03 · 108 views · 0 comments
Machine Learning: Introduction 1
Supervised learning is tasked with learning a function from labeled training data in order to predict the value of any valid input. Common examples of supervised learning include classifying e-mail... Original · 2014-04-24 00:05:54 · 261 views · 0 comments
Recommendation System Introduction
Collaborative filtering: producing recommendations based on, and only based on, knowledge of users’ relationships to items. These techniques require no knowledge of the properties of the items thems... Original · 2014-04-24 23:04:08 · 135 views · 0 comments
Mahout: build 0.9 from source code and Eclipse env setup
1. # svn co http://svn.apache.org/repos/asf/mahout/trunk or download mahout-distribution-0.9-src.tar.gz 2. mvn -DskipTests clean install package 3. Create a common Java project that uses mahou... Original · 2014-04-27 00:49:03 · 140 views · 0 comments
Exploring the user-based recommender 1
Recommending items to some user, denoted by u, as seen below. It would be terribly slow to examine every item. In reality, a neighborhood of most similar users is computed first, and only items k... Original · 2014-04-29 15:29:31 · 84 views · 0 comments
Mahout: Dirichlet clustering
Dirichlet clustering starts with a data set of points and a ModelDistribution. Think of ModelDistribution as a class that generates different models. You create an empty model and try to assign point... Original · 2014-06-12 14:08:43 · 78 views · 0 comments
Mahout: Fuzzy k-means clustering
As the name says, the fuzzy k-means clustering algorithm does a fuzzy form of k-means clustering. Instead of the exclusive clustering in k-means, fuzzy k-means tries to generate overlapping clusters ... Original · 2014-06-12 11:18:08 · 210 views · 0 comments
Mahout: An overview of clustering techniques
Different kinds of clustering problems. EXCLUSIVE CLUSTERING: In exclusive clustering, an item belongs exclusively to one cluster, not several. OVERLAPPING CLUSTERING: What if we wanted to do non-e... Original · 2014-06-12 10:57:40 · 170 views · 0 comments
New and experimental recommenders
Singular value decomposition–based recommenders: SVDRecommender; Linear interpolation item–based recommendation: KnnItemBasedRecommender; Cluster-based recommendation: TreeCluster... Original · 2014-04-30 14:31:50 · 147 views · 0 comments
Mahout: distributed item-based algorithm 1
Co-occurrence matrix: Instead of computing the similarity between every pair of items, it’ll compute the number of times each pair of items occurs together in some user’s list of preferences, ... Original · 2014-05-04 09:38:23 · 69 views · 0 comments
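The co-occurrence counting described in this excerpt can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Mahout's distributed implementation: the function name `cooccurrence_counts` and the `prefs` data are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(user_items):
    """Count, for each unordered item pair, how many users have both items.

    user_items maps a user ID to that user's list of preferred item IDs.
    (Hypothetical helper for illustration; not part of Mahout.)
    """
    counts = defaultdict(int)
    for items in user_items.values():
        # sorted(set(...)) drops duplicates and gives every pair one canonical order
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Illustrative preference lists (assumed data)
prefs = {
    1: [101, 102, 103],
    2: [101, 102, 103, 104],
    3: [101, 104, 105, 107],
}

matrix = cooccurrence_counts(prefs)
```

Each key is an item pair and each value is the number of users whose preference list contains both items. Pairs that never occur together are simply absent, which keeps the structure sparse.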
Mahout: distributed item-based algorithm 2
Generating user vectors. Input format: userID: itemID1 itemID2 itemID3 .... Output format: a Vector from all item IDs for the user, and outputs the user ID mapped to the user’s preference ve... Original · 2014-05-04 11:28:42 · 94 views · 0 comments
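As a rough sketch of this step, the snippet below parses one input line of the form `userID: itemID1 itemID2 ...` into a user ID and a sparse preference vector. The excerpt does not say which values the vector holds, so the value 1.0 per item is an assumption, as is the helper name `parse_user_vector`.

```python
def parse_user_vector(line):
    """Parse 'userID: itemID1 itemID2 ...' into (user_id, {item_id: 1.0}).

    Hypothetical helper; the 1.0 preference value is an assumption.
    """
    user_part, _, items_part = line.partition(":")
    user_id = int(user_part.strip())
    # A dict stands in for a sparse vector indexed by item ID
    vector = {int(token): 1.0 for token in items_part.split()}
    return user_id, vector


user_id, vector = parse_user_vector("98955: 590 22 9059")
```

A dict keyed by item ID plays the role of a sparse vector here, since a user typically has preferences for only a small fraction of all items.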
Mahout: distributed item-based algorithm 3
Running recommendations with Hadoop: The glue that binds together the various Mapper and Reducer components is org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. It configures and invokes the se... Original · 2014-05-04 13:55:35 · 100 views · 0 comments
Mahout: build 0.9 with support for Hadoop 2.3.0
mvn clean package -Dhadoop2.version=2.3.0 -DskipTests; mvn clean package -Dhadoop.version=2.3.0 -DskipTests; mvn clean package -Dhadoop.profile=200 -DskipTests. The above commands will not work... Original · 2014-05-05 00:24:20 · 71 views · 0 comments
Mahout: Introduction to clustering
Clustering a collection involves three things: an algorithm, a notion of both similarity and dissimilarity, and a stopping condition. Measuring the similarity of items: The most important issue... Original · 2014-05-05 12:01:47 · 104 views · 0 comments
Mahout: quality blogs
http://blog.csdn.net/zwan0518/article/details/9100329 https://www.ibm.com/developerworks/library/j-mahout-scaling/ http://mail-archives.apache.org/mod_mbox/mahout-user/201202.mbox/%3C1328197877... Original · 2014-05-06 17:51:44 · 62 views · 0 comments
Mahout: Run ItemBasedRecommendation Job in Eclipse
1. Configure parameters via Run -> Run Configurations -> Java Applications -> Arguments: --input hdfs://192.168.122.1:2014/user/zhaohj/mahout/item.txt --output hdfs://192.168.122.1:2014/user/... Original · 2014-05-14 16:12:08 · 103 views · 0 comments
Mahout: Clustering - Representing data
Transforming data into vectors: In Mahout, vectors are implemented as three different classes. DenseVector can be thought of as an array of doubles, whose size is the number of features in the data.... Original · 2014-06-11 11:02:56 · 110 views · 0 comments
Mahout: K-means clustering
K-means algorithm: The k-means algorithm will start with an initial set of k centroid points. The algorithm does multiple rounds of processing and refines the centroid locations until the iteration ... Original · 2014-06-11 16:06:14 · 73 views · 0 comments
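The loop described in this excerpt can be sketched in plain Python as below. This is not Mahout's implementation. Because the excerpt is cut off before the stopping rule, the sketch assumes two common ones: a cap on the number of rounds (`max_iterations`) and a convergence threshold on centroid movement (`tolerance`). The function name and sample points are also made up for the example.

```python
import math


def kmeans(points, centroids, max_iterations=10, tolerance=1e-6):
    """Refine the given initial centroids; return the final centroid list.

    max_iterations and tolerance are assumed stopping criteria.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep a centroid that attracted no points
        # Largest distance any centroid moved in this round
        shift = max(math.dist(o, n) for o, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tolerance:
            break
    return centroids


points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
final = kmeans(points, centroids=[(1, 1), (2, 1)])
```

With these sample points the two centroids settle on the means of the two visibly separate groups, even though both initial centroids start inside the same group.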
Exploring the user-based recommender 2 (similarity metrics)
Sample data: 1,101,5.0 1,102,3.0 1,103,2.5 2,101,2.0 2,102,2.5 2,103,5.0 2,104,2.0 3,101,2.5 3,104,4.0 3,105,4.5 3,107,5.0 4,101,5.0 4,103,3.0 4,104,4.5 4,106,4.0 5,101,4.0 5,102,3.0... Original · 2014-04-29 17:25:01 · 151 views · 0 comments