spark
baibaiw5
【spark】A pom.xml template for creating a Maven-based Spark project
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> 4.0.0 com.xxxxtestjartestit 1.0.0 nexus OS China http://maven.oschina.net/conte...
Original · 2016-02-17 19:32:59 · 5432 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter4 LeftOuterJoin
Scala version: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext /** * This class provides a basic implementation of "left outer join" * operat...
Translation · 2016-03-09 23:58:31 · 323 views · 0 comments
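The excerpt above is cut off before any code, so as a reminder of what a left outer join computes, here is a minimal sketch on plain Scala collections. It is not the post's Spark implementation; the object name `LeftOuterJoinSketch`, the method name, and the sample data are all hypothetical.

```scala
object LeftOuterJoinSketch {
  // Left outer join on in-memory sequences: every left record is kept.
  // A left key with no match on the right is paired with None.
  def leftOuterJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, Option[B]))] = {
    // Index the right side by key so each left record needs only one lookup.
    val rightByKey: Map[K, Seq[B]] =
      right.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

    left.flatMap { case (k, a) =>
      rightByKey.get(k) match {
        case Some(bs) => bs.map(b => (k, (a, Some(b): Option[B])))
        case None     => Seq((k, (a, Option.empty[B])))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data: (userId, product) joined with (userId, location).
    val transactions = Seq("u1" -> "p3", "u2" -> "p1", "u4" -> "p2")
    val users = Seq("u1" -> "UT", "u2" -> "GA")
    leftOuterJoin(transactions, users).foreach(println)
  }
}
```

On Spark pair RDDs the same result shape, `(K, (A, Option[B]))`, is what `leftOuterJoin` returns, which is why the sketch uses it here.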
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter3 Top 10 NonUniqueList
package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import java.util.PriorityQueue /** * Assumption: for all input (K, V), K's are non-un...
Translation · 2016-03-09 23:30:03 · 799 views · 0 comments
【spark】Recognizing MNIST digits 0-9 with MultilayerPerceptron
Only a single layer configuration (28 * 28, 100, 50, 10) was used for training, so the results are not very good. package com.bbw5.ml.spark import org.apache.spark.ml.tuning.ParamGridBuilder import org.apache.spark.SparkContext import org.apache.spark.sql.SQLContext...
Original · 2016-03-09 22:07:41 · 1714 views · 0 comments
【spark】Recognizing MNIST digits 0-1 with LogisticRegression (ML API)
ROC curve concepts: http://blog.csdn.net/abcjennifer/article/details/7359370
Recall-Precision concepts: http://blog.csdn.net/pirage/article/details/9851339
Download the MNIST dataset: http://yann.lecun.com/exdb/mnist/
Load M...
Original · 2016-03-09 19:34:36 · 2017 views · 0 comments
【spark-breeze】Installing breeze on 64-bit Windows 7
breeze: Maven dependencies org.scalanlp breeze_2.10 0.11.2 and org.scalanlp breeze-natives_2.10 0.11.2
Original · 2016-03-08 19:02:25 · 1845 views · 0 comments
【spark+python】Recognizing MNIST digits 0-1 with LogisticRegression (MLlib)
Download the dataset: http://yann.lecun.com/exdb/mnist/
Original · 2016-02-29 20:33:39 · 1452 views · 0 comments
【Mastering Machine Learning with scikit-learn (python+spark version)】Chapter2 Linear Regression
Source code download: https://www.packtpub.com/big-data-and-business-intelligence/mastering-machine-learning-scikit-learn
Start ipython notebook: cd E:\DM\bookcode\mastering-machine-learning-scikit-learn ip...
Translation · 2016-02-24 22:02:55 · 1096 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter3 Top 10 List
Scala version of Top 10 List: package com.bmb.dataalgorithms.spark import scala.collection.mutable.PriorityQueue import org.apache.spark.Logging import org.apache.spark.SparkConf import org.apache.spark.Spa...
Translation · 2016-02-24 21:56:40 · 1016 views · 0 comments
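The excerpt above imports `scala.collection.mutable.PriorityQueue` but is truncated before the algorithm. One common way to use a priority queue for a top-N list is a bounded min-heap, sketched below on plain Scala collections. This is an assumption about the general technique, not the post's code; the name `TopNSketch` and the sample records are hypothetical.

```scala
import scala.collection.mutable

object TopNSketch {
  // Return the n records with the largest scores, highest first.
  // A min-heap of at most n elements is kept; its head is always the
  // smallest score retained so far, so it is the one to evict.
  def topN[K](records: Iterable[(K, Int)], n: Int): List[(K, Int)] = {
    // Reversing the ordering turns Scala's max-heap into a min-heap on the score.
    val byScoreAscending: Ordering[(K, Int)] = Ordering.by[(K, Int), Int](_._2).reverse
    val minHeap = new mutable.PriorityQueue[(K, Int)]()(byScoreAscending)

    for (record <- records) {
      minHeap.enqueue(record)
      if (minHeap.size > n) minHeap.dequeue() // drop the current minimum
    }
    minHeap.toList.sortBy(_._2)(Ordering[Int].reverse)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical (key, score) records.
    val records = Seq("a" -> 3, "b" -> 9, "c" -> 1, "d" -> 7)
    println(topN(records, 2))
  }
}
```

In a distributed setting the same function could be applied to each partition and then once more to the combined per-partition results, since the top N of the whole dataset is contained in the union of the per-partition top N lists.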
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter1 Secondary Sort
I recently read "Data Algorithms_Recipes for Scaling up with Hadoop and Spark", whose algorithms are implemented in Java. The source code can be downloaded from https://github.com/mahmoudparsian/data-algorithms-book/
For learning purposes, I am providing Scala versions of the algorithms. Secondary Sort: package com. ...
Translation · 2016-02-23 22:22:03 · 554 views · 0 comments
【spark】Spark word count example
Code: package com.test.mllib.test import org.apache.spark.SparkConf import org.apache.spark.SparkContext object WorkCountApp { def main(args: Array[String]) { var filename = "" args match...
Original · 2016-02-17 20:50:36 · 802 views · 0 comments
【spark】List of common Spark commands
Specify the libraries to load when starting spark-shell: bin\spark-shell --jars E:\DM\code\projects\ch11-testit\target\ch11-testit-1.0.0.jar
Run an application with spark-submit: E:\DM\Spark\spark-1.4.1-bin-hadoop2.4\bin\spark-submit --maste...
Original · 2016-02-17 19:35:28 · 5761 views · 0 comments
【spark】Basic DataFrame operations
Reference: http://dataunion.org/19375.html
Repost · 2016-03-17 22:46:51 · 976 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter5 Order Inversion Pattern
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.ArrayBuffer import org.apache.spark.SparkCon...
Translation · 2016-03-16 20:00:16 · 620 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter6 MovingAverage
Scala version of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.ArrayBuffer import org.apache.spark.Partitioner...
Translation · 2016-03-16 22:10:28 · 465 views · 0 comments
【spark】Building Spark 1.6.0 on 64-bit Windows 7
1: In settings.xml, set the Maven repository to http://maven.oschina.net/content/groups/public/ (this repository requires Maven 3.3.3 or later) xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/S...
Original · 2016-02-17 19:17:42 · 430 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 12. K-Means Clustering
The k-means implementation from the Spark examples: /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional...
Translation · 2016-04-05 19:43:29 · 418 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 13 k-Nearest Neighbors
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import breeze.linalg.DenseVector /** * This class solves K-Nearest-Neighbor...
Translation · 2016-03-31 19:00:45 · 424 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 11 Smarter Email Marketing wit
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.Partitioner import org.apache.spark.HashPartitioner i...
Translation · 2016-03-30 20:10:32 · 378 views · 0 comments
【spark source】Reading the Spark LinearRegression source code
org.apache.spark.mllib.regression.RegressionModel: defines the predict interface of the linear regression model
org.apache.spark.mllib.regression.impl.GLMRegressionModel: loads a Model from a file, or saves a Model to a file
org.apache.spark.mllib.pmml.PMMLExportabl...
Original · 2016-03-28 21:42:20 · 709 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 10 Content-Based Recommend
Scala version of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext /** * usermovieratings.txt * * User1 Movie1 1 * User1 Movie2 2 * User...
Translation · 2016-03-28 21:13:39 · 320 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 9 Recommendation People
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.ArrayBuffer /** * friends.txt * 1 2,3,4,5,...
Translation · 2016-03-23 19:52:44 · 448 views · 0 comments
【spark+nlp】 Feature Extract and Preprocess
Common Spark NLP methods: package com.bbw5.ml.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.ml.feature.CountVectorizer import org.apache.spark.ml.feature.Co...
Original · 2016-03-22 22:20:57 · 1026 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 9 Recommendation Items
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.HashMap import scala.collection.mutable.Arra...
Translation · 2016-03-22 18:19:21 · 419 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 8 Common Friends
Scala version of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.HashMap /** * The FindCommonFriends is a Spa...
Translation · 2016-03-21 20:49:30 · 355 views · 0 comments
【spark】Predicting wine quality with linear regression
dd
Original · 2016-03-17 20:48:29 · 4675 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 7 Market Basket Analysis
Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark import org.apache.spark.SparkConf import org.apache.spark.SparkContext import scala.collection.mutable.ArrayBuffer /** * finds all association rul...
Translation · 2016-03-17 18:22:28 · 354 views · 0 comments
【spark】spark+kafka
Start Kafka: MobaXterm_Personal_8.5.exe
D:/Develop/kafka_2.10-0.8.2.1/bin/windows/zookeeper-server-start.bat D:/Develop/kafka_2.10-0.8.2.1/config/zookeeper.properties
D:/Develop/kafka_2.10-0.8.2. ...
Original · 2016-05-12 22:16:30 · 491 views · 0 comments