spark - baibaiw5's blog - CSDN Blog

spark


Articles: 28 · Article views: 35316 · Bookmarks: 36

Author: baibaiw5


【spark】A pom.xml template for creating a Maven-based Spark project

xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> 4.0.0 com.xxxx test jar testit 1.0.0 nexus OS China http://maven.oschina.net/conte…

Original · 2016-02-17 19:32:59 · 5432 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter4 LeftOuterJoin

Scala version: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; /** This class provides a basic implementation of "left outer join" operat…

Translation · 2016-03-09 23:58:31 · 323 views · 0 comments
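A Spark-free sketch of the same join semantics, to make the shape of the result concrete. It assumes the output should match Spark's pair-RDD leftOuterJoin, (K, (V, Option[W])): every left record is kept, and a record with no match on the right carries None. The object name and sample data are hypothetical, and the right side is simply indexed in memory where Spark would shuffle both sides by key.

```scala
object LeftOuterJoinSketch {
  // Keeps every left record; pairs it with each matching right value (Some(w)),
  // or with None when the key has no match on the right side.
  def leftOuterJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
    // Index the right side by key (Spark would shuffle both sides by key instead).
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, kws) => k -> kws.map(_._2) }
    left.flatMap { case (k, v) =>
      rightByKey.get(k) match {
        case Some(ws) => ws.map(w => (k, (v, Option(w))))
        case None     => Seq((k, (v, Option.empty[W])))
      }
    }
  }
}
```

For example, joining users (u1, UT), (u2, GA), (u3, CA) with purchases (u1, p1), (u1, p2), (u2, p3) keeps u3 with None, which is exactly the record an inner join would drop.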
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter3 Top 10 NonUniqueList

package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import java.util.PriorityQueue; /** Assumption: for all input (K, V), K's are non-un…

Translation · 2016-03-09 23:30:03 · 799 views · 0 comments
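Because the keys are not unique, the values for a key must be combined before ranking; otherwise the same key could occupy several slots in the top N. A minimal local sketch, assuming integer values and summing as the combine step (the in-memory analogue of reduceByKey(_ + _)); the names and data are hypothetical:

```scala
object TopNNonUniqueSketch {
  // Sums values per key first, then keeps the n largest totals.
  // Ties on the total are broken by key so the result is deterministic.
  def topN(records: Seq[(String, Int)], n: Int): Seq[(String, Int)] = {
    val totals = records.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }
    totals.toSeq.sortBy { case (k, total) => (-total, k) }.take(n)
  }
}
```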
【spark】Recognizing MNIST digits 0-9 with MultilayerPerceptron

Only one layer configuration, (28 * 28, 100, 50, 10), was used for training, so the results are not very good. package com.bbw5.ml.spark; import org.apache.spark.ml.tuning.ParamGridBuilder; import org.apache.spark.SparkContext; import org.apache.spark.sql.SQLContext…

Original · 2016-03-09 22:07:41 · 1714 views · 0 comments
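To get a sense of the model size behind the layer specification (28 * 28, 100, 50, 10), note that a fully connected network with one bias per neuron in every non-input layer has in * out + out trainable parameters between each pair of consecutive layers. A small helper under that bias assumption (the name is hypothetical):

```scala
object MlpSizeSketch {
  // Sums in * out weights plus out biases over each pair of consecutive layers.
  def parameterCount(layers: Seq[Int]): Int = {
    require(layers.size >= 2, "need at least an input and an output layer")
    layers.sliding(2).map(pair => pair(0) * pair(1) + pair(1)).sum
  }
}
```

For Seq(28 * 28, 100, 50, 10) this gives 78,500 + 5,050 + 510 = 84,060 parameters.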
【spark】Recognizing MNIST digits 0 and 1 with LogisticRegression (ML API)

ROC curve concepts: http://blog.csdn.net/abcjennifer/article/details/7359370; recall and precision concepts: http://blog.csdn.net/pirage/article/details/9851339; download the MNIST dataset: http://yann.lecun.com/exdb/mnist/; load M…

Original · 2016-03-09 19:34:36 · 2017 views · 0 comments
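The precision and recall concepts linked above reduce to three counts over (prediction, label) pairs. A minimal sketch, assuming label 1.0 is the positive class and that at least one positive prediction and one positive label exist (otherwise the division is 0/0 and yields NaN). In Spark, org.apache.spark.mllib.evaluation.BinaryClassificationMetrics reports precision and recall per threshold together with the area under the ROC curve; the helper below is only an illustration with a hypothetical name:

```scala
object PrecisionRecallSketch {
  // predAndLabel holds (prediction, label) pairs with values 0.0 or 1.0.
  // Returns (precision, recall) = (TP / (TP + FP), TP / (TP + FN)).
  def precisionRecall(predAndLabel: Seq[(Double, Double)]): (Double, Double) = {
    val tp = predAndLabel.count { case (p, l) => p == 1.0 && l == 1.0 }
    val fp = predAndLabel.count { case (p, l) => p == 1.0 && l == 0.0 }
    val fn = predAndLabel.count { case (p, l) => p == 0.0 && l == 1.0 }
    (tp.toDouble / (tp + fp), tp.toDouble / (tp + fn))
  }
}
```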
【spark-breeze】Installing breeze on 64-bit Windows 7

breeze Maven dependencies: org.scalanlp breeze_2.10 0.11.2; org.scalanlp breeze-natives_2.10 0.11.2…

Original · 2016-03-08 19:02:25 · 1845 views · 0 comments
【spark+python】Recognizing MNIST digits 0 and 1 with LogisticRegression (MLlib)

Download the dataset: http://yann.lecun.com/exdb/mnist/ …

Original · 2016-02-29 20:33:39 · 1452 views · 0 comments
【Mastering Machine Learning with scikit-learn (python+spark edition)】Chapter2 Linear Regression

Source code download: https://www.packtpub.com/big-data-and-business-intelligence/mastering-machine-learning-scikit-learn; start IPython Notebook: cd E:\DM\bookcode\mastering-machine-learning-scikit-learn, ip…

Translation · 2016-02-24 22:02:55 · 1096 views · 0 comments
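With a single explanatory variable, the linear regression fitted in this chapter has a closed-form least-squares solution: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄. A Scala sketch of that formula (not the book's scikit-learn code; names and data are hypothetical):

```scala
object SimpleOlsSketch {
  // Ordinary least squares for one feature; returns (slope, intercept).
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.size == ys.size && xs.size >= 2, "need at least two paired observations")
    val meanX = xs.sum / xs.size
    val meanY = ys.sum / ys.size
    // Sum of cross-deviations and sum of squared x-deviations.
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    val slope = sxy / sxx
    (slope, meanY - slope * meanX)
  }
}
```

Points lying exactly on y = 2x + 1, such as (1, 3), (2, 5), (3, 7), (4, 9), recover slope 2 and intercept 1.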
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter3 Top 10 List

Scala version of Top 10 List: package com.bmb.dataalgorithms.spark; import scala.collection.mutable.PriorityQueue; import org.apache.spark.Logging; import org.apache.spark.SparkConf; import org.apache.spark.Spa…

Translation · 2016-02-24 21:56:40 · 1016 views · 0 comments
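The excerpt imports scala.collection.mutable.PriorityQueue, which fits the usual Top-N design: each partition keeps only its N largest records in a bounded min-heap, and the per-partition candidates are then reduced the same way. The sketch below simulates partitions as plain sequences; it assumes unique keys and integer values, and the names are hypothetical. In Spark, localTopN would typically run inside mapPartitions.

```scala
import scala.collection.mutable.PriorityQueue

object TopTenSketch {
  // Min-heap by value: the head is the smallest value kept so far.
  private val byValueAscending: Ordering[(String, Int)] =
    Ordering.by[(String, Int), Int](_._2).reverse

  // Keeps the n largest records of one partition; the smallest is evicted first.
  def localTopN(records: Iterator[(String, Int)], n: Int): Seq[(String, Int)] = {
    val heap = PriorityQueue.empty[(String, Int)](byValueAscending)
    records.foreach { r =>
      heap.enqueue(r)
      if (heap.size > n) heap.dequeue()
    }
    heap.toSeq
  }

  // Merges the per-partition candidates, then sorts the survivors in descending order.
  def topN(partitions: Seq[Seq[(String, Int)]], n: Int): Seq[(String, Int)] = {
    val candidates = partitions.flatMap(p => localTopN(p.iterator, n))
    localTopN(candidates.iterator, n).sortBy(-_._2)
  }
}
```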
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter1 Secondary Sort

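I recently read "Data Algorithms: Recipes for Scaling up with Hadoop and Spark", whose algorithms are implemented in Java; the source code can be downloaded from https://github.com/mahmoudparsian/data-algorithms-book/. For learning purposes, Scala versions of the algorithms are provided here. Secondary Sort: package com.…

Translation · 2016-02-23 22:22:03 · 554 views · 0 comments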
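In the secondary sort pattern the records are sorted on a composite key (the natural key plus the field to order by), so each key's values arrive already in order. A local sketch, assuming records of the form (name, time, value) with hypothetical sample data. In Spark, one route to the same ordering is repartitionAndSortWithinPartitions with a partitioner on the natural key; whether the post uses it is not visible in the excerpt.

```scala
object SecondarySortSketch {
  // Returns each name with its values ordered by time.
  def secondarySort(records: Seq[(String, Int, Int)]): Seq[(String, Seq[Int])] = {
    // Composite-key sort: by name first, then by time within the same name.
    val sorted = records.sortBy { case (name, time, _) => (name, time) }
    // groupBy keeps the sorted order inside each group; names are re-sorted afterwards.
    sorted.groupBy(_._1).toSeq.sortBy(_._1).map { case (name, rs) => (name, rs.map(_._3)) }
  }
}
```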
【spark】Spark word count example

Code: package com.test.mllib.test; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; object WorkCountApp { def main(args: Array[String]) { var filename = "" args match…

Original · 2016-02-17 20:50:36 · 802 views · 0 comments
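A Spark word count usually has the shape textFile, flatMap (split into words), map((_, 1)), reduceByKey(_ + _). The same counting logic on an in-memory Seq[String], splitting on runs of whitespace (an assumption, since the post's split rule is cut off); the object name is hypothetical:

```scala
object WordCountSketch {
  // Splits each line on runs of whitespace, drops empty tokens, and counts
  // occurrences per word (the in-memory counterpart of map((_, 1)) + reduceByKey(_ + _)).
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
}
```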
【spark】List of common Spark commands

When starting spark-shell, specify the libraries to load: bin\spark-shell --jars E:\DM\code\projects\ch11-testit\target\ch11-testit-1.0.0.jar; run an application with spark-submit: E:\DM\Spark\spark-1.4.1-bin-hadoop2.4\bin\spark-submit --maste…

Original · 2016-02-17 19:35:28 · 5761 views · 0 comments
【spark】Basic DataFrame operations

Reference: http://dataunion.org/19375.html

Repost · 2016-03-17 22:46:51 · 976 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter5 Order Inversion Pattern

Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.ArrayBuffer; import org.apache.spark.SparkCon…

Translation · 2016-03-16 20:00:16 · 620 views · 0 comments
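The order inversion pattern is typically used to compute relative frequencies such as f(u | w) = count(w, u) / count(w, *), where u ranges over the neighbors of word w within a window; the trick is to emit the marginal (w, *) so that it reaches the reducer before the individual pairs. Assuming that is the problem the post solves, a local sketch that computes both counts directly in memory (window size and names are hypothetical):

```scala
object RelativeFrequencySketch {
  def relativeFrequency(words: IndexedSeq[String], window: Int): Map[(String, String), Double] = {
    // Every (word, neighbor) pair within `window` positions, excluding the word itself.
    val pairs = for {
      i <- words.indices
      j <- math.max(0, i - window) to math.min(words.length - 1, i + window)
      if j != i
    } yield (words(i), words(j))
    // count(w, u) for each pair, and the marginal count(w, *) for each word.
    val pairCounts = pairs.groupBy(identity).map { case (p, ps) => p -> ps.size }
    val marginals = pairs.groupBy(_._1).map { case (w, ps) => w -> ps.size }
    pairCounts.map { case ((w, u), c) => (w, u) -> c.toDouble / marginals(w) }
  }
}
```

For the words a b a c with a window of 1, the word a has three neighbor slots (b twice, c once), so f(b | a) = 2/3 and f(c | a) = 1/3.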
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter6 MovingAverage

Scala version of the algorithm: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.ArrayBuffer; import org.apache.spark.Partitioner…

Translation · 2016-03-16 22:10:28 · 465 views · 0 comments
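A simple moving average needs each series in time order before averaging; Spark solutions commonly get that order from a custom Partitioner plus a secondary sort (the excerpt imports org.apache.spark.Partitioner), whereas the local sketch below just sorts in memory. It assumes records of the form (stock, timestamp, price) and emits averages only once a full window is available; names and data are hypothetical.

```scala
object MovingAverageSketch {
  // Simple moving average over `window` consecutive prices of one series.
  def movingAverage(prices: Seq[Double], window: Int): List[Double] = {
    require(window > 0, "window must be positive")
    prices.sliding(window).filter(_.size == window).map(_.sum / window).toList
  }

  // Groups records by stock and sorts each group by timestamp before averaging.
  def byStock(records: Seq[(String, Long, Double)], window: Int): Map[String, List[Double]] =
    records.groupBy(_._1).map { case (stock, rs) =>
      stock -> movingAverage(rs.sortBy(_._2).map(_._3), window)
    }
}
```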
【spark】Building Spark 1.6.0 on 64-bit Windows 7

1: In settings.xml, set the Maven repository to http://maven.oschina.net/content/groups/public/ (this repository requires Maven 3.3.3 or later): xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/S…

Original · 2016-02-17 19:17:42 · 430 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 12. K-Means Clustering

The k-means implementation from the Spark examples: /* Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional…

Translation · 2016-04-05 19:43:29 · 418 views · 0 comments
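The core of Lloyd's k-means algorithm, which the Spark example distributes over an RDD, is: assign each point to its nearest center, then move each center to the mean of its assigned points. A local sketch of that loop with points as Vector[Double] and squared Euclidean distance; a fixed iteration count stands in for a convergence test, a center with no points is left where it is, and all names are hypothetical:

```scala
object KMeansSketch {
  type Point = Vector[Double]

  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the nearest center.
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDistance(p, centers(i)))

  // One Lloyd iteration: assign points, then recompute each center as a mean.
  def step(points: Seq[Point], centers: Seq[Point]): Seq[Point] = {
    val assigned = points.groupBy(p => closest(p, centers))
    centers.indices.map { i =>
      assigned.get(i) match {
        case Some(ps) => ps.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }).map(_ / ps.size)
        case None     => centers(i)
      }
    }
  }

  def run(points: Seq[Point], initial: Seq[Point], iterations: Int): Seq[Point] =
    (1 to iterations).foldLeft(initial)((cs, _) => step(points, cs))
}
```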
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 13 k-Nearest Neighbors

Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import breeze.linalg.DenseVector; /** This class solves K-Nearest-Neighbor…

Translation · 2016-03-31 19:00:45 · 424 views · 0 comments
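k-nearest-neighbor classification labels a query by majority vote among the k training points closest to it. The post uses breeze DenseVector; the local sketch below uses plain Seq[Double] features and Euclidean distance instead, breaks vote ties alphabetically by label (an arbitrary choice), and uses hypothetical names and data:

```scala
object KnnSketch {
  // Euclidean distance between two equally sized feature vectors.
  def distance(a: Seq[Double], b: Seq[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Majority vote among the k nearest labelled training points.
  def classify(training: Seq[(Seq[Double], String)], query: Seq[Double], k: Int): String = {
    val nearest = training.sortBy { case (features, _) => distance(features, query) }.take(k)
    val votes = nearest.groupBy(_._2).map { case (label, members) => label -> members.size }
    votes.toSeq.sortBy { case (label, count) => (-count, label) }.head._1
  }
}
```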
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 11 Smarter Email Marketing wit

Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import org.apache.spark.Partitioner; import org.apache.spark.HashPartitioner; i…

Translation · 2016-03-30 20:10:32 · 378 views · 0 comments
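The chapter title is cut off in this listing. Assuming the post models each customer's purchase history as a sequence of states in a first-order Markov chain, the transition matrix is estimated by counting consecutive state pairs and normalizing each row. A local sketch; the state labels used in the check below are hypothetical:

```scala
object MarkovSketch {
  // Estimates P(to | from) from state sequences: count consecutive (from, to)
  // pairs, then divide each count by the number of transitions leaving `from`.
  def transitionProbabilities(sequences: Seq[Seq[String]]): Map[String, Map[String, Double]] = {
    val transitions = sequences.flatMap(s => s.zip(s.drop(1)))
    transitions.groupBy(_._1).map { case (from, pairs) =>
      val counts = pairs.groupBy(_._2).map { case (to, ps) => to -> ps.size }
      val total = pairs.size.toDouble
      from -> counts.map { case (to, c) => to -> c / total }
    }
  }
}
```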
【spark source】Reading the Spark LinearRegression source code

org.apache.spark.mllib.regression.RegressionModel defines the predict interface of the linear regression model; org.apache.spark.mllib.regression.impl.GLMRegressionModel loads a model from a file or saves it to a file; org.apache.spark.mllib.pmml.PMMLExportabl…

Original · 2016-03-28 21:42:20 · 709 views · 0 comments
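For a linear regression model, the prediction behind the predict interface is the dot product of the learned weights with the feature vector plus the intercept. A plain-Scala sketch of just that formula, with a hypothetical case class standing in for the MLlib model:

```scala
object LinearPredictSketch {
  // prediction = sum_i(weights(i) * features(i)) + intercept
  final case class LinearModel(weights: Array[Double], intercept: Double) {
    def predict(features: Array[Double]): Double = {
      require(features.length == weights.length, "dimension mismatch")
      weights.zip(features).map { case (w, x) => w * x }.sum + intercept
    }
  }
}
```

For weights (2.0, -1.0) and intercept 0.5, the features (3.0, 4.0) give 6.0 - 4.0 + 0.5 = 2.5.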
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 10 Content-Based Recommend

Scala version of the algorithm: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; /** usermovieratings.txt: User1 Movie1 1; User1 Movie2 2; User…

Translation · 2016-03-28 21:13:39 · 320 views · 0 comments
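Content-based recommendation here compares movies by the ratings users gave them (usermovieratings.txt holds lines such as User1 Movie1 1). One common similarity measure is cosine similarity over the users who rated both movies; the excerpt does not show which measure the post uses, so treat this as an illustrative sketch with hypothetical names:

```scala
object MovieSimilaritySketch {
  // Ratings for one movie, keyed by user, e.g. Map("User1" -> 1.0).
  type Ratings = Map[String, Double]

  // Cosine similarity over common raters; 0.0 when no user rated both movies.
  def cosine(a: Ratings, b: Ratings): Double = {
    val common = a.keySet.intersect(b.keySet).toSeq
    if (common.isEmpty) 0.0
    else {
      val dot = common.map(u => a(u) * b(u)).sum
      val normA = math.sqrt(common.map(u => a(u) * a(u)).sum)
      val normB = math.sqrt(common.map(u => b(u) * b(u)).sum)
      dot / (normA * normB)
    }
  }
}
```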
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 9 Recommendation People

Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.ArrayBuffer; /** friends.txt: 1 2,3,4,5,…

Translation · 2016-03-23 19:52:44 · 448 views · 0 comments
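Assuming friends.txt lists a person followed by that person's friends (as in 1 2,3,4,5,...), a "people you may know" recommendation counts, for every non-friend, how many mutual friends they share. A local sketch with an adjacency map that is assumed to be symmetric; names and data are hypothetical:

```scala
object PeopleYouMayKnowSketch {
  // For each person, ranks friends-of-friends who are not already friends
  // by the number of mutual friends (descending, ties by id).
  def recommend(adjacency: Map[Int, Set[Int]]): Map[Int, Seq[(Int, Int)]] =
    adjacency.map { case (person, friends) =>
      val candidates = for {
        f <- friends.toSeq
        fof <- adjacency.getOrElse(f, Set.empty[Int]).toSeq
        if fof != person && !friends.contains(fof)
      } yield fof
      val ranked = candidates.groupBy(identity).map { case (c, cs) => c -> cs.size }
        .toSeq.sortBy { case (c, n) => (-n, c) }
      person -> ranked
    }
}
```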
【spark+nlp】 Feature Extract and Preprocess

Common Spark NLP methods: package com.bbw5.ml.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import org.apache.spark.ml.feature.CountVectorizer; import org.apache.spark.ml.feature.Co…

Original · 2016-03-22 22:20:57 · 1026 views · 0 comments
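Spark's CountVectorizer, imported in the excerpt, builds a vocabulary ordered by term frequency across the corpus (capped by its vocabSize parameter) and turns each document into a vector of term counts. The sketch below is a local, dense-array imitation of that idea, not Spark's implementation; alphabetical tie-breaking and the helper names are assumptions:

```scala
object CountVectorSketch {
  // Vocabulary of at most vocabSize terms, most frequent across the corpus first.
  def fitVocabulary(docs: Seq[Seq[String]], vocabSize: Int): IndexedSeq[String] =
    docs.flatten.groupBy(identity).map { case (t, ts) => t -> ts.size }
      .toIndexedSeq.sortBy { case (t, n) => (-n, t) }.take(vocabSize).map(_._1)

  // Term counts of one document, indexed by vocabulary position;
  // terms outside the vocabulary are ignored.
  def transform(vocab: IndexedSeq[String], doc: Seq[String]): Array[Double] = {
    val index = vocab.zipWithIndex.toMap
    val counts = Array.fill(vocab.size)(0.0)
    doc.foreach(t => index.get(t).foreach(i => counts(i) += 1.0))
    counts
  }
}
```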
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 9 Recommendation Items

Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.HashMap; import scala.collection.mutable.Arra…

Translation · 2016-03-22 18:19:21 · 419 views · 0 comments
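Assuming the post covers the "customers who bought this item also bought" style of item recommendation, the core is a co-occurrence count: among customers who bought the target item, how often each other item appears. A local sketch with hypothetical names; ties are broken alphabetically:

```scala
object AlsoBoughtSketch {
  // purchases: customer -> set of items bought.
  // Ranks other items by how many customers bought them together with `target`.
  def alsoBought(purchases: Map[String, Set[String]], target: String, n: Int): Seq[(String, Int)] =
    purchases.values.toSeq
      .filter(_.contains(target))
      .flatMap(items => (items - target).toSeq)
      .groupBy(identity).map { case (item, occ) => item -> occ.size }
      .toSeq.sortBy { case (item, count) => (-count, item) }
      .take(n)
}
```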
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 8 Common Friends

Scala version of the algorithm: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.HashMap; /** The FindCommonFriends is a Spa…

Translation · 2016-03-21 20:49:30 · 355 views · 0 comments
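The common-friends algorithm is usually phrased as map/reduce: a person p with friend list F emits (pair(p, f), F) for each friend f, with the pair ordered so that both friends produce the same key, and the two lists that arrive for a key are intersected. A local sketch of that formulation; it assumes friendship is symmetric (a pair that receives only one list is skipped), and the names are hypothetical:

```scala
object CommonFriendsSketch {
  def commonFriends(friends: Map[Int, Set[Int]]): Map[(Int, Int), Set[Int]] = {
    // "Map" step: one (orderedPair, friendList) record per friendship edge and side.
    val emitted = for {
      (p, fs) <- friends.toSeq
      f <- fs.toSeq
    } yield ((math.min(p, f), math.max(p, f)), fs)
    // "Reduce" step: intersect the two friend lists received for each pair.
    emitted.groupBy(_._1).collect {
      case (pair, lists) if lists.size == 2 => pair -> lists.map(_._2).reduce(_ intersect _)
    }
  }
}
```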
【spark】Predicting wine quality with linear regression

Original · 2016-03-17 20:48:29 · 4675 views · 0 comments
【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 7 Market Basket Analysis

Scala implementation of the algorithm: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.ArrayBuffer; /** finds all association rul…

Translation · 2016-03-17 18:22:28 · 354 views · 0 comments
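The excerpt says the program finds all association rules. Restricted to rules between single items, the computation is: support counts for items and item pairs, then confidence(A => B) = support({A, B}) / support({A}). A local sketch; the pair-only restriction and the names are assumptions:

```scala
object AssociationRulesSketch {
  // Returns confidence for every rule a => b where a and b were bought together.
  def pairRules(baskets: Seq[Set[String]]): Map[(String, String), Double] = {
    // Number of baskets containing each item.
    val itemSupport = baskets.flatMap(_.toSeq).groupBy(identity).map { case (i, is) => i -> is.size }
    // Number of baskets containing each unordered pair (stored in sorted order).
    val pairSupport = baskets.flatMap { b =>
      val items = b.toSeq.sorted
      for {
        i <- items.indices
        j <- (i + 1) until items.size
      } yield (items(i), items(j))
    }.groupBy(identity).map { case (p, ps) => p -> ps.size }
    // Each pair yields a rule in both directions.
    pairSupport.toSeq.flatMap { case ((a, b), n) =>
      Seq((a, b) -> n.toDouble / itemSupport(a), (b, a) -> n.toDouble / itemSupport(b))
    }.toMap
  }
}
```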
【spark】spark+kafka

Start Kafka: MobaXterm_Personal_8.5.exe; D:/Develop/kafka_2.10-0.8.2.1/bin/windows/zookeeper-server-start.bat D:/Develop/kafka_2.10-0.8.2.1/config/zookeeper.properties; D:/Develop/kafka_2.10-0.8.2.…

Original · 2016-05-12 22:16:30 · 491 views · 0 comments
