  • Blog (36)
  • Resources (1)
  • Favorites
  • Following

Original: maven setting.xml China mirror configuration

<!-- mirror | Specifies a repository mirror site to use instead of a given repository. The repository that | this mirror serves has an ID that matches the mirrorOf element of this mi…

2016-06-09 10:09:00 481

Original: 【spark】spark+kafka

Start Kafka: MobaXterm_Personal_8.5.exe; D:/Develop/kafka_2.10-0.8.2.1/bin/windows/zookeeper-server-start.bat D:/Develop/kafka_2.10-0.8.2.1/config/zookeeper.properties; D:/Develop/kafka_2.10-0.8.2.…

2016-05-12 22:16:30 482

Original: 【python】numpy, scipy, pandas resource list

http://blog.csdn.net/huangxia73/article/details/38065881

2016-04-05 23:28:46 449

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 12. K-Means Clustering

The k-means implementation from the Spark examples: /* * Licensed to the Apache Software Foundation (ASF) under one or more * contributor license agreements. See the NOTICE file distributed with * this work for additional…

2016-04-05 19:43:29 413
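The listing truncates the chapter's Scala code, so as a rough illustration of the underlying algorithm only (not the book's or Spark's implementation, and with made-up data), here is Lloyd's k-means in plain Python:

```python
# Minimal k-means (Lloyd's algorithm) sketch. Points, k, and the
# first-k initialization are illustrative choices, not from the chapter.

def kmeans(points, k, iters=10):
    # Initialize centroids with the first k points (a simple common choice).
    centroids = points[:k]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids

points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8)]
print(sorted(kmeans(points, 2)))  # two centroids, near (1.1, 0.9) and (5.1, 4.9)
```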

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 13 k-Nearest Neighbors

Scala implementation: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import breeze.linalg.DenseVector; /** * This class solves K-Nearest-Neighbor…

2016-03-31 19:00:45 421
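The excerpt above cuts off before the algorithm itself; as a sketch of the k-nearest-neighbors idea in plain Python (training data and labels are illustrative, not the chapter's), classification is a majority vote over the k closest training points:

```python
# k-nearest-neighbors sketch: rank training points by squared Euclidean
# distance to the query, then take a majority vote among the k closest.
from collections import Counter

def knn_classify(train, query, k):
    # train: list of ((x, y), label) pairs
    ranked = sorted(train, key=lambda t: (t[0][0] - query[0]) ** 2
                                       + (t[0][1] - query[1]) ** 2)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b")]
print(knn_classify(train, (0.2, 0.3), 3))  # → a
```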

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 11 Smarter Email Marketing wit…

Scala implementation: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import org.apache.spark.Partitioner; import org.apache.spark.HashPartitioner…

2016-03-30 20:10:32 372

Original: 【spark source】Reading the Spark LinearRegression source

org.apache.spark.mllib.regression.RegressionModel defines the predict interface for linear regression models; org.apache.spark.mllib.regression.impl.GLMRegressionModel loads a model from, or saves a model to, a file; org.apache.spark.mllib.pmml.PMMLExportabl…

2016-03-28 21:42:20 700

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 10 Content-Based Recommend…

Scala implementation: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; /** * usermovieratings.txt * * User1 Movie1 1 * User1 Movie2 2 * User…

2016-03-28 21:13:39 316

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 9 Recommendation People

Scala implementation: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.ArrayBuffer; /** * friends.txt * 1 2,3,4,5,…

2016-03-23 19:52:44 444
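The chapter's "people you may know" recommendation counts mutual friends; as a minimal plain-Python sketch of that idea (the friendship graph below is illustrative, not the book's friends.txt data):

```python
# Recommend non-friends who share the most mutual friends with the user.
from collections import Counter

def recommend(friends, user):
    # friends: dict mapping a user to the set of their direct friends
    counts = Counter()
    for f in friends[user]:
        for fof in friends[f]:
            if fof != user and fof not in friends[user]:
                counts[fof] += 1  # f is one mutual friend of user and fof
    return [p for p, _ in counts.most_common()]

friends = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4, 5}, 4: {2, 3}, 5: {3}}
print(recommend(friends, 1))  # user 4 shares two mutual friends, user 5 one
```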

Original: 【spark+nlp】Feature Extraction and Preprocessing

Common Spark NLP methods: package com.bbw5.ml.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import org.apache.spark.ml.feature.CountVectorizer; import org.apache.spark.ml.feature.Co…

2016-03-22 22:20:57 1022

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 9 Recommendation Items

Scala implementation: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.HashMap; import scala.collection.mutable.Arra…

2016-03-22 18:19:21 415

Translated: 【pyspark】jieba Chinese word segmentation

2016-03-21 21:10:07 7115 5

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 8 Common Friends

Scala implementation: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.HashMap; /** * The FindCommonFriends is a Spa…

2016-03-21 20:49:30 348
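The FindCommonFriends excerpt is cut short; the core idea, sketched in plain Python with an illustrative graph (not the chapter's Spark code), is to intersect the friend lists of each pair of friends:

```python
# Common-friends sketch: for each unordered pair of direct friends,
# intersect their friend sets.
def common_friends(friends):
    # friends: dict mapping a user to the set of their direct friends
    result = {}
    for a in friends:
        for b in friends[a]:
            if a < b:  # emit each unordered pair exactly once
                result[(a, b)] = sorted(friends[a] & friends[b])
    return result

friends = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(common_friends(friends)[(1, 3)])  # friends common to users 1 and 3
```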

Reposted: 【spark】Basic DataFrame operations

Reference: http://dataunion.org/19375.html

2016-03-17 22:46:51 970

Original: 【spark】Predicting wine quality with linear regression

dd

2016-03-17 20:48:29 4665

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter 7 Market Basket Analysis

Scala implementation: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.ArrayBuffer; /** * finds all association rul…

2016-03-17 18:22:28 352
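The market-basket excerpt stops at the imports; the first step of the analysis — counting co-occurring item pairs across transactions — can be sketched in plain Python (the baskets are illustrative, not the chapter's data):

```python
# Count how often each unordered item pair appears together in a basket,
# the raw support counts behind association rules.
from collections import Counter
from itertools import combinations

def pair_counts(transactions):
    counts = Counter()
    for basket in transactions:
        # count each unordered pair at most once per transaction
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["milk", "eggs"]]
print(pair_counts(baskets)[("bread", "milk")])  # → 2
```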

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter6 MovingAverage

Scala implementation: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.ArrayBuffer; import org.apache.spark.Partitioner…

2016-03-16 22:10:28 461
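The moving-average computation itself is simple enough to sketch in plain Python (a sliding window over an already time-sorted series; values are illustrative, not the chapter's):

```python
# Simple moving average over a sorted series with a fixed-size window.
from collections import deque

def moving_average(values, window):
    out, win, total = [], deque(), 0.0
    for v in values:
        win.append(v)
        total += v
        if len(win) > window:
            total -= win.popleft()  # drop the oldest value from the window
        out.append(total / len(win))
    return out

print(moving_average([10, 20, 30, 40], 2))  # [10.0, 15.0, 25.0, 35.0]
```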

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter5 Order Inversion Pattern

Scala implementation: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import scala.collection.mutable.ArrayBuffer; import org.apache.spark.SparkCon…

2016-03-16 20:00:16 612

Original: 【storm kafka】Storm-Kafka integration

Maven configuration resolving the conflict between log4j and slf4j: <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.a…

2016-03-10 23:12:16 570

Original: 【storm】Installing Storm on 64-bit Win7

Installation docs: http://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html; start ZooKeeper (reusing the Kafka ZK): cd G:\Big-File\Architecture\storm\kafka_2.10-0.9.0.0; bin\windows\zookeeper-server-start config…

2016-03-10 18:52:58 1486

Original: 【kafka】Installing Kafka on 64-bit Win7

Quick start: http://kafka.apache.org/documentation.html#quickstart; edit the Kafka ZK config config/zookeeper.properties: dataDir=G:/Big-File/Architecture/storm/kafka_2.10-0.9.0.0/zookeeper; start ZK: cd G:\Big…

2016-03-10 18:18:15 2613

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter4 LeftOuterJoin

Scala version: package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; /** * This class provides a basic implementation of "left outer join" * operat…

2016-03-09 23:58:31 319
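The left-outer-join operation the chapter implements on Spark can be sketched in plain Python over two keyed datasets (the rows below are illustrative, not the book's data):

```python
# Left outer join of two (key, value) datasets: every left row survives,
# paired with each matching right value, or with None when unmatched.
def left_outer_join(left, right):
    right_index = {}
    for k, v in right:
        right_index.setdefault(k, []).append(v)
    out = []
    for k, v in left:
        matches = right_index.get(k)
        if matches:
            out.extend((k, (v, m)) for m in matches)
        else:
            out.append((k, (v, None)))  # unmatched left rows are kept
    return out

left = [("u1", "Alice"), ("u2", "Bob")]
right = [("u1", "NY")]
print(left_outer_join(left, right))
```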

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter3 Top 10 NonUniqueList

package com.bbw5.dataalgorithms.spark; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; import java.util.PriorityQueue; /** * Assumption: for all input (K, V), K's are non-un…

2016-03-09 23:30:03 793

Original: 【spark】Recognizing the MNIST digits 0-9 with a MultilayerPerceptron

Since training used only a single (28 * 28, 100, 50, 10) layer configuration, the results are not great. package com.bbw5.ml.spark; import org.apache.spark.ml.tuning.ParamGridBuilder; import org.apache.spark.SparkContext; import org.apache.spark.sql.SQLContext…

2016-03-09 22:07:41 1707

Original: 【spark】Recognizing the MNIST digits 0-1 with LogisticRegression (ML API)

ROC curve concept: http://blog.csdn.net/abcjennifer/article/details/7359370; recall/precision concept: http://blog.csdn.net/pirage/article/details/9851339; download the MNIST dataset: http://yann.lecun.com/exdb/mnist/; load M…

2016-03-09 19:34:36 2010

Original: 【spark-breeze】Installing Breeze on 64-bit Win7

Breeze Maven dependencies: org.scalanlp breeze_2.10 0.11.2; org.scalanlp breeze-natives_2.10 0.11.2…

2016-03-08 19:02:25 1841

Original: 【python】Installing Python on 64-bit Win7

Download Python: https://www.python.org/ftp/python/2.7.11/python-2.7.11.amd64.msi; add these paths to Path: D:\Develop\Python27;D:\Develop\Python27\Scripts; download Microsoft Visual C++ Compiler for Python 2.7 (once installed, you can directly…

2016-03-03 20:08:35 1266

Original: 【spark+python】Recognizing the MNIST digits 0-1 with LogisticRegression (MLlib)

Download the dataset: http://yann.lecun.com/exdb/mnist/

2016-02-29 20:33:39 1445

Translated: 【Mastering Machine Learning with scikit-learn (python+spark edition)】Chapter2 Linear Regression

Source code download: https://www.packtpub.com/big-data-and-business-intelligence/mastering-machine-learning-scikit-learn; start IPython Notebook: cd E:\DM\bookcode\mastering-machine-learning-scikit-learn; ip…

2016-02-24 22:02:55 1090

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter3 Top 10 List

Scala version of the Top 10 List: package com.bmb.dataalgorithms.spark; import scala.collection.mutable.PriorityQueue; import org.apache.spark.Logging; import org.apache.spark.SparkConf; import org.apache.spark.Spa…

2016-02-24 21:56:40 1012
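The excerpt shows the chapter's code uses a PriorityQueue; the same bounded-heap Top-N idea can be sketched in plain Python with heapq (keys and values below are illustrative):

```python
# Top-N with a bounded min-heap: keep at most n entries, evicting the
# current minimum whenever a larger value arrives.
import heapq

def top_n(pairs, n):
    heap = []
    for key, value in pairs:
        if len(heap) < n:
            heapq.heappush(heap, (value, key))
        elif value > heap[0][0]:
            heapq.heapreplace(heap, (value, key))  # evict the minimum
    return sorted(heap, reverse=True)

data = [("a", 5), ("b", 9), ("c", 1), ("d", 7), ("e", 3)]
print(top_n(data, 3))  # [(9, 'b'), (7, 'd'), (5, 'a')]
```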

Translated: 【Data Algorithms_Recipes for Scaling up with Hadoop and Spark】Chapter1 Secondary Sort

I recently read Data Algorithms: Recipes for Scaling up with Hadoop and Spark, whose algorithms are implemented in Java; the source can be downloaded from https://github.com/mahmoudparsian/data-algorithms-book/. For learning purposes, here is a Scala version of the Secondary Sort algorithm: package com.…

2016-02-23 22:22:03 547
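For readers unfamiliar with the pattern: secondary sort groups records by key while ordering each group's values by a secondary field, which MapReduce achieves with composite keys and a grouping comparator. A minimal plain-Python sketch of the end result (records are illustrative):

```python
# Secondary-sort sketch: group (key, time, value) records by key, then
# order each group's values by the secondary field (time).
from collections import defaultdict

def secondary_sort(records):
    groups = defaultdict(list)
    for key, time, value in records:
        groups[key].append((time, value))
    # sort within each group by the secondary field, keep only the values
    return {k: [v for _, v in sorted(vs)] for k, vs in groups.items()}

records = [("x", 2, 20), ("y", 1, 5), ("x", 1, 10), ("x", 3, 30)]
print(secondary_sort(records)["x"])  # [10, 20, 30]
```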

Original: 【spark】Spark word count example

Code: package com.test.mllib.test; import org.apache.spark.SparkConf; import org.apache.spark.SparkContext; object WorkCountApp { def main(args: Array[String]) { var filename = "" args match…

2016-02-17 20:50:36 794
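The post's WorkCountApp object does this with Spark transformations; the same computation, sketched in plain Python for comparison (input text is illustrative):

```python
# Word count in plain Python: lowercase, split on whitespace, tally.
from collections import Counter

def word_count(text):
    return Counter(text.lower().split())

counts = word_count("to be or not to be")
print(counts["to"], counts["be"])  # 2 2
```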

Original: 【spark】List of common Spark commands

When starting spark-shell, specify extra libraries to load: bin\spark-shell --jars E:\DM\code\projects\ch11-testit\target\ch11-testit-1.0.0.jar; run an application via spark-submit: E:\DM\Spark\spark-1.4.1-bin-hadoop2.4\bin\spark-submit --maste…

2016-02-17 19:35:28 5752

Original: 【spark】A pom.xml template for creating a Maven-based Spark project

xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> 4.0.0 com.xxxx test jar testit 1.0.0 nexus OS China http://maven.oschina.net/conte…

2016-02-17 19:32:59 5390

Original: 【spark】Building Spark 1.6.0 on 64-bit Win7

1: Set the Maven repository in setting.xml to http://maven.oschina.net/content/groups/public/ (this repository requires Maven 3.3.3 or later); xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/S…

2016-02-17 19:17:42 428

Original: 【hadoop】Installing Hadoop 2.x on 32-bit Win7

Install JDK 1.7; download hadoop-2.3.0: http://archive.apache.org/dist/hadoop/core/hadoop-2.3.0/hadoop-2.3.0.tar.gz; download hadoop-common-2.2.0-bin-32.rar: https://codeload.github.com/srccodes/hadoop-common-2.2.0-b…

2016-02-16 23:15:54 1953

hadoop-2.3-win7 configuration

2016-02-18
