Spark
飞鸿踏雪Ben归来
Trying hard to keep up with the times in technology
Submitting a Spark sample job fails
The submission fails with the following error: 16/01/11 19:19:53 ERROR SparkContext: Error initializing SparkContext. java.net.ConnectException: Call From sparkmaster/192.168.10.80 to sparkmaster:8021 failed on connection exception: java.n…
Original post · 2016-01-12 11:25:07 · 1756 views · 3 comments
Installing Spark in Standalone mode / Hadoop YARN mode and running WordCount
First, my environment: 2 nodes, one serving as both master and worker, the other as a pure worker. Master + worker: sparkmaster 192.168.10.80; pure worker: sparkworker1 192.168.10.81. Download and install: get the prebuilt Spark 1.6 from the official site, http://spark.apache.org/downloads.ht…
Original post · 2016-01-15 00:40:35 · 1679 views · 0 comments
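The excerpt is cut off before the installation steps, but once a standalone cluster like the one described is up, a quick sanity check from spark-shell might look like the sketch below. This is an assumption, not from the post: spark://sparkmaster:7077 is the default standalone master URL derived from the hostname above.

```scala
// Hypothetical sanity check for a standalone cluster (Spark 1.6 era):
//   on sparkmaster:   ./sbin/start-master.sh
//   on each worker:   ./sbin/start-slave.sh spark://sparkmaster:7077
// then connect a shell to the master:
//   ./bin/spark-shell --master spark://sparkmaster:7077
val data = sc.parallelize(1 to 100)  // sc is provided by spark-shell
println(data.count())                // should print 100 if the workers are reachable
```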
spark-submit error: java.lang.ClassNotFoundException: WordCount
This took a whole morning, but the program finally ran on Spark. I wrote a simple Scala program in Eclipse; the code is as follows: package spark.wordcount; import org.apache.spark.SparkContext; import org.apache.spark.SparkContext._; import org.apache.spark.SparkConf; ob…
Original post · 2016-01-21 15:24:46 · 13498 views · 0 comments
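The excerpt shows the object living in package spark.wordcount, and a common cause of this ClassNotFoundException is passing the bare class name to spark-submit instead of the fully qualified one. A hedged sketch of that shape (the jar name is hypothetical; the post's actual fix is truncated away):

```scala
// The object is declared inside a package:
package spark.wordcount

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    // ... job body ...
    sc.stop()
  }
}

// spark-submit must then receive the fully qualified name:
//   spark-submit --class spark.wordcount.WordCount wordcount.jar
// Passing only "--class WordCount" makes the driver look for a class in the
// default package, which fails with ClassNotFoundException: WordCount.
```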
Installing Eclipse on Linux to develop Spark programs
Today I successfully developed a simple Scala WordCount in Eclipse and ran it on the Spark cluster (standalone mode); these notes mark it down. Prerequisite: JDK installed, 1.7.0_79 in my environment. Packages: Eclipse: eclipse-standard-kepler-SR2-linux-gtk-x86_64.tar.gz; Scala: scala-2.10.6.rpm; download…
Original post · 2016-01-21 16:23:19 · 3249 views · 3 comments
RDD action fails in the Spark shell
While working with Spark today I hit the following error: scala> val work = sc.textFile("file:///tmp/input"); work: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[17] at textFile at :27; scala> work.count(); 16/01/13 23:01:52 WAR…
Original post · 2016-01-14 15:06:49 · 1005 views · 0 comments
java.lang.IllegalArgumentException: System memory 468189184 must be at least 4.718592E8
While developing a Spark project in Eclipse and trying to run the program directly against Spark, I hit this error: ERROR SparkContext: Error initializing SparkContext. java.lang.IllegalArgumentException: System memory 468189184 must be at least 4.718592E8. Please u…
Original post · 2016-03-03 12:30:14 · 11014 views · 0 comments
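The numbers in the message mean the driver JVM's heap (~468 MB here) fell below the roughly 450 MiB minimum (4.718592E8 bytes) that Spark 1.6's memory manager enforces, which is easy to hit with an IDE's default -Xmx. The truncated excerpt cuts off before the post's own fix; one commonly seen workaround, sketched here as an assumption, is to enlarge the heap in the Eclipse Run Configuration (e.g. VM argument -Xmx1g) or to override the checked value directly:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// "spark.testing.memory" is an internal Spark setting, so treat this as a
// local-development workaround only, not something for production configs.
val conf = new SparkConf()
  .setAppName("IdeTest")
  .setMaster("local[*]")
  .set("spark.testing.memory", "536870912") // 512 MB, above the ~4.7e8-byte minimum
val sc = new SparkContext(conf)
```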
WordCount written in Scala
Adapted and extended from a reference document; noting it down here. package mywork; import java.io.File; import org.apache.spark.SparkContext; import org.apache.spark.SparkContext._; import org.apache.spark.SparkConf; object WordCount { def main(arg…
Original post · 2016-02-28 20:50:53 · 773 views · 0 comments
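The excerpt cuts off right at the start of the object body. A minimal self-contained WordCount along the same lines might look like the sketch below; the input path and the collect-and-print ending are assumptions, not recovered from the post.

```scala
package mywork

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    val counts = sc.textFile("file:///tmp/input") // path borrowed from the Spark-shell post above
      .flatMap(_.split("\\s+"))                   // split each line into words
      .map(word => (word, 1))                     // pair each word with a count of 1
      .reduceByKey(_ + _)                         // sum counts per word
    counts.collect().foreach(println)
    sc.stop()
  }
}
```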
Wrong FS: hdfs://******, expected: file:///
Running spark-submit produced the following error: Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://******, expected: file:/// Cause: the argument to conf.addResource(file: Path) must be an instance of the Path class, not a String, so writing it the following way is wrong…
Original post · 2016-05-14 23:49:13 · 2462 views · 0 comments
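The post masks the actual URI, so the hostname and path below are placeholders; the contrast it describes between the two addResource overloads of Hadoop's Configuration can be sketched like this:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

val conf = new Configuration()

// Wrong: the String overload treats its argument as a classpath resource
// name, so an hdfs:// URI passed this way ends up resolved against the
// local file:/// filesystem — hence "Wrong FS: ..., expected: file:///".
// conf.addResource("hdfs://namenode:9000/user/conf/core-site.xml")

// Right: wrap the location in a Path instance so the hdfs:// scheme is honored.
conf.addResource(new Path("hdfs://namenode:9000/user/conf/core-site.xml"))
```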
Understanding the aggregate function on Spark RDDs
The RDD API in Spark has an aggregate function that took me a lot of effort to understand; now that it is clear, I am noting it down for future reference. First, the Spark documentation defines aggregate as follows: def aggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U…
Original post · 2016-06-07 15:15:33 · 21904 views · 4 comments
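As one illustration of that signature (my own example, not recovered from the truncated post): computing a sum and a count in a single pass, where seqOp folds each element into a partition's accumulator and combOp merges the per-partition accumulators.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("AggDemo").setMaster("local[*]"))
val rdd = sc.parallelize(1 to 4)

// zeroValue: the (sum, count) accumulator each partition starts from
// seqOp:     merges one element into a partition's accumulator
// combOp:    merges two partition accumulators together
val (sum, count) = rdd.aggregate((0, 0))(
  (acc, v) => (acc._1 + v, acc._2 + 1),
  (a, b)   => (a._1 + b._1, a._2 + b._2)
)
// sum = 10, count = 4, so the mean is sum.toDouble / count = 2.5
```

Note that zeroValue is folded in once per partition and again in combOp, so it must be an identity for both operations — (0, 0) is safe here.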