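I wrote this by following an experiment from Dong Xicheng's blog, ran into a few problems along the way, and decided to write them down.
It is really just a simple WordCount class, but once this skeleton is in place, other code can be filled in bit by bit.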
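package org.apache.spark

import org.apache.spark._
import SparkContext._  // implicit conversions that provide reduceByKey on pair RDDs (needed in Spark 1.2)

object WordCount {
  // Assembly jar: /apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      // The class name here matches the package above and the --class option in the script
      println("usage is org.apache.spark.WordCount <input> <output>")
      return
    }
    val sparkConf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(sparkConf)
    // Read the input, split each line on whitespace, and count each word
    val textFile = sc.textFile(args(0))
    val result = textFile
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    // Writes a directory of part-NNNNN files, one per partition
    result.saveAsTextFile(args(1))
    sc.stop()
  }
}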
The shell script:
export YARN_CONF_DIR=/etc/hadoop/conf
SPARK_JAR=/apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar \
/apps/spark-1.2.0-bin-hadoop2.4/bin/spark-class org.apache.spark.deploy.yarn.Client \
--jar ./RELEASE/spark-test-wordcount.jar \
--class org.apache.spark.WordCount \
--args hdfs://UHVDATA012.uhome.haier.net:8020/yang/word.txt \
--args hdfs://UHVDATA012.uhome.haier.net:8020/yang/output \
--num-workers 1 \
--master-memory 2g \
--worker-memory 2g \
--worker-cores 2
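The log below prints a deprecation warning for this client and ends with "--num-workers: command not found", which suggests the line continuation broke before that option, so the resource settings never reached the client. This is consistent with the AM container being allocated at 896 MB rather than the requested 2g. Here is a minimal sketch of the same submission through spark-submit, assuming the usual Spark 1.2 equivalents of the old options (--num-executors, --driver-memory, --executor-memory, --executor-cores). The variable names are my own, and the script only prints the command as a dry run.

```shell
#!/bin/sh
# Paths and URIs are copied from the script above; adjust for your cluster.
SPARK_HOME=/apps/spark-1.2.0-bin-hadoop2.4
APP_JAR=./RELEASE/spark-test-wordcount.jar
MAIN_CLASS=org.apache.spark.WordCount
INPUT=hdfs://UHVDATA012.uhome.haier.net:8020/yang/word.txt
OUTPUT=hdfs://UHVDATA012.uhome.haier.net:8020/yang/output

# Build the argument list one option per statement, so there are no
# backslash continuations that can silently break.
set -- --master yarn-cluster
set -- "$@" --class "$MAIN_CLASS"
set -- "$@" --num-executors 1
set -- "$@" --driver-memory 2g
set -- "$@" --executor-memory 2g
set -- "$@" --executor-cores 2
set -- "$@" "$APP_JAR" "$INPUT" "$OUTPUT"

# Dry run: show the command that would be executed.
SUBMIT_CMD="$SPARK_HOME/bin/spark-submit $*"
echo "$SUBMIT_CMD"
```

To actually submit, replace the last two lines with exec "$SPARK_HOME/bin/spark-submit" "$@" and keep YARN_CONF_DIR exported as before.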
Enough talk, that is all there is to it. The log from the run:
[root@UHVDATA016 yangjingbo]# ./wordcount.sh
Spark assembly has been built with Hive, including Datanucleus jars on classpath
WARNING: This client is deprecated and will be removed in a future version of Spark. Use ./bin/spark-submit with "--master yarn"
--args is deprecated. Use --arg instead.
--args is deprecated. Use --arg instead.
--args is deprecated. Use --arg instead.
15/01/06 13:27:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/06 13:27:33 INFO yarn.Client: Requesting a new application from cluster with 7 NodeManagers
15/01/06 13:27:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (13824 MB per container)
15/01/06 13:27:33 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/01/06 13:27:33 INFO yarn.Client: Setting up container launch context for our AM
15/01/06 13:27:33 INFO yarn.Client: Preparing resources for our AM container
15/01/06 13:27:33 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/01/06 13:27:33 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/01/06 13:27:33 INFO yarn.Client: Uploading resource file:/apps/spark-1.2.0-bin-hadoop2.4/lib/spark-assembly-1.2.0-hadoop2.4.0.jar ->
15/01/06 13:27:35 INFO yarn.Client: Uploading resource file:/home/yangjingbo/RELEASE/spark-test-wordcount.jar ->
15/01/06 13:27:35 INFO yarn.Client: Setting up the launch environment for our AM container
15/01/06 13:27:35 WARN yarn.ClientBase: SPARK_JAR detected in the system environment. This variable has been deprecated in favor of the spark.yarn.jar configuration variable.
15/01/06 13:27:35 INFO spark.SecurityManager: Changing view acls to: root
15/01/06 13:27:35 INFO spark.SecurityManager: Changing modify acls to: root
15/01/06 13:27:35 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/01/06 13:27:35 INFO yarn.Client: Submitting application 29 to ResourceManager
15/01/06 13:27:35 INFO impl.YarnClientImpl: Submitted application application_1416218486128_0029
15/01/06 13:27:36 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:36 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1420522055830
final status: UNDEFINED
user: root
15/01/06 13:27:37 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:38 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:39 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:40 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:41 INFO yarn.Client: Application report for application_1416218486128_0029 (state: ACCEPTED)
15/01/06 13:27:42 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:42 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: ###########
ApplicationMaster RPC port: 0
queue: default
start time: 1420522055830
final status: UNDEFINED
user: root
15/01/06 13:27:43 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:44 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:45 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:46 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:47 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:48 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:49 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:50 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:51 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:52 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:53 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:54 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:55 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:56 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:57 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:58 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:27:59 INFO yarn.Client: Application report for application_1416218486128_0029 (state: RUNNING)
15/01/06 13:28:00 INFO yarn.Client: Application report for application_1416218486128_0029 (state: FINISHED)
15/01/06 13:28:00 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: ##############
ApplicationMaster RPC port: 0
queue: default
start time: 1420522055830
final status: SUCCEEDED
user: root
./wordcount.sh: line 9: --num-workers: command not found
[root@UHVDATA016 yangjingbo]# hadoop dfs -ls /yang
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 2 items
drwxr-xr-x - root hdfs 0 2015-01-06 13:28 /yang/output
-rw-r--r-- 3 root hdfs 93 2015-01-06 11:38 /yang/word.txt
[root@UHVDATA016 yangjingbo]# hadoop dfs -cat /yang/output
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
cat: `/yang/output': Is a directory
[root@UHVDATA016 yangjingbo]# hadoop dfs -ls /yang/output/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 3 items
-rw-r--r-- 3 root hdfs 0 2015-01-06 13:28 /yang/output/_SUCCESS
-rw-r--r-- 3 root hdfs 0 2015-01-06 13:28 /yang/output/part-00000
-rw-r--r-- 3 root hdfs 49 2015-01-06 13:28 /yang/output/part-00001
[root@UHVDATA016 yangjingbo]# hadoop dfs -cat /yang/output/part-00000
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[root@UHVDATA016 yangjingbo]# hadoop dfs -cat /yang/output/part-00001
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
(name,2)
(hadoop,2)
(hdfs,3)
(redis,6)
(hbase,2)
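The empty part-00000 is not an error. Assuming reduceByKey used its default hash partitioning with two partitions (which the two part files suggest), each word goes to partition hashCode mod 2. Since the multiplier 31 in Java's String.hashCode is odd, that value is simply the parity of the sum of the character codes. The sketch below checks this for the five words in the output; the function name partition_of is my own, and it assumes plain ASCII words.

```shell
#!/bin/sh
# Partition index (0 or 1) that a String key gets from hashCode modulo 2.
# The parity of String.hashCode equals the parity of the sum of the
# character codes, because every power of 31 is odd.
partition_of() {
  word=$1
  sum=0
  while [ -n "$word" ]; do
    first=${word%"${word#?}"}        # first character of the word
    word=${word#?}                   # remaining characters
    code=$(printf '%d' "'$first")    # numeric code of that character
    sum=$((sum + code))
  done
  echo $((sum % 2))
}

for w in name hadoop hdfs redis hbase; do
  echo "$w -> part-0000$(partition_of "$w")"
done
```

All five words have an odd sum, so every key lands in part-00001 and part-00000 stays empty. To read the whole result without picking files by hand, hdfs dfs -cat /yang/output/part-* prints every partition.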