Running WordCount with Spark on Hadoop (without installing Scala)

Running WordCount on Spark and HDFS:
1. Install Spark standalone on a single machine, single node:
1) Extract the tarball.
2) Under conf/, copy spark-env.sh.template to spark-env.sh, fill in the environment paths, then launch the daemons from sbin/ (a sample config is sketched below):
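A minimal spark-env.sh sketch for this setup. The JAVA_HOME and Hadoop conf paths are taken from the worker launch command later in this transcript; SPARK_MASTER_IP is an assumption for a single-node master, so adjust all three to your install:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45
export HADOOP_CONF_DIR=/usr/lib/hadoop/hadoop-1.2.1/etc/hadoop
export SPARK_MASTER_IP=localhost   # assumed: bind the standalone master locally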
[root@localhost spark-1.6.1-bin-hadoop1]# cd sbin
[root@localhost sbin]# ls
slaves.sh                       start-slaves.sh
spark-config.sh                 start-thriftserver.sh
spark-daemon.sh                 stop-all.sh
spark-daemons.sh                stop-history-server.sh
start-all.sh                    stop-master.sh
start-history-server.sh         stop-mesos-dispatcher.sh
start-master.sh                 stop-mesos-shuffle-service.sh
start-mesos-dispatcher.sh       stop-shuffle-service.sh
start-mesos-shuffle-service.sh  stop-slave.sh
start-shuffle-service.sh        stop-slaves.sh
start-slave.sh                  stop-thriftserver.sh
[root@localhost sbin]# start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/lib/spark/spark-1.6.1-bin-hadoop1/logs/spark-root-org.apache.spark.deploy.master.Master-1-localhost.localdomain.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/lib/spark/spark-1.6.1-bin-hadoop1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost: failed to launch org.apache.spark.deploy.worker.Worker:
localhost:   Spark Command: /usr/lib/jvm/jdk1.7.0_45/bin/java -cp /usr/lib/spark/spark-1.6.1-bin-hadoop1/conf/:/usr/lib/spark/spark-1.6.1-bin-hadoop1/lib/spark-assembly-1.6.1-hadoop1.2.1.jar:/usr/lib/spark/spark-1.6.1-bin-hadoop1/lib/datanucleus-rdbms-3.2.9.jar:/usr/lib/spark/spark-1.6.1-bin-hadoop1/lib/datanucleus-core-3.2.10.jar:/usr/lib/spark/spark-1.6.1-bin-hadoop1/lib/datanucleus-api-jdo-3.2.6.jar:/usr/lib/hadoop/hadoop-1.2.1/etc/hadoop -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://localhost.localdomain:7077
localhost:   ========================================
localhost: full log in /usr/lib/spark/spark-1.6.1-bin-hadoop1/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
[root@localhost sbin]# jps
4035 
5268 Master
5354 Jps
5322 Worker
Despite the "failed to launch" message above, jps shows the Worker did come up; if it had not, the full log path printed in the output is the place to look.

2. Next, start Hadoop. Note that the bare start-all.sh here resolves to Hadoop's script via the PATH (both Hadoop and Spark ship a script with this name):
[root@localhost sbin]# cd ..
[root@localhost spark-1.6.1-bin-hadoop1]# start-all.sh
starting namenode, logging to /usr/lib/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /usr/lib/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-localhost.localdomain.out
localhost: starting secondarynamenode, logging to /usr/lib/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /usr/lib/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-localhost.localdomain.out
localhost: starting tasktracker, logging to /usr/lib/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-localhost.localdomain.out
[root@localhost spark-1.6.1-bin-hadoop1]# jps
4035 
6364 TaskTracker
6432 Jps
6244 JobTracker
6064 DataNode
5268 Master
6166 SecondaryNameNode
5959 NameNode
5322 Worker
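With the Hadoop daemons up (jps above shows them all), you can optionally confirm HDFS health before using it; this check is not part of the original run:

[root@localhost spark-1.6.1-bin-hadoop1]# hadoop dfsadmin -report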
3. Upload the input file to HDFS:
[root@localhost hadoop-1.2.1]# hadoop fs -copyFromLocal README.txt input
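To confirm the upload (a check that is not in the original run), list the target directory:

[root@localhost hadoop-1.2.1]# hadoop fs -ls input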
4. Run WordCount without installing Scala, in a plain Java environment. This works because spark-shell bundles its own Scala runtime, so only a JDK is needed on the machine.
(1) Launch bin/spark-shell:
[root@localhost bin]# spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.security.Groups).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/


Using Scala version 2.10.5 (Java HotSpot(TM) Client VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
16/04/29 08:03:24 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.86.135 instead (on interface eth0)
16/04/29 08:03:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Spark context available as sc.
16/04/29 08:03:57 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/04/29 08:04:06 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/04/29 08:04:28 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/04/29 08:04:30 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/04/29 08:05:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/29 08:05:23 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/04/29 08:05:26 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.


scala> 
(2) Create an RDD from the input file on HDFS (the host and port must match fs.default.name in core-site.xml):
scala> val file=sc.textFile("hdfs://192.168.86.135:9000/user/root/input/README.txt")
16/04/29 08:07:31 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
file: org.apache.spark.rdd.RDD[String] = hdfs://192.168.86.135:9000/user/root/input/README.txt MapPartitionsRDD[1] at textFile at <console>:27
(3) Define the computation: split each line into words, pair each word with 1, and reduce by key to sum the counts:
scala> val count=file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
16/04/29 08:12:31 WARN LoadSnappy: Snappy native library not loaded
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:29
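To see what each step contributes (the two-line input here is an assumed toy example, not from the original run): flatMap splits every line into words, map pairs each word with 1, and reduceByKey sums the 1s per word:

scala> val demo = sc.parallelize(Seq("a b", "b"))
scala> demo.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).collect()

which returns (a,1) and (b,2), in no guaranteed order.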
(4) Collect and print the result:
scala> count.collect()
res0: Array[(String, Int)] = Array((Hadoop,1), (Commodity,1), (For,1), (this,3), (country,1), (under,1), (it,1), (The,4), (Jetty,1), (Software,2), (Technology,1), (<http://www.wassenaar.org/>,1), (have,1), (http://wiki.apache.org/hadoop/,1), (BIS,1), (classified,1), (This,1), (following,1), (which,2), (security,1), (See,1), (encryption,3), (Number,1), (export,1), (reside,1), (for,3), ((BIS),,1), (any,1), (at:,2), (software,2), (makes,1), (algorithms.,1), (re-export,2), (latest,1), (your,1), (SSL,1), (the,8), (Administration,1), (includes,2), (import,,2), (provides,1), (Unrestricted,1), (country's,1), (if,1), (740.13),1), (Commerce,,1), (country,,1), (software.,2), (concerning,1), (laws,,1), (source,1), (possession,,2), (Apache,1), (our,2), (written,1), (as,1), (License,1), (regulations,...
scala> 
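To keep the result rather than just printing it, the standard RDD API can write it back to HDFS; the output path below is an assumption, and it must not already exist or the job fails:

scala> count.saveAsTextFile("hdfs://192.168.86.135:9000/user/root/output")

Each partition then becomes one part-NNNNN file under the output directory.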