Spark on HDFS: a simple WordCount over a file stored in HDFS

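This post uses the Spark shell to show how to read a file from Hadoop HDFS and run a WordCount over it. First, go to Spark's bin directory and start the Spark shell. Next, load the file from HDFS, then use flatMap, map, and reduceByKey to split the lines into words and count them, and finally collect the result. Compared with an equivalent MapReduce job, the Spark code is considerably shorter and typically runs faster.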
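To make the three transformations concrete before involving a cluster, here is a minimal sketch that applies the same steps to an in-memory list using plain Scala collections. The object name `LocalWordCount`, the function `wordCount`, and the sample lines are all made up for illustration. Scala collections have no `reduceByKey`, so `groupBy` followed by a sum stands in for it here.

```scala
object LocalWordCount {
  // Count words in in-memory lines, mirroring the flatMap -> map -> reduceByKey steps.
  def wordCount(lines: Seq[String]): Map[String, Int] = {
    // flatMap: split every line on single spaces and flatten into one list of words
    val words = lines.flatMap(line => line.split(" "))
    // map: pair each word with an initial count of 1
    val pairs = words.map(word => (word, 1))
    // stand-in for reduceByKey(_ + _): group the pairs by word and sum their counts
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, not taken from the original post
    val sample = Seq("hello spark", "hello hdfs")
    wordCount(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

On a Spark RDD the same three steps run in parallel across partitions, and `reduceByKey` combines counts within each partition before shuffling them.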

Go to the spark/bin directory and run spark-shell to enter the interactive Spark shell:

[hadoop@localhost bin]$ spark-shell
14/05/21 14:04:59 INFO spark.HttpServer: Starting HTTP Server
14/05/21 14:04:59 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/05/21 14:04:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:40562
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.9.1
      /_/

Using Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
Type in expressions to have them evaluated.
Type :help for more information.
14/05/21 14:05:05 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/05/21 14:05:05 INFO Remoting: Starting remoting
14/05/21 14:05:05 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@localhost:57902]
14/05/21 14:05:05 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@localhost:57902]
14/05/21 14:05:05 INFO spark.SparkEnv: Registering BlockManagerMaster
14/05/21 14:05:05 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140521140505-3f53
14/05/21 14:05:05 INFO storage.MemoryStore: MemoryStore started with capacity 294.9 MB.
14/05/21 14:05:06 INFO network.ConnectionManager: Bound socket to port 48084 with id = ConnectionManagerId(localhost,48084)
14/05/21 14:05:06 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/05/21 14:05:06 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager localhost:48084 with 294.9 MB RAM
14/05/21 14:05:06 INFO storage.BlockManagerMaster: Registered BlockManager
14/05/21 14:05:06 INFO spark.HttpServer: Starting HTTP Server
14/05/21 14:05:06 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/05/21 14:05:06 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:48508
14/05/21 14:05:06 INFO broadcast.HttpBroadcast: Broadcast server started at http://localhost:48508
14/05/21 14:05:06 INFO spark.SparkEnv: Registering MapOutputTracker
14/05/21 14:05:06 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-862d3bfc-8485-4adb-86a3-efd09c90fc03
14/05/21 14:05:06 INFO s
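
Once the shell is up, the remaining steps from the summary (load the file, flatMap, map, reduceByKey, collect) might look like the sketch below. It is not the author's exact session. The HDFS URI is a placeholder, and the host, port, and file path must be replaced with the values for your own cluster. `sc` is the SparkContext that spark-shell creates for you.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is already defined.
// The HDFS URI is a placeholder (assumption); substitute your NameNode host, port, and file path.
val lines = sc.textFile("hdfs://localhost:9000/user/hadoop/input.txt")

// Split each line on single spaces and flatten the result into one RDD of words
val words = lines.flatMap(line => line.split(" "))

// Pair each word with 1, then sum the 1s for each distinct word
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

// Transformations are lazy; collect() triggers the job and returns the pairs to the driver
counts.collect().foreach(println)
```

Because `collect()` pulls every result into the driver's memory, it suits small outputs like this one. For a large vocabulary, writing the result back to HDFS with `counts.saveAsTextFile(...)` is the safer choice.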
