The Spark shell is Spark's built-in tool for rapid prototyping; it lives in the bin directory under the Spark installation directory.
1. Entering the Spark shell:
[hadoop@localhost bin]$ MASTER=spark://localhost:7077 ./spark-shell
or simply run
[hadoop@localhost bin]$ ./spark-shell
14/05/23 15:14:00 INFO spark.HttpServer: Starting HTTP Server
14/05/23 15:14:00 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/05/23 15:14:00 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:53024
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.9.1
      /_/
Using Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_55)
Type in expressions to have them evaluated.
Type :help for more information.
14/05/23 15:14:06 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/05/23 15:14:08 INFO client.AppClient$ClientActor: Executor updated: app-20140523151407-0000/0 is now RUNNING
Created spark context..
Spark context available as sc.
scala> 14/05/23 15:14:09 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@localhost:52462/user/Executor#2047458293] with ID 0
14/05/23 15:14:10 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager localhost:51467 with 294.9 MB RAM
scala>
Seeing the scala> prompt means you are in. From here you can program interactively, much like the Python REPL.
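For a quick sanity check, you can evaluate a couple of expressions right at the prompt. This is just an illustrative sketch (the RDD and numbers are not part of the original walkthrough):

scala> val nums = sc.parallelize(1 to 100)
scala> nums.reduce(_ + _)
res0: Int = 5050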
2. Loading a simple text file
When we connect the Spark shell to an existing cluster, Spark launches an application with appid=app-20140523151407-0000 and name=Spark shell; you can see it in Spark's built-in web UI (default port 8080), as shown in the figure below:
This confirms that we have successfully connected to the Spark cluster.
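You can also verify the connection from inside the shell itself, since the SparkContext exposes the master URL it was created with. A minimal check, assuming the shell was started with the MASTER variable shown above (the exact value echoed back depends on your setup):

scala> sc.master
res0: String = spark://localhost:7077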
Now we can download datasets and run all kinds of experiments.
Here you can download the dataset that accompanies The Elements of Statistical Learning:
wget http://www-stat.
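Once the data file is on disk, the loading announced in step 2 is a one-liner against sc. A minimal sketch, assuming the downloaded file was saved as spam.data in the current directory (the filename is only an assumption here):

scala> val lines = sc.textFile("spam.data")
scala> lines.count()

count() returns the number of lines in the file, which is a quick way to confirm that the load worked.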