Spark's Interactive Tool: Spark Shell

  No long preamble; straight to the good stuff!

REPL

  Read-Eval-Print Loop: an interactive shell that reads each expression you type, evaluates it, and prints the result; in other words, a way to program interactively.

   So what does a REPL look like? The plain Python interpreter below is a familiar example:

[spark@master ~]$ python
Python 2.6.6 (r266:84292, Nov 22 2013, 12:16:22)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 1+2
3
>>> exit()
[spark@master ~]$ 

Spark REPL

    (Scala)

$SPARK_HOME/bin/spark-shell     

[spark@master ~]$ $SPARK_HOME/bin/spark-shell
17/04/09 16:09:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/09 16:09:31 INFO spark.SecurityManager: Changing view acls to: spark
17/04/09 16:09:31 INFO spark.SecurityManager: Changing modify acls to: spark
17/04/09 16:09:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
17/04/09 16:09:33 INFO spark.HttpServer: Starting HTTP Server
17/04/09 16:09:33 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/04/09 16:09:33 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:44073
17/04/09 16:09:33 INFO util.Utils: Successfully started service 'HTTP class server' on port 44073.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
17/04/09 16:09:51 INFO spark.SparkContext: Running Spark version 1.6.1
17/04/09 16:09:52 INFO spark.SecurityManager: Changing view acls to: spark
17/04/09 16:09:52 INFO spark.SecurityManager: Changing modify acls to: spark
17/04/09 16:09:52 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
17/04/09 16:09:54 INFO util.Utils: Successfully started service 'sparkDriver' on port 45062.
17/04/09 16:09:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/04/09 16:09:56 INFO Remoting: Starting remoting
17/04/09 16:09:56 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.80.10:59006]
17/04/09 16:09:56 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 59006.
17/04/09 16:09:57 INFO spark.SparkEnv: Registering MapOutputTracker
17/04/09 16:09:57 INFO spark.SparkEnv: Registering BlockManagerMaster
17/04/09 16:09:57 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-2039d1e2-879e-49cf-8025-51ab363d931c
17/04/09 16:09:57 INFO storage.MemoryStore: MemoryStore started with capacity 517.4 MB
17/04/09 16:09:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/04/09 16:10:02 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/04/09 16:10:02 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
17/04/09 16:10:02 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/04/09 16:10:02 INFO ui.SparkUI: Started SparkUI at http://192.168.80.10:4040
17/04/09 16:10:04 INFO executor.Executor: Starting executor ID driver on host localhost
17/04/09 16:10:04 INFO executor.Executor: Using REPL class URI: http://192.168.80.10:44073
17/04/09 16:10:05 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56077.
17/04/09 16:10:05 INFO netty.NettyBlockTransferService: Server created on 56077
17/04/09 16:10:05 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/04/09 16:10:05 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:56077 with 517.4 MB RAM, BlockManagerId(driver, localhost, 56077)
17/04/09 16:10:05 INFO storage.BlockManagerMaster: Registered BlockManager
17/04/09 16:10:08 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
17/04/09 16:10:15 INFO hive.HiveContext: Initializing execution hive, version 1.2.1
17/04/09 16:10:17 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
17/04/09 16:10:17 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
17/04/09 16:10:23 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/04/09 16:10:23 INFO metastore.ObjectStore: ObjectStore, initialize called
17/04/09 16:10:25 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/04/09 16:10:25 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/04/09 16:10:25 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
17/04/09 16:10:28 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
17/04/09 16:10:32 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/04/09 16:10:35 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/04/09 16:10:35 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
scala>

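Once the scala> prompt appears, sc is ready to use. Below is a minimal sketch of a first interaction; the echoed types are what Spark 1.6 prints, though the exact RDD id and console line numbers will differ:

scala> val rdd = sc.parallelize(1 to 100)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:27

scala> rdd.sum()
res0: Double = 5050.0
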
   (Python)

 $SPARK_HOME/bin/pyspark  

[spark@master ~]$ $SPARK_HOME/bin/pyspark
Python 2.6.6 (r266:84292, Nov 22 2013, 12:16:22) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
17/04/09 16:12:43 INFO spark.SparkContext: Running Spark version 1.6.1
17/04/09 16:12:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/09 16:12:47 INFO spark.SecurityManager: Changing view acls to: spark
17/04/09 16:12:47 INFO spark.SecurityManager: Changing modify acls to: spark
17/04/09 16:12:47 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
17/04/09 16:12:50 INFO util.Utils: Successfully started service 'sparkDriver' on port 43980.
17/04/09 16:12:52 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/04/09 16:12:52 INFO Remoting: Starting remoting
17/04/09 16:12:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.80.10:43789]
17/04/09 16:12:52 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 43789.
17/04/09 16:12:52 INFO spark.SparkEnv: Registering MapOutputTracker
17/04/09 16:12:53 INFO spark.SparkEnv: Registering BlockManagerMaster
17/04/09 16:12:53 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-d4013f75-a504-433d-9775-fa821826c411
17/04/09 16:12:53 INFO storage.MemoryStore: MemoryStore started with capacity 517.4 MB
17/04/09 16:12:53 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/04/09 16:12:54 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/04/09 16:12:54 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
17/04/09 16:12:54 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/04/09 16:12:54 INFO ui.SparkUI: Started SparkUI at http://192.168.80.10:4040
17/04/09 16:12:55 INFO executor.Executor: Starting executor ID driver on host localhost
17/04/09 16:12:55 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 59659.
17/04/09 16:12:55 INFO netty.NettyBlockTransferService: Server created on 59659
17/04/09 16:12:55 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/04/09 16:12:55 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:59659 with 517.4 MB RAM, BlockManagerId(driver, localhost, 59659)
17/04/09 16:12:55 INFO storage.BlockManagerMaster: Registered BlockManager
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Python version 2.6.6 (r266:84292, Nov 22 2013 12:16:22)
SparkContext available as sc, HiveContext available as sqlContext.

>>> exit()
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
17/04/09 16:13:53 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
17/04/09 16:13:53 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.80.10:4040
17/04/09 16:13:53 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/04/09 16:13:53 INFO storage.MemoryStore: MemoryStore cleared
17/04/09 16:13:53 INFO storage.BlockManager: BlockManager stopped
17/04/09 16:13:53 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/04/09 16:13:53 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/04/09 16:13:53 INFO spark.SparkContext: Successfully stopped SparkContext
17/04/09 16:13:53 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/04/09 16:13:53 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/04/09 16:13:53 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
[spark@master ~]$ 17/04/09 16:13:54 INFO util.ShutdownHookManager: Shutdown hook called
17/04/09 16:13:54 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9658547e-ca95-4439-bcad-be792378f151
17/04/09 16:13:54 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9658547e-ca95-4439-bcad-be792378f151/pyspark-21fbe135-979c-4219-b3ec-9c00a95f8957
^C
[spark@master ~]$

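Both shells are fairly drowned in INFO logging by default. A common way to quiet them (a sketch, assuming the stock log4j template that ships with Spark 1.6) is to copy conf/log4j.properties.template and lower the root level to WARN:

[spark@master ~]$ cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties
[spark@master ~]$ sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' $SPARK_HOME/conf/log4j.properties
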
  (Others)

       Not much commentary needed here; the bin directory lists every launcher Spark ships with:

[spark@master bin]$ pwd
/usr/local/spark/spark-1.6.1-bin-hadoop2.6/bin
[spark@master bin]$ ll
total 92
-rwxr-xr-x. 1 spark spark 1099 Feb 27 2016 beeline
-rw-r--r--. 1 spark spark  932 Feb 27 2016 beeline.cmd
-rw-r--r--. 1 spark spark 1910 Feb 27 2016 load-spark-env.cmd
-rw-r--r--. 1 spark spark 2143 Feb 27 2016 load-spark-env.sh
-rwxr-xr-x. 1 spark spark 3459 Feb 27 2016 pyspark
-rw-r--r--. 1 spark spark 1486 Feb 27 2016 pyspark2.cmd
-rw-r--r--. 1 spark spark 1000 Feb 27 2016 pyspark.cmd
-rwxr-xr-x. 1 spark spark 2384 Feb 27 2016 run-example
-rw-r--r--. 1 spark spark 2682 Feb 27 2016 run-example2.cmd
-rw-r--r--. 1 spark spark 1012 Feb 27 2016 run-example.cmd
-rwxr-xr-x. 1 spark spark 2858 Feb 27 2016 spark-class
-rw-r--r--. 1 spark spark 2365 Feb 27 2016 spark-class2.cmd
-rw-r--r--. 1 spark spark 1010 Feb 27 2016 spark-class.cmd
-rwxr-xr-x. 1 spark spark 1049 Feb 27 2016 sparkR
-rw-r--r--. 1 spark spark 1010 Feb 27 2016 sparkR2.cmd
-rw-r--r--. 1 spark spark  998 Feb 27 2016 sparkR.cmd
-rwxr-xr-x. 1 spark spark 3026 Feb 27 2016 spark-shell
-rw-r--r--. 1 spark spark 1528 Feb 27 2016 spark-shell2.cmd
-rw-r--r--. 1 spark spark 1008 Feb 27 2016 spark-shell.cmd
-rwxr-xr-x. 1 spark spark 1075 Feb 27 2016 spark-sql
-rwxr-xr-x. 1 spark spark 1050 Feb 27 2016 spark-submit
-rw-r--r--. 1 spark spark 1126 Feb 27 2016 spark-submit2.cmd
-rw-r--r--. 1 spark spark 1010 Feb 27 2016 spark-submit.cmd
[spark@master bin]$ 

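Besides spark-shell and pyspark, the run-example launcher gives a quick smoke test of the installation. For example, the bundled SparkPi job (the trailing 10 is the number of slices; output omitted):

[spark@master bin]$ ./run-example SparkPi 10
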
Spark Shell

  spark-shell is Spark's Scala REPL; it supports interactive Spark programming in the Scala language.

  It supports Spark's local mode:

[spark@master ~]$ $SPARK_HOME/bin/spark-shell
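
local mode can also pin the parallelism explicitly: local[N] runs with N worker threads and local[*] uses all available cores. For example:

[spark@master ~]$ $SPARK_HOME/bin/spark-shell --master local[4]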

  YARN cluster mode, however, is not supported: an interactive shell needs its driver running on the machine you are typing at, so the following command fails with "Error: Cluster deploy mode is not applicable to Spark shells."

[spark@master ~]$ $SPARK_HOME/bin/spark-shell --master yarn-cluster

  It supports YARN client mode (note: both of the following forms run in client mode!):

[spark@master ~]$ $SPARK_HOME/bin/spark-shell --master yarn

or

[spark@master ~]$ $SPARK_HOME/bin/spark-shell --master yarn-client
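
On YARN you will usually also want to size the executors. A sketch with the common resource flags (the values are illustrative only, not recommendations):

[spark@master ~]$ $SPARK_HOME/bin/spark-shell --master yarn --num-executors 2 --executor-memory 1g --executor-cores 1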

  A SparkContext object is instantiated for you at startup and is available as sc.

  A SQLContext object is also instantiated, available as sqlContext (as the pyspark banner above shows, it is a HiveContext when Spark is built with Hive support).

  To browse the available API, type sc. and press Tab for completion (sc.<Tab>).
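
A minimal sketch exercising both built-in objects (the echoed output is approximate; entry order in the printed Map may vary):

scala> sc.parallelize(Seq("a", "b", "a")).countByValue()
res0: scala.collection.Map[String,Long] = Map(a -> 2, b -> 1)

scala> sqlContext.range(3).count()
res1: Long = 3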
