A Simple spark-shell Usage Example

Original post, 2016-08-29 13:25:50

Start the shell by running spark-shell (from SPARK_HOME/bin). The startup output below is from Spark 1.6.0; the warnings about log4j, duplicate DataNucleus plugins, BoneCP, and the Hive metastore do not stop the shell from coming up, since both sc and sqlContext are reported as available at the end.

spark-shell 
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/


Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73)
Type in expressions to have them evaluated.
Type :help for more information.
18/01/31 23:08:35 WARN Utils: Your hostname, linux resolves to a loopback address: 127.0.0.2; using 192.168.73.128 instead (on interface eth0)
18/01/31 23:08:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/01/31 23:08:53 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context available as sc.
18/01/31 23:09:08 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar."
18/01/31 23:09:08 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar."
18/01/31 23:09:08 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
18/01/31 23:09:08 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:09 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:22 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/01/31 23:09:22 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/01/31 23:09:22 WARN : Your hostname, linux resolves to a loopback/non-reachable address: 192.168.73.128, but we couldn't find any external IP address!
18/01/31 23:09:38 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar."
18/01/31 23:09:38 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar."
18/01/31 23:09:38 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
18/01/31 23:09:38 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:39 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:51 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/01/31 23:09:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.
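Once the prompt appears, the SparkContext (sc) and SQLContext (sqlContext) are ready to use. As the banner above suggests, the log level can be adjusted right away; a minimal first command, not part of the original session:

// Quiet the INFO noise for the rest of the session.
sc.setLogLevel("WARN")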


scala> val textFile = sc.textFile("file:/xnj/sparktest.txt");
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:27
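sc.textFile only records how to read the file and returns an RDD[String]; nothing is actually read until an action such as count or collect runs. As a side note (a sketch, not from the original session), textFile also accepts a minimum partition count:

// Same load as above, but requesting at least 2 partitions.
// The file itself is still only read when an action is invoked.
val textFile = sc.textFile("file:/xnj/sparktest.txt", 2)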


(The multi-column listings below are the REPL's tab-completion candidates, printed when Tab is pressed after a partial name; they are not command output.)

scala> text
textFile   text       


scala> textFile.co
coalesce              collect               compute               context               count                 countApprox           
countApproxDistinct   countByValue          countByValueApprox    


scala> textFile.count();
res0: Long = 10


scala> textFile.filter(line => line.contains("Hello")).count();
res1: Long = 9
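count() on the unfiltered RDD shows the file has 10 lines, and 9 of them contain "Hello". Since textFile is reused by several actions, caching it avoids re-reading the file each time; a small sketch, not part of the original session:

// Keep the lines in memory so the following actions reuse them.
textFile.cache()
textFile.filter(line => line.contains("Hello")).count()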


scala> textFile.f
filter             filterWith         first              flatMap            flatMapWith        fold               foreach            
foreachPartition   foreachWith        


scala> textFile.first();
res2: String = Hello,World!
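first() returns only the first element of the RDD; take(n) generalizes this. A sketch, assuming the same textFile RDD:

// Look at the first three lines instead of just one.
textFile.take(3).foreach(println)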


scala> var size = textFile.map(line=>line.split(" ").size);
size: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[3] at map at <console>:29


scala> size.
++                         aggregate                  asInstanceOf               cache                      
cartesian                  checkpoint                 coalesce                   collect                    
compute                    context                    count                      countApprox                
countApproxDistinct        countByValue               countByValueApprox         dependencies               
distinct                   filter                     filterWith                 first                      
flatMap                    flatMapWith                fold                       foreach                    
foreachPartition           foreachWith                getCheckpointFile          getNumPartitions           
getStorageLevel            glom                       groupBy                    id                         
intersection               isCheckpointed             isEmpty                    isInstanceOf               
iterator                   keyBy                      localCheckpoint            map                        
mapPartitions              mapPartitionsWithContext   mapPartitionsWithIndex     mapPartitionsWithSplit     
mapWith                    max                        min                        name                       
name_=                     partitioner                partitions                 persist                    
pipe                       preferredLocations         randomSplit                reduce                     
repartition                sample                     saveAsObjectFile           saveAsTextFile             
setName                    sortBy                     sparkContext               subtract                   
take                       takeOrdered                takeSample                 toArray                    
toDebugString              toJavaRDD                  toLocalIterator            toString                   
top                        treeAggregate              treeReduce                 union                      
unpersist                  zip                        zipPartitions              zipWithIndex               
zipWithUniqueId            


scala> size.co
coalesce              collect               compute               context               count                 countApprox           
countApproxDistinct   countByValue          countByValueApprox    


scala> size.collect();
res3: Array[Int] = Array(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
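Each element of size is the number of space-separated tokens on one line; every line here yields 1 because none of them contains a space. To collapse the per-line counts into a single number, a reduce works; a sketch following the usual Spark quick-start pattern:

// Largest number of words found on any single line.
val maxWords = size.reduce((a, b) => if (a > b) a else b)
// Total number of words in the whole file.
val totalWords = size.reduce(_ + _)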


scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b);
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[6] at reduceByKey at <console>:29


scala> wordCounts.collect();
res4: Array[(String, Int)] = Array((Hello,World!,9), ("",1))
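The result shows nine lines containing the single token Hello,World! and one empty line (split(" ") on an empty string yields one empty-string token, hence ("",1)). For larger inputs it is usually worth sorting the pairs and persisting the result; a sketch, assuming the same wordCounts RDD and a hypothetical output path:

// Top 5 most frequent words, highest count first.
wordCounts.sortBy(_._2, ascending = false).take(5)
// Write the full result out as text (hypothetical output directory).
wordCounts.saveAsTextFile("file:/xnj/wordcounts_out")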
