A simple spark-shell usage example

Original post, 2016-08-29 13:25:50
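What follows is a transcript of an interactive spark-shell session (Spark 1.6.0, Scala 2.10.5) covering a few basic RDD operations: loading a local text file, counting and filtering lines, and a word count.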
spark-shell 
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/


Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73)
Type in expressions to have them evaluated.
Type :help for more information.
18/01/31 23:08:35 WARN Utils: Your hostname, linux resolves to a loopback address: 127.0.0.2; using 192.168.73.128 instead (on interface eth0)
18/01/31 23:08:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/01/31 23:08:53 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context available as sc.
18/01/31 23:09:08 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar."
18/01/31 23:09:08 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar."
18/01/31 23:09:08 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
18/01/31 23:09:08 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:09 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:22 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/01/31 23:09:22 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/01/31 23:09:22 WARN : Your hostname, linux resolves to a loopback/non-reachable address: 192.168.73.128, but we couldn't find any external IP address!
18/01/31 23:09:38 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar."
18/01/31 23:09:38 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar."
18/01/31 23:09:38 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
18/01/31 23:09:38 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:39 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:51 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/01/31 23:09:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.
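None of the WARN lines above is fatal: log4j falls back to Spark's default REPL profile, DataNucleus finds the same plugin jars registered under two classpath locations, BoneCP is configured but absent from the classpath, and the Hive metastore simply records its schema version. The shell still starts with both sc and sqlContext available.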


scala> val textFile = sc.textFile("file:/xnj/sparktest.txt");
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:27
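(The contents of sparktest.txt never appear in the transcript, but the results below pin them down: 10 lines in total, 9 containing "Hello", and every line splitting into exactly one token. Presumably the file consists of nine "Hello,World!" lines followed by one empty line; this is a reconstruction, not part of the original post.)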


scala> text
textFile   text       
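(Typing a partial name and pressing Tab makes the REPL list matching completions; here it offers textFile and text. The same completion listings appear below for textFile.co, textFile.f, size. and size.co.)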


scala> textFile.co
coalesce              collect               compute               context               count                 countApprox           
countApproxDistinct   countByValue          countByValueApprox    


scala> textFile.count();
res0: Long = 10
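count() is an action: it triggers evaluation of the RDD and returns the number of elements, here the 10 lines of the file.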


scala> textFile.filter(line => line.contains("Hello")).count();
res1: Long = 9
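filter is a lazy transformation; chained with the count() action it shows that 9 of the 10 lines contain "Hello".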


scala> textFile.f
filter             filterWith         first              flatMap            flatMapWith        fold               foreach            
foreachPartition   foreachWith        


scala> textFile.first();
res2: String = Hello,World!
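first() returns the first element of the RDD, i.e. the first line of the file.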


scala> var size = textFile.map(line=>line.split(" ").size);
size: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[3] at map at <console>:29


scala> size.
++                         aggregate                  asInstanceOf               cache                      
cartesian                  checkpoint                 coalesce                   collect                    
compute                    context                    count                      countApprox                
countApproxDistinct        countByValue               countByValueApprox         dependencies               
distinct                   filter                     filterWith                 first                      
flatMap                    flatMapWith                fold                       foreach                    
foreachPartition           foreachWith                getCheckpointFile          getNumPartitions           
getStorageLevel            glom                       groupBy                    id                         
intersection               isCheckpointed             isEmpty                    isInstanceOf               
iterator                   keyBy                      localCheckpoint            map                        
mapPartitions              mapPartitionsWithContext   mapPartitionsWithIndex     mapPartitionsWithSplit     
mapWith                    max                        min                        name                       
name_=                     partitioner                partitions                 persist                    
pipe                       preferredLocations         randomSplit                reduce                     
repartition                sample                     saveAsObjectFile           saveAsTextFile             
setName                    sortBy                     sparkContext               subtract                   
take                       takeOrdered                takeSample                 toArray                    
toDebugString              toJavaRDD                  toLocalIterator            toString                   
top                        treeAggregate              treeReduce                 union                      
unpersist                  zip                        zipPartitions              zipWithIndex               
zipWithUniqueId            


scala> size.co
coalesce              collect               compute               context               count                 countApprox           
countApproxDistinct   countByValue          countByValueApprox    


scala> size.collect();
res3: Array[Int] = Array(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
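Every element is 1 because no line contains a space: line.split(" ") yields a single token per line, and even the empty line yields the one-element array Array("").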


scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b);
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[6] at reduceByKey at <console>:29


scala> wordCounts.collect();
res4: Array[(String, Int)] = Array((Hello,World!,9), ("",1))
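The ("",1) entry comes from the empty line, whose single empty token is counted like any other word. A minimal variant that drops empty tokens and sorts by descending count could look like this (a sketch in the same shell, not part of the original session; output omitted):


scala> // drop the empty token produced by the blank line, then count occurrences
scala> val cleaned = textFile.flatMap(line => line.split(" ")).filter(word => word.nonEmpty).map(word => (word, 1)).reduceByKey(_ + _)


scala> // sort by count, highest first, and bring the result to the driver
scala> cleaned.sortBy(_._2, ascending = false).collect()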
