
A Simple spark-shell Usage Example
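
The transcript below comes from launching bin/spark-shell against a local Spark 1.6.0 (Hadoop 2.6) installation. The startup warnings are typical of a default install and are harmless here: the log4j notices mean no custom log4j.properties was found, the DataNucleus messages indicate the same jars are visible on the classpath under two paths (apparently the same directory reachable both directly and via /home/oracle/app), the BoneCP warning only says that connection-pool library is absent, and the SparkUI falls back to port 4041 because 4040 is already in use.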

log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/


Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_73)
Type in expressions to have them evaluated.
Type :help for more information.
18/01/31 23:08:35 WARN Utils: Your hostname, linux resolves to a loopback address: 127.0.0.2; using 192.168.73.128 instead (on interface eth0)
18/01/31 23:08:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/01/31 23:08:53 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context available as sc.
18/01/31 23:09:08 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar."
18/01/31 23:09:08 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar."
18/01/31 23:09:08 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
18/01/31 23:09:08 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:09 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:22 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/01/31 23:09:22 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
18/01/31 23:09:22 WARN : Your hostname, linux resolves to a loopback/non-reachable address: 192.168.73.128, but we couldn't find any external IP address!
18/01/31 23:09:38 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar."
18/01/31 23:09:38 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar."
18/01/31 23:09:38 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/oracle/app/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/xnj/spark-1.6.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar."
18/01/31 23:09:38 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:39 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
18/01/31 23:09:51 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
18/01/31 23:09:51 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.
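
Once the shell is up, a SparkContext (sc) and a SQLContext (sqlContext) have already been created for us. As the banner above suggests, log verbosity can be changed from inside the REPL; for example, to hide the INFO chatter:

scala> sc.setLogLevel("WARN")   // only WARN and above from here on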


scala> val textFile = sc.textFile("file:/xnj/sparktest.txt");
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:27
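
sc.textFile only records how to read the file; nothing is actually read until an action runs. The optional second argument is a hint for the minimum number of partitions, e.g.:

scala> val textFile = sc.textFile("file:/xnj/sparktest.txt", 4)   // ask for at least 4 partitions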


scala> text
textFile   text       


scala> textFile.co
coalesce              collect               compute               context               count                 countApprox           
countApproxDistinct   countByValue          countByValueApprox    
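
Pressing Tab after a partial name, as in the two listings above, makes the REPL print the possible completions; it is a convenient way to explore the RDD API.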


scala> textFile.count();
res0: Long = 10


scala> textFile.filter(line => line.contains("Hello")).count();
res1: Long = 9
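
filter is a lazy transformation, and count is the action that forces it to run. Keeping the two steps separate makes the laziness easier to see:

scala> val helloLines = textFile.filter(line => line.contains("Hello"))   // nothing computed yet
scala> helloLines.count()   // triggers the job: 9 of the 10 lines contain "Hello"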


scala> textFile.f
filter             filterWith         first              flatMap            flatMapWith        fold               foreach            
foreachPartition   foreachWith        


scala> textFile.first();
res2: String = Hello,World!
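
first() returns just the first element; the related action take(n) returns the first n lines as a local array:

scala> textFile.take(3)   // Array with the first three lines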


scala> var size = textFile.map(line=>line.split(" ").size);
size: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[3] at map at <console>:29
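
Each line is split on spaces and mapped to its token count. Following the standard Spark quick-start pattern, the same RDD can be reduced to find the longest line in words:

scala> textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)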


scala> size.
++                         aggregate                  asInstanceOf               cache                      
cartesian                  checkpoint                 coalesce                   collect                    
compute                    context                    count                      countApprox                
countApproxDistinct        countByValue               countByValueApprox         dependencies               
distinct                   filter                     filterWith                 first                      
flatMap                    flatMapWith                fold                       foreach                    
foreachPartition           foreachWith                getCheckpointFile          getNumPartitions           
getStorageLevel            glom                       groupBy                    id                         
intersection               isCheckpointed             isEmpty                    isInstanceOf               
iterator                   keyBy                      localCheckpoint            map                        
mapPartitions              mapPartitionsWithContext   mapPartitionsWithIndex     mapPartitionsWithSplit     
mapWith                    max                        min                        name                       
name_=                     partitioner                partitions                 persist                    
pipe                       preferredLocations         randomSplit                reduce                     
repartition                sample                     saveAsObjectFile           saveAsTextFile             
setName                    sortBy                     sparkContext               subtract                   
take                       takeOrdered                takeSample                 toArray                    
toDebugString              toJavaRDD                  toLocalIterator            toString                   
top                        treeAggregate              treeReduce                 union                      
unpersist                  zip                        zipPartitions              zipWithIndex               
zipWithUniqueId            


scala> size.co
coalesce              collect               compute               context               count                 countApprox           
countApproxDistinct   countByValue          countByValueApprox    


scala> size.collect();
res3: Array[Int] = Array(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
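
collect() ships the entire RDD back to the driver, which is fine for ten elements but dangerous for big data; take(n) is the safer probe. Every element here is 1 because no line contains a space: split(" ") yields a single token per line, and even the empty line splits to a one-element array holding "".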


scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b);
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[6] at reduceByKey at <console>:29
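
This is the classic word count: flatMap breaks each line into words, map pairs every word with a 1, and reduceByKey sums the 1s per distinct word. reduceByKey is itself lazy, so nothing runs until the collect below. To see the most frequent words first, the pairs could also be sorted by count, for instance:

scala> wordCounts.sortBy(_._2, ascending = false).take(5)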


scala> wordCounts.collect();
res4: Array[(String, Int)] = Array((Hello,World!,9), ("",1))
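
The result also tells us what sparktest.txt must contain: nine identical lines of "Hello,World!" (which has no spaces, so each whole line counts as one "word") and one empty line, whose split produces the empty-string key counted once.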
