Building a Spark project with sbt (Scala + Spark + sbt)

Preparation. The file structure is as follows:

(python2.7) appleyuchi@ubuntu:~/Desktop/WordCount$ tree
.
├── build.sbt
└── src
    └── main
        └── scala
            └── WordCount.scala

WordCount.scala contains:

import org.apache.spark._
import org.apache.spark.SparkContext._

object WordCount {
    def main(args: Array[String]): Unit = {
      val inputFile = args(0)
      val outputFile = args(1)
      val conf = new SparkConf().setAppName("wordCount")
      // Create a Scala Spark context.
      val sc = new SparkContext(conf)
      // Load our input data.
      val input = sc.textFile(inputFile)
      // Split each line into words.
      val words = input.flatMap(line => line.split(" "))
      // Pair each word with a count of 1, then sum the counts per word.
      val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
      // Save the word counts back out to a text file, forcing evaluation.
      counts.saveAsTextFile(outputFile)
      // Shut the context down cleanly.
      sc.stop()
    }
}
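
For a quick sanity check without building a jar, the same logic can be pasted into spark-shell, where sc is already created. This is just a sketch; the HDFS path assumes the file uploaded in step 2 below:

// Paste into spark-shell; `sc` already exists there.
val input = sc.textFile("hdfs://localhost:9000/user/appleyuchi/README.txt")
val counts = input.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.take(10).foreach(println)   // print ten (word, count) pairs without writing to HDFS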

build.sbt contains:

name := "learning-spark-mini-example"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"
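
A note on the dependency line: the %% operator makes sbt append the Scala binary version to the artifact name, so it resolves to spark-core_2.11, matching the scalaVersion above; "provided" means Spark is needed for compilation but is not packaged, since spark-submit supplies it at runtime. If you also want to pin the sbt version (optional, not part of the original setup), a minimal project/build.properties would be:

sbt.version=0.13.16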

 

Note: the HDFS filesystem is not visible from the ordinary Linux filesystem. A directory created with hdfs dfs -mkdir will not show up under ls, and both the input and the output of this experiment live in HDFS, not on the local Linux disk.
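
To make the difference concrete (using the paths from the steps below):

ls /user/appleyuchi              # fails: this directory exists only inside HDFS
hdfs dfs -ls /user/appleyuchi    # works once step 2 has run: lists README.txt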


The detailed steps follow.

1.
Before running this example, HDFS must be started; otherwise spark-submit fails with "connection refused".
The command is:
./start-dfs.sh
Then run jps and check that both the NameNode and the DataNode are up.
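
If the connection is still refused, it helps to confirm that HDFS is actually listening on the URI used in step 4 (hdfs://localhost:9000); hdfs getconf reads the effective value from the configuration:

hdfs getconf -confKey fs.defaultFS    # should print hdfs://localhost:9000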
2. Copy the file README.txt from the Linux filesystem into HDFS (if /user does not exist yet, use hdfs dfs -mkdir -p):

hdfs dfs -mkdir /user/appleyuchi
hdfs dfs -put README.txt /user/appleyuchi
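
To double-check the upload before running the job (optional):

hdfs dfs -cat /user/appleyuchi/README.txt | head    # shows the first lines of the HDFS copy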

3. Run sbt package in the project root; it resolves the dependencies declared in build.sbt and produces target/scala-2.11/learning-spark-mini-example_2.11-1.0.jar.

4.
/home/appleyuchi/bigdata/spark-2.3.1-bin-hadoop2.7/bin/spark-submit --class "WordCount" --master local /home/appleyuchi/Desktop/WordCount/target/scala-2.11/learning-spark-mini-example_2.11-1.0.jar hdfs://localhost:9000/user/appleyuchi/README.txt ./wordcounts

A note here:
hdfs://localhost:9000/user/appleyuchi/README.txt
is not a path on the Linux filesystem; it is where the file landed inside HDFS after the upload in step 2, so while Spark processes it we cannot see it directly from Linux. The output argument ./wordcounts is likewise resolved against HDFS: relative paths go under the HDFS home directory, so the results end up in /user/appleyuchi/wordcounts, which is where step 5 fetches them from.

5.
Copy the results from HDFS back to the Linux filesystem:
hadoop fs -get /user/appleyuchi/wordcounts ~/wordcounts

cd ~/wordcounts
cat part-00000
(Hadoop,1)
(Commodity,1)
(For,1)
(this,3)
(country,1)
(under,1)
(it,1)
(The,4)
(Jetty,1)
(Software,2)
(Technology,1)
(<http://www.wassenaar.org/>,1)
(have,1)
(http://wiki.apache.org/hadoop/,1)
(BIS,1)
(classified,1)
(This,1)
(following,1)
(which,2)
(security,1)
(See,1)
(encryption,3)
(Number,1)
(export,1)
(reside,1)
(for,3)
((BIS),,1)
(any,1)
(at:,2)
(software,2)
(makes,1)
(algorithms.,1)
(re-export,2)
(latest,1)
(your,1)
(SSL,1)
(the,8)
(Administration,1)
(includes,2)
(import,,2)
(provides,1)
(Unrestricted,1)
(country's,1)
(if,1)
(740.13),1)
(Commerce,,1)
(country,,1)
(software.,2)
(concerning,1)
(laws,,1)
(source,1)
(possession,,2)
(Apache,1)
(our,2)
(written,1)
(as,1)
(License,1)
(regulations,1)
(libraries,1)
(by,1)
(please,2)
(form,1)
(BEFORE,1)
(ENC,1)
(code.,1)
(both,1)
(5D002.C.1,,1)
(distribution,2)
(visit,1)
(is,1)
(about,1)
(website,1)
(currently,1)
(permitted.,1)
(check,1)
(Security,1)
(Section,1)
(on,2)
(performing,1)
((see,1)
(U.S.,1)
(with,1)
(in,1)
((ECCN),1)
(object,1)
(using,2)
(cryptographic,3)
(mortbay.org.,1)
(and/or,1)
(Department,1)
(manner,1)
(from,1)
(Core,1)
(has,1)
(may,1)
(Exception,1)
(Industry,1)
(restrictions,1)
(details,1)
(http://hadoop.apache.org/core/,1)
(project,1)
(you,1)
(another,1)
(or,2)
(use,,2)
(policies,1)
(uses,1)
(information,2)
(Hadoop,,1)
(to,2)
(code,1)
(software,,2)
(Regulations,,1)
(more,2)
(software:,1)
(see,1)
(,18)
(of,5)
(wiki,,1)
(Bureau,1)
(Control,1)
(exception,1)
(Government,1)
(eligible,1)
(Export,2)
(information.,1)
(Foundation,1)
(functions,1)
(and,6)
(included,1)
((TSU),1)
(asymmetric,1)

You can also inspect the results directly inside HDFS:

hdfs dfs -cat /user/appleyuchi/wordcounts/*
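
The output is a directory, and depending on how many partitions the job used it may hold several part-NNNNN files; the standard getmerge command concatenates them into one local file:

hadoop fs -getmerge /user/appleyuchi/wordcounts ~/wordcounts.txt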

-------------------------------------------------------------
Other commands used:
hdfs dfs -rm -r input    # delete the input directory under the HDFS home directory (hdfs dfs -rmr is the deprecated spelling)
hdfs dfs -ls             # list the HDFS home directory, /user/<username>
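
Two more commands that come in handy when re-running the experiment (not in the original list):

hdfs dfs -rm -r wordcounts           # clear the old output first: saveAsTextFile refuses to overwrite an existing directory
hdfs dfs -du -h /user/appleyuchi     # sizes of everything under the HDFS home directory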


To summarize:
Start HDFS first, upload the input file README.txt into HDFS, let sbt resolve the dependencies and package the code, then submit the job.
The results initially land in HDFS; to inspect them, copy them from HDFS back to the Linux filesystem.
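
Condensed into one run (assuming spark-submit is on the PATH and the commands are issued from the project root; otherwise use the full paths from step 4):

./start-dfs.sh
hdfs dfs -mkdir -p /user/appleyuchi
hdfs dfs -put README.txt /user/appleyuchi
sbt package
spark-submit --class "WordCount" --master local target/scala-2.11/learning-spark-mini-example_2.11-1.0.jar hdfs://localhost:9000/user/appleyuchi/README.txt ./wordcounts
hdfs dfs -cat /user/appleyuchi/wordcounts/*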


References:
https://blog.csdn.net/coder__cs/article/details/78992764
