Setting up a local Spark development environment (Maven + Scala + Java)

Development tools and software versions

IDEA      2019.2
Java      1.8
Scala     2.11.12 (bundled inside the Spark distribution; the project SDK installed below is 2.12.8)
Spark     2.4.3
Hadoop    2.7.7
Windows   Windows 10 Professional, 64-bit
CentOS    7.5

 

Deploying Spark and Hadoop in local mode

1) Download Spark and Hadoop

For Spark, choose the pre-built package, i.e. an already-compiled distribution:

http://mirror.bit.edu.cn/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz

For Hadoop, be sure to download the version that matches the Spark build:

http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz

 

2) Download Scala

Spark 2.4.3 supports Scala 2.12, so we download version 2.12.8:

https://downloads.lightbend.com/scala/2.12.8/scala-2.12.8.msi

Double-click the installer, choose D:\software as the installation directory, and keep clicking Next until the installation completes.

 

3) Run WinRAR as administrator and extract
hadoop-2.7.7.tar.gz and spark-2.4.3-bin-hadoop2.7.tgz into E:\bigdata\

This produces the following two directories:

E:\bigdata\hadoop-2.7.7

E:\bigdata\spark-2.4.3-bin-hadoop2.7

 

 

4) Configure environment variables

Create the SPARK_HOME, HADOOP_HOME, and SCALA_HOME environment variables:

SPARK_HOME  = E:\bigdata\spark-2.4.3-bin-hadoop2.7
HADOOP_HOME = E:\bigdata\hadoop-2.7.7
SCALA_HOME  = D:\software\scala

 

5) Configure the PATH variable

Add the Scala, Hadoop, and Spark bin directories as new entries in PATH:

%SCALA_HOME%\bin
%SCALA_HOME%\jre\bin
%HADOOP_HOME%\bin
%SPARK_HOME%\bin

 

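
A quick way to confirm that a newly opened terminal actually sees these variables (this check is an addition of mine, not part of the original walkthrough) is to read them from a Scala REPL, for example the scala command just installed:

// Paste at the scala> prompt; prints each variable, or "<not set>" if it is missing
Seq("SPARK_HOME", "HADOOP_HOME", "SCALA_HOME").foreach { name =>
  println(s"$name = ${sys.env.getOrElse(name, "<not set>")}")
}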

 

 

 

 

 

Testing that Spark local mode works

Run the spark-shell command; if you see output like the following, the setup succeeded:

 

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://...
Spark context available as 'sc' (master = local[*], app id = local-1560831166803).
Spark session available as 'spark'.
Welcome to Spark version 2.4.3
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
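
As an extra sanity check (my own addition, not shown in the screenshot above), you can run a tiny job right at the scala> prompt; summing 1 to 100 should return 5050:

scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050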

 

 

1) Fixing the winutils error

By default, starting spark-shell on Windows reports an error like the following:

C:\Users\3golden> spark-shell
Missing Python executable 'python', defaulting to 'E:\bigdata\spark-2.4.3-bin-hadoop2.7\bin\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely.
19/06/18 12:12:37 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
        ... (stack trace truncated)
19/06/18 12:12:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

 

 

To fix this, copy the files from the archive below into Hadoop's bin directory:

 

E:\bigdata\hadoop-2.7.7\bin\hadoop.dll
E:\bigdata\hadoop-2.7.7\bin\winutils.exe

 

The archive is attached below:

 

<<winutil.rar>>
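
If you later run Spark programs from IDEA rather than from a terminal, a commonly used alternative to relying on the HADOOP_HOME environment variable (my suggestion, not covered in the original text) is to point Hadoop at the unpacked directory from code, before the SparkSession is created:

// Assumes the directory layout used in this article, with winutils.exe already copied into its bin folder.
// Must run before any Hadoop/Spark classes are initialized.
System.setProperty("hadoop.home.dir", "E:\\bigdata\\hadoop-2.7.7")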

 

Re-run spark-shell:

 

(spark-shell now starts without the winutils error and shows the same welcome banner as above, this time with app id local-1560832265811.)

 

 

Then open http://localhost:4040/jobs/ to reach the Spark web UI.

 

(Spark web UI at localhost:4040: the Jobs page for Spark 2.4.3, showing the Jobs, Stages, Storage, Environment and Executors tabs, User: 3golden, Scheduling Mode: FIFO.)

 

 

 

 

Configuring IDEA for local Spark development

 

1) Download and install IDEA

Download and install IDEA; the Community edition is sufficient.

IDEA official site:

https://www.jetbrains.com/idea/

 

 


 

 

 

Download the IDEA Community edition:

 

(Download page: the Community edition, free and open source, for JVM and Android development.)

 

 

2) Install IDEA to a local directory (details omitted)

3) Install the Scala plugin

Open IDEA and go to Configure -> Plugins.

 


 

 

Search for the Scala plugin and click Install. When the installation finishes, the status changes to Installed.

 


 

When installation completes, restart IDEA.

4) Configure the project's Scala and Java SDKs

Open Project Structure.


 

 

 

Then set each project's Java version to 1.8.

 


 

 

Then add Scala support (the Scala SDK) for each project:

 

(Project Structure: under Global Libraries, a scala-sdk-2.12.8 library whose compiler classpath and standard library point to D:\software\scala\lib, including scala-library.jar and scala-parser-combinators_2.12-1.0.7.jar.)

 

 

 

5) Create a new Spark development project

Step 1: click Create New Project.

 


 

Select a Maven project, then click Next.

 


 

Fill in the project information and click Next.

 


 

Choose the project location and click Next; the project is now created.

 


 

Next, we need to add Scala support to the project.

 

6) Add the Scala SDK to the project

 

Step 1: right-click the project and choose Add Framework Support.

 


 

 

Step 2: check Scala, then click OK.

 

(Add Frameworks Support dialog: check Scala, pick the Scala SDK library under "Use library", then click OK.)

 

 

The project view now shows Scala support:

 

 

(Project view: demo01 now lists the Scala SDK library alongside the JDK 1.8 entry.)

 

 

Under the main directory, create a directory named scala at the same level as java.

 

(Project view: src/main now contains the new scala directory alongside java and resources.)

 

 

Right-click the scala directory, choose Mark Directory as, and mark it as a source folder (Sources Root).

 


 

 

7) Writing code in the project

Add the Spark dependency to the POM file:

 


 

The exact content to insert is:

 

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0"

         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

 

    <groupId>bigdata_study</groupId>

    <artifactId>demo01</artifactId>

    <version>1.0-SNAPSHOT</version>

 

    <dependencies>

        <dependency><!-- Spark dependency -->

            <groupId>org.apache.spark</groupId>

            <artifactId>spark-sql_2.12</artifactId>

            <version>2.4.3</version>

<!--            <scope>provided</scope>-->

        </dependency>

    </dependencies>

 

</project>

 

Next, wait for Maven to finish downloading all the dependencies, then create a new Scala object and run it.


 

The code is as follows:

package com.bd.test1

import org.apache.spark.sql.SparkSession

object Test1 {
  def main(args: Array[String]): Unit = {
    // Run Spark locally, using all available cores
    val spark = SparkSession.builder.appName("SimpleApplication").master("local[*]").getOrCreate()
    // Sum the elements of a small RDD; prints 6
    val rdd1 = spark.sparkContext.parallelize(List(1, 2, 3))
    print(rdd1.reduce(_ + _))
    spark.stop()
  }
}

 

 

Then run it directly; it executes successfully:

 

(IDEA run output: the eight tasks of stage 0 finish on the local executor, DAGScheduler reports "Job 0 finished: reduce at Test1.scala:7, took 0.976056 s", the result 6 is printed among the INFO logs, and the Spark web UI is stopped when spark.stop() is called.)

 

 

 

Note: if you want to deploy this code to a cluster, you must make two changes (see the sketch after this list):

1) Change the scope of the Maven Spark dependency to provided.

2) Do not call the master(...) method in the code, because the master is specified when you run spark-submit.
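
For reference, a minimal sketch of what the cluster-ready version of Test1 might look like after these two changes (the master URL is then passed via spark-submit --master instead of being hard-coded):

package com.bd.test1

import org.apache.spark.sql.SparkSession

object Test1 {
  def main(args: Array[String]): Unit = {
    // No .master(...) call here: spark-submit decides where the job runs
    val spark = SparkSession.builder.appName("SimpleApplication").getOrCreate()
    val rdd1 = spark.sparkContext.parallelize(List(1, 2, 3))
    print(rdd1.reduce(_ + _))
    spark.stop()
  }
}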
