Running a Spark Program Jar on a Virtual Machine
1. Writing the Scala Program in IDEA
- A simple WordCount serves as the example here.
- To keep the program broadly reusable, the input and output file paths are wrapped in a Properties file. That file can live on the virtual machine, so the paths inside it can be changed at any time.
- Note that if the job is to run on the virtual machine, the word-source path and the word-count output path cannot be local Windows paths; they will not be recognized. Use paths on the virtual machine or HDFS paths.
- IDEA program
import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setAppName("SparkCoreTest").setMaster("local[*]")
    val sc: SparkContext = new SparkContext(conf)
    // Create a Java Properties object
    val properties = new Properties()
    // Load the configuration file through an IO stream
    properties.load(new FileInputStream("/software/wordcount/userset.properties")) // path on the virtual machine
    // Read the input path from the Properties object
    val loadFilePath: String = properties.getProperty("loadfile")
    // Read the output path from the Properties object
    val outFilePath: String = properties.getProperty("outfile")
    val result: RDD[(String, Int)] = sc.textFile(loadFilePath).flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    result.collect().foreach(println)
    result.saveAsTextFile(outFilePath)
  }
}
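The Properties-loading part of the job can be sketched on its own, outside Spark. The snippet below is a minimal sketch (the `PropsDemo` object, its `loadPaths` helper, and the temp-file paths are illustrative, not from the original program): it writes a small properties file, loads it the same way the job does, and checks both keys explicitly, since `getProperty` returns `null` for a missing key and the job would otherwise fail later with an unhelpful NullPointerException.

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.Properties

object PropsDemo {
  // Load the input/output paths from a properties file.
  // getProperty returns null for a missing key, so fail fast with a clear message.
  def loadPaths(propsFile: String): (String, String) = {
    val properties = new Properties()
    val in = new FileInputStream(propsFile)
    try properties.load(in) finally in.close()
    val loadFilePath = properties.getProperty("loadfile")
    val outFilePath = properties.getProperty("outfile")
    require(loadFilePath != null && outFilePath != null,
      s"loadfile and outfile must both be set in $propsFile")
    (loadFilePath, outFilePath)
  }

  def main(args: Array[String]): Unit = {
    // Throwaway stand-in for /software/wordcount/userset.properties on the VM
    val tmp = File.createTempFile("userset", ".properties")
    val out = new FileOutputStream(tmp)
    out.write("loadfile=/software/wordcount/words.txt\noutfile=/software/wordcount/out\n".getBytes("UTF-8"))
    out.close()
    val (src, dst) = loadPaths(tmp.getPath)
    println(s"input: $src, output: $dst")
  }
}
```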
- Configuration file contents
- The input file is read from the Linux virtual machine.
- The result is written to an HDFS path; if you are not the root user, it is best to change the permissions on that path first:
hdfs dfs -chmod -R 777 /zhu
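The configuration file itself is plain `key=value` pairs. The concrete values below are illustrative (an input file on the VM and an output directory under the `/zhu` HDFS path mentioned above); adjust them to your own environment:

```properties
# userset.properties, read by the job from /software/wordcount/userset.properties
loadfile=/software/wordcount/words.txt
outfile=hdfs://192.168.198.201:9000/zhu/wordcount-out
```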
2. Building the Jar
This has been covered several times before; readers who need a refresher can follow the link:
Link: Building a Jar from an IDEA program.
3. Running the Jar
- Copying the Jar onto the Linux virtual machine needs no explanation.
- Start Hadoop:
start-all.sh
- Start the Spark cluster. First change to Spark's sbin directory:
cd /opt/spark245/sbin
then start it:
./start-all.sh
(Note: this step is only needed when running spark-submit in standalone mode; a local launch does not need it.)
- Delete the signature and conflicting files inside the Jar (this step is very important; skipping it may cause errors, or the job may produce no output).
- First install zip:
yum -y install zip
- Then run:
zip -d /software/wordcount/sparkdemo.jar 'META-INF/*.SF' 'META-INF/*.RSA' 'META-INF/*SF'
- Note: the path after -d is the path to the Jar.
- Submit the program with spark-submit
- Local mode:
--master local[*]
- Standalone mode:
--master spark://192.168.198.201:7077
- Local launch:
spark-submit \
--class nj.zb.kb09.gaoji.WordCount \
--master local[*] \
/software/wordcount/sparkdemo.jar
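For standalone mode, the submit command is the same except for the --master flag, using the master URL shown above (host and port are from this walkthrough's cluster; substitute your own):

```shell
spark-submit \
  --class nj.zb.kb09.gaoji.WordCount \
  --master spark://192.168.198.201:7077 \
  /software/wordcount/sparkdemo.jar
```

This only works after the Spark cluster has been started with sbin/start-all.sh as described in the previous step.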
- Program output
[root@hadoopwei wordcount]# spark-submit \
> --class nj.zb.kb09.gaoji.WordCount \
> --master local[*] \
> /software/wordcount/sparkdemo.jar
20/11/12 15:21:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/11/12 15:21:01 INFO SparkContext: Running Spark version 2.4.5
20/11/12 15:21:01 INFO SparkContext: Submitted application: SparkCoreTest
20/11/12 15:21:01 INFO SecurityManager: Changing view acls to: root
20/11/12 15:21:01 INFO SecurityManager: Changing modify acls to: root
20/11/12 15:21:01 INFO SecurityManager: Changing view acls groups to:
20/11/12 15:21:01 INFO SecurityManager: Changing modify acls groups to:
20/11/12 15:21:01 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
20/11/12 15:21:01 INFO Utils: Successfully started service 'sparkDriver' on port 39581.
20/11/12 15:21:01 INFO SparkEnv: Registering MapOutputTracker
20/11/12 15:21:01 INFO SparkEnv: Registering BlockManagerMaster
20/11/12 15:21:01 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/11/12 15:21:01 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/11/12 15:21:01 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-39e8f9fc-4490-4558-a7a0-6aba02898662
20/11/12 15:21:01 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/11/12 15:21:01 INFO SparkEnv: Registering OutputCommitCoordinator
20/11/12 15:21:01 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/11/12 15:21:01 INFO Utils: Successfully started service 'SparkUI' on port 4041.
20/11/12 15:21:01 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hadoopwei:4041
20/11/12 15:21:01 INFO SparkContext: Added JAR file:/software/wordcount/sparkdemo.jar at spark://hadoopwei:39581/jars/sparkdemo.jar with timestamp 1605165661891
20/11/12 15:21:01 INFO Executor: Starting executor ID driver on host localhost
20/11/12 15:21:01 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34046.
20/11/12 15:21:01 INFO NettyBlockTransferService: Server created on hadoopwei:34046
20/11/12 15:21:01 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/11/12 15:21:01 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hadoopwei, 34046, None)
20/11/12 15:21:01 INFO BlockManagerMasterEndpoint: Registering block manager hadoopwei:34046 with 366.3 MB RAM, BlockManagerId(driver, hadoopwei, 34046, None)
20/11/12 15:21:01 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hadoopwei, 34046, None)
20/