Running Spark and Hadoop on Windows with a Simple Pseudo-Cluster Setup
Prerequisites
- hadoop-3.2.2
- spark-3.2.1
- java 1.8.0_291
- scala 2.12.10
Installation steps
1. Install Java 1.8
2. Install Scala
Run the downloaded scala.msi installer, then verify:
scala -version
=> Scala code runner version 2.12.10
3. Install Hadoop
Extract the archive with administrator privileges, then configure the environment variable
HADOOP_HOME  F:/hadoop  (use your own install location)
and add the following to Path:
X:/hadoop/bin
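The two environment-variable steps above can also be done from an elevated cmd prompt; a minimal sketch, assuming Hadoop was extracted to F:\hadoop (substitute your own path):

```bat
:: Set HADOOP_HOME for the current user (new cmd windows pick it up)
setx HADOOP_HOME "F:\hadoop"
:: Append Hadoop's bin folder to the user Path
setx Path "%Path%;F:\hadoop\bin"
```

Note that setx truncates values longer than 1024 characters, so if your Path is already long, editing it through the System Properties dialog is safer.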
4. Install Spark
Extract the archive, then add the following to Path:
X:/spark/bin
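At this point the Path entries can be sanity-checked from a fresh cmd window (a sketch; the Hadoop command may print winutils-related warnings until the patch step below is applied):

```bat
:: Both should print version banners matching the downloads above
hadoop version
spark-submit --version
```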
Patches
1. winutils
Run:
git clone https://github.com/cdarlint/winutils.git
then copy everything in the repository's hadoop-3.2.2/bin into the Hadoop bin folder from step 3.
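The clone-and-copy can be scripted in one go; a sketch, again assuming Hadoop lives in F:\hadoop:

```bat
git clone https://github.com/cdarlint/winutils.git
:: Overwrite the stock binaries with the Windows builds (winutils.exe, hadoop.dll, ...)
xcopy /Y winutils\hadoop-3.2.2\bin\* F:\hadoop\bin\
```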
2. Spark settings
Go to the Spark conf directory,
X:\spark\conf
rename the file spark-defaults.conf.template to spark-defaults.conf,
and edit it as follows:
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# Example:
# spark.master spark://master:7077
spark.eventLog.enabled           true
spark.eventLog.dir               file:///X:/spark/log
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.history.fs.logDirectory    file:///X:/spark/log
Note: X:/spark here is the Spark install path; create the log folder inside it before starting Spark, since both the event log and the history server point at it.
3. Script setup
Note: the scripts below use a placeholder ip that must be replaced with your machine's address.
To find it, run the following in cmd:
ipconfig
and use the IPv4 address of your active adapter wherever ip appears below.
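The relevant lines can be filtered straight out of the ipconfig output (a sketch):

```bat
:: Show only the IPv4 address lines
ipconfig | findstr /i "IPv4"
```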
Create the following files in the Spark bin directory, X:\spark\bin.
spark-start.bat (if echoed text appears garbled, save the file with ANSI encoding rather than UTF-8)
echo Checking that ports 8080, 7077 and 18080 are free
netstat -ano|findstr "8080"
netstat -ano|findstr "7077"
netstat -ano|findstr "18080"
echo Starting master
start "master" cmd /k call master.bat
timeout /t 8
echo Starting slave1
start "slave1" cmd /k call slave1.bat
timeout /t 2
echo Starting slave2
start "slave2" cmd /k call slave2.bat
timeout /t 2
echo Starting history server
start "historyserver" cmd /k call historyserver.bat
timeout /t 5
echo Opening web UIs
start http://ip:8080
timeout /t 1
start http://ip:18080
master.bat
spark-class org.apache.spark.deploy.master.Master
slave1.bat
spark-class org.apache.spark.deploy.worker.Worker spark://ip:7077
slave2.bat
spark-class org.apache.spark.deploy.worker.Worker spark://ip:7077
historyserver.bat
spark-class org.apache.spark.deploy.history.HistoryServer
Verifying the installation
Open cmd and run:
spark-start
If both browser pages (the master UI on port 8080 and the history server on port 18080) open and load normally, the installation succeeded.
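A quick way to exercise the running cluster is to attach a shell to it (a sketch; replace ip as above):

```bat
:: Opens an interactive Scala shell connected to the standalone master
spark-shell --master spark://ip:7077
```

At the scala> prompt, something like sc.parallelize(1 to 100).sum should return 5050.0, and the completed job should appear on the master UI.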
Submitting a job
spark-submit --class <main-class> --master spark://ip:7077 <path-to-jar>
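For a concrete first submission, the examples jar that ships with Spark works well (the jar name below assumes spark-3.2.1 built against Scala 2.12; check your examples\jars folder for the exact name):

```bat
:: Estimate pi with 100 tasks on the standalone cluster
spark-submit --class org.apache.spark.examples.SparkPi --master spark://ip:7077 X:\spark\examples\jars\spark-examples_2.12-3.2.1.jar 100
```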