Running Spark and Hadoop on Windows with a Simple Pseudo-Cluster Setup
Prerequisites
- hadoop-3.2.2
- spark-3.2.1
- java 1.8.0_291
- scala 2.12.10
Installation steps
1. Install Java 1.8
2. Install Scala
Run the downloaded scala.msi installer, then verify:
scala -version
=> Scala code runner version 2.12.10
3. Install Hadoop
Extract the archive with administrator privileges, then configure the environment variable
HADOOP_HOME  F:/hadoop  (use your own install location)
and add the following to Path:
X:/hadoop/bin
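The two environment-variable steps above can also be done from an elevated cmd prompt; a minimal sketch, assuming Hadoop was extracted to F:\hadoop (substitute your own path):

```bat
:: Set HADOOP_HOME for the current user (new cmd windows pick it up)
setx HADOOP_HOME "F:\hadoop"
:: Append Hadoop's bin folder to the user Path
setx Path "%Path%;F:\hadoop\bin"
```

Note that setx truncates values longer than 1024 characters, so if your Path is already long, editing it through the System Properties dialog is safer.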
4. Install Spark
Extract the archive, then add the following to Path:
X:/spark/bin
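At this point the Path entries can be sanity-checked from a fresh cmd window (a sketch; the Hadoop command may print winutils-related warnings until the patch step below is applied):

```bat
:: Both should print version banners matching the downloads above
hadoop version
spark-submit --version
```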
Patches
1. winutils
Run:
git clone https://github.com/cdarlint/winutils.git
then copy everything in the repository's hadoop-3.2.2/bin into the Hadoop bin folder from step 3.
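The clone-and-copy can be scripted in one go; a sketch, again assuming Hadoop lives in F:\hadoop:

```bat
git clone https://github.com/cdarlint/winutils.git
:: Overwrite the stock binaries with the Windows builds (winutils.exe, hadoop.dll, ...)
xcopy /Y winutils\hadoop-3.2.2\bin\* F:\hadoop\bin\
```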
2. Spark settings
Go to the Spark conf directory,
X:\spark\conf
rename the file spark-defaults.conf.template to spark-defaults.conf,
and edit it as follows:
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# Example:
# spark.master spark://master:7077
spark.eventLog.enabled           true
spark.eventLog.dir               file:///X:/spark/log
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.history.fs.logDirectory    file:///X:/spark/log
Note: X:/spark here is the Spark install path; create the log folder inside it before starting Spark, since both the event log and the history server point at it.
3. Script setup
Note: the scripts below use a placeholder ip that must be replaced with your machine's address.
To find it, run the following in cmd:
ipconfig
and use the IPv4 address of your active adapter wherever ip appears below.
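The relevant lines can be filtered straight out of the ipconfig output (a sketch):

```bat
:: Show only the IPv4 address lines
ipconfig | findstr /i "IPv4"
```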
Create the following files in the Spark bin directory, X:\spark\bin.
spark-start.bat (if echoed text appears garbled, save the file with ANSI encoding rather than UTF-8)
echo Checking that ports 8080, 7077 and 18080 are free
netstat -ano|findstr "8080"
netstat -ano|findstr "7077"
netstat -ano|findstr "18080"
echo Starting master
start "master" cmd /k call master.bat
timeout /t 8
echo Starting slave1
start "slave1" cmd /k call slave1.bat
timeout /t 2
echo Starting slave2
start "slave2" cmd /k call slave2.bat
timeout /t 2
echo Starting history server
start "historyserver" cmd /k call historyserver.bat
timeout /t 5
echo Opening web UIs
start http://ip:8080
timeout /t 1
start http://ip:18080
master.bat
spark-class org.apache.spark.deploy.master.Master
slave1.bat
spark-class org.apache.spark.deploy.worker.Worker spark://ip:7077
slave2.bat
spark-class org.apache.spark.deploy.worker.Worker spark://ip:7077
historyserver.bat
spark-class org.apache.spark.deploy.history.HistoryServer
Verifying the installation
Open cmd and run:
spark-start
If both browser pages (the master UI on port 8080 and the history server on port 18080) open and load normally, the installation succeeded.
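A quick way to exercise the running cluster is to attach a shell to it (a sketch; replace ip as above):

```bat
:: Opens an interactive Scala shell connected to the standalone master
spark-shell --master spark://ip:7077
```

At the scala> prompt, something like sc.parallelize(1 to 100).sum should return 5050.0, and the completed job should appear on the master UI.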
Submitting a job
spark-submit --class <main-class> --master spark://ip:7077 <path-to-jar>
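For a concrete first submission, the examples jar that ships with Spark works well (the jar name below assumes spark-3.2.1 built against Scala 2.12; check your examples\jars folder for the exact name):

```bat
:: Estimate pi with 100 tasks on the standalone cluster
spark-submit --class org.apache.spark.examples.SparkPi --master spark://ip:7077 X:\spark\examples\jars\spark-examples_2.12-3.2.1.jar 100
```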