203. Spark (10) Tuning Part 1: Environment Preparation, Execution Plans, Resource Tuning, and SQL Syntax Tuning

This article walks through the Spark tuning process: environment and data preparation, reading execution plans, resource tuning (memory sizing, CPU optimization), and SparkSQL syntax optimization strategies such as predicate pushdown and broadcast joins. A solid understanding of the execution plan makes it possible to configure resources sensibly and improve application performance.

Contents

I. Environment and Data Preparation

1. Start Hadoop and Hive

2. Upload the Data Files to Hadoop

3. Copy the Four Configuration Files into the IDEA Project

4. Set Up pom.xml

5. Environment Configuration

6. Run the Initialization Code

7. Test Results

II. Execution Plans

1. Basic Syntax

2. How an Execution Plan Is Produced

3. explain Usage Examples

4. Viewing the Execution Plan Graphically

III. Resource Tuning

1. Resource Planning

(1) Resource Estimation

(2) Memory Estimation

2. Cache Tuning

(1) Running an RDD Without Caching

(2) Running an RDD With Caching

(3) DataFrame and DataSet

3. CPU Optimization

(1) Parallelism

(2) Concurrency

(3) Causes of Low CPU Efficiency

(4) Making Full Use of CPU Resources

IV. SparkSQL Syntax Optimization

1. Rule-Based Optimization (RBO)

(1) Predicate Pushdown

(2) Column Pruning

(3) Constant Replacement

2. Cost-Based Optimization (CBO)

(1) What Is CBO

(2) Statistics Collection

(3) Using CBO

(4) Broadcast Join

(5) SMB Join


I. Environment and Data Preparation

1. Start Hadoop and Hive

# Start Hadoop first
cd /opt/module/
myhadoop.sh start

# Check that everything came up
jpsall

# Start Hive
cd /opt/module/hive/bin/
hiveservices.sh start

# Check service status; wait until both services report healthy before moving on
/opt/module/hive/bin/hiveservices.sh status

# Tail the log (default /tmp/root/hive.log; the log path is configurable)
tail -f /opt/module/hive/logs/hive.log

# Start the beeline client
/opt/module/hive/bin/beeline -u jdbc:hive2://hadoop102:10000

# Create a database named sparktuning
create database if not exists sparktuning;
show databases;

2. Upload the Data Files to Hadoop

Open the HDFS web UI:

http://hadoop102:9870/explorer.html#/

Create a directory named sparkdata and copy the data files into it.
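The same upload can also be done from the command line instead of the web UI; the local source path /opt/data below is illustrative, not from the article:

# Create the target directory on HDFS and upload the data files
hadoop fs -mkdir -p /sparkdata
hadoop fs -put /opt/data/* /sparkdata/
hadoop fs -ls /sparkdata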

3. Copy the Four Configuration Files into the IDEA Project

# core-site.xml, hdfs-site.xml, and yarn-site.xml live here
cd /opt/module/hadoop-3.1.3/etc/hadoop/

# hive-site.xml lives here
cd /opt/module/hive/conf/
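For example, the four files can be pulled into the project with scp; the src/main/resources target is an assumption about the project layout:

# Copy the three Hadoop configs and the Hive config into the
# project's resources directory (the local path is illustrative)
scp hadoop102:/opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml \
    hadoop102:/opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml \
    hadoop102:/opt/module/hadoop-3.1.3/etc/hadoop/yarn-site.xml \
    src/main/resources/
scp hadoop102:/opt/module/hive/conf/hive-site.xml src/main/resources/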

 

log4j.properties:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the console
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.sparkproject.jetty=WARN
log4j.logger.org.sparkproject.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

4. Set Up pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.atguigu.sparktuning</groupId>
    <artifactId>sparktuning</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <scala.binary.version>2.12</scala.binary.version>
        <spark.version>3.0.0</spark.version>
    </properties>

    <dependencies>
        <!-- Spark dependencies -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.47</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.28</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.1.16</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- Plugin needed to compile Scala -->
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.1</version>
                <executions>
                    <execution>
                        <id>compile-scala</id>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>test-compile-scala</id>
                        <goals>
                            <goal>add-source</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <!-- Bound to Maven's compile phase -->
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

            <!-- Assembly packaging plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.0.0</version>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <archive>
                        <manifest>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>

            <!--            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.1</version>
                &lt;!&ndash; Compile everything against JDK 1.8 &ndash;&gt;
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>-->

        </plugins>
    </build>
</project>
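Since the assembly plugin is bound to the package phase with the jar-with-dependencies descriptor, a single Maven command builds a fat jar suitable for spark-submit (the jar name follows from the artifactId and version above):

# Builds target/sparktuning-1.0-SNAPSHOT-jar-with-dependencies.jar
mvn clean package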

5. Environment Configuration

Note: the project needs Scala support added in IDEA.

Mark the scala folder as a source root.

6. Run the Initialization Code
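The original initialization code is not reproduced here; below is a minimal sketch of what it does, assuming the data files were uploaded to /sparkdata (step 2) and the Hadoop/Hive configs are on the classpath (step 3). The object, file, and table names are illustrative, not the article's originals:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object InitData {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("InitData")
      .setMaster("local[*]") // run locally while preparing data

    // enableHiveSupport picks up hive-site.xml from the classpath,
    // i.e. the config files copied into the project in step 3
    val spark = SparkSession.builder()
      .config(conf)
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("use sparktuning")

    // Load one of the uploaded files into a Hive table;
    // core-site.xml supplies fs.defaultFS, so a bare HDFS path works
    spark.read.json("/sparkdata/course_info.json")
      .write
      .mode("overwrite")
      .saveAsTable("sparktuning.course_info")

    spark.stop()
  }
}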
