Spark + Hive: local debugging and submitting jobs to a YARN cluster

CUUUT

Published 2023-05-13 16:51:58


Tags: spark, hive, big data

Article link: https://blog.csdn.net/okkkay/article/details/130658995


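This post covers two things for a Spark installation that runs on YARN on a remote server: how to debug a job locally, and how to submit the job to the YARN cluster.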

First, copy core-site.xml, hive-site.xml, and yarn-site.xml into the project's resources directory and also into target/classes.
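
Before starting Spark, it can help to confirm that the three files are actually visible on the classpath, since a missing hive-site.xml tends to fail silently by falling back to a local metastore. The following is a minimal sketch; the object name `ConfigCheck` and its methods are hypothetical helpers, not part of Spark or Hadoop.

```scala
object ConfigCheck {
  // The three files this post says must be on the classpath.
  val requiredFiles: Seq[String] =
    Seq("core-site.xml", "hive-site.xml", "yarn-site.xml")

  /** Returns the names that the given class loader cannot find as resources. */
  def missing(
      names: Seq[String],
      loader: ClassLoader = getClass.getClassLoader
  ): Seq[String] =
    names.filter(name => loader.getResource(name) == null)

  def main(args: Array[String]): Unit = {
    val absent = missing(requiredFiles)
    if (absent.isEmpty)
      println("All Hadoop/Hive config files are on the classpath.")
    else
      println(s"Missing from classpath: ${absent.mkString(", ")}")
  }
}
```

Running `ConfigCheck.main` from the IDE with the same run configuration as the Spark job shows which of the files, if any, were not copied.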

Local debugging

    import org.apache.spark.sql.SparkSession

    // Replace "user" with a user name that has permission on the cluster
    System.setProperty("HADOOP_USER_NAME", "user")

    val spark = SparkSession
      .builder()
      .enableHiveSupport()
      // Local mode
      .master("local[*]")
      .appName("name")
      // Point at the remote data warehouse
      .config("spark.sql.warehouse.dir", "hdfs://xxxx")
      // Without this setting, only the table listing is reachable; table data cannot be read
      .config("dfs.client.use.datanode.hostname", "true")
      .getOrCreate()

    // TODO: your logic and operations
    // Without dfs.client.use.datanode.hostname=true, "show tables" works,
    // but "select" hangs and fails with "Connection timed out: no further information"
    spark.sql("show tables").show()
    spark.sql("select * from table1").show()

    // TODO: shut down the session
    spark.close()
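
With dfs.client.use.datanode.hostname set to true, the HDFS client connects to DataNodes by hostname, so those hostnames have to resolve from the development machine (for example through its hosts file). If `select` still times out, a quick resolution check can narrow the problem down. This is a minimal sketch; `HostCheck` is a hypothetical helper, and `hdp` is only the example host taken from the HDFS URL used later in this post, so substitute your own DataNode hostnames.

```scala
import java.net.{InetAddress, UnknownHostException}
import scala.util.{Failure, Success, Try}

object HostCheck {
  /** Returns Some(ip) if `host` resolves on this machine, None if it is unknown. */
  def resolve(host: String): Option[String] =
    Try(InetAddress.getByName(host)) match {
      case Success(addr)                    => Some(addr.getHostAddress)
      case Failure(_: UnknownHostException) => None
      case Failure(other)                   => throw other
    }

  def main(args: Array[String]): Unit = {
    // Assumption: "hdp" stands in for your cluster's hostnames
    val hosts = if (args.nonEmpty) args.toSeq else Seq("hdp")
    hosts.foreach { h =>
      resolve(h) match {
        case Some(ip) => println(s"$h resolves to $ip")
        case None     => println(s"$h does NOT resolve from this machine")
      }
    }
  }
}
```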

Submitting a job to YARN

1. Write the code

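    import org.apache.spark.SparkConf
    import org.apache.spark.sql.{DataFrame, SparkSession}

    val sparkConf = new SparkConf()
      .setMaster("yarn")  // run on YARN
      .setAppName("SparkSQL")

    val spark = SparkSession
      .builder()
      .enableHiveSupport()
      .config(sparkConf)
      .config("spark.sql.warehouse.dir", "hdfs://xxxx")  // data warehouse location
      .getOrCreate()

    // TODO: your logic and operations
    val df: DataFrame = spark.sql("select * from table1")
    // Save the result; Spark writes a directory of part files at this path
    df.write.format("csv").save("hdfs://hdp:8020/output/result.csv")

    // TODO: shut down the session
    spark.close()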

2. Once the code is written, make sure to build the project before packaging it with Maven
3. Package with Maven: click package

Whether to bundle the dependencies into the jar depends on your needs. If you do want them bundled, add the following to pom.xml:

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

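With the dependencies bundled, the jar becomes much larger, so decide based on your own needs.
4. Upload the jar to the cluster
5. Copy the fully qualified class name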
6. Submit the job to the YARN cluster

    spark-submit \
    --class fully.qualified.ClassName \
    --master yarn \
    --deploy-mode cluster \
    ./xxxx/yyyy.jar

Add further parameters to the code and to the submit command as your needs require.
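
One way to parameterize the job is through program arguments: anything placed after the jar path in the spark-submit command is passed to the main method of the submitted class. The sketch below shows one possible way to read a table name and an output path from those arguments. The object `JobArgs`, the `Config` case class, and the `--table` and `--output` flags are all hypothetical names chosen for illustration, not Spark options.

```scala
object JobArgs {
  // Hypothetical job settings: which table to read and where to write
  final case class Config(table: String, output: String)

  /** Parses `--table <name> --output <path>`; returns Left(message) on bad input. */
  def parse(args: Array[String]): Either[String, Config] =
    if (args.length % 2 != 0)
      Left("expected flag/value pairs, e.g. --table table1 --output <path>")
    else {
      // Pair each flag with the value that follows it
      val pairs = args.grouped(2).map(a => a(0) -> a(1)).toMap
      (pairs.get("--table"), pairs.get("--output")) match {
        case (Some(t), Some(o)) => Right(Config(t, o))
        case (None, _)          => Left("missing --table")
        case (_, None)          => Left("missing --output")
      }
    }
}
```

Inside the job's main method, `JobArgs.parse(args)` would then supply the values that are hard-coded in the examples above, with the flags appended after `./xxxx/yyyy.jar` on the command line.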
