2. Debugging a Spark App Locally: What You Need to Do
An example:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
import org.slf4j.LoggerFactory

object Application {
  private val LOG = LoggerFactory.getLogger(Application.getClass)

  def main(args: Array[String]): Unit = {
    // Windows only: point hadoop.home.dir at a local winutils installation
    System.setProperty("hadoop.home.dir", "F:\\winutils-master\\winutils-master\\hadoop-2.7.1")
    val algorithmArgs = AlgorithmArgs(args)
    println(algorithmArgs.paramsMap)
    ArgsCheck.checkArgs(algorithmArgs.paramsMap)
    val title: String = algorithmArgs.paramsMap.get("title").toString
    val alg: String = algorithmArgs.paramsMap.get("alg").toString
    println(title)
    println(alg)
    // Local mode: driver and executors run in this JVM, one thread per core
    val sparkConf = new SparkConf().setAppName(alg).setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val hiveContext = new HiveContext(sc)
    // Register an in-memory temp table so later code that queries "kmeans" can find it
    hiveContext.createDataFrame(
      hiveContext.sparkContext.makeRDD(Seq(Row.fromSeq(Seq(2.2, 3.2, 126.69, 96.56, 75.61, 21.4, 26.74, 13.38)))),
      new StructType(Array(
        new StructField("Channel", DoubleType),
        new StructField("Region", DoubleType),
        new StructField("Fresh", DoubleType),
        new StructField("Milk", DoubleType),
        new StructField("Grocery", DoubleType),
        new StructField("Frozen", DoubleType),
        new StructField("Detergents_Paper", DoubleType),
        new StructField("Delicassen", DoubleType))))
      .registerTempTable("kmeans")
  }
}
Maven dependency file (pom.xml):
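As a hedged sketch: an app using the Spark 1.x HiveContext API above needs at least spark-core and spark-hive. The Scala suffix and version numbers below are assumptions; match them to your environment:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.6.3</version>
    </dependency>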
A few things to pay attention to:
1. Set the master to local mode
setMaster("local[*]")
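local[*] runs everything in one JVM with as many worker threads as logical cores. If you want deterministic parallelism while debugging, the thread count can be pinned; a quick sketch of the common local master strings (the app name here is arbitrary):

    import org.apache.spark.SparkConf

    // "local"    -- a single worker thread, no parallelism
    // "local[4]" -- four worker threads
    // "local[*]" -- one worker thread per logical core
    val conf = new SparkConf().setAppName("debug").setMaster("local[4]")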
2. Initialize the SparkContext
SparkContext is the main entry point for developing a Spark app; other contexts such as SQLContext and HiveContext are constructed on top of it as your later logic requires. As for which of the two to use, the scaladoc says:
sqlContext:
The entry point for working with structured data (rows and columns) in Spark. Allows the creation of [[DataFrame]] objects as well as the execution of SQL queries.
hiveContext:
An instance of the Spark SQL execution engine that integrates with data stored in Hive. Configuration for Hive is read from hive-site.xml on the classpath. (Use this one if Hive tables are involved.)
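A minimal sketch of constructing either one on top of the same SparkContext (Spark 1.x API):

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new SQLContext(sc)   // structured data and SQL queries only
    val hiveContext = new HiveContext(sc) // adds Hive integration (HiveQL, Hive tables, hive-site.xml)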
3. Create the tables and data that your later code will use
When a Spark app runs locally, everything is temporary data in local memory, so if later code references a table, create it first; otherwise you will get a "table not found" error.
An example of creating a table and populating it with data:
hiveContext.createDataFrame(
  hiveContext.sparkContext.makeRDD(Seq(Row.fromSeq(Seq(2.2, 3.2, 126.69, 96.56, 75.61, 21.4, 26.74, 13.38)))),
  new StructType(Array(
    new StructField("Channel", DoubleType),
    new StructField("Region", DoubleType),
    new StructField("Fresh", DoubleType),
    new StructField("Milk", DoubleType),
    new StructField("Grocery", DoubleType),
    new StructField("Frozen", DoubleType),
    new StructField("Detergents_Paper", DoubleType),
    new StructField("Delicassen", DoubleType))))
  .registerTempTable("kmeans")
sqlContext.sql: creating the table this way does not work.
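Once registered, the temp table can be queried from the same context that registered it. A small sketch, reusing the hiveContext and column names from the example above:

    // Temp tables are scoped to the context they were registered on
    val df = hiveContext.sql("SELECT Channel, Delicassen FROM kmeans")
    df.show()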
3. Pitfalls Encountered, and How to Fix Them
1.java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
winutils.exe is missing. Download a winutils.exe matching your Hadoop version, then set the hadoop.home.dir path on your system (see the sketch below); for details see https://stackoverflow.com/questions/35652665/java-io-ioexception-could-not-locate-executable-null-bin-winutils-exe-in-the-ha
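A minimal sketch of the fix, assuming winutils.exe was unpacked under the hadoop-2.7.1 directory shown earlier (the directory must contain bin\winutils.exe, and the property must be set before the SparkContext is created):

    // Point Hadoop at the local winutils installation; setting the HADOOP_HOME
    // environment variable to the same directory also works
    System.setProperty("hadoop.home.dir", "F:\\winutils-master\\winutils-master\\hadoop-2.7.1")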
2.Caused by: java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@160c7c42, see the next exception for details.
Add the corresponding jar dependency. Locally, HiveContext spins up an embedded Derby database as metastore_db, so this error usually means the Derby jars are missing or another process already holds the metastore_db lock.
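If the missing jar is the embedded Derby database that backs metastore_db, a hedged sketch of the dependency to add (spark-hive normally pulls Derby in transitively, and the version here is an assumption; check mvn dependency:tree first):

    <dependency>
        <groupId>org.apache.derby</groupId>
        <artifactId>derby</artifactId>
        <version>10.10.2.0</version>
    </dependency>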
3.java.io.IOException: Failed to delete: C:\Users\Administrator\AppData\Local\Temp\spark-ab9a3888-2c3e-4bef-90e5-8084ac8f180d
This happens on exit when Spark fails to delete its temporary files; it does not affect your debugging.
4. No permission to access tmp/hive
Use winutils to grant permissions on that directory: ./bin/winutils.exe chmod 755 tmp/hive
5. Watch the spark and spark-hive versions closely; the usual symptom of a mismatch is an error that some method cannot be found (NoSuchMethodError).
This is most likely because different parts of the Maven dependency tree pull in different versions of the same jar; exclude the duplicates and you are fine, as sketched below.
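mvn dependency:tree shows which module drags in the stray version. A hedged sketch of the exclusion, where some.group:some-artifact is a placeholder for the offending dependency:

    <dependency>
        <groupId>some.group</groupId>
        <artifactId>some-artifact</artifactId>
        <version>1.0.0</version>
        <exclusions>
            <!-- drop the duplicate Spark jar this module pulls in -->
            <exclusion>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.10</artifactId>
            </exclusion>
        </exclusions>
    </dependency>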