Debugging a Spark App Locally

2. Debugging a SparkApp locally - what you need to do

For example:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
import org.slf4j.LoggerFactory

object Application {
  private val LOG = LoggerFactory.getLogger(Application.getClass)

  def main(args: Array[String]): Unit = {
    // Point Hadoop at the local winutils directory (Windows only, see pitfall 1 below)
    System.setProperty("hadoop.home.dir", "F:\\winutils-master\\winutils-master\\hadoop-2.7.1")

    // AlgorithmArgs / ArgsCheck are the project's own argument-parsing helpers
    val algorithmArgs = AlgorithmArgs(args)
    println(algorithmArgs.paramsMap)

    ArgsCheck.checkArgs(algorithmArgs.paramsMap)
    val title: String = algorithmArgs.paramsMap.get("title").toString
    val alg: String = algorithmArgs.paramsMap.get("alg").toString
    println(title)
    println(alg)

    // Local mode: run the app inside the IDE instead of submitting to a cluster
    val sparkConf = new SparkConf().setAppName(alg).setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val hiveContext = new HiveContext(sc)

    // Build an in-memory table so that later SQL referring to "kmeans" can find it
    hiveContext.createDataFrame(
      hiveContext.sparkContext.makeRDD(Seq(Row.fromSeq(Seq(2.2, 3.2, 126.69, 96.56, 75.61, 21.4, 26.74, 13.38)))),
      new StructType(Array(
        new StructField("Channel", DoubleType),
        new StructField("Region", DoubleType),
        new StructField("Fresh", DoubleType),
        new StructField("Milk", DoubleType),
        new StructField("Grocery", DoubleType),
        new StructField("Frozen", DoubleType),
        new StructField("Detergents_Paper", DoubleType),
        new StructField("Delicassen", DoubleType))))
      .registerTempTable("kmeans")
  }
}

Maven dependencies (pom.xml):

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- project groupId/artifactId/version as in your own project -->

    <properties>
        <spark.version>1.6.1</spark.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
</project>

A few things to note:

1. Set the master to local mode

setMaster("local[*]")

2. Initialize the SparkContext

SparkContext is the main entry point for developing a Spark app. Other contexts such as SQLContext and HiveContext are built on top of it as the later logic requires; which one to use depends on the job. From the Spark docs:

sqlContext:

The entry point for working with structured data (rows and columns) in Spark. Allows the creation of DataFrame objects as well as the execution of SQL queries.

hiveContext:

An instance of the Spark SQL execution engine that integrates with data stored in Hive. Configuration for Hive is read from hive-site.xml on the classpath. (Use this one whenever Hive tables are involved.)
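A minimal sketch, assuming the Spark 1.6.x / Scala 2.10 dependencies from the pom above, of how either context is built on top of the same SparkContext (the app name here is just illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("local-debug").setMaster("local[*]"))

// Plain SQLContext: DataFrames and SQL over temp/in-memory data, no Hive integration
val sqlContext = new SQLContext(sc)

// HiveContext: also reads hive-site.xml from the classpath; use this when Hive tables are involved
val hiveContext = new HiveContext(sc)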

3. Construct the tables and data that the later code needs

When a Spark app runs locally, tables are just temporary data in local memory. If the later code reads from a table, create that table first, otherwise it will fail with a "table not found" error.

Example of creating a table and populating it with data:

  hiveContext.createDataFrame(
      hiveContext.sparkContext.makeRDD(Seq(Row.fromSeq(Seq(2.2, 3.2, 126.69, 96.56, 75.61, 21.4, 26.74, 13.38)))),
      new StructType(Array(
        new StructField("Channel", DoubleType),
        new StructField("Region", DoubleType),
        new StructField("Fresh", DoubleType),
        new StructField("Milk", DoubleType),
        new StructField("Grocery", DoubleType),
        new StructField("Frozen", DoubleType),
        new StructField("Detergents_Paper", DoubleType),
        new StructField("Delicassen", DoubleType))))
    .registerTempTable("kmeans")

sqlContext.sql: creating the table this way does not work here.
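Once the temp table is registered, the rest of the code can query it through the same HiveContext. A minimal sketch (the table name kmeans comes from the example above):

// The temp table lives only in this context's in-memory catalog, not in a real Hive metastore
val df = hiveContext.sql("SELECT Channel, Delicassen FROM kmeans")
df.show()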

3. Pitfalls encountered and how to fix them

1. java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

The winutils.exe helper is missing. Download a winutils.exe build that matches your Hadoop version and point hadoop.home.dir at it (see the sketch below); for details see https://stackoverflow.com/questions/35652665/java-io-ioexception-could-not-locate-executable-null-bin-winutils-exe-in-the-ha
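A minimal sketch of pointing Hadoop at the downloaded winutils directory before the SparkContext is created (the path below is just the one used in the example code; substitute your own):

// hadoop.home.dir must be the directory that contains bin\winutils.exe
System.setProperty("hadoop.home.dir", "F:\\winutils-master\\winutils-master\\hadoop-2.7.1")
// Alternatively, set the HADOOP_HOME environment variable to the same directory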

2. Caused by: java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@160c7c42, see the next exception for details.

Fix by adding the corresponding jar dependency.

3. java.io.IOException: Failed to delete: C:\Users\Administrator\AppData\Local\Temp\spark-ab9a3888-2c3e-4bef-90e5-8084ac8f180d

This happens on exit, when Spark fails to delete its temporary files; it does not affect debugging.

4. No permission to access tmp/hive

Grant permissions on that directory with winutils: ./bin/winutils.exe chmod 755 tmp/hive

5. Watch the versions of spark and spark-hive; a frequent symptom of a mismatch is a "method not found" (NoSuchMethodError) error.

This is most likely because different parts of the Maven dependency tree pull in different versions of the same jar; exclude the conflicting one and the error goes away (see the sketch below).
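For reference, a sketch of excluding a conflicting transitive jar in the pom (the excluded artifact here is only an illustration; run mvn dependency:tree to see which artifact actually appears twice with different versions):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <!-- illustrative: exclude whichever artifact dependency:tree shows at a second, conflicting version -->
        <exclusion>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
        </exclusion>
    </exclusions>
</dependency>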
