2. Debugging a Spark App Locally: What You Need to Do
An example:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
import org.slf4j.LoggerFactory

object Application {
  private val LOG = LoggerFactory.getLogger(Application.getClass)

  def main(args: Array[String]): Unit = {
    // Windows only: point hadoop.home.dir at a local winutils installation
    System.setProperty("hadoop.home.dir", "F:\\winutils-master\\winutils-master\\hadoop-2.7.1")
    val algorithmArgs = AlgorithmArgs(args)
    println(algorithmArgs.paramsMap)
    ArgsCheck.checkArgs(algorithmArgs.paramsMap)
    val title: String = algorithmArgs.paramsMap.get("title").toString
    val alg: String = algorithmArgs.paramsMap.get("alg").toString
    println(title)
    println(alg)
    // Local mode: driver and executors run in this JVM, one thread per core
    val sparkConf = new SparkConf().setAppName(alg).setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val hiveContext = new HiveContext(sc)
    // Register an in-memory temp table so later code that queries "kmeans" can find it
    hiveContext.createDataFrame(
      hiveContext.sparkContext.makeRDD(Seq(Row.fromSeq(Seq(2.2, 3.2, 126.69, 96.56, 75.61, 21.4, 26.74, 13.38)))),
      new StructType(Array(
        new StructField("Channel", DoubleType),
        new StructField("Region", DoubleType),
        new StructField("Fresh", DoubleType),
        new StructField("Milk", DoubleType),
        new StructField("Grocery", DoubleType),
        new StructField("Frozen", DoubleType),
        new StructField("Detergents_Paper", DoubleType),
        new StructField("Delicassen", DoubleType))))
      .registerTempTable("kmeans")
  }
}
Maven dependency file (pom.xml):
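As a hedged sketch: an app using the Spark 1.x HiveContext API above needs at least spark-core and spark-hive. The Scala suffix and version numbers below are assumptions; match them to your environment:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.6.3</version>
    </dependency>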
A few things to pay attention to:
1. Set the master to local mode
setMaster("local[*]")
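local[*] runs everything in one JVM with as many worker threads as logical cores. If you want deterministic parallelism while debugging, the thread count can be pinned; a quick sketch of the common local master strings (the app name here is arbitrary):

    import org.apache.spark.SparkConf

    // "local"    -- a single worker thread, no parallelism
    // "local[4]" -- four worker threads
    // "local[*]" -- one worker thread per logical core
    val conf = new SparkConf().setAppName("debug").setMaster("local[4]")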
2. Initialize the SparkContext
SparkContext is the main entry point for developing a Spark app; other contexts such as SQLContext and HiveContext are constructed on top of it as your later logic requires. As for which of the two to use, the scaladoc says:
sqlContext:
The entry point for working with structured data (rows and columns) in Spark. Allows the creation of [[DataFrame]] objects as well as the execution of SQL queries.
hiveContext:
An instance of the Spark SQL execution engine that integrates with data stored in Hive. Configuration for Hive is read from hive-site.xml on the classpath. (Use this one if Hive tables are involved.)
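A minimal sketch of constructing either one on top of the same SparkContext (Spark 1.x API):

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new SQLContext(sc)   // structured data and SQL queries only
    val hiveContext = new HiveContext(sc) // adds Hive integration (HiveQL, Hive tables, hive-site.xml)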
3. Create the tables and data that your later code will use
When a Spark app runs locally, everything is temporary data in local memory, so if later code references a table, create it first; otherwise you will get a "table not found" error.
An example of creating a table and populating it with data:
hiveContext.createDataFrame(
  hiveContext.sparkContext.makeRDD(Seq(Row.fromSeq(Seq(2.2, 3.2, 126.69, 96.56, 75.61, 21.4, 26.74, 13.38)))),
  new StructType(Array(
    new StructField("Channel", DoubleType),
    new StructField("Region", DoubleType),
    new StructField("Fresh", DoubleType),
    new StructField("Milk", DoubleType),
    new StructField("Grocery", DoubleType),
    new StructField("Frozen", DoubleType),
    new StructField("Detergents_Paper", DoubleType),
    new StructField("Delicassen", DoubleType))))
  .registerTempTable("kmeans")
sqlContext.sql: creating the table this way does not work.
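Once registered, the temp table can be queried from the same context that registered it. A small sketch, reusing the hiveContext and column names from the example above:

    // Temp tables are scoped to the context they were registered on
    val df = hiveContext.sql("SELECT Channel, Delicassen FROM kmeans")
    df.show()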
3. Pitfalls Encountered, and How to Fix Them
1.java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
winutils.exe is missing. Download a winutils.exe matching your Hadoop version, then set the hadoop.home.dir path on your system (see the sketch below); for details see https://stackoverflow.com/questions/35652665/java-io-ioexception-could-not-locate-executable-null-bin-winutils-exe-in-the-ha
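A minimal sketch of the fix, assuming winutils.exe was unpacked under the hadoop-2.7.1 directory shown earlier (the directory must contain bin\winutils.exe, and the property must be set before the SparkContext is created):

    // Point Hadoop at the local winutils installation; setting the HADOOP_HOME
    // environment variable to the same directory also works
    System.setProperty("hadoop.home.dir", "F:\\winutils-master\\winutils-master\\hadoop-2.7.1")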
2.Caused by: java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@160c7c42, see the next exception for details.
Add the corresponding jar dependency. Locally, HiveContext spins up an embedded Derby database as metastore_db, so this error usually means the Derby jars are missing or another process already holds the metastore_db lock.
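If the missing jar is the embedded Derby database that backs metastore_db, a hedged sketch of the dependency to add (spark-hive normally pulls Derby in transitively, and the version here is an assumption; check mvn dependency:tree first):

    <dependency>
        <groupId>org.apache.derby</groupId>
        <artifactId>derby</artifactId>
        <version>10.10.2.0</version>
    </dependency>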
3.java.io.IOException: Failed to delete: C:\Users\Administrator\AppData\Local\Temp\spark-ab9a3888-2c3e-4bef-90e5-8084ac8f180d
This happens on exit when Spark fails to delete its temporary files; it does not affect your debugging.
4. No permission to access tmp/hive
Use winutils to grant permissions on that directory: ./bin/winutils.exe chmod 755 tmp/hive
5. Watch the spark and spark-hive versions closely; the usual symptom of a mismatch is an error that some method cannot be found (NoSuchMethodError).
This is most likely because different parts of the Maven dependency tree pull in different versions of the same jar; exclude the duplicates and you are fine, as sketched below.
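mvn dependency:tree shows which module drags in the stray version. A hedged sketch of the exclusion, where some.group:some-artifact is a placeholder for the offending dependency:

    <dependency>
        <groupId>some.group</groupId>
        <artifactId>some-artifact</artifactId>
        <version>1.0.0</version>
        <exclusions>
            <!-- drop the duplicate Spark jar this module pulls in -->
            <exclusion>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.10</artifactId>
            </exclusion>
        </exclusions>
    </dependency>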