Pitfalls of using Spark SQL to query Hive
For my first Spark SQL project I wanted to use Spark SQL to query data in Hive, so I found some sample code on Baidu as a reference (developed in IDEA).
The code:
import org.apache.spark.sql.SparkSession

object aaa {
  def main(args: Array[String]): Unit = {
    // Set HADOOP_USER_NAME, otherwise you will hit permission problems
    System.setProperty("HADOOP_USER_NAME", "hadoop")
    val spark = SparkSession
      .builder()
      .appName("SparkHiveDemo")
      .master("spark://192.168.43.128:7077")
      .enableHiveSupport()
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse/")
      .getOrCreate()
    spark.sql("select * from student").show()
    spark.close()
  }
}
Symptom: running 'show tables;' works without any problem, but running 'select * from student' produces the following behavior: executors are endlessly allocated and removed:
21/02/26 16:11:01 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asked to remove non-existent executor 5
21/02/26 16:11:01 INFO StandaloneSchedulerBackend: Granted executor ID app-20210226161103-0021/8 on hostPort 192.168.43.127:7079 with 1 core(s), 1024.0 MB RAM
21/02/26 16:11:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20210226161103-0021/8 is now RUNNING
21/02/26 16:11:01 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20210226161103-0021/6 is now EXITED (Command exited with code 1)
21/02/26 16:11:01 INFO StandaloneSchedulerBackend: Executor app-20210226161103-0021/6 removed: Command exited with code 1
21/02/26 16:11:01 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20210226161103-0021/9 on worker-20210226101445-192.168.43.130-7079 (192.168.43.130:7079) with 1 core(s)
21/02/26 16:11:01 INFO BlockManagerMasterEndpoint: Trying to remove executor 6 from BlockManagerMaster.
The log messages above repeat in an endless loop.
Checking the executor logs turned up the underlying error (error log path: spark/work/app-20210226153746-0020/98/stderr):
... 4 more
Caused by: java.io.IOException: Failed to connect to DESKTOP-HKJLBCB:58243
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: DESKTOP-HKJLBCB
at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
at java.net.InetAddress.getAllByName(InetAddress.java:1192)
at java.net.InetAddress.getAllByName(InetAddress.java:1126)
Cause: the master URL was specified incorrectly for this setup. With the driver running inside IDEA on a Windows machine, the executors on the cluster try to connect back to the driver via the Windows hostname DESKTOP-HKJLBCB, which the worker nodes cannot resolve (hence the UnknownHostException above). Switching the master to local mode avoids the problem.
The corrected code:
import org.apache.spark.sql.SparkSession

object aaa {
  def main(args: Array[String]): Unit = {
    // Set HADOOP_USER_NAME, otherwise you will hit permission problems
    System.setProperty("HADOOP_USER_NAME", "hadoop")
    val spark = SparkSession
      .builder()
      .appName("SparkHiveDemo")
      .master("local[*]")
      .enableHiveSupport()
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse/")
      .getOrCreate()
    spark.sql("select * from student").show()
    spark.close()
  }
}
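If you still want to run against the standalone cluster from IDEA rather than falling back to local mode, another option is to tell Spark explicitly which address the executors should use to reach the driver, via spark.driver.host, so they no longer depend on resolving the Windows hostname. This is a sketch; the IP 192.168.43.1 for the development machine is an assumption and must be replaced with your machine's actual address on the cluster's network:

```scala
import org.apache.spark.sql.SparkSession

object aaa {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "hadoop")
    val spark = SparkSession
      .builder()
      .appName("SparkHiveDemo")
      .master("spark://192.168.43.128:7077")
      // Assumed IP of the development machine on the cluster's network;
      // executors connect back to this address instead of the hostname
      .config("spark.driver.host", "192.168.43.1")
      .enableHiveSupport()
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse/")
      .getOrCreate()
    spark.sql("select * from student").show()
    spark.close()
  }
}
```

The driver machine's firewall must also allow inbound connections from the workers, since the executors open connections back to the driver's RPC and block-manager ports.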
Alternatively, you can refer to the following link:
https://www.cnblogs.com/Mr-lin66/p/13519103.html
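Another way to sidestep the driver-callback problem entirely is to package the job into a jar and submit it from a cluster node with spark-submit, so the driver itself runs on a host the workers can resolve. A sketch, assuming the main class name from the code above; the jar name is an example:

```shell
# Submit the packaged job from a cluster node; the driver then runs on a
# host the workers can already resolve. Jar name is an assumed example.
spark-submit \
  --master spark://192.168.43.128:7077 \
  --class aaa \
  spark-hive-demo.jar
```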