Spark
1. Spark query fails: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
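The LZO codec is configured in Hadoop, but Spark does not ship the LZO jar itself, so in YARN mode it cannot find the codec class.
Solution
Configure spark-defaults.conf so that Spark can find the LZO jar.
Note
If the class still cannot be found, copy the LZO jar into Spark's jars directory.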
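The note above does not say which properties to set. One common choice is to add the jar to the driver and executor classpaths through spark.driver.extraClassPath and spark.executor.extraClassPath. The sketch below appends those two entries to a conf file. The function name add_lzo_classpath and both paths are hypothetical, and the jar is assumed to exist at the same path on every node.

```shell
#!/usr/bin/env bash
# Append LZO classpath entries to a Spark defaults file.
# Both arguments are placeholders: pass your real conf file and jar path.
add_lzo_classpath() {
  local conf_file="$1"   # e.g. $SPARK_HOME/conf/spark-defaults.conf
  local lzo_jar="$2"     # e.g. the hadoop-lzo jar from your Hadoop install

  # Skip if the driver entry already exists, so reruns do not duplicate lines.
  if grep -q '^spark.driver.extraClassPath' "$conf_file" 2>/dev/null; then
    echo "spark.driver.extraClassPath already set in $conf_file" >&2
    return 0
  fi

  {
    echo "spark.driver.extraClassPath    $lzo_jar"
    echo "spark.executor.extraClassPath  $lzo_jar"
  } >> "$conf_file"
}
```

A call might look like add_lzo_classpath "$SPARK_HOME/conf/spark-defaults.conf" /path/to/hadoop-lzo.jar, with the jar path replaced by the real location on your cluster.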
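2. spark-shell fails on SQL statements with java.net.URISyntaxException: Expected scheme-specific part at
When a Spark SQL program was run from IDEA, it operated on the local spark-warehouse directory instead of Hive and reported no error. Later, SQL queries in spark-shell also failed, this time with the error above. Initializing the Hive metastore fixed the problem, and IDEA could then access Hive normally.
The cause was that the Hive metastore had not been initialized. To fix it, drop the metastore database in MySQL, then run the Hive schema initialization command: schematool -initSchema -dbType mysql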
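The two steps can be sketched as a small script. Dropping the database deletes all existing Hive table metadata, so the sketch supports a DRY_RUN=1 mode that only prints the commands. The function name reinit_hive_metastore, the database name metastore, and the MySQL user root are assumptions; use the values from your own hive-site.xml. If the JDBC URL does not include createDatabaseIfNotExist=true, the database may need to be created again before running schematool.

```shell
#!/usr/bin/env bash
# Rebuild the Hive metastore schema in MySQL.
# WARNING: dropping the database deletes all existing Hive table metadata.
# Set DRY_RUN=1 to print the commands instead of running them.
reinit_hive_metastore() {
  local db_name="${1:-metastore}"   # assumed metastore database name
  local mysql_user="${2:-root}"     # assumed MySQL user
  local run=""
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"

  # Step 1: drop the old metastore database (mysql prompts for the password).
  $run mysql -u "$mysql_user" -p -e "DROP DATABASE IF EXISTS $db_name;"
  # Step 2: recreate the schema with Hive's schematool.
  $run schematool -initSchema -dbType mysql
}
```

Running DRY_RUN=1 reinit_hive_metastore first lets you review both commands before anything is deleted.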
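3. Spark on YARN fails with ExitCodeException exitCode=13
The master set in the program conflicts with the cluster mode used at submission. The program must not hard-code the master: remove the setMaster() call.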
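With setMaster() removed from the code, the master is supplied at submission time instead. A minimal sketch, assuming a hypothetical main class and jar name; the wrapper function submit_on_yarn and the SPARK_SUBMIT override are illustrative, not part of Spark:

```shell
#!/usr/bin/env bash
# Submit a Spark job with the master given on the command line,
# so the application code does not need to call setMaster().
# SPARK_SUBMIT can be overridden (for example with "echo") to preview the command.
submit_on_yarn() {
  local main_class="$1"   # hypothetical, e.g. com.example.MyApp
  local app_jar="$2"      # hypothetical, e.g. my-app.jar
  "${SPARK_SUBMIT:-spark-submit}" \
    --master yarn \
    --deploy-mode cluster \
    --class "$main_class" \
    "$app_jar"
}
```

Keeping the master out of the code also lets the same jar run locally (for example with --master "local[*]") without a rebuild.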
Hive
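1. Hive insert fails: return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Inserting data in Hive does not run and reports an error.
Cause:
The NameNode does not have enough memory: the remaining JVM memory is insufficient to run a new job.
Error output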
Starting Job = job_1594085668614_0006, Tracking URL = http://kudu:8088/proxy/application_1594085668614_0006/
Kill Command = /root/soft/hadoop-3.2.1/bin/mapred job -kill job_1594085668614_0006
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2020-07-07 09:43:24,559 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1594085668614_0006 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Solution:
At the hive> prompt, enter
set hive.exec.mode.local.auto=true;
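2. Spark program started from IDEA fails to connect to Hive: The dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------ (insufficient permissions)
The error appears when Hive starts.
Fix: a Hadoop environment must first be available on Windows (with Hadoop's environment variables configured; you can check from cmd with hadoop version). Then change to the bin directory and adjust the permissions: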
cd D:\hadoop\hadoop-2.4.1\hadoop-2.4.1\bin
.\winutils.exe ls F:\tmp\hive
.\winutils.exe chmod 777 F:\tmp\hive
.\winutils.exe ls F:\tmp\hive
On Windows, the Hadoop user must also be set in the program; otherwise the following exception is thrown:
GetLocalGroupForUser error (2221): ?????
Set the user with:
System.setProperty("HADOOP_USER_NAME","root")