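While working on a data warehouse project, I ran a Sqoop script to load data into HDFS and hit the error shown in the title. The error output was as follows:
From the two screenshots above, it is easy to see that there are two main problems.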
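1. Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-572947236 (the HDFS block cannot be found; most search results attribute this to a corrupted block)
So I checked the state of the file's blocks with fsck: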
hdfs fsck /tmp/hadoop-yarn/staging/root/.staging/job_1659322766804_0001/libjars/opencsv-2.3.jar
The result shows the file is healthy, so block corruption is not the cause:
The filesystem under path '/tmp/hadoop-yarn/staging/root/.staging/job_1659322766804_0001/libjars/opencsv-2.3.jar' is HEALTHY
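So the root cause is most likely the problem in the first screenshot, which then triggered the follow-on error in the second screenshot.
2. Application application_xxx failed 2 times due to AM Container for attempt_xxx exited with exitCode: (searching online suggests a configuration problem, mainly in mapred-site.xml and yarn-site.xml)
In my case the problem was in mapred-site.xml.
yarn-site.xml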
<?xml version="1.0"?>
<configuration>
<!-- Auxiliary shuffle service used by the NodeManager -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- ResourceManager internal communication address -->
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop01:8032</value>
</property>
<!-- ResourceManager scheduler internal communication address -->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop01:8030</value>
</property>
<!-- ResourceManager resource-tracker internal communication address -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop01:8031</value>
</property>
<!-- ResourceManager admin internal communication address -->
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop01:8033</value>
</property>
<!-- ResourceManager web UI (monitoring page) address -->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop01:8088</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Maximum time aggregated logs are kept on the file system, in seconds -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>640800</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/usr/local/hadoop-3.1.3/etc/hadoop:/usr/local/hadoop-3.1.3/share/hadoop/common/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/common/*:/usr/local/hadoop-3.1.3/share/hadoop/hdfs:/usr/local/hadoop-3.1.3/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/hdfs/*:/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/*:/usr/local/hadoop-3.1.3/share/hadoop/yarn:/usr/local/hadoop-3.1.3/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/yarn/*</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
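The key setting is yarn.application.classpath. Because Sqoop runs its work as MapReduce jobs, the same yarn.application.classpath also has to be configured on the MapReduce side, that is, in mapred-site.xml.
The figure below shows Sqoop's basic workflow: the command received by the Sqoop client passes through the Task Translator and is converted into MapReduce tasks.
mapred-site.xml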
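The classpath value in the yarn-site.xml above follows a fixed pattern under the Hadoop install directory. As a minimal sketch, assuming the same directory layout as the /usr/local/hadoop-3.1.3 install used in this post, the Python below rebuilds that value and wraps it in a property element. The helper names `build_classpath` and `as_property` are hypothetical and exist only for illustration. On a real node, running `hadoop classpath` prints the classpath for that install, which is a safer source to copy from.

```python
from xml.sax.saxutils import escape

# Sub-paths under the Hadoop home, in the same order as the value in this post.
# Assumption: the install follows the standard Hadoop 3.x directory layout.
CLASSPATH_SUFFIXES = [
    "/etc/hadoop",
    "/share/hadoop/common/lib/*",
    "/share/hadoop/common/*",
    "/share/hadoop/hdfs",
    "/share/hadoop/hdfs/lib/*",
    "/share/hadoop/hdfs/*",
    "/share/hadoop/mapreduce/lib/*",
    "/share/hadoop/mapreduce/*",
    "/share/hadoop/yarn",
    "/share/hadoop/yarn/lib/*",
    "/share/hadoop/yarn/*",
]


def build_classpath(hadoop_home: str) -> str:
    """Join the standard sub-paths under hadoop_home with ':' (hypothetical helper)."""
    home = hadoop_home.rstrip("/")
    return ":".join(home + suffix for suffix in CLASSPATH_SUFFIXES)


def as_property(name: str, value: str) -> str:
    """Render one Hadoop <property> element as text (hypothetical helper)."""
    return (
        "<property>\n"
        f"<name>{escape(name)}</name>\n"
        f"<value>{escape(value)}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    value = build_classpath("/usr/local/hadoop-3.1.3")
    print(as_property("yarn.application.classpath", value))
```

The printed element can be pasted into both yarn-site.xml and mapred-site.xml so the two files stay identical for this property.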
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- JobTracker address (legacy MRv1-style setting carried over in this config) -->
<property>
<name>mapred.job.tracker</name>
<value>hadoop01:9001</value>
</property>
<!-- Run MapReduce on the YARN resource manager -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- Address of the job history server -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop01:10020</value>
</property>
<!-- HTTP (web UI) address of the job history server -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop01:19888</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/usr/local/hadoop-3.1.3/etc/hadoop:/usr/local/hadoop-3.1.3/share/hadoop/common/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/common/*:/usr/local/hadoop-3.1.3/share/hadoop/hdfs:/usr/local/hadoop-3.1.3/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/hdfs/*:/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/*:/usr/local/hadoop-3.1.3/share/hadoop/yarn:/usr/local/hadoop-3.1.3/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.1.3/share/hadoop/yarn/*</value>
</property>
</configuration>
The fix is simply to add yarn.application.classpath to mapred-site.xml.
As the screenshot below shows, the problem is solved!
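To confirm the property really is present in a site file before rerunning the Sqoop job, a quick check like the one below can help. This is only a sketch: `find_property` is a hypothetical helper, and the two XML strings are trimmed-down stand-ins for a mapred-site.xml before and after the fix, not the full files.

```python
import xml.etree.ElementTree as ET


def find_property(site_xml: str, name: str):
    """Return the value of the named property in a Hadoop *-site.xml string, or None."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


# Hypothetical, shortened examples of mapred-site.xml content.
MAPRED_BEFORE = """<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>"""

MAPRED_AFTER = """<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/usr/local/hadoop-3.1.3/etc/hadoop:/usr/local/hadoop-3.1.3/share/hadoop/mapreduce/*</value>
</property>
</configuration>"""

if __name__ == "__main__":
    for label, xml_text in (("before", MAPRED_BEFORE), ("after", MAPRED_AFTER)):
        value = find_property(xml_text, "yarn.application.classpath")
        status = "missing" if value is None else "present"
        print(f"{label}: yarn.application.classpath is {status}")
```

To check a real file instead of a string, `ET.parse(path).getroot()` can replace `ET.fromstring(...)`, with the path pointing at the site file under the Hadoop configuration directory.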