记录一个在大数据平台进行数据导入的错误:
Caused by: java.lang.IllegalArgumentException: Bad URL format. Hostname not found in authority part of the url: null:null. Are you missing a '/' after the hostname ?
at org.apache.hive.jdbc.Utils.parseURL(Utils.java:165)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:134)
... 16 more
19/05/29 10:31:10 INFO hive.HiveImport: Hive import complete.
19/05/29 10:31:10 INFO hive.HiveImport: Export directory is not empty, keeping it.
可以看出,错误为Bad URL format. Hostname not found in authority part of the url: null:null.
这个错误是在执行sqoop import 导入时发生的(sqoop代码没有问题,大家不用详看,关键不在此):
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/dbname --username root --password root --table check_report --hive-import --hive-database lmj_test --hive-table ori_yyts_check_report --map-column-hive reportinfo=string --map-column-java reportinfo=String --hive-overwrite --delete-target-dir --hive-drop-import-delims --hive-drop-import-delims --null-string '' --null-non-string '' -m5
执行的环境是linux系统,导入的hive集群是搭建在hadoop上的。
先说说最后怎么解决这个问题:通过添加了几个环境变量
export HADOOP_OPTS=" -DHADOOP_USER_NAME=jupiter"
export BDOS_SQOOP_HIVE_IP="dthost31"
export BDOS_SQOOP_HIVE_PORT="10000"
export BDOS_SQOOP_HIVE_NAMESPACE="lmj_test"
export BDOS_SQOOP_HIVE_USERNAME="jupiter"
export BDOS_SQOOP_HIVE_PWD="null"
#执行的sqoop语句保持不变
sqoop import --connect jdbc:postgresql://192.168.xxx.xxx:5432/dbname --username root --password root --table check_report --hive-import --hive-database lmj_test --hive-table ori_yyts_check_report --map-column-hive reportinfo=string --map-column-java reportinfo=String --hive-overwrite --delete-target-dir --hive-drop-import-delims --hive-drop-import-delims --null-string '' --null-non-string '' -m5
重新提交执行,发现导入成功,原本报错的地方变为:
19/05/29 10:44:30 INFO hive.HiveImport: Loading uploaded data into Hive
19/05/29 10:44:31 INFO hive.HiveImport: Hive import complete.
原因明显是在于一些环境变量的缺失,经过研究,关键的环境变量是
export BDOS_SQOOP_HIVE_IP="dthost31"
dthost31这台服务器,其实是hadoop集群的Hive Metastore、HiveServer2、WebHCat Server。我们在导入时,需要制定正确的HiveServer2
进一步查询原因,发现在配置文件中有这样一段代码:
URI jdbcURI = URI.create(URI_HIVE_PREFIX + "//" + authority);
// Check to prevent unintentional use of embedded mode. A missing "/"
// to separate the 'path' portion of URI can result in this.
// The missing "/" common typo while using secure mode, eg of such url -
// jdbc:hive2://localhost:10000;principal=hive/HiveServer2Host@YOUR-REALM.COM
if ((jdbcURI.getAuthority() != null) && (jdbcURI.getHost() == null)) {throw new JdbcUriParseException("Bad URL format. Hostname not found "+ " in authority part of the url: " + jdbcURI.getAuthority()
+ ". Are you missing a '/' after the hostname ?");
这里定义了异常抛出,可以看到,只要当getHost()或者getAuthority()为null时,就会抛出我们所关注的那个错误。
因此,我们需要在环境变量里添加关于host等的一些信息,而且host指定的应是hive句群的hiveserver2这台服务器