sqoop(3):import之mysql/sqlserver/pgsql/Oracle To hive

一、mysqlToHive

1.配置

(1)往/etc/profile最后加入 

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/modules/hive-1.2.1/lib/*

然后刷新配置

source /etc/profile

(2) 把hive-site复制到sqoop中!

 cp ../hive-1.2.1/conf/hive-site.xml  conf/

2.命令:

bin/sqoop import \
--connect jdbc:mysql://bigdata.ibeifeng.com:3306/sqoop \
--username root \
--password 123456 \
--table tohdfs \
--direct \
--delete-target-dir \
--hive-import \
--hive-database student \
--hive-table stu_info \
--num-mappers 1 \
--fields-terminated-by '\t'

二、Sqlserver To Hive

1.环境

(1)CDH5.15.2(sqoop1)

(2)centos7.4

(3)sqlserver 2017

2.配置环境

(1)下载连接sql server的jar包,下载路径

https://docs.microsoft.com/zh-cn/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-2017

或者从maven仓库下载对应版本,以mssql-jdbc-6.1.0.jre8.jar为例

(2)放到cdh的sqoop的lib目录下

/opt/cloudera/parcels/CDH/lib/sqoop/lib

3.执行导入hive语句

sqoop import \
--connect 'jdbc:sqlserver://ip:1433;database=master' \
--username 'sa' \
--password '[Root_123]' \
--table 'student2' \
--direct \
--delete-target-dir \
--hive-import \
--hive-database test2 \
--hive-table student2_ms \
--num-mappers 1 \
--fields-terminated-by '\t'

三、postgresql To Hive

1.环境

(1)postgresql

(2)cdh5.15.2

2.下载postgresql的jdbc驱动包postgresql-42.2.2.jar

3.postgresql数据库信息

(1)端口(默认):5432 

4.创建pgsql数据表student

SELECT * FROM "student"
1    zhangsan    20    female
2    lisi    34    male
3    wangwu    35    male
4    wangwu3    35    male

5.切换到hdfs用户,运行sqoop命令

su – hdfs

备注:如果使用root用户,则会报错

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x        at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission

    原因:使用root用户运行命令,但是root用户在hdfs上不是万能用户,hdfs上管理员权限是hdfs用户。所以,需要切换到hdfs用户再运行命令。

6. 将jar(postgresql-42.2.2.jar)包放到cm的sqoop的lib下

/opt/cloudera/parcels/CDH/lib/sqoop/lib/postgresql-42.2.2.jar

7.sqoop导入到hive

sqoop import --connect jdbc:postgresql://ip:5432/bigdata --username postgres --password 123456 --table student --hive-import --hive-overwrite --hive-database test --hive-table student2  --lines-terminated-by "\n" --fields-terminated-by "\t" --m 2

四、oracle To Hive

1.环境

(1)oracle

(2)cdh5.15.2

2.导入hive命令

sqoop import --connect jdbc:oracle:thin:@ip:1521:orcl --username=username --password=Root_123 \
--table STUDENT3 \
--delete-target-dir \
--hive-import \
--hive-database test2 \
--hive-table student2_or \
-m 1 \
--fields-terminated-by '\t'

3.报错

(1)描述:当导入STUDENT这张表的时候,报错

Error: java.io.IOException: SQLException in nextKeyValue
        at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "CLASS": invalid identifier

        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
        at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:227)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
        at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:208)
        at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:886)
        at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1175)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1296)
        at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3613)
        at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3657)
        at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1495)
        at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
        at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
        ... 12 more

(2)原因:因为这张表是使用Navicat Premium创建的,而Navicat创建oracle表的时候,默认列名用双引号引着,而oracle默认是不区分大小写的,所以导致报错,如果把双引号去掉,即可。即用Navicat创建的时候是:

CREATE TABLE "STUDENT"(
"id" NUMBER(10,0) NOT NULL ENABLE, 
"name" VARCHAR2(20 CHAR) NOT NULL ENABLE, 
PRIMARY KEY (ID)
)

(3)解决办法:把“”冒号去掉,即可

CREATE TABLE STUDENT(
id NUMBER(10,0) NOT NULL ENABLE, 
name VARCHAR2(20 CHAR) NOT NULL ENABLE, 
PRIMARY KEY (ID)
)

五、总结

1.参考资料:官网

http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.15.2/SqoopUserGuide.html

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值