一、mysqlToHive
1.配置
(1)往/etc/profile最后加入
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/modules/hive-1.2.1/lib/*
然后刷新配置
source /etc/profile
(2) 把hive-site复制到sqoop中!
cp ../hive-1.2.1/conf/hive-site.xml conf/
2.命令:
bin/sqoop import \
--connect jdbc:mysql://bigdata.ibeifeng.com:3306/sqoop \
--username root \
--password 123456 \
--table tohdfs \
--direct \
--delete-target-dir \
--hive-import \
--hive-database student \
--hive-table stu_info \
--num-mappers 1 \
--fields-terminated-by '\t'
二、Sqlserver To Hive
1.环境
(1)CDH5.15.2(sqoop1)
(2)centos7.4
(3)sqlserver 2017
2.配置环境
(1)下载连接sql server的jar包,下载路径
或者从maven仓库下载对应版本,以mssql-jdbc-6.1.0.jre8.jar为例
(2)放到cdh的sqoop的lib目录下
/opt/cloudera/parcels/CDH/lib/sqoop/lib
3.执行导入hive语句
sqoop import \
--connect 'jdbc:sqlserver://ip:1433;database=master' \
--username 'sa' \
--password '[Root_123]' \
--table 'student2' \
--direct \
--delete-target-dir \
--hive-import \
--hive-database test2 \
--hive-table student2_ms \
--num-mappers 1 \
--fields-terminated-by '\t'
三、postgresql To Hive
1.环境
(1)postgresql
(2)cdh5.15.2
2.下载postgresql的jdbc驱动包postgresql-42.2.2.jar
3.postgresql数据库信息
(1)端口(默认):5432
4.创建pgsql数据表student
SELECT * FROM "student"
1 zhangsan 20 female
2 lisi 34 male
3 wangwu 35 male
4 wangwu3 35 male
5.切换到hdfs用户,运行sqoop命令
su – hdfs
备注:如果使用root用户,则会报错
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission
原因:使用root用户运行命令,但是root用户在hdfs上不是万能用户,hdfs上管理员权限是hdfs用户。所以,需要切换到hdfs用户再运行命令。
6. 将jar(postgresql-42.2.2.jar)包放到cm的sqoop的lib下
/opt/cloudera/parcels/CDH/lib/sqoop/lib/postgresql-42.2.2.jar
7.sqoop导入到hive
sqoop import --connect jdbc:postgresql://ip:5432/bigdata --username postgres --password 123456 --table student --hive-import --hive-overwrite --hive-database test --hive-table student2 --lines-terminated-by "\n" --fields-terminated-by "\t" --m 2
四、oracle To Hive
1.环境
(1)oracle
(2)cdh5.15.2
2.导入hive命令
sqoop import --connect jdbc:oracle:thin:@ip:1521:orcl --username=username --password=Root_123 \
--table STUDENT3 \
--delete-target-dir \
--hive-import \
--hive-database test2 \
--hive-table student2_or \
-m 1 \
--fields-terminated-by '\t'
3.报错
(1)描述:当导入STUDENT这张表的时候,报错
Error: java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "CLASS": invalid identifier
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:227)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:208)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:886)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1175)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1296)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3613)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3657)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1495)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
... 12 more
(2)原因:因为这张表是使用Navicat Premium创建的,而Navicat创建oracle表的时候,默认列名用双引号引着,而oracle默认是不区分大小写的,所以导致报错,如果把双引号去掉,即可。即用Navicat创建的时候是:
CREATE TABLE "STUDENT"(
"id" NUMBER(10,0) NOT NULL ENABLE,
"name" VARCHAR2(20 CHAR) NOT NULL ENABLE,
PRIMARY KEY (ID)
)
(3)解决办法:把“”冒号去掉,即可
CREATE TABLE STUDENT(
id NUMBER(10,0) NOT NULL ENABLE,
name VARCHAR2(20 CHAR) NOT NULL ENABLE,
PRIMARY KEY (ID)
)
五、总结
1.参考资料:官网
http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.15.2/SqoopUserGuide.html