Sqoop-1.4.6-cdh5.7.0.tar.gz Installation and Usage

What Sqoop does:

Sqoop is a data-transfer tool; its two most commonly used operations are import and export.
Import and export are defined relative to Hadoop: moving data into Hadoop is an import, moving data out of Hadoop is an export.
Sqoop currently has two major versions: 1.4.x is Sqoop 1 and 1.99.x is Sqoop 2. The two versions are incompatible; Sqoop 1 is the one generally used.
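
A minimal sketch of the two directions (the connection string, table name and paths here are placeholders; complete, working commands follow in the usage section below):

# import: pull a MySQL table into Hadoop (HDFS)
sqoop import --connect jdbc:mysql://localhost:3306/somedb --username root --password root \
             --table some_table --target-dir /tmp/some_table

# export: push an HDFS directory out of Hadoop into a MySQL table
sqoop export --connect jdbc:mysql://localhost:3306/somedb --username root --password root \
             --table some_table --export-dir /tmp/some_table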

Sqoop Installation

1 Download the Sqoop-1.4.6-cdh5.7.0.tar.gz package

http://archive.cloudera.com/cdh5/cdh/5/
The version used in this environment is sqoop-1.4.6-cdh5.7.0.tar.gz.

2 Upload the package to the Linux machine and extract it

[hadoop@hadoop001 software]$ tar -xzvf sqoop-1.4.6-cdh5.7.0.tar.gz -C ~/app

If you extract it as a user other than hadoop, remember to fix the owner and group afterwards.

3 Sqoop directory layout:

[hadoop@hadoop000 app]$ cd sqoop-1.4.6-cdh5.7.0/
[hadoop@hadoop000 sqoop-1.4.6-cdh5.7.0]$ ll
total 1880
drwxr-xr-x. 2 hadoop hadoop    4096 Jun 25 10:10 bin                           (scripts)
-rw-rw-r--. 1 hadoop hadoop   60569 Mar 24  2016 build.xml
-rw-rw-r--. 1 hadoop hadoop    1126 Mar 24  2016 cdh.build.properties
-rw-rw-r--. 1 hadoop hadoop   35614 Mar 24  2016 CHANGELOG.txt
drwxr-xr-x. 4 hadoop hadoop    4096 Jun 25 10:10 cloudera
-rw-rw-r--. 1 hadoop hadoop    6901 Mar 24  2016 cloudera-pom.xml
-rw-rw-r--. 1 hadoop hadoop    9880 Mar 24  2016 COMPILING.txt
drwxr-xr-x. 2 hadoop hadoop    4096 Jun 25 10:10 conf                          (configuration files)
drwxr-xr-x. 5 hadoop hadoop    4096 Jun 25 10:10 docs                          (documentation)
drwxr-xr-x. 2 hadoop hadoop    4096 Jun 25 10:10 ivy
-rw-rw-r--. 1 hadoop hadoop   17386 Mar 24  2016 ivy.xml
drwxr-xr-x. 2 hadoop hadoop    4096 Jun 25 10:10 lib                           (dependency jars)
-rw-rw-r--. 1 hadoop hadoop   15319 Mar 24  2016 LICENSE.txt
-rw-rw-r--. 1 hadoop hadoop     505 Mar 24  2016 NOTICE.txt
-rw-rw-r--. 1 hadoop hadoop   18772 Mar 24  2016 pom-old.xml
-rw-rw-r--. 1 hadoop hadoop    1096 Mar 24  2016 README.txt
-rw-rw-r--. 1 hadoop hadoop 1035867 Mar 24  2016 sqoop-1.4.6-cdh5.7.0.jar      (main Sqoop jar)
-rw-rw-r--. 1 hadoop hadoop    6554 Mar 24  2016 sqoop-patch-review.py
-rw-rw-r--. 1 hadoop hadoop  656201 Mar 24  2016 sqoop-test-1.4.6-cdh5.7.0.jar (test jar)
drwxr-xr-x. 7 hadoop hadoop    4096 Mar 24  2016 src                           (source code)
drwxr-xr-x. 4 hadoop hadoop    4096 Jun 25 10:10 testdata

If the package was extracted by another user, fix the ownership (recursively):

[hadoop@hadoop000 software]$ sudo chown -R hadoop:hadoop sqoop-1.4.6-cdh5.7.0

4 Configure environment variables

[hadoop@hadoop001 sqoop]$ vim ~/.bash_profile

Add the following:

export SQOOP_HOME=/home/hadoop/app/sqoop-1.4.6-cdh5.7.0

export PATH=$SQOOP_HOME/bin:$PATH
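
After saving the file, reload it so the new variables take effect in the current shell, and sanity-check the result (the expected values are noted as comments):

[hadoop@hadoop001 ~]$ source ~/.bash_profile
[hadoop@hadoop001 ~]$ echo $SQOOP_HOME    # should print /home/hadoop/app/sqoop-1.4.6-cdh5.7.0
[hadoop@hadoop001 ~]$ which sqoop         # should resolve to $SQOOP_HOME/bin/sqoop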

5 Copy sqoop/conf/sqoop-env-template.sh to sqoop-env.sh and add the configuration
[hadoop@hadoop001 conf]$ cp sqoop-env-template.sh sqoop-env.sh

Add the following to sqoop-env.sh:

export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0    # Hadoop installation directory
export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.7.0    # MapReduce directory
export HIVE_HOME=/home/hadoop/app/hive-1.1.0-cdh5.7.0               # Hive directory
export HIVE_CONF_DIR=/home/hadoop/app/hive-1.1.0-cdh5.7.0/conf      # Hive conf directory; without it, importing from MySQL into Hive fails (see the note section at the end)
6 Copy the MySQL JDBC driver mysql-connector-java-5.1.27-bin.jar and java-json.jar into sqoop/lib

(Here I simply reused the driver already present under Hive's lib directory. Note that a MySQL driver version that is too new can also cause all sorts of problems.)

Run from Sqoop's lib directory, so "." is the destination:
[hadoop@hadoop001 lib]$ cp /home/hadoop/app/hive-1.1.0-cdh5.7.0/lib/mysql-connector-java-5.1.27-bin.jar .
[hadoop@hadoop001 lib]$ cp /home/hadoop/software/java-json.jar .

To avoid errors later when importing MySQL tables into Hive, it is also best to copy the following Hive jars into Sqoop's lib directory (the cp commands below are run from Hive's lib directory):
[hadoop@hadoop001 lib]$ cp hive-common-1.1.0-cdh5.7.0.jar ~/app/sqoop-1.4.6-cdh5.7.0/lib/
[hadoop@hadoop001 lib]$ cp hive-shims-*.jar ~/app/sqoop-1.4.6-cdh5.7.0/lib/
[hadoop@hadoop001 lib]$ ll hive*
-rw-r--r-- 1 hadoop hadoop 301470 Jul 23 23:51 hive-common-1.1.0-cdh5.7.0.jar
-rw-r--r-- 1 hadoop hadoop  58263 Jul 23 23:46 hive-shims-0.23-1.1.0-cdh5.7.0.jar
-rw-r--r-- 1 hadoop hadoop   9147 Jul 23 23:46 hive-shims-1.1.0-cdh5.7.0.jar
-rw-r--r-- 1 hadoop hadoop 108292 Jul 23 23:46 hive-shims-common-1.1.0-cdh5.7.0.jar
-rw-r--r-- 1 hadoop hadoop  13093 Jul 23 23:46 hive-shims-scheduler-1.1.0-cdh5.7.0.jar
[hadoop@hadoop001 lib]$ 
7. Verify the installation
[hadoop@hadoop001 ~]$ sqoop version    # print the Sqoop version
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/07/28 17:09:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
Sqoop 1.4.6-cdh5.7.0
git commit id 
Compiled by jenkins on Wed Mar 23 11:30:51 PDT 2016
[hadoop@hadoop001 ~]

Sqoop Usage

1. First, run sqoop with no arguments and check whether it reports any errors:
[hadoop@hadoop001 bin]$ sqoop

Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.

Try 'sqoop help' for usage.    (the tool suggests running sqoop help)

These warnings appear only because HBase, ZooKeeper, and HCatalog are not configured; they can be ignored here.
2. Run sqoop help to see the available commands
[hadoop@hadoop001 ~]$ sqoop help
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
19/07/28 17:16:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
3. List all databases in MySQL
sqoop  list-databases \
--connect  jdbc:mysql://localhost:3306 \
--username root  \
--password  root
When the command is split across lines like this, make sure there is no space after the line-continuation backslash, otherwise the command will fail.
4. List all tables in the ruozedata database in MySQL
 sqoop list-tables \
--connect  jdbc:mysql://localhost:3306/ruozedata \
--username root  \
--password root
5. Import data from a MySQL table into Hadoop

Import the city_info table from the MySQL database ruozedata into HDFS (the notes to the right of the options below are explanations only; remove them before running the command):

sqoop import \
--connect  jdbc:mysql://localhost:3306/ruozedata \
--password root \
--username root \
--table "city_info" \          从mysql的什么表中导出数据
--mapreduce-job-name  MySQL2HDFS_EMP \     给这个mr作业取个名字
--delete-target-dir \               当再次执行sqoop语句的时候,会报错,因为 person_mysql文件夹已经存在了,我们需要先删除这个文件夹再运行sqoop语句
--target-dir   person_mysql \       指定hdfs的目录名
--columns  city_name,area \        指定需要导出的字段
--fields-terminated-by  '\t' \      指定导入到hdfs文件的分割符     
 -m 1             sqoop默认配置就是4个map,用户也可以通过-m参数,自己指定map的数量
 [hadoop@hadoop001 shell]$ hadoop fs -cat  /user/hadoop/person_mysql/part-m-00000
19/07/28 17:36:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
BEIJING NC
SHANGHAI        EC
NANJING EC
GUANGZHOU       SC
SANYA   SC
WUHAN   CC
CHANGSHA        CC
XIAN    NW
CHENGDU SW
HAERBIN NE
[hadoop@hadoop001 shell]$ 
6. Import data from a MySQL table into a Hive table
 sqoop import \
--connect  jdbc:mysql://localhost:3306/ruozedata \
--password root \
--username root \
--table "city_info" \
--mapreduce-job-name  city_info \
--delete-target-dir \
--hive-database  default \             which Hive database to import into
--hive-table  city_info \              which Hive table to import into (best created by hand beforehand with the same schema as the MySQL table, not via --create-hive-table; see the sketch after the note below)
--hive-import \                        switch that enables the Hive import; it must be present, as it is off by default
--hive-overwrite  \                    overwrite the contents of the Hive table
--fields-terminated-by  '\t' \         field delimiter
--split-by city_id \                   if the MySQL table has no primary key, a split column must be specified
 -m 2

Note: it is best to create this Hive table by hand beforehand, following the schema of the MySQL table, rather than letting sqoop import --create-hive-table create it.
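
A minimal sketch of creating that Hive table by hand through hive -e; the column names follow the columns used in the import commands above (city_id, city_name, area), but the column types are assumptions and should be adjusted to match the MySQL table. The field delimiter matches the --fields-terminated-by '\t' used in the import:

[hadoop@hadoop001 ~]$ hive -e "
CREATE TABLE IF NOT EXISTS default.city_info (
  city_id   INT,
  city_name STRING,
  area      STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"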

7. Export data from a Hive table into a MySQL table
sqoop export \
 --connect jdbc:mysql://localhost:3306/ruozedata \
 --password root \
 --username root \
 --mapreduce-job-name  area_product_click_count_top3 \
 --table area_product_click_count_top3 \                              target table area_product_click_count_top3 in the MySQL database ruozedata
 --export-dir /user/hive/warehouse/area_product_click_count_top3 \    HDFS directory where the Hive table area_product_click_count_top3 is stored
 --columns "day,product_id,click_sum,product_name,area,r"  \          which columns to export
 --fields-terminated-by  '\t'  \                                      field delimiter used by the Hive table's files
 -m 2
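
The export requires the target table to already exist in MySQL. A minimal sketch of creating it, with the column list taken from the --columns option above; the column types are assumptions and should be adjusted to the real data:

[hadoop@hadoop001 ~]$ mysql -uroot -proot ruozedata -e '
CREATE TABLE IF NOT EXISTS area_product_click_count_top3 (
  `day`        VARCHAR(20),
  product_id   INT,
  click_sum    BIGINT,
  product_name VARCHAR(100),
  area         VARCHAR(50),
  r            INT
);'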

Notes:
1. Error during import
19/07/23 23:34:59 ERROR tool.ImportTool: Error during import: No primary key could be found for table city_info. Please specify one with --split-by or perform a sequential import with '-m 1'.	
This happens when the MySQL table has no primary key and no --split-by option was given.
If -m is not 1, either change -m to 1, or add --split-by to specify which column to split on, or give the MySQL table a primary key; see the fragments below.
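
The three fixes correspond to the following fragments, to be merged into the import command above (city_id is the split column already used in section 6; the primary-key column name is only an example):

# option 1: fall back to a single map task, no split column needed
  -m 1

# option 2: keep several map tasks but tell Sqoop which column to split on
  --split-by city_id \
  -m 2

# option 3: give the MySQL source table a primary key
[hadoop@hadoop001 ~]$ mysql -uroot -proot ruozedata -e 'ALTER TABLE city_info ADD PRIMARY KEY (city_id);'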

2. Error:

19/07/23 23:23:58 INFO hive.HiveImport: Loading uploaded data into Hive
19/07/23 23:23:58 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
19/07/23 23:23:58 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf	
Fix: 1. Configure the HIVE_CONF_DIR directory in $SQOOP_HOME/conf/sqoop-env.sh:
export HIVE_CONF_DIR=/home/hadoop/app/hive-1.1.0-cdh5.7.0/conf
2. Copy the Hive jars from Hive's lib directory into Sqoop's lib directory:
[hadoop@hadoop001 lib]$ cp  hive-common-1.1.0-cdh5.7.0.jar   ~/app/sqoop-1.4.6-cdh5.7.0/lib/
[hadoop@hadoop001 lib]$ cp  hive-shims-*.jar   ~/app/sqoop-1.4.6-cdh5.7.0/lib/
[hadoop@hadoop001 lib]$ pwd
/home/hadoop/app/sqoop-1.4.6-cdh5.7.0/lib
This resolves the problem.