Sqoop Basics

Sqoop
Background
 1) RDBMS ==> Hadoop
  file ==> load into Hive
  shell ==> file ==> HDFS
  MapReduce: DBInputFormat + TextOutputFormat
 2) Hadoop ==> RDBMS
  MapReduce: TextInputFormat + DBOutputFormat
 Problems with this approach
  1) Hand-writing MapReduce jobs is tedious
  2) Low efficiency: every new business line needs its own MR job
 ==> Extract a framework
  1) RDBMS-side parameters: driver/username/password/url/database/table/sql
  2) Hadoop-side parameters: HDFS path/field delimiter/mappers/reducers
  3) Extension 1: when a new business line comes on board, we only need to pass that line's parameters to the MapReduce job
   A) submit via hadoop jar
   B) pass in parameters dynamically per business line
  4) Extension 2: Is that good enough? ==> craftsmanship
   What is the idea? ==> WebUI + DB Configuration ==> UI/UX
   Build the big data platform as microservices with Spring Boot
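The "extract a framework" idea above can be sketched as a tiny wrapper: each business line supplies only its own parameters, and one generic import command is built from them. This is purely illustrative (the function name and parameter layout are made up, not part of any real platform):

```shell
#!/bin/sh
# Hypothetical helper: turn per-business-line parameters into one
# generic import command, instead of writing a new MR job each time.
build_import_cmd() {
    url="$1"; table="$2"; target_dir="$3"; mappers="$4"
    echo "sqoop import --connect $url --table $table --target-dir $target_dir --num-mappers $mappers"
}

# A new business line only means new parameters, not new code:
build_import_cmd "jdbc:mysql://node1:3306/sales" orders /user/hadoop/orders 4
```

Sqoop itself is essentially this idea taken to its conclusion: one parameterized tool instead of many hand-written MR jobs.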
What is Sqoop
Apache Sqoop(TM) is a tool designed for
efficiently transferring bulk data
between Apache Hadoop and structured datastores
such as relational databases (RDBMS).
RDBMS <==> Hadoop (HDFS/Hive/HBase...)
Note that 1.99.7 is not compatible with 1.4.6 and not feature complete,
it is not intended for production deployment.
Sqoop 1 and Sqoop 2 are incompatible, much like Struts 1 and Struts 2.
Sqoop: SQL-to-Hadoop
A bridge between traditional databases and Hadoop.
 MapReduce: must a job always have both a Mapper and a Reducer?
  Sqoop uses MapReduce to perform the import; does the import need a reduce phase? (It does not: the import is map-only.)
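To make the map-only point concrete, here is the shape of a plain table import; the `-m`/`--num-mappers` option only controls how many map tasks split the table, and no reduce phase runs. Host, database, table, and credentials below are placeholders for this test environment, and the command needs a running cluster plus the JDBC driver on Sqoop's classpath:

```shell
# Map-only import: 4 map tasks split the table on its primary key;
# zero reducers are used. (Placeholder connection values.)
sqoop import \
  --connect jdbc:mysql://node1:3306/testdb \
  --username root --password '***' \
  --table emp \
  --target-dir /user/hadoop/emp \
  -m 4
```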




Sqoop 1 architecture
(architecture diagram not preserved)

Sqoop 2 architecture
(architecture diagram not preserved)

Official documentation
http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.7.0/SqoopUserGuide.html
http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html

Test environment:
[hadoop@node1 ~]$ echo $HIVE_HOME;echo $HADOOP_HOME
/home/hadoop/app/hive-1.1.0-cdh5.7.0
/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
---- Sqoop installation and configuration ----
[hadoop@node1 software]$ tar xf sqoop-1.4.6-cdh5.7.0.tar.gz -C /home/hadoop/app/

Add Sqoop environment variables
[hadoop@node1 ~]$ cat /home/hadoop/.bash_profile |grep SQOOP
export SQOOP_HOME=/home/hadoop/app/sqoop-1.4.6-cdh5.7.0
export PATH=$PATH:$SQOOP_HOME/bin

Create the configuration file from the template and append the required variables
[hadoop@node1 conf]$ pwd
/home/hadoop/app/sqoop-1.4.6-cdh5.7.0/conf
[hadoop@node1 conf]$ cp -rp sqoop-env-template.sh sqoop-env.sh
[hadoop@node1 conf]$ echo $HIVE_HOME ;echo $HADOOP_HOME
/home/hadoop/app/hive-1.1.0-cdh5.7.0
/home/hadoop/app/hadoop-2.6.0-cdh5.7.0
echo "export HADOOP_COMMON_HOME=$HADOOP_HOME" >> sqoop-env.sh
echo "export HADOOP_MAPRED_HOME=$HADOOP_HOME" >> sqoop-env.sh
echo "export HIVE_HOME=$HIVE_HOME" >> sqoop-env.sh

# To import from an RDBMS into HBase you would also need: export HBASE_HOME=...
# ZooKeeper is not installed in this environment, so export ZOOCFGDIR=... is not needed either
[hadoop@node1 ~]$ sqoop help
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/02/21 17:30:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.
The warnings above can be safely ignored.
If they bother you, edit $SQOOP_HOME/bin/configure-sqoop and comment out the HCatalog and Accumulo checks
($HCAT_HOME and $ACCUMULO_HOME).
If you actually use those components, set the corresponding environment variables; otherwise the warnings are harmless.
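One step the notes above do not show: before Sqoop can talk to MySQL, the JDBC driver jar must be on its classpath, typically by copying it into `$SQOOP_HOME/lib`. A quick end-to-end check of the installation is `list-databases`; the jar version, host, and credentials below are placeholders for this test environment:

```shell
# Put the MySQL JDBC driver where Sqoop can find it
# (jar filename/version is an example; use the one you have).
cp ~/software/mysql-connector-java-5.1.27-bin.jar $SQOOP_HOME/lib/

# Verify Sqoop can reach the database (placeholder credentials):
sqoop list-databases \
  --connect jdbc:mysql://node1:3306 \
  --username root --password '***'
```

If this prints the database names, the Sqoop installation and the JDBC connectivity are both working.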