Hadoop Big Data Development: Setting Up Sqoop Locally. This walkthrough uses Sqoop as a test case; Hadoop and a Java environment must already be installed.
1. Extract the Sqoop tarball on Linux
2. Configure environment variables
echo 'export SQOOP_HOME=/opt/sqoop-1.4.7' >> /etc/profile
echo 'export PATH=$SQOOP_HOME/bin:$PATH' >> /etc/profile
source /etc/profile
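To confirm the variables took effect, a quick check can be run in the same shell (this assumes Sqoop 1.4.7 was extracted to /opt/sqoop-1.4.7 as above):

```shell
# Reload the profile in the current shell
source /etc/profile

# Should print /opt/sqoop-1.4.7 if SQOOP_HOME was set correctly
echo $SQOOP_HOME

# Should report "Sqoop 1.4.7" if bin/ is on PATH
sqoop version
```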
3. Configure sqoop-env.sh
export HADOOP_COMMON_HOME=/export/server/hadoop-3.1.4
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/export/server/hadoop-3.1.4
#set the path to where bin/hbase is available
#export HBASE_HOME=
#Set the path to where bin/hive is available
export HIVE_HOME=/opt/hive-2.3.9
#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/opt/zookeeper-3.4.14
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.14
4. Copy mysql-connector-java-8.0.25.jar into /opt/sqoop-1.4.7/lib/
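With the driver jar in place, connectivity to MySQL can be checked before running a full import. The host node1 and the credentials below mirror the import example in step 5 and are assumptions about your environment:

```shell
# Verify the JDBC driver jar is visible to Sqoop
ls /opt/sqoop-1.4.7/lib/ | grep mysql-connector

# Ask MySQL for its database list through Sqoop; success confirms
# that the driver, host, and credentials all work
sqoop list-databases \
--connect jdbc:mysql://node1:3306/ \
--username root \
--password 1234
```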
5. Test importing from MySQL into HDFS (the import runs as a MapReduce job)
sqoop import \
--connect jdbc:mysql://node1:3306/web_test \
--username root \
--password 1234 \
--table t_web \
--target-dir /sqoop \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
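After the job finishes, the imported file can be inspected directly in HDFS. The paths below follow the --target-dir used above:

```shell
# List the output directory; with --num-mappers 1 there is a single part file
hdfs dfs -ls /sqoop

# Print the imported, tab-separated rows
hdfs dfs -cat /sqoop/part-m-00000
```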
6. Key points
Use import to move data from a relational database into the Hadoop cluster.
Use export to move data from Hadoop out to a relational database.
Basic export syntax:
sqoop export \
--connect <jdbc connection string> \
--table <table name> \
--input-fields-terminated-by <field delimiter> \
--num-mappers <number of mappers> \
--export-dir <HDFS path>
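As a concrete sketch, exporting the data imported in step 5 back into MySQL might look like the following. The target table t_web_bak is hypothetical and must already exist in MySQL with a schema matching the HDFS data:

```shell
sqoop export \
--connect jdbc:mysql://node1:3306/web_test \
--username root \
--password 1234 \
--table t_web_bak \
--num-mappers 1 \
--input-fields-terminated-by "\t" \
--export-dir /sqoop
```

Note that export uses --input-fields-terminated-by (how to parse the HDFS files), the counterpart of the --fields-terminated-by option used on import.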
7. Possible problems
1. When configuring Sqoop, mistakenly pointing SQOOP_HOME at the bin directory instead of the Sqoop installation root.
2. Running a Sqoop job without starting the YARN ResourceManager, so the connection to the node fails.
8. Note
Start the HDFS and YARN clusters before running Sqoop; YARN is needed to allocate resources for the MapReduce job.
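A typical pre-flight sequence, assuming the Hadoop sbin scripts are on PATH:

```shell
# Start HDFS (NameNode, DataNodes) and YARN (ResourceManager, NodeManagers)
start-dfs.sh
start-yarn.sh

# jps should now list NameNode, DataNode, ResourceManager, and NodeManager
jps
```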