Importing and Exporting Data Between MySQL 5.7, Hive 2.3.4, and HBase 1.4.9 with Sqoop 1.4.7

Table of Contents

1. What is Sqoop?

2. Downloading Sqoop and Configuring Environment Variables

2.1. Download Sqoop 1.4.7

2.2. Set Environment Variables

2.3. Prepare the Required Environment

3. Installing Sqoop 1.4.7

3.1. Modify the Sqoop Configuration Files

3.1.1. Modify sqoop-env.sh

3.1.2. Modify configure-sqoop

3.2. Check the Sqoop Version

4. Running and Testing Sqoop Data Imports and Exports

4.1. Exporting Data from Hive to MySQL with Sqoop

4.2. Importing Data from MySQL into Hive with Sqoop

4.3. Importing Data from MySQL into HBase with Sqoop
  
  
1. What is Sqoop?
  
Sqoop is a tool for transferring data between Hadoop and relational databases or mainframes.

You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data with Hadoop MapReduce, and then export it back into the RDBMS.

Sqoop automates most of this process, relying on the database to describe the schema of the data being imported. Sqoop uses MapReduce to import and export the data, which provides parallelism and fault tolerance.
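To get a feel for the workflow before installing anything, a minimal single-table import into HDFS looks roughly like the following sketch (host, database, table, and target directory are placeholders; the concrete commands for this cluster appear in section 4):

sqoop import \
  --connect jdbc:mysql://<mysql-host>:3306/<database> \
  --username <user> -P \
  --table <table> \
  --target-dir /user/<user>/<table> \
  --num-mappers 4

Sqoop generates a MapReduce job with (here) 4 map tasks, each pulling a slice of the table in parallel and writing it to the target directory.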
  
2. Downloading Sqoop and Configuring Environment Variables
  
2.1. Download Sqoop 1.4.7
  
Download Sqoop with the following commands, extract it, and move it into the /home/work/_app/ directory:
  
  [root@c0 _src]# pwd
  
  /home/work/_src
  
  [root@c0 _src]# wget http://mirrors.shu.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
  
  [root@c0 _src]# tar -xzvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
  
  [root@c0 _src]# mv sqoop-1.4.7.bin__hadoop-2.6.0 /home/work/_app/
  
2.2. Set Environment Variables
  
Set the Sqoop environment variables on every machine by running the following commands:
  
  echo "" >> /etc/bashrc
  
  echo "# Sqoop 1.4.7" >> /etc/bashrc
  
  echo "export SQOOP_HOME=/home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0" >> /etc/bashrc
  
  echo "" >> /etc/bashrc
  
  echo "# Path" >> /etc/bashrc
  
  echo "export PATH=\$PATH:\$SQOOP_HOME/bin" >> /etc/bashrc
  
  source /etc/bashrc
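After sourcing /etc/bashrc, a quick check on each node confirms that the variable is set and that the sqoop launcher is on the PATH:

echo $SQOOP_HOME
which sqoop

The first command should print /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0 and the second should resolve to the bin/sqoop script under it.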
  
2.3. Prepare the Required Environment
  
Installing and running Sqoop requires Hive, MySQL, and Hadoop to already be in place. See the companion article on building the Hadoop 3.1.2 (HA) + ZooKeeper 3.4.13 + HBase 1.4.9 (HA) + Hive 2.3.4 + Spark 2.4.0 (HA) high-availability cluster.
  
3. Installing Sqoop 1.4.7
  
3.1. Modify the Sqoop Configuration Files
  
3.1.1. Modify sqoop-env.sh
  
Create the file /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/conf/sqoop-env.sh and save it with empty contents. Nothing needs to go in it because the Sqoop environment variables were already configured in the environment-variable section of this article, and the Hive and Hadoop variables were configured in the high-availability cluster article mentioned above:
  
  [root@c0 ~]# echo "" > /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/conf/sqoop-env.sh
  
Copy hive-hcatalog-core-2.3.4.jar, mysql-connector-java-5.1.47-bin.jar, hive-common-2.3.4.jar, and libthrift-0.9.3.jar from the /home/work/_app/hive-2.3.4/lib/ directory into /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/:
  
  [root@c0 ~]# cp /home/work/_app/hive-2.3.4/lib/hive-hcatalog-core-2.3.4.jar /home/work/_app/hive-2.3.4/lib/mysql-connector-java-5.1.47-bin.jar /home/work/_app/hive-2.3.4/lib/libthrift-0.9.3.jar /home/work/_app/hive-2.3.4/lib/hive-common-2.3.4.jar /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/
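A quick listing confirms that the four jars are now visible to Sqoop (the MySQL JDBC driver in particular is needed for every connection used later):

[root@c0 ~]# ls /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/ | grep -E 'mysql-connector|hive-common|hive-hcatalog-core|libthrift'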
  
3.1.2. Modify configure-sqoop
  
Edit /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/bin/configure-sqoop and save it with the following contents (compared with the stock script, the HCatalog and Accumulo detection and warning blocks are commented out, since those components are not used here):
  
  [root@c0 _src]# cat /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/bin/configure-sqoop
  
#!/bin/bash
#
# Copyright 2011 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is sourced in by bin/sqoop to set environment variables prior to
# invoking Hadoop.

bin="$1"

if [ -z "${bin}" ]; then
  bin=`dirname $0`
  bin=`cd ${bin} && pwd`
fi

if [ -z "$SQOOP_HOME" ]; then
  export SQOOP_HOME=${bin}/..
fi

SQOOP_CONF_DIR=${SQOOP_CONF_DIR:-${SQOOP_HOME}/conf}

if [ -f "${SQOOP_CONF_DIR}/sqoop-env.sh" ]; then
  . "${SQOOP_CONF_DIR}/sqoop-env.sh"
fi

# Find paths to our dependency systems. If they are unset, use CDH defaults.

if [ -z "${HADOOP_COMMON_HOME}" ]; then
  if [ -n "${HADOOP_HOME}" ]; then
    HADOOP_COMMON_HOME=${HADOOP_HOME}
  else
    if [ -d "/usr/lib/hadoop" ]; then
      HADOOP_COMMON_HOME=/usr/lib/hadoop
    else
      HADOOP_COMMON_HOME=${SQOOP_HOME}/../hadoop
    fi
  fi
fi

if [ -z "${HADOOP_MAPRED_HOME}" ]; then
  HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
  if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
    if [ -n "${HADOOP_HOME}" ]; then
      HADOOP_MAPRED_HOME=${HADOOP_HOME}
    else
      HADOOP_MAPRED_HOME=${SQOOP_HOME}/../hadoop-mapreduce
    fi
  fi
fi

# We are setting HADOOP_HOME to HADOOP_COMMON_HOME if it is not set
# so that hcat script works correctly on BigTop
if [ -z "${HADOOP_HOME}" ]; then
  if [ -n "${HADOOP_COMMON_HOME}" ]; then
    HADOOP_HOME=${HADOOP_COMMON_HOME}
    export HADOOP_HOME
  fi
fi

if [ -z "${HBASE_HOME}" ]; then
  if [ -d "/usr/lib/hbase" ]; then
    HBASE_HOME=/usr/lib/hbase
  else
    HBASE_HOME=${SQOOP_HOME}/../hbase
  fi
fi

#if [ -z "${HCAT_HOME}" ]; then
#  if [ -d "/usr/lib/hive-hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hive-hcatalog
#  elif [ -d "/usr/lib/hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hcatalog
#  else
#    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
#    if [ ! -d ${HCAT_HOME} ]; then
#       HCAT_HOME=${SQOOP_HOME}/../hcatalog
#    fi
#  fi
#fi

#if [ -z "${ACCUMULO_HOME}" ]; then
#  if [ -d "/usr/lib/accumulo" ]; then
#    ACCUMULO_HOME=/usr/lib/accumulo
#  else
#    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
#  fi
#fi

if [ -z "${ZOOKEEPER_HOME}" ]; then
  if [ -d "/usr/lib/zookeeper" ]; then
    ZOOKEEPER_HOME=/usr/lib/zookeeper
  else
    ZOOKEEPER_HOME=${SQOOP_HOME}/../zookeeper
  fi
fi

if [ -z "${HIVE_HOME}" ]; then
  if [ -d "/usr/lib/hive" ]; then
    export HIVE_HOME=/usr/lib/hive
  elif [ -d ${SQOOP_HOME}/../hive ]; then
    export HIVE_HOME=${SQOOP_HOME}/../hive
  fi
fi

# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_COMMON_HOME}" ]; then
  echo "Error: $HADOOP_COMMON_HOME does not exist!"
  echo 'Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.'
  exit 1
fi
if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
  echo "Error: $HADOOP_MAPRED_HOME does not exist!"
  echo 'Please set $HADOOP_MAPRED_HOME to the root of your Hadoop MapReduce installation.'
  exit 1
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
fi

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi

if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
fi

# Where to find the main Sqoop jar
SQOOP_JAR_DIR=$SQOOP_HOME

# If there's a "build" subdir, override with this, so we use
# the newly-compiled copy.
if [ -d "$SQOOP_JAR_DIR/build" ]; then
  SQOOP_JAR_DIR="${SQOOP_JAR_DIR}/build"
fi

function add_to_classpath() {
  dir=$1
  for f in $dir/*.jar; do
    SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
  done

  export SQOOP_CLASSPATH
}

# Add sqoop dependencies to classpath.
SQOOP_CLASSPATH=""
if [ -d "$SQOOP_HOME/lib" ]; then
  add_to_classpath $SQOOP_HOME/lib
fi

# Add HBase to dependency list
if [ -e "$HBASE_HOME/bin/hbase" ]; then
  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`$HBASE_HOME/bin/hbase classpath`
  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
fi

# Add HCatalog to dependency list
if [ -e "${HCAT_HOME}/bin/hcat" ]; then
  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
  if [ -z "${HIVE_CONF_DIR}" ]; then
    TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
  fi
  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
fi

# Add Accumulo to dependency list
if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
  done
  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
  done
fi

ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper}
if [ -d "${ZOOCFGDIR}" ]; then
  SQOOP_CLASSPATH=$ZOOCFGDIR:$SQOOP_CLASSPATH
fi

SQOOP_CLASSPATH=${SQOOP_CONF_DIR}:${SQOOP_CLASSPATH}

# If there's a build subdir, use Ivy-retrieved dependencies too.
if [ -d "$SQOOP_HOME/build/ivy/lib/sqoop" ]; then
  for f in $SQOOP_HOME/build/ivy/lib/sqoop/*/*.jar; do
    SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
  done
fi
add_to_classpath ${SQOOP_JAR_DIR}

HADOOP_CLASSPATH="${SQOOP_CLASSPATH}:${HADOOP_CLASSPATH}"
if [ ! -z "$SQOOP_USER_CLASSPATH" ]; then
  # User has elements to prepend to the classpath, forcibly overriding
  # Sqoop's own lib directories.
  export HADOOP_CLASSPATH="${SQOOP_USER_CLASSPATH}:${HADOOP_CLASSPATH}"
fi

export SQOOP_CLASSPATH
export SQOOP_CONF_DIR
export SQOOP_JAR_DIR
export HADOOP_CLASSPATH
export HADOOP_COMMON_HOME
export HADOOP_MAPRED_HOME
export HBASE_HOME
export HCAT_HOME
export HIVE_CONF_DIR
export ACCUMULO_HOME
export ZOOKEEPER_HOME
  
3.2. Check the Sqoop Version
  
  [root@c0 _src]# sqoop version
  
  2019-03-11 22:30:16,837 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  
  Sqoop 1.4.7
  
  git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
  
  Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
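Besides sqoop version, a simple connectivity check is to list the databases on the MySQL server, which also verifies that the JDBC driver copied in section 3.1.1 is picked up (a sketch using this cluster's host; -P prompts for the password interactively):

[root@c0 ~]# sqoop list-databases --connect jdbc:mysql://c0:3306/?useSSL=false --username root -P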
  
4. Running and Testing Sqoop Data Imports and Exports
  
4.1. Exporting Data from Hive to MySQL with Sqoop
  
In MySQL, create the database testmshk, grant privileges on it to the root user, and create the hive2mysql_mshk table:
  
  [root@c0 _src]# mysql -uroot -p123456
  
  mysql: [Warning] Using a password on the command line interface can be insecure.
  
  Welcome to the MySQL monitor.  Commands end with ; or \g.
  
  Your MySQL connection id is 21
  
  Server version: 5.7.25 MySQL Community Server (GPL)
  
  Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
  
  Oracle is a registered trademark of Oracle Corporation and/or its
  
  affiliates. Other names may be trademarks of their respective
  
  owners.
  
  Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
  
  mysql> CREATE DATABASE testmshk DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
  
  Query OK, 1 row affected (0.00 sec)
  
  mysql> show databases;
  
  +--------------------+
  
  | Database           |
  
  +--------------------+
  
  | information_schema |
  
  | hive               |
  
  | mysql              |
  
  | performance_schema |
  
  | sys                |
  
  | testmshk           |
  
  +--------------------+
  
  6 rows in set (0.02 sec)
  
  mysql> grant select,insert,update,delete,create on testmshk.* to root;
  
  Query OK, 0 rows affected (0.01 sec)
  
  mysql> flush  privileges;
  
  Query OK, 0 rows affected (0.01 sec)
  
  mysql> use testmshk;
  
  Database changed
  
  mysql> create table hive2mysql_mshk(id int,namea varchar(50),nameb varchar(50));
  
  Query OK, 0 rows affected (0.02 sec)
  
  mysql> quit;
  
  Bye
  
Query the contents of the MySQL table through Sqoop; at this point the table is empty:
  
  [root@c0 ~]# sqoop eval --connect  jdbc:mysql://c0:3306/testmshk?useSSL=false  --username root --password 123456 --query "select * from hive2mysql_mshk"
  
  2019-03-11 23:44:06,894 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  
  2019-03-11 23:44:06,945 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  
  2019-03-11 23:44:07,100 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  
  -------------------------------------------------------------
  
  | id          | namea                | nameb                |
  
  -------------------------------------------------------------
  
  -------------------------------------------------------------
  
In the high-availability cluster article mentioned above, we created the test data /hive/warehouse/testtable/testdata001.dat while testing Hive. We now export that data into MySQL.
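For reference, the test file holds two comma-separated rows laid out as (id, namea, nameb), matching the columns of the target table; viewed from HDFS it is assumed to look like this:

[root@c0 ~]# hdfs dfs -cat /hive/warehouse/testtable/testdata001.dat
10086,my.mshk.top,you.mshk.top
12306,mname,yname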
  
  [root@c0 ~]# sqoop export --connect  jdbc:mysql://c0:3306/testmshk?useSSL=false  --username root --password 123456 --table hive2mysql_mshk  --export-dir /hive/warehouse/testtable/testdata001.dat  --input-fields-terminated-by ','
  
  2019-03-11 23:47:10,400 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  
  2019-03-11 23:47:10,437 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  
  2019-03-11 23:47:10,571 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  
  2019-03-11 23:47:10,574 INFO tool.CodeGenTool: Beginning code generation
  
  2019-03-11 23:47:10,914 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1
  
  2019-03-11 23:47:10,943 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1
  
  2019-03-11 23:47:10,952 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/work/_app/hadoop-3.1.2
  
  Note: /tmp/sqoop-root/compile/9ea7f54fe87f35ed071ed75c293f25d8/hive2mysql_mshk.java uses or overrides a deprecated API.
  
  Note: Recompile with -Xlint:deprecation for details.
  
2019-03-11 23:47:12,652 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/9ea7f54fe87f35ed071ed75c293f25d8/hive2mysql_mshk.jar
  
  2019-03-11 23:47:12,669 INFO mapreduce.ExportJobBase: Beginning export of hive2mysql_mshk
  
  2019-03-11 23:47:12,669 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
  
  2019-03-11 23:47:12,804 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  
  2019-03-11 23:47:14,106 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
  
  2019-03-11 23:47:14,112 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
  
  2019-03-11 23:47:14,112 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  
  2019-03-11 23:47:14,479 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
  
  2019-03-11 23:47:14,808 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1552315366846_0003
  
2019-03-11 23:47:16,429 INFO input.FileInputFormat: Total input files to process : 1

2019-03-11 23:47:16,432 INFO input.FileInputFormat: Total input files to process : 1
  
  2019-03-11 23:47:16,513 INFO mapreduce.JobSubmitter: number of splits:4
  
  2019-03-11 23:47:16,577 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
  
  2019-03-11 23:47:16,684 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552315366846_0003
  
  2019-03-11 23:47:16,686 INFO mapreduce.JobSubmitter: Executing with tokens: []
  
  2019-03-11 23:47:16,924 INFO conf.Configuration: resource-types.xml not found
  
  2019-03-11 23:47:16,924 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
  
  2019-03-11 23:47:17,213 INFO impl.YarnClientImpl: Submitted application application_1552315366846_0003
  
  2019-03-11 23:47:17,261 INFO mapreduce.Job: The url to track the job: http://c1:8088/proxy/application_1552315366846_0003/
  
  2019-03-11 23:47:17,262 INFO mapreduce.Job: Running job: job_1552315366846_0003
  
  2019-03-11 23:47:23,359 INFO mapreduce.Job: Job job_1552315366846_0003 running in uber mode : false
  
  2019-03-11 23:47:23,360 INFO mapreduce.Job:  map 0% reduce 0%
  
  2019-03-11 23:47:31,454 INFO mapreduce.Job:  map 75% reduce 0%
  
  2019-03-11 23:47:32,462 INFO mapreduce.Job:  map 100% reduce 0%
  
2019-03-11 23:47:32,473 INFO mapreduce.Job: Job job_1552315366846_0003 completed successfully
  
  2019-03-11 23:47:32,619 INFO mapreduce.Job: Counters: 32
  
  File System Counters
  
  FILE: Number of bytes read=0
  
  FILE: Number of bytes written=913424
  
  FILE: Number of read operations=0
  
  FILE: Number of large read operations=0
  
  FILE: Number of write operations=0
  
  HDFS: Number of bytes read=782
  
  HDFS: Number of bytes written=0
  
  HDFS: Number of read operations=19
  
  HDFS: Number of large read operations=0
  
  HDFS: Number of write operations=0
  
  Job Counters
  
  Launched map tasks=4
  
  Data-local map tasks=4
  
  Total time spent by all maps in occupied slots (ms)=23446
  
  Total time spent by all reduces in occupied slots (ms)=0
  
  Total time spent by all map tasks (ms)=23446
  
  Total vcore-milliseconds taken by all map tasks=23446
  
  Total megabyte-milliseconds taken by all map tasks=24008704
  
  Map-Reduce Framework
  
  Map input records=2
  
  Map output records=2
  
  Input split bytes=636
  
  Spilled Records=0
  
  Failed Shuffles=0
  
  Merged Map outputs=0
  
  GC time elapsed (ms)=582
  
  CPU time spent (ms)=3960
  
  Physical memory (bytes) snapshot=830259200
  
  Virtual memory (bytes) snapshot=11165683712
  
  Total committed heap usage (bytes)=454557696
  
  Peak Map Physical memory (bytes)=208502784
  
  Peak Map Virtual memory (bytes)=2793611264
  
  File Input Format Counters
  
  Bytes Read=0
  
  File Output Format Counters
  
  Bytes Written=0
  
  2019-03-11 23:47:32,626 INFO mapreduce.ExportJobBase: Transferred 782 bytes in 18.5015 seconds (42.2668 bytes/sec)
  
  2019-03-11 23:47:32,629 INFO mapreduce.ExportJobBase: Exported 2 records.
  
--export-dir is the HDFS location of the Hive table's data files

--input-fields-terminated-by is the field separator used to parse the input
  
Query MySQL again through Sqoop; the data has been imported successfully:
  
[root@c0 ~]# sqoop eval --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --query "select * from hive2mysql_mshk"
  
  2019-03-11 23:48:56,848 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  
  2019-03-11 23:48:56,884 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  
  2019-03-11 23:48:57,024 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  
  -------------------------------------------------------------
  
  | id          | namea                | nameb                |
  
  -------------------------------------------------------------
  
  | 10086       | my.mshk.top          | you.mshk.top         |
  
  | 12306       | mname                | yname                |
  
  -------------------------------------------------------------
  
In MySQL you can now see that the hive2mysql_mshk table we created contains 2 rows of data.
  
4.2. Importing Data from MySQL into Hive with Sqoop
  
The hive2mysql_mshk table we just created has no primary key; we simply added some records to MySQL from Hive.

By default, Sqoop identifies the primary key column of a table (if there is one) and uses it as the split column.

The low and high values of the split column are retrieved from the database, and the map tasks operate on evenly sized slices of the total range.

If the actual values of the primary key are not uniformly distributed across that range, the map tasks can end up unbalanced.

In that case you should explicitly choose a different column with the --split-by argument, for example --split-by id.

The Sqoop command that imports the MySQL data into Hive adds a few more parameters:
  
  sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --split-by id  --table hive2mysql_mshk --target-dir /hive/warehouse/mysql2hive_mshk  --fields-terminated-by "," --hive-import --hive-table testmshk.mysql2hive_mshk
  
--split-by <column-name> the column used to split the work across mappers

--table the MySQL table to import from

--target-dir <dir> the HDFS directory to store the data in

--hive-import import the table into Hive

--hive-overwrite overwrite existing data in the Hive table (see the re-run sketch after this list)

--hive-table <table-name> the table name to use when importing into Hive

--fields-terminated-by <char> the field separator
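The command above creates the Hive table on the first run. If the import is repeated, --hive-overwrite replaces the existing rows instead of appending, and --delete-target-dir clears the intermediate HDFS directory so the job does not abort because it already exists. A re-runnable variant might look like this sketch:

sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false \
  --username root -P \
  --table hive2mysql_mshk --split-by id \
  --target-dir /hive/warehouse/mysql2hive_mshk \
  --delete-target-dir \
  --fields-terminated-by "," \
  --hive-import --hive-overwrite \
  --hive-table testmshk.mysql2hive_mshk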
  
Running the command kicks off a MapReduce job; the tail of its log is shown below.
  
  2019-03-12 20:21:37,154 INFO mapreduce.ImportJobBase: Transferred 49 bytes in 27.4776 seconds (1.7833 bytes/sec)
  
  2019-03-12 20:21:37,159 INFO mapreduce.ImportJobBase: Retrieved 2 records.
  
  2019-03-12 20:21:37,159 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table hive2mysql_mshk
  
  2019-03-12 20:21:37,188 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1
  
  2019-03-12 20:21:37,208 INFO hive.HiveImport: Loading uploaded data into Hive
  
2019-03-12 20:21:37,220 INFO conf.HiveConf: Found configuration file file:/home/work/_app/hive-2.3.4/conf/hive-site.xml
  
  2019-03-12 20:21:49,491 INFO hive.HiveImport:
  
  2019-03-12 20:21:49,492 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/work/_app/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
  
  2019-03-12 20:21:56,558 INFO hive.HiveImport: OK
  
  2019-03-12 20:21:56,561 INFO hive.HiveImport: Time taken: 5.954 seconds
  
  2019-03-12 20:21:57,005 INFO hive.HiveImport: Loading data to table testmshk.mysql2hive_mshk
  
  2019-03-12 20:21:58,181 INFO hive.HiveImport: OK
  
  2019-03-12 20:21:58,181 INFO hive.HiveImport: Time taken: 1.619 seconds
  
  2019-03-12 20:21:58,681 INFO hive.HiveImport: Hive import complete.
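To confirm that the rows landed in Hive, the table can be queried straight from the command line; it should return the same two rows that were exported from Hive earlier:

[root@c0 ~]# hive -e "select * from testmshk.mysql2hive_mshk;"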
  
4.3. Importing Data from MySQL into HBase with Sqoop

Next we import the data from the MySQL table hive2mysql_mshk into HBase, creating the table mysql2hase_mshk in HBase at the same time:
  
[root@c0 ~]# sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --split-by id --table hive2mysql_mshk --hbase-table mysql2hase_mshk --hbase-create-table --hbase-row-key id --column-family id
  
  2019-03-13 12:04:33,647 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  
  2019-03-13 12:04:33,694 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  
  2019-03-13 12:04:33,841 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  
2019-03-13 12:04:33,841 INFO tool.CodeGenTool: Beginning code generation
  
  2019-03-13 12:04:34,162 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1
  
  2019-03-13 12:04:34,197 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1
  
  hbase(main):004:0>
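After the job finishes, the result can be verified from hbase shell. With --hbase-row-key id and --column-family id as in the command above, each MySQL row becomes one HBase row keyed by id, with namea and nameb stored as columns in the id column family. A quick check might look like this sketch:

list 'mysql2hase_mshk'
scan 'mysql2hase_mshk'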
  
For how to move data back and forth between HBase and Hive, see sections 9.2.4 and 9.2.5 of the high-availability cluster article mentioned above.
