Importing and Exporting Data Between MySQL 5.7, Hive 2.3.4, and HBase 1.4.9 with Sqoop 1.4.7

Table of Contents

1. What is Sqoop?

2. Downloading Sqoop and Configuring Environment Variables

2.1. Download Sqoop 1.4.7

2.2. Set Environment Variables

2.3. Prepare the Required Environment

3. Installing Sqoop 1.4.7

3.1. Modify the Sqoop Configuration Files

3.1.1. Modify sqoop-env.sh

3.1.2. Modify configure-sqoop

3.2. Check the Sqoop Version

4. Running and Testing Sqoop Data Imports and Exports

4.1. Exporting Data from Hive to MySQL with Sqoop

4.2. Importing Data from MySQL into Hive with Sqoop

4.3. Importing Data from MySQL into HBase with Sqoop
  
  
1. What is Sqoop?
  
Sqoop is a tool for transferring data between Hadoop and relational databases or mainframes.

You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data with Hadoop MapReduce, and then export it back into the RDBMS.

Sqoop automates most of this process, relying on the database to describe the schema of the data being imported. Sqoop uses MapReduce to import and export the data, which provides parallelism and fault tolerance.
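To get a feel for the workflow before installing anything, a minimal single-table import into HDFS looks roughly like the following sketch (host, database, table, and target directory are placeholders; the concrete commands for this cluster appear in section 4):

sqoop import \
  --connect jdbc:mysql://<mysql-host>:3306/<database> \
  --username <user> -P \
  --table <table> \
  --target-dir /user/<user>/<table> \
  --num-mappers 4

Sqoop generates a MapReduce job with (here) 4 map tasks, each pulling a slice of the table in parallel and writing it to the target directory.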
  
2. Downloading Sqoop and Configuring Environment Variables
  
2.1. Download Sqoop 1.4.7
  
Download Sqoop with the following commands, extract it, and move it into the /home/work/_app/ directory:
  
  [root@c0 _src]# pwd
  
  /home/work/_src
  
  [root@c0 _src]# wget http://mirrors.shu.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
  
  [root@c0 _src]# tar -xzvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
  
  [root@c0 _src]# mv sqoop-1.4.7.bin__hadoop-2.6.0 /home/work/_app/
  
2.2. Set Environment Variables
  
Set the Sqoop environment variables on every machine by running the following commands:
  
  echo "" >> /etc/bashrc
  
  echo "# Sqoop 1.4.7" >> /etc/bashrc
  
  echo "export SQOOP_HOME=/home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0" >> /etc/bashrc
  
  echo "" >> /etc/bashrc
  
  echo "# Path" >> /etc/bashrc
  
  echo "export PATH=\$PATH:\$SQOOP_HOME/bin" >> /etc/bashrc
  
  source /etc/bashrc
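After sourcing /etc/bashrc, a quick check on each node confirms that the variable is set and that the sqoop launcher is on the PATH:

echo $SQOOP_HOME
which sqoop

The first command should print /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0 and the second should resolve to the bin/sqoop script under it.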
  
2.3. Prepare the Required Environment
  
Installing and running Sqoop requires Hive, MySQL, and Hadoop to already be in place. See the companion article on building the Hadoop 3.1.2 (HA) + ZooKeeper 3.4.13 + HBase 1.4.9 (HA) + Hive 2.3.4 + Spark 2.4.0 (HA) high-availability cluster.
  
3. Installing Sqoop 1.4.7
  
3.1. Modify the Sqoop Configuration Files
  
3.1.1. Modify sqoop-env.sh
  
Create the file /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/conf/sqoop-env.sh and save it with empty contents. Nothing needs to go in it because the Sqoop environment variables were already configured in the environment-variable section of this article, and the Hive and Hadoop variables were configured in the high-availability cluster article mentioned above:
  
  [root@c0 ~]# echo "" > /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/conf/sqoop-env.sh
  
Copy hive-hcatalog-core-2.3.4.jar, mysql-connector-java-5.1.47-bin.jar, hive-common-2.3.4.jar, and libthrift-0.9.3.jar from the /home/work/_app/hive-2.3.4/lib/ directory into /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/:
  
  [root@c0 ~]# cp /home/work/_app/hive-2.3.4/lib/hive-hcatalog-core-2.3.4.jar /home/work/_app/hive-2.3.4/lib/mysql-connector-java-5.1.47-bin.jar /home/work/_app/hive-2.3.4/lib/libthrift-0.9.3.jar /home/work/_app/hive-2.3.4/lib/hive-common-2.3.4.jar /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/
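A quick listing confirms that the four jars are now visible to Sqoop (the MySQL JDBC driver in particular is needed for every connection used later):

[root@c0 ~]# ls /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/ | grep -E 'mysql-connector|hive-common|hive-hcatalog-core|libthrift'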
  
3.1.2. Modify configure-sqoop
  
Edit /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/bin/configure-sqoop and save it with the following contents (compared with the stock script, the HCatalog and Accumulo detection and warning blocks are commented out, since those components are not used here):
  
  [root@c0 _src]# cat /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/bin/configure-sqoop
  
#!/bin/bash
#
# Copyright 2011 The Apache Software Foundation
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is sourced in by bin/sqoop to set environment variables prior to
# invoking Hadoop.

bin="$1"

if [ -z "${bin}" ]; then
  bin=`dirname $0`
  bin=`cd ${bin} && pwd`
fi

if [ -z "$SQOOP_HOME" ]; then
  export SQOOP_HOME=${bin}/..
fi

SQOOP_CONF_DIR=${SQOOP_CONF_DIR:-${SQOOP_HOME}/conf}

if [ -f "${SQOOP_CONF_DIR}/sqoop-env.sh" ]; then
  . "${SQOOP_CONF_DIR}/sqoop-env.sh"
fi

# Find paths to our dependency systems. If they are unset, use CDH defaults.

if [ -z "${HADOOP_COMMON_HOME}" ]; then
  if [ -n "${HADOOP_HOME}" ]; then
    HADOOP_COMMON_HOME=${HADOOP_HOME}
  else
    if [ -d "/usr/lib/hadoop" ]; then
      HADOOP_COMMON_HOME=/usr/lib/hadoop
    else
      HADOOP_COMMON_HOME=${SQOOP_HOME}/../hadoop
    fi
  fi
fi

if [ -z "${HADOOP_MAPRED_HOME}" ]; then
  HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
  if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
    if [ -n "${HADOOP_HOME}" ]; then
      HADOOP_MAPRED_HOME=${HADOOP_HOME}
    else
      HADOOP_MAPRED_HOME=${SQOOP_HOME}/../hadoop-mapreduce
    fi
  fi
fi

# We are setting HADOOP_HOME to HADOOP_COMMON_HOME if it is not set
# so that hcat script works correctly on BigTop
if [ -z "${HADOOP_HOME}" ]; then
  if [ -n "${HADOOP_COMMON_HOME}" ]; then
    HADOOP_HOME=${HADOOP_COMMON_HOME}
    export HADOOP_HOME
  fi
fi

if [ -z "${HBASE_HOME}" ]; then
  if [ -d "/usr/lib/hbase" ]; then
    HBASE_HOME=/usr/lib/hbase
  else
    HBASE_HOME=${SQOOP_HOME}/../hbase
  fi
fi

#if [ -z "${HCAT_HOME}" ]; then
#  if [ -d "/usr/lib/hive-hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hive-hcatalog
#  elif [ -d "/usr/lib/hcatalog" ]; then
#    HCAT_HOME=/usr/lib/hcatalog
#  else
#    HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog
#    if [ ! -d ${HCAT_HOME} ]; then
#       HCAT_HOME=${SQOOP_HOME}/../hcatalog
#    fi
#  fi
#fi

#if [ -z "${ACCUMULO_HOME}" ]; then
#  if [ -d "/usr/lib/accumulo" ]; then
#    ACCUMULO_HOME=/usr/lib/accumulo
#  else
#    ACCUMULO_HOME=${SQOOP_HOME}/../accumulo
#  fi
#fi

if [ -z "${ZOOKEEPER_HOME}" ]; then
  if [ -d "/usr/lib/zookeeper" ]; then
    ZOOKEEPER_HOME=/usr/lib/zookeeper
  else
    ZOOKEEPER_HOME=${SQOOP_HOME}/../zookeeper
  fi
fi

if [ -z "${HIVE_HOME}" ]; then
  if [ -d "/usr/lib/hive" ]; then
    export HIVE_HOME=/usr/lib/hive
  elif [ -d ${SQOOP_HOME}/../hive ]; then
    export HIVE_HOME=${SQOOP_HOME}/../hive
  fi
fi

# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_COMMON_HOME}" ]; then
  echo "Error: $HADOOP_COMMON_HOME does not exist!"
  echo 'Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.'
  exit 1
fi
if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then
  echo "Error: $HADOOP_MAPRED_HOME does not exist!"
  echo 'Please set $HADOOP_MAPRED_HOME to the root of your Hadoop MapReduce installation.'
  exit 1
fi

## Moved to be a runtime check in sqoop.
if [ ! -d "${HBASE_HOME}" ]; then
  echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
fi

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi

if [ ! -d "${ZOOKEEPER_HOME}" ]; then
  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
fi

# Where to find the main Sqoop jar
SQOOP_JAR_DIR=$SQOOP_HOME

# If there's a "build" subdir, override with this, so we use
# the newly-compiled copy.
if [ -d "$SQOOP_JAR_DIR/build" ]; then
  SQOOP_JAR_DIR="${SQOOP_JAR_DIR}/build"
fi

function add_to_classpath() {
  dir=$1
  for f in $dir/*.jar; do
    SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
  done

  export SQOOP_CLASSPATH
}

# Add sqoop dependencies to classpath.
SQOOP_CLASSPATH=""
if [ -d "$SQOOP_HOME/lib" ]; then
  add_to_classpath $SQOOP_HOME/lib
fi

# Add HBase to dependency list
if [ -e "$HBASE_HOME/bin/hbase" ]; then
  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`$HBASE_HOME/bin/hbase classpath`
  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
fi

# Add HCatalog to dependency list
if [ -e "${HCAT_HOME}/bin/hcat" ]; then
  TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`
  if [ -z "${HIVE_CONF_DIR}" ]; then
    TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}
  fi
  SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}
fi

# Add Accumulo to dependency list
if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then
  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do
    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
  done
  for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do
    SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn
  done
fi

ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper}
if [ -d "${ZOOCFGDIR}" ]; then
  SQOOP_CLASSPATH=$ZOOCFGDIR:$SQOOP_CLASSPATH
fi

SQOOP_CLASSPATH=${SQOOP_CONF_DIR}:${SQOOP_CLASSPATH}

# If there's a build subdir, use Ivy-retrieved dependencies too.
if [ -d "$SQOOP_HOME/build/ivy/lib/sqoop" ]; then
  for f in $SQOOP_HOME/build/ivy/lib/sqoop/*/*.jar; do
    SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;
  done
fi
add_to_classpath ${SQOOP_JAR_DIR}

HADOOP_CLASSPATH="${SQOOP_CLASSPATH}:${HADOOP_CLASSPATH}"
if [ ! -z "$SQOOP_USER_CLASSPATH" ]; then
  # User has elements to prepend to the classpath, forcibly overriding
  # Sqoop's own lib directories.
  export HADOOP_CLASSPATH="${SQOOP_USER_CLASSPATH}:${HADOOP_CLASSPATH}"
fi

export SQOOP_CLASSPATH
export SQOOP_CONF_DIR
export SQOOP_JAR_DIR
export HADOOP_CLASSPATH
export HADOOP_COMMON_HOME
export HADOOP_MAPRED_HOME
export HBASE_HOME
export HCAT_HOME
export HIVE_CONF_DIR
export ACCUMULO_HOME
export ZOOKEEPER_HOME
  
3.2. Check the Sqoop Version
  
  [root@c0 _src]# sqoop version
  
  2019-03-11 22:30:16,837 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  
  Sqoop 1.4.7
  
  git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
  
  Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
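Besides sqoop version, a simple connectivity check is to list the databases on the MySQL server, which also verifies that the JDBC driver copied in section 3.1.1 is picked up (a sketch using this cluster's host; -P prompts for the password interactively):

[root@c0 ~]# sqoop list-databases --connect jdbc:mysql://c0:3306/?useSSL=false --username root -P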
  
4. Running and Testing Sqoop Data Imports and Exports
  
4.1. Exporting Data from Hive to MySQL with Sqoop
  
In MySQL, create the database testmshk, grant privileges on it to the root user, and create the hive2mysql_mshk table:
  
  [root@c0 _src]# mysql -uroot -p123456
  
  mysql: [Warning] Using a password on the command line interface can be insecure.
  
  Welcome to the MySQL monitor.  Commands end with ; or \g.
  
  Your MySQL connection id is 21
  
  Server version: 5.7.25 MySQL Community Server (GPL)
  
  Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
  
  Oracle is a registered trademark of Oracle Corporation and/or its
  
  affiliates. Other names may be trademarks of their respective
  
  owners.
  
  Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
  
  mysql> CREATE DATABASE testmshk DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
  
  Query OK, 1 row affected (0.00 sec)
  
  mysql> show databases;
  
  +--------------------+
  
  | Database           |
  
  +--------------------+
  
  | information_schema |
  
  | hive               |
  
  | mysql              |
  
  | performance_schema |
  
  | sys                |
  
  | testmshk           |
  
  +--------------------+
  
  6 rows in set (0.02 sec)
  
  mysql> grant select,insert,update,delete,create on testmshk.* to root;
  
  Query OK, 0 rows affected (0.01 sec)
  
  mysql> flush  privileges;
  
  Query OK, 0 rows affected (0.01 sec)
  
  mysql> use testmshk;
  
  Database changed
  
  mysql> create table hive2mysql_mshk(id int,namea varchar(50),nameb varchar(50));
  
  Query OK, 0 rows affected (0.02 sec)
  
  mysql> quit;
  
  Bye
  
Query the contents of the MySQL table through Sqoop; at this point the table is empty:
  
  [root@c0 ~]# sqoop eval --connect  jdbc:mysql://c0:3306/testmshk?useSSL=false  --username root --password 123456 --query "select * from hive2mysql_mshk"
  
  2019-03-11 23:44:06,894 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  
  2019-03-11 23:44:06,945 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  
  2019-03-11 23:44:07,100 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  
  -------------------------------------------------------------
  
  | id          | namea                | nameb                |
  
  -------------------------------------------------------------
  
  -------------------------------------------------------------
  
In the high-availability cluster article mentioned above, we created the test data /hive/warehouse/testtable/testdata001.dat while testing Hive. We now export that data into MySQL.
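For reference, the test file holds two comma-separated rows laid out as (id, namea, nameb), matching the columns of the target table; viewed from HDFS it is assumed to look like this:

[root@c0 ~]# hdfs dfs -cat /hive/warehouse/testtable/testdata001.dat
10086,my.mshk.top,you.mshk.top
12306,mname,yname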
  
  [root@c0 ~]# sqoop export --connect  jdbc:mysql://c0:3306/testmshk?useSSL=false  --username root --password 123456 --table hive2mysql_mshk  --export-dir /hive/warehouse/testtable/testdata001.dat  --input-fields-terminated-by ','
  
  2019-03-11 23:47:10,400 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  
  2019-03-11 23:47:10,437 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  
  2019-03-11 23:47:10,571 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  
  2019-03-11 23:47:10,574 INFO tool.CodeGenTool: Beginning code generation
  
  2019-03-11 23:47:10,914 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1
  
  2019-03-11 23:47:10,943 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1
  
  2019-03-11 23:47:10,952 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/work/_app/hadoop-3.1.2
  
  Note: /tmp/sqoop-root/compile/9ea7f54fe87f35ed071ed75c293f25d8/hive2mysql_mshk.java uses or overrides a deprecated API.
  
  Note: Recompile with -Xlint:deprecation for details.
  
2019-03-11 23:47:12,652 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/9ea7f54fe87f35ed071ed75c293f25d8/hive2mysql_mshk.jar
  
  2019-03-11 23:47:12,669 INFO mapreduce.ExportJobBase: Beginning export of hive2mysql_mshk
  
  2019-03-11 23:47:12,669 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
  
  2019-03-11 23:47:12,804 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
  
  2019-03-11 23:47:14,106 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
  
  2019-03-11 23:47:14,112 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
  
  2019-03-11 23:47:14,112 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
  
  2019-03-11 23:47:14,479 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
  
  2019-03-11 23:47:14,808 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1552315366846_0003
  
2019-03-11 23:47:16,429 INFO input.FileInputFormat: Total input files to process : 1

2019-03-11 23:47:16,432 INFO input.FileInputFormat: Total input files to process : 1
  
  2019-03-11 23:47:16,513 INFO mapreduce.JobSubmitter: number of splits:4
  
  2019-03-11 23:47:16,577 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
  
  2019-03-11 23:47:16,684 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552315366846_0003
  
  2019-03-11 23:47:16,686 INFO mapreduce.JobSubmitter: Executing with tokens: []
  
  2019-03-11 23:47:16,924 INFO conf.Configuration: resource-types.xml not found
  
  2019-03-11 23:47:16,924 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
  
  2019-03-11 23:47:17,213 INFO impl.YarnClientImpl: Submitted application application_1552315366846_0003
  
  2019-03-11 23:47:17,261 INFO mapreduce.Job: The url to track the job: http://c1:8088/proxy/application_1552315366846_0003/
  
  2019-03-11 23:47:17,262 INFO mapreduce.Job: Running job: job_1552315366846_0003
  
  2019-03-11 23:47:23,359 INFO mapreduce.Job: Job job_1552315366846_0003 running in uber mode : false
  
  2019-03-11 23:47:23,360 INFO mapreduce.Job:  map 0% reduce 0%
  
  2019-03-11 23:47:31,454 INFO mapreduce.Job:  map 75% reduce 0%
  
  2019-03-11 23:47:32,462 INFO mapreduce.Job:  map 100% reduce 0%
  
2019-03-11 23:47:32,473 INFO mapreduce.Job: Job job_1552315366846_0003 completed successfully
  
  2019-03-11 23:47:32,619 INFO mapreduce.Job: Counters: 32
  
  File System Counters
  
  FILE: Number of bytes read=0
  
  FILE: Number of bytes written=913424
  
  FILE: Number of read operations=0
  
  FILE: Number of large read operations=0
  
  FILE: Number of write operations=0
  
  HDFS: Number of bytes read=782
  
  HDFS: Number of bytes written=0
  
  HDFS: Number of read operations=19
  
  HDFS: Number of large read operations=0
  
  HDFS: Number of write operations=0
  
  Job Counters
  
  Launched map tasks=4
  
  Data-local map tasks=4
  
  Total time spent by all maps in occupied slots (ms)=23446
  
  Total time spent by all reduces in occupied slots (ms)=0
  
  Total time spent by all map tasks (ms)=23446
  
  Total vcore-milliseconds taken by all map tasks=23446
  
  Total megabyte-milliseconds taken by all map tasks=24008704
  
  Map-Reduce Framework
  
  Map input records=2
  
  Map output records=2
  
  Input split bytes=636
  
  Spilled Records=0
  
  Failed Shuffles=0
  
  Merged Map outputs=0
  
  GC time elapsed (ms)=582
  
  CPU time spent (ms)=3960
  
  Physical memory (bytes) snapshot=830259200
  
  Virtual memory (bytes) snapshot=11165683712
  
  Total committed heap usage (bytes)=454557696
  
  Peak Map Physical memory (bytes)=208502784
  
  Peak Map Virtual memory (bytes)=2793611264
  
  File Input Format Counters
  
  Bytes Read=0
  
  File Output Format Counters
  
  Bytes Written=0
  
  2019-03-11 23:47:32,626 INFO mapreduce.ExportJobBase: Transferred 782 bytes in 18.5015 seconds (42.2668 bytes/sec)
  
  2019-03-11 23:47:32,629 INFO mapreduce.ExportJobBase: Exported 2 records.
  
--export-dir is the HDFS location of the Hive table's data files

--input-fields-terminated-by is the field separator used to parse the input
  
Query MySQL again through Sqoop; the data has been imported successfully:
  
[root@c0 ~]# sqoop eval --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --query "select * from hive2mysql_mshk"
  
  2019-03-11 23:48:56,848 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  
  2019-03-11 23:48:56,884 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  
  2019-03-11 23:48:57,024 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  
  -------------------------------------------------------------
  
  | id          | namea                | nameb                |
  
  -------------------------------------------------------------
  
  | 10086       | my.mshk.top          | you.mshk.top         |
  
  | 12306       | mname                | yname                |
  
  -------------------------------------------------------------
  
In MySQL you can now see that the hive2mysql_mshk table we created contains 2 rows of data.
  
4.2. Importing Data from MySQL into Hive with Sqoop
  
The hive2mysql_mshk table we just created has no primary key; we simply added some records to MySQL from Hive.

By default, Sqoop identifies the primary key column of a table (if there is one) and uses it as the split column.

The low and high values of the split column are retrieved from the database, and the map tasks operate on evenly sized slices of the total range.

If the actual values of the primary key are not uniformly distributed across that range, the map tasks can end up unbalanced.

In that case you should explicitly choose a different column with the --split-by argument, for example --split-by id.

The Sqoop command that imports the MySQL data into Hive adds a few more parameters:
  
  sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --split-by id  --table hive2mysql_mshk --target-dir /hive/warehouse/mysql2hive_mshk  --fields-terminated-by "," --hive-import --hive-table testmshk.mysql2hive_mshk
  
--split-by <column-name> the column used to split the work across mappers

--table the MySQL table to import from

--target-dir <dir> the HDFS directory to store the data in

--hive-import import the table into Hive

--hive-overwrite overwrite existing data in the Hive table (see the re-run sketch after this list)

--hive-table <table-name> the table name to use when importing into Hive

--fields-terminated-by <char> the field separator
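The command above creates the Hive table on the first run. If the import is repeated, --hive-overwrite replaces the existing rows instead of appending, and --delete-target-dir clears the intermediate HDFS directory so the job does not abort because it already exists. A re-runnable variant might look like this sketch:

sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false \
  --username root -P \
  --table hive2mysql_mshk --split-by id \
  --target-dir /hive/warehouse/mysql2hive_mshk \
  --delete-target-dir \
  --fields-terminated-by "," \
  --hive-import --hive-overwrite \
  --hive-table testmshk.mysql2hive_mshk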
  
Running the command kicks off a MapReduce job; the tail of its log is shown below.
  
  2019-03-12 20:21:37,154 INFO mapreduce.ImportJobBase: Transferred 49 bytes in 27.4776 seconds (1.7833 bytes/sec)
  
  2019-03-12 20:21:37,159 INFO mapreduce.ImportJobBase: Retrieved 2 records.
  
  2019-03-12 20:21:37,159 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table hive2mysql_mshk
  
  2019-03-12 20:21:37,188 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1
  
  2019-03-12 20:21:37,208 INFO hive.HiveImport: Loading uploaded data into Hive
  
2019-03-12 20:21:37,220 INFO conf.HiveConf: Found configuration file file:/home/work/_app/hive-2.3.4/conf/hive-site.xml
  
  2019-03-12 20:21:49,491 INFO hive.HiveImport:
  
  2019-03-12 20:21:49,492 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/work/_app/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
  
  2019-03-12 20:21:56,558 INFO hive.HiveImport: OK
  
  2019-03-12 20:21:56,561 INFO hive.HiveImport: Time taken: 5.954 seconds
  
  2019-03-12 20:21:57,005 INFO hive.HiveImport: Loading data to table testmshk.mysql2hive_mshk
  
  2019-03-12 20:21:58,181 INFO hive.HiveImport: OK
  
  2019-03-12 20:21:58,181 INFO hive.HiveImport: Time taken: 1.619 seconds
  
  2019-03-12 20:21:58,681 INFO hive.HiveImport: Hive import complete.
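To confirm that the rows landed in Hive, the table can be queried straight from the command line; it should return the same two rows that were exported from Hive earlier:

[root@c0 ~]# hive -e "select * from testmshk.mysql2hive_mshk;"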
  
4.3. Importing Data from MySQL into HBase with Sqoop

Next we import the data from the MySQL table hive2mysql_mshk into HBase, creating the table mysql2hase_mshk in HBase at the same time:
  
[root@c0 ~]# sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --split-by id --table hive2mysql_mshk --hbase-table mysql2hase_mshk --hbase-create-table --hbase-row-key id --column-family id
  
  2019-03-13 12:04:33,647 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
  
  2019-03-13 12:04:33,694 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
  
  2019-03-13 12:04:33,841 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
  
2019-03-13 12:04:33,841 INFO tool.CodeGenTool: Beginning code generation
  
  2019-03-13 12:04:34,162 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1
  
  2019-03-13 12:04:34,197 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1
  
  hbase(main):004:0>
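After the job finishes, the result can be verified from hbase shell. With --hbase-row-key id and --column-family id as in the command above, each MySQL row becomes one HBase row keyed by id, with namea and nameb stored as columns in the id column family. A quick check might look like this sketch:

list 'mysql2hase_mshk'
scan 'mysql2hase_mshk'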
  
For how to move data back and forth between HBase and Hive, see sections 9.2.4 and 9.2.5 of the high-availability cluster article mentioned above.
