Importing and Exporting Data Between MySQL 5.7, Hive 2.3.4, and HBase 1.4.9 with Sqoop 1.4.7

Table of Contents


1. What is Sqoop?

2. Downloading Sqoop and Configuring Environment Variables

2.1 Downloading Sqoop 1.4.7

2.2 Setting Environment Variables

2.3 Preparing the Installation Environment

3. Installing Sqoop 1.4.7

3.1 Modifying the Sqoop Configuration Files

3.1.1 Editing sqoop-env.sh

3.1.2 Editing configure-sqoop

3.2 Checking the Sqoop Version

4. Testing Sqoop Data Import and Export

4.1 Exporting Data from Hive to MySQL with Sqoop

4.2 Importing Data from MySQL into Hive with Sqoop

4.3 Importing Data from MySQL into HBase with Sqoop

1. What is Sqoop?

Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes.

You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data with Hadoop MapReduce, and then export it back into the RDBMS.

Sqoop automates most of this process, relying on the database to describe the schema of the data being imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.

2. Downloading Sqoop and Configuring Environment Variables

2.1 Downloading Sqoop 1.4.7

Download Sqoop with the following commands, then extract it and move it into the /home/work/_app/ directory:

[root@c0 _src]# pwd

/home/work/_src

[root@c0 _src]# wget http://mirrors.shu.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

[root@c0 _src]# tar -xzvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

[root@c0 _src]# mv sqoop-1.4.7.bin__hadoop-2.6.0 /home/work/_app/

2.2 Setting Environment Variables

Set the Sqoop environment variables on every machine by running the following commands:

echo "" >> /etc/bashrc

echo "# Sqoop 1.4.7" >> /etc/bashrc

echo "export SQOOP_HOME=/home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0" >> /etc/bashrc

echo "" >> /etc/bashrc

echo "# Path" >> /etc/bashrc

echo "export PATH=\$PATH:\$SQOOP_HOME/bin" >> /etc/bashrc

source /etc/bashrc
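The repeated echo calls above can be collapsed into a single heredoc append. A minimal sketch; BASHRC is a temporary stand-in for /etc/bashrc so the snippet can run anywhere:

```shell
# Append the Sqoop environment setup in one block instead of repeated
# echo calls. BASHRC stands in for /etc/bashrc in this sketch.
BASHRC=$(mktemp)

# The quoted 'EOF' keeps $PATH and $SQOOP_HOME literal in the file,
# exactly as the escaped echo version above does.
cat >> "$BASHRC" <<'EOF'

# Sqoop 1.4.7
export SQOOP_HOME=/home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0

# Path
export PATH=$PATH:$SQOOP_HOME/bin
EOF

# Confirm the block was written
grep -q 'SQOOP_HOME=/home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0' "$BASHRC" && echo "sqoop env appended"
```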

2.3 Preparing the Installation Environment

3. Installing Sqoop 1.4.7

3.1 Modifying the Sqoop Configuration Files

3.1.1 Editing sqoop-env.sh

Create the file /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/conf/sqoop-env.sh and leave its content empty, because the environment variables were already configured in the section above, and the Hive and Hadoop environment variables were configured in <Hadoop 3.1.2 (HA) + Zookeeper 3.4.13 + HBase 1.4.9 (HA) + Hive 2.3.4 + Spark 2.4.0 (HA) High-Availability Cluster Setup>:

[root@c0 ~]# echo "" > /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/conf/sqoop-env.sh

Copy hive-hcatalog-core-2.3.4.jar, mysql-connector-java-5.1.47-bin.jar, hive-common-2.3.4.jar, and libthrift-0.9.3.jar from /home/work/_app/hive-2.3.4/lib/ into /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/lib:

[root@c0 ~]# cp /home/work/_app/hive-2.3.4/lib/hive-hcatalog-core-2.3.4.jar /home/work/_app/hive-2.3.4/lib/mysql-connector-java-5.1.47-bin.jar /home/work/_app/hive-2.3.4/lib/libthrift-0.9.3.jar /home/work/_app/hive-2.3.4/lib/hive-common-2.3.4.jar /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/
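The same copy can be written as a loop over the four jar names, which makes a missing jar obvious. A sketch using temporary directories as stand-ins for the real Hive and Sqoop lib paths:

```shell
# Stand-in directories for the real Hive lib and Sqoop lib paths
HIVE_LIB=$(mktemp -d)
SQOOP_LIB=$(mktemp -d)

# Simulate the four jars shipped with Hive 2.3.4
JARS="hive-hcatalog-core-2.3.4.jar mysql-connector-java-5.1.47-bin.jar hive-common-2.3.4.jar libthrift-0.9.3.jar"
for j in $JARS; do touch "$HIVE_LIB/$j"; done

# Copy each jar into the Sqoop lib directory, reporting any that are missing
for j in $JARS; do
  cp "$HIVE_LIB/$j" "$SQOOP_LIB/" || echo "missing: $j"
done

ls "$SQOOP_LIB"
```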

3.1.2 Editing configure-sqoop

Edit and save /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/bin/configure-sqoop with the following content:

[root@c0 _src]# cat /home/work/_app/sqoop-1.4.7.bin__hadoop-2.6.0/bin/configure-sqoop

#!/bin/bash

#

# Copyright 2011 The Apache Software Foundation

#

# Licensed to the Apache Software Foundation (ASF) under one

# or more contributor license agreements. See the NOTICE file

# distributed with this work for additional information

# regarding copyright ownership. The ASF licenses this file

# to you under the Apache License, Version 2.0 (the

# "License"); you may not use this file except in compliance

# with the License. You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# This is sourced in by bin/sqoop to set environment variables prior to

# invoking Hadoop.

bin="$1"

if [ -z "${bin}" ]; then

bin=`dirname $0`

bin=`cd ${bin} && pwd`

fi

if [ -z "$SQOOP_HOME" ]; then

export SQOOP_HOME=${bin}/..

fi

SQOOP_CONF_DIR=${SQOOP_CONF_DIR:-${SQOOP_HOME}/conf}

if [ -f "${SQOOP_CONF_DIR}/sqoop-env.sh" ]; then

. "${SQOOP_CONF_DIR}/sqoop-env.sh"

fi

# Find paths to our dependency systems. If they are unset, use CDH defaults.

if [ -z "${HADOOP_COMMON_HOME}" ]; then

if [ -n "${HADOOP_HOME}" ]; then

HADOOP_COMMON_HOME=${HADOOP_HOME}

else

if [ -d "/usr/lib/hadoop" ]; then

HADOOP_COMMON_HOME=/usr/lib/hadoop

else

HADOOP_COMMON_HOME=${SQOOP_HOME}/../hadoop

fi

fi

fi

if [ -z "${HADOOP_MAPRED_HOME}" ]; then

HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then

if [ -n "${HADOOP_HOME}" ]; then

HADOOP_MAPRED_HOME=${HADOOP_HOME}

else

HADOOP_MAPRED_HOME=${SQOOP_HOME}/../hadoop-mapreduce

fi

fi

fi

# We are setting HADOOP_HOME to HADOOP_COMMON_HOME if it is not set

# so that hcat script works correctly on BigTop

if [ -z "${HADOOP_HOME}" ]; then

if [ -n "${HADOOP_COMMON_HOME}" ]; then

HADOOP_HOME=${HADOOP_COMMON_HOME}

export HADOOP_HOME

fi

fi

if [ -z "${HBASE_HOME}" ]; then

if [ -d "/usr/lib/hbase" ]; then

HBASE_HOME=/usr/lib/hbase

else

HBASE_HOME=${SQOOP_HOME}/../hbase

fi

fi

#if [ -z "${HCAT_HOME}" ]; then

# if [ -d "/usr/lib/hive-hcatalog" ]; then

# HCAT_HOME=/usr/lib/hive-hcatalog

# elif [ -d "/usr/lib/hcatalog" ]; then

# HCAT_HOME=/usr/lib/hcatalog

# else

# HCAT_HOME=${SQOOP_HOME}/../hive-hcatalog

# if [ ! -d ${HCAT_HOME} ]; then

# HCAT_HOME=${SQOOP_HOME}/../hcatalog

# fi

# fi

#fi

#if [ -z "${ACCUMULO_HOME}" ]; then

# if [ -d "/usr/lib/accumulo" ]; then

# ACCUMULO_HOME=/usr/lib/accumulo

# else

# ACCUMULO_HOME=${SQOOP_HOME}/../accumulo

# fi

#fi

if [ -z "${ZOOKEEPER_HOME}" ]; then

if [ -d "/usr/lib/zookeeper" ]; then

ZOOKEEPER_HOME=/usr/lib/zookeeper

else

ZOOKEEPER_HOME=${SQOOP_HOME}/../zookeeper

fi

fi

if [ -z "${HIVE_HOME}" ]; then

if [ -d "/usr/lib/hive" ]; then

export HIVE_HOME=/usr/lib/hive

elif [ -d ${SQOOP_HOME}/../hive ]; then

export HIVE_HOME=${SQOOP_HOME}/../hive

fi

fi

# Check: If we can't find our dependencies, give up here.

if [ ! -d "${HADOOP_COMMON_HOME}" ]; then

echo "Error: $HADOOP_COMMON_HOME does not exist!"

echo 'Please set $HADOOP_COMMON_HOME to the root of your Hadoop installation.'

exit 1

fi

if [ ! -d "${HADOOP_MAPRED_HOME}" ]; then

echo "Error: $HADOOP_MAPRED_HOME does not exist!"

echo 'Please set $HADOOP_MAPRED_HOME to the root of your Hadoop MapReduce installation.'

exit 1

fi

## Moved to be a runtime check in sqoop.

if [ ! -d "${HBASE_HOME}" ]; then

echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."

echo 'Please set $HBASE_HOME to the root of your HBase installation.'

fi

## Moved to be a runtime check in sqoop.

#if [ ! -d "${HCAT_HOME}" ]; then

# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."

# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'

#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then

# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."

# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'

#fi

if [ ! -d "${ZOOKEEPER_HOME}" ]; then

echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."

echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'

fi

# Where to find the main Sqoop jar

SQOOP_JAR_DIR=$SQOOP_HOME

# If there's a "build" subdir, override with this, so we use

# the newly-compiled copy.

if [ -d "$SQOOP_JAR_DIR/build" ]; then

SQOOP_JAR_DIR="${SQOOP_JAR_DIR}/build"

fi

function add_to_classpath() {

dir=$1

for f in $dir/*.jar; do

SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;

done

export SQOOP_CLASSPATH

}

# Add sqoop dependencies to classpath.

SQOOP_CLASSPATH=""

if [ -d "$SQOOP_HOME/lib" ]; then

add_to_classpath $SQOOP_HOME/lib

fi

# Add HBase to dependency list

if [ -e "$HBASE_HOME/bin/hbase" ]; then

TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`$HBASE_HOME/bin/hbase classpath`

SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}

fi

# Add HCatalog to dependency list

if [ -e "${HCAT_HOME}/bin/hcat" ]; then

TMP_SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:`${HCAT_HOME}/bin/hcat -classpath`

if [ -z "${HIVE_CONF_DIR}" ]; then

TMP_SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}:${HIVE_CONF_DIR}

fi

SQOOP_CLASSPATH=${TMP_SQOOP_CLASSPATH}

fi

# Add Accumulo to dependency list

if [ -e "$ACCUMULO_HOME/bin/accumulo" ]; then

for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*accumulo.*jar | cut -d':' -f2`; do

SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn

done

for jn in `$ACCUMULO_HOME/bin/accumulo classpath | grep file:.*zookeeper.*jar | cut -d':' -f2`; do

SQOOP_CLASSPATH=$SQOOP_CLASSPATH:$jn

done

fi

ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper}

if [ -d "${ZOOCFGDIR}" ]; then

SQOOP_CLASSPATH=$ZOOCFGDIR:$SQOOP_CLASSPATH

fi

SQOOP_CLASSPATH=${SQOOP_CONF_DIR}:${SQOOP_CLASSPATH}

# If there's a build subdir, use Ivy-retrieved dependencies too.

if [ -d "$SQOOP_HOME/build/ivy/lib/sqoop" ]; then

for f in $SQOOP_HOME/build/ivy/lib/sqoop/*/*.jar; do

SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f;

done

fi

add_to_classpath ${SQOOP_JAR_DIR}

HADOOP_CLASSPATH="${SQOOP_CLASSPATH}:${HADOOP_CLASSPATH}"

if [ ! -z "$SQOOP_USER_CLASSPATH" ]; then

# User has elements to prepend to the classpath, forcibly overriding

# Sqoop's own lib directories.

export HADOOP_CLASSPATH="${SQOOP_USER_CLASSPATH}:${HADOOP_CLASSPATH}"

fi

export SQOOP_CLASSPATH

export SQOOP_CONF_DIR

export SQOOP_JAR_DIR

export HADOOP_CLASSPATH

export HADOOP_COMMON_HOME

export HADOOP_MAPRED_HOME

export HBASE_HOME

export HCAT_HOME

export HIVE_CONF_DIR

export ACCUMULO_HOME

export ZOOKEEPER_HOME

3.2 Checking the Sqoop Version

[root@c0 _src]# sqoop version

2019-03-11 22:30:16,837 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

Sqoop 1.4.7

git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8

Compiled by maugli on Thu Dec 21 15:59:58 STD 2017

4. Testing Sqoop Data Import and Export

4.1 Exporting Data from Hive to MySQL with Sqoop

Create the testmshk database in MySQL, grant privileges on it to the root user, and create the hive2mysql_mshk table:

[root@c0 _src]# mysql -uroot -p123456

mysql: [Warning] Using a password on the command line interface can be insecure.

Welcome to the MySQL monitor. Commands end with ; or \g.

Your MySQL connection id is 21

Server version: 5.7.25 MySQL Community Server (GPL)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE DATABASE testmshk DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

Query OK, 1 row affected (0.00 sec)

mysql> show databases;

+--------------------+

| Database |

+--------------------+

| information_schema |

| hive |

| mysql |

| performance_schema |

| sys |

| testmshk |

+--------------------+

6 rows in set (0.02 sec)

mysql> grant select,insert,update,delete,create on testmshk.* to root;

Query OK, 0 rows affected (0.01 sec)

mysql> flush privileges;

Query OK, 0 rows affected (0.01 sec)

mysql> use testmshk;

Database changed

mysql> create table hive2mysql_mshk(id int,namea varchar(50),nameb varchar(50));

Query OK, 0 rows affected (0.02 sec)

mysql> quit;

Bye

Query the MySQL table through Sqoop; at this point the table is empty:

[root@c0 ~]# sqoop eval --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --query "select * from hive2mysql_mshk"

2019-03-11 23:44:06,894 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

2019-03-11 23:44:06,945 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

2019-03-11 23:44:07,100 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

-------------------------------------------------------------

| id | namea | nameb |

-------------------------------------------------------------

-------------------------------------------------------------

In <Hadoop 3.1.2 (HA) + Zookeeper 3.4.13 + HBase 1.4.9 (HA) + Hive 2.3.4 + Spark 2.4.0 (HA) High-Availability Cluster Setup>, we created the test data file /hive/warehouse/testtable/testdata001.dat while testing Hive. We now export that data to MySQL:

[root@c0 ~]# sqoop export --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --table hive2mysql_mshk --export-dir /hive/warehouse/testtable/testdata001.dat --input-fields-terminated-by ','

2019-03-11 23:47:10,400 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

2019-03-11 23:47:10,437 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

2019-03-11 23:47:10,571 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

2019-03-11 23:47:10,574 INFO tool.CodeGenTool: Beginning code generation

2019-03-11 23:47:10,914 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-11 23:47:10,943 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-11 23:47:10,952 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/work/_app/hadoop-3.1.2

Note: /tmp/sqoop-root/compile/9ea7f54fe87f35ed071ed75c293f25d8/hive2mysql_mshk.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

2019-03-11 23:47:12,652 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/9ea7f54fe87f35ed071ed75c293f25d8/hive2mysql_mshk.jar

2019-03-11 23:47:12,669 INFO mapreduce.ExportJobBase: Beginning export of hive2mysql_mshk

2019-03-11 23:47:12,669 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-03-11 23:47:12,804 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

2019-03-11 23:47:14,106 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

2019-03-11 23:47:14,112 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative

2019-03-11 23:47:14,112 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

2019-03-11 23:47:14,479 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2

2019-03-11 23:47:14,808 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1552315366846_0003

2019-03-11 23:47:16,429 INFO input.FileInputFormat: Total input files to process : 1

2019-03-11 23:47:16,432 INFO input.FileInputFormat: Total input files to process : 1

2019-03-11 23:47:16,513 INFO mapreduce.JobSubmitter: number of splits:4

2019-03-11 23:47:16,577 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative

2019-03-11 23:47:16,684 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552315366846_0003

2019-03-11 23:47:16,686 INFO mapreduce.JobSubmitter: Executing with tokens: []

2019-03-11 23:47:16,924 INFO conf.Configuration: resource-types.xml not found

2019-03-11 23:47:16,924 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.

2019-03-11 23:47:17,213 INFO impl.YarnClientImpl: Submitted application application_1552315366846_0003

2019-03-11 23:47:17,261 INFO mapreduce.Job: The url to track the job: http://c1:8088/proxy/application_1552315366846_0003/

2019-03-11 23:47:17,262 INFO mapreduce.Job: Running job: job_1552315366846_0003

2019-03-11 23:47:23,359 INFO mapreduce.Job: Job job_1552315366846_0003 running in uber mode : false

2019-03-11 23:47:23,360 INFO mapreduce.Job: map 0% reduce 0%

2019-03-11 23:47:31,454 INFO mapreduce.Job: map 75% reduce 0%

2019-03-11 23:47:32,462 INFO mapreduce.Job: map 100% reduce 0%

2019-03-11 23:47:32,473 INFO mapreduce.Job: Job job_1552315366846_0003 completed successfully

2019-03-11 23:47:32,619 INFO mapreduce.Job: Counters: 32

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=913424

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=782

HDFS: Number of bytes written=0

HDFS: Number of read operations=19

HDFS: Number of large read operations=0

HDFS: Number of write operations=0

Job Counters

Launched map tasks=4

Data-local map tasks=4

Total time spent by all maps in occupied slots (ms)=23446

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=23446

Total vcore-milliseconds taken by all map tasks=23446

Total megabyte-milliseconds taken by all map tasks=24008704

Map-Reduce Framework

Map input records=2

Map output records=2

Input split bytes=636

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=582

CPU time spent (ms)=3960

Physical memory (bytes) snapshot=830259200

Virtual memory (bytes) snapshot=11165683712

Total committed heap usage (bytes)=454557696

Peak Map Physical memory (bytes)=208502784

Peak Map Virtual memory (bytes)=2793611264

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=0

2019-03-11 23:47:32,626 INFO mapreduce.ExportJobBase: Transferred 782 bytes in 18.5015 seconds (42.2668 bytes/sec)

2019-03-11 23:47:32,629 INFO mapreduce.ExportJobBase: Exported 2 records.

--export-dir specifies the HDFS path of the Hive table's data

--input-fields-terminated-by specifies the field delimiter to parse
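What --input-fields-terminated-by does in practice: the export map tasks split each line of the HDFS file on that delimiter and bind the fields, in order, to the columns of the target table. A minimal Python sketch; the two sample lines are an assumption about the exact contents of testdata001.dat, inferred from the query output further down:

```python
def parse_export_lines(lines, delimiter=","):
    """Split each data line on the delimiter, the way Sqoop's export
    mappers turn HDFS text records into (id, namea, nameb) rows."""
    rows = []
    for line in lines:
        id_, namea, nameb = line.rstrip("\n").split(delimiter)
        rows.append((int(id_), namea, nameb))
    return rows

# Assumed file contents, matching the two rows seen in MySQL afterwards
data = ["10086,my.mshk.top,you.mshk.top", "12306,mname,yname"]
rows = parse_export_lines(data)
print(rows)  # [(10086, 'my.mshk.top', 'you.mshk.top'), (12306, 'mname', 'yname')]
```

A mismatched delimiter would make the split produce the wrong number of fields, which is the most common cause of export failures at this step.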

Query MySQL through Sqoop again; the data has been exported successfully:

[root@c0 ~]# sqoop eval --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --query "select * from hive2mysql_mshk"

2019-03-11 23:48:56,848 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

2019-03-11 23:48:56,884 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

2019-03-11 23:48:57,024 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

-------------------------------------------------------------

| id | namea | nameb |

-------------------------------------------------------------

| 10086 | my.mshk.top | you.mshk.top |

| 12306 | mname | yname |

-------------------------------------------------------------

In MySQL we can now see that the hive2mysql_mshk table contains 2 rows.


4.2 Importing Data from MySQL into Hive with Sqoop

The hive2mysql_mshk table we just created has no primary key; we simply exported a few records from Hive into MySQL.

By default, Sqoop identifies a table's primary key column (if one exists) and uses it as the split column.

The low and high values of the split column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range.

If the actual values of the primary key are not uniformly distributed across its range, the tasks may end up unbalanced.

In that case you should explicitly choose a different column with the --split-by argument, for example --split-by id.
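Sqoop's integer splitter turns the [MIN(id), MAX(id)] bounding values into contiguous per-mapper ranges. A simplified Python sketch of that logic (the real implementation is Java, in Sqoop's IntegerSplitter, and handles remainders slightly differently); the numbers match the BoundingValsQuery output in the import log further down, which reports a split size of 555 from 10086 to 12306 across 4 splits:

```python
def integer_splits(lo, hi, num_splits):
    """Divide the range [lo, hi] into num_splits contiguous (start, end)
    ranges, mimicking Sqoop's integer split-column logic."""
    size = (hi - lo) // num_splits  # rows covered by each split
    splits = []
    start = lo
    for i in range(num_splits):
        # The last split absorbs any remainder so the range is fully covered
        end = hi if i == num_splits - 1 else start + size
        splits.append((start, end))
        start = end
    return size, splits

# MIN(id)=10086, MAX(id)=12306, 4 map tasks, as in the job log
size, splits = integer_splits(10086, 12306, 4)
print(size)    # 555
print(splits)  # [(10086, 10641), (10641, 11196), (11196, 11751), (11751, 12306)]
```

If the id values cluster at one end of the range, most rows fall into one split and a single mapper does nearly all the work, which is exactly the imbalance described above.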

The Sqoop command that imports the MySQL data into Hive takes a few more parameters:

sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --split-by id --table hive2mysql_mshk --target-dir /hive/warehouse/mysql2hive_mshk --fields-terminated-by "," --hive-import --hive-table testmshk.mysql2hive_mshk

--split-by specifies the column used to split the work across map tasks

--table specifies which MySQL table to import

--target-dir specifies the HDFS directory where the imported data is written

--hive-import imports the table into Hive

--hive-overwrite overwrites any existing data in the Hive table

--hive-table sets the table name to use in Hive

--fields-terminated-by sets the field delimiter
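The long command line is easier to audit when assembled programmatically. A hypothetical helper in Python; the connection string, credentials, and table names are the ones used throughout this post:

```python
def build_sqoop_import(connect, user, password, table, hive_table,
                       target_dir, split_by, delimiter=","):
    """Assemble the argument list for a sqoop import into Hive,
    mirroring the flags described above."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--username", user,
        "--password", password,
        "--split-by", split_by,
        "--table", table,
        "--target-dir", target_dir,
        "--fields-terminated-by", delimiter,
        "--hive-import",
        "--hive-overwrite",
        "--hive-table", hive_table,
    ]

cmd = build_sqoop_import(
    "jdbc:mysql://c0:3306/testmshk?useSSL=false", "root", "123456",
    "hive2mysql_mshk", "testmshk.mysql2hive_mshk",
    "/hive/warehouse/mysql2hive_mshk", "id")
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd) would also sidestep shell-quoting issues that bite when the delimiter or password contains special characters.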

The Sqoop operation that follows runs as a MapReduce job.

[root@c0 _src]# sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --split-by id --table hive2mysql_mshk --target-dir /hive/warehouse/mysql2hive_mshk --fields-terminated-by "," --hive-import --hive-table testmshk.mysql2hive_mshk --hive-overwrite

2019-03-12 20:21:05,060 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

2019-03-12 20:21:05,137 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

2019-03-12 20:21:05,337 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

2019-03-12 20:21:05,348 INFO tool.CodeGenTool: Beginning code generation

2019-03-12 20:21:05,785 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-12 20:21:05,821 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-12 20:21:05,831 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/work/_app/hadoop-3.1.2

Note: /tmp/sqoop-root/compile/202a5bda3950a7ccc6782f3cfcc3a99d/hive2mysql_mshk.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

2019-03-12 20:21:08,747 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/202a5bda3950a7ccc6782f3cfcc3a99d/hive2mysql_mshk.jar

2019-03-12 20:21:08,761 WARN manager.MySQLManager: It looks like you are importing from mysql.

2019-03-12 20:21:08,761 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

2019-03-12 20:21:08,761 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

2019-03-12 20:21:08,762 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

2019-03-12 20:21:08,764 INFO mapreduce.ImportJobBase: Beginning import of hive2mysql_mshk

2019-03-12 20:21:08,765 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

2019-03-12 20:21:08,928 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

2019-03-12 20:21:09,656 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

2019-03-12 20:21:10,332 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2

2019-03-12 20:21:10,688 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1552315366846_0011

2019-03-12 20:21:12,618 INFO db.DBInputFormat: Using read commited transaction isolation

2019-03-12 20:21:12,619 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `hive2mysql_mshk`

2019-03-12 20:21:12,622 INFO db.IntegerSplitter: Split size: 555; Num splits: 4 from: 10086 to: 12306

2019-03-12 20:21:12,696 INFO mapreduce.JobSubmitter: number of splits:4

2019-03-12 20:21:13,137 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552315366846_0011

2019-03-12 20:21:13,140 INFO mapreduce.JobSubmitter: Executing with tokens: []

2019-03-12 20:21:13,443 INFO conf.Configuration: resource-types.xml not found

2019-03-12 20:21:13,443 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.

2019-03-12 20:21:13,533 INFO impl.YarnClientImpl: Submitted application application_1552315366846_0011

2019-03-12 20:21:13,593 INFO mapreduce.Job: The url to track the job: http://c1:8088/proxy/application_1552315366846_0011/

2019-03-12 20:21:13,594 INFO mapreduce.Job: Running job: job_1552315366846_0011

2019-03-12 20:21:20,705 INFO mapreduce.Job: Job job_1552315366846_0011 running in uber mode : false

2019-03-12 20:21:20,727 INFO mapreduce.Job: map 0% reduce 0%

2019-03-12 20:21:29,927 INFO mapreduce.Job: map 50% reduce 0%

2019-03-12 20:21:29,930 INFO mapreduce.Job: Task Id : attempt_1552315366846_0011_m_000000_0, Status : FAILED

[2019-03-12 20:21:28.236]Container [pid=19941,containerID=container_e15_1552315366846_0011_01_000002] is running 539445760B beyond the 'VIRTUAL' memory limit. Current usage: 199.9 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.

Dump of the process-tree for container_e15_1552315366846_0011_01_000002 :

|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

|- 20026 19941 19941 19941 (java) 950 81 2678403072 50861 /opt/jdk1.8.0_201/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx820m -Djava.io.tmpdir=/home/work/_data/hadoop-3.1.2/nm-local-dir/usercache/root/appcache/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/work/_logs/hadoop-3.1.2/userlogs/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.11.32 41274 attempt_1552315366846_0011_m_000000_0 16492674416642

|- 19941 19939 19941 19941 (bash) 1 2 115900416 307 /bin/bash -c /opt/jdk1.8.0_201/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx820m -Djava.io.tmpdir=/home/work/_data/hadoop-3.1.2/nm-local-dir/usercache/root/appcache/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/work/_logs/hadoop-3.1.2/userlogs/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.11.32 41274 attempt_1552315366846_0011_m_000000_0 16492674416642 1>/home/work/_logs/hadoop-3.1.2/userlogs/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002/stdout 2>/home/work/_logs/hadoop-3.1.2/userlogs/application_1552315366846_0011/container_e15_1552315366846_0011_01_000002/stderr

[2019-03-12 20:21:28.324]Container killed on request. Exit code is 143

[2019-03-12 20:21:28.335]Container exited with a non-zero exit code 143.
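The failed attempt above is YARN's virtual-memory check, not a Sqoop bug: with the default yarn.nodemanager.vmem-pmem-ratio of 2.1, a container with a 1 GB physical allocation may use at most 2.1 GB of virtual memory, and this JVM touched 2.6 GB. A quick check of the numbers from the log:

```python
# YARN kills a container when its virtual memory exceeds
# physical-allocation * vmem-pmem-ratio (default ratio: 2.1).
pmem_gb = 1.0          # container physical allocation, from the log
vmem_ratio = 2.1       # default yarn.nodemanager.vmem-pmem-ratio
vmem_used_gb = 2.6     # virtual memory the JVM actually used

vmem_limit_gb = pmem_gb * vmem_ratio
print(vmem_limit_gb)                 # 2.1, matching "2.1 GB virtual memory" in the log
print(vmem_used_gb > vmem_limit_gb)  # True: the container gets killed
```

Raising yarn.nodemanager.vmem-pmem-ratio in yarn-site.xml, or setting yarn.nodemanager.vmem-check-enabled to false, avoids the kill; here the attempt was simply retried and the job still completed successfully, as the rest of the log shows.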

2019-03-12 20:21:30,978 INFO mapreduce.Job: map 75% reduce 0%

2019-03-12 20:21:37,021 INFO mapreduce.Job: map 100% reduce 0%

2019-03-12 20:21:37,032 INFO mapreduce.Job: Job job_1552315366846_0011 completed successfully

2019-03-12 20:21:37,145 INFO mapreduce.Job: Counters: 33

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=915840

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=425

HDFS: Number of bytes written=49

HDFS: Number of read operations=24

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters

Failed map tasks=1

Launched map tasks=5

Other local map tasks=5

Total time spent by all maps in occupied slots (ms)=31981

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=31981

Total vcore-milliseconds taken by all map tasks=31981

Total megabyte-milliseconds taken by all map tasks=32748544

Map-Reduce Framework

Map input records=2

Map output records=2

Input split bytes=425

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=318

CPU time spent (ms)=6520

Physical memory (bytes) snapshot=815542272

Virtual memory (bytes) snapshot=11174408192

Total committed heap usage (bytes)=437780480

Peak Map Physical memory (bytes)=206934016

Peak Map Virtual memory (bytes)=2795565056

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=49

2019-03-12 20:21:37,154 INFO mapreduce.ImportJobBase: Transferred 49 bytes in 27.4776 seconds (1.7833 bytes/sec)

2019-03-12 20:21:37,159 INFO mapreduce.ImportJobBase: Retrieved 2 records.

2019-03-12 20:21:37,159 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table hive2mysql_mshk

2019-03-12 20:21:37,188 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-12 20:21:37,208 INFO hive.HiveImport: Loading uploaded data into Hive

2019-03-12 20:21:37,220 INFO conf.HiveConf: Found configuration file file:/home/work/_app/hive-2.3.4/conf/hive-site.xml

2019-03-12 20:21:49,491 INFO hive.HiveImport:

2019-03-12 20:21:49,492 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/work/_app/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true

2019-03-12 20:21:56,558 INFO hive.HiveImport: OK

2019-03-12 20:21:56,561 INFO hive.HiveImport: Time taken: 5.954 seconds

2019-03-12 20:21:57,005 INFO hive.HiveImport: Loading data to table testmshk.mysql2hive_mshk

2019-03-12 20:21:58,181 INFO hive.HiveImport: OK

2019-03-12 20:21:58,181 INFO hive.HiveImport: Time taken: 1.619 seconds

2019-03-12 20:21:58,681 INFO hive.HiveImport: Hive import complete.

Finally, let's verify the output in Hive:

[root@c0 ~]# hive

Logging initialized using configuration in jar:file:/home/work/_app/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive> show databases;

OK

default

testmshk

Time taken: 3.512 seconds, Fetched: 2 row(s)

hive> use testmshk;

OK

Time taken: 0.032 seconds

hive> show tables;

OK

mysql2hive_mshk

testtable

Time taken: 0.083 seconds, Fetched: 2 row(s)

hive> select * from mysql2hive_mshk;

OK

10086 my.mshk.top you.mshk.top

12306 mname yname

Time taken: 1.634 seconds, Fetched: 2 row(s)

hive> quit;

The imported data is also visible in HDFS.


4.3 Importing Data from MySQL into HBase with Sqoop

Next, we import the data from the MySQL table hive2mysql_mshk into HBase, creating the HBase table mysql2hase_mshk in the process:

[root@c0 ~]# sqoop import --connect jdbc:mysql://c0:3306/testmshk?useSSL=false --username root --password 123456 --split-by id --table hive2mysql_mshk --hbase-table mysql2hase_mshk --hbase-create-table --hbase-row-key id --column-family id

2019-03-13 12:04:33,647 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7

2019-03-13 12:04:33,694 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

2019-03-13 12:04:33,841 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

2019-03-13 12:04:33,841 INFO tool.CodeGenTool: Beginning code generation

2019-03-13 12:04:34,162 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

2019-03-13 12:04:34,197 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `hive2mysql_mshk` AS t LIMIT 1

...

2019-03-13 12:05:13,782 INFO mapreduce.Job: map 75% reduce 0%

2019-03-13 12:05:15,813 INFO mapreduce.Job: map 100% reduce 0%

2019-03-13 12:05:16,827 INFO mapreduce.Job: Job job_1552397454797_0002 completed successfully

2019-03-13 12:05:16,942 INFO mapreduce.Job: Counters: 33

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=1041632

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=425

HDFS: Number of bytes written=0

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=0

Job Counters

Failed map tasks=5

Launched map tasks=9

Other local map tasks=9

Total time spent by all maps in occupied slots (ms)=68882

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=68882

Total vcore-milliseconds taken by all map tasks=68882

Total megabyte-milliseconds taken by all map tasks=70535168

Map-Reduce Framework

Map input records=2

Map output records=2

Input split bytes=425

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=801

CPU time spent (ms)=15480

Physical memory (bytes) snapshot=1097326592

Virtual memory (bytes) snapshot=11271196672

Total committed heap usage (bytes)=629669888

Peak Map Physical memory (bytes)=295751680

Peak Map Virtual memory (bytes)=2828283904

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=0

2019-03-13 12:05:16,949 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 38.224 seconds (0 bytes/sec)

2019-03-13 12:05:16,954 INFO mapreduce.ImportJobBase: Retrieved 2 records.

Now connect with the HBase shell to inspect the imported data; the mysql2hase_mshk table exists and its rows can be retrieved:

[root@c0 ~]# hbase shell

HBase Shell

Use "help" to get list of supported commands.

Use "exit" to quit this interactive shell.

Version 1.4.9, rd625b212e46d01cb17db9ac2e9e927fdb201afa1, Wed Dec 5 11:54:10 PST 2018

hbase(main):001:0> list

TABLE

mysql2hase_mshk

1 row(s) in 0.1870 seconds

=> ["mysql2hase_mshk"]

hbase(main):002:0> scan 'mysql2hase_mshk'

ROW COLUMN+CELL

10086 column=id:namea, timestamp=1552449912494, value=my.mshk.top

10086 column=id:nameb, timestamp=1552449912494, value=you.mshk.top

12306 column=id:namea, timestamp=1552449906986, value=mname

12306 column=id:nameb, timestamp=1552449906986, value=yname

2 row(s) in 0.1330 seconds

hbase(main):003:0> get 'mysql2hase_mshk','10086'

COLUMN CELL

id:namea timestamp=1552449912494, value=my.mshk.top

id:nameb timestamp=1552449912494, value=you.mshk.top

1 row(s) in 0.0230 seconds

hbase(main):004:0>
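How a MySQL row becomes HBase cells: --hbase-row-key id makes the id value the row key, and every remaining column becomes a cell qualified under the --column-family (here also named id, which is why the cells read id:namea and id:nameb in the scan output above). A minimal Python sketch of that mapping, using the two rows from this post:

```python
def mysql_rows_to_hbase(rows, row_key, column_family):
    """Map relational rows (dicts) to HBase cells the way
    sqoop import --hbase-row-key / --column-family does."""
    table = {}
    for row in rows:
        key = str(row[row_key])  # the row-key column becomes the HBase row key
        # every other column becomes a cell: "<family>:<column>" -> value
        cells = {f"{column_family}:{col}": str(val)
                 for col, val in row.items() if col != row_key}
        table[key] = cells
    return table

rows = [
    {"id": 10086, "namea": "my.mshk.top", "nameb": "you.mshk.top"},
    {"id": 12306, "namea": "mname", "nameb": "yname"},
]
hbase = mysql_rows_to_hbase(rows, row_key="id", column_family="id")
print(hbase["10086"])  # {'id:namea': 'my.mshk.top', 'id:nameb': 'you.mshk.top'}
```

Note that the row-key value itself is not stored as a cell, which is why the scan shows only namea and nameb; in practice you would usually pick a more descriptive column-family name than id.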

I hope you found this post useful. Thank you for your support and for reading my blog.

Author: 迦壹

Repost policy: reposting is allowed, but you must credit the original source and author with a hyperlink and keep this copyright notice. Thank you!

If this article helped you, you can donate via the address below. Thank you!

Ethereum address: 0xbB0a92d634D7b9Ac69079ed0e521CC2e0a97c420
