Installing and Using Sqoop on Linux (with MySQL)

5.3 Add the required JAR files to the lib directory

The JAR files referred to here are the JDBC drivers for the relational databases you want to connect to, such as MySQL and Oracle. They are not bundled with Sqoop and must be copied into the lib directory yourself.

cp  ~/hive/lib/mysql-connector-java-5.1.30.jar  ~/sqoop/lib/
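A quick check that the driver is now where Sqoop expects it (assuming the paths used above):

ls ~/sqoop/lib/ | grep mysql-connector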

5.4 Set environment variables

vi ~/.profile

Add the following lines:

export SQOOP_HOME=/home/hadoop/sqoop

export SBT_HOME=/home/hadoop/sbt

export PATH=$PATH:$SBT_HOME/bin:$SQOOP_HOME/bin

export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib

Run source ~/.profile to make the changes take effect.
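A quick way to confirm the variables took effect is to run Sqoop's version command from any directory (the output will vary with your installation):

sqoop version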

5.5 Test the MySQL database connection

① Connect to the MySQL database and list all databases

hadoop@linuxidc:~/sqoop/conf$ sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P

Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

14/10/21 18:15:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2

Enter password:

14/10/21 18:15:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

information_schema

XINGXUNTONG

XINGXUNTONG_HIVE

amon

hive

hmon

mahout

mysql

oozie

performance_schema

realworld

rman

scm

smon

-P prompts for the password interactively; you can instead specify it directly with --password.
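For example, a non-interactive variant of the same command (a sketch; 123456 is a placeholder password, and note that a password passed this way is visible in the shell history and process list):

sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root --password 123456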

② Import a MySQL table into HDFS

hadoop@linuxidc:~/sqoop/conf$ sqoop import -m 1  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test1

Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

14/10/21 18:19:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2

Enter password:

14/10/21 18:19:21 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

14/10/21 18:19:21 INFO tool.CodeGenTool: Beginning code generation

14/10/21 18:19:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:19:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:19:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop

Note: /tmp/sqoop-hadoop/compile/15cb67e2b315154cdf02e3a17cf32bbe/weblogs.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

14/10/21 18:19:23 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/15cb67e2b315154cdf02e3a17cf32bbe/weblogs.jar

14/10/21 18:19:23 WARN manager.MySQLManager: It looks like you are importing from mysql.

14/10/21 18:19:23 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

14/10/21 18:19:23 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

14/10/21 18:19:23 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

14/10/21 18:19:23 INFO mapreduce.ImportJobBase: Beginning import of weblogs

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.1-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

14/10/21 18:19:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

14/10/21 18:19:24 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

14/10/21 18:19:25 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

14/10/21 18:19:25 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

14/10/21 18:19:40 INFO db.DBInputFormat: Using read commited transaction isolation

14/10/21 18:19:41 INFO mapreduce.JobSubmitter: number of splits:1

14/10/21 18:19:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1413879907572_0002

14/10/21 18:19:46 INFO impl.YarnClientImpl: Submitted application application_1413879907572_0002

14/10/21 18:19:46 INFO mapreduce.Job: The url to track the job: N/A

14/10/21 18:19:46 INFO mapreduce.Job: Running job: job_1413879907572_0002

14/10/21 18:20:12 INFO mapreduce.Job: Job job_1413879907572_0002 running in uber mode : false

14/10/21 18:20:12 INFO mapreduce.Job:  map 0% reduce 0%

14/10/21 18:20:41 INFO mapreduce.Job:  map 100% reduce 0%

14/10/21 18:20:45 INFO mapreduce.Job: Job job_1413879907572_0002 completed successfully

14/10/21 18:20:46 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=107189

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=87

HDFS: Number of bytes written=251130

HDFS: Number of read operations=4

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=1

Other local map tasks=1

Total time spent by all maps in occupied slots (ms)=22668

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=22668

Total vcore-seconds taken by all map tasks=22668

Total megabyte-seconds taken by all map tasks=23212032

Map-Reduce Framework

Map input records=3000

Map output records=3000

Input split bytes=87

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=41

CPU time spent (ms)=1540

Physical memory (bytes) snapshot=133345280

Virtual memory (bytes) snapshot=1201442816

Total committed heap usage (bytes)=76021760

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=251130

14/10/21 18:20:46 INFO mapreduce.ImportJobBase: Transferred 245.2441 KB in 80.7974 seconds (3.0353 KB/sec)

14/10/21 18:20:46 INFO mapreduce.ImportJobBase: Retrieved 3000 records.

-m specifies how many map tasks are launched to read the data. If the table has no primary key, this parameter must be given and can only be set to 1; otherwise Sqoop reports:

14/10/21 18:18:27 ERROR tool.ImportTool: Error during import: No primary key could be found for table weblogs. Please specify one with --split-by or perform a sequential import with '-m 1'.

The value of -m also determines how many files the imported data is split into on HDFS; for example, with -m 1 a single data file is produced:
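The listing below was produced with a command along these lines (the command itself is missing from the original log):

hadoop fs -ls /user/sqoop/test1/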

14/10/21 18:23:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 2 items

-rw-r--r--  1 hadoop supergroup          0 2014-10-21 18:20 /user/sqoop/test1/_SUCCESS

-rw-r--r--  1 hadoop supergroup    251130 2014-10-21 18:20 /user/sqoop/test1/part-m-00000

Now add a primary key to the table:

mysql> desc weblogs;

+--------------+-------------+------+-----+---------+-------+

| Field        | Type        | Null | Key | Default | Extra |

+--------------+-------------+------+-----+---------+-------+

| md5          | varchar(32) | YES  |    | NULL    |      |

| url          | varchar(64) | YES  |    | NULL    |      |

| request_date | date        | YES  |    | NULL    |      |

| request_time | time        | YES  |    | NULL    |      |

| ip          | varchar(15) | YES  |    | NULL    |      |

+--------------+-------------+------+-----+---------+-------+

5 rows in set (0.00 sec)

mysql> alter table weblogs add primary key(md5,ip);

Query OK, 3000 rows affected (1.60 sec)

Records: 3000  Duplicates: 0  Warnings: 0

mysql> desc weblogs;

+--------------+-------------+------+-----+---------+-------+

| Field        | Type        | Null | Key | Default | Extra |

+--------------+-------------+------+-----+---------+-------+

| md5          | varchar(32) | NO  | PRI |        |      |

| url          | varchar(64) | YES  |    | NULL    |      |

| request_date | date        | YES  |    | NULL    |      |

| request_time | time        | YES  |    | NULL    |      |

| ip          | varchar(15) | NO  | PRI |        |      |

+--------------+-------------+------+-----+---------+-------+

5 rows in set (0.02 sec)

Then specify -m 2 and run the import again:

hadoop@linuxidc:~/sqoop/conf$ sqoop import -m 2  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test2

Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

14/10/21 18:22:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2

Enter password:

14/10/21 18:24:04 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

14/10/21 18:24:04 INFO tool.CodeGenTool: Beginning code generation

14/10/21 18:24:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:24:04 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:24:04 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop

Note: /tmp/sqoop-hadoop/compile/7061f445f29510afa2b89729126a57b9/weblogs.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

14/10/21 18:24:07 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/7061f445f29510afa2b89729126a57b9/weblogs.jar

14/10/21 18:24:07 WARN manager.MySQLManager: It looks like you are importing from mysql.

14/10/21 18:24:07 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

14/10/21 18:24:07 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

14/10/21 18:24:07 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

14/10/21 18:24:07 ERROR tool.ImportTool: Error during import: No primary key could be found for table weblogs. Please specify one with --split-by or perform a sequential import with '-m 1'.

hadoop@linuxidc:~/sqoop/conf$ sqoop import -m 2  --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --target-dir /user/sqoop/test2

Warning: /home/hadoop/sqoop/../hive-hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

14/10/21 18:30:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.2

Enter password:

14/10/21 18:30:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

14/10/21 18:30:07 INFO tool.CodeGenTool: Beginning code generation

14/10/21 18:30:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:30:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `weblogs` AS t LIMIT 1

14/10/21 18:30:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop

Note: /tmp/sqoop-hadoop/compile/6dbf2401c1a51b81c5b885e6f7d43137/weblogs.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

14/10/21 18:30:09 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/6dbf2401c1a51b81c5b885e6f7d43137/weblogs.jar

14/10/21 18:30:09 WARN manager.MySQLManager: It looks like you are importing from mysql.

14/10/21 18:30:09 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

14/10/21 18:30:09 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

14/10/21 18:30:09 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

14/10/21 18:30:09 WARN manager.CatalogQueryManager: The table weblogs contains a multi-column primary key. Sqoop will default to the column md5 only for this job.

14/10/21 18:30:09 WARN manager.CatalogQueryManager: The table weblogs contains a multi-column primary key. Sqoop will default to the column md5 only for this job.

14/10/21 18:30:09 INFO mapreduce.ImportJobBase: Beginning import of weblogs

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.3.0-cdh5.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.98.1-cdh5.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

14/10/21 18:30:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

14/10/21 18:30:09 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

14/10/21 18:30:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

14/10/21 18:30:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

14/10/21 18:30:17 INFO db.DBInputFormat: Using read commited transaction isolation

14/10/21 18:30:17 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`md5`), MAX(`md5`) FROM `weblogs`

14/10/21 18:30:17 WARN db.TextSplitter: Generating splits for a textual index column.

14/10/21 18:30:17 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.

14/10/21 18:30:17 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.

14/10/21 18:30:18 INFO mapreduce.JobSubmitter: number of splits:4

14/10/21 18:30:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1413879907572_0003

14/10/21 18:30:19 INFO impl.YarnClientImpl: Submitted application application_1413879907572_0003

14/10/21 18:30:19 INFO mapreduce.Job: The url to track the job: N/A

14/10/21 18:30:19 INFO mapreduce.Job: Running job: job_1413879907572_0003

14/10/21 18:30:32 INFO mapreduce.Job: Job job_1413879907572_0003 running in uber mode : false

14/10/21 18:30:32 INFO mapreduce.Job:  map 0% reduce 0%

14/10/21 18:31:12 INFO mapreduce.Job:  map 50% reduce 0%

14/10/21 18:31:13 INFO mapreduce.Job:  map 75% reduce 0%

14/10/21 18:31:15 INFO mapreduce.Job:  map 100% reduce 0%

14/10/21 18:31:21 INFO mapreduce.Job: Job job_1413879907572_0003 completed successfully

14/10/21 18:31:22 INFO mapreduce.Job: Counters: 30

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=429312

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=532

HDFS: Number of bytes written=251209

HDFS: Number of read operations=16

HDFS: Number of large read operations=0

HDFS: Number of write operations=8

Job Counters

Launched map tasks=4

Other local map tasks=4

Total time spent by all maps in occupied slots (ms)=160326

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=160326

Total vcore-seconds taken by all map tasks=160326

Total megabyte-seconds taken by all map tasks=164173824

Map-Reduce Framework

Map input records=3001

Map output records=3001

Input split bytes=532

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=806

CPU time spent (ms)=5450

Physical memory (bytes) snapshot=494583808

Virtual memory (bytes) snapshot=4805771264

Total committed heap usage (bytes)=325058560

File Input Format Counters

Bytes Read=0

File Output Format Counters

Bytes Written=251209

14/10/21 18:31:22 INFO mapreduce.ImportJobBase: Transferred 245.3213 KB in 72.5455 seconds (3.3816 KB/sec)

The number of files produced is related to the -m value and to the split column: although -m 2 was requested, four splits were generated. The original guess was "roughly -m times the number of primary-key columns", but this is unverified; the TextSplitter warnings above suggest it is more likely an artifact of splitting on a textual (md5) column.

hadoop@linuxidc:~/study/cdh5$ hadoop fs -ls /user/sqoop/test2/

14/10/21 18:32:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 5 items

-rw-r--r--  1 hadoop supergroup          0 2014-10-21 18:31 /user/sqoop/test2/_SUCCESS

-rw-r--r--  1 hadoop supergroup          0 2014-10-21 18:31 /user/sqoop/test2/part-m-00000

-rw-r--r--  1 hadoop supergroup    251130 2014-10-21 18:31 /user/sqoop/test2/part-m-00001

-rw-r--r--  1 hadoop supergroup          0 2014-10-21 18:31 /user/sqoop/test2/part-m-00002

-rw-r--r--  1 hadoop supergroup        79 2014-10-21 18:31 /user/sqoop/test2/part-m-00003

The primary key chosen here is poorly suited for splitting, so the data ends up very unevenly distributed across the output files; this needs improvement.
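One common improvement is to choose the split column explicitly with --split-by instead of relying on the skewed textual md5 key. A sketch, assuming request_date is more evenly distributed and using /user/sqoop/test3 as a fresh target directory:

sqoop import -m 2 --connect jdbc:mysql://127.0.0.1:3306/realworld --username root -P --table weblogs --split-by request_date --target-dir /user/sqoop/test3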

③ Exporting data to Oracle and importing into HBase

The export tool can push data from HDFS into a remote database:

sqoop export --connect jdbc:oracle:thin:@192.168.**.**:**:** --username ** --password ** -m 1 --table VEHICLE --export-dir /user/root/VEHICLE

Importing data into HBase:

sqoop import --connect jdbc:oracle:thin:@192.168.**.**:**:** --username ** --password ** -m 1 --table VEHICLE --hbase-create-table --hbase-table VEHICLE --hbase-row-key ID --column-family VEHICLEINFO --split-by ID
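After the import, the rows can be spot-checked from the HBase shell (a quick sanity check; the table name matches the --hbase-table value above). Open the shell and run a limited scan:

hbase shell

scan 'VEHICLE', {LIMIT => 5}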

5.6 Testing MySQL database usage

Prerequisite: the MySQL JDBC driver JAR has been added to Sqoop's lib directory.

① Test the database connection

sqoop list-databases --connect jdbc:mysql://192.168.10.63 --username root --password 123456

② Using Sqoop

Each of the multi-line commands below is a single command; every line ends with a space, so do not forget it when joining the lines together.

(None of the six commands below has been verified by a successful test.)

<1> HDFS --> MySQL

sqoop export --connect

jdbc:mysql://192.168.10.63/ipj

--username root

--password 123456

--table ipj_flow_user

--export-dir hdfs://192.168.10.63:8020/user/flow/part-m-00000

Prerequisites:

(1) The HDFS path /user/flow/part-m-00000 must exist.

(2) If the cluster is configured with LZO compression, LZO must also be installed and correctly configured on the local machine.

(3) Every node in the Hadoop cluster must have privileges to access the MySQL database (see the GRANT sketch below).
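A minimal sketch of granting that access in MySQL, assuming the cluster nodes sit on the 192.168.10.0/24 subnet and reusing the root/123456 credentials from these examples:

mysql> GRANT ALL PRIVILEGES ON ipj.* TO 'root'@'192.168.10.%' IDENTIFIED BY '123456';

mysql> FLUSH PRIVILEGES;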

<2> MySQL --> HDFS

sqoop import --connect

jdbc:mysql://192.168.10.63/ipj

--table ipj_flow_user

<3> MySQL --> HBase

sqoop import --connect

jdbc:mysql://192.168.10.63/ipj

--table ipj_flow_user

--hbase-table ipj_statics_test

--hbase-create-table

--hbase-row-key id

--column-family imei

<4> HBase --> MySQL

Sqoop does not directly support exporting HBase data into MySQL. Three approaches are commonly used (a sketch of the second one follows this list):

First: flatten the HBase data into plain HDFS files, then load them into MySQL with Sqoop export.

Second: load the HBase data into a Hive table, then move it from Hive into MySQL.

Third: read the table directly through the HBase Java API and write it into MySQL yourself; Sqoop is not needed at all.
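A minimal sketch of the second approach, assuming Hive's HBase integration is available; the table and column names (hbase_src, hbase_flat, id, imei) are made up purely for illustration:

hive> CREATE EXTERNAL TABLE hbase_src(id string, imei string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,imei:imei")
    > TBLPROPERTIES ("hbase.table.name" = "ipj_statics_test");

hive> CREATE TABLE hbase_flat AS SELECT * FROM hbase_src;

The flattened hbase_flat table now lives as ordinary files under the Hive warehouse directory and can be exported to MySQL exactly as shown in <6> below.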

<5> MySQL --> Hive

sqoop import --connect

jdbc:mysql://192.168.10.63/ipj

--table hive_table_test

--hive-import

--hive-table hive_test_table

(Optionally add --create-hive-table; it takes no argument and is not a replacement for --hive-table — it simply makes the job fail if the target Hive table already exists.)

<6> Hive --> MySQL

sqoop export --connect

jdbc:mysql://192.168.10.63/ipj

--username hive

--password 123456

--table target_table

--export-dir /user/hive/warehouse/uv/dt=mytable

Prerequisite: the target table must already exist in MySQL.
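One detail worth adding here: Hive's default text tables delimit fields with Ctrl-A (\001), so an export straight out of the warehouse directory usually needs the input delimiter spelled out. A hedged variant of the command above:

sqoop export --connect jdbc:mysql://192.168.10.63/ipj --username hive --password 123456 --table target_table --export-dir /user/hive/warehouse/uv/dt=mytable --input-fields-terminated-by '\001'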

③ Other Sqoop operations

<1> List all databases in MySQL

sqoop list-databases --connect jdbc:mysql://192.168.10.63:3306/ --username root --password 123456

<2> List all tables in a given MySQL database

sqoop list-tables --connect jdbc:mysql://192.168.10.63:3306/ipj --username root --password 123456

6 Performance of Sqoop1

Test data:

Table: tb_keywords

Rows: 11,628,209

Data file size: 1.4 GB

Test results:

Method                  HDFS ---> DB    DB ---> HDFS

Sqoop                   428s            166s

HDFS <-> FILE <-> DB    209s            105s

From these results, using a flat FILE as the intermediate transfer method performs better than Sqoop, for the following reason:

Fundamentally, Sqoop works over JDBC, so it will not be more efficient than MySQL's own import/export tools. Taking loading data into the DB as an example, Sqoop is designed around staged commits: for a table of, say, 1K rows, it first reads 100 rows (the default), inserts and commits them, then reads the next 100 rows, and so on.
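The batch sizes behind this staged-commit behaviour can be tuned on export; the sketch below uses Sqoop's export properties with values chosen arbitrarily for illustration (the -D options must come before the tool-specific arguments):

sqoop export -Dsqoop.export.records.per.statement=500 -Dsqoop.export.statements.per.transaction=100 --connect jdbc:mysql://192.168.10.63/ipj --username root --password 123456 --table ipj_flow_user --export-dir /user/flow/part-m-00000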

Even so, Sqoop has its advantages, such as ease of use and fault tolerance during job execution. In test environments it is still worth keeping around as a tool when needed.
