1. Environment information
[hadoop@master sqoop-1.4.6]$ cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)
[hadoop@master sqoop-1.4.6]$
[hadoop@master sqoop-1.4.6]$ mysql --version
mysql Ver 14.14 Distrib 5.6.37, for Linux (x86_64) using EditLine wrapper
[hadoop@master sqoop-1.4.6]$ hadoop version
Hadoop 2.8.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 20fe5304904fc2f5a18053c389e43cd26f7a70fe
Compiled by vinodkv on 2017-06-02T06:14Z
Compiled with protoc 2.5.0
From source with checksum 60125541c2b3e266cbf3becc5bda666
This command was run using /home/hadoop/hadoop-2.8.1/share/hadoop/common/hadoop-common-2.8.1.jar
[hadoop@master sqoop-1.4.6]$ hive --version
Hive 2.1.1
Subversion git://jcamachorodriguez-rMBP.local/Users/jcamachorodriguez/src/workspaces/hive/HIVE-release2/hive -r 1af77bbf8356e86cabbed92cfa8cc2e1470a1d5c
Compiled by jcamachorodriguez on Tue Nov 29 19:46:12 GMT 2016
From source with checksum 569ad6d6e5b71df3cb04303183948d90
[hadoop@master sqoop-1.4.6]$ hbase version
HBase 1.2.6
Source code repository file:///home/busbey/projects/hbase/hbase-assembly/target/hbase-1.2.6 revision=Unknown
Compiled by busbey on Mon May 29 02:25:32 CDT 2017
From source with checksum 7e8ce83a648e252758e9dae1fbe779c9
2. Download
http://mirror.bit.edu.cn/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
That directory contains several Sqoop packages; be sure to use the exact version above, since the sqoop-1.4.6.tar.gz package is missing the sqoop-1.4.6.jar file.
2.1 Extract and configure
Extract the tarball into /home/hadoop and rename the directory to sqoop-1.4.6.
[hadoop@master sqoop-1.4.6]$ cd /home/hadoop/sqoop-1.4.6
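The extract-and-rename step can be sketched as follows (a sketch only; it assumes the tarball from the mirror above was saved to /home/hadoop):

```shell
# Extract the Sqoop binary tarball and rename the directory to sqoop-1.4.6
cd /home/hadoop
tar -zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop-1.4.6
```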
2.1.1 Copy files
[hadoop@master sqoop-1.4.6]$ cp sqoop-1.4.6.jar lib/ # copy sqoop-1.4.6.jar into the lib directory
[hadoop@master sqoop-1.4.6]$ cp mysql-connector-java-5.1.44-bin.jar lib/ # copy the MySQL JDBC driver into the lib directory; it must be downloaded and extracted separately, as mentioned in the earlier Hive installation
2.2 Configuration
sqoop-env.sh
[hadoop@master conf]$ cat sqoop-env.sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*

# Set Hadoop-specific environment variables here.

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.8.1/

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.8.1/share/hadoop/mapreduce

#set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/hbase-1.2.6

#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/apache-hive-2.1.1

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=   # use the bundled ZooKeeper
Append the following environment variables to the end of /etc/profile:
export JAVA_HOME=/usr/java/jdk1.8.0_131/
export HADOOP_HOME=/home/hadoop/hadoop-2.8.1/
export HIVE_HOME=/home/hadoop/apache-hive-2.1.1
export HBASE_HOME=/home/hadoop/hbase-1.2.6
export SQOOP_HOME=/home/hadoop/sqoop-1.4.6   # add the Sqoop variable
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$HBASE_HOME/bin:$SQOOP_HOME/bin   # add Sqoop to PATH
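After editing /etc/profile, the changes can be applied to the current shell and checked (assuming the paths above are correct for your machine):

```shell
# Reload the profile and confirm sqoop is now on the PATH
source /etc/profile
sqoop version   # should report Sqoop 1.4.6 if SQOOP_HOME/bin resolved correctly
```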
configure-sqoop
Comment out the section below (otherwise warnings are printed at startup):
[hadoop@master bin]$ cat -n configure-sqoop
...........
   134  ## Moved to be a runtime check in sqoop.
   135  #if [ ! -d "${HCAT_HOME}" ]; then
   136  #  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
   137  #  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
   138  #fi
   139
   140  #if [ ! -d "${ACCUMULO_HOME}" ]; then
   141  #  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
   142  #  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
   143  #fi
..........
3. Verification
[hadoop@master bin]$ sqoop import --connect jdbc:mysql://10.0.1.98/ykt --table paper_detail --username root --password 123456 --direct -m 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.8.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/09/23 19:52:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/09/23 19:52:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/09/23 19:52:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/09/23 19:52:02 INFO tool.CodeGenTool: Beginning code generation
17/09/23 19:52:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `paper_detail` AS t LIMIT 1
17/09/23 19:52:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `paper_detail` AS t LIMIT 1
17/09/23 19:52:02 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.8.1/share/hadoop/mapreduce
Note: /tmp/sqoop-hadoop/compile/f2ada91047344c6af2723a1a8044f440/paper_detail.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/09/23 19:52:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/f2ada91047344c6af2723a1a8044f440/paper_detail.jar
17/09/23 19:52:06 INFO manager.DirectMySQLManager: Beginning mysqldump fast path import
17/09/23 19:52:06 INFO mapreduce.ImportJobBase: Beginning import of paper_detail
17/09/23 19:52:07 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/09/23 19:52:10 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/09/23 19:52:11 INFO client.RMProxy: Connecting to ResourceManager at master/10.0.1.118:18040
17/09/23 19:52:22 INFO db.DBInputFormat: Using read commited transaction isolation
17/09/23 19:52:23 INFO mapreduce.JobSubmitter: number of splits:1
17/09/23 19:52:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1506138213755_0004
17/09/23 19:52:28 INFO impl.YarnClientImpl: Submitted application application_1506138213755_0004
17/09/23 19:52:29 INFO mapreduce.Job: The url to track the job: http://master:18088/proxy/application_1506138213755_0004/
17/09/23 19:52:29 INFO mapreduce.Job: Running job: job_1506138213755_0004
17/09/23 19:52:55 INFO mapreduce.Job: Job job_1506138213755_0004 running in uber mode : false
17/09/23 19:52:55 INFO mapreduce.Job:  map 0% reduce 0%
17/09/23 19:53:26 INFO mapreduce.Job:  map 100% reduce 0%
17/09/23 19:53:27 INFO mapreduce.Job: Job job_1506138213755_0004 completed successfully
17/09/23 19:53:28 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=160883
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=27954
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=27272
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=27272
                Total vcore-seconds taken by all map tasks=27272
                Total megabyte-seconds taken by all map tasks=27926528
        Map-Reduce Framework
                Map input records=1
                Map output records=962
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=92
                CPU time spent (ms)=1150
                Physical memory (bytes) snapshot=105013248
                Virtual memory (bytes) snapshot=2092593152
                Total committed heap usage (bytes)=17776640
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=27954
17/09/23 19:53:28 INFO mapreduce.ImportJobBase: Transferred 27.2988 KB in 77.2979 seconds (361.6399 bytes/sec)
17/09/23 19:53:28 INFO mapreduce.ImportJobBase: Retrieved 962 records.
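The log above warns that putting the password on the command line is insecure. A safer variant, sketched with the same connection details, uses Sqoop's --password-file option (the local file path here is an illustrative assumption):

```shell
# Store the password in a file readable only by its owner,
# then reference it instead of passing --password on the command line
echo -n "123456" > /home/hadoop/.mysql.pass
chmod 400 /home/hadoop/.mysql.pass
sqoop import \
  --connect jdbc:mysql://10.0.1.98/ykt \
  --table paper_detail \
  --username root \
  --password-file file:///home/hadoop/.mysql.pass \
  --direct -m 1
```

Alternatively, pass -P to have Sqoop prompt for the password interactively, as the warning itself suggests.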
View the data in HDFS
If no HDFS directory is specified, the output is generated under /user/someuser/paper_detail/ by default.
[hadoop@master bin]$ hadoop fs -cat /user/hadoop/paper_detail/part-m-00000 | more
1,1,填空,22,5,1,1
1,1,填空,23,5,2,2
1,1,填空,31,5,3,3
1,2,解答题,403,5,1,4
1,2,解答题,394,5,2,5
1,3,多选题,987,5,1,6
1,4,单选题,757,5,1,7
1,4,单选题,133,5,2,8
2,1,单项选择,19,1,1,1
2,2,数字选择,18,2,1,2
2,2,数字选择,21,2,2,3
2,2,数字选择,24,2,3,4
2,2,数字选择,25,2,4,5
2,3,填空,20,5,1,6
2,3,填空,23,10,2,7
2,4,233,395,12,1,8
3,1,单项选择,16,2,1,1
3,1,单项选择,17,2,2,2
3,1,单项选择,18,2,3,3
3,1,单项选择,21,2,4,4
3,1,单项选择,24,2,5,5
3,1,单项选择,25,2,6,6
3,2,解答,62,2,1,7
3,2,解答,63,2,2,8
4,1,选择题,717,2,1,1
4,1,选择题,718,2,2,2
......
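Since HIVE_HOME is configured above, the same table could also be loaded straight into Hive instead of plain HDFS files; a minimal sketch using Sqoop's --hive-import option (the Hive table name is an assumption, and -P prompts for the password):

```shell
# Import paper_detail from MySQL directly into a Hive table
sqoop import \
  --connect jdbc:mysql://10.0.1.98/ykt \
  --table paper_detail \
  --username root -P \
  --hive-import \
  --hive-table paper_detail \
  -m 1
```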
Sqoop official user guide: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_controlling_the_import_process