Working with a Hadoop Cluster and a Relational Database (RDBMS)

1. Importing data from MySQL into HDFS on the Hadoop cluster:

(1) Preparation: install MySQL and create a database named hadooptest;

then, under hadooptest, create a table named employees containing the following rows (the data comes from the book Hadoop Beginner's Guide); a sketch of the corresponding SQL follows the data:

Alice Engineering 50000 2009-03-12

Bob Sales 35000 2011-10-01

Camille Marketing 40000 2003-04-20

David Executive 75000 2001-03-20

Erica Support 34000 2011-07-07
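A minimal sketch of that MySQL setup (the column names match the Hive schema shown at the end of this post; the exact column types are assumptions):

# Run on the Master as the MySQL root user; creates hadooptest.employees and loads the sample rows.
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hadooptest;
USE hadooptest;
CREATE TABLE employees (
    first_name VARCHAR(30),
    dept       VARCHAR(30),
    salary     INT,
    start_date DATE
);
INSERT INTO employees VALUES
    ('Alice',   'Engineering', 50000, '2009-03-12'),
    ('Bob',     'Sales',       35000, '2011-10-01'),
    ('Camille', 'Marketing',   40000, '2003-04-20'),
    ('David',   'Executive',   75000, '2001-03-20'),
    ('Erica',   'Support',     34000, '2011-07-07');
SQL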

(2) Download and install sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz, and set the corresponding environment variables in /etc/profile.

Then download mysql-connector-java-5.1.34-bin.jar and copy it into Sqoop's lib directory, again adding the corresponding entries to /etc/profile (vim /etc/profile); a sketch of these installation steps follows the profile snippet below:

#set sqoop path

export SQOOP_HOME=/usr/sqoop

export PATH=${SQOOP_HOME}/bin:${PATH}

export CLASSPATH=${SQOOP_HOME}/lib/mysql-connector-java-5.1.34-bin.jar:${CLASSPATH}
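A sketch of the installation itself, assuming both downloads sit in the current directory and that /usr/sqoop matches the SQOOP_HOME above:

# Unpack Sqoop into /usr/sqoop (must agree with SQOOP_HOME in /etc/profile)
tar -xzf sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz
mv sqoop-1.4.5.bin__hadoop-1.0.0 /usr/sqoop
# Put the MySQL JDBC driver on Sqoop's classpath
cp mysql-connector-java-5.1.34-bin.jar /usr/sqoop/lib/
# Reload the environment variables set in /etc/profile
source /etc/profile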

(3) Running the command sqoop import --connect jdbc:mysql://localhost/hadooptest --username hadoop --password 20082009 --table employees -m 1 fails with an error:


The logs show that the failure occurs because the slave nodes cannot connect to the MySQL database on the Master.



To fix this, log in to MySQL on the Master as root and grant the slave nodes access, allowing them to connect without a password:
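Roughly, the grant looks like the following (the '%' host wildcard is an assumption; it lets every host reach hadooptest, which is acceptable for a test cluster but too broad for production):

# On the Master, as the MySQL root user
mysql -u root -p <<'SQL'
GRANT ALL PRIVILEGES ON hadooptest.* TO 'hadoop'@'%' IDENTIFIED BY '';
FLUSH PRIVILEGES;
SQL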


After granting access, run the command again with localhost replaced by the Master's IP address; this time MySQL complains that host Master.hadoop is not allowed to access the database:


So modify the database privileges once more, granting access to Master.hadoop with an empty password:
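A sketch of that grant, assuming Master.hadoop is the hostname MySQL sees in the failed connection above:

mysql -u root -p <<'SQL'
GRANT ALL PRIVILEGES ON hadooptest.* TO 'hadoop'@'Master.hadoop' IDENTIFIED BY '';
FLUSH PRIVILEGES;
SQL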


With the privileges in place, run the command again:

sqoop import --connect jdbc:mysql://172.16.2.17/hadooptest --username  hadoop --table employees -m 1

(By default Sqoop launches 4 mappers to read the data, even though the dataset is tiny; -m sets the number of mappers.) This time the import succeeds:



Looking at the output on HDFS confirms that the MySQL data has been imported successfully:
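One way to inspect the result (the employees directory and the part-m-00000 file name assume the single-mapper run above; Sqoop's default field separator for plain HDFS imports is a comma):

hadoop fs -ls employees
hadoop fs -cat employees/part-m-00000
# Expected: one comma-separated line per row, e.g. Alice,Engineering,50000,2009-03-12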




2. Importing MySQL data into Hive with Sqoop:

(1) First delete the previous output directory on HDFS: hadoop fs -rmr em*
(2) Run the import job with Sqoop:
[hadoop@Master ~]$ sqoop import --connect jdbc:mysql://172.16.2.17/hadooptest --username hadoop --table employees --hive-import --hive-table employees
Warning: /usr/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
14/12/29 10:29:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/29 10:29:56 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
14/12/29 10:29:56 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
14/12/29 10:29:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/29 10:29:56 INFO tool.CodeGenTool: Beginning code generation
14/12/29 10:29:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
14/12/29 10:29:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
14/12/29 10:29:56 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hadoop
Note: /tmp/sqoop-hadoop/compile/934eea95d6f0f4b0d86d229dbc48686c/employees.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/29 10:29:57 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/934eea95d6f0f4b0d86d229dbc48686c/employees.jar
14/12/29 10:29:57 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/12/29 10:29:57 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/12/29 10:29:57 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/12/29 10:29:57 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/12/29 10:29:57 INFO mapreduce.ImportJobBase: Beginning import of employees
14/12/29 10:29:57 INFO db.DBInputFormat: Using read commited transaction isolation
14/12/29 10:29:57 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`first_name`), MAX(`first_name`) FROM `employees`
14/12/29 10:29:57 WARN db.TextSplitter: Generating splits for a textual index column.
14/12/29 10:29:57 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
14/12/29 10:29:57 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
14/12/29 10:29:57 INFO mapred.JobClient: Running job: job_201412282237_0008
14/12/29 10:29:58 INFO mapred.JobClient:  map 0% reduce 0%
14/12/29 10:30:12 INFO mapred.JobClient:  map 50% reduce 0%
14/12/29 10:30:15 INFO mapred.JobClient:  map 100% reduce 0%
14/12/29 10:30:20 INFO mapred.JobClient: Job complete: job_201412282237_0008
14/12/29 10:30:20 INFO mapred.JobClient: Counters: 18
14/12/29 10:30:20 INFO mapred.JobClient:   Map-Reduce Framework
14/12/29 10:30:20 INFO mapred.JobClient:     Spilled Records=0
14/12/29 10:30:20 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2141048832
14/12/29 10:30:20 INFO mapred.JobClient:     Map input records=5
14/12/29 10:30:20 INFO mapred.JobClient:     SPLIT_RAW_BYTES=513
14/12/29 10:30:20 INFO mapred.JobClient:     Map output records=5
14/12/29 10:30:20 INFO mapred.JobClient:     Physical memory (bytes) snapshot=280956928
14/12/29 10:30:20 INFO mapred.JobClient:     CPU time spent (ms)=2510
14/12/29 10:30:20 INFO mapred.JobClient:     Total committed heap usage (bytes)=216793088
14/12/29 10:30:20 INFO mapred.JobClient:   File Input Format Counters 
14/12/29 10:30:20 INFO mapred.JobClient:     Bytes Read=0
14/12/29 10:30:20 INFO mapred.JobClient:   FileSystemCounters
14/12/29 10:30:20 INFO mapred.JobClient:     HDFS_BYTES_READ=513
14/12/29 10:30:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=121353
14/12/29 10:30:20 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=161
14/12/29 10:30:20 INFO mapred.JobClient:   Job Counters 
14/12/29 10:30:20 INFO mapred.JobClient:     Launched map tasks=4
14/12/29 10:30:20 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/12/29 10:30:20 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/12/29 10:30:20 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=24231
14/12/29 10:30:20 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/12/29 10:30:20 INFO mapred.JobClient:   File Output Format Counters 
14/12/29 10:30:20 INFO mapred.JobClient:     Bytes Written=161
14/12/29 10:30:20 INFO mapreduce.ImportJobBase: Transferred 161 bytes in 23.6772 seconds (6.7998 bytes/sec)
14/12/29 10:30:20 INFO mapreduce.ImportJobBase: Retrieved 5 records.
14/12/29 10:30:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
14/12/29 10:30:20 WARN hive.TableDefWriter: Column start_date had to be cast to a less precise type in Hive
14/12/29 10:30:20 INFO hive.HiveImport: Removing temporary files from import process: hdfs://172.16.2.17:9000/user/hadoop/employees/_logs
14/12/29 10:30:20 INFO hive.HiveImport: Loading uploaded data into Hive
14/12/29 10:30:21 INFO hive.HiveImport: Logging initialized using configuration in file:/usr/hive/hive/conf/hive-log4j.properties
14/12/29 10:30:21 INFO hive.HiveImport: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201412291030_581129438.txt
14/12/29 10:30:25 INFO hive.HiveImport: OK
14/12/29 10:30:25 INFO hive.HiveImport: Time taken: 3.157 seconds
14/12/29 10:30:25 INFO hive.HiveImport: Loading data to table default.employees
14/12/29 10:30:25 INFO hive.HiveImport: OK
14/12/29 10:30:25 INFO hive.HiveImport: Time taken: 0.185 seconds
14/12/29 10:30:25 INFO hive.HiveImport: Hive import complete.
14/12/29 10:30:25 INFO hive.HiveImport: Export directory is empty, removing it.
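Besides querying through Hive as in step (3) below, the imported file can also be inspected directly under the Hive warehouse directory on HDFS (/user/hive/warehouse is the default location and is an assumption here):

hadoop fs -ls /user/hive/warehouse/employees
# Hive imports use the \001 field delimiter by default, so fields may appear run together:
hadoop fs -cat /user/hive/warehouse/employees/part-m-*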
(3) Check the results:
[hadoop@Master ~]$ hive -e "select * from employees"
Logging initialized using configuration in file:/usr/hive/hive/conf/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201412291031_1993004175.txt
OK
Alice   Engineering     50000   2009-03-12
Bob     Sales   35000   2011-10-01
Camille Marketing       40000   2003-04-20
David   Executive       75000   2001-03-20
Erica   Support 34000   2011-07-07
Time taken: 2.586 seconds
[hadoop@Master ~]$ hive -e "describe employees"
Logging initialized using configuration in file:/usr/hive/hive/conf/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201412291041_685213829.txt
OK
first_name      string
dept    string
salary  int
start_date      string
Time taken: 2.297 seconds



