sqoop使用

1.安装

Step 3: Downloading Sqoop

We can download the latest version of Sqoop from the following link For this tutorial, we are using version 1.4.5, that is, sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz.

Step 4: Installing Sqoop

The following commands are used to extract the Sqoop tar ball and move it to “/usr/lib/sqoop” directory.

$tar -xvf sqoop-1.4.4.bin__hadoop-2.0.4-alpha.tar.gz
$ su
password:

# mv sqoop-1.4.4.bin__hadoop-2.0.4-alpha /usr/lib/sqoop
#exit

Step 5: Configuring bashrc

You have to set up the Sqoop environment by appending the following lines to ~/.bashrc file:

#Sqoop
export SQOOP_HOME=/usr/lib/sqoop export PATH=$PATH:$SQOOP_HOME/bin

The following command is used to execute ~/.bashrc file.

$ source ~/.bashrc

Step 6: Configuring Sqoop

To configure Sqoop with Hadoop, you need to edit the sqoop-env.sh file, which is placed in the $SQOOP_HOME/conf directory. First of all, Redirect to Sqoop config directory and copy the template file using the following command:

$ cd $SQOOP_HOME/conf
$ mv sqoop-env-template.sh sqoop-env.sh

Open sqoop-env.sh and edit the following lines:

export HADOOP_COMMON_HOME=/usr/local/hadoop 
export HADOOP_MAPRED_HOME=/usr/local/hadoop

Step 7: Download and Configure mysql-connector-java

We can download mysql-connector-java-5.1.30.tar.gz file from the following link.

The following commands are used to extract mysql-connector-java tarball and move mysql-connector-java-5.1.30-bin.jar to /usr/lib/sqoop/lib directory.

$ tar -zxf mysql-connector-java-5.1.30.tar.gz
$ su
password:

# cd mysql-connector-java-5.1.30
# mv mysql-connector-java-5.1.30-bin.jar /usr/lib/sqoop/lib

Step 8: Verifying Sqoop

The following command is used to verify the Sqoop version.

$ cd $SQOOP_HOME/bin
$ sqoop-version

Expected output:

14/12/17 14:52:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
Sqoop 1.4.5 git commit id 5b34accaca7de251fc91161733f906af2eddbe83
Compiled by abe on Fri Aug 1 11:19:26 PDT 2014

Sqoop installation is complete.

2.导入数据

Importing a Table

Sqoop tool ‘import’ is used to import table data from the table to the Hadoop file system as a text file or a binary file.

The following command is used to import the emp table from MySQL database server to HDFS.

 

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp --m 1

If it is executed successfully, then you get the following output.

 

14/12/22 15:24:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/22 15:24:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/22 15:24:56 INFO tool.CodeGenTool: Beginning code generation
14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
14/12/22 15:24:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
14/12/22 15:25:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/cebe706d23ebb1fd99c1f063ad51ebd7/emp.jar
-----------------------------------------------------
-----------------------------------------------------
14/12/22 15:25:40 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1419242001831_0001/
14/12/22 15:26:45 INFO mapreduce.Job: Job job_1419242001831_0001 running in uber mode : false
14/12/22 15:26:45 INFO mapreduce.Job: map 0% reduce 0%
14/12/22 15:28:08 INFO mapreduce.Job: map 100% reduce 0%
14/12/22 15:28:16 INFO mapreduce.Job: Job job_1419242001831_0001 completed successfully
-----------------------------------------------------
-----------------------------------------------------
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Transferred 145 bytes in 177.5849 seconds (0.8165 bytes/sec)
14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.

To verify the imported data in HDFS, use the following command.

 

$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*

It shows you the emp table data and fields are separated with comma (,).

 

1201, gopal,    manager, 50000, TP
1202, manisha,  preader, 50000, TP
1203, kalil,    php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi,  admin,   20000, TP

Importing into Target Directory

We can specify the target directory while importing table data into HDFS using the Sqoop import tool.

Following is the syntax to specify the target directory as option to the Sqoop import command.

 

--target-dir <new or exist directory in HDFS>

The following command is used to import emp_add table data into ‘/queryresult’ directory.

 

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp_add \
--m 1 \
--target-dir /queryresult

The following command is used to verify the imported data in /queryresult directory form emp_add table.

 

$ $HADOOP_HOME/bin/hadoop fs -cat /queryresult/part-m-*

It will show you the emp_add table data with comma (,) separated fields.

 

1201, 288A, vgiri,   jublee
1202, 108I, aoc,     sec-bad
1203, 144Z, pgutta,  hyd
1204, 78B,  oldcity, sec-bad
1205, 720C, hitech,  sec-bad

Import Subset of Table Data

We can import a subset of a table using the ‘where’ clause in Sqoop import tool. It executes the corresponding SQL query in the respective database server and stores the result in a target directory in HDFS.

The syntax for where clause is as follows.

 

--where <condition>

The following command is used to import a subset of emp_add table data. The subset query is to retrieve the employee id and address, who lives in Secunderabad city.

 

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp_add \
--m 1 \
--where city =’sec-bad’” \
--target-dir /wherequery

The following command is used to verify the imported data in /wherequery directory from the emp_add table.

 

$ $HADOOP_HOME/bin/hadoop fs -cat /wherequery/part-m-*

It will show you the emp_add table data with comma (,) separated fields.

 

1202, 108I, aoc,     sec-bad
1204, 78B,  oldcity, sec-bad
1205, 720C, hitech,  sec-bad

Incremental Import

Incremental import is a technique that imports only the newly added rows in a table. It is required to add ‘incremental’, ‘check-column’, and ‘last-value’ options to perform the incremental import.

The following syntax is used for the incremental option in Sqoop import command.

 

--incremental <mode>
--check-column <column name>
--last value <last check column value>

Let us assume the newly added data into emp table is as follows:

 

1206, satish p, grp des, 20000, GR

The following command is used to perform the incremental import in the emptable.

 

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--m 1 \
--incremental append \
--check-column id \
-last value 1205

The following command is used to verify the imported data from emp table to HDFS emp/ directory.

 

$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*

It shows you the emp table data with comma (,) separated fields.

 

1201, gopal,    manager, 50000, TP
1202, manisha,  preader, 50000, TP
1203, kalil,    php dev, 30000, AC
1204, prasanth, php dev, 30000, AC
1205, kranthi,  admin,   20000, TP
1206, satish p, grp des, 20000, GR

The following command is used to see the modified or newly added rows from the emp table.

 

$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*1

It shows you the newly added rows to the emp table with comma (,) separated fields.

 

1206, satish p, grp des, 20000, GR

Syntax

The following syntax is used to import all tables.

 

$ sqoop import-all-tables (generic-args) (import-args) 
$ sqoop-import-all-tables (generic-args) (import-args)

Example

Let us take an example of importing all tables from the userdb database. The list of tables that the database userdb contains is as follows.

 

 +--------------------+
 |      Tables        |
 +--------------------+
 |      emp           |
 |      emp_add       |
 |      emp_contact   |
 +--------------------+

The following command is used to import all the tables from the userdbdatabase.

 

$ sqoop import-all-tables \
--connect jdbc:mysql://localhost/userdb \
--username root

Note: If you are using the import-all-tables, it is mandatory that every table in that database must have a primary key field.

The following command is used to verify all the table data to the userdb database in HDFS

 

$ $HADOOP_HOME/bin/hadoop fs -ls

It will show you the list of table names in userdb database as directories.

Output

 

drwxr-xr-x - hadoop supergroup 0 2014-12-22 22:50 _sqoop
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:46 emp
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:50 emp_add
drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:52 emp_contact

 

3.导出数据

Syntax

The following is the syntax for the export command.

 

$ sqoop export (generic-args) (export-args) 
$ sqoop-export (generic-args) (export-args)

Example

Let us take an example of the employee data in file, in HDFS. The employee data is available in emp_data file in ‘emp/’ directory in HDFS. The emp_datais as follows.

 

1201, gopal,     manager, 50000, TP
1202, manisha,   preader, 50000, TP
1203, kalil,     php dev, 30000, AC
1204, prasanth,  php dev, 30000, AC
1205, kranthi,   admin,   20000, TP
1206, satish p,  grp des, 20000, GR

It is mandatory that the table to be exported is created manually and is present in the database from where it has to be exported.

The following query is used to create the table ‘employee’ in mysql command line.

 

$ mysql
mysql> USE db;
mysql> CREATE TABLE employee ( 
   id INT NOT NULL PRIMARY KEY, 
   name VARCHAR(20), 
   deg VARCHAR(20),
   salary INT,
   dept VARCHAR(10));

The following command is used to export the table data (which is in emp_datafile on HDFS) to the employee table in db database of Mysql database server.

 

$ sqoop export \
--connect jdbc:mysql://localhost/db \
--username root \
--table employee \ 
--export-dir /emp/emp_data

The following command is used to verify the table in mysql command line.

 

mysql>select * from employee;

If the given data is stored successfully, then you can find the following table of given employee data.

 

+------+--------------+-------------+-------------------+--------+
| Id   | Name         | Designation | Salary            | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1201 | gopal        | manager     | 50000             | TP     |
| 1202 | manisha      | preader     | 50000             | TP     |
| 1203 | kalil        | php dev     | 30000             | AC     |
| 1204 | prasanth     | php dev     | 30000             | AC     |
| 1205 | kranthi      | admin       | 20000             | TP     |
| 1206 | satish p     | grp des     | 20000             | GR     |
+------+--------------+-------------+-------------------+--------+

 

 

 

 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值