Sqoop export and import commands

Sqoop Import Examples:
Sqoop Import: imports data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS) and its subprojects (Hive, HBase).


Import the data (MySQL table) to HBase: 

Case 1: The table has a primary key, and all columns of the MySQL table are imported into the HBase table.

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName  --column-family hbase_table_col1 --hbase-create-table

Case 2: The table has a primary key, and only a few columns of the MySQL table are imported into the HBase table.

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table 

Note: The column list specified in the --columns attribute must include the primary key column.
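
For example, assuming the table's primary key column is named id (a hypothetical name), the --columns list must include it:

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns id,column1,column2 --column-family hbase_table_col1 --hbase-create-table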

Case 3: The table doesn't have a primary key, so choose one column as the --hbase-row-key. All columns of the MySQL table are imported into the HBase table.

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 --hbase-create-table

Case 4: The table doesn't have a primary key, so choose one column as the --hbase-row-key. Only a few columns of the MySQL table are imported into the HBase table.

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table  

Note: The column specified in the --hbase-row-key attribute must be in the --columns list. Otherwise, the command will execute successfully but no records will be inserted into HBase.


Note: The value of the primary key column, or of the column specified in the --hbase-row-key attribute, becomes the HBase row key. If the MySQL table doesn't have a primary key, or the column specified in --hbase-row-key doesn't have unique values, some records will be lost.

Example: Consider a MySQL table test_table with two columns, name and address. The table test_table has no primary key or unique key column.

Records of test_table:

name    address
------  -------
abc     123
sqw     345
abc     125
sdf     1234
aql     23dw


Run the following command to import test_table data into HBase:

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table test_table --hbase-table hbase_test_table --column-family test_table_col1 --hbase-row-key name --hbase-create-table

Only 4 records are visible in the HBase table instead of 5. In the above example, two rows share the value 'abc' in the name column, and that column's value is used as the HBase row key. When the first record with name 'abc' arrives, it is inserted into the HBase table; when the next record with the same value 'abc' arrives, it overwrites the previous record.
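
To verify, you can count the rows from the HBase shell (count is a built-in HBase shell command; the table name matches the import above):

$ hbase shell
hbase> count 'hbase_test_table'

The count should report 4 rows, matching the overwrite behavior described above.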

The same problem occurs if the table has a composite primary key, because only one column from the composite key is used as the HBase row key.
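
One possible workaround (a sketch, not part of the original commands) is to synthesize a unique row key using Sqoop's free-form query import; the row_key alias below is a hypothetical name:

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --query 'SELECT CONCAT(name, "_", address) AS row_key, name, address FROM test_table WHERE $CONDITIONS' -m 1 --hbase-table hbase_test_table --column-family test_table_col1 --hbase-row-key row_key --hbase-create-table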

Import the data (MySQL table) to Hive:

Case 1: Import a MySQL table into Hive when the table has a primary key.

$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home

Case 2: Import a MySQL table into Hive when the table doesn't have a primary key.

$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home --split-by column_name

or 

$ bin/sqoop-import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home -m 1
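
After either import completes, a quick sanity check from the Hive CLI (assuming hive is on the PATH) confirms the table was created and populated:

$ hive -e "SELECT * FROM tableName LIMIT 5;"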



Import the data (MySQL table) to HDFS:


Case 1: Import a MySQL table into HDFS when the table has a primary key.

$ bin/sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /user/ankit/tableName

Case 2: Import a MySQL table into HDFS when the table doesn't have a primary key.

$ bin/sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /user/ankit/tableName -m 1
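
The imported data lands as part files under the target directory. You can inspect it with standard HDFS commands (part-m-00000 is the typical file name produced by a single-mapper import):

$ hadoop fs -ls /user/ankit/tableName
$ hadoop fs -cat /user/ankit/tableName/part-m-00000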



Sqoop Export Examples:

Sqoop Export: exports data from HDFS and its subprojects (Hive, HBase) back into an RDBMS.

Export Hive table back to an RDBMS:

By default, Hive stores data using ^A as the field delimiter and \n as the row delimiter.

$ bin/sqoop export --connect jdbc:mysql://localhost/test_db --table tableName  --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'

where '\001' is the octal representation of ^A.
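
Note that Sqoop export does not create the target table; it must already exist in the database before the export runs. A minimal sketch of creating it from the MySQL client (column names and types are assumptions for illustration):

$ mysql -u root -p test_db -e "CREATE TABLE tableName (name VARCHAR(64), address VARCHAR(64));"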


For more commands, see http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html
