Sqoop: Installation, Introduction, and Basic Usage

1 Installing Sqoop

First, download Sqoop from http://sqoop.apache.org/: open the Download page and follow the release link to get the tarball.

After downloading Sqoop, upload the archive to your Linux machine and extract it to a directory of your choice. Then modify the configuration file, add Sqoop to /etc/profile, and put the JDBC driver jar for your database into Sqoop's lib directory.

Because the shipped env file is only a template, copy it to create your own sqoop-env.sh first.

Edit sqoop-env.sh.

Add Sqoop to the environment variables in /etc/profile.

Copy the database JDBC driver jar into Sqoop's lib directory.

Run source /etc/profile to reload the configuration.

Then run sqoop version; if the version information is printed, the installation succeeded.
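Put together, the steps above look roughly like the following shell session. This is a minimal sketch: the Sqoop tarball name, the /opt installation path, and the MySQL connector jar name are assumptions and should be adapted to your own environment.

# assumed tarball, install path and driver jar; adjust to your environment
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/
cd /opt/sqoop-1.4.7.bin__hadoop-2.6.0/conf
cp sqoop-env-template.sh sqoop-env.sh        # the shipped env file is only a template
# in sqoop-env.sh, point HADOOP_COMMON_HOME, HADOOP_MAPRED_HOME and HIVE_HOME at your installs
echo 'export SQOOP_HOME=/opt/sqoop-1.4.7.bin__hadoop-2.6.0' >> /etc/profile
echo 'export PATH=$PATH:$SQOOP_HOME/bin' >> /etc/profile
source /etc/profile
cp mysql-connector-java-5.1.47.jar $SQOOP_HOME/lib/   # JDBC driver for MySQL
sqoop version                                # prints the Sqoop version if everything is in place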


2 Introduction to Sqoop

Sqoop is an Apache tool for transferring data between Hadoop and relational database servers.

It has two core functions:

Import (bringing data in)

Export (sending data out)

Importing data: loading data from MySQL, Oracle, and other relational databases into Hadoop storage systems such as HDFS, Hive, and HBase.

Exporting data: moving data from the Hadoop file system out to relational databases such as MySQL. At its core, Sqoop is just a command-line tool; compared with HDFS and Hive, there is no deep theory behind it.

The sqoop tool essentially migrates data; it works by translating a Sqoop migration command into a MapReduce program.

The hive tool essentially performs computation; it relies on HDFS to store data and translates SQL into MapReduce programs.

How Sqoop works: it translates an import or export command into a MapReduce program; the generated MapReduce job mainly customizes InputFormat and OutputFormat.


3 Using Sqoop

Commonly used parameters (a quick example follows the list):

--connect <jdbc-uri>                  JDBC connection string
--connection-manager <class-name>     connection manager class
--driver <class-name>                 JDBC driver class
--hadoop-mapred-home <dir>            override $HADOOP_MAPRED_HOME
--help                                print usage information
-P                                    read the password from the console
--password <password>                 password
--username <username>                 username
--verbose                             print verbose progress information
--connection-param-file <filename>    optional file with extra connection parameters
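Before running a full import, these connection parameters can be exercised with Sqoop's list-databases and list-tables tools; the host, database, and credentials below are placeholders taken from the examples in this post.

sqoop list-databases --connect jdbc:mysql://192.168.1.10:3306/ --username root -P
sqoop list-tables --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456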

Category 1: importing data from a database into HDFS

sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456  --table trade_detail --columns 'id, account, income, expenses'
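Once the job finishes, the imported files can be checked on HDFS. With no --target-dir given, Sqoop writes to a directory named after the table under the current user's HDFS home directory; /user/root below is an assumption based on running as root.

hdfs dfs -ls /user/root/trade_detail
hdfs dfs -cat /user/root/trade_detail/part-m-00000 | head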
 
 

Specify the output path and the field delimiter:

sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456  --table trade_detail --target-dir '/sqoop/td' --fields-terminated-by '\t'

 
 

Specify the number of map tasks with -m:

sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456  --table trade_detail --target-dir '/sqoop/td1' --fields-terminated-by '\t' -m 2
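With more than one map task, Sqoop needs a column to split the work on; a primary key is used automatically if the table has one, otherwise a split column must be named explicitly (or the job must fall back to a single mapper, e.g. via --autoreset-to-one-mapper). A sketch, assuming trade_detail is split on its id column and using a made-up target directory:

sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456 \
--table trade_detail --split-by id --target-dir '/sqoop/td1_split' --fields-terminated-by '\t' -m 2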
 
 

Add a WHERE condition (note: the condition must be enclosed in quotes):

sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456  --table trade_detail --where 'id>3' --target-dir '/sqoop/td2' 
 
 

Add a query statement (use \ to break the statement across lines):


 
 
sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456 \
--query 'SELECT * FROM trade_detail where id > 2 AND $CONDITIONS' --split-by trade_detail.id --target-dir '/sqoop/td3'

Note: when using --query, the WHERE clause must include AND $CONDITIONS. Also mind the difference between single and double quotes: if the --query string is wrapped in double quotes, $CONDITIONS must be escaped as \$CONDITIONS. If the number of map tasks is set to 1 (-m 1), --split-by ${tablename.column} can be omitted; otherwise it must be supplied.
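For comparison, a minimal sketch of the same query written with double quotes, where $CONDITIONS has to be escaped; the target directory '/sqoop/td3_dq' is made up for this example:

sqoop import --connect jdbc:mysql://192.168.1.10:3306/bigdata30 --username root --password 123456 \
--query "SELECT * FROM trade_detail where id > 2 AND \$CONDITIONS" --split-by trade_detail.id --target-dir '/sqoop/td3_dq'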

Category 2: exporting data from HDFS into a database (do not forget to specify the delimiter)

sqoop export --connect jdbc:mysql://192.168.8.120:3306/bigdata30 --username root --password 123456 --export-dir '/td3' --table td_bak -m 1 --fields-terminated-by ','
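Sqoop export writes into an existing table and does not create it, so td_bak must already exist in MySQL with a schema matching the exported fields. A minimal sketch of creating it from the shell; the column types are assumptions based on the columns used in the imports above:

mysql -h 192.168.8.120 -u root -p -e "CREATE TABLE bigdata30.td_bak (id INT, account VARCHAR(50), income DOUBLE, expenses DOUBLE);"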
 
 

Run sqoop help to see which commands Sqoop supports.

If you are not sure how to use a particular command, run sqoop help <command> to see its usage. For example, sqoop help import prints the following:


 
 
Common arguments:
   --connect <jdbc-uri>   Specify JDBC connect string
   --connection-manager <class-name>   Specify connection manager class name
   --connection-param-file <properties-file>   Specify connection parameters file
   --driver <class-name>   Manually specify JDBC driver class to use
   --hadoop-home <hdir>   Override $HADOOP_MAPRED_HOME_ARG
   --hadoop-mapred-home <dir>   Override $HADOOP_MAPRED_HOME_ARG
   --help   Print usage instructions
   --metadata-transaction-isolation-level <isolationlevel>   Defines the transaction isolation level for metadata queries. For more details check java.sql.Connection javadoc or the JDBC specification
   --oracle-escaping-disabled <boolean>   Disable the escaping mechanism of the Oracle/OraOop connection managers
   -P   Read password from console
   --password <password>   Set authentication password
   --password-alias <password-alias>   Credential provider password alias
   --password-file <password-file>   Set authentication password file path
   --relaxed-isolation   Use read-uncommitted isolation for imports
   --skip-dist-cache   Skip copying jars to distributed cache
   --temporary-rootdir <rootdir>   Defines the temporary root directory for the import
   --throw-on-error   Rethrow a RuntimeException on error occurred during the job
   --username <username>   Set authentication username
   --verbose   Print more information while working

Import control arguments:
   --append   Imports data in append mode
   --as-avrodatafile   Imports data to Avro data files
   --as-parquetfile   Imports data to Parquet files
   --as-sequencefile   Imports data to SequenceFiles
   --as-textfile   Imports data as plain text (default)
   --autoreset-to-one-mapper   Reset the number of mappers to one mapper if no split key available
   --boundary-query <statement>   Set boundary query for retrieving max and min value of the primary key
   --columns <col,col,col...>   Columns to import from table
   --compression-codec <codec>   Compression codec to use for import
   --delete-target-dir   Imports data in delete mode
   --direct   Use direct import fast path
   --direct-split-size <n>   Split the input stream every 'n' bytes when importing in direct mode
   -e,--query <statement>   Import results of SQL 'statement'
   --fetch-size <n>   Set number 'n' of rows to fetch from the database when more rows are needed
   --inline-lob-limit <n>   Set the maximum size for an inline LOB
   -m,--num-mappers <n>   Use 'n' map tasks to import in parallel
   --mapreduce-job-name <name>   Set name for generated mapreduce job
   --merge-key <column>   Key column to use to join results
   --split-by <column-name>   Column of the table used to split work units
   --split-limit <size>   Upper Limit of rows per split for split columns of Date/Time/Timestamp and integer types. For date or timestamp fields it is calculated in seconds. split-limit should be greater than 0
   --table <table-name>   Table to read
   --target-dir <dir>   HDFS plain table destination
   --validate   Validate the copy using the configured validator
   --validation-failurehandler <validation-failurehandler>   Fully qualified class name for ValidationFailureHandler
   --validation-threshold <validation-threshold>   Fully qualified class name for ValidationThreshold
   --validator <validator>   Fully qualified class name for the Validator
   --warehouse-dir <dir>   HDFS parent for table destination
   --where <where clause>   WHERE clause to use during import
   -z,--compress   Enable compression

Incremental import arguments:
   --check-column <column>   Source column to check for incremental change
   --incremental <import-type>   Define an incremental import of type 'append' or 'lastmodified'
   --last-value <value>   Last imported value in the incremental check column

Output line formatting arguments:
   --enclosed-by <char>   Sets a required field enclosing character
   --escaped-by <char>   Sets the escape character
   --fields-terminated-by <char>   Sets the field separator character
   --lines-terminated-by <char>   Sets the end-of-line character
   --mysql-delimiters   Uses MySQL's default delimiter set: fields: ,  lines: \n  escaped-by: \  optionally-enclosed-by: '
   --optionally-enclosed-by <char>   Sets a field enclosing character

Input parsing arguments:
   --input-enclosed-by <char>   Sets a required field encloser
   --input-escaped-by <char>   Sets the input escape character
   --input-fields-terminated-by <char>   Sets the input field separator
   --input-lines-terminated-by <char>   Sets the input end-of-line char
   --input-optionally-enclosed-by <char>   Sets a field enclosing character

Hive arguments:
   --create-hive-table   Fail if the target hive table exists
   --external-table-dir <hdfs path>   Sets where the external table is in HDFS
   --hive-database <database-name>   Sets the database name to use when importing to hive
   --hive-delims-replacement <arg>   Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string
   --hive-drop-import-delims   Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
   --hive-home <dir>   Override $HIVE_HOME
   --hive-import   Import tables into Hive (Uses Hive's default delimiters if none are set.)
   --hive-overwrite   Overwrite existing data in the Hive table
   --hive-partition-key <partition-key>   Sets the partition key to use when importing to hive
   --hive-partition-value <partition-value>   Sets the partition value to use when importing to hive
   --hive-table <table-name>   Sets the table name to use when importing to hive
   --map-column-hive <arg>   Override mapping for specific column to hive types.

HBase arguments:
   --column-family <family>   Sets the target column family for the import
   --hbase-bulkload   Enables HBase bulk loading
   --hbase-create-table   If specified, create missing HBase tables
   --hbase-row-key <col>   Specifies which input column to use as the row key
   --hbase-table <table>   Import to <table> in HBase

HCatalog arguments:
   --hcatalog-database <arg>   HCatalog database name
   --hcatalog-home <hdir>   Override $HCAT_HOME
   --hcatalog-partition-keys <partition-key>   Sets the partition keys to use when importing to hive
   --hcatalog-partition-values <partition-value>   Sets the partition values to use when importing to hive
   --hcatalog-table <arg>   HCatalog table name
   --hive-home <dir>   Override $HIVE_HOME
   --hive-partition-key <partition-key>   Sets the partition key to use when importing to hive
   --hive-partition-value <partition-value>   Sets the partition value to use when importing to hive
   --map-column-hive <arg>   Override mapping for specific column to hive types.

HCatalog import specific options:
   --create-hcatalog-table   Create HCatalog before import
   --drop-and-create-hcatalog-table   Drop and Create HCatalog before import
   --hcatalog-storage-stanza <arg>   HCatalog storage stanza for table creation

Accumulo arguments:
   --accumulo-batch-size <size>   Batch size in bytes
   --accumulo-column-family <family>   Sets the target column family for the import
   --accumulo-create-table   If specified, create missing Accumulo tables
   --accumulo-instance <instance>   Accumulo instance name.
   --accumulo-max-latency <latency>   Max write latency in milliseconds
   --accumulo-password <password>   Accumulo password.
   --accumulo-row-key <col>   Specifies which input column to use as the row key
   --accumulo-table <table>   Import to <table> in Accumulo
   --accumulo-user <user>   Accumulo user name.
   --accumulo-visibility <vis>   Visibility token to be applied to all rows imported
   --accumulo-zookeepers <zookeepers>   Comma-separated list of zookeepers (host:port)

Code generation arguments:
   --bindir <dir>   Output directory for compiled objects
   --class-name <name>   Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
   --escape-mapping-column-names <boolean>   Disable special characters escaping in column names
   --input-null-non-string <null-str>   Input null non-string representation
   --input-null-string <null-str>   Input null string representation
   --jar-file <file>   Disable code generation; use specified jar
   --map-column-java <arg>   Override mapping for specific columns to java types
   --null-non-string <null-str>   Null non-string representation
   --null-string <null-str>   Null string representation
   --outdir <dir>   Output directory for generated code
   --package-name <name>   Put auto-generated classes in this package

Generic Hadoop command-line arguments:
(must precede any tool-specific arguments)
Generic options supported are:
   -conf <configuration file>   specify an application configuration file
   -D <property=value>   use value for given property
   -fs <file:///|hdfs://namenode:port>   specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
   -jt <local|resourcemanager:port>   specify a ResourceManager
   -files <comma separated list of files>   specify comma separated files to be copied to the map reduce cluster
   -libjars <comma separated list of jars>   specify comma separated jar files to include in the classpath.
   -archives <comma separated list of archives>   specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is:
command [genericOptions] [commandOptions]

At minimum, you must specify --connect and --table.
Arguments to mysqldump and other subprograms may be supplied after a '--' on the command line.

 
