Sqoop import into MySQL duplicates: Sqoop export inserts duplicate entries

I am trying to understand how sqoop export works. I have a table site in MySQL with two columns, id and url, and it contains two rows:

1,www.yahoo.com

2,www.gmail.com

The table has no primary key.

When I export the entries from HDFS to the MySQL site table by executing the command below, it inserts duplicate entries.

I have the following entries in HDFS:

1,www.one.com

2,www.2.com

3,www.3.com

4,www.4.com

sqoop export --table site --connect jdbc:mysql://localhost/loudacre --username training --password training --export-dir /site/ --update-mode allowinsert --update-key id

So instead of updating the already existing id, it inserts the same id again (meaning there are two rows with id 1: one for www.one.com and one for www.yahoo.com).

Even if I remove --update-key, the outcome is the same. Is this happening because the table doesn't have a primary key?

I am using Sqoop 1.4.5 in the Cloudera QuickStart VM.

Any help?

Solution

According to the Sqoop docs:

MySQL will try to insert new row and if the insertion fails with duplicate unique key error it will update appropriate row instead.

So the --update-key column must either be the primary key or have a unique index on it.

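Internally, Sqoop will create a query roughly like this:

INSERT INTO site (id, url) VALUES (1, 'www.one.com') ON DUPLICATE KEY UPDATE url = 'www.one.com'

and so on for all the other values. Without a primary key or unique index on id, the insert never fails with a duplicate-key error, so the update branch is never taken and every row is simply inserted again.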
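To see the effect without a MySQL instance, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in. This is an assumption-heavy illustration, not what Sqoop runs: SQLite's INSERT OR REPLACE is used in place of MySQL's INSERT ... ON DUPLICATE KEY UPDATE (it deletes and re-inserts the conflicting row rather than updating it in place, but for this two-column table the visible result is the same). The function name export_rows is hypothetical; the table and data are the ones from the question.

```python
import sqlite3


def export_rows(with_primary_key):
    """Simulate the export against a 'site' table, with or without a key on id.

    SQLite stand-in for MySQL (assumption): INSERT OR REPLACE plays the role
    of INSERT ... ON DUPLICATE KEY UPDATE.
    """
    conn = sqlite3.connect(":memory:")
    key_clause = " PRIMARY KEY" if with_primary_key else ""
    conn.execute(f"CREATE TABLE site (id INTEGER{key_clause}, url TEXT)")

    # Rows already present in the target table before the export.
    conn.executemany(
        "INSERT INTO site (id, url) VALUES (?, ?)",
        [(1, "www.yahoo.com"), (2, "www.gmail.com")],
    )

    # Rows sitting in HDFS that the export pushes into the table.
    hdfs_rows = [
        (1, "www.one.com"),
        (2, "www.2.com"),
        (3, "www.3.com"),
        (4, "www.4.com"),
    ]
    # Upsert-style insert: it can only replace a row if a unique
    # constraint on id exists to trigger the conflict.
    conn.executemany(
        "INSERT OR REPLACE INTO site (id, url) VALUES (?, ?)", hdfs_rows
    )

    rows = conn.execute("SELECT id, url FROM site ORDER BY id, url").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print("no key:     ", export_rows(with_primary_key=False))
    print("primary key:", export_rows(with_primary_key=True))
```

Without the key the table ends up with six rows (ids 1 and 2 appear twice); with the primary key it ends up with four rows and the old urls for ids 1 and 2 are overwritten. In MySQL the equivalent fix would be to give the site table a primary key or unique index on id before running the export.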
