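1. The update keeps failing with "Conversion failed when converting the nvarchar value 'Pension Benefits Current/Prior Service Cost' to data type int", even though the data fields and their order appeared to be consistent.
Cause: the order of the fields being updated did not match the column order of the table. The update is actually applied in the table's column order, so a string value ended up in an int column.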
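A minimal sketch of one way to stop depending on column order, assuming Sqoop's --columns export argument is honored in the mode being used (worth verifying for update-mode exports on the installed version). The function name and its four positional arguments are hypothetical, and credential options are left out for brevity.

```shell
# Hypothetical helper: export with an explicit column list so that each
# field in the export files is mapped to a named column.
# $1 = JDBC URL, $2 = target table, $3 = HDFS export directory,
# $4 = comma-separated column names in the same order as the file fields
export_with_columns() {
  sqoop export \
    --connect "$1" \
    --table "$2" \
    --export-dir "$3" \
    --columns "$4"
}
```

It would be called as, for example, export_with_columns "$JDBC_URL" my_table /path/to/export "id,description", where all four values are placeholders.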
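2. When the export fails, the error does not show the offending row. Passing the following property makes Sqoop dump the failing record: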
-Dorg.apache.sqoop.export.text.dump_data_on_error=true
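As a sketch of where the property goes: generic Hadoop -D options have to come right after the tool name and before the tool-specific arguments. The function name and its positional arguments below are hypothetical, and credential options are omitted.

```shell
# Hypothetical helper: export with row dumping enabled on error.
# $1 = JDBC URL, $2 = target table, $3 = HDFS export directory
export_with_row_dump() {
  sqoop export \
    -Dorg.apache.sqoop.export.text.dump_data_on_error=true \
    --connect "$1" \
    --table "$2" \
    --export-dir "$3"
}
```

The dumped record can contain real data, so it may be sensible to enable this only while debugging.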
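3. Data consistency problems with Sqoop export
1) Scenario 1: Suppose Sqoop exports to MySQL with 4 map tasks and 2 of them fail. MySQL then holds only the rows written by the other two map tasks, and the boss happens to look at the report at that moment. After noticing the failure, the developer debugs the problem and eventually exports the full data set correctly. When the boss looks at the report again, the numbers differ from what was shown earlier, which is unacceptable in production.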
Official documentation: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
Since Sqoop breaks down export process into multiple transactions, it is possible that a failed export job may result in partial data being committed to the database. This can further lead to subsequent jobs failing due to insert collisions in some cases, or lead to duplicated data in others. You can overcome this problem by specifying a staging table via the --staging-table option which acts as an auxiliary table that is used to stage exported data. The staged data is finally moved to the destination table in a single transaction.
With multiple map tasks, use --staging-table to solve the data consistency problem.
--staging-table
--clear-staging-table
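A Sqoop export runs as a MapReduce job. It takes a long time and may fail, so transactional consistency has to be handled properly. Two options help here: --staging-table and --clear-staging-table.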
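The staging approach can be sketched as follows, using 4 map tasks as in the scenario above. The function name and its positional arguments are hypothetical, and credential options are omitted. The sketch assumes the staging table already exists with the same structure as the target table.

```shell
# Hypothetical helper: export through a staging table so the target table
# only receives data once every map task has finished successfully.
# $1 = JDBC URL, $2 = target table, $3 = staging table, $4 = HDFS export directory
export_via_staging() {
  sqoop export \
    --connect "$1" \
    --table "$2" \
    --staging-table "$3" \
    --clear-staging-table \
    --export-dir "$4" \
    --num-mappers 4
}
```

Here --clear-staging-table empties the staging table before the job starts, so leftovers from an earlier failed run are not carried over. According to the Sqoop user guide, staging is not available when the export uses --update-key, and is not always available for --direct exports.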