I have a file in HDFS and am exporting it to a database table using Sqoop. Please find the log details below:
Caused by: java.lang.RuntimeException: Can't parse input data: ' characters'
at tags.__loadFromFields(tags.java:335)
at tags.parse(tags.java:268)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:89)
... 10 more
Sqoop export command:
sqoop export \
--connect "**************************************" \
--username=**** \
--password=***** \
--table tags \
--export-dir /user/cloudera/movie_lens/tags_no_header.csv \
--batch \
--input-lines-terminated-by '\n' \
--input-fields-terminated-by ',' \
--num-mappers 9
Table structure:
create table tags
(userId integer
,movieId integer
,tag varchar(150)
,timestamp decimal
);
Failing record:
660,260,"imaginary world, characters, story, philosophical",1436680217
As per my understanding, it's failing because of ambiguous parsing: the commas ',' inside the quoted string are being treated as field delimiters. Please help me understand the usage of the --input-enclosed-by and --input-escaped-by arguments in this case, or is there any other solution?
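To see why the record fails, here is a small Python sketch (not part of Sqoop itself, just an illustration) comparing a naive split on ',' with a CSV parse that honors the double-quote enclosure, the behavior --input-optionally-enclosed-by '\"' is meant to provide:

```python
import csv
import io

# The failing record: the tag field contains commas and is double-quoted.
record = '660,260,"imaginary world, characters, story, philosophical",1436680217'

# Naive split on ',' -- what happens with no enclosure character configured:
naive = record.split(',')
print(len(naive))   # 7 pieces, but the table expects only 4 columns

# Parsing with '"' as the quote character keeps the tag field intact:
parsed = next(csv.reader(io.StringIO(record), delimiter=',', quotechar='"'))
print(len(parsed))  # 4 fields
print(parsed[2])    # imaginary world, characters, story, philosophical
```

With 7 pieces for a 4-column table, the mapper cannot match fields to columns, which matches the "Can't parse input data: ' characters'" error (the fragment after the first in-string comma).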
Solution
I got it resolved using the --input-optionally-enclosed-by argument.
Export command:
sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/movie_lens_db" \
--username=root \
--password=cloudera \
--table tags \
--export-dir /user/cloudera/escape_by_test.txt \
--batch \
--input-lines-terminated-by '\n' \
--input-fields-terminated-by ',' \
--input-optionally-enclosed-by '\"' \
--num-mappers 1 \
--outdir java_files
Table data:
+--------+---------+---------------------------------------------------+------------+
| userId | movieId | tag | timestamp |
+--------+---------+---------------------------------------------------+------------+
| 660 | 260 | imaginary world, characters, story, philosophical | 1436680217 |
| 212 | 69712 | genuine characters | 1260688086 |
+--------+---------+---------------------------------------------------+------------+