Nutch reports the following error while crawling:
- 2016-05-13 19:31:55,415 WARN mapred.LocalJobRunner - job_local1852033656_0004
- java.lang.Exception: java.io.IOException: java.sql.BatchUpdateException: Incorrect string value: '\xF2\xA3\xAC\xB7\xEF\xBF...' for column 'text' at row 1
- at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
- Caused by: java.io.IOException: java.sql.BatchUpdateException: Incorrect string value: '\xF2\xA3\xAC\xB7\xEF\xBF...' for column 'text' at row 1
- at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
- at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
- at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
- at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
- at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
- at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
- at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
- at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
- at java.util.concurrent.FutureTask.run(FutureTask.java:266)
- at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
- at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
- at java.lang.Thread.run(Thread.java:745)
- Caused by: java.sql.BatchUpdateException: Incorrect string value: '\xF2\xA3\xAC\xB7\xEF\xBF...' for column 'text' at row 1
- at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:2028)
- at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1451)
- at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
- ... 11 more
- Caused by: java.sql.SQLException: Incorrect string value: '\xF2\xA3\xAC\xB7\xEF\xBF...' for column 'text' at row 1
- at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)
- at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3609)
- at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3541)
- at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2002)
- at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2163)
- at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2624)
- at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2127)
- at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2427)
- at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1980)
- ... 13 more
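Cause:
MySQL's utf8 character set stores at most 3 bytes per character, while utf8mb4 stores up to 4. The rejected value in the log above starts with the 4-byte sequence \xF2\xA3\xAC\xB7, which a utf8 column cannot hold. The MySQL database that Nutch is configured to write to uses utf8; changing its character set to utf8mb4 resolves the error.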
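To make the 3-byte versus 4-byte distinction concrete, here is a minimal Python sketch that decodes the first four bytes from the error message and checks whether a string contains any character that needs 4 bytes in UTF-8. The helper name needs_utf8mb4 is hypothetical and is not part of Nutch, Gora, or MySQL. It only illustrates the encoding rule, namely that code points above U+FFFF take 4 bytes.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character takes 4 bytes in UTF-8.

    Code points above U+FFFF (outside the Basic Multilingual Plane)
    encode to 4 bytes, which MySQL's 3-byte utf8 charset rejects.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


# The first four bytes shown in the error message form one character.
offending = b"\xF2\xA3\xAC\xB7".decode("utf-8")

print(len(offending.encode("utf-8")))  # number of bytes this character needs
print(needs_utf8mb4(offending))        # does it require utf8mb4?
print(needs_utf8mb4("Nutch 抓取"))      # ordinary CJK text fits in 3 bytes
```

A check like this could be run over a sample of crawled text to confirm that 4-byte characters are what trigger the failure before changing the database character set.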