Spark exception: Trying to write more fields than contained in row



Converting JSON records to Row objects and persisting them as Parquet:
  for type_name in types.value:
      print(type_name)
      # Keep only the records of the current type
      type_data_set = lines.filter(lambda line: line['type'] == type_name)
      # Turn each dict into a Row; the Row's fields are whatever keys the dict has
      type_row = type_data_set.map(lambda line: Row(**line))
      # No schema is passed, so Spark infers one from the data
      schema_row = self.sqlContext.createDataFrame(type_row)

      # Write one Parquet directory per type and hour
      schema_row.write.mode('overwrite').parquet(
          'hdfs://ip:port/parquet/%s/year=%s/month=%s/day=%s/hour=%s' %
          (type_name, self.year, self.month, self.day, self.hour)
      )

Exception:

  Caused by: java.lang.IndexOutOfBoundsException: Trying to write more fields than contained in row (15 > 12)
          at org.apache.spark.sql.execution.datasources.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:261)
          at org.apache.spark.sql.execution.datasources.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:257)
          at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
          at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
          at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
          at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.writeInternal(ParquetRelation.scala:99)
          at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:242)
          ... 8 more

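Solution:
The cause is inconsistent data: two records of the same type zi carry different numbers of fields. The first has 15 fields and the second only 12 (it lacks containerId, sendUserId and objectName). This matches the "(15 > 12)" in the message: the schema has 15 columns, but a row with only 12 values is being written. Making every record of a given type carry the same set of fields avoids the error.
  {"time":"2016-06-06 17:25:14","message":{"channel":3,"containerId":"16","sendUserId":"2611","objectName":"RC:TxtMsg","count":49,"type":"zi","uuid":"-1","appId":"100000","nodeId":"GRM_NODE_0","userId":"2611","time":1465205114814,"ipAddress":"0","sdkVersion":"2.6.2","osName":"Android","deviceId":"0"}}
  {"time":"2016-06-06 17:41:31","message":{"channel":0,"count":0,"type":"zi","uuid":"","appId":"100000","nodeId":"MSG_NODE_2","userId":"2626","time":1465206091272,"ipAddress":"0","sdkVersion":"2.6.1","osName":"0","deviceId":"1"}}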
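
One way to even out the field counts before building Row objects is to pad every record to a common field list. The following is only a minimal sketch in plain Python, not the original author's fix: the helper names field_union and normalize are hypothetical, and the two sample records are trimmed-down stand-ins for the messages above.

```python
import json


def field_union(records):
    """Return the sorted union of keys seen across all records."""
    keys = set()
    for record in records:
        keys.update(record)
    return sorted(keys)


def normalize(record, fields, default=None):
    """Return a copy of record with exactly `fields`; missing keys get `default`."""
    return {name: record.get(name, default) for name in fields}


# Illustrative records only (assumption): shortened versions of the two
# "zi" messages, one with an extra field the other lacks.
raw = [
    '{"channel": 3, "containerId": "16", "type": "zi", "userId": "2611"}',
    '{"channel": 0, "type": "zi", "userId": "2626"}',
]
records = [json.loads(line) for line in raw]

fields = field_union(records)
normalized = [normalize(record, fields) for record in records]
```

In the job above, the same idea could be applied inside the map step, for example Row(**normalize(line, fields)), with fields computed per type or fixed in advance. Padding with None leaves schema inference dependent on the sampled rows, so passing an explicit StructType as the second argument to createDataFrame is likely the more robust option.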



Source: ITPUB blog, http://blog.itpub.net/29754888/viewspace-2119617/. If you repost this article, please credit the source; otherwise legal action may be pursued.
