Garbled Chinese characters with Spark on YARN

I recently took on a cloud log project: Logstash collects the logs and pushes them to a Kafka cluster; Spark Streaming transforms the data, looks up the submitter and date in Redis by className and attaches them, then publishes the result to RocketMQ for other departments to consume (details omitted).
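For context, a minimal sketch of such a pipeline, assuming the spark-streaming-kafka-0-10 direct-stream API; the broker address, topic name, and group id below are placeholders, and the Redis lookup and RocketMQ producer are elided:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("kafkaToJsonProduct")
val ssc  = new StreamingContext(conf, Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "kafka1:9092",               // placeholder broker list
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer], // Kafka's StringDeserializer decodes as UTF-8
  "group.id"           -> "cloud-log-etl",             // placeholder group id
  "auto.offset.reset"  -> "latest"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("app-logs"), kafkaParams))

stream.map(_.value()).foreachRDD { rdd =>
  rdd.foreachPartition { messages =>
    // per partition: query Redis by className, attach submitter/date,
    // publish the enriched JSON to RocketMQ (omitted here)
    messages.foreach(println)
  }
}

ssc.start()
ssc.awaitTermination()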
When the job was submitted with spark-submit to the cluster (or run against it in client mode), the output came out garbled. The natural suspect was the character encoding, because the same Spark Streaming code handled Chinese correctly when run locally but not on the cluster. Kafka's messages are UTF-8 by default, yet val encode = System.getProperty("file.encoding") reported the runtime charset as ANSI_X3.4-1968 (i.e. US-ASCII).
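To see which JVMs carry the bad default, you can print the charset on both the driver and the executors; a minimal sketch, assuming sc is the job's SparkContext (e.g. ssc.sparkContext) and that executor output is read from each executor's stdout log:

import java.nio.charset.Charset

// Driver-side JVM:
println("driver file.encoding=" + System.getProperty("file.encoding") +
  ", defaultCharset=" + Charset.defaultCharset())

// Executor-side JVMs (prints once per partition, in each executor's stdout log):
sc.parallelize(1 to 4, 4).foreachPartition { _ =>
  println("executor file.encoding=" + System.getProperty("file.encoding") +
    ", defaultCharset=" + Charset.defaultCharset())
}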
The garbled output looked like this:
2018-03-10 11:42:54|changfei","2017-12-28 17:40:15|lujie","2017-12-20 18:42:50|liulihe","2017-09-28 10:32:18|lengxiangwu"]}
encoding=ANSI_X3.4-1968
send_status=true
jsonMess
{"errTime":"2019-08-02 15:44:06,094","set":"set4","module":"report","ip":"0.0.0.0","serverName":"set4_report","className":"RptZbzServiceImpl.java","traceId":"","message":"2019-08-02 15:44:06,094 [ERROR] [DubboServerHandler-172.16.4.9:20886-thread-2148] c.y.d.r.s.impl.RptZbzServiceImpl [RptZbzServiceImpl.java : 672] ??ztdm:572068804763541504,btid:221,kjnd:2019,kjqj:6,???","errType":"","logLevel":"ERROR","loggerName":"c.y.d.r.s.impl.RptZbzServiceImpl","source":"/data/tomcat-report/logs/yzf/report.log","submitCode":["2018-05-24 16:33:33|zhangyong","2018-03-10 11:42:54|changfei","2017-12-28 17:40:15|lujie","2017-12-20 18:42:50|liulihe","2017-09-28 10:32:18|lengxiangwu"]}
encoding=ANSI_X3.4-1968
send_status=true
jsonMess
{"errTime":"2019-08-02 15:44:06,109","set":"set4","module":"report","ip":"0.0.0.0","serverName":"set4_report","className":"RptZbzServiceImpl.java","traceId":"","message":"2019-08-02 15:44:06,109 [ERROR] [DubboServerHandler-172.16.4.9:20886-thread-2148] c.y.d.r.s.impl.RptZbzServiceImpl [RptZbzServiceImpl.java : 672] ??ztdm:572068804763541504,btid:221,kjnd:2019,kjqj:6,???","errType":"","logLevel":"ERROR","loggerName":"c.y.d.r.s.impl.RptZbzServiceImpl","source":"/data/tomcat-report/logs/yzf/report.log","submitCode":["2018-05-24 16:33:33|zhangyong","2018-03-10 11:42:54|changfei","2017-12-28 17:40:15|lujie","2017-12-20 18:42:50|liulihe","2017-09-28 10:32:18|lengxiangwu"]}
encoding=ANSI_X3.4-1968
send_status=true
jsonMess
{"errTime":"2019-08-02 15:44:06,132","set":"set4","module":"report","ip":"0.0.0.0","serverName":"set4_report","className":"RptZbzServiceImpl.java","traceId":"","message":"2019-08-02 15:44:06,132 [ERROR] [DubboServerHandler-172.16.4.9:20886-thread-2148] c.y.d.r.s.impl.RptZbzServiceImpl [RptZbzServiceImpl.java : 672] ??ztdm:572068804763541504,btid:221,kjnd:2019,kjqj:6,???","errType":"","logLevel":"ERROR","loggerName":"c.y.d.r.s.impl.RptZbzServiceImpl","source":"/data/tomcat-report/logs/yzf/report.log","submitCode":["2018-05-24 16:33:33|zhangyong","2018-03-10 11:42:54|changfei","2017-12-28 17:40:15|lujie","2017-12-20 18:42:50|liulihe","2017-09-28 10:32:18|lengxiangwu"]}
encoding=ANSI_X3.4-1968
send_status=true
jsonMess
{"errTime":"2019-08-02 15:44:06,143","set":"set4","module":"report","ip":"0.0.0.0","serverName":"set4_report","className":"RptZbzServiceImpl.java","traceId":"","message":"2019-08-02 15:44:06,143 [ERROR] [DubboServerHandler-172.16.4.9:20886-thread-2162] c.y.d.r.s.impl.RptZbzServiceImpl [RptZbzServiceImpl.java : 672] ??ztdm:572091646859235328,btid:11002033,kjnd:2019,kjqj:7,???","errType":"","logLevel":"ERROR","loggerName":"c.y.d.r.s.impl.RptZbzServiceImpl","source":"/data/tomcat-report/logs/yzf/report.log","submitCode":["2018-05-24 16:33:33|zhangyong","2018-03-10 11:42:54|changfei","2017-12-28 17:40:15|lujie","2017-12-20 18:42:50|liulihe","2017-09-28 10:32:18|lengxiangwu"]}
encoding=ANSI_X3.4-1968
send_status=true
jsonMess
{"errTime":"2019-08-02 15:44:06,161","set":"set4","module":"report","ip":"0.0.0.0","serverName":"set4_report","className":"RptZbzServiceImpl.java","traceId":"","message":"2019-08-02 15:44:06,161 [ERROR] [DubboServerHandler-172.16.4.9:20886-thread-2148] c.y.d.r.s.impl.RptZbzServiceImpl [RptZbzServiceImpl.java : 672] ??ztdm:572068804763541504,btid:221,kjnd:2019,kjqj:6,???","errType":"","logLevel":"ERROR","loggerName":"c.y.d.r.s.impl.RptZbzServiceImpl","source":"/data/tomcat-report/logs/yzf/report.log","submitCode":["2018-05-24 16:33:33|zhangyong","2018-03-10 11:42:54|changfei","2017-12-28 17:40:15|lujie","2017-12-20 18:42:50|liulihe","2017-09-28 10:32:18|lengxiangwu"]}
encoding=ANSI_X3.4-1968
send_status=true

I first suspected an OS-level charset problem, but ops checked and the charset on the Kafka hosts was the same as on the Spark cluster, and Kafka's built-in console consumer displayed the messages correctly, so the operating system was ruled out.
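That console-consumer check amounts to something like the following (broker and topic are placeholders); if the Chinese text renders correctly here, the bytes stored in Kafka are fine and the problem lies on the consuming side:

bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092 --topic app-logs --from-beginning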
The Chinese characters only showed up garbled once the messages had been converted into an RDD. Searching online, some posts suggest editing spark-env.sh and adding export SPARK_JAVA_OPTS="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8".
After that change, submitting in client mode still produced garbled output; the cluster may need a restart for it to take effect, but as this is a production environment the restart is still pending verification.
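Independent of any JVM flags, the job itself can also be made immune to the default charset by naming the charset at every byte/string boundary; a minimal sketch (the helper names are my own):

import java.nio.charset.StandardCharsets

// Decode incoming bytes explicitly instead of relying on file.encoding:
def decodeUtf8(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)

// Encode outgoing messages (e.g. the RocketMQ payload) the same way:
def encodeUtf8(s: String): Array[Byte] = s.getBytes(StandardCharsets.UTF_8)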

The fix: pass the encoding to the executor JVMs via --conf spark.executor.extraJavaOptions="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8", e.g.:

bin/spark-submit \
--class hbase.kafkaToJsonProduct \
--master yarn \
--deploy-mode cluster \
--executor-memory 2G \
--num-executors 1 \
--executor-cores 1 \
--conf spark.executor.extraJavaOptions="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8" \
--conf spark.streaming.backpressure.enabled=true \
--conf spark.streaming.backpressure.initialRate=2 \
--conf spark.streaming.kafka.maxRatePerPartition=2 \
/home/hadoop/hbase_test-1.0.jar &
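Note that spark.executor.extraJavaOptions only reaches the executor JVMs. If the driver also touches the text (logging, building the RocketMQ message), it needs the same flags. Per the Spark configuration docs, in cluster mode that is one more --conf, while in client mode the driver JVM is already running by the time SparkConf is read, so the flags should go through --driver-java-options on the spark-submit command line instead:

--conf spark.driver.extraJavaOptions="-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"   (cluster mode)
--driver-java-options "-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"                  (client mode)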

References:
- Spark Streaming 处理中文异常的解决方案 (solutions for Chinese-character problems in Spark Streaming):
https://blog.csdn.net/kk303/article/details/52811363
https://blog.csdn.net/fjssharpsword/article/details/53305918
- spark on yarn 中文乱码问题 (Q&A on garbled Chinese with Spark on YARN):
https://cloud.tencent.com/developer/ask/191022
