October 2015_breeze_lsw

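Original: Converting between JSON and Map in Scala with smart-json

The JSON parsing tool used here is smart-json. I previously compared it with Java's Fastjson and gson and with Scala's json4s and lift-json; of these, smart-json parsed the fastest. Environment: Scala 2.10.4, smart-json 1.3.1. Straight to the code: import java.util import net.minidev.json.{JSONObject} import net.min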

2015-10-22 22:50:43 11052 7

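Original: Fixing slow Spark SQL join queries that use "or"

1. Requirement: join the data in table a against two fields of table b and output the result. Table a holds about 2.4 billion rows; table b holds about 300,000 rows. 2. Optimization result: execution time dropped from several days to a few minutes. 3. Resources: Spark 1.4.1, 200 cores, 600 GB RAM. 4. Simplified code (before optimization): sqlContext.sql("name,ip1,ip2 as ip from table_A where name is not null and ip2 is not n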

2015-10-20 19:55:56 9310 1
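
The excerpt of the Spark SQL join entry above stops before the optimized version, so the author's actual fix is not shown. As a minimal sketch of one common rewrite (not necessarily the author's), a join on `x OR y` can be expressed as the de-duplicated union of two equality joins, each of which can use a hash lookup instead of testing every pair. It is written with plain Scala collections rather than Spark; the case classes, field names, and function names are all hypothetical, and rows are assumed to be distinct.

```scala
object OrJoinSketch {
  // Hypothetical row shapes: A carries one key, B carries two candidate keys.
  case class A(name: String, ip: String)
  case class B(ipX: String, ipY: String, label: String)

  // Naive form: every (a, b) pair is tested, because "x OR y" offers no single hash key.
  def nestedLoopJoin(as: Seq[A], bs: Seq[B]): Set[(A, B)] =
    (for (a <- as; b <- bs if a.ip == b.ipX || a.ip == b.ipY) yield (a, b)).toSet

  // Rewrite: two equality joins served by hash lookups, then a de-duplicating union.
  def unionOfEquiJoins(as: Seq[A], bs: Seq[B]): Set[(A, B)] = {
    val byX = bs.groupBy(_.ipX) // index of B rows by the first key
    val byY = bs.groupBy(_.ipY) // index of B rows by the second key
    val onX = for (a <- as; b <- byX.getOrElse(a.ip, Nil)) yield (a, b)
    val onY = for (a <- as; b <- byY.getOrElse(a.ip, Nil)) yield (a, b)
    (onX ++ onY).toSet // the Set removes pairs that matched on both keys
  }
}
```

Both functions return the same pairs. The second does work proportional to the number of matches rather than to the product of the two table sizes, which is the general reason an "or" join condition tends to be slow.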

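Original: Spark Troubleshooting and Optimization

I. Operations 1. The Master goes down, and restarting the standby fails as well. The Master uses 512 MB of memory by default. When the cluster is running a very large number of jobs, the Master crashes, because it reads each task's event log to build the Spark UI; with too little memory it runs out of memory (OOM), which can be seen in the Master's log. A Master started through HA fails for the same reason. Fix: increase the Master's memory; on the Master node...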

2015-10-15 17:08:36 85874 13

Original: spark1.5 scala.collection.mutable.WrappedArray$ofRef cannot be cast to ... fix

Below is the help request I posted to the Spark user list; it received a correct answer quickly. If you run into a problem you cannot solve, you can ask there as well. I can use it under spark1.4.1,but error on spark1.5.1,how to deal with this problem? //define Schema val struct =StructType( StructFie

2015-10-13 10:58:07 7015
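
The title of the WrappedArray entry above elides the target type of the failing cast, and the excerpt is cut off before the answer. The sketch below therefore only illustrates the general shape of such an error in plain Scala, without Spark: a value whose runtime class is `WrappedArray$ofRef` can be cast to the general `scala.collection.Seq` interface but not to an unrelated concrete collection. `ArrayBuffer` is an illustrative assumption, as are the object and member names.

```scala
import scala.collection.mutable.{ArrayBuffer, WrappedArray}
import scala.util.Try

object WrappedArrayCast {
  // Stand-in for an array-typed value whose static type is unknown to the caller.
  val column: Any = WrappedArray.make[String](Array("a", "b"))

  // Casting to the general Seq interface succeeds, because WrappedArray is a Seq.
  def asSeq: scala.collection.Seq[String] =
    column.asInstanceOf[scala.collection.Seq[String]]

  // Casting to an unrelated concrete collection fails with ClassCastException,
  // which Try captures as a Failure.
  def asBuffer: Try[ArrayBuffer[String]] =
    Try(column.asInstanceOf[ArrayBuffer[String]])
}
```

Under these assumptions, code that reads such a value is more robust when it asks for `scala.collection.Seq[T]` than for a specific collection class.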

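Original: Using Spark with ElasticSearch

A business requirement called for exporting data from ES and writing it to HDFS. This needs the elasticsearch-hadoop dependency; add it through sbt or mvn yourself, or download the jar directly (link below). A simple elasticsearch-hadoop usage example: import org.elasticsearch.spark._ val conf = new SparkConf().set("spark.es.nodes","192.168.47.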

2015-10-09 21:57:11 25355 2

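Original: Starting multiple executors on a single node in Spark standalone mode

Previously, to start several executors on one machine I ran multiple worker instances, because standalone mode by default starts one executor per worker, which was very inconvenient. I later found another approach. Settings: set the number of CPUs each executor uses to 4: spark.executor.cores 4. Limit the total number of CPUs used; this starts 3 executors (12/4): spark.cores.ma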

2015-10-08 17:55:57 5772
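
The arithmetic in the standalone-executor entry above can be sketched in plain Scala. The excerpt is cut off partway through the second setting, so the property name `spark.cores.max` and the value 12 are assumptions read off the "(12/4)" in the text. `executorCount` is a hypothetical helper, not a Spark API, and it assumes the workers have enough free cores and memory to host every executor.

```scala
object ExecutorCount {
  // Hypothetical helper: number of fixed-size executors that fit into a total core budget.
  def executorCount(coresMax: Int, executorCores: Int): Int = {
    require(executorCores > 0, "executorCores must be positive")
    coresMax / executorCores // integer division: a remainder too small for one executor is left over
  }

  def main(args: Array[String]): Unit = {
    // Values assumed from the excerpt: 4 cores per executor, 12 cores in total.
    val settings = Map("spark.executor.cores" -> 4, "spark.cores.max" -> 12)
    println(executorCount(settings("spark.cores.max"), settings("spark.executor.cores")))
  }
}
```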

spark_prometheus_metrics.json

Blog: https://blog.csdn.net/lsshlsw/article/details/82670508 spark_prometheus_metrics.json

2018-09-13

scala for spark

Because Spark is written in Scala, I put together a short summary here to make reading the Spark source code easier.

2014-09-28