Original: Optimize map performance with mapPartitions

As we saw in the previous article, "CSV Parser", we may need to create a new object for each record of an RDD, as in def mLine(line: String) = { val parser =

2015-01-26 13:55:18 281
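
The excerpt above is cut off, so the following is only a minimal sketch of the general idea, not the post's own code. It uses plain Scala iterators as stand-ins for RDD partitions. `TabParser`, `parsersCreated`, and the sample data are hypothetical names introduced for illustration. The point is that a function of type `Iterator[T] => Iterator[U]`, the shape `RDD.mapPartitions` takes, can build one helper object per partition instead of one per record.

```scala
object MapPartitionsSketch {
  // Counts constructions so the saving is visible (illustration only).
  var parsersCreated = 0

  // Hypothetical stand-in for a parser that is costly to build.
  class TabParser {
    parsersCreated += 1
    def parse(line: String): List[String] = line.split('\t').toList
  }

  // Per-record style: a new parser is built for every line.
  def parseEachLine(lines: Iterator[String]): Iterator[List[String]] =
    lines.map(line => new TabParser().parse(line))

  // Per-partition style: one parser serves the whole partition.
  def parsePartition(lines: Iterator[String]): Iterator[List[String]] = {
    val parser = new TabParser
    lines.map(parser.parse)
  }

  // Two in-memory "partitions" standing in for an RDD's partitions.
  val partitions: Seq[Seq[String]] = Seq(Seq("a\tb", "c\td"), Seq("e\tf"))

  // Three records are parsed, but only two parsers are created.
  val parsed: Seq[List[List[String]]] =
    partitions.map(p => parsePartition(p.iterator).toList)
}
```

With a real RDD, the assumption is that `parsePartition` could be passed directly as `rdd.mapPartitions(parsePartition)`.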

Original: CSV Parser

Most of our data files are in CSV format. Although the String.split('\t') approach can handle a lot of cases, some CSV files contain quoted fields. In that case, if a delimiter character is in between o

2015-01-24 14:18:31 369
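
To make the quoting problem concrete, here is a small hand-rolled splitter. It is a sketch under stated assumptions, not the parser the post goes on to use (the excerpt is truncated before naming one): fields are wrapped in double quotes, a doubled quote inside a quoted field is an escaped quote, and records never span lines. `CsvSketch` and `splitLine` are hypothetical names.

```scala
object CsvSketch {
  // Splits one line on `delim`, ignoring delimiters that appear inside
  // a quoted field. Assumes "" inside quotes means a literal quote.
  def splitLine(line: String, delim: Char = ',', quote: Char = '"'): List[String] = {
    val fields = scala.collection.mutable.ListBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (c == quote) {
        if (inQuotes && i + 1 < line.length && line.charAt(i + 1) == quote) {
          current.append(quote) // escaped quote: keep one, skip the other
          i += 1
        } else {
          inQuotes = !inQuotes  // opening or closing quote
        }
      } else if (c == delim && !inQuotes) {
        fields += current.toString // field boundary
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    fields += current.toString // last field
    fields.toList
  }
}
```

For example, `CsvSketch.splitLine("a,\"b,c\",d")` yields three fields, where a plain `split(',')` would yield four.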

Original: Partition by Hash on Keys

When an RDD object is created, it is partitioned into multiple pieces for parallel processing. If we have to join the RDD with other RDDs many times on some key, we’d better partition the RDDs by the

2015-01-24 13:49:55 452
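
A minimal sketch of why hashing on the key helps repeated joins, using plain Scala collections rather than Spark itself. `partitionOf` and `partitionByKey` are hypothetical helpers; the non-negative modulo of `hashCode` is assumed here as the partitioning rule. Because the partition index depends only on the key, records with the same key from two datasets land in the same partition index.

```scala
object HashPartitionSketch {
  // Maps a key to a partition index in [0, numPartitions).
  // hashCode may be negative, so the remainder is shifted into range.
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Groups key/value pairs by the partition their key hashes to.
  def partitionByKey[K, V](pairs: Seq[(K, V)], numPartitions: Int): Map[Int, Seq[(K, V)]] =
    pairs.groupBy { case (k, _) => partitionOf(k, numPartitions) }
}
```

In Spark, the corresponding call on a pair RDD would be `partitionBy` with a `HashPartitioner`; the sketch only illustrates the co-location property.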

Original: Sample by a Hash Function (Scala)

In Big Data ad hoc analysis, we very often need to down-sample the data. In most cases, however, we need to down-sample based on some hash function of a key of the data. For example, to

2015-01-23 13:22:34 737
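
One way to sketch key-based down-sampling, assuming string keys and a percentage-style sampling rate. `HashSampleSketch`, `bucketOf`, and `keep` are hypothetical names, and `MurmurHash3.stringHash` from the Scala standard library is a choice made for this illustration, not necessarily the hash function the post uses. Since the decision depends only on the key, all records sharing a key are kept or dropped together.

```scala
import scala.util.hashing.MurmurHash3

object HashSampleSketch {
  // Maps a key to a stable bucket in [0, 100).
  def bucketOf(key: String): Int =
    ((MurmurHash3.stringHash(key) % 100) + 100) % 100

  // Keeps roughly `percent` percent of the distinct keys.
  def keep(key: String, percent: Int): Boolean =
    bucketOf(key) < percent

  // Down-samples records by the hash of their key.
  def sample[V](records: Seq[(String, V)], percent: Int): Seq[(String, V)] =
    records.filter { case (k, _) => keep(k, percent) }
}
```

The same predicate could be passed to an RDD's `filter`, which is the assumed usage in a Spark job.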

Original: Histogram with Spark (2) – Implicit class

In the previous post we studied how to calculate the histogram of an RDD[String]. By using implicit type conversion, we can add the helper method to the Map class and make the code look better.

2015-01-22 14:22:16 424
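
A minimal sketch of the implicit-class idea, assuming the helper methods are a per-key increment and a merge of two count maps. `HistogramSyntax`, `CountMapOps`, `inc`, and `mergeCounts` are hypothetical names; the post's actual helper may differ.

```scala
object HistogramSyntax {
  // Adds counting helpers to any immutable Map[K, Int].
  implicit class CountMapOps[K](m: Map[K, Int]) {
    // Returns a new map with the count for `k` increased by one.
    def inc(k: K): Map[K, Int] = m.updated(k, m.getOrElse(k, 0) + 1)

    // Returns a new map with the counts of both maps added per key.
    def mergeCounts(other: Map[K, Int]): Map[K, Int] =
      other.foldLeft(m) { case (acc, (k, n)) => acc.updated(k, acc.getOrElse(k, 0) + n) }
  }

  // The fold now reads as plain method calls on Map.
  val demo: Map[String, Int] =
    Seq("aa", "bb", "aa").foldLeft(Map.empty[String, Int])((m, c) => m.inc(c))

  val merged: Map[String, Int] = demo.mergeCounts(Map("bb" -> 3, "cc" -> 1))
}
```

Outside the object, `import HistogramSyntax._` brings the extension methods into scope.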

Original: Histogram in Spark (1)

Spark’s DoubleRDDFunctions provides a histogram function for RDD[Double]. However, there is no histogram function for RDD[String]. Here is a quick exercise for doing it. We will use immutable Map in th

2015-01-21 08:47:39 711
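
The excerpt stops before the code, so this is a sketch of one plausible approach rather than the post's implementation. It assumes the histogram is built with a per-partition fold followed by a merge, the pair of functions that `RDD.aggregate(zeroValue)(seqOp, combOp)` takes, and simulates the partitions with plain Scala sequences. `StringHistogramSketch` and its members are hypothetical names.

```scala
object StringHistogramSketch {
  type Hist = Map[String, Int]

  // Empty histogram used as the starting value in every partition.
  val zero: Hist = Map.empty

  // Folds one record into a partition-local histogram.
  def seqOp(h: Hist, s: String): Hist = h.updated(s, h.getOrElse(s, 0) + 1)

  // Merges two partition-local histograms by adding counts per key.
  def combOp(a: Hist, b: Hist): Hist =
    b.foldLeft(a) { case (acc, (k, n)) => acc.updated(k, acc.getOrElse(k, 0) + n) }

  // Simulates aggregate: fold within each partition, then merge results.
  def histogram(partitions: Seq[Seq[String]]): Hist =
    partitions.map(_.foldLeft(zero)(seqOp)).foldLeft(zero)(combOp)
}
```

Because the maps are immutable, each step returns a new map, which keeps the two functions free of shared mutable state.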

Original: Histogram in Scala

```scala
scala> val hist = Array("aa", "bb", "aa").foldLeft(Map[String, Int]()) {
     |   (m, c) => m + (c -> (m.getOrElse(c, 0) + 1))
     | }
```

or use the updated method of mutable Map

```scala
scala>
```

2015-01-15 09:11:21 571
