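Preface
When working with a Dataset/DataFrame in Spark, we often need to process each row, for example with map, mapPartitions, foreach, or foreachPartition. Once we have a row in hand, how do we pull out the columns we want and run our business logic on them? This is a common source of confusion. Based on Spark 2.1.1, this article analyzes how a row of data is represented and explains in detail the various ways to retrieve column data from a row.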
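To make the problem concrete, here is a minimal sketch of reading columns out of a single row. It assumes a hand-built `Row` with hypothetical fields (first name, last name, salary) instead of one taken from a real Dataset, and the object name `RowAccessSketch` is made up for illustration. It needs only the spark-sql library on the classpath, not a running SparkSession.

```scala
import org.apache.spark.sql.Row

object RowAccessSketch {
  // Hypothetical sample row; the field order (first name, last name, salary)
  // is an assumption made for this illustration.
  val row: Row = Row("James", "Smith", 60000)

  // Generic access by ordinal: the result is typed Any, so the caller casts it.
  val firstName: String = row(0).asInstanceOf[String]

  // Typed access by ordinal.
  val lastName: String = row.getString(1)
  val salary: Int = row.getInt(2)

  // Pattern matching with the Row extractor binds all fields at once.
  val summary: String = row match {
    case Row(first: String, last: String, pay: Int) => s"$first $last earns $pay"
    case _ => "unexpected row shape"
  }

  def main(args: Array[String]): Unit = println(summary)
}
```

A row built this way carries no schema, so the sketch sticks to positional access; looking fields up by name would require a row that comes from a DataFrame with a schema attached.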
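Row in Practice
According to the API documentation, Row offers three ways to access its elements. Each is explained below in turn, with a hands-on example. First, let's build a Dataset.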
scala> val data = List(("James ","","Smith","36636","M",60000), ("Michael ","Rose","","40288","M",70000), ("Robert ","","Williams","42114","",400000), ("Maria ","Anne","Jones","39192","F",500000), ("Jen","Mary","Brown","","F",0))
data: List[(String, String, String, String, String, Int)] = List(("James ","",Smith,36636,M,60000), ("Michael ",Rose,"",40288,M,70000),