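Recently, while working on data storage in a project, I had data in several different formats that all needed to go into Redis. So I wrote a RedisStore class, and inside it I checked where the data came from, e.g.:
```scala
val data = inputStage.getOutput.data
```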
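Thinking about it afterwards, I realized this approach is not reasonable. Why?
1. The only job of RedisStore should be to save key-value data into Redis. It should not care where its data comes from: whichever class produced the result, RedisStore should simply store it.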
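This raises the question this post addresses: where should the differently formatted output of different classes be converted?
The answer used here is Function2.
Function2 is a function of two arguments. Its declaration takes three type parameters: the first two are the argument types and the third is the return type.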
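A minimal standalone sketch of those three type parameters (the names `repeat` and `repeatSugar` are purely illustrative): the function type `(String, Int) => String` is just shorthand for `Function2[String, Int, String]`, and calling `f(a, b)` invokes `f.apply(a, b)`.

```scala
object Function2Basics {
  // Function2[String, Int, String]: argument types String and Int, result type String.
  val repeat: Function2[String, Int, String] = (s, n) => s * n

  // (String, Int) => String is shorthand for exactly the same type.
  val repeatSugar: (String, Int) => String = repeat

  def main(args: Array[String]): Unit = {
    println(repeat("ab", 3))       // ababab
    println(repeat.apply("ab", 3)) // ababab: f(a, b) is a call to f.apply(a, b)
    println(repeatSugar("x", 2))   // xx
  }
}
```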
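For example:
```scala
// Declared as a class field (e.g. on the context); `= _` initializes it to null.
var recParser: Function2[String, Row, (String, scala.collection.immutable.Map[String, String])] = _
```
The concrete conversions are as follows: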
```scala
import org.apache.spark.sql.Row

// `Logging` and `debugger` are not shown in this post.

// Item similarities: the Redis key is the item id (column 0); the hash field is
// the app code, holding the items from column 1 joined with ";".
object ItemSimRecParser extends Function2[String, Row, (String, Map[String, String])] with Logging {
  override def apply(app: String, r: Row): (String, Map[String, String]) = {
    debugger.print("enter parseKeyValue method and db is 1")
    val key = r.getString(0)
    val items: String = r.getAs[Seq[Row]](1).map(_.getString(0)).mkString(";")
    (key, Map(app -> items))
  }
}

// Same as ItemSimRecParser, but reads the item list from column 2.
object ItemSimRecParserByCate extends Function2[String, Row, (String, Map[String, String])] with Logging {
  override def apply(app: String, r: Row): (String, Map[String, String]) = {
    debugger.print("enter parseKeyValue method and db is 1")
    val key = r.getString(0)
    val items: String = r.getAs[Seq[Row]](2).map(_.getString(0)).mkString(";")
    (key, Map(app -> items))
  }
}

// Hot items: the Redis key is the app code; the hash field is the category (column 0).
object HotRecParser extends Function2[String, Row, (String, Map[String, String])] with Logging {
  override def apply(app: String, r: Row): (String, Map[String, String]) = {
    debugger.print("enter parseKeyValue method and db is 2")
    val cate = r.getString(0)
    val items: String = r.getAs[Seq[Row]](1).map(_.getString(0)).mkString(";")
    (app, Map(cate -> items))
  }
}
```
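ItemSimRecParser and ItemSimRecParserByCate differ only in the column they read (1 vs 2). Because Function2 is an ordinary trait, a single class parameterized by the column index could serve both. Below is a minimal sketch of that idea, not the author's code: `ItemListParser` is a hypothetical name, and `SimpleRow` is a hypothetical stand-in for Spark's `Row` (only `getString` and `getAs`) so the example runs without Spark.

```scala
// Hypothetical stand-in for org.apache.spark.sql.Row: positional values
// plus the two accessors the parsers use.
final case class SimpleRow(values: Any*) {
  def getString(i: Int): String = values(i).asInstanceOf[String]
  def getAs[T](i: Int): T = values(i).asInstanceOf[T]
}

// One parser for both item-similarity layouts; only the column index differs.
final class ItemListParser(column: Int)
    extends Function2[String, SimpleRow, (String, Map[String, String])] {
  override def apply(app: String, r: SimpleRow): (String, Map[String, String]) = {
    val key = r.getString(0)
    val items = r.getAs[Seq[SimpleRow]](column).map(_.getString(0)).mkString(";")
    (key, Map(app -> items))
  }
}

object ItemListParserDemo {
  val bySim = new ItemListParser(1)  // same logic as ItemSimRecParser
  val byCate = new ItemListParser(2) // same logic as ItemSimRecParserByCate

  def main(args: Array[String]): Unit = {
    val row = SimpleRow("item42", Seq(SimpleRow("a"), SimpleRow("b")), Seq(SimpleRow("c")))
    println(bySim("app1", row))  // (item42,Map(app1 -> a;b))
    println(byCate("app1", row)) // (item42,Map(app1 -> c))
  }
}
```

Whether this is worth it depends on taste: separate objects keep each data source's parsing explicit, while the parameterized class removes the duplicated body.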
RedisStore then simply takes the parsed result, as follows:
```scala
import org.apache.spark.sql.Row

// FlowStage, RedisClient and AwiseContext are project classes not shown in this post.
class RedisStore extends FlowStage {
  var client: RedisClient = _
  var maxBatch: Int = _
  var db: Int = 0

  override def run(context: AwiseContext): Unit = {
    val config = context.getConfiguration
    db = context.getRedisDB
    client = new RedisClient(config, db)
    maxBatch = config.getInt("redis.max.batch", 100)
    implicit val expire: Int = config.getInt("rec.expire", 86400)

    val data = inputStage.getOutput.data
    debugger.show(data)
    client.connect()
    try {
      if (data != null) {
        val datas: Array[Row] = data.collect()
        // save all item sims under the app code
        save(client, datas, context)
      }
    } finally {
      client.close() // close the connection even if saving fails
    }
  }

  def save(client: RedisClient, datas: Array[Row], context: AwiseContext): Unit = {
    var batchNum = 1
    println("enter save method and allCateIds is true... ")
    datas.foreach { r =>
      // The parser chosen upstream turns the row into (Redis key, hash fields);
      // RedisStore no longer needs to know where the row came from.
      val (key, fields) = context.recParser(context.getAppCode, r)
      client.batchHmset(key, fields)
      if (batchNum >= maxBatch) {
        client.flush()
        batchNum = 0
      }
      batchNum += 1
    }
    client.flush() // flush the last, partial batch
  }

  override def run_(context: AwiseContext): Unit = {}
}
```
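To see the refactoring in isolation, here is a minimal sketch that runs without Spark or Redis, under these assumptions: `InMemoryHashClient` is a hypothetical stand-in for the project's `RedisClient` (only `batchHmset` and `flush`), and `KeyValueStore`, `SimRec` and `HotRec` are illustrative names. The store receives its parser as a `Function2` and never inspects where a record came from.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the project's RedisClient: batchHmset buffers
// hash writes and flush applies them to an in-memory map.
final class InMemoryHashClient {
  private val hashes = mutable.Map.empty[String, mutable.Map[String, String]]
  private val pending = mutable.ArrayBuffer.empty[(String, Map[String, String])]
  var flushCount = 0

  def batchHmset(key: String, fields: Map[String, String]): Unit =
    pending += (key -> fields)

  def flush(): Unit = {
    pending.foreach { case (k, fs) =>
      hashes.getOrElseUpdate(k, mutable.Map.empty[String, String]) ++= fs
    }
    pending.clear()
    flushCount += 1
  }

  // Snapshot of one hash as an immutable Map (empty if the key was never written).
  def get(key: String): Map[String, String] =
    hashes.get(key).map(_.toMap).getOrElse(Map.empty)
}

// The store only knows "record => (key, hash fields)"; the parser is injected.
final class KeyValueStore[R](
    parser: Function2[String, R, (String, Map[String, String])],
    maxBatch: Int) {

  def save(client: InMemoryHashClient, app: String, records: Seq[R]): Unit = {
    var inBatch = 0
    records.foreach { r =>
      val (key, fields) = parser(app, r)
      client.batchHmset(key, fields)
      inBatch += 1
      if (inBatch >= maxBatch) {
        client.flush()
        inBatch = 0
      }
    }
    if (inBatch > 0) client.flush() // write the final partial batch
  }
}

object KeyValueStoreDemo {
  // Two differently shaped records, as if produced by two upstream stages.
  final case class SimRec(item: String, similar: Seq[String])
  final case class HotRec(cate: String, items: Seq[String])

  val simParser: Function2[String, SimRec, (String, Map[String, String])] =
    (app, r) => (r.item, Map(app -> r.similar.mkString(";")))
  val hotParser: Function2[String, HotRec, (String, Map[String, String])] =
    (app, r) => (app, Map(r.cate -> r.items.mkString(";")))

  def demo(): InMemoryHashClient = {
    val client = new InMemoryHashClient
    val sims = Seq(SimRec("i1", Seq("a", "b")), SimRec("i2", Seq("c")), SimRec("i3", Nil))
    new KeyValueStore[SimRec](simParser, maxBatch = 2).save(client, "app1", sims)
    new KeyValueStore[HotRec](hotParser, maxBatch = 2).save(client, "app1", Seq(HotRec("shoes", Seq("x", "y"))))
    client
  }

  def main(args: Array[String]): Unit = {
    val client = demo()
    println(client.get("i1"))   // Map(app1 -> a;b)
    println(client.get("app1")) // Map(shoes -> x;y)
    println(client.flushCount)  // 3
  }
}
```

With this split, supporting a new data source means writing one more parser; the store class itself does not change.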
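Summary:
1. This refactoring of RedisStore gave me a deeper understanding of what a class is responsible for: anything outside its responsibilities should be moved out of it.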
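2. I learned how to use Function2. It is a trait: you can extend it and override apply() to implement the concrete operation.
Function2 takes three type parameters: the first two are the argument types and the third is the return type.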
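Since Function2 is an ordinary trait, an object that extends it also inherits the trait's helper methods, for example `curried` (fix the first argument, such as the app code, once) and `tupled` (accept both arguments as a single pair). A small sketch; `ItemsParser` and the field name `items` are hypothetical.

```scala
// A Function2 written by extending the trait and overriding apply(),
// in the same style as the parsers above. Names are illustrative only.
object ItemsParser extends Function2[String, Seq[String], (String, Map[String, String])] {
  override def apply(app: String, items: Seq[String]): (String, Map[String, String]) =
    (app, Map("items" -> items.mkString(";")))
}

object Function2Extras {
  // curried: String => (Seq[String] => result); fixes the app code once.
  val forApp1: Seq[String] => (String, Map[String, String]) = ItemsParser.curried("app1")

  // tupled: ((String, Seq[String])) => result; takes both arguments as one pair.
  val fromPair: ((String, Seq[String])) => (String, Map[String, String]) = ItemsParser.tupled

  def main(args: Array[String]): Unit = {
    println(ItemsParser("app1", Seq("a", "b"))) // (app1,Map(items -> a;b))
    println(forApp1(Seq("c")))                  // (app1,Map(items -> c))
    println(fromPair(("app2", Seq("d", "e"))))  // (app2,Map(items -> d;e))
  }
}
```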