Ad Promotion 01

Background on Internet Advertising

DSP Schematic

[Figure: DSP schematic]
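DSP: an agent acting for the various advertisers that helps them place their ads. It is also a web platform that stores each advertiser's requirements (the target user profile).
DMP: stores user profiles.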

Flow Walkthrough

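1. When a user opens the app, the app sends a request to the Ad Exchange (the ad trading platform); the request carries user information (userId).
2. One Ad Exchange works with multiple DSPs. After receiving the app's request, the Ad Exchange forwards the user information to those DSPs.
3. Each DSP is fronted by an ad serving engine, which matches the incoming user information against the user targeting held in the DMP; if the match succeeds, that DSP joins the bidding.
4. Based on the bids, the Ad Exchange delivers the most suitable ad to the app. The whole process takes about 200 ms.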
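
The four steps above can be sketched in a few lines of plain Scala. This is only an illustrative model, not the code of any real exchange: the names `AuctionSketch`, `Campaign`, `Dsp`, `Bid` and `runAuction` are made up here, the DMP is reduced to an in-memory map from userId to tags, and "most suitable" is assumed to mean "highest bid price".

```scala
// Hypothetical model of the request flow; every name here is illustrative.
object AuctionSketch {
  // User profiles as a DMP might hold them: userId -> set of profile tags.
  type Dmp = Map[String, Set[String]]

  // An advertiser's demand stored in a DSP: target tags plus the price it will bid.
  final case class Campaign(adId: String, targetTags: Set[String], bidPrice: Double)

  final case class Bid(dspName: String, adId: String, price: Double)

  final case class Dsp(name: String, campaigns: Seq[Campaign], dmp: Dmp) {
    // Step 3: match the user against the targeting; bid only when a campaign matches.
    def bidFor(userId: String): Option[Bid] = {
      val tags = dmp.getOrElse(userId, Set.empty[String])
      campaigns
        .filter(c => c.targetTags.subsetOf(tags))
        .reduceOption((a, b) => if (a.bidPrice >= b.bidPrice) a else b)
        .map(best => Bid(name, best.adId, best.bidPrice))
    }
  }

  // Steps 2 and 4: fan the request out to every DSP, then keep the highest bid (assumed rule).
  def runAuction(userId: String, dsps: Seq[Dsp]): Option[Bid] =
    dsps
      .flatMap(_.bidFor(userId))
      .reduceOption((a, b) => if (a.price >= b.price) a else b)
}
```

Calling `AuctionSketch.runAuction("u1", dsps)` returns `None` when no DSP has a matching campaign, which corresponds to the case where no DSP joins the bidding.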

DMP

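The DMP builds user profiles, which depends on large volumes of log data.
The ad serving engine exchanges a large amount of data with the Ad Exchange, and it saves the user data it receives.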

Project Flow Chart

[Figure: project flow chart]

Converting Log Files to Parquet Files

The plain approach
package cn.dmp.tools

import cn.dmp.utils.{NBF, SchemaUtils}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Converts the raw log files to the Parquet file format,
  * compressed with the codec given on the command line (e.g. snappy)
  */
object Bzip2Parquet {

    def main(args: Array[String]): Unit = {
        // 0 Validate the number of arguments
        if (args.length != 3) {
            println(
                """
                  |cn.dmp.tools.Bzip2Parquet
                  |Arguments:
                  | logInputPath
                  | compressionCode <snappy, gzip, lzo>
                  | resultOutputPath
                """.stripMargin)
            sys.exit()
        }

        // 1 Read the program arguments
        val Array(logInputPath, compressionCode, resultOutputPath) = args

        // 2 Create the SparkConf, then the SparkContext
        val sparkConf = new SparkConf()
        sparkConf.setAppName(s"${this.getClass.getSimpleName}")
        // Hard-coded for local runs; a master set here takes precedence over spark-submit's --master
        sparkConf.setMaster("local[*]")
        // Serializer used when RDDs are written to disk and when data moves between workers
        sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

        val sc = new SparkContext(sparkConf)

        val sQLContext = new SQLContext(sc)
        sQLContext.setConf("spark.sql.parquet.compression.codec", compressionCode)


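        // 3 Read the log data
        val rawdata = sc.textFile(logInputPath)

        // 4 Apply ETL according to the business requirements; a record looks like xxxx,x,x,x,x,,,,,
        val dataRow: RDD[Row] = rawdata
          // limit -1 keeps trailing empty fields, so the length check below sees all 85 columns
          .map(line => line.split(",", -1))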
          .filter(_.length >= 85)
          .map(arr => {
              Row(
                  arr(0),
                  NBF.toInt(arr(1)),
                  NBF.toInt(arr(2)),
                  NBF.toInt(arr(3)),
                  NBF.toInt(arr(4)),
                  arr(5),
                  arr(6),
                  NBF.toInt(arr(7)),
                  NBF.toInt(arr(8)),
                  NBF.toDouble(arr(9)),
                  NBF.toDouble(arr(10)),
                  arr(11),
                  arr(12),
                  arr(13),
                  arr(14),
                  arr(15),
                  arr(16),
                  NBF.toInt(arr(17)),
                  arr(18),
                  arr(19),
                  NBF.toInt(arr(20)),
                  NBF.toInt(arr(21)),
                  arr(22),
                  arr(23),
                  arr(24),
                  arr(25),
                  NBF.toInt(arr(26)),
                  arr(27),
                  NBF.toInt(arr(28)),
                  arr(29),
                  NBF.toInt(arr(30)),
                  NBF.toInt(arr(31)),
                  NBF.toInt(arr(32)),
                  arr(33),
                  NBF.toInt(arr(34)),
                  NBF.toInt(arr(35)),
                  NBF.toInt(arr(36)),
                  arr(37),
                  NBF.toInt(arr(38)),
                  NBF.toInt(arr(39)),
                  NBF.toDouble(arr(40)),
                  NBF.toDouble(arr(41)),
                  NBF.toInt(arr(42)),
                  arr(43),
                  NBF.toDouble(arr(44)),
                  NBF.toDouble(arr(45)),
                  arr(46),
                  arr(47),
                  arr(48),
                  arr(49),
                  arr(50),
                  arr(51),
                  arr(52),
                  arr(53),
                  arr(54),
                  arr(55),
                  arr(56),
                  NBF.toInt(arr(57)),
                  NBF.toDouble(arr(58)),
                  NBF.toInt(arr(59)),
                  NBF.toInt(arr(60)),
                  arr(61),
                  arr(62),
                  arr(63),
                  arr(64),
                  arr(65),
                  arr(66),
                  arr(67),
                  arr(68),
                  arr(69),
                  arr(70),
                  arr(71),
                  arr(72),
                  NBF.toInt(arr(73)),
                  NBF.toDouble(arr(74)),
                  NBF.toDouble(arr(75)),
                  NBF.toDouble(arr(76)),
                  NBF.toDouble(arr(77)),
                  NBF.toDouble(arr(78)),
                  arr(79),
                  arr(80),
                  arr(81),
                  arr(82),
                  arr(83),
                  NBF.toInt(arr(84))
              )
          })


        // 5 Save the result as Parquet under the output path
        val dataFrame = sQLContext.createDataFrame(dataRow, SchemaUtils.logStructType)
        dataFrame.write.parquet(resultOutputPath)
        // 6 Stop the SparkContext
        sc.stop()
    }
}
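
A note on the split call in the listing above: the `filter(_.length >= 85)` check only works if trailing empty columns survive the split. The following minimal sketch, using a made-up five-column record and the hypothetical name `SplitSketch`, shows the difference between the default split and a split with a negative limit.

```scala
object SplitSketch {
  // A made-up record whose last three columns are empty.
  val line = "a,b,,,"

  // The default split silently drops trailing empty strings.
  val dropped: Array[String] = line.split(",")

  // A negative limit keeps every column, including the trailing empty ones.
  val kept: Array[String] = line.split(",", -1)
}
```

Here `SplitSketch.dropped` has 2 elements while `SplitSketch.kept` has 5, so with the default split a valid record ending in empty fields would fail the length check and be discarded.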
package cn.dmp.utils

import org.apache.spark.sql.types._

object SchemaUtils {

    /**
      * Defines the schema (structure information) of the log
      */
    val logStructType = StructType(Seq(
        StructField("sessionid", StringType),
        StructField("advertisersid", IntegerType),
        StructField("adorderid", IntegerType),
        StructField("adcreativeid", IntegerType),
        StructField("adplatformproviderid", IntegerType),
        StructField("sdkversion", StringType),
        StructField("adplatformkey", StringType),
        StructField("putinmodeltype", IntegerType),
        StructField("requestmode", IntegerType),
        StructField("adprice", DoubleType),
        StructField("adppprice", DoubleType),
        StructField("requestdate", StringType),
        StructField("ip", StringType),
        StructField("appid", StringType),
        StructField("appname", StringType),
        StructField("uuid", StringType),
        StructField("device", StringType),
        StructField("client", IntegerType),
        StructField("osversion", StringType),
        StructField("density", StringType),
        StructField("pw", IntegerType),
        StructField("ph", IntegerType),
        StructField("long", StringType),
        StructField("lat", StringType),
        StructField("provincename", StringType),
        StructField("cityname", StringType),
        StructField("ispid", IntegerType),
        StructField("ispname", StringType),
        StructField("networkmannerid", IntegerType),
        StructField("networkmannername", StringType),
        StructField("iseffective", IntegerType),
        StructField("isbilling", IntegerType),
        StructField("adspacetype", IntegerType),
        StructField("adspacetypename", StringType),
        StructField("devicetype", IntegerType),
        StructField("processnode", IntegerType),
        StructField("apptype", IntegerType),
        StructField("district", StringType),
        StructField("paymode", IntegerType),
        StructField("isbid", IntegerType),
        StructField("bidprice", DoubleType),
        StructField("winprice", DoubleType),
        StructField("iswin", IntegerType),
        StructField("cur", StringType),
        StructField("rate", DoubleType),
        StructField("cnywinprice", DoubleType),
        StructField("imei", StringType),
        StructField("mac", StringType),
        StructField("idfa", StringType),
        StructField("openudid", StringType),
        StructField("androidid", StringType),
        StructField("rtbprovince", StringType),
        StructField("rtbcity", StringType),
        StructField("rtbdistrict", StringType),
        StructField("rtbstreet", StringType),
        StructField("storeurl", StringType),
        StructField("realip", StringType),
        StructField("isqualityapp", IntegerType),
        StructField("bidfloor", DoubleType),
        StructField("aw", IntegerType),
        StructField("ah", IntegerType),
        StructField("imeimd5", StringType),
        StructField("macmd5", StringType),
        StructField("idfamd5", StringType),
        StructField("openudidmd5", StringType),
        StructField("androididmd5", StringType),
        StructField("imeisha1", StringType),
        StructField("macsha1", StringType),
        StructField("idfasha1", StringType),
        StructField("openudidsha1", StringType),
        StructField("androididsha1", StringType),
        StructField("uuidunknow", StringType),
        StructField("userid", StringType),
        StructField("iptype", IntegerType),
        StructField("initbidprice", DoubleType),
        StructField("adpayment", DoubleType),
        StructField("agentrate", DoubleType),
        StructField("lomarkrate", DoubleType),
        StructField("adxrate", DoubleType),
        StructField("title", StringType),
        StructField("keywords", StringType),
        StructField("tagid", StringType),
        StructField("callbackdate", StringType),
        StructField("channelid", StringType),
        StructField("mediatype", IntegerType)
    ))

}
package cn.dmp.utils

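object NBF {

    // Parses an Int, falling back to 0 when the string is empty or malformed
    def toInt(str: String): Int = {
        try {
            str.toInt
        } catch {
            case _: Exception => 0
        }
    }

    // Parses a Double, falling back to 0.0 when the string is empty or malformed
    def toDouble(str: String): Double = {
        try {
            str.toDouble
        } catch {
            case _: Exception => 0.0
        }
    }
}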
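
For comparison, the same fall-back-to-zero parsing can be written with `scala.util.Try`. This is only a sketch of an alternative, not part of the original project; the object name `SafeParseSketch` is made up here.

```scala
import scala.util.Try

object SafeParseSketch {
  // Same contract as NBF.toInt: anything that cannot be parsed becomes 0.
  def toInt(str: String): Int = Try(str.toInt).getOrElse(0)

  // Same contract as NBF.toDouble: anything that cannot be parsed becomes 0.0.
  def toDouble(str: String): Double = Try(str.toDouble).getOrElse(0.0)
}
```

With either version, an empty column and a column that really contains 0 become indistinguishable after parsing, which is worth keeping in mind when aggregating the numeric fields later.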
Wrapping the data in an object
package cn.dmp.tools

import cn.dmp.beans.Log
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

/**
  * Converts logs to the Parquet file format
  *
  * Builds the schema information from a custom class
  */
object Biz2ParquetV2 {

    def main(args: Array[String]): Unit = {

        // 0 Validate the number of arguments
        if (args.length != 3) {
            println(
                """
                  |cn.dmp.tools.Biz2ParquetV2
                  |Arguments:
                  | logInputPath
                  | compressionCode <snappy, gzip, lzo>
                  | resultOutputPath
                """.stripMargin)
            sys.exit()
        }

        // 1 Read the program arguments
        val Array(logInputPath, compressionCode, resultOutputPath) = args

        // 2 Create the SparkConf, then the SparkContext
        val sparkConf = new SparkConf()
        sparkConf.setAppName(s"${this.getClass.getSimpleName}")
        // Hard-coded for local runs; a master set here takes precedence over spark-submit's --master
        sparkConf.setMaster("local[*]")
        // Serializer used when RDDs are written to disk and when data moves between workers
        sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        // Register the custom class with Kryo
        sparkConf.registerKryoClasses(Array(classOf[Log]))

        val sc = new SparkContext(sparkConf)

        val sQLContext = new SQLContext(sc)
        sQLContext.setConf("spark.sql.parquet.compression.codec", compressionCode)

        // Read the log file
        val dataLog: RDD[Log] = sc.textFile(logInputPath)
          .map(line => line.split(",", -1))
          .filter(_.length >= 85).map(arr => Log(arr))

        val dataFrame = sQLContext.createDataFrame(dataLog)

        // Partition the data by province name and city name
        dataFrame.write.partitionBy("provincename", "cityname").parquet(resultOutputPath)

        sc.stop()

    }


}

package cn.dmp.beans

import cn.dmp.utils.NBF

class Log(val sessionid: String,
          val advertisersid: Int,
          val adorderid: Int,
          val adcreativeid: Int,
          val adplatformproviderid: Int,
          val sdkversion: String,
          val adplatformkey: String,
          val putinmodeltype: Int,
          val requestmode: Int,
          val adprice: Double,
          val adppprice: Double,
          val requestdate: String,
          val ip: String,
          val appid: String,
          val appname: String,
          val uuid: String,
          val device: String,
          val client: Int,
          val osversion: String,
          val density: String,
          val pw: Int,
          val ph: Int,
          val long: String,
          val lat: String,
          val provincename: String,
          val cityname: String,
          val ispid: Int,
          val ispname: String,
          val networkmannerid: Int,
          val networkmannername: String,
          val iseffective: Int,
          val isbilling: Int,
          val adspacetype: Int,
          val adspacetypename: String,
          val devicetype: Int,
          val processnode: Int,
          val apptype: Int,
          val district: String,
          val paymode: Int,
          val isbid: Int,
          val bidprice: Double,
          val winprice: Double,
          val iswin: Int,
          val cur: String,
          val rate: Double,
          val cnywinprice: Double,
          val imei: String,
          val mac: String,
          val idfa: String,
          val openudid: String,
          val androidid: String,
          val rtbprovince: String,
          val rtbcity: String,
          val rtbdistrict: String,
          val rtbstreet: String,
          val storeurl: String,
          val realip: String,
          val isqualityapp: Int,
          val bidfloor: Double,
          val aw: Int,
          val ah: Int,
          val imeimd5: String,
          val macmd5: String,
          val idfamd5: String,
          val openudidmd5: String,
          val androididmd5: String,
          val imeisha1: String,
          val macsha1: String,
          val idfasha1: String,
          val openudidsha1: String,
          val androididsha1: String,
          val uuidunknow: String,
          val userid: String,
          val iptype: Int,
          val initbidprice: Double,
          val adpayment: Double,
          val agentrate: Double,
          val lomarkrate: Double,
          val adxrate: Double,
          val title: String,
          val keywords: String,
          val tagid: String,
          val callbackdate: String,
          val channelid: String,
          val mediatype: Int) extends Product with Serializable{

    // Mapping from index to member field
    override def productElement(n: Int): Any = n match {
        case 0	=> sessionid
        case 1	=> advertisersid
        case 2	=> adorderid
        case 3	=> adcreativeid
        case 4	=> adplatformproviderid
        case 5	=> sdkversion
        case 6	=> adplatformkey
        case 7	=> putinmodeltype
        case 8	=> requestmode
        case 9	=> adprice
        case 10	=> adppprice
        case 11	=> requestdate
        case 12	=> ip
        case 13	=> appid
        case 14	=> appname
        case 15	=> uuid
        case 16	=> device
        case 17	=> client
        case 18	=> osversion
        case 19	=> density
        case 20	=> pw
        case 21	=> ph
        case 22	=> long
        case 23	=> lat
        case 24	=> provincename
        case 25	=> cityname
        case 26	=> ispid
        case 27	=> ispname
        case 28	=> networkmannerid
        case 29	=> networkmannername
        case 30	=> iseffective
        case 31	=> isbilling
        case 32	=> adspacetype
        case 33	=> adspacetypename
        case 34	=> devicetype
        case 35	=> processnode
        case 36	=> apptype
        case 37	=> district
        case 38	=> paymode
        case 39	=> isbid
        case 40	=> bidprice
        case 41	=> winprice
        case 42	=> iswin
        case 43	=> cur
        case 44	=> rate
        case 45	=> cnywinprice
        case 46	=> imei
        case 47	=> mac
        case 48	=> idfa
        case 49	=> openudid
        case 50	=> androidid
        case 51	=> rtbprovince
        case 52	=> rtbcity
        case 53	=> rtbdistrict
        case 54	=> rtbstreet
        case 55	=> storeurl
        case 56	=> realip
        case 57	=> isqualityapp
        case 58	=> bidfloor
        case 59	=> aw
        case 60	=> ah
        case 61	=> imeimd5
        case 62	=> macmd5
        case 63	=> idfamd5
        case 64	=> openudidmd5
        case 65	=> androididmd5
        case 66	=> imeisha1
        case 67	=> macsha1
        case 68	=> idfasha1
        case 69	=> openudidsha1
        case 70	=> androididsha1
        case 71	=> uuidunknow
        case 72	=> userid
        case 73	=> iptype
        case 74	=> initbidprice
        case 75	=> adpayment
        case 76	=> agentrate
        case 77	=> lomarkrate
        case 78	=> adxrate
        case 79	=> title
        case 80	=> keywords
        case 81	=> tagid
        case 82	=> callbackdate
        case 83	=> channelid
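        case 84	=> mediatype
        case _	=> throw new IndexOutOfBoundsException(n.toString)
    }

    // How many member fields the object has
    override def productArity: Int = 85

    // Whether the other object is a Log and can therefore be compared with this one
    override def canEqual(that: Any): Boolean = that.isInstanceOf[Log]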
}


object Log {
    def apply(arr: Array[String]): Log = new Log(
        arr(0),
        NBF.toInt(arr(1)),
        NBF.toInt(arr(2)),
        NBF.toInt(arr(3)),
        NBF.toInt(arr(4)),
        arr(5),
        arr(6),
        NBF.toInt(arr(7)),
        NBF.toInt(arr(8)),
        NBF.toDouble(arr(9)),
        NBF.toDouble(arr(10)),
        arr(11),
        arr(12),
        arr(13),
        arr(14),
        arr(15),
        arr(16),
        NBF.toInt(arr(17)),
        arr(18),
        arr(19),
        NBF.toInt(arr(20)),
        NBF.toInt(arr(21)),
        arr(22),
        arr(23),
        arr(24),
        arr(25),
        NBF.toInt(arr(26)),
        arr(27),
        NBF.toInt(arr(28)),
        arr(29),
        NBF.toInt(arr(30)),
        NBF.toInt(arr(31)),
        NBF.toInt(arr(32)),
        arr(33),
        NBF.toInt(arr(34)),
        NBF.toInt(arr(35)),
        NBF.toInt(arr(36)),
        arr(37),
        NBF.toInt(arr(38)),
        NBF.toInt(arr(39)),
        NBF.toDouble(arr(40)),
        NBF.toDouble(arr(41)),
        NBF.toInt(arr(42)),
        arr(43),
        NBF.toDouble(arr(44)),
        NBF.toDouble(arr(45)),
        arr(46),
        arr(47),
        arr(48),
        arr(49),
        arr(50),
        arr(51),
        arr(52),
        arr(53),
        arr(54),
        arr(55),
        arr(56),
        NBF.toInt(arr(57)),
        NBF.toDouble(arr(58)),
        NBF.toInt(arr(59)),
        NBF.toInt(arr(60)),
        arr(61),
        arr(62),
        arr(63),
        arr(64),
        arr(65),
        arr(66),
        arr(67),
        arr(68),
        arr(69),
        arr(70),
        arr(71),
        arr(72),
        NBF.toInt(arr(73)),
        NBF.toDouble(arr(74)),
        NBF.toDouble(arr(75)),
        NBF.toDouble(arr(76)),
        NBF.toDouble(arr(77)),
        NBF.toDouble(arr(78)),
        arr(79),
        arr(80),
        arr(81),
        arr(82),
        arr(83),
        NBF.toInt(arr(84))
    )
}
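
The pattern used by `Log` is easier to see on a small class. The sketch below is a three-field stand-in (the name `MiniLog` and its field selection are made up for illustration): it extends `Product` by hand, which gives the class an arity and positional access to its fields. A likely reason for writing `Log` this way rather than as a case class is that case classes were limited to 22 fields in Scala 2.10, but the original does not state its Scala version, so treat that as an assumption.

```scala
// A three-field stand-in for the 85-field Log class; names are illustrative only.
class MiniLog(val sessionid: String, val adprice: Double, val mediatype: Int)
    extends Product with Serializable {

  // Number of fields exposed through the Product interface.
  override def productArity: Int = 3

  // Positional access: index -> field, in constructor order.
  override def productElement(n: Int): Any = n match {
    case 0 => sessionid
    case 1 => adprice
    case 2 => mediatype
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }

  override def canEqual(that: Any): Boolean = that.isInstanceOf[MiniLog]
}

object MiniLog {
  // Builds an instance from already-split columns, like Log.apply does.
  def apply(arr: Array[String]): MiniLog =
    new MiniLog(arr(0), arr(1).toDouble, arr(2).toInt)
}
```

`MiniLog(Array("s1", "0.5", "3")).productIterator` walks the three fields in order; `productIterator` comes from `Product` itself and relies only on `productArity` and `productElement`.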

Performance Tuning

Do you have experience tuning JVM performance for Spark?
Does a job that contains a shuffle necessarily run slowly?

nohup spark-submit --class com.initialize.dmp.log.Biz2Parquet --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 8g --executor-cores 1 --num-executors 20 /dmp-1.0.jar /adlogs/biz2/* /parquet_20_3 &

nohup spark-submit --class com.initialize.dmp.log.Biz2Parquet --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 8g --executor-cores 5 --num-executors 20 /home/hdfs/dmp-1.0.jar /adlogs/biz2/* /parquet_20_7 300 &

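Increasing task parallelism
executor-memory:
Its size is tied to num-executors: their product must not exceed the total memory capacity of the cluster.
Note: when computing the product, count about 1g extra per executor on top of executor-memory.
executor-cores:
executor-cores * num-executors <= total number of cores in the cluster
If an executor is allocated only one core, only one task thread can run in it at any given moment, so its tasks run serially.
If an executor is allocated N cores, its tasks run in parallel, with at most N running at once.
num-executors:
The total number of executors requested; ideally the number of executors and the number of partitions are integer multiples of one another.
partitionNum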
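
As a worked example of the two inequalities above, the sketch below plugs in the values from the two spark-submit commands (20 executors with 8g each, and 1 or 5 cores per executor). The helper names (`ResourceBudgetSketch`, `Submit`, `fits`) and the cluster capacity used in the usage note are hypothetical; the 1g per-executor allowance follows the note above.

```scala
object ResourceBudgetSketch {
  // The three spark-submit settings that determine the resource request.
  final case class Submit(numExecutors: Int, executorMemoryGb: Int, executorCores: Int)

  // Memory asked of the cluster: executor-memory plus the ~1g allowance, per executor.
  def totalMemoryGb(s: Submit, overheadGb: Int = 1): Int =
    s.numExecutors * (s.executorMemoryGb + overheadGb)

  // Upper bound on the number of tasks that can run at the same moment.
  def totalCores(s: Submit): Int = s.numExecutors * s.executorCores

  // True when both products stay within the (assumed) cluster capacity.
  def fits(s: Submit, clusterMemoryGb: Int, clusterCores: Int): Boolean =
    totalMemoryGb(s) <= clusterMemoryGb && totalCores(s) <= clusterCores
}
```

Both commands request 20 * (8 + 1) = 180g of memory. The first can run 20 * 1 = 20 tasks at once and the second 20 * 5 = 100, so on a hypothetical cluster with 256g and 64 cores, `fits` accepts the first command and rejects the second.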
