第六届中国软件杯WIFI探针数据分析

Hello Spark_WIFIProbe_Analyse

基于Hadoop,Spark的WIFI探针大数据分析。

使用Scala语言 版本2.11.0

导入数据库,tanzhen
把lib包内容放入/home/example中

//分析数据

spark-submit --master spark://master:7077 --name DataAnalyse --class DataAnalyse --executor-memory 1G --total-executor-cores 2 --jars /home/examples/mysql.jar  /home/examples/WIFIAnalyse.jar

//存json数据

spark-submit --master spark://master:7077 --name JsonTanZhen --class JsonTanZhen --executor-memory 1G --total-executor-cores 2 --jars /home/examples/mysql.jar  /home/examples/WIFIAnalyse.jar hdfs://master:55555/input/data*.txt

探针数据分析–完整版

1客流量:店铺或区域整体客流及趋势
2入店量:进入店铺或区域的客流及趋势
3入店率:进⼊店铺或区域的客流占全部客流的比例及趋势
4驻店时长:进⼊店铺的顾客在店内的停留时长
5跳出率:进⼊店铺后很快离店的顾客及占比(占总体客流)
6深访率:进⼊店铺深度访问的顾客及占⽐(占总体客流)(可以根据定位轨迹或者停留时长判定)
7新老顾客:一定时间段内首次/两次以上进⼊店铺的顾客
8来访周期:进⼊店铺或区域的顾客距离上次来店的间隔
9顾客活跃度:按顾客距离上次来访间隔,划分为不同活跃度(高活跃度、中活跃度、低活跃度、沉睡活跃度)


客流量

1.id店铺昨日客流量,sql=”select count(distinct mac) as count from data where to_days(now())-to_days(time) = 1 and id=?”

2.id店铺七天客流量,sql=”select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 7 DAY) <= date(time) and id=?”

3.id店铺月客流量,sql=”select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 30 DAY) <= date(time) and id=?”

4.id店铺上月客流量,sql=”select count(distinct mac) as count from data where PERIOD_DIFF(date_format(now(),’%Y%m’) , date_format(time, ‘%Y%m’ ) ) =1 and id=?”


入店量

5.id店铺昨日入店量,sql=”select count(distinct mac) as count from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=?”

6.id店铺七日入店量,sql=”select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 7 DAY) <= date(time) and ranges<=300 and id=?”

7.id店铺月入店量,sql=”select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 30 DAY) <= date(time) and ranges<=300 and id=?”

8.id店铺上月入店量,sql=”select count(distinct mac) as count from data where PERIOD_DIFF(date_format(now(),’%Y%m’) , date_format(time, ‘%Y%m’ ) ) =1 and ranges<=300 and id=?”


入店率

客户端计算


驻店时长

9.id店铺昨日用户停留时间分段,sql=”select case when cha>=0 and cha<15 then ‘a’ when cha>=15 and cha<30 then ‘b’ when cha>=30 and cha<45 then ‘c’ when cha>=45 and cha<60 then ‘d’ when cha>=60 then ‘e’ end as type,count(*) as count from (SELECT max(minute(time))-min(minute(time)) as cha from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? GROUP by mac) as total group by (case when cha>=0 and cha<15 then ‘a’ when cha>=15 and cha<30 then ‘b’ when cha>=30 and cha<45 then ‘c’ when cha>=45 and cha<60 then ‘d’ when cha>=60 then ‘e’ end)”


新老顾客

10.id店铺昨日老顾客,sql=”select count(* )as count from (select mac,count(*) as count from
(select mac from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) > 1 and ranges<=300 and id=? group by mac)as total group by mac having count>1)as totals”


11.id店铺昨日新顾客,sql=”select count(* )as count from (select mac,count(*) as count from
(select mac from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) > 1 and ranges<=300 and id=? group by mac)as total group by mac having count=1)as totals”


跳出人数

12.id店铺昨日跳出率,sql=”select count(*) as count from (SELECT max(minute(time))-min(minute(time)) as cha from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? GROUP by mac)as total where cha>=0 and cha<5”


深访人数

13.id店铺昨日深访率,sql=”select count(*) as count from (SELECT max(minute(time))-min(minute(time)) as cha from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? GROUP by mac)as total where cha>=30”


来访周期

14.id店铺来访周期,
前七天活跃度,每天的人数

sql=”select case when cha>=1 and cha<2 then ‘a’ when cha>=2 and cha<3 then ‘b’ when cha>=3 and cha<4 then ‘c’ when cha>=4 and cha<7 then ‘d’ when cha>=7 then ‘e’ end type,count(* ) as count from (select mac,count(*) as cha from
(select mac from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 2 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 3 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 4 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 5 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 6 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 7 and ranges<=300 and id=? group by mac)as total group by mac)as totals group by (case when cha>=1 and cha<2 then ‘a’ when cha>=2 and cha<3 then ‘b’ when cha>=3 and cha<4 then ‘c’ when cha>=4 and cha<7 then ‘d’ when cha>=7 then ‘e’ end)”


sql=”select mac,count(*) as count from
(select mac from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 2 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 3 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 4 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 5 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 6 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 7 and ranges<=300 and id=? group by mac)as total group by mac”


顾客活跃度

客户端计算


  • 0
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 13
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 13
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值