sql练习一(蚂蚁森林)_蚂蚁森林sql练习题-CSDN博客

本文链接：https://blog.csdn.net/qq_42506914/article/details/88613097

背景说明：
以下表记录了用户每天的蚂蚁森林低碳生活领取的记录流水。
table_name：user_low_carbon  

seq（key） user_id data_dt  low_carbon
流水号     用户     日期      减少碳排放（g）
xxxxx01   u_001   2017/1/1  10
xxxxx02   u_001   2017/1/2  150
xxxxx03   u_001   2017/1/2  110
xxxxx04   u_001   2017/1/2  10
xxxxx05   u_001   2017/1/4  50
xxxxx06   u_001   2017/1/4  10
xxxxx07   u_001   2017/1/6  45
xxxxx08   u_001   2017/1/6  90
xxxxx09   u_002   2017/1/1  10
xxxxx10   u_002   2017/1/2  150
xxxxx11   u_002   2017/1/2  70
xxxxx12   u_002   2017/1/3  30
xxxxx13   u_002   2017/1/3  80
xxxxx14   u_002   2017/1/4  150
xxxxx14   u_002   2017/1/5  101
xxxxx15   u_002   2017/1/6  68
xxxxx16   u_002   2017/1/6  120


蚂蚁森林植物换购表，用于记录申领环保植物所需要减少的碳排放量
table_name:  plant_carbon   

plant_id plant_name low_carbon
植物编号  植物名       换购植物所需要的碳
p001     梭梭树       17900
p002     沙柳         19680
p003     樟子树       146210
p004     胡杨         215680

----题目
1、蚂蚁森林植物申领统计
问题：假设2017年1月1日开始记录低碳数据（user_low_carbon），假设2017年10月1日之前满足申领条件的用户都申领了一颗p004-胡杨，
剩余的能量全部用来领取“p002-沙柳” 。
统计在10月1日累计申领“p002-沙柳” 排名前10的用户信息；以及他比后一名多领了几颗沙柳。
得到的统计结果如下表样式：
user_id  plant_count less_count(比后一名多领了几颗沙柳)
u_101    1000         100
u_088    900          400
u_103    500          …


2、蚂蚁森林低碳用户排名分析
问题：查询user_low_carbon表中每日流水记录，条件为：
用户在2017年，连续三天（或以上）的天数里，
每天减少碳排放（low_carbon）都超过100g的用户低碳流水。
需要查询返回满足以上条件的user_low_carbon表中的记录流水。
例如用户u_002符合条件的记录如下，因为2017/1/2~2017/1/5连续四天的碳排放量之和都大于等于100g：
seq（key） user_id data_dt  low_carbon
xxxxx10    u_002  2017/1/2  150
xxxxx11    u_002  2017/1/2  70
xxxxx12    u_002  2017/1/3  30
xxxxx13    u_002  2017/1/3  80
xxxxx14    u_002  2017/1/4  150
xxxxx14    u_002  2017/1/5  101
备注：统计方法不限于sql、procedure、python,java等

数据准备

plant_carbon

p001	梭梭树	17
p002	沙柳	19
p003	樟子树	146
p004	胡杨	215

user_low_carbon 简单处理把流水号去掉了

u_001	2017/1/1	10
u_001	2017/1/2	150
u_001	2017/1/2	110
u_001	2017/1/2	10
u_001	2017/1/4	50
u_001	2017/1/4	10
u_001	2017/1/6	45
u_001	2017/1/6	90
u_002	2017/1/1	10
u_002	2017/1/2	150
u_002	2017/1/2	70
u_002	2017/1/3	30
u_002	2017/1/3	80
u_002	2017/1/4	150
u_002	2017/1/5	101
u_002	2017/1/6	68
u_003	2017/1/1	20
u_003	2017/1/2	10
u_003	2017/1/2	150
u_003	2017/1/3	160
u_003	2017/1/4	20
u_003	2017/1/5	120
u_003	2017/1/6	20
u_003	2017/1/7	10
u_003	2017/1/7	110
u_004	2017/1/1	110
u_004	2017/1/2	20
u_004	2017/1/2	50
u_004	2017/1/3	120
u_004	2017/1/4	30
u_004	2017/1/5	60
u_004	2017/1/6	120
u_004	2017/1/7	10
u_004	2017/1/7	120
u_005	2017/1/1	80
u_005	2017/1/2	50
u_005	2017/1/2	80
u_005	2017/1/3	180
u_005	2017/1/4	180
u_005	2017/1/4	10
u_005	2017/1/5	80
u_005	2017/1/6	280
u_005	2017/1/7	80
u_005	2017/1/7	80
u_006	2017/1/1	40
u_006	2017/1/2	40
u_006	2017/1/2	140
u_006	2017/1/3	210
u_006	2017/1/3	10
u_006	2017/1/4	40
u_006	2017/1/5	40
u_006	2017/1/6	20
u_006	2017/1/7	50
u_006	2017/1/7	240
u_007	2017/1/1	130
u_007	2017/1/2	30
u_007	2017/1/2	330
u_007	2017/1/3	30
u_007	2017/1/4	530
u_007	2017/1/5	30
u_007	2017/1/6	230
u_007	2017/1/7	130
u_007	2017/1/7	30
u_008	2017/1/1	160
u_008	2017/1/2	60
u_008	2017/1/2	60
u_008	2017/1/3	60
u_008	2017/1/4	260
u_008	2017/1/5	360
u_008	2017/1/6	160
u_008	2017/1/7	60
u_008	2017/1/7	60
u_009	2017/1/1	70
u_009	2017/1/2	70
u_009	2017/1/2	70
u_009	2017/1/3	170
u_009	2017/1/4	270
u_009	2017/1/5	70
u_009	2017/1/6	70
u_009	2017/1/7	70
u_009	2017/1/7	70
u_010	2017/1/1	90
u_010	2017/1/2	90
u_010	2017/1/2	90
u_010	2017/1/3	90
u_010	2017/1/4	90
u_010	2017/1/4	80
u_010	2017/1/5	90
u_010	2017/1/5	90
u_010	2017/1/6	190
u_010	2017/1/7	90
u_010	2017/1/7	90
u_011	2017/1/1	110
u_011	2017/1/2	100
u_011	2017/1/2	100
u_011	2017/1/3	120
u_011	2017/1/4	100
u_011	2017/1/5	100
u_011	2017/1/6	100
u_011	2017/1/7	130
u_011	2017/1/7	100
u_012	2017/1/1	10
u_012	2017/1/2	120
u_012	2017/1/2	10
u_012	2017/1/3	10
u_012	2017/1/4	50
u_012	2017/1/5	10
u_012	2017/1/6	20
u_012	2017/1/7	10
u_012	2017/1/7	10
u_013	2017/1/1	50
u_013	2017/1/2	150
u_013	2017/1/2	50
u_013	2017/1/3	150
u_013	2017/1/4	550
u_013	2017/1/5	350
u_013	2017/1/6	50
u_013	2017/1/7	20
u_013	2017/1/7	60
u_014	2017/1/1	220
u_014	2017/1/2	120
u_014	2017/1/2	20
u_014	2017/1/3	20
u_014	2017/1/4	20
u_014	2017/1/5	250
u_014	2017/1/6	120
u_014	2017/1/7	270
u_014	2017/1/7	20
u_015	2017/1/1	10
u_015	2017/1/2	20
u_015	2017/1/2	10
u_015	2017/1/3	10
u_015	2017/1/4	20
u_015	2017/1/5	70
u_015	2017/1/6	10
u_015	2017/1/7	80
u_015	2017/1/7	60

第一题

/*
regexp_replace 日期的格式 2017/1/4 是不能用日期函数的要是 2017-1-4 regexp_replace(data_dt,'/','-')
datediff 日期取差值
date_sub 日期与数字相减
date_add 日期与数字相加
floor数学函数 floor() 向下取整
*/

1、先查出所有用户在10月1日前的能量总和

select
user_id,
sum(low_carbon) sum_carbon
from user_low_carbon
where datediff(regexp_replace(data_dt,'/','-'),"2017-10-1")<0
group by user_id ; t1

2、查出胡杨，沙柳各需要的

select
low_carbon
from plant_carbon
where plant_id = 'p004' ; t2

select
low_carbon
from plant_carbon
where plant_id = 'p002' ;t3

3、查出沙柳数的前11名

select
user_id,
floor((sum_carbon-t2.low_carbon)/(t3.low_carbon)) treecount
from
t1,t2,t3
order by treecount desc
limit 11 ;t4

4、查出前一名比后一名多多少棵,取前10名
select
user_id,
treecount,
(treecount - lead(treecount)over(order by treecount desc))
from
t4
limit 10

//结合
select
user_id,
treecount,
(treecount - lead(treecount)over(order by treecount desc))
from
(
select
user_id,
floor((sum_carbon-t2.low_carbon)/(t3.low_carbon)) treecount
from
(
select
user_id,
sum(low_carbon) sum_carbon
from user_low_carbon
where datediff(regexp_replace(data_dt,'/','-'),"2017-10-1")<0
group by user_id
)t1,
(
select
low_carbon
from plant_carbon
where plant_id = 'p004'
)t2,
(
select
low_carbon
from plant_carbon
where plant_id = 'p002'
)t3
order by treecount desc
limit 11
)t4
limit 10

第二题

1、首先查出每个用户满足减少碳排放大于 100g的 user_id data_dt

2、找出满足连续三天的 user_id 和 data_dt（难点）

3、有了满足条件的user_id 和data_dt就能够查到流水了

第一步

select 
user_id,
data_dt,
sum(low_carbon) day_sum_carbon
from
user_low_carbon
group by user_id,data_dt
having day_sum_carbon >=100
order by  user_id ,data_dt  ;t1

第二步

方法一，

思路：每个用户，每条记录，和先2条的差要是 2，先一条的是1，后一条的是-1，后两条的是-2

这个方法比较麻烦

select
user_id,
data_dt,
day_sum_carbon,
datediff(regexp_replace(data_dt,'/','-'),regexp_replace(lag(data_dt,2,"1970-1-1")over(partition by user_id order by data_dt),'/','-')) lag2,
datediff(regexp_replace(data_dt,'/','-'),regexp_replace(lag(data_dt,1,"1970-1-1")over(partition by user_id order by data_dt),'/','-')) lag1,
datediff(regexp_replace(data_dt,'/','-'),regexp_replace(lead(data_dt,1,"1970-1-1")over(partition by user_id order by data_dt),'/','-')) lead1,
datediff(regexp_replace(data_dt,'/','-'),regexp_replace(lead(data_dt,2,"1970-1-1")over(partition by user_id order by data_dt),'/','-')) lead2
from
t1   ;t2

select
user_id,
data_dt,
day_sum_carbon
from
t2
where 
lag2 =2 and lag1= 1
or 
lag1 =1 and lead1 = -1
or
lead1=-1 and lead2=-2   ;t3

结果

user_id	data_dt	day_sum_carbon
u_002	2017/1/2	220
u_002	2017/1/3	110
u_002	2017/1/4	150
u_002	2017/1/5	101
u_005	2017/1/2	130
u_005	2017/1/3	180
u_005	2017/1/4	190
u_008	2017/1/4	260
u_008	2017/1/5	360
u_008	2017/1/6	160
u_008	2017/1/7	120
u_009	2017/1/2	140
u_009	2017/1/3	170
u_009	2017/1/4	270
u_010	2017/1/4	170
u_010	2017/1/5	180
u_010	2017/1/6	190
u_010	2017/1/7	180
u_011	2017/1/1	110
u_011	2017/1/2	200
u_011	2017/1/3	120
u_011	2017/1/4	100
u_011	2017/1/5	100
u_011	2017/1/6	100
u_011	2017/1/7	230
u_013	2017/1/2	200
u_013	2017/1/3	150
u_013	2017/1/4	550
u_013	2017/1/5	350
u_014	2017/1/5	250
u_014	2017/1/6	120
u_014	2017/1/7	290

方法二：

上一种的方法，如果要求连续10天的，那判断要10个，越多越麻烦

举例

1    1    0
3    2    1
4    3    1
5    4    1
8    5    3
9    6    3
10    7    3

假如第一列是日期，第二列是一个递增的数列，后面是差

那么差相等的证明日期是连续的

我们只需要在每一条记录开窗一样的值有几个判断大于等于3 就好了


select 
user_id,
data_dt,
row_number()over(partition by user_id order by data_dt)  no,
sum(low_carbon) day_sum_carbon
from
user_low_carbon
group by user_id,data_dt
having day_sum_carbon >=100  ;t1



select
user_id,
data_dt,
day_sum_carbon,
date_sub(regexp_replace(data_dt,'/','-'),no)  sub
from
t1    ;t2


select
user_id,
data_dt,
day_sum_carbon,
count(*) over(partition by user_id ,sub)  continue_Days
from t2 ;t3



select
user_id,
data_dt,
day_sum_carbon
from t3
where continue_Days >=3
order by user_id,data_dt

这种如果要求连续的日期多，并不会造成判断条件多，简洁点，但是执行的效率并没有前一个高，他3个job，而前面一个2个job

方法三：

写mr来实现,只写第二步，怎么从已经满足100克以上提取出连续3天的userId和data_dt

public class MayiInfo implements WritableComparable<MayiInfo> {
    String id;
    String date;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getDate() {
        return date;
    }

    public void setDate(String date) {
        this.date = date;
    }

    @Override
    public void write(DataOutput out) throws IOException {
     out.writeUTF(id);
     out.writeUTF(date);

    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readUTF();
        date = in.readUTF();
    }

    @Override
    public String toString() {
        return id + "\t" + date;
    }

    //map  排序
    @Override
    public int compareTo(MayiInfo mayiInfo) {
        int result = 0;
        result = this.id.compareTo(mayiInfo.id);
        if(result == 0) {
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd");
            try {
                Date d1 = sdf.parse(this.date);
                Date d2 = sdf.parse(mayiInfo.date);
                result = d1.compareTo(d2);
            } catch (ParseException e) {
                e.printStackTrace();
            }

        }
        return result;
    }
}

public class MyGroup extends WritableComparator {

    public MyGroup() {
        super(MayiInfo.class,true);
    }
//自定义分组，按user_id
    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        MayiInfo m1 = (MayiInfo) a;
        MayiInfo m2 = (MayiInfo) b;

        int  result = m1.getId().compareTo(m2.getId());
        if(result > 0){
            result = 1;
        }else if(result<0) {
            result = -1;
        }else {
            result = 0 ;
        }

        return result;
    }
}

public class Stage1Reducer extends Reducer<MayiInfo, MayiInfo, MayiInfo, NullWritable> {
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd");

    @Override
    protected void reduce(MayiInfo key, Iterable<MayiInfo> values, Context context) throws IOException, InterruptedException {

        ArrayList<MayiInfo> list = new ArrayList<MayiInfo>();

        Date date = null;


      try {
          for (MayiInfo value : values) {
              MayiInfo mayiInfo = new MayiInfo();

            BeanUtils.copyProperties(mayiInfo,value);

              Date d1 = sdf.parse(mayiInfo.getDate());

              if (date == null) {
                  date = d1;
                  list.add(mayiInfo);
                  continue;
              }

              if(myDateDiff(d1,date) == 1){
                  date = d1;
                  list.add(mayiInfo);
              }else{
                  date = d1;
                  if(list.size() >=3){
                      for (MayiInfo m : list) {
                          context.write(m,NullWritable.get());
                      }
                  }
                  list.clear();
              }



          }
          if(list.size() >=3){
              for (MayiInfo m1 : list) {
                  context.write(m1,NullWritable.get());
              }
          }
          list.clear();


      }catch (Exception e){
          e.printStackTrace();
      }


    }

    public int myDateDiff(Date d1,Date d2){
        int result = 0 ;

        result = d1.getYear() - d2.getYear();

       result =  result == 0 ? d1.getMonth() - d2.getMonth() : result;

       result = result ==0?d1.getDate() -d2.getDate() : result;

       return result;



    }
}