Didi Data Analyst Intern Written Test, 2021-09-09

Article link: https://blog.csdn.net/qq_43656500/article/details/120241152

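I. Test format

Probably because the role was an internship, the written test was fairly informal. An HR intern added me on WeChat and sent me the questions directly as a PDF, asking me to write my answers in a Word document and send it back within 1 to 1.5 hours (no Excel file was provided). The PDF also did not explain the English abbreviations used in the images that came with the questions, so I was confused throughout.

A note: these are my own answers and may not be correct. I did verify in MySQL that they run. Discussion is welcome.

II. Test content

Question 1

I created the corresponding table myself (to verify that the code runs)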

create table u_log ( 
	day date not null, 
    login_timestamp timestamp not null, 
	uid int(2) not null, 
	gender char(1) not null,
	age integer not null
);

Insert data

insert into u_log values ('2020/8/1', '2020/8/1 11:00',1,'F',31),
    ('2020/8/1', '2020/8/1 14:00',1,'F',31),
    ('2020/8/1', '2020/8/1 19:00',3,'M',23),
    ('2020/8/1', '2020/8/1 14:00',1,'F',31),
    ('2020/8/1', '2020/8/1 20:00',2,'M',19),
    ('2020/8/1', '2020/8/1 21:00',4,'M',54),
    ('2020/8/1', '2020/8/1 21:05',5,'M',55);

The resulting table in the database

1. List the uid of female users, showing only the first 200

select distinct uid from u_log where gender='F' order by uid limit 200 ;

2. Count users by age group

(1) Split into <20, 20-40, 41-60, >60

First add a column holding the age group


select *,
	(case 
		when 0<=age and age < 20 then '<20'
		when 20<=age and age <= 40 then '20-40'
		when 40<age and age <= 60 then '41-60'
		else  '>60'
		end) as age_level
from u_log ;

The query result is

Then group and count to get the result

-- count(distinct uid) counts each user once, even with several login rows
select age_level,count(distinct uid) as user_cnt
from (
	   select *,
			(case 
				when 0<=age and age < 20 then '<20'
				when 20<=age and age <= 40 then '20-40'
				when 40<age and age <= 60 then '41-60'
				else  '>60'
				end) as age_level
		from u_log ) a
group by age_level;

Query result
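Two things are easy to get wrong in CASE-based bucketing: an age of exactly 40 has to land in 20-40, and a user with several login rows should be counted once. The sketch below checks both using Python's built-in sqlite3 as a stand-in for MySQL. The row for uid 6 is a hypothetical addition, not part of the test data, and only the columns needed for the check are created.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.execute("create table u_log (uid integer, gender text, age integer)")
rows = [
    (1, "F", 31), (1, "F", 31), (3, "M", 23), (1, "F", 31),
    (2, "M", 19), (4, "M", 54), (5, "M", 55),
    (6, "F", 40),  # hypothetical extra row to probe the 40 boundary
]
conn.executemany("insert into u_log values (?, ?, ?)", rows)

query = """
select case
         when age < 20 then '<20'
         when age <= 40 then '20-40'
         when age <= 60 then '41-60'
         else '>60'
       end as age_level,
       count(distinct uid) as user_cnt
from u_log
group by age_level
"""
age_counts = dict(conn.execute(query).fetchall())
print(age_counts)
```

Because the CASE branches are evaluated in order, each branch only needs its upper bound, which leaves no gap between adjacent buckets.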

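Question 2

I believe the bikes and users tables are not needed for this question, so I only created three tables: trips, regions, and promotions

Table creation statements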

create table trips ( 
	id int(11) not null primary key default 0, 
    user_id int(11) not null default 0,
    bike_id int(11) not null default 0,
    status varchar(191) not null default '0',
	started_at datetime default '0000-00-00', 
	completed_at datetime default '0000-00-00',
	region_id int(11)  default 0
);

create table regions (
    id int(11) not null primary key default 0,
    name varchar(191) 
);

create table promotions (
    id int(11) not null primary key default 0,
    p_name varchar(191), 
    start_at datetime,
    end_at datetime);

Insert data

insert into trips values(1,001,545,'completed','2020-08-01 09:00','2020-08-01 09:05',2),(2,010,589,'completed','2020-08-01 10:00','2020-08-01 09:02',1),
(3,024,245,'failed','2020-08-02 08:00','2020-08-02 08:05',1),(4,001,556,'completed','2020-08-02 09:00','2020-08-01 09:05',2),
(5,054,123,'completed','2020-08-03 20:00','2020-08-03 21:05',1),(6,078,545,'completed','2020-08-10 19:00','2020-08-10 19:05',1),
(7,011,111,'completed','2020-08-11 02:00','2020-08-11 02:05',2),(8,001,545,'started','2020-09-11 09:00','2020-09-11 09:05',2);

insert into regions values(1,'浦东新区'),(2,'城北高新区');

insert into promotions values(1,'8月大促','2020-08-01','2020-08-31');

1. How many users and orders were there during the promotion? (When inserting the data I named the promotion '8月大促'.)

select r.name,count(distinct user_id),count(r.id)
from trips t 
join regions r 
on t.region_id = r.id
where started_at between 
    (select start_at from promotions where p_name = '8月大促')
    and 
    (select end_at from promotions where p_name = '8月大促')
group by r.name;
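One detail to watch with BETWEEN here: end_at was inserted as '2020-08-31', which a datetime column stores as midnight at the start of that day, so trips started later on August 31 fall outside the range. The sketch below illustrates this with Python's sqlite3 as a stand-in for MySQL. It assumes the promotion is meant to include its whole last day, and the August 31 trip is a hypothetical row that is not in the test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table trips (id integer, user_id integer, started_at text)")
conn.execute("create table promotions (p_name text, start_at text, end_at text)")
conn.executemany("insert into trips values (?, ?, ?)", [
    (1, 1, "2020-08-01 09:00"),
    (2, 10, "2020-08-10 19:00"),
    (3, 1, "2020-08-31 18:00"),  # hypothetical trip on the last promotion day
    (4, 1, "2020-09-11 09:00"),
])
conn.execute("insert into promotions values ('8月大促', '2020-08-01', '2020-08-31')")

# Closed range: the upper bound is midnight at the start of 2020-08-31.
between_sql = """
select count(distinct t.user_id), count(*)
from trips t, promotions p
where p.p_name = '8月大促'
  and t.started_at between p.start_at and p.end_at
"""
# Half-open range: everything before the day after end_at.
half_open_sql = """
select count(distinct t.user_id), count(*)
from trips t, promotions p
where p.p_name = '8月大促'
  and t.started_at >= p.start_at
  and t.started_at < date(p.end_at, '+1 day')
"""
between_counts = conn.execute(between_sql).fetchone()
half_open_counts = conn.execute(half_open_sql).fetchone()
print(between_counts, half_open_counts)
```

In MySQL the same half-open upper bound can be written as started_at < end_at + interval 1 day.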

2. Share of usage that happened on the first day

select concat(
    round(
        -- trips started on the promotion's first calendar day
        sum(case when date(started_at) =
                (select date(start_at) from promotions where p_name = '8月大促')
            then 1 else 0 end)
        /
        -- trips started at any time during the promotion
        sum(case when started_at between
                (select start_at from promotions where p_name = '8月大促')
                and (select end_at from promotions where p_name = '8月大促')
            then 1 else 0 end) * 100
    , 2)
, '%') as a
from trips;
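The first-day numerator should match on the full calendar date. Matching on the day of the month alone would also count trips on the 1st of any other month. The sketch below shows the difference with Python's sqlite3 as a stand-in for MySQL, where strftime('%d', ...) plays the role of MySQL's day(). The September 1 trip is a hypothetical row added to expose the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table trips (id integer, started_at text)")
conn.executemany("insert into trips values (?, ?)", [
    (1, "2020-08-01 09:00"),
    (2, "2020-08-01 10:00"),
    (3, "2020-08-02 08:00"),
    (4, "2020-08-10 19:00"),
    (5, "2020-09-01 09:00"),  # hypothetical: day-of-month 1, but after the promotion
])
start_at, end_at = "2020-08-01", "2020-08-31"

def count_trips(where_sql, params):
    """Count trips matching a WHERE clause (helper for this sketch only)."""
    sql = "select count(*) from trips where " + where_sql
    return conn.execute(sql, params).fetchone()[0]

# Day-of-month comparison also matches 2020-09-01.
by_day_of_month = count_trips("strftime('%d', started_at) = strftime('%d', ?)", (start_at,))
# Full-date comparison matches only 2020-08-01.
by_calendar_date = count_trips("date(started_at) = date(?)", (start_at,))
in_promotion = count_trips("date(started_at) between ? and ?", (start_at, end_at))
first_day_pct = round(100 * by_calendar_date / in_promotion, 2)
print(by_day_of_month, by_calendar_date, in_promotion, first_day_pct)
```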

Question 3

Table creation statements

create table completes_order_info (
    city_name varchar(191),
    completed_order_nums int(3),
    date date
);

Insert data

insert into completes_order_info values ('杭州',207,'2020-7-1'),('成都',178,'2020-7-1'),
    ('天津',57,'2020-7-01'),('重庆',214,'2020-7-01'),
    ('杭州',62,'2020-7-02'),('成都',111,'2020-7-02'),
    ('天津',73,'2020-7-02'),('重庆',60,'2020-7-02'),
    ('重庆',103,'2020-7-03'),('杭州',63,'2020-7-03');

1. For each city, the highest completed-order count and the date it occurred

select a.city_name,a.completed_order_nums,a.date 
from(
		select *, row_number()over(partition by city_name order by completed_order_nums desc) as rn
		from completes_order_info) a
where rn = 1;
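row_number() returns exactly one row per city, so if two dates tie for the highest count, one of them is dropped arbitrarily. If all tied dates should be reported, rank() is a possible alternative. The sketch below compares the two with Python's sqlite3 (window functions need SQLite 3.25 or later). The tied row for 2020-07-03 is a hypothetical addition, not part of the test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table completes_order_info "
    "(city_name text, completed_order_nums integer, date text)"
)
conn.executemany("insert into completes_order_info values (?, ?, ?)", [
    ("杭州", 207, "2020-07-01"),
    ("杭州", 62, "2020-07-02"),
    ("杭州", 207, "2020-07-03"),  # hypothetical tie with 2020-07-01
    ("成都", 178, "2020-07-01"),
    ("成都", 111, "2020-07-02"),
])

def top_days(window_fn):
    """Return the top-ranked rows per city using the given window function."""
    sql = f"""
    select city_name, completed_order_nums, date
    from (select *, {window_fn}() over (partition by city_name
                                        order by completed_order_nums desc) as rn
          from completes_order_info) a
    where rn = 1
    order by city_name, date
    """
    return conn.execute(sql).fetchall()

with_row_number = top_days("row_number")  # one row per city
with_rank = top_days("rank")              # keeps every tied row
print(len(with_row_number), len(with_rank))
```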

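Question 4

For 2021-01-04 through 2021-01-10, find the details of every order whose completion status is 1, together with the completion times of that user's previous and next orders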
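One possible approach is to use the window functions lag() and lead(), both available in MySQL 8.0. The sketch below runs the idea in Python's sqlite3. The table name orders and its columns (order_id, user_id, status, finish_time) are hypothetical, since the original table definition is not given here, and the sketch assumes "previous" and "next" refer to the user's other completed orders. The date filter is applied outside the subquery so that neighbouring orders outside the week remain visible to lag() and lead().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the real table definition is not available.
conn.execute(
    "create table orders "
    "(order_id integer, user_id integer, status integer, finish_time text)"
)
conn.executemany("insert into orders values (?, ?, ?, ?)", [
    (1, 100, 1, "2021-01-02 08:00:00"),
    (2, 100, 1, "2021-01-05 09:00:00"),
    (3, 100, 0, "2021-01-06 10:00:00"),  # not completed, so it is ignored
    (4, 100, 1, "2021-01-12 11:00:00"),
    (5, 200, 1, "2021-01-07 12:00:00"),
])

sql = """
select order_id, user_id, finish_time, prev_finish_time, next_finish_time
from (select *,
             lag(finish_time)  over (partition by user_id order by finish_time)
                 as prev_finish_time,
             lead(finish_time) over (partition by user_id order by finish_time)
                 as next_finish_time
      from orders
      where status = 1) a
where finish_time >= '2021-01-04' and finish_time < '2021-01-11'
order by order_id
"""
result = conn.execute(sql).fetchall()
for row in result:
    print(row)
```

A user with no earlier or later completed order gets NULL (None in Python) in the corresponding column.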

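Question 5

I built a table myself and saved it as python考察.csv

1. Add a column (Total) that sums the Attack, Defense, Sp.Atk, Sp.Def, Speed, and Generation metrics, then output the ten rows with the highest Total among those whose Type1 is Charmeleon.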

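import pandas as pd
# Read the data (column names are assumed to match the question statement)
data = pd.read_csv('C:/Users/dell/Desktop/文件夹/滴滴笔试9.9/python考察.csv', encoding='UTF-8')
# Add the Total column as the row-wise sum of the six metrics
cols = ['Attack', 'Defense', 'Sp.Atk', 'Sp.Def', 'Speed', 'Generation']
data['Total'] = data[cols].sum(axis=1)
# Keep rows whose Type1 is Charmeleon and take the 10 with the largest Total
data[data['Type1'] == 'Charmeleon'].nlargest(10, 'Total')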

data is

The result is