一、数据分析_第一次练习
【第一题】
1、创建db名字叫 flight 创建新表名字on_time_performance
2、航班数据检查csv ,导入到sql数据库中
3、执行count语句查看总行数。
【第一题】问题解答
1、创建db名字叫 flight 创建新表名字叫on_time_performance;
create database flight_0224; # 创建db
show DATABASES # 验证数据库是否创建成功
CREATE TABLE on_time_performance(`id` int); # 创建新表
show TABLES; # 验证表是否创建成功
2、航班数据检查csv ,导入到sql数据库中
navicat 导入向导
3、执行count语句查看总行数。
SELECT count(*) from sample_On_Time_On_Time_Performance_2015
【第二题】
1、在航班及时率表中有如下delay延迟字段,请找出导致延迟次数最多的是哪一个原因 .
并给出各个原因累计的延迟时间WeatherDelay CarrierDelay NASDelay SecurityDelay LateAircraftDelay
2、延迟次数倒数第二的是哪个原因 ,给出累计的延迟时间。
【第二题】问题解答
1、在航班及时率表中有如下delay延迟字段,请找出导致延迟次数最多的是哪一个原因 .
并给出各个原因累计的延迟时间WeatherDelay CarrierDelay NASDelay SecurityDelay LateAircraftDelay
1) 延迟次数最多的原因
select * from(
SELECT 'WeatherDelay' as 'type', sum(WeatherDelay) as '时间',count(WeatherDelay) as '次数' from sample_On_Time_On_Time_Performance_2015 where WeatherDelay != '0'
union SELECT 'CarrierDelay' as 'type',sum(CarrierDelay) as '时间',count(CarrierDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where CarrierDelay != '0'
union SELECT 'NASDelay' as 'type',sum(NASDelay) as '时间',count(NASDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where NASDelay != '0'
union SELECT 'SecurityDelay' as 'type',sum(SecurityDelay) as '时间',count(SecurityDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where SecurityDelay != '0'
union SELECT 'LateAircraftDelay' as 'type',sum(LateAircraftDelay) as '时间',count(LateAircraftDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where LateAircraftDelay != '0') t
order by 次数 desc
limit 1
结果:
2) 累计延迟时间:WeatherDelay CarrierDelay NASDelay SecurityDelay LateAircraftDelay
select * from(
SELECT 'WeatherDelay' as 'type', sum(WeatherDelay) as '时间',count(WeatherDelay) as '次数' from sample_On_Time_On_Time_Performance_2015 where WeatherDelay != '0'
union SELECT 'CarrierDelay' as 'type',sum(CarrierDelay) as '时间',count(CarrierDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where CarrierDelay != '0'
union SELECT 'NASDelay' as 'type',sum(NASDelay) as '时间',count(NASDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where NASDelay != '0'
union SELECT 'SecurityDelay' as 'type',sum(SecurityDelay) as '时间',count(SecurityDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where SecurityDelay != '0'
union SELECT 'LateAircraftDelay' as 'type',sum(LateAircraftDelay) as '时间',count(LateAircraftDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where LateAircraftDelay != '0') t
order by 次数 desc;
结果:
2、延迟次数倒数第二的是哪个原因 ,给出累计的延迟时间。
select * from(
SELECT 'WeatherDelay' as 'type', sum(WeatherDelay) as '时间',count(WeatherDelay) as '次数' from sample_On_Time_On_Time_Performance_2015 where WeatherDelay != '0'
union SELECT 'CarrierDelay' as 'type',sum(CarrierDelay) as '时间',count(CarrierDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where CarrierDelay != '0'
union SELECT 'NASDelay' as 'type',sum(NASDelay) as '时间',count(NASDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where NASDelay != '0'
union SELECT 'SecurityDelay' as 'type',sum(SecurityDelay) as '时间',count(SecurityDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where SecurityDelay != '0'
union SELECT 'LateAircraftDelay' as 'type',sum(LateAircraftDelay) as '时间',count(LateAircraftDelay) as '次数'from sample_On_Time_On_Time_Performance_2015 where LateAircraftDelay != '0') t
order by 次数
limit 1,1
结果:
【第三题】
1、请显示2015-01-03日的航班数据,要求显示AirlineID,FlightNum,Origin,Dest,DepTime,TailNum,AirTime,Distance如上字段
2、由于时间部分查询非常多,请在时间字段上增加索引,加快查询速度,此处请截取两张图片,比对执行速度的区别
3、由于数据很多,所以请给出第50-55行数据的AirlineID,FlightNum,Origin,Dest,DepTime,TailNum,AirTime,Distance 截图;同时写出快速获取到第50-55行所用的sql语句。
【第三题】问题解答
1、请显示2015-01-03日的航班数据,要求显示AirlineID,FlightNum,Origin,Dest,DepTime,TailNum,AirTime,Distance如上字段
SELECT t.AirlineID,t.FlightNum,t.Origin,t.Dest,t.DepTime,t.TailNum,t.AirTime,t.Distance
from sample_On_Time_On_Time_Performance_2015 t
where DATE(t.FlightDate) = '2015-01-03';
结果:
2、由于时间部分查询非常多,请在时间字段上增加索引,加快查询速度,此处请截取两张图片,比对执行速度的区别。
创建索引:
CREATE INDEX time1 on sample_On_Time_On_Time_Performance_2015(FlightDate);
查看索引是否创建成功:
show index from sample_On_Time_On_Time_Performance_2015;
EXPLAIN
SELECT t.FlightDate,t.AirlineID,t.FlightNum,t.Origin,t.Dest,t.DepTime,t.TailNum,t.AirTime,t.Distance
from sample_On_Time_On_Time_Performance_2015 t
where t.FlightDate = '2015/1/3';
SELECT t.FlightDate,t.AirlineID,t.FlightNum,t.Origin,t.Dest,t.DepTime,t.TailNum,t.AirTime,t.Distance
from sample_On_Time_On_Time_Performance_2015 t
where t.FlightDate = '2015/1/3';
设置索引前查询:
设置索引后查询:
3、由于数据很多,所以请给出第50-55行数据的AirlineID,FlightNum,Origin,Dest,DepTime,TailNum,AirTime,Distance 截图;
同时写出快速获取到第50-55行所用的sql语句。
SELECT t.AirlineID,t.FlightNum,t.Origin,t.Dest,t.DepTime,t.TailNum,t.AirTime,t.Distance
from sample_On_Time_On_Time_Performance_2015 t
where DATE(t.FlightDate) = '2015-01-03'
limit 49,6
结果:
【第四题】
1、探究一年内飞行航班的季节性,并统计出12个月的航班总数
2、找出航班最多的一个月是几月份。
【第四题】问题解答。?WITH rollup换成月做分组就可以了
1、探究一年内飞行航班的季节性,并统计出12个月的航班总数;
select IFNULL(a.季度,'总数(12个月') '季度',a. 航班数 from
(select
QUARTER(flightdate) as'季度',count(flightdate) '航班数'
FROM sample_On_Time_On_Time_Performance_2015
GROUP BY 季度
WITH ROLLUP
ORDER BY 季度) a ORDER BY 季度
结果:
2、找出航班最多的一个月是几月份。
select
DATE_FORMAT(flightdate,'%y_%m') '月份',count(flightdate) '计数'
FROM sample_On_Time_On_Time_Performance_2015
GROUP BY 月份
ORDER BY 计数 desc limit 1
SELECT a.* FROM
(select DATE_FORMAT(flightdate,'%y_%m') '月份',count(flightdate) '航班数'
FROM sample_On_Time_On_Time_Performance_2015
GROUP BY 月份
ORDER BY 航班数 DESC) a
where 航班数 = (SELECT max(a.航班数)from
(select DATE_FORMAT(flightdate,'%y_%m') '月份',count(flightdate) '航班数'
FROM sample_On_Time_On_Time_Performance_2015
GROUP BY 月份
ORDER BY 航班数 DESC) a)
【第五题】
现有一个json文件的airplane数据,写一段python程序把数据转换成csv,导入数据库中。
然后统计飞机最多的两个生产商练习所需资料包