疫情数据分析

疫情数据分析

1.前期数据准备

创建hive表,并导入数据:

create table covid_19_data(
    dt String comment "日期",
    confirmed_case Int comment "确诊病例",
    new_confirmed_case Int comment "新增病例",
    cure_case Int comment "累计治愈",
    new_cure_case Int comment "新增治愈",
    death_case Int comment "死亡病例",
    new_death_case Int comment "新增死亡",
    state String comment "洲",
    country String comment "国家"
)
row format delimited
fields terminated by ",";

load data inpath "/covid_19.csv" into table covid_19_data;

2.数据分析

2.1 死亡人数最多的国家以及其所在洲
select state,
       country,
       sum(new_death_case) sum_death_case
from covid_19_data
group by state, country
order by sum_death_case desc
limit 1;

在这里插入图片描述

2.2 连续三日新增病例在10000人以上的国家及其所连续天数
select country,
       count(1) as days,
       concat_ws("--",cast(min(dt) as string),cast(max(dt) as string)) as duration_dt
from (select country,
             new_confirmed_case,
             dt,
             row_number() over (partition by country order by dt) rank
      from (select country,
                   to_date(from_unixtime(UNIX_TIMESTAMP(dt, "yyyyMMdd"))) dt,
                   new_confirmed_case
            from covid_19_data
            where new_confirmed_case >= 10000) t1) t2
group by country, date_sub(dt, rank)
having days >= 20;

在这里插入图片描述

2.3 找出新增确诊病例连续十天未减少的国家及其时间,一个国家出现多次,保留距离现在时间最近的一次,并按持续天数降序排序.
select country,
       days,
       duration_dt
from (select country,
             days,
             duration_dt,
             row_number() over (partition by country order by min_dt desc) rank
      from (select country,
                   count(1)                                                          as days,
                   concat_ws("--", cast(min(dt) as string), cast(max(dt) as string)) as duration_dt,
                   min(dt)                                                           as min_dt
            from (select country,
                         to_date(from_unixtime(UNIX_TIMESTAMP(dt, "yyyyMMdd"))) as            dt,
                         new_confirmed_case,
                         row_number() over (partition by country order by new_confirmed_case) rank
                  from covid_19_data) t1
            group by country, date_sub(dt, rank)
            having days >= 20) t2) t3
where rank = 1
order by days desc;

在这里插入图片描述

  • 2
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 2
    评论
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值