HiveSQL统计分析

131 篇文章 16 订阅
39 篇文章 3 订阅

  问题导读:

1、怎样使用SQL统计出每个用户的累积访问次数?
2、怎样使用SQL查询和排序亿级记录?
3、怎样使用SQL查询并排序分组取出前10?

第一题
1、需求
我们有如下的用户访问数据
 

userIdvisitDatevisitCount
u012021/1/215

u02
2021/1/236

u03
2021/1/228

u04
2021/1/203

u01
2021/1/236

u01
2021/2/218

u02
2021/1/236
u012021/2/224


要求使用SQL统计出每个用户的累积访问次数,如下表所示:
 

用户id月份小计累计
u012021-011111
u012021-021223
u022021-011212
u032021-0188
u042021-0133


2、数据准备
 

CREATE TABLE test_sql.test1 ( userId string, visitDate string, visitCount INT ) ROW format delimited FIELDS TERMINATED BY "\t";
INSERT INTO TABLE test_sql.test1
VALUES
        ( 'u01', '2021/1/21', 5 ),
        ( 'u02', '2021/1/23', 6 ),
        ( 'u03', '2021/1/22', 8 ),
        ( 'u04', '2021/1/20', 3 ),
        ( 'u01', '2021/1/23', 6 ),
        ( 'u01', '2021/2/21', 8 ),
        ( 'u02', '2021/1/23', 6 ),
        ( 'u01', '2021/2/22', 4 );
3、查询SQL
SELECT
        t2.userid,
        t2.visitmonth,
        subtotal_visit_cnt,
        sum( subtotal_visit_cnt ) over ( PARTITION BY userid ORDER BY visitmonth ) AS total_visit_cnt 
FROM
        (
        SELECT
                userid,
                visitmonth,
                sum( visitcount ) AS subtotal_visit_cnt 
        FROM
                ( SELECT userid, date_format( regexp_replace ( visitdate, '/', '-' ), 'yyyy-MM' ) AS visitmonth, visitcount FROM test_sql.test1 ) t1 
        GROUP BY
                userid,
                visitmonth 
        ) t2 
ORDER BY
        t2.userid,
        t2.visitmonth;
4、执行结果

第二题

1、需求

有50W个京东店铺,每个顾客访客访问任何一个店铺的任何一个商品时都会产生一条访问日志,访问日志存储的表名为Visit,访客的用户id为user_id,被访问的店铺名称为shop,数据如下:
user_idshop
u1a
u2b
u1b
u1a
u3c
u4b
u1a
u2c
u5b
u4b
u6c
u2c
u1b
u2a
u2a
u3a
u5a
u5a
u5a


请统计:

(1)每个店铺的UV(访客数)
(2)每个店铺访问次数top3的访客信息。输出店铺名称、访客id、访问次数

2、数据准备

CREATE TABLE test_sql.test2 ( user_id string, shop string ) ROW format delimited FIELDS TERMINATED BY '\t';
INSERT INTO TABLE test_sql.test2
VALUES
        ( 'u1', 'a' ),
        ( 'u2', 'b' ),
        ( 'u1', 'b' ),
        ( 'u1', 'a' ),
        ( 'u3', 'c' ),
        ( 'u4', 'b' ),
        ( 'u1', 'a' ),
        ( 'u2', 'c' ),
        ( 'u5', 'b' ),
        ( 'u4', 'b' ),
        ( 'u6', 'c' ),
        ( 'u2', 'c' ),
        ( 'u1', 'b' ),
        ( 'u2', 'a' ),
        ( 'u2', 'a' ),
        ( 'u3', 'a' ),
        ( 'u5', 'a' ),
        ( 'u5', 'a' ),
        ( 'u5', 'a' );


3、查询SQL实现

(1)

方式1:

# 每个店铺的UV(访客数)
SELECT shop,count(DISTINCT user_id) FROM test_sql.test2 GROUP BY shop


方式2:

#每个店铺的UV(访客数)
SELECT
        t.shop,
        count(*) 
FROM
        ( SELECT user_id, shop FROM test_sql.test2 GROUP BY user_id, shop ) t 
GROUP BY
        t.shop;


(2)
  

#每个店铺访问次数top3的访客信息。输出店铺名称、访客id、访问次数 
SELECT
        t2.shop,
        t2.user_id,
        t2.cnt 
FROM
        (
        SELECT
                t1.*,
                row_number() over ( PARTITION BY t1.shop ORDER BY t1.cnt DESC ) rank 
        FROM
                ( SELECT user_id, shop, count(*) AS cnt FROM test_sql.test2 GROUP BY user_id, shop ) t1 
        ) t2 
WHERE
        rank <= 3;


4、执行结果

(1)



(2)



第三题

1、需求

已知一个表STG.ORDER,有如下字段:Date,Order_id,User_id,amount。

数据样例:2021-01-01,10029028,1000003251,33.57。

请给出sql进行统计:

(1)给出 2021年每个月的订单数、用户数、总成交金额。
(2)给出2021年11月的新客数(指在11月才有第一笔订单)

2、数据准备

CREATE TABLE test_sql.test3 ( dt string, order_id string, user_id string, amount DECIMAL ( 10, 2 ) ) ROW format delimited FIELDS TERMINATED BY '\t';
INSERT INTO TABLE test_sql.test3
VALUES
        ( '2021-01-01', '10029028', '1000003251', 33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
        ( '2021-01-01', '10029029', '1000003251', 33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
        ( '2021-01-01', '100290288', '1000003252', 33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
        ( '2021-02-02', '10029088', '1000003251', 33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
        ( '2021-02-02', '100290281', '1000003251', 33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
        ( '2021-02-02', '100290282', '1000003253', 33.57 );
INSERT INTO TABLE test_sql.test3
VALUES
        ( '2021-11-02', '10290282', '100003253', 234 );
INSERT INTO TABLE test_sql.test3
VALUES
        ( '2018-11-02', '10290284', '100003243', 234 );
3、查询SQL

(1)给出 2021年每个月的订单数、用户数、总成交金额。
SELECT
        t1.mon,
        count( t1.order_id ) AS order_cnt,
        count( DISTINCT t1.user_id ) AS user_cnt,
        sum( amount ) AS total_amount 
FROM
        (
        SELECT
                order_id,
                user_id,
                amount,
                date_format( dt, 'yyyy-MM' ) mon 
        FROM
                test_sql.test3 
        WHERE
                date_format( dt, 'yyyy' ) = '2021' 
        ) t1 
GROUP BY
        t1.mon;
(2)给出2021年11月的新客数(指在11月才有第一笔订单)
SELECT
        count( user_id ) 
FROM
        test_sql.test3 
GROUP BY
        user_id 
HAVING
        date_format( min( dt ), 'yyyy-MM' )= '2021-11';
4、 执行结果

(1)

(2)

第四题

1、需求

有一个5000万的用户文件(user_id,name,age),一个2亿记录的用户看电影的记录文件(user_id, url),根据年龄段观看电影的次数进行排序?

2、数据准备
CREATE TABLE test_sql.test4user ( user_id string, NAME string, age INT );
CREATE TABLE test_sql.test4log ( user_id string, url string );
INSERT INTO TABLE test_sql.test4user
VALUES
        ( '001', 'u1', 10 );
INSERT INTO TABLE test_sql.test4user
VALUES
        ( '002', 'u2', 15 );
INSERT INTO TABLE test_sql.test4user
VALUES
        ( '003', 'u3', 15 );
INSERT INTO TABLE test_sql.test4user
VALUES
        ( '004', 'u4', 20 );
INSERT INTO TABLE test_sql.test4user
VALUES
        ( '005', 'u5', 25 );
INSERT INTO TABLE test_sql.test4user
VALUES
        ( '006', 'u6', 35 );
INSERT INTO TABLE test_sql.test4user
VALUES
        ( '007', 'u7', 40 );
INSERT INTO TABLE test_sql.test4user
VALUES
        ( '008', 'u8', 45 );
INSERT INTO TABLE test_sql.test4user
VALUES
        ( '009', 'u9', 50 );
INSERT INTO TABLE test_sql.test4user
VALUES
        ( '0010', 'u10', 65 );
INSERT INTO TABLE test_sql.test4log
VALUES
        ( '001', 'url1' );
INSERT INTO TABLE test_sql.test4log
VALUES
        ( '002', 'url1' );
INSERT INTO TABLE test_sql.test4log
VALUES
        ( '003', 'url2' );
INSERT INTO TABLE test_sql.test4log
VALUES
        ( '004', 'url3' );
INSERT INTO TABLE test_sql.test4log
VALUES
        ( '005', 'url3' );
INSERT INTO TABLE test_sql.test4log
VALUES
        ( '006', 'url1' );
INSERT INTO TABLE test_sql.test4log
VALUES
        ( '007', 'url5' );
INSERT INTO TABLE test_sql.test4log
VALUES
        ( '008', 'url7' );
INSERT INTO TABLE test_sql.test4log
VALUES
        ( '009', 'url5' );
INSERT INTO TABLE test_sql.test4log
VALUES
        ( '0010', 'url1' );
3、查询SQL
  
方式1
SELECT
        t2.age_phase,
        sum( t1.cnt ) AS view_cnt 
FROM
        ( SELECT user_id, count(*) cnt FROM test_sql.test4log GROUP BY user_id ) t1
        JOIN (
        SELECT
                user_id,
        CASE
                        
                        WHEN age <= 10 AND age > 0 THEN
                        '0-10' 
                        WHEN age <= 20 AND age > 10 THEN
                        '10-20' 
                        WHEN age > 20 
                        AND age <= 30 THEN '20-30' WHEN age > 30 
                                AND age <= 40 THEN '30-40' WHEN age > 40 
                                        AND age <= 50 THEN '40-50' WHEN age > 50 
                                                AND age <= 60 THEN '50-60' WHEN age > 60 
                                                        AND age <= 70 THEN
                                                                '60-70' ELSE '70以上' 
                                                                END AS age_phase 
                                                FROM
                                                        test_sql.test4user 
                                                ) t2 ON t1.user_id = t2.user_id 
                                        GROUP BY
                                        t2.age_phase;
方式2
SELECT
        concat( phase - 10, '-', phase ),
        sum( cnt ) sum_movies 
FROM
        (
        SELECT
                *,
                ceil( age / 10 ) * 10 phase 
        FROM
                test4user a
                JOIN ( SELECT user_id, count( url ) cnt FROM test4log GROUP BY user_id ) b ON a.user_id = b.user_id 
        ) c 
GROUP BY
        c.phase;
4、执行结果

方式1

方式2

第五题

1、需求

有日志如下,请写出代码求得所有用户和活跃用户的总数及平均年龄。(活跃用户指连续两天都有 访问记录的用户)
日期用户年龄
2019-02-11test_123
2019-02-11test_219
2019-02-11test_339
2019-02-11test_123
2019-02-11test_339
2019-02-11test_123
2019-02-12test_219
2019-02-13test_123
2019-02-15test_219
2019-02-16test_219


2、数据准备

CREATE TABLE test5 ( dt string, user_id string, age INT ) ROW format delimited FIELDS TERMINATED BY ',';
INSERT INTO TABLE test_sql.test5
VALUES
        ( '2019-02-11', 'test_1', 23 );
INSERT INTO TABLE test_sql.test5
VALUES
        ( '2019-02-11', 'test_2', 19 );
INSERT INTO TABLE test_sql.test5
VALUES
        ( '2019-02-11', 'test_3', 39 );
INSERT INTO TABLE test_sql.test5
VALUES
        ( '2019-02-11', 'test_1', 23 );
INSERT INTO TABLE test_sql.test5
VALUES
        ( '2019-02-11', 'test_3', 39 );
INSERT INTO TABLE test_sql.test5
VALUES
        ( '2019-02-11', 'test_1', 23 );
INSERT INTO TABLE test_sql.test5
VALUES
        ( '2019-02-12', 'test_2', 19 );
INSERT INTO TABLE test_sql.test5
VALUES
        ( '2019-02-13', 'test_1', 23 );
INSERT INTO TABLE test_sql.test5
VALUES
        ( '2019-02-15', 'test_2', 19 );
INSERT INTO TABLE test_sql.test5
VALUES
        ( '2019-02-16', 'test_2', 19 );


3、查询SQL

方式1

SELECT
        sum( total_user_cnt ) total_user_cnt,
        sum( total_user_avg_age ) total_user_avg_age,
        sum( two_days_cnt ) two_days_cnt,
        sum( avg_age ) avg_age 
FROM
        (
        SELECT
                0 total_user_cnt,
                0 total_user_avg_age,
                count(*) AS two_days_cnt,
                cast(
                sum( age ) / count(*) AS DECIMAL ( 5, 2 )) AS avg_age 
        FROM
                (
                SELECT
                        user_id,
                        max( age ) age 
                FROM
                        (
                        SELECT
                                user_id,
                                max( age ) age 
                        FROM
                                (
                                SELECT
                                        user_id,
                                        age,
                                        date_sub( dt, rank ) flag 
                                FROM
                                        (
                                        SELECT
                                                dt,
                                                user_id,
                                                max( age ) age,
                                                row_number() over ( PARTITION BY user_id ORDER BY dt ) rank 
                                        FROM
                                                test_sql.test5 
                                        GROUP BY
                                                dt,
                                                user_id 
                                        ) t1 
                                ) t2 
                        GROUP BY
                                user_id,
                                flag 
                        HAVING
                                count(*) >= 2 
                        ) t3 
                GROUP BY
                        user_id 
                ) t4 UNION ALL
        SELECT
                count(*) total_user_cnt,
                cast(
                sum( age ) / count(*) AS DECIMAL ( 5, 2 )) total_user_avg_age,
                0 two_days_cnt,
                0 avg_age 
        FROM
                ( SELECT user_id, max( age ) age FROM test_sql.test5 GROUP BY user_id ) t5 
        ) t6;


方式2

SELECT
        * 
FROM
        (
        SELECT
                count( user_id ) total_cnt_users,
                avg( age ) total_avg_age 
        FROM
                ( SELECT user_id, max( age ) age FROM test5 GROUP BY user_id ) g 
        ) h
        CROSS JOIN (
        SELECT
                count( user_id ) hot_users_count,
                avg( age ) hot_age_avg 
        FROM
                (
                SELECT
                        d.user_id,
                        max( d.age ) age 
                FROM
                        (
                        SELECT
                                user_id,
                                max( age ) age,
                                count( 1 ) cnt 
                        FROM
                                (
                                SELECT
                                        *,
                                        date_sub( dt, rank ) dt2 
                                FROM
                                        (
                                        SELECT
                                                user_id,
                                                dt,
                                                max( age ) age,
                                                ROW_NUMBER() over ( PARTITION BY a.user_id ORDER BY a.dt ) rank 
                                        FROM
                                                ( SELECT DISTINCT dt, age, user_id FROM test5 ) a 
                                        GROUP BY
                                                a.user_id,
                                                a.dt 
                                        ) b 
                                ) c 
                        GROUP BY
                                c.user_id,
                                c.dt2 
                        HAVING
                                cnt > 1 
                        ) d 
                GROUP BY
                        d.user_id 
                ) e 
        ) f ON 1 = 1;


4、执行结果

方式1



方式2



第六题

1、需求

请用sql写出所有用户中在今年10月份第一次购买商品的金额, 表ordertable字段:(购买用户:userid,金额:money,购买时间:paymenttime(格式:2021-10-01), 订单id:orderid

2、数据准备

CREATE TABLE test_sql.test6 ( userid string, money DECIMAL ( 10, 2 ), paymenttime string, orderid string );
INSERT INTO TABLE test_sql.test6
VALUES
        ( '001', 100, '2021-10-01', '123' );
INSERT INTO TABLE test_sql.test6
VALUES
        ( '001', 200, '2021-10-02', '124' );
INSERT INTO TABLE test_sql.test6
VALUES
        ( '002', 500, '2021-10-01', '125' );
INSERT INTO TABLE test_sql.test6
VALUES
        ( '001', 100, '2021-11-01', '126' );
3、查询SQL
SELECT
4、执行结果

第七题

1、需求

现有图书管理数据库的三个数据模型如下:

图书(数据表名:BOOK)
序号字段名称字段描述字段类型
1BOOK_ID总编号文本
2SORT分类号文本
3BOOK_NAME书名文本
4WRITER作者文本
5OUTPUT出版单位文本
6PRICE单位数值(保留小数点后2位)


读者(数据表名:READER)
 

序号字段名称字段描述字段类型
1READER_ID借书证号文本
2COMPANY单位文本
3NAME姓名文本
4SEX性别文本
5GRADE职称文本
6ADDR地址文本


借阅记录(数据表名:BORROW LOG)
 

序号字段名称字段描述字段类型
1READER_ID借书证号文本
2BOOK_ID总编号文本
3BORROW_DATE借书日期日期


(1)创建图书管理库的图书、读者和借阅三个基本表的表结构。请写出建表语句。
(2)找出姓李的读者姓名(NAME)和所在单位(COMPANY)。
(3)查找“高等教育出版社”的所有图书名称(BOOK_NAME)及单价(PRICE),结果按单价降序排序。
(4)查找价格介于10元和20元之间的图书种类(SORT)出版单位(OUTPUT)和单价(PRICE), 结果按出版单位(OUTPUT)和单价(PRICE)升序排序。
(5)查找所有借了书的读者的姓名(NAME)及所在单位(COMPANY)。
(6)求”科学出版社”图书的最高单价、最低单价、平均单价。
(7)找出当前至少借阅了2本图书(大于等于2本)的读者姓名及其所在单位。
(8)考虑到数据安全的需要,需定时将“借阅记录”中数据进行备份,请使用一条SQL语句,在备份用户bak下创建与“借阅记录”表结构完全一致的数据表BORROW_LOG_BAK.井且将“借阅记录”中现 有数据全部复制到BORROW_L0G_ BAK中。
(9)现在需要将原Oracle数据库中数据迁移至Hive仓库,请写出“图书”在Hive中的建表语句(Hive实现,提示:列分隔符|;数据表数据需要外部导入:分区分别以month_part、day_part 命名)
(10)Hive中有表A,现在需要将表A的月分区 202106中 user_id为20000的user_dinner字段更新为bonc8920,其他用户user_dinner字段数据不变,请列出更新的方法步骤。(Hive实现, 提示:Hlive中无update语法,请通过其他办法进行数据更新)


2、数据准备

(1) 创建图书表book

CREATE TABLE test_sql.book (
        book_id string,
        `SORT` string,
        book_name string,
        writer string,
        OUTPUT string,
price DECIMAL ( 10, 2 ));
INSERT INTO TABLE test_sql.book
VALUES
        ( '001', 'TP391', '信息处理', 'author1', '机械工业出版社', '20' );
INSERT INTO TABLE test_sql.book
VALUES
        ( '002', 'TP392', '数据库', 'author12', '科学出版社', '15' );
INSERT INTO TABLE test_sql.book
VALUES
        ( '003', 'TP393', '计算机网络', 'author3', '机械工业出版社', '29' );
INSERT INTO TABLE test_sql.book
VALUES
        ( '004', 'TP399', '微机原理', 'author4', '科学出版社', '39' );
INSERT INTO TABLE test_sql.book
VALUES
        ( '005', 'C931', '管理信息系统', 'author5', '机械工业出版社', '40' );
INSERT INTO TABLE test_sql.book
VALUES
        ( '006', 'C932', '运筹学', 'author6', '科学出版社', '55' );
INSERT INTO TABLE test_sql.book
VALUES
        ( '007', 'C939', '大数据平台架构与原型实现', 'author7', '高等教育出版社', '66' );


(2)创建读者表reader

CREATE TABLE test_sql.reader ( reader_id string, company string, NAME string, sex string, grade string, addr string );
INSERT INTO TABLE test_sql.reader
VALUES
        ( '0001', '阿里巴巴', 'jack', '男', 'vp', 'addr1' );
INSERT INTO TABLE test_sql.reader
VALUES
        ( '0002', '百度', 'robin', '男', 'vp', 'addr2' );
INSERT INTO TABLE test_sql.reader
VALUES
        ( '0003', '腾讯', 'tony', '男', 'vp', 'addr3' );
INSERT INTO TABLE test_sql.reader
VALUES
        ( '0004', '京东', 'jasper', '男', 'cfo', 'addr4' );
INSERT INTO TABLE test_sql.reader
VALUES
        ( '0005', '网易', 'zhangsan', '女', 'ceo', 'addr5' );
INSERT INTO TABLE test_sql.reader
VALUES
        ( '0006', '搜狐', 'lisi', '女', 'ceo', 'addr6' );
INSERT INTO TABLE test_sql.reader
VALUES
        ( '0007', '美团', '李哥', '男', '大数据开发', 'addr7' );


(3)创建借阅记录表borrow_log

CREATE TABLE test_sql.borrow_log ( reader_id string, book_id string, borrow_date string );
INSERT INTO TABLE test_sql.borrow_log
VALUES
        ( '0001', '002', '2021-10-14' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
        ( '0002', '001', '2021-10-13' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
        ( '0003', '005', '2021-09-14' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
        ( '0004', '006', '2021-08-15' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
        ( '0005', '003', '2021-10-10' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
        ( '0006', '004', '2021-12-13' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
        ( '0007', '003', '2021-10-16' );
INSERT INTO TABLE test_sql.borrow_log
VALUES
        ( '0007', '008', '2021-10-16' );


3、查询SQL

(1)创建图书管理库的图书、读者和借阅三个基本表的表结构。请写出建表语句。

CREATE TABLE test_sql.book (
        book_id string,
        `SORT` string,
        book_name string,
        writer string,
        OUTPUT string,
price DECIMAL ( 10, 2 ));
CREATE TABLE test_sql.reader ( reader_id string, company string, NAME string, sex string, grade string, addr string );
CREATE TABLE test_sql.borrow_log ( reader_id string, book_id string, borrow_date string );


(2) 找出姓李的读者姓名(NAME)和所在单位(COMPANY)。

SELECT name, company FROM test_sql.reader WHERE name LIKE '李%';


(3)查找“高等教育出版社”的所有图书名称(BOOK_NAME)及单价(PRICE),结果按单价降序排序。

SELECT
        book_name,
        price 
FROM
        test_sql.book 
WHERE
        OUTPUT = "高等教育出版社" 
ORDER BY
        price DESC;


(4) 查找价格介于10元和20元之间的图书种类(SORT)出版单位(OUTPUT)和单价(PRICE),结 果按出版单位(OUTPUT)和单价(PRICE)升序排序。

方式1

SELECT
        sort,
        output,
        price 
FROM
        test_sql.book 
WHERE
        price ]= 10 
        AND price [= 20 
ORDER BY
        output,
        price;


方式2

SELECT
        sort,
        output,
        price 
FROM
        book 
WHERE
        price BETWEEN 10 
        AND 20 
ORDER BY
        output ASC,
        price ASC;


(5)查找所有借了书的读者的姓名(NAME)及所在单位(COMPANY)。

SELECT
        b.NAME,
        b.company 
FROM
        test_sql.borrow_log a
        JOIN test_sql.reader b ON a.reader_id = b.reader_id;


(6)求”科学出版社”图书的最高单价、最低单价、平均单价。

SELECT
        max( price ),
        min( price ),
        avg( price ) 
FROM
        test_sql.book 
WHERE
        OUTPUT = '科学出版社';


(7)找出当前至少借阅了2本图书(大于等于2本)的读者姓名及其所在单位。

SELECT
        b.NAME,
        b.company 
FROM
        ( SELECT reader_id FROM test_sql.borrow_log GROUP BY reader_id HAVING count(*) ]= 2 ) a
        JOIN test_sql.reader b ON a.reader_id = b.reader_id;


(8)考虑到数据安全的需要,需定时将“借阅记录”中数据进行备份,请使用一条SQL语句,在备份用 户bak下创建与“借阅记录”表结构完全一致的数据表BORROW_LOG_BAK.井且将“借阅记录”中现有 数据全部复制到BORROW_L0G_ BAK中。

CREATE TABLE test_sql.borrow_log_bak AS SELECT
* 
FROM
        test_sql.borrow_log;


(9)现在需要将原Oracle数据库中数据迁移至Hive仓库,请写出“图书”在Hive中的建表语句(Hive实现,提示:列分隔符|;数据表数据需要外部导入:分区分别以month_part、day_part 命名)

CREATE TABLE book_hive 
        ( book_id string, SORT string, book_name string, writer string, OUTPUT string, price DECIMAL ( 10, 2 ) ) 
        partitioned BY ( month_part string, day_part string ) 
        ROW format delimited FIELDS TERMINATED BY '\|' stored AS textfile;


(10)Hive中有表A,现在需要将表A的月分区 202106 中 user_id为20000的user_dinner字段更新为bonc8920,其他用户user_dinner字段数据不变,请列出更新的方法步骤。(Hive实现,提示:Hlive中无update语法,请通过其他办法进行数据更新)

方式1:配置hive支持事务操作,分桶表,orc存储格式
方式2:第一步找到要更新的数据,将要更改的字段替换为新的值,第二步找到不需要更新的数据,第三步将上两步的数据插入一张新表中。

4、执行结果

(2)



(3)



(4)



(5)



(6)



(7)



(8)



(9)



第八题
  
1、需求

有一个线上服务器访问日志格式如下(用sql答题)



求11月9号下午14点(14-15点),访问/api/user/login接口的top10的ip地址

2、数据准备

CREATE TABLE test_sql.test8 ( `date` string, interface string, ip string );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 11:22:05', '/api/user/login', '110.23.5.23' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 11:23:10', '/api/user/detail', '57.3.2.16' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 23:59:40', '/api/user/login', '200.6.5.166' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 11:14:23', '/api/user/login', '136.79.47.70' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 11:15:23', '/api/user/detail', '94.144.143.141' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 11:16:23', '/api/user/login', '197.161.8.206' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 12:14:23', '/api/user/detail', '240.227.107.145' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 13:14:23', '/api/user/login', '79.130.122.205' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 14:14:23', '/api/user/detail', '65.228.251.189' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 14:15:23', '/api/user/detail', '245.23.122.44' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 14:17:23', '/api/user/detail', '22.74.142.137' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 14:19:23', '/api/user/detail', '54.93.212.87' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 14:20:23', '/api/user/detail', '218.15.167.248' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 14:24:23', '/api/user/detail', '20.117.19.75' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 15:14:23', '/api/user/login', '183.162.66.97' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 16:14:23', '/api/user/login', '108.181.245.147' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 14:17:23', '/api/user/login', '22.74.142.137' );
INSERT INTO TABLE test_sql.test8
VALUES
        ( '2016-11-09 14:19:23', '/api/user/login', '22.74.142.137' );
3、查询SQL
  
SELECT
        ip,
        count(*) AS cnt 
FROM
        test_sql.test8 
WHERE
        date_format( `date`, 'yyyy-MM-dd HH' ) ]= '2016-11-09 14' 
        AND date_format( `date`, 'yyyy-MM-dd HH' ) [ '2016-11-09 15' 
        AND interface = '/api/user/login' 
GROUP BY
        ip 
ORDER BY
        cnt DESC 
        LIMIT 10;
4、执行结果
  
第九题

1、需求

有一个充值日志表credit_log,字段如下:

`dist_id` int   '区组id',
`account` string   '账号',
`money` int   '充值金额',`
create_time` string  '订单时间'
请写出SQL语句,查询充值日志表2021年01月02号每个区组下充值额最大的账号,要求

结果:

区组id,账号,金额,充值时间

2、数据准备
CREATE TABLE test_sql.test9 ( dist_id string COMMENT '区组id', account string COMMENT '账号', `money` DECIMAL ( 10, 2 ) COMMENT '充值金额', create_time string COMMENT '订单时间' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '11', 100006, '2021-01-02 13:00:01' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '22', 110000, '2021-01-02 13:00:02' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '33', 102000, '2021-01-02 13:00:03' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '44', 100300, '2021-01-02 13:00:04' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '55', 100040, '2021-01-02 13:00:05' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '66', 100005, '2021-01-02 13:00:06' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '77', 180000, '2021-01-03 13:00:07' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '88', 106000, '2021-01-02 13:00:08' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '99', 100400, '2021-01-02 13:00:09' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '12', 100030, '2021-01-02 13:00:10' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '13', 100003, '2021-01-02 13:00:20' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '14', 100020, '2021-01-02 13:00:30' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '15', 100500, '2021-01-02 13:00:40' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '16', 106000, '2021-01-02 13:00:50' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '1', '17', 100800, '2021-01-02 13:00:59' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '2', '18', 100800, '2021-01-02 13:00:11' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '2', '19', 100030, '2021-01-02 13:00:12' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '2', '10', 100000, '2021-01-02 13:00:13' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '2', '45', 100010, '2021-01-02 13:00:14' );
INSERT INTO TABLE test_sql.test9
VALUES
        ( '2', '78', 100070, '2021-01-02 13:00:15' );
3、查询SQL

SELECT 
4、执行结果
  
第十题

1、需求

有一个账号表如下,请写出SQL语句,查询各自区组的gold排名前十的账号(分组取前10)

dist_id string   '区组id',
account string   '账号',
gold int   '金币'
2、数据准备
CREATE TABLE test_sql.test10 ( `dist_id` string COMMENT '区组id', `account` string COMMENT '账号', `gold` INT COMMENT '金币' );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '1', '77', 18 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '1', '88', 106 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '1', '99', 10 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '1', '12', 13 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '1', '13', 14 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '1', '14', 25 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '1', '15', 36 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '1', '16', 12 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '1', '17', 158 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '2', '18', 12 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '2', '19', 44 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '2', '10', 66 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '2', '45', 80 );
INSERT INTO TABLE test_sql.test10
VALUES
        ( '2', '78', 98 );
3、 查询SQL
SELECT
        dist_id,
        account,
        gold 
FROM
        ( SELECT dist_id, account, gold, row_number () over ( PARTITION BY dist_id ORDER BY gold DESC ) rank FROM test_sql.test10 ) t 
WHERE
        rank [= 10;
4、执行结果

  • 0
    点赞
  • 12
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值