Hive Experiment

4.11.12

Category: Big Data

Article link: https://blog.csdn.net/weixin_43854358/article/details/84977010

Use Hive to analyze the user data of a website.

1. Create the dblab database

Command: create database dblab;

2. Create the bigdata_user table in the dblab database, with the following fields:

Field name	Type
id	int
uid	string
item_id	string
behavior_type	int
item_category	string
visit_date	date
province	string
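
The field list above can be turned into a table definition along the following lines. This is a minimal sketch, not the author's original statement: it assumes user_table.txt is tab-delimited plain text (the same delimiter the scan table uses later), so adjust the delimiter if the file differs.

```sql
-- Sketch only. Assumption: the source file is tab-delimited text.
CREATE DATABASE IF NOT EXISTS dblab;
USE dblab;

CREATE TABLE IF NOT EXISTS bigdata_user (
  id            INT,     -- record number, unique
  uid           STRING,  -- user id
  item_id       STRING,  -- item id
  behavior_type INT,     -- 1 = view, 2 = favorite, 3 = add to cart, 4 = purchase
  item_category STRING,  -- item category
  visit_date    DATE,    -- date the record was generated
  province      STRING   -- user's province
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```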

3. Import the user data into the bigdata_user table (the data is stored locally at /home/hadoop/user_table.txt)

Command: [screenshot of the import command]
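
A local file is usually loaded into a Hive table with LOAD DATA LOCAL INPATH. The statement below is a sketch of that common approach, not a transcription of the author's screenshot; the OVERWRITE keyword is an assumption that replaces any rows already in the table.

```sql
-- Sketch only: load the local file into the table.
-- OVERWRITE (an assumption) replaces existing rows; drop it to append instead.
LOAD DATA LOCAL INPATH '/home/hadoop/user_table.txt'
OVERWRITE INTO TABLE bigdata_user;
```

Hive does not validate the file against the schema at load time, so fields that fail to parse show up as NULL when queried.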

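Note: the user data has 7 columns, with the following meanings:

id: record number, unique

uid: user id

item_id: item id

behavior_type: view, favorite, add to cart, or purchase, encoded as 1, 2, 3, and 4 respectively

item_category: item category

visit_date: the time the record was generated

province: the user's province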

4. View the behavior of the first 10 users toward items

Command: select behavior_type from bigdata_user limit 10;

Result: [screenshot of the query output]

5. Query the time and item category for the first 20 users' purchases

Command: select visit_date,item_category from bigdata_user limit 20;

Result: [screenshot of the query output]

6. Use the aggregate function count() to find how many rows the table contains

Command: select count(*) from bigdata_user;

Result: [output not included]

7. Find how many distinct uid values there are

Command: select count(distinct uid) from bigdata_user;

Result: [output not included]

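8. Query how many people viewed items between 2014-12-10 and 2014-12-13

Command: select count(*) from bigdata_user where behavior_type='1' and visit_date<'2014-12-13' and visit_date>'2014-12-10';

As written, the strict comparisons exclude both boundary dates, and count(*) counts view records rather than distinct users.

Result: [output not included]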
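
If the goal is the number of distinct people, with both boundary dates included, one possible variant is sketched below. It is an alternative reading of the task, not the author's query; the alias viewers is a hypothetical name.

```sql
-- Sketch only: distinct users who viewed items, boundary dates included.
SELECT COUNT(DISTINCT uid) AS viewers
FROM bigdata_user
WHERE behavior_type = 1
  AND visit_date BETWEEN DATE '2014-12-10' AND DATE '2014-12-13';
```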

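9. Query the purchase ratio or view ratio of an item on a given day

Answer: count the interactions for the item, where a view is behavior type 1 and a purchase is behavior type 4.

View count (as written, this counts every record for that day, whatever the behavior type): select count(*) from bigdata_user where visit_date = '2014-12-10';

Purchase count: select count(*) from bigdata_user where visit_date = '2014-12-10' and behavior_type = '4';

Purchase ratio: purchase count / view count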
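
The two counts and their ratio can also be computed in a single pass over the table. The sketch below follows the definitions above (the denominator is every record for the day); the column aliases are hypothetical, and the commented-out item_id filter is a placeholder for restricting the result to one item.

```sql
-- Sketch only: purchase ratio for one day in a single query.
SELECT
  SUM(CASE WHEN behavior_type = 4 THEN 1 ELSE 0 END)            AS purchases,
  COUNT(*)                                                      AS total_records,
  SUM(CASE WHEN behavior_type = 4 THEN 1 ELSE 0 END) / COUNT(*) AS purchase_ratio
FROM bigdata_user
WHERE visit_date = DATE '2014-12-10'
  -- AND item_id = '<item id>'   -- hypothetical filter for a single item
;
```

To use views only as the denominator, replace COUNT(*) with SUM(CASE WHEN behavior_type = 1 THEN 1 ELSE 0 END).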

10. Number of times users in a given region viewed the site on a given day (the result must be queryable in real time).

Command: hive> create table scan(province string,scan int)

> row format delimited

> fields terminated by '\t'

> stored as textfile;

Command: insert overwrite table scan select province,count(behavior_type) from bigdata_user where behavior_type = '1' group by province;

hive> select * from scan;
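
The insert above aggregates views over all dates, while the task asks for a single day. A sketch of a per-day refresh followed by a lookup for one region is given below; the date is an example value and the province string is a placeholder, neither taken from the original post.

```sql
-- Sketch only: rebuild the summary for a single day (example date).
INSERT OVERWRITE TABLE scan
SELECT province, CAST(COUNT(*) AS INT)
FROM bigdata_user
WHERE behavior_type = 1
  AND visit_date = DATE '2014-12-10'
GROUP BY province;

-- Look up one region; '<province name>' is a placeholder value.
SELECT province, scan
FROM scan
WHERE province = '<province name>';
```

Because scan holds one row per province, the lookup reads a small precomputed table instead of rescanning bigdata_user.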
