Big Data Interview Questions at Internet Companies: Meituan, Kuaishou, and Hetao Coding (核桃编程)


weixin_45321296


Category: Internet interviews. Tags: hadoop, big data

Article link: https://blog.csdn.net/weixin_45321296/article/details/114834762


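This article covers the topics that came up in big data interviews at Meituan, Kuaishou, and Hetao Coding, including quick sort, Hive SQL optimization, stream processing frameworks, Hadoop MR, UDF and UDAF, and unchaining zipper tables (拉链表), along with solutions to practical problems and analysis of business scenarios.

(Summary generated automatically by CSDN.)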

I. Hetao Coding (核桃编程)
1. Quick sort algorithm: https://baike.baidu.com/item/%E5%BF%AB%E9%80%9F%E6%8E%92%E5%BA%8F%E7%AE%97%E6%B3%95/369842?fr=aladdin
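
Since the question only links to a reference, here is a minimal sketch of quick sort in Python. It is the simple "partition into three lists" variant rather than the in-place version interviewers sometimes ask for; the function name `quick_sort` is chosen for illustration.

```python
from typing import List


def quick_sort(items: List[int]) -> List[int]:
    """Return a new sorted list using quick sort (not in-place)."""
    if len(items) <= 1:
        return list(items)
    # Use the middle element as the pivot to avoid the worst case on sorted input.
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Recurse on the two sides; elements equal to the pivot are already placed.
    return quick_sort(smaller) + equal + quick_sort(larger)


print(quick_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

The average running time is O(n log n); this variant trades extra memory for readability.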
2. Clause order as written: select, from, where, group by, having, order by. Logical execution order: from, where, group by, having, select, order by.
https://blog.csdn.net/eadoucn/article/details/48787009
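3. Given the rows 1-A-start1-end1, 1-A-start2-end2, 1-B-start1-end1, 1-A-start3-end3,
how do you obtain the result 1-A-start1-end2, 1-B-start1-end1, 1-A-start3-end3? (That is, merge each run of consecutive rows that share the same id and type, keeping the first start and the last end.)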
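
The merging rule can be sketched in Python as follows. This assumes the rows are already in the intended order and that each row is a tuple `(id, type, start, end)`; the name `merge_consecutive` is hypothetical. In SQL the same idea is usually expressed with the "gaps and islands" technique (the difference of two `row_number()` values identifies each run).

```python
from itertools import groupby
from typing import List, Tuple

Row = Tuple[int, str, str, str]  # (id, type, start, end)


def merge_consecutive(rows: List[Row]) -> List[Row]:
    """Merge each run of adjacent rows that share the same (id, type)."""
    merged = []
    # groupby only groups *adjacent* equal keys, which is exactly the rule needed:
    # the later 1-A row is not merged into the earlier 1-A run.
    for (row_id, row_type), run in groupby(rows, key=lambda r: (r[0], r[1])):
        run_rows = list(run)
        merged.append((row_id, row_type, run_rows[0][2], run_rows[-1][3]))
    return merged


rows = [
    (1, "A", "start1", "end1"),
    (1, "A", "start2", "end2"),
    (1, "B", "start1", "end1"),
    (1, "A", "start3", "end3"),
]
for row in merge_consecutive(rows):
    print("-".join(str(v) for v in row))
```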
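4. Static vs. dynamic partitions: differences and use cases
Static partition: the partition value is specified explicitly in the statement; supports both load and insert. Suited to cases where the partition is known in advance, there are few partitions, and data is loaded incrementally.
Dynamic partition: the partition value comes from a column, and rows are routed to partitions by that column's value; only insert is supported. Suited to cases where partitions are not known in advance, there are many partitions, and data is loaded in full.
https://blog.csdn.net/weixin_43992185/article/details/107210836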

5. Stream processing pipelines: Kettle - Kafka - Spark Streaming - Hive / HBase / MySQL
Flume - Kafka - Spark Streaming - Hive / HBase / MySQL
ELK
https://www.cnblogs.com/zackstang/p/11522194.html
Logstash (collection) - Elasticsearch (analysis and processing) - Kibana (visualization)
https://www.cnblogs.com/aresxin/p/8035137.html

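6. Hive SQL optimization steps
a. MR parameter tuning: local vs. cluster mode; non-strict mode; size the job from the table size, batch size, and partition count, and set the number of map and reduce tasks accordingly;
choose the file storage format by hot vs. cold data (ORC, SequenceFile, TextFile) and set the input and output file formats, using TextFile for files transferred to external systems across databases; handle data skew (map-side aggregation); configure memory settings to avoid out-of-memory errors.
b. Business logic tuning: run independent jobs in parallel; sort out the business logic and join conditions so the logic stays clear and simple; table design (incremental, full, zipper, detail, and dimension tables); encapsulate reusable logic in functions.
c. SQL tuning: large main table joined to a small secondary table; turn nested subqueries into temporary tables; minimize data (join only after filtering down to the target rows and columns); add indexes; watch for data skew caused by nulls, hot values, or data sharding; watch for Cartesian products caused by duplicate data; function performance (distinct is generally slower than group by).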
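
To illustrate the null-key skew mentioned in (c), the following sketch simulates how join keys are assigned to reducers. It is a toy model, not Hive itself: `reducer_for` is a stand-in hash partitioner, and `salt_nulls` is a hypothetical helper that replaces each null key with a random key so that null rows no longer pile up on a single reducer. This works for joins because a null key never matches anything on the other side anyway.

```python
import random
import zlib
from collections import Counter
from typing import List, Optional


def reducer_for(key: str, num_reducers: int) -> int:
    """Deterministic stand-in for a hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def max_load(keys: List[str], num_reducers: int) -> int:
    """Number of rows on the busiest reducer."""
    loads = Counter(reducer_for(k, num_reducers) for k in keys)
    return max(loads.values())


def salt_nulls(keys: List[Optional[str]], rng: random.Random) -> List[str]:
    """Replace each null key with a random key so nulls spread across reducers."""
    return [k if k is not None else f"null_{rng.randrange(10**9)}" for k in keys]


rng = random.Random(0)  # fixed seed so the run is repeatable
raw_keys: List[Optional[str]] = [None] * 1000 + [f"user_{i}" for i in range(100)]

unsalted = ["" if k is None else k for k in raw_keys]  # every null hashes alike
salted = salt_nulls(raw_keys, rng)

skewed_max = max_load(unsalted, 4)
salted_max = max_load(salted, 4)
print(skewed_max, salted_max)  # the salted maximum is far smaller
```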

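7. Row/column conversion

Collapse multiple rows into one delimited value (列转行)
concat_ws(',',collect_list(order_id)) as order_value

Expand a delimited value into multiple rows (行转列)
select order_value,order_id
from a
 lateral view  explode(split(order_value,',')) num  as  order_id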
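
As a rough Python analogue of the two Hive expressions above, the sketch below collapses order rows into one comma-separated value per group and then expands them again. The grouping column `user_id` and both function names are assumptions for illustration; the original snippets do not say what the rows are grouped by.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def collect_orders(rows: List[Tuple[str, str]]) -> Dict[str, str]:
    """Like concat_ws(',', collect_list(order_id)) with group by user_id."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for user_id, order_id in rows:
        grouped[user_id].append(order_id)
    return {user: ",".join(orders) for user, orders in grouped.items()}


def explode_orders(collapsed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Like lateral view explode(split(order_value, ','))."""
    return [
        (user, order_id)
        for user, order_value in collapsed.items()
        for order_id in order_value.split(",")
    ]


order_rows = [("u1", "o1"), ("u1", "o2"), ("u2", "o3")]
collapsed = collect_orders(order_rows)
print(collapsed)                  # {'u1': 'o1,o2', 'u2': 'o3'}
print(explode_orders(collapsed))  # [('u1', 'o1'), ('u1', 'o2'), ('u2', 'o3')]
```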
 
 
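Ranking window functions, when the second and third rows tie:
row_number(): 1, 2, 3, 4 (always consecutive, ties broken arbitrarily)
rank(): 1, 2, 2, 4 (ties share a rank, followed by a gap)
dense_rank(): 1, 2, 2, 3 (ties share a rank, no gap)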
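
The three sequences can be reproduced with a small runnable sketch. It uses Python's built-in `sqlite3` module instead of Hive (window functions require SQLite 3.25 or later, which is an assumption about the local build); the `scores` table and its values are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table scores (name text, score integer)")
conn.executemany(
    "insert into scores values (?, ?)",
    [("a", 100), ("b", 90), ("c", 90), ("d", 80)],  # b and c tie
)
ranking_rows = conn.execute(
    """
    select name,
           row_number() over (order by score desc, name) as rn,
           rank()       over (order by score desc)       as rk,
           dense_rank() over (order by score desc)       as drk
    from scores
    order by rn
    """
).fetchall()
for row in ranking_rows:
    print(row)
# ('a', 1, 1, 1)
# ('b', 2, 2, 2)
# ('c', 3, 2, 2)
# ('d', 4, 4, 3)
```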

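II. Meituan
Transaction table: trans(id, date_str, pro_id, amt, user_id)
1. Build a summary table (date, total amount for the day, total amount for the previous day) and initialize all of its contents.
2. For each day and each product, find the top 3 users by cumulative transaction amount as of that day.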

20200101 100
20200102 100
20200103 100

20200101 400 200
20200102 100 400
20200103 300 100

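a.
-- create table ... as fails if the table already exists, so drop it instead of truncating
drop table if exists a;
create table a as
with temp  as (
select date_str,sum(amt) num1 
           from  trans group by  date_str
)
select 
cur.date_str,
cur.num1 as today_amt,
prev.num1 as yesterday_amt
from   temp cur
 left join temp prev
-- assuming date_str is a yyyyMMdd string: step back one calendar day instead of subtracting 1 from the string
on prev.date_str=date_format(date_sub(from_unixtime(unix_timestamp(cur.date_str,'yyyyMMdd')),1),'yyyyMMdd');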

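b.
drop table if exists a;
create table a as
with temp  as (
select date_str,sum(amt) num1 
           from  trans group by  date_str
)
select 
date_str,
num1 as today_amt,
-- no partition by: the window must span all dates so that "1 preceding" is the previous date's row
-- (this equals yesterday only when every calendar day has at least one transaction)
sum(num1) over(order by date_str rows between 1 preceding and 1 preceding) as yesterday_amt
from   temp ;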
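
To make the intended result concrete, here is a small Python sketch of the same summary: total per day, plus the total of the previous calendar day if one exists. It assumes `date_str` is a `yyyyMMdd` string and uses made-up transactions; `daily_summary` is a hypothetical name. Unlike the window-function version, it looks up the previous calendar day explicitly, so a missing day yields `None` rather than the value of an older date.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def daily_summary(trans: List[Tuple[str, int]]) -> List[Tuple[str, int, Optional[int]]]:
    """Return (date_str, today's total, yesterday's total or None) per day."""
    totals: Dict[str, int] = defaultdict(int)
    for date_str, amt in trans:
        totals[date_str] += amt
    summary = []
    for date_str in sorted(totals):
        day = datetime.strptime(date_str, "%Y%m%d")
        yesterday = (day - timedelta(days=1)).strftime("%Y%m%d")
        # .get does not create missing keys, so absent days stay absent.
        summary.append((date_str, totals[date_str], totals.get(yesterday)))
    return summary


sample_trans = [
    ("20200101", 150), ("20200101", 250),
    ("20200102", 100),
    ("20200103", 300),
    ("20200131", 70),   # gap before this day, and a month boundary after it
    ("20200201", 50),
]
for line in daily_summary(sample_trans):
    print(line)
# ('20200101', 400, None)
# ('20200102', 100, 400)
# ('20200103', 300, 100)
# ('20200131', 70, None)
# ('20200201', 50, 70)
```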

2. For each day, find the top 3 products by cumulative transaction amount up to that day
20200101 a 400 400
20200102 a 100 500
20200103 a 300 800
20200101 b 500 500
20200102 b 200 700

date_str pro_id num1 num2 rn
20200101 b 500 500 1
20200101 a 400 400 2
20200102 b 200 700 1
20200102 a 100 500 2

a.
with temp as(
select 
 date_str,
 pro_id,
sum(amt) num1
from  trans group by  date_str,pro_id)

select 
 date_str,
 pro_id,
 row_number() over(partition by date_str  order by num2 desc ) rn
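
The query above stops before `num2` is defined and before the ranking is filtered. As a hedged sketch of one way the full computation could look (not necessarily the author's), the following runs an equivalent query in Python's built-in `sqlite3` module, which needs SQLite 3.25 or later for window functions. The CTE names `daily`, `running`, and `ranked` and the `user_id` values are illustrative; the amounts come from the sample rows in this question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table trans (id integer, date_str text, pro_id text, amt integer, user_id text)"
)
conn.executemany(
    "insert into trans values (?, ?, ?, ?, ?)",
    [
        (1, "20200101", "a", 400, "u1"),
        (2, "20200102", "a", 100, "u1"),
        (3, "20200103", "a", 300, "u2"),
        (4, "20200101", "b", 500, "u3"),
        (5, "20200102", "b", 200, "u3"),
    ],
)
top_rows = conn.execute(
    """
    with daily as (          -- num1: amount per product per day
        select date_str, pro_id, sum(amt) as num1
        from trans
        group by date_str, pro_id
    ),
    running as (             -- num2: running total per product up to each day
        select date_str, pro_id, num1,
               sum(num1) over (partition by pro_id order by date_str) as num2
        from daily
    ),
    ranked as (              -- rn: rank of each product within the day by num2
        select date_str, pro_id, num1, num2,
               row_number() over (partition by date_str order by num2 desc) as rn
        from running
    )
    select date_str, pro_id, num1, num2, rn
    from ranked
    where rn <= 3
    order by date_str, rn
    """
).fetchall()
for row in top_rows:
    print(row)
```

Note that a product only appears on days when it has at least one transaction; carrying a running total forward onto days without transactions would require joining against a calendar of dates first.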
