- Blog posts (10)
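[Original] How SQL ranking functions differ on duplicate values
Finally, compare rank(), dense_rank(), and row_number(). rank(): see Figure 1. dense_rank(): see Figure 2. row_number(): see Figure 3. The three differ only when the ordering contains duplicate values, and there the difference is obvious...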
2019-08-28 21:32:36 990
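Since the figures are not reproduced in this listing, here is a minimal sketch of the same comparison. It runs the three functions through Python's built-in sqlite3 module (assuming SQLite 3.25 or later, which added window functions); the `scores` table and its rows are hypothetical, not taken from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: "b" and "c" tie on score 90.
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 100), ("b", 90), ("c", 90), ("d", 80)],
)

rows = conn.execute(
    """
    SELECT name, score,
           rank()       OVER (ORDER BY score DESC)       AS rnk,
           dense_rank() OVER (ORDER BY score DESC)       AS dense_rnk,
           row_number() OVER (ORDER BY score DESC, name) AS row_num
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

for row in rows:
    print(row)
# ('a', 100, 1, 1, 1)
# ('b', 90, 2, 2, 2)
# ('c', 90, 2, 2, 3)
# ('d', 80, 4, 3, 4)
```

On the tie, rank() repeats 2 and then skips to 4, dense_rank() repeats 2 and continues with 3, and row_number() never repeats a value.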
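[Original] SQL: converting columns to rows
Columns to rows:
SELECT ProgrectName, 'OverseaSupply' AS Supplier, MAX(OverseaSupply) AS 'SupplyNum' -- MAX works here, and so would AVG or MIN, because they all return the same value
FROM ProgrectDetail GROUP BY ProgrectName
UNION ALL
SELECT Progre...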
2019-08-28 17:15:41 295
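The excerpt is cut off after the first branch of the UNION ALL, so the following is only a sketch of the pattern, run through sqlite3. `ProgrectDetail`, `ProgrectName`, and `OverseaSupply` come from the excerpt; the second column `NativeSupply` and all of the data are hypothetical stand-ins for whatever the original post used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NativeSupply is a hypothetical second supply column (the excerpt is truncated).
conn.execute(
    "CREATE TABLE ProgrectDetail "
    "(ProgrectName TEXT, OverseaSupply INTEGER, NativeSupply INTEGER)"
)
conn.executemany(
    "INSERT INTO ProgrectDetail VALUES (?, ?, ?)",
    [("A", 100, 200), ("B", 300, 400)],
)

# One SELECT per source column; each branch labels its rows with the column name.
unpivoted = conn.execute(
    """
    SELECT ProgrectName, 'OverseaSupply' AS Supplier, MAX(OverseaSupply) AS SupplyNum
    FROM ProgrectDetail GROUP BY ProgrectName
    UNION ALL
    SELECT ProgrectName, 'NativeSupply' AS Supplier, MAX(NativeSupply) AS SupplyNum
    FROM ProgrectDetail GROUP BY ProgrectName
    ORDER BY ProgrectName, Supplier
    """
).fetchall()

for row in unpivoted:
    print(row)
```

Each project has a single row here, which is why MAX, MIN, and AVG would all give the same SupplyNum.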
[Original] Interactive pivot tables in Python
!pip install pivottablejs
from pivottablejs import pivot_ui
pivot_ui(df, outfile_path='pivottablejs.html')
2019-08-19 10:36:31 582
[Original] pandas-profiling
import sys
!{sys.executable} -m pip install pandas-profiling
import numpy as np
import pandas as pd
import pandas_profiling as pp
df = pd.DataFrame(np.random.rand(100, 5), columns=['a', 'b', 'c', '...
2019-08-19 10:33:18 862
[Original] groupby + unstack
x = data.groupby(['brand', '月份']).sum()['counts'].to_frame()
x = x[x['counts'] >= 1000]
x.head()
x = x.rename(columns={'counts': 'month'})
tmp = x.unstack()
tmp
fonts = fm.FontProperties(fname='simsu...
2019-08-19 10:32:06 906
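To make the groupby + unstack step concrete, here is a small self-contained sketch. The column names `brand`, `月份` (month), and `counts` follow the excerpt; the data itself is hypothetical, and the rename and plotting steps are left out.

```python
import pandas as pd

# Hypothetical data using the column names from the excerpt.
data = pd.DataFrame({
    "brand": ["x", "x", "y", "y", "y"],
    "月份": [7, 8, 7, 7, 8],  # month
    "counts": [1500, 800, 600, 700, 2000],
})

# Sum counts per (brand, month) and keep the result as a one-column frame.
x = data.groupby(["brand", "月份"]).sum()["counts"].to_frame()

# Drop (brand, month) pairs below the threshold.
x = x[x["counts"] >= 1000]

# Move the inner index level (month) into the columns: one row per brand.
tmp = x.unstack()
print(tmp)
```

Pairs removed by the filter show up as NaN after unstack(), because the brand still has a row but no value for that month.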
[Original] Plotting with matplotlib
import pandas as pd
import matplotlib.pyplot as plt
from pylab import mpl
import matplotlib.font_manager as fm
# Display Chinese text
fonts = fm.FontProperties(fname='simsun.ttc')  # set the font
plt.figure(figsize=(100, 5))
tmp...
2019-08-19 10:28:34 249
[Original] HiveQL: SemanticException Cannot do equality join on different types: string and struct
select t1.date as date, imei
from (
select date, aaid
from tracking.analytics_usagev2 a
lateral view json_tuple(a.value, "B", "H") v1 as B, H
lateral view json_tuple('H', 'aaid') v2 as aaid
where date >=...
2019-08-15 16:26:40 1668 1
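The two chained json_tuple calls in the excerpt extract a nested field in two steps. The sketch below is a rough Python analogy of that pattern, not Hive itself: the `json_tuple` helper and the sample record are hypothetical, and the helper only imitates the behaviour of returning one value per key and nothing when the input is not valid JSON.

```python
import json

# Hypothetical record: "H" holds a nested object that contains "aaid".
value = '{"B": "b1", "H": {"aaid": "abc-123"}}'

def json_tuple(text, *keys):
    """Rough stand-in for Hive's json_tuple: one value per key, None on failure."""
    try:
        obj = json.loads(text)
    except (TypeError, ValueError):
        return tuple(None for _ in keys)
    if not isinstance(obj, dict):
        return tuple(None for _ in keys)
    # Nested objects and arrays are handed back as JSON strings.
    return tuple(
        json.dumps(v) if isinstance(v, (dict, list)) else v
        for v in (obj.get(k) for k in keys)
    )

B, H = json_tuple(value, "B", "H")          # first lateral view
(aaid,) = json_tuple(H, "aaid")             # second call on the extracted column H
(from_literal,) = json_tuple("H", "aaid")   # the string literal 'H' is not JSON

print(aaid, from_literal)
```

In this analogy the second call only finds `aaid` when it receives the extracted column `H`; passing the quoted literal `'H'` yields nothing. Whether that distinction is related to the SemanticException in the post title cannot be determined from the truncated excerpt.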
[Original] The row_number function
select id, brand, firstactivetime,
row_number() over (partition by id order by firstactivetime desc) as rank
from profile.device_state_accumulator_all
where date = 20190731
and activetags & 1 = 1
an...
2019-08-14 16:20:35 447
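A common use of this pattern is to keep only the first row of each partition. The excerpt is truncated before any such filter, so the outer `WHERE rn = 1` below is an assumption about intent, and the table `device_state` with its rows is a hypothetical stand-in for `profile.device_state_accumulator_all`. The sketch runs through sqlite3 (assuming SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE device_state "
    "(id TEXT, brand TEXT, firstactivetime INTEGER, activetags INTEGER)"
)
conn.executemany(
    "INSERT INTO device_state VALUES (?, ?, ?, ?)",
    [
        ("d1", "x", 20190701, 1),
        ("d1", "x", 20190715, 3),  # lowest bit set, most recent row for d1
        ("d2", "y", 20190710, 1),
        ("d2", "y", 20190720, 2),  # lowest bit not set, filtered out
    ],
)

latest = conn.execute(
    """
    SELECT id, brand, firstactivetime
    FROM (
        SELECT id, brand, firstactivetime,
               row_number() OVER (
                   PARTITION BY id ORDER BY firstactivetime DESC
               ) AS rn
        FROM device_state
        WHERE (activetags & 1) = 1   -- keep rows whose lowest tag bit is set
    )
    WHERE rn = 1                     -- assumed: keep the newest row per id
    ORDER BY id
    """
).fetchall()

print(latest)
# [('d1', 'x', 20190715), ('d2', 'y', 20190710)]
```

The alias is `rn` rather than `rank` here only to avoid confusion with the rank() function.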
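[Original] Escaping regex metacharacters in HiveQL
|
http://dp.pt.xiaomi.com/task/21676133
http://dp.pt.xiaomi.com/task/21676292
Regex: the vertical bar must be escaped before it can be used as the split argument: http://dp.pt.xiaomi.com/task/21676398
test: http://dp.pt.xiaomi.com/task/21676414
[]...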
2019-08-06 20:22:47 1013
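The same pitfall can be reproduced outside Hive. The sketch below uses Python's `re` module as an analogy, since Hive's split also takes a regular expression as its delimiter; the sample string is hypothetical. In HiveQL the usual form of the fix is `split(col, '\\|')`.

```python
import re

line = "a|b|c"  # hypothetical pipe-delimited value

# Unescaped, "|" is the alternation operator between two empty patterns,
# so it matches the empty string rather than the literal bar.
unescaped = re.split("|", line)

# Escaped with a backslash, the pattern matches the literal vertical bar.
escaped = re.split(r"\|", line)

# A character class is an alternative that needs no backslash.
bracketed = re.split("[|]", line)

print(escaped)
# ['a', 'b', 'c']
```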
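[Original] HiveQL tuning
A group by triggers a MapReduce job, so do not write group by and distinct together; it wastes a lot of time. When a query runs on the Hive 工场 platform, you can see how many MapReduce stages it goes through; if there are a dozen or more, the query has a performance problem.
Subqueries: http://dp.pt.xiaomi.com/task/21674302 did not use a subquery; it ran for a whole day without producing a result and was allocated very few resources by the platform. http://dp.pt.x...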
2019-08-02 18:19:22 209
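One reading of these two tips, sketched with sqlite3 on hypothetical data: DISTINCT adds nothing on top of a GROUP BY over the same column, and a COUNT(DISTINCT ...) can be rewritten as a count over a grouped subquery. The sketch only shows that the results are equal; it says nothing about how many MapReduce stages Hive would plan for each form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows.
conn.execute("CREATE TABLE events (brand TEXT, imei TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("x", "1"), ("x", "1"), ("x", "2"), ("y", "3")],
)

# DISTINCT is redundant once the rows are already grouped by the same column.
with_both = conn.execute(
    "SELECT DISTINCT brand FROM events GROUP BY brand ORDER BY brand"
).fetchall()
group_only = conn.execute(
    "SELECT brand FROM events GROUP BY brand ORDER BY brand"
).fetchall()

# COUNT(DISTINCT ...) rewritten as a count over a grouped subquery.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT imei) FROM events"
).fetchone()[0]
via_subquery = conn.execute(
    "SELECT COUNT(*) FROM (SELECT imei FROM events GROUP BY imei)"
).fetchone()[0]

print(with_both == group_only, count_distinct, via_subquery)
```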