Pandas Study Check-in #3

fangyan0819

Category: Pandas study check-in (pandas学习打卡)

Article link: https://blog.csdn.net/fangyan0819/article/details/105776644

import numpy as np
import pandas as pd

# Load the diamonds data set (the path is local to the author's machine)
df = pd.read_csv('F:/data/Diamonds.csv')
df.head()

	carat	color	depth	price
0	0.23	E	61.5	326
1	0.21	E	59.8	326
2	0.23	E	56.9	327
3	0.29	I	62.4	334
4	0.31	J	63.3	335

# Price range (max minus min) among diamonds heavier than 1 carat
df_carat = df.query('carat > 1')['price']
df_carat.max() - df_carat.min()
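As a quick sanity check of the max-minus-min computation, here is a minimal sketch on a made-up five-row frame. The values are invented for illustration and are not taken from Diamonds.csv; the name `toy` is an assumption of this example. It also shows that `np.ptp` gives the same range.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    'carat': [0.5, 1.2, 1.5, 0.9, 2.0],
    'price': [1000, 5000, 7000, 3000, 15000],
})

# Keep only rows with carat > 1, then take the price column
heavy_price = toy.query('carat > 1')['price']

# Range as max minus min
price_range = heavy_price.max() - heavy_price.min()

# np.ptp ("peak to peak") computes the same quantity on the underlying array
ptp_range = np.ptp(heavy_price.to_numpy())

print(price_range, ptp_range)
```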

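# Split depth into five equal-frequency bins at the 0/20/40/60/80/100% quantiles.
# Note: pd.cut intervals are right-closed, so rows at the minimum depth fall outside the first bin.
bins = df['depth'].quantile(np.linspace(0, 1, 6)).tolist()
cuts = pd.cut(df['depth'], bins=bins)
df['cuts'] = cuts
# describe() on a string column reports the most frequent color per bin in its 'top' column
color_result = df.groupby('cuts')['color'].describe()
# '均重价格' is the price per carat
df['均重价格'] = df['price'] / df['carat']
# In each bin, is the most frequent color also the one with the highest mean price per carat?
best_color = df.groupby(['cuts', 'color'])['均重价格'].mean().groupby(['cuts']).idxmax()
color_result['top'] == [i[1] for i in best_color.values]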

cuts
(43.0, 60.8]    False
(60.8, 61.6]    False
(61.6, 62.1]    False
(62.1, 62.7]     True
(62.7, 79.0]     True
Name: top, dtype: bool
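One detail of the binning step is worth a small illustration: `pd.cut` builds right-closed intervals, so when the bin edges are the quantiles of the data itself, the minimum value lands outside the first bin and becomes NaN. The sketch below, on ten made-up values (not the diamonds data), compares plain `pd.cut`, `pd.cut` with `include_lowest=True`, and `pd.qcut`, which does the quantile binning in one call.

```python
import numpy as np
import pandas as pd

# Ten made-up depth-like values, for illustration only
depth = pd.Series(np.arange(1, 11, dtype=float))

# Quantile edges at 0%, 20%, ..., 100%
edges = depth.quantile(np.linspace(0, 1, 6)).tolist()

# Right-closed intervals: the minimum (1.0) is not inside (1.0, 2.8]
by_cut = pd.cut(depth, bins=edges)

# include_lowest=True closes the first interval on the left as well
by_cut_fixed = pd.cut(depth, bins=edges, include_lowest=True)

# qcut computes the quantile edges itself and keeps the minimum
by_qcut = pd.qcut(depth, q=5)

print(by_cut.isna().sum(), by_cut_fixed.isna().sum())
print(by_qcut.value_counts().tolist())
```

Whether the missing minimum matters depends on the question; for the comparison above it only affects the rows at the smallest depth.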

df = df.drop(columns='均重价格')
cuts = pd.cut(df['carat'], bins=[0, 0.5, 1, 1.5, 2, np.inf])  # optionally pass labels= to assign custom labels
df['cuts'] = cuts
df.head()

	carat	color	depth	price	cuts
0	0.23	E	61.5	326	(0.0, 0.5]
1	0.21	E	59.8	326	(0.0, 0.5]
2	0.23	E	56.9	327	(0.0, 0.5]
3	0.29	I	62.4	334	(0.0, 0.5]
4	0.31	J	63.3	335	(0.0, 0.5]
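The comment on the `pd.cut` call mentions custom labels. A minimal sketch of what that might look like follows; the label names are hypothetical and chosen only for this example, and the carat values are made up.

```python
import numpy as np
import pandas as pd

# Made-up carat values, for illustration only
carat = pd.Series([0.23, 0.7, 1.0, 1.2, 1.8, 2.5])

# Same bin edges as in the post
bins = [0, 0.5, 1, 1.5, 2, np.inf]

# Hypothetical label names: one per interval, so len(names) == len(bins) - 1
names = ['tiny', 'small', 'medium', 'large', 'huge']

labeled = pd.cut(carat, bins=bins, labels=names)
print(labeled.tolist())
```

Because the intervals are right-closed, a value of exactly 1.0 belongs to (0.5, 1.0] and gets the second label.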

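def f(nums):
    """Return the length of the longest strictly increasing contiguous run in nums."""
    if not nums:
        return 0
    res = 1      # longest run seen so far
    cur_len = 1  # length of the run ending at the current element
    for i in range(1, len(nums)):
        if nums[i-1] < nums[i]:
            cur_len += 1
            res = max(cur_len, res)
        else:
            cur_len = 1
    return res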

# Within each carat bin, sort by depth and report the longest strictly increasing run of prices
for name, group in df.groupby('cuts'):
    group = group.sort_values(by='depth')
    s = group['price']
    print(name, f(s.tolist()))

(0.0, 0.5] 8
(0.5, 1.0] 8
(1.0, 1.5] 7
(1.5, 2.0] 11
(2.0, inf] 7
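The run-length search can also be written without an explicit Python loop. The following is a sketch of one possible NumPy formulation, not the author's method; the function name `longest_increasing_run` is made up for this example. It marks every position where the sequence fails to rise and measures the gaps between those positions.

```python
import numpy as np

def longest_increasing_run(values):
    """Length of the longest strictly increasing contiguous run."""
    arr = np.asarray(values)
    if arr.size == 0:
        return 0
    # rising[k] is True when arr[k + 1] > arr[k]
    rising = np.diff(arr) > 0
    # Indices of steps that break a run
    breaks = np.flatnonzero(~rising)
    # Sentinels before the first step and after the last one
    edges = np.concatenate(([-1], breaks, [rising.size]))
    # The gap between two consecutive breaks equals the number of elements in that run
    return int(np.diff(edges).max())

print(longest_increasing_run([1, 2, 3, 2, 5]))
```

On a price list this should agree with the loop version `f`; for columns of a few tens of thousands of rows either form is fast enough.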

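# Per-color simple linear regression of price on carat via the normal equations:
# beta = (X X^T)^(-1) X y, where X is the 2 x n matrix whose rows are [1, ..., 1] and carat
for name, group in df[['carat', 'price', 'color']].groupby('color'):
    X = np.array([np.ones(group.shape[0]), group['carat']])
    y = group['price'].to_numpy()
    beta = np.linalg.inv(X.dot(X.T)).dot(X).dot(y)
    print('For color %s, intercept: %f, regression coefficient: %f' % (name, beta[0], beta[1]))

For color D, intercept: -2361.017152, regression coefficient: 8408.353126
For color E, intercept: -2381.049600, regression coefficient: 8296.212783
For color F, intercept: -2665.806191, regression coefficient: 8676.658344
For color G, intercept: -2575.527643, regression coefficient: 8525.345779
For color H, intercept: -2460.418046, regression coefficient: 7619.098320
For color I, intercept: -2878.150356, regression coefficient: 7761.041169
For color J, intercept: -2920.603337, regression coefficient: 7094.192092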
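To check the normal-equation formula independently of the diamonds data, the sketch below fits a made-up exact line (price = 8000 * carat - 2000, an assumption of this example) in three ways: the explicit inverse, `np.linalg.lstsq`, and `np.polyfit`. All three should recover the same intercept and slope; the least-squares routines are generally the numerically safer choice because they avoid forming an explicit inverse.

```python
import numpy as np

# Made-up data lying exactly on a line, so the true coefficients are known
carat = np.array([0.3, 0.5, 0.7, 1.0, 1.5])
price = 8000.0 * carat - 2000.0

# Design matrix with rows [1, ..., 1] and carat (shape 2 x n)
X = np.vstack([np.ones_like(carat), carat])

# 1) Normal equations with an explicit inverse
beta_normal = np.linalg.inv(X @ X.T) @ X @ price

# 2) Least squares on the n x 2 matrix X.T
beta_lstsq, *_ = np.linalg.lstsq(X.T, price, rcond=None)

# 3) polyfit returns the highest-degree coefficient first: (slope, intercept)
slope, intercept = np.polyfit(carat, price, deg=1)

print(beta_normal, beta_lstsq, (intercept, slope))
```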

df = pd.read_csv('F:/data/Drugs.csv')
df.head()

	YYYY	State	COUNTY	SubstanceName	DrugReports
0	2010	VA	ACCOMACK	Propoxyphene	1
1	2010	OH	ADAMS	Morphine	9
2	2010	PA	ADAMS	Methadone	2
3	2010	VA	ALEXANDRIA CITY	Heroin	5
4	2010	PA	ALLEGHENY	Hydromorphone	5

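# Total reports per (county, year) and per (state, year), computed once.
# Selecting 'DrugReports' first keeps the string columns out of the sum.
county_year = df.groupby(['COUNTY', 'YYYY'])['DrugReports'].sum()
state_year = df.groupby(['State', 'YYYY'])['DrugReports'].sum()
for i in range(2010, 2018):
    # County with the most reports in year i
    county = county_year.xs(i, level='YYYY').idxmax()
    # Caveat: county names repeat across states (e.g. ADAMS in OH and PA); this takes the first match
    state = df.query('COUNTY == "%s"' % county)['State'].iloc[0]
    # State with the most reports in year i
    state_true = state_year.xs(i, level='YYYY').idxmax()
    if state == state_true:
        print('In %d, %s county had the most reports, and its state %s also had the most reports' % (i, county, state))
    else:
        print('In %d, %s county had the most reports, but its state %s did not; %s had the most reports' % (i, county, state, state_true))

In 2010, PHILADELPHIA county had the most reports, and its state PA also had the most reports
In 2011, PHILADELPHIA county had the most reports, but its state PA did not; OH had the most reports
In 2012, PHILADELPHIA county had the most reports, but its state PA did not; OH had the most reports
In 2013, PHILADELPHIA county had the most reports, but its state PA did not; OH had the most reports
In 2014, PHILADELPHIA county had the most reports, but its state PA did not; OH had the most reports
In 2015, PHILADELPHIA county had the most reports, but its state PA did not; OH had the most reports
In 2016, HAMILTON county had the most reports, and its state OH also had the most reports
In 2017, HAMILTON county had the most reports, and its state OH also had the most reports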
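The loop above identifies a county by name alone and then looks up its state, but county names can repeat across states (the preview of Drugs.csv already shows ADAMS in both OH and PA). A possible way around this, sketched here on six made-up rows rather than the real file, is to group on the (State, COUNTY) pair so that each county carries its state with it. The frame `toy` and the dictionary `results` are assumptions of this example.

```python
import pandas as pd

# Made-up rows for illustration; ADAMS appears in both OH and PA
toy = pd.DataFrame({
    'YYYY':        [2010, 2010, 2010, 2011, 2011, 2011],
    'State':       ['OH', 'PA', 'PA', 'OH', 'PA', 'PA'],
    'COUNTY':      ['ADAMS', 'ADAMS', 'PHILADELPHIA', 'ADAMS', 'ADAMS', 'PHILADELPHIA'],
    'DrugReports': [9, 2, 20, 30, 20, 20],
})

# Totals per (state, county, year) and per (state, year)
county_year = toy.groupby(['State', 'COUNTY', 'YYYY'])['DrugReports'].sum()
state_year = toy.groupby(['State', 'YYYY'])['DrugReports'].sum()

results = {}
for year in sorted(toy['YYYY'].unique()):
    # xs drops the year level; idxmax then returns a (State, COUNTY) tuple
    county_state, county = county_year.xs(year, level='YYYY').idxmax()
    # State with the most reports in that year
    top_state = state_year.xs(year, level='YYYY').idxmax()
    results[int(year)] = (county, county_state, top_state)

print(results)
```

In the toy data the 2011 winner is ADAMS in OH, while PA has the larger state total, which is the same kind of mismatch the post reports for 2011 to 2015.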

DrugReports    Heroin
dtype: object

DrugReports    Acetyl fentanyl
dtype: object