四大会计师事务所python数据分析_python数据分析案例(四)

密码:oknt

思路:将json文件转加载到python中,转化为两个DataFrame(食物、营养成分),以id列进行内连接,根据group和nutrient列分组后,得到每种营养成分在各种食物中的含量高低

step1

import json

data=json.load(open(r'C:\Users\Administrator\Downloads\pydata-book-2nd-edition\datasets\usda_food\database.json'))

#将json格式数据加载到python中,data中都是字典型对象,对应一种食品及其详细介绍

data[0]

#data中的第一个元素,为第一种食物的有关介绍

data[0].keys()

dict_keys(['id', 'description', 'tags', 'manufacturer', 'group', 'portions', 'nutrients'])

#data1中的第一个元素的几个方面

data[0]['nutrients']

#第一种食物的所有营养成分

step2

food=pd.DataFrame(data,columns=['id','description','manufacturer','group'])

food.head()

id description manufacturer group

0 1008 Cheese, caraway Dairy and Egg Products

1 1009 Cheese, cheddar Dairy and Egg Products

2 1018 Cheese, edam Dairy and Egg Products

3 1019 Cheese, feta Dairy and Egg Products

4 1028 Cheese, mozzarella, part skim milk Dairy and Egg Products

#每个元素排列为DataFrame中的每一行,元素中的键作为DataFrame的列

food['group'].value_counts()[:10]

Vegetables and Vegetable Products 812

Beef Products 618

Baked Products 496

Breakfast Cereals 403

Fast Foods 365

Legumes and Legume Products 365

Lamb, Veal, and Game Products 345

Sweets 341

Pork Products 328

Fruits and Fruit Juices 328

#前十种最多的食品种类

step3

total=[]

number=range(0,len(data))

for i in number:

frame=pd.DataFrame(data[i]['nutrients'])

frame['id']=data[i]['id']

total.append(frame)

nutrients=pd.concat(total,ignore_index=True)

nutrients[:5]

description group units value id

0 Protein Composition g 25.18 1008

1 Total lipid (fat) Composition g 29.20 1008

2 Carbohydrate, by difference Composition g 3.06 1008

3 Ash Other g 3.28 1008

4 Energy Energy kcal 376.00 1008

#利用concat函数将所有元素的每个营养成分数据整合在一张DataFrame中

step4

nutrients.duplicated().sum()

14179

#求所有重复项数量

nutrients=nutrients.drop_duplicates()

#删除重复项

nutrients=nutrients.rename(columns={'description':'nutirent','group':'nutgroup',copy=False)

#对重复的列名进行重命名

nutrients[:5]

nutrient nutgroup units value id

0 Protein Composition g 25.18 1008

1 Total lipid (fat) Composition g 29.20 1008

2 Carbohydrate, by difference Composition g 3.06 1008

3 Ash Other g 3.28 1008

4 Energy Energy kcal 376.00 1008

step5

detail=pd.merge(data,nutrients,on='id',how='outer')

detail.head()

id description manufacturer group nutrient nutgroup units value

0 1008 Cheese, caraway Dairy and Egg Products Protein Composition g 25.18

1 1008 Cheese, caraway Dairy and Egg Products Total lipid (fat) Composition g 29.20

2 1008 Cheese, caraway Dairy and Egg Products Carbohydrate, by difference Composition g 3.06

3 1008 Cheese, caraway Dairy and Egg Products Ash Other g 3.28

4 1008 Cheese, caraway Dairy and Egg Products Energy Energy kcal 376.00

#将两个DataFrame根据id列进行外连接

step6

result=detail.groupby(['nutrient','group'])['value'].quantile(0.5)

result[:10]

nutrient group

Adjusted Protein Sweets 12.900

Vegetables and Vegetable Products 2.180

Alanine Baby Foods 0.085

Baked Products 0.248

Beef Products 1.550

Beverages 0.003

Breakfast Cereals 0.311

Cereal Grains and Pasta 0.373

Dairy and Egg Products 0.271

Ethnic Foods 1.290

result['Alanine'].sort_values(ascending=False)

group

Beef Products 1.5500

Lamb, Veal, and Game Products 1.4080

Poultry Products 1.3940

Pork Products 1.3070

Ethnic Foods 1.2900

Finfish and Shellfish Products 1.2180

Sausages and Luncheon Meats 0.9420

Nut and Seed Products 0.7345

Spices and Herbs 0.5550

Fast Foods 0.5140

Restaurant Foods 0.4650

Snacks 0.4335

Legumes and Legume Products 0.4100

Cereal Grains and Pasta 0.3730

Meals, Entrees, and Sidedishes 0.3270

Breakfast Cereals 0.3110

Dairy and Egg Products 0.2710

Baked Products 0.2480

Sweets 0.1020

Baby Foods 0.0850

Vegetables and Vegetable Products 0.0840

Soups, Sauces, and Gravies 0.0650

Fruits and Fruit Juices 0.0270

Beverages 0.0030

Fats and Oils 0.0000

step7

result['Alanine'].sort_values(ascending=False)[:10].plot()

#找到所有食物种类的中含Alanine成分最多的前十种

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值