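This post continues the Python basics review from the previous article. This time the focus is pandas, the library most commonly used for data analysis and processing, which is why it keeps coming up. These are only notes on frequently used syntax; for anything beyond that, you will need to look up further material yourself.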
# Load the data source
import pandas as pd
food_info = pd.read_csv("/Users/yongsenlin/Desktop/food_info.csv")
# Look at the first 5 rows
first_rows = food_info.head()
print(first_rows)
# Print all the columns (fields)
print(food_info.columns)
output:
Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)',
       'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)',
       'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)',
       'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)',
       'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)',
       'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)',
       'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg',
       'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)',
       'Cholestrl_(mg)'],
      dtype='object')
# Print the number of rows and columns in the dataset
print(food_info.shape)
output:
(8618, 36)
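The CSV path above only exists on the author's machine, so here is a small sketch of the same inspection steps on a DataFrame built in memory. The name `toy_food` and its six rows are a stand-in I made up for illustration (the numbers are borrowed from the outputs printed in this post), not the real dataset.

```python
import pandas as pd

# Toy stand-in for food_info.csv: three columns, six rows, for illustration only.
toy_food = pd.DataFrame({
    "NDB_No": [1001, 1002, 1003, 1004, 1005, 1006],
    "Zinc_(mg)": [0.09, 0.05, 0.01, 2.66, 2.60, 2.38],
    "Copper_(mg)": [0.000, 0.016, 0.001, 0.040, 0.024, 0.019],
})

first_rows = toy_food.head()           # first 5 rows by default
first_two = toy_food.head(2)           # pass n to change how many rows you get
n_rows, n_cols = toy_food.shape        # shape is a (rows, columns) tuple
column_names = list(toy_food.columns)  # columns is an Index; list() makes it a plain list

print(first_rows)
print(n_rows, n_cols)
print(column_names)
```

Note that `shape` is an attribute, not a method, so it is written without parentheses.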
# pandas row labels start at 0 by default, so loc[0] returns the first row
print(food_info.loc[0])
output:
NDB_No           1001
Shrt_Desc        BUTTER WITH SALT
Water_(g)        15.87
Energ_Kcal       717
Protein_(g)      0.85
Lipid_Tot_(g)    81.11
Ash_(g)          2.11
Carbohydrt_(g)   0.06
Fiber_TD_(g)     0
Sugar_Tot_(g)    0.06
Calcium_(mg)     24
Iron_(mg)        0.02
Magnesium_(mg)   2
Phosphorus_(mg)  24
Potassium_(mg)   24
Sodium_(mg)      643
Zinc_(mg)        0.09
Copper_(mg)      0
Manganese_(mg)   0
Selenium_(mcg)   1
Vit_C_(mg)       0
Thiamin_(mg)     0.005
Riboflavin_(mg)  0.034
Niacin_(mg)      0.042
Vit_B6_(mg)      0.003
Vit_B12_(mcg)    0.17
Vit_A_IU         2499
Vit_A_RAE        684
Vit_E_(mg)       2.32
Vit_D_mcg        1.5
Vit_D_IU         60
Vit_K_(mcg)      7
FA_Sat_(g)       51.368
FA_Mono_(g)      21.021
FA_Poly_(g)      3.043
Cholestrl_(mg)   215
Name: 0, dtype: object
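One thing worth keeping in mind: `loc` looks rows up by index label, and `loc[0]` only means "the first row" because `read_csv` assigns labels 0, 1, 2, ... by default. A minimal sketch of the difference between label lookup (`loc`) and position lookup (`iloc`), using a made-up frame `toy` whose labels are deliberately not 0-based:

```python
import pandas as pd

# Hypothetical frame with non-default row labels, for illustration only.
toy = pd.DataFrame(
    {"NDB_No": [1001, 1002, 1003, 1004]},
    index=[10, 20, 30, 40],
)

by_label = toy.loc[10]       # the row whose index label is 10
by_position = toy.iloc[0]    # the first row, whatever its label is
last_row = toy.iloc[-1]      # negative positions count from the end

# There is no row labelled 0 here, so loc[0] raises KeyError.
try:
    toy.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_label["NDB_No"], by_position["NDB_No"], last_row["NDB_No"])
```

With the default index the two accessors agree, which is why the notes above can treat `loc[0]` as the first row.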
# Common data types
1-object - For string values
2-int - For integer values
3-float - For float values
4-datetime - For time values
5-bool - For Boolean values
print(food_info.dtypes)
output:
NDB_No           int64
Shrt_Desc        object
Water_(g)        float64
Energ_Kcal       int64
Protein_(g)      float64
Lipid_Tot_(g)    float64
Ash_(g)          float64
Carbohydrt_(g)   float64
Fiber_TD_(g)     float64
Sugar_Tot_(g)    float64
Calcium_(mg)     float64
Iron_(mg)        float64
Magnesium_(mg)   float64
Phosphorus_(mg)  float64
Potassium_(mg)   float64
Sodium_(mg)      float64
Zinc_(mg)        float64
Copper_(mg)      float64
Manganese_(mg)   float64
Selenium_(mcg)   float64
Vit_C_(mg)       float64
Thiamin_(mg)     float64
Riboflavin_(mg)  float64
Niacin_(mg)      float64
Vit_B6_(mg)      float64
Vit_B12_(mcg)    float64
Vit_A_IU         float64
Vit_A_RAE        float64
Vit_E_(mg)       float64
Vit_D_mcg        float64
Vit_D_IU         float64
Vit_K_(mcg)      float64
FA_Sat_(g)       float64
FA_Mono_(g)      float64
FA_Poly_(g)      float64
Cholestrl_(mg)   float64
dtype: object
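Once you know the dtypes, a common next step is to pick columns by type or to convert a column. The sketch below uses a made-up two-row frame `toy`; the second row's description and water value are placeholders I invented, not values from the dataset.

```python
import pandas as pd

# Hypothetical two-row frame; the second row is placeholder data.
toy = pd.DataFrame({
    "NDB_No": [1001, 1002],
    "Shrt_Desc": ["BUTTER WITH SALT", "PLACEHOLDER ITEM"],
    "Water_(g)": [15.87, 16.00],
})

# Keep only the numeric columns (integer and float).
numeric = toy.select_dtypes(include="number")

# Convert one column to another type; astype returns a new Series.
ndb_as_float = toy["NDB_No"].astype("float64")

print(toy.dtypes)
print(list(numeric.columns))
print(ndb_as_float.dtype)
```

`select_dtypes` and `astype` both return new objects, so `toy` itself is left unchanged.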
# Select specific rows
Method 1
two_five_ten = [2,5,10]
food_info.loc[two_five_ten]
Method 2
food_info.loc[[2,5,10]]
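Besides a list of labels, `loc` also accepts a label slice. A small sketch on a made-up twelve-row frame `toy` (the `NDB_No` values follow the sequence printed elsewhere in this post); the point to watch is that a label slice includes its end label, unlike ordinary Python slicing.

```python
import pandas as pd

# Hypothetical frame with default labels 0..11, for illustration only.
toy = pd.DataFrame({"NDB_No": [1001 + i for i in range(12)]})

picked = toy.loc[[2, 5, 10]]  # list of labels: exactly those rows, in that order
ranged = toy.loc[3:6]         # label slice: rows 3, 4, 5 and 6 (end label included)

print(picked)
print(ranged)
```

Both results are DataFrames, even though only one column is present, because more than one row was requested.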
# Select specific columns
Method 1
ndb_col = food_info["NDB_No"]
print(ndb_col)
output:
0       1001
1       1002
2       1003
3       1004
4       1005
5       1006
6       1007
7       1008
8       1009
9       1010
10      1011
11      1012
12      1013
13      1014
14      1015
15      1016
16      1017
17      1018
18      1019
19      1020
20      1021
21      1022
22      1023
23      1024
24      1025
25      1026
26      1027
27      1028
28      1029
29      1030
...
8588    43544
8589    43546
8590    43550
8591    43566
8592    43570
8593    43572
8594    43585
8595    43589
8596    43595
8597    43597
8598    43598
8599    44005
8600    44018
8601    44048
8602    44055
8603    44061
8604    44074
8605    44110
8606    44158
8607    44203
8608    44258
8609    44259
8610    44260
8611    48052
8612    80200
8613    83110
8614    90240
8615    90480
8616    90560
8617    93600
Name: NDB_No, Length: 8618, dtype: int64
Method 2 (column names stored in a list first)
columns = ["Zinc_(mg)", "Copper_(mg)"]
zinc_copper = food_info[columns]
print(zinc_copper)
Method 3 (the same list written inline)
zinc_copper = food_info[["Zinc_(mg)", "Copper_(mg)"]]
print(zinc_copper)
output:
      Zinc_(mg)  Copper_(mg)
0     0.09       0.000
1     0.05       0.016
2     0.01       0.001
3     2.66       0.040
4     2.60       0.024
5     2.38       0.019
6     2.38       0.021
7     2.94       0.024
8     3.43       0.056
9     2.79       0.042
10    3.07       0.042
11    0.40       0.029
12    0.33       0.040
13    0.47       0.030
14    0.51       0.033
15    0.38       0.028
16    0.51       0.019
17    3.75       0.036
18    2.88       0.032
19    3.50       0.025
20    1.14       0.080
21    3.90       0.036
22    3.90       0.032
23    2.10       0.021
24    3.00       0.032
25    2.92       0.011
26    2.46       0.022
27    2.76       0.025
28    3.61       0.034
29    2.81       0.031
...   ...        ...
8588  3.30       0.377
8589  0.05       0.040
8590  0.05       0.030
8591  1.15       0.116
8592  5.03       0.200
8593  3.83       0.545
8594  0.08       0.035
8595  3.90       0.027
8596  4.10       0.100
8597  3.13       0.027
8598  0.13       0.000
8599  0.02       0.000
8600  0.09       0.037
8601  0.21       0.026
8602  2.77       0.571
8603  0.41       0.838
8604  0.05       0.028
8605  0.03       0.023
8606  0.10       0.112
8607  0.02       0.020
8608  1.49       0.854
8609  0.19       0.040
8610  0.10       0.038
8611  0.85       0.182
8612  1.00       0.250
8613  1.10       0.100
8614  1.55       0.033
8615  0.19       0.020
8616  1.00       0.400
8617  1.00       0.250
[8618 rows x 2 columns]
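The two outputs above look different because single brackets with one column name give back a Series, while a list of names gives back a DataFrame. A quick sketch on a made-up three-row frame `toy` (values borrowed from the first rows printed above):

```python
import pandas as pd

# Hypothetical three-row frame, for illustration only.
toy = pd.DataFrame({
    "NDB_No": [1001, 1002, 1003],
    "Zinc_(mg)": [0.09, 0.05, 0.01],
    "Copper_(mg)": [0.000, 0.016, 0.001],
})

one_col = toy["Zinc_(mg)"]                     # single name: a Series
one_col_df = toy[["Zinc_(mg)"]]                # list with one name: a one-column DataFrame
two_cols = toy[["Copper_(mg)", "Zinc_(mg)"]]   # column order follows the list you pass

print(type(one_col).__name__)
print(type(one_col_df).__name__)
print(list(two_cols.columns))
```

If a later step expects a DataFrame (for example, to keep the column header when printing), the double-bracket form is the one to use.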
# Select columns whose names end with a given string
col_names = food_info.columns.tolist()
gram_columns = []
for c in col_names:
    if c.endswith("(g)"):
        gram_columns.append(c)
gram_df = food_info[gram_columns]
print(gram_df.head(3))
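The loop above can be written more compactly. Below is a sketch of two equivalent forms, a list comprehension and the vectorised `str.endswith` on the column Index, run on a made-up one-row frame `toy` (values taken from the first row printed earlier in this post). Treat it as one possible shorthand rather than the only way to do it.

```python
import pandas as pd

# Hypothetical one-row frame with a mix of unit suffixes, for illustration only.
toy = pd.DataFrame({
    "NDB_No": [1001],
    "Water_(g)": [15.87],
    "Protein_(g)": [0.85],
    "Calcium_(mg)": [24.0],
    "Selenium_(mcg)": [1.0],
})

# Form 1: list comprehension over the column names.
gram_columns = [c for c in toy.columns if c.endswith("(g)")]
gram_df = toy[gram_columns]

# Form 2: boolean mask built from the Index, then column selection with loc.
mask = toy.columns.str.endswith("(g)")
gram_df_alt = toy.loc[:, mask]

print(gram_columns)
print(list(gram_df_alt.columns))
```

Columns ending in "(mg)" or "(mcg)" are not matched, because the suffix being tested includes the opening parenthesis.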
—End—