Pandas 数据读取
1.pandas.read_csv("文件名")
in:
import pandas
food_info = pandas.read_csv("food_info.csv")
print(type(food_info))
print (food_info.dtypes)
pandas.read_csv("文件名"):读取以逗号为分隔符的文件。
print(type(food_info)):打印文件元素类型,此处为food_info为datafram格式(表格格式)返回表格列的类型
int64、float64、object64(pandas中的字符类型)
out:
NDB_No int64
Shrt_Desc object
Water_(g) float64
Energ_Kcal int64
dtype: object
2.food_info.head(行数)
in:
first_rows = food_info.head()
#print first_rows
print(food_info.head(3))
print (food_info.columns)
#print food_info.shape
print(food_info.head(3)):打印表格的前3行
food_info.head():代表所有行
food_info.tail(4):返回倒数4行的表格
print (food_info.columns):返回列名
print food_info.shape:打印表格的行数和列数
out:
NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) \
0 1001 BUTTER WITH SALT 15.87 717 0.85
1 1002 BUTTER WHIPPED WITH SALT 15.87 717 0.85
2 1003 BUTTER OIL ANHYDROUS 0.24 876 0.28
Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) \
0 81.11 2.11 0.06 0.0 0.06
1 81.11 2.11 0.06 0.0 0.06
2 99.48 0.00 0.00 0.0 0.00
... Vit_A_IU Vit_A_RAE Vit_E_(mg) Vit_D_mcg Vit_D_IU \
0 ... 2499.0 684.0 2.32 1.5 60.0
1 ... 2499.0 684.0 2.32 1.5 60.0
2 ... 3069.0 840.0 2.80 1.8 73.0
Vit_K_(mcg) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g) Cholestrl_(mg)
0 7.0 51.368 21.021 3.043 215.0
1 7.0 50.489 23.426 3.012 219.0
2 8.6 61.924 28.732 3.694 256.0
[3 rows x 36 columns]
Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)',
'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)',
'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)',
'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)',
'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)',
'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)',
'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg',
'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)',
'Cholestrl_(mg)'],
dtype='object')
3.print (food_info.loc[行数])
in:
#pandas uses zero-indexing
#Series object representing the row at index 0.
print (food_info.loc[0])
# Series object representing the seventh row.
#food_info.loc[6]
# Will throw an error: "KeyError: 'the label [8620] is not in the [index]'"
#food_info.loc[8620]
#The object dtype is equivalent to a string in Python
print (food_info.loc[0]):打印列表第0行的数据
out:
NDB_No 1001
Shrt_Desc BUTTER WITH SALT
Water_(g) 15.87
Energ_Kcal 717
Protein_(g) 0.85
Lipid_Tot_(g) 81.11
FA_Poly_(g) 3.043
Cholestrl_(mg) 215
Name: 0, dtype: object
in:
# Returns a DataFrame containing the rows at indexes 3, 4, 5, and 6.
#food_info.loc[3:6]
# Returns a DataFrame containing the rows at indexes 2, 5, and 10. Either of the following approaches will work.
# Method 1
#two_five_ten = [2,5,10]
#food_info.loc[two_five_ten]
# Method 2
#food_info.loc[[2,5,10]]
out:
NDB_No Shrt_Desc
2 1003 BUTTER OIL ANHYDROUS
5 1006 CHEESE BRIE
10 1011 CHEESE COLBY
in:
# Series object representing the "NDB_No" column.
ndb_col = food_info["NDB_No"]
print (ndb_col)
# Alternatively, you can access a column by passing in a string variable.
#col_name = "NDB_No"
#ndb_col = food_info[col_name]
打印“NDB_No”则一列
out:
0 1001
1 1002
2 1003
3 1004
4 1005
...
8613 83110
8614 90240
8615 90480
8616 90560
8617 93600
Name: NDB_No, Length: 8618, dtype: int64
in:
columns = ["Zinc_(mg)", "Copper_(mg)"]
zinc_copper = food_info[columns]
print (zinc_copper)
#print zinc_copper
# Skipping the assignment.
#zinc_copper = food_info[["Zinc_(mg)", "Copper_(mg)"]]
food_info[columns]:输出Zinc_(mg) 、 Copper_(mg)两列
out:
Zinc_(mg) Copper_(mg)
0 0.09 0.000
1 0.05 0.016
2 0.01 0.001
3 2.66 0.040
4 2.60 0.024
5 2.38 0.019
[8618 rows x 2 columns]