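Python notes
The os.listdir() method
os.listdir() returns a list of the names of the files and folders in the specified directory. The list is in arbitrary order; it is not guaranteed to be alphabetical. It does not include the special entries '.' and '..', even if they are present in the directory.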
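Syntax
The listdir() syntax is:
os.listdir(path)
Parameter: path is the directory to list
Return value: a list of the file and folder names under that path
Example
import os
# Raw string so the backslashes in the Windows path are not treated as escapes
fileList = os.listdir(r'E:\pycharm\work\cookbook\day6\data')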
print(fileList)
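Since the order of the returned names is not guaranteed, a sketch like the following sorts them explicitly and separates files from folders. It uses a temporary directory with hypothetical entries (`a.txt`, `b.txt`, `sub`) instead of the path above, so it should run on any machine.

```python
import os
import tempfile

# Build a throwaway directory with hypothetical entries so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "a.txt"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("demo")
    os.mkdir(os.path.join(tmp, "sub"))

    # os.listdir() returns bare names in arbitrary order; sort for a stable result.
    names = sorted(os.listdir(tmp))
    # The names carry no directory part, so join them with the path before testing them.
    files = [n for n in names if os.path.isfile(os.path.join(tmp, n))]
    dirs = [n for n in names if os.path.isdir(os.path.join(tmp, n))]

print(names)
print(files, dirs)
```

Note that the returned names are relative, which is why each one is joined with the directory path before calling os.path.isfile() or os.path.isdir().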
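Inspecting the loaded data
This covers the values, type, number of elements, number of dimensions, and shape (rows, columns)
import pandas as pd
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
print(type(detail))
print(detail.values) # values
print(detail.columns) # column names
print(detail.dtypes) # data type of each column
print("Number of elements",detail.size) # number of elements
print("5",detail.ndim) # number of dimensions
print("6",detail.shape) # (rows, columns)
### Output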
#<class 'pandas.core.frame.DataFrame'>
[[2956 417 610062 ... nan 'caipu/104001.jpg' 1442]
[2958 417 609957 ... nan 'caipu/202003.jpg' 1442]
[2961 417 609950 ... nan 'caipu/303001.jpg' 1442]
...
[6756 774 609949 ... nan 'caipu/404005.jpg' 1138]
[6763 774 610014 ... nan 'caipu/302003.jpg' 1138]
[6764 774 610017 ... nan 'caipu/302006.jpg' 1138]]
Index(['detail_id', 'order_id', 'dishes_id', 'logicprn_name',
'parent_class_name', 'dishes_name', 'itemis_add', 'counts', 'amounts',
'cost', 'place_order_time', 'discount_amt', 'discount_reason',
'kick_back', 'add_inprice', 'add_info', 'bar_code', 'picture_file',
'emp_id'],
dtype='object')
detail_id int64
order_id int64
dishes_id int64
logicprn_name float64
parent_class_name float64
dishes_name object
itemis_add int64
counts int64
amounts int64
cost float64
place_order_time datetime64[ns]
discount_amt float64
discount_reason float64
kick_back float64
add_inprice int64
add_info float64
bar_code float64
picture_file object
emp_id int64
dtype: object
Number of elements 52801
5 2
6 (2779, 19)
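The three numbers at the end of the output are related: size is the row count times the column count, which is why 2779 × 19 gives 52801 above. A minimal sketch on a hypothetical three-row stand-in for the order table (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical three-row stand-in for the order-detail table.
demo = pd.DataFrame({
    "order_id": [417, 417, 774],
    "dishes_name": ["dish A", "dish B", "dish C"],
    "amounts": [49, 48, 30],
})

rows, cols = demo.shape          # (number of rows, number of columns)
print(demo.ndim)                 # a DataFrame always has 2 dimensions
print(demo.size == rows * cols)  # size counts every cell
print(demo.dtypes)               # one dtype per column, not per row
```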
Accessing a single column with dictionary-style indexing
import pandas as pd
# Access a single column with dictionary-style indexing
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
order_id = detail['order_id']
print(order_id)
print(type(order_id), order_id.shape)
Accessing a single column as an attribute
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
dishes_name = detail.dishes_name
print(dishes_name)
A single column with a chosen range of rows
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
dishes_name5 = detail["dishes_name"][1:6] # [start, end): rows 1 to 5, the end is excluded
print(dishes_name5)
Multiple columns, multiple rows
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
orderDish = detail[['order_id','dishes_name']][0:5]
print(orderDish)
All columns with a chosen range of rows
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
order6 = detail[:][1:6]
print(order6)
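In the snippets above, a list of names inside the brackets selects columns, while a slice inside the brackets selects rows by position. The leading `[:]` in `detail[:][1:6]` therefore adds nothing. A small sketch on a hypothetical ten-row frame (names and values are placeholders):

```python
import pandas as pd

# Hypothetical ten-row frame with the default integer index 0..9.
demo = pd.DataFrame({
    "order_id": list(range(100, 110)),
    "dishes_name": [f"dish {i}" for i in range(10)],
})

all_cols_chained = demo[:][1:6]                     # same pattern as detail[:][1:6]
all_cols_direct = demo[1:6]                         # the slice alone already picks rows
two_cols = demo[["order_id", "dishes_name"]][0:5]   # columns first, then rows

print(all_cols_chained.equals(all_cols_direct))
print(all_cols_direct.index.tolist())
print(two_cols.shape)
```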
The DataFrame methods head and tail
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
print("detail.head()",detail.head()) # head() returns the first 5 rows by default
print("detail.tail()",detail.tail()) # tail() returns the last 5 rows by default
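loc: selects rows by the label stored in the row index (for example, the row whose index label is "A")
iloc: selects rows by integer position, counting from 0 (for example, position 1 is the second row)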
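The difference is easiest to see when the index is not the default 0, 1, 2, ... A minimal sketch, assuming a hypothetical frame whose index labels are "A", "B" and "C":

```python
import pandas as pd

# Hypothetical frame with string labels as the row index.
demo = pd.DataFrame(
    {"dishes_name": ["dish A", "dish B", "dish C"], "amounts": [49, 48, 30]},
    index=["A", "B", "C"],
)

by_label = demo.loc["A"]     # the row whose index label is "A"
by_position = demo.iloc[1]   # the second row, whatever its label is

print(by_label["dishes_name"])
print(by_position["dishes_name"], by_position.name)
```

With the default integer index the labels and positions happen to coincide, which is why the two accessors look interchangeable on the order table until slices are involved.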
Slicing a single column with loc/iloc
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
dishes_name1 = detail.loc[:,'dishes_name']
print(dishes_name1)
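Using loc to print rows [1:6] of ['order_id','dishes_name']; unlike a plain slice, loc includes the end label, so this returns rows 1 through 6
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")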
dishes_name2 = detail.loc[1:6, ['order_id','dishes_name']]
print(dishes_name2)
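A label slice with loc includes its end point, whereas iloc and plain bracket slices exclude it. The sketch below compares the three on a hypothetical ten-row frame (placeholder values):

```python
import pandas as pd

# Hypothetical ten-row frame with the default integer index 0..9.
demo = pd.DataFrame({
    "order_id": list(range(100, 110)),
    "dishes_name": [f"dish {i}" for i in range(10)],
})

with_loc = demo.loc[1:6, ["order_id", "dishes_name"]]    # labels 1..6, end included
with_iloc = demo.iloc[1:6, [0, 1]]                       # positions 1..5, end excluded
with_brackets = demo[["order_id", "dishes_name"]][1:6]   # positions 1..5, end excluded

print(len(with_loc), len(with_iloc), len(with_brackets))  # 6 5 5
```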
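Using iloc
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
print(detail.values)
orderDish = detail.iloc[1,[1,5]]
# Row at position 1 (the second row), columns at positions 1 and 5 (order_id and dishes_name).
# Positions are zero-based and refer to the DataFrame, not to the Excel sheet,
# because the DataFrame has its own index and the sheet's header row became the column names.
print("orderDish via iloc \n",orderDish)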
Multiple rows and columns with iloc
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
orderDish = detail.iloc[2:7,[1,3]]
print("orderDish via iloc \n",orderDish)
Conditional slicing with a single condition
The dishes_name values for order_id 458
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
orderDish = detail.loc[detail['order_id']==458, ['order_id','dishes_name']]
print(orderDish)
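The condition inside loc is itself a Series of True/False values, one per row, and loc keeps the rows where it is True. A minimal sketch on a hypothetical four-row frame (values are placeholders):

```python
import pandas as pd

# Hypothetical frame; only the order_id values matter for the demonstration.
demo = pd.DataFrame({
    "order_id": [458, 458, 417, 774],
    "dishes_name": ["dish A", "dish B", "dish C", "dish D"],
})

mask = demo["order_id"] == 458                         # boolean Series, one entry per row
selected = demo.loc[mask, ["order_id", "dishes_name"]]

print(mask.tolist())
print(selected)
```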
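Comparing ix with iloc and loc
# Compare ix with iloc and loc
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
dishes_name_loc = detail.loc[2:6, 'dishes_name'] # labels 2 to 6, end included
print(dishes_name_loc)
dishes_name_iloc = detail.iloc[2:6, 5] # positions 2 to 5, end excluded
print(dishes_name_iloc)
# .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the two lines below only run on older versions
# dishes_name_ix = detail.ix[2:6, 5]
# print(dishes_name_ix)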
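On current pandas, where `.ix` is no longer available, mixing a row label range with a column position can be expressed by converting one side first. The following is a sketch of two possible replacements on a hypothetical eight-row frame, not the only way to do it:

```python
import pandas as pd

# Hypothetical eight-row frame with the default integer index 0..7.
demo = pd.DataFrame({
    "order_id": list(range(100, 108)),
    "dishes_name": [f"dish {i}" for i in range(8)],
})

# Row labels plus a column position: turn the position into a label, then use loc.
rows_by_label = demo.loc[2:6, demo.columns[1]]                          # labels 2..6
# Row positions plus a column label: turn the label into a position, then use iloc.
rows_by_position = demo.iloc[2:6, demo.columns.get_loc("dishes_name")]  # positions 2..5

print(len(rows_by_label), len(rows_by_position))
```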
Modifying data
Change every order_id of 458 to 45800
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
detail.loc[detail['order_id']==458, 'order_id'] = 45800
print(detail.loc[detail['order_id']==458, 'order_id'])
print('/')
print(detail.loc[detail['order_id']==45800, 'order_id'])
Adding columns
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
detail['payment'] = detail['counts'] * detail['amounts']
print(detail['payment'])
detail['pay_way'] = 'cash'
print(detail['pay_way'])
Deleting data
Deleting a column
print(detail.columns)
detail.drop(labels='pay_way', axis=1, inplace=True)
print(detail.columns)
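Understanding the inplace parameter:
When modifying an object:
inplace=True: no new object is created; the original object is modified directly and the method returns None.
inplace=False (the default): the original is left unchanged; a new object holding the result is created and returned.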
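The two modes can be checked side by side. A minimal sketch with a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame with a column to drop.
demo = pd.DataFrame({"counts": [1, 2], "pay_way": ["cash", "cash"]})

# inplace=False (the default): demo is untouched, the result is a new DataFrame.
copy_without = demo.drop(labels="pay_way", axis=1)
still_there = "pay_way" in demo.columns

# inplace=True: demo itself changes and the call returns None.
returned = demo.drop(labels="pay_way", axis=1, inplace=True)

print(still_there, list(copy_without.columns))
print(returned, list(demo.columns))
```

A common slip is writing `demo = demo.drop(..., inplace=True)`, which rebinds the name to None.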
Deleting rows
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
print(len(detail))
detail.drop(labels=range(1,11), axis=0, inplace=True)
print(len(detail))
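Dropping rows by label leaves a gap in the index: the remaining rows keep their old labels. If contiguous labels are wanted afterwards, reset_index(drop=True) is one option. A sketch on a hypothetical fifteen-row frame:

```python
import pandas as pd

# Hypothetical fifteen-row frame with the default integer index 0..14.
demo = pd.DataFrame({"order_id": list(range(100, 115))})

trimmed = demo.drop(labels=range(1, 11), axis=0)   # removes labels 1..10
print(len(demo), len(trimmed))
print(trimmed.index.tolist())                      # old labels are kept

renumbered = trimmed.reset_index(drop=True)        # fresh labels 0..n-1
print(renumbered.index.tolist())
```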
Descriptive statistics of a DataFrame
Computing a mean
import numpy as np
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
print("Mean via numpy:",np.mean(detail['amounts']))
print('Mean via pandas:',detail['amounts'].mean())
print(detail[['amounts','counts']].describe())
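Both ways of computing the mean should agree, and describe() reports the mean alongside the count, standard deviation, minimum, quartiles and maximum for each numeric column. A minimal sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns standing in for amounts and counts.
demo = pd.DataFrame({"amounts": [10, 20, 30, 40], "counts": [1, 1, 2, 1]})

np_mean = np.mean(demo["amounts"])
pd_mean = demo["amounts"].mean()
summary = demo[["amounts", "counts"]].describe()

print(np_mean, pd_mean)
print(summary.index.tolist())
print(summary.loc["mean", "amounts"])
```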
Counting frequencies
detail = pd.read_excel(r"E:\pycharm\work\cookbook\day6\data\meal_order_detail.xlsx")
print(detail['dishes_name'].value_counts())
print(detail['dishes_name'].value_counts()[0:10])
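value_counts() returns a Series indexed by the distinct values and sorted from most to least frequent, so taking the first entries gives the most common ones. A sketch on a hypothetical list of dish names; head() is used here as a more explicit alternative to the `[0:10]` slice:

```python
import pandas as pd

# Hypothetical dish names with distinct frequencies.
names = pd.Series(["rice", "rice", "soup", "rice", "tea", "soup"])

counts = names.value_counts()   # sorted by frequency, highest first
top2 = counts.head(2)           # the two most frequent values

print(counts.index.tolist(), counts.tolist())
print(top2.index.tolist())
```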