『Python』Reading Excel Files and Related DataFrame Operations (3)

Python捞数人

Category: Python. Tags: python, data mining, pycharm

Article link: https://blog.csdn.net/m0_47149835/article/details/124851147

Environment:
   pandas: 1.2.2
   xlwings: 0.22.2

1. Get all existing sheets in an Excel file

# Pandas / Method 1
import pandas as pd
file_path = r'...'
sht_names = pd.ExcelFile(file_path).sheet_names # returns a list of all sheet names

# xlwings / Method 2
import xlwings as xw
app = xw.App(visible=False, add_book=False)
app.display_alerts = False
app.screen_updating = False
wb_source = app.books.open(file_path)
sht_names = [sht.name for sht in wb_source.sheets]
wb_source.close() # close the workbook that was opened above
app.quit()

2. Read the sheet at a given position in an Excel file

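# Read the second sheet in the Excel file (sheet indexes start at 0)
# Pandas / Method 1  -->  returns a DataFrame
df = pd.read_excel(file_path, sheet_name=1)


# xlwings / Method 2  -->  get the Sheet object, then convert its used range to a DataFrame
app = xw.App(visible=False, add_book=False)
app.display_alerts = False
app.screen_updating = False
wb_source = app.books.open(file_path)
sht = wb_source.sheets[1]
# .options(...) only configures the conversion; .value performs it and returns the DataFrame
df = sht.used_range.options(pd.DataFrame, index=False, header=True).value
wb_source.close()
app.quit()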

3. Read the sheet with a given name in an Excel file

# Read the sheet named playing
# Pandas / Method 1
df = pd.read_excel(file_path, sheet_name='playing')

# xlwings / Method 2
app = xw.App(visible=False, add_book=False)
app.display_alerts = False
app.screen_updating = False
wb_source = app.books.open(file_path)
sht = wb_source.sheets['playing']
# .options(...) only configures the conversion; .value performs it and returns the DataFrame
df = sht.used_range.options(pd.DataFrame, index=False, header=True).value
wb_source.close()
app.quit()

4. Read the sheets whose names match a keyword

Combining the re module or the built-in filter function with the approach from section 1 makes this straightforward.

import re  # only needed for regex-based rules; the filter version below does not use it
sheet_list = pd.ExcelFile(file_path).sheet_names
print(sheet_list)
"""
The Excel file used for this demo contains the following sheets
['Sheet_for_show',
 '2022_05_21 Testing (2)',
 'Testing 2022_05_21 ',
 'data',
 'picture_072']
"""
needed_sht = list(filter(lambda x: '2022' in x, sheet_list))
print(needed_sht)
"""
['2022_05_21 Testing (2)', 'Testing 2022_05_21 ']
"""
# Returns a dict: each key is a sheet name, each value is the DataFrame read from that sheet
df_dict = pd.read_excel(file_path, sheet_name=needed_sht)
# For more complex matching rules, consider using the re module to pick out the sheets you need
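
For the regex-based selection mentioned above, here is a minimal sketch. It reuses the sheet names from the demo workbook as a hard-coded list so that it runs without an Excel file. The rule itself (the name must start with a date in YYYY_MM_DD form) and the name `date_prefix` are assumptions chosen for illustration, not something taken from the original article.

```python
import re

# Sheet names copied from the demo workbook listed above
sheet_list = ['Sheet_for_show',
              '2022_05_21 Testing (2)',
              'Testing 2022_05_21 ',
              'data',
              'picture_072']

# Hypothetical rule: the sheet name must start with a date such as 2022_05_21
date_prefix = re.compile(r'^\d{4}_\d{2}_\d{2}')

# Keep only the names that match the pattern at the start of the string
needed_sht = [name for name in sheet_list if date_prefix.match(name)]
print(needed_sht)  # ['2022_05_21 Testing (2)']
```

The resulting list can be passed to `pd.read_excel(file_path, sheet_name=needed_sht)` in the same way as the list produced by `filter`.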

5. Build a DataFrame from a 2D list

value_list = [['a', 'csa', 'asaxa', 'xcazxa'], ['xcasz', 'axawax', 'xcegea', 'xzzxnakj']]
col_name = ['col_a', 'col_b', 'col_c', 'col_d']
df = pd.DataFrame(value_list, columns=col_name)
"""
   col_a   col_b   col_c     col_d
0      a     csa   asaxa    xcazxa
1  xcasz  axawax  xcegea  xzzxnakj
"""

6. Change the dtype of several columns at once

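import numpy as np

# Before reading: set the dtypes while loading the file
df = pd.read_excel(file_path, dtype={'column_1': np.float64, 'column_2': np.int32})

# After reading: astype returns a new DataFrame, so assign the result
df = df.astype({'column_1': 'float', 'column_2': 'int'})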

7. Check whether several values all exist in a column

df = pd.read_excel(file_path)
# column is the name of the column to check
column_values = df[column].tolist()

# Check whether 'test_1' and 'test_2' are both present in that column
check_list = ['test_1', 'test_2']
all(judge in column_values for judge in check_list)
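
The same membership check can also be written with set operations, which additionally makes it easy to report which values are missing. The sketch below is only an illustration: the list `column_values` is a made-up stand-in for `df[column].tolist()`, and `other_checks`, `all_present` and `missing` are hypothetical names.

```python
# Stand-in for df[column].tolist(); the values are made up for illustration
column_values = ['test_1', 'test_3', 'test_2', 'test_1']
check_list = ['test_1', 'test_2']

# True only if every value in check_list occurs in the column
all_present = set(check_list).issubset(column_values)
print(all_present)  # True

# A second, hypothetical check list with one value that is not in the column
other_checks = ['test_2', 'test_9']
seen = set(column_values)  # build the set once so each lookup is fast
missing = [value for value in other_checks if value not in seen]
print(missing)  # ['test_9']
```

Converting the column to a set first avoids scanning the whole list for every value being checked, which may matter for long columns.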