A Detailed Guide to xlrd, a Python Library for Reading Excel Files

ftzchina

Last modified 2022-10-06 16:18:15


Category: Python. Tags: python, xlrd, Excel

First published 2022-05-16 15:15:43

Article link: https://blog.csdn.net/qq_27071221/article/details/124798545


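At work I recently needed to extract content from Excel files. There was too much data to handle by hand, so I turned to xlrd, a third-party Python library for reading Excel files.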

1. The sample Excel file

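First, prepare an Excel file to use in the demonstrations that follow. The first sheet of the demo workbook, named 景点 ("scenic spots"), has two columns, ID and NAME: a header row followed by six rows of place names (夫子庙, 中山陵, and so on).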

2. Working with Excel

2.1 Opening an Excel file

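import xlrd

# Note: xlrd 2.0 and later only read legacy .xls files; opening an .xlsx file
# raises XLRDError. Use an .xls file, or xlrd 1.2.0 if you must read .xlsx.
path = "D:\\演示\\test.xlsx"
testNJ = xlrd.open_workbook(path)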

2.2 Getting the names of all sheets in the workbook

testNJ.sheet_names()

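This returns a list containing the name of every sheet in the workbook, in order. In the demo file the first sheet is 景点.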

2.3 Getting a worksheet object

>>> table = testNJ.sheets()                # method 1: list of all sheets
>>> table
[<xlrd.sheet.Sheet object at 0x0000000002FF4390>, <xlrd.sheet.Sheet object at 0x000000000303D7B8>]
>>> table[0]
<xlrd.sheet.Sheet object at 0x0000000002FF4390>
>>>
>>> table = testNJ.sheet_by_index(0)        # method 2: by zero-based index
>>> table
<xlrd.sheet.Sheet object at 0x0000000002FF4390>
>>>
>>> table = testNJ.sheet_by_name("景点")      # method 3: by sheet name
>>> table
<xlrd.sheet.Sheet object at 0x0000000002FF4390>

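Any of these three methods returns a worksheet object, and the identical addresses show that all three return the same first sheet here. The third, sheet_by_name, is the most convenient and produces the most readable code, though it raises xlrd's XLRDError if no sheet has the given name.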
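If a script has to accept either a sheet name or a position, the methods above can be wrapped in one small helper. This is a sketch: `get_sheet` is a hypothetical name, not part of xlrd, and it relies only on `Book.sheet_by_index`, `Book.sheet_by_name` and `Book.sheet_names`. When a name is not found, it lists the sheets that do exist.

```python
def get_sheet(book, key):
    """Return a sheet from an xlrd Book by index (int) or by name (str).

    get_sheet is a hypothetical helper, not part of xlrd.
    """
    if isinstance(key, int):
        # an out-of-range index is left for xlrd to report
        return book.sheet_by_index(key)
    names = book.sheet_names()
    if key not in names:
        # give a clearer message than "No sheet named <...>"
        raise KeyError(f"no sheet named {key!r}; available sheets: {names}")
    return book.sheet_by_name(key)
```

Usage: `table = get_sheet(testNJ, "景点")` or `table = get_sheet(testNJ, 0)`.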

2.4 Getting the number of rows and columns in a sheet

>>> table.nrows
7
>>> table.ncols
2
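Because `nrows` and `ncols` give the extent of the sheet, they are the natural bounds for visiting every cell. A minimal sketch, assuming `table` is a sheet obtained as in 2.3 and the workbook was opened with the default `ragged_rows=False` (so every row is padded to `ncols`); `iter_cells` is a hypothetical helper name.

```python
def iter_cells(sheet):
    """Yield (row_index, col_index, value) for every cell in an xlrd sheet."""
    for r in range(sheet.nrows):
        for c in range(sheet.ncols):
            # cell_value takes zero-based indexes; empty cells come back as ''
            yield r, c, sheet.cell_value(r, c)
```

Usage: `for r, c, value in iter_cells(table): print(r, c, value)`.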

2.5 Getting the values of a whole row or column

table.row_values(num1)
table.col_values(num2)

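Note the two arguments, num1 and num2, which are both zero-based indexes. In row_values(), num1 is the index of the row to read: to read the first row, which holds the field names, pass 0. Likewise, num2 in col_values() is the index of the column. Both methods return a list.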

>>> table.row_values(0)
['ID', 'NAME']
>>>
>>> table.row_values(1)
[1.0, '夫子庙']
>>>
>>>
>>> table.col_values(1)
['NAME', '夫子庙', '中山陵', '纪念馆', '玄武湖', '老山', '汤山']
>>>
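Since row 0 holds the field names, `row_values(0)` can supply the keys for the remaining rows. Note in the output above that the ID `1` comes back as `1.0`, because xlrd returns every numeric cell as a Python float. The sketch below uses a hypothetical helper, `rows_as_dicts`, to pair each data row with the header. It also turns whole-number floats back into ints, which assumes the caller wants that. Date cells are stored as floats as well, so a real script should check cell types before converting.

```python
def rows_as_dicts(sheet, header_row=0):
    """Map each data row of an xlrd sheet to a {header: value} dict."""
    headers = sheet.row_values(header_row)
    records = []
    for r in range(header_row + 1, sheet.nrows):
        values = sheet.row_values(r)
        # xlrd returns numbers as float; turn whole numbers such as 1.0 back into int
        values = [int(v) if isinstance(v, float) and v.is_integer() else v for v in values]
        records.append(dict(zip(headers, values)))
    return records
```

For the demo sheet, the first record would be `{'ID': 1, 'NAME': '夫子庙'}`.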

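You can also choose where to start. For example, the following reads the second column (colx=1) starting from its third row (start_rowx=2). An optional end_rowx (exclusive) sets where reading stops, and row_values() accepts start_colx and end_colx in the same way.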

>>> table.col_values(colx=1,start_rowx=2)
['中山陵', '纪念馆', '玄武湖', '老山', '汤山']
>>>

2.6 Getting a cell object and its value

Get the cell object:
>>> table.row(1)[1]
text:'夫子庙'
>>> table.cell(1,1)
text:'夫子庙'
>>> table.col(1)[1]
text:'夫子庙'
>>>
Get the cell value:
>>> table.cell_value(1,1)
'夫子庙'

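For example, to get the object for the cell containing "夫子庙", any of the three methods above works. Each returns a Cell object, whose representation text:'夫子庙' shows the cell type and the value. To get the value itself, call cell_value(), or read the object's .value attribute (for example table.cell(1,1).value).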
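Besides `value`, a `Cell` object has a `ctype` attribute that says how to interpret the value. The sketch below assumes the type codes from xlrd's documentation (0 empty, 1 text, 2 number, 3 date, 4 boolean, 5 error, 6 blank, also available as the `xlrd.XL_CELL_*` constants). `to_python` is a hypothetical helper. Date cells are passed through as raw float serials, because converting them needs the workbook's `datemode` (for example via `xlrd.xldate_as_datetime(value, book.datemode)`).

```python
# Cell type codes as documented for xlrd (also exposed as xlrd.XL_CELL_* constants)
XL_CELL_EMPTY, XL_CELL_TEXT, XL_CELL_NUMBER, XL_CELL_DATE = 0, 1, 2, 3
XL_CELL_BOOLEAN, XL_CELL_ERROR, XL_CELL_BLANK = 4, 5, 6


def to_python(cell):
    """Convert an xlrd Cell (anything with .ctype and .value) to a plain Python value."""
    if cell.ctype in (XL_CELL_EMPTY, XL_CELL_BLANK):
        return None
    if cell.ctype == XL_CELL_BOOLEAN:
        return bool(cell.value)  # stored as 1 or 0
    if cell.ctype == XL_CELL_NUMBER and float(cell.value).is_integer():
        return int(cell.value)  # e.g. the ID 1.0 becomes 1
    if cell.ctype == XL_CELL_ERROR:
        raise ValueError(f"cell holds Excel error code {cell.value}")
    # text stays str; dates stay as the raw float serial (convert with the book's datemode)
    return cell.value
```

Usage: `to_python(table.cell(1, 0))` for the ID of the first data row.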

2.7 Getting the font color of a cell

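import xlrd

# formatting_info=True is required for format/font information,
# and xlrd only supports it for .xls files
gongDan = xlrd.open_workbook("daemon.xls", formatting_info=True)

gongDanTb = gongDan.sheet_by_index(0)

# index of the XF (extended format) record applied to cell (0, 0)
xfx = gongDanTb.cell_xf_index(0, 0)

# the XF record itself
xf = gongDan.xf_list[xfx]

# colour_index of the font used by that XF (an index into the palette)
print(gongDan.font_list[xf.font_index].colour_index)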
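The value printed above is a palette index, not a color. `Book.colour_map` maps palette indexes to `(red, green, blue)` tuples, and maps special indexes such as the automatic color to `None`, so the lookup can be finished as below. This is a sketch: `cell_font_rgb` is a hypothetical helper name, and it assumes a book opened with `formatting_info=True`.

```python
def cell_font_rgb(book, sheet, rowx, colx):
    """Return the (R, G, B) font color of one cell, or None if it has no palette RGB.

    The book must be opened with formatting_info=True (.xls files only).
    """
    xf = book.xf_list[sheet.cell_xf_index(rowx, colx)]
    font = book.font_list[xf.font_index]
    # colour_map maps palette indexes to (R, G, B) tuples; special indexes map to None
    return book.colour_map.get(font.colour_index)
```

Usage: `cell_font_rgb(gongDan, gongDanTb, 0, 0)`.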

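2.8 Getting the value of a merged cell

In a merged range, only the top-left cell stores the value, and the other cells in the range read as empty. The methods below check whether an empty cell belongs to a merged range and, if so, return the value of that range's top-left cell.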

import xlrd


class MergedCellReader:  # hypothetical class name; the original shows only the methods
    def __init__(self, excelPath):
        self.excelPath = excelPath

    def readExcel(self):
        # formatting_info=True is required, otherwise merged_cells stays empty (.xls only)
        self.gongDan = xlrd.open_workbook(self.excelPath, formatting_info=True)
        self.gongDanTable = self.gongDan.sheet_by_index(0)

    def get_merged_cells(self):
        """
        Return all merged ranges, for example:
        [(4, 5, 2, 4), (5, 6, 2, 4), (1, 4, 3, 4)]
        (4, 5, 2, 4) means rows from index 4 up to 5 (exclusive) and
        columns from index 2 up to 4 (exclusive) form one merged cell.
        """
        return self.gongDanTable.merged_cells

    def get_merged_cells_value(self, merged, row_index, col_index):
        """
        If the given cell lies inside a merged range, return the value
        stored in that range's top-left cell; otherwise return ''.
        """
        for (rlow, rhigh, clow, chigh) in merged:
            if rlow <= row_index < rhigh and clow <= col_index < chigh:
                return self.gongDanTable.cell_value(rlow, clow)
        return ''

    def collectLineInfo(self):
        rows_num = self.gongDanTable.nrows  # number of rows
        cols_num = self.gongDanTable.ncols  # number of columns
        merged = self.get_merged_cells()  # all merged ranges
        rows = []
        for r in range(rows_num):
            row = []
            for c in range(cols_num):
                cell_value = self.gongDanTable.cell_value(r, c)
                if cell_value == '':
                    # empty cell: fall back to the merged range's value, if any
                    cell_value = self.get_merged_cells_value(merged, r, c)
                row.append(cell_value)
            rows.append(row)
        return rows


reader = MergedCellReader("daemon.xls")
reader.readExcel()
print(reader.collectLineInfo())
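`get_merged_cells_value` scans every merged range for each empty cell. For sheets with many merged ranges, an alternative technique (not the one used above) is to expand the ranges once into a dictionary keyed by `(row, col)`. A sketch, assuming a sheet from a workbook opened with `formatting_info=True`; `merged_value_map` and `read_with_merges` are hypothetical names.

```python
def merged_value_map(sheet):
    """Map every (row, col) inside a merged range to the value of its top-left cell."""
    lookup = {}
    for rlow, rhigh, clow, chigh in sheet.merged_cells:
        value = sheet.cell_value(rlow, clow)  # only the top-left cell stores the value
        for r in range(rlow, rhigh):  # rhigh and chigh are exclusive
            for c in range(clow, chigh):
                lookup[(r, c)] = value
    return lookup


def read_with_merges(sheet):
    """Return all rows as lists, with cells of merged ranges filled in."""
    lookup = merged_value_map(sheet)
    return [
        [lookup.get((r, c), sheet.cell_value(r, c)) for c in range(sheet.ncols)]
        for r in range(sheet.nrows)
    ]
```

The dictionary costs memory proportional to the number of merged cells, but each lookup afterwards is constant time instead of a scan over all ranges.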

3. Summary

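The operations above are only a few common ones on worksheet objects; xlrd supports many more methods and attributes. dir() lists them, so you can look up whatever your task needs: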

>>> dir(table)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
 '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
 '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__',
 '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',
 '__str__', '__subclasshook__', '__weakref__', '_cell_attr_to_xfx',
 '_cell_types', '_cell_values', '_cell_xf_indexes', '_dimncols', '_dimnrows',
 '_first_full_rowx', '_ixfe', '_maxdatacolx', '_maxdatarowx', '_position',
 '_repr_these', '_xf_index_stats', '_xf_index_to_xl_type_map',
 'automatic_grid_line_colour', 'bf', 'biff_version', 'book', 'bt',
 'cached_normal_view_mag_factor', 'cached_page_break_preview_mag_factor',
 'cell', 'cell_note_map', 'cell_type', 'cell_value', 'cell_xf_index', 'col',
 'col_label_ranges', 'col_slice', 'col_types', 'col_values', 'colinfo_map',
 'columns_from_right_to_left', 'computed_column_width',
 'cooked_normal_view_mag_factor', 'cooked_page_break_preview_mag_factor',
 'default_additional_space_above', 'default_additional_space_below',
 'default_row_height', 'default_row_height_mismatch', 'default_row_hidden',
 'defcolwidth', 'dump', 'fake_XF_from_BIFF20_cell_attr', 'first_visible_colx',
 'first_visible_rowx', 'fixed_BIFF2_xfindex', 'formatting_info', 'gcw',
 'get_rows', 'gridline_colour_index', 'gridline_colour_rgb', 'handle_feat11',
 'handle_hlink', 'handle_msodrawingetc', 'handle_note', 'handle_obj',
 'handle_quicktip', 'handle_txo', 'has_pane_record', 'horizontal_page_breaks',
 'horz_split_first_visible', 'horz_split_pos', 'hyperlink_list',
 'hyperlink_map', 'insert_new_BIFF20_xf', 'logfile', 'merged_cells', 'name',
 'ncols', 'nrows', 'number', 'panes_are_frozen', 'put_cell', 'put_cell_ragged',
 'put_cell_unragged', 'ragged_rows', 'read',
 'remove_splits_if_pane_freeze_is_removed', 'req_fmt_info',
 'rich_text_runlist_map', 'row', 'row_label_ranges', 'row_len', 'row_slice',
 'row_types', 'row_values', 'rowinfo_map', 'scl_mag_factor', 'sheet_selected',
 'sheet_visible', 'show_formulas', 'show_grid_lines',
 'show_in_page_break_preview', 'show_outline_symbols', 'show_sheet_headers',
 'show_zero_values', 'split_active_pane', 'standardwidth',
 'string_record_contents', 'tidy_dimensions', 'update_cooked_mag_factors',
 'utter_max_cols', 'utter_max_rows', 'verbosity', 'vert_split_first_visible',
 'vert_split_pos', 'vertical_page_breaks', 'visibility']
>>>
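The list from `dir()` mixes dunder and private names in with the public API. A short filter keeps only the public names, and `help(table.col_values)` then shows the docstring of a particular method. This is a sketch: `public_names` is a hypothetical helper that works on any object, including `table` and `testNJ`.

```python
def public_names(obj, callables_only=False):
    """Return the names from dir(obj) that do not start with an underscore."""
    names = [n for n in dir(obj) if not n.startswith("_")]
    if callables_only:
        # keep only methods and other callables
        names = [n for n in names if callable(getattr(obj, n))]
    return names
```

Usage: `public_names(table, callables_only=True)` narrows the long list above to the sheet's public methods.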

Likewise, the opened workbook object itself has many attributes and methods:

>>>
>>> dir(testNJ)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__',
 '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__',
 '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__',
 '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
 '__sizeof__', '__str__', '__subclasshook__', '__weakref__',
 '_all_sheets_count', '_all_sheets_map', '_externsheet_info',
 '_externsheet_type_b57', '_extnsht_count', '_extnsht_name_from_num',
 '_repr_these', '_resources_released', '_rich_text_runlist_map',
 '_sh_abs_posn', '_sharedstrings', '_sheet_list', '_sheet_names',
 '_sheet_num_from_name', '_sheet_visibility', '_sheethdr_count',
 '_supbook_addins_inx', '_supbook_count', '_supbook_locals_inx',
 '_supbook_types', '_xf_epilogue_done', '_xf_index_to_xl_type_map',
 'actualfmtcount', 'addin_func_names', 'biff2_8_load', 'biff_version',
 'builtinfmtcount', 'codepage', 'colour_map', 'countries', 'datemode',
 'derive_encoding', 'dump', 'encoding', 'fake_globals_get_sheet', 'filestr',
 'font_list', 'format_list', 'format_map', 'formatting_info', 'get2bytes',
 'get_record_parts', 'get_record_parts_conditional', 'get_sheet', 'get_sheets',
 'getbof', 'handle_boundsheet', 'handle_builtinfmtcount', 'handle_codepage',
 'handle_country', 'handle_datemode', 'handle_externname',
 'handle_externsheet', 'handle_filepass', 'handle_name', 'handle_obj',
 'handle_sheethdr', 'handle_sheetsoffset', 'handle_sst', 'handle_supbook',
 'handle_writeaccess', 'initialise_format_info', 'load_time_stage_1',
 'load_time_stage_2', 'logfile', 'mem', 'name_and_scope_map', 'name_map',
 'name_obj_list', 'names_epilogue', 'nsheets', 'on_demand', 'palette_record',
 'parse_globals', 'props', 'ragged_rows', 'raw_user_name', 'read',
 'release_resources', 'sheet_by_index', 'sheet_by_name', 'sheet_loaded',
 'sheet_names', 'sheets', 'style_name_map', 'unload_sheet', 'use_mmap',
 'user_name', 'verbosity', 'xf_list', 'xfcount']
>>>
