Data Preprocessing for Machine Learning: Reading Excel Data with Pandas

Applying Pandas-Based Excel Data Preprocessing in Machine Learning


License: CC 4.0 BY-SA

Article tags:

#Python #Pandas #MachineLearning #DataPreprocessing

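This article describes how to use Python's Pandas library to preprocess Excel data, with machine learning requirements in mind. Pandas provides the read_excel() function for reading Excel files, and it accepts a range of parameters such as sheet_name, header, names, index_col, and usecols. Worked examples show how to read and write Excel data, select specific columns, and convert formats so that the data meets the numeric-format requirements of machine learning.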

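Python has many libraries for reading and writing Excel files, the most familiar being xlrd, xlwt, xlutils, and openpyxl. xlrd and xlwt are usually used together, one for reading and the other for writing. Combining xlutils with xlrd makes it possible to modify an existing Excel file. openpyxl can both read and write Excel files.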

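When it comes to data preprocessing, however, pandas shows its strength. It can read and write many file formats, Excel among them. This article focuses on using pandas to preprocess Excel datasets.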

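Commonly used machine learning models place requirements on their input data. The most basic requirement of most algorithms is that the training data be converted to numeric form. Some algorithms, such as decision trees, do not need this conversion, but such special cases are not discussed here.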
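As a minimal sketch of what converting data to numeric form can look like in pandas, the snippet below turns two text columns into numbers. The column names, the values, and the choice of encodings are assumptions made for illustration, not something prescribed by the article.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["Beijing", "Shanghai", "Beijing"],
    "level": ["low", "high", "medium"],
})

# Ordered category -> integer codes that respect low < medium < high.
level_order = ["low", "medium", "high"]
df["level"] = pd.Categorical(
    df["level"], categories=level_order, ordered=True
).codes

# Unordered category -> one-hot indicator columns (0/1 integers).
features = pd.get_dummies(df, columns=["city"], dtype=int)

print(features)
```

Integer codes suit categories with a natural order, while one-hot columns avoid implying an order among values such as city names.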

The pandas function for reading Excel files is pandas.read_excel(). Its main parameters include:

io : the location of the Excel file to read

string, path object (pathlib.Path or py._path.local.LocalPath), file-like object, pandas ExcelFile, or xlrd workbook.

The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/workbook.xlsx
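The different forms accepted by io can be sketched as follows. This example assumes that pandas and the openpyxl engine are installed. The file name workbook.xlsx and the sample data are hypothetical, and the file is written to a temporary directory first so that the snippet is self-contained.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data written to a temporary .xlsx file (needs openpyxl).
sample = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})
xlsx_path = Path(tempfile.mkdtemp()) / "workbook.xlsx"
sample.to_excel(xlsx_path, index=False)

# 1) io as a plain string path
from_str = pd.read_excel(str(xlsx_path))

# 2) io as a pathlib.Path object
from_path = pd.read_excel(xlsx_path)

# 3) io as a file-like object (here an in-memory bytes buffer)
buffer = io.BytesIO(xlsx_path.read_bytes())
from_buffer = pd.read_excel(buffer)

# 4) io as a pandas ExcelFile, handy when reading several sheets from one file
with pd.ExcelFile(xlsx_path) as xls:
    from_excel_file = pd.read_excel(xls)

print(from_str)
```

All four calls should yield the same DataFrame; which form to use mostly depends on where the workbook comes from.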

sheet_name : the sheet (or sheets) of the Excel file to read

string, int, mixed list of strings/ints, or None, default 0

Strings are used for sheet names; integers are used for zero-indexed sheet positions.

Lists of strings/integers are used to request multiple sheets.

Specify None to get all sheets.

str|int -> a DataFrame is returned. list|None -> a dict of DataFrames is returned, with keys representing sheets.
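The return types described for sheet_name can be checked with a small two-sheet workbook. This is a sketch that assumes pandas and openpyxl are installed. The sheet names "train" and "test", the file name, and the data are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical two-sheet workbook written to a temporary directory.
train = pd.DataFrame({"feature": [1, 2], "label": [0, 1]})
test = pd.DataFrame({"feature": [3, 4], "label": [1, 0]})
xlsx_path = Path(tempfile.mkdtemp()) / "dataset.xlsx"

with pd.ExcelWriter(xlsx_path, engine="openpyxl") as writer:
    train.to_excel(writer, sheet_name="train", index=False)
    test.to_excel(writer, sheet_name="test", index=False)

# Default sheet_name=0 -> the first sheet, returned as a DataFrame.
first = pd.read_excel(xlsx_path)

# A string selects a sheet by name -> DataFrame.
by_name = pd.read_excel(xlsx_path, sheet_name="test")

# A list (positions and names may be mixed) -> dict keyed by the requested items.
several = pd.read_excel(xlsx_path, sheet_name=[0, "test"])

# None -> dict of every sheet, keyed by sheet name.
all_sheets = pd.read_excel(xlsx_path, sheet_name=None)

print(type(first).__name__, type(several).__name__, list(all_sheets))
```

Note that when a list is passed, the dict keys are the items as requested (here the integer 0 and the string "test"), not necessarily the sheet names.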

Available Cases
