1、excel
data = pd.read_excel(io,
sheetname=0, header=0,
skiprows=None, skip_footer=0,
index_col=None, names=None,
parse_cols=None, parse_dates=False,
date_parser=None, na_values=None,
thousands=None, convert_float=True,
has_index_names=None, converters=None,
dtype=None, true_values=None,
false_values=None, engine=None,
squeeze=False, **kwds)
常用参数解析:
io : string, path object ; excel 路径。
sheetname : string, int, mixed list of strings/ints, or None, default 0 返回多表使用sheetname=[0,1],若sheetname=None是返回全表 注意:int/string 返回的是dataframe,而none和list返回的是dict of dataframe
header : int, list of ints, default 0 指定列名行,默认0,即取第一行,数据为列名行以下的数据 若数据不含列名,则设定 header = None
skiprows : list-like,Rows to skip at the beginning,省略指定行数的数据
skip_footer : int,default 0, 省略从尾部数的int行数据
index_col : int, list of ints, default None指定列为索引列,也可以使用u”strings”
names : array-like, default None, 指定列的名字。
import numpy as np
import pandas as pd
io='C:/Users/23008/Desktop/AJPY/test.xls'
data = pd.read_excel(io)
#默认:第一个表sheet[0],列名为第一行header=0
print(type(data))
print(data.head())
print(data.tail())
<class 'pandas.core.frame.DataFrame'>
a 131 wevw 1
0 b 132.0 4 2
1 svv 133.0 w3 3
2 sv 134.0 43 4
3 vs 135.0 r4 5
4 fsv 136.0 NaN 6
a 131 wevw 1
19 bu 151.0 66 21
20 bye 152.0 67 22
21 yebrtu 153.0 68 23
22 ye 154.0 69 24
23 ey 155.0 70 25
data_1 = pd.read_excel(io,header=None,sheetname=1)
#第一行不做列名,读第二个工作表
print(data_1.head())
0 1
0 dsaa 33
1 758 34
2 58 35
3 58568 36
4 143 37
data_2= pd.read_excel(io,skiprows=[4],skip_footer=3)
#skiprows=[4]省略第五行,=4省略前五行
#skip_footer=3省略后三行,=[3]倒数第三行
print(data_2.head())
print(data_2.tail())
a 131 wevw 1
0 b 132.0 4 2
1 svv 133.0 w3 3
2 sv 134.0 43 4
3 fsv 136.0 NaN 6
4 svsvsv 137.0 52 7
a 131 wevw 1
15 yebrtu 148.0 63 18
16 bey 149.0 64 19
17 b5 NaN 65 20
18 bu 151.0 66 21
19 bye 152.0 67 22
data_3= pd.read_excel(io,header=None,names=["a","b","c","e"])
#name参数,指定列名
print(data_3.head())
a b c e
0 a 131.0 wevw 1
1 b 132.0 4 2
2 svv 133.0 w3 3
3 sv 134.0 43 4
4 vs 135.0 r4 5
data_4= pd.read_excel(io,index_col="a")
#index_col="a",指定a列为索引列
print(data_4.head())
131 wevw 1
a
b 132.0 4 2
svv 133.0 w3 3
sv 134.0 43 4
vs 135.0 r4 5
fsv 136.0 NaN 6
2、存储数据
to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
float_format=None,columns=None, header=True, index=True,
index_label=None,startrow=0, startcol=0, engine=None,
merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)
常用参数解析
- excel_writer : string or ExcelWriter object 目标路径
- sheet_name : string, default ‘Sheet1’ Name of sheet which will contain DataFrame,填充excel的第几页
- na_rep : string, default ”,Missing data representation 缺失值填充
- float_format : string, default None Format string for floating point numbers
- columns : sequence, optional,Columns to write 选择输出的的列。
- header : boolean or list of string, default True Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean, default True,Write row names (index)
- index_label : string or sequence, default None, Column label for index column(s) if desired. If None is given, andheader and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- startrow :upper left cell row to dump data frame
- startcol :upper left cell column to dump data frame
- engine : string, default None ,write engine to use - you can also set this via the options,io.excel.xlsx.writer, io.excel.xls.writer, andio.excel.xlsm.writer.
- merge_cells : boolean, default True Write MultiIndex and Hierarchical Rows as merged cells.
- encoding: string, default None encoding of the resulting excel file. Only necessary for xlwt,other writers support unicode natively.
- inf_rep : string, default ‘inf’ Representation for infinity (there is no native representation for infinity in Excel)
- freeze_panes : tuple of integer (length 2), default None Specifies the one-based bottommost row and rightmost column that is to be frozen
io='C:/Users/23008/Desktop/AJPY/test.xls'
io1='C:/Users/23008/Desktop/AJPY/test11.xls'
data = pd.read_excel(io)
data.to_excel(io1,sheet_name="sheet2",na_rep="NULL",header=["a","b","c","d"])
#存储到test11第二个工作表中,缺失值填充为“NULL”,列名为abcd
io1='C:/Users/23008/Desktop/AJPY/test11.xls'
io2='C:/Users/23008/Desktop/AJPY/test12.xls'
data = pd.read_excel(io1)
data.to_excel(io2,columns=['b','c'],index_label=["ID"])
#columns=['b','c']:只存bc两列
#行索引默认有,index_label=["ID"],行索引列名为ID
data.to_excel(io2,columns=['b','c'],index=None)
#index=None不要行索引,默认有