![d260cf08663bee981259a2b7905e6a67.png](https://i-blog.csdnimg.cn/blog_migrate/5ba406b867efc95b01fce83a9fe99f13.jpeg)
I. Reading and writing text files
i) read_csv() / read_table()
ii) to_csv()
iii) pd.read_excel()
iv) DataFrame.to_excel()
v) Opening large files (.csv)
II. Reading and writing JSON data
i) pd.read_json()
ii) Series.to_json() / DataFrame.to_json()
III. Reading data from a database
i) pd.read_sql()
I. Reading text-format data
The following pandas functions read tabular data into a DataFrame object; pd.read_csv() and pd.read_table() are by far the most commonly used.

| Function | Description |
| --- | --- |
| read_csv | Load a CSV file into a DataFrame. The default delimiter is the comma (,) |
| read_table | Load a general delimited file into a DataFrame. The default delimiter is the tab (\t) |
| read_fwf | Read data in fixed-width column format (i.e. no delimiters) |
| read_clipboard | Load the clipboard contents into a DataFrame |
i) read_csv() / read_table()
Apart from their default delimiters, these two methods take exactly the same parameters, so they are introduced together.
There are a great many parameters, so I will highlight only a few from my own experience, in order of importance:
read_csv() / read_table()
```python
read_table(filepath_or_buffer, sep='\t', delimiter=None, header='infer',
           names=None, index_col=None, usecols=None, squeeze=False,
           prefix=None, mangle_dupe_cols=True, dtype=None, engine=None,
           converters=None, true_values=None, false_values=None,
           skipinitialspace=False, skiprows=None, nrows=None, na_values=None,
           keep_default_na=True, na_filter=True, verbose=False,
           skip_blank_lines=True, parse_dates=False, infer_datetime_format=False,
           keep_date_col=False, date_parser=None, dayfirst=False, iterator=False,
           chunksize=None, compression='infer', thousands=None, decimal=b'.',
           lineterminator=None, quotechar='"', quoting=0, escapechar=None,
           comment=None, encoding=None, dialect=None, tupleize_cols=None,
           error_bad_lines=True, warn_bad_lines=True, skipfooter=0,
           doublequote=True, delim_whitespace=False, low_memory=True,
           memory_map=False, float_precision=None)
```
engine Parser engine to use, 'c' or 'python'. Default None. The C engine is faster,
    while the Python engine is more feature-complete.
    In my experience (e.g. with file paths containing Chinese characters), passing 'c'
    or leaving the default can fail with a file-initialization error, so just pass
    engine='python'.
encoding Encoding to use when reading/writing the file. String, default None.
    CSV files containing Chinese text are common; for those, 'utf-8-sig' or
    'utf_8_sig' is recommended.
names List of column-name strings to use. Array-like, default None.
    If the file contains no header row, you should explicitly pass header=None.
    Duplicate names in the list raise a UserWarning.
header Row number(s) to use as the column names. Int or list of ints, default 'infer'.
    The default behavior is to infer the column names:
    1. if no list of names is passed to the names parameter, it behaves like
       header=0, i.e. the first row becomes the column names;
    2. if a list of names is passed, it behaves like header=None.
index_col Column(s) to use as the row labels of the DataFrame.
    Int, str, sequence of int/str, or False, default None.
    An int refers to a column index (0-based), a str to a column name;
    if a sequence of strings or ints is passed, a MultiIndex is used.
parse_dates boolean or list of ints or names or list of lists or dict,
default False
* boolean. If True -> try parsing the index.
* list of ints or names. e.g. If [1, 2, 3] -> try parsing
columns 1, 2, 3 each as a separate date column.
* list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and
parse as a single date column.
* dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call
result 'foo'
If a column or index contains an unparseable date, the entire column
or index will be returned unaltered as an object data type.
For non-standard datetime parsing, use ``pd.to_datetime`` after
``pd.read_csv``
Note: A fast-path exists for iso8601-formatted dates.
chunksize Number of rows per chunk; used when opening large files. Int, default None.
nrows Number of rows of the file to read. Useful for reading pieces of large files
    (a very fast way to preview the first n rows of a big dataset). Int, default None.
usecols Return a subset of the columns.
list-like or callable, default None
If list-like, all elements must either be positional
(i.e. integer indices into the document columns) or strings that
correspond to column names provided either by the user in `names` or
inferred from the document header row(s). For example, a
valid list-like `usecols` parameter would be [0, 1, 2] or
['foo', 'bar', 'baz']. Element order is ignored, so ``usecols=[0, 1]``
is the same as ``[1, 0]``.
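Pulling a few of these parameters together, here is a minimal sketch on an in-memory CSV (the data is made up for illustration):

```python
import io

import pandas as pd

# A small CSV held in memory; in practice this would be a file path.
csv_text = (
    "date,open,close,volume\n"
    "2019-01-01,10.0,10.5,1000\n"
    "2019-01-02,10.5,10.2,1500\n"
    "2019-01-03,10.2,10.8,1200\n"
)

# header=0 takes the first row as column names (the default when names is absent);
# usecols keeps a subset of columns; index_col + parse_dates give a datetime row index.
df = pd.read_csv(io.StringIO(csv_text), header=0,
                 usecols=['date', 'close', 'volume'],
                 index_col='date', parse_dates=['date'])
print(df.shape)  # (3, 2)
```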
read_csv() and read_table() differ only in their default delimiter:
![3384bd4e4a661a850bb8c06c8967b5fc.png](https://i-blog.csdnimg.cn/blog_migrate/710fd31822fc5b17da8adb7fe8407903.jpeg)
With read_table(), much of the time we no longer need with open to read txt files:
![e8cf415cceff1190a71f342bac0b92f3.png](https://i-blog.csdnimg.cn/blog_migrate/0f3f68f18f57c6ace98dd7c0ee6e0a27.jpeg)
![c19262be41e9778986c8038772751b30.png](https://i-blog.csdnimg.cn/blog_migrate/b48bb83b7716c3ec4206e39ace3a524b.jpeg)
![f1a576d623d5d2913de0b4a00dce022d.png](https://i-blog.csdnimg.cn/blog_migrate/6ce56e59ec4100b774ba4e9348c8182e.jpeg)
An example:
The dataset looks like this:
![1e32698f95862807c8a95596da3a9ff5.png](https://i-blog.csdnimg.cn/blog_migrate/207865904df97415e6005960b9873b63.png)
Parsing it with with open:
```python
import json
from datetime import datetime

import pandas as pd

def data_processing(path):
    '''Process K-line (candlestick) data'''
    with open(path) as f:
        content = f.readlines()
    content = [json.loads(dic.rstrip()) for dic in content]
    t = [datetime.fromtimestamp(dic.get('id')) for dic in content]
    o = [dic.get('open') for dic in content]
    h = [dic.get('high') for dic in content]
    l = [dic.get('low') for dic in content]
    c = [dic.get('close') for dic in content]
    v = [dic.get('vol') for dic in content]
    df = pd.DataFrame({
        'time': t,
        'open': o,
        'high': h,
        'low': l,
        'close': c,
        'volume': v
    })
    df.set_index('time', inplace=True)
    return df
```
Parsing it with read_table():
```python
import json
from datetime import datetime

import pandas as pd

def data_processing(path):
    '''Process K-line (candlestick) data'''
    df = pd.read_table(path, engine='python', names=['records'])
    df['records'] = df['records'].map(lambda x: json.loads(x))
    df['timestamp'] = df['records'].map(lambda x: datetime.fromtimestamp(x.get('id')))
    df['open'] = df['records'].map(lambda x: x.get('open'))
    df['high'] = df['records'].map(lambda x: x.get('high'))
    df['low'] = df['records'].map(lambda x: x.get('low'))
    df['close'] = df['records'].map(lambda x: x.get('close'))
    df.drop('records', axis=1, inplace=True)
    df.set_index('timestamp', inplace=True)
    return df
```
A combined example:
![a841b12ebb76d7d80b08860a0420000c.png](https://i-blog.csdnimg.cn/blog_migrate/166a77fc68d75163c974c813d78d6538.jpeg)
![77ae5b396efe10e112a39c04dfaac47d.png](https://i-blog.csdnimg.cn/blog_migrate/e9e9bf263bbaee52ea134940def10b59.jpeg)
ii) to_csv()
This method writes a pandas object to a CSV file.
```python
to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True,
       index=True, index_label=None, mode='w', encoding=None, compression=None,
       quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=None,
       date_format=None, doublequote=True, escapechar=None, decimal='.')
```
Parameters
path_or_buf The save path. String or file handle, default None.
    File path or object; if None is provided, the result is returned as a string.
index Whether to keep the row index. Boolean, default True.
    Write row names (index).
encoding If the data contains Chinese text, encoding='utf_8_sig' is required.
    String, optional. The encoding to use in the output file; defaults to
    'ascii' on Python 2 and 'utf-8' on Python 3.
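A small sketch of the first two behaviors (path_or_buf=None returning a string, and index control); the encoding argument only matters when actually writing a file:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [90, 85]})

# With path_or_buf=None the CSV text is returned as a string instead of written out.
csv_string = df.to_csv(index=False)
print(csv_string)

# When writing a file that contains Chinese text, add encoding='utf_8_sig', e.g.:
# df.to_csv('scores.csv', index=False, encoding='utf_8_sig')
```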
Saving a DataFrame to a CSV file
![3e27cb17938e9946b31a7114546095c9.png](https://i-blog.csdnimg.cn/blog_migrate/ed2aeb4e1e808ca601bc20ceffb6a993.jpeg)
![3fb5d5001c1d38c557bd2cdb276b5e41.png](https://i-blog.csdnimg.cn/blog_migrate/270e37a831664f9ca3974bae40fbcfbd.jpeg)
Saving a Series to a CSV file
![a9109aa14e00ac33ee7f6dd23b382dd5.png](https://i-blog.csdnimg.cn/blog_migrate/43ed1a2ebcd7879af3eb7d3ba8cc23e2.jpeg)
![8f4facf5c0e5c03e58fb4d4a8973b098.png](https://i-blog.csdnimg.cn/blog_migrate/98ddd0e4953b86db4b641023fbaa2d0f.jpeg)
If the data contains Chinese text
![e3bf8c209f3e20474a0d8d6886a81b85.png](https://i-blog.csdnimg.cn/blog_migrate/0cb6a40323ac05de00522c31920bceb5.jpeg)
![a2910eadb278b916c80a4645cc887795.png](https://i-blog.csdnimg.cn/blog_migrate/52b85300681c8d919422eee027e76111.jpeg)
iii) pd.read_excel()
The parameters of read_excel() are largely the same as those of the two methods introduced above.
The differences:
- read_excel() has a sheet_name parameter for selecting the worksheet
- read_excel() needs neither engine nor encoding
- read_excel() has no chunksize parameter
- read_excel()'s usecols parameter cannot take a list of column-name strings (it accepts an int, a list of ints, or a string of Excel column letters/ranges)
```python
read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None,
           squeeze=False, dtype=None, engine=None, converters=None, true_values=None,
           false_values=None, skiprows=None, nrows=None, na_values=None, parse_dates=False,
           date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True,
           **kwds)
```
sheet_name string, int, mixed list of strings/ints, or None, default 0
Strings are used for sheet names
Integers are used in zero-indexed sheet positions.
Lists of strings/integers are used to request multiple sheets.
Specify None to get all sheets.
str|int -> DataFrame is returned.
list|None -> Dict of DataFrames is returned, with keys representing
sheets.
Available Cases
* Defaults to 0 -> 1st sheet as a DataFrame
* 1 -> 2nd sheet as a DataFrame
* "Sheet1" -> 1st sheet as a DataFrame
* [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames
* None -> All sheets as a dictionary of DataFrames
names : array-like, default None
List of column names to use. If file contains no header row,
then you should explicitly pass header=None
header int, list of ints, default 0
Row (0-indexed) to use for the column labels of the parsed
DataFrame. If a list of integers is passed those row positions will
be combined into a ``MultiIndex``. Use None if there is no header.
index_col int, list of ints, default None
Column (0-indexed) to use as the row labels of the DataFrame.
Pass None if there is no such column. If a list is passed,
those columns will be combined into a ``MultiIndex``. If a
subset of data is selected with ``usecols``, index_col
is based on the subset.
parse_dates bool, list-like, or dict, default False
    The behavior is as follows:
    * bool. If True -> try parsing the index.
    * list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3
      each as a separate date column.
    * list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as
      a single date column.
    * dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'
    If a column or index contains an unparseable date, the entire column or
    index will be returned unaltered as an object data type. For non-standard
    datetime parsing, use pd.to_datetime after pd.read_excel.
nrows int, default None
Number of rows to parse
usecols int or list, default None
* If None then parse all columns,
* If int then indicates last column to be parsed
* If list of ints then indicates list of column numbers to be parsed
* If string then indicates comma separated list of Excel column letters and
column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of
both sides.
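A round-trip sketch of these parameters (it assumes an Excel engine such as openpyxl or xlsxwriter is installed; the file name is made up):

```python
import os
import tempfile

import pandas as pd

# Write a small worksheet first so there is something to read back.
df_out = pd.DataFrame({'date': ['2019-01-01', '2019-01-02', '2019-01-03'],
                       'close': [10.5, 10.2, 10.8]})
path = os.path.join(tempfile.mkdtemp(), 'demo.xlsx')
df_out.to_excel(path, sheet_name='prices', index=False)

# sheet_name takes a name, a 0-based position, a list of either, or None (all sheets);
# usecols here is a string of Excel column letters, and nrows limits the parse.
df_in = pd.read_excel(path, sheet_name='prices', usecols='A:B', nrows=2)
print(df_in)
```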
Example 1
![5255e414bb1df2b77d44d2498862c4bf.png](https://i-blog.csdnimg.cn/blog_migrate/947d8277f5ffb3f83d1fb9366ae32d34.jpeg)
Example 2
![b38318bea844621af8969eb2163049ec.png](https://i-blog.csdnimg.cn/blog_migrate/cbcf837c55c4a3695329eba7c65128af.jpeg)
iv) DataFrame.to_excel()
Write a pandas object to an Excel worksheet.
```python
to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None,
         header=True, index=True, index_label=None, startrow=0, startcol=0,
         engine=None, merge_cells=True, encoding=None, inf_rep='inf',
         verbose=True, freeze_panes=None)
```
excel_writer File path or an existing ExcelWriter object.
    string or ExcelWriter object
sheet_name string, default 'Sheet1'
Name of sheet which will contain DataFrame
index boolean, default True
Write row names (index)
header : bool or list of str, default True
Write out the column names. If a list of string is given it is
assumed to be aliases for the column names.
freeze_panes tuple of int (length 2), optional
    Freeze panes: the first value is the row, the second the column;
    e.g. to freeze the first row, use (1, 0).
As with read_excel, no encoding needs to be specified when the data contains Chinese text.
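A minimal sketch (the file name is made up; an Excel engine such as openpyxl must be installed):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [90, 85]})
path = os.path.join(tempfile.mkdtemp(), 'scores.xlsx')

# freeze_panes=(1, 0) freezes the first row; note there is no encoding argument
# to worry about even if the data contains Chinese text.
df.to_excel(path, sheet_name='Sheet1', index=False, freeze_panes=(1, 0))
print(os.path.getsize(path) > 0)
```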
Example 1:
![d7453b255f7aae8a74372f1db8d645c8.png](https://i-blog.csdnimg.cn/blog_migrate/c45be7783829dc17ee90904c5580645e.jpeg)
![8ccde76905a6b7678d86bc136489192c.png](https://i-blog.csdnimg.cn/blog_migrate/b833e297c901ebac5bb07120f12caea7.jpeg)
![395436982567d4d10607c543b80cd9e2.png](https://i-blog.csdnimg.cn/blog_migrate/aa83e859189d5dfa8442c98878ce94bf.png)
Example 2:
![2ef2f6df6854053436653b2ffecfc4b5.png](https://i-blog.csdnimg.cn/blog_migrate/cb3bb0fe2a1f0617c69c31ff95fa342c.jpeg)
![a0754b5861bfb5eb232e8b2ced5b4b12.png](https://i-blog.csdnimg.cn/blog_migrate/ec57e4736b5be64e6e406d5a604212e4.jpeg)
![6b17fb04c515af7f4be8387eb08e03a3.png](https://i-blog.csdnimg.cn/blog_migrate/e15cbf610874ed9e450821358e2dd66c.png)
![3dcf032acc64445230cfdda5e594ce8c.png](https://i-blog.csdnimg.cn/blog_migrate/9fcfbad4554fd6f1346669a079f78531.jpeg)
Besides that, you can also write multiple worksheets into a single workbook like this:
```python
>>> writer = pd.ExcelWriter('output.xlsx')
>>> df1.to_excel(writer, 'Sheet1')
>>> df2.to_excel(writer, 'Sheet2')
>>> writer.save()
```
So how do we write a new worksheet into an existing workbook?
Method 1:
![2e9e097760b23e58b3b43cb16902dbc5.png](https://i-blog.csdnimg.cn/blog_migrate/6cc53bb18cf32602d504f7744cf5df9f.jpeg)
![9cb66698e19c89035c7e2b7562da32d0.png](https://i-blog.csdnimg.cn/blog_migrate/6458fbedaea772ced7b012b673edf4ef.png)
![1620af0e15da3beb7c4c8ed61f7746a2.png](https://i-blog.csdnimg.cn/blog_migrate/8db6b8047b922d2c0f52c3e875608e5f.png)
Method 2:
pandas 0.24 introduced a more convenient solution:
![d09aa85e1b0faec0c9e9c8abe93f14f6.png](https://i-blog.csdnimg.cn/blog_migrate/c81b1a226049562126296e2ef9664e4d.jpeg)
So if you are on pandas 0.23 or lower, remember to upgrade first:
pip install --upgrade pandas
Then do this:
![a95d250e637078d2ffab0ea24395d3dd.png](https://i-blog.csdnimg.cn/blog_migrate/12f1854f35c6a113a7c6fb35aec99f61.png)
![f245fbebd9dd5237b1a574e2234724e1.png](https://i-blog.csdnimg.cn/blog_migrate/9418d6b891e8aee95d32051f9137504f.png)
Method 2 is recommended; it is very convenient, and it is also the officially recommended approach.
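Method 2 boils down to ExcelWriter's mode='a' (added in pandas 0.24; it requires the openpyxl engine). A sketch with made-up sheet names:

```python
import os
import tempfile

import pandas as pd

# Create an existing workbook with one sheet.
path = os.path.join(tempfile.mkdtemp(), 'book.xlsx')
pd.DataFrame({'a': [1]}).to_excel(path, sheet_name='old')

# mode='a' opens the existing workbook and appends instead of overwriting it.
with pd.ExcelWriter(path, engine='openpyxl', mode='a') as writer:
    pd.DataFrame({'b': [2]}).to_excel(writer, sheet_name='new')

sheets = pd.read_excel(path, sheet_name=None)  # None -> all sheets as a dict
print(list(sheets))
```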
v) Opening large files (.csv)
Occasionally we run into datasets of several hundred megabytes or even several gigabytes; at that point they are hard even to view, let alone process (and if you are an Excel-only analyst, it is even more painful).
The dataset used in this article:
![0d9e920cf76e57b06b8633ae81a05c1c.png](https://i-blog.csdnimg.cn/blog_migrate/680fe1991c34dcbddd0bc7f745836b92.jpeg)
Opening it in Excel is extremely laggy:
![37d374619921564d08dab3b42dd56a73.png](https://i-blog.csdnimg.cn/blog_migrate/485481b9c2e0c141145ab494ea4a4d24.jpeg)
Viewing the dataset
When we get a dataset, we usually look at its fields and data first; specifying nrows in read_csv() is a very fast way to preview the first n rows.
![20193b669628b6511d5566d72d2563d2.png](https://i-blog.csdnimg.cn/blog_migrate/6c09f3010fa8243d334da7cc872a1c3c.jpeg)
Reading the file
Before we start, let's time the ordinary approach:
![4c979564bc7ed5882a5e4fce809a9989.png](https://i-blog.csdnimg.cn/blog_migrate/cfc51d95495cf6b904f5472196f20785.jpeg)
It simply cannot be read in.
Now let's see the effect of setting the chunksize parameter:
![901a5221892bce64477a78c211454472.png](https://i-blog.csdnimg.cn/blog_migrate/da1dbae700f9c0f22fd4675a5f430386.jpeg)
It took a little time because that includes the overhead of concatenating the sub-DataFrames; you can also skip the concatenation and process each sub-DataFrame separately, depending on the situation.
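In code, the chunked read looks like this (sketched on an in-memory CSV so it runs anywhere; a real case would pass the big file's path):

```python
import io

import pandas as pd

big_csv = io.StringIO('x,y\n' + '\n'.join(f'{i},{i * 2}' for i in range(10)))

# With chunksize set, read_csv returns an iterator of DataFrames rather than
# loading everything at once; each chunk can be processed before (optionally)
# concatenating the results.
chunks = [chunk for chunk in pd.read_csv(big_csv, chunksize=4)]
df = pd.concat(chunks, ignore_index=True)
print([len(c) for c in chunks], len(df))  # [4, 4, 2] 10
```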
II. Reading and writing JSON data
We all know the json standard library; we usually use it to load JSON files and parse JSON strings. But pandas actually provides more powerful JSON read/write methods, and they integrate much more tightly with DataFrame and Series, the two data structures we use most in data analysis. So, in the pragmatic spirit of using whatever works best, let's say goodbye to json.load()/json.loads() and embrace pd.read_json().
i) pd.read_json()
This method converts a JSON string into a pandas object (note: a JSON string, not a JSON object).
```python
read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True,
          convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False,
          date_unit=None, encoding=None, lines=False, chunksize=None, compression='infer')
```
Parameters
path_or_buf The JSON string or file-like object to parse.
    a valid JSON string or file-like, default: None
    The string could be a URL. Valid URL schemes include http, ftp, s3, gcs, and file.
    For file URLs, a host is expected; for instance, a local file could be
    file://localhost/path/to/table.json
typ Type of object to recover, 'frame' or 'series', default 'frame'.
orient Indication of the expected JSON string format. String. If typ == 'series' the
    default is 'index'; if typ == 'frame' the default is 'columns'.
    Compatible JSON strings can be produced by to_json() with a corresponding
    orient value. The set of possible orients is:
    'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
    'records' : list like [{column -> value}, ... , {column -> value}]
    'index' : dict like {index -> {column -> value}}, or {index -> value}
    'columns' : dict like {column -> {index -> value}}, or [{column -> value}]
    'values' : just the values array
The allowed and default values depend on the value of the typ parameter.
when typ == 'series',
allowed orients are {'split','records','index'}
default is 'index'
The Series index must be unique for orient 'index'.
when typ == 'frame',
allowed orients are {'split','records','index', 'columns','values', 'table'}
default is 'columns'
The DataFrame index must be unique for orients 'index' and 'columns'.
The DataFrame columns must be unique for orients 'index', 'columns', and 'records'.
New in version 0.23.0: ‘table’ as an allowed value for the orient argument
convert_axes boolean, default True
Try to convert the axes to the proper dtypes.
convert_dates
boolean, default True
List of columns to parse for dates; If True, then try to parse datelike
columns default is True; a column label is datelike if
it ends with '_at',
it ends with '_time',
it begins with 'timestamp',
it is 'modified', or
it is 'date'
lines boolean, default False
Read the file as a json object per line.
New in version 0.19.0.
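A minimal sketch of the typ/orient defaults (wrapping the literal strings in StringIO, which read_json also accepts as file-like input):

```python
from io import StringIO

import pandas as pd

# typ='series' with the default orient='index' expects {index -> value}.
s = pd.read_json(StringIO('{"a": 1, "b": 2}'), typ='series')

# typ='frame' with the default orient='columns' expects {column -> {index -> value}}.
df = pd.read_json(StringIO('{"open": {"0": 10.0, "1": 10.5}, '
                           '"close": {"0": 10.5, "1": 10.2}}'))
print(s['a'], df.loc[0, 'open'])
```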
Parsing a JSON string into a Series:
![526f07920e7627abb2c5040892332c70.png](https://i-blog.csdnimg.cn/blog_migrate/f74d222bc9c0ebb9ca3bd875fb0c6c1d.jpeg)
Parsing a JSON string into a DataFrame
![a434595626c393f0223cd7d3e16901be.png](https://i-blog.csdnimg.cn/blog_migrate/bc0049ecf33f3477a47985f60d5e3ec1.jpeg)
Reading a JSON file into a DataFrame
![ee1a344090c269c5c2d7f2c7bfe2404d.png](https://i-blog.csdnimg.cn/blog_migrate/fcdca329eb4fbe405e88ed01bad53a34.png)
![78d9022f7d0dc1d539e6311a3d628073.png](https://i-blog.csdnimg.cn/blog_migrate/e08d0b37254085ce32fb454404022333.jpeg)
Reading a txt file into a DataFrame
![53b5c982bdc7fc2263d9f3f6eafdcd4b.png](https://i-blog.csdnimg.cn/blog_migrate/126cbd37eeea731a23c5a07964dc1f53.png)
![bbb69069e995a3679b6d37b23d32d595.png](https://i-blog.csdnimg.cn/blog_migrate/17b6ee628edff68443112e86af9fffc6.jpeg)
So the function from earlier can be written as:
```python
from datetime import datetime

import pandas as pd

def data_processing(path):
    '''Process K-line (candlestick) data'''
    df = pd.read_json(path, lines=True).reindex(columns=['id', 'open', 'high', 'low', 'close'])
    df['id'] = df['id'].map(lambda x: datetime.fromtimestamp(x))
    df.set_index('id', inplace=True)
    return df
```
ii) Series.to_json() / DataFrame.to_json()
Convert a pandas object to a JSON string.
Note that NaN and None will be converted to null, and datetime objects
will be converted to UNIX timestamps.
```python
to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10,
        force_ascii=True, date_unit='ms', default_handler=None, lines=False,
        compression=None, index=True)
```
path_or_buf string or file handle, optional
File path or object. If not specified, the result is returned as
a string.
orient string
Indication of expected JSON string format.
* Series
- default is 'index'
- allowed values are: {'split','records','index'}
* DataFrame
- default is 'columns'
- allowed values are:
{'split','records','index','columns','values'}
* The format of the JSON string
- 'split' : dict like {'index' -> [index],
'columns' -> [columns], 'data' -> [values]}
- 'records' : list like
[{column -> value}, ... , {column -> value}]
- 'index' : dict like {index -> {column -> value}}
- 'columns' : dict like {column -> {index -> value}}
- 'values' : just the values array
- 'table' : dict like {'schema': {schema}, 'data': {data}}
describing the data, and the data component is
like ``orient='records'``.
.. versionchanged:: 0.20.0
lines boolean, default False
If 'orient' is 'records' write out line delimited json format. Will
throw ValueError if incorrect 'orient' since others are not list like.
index boolean, default True
Whether to include the index values in the JSON string. Not
including the index (``index=False``) is only supported when
orient is 'split' or 'table'.
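A short sketch of the orient options for a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'open': [10.0, 10.5], 'close': [10.5, 10.2]})

# The default orient for a DataFrame is 'columns': {column -> {index -> value}}.
print(df.to_json())

# orient='records' gives a list of row objects; adding lines=True writes one
# JSON object per line, the format read_json(lines=True) consumes.
records = df.to_json(orient='records')
print(records)
```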
Converting a pandas object to a JSON string (1):
![e416e6a512ed09bcdef8f06ef3cc0ed3.png](https://i-blog.csdnimg.cn/blog_migrate/a6e6bde0ad2a35946363c668cb3316e4.jpeg)
Converting a pandas object to a JSON string (2):
![0a8ce80deaed89e147fd1716dcc63517.png](https://i-blog.csdnimg.cn/blog_migrate/935c38878b26af4d9518c04395515ea9.jpeg)
Writing a pandas object to a JSON file
![220b4ef18b7dc1eff18d2111a759d24d.png](https://i-blog.csdnimg.cn/blog_migrate/982ec1f2b657e614cc4bb62b87c54ea5.jpeg)
![25d6df0f3ba03c2ccbc0e572727187a5.png](https://i-blog.csdnimg.cn/blog_migrate/7ecccb6f4d2c653c0a5485525553b6fd.jpeg)
III. Reading data from a database
i) pd.read_sql()
Read a SQL query result or a database table into a DataFrame. This method is actually a wrapper around read_sql_query() and read_sql_table(); read_sql() delegates to one of them depending on the input.
Read SQL query or database table into a DataFrame.
This function is a convenience wrapper around ``read_sql_table`` and
``read_sql_query`` (for backward compatibility). It will delegate
to the specific function depending on the provided input. A SQL query
will be routed to ``read_sql_query``, while a database table name will
be routed to ``read_sql_table``. Note that the delegated function might
have more specific notes about their functionality not listed here.
```python
read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None,
         columns=None, chunksize=None)
```
sql The SQL query to execute, or a table name.
    string or SQLAlchemy Selectable (select or text object)
con The database connection object.
    SQLAlchemy connectable (engine/connection) or database string URI
    or DBAPI2 connection (fallback mode)
    Using SQLAlchemy makes it possible to use any DB supported by that
    library. If a DBAPI2 object, only sqlite3 is supported.
index_col string or list of strings, optional, default: None
Column(s) to set as index(MultiIndex).
columns list, default: None
List of column names to select from SQL table (only used when reading
a table).
parse_dates list or dict, default: None
- List of column names to parse as dates.
- Dict of ``{column_name: format string}`` where format string is
strftime compatible in case of parsing string times, or is one of
(D, s, ns, ms, us) in case of parsing integer timestamps.
- Dict of ``{column_name: arg dict}``, where the arg dict corresponds
to the keyword arguments of :func:`pandas.to_datetime`
Especially useful with databases without native Datetime support,
such as SQLite.
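Before the MySQL example, here is a self-contained sketch with sqlite3 (the only database whose plain DBAPI2 connection read_sql supports without SQLAlchemy); the table and data are made up:

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE kline (id INTEGER, open REAL, close REAL)')
con.executemany('INSERT INTO kline VALUES (?, ?, ?)',
                [(1, 10.0, 10.5), (2, 10.5, 10.2)])

# A query string is routed to read_sql_query; reading by bare table name
# (read_sql_table) would require a SQLAlchemy connectable instead.
df = pd.read_sql('SELECT * FROM kline', con, index_col='id')
print(df)
```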
Taking MySQL as an example:
![f8980d0743ba9682f01c44f15cc02b42.png](https://i-blog.csdnimg.cn/blog_migrate/fff6eec2631ad7182b941b4aca238a7e.jpeg)