Python Data Processing (Part 16)

Article link: https://blog.csdn.net/weixin_39945531/article/details/114910075

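10 Specifying the float conversion method

The float_precision parameter selects which float converter the C engine uses during parsing.

The parameter accepts three values:

None: the ordinary converter

high: the high-precision converter

round_trip: the round-trip converter, which guarantees that a value is unchanged after being written to a file and read back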

In [127]: val = "0.3066101993807095471566981359501369297504425048828125"

In [128]: data = "a,b,c\n1,2,{0}".format(val)

In [129]: abs(

.....: pd.read_csv(

.....: StringIO(data),

.....: engine="c",

.....: float_precision=None,

.....: )["c"][0] - float(val)

.....: )

.....:

Out[129]: 5.551115123125783e-17

In [130]: abs(

.....: pd.read_csv(

.....: StringIO(data),

.....: engine="c",

.....: float_precision="high",

.....: )["c"][0] - float(val)

.....: )

.....:

Out[130]: 5.551115123125783e-17

In [131]: abs(

.....: pd.read_csv(StringIO(data), engine="c", float_precision="round_trip")["c"][0]

.....: - float(val)

.....: )

.....:

Out[131]: 0.0
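As a self-contained check of the round-trip behavior, the following sketch writes a few floats with to_csv and reads them back with float_precision="round_trip". The column name x and the sample values are made up for illustration and do not come from the session above.

```python
from io import StringIO

import pandas as pd

# Illustrative values (an assumption) that have no exact binary representation.
original = pd.DataFrame({"x": [0.1 + 0.2, 1 / 3, 2.5e-7]})

# Write the frame to CSV text in memory, then parse it with the round-trip converter.
csv_text = original.to_csv(index=False)
restored = pd.read_csv(StringIO(csv_text), engine="c", float_precision="round_trip")

# Every parsed value should compare equal to the value that was written.
unchanged = bool((restored["x"] == original["x"]).all())
print(unchanged)
```

If exact reproduction of written values matters more than parsing speed, round_trip is the option to reach for.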

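11 Thousands separators

For large numbers written with a thousands separator, set the thousands keyword to a string of length 1 so that integers are parsed correctly.

By default, numbers containing a thousands separator are parsed as strings.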

In [132]: print(open("tmp.csv").read())

ID|level|category

Patient1|123,000|x

Patient2|23,000|y

Patient3|1,234,018|z

In [133]: df = pd.read_csv("tmp.csv", sep="|")

In [134]: df

Out[134]:

ID level category

0 Patient1 123,000 x

1 Patient2 23,000 y

2 Patient3 1,234,018 z

In [135]: df.level.dtype

Out[135]: dtype('O')

Pass the thousands parameter to parse them as numbers:

In [136]: print(open("tmp.csv").read())

ID|level|category

Patient1|123,000|x

Patient2|23,000|y

Patient3|1,234,018|z

In [137]: df = pd.read_csv("tmp.csv", sep="|", thousands=",")

In [138]: df

Out[138]:

ID level category

0 Patient1 123000 x

1 Patient2 23000 y

2 Patient3 1234018 z

In [139]: df.level.dtype

Out[139]: dtype('int64')
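The same comparison can be reproduced without a file on disk. This sketch feeds the sample data through StringIO (an assumption made here so the snippet is self-contained) and reads it with and without thousands.

```python
from io import StringIO

import pandas as pd

data = "ID|level|category\nPatient1|123,000|x\nPatient2|23,000|y\nPatient3|1,234,018|z"

# Without thousands, the level column is kept as text.
raw = pd.read_csv(StringIO(data), sep="|")

# With thousands=",", the separators are dropped and the column is parsed as integers.
parsed = pd.read_csv(StringIO(data), sep="|", thousands=",")

print(raw["level"].tolist())
print(parsed["level"].tolist())
```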

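12 NA values

To control which values are parsed as missing values (represented by NaN), pass a string to na_values.

If you pass a list of strings, every value in the list is treated as missing.

If you pass a number (such as the float 5.0 or the integer 5), the equivalent values are also treated as missing (in this case both 5.0 and 5 are recognized as NaN).

To completely override the default values that are recognized as missing, specify keep_default_na=False.

The values recognized as NaN by default are (the exact list depends on the pandas version):

['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA', '<NA>', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', '']

Consider the following examples: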

pd.read_csv("path_to_file.csv", na_values=[5])

In the example above, 5 and 5.0 are recognized as NaN in addition to the defaults. A string is first interpreted as the number 5 and then as NaN.

pd.read_csv("path_to_file.csv", keep_default_na=False, na_values=[""])

In the example above, only empty fields are recognized as NaN.

pd.read_csv("path_to_file.csv", keep_default_na=False, na_values=["NA", "0"])

In the example above, both the string NA and the string 0 are recognized as NaN.

pd.read_csv("path_to_file.csv", na_values=["Nope"])

In addition to the defaults, the string Nope is also recognized as NaN.
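The calls above refer to a placeholder file. To see the effect on real values, the sketch below runs three of the variants on a small in-memory table; the columns id, score and note are invented for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical data: one empty field, one "NA" and one custom marker "Nope".
data = "id,score,note\n1,5,NA\n2,7,Nope\n3,,ok"

# Defaults only: "NA" and the empty field become NaN, "Nope" stays a string.
default = pd.read_csv(StringIO(data))

# Defaults plus "Nope".
custom = pd.read_csv(StringIO(data), na_values=["Nope"])

# Defaults switched off: only empty fields are treated as missing.
strict = pd.read_csv(StringIO(data), keep_default_na=False, na_values=[""])

print(default["note"].isna().tolist())
print(custom["note"].isna().tolist())
print(strict["note"].tolist())
```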

13 Infinity

Values such as inf are parsed as np.inf (positive infinity), and -inf as -np.inf (negative infinity).

Parsing ignores case, so Inf is also parsed as np.inf.
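A short sketch of this behavior, using a made-up single-column input with mixed-case spellings:

```python
import math
from io import StringIO

import pandas as pd

# Illustrative input: the same two values spelled with different casing.
data = "x\ninf\n-inf\nInf\n-INF"

values = pd.read_csv(StringIO(data))["x"]

# The column is parsed as floats containing positive and negative infinity.
print(values.tolist())
```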

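14 Returning a Series

With the squeeze keyword argument, a file with a single data column is returned as a Series (squeeze was deprecated in pandas 1.4 and removed in 2.0; on newer versions, call .squeeze("columns") on the DataFrame returned by read_csv instead):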

In [140]: print(open("tmp.csv").read())

level

Patient1,123000

Patient2,23000

Patient3,1234018

In [141]: output = pd.read_csv("tmp.csv", squeeze=True)

In [142]: output

Out[142]:

Patient1 123000

Patient2 23000

Patient3 1234018

Name: level, dtype: int64

In [143]: type(output)

Out[143]: pandas.core.series.Series
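On pandas versions where the squeeze keyword is no longer available, the same result can be obtained by squeezing the DataFrame after reading. This is a sketch using the sample data through StringIO rather than tmp.csv.

```python
from io import StringIO

import pandas as pd

# One header entry but two fields per row, so the first field becomes the index.
data = "level\nPatient1,123000\nPatient2,23000\nPatient3,1234018"

frame = pd.read_csv(StringIO(data))

# Collapse the single-column DataFrame into a Series.
output = frame.squeeze("columns")

print(type(output).__name__)
print(output)
```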

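15 Boolean values

The common values True and False are recognized as booleans. Sometimes you may want other values to be recognized as booleans as well.

To do so, use the true_values and false_values parameters, as follows: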

In [144]: data = "a,b,c\n1,Yes,2\n3,No,4"

In [145]: print(data)

a,b,c

1,Yes,2

3,No,4

In [146]: pd.read_csv(StringIO(data))

Out[146]:

a b c

0 1 Yes 2

1 3 No 4

In [147]: pd.read_csv(StringIO(data), true_values=["Yes"], false_values=["No"])

Out[147]:

a b c

0 1 True 2

1 3 False 4

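16 Handling bad lines

Some files contain malformed lines with too few or too many fields. Lines with too few fields have NA values filled in for the trailing fields. By default, lines with too many fields raise an error: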

In [148]: data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

In [149]: pd.read_csv(StringIO(data))

---------------------------------------------------------------------------

ParserError Traceback (most recent call last)

----> 1 pd.read_csv(StringIO(data))

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)

684 )

685

--> 686 return _read(filepath_or_buffer, kwds)

687

688

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)

456

457 try:

--> 458 data = parser.read(nrows)

459 finally:

460 parser.close()

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in read(self, nrows)

1194 def read(self, nrows=None):

1195 nrows = _validate_integer("nrows", nrows)

-> 1196 ret = self._engine.read(nrows)

1197

1198 # May alter columns / col_dict

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in read(self, nrows)

2153 def read(self, nrows=None):

2154 try:

-> 2155 data = self._reader.read(nrows)

2156 except StopIteration:

2157 if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4

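You can choose to skip bad lines (error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; newer versions use on_bad_lines="skip" instead):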

In [29]: pd.read_csv(StringIO(data), error_bad_lines=False)

b'Skipping line 3: expected 3 fields, saw 4\n'

Out[29]:

a b c

0 1 2 3

1 8 9 10

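You can also use the usecols parameter to drop the extra column data that appears in some lines but not others (this is the behavior of the pandas version shown here; newer versions may still raise ParserError for this input):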

In [30]: pd.read_csv(StringIO(data), usecols=[0, 1, 2])

Out[30]:

a b c

0 1 2 3

1 4 5 6

2 8 9 10
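For newer pandas releases, a minimal sketch of the skip-bad-lines approach looks like the following. It assumes pandas 1.3 or later, where the on_bad_lines parameter exists.

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10"

# The default behavior: the line with four fields raises ParserError.
try:
    pd.read_csv(StringIO(data))
    raised = False
except pd.errors.ParserError as exc:
    raised = True
    print(exc)

# Assumes pandas >= 1.3: drop malformed lines instead of raising.
skipped = pd.read_csv(StringIO(data), on_bad_lines="skip")
print(skipped)
```

Skipping silently loses data, so it is worth counting rows before and after when the input is not trusted.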

17 dialect

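The dialect parameter gives more flexibility when reading files in a particular format. By default it uses the Excel dialect, but you can specify either a dialect name or a csv.Dialect instance.

Suppose your data contains an unclosed quote: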

In [150]: print(data)

label1,label2,label3

index1,"a,c,e

index2,b,d,f

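By default, read_csv uses the Excel dialect and treats the double quote as the quote character, so it fails when it reaches a newline before finding the closing double quote.

We can work around this with dialect: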

In [151]: import csv

In [152]: dia = csv.excel()

In [153]: dia.quoting = csv.QUOTE_NONE

In [154]: pd.read_csv(StringIO(data), dialect=dia)

Out[154]:

label1 label2 label3

index1 "a c e

index2 b d f

All of the dialect options can also be specified individually through keyword arguments:

In [155]: data = "a,b,c~1,2,3~4,5,6"

In [156]: pd.read_csv(StringIO(data), lineterminator="~")

Out[156]:

a b c

0 1 2 3

1 4 5 6

Another common dialect option is skipinitialspace, which skips any whitespace after a delimiter:

In [157]: data = "a, b, c\n1, 2, 3\n4, 5, 6"

In [158]: print(data)

a, b, c

1, 2, 3

4, 5, 6

In [159]: pd.read_csv(StringIO(data), skipinitialspace=True)

Out[159]:

a b c

0 1 2 3

1 4 5 6
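Several dialect options can also be bundled into one reusable object. The sketch below defines a hypothetical PipeDialect class (the name and its settings are assumptions for illustration) by subclassing csv.excel and passes an instance to read_csv.

```python
import csv
from io import StringIO

import pandas as pd


class PipeDialect(csv.excel):
    """Hypothetical dialect: Excel rules, but '|' delimits fields and padding after it is skipped."""

    delimiter = "|"
    skipinitialspace = True


data = "a| b| c\n1| 2| 3\n4| 5| 6"

# The delimiter and whitespace handling both come from the dialect instance.
df = pd.read_csv(StringIO(data), dialect=PipeDialect())
print(df)
```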

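The parsers make every attempt to do the right thing, and type inference is an important part of that.

18 Quoting and escape characters

Quotes (and other escape characters) embedded in fields can be handled in several ways.

One way is to use backslashes. To parse such data correctly, pass the escapechar option: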

In [160]: data = 'a,b\n"hello, \\"Bob\\", nice to see you",5'

In [161]: print(data)

a,b

"hello, \"Bob\", nice to see you",5

In [162]: pd.read_csv(StringIO(data), escapechar="\\")

Out[162]:

a b

0 hello, "Bob", nice to see you 5
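For comparison, the sketch below parses the same sentence written in the two usual styles: quotes escaped with a backslash (which needs escapechar) and quotes doubled in the standard CSV way (which the default doublequote=True handles). The doubled variant is an addition for illustration and is not part of the session above.

```python
from io import StringIO

import pandas as pd

# Embedded quotes escaped with a backslash.
escaped = 'a,b\n"hello, \\"Bob\\", nice to see you",5'
# Embedded quotes doubled, as in standard CSV quoting.
doubled = 'a,b\n"hello, ""Bob"", nice to see you",5'

from_escaped = pd.read_csv(StringIO(escaped), escapechar="\\")
from_doubled = pd.read_csv(StringIO(doubled))

print(from_escaped.loc[0, "a"])
print(from_doubled.loc[0, "a"])
```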

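19 Fixed-width files

The read_fwf() function reads data files with fixed-width columns.

The parameters of read_fwf are largely the same as those of read_csv, with two extra parameters and a different use of the delimiter parameter:

colspecs: a list of tuples giving the extent of each fixed-width field on a line as a half-open interval [from, to). The default value infer tells the parser to detect the column specifications from the first 100 rows of the data.

widths: a list of field widths, which can be used instead of colspecs if the intervals are contiguous

delimiter: the characters to treat as filler characters in the fixed-width file, for example ~

Consider the following fixed-width file: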

In [163]: print(open("bar.csv").read())

id8141 360.242940 149.910199 11950.7

id1594 444.953632 166.985655 11788.4

id1849 364.136849 183.628767 11806.2

id1230 413.836124 184.375703 11916.8

id1948 502.953953 173.237159 12468.3

To parse this file into a DataFrame, we only need to pass the column specifications to read_fwf together with the file name:

# Column specifications are a list of half-intervals

In [164]: colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]

In [165]: df = pd.read_fwf("bar.csv", colspecs=colspecs, header=None, index_col=0)

In [166]: df

Out[166]:

1 2 3

id8141 360.242940 149.910199 11950.7

id1594 444.953632 166.985655 11788.4

id1849 364.136849 183.628767 11806.2

id1230 413.836124 184.375703 11916.8

id1948 502.953953 173.237159 12468.3

Alternatively, you can supply just the column widths for contiguous columns:

# Widths are a list of integers

In [167]: widths = [6, 14, 13, 10]

In [168]: df = pd.read_fwf("bar.csv", widths=widths, header=None)

In [169]: df

Out[169]:

0 1 2 3

0 id8141 360.242940 149.910199 11950.7

1 id1594 444.953632 166.985655 11788.4

2 id1849 364.136849 183.628767 11806.2

3 id1230 413.836124 184.375703 11916.8

4 id1948 502.953953 173.237159 12468.3

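The parser ignores extra whitespace around the columns, so it is fine to leave extra separation between the columns in the file.

By default, read_fwf tries to infer the file's colspecs from the first 100 rows.

It can do so only when the columns are aligned and correctly separated by the provided delimiter (whitespace by default):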

In [170]: df = pd.read_fwf("bar.csv", header=None, index_col=0)

In [171]: df

Out[171]:

1 2 3

id8141 360.242940 149.910199 11950.7

id1594 444.953632 166.985655 11788.4

id1849 364.136849 183.628767 11806.2

id1230 413.836124 184.375703 11916.8

id1948 502.953953 173.237159 12468.3

read_fwf supports the dtype parameter for specifying column types that differ from the inferred types:

In [172]: pd.read_fwf("bar.csv", header=None, index_col=0).dtypes

Out[172]:

1 float64

2 float64

3 float64

dtype: object

In [173]: pd.read_fwf("bar.csv", header=None, dtype={2: "object"}).dtypes

Out[173]:

0 object

1 float64

2 object

3 float64

dtype: object
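The relationship between colspecs and widths can be checked on data built in memory. In this sketch the field widths (6, 12, 12, 10) are chosen here for illustration and differ from the layout of bar.csv above.

```python
from io import StringIO

import pandas as pd

rows = [
    ("id8141", 360.242940, 149.910199, 11950.7),
    ("id1594", 444.953632, 166.985655, 11788.4),
]
# Build fixed-width text: a 6-character id followed by right-aligned numeric fields.
text = "\n".join(f"{name:<6}{x:>12.6f}{y:>12.6f}{z:>10.1f}" for name, x, y, z in rows)

# Contiguous fields can be described by their widths ...
widths = [6, 12, 12, 10]
by_width = pd.read_fwf(StringIO(text), widths=widths, header=None)

# ... or, equivalently, by half-open (start, end) intervals.
colspecs = [(0, 6), (6, 18), (18, 30), (30, 40)]
by_spec = pd.read_fwf(StringIO(text), colspecs=colspecs, header=None)

print(by_width)
print(by_width.equals(by_spec))
```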

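20 Indexes

20.1 Files with an implicit index column

Consider the following file, which has one fewer entry in the header than the number of data columns: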

In [174]: print(open("foo.csv").read())

A,B,C

20090101,a,1,2

20090102,b,3,4

20090103,c,4,5

In this special case, read_csv assumes that the first column is to be used as the index of the DataFrame:

In [175]: pd.read_csv("foo.csv")

Out[175]:

A B C

20090101 a 1 2

20090102 b 3 4

20090103 c 4 5

Note: the dates are not parsed automatically in this case; you need to pass the parsing argument as before:

In [176]: df = pd.read_csv("foo.csv", parse_dates=True)

In [177]: df.index

Out[177]: DatetimeIndex(['2009-01-01', '2009-01-02', '2009-01-03'], dtype='datetime64[ns]', freq=None)
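A self-contained version of this example, reading the same content from a StringIO buffer instead of foo.csv, shows the difference between the two calls:

```python
from io import StringIO

import pandas as pd

# Three header entries, four fields per row: the first field becomes the index.
data = "A,B,C\n20090101,a,1,2\n20090102,b,3,4\n20090103,c,4,5"

plain = pd.read_csv(StringIO(data))
dated = pd.read_csv(StringIO(data), parse_dates=True)

# Without parse_dates the index holds integers; with it, the index holds timestamps.
print(plain.index.tolist())
print(dated.index)
```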

20.2 Reading an index with a MultiIndex

Suppose you have data indexed by two columns:

In [178]: print(open("data/mindex_ex.csv").read())

year,indiv,zit,xit

1977,"A",1.2,.6

1977,"B",1.5,.5

1977,"C",1.7,.8

1978,"A",.2,.06

1978,"B",.7,.2

1978,"C",.8,.3

1978,"D",.9,.5

1978,"E",1.4,.9

1979,"C",.2,.15

1979,"D",.14,.05

1979,"E",.5,.15

1979,"F",1.2,.5

1979,"G",3.4,1.9

1979,"H",5.4,2.7

1979,"I",6.4,1.2

Pass a list of column numbers to index_col to turn multiple columns into a MultiIndex for the index of the returned object:

In [179]: df = pd.read_csv("data/mindex_ex.csv", index_col=[0, 1])

In [180]: df

Out[180]:

zit xit

year indiv

1977 A 1.20 0.60

B 1.50 0.50

C 1.70 0.80

1978 A 0.20 0.06

B 0.70 0.20

C 0.80 0.30

D 0.90 0.50

E 1.40 0.90

1979 C 0.20 0.15

D 0.14 0.05

E 0.50 0.15

F 1.20 0.50

G 3.40 1.90

H 5.40 2.70

I 6.40 1.20

In [181]: df.loc[1978]

Out[181]:

zit xit

indiv

A 0.2 0.06

B 0.7 0.20

C 0.8 0.30

D 0.9 0.50

E 1.4 0.90
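The same pattern works on a shortened in-memory copy of the data, which makes it easy to inspect the resulting index. The four rows below are a subset of the file shown above.

```python
from io import StringIO

import pandas as pd

data = 'year,indiv,zit,xit\n1977,"A",1.2,.6\n1977,"B",1.5,.5\n1978,"A",.2,.06\n1978,"B",.7,.2'

# Columns 0 and 1 together form a two-level row index.
df = pd.read_csv(StringIO(data), index_col=[0, 1])

print(df.index.names)

# Selecting on the first level leaves the second level as the index.
subset = df.loc[1978]
print(subset)
```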

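20.3 Reading columns with a MultiIndex

By passing a list of row positions to the header argument, you can read the listed rows in as a MultiIndex for the columns.

If non-consecutive rows are specified, the intervening rows are skipped.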

In [182]: from pandas._testing import makeCustomDataframe as mkdf

In [183]: df = mkdf(5, 3, r_idx_nlevels=2, c_idx_nlevels=4)

In [184]: df.to_csv("mi.csv")

In [185]: print(open("mi.csv").read())

C0,,C_l0_g0,C_l0_g1,C_l0_g2

C1,,C_l1_g0,C_l1_g1,C_l1_g2

C2,,C_l2_g0,C_l2_g1,C_l2_g2

C3,,C_l3_g0,C_l3_g1,C_l3_g2

R0,R1,,,

R_l0_g0,R_l1_g0,R0C0,R0C1,R0C2

R_l0_g1,R_l1_g1,R1C0,R1C1,R1C2

R_l0_g2,R_l1_g2,R2C0,R2C1,R2C2

R_l0_g3,R_l1_g3,R3C0,R3C1,R3C2

R_l0_g4,R_l1_g4,R4C0,R4C1,R4C2

In [186]: pd.read_csv("mi.csv", header=[0, 1, 2, 3], index_col=[0, 1])

Out[186]:

C0 C_l0_g0 C_l0_g1 C_l0_g2

C1 C_l1_g0 C_l1_g1 C_l1_g2

C2 C_l2_g0 C_l2_g1 C_l2_g2

C3 C_l3_g0 C_l3_g1 C_l3_g2

R0 R1

R_l0_g0 R_l1_g0 R0C0 R0C1 R0C2

R_l0_g1 R_l1_g1 R1C0 R1C1 R1C2

R_l0_g2 R_l1_g2 R2C0 R2C1 R2C2

R_l0_g3 R_l1_g3 R3C0 R3C1 R3C2

R_l0_g4 R_l1_g4 R4C0 R4C1 R4C2

read_csv can also interpret a more common format of multi-column indices:

In [187]: print(open("mi2.csv").read())

,a,a,a,b,c,c

,q,r,s,t,u,v

one,1,2,3,4,5,6

two,7,8,9,10,11,12

In [188]: pd.read_csv("mi2.csv", header=[0, 1], index_col=0)

Out[188]:

a b c

q r s t u v

one 1 2 3 4 5 6

two 7 8 9 10 11 12

Note: if index_col is not specified (for example, the data has no index), any names on the column index are lost.
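To inspect the resulting column index directly, here is a sketch on a reduced version of that layout (three data columns instead of six, chosen here for brevity):

```python
from io import StringIO

import pandas as pd

# Two header rows; the first field of each row is reserved for the row index.
data = ",a,a,b\n,q,r,s\none,1,2,3\ntwo,4,5,6"

df = pd.read_csv(StringIO(data), header=[0, 1], index_col=0)

# Each column label is a tuple made of one entry from each header row.
print(df.columns.tolist())
print(df[("a", "r")].tolist())
```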

21 Automatically sniffing the delimiter

read_csv can infer the delimiter of a file, because pandas uses the csv.Sniffer class of the csv module. For this, you must specify sep=None.

In [189]: print(open("tmp2.sv").read())

:0:1:2:3

0:0.4691122999071863:-0.2828633443286633:-1.5090585031735124:-1.1356323710171934

1:1.2121120250208506:-0.17321464905330858:0.11920871129693428:-1.0442359662799567

2:-0.8618489633477999:-2.1045692188948086:-0.4949292740687813:1.071803807037338

3:0.7215551622443669:-0.7067711336300845:-1.0395749851146963:0.27185988554282986

4:-0.42497232978883753:0.567020349793672:0.27623201927771873:-1.0874006912859915

5:-0.6736897080883706:0.1136484096888855:-1.4784265524372235:0.5249876671147047

6:0.4047052186802365:0.5770459859204836:-1.7150020161146375:-1.0392684835147725

7:-0.3706468582364464:-1.1578922506419993:-1.344311812731667:0.8448851414248841

8:1.0757697837155533:-0.10904997528022223:1.6435630703622064:-1.4693879595399115

9:0.35702056413309086:-0.6746001037299882:-1.776903716971867:-0.9689138124473498

In [190]: pd.read_csv("tmp2.sv", sep=None, engine="python")

Out[190]:

Unnamed: 0 0 1 2 3

0 0 0.469112 -0.282863 -1.509059 -1.135632

1 1 1.212112 -0.173215 0.119209 -1.044236

2 2 -0.861849 -2.104569 -0.494929 1.071804

3 3 0.721555 -0.706771 -1.039575 0.271860

4 4 -0.424972 0.567020 0.276232 -1.087401

5 5 -0.673690 0.113648 -1.478427 0.524988

6 6 0.404705 0.577046 -1.715002 -1.039268

7 7 -0.370647 -1.157892 -1.344312 0.844885

8 8 1.075770 -0.109050 1.643563 -1.469388

9 9 0.357021 -0.674600 -1.776904 -0.968914
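A compact sketch of delimiter sniffing on in-memory data, using a semicolon-separated sample invented for illustration:

```python
from io import StringIO

import pandas as pd

# The delimiter is not passed explicitly; it is detected from the data.
data = "a;b;c\n1;2;3\n4;5;6"

# sep=None requires the Python engine, which delegates detection to csv.Sniffer.
df = pd.read_csv(StringIO(data), sep=None, engine="python")
print(df)
```

Sniffing is a heuristic, so passing sep explicitly remains the safer choice when the delimiter is known.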

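22 Iterating through files chunk by chunk

Suppose you want to iterate through a (potentially very large) file lazily rather than reading the entire file into memory.

For example, take the following file: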

In [191]: print(open("tmp.sv").read())

|0|1|2|3

0|0.4691122999071863|-0.2828633443286633|-1.5090585031735124|-1.1356323710171934

1|1.2121120250208506|-0.17321464905330858|0.11920871129693428|-1.0442359662799567

2|-0.8618489633477999|-2.1045692188948086|-0.4949292740687813|1.071803807037338

3|0.7215551622443669|-0.7067711336300845|-1.0395749851146963|0.27185988554282986

4|-0.42497232978883753|0.567020349793672|0.27623201927771873|-1.0874006912859915

5|-0.6736897080883706|0.1136484096888855|-1.4784265524372235|0.5249876671147047

6|0.4047052186802365|0.5770459859204836|-1.7150020161146375|-1.0392684835147725

7|-0.3706468582364464|-1.1578922506419993|-1.344311812731667|0.8448851414248841

8|1.0757697837155533|-0.10904997528022223|1.6435630703622064|-1.4693879595399115

9|0.35702056413309086|-0.6746001037299882|-1.776903716971867|-0.9689138124473498

In [192]: table = pd.read_csv("tmp.sv", sep="|")

In [193]: table

Out[193]:

Unnamed: 0 0 1 2 3

0 0 0.469112 -0.282863 -1.509059 -1.135632

1 1 1.212112 -0.173215 0.119209 -1.044236

2 2 -0.861849 -2.104569 -0.494929 1.071804

3 3 0.721555 -0.706771 -1.039575 0.271860

4 4 -0.424972 0.567020 0.276232 -1.087401

5 5 -0.673690 0.113648 -1.478427 0.524988

6 6 0.404705 0.577046 -1.715002 -1.039268

7 7 -0.370647 -1.157892 -1.344312 0.844885

8 8 1.075770 -0.109050 1.643563 -1.469388

9 9 0.357021 -0.674600 -1.776904 -0.968914

Specifying a chunksize for read_csv makes it return an iterable object of type TextFileReader:

In [194]: with pd.read_csv("tmp.sv", sep="|", chunksize=4) as reader:

.....:     reader

.....:     for chunk in reader:

.....:         print(chunk)

.....:

Unnamed: 0 0 1 2 3

0 0 0.469112 -0.282863 -1.509059 -1.135632

1 1 1.212112 -0.173215 0.119209 -1.044236

2 2 -0.861849 -2.104569 -0.494929 1.071804

3 3 0.721555 -0.706771 -1.039575 0.271860

Unnamed: 0 0 1 2 3

4 4 -0.424972 0.567020 0.276232 -1.087401

5 5 -0.673690 0.113648 -1.478427 0.524988

6 6 0.404705 0.577046 -1.715002 -1.039268

7 7 -0.370647 -1.157892 -1.344312 0.844885

Unnamed: 0 0 1 2 3

8 8 1.075770 -0.10905 1.643563 -1.469388

9 9 0.357021 -0.67460 -1.776904 -0.968914

Specifying iterator=True also returns a TextFileReader object:

In [195]: with pd.read_csv("tmp.sv", sep="|", iterator=True) as reader:

.....:     reader.get_chunk(5)
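Chunked reading is typically combined with an aggregation that never holds the whole file in memory. The following sketch sums a column chunk by chunk and then pulls rows on demand with get_chunk; the column name x and the ten-row input are assumptions for illustration, and the with-statement form assumes pandas 1.2 or later.

```python
from io import StringIO

import pandas as pd

# Illustrative input: a single column holding the integers 0..9.
data = "x\n" + "\n".join(str(i) for i in range(10))

sizes = []
total = 0
# chunksize=4 yields DataFrames of at most four rows each.
with pd.read_csv(StringIO(data), chunksize=4) as reader:
    for chunk in reader:
        sizes.append(len(chunk))
        total += int(chunk["x"].sum())

# iterator=True returns the same reader type; get_chunk(n) reads the next n rows.
with pd.read_csv(StringIO(data), iterator=True) as reader:
    first = reader.get_chunk(3)
    rest = reader.get_chunk(2)

print(sizes, total)
print(first["x"].tolist(), rest["x"].tolist())
```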

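23 Specifying the parser engine

pandas has two parsers:

a fast and efficient parser implemented in C

a more feature-complete parser implemented in Python

pandas uses the C parser whenever possible, but falls back to the Python parser if an option that C does not support is specified.

Options not supported by the C parser include:

sep other than a single character (for example, a regex separator)

skipfooter

sep=None with delim_whitespace=False

Specifying any of the above options produces a ParserWarning unless the Python engine is selected explicitly with engine="python".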
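The fallback can be observed by recording warnings. This sketch uses a two-character separator "::" (an example chosen here) and a hypothetical helper parser_warnings that returns the ParserWarning entries raised by one call.

```python
import warnings
from io import StringIO

import pandas as pd

data = "a::b\n1::2\n3::4"


def parser_warnings(**kwargs):
    """Hypothetical helper: read the data and return any ParserWarning that was issued."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        pd.read_csv(StringIO(data), sep="::", **kwargs)
    return [w for w in caught if issubclass(w.category, pd.errors.ParserWarning)]


# A multi-character separator is not supported by the C engine, so pandas falls back and warns.
implicit = parser_warnings()
# Selecting the Python engine explicitly avoids the warning.
explicit = parser_warnings(engine="python")

print(len(implicit), len(explicit))
```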

24 Reading and writing remote files

You can pass a URL to read or write remote files. The following example reads a CSV file:

df = pd.read_csv("https://download.bls.gov/pub/time.series/cu/cu.item", sep="\t")

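25 Writing data

25.1 Writing to CSV format

Series and DataFrame objects both have a to_csv method, which stores the contents of the object as a comma-separated-values file.

The function takes a number of arguments. Only the first is required.

path_or_buf: a path to the file to write, or a file object; a file object must be opened with newline=''

sep: field delimiter for the output file (default ,)

na_rep: string representation of a missing value (default '')

float_format: format string for floating-point numbers

columns: the columns to write (default None)

header: whether to write out the column names (default True)

index: whether to write out the row (index) names (default True)

index_label: column label(s) for the index column(s). If None (the default) and header and index are True, the index names are used. A sequence should be given if the DataFrame uses a MultiIndex.

mode: Python write mode (default 'w')

encoding: a string giving the encoding to use

line_terminator: character sequence denoting the end of a line (default os.linesep); renamed to lineterminator in pandas 1.5

quoting: the quoting rules, as in the csv module (default csv.QUOTE_MINIMAL). Note: if you set float_format, floats are converted to strings, and csv.QUOTE_NONNUMERIC then treats them as non-numeric.

quotechar: character used to quote fields (default ")

doublequote: controls quoting of quotechar inside fields (default True)

escapechar: character used to escape sep and quotechar when appropriate (default None)

chunksize: number of rows to write at a time

date_format: format string for datetime objects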
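A brief sketch combining several of these parameters. The DataFrame contents are invented for illustration, and the result is returned as a string because no path is passed.

```python
import pandas as pd

# Illustrative frame with one missing value.
df = pd.DataFrame({"name": ["x", "y"], "value": [1.5, float("nan")]})

# Semicolon-separated, two decimals for floats, NULL for missing values, no index column.
text = df.to_csv(sep=";", na_rep="NULL", float_format="%.2f", index=False)

lines = text.splitlines()
print(lines)
```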

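25.2 Writing a formatted string

The DataFrame object has a to_string instance method that controls the string representation of the object. All arguments are optional.

buf: default None; for example, a StringIO object

columns: default None; which columns to write

col_space: default None; minimum width of each column

na_rep: default NaN; representation of NA values

formatters: default None; a dictionary (by column) of functions, each of which takes a single argument and returns a formatted string

float_format: default None; a function that takes a single (float) argument and returns a formatted string; applied to the floats in the DataFrame

sparsify: default True; set to False for a DataFrame with a hierarchical index to print every MultiIndex key at each row

index_names: default True; prints the names of the indices

index: default True; prints the index

header: default True; prints the column labels

justify: default left; prints column headers left- or right-justified

The Series object also has a to_string method, but with only the buf, na_rep and float_format arguments.

There is also a length argument which, if set to True, additionally outputs the length of the Series.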
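As a small illustration of these options, the sketch below formats a made-up DataFrame and Series; the column names and values are assumptions, and the exact column spacing of the output is left to pandas.

```python
import pandas as pd

# Illustrative frame with one missing value.
df = pd.DataFrame({"a": [1.5, float("nan")], "b": ["x", "y"]})

# Hide the index, show missing values as "-", and format floats with two decimals.
frame_text = df.to_string(index=False, na_rep="-", float_format=lambda v: f"{v:.2f}")
print(frame_text)

# For a Series, length=True appends a footer with the number of elements.
series_text = pd.Series([1.5, 2.0]).to_string(length=True)
print(series_text)
```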