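Pandas 2.2 Chinese Documentation (Part 92) - CSDN Blog

Original: pandas.pydata.org/docs/

What's new in 1.1.5 (December 7, 2020)

Original: pandas.pydata.org/docs/whatsnew/v1.1.5.html

These are the changes in pandas 1.1.5. See the release notes for a full changelog including other versions of pandas.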

Fixed regressions

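Fixed regression in addition of a timedelta-like scalar to a DatetimeIndex raising incorrectly (GH 37295)
Fixed regression in Series.groupby() raising when the Index of the Series had a tuple as its name (GH 37755)
Fixed regression in DataFrame.loc() and Series.loc() for __setitem__ when a one-dimensional tuple was given to select from a MultiIndex (GH 37711)
Fixed regression in inplace operations on a Series with ExtensionDtype and a NumPy dtyped operand (GH 37910)
Fixed regression in metadata propagation for the groupby iterator (GH 37343)
Fixed regression in a MultiIndex constructed from a DatetimeIndex not retaining frequency (GH 35563)
Fixed regression in the Index constructor raising an AttributeError when passed a SparseArray with datetime64 values (GH 35843)
Fixed regression in DataFrame.unstack() with columns of integer dtype (GH 37115)
Fixed regression in indexing on a Series with CategoricalDtype after unpickling (GH 37631)
Fixed regression in DataFrame.groupby() aggregation with out-of-bounds datetime objects in an object-dtype column (GH 36003)
Fixed regression in df.groupby(..).rolling(..) with the resulting MultiIndex when grouping by a label that is in the index (GH 37641)
Fixed regression in DataFrame.fillna() not filling NaN after other operations such as DataFrame.pivot() (GH 36495)
Fixed performance regression in df.groupby(..).rolling(..) (GH 38038)
Fixed regression in MultiIndex.intersection() returning duplicates when at least one of the indexes had duplicates (GH 36915)
Fixed regression in DataFrameGroupBy.first(), SeriesGroupBy.first(), DataFrameGroupBy.last(), and SeriesGroupBy.last() where None was considered a non-NA value (GH 38286)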
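
As a rough illustration of the last item (GH 38286), the sketch below groups a small, made-up frame in which the first value of one group is None. The column names `g` and `v` and the data are illustrative assumptions, not taken from the release notes.

```python
import pandas as pd

# Hypothetical data: the first row of group 1 holds None.
df = pd.DataFrame({"g": [1, 1, 2], "v": [None, "x", "y"]})

# With the fix, None counts as missing, so first() skips it and
# returns the first non-missing value in each group.
first_values = df.groupby("g")["v"].first()
print(first_values)
```

On an affected version, the entry for group 1 would have been None rather than "x".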

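Bug fixes

Bug in pytables methods in Python 3.9 (GH 38041)

Other

Only set -Werror as a compiler flag in the CI jobs (GH 33315, GH 33314)

Contributors

A total of 12 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

Andrew Wieteska
Fangchen Li
Janus
Joris Van den Bossche
Matthew Roeschke
MeeseeksMachine
Pandas Development Team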
Richard Shadrach
Simon Hawkins
Uwe L. Korn
jbrockmendel
patrick

What's new in 1.1.4 (October 30, 2020)

Original: pandas.pydata.org/docs/whatsnew/v1.1.4.html

These are the changes in pandas 1.1.4. See the release notes for a full changelog including other versions of pandas.

Fixed regressions

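Fixed regression in read_csv() raising a ValueError when names was of type dict_keys (GH 36928)
Fixed regression in read_csv() with more than 1M rows when specifying an index_col argument (GH 37094)
Fixed regression where attempting to mutate a DateOffset object would no longer raise an AttributeError (GH 36940)
Fixed regression where DataFrame.agg() would fail with a TypeError when passed positional arguments to be passed on to the aggregation function (GH 36948)
Fixed regression in RollingGroupby with sort=False not being respected (GH 36889)
Fixed regression in Series.astype() converting None to "nan" when casting to string (GH 36904)
Fixed regression in Series.rank() failing for read-only data (GH 37290)
Fixed regression in RollingGroupby causing a segmentation fault with an Index of object dtype (GH 36727)
Fixed regression in DataFrame.resample(...).apply(...) raising an AttributeError when the input was a DataFrame and only a Series was evaluated (GH 36951)
Fixed regression in DataFrame.groupby(..).std() with nullable integer dtype (GH 37415)
Fixed regression in PeriodDtype comparing both equal and unequal to its string representation (GH 37265)
Fixed regression where slicing a DatetimeIndex raised an AssertionError on irregular time series with pd.NaT or on unsorted indices (GH 36953 and GH 35509)
Fixed regression in certain offsets (pd.offsets.Day() and below) no longer being hashable (GH 37267)
Fixed regression in StataReader which required chunksize to be set manually when using an iterator to read a dataset (GH 37280)
Fixed regression in setitem with DataFrame.iloc() which raised an error when trying to set a value while filtering with a boolean list (GH 36741)
Fixed regression in setitem with a Series getting aligned before setting the values (GH 37427)
Fixed regression in MultiIndex.is_monotonic_increasing returning wrong results with NaN in at least one of the levels (GH 37220)
Fixed regression in inplace arithmetic operation (+=) on a Series not updating the parent DataFrame/Series (GH 36373)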
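
Two of the offset-related items (GH 36940 and GH 37267) can be checked with a short sketch. The variable names and the dictionary contents are illustrative assumptions; only the `pd.offsets` classes come from pandas.

```python
import pandas as pd

day = pd.offsets.Day(2)

# Offsets are immutable: assigning to an attribute should raise AttributeError.
try:
    day.n = 3
    mutation_blocked = False
except AttributeError:
    mutation_blocked = True

# Offsets are hashable again, so they can serve as dictionary keys.
labels = {pd.offsets.Day(1): "daily", pd.offsets.Hour(1): "hourly"}
lookup = labels[pd.offsets.Day(1)]
print(mutation_blocked, lookup)
```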

Bug fixes

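Bug causing groupby(...).sum() and similar methods to not preserve metadata (GH 29442)
Bug in Series.isin() and DataFrame.isin() raising a ValueError when the target was read-only (GH 37174)
Bug in DataFrameGroupBy.fillna() and SeriesGroupBy.fillna() that introduced a performance regression after 1.0.5 (GH 36757)
Bug in DataFrame.info() raising a KeyError when the DataFrame has integer column names (GH 37245)
Bug in DataFrameGroupby.apply() dropping a CategoricalIndex when grouped on (GH 35792)

Contributors

A total of 18 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.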

Daniel Saxton
Fangchen Li
Janus +
Joris Van den Bossche
Kevin Sheppard
Marco Gorelli
Matt Roeschke
Matthew Roeschke
MeeseeksMachine
Pandas Development Team
Paul Ganssle
Richard Shadrach
Simon Hawkins
Thomas Smith
Tobias Pitters
abmyii +
jbrockmendel
patrick

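What's new in 1.1.3 (October 5, 2020)

Original: pandas.pydata.org/docs/whatsnew/v1.1.3.html

These are the changes in pandas 1.1.3. See the release notes for a full changelog including other versions of pandas.

Enhancements

Added support for new Python version

pandas 1.1.3 now supports Python 3.9 (GH 36296).

Development Changes

The minimum version of Cython is now the most recent bug-fix version (0.29.21) (GH 36296).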

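Fixed regressions

Fixed regression in DataFrame.agg(), DataFrame.apply(), Series.agg(), and Series.apply() where an internal suffix was exposed to users when no relabelling was applied (GH 36189)
Fixed regression in IntegerArray unary plus and minus operations raising a TypeError (GH 36063)
Fixed regression where adding a timedelta_range() to a Timestamp raised a ValueError (GH 35897)
Fixed regression in Series.__getitem__() incorrectly raising when the input was a tuple (GH 35534)
Fixed regression in Series.__getitem__() incorrectly raising when the input was a frozenset (GH 35747)
Fixed regression in modulo of Index, Series and DataFrame using numexpr with C rather than Python semantics (GH 36047, GH 36526)
Fixed regression in read_excel() with engine="odf" causing an UnboundLocalError in some cases where cells had nested child nodes (GH 36122, GH 35802)
Fixed regression in DataFrame.replace() replacing inconsistently when a float was used in the replace method (GH 35376)
Fixed regression in Series.loc() on a Series with a MultiIndex containing Timestamp raising an InvalidIndexError (GH 35858)
Fixed regression in DataFrame and Series comparisons between numeric arrays and strings (GH 35700, GH 36377)
Fixed regression in DataFrame.apply() with raw=True and a user function returning a string (GH 35940)
Fixed regression when setting an empty DataFrame column to a Series in preserving the name of the index in the frame (GH 36527)
Fixed regression in Period giving an incorrect value for an ordinal over the maximum timestamp (GH 36430)
Fixed regression in read_table() raising a ValueError when delim_whitespace was set to True (GH 35958)
Fixed regression in Series.dt.normalize() where normalizing pre-epoch dates shifted the result backward by one day (GH 36294)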
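
A minimal sketch of the behavior two of these fixes restore: unary operators on a nullable integer array (GH 36063) and Python-style modulo (GH 36047). The sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Unary plus and minus on a nullable integer array keep the dtype
# and leave the missing value in place.
arr = pd.array([1, 2, None], dtype="Int64")
negated = -arr
unchanged = +arr

# Modulo follows Python semantics: the result takes the sign of the divisor.
remainders = pd.Series([-7, 7]) % 3
print(negated, remainders.tolist())
```

Under C semantics the first remainder would be -1; Python semantics give 2.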

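Bug fixes

Bug in read_spss() where passing a pathlib.Path as path would raise a TypeError (GH 33666)
Bug in Series.str.startswith() and Series.str.endswith() with category dtype not propagating the na parameter (GH 36241)
Bug in the Series constructor where integer overflow would occur for sufficiently large scalar inputs when an index was provided (GH 36291)
Bug in DataFrame.sort_values() raising an AttributeError when sorting on a key that casts the column to categorical dtype (GH 36383)
Bug in DataFrame.stack() raising a ValueError when stacking MultiIndex columns based on position when the levels had duplicate names (GH 36353)
Bug in Series.astype() showing too much precision when casting from np.float32 to string dtype (GH 36451)
Bug in Series.isin() and DataFrame.isin() when using NaN and a row length above 1,000,000 (GH 22205)
Bug in cut() raising a ValueError when passed a Series of labels with ordered=False (GH 36603)

Other

Reverted the enhancement added in pandas-1.1.0 where timedelta_range() infers a frequency when passed start, stop, and periods (GH 32377)

Contributors

A total of 16 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.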

Asish Mahapatra
Dan Moore +
Daniel Saxton
Fangchen Li
Hans
Irv Lustig
Joris Van den Bossche
Kaiqi Dong
MeeseeksMachine
Number42 +
Pandas Development Team
Richard Shadrach
Simon Hawkins
jbrockmendel
nrebena
patrick

What's new in 1.1.2 (September 8, 2020)

Original: pandas.pydata.org/docs/whatsnew/v1.1.2.html

These are the changes in pandas 1.1.2. See the release notes for a full changelog including other versions of pandas.

Fixed regressions

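Regression in DatetimeIndex.intersection() incorrectly raising an AssertionError when intersecting against a list (GH 35876)
Fix regression in updating a column inplace (e.g. using df['col'].fillna(.., inplace=True)) (GH 35731)
Fix regression in DataFrame.append() mixing tz-aware and tz-naive datetime columns (GH 35460)
Performance regression for RangeIndex.format() (GH 35712)
Regression where MultiIndex.get_loc() would return a slice spanning the full index when passed an empty list (GH 35878)
Fix regression in invalid cache after an indexing operation; this can manifest when setting does not update the data (GH 35521)
Regression in DataFrame.replace() where a TypeError would be raised when attempting to replace elements of type Interval (GH 35931)
Fix regression in the pickle roundtrip of the closed attribute of IntervalIndex (GH 35658)
Fixed regression in DataFrameGroupBy.agg() where a ValueError: buffer source array is read-only would be raised when the underlying array is read-only (GH 36014)
Fixed regression in Series.groupby.rolling() where the number of levels of the MultiIndex in the input was compressed to one (GH 36018)
Fixed regression in DataFrameGroupBy on an empty DataFrame (GH 36197)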
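
The pickle roundtrip item (GH 35658) lends itself to a quick check. This is only a sketch, and the interval range used here is an arbitrary assumption.

```python
import pickle

import pandas as pd

# Build an IntervalIndex that is closed on the left rather than the default right.
intervals = pd.interval_range(start=0, end=3, closed="left")

# Serialize and restore it; the closed attribute should survive the roundtrip.
restored = pickle.loads(pickle.dumps(intervals))
print(restored.closed)
```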

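Bug fixes

Bug in DataFrame.eval() with object dtype column binary operations (GH 35794)
Bug in the Series constructor raising a TypeError when constructing sparse datetime64 dtypes (GH 35762)
Bug in DataFrame.apply() with result_type="reduce" returning a result with an incorrect index (GH 35683)
Bug in Series.astype() and DataFrame.astype() not respecting the errors argument when set to "ignore" for extension dtypes (GH 35471)
Bug in DateTimeIndex.format() and PeriodIndex.format() with name=True setting the first item to "None" where it should be "" (GH 35712)
Bug in Float64Index.__contains__() incorrectly raising a TypeError instead of returning False (GH 35788)
Bug in the Series constructor incorrectly raising a TypeError when passed an ordered set (GH 36044)
Bug in Series.dt.isocalendar() and DatetimeIndex.isocalendar() returning an incorrect year for certain dates (GH 36032)
Bug in DataFrame indexing returning an incorrect Series in some cases when the series has been altered and a cache not invalidated (GH 33675)
Bug in DataFrame.corr() causing subsequent indexing lookups to be incorrect (GH 35882)
Bug in import_optional_dependency() returning incorrect package names in cases where the package name differs from the import name (GH 35948)
Bug when setting an empty DataFrame column to a Series in preserving the name of the index in the frame (GH 31368)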
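
To make the isocalendar() item (GH 36032) concrete, here is a small sketch using two dates near a year boundary, where the ISO year differs from the calendar year. The dates are assumptions picked for illustration.

```python
import pandas as pd

# 2019-12-30 is a Monday that belongs to ISO week 1 of 2020;
# 2021-01-03 is a Sunday that belongs to ISO week 53 of 2020.
dates = pd.Series(pd.to_datetime(["2019-12-30", "2021-01-03"]))

# isocalendar() returns a DataFrame with year, week and day columns.
iso = dates.dt.isocalendar()
print(iso)
```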

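Other

factorize() now supports na_sentinel=None to include NaN in the uniques of the values, and the dropna keyword, which was unintentionally exposed in the public API of factorize() in 1.1, has been removed (GH 35667)
DataFrame.plot() and Series.plot() raising a UserWarning about usage of FixedFormatter and FixedLocator (GH 35684 and GH 35945)

Contributors

A total of 16 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.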

Ali McMaster
Asish Mahapatra
Daniel Saxton
Fangchen Li
Harsh Sharma +
Irv Lustig
Jeet Parekh +
Joris Van den Bossche
Kaiqi Dong
Matthew Roeschke
MeeseeksMachine
Pandas Development Team
Simon Hawkins
Terji Petersen
jbrockmendel
patrick

What's new in 1.1.1 (August 20, 2020)

Original: pandas.pydata.org/docs/whatsnew/v1.1.1.html

These are the changes in pandas 1.1.1. See the release notes for a full changelog including other versions of pandas.

Fixed regressions

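Fixed regression in CategoricalIndex.format() where, when stringified scalars had different lengths, the shorter string would be right-filled with spaces so that it had the same length as the longest string (GH 35439)
Fixed regression in Series.truncate() when trying to truncate a single-element series (GH 35544)
Fixed regression where DataFrame.to_numpy() would raise a RuntimeError for mixed dtypes when converting to str (GH 35455)
Fixed regression where read_csv() would raise a ValueError when pandas.options.mode.use_inf_as_na was set to True (GH 35493)
Fixed regression where pandas.testing.assert_series_equal() would raise an error when non-numeric dtypes were passed with check_exact=True (GH 35446)
Fixed regression in .groupby(..).rolling(..) where column selection was ignored (GH 35486)
Fixed regression where DataFrame.interpolate() would raise a TypeError when the DataFrame was empty (GH 35598)
Fixed regression in DataFrame.shift() with axis=1 and heterogeneous dtypes (GH 35488)
Fixed regression in DataFrame.diff() with read-only data (GH 35559)
Fixed regression in .groupby(..).rolling(..) where a segfault would occur with center=True and an odd number of values (GH 35552)
Fixed regression in DataFrame.apply() where functions that altered the input in-place only operated on a single row (GH 35462)
Fixed regression in DataFrame.reset_index() raising a ValueError on an empty DataFrame with a MultiIndex that has a datetime64 dtype level (GH 35606, GH 35657)
Fixed regression where pandas.merge_asof() would raise an UnboundLocalError when left_index, right_index and tolerance were set (GH 35558)
Fixed regression in .groupby(..).rolling(..) where a custom BaseIndexer would be ignored (GH 35557)
Fixed regression in DataFrame.replace() and Series.replace() where compiled regular expressions would be ignored during replacement (GH 35680)
Fixed regression in DataFrameGroupBy.aggregate() where a list of functions would produce the wrong results if at least one of the functions did not aggregate (GH 35490)
Fixed memory usage issue when instantiating a large pandas.arrays.StringArray (GH 35499)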

Bug fixes

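Bug in Styler whereby the cell_ids argument had no effect due to other recent changes (GH 35588) (GH 35663)
Bug in pandas.testing.assert_series_equal() and pandas.testing.assert_frame_equal() where extension dtypes were not ignored when check_dtypes was set to False (GH 35715)
Bug in to_timedelta() failing when arg is a Series with Int64 dtype containing null values (GH 35574)
Bug in .groupby(..).rolling(..) where passing closed with column selection would raise a ValueError (GH 35549)
Bug in the DataFrame constructor failing to raise a ValueError in some cases when data and index have mismatched lengths (GH 33437)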
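
The to_timedelta() item (GH 35574) can be sketched as follows. The input values and the choice of seconds as the unit are assumptions for illustration only.

```python
import pandas as pd

# A nullable-integer Series with a missing value in the middle.
seconds = pd.Series([1, None, 90], dtype="Int64")

# With the fix, the conversion succeeds and the missing value becomes NaT.
durations = pd.to_timedelta(seconds, unit="s")
print(durations)
```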

Contributors

A total of 20 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

Ali McMaster
Daniel Saxton
Eric Goddard +
Fangchen Li
Isaac Virshup
Joris Van den Bossche
Kevin Sheppard
Matthew Roeschke
MeeseeksMachine +
Pandas Development Team
Richard Shadrach
Simon Hawkins
Terji Petersen
Tom Augspurger
Yutaro Ikeda +
attack68 +
edwardkong +
gabicca +
jbrockmendel
sanderland +

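What's new in 1.1.0 (July 28, 2020)

Original: pandas.pydata.org/docs/whatsnew/v1.1.0.html

These are the changes in pandas 1.1.0. See the release notes for a full changelog including other versions of pandas.

Enhancements

KeyErrors raised by loc specify missing labels

Previously, if labels were missing for a .loc call, a KeyError was raised stating that this was no longer supported.

Now the error message also includes a list of the missing labels (max 10 items, display width 80 characters). See GH 34272.

### All dtypes can now be converted to StringDtype

Previously, declaring or converting to StringDtype was in general only possible if the data was already only str or nan-like (GH 31204). StringDtype now works in all situations where astype(str) or dtype=str work: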

For example, the below now works:

```py
In [1]: ser = pd.Series([1, "abc", np.nan], dtype="string")

In [2]: ser
Out[2]: 
0       1
1     abc
2    <NA>
dtype: string

In [3]: ser[0]
Out[3]: '1'

In [4]: pd.Series([1, 2, np.nan], dtype="Int64").astype("string")
Out[4]: 
0       1
1       2
2    <NA>
dtype: string 
```

### Non-monotonic PeriodIndex partial string slicing

`PeriodIndex` now supports partial string slicing for non-monotonic indexes, mirroring `DatetimeIndex` behavior ([GH 31096](https://github.com/pandas-dev/pandas/issues/31096))

For example:

```py
In [5]: dti = pd.date_range("2014-01-01", periods=30, freq="30D")

In [6]: pi = dti.to_period("D")

In [7]: ser_monotonic = pd.Series(np.arange(30), index=pi)

In [8]: shuffler = list(range(0, 30, 2)) + list(range(1, 31, 2))

In [9]: ser = ser_monotonic.iloc[shuffler]

In [10]: ser
Out[10]: 
2014-01-01     0
2014-03-02     2
2014-05-01     4
2014-06-30     6
2014-08-29     8
 ..
2015-09-23    21
2015-11-22    23
2016-01-21    25
2016-03-21    27
2016-05-20    29
Freq: D, Length: 30, dtype: int64

In [11]: ser["2014"]
Out[11]: 
2014-01-01     0
2014-03-02     2
2014-05-01     4
2014-06-30     6
2014-08-29     8
2014-10-28    10
2014-12-27    12
2014-01-31     1
2014-04-01     3
2014-05-31     5
2014-07-30     7
2014-09-28     9
2014-11-27    11
Freq: D, dtype: int64

In [12]: ser.loc["May 2015"]
Out[12]: 
2015-05-26    17
Freq: D, dtype: int64 
```

### Comparing two `DataFrame` or two `Series` and summarizing the differences

We've added `DataFrame.compare()` and `Series.compare()` for comparing two `DataFrame` or two `Series` ([GH 30429](https://github.com/pandas-dev/pandas/issues/30429))

```py
In [13]: df = pd.DataFrame(
 ....:    {
 ....:        "col1": ["a", "a", "b", "b", "a"],
 ....:        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
 ....:        "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
 ....:    },
 ....:    columns=["col1", "col2", "col3"],
 ....: )
 ....: 

In [14]: df
Out[14]: 
 col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0

In [15]: df2 = df.copy()

In [16]: df2.loc[0, 'col1'] = 'c'

In [17]: df2.loc[2, 'col3'] = 4.0

In [18]: df2
Out[18]: 
 col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

In [19]: df.compare(df2)
Out[19]: 
 col1       col3 
 self other self other
0    a     c  NaN   NaN
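2  NaN   NaN  3.0   4.0
```

See the User Guide for more details.

### Allow NA in groupby key

With groupby, we've added a dropna keyword to DataFrame.groupby() and Series.groupby() in order to allow NA values in group keys. Users can set dropna to False if they want to include NA values in groupby keys. The default for dropna is True to keep backwards compatibility (GH 3729)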

In [20]: df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]

In [21]: df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])

In [22]: df_dropna
Out[22]: 
 a    b  c
0  1  2.0  3
1  1  NaN  4
2  2  1.0  3
3  1  2.0  2

# Default ``dropna`` is set to True, which will exclude NaNs in keys
In [23]: df_dropna.groupby(by=["b"], dropna=True).sum()
Out[23]: 
 a  c
b 
1.0  2  3
2.0  2  5

# In order to allow NaN in keys, set ``dropna`` to False
In [24]: df_dropna.groupby(by=["b"], dropna=False).sum()
Out[24]: 
 a  c
b 
1.0  2  3
2.0  2  5
NaN  1  4

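The default setting of the dropna argument is True, which means NA values are not included in group keys.

### Sorting with keys

We've added a key argument to the DataFrame and Series sorting methods, including DataFrame.sort_values(), DataFrame.sort_index(), Series.sort_values(), and Series.sort_index(). The key can be any callable function, which is applied column-by-column to each column used for sorting before sorting is performed (GH 27237). See sort_values with keys and sort_index with keys for more information.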

In [25]: s = pd.Series(['C', 'a', 'B'])

In [26]: s
Out[26]: 
0    C
1    a
2    B
dtype: object

In [27]: s.sort_values()
Out[27]: 
2    B
0    C
1    a
dtype: object

Note how this is sorted with capital letters first. If we apply the Series.str.lower() method, we get

In [28]: s.sort_values(key=lambda x: x.str.lower())
Out[28]: 
1    a
2    B
0    C
dtype: object

When applied to a DataFrame, the key is applied per-column to all columns, or to a subset if by is specified, e.g.

In [29]: df = pd.DataFrame({'a': ['C', 'C', 'a', 'a', 'B', 'B'],
 ....:                   'b': [1, 2, 3, 4, 5, 6]})
 ....: 

In [30]: df
Out[30]: 
 a  b
0  C  1
1  C  2
2  a  3
3  a  4
4  B  5
5  B  6

In [31]: df.sort_values(by=['a'], key=lambda col: col.str.lower())
Out[31]: 
 a  b
2  a  3
3  a  4
4  B  5
5  B  6
0  C  1
1  C  2
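
The examples above use sort_values(); the same key argument is described for sort_index(). A minimal sketch, assuming a small made-up Series with mixed-case string labels:

```python
import pandas as pd

# Hypothetical Series whose index mixes upper- and lower-case labels.
s = pd.Series([1, 2, 3], index=["b", "C", "a"])

# Default ordering puts capital letters first.
by_default = s.sort_index()

# The key receives the Index and returns the values to sort by.
case_insensitive = s.sort_index(key=lambda idx: idx.str.lower())
print(list(by_default.index), list(case_insensitive.index))
```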

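For more details, see the examples and documentation in DataFrame.sort_values(), Series.sort_values(), and sort_index().

### Fold argument support in Timestamp constructor

Timestamp now supports the keyword-only fold argument according to PEP 495, similar to the parent datetime.datetime class. It supports both accepting fold as an initialization argument and inferring fold from other constructor arguments (GH 25057, GH 31338). Support is limited to dateutil timezones, as pytz doesn't support fold.

For example: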

In [32]: ts = pd.Timestamp("2019-10-27 01:30:00+00:00")

In [33]: ts.fold
Out[33]: 0

In [34]: ts = pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
 ....:                  tz="dateutil/Europe/London", fold=1)
 ....: 

In [35]: ts
Out[35]: Timestamp('2019-10-27 01:30:00+0000', tz='dateutil//usr/share/zoneinfo/Europe/London')

For more on working with fold, see the Fold subsection in the user guide.
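
To show what fold disambiguates, the sketch below builds both occurrences of the same wall-clock time from the example above and compares their UTC offsets. It assumes the dateutil timezone data for Europe/London is available; the variable names are illustrative.

```python
from datetime import timedelta

import pandas as pd

# 01:30 on 2019-10-27 occurs twice in London because clocks go back one hour.
kwargs = dict(year=2019, month=10, day=27, hour=1, minute=30,
              tz="dateutil/Europe/London")

first = pd.Timestamp(fold=0, **kwargs)   # first occurrence, still on summer time
second = pd.Timestamp(fold=1, **kwargs)  # second occurrence, back on GMT
print(first.utcoffset(), second.utcoffset())
```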

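to_datetime() now supports parsing formats containing timezone names (%Z) and UTC offsets (%z) from different timezones, then converting them to UTC by setting utc=True. With utc=True the result is a DatetimeIndex with timezone at UTC; if utc=True is not set, the result is an Index with object dtype (GH 32792).

For example: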

```py
In [36]: tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
 ....:           "2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
 ....: 

In [37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z', utc=True)
Out[37]: 
DatetimeIndex(['2010-01-01 11:00:00+00:00', '2010-01-01 13:00:00+00:00',
 '2010-01-01 09:00:00+00:00', '2010-01-01 08:00:00+00:00'],
 dtype='datetime64[ns, UTC]', freq=None)

In[37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')
Out[37]:
Index([2010-01-01 12:00:00+01:00, 2010-01-01 12:00:00-01:00,
       2010-01-01 12:00:00+03:00, 2010-01-01 12:00:00+04:00],
      dtype='object') 
```

### Grouper and resample now support the arguments origin and offset

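`Grouper` and `DataFrame.resample()` now support the arguments `origin` and `offset`. They let the user control the timestamp on which the grouping is adjusted. ([GH 31809](https://github.com/pandas-dev/pandas/issues/31809))

The bins of the grouping are adjusted based on the beginning of the day of the time series starting point. This works well with frequencies that are multiples of a day (like `30D`) or that divide a day evenly (like `90s` or `1min`). But it can create inconsistencies with some frequencies that do not meet this criterion. To change this behavior you can now specify a fixed timestamp with the argument `origin`.

Two arguments are now deprecated (more information in the documentation of `DataFrame.resample()`):

+   `base` should be replaced by `offset`.

+   `loffset` should be replaced by directly adding an offset to the index of the `DataFrame` after it has been resampled.

Small example of the use of `origin`: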

```py
In [38]: start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'

In [39]: middle = '2000-10-02 00:00:00'

In [40]: rng = pd.date_range(start, end, freq='7min')

In [41]: ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

In [42]: ts
Out[42]: 
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7min, dtype: int64
```

Resample with the default behavior 'start_day' (origin is 2000-10-01 00:00:00):

In [43]: ts.resample('17min').sum()
Out[43]: 
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64

In [44]: ts.resample('17min', origin='start_day').sum()
Out[44]: 
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64

Resample using a fixed origin:

In [45]: ts.resample('17min', origin='epoch').sum()
Out[45]: 
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17min, dtype: int64

In [46]: ts.resample('17min', origin='2000-01-01').sum()
Out[46]: 
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17min, dtype: int64

If needed, you can adjust the bins with the argument offset (a Timedelta), which is added to the default origin.

For a full example, see: Use origin or offset to adjust the start of the bins.
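
As a sketch of the offset argument, the snippet below rebuilds the same series as in the origin example and shifts the default 'start_day' origin by 23 hours 30 minutes, so the first bin starts at the first observation. The offset value is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same series as in the origin example: one value every 7 minutes.
rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

# The offset is added to the default origin (start of day), so the
# 17-minute bins are aligned to 23:30:00 instead of 00:00:00.
shifted = ts.resample("17min", offset="23h30min").sum()
print(shifted)
```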

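fsspec now used for filesystem handling

For reading and writing to filesystems other than local and for reading from HTTP(S), the optional dependency fsspec will be used to dispatch operations (GH 33452). This gives unchanged functionality for S3 and GCS storage, which were already supported, and also adds support for several other storage implementations such as Azure Datalake and Blob, SSH, FTP, dropbox and github. For docs and capabilities, see the fsspec docs.

The existing capability to interface with S3 and GCS will be unaffected by this change, as fsspec will still bring in the same packages as before.

Other enhancements

Compatibility with matplotlib 3.3.0 (GH 34850)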
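IntegerArray.astype() now supports datetime64 dtype (GH 32538)
IntegerArray now implements the sum operation (GH 33172)
Added pandas.errors.InvalidIndexError (GH 34570).
Added DataFrame.value_counts() (GH 5377)
Added a pandas.api.indexers.FixedForwardWindowIndexer() class to support forward-looking windows during rolling operations.
Added a pandas.api.indexers.VariableOffsetWindowIndexer() class to support rolling operations with non-fixed offsets (GH 34994)
describe() now includes a datetime_is_numeric keyword to control how datetime columns are summarized (GH 30164, GH 34798)
Styler may now render CSS more efficiently where multiple cells have the same styling (GH 30876)
highlight_null() now accepts a subset argument (GH 31345)
When writing directly to a sqlite connection, DataFrame.to_sql() now supports the multi method (GH 29921)
pandas.errors.OptionError is now exposed in pandas.errors (GH 27553)
Added api.extensions.ExtensionArray.argmax() and api.extensions.ExtensionArray.argmin() (GH 24382)
timedelta_range() will now infer a frequency when passed start, stop, and periods (GH 32377)
Positional slicing on an IntervalIndex now supports slices with step > 1 (GH 31658)
Series.str now has a fullmatch method that matches a regular expression against the entire string in each row of the Series, similar to re.fullmatch (GH 32806).
DataFrame.sample() will now also allow array-like and BitGenerator objects to be passed to random_state as seeds (GH 32503)
Index.union() will now raise a RuntimeWarning for MultiIndex objects if the objects inside are unsortable. Pass sort=False to suppress this warning (GH 33015)
Added Series.dt.isocalendar() and DatetimeIndex.isocalendar(), which return a DataFrame with year, week, and day calculated according to the ISO 8601 calendar (GH 33206, GH 34392).
The DataFrame.to_feather() method now supports additional keyword arguments (e.g. to set the compression) that were added in pyarrow 0.17 (GH 33422).
cut() will now accept the parameter ordered, with default ordered=True. If ordered=False and no labels are provided, an error will be raised (GH 33141).
DataFrame.to_csv(), DataFrame.to_pickle(), and DataFrame.to_json() now support passing a dict of compression arguments when using the gzip and bz2 protocols. This can be used to set a custom compression level, e.g., df.to_csv(path, compression={'method': 'gzip', 'compresslevel': 1}) (GH 33196).
melt() has gained an ignore_index argument (default True) that, if set to False, prevents the method from dropping the index (GH 17440).
Series.update() now accepts objects that can be coerced to a Series, such as dict and list, mirroring the behavior of DataFrame.update() (GH 33215).
DataFrameGroupBy.transform() and DataFrameGroupBy.aggregate() have gained engine and engine_kwargs arguments that support executing functions with Numba (GH 32854, GH 33388).
Resampler.interpolate() now supports the SciPy interpolation method scipy.interpolate.CubicSpline as method cubicspline (GH 33670).
DataFrameGroupBy and SeriesGroupBy now implement the sample method for doing random sampling within groups (GH 31775).
DataFrame.to_numpy() now supports the na_value keyword to control the NA sentinel in the output array (GH 33820).
Added api.extension.ExtensionArray.equals to the extension array interface, similar to Series.equals() (GH 27081).
The minimum supported dta version has increased to 105 in read_stata() and StataReader (GH 26667).
to_stata() supports compression using the compression keyword argument. Compression can either be inferred or explicitly set using a string or a dictionary containing both the method and any additional arguments that are passed to the compression library. Compression was also added to the low-level Stata-file writers StataWriter, StataWriter117, and StataWriterUTF8 (GH 26599).
HDFStore.put() now accepts a track_times parameter. This parameter is passed to the create_table method of PyTables (GH 32682).
Series.plot() and DataFrame.plot() now accept xlabel and ylabel parameters to present labels on the x and y axes (GH 9093).
Made Rolling and Expanding iterable (GH 11704)
Made option_context a contextlib.ContextDecorator, which allows it to be used as a decorator over an entire function (GH 34253).
DataFrame.to_csv() and Series.to_csv() now accept an errors argument (GH 22610)
DataFrameGroupBy.groupby.transform() now allows func to be pad, backfill and cumcount (GH 31269).
read_json() now accepts an nrows parameter (GH 33916).
DataFrame.hist(), Series.hist(), core.groupby.DataFrameGroupBy.hist(), and core.groupby.SeriesGroupBy.hist() have gained the legend argument. Set it to True to show a legend in the histogram (GH 6279).
concat() and append() now preserve extension dtypes; for example, combining a nullable integer column with a numpy integer column will no longer result in object dtype but preserve the integer dtype (GH 33607, GH 34339, GH 34095).
read_gbq() now allows disabling the progress bar (GH 33360).
read_gbq() now supports the max_results kwarg from pandas-gbq (GH 34639).
DataFrame.cov() and Series.cov() now support a new parameter ddof to support delta degrees of freedom as in the corresponding numpy methods (GH 34611).
The col_space parameter of DataFrame.to_html() and DataFrame.to_string() now accepts a list or dict to change the width of only some specific columns (GH 28917).
DataFrame.to_excel() can now also write OpenOffice spreadsheet (.ods) files (GH 27222).
explode() now accepts ignore_index to reset the index, similar to pd.concat() or DataFrame.sort_values() (GH 34932).
DataFrame.to_markdown() and Series.to_markdown() now accept an index argument as an alias for tabulate's showindex (GH 32667).
read_csv() now accepts string values like "0", "0.0", "1", "1.0" as convertible to the nullable Boolean dtype (GH 34859).
ExponentialMovingWindow now supports a times argument that allows mean to be calculated with observations spaced by the timestamps in times (GH 34839).
DataFrame.agg() and Series.agg() now accept named aggregation for renaming the output columns/indexes (GH 26513).
compute.use_numba now exists as a configuration option that utilizes the numba engine when available (GH 33966, GH 35374).
Series.plot() now supports asymmetric error bars. Previously, if Series.plot() received a "2xN" array with error values for yerr and/or xerr, the left/lower values (first row) were mirrored, while the right/upper values (second row) were ignored. Now, the first row represents the left/lower error values and the second row the right/upper error values. (GH 9536)

## Notable bug fixes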

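These are bug fixes that might have notable behavior changes.

`MultiIndex.get_indexer` interprets the `method` argument correctly

This restores the behavior of MultiIndex.get_indexer() with method prior to pandas 0.23.0. In particular, a MultiIndex is treated as a list of tuples, and padding or backfilling is done with respect to the ordering of these lists of tuples (GH 29896).

As an example of this, given: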

In [47]: df = pd.DataFrame({
 ....:    'a': [0, 0, 0, 0],
 ....:    'b': [0, 2, 3, 4],
 ....:    'c': ['A', 'B', 'C', 'D'],
 ....: }).set_index(['a', 'b'])
 ....: 

In [48]: mi_2 = pd.MultiIndex.from_product([[0], [-1, 0, 1, 3, 4, 5]])

The differences in reindexing df with mi_2 and using method='backfill' can be seen here:

pandas >= 0.23, < 1.1.0:

In [1]: df.reindex(mi_2, method='backfill')
Out[1]:
 c
0 -1  A
 0  A
 1  D
 3  A
 4  A
 5  C

pandas <0.23, >= 1.1.0

In [49]: df.reindex(mi_2, method='backfill')
Out[49]: 
 c
0 -1    A
 0    A
 1    B
 3    C
 4    D
 5  NaN

And the differences in reindexing df with mi_2 and using method='pad' can be seen here:

pandas >= 0.23, < 1.1.0

In [1]: df.reindex(mi_2, method='pad')
Out[1]:
 c
0 -1  NaN
 0  NaN
 1    D
 3  NaN
 4    A
 5    C

pandas < 0.23, >= 1.1.0

In [50]: df.reindex(mi_2, method='pad')
Out[50]: 
 c
0 -1  NaN
 0    A
 1    A
 3    C
 4    D
 5    D

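### Failed label-based lookups always raise KeyError

Label lookups series[key], series.loc[key] and frame.loc[key] used to raise either KeyError or TypeError depending on the type of the key and the type of the Index. These now consistently raise KeyError (GH 31867).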

In [51]: ser1 = pd.Series(range(3), index=[0, 1, 2])

In [52]: ser2 = pd.Series(range(3), index=pd.date_range("2020-02-01", periods=3))

Previous behavior:

In [3]: ser1[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [4]: ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
TypeError: cannot do label indexing on DatetimeIndex with these indexers [1] of type int

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

New behavior:

In [3]: ser1[1.5]
...
KeyError: 1.5

In [4]: ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
KeyError: 1.5

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
KeyError: 1

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

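Similarly, DataFrame.at() and Series.at() will raise a TypeError instead of a ValueError if an incompatible key is passed, and a KeyError if a missing key is passed, matching the behavior of .loc[] (GH 31722).

### Failed integer lookups on MultiIndex raise KeyError

Indexing with integers on a MultiIndex that has an integer-dtype first level previously failed to raise KeyError when one or more of those integer keys was not present in the first level of the index. It now raises KeyError (GH 33539).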

In [53]: idx = pd.Index(range(4))

In [54]: dti = pd.date_range("2000-01-03", periods=3)

In [55]: mi = pd.MultiIndex.from_product([idx, dti])

In [56]: ser = pd.Series(range(len(mi)), index=mi)

Previous behavior:

In [5]: ser[[5]]
Out[5]: Series([], dtype: int64)

New behavior:

In [5]: ser[[5]]
...
KeyError: '[5] not in index'

### `DataFrame.merge()` preserves right frame's row order

DataFrame.merge() now preserves the right frame's row order when executing a right merge (GH 27453).

In [57]: left_df = pd.DataFrame({'animal': ['dog', 'pig'],
 ....:                       'max_speed': [40, 11]})
 ....: 

In [58]: right_df = pd.DataFrame({'animal': ['quetzal', 'pig'],
 ....:                        'max_speed': [80, 11]})
 ....: 

In [59]: left_df
Out[59]: 
 animal  max_speed
0    dog         40
1    pig         11

In [60]: right_df
Out[60]: 
 animal  max_speed
0  quetzal         80
1      pig         11

Previous behavior:

>>> left_df.merge(right_df, on=['animal', 'max_speed'], how="right")
    animal  max_speed
0      pig         11
1  quetzal         80

New behavior:

In [61]: left_df.merge(right_df, on=['animal', 'max_speed'], how="right")
Out[61]: 
 animal  max_speed
0  quetzal         80
1      pig         11

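### Assignment to multiple columns of a DataFrame when some columns do not exist

Assignment to multiple columns of a DataFrame when some of the columns do not exist would previously assign the values to the last column. Now, new columns are constructed with the right values (GH 13658).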

In [62]: df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})

In [63]: df
Out[63]: 
 a  b
0  0  3
1  1  4
2  2  5

Previous behavior:

In [3]: df[['a', 'c']] = 1
In [4]: df
Out[4]:
   a  b
0  1  1
1  1  1
2  1  1

New behavior:

In [64]: df[['a', 'c']] = 1

In [65]: df
Out[65]: 
 a  b  c
0  1  3  1
1  1  4  1
2  1  5  1 
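```

### Consistency across groupby reductions

Using `DataFrame.groupby()` with `as_index=True` and the aggregation `nunique` would include the grouping column(s) in the columns of the result. Now the grouping column(s) only appear in the index, consistent with other reductions ([GH 32579](https://github.com/pandas-dev/pandas/issues/32579)).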

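```py
In [66]: df = pd.DataFrame({"a": ["x", "x", "y", "y"], "b": [1, 1, 2, 3]})

In [67]: df
Out[67]: 
   a  b
0  x  1
1  x  1
2  y  2
3  y  3
```

Previous behavior:

```py
In [3]: df.groupby("a", as_index=True).nunique()
Out[3]:
   a  b
a
x  1  1
y  1  2
```

New behavior:

```py
In [68]: df.groupby("a", as_index=True).nunique()
Out[68]: 
   b
a   
x  1
y  2
```

Using `DataFrame.groupby()` with `as_index=False` and the function `idxmax`, `idxmin`, `mad`, `nunique`, `sem`, `skew`, or `std` would modify the grouping column. Now the grouping column remains unchanged, consistent with other reductions ([GH 21090](https://github.com/pandas-dev/pandas/issues/21090), [GH 10355](https://github.com/pandas-dev/pandas/issues/10355)).

Previous behavior:

```py
In [3]: df.groupby("a", as_index=False).nunique()
Out[3]:
   a  b
0  1  1
1  1  2
```

New behavior:

```py
In [69]: df.groupby("a", as_index=False).nunique()
Out[69]: 
   a  b
0  x  1
1  y  2
```

The method `DataFrameGroupBy.size()` would previously ignore `as_index=False`. Now the grouping columns are returned as columns, making the result a `DataFrame` instead of a `Series` ([GH 32599](https://github.com/pandas-dev/pandas/issues/32599)).

Previous behavior:

```py
In [3]: df.groupby("a", as_index=False).size()
Out[3]:
a
x    2
y    2
dtype: int64
```

New behavior:

```py
In [70]: df.groupby("a", as_index=False).size()
Out[70]: 
   a  size
0  x     2
1  y     2
```

### `DataFrameGroupby.agg()` lost results with `as_index=False` when relabeling columns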

Previously, `DataFrameGroupby.agg()` lost the result columns when the `as_index` option was set to `False` and the result columns were relabeled. In that case the result values were replaced with the previous index ([GH 32240](https://github.com/pandas-dev/pandas/issues/32240)).

```py
In [71]: df = pd.DataFrame({"key": ["x", "y", "z", "x", "y", "z"],
 ....:                   "val": [1.0, 0.8, 2.0, 3.0, 3.6, 0.75]})
 ....: 

In [72]: df
Out[72]: 
  key   val
0   x  1.00
1   y  0.80
2   z  2.00
3   x  3.00
4   y  3.60
5   z  0.75
```

Previous behavior:

```py
In [2]: grouped = df.groupby("key", as_index=False)
In [3]: result = grouped.agg(min_val=pd.NamedAgg(column="val", aggfunc="min"))
In [4]: result
Out[4]:
  min_val
0       x
1       y
2       z
```

New behavior:

```py
In [73]: grouped = df.groupby("key", as_index=False)

In [74]: result = grouped.agg(min_val=pd.NamedAgg(column="val", aggfunc="min"))

In [75]: result
Out[75]: 
  key  min_val
0   x     1.00
1   y     0.80
2   z     0.75
```

### apply and applymap on `DataFrame` evaluates first row/column only once

```py
In [76]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 6]})

In [77]: def func(row):
 ....:    print(row)
 ....:    return row
 ....:
```

Previous behavior:

```py
In [4]: df.apply(func, axis=1)
a    1
b    3
Name: 0, dtype: int64
a    1
b    3
Name: 0, dtype: int64
a    2
b    6
Name: 1, dtype: int64
Out[4]:
   a  b
0  1  3
1  2  6
```

New behavior:

```py
In [78]: df.apply(func, axis=1)
a    1
b    3
Name: 0, dtype: int64
a    2
b    6
Name: 1, dtype: int64
Out[78]: 
   a  b
0  1  3
1  2  6
```

## Backwards incompatible API changes

### Added `check_freq` argument to `testing.assert_frame_equal` and `testing.assert_series_equal`

The `check_freq` argument was added to `testing.assert_frame_equal()` and `testing.assert_series_equal()` in pandas 1.1.0 and defaults to `True`. `testing.assert_frame_equal()` and `testing.assert_series_equal()` now raise `AssertionError` if the indexes do not have the same frequency. Before pandas 1.1.0, the index frequency was not checked.
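
As a rough illustration of the frequency check described above, the sketch below compares two series that hold the same timestamps and values but differ in whether the index carries a `freq`. The variable names and the helper `frequencies_match` are made up for this example; only `pd.testing.assert_series_equal` and its `check_freq` keyword come from pandas.

```python
import pandas as pd

# Same timestamps and values; only the first index carries a frequency.
with_freq = pd.Series(
    [1, 2, 3], index=pd.date_range("2020-01-01", periods=3, freq="D")
)
# Rebuilding the index from the raw NumPy values drops the frequency.
without_freq = pd.Series(
    [1, 2, 3], index=pd.DatetimeIndex(with_freq.index.to_numpy())
)


def frequencies_match(left, right):
    """Hypothetical helper: True if the default comparison (check_freq=True) passes."""
    try:
        pd.testing.assert_series_equal(left, right)
    except AssertionError:
        return False
    return True


default_check_passes = frequencies_match(with_freq, without_freq)

# Opting out of the frequency check; this call is expected not to raise.
pd.testing.assert_series_equal(with_freq, without_freq, check_freq=False)
```

Passing `check_freq=False` is one way to keep older tests working when only the values and labels matter.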

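### Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated ([GH 33718](https://github.com/pandas-dev/pandas/issues/33718), [GH 29766](https://github.com/pandas-dev/pandas/issues/29766), [GH 29723](https://github.com/pandas-dev/pandas/issues/29723), pytables >= 3.4.3). If installed, we now require:

| Package | Minimum Version | Required | Changed |
| --- | --- | --- | --- |
| numpy | 1.15.4 | X | X |
| pytz | 2015.4 | X |  |
| python-dateutil | 2.7.3 | X | X |
| bottleneck | 1.2.1 |  |  |
| numexpr | 2.6.2 |  |  |
| pytest (dev) | 4.0.2 |  |  |

For [optional libraries](https://pandas.pydata.org/docs/getting_started/install.html) the general recommendation is to use the latest version. The following table lists the lowest version of each library that is currently tested during pandas development. Optional libraries below the lowest tested version may still work, but are not considered supported.

| Package | Minimum Version | Changed |
| --- | --- | --- |
| beautifulsoup4 | 4.6.0 |  |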
| fastparquet | 0.3.2 |  |
| fsspec | 0.7.4 |  |
| gcsfs | 0.6.0 | X |
| lxml | 3.8.0 |  |
| matplotlib | 2.2.2 |  |
| numba | 0.46.0 |  |
| openpyxl | 2.5.7 |  |
| pyarrow | 0.13.0 |  |
| pymysql | 0.7.1 |  |
| pytables | 3.4.3 | X |
| s3fs | 0.4.0 | X |
| scipy | 1.2.0 | X |
| sqlalchemy | 1.1.4 |  |
| xarray | 0.8.2 |  |
| xlrd | 1.1.0 |  |
| xlsxwriter | 0.9.8 |  |
| xlwt | 1.2.0 |  |
| pandas-gbq | 1.2.0 | X |

See Dependencies and Optional dependencies for more information.

### Development changes

+   The minimum version of Cython is now the most recent bug-fix version (0.29.16) ([GH 33334](https://github.com/pandas-dev/pandas/issues/33334)).

## Deprecations
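
Several deprecations in this section name a replacement. The sketch below shows four of those replacements side by side. The data and variable names are invented for illustration, and the snippet is not an exhaustive migration guide.

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-01-01", "2020-06-15"]))

# Instead of the deprecated Series.dt.week / Series.dt.weekofyear:
iso_weeks = dates.dt.isocalendar().week

# Instead of the deprecated Categorical.to_dense():
cat = pd.Categorical(["a", "b", "a"])
dense = np.asarray(cat)

# Instead of passing suffixes to merge() as a set, pass a tuple:
left = pd.DataFrame({"key": [1, 2], "value": [10, 20]})
right = pd.DataFrame({"key": [1, 2], "value": [30, 40]})
merged = left.merge(right, on="key", suffixes=("_left", "_right"))

# Instead of Period.to_timestamp(tz=...), localize after converting:
localized = pd.Period("2020-01", freq="M").to_timestamp().tz_localize("UTC")
```

A tuple keeps the suffix order well defined, which a `set` cannot guarantee.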

+   Lookups on a `Series` with a single-item list containing a slice (e.g. `ser[[slice(0, 4)]]`) are deprecated and will raise in a future version. Either convert the list to a tuple, or pass the slice directly instead ([GH 31333](https://github.com/pandas-dev/pandas/issues/31333))

+   `DataFrame.mean()` and `DataFrame.median()` with `numeric_only=None` will include `datetime64` and `datetime64tz` columns in a future version ([GH 29941](https://github.com/pandas-dev/pandas/issues/29941))

+   Setting values with `.loc` using a positional slice is deprecated and will raise in a future version. Use `.loc` with labels or `.iloc` with positions instead ([GH 31840](https://github.com/pandas-dev/pandas/issues/31840))

+   `DataFrame.to_dict()` has deprecated accepting short names for `orient` and will raise in a future version ([GH 32515](https://github.com/pandas-dev/pandas/issues/32515))

+   `Categorical.to_dense()` is deprecated and will be removed in a future version; use `np.asarray(cat)` instead ([GH 32639](https://github.com/pandas-dev/pandas/issues/32639))

+   The `fastpath` keyword in the `SingleBlockManager` constructor is deprecated and will be removed in a future version ([GH 33092](https://github.com/pandas-dev/pandas/issues/33092))

+   Providing `suffixes` as a `set` in `pandas.merge()` is deprecated. Provide a tuple instead ([GH 33740](https://github.com/pandas-dev/pandas/issues/33740), [GH 34741](https://github.com/pandas-dev/pandas/issues/34741)).

+   Indexing a `Series` with a multi-dimensional indexer like `[:, None]` to return an `ndarray` now raises a `FutureWarning`. Convert to a NumPy array before indexing instead ([GH 27837](https://github.com/pandas-dev/pandas/issues/27837))

+   `Index.is_mixed()` is deprecated and will be removed in a future version; check `index.inferred_type` directly instead ([GH 32922](https://github.com/pandas-dev/pandas/issues/32922))

+   Passing any arguments but the first one to `read_html()` as positional arguments is deprecated. All other arguments should be given as keyword arguments ([GH 27573](https://github.com/pandas-dev/pandas/issues/27573)).

+   Passing any arguments but the first one to `read_json()` as positional arguments is deprecated. All other arguments should be given as keyword arguments ([GH 27573](https://github.com/pandas-dev/pandas/issues/27573)).

+   Passing any arguments but the first one to `read_excel()` as positional arguments is deprecated. All other arguments should be given as keyword arguments ([GH 27573](https://github.com/pandas-dev/pandas/issues/27573)).

+   `pandas.api.types.is_categorical()` is deprecated and will be removed in a future version; use `pandas.api.types.is_categorical_dtype()` instead ([GH 33385](https://github.com/pandas-dev/pandas/issues/33385))

+   `Index.get_value()` is deprecated and will be removed in a future version ([GH 19728](https://github.com/pandas-dev/pandas/issues/19728))

+   `Series.dt.week` and `Series.dt.weekofyear` are deprecated and will be removed in a future version; use `Series.dt.isocalendar().week` instead ([GH 33595](https://github.com/pandas-dev/pandas/issues/33595))

+   `DatetimeIndex.week` and `DatetimeIndex.weekofyear` are deprecated and will be removed in a future version; use `DatetimeIndex.isocalendar().week` instead ([GH 33595](https://github.com/pandas-dev/pandas/issues/33595))

+   `DatetimeArray.week` and `DatetimeArray.weekofyear` are deprecated and will be removed in a future version; use `DatetimeArray.isocalendar().week` instead ([GH 33595](https://github.com/pandas-dev/pandas/issues/33595))

+   `DateOffset.__call__()` is deprecated and will be removed in a future version; use `offset + other` instead ([GH 34171](https://github.com/pandas-dev/pandas/issues/34171))

+   `apply_index()` is deprecated and will be removed in a future version; use `offset + other` instead ([GH 34580](https://github.com/pandas-dev/pandas/issues/34580))

+   `DataFrame.tshift()` and `Series.tshift()` are deprecated and will be removed in a future version; use `DataFrame.shift()` and `Series.shift()` instead ([GH 11631](https://github.com/pandas-dev/pandas/issues/11631))

+   Indexing an `Index` object with a float key is deprecated and will raise an `IndexError` in the future. You can manually convert to an integer key instead ([GH 34191](https://github.com/pandas-dev/pandas/issues/34191)).

+   The `squeeze` keyword in `groupby()` is deprecated and will be removed in a future version ([GH 32380](https://github.com/pandas-dev/pandas/issues/32380))

+   The `tz` keyword in `Period.to_timestamp()` is deprecated and will be removed in a future version; use `per.to_timestamp(...).tz_localize(tz)` instead ([GH 34522](https://github.com/pandas-dev/pandas/issues/34522))

+   `DatetimeIndex.to_perioddelta()` is deprecated and will be removed in a future version. Use `index - index.to_period(freq).to_timestamp()` instead ([GH 34853](https://github.com/pandas-dev/pandas/issues/34853))

+   `DataFrame.melt()` accepting a `value_name` that already exists is deprecated and will be removed in a future version ([GH 34731](https://github.com/pandas-dev/pandas/issues/34731))

+   The `center` keyword in the `DataFrame.expanding()` function is deprecated and will be removed in a future version ([GH 20647](https://github.com/pandas-dev/pandas/issues/20647))

## Performance improvements

+   Performance improvement in the `Timedelta` constructor ([GH 30543](https://github.com/pandas-dev/pandas/issues/30543))

+   Performance improvement in the `Timestamp` constructor ([GH 30543](https://github.com/pandas-dev/pandas/issues/30543))

+   Performance improvement in flexible arithmetic operations between `DataFrame` and `Series` with `axis=0` ([GH 31296](https://github.com/pandas-dev/pandas/issues/31296))

+   Performance improvement in arithmetic operations between `DataFrame` and `Series` with `axis=1` ([GH 33600](https://github.com/pandas-dev/pandas/issues/33600))

+   The internal index method `_shallow_copy()` now copies cached attributes over to the new index, avoiding creating them again on the new index. This can speed up many operations that depend on creating copies of existing indexes ([GH 28584](https://github.com/pandas-dev/pandas/issues/28584), [GH 32640](https://github.com/pandas-dev/pandas/issues/32640), [GH 32669](https://github.com/pandas-dev/pandas/issues/32669))

+   Significant performance improvement when creating a `DataFrame` with sparse values from `scipy.sparse` matrices using the `DataFrame.sparse.from_spmatrix()` constructor ([GH 32821](https://github.com/pandas-dev/pandas/issues/32821), [GH 32825](https://github.com/pandas-dev/pandas/issues/32825), [GH 32826](https://github.com/pandas-dev/pandas/issues/32826), [GH 32856](https://github.com/pandas-dev/pandas/issues/32856), [GH 32858](https://github.com/pandas-dev/pandas/issues/32858)).

+   Performance improvement for the groupby methods `Groupby.first()` and `Groupby.last()` ([GH 34178](https://github.com/pandas-dev/pandas/issues/34178))

+   Performance improvement in `factorize()` for nullable (integer and Boolean) dtypes ([GH 33064](https://github.com/pandas-dev/pandas/issues/33064)).

+   Performance improvement when constructing `Categorical` objects ([GH 33921](https://github.com/pandas-dev/pandas/issues/33921))

+   Fixed performance regression in `pandas.qcut()` and `pandas.cut()` ([GH 33921](https://github.com/pandas-dev/pandas/issues/33921))

+   Performance improvement in reductions (`sum`, `prod`, `min`, `max`) for nullable (integer and Boolean) dtypes ([GH 30982](https://github.com/pandas-dev/pandas/issues/30982), [GH 33261](https://github.com/pandas-dev/pandas/issues/33261), [GH 33442](https://github.com/pandas-dev/pandas/issues/33442)).

+   Performance improvement in arithmetic operations between two `DataFrame` objects ([GH 32779](https://github.com/pandas-dev/pandas/issues/32779))

+   Performance improvement in `RollingGroupby` ([GH 34052](https://github.com/pandas-dev/pandas/issues/34052))

+   Performance improvement in arithmetic operations (`sub`, `add`, `mul`, `div`) for `MultiIndex` ([GH 34297](https://github.com/pandas-dev/pandas/issues/34297))

+   Performance improvement in `DataFrame[bool_indexer]` when `bool_indexer` is a `list` ([GH 33924](https://github.com/pandas-dev/pandas/issues/33924))

+   Performance improvement when styles are added in various ways, such as `io.formats.style.Styler.apply()`, `io.formats.style.Styler.applymap()` or `io.formats.style.Styler.bar()` ([GH 19917](https://github.com/pandas-dev/pandas/issues/19917))

## Bug fixes

### Categorical

+   Passing an invalid `fill_value` to `Categorical.take()` raises a `ValueError` instead of a `TypeError` ([GH 33660](https://github.com/pandas-dev/pandas/issues/33660))

+   Combining a `Categorical` that has integer categories and contains missing values with a float dtype column in operations such as `concat()` or `append()` will now result in a float column instead of an object dtype column ([GH 33607](https://github.com/pandas-dev/pandas/issues/33607))

+   Bug where `merge()` was unable to join on non-unique categorical indices ([GH 28189](https://github.com/pandas-dev/pandas/issues/28189))

+   Bug when passing categorical data to the `Index` constructor along with `dtype=object` incorrectly returning a `CategoricalIndex` instead of an object-dtype `Index` ([GH 32167](https://github.com/pandas-dev/pandas/issues/32167))

+   Bug where the `Categorical` comparison operator `__ne__` would incorrectly evaluate to `False` when either element was missing ([GH 32276](https://github.com/pandas-dev/pandas/issues/32276))

+   `Categorical.fillna()` now accepts a `Categorical` `other` argument ([GH 32420](https://github.com/pandas-dev/pandas/issues/32420))

+   The repr of `Categorical` did not distinguish between `int` and `str` ([GH 33676](https://github.com/pandas-dev/pandas/issues/33676))
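
To make the `__ne__` fix above concrete, here is a small sketch of inequality comparisons on a `Categorical` that contains a missing entry. The sample values are invented, and the comments describe the fixed behavior as stated in the list above.

```python
import pandas as pd

cat = pd.Categorical(["a", None, "b"])

# Element-wise inequality against a scalar: the missing entry counts as "not equal".
not_equal_a = cat != "a"

# Element-wise inequality against another categorical with the same categories.
other = pd.Categorical(["a", "a", "b"], categories=cat.categories)
not_equal_other = cat != other
```

Both results are Boolean arrays with one entry per element of `cat`.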

### Datetimelike

+   Passing an integer dtype other than `int64` to `np.array(period_index, dtype=...)` will now raise `TypeError` instead of incorrectly using `int64` ([GH 32255](https://github.com/pandas-dev/pandas/issues/32255))

+   `Series.to_timestamp()` now raises a `TypeError` if the axis is not a `PeriodIndex`. Previously an `AttributeError` was raised ([GH 33327](https://github.com/pandas-dev/pandas/issues/33327))

+   `Series.to_period()` now raises a `TypeError` if the axis is not a `DatetimeIndex`. Previously an `AttributeError` was raised ([GH 33327](https://github.com/pandas-dev/pandas/issues/33327))

+   `Period` no longer accepts tuples for the `freq` argument ([GH 34658](https://github.com/pandas-dev/pandas/issues/34658))

+   Bug in `Timestamp` where constructing a `Timestamp` from an ambiguous epoch time and calling the constructor again changed the `Timestamp.value()` property ([GH 24329](https://github.com/pandas-dev/pandas/issues/24329))

+   `DatetimeArray.searchsorted()`, `TimedeltaArray.searchsorted()`, `PeriodArray.searchsorted()` did not recognize non-pandas scalars and incorrectly raised `ValueError` instead of `TypeError` ([GH 30950](https://github.com/pandas-dev/pandas/issues/30950))

+   Bug in `Timestamp` where constructing a `Timestamp` with a dateutil timezone less than 128 nanoseconds before the daylight saving time switch from winter to summer would result in a nonexistent time ([GH 31043](https://github.com/pandas-dev/pandas/issues/31043))

+   Bug in `Period.to_timestamp()` and `Period.start_time()` with microsecond frequency returning a timestamp one nanosecond earlier than the correct time ([GH 31475](https://github.com/pandas-dev/pandas/issues/31475))

+   `Timestamp` raised a confusing error message when year, month or day was missing ([GH 31200](https://github.com/pandas-dev/pandas/issues/31200))

+   Bug in the `DatetimeIndex` constructor incorrectly accepting `bool`-dtype inputs ([GH 32668](https://github.com/pandas-dev/pandas/issues/32668))

+   Bug in `DatetimeIndex.searchsorted()` not accepting a `list` or `Series` as its argument ([GH 32762](https://github.com/pandas-dev/pandas/issues/32762))

+   Bug where `PeriodIndex()` raised when passed a `Series` of strings ([GH 26109](https://github.com/pandas-dev/pandas/issues/26109))

+   Bug in `Timestamp` arithmetic when adding or subtracting an `np.ndarray` with `timedelta64` dtype ([GH 33296](https://github.com/pandas-dev/pandas/issues/33296))

+   Bug in `DatetimeIndex.to_period()` not inferring the frequency when called with no arguments ([GH 33358](https://github.com/pandas-dev/pandas/issues/33358))

+   Bug in `DatetimeIndex.tz_localize()` incorrectly retaining `freq` in some cases where the original `freq` is no longer valid ([GH 30511](https://github.com/pandas-dev/pandas/issues/30511))

+   Bug in `DatetimeIndex.intersection()` losing `freq` and timezone in some cases ([GH 33604](https://github.com/pandas-dev/pandas/issues/33604))

+   Bug in `DatetimeIndex.get_indexer()` where incorrect output would be returned for mixed datetime-like targets ([GH 33741](https://github.com/pandas-dev/pandas/issues/33741))

+   Bug in `DatetimeIndex` addition and subtraction with some types of `DateOffset` objects incorrectly retaining an invalid `freq` attribute ([GH 33779](https://github.com/pandas-dev/pandas/issues/33779))

+   Bug in `DatetimeIndex` where setting the `freq` attribute on an index could silently change the `freq` attribute on another index viewing the same data ([GH 33552](https://github.com/pandas-dev/pandas/issues/33552))

+   `DataFrame.min()` and `DataFrame.max()` were not returning results consistent with `Series.min()` and `Series.max()` when called on objects initialized with an empty `pd.to_datetime()`

+   Bug in `DatetimeIndex.intersection()` and `TimedeltaIndex.intersection()` with results not having the correct `name` attribute ([GH 33904](https://github.com/pandas-dev/pandas/issues/33904))

+   Bug in `DatetimeArray.__setitem__()`, `TimedeltaArray.__setitem__()`, `PeriodArray.__setitem__()` incorrectly allowing values with `int64` dtype to be silently cast ([GH 33717](https://github.com/pandas-dev/pandas/issues/33717))

+   Bug in subtracting `TimedeltaIndex` from `Period` incorrectly raising `TypeError` in some cases where it should succeed, and `IncompatibleFrequency` in some cases where it should raise `TypeError` ([GH 33883](https://github.com/pandas-dev/pandas/issues/33883))

+   Bug in constructing a `Series` or `Index` from a read-only NumPy array with non-nanosecond resolution, which converted to object dtype instead of coercing to `datetime64[ns]` dtype when within the timestamp bounds ([GH 34843](https://github.com/pandas-dev/pandas/issues/34843)).

+   The `freq` keyword in `Period`, `date_range()`, `period_range()`, `pd.tseries.frequencies.to_offset()` no longer allows tuples; pass a string instead ([GH 34703](https://github.com/pandas-dev/pandas/issues/34703))

+   Bug where appending a `Series` containing a scalar tz-aware `Timestamp` to an empty `DataFrame` resulted in an object column instead of `datetime64[ns, tz]` dtype ([GH 35038](https://github.com/pandas-dev/pandas/issues/35038))

+   `OutOfBoundsDatetime` issues an improved error message when the timestamp is out of implementation bounds ([GH 32967](https://github.com/pandas-dev/pandas/issues/32967))

+   Bug in `AbstractHolidayCalendar.holidays()` when no rules were defined ([GH 31415](https://github.com/pandas-dev/pandas/issues/31415))

+   Bug in `Tick` comparisons raising `TypeError` when comparing against timedelta-like objects ([GH 34088](https://github.com/pandas-dev/pandas/issues/34088))

+   Bug in `Tick` multiplication raising `TypeError` when multiplying by a float ([GH 34486](https://github.com/pandas-dev/pandas/issues/34486))
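
As a minimal sketch of the `to_timestamp()` / `to_period()` entries above, the snippet below calls both on a series whose index is a plain `RangeIndex`. The helper `raised_type` is a hypothetical convenience written for this example, not a pandas function.

```python
import pandas as pd

# Default RangeIndex: neither a PeriodIndex nor a DatetimeIndex.
ser = pd.Series([1, 2, 3])


def raised_type(func):
    """Hypothetical helper: name of the exception type raised by func(), or None."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None


to_timestamp_error = raised_type(ser.to_timestamp)
to_period_error = raised_type(ser.to_period)
```

Code that previously caught `AttributeError` around these calls would need to catch `TypeError` instead.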

### Timedelta

+   Bug in constructing a `Timedelta` with a high-precision integer that would round the `Timedelta` components ([GH 31354](https://github.com/pandas-dev/pandas/issues/31354))

+   Bug in dividing `np.nan` or `None` by `Timedelta` incorrectly returning `NaT` ([GH 31869](https://github.com/pandas-dev/pandas/issues/31869))

+   `Timedelta` now understands `µs` as an identifier for microsecond ([GH 32899](https://github.com/pandas-dev/pandas/issues/32899))

+   The string representation of `Timedelta` now includes nanoseconds when nanoseconds are non-zero ([GH 9309](https://github.com/pandas-dev/pandas/issues/9309))

+   Bug in comparing a `Timedelta` object against an `np.ndarray` with `timedelta64` dtype incorrectly viewing all entries as unequal ([GH 33441](https://github.com/pandas-dev/pandas/issues/33441))

+   Bug in `timedelta_range()` that produced an extra point on an edge case ([GH 30353](https://github.com/pandas-dev/pandas/issues/30353), [GH 33498](https://github.com/pandas-dev/pandas/issues/33498))

+   Bug in `DataFrame.resample()` that produced an extra point on an edge case ([GH 30353](https://github.com/pandas-dev/pandas/issues/30353), [GH 13022](https://github.com/pandas-dev/pandas/issues/13022), [GH 33498](https://github.com/pandas-dev/pandas/issues/33498))

+   Bug in `DataFrame.resample()` that ignored the `loffset` argument when dealing with timedeltas ([GH 7687](https://github.com/pandas-dev/pandas/issues/7687), [GH 33498](https://github.com/pandas-dev/pandas/issues/33498))

+   Bug in `Timedelta` and `pandas.to_timedelta()` that ignored the `unit` argument for string input ([GH 12136](https://github.com/pandas-dev/pandas/issues/12136))
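
Two of the `Timedelta` items above are easy to check interactively: the nanosecond component in the string form, and element-wise comparison against a `timedelta64` array. The sketch below uses made-up values and should be read as an illustration rather than a test of every edge case.

```python
import numpy as np
import pandas as pd

# The string form includes the nanosecond component when it is non-zero.
one_ns = pd.Timedelta(1, unit="ns")
text = str(one_ns)

# Comparison against a timedelta64 array is element-wise.
one_day = pd.Timedelta("1 day")
arr = np.array([86_400, 3_600], dtype="timedelta64[s]")  # one day, one hour
equal_mask = one_day == arr
```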

### Timezones

+   Bug in `to_datetime()` with `infer_datetime_format=True` where timezone names (e.g. `UTC`) would not be parsed correctly ([GH 33133](https://github.com/pandas-dev/pandas/issues/33133))

### Numeric

+   Bug in `DataFrame.floordiv()` with `axis=0` not treating division by zero like `Series.floordiv()` ([GH 31271](https://github.com/pandas-dev/pandas/issues/31271))

+   Bug in `to_numeric()` with string argument `"uint64"` and `errors="coerce"` silently failing ([GH 32394](https://github.com/pandas-dev/pandas/issues/32394))

+   Bug in `to_numeric()` with `downcast="unsigned"` failing for empty data ([GH 32493](https://github.com/pandas-dev/pandas/issues/32493))

+   Bug in `DataFrame.mean()` with `numeric_only=False` and either `datetime64` dtype or `PeriodDtype` column incorrectly raising `TypeError` ([GH 32426](https://github.com/pandas-dev/pandas/issues/32426))

+   Bug in `DataFrame.count()` with `level="foo"` and index level `"foo"` containing NaNs causing a segmentation fault ([GH 21824](https://github.com/pandas-dev/pandas/issues/21824))

+   Bug in `DataFrame.diff()` with `axis=1` returning incorrect results with mixed dtypes ([GH 32995](https://github.com/pandas-dev/pandas/issues/32995))

+   Bug in `DataFrame.corr()` and `DataFrame.cov()` raising when handling nullable integer columns with `pandas.NA` ([GH 33803](https://github.com/pandas-dev/pandas/issues/33803))

+   Bug in arithmetic operations between `DataFrame` objects with non-overlapping columns with duplicate labels causing an infinite loop ([GH 35194](https://github.com/pandas-dev/pandas/issues/35194))

+   Bug in `DataFrame` and `Series` addition and subtraction between object-dtype objects and `datetime64` dtype objects ([GH 33824](https://github.com/pandas-dev/pandas/issues/33824))

+   Bug in `Index.difference()` giving incorrect results when comparing a `Float64Index` and an object `Index` ([GH 35217](https://github.com/pandas-dev/pandas/issues/35217))

+   Bug in `DataFrame` reductions (e.g. `df.min()`, `df.max()`) with `ExtensionArray` dtypes ([GH 34520](https://github.com/pandas-dev/pandas/issues/34520), [GH 32651](https://github.com/pandas-dev/pandas/issues/32651))

+   `Series.interpolate()` and `DataFrame.interpolate()` now raise a `ValueError` if `limit_direction` is `'forward'` or `'both'` and `method` is `'backfill'` or `'bfill'`, or if `limit_direction` is `'backward'` or `'both'` and `method` is `'pad'` or `'ffill'` ([GH 34746](https://github.com/pandas-dev/pandas/issues/34746))
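
The `corr()` / `cov()` item above concerns nullable integer columns that contain `pandas.NA`. A short sketch with invented data, where column `b` is exactly twice column `a` and the last row is missing in both:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": pd.array([1, 2, 3, None], dtype="Int64"),
        "b": pd.array([2, 4, 6, None], dtype="Int64"),
    }
)

# Both calls are expected to skip the row holding pd.NA rather than raise.
corr = df.corr()
cov = df.cov()
```

With this data the complete rows are perfectly linearly related, so the off-diagonal correlation should be 1.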

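### Conversion

+   Bug in constructing a `Series` from a NumPy array with big-endian `datetime64` dtype ([GH 29684](https://github.com/pandas-dev/pandas/issues/29684))

+   Bug in `Timedelta` construction with a large nanoseconds keyword value ([GH 32402](https://github.com/pandas-dev/pandas/issues/32402))

+   Bug in `DataFrame` construction where sets would be duplicated rather than raising ([GH 32582](https://github.com/pandas-dev/pandas/issues/32582))

+   The `DataFrame` constructor no longer accepts a list of `DataFrame` objects. Because of changes to NumPy, `DataFrame` objects are now consistently treated as 2D objects, so a list of `DataFrame` objects is considered 3D and is no longer acceptable for the `DataFrame` constructor ([GH 32289](https://github.com/pandas-dev/pandas/issues/32289)).

+   Bug in `DataFrame` when initializing a frame with lists and assigning `columns` with a nested list for a `MultiIndex` ([GH 32173](https://github.com/pandas-dev/pandas/issues/32173))

+   Improved error message for invalid construction of a list when creating a new index ([GH 35190](https://github.com/pandas-dev/pandas/issues/35190))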

### Strings

+   Bug in the `astype()` method when converting "string" dtype data to a nullable integer dtype ([GH 32450](https://github.com/pandas-dev/pandas/issues/32450))

+   Fixed an issue where taking `min` or `max` of a `StringArray` or a `Series` with `StringDtype` would raise ([GH 31746](https://github.com/pandas-dev/pandas/issues/31746))

+   Bug in `Series.str.cat()` returning `NaN` output when the other argument had `Index` type ([GH 33425](https://github.com/pandas-dev/pandas/issues/33425))

+   `pandas.api.types.is_string_dtype()` no longer incorrectly identifies categorical series as string.
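
The first two string fixes above can be exercised with a couple of lines. The sample values below are invented for illustration.

```python
import pandas as pd

# "string" dtype data converted to a nullable integer dtype; the missing value stays missing.
text_numbers = pd.Series(["1", "2", None], dtype="string")
as_int = text_numbers.astype("Int64")

# min / max on a Series with StringDtype compare lexicographically.
words = pd.Series(["pear", "apple", "fig"], dtype="string")
smallest, largest = words.min(), words.max()
```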

### Interval

+   Bug in `IntervalArray` incorrectly allowing the underlying data to be changed when setting values ([GH 32782](https://github.com/pandas-dev/pandas/issues/32782))

### Indexing

+   `DataFrame.xs()` now raises a `TypeError` if a `level` keyword is supplied and the axis is not a `MultiIndex`. Previously an `AttributeError` was raised ([GH 33610](https://github.com/pandas-dev/pandas/issues/33610))

+   Bug in slicing on a `DatetimeIndex` with a partial timestamp dropping high-resolution indices near the end of a year, quarter, or month ([GH 31064](https://github.com/pandas-dev/pandas/issues/31064))

+   Bug in `PeriodIndex.get_loc()` treating higher-resolution strings differently from `PeriodIndex.get_value()` ([GH 31172](https://github.com/pandas-dev/pandas/issues/31172))

+   Bug in `Series.at()` and `DataFrame.at()` not matching `.loc` behavior when looking up an integer in a `Float64Index` ([GH 31329](https://github.com/pandas-dev/pandas/issues/31329))

+   Bug in `PeriodIndex.is_monotonic()` incorrectly returning `True` when containing leading `NaT` entries ([GH 31437](https://github.com/pandas-dev/pandas/issues/31437))

+   Bug in `DatetimeIndex.get_loc()` raising `KeyError` with the converted-integer key instead of the user-passed key ([GH 31425](https://github.com/pandas-dev/pandas/issues/31425))

+   Bug in `Series.xs()` incorrectly returning `Timestamp` instead of `datetime64` in some object-dtype cases ([GH 31630](https://github.com/pandas-dev/pandas/issues/31630))

+   Bug in `DataFrame.iat()` incorrectly returning `Timestamp` instead of `datetime` in some object-dtype cases ([GH 32809](https://github.com/pandas-dev/pandas/issues/32809))

+   Bug in `DataFrame.at()` when either columns or index is non-unique ([GH 33041](https://github.com/pandas-dev/pandas/issues/33041))

+   Bug in `Series.loc()` and `DataFrame.loc()` when indexing with an integer key on an object-dtype `Index` that is not all integers ([GH 31905](https://github.com/pandas-dev/pandas/issues/31905))

+   Bug in `DataFrame.iloc.__setitem__()` on a `DataFrame` with duplicate columns incorrectly setting values for all matching columns ([GH 15686](https://github.com/pandas-dev/pandas/issues/15686), [GH 22036](https://github.com/pandas-dev/pandas/issues/22036))

+   Bug in `DataFrame.loc()` and `Series.loc()` with a `DatetimeIndex`, `TimedeltaIndex`, or `PeriodIndex` incorrectly allowing lookups of non-matching datetime-like dtypes ([GH 32650](https://github.com/pandas-dev/pandas/issues/32650))

+   Bug in `Series.__getitem__()` when indexing with non-standard scalars, e.g. `np.dtype` ([GH 32684](https://github.com/pandas-dev/pandas/issues/32684))

+   Bug in the `Index` constructor where an unhelpful error message was raised for NumPy scalars ([GH 33017](https://github.com/pandas-dev/pandas/issues/33017))

+   Bug in `DataFrame.lookup()` incorrectly raising an `AttributeError` when `frame.index` or `frame.columns` is not unique; this now raises a `ValueError` with a helpful error message ([GH 33041](https://github.com/pandas-dev/pandas/issues/33041))

+   Bug in `Interval` where a `Timedelta` could not be added to or subtracted from a `Timestamp` interval ([GH 32023](https://github.com/pandas-dev/pandas/issues/32023))

+   Bug in `DataFrame.copy()` not invalidating `_item_cache` after the copy, which caused later value updates not to be reflected ([GH 31784](https://github.com/pandas-dev/pandas/issues/31784))

+   Fixed regression in `DataFrame.loc()` and `Series.loc()` throwing an error when a `datetime64[ns, tz]` value is provided ([GH 32395](https://github.com/pandas-dev/pandas/issues/32395))

+   Bug in `Series.__getitem__()` with an integer key and a `MultiIndex` with a leading integer level failing to raise `KeyError` if the key is not present in the first level ([GH 33355](https://github.com/pandas-dev/pandas/issues/33355))

+   Bug in `DataFrame.iloc()` when slicing a single-column `DataFrame` with `ExtensionDtype` (e.g. `df.iloc[:, :1]`) returning an invalid result ([GH 32957](https://github.com/pandas-dev/pandas/issues/32957))

+   Bug in `DatetimeIndex.insert()` and `TimedeltaIndex.insert()` causing the index `freq` to be lost when setting an element into an empty `Series` ([GH 33573](https://github.com/pandas-dev/pandas/issues/33573))

+   Bug in `Series.__setitem__()` with an `IntervalIndex` and a list-like key of integers ([GH 33473](https://github.com/pandas-dev/pandas/issues/33473))

+   Bug in `Series.__getitem__()` allowing missing labels with `np.ndarray`, `Index`, `Series` indexers but not `list`; these now all raise `KeyError` ([GH 33646](https://github.com/pandas-dev/pandas/issues/33646))

+   Bug in `DataFrame.truncate()` and `Series.truncate()` where the index was assumed to be monotonically increasing ([GH 33756](https://github.com/pandas-dev/pandas/issues/33756))

+   Indexing with a list of strings representing datetimes failed on a `DatetimeIndex` or `PeriodIndex` ([GH 11278](https://github.com/pandas-dev/pandas/issues/11278))

+   Bug in `Series.at()` when used with a `MultiIndex` raising an exception on valid input ([GH 26989](https://github.com/pandas-dev/pandas/issues/26989))

+   Bug in `DataFrame.loc()` with a dictionary of values changing columns with `int` dtype to `float` ([GH 34573](https://github.com/pandas-dev/pandas/issues/34573))

+   Bug in `Series.loc()` when used with a `MultiIndex` raising an `IndexingError` when accessing a `None` value ([GH 34318](https://github.com/pandas-dev/pandas/issues/34318))

+   Bug in `DataFrame.reset_index()` and `Series.reset_index()` not preserving data types on an empty `DataFrame` or `Series` with a `MultiIndex` ([GH 19602](https://github.com/pandas-dev/pandas/issues/19602))

+   Bug in `Series` and `DataFrame` indexing with a `time` key on a `DatetimeIndex` with `NaT` entries ([GH 35114](https://github.com/pandas-dev/pandas/issues/35114))
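
Two of the indexing changes above are about which exception is raised. The sketch below collects the exception types for `DataFrame.xs()` with `level` on a flat index, and for `Series.__getitem__()` with a missing label. `raised_type` is a hypothetical helper written for this example.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])  # flat index, not a MultiIndex
ser = pd.Series([10, 20], index=["a", "b"])


def raised_type(func):
    """Hypothetical helper: name of the exception type raised by func(), or None."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None


# level= on a non-MultiIndex axis
xs_error = raised_type(lambda: df.xs("a", level=0))

# "z" is missing; list and Index indexers are treated alike
list_error = raised_type(lambda: ser[["a", "z"]])
index_error = raised_type(lambda: ser[pd.Index(["a", "z"])])
```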

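### Missing

+   Calling `fillna()` on an empty `Series` now correctly returns a shallow copied object. The behavior is now consistent with `Index`, `DataFrame` and a non-empty `Series` ([GH 32543](https://github.com/pandas-dev/pandas/issues/32543)).

+   Bug in `Series.replace()` raising a `TypeError` when the argument `to_replace` is of type dict/list and is used on a `Series` containing `<NA>`. The method now handles this by ignoring `<NA>` values when doing the comparison for the replacement ([GH 32621](https://github.com/pandas-dev/pandas/issues/32621))

+   Bug in `any()` and `all()` incorrectly returning `<NA>` for all `False` or all `True` values when using the nullable Boolean dtype with `skipna=False` ([GH 33253](https://github.com/pandas-dev/pandas/issues/33253))

+   Clarified documentation on interpolation with `method=akima`. The `der` parameter must be a scalar or `None` ([GH 33426](https://github.com/pandas-dev/pandas/issues/33426))

+   `DataFrame.interpolate()` now uses the correct axis convention. Previously, interpolating along columns led to interpolation along indices and vice versa. Furthermore, interpolating with the methods `pad`, `ffill`, `bfill` and `backfill` is identical to using these methods with `DataFrame.fillna()` ([GH 12918](https://github.com/pandas-dev/pandas/issues/12918), [GH 29146](https://github.com/pandas-dev/pandas/issues/29146))

+   Bug in `DataFrame.interpolate()` raising a `ValueError` when called on a `DataFrame` with column names of string type. The method is now independent of the type of the column names ([GH 33956](https://github.com/pandas-dev/pandas/issues/33956))

+   Passing `NA` into a format string using format specs now works. For example `"{:.1f}".format(pd.NA)` would previously raise a `ValueError`, but now returns the string `"<NA>"` ([GH 34740](https://github.com/pandas-dev/pandas/issues/34740))

+   Bug in `Series.map()` not raising on an invalid `na_action` ([GH 32815](https://github.com/pandas-dev/pandas/issues/32815))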
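
The last two items above can be sketched as follows. The `na_action` string used here is deliberately invalid and invented for the example; the accepted values are `None` and `"ignore"`.

```python
import pandas as pd

# Format specs applied to pd.NA fall back to its repr instead of raising.
formatted = "{:.1f}".format(pd.NA)

# An unrecognized na_action is rejected rather than silently ignored.
try:
    pd.Series([1, 2]).map(str, na_action="not-a-valid-option")
    map_error = None
except ValueError as exc:
    map_error = type(exc).__name__
```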

### MultiIndex

+   `DataFrame.swaplevel()` now raises a `TypeError` if the axis is not a `MultiIndex`. Previously an `AttributeError` was raised ([GH 31126](https://github.com/pandas-dev/pandas/issues/31126))

+   Bug in `DataFrame.loc()` when used with a `MultiIndex`: the returned values were not in the same order as the given inputs ([GH 22797](https://github.com/pandas-dev/pandas/issues/22797))

```py
In [79]: df = pd.DataFrame(np.arange(4),
 ....:                  index=[["a", "a", "b", "b"], [1, 2, 1, 2]])
 ....: 

# Rows are now ordered as the requested keys
In [80]: df.loc[(['b', 'a'], [2, 1]), :]
Out[80]: 
     0
b 2  3
  1  2
a 2  1
  1  0

Bug in MultiIndex.intersection() where order was not guaranteed to be preserved when sort=False (GH 31325).
Bug in DataFrame.truncate() dropping MultiIndex names (GH 34564).

In [81]: left = pd.MultiIndex.from_arrays([["b", "a"], [2, 1]])

In [82]: right = pd.MultiIndex.from_arrays([["a", "b", "c"], [1, 2, 3]])

# Common elements are now guaranteed to be ordered by the left side
In [83]: left.intersection(right, sort=False)
Out[83]: 
MultiIndex([('b', 2),
            ('a', 1)],
           )

Bug when joining two MultiIndex objects with different columns without specifying a level: the return-indexers parameter was ignored (GH 34074).

### IO

将 set 作为 names 参数传递给 pandas.read_csv()、pandas.read_table() 或 pandas.read_fwf() 将会引发 ValueError: Names should be an ordered collection. 错误（GH 34946）。
当 display.precision 设置为零时，打印输出存在 bug（GH 20359）。
在 read_json() 中，当 json 包含大数字字符串时会发生整数溢出的 bug（GH 30320）。
当参数 header 和 prefix 都不是 None 时，read_csv() 将会引发 ValueError 错误（GH 27394）。
当 path_or_buf 是 S3 URI 时，DataFrame.to_json() 函数引发了 NotFoundError 错误（GH 28375）。
在写入 nanosecond 时间戳时，DataFrame.to_parquet() 函数会覆盖 pyarrow 的默认设置 coerce_timestamps；遵循 pyarrow 的默认设置允许在 version="2.0" 下写入 nanosecond 时间戳（GH 31652）。
read_csv() 函数在使用 sep=None 与 comment 关键字组合时引发了 TypeError 错误（GH 31396）。
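Bug in HDFStore that caused it to set the dtype of a datetime64 column to int64 when reading a DataFrame in Python 3 from fixed format written in Python 2 (GH 31750)
read_sas() now handles dates and datetimes larger than Timestamp.max, returning them as datetime.datetime objects (GH 20927)
Bug in DataFrame.to_json() where Timedelta objects would not be serialized correctly with date_format="iso" (GH 28256)
read_csv() will raise a ValueError when the column names passed in parse_dates are missing in the DataFrame (GH 31251)
Bug in read_excel() where a UTF-8 string with a high surrogate would cause a segmentation violation (GH 23809)
Bug in read_csv() was causing a file descriptor leak on an empty file (GH 31488)
Bug in read_csv() was causing a segfault when there were blank lines between the header and data rows (GH 28071)
Bug in read_csv() was raising a misleading exception on a permissions issue (GH 23784)
Bug in read_csv() was raising an IndexError when header=None and there were two extra data columns
Bug in read_sas() was raising an AttributeError when reading files from Google Cloud Storage (GH 33069)
Bug in DataFrame.to_sql() where an AttributeError was raised when saving an out-of-bounds date (GH 26761)
Bug in read_excel() did not correctly handle multiple embedded spaces in OpenDocument text cells (GH 32207)
Bug in read_json() was raising a TypeError when reading a list of booleans into a Series (GH 31464)
Bug in pandas.io.json.json_normalize() where the location specified by record_path does not point to an array (GH 26284)
pandas.read_hdf() has a more explicit error message when loading an unsupported HDF file (GH 9539)
Bug in read_feather() was raising an ArrowIOError when reading an s3 or http file path (GH 29055)
Bug in to_excel() could not handle the column name render and was raising a KeyError (GH 34331)
Bug in execute() was raising a ProgrammingError for some DB-API drivers when the SQL statement contained the % character and no parameters were present (GH 34211)
Bug in StataReader() which resulted in categorical variables with different dtypes when reading data using an iterator (GH 31544)
HDFStore.keys() now has an optional include parameter that allows the retrieval of all native HDF5 table names (GH 29916)
TypeError exceptions raised by read_csv() and read_table() were showing as parser_f when an unexpected keyword argument was passed (GH 25648)
Bug in read_excel() for ODS files removed 0.0 values (GH 27222)
Bug in ujson.encode() was raising an OverflowError with numbers larger than sys.maxsize (GH 34395)
Bug in HDFStore.append_to_multiple() was raising a ValueError when the min_itemsize parameter is set (GH 11238)
Bug in create_table() now raises an error when the column argument was not specified in data_columns on input (GH 28156)
read_json() can now read a line-delimited json file from a file url while lines and chunksize are set.
Bug in DataFrame.to_sql() when reading DataFrames with -np.inf entries with MySQL now has a more explicit ValueError (GH 34431)
Bug where capitalised file extensions were not decompressed by read_* functions (GH 35164)
Bug in read_excel() that was raising a TypeError when header=None and index_col is given as a list (GH 31783)
Bug in read_excel() where datetime values are used in the header in a MultiIndex (GH 34748)
read_excel() no longer takes **kwds arguments. This means that passing the keyword argument chunksize now raises a TypeError (previously a NotImplementedError was raised), while passing the keyword argument encoding now raises a TypeError (GH 34464)
Bug in DataFrame.to_records() was incorrectly losing timezone information in timezone-aware datetime64 columns (GH 32535)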
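
As a minimal sketch of one behaviour listed above (read_csv() raising a ValueError when a name passed in parse_dates is not a column), the snippet below feeds a small in-memory CSV to the parser. The CSV text and the column name "when" are made up for illustration, and the exact wording of the error message may differ between pandas versions.

```python
import io

import pandas as pd

# Hypothetical two-column CSV; there is deliberately no column called "when".
csv_text = "a,b\n1,2\n3,4\n"

try:
    pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])
except ValueError as err:
    # The parser rejects the unknown column name up front.
    error = err
else:
    error = None

print(type(error).__name__, error)
```
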

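Plotting

DataFrame.plot() for line/bar now accepts color specified by a dictionary (GH 8193)
Bug in DataFrame.plot.hist() where weights were not working for multiple columns (GH 33173)
Bug in DataFrame.boxplot() and DataFrame.plot.boxplot() lost the color attributes of medianprops, whiskerprops, capprops and boxprops (GH 30346)
Bug in DataFrame.hist() where the order of the column argument was ignored (GH 29235)
Bug in DataFrame.plot.scatter() where, when adding multiple plots with different cmap, colorbars always used the first cmap (GH 33389)
Bug in DataFrame.plot.scatter() was adding a colorbar to the plot even if the argument c was assigned to a column containing color names (GH 34316)
Bug in pandas.plotting.bootstrap_plot() was causing cluttered axes and overlapping labels (GH 34905)
Bug in DataFrame.plot.scatter() caused an error when plotting variable marker sizes (GH 32904)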

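GroupBy/resample/rolling

Using a pandas.api.indexers.BaseIndexer with count, min, max, median, skew, cov and corr will now return correct results for any monotonic pandas.api.indexers.BaseIndexer descendant (GH 32865)
DataFrameGroupby.mean() and SeriesGroupby.mean() (and similarly median(), std() and var()) now raise a TypeError if a non-accepted keyword argument is passed. Previously an UnsupportedFunctionCall was raised (an AssertionError if min_count was passed into median()) (GH 31485)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() raising a ValueError when the by axis is not sorted, has duplicates, and the applied func does not mutate the passed-in objects (GH 30667)
Bug in DataFrameGroupBy.transform() produced an incorrect result with transformation functions (GH 30918)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() returning the wrong result when grouping by multiple keys of which some were categorical and others not (GH 32494)
Bug in DataFrameGroupBy.count() and SeriesGroupBy.count() causing a segmentation fault when grouped-by columns contain NaNs (GH 32841)
Bug in DataFrame.groupby() and Series.groupby() produced inconsistent types when aggregating a Boolean Series (GH 32894)
Bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() where a large negative number would be returned when the number of non-null values was below min_count for nullable integer dtypes (GH 32861)
Bug in SeriesGroupBy.quantile() was raising on nullable integers (GH 33136)
Bug in DataFrame.resample() where an AmbiguousTimeError would be raised when the resulting timezone-aware DatetimeIndex had a DST transition at midnight (GH 25758)
Bug in DataFrame.groupby() where a ValueError would be raised when grouping by a categorical column with read-only categories and sort=False (GH 33410)
Bug in DataFrameGroupBy.agg(), SeriesGroupBy.agg(), DataFrameGroupBy.transform(), SeriesGroupBy.transform(), DataFrameGroupBy.resample() and SeriesGroupBy.resample() where subclasses were not preserved (GH 28330)
Bug in SeriesGroupBy.agg() where any column name was previously accepted in the named aggregation of SeriesGroupBy. The behaviour now allows only str and callables, and raises a TypeError otherwise (GH 34422)
Bug in DataFrame.groupby() lost the name of the Index when one of the agg keys referenced an empty list (GH 32580)
Bug in Rolling.apply() where center=True was ignored when engine='numba' was specified (GH 34784)
Bug in DataFrame.ewm.cov() was throwing an AssertionError for MultiIndex inputs (GH 34440)
Bug in core.groupby.DataFrameGroupBy.quantile() raised a TypeError for non-numeric types rather than dropping the columns (GH 27892)
Bug in core.groupby.DataFrameGroupBy.transform() when func='nunique' and columns are of type datetime64: the result would also be of type datetime64 instead of int64 (GH 35109)
Bug in DataFrame.groupby() raising an AttributeError when selecting a column and aggregating with as_index=False (GH 35246)
Bug in DataFrameGroupBy.first() and DataFrameGroupBy.last() that would raise an unnecessary ValueError when grouping on multiple Categoricals (GH 34951)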
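
The keyword-argument check on the groupby reductions mentioned above can be tried with a small sketch. The frame and the keyword name not_a_real_option are invented for illustration; the only behaviour relied on is that an unknown keyword passed to mean() surfaces as a TypeError.

```python
import pandas as pd

# Hypothetical data: two groups with a single numeric column.
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, 2.0, 3.0]})
grouped = df.groupby("key")

try:
    # "not_a_real_option" is a made-up keyword that mean() does not accept.
    grouped.mean(not_a_real_option=True)
except TypeError as err:
    caught = err
else:
    caught = None

# A call without the stray keyword works as usual.
means = grouped.mean()
print(type(caught).__name__)
print(means)
```
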

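Reshaping

Bug affecting all numeric and boolean reduction methods not returning the subclassed data type (GH 25596)
Bug in DataFrame.pivot_table() when only MultiIndexed columns is set (GH 17038)
Bug in DataFrame.unstack() and Series.unstack(), which can now take tuple names in MultiIndexed data (GH 19966)
Bug in DataFrame.pivot_table() when margin is True and only column is defined (GH 31016)
Fixed incorrect error message in DataFrame.pivot() when columns is set to None (GH 30924)
Bug in crosstab() when the inputs are two Series with tuple names: the output would keep a dummy MultiIndex as columns (GH 18321)
DataFrame.pivot() can now take lists for the index and columns arguments (GH 21425)
Bug in concat() where the resulting indices were not copied when copy=True (GH 29879)
Bug in SeriesGroupBy.aggregate() was resulting in aggregations being overwritten when they shared the same name (GH 30880)
Bug where Index.astype() would lose the name attribute when converting from Float64Index to Int64Index, or when casting to an ExtensionArray dtype (GH 32013)
Series.append() will now raise a TypeError when passed a DataFrame or a sequence containing a DataFrame (GH 31413)
DataFrame.replace() and Series.replace() will raise a TypeError if to_replace is not an expected type. Previously the replace would fail silently (GH 18634)
Bug on an inplace operation of a Series that was adding a column back to the DataFrame from which it was originally dropped (using inplace=True) (GH 30484)
Bug in DataFrame.apply() where the callback was called with a Series parameter even though raw=True was requested (GH 32423)
Bug in DataFrame.pivot_table() losing timezone information when creating a MultiIndex level from a column with timezone-aware dtype (GH 32558)
Bug in concat() where passing a non-dict mapping as objs would raise a TypeError (GH 32863)
DataFrame.agg() now provides a more descriptive SpecificationError message when attempting to aggregate a non-existent column (GH 32755)
Bug in DataFrame.unstack() when MultiIndex columns and MultiIndex rows were used (GH 32624, GH 24729 and GH 28306)
Appending a dictionary to a DataFrame without passing ignore_index=True will raise TypeError: Can only append a dict if ignore_index=True instead of TypeError: Can only append a :class:Series if ignore_index=True or if the :class:Series has a name (GH 30871)
Bug in DataFrame.corrwith(), DataFrame.memory_usage(), DataFrame.dot(), DataFrame.idxmin(), DataFrame.idxmax(), DataFrame.duplicated(), DataFrame.isin(), DataFrame.count(), Series.explode(), Series.asof() and DataFrame.asof() not returning subclassed types (GH 31331)
Bug in concat() was not allowing concatenation of a DataFrame and a Series with duplicate keys (GH 33654)
Bug in cut() raised an error when the argument labels contained duplicates (GH 33141)
Ensure only named functions can be used in eval() (GH 32460)
Bug in Dataframe.aggregate() and Series.aggregate() was causing a recursive loop in some cases (GH 34224)
Fixed bug in melt() where melting MultiIndex columns with col_level > 0 would raise a KeyError on id_vars (GH 34129)
Bug in Series.where() raising with an empty Series and an empty cond having non-bool dtype (GH 34592)
Fixed regression where DataFrame.apply() would raise a ValueError for elements with S dtype (GH 34529)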
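
The list support in DataFrame.pivot() noted above can be illustrated with a short sketch. The sales frame and its column names are invented for the example; passing a list to index produces a MultiIndex on the rows.

```python
import pandas as pd

# Hypothetical long-format data with two identifying columns.
sales = pd.DataFrame(
    {
        "year": [2019, 2019, 2020, 2020],
        "quarter": ["Q1", "Q1", "Q1", "Q1"],
        "region": ["north", "south", "north", "south"],
        "amount": [10, 20, 30, 40],
    }
)

# A list for ``index`` yields one row per (year, quarter) pair.
wide = sales.pivot(index=["year", "quarter"], columns="region", values="amount")
print(wide)
```
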

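Sparse

Creating a SparseArray from a timezone-aware dtype will issue a warning before dropping timezone information, instead of doing so silently (GH 32501)
Bug in arrays.SparseArray.from_spmatrix() wrongly read a scipy sparse matrix (GH 31991)
Bug in Series.sum() with SparseArray raised a TypeError (GH 25777)
Bug where a DataFrame containing an all-sparse SparseArray was filled with NaN when indexed by a list-like (GH 27781, GH 29563)
The repr of SparseDtype now includes the repr of its fill_value attribute. Previously it used the string representation of fill_value (GH 34352)
Bug where an empty DataFrame could not be cast to SparseDtype (GH 33113)
Bug in arrays.SparseArray() was returning the incorrect type when indexing a sparse DataFrame with an iterable (GH 34526, GH 34540)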

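ExtensionArray

Fixed bug where Series.value_counts() would raise on empty input of Int64 dtype (GH 33317)
Fixed bug in concat() when concatenating DataFrame objects with non-overlapping columns, resulting in object-dtype columns rather than preserving the extension dtype (GH 27692, GH 33027)
Fixed bug where StringArray.isna() would return False for NA values when pandas.options.mode.use_inf_as_na was set to True (GH 33655)
Fixed bug where constructing a Series with an EA dtype and an index but no data or scalar data failed (GH 26469)
Fixed bug that caused Series.__repr__() to crash for extension types whose elements are multidimensional arrays (GH 33770)
Fixed bug where Series.update() would raise a ValueError for ExtensionArray dtypes with missing values (GH 33980)
Fixed bug where StringArray.memory_usage() was not implemented (GH 33963)
Fixed bug where DataFrameGroupBy() would ignore the min_count argument for aggregations on nullable Boolean dtypes (GH 34051)
Fixed bug where the constructor of DataFrame with dtype='string' would fail (GH 27953, GH 33623)
Fixed bug where a DataFrame column set to a scalar extension type was considered an object type rather than the extension type (GH 34832)
Fixed bug in IntegerArray.astype() to correctly copy the mask as well (GH 34931)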
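
Two of the extension-array fixes above are easy to exercise. The following is a small sketch with made-up data, assuming a pandas version that ships the nullable Int64 and string dtypes; it only checks that the calls succeed and return sensible results.

```python
import pandas as pd

# value_counts() on an empty nullable-integer Series returns an empty result.
empty_counts = pd.Series([], dtype="Int64").value_counts()

# Constructing a DataFrame directly with dtype="string".
frame = pd.DataFrame({"name": ["x", None]}, dtype="string")

print(len(empty_counts))
print(frame.dtypes)
```
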

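Other

Set operations on an object-dtype Index now always return object-dtype results (GH 31401)
Fixed pandas.testing.assert_series_equal() to correctly raise if the left argument is a different subclass with check_series_type=True (GH 32670)
Getting a missing attribute in a DataFrame.query() or DataFrame.eval() string now raises the correct AttributeError (GH 32408)
Fixed bug in pandas.testing.assert_series_equal() where dtypes were checked for Interval and ExtensionArray operands when check_dtype was False (GH 32747)
Bug in DataFrame.__dir__() caused a segfault when using unicode surrogates in a column name (GH 25509)
Bug in DataFrame.equals() and Series.equals() in allowing subclasses to be equal (GH 34402)

## Contributors

A total of 368 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.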

3vts +
A Brooks +
Abbie Popa +
Achmad Syarif Hidayatullah +
Adam W Bagaskarta +
Adrian Mastronardi +
Aidan Montare +
Akbar Septriyan +
Akos Furton +
Alejandro Hall +
Alex Hall +
Alex Itkes +
Alex Kirko
Ali McMaster +
Alvaro Aleman +
Amy Graham +
Andrew Schonfeld +
Andrew Shumanskiy +
Andrew Wieteska +
Angela Ambroz
Anjali Singh +
Anna Daglis
Anthony Milbourne +
Antony Lee +
Ari Sosnovsky +
Arkadeep Adhikari +
Arunim Samudra +
Ashkan +
Ashwin Prakash Nalwade +
Ashwin Srinath +
Atsushi Nukariya +
Ayappan +
Ayla Khan +
Bart +
Bart Broere +
Benjamin Beier Liu +
Benjamin Fischer +
Bharat Raghunathan
Bradley Dice +
Brendan Sullivan +
Brian Strand +
Carsten van Weelden +
Chamoun Saoma +
ChrisRobo +
Christian Chwala
Christopher Whelan
Christos Petropoulos +
Chuanzhu Xu
CloseChoice +
Clément Robert +
CuylenE +
DanBasson +
Daniel Saxton
Danilo Horta +
DavaIlhamHaeruzaman +
Dave Hirschfeld
Dave Hughes
David Rouquet +
David S +
Deepyaman Datta
Dennis Bakhuis +
Derek McCammond +
Devjeet Roy +
Diane Trout
Dina +
Dom +
Drew Seibert +
EdAbati
Emiliano Jordan +
Erfan Nariman +
Eric Groszman +
Erik Hasse +
Erkam Uyanik +
Evan D +
Evan Kanter +
Fangchen Li +
Farhan Reynaldo +
Farhan Reynaldo Hutabarat +
Florian Jetter +
Fred Reiss +
GYHHAHA +
Gabriel Moreira +
Gabriel Tutui +
Galuh Sahid
Gaurav Chauhan +
George Hartzell +
Gim Seng +
Giovanni Lanzani +
Gordon Chen +
Graham Wetzler +
Guillaume Lemaitre
Guillem Sánchez +
HH-MWB +
Harshavardhan Bachina
How Si Wei
Ian Eaves
Iqrar Agalosi Nureyza +
Irv Lustig
Iva Laginja +
JDkuba
Jack Greisman +
Jacob Austin +
Jacob Deppen +
Jacob Peacock +
Jake Tae +
Jake Vanderplas +
James Cobon-Kerr
Jan Červenka +
Jan Škoda
Jane Chen +
Jean-Francois Zinque +
Jeanderson Barros Candido +
Jeff Reback
Jered Dominguez-Trujillo +
Jeremy Schendel
Jesse Farnham
Jiaxiang
Jihwan Song +
Joaquim L. Viegas +
Joel Nothman
John Bodley +
John Paton +
Jon Thielen +
Joris Van den Bossche
Jose Manuel Martí +
Joseph Gulian +
Josh Dimarsky
Joy Bhalla +
João Veiga +
Julian de Ruiter +
Justin Essert +
Justin Zheng
KD-dev-lab +
Kaiqi Dong
Karthik Mathur +
Kaushal Rohit +
Kee Chong Tan
Ken Mankoff +
Kendall Masse
Kenny Huynh +
Ketan +
Kevin Anderson +
Kevin Bowey +
Kevin Sheppard
Kilian Lieret +
Koki Nishihara +
Krishna Chivukula +
KrishnaSai2020 +
Lesley +
Lewis Cowles +
Linda Chen +
Linxiao Wu +
Lucca Delchiaro Costabile +
MBrouns +
Mabel Villalba
Mabroor Ahmed +
Madhuri Palanivelu +
Mak Sze Chun
Malcolm +
Marc Garcia
Marco Gorelli
Marian Denes +
Martin Bjeldbak Madsen +
Martin Durant +
Martin Fleischmann +
Martin Jones +
Martin Winkel
Martina Oefelein +
Marvzinc +
María Marino +
Matheus Cardoso +
Mathis Felardos +
Matt Roeschke
Matteo Felici +
Matteo Santamaria +
Matthew Roeschke
Matthias Bussonnier
Max Chen
Max Halford +
Mayank Bisht +
Megan Thong +
Michael Marino +
Miguel Marques +
Mike Kutzma
Mohammad Hasnain Mohsin Rajan +
Mohammad Jafar Mashhadi +
MomIsBestFriend
Monica +
Natalie Jann
Nate Armstrong +
Nathanael +
Nick Newman +
Nico Schlömer +
Niklas Weber +
ObliviousParadigm +
Olga Lyashevska +
OlivierLuG +
Pandas Development Team
Parallels +
Patrick +
Patrick Cando +
Paul Lilley +
Paul Sanders +
Pearcekieser +
Pedro Larroy +
Pedro Reys
Peter Bull +
Peter Steinbach +
Phan Duc Nhat Minh +
Phil Kirlin +
Pierre-Yves Bourguignon +
Piotr Kasprzyk +
Piotr Niełacny +
Prakhar Pandey
Prashant Anand +
Puneetha Pai +
Quang Nguyễn +
Rafael Jaimes III +
Rafif +
RaisaDZ +
Rakshit Naidu +
Ram Rachum +
Red +
Ricardo Alanis +
Richard Shadrach +
Rik-de-Kort
Robert de Vries
Robin to Roxel +
Roger Erens +
Rohith295 +
Roman Yurchak
Ror +
Rushabh Vasani
Ryan
Ryan Nazareth
SAI SRAVAN MEDICHERLA +
SHUBH CHATTERJEE +
Sam Cohan
Samira-g-js +
Sandu Ursu +
Sang Agung +
SanthoshBala18 +
Sasidhar Kasturi +
SatheeshKumar Mohan +
Saul Shanabrook
Scott Gigante +
Sebastian Berg +
Sebastián Vanrell
Sergei Chipiga +
Sergey +
ShilpaSugan +
Simon Gibbons
Simon Hawkins
Simon Legner +
Soham Tiwari +
Song Wenhao +
Souvik Mandal
Spencer Clark
Steffen Rehberg +
Steffen Schmitz +
Stijn Van Hoey
Stéphan Taljaard
SultanOrazbayev +
Sumanau Sareen
SurajH1 +
Suvayu Ali +
Terji Petersen
Thomas J Fan +
Thomas Li
Thomas Smith +
Tim Swast
Tobias Pitters +
Tom +
Tom Augspurger
Uwe L. Korn
Valentin Iovene +
Vandana Iyer +
Venkatesh Datta +
Vijay Sai Mutyala +
Vikas Pandey
Vipul Rai +
Vishwam Pandya +
Vladimir Berkutov +
Will Ayd
Will Holmgren
William +
William Ayd
Yago González +
Yosuke KOBAYASHI +
Zachary Lawrence +
Zaky Bilfagih +
Zeb Nicholls +
alimcmaster1
alm +
andhikayusup +
andresmcneill +
avinashpancham +
benabel +
bernie gray +
biddwan09 +
brock +
chris-b1
cleconte987 +
dan1261 +
david-cortes +
davidwales +
dequadras +
dhuettenmoser +
dilex42 +
elmonsomiat +
epizzigoni +
fjetter
gabrielvf1 +
gdex1 +
gfyoung
guru kiran +
h-vishal
iamshwin
jamin-aws-ospo +
jbrockmendel
jfcorbett +
jnecus +
kernc
kota matsuoka +
kylekeppler +
leandermaben +
link2xt +
manoj_koneni +
marydmit +
masterpiga +
maxime.song +
mglasder +
moaraccounts +
mproszewska
neilkg
nrebena
ossdev07 +
paihu
pan Jacek +
partev +
patrick +
pedrooa +
pizzathief +
proost
pvanhauw +
rbenes
rebecca-palmer
rhshadrach +
rjfs +
s-scherrer +
sage +
sagungrp +
salem3358 +
saloni30 +
smartswdeveloper +
smartvinnetou +
themien +
timhunderwood +
tolhassianipar +
tonywu1999
tsvikas
tv3141
venkateshdatta1993 +
vivikelapoutre +
willbowditch +
willpeppo +
za +
zaki-indra +

Enhancements

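KeyErrors raised by loc specify missing labels

Previously, if labels were missing for a .loc call, a KeyError was raised stating that this was no longer supported.

Now the error message also includes a list of the missing labels (max 10 items, display width 80 characters). See GH 34272.

### All dtypes can now be converted to StringDtype

Previously, declaring or converting to StringDtype was in general only possible if the data was already only str or nan-like (GH 31204). StringDtype now works in all situations where astype(str) or dtype=str work.

For example, the following now works: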

```py
In [1]: ser = pd.Series([1, "abc", np.nan], dtype="string")

In [2]: ser
Out[2]: 
0       1
1     abc
2    <NA>
dtype: string

In [3]: ser[0]
Out[3]: '1'

In [4]: pd.Series([1, 2, np.nan], dtype="Int64").astype("string")
Out[4]: 
0       1
1       2
2    <NA>
dtype: string 
```

### Non-monotonic PeriodIndex partial string slicing

`PeriodIndex` now supports partial string slicing for non-monotonic indexes, mirroring `DatetimeIndex` behavior ([GH 31096](https://github.com/pandas-dev/pandas/issues/31096)).

For example:

```py
In [5]: dti = pd.date_range("2014-01-01", periods=30, freq="30D")

In [6]: pi = dti.to_period("D")

In [7]: ser_monotonic = pd.Series(np.arange(30), index=pi)

In [8]: shuffler = list(range(0, 30, 2)) + list(range(1, 31, 2))

In [9]: ser = ser_monotonic.iloc[shuffler]

In [10]: ser
Out[10]: 
2014-01-01     0
2014-03-02     2
2014-05-01     4
2014-06-30     6
2014-08-29     8
 ..
2015-09-23    21
2015-11-22    23
2016-01-21    25
2016-03-21    27
2016-05-20    29
Freq: D, Length: 30, dtype: int64

In [11]: ser["2014"]
Out[11]: 
2014-01-01     0
2014-03-02     2
2014-05-01     4
2014-06-30     6
2014-08-29     8
2014-10-28    10
2014-12-27    12
2014-01-31     1
2014-04-01     3
2014-05-31     5
2014-07-30     7
2014-09-28     9
2014-11-27    11
Freq: D, dtype: int64

In [12]: ser.loc["May 2015"]
Out[12]: 
2015-05-26    17
Freq: D, dtype: int64 
```

### Comparing two `DataFrame` or two `Series` and summarizing the differences

We've added `DataFrame.compare()` and `Series.compare()` for comparing two `DataFrame` or two `Series` ([GH 30429](https://github.com/pandas-dev/pandas/issues/30429)).

```py
In [13]: df = pd.DataFrame(
 ....:    {
 ....:        "col1": ["a", "a", "b", "b", "a"],
 ....:        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
 ....:        "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
 ....:    },
 ....:    columns=["col1", "col2", "col3"],
 ....: )
 ....: 

In [14]: df
Out[14]: 
 col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0

In [15]: df2 = df.copy()

In [16]: df2.loc[0, 'col1'] = 'c'

In [17]: df2.loc[2, 'col3'] = 4.0

In [18]: df2
Out[18]: 
 col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

In [19]: df.compare(df2)
Out[19]: 
 col1       col3 
 self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0
```

See the user guide for more details.

### Allow NA in groupby key

With groupby, we've added a dropna keyword to DataFrame.groupby() and Series.groupby() in order to allow NA values in group keys. Users can set dropna to False if they want to include NA values in groupby keys. The default for dropna is True, to keep backwards compatibility (GH 3729).

```py
In [20]: df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]

In [21]: df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])

In [22]: df_dropna
Out[22]: 
 a    b  c
0  1  2.0  3
1  1  NaN  4
2  2  1.0  3
3  1  2.0  2

# Default ``dropna`` is set to True, which will exclude NaNs in keys
In [23]: df_dropna.groupby(by=["b"], dropna=True).sum()
Out[23]: 
 a  c
b 
1.0  2  3
2.0  2  5

# In order to allow NaN in keys, set ``dropna`` to False
In [24]: df_dropna.groupby(by=["b"], dropna=False).sum()
Out[24]: 
 a  c
b 
1.0  2  3
2.0  2  5
NaN  1  4
```

The default setting of the dropna argument is True, which means NA values are not included in group keys.

### Sorting with keys

We've added a key argument to the DataFrame and Series sorting methods, including DataFrame.sort_values(), DataFrame.sort_index(), Series.sort_values() and Series.sort_index(). The key can be any callable; it is applied column-by-column to each column used for sorting, before the sort is performed (GH 27237). See sort_values with keys and sort_index with keys for more information.

```py
In [25]: s = pd.Series(['C', 'a', 'B'])

In [26]: s
Out[26]: 
0    C
1    a
2    B
dtype: object

In [27]: s.sort_values()
Out[27]: 
2    B
0    C
1    a
dtype: object
```

Note how this is sorted with capital letters first. If we apply the Series.str.lower() method, we get

```py
In [28]: s.sort_values(key=lambda x: x.str.lower())
Out[28]: 
1    a
2    B
0    C
dtype: object
```

When applied to a DataFrame, the key is applied per column to all columns, or to a subset if by is specified, e.g.

```py
In [29]: df = pd.DataFrame({'a': ['C', 'C', 'a', 'a', 'B', 'B'],
 ....:                   'b': [1, 2, 3, 4, 5, 6]})
 ....: 

In [30]: df
Out[30]: 
 a  b
0  C  1
1  C  2
2  a  3
3  a  4
4  B  5
5  B  6

In [31]: df.sort_values(by=['a'], key=lambda col: col.str.lower())
Out[31]: 
 a  b
2  a  3
3  a  4
4  B  5
5  B  6
0  C  1
1  C  2
```

For more details, see the examples and documentation in DataFrame.sort_values(), Series.sort_values() and sort_index().

### Fold argument support in Timestamp constructor

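Timestamp now supports the keyword-only fold argument according to PEP 495, similar to its parent class datetime.datetime. It supports both accepting fold as an initialization argument and inferring fold from other constructor arguments (GH 25057, GH 31338). Support is limited to dateutil timezones, as pytz does not support fold.

For example:

```py
In [32]: ts = pd.Timestamp("2019-10-27 01:30:00+00:00")

In [33]: ts.fold
Out[33]: 0

In [34]: ts = pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
 ....:                  tz="dateutil/Europe/London", fold=1)
 ....: 

In [35]: ts
Out[35]: Timestamp('2019-10-27 01:30:00+0000', tz='dateutil//usr/share/zoneinfo/Europe/London')
```

For more on working with fold, see the Fold subsection in the user guide.

### Parsing timezone-aware format with different timezones in to_datetime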

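to_datetime() now supports parsing formats containing timezone names (%Z) and UTC offsets (%z) from different timezones, then converting them to UTC by setting utc=True. With utc=True this returns a DatetimeIndex with timezone at UTC, as opposed to the Index with object dtype that is returned if utc=True is not set (GH 32792).

For example:

```py
In [36]: tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
 ....:           "2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
 ....: 

In [37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z', utc=True)
Out[37]: 
DatetimeIndex(['2010-01-01 11:00:00+00:00', '2010-01-01 13:00:00+00:00',
 '2010-01-01 09:00:00+00:00', '2010-01-01 08:00:00+00:00'],
 dtype='datetime64[ns, UTC]', freq=None)

In[37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')
Out[37]:
Index([2010-01-01 12:00:00+01:00, 2010-01-01 12:00:00-01:00,
       2010-01-01 12:00:00+03:00, 2010-01-01 12:00:00+04:00],
      dtype='object') 
```

### Grouper and resample now support the arguments origin and offset

`Grouper` and `DataFrame.resample()` now support the arguments `origin` and `offset`. They let the user control the timestamp on which to adjust the grouping. ([GH 31809](https://github.com/pandas-dev/pandas/issues/31809))

The bins of the grouping are adjusted based on the beginning of the day of the time series starting point. This works well with frequencies that are multiples of a day (like `30D`) or that divide a day evenly (like `90s` or `1min`). However, it can create inconsistencies with frequencies that do not meet this criterion. To change this behavior, you can now specify a fixed timestamp with the argument `origin`.

Two arguments are now deprecated (more information in the documentation of `DataFrame.resample()`):

+   `base` should be replaced by `offset`.

+   `loffset` should be replaced by directly adding an offset to the index of the `DataFrame` after it has been resampled.

Small example of the use of `origin`: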

```py
In [38]: start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'

In [39]: middle = '2000-10-02 00:00:00'

In [40]: rng = pd.date_range(start, end, freq='7min')

In [41]: ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

In [42]: ts
Out[42]: 
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7min, dtype: int64
```

Resample with the default behavior 'start_day' (origin is 2000-10-01 00:00:00):

```py
In [43]: ts.resample('17min').sum()
Out[43]: 
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64

In [44]: ts.resample('17min', origin='start_day').sum()
Out[44]: 
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64
```

Resample using a fixed origin:

```py
In [45]: ts.resample('17min', origin='epoch').sum()
Out[45]: 
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17min, dtype: int64

In [46]: ts.resample('17min', origin='2000-01-01').sum()
Out[46]: 
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17min, dtype: int64
```

If needed, you can adjust the bins with the argument offset (a Timedelta), which is added to the default origin.

For a full example, see: Use origin or offset to adjust the start of the bins.
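
The examples above only exercise origin. As a complementary sketch, the snippet below rebuilds the same series and shifts the bins with offset, then shows one way to replace the deprecated loffset by adding a Timedelta to the resampled index. The 23h30min offset and the 8-minute label shift are arbitrary values chosen for illustration.

```python
import numpy as np
import pandas as pd

# Same series as in the origin examples: one value every 7 minutes.
rng = pd.date_range("2000-10-01 23:30:00", "2000-10-02 00:30:00", freq="7min")
ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

# ``offset`` is added to the default origin (start of day), so the
# 17-minute bins are anchored at 23:30 instead of 23:14.
shifted = ts.resample("17min", offset="23h30min").sum()

# Replacement for the deprecated ``loffset``: resample first, then move
# the labels by adding a Timedelta to the resulting index.
relabelled = ts.resample("17min").sum()
relabelled.index = relabelled.index + pd.Timedelta("8min")

print(shifted)
print(relabelled)
```

Note that offset changes which observations fall into each bin, while shifting the index afterwards only changes the labels.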

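fsspec now used for filesystem handling

For reading from and writing to filesystems other than the local one, and for reading from HTTP(S), the optional dependency fsspec will be used to dispatch operations (GH 33452). This gives unchanged functionality for S3 and GCS storage, which were already supported, and also adds support for several other storage implementations such as Azure Datalake and Blob, SSH, FTP, dropbox and github. For docs and capabilities, see the fsspec docs.

The existing capability to interface with S3 and GCS will be unaffected by this change, as fsspec will still bring in the same packages as before.

Other enhancements

Compatibility with matplotlib 3.3.0 (GH 34850)
IntegerArray.astype() now supports datetime64 dtype (GH 32538)
IntegerArray now implements the sum operation (GH 33172)
Added pandas.errors.InvalidIndexError (GH 34570)
Added DataFrame.value_counts() (GH 5377)
Added a pandas.api.indexers.FixedForwardWindowIndexer() class to support forward-looking windows during rolling operations.
Added a pandas.api.indexers.VariableOffsetWindowIndexer() class to support rolling operations with non-fixed offsets (GH 34994)
describe() now includes a datetime_is_numeric keyword to control how datetime columns are summarized (GH 30164, GH 34798)
Styler may now render CSS more efficiently where multiple cells have the same styling (GH 30876)
highlight_null() now accepts a subset argument (GH 31345)
When writing directly to a sqlite connection, DataFrame.to_sql() now supports the multi method (GH 29921)
pandas.errors.OptionError is now exposed in pandas.errors (GH 27553)
Added api.extensions.ExtensionArray.argmax() and api.extensions.ExtensionArray.argmin() (GH 24382)
timedelta_range() will now infer a frequency when passed start, stop and periods (GH 32377)
Positional slicing on an IntervalIndex now supports slices with step > 1 (GH 31658)
Series.str now has a fullmatch method that matches a regular expression against the entire string in each row of the Series, similar to re.fullmatch (GH 32806)
DataFrame.sample() will now also allow array-like and BitGenerator objects to be passed to random_state as seeds (GH 32503)
Index.union() will now raise a RuntimeWarning for MultiIndex objects if the objects inside are unsortable. Pass sort=False to suppress this warning (GH 33015)
Added Series.dt.isocalendar() and DatetimeIndex.isocalendar(), which return a DataFrame with year, week and day calculated according to the ISO 8601 calendar (GH 33206, GH 34392)
The DataFrame.to_feather() method now supports additional keyword arguments (e.g. to set the compression) that were added in pyarrow 0.17 (GH 33422)
cut() will now accept the parameter ordered, with default ordered=True. If ordered=False and no labels are provided, an error will be raised (GH 33141)
DataFrame.to_csv(), DataFrame.to_pickle() and DataFrame.to_json() now support passing a dict of compression arguments when using the gzip and bz2 protocols. This can be used to set a custom compression level, e.g. df.to_csv(path, compression={'method': 'gzip', 'compresslevel': 1}) (GH 33196)
melt() has gained an ignore_index argument (default True) that, if set to False, prevents the method from dropping the index (GH 17440)
Series.update() now accepts objects that can be coerced to a Series, such as dict and list, mirroring the behavior of DataFrame.update() (GH 33215)
DataFrameGroupBy.transform() and DataFrameGroupBy.aggregate() have gained engine and engine_kwargs arguments that support executing functions with Numba (GH 32854, GH 33388)
Resampler.interpolate() now supports the SciPy interpolation method scipy.interpolate.CubicSpline as method cubicspline (GH 33670)
DataFrameGroupBy and SeriesGroupBy now implement the sample method for doing random sampling within groups (GH 31775)
DataFrame.to_numpy() now supports the na_value keyword to control the NA sentinel in the output array (GH 33820)
Added api.extension.ExtensionArray.equals to the extension array interface, similar to Series.equals() (GH 27081)
The minimum supported dta version has increased to 105 in read_stata() and StataReader (GH 26667)
to_stata() supports compression using the compression keyword argument. Compression can either be inferred or explicitly set using a string or a dictionary containing both the method and any additional arguments that are passed to the compression library. Compression was also added to the low-level Stata-file writers StataWriter, StataWriter117 and StataWriterUTF8 (GH 26599)
HDFStore.put() now accepts a track_times parameter. This parameter is passed to the create_table method of PyTables (GH 32682)
Series.plot() and DataFrame.plot() now accept xlabel and ylabel parameters to present labels on the x and y axes (GH 9093)
Made Rolling and Expanding iterable (GH 11704)
Made option_context a contextlib.ContextDecorator, which allows it to be used as a decorator over an entire function (GH 34253)
DataFrame.to_csv() and Series.to_csv() now accept an errors argument (GH 22610)
DataFrameGroupBy.groupby.transform() now allows func to be pad, backfill and cumcount (GH 31269)
read_json() now accepts an nrows parameter (GH 33916)
DataFrame.hist(), Series.hist(), core.groupby.DataFrameGroupBy.hist() and core.groupby.SeriesGroupBy.hist() have gained the legend argument. Set it to True to show a legend in the histogram (GH 6279)
concat() and append() now preserve extension dtypes; for example, combining a nullable integer column with a numpy integer column will no longer result in object dtype but will preserve the integer dtype (GH 33607, GH 34339, GH 34095)
read_gbq() now allows disabling the progress bar (GH 33360)
read_gbq() now supports the max_results kwarg from pandas-gbq (GH 34639)
DataFrame.cov() and Series.cov() now support a new parameter ddof to support delta degrees of freedom as in the corresponding numpy methods (GH 34611)
The col_space parameter of DataFrame.to_html() and DataFrame.to_string() now accepts a list or dict to change the width of only some specific columns (GH 28917)
DataFrame.to_excel() can now also write OpenOffice spreadsheet (.ods) files (GH 27222)
explode() now accepts ignore_index to reset the index, similar to pd.concat() or DataFrame.sort_values() (GH 34932)
DataFrame.to_markdown() and Series.to_markdown() now accept the index argument as an alias for tabulate's showindex (GH 32667)
read_csv() now accepts string values like "0", "0.0", "1", "1.0" as convertible to the nullable Boolean dtype (GH 34859)
ExponentialMovingWindow now supports a times argument that allows mean to be calculated with observations spaced by the timestamps in times (GH 34839)
DataFrame.agg() and Series.agg() now accept named aggregation for renaming the output columns/indexes (GH 26513)
compute.use_numba now exists as a configuration option that uses the numba engine when available (GH 33966, GH 35374)
Series.plot() now supports asymmetric error bars. Previously, if Series.plot() received a "2xN" array with error values for yerr and/or xerr, the left/lower values (first row) were mirrored, while the right/upper values (second row) were ignored. Now, the first row represents the left/lower error values and the second row the right/upper error values (GH 9536)

### KeyErrors raised by loc specify missing labels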

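Previously, if labels were missing for a .loc call, a KeyError was raised stating that this was no longer supported.

Now the error message also includes a list of the missing labels (max 10 items, display width 80 characters). See GH 34272.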
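
A small sketch of what this looks like in practice is given below. The Series and the labels missing_1 and missing_2 are invented for illustration, and the exact wording of the message varies between pandas versions; the point is only that the missing labels appear in it.

```python
import pandas as pd

ser = pd.Series([1, 2, 3], index=["a", "b", "c"])

try:
    # "missing_1" and "missing_2" are hypothetical labels absent from the index.
    ser.loc[["a", "missing_1", "missing_2"]]
except KeyError as err:
    message = str(err)
else:
    message = ""

print(message)
```
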

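All dtypes can now be converted to `StringDtype`

Previously, declaring or converting to StringDtype was in general only possible if the data was already only str or nan-like. StringDtype now works in all situations where astype(str) or dtype=str work.

For example, the following now works: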

In [1]: ser = pd.Series([1, "abc", np.nan], dtype="string")

In [2]: ser
Out[2]: 
0       1
1     abc
2    <NA>
dtype: string

In [3]: ser[0]
Out[3]: '1'

In [4]: pd.Series([1, 2, np.nan], dtype="Int64").astype("string")
Out[4]: 
0       1
1       2
2    <NA>
dtype: string

Non-monotonic `PeriodIndex` partial string slicing

PeriodIndex now supports partial string slicing for non-monotonic indexes, mirroring DatetimeIndex behavior (GH 31096).

For example:

In [5]: dti = pd.date_range("2014-01-01", periods=30, freq="30D")

In [6]: pi = dti.to_period("D")

In [7]: ser_monotonic = pd.Series(np.arange(30), index=pi)

In [8]: shuffler = list(range(0, 30, 2)) + list(range(1, 31, 2))

In [9]: ser = ser_monotonic.iloc[shuffler]

In [10]: ser
Out[10]: 
2014-01-01     0
2014-03-02     2
2014-05-01     4
2014-06-30     6
2014-08-29     8
 ..
2015-09-23    21
2015-11-22    23
2016-01-21    25
2016-03-21    27
2016-05-20    29
Freq: D, Length: 30, dtype: int64

In [11]: ser["2014"]
Out[11]: 
2014-01-01     0
2014-03-02     2
2014-05-01     4
2014-06-30     6
2014-08-29     8
2014-10-28    10
2014-12-27    12
2014-01-31     1
2014-04-01     3
2014-05-31     5
2014-07-30     7
2014-09-28     9
2014-11-27    11
Freq: D, dtype: int64

In [12]: ser.loc["May 2015"]
Out[12]: 
2015-05-26    17
Freq: D, dtype: int64

Comparing two `DataFrame` or two `Series` and summarizing the differences

We've added DataFrame.compare() and Series.compare() for comparing two DataFrame or two Series (GH 30429).

In [13]: df = pd.DataFrame(
 ....:    {
 ....:        "col1": ["a", "a", "b", "b", "a"],
 ....:        "col2": [1.0, 2.0, 3.0, np.nan, 5.0],
 ....:        "col3": [1.0, 2.0, 3.0, 4.0, 5.0]
 ....:    },
 ....:    columns=["col1", "col2", "col3"],
 ....: )
 ....: 

In [14]: df
Out[14]: 
 col1  col2  col3
0    a   1.0   1.0
1    a   2.0   2.0
2    b   3.0   3.0
3    b   NaN   4.0
4    a   5.0   5.0

In [15]: df2 = df.copy()

In [16]: df2.loc[0, 'col1'] = 'c'

In [17]: df2.loc[2, 'col3'] = 4.0

In [18]: df2
Out[18]: 
 col1  col2  col3
0    c   1.0   1.0
1    a   2.0   2.0
2    b   3.0   4.0
3    b   NaN   4.0
4    a   5.0   5.0

In [19]: df.compare(df2)
Out[19]: 
 col1       col3 
 self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

See the user guide for more details.
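
The example above covers DataFrame.compare(). For completeness, here is a minimal sketch of the Series variant with made-up values; it assumes the default column labels self and other.

```python
import pandas as pd

left = pd.Series(["a", "b", "c"])
right = pd.Series(["a", "x", "c"])

# Only the positions where the two Series differ are reported.
diff = left.compare(right)

# keep_shape=True keeps every row and marks equal positions with NaN.
full = left.compare(right, keep_shape=True)

print(diff)
print(full)
```
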

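Allow NA in `groupby` key

With groupby, we've added a dropna keyword to DataFrame.groupby() and Series.groupby() in order to allow NA values in group keys. Users can set dropna to False if they want to include NA values in groupby keys. The default for dropna is True, to keep backwards compatibility (GH 3729).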

In [20]: df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]

In [21]: df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])

In [22]: df_dropna
Out[22]: 
 a    b  c
0  1  2.0  3
1  1  NaN  4
2  2  1.0  3
3  1  2.0  2

# Default ``dropna`` is set to True, which will exclude NaNs in keys
In [23]: df_dropna.groupby(by=["b"], dropna=True).sum()
Out[23]: 
 a  c
b 
1.0  2  3
2.0  2  5

# In order to allow NaN in keys, set ``dropna`` to False
In [24]: df_dropna.groupby(by=["b"], dropna=False).sum()
Out[24]: 
 a  c
b 
1.0  2  3
2.0  2  5
NaN  1  4

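The default setting of the dropna argument is True, which means NA values are not included in group keys.

Sorting with keys

We've added a key argument to the DataFrame and Series sorting methods, including DataFrame.sort_values(), DataFrame.sort_index(), Series.sort_values() and Series.sort_index(). The key can be any callable; it is applied column-by-column to each column used for sorting, before the sort is performed (GH 27237). See sort_values with keys and sort_index with keys for more information.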

In [25]: s = pd.Series(['C', 'a', 'B'])

In [26]: s
Out[26]: 
0    C
1    a
2    B
dtype: object

In [27]: s.sort_values()
Out[27]: 
2    B
0    C
1    a
dtype: object

Note how this is sorted with capital letters first. If we apply the Series.str.lower() method, we get

In [28]: s.sort_values(key=lambda x: x.str.lower())
Out[28]: 
1    a
2    B
0    C
dtype: object

When applied to a DataFrame, the key is applied per column to all columns, or to a subset if by is specified, e.g.

In [29]: df = pd.DataFrame({'a': ['C', 'C', 'a', 'a', 'B', 'B'],
 ....:                   'b': [1, 2, 3, 4, 5, 6]})
 ....: 

In [30]: df
Out[30]: 
 a  b
0  C  1
1  C  2
2  a  3
3  a  4
4  B  5
5  B  6

In [31]: df.sort_values(by=['a'], key=lambda col: col.str.lower())
Out[31]: 
 a  b
2  a  3
3  a  4
4  B  5
5  B  6
0  C  1
1  C  2

For more details, see the examples and documentation in DataFrame.sort_values(), Series.sort_values() and sort_index().
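
The examples in this section use sort_values(). As a brief sketch of the same idea for sort_index(), with an invented Series, the key receives the index and returns the values to sort by.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["b", "C", "a"])

# Default ordering puts the uppercase label first.
default_order = s.sort_index()

# Case-insensitive ordering: the key lowercases the index labels.
folded = s.sort_index(key=lambda idx: idx.str.lower())

print(default_order)
print(folded)
```
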

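Fold argument support in Timestamp constructor

Timestamp now supports the keyword-only fold argument according to PEP 495, similar to its parent class datetime.datetime. It supports both accepting fold as an initialization argument and inferring fold from other constructor arguments (GH 25057, GH 31338). Support is limited to dateutil timezones, as pytz does not support fold.

For example: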

In [32]: ts = pd.Timestamp("2019-10-27 01:30:00+00:00")

In [33]: ts.fold
Out[33]: 0

In [34]: ts = pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
 ....:                  tz="dateutil/Europe/London", fold=1)
 ....: 

In [35]: ts
Out[35]: Timestamp('2019-10-27 01:30:00+0000', tz='dateutil//usr/share/zoneinfo/Europe/London')

For more on working with fold, see the Fold subsection in the user guide.

Parsing timezone-aware format with different timezones in to_datetime

For example:

In [36]: tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
 ....:           "2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
 ....: 

In [37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z', utc=True)
Out[37]: 
DatetimeIndex(['2010-01-01 11:00:00+00:00', '2010-01-01 13:00:00+00:00',
 '2010-01-01 09:00:00+00:00', '2010-01-01 08:00:00+00:00'],
 dtype='datetime64[ns, UTC]', freq=None)

In[37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')
Out[37]:
Index([2010-01-01 12:00:00+01:00, 2010-01-01 12:00:00-01:00,
       2010-01-01 12:00:00+03:00, 2010-01-01 12:00:00+04:00],
      dtype='object')

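Grouper and resample now support the arguments origin and offset

Grouper and DataFrame.resample() now support the arguments origin and offset. They let the user control the timestamp on which to adjust the grouping (GH 31809).

The bins of the grouping are adjusted based on the beginning of the day of the time series starting point. This works well with frequencies that are multiples of a day (like 30D) or that divide a day evenly (like 90s or 1min). However, it can create inconsistencies with frequencies that do not meet this criterion. To change this behavior, you can now specify a fixed timestamp with the argument origin.

Two arguments are now deprecated (more information in the documentation of DataFrame.resample()):

base should be replaced by offset.
loffset should be replaced by directly adding an offset to the index of the DataFrame after it has been resampled.

Small example of the use of origin: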

In [38]: start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'

In [39]: middle = '2000-10-02 00:00:00'

In [40]: rng = pd.date_range(start, end, freq='7min')

In [41]: ts = pd.Series(np.arange(len(rng)) * 3, index=rng)

In [42]: ts
Out[42]: 
2000-10-01 23:30:00     0
2000-10-01 23:37:00     3
2000-10-01 23:44:00     6
2000-10-01 23:51:00     9
2000-10-01 23:58:00    12
2000-10-02 00:05:00    15
2000-10-02 00:12:00    18
2000-10-02 00:19:00    21
2000-10-02 00:26:00    24
Freq: 7min, dtype: int64

Resample with the default behavior 'start_day' (origin is 2000-10-01 00:00:00):

In [43]: ts.resample('17min').sum()
Out[43]: 
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64

In [44]: ts.resample('17min', origin='start_day').sum()
Out[44]: 
2000-10-01 23:14:00     0
2000-10-01 23:31:00     9
2000-10-01 23:48:00    21
2000-10-02 00:05:00    54
2000-10-02 00:22:00    24
Freq: 17min, dtype: int64

Resample using a fixed origin:

In [45]: ts.resample('17min', origin='epoch').sum()
Out[45]: 
2000-10-01 23:18:00     0
2000-10-01 23:35:00    18
2000-10-01 23:52:00    27
2000-10-02 00:09:00    39
2000-10-02 00:26:00    24
Freq: 17min, dtype: int64

In [46]: ts.resample('17min', origin='2000-01-01').sum()
Out[46]: 
2000-10-01 23:24:00     3
2000-10-01 23:41:00    15
2000-10-01 23:58:00    45
2000-10-02 00:15:00    45
Freq: 17min, dtype: int64

如果需要，您可以使用参数 offset（Timedelta）调整区间，该区间将添加到默认的 origin。

完整示例，请参见：使用原点或偏移调整箱子的起始位置。

现在使用 fsspec 进行文件系统处理。

对于除本地文件系统和 HTTP(S)之外的其他文件系统的读写以及从中读取，将使用可选依赖项fsspec来分派操作（GH 33452）。这将为 S3 和 GCS 存储提供不变的功能，因为它们已经受支持，但还将支持其他几种存储实现，如Azure Datalake 和 Blob、SSH、FTP、dropbox 和 github。有关文档和功能，请参阅fsspec 文档。

与 S3 和 GCS 接口的现有功能不会受到此更改的影响，因为fsspec仍将引入与以前相同的软件包。

其他增强

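Compatibility with matplotlib 3.3.0 (GH 34850).
IntegerArray.astype() now supports the datetime64 dtype (GH 32538).
IntegerArray now implements the sum operation (GH 33172).
Added pandas.errors.InvalidIndexError (GH 34570).
Added DataFrame.value_counts() (GH 5377).
Added a pandas.api.indexers.FixedForwardWindowIndexer() class to support forward-looking windows during rolling operations.
Added a pandas.api.indexers.VariableOffsetWindowIndexer() class to support rolling operations with non-fixed offsets (GH 34994).
describe() now includes a datetime_is_numeric keyword to control how datetime columns are summarized (GH 30164, GH 34798).
Styler may now render CSS more efficiently where multiple cells have the same styling (GH 30876).
highlight_null() now accepts a subset argument (GH 31345).
When writing directly to a sqlite connection, DataFrame.to_sql() now supports the multi method (GH 29921).
pandas.errors.OptionError is now exposed in pandas.errors (GH 27553).
Added api.extensions.ExtensionArray.argmax() and api.extensions.ExtensionArray.argmin() (GH 24382).
timedelta_range() now infers a frequency when passed start, stop, and periods (GH 32377).
Positional slicing on an IntervalIndex now supports slices with step > 1 (GH 31658).
Series.str now has a fullmatch method that matches a regular expression against the entire string in each row of the Series, similar to re.fullmatch (GH 32806).
DataFrame.sample() now also allows array-like and BitGenerator objects to be passed to random_state as seeds (GH 32503).
Index.union() now raises a RuntimeWarning for MultiIndex objects that cannot be sorted. Pass sort=False to suppress this warning (GH 33015).
Added Series.dt.isocalendar() and DatetimeIndex.isocalendar(), which return a DataFrame with year, week, and day calculated according to the ISO 8601 calendar (GH 33206, GH 34392).
The DataFrame.to_feather() method now supports additional keyword arguments (e.g. to set the compression) that were added in pyarrow 0.17 (GH 33422).
cut() now accepts the parameter ordered, with default ordered=True. If ordered=False and no labels are provided, an error is raised (GH 33141).
DataFrame.to_csv(), DataFrame.to_pickle(), and DataFrame.to_json() now support passing a dict of compression arguments when using the gzip and bz2 protocols. This can be used to set a custom compression level, e.g., df.to_csv(path, compression={'method': 'gzip', 'compresslevel': 1}) (GH 33196).
melt() has gained an ignore_index argument (default True) that, if set to False, prevents the method from dropping the index (GH 17440).
Series.update() now accepts objects that can be coerced to a Series, such as dict and list, mirroring the behavior of DataFrame.update() (GH 33215).
DataFrameGroupBy.transform() and DataFrameGroupBy.aggregate() have gained engine and engine_kwargs arguments that support executing functions with Numba (GH 32854, GH 33388).
Resampler.interpolate() now supports the SciPy interpolation method scipy.interpolate.CubicSpline as method cubicspline (GH 33670).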
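DataFrameGroupBy and SeriesGroupBy now implement the sample method for random sampling within groups (GH 31775).
DataFrame.to_numpy() now supports the na_value keyword to control the NA sentinel in the output array (GH 33820).
Added api.extension.ExtensionArray.equals to the extension array interface, similar to Series.equals() (GH 27081).
The minimum supported dta version has increased to 105 in read_stata() and StataReader (GH 26667).
to_stata() supports compression using the compression keyword argument. Compression can either be inferred or explicitly set using a string or a dictionary containing both the method and any additional arguments that are passed to the compression library. Compression was also added to the low-level Stata-file writers StataWriter, StataWriter117, and StataWriterUTF8 (GH 26599).
HDFStore.put() now accepts a track_times parameter. This parameter is passed to the create_table method of PyTables (GH 32682).
Series.plot() and DataFrame.plot() now accept xlabel and ylabel parameters to present labels on the x and y axes (GH 9093).
Made Rolling and Expanding iterable (GH 11704).
Made option_context a contextlib.ContextDecorator, which allows it to be used as a decorator over an entire function (GH 34253).
DataFrame.to_csv() and Series.to_csv() now accept an errors argument (GH 22610).
DataFrameGroupBy.groupby.transform() now allows func to be pad, backfill, and cumcount (GH 31269).
read_json() now accepts an nrows parameter (GH 33916).
DataFrame.hist(), Series.hist(), core.groupby.DataFrameGroupBy.hist(), and core.groupby.SeriesGroupBy.hist() have gained the legend argument. Set it to True to show a legend in the histogram (GH 6279).
concat() and append() now preserve extension dtypes; for example, combining a nullable integer column with a NumPy integer column no longer results in object dtype but preserves the integer dtype (GH 33607, GH 34339, GH 34095).
read_gbq() now allows disabling the progress bar (GH 33360).
read_gbq() now supports the max_results argument from pandas-gbq (GH 34639).
DataFrame.cov() and Series.cov() now support a new parameter ddof for delta degrees of freedom, as in the corresponding NumPy methods (GH 34611).
The col_space parameter of DataFrame.to_html() and DataFrame.to_string() now accepts a list or dict to change the width of only some specific columns (GH 28917).
DataFrame.to_excel() can now also write OpenOffice spreadsheet (.ods) files (GH 27222).
explode() now accepts ignore_index to reset the index, similar to pd.concat() or DataFrame.sort_values() (GH 34932).
DataFrame.to_markdown() and Series.to_markdown() now accept an index argument as an alias for tabulate's showindex (GH 32667).
read_csv() now accepts string values like "0", "0.0", "1", "1.0" as convertible to the nullable Boolean dtype (GH 34859).
ExponentialMovingWindow now supports a times argument that allows mean to be calculated with observations spaced by the timestamps in times (GH 34839).
DataFrame.agg() and Series.agg() now accept named aggregation for renaming the output columns/indexes (GH 26513).
compute.use_numba now exists as a configuration option that uses the numba engine when available (GH 33966, GH 35374).
Series.plot() now supports asymmetric error bars. Previously, if Series.plot() received a "2xN" array with error values for yerr and/or xerr, the left/lower values (first row) were mirrored, while the right/upper values (second row) were ignored. Now, the first row represents the left/lower error values and the second row the right/upper error values (GH 9536).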
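
A few of the enhancements listed above can be tried out with a short sketch like the following. The sample data and the variable names (`row_counts`, `whole`, `iso`, `long`) are made up for illustration and do not come from the release notes.

```python
import pandas as pd

# DataFrame.value_counts(): count unique rows.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
row_counts = df.value_counts()

# Series.str.fullmatch(): the pattern must match the entire string.
s = pd.Series(["abc", "abcd"])
whole = s.str.fullmatch("abc")

# Series.dt.isocalendar(): year, week and day per ISO 8601.
iso = pd.Series(pd.to_datetime(["2020-01-01"])).dt.isocalendar()

# melt(ignore_index=False): keep the original index labels.
wide = pd.DataFrame({"id": [10, 20], "v": [1, 2]}, index=["r1", "r2"])
long = wide.melt(id_vars="id", ignore_index=False)

print(row_counts, whole, iso, long, sep="\n\n")
```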

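Notable bug fixes

These are bug fixes that might have notable behavior changes.

`MultiIndex.get_indexer` interprets the `method` argument correctly

This restores the behavior of MultiIndex.get_indexer() with method='backfill' or method='pad' to the behavior before pandas 0.23.0. In particular, MultiIndexes are treated as a list of tuples, and padding or backfilling is done with respect to the ordering of these lists of tuples (GH 29896).

As an example, given: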

In [47]: df = pd.DataFrame({
 ....:    'a': [0, 0, 0, 0],
 ....:    'b': [0, 2, 3, 4],
 ....:    'c': ['A', 'B', 'C', 'D'],
 ....: }).set_index(['a', 'b'])
 ....: 

In [48]: mi_2 = pd.MultiIndex.from_product([[0], [-1, 0, 1, 3, 4, 5]])

The differences in reindexing df with mi_2 and using method='backfill' can be seen here:

pandas >= 0.23, < 1.1.0:

In [1]: df.reindex(mi_2, method='backfill')
Out[1]:
 c
0 -1  A
 0  A
 1  D
 3  A
 4  A
 5  C

pandas <0.23, >= 1.1.0

In [49]: df.reindex(mi_2, method='backfill')
Out[49]: 
 c
0 -1    A
 0    A
 1    B
 3    C
 4    D
 5  NaN

The differences in reindexing df with mi_2 and using method='pad' can be seen here:

pandas >= 0.23, < 1.1.0

In [1]: df.reindex(mi_2, method='pad')
Out[1]:
 c
0 -1  NaN
 0  NaN
 1    D
 3  NaN
 4    A
 5    C

pandas < 0.23, >= 1.1.0

In [50]: df.reindex(mi_2, method='pad')
Out[50]: 
 c
0 -1  NaN
 0    A
 1    A
 3    C
 4    D
 5    D
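
The same behavior can be observed directly on MultiIndex.get_indexer(), which is what reindex relies on. The following is a small sketch, assuming pandas 1.1.0 or later; the names `mi`, `target`, `backfill` and `pad` are illustrative.

```python
import pandas as pd

# The index of df from the example above, and the reindexing target mi_2.
mi = pd.MultiIndex.from_arrays([[0, 0, 0, 0], [0, 2, 3, 4]], names=["a", "b"])
target = pd.MultiIndex.from_product([[0], [-1, 0, 1, 3, 4, 5]])

# Positions in mi used to fill each target entry; -1 means "no match".
backfill = mi.get_indexer(target, method="backfill")
pad = mi.get_indexer(target, method="pad")

print(backfill)
print(pad)
```

Each returned position points at the row of df whose value ends up in the reindexed result, so the positions line up with the A/B/C/D/NaN columns shown above.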

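Failed label-based lookups always raise KeyError

Label lookups series[key], series.loc[key] and frame.loc[key] used to raise either KeyError or TypeError depending on the type of key and the type of Index. These now consistently raise KeyError (GH 31867).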

In [51]: ser1 = pd.Series(range(3), index=[0, 1, 2])

In [52]: ser2 = pd.Series(range(3), index=pd.date_range("2020-02-01", periods=3))

Previous behavior:

In [3]: ser1[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [4]: ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
TypeError: cannot do label indexing on DatetimeIndex with these indexers [1] of type int

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

New behavior:

In [3]: ser1[1.5]
...
KeyError: 1.5

In [4]: ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
KeyError: 1.5

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
KeyError: 1

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')
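
One practical consequence of the consistent KeyError is that a single except clause is enough to handle any failed label lookup. A minimal sketch, assuming pandas 1.1.0 or later; the helper `lookup_or_none` is hypothetical and not part of pandas.

```python
import pandas as pd

ser1 = pd.Series(range(3), index=[0, 1, 2])
ser2 = pd.Series(range(3), index=pd.date_range("2020-02-01", periods=3))


def lookup_or_none(ser, key):
    """Return ser.loc[key], or None when the label is missing (hypothetical helper)."""
    try:
        return ser.loc[key]
    except KeyError:
        # Float, string and integer keys on mismatched indexes all land here.
        return None


results = [
    lookup_or_none(ser1, 1.5),
    lookup_or_none(ser1, "foo"),
    lookup_or_none(ser2, 1),
]
print(results)
```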

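Similarly, DataFrame.at() and Series.at() will raise a TypeError instead of a ValueError if an incompatible key is passed, and a KeyError if a missing key is passed, matching the behavior of .loc[] (GH 31722).

### Failed integer lookups on MultiIndex raise KeyError

Indexing with integers on a MultiIndex that has an integer-dtype first level incorrectly failed to raise KeyError when one or more of those integer keys was not present in the first level of the index (GH 33539).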

In [53]: idx = pd.Index(range(4))

In [54]: dti = pd.date_range("2000-01-03", periods=3)

In [55]: mi = pd.MultiIndex.from_product([idx, dti])

In [56]: ser = pd.Series(range(len(mi)), index=mi)

Previous behavior:

In [5]: ser[[5]]
Out[5]: Series([], dtype: int64)

New behavior:

In [5]: ser[[5]]
...
KeyError: '[5] not in index'

`DataFrame.merge()` preserves the right frame's row order

DataFrame.merge() now preserves the right frame's row order when executing a right merge (GH 27453).

In [57]: left_df = pd.DataFrame({'animal': ['dog', 'pig'],
 ....:                       'max_speed': [40, 11]})
 ....: 

In [58]: right_df = pd.DataFrame({'animal': ['quetzal', 'pig'],
 ....:                        'max_speed': [80, 11]})
 ....: 

In [59]: left_df
Out[59]: 
 animal  max_speed
0    dog         40
1    pig         11

In [60]: right_df
Out[60]: 
 animal  max_speed
0  quetzal         80
1      pig         11

Previous behavior:

>>> left_df.merge(right_df, on=['animal', 'max_speed'], how="right")
 animal  max_speed
0      pig         11
1  quetzal         80

New behavior:

In [61]: left_df.merge(right_df, on=['animal', 'max_speed'], how="right")
Out[61]: 
 animal  max_speed
0  quetzal         80
1      pig         11
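
A quick way to confirm the new ordering in a script is to compare the key column of the merged result with the right frame. This is a sketch assuming pandas 1.1.0 or later; `merged` is an illustrative name.

```python
import pandas as pd

left_df = pd.DataFrame({"animal": ["dog", "pig"], "max_speed": [40, 11]})
right_df = pd.DataFrame({"animal": ["quetzal", "pig"], "max_speed": [80, 11]})

# A right merge keeps every row of right_df, now in right_df's own order.
merged = left_df.merge(right_df, on=["animal", "max_speed"], how="right")

print(merged["animal"].tolist())
print(right_df["animal"].tolist())
```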

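Assignment to multiple columns of a DataFrame when some columns do not exist

Assignment to multiple columns of a DataFrame when some of the columns do not exist would previously assign the values to the last column. Now, new columns are constructed with the right values (GH 13658).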

In [62]: df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})

In [63]: df
Out[63]: 
 a  b
0  0  3
1  1  4
2  2  5

Previous behavior:

In [3]: df[['a', 'c']] = 1
In [4]: df
Out[4]:
 a  b
0  1  1
1  1  1
2  1  1

New behavior:

In [64]: df[['a', 'c']] = 1

In [65]: df
Out[65]: 
 a  b  c
0  1  3  1
1  1  4  1
2  1  5  1 
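```

### Groupby reduction consistency

Using `DataFrame.groupby()` with `as_index=True` and the aggregation `nunique` would previously include the grouping column(s) in the columns of the result. Now the grouping column(s) only appear in the index, consistent with other reductions ([GH 32579](https://github.com/pandas-dev/pandas/issues/32579)).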

```py
In [66]: df = pd.DataFrame({"a": ["x", "x", "y", "y"], "b": [1, 1, 2, 3]})

In [67]: df
Out[67]: 
 a  b
0  x  1
1  x  1
2  y  2
3  y  3

Previous behavior:

In [3]: df.groupby("a", as_index=True).nunique()
Out[4]:
 a  b
a
x  1  1
y  1  2

New behavior:

In [68]: df.groupby("a", as_index=True).nunique()
Out[68]: 
 b
a 
x  1
y  2

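Using DataFrame.groupby() with as_index=False and the function idxmax, idxmin, mad, nunique, sem, skew, or std would previously modify the grouping column. Now the grouping column remains unchanged, consistent with other reductions (GH 21090, GH 10355).

Previous behavior: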

In [3]: df.groupby("a", as_index=False).nunique()
Out[4]:
 a  b
0  1  1
1  1  2

New behavior:

In [69]: df.groupby("a", as_index=False).nunique()
Out[69]: 
 a  b
0  x  1
1  y  2

The method DataFrameGroupBy.size() would previously ignore as_index=False. Now the grouping columns are returned as columns, making the result a DataFrame instead of a Series (GH 32599).

Previous behavior:

In [3]: df.groupby("a", as_index=False).size()
Out[4]:
a
x    2
y    2
dtype: int64

New behavior:

In [70]: df.groupby("a", as_index=False).size()
Out[70]: 
 a  size
0  x     2
1  y     2 
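```

### `DataFrameGroupby.agg()` lost results with `as_index=False` when relabeling columns

Previously, `DataFrameGroupby.agg()` lost the result columns when the `as_index` option was set to `False` and the result columns were relabeled. In this case the result values were replaced with the previous index ([GH 32240](https://github.com/pandas-dev/pandas/issues/32240)).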

```py
In [71]: df = pd.DataFrame({"key": ["x", "y", "z", "x", "y", "z"],
 ....:                   "val": [1.0, 0.8, 2.0, 3.0, 3.6, 0.75]})
 ....: 

In [72]: df
Out[72]: 
 key   val
0   x  1.00
1   y  0.80
2   z  2.00
3   x  3.00
4   y  3.60
5   z  0.75

Previous behavior:

In [2]: grouped = df.groupby("key", as_index=False)
In [3]: result = grouped.agg(min_val=pd.NamedAgg(column="val", aggfunc="min"))
In [4]: result
Out[4]:
 min_val
 0   x
 1   y
 2   z

New behavior:

In [73]: grouped = df.groupby("key", as_index=False)

In [74]: result = grouped.agg(min_val=pd.NamedAgg(column="val", aggfunc="min"))

In [75]: result
Out[75]: 
 key  min_val
0   x     1.00
1   y     0.80
2   z     0.75 
```

### apply and applymap on `DataFrame` evaluate the first row/column only once

```py
In [76]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 6]})

In [77]: def func(row):
 ....:    print(row)
 ....:    return row
 ....:

Previous behavior:

In [4]: df.apply(func, axis=1)
a    1
b    3
Name: 0, dtype: int64
a    1
b    3
Name: 0, dtype: int64
a    2
b    6
Name: 1, dtype: int64
Out[4]:
 a  b
0  1  3
1  2  6

New behavior:

In [78]: df.apply(func, axis=1)
a    1
b    3
Name: 0, dtype: int64
a    2
b    6
Name: 1, dtype: int64
Out[78]: 
 a  b
0  1  3
1  2  6
```

Backwards incompatible API changes

Added `check_freq` argument to `testing.assert_frame_equal` and `testing.assert_series_equal`

The check_freq argument was added to testing.assert_frame_equal() and testing.assert_series_equal() in pandas 1.1.0 and defaults to True. testing.assert_frame_equal() and testing.assert_series_equal() now raise AssertionError if the indexes do not have the same frequency. Before pandas 1.1.0, the index frequency was not checked.
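
The effect of check_freq can be sketched with two Series whose indexes hold the same timestamps but differ in their freq attribute. This assumes pandas 1.1.0 or later; the variable names are illustrative.

```python
import pandas as pd
import pandas.testing as tm

# Same timestamps, but only the first index carries a frequency.
with_freq = pd.date_range("2020-01-01", periods=3, freq="D")
without_freq = pd.DatetimeIndex(with_freq.values)  # freq is None here

left = pd.Series([1, 2, 3], index=with_freq)
right = pd.Series([1, 2, 3], index=without_freq)

# With the default check_freq=True the frequency mismatch is reported.
try:
    tm.assert_series_equal(left, right)
    freq_mismatch_detected = False
except AssertionError:
    freq_mismatch_detected = True

# Opting out restores the pre-1.1.0 comparison, which ignores freq.
tm.assert_series_equal(left, right, check_freq=False)
print(freq_mismatch_detected)
```

Test suites that build expected results from raw arrays may therefore need either check_freq=False or an expected index constructed with a matching frequency.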

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated (GH 33718, GH 29766, GH 29723, pytables >= 3.4.3). If installed, we now require:

Package	Minimum Version	Required	Changed
numpy	1.15.4	X	X
pytz	2015.4	X
python-dateutil	2.7.3	X	X
bottleneck	1.2.1
numexpr	2.6.2
pytest (dev)	4.0.2

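For optional libraries, the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.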

Package	Minimum Version	Changed
beautifulsoup4	4.6.0
fastparquet	0.3.2
fsspec	0.7.4
gcsfs	0.6.0	X
lxml	3.8.0
matplotlib	2.2.2
numba	0.46.0
openpyxl	2.5.7
pyarrow	0.13.0
pymysql	0.7.1
pytables	3.4.3	X
s3fs	0.4.0	X
scipy	1.2.0	X
sqlalchemy	1.1.4
xarray	0.8.2
xlrd	1.1.0
xlsxwriter	0.9.8
xlwt	1.2.0
pandas-gbq	1.2.0	X

See Dependencies and Optional dependencies for more.

Development changes

The minimum version of Cython is now the most recent bug-fix version (0.29.16) (GH 33334).

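Deprecations

Lookups on a Series with a single-item list containing a slice (e.g. ser[[slice(0, 4)]]) are deprecated and will raise in a future version. Either convert the list to a tuple, or pass the slice directly instead (GH 31333).
DataFrame.mean() and DataFrame.median() with numeric_only=None will include datetime64 and datetime64tz columns in a future version (GH 29941).
Setting values with .loc using a positional slice is deprecated and will raise in a future version. Use .loc with labels or .iloc with positions instead (GH 31840).
DataFrame.to_dict() has deprecated accepting short names for orient and will raise in a future version (GH 32515).
Categorical.to_dense() is deprecated and will be removed in a future version; use np.asarray(cat) instead (GH 32639).
The fastpath keyword in the SingleBlockManager constructor is deprecated and will be removed in a future version (GH 33092).
Providing suffixes as a set in pandas.merge() is deprecated. Provide a tuple instead (GH 33740, GH 34741).
Indexing a Series with a multi-dimensional indexer like [:, None] to return an ndarray now raises a FutureWarning. Convert to a NumPy array before indexing instead (GH 27837).
Index.is_mixed() is deprecated and will be removed in a future version; check index.inferred_type directly instead (GH 32922).
Passing any arguments but the first one to read_html() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH 27573).
Passing any arguments but path_or_buf (the first one) to read_json() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH 27573).
Passing any arguments but the first two to read_excel() as positional arguments is deprecated. All other arguments should be given as keyword arguments (GH 27573).
pandas.api.types.is_categorical() is deprecated and will be removed in a future version; use pandas.api.types.is_categorical_dtype() instead (GH 33385).
Index.get_value() is deprecated and will be removed in a future version (GH 19728).
Series.dt.week() and Series.dt.weekofyear() are deprecated and will be removed in a future version; use Series.dt.isocalendar().week instead (GH 33595).
DatetimeIndex.week() and DatetimeIndex.weekofyear are deprecated and will be removed in a future version; use DatetimeIndex.isocalendar().week instead (GH 33595).
DatetimeArray.week() and DatetimeArray.weekofyear are deprecated and will be removed in a future version; use DatetimeArray.isocalendar().week instead (GH 33595).
DateOffset.__call__() is deprecated and will be removed in a future version; use offset + other instead (GH 34171).
apply_index() is deprecated and will be removed in a future version. Use offset + other instead (GH 34580).
DataFrame.tshift() and Series.tshift() are deprecated and will be removed in a future version; use DataFrame.shift() and Series.shift() instead (GH 11631).
Indexing an Index object with a float key is deprecated and will raise an IndexError in the future. You can manually convert to an integer key instead (GH 34191).
The squeeze keyword in groupby() is deprecated and will be removed in a future version (GH 32380).
The tz keyword in Period.to_timestamp() is deprecated and will be removed in a future version; use per.to_timestamp(...).tz_localize(tz) instead (GH 34522).
DatetimeIndex.to_perioddelta() is deprecated and will be removed in a future version. Use index - index.to_period(freq).to_timestamp() instead (GH 34853).
DataFrame.melt() accepting a value_name that already exists is deprecated and will be removed in a future version (GH 34731).
The center keyword in the DataFrame.expanding() function is deprecated and will be removed in a future version (GH 20647).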
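
Several of the replacements recommended above are one-liners. The sketch below shows three of them on made-up data (the variable names are illustrative): isocalendar().week in place of dt.week, shift(freq=...) in place of tshift(), and np.asarray(cat) in place of Categorical.to_dense().

```python
import numpy as np
import pandas as pd

# Instead of ser.dt.week / ser.dt.weekofyear:
ser = pd.Series(pd.date_range("2020-12-28", periods=3, freq="D"))
weeks = ser.dt.isocalendar().week

# Instead of ts.tshift(1, freq="D"): shift the index, not the values.
ts = pd.Series([1, 2, 3], index=pd.date_range("2020-01-01", periods=3, freq="D"))
shifted = ts.shift(1, freq="D")

# Instead of cat.to_dense():
cat = pd.Categorical(["a", "b", "a"])
dense = np.asarray(cat)

print(weeks.tolist(), shifted.index[0], dense)
```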

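Performance improvements

Performance improvement in the Timedelta constructor (GH 30543).
Performance improvement in the Timestamp constructor (GH 30543).
Performance improvement in flex arithmetic ops between DataFrame and Series with axis=0 (GH 31296).
Performance improvement in arithmetic ops between DataFrame and Series with axis=1 (GH 33600).
The internal index method _shallow_copy() now copies cached attributes over to the new index, avoiding creating these again on the new index. This can speed up many operations that depend on creating copies of existing indexes (GH 28584, GH 32640, GH 32669).
Significant performance improvement when creating a DataFrame with sparse values from scipy.sparse matrices using the DataFrame.sparse.from_spmatrix() constructor (GH 32821, GH 32825, GH 32826, GH 32856, GH 32858).
Performance improvement for the groupby methods Groupby.first() and Groupby.last() (GH 34178).
Performance improvement in factorize() for nullable (integer and Boolean) dtypes (GH 33064).
Performance improvement when constructing Categorical objects (GH 33921).
Fixed performance regression in pandas.qcut() and pandas.cut() (GH 33921).
Performance improvement in reductions (sum, prod, min, max) for nullable (integer and Boolean) dtypes (GH 30982, GH 33261, GH 33442).
Performance improvement in arithmetic operations between two DataFrame objects (GH 32779).
Performance improvement in RollingGroupby (GH 34052).
Performance improvement in arithmetic operations (sub, add, mul, div) for MultiIndex (GH 34297).
Performance improvement in DataFrame[bool_indexer] when bool_indexer is a list (GH 33924).
Significant performance improvement of io.formats.style.Styler.render() with styles added in various ways, such as io.formats.style.Styler.apply(), io.formats.style.Styler.applymap() or io.formats.style.Styler.bar() (GH 19917).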

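Bug fixes

Datetimelike

Passing an integer dtype other than int64 to np.array(period_index, dtype=...) will now raise TypeError instead of incorrectly using int64 (GH 32255).
Series.to_timestamp() now raises a TypeError if the axis is not a PeriodIndex. Previously an AttributeError was raised (GH 33327).
Series.to_period() now raises a TypeError if the axis is not a DatetimeIndex. Previously an AttributeError was raised (GH 33327).
Period no longer accepts tuples for the freq argument (GH 34658).
Bug in Timestamp where constructing a Timestamp from an ambiguous epoch time and calling the constructor again changed the Timestamp.value() property (GH 24329).
DatetimeArray.searchsorted(), TimedeltaArray.searchsorted(), PeriodArray.searchsorted() not recognizing non-pandas scalars and incorrectly raising ValueError instead of TypeError (GH 30950).
Bug in Timestamp where constructing a Timestamp with a dateutil timezone less than 128 nanoseconds before the daylight saving time switch from winter to summer would result in a nonexistent time (GH 31043).
Bug in Period.to_timestamp(), Period.start_time() with microsecond frequency returning a timestamp one nanosecond earlier than the correct time (GH 31475).
Timestamp raised a confusing error message when year, month or day is missing (GH 31200).
Bug in the DatetimeIndex constructor incorrectly accepting bool-dtype inputs (GH 32668).
Bug in DatetimeIndex.searchsorted() not accepting a list or Series as its argument (GH 32762).
Bug where PeriodIndex() raised when passed a Series of strings (GH 26109).
Bug in Timestamp arithmetic when adding or subtracting an np.ndarray with timedelta64 dtype (GH 33296).
Bug in DatetimeIndex.to_period() not inferring the frequency when called with no arguments (GH 33358).
Bug in DatetimeIndex.tz_localize() incorrectly retaining freq in some cases where the original freq is no longer valid (GH 30511).
Bug in DatetimeIndex.intersection() losing freq and timezone in some cases (GH 33604).
Bug in DatetimeIndex.get_indexer() where incorrect output would be returned for mixed datetime-like targets (GH 33741).
Bug in DatetimeIndex addition and subtraction with some types of DateOffset objects incorrectly retaining an invalid freq attribute (GH 33779).
Bug in DatetimeIndex where setting the freq attribute on an index could silently change the freq attribute on another index viewing the same data (GH 33552).
DataFrame.min() and DataFrame.max() were not returning results consistent with Series.min() and Series.max() when called on objects initialized with an empty pd.to_datetime().
Bug in DatetimeIndex.intersection() and TimedeltaIndex.intersection() with results not having the correct name attribute (GH 33904).
Bug in DatetimeArray.__setitem__(), TimedeltaArray.__setitem__(), PeriodArray.__setitem__() incorrectly allowing values with int64 dtype to be silently cast (GH 33717).
Bug in subtracting a TimedeltaIndex from a Period incorrectly raising TypeError in some cases where it should succeed, and IncompatibleFrequency in some cases where it should raise TypeError (GH 33883).
Bug in constructing a Series or Index from a read-only NumPy array with non-nanosecond resolution, which converted to object dtype instead of coercing to datetime64[ns] dtype when within the timestamp bounds (GH 34843).
The freq keyword in Period, date_range(), period_range(), pd.tseries.frequencies.to_offset() no longer allows tuples; pass a string instead (GH 34703).
Bug in DataFrame.append() when appending a Series containing a scalar tz-aware Timestamp to an empty DataFrame resulted in an object column instead of datetime64[ns, tz] dtype (GH 35038).
OutOfBoundsDatetime issues an improved error message when the timestamp is outside the implementation bounds (GH 32967).
Bug in AbstractHolidayCalendar.holidays() when no rules were defined (GH 31415).
Bug in Tick comparisons raising TypeError when comparing against timedelta-like objects (GH 34088).
Bug in Tick multiplication raising TypeError when multiplying by a float (GH 34486).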

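Timedelta

Bug in constructing a Timedelta with a high-precision integer that would round the Timedelta components (GH 31354).
Bug in dividing np.nan or None by a Timedelta incorrectly returning NaT (GH 31869).
Timedelta now understands µs as an identifier for microseconds (GH 32899).
The Timedelta string representation now includes nanoseconds when nanoseconds are non-zero (GH 9309).
Bug in comparing a Timedelta object against an np.ndarray with timedelta64 dtype incorrectly viewing all entries as unequal (GH 33441).
Bug in timedelta_range() that produced an extra point in an edge case (GH 30353, GH 33498).
Bug in DataFrame.resample() that produced an extra point in an edge case (GH 30353, GH 13022, GH 33498).
Bug in DataFrame.resample() that ignored the loffset argument when dealing with timedelta (GH 7687, GH 33498).
Bug in Timedelta and pandas.to_timedelta() that ignored the unit argument for string input (GH 12136).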
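
Two of the Timedelta changes above are easy to check interactively. The following sketch assumes pandas 1.1.0 or later and writes the micro sign as the escape `\u00b5` so that the exact character is unambiguous; the variable names are illustrative.

```python
import pandas as pd

# "µs" (micro sign, U+00B5) is accepted as a microsecond unit.
micro = pd.Timedelta("1\u00b5s")

# The string representation includes nanoseconds when they are non-zero.
nano = pd.Timedelta(1, unit="ns")
text = str(nano)

print(micro == pd.Timedelta(microseconds=1))
print(text)
```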

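Timezones

Bug in to_datetime() with infer_datetime_format=True where timezone names (e.g. UTC) would not be parsed correctly (GH 33133).

Numeric

Bug in DataFrame.floordiv() with axis=0 not treating division by zero like Series.floordiv() (GH 31271).
Bug in to_numeric() with the string argument "uint64" and errors="coerce" silently failing (GH 32394).
Bug in to_numeric() with downcast="unsigned" failing for empty data (GH 32493).
Bug in DataFrame.mean() with numeric_only=False and either a datetime64 dtype or PeriodDtype column incorrectly raising TypeError (GH 32426).
Bug in DataFrame.count() with level="foo" and index level "foo" containing NaNs causing a segmentation fault (GH 21824).
Bug in DataFrame.diff() with axis=1 returning incorrect results with mixed dtypes (GH 32995).
Bug in DataFrame.corr() and DataFrame.cov() raising when handling nullable integer columns with pandas.NA (GH 33803).
Bug in arithmetic operations between DataFrame objects with non-overlapping columns with duplicate labels causing an infinite loop (GH 35194).
Bug in DataFrame and Series addition and subtraction between object-dtype objects and datetime64 dtype objects (GH 33824).
Bug in Index.difference() giving incorrect results when comparing a Float64Index and an object Index (GH 35217).
Bug in DataFrame reductions (e.g. df.min(), df.max()) with ExtensionArray dtypes (GH 34520, GH 32651).
Series.interpolate() and DataFrame.interpolate() now raise a ValueError if limit_direction is 'forward' or 'both' and method is 'backfill' or 'bfill', or if limit_direction is 'backward' or 'both' and method is 'pad' or 'ffill' (GH 34746).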

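Conversion

Bug in Series construction from a NumPy array with big-endian datetime64 dtype (GH 29684).
Bug in Timedelta construction with a large nanoseconds keyword value (GH 32402).
Bug in DataFrame construction where sets would be duplicated rather than raising (GH 32582).
The DataFrame constructor no longer accepts a list of DataFrame objects. Because of changes to NumPy, DataFrame objects are now consistently treated as 2D objects, so a list of DataFrame objects is considered 3D and is no longer acceptable for the DataFrame constructor (GH 32289).
Bug in DataFrame when initializing a frame with lists and assigning columns with a nested list for a MultiIndex (GH 32173).
Improved error message for invalid construction from a list when creating a new index (GH 35190).

Strings

Bug in the astype() method when converting "string" dtype data to a nullable integer dtype (GH 32450).
Fixed an issue where taking min or max of a StringArray or a Series with StringDtype would raise (GH 31746).
Bug in Series.str.cat() returning NaN output when other had Index type (GH 33425).
pandas.api.dtypes.is_string_dtype() no longer incorrectly identifies categorical series as string.

Interval

Bug in IntervalArray incorrectly allowing the underlying data to be changed when setting values (GH 32782).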

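Indexing

DataFrame.xs() now raises a TypeError if a level keyword is supplied and the axis is not a MultiIndex. Previously an AttributeError was raised (GH 33610).
Bug in slicing on a DatetimeIndex with a partial timestamp dropping high-resolution indices near the end of a year, quarter, or month (GH 31064).
Bug in PeriodIndex.get_loc() treating higher-resolution strings differently from PeriodIndex.get_value() (GH 31172).
Bug in Series.at() and DataFrame.at() not matching .loc behavior when looking up an integer in a Float64Index (GH 31329).
Bug in PeriodIndex.is_monotonic() incorrectly returning True when containing leading NaT entries (GH 31437).
Bug in DatetimeIndex.get_loc() raising KeyError with the converted integer key instead of the user-passed key (GH 31425).
Bug in Series.xs() incorrectly returning Timestamp instead of datetime64 in some object-dtype cases (GH 31630).
Bug in DataFrame.iat() incorrectly returning Timestamp instead of datetime in some object-dtype cases (GH 32809).
Bug in DataFrame.at() when either columns or index is non-unique (GH 33041).
Bug in Series.loc() and DataFrame.loc() when indexing with an integer key on an object-dtype Index that is not all integers (GH 31905).
Bug in DataFrame.iloc.__setitem__() on a DataFrame with duplicate columns incorrectly setting values for all matching columns (GH 15686, GH 22036).
Bug in DataFrame.loc() and Series.loc() with a DatetimeIndex, TimedeltaIndex, or PeriodIndex incorrectly allowing lookups of non-matching datetime-like dtypes (GH 32650).
Bug in Series.__getitem__() indexing with non-standard scalars, e.g. np.dtype (GH 32684).
Bug in the Index constructor where an unhelpful error message was raised for NumPy scalars (GH 33017).
Bug in DataFrame.lookup() incorrectly raising an AttributeError when frame.index or frame.columns is not unique; this now raises a ValueError with a helpful error message (GH 33041).
Bug in Interval where a Timedelta could not be added to or subtracted from a Timestamp interval (GH 32023).
Bug in DataFrame.copy() not invalidating _item_cache after a copy, which caused post-copy value updates not to be reflected (GH 31784).
Fixed regression in DataFrame.loc() and Series.loc() throwing an error when a datetime64[ns, tz] value is provided (GH 32395).
Bug in Series.__getitem__() with an integer key and a MultiIndex with a leading integer level failing to raise KeyError if the key is not present in the first level (GH 33355).
Bug in DataFrame.iloc() when slicing a single-column DataFrame with ExtensionDtype (e.g. df.iloc[:, :1]) returning an invalid result (GH 32957).
Bug in DatetimeIndex.insert() and TimedeltaIndex.insert() causing the index freq to be lost when setting an element into an empty Series (GH 33573).
Bug in Series.__setitem__() with an IntervalIndex and a list-like key of integers (GH 33473).
Bug in Series.__getitem__() allowing missing labels with np.ndarray, Index, Series indexers but not list; these now all raise KeyError (GH 33646).
Bug in DataFrame.truncate() and Series.truncate() where the index was assumed to be monotonically increasing (GH 33756).
Indexing with a list of strings representing datetimes failed on DatetimeIndex or PeriodIndex (GH 11278).
Bug in Series.at() when used with a MultiIndex would raise an exception on valid inputs (GH 26989).
Bug in DataFrame.loc() with a dictionary of values changing columns with int dtype to float (GH 34573).
Bug in Series.loc() when used with a MultiIndex would raise an IndexingError when accessing a None value (GH 34318).
Bug in DataFrame.reset_index() and Series.reset_index() not preserving data types on an empty DataFrame or Series with a MultiIndex (GH 19602).
Bug in Series and DataFrame indexing with a time key on a DatetimeIndex with NaT entries (GH 35114).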

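Missing

Calling fillna() on an empty Series now correctly returns a shallow copied object. The behavior is now consistent with Index, DataFrame and a non-empty Series (GH 32543).
Bug in Series.replace() when the argument to_replace is of type dict/list and is used on a Series containing <NA> was raising a TypeError. The method now handles this by ignoring <NA> values when doing the comparison for the replacement (GH 32621).
Bug in any() and all() incorrectly returning <NA> for all False or all True values using the nullable Boolean dtype and with skipna=False (GH 33253).
Clarified documentation on interpolate with method=akima. The der parameter must be a scalar or None (GH 33426).
DataFrame.interpolate() now uses the correct axis convention. Previously, interpolating along columns led to interpolation along indices and vice versa. Furthermore, interpolating with the methods pad, ffill, bfill and backfill is identical to using these methods with DataFrame.fillna() (GH 12918, GH 29146).
Bug in DataFrame.interpolate() when called on a DataFrame with column names of string type was throwing a ValueError. The method is now independent of the type of the column names (GH 33956).
Passing NA into a format string using format specs now works. For example, "{:.1f}".format(pd.NA) would previously raise a ValueError, but now returns the string "<NA>" (GH 34740).
Bug in Series.map() not raising on an invalid na_action (GH 32815).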
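
The NA formatting and fillna() entries above can be illustrated with a short sketch, assuming pandas 1.1.0 or later. The variable names are illustrative.

```python
import pandas as pd

# A format spec applied to pd.NA falls back to the "<NA>" representation.
formatted = "{:.1f}".format(pd.NA)

# fillna() on an empty Series returns a new object rather than the original.
empty = pd.Series([], dtype="float64")
filled = empty.fillna(0.0)

print(formatted)
print(filled is empty)
```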

MultiIndex

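DataFrame.swaplevels() now raises a TypeError if the axis is not a MultiIndex. Previously an AttributeError was raised (GH 31126).
Bug in Dataframe.loc() when used with a MultiIndex. The returned values were not in the same order as the given inputs (GH 22797).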

In [79]: df = pd.DataFrame(np.arange(4),
 ....:                  index=[["a", "a", "b", "b"], [1, 2, 1, 2]])
 ....: 

# Rows are now ordered as the requested keys
In [80]: df.loc[(['b', 'a'], [2, 1]), :]
Out[80]: 
 0
b 2  3
 1  2
a 2  1
 1  0

Bug in DataFrame.truncate() was dropping MultiIndex names (GH 34564).
Bug in MultiIndex.intersection() was not guaranteed to preserve order when sort=False (GH 31325).

In [81]: left = pd.MultiIndex.from_arrays([["b", "a"], [2, 1]])

In [82]: right = pd.MultiIndex.from_arrays([["a", "b", "c"], [1, 2, 3]])

# Common elements are now guaranteed to be ordered by the left side
In [83]: left.intersection(right, sort=False)
Out[83]: 
MultiIndex([('b', 2),
 ('a', 1)],
 )

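Bug when joining two MultiIndex objects with different columns without specifying a level. The return-indexers parameter was ignored (GH 34074).

IO

Passing a set as the names argument to pandas.read_csv(), pandas.read_table(), or pandas.read_fwf() will raise ValueError: Names should be an ordered collection. (GH 34946)
Bug in print-out when display.precision is zero (GH 20359).
Bug in read_json() where integer overflow was occurring when the json contains big number strings (GH 30320).
read_csv() now raises a ValueError when the arguments header and prefix are both not None (GH 27394).
Bug in DataFrame.to_json() was raising NotFoundError when path_or_buf was an S3 URI (GH 28375).
Bug in DataFrame.to_parquet() overwriting pyarrow's default for coerce_timestamps; following pyarrow's default allows writing nanosecond timestamps with version="2.0" (GH 31652).
Bug in read_csv() was raising TypeError when sep=None was used in combination with the comment keyword (GH 31396).
Bug in HDFStore that caused it to set the dtype of a datetime64 column to int64 when reading, in Python 3, a DataFrame written in fixed format in Python 2 (GH 31750).
read_sas() now handles dates and datetimes larger than Timestamp.max, returning them as datetime.datetime objects (GH 20927).
Bug in DataFrame.to_json() where Timedelta objects would not be serialized correctly with date_format="iso" (GH 28256).
read_csv() raises a ValueError when the column names passed in parse_dates are missing in the Dataframe (GH 31251).
Bug in read_excel() where a UTF-8 string with a high surrogate would cause a segmentation violation (GH 23809).
Bug in read_csv() was causing a file descriptor leak on an empty file (GH 31488).
Bug in read_csv() was causing a segfault when there were blank lines between the header and data rows (GH 28071).
Bug in read_csv() was raising a misleading exception on a permissions issue (GH 23784).
Bug in read_csv() was raising an IndexError when header=None and there were two extra data columns.
Bug in read_sas() was raising an AttributeError when reading files from Google Cloud Storage (GH 33069).
Bug in DataFrame.to_sql() where an AttributeError was raised when saving an out-of-bounds date (GH 26761).
Bug in read_excel() did not correctly handle multiple embedded spaces in OpenDocument text cells (GH 32207).
Bug in read_json() was raising TypeError when reading a list of Booleans into a Series (GH 31464).
Bug in pandas.io.json.json_normalize() where the location specified by record_path does not point to an array (GH 26284).
pandas.read_hdf() has a more explicit error message when loading an unsupported HDF file (GH 9539).
Bug in read_feather() was raising an ArrowIOError when reading an s3 or http file path (GH 29055).
Bug in to_excel() could not handle the column name render and was raising a KeyError (GH 34331).
Bug in execute() was raising a ProgrammingError for some DB-API drivers when the SQL statement contained the % character and no parameters were present (GH 34211).
Bug in StataReader() which resulted in categorical variables with different dtypes when reading data using an iterator (GH 31544).
HDFStore.keys() now has an optional include parameter that allows the retrieval of all native HDF5 table names (GH 29916).
TypeError exceptions raised by read_csv() and read_table() were showing as parser_f when an unexpected keyword argument was passed (GH 25648).
Bug in read_excel() for ODS files removing 0.0 values (GH 27222).
Bug in ujson.encode() was raising an OverflowError with numbers larger than sys.maxsize (GH 34395).
Bug in HDFStore.append_to_multiple() was raising a ValueError when the min_itemsize parameter is set (GH 11238).
Bug in create_table() now raises an error when the column argument was not specified in data_columns on input (GH 28156).
read_json() can now read a line-delimited json file from a file URL while lines and chunksize are set.
Bug in DataFrame.to_sql() when reading DataFrames with -np.inf entries with MySQL now has a more explicit ValueError (GH 34431).
Bug where capitalized file extensions were not decompressed by read_* functions (GH 35164).
Bug in read_excel() that was raising a TypeError when header=None and index_col is given as a list (GH 31783).
Bug in read_excel() where datetime values are used in the header in a MultiIndex (GH 34748).
read_excel() no longer takes **kwds arguments. This means that passing in the keyword argument chunksize now raises a TypeError (previously raised a NotImplementedError), while passing in the keyword argument encoding now raises a TypeError (GH 34464).
Bug in DataFrame.to_records() was incorrectly losing timezone information in timezone-aware datetime64 columns (GH 32535).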

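Plotting

DataFrame.plot() for line/bar now accepts color by dictionary (GH 8193)
Bug in DataFrame.plot.hist() where weights are not working for multiple columns (GH 33173)
Bug in DataFrame.boxplot() and DataFrame.plot.boxplot() lost color attributes of medianprops, whiskerprops, capprops and boxprops (GH 30346)
Bug in DataFrame.hist() where the order of column arguments was ignored (GH 29235)
Bug in DataFrame.plot.scatter() where, when adding multiple plots with different cmap, colorbars always used the first cmap (GH 33389)
Bug in DataFrame.plot.scatter() was adding a colorbar to the plot even if the argument c was assigned to a column containing color names (GH 34316)
Bug in pandas.plotting.bootstrap_plot() was causing cluttered axes and overlapping labels (GH 34905)
Bug in DataFrame.plot.scatter() caused an error when plotting variable marker sizes (GH 32904)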

GroupBy/resample/rolling

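Using a pandas.api.indexers.BaseIndexer with count, min, max, median, skew, cov, corr will now return correct results for any monotonic pandas.api.indexers.BaseIndexer descendant (GH 32865)
DataFrameGroupby.mean() and SeriesGroupby.mean() (and similarly for median(), std() and var()) now raise a TypeError if a non-accepted keyword argument is passed into it. Previously an UnsupportedFunctionCall was raised (AssertionError if min_count passed into median()) (GH 31485)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() raising ValueError when the by axis is not sorted, has duplicates, and the applied func does not mutate passed in objects (GH 30667)
Bug in DataFrameGroupBy.transform() produces an incorrect result with transformation functions (GH 30918)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() were returning the wrong result when grouping by multiple keys of which some were categorical and others not (GH 32494)
Bug in DataFrameGroupBy.count() and SeriesGroupBy.count() causing a segmentation fault when grouped-by columns contain NaNs (GH 32841)
Bug in DataFrame.groupby() and Series.groupby() produces inconsistent type when aggregating a Boolean Series (GH 32894)
Bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() where a large negative number would be returned when the number of non-null values was below min_count for nullable integer dtypes (GH 32861)
Bug in SeriesGroupBy.quantile() was raising on nullable integers (GH 33136)
Bug in DataFrame.resample() where an AmbiguousTimeError would be raised when the resulting timezone-aware DatetimeIndex had a DST transition at midnight (GH 25758)
Bug in DataFrame.groupby() where a ValueError would be raised when grouping by a categorical column with read-only categories and sort=False (GH 33410)
Bug in DataFrameGroupBy.agg(), SeriesGroupBy.agg(), DataFrameGroupBy.transform(), SeriesGroupBy.transform(), DataFrameGroupBy.resample(), and SeriesGroupBy.resample() where subclasses are not preserved (GH 28330)
Bug in SeriesGroupBy.agg() where any column name was accepted in the named aggregation of SeriesGroupBy previously. The behaviour now allows only str and callables, otherwise a TypeError is raised (GH 34422)
Bug in DataFrame.groupby() lost the name of the Index when one of the agg keys referenced an empty list (GH 32580)
Bug in Rolling.apply() where center=True was ignored when engine='numba' was specified (GH 34784)
Bug in DataFrame.ewm.cov() was throwing AssertionError for MultiIndex inputs (GH 34440)
Bug in core.groupby.DataFrameGroupBy.quantile() raised TypeError for non-numeric types rather than dropping the columns (GH 27892)
Bug in core.groupby.DataFrameGroupBy.transform() when func='nunique' and columns are of type datetime64, the result would also be of type datetime64 instead of int64 (GH 35109)
Bug in DataFrame.groupby() raising an AttributeError when selecting a column and aggregating with as_index=False (GH 35246).
Bug in DataFrameGroupBy.first() and DataFrameGroupBy.last() that would raise an unnecessary ValueError when grouping on multiple Categoricals (GH 34951)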
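
Two of the entries above are easy to check interactively. The following is a minimal sketch, assuming a recent pandas; the frame, the column names and the keyword not_a_real_option are made up for illustration. It shows the TypeError raised for an unexpected keyword in a groupby mean(), and a rolling max() computed through a BaseIndexer descendant (FixedForwardWindowIndexer).

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Illustrative data; the names are placeholders, not from the release notes.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1.0, 3.0, 5.0]})

# An unexpected keyword argument is reported as a plain TypeError.
try:
    df.groupby("key").mean(not_a_real_option=True)
    mean_error = None
except TypeError as exc:
    mean_error = type(exc).__name__

# A forward-looking window of size 2: each row sees itself and the next row.
indexer = FixedForwardWindowIndexer(window_size=2)
s = pd.Series([0, 1, 2, 3, 4])
forward_max = s.rolling(indexer, min_periods=1).max().tolist()

print(mean_error)
print(forward_max)
```

The last window contains only the final element, which is why min_periods=1 is passed.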

Reshaping

Bug affecting all numeric and Boolean reduction methods not returning subclassed data type. (GH 25596)
Bug in DataFrame.pivot_table() when only MultiIndexed columns is set (GH 17038)
DataFrame.unstack() and Series.unstack() can now take tuple names in MultiIndexed data (GH 19966)
Bug in DataFrame.pivot_table() when margin is True and only column is defined (GH 31016)
Fixed incorrect error message in DataFrame.pivot() when columns is set to None. (GH 30924)
Bug in crosstab() when inputs are two Series and have tuple names, the output will keep a dummy MultiIndex as columns. (GH 18321)
DataFrame.pivot() can now take lists for index and columns arguments (GH 21425)
Bug in concat() where the resulting indices are not copied when copy=True (GH 29879)
Bug in SeriesGroupBy.aggregate() was resulting in aggregations being overwritten when they shared the same name (GH 30880)
Bug where Index.astype() would lose the name attribute when converting from Float64Index to Int64Index, or when casting to an ExtensionArray dtype (GH 32013)
Series.append() will now raise a TypeError when passed a DataFrame or a sequence containing DataFrame (GH 31413)
DataFrame.replace() and Series.replace() will raise a TypeError if to_replace is not an expected type. Previously the replace would fail silently (GH 18634)
Bug on inplace operation of a Series that was adding a column to the DataFrame from where it was originally dropped from (using inplace=True) (GH 30484)
Bug in DataFrame.apply() where callback was called with Series parameter even though raw=True requested. (GH 32423)
Bug in DataFrame.pivot_table() losing timezone information when creating a MultiIndex level from a column with timezone-aware dtype (GH 32558)
Bug in concat() where passing a non-dict mapping as objs would raise a TypeError (GH 32863)
DataFrame.agg() now provides a more descriptive SpecificationError message when attempting to aggregate a non-existent column (GH 32755)
Bug in DataFrame.unstack() when MultiIndex columns and MultiIndex rows were used (GH 32624, GH 24729 and GH 28306)
Appending a dictionary to a DataFrame without passing ignore_index=True will raise TypeError: Can only append a dict if ignore_index=True instead of TypeError: Can only append a :class:Series if ignore_index=True or if the :class:Series has a name (GH 30871)
Bug in DataFrame.corrwith(), DataFrame.memory_usage(), DataFrame.dot(), DataFrame.idxmin(), DataFrame.idxmax(), DataFrame.duplicated(), DataFrame.isin(), DataFrame.count(), Series.explode(), Series.asof() and DataFrame.asof() not returning subclassed types. (GH 31331)
Bug in concat() was not allowing for concatenation of DataFrame and Series with duplicate keys (GH 33654)
Bug in cut() raised an error when the argument labels contains duplicates (GH 33141)
Ensure only named functions can be used in eval() (GH 32460)
Bug in Dataframe.aggregate() and Series.aggregate() was causing a recursive loop in some cases (GH 34224)
Fixed bug in melt() where melting MultiIndex columns with col_level > 0 would raise a KeyError on id_vars (GH 34129)
Bug in Series.where() with an empty Series and empty cond having non-bool dtype (GH 34592)
Fixed regression where DataFrame.apply() would raise ValueError for elements with S dtype (GH 34529)
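
As an illustration of two entries in this list, the sketch below passes lists to DataFrame.pivot() and uses duplicate labels in cut(). The data and column names are invented for the example, and ordered=False is the keyword that cut() expects when labels repeat.

```python
import pandas as pd

# Hypothetical long-format data.
df = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021],
    "region": ["n", "n", "s", "s"],
    "kind": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})

# index and columns may both be given as lists of column names.
wide = df.pivot(index=["year", "region"], columns=["kind"], values="value")

# Duplicate labels are allowed when the result is an unordered categorical.
binned = pd.cut([1, 5, 9], bins=[0, 3, 6, 10],
                labels=["low", "mid", "low"], ordered=False)

print(wide)
print(list(binned))
```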

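Sparse

Creating a SparseArray from timezone-aware dtype will issue a warning before dropping timezone information, instead of doing so silently (GH 32501)
Bug in arrays.SparseArray.from_spmatrix() wrongly read scipy sparse matrix (GH 31991)
Bug in Series.sum() with SparseArray raised a TypeError (GH 25777)
Bug where a DataFrame containing an all-sparse SparseArray was filled with NaN when indexed by a list-like (GH 27781, GH 29563)
The repr of SparseDtype now includes the repr of its fill_value attribute. Previously it used fill_value's string representation (GH 34352)
Bug where an empty DataFrame could not be cast to SparseDtype (GH 33113)
Bug in arrays.SparseArray() was returning the incorrect type when indexing a sparse dataframe with an iterable (GH 34526, GH 34540)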
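
A small sketch of the Series.sum() entry, assuming a recent pandas and made-up values: summing a Series backed by a SparseArray returns the ordinary total.

```python
import pandas as pd

# A Series whose values are stored sparsely (zeros are the fill value).
s = pd.Series(pd.arrays.SparseArray([1, 0, 2, 0]))

total = s.sum()
is_sparse = isinstance(s.dtype, pd.SparseDtype)

print(total, is_sparse)
```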

ExtensionArray

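Fixed bug where Series.value_counts() would raise on empty input of Int64 dtype (GH 33317)
Fixed bug in concat() when concatenating DataFrame objects with non-overlapping columns resulting in object-dtype columns rather than preserving the extension dtype (GH 27692, GH 33027)
Fixed bug where StringArray.isna() would return False for NA values when pandas.options.mode.use_inf_as_na was set to True (GH 33655)
Fixed bug where Series construction with an EA dtype and an index but no data or scalar data failed (GH 26469)
Fixed bug that caused Series.__repr__() to crash for extension types whose elements are multidimensional arrays (GH 33770).
Fixed bug where Series.update() would raise a ValueError for ExtensionArray dtypes with missing values (GH 33980)
Fixed bug where StringArray.memory_usage() was not implemented (GH 33963)
Fixed bug where DataFrameGroupBy() would ignore the min_count argument for aggregations on nullable Boolean dtypes (GH 34051)
Fixed bug where the constructor of DataFrame with dtype='string' would fail (GH 27953, GH 33623)
Fixed bug where a DataFrame column set to a scalar extension type was considered an object type rather than the extension type (GH 34832)
Fixed bug in IntegerArray.astype() to correctly copy the mask as well (GH 34931).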
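
The following sketch exercises two of these fixes under the assumption of a recent pandas; the column name is a placeholder. It counts values on an empty Int64 Series and builds a DataFrame directly with dtype="string".

```python
import pandas as pd

# value_counts() on an empty nullable-integer Series returns an empty result.
empty_counts = pd.Series([], dtype="Int64").value_counts()

# Constructing a DataFrame with the string extension dtype.
df = pd.DataFrame({"name": ["a", None]}, dtype="string")
is_string = isinstance(df["name"].dtype, pd.StringDtype)
missing = df["name"].isna().tolist()

print(len(empty_counts), is_string, missing)
```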

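Other

Set operations on an object-dtype Index now always return object-dtype results (GH 31401)
Fixed pandas.testing.assert_series_equal() to correctly raise if the left argument is a different subclass with check_series_type=True (GH 32670).
Getting a missing attribute in a DataFrame.query() or DataFrame.eval() string raises the correct AttributeError (GH 32408)
Fixed bug in pandas.testing.assert_series_equal() where dtypes were checked for Interval and ExtensionArray operands when check_dtype was False (GH 32747)
Bug in DataFrame.__dir__() caused a segfault when using unicode surrogates in a column name (GH 25509)
Bug in DataFrame.equals() and Series.equals() in allowing subclasses to be equal (GH 34402).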
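
A brief sketch of the first entry, with invented values: intersecting two object-dtype Index objects keeps the object dtype even though every element is an integer.

```python
import pandas as pd

left = pd.Index([1, 2], dtype=object)
right = pd.Index([2, 3], dtype=object)

# The result is not re-inferred to an integer dtype.
common = left.intersection(right)

print(common.dtype, list(common))
```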

Datetimelike

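Passing an integer dtype other than int64 to np.array(period_index, dtype=...) will now raise TypeError instead of incorrectly using int64 (GH 32255)
Series.to_timestamp() now raises a TypeError if the axis is not a PeriodIndex. Previously an AttributeError was raised (GH 33327)
Series.to_period() now raises a TypeError if the axis is not a DatetimeIndex. Previously an AttributeError was raised (GH 33327)
Period no longer accepts tuples for the freq argument (GH 34658)
Bug in Timestamp where constructing a Timestamp from ambiguous epoch time and calling the constructor again changed the Timestamp.value() property (GH 24329)
DatetimeArray.searchsorted(), TimedeltaArray.searchsorted(), PeriodArray.searchsorted() not recognizing non-pandas scalars and incorrectly raising ValueError instead of TypeError (GH 30950)
Bug in Timestamp where constructing a Timestamp from a datetime less than 128 nanoseconds before the daylight saving time switch from winter to summer would result in a nonexistent time (GH 31043)
Bug in Period.to_timestamp(), Period.start_time() with microsecond frequency returning a timestamp one nanosecond earlier than the correct time (GH 31475)
Timestamp raised a confusing error message when year, month or day is missing (GH 31200)
Bug in DatetimeIndex constructor incorrectly accepting bool-dtype inputs (GH 32668)
Bug in DatetimeIndex.searchsorted() not accepting a list or Series as its argument (GH 32762)
Bug where PeriodIndex() raised when passed a Series of strings (GH 26109)
Bug in Timestamp arithmetic when adding or subtracting an np.ndarray with timedelta64 dtype (GH 33296)
Bug in DatetimeIndex.to_period() not inferring the frequency when called with no arguments (GH 33358)
Bug in DatetimeIndex.tz_localize() incorrectly retaining freq in some cases where the original freq is no longer valid (GH 30511)
Bug in DatetimeIndex.intersection() losing freq and timezone in some cases (GH 33604)
Bug in DatetimeIndex.get_indexer() where incorrect output would be returned for mixed datetime-like targets (GH 33741)
Bug in DatetimeIndex addition and subtraction with some types of DateOffset objects incorrectly retaining an invalid freq attribute (GH 33779)
Bug in DatetimeIndex where setting the freq attribute on an index could silently change the freq attribute on another index viewing the same data (GH 33552)
DataFrame.min() and DataFrame.max() were not returning consistent results with Series.min() and Series.max() when called on objects initialized with empty pd.to_datetime()
Bug in DatetimeIndex.intersection() and TimedeltaIndex.intersection() with results not having the correct name attribute (GH 33904)
Bug in DatetimeArray.__setitem__(), TimedeltaArray.__setitem__(), PeriodArray.__setitem__() incorrectly allowing values with int64 dtype to be silently cast (GH 33717)
Bug in subtracting TimedeltaIndex from Period incorrectly raising TypeError in some cases where it should succeed and IncompatibleFrequency in some cases where it should raise TypeError (GH 33883)
Bug in constructing a Series or Index from a read-only NumPy array with non-ns resolution which converted to object dtype instead of coercing to datetime64[ns] dtype when within the timestamp bounds (GH 34843).
The freq keyword in Period, date_range(), period_range(), pd.tseries.frequencies.to_offset() no longer allows tuples, pass as string instead (GH 34703)
Bug in DataFrame.append() when appending a Series containing a scalar tz-aware Timestamp to an empty DataFrame resulted in an object column instead of datetime64[ns, tz] dtype (GH 35038)
OutOfBoundsDatetime issues an improved error message when the timestamp is outside of the implementation bounds. (GH 32967)
Bug in AbstractHolidayCalendar.holidays() when no rules were defined (GH 31415)
Bug in Tick comparisons raising TypeError when comparing against timedelta-like objects (GH 34088)
Bug in Tick multiplication raising TypeError when multiplying by a float (GH 34486)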
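
The to_timestamp() and to_period() entries can be observed with a Series that has a plain integer index. This is a minimal sketch with illustrative data; both calls are expected to fail with TypeError because the index is neither a PeriodIndex nor a DatetimeIndex.

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # default RangeIndex, not datetime-like

errors = {}
for name in ("to_timestamp", "to_period"):
    try:
        getattr(s, name)()
    except TypeError as exc:
        # Record the exception type raised by each conversion.
        errors[name] = type(exc).__name__

print(errors)
```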

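Timedelta

Bug in constructing a Timedelta with a high precision integer that would round the Timedelta components (GH 31354)
Bug in dividing np.nan or None by Timedelta incorrectly returning NaT (GH 31869)
Timedelta now understands µs as an identifier for microsecond (GH 32899)
Timedelta string representation now includes nanoseconds, when nanoseconds are non-zero (GH 9309)
Bug in comparing a Timedelta object against an np.ndarray with timedelta64 dtype incorrectly viewing all entries as unequal (GH 33441)
Bug in timedelta_range() that produced an extra point on an edge case (GH 30353, GH 33498)
Bug in DataFrame.resample() that produced an extra point on an edge case (GH 30353, GH 13022, GH 33498)
Bug in DataFrame.resample() that ignored the loffset argument when dealing with timedelta (GH 7687, GH 33498)
Bug in Timedelta and pandas.to_timedelta() that ignored the unit argument for string input (GH 12136)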
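
Two of the Timedelta entries lend themselves to a short check. The sketch below, assuming a recent pandas, parses a string that uses µs as the unit and prints a Timedelta with a non-zero nanosecond component.

```python
import pandas as pd

# "µs" is accepted as an identifier for microseconds.
micro = pd.Timedelta("1µs")

# The string form includes nanoseconds when they are non-zero.
nano_text = str(pd.Timedelta(1, unit="ns"))

print(micro == pd.Timedelta(microseconds=1))
print(nano_text)
```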

Timezones

Bug in to_datetime() with infer_datetime_format=True where timezone names (e.g. UTC) would not be parsed correctly (GH 33133)

Numeric

Bug in DataFrame.floordiv() with axis=0 not treating division-by-zero like Series.floordiv() (GH 31271)
Bug in to_numeric() with string argument "uint64" and errors="coerce" silently fails (GH 32394)
Bug in to_numeric() with downcast="unsigned" fails for empty data (GH 32493)
Bug in DataFrame.mean() with numeric_only=False and either datetime64 dtype or PeriodDtype column incorrectly raising TypeError (GH 32426)
Bug in DataFrame.count() with level="foo" and index level "foo" containing NaNs causes segmentation fault (GH 21824)
Bug in DataFrame.diff() with axis=1 returning incorrect results with mixed dtypes (GH 32995)
Bug in DataFrame.corr() and DataFrame.cov() raising when handling nullable integer columns with pandas.NA (GH 33803)
Bug in arithmetic operations between DataFrame objects with non-overlapping columns with duplicate labels causing an infinite loop (GH 35194)
Bug in DataFrame and Series addition and subtraction between object-dtype objects and datetime64 dtype objects (GH 33824)
Bug in Index.difference() giving incorrect results when comparing a Float64Index and object Index (GH 35217)
Bug in DataFrame reductions (e.g. df.min(), df.max()) with ExtensionArray dtypes (GH 34520, GH 32651)
Series.interpolate() and DataFrame.interpolate() now raise a ValueError if limit_direction is 'forward' or 'both' and method is 'backfill' or 'bfill' or limit_direction is 'backward' or 'both' and method is 'pad' or 'ffill' (GH 34746)
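
To make the DataFrame.diff() entry concrete, here is a small sketch with an integer column next to a float column (both invented for the example). With axis=1 each column is compared with the one to its left, so the first column has nothing to subtract.

```python
import pandas as pd

# Mixed dtypes: "a" is int64, "b" is float64.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 4.0]})

row_diff = df.diff(axis=1)

print(row_diff)
```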

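Conversion

Bug in Series construction from NumPy array with big-endian datetime64 dtype (GH 29684)
Bug in Timedelta construction with large nanoseconds keyword value (GH 32402)
Bug in DataFrame construction where sets would be duplicated rather than raising (GH 32582)
The DataFrame constructor no longer accepts a list of DataFrame objects. Because of changes to NumPy, DataFrame objects are now consistently treated as 2D objects, so a list of DataFrame objects is considered 3D, and no longer acceptable for the DataFrame constructor (GH 32289).
Bug in DataFrame when initiating a frame with lists and assigning columns with nested list for MultiIndex (GH 32173)
Improved error message for invalid construction of list when creating a new index (GH 35190)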
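
The set-construction entry can be sketched as follows, assuming a recent pandas; the column name is a placeholder. Because a set has no defined order, the constructor is expected to refuse it with a TypeError rather than build a column.

```python
import pandas as pd

try:
    pd.DataFrame({"a": {1, 2, 3}})  # a set as column data
    set_error = None
except TypeError as exc:
    set_error = str(exc)

# Converting to a sorted list makes the row order explicit.
ok = pd.DataFrame({"a": sorted({1, 2, 3})})

print(set_error)
print(ok["a"].tolist())
```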

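Strings

Bug in the astype() method when converting "string" dtype data to nullable integer dtype (GH 32450).
Fixed issue where taking min or max of a StringArray or Series with StringDtype type would raise. (GH 31746)
Bug in Series.str.cat() returning NaN output when other had Index type (GH 33425)
pandas.api.dtypes.is_string_dtype() no longer incorrectly identifies categorical series as string.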
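
A minimal sketch of the first entry, with made-up values: a Series of "string" dtype containing digit strings and a missing value is converted to the nullable Int64 dtype, and the missing value stays missing.

```python
import pandas as pd

s = pd.Series(["1", "2", None], dtype="string")

# Convert to the nullable integer dtype; <NA> is preserved.
as_int = s.astype("Int64")

print(as_int)
```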

Interval

Bug in IntervalArray incorrectly allowing the underlying data to be changed when setting values (GH 32782)

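Indexing

DataFrame.xs() now raises a TypeError if a level keyword is supplied and the axis is not a MultiIndex. Previously an AttributeError was raised (GH 33610)
Bug in slicing on a DatetimeIndex with a partial-timestamp dropping high-resolution indices near the end of a year, quarter, or month (GH 31064)
Bug in PeriodIndex.get_loc() treating higher-resolution strings differently from PeriodIndex.get_value() (GH 31172)
Bug in Series.at() and DataFrame.at() not matching .loc behavior when looking up an integer in a Float64Index (GH 31329)
Bug in PeriodIndex.is_monotonic() incorrectly returning True when containing leading NaT entries (GH 31437)
Bug in DatetimeIndex.get_loc() raising KeyError with the converted-integer key instead of the user-passed key (GH 31425)
Bug in Series.xs() incorrectly returning Timestamp instead of datetime64 in some object-dtype cases (GH 31630)
Bug in DataFrame.iat() incorrectly returning Timestamp instead of datetime in some object-dtype cases (GH 32809)
Bug in DataFrame.at() when either columns or index is non-unique (GH 33041)
Bug in Series.loc() and DataFrame.loc() when indexing with an integer key on an object-dtype Index that is not all-integers (GH 31905)
Bug in DataFrame.iloc.__setitem__() on a DataFrame with duplicate columns incorrectly setting values for all matching columns (GH 15686, GH 22036)
Bug in DataFrame.loc() and Series.loc() with a DatetimeIndex, TimedeltaIndex, or PeriodIndex incorrectly allowing lookups of non-matching datetime-like dtypes (GH 32650)
Bug in Series.__getitem__() indexing with non-standard scalars, e.g. np.dtype (GH 32684)
Bug in Index constructor where an unhelpful error message was raised for NumPy scalars (GH 33017)
Bug in DataFrame.lookup() incorrectly raising an AttributeError when frame.index or frame.columns is not unique; this will now raise a ValueError with a helpful error message (GH 33041)
Bug in Interval where a Timedelta could not be added or subtracted from a Timestamp interval (GH 32023)
Bug in DataFrame.copy() not invalidating _item_cache after copy caused post-copy value updates to not be reflected (GH 31784)
Fixed regression in DataFrame.loc() and Series.loc() throwing an error when a datetime64[ns, tz] value is provided (GH 32395)
Bug in Series.__getitem__() with an integer key and a MultiIndex with leading integer level failing to raise KeyError if the key is not present in the first level (GH 33355)
Bug in DataFrame.iloc() when slicing a single column DataFrame with ExtensionDtype (e.g. df.iloc[:, :1]) returning an invalid result (GH 32957)
Bug in DatetimeIndex.insert() and TimedeltaIndex.insert() causing index freq to be lost when setting an element into an empty Series (GH 33573)
Bug in Series.__setitem__() with an IntervalIndex and a list-like key of integers (GH 33473)
Bug in Series.__getitem__() allowing missing labels with np.ndarray, Index, Series indexers but not list; these now all raise KeyError (GH 33646)
Bug in DataFrame.truncate() and Series.truncate() where the index was assumed to be monotone increasing (GH 33756)
Indexing with a list of strings representing datetimes failed on DatetimeIndex or PeriodIndex (GH 11278)
Bug in Series.at() when used with a MultiIndex would raise an exception on valid inputs (GH 26989)
Bug in DataFrame.loc() with a dictionary of values changes columns with dtype of int to float (GH 34573)
Bug in Series.loc() when used with a MultiIndex would raise an IndexingError when accessing a None value (GH 34318)
Bug in DataFrame.reset_index() and Series.reset_index() would not preserve data types on an empty DataFrame or Series with a MultiIndex (GH 19602)
Bug in Series and DataFrame indexing with a time key on a DatetimeIndex with NaT entries (GH 35114)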
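
The first entry in this list can be checked with a frame that has a flat index. In this sketch (labels and values are illustrative) the level keyword is passed to xs() even though there is no MultiIndex, which is expected to raise TypeError.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])  # flat index

try:
    df.xs("x", level=0)
    xs_error = None
except TypeError as exc:
    xs_error = type(exc).__name__

# Without level, the same lookup works on a flat index.
row = df.xs("x")

print(xs_error, row["a"])
```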

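Missing

Calling fillna() on an empty Series now correctly returns a shallow copied object. The behaviour is now consistent with Index, DataFrame and a non-empty Series (GH 32543).
Bug in Series.replace() when argument to_replace is of type dict/list and is used on a Series containing <NA> was raising a TypeError. The method now handles this by ignoring <NA> values when doing the comparison for the replacement (GH 32621)
Bug in any() and all() incorrectly returning <NA> for all False or all True values using the nullable Boolean dtype and with skipna=False (GH 33253)
Clarified documentation on interpolate with method=akima. The der parameter must be scalar or None (GH 33426)
DataFrame.interpolate() uses the correct axis convention now. Previously interpolating along columns led to interpolation along indices and vice versa. Furthermore, interpolating with methods pad, ffill, bfill and backfill is identical to using these methods with DataFrame.fillna() (GH 12918, GH 29146)
Bug in DataFrame.interpolate() when called on a DataFrame with column names of string type was throwing a ValueError. The method is now independent of the type of the column names (GH 33956)
Passing NA into a format string using format specs will now work. For example "{:.1f}".format(pd.NA) would previously raise a ValueError, but will now return the string "<NA>" (GH 34740)
Bug in Series.map() not raising on invalid na_action (GH 32815)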
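
The format-string entry can be reproduced directly. The sketch below formats pd.NA with a float spec, as in the entry, and with a width spec as an additional illustration.

```python
import pandas as pd

# A float spec cannot apply to NA, so the NA marker itself is returned.
as_float = "{:.1f}".format(pd.NA)

# A spec that is valid for strings (right-align to width 6) is applied to "<NA>".
padded = "{:>6}".format(pd.NA)

print(repr(as_float), repr(padded))
```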

MultiIndex

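DataFrame.swaplevel() now raises a TypeError if the axis is not a MultiIndex. Previously an AttributeError was raised (GH 31126)
Bug in Dataframe.loc() when used with a MultiIndex. The returned values were not in the same order as the given inputs (GH 22797)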

In [79]: df = pd.DataFrame(np.arange(4),
   ....:                   index=[["a", "a", "b", "b"], [1, 2, 1, 2]])
   ....: 

# Rows are now ordered as the requested keys
In [80]: df.loc[(['b', 'a'], [2, 1]), :]
Out[80]: 
     0
b 2  3
  1  2
a 2  1
  1  0

Bug in MultiIndex.intersection() was not guaranteed to preserve order when sort=False. (GH 31325)
Bug in DataFrame.truncate() was dropping MultiIndex names. (GH 34564)

In [81]: left = pd.MultiIndex.from_arrays([["b", "a"], [2, 1]])

In [82]: right = pd.MultiIndex.from_arrays([["a", "b", "c"], [1, 2, 3]])

# Common elements are now guaranteed to be ordered by the left side
In [83]: left.intersection(right, sort=False)
Out[83]: 
MultiIndex([('b', 2),
            ('a', 1)],
           )

Bug when joining two MultiIndex without specifying level with different columns. Return-indexers parameter was ignored. (GH 34074)

IO

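Passing a set as the names argument to pandas.read_csv(), pandas.read_table(), or pandas.read_fwf() will raise ValueError: Names should be an ordered collection. (GH 34946)
Bug in print-out when display.precision is zero. (GH 20359)
Bug in read_json() where integer overflow was occurring when json contains big number strings. (GH 30320)
read_csv() will now raise a ValueError when the arguments header and prefix both are not None. (GH 27394)
Bug in DataFrame.to_json() was raising NotFoundError when path_or_buf was an S3 URI (GH 28375)
Bug in DataFrame.to_parquet() overwriting pyarrow's default for coerce_timestamps; following pyarrow's default allows writing nanosecond timestamps with version="2.0" (GH 31652).
Bug in read_csv() was raising TypeError when sep=None was used in combination with the comment keyword (GH 31396)
Bug in HDFStore that caused it to set the dtype of a datetime64 column to int64 when reading a DataFrame in Python 3 from fixed format written in Python 2 (GH 31750)
read_sas() now handles dates and datetimes larger than Timestamp.max, returning them as datetime.datetime objects (GH 20927)
Bug in DataFrame.to_json() where Timedelta objects would not be serialized correctly with date_format="iso" (GH 28256)
read_csv() will raise a ValueError when the column names passed in parse_dates are missing in the Dataframe (GH 31251)
Bug in read_excel() where a UTF-8 string with a high surrogate would cause a segmentation violation (GH 23809)
Bug in read_csv() was causing a file descriptor leak on an empty file (GH 31488)
Bug in read_csv() was causing a segfault when there were blank lines between the header and data rows (GH 28071)
Bug in read_csv() was raising a misleading exception on a permissions issue (GH 23784)
Bug in read_csv() was raising an IndexError when header=None and there were two extra data columns
Bug in read_sas() was raising an AttributeError when reading files from Google Cloud Storage (GH 33069)
Bug in DataFrame.to_sql() where an AttributeError was raised when saving an out of bounds date (GH 26761)
Bug in read_excel() did not correctly handle multiple embedded spaces in OpenDocument text cells. (GH 32207)
Bug in read_json() was raising TypeError when reading a list of Booleans into a Series (GH 31464)
Bug in pandas.io.json.json_normalize() where the location specified by record_path does not point to an array. (GH 26284)
pandas.read_hdf() has a more explicit error message when loading an unsupported HDF file (GH 9539)
Bug in read_feather() was raising an ArrowIOError when reading an s3 or http file path (GH 29055)
Bug in to_excel() could not handle the column name render and was raising a KeyError (GH 34331)
Bug in execute() was raising a ProgrammingError for some DB-API drivers when the SQL statement contained the % character and no parameters were present (GH 34211)
Bug in StataReader() which resulted in categorical variables with different dtypes when reading data using an iterator. (GH 31544)
HDFStore.keys() now has an optional include parameter that allows the retrieval of all native HDF5 table names (GH 29916)
TypeError exceptions raised by read_csv() and read_table() were showing as parser_f when an unexpected keyword argument was passed (GH 25648)
Bug in read_excel() for ODS files removes 0.0 values (GH 27222)
Bug in ujson.encode() was raising an OverflowError with numbers larger than sys.maxsize (GH 34395)
Bug in HDFStore.append_to_multiple() was raising a ValueError when the min_itemsize parameter is set (GH 11238)
Bug in create_table() now raises an error when the column argument was not specified in data_columns on input (GH 28156)
read_json() can now read a line-delimited json file from a file url while lines and chunksize are set.
Bug in DataFrame.to_sql() when reading DataFrames with -np.inf entries with MySQL now has a more explicit ValueError (GH 34431)
Bug where capitalised file extensions were not decompressed by read_* functions (GH 35164)
Bug in read_excel() that was raising a TypeError when header=None and index_col is given as a list (GH 31783)
Bug in read_excel() where datetime values are used in the header in a MultiIndex (GH 34748)
read_excel() no longer takes **kwds arguments. This means that passing in the keyword argument chunksize now raises a TypeError (previously raised a NotImplementedError), while passing in the keyword argument encoding now raises a TypeError (GH 34464)
Bug in DataFrame.to_records() was incorrectly losing timezone information in timezone-aware datetime64 columns (GH 32535)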
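
The first IO entry is simple to demonstrate with an in-memory file. This sketch uses invented CSV text and column names: passing a set as names is rejected because a set has no order, while a list of the same names is accepted.

```python
import io
import pandas as pd

csv_text = "1,2\n3,4\n"  # illustrative header-less data

try:
    pd.read_csv(io.StringIO(csv_text), names={"a", "b"})
    names_error = None
except ValueError as exc:
    names_error = str(exc)

# An ordered collection such as a list works as expected.
ordered = pd.read_csv(io.StringIO(csv_text), names=["a", "b"])

print(names_error)
print(ordered)
```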

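Plotting

DataFrame.plot() for line/bar now accepts color by dictionary (GH 8193).
Bug in DataFrame.plot.hist() where weights are not working for multiple columns (GH 33173)
Bug in DataFrame.boxplot() and DataFrame.plot.boxplot() lost color attributes of medianprops, whiskerprops, capprops and boxprops (GH 30346)
Bug in DataFrame.hist() where the order of column arguments was ignored (GH 29235)
Bug in DataFrame.plot.scatter() where, when adding multiple plots with different cmap, colorbars always used the first cmap (GH 33389)
Bug in DataFrame.plot.scatter() was adding a colorbar to the plot even if the argument c was assigned to a column containing color names (GH 34316)
Bug in pandas.plotting.bootstrap_plot() was causing cluttered axes and overlapping labels (GH 34905)
Bug in DataFrame.plot.scatter() caused an error when plotting variable marker sizes (GH 32904)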

GroupBy/resample/rolling

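Using a pandas.api.indexers.BaseIndexer with count, min, max, median, skew, cov, corr will now return correct results for any monotonic pandas.api.indexers.BaseIndexer descendant (GH 32865)
DataFrameGroupby.mean() and SeriesGroupby.mean() (and similarly for median(), std() and var()) now raise a TypeError if a non-accepted keyword argument is passed into it. Previously an UnsupportedFunctionCall was raised (AssertionError if min_count passed into median()) (GH 31485)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() raising ValueError when the by axis is not sorted, has duplicates, and the applied func does not mutate passed in objects (GH 30667)
Bug in DataFrameGroupBy.transform() produces an incorrect result with transformation functions (GH 30918)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() were returning the wrong result when grouping by multiple keys of which some were categorical and others not (GH 32494)
Bug in DataFrameGroupBy.count() and SeriesGroupBy.count() causing a segmentation fault when grouped-by columns contain NaNs (GH 32841)
Bug in DataFrame.groupby() and Series.groupby() produces inconsistent type when aggregating a Boolean Series (GH 32894)
Bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() where a large negative number would be returned when the number of non-null values was below min_count for nullable integer dtypes (GH 32861)
Bug in SeriesGroupBy.quantile() was raising on nullable integers (GH 33136)
Bug in DataFrame.resample() where an AmbiguousTimeError would be raised when the resulting timezone-aware DatetimeIndex had a DST transition at midnight (GH 25758)
Bug in DataFrame.groupby() where a ValueError would be raised when grouping by a categorical column with read-only categories and sort=False (GH 33410)
Bug in DataFrameGroupBy.agg(), SeriesGroupBy.agg(), DataFrameGroupBy.transform(), SeriesGroupBy.transform(), DataFrameGroupBy.resample(), and SeriesGroupBy.resample() where subclasses are not preserved (GH 28330)
Bug in SeriesGroupBy.agg() where any column name was accepted in the named aggregation of SeriesGroupBy previously. The behaviour now allows only str and callables, otherwise a TypeError is raised. (GH 34422)
Bug in DataFrame.groupby() lost the name of the Index when one of the agg keys referenced an empty list. (GH 32580)
Bug in Rolling.apply() where center=True was ignored when engine='numba' was specified. (GH 34784)
Bug in DataFrame.ewm.cov() was throwing AssertionError for MultiIndex inputs. (GH 34440)
Bug in core.groupby.DataFrameGroupBy.quantile() raised TypeError for non-numeric types rather than dropping the columns. (GH 27892)
Bug in core.groupby.DataFrameGroupBy.transform() when func='nunique' and columns are of type datetime64, the result would also be of type datetime64 instead of int64. (GH 35109)
Bug in DataFrame.groupby() raising an AttributeError when selecting a column and aggregating with as_index=False. (GH 35246).
Bug in DataFrameGroupBy.first() and DataFrameGroupBy.last() that would raise an unnecessary ValueError when grouping on multiple Categoricals. (GH 34951)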

Reshaping

Bug affecting all numeric and Boolean reduction methods not returning subclassed data type. (GH 25596)
Bug in DataFrame.pivot_table() when only MultiIndexed columns is set. (GH 17038)
DataFrame.unstack() and Series.unstack() can now take tuple names in MultiIndexed data (GH 19966)
Bug in DataFrame.pivot_table() when margin is True and only column is defined (GH 31016)
Fixed incorrect error message in DataFrame.pivot() when columns is set to None. (GH 30924)
Bug in crosstab() when inputs are two Series and have tuple names, the output will keep a dummy MultiIndex as columns (GH 18321)
DataFrame.pivot() can now take lists for index and columns arguments (GH 21425)
Bug in concat() where the resulting indices are not copied when copy=True (GH 29879)
Bug in SeriesGroupBy.aggregate() was resulting in aggregations being overwritten when they shared the same name (GH 30880)
Bug where Index.astype() would lose the name attribute when converting from Float64Index to Int64Index, or when casting to an ExtensionArray dtype (GH 32013)
Series.append() will now raise a TypeError when passed a DataFrame or a sequence containing DataFrame (GH 31413)
DataFrame.replace() and Series.replace() will raise a TypeError if to_replace is not an expected type. Previously the replace would fail silently (GH 18634)
Bug on inplace operation of a Series that was adding a column to the DataFrame from where it was originally dropped from (using inplace=True) (GH 30484)
Bug in DataFrame.apply() where callback was called with Series parameter even though raw=True requested. (GH 32423)
Bug in DataFrame.pivot_table() losing timezone information when creating a MultiIndex level from a column with timezone-aware dtype (GH 32558)
Bug in concat() where passing a non-dict mapping as objs would raise a TypeError (GH 32863)
DataFrame.agg() now provides a more descriptive SpecificationError message when attempting to aggregate a non-existent column (GH 32755)
Bug in DataFrame.unstack() when MultiIndex columns and MultiIndex rows were used (GH 32624, GH 24729 and GH 28306)
Appending a dictionary to a DataFrame without passing ignore_index=True will raise TypeError: Can only append a dict if ignore_index=True instead of TypeError: Can only append a :class:Series if ignore_index=True or if the :class:Series has a name (GH 30871)
Bug in DataFrame.corrwith(), DataFrame.memory_usage(), DataFrame.dot(), DataFrame.idxmin(), DataFrame.idxmax(), DataFrame.duplicated(), DataFrame.isin(), DataFrame.count(), Series.explode(), Series.asof() and DataFrame.asof() not returning subclassed types. (GH 31331)
Bug in concat() was not allowing for concatenation of DataFrame and Series with duplicate keys. (GH 33654)
Bug in cut() raised an error when the argument labels contains duplicates. (GH 33141)
Ensure only named functions can be used in eval(). (GH 32460)
Bug in Dataframe.aggregate() and Series.aggregate() was causing a recursive loop in some cases. (GH 34224)
Fixed bug in melt() where melting MultiIndex columns with col_level > 0 would raise a KeyError on id_vars. (GH 34129)
Bug in Series.where() with an empty Series and empty cond having non-bool dtype (GH 34592)
Fixed regression where DataFrame.apply() would raise ValueError for elements with S dtype (GH 34529)

Sparse

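Creating a SparseArray from timezone-aware dtype will issue a warning before dropping timezone information, instead of doing so silently (GH 32501)
Bug in arrays.SparseArray.from_spmatrix() wrongly read scipy sparse matrix (GH 31991)
Bug in Series.sum() with SparseArray raised a TypeError (GH 25777)
Bug where a DataFrame containing an all-sparse SparseArray was filled with NaN when indexed by a list-like (GH 27781, GH 29563)
The repr of SparseDtype now includes the repr of its fill_value attribute. Previously it used fill_value's string representation (GH 34352)
Bug where an empty DataFrame could not be cast to SparseDtype (GH 33113)
Bug in arrays.SparseArray() was returning the incorrect type when indexing a sparse dataframe with an iterable (GH 34526, GH 34540)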

ExtensionArray

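Fixed bug where Series.value_counts() would raise on empty input of Int64 dtype (GH 33317)
Fixed bug in concat() when concatenating DataFrame objects with non-overlapping columns resulting in object-dtype columns rather than preserving the extension dtype (GH 27692, GH 33027)
Fixed bug where StringArray.isna() would return False for NA values when pandas.options.mode.use_inf_as_na was set to True (GH 33655)
Fixed bug where Series construction with an EA dtype and an index but no data or scalar data failed (GH 26469)
Fixed bug that caused Series.__repr__() to crash for extension types whose elements are multidimensional arrays (GH 33770).
Fixed bug where Series.update() would raise a ValueError for ExtensionArray dtypes with missing values (GH 33980)
Fixed bug where StringArray.memory_usage() was not implemented (GH 33963)
Fixed bug where DataFrameGroupBy() would ignore the min_count argument for aggregations on nullable Boolean dtypes (GH 34051)
Fixed bug where the constructor of DataFrame with dtype='string' would fail (GH 27953, GH 33623)
Fixed bug where a DataFrame column set to a scalar extension type was considered an object type rather than the extension type (GH 34832)
Fixed bug in IntegerArray.astype() to correctly copy the mask as well (GH 34931).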

Other

Set operations on an object-dtype Index now always return object-dtype results (GH 31401)
Fixed pandas.testing.assert_series_equal() to correctly raise if the left argument is a different subclass with check_series_type=True (GH 32670).
Getting a missing attribute in a DataFrame.query() or DataFrame.eval() string raises the correct AttributeError (GH 32408)
Fixed bug in pandas.testing.assert_series_equal() where dtypes were checked for Interval and ExtensionArray operands when check_dtype was False (GH 32747)
Bug in DataFrame.__dir__() caused a segfault when using unicode surrogates in a column name (GH 25509)
Bug in DataFrame.equals() and Series.equals() in allowing subclasses to be equal (GH 34402).