[Python Bedside Book] python pandas.DataFrame: An Authoritative Guide to Parameters, Attributes, and Methods

BigDataMLApplication

Last modified 2023-12-27 18:21:03

First published 2023-10-25 23:52:24

Article link: https://blog.csdn.net/wang2leee/article/details/134046213

Contents

python pandas.DataFrame: An Authoritative Guide to Parameters, Attributes, and Methods
Parameters:
See also:
Notes:
Examples:
Attributes:
Methods:
Reference links

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

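Two-dimensional, size-mutable, potentially heterogeneous tabular data.

The data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, and it is the primary pandas data structure.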
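The label alignment described above can be illustrated with a minimal sketch. The frames `left` and `right` and their values are made up for illustration; they are not part of the original reference.

```python
import pandas as pd

# Two frames whose row labels only partly overlap (illustrative data).
left = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"x": [10, 20, 30]}, index=["b", "c", "d"])

# Addition matches rows by label, not by position; labels present
# on only one side produce NaN in the result.
total = left + right
print(total)

# Dict-like access: selecting a column by its label returns a Series.
column = total["x"]
print(type(column).__name__)
```

Only the labels "b" and "c" exist on both sides, so only those two rows hold a numeric sum.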

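Parameters:

data: ndarray (structured or homogeneous), Iterable, dict, or DataFrame
- If data is a dict, column order follows insertion order.
- If a dict contains Series that have an index defined, it is aligned by that index. Alignment also happens when data is itself a Series or DataFrame.
- If data is a list of dicts, column order follows insertion order.
index: Index or array-like
- Index to use for the resulting frame. Defaults to RangeIndex if the input data carries no indexing information and no index is provided.
columns: Index or array-like
- Column labels to use for the resulting frame when the data has none, defaulting to RangeIndex(0, 1, 2, ..., n). If the data already contains column labels, column selection is performed instead.
dtype: dtype, default None
- Data type to force. Only a single dtype is allowed. If None, the dtype is inferred.
copy: bool or None, default None
- Copy data from inputs. For dict data, the default of None behaves like copy=True. For DataFrame or 2D ndarray input, the default of None behaves like copy=False. If data is a dict containing one or more Series (possibly of different dtypes), copy=False ensures that these inputs are not copied.

Changed in version 1.3.0.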
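As a rough sketch of how `index`, `columns`, and `dtype` interact with the input data, the following uses a small made-up dict named `records`; the variable names are illustrative assumptions only.

```python
import pandas as pd

records = {"a": [1, 2], "b": [3, 4]}  # illustrative input

# No index is given and the dict carries none: rows get a default RangeIndex.
default_index = pd.DataFrame(records)
print(default_index.index)

# The dict already has column labels, so `columns` selects (and orders)
# a subset of them instead of renaming anything.
only_b = pd.DataFrame(records, columns=["b"])
print(only_b)

# A plain nested list has no labels, so `index` and `columns` name the axes.
labeled = pd.DataFrame([[1, 2], [3, 4]], index=["r1", "r2"], columns=["a", "b"])
print(labeled)

# `dtype` forces one dtype on every column.
as_float = pd.DataFrame(records, dtype="float64")
print(as_float.dtypes)
```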

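See also:

DataFrame.from_records
- Constructor from tuples, also record arrays.
DataFrame.from_dict
- From dicts of Series, arrays, or dicts.
read_csv
- Read a comma-separated values (csv) file into a DataFrame.
read_table
- Read a general delimited file into a DataFrame.
read_clipboard
- Read text from the clipboard into a DataFrame.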
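A short sketch of three of these alternative constructors follows. The sample rows and the in-memory CSV text are invented for illustration; `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# from_records: each tuple becomes one row.
rows = [(1, "x"), (2, "y")]
by_rows = pd.DataFrame.from_records(rows, columns=["num", "letter"])
print(by_rows)

# from_dict with orient="index": each dict key becomes a row label.
by_index = pd.DataFrame.from_dict(
    {"r1": [1, 2], "r2": [3, 4]}, orient="index", columns=["a", "b"]
)
print(by_index)

# read_csv accepts any file-like object, here an in-memory text buffer.
csv_text = "a,b\n1,2\n3,4\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv)
```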

Notes:

Please refer to the User Guide for more information.

Examples:

Constructing a DataFrame from a dictionary:

import numpy as np
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df
   col1  col2
0     1     3
1     2     4

Notice that the inferred dtype is int64.

df.dtypes
col1    int64
col2    int64
dtype: object

To enforce a single dtype:

df = pd.DataFrame(data=d, dtype=np.int8)
df.dtypes
col1    int8
col2    int8
dtype: object

Constructing a DataFrame from a dictionary that includes a Series:

d = {'col1': [0, 1, 2, 3], 'col2': pd.Series([2, 3], index=[2, 3])}
pd.DataFrame(data=d, index=[0, 1, 2, 3])
   col1  col2
0     0   NaN
1     1   NaN
2     2   2.0
3     3   3.0

Constructing a DataFrame from a numpy ndarray:

df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df2
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

Constructing a DataFrame from a numpy ndarray that has labeled columns:

data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
                dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
df3 = pd.DataFrame(data, columns=['c', 'a'])

df3
   c  a
0  3  1
1  6  4
2  9  7

Constructing a DataFrame from a dataclass:

from dataclasses import make_dataclass
Point = make_dataclass("Point", [("x", int), ("y", int)])
pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
   x  y
0  0  0
1  0  3
2  2  3

Constructing a DataFrame from a Series/DataFrame:

ser = pd.Series([1, 2, 3], index=["a", "b", "c"])
df = pd.DataFrame(data=ser, index=["a", "c"])
df
   0
a  1
c  3

df1 = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["x"])
df2 = pd.DataFrame(data=df1, index=["a", "c"])
df2
   x
a  1
c  3

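Attributes:

Attribute	Description
`T`	The transpose of the DataFrame.
`at`	Access a single value for a row/column label pair.
`attrs`	Dictionary of global attributes of this dataset.
`axes`	Return a list representing the axes of the DataFrame.
`columns`	The column labels of the DataFrame.
`dtypes`	Return the dtypes in the DataFrame.
`empty`	Indicator of whether the Series/DataFrame is empty.
`flags`	Get the properties associated with this pandas object.
`iat`	Access a single value for a row/column pair by integer position.
`iloc`	Purely integer-location based indexing for selection by position.
`index`	The index (row labels) of the DataFrame.
`loc`	Access a group of rows and columns by label(s) or a boolean array.
`ndim`	Return an int representing the number of axes / array dimensions.
`shape`	Return a tuple representing the dimensionality of the DataFrame.
`size`	Return an int representing the number of elements in this object.
`style`	Return a Styler object.
`values`	Return a NumPy representation of the DataFrame.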
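To make a few of these attributes concrete, here is a minimal sketch on a small made-up frame; the data and variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])

print(df.shape)   # (number of rows, number of columns)
print(df.ndim)    # number of axes
print(df.size)    # rows * columns
print(df.empty)   # whether the frame has no elements
print(df.dtypes)  # one dtype per column

# Label-based versus position-based access to a single value.
by_label = df.at["b", "y"]
by_position = df.iat[1, 1]

# loc slices by label (end label included); iloc slices by position
# (end position excluded).
label_slice = df.loc["a":"b", ["x"]]
position_slice = df.iloc[0:2, 0:1]

# T swaps the row and column labels.
transposed = df.T
print(transposed)
```

`at` and `iat` are intended for scalar lookups, while `loc` and `iloc` also handle slices and lists of labels or positions.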

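Methods:

Method	Description
`abs()`	Return a Series/DataFrame with the absolute numeric value of each element.
`add(other[, axis, level, fill_value])`	Element-wise addition of the DataFrame and other (binary operator add).
`add_prefix(prefix[, axis])`	Prefix labels with the string prefix.
`add_suffix(suffix[, axis])`	Suffix labels with the string suffix.
`agg([func, axis])`	Aggregate using one or more operations over the specified axis.
`aggregate([func, axis])`	Aggregate using one or more operations over the specified axis.
`align(other[, join, axis, level, copy, ...])`	Align two objects on their axes with the specified join method.
`all([axis, bool_only, skipna])`	Return whether all elements are True, potentially over an axis.
`any(*[, axis, bool_only, skipna])`	Return whether any element is True, potentially over an axis.
`apply(func[, axis, raw, result_type, args, ...])`	Apply a function along an axis of the DataFrame.
`applymap(func[, na_action])`	(Deprecated) Apply a function to a DataFrame element-wise.
`asfreq(freq[, method, how, normalize, ...])`	Convert a time series to the specified frequency.
`asof(where[, subset])`	Return the last row(s) without any NaNs before where.
`assign(**kwargs)`	Assign new columns to a DataFrame.
`astype(dtype[, copy, errors])`	Cast a pandas object to the specified dtype.
`at_time(time[, asof, axis])`	Select values at a particular time of day (e.g., 9:30 AM).
`backfill(*[, axis, inplace, limit, downcast])`	(Deprecated) Fill NA/NaN values by using the next valid observation.
`between_time(start_time, end_time[, ...])`	Select values between particular times of the day (e.g., 9:00-9:30 AM).
`bfill(*[, axis, inplace, limit, downcast])`	Fill NA/NaN values by using the next valid observation.
`bool()`	(Deprecated) Return the bool of a single-element Series or DataFrame.
`boxplot([column, by, ax, fontsize, rot, ...])`	Make a box plot from DataFrame columns.
`clip([lower, upper, axis, inplace])`	Trim values at the input threshold(s).
`combine(other, func[, fill_value, overwrite])`	Perform a column-wise combine with another DataFrame.
`combine_first(other)`	Update null elements with the value in the same location in other.
`compare(other[, align_axis, keep_shape, ...])`	Compare to another DataFrame and show the differences.
`convert_dtypes([infer_objects, ...])`	Convert columns to the best possible dtypes, using dtypes that support pd.NA.
`copy([deep])`	Make a copy of this object's indices and data.
`corr([method, min_periods, numeric_only])`	Compute pairwise correlation of columns, excluding NA/null values.
`corrwith(other[, axis, drop, method, ...])`	Compute pairwise correlation.
`count([axis, numeric_only])`	Count non-NA cells for each column or row.
`cov([min_periods, ddof, numeric_only])`	Compute pairwise covariance of columns, excluding NA/null values.
`cummax([axis, skipna])`	Return the cumulative maximum over a DataFrame or Series axis.
`cummin([axis, skipna])`	Return the cumulative minimum over a DataFrame or Series axis.
`cumprod([axis, skipna])`	Return the cumulative product over a DataFrame or Series axis.
`cumsum([axis, skipna])`	Return the cumulative sum over a DataFrame or Series axis.
`describe([percentiles, include, exclude])`	Generate descriptive statistics.
`diff([periods, axis])`	First discrete difference of elements.
`div(other[, axis, level, fill_value])`	Element-wise floating division of the DataFrame and other (binary operator truediv).
`divide(other[, axis, level, fill_value])`	Element-wise floating division of the DataFrame and other (binary operator truediv).
`dot(other)`	Compute the matrix multiplication between the DataFrame and other.
`drop([labels, axis, index, columns, level, ...])`	Drop the specified labels from rows or columns.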
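`drop_duplicates([subset, keep, inplace, ...])`	Return a DataFrame with duplicate rows removed.
`droplevel(level[, axis])`	Return a Series/DataFrame with the requested index/column level(s) removed.
`dropna(*[, axis, how, thresh, subset, ...])`	Remove missing values.
`duplicated([subset, keep])`	Return a boolean Series denoting duplicate rows.
`eq(other[, axis, level])`	Element-wise "equal to" comparison of the DataFrame and other (binary operator eq).
`equals(other)`	Test whether two objects contain the same elements.
`eval(expr, *[, inplace])`	Evaluate a string describing operations on DataFrame columns.
`ewm([com, span, halflife, alpha, ...])`	Provide exponentially weighted (EW) calculations.
`expanding([min_periods, axis, method])`	Provide expanding window calculations.
`explode(column[, ignore_index])`	Transform each element of a list-like to a row, replicating index values.
`ffill(*[, axis, inplace, limit, downcast])`	Fill NA/NaN values by propagating the last valid observation forward to the next valid one.
`fillna([value, method, axis, inplace, ...])`	Fill NA/NaN values using the specified method.
`filter([items, like, regex, axis])`	Subset the DataFrame rows or columns according to the specified index labels.
`first(offset)`	Select the initial periods of time series data based on a date offset.
`first_valid_index()`	Return the index of the first non-NA value, or None if no non-NA value is found.
`floordiv(other[, axis, level, fill_value])`	Element-wise integer division of the DataFrame and other (binary operator floordiv).
`from_dict(data[, orient, dtype, columns])`	Construct a DataFrame from a dict of array-likes or dicts.
`from_records(data[, index, exclude, ...])`	Convert a structured or record ndarray to a DataFrame.
`ge(other[, axis, level])`	Element-wise "greater than or equal to" comparison of the DataFrame and other (binary operator ge).
`get(key[, default])`	Get the item from the object for the given key (e.g., a DataFrame column).
`groupby([by, axis, level, as_index, sort, ...])`	Group the DataFrame using a mapper or by a Series of columns.
`gt(other[, axis, level])`	Element-wise "greater than" comparison of the DataFrame and other (binary operator gt).
`head([n])`	Return the first n rows.
`hist([column, by, grid, xlabelsize, xrot, ...])`	Make a histogram of the DataFrame's columns.
`idxmax([axis, skipna, numeric_only])`	Return the index of the first occurrence of the maximum over the requested axis.
`idxmin([axis, skipna, numeric_only])`	Return the index of the first occurrence of the minimum over the requested axis.
`infer_objects([copy])`	Attempt to infer better dtypes for object columns.
`info([verbose, buf, max_cols, memory_usage, ...])`	Print a concise summary of a DataFrame.
`insert(loc, column, value[, allow_duplicates])`	Insert a column into the DataFrame at the specified location.
`interpolate([method, axis, limit, inplace, ...])`	Fill NaN values using an interpolation method.
`isetitem(loc, value)`	Set the given value in the column at position loc.
`isin(values)`	Check whether each element in the DataFrame is contained in values.
`isna()`	Detect missing values.
`isnull()`	DataFrame.isnull is an alias for DataFrame.isna.
`items()`	Iterate over (column name, Series) pairs.
`iterrows()`	Iterate over DataFrame rows as (index, Series) pairs.
`itertuples([index, name])`	Iterate over DataFrame rows as namedtuples.
`join(other[, on, how, lsuffix, rsuffix, ...])`	Join the columns of another DataFrame.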
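`keys()`	Get the 'info axis' (see Indexing); returns an Index object.
`kurt([axis, skipna, numeric_only])`	Return the unbiased kurtosis over the requested axis.
`kurtosis([axis, skipna, numeric_only])`	Return the unbiased kurtosis over the requested axis.
`last(offset)`	Select the final periods of time series data based on a date offset.
`last_valid_index()`	Return the index of the last non-NA value, or None if no non-NA value is found.
`le(other[, axis, level])`	Element-wise "less than or equal to" comparison of the DataFrame and other (binary operator le).
`lt(other[, axis, level])`	Element-wise "less than" comparison of the DataFrame and other (binary operator lt).
`map(func[, na_action])`	Apply a function to a DataFrame element-wise.
`mask(cond[, other, inplace, axis, level])`	Replace values where the condition is True.
`max([axis, skipna, numeric_only])`	Return the maximum of the values over the requested axis.
`mean([axis, skipna, numeric_only])`	Return the mean of the values over the requested axis.
`median([axis, skipna, numeric_only])`	Return the median of the values over the requested axis.
`melt([id_vars, value_vars, var_name, ...])`	Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
`memory_usage([index, deep])`	Return the memory usage of each column in bytes.
`merge(right[, how, on, left_on, right_on, ...])`	Merge DataFrame or named Series objects with a database-style join.
`min([axis, skipna, numeric_only])`	Return the minimum of the values over the requested axis.
`mod(other[, axis, level, fill_value])`	Element-wise modulo of the DataFrame and other (binary operator mod).
`mode([axis, numeric_only, dropna])`	Get the mode(s) of each element along the selected axis.
`mul(other[, axis, level, fill_value])`	Element-wise multiplication of the DataFrame and other (binary operator mul).
`multiply(other[, axis, level, fill_value])`	Element-wise multiplication of the DataFrame and other (binary operator mul).
`ne(other[, axis, level])`	Element-wise "not equal to" comparison of the DataFrame and other (binary operator ne).
`nlargest(n, columns[, keep])`	Return the first n rows ordered by columns in descending order.
`notna()`	Detect existing (non-missing) values.
`notnull()`	DataFrame.notnull is an alias for DataFrame.notna.
`nsmallest(n, columns[, keep])`	Return the first n rows ordered by columns in ascending order.
`nunique([axis, dropna])`	Count the number of distinct elements along the specified axis.
`pad(*[, axis, inplace, limit, downcast])`	(Deprecated) Fill NA/NaN values by propagating the last valid observation.
`pct_change([periods, fill_method, limit, freq])`	Fractional change between the current and a prior element.
`pipe(func, *args, **kwargs)`	Apply chainable functions that expect Series or DataFrames.
`pivot(*, columns[, index, values])`	Return a reshaped DataFrame organized by the given index/column values.
`pivot_table([values, index, columns, ...])`	Create a spreadsheet-style pivot table as a DataFrame.
`plot`	Alias of PlotAccessor.
`pop(item)`	Return the item and drop it from the frame.
`pow(other[, axis, level, fill_value])`	Element-wise exponential power of the DataFrame and other (binary operator pow).
`prod([axis, skipna, numeric_only, min_count])`	Return the product of the values over the requested axis.
`product([axis, skipna, numeric_only, min_count])`	Return the product of the values over the requested axis.
`quantile([q, axis, numeric_only, ...])`	Return the values at the given quantile over the requested axis.
`query(expr, *[, inplace])`	Query the columns of a DataFrame with a boolean expression.
`radd(other[, axis, level, fill_value])`	Element-wise addition of the DataFrame and other, with operands reversed (binary operator radd).
`rank([axis, method, numeric_only, ...])`	Compute numerical data ranks (1 through n) along an axis.
`rdiv(other[, axis, level, fill_value])`	Element-wise floating division of the DataFrame and other, with operands reversed (binary operator rtruediv).
`reindex([labels, index, columns, axis, ...])`	Conform the DataFrame to a new index with optional filling logic.
`reindex_like(other[, method, copy, limit, ...])`	Return an object with indices matching those of other.
`rename([mapper, index, columns, axis, copy, ...])`	Rename columns or index labels.
`rename_axis([mapper, index, columns, axis, ...])`	Set the name of the axis for the index or columns.
`reorder_levels(order[, axis])`	Rearrange index levels using the input order.
`replace([to_replace, value, inplace, limit, ...])`	Replace the values given in to_replace with value.
`resample(rule[, axis, closed, label, ...])`	Resample time-series data.
`reset_index([level, drop, inplace, ...])`	Reset the index, or a level of it.
`rfloordiv(other[, axis, level, fill_value])`	Element-wise integer division of the DataFrame and other, with operands reversed (binary operator rfloordiv).
`rmod(other[, axis, level, fill_value])`	Element-wise modulo of the DataFrame and other, with operands reversed (binary operator rmod).
`rmul(other[, axis, level, fill_value])`	Element-wise multiplication of the DataFrame and other, with operands reversed (binary operator rmul).
`rolling(window[, min_periods, center, ...])`	Provide rolling window calculations.
`round([decimals])`	Round a DataFrame to a variable number of decimal places.
`rpow(other[, axis, level, fill_value])`	Element-wise exponential power of the DataFrame and other, with operands reversed (binary operator rpow).
`rsub(other[, axis, level, fill_value])`	Element-wise subtraction of the DataFrame and other, with operands reversed (binary operator rsub).
`rtruediv(other[, axis, level, fill_value])`	Element-wise floating division of the DataFrame and other, with operands reversed (binary operator rtruediv).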
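`sample([n, frac, replace, weights, ...])`	Return a random sample of items from an axis of the object.
`select_dtypes([include, exclude])`	Return a subset of the DataFrame's columns based on the column dtypes.
`sem([axis, skipna, ddof, numeric_only])`	Return the unbiased standard error of the mean over the requested axis.
`set_axis(labels, *[, axis, copy])`	Assign the desired index to the given axis.
`set_flags(*[, copy, allows_duplicate_labels])`	Return a new object with updated flags.
`set_index(keys, *[, drop, append, inplace, ...])`	Set the DataFrame index using existing columns.
`shift([periods, freq, axis, fill_value, suffix])`	Shift the index by the desired number of periods, with an optional time freq.
`skew([axis, skipna, numeric_only])`	Return the unbiased skew over the requested axis.
`sort_index(*[, axis, level, ascending, ...])`	Sort the object by labels (along an axis).
`sort_values(by, *[, axis, ascending, ...])`	Sort by the values along either axis.
`sparse`	Alias of SparseFrameAccessor.
`squeeze([axis])`	Squeeze 1-dimensional axis objects into scalars.
`stack([level, dropna, sort, future_stack])`	Stack the prescribed level(s) from columns to index.
`std([axis, skipna, ddof, numeric_only])`	Return the sample standard deviation over the requested axis.
`sub(other[, axis, level, fill_value])`	Element-wise subtraction of the DataFrame and other (binary operator sub).
`subtract(other[, axis, level, fill_value])`	Element-wise subtraction of the DataFrame and other (binary operator sub).
`sum([axis, skipna, numeric_only, min_count])`	Return the sum of the values over the requested axis.
`swapaxes(axis1, axis2[, copy])`	(Deprecated) Interchange axes and swap the values axes accordingly.
`swaplevel([i, j, axis])`	Swap levels i and j in a MultiIndex.
`tail([n])`	Return the last n rows.
`take(indices[, axis])`	Return the elements at the given positional indices along an axis.
`to_clipboard([excel, sep])`	Copy the object to the system clipboard.
`to_csv([path_or_buf, sep, na_rep, ...])`	Write the object to a comma-separated values (csv) file.
`to_dict([orient, into, index])`	Convert the DataFrame to a dictionary.
`to_excel(excel_writer[, sheet_name, na_rep, ...])`	Write the object to an Excel sheet.
`to_feather(path, **kwargs)`	Write a DataFrame to the binary Feather format.
`to_gbq(destination_table[, project_id, ...])`	Write a DataFrame to a Google BigQuery table.
`to_hdf(path_or_buf, key[, mode, complevel, ...])`	Write the contained data to an HDF5 file using HDFStore.
`to_html([buf, columns, col_space, header, ...])`	Render a DataFrame as an HTML table.
`to_json([path_or_buf, orient, date_format, ...])`	Convert the object to a JSON string.
`to_latex([buf, columns, header, index, ...])`	Render the object to a LaTeX table.
`to_markdown([buf, mode, index, storage_options])`	Print the DataFrame in Markdown-friendly format.
`to_numpy([dtype, copy, na_value])`	Convert the DataFrame to a NumPy array.
`to_orc([path, engine, index, engine_kwargs])`	Write a DataFrame to the ORC format.
`to_parquet([path, engine, compression, ...])`	Write a DataFrame to the binary parquet format.
`to_period([freq, axis, copy])`	Convert the DataFrame from DatetimeIndex to PeriodIndex.
`to_pickle(path[, compression, protocol, ...])`	Pickle (serialize) the object to a file.
`to_records([index, column_dtypes, index_dtypes])`	Convert the DataFrame to a NumPy record array.
`to_sql(name, con, *[, schema, if_exists, ...])`	Write records stored in a DataFrame to a SQL database.
`to_stata(path, *[, convert_dates, ...])`	Export the DataFrame object to Stata dta format.
`to_string([buf, columns, col_space, header, ...])`	Render a DataFrame to console-friendly tabular output.
`to_timestamp([freq, how, axis, copy])`	Cast to a DatetimeIndex of timestamps, at the beginning of the period.
`to_xarray()`	Return an xarray object from the pandas object.
`to_xml([path_or_buffer, index, root_name, ...])`	Render a DataFrame to an XML document.
`transform(func[, axis])`	Call func on self, producing a DataFrame with the same axis shape as self.
`transpose(*args[, copy])`	Transpose index and columns.
`truediv(other[, axis, level, fill_value])`	Element-wise floating division of the DataFrame and other (binary operator truediv).
`truncate([before, after, axis, copy])`	Truncate a Series or DataFrame before and after some index value.
`tz_convert(tz[, axis, level, copy])`	Convert a tz-aware axis to the target time zone.
`tz_localize(tz[, axis, level, copy, ...])`	Localize the tz-naive index of a Series or DataFrame to the target time zone.
`unstack([level, fill_value, sort])`	Pivot a level of the (necessarily hierarchical) index labels into the columns.
`update(other[, join, overwrite, ...])`	Modify the DataFrame in place using non-NA values from another DataFrame.
`value_counts([subset, normalize, sort, ...])`	Return a Series containing the frequency of each distinct row in the DataFrame.
`var([axis, skipna, ddof, numeric_only])`	Return the unbiased variance over the requested axis.
`where(cond[, other, inplace, axis, level])`	Replace values where the condition is False.
`xs(key[, axis, level, drop_level])`	Return a cross-section from the Series/DataFrame.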
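
A handful of the methods listed above can be chained into a small workflow. The sketch below is only an illustration: the `sales` and `managers` frames, their column names, and their values are all invented for this example.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "units": [10, np.nan, 30, 40],
        "price": [2.0, 3.0, 2.0, 3.0],
    }
)

# fillna returns a new frame; assign adds a derived column to it.
cleaned = sales.fillna({"units": 0}).assign(
    revenue=lambda frame: frame["units"] * frame["price"]
)

# groupby, then sum: one value per region.
per_region = cleaned.groupby("region")["revenue"].sum()

# sort_values and head: the single highest-revenue row.
top = cleaned.sort_values("revenue", ascending=False).head(1)

# merge: database-style left join against a lookup table.
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Ann", "Bo"]})
joined = cleaned.merge(managers, on="region", how="left")

# where keeps values where the condition is True and replaces the rest;
# mask does the opposite.
numbers = cleaned[["units", "revenue"]]
kept = numbers.where(numbers > 25, 0)
hidden = numbers.mask(numbers > 25, 0)

print(per_region)
print(top)
print(joined)
```

None of these calls modifies `sales` in place; each returns a new object, which is what makes the chained style possible.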