pandas.DataFrame

See the official pandas documentation for the full reference.

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
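A minimal sketch of how the constructor parameters in the signature above fit together. The array contents and the labels `x`, `y`, `z`, `a`, `b` are made up for illustration.

```python
import numpy as np
import pandas as pd

# data: a 2-D NumPy array with 3 rows and 2 columns (illustrative values).
arr = np.array([[1, 2], [3, 4], [5, 6]])

# index labels the rows, columns labels the columns,
# and dtype forces every column to the given type.
df = pd.DataFrame(arr, index=["x", "y", "z"], columns=["a", "b"], dtype="float64")

print(df.shape)          # number of rows and columns
print(df.dtypes)         # dtype of each column
print(df.loc["y", "b"])  # label-based access to a single cell
```

If `index` or `columns` is omitted, pandas falls back to integer labels starting at 0, as in the official example further down.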

Besides passing an object directly to the constructor, data can also come from the following sources:

DataFrame.from_records
    constructor from tuples, also record arrays
DataFrame.from_dict
    from dicts of Series, arrays, or dicts
DataFrame.from_items
    from sequence of (key, value) pairs (deprecated in pandas 0.23 and removed in pandas 1.0; use DataFrame.from_dict instead)
pandas.read_csv, pandas.read_table, pandas.read_clipboard
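The alternative constructors listed above can be sketched as follows. The sample values and the names `records`, `df_rec`, `df_cols`, and `df_rows` are illustrative, not from the original post.

```python
import pandas as pd

# from_records: each tuple becomes one row.
records = [(1, "a"), (2, "b"), (3, "c")]
df_rec = pd.DataFrame.from_records(records, columns=["num", "letter"])

# from_dict with the default orient="columns": each key becomes a column.
df_cols = pd.DataFrame.from_dict({"num": [1, 2, 3], "letter": ["a", "b", "c"]})

# from_dict with orient="index": each key becomes a row label.
df_rows = pd.DataFrame.from_dict(
    {"r1": [1, "a"], "r2": [2, "b"]},
    orient="index",
    columns=["num", "letter"],
)

print(df_rec)
print(df_cols)
print(df_rows)
```

`df_rec` and `df_cols` hold the same rows and columns; only the shape of the input differs.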

Here is an example from the official documentation (the values are random, so your output will differ):

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
...                    columns=['a', 'b', 'c', 'd', 'e'])
>>> df
    a   b   c   d   e
0   2   8   8   3   4
1   4   2   9   0   9
2   1   0   7   8   0
3   5   1   7   1   3
4   6   0   2   4   2

The attributes are as follows:
[Figure: list of DataFrame attributes]

Commonly used methods, with explanations and examples:

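Method	Description
abs()	Return the absolute value of each element of the Series/DataFrame
add()	Add two Series/DataFrames element-wise, aligning them on their labels
add_prefix(prefix)	Add a prefix to the column labels
add_suffix(suffix)	Add a suffix to the column labels
agg(func[, axis])	Aggregate over the given axis using one or more operations, e.g. df.agg(['sum', 'min'])
all([axis, bool_only, skipna, level])	Return True if all elements are true, usually evaluated over one axis
any([axis, bool_only, skipna, level])	Return True if any element is true, usually evaluated over one axis
append(other[, ignore_index, …])	Append the rows of another Series/DataFrame to the end of the current frame and return a new object (removed in pandas 2.0; use pandas.concat instead)
apply(func[, axis, broadcast, raw, reduce, …])	Apply a function along an axis of the frame, i.e. to each column or each row, e.g. df.apply(np.sqrt)
applymap()	Apply a function to every element of the frame, e.g. df.applymap(lambda x: x**2) (deprecated since pandas 2.1 in favor of DataFrame.map)
at_time()	Select the rows at a particular time of day, e.g. i = pd.date_range('2018-04-09', periods=4, freq='12H'); ts = pd.DataFrame({'A': [1,2,3,4]}, index=i); ts.at_time('12:00')
between_time(start_time, end_time[, …])	Select the rows between two times of day, e.g. ts.between_time('0:45', '0:15'); when start_time is later than end_time, the rows outside that interval are returned
copy([deep])	Make a copy of the current frame
count([axis, level, numeric_only])	Count the non-NA elements in each column or row
drop([labels, axis, index, columns, level, …])	Drop the specified labels from rows or columns
drop_duplicates([subset, keep, inplace])	Remove duplicate rows
dropna([axis, how, thresh, subset, inplace])	Remove missing values
equals(other)	Test whether two objects contain the same elements
fillna([value, method, axis, inplace, …])	Fill NA/NaN values using the specified value or method
filter([items, like, regex, axis])	Subset rows or columns according to the specified index labels
get(key[, default])	Get the item for the given key (for a DataFrame, a column); return default if the key is not found
head([n])/tail([n])	Return the first/last n rows of the frame
max([axis, skipna, level, numeric_only])	Return the maximum over the requested axis; by default (axis=0) this gives one maximum per column
to_csv([path_or_buf, sep, na_rep, …])	Write the DataFrame to the specified CSV file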
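A few of the methods in the table can be sketched on small frames. All sample values and variable names below are made up for illustration, and `pd.concat` stands in for `append`, which is no longer available in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 4, 9], "b": [16, 25, 36]})

# agg with a list of function names: one result row per function.
summary = df.agg(["sum", "min"])

# apply with a NumPy ufunc works on whole columns and keeps the shape.
roots = df.apply(np.sqrt)

# apply with a reducing function returns one value per column (axis=0)
# or one value per row (axis=1).
col_range = df.apply(lambda col: col.max() - col.min())
row_total = df.apply(lambda row: row.sum(), axis=1)

# Missing data: dropna removes rows containing NaN, fillna replaces NaN.
holes = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [1.0, 2.0, np.nan]})
complete_rows = holes.dropna()
filled = holes.fillna(0)

# drop_duplicates keeps the first occurrence of each repeated row by default.
dupes = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
unique_rows = dupes.drop_duplicates()

# Row-wise stacking with pd.concat, the replacement for the removed append().
stacked = pd.concat([df, df], ignore_index=True)

print(summary)
print(roots)
print(col_range)
print(row_total)
```

Note that `apply` hands the function a whole column or row at a time, which is why a reducing function such as `col.max() - col.min()` yields one value per column and not one per cell.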
For more usage, see the official documentation.
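The time-of-day selectors at_time and between_time from the table need a DatetimeIndex. The sketch below rebuilds the `ts` frame from the at_time row; passing `pd.Timedelta(hours=12)` as the frequency is a choice made here to avoid depending on how a particular pandas version spells the hourly alias, and the time windows are illustrative.

```python
import pandas as pd

# Four timestamps, 12 hours apart, starting at 2018-04-09 00:00.
i = pd.date_range("2018-04-09", periods=4, freq=pd.Timedelta(hours=12))
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=i)

# Rows whose time of day is exactly 12:00.
at_noon = ts.at_time("12:00")

# Rows whose time of day falls between 06:00 and 18:00.
daytime = ts.between_time("06:00", "18:00")

# start_time later than end_time selects the rows outside the
# 06:00-18:00 window instead, here the midnight rows.
overnight = ts.between_time("18:00", "06:00")

print(at_noon)
print(daytime)
print(overnight)
```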

The following example shows add() with fill_value=0, which substitutes 0 for a value that is missing in only one of the two frames before adding:

>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'],
...                  columns=['one'])
>>> a
   one
a  1.0
b  1.0
c  1.0
d  NaN
>>> b = pd.DataFrame(dict(one=[1, np.nan, 1, np.nan],
...                       two=[np.nan, 2, np.nan, 2]),
...                  index=['a', 'b', 'd', 'e'])
>>> b
   one  two
a  1.0  NaN
b  NaN  2.0
d  1.0  NaN
e  NaN  2.0
>>> a.add(b, fill_value=0)
   one  two
a  2.0  NaN
b  1.0  2.0
c  1.0  NaN
d  1.0  NaN
e  NaN  2.0
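To see what fill_value changes, the same two frames can be added with the plain `+` operator and compared. This is a small sketch reusing `a` and `b` from the example above; the names `plain` and `filled` are introduced here.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame([1, 1, 1, np.nan], index=["a", "b", "c", "d"], columns=["one"])
b = pd.DataFrame(
    {"one": [1, np.nan, 1, np.nan], "two": [np.nan, 2, np.nan, 2]},
    index=["a", "b", "d", "e"],
)

# Plain addition: any cell missing on either side becomes NaN.
plain = a + b

# add with fill_value=0: a cell missing on only one side is treated as 0;
# a cell missing on both sides stays NaN.
filled = a.add(b, fill_value=0)

print(plain)
print(filled)
```

With `+`, only the cell at row `a`, column `one` has a value on both sides, so every other cell is NaN. With `fill_value=0`, the cells that remain NaN are exactly those missing from both frames.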