[Practical Guide] Pandas Data Structures: Common DataFrame Operations

weixin_39844284

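This article walks through common operations on a pandas DataFrame: extracting, adding, and deleting columns, filling columns, and assigning new columns in method chains. It introduces the assign() method, which creates new columns computed on the DataFrame, and discusses how the Python version affects the behavior of such code. It also covers data alignment, indexing and selection, broadcasting arithmetic, and using DataFrames with NumPy functions.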

This article comes from Python大咖谈; author: 呆鸟的Python大咖谈.

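Contents:
Extracting, adding, and deleting columns
Assigning new columns in method chains
Indexing / selection
Data alignment and arithmetic
Transposing
Using DataFrames with NumPy functions
Console display
DataFrame column attribute access and IPython completion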

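Extracting, adding, and deleting columns

A DataFrame behaves like a dict of Series that share one index, so getting, setting, and deleting columns uses the same syntax as the corresponding dict operations: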

In [61]: df['one']
Out[61]:
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

In [62]: df['three'] = df['one'] * df['two']

In [63]: df['flag'] = df['one'] > 2

In [64]: df
Out[64]:
   one  two  three   flag
a  1.0  1.0    1.0  False
b  2.0  2.0    4.0  False
c  3.0  3.0    9.0   True
d  NaN  4.0    NaN  False
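The session above relies on a df that was defined earlier in the original tutorial and is not shown in this excerpt. As a minimal sketch, assuming df was built from a dict of two Series named one and two (an assumption inferred from the printed output, not confirmed by the text), the same steps can be reproduced like this:

```python
import pandas as pd

# Assumed construction: not shown in this excerpt, inferred from the output above.
df = pd.DataFrame({
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
})

one = df["one"]                        # get a column, like a dict lookup
df["three"] = df["one"] * df["two"]    # set a new column computed from existing ones
df["flag"] = df["one"] > 2             # a comparison against NaN evaluates to False
```

Row d has no value in one, so it holds NaN there, and that NaN propagates into three.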

Columns can be deleted with del or popped with pop, just as with a dict:

In [65]: del df['two']

In [66]: three = df.pop('three')

In [67]: df
Out[67]:
   one   flag
a  1.0  False
b  2.0  False
c  3.0   True
d  NaN  False
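The difference between the two is what comes back. The following sketch uses a small made-up frame (the values are illustrative only) to show that del simply drops the column, while pop drops it and hands it back as a Series:

```python
import pandas as pd

# Illustrative frame; the values are made up for this example.
df = pd.DataFrame(
    {"one": [1.0, 2.0], "two": [3.0, 4.0], "three": [5.0, 6.0]},
    index=["a", "b"],
)

del df["two"]              # removes the column in place; nothing is returned
three = df.pop("three")    # removes the column in place and returns it as a Series
```

Use pop when the removed column is still needed afterwards.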

When a scalar is assigned, it is broadcast to fill the whole column:

In [68]: df['foo'] = 'bar'

In [69]: df
Out[69]:
   one   flag  foo
a  1.0  False  bar
b  2.0  False  bar
c  3.0   True  bar
d  NaN  False  bar

When inserting a Series whose index differs from the DataFrame's index, the Series is conformed to the DataFrame's index:

In [70]: df['one_trunc'] = df['one'][:2]

In [71]: df
Out[71]:
   one   flag  foo  one_trunc
a  1.0  False  bar        1.0
b  2.0  False  bar        2.0
c  3.0   True  bar        NaN
d  NaN  False  bar        NaN
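Conforming means that values are matched by label, not by position. A minimal sketch with made-up values: labels missing from the Series become NaN, and labels that exist only in the Series are dropped.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

# A Series in a different order, missing "a", and carrying an extra label "z".
s = pd.Series([20.0, 30.0, 99.0], index=["c", "b", "z"])

df["aligned"] = s   # matched by label: a -> NaN, b -> 30.0, c -> 20.0, "z" is discarded
```

The DataFrame's index is never extended by this kind of assignment.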

Raw ndarrays can be inserted too, but their length must match the length of the DataFrame's index.
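Unlike a Series, an ndarray carries no labels, so it is assigned by position and cannot be aligned. The sketch below, using made-up values, shows a matching length succeeding and a mismatched length being rejected with a ValueError:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

df["arr"] = np.array([10, 20, 30])    # same length as the index: assigned by position

try:
    df["bad"] = np.array([1, 2])      # length 2 against an index of length 3
    length_error = None
except ValueError as exc:
    length_error = exc                # pandas refuses the mismatched length
```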

By default, new columns are appended at the end of the DataFrame. The insert method places a column at a specific position instead:

In [72]: df.insert(1, 'bar', df['one'])

In [73]: df
Out[73]:
   one  bar   flag  foo  one_trunc
a  1.0  1.0  False  bar        1.0
b  2.0  2.0  False  bar        2.0
c  3.0  3.0   True  bar        NaN
d  NaN  NaN  False  bar        NaN
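To make the contrast between the two placements explicit, here is a small sketch on a made-up frame. Plain assignment appends, while insert takes a zero-based position and modifies the frame in place:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0], "flag": [False, True]})

df["last"] = "x"                   # plain assignment: the column goes to the end
df.insert(1, "bar", df["one"])     # insert: the column lands at position 1, in place
```

Note that insert raises a ValueError if the column name already exists, unless allow_duplicates=True is passed.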

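Assigning new columns in method chains

Inspired by the mutate verb in dplyr, DataFrame provides an assign() method that makes it easy to create new columns, typically derived from existing ones.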

In [74]: iris = pd.read_csv('data/iris.data')

In [75]: iris.head()
Out[75]:
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

In [76]: (iris.assign(sepal_ratio=iris['SepalWidth'] / iris['SepalLength'])
   ....:      .head())
   ....:
Out[76]:
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa     0.686275
1          4.9         3.0          1.4         0.2  Iris-setosa     0.612245
2          4.7         3.2          1.3         0.2  Iris-setosa     0.680851
3          4.6         3.1          1.5         0.2  Iris-setosa     0.673913
4          5.0         3.6          1.4         0.2  Iris-setosa     0.720000

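In the example above, a precomputed value was inserted. It is also possible to pass a function of one argument, which is evaluated on the DataFrame being assigned to.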

In [77]: iris.assign(sepal_ratio=lambda x: (x['SepalWidth'] / x['SepalLength'])).head()
Out[77]:
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa     0.686275
1          4.9         3.0          1.4         0.2  Iris-setosa     0.612245
2          4.7         3.2          1.3         0.2  Iris-setosa     0.680851
3          4.6         3.1          1.5         0.2  Iris-setosa     0.673913
4          5.0         3.6          1.4         0.2  Iris-setosa     0.720000
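The two forms produce the same result when the frame is available under a name. The sketch below checks this without needing the data/iris.data file: it uses a stand-in frame holding only the first three rows of the two sepal columns printed above.

```python
import pandas as pd

# Stand-in for the iris data: the first three rows of two columns shown above.
iris = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 4.7],
    "SepalWidth": [3.5, 3.0, 3.2],
})

# Precomputed value: the ratio is evaluated before assign is called.
by_value = iris.assign(sepal_ratio=iris["SepalWidth"] / iris["SepalLength"])

# Callable: assign calls the function with the frame it is operating on.
by_callable = iris.assign(sepal_ratio=lambda x: x["SepalWidth"] / x["SepalLength"])
```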

assign always returns a copy of the data and leaves the original DataFrame unchanged.
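A quick way to see this, sketched on a made-up one-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

result = df.assign(b=lambda x: x["a"] * 2)   # new frame with the extra column
# df itself still has only column "a"
```

To keep the new column under the original name, rebind the result: df = df.assign(...).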

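Passing a callable, rather than the actual value to insert, is useful when there is no reference to the DataFrame at hand. This is common when assign is used in a chain of operations. For example, the DataFrame can be limited to observations with a sepal length greater than 5, the ratios calculated, and the result plotted: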

In [78]: (iris.query('SepalLength > 5')
   ....:      .assign(SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
   ....:              PetalRatio=lambda x: x.PetalWidth / x.PetalLength)
   ....:      .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
   ....:
Out[78]:
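The same chain can be run without the data file or a plotting backend. This sketch stops before the plot step and uses a stand-in frame whose values are made up for illustration. Inside the chain, the filtered frame has no variable name, which is why the callables are needed:

```python
import pandas as pd

# Stand-in for the iris data; these values are made up for illustration.
iris = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 6.0],
    "SepalWidth": [3.5, 3.0, 3.0],
    "PetalLength": [1.4, 1.4, 4.0],
    "PetalWidth": [0.2, 0.2, 1.0],
})

ratios = (
    iris.query("SepalLength > 5")    # keeps the rows with labels 0 and 2
        # x is the filtered frame produced by query, not the original iris
        .assign(SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
                PetalRatio=lambda x: x.PetalWidth / x.PetalLength)
)
```

Since the filtering happens first, the ratios are computed only for the rows that survive the query, and iris itself is left untouched.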
