Pandas Study Notes (4): Operations on Series and DataFrame Objects in Pandas

敲代码的小风

Article link: https://blog.csdn.net/m0_46653437/article/details/110434156

Reference: Python数据分析与展示 (Python Data Analysis and Presentation)
Reference: Pandas official website
Reference: User Guide
Reference: Getting started tutorials

Reindexing:

How do you change a Series or DataFrame object?
Add or reorder labels: reindexing
Delete labels: drop
.reindex() changes or reorders the index of a Series or DataFrame

Demo 1:

Microsoft Windows [Version 10.0.18363.1198]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\chenxuqi>python
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> dl = {\
...         "城市":['北京','上海','广州','深圳','沈阳'],\
...         "环比":[101.5,101.2,101.3,102.0,100.1],\
...         "同比":[120.7,127.3,119.4,140.9,101.4],\
...         "定基":[121.4,127.8,120.0,145.5,101.6],\
...     }
>>> d = pd.DataFrame(dl,index=['c1','c2','c3','c4','c5'])
>>> d
    城市     环比     同比     定基
c1  北京  101.5  120.7  121.4
c2  上海  101.2  127.3  127.8
c3  广州  101.3  119.4  120.0
c4  深圳  102.0  140.9  145.5
c5  沈阳  100.1  101.4  101.6
>>> d['同比']['c2']
127.3
>>> d['c2']['同比']
Traceback (most recent call last):
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'c2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python\Python37\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'c2'
>>> # Change the row order
... d = d.reindex(index=['c5','c4','c3','c2','c1'])
>>>
>>> d
    城市     环比     同比     定基
c5  沈阳  100.1  101.4  101.6
c4  深圳  102.0  140.9  145.5
c3  广州  101.3  119.4  120.0
c2  上海  101.2  127.3  127.8
c1  北京  101.5  120.7  121.4
>>> # Change the column order
... d = d.reindex(columns=['城市','同比','环比','定基'])
>>> d
    城市     同比     环比     定基
c5  沈阳  101.4  100.1  101.6
c4  深圳  140.9  102.0  145.5
c3  广州  119.4  101.3  120.0
c2  上海  127.3  101.2  127.8
c1  北京  120.7  101.5  121.4
>>>
>>>
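The KeyError in Demo 1 comes from how `[]` works on a DataFrame: `d[key]` always looks `key` up among the column labels, so `d['同比']['c2']` (column first, then row) works while `d['c2']` fails. A minimal sketch, assuming a recent pandas release, of the label-based accessors `.loc` and `.at` (standard pandas indexers that the transcript does not use), which take the row label and the column label in one call; the variable names are illustrative:

```python
import pandas as pd

# Same data as in Demo 1
dl = {
    "城市": ['北京', '上海', '广州', '深圳', '沈阳'],
    "环比": [101.5, 101.2, 101.3, 102.0, 100.1],
    "同比": [120.7, 127.3, 119.4, 140.9, 101.4],
    "定基": [121.4, 127.8, 120.0, 145.5, 101.6],
}
d = pd.DataFrame(dl, index=['c1', 'c2', 'c3', 'c4', 'c5'])

# d[key] searches the column labels, so the column label must come first
chained = d['同比']['c2']

# .loc[row_label, column_label] states both labels explicitly, in that order
via_loc = d.loc['c2', '同比']

# .at is the scalar-only counterpart of .loc
via_at = d.at['c2', '同比']

# A row label inside plain [] is looked up among the columns and raises KeyError
try:
    d['c2']
    row_label_in_brackets_works = True
except KeyError:
    row_label_in_brackets_works = False

print(chained, via_loc, via_at, row_label_in_brackets_works)
```

Preferring `.loc` matters even more when writing values: `d.loc['c2', '同比'] = 0` updates `d`, whereas the chained form `d['同比']['c2'] = 0` may silently fail to do so.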

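Reindexing:
Parameters of .reindex(index=None, columns=None, …):

Parameter	Description
index, columns	The new, user-defined row and column labels
fill_value	Value used to fill positions that are missing after reindexing
method	Fill method (requires a monotonic index): ffill fills forward, so a new label takes the value of the previous existing label; bfill fills backward, so it takes the value of the next existing label
limit	Maximum number of consecutive positions to fill
copy	Default True: always return a new object; if False, no copy is made when the new index is equal to the old one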
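To see what `fill_value`, `method` and `limit` do, here is a minimal sketch on a small Series whose data is made up for illustration. It uses an increasing integer index because `method=` only works on monotonic labels:

```python
import pandas as pd

# Existing labels 0 and 4; the new index asks for labels 0..6
s = pd.Series([1, 2], index=[0, 4])
new_index = list(range(7))

plain = s.reindex(new_index)                             # labels not in s become NaN
filled = s.reindex(new_index, fill_value=0)              # labels not in s become 0
ffilled = s.reindex(new_index, method='ffill')           # take the previous existing label's value
bfilled = s.reindex(new_index, method='bfill')           # take the next existing label's value
limited = s.reindex(new_index, method='ffill', limit=1)  # forward-fill at most 1 label per gap

for name, result in [("plain", plain), ("fill_value=0", filled), ("ffill", ffilled),
                     ("bfill", bfilled), ("ffill, limit=1", limited)]:
    print(f"{name:15}", result.tolist())
```

Note that `fill_value` only touches labels that are new, while `method` looks up neighbouring labels of the original index; any label still unfilled (for example 5 and 6 under `bfill`) stays NaN.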

Demo 2:

>>>
>>> d
    城市     同比     环比     定基
c5  沈阳  101.4  100.1  101.6
c4  深圳  140.9  102.0  145.5
c3  广州  119.4  101.3  120.0
c2  上海  127.3  101.2  127.8
c1  北京  120.7  101.5  121.4
>>> # The index of a Series or DataFrame is an Index object
... # Index objects are immutable
... newc = d.columns.insert(4,'昊昊-新增列')
>>> newc
Index(['城市', '同比', '环比', '定基', '昊昊-新增列'], dtype='object')
>>> newd = d.reindex(columns=newc,fill_value=20200910)
>>> newd
    城市     同比     环比     定基    昊昊-新增列
c5  沈阳  101.4  100.1  101.6  20200910
c4  深圳  140.9  102.0  145.5  20200910
c3  广州  119.4  101.3  120.0  20200910
c2  上海  127.3  101.2  127.8  20200910
c1  北京  120.7  101.5  121.4  20200910
>>> d.index
Index(['c5', 'c4', 'c3', 'c2', 'c1'], dtype='object')
>>> d.columns
Index(['城市', '同比', '环比', '定基'], dtype='object')
>>> # The index of a Series or DataFrame is an Index object
... # Index objects are immutable
... d
    城市     同比     环比     定基
c5  沈阳  101.4  100.1  101.6
c4  深圳  140.9  102.0  145.5
c3  广州  119.4  101.3  120.0
c2  上海  127.3  101.2  127.8
c1  北京  120.7  101.5  121.4
>>>

Index types:

The index of a Series or DataFrame is an Index object
Index objects are immutable
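A minimal sketch, assuming a recent pandas release, of what "immutable" means in practice: item assignment on an Index raises `TypeError`, and methods such as `.insert()` return a new Index instead of changing the old one (the labels are illustrative):

```python
import pandas as pd

idx = pd.Index(['c1', 'c2', 'c3'])

# Item assignment is rejected: an Index cannot be changed in place
try:
    idx[0] = 'c0'
    mutated = True
except TypeError:
    mutated = False

# "Modifying" methods build and return a NEW Index; idx itself is untouched
idx2 = idx.insert(0, 'c0')

print(mutated, list(idx), list(idx2))
```

To relabel an existing DataFrame, assign a whole new Index (for example `d.index = ['a', 'b', 'c', 'd', 'e']`) or use `d.rename(...)`; both replace the Index object rather than mutating it.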

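Common methods of Index objects:

Method	Description
.append(idx)	Concatenate with another Index object, producing a new Index
.difference(idx)	Compute the set difference, producing a new Index
.intersection(idx)	Compute the intersection
.union(idx)	Compute the union
.delete(loc)	Delete the element at position loc
.insert(loc,e)	Insert element e at position loc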
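The methods in the table can be tried on two small, made-up Index objects. A minimal sketch; note that `.difference()` and `.union()` sort their result when the labels are sortable (their default `sort=None`), while `.append()` simply concatenates and keeps duplicates:

```python
import pandas as pd

a = pd.Index(['c1', 'c2', 'c3'])
b = pd.Index(['c3', 'c4'])

appended = a.append(b)        # concatenation, 'c3' appears twice
diff = a.difference(b)        # labels in a but not in b
inter = a.intersection(b)     # labels in both a and b
uni = a.union(b)              # labels in either, without duplicates
deleted = a.delete(1)         # remove the element at position 1 ('c2')
inserted = a.insert(1, 'cX')  # put 'cX' at position 1

for name, result in [("append", appended), ("difference", diff), ("intersection", inter),
                     ("union", uni), ("delete", deleted), ("insert", inserted)]:
    print(f"{name:12}", list(result))
```

Each call returns a new Index, so `a` and `b` are unchanged afterwards.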

Demo 3:

>>>
>>> d
    城市     同比     环比     定基
c5  沈阳  101.4  100.1  101.6
c4  深圳  140.9  102.0  145.5
c3  广州  119.4  101.3  120.0
c2  上海  127.3  101.2  127.8
c1  北京  120.7  101.5  121.4
>>> nc = d.columns.delete(2)
>>> nc
Index(['城市', '同比', '定基'], dtype='object')
>>> ni = d.index.insert(5,"c0")
>>> ni
Index(['c5', 'c4', 'c3', 'c2', 'c1', 'c0'], dtype='object')
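>>> # nd = d.reindex(index=ni,columns=nc,method='ffill')  # fails in newer pandas versions (method= needs monotonic labels, and these columns are not sorted); use the line below instead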
... nd = d.reindex(index=ni,columns=nc).ffill()
>>>
>>> nd
    城市     同比     定基
c5  沈阳  101.4  101.6
c4  深圳  140.9  145.5
c3  广州  119.4  120.0
c2  上海  127.3  127.8
c1  北京  120.7  121.4
c0  北京  120.7  121.4
>>>
>>>
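Why does the commented-out line in Demo 3 fail? In recent pandas versions `method=` is applied to both axes, and it needs labels that are monotonic (sorted increasing or decreasing). After Demo 1 the row labels c5 to c1 are decreasing, but the column labels 城市, 同比, 环比, 定基 are in no sorted order, so pandas raises `ValueError`. A sketch reproducing this under that assumption, alongside the transcript's workaround:

```python
import pandas as pd

# Rebuild the DataFrame as it stands at the start of Demo 3
dl = {
    "城市": ['北京', '上海', '广州', '深圳', '沈阳'],
    "环比": [101.5, 101.2, 101.3, 102.0, 100.1],
    "同比": [120.7, 127.3, 119.4, 140.9, 101.4],
    "定基": [121.4, 127.8, 120.0, 145.5, 101.6],
}
d = pd.DataFrame(dl, index=['c1', 'c2', 'c3', 'c4', 'c5'])
d = d.reindex(index=['c5', 'c4', 'c3', 'c2', 'c1'], columns=['城市', '同比', '环比', '定基'])

ni = d.index.insert(5, 'c0')  # new row labels c5..c1, c0
nc = d.columns.delete(2)      # drop the column at position 2

# The column labels are neither increasing nor decreasing
print(d.columns.is_monotonic_increasing, d.columns.is_monotonic_decreasing)

# method= is also applied to the columns, so this raises ValueError
try:
    d.reindex(index=ni, columns=nc, method='ffill')
    failed = False
except ValueError:
    failed = True

# Workaround from the transcript: reindex first, then forward-fill NaNs by position
nd = d.reindex(index=ni, columns=nc).ffill()
print(failed)
print(nd)
```

The two approaches are not identical in general: `method='ffill'` fills new labels from neighbouring labels of the old index, while `.ffill()` fills every NaN from the row above it, including NaNs that were already in the data.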

.drop() deletes the specified row or column labels from a Series or DataFrame
Deleting labels, Demo 4:

>>>
>>>
>>> a = pd.Series([9,8,7,6],index=['a','b','c','d'])
>>> a
a    9
b    8
c    7
d    6
dtype: int64
>>> a.drop(['b','c'])
a    9
d    6
dtype: int64
>>> a
a    9
b    8
c    7
d    6
dtype: int64
>>> dl = {\
...         "城市":['北京','上海','广州','深圳','沈阳'],\
...         "环比":[101.5,101.2,101.3,102.0,100.1],\
...         "同比":[120.7,127.3,119.4,140.9,101.4],\
...         "定基":[121.4,127.8,120.0,145.5,101.6],\
...     }
>>>
>>> d = pd.DataFrame(dl,index=['c1','c2','c3','c4','c5'])
>>> d
    城市     环比     同比     定基
c1  北京  101.5  120.7  121.4
c2  上海  101.2  127.3  127.8
c3  广州  101.3  119.4  120.0
c4  深圳  102.0  140.9  145.5
c5  沈阳  100.1  101.4  101.6
>>> d.drop('c5')
    城市     环比     同比     定基
c1  北京  101.5  120.7  121.4
c2  上海  101.2  127.3  127.8
c3  广州  101.3  119.4  120.0
c4  深圳  102.0  140.9  145.5
>>> d.drop('同比',axis=1)
    城市     环比     定基
c1  北京  101.5  121.4
c2  上海  101.2  127.8
c3  广州  101.3  120.0
c4  深圳  102.0  145.5
c5  沈阳  100.1  101.6
>>> d
    城市     环比     同比     定基
c1  北京  101.5  120.7  121.4
c2  上海  101.2  127.3  127.8
c3  广州  101.3  119.4  120.0
c4  深圳  102.0  140.9  145.5
c5  沈阳  100.1  101.4  101.6
>>>
>>>
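Besides the positional `axis=1` form used in Demo 4, `.drop()` also accepts `index=` and `columns=` keywords and an `errors='ignore'` option. A minimal sketch, assuming pandas 0.21 or later (where these keywords exist); the variable names are illustrative:

```python
import pandas as pd

# Same data as in Demo 4
dl = {
    "城市": ['北京', '上海', '广州', '深圳', '沈阳'],
    "环比": [101.5, 101.2, 101.3, 102.0, 100.1],
    "同比": [120.7, 127.3, 119.4, 140.9, 101.4],
    "定基": [121.4, 127.8, 120.0, 145.5, 101.6],
}
d = pd.DataFrame(dl, index=['c1', 'c2', 'c3', 'c4', 'c5'])

# Keyword form: no need to remember which axis number means columns
no_c5 = d.drop(index='c5')
no_yoy = d.drop(columns='同比')
both = d.drop(index=['c1', 'c2'], columns=['环比'])

# Unknown labels raise KeyError by default; errors='ignore' skips them
safe = d.drop(index='c9', errors='ignore')

# drop returns a new object, so reassign the result to keep it
d = d.drop(columns='定基')

print(no_c5, no_yoy, both, safe, d, sep="\n\n")
```

Reassigning the result (or passing `inplace=True`) is the only way the original DataFrame changes; otherwise it stays exactly as the last `d` in Demo 4 shows.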
