Pandas Study Notes (3): The DataFrame Type

敲代码的小风

Article link: https://blog.csdn.net/m0_46653437/article/details/110432783

Reference: Python数据分析与展示 (Python Data Analysis and Presentation)
Reference: pandas official website
Reference: User Guide
Reference: Getting started tutorials

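The DataFrame type:

A DataFrame consists of a set of columns that share the same index.
A DataFrame is a tabular data type; each column may hold values of a different type.
A DataFrame has both a row index (index) and a column index (columns).
A DataFrame is most often used for two-dimensional data, but it can also represent higher-dimensional data.
A DataFrame is a two-dimensional labeled array.
Basic DataFrame operations are similar to those of Series and are driven by the row and column indexes.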
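
As a small illustration of the properties listed above, the following sketch builds a DataFrame whose columns hold different value types and inspects its two indexes. The column names, row labels, and values are made up for this example and do not come from the original notes.

```python
import pandas as pd

# Hypothetical sample data: three columns with different value types.
df = pd.DataFrame(
    {
        "name": ["x", "y", "z"],     # strings
        "qty": [1, 2, 3],            # integers
        "ratio": [0.5, 1.5, 2.5],    # floats
    },
    index=["r1", "r2", "r3"],        # row index shared by every column
)

print(df.index)           # row labels
print(df.columns)         # column labels
print(df.dtypes)          # one dtype per column
print(df.ndim, df.shape)  # number of dimensions and (rows, columns)

# Each column is a Series that carries the same row index.
print(df["qty"].index.equals(df.index))
```

Selecting a single column returns a Series, which is why the Series operations from the earlier notes carry over.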

Ways to create a DataFrame:

A DataFrame can be created from any of the following:
- a two-dimensional ndarray
- a dict of one-dimensional ndarrays, lists, dicts, tuples, or Series
- a Series
- another DataFrame
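
The last two items in the list are not demonstrated in the sessions of these notes, so here is a minimal sketch of them. The variable names and values are chosen only for illustration.

```python
import pandas as pd

s = pd.Series([9, 8, 7], index=["a", "b", "c"], name="two")

# A single Series becomes a one-column DataFrame; the Series name
# is used as the column label.
from_series = pd.DataFrame(s)

# Series.to_frame() is an equivalent, more explicit spelling.
same = s.to_frame()

# Passing an existing DataFrame builds a new DataFrame from it;
# copy=True requests an independent copy of the data.
from_frame = pd.DataFrame(from_series, copy=True)
from_frame.loc["a", "two"] = 0  # does not change from_series

print(from_series)
print(from_frame)
```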

Creating from a two-dimensional ndarray:

Microsoft Windows [Version 10.0.18363.1198]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\chenxuqi>python
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> # Create a DataFrame from a two-dimensional ndarray
... d = pd.DataFrame(np.arange(10).reshape(2,5))
>>> d
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9
>>>
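
In the session above both axes receive default integer labels. The sketch below shows the same array with explicit labels passed through `index=` and `columns=`; the labels themselves are hypothetical.

```python
import numpy as np
import pandas as pd

arr = np.arange(10).reshape(2, 5)

# Without index= / columns=, both axes get default integer labels 0..n-1.
default = pd.DataFrame(arr)

# Hypothetical labels, chosen only for illustration.
labeled = pd.DataFrame(arr, index=["r0", "r1"], columns=list("abcde"))

print(labeled)
print(labeled.loc["r1", "d"])  # element in row "r1", column "d"
```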

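Creating from a dict of one-dimensional objects (Series in this example); the data is aligned on the row and column indexes and missing entries are filled in automatically: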

>>>
>>> dt = {'one':pd.Series([1,2,3],index=['a','b','c']),
...      'two':pd.Series([9,8,7,6],index=['a','b','c','d'])}
>>> dt
{'one': a    1
b    2
c    3
dtype: int64, 'two': a    9
b    8
c    7
d    6
dtype: int64}
>>> d = pd.DataFrame(dt)
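>>> # By default, the dict keys become the column labels
... # Each dict value becomes one column
... # The rows are aligned on the union of the indexes; missing data
... # is filled in automatically as NaN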
...
>>> d
   one  two
a  1.0    9
b  2.0    8
c  3.0    7
d  NaN    6
>>> # index= and columns= specify the row and column labels
... # Elements that have no data are filled in as NaN
...
>>> pd.DataFrame(dt,index=['b','c','d'],columns=['two','three'])
   two three
b    8   NaN
c    7   NaN
d    6   NaN
>>>
>>>
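
The alignment behaviour in the session above can be checked programmatically. This sketch reuses the same `dt` dictionary and looks at the resulting index and column dtypes. It is an illustration of what the output shows, not an additional step from the original notes.

```python
import pandas as pd

dt = {
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"]),
}
d = pd.DataFrame(dt)

# The row index is the union of the Series indexes.
print(list(d.index))  # ['a', 'b', 'c', 'd']

# "one" has no value for row "d", so it holds NaN there and the
# column is stored as float; "two" is complete and stays integer.
print(d.dtypes)
print(d["one"].isna())

# Requesting a column that is not in the dict yields an all-NaN column.
sub = pd.DataFrame(dt, index=["b", "c", "d"], columns=["two", "three"])
print(sub["three"].isna().all())
```

This is why `one` is printed as `1.0`, `2.0`, `3.0` in the session: NaN is a floating-point value, so an integer column with a gap is converted to float.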

Creating from a dict of lists:

>>>
>>> # Create from a dict of lists
... dl = {'one':[1,2,3,4],'two':[9,8,7,6]}
>>> d = pd.DataFrame(dl,index=["a","b","c","d"])
>>> d
   one  two
a    1    9
b    2    8
c    3    7
d    4    6
>>>
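
Unlike Series, plain lists carry no index to align on, so they have to be the same length. The sketch below shows the default row labels and what happens with lists of unequal length. The mismatched dictionary is an assumption added for illustration.

```python
import pandas as pd

dl = {"one": [1, 2, 3, 4], "two": [9, 8, 7, 6]}

# Without index=, the rows get the default integer labels 0..3.
d = pd.DataFrame(dl)
print(list(d.index))

# Lists of unequal length are rejected rather than padded with NaN.
try:
    pd.DataFrame({"one": [1, 2, 3, 4], "two": [9, 8, 7]})
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```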

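Tabular data example (column labels: 城市 = city, 环比 = month-over-month index, 同比 = year-over-year index, 定基 = fixed-base index):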

>>> dl = {\
...      "城市":['北京','上海','广州','深圳','沈阳'],\
...      "环比":[101.5,101.2,101.3,102.0,100.1],\
...      "同比":[120.7,127.3,119.4,140.9,101.4],\
...      "定基":[121.4,127.8,120.0,145.5,101.6],\
...      }
>>> d = pd.DataFrame(dl,index=['c1','c2','c3','c4','c5'])
>>> d
    城市     环比     同比     定基
c1  北京  101.5  120.7  121.4
c2  上海  101.2  127.3  127.8
c3  广州  101.3  119.4  120.0
c4  深圳  102.0  140.9  145.5
c5  沈阳  100.1  101.4  101.6
>>> d.index
Index(['c1', 'c2', 'c3', 'c4', 'c5'], dtype='object')
>>> d.columns
Index(['城市', '环比', '同比', '定基'], dtype='object')
>>> d.values
array([['北京', 101.5, 120.7, 121.4],
       ['上海', 101.2, 127.3, 127.8],
       ['广州', 101.3, 119.4, 120.0],
       ['深圳', 102.0, 140.9, 145.5],
       ['沈阳', 100.1, 101.4, 101.6]], dtype=object)
>>> d['同比']
c1    120.7
c2    127.3
c3    119.4
c4    140.9
c5    101.4
Name: 同比, dtype: float64
>>> d.ix['c2'] # .ix was deprecated in pandas 0.20.0 and removed in 1.0.0; use .loc or .iloc instead
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python\Python37\lib\site-packages\pandas\core\generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'ix'
>>> d.loc['c2']
城市       上海
环比    101.2
同比    127.3
定基    127.8
Name: c2, dtype: object
>>> d.iloc[0:2,2:4]
       同比     定基
c1  120.7  121.4
c2  127.3  127.8
>>> d.loc['c2']['同比']
127.3
>>> d.loc['同比']['c2']
Traceback (most recent call last):
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '同比'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1768, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1965, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 625, in _get_label
    return self.obj._xs(label, axis=axis)
  File "D:\Python\Python37\lib\site-packages\pandas\core\generic.py", line 3537, in xs
    loc = self.index.get_loc(key)
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '同比'
>>> d['同比']['c2']
127.3
>>> d['c2']['同比']
Traceback (most recent call last):
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'c2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python\Python37\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'c2'
>>>
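
The two KeyError tracebacks above come from looking a label up on the wrong axis: `d[...]` searches the column labels, while `d.loc[...]` with a single key searches the row labels. The following sketch rebuilds the same table and shows some ways to address one cell without chained indexing. It is one reasonable way to write these lookups, not the only one.

```python
import pandas as pd

d = pd.DataFrame(
    {
        "城市": ["北京", "上海", "广州", "深圳", "沈阳"],
        "环比": [101.5, 101.2, 101.3, 102.0, 100.1],
        "同比": [120.7, 127.3, 119.4, 140.9, 101.4],
        "定基": [121.4, 127.8, 120.0, 145.5, 101.6],
    },
    index=["c1", "c2", "c3", "c4", "c5"],
)

# "同比" is a column label and "c2" is a row label, which is why
# d.loc['同比'] and d['c2'] both raise KeyError.
print("同比" in d.columns, "同比" in d.index)
print("c2" in d.columns, "c2" in d.index)

# One .loc call with (row label, column label) avoids chained indexing.
by_label = d.loc["c2", "同比"]

# .at is a scalar-only variant of .loc; .iloc uses integer positions.
by_at = d.at["c2", "同比"]
by_position = d.iloc[1, 2]
print(by_label, by_at, by_position)

# Mixed-type rows force .values / .to_numpy() to dtype=object;
# selecting only the numeric columns gives a float array.
numeric = d[["环比", "同比", "定基"]].to_numpy()
print(numeric.dtype)
```

Passing both labels to a single `.loc` call also matters for assignment, because a chained expression such as `d.loc['c2']['同比'] = ...` may write to a temporary object instead of `d`.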