Reference: Python数据分析与展示 (Python Data Analysis and Presentation)
Reference: pandas official website
Reference: User Guide
Reference: Getting started tutorials
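The DataFrame type:
A DataFrame consists of a set of columns that share the same index.
A DataFrame is a tabular data type; each column may hold a different value type.
A DataFrame has both a row index (index) and a column index (columns).
A DataFrame is most often used for two-dimensional data, but it can also represent higher-dimensional data (for example, through hierarchical indexes).
A DataFrame is a two-dimensional labeled array.
Basic DataFrame operations resemble those of Series and rely on the row and column indexes.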
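As a small sketch of the points above (the column names and values are made up for illustration and do not come from the original session), the following builds a DataFrame whose columns hold different value types and inspects its two indexes:

```python
import pandas as pd

# Hypothetical data: one text column, one integer column, one float column.
df = pd.DataFrame(
    {"name": ["x", "y", "z"], "count": [1, 2, 3], "ratio": [0.5, 1.5, 2.5]},
    index=["r1", "r2", "r3"],
)

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.dtypes)   # each column keeps its own dtype
print(df.shape)    # (3, 3): three rows, three columns
```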
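Ways to create a DataFrame:
- A DataFrame can be created from the following types:
- a two-dimensional ndarray object
- a dictionary of one-dimensional ndarrays, lists, dictionaries, tuples, or Series
- a Series
- another DataFrame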
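The last two sources in the list are not demonstrated in the session that follows, so here is a minimal sketch of them. The variable names and values are assumptions chosen for illustration:

```python
import pandas as pd

# From a single Series: the Series becomes one column,
# and its name becomes the column label.
s = pd.Series([1, 2, 3], index=["a", "b", "c"], name="one")
from_series = pd.DataFrame(s)

# From another DataFrame: columns= keeps only the listed columns.
base = pd.DataFrame({"one": [1, 2, 3], "two": [9, 8, 7]}, index=["a", "b", "c"])
from_frame = pd.DataFrame(base, columns=["two"])

print(from_series)
print(from_frame)
```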
Creating from a two-dimensional ndarray object:
C:\Users\chenxuqi>python
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import numpy as np
>>> # Create a DataFrame from a two-dimensional ndarray object
... d = pd.DataFrame(np.arange(10).reshape(2,5))
>>> d
   0  1  2  3  4
0  0  1  2  3  4
1  5  6  7  8  9
>>>
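The example above uses the default integer labels on both axes. As a sketch of how explicit labels could be attached to the same array (the label names here are hypothetical):

```python
import numpy as np
import pandas as pd

arr = np.arange(10).reshape(2, 5)

# index= and columns= replace the default 0..n-1 integer labels.
d = pd.DataFrame(arr, index=["r0", "r1"], columns=list("abcde"))

print(d)
print(d.loc["r1", "c"])  # 7
```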
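Creating from a dictionary of Series objects; the data is aligned on the row and column indexes, and missing entries are filled in automatically: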
>>>
>>> dt = {'one':pd.Series([1,2,3],index=['a','b','c']),
... 'two':pd.Series([9,8,7,6],index=['a','b','c','d'])}
>>> dt
{'one': a    1
b    2
c    3
dtype: int64, 'two': a    9
b    8
c    7
d    6
dtype: int64}
>>> d = pd.DataFrame(dt)
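>>> # By default, the dictionary keys become the column labels
... # Each dictionary value becomes one column
... # Rows are aligned on the shared row index; missing data
... # is filled in automatically as NaN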
...
>>> d
   one  two
a  1.0    9
b  2.0    8
c  3.0    7
d  NaN    6
>>> # index= and columns= specify the row and column labels, respectively
... # Entries that do not exist in the data are filled in with NaN
...
>>> pd.DataFrame(dt,index=['b','c','d'],columns=['two','three'])
   two  three
b    8    NaN
c    7    NaN
d    6    NaN
>>>
>>>
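In the output above, column `one` shows `1.0` instead of `1`. A short sketch of why: NaN is a floating-point value, so a column that receives one is converted to float64. The `fillna`/`astype` step at the end is only one possible follow-up, not something the original session does:

```python
import pandas as pd

dt = {
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"]),
}
d = pd.DataFrame(dt)

print(d.index.tolist())  # union of both Series indexes
print(d.dtypes)          # 'one' becomes float64 to hold NaN; 'two' stays integer
print(d["one"].isna())   # True only for row 'd'

# One possible follow-up: fill the gap and restore an integer dtype.
filled = d.fillna(0).astype("int64")
print(filled)
```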
Creating from a dictionary of lists:
>>>
>>> # Create a DataFrame from a dictionary of lists
... dl = {'one':[1,2,3,4],'two':[9,8,7,6]}
>>> d = pd.DataFrame(dl,index=["a","b","c","d"])
>>> d
   one  two
a    1    9
b    2    8
c    3    7
d    4    6
>>>
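The dictionary values can also be one-dimensional ndarrays or tuples. Unlike Series values, plain sequences carry no index, so pandas cannot align or pad them, and unequal lengths raise an error. A minimal sketch, assuming the same small values as above (`mismatch_raised` is a name introduced here for illustration):

```python
import numpy as np
import pandas as pd

# Dict values may be 1-D ndarrays or tuples, as long as the lengths match.
d = pd.DataFrame(
    {"one": np.array([1, 2, 3, 4]), "two": (9, 8, 7, 6)},
    index=["a", "b", "c", "d"],
)
print(d)

# Plain sequences of unequal length are rejected rather than padded with NaN.
try:
    pd.DataFrame({"one": [1, 2, 3], "two": [9, 8, 7, 6]})
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)
```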
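Example with tabular data (the column labels 城市, 环比, 同比, and 定基 mean city, month-on-month index, year-on-year index, and fixed-base index):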
>>> dl = {\
... "城市":['北京','上海','广州','深圳','沈阳'],\
... "环比":[101.5,101.2,101.3,102.0,100.1],\
... "同比":[120.7,127.3,119.4,140.9,101.4],\
... "定基":[121.4,127.8,120.0,145.5,101.6],\
... }
>>> d = pd.DataFrame(dl,index=['c1','c2','c3','c4','c5'])
>>> d
    城市     环比     同比     定基
c1  北京  101.5  120.7  121.4
c2  上海  101.2  127.3  127.8
c3  广州  101.3  119.4  120.0
c4  深圳  102.0  140.9  145.5
c5  沈阳  100.1  101.4  101.6
>>> d.index
Index(['c1', 'c2', 'c3', 'c4', 'c5'], dtype='object')
>>> d.columns
Index(['城市', '环比', '同比', '定基'], dtype='object')
>>> d.values
array([['北京', 101.5, 120.7, 121.4],
       ['上海', 101.2, 127.3, 127.8],
       ['广州', 101.3, 119.4, 120.0],
       ['深圳', 102.0, 140.9, 145.5],
       ['沈阳', 100.1, 101.4, 101.6]], dtype=object)
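The array above has `dtype=object` because the text column and the float columns have to share one array. As a sketch, using a two-row subset of the same data, selecting only the numeric columns first yields a float array instead. `to_numpy()` is used here in place of `.values`:

```python
import pandas as pd

dl = {"城市": ["北京", "上海"], "环比": [101.5, 101.2], "同比": [120.7, 127.3]}
d = pd.DataFrame(dl, index=["c1", "c2"])

mixed = d.to_numpy()                     # text + floats -> dtype=object
numeric = d[["环比", "同比"]].to_numpy()  # floats only -> dtype=float64

print(mixed.dtype, numeric.dtype)
```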
>>> d['同比']
c1    120.7
c2    127.3
c3    119.4
c4    140.9
c5    101.4
Name: 同比, dtype: float64
>>> d.ix['c2'] # .ix was deprecated in pandas 0.20.0 and removed in 1.0.0; use .loc (labels) or .iloc (positions) instead
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python\Python37\lib\site-packages\pandas\core\generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'ix'
>>> d.loc['c2']
城市       上海
环比    101.2
同比    127.3
定基    127.8
Name: c2, dtype: object
>>> d.iloc[0:2,2:4]
       同比     定基
c1  120.7  121.4
c2  127.3  127.8
>>> d.loc['c2']['同比']
127.3
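The lookup above chains two indexing steps: it first extracts row `c2` as a Series and then picks one element from it. As a sketch on a two-row subset of the same data, a single value can also be fetched in one call by passing the row and the column together:

```python
import pandas as pd

d = pd.DataFrame(
    {"环比": [101.5, 101.2], "同比": [120.7, 127.3]},
    index=["c1", "c2"],
)

by_label = d.loc["c2", "同比"]  # row label and column label in one call
by_at = d.at["c2", "同比"]      # label lookup restricted to a single value
by_pos = d.iloc[1, 1]           # row position and column position
by_iat = d.iat[1, 1]            # positional lookup restricted to a single value

print(by_label, by_at, by_pos, by_iat)
```

All four expressions return the same value here. The single-call forms make it explicit which argument is the row and which is the column.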
>>> d.loc['同比']['c2']  # fails: .loc[...] looks up row labels first, and '同比' is a column label
Traceback (most recent call last):
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '同比'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1768, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 1965, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexing.py", line 625, in _get_label
    return self.obj._xs(label, axis=axis)
  File "D:\Python\Python37\lib\site-packages\pandas\core\generic.py", line 3537, in xs
    loc = self.index.get_loc(key)
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '同比'
>>> d['同比']['c2']
127.3
>>> d['c2']['同比']  # fails: d[...] looks up column labels, and 'c2' is a row label
Traceback (most recent call last):
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'c2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python\Python37\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "D:\Python\Python37\lib\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'c2'
>>>
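Both KeyError cases in the session come from looking up a label on the wrong axis: `d[...]` searches the column labels, while `d.loc[...]` with a single key searches the row labels. One way to guard against this, sketched on a two-row subset of the same data, is to check membership on each axis before the lookup. The helper `lookup` is hypothetical and is not part of pandas:

```python
import pandas as pd

d = pd.DataFrame(
    {"环比": [101.5, 101.2], "同比": [120.7, 127.3]},
    index=["c1", "c2"],
)

def lookup(frame, row, col):
    """Hypothetical helper: return frame.loc[row, col], or None if a label is missing."""
    if row in frame.index and col in frame.columns:
        return frame.loc[row, col]
    return None

print(lookup(d, "c2", "同比"))  # 127.3
print(lookup(d, "同比", "c2"))  # None: the labels are on the wrong axes
```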