3.1 Pandas
3.1.1 Series
class pandas.Series(data = None, index = None, dtype = None, name = None, copy = False, fastpath = False)
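- data: the input data
- index: the index labels
- dtype: the data type; inferred from the data by default
- name: the name of the Series
- copy: whether to copy the input data; defaults to False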
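As a minimal sketch of how these parameters fit together, the snippet below passes `dtype` and `name` explicitly. The values, the labels, and the name "scores" are made up for illustration and do not come from the original notes.

```python
import pandas as pd

# Illustrative values only; "scores" and the labels x/y/z are assumptions for this sketch.
s = pd.Series([1, 2, 3], index=["x", "y", "z"], dtype="float64", name="scores")

print(s.dtype)  # the dtype requested above, instead of the inferred integer type
print(s.name)   # the name attached to the Series
print(s["y"])   # value looked up by its index label
```

Leaving `dtype` out would let pandas infer an integer type from the list.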
# Create a Series object from a list
import pandas as pd
a = ["Google", "Runoob", "Wiki"]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
# Create a Series from a dictionary (key-value pairs)
import pandas as pd
sites = {1: "Google", 2: "Runoob", 3: "Wiki"}
myvar = pd.Series(sites)
print(myvar)
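When a dictionary is combined with an explicit `index`, pandas keeps only the keys listed in `index` and fills labels that are missing from the dictionary with NaN. The sketch below reuses the `sites` dictionary; the label 4 is a hypothetical key added only to show the missing-value case.

```python
import pandas as pd

sites = {1: "Google", 2: "Runoob", 3: "Wiki"}

# Keep only the entries whose keys appear in index.
subset = pd.Series(sites, index=[1, 2])

# Label 4 is not a key of the dictionary, so its value becomes NaN.
with_missing = pd.Series(sites, index=[1, 4])

print(subset)
print(with_missing)
```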
Note: all of the following are code examples run in Jupyter.
In [1]:
import pandas as pd # Import the pandas library
ser_obj = pd.Series([1, 2, 3, 4, 5]) # Create a Series object
ser_obj
Out[1]:
0 1
1 2
2 3
3 4
4 5
dtype: int64
In [2]:
# Create a Series object with an explicit index
ser_obj = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
ser_obj
Out[2]:
a 1
b 2
c 3
d 4
e 5
dtype: int64
In [3]:
year_data = {2001: 17.8, 2002: 20.1, 2003: 16.5}
ser_obj2 = pd.Series(year_data) # Create a Series object from a dictionary
ser_obj2
Out[3]:
2001 17.8
2002 20.1
2003 16.5
dtype: float64
In [4]:
ser_obj.index # Get the index of ser_obj
Out[4]:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
In [5]:
ser_obj.values # Get the data of ser_obj
Out[5]:
array([1, 2, 3, 4, 5], dtype=int64)
In [5]:
ser_obj[3] # Get the value at position 3
Out[5]:
4
In [6]:
ser_obj * 2
Out[6]:
a 2
b 4
c 6
d 8
e 10
dtype: int64
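The cells above mix label-based and position-based access. The sketch below separates the two with `.loc` (labels) and `.iloc` (positions), which stays unambiguous when the index holds strings; newer pandas releases discourage relying on plain `ser_obj[3]` falling back to positions. It also shows that arithmetic and comparisons apply element by element while keeping the index. The variable names are illustrative.

```python
import pandas as pd

ser_obj = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

by_label = ser_obj.loc["d"]      # look up by index label
by_position = ser_obj.iloc[3]    # look up by integer position (0-based)

doubled = ser_obj * 2            # element-wise arithmetic, index is preserved
filtered = ser_obj[ser_obj > 3]  # boolean mask keeps only matching rows

print(by_label, by_position)
print(doubled)
print(filtered)
```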
3.1.2 DataFrame
pandas.DataFrame(data, index, columns, dtype, copy)
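- data: the input data (ndarray, Series, map, lists, dict, and similar types)
- index: the index values, also called row labels
- columns: the column labels; defaults to RangeIndex (0, 1, 2, …, n)
- dtype: the data type
- copy: whether to copy the input data; defaults to False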
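A short sketch of how `data`, `index`, and `columns` interact when `data` is a dictionary: the keys become column labels, `index` supplies the row labels, and `columns` selects and orders the columns. The row labels "row1" and "row2" are hypothetical names chosen for this example.

```python
import pandas as pd

# Dictionary keys become column labels; each list holds one column's values.
data = {"No1": ["a", "d"], "No2": ["b", "e"], "No3": ["c", "f"]}

# index sets the row labels; columns picks a subset of the dictionary keys.
df = pd.DataFrame(data, index=["row1", "row2"], columns=["No1", "No3"])

print(df)
print(df.loc["row2", "No3"])  # access by row label and column label
```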
In [7]:
import numpy as np
import pandas as pd
# Create an array
demo_arr = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])
df_obj = pd.DataFrame(demo_arr) # Create a DataFrame object from the array
df_obj
Out[7]:
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | a | b | c |
| 1 | d | e | f |
In [8]:
# Create a DataFrame object with explicit column labels
df_obj1 = pd.DataFrame(demo_arr, columns=['No1', 'No2', 'No3'])
df_obj1
Out[8]:
|   | No1 | No2 | No3 |
|---|---|---|---|
| 0 | a | b | c |
| 1 | d | e | f |
In [10]:
element = df_obj1['No2'] # Get one column of data by its column label
element
Out[10]:
0 b
1 e
Name: No2, dtype: object
In [11]:
type(element) # Check the type of the returned result
Out[11]:
pandas.core.series.Series
In [11]:
element = df_obj1.No2 # Get the column data through attribute access
element
Out[11]:
0 b
1 e
Name: No2, dtype: object
In [12]:
type(element) # Check the type of the returned result
Out[12]:
pandas.core.series.Series
In [13]:
df_obj1['No4'] = ['g', 'h']
df_obj1
Out[13]:
|   | No1 | No2 | No3 | No4 |
|---|---|---|---|---|
| 0 | a | b | c | g |
| 1 | d | e | f | h |
In [14]:
del df_obj1['No4']
df_obj1
Out[14]:
|   | No1 | No2 | No3 |
|---|---|---|---|
| 0 | a | b | c |
| 1 | d | e | f |
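Column assignment and `del` both modify the DataFrame in place. As an alternative sketch, `assign` and `drop` return new DataFrames and leave the original untouched, which can be easier to follow in longer notebooks. The variable names below are illustrative and not part of the original notes.

```python
import pandas as pd

df_obj1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]],
                       columns=["No1", "No2", "No3"])

# assign returns a new DataFrame with the extra column; df_obj1 is unchanged.
with_no4 = df_obj1.assign(No4=["g", "h"])

# drop returns a new DataFrame without the named column.
without_no4 = with_no4.drop(columns="No4")

print(with_no4)
print(without_no4)
```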
3.2 Index Operations and Advanced Indexing
3.2.1 Index Objects
In [15]:
import pandas as pd
ser_obj = pd.Series(range(5), index=['a','b','c','d','e'])
ser_index = ser_obj.index
ser_index
Out[15]:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
In [16]:
ser_obj
Out[16]:
a 0
b 1
c 2
d 3
e 4
dtype: int64
In [17]:
ser_index['2'] = 'cc' # (Uncomment this line when running to see the error message)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-17-3d779dc501cd> in <module>
----> 1 ser_index['2'] = 'cc' # (Uncomment this line when running to see the error message)
c:\users\dell\anaconda3\envs\pyg\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
4082
4083 def __setitem__(self, key, value):
-> 4084 raise TypeError("Index does not support mutable operations")
4085
4086 def __getitem__(self, key):
TypeError: Index does not support mutable operations
In [22]:
ser_obj1 = pd.Series(range(3), index=['a','b','c'])
ser_obj2 = pd.Series(['a','b','c'], index=ser_obj1.index)
ser_obj2.index is ser_obj1.index
Out[22]:
True
In [19]:
ser_obj1
Out[19]:
a 0
b 1
c 2
dtype: int64
In [20]:
ser_obj2
Out[20]:
a a
b b
c c
dtype: object
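Because an Index cannot be modified element by element, changing a label means replacing the index as a whole or using `rename`. The sketch below catches the TypeError shown in the traceback above and then demonstrates both alternatives; the replacement label 'cc' follows the earlier cell, and the other names are illustrative.

```python
import pandas as pd

ser_obj = pd.Series(range(5), index=["a", "b", "c", "d", "e"])

# Item assignment on an Index raises TypeError.
try:
    ser_obj.index[2] = "cc"
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Option 1: assign a complete new list of labels to the Series.
ser_obj.index = ["a", "b", "cc", "d", "e"]

# Option 2: rename returns a new Series with the mapped labels changed.
renamed = ser_obj.rename(index={"cc": "c"})

print(ser_obj.index)
print(renamed.index)
```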
3.2.2 Reindexing
In [23]:
import pandas as pd
ser_obj = pd.Series([1, 2, 3, 4, 5], index=['c', 'd', 'a', 'b', 'e'])
ser_obj
Out[23]:
c 1
d 2
a 3
b 4
e 5
dtype: int64
In [24]:
# Reindex
ser_obj2 = ser_obj.reindex(['a', 'b', 'c', 'd', 'e', 'f'])
ser_obj2
Out[24]:
a 3.0
b 4.0
c 1.0
d 2.0
e 5.0
f NaN
dtype: float64
In [21]:
# Reindex and specify the value used to fill missing entries
ser_obj2 = ser_obj.reindex(['a', 'b', 'c', 'd', 'e', 'f'], fill_value = 6)
ser_obj2
Out[21]:
a 3
b 4
c 1
d 2
e 5
f 6
dtype: int64
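The two outputs above differ in dtype: without `fill_value`, the new label 'f' gets NaN and the integer data is upcast to float64, while an integer `fill_value` lets the result stay an integer type. The sketch below checks this directly and also confirms that `reindex` returns a new Series instead of changing the original; the variable names are illustrative.

```python
import pandas as pd

ser_obj = pd.Series([1, 2, 3, 4, 5], index=["c", "d", "a", "b", "e"])
labels = ["a", "b", "c", "d", "e", "f"]

with_nan = ser_obj.reindex(labels)                 # 'f' is missing, so it becomes NaN
with_fill = ser_obj.reindex(labels, fill_value=6)  # 'f' is filled with 6 instead

print(with_nan.dtype)          # float, because NaN cannot be stored in an integer Series
print(with_fill.dtype)         # integer, because no NaN was introduced
print(ser_obj.index.tolist())  # the original Series keeps its own order
```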
In [25]:
# Create a Series object and specify its index
ser_obj3 = pd.Series([1, 3, 5, 7], index=[0, 2, 4, 6])
ser_obj3
Out[25]:
0 1
2 3
4 5
6 7
dtype: int64
In [27]:
ser_obj3.reindex(range(6)