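I. Object Creation
1. Creating a Series
A Series is a one-dimensional array-like object made up of a sequence of data and an associated set of data labels (its index).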
Creating a Series by passing a list of values, letting pandas create a default integer index:
import numpy as np
import pandas as pd
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
0 1.0
1 3.0
2 5.0
3 NaN
4 6.0
5 8.0
dtype: float64
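As a small sketch of the data-plus-labels structure described above, the following passes an explicit index instead of relying on the default integer one. The labels 'a' to 'c' and the name s_labeled are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Illustrative labels; any hashable values could serve as the index.
s_labeled = pd.Series([1.0, np.nan, 6.0], index=['a', 'b', 'c'])

labels = list(s_labeled.index)   # the data labels (index)
values = s_labeled.to_numpy()    # the underlying data as a NumPy array

print(labels)
print(values)
print(s_labeled['c'])            # look up a value by its label
```

The index and the data stay aligned, so a value can be read either by position in the array or by its label.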
2. Creating a DataFrame
Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:
dates = pd.date_range('20130101', periods=6)
dates
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06'],
dtype='datetime64[ns]', freq='D')
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df
A B C D
2013-01-01 0.469112 -0.282863 -1.509059 -1.135632
2013-01-02 1.212112 -0.173215 0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929 1.071804
2013-01-04 0.721555 -0.706771 -1.039575 0.271860
2013-01-05 -0.424972 0.567020 0.276232 -1.087401
2013-01-06 -0.673690 0.113648 -1.478427 0.524988
Creating a DataFrame by passing a dict of objects that can be converted to a series-like structure:
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})
df2
A B C D E F
0 1.0 2013-01-02 1.0 3 test foo
1 1.0 2013-01-02 1.0 3 train foo
2 1.0 2013-01-02 1.0 3 test foo
3 1.0 2013-01-02 1.0 3 train foo
The columns of the resulting DataFrame have different dtypes.
df2.dtypes
A float64
B datetime64[ns]
C float32
D int32
E category
F object
dtype: object
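Because each column carries its own dtype, columns can also be picked by type. The sketch below rebuilds df2 and uses DataFrame.select_dtypes; the names numeric_cols and category_cols are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    'A': 1.0,
    'B': pd.Timestamp('20130102'),
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(['test', 'train', 'test', 'train']),
    'F': 'foo',
})

# Keep only the numeric columns (the float and integer ones here).
numeric_cols = df2.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# Keep only the categorical column.
category_cols = df2.select_dtypes(include='category').columns.tolist()
print(category_cols)
```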
3. Basic DataFrame attributes and overview queries
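The original gives no example under this heading. The attributes and summary methods below are one plausible reading of what it covers; the frame df_demo and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with fixed values so the results are reproducible.
df_demo = pd.DataFrame(
    np.arange(12, dtype='float64').reshape(3, 4),
    index=pd.date_range('20130101', periods=3),
    columns=list('ABCD'),
)

print(df_demo.shape)   # (rows, columns)
print(df_demo.ndim)    # number of axes
print(df_demo.size)    # total number of cells
print(df_demo.dtypes)  # dtype of each column

df_demo.info()                 # index range, non-null counts, memory usage
summary = df_demo.describe()   # count, mean, std, min, quartiles, max per column
print(summary)
```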
II. Viewing Data
df.head()  # first 5 rows by default
A B C D
2013-01-01 0.469112 -0.282863 -1.509059 -1.135632
2013-01-02 1.212112 -0.173215 0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929 1.071804
2013-01-04 0.721555 -0.706771 -1.039575 0.271860
2013-01-05 -0.424972 0.567020 0.276232 -1.087401
df.tail(3)
A B C D
2013-01-04 0.721555 -0.706771 -1.039575 0.271860
2013-01-05 -0.424972 0.567020 0.276232 -1.087401
2013-01-06 -0.673690 0.113648 -1.478427 0.524988
Display the index and the columns:
df.index
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06'],
dtype='datetime64[ns]', freq='D')
df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
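The df used in this section is filled with unseeded random draws, so rerunning the code gives different numbers from the ones printed above. A sketch of the same viewing calls on a reproducible frame follows; the seed 0 and the names df_seeded, first_two and last_two are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed chosen arbitrarily for this sketch
dates = pd.date_range('20130101', periods=6)
df_seeded = pd.DataFrame(rng.standard_normal((6, 4)),
                         index=dates, columns=list('ABCD'))

first_two = df_seeded.head(2)   # first 2 rows instead of the default 5
last_two = df_seeded.tail(2)    # last 2 rows

print(first_two)
print(last_two)
print(df_seeded.index[0], df_seeded.index[-1])  # first and last row labels
print(list(df_seeded.columns))                  # column labels as a list
```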
III. Applying NumPy Array Operations to Series
1. NumPy operations and functions work on Series
In [26]: series04
Out[26]:
20071001 6789.98
20071002 34556.89
20071003 3748758.88
dtype: float64
In [27]: series04[series04>10000]
Out[27]:
20071002 34556.89
20071003 3748758.88
dtype: float64
In [28]: series04/100
Out[28]:
20071001 67.8998
20071002 345.5689
20071003 37487.5888
dtype: float64
In [29]: series01
Out[29]:
0 1
1 2
2 3
3 4
dtype: int32
In [30]: np.exp(series01)
Out[30]:
0 2.718282
1 7.389056
2 20.085537
3 54.598150
dtype: float64
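The transcript above never shows how series04 and series01 were built. The sketch below assumes definitions that match the printed values (string index labels are a guess) and repeats the three operations so that they can be run end to end.

```python
import numpy as np
import pandas as pd

# Assumed definitions: the original shows only the printed values.
series04 = pd.Series([6789.98, 34556.89, 3748758.88],
                     index=['20071001', '20071002', '20071003'])
series01 = pd.Series([1, 2, 3, 4])

large = series04[series04 > 10000]  # boolean mask keeps the matching labels
scaled = series04 / 100             # element-wise arithmetic
exp_vals = np.exp(series01)         # a NumPy ufunc returns a Series

print(large)
print(scaled)
print(exp_vals)
```

In each case the result is still a Series, and the index labels travel with the values.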
2. A Series behaves much like a Python dict
• Access by custom index label
• Membership tests with the in keyword
• The .get() method
In [18]: b=pd.Series([9,8,7,6],index=list('abcd'))
In [19]: b['b']
Out[19]: 8
In [20]: 'c' in b
Out[20]: True
In [23]: b.get('f',100)
Out[23]: 100
In [25]: b.get('c',100)
Out[25]: 7
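One detail worth spelling out is that, as with a dict, in checks the labels (keys) rather than the values. A minimal sketch reusing the Series b from above; the variable names on the left-hand side are introduced here for illustration.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=list('abcd'))

has_label = 'c' in b          # 'c' is an index label
has_value = 9 in b            # 9 is a value, not a label
value_found = 9 in b.values   # test the values explicitly

missing = b.get('f')          # no default given, so None comes back
as_dict = b.to_dict()         # labels become keys, data become values

print(has_label, has_value, value_found)
print(missing)
print(as_dict)
```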
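3. Detecting missing values in a Series
pandas provides the isnull and notnull functions. First, build a Series that contains a missing value: re-indexing score with a label it does not have ('Joe') produces NaN.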
In [31]: score=pd.Series({'Tom':89,'John':88,'Merry':96,'Max':65})
In [32]: score
Out[32]:
Tom 89
John 88
Merry 96
Max 65
dtype: int64
In [33]: new_index=['Tom','Max','Joe','John','Merry']
In [34]: scores = pd.Series(score,index=new_index)
In [35]: scores
Out[35]:
Tom 89.0
Max 65.0
Joe NaN
John 88.0
Merry 96.0
dtype: float64
The pandas functions isnull and notnull can be used to detect missing values in a Series.
Both isnull and notnull return a boolean Series.
In [37]: pd.isnull(scores)
Out[37]:
Tom False
Max False
Joe True
John False
Merry False
dtype: bool
In [38]: pd.notnull(scores)
Out[38]:
Tom True
Max True
Joe False
John True
Merry True
dtype: bool
In [39]: scores[pd.isnull(scores)]
Out[39]:
Joe NaN
dtype: float64
In [40]: scores[pd.notnull(scores)]
Out[40]:
Tom 89.0
Max 65.0
John 88.0
Merry 96.0
dtype: float64
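The same checks are also available as Series methods, and a common next step is to count, drop or fill the missing entries. The sketch below is one way to do that; the fill value 0 is an arbitrary placeholder, and reindex is used here as an equivalent of the pd.Series(score, index=new_index) call above.

```python
import pandas as pd

score = pd.Series({'Tom': 89, 'John': 88, 'Merry': 96, 'Max': 65})
scores = score.reindex(['Tom', 'Max', 'Joe', 'John', 'Merry'])

n_missing = int(scores.isna().sum())   # count the missing entries
present = scores.dropna()              # same rows as scores[pd.notnull(scores)]
filled = scores.fillna(0)              # 0 is an arbitrary placeholder value

print(n_missing)
print(present)
print(filled)
```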