Basic NumPy operations:
1. Creating arrays:
In [1]: import numpy as np
In [2]: narr = np.arange(0,10,1)
In [3]: narr
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [4]: narr1 = np.arange(0,1,0.1)
In [5]: narr1
Out[5]: array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
In [6]: narr2 = np.linspace(0,1,10,endpoint=False)
In [7]: narr2
Out[7]: array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
In [8]: narr = np.array([1,2,3])
In [9]: narr
Out[9]: array([1, 2, 3])
In [10]: narr = np.array((1,2,3))
In [11]: narr
Out[11]: array([1, 2, 3])
In [12]: narr = np.array([[1,2,3], [4,5,6]])
In [13]: narr
Out[13]:
array([[1, 2, 3],
       [4, 5, 6]])
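In [14]: narr3 = np.array([narr,narr1], dtype=object)  # the shapes differ, so dtype=object must be explicit (an error without it since NumPy 1.24)
In [15]: narr3
Out[15]:
array([array([[1, 2, 3],
              [4, 5, 6]]),
       array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])],
      dtype=object)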
Other ways to create arrays:
In [17]: np.zeros((2,3))
Out[17]:
array([[0., 0., 0.],
       [0., 0., 0.]])
In [18]: np.ones((2,3))
Out[18]:
array([[1., 1., 1.],
       [1., 1., 1.]])
In [19]: np.eye(3)
Out[19]:
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
2. Array attributes:
In [20]: narr
Out[20]:
array([[1, 2, 3],
       [4, 5, 6]])
In [21]: narr.dtype
Out[21]: dtype('int64')
In [22]: narr1 = np.array([1,2,3,4], dtype='float64')
In [23]: narr1.dtype
Out[23]: dtype('float64')
# Custom (structured) dtype:
In [26]: student = np.dtype([('姓名', 'str', 5),('学号', 'int8'),('绩点','float')])
In [27]: classx = np.array([('小明','1','3.0'),('小红','2','2.88')],dtype=student)
In [28]: classx
Out[28]:
array([('小明', 1, 3.  ), ('小红', 2, 2.88)],
      dtype=[('姓名', '<U5'), ('学号', 'i1'), ('绩点', '<f8')])
In [29]: classx[0]
Out[29]: ('小明', 1, 3.)
In [30]: classx[0][0]
Out[30]: '小明'
In [33]: np.ones((3,5))
Out[33]:
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
In [34]: np.ones((3,5)).reshape(1,15)
Out[34]: array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
3. Array operations:
In [8]: na
Out[8]: array(10)
In [9]: na = np.arange(10)
In [10]: na
Out[10]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [11]: na[::-1]
Out[11]: array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
In [12]: mul = np.array([[1,2,3,4], [2,3,4,5]])
In [13]: mul
Out[13]:
array([[1, 2, 3, 4],
       [2, 3, 4, 5]])
In [14]: mul[0,:]
Out[14]: array([1, 2, 3, 4])
In [15]: mul[0:2,0:4]
Out[15]:
array([[1, 2, 3, 4],
       [2, 3, 4, 5]])
In [16]: mul[0:2,0:2]  # rows come before the comma, columns after it
Out[16]:
array([[1, 2],
       [2, 3]])
In [17]: mul[0:2,0:3]  # a slice includes its start and excludes its end: 0:2 selects rows 0 and 1
Out[17]:
array([[1, 2, 3],
       [2, 3, 4]])
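Arrays also support element-wise arithmetic: addition, subtraction, multiplication, division, and exponentiation (+, -, *, /, **).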
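A minimal sketch of these operators on two small arrays. The arrays `a` and `b` are made up for illustration; every operation is applied element by element.

```python
import numpy as np

# Two illustrative arrays of the same shape
a = np.array([1, 2, 3, 4])
b = np.array([2, 3, 4, 5])

total = a + b    # element-wise addition
diff = b - a     # element-wise subtraction
prod = a * b     # element-wise multiplication (not a matrix product)
quot = b / a     # true division, the result is a float array
power = a ** 2   # element-wise exponentiation

print(total)   # [3 5 7 9]
print(diff)    # [1 1 1 1]
print(prod)    # [ 2  6 12 20]
print(quot)
print(power)   # [ 1  4  9 16]
```

Note that `*` multiplies matching elements; for a matrix product NumPy provides the `@` operator and `np.dot`.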
pandas operations:
1. One-dimensional Series operations:
In [1]: import pandas as pd
In [2]: from pandas import Series, DataFrame
In [3]: Series([1,2,3])  # the first column is the index, which starts at 0
Out[3]:
0 1
1 2
2 3
dtype: int64
In [4]: Series((1,2,3))
Out[4]:
0 1
1 2
2 3
dtype: int64
In [5]: Series((1,2,3),index=['a','b','c'])
Out[5]:
a 1
b 2
c 3
dtype: int64
# Creating a Series from a dictionary
In [6]: a = {'a':1,'b':2}
In [7]: Series(a)
Out[7]:
a 1
b 2
dtype: int64
In [8]: a.keys()
Out[8]: dict_keys(['a', 'b'])
# Modifying attributes
In [11]: s1 = Series(a)
In [12]: s1
Out[12]:
a 1
b 2
dtype: int64
In [13]: s1.index
Out[13]: Index(['a', 'b'], dtype='object')
In [14]: s1.index = [0,1]
In [15]: s1
Out[15]:
0 1
1 2
dtype: int64
In [16]: s1.index.name = "Index"
In [17]: s1
Out[17]:
Index
0 1
1 2
dtype: int64
# Indexing:
In [22]: s1
Out[22]:
Index
0 1
1 2
dtype: int64
In [23]: s1[0]
Out[23]: 1
In [24]: s1[0:2]
Out[24]:
Index
0 1
1 2
dtype: int64
# Converting to other data structures
In [32]: s1
Out[32]:
Index
0 1
1 2
dtype: int64
In [33]: s1.to_string()
Out[33]: 'Index\n0 1\n1 2'
In [34]: s1.to_dict()
Out[34]: {0: 1, 1: 2}
In [35]: s1.tolist()
Out[35]: [1, 2]
In [36]: s1.to_json()
Out[36]: '{"0":1,"1":2}'
In [37]: s1.to_frame()
Out[37]:
0
Index
0 1
1 2
In [38]: s1.to_csv()
Out[38]: '0,1\n1,2\n'
2. Two-dimensional DataFrame operations:
In [39]: d = {'a':[1,2,3],'b':[4,5,6]}
In [40]: df = DataFrame(d)
In [41]: df
Out[41]:
a b
0 1 4
1 2 5
2 3 6
In [42]: df.columns = ['b','a']
In [43]: df
Out[43]:
b a
0 1 4
1 2 5
2 3 6
In [44]: df.columns.name = 'T'
In [45]: df.index.name = 'I'
In [46]: df
Out[46]:
T b a
I
0 1 4
1 2 5
2 3 6
# Indexing
In [47]: df.a
Out[47]:
I
0 4
1 5
2 6
Name: a, dtype: int64
In [48]: df['a']
Out[48]:
I
0 4
1 5
2 6
Name: a, dtype: int64
In [49]: df.columns
Out[49]: Index(['b', 'a'], dtype='object', name='T')
In [50]: df.columns[:2]
Out[50]: Index(['b', 'a'], dtype='object', name='T')
In [51]: df[df.columns[:2]]
Out[51]:
T b a
I
0 1 4
1 2 5
2 3 6
Also available: df.to_json(), df.to_dict(), df.to_latex(), and so on. Unlike Series, DataFrame has no tolist() method.
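If a plain Python list is needed from a DataFrame, one possible workaround is to go through the underlying NumPy array or through a single column. This is a sketch, assuming a small frame like the one above; the names `rows` and `col_a` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'b': [1, 2, 3], 'a': [4, 5, 6]})

# Whole frame: convert to a NumPy array first, then to a nested list (one inner list per row)
rows = df.to_numpy().tolist()

# Single column: a column is a Series, which does have tolist()
col_a = df['a'].tolist()

print(rows)    # [[1, 4], [2, 5], [3, 6]]
print(col_a)   # [4, 5, 6]
```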
In [69]: df
Out[69]:
T b a
a 1 4
b 2 5
c 3 6
In [70]: df.loc['a':'c']
Out[70]:
T b a
a 1 4
b 2 5
c 3 6
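Label slicing with `loc` includes both endpoints, which differs from the start-inclusive, end-exclusive slicing shown for NumPy arrays earlier. The sketch below contrasts it with position-based `iloc` on a frame rebuilt to match the one above.

```python
import pandas as pd

df = pd.DataFrame({'b': [1, 2, 3], 'a': [4, 5, 6]}, index=['a', 'b', 'c'])

# Label-based slice: both 'a' and 'b' are included
by_label = df.loc['a':'b']

# Position-based slice: position 0 is included, position 1 is excluded
by_position = df.iloc[0:1]

print(len(by_label))      # 2
print(len(by_position))   # 1
```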
Boolean arrays and applying functions
In [76]: df
Out[76]:
T b a c
a 1 4 5
b 2 5 6
c 3 6 7
In [77]: df.columns = ['a','b','c']
In [78]: df
Out[78]:
a b c
a 1 4 5
b 2 5 6
c 3 6 7
In [79]: df['a'] >= 2
Out[79]:
a False
b True
c True
Name: a, dtype: bool
In [80]: df[df['a'] >= 2]
Out[80]:
a b c
b 2 5 6
c 3 6 7
In [81]: df.query("a>=2 and b>=5")
Out[81]:
a b c
b 2 5 6
c 3 6 7
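The `query` call above can also be written with boolean masks. A minimal sketch, assuming the same three-column frame: conditions are combined with `&` (or `|`), and each condition needs its own parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [5, 6, 7]},
                  index=['a', 'b', 'c'])

# Combine two boolean Series element-wise; Python's `and` would not work here
mask = (df['a'] >= 2) & (df['b'] >= 5)
filtered = df[mask]

# The same filter written as a query string
queried = df.query("a >= 2 and b >= 5")

print(filtered.equals(queried))   # True
```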
# Applying functions
In [88]: df
Out[88]:
a b c
0 1 4 5
1 2 5 6
2 3 6 7
In [89]: df.sum()
Out[89]:
a 6
b 15
c 18
dtype: int64
In [90]: df.sum(axis=0)
Out[90]:
a 6
b 15
c 18
dtype: int64
In [92]: df.sum(axis=1)
Out[92]:
0 10
1 13
2 16
dtype: int64
In [93]: df.mean()
Out[93]:
a 2.0
b 5.0
c 6.0
dtype: float64
In [94]: df.max()
Out[94]:
a 3
b 6
c 7
dtype: int64
In [95]: df.min()
Out[95]:
a 1
b 4
c 5
dtype: int64
There are also df.describe() and similar summary methods. Series additionally supports map, and DataFrame supports apply, applymap, and so on.
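A short sketch of how these three methods differ, using the same small frame. The lambdas are illustrative only. One caveat: pandas 2.1 renamed `DataFrame.applymap` to `DataFrame.map` and deprecated the old name, so the example picks whichever one the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [5, 6, 7]})

# Series.map: transform each element of one column
doubled = df['a'].map(lambda x: x * 2)

# DataFrame.apply: call a function once per column (axis=0, the default) ...
col_range = df.apply(lambda col: col.max() - col.min())
# ... or once per row (axis=1)
row_total = df.apply(lambda row: row.sum(), axis=1)

# Element-wise over the whole frame: DataFrame.map on pandas >= 2.1, applymap before that
if hasattr(df, "map"):
    squared = df.map(lambda x: x ** 2)
else:
    squared = df.applymap(lambda x: x ** 2)

print(doubled.tolist())     # [2, 4, 6]
print(col_range.tolist())   # [2, 2, 2]
print(row_total.tolist())   # [10, 13, 16]
print(squared['a'].tolist())  # [1, 4, 9]
```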
In [1]: from pandas import DataFrame
In [2]: df = DataFrame({'a':[1,2,3,None],'b':[1,None,4,5]})
In [3]: df
Out[3]:
a b
0 1.0 1.0
1 2.0 NaN
2 3.0 4.0
3 NaN 5.0
In [4]: df.isnull()
Out[4]:
a b
0 False False
1 False True
2 False False
3 True False
In [5]: df.isnull().sum()
Out[5]:
a 1
b 1
dtype: int64
In [6]: df.fillna('missing')
Out[6]:
a b
0 1 1
1 2 missing
2 3 4
3 missing 5
In [7]: df.fillna(df.mean())
Out[7]:
a b
0 1.0 1.000000
1 2.0 3.333333
2 3.0 4.000000
3 2.0 5.000000
Encoding problems:
from ftfy import fix_text
fix_text(data.text)