A Quick-and-Dirty Look at NumPy and pandas

指缝间的_阳光

Article link: https://blog.csdn.net/weixin_42315169/article/details/99612443

Basic NumPy operations:

1. Creating arrays:

  In [1]: import numpy as np

  In [2]: narr = np.arange(0,10,1)

  In [3]: narr
  Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

  In [4]: narr1 = np.arange(0,1,0.1)

  In [5]: narr1
  Out[5]: array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

  In [6]: narr2 = np.linspace(0,1,10,endpoint=False)

  In [7]: narr2
  Out[7]: array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
  
  In [8]: narr = np.array([1,2,3])

  In [9]: narr
  Out[9]: array([1, 2, 3])

  In [10]: narr = np.array((1,2,3))

  In [11]: narr
  Out[11]: array([1, 2, 3])

  In [12]: narr = np.array([[1,2,3], [4,5,6]])

  In [13]: narr
  Out[13]: 
  array([[1, 2, 3],
         [4, 5, 6]])

  In [14]: narr3 = np.array([narr, narr1], dtype=object)    # ragged input needs an explicit dtype=object on NumPy >= 1.24

  In [15]: narr3
  Out[15]: 
  array([array([[1, 2, 3],
         [4, 5, 6]]),
         array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])],
        dtype=object)
      
      
  Other creation helpers:
  In [17]: np.zeros((2,3))
  Out[17]: 
  array([[0., 0., 0.],
         [0., 0., 0.]])

  In [18]: np.ones((2,3))
  Out[18]: 
  array([[1., 1., 1.],
         [1., 1., 1.]])

  In [19]: np.eye(3)
  Out[19]: 
  array([[1., 0., 0.],
         [0., 1., 0.],
         [0., 0., 1.]])
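
The sessions above build the same ten values twice, once with np.arange and a float step and once with np.linspace. A minimal sketch of why the second form is often easier to reason about: with linspace the element count is stated outright, whereas with a float step it depends on rounding. The variable names here are illustrative only.

```python
import numpy as np

# Step-based: the number of elements follows from (stop - start) / step,
# which is subject to floating-point rounding for non-integer steps.
by_step = np.arange(0, 1, 0.1)

# Count-based: exactly 10 points, with the endpoint 1.0 left out.
by_count = np.linspace(0, 1, 10, endpoint=False)

same_shape = by_step.shape == by_count.shape
same_values = np.allclose(by_step, by_count)
print(same_shape, same_values)
```

np.allclose is used instead of == because the two arrays may differ in the last bits of some elements.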

2. Array attributes:

  In [20]: narr
  Out[20]: 
  array([[1, 2, 3],
         [4, 5, 6]])

  In [21]: narr.dtype
  Out[21]: dtype('int64')

  In [22]: narr1 = np.array([1,2,3,4], dtype='float64')

  In [23]: narr1.dtype
  Out[23]: dtype('float64')
  
  # Custom structured dtype (fields: 姓名 = name, 学号 = student ID, 绩点 = GPA):
  In [26]: student = np.dtype([('姓名', 'str', 5),('学号', 'int8'),('绩点','float')])

  In [27]: classx = np.array([('小明','1','3.0'),('小红','2','2.88')],dtype=student)

  In [28]: classx
  Out[28]: 
  array([('小明', 1, 3.  ), ('小红', 2, 2.88)],
        dtype=[('姓名', '<U5'), ('学号', 'i1'), ('绩点', '<f8')])

  In [29]: classx[0]
  Out[29]: ('小明', 1, 3.)

  In [30]: classx[0][0]
  Out[30]: '小明'
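
Besides indexing a record by position, a structured array can be indexed by field name. The sketch below rebuilds the same two records, writing the dtype with the explicit type codes shown in the output above ('U5', 'i1', 'f8'); the names `names` and `top` are made up for illustration.

```python
import numpy as np

# Same fields as above: name, student ID, GPA.
student = np.dtype([('姓名', 'U5'), ('学号', 'i1'), ('绩点', 'f8')])
classx = np.array([('小明', 1, 3.0), ('小红', 2, 2.88)], dtype=student)

# Indexing with a field name returns that field for every record.
names = classx['姓名']

# A field can also drive a boolean mask over the records.
top = classx[classx['绩点'] > 2.9]

print(names)
print(top)
```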
  
  
  In [33]: np.ones((3,5))
  Out[33]: 
  array([[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]])

  In [34]: np.ones((3,5)).reshape(1,15)
  Out[34]: array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
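
As a small extension of the reshape call above, one dimension can be given as -1 so that NumPy infers it from the total number of elements. This is a sketch with illustrative variable names, not something taken from the original session.

```python
import numpy as np

grid = np.ones((3, 5))          # 15 elements in total

flat = grid.reshape(-1)         # -1 is inferred as 15
tall = grid.reshape(5, -1)      # the second dimension is inferred as 3

print(flat.shape, tall.shape)

# The new shape must account for exactly 15 elements;
# grid.reshape(4, 4) would raise a ValueError.
```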

3. Array operations:

  In [9]: na = np.arange(10)

  In [10]: na
  Out[10]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

  In [11]: na[::-1]
  Out[11]: array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
  
  
  In [12]: mul = np.array([[1,2,3,4], [2,3,4,5]])

  In [13]: mul
  Out[13]: 
  array([[1, 2, 3, 4],
         [2, 3, 4, 5]])

  In [14]: mul[0,:]       
  Out[14]: array([1, 2, 3, 4])

  In [15]: mul[0:2,0:4]
  Out[15]: 
  array([[1, 2, 3, 4],
         [2, 3, 4, 5]])

  In [16]: mul[0:2,0:2]        # rows go before the comma, columns after it
  Out[16]: 
  array([[1, 2],
         [2, 3]])

  In [17]: mul[0:2,0:3]       # start is inclusive, stop is exclusive: rows 0-1, columns 0-2
  Out[17]: 
  array([[1, 2, 3],
         [2, 3, 4]])
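
One thing the slicing session above does not show is that a basic slice is a view onto the original array rather than a copy. A minimal sketch, reusing the `mul` array and adding two illustrative names (`view`, `independent`):

```python
import numpy as np

mul = np.array([[1, 2, 3, 4], [2, 3, 4, 5]])

view = mul[0:2, 0:2]            # basic slice: shares memory with mul
view[0, 0] = 100                # this also changes mul[0, 0]

independent = mul[0:2, 0:2].copy()
independent[0, 0] = -1          # mul is not affected by this

print(mul)
```

Call .copy() whenever the slice is going to be modified and the original must stay intact.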

	
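Arrays also support element-wise arithmetic: addition, subtraction, multiplication, division, and exponentiation (+, -, *, /, **).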
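
A short sketch of these operators on two small arrays. The arrays `a` and `b` are invented for the example; every operation is applied element by element.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 2, 2, 2])

total = a + b       # [3, 4, 5, 6]
diff = a - b        # [-1, 0, 1, 2]
prod = a * b        # [2, 4, 6, 8], element-wise rather than a matrix product
quot = a / b        # [0.5, 1.0, 1.5, 2.0], true division yields floats
power = a ** b      # [1, 4, 9, 16]
scaled = a * 10     # a scalar is broadcast to every element

print(total, diff, prod, quot, power, scaled)
```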

pandas operations:

1. One-dimensional Series:

  In [1]: import pandas as pd

  In [2]: from pandas import Series, DataFrame

  In [3]: Series([1,2,3])    # the first column is the index, starting at 0
  Out[3]: 
  0    1
  1    2
  2    3
  dtype: int64

  In [4]: Series((1,2,3))
  Out[4]: 
  0    1
  1    2
  2    3
  dtype: int64

  In [5]: Series((1,2,3),index=['a','b','c'])
  Out[5]: 
  a    1
  b    2
  c    3
  dtype: int64
  
  # Creating a Series from a dict
  In [6]: a = {'a':1,'b':2}

  In [7]: Series(a)
  Out[7]: 
  a    1
  b    2
  dtype: int64

  In [8]: a.keys()
  Out[8]: dict_keys(['a', 'b'])

  # Modifying attributes
  In [11]: s1 = Series(a)

  In [12]: s1
  Out[12]: 
  a    1
  b    2
  dtype: int64

  In [13]: s1.index
  Out[13]: Index(['a', 'b'], dtype='object')

  In [14]: s1.index = [0,1]

  In [15]: s1
  Out[15]: 
  0    1
  1    2
  dtype: int64

  In [16]: s1.index.name = "Index"

  In [17]: s1
  Out[17]: 
  Index
  0    1
  1    2
  dtype: int64
  
  # Indexing:
  In [22]: s1
  Out[22]: 
  Index
  0    1
  1    2
  dtype: int64

  In [23]: s1[0]
  Out[23]: 1

  In [24]: s1[0:2]
  Out[24]: 
  Index
  0    1
  1    2
  dtype: int64
  
  # Converting to other data structures
  In [32]: s1
  Out[32]: 
  Index
  0    1
  1    2
  dtype: int64

  In [33]: s1.to_string()
  Out[33]: 'Index\n0    1\n1    2'

  In [34]: s1.to_dict()
  Out[34]: {0: 1, 1: 2}

  In [35]: s1.tolist()
  Out[35]: [1, 2]

  In [36]: s1.to_json()
  Out[36]: '{"0":1,"1":2}'

  In [37]: s1.to_frame()
  Out[37]: 
         0
  Index   
  0      1
  1      2

  In [38]: s1.to_csv(header=False)    # recent pandas versions write a header row by default
  Out[38]: '0,1\n1,2\n'

2. Two-dimensional DataFrame:

  In [39]: d = {'a':[1,2,3],'b':[4,5,6]}

  In [40]: df = DataFrame(d)

  In [41]: df
  Out[41]: 
     a  b
  0  1  4
  1  2  5
  2  3  6

  In [42]: df.columns = ['b','a']    # relabels the columns in place; the data itself does not move

  In [43]: df
  Out[43]: 
     b  a
  0  1  4
  1  2  5
  2  3  6

  In [44]: df.columns.name = 'T'

  In [45]: df.index.name = 'I'

  In [46]: df
  Out[46]: 
  T  b  a
  I      
  0  1  4
  1  2  5
  2  3  6
  
  # Indexing
  In [47]: df.a
  Out[47]: 
  I
  0    4
  1    5
  2    6
  Name: a, dtype: int64

  In [48]: df['a']
  Out[48]: 
  I
  0    4
  1    5
  2    6
  Name: a, dtype: int64

  In [49]: df.columns
  Out[49]: Index(['b', 'a'], dtype='object', name='T')

  In [50]: df.columns[:2]
  Out[50]: Index(['b', 'a'], dtype='object', name='T')

  In [51]: df[df.columns[:2]]
  Out[51]: 
  T  b  a
  I      
  0  1  4
  1  2  5
  2  3  6
  
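  There are also df.to_json(), df.to_dict(), df.to_latex(), and so on, but a DataFrame has no tolist() method (df.values.tolist() returns the rows as nested lists).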
  
  
  In [69]: df
  Out[69]: 
  T  b  a
  a  1  4
  b  2  5
  c  3  6

  In [70]: df.loc['a':'c']
  Out[70]: 
  T  b  a
  a  1  4
  b  2  5
  c  3  6
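
The df.loc['a':'c'] call above returns all three rows because label slices include both endpoints. The sketch below contrasts that with positional slicing through iloc; the frame is rebuilt by hand with an assumed index of 'a', 'b', 'c', and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'b': [1, 2, 3], 'a': [4, 5, 6]}, index=['a', 'b', 'c'])

by_label = df.loc['a':'b']      # label slice: row 'b' is included
by_position = df.iloc[0:1]      # positional slice: position 1 is excluded

print(by_label)
print(by_position)
```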

Boolean arrays and applying functions

  In [76]: df
  Out[76]: 
  T  b  a  c
  a  1  4  5
  b  2  5  6
  c  3  6  7

  In [77]: df.columns = ['a','b','c']

  In [78]: df
  Out[78]: 
     a  b  c
  a  1  4  5
  b  2  5  6
  c  3  6  7

  In [79]: df['a'] >= 2
  Out[79]: 
  a    False
  b     True
  c     True
  Name: a, dtype: bool

  In [80]: df[df['a'] >= 2]
  Out[80]: 
     a  b  c
  b  2  5  6
  c  3  6  7

  In [81]: df.query("a>=2 and b>=5")
  Out[81]: 
     a  b  c
  b  2  5  6
  c  3  6  7
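
The query string above can also be written as an explicit boolean mask. A minimal sketch, assuming the same three-column frame: each comparison needs its own parentheses, and the conditions are combined with & (or | for "or") instead of the Python keywords and / or.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [5, 6, 7]},
                  index=['a', 'b', 'c'])

# Element-wise AND of two boolean Series.
mask = (df['a'] >= 2) & (df['b'] >= 5)
by_mask = df[mask]

# The same filter expressed as a query string.
by_query = df.query("a >= 2 and b >= 5")

print(by_mask.equals(by_query))
```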


  # Applying functions
  In [88]: df
  Out[88]: 
     a  b  c
  0  1  4  5
  1  2  5  6
  2  3  6  7

  In [89]: df.sum()
  Out[89]: 
  a     6
  b    15
  c    18
  dtype: int64

  In [90]: df.sum(axis=0)
  Out[90]: 
  a     6
  b    15
  c    18
  dtype: int64
  
  In [92]: df.sum(axis=1)
  Out[92]: 
  0    10
  1    13
  2    16
  dtype: int64

  In [93]: df.mean()
  Out[93]: 
  a    2.0
  b    5.0
  c    6.0
  dtype: float64

  In [94]: df.max()
  Out[94]: 
  a    3
  b    6
  c    7
  dtype: int64

  In [95]: df.min()
  Out[95]: 
  a    1
  b    4
  c    5
  dtype: int64

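There is also df.describe(), among others. A Series additionally offers map, and a DataFrame offers apply and applymap (renamed DataFrame.map in pandas 2.1, where applymap is deprecated).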
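
A brief sketch of Series.map and DataFrame.apply on the same small frame used above. The lambdas and result names are invented for illustration; applymap is left out because its name depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [5, 6, 7]})

# Series.map: transform one column element by element.
doubled = df['a'].map(lambda x: x * 2)

# DataFrame.apply: the function receives a whole column by default...
col_range = df.apply(lambda col: col.max() - col.min())

# ...or a whole row when axis=1.
row_total = df.apply(lambda row: row.sum(), axis=1)

print(doubled.tolist(), col_range.tolist(), row_total.tolist())
```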

    In [1]: from pandas import DataFrame

    In [2]: df = DataFrame({'a':[1,2,3,None],'b':[1,None,4,5]})

    In [3]: df
    Out[3]: 
         a    b
    0  1.0  1.0
    1  2.0  NaN
    2  3.0  4.0
    3  NaN  5.0

    In [4]: df.isnull()
    Out[4]: 
           a      b
    0  False  False
    1  False   True
    2  False  False
    3   True  False

    In [5]: df.isnull().sum()
    Out[5]: 
    a    1
    b    1
    dtype: int64

    In [6]: df.fillna('missing')
    Out[6]: 
             a        b
    0        1        1
    1        2  missing
    2        3        4
    3  missing        5

    In [7]: df.fillna(df.mean())
    Out[7]: 
         a         b
    0  1.0  1.000000
    1  2.0  3.333333
    2  3.0  4.000000
    3  2.0  5.000000

Encoding problems:

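from ftfy import fix_text

# `data` is not defined in this post; it is assumed to be an object whose .text attribute is a str
fixed = fix_text(data.text)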
