Reading notes: <Python One-Liners> -- 3. NumPy and one-liners

NumPy and one-liners in data science

Basic two-dimensional array arithmetic

Creating 1-D, 2-D, and 3-D arrays

import numpy as np

a = np.array([1,2,3,4,5,6,7])
b=np.array([[1,2],
          [3,4]])
c=np.array([[[1,2],[3,4]],
           [[5,6],[7,8]]])
# Check the number of dimensions of the array
print(a.ndim)
1
# Check the number of dimensions of the arrays
print(b.ndim)
print(c.ndim)

2
3

Basic arithmetic on 2-D arrays

a=np.array([[1,0,0],
            [1,1,1],
            [2,0,0]])
b=np.array([[1,1,1],
           [1,2,1],
           [1,0,2]])
a+b

array([[2, 1, 1],
       [2, 3, 2],
       [3, 0, 2]])
a-b
array([[ 0, -1, -1],
       [ 0, -1,  0],
       [ 1,  0, -2]])
a*b
array([[1, 0, 0],
       [1, 2, 1],
       [2, 0, 0]])
# One position computes 0/0. NumPy does not raise an exception; it emits a RuntimeWarning and stores nan in the result.
a/b
C:\Users\A1\AppData\Local\Temp/ipykernel_7944/1348051284.py:1: RuntimeWarning: invalid value encountered in true_divide
  a/b





array([[1. , 0. , 0. ],
       [1. , 0.5, 1. ],
       [2. , nan, 0. ]])
# Arithmetic on NumPy arrays is performed element-wise
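
As a minimal sketch of how the 0/0 position could be handled explicitly, np.divide accepts out and where arguments, so positions with a zero divisor keep a chosen fill value and no warning is emitted. The fill value 0.0 and the name safe are assumptions made for illustration, not part of the original notes.

```python
import numpy as np

a = np.array([[1, 0, 0],
              [1, 1, 1],
              [2, 0, 0]])
b = np.array([[1, 1, 1],
              [1, 2, 1],
              [1, 0, 2]])

# Divide only where the divisor is nonzero; other positions keep the fill value 0.0
safe = np.divide(a, b, out=np.zeros(a.shape, dtype=float), where=(b != 0))
print(safe)
```

Whether 0.0, nan, or some other marker is the right fill value depends on what the downstream computation expects.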

Aggregate functions np.max(), np.min(), np.average()

np.max(a)
2
np.min(b)
0
np.average(a)
0.6666666666666666

Given the annual salaries and tax rates of a group of people, find the highest after-tax income

# Data: annual income for the years [2017, 2018, 2019]
alice = [99,101,103]
bob = [ 110,108,105]
tim = [90,88,85]
salaries = np.array([alice,bob,tim])
taxation = np.array([[0.2,0.25,.22],
                    [.4,.5,.5],
                    [.1,.2,.1]])
# One-liner
max_income=np.max(salaries-salaries*taxation)
max_income
81.0
print(salaries-salaries*taxation) # income after tax
[[79.2  75.75 80.34]
 [66.   54.   52.5 ]
 [81.   70.4  76.5 ]]
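
The one-liner returns the highest after-tax income but not who earned it. A possible extension, sketched under the assumption that the rows are ordered alice, bob, tim and the columns 2017 to 2019 (the names and years lists below are hypothetical helpers), uses np.argmax together with np.unravel_index:

```python
import numpy as np

names = ["alice", "bob", "tim"]          # row labels, same order as in salaries
years = [2017, 2018, 2019]               # column labels
salaries = np.array([[99, 101, 103],
                     [110, 108, 105],
                     [90, 88, 85]])
taxation = np.array([[0.2, 0.25, 0.22],
                     [0.4, 0.5, 0.5],
                     [0.1, 0.2, 0.1]])

net = salaries - salaries * taxation
# argmax works on the flattened array; unravel_index maps it back to (row, column)
row, col = np.unravel_index(np.argmax(net), net.shape)
print(names[row], years[col], net[row, col])
```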

NumPy array slicing, broadcasting, and array types

A brief introduction to Markdown

Markdown is a lightweight markup language. Its basic elements:

  • Horizontal rule
  • Bold
  • Italic
  • Bold italic
  • Strikethrough
  • Hyperlink: http://www.baidu.com or [link text](http://www.baidu.com "title")
  • Paragraph: press Enter twice to start a new paragraph
  • Image (given an image URL and an image title)
  • Block quote
  • Lists
    • Unordered list, with nested sub-lists
    • Ordered list, with nested sub-lists

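Slicing and indexing

NumPy supports indexing several dimensions of a multidimensional array at once; the index for each dimension is separated by a comma.

Examples of 1-D slicing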
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,0])
a
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])
a[:]
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])
a[2:]
array([3, 4, 5, 6, 7, 8, 9, 0])
a[1:4]
array([2, 3, 4])
a[2:-2]
array([3, 4, 5, 6, 7, 8])
a[::2]
array([1, 3, 5, 7, 9])
a[2::2]
array([3, 5, 7, 9])
a[::-1]
array([0, 9, 8, 7, 6, 5, 4, 3, 2, 1])
a[:1:-2]
array([0, 8, 6, 4])
a[-1:1:-2]
array([0, 8, 6, 4])
Examples of 2-D slicing
a = np.array([[0,1,2,3],
              [4,5,6,7],
              [8,9,10,11],
              [12,13,14,15]]
            )
# All rows, third column
a[:,2]
array([ 2,  6, 10, 14])
# Second row, all columns
a[1,:]
array([4, 5, 6, 7])
# Third row, every other element
a[2,::2]
array([ 8, 10])
# All rows, excluding the last column
a[:,:-1]
array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10],
       [12, 13, 14]])
# A single slice applies to axis 0; the remaining axis is taken in full
a[:-2]
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

**Summary:** the basic form of multidimensional slicing is ndarray[slice1,slice2,slice3…], where each slice is start:stop:step
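
The same pattern extends to three dimensions. The following sketch uses a hypothetical 2x3x4 array built with np.arange to show one slice per axis:

```python
import numpy as np

c = np.arange(24).reshape((2, 3, 4))   # 2 blocks, 3 rows, 4 columns

first_cols = c[:, :, 0]        # first column of every block, shape (2, 3)
tail = c[1, 1:, ::2]           # block 1, rows 1 onward, every other column
print(first_cols)
print(tail)
```

An integer index (such as the 1 in c[1, 1:, ::2]) removes that axis from the result, while a slice keeps it.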

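Broadcasting

Broadcasting is the process by which NumPy automatically brings two ndarrays to the same shape.

To operate on two arrays of different shapes, NumPy broadcasts the lower-dimensional array, conceptually repeating its values until it matches the higher-dimensional shape, so that the operation can proceed element-wise.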
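
A small sketch of this behavior, with example arrays chosen here for illustration: a 1-D array of shape (3,) and a column of shape (2, 1) are each combined with a (2, 3) array.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

print(b + row)   # row is treated as shape (1, 3) and repeated down the rows
print(b + col)   # col is repeated across the columns
```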

# Array dimensions

import numpy as np
a = np.array([1,2,3])
b = np.array([[1,2,3],
             [4,5,6]])
c=np.array([[[1,2,3],[4,5,6]],
           [[7,8,9],[10,11,12]]])
a.ndim
1
b.ndim

2
c.ndim

3
# The shape attribute returns a tuple with the number of elements along each dimension
a.shape
(3,)
b.shape
(2, 3)
c.shape
(2, 2, 3)

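Summary: each time a dimension is added, the new axis becomes axis 0, and axis i of the lower-dimensional array becomes axis i+1 of the higher-dimensional array.
Homogeneous: all elements of an array must have the same type.
bool: 1 byte
int: 4 or 8 bytes by default (platform dependent)
np.int8: 1 byte
np.int16: 2 bytes
np.int32: 4 bytes
np.int64: 8 bytes
float: 8 bytes by default
np.float16: 2 bytes
np.float32: 4 bytes
np.float64: 8 bytes
complex: 16 bytes by default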
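
These sizes can be checked directly: every dtype exposes an itemsize attribute giving the number of bytes per element. A short sketch (the names types and sizes are introduced here only for illustration):

```python
import numpy as np

types = (np.bool_, np.int8, np.int16, np.int32, np.int64,
         np.float16, np.float32, np.float64, np.complex128)

# Map each dtype name to the number of bytes one element occupies
sizes = {np.dtype(t).name: np.dtype(t).itemsize for t in types}
for name, nbytes in sizes.items():
    print(name, nbytes)
```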

# Specify the element type
a=np.array([1,2,3],dtype=np.int16)
b=np.array([11,22,33],dtype=np.float32)
b.dtype
dtype('float32')
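Raising the data scientists' salary by 10% every other year

Given a 2-D array holding the salaries of several professions for 2025, 2026, and 2027, raise the data scientists' salary by 10% every other year.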

import numpy as np
# Data: annual income for [2025, 2026, 2027]
datascientist = [130,132,137]
productmanager=[127,140,145]
designer = [118,118,127]
softwareEngineer=[129,131,137]

employees=np.array([datascientist,
          productmanager,
          designer,
          softwareEngineer])

# One-liner
employees[0,::2]=employees[0,::2]*1.1
employees
array([[143, 132, 150],
       [127, 140, 145],
       [118, 118, 127],
       [129, 131, 137]])

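**Summary:** this uses slicing and slice assignment, plus broadcasting to multiply the array by a float. The assignment does not change the array's element type: it stays an integer array, so the fractional part of each product is dropped (137 * 1.1 = 150.7 is stored as 150).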
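
If the fractional part of the raise should be kept, one option is to create the array with a floating-point dtype. This is a sketch of that variation, not what the original one-liner does:

```python
import numpy as np

employees = np.array([[130, 132, 137],
                      [127, 140, 145],
                      [118, 118, 127],
                      [129, 131, 137]], dtype=np.float64)

# Same slice assignment as before; the float dtype keeps the decimals
employees[0, ::2] = employees[0, ::2] * 1.1
print(employees[0])
```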

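Detecting outliers with conditional array search, filtering, and broadcasting

Background
# nonzero() returns the indices of the nonzero elements of an array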
x=np.array([[1,0,0],
          [0,2,2],
          [3,0,3]])
print(np.nonzero(x))
(array([0, 1, 1, 2, 2], dtype=int64), array([0, 1, 2, 0, 2], dtype=int64))

The result is a tuple of two NumPy arrays: the first holds the row indices of the nonzero elements, the second holds their column indices.

y=np.array([[[1,2,3,0],[1,2,0,4]],
           [[0,1,3,0],[0,0,4,5]]])
print(np.nonzero(y))
(array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64), array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1], dtype=int64), array([0, 1, 2, 0, 1, 3, 1, 2, 2, 3], dtype=int64))
# Boolean operations via broadcasting
a= np.array([[1,0,0],
            [0,2,2],
            [3,0,0]])
print(a==2)
[[False False False]
 [False  True  True]
 [False False False]]
Finding cities whose peak pollution exceeds the average
# Data: air quality index (row = city)
x=np.array([
    [42,40,41,43,44,43],#Hong Kong
    [30,31,29,29,29,30],#New York
    [8,13,31,11,11,9],# Berlin
    [11,11,12,13,11,12]] )# Montreal
cities = np.array(['Hong Kong','New York','Berlin','Montreal'])
# One-liner
polluted = set(cities[np.nonzero(x>np.average(x))[0]])
polluted
{'Berlin', 'Hong Kong', 'New York'}
x>np.average(x)
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [False, False,  True, False, False, False],
       [False, False, False, False, False, False]])
np.nonzero(x>np.average(x))
(array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2], dtype=int64),
 array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 2], dtype=int64))
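
An alternative formulation, offered as a sketch rather than the book's approach, takes each city's peak with max(axis=1) and compares it against the overall average. It yields the same cities while preserving their row order instead of producing a set:

```python
import numpy as np

x = np.array([[42, 40, 41, 43, 44, 43],   # Hong Kong
              [30, 31, 29, 29, 29, 30],   # New York
              [8, 13, 31, 11, 11, 9],     # Berlin
              [11, 11, 12, 13, 11, 12]])  # Montreal
cities = np.array(['Hong Kong', 'New York', 'Berlin', 'Montreal'])

# A city qualifies when its largest reading exceeds the overall average
peaks = x.max(axis=1)
polluted = cities[peaks > np.average(x)]
print(polluted)
```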

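Advanced indexing: NumPy lets you use a sequence as an index instead of a contiguous slice.
You can select elements by passing either a sequence of integers (the indices to select) or a sequence of booleans (selecting the positions whose value is True).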

x=[1,2,3,4,5,6]
x=np.array(x)
x[[1,4,5]]
array([2, 5, 6])
x[[0,1,1,0]]
array([1, 2, 2, 1])
x[[True,False,False,True,True,True]]
array([1, 4, 5, 6])
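
In practice the boolean sequence is usually produced by a comparison rather than typed by hand. A brief sketch (mask, evens, picked, and y are names chosen for this example):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

mask = x % 2 == 0          # boolean array: True where the element is even
evens = x[mask]            # selects the even elements
picked = x[[0, -1]]        # integer sequence: first and last element

y = x.copy()
y[mask] = 0                # boolean indexing also works on the left-hand side
print(evens, picked, y)
```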

Filtering a 2-D array with boolean indexing

Background
# Data array and index array
a = np.array([[1,2,3],  # data array
              [4,5,6],
              [7,8,9]])
indices = np.array([[False,False,True], # index array
                  [True,False,False],
                    [False,True,True]])
a[indices]
array([3, 4, 8, 9])
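# Given a 2-D array in which each row describes one influencer (first column: followers in millions, second column: name), find the names of the influencers with more than 100 million followers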
inst = np.array([[232,"李佳琪"],
                 [133,'老罗'],
                 [120,"薇薇安"],
                 [111,"唐糖"],
                 [76,"一地鸡毛"]])
superstar = inst[inst[:,0].astype(float)>100,1]
superstar
array(['李佳琪', '老罗', '薇薇安', '唐糖'], dtype='<U11')
# astype(float) converts the sliced column to floats; because the original array mixes integers and strings, NumPy converted every element to a string
inst.dtype
dtype('<U11')
inst[:,0].astype(float)>100
array([ True,  True,  True,  True, False])

Cleaning array elements at a fixed stride with broadcasting, slice assignment, and reshaping

Basics
# Slice assignment
a=np.array([4]*16)
a
array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
a[1:]=[32]*15
a
array([ 4, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32])
a[2:10:3]=16
a
array([ 4, 32, 16, 32, 32, 16, 32, 32, 16, 32, 32, 32, 32, 32, 32, 32])
# Reshaping with reshape
a=np.array([1,2,3,4,5,6,7,8,9])
a.reshape((3,3))
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])
a.reshape((3,-1)) # when a dimension is given as -1, NumPy infers its size automatically
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
# The axis argument
a=np.array(range(10))
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b=a.reshape((2,-1))
b
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
np.average(b,axis=0) # average of each column (computed along axis 0)
array([2.5, 3.5, 4.5, 5.5, 6.5])
np.average(b,axis=1) # average of each row (computed along axis 1)
array([2., 7.])
# Given an array of temperature readings, replace every seventh day's value with the average of the 7 days of that week
data = [1,2,3,4,5,6,12,
       2,3,4,5,4,3,1,
       3,5,3,2,3,4,9]
tmp=np.array(data)
# One-liner
data[6::7]=np.average(tmp.reshape((-1,7)),axis=1)
data
[1,
 2,
 3,
 4,
 5,
 6,
 4.714285714285714,
 2,
 3,
 4,
 5,
 4,
 3,
 3.142857142857143,
 3,
 5,
 3,
 2,
 3,
 4,
 4.142857142857143]
tmp.reshape((-1,7))
array([[ 1,  2,  3,  4,  5,  6, 12],
       [ 2,  3,  4,  5,  4,  3,  1],
       [ 3,  5,  3,  2,  3,  4,  9]])
np.average(tmp.reshape((-1,7)),axis=1)
array([4.71428571, 3.14285714, 4.14285714])
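
The code above assigns the averages into the Python list data. To stay entirely within NumPy, the same slice assignment can target the array itself. A sketch, assuming a float dtype so the averages are not truncated to integers:

```python
import numpy as np

tmp = np.array([1, 2, 3, 4, 5, 6, 12,
                2, 3, 4, 5, 4, 3, 1,
                3, 5, 3, 2, 3, 4, 9], dtype=float)

# The right-hand side is fully evaluated before the assignment happens,
# so each average still includes the original seventh-day reading
tmp[6::7] = np.average(tmp.reshape((-1, 7)), axis=1)
print(tmp.reshape((-1, 7)))
```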

Sorting in NumPy

sort() and argsort()

# argsort() returns an array of the indices into the original array that would sort it

import numpy as np
a=list(reversed([0,1,2,3,4,5,6,7]))

a=np.array(a)
np.sort(a)
array([0, 1, 2, 3, 4, 5, 6, 7])
np.argsort(a)
array([7, 6, 5, 4, 3, 2, 1, 0], dtype=int64)
a=np.array([10,6,8,2,5,4,9,10])

np.sort(a)
array([ 2,  4,  5,  6,  8,  9, 10, 10])
np.argsort(a)
array([3, 5, 4, 1, 2, 6, 0, 7], dtype=int64)
# Sorting along a given axis
a=np.array([[1,6,2],
            [5,1,1],
            [8,4,3]])
np.sort(a,axis=0)
array([[1, 1, 1],
       [5, 4, 2],
       [8, 6, 3]])
np.sort(a,axis=1)
array([[1, 2, 6],
       [1, 1, 5],
       [3, 4, 8]])

Names of the three students with the highest scores

# Data: scores of different students
score=np.array([1100,1256,1543,1043,998,1200,1533])
students=np.array(['bob','tom','kohn','john','jim','anmi','rose'])
# One-liner
top_3=students[
    np.argsort(score)[:-4:-1]]
top_3
array(['kohn', 'rose', 'tom'], dtype='<U4')
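
The slice [:-4:-1] takes the last three indices in reverse order. An equivalent sketch that some readers find easier to follow sorts the negated scores and takes the first k indices (k and order are names introduced here):

```python
import numpy as np

score = np.array([1100, 1256, 1543, 1043, 998, 1200, 1533])
students = np.array(['bob', 'tom', 'kohn', 'john', 'jim', 'anmi', 'rose'])

k = 3
order = np.argsort(-score)      # negating the scores sorts from highest to lowest
top_k = students[order[:k]]
print(top_k)
```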

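Filtering arrays with lambda functions and boolean indexing

Creating a filter function

# Create a filter function
# It takes a book array x and a minimum rating y, and returns the potential bestsellers rated above y
# Data (row = [title, rating])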
books = np.array([['笨办法学python',4.9],
                  ['流畅的python',5.9],
                  ['python从入门到项目实践',5.2],
                  ['从0到1javascript快速上手',4.7],
                  ['从1到无穷大',5.1]])
# One-liner
predict_bestseller=lambda x,y:x[x[:,1].astype(float)>y]
print(predict_bestseller(books,5.0))
[['流畅的python' '5.9']
 ['python从入门到项目实践' '5.2']
 ['从1到无穷大' '5.1']]

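**Summary:** lambda creates a function with two parameters, x (the book array) and y (the minimum rating); the boolean mask x[:,1].astype(float)>y selects the rows whose rating exceeds y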

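Advanced array filtering

# Outlier detection: an observation is defined as an outlier if it deviates from the mean by more than one standard deviation

Mean and standard deviation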

import numpy as np
import matplotlib.pyplot as plt
# np.random.normal(mean, deviation, shape) creates a NumPy array of random samples drawn from a normal distribution with the given mean and standard deviation
sequence = np.random.normal(10.0,1.0,500)
sequence
array([11.60273162,  6.65293584, 10.12605379, 11.85986644,  8.63110802,
       10.06998214, 10.45973071,  9.98725049,  9.09705239, 10.13015269,
        9.85089427,  9.99808394,  8.78441828,  9.63904572, 10.85481516,
        7.35075106, 10.94845724, 11.0869893 ,  9.28596955,  8.97815481,
        9.26186245,  9.53746784, 10.87090498,  9.12047918,  9.02657023,
       10.0868035 ,  9.20006406,  8.73189255,  9.13076184, 11.10047448,
       10.43225348,  9.58469227, 10.40723005, 10.55369215,  9.87143636,
       10.29960485, 10.25535646,  9.79499688,  9.46107401,  7.7014166 ,
        9.68435112,  9.4686142 ,  8.46912804,  9.35320587, 12.15193188,
       10.50262112,  9.38940117,  9.47157601, 10.25920525, 12.05583665,
       11.43411283,  9.13550521,  9.93624269,  8.63198423,  9.29192518,
        8.36295787,  9.97559304, 11.97734564,  9.6559952 , 10.90758338,
        9.03529344,  6.91678558,  9.23626327, 11.17717063, 10.55490176,
       12.24268514, 10.10515331,  8.20452911,  9.6762657 ,  9.56359262,
       13.02888554, 10.06103927, 10.71648254,  9.29524475,  9.93546077,
       10.8254731 ,  9.77770707, 10.11214372, 11.11175554,  9.30043339,
       10.7485596 , 10.97295057, 10.66158835,  7.892232  ,  9.93887982,
       10.7188748 , 11.62641763,  9.22422878, 11.96244947, 11.37277716,
       12.02332692,  8.90228317,  9.43515463, 10.70487583, 10.23028306,
        9.66090627, 10.94709562,  8.6961016 , 10.41037259, 10.17828131,
       11.80130789,  9.31310925, 10.09872767,  8.20013718, 10.92970956,
       10.2495051 , 10.36134533,  8.91298481, 10.02369332, 10.68721058,
        9.00054432, 10.32584858,  8.75465787, 12.21659052,  9.95105381,
       10.28374084,  9.09521556, 11.75475578,  9.75949037,  9.89331747,
        9.23478178,  7.90975969,  9.4273364 ,  8.9090529 , 11.10110534,
       10.36954102, 10.19726741, 10.28052735, 10.30538537,  8.70120177,
        9.3671505 , 10.11196977,  8.36794355, 10.32525939,  9.65441533,
        9.51908772, 10.8800997 , 10.09716578, 10.91563748, 10.72492996,
       11.10679298, 10.02064013, 11.07823158,  9.5317347 , 10.2914028 ,
       10.01976486,  9.72845379, 10.65084245, 12.41439712,  9.69910187,
        9.75108477,  9.9845896 ,  9.81770095, 11.29157327, 10.15456955,
       10.60837797,  9.45917681, 10.24010858, 10.5761626 ,  9.55445776,
        9.57869162, 10.80719746, 11.5905502 ,  9.56478353,  9.65956002,
       11.30053638, 10.59873521, 10.28842162,  8.54243158, 10.33120558,
        9.59322875, 11.58458479, 10.09302003, 11.15638722, 12.23678871,
        8.15472985, 10.42502666, 10.04885823, 10.81404769,  8.15788842,
       11.68225804,  9.36783949,  9.49919482, 10.52601385, 11.89667602,
        9.67034409, 10.36543152,  7.83546884, 10.63937759, 11.47507461,
       10.13818656,  9.51148501, 11.52854464, 11.26931747, 10.99338663,
        9.50413836,  7.24050328,  9.6154753 ,  9.63745882, 10.94203385,
        8.86364438,  8.27484969, 11.29900913, 10.63114097, 10.67904167,
        8.6837591 ,  8.88040556, 10.58496528,  9.33096014,  8.52821296,
        9.5348672 , 10.75886848,  9.51366472, 11.28135789,  9.12082393,
        9.97388793,  9.82316507,  9.88920019, 10.24871057,  8.75774533,
        8.33304482,  8.65544812, 10.00809262, 10.62840715, 10.11816525,
        9.9628467 , 10.04342218,  8.48637003,  9.33254844,  9.76771249,
        8.38893789, 11.05047808,  9.67126876,  9.83964206,  9.17303963,
       10.57315043, 10.38521662, 10.84684624,  7.85330557, 10.20538821,
       10.81687365,  8.64936151, 10.12903228, 10.56758503,  9.14424382,
        9.64866383, 10.9616145 ,  9.98213592, 10.92951974,  7.47230314,
       10.63895034, 11.0604198 ,  9.72761195, 10.60446029, 10.43152824,
        9.00839484,  9.83700604, 12.45059843, 10.43414501, 12.34487213,
       10.75545494, 10.27786507, 12.55689347, 10.34912244,  9.29060352,
       10.72588034,  9.94346514, 10.54777849, 11.72420947,  9.8708743 ,
        9.22212126,  9.68541625,  9.59774448,  9.11221574,  9.91278983,
       10.4820126 ,  8.25422937, 10.53147771, 10.04705301,  9.05978545,
       10.49055762, 12.42477809, 12.07271904,  7.61849858,  8.3178447 ,
       10.55941704, 10.38182936, 11.0665193 , 11.28441137,  9.66078923,
        9.38680616, 10.38885545,  8.23828454, 10.13555809,  9.30452756,
        9.99692358, 10.46199192,  9.77339638, 11.3772616 ,  9.1032097 ,
        9.66978   , 10.89886416, 11.7536681 ,  9.59221274, 10.73252456,
        7.0888786 ,  9.45314876,  8.86301785,  8.75563987,  8.3921786 ,
       11.4712737 ,  8.9499308 , 11.45150041, 10.24883007,  9.92543423,
       10.30065446,  8.84153306, 10.78120675, 11.97610638,  8.18525749,
       11.66789819, 10.33936961, 10.54462992, 10.79140861,  9.28616639,
        9.99606121,  9.37771233,  7.69844409, 10.50744526, 10.05817203,
        9.90328269,  9.21993504,  8.49566261,  8.59472478,  8.88297634,
        9.05656056, 11.1155842 ,  8.71964028, 10.3287043 , 11.03014754,
       10.74346088,  9.6133147 ,  8.94829591, 11.10567045, 11.53995834,
       10.79425045,  8.29533343, 11.73033694,  9.05390855,  9.96661056,
        9.65544677,  9.28845565,  9.73081175, 10.79029066,  8.71006262,
       10.21652827, 11.55566433,  8.40294093,  8.81970601,  9.32839453,
        9.72528028, 10.49817569, 11.32272509, 11.40344254, 10.15429825,
        8.77501545, 10.17652889,  7.49672875, 11.23734041, 10.35285293,
       10.83609291,  9.39994134,  9.79438055, 11.60539785, 11.1956015 ,
       10.0348099 ,  8.64140396,  9.35232112, 10.33737302, 11.2767812 ,
       10.35535787,  9.22550179,  9.53862746, 11.1766378 ,  9.59710827,
        8.68138898, 11.14109352,  9.6750869 , 11.29629946,  7.92121702,
        8.93229853,  9.46711141, 10.53834349,  9.12461323, 10.48423828,
        8.16854633,  9.5213865 , 12.62846863,  9.65370518,  8.66493306,
       10.26409626,  8.80259482, 10.72081245, 11.77263016,  9.75088966,
        8.99110743,  9.88220114, 10.40220736,  9.00647373,  9.71239656,
        9.49302085,  8.98309053,  9.66597825, 10.27313298,  9.73224669,
       10.98600021,  8.42882756, 10.54587761,  9.3863515 , 10.63129959,
        9.54279797, 11.91960088, 10.95716162, 10.34511289,  9.00471249,
        9.64651796, 10.07223761,  9.47935434,  9.61415037,  8.82044289,
        9.25640485,  9.17837895, 10.72889737, 11.68353699, 10.29589292,
       11.35906728, 10.92925076,  8.72458937, 12.56023535,  9.72806574,
       10.6106498 , 10.84630502, 11.29221825, 10.45026532,  9.40195773,
       10.56191066,  9.25029589,  8.81880033,  8.10918461,  9.64208128,
        9.8698753 ,  9.03411736, 11.51661176,  9.94319404, 10.72636352,
       10.42417327, 10.15396807, 10.87273094,  7.84001872, 10.71706689,
       10.25032915,  9.8371806 , 10.9336922 ,  9.52777884, 11.45964879,
        8.97052672, 10.73160785,  9.60847006, 11.13603076, 11.35955011,
        9.85793642,  9.61844113,  9.54059302, 10.27081728,  9.38568048,
        9.94275759, 11.35352371,  7.89166765,  9.30537635,  9.46898646,
       10.89801433, 10.0963657 ,  9.52512553,  9.67269072,  9.46775335,
        9.4408703 ,  9.88111873,  9.00232895,  9.74406787, 12.19007662,
       10.01493886, 10.61546349, 10.52380967,  9.05287545,  9.72946157,
        8.28731793,  9.2750119 , 10.76406343, 10.15305887,  9.77640742])
plt.xkcd()  # plot style
plt.hist(sequence)  # draw a histogram
plt.annotate(r"$\omega_1=9$",(9,70))
plt.annotate(r"$\omega_2=11$",(11,70))
plt.annotate(r"$\mu=10$",(10,90))
plt.savefig("plot.jpg")
plt.show()

[Figure: histogram of sequence (output_144_0.png)]

Absolute value

a=np.array([-1,2,-3,4,-2])
np.abs(a)
array([1, 2, 3, 4, 2])

Logical AND

a=np.array([False,True,False,True])
b=np.array([True,True,False,True])
np.logical_and(a,b) # multiplying the two boolean arrays gives the same result
array([False,  True, False,  True])

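Finding outlier days whose statistics deviate from the mean by more than one standard deviation

# Website analytics data (each row is one day; the columns are active users, bounce rate, average session duration)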
a = np.array([[815,70,115],
             [767,80,50],
             [554,88,70],
             [1008,65,128]])
mean,stdev=np.mean(a,axis=0),np.std(a,axis=0)
out = ((np.abs(a[:,0]-mean[0])>stdev[0])
      *(np.abs(a[:,1]-mean[1])>stdev[1])
      *(np.abs(a[:,2]-mean[2])>stdev[2]))
out

array([False, False, False,  True])
a[out]
array([[1008,   65,  128]])
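
The three per-column comparisons can also be written once for all columns. In this sketch, broadcasting subtracts the per-column mean from every row, and np.all with axis=1 replaces the chained multiplications (deviates is a name introduced for illustration):

```python
import numpy as np

a = np.array([[815, 70, 115],
              [767, 80, 50],
              [554, 88, 70],
              [1008, 65, 128]])

mean, stdev = np.mean(a, axis=0), np.std(a, axis=0)
# Broadcasting compares every column against its own mean and standard deviation
deviates = np.abs(a - mean) > stdev      # shape (4, 3), one flag per cell
out = np.all(deviates, axis=1)           # a day counts only if all three columns deviate
print(out)
print(a[out])
```

This form does not need to change when more metric columns are added.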

Simple association analysis

# Data: each row is one customer's shopping basket
# row = [course1, course2, ebook1, ebook2]
# a value of 1 means the item was purchased
basket = np.array([[0,1,1,0],
                  [0,0,0,1],
                  [1,1,0,0],
                  [0,1,1,1],
                  [1,1,1,0],
                  [0,1,1,0],
                  [1,1,0,1],
                  [1,1,1,1]])
# One-liner
res=np.sum(np.all(basket[:,2:],axis=1))/basket.shape[0]
res
0.25
basket[:,2:]

array([[1, 0],
       [0, 1],
       [0, 0],
       [1, 1],
       [1, 0],
       [1, 0],
       [0, 1],
       [1, 1]])
np.all(basket[:,2:],axis=1)

array([False, False, False,  True, False, False, False,  True])
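# The one-liner above gives the fraction of customers who bought both ebooks
# Next, count for every pair of products how many customers bought both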
res=[(i,j,np.sum(basket[:,i] + basket[:,j] == 2))
    for i in range(4) for j in range(i+1,4)]

res
[(0, 1, 4), (0, 2, 2), (0, 3, 2), (1, 2, 5), (1, 3, 3), (2, 3, 2)]
max(res,key=lambda x:x[2])
(1, 2, 5)
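
The pairwise counts can also be obtained in one step with a matrix product: for a 0/1 matrix, basket.T @ basket holds at position (i, j) the number of customers who bought both product i and product j. The following is a sketch of that alternative, with co and pairs as names chosen here:

```python
import numpy as np

basket = np.array([[0, 1, 1, 0],
                   [0, 0, 0, 1],
                   [1, 1, 0, 0],
                   [0, 1, 1, 1],
                   [1, 1, 1, 0],
                   [0, 1, 1, 0],
                   [1, 1, 0, 1],
                   [1, 1, 1, 1]])

co = basket.T @ basket          # co[i, j] = customers who bought both i and j
pairs = np.triu(co, k=1)        # keep each pair once and drop the diagonal
i, j = np.unravel_index(np.argmax(pairs), pairs.shape)
print(i, j, co[i, j])
```

The diagonal of co holds the purchase count of each single product, which is why it is masked out before searching for the most frequent pair.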

