Basic NumPy Operations

梦因you而美

Article link: https://blog.csdn.net/apollo_miracle/article/details/88238161

1 Creating arrays

1.1 Arrays of zeros and ones

zeros(shape[, dtype, order])
ones(shape[, dtype, order])
empty(shape[, dtype, order])
empty_like(a[, dtype, order, subok])
eye(N[, M, k, dtype, order])
identity(n[, dtype])
ones_like(a[, dtype, order, subok])
zeros_like(a[, dtype, order, subok])
full(shape, fill_value[, dtype, order])
full_like(a, fill_value[, dtype, order, subok])

In [8]: np.zeros([3, 4])
Out[8]: 
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [9]: np.ones([3, 4])
Out[9]: 
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
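
The other constructors in the list above follow the same pattern. The snippet below is a small sketch of a few of them; the variable names and the fill values are chosen only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

identity_3 = np.eye(3)                       # 3x3, ones on the main diagonal
shifted = np.eye(3, 4, k=1)                  # 3x4, ones on the first superdiagonal
sevens = np.full((2, 3), 7)                  # every element set to 7
zeros_like_a = np.zeros_like(a)              # same shape and dtype as a, all zeros
halves = np.full_like(a, 0.5, dtype=float)   # override dtype so 0.5 is not truncated
scratch = np.empty((2, 2))                   # uninitialized memory, contents arbitrary

print(identity_3)
print(shifted)
print(sevens)
print(zeros_like_a.dtype == a.dtype)
print(halves)
```

Because empty does not initialize its memory, it should only be used when every element will be overwritten afterwards.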

1.2 Creating from existing data

array(object[, dtype, copy, order, subok, ndmin])
asarray(a[, dtype, order])
copy(a[, order])
asanyarray(a[, dtype, order])
ascontiguousarray(a[, dtype])
asmatrix(data[, dtype])

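import numpy as np

a = np.array([[1,2,3],[4,5,6]])
# Create a new array from an existing one (the data is copied)
a1 = np.array(a)
# Acts like a reference: when the input is already an ndarray, no new array is created
a2 = np.asarray(a)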

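The difference between array and asarray:

array copies the data by default, so the result is independent of the input (comparable to a deep copy)
asarray does not copy when the input is already an ndarray of the requested dtype: it returns the input object itself, so changes to one are visible through the other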
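
A short check of this behavior, as a sketch: modifying the source array afterwards shows which results share its memory. The names a3 and the value 100 are only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
a1 = np.array(a)      # copies the data by default
a2 = np.asarray(a)    # input is already an ndarray: returns the same object

a[0, 0] = 100         # modify the source after the fact

print(a1[0, 0])       # 1: the copy is unaffected
print(a2[0, 0])       # 100: a2 is the same array as a
print(a2 is a)        # True

# asarray still has to copy when a dtype conversion is requested
a3 = np.asarray(a, dtype=float)
print(np.shares_memory(a3, a))   # False
```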

1.3 Creating arrays over a fixed range

np.linspace (start, stop, num, endpoint, retstep, dtype)

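Generates an evenly spaced sequence.

start: the first value of the sequence
stop: the last value of the sequence; it is included in the sequence when endpoint is True
num: the number of evenly spaced samples to generate, default 50
endpoint: whether stop is included in the sequence, default True
retstep: if True, returns the samples together with the step between consecutive values
dtype: the data type of the output ndarray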

# Generate an evenly spaced array
In [10]: np.linspace(0, 100, 10)
Out[10]: 
array([  0.        ,  11.11111111,  22.22222222,  33.33333333,
        44.44444444,  55.55555556,  66.66666667,  77.77777778,
        88.88888889, 100.        ])

Other functions include:
- numpy.arange(start,stop, step, dtype)
- numpy.logspace(start,stop, num, endpoint, base, dtype)

In [12]: np.arange(10, 50, 2)
Out[12]: 
array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42,
       44, 46, 48])
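
To make the endpoint and retstep parameters concrete, here is a minimal sketch that also contrasts linspace with arange and logspace. The specific ranges are arbitrary choices for the example.

```python
import numpy as np

# retstep=True returns a (samples, step) tuple
pts, step = np.linspace(0, 100, 11, retstep=True)
print(pts)     # 0, 10, ..., 100
print(step)    # 10.0

# endpoint=False leaves stop out, so 10 samples cover 0..90
open_pts = np.linspace(0, 100, 10, endpoint=False)
print(open_pts)

# arange takes a step instead of a count and never includes stop
evens = np.arange(10, 50, 2)
print(len(evens))   # 20

# logspace spaces the exponents evenly: base**0 ... base**3 (base defaults to 10)
powers = np.logspace(0, 3, 4)
print(powers)
```

A rule of thumb: use linspace when the number of points matters and arange when the step matters.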

1.4 Creating random arrays

The np.random module

Uniform distribution
- np.random.rand(10)
- np.random.uniform(0, 100)
- np.random.randint(100)
Normal distribution
- A normal distribution with a given mean, standard deviation, and shape
- np.random.normal(1.75, 0.2, (3,4))
- np.random.standard_normal(size=(3,4))

In [20]: # Create an array of uniformly distributed values
    ...: # in the range 0 to 1
    ...: np.random.rand(10)
Out[20]: 
array([0.87288205, 0.80947112, 0.46926444, 0.29773747, 0.31025692,
       0.93114797, 0.33767959, 0.39665134, 0.13081277, 0.009583  ])

In [21]: # With no size given, returns a single number from the given range
    ...: np.random.uniform(0, 100)
Out[21]: 28.058745825637

In [22]: # A random integer
    ...: np.random.randint(10)
Out[22]: 5

In [23]: # Randomly generate a 10x10 two-dimensional array with mean 1.75 and standard deviation 0.1
    ...: np.random.normal(1.75, 0.1, (10, 10))
Out[23]: 
array([[1.70764815, 1.90825189, 1.8104449 , 1.70028603, 1.82615611,
        1.93880119, 1.7287118 , 1.78678625, 1.69236525, 1.83106084],
       [1.85506358, 1.8494882 , 1.81245449, 1.65141356, 1.67844396,
        1.7451425 , 1.71257283, 1.82086795, 1.78385828, 1.68407594],
       [1.80595801, 1.75515106, 1.61061639, 1.68932146, 1.74793649,
        1.96122946, 1.76600436, 1.65205964, 1.73050053, 1.7609086 ],
       [1.65729785, 1.89011816, 1.55279622, 1.86074792, 1.68450454,
        1.79638268, 1.86659596, 1.7898481 , 1.62515823, 1.63247927],
       [1.60809596, 1.73966031, 1.73002805, 1.77336273, 1.86400997,
        1.77858312, 1.83105924, 1.66912282, 1.89985875, 1.77700716],
       [1.70033721, 1.68657645, 1.7317493 , 1.56939442, 1.65021153,
        1.81942054, 1.6893915 , 1.70844817, 1.64445185, 1.63156201],
       [1.6867838 , 1.5344188 , 1.87800365, 1.9121467 , 1.61793696,
        1.67682232, 1.70911169, 1.65752606, 1.7661449 , 1.85856154],
       [1.65011679, 1.75154708, 1.77703809, 1.75621051, 1.75773277,
        1.64378273, 1.79769325, 1.57588754, 1.8350513 , 1.83420537],
       [1.86856956, 1.75142543, 1.68319314, 1.9257889 , 1.90830985,
        1.78855576, 1.70669114, 1.8173815 , 1.72834393, 1.8191069 ],
       [1.70305002, 1.7388776 , 1.68824347, 1.83657128, 1.66446864,
        1.63069994, 1.72884818, 1.97192137, 1.72334265, 1.60113022]])
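
The outputs above change on every run. For reproducible experiments, one option is a seeded generator. The sketch below uses np.random.default_rng, which is NumPy's newer Generator interface rather than the legacy functions used in this article; the seed 0 and the array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator: same numbers on every run

# scale is the standard deviation, not the variance
heights = rng.normal(loc=1.75, scale=0.1, size=(100, 100))
print(heights.shape)
print(heights.mean(), heights.std())   # close to 1.75 and 0.1

u = rng.uniform(0, 100, size=5)   # floats in [0, 100)
k = rng.integers(0, 10, size=5)   # integers in [0, 10)
print(u)
print(k)
```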

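2 The normal distribution (for understanding)

2.1 What is the normal distribution

The normal distribution is a probability distribution. It is the distribution of a continuous random variable with two parameters, μ and σ: μ is the mean of the random variable and σ is its standard deviation (σ² is its variance), so the distribution is written N(μ, σ²).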

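2.2 Applications of the normal distribution

The probability distributions of many random variables in everyday life, production, and scientific experiments can be approximated by a normal distribution.

2.3 Properties of the normal distribution

μ determines the location of the distribution, and the standard deviation σ determines its spread. The normal distribution with μ = 0 and σ = 1 is the standard normal distribution.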
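
The roles of μ and σ can be checked empirically. The following is a rough sketch using sampled data, so the printed values are approximations; the seed, the sample size, and the parameters 1.75 and 0.2 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Samples from the standard normal distribution (mu = 0, sigma = 1)
z = rng.standard_normal(100_000)

# Shifting by mu and scaling by sigma gives samples with mean mu and std sigma
mu, sigma = 1.75, 0.2
x = mu + sigma * z
print(x.mean(), x.std())    # approximately 1.75 and 0.2

# About 68% of the values fall within one standard deviation of the mean
within_one_sigma = np.mean(np.abs(z) < 1)
print(within_one_sigma)
```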

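2.4 Variance

In probability theory and statistics, variance is a measure of how dispersed a set of data is:

S^2 = \frac{(x_1 - M)^2 + (x_2 - M)^2 + \cdots + (x_n - M)^2}{n}

where x_1, ..., x_n are the data values, M is the mean, n is the number of data points, S is the standard deviation, and S^2 as a whole is the variance.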

2.5 The meaning of standard deviation and variance

Both can be understood as measures of how spread out the data is.
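
As a worked example, the variance can be computed by hand as the mean of the squared deviations from the mean M, and compared with NumPy's built-ins. The data values below are made up for illustration.

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

M = data.mean()                             # mean: 5.0
S2 = ((data - M) ** 2).sum() / data.size    # variance: 32 / 8 = 4.0
S = np.sqrt(S2)                             # standard deviation: 2.0

print(M, S2, S)
print(np.var(data), np.std(data))           # NumPy divides by n by default

# ddof=1 divides by n - 1 instead (the sample variance)
print(np.var(data, ddof=1))
```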

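3 Case study: randomly generating two years of daily price changes for 500 stocks

How can we obtain the daily price-change data for 500 stocks over two years (504 days, excluding weekends and holidays)?

Number of trading days in two years: 2 × 252 = 504
Randomly generate price changes from some normal distribution, for example with mean 0 and variance 1

3.1 Creating the stock price-change data

# Create normally distributed price-change data for 500 stocks over 504 days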
stock_day_rise = np.random.normal(0, 1, (500, 504))
stock_day_rise.shape

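3.2 Array indexing

Get the price changes of the first stock for the first 100 trading days

# A two-dimensional array has two axes: index as [row, column]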
stock_day_rise[0, 0:100]
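
The same indexing rules are easier to see on a small array. This sketch uses a hypothetical 5-stock, 8-day array in place of the full data set.

```python
import numpy as np

rng = np.random.default_rng(1)
small = rng.normal(0, 1, (5, 8))   # hypothetical: 5 stocks (rows), 8 days (columns)

first_stock = small[0, 0:3]   # first stock, first 3 days
day_two = small[:, 1]         # every stock on the second day
block = small[1:3, -2:]       # stocks 2 and 3, last 2 days

print(first_stock.shape)   # (3,)
print(day_two.shape)       # (5,)
print(block.shape)         # (2, 2)
```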

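3.3 Changing array shape and type

3.3.1 Changing the shape

Goal: swap the stock rows and date columns from above, so that rows are dates and columns are stocks. Note that reshape and resize only regroup the elements in their existing order; they do not transpose the data. For a true row/column swap, use ndarray.T (section 4).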

ndarray.reshape(shape[, order])

ndarray.resize(new_shape[, refcheck])

ndarray.flatten([order])

Code walkthrough:

In [32]: # Create normally distributed price-change data for 500 stocks over 504 days
    ...: stock_day_rise = np.random.normal(0, 1, (500, 504))
    ...: stock_day_rise.shape
Out[32]: (500, 504)

In [33]: # When reshaping, the total number of elements must match
    ...: stock_day_rise.reshape([504, 500]).shape
Out[33]: (504, 500)

In [34]: # resize changes the shape in place and has no return value
    ...: stock_day_rise.resize([504,500])

In [35]: # Print the shape of the original array
    ...: stock_day_rise.shape
Out[35]: (504, 500)

In [36]: # flatten converts to a one-dimensional array
    ...: stock_day_rise.flatten().shape
Out[36]: (252000,)
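
On a tiny array it is easy to see how reshape differs from a transpose, even when the resulting shapes are the same. A minimal sketch:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]

reshaped = a.reshape(3, 2)        # same element order, regrouped into 3 rows
transposed = a.T                  # rows and columns actually swapped

print(reshaped)      # [[0, 1], [2, 3], [4, 5]]
print(transposed)    # [[0, 3], [1, 4], [2, 5]]

# -1 lets NumPy infer one dimension from the total element count
inferred = a.reshape(-1, 2)
print(inferred.shape)   # (3, 2)

# flatten always returns a copy, so writing to it leaves a untouched
flat = a.flatten()
flat[0] = 99
print(a[0, 0])   # 0
```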

3.3.2 Changing the type

ndarray.astype(type)

In [38]: stock_day_rise.reshape([504, 500]).astype(np.int32)
Out[38]: 
array([[-1, -2,  0, ...,  0,  1,  0],
       [ 0,  0,  0, ...,  1,  0,  0],
       [ 2,  0,  0, ..., -1,  0,  1],
       ...,
       [ 0,  0,  0, ...,  0,  1,  0],
       [ 0,  0,  0, ...,  0,  0, -2],
       [ 1,  1,  0, ..., -1,  1,  0]])
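
Note that converting floats to integers with astype discards the fractional part (truncation toward zero) rather than rounding. The sketch below contrasts this with rounding first via np.rint; the sample values are arbitrary.

```python
import numpy as np

x = np.array([-1.7, -0.2, 0.9, 2.5])

truncated = x.astype(np.int32)          # drops the fraction: [-1, 0, 0, 2]
rounded = np.rint(x).astype(np.int32)   # round to nearest first: [-2, 0, 1, 2]

print(truncated)
print(rounded)
print(truncated.dtype)   # int32
```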

3.3.3 Changing the number of decimal places

ndarray.round([decimals, out]), or equivalently np.round(a[, decimals, out])

In [39]: # 4: keep 4 decimal places
    ...: np.round(stock_day_rise[:2, :20], 4)
Out[39]: 
array([[-1.3366, -2.3797,  0.0407, -0.3354,  0.5998,  0.6724,  0.6552,
        -0.3418,  0.9803, -1.8896,  1.3886,  0.6984,  0.3171, -0.4011,
        -1.1443, -1.012 ,  0.1216, -1.8686,  0.4273,  1.3278],
       [ 0.7187, -0.6834, -0.7612,  0.1251,  1.554 ,  0.0957,  0.3044,
         1.8853,  0.2439,  0.5864, -1.0216, -0.0401,  0.7749, -0.1589,
         0.3079, -0.967 ,  1.5322,  1.0996, -0.7829, -0.3227]])
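
Both the function form and the method form take the number of decimals as an argument. One detail worth knowing is that NumPy rounds values exactly halfway between two integers to the nearest even one. A small sketch with made-up values:

```python
import numpy as np

x = np.array([1.23456, -0.98765, 2.5, 3.5])

two_places = np.round(x, 2)    # function form
method_form = x.round(2)       # method form, same result
nearest = np.round(x)          # decimals defaults to 0

print(two_places)   # 1.23, -0.99, 2.5, 3.5
print(nearest)      # 1., -1., 2., 4.  (2.5 rounds to 2, 3.5 rounds to 4)
```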

4 Array conversion

ndarray.T transposes the array: rows and columns are swapped

stock_day_rise.shape
(500, 504)
stock_day_rise.T.shape
(504, 500)

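ndarray.tostring([order]) or ndarray.tobytes([order])
- Converts the array to bytes. tostring is an older alias that has been deprecated since NumPy 1.19; prefer tobytes in new code.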

In [40]: arr = np.array([ [[1,2,3],[4,5,6]], [[12,3,34],[5,6,7]]])
    ...: arr.tostring()
Out[40]: b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00\x0c\x00\x00\x00\x03\x00\x00\x00"\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00'

In [41]: arr = np.array([ [[1,2,3],[4,5,6]], [[12,3,34],[5,6,7]]])
    ...: arr.tobytes()
Out[41]: b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00\x0c\x00\x00\x00\x03\x00\x00\x00"\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00'
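
The raw bytes carry no shape or dtype information, so both must be supplied to read them back. The sketch below pins the dtype to int32 explicitly, since the default integer type (and therefore the byte layout shown above, which uses 4 bytes per value) depends on the platform.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

raw = arr.tobytes()
print(len(raw))    # 24: 6 elements * 4 bytes each

# frombuffer needs the dtype; the result is 1-D, so restore the shape too
restored = np.frombuffer(raw, dtype=np.int32).reshape(arr.shape)
print(restored)
print(np.array_equal(restored, arr))   # True
```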

ndarray.copy([order])

# First copy some data out of the two-year stock_day_rise array
temp = stock_day_rise[:4, :4].copy()

"""
array([[-1.3366099 , -2.37966158,  0.04065209, -0.33543369],
       [ 0.718661  , -0.68340385, -0.7611837 ,  0.12510357],
       [ 2.31357094, -0.54394143, -0.98231446,  1.5545475 ],
       [ 0.41042537, -0.50010819,  1.33251362,  0.41554552]])
"""

When we do not want to modify the original data for a stock, we can make a copy and operate on the copy instead.
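
The copy matters because a plain slice is a view onto the same memory. A minimal sketch of the difference, using a small made-up array in place of stock_day_rise:

```python
import numpy as np

data = np.arange(16, dtype=float).reshape(4, 4)   # stand-in for the stock data

view = data[:2, :2]           # basic slicing returns a view
temp = data[:2, :2].copy()    # an independent copy

temp[0, 0] = -1.0             # only the copy changes
view[0, 1] = -2.0             # writes through to data

print(data[0, 0])   # 0.0
print(data[0, 1])   # -2.0
print(np.shares_memory(view, data), np.shares_memory(temp, data))   # True False
```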
