1. NumPy Data Structures
1.1 NumPy Array Basics
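NumPy is one of the most important Python libraries for data analysis and scientific computing.
It provides the powerful ndarray type for working with multidimensional data.
You can define the data as an ordinary Python sequence (for example, a list) and convert it with np.array().
If an array is built from integers and at least one floating-point number, every element of that array is stored as a floating-point number.
All elements of an ndarray must share the same data type; this is a key difference from a Python list, which can hold elements of different types.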
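A quick check of the two rules above (upcasting to float, one shared type per array), compared with a plain Python list. The variable names are illustrative only.

```python
import numpy as np

# A list may hold mixed types; each element keeps its own Python type
mixed_list = [10, 10.5, 11]
print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'int']

# Converting to an ndarray: one float is enough to make every element float64
prices = np.array(mixed_list)
print(prices, prices.dtype)

# With no float present, the array keeps an integer dtype
int_prices = np.array([10, 11, 12])
print(int_prices.dtype.kind)  # 'i' (signed integer)
```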
In [2]:
import numpy as np
1.2 Ways to Create a NumPy ndarray
Method 1: Direct input
In [3]:
# Closing prices for the last five days: 10, 10.5, 11.0, 11.5, 12.0; create the array with np.array
# 1-D array form: variable_name = np.array([element1, element2, ...])
close = np.array([10, 10.5, 11.0, 11.5, 12.0])
close
Out[3]:
array([10. , 10.5, 11. , 11.5, 12. ])
In [3]:
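# Suppose the opening prices for the last five days are 9., 9.5, 10., 10.5, 11. Create a 2-D array containing both the opening and closing prices by direct input
Method 2: Create an ndarray from other data sequences
From list and tuple objects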
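One possible answer to the exercise, as a sketch: nesting two lists inside an outer list gives a 2-D array. Putting the opening prices in row 0 and the closing prices in row 1 is our own choice; the exercise does not specify the row order.

```python
import numpy as np

# Row 0: opening prices, row 1: closing prices (last five days)
open_close = np.array([[9.0, 9.5, 10.0, 10.5, 11.0],
                       [10, 10.5, 11.0, 11.5, 12.0]])
print(open_close.shape)  # (2, 5): 2 rows (price series), 5 columns (days)
print(open_close[1])     # the closing-price row
```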
In [7]:
# Convert a list to an ndarray
close_list = [10, 10.5, 11.0, 11.5, 12.0]
open_list = [9, 11, 10.4, 11, 11.5]
In [8]:
np.array(close_list)
Out[8]:
array([10. , 10.5, 11. , 11.5, 12. ])
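Both lists can be converted the same way, and passing them together in one outer list produces a 2-D array with one row per price series. A sketch that reuses close_list and open_list from the cell above (redefined here so the block runs on its own); the names open_array and prices are illustrative.

```python
import numpy as np

close_list = [10, 10.5, 11.0, 11.5, 12.0]
open_list = [9, 11, 10.4, 11, 11.5]

open_array = np.array(open_list)            # 1-D; the floats upcast everything to float64
prices = np.array([open_list, close_list])  # 2-D: row 0 open, row 1 close
print(open_array.dtype, prices.shape)       # float64 (2, 5)
```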
# Convert a tuple to an ndarray
stock_info_tuple = ('000001', '平安银行', '银行', 10.20)  # tuple storing the stock code, name (Ping An Bank), industry (banking) and a numeric value
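Converting this tuple shows the same-type rule at work: because the tuple mixes strings and a float, NumPy picks a Unicode string dtype and turns 10.20 into text. Passing dtype=object keeps the original Python objects instead, at the cost of speed, since each element is then a generic Python object. A sketch (info_array and info_objects are illustrative names):

```python
import numpy as np

stock_info_tuple = ('000001', '平安银行', '银行', 10.20)

# Default conversion: every element becomes a Unicode string
info_array = np.array(stock_info_tuple)
print(info_array.dtype.kind)  # 'U'
print(info_array[3])          # 10.2, now stored as a string

# dtype=object keeps each element's original Python type
info_objects = np.array(stock_info_tuple, dtype=object)
print(type(info_objects[3]).__name__)  # float
```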
Method 3: Generate an ndarray with NumPy functions
In [13]:
help(np.arange)
Help on built-in function arange in module numpy:

arange(...)
    arange([start,] stop[, step,], dtype=None, *, like=None)

    Return evenly spaced values within a given interval.

    ``arange`` can be called with a varying number of positional arguments:

    * ``arange(stop)``: Values are generated within the half-open interval
      ``[0, stop)`` (in other words, the interval including `start` but
      excluding `stop`).
    * ``arange(start, stop)``: Values are generated within the half-open
      interval ``[start, stop)``.
    * ``arange(start, stop, step)`` Values are generated within the half-open
      interval ``[start, stop)``, with spacing between values given by
      ``step``.

    For integer arguments the function is roughly equivalent to the Python
    built-in :py:class:`range`, but returns an ndarray rather than a ``range``
    instance.

    When using a non-integer step, such as 0.1, it is often better to use
    `numpy.linspace`.

    See the Warning sections below for more information.

    Parameters
    ----------
    start : integer or real, optional
        Start of interval. The interval includes this value. The default
        start value is 0.
    stop : integer or real
        End of interval. The interval does not include this value, except
        in some cases where `step` is not an integer and floating point
        round-off affects the length of `out`.
    step : integer or real, optional
        Spacing between values. For any output `out`, this is the distance
        between two adjacent values, ``out[i+1] - out[i]``. The default
        step size is 1. If `step` is specified as a position argument,
        `start` must also be given.
    dtype : dtype, optional
        The type of the output array. If `dtype` is not given, infer the data
        type from the other input arguments.
    like : array_like, optional
        Reference object to allow the creation of arrays which are not
        NumPy arrays. If an array-like passed in as ``like`` supports
        the ``__array_function__`` protocol, the result will be defined
        by it. In this case, it ensures the creation of an array object
        compatible with that passed in via this argument.

        .. versionadded:: 1.20.0

    Returns
    -------
    arange : ndarray
        Array of evenly spaced values.

        For floating point arguments, the length of the result is
        ``ceil((stop - start)/step)``. Because of floating point overflow,
        this rule may result in the last element of `out` being greater
        than `stop`.

    Warnings
    --------
    The length of the output might not be numerically stable.

    Another stability issue is due to the internal implementation of
    `numpy.arange`.
    The actual step value used to populate the array is
    ``dtype(start + step) - dtype(start)`` and not `step`. Precision loss
    can occur here, due to casting or due to using floating points when
    `start` is much larger than `step`. This can lead to unexpected
    behaviour. For example::

      >>> np.arange(0, 5, 0.5, dtype=int)
      array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
      >>> np.arange(-3, 3, 0.5, dtype=int)
      array([-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8])

    In such cases, the use of `numpy.linspace` should be preferred.

    The built-in :py:class:`range` generates :std:doc:`Python built-in integers
    that have arbitrary size <python:c-api/long>`, while `numpy.arange`
    produces `numpy.int32` or `numpy.int64` numbers. This may result in
    incorrect results for large integer values::

      >>> power = 40
      >>> modulo = 10000
      >>> x1 = [(n ** power) % modulo for n in range(8)]
      >>> x2 = [(n ** power) % modulo for n in np.arange(8)]
      >>> print(x1)
      [0, 1, 7776, 8801, 6176, 625, 6576, 4001] # correct
      >>> print(x2)
      [0, 1, 7776, 7185, 0, 5969, 4816, 3361] # incorrect

    See Also
    --------
    numpy.linspace : Evenly spaced numbers with careful handling of endpoints.
    numpy.ogrid: Arrays of evenly spaced numbers in N-dimensions.
    numpy.mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.
    :ref:`how-to-partition`

    Examples
    --------
    >>> np.arange(3)
    array([0, 1, 2])
    >>> np.arange(3.0)
    array([ 0., 1., 2.])
    >>> np.arange(3,7)
    arr
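A few calls matching the help text above: arange with one, two and three arguments, plus linspace, which the docstring recommends for non-integer steps. As an illustration of our own (not from the original notebook), linspace regenerates the five closing prices used earlier by fixing the number of points rather than the step.

```python
import numpy as np

print(np.arange(5))         # [0 1 2 3 4]: stop only, interval [0, 5)
print(np.arange(2, 6))      # [2 3 4 5]: start and stop
print(np.arange(1, 10, 2))  # [1 3 5 7 9]: step of 2, stop 10 is excluded

# For a fractional spacing, linspace takes a point count instead of a step:
# 5 evenly spaced values from 10 to 12, both endpoints included
close = np.linspace(10, 12, 5)
print(close)
```

Unlike arange, linspace includes the stop value by default (endpoint=True), which is why 12 appears as the last element.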