python数据分析学习笔记二

最新推荐文章于 2023-09-17 16:47:52 发布

retacn

最新推荐文章于 2023-09-17 16:47:52 发布

阅读量609

点赞数

分类专栏： python数据分析

本文链接：https://blog.csdn.net/retacn_yue/article/details/55096838

版权

python数据分析专栏收录该内容

8 篇文章 2 订阅

订阅专栏

第二章 Numpy数组

Numpy数组优势

#创建数组

In [16]: a=arange(5)

In [17]: a.dtype

Out[17]: dtype('int32')

In [18]: a

Out[18]: array([0, 1, 2, 3, 4])

#返回一个元组,存放每一个维度的长度

In [19]: a.shape

Out[19]: (5,)

创建多维数组

In [20]: m=array([arange(2),arange(2)])

In [21]: m

Out[21]:

array([[0, 1],

[0, 1]])

In [22]: m.shape

Out[22]: (2, 2)

选择numpy数组元素

In [24]: a=array([[1,2],[3,4]])

In [25]: a

Out[25]:

array([[1, 2],

[3, 4]])

In [26]: a[0,0]

Out[26]: 1

In [27]: a[0,1]

Out[27]: 2

In [28]: a[1,0]

Out[28]: 3

In [29]: a[1,1]

Out[29]: 4

Numpy的数值类型

Bool 布尔

Inti 基于平台的整数

Int8 字节类型

Int16 整型-32768~32767

Int32 整型-2(31)~2(31)-1

Int64 整型-2(63)~2(63)-1

Uint8 无符号整型0-255

Uint16 无符号整型

Uint32 无符号整型

Uint64 无符号整型

Float16 半精度浮点型

Float32 单精度浮点型

Float64 双精度浮点型

Complex64 复数类型

Complex128复数类型

#数据类型字串

In [30]: a.dtype.itemsize

Out[30]: 4

In [31]: a.dtype

Out[31]: dtype('int32')

注:pycharm中,如果运行时,python console中自动运行ipython,可以作如下修改:

File->settings->consloe->取消use ipythonif available的选择

字符码

i 整型

u 无符号整型

f 单精度浮点型

d 双精度浮点型

b 布尔型

D 复数型

S 字符型

U 万国码

V 空类型

In [1]: arange(7,dtype='f')

Out[1]: array([ 0., 1., 2., 3., 4., 5., 6.], dtype=float32)

In [3]: arange(7,dtype='D')

Out[3]: array([ 0.+0.j, 1.+0.j, 2.+0.j, 3.+0.j, 4.+0.j, 5.+0.j, 6.+0.j])

Dtype构造函数

#python自带常规浮点型

In [4]: dtype(float)

Out[4]: dtype('float64')

In [5]: dtype('f')

Out[5]: dtype('float32')

In [6]: dtype('d')

Out[6]: dtype('float64')

In [7]: dtype('f8')

Out[7]: dtype('float64')

#列出所有类型的字符码

In [8]: sctypeDict.keys()

Out[8]:

[0,

10,

11,

12,

13,

14,

15,

16,

17,

18,

19,

20,

21,

'unicode',

23,

'cfloat',

'longfloat',

'Int32',

'Complex64',

'unicode_',

'complex',

'timedelta64',

'uint16',

'c16',

'float32',

'int32',

'D',

'H',

'void',

'unicode0',

'L',

'P',

'half',

'void0',

'd',

'h',

'l',

'p',

22,

'Timedelta64',

'object0',

'b1',

'M8',

'String0',

'float16',

'ulonglong',

'i1',

'uint32',

'?',

'Void0',

'complex64',

'G',

'O',

'UInt8',

'S',

'byte',

'UInt64',

'g',

'float64',

'ushort',

'float_',

'uint',

'object_',

'Float16',

'complex_',

'Unicode0',

'uintp',

'intc',

'csingle',

'datetime64',

'float',

'bool8',

'Bool',

'intp',

'uintc',

'bytes_',

'u8',

'u4',

'int_',

'cdouble',

'u1',

'complex128',

'u2',

'f8',

'Datetime64',

'ubyte',

'm8',

'B',

'uint0',

'F',

'bool_',

'uint8',

'c8',

'Int64',

'Int8',

'Complex32',

'V',

'int8',

'uint64',

'b',

'f',

'double',

'UInt32',

'clongdouble',

'str',

'f2',

'f4',

'int',

'longdouble',

'single',

'string',

'q',

'Int16',

'Float64',

'longcomplex',

'UInt16',

'bool',

'Float32',

'string0',

'longlong',

'i8',

'int16',

'str_',

'I',

'object',

'M',

'i4',

'singlecomplex',

'Q',

'string_',

'U',

'a',

'short',

'e',

'i',

'clongfloat',

'm',

'Object0',

'int64',

'i2',

'int0']

Dtype属性

#取得类型对应的字符码

In [9]: t=dtype('Float64')

In [10]: t.char

Out[10]: 'd'

#类型属性相当于数组对象的类型

In [11]: t.type

Out[11]: numpy.float64

#取得数据类型字符串.<表示字节顺序,f表示字符码,8表示每个元素所需字节数

In [12]: t.str

Out[12]: '<f8'

一维数组的切片和索引

In [13]: a=arange(9)

#3-7

In [14]: a[3:7]

Out[14]: array([3, 4, 5, 6])

#0-7步长是2

In [15]: a[:7:2]

Out[15]: array([0, 2, 4, 6])

#数组反转

In [16]: a[::-1]

Out[16]: array([8, 7, 6, 5, 4, 3, 2, 1, 0])

处理数组的型状

示例代码如下:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2016/12/7 11:45
# @Author  : Retacn
# @Site    : 数组形状的调整
# @File    : array_reshap.py
# @Software: PyCharm
__author__ = "retacn"
__copyright__ = "property of mankind."
__license__ = "CN"
__version__ = "0.0.1"
__maintainer__ = "retacn"
__email__ = "zhenhuayue@sina.com"
__status__ = "Development"

import numpy as np

print('In:b =arange(24).reshape(2,3,4)')
b = np.arange(24).reshape(2, 3, 4)

print('In:b')
#print(b)
#
# [[[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
#
# [[12 13 14 15]
#  [16 17 18 19]
#  [20 21 22 23]]]

#拆解 将多维数组变成一维数组
print('In:b.ravel()')
#print(b.ravel())
#[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

#拉直 同上
print('In:b.flatten()')
#print(b.flatten())
#[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

#用元数组指定数组形状
print('In:b.shape(6,4)')
b.shape=(6,4)
# print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]
#  [20 21 22 23]]

#转置 行变列,列变行
print('In:b.transpose()')
#print(b.transpose())
# [[ 0  4  8 12 16 20]
#  [ 1  5  9 13 17 21]
#  [ 2  6 10 14 18 22]
#  [ 3  7 11 15 19 23]]

#调整大小
print('In:b.resize((2,12))')
b.resize((2,12))
#print(b)
# [[ 0  1  2  3  4  5  6  7  8  9 10 11]
#  [12 13 14 15 16 17 18 19 20 21 22 23]]

堆叠数组

In [17]: a=arange(9).reshape(3,3)

In [18]: a

Out[18]:

array([[0, 1, 2],

[3, 4, 5],

[6, 7, 8]])

In [19]: b=2*a

In [20]: b

Out[20]:

array([[ 0, 2, 4],

[ 6, 8, 10],

[12, 14, 16]])

水平叠加

In [21]: hstack((a,b))

Out[21]:

array([[ 0, 1, 2, 0, 2, 4],

[ 3, 4, 5, 6, 8, 10],

[ 6, 7, 8, 12, 14, 16]])

In [22]: concatenate((a,b),axis=1)

Out[22]:

array([[ 0, 1, 2, 0, 2, 4],

[ 3, 4, 5, 6, 8, 10],

[ 6, 7, 8, 12, 14, 16]])

垂直叠加

In [23]: vstack((a,b))

Out[23]:

array([[ 0, 1, 2],

[ 3, 4, 5],

[ 6, 7, 8],

[ 0, 2, 4],

[ 6, 8, 10],

[12, 14, 16]])

In [24]: concatenate((a,b),axis=0)

Out[24]:

array([[ 0, 1, 2],

[ 3, 4, 5],

[ 6, 7, 8],

[ 0, 2, 4],

[ 6, 8, 10],

[12, 14, 16]])

深度叠加

In [25]: dstack((a,b))

Out[25]:

array([[[ 0, 0],

[ 1, 2],

[ 2, 4]],

[[ 3, 6],

[ 4, 8],

[ 5, 10]],

[[ 6, 12],

[ 7, 14],

[ 8, 16]]])

列式堆叠

#一维数组

In [26]: oned=arange(2)

In [27]: oned

Out[27]: array([0, 1])

In [29]: twice_oned=2*oned

In [30]: twice_oned

Out[30]: array([0, 2])

In [31]: column_stack((oned,twice_oned))

Out[31]:

array([[0, 0],

[1, 2]])

#二维数组

In [32]: column_stack((a,b))

Out[32]:

array([[ 0, 1, 2, 0, 2, 4],

[ 3, 4, 5, 6, 8, 10],

[ 6, 7, 8, 12, 14, 16]])

In [33]: column_stack((a,b))==hstack((a,b))

Out[33]:

array([[ True, True, True, True, True, True],

[ True, True, True, True, True, True],

[ True, True, True, True, True, True]], dtype=bool)

行式堆叠

#一维数组

In [34]: row_stack((oned,twice_oned))

Out[34]:

array([[0, 1],

[0, 2]])

#二维数组

In [35]: row_stack((a,b))

Out[35]:

array([[ 0, 1, 2],

[ 3, 4, 5],

[ 6, 7, 8],

[ 0, 2, 4],

[ 6, 8, 10],

[12, 14, 16]])

拆分numpy数组

纵向拆分

In [39]: vsplit(a,3)

Out[39]: [array([[0, 1, 2]]), array([[3, 4,5]]), array([[6, 7, 8]])]

In [41]: split(a,3,axis=0)

Out[41]: [array([[0, 1, 2]]), array([[3, 4,5]]), array([[6, 7, 8]])]

横向拆分

In [36]: a

Out[36]:

array([[0, 1, 2],

[3, 4, 5],

[6, 7, 8]])

In [37]: hsplit(a,3)

Out[37]:

[array([[0],

[3],

[6]]), array([[1],

[4],

[7]]), array([[2],

[5],

[8]])]

In [38]: split(a,3,axis=1)

Out[38]:

[array([[0],

[3],

[6]]), array([[1],

[4],

[7]]), array([[2],

[5],

[8]])]

深度方向拆分

In [42]: c=arange(27).reshape(3,3,3)

In [43]: c

Out[43]:

array([[[ 0, 1, 2],

[ 3, 4, 5],

[ 6, 7, 8]],

[[ 9, 10, 11],

[12, 13, 14],

[15, 16, 17]],

[[18, 19, 20],

[21, 22, 23],

[24, 25, 26]]])

In [44]: dsplit(c,3)

Out[44]:

[array([[[ 0],

[ 3],

[ 6]],

[[ 9],

[12],

[15]],

[[18],

[21],

[24]]]), array([[[ 1],

[ 4],

[ 7]],

[[10],

[13],

[16]],

[[19],

[22],

[25]]]), array([[[ 2],

[ 5],

[ 8]],

[[11],

[14],

[17]],

[[20],

[23],

[26]]])]

Numpy数组的属性

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2016/12/7 13:32
# @Author  : Retacn
# @Site    : 数组的属性
# @File    : array_attribute.py
# @Software: PyCharm
__author__ = "retacn"
__copyright__ = "property of mankind."
__license__ = "CN"
__version__ = "0.0.1"
__maintainer__ = "retacn"
__email__ = "zhenhuayue@sina.com"
__status__ = "Development"

import numpy as np

b = np.arange(24).reshape(2, 12)
print('In:b')
# print(b)
# [[ 0  1  2  3  4  5  6  7  8  9 10 11]
#  [12 13 14 15 16 17 18 19 20 21 22 23]]

# 取得数组的维度
print('In:b.ndim')
# print(b.ndim)
# 2

# 元素的数量
print('In:b.size')
# print(b.size)
# 24

# 各个元素所占用的字节数
print('In:b.itemsize')
# print(b.itemsize)
# 4

# 要存取整个数组所需要的字节数
print('In:b.nbytes')
# print(b.nbytes)
# 96

print('In:b.size*b.itemsize')
# print(b.size * b.itemsize)
# 96

print('In:b.resize(6,4)')
# b.resize(6, 4)
# print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]
#  [20 21 22 23]]

# 与transpose()函数相同
print('In:b.T')
# print(b.T)
# [[ 0  4  8 12 16 20]
#  [ 1  5  9 13 17 21]
#  [ 2  6 10 14 18 22]
#  [ 3  7 11 15 19 23]]

# 生成一个复数数组
print('In:b.=array([1.j+1,2.j+3])')
b = np.array([1.j + 1, 2.j + 3])
# print(b)
# [ 1.+1.j  3.+2.j]

# 返回数组的实部
print('In:b.real')
# print(b.real)
# [ 1.  3.]

# 数组的虚部
print('In:b.imag')
# print(b.imag)
# [ 1.  2.]

print('In:b.dtype')
# print(b.dtype)
# complex128

# 如果数组含有复数,数据类型将自动变为复数类型
print('In:b.dtype,str')
# print(b.dtype.str)
# <c16

print('In:b=arange(4).reshape(2,2)')
b = np.arange(4).reshape(2, 2)
# print(b)
# [[0 1]
#  [2 3]]

# 返回 numpy.flatiter
print('In:f=b.flat')
f = b.flat
# print(f)
# <numpy.flatiter object at 0x029B8438>

print('In:for it in f:print(it)')
# for it in f:
# print(it)
# 0
# 1
# 2
# 3

# 查询单个元素
print('In:b.flat[2]')
# print(b.flat[2])
# 2

# 查询多个元素
print('In:b.flat[[1,3]]')
print(b.flat[[1, 3]])
# [1 3]


print('In:b')
print(b)
# [[0 1]
#  [2 3]]

# 赋值
print('In:b.flat[[1,3]]=1')
b.flat[[1, 3]] = 1

print('In:b')
print(b)
# [[0 1]
#  [2 1]]

数组的转换


import numpy as np

b = np.array([1.j + 1, 2.j + 3])
print(b)
#[ 1.+1.j  3.+2.j]

#numpy数组转换成python列表
b.tolist()
print(b)
#[ 1.+1.j  3.+2.j]

#把数组元素转换为指定类型
b.astype(int)
print(b)
#[ 1.+1.j  3.+2.j]

#转换为int类型时,虚部将被替换
b.astype('complex')
print(b)
#[ 1.+1.j  3.+2.j]

创建数组的视图和拷贝


from scipy import misc
import matplotlib.pyplot as plt

ascent= misc.ascent()
# 创建一份视图的拷贝
acopy = ascent.copy()
# 为该数组创建一个视图
aview = ascent.view()

# 显示图像
plt.subplot(221), plt.imshow(ascent)
plt.title(ascent), plt.xticks([]), plt.yticks([])
plt.subplot(222), plt.imshow(acopy)
plt.title('acopy'), plt.xticks([]), plt.yticks([])
plt.subplot(223), plt.imshow(aview)
plt.title('aview'), plt.xticks([]), plt.yticks([])
# 通过flat迭代器,将视图中所有值全部设为0
aview.flat = 0
plt.subplot(224), plt.imshow(aview)
plt.title('aview1'), plt.xticks([]), plt.yticks([])
plt.show()

花式索引


from scipy import misc
from matplotlib import pyplot as plt

# 读入图像
ascent = misc.ascent()
# print(ascent)
# 取得x轴y轴的长度
xmax = ascent.shape[0]
ymax = ascent.shape[1]

# print(range(xmax))
# print(range(ymax))
# print(range(xmax - 1, -1, -1))
# print(ascent[range(xmax), range(ymax)])
# 将一条对角线上的值设为0
ascent[range(xmax), range(ymax)] = 0
# print(ascent[range(xmax), range(ymax)])
# 将别一条对角线上的值设为0
ascent[range(xmax - 1, -1, -1), range(ymax)] = 0

plt.imshow(ascent)
plt.show()

基于位置列表的索引方法


from scipy import misc
from matplotlib import pyplot as plt
import numpy as np

# 读入图像
ascent = misc.ascent()
# 取得图像的大小
xmax = ascent.shape[0]
ymax = ascent.shape[1]


# 打乱数组的索引
def shuffle_indices(size):
    arr = np.arange(size)
    np.random.shuffle(arr)
    return arr


xindices = shuffle_indices(xmax)
print(xindices, len(xindices), xmax)
np.testing.assert_equal(len(xindices), xmax)

yindices = shuffle_indices(ymax)
np.testing.assert_equal(len(yindices), ymax)

# 显示打乱后的图像,实际打乱的是位置索引
plt.imshow(ascent[np.ix_(xindices, yindices)])
plt.show()

使用布尔变量索引numpy数组


from scipy import misc
from matplotlib import pyplot as plt
import numpy as np

ascent = misc.ascent()


def get_indices(size):
    arr = np.arange(size)
    return arr % 4 == 0


# 对角线上可以被4整除的点
ascent1 = ascent.copy()
xindices = get_indices(ascent.shape[0])
yindices = get_indices(ascent.shape[1])
ascent1[xindices, yindices] = 0

# 将数组中值大于1/4到3/4的值 设为0
ascent2 = ascent.copy()
ascent1[(ascent > ascent.max() / 4) & (ascent < 3 * ascent.max() / 4)] = 0

# 显示图像 
plt.subplot(131), plt.imshow(ascent)
plt.title('ascent'), plt.xticks([]), plt.yticks([])
plt.subplot(132), plt.imshow(ascent1)
plt.title('ascent1'), plt.xticks([]), plt.yticks([])
plt.subplot(133), plt.imshow(ascent2)
plt.title('ascent2'), plt.xticks([]), plt.yticks([])
plt.show()

Numpy数组的广播

Python读取wave文件,示例代码如下:


from tkinter import *
import wave
from matplotlib import pyplot as plt
import numpy as np

# 打开文件
f = wave.open(r"si2323.wav", 'rb')

# 读取格式信息
params = f.getparams()
nchannels, sampwidth, framerate, nframes = params[:4]

# 读取波型数据
str_data = f.readframes(nframes)
f.close()

# 将wav波型数据转换为array数组
wave_data = np.fromstring(str_data, dtype=np.short)
wave_data.shape = -1, 2
wave_data = wave_data.T
time = np.arange(1, nframes) * (1.0 / framerate)
# 解决time和wave_data[0]在plot维度不同的问题
len_time = len(time) / 2 + 1
time = time[:int(len_time)]

# 显示声音波型
plt.subplot(211)
plt.plot(time, wave_data[0])
plt.subplot(212)
plt.plot(time, wave_data[1], c='r')
plt.xlabel('time')
plt.show()

示例代码如下:


from scipy.io import wavfile
from matplotlib import pyplot as plt
import urllib
import numpy as np

# response = urllib.request.urlopen('http://www.thesoundarchive.com/austinpowers/smashingbaby.wav')
# print(response.info())

# WAV_FILE =r'si2323.wav'
# filehandle = open(WAV_FILE, 'w')
# filehandle.write(response.read())
# filehandle.close()

# 读取音频文件
sample_rate, data = wavfile.read('si2323.wav')
print('Data type', data.dtype, 'Shape', data.shape)

# 显示原始声音图像
plt.subplot(211), plt.title('Original')
plt.plot(data)

# 保存wav文件
newdata = data * 0.2
newdata = newdata.astype(np.int16)
print('Data type', newdata.dtype, 'Shape', newdata.shape)
wavfile.write('quite.wav', sample_rate, newdata)

# 显示保存声音图像
plt.subplot(212), plt.title('Quiet')
plt.plot(newdata)
plt.show()