python数据分析（五）——numpy+matplotlib实例

最新推荐文章于 2024-04-25 15:12:04 发布

kuluomi111

最新推荐文章于 2024-04-25 15:12:04 发布

阅读量831

点赞数

分类专栏： python数据分析—numpy 文章标签： python 数据分析 numpy

本文链接：https://blog.csdn.net/Vapus/article/details/126180605

版权

python数据分析—numpy 专栏收录该内容

5 篇文章 2 订阅

订阅专栏

系列文章：
python数据分析（一）——numpy数组的创建
 python数据分析（二）——numpy数组的计算
 python数据分析（三）——numpy读取本地数据和索引
 python数据分析（四）——numpy中的nan和数据的填充

实战一

英国和美国各自youtube1000的数据结合之前的matplotlib绘制出各自的评论数量的直方图
希望了解英国youtube中视频的评论数和喜欢数的关系，应该如何绘制改图

一、评论数量分布直方图

分布直方图

us = np.loadtxt(us_file_path, delimiter = ",", skiprows= 0, dtype = "int")
# uk = np.loadtxt(uk_file_path, delimiter = ",", skiprows= 0, dtype = "int")

# 取评论的数据
us_comments = us[:, -1]

d = 250

# 5000以内的数量分布较多
us_comments = us_comments[us_comments <= 5000]

print(us_comments.max(), us_comments.min())

num = (us_comments.max()-us_comments.min())//d

# 设置图片大小
plt.figure(figsize=(20, 8), dpi=80)

# 绘图
plt.hist(us_comments, num)

# 标题、刻度线
plt.title("美国评论数量分布直方图", fontproperties=font)
plt.xticks(fontproperties=font)
plt.yticks(fontproperties=font)

plt.show()

二、评论数和喜欢数相互关系散点图

散点图

import numpy as np
import matplotlib
from matplotlib import pyplot as plt
from matplotlib.font_manager import FontProperties

font = FontProperties(fname="/System/Library/Fonts/Supplemental/Songti.ttc", size=14)

uk_file_path = ".../youtube_video_data/GB_video_data_numbers.csv"

uk = np.loadtxt(uk_file_path, delimiter = ",", skiprows= 0, dtype = "int")

# 选择喜欢数量大于500000的数据
uk = uk[uk[:, 1] <= 500000]
uk_comments = uk[:, -1]
uk_likes = uk[:, 1]

# 设置图片大小
plt.figure(figsize=(20, 8), dpi=80)

# 绘图
plt.scatter(uk_likes, uk_comments)

# 标题、刻度线
plt.title("评论数和喜欢数相关关系", fontproperties=font)
plt.xticks(fontproperties=font)
plt.xlabel("喜欢数量", fontproperties=font)
plt.yticks(fontproperties=font)
plt.ylabel("评论数量", fontproperties=font)

plt.show()

三、数组的拼接

现在把之前案例中两个国家的数据方法一起来研究分析，应该怎么做？

代码实例：

In [1]: t1
0ut[1]:
array([[0, 1, 2, 3, 4, 5],
	   [6, 7, 8, 9, 10, 11]])
	   
In [2]: t2
Out[2]:
array([[12, 13, 14, 15, 16, 17],
	   [18, 19, 20, 21, 22, 23]])
	   
In [3]: np.vstack((t1,t2)) -> 竖直拼接(vertically)
Out[3]:
array([[0, 1, 2, 3, 4, 5],
	   [6, 7, 8, 9, 10, 11],
	   [12, 13, 14, 15, 16, 17],
	   [18, 19, 20, 21, 22, 23]])

In [4]: np.hstack((t1,t2)) -> 水平拼接(horizontally)
Out[167]:
array([[0, 1, 2, 3, 4, 5, 12, 13, 14, 15, 16, 17],
	   [6, 7, 8, 9, 10, 11, 18, 19, 20, 21, 22, 23]])

水平分割和竖直分割跟水平拼接和竖直拼接是反方向的，水平分割是竖直的一条线切割，竖直分割是水平的一条线切割

四、数组的行列交换

数组水平或者竖直拼接很简单，但是拼接之前应该注意什么？

竖直拼接的时候：每一列代表的意义相同！

如果每一列的意义不同，需要交换某一组的数列，让其和另外一类相同，如何交换某个数组的行或者列？

代码实例：

In [180]: t = nр. arange(12, 24) .reshape(3, 4)
In [181]: t
0ut[181]:
array([[12, 13, 14, 15],
	   [16, 17, 18, 19],
	   [20, 21, 22, 23]])
	   
In [182]: t[[1,2], :] = t[[2,1], :] #行交换

In [183]: t
Out[183]:
array([[12, 13, 14, 15],
	   [20, 21, 22, 23],
 	   [16, 17, 18, 19]])
 	   
In [184]: t[:, [0,2]] = t[:, [2,0]] #列交换

In [185]: t
Out[185]:
array([(14, 13, 12, 15],
	   [22, 21, 20, 23],
	   [18, 17, 16, 19]])

实战二

现在希望把之前案例中两个国家的数据方法一起来研究分析，同时保留国家的信息(每条数据的国家来源)，应该怎么办

代码实例：

import numpy as np

us_file_path = ".../youtube_video_data/US_video_data_numbers.csv"
uk_file_path = ".../youtube_video_data/GB_video_data_numbers.csv"

# 加载国家数据
us = np.loadtxt(us_file_path, delimiter=",", skiprows=0, dtype="int")
uk = np.loadtxt(uk_file_path, delimiter=",", skiprows=0, dtype="int")

# 添加国家信息
# 构造全为0的数据,shape[0]表示行,1表示1列
zeros_data = np.zeros((us.shape[0], 1)).astype("int")
# 构造全为1的数据
ones_data = np.ones((uk.shape[0], 1)).astype("int")

us = np.hstack((us, zeros_data))
uk = np.hstack((uk, ones_data))

# 拼接两组数据
final_data = np.vstack((us, uk))
print(final_data)

一、numpy更多好用的方法

跨行获取最大值的位置：np.argmax(t, axis = 0)
跨列获取最小值的位置：np.argmin(t, axis = 1)
创建一个全0数组：np.zeros((3, 4))
创建一个全1数组：np.ones((3, 4))
创建一个对角线为1，其他地方全为0的正方形数组/矩阵：np.eye(3)

代码实例：

In [7]: t = np.arange(12).reshape(3, 4)
In [8]: t
Out[8]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
       
In [9]: np.argmax(t, axis = 0)
Out[9]: array([2, 2, 2, 2])

In [10]: np.argmin(t, axis = 1)
Out[10]: array([0, 0, 0])

01 numpy生成随机数

在这里插入图片描述

分布的补充：

均匀分布：在相同的大小范围内出现概率是等可能的
正态分布：呈钟型，两头低，中间高，左右对称

代码实例：

In [3]: np.random.randint(10,20,(4,5)) # 取不到20
Out[3]: 
array([[14, 17, 14, 16, 11],
       [15, 11, 15, 12, 18],
       [14, 16, 13, 18, 13],
       [10, 13, 17, 11, 12]])

代码实例：

In [4]: np.random.seed(10)
In [5]: t = np.random.randint(0,20,(3,4))
In [6]: t
Out[6]: 
array([[ 9,  4, 15,  0],
       [17, 16, 17,  8],
       [ 9,  0, 10,  8]])

In [9]: np.random.seed(11)
In [10]: t = np.random.randint(0,20,(3,4))
In [11]: t
Out[11]: 
array([[16, 17, 13, 12],
       [ 1,  7, 18, 13],
       [16,  0, 13, 12]])