[391] The hist function in matplotlib.pyplot

周小董

Category: Data Analysis

Article link: https://blog.csdn.net/xc_zhou/article/details/82224865

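Distinguishing histograms from bar charts:

A bar chart uses the length of each bar to show the frequency of each category; the bar width (which only marks the category) is fixed.
A histogram uses area to show the frequency of each group: the height of a rectangle gives the frequency or relative frequency of the group (per unit of class width when the class intervals are unequal), and the width gives the class interval, so both height and width carry meaning.

Because grouped data are continuous, the rectangles of a histogram are usually drawn adjacent to one another, whereas the bars of a bar chart are drawn apart.

Bar charts are mainly used to display categorical data, while histograms are mainly used to display numerical data.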
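
The distinction above can be sketched side by side. This is a minimal illustration, not part of the original article: the category names, the counts, and the random sample are made-up data, and the Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical categorical data: one count per category
categories = ["A", "B", "C"]
counts = [12, 7, 9]

# Hypothetical numerical data: 1000 draws from a standard normal distribution
rng = np.random.default_rng(0)
values = rng.normal(size=1000)

fig, (ax_bar, ax_hist) = plt.subplots(ncols=2, figsize=(9, 4))

# Bar chart: one separated bar per category; the bar width carries no meaning
bars = ax_bar.bar(categories, counts, width=0.6)
ax_bar.set_title("bar chart (categorical)")

# Histogram: adjacent rectangles, each spanning one class interval
n, edges, patches = ax_hist.hist(values, bins=10)
ax_hist.set_title("histogram (numerical)")

bin_widths = np.diff(edges)  # width of each class interval
```

Here `bars` holds one rectangle per category, while `patches` holds one rectangle per class interval, and `bin_widths` shows that the histogram's widths come from the data range rather than from a styling choice.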

Official documentation

Code and comments

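# -*- coding:utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

# Probability distribution histograms of a Gaussian sample
# Mean of the distribution
mean = 0
# Standard deviation: measures how concentrated or spread out the data are
sigma = 1
x = mean + sigma * np.random.randn(10000)
fig, (ax0, ax1) = plt.subplots(nrows=2, figsize=(9, 6))
# The second argument is the number of bins: the larger it is, the narrower and denser the bars.
# density=True replaces normed=1, which was deprecated in Matplotlib 2.1 and later removed.
ax0.hist(x, 40, density=True, histtype='bar', facecolor='yellowgreen', alpha=0.75)
# PDF: with density=True the bar heights estimate the probability density, so the bar areas sum to 1
ax0.set_title('pdf')
ax1.hist(x, 20, density=True, histtype='bar', facecolor='pink', alpha=0.75, cumulative=True, rwidth=0.8)
# CDF: cumulative=True accumulates the bins, e.g. to read off the probability that a value is below some threshold
ax1.set_title("cdf")
fig.subplots_adjust(hspace=0.4)
plt.show()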

Result

# -*- coding:utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

# Signature in current Matplotlib: normed has been replaced by density, and hold has been removed
matplotlib.pyplot.hist(
x, bins=None, range=None, density=False,
weights=None, cumulative=False, bottom=None,
histtype='bar', align='mid', orientation='vertical',
rwidth=None, log=False, color=None, label=None, stacked=False,
**kwargs)

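x : (n,) array or sequence of (n,) arrays
The input data to be distributed among the bins; it corresponds to the x-axis.
bins : integer or array_like, optional
The number of bins, i.e. how many bars the histogram has in total (default: 10, taken from rcParams["hist.bins"]). If a sequence is given, it is used as the bin edges.
density : boolean, optional (named normed in old Matplotlib versions)
If True, the first element of the return tuple will be the counts normalized to form a probability density, i.e., n / (len(x) * dbin)
This parameter turns the bar heights into a probability density, so that the total area of the bars is 1. The default is False.
color : color or array_like of colors or None, optional
The color of the bars.
facecolor: fill color of the bars
edgecolor: color of the bar edges
alpha: transparency
histtype: type of histogram, one of 'bar', 'barstacked', 'step', 'stepfilled'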
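
The normalization n / (len(x) * dbin) can be checked numerically. The following is a small sketch under stated assumptions: the sample is random illustrative data, and the variable names (`manual_density`, `total_area`) are introduced here only for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=10000)  # illustrative sample

# Raw counts per bin and the bin edges
counts, edges = np.histogram(x, bins=40)
dbin = np.diff(edges)  # width of each bin

# Manual normalization: n / (len(x) * dbin)
manual_density = counts / (len(x) * dbin)

# The same quantity as returned by hist with density=True
n, bins, patches = plt.hist(x, bins=40, density=True)

# Sum of bar areas (height * width); a probability density integrates to 1
total_area = np.sum(n * np.diff(bins))
```

The values in `n` should agree with `manual_density`, and `total_area` should equal 1 up to floating-point error, which is what "normalized to form a probability density" means in practice.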

# -*- coding:utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

# example data
mu = 100  # mean of distribution
sigma = 15  # standard deviation of distribution
x = mu + sigma * np.random.randn(10000)

num_bins = 50
# the histogram of the data (density=True replaces the removed normed=1)
n, bins, patches = plt.hist(x, num_bins, density=True, facecolor='blue', alpha=0.5)
# add a 'best fit' line: the normal PDF evaluated at the bin edges
# (mlab.normpdf has been removed from Matplotlib, so the formula is written out)
y = np.exp(-0.5 * ((bins - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
plt.plot(bins, y, 'r--')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')

# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)
plt.show()

First, construct the data. Note that a one-dimensional array can be wrapped in a pandas Series, while a two-dimensional array calls for a DataFrame.

# -*- coding:utf-8 -*-
import pandas as pd
import numpy as np

# Fill a 1000*1000 array with random integers from 1 to 20.
# np.random.randint excludes the upper bound, hence 21; this replaces a slow element-by-element loop.
data = np.random.randint(1, 21, size=(1000, 1000))

# A one-dimensional array can go into a pandas Series, a two-dimensional array into a DataFrame.
data_m = pd.DataFrame(data)
data_m = data_m[1].value_counts()  # value_counts tallies the values of one Series (here column 1)
data_m = data_m.sort_index()  # sort the tallies by value
print(data_m)

# Then draw the histograms (data[0] is the first row of the array)
import matplotlib.pyplot as plt
plt.hist(data[0])
plt.show()

plt.hist(data[0], bins=20)
plt.show()

Output (data values on the left, frequencies on the right, sorted by data value)
1 55
2 49
3 51
4 42
5 51
6 38
7 44
8 55
9 41
10 56
11 45
12 43
13 51
14 54
15 46
16 53
17 56
18 52
19 62
20 56
Name: 1, dtype: int64

Now draw the histogram:

import matplotlib.pyplot as plt
plt.hist(data[0])
plt.show()

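By default the data are divided into 10 bins; you can count the bars in the figure above. With the following code, the data are divided into 20 bins instead: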

import matplotlib.pyplot as plt
plt.hist(data[0],bins=20)
plt.show()
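
The effect of `bins` on integer data can be checked without drawing anything. This sketch uses `np.histogram`, which `plt.hist` relies on for binning, and a made-up array in which each value from 1 to 20 appears exactly 50 times; the centered-edges variant at the end is a suggestion of this sketch, not something the original code does.

```python
import numpy as np

# Hypothetical data: every integer from 1 to 20 appears exactly 50 times
data_row = np.tile(np.arange(1, 21), 50)

# Default-style binning: 10 equal-width bins over [1, 20], two integers per bin
counts_10, edges_10 = np.histogram(data_row, bins=10)

# 20 equal-width bins over [1, 20]: each integer lands in its own bin
counts_20, edges_20 = np.histogram(data_row, bins=20)

# Explicit edges at 0.5, 1.5, ..., 20.5 center every bar on an integer value
centered_edges = np.arange(0.5, 21.5, 1.0)
counts_centered, _ = np.histogram(data_row, bins=centered_edges)
```

With 10 bins each bar merges two neighbouring values, so the bar heights double; with 20 bins every value gets its own bar. Passing `centered_edges` as `bins` to `plt.hist` would additionally align each bar with its integer on the x-axis.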

# -*- coding: utf-8 -*-
import numpy as np
import matplotlib
import matplotlib.font_manager
import matplotlib.pyplot as plt
import pandas as pd


data = pd.read_csv("D:/apaper/pic/s2.csv")
data.head()
# Raw string so the backslashes in the Windows path are not treated as escapes
songTi = matplotlib.font_manager.FontProperties(fname=r'C:\Windows\Fonts\simsun.ttc')

x = data.loc[:,'n_songs']
# density=True plots a probability density (formerly normed=True); the default plots counts
# The hold argument has been removed from Matplotlib and is no longer passed
plt.hist(x, bins=30, range=(0,100), density=True,
                weights=None, cumulative=False, bottom=None,
                histtype='bar', align='left', orientation='vertical',
                rwidth=0.8, log=False, color=None, label=None, stacked=False)
plt.xticks(fontproperties=songTi,fontsize=12)
plt.yticks(fontproperties=songTi,fontsize=12)
plt.xlabel('用户听歌数量（首）',fontproperties=songTi,fontsize=14)  # "Number of songs played per user"
plt.ylabel('人数占比（%）',fontproperties=songTi,fontsize=14)  # "Share of users (%)"
# plt.legend(fontsize=12)
fig = plt.gcf()
fig.set_size_inches(7.2, 4.2)
fig.savefig('D:/apaper/pic/用户听歌数量2.png', dpi=100)
plt.show()

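density: density=True plots a normalized histogram (probability density); the default plots raw counts. Old Matplotlib versions called this parameter normed.
range: the lower and upper limits of the bins; values outside are ignored. Defaults to the minimum and maximum of the data.
histtype: the type of histogram bars
orientation: horizontal or vertical
rwidth: the relative width of each bar as a fraction of the bin width. The default is None (computed automatically); values below 1 leave gaps between bars.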
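
Note that density=True gives a probability density, not a percentage per bin. If the y-axis is meant to show the share of observations in each bin, one option is the `weights` argument. The sketch below assumes made-up uniform data in place of the `n_songs` column and uses `np.histogram`, which accepts the same `bins`, `range`, and `weights` arguments as `plt.hist`.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in for data.loc[:, 'n_songs']: 500 values in [0, 100)
x = rng.uniform(0, 100, size=500)

# Each observation contributes 100 / len(x) percent to its bin
weights = np.full(len(x), 100.0 / len(x))
percent, edges = np.histogram(x, bins=30, range=(0, 100), weights=weights)

# For comparison: density additionally divides by the bin width
density, _ = np.histogram(x, bins=30, range=(0, 100), density=True)
bin_width = np.diff(edges)
```

Here `percent` sums to 100, while `density * bin_width` sums to 1; the two differ by the bin width, which is why a density histogram's bar heights generally should not be read as percentages.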

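Fixing garbled Chinese text in figures, and choosing a font

SimSun (宋体) is used here.

songTi = matplotlib.font_manager.FontProperties(fname=r'C:\Windows\Fonts\simsun.ttc')

To find the path of a font:
Open Control Panel -> find "Fonts" -> select the font you want, then right-click and open Properties to see the font path.

Set the font with fontproperties and the font size with fontsize.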

plt.xticks(fontproperties=songTi,fontsize=12)
plt.yticks(fontproperties=songTi,fontsize=12)
plt.xlabel('用户听歌数量（首）',fontproperties=songTi,fontsize=14)
plt.ylabel('人数占比（%）',fontproperties=songTi,fontsize=14)
# plt.legend(fontsize=12)
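
As an alternative to passing fontproperties to every call, the font can be set once through rcParams. This is a sketch under an assumption: a font named "SimHei" must be installed on the system (it is not used in the original article); if it is missing, Matplotlib falls back to the next font in the list and the Chinese glyphs will not render.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Assumption: "SimHei" is installed; it is put first in the sans-serif fallback list
plt.rcParams["font.sans-serif"] = ["SimHei"] + list(plt.rcParams["font.sans-serif"])
# Render the minus sign as an ASCII hyphen, which CJK fonts reliably contain
plt.rcParams["axes.unicode_minus"] = False

fig, ax = plt.subplots()
ax.hist([1, 2, 2, 3, 3, 3], bins=3)  # tiny illustrative data set
ax.set_xlabel("用户听歌数量（首）", fontsize=14)  # "Number of songs played per user"
ax.set_ylabel("人数", fontsize=14)  # "Number of users"
```

The rcParams approach affects all text in the session, whereas fontproperties, as used above, applies only to the calls that receive it.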

Figure size and output

Set the figure size and save the figure as follows:

fig = plt.gcf()
fig.set_size_inches(7.2, 4.2)
fig.savefig('D:/apaper/pic/用户听歌数量2.png', dpi=100)
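
The saved image's pixel dimensions follow from the size in inches multiplied by the dpi. The sketch below illustrates this; `pixel_size` is a hypothetical helper written for this example, and an in-memory buffer replaces the file path so that nothing is written to disk.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def pixel_size(fig, dpi):
    """Return the (width, height) in pixels of fig when saved at the given dpi."""
    width_in, height_in = fig.get_size_inches()
    return int(round(float(width_in) * dpi)), int(round(float(height_in) * dpi))


fig = plt.figure()
fig.set_size_inches(7.2, 4.2)

buffer = io.BytesIO()  # in-memory target instead of a file path
fig.savefig(buffer, format="png", dpi=100)

size = pixel_size(fig, 100)  # 7.2 in * 100 dpi by 4.2 in * 100 dpi
```

With these settings the figure is 720 by 420 pixels. Options such as `bbox_inches='tight'` crop the output, so the relation holds only for a plain `savefig` call like the one shown.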

References:
https://www.jianshu.com/p/edf46a6c091b
https://blog.csdn.net/Robin_Ge/article/details/80945703
https://blog.csdn.net/denny2015/article/details/50581784