Reading TensorBoardX histograms

yllgl1

Article link: https://blog.csdn.net/yllgl1/article/details/114089859

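A histogram is a frequency plot, but the x-axis and z-axis values that TensorBoard shows for a tensorboardX histogram are usually non-integers, and it is not obvious how they are computed. After reading the source code I worked it out, so I am writing it down here.
Official documentation link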

The test code below generates the values 0 to 9 and 1000 to 1009 for the explanation:

# Import packages
from tensorboardX import SummaryWriter
import numpy as np
# Write event files to the tmp/exp directory
writer = SummaryWriter('tmp/exp')
# Build sparse data (nothing at all between 10 and 1000)
data = list(range(10))
data.extend(range(1000, 1010))
data = np.array(data)
# Write one histogram per epoch for each tag
for epoch in range(100):
    writer.add_histogram("auto", data, epoch, bins='auto')
    # Per the official docs, the default bins argument is 'tensorflow'
    writer.add_histogram("default", data, epoch)
# Close the writer
writer.close()
# Print the data
print(data)

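If the code above runs successfully, it creates a tmp/exp directory under the current directory. From the current directory, run this in a terminal: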

tensorboard --logdir tmp

Open localhost:6006 in a browser to reach the TensorBoard UI.

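On the histogram tab you can see two charts, tagged default and auto.

As you can see, a tensorboardX histogram is displayed as a three-dimensional chart.

The x axis, with ticks from 100 to 1100, shows the data values.

The y axis, with ticks from 10 to 90, shows the epoch (the global_step parameter in the official docs).

The z axis shows how many data points fall in the interval (bin/bucket) that contains the current x.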

Hovering over the default chart in TensorBoard shows:

x=16.8, z=10.0
x=50.5, z=0
…
x=925, z=0
x=959, z=2.5
x=992, z=7.5

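From the source code, the display range is split into 30 intervals by default, each of length
$\frac{\max - \min}{30} = \frac{1009 - 0}{30} \approx 33.63$
So the first interval is [0, 33.63), whose midpoint is $\frac{0 + 33.63}{2} \approx 16.8$.
That gives the first x value. Likewise, the second interval is [33.63, 67.26), whose midpoint rounds to 50.5, the second x value, and so on.
For x=959, the interval is [941.733, 975.366).
For x=992, the interval is [975.366, 1009].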
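The interval arithmetic above can be checked with a few lines of NumPy. This is a minimal sketch assuming, as described above, that the display range is [min, max] of the data (0 and 1009 here) split into 30 equal-width bins, and that each x value on the chart is a bin midpoint; the variable names are illustrative only.

```python
import numpy as np

lo, hi, num_bins = 0.0, 1009.0, 30        # data min/max and the 30 display bins
width = (hi - lo) / num_bins              # about 33.63
edges = lo + width * np.arange(num_bins + 1)
centers = (edges[:-1] + edges[1:]) / 2    # midpoints, i.e. the x values read off the chart

for i in (0, 1, 28, 29):
    print(f"bin {i}: [{edges[i]:.3f}, {edges[i + 1]:.3f}) center {centers[i]:.2f}")
```

Bins 0, 1, 28 and 29 correspond to the x values 16.8, 50.5, 959 and 992 listed earlier.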
However, the official markdown documentation contains the following passage:

TensorFlow uses a similar approach to create bins, but unlike in our example, it doesn’t create integer bins. For large, sparse datasets, that might result in many thousands of bins. Instead, the bins are exponentially distributed, with many bins close to 0 and comparatively few bins for very large numbers. However, visualizing exponentially-distributed bins is tricky; if height is used to encode count, then wider bins take more space, even if they have the same number of elements. Conversely, encoding count in the area makes height comparisons impossible. Instead, the histograms resample the data into uniform bins. This can lead to unfortunate artifacts in some cases.

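In other words, TensorFlow creates bins whose widths grow exponentially: bins are dense near zero and get wider for larger values. For display, however, they are resampled into the equal-width bins described above. So how is that conversion done?
From the source code, the bucket boundaries form a geometric sequence starting at 1e-12 with common ratio 1.1, that is, $1.1^n \times 10^{-12}$ for $n = 0, 1, 2, \dots$, which grows exponentially in $n$.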
Computing the terms around 1000:

n=362, value=964.166
n=363, value=1060.583
So the values 1000 to 1009 fall in the bucket [964.166, 1060.583). Note this line in the source:

const bucketRight = Math.min(max, histogram.buckets[bucketIndex].right);

A bucket's right end is clipped to the maximum of the data, so the bucket actually used is [964.166, 1009], which contains 10 values (1000 to 1009).
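The bucket lookup and the clipping can be reproduced numerically. A minimal sketch, assuming the positive boundaries are generated by starting at 1e-12 and multiplying by 1.1 until reaching 1e20; the start value and ratio come from the source described above, while the 1e20 stopping point is my assumption. `bucket_limits` is a hypothetical helper, not a tensorboardX API.

```python
import numpy as np

def bucket_limits(start=1e-12, ratio=1.1, stop=1e20):
    """Hypothetical helper: positive boundaries start * ratio**n for n = 0, 1, 2, ..."""
    limits = []
    v = start
    while v < stop:
        limits.append(v)
        v *= ratio
    return np.array(limits)

limits = bucket_limits()
data = np.array(list(range(10)) + list(range(1000, 1010)), dtype=float)

# First boundary strictly greater than 1000, so 1000 lies in [limits[n - 1], limits[n])
n = int(np.searchsorted(limits, 1000.0, side="right"))
left, right = limits[n - 1], limits[n]
# Clip the right end to the data maximum, mirroring Math.min(max, ...) in the source
clipped_right = min(right, data.max())
count = int(np.sum((data >= left) & (data < right)))
print(n - 1, round(left, 3), n, round(right, 3), clipped_right, count)
```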
Now we can compute z.
The relevant source code:

const intersect = Math.min(bucketRight, binRight) - Math.max(bucketLeft, binLeft);
const count = (intersect / (bucketRight - bucketLeft)) * histogram.buckets[bucketIndex].count;
binY += intersect > 0 ? count : 0;

So the formula is
$z = \frac{\text{length of the intersection}}{\text{bucket width}} \times \text{number of values in the bucket}$
For x=992, the intersection of [975.366, 1009] and [964.166, 1009] has length $1009 - 975.366 = 33.634$.
The bucket [964.166, 1009] has width $1009 - 964.166 = 44.834$.
The bucket holds 10 values, so for x=992, $z = \frac{33.634}{44.834} \times 10 \approx 7.5$.

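Likewise, for x=959 the intersection of [941.733, 975.366) and [964.166, 1009] is [964.166, 975.366), so $z = \frac{975.366 - 964.166}{44.834} \times 10 \approx 2.5$.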
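To check all 30 z values at once, here is a Python transcription of the resampling logic quoted above. It is a sketch, not TensorBoard's code: it loops over every bucket for each display bin instead of advancing a single bucketIndex as the JavaScript does (same result for sorted, non-overlapping buckets), and it skips empty buckets because they add nothing. It assumes the boundaries described earlier (1e-12 times powers of 1.1, with the 1e20 cap again my assumption) and builds only the nonnegative half, since the example data is nonnegative. `resample` is a hypothetical name.

```python
import numpy as np

def resample(buckets, lo, hi, num_bins=30):
    """Spread (left, right, count) buckets over equal-width display bins by overlap length."""
    width = (hi - lo) / num_bins
    centers, heights = [], []
    for i in range(num_bins):
        bin_left = lo + i * width
        bin_right = bin_left + width
        y = 0.0
        for left, right, count in buckets:
            # Clip the bucket to [lo, hi], like Math.max(min, ...) / Math.min(max, ...)
            b_left, b_right = max(lo, left), min(hi, right)
            inter = min(b_right, bin_right) - max(b_left, bin_left)
            if inter > 0:  # add the share of the bucket's count that overlaps this bin
                y += inter / (b_right - b_left) * count
        centers.append(bin_left + width / 2)
        heights.append(y)
    return np.array(centers), np.array(heights)

# Nonnegative boundaries: 0, then 1e-12 * 1.1**n (assumed generation rule)
pos, v = [], 1e-12
while v < 1e20:
    pos.append(v)
    v *= 1.1
edges = np.array([0.0] + pos)

data = np.array(list(range(10)) + list(range(1000, 1010)), dtype=float)
counts, _ = np.histogram(data, bins=edges)
buckets = [(edges[i], edges[i + 1], c) for i, c in enumerate(counts) if c > 0]
x, z = resample(buckets, data.min(), data.max())
for i in (0, 1, 28, 29):
    print(f"x={x[i]:.1f}, z={z[i]:.2f}")
```

The values 0 to 9 sit in tiny buckets that lie entirely inside the first display bin, so each contributes its full count there, which is why z=10 at x=16.8.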

Now for the auto chart.
The only difference is how the bins are chosen. From the source:

counts, limits = np.histogram(values, bins=bins)

So it is essentially a call to NumPy's histogram function. With the same data and bins='auto', it returns

counts = [10,  0,  0,  0,  0, 10]
limits = [0.,  168.16666667,  336.33333333,  504.5  ,
        672.66666667,  840.83333333, 1009. ]

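So the first bucket [0, 168.17) holds 10 values, and for x=16.8, $z = \frac{33.63 - 0}{168.16666667 - 0} \times 10 \approx 2$. Because 168.17 is exactly five display-bin widths, each of the first five display bins gets z≈2, the last five get z≈2 from the bucket [840.83, 1009], and the bins in between get 0.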
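The same overlap rule can be applied to the 'auto' buckets in vectorized form. A sketch under the same assumptions as before (30 equal-width display bins over [min, max] of the data); it computes an overlap matrix between display bins and NumPy's buckets rather than looping like the JavaScript source.

```python
import numpy as np

data = np.array(list(range(10)) + list(range(1000, 1010)), dtype=float)
counts, limits = np.histogram(data, bins="auto")   # 6 buckets of width about 168.17

lo, hi, num_bins = data.min(), data.max(), 30
bin_edges = np.linspace(lo, hi, num_bins + 1)
left, right = limits[:-1], limits[1:]
# overlap[i, j] = length of display bin i intersected with bucket j, clamped at 0
overlap = np.clip(np.minimum(bin_edges[1:, None], right[None, :])
                  - np.maximum(bin_edges[:-1, None], left[None, :]), 0, None)
z = (overlap / (right - left)[None, :] * counts[None, :]).sum(axis=1)
print(np.round(z, 3))
```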

Finally, a note on the distributions chart.

Reading from top to bottom, the lines have the following meaning: [maximum, 93%, 84%, 69%, 50%, 31%, 16%, 7%, minimum]

The lines in the distributions chart for my own data were hard to tell apart, so I borrowed a figure from someone else.

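The x axis is the epoch (the global_step parameter), the y axis is the data value, and each curve is a percentile. From top to bottom they are the maximum, the 93rd, 84th, 69th, 50th, 31st, 16th and 7th percentiles, and the minimum.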
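For a single step, the nine curves can be approximated with np.percentile on the raw values. This is only an approximation under my own assumption: as I understand it, TensorBoard derives these lines from the stored histogram buckets rather than the raw data, so its values may differ. The levels 93/84/69% appear to be rounded normal-distribution points (Φ(1.5)≈93.3%, Φ(1)≈84.1%, Φ(0.5)≈69.1%). Since the example writes the same data every epoch, each curve would be flat across steps.

```python
import numpy as np

data = np.array(list(range(10)) + list(range(1000, 1010)), dtype=float)
# Levels from top to bottom of the distributions chart, as listed in the quote above
levels = [100, 93, 84, 69, 50, 31, 16, 7, 0]
lines = np.percentile(data, levels)   # linear interpolation on the raw values (approximation)
for q, v in zip(levels, lines):
    print(f"{q:>3}%: {v:.2f}")
```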

So the distributions chart shows how the percentiles change across steps, while the histogram chart shows how the data itself is distributed.
