shell散点图_Matplotlib中的散点图轮廓

在matplotlib中创建大规模散点图并生成特定百分位数的轮廓。可以使用2D直方图如hist2d或hexbin,或者通过核密度估计(KDE)方法。示例展示了散点图、六边形直方图、2D直方图和高斯KDE的差异。对于大量数据,gaussian_kde可能会变慢,可以考虑用2D直方图加模糊处理作为加速替代方案。
摘要由CSDN通过智能技术生成

bd96500e110b49cbb3cd949968f18be7.png

I have a massive scatterplot (~100,000 points) that I'm generating in matplotlib. Each point has a location in this x/y space, and I'd like to generate contours containing certain percentiles of the total number of points.

Is there a function in matplotlib which will do this? I've looked into contour(), but I'd have to write my own function to work in this way.

Thanks!

解决方案

Basically, you're wanting a density estimate of some sort. There multiple ways to do this:

Use a 2D histogram of some sort (e.g. matplotlib.pyplot.hist2d or matplotlib.pyplot.hexbin) (You could also display the results as contours--just use numpy.histogram2d and then contour the resulting array.)

Make a kernel-density estimate (KDE) and contour the results. A KDE is essentially a smoothed histogram. Instead of a point falling into a particular bin, it adds a weight to surrounding bins (usually in the shape of a gaussian "bell curve").

Using a 2D histogram is simple and easy to understand, but fundementally gives "blocky" results.

There are some wrinkles to doing the second one "correctly" (i.e. there's no one correct way). I won't go into the details here, but if you want to interpret the results statistically, you need to read up on it (particularly the bandwidth selection).

At any rate, here's an example of the differences. I'm going to plot each one similarly, so I won't use contours, but you could just as easily plot the 2D histogram or gaussian KDE using a contour plot:

import numpy as np

import matplotlib.pyplot as plt

from scipy.stats import kde

np.random.seed(1977)

# Generate 200 correlated x,y points

data = np.random.multivariate_normal([0, 0], [[1, 0.5], [0.5, 3]], 200)

x, y = data.T

nbins = 20

fig, axes = plt.subplots(ncols=2, nrows=2, sharex=True, sharey=True)

axes[0, 0].set_title('Scatterplot')

axes[0, 0].plot(x, y, 'ko')

axes[0, 1].set_title('Hexbin plot')

axes[0, 1].hexbin(x, y, gridsize=nbins)

axes[1, 0].set_title('2D Histogram')

axes[1, 0].hist2d(x, y, bins=nbins)

# Evaluate a gaussian kde on a regular grid of nbins x nbins over data extents

k = kde.gaussian_kde(data.T)

xi, yi = np.mgrid[x.min():x.max():nbins*1j, y.min():y.max():nbins*1j]

zi = k(np.vstack([xi.flatten(), yi.flatten()]))

axes[1, 1].set_title('Gaussian KDE')

axes[1, 1].pcolormesh(xi, yi, zi.reshape(xi.shape))

fig.tight_layout()

plt.show()

YNSgP.png

One caveat: With very large numbers of points, scipy.stats.gaussian_kde will become very slow. It's fairly easy to speed it up by making an approximation--just take the 2D histogram and blur it with a guassian filter of the right radius and covariance. I can give an example if you'd like.

One other caveat: If you're doing this in a non-cartesian coordinate system, none of these methods apply! Getting density estimates on a spherical shell is a bit more complicated.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值