My data:
hour Cost
20 58.00
20 336.00
20 34.50
20 106.50
20 118.00
...
11 198.36
11 276.00
11 40.00
11 308.00
11 140.00
11 72.00
11 116.50
11 290.00
11 266.00
11 66.00
11 100.00
11 79.00
11 106.00
11 160.00
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
from scipy.stats import multivariate_normal

dataset = df[['hour', 'Cost']]
X = dataset.hour.values
Y = dataset.Cost.values
X, Y = np.meshgrid(X, Y)
N = len(X)

def estimateGaussian(dataset):
    mu = np.mean(dataset, axis=0)
    sigma = np.cov(dataset.T)
    return mu, sigma

mu, Sigma = estimateGaussian(dataset)
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X
pos[:, :, 1] = Y
F = multivariate_normal(pos, mu, Sigma)
Z = F.pdf(pos)

fig = plt.figure(figsize=(20, 10))
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, Z, rstride=3, cstride=3, linewidth=1, antialiased=True,
                cmap=cm.viridis)
cset = ax.contourf(X, Y, Z, zdir='z', offset=-0.15, cmap=cm.viridis)

# Adjust the limits, ticks and view angle
ax.set_zlim(-0.15, 0.2)
ax.set_zticks(np.linspace(0, 0.2, 5))
ax.view_init(27, 90)
plt.show()
Suppose hour and Cost are arbitrary random vectors.
How can I fix this? This is the error I get:
C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\scipy\stats\_multivariate.py in __init__(self, mean, cov, allow_singular, seed, maxpts, abseps, releps)
    725         self._dist = multivariate_normal_gen(seed)
    726         self.dim, self.mean, self.cov = self._dist._process_parameters(
--> 727                                                     None, mean, cov)
    728         self.cov_info = _PSD(self.cov, allow_singular=allow_singular)
    729         if not maxpts:

C:\ProgramData\Anaconda3\envs\tensorflow\lib\site-packages\scipy\stats\_multivariate.py in _process_parameters(self, dim, mean, cov)
    397 
    398         if mean.ndim != 1 or mean.shape[0] != dim:
--> 399             raise ValueError("Array 'mean' must be a vector of length %d." % dim)
    400         if cov.ndim == 0:
    401             cov = cov * np.eye(dim)

ValueError: Array 'mean' must be a vector of length 173873952.
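The traceback is raised by `F = multivariate_normal(pos, mu, Sigma)`. Calling `scipy.stats.multivariate_normal(...)` directly returns a frozen distribution whose first two parameters are `mean` and `cov`, so `pos` is taken as the mean, `mu` as the covariance, and `Sigma` shifts into the next parameter. Without an explicit dimension, SciPy infers it from `mean.size`; here 173873952 = 2 × 9324², which is exactly `pos.size` if `df` has 9324 rows, because `np.meshgrid` over the two raw columns builds a 9324 × 9324 grid. Below is a minimal sketch of a fix, assuming the goal is the fitted 2-D Gaussian density on a regular grid: freeze the distribution with `mean` and `cov` only, pass `pos` to `.pdf()`, and build the grid with `np.linspace` over the data range. The `df` here holds only the rows visible in the question (not the full data), and `density_grid` / `estimate_gaussian` are hypothetical helper names.

```python
import numpy as np
import pandas as pd
from scipy.stats import multivariate_normal

# Only the rows visible in the question (not the full dataset).
df = pd.DataFrame({
    "hour": [20] * 5 + [11] * 14,
    "Cost": [58.00, 336.00, 34.50, 106.50, 118.00,
             198.36, 276.00, 40.00, 308.00, 140.00, 72.00, 116.50,
             290.00, 266.00, 66.00, 100.00, 79.00, 106.00, 160.00],
})


def estimate_gaussian(data):
    """Fit a mean vector and a 2x2 covariance matrix to an (n, 2) array."""
    mu = data.mean(axis=0)
    sigma = np.cov(data, rowvar=False)  # columns are the variables
    return mu, sigma


def density_grid(data, n=100):
    """Hypothetical helper: evaluate the fitted 2-D Gaussian pdf on an n x n grid."""
    mu, sigma = estimate_gaussian(data)
    # Regular grid over the data range, not a meshgrid of the raw (repeated) values.
    x = np.linspace(data[:, 0].min(), data[:, 0].max(), n)
    y = np.linspace(data[:, 1].min(), data[:, 1].max(), n)
    X, Y = np.meshgrid(x, y)
    pos = np.dstack((X, Y))                      # shape (n, n, 2): one (hour, Cost) pair per cell
    F = multivariate_normal(mean=mu, cov=sigma)  # freeze with mean and cov only
    Z = F.pdf(pos)                               # pdf over the last axis -> shape (n, n)
    return X, Y, Z


X, Y, Z = density_grid(df[["hour", "Cost"]].to_numpy())
print(Z.shape)  # (100, 100)
```

The resulting `X`, `Y`, `Z` can then go into the question's `plot_surface`/`contourf` calls. Because `Cost` spans hundreds of units, the peak density is small, so the fixed `set_zlim(-0.15, 0.2)` and `offset=-0.15` would probably squash the surface flat; deriving them from `Z.max()` is safer.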
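How can I get the probability of my data for any (hour, Cost) pair, and how can I visualize it?
Sorry for any mistakes; I'm not a native English speaker.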
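On the probability for any (hour, Cost) pair: `pdf` returns a probability *density*, not a probability, and for continuous variables any single exact pair has probability zero. A minimal sketch, assuming the data are fitted with a 2-D Gaussian as the question's `estimateGaussian` does: evaluate the density at arbitrary pairs with the frozen distribution's `.pdf`, and get the probability of a rectangle of values from its `.cdf` by inclusion-exclusion. Only the rows visible in the question are used; `box_probability`, the query pairs, and the box limits are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Only the rows visible in the question (not the full dataset).
hour = np.array([20] * 5 + [11] * 14, dtype=float)
cost = np.array([58.00, 336.00, 34.50, 106.50, 118.00,
                 198.36, 276.00, 40.00, 308.00, 140.00, 72.00, 116.50,
                 290.00, 266.00, 66.00, 100.00, 79.00, 106.00, 160.00])
data = np.column_stack([hour, cost])  # shape (n, 2)

# Fit the Gaussian and freeze it with mean and cov only.
F = multivariate_normal(mean=data.mean(axis=0), cov=np.cov(data, rowvar=False))

# Density at arbitrary (hour, Cost) pairs: one value per row.
pairs = np.array([[11.0, 150.0], [20.0, 100.0]])
print(F.pdf(pairs))


def box_probability(dist, lo, hi):
    """Hypothetical helper: P(lo[0] < hour <= hi[0], lo[1] < Cost <= hi[1])."""
    (a1, a2), (b1, b2) = lo, hi
    # Inclusion-exclusion on the joint CDF; cdf() is computed numerically,
    # so the result carries a small integration error.
    return (dist.cdf([b1, b2]) - dist.cdf([a1, b2])
            - dist.cdf([b1, a2]) + dist.cdf([a1, a2]))


print(box_probability(F, lo=(10.0, 100.0), hi=(12.0, 200.0)))
```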
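Since my question stayed unanswered for a while, I followed @ImportanceOfBeingErnest's suggestion and simplified it into a verifiable example.
Here is the simple example: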
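import numpy as np
from scipy.stats import multivariate_normal

time = [1, 2, 3, 4, 5, 6]
cost = [4, 5, 3, 4, 8, 9]
var_matrix = np.array([time, cost]).T  # shape (6, 2): one (time, cost) pair per row
mean = np.mean(var_matrix, axis=0)
sigma = np.cov(var_matrix.T)
# y holds the fitted density at each of the six (time, cost) points
y = multivariate_normal.pdf(var_matrix, mean=mean, cov=sigma, allow_singular=True)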
How can I draw a 3D plot that shows (cost, time) together with the corresponding probability density values?
Thanks in advance.
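For the simplified example, here is a minimal sketch, assuming what is wanted is a surface of the fitted density with the six observed points drawn on it: evaluate the frozen distribution on a regular (time, cost) grid for the surface, and plot the question's `y` values as a scatter at their own heights. It uses `fig.add_subplot(projection="3d")` instead of `fig.gca(projection='3d')`, which newer Matplotlib versions no longer accept; the grid padding, resolution, and colours are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older Matplotlib)
from scipy.stats import multivariate_normal

time = [1, 2, 3, 4, 5, 6]
cost = [4, 5, 3, 4, 8, 9]
var_matrix = np.array([time, cost]).T            # shape (6, 2)
mean = var_matrix.mean(axis=0)
sigma = np.cov(var_matrix, rowvar=False)
dist = multivariate_normal(mean=mean, cov=sigma, allow_singular=True)

# Density at the six observed points (same values as y in the question).
y = dist.pdf(var_matrix)

# Density on a regular grid covering the data, padded by one unit on each side.
t = np.linspace(min(time) - 1, max(time) + 1, 60)
c = np.linspace(min(cost) - 1, max(cost) + 1, 60)
T, C = np.meshgrid(t, c)
Z = dist.pdf(np.dstack((T, C)))                  # shape (60, 60)

fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(projection="3d")
ax.plot_surface(T, C, Z, cmap="viridis", alpha=0.6, linewidth=0)
ax.scatter(time, cost, y, color="red", s=40, label="observed (time, cost)")
# Filled contours projected below the surface; offset scaled to the density values.
offset = -0.5 * Z.max()
ax.contourf(T, C, Z, zdir="z", offset=offset, cmap="viridis")
ax.set_zlim(offset, 1.1 * Z.max())
ax.set_xlabel("time")
ax.set_ylabel("cost")
ax.set_zlabel("probability density")
ax.legend()
plt.show()
```

Scaling the contour offset and z-limits from `Z.max()` keeps the plot readable whatever the units of the data, instead of relying on hard-coded limits.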