Naive Bayes

数星星啦

Published 2021-08-11 13:25:48


Article link: https://blog.csdn.net/gcyangxin/article/details/119601033


1. Naive Bayes

1.1 Basic introduction

Formula derivation, Laplace smoothing, explanations of the details, and a Python example:

https://www.pkudodo.com/2018/11/21/1-3/

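Formula

$p(y=c_k|x)=\frac{p(y=c_k)\prod_{j}p(x^{j}|y=c_k)}{\sum_{k}p(y=c_k)\prod_{j}p(x^{j}|y=c_k)}$

The denominator is the total probability of x. In the numerator, the factor to the left of the product sign is the prior (the probability of the class label), and the factors inside the product are the conditional probabilities.

How to obtain the result

The denominator is the same for every ck, so it is enough to compute the numerator for each value of ck and pick the class with the largest value.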

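Algorithm in detail

Source: Li Hang, "Statistical Learning Methods"

Figure 1.1

I is the indicator function; summing it counts the samples that satisfy the condition.

Worked example

1. We need to decide which class x=(2,S) belongs to; computing the numerator is enough.

2. Compute case by case

For y=-1, using maximum likelihood estimates, compute $p(y=-1|x^{1}=2,x^{2}=S)$;

for y=1, compute $p(y=1|x^{1}=2,x^{2}=S)$

3. Apply the naive Bayes formula. There are two features here, x1 and x2, and naive Bayes assumes they are conditionally independent given the class, so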

For y=1

$p(y=1|x^{1}=2,x^{2}=S)=\frac{p(x^{1}=2|y=1)*p(x^{2}=S|y=1)*p(y=1)}{\sum_{c}p(x^{1}=2|y=c)*p(x^{2}=S|y=c)*p(y=c)}$

For y=-1

Omitted (the calculation is analogous).

4. Compare the two values
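The four steps above can be sketched in a few lines of Python. The training table itself is not reproduced in this post, so the data below is an assumption: it is the 15-sample table of Example 4.1 in Li Hang's book, which is where x=(2,S) comes from. The function name `numerator` is only illustrative.

```python
from fractions import Fraction

# Assumed training data: Example 4.1 in Li Hang's book (not shown in this post).
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
X2 = list("SMMSSSMMLLLMMLL")
Y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

def numerator(x1, x2, c):
    """Prior times the per-feature conditionals (maximum likelihood, no smoothing)."""
    n_c = sum(1 for y in Y if y == c)
    prior = Fraction(n_c, len(Y))
    p_x1 = Fraction(sum(1 for a, y in zip(X1, Y) if a == x1 and y == c), n_c)
    p_x2 = Fraction(sum(1 for b, y in zip(X2, Y) if b == x2 and y == c), n_c)
    return prior * p_x1 * p_x2

# The denominator is shared by both classes, so comparing numerators is enough.
scores = {c: numerator(2, "S", c) for c in (1, -1)}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

Under this assumed table the numerator is 1/45 for y=1 and 1/15 for y=-1, so x=(2,S) is assigned to class -1.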

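Extension

Laplace smoothing

Detailed example: https://zhuanlan.zhihu.com/p/26329951?utm_medium=social&utm_source=qq

1. It avoids zero probabilities when a class has no samples, or when some value of a feature never occurs within a class.

2. Both the conditional probabilities and the prior are smoothed.

3. λ is added once to each count (it is not accumulated over samples). Sj is the number of distinct values feature j can take (see the description around Figure 1.1). λ=0 gives the maximum likelihood estimate, and λ=1 is Laplace smoothing.

Figure 1.2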
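As a rough illustration of the three points above, here is a minimal sketch of the smoothed estimates. The function names and the toy counts are made up for the example; the formulas are the usual Bayesian estimates with parameter λ (written `lam` here).

```python
from fractions import Fraction

def smoothed_conditional(count, class_total, n_values, lam=1):
    """(count + lam) / (class_total + n_values * lam); lam=0 is the MLE."""
    return Fraction(count + lam, class_total + n_values * lam)

def smoothed_prior(class_total, n_samples, n_classes, lam=1):
    """(class_total + lam) / (n_samples + n_classes * lam)."""
    return Fraction(class_total + lam, n_samples + n_classes * lam)

# Toy case: a feature with 3 possible values (Sj = 3) inside a class of 9 samples,
# where the three values were seen 0, 4 and 5 times.
counts = [0, 4, 5]
p_mle = [smoothed_conditional(n, 9, 3, lam=0) for n in counts]
p_laplace = [smoothed_conditional(n, 9, 3, lam=1) for n in counts]
print(p_mle, p_laplace)
```

With λ=0 the unseen value gets probability 0 and wipes out the whole product; with λ=1 it gets 1/12, and the three smoothed probabilities still sum to 1.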

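1.2 Applications of the algorithm

1.2.1 Document classification

Worked example, calculation steps, and use of Laplace smoothing:

https://www.cnblogs.com/hapjin/p/8119797.html

Code and theory: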

http://blog.lisp4fun.com/2018/03/09/bayes

Data sample

Training samples
Class   Text
-1      just plain boring
-1      entirely predictable and lacks energy
-1      no surprises and very few laughs
1       very powerful
1       the most fun film of the summer

Test set
Determine the class of the following text:
predictable with no fun

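Calculation examples

Example 1

The calculation method below comes from https://www.cnblogs.com/hapjin/p/8119797.html; note that the source mixes the two models (Bernoulli and multinomial).

This example uses the Bernoulli model.

P(c) = number of documents in class c / total number of documents in the training set

P(tk|c) = (number of documents in class c that contain word tk + 1) / (number of documents in class c + 2)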

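The priors are computed from the number of documents in each class:

Class -1 has 3 documents, P(-1)=3/5=0.6

Class 1 has 2 documents, P(1)=2/5=0.4

Conditional probabilities:

p(predictable|-1)=(1+1)/(3+2*1)=2/5. In the denominator, 3 is the number of class -1 documents and 2 is the number of values a Bernoulli feature can take (present or absent; this 2 is fixed for the Bernoulli model), with λ=1.

p(predictable|1)=(0+1)/(2+2)

p(no|-1)=(1+1)/(3+2)

p(no|1)=(0+1)/(2+2)

p(fun|-1)=(0+1)/(3+2)

p(fun|1)=(1+1)/(2+2)

The Bernoulli model also has to account for features that do not occur in the document being classified; the same formula gives their probabilities:

p(with|1)=(0+1)/(2+2)

p(with|-1)=(0+1)/(3+2)

p(just|1)=(0+1)/(2+2)

p(just|-1)=(1+1)/(3+2)

Therefore, the probability that test document d belongs to class -1 is proportional to 0.6*0.032*∏[1-p(other feature|-1)], where 0.032=(2/5)*(2/5)*(1/5) covers predictable, no and fun, and "other features" are the vocabulary words that do not occur in the test document. If with is kept in the vocabulary, its factor p(with|-1) is multiplied in as well.

The probability that d belongs to class 1 is proportional to 0.4*0.03125*∏[1-p(other feature|1)], where 0.03125=(1/4)*(1/4)*(2/4).

Comparing the two values shows that "predictable with no fun" is assigned to class -1.

Method summary:

1. Build the feature vector from all distinct words (train + test). Repeated occurrences of a word within one document are collapsed to 1, so each feature only records whether the word is present or absent.

2. Compute the priors and conditional probabilities from the training set, then evaluate them on the feature vector built for the test document.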
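The two summary steps can be checked with a short script. This is only a sketch of the hand calculation above, using exact fractions; the helper name `bernoulli_score` is made up, and the vocabulary is assumed to be all 21 distinct words from train + test (including with).

```python
from fractions import Fraction

train = [
    (-1, "just plain boring"),
    (-1, "entirely predictable and lacks energy"),
    (-1, "no surprises and very few laughs"),
    (1, "very powerful"),
    (1, "the most fun film of the summer"),
]
test_doc = "predictable with no fun"

# Presence/absence only: each document becomes a set of words.
docs = [(c, set(text.split())) for c, text in train]
test_words = set(test_doc.split())
vocab = sorted(set().union(*(w for _, w in docs)) | test_words)

def bernoulli_score(c):
    """Prior times p(t|c) for present words and 1 - p(t|c) for absent ones."""
    class_docs = [w for label, w in docs if label == c]
    n_c = len(class_docs)
    score = Fraction(n_c, len(docs))  # P(c) from document counts
    for t in vocab:
        p = Fraction(sum(t in w for w in class_docs) + 1, n_c + 2)
        score *= p if t in test_words else 1 - p
    return score

scores = {c: bernoulli_score(c) for c in (-1, 1)}
prediction = max(scores, key=scores.get)
print({c: float(s) for c, s in scores.items()}, prediction)
```

The unnormalized score for class -1 comes out larger than the one for class 1, which agrees with the conclusion of Example 1.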

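Example 2

Method source: https://www.cnblogs.com/kexinxin/p/10049910.html

This example uses the multinomial model.

Prior P(c) = total number of words in class c / total number of words in the training set

Class-conditional probability P(tk|c) = (total occurrences of word tk in the documents of class c + 1) / (total number of words in class c + |V|)

|V| is the number of distinct words in the training set.

Building labeled samples for the multinomial naive Bayes model

1. There is only one feature; each distinct word is one possible value of that feature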

Training samples
Class   Feature 1 (each word is one value of the feature)
-1      just
-1      plain
-1      boring
-1      entirely
-1      predictable
-1      and
-1      lacks
-1      energy
-1      no
-1      surprises
-1      and
-1      very
-1      few
-1      laughs
# class -1: 14 samples

1       very
1       powerful
1       the
1       most
1       fun
1       film
1       of
1       the
1       summer
# class 1: 9 samples
# 20 distinct feature values

Test set
Determine the class of the following text:
predictable with no fun

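Calculation steps

1. Training samples: count the total number of words per class, and deduplicate to get the number of feature values.

2. Plug each word of the test document into the formula. with occurs only in the test set and has no training data, so it is skipped.

$p(y=-1|x^{1})=\frac{p(y=-1)\prod_{t}p(x^{1}=t|y=-1)}{\sum_{c\in\{-1,1\}}p(y=c)\prod_{t}p(x^{1}=t|y=c)}$

*t runs over the words of the test document. Each word is one of the 20 distinct training words, i.e. the feature x1 has 20 possible values.

*The denominator sums the same product over both classes, so it is identical for y=-1 and y=1.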

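For y=1, following formula (1) in Figure 1.1, compute the likelihood; the denominator is omitted here

$p(x^{1}=predictable|y=1)*p(x^{1}=no|y=1)*p(x^{1}=fun|y=1)*{\color{Blue}p(y=1) }$

For y=-1, compute the likelihood

Omitted

3. Class y=1 does not contain the feature value predictable, which would make the likelihood 0. Laplace smoothing solves this problem; see formula 4.10 in Figure 1.2.

λ=1 (added once per count, not accumulated), Sj=20

Calculation: merge all words into a single feature, with each distinct word as one value

Class -1 has 14 words in total

Class 1 has 9 words in total

There are 20 distinct words in total

p(-1)=14/(9+14)=0.61 (Laplace smoothing could also be applied to the prior; sklearn does not appear to do this)

p(1)=9/(9+14)=0.39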

p(predictable|-1)=(1+1)/(14+20)=2/34. This is the Bayesian estimate of the conditional probability with Sj = number of feature values = 20 and λ=1.

p(predictable|1)=(0+1)/(9+20)=1/29

p(no|-1)=(1+1)/(14+20)

p(no|1)=(0+1)/(9+20)

p(fun|-1)=(0+1)/(14+20)

p(fun|1)=(1+1)/(9+20)

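Therefore, the probability that test document d belongs to class -1 is proportional to 0.61*(2*2*1)/34**3=6.20e-05

The probability that d belongs to class 1 is proportional to 0.39*(1*1*2)/29**3=3.19e-05

Comparing the two values shows that "predictable with no fun" is assigned to class -1.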
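The multinomial calculation of Example 2 can be reproduced with exact fractions. This is a sketch of the hand calculation only: it uses the word-count prior from the text (14/23 and 9/23), skips words never seen in training, and the helper name `multinomial_score` is made up.

```python
from collections import Counter
from fractions import Fraction

train = [
    (-1, "just plain boring"),
    (-1, "entirely predictable and lacks energy"),
    (-1, "no surprises and very few laughs"),
    (1, "very powerful"),
    (1, "the most fun film of the summer"),
]
test_doc = "predictable with no fun"

# Pool the words of each class: one feature, one sample per word occurrence.
words = {-1: [], 1: []}
for c, text in train:
    words[c].extend(text.split())
vocab = set(words[-1]) | set(words[1])          # |V| = 20
total = sum(len(w) for w in words.values())     # 14 + 9 = 23

def multinomial_score(c):
    counts = Counter(words[c])
    score = Fraction(len(words[c]), total)      # word-count prior from the text
    for t in test_doc.split():
        if t not in vocab:
            continue                            # "with" has no training data
        score *= Fraction(counts[t] + 1, len(words[c]) + len(vocab))
    return score

scores = {c: multinomial_score(c) for c in (-1, 1)}
prediction = max(scores, key=scores.get)
print({c: float(s) for c, s in scores.items()}, prediction)
```

The scores are about 6.2e-05 for class -1 and 3.2e-05 for class 1, matching the values above up to the rounding of the priors.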

The example in the figure below comes from: https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

1.2.2 Usage in sklearn - multinomial naive Bayes

Reposted from: https://blog.csdn.net/qq_36523839/article/details/81505841

sklearn.naive_bayes.MultinomialNB(alpha=1.0, fit_prior=True, class_prior=None)

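Mainly used for classification with discrete features, for example word counts in text classification, where the number of occurrences is the feature value.

Parameters:

alpha: float, optional, default 1.0. Additive (Laplace/Lidstone) smoothing parameter.

fit_prior: bool, optional, default True. Whether to learn the class priors; if False, all classes get the same prior.

class_prior: array-like of shape (n_classes,), default None. Prior probabilities of the classes.

# naive_bayes.py (sklearn source with debug prints added)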
class MultinomialNB(_BaseDiscreteNB):
    def _count(self, X, Y):
        """Count and smooth feature occurrences."""
        check_non_negative(X, "MultinomialNB (input X)")
        print(Y.T)
        print(X)
        print(Y.T.shape,X.shape)
        self.feature_count_ += safe_sparse_dot(Y.T, X)
        self.class_count_ += Y.sum(axis=0)

    def _update_feature_log_prob(self, alpha):
        """Apply smoothing to raw counts and recompute log probabilities"""
        smoothed_fc = self.feature_count_ + alpha
        smoothed_cc = smoothed_fc.sum(axis=1)

        # Formula: p(t|c) = (count(t, c) + alpha) / (N_c + alpha * |V|), e.g. (count + 1) / (14 + 20) for class -1
        self.feature_log_prob_ = (np.log(smoothed_fc) -
                                  np.log(smoothed_cc.reshape(-1, 1)))
        print('Laplace-smoothed counts:', smoothed_fc, '\n', 'smoothed_cc', smoothed_cc)
        print('probabilities without the log', np.exp(self.feature_log_prob_))
    def _joint_log_likelihood(self, X):
        """Calculate the posterior log probability of the samples X"""

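        print('matrix product shapes', X.shape, self.feature_log_prob_.T.shape)  # (1, 20) (20, 2)
        # Matrix product: word counts of the test sample times the log conditional probabilities learned in training
        print('likelihood without the log', np.exp(safe_sparse_dot(X, self.feature_log_prob_.T)))  # [[1.01770812e-04 8.20041822e-05]]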
        return (safe_sparse_dot(X, self.feature_log_prob_.T) +
                self.class_log_prior_)

    # Inherited from the parent class
    def predict_log_proba(self, X):
        """
        Return log-probability estimates for the test vector X.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)

        Returns
        -------
        C : array-like of shape (n_samples, n_classes)
            Returns the log-probability of the samples for each class in
            the model. The columns correspond to the classes in sorted
            order, as they appear in the attribute :term:`classes_`.
        """
        check_is_fitted(self)
        X = self._check_X(X)
        jll = self._joint_log_likelihood(X)
        print('jll',np.exp(jll))#jll [[3.10400977e-05 6.39632621e-05]]
        # normalize by P(x) = P(f_1, ..., f_n)
        log_prob_x = logsumexp(jll, axis=1)
        # Normalize in log space
        return jll - np.atleast_2d(log_prob_x).T
'''
Output

y.T
[[1 1 1 0 0]
 [0 0 0 1 1]]

X
[[0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0]
 [1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0]
 [1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1]
 [0 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 1 0 2 0]]
y.shape,x.shape
(2, 5) (5, 20)

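# Conditional probabilities
P(tk|c) = (total occurrences of word tk in the documents of class c + 1) / (total number of words in class c + |V|)

smoothed_fc = total occurrences of word tk in the documents of class c, plus alpha
[[3. 2. 2. 2. 2. 1. 1. 2. 2. 2. 1. 2. 1. 2. 1. 2. 1. 2. 1. 2.]  # class -1: occurrence counts + 1
 [1. 1. 1. 1. 1. 2. 2. 1. 1. 1. 2. 1. 2. 1. 2. 1. 2. 1. 3. 2.]]  # class 1: occurrence counts + 1

smoothed_cc = total number of words in class c, plus alpha * |V|: [34. 29.]

Conditional probabilities without the log. The entries marked "=predictable" are the probabilities of predictable in classes -1 and 1; they can also be read as weights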
[[0.08823529 0.05882353 0.05882353 0.05882353 0.05882353 0.02941176
  0.02941176 0.05882353 0.05882353 0.05882353 0.02941176 0.05882353
  0.02941176 0.05882353 0.02941176 0.05882353=predictable 0.02941176 0.05882353
  0.02941176 0.05882353]
 [0.03448276 0.03448276 0.03448276 0.03448276 0.03448276 0.06896552
  0.06896552 0.03448276 0.03448276 0.03448276 0.06896552 0.03448276
  0.06896552 0.03448276 0.06896552 0.03448276=predictable 0.06896552 0.03448276
  0.10344828 0.06896552]]
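# At test time: exp(X · log conditional probabilities + log prior), i.e. the prior times each conditional probability raised to the word's count; normalizing this gives the class probabilities of the sample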

'''
#test.py
import numpy as np
from sklearn.naive_bayes import MultinomialNB
# from sklearn.datasets import fetch_20newsgroups
'''
-1      just plain boring 
-1      entirely predictable and lacks energy
-1      no surprises and very few laughs 
 
1      very powerful
1      the most fun film of the summer
test
predictable with no fun
'''
# "with" is dropped because it never appears in the training set
s = '''just plain boring entirely predictable and lacks energy no surprises and very few laughs very powerful the most fun film of the summer predictable no fun'''
s = s.split(' ')
s = sorted(list(set(s)))
print([(i, k) for i, k in enumerate(s)], len(s))

X = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19],  # index row for the 20 features, removed below
              [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
              [1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
              [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
              [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 2, 0]
              ])
X = X[1:, :]
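# Test document without "with": fun (index 6), no (index 11), predictable (index 15)
test = np.array([
              [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]])
y = np.array([-1, -1, -1, 1, 1])
clf = MultinomialNB(alpha=1.0)
clf.fit(X, y)  # step 1 updates the conditional probabilities via self._update_feature_log_prob(alpha); step 2 updates the priors via self._update_class_log_prior(class_prior=class_prior)

print(clf.predict(test))  # calls self._joint_log_likelihood, which does the (sparse) matrix product
print(clf.class_log_prior_)  # sklearn uses document-count priors: log(3/5), log(2/5)
print(np.exp(clf.predict_log_proba(test)))  # approximately [[0.65 0.35]], i.e. class -1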
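What `_joint_log_likelihood` and `predict_log_proba` do for this example can be imitated without sklearn. The sketch below is not the library code; it only mirrors the arithmetic (log prior plus count-weighted log conditionals, then a log-sum-exp normalization), with the smoothed probabilities for predictable, no and fun taken from Example 2 and the document-count priors 3/5 and 2/5.

```python
import math

# Rows are classes -1 and 1; columns are predictable, no, fun.
cond = [[2 / 34, 2 / 34, 1 / 34],
        [1 / 29, 1 / 29, 2 / 29]]
log_prior = [math.log(3 / 5), math.log(2 / 5)]
counts = [1, 1, 1]  # each of the three words occurs once in the test document

# Joint log likelihood: log P(c) + sum over words of count * log P(word | c)
jll = [lp + sum(n * math.log(p) for n, p in zip(counts, row))
       for lp, row in zip(log_prior, cond)]

# Normalize in log space (log-sum-exp) so that the posteriors sum to 1.
m = max(jll)
log_norm = m + math.log(sum(math.exp(v - m) for v in jll))
posterior = [math.exp(v - log_norm) for v in jll]
print(posterior)
```

Working in log space turns the product of many small probabilities into a sum, which avoids numerical underflow for long documents.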

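1.2.3 Usage in sklearn - Bernoulli naive Bayes

Bernoulli naive Bayes: sklearn.naive_bayes.BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True,class_prior=None)

Reposted from: https://blog.csdn.net/qq_36523839/article/details/81505841

Like multinomial naive Bayes, it is mainly used for classification with discrete features. The difference is that MultinomialNB uses occurrence counts as feature values, while BernoulliNB uses binary (boolean) features.

Parameters:

binarize: threshold for binarizing the features (values greater than the threshold become 1, all others 0)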
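A minimal sketch of what the binarize threshold does, written in plain Python rather than with the library. The helper name `binarize` here is a local function, and the matrix is the one used in the example that follows.

```python
def binarize(rows, threshold=0.0):
    """Map values strictly greater than threshold to 1, everything else to 0."""
    return [[1 if v > threshold else 0 for v in row] for row in rows]

X = [[1, 2, 3, 4], [1, 3, 4, 4], [2, 4, 5, 5]]
X_bin = binarize(X, threshold=3.0)
print(X_bin)
```

Note that a value equal to the threshold (3 here) maps to 0.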

A simple example

import numpy as np
from sklearn.naive_bayes import MultinomialNB,BernoulliNB
X = np.array([[1,2,3,4],[1,3,4,4],[2,4,5,5]])
y = np.array([1,1,2])
clf_b = BernoulliNB(alpha=1.0,binarize = 3.0,fit_prior=True)
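'''
X after binarization with threshold 3.0
[[0, 0, 0, 1],  class 1
 [0, 0, 1, 1],  class 1
 [0, 1, 1, 1]]  class 2
'''
# clf_b=BernoulliNB(alpha=1,binarize=0)
clf_b.fit(X,y)
print(np.exp(clf_b.feature_log_prob_))  # conditional probabilities
'''
Training
    P(c) = number of documents in class c / total number of training documents
    P(tk|c) = (number of documents in class c containing word tk + 1) / (number of documents in class c + 2)
    
    [[0, 0, 1, 2],  number of class-1 documents in which each feature occurs
     [0, 1, 1, 1]]  number of class-2 documents in which each feature occurs
    Conditional probability, e.g. 0.25=(0+1)/(2+2)
    [[0.25       0.25       0.5        0.75      ]  conditional probability of each feature given class 1
    [0.33333333 0.66666667 0.66666667 0.66666667]]  class 2

Testing
    X:[[0, 0, 0, 1]]

    Conditional probabilities: [[0.25       0.25       0.5        0.75      ]   given class 1
                                [0.33333333 0.66666667 0.66666667 0.66666667]].T given class 2

    (X · conditional probabilities) * prior * product of non-occurrence probabilities = unnormalized posterior
    [0.75,0.66667]*[0.667,0.333]*[0.75*0.75*0.5,0.667*0.333*0.333]=[0.140625   0.01646091]
    
    
    '''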

Function details

    def _update_feature_log_prob(self, alpha):
        """Apply smoothing to raw counts and recompute log probabilities"""
        smoothed_fc = self.feature_count_ + alpha
        smoothed_cc = self.class_count_ + alpha * 2
        # print('feature_prob',smoothed_fc/smoothed_cc.reshape(-1, 1))
        # Conditional probabilities: (documents containing the feature + alpha) / (documents in the class + 2 * alpha)
        self.feature_log_prob_ = (np.log(smoothed_fc) -
                                  np.log(smoothed_cc.reshape(-1, 1)))

    def _joint_log_likelihood(self, X):
        """Calculate the posterior log probability of the samples X"""
        n_classes, n_features = self.feature_log_prob_.shape
        n_samples, n_features_X = X.shape

        if n_features_X != n_features:
            raise ValueError("Expected input with %d features, got %d instead"
                             % (n_features, n_features_X))
        # print(1 - np.exp(self.feature_log_prob_))  # complement of feature_prob = smoothed_fc / smoothed_cc.reshape(-1, 1)
        # print(self.feature_log_prob_)
        neg_prob = np.log(1 - np.exp(self.feature_log_prob_))  # log-probability that a feature does not occur (Bernoulli model)
        # Compute  neg_prob · (1 - X).T  as  ∑neg_prob - X · neg_prob
        # print(X)  # test set X
        # print(np.exp(self.feature_log_prob_ - neg_prob))
        jll = safe_sparse_dot(X, (self.feature_log_prob_ - neg_prob).T)
        # print(np.exp(self.feature_log_prob_), np.exp(neg_prob))

        print('conditional probabilities\n', np.exp(self.feature_log_prob_))
        print('non-occurrence probabilities\n', np.exp(neg_prob))
        print('conditional / non-occurrence', '\n', np.exp(self.feature_log_prob_) / np.exp(neg_prob))
        z = X.dot((np.exp(self.feature_log_prob_) / np.exp(neg_prob)).T)
        print('X matrix product selects the present features, z', z)

        # print('product of non-occurrence probabilities', np.exp(neg_prob.sum(axis=1)))
        print('prior\n', np.exp(self.class_log_prior_))
        # print('X * conditional probability matrix', np.dot(X, np.exp(self.feature_log_prob_.T)))
        # Debug check, hard-coded for the example above: divide out the non-occurrence
        # probability (0.25 and 0.33333333) of the single feature that is present
        b = np.exp(neg_prob.sum(axis=1)) / np.array([0.25, 0.33333333])
        c = np.exp(self.class_log_prior_)
        d = np.dot(X, np.exp(self.feature_log_prob_.T))
        print(d * c * b)  # X * conditional probability * prior * product of non-occurrence probabilities: [[0.140625   0.01646091]]
        # In log space, adding the sum multiplies in the non-occurrence probabilities of all features
        jll += self.class_log_prior_ + neg_prob.sum(axis=1)
        print('z * prior * product of non-occurrence probabilities\n', z * np.exp(self.class_log_prior_) * np.exp(neg_prob.sum(axis=1)))
        # print(np.exp(jll))
        return jll


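'''
The function effectively computes: X · (conditional / non-occurrence).T * prior * product of non-occurrence probabilities
conditional probabilities
 [[0.25       0.25       0.5        0.75      ]
 [0.33333333 0.66666667 0.66666667 0.66666667]]
non-occurrence probabilities
 [[0.75       0.75       0.5        0.25      ]
 [0.66666667 0.33333333 0.33333333 0.33333333]]
conditional / non-occurrence
 [[0.33333333 0.33333333 1.         3.        ]
 [0.5        2.         2.         2.        ]]
X matrix product selects the present features, z [[3. 2.]]
prior
 [0.66666667 0.33333333]
z * prior * product of non-occurrence probabilities (three features use the non-occurrence probability, one uses the occurrence probability)
 [[0.140625   0.01646091]]
posterior before normalization [[0.140625   0.01646091]]
final output [[0.89521081 0.10478919]]

The calculation is equivalent to the expression below: the function divides by the non-occurrence probability up front, and that 0.25 then cancels against the same factor in the product of non-occurrence probabilities
0.75/0.25 x 0.66667 x (0.75*0.75*0.5*0.25)

'''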
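The cancellation described above can be verified numerically. The sketch below uses the class-1 numbers from the output (conditional probabilities 0.25, 0.25, 0.5, 0.75, prior 2/3, test vector [0, 0, 0, 1]) and compares the direct Bernoulli log likelihood with the rearranged form x · (log p - log(1 - p)) + Σ log(1 - p). The variable names are illustrative only.

```python
import math

p = [0.25, 0.25, 0.5, 0.75]  # P(feature present | class 1)
prior = 2 / 3
x = [0, 0, 0, 1]             # binarized test sample

# Direct form: log p for present features, log(1 - p) for absent ones.
direct = math.log(prior) + sum(
    math.log(pi) if xi else math.log(1 - pi) for xi, pi in zip(x, p))

# Rearranged form: x · (log p - log(1 - p)) + sum of all log(1 - p).
neg = [math.log(1 - pi) for pi in p]
rearranged = (sum(xi * (math.log(pi) - ni) for xi, pi, ni in zip(x, p, neg))
              + math.log(prior) + sum(neg))

print(math.exp(direct), math.exp(rearranged))
```

Both forms give 0.140625 for class 1. The rearranged form needs only one matrix product with X, which is convenient when X is sparse.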





import numpy as np
from sklearn.naive_bayes import MultinomialNB,BernoulliNB
# from sklearn.datasets import fetch_20newsgroups
'''
-1      just plain boring 
-1      entirely predictable and lacks energy
-1      no surprises and very few laughs 
 
1      very powerful
1      the most fun film of the summer
test
predictable with no fun
'''
s = '''just plain boring entirely predictable and lacks energy no surprises and very few laughs very powerful the most fun film of the summer predictable no fun with'''
s = s.split(' ')
s = sorted(list(set(s)))
print([(i, k) for i, k in enumerate(s)], len(s))

X = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19,20],  # index row for the 21 features: train + test contain 21 distinct words, unlike the multinomial feature vector
              [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0],
              [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 2, 0, 0]
              ])



X = X[1:, :]
test = np.array([
              [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,1]])  # 21 features: fun, no, predictable, with
y = np.array([-1, -1, -1, 1, 1])
# Bernoulli model: features are binarized against a threshold
# Conditional probabilities:
# smoothed_fc: per class, number of documents containing each feature, plus alpha; smoothed_cc: number of documents in each class, plus 2 * alpha.
# self.feature_log_prob_ = (np.log(smoothed_fc)- np.log(smoothed_cc.reshape(-1, 1)))
#jll = safe_sparse_dot(X, (self.feature_log_prob_ - neg_prob).T)
clf_b=BernoulliNB(alpha=1,binarize=0)
clf_b.fit(X,y)
print(clf_b.predict(test))
# print(dir(clf_b))
# print(np.exp(clf_b.class_log_prior_))  # priors
# print(np.exp(clf_b.feature_log_prob_))  # shape (2, 21)
# print(clf_b.predict_proba(test))

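1.2.4 Usage in sklearn - Gaussian naive Bayes

Gaussian naive Bayes is suited to continuous values. Binning a quantity such as height into "below 160 cm" and "160-170 cm" is too coarse, so each feature is instead modeled with a per-class Gaussian distribution.

sklearn.naive_bayes.GaussianNB(priors=None)

fit(X, y, sample_weight) also accepts per-sample weights, which turn the class means into weighted means (see the fit example at the end of this section).

An example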

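import numpy as np
from functools import partial
from sklearn.naive_bayes import GaussianNB

X = np.array([[1,2,3,4],[1,3,4,4],[2,4,5,5]])
y = np.array([1,1,2])
# clf_b = BernoulliNB(alpha=1.0,binarize = 3.0,fit_prior=True)
test=np.array([[1,2,3,4]])

# Gaussian naive Bayes
clf_g=GaussianNB()
clf_g.fit(X,y)
print=partial(print,sep='\n')  # print each argument on its own line
print(dir(clf_g))
print('prior',clf_g.class_prior_)
# Per-class variance (not the standard deviation); the attribute is var_ in recent sklearn and sigma_ in older releases
variance = clf_g.var_ if hasattr(clf_g, 'var_') else clf_g.sigma_
print('variance',variance)  # class 1: mean over the 2 samples of ([[1,2,3,4],[1,3,4,4]] - [1. 2.5 3.5 4.])**2

print('mean',clf_g.theta_)  # ([1,2,3,4]+[1,3,4,4])/2 = [1. 2.5 3.5 4.] for class 1; [2,4,5,5]/1 = [2. 4. 5. 5.] for class 2
print('mean',X[:2].mean(axis=0))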

'''
prior
[0.66666667 0.33333333]
variance
[[6.66666667e-10 2.50000001e-01 2.50000001e-01 6.66666667e-10]
 [6.66666667e-10 6.66666667e-10 6.66666667e-10 6.66666667e-10]]
mean
[[1.  2.5 3.5 4. ]
 [2.  4.  5.  5. ]]
mean
[1.  2.5 3.5 4. ]  # mean of the 2 samples of class 1
'''
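The means and variances printed above can be recomputed by hand. The sketch below uses plain Python with made-up helper names; it leaves out the small constant that sklearn adds to every variance (var_smoothing, default 1e-9, times the largest feature variance), which is where the 6.67e-10 entries in the output come from.

```python
import math

def mean_var(column):
    """Mean and population variance (divide by n, not n - 1) of one feature."""
    m = sum(column) / len(column)
    v = sum((x - m) ** 2 for x in column) / len(column)
    return m, v

def gaussian_log_pdf(x, m, v):
    """Log density of N(m, v) at x; v must be positive."""
    return -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)

class1 = [[1, 2, 3, 4], [1, 3, 4, 4]]  # the two training samples of class 1
stats = [mean_var(col) for col in zip(*class1)]
print(stats)

# Log likelihood of the second feature of test sample [1, 2, 3, 4] under class 1
print(gaussian_log_pdf(2, *stats[1]))
```

At prediction time the per-feature log densities are summed and added to the log prior, in the same way as the log conditional probabilities in the discrete models.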

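fit(X, y, sample_weight=None): trains the model. X holds the feature vectors, y the class labels, and sample_weight an array with one weight per sample.
clf.fit(X,y,np.array([0.5, 0.4, 0.3]))
Class 1 mean (first feature) = (1*0.5+1*0.4)/(0.5+0.4)
Class 2 mean (first feature) = 0.3*2/0.3
Class 1 variance = weighted mean of the squared deviations from the class 1 mean, i.e. Σ w*(x - class 1 mean)**2 / Σ w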
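A small sketch of the weighted statistics described above, in plain Python with a made-up helper name. It uses the sample weights 0.5 and 0.4 of the two class-1 samples and, as before, ignores the small smoothing constant that sklearn adds to the variance.

```python
def weighted_mean_var(values, weights):
    """Weighted mean and weighted mean of squared deviations."""
    total = sum(weights)
    m = sum(w * x for x, w in zip(values, weights)) / total
    v = sum(w * (x - m) ** 2 for x, w in zip(values, weights)) / total
    return m, v

weights = [0.5, 0.4]  # weights of the two class-1 samples

# First feature of class 1: both samples have value 1
m1, v1 = weighted_mean_var([1, 1], weights)
# Second feature of class 1: values 2 and 3
m2, v2 = weighted_mean_var([2, 3], weights)
print(m1, v1, m2, v2)
```

For the first feature the weights cancel (mean 1, variance 0). For the second feature the mean shifts from 2.5 to about 2.44, because the sample with value 2 carries more weight.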