A Small-Data Classification Algorithm for Poverty Alleviation Based on Naive Bayes

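       Based on an analysis of poverty-alleviation work data from ** County, ** Township, ** Village, this article proposes a small-data analysis method suited to targeted poverty alleviation. It adapts the correlation analysis and inference methods of machine learning and data theory to obtain a small-data analysis method based on naive Bayes. The program is designed around the following formula:

$$\hat{y} = \arg\max_{y} \; P(y) \prod_{j=1}^{n} P(x_j \mid y)^{a_j}$$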

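          In the formula, a_j masks, enables, strengthens, or weakens the influence of a feature column on the classification result. If a_j = 0, the probability term for that feature becomes 1, which masks that column's influence on the inference. If a_j = 1, the term is unchanged, neither strengthened nor weakened, so naive Bayes inference runs in its standard form. If 0 < a_j < 1, the feature's influence on the final classification is weakened; if a_j > 1, it is strengthened.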
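
The effect of the exponent can be checked on toy numbers. The following is a minimal sketch, not the program itself: the function name `weighted_score` and the probabilities are made up for illustration, and the values are chosen as powers of two so the arithmetic is exact.

```python
def weighted_score(prior, likelihoods, a):
    """Return prior * prod(p_j ** a_j) for a single class."""
    score = prior
    for p, a_j in zip(likelihoods, a):
        score *= pow(p, a_j)
    return score


prior = 0.5  # hypothetical P(y)

# First feature value was never seen with this class (p = 0), second has p = 0.25.
unchanged = weighted_score(prior, [0.0, 0.25], [1, 1])  # the zero term wipes out the score
masked = weighted_score(prior, [0.0, 0.25], [0, 1])     # pow(0.0, 0) == 1.0, so the term is ignored

# A single feature with p = 0.25 under different exponents.
weakened = weighted_score(prior, [0.25], [0.5])    # term becomes 0.5, closer to 1
strengthened = weighted_score(prior, [0.25], [2])  # term becomes 0.0625, further from 1

print(unchanged, masked, weakened, strengthened)
```

Because every probability lies between 0 and 1, an exponent below 1 pulls the term toward 1 (less influence) and an exponent above 1 pushes it toward 0 (more influence).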

         The program reads UTF-8 encoded CSV datasets that use a comma as the delimiter, and was developed with Python 3.7.

   1. Training and Test Set

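        (Note: the training set and the test set are in the same file. Data rows 1 to 219 are the training set and the rows after them are the test set. The file can be saved from Notepad in csv format with UTF-8 encoding.)

        The columns are, in order: index, household name, household size, ethnicity, education level, health status, labor capacity, income for each year from 2014 to 2019, and poverty-alleviation assessment. The assessment takes four values: 脱贫户 (out of poverty), 未脱贫户 (not yet out of poverty), 监测户 (monitored household), and 返贫户 (returned to poverty). The data is kept in its original form because the program matches on these values.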

序号,户名,人数,民族,文化程度,健康状况,劳动技能,2014年,2015年,2016年,2017年,2018年,2019年,脱贫评估
1,黄**户,1,汉族,小学,健康,普通劳动力,3294,1939,4262.73,5933,6541.35,7705.4,监测户
2,姚**户,3,拉祜族,初中,患有大病,无劳动力,3543.57,4100.83,5117.1,8924.44,7985.35,11623.47,监测户
3,李**户,1,布朗族,小学,患有大病,无劳动力,3416.61,3431.16,4674.96,8252.03,7117.13,10401.87,未脱贫户
4,姚**户,5,拉祜族,小学,健康,无劳动力,3294,2190.45,6076.2,7386.49,10632.32,11075,脱贫户
5,周**户,5,彝族,小学,长期慢性病,弱劳动力或半劳动力,2387.75,2074.72,3448.82,4884.58,5174.79,7156.72,脱贫户
6,黄**户,5,拉祜族,初中,健康,普通劳动力,3186.06,4268.61,3901.45,7083.84,7243.91,9383.3,脱贫户
7,张**户,3,拉祜族,小学,健康,技能劳动力,2177.9,2414.94,4732.91,5221.54,8101.08,8878.35,脱贫户
8,姚**户,4,拉祜族,小学,健康,普通劳动力,2515.41,2930.65,3193.57,6312.75,10789.21,15897.08,监测户
9,黄**户,2,汉族,文盲或半文盲,健康,无劳动力,2784.99,3104.39,4960.34,8842.2,7086.39,10201.66,脱贫户
10,姚**户,4,拉祜族,小学,健康,技能劳动力,3371.43,3356.78,4993.3,6566.22,7686.89,10182.56,脱贫户
11,张**户,2,拉祜族,小学,残疾,丧失劳动力,4883.89,3386.34,6680.47,6145.76,11403.85,14689.6,脱贫户
12,张**户,2,拉祜族,小学,残疾,普通劳动力,3824.3,2324.25,5874.47,4965.22,9408.42,8678.12,脱贫户
13,姚**户,3,拉祜族,小学,"患有大病,残疾",弱劳动力或半劳动力,2785.52,3145.4,3564.59,4534.5,6401.2,9351.65,脱贫户
14,张**户,4,拉祜族,小学,健康,普通劳动力,4254.03,3766.58,5767.27,7176.42,10038.93,14788.79,脱贫户
15,张**户,1,拉祜族,小学,残疾,普通劳动力,3294,3350,2609.3,4788.66,7936.93,11712.2,脱贫户
16,姚**户,1,拉祜族,小学,残疾,丧失劳动力,3294,3350,3280,6306,7630,11192,脱贫户
17,李**户,4,汉族,小学,残疾,无劳动力,3294,1932.83,2862.01,5734.04,8531.18,11972.55,脱贫户
18,李**户,4,汉族,小学,健康,普通劳动力,3294,2789.76,3786.79,5021.32,6089.91,4583.88,脱贫户
19,胡**户,3,拉祜族,小学,健康,技能劳动力,1978.23,3495.54,4838.07,5226.89,10479.85,15397.41,脱贫户
20,杨**户,6,汉族,小学,健康,技能劳动力,2317.84,2591.02,2128.64,4740.52,6573.65,7763.96,脱贫户
21,杨**户,3,汉族,初中,健康,无劳动力,2859.67,1932.78,2316.55,5934.4,7383.17,10890.03,脱贫户
22,李**户,4,拉祜族,小学,健康,技能劳动力,2320.99,1992.05,3319.91,3066.12,9236.2,13642.62,脱贫户
23,李**户,1,汉族,小学,健康,普通劳动力,4272.13,3639.57,2845.93,4883.07,8911.21,8935.97,脱贫户
24,李**户,3,汉族,小学,健康,普通劳动力,2785.49,3546.46,3519.85,5075.37,14153.82,17769.56,脱贫户
25,杨**户,5,拉祜族,小学,健康,普通劳动力,3319.9,3844.21,4427.94,6644.26,6706.91,9885.55,脱贫户
26,钟**户,6,彝族,小学,患有大病,无劳动力,2331.14,3633.41,3142.98,4149.33,6072.9,7558.46,脱贫户
27,李**户,3,汉族,小学,健康,普通劳动力,2322.28,2046.89,3101.38,4355.68,12769.26,18814.78,脱贫户
28,杨**户,1,汉族,初中,健康,技能劳动力,2539.27,2573.76,4244.99,3075.66,24585,36148.27,脱贫户
29,李**户,2,拉祜族,小学,健康,普通劳动力,4837.98,4750.79,5493.19,9361.04,18554.38,27411.81,脱贫户
30,李**户,5,拉祜族,小学,健康,无劳动力,2052,2255.63,3210.94,5914.98,6954.15,10268.58,脱贫户
31,李**户,6,汉族,文盲或半文盲,"患有大病,残疾",无劳动力,2748.57,2285.39,3070,4995.91,8309.15,12277.61,监测户
32,李**户,3,汉族,小学,残疾,普通劳动力,3196.57,3827.22,2973.07,4090.07,10677.7,15467.68,脱贫户
33,李**户,3,拉祜族,小学,健康,普通劳动力,3567.11,6881.96,6734,4017.89,9242.84,9966.5,监测户
34,张**户,3,拉祜族,小学,健康,普通劳动力,2773.35,3107.17,4327.55,5465.67,6797.6,10012.57,监测户
35,李**户,4,汉族,文盲或半文盲,健康,普通劳动力,2135.72,2095.34,2060.59,3265.48,7127.8,10380.69,脱贫户
36,李**户,3,汉族,小学,健康,技能劳动力,2920.86,3406.91,3091.63,5256.96,10959.78,16053.07,脱贫户
37,胡**户,5,拉祜族,小学,长期慢性病,无劳动力,3398.56,5938.6,4894.18,6357.63,6860.24,8279.82,脱贫户
38,李**户,6,汉族,小学,健康,技能劳动力,2565.9,2997.35,3207.95,5620.24,7622.71,9797.36,脱贫户
39,李**户,5,拉祜族,小学,健康,普通劳动力,2953.99,3246.78,3571.22,4666.67,5873.96,8359.02,脱贫户
40,杨**户,6,汉族,小学,健康,普通劳动力,7528.56,9508.45,8199.93,5879.66,7527.37,6413.26,脱贫户
41,李**户,2,汉族,初中,健康,普通劳动力,3294,3350,1835,3333.33,6452.33,9216.5,监测户
42,李**户,4,汉族,小学,健康,普通劳动力,3294,3350,2525.18,4417.92,7641.78,11215.2,脱贫户
43,张**户,2,拉祜族,文盲或半文盲,健康,无劳动力,3294,3350,1366.8,4861.55,8137.73,11729.92,脱贫户
44,张**户,1,拉祜族,小学,健康,普通劳动力,3294,3350,2635.07,4386.6,13637.69,16375.61,脱贫户
45,杨**户,4,汉族,文盲或半文盲,健康,无劳动力,3294,3350,2377.01,3738.65,6327.8,9319.38,脱贫户
46,李**户,4,拉祜族,小学,健康,技能劳动力,3294,3350,1640.33,4201.99,8392.55,12226.58,脱贫户
47,杨**户,4,汉族,小学,健康,普通劳动力,3294,3350,1872.5,3566.95,9359.72,9006.66,脱贫户
48,潘**户,4,汉族,小学,健康,普通劳动力,3053.91,3590.28,3540.82,6028.6,9128.58,9245.42,脱贫户
49,潘**户,2,汉族,文盲或半文盲,健康,普通劳动力,5449.44,3832.53,4784.06,7599.48,7688.66,9926.86,脱贫户
50,张**户,4,拉祜族,小学,健康,普通劳动力,1971,3519.8,4340.61,5373.98,8617.49,12664.17,脱贫户
51,张**户,5,拉祜族,文盲或半文盲,健康,普通劳动力,2020.8,1700.6,2157.37,6729.93,6958.13,10272.19,脱贫户
52,潘**户,4,汉族,小学,健康,普通劳动力,3414.45,4908.03,5302.4,5701.65,3944.18,5722.28,脱贫户
53,何**户,2,彝族,初中,健康,普通劳动力,3156.91,1343.24,2901.05,11083.5,8780.75,12511.64,脱贫户
54,周**户,6,彝族,初中,健康,普通劳动力,2227.09,1385.14,2813.42,7250.65,5975.72,8797.87,脱贫户
55,周**户,4,彝族,小学,健康,普通劳动力,1668.33,3276.52,2497.8,4688.16,7501.12,11078.78,脱贫户
56,潘**户,3,彝族,小学,健康,普通劳动力,3001.1,3338.72,4662.96,5383.73,7344.72,10805.89,脱贫户
57,李**户,2,汉族,小学,健康,无劳动力,2844.81,2422.68,3288.16,4188.75,14956.81,22112.37,脱贫户
58,潘**户,2,汉族,文盲或半文盲,健康,普通劳动力,1594.98,1988.27,3344.99,8150.46,7363.05,10694.75,脱贫户
59,周**户,5,彝族,文盲或半文盲,健康,无劳动力,2322.09,3748.46,2842.58,4143.12,4932.27,7186.04,脱贫户
60,何**户,4,汉族,小学,健康,普通劳动力,2543,2744.25,2956.91,5459.24,7601.19,9048.53,脱贫户
61,潘**户,2,汉族,文盲或半文盲,健康,普通劳动力,5765.33,3469.2,3861.26,4844.8,10239.09,15098.91,脱贫户
62,周**户,5,彝族,小学,健康,普通劳动力,5110.6,5445.79,4665.6,3810.45,4273.19,6235.71,脱贫户
63,田**户,1,佤族,文盲或半文盲,健康,无劳动力,3294,3350,2629.58,7566.03,11926.8,17436.87,脱贫户
64,游**户,1,汉族,文盲或半文盲,健康,无劳动力,3294,3350,3280,2199.11,8169.61,11932.16,脱贫户
65,潘**户,2,汉族,初中,健康,普通劳动力,3294,3350,3280,5222,10543.22,18707.03,脱贫户
66,张**户,3,拉祜族,文盲或半文盲,患有大病,无劳动力,2540.46,5811.27,5347.07,4789.39,7598.83,5617.34,返贫户
67,周**户,3,彝族,小学,"患有大病,残疾",无劳动力,6450.47,4327.78,4919.41,7163.19,6716.06,9708.46,返贫户
68,周**户,2,彝族,小学,健康,普通劳动力,3117.64,3804.84,4039.14,6217.97,15233.81,14964.03,脱贫户
69,张**户,5,拉祜族,小学,健康,普通劳动力,3098.39,3030.01,3296.72,7494.29,5694.68,7636.02,脱贫户
70,周**户,3,彝族,小学,健康,弱劳动力或半劳动力,4613.16,3287.35,3555.6,6072.41,7781.88,11459.38,脱贫户
71,李**户,2,汉族,小学,健康,普通劳动力,2730.74,5173.39,5206.78,6240.27,12499.47,18355.86,脱贫户
72,潘**户,3,汉族,小学,健康,普通劳动力,4095.66,6477.7,4342.68,7674.73,9204.34,13578.11,脱贫户
73,李**户,4,汉族,小学,健康,普通劳动力,2583.53,3031.67,3153.24,7971.59,8086.64,11838.6,脱贫户
74,周**户,4,彝族,小学,健康,普通劳动力,2776.53,2711.23,2918.9,8683.12,6925.69,9236.05,脱贫户
75,李**户,5,汉族,文盲或半文盲,健康,无劳动力,3927.76,4295.28,4634.48,8159.6,6176.32,8306.64,脱贫户
76,钟**户,1,彝族,文盲或半文盲,残疾,无劳动力,3294,3350,3280,7030.2,16484.2,19862.2,脱贫户
77,罗**户,2,彝族,小学,残疾,无劳动力,2628.99,5312.57,4171.94,5337.39,4562.23,6689.43,返贫户
78,李**户,3,汉族,初中,健康,普通劳动力,2289.68,3361.59,4942.38,5654.68,9787.44,14391.12,脱贫户
79,钟**户,2,彝族,初中,残疾,弱劳动力或半劳动力,2918.77,4451.06,3469.8,4206.27,5841.69,6169.9,脱贫户
80,张**户,3,拉祜族,小学,健康,技能劳动力,2321,6829.5,2883.5,4200,7440,10953.94,脱贫户
81,孔**户,4,汉族,文盲或半文盲,健康,无劳动力,2424.84,3107.17,3441.22,4032.07,6869.8,7948.75,监测户
82,罗**户,2,彝族,高中,健康,无劳动力,3229.9,3831.2,4964.3,5214.17,4335.37,6359.69,脱贫户
83,孔**户,3,汉族,文盲或半文盲,健康,无劳动力,3150.51,3263.28,3493.87,7138.56,7991.52,11824.65,脱贫户
84,罗**户,4,彝族,小学,健康,技能劳动力,3650.01,2432.54,2459.38,5217.57,12963,19161.67,脱贫户
85,李**户,1,汉族,小学,健康,技能劳动力,3631.35,5465.75,3847.21,2923.61,6344.22,7658.21,脱贫户
86,罗**户,4,彝族,小学,健康,普通劳动力,3491.07,2616.39,3890,5543.33,7497.91,10128.84,脱贫户
87,罗**户,3,彝族,小学,患有大病,无劳动力,3669.04,3509.43,3534.88,6246.64,5269.13,7566.15,返贫户
88,钟**户,4,汉族,小学,健康,无劳动力,3798.66,2314.84,5090.44,6110.13,6202.31,9169.36,脱贫户
89,武**户,3,汉族,小学,健康,普通劳动力,3370.65,4584.21,4770.36,5089.31,10367.97,13515.13,脱贫户
90,罗**户,1,彝族,小学,残疾,弱劳动力或半劳动力,3294,3350,2637.73,4386.61,8056.24,11548.06,脱贫户
91,罗**户,2,彝族,小学,健康,技能劳动力,3294,3350,2286.63,2257.23,9890,14174.57,脱贫户
92,李**户,3,汉族,小学,健康,技能劳动力,2385.54,2564.23,4057.93,5686.1,6648.83,8774.13,脱贫户
93,钟**户,1,彝族,小学,健康,无劳动力,3294,3350,3280,5967.28,8071.45,11886.24,脱贫户
94,罗**户,1,汉族,小学,残疾,无劳动力,3294,3350,3280,5222,6947.93,9742.95,脱贫户
95,周**户,3,汉族,小学,健康,技能劳动力,3294,2767.12,2741.68,5315.66,4708.3,5336.81,脱贫户
96,周**户,2,汉族,小学,健康,普通劳动力,2316,2519.5,2289.61,4646.01,5864.11,8623.84,脱贫户
97,李**户,1,汉族,小学,长期慢性病,无劳动力,2979.58,2512.36,1565.02,4859.27,4971.55,5947.68,未脱贫户
98,李**户,5,汉族,文盲或半文盲,残疾,无劳动力,2184.58,4115.57,3788.01,5129.19,7078.69,6414.45,脱贫户
99,李**户,2,汉族,小学,健康,技能劳动力,3299.74,4017.38,4319.97,4351.75,3434.57,4818.89,未脱贫户
100,张**户,5,拉祜族,初中,健康,技能劳动力,4043.01,5002.02,3105.01,4484.13,9501.27,13990.59,脱贫户
101,周**户,6,汉族,文盲或半文盲,长期慢性病,无劳动力,3447.74,4511.64,3225.99,6063.55,6091.23,5205.97,脱贫户
102,胡**户,3,拉祜族,小学,健康,普通劳动力,3294,1462.59,1348.88,3995.38,7039.9,5699.12,脱贫户
103,周**户,3,汉族,初中,健康,普通劳动力,3294,2067.4,2907.61,4701.78,5969.73,8592.62,脱贫户
104,周**户,4,汉族,小学,健康,技能劳动力,3294,2441.4,3391.22,6314.05,6024.89,5650.64,脱贫户
105,胡**户,4,拉祜族,初中,健康,普通劳动力,3294,3350,3280,5222,4897.6,7082.76,脱贫户
106,张**户,3,拉祜族,小学,健康,技能劳动力,3294,1874.3,1793.07,4661.41,4854.56,6962.77,脱贫户
107,周**户,2,汉族,文盲或半文盲,健康,无劳动力,3294,2606.63,2960.91,4361.86,4332.41,5707.19,脱贫户
108,张**户,6,拉祜族,小学,健康,普通劳动力,3294,2328.67,3156.21,4537.62,7448.46,6885.68,脱贫户
109,张**户,5,拉祜族,小学,"残疾,患有大病",无劳动力,2025.21,2597.79,4129.98,6110.15,7196.4,7465.41,脱贫户
110,李**户,2,汉族,小学,健康,技能劳动力,5092,2881.5,2217,5204.56,8557.8,12658.73,脱贫户
111,张**户,3,拉祜族,初中,健康,技能劳动力,5011.33,1439.33,4011.33,5159,11661,15617.67,脱贫户
112,张**户,2,拉祜族,小学,健康,普通劳动力,3397.9,3579.46,3978.25,4961.83,6207.01,7806.36,脱贫户
113,何**户,4,彝族,小学,健康,技能劳动力,3040.79,3051.24,3063.37,4519.65,5204.65,5991.85,脱贫户
114,周**户,6,汉族,小学,长期慢性病,无劳动力,2929.07,3508.99,3232.5,4546.51,6541.29,8233.59,脱贫户
115,周**户,4,汉族,初中,健康,普通劳动力,3049.72,3379.5,3167.32,4969.13,6839.29,7205.09,脱贫户
116,张**户,6,拉祜族,小学,健康,弱劳动力或半劳动力,3043.59,3726.4,3169.42,4126.62,5402.15,4046.89,脱贫户
117,周**户,3,汉族,高中,健康,技能劳动力,4007.67,4483.64,5267.09,5618.29,9089.72,13113.16,脱贫户
118,姚**户,2,彝族,小学,健康,技能劳动力,3294,3350,2285,4092.5,8085,5937.5,脱贫户
119,张**户,1,拉祜族,初中,健康,技能劳动力,3294,3350,2609.69,4537.76,6209.11,9034.77,脱贫户
120,周**户,5,汉族,文盲或半文盲,残疾,弱劳动力或半劳动力,3294,3350,2624.61,4158.41,3974.04,5910.97,脱贫户
121,周**户,5,汉族,初中,健康,普通劳动力,3294,3350,2838.43,3946.74,4564.02,4778.76,未脱贫户
122,张**户,4,拉祜族,小学,健康,技能劳动力,3294,1991.87,3885.38,6448.3,14763.22,11903.66,脱贫户
123,钟**户,6,拉祜族,小学,健康,普通劳动力,3294,2272.6,2456,4940.47,7008.79,10315.31,脱贫户
124,石**户,1,拉祜族,小学,健康,普通劳动力,3294,2779.71,3483.58,6880.07,13978.81,19682.36,脱贫户
125,杨**户,3,汉族,小学,健康,普通劳动力,3294,2336.99,2138.93,5182.87,14703,17636.41,脱贫户
126,李**户,3,拉祜族,初中,健康,技能劳动力,3294,1900,1826.5,5184.81,10351.6,7600.49,脱贫户
127,李**户,3,拉祜族,小学,残疾,普通劳动力,3294,2758.32,3548.7,11564.61,9368.27,7956.7,监测户
128,张**户,3,拉祜族,初中,健康,普通劳动力,1790.47,1961.69,2126.83,5999.77,6350.84,9081.17,脱贫户
129,张**户,3,拉祜族,小学,健康,普通劳动力,2447.08,4256.18,2468.91,3609.72,8870.93,10209.71,脱贫户
130,李**户,5,拉祜族,小学,健康,弱劳动力或半劳动力,2847,3659,2733.76,3501.39,5393.6,7972.19,脱贫户
131,张**户,5,拉祜族,小学,健康,无劳动力,2498.44,4092.55,2620.57,7394.28,5343.79,5950.52,监测户
132,陶**户,5,拉祜族,小学,健康,普通劳动力,2390.43,2311.98,2244.01,5935.3,6378.25,9423.89,监测户
133,李**户,2,汉族,小学,健康,普通劳动力,3294,3350,1996,4643,5147.5,7308.34,未脱贫户
134,杨**户,5,汉族,初中,健康,普通劳动力,3294,3350,2902.55,3855.55,6170.44,8965.78,脱贫户
135,张**户,7,拉祜族,小学,健康,普通劳动力,3294,3350,2935.82,6316.21,6338.21,9216.34,脱贫户
136,鲁**户,2,彝族,小学,健康,普通劳动力,3294,2789.71,3789.63,5290.49,8271.63,11568.82,脱贫户
137,鲁**户,2,彝族,文盲或半文盲,健康,普通劳动力,3638.32,3108.79,2509.73,6063.09,7031,5457.53,未脱贫户
138,鲁**户,4,彝族,小学,健康,普通劳动力,2069.26,2547.91,2057.09,4840.43,3731.2,4336.62,脱贫户
139,鲁**户,4,彝族,小学,健康,普通劳动力,2457.24,1646.25,1915.17,4026.82,5852.08,4385.31,脱贫户
140,鲁**户,7,彝族,小学,健康,无劳动力,3086.16,3881.23,2226.86,4982.17,5683.35,7444.04,脱贫户
141,鲁**户,1,彝族,文盲或半文盲,健康,普通劳动力,3294,3350,3280,9336,9700,14162,脱贫户
142,李**户,4,汉族,小学,健康,技能劳动力,4490.16,3484.5,2357.22,6056.54,7896.45,11621.55,脱贫户
143,鲁**户,4,汉族,小学,健康,技能劳动力,3140.19,2996.07,2098.59,4818.19,6486.7,6760.38,脱贫户
144,钟**户,4,彝族,小学,健康,无劳动力,2524.94,3553.61,2505.2,4085.09,8091.34,10448.32,脱贫户
145,钟**户,4,彝族,小学,健康,普通劳动力,5068.35,5087.32,5584.98,4467.01,8038.29,11856.28,脱贫户
146,钟**户,4,彝族,小学,"残疾,患有大病",无劳动力,4461.31,4739.79,5063.54,3971.78,5043.56,4756.18,监测户
147,李**户,3,彝族,小学,健康,普通劳动力,3283.34,3294.43,2156.61,6299.91,6854.34,10065.51,脱贫户
148,黄**户,2,汉族,文盲或半文盲,健康,无劳动力,3032.21,2101.16,2437.07,2687.56,8258.85,10237.31,脱贫户
149,李**户,2,汉族,文盲或半文盲,健康,普通劳动力,3294.32,3668.68,3828.05,4949.78,15872.22,23364.53,脱贫户
150,钟**户,2,彝族,小学,健康,技能劳动力,3294,2692.33,4117.95,5300.06,7969.23,11697.28,脱贫户
151,李**户,2,汉族,初中,健康,普通劳动力,3548.35,5210.52,3634.56,5289.34,9225.31,13439.9,脱贫户
152,黄**户,4,汉族,小学,健康,普通劳动力,2299.4,2811.21,1619.89,4374.64,6530.63,7306.44,脱贫户
153,黄**户,3,汉族,小学,健康,普通劳动力,3222,3379,1860.04,5310.08,11124.48,13343.71,脱贫户
154,李**户,5,汉族,小学,健康,普通劳动力,2523.01,3892.66,4449.77,4900.4,5719.39,6492.62,监测户
155,李**户,5,汉族,小学,残疾,无劳动力,4957.94,3752.44,2600.85,4416.72,6878.98,8276.89,脱贫户
156,黄**户,4,汉族,小学,健康,普通劳动力,2491.88,2787.75,1636.64,4848.63,7247.5,7526.07,脱贫户
157,钟**户,4,彝族,初中,健康,普通劳动力,3294,2711.5,2535.57,5154.4,8397.4,8820.14,脱贫户
158,李**户,2,汉族,小学,健康,普通劳动力,3294,1624.39,2698.75,4233.66,12250.67,16884.22,脱贫户
159,钟**户,3,彝族,初中,健康,技能劳动力,3294,2775,2460.43,4407.83,7757.06,8353.79,脱贫户
160,钟**户,4,彝族,文盲或半文盲,残疾,无劳动力,3294,2556.02,3219.36,3749.66,8230.85,8520.4,脱贫户
161,李**户,2,彝族,小学,健康,普通劳动力,3294,3350,2669.56,2404.22,6396.77,8221.86,脱贫户
162,黄**户,5,汉族,小学,健康,普通劳动力,2431.52,3673.12,3312.51,5301.92,5841.36,8629.61,脱贫户
163,潘**户,4,汉族,小学,健康,普通劳动力,1848.93,2380.36,1725.73,4276.19,7876.43,11633.86,脱贫户
164,张**户,3,拉祜族,小学,健康,无劳动力,3415.3,3041.77,3153.75,5873.5,10563.06,15473.69,脱贫户
165,李**户,5,拉祜族,小学,健康,普通劳动力,2971.47,2715.84,1273.13,5148.44,6008.6,4681.41,监测户
166,钟**户,2,彝族,初中,健康,普通劳动力,5504.82,3134.15,1388.94,5423.31,12260.68,10903.1,脱贫户
167,李**户,3,彝族,小学,健康,普通劳动力,3708.33,4223.44,4745.92,4427.42,10834.33,11938.52,脱贫户
168,李**户,5,汉族,小学,健康,普通劳动力,3294,2511.41,1726.06,4781.47,7679.8,10298.39,脱贫户
169,杨**户,3,汉族,小学,健康,无劳动力,3294,2785.73,3531.02,5615.93,8024.76,10988.94,脱贫户
170,李**户,2,彝族,小学,残疾,弱劳动力或半劳动力,3294,2674.45,4541.1,3782.58,12977.39,14017.34,脱贫户
171,李**户,4,汉族,小学,健康,普通劳动力,4678.43,2268.93,3991.24,6241.59,6360.61,9320.66,脱贫户
172,李**户,5,彝族,小学,健康,普通劳动力,2187.52,3425.9,3617.45,3984.91,5822.79,7416.4,脱贫户
173,李**户,4,汉族,小学,健康,普通劳动力,2082.05,2878.97,3620.53,5409.48,9862.91,11111.91,脱贫户
174,钟**户,5,彝族,小学,健康,普通劳动力,3669.12,3566.07,2589.29,2696.66,4695.76,6774.94,脱贫户
175,李**户,3,汉族,小学,健康,无劳动力,1780.18,3216.45,3700.97,3804.33,9998.36,7197.05,监测户
176,杨**户,3,汉族,小学,健康,普通劳动力,1309.22,2029.97,2705.59,3680.33,7331.17,10673.85,脱贫户
177,罗**户,4,彝族,小学,健康,弱劳动力或半劳动力,7427.72,3140.99,2627.33,5218.56,6948.1,10186.87,脱贫户
178,杨**户,3,汉族,小学,残疾,弱劳动力或半劳动力,2878.64,3881.85,2265.66,3804.78,5997.73,7109.49,脱贫户
179,李**户,4,汉族,初中,健康,普通劳动力,3294,3350,3280,5222,6635.8,4687.72,脱贫户
180,李**户,4,布朗族,文盲或半文盲,健康,无劳动力,3128.08,3340.06,3672.47,5652.13,13932.78,20306.44,脱贫户
181,张**户,2,拉祜族,文盲或半文盲,健康,无劳动力,5511.39,3958.8,3978.92,7990.13,15482.66,12827.06,脱贫户
182,李**户,3,布朗族,初中,健康,弱劳动力或半劳动力,2609.04,1827.23,2216.55,8964.04,13527.91,19587.02,脱贫户
183,李**户,5,布朗族,小学,健康,普通劳动力,1707.74,1558.98,1727.48,6951.03,8192.87,11589.72,脱贫户
184,黄**户,4,汉族,小学,健康,弱劳动力或半劳动力,3136.46,3453.36,3215.28,6688.8,8180.99,9159.27,监测户
185,杨**户,3,彝族,文盲或半文盲,残疾,弱劳动力或半劳动力,3937.44,3062.45,4249.76,7078.05,11198.68,15647.36,脱贫户
186,黄**户,3,汉族,初中,长期慢性病,弱劳动力或半劳动力,1605,1250,2665.65,5367.67,9405.46,13715.71,脱贫户
187,李**户,6,布朗族,小学,健康,普通劳动力,3294,2531.44,2938,6449.11,8609.18,12560.09,脱贫户
188,杨**户,4,彝族,初中,健康,普通劳动力,3294,2354.37,1383.45,3644.6,7272.05,10552.45,脱贫户
189,张**户,1,拉祜族,小学,健康,技能劳动力,3294,2039,5571.35,4426.72,16420.32,17025.14,脱贫户
190,胡**户,4,拉祜族,小学,健康,普通劳动力,3294,2759.4,2025.8,4591.66,5891.08,6956.46,脱贫户
191,胡**户,3,拉祜族,小学,健康,普通劳动力,3294,2782.36,3701.69,5937.35,3761.38,4872.51,脱贫户
192,张**户,2,拉祜族,小学,健康,普通劳动力,3294,2493.89,2010.06,4971.99,8747.42,6279.5,脱贫户
193,李**户,1,汉族,文盲或半文盲,长期慢性病,无劳动力,3294,3029.49,4452.55,7346.76,7869.85,11519.66,脱贫户
194,张**户,5,拉祜族,小学,健康,普通劳动力,3294,2647.06,2906.61,4436.44,6113.05,4473.57,脱贫户
195,张**户,2,拉祜族,初中,健康,普通劳动力,3294,2133.34,2265.32,4247.45,9688.7,13550.23,脱贫户
196,张**户,1,拉祜族,小学,残疾,丧失劳动力,3294,2762.19,4381.82,4730.33,9207.19,7005.87,脱贫户
197,李**户,1,拉祜族,文盲或半文盲,健康,无劳动力,8652,10924,3137,3775,6871,8965,脱贫户
198,胡**户,2,拉祜族,小学,健康,技能劳动力,3004,3781.5,3607.18,6109.86,4949.72,4892.92,未脱贫户
199,张**户,3,拉祜族,小学,健康,普通劳动力,3342.24,3667.72,3479.49,2630.64,4525.14,4904.53,未脱贫户
200,李**户,4,拉祜族,小学,健康,技能劳动力,3103.32,3772.15,3066.55,4353.3,4199.75,5589.52,监测户
201,张**户,3,拉祜族,文盲或半文盲,健康,普通劳动力,3532.38,3946.38,2893.66,5221.09,8734.07,12594.45,脱贫户
202,张**户,5,拉祜族,文盲或半文盲,健康,技能劳动力,3490.61,3552.44,2400.05,2437.49,6147.76,7609.95,脱贫户
203,钟**户,4,彝族,小学,健康,普通劳动力,4114.81,4672.48,2969.29,3255.41,6944.85,7128.79,脱贫户
204,李**户,4,拉祜族,小学,健康,技能劳动力,4523.18,5105.73,2512.17,4897.14,7070.13,9782.21,脱贫户
205,张**户,2,拉祜族,小学,健康,普通劳动力,5518.06,6805.76,1482.8,4556.27,4725.69,5500.23,脱贫户
206,张**户,4,拉祜族,小学,健康,普通劳动力,3560.91,3758.44,3468,4957.16,5010.57,7224.78,脱贫户
207,胡**户,4,拉祜族,小学,健康,技能劳动力,3234.02,3362.65,3207.26,4608.56,8060.91,5930.65,脱贫户
208,李**户,7,拉祜族,文盲或半文盲,健康,技能劳动力,4550.84,4715.47,2672.84,6266.77,7135.22,5696.57,脱贫户
209,胡**户,2,拉祜族,文盲或半文盲,健康,技能劳动力,3621.41,3496.51,2113.32,4122.23,3990.98,5892.77,脱贫户
210,张**户,4,拉祜族,小学,健康,技能劳动力,3494.95,3860.76,3647.7,5060.87,8119.91,6792.15,脱贫户
211,胡**户,3,拉祜族,文盲或半文盲,健康,无劳动力,3621.35,4138.2,4048.67,4929.74,10387.89,7542.25,脱贫户
212,李**户,1,拉祜族,小学,健康,普通劳动力,3294,3350,2984.34,5042.9,12410.88,17704.57,脱贫户
213,张**户,1,拉祜族,文盲或半文盲,健康,无劳动力,3294,3350,3280,5222,4801.83,6767.23,脱贫户
214,张**户,3,拉祜族,文盲或半文盲,健康,无劳动力,3457.51,3632.27,3781.83,4647.43,9318.13,7556.58,脱贫户
215,张**户,2,拉祜族,文盲或半文盲,健康,无劳动力,3324.22,3896.14,4555.81,4621.32,6723.19,8156.48,脱贫户
216,张**户,4,拉祜族,小学,健康,技能劳动力,3191.99,3425.58,2039.88,2494.58,4033.05,5319.79,未脱贫户
217,张**户,2,拉祜族,文盲或半文盲,健康,普通劳动力,5599.32,7093.49,3210.53,5457.8,12140,9099.5,脱贫户
218,李**户,2,汉族,文盲或半文盲,健康,普通劳动力,3294,3350,2645.59,4460.5,10562.9,15332.97,脱贫户
219,李**户,5,拉祜族,小学,健康,普通劳动力,2933.8,3247.8,2168.97,4517.92,6751.25,8497.78,未脱贫户
1,测试1,5,拉祜族,小学,健康,普通劳动力,2024.4,3283.5,2168.97,4517.92,2751.25,9482.3,未脱贫户
2,测试2,4,汉族,小学,健康,弱劳动力或半劳动力,3136.46,3453.36,3215.28,6688.8,8180.99,9159.27,监测户
3,测试3,3,彝族,文盲或半文盲,残疾,弱劳动力或半劳动力,3937.44,3062.45,4249.76,7078.05,11198.68,15647.36,脱贫户
4,测试4,3,拉祜族,初中,患有大病,无劳动力,3543.57,4100.83,5117.1,8924.44,7985.35,11623.47,监测户
5,测试5,1,布朗族,小学,患有大病,无劳动力,3416.61,3431.16,4674.96,8252.03,7117.13,10401.87,未脱贫户
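
Loading a file in this format and splitting it by row position can be sketched as follows. This is an illustration under stated assumptions rather than the program's own loader: it reads three rows copied from the data above out of an in-memory string instead of a file, and `n_train` is set to 2 only because the sample is tiny (the full file uses 219).

```python
import io

import pandas as pd

# Header plus three rows copied from the dataset; the second row has a quoted
# field that itself contains a comma.
sample = io.StringIO(
    "序号,户名,人数,民族,文化程度,健康状况,劳动技能,2014年,2015年,2016年,2017年,2018年,2019年,脱贫评估\n"
    "12,张**户,2,拉祜族,小学,残疾,普通劳动力,3824.3,2324.25,5874.47,4965.22,9408.42,8678.12,脱贫户\n"
    '13,姚**户,3,拉祜族,小学,"患有大病,残疾",弱劳动力或半劳动力,2785.52,3145.4,3564.59,4534.5,6401.2,9351.65,脱贫户\n'
    "1,测试1,5,拉祜族,小学,健康,普通劳动力,2024.4,3283.5,2168.97,4517.92,2751.25,9482.3,未脱贫户\n"
)

df = pd.read_csv(sample)

n_train = 2  # assumption for this sample; 219 for the full file
train, test = df.iloc[:n_train], df.iloc[n_train:]

# The last column is the class label, everything before it is a feature.
X_train, y_train = train.iloc[:, :-1], train.iloc[:, -1]
X_test, y_test = test.iloc[:, :-1], test.iloc[:, -1]

print(df.shape)
print(X_train.iloc[1]["健康状况"])
```

The quoted health-status value stays a single field, so rows such as number 13 do not shift the income columns.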

   2. Naive Bayes Python Source Code

# Naive Bayes program for small-data poverty-alleviation classification
import pandas as pd
import numpy as np
# To install packages quickly from the Douban mirror: pip install <package> -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
class NaiveBayes(object):
     def getTrainSet(self):
        dataSet = pd.read_csv(r"C:\Users\Administrator\Desktop\小数据课题申报\扶贫小数据集\训练测试集UTF8.csv")
        dataSetNP = np.array(dataSet)  # convert the DataFrame to a NumPy array
        trainData = dataSetNP[:219,:dataSetNP.shape[1]-1]   # training features (first 219 data rows)
        labels = dataSetNP[:219,dataSetNP.shape[1]-1]  # class label of each training row
        testData = dataSetNP[219:,:dataSetNP.shape[1]-1] # test features (remaining rows)
        testlabels =dataSetNP[219:,dataSetNP.shape[1]-1]  # class label of each test row
        return trainData, labels, testData, testlabels

     def classify(self, trainData, labels, features):
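        # Compute the prior probability of each label
        a=[0,0,1,1,1,1,1,1,1,1,1,1,1]   # exponents a_j, one per feature column; the two zeros mask the index and name columns
        labels = list(labels)    # convert to a list; labels is the last column of the training set
        #print(labels)
        P_y = {}       # prior probability of each label
        for label in labels:  # fraction of all training samples that fall in each class
            P_y[label] = labels.count(label)/float(len(labels))   # p = count(y) / count(Y)
              # Four classes: 脱贫户, 未脱贫户, 监测户, 返贫户; P_y[label] is one class's count divided by the total of all four
            #print(label,labels.count(label),len(labels))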

        # Compute the joint probability of each label and each feature value
        P_xy = {}
        #print(P_y.keys())
        for y in P_y.keys():  # P_y.keys() are the class labels
            y_index = [i for i, label in enumerate(labels) if label == y]
                       # enumerate(labels) pairs each label with its row index; y_index holds the training rows of class y
            # range(len(features)) walks the feature columns: every column except the last, 13 in total including index and name
            #print(y,":",y_index)
            for j in range(len(features)):      # features is one test row; find the training rows whose column j matches features[j]
                 x_index =[]
                 for i, feature in enumerate(trainData[:,j]):
                     if isinstance(feature,float):
                         if features[j]>feature-1000 and features[j]<feature+1000:  # income values match within a window of +/-1000
                             x_index.append(i)
                     else:
                         if feature == features[j]:
                             x_index.append(i)
                 #print(features)
                 #x_index = [i for i, feature in enumerate(trainData[:,j]) if feature == features[j]]
                      # x_index records the training rows whose feature matches the test row
                 xy_count = len(set(x_index) & set(y_index))   # rows that both match the feature and belong to class y
                 #print(set(x_index),set(y_index),set(x_index) & set(y_index))
                 pkey = str(features[j]) + '*' + str(y)
                 P_xy[pkey] = xy_count / float(len(labels)) # len(labels) is the number of training samples
                   # P_xy[pkey] is the share of training samples in the intersection of x and y
                 #print(pkey,xy_count,float(len(labels)),":",P_xy[pkey])
                 #print("Test feature x: [",features[j],"]\ntraining rows where it occurs: \nx=",x_index,"\nclass y: [",y
                 #      ,"], training rows of that class:\ny=",y_index,"\nintersection of x and y:",set(x_index) & set(y_index),"\n\n")

        # Compute the conditional probabilities
        P = {}
        for y in P_y.keys():
            for x in features:
                pkey = str(x) + '|' + str(y)
                P[pkey] = P_xy[str(x)+'*'+str(y)] / float(P_y[y])    # P[x|y] = P[x,y] / P[y]
                #print(pkey,str(x)+'*'+str(y),P_xy[str(x)+'*'+str(y)], float(P_y[y]), ":",P[pkey])
                  # Example: P[拉祜族|监测户] = P[拉祜族*监测户]/P[监测户] = 0.04128440366972477/0.0779816513761468 = 0.5294117647058824
                  # Exact matching fails for P[2014 income|监测户]: 2933.8 occurs only once as a 2014 income in the training set, so incomes are grouped by interval, e.g. 2500 < 2933.8 < 3500

         # Score the test row against every class
        F = {}   # score of the test row for each class
        for y in P_y:
             F[y] = P_y[y]
             for x, i in zip(features,a):
                 F[y] = F[y]*pow(P[str(x)+'|'+str(y)],i)    # P[y|X] = P[X|y]*P[y]/P[X]; the denominator is the same for every class, so only the numerators are compared
                 #print(F[y])
                 # so F = P[y] * P[x1|y]^a1 * P[x2|y]^a2 * P[x3|y]^a3 * ...

        features_label = max(F, key=F.get)  # class with the highest score
        return features_label

     def test(self,trainData, labels, testData, testlabels):
         correct = 0
         for i in range(len(testData)):
             features = testData[i]
             # Predict the class of this test row (classified once, via self rather than the global nb)
             result = self.classify(trainData, labels, features)
             print('Test',i+1,'is', result)
             if result == testlabels[i]:
                 correct += 1
         print("\nAccuracy:", correct / float(len(testlabels)))

if __name__ == '__main__':
    nb = NaiveBayes()
    # Load the training and test data
    trainData, labels, testData, testlabels = nb.getTrainSet()

    nb.test(trainData, labels, testData, testlabels)
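
The matching rule at the core of `classify` (exact equality for categorical values, a window of ±1000 for float income values) can be isolated in a few lines. The sketch below is a simplified restatement for illustration, not a replacement for the program: the helper names `matches` and `conditional_probability` and the `tol` parameter are hypothetical, and the sample values are the 2014 income, ethnicity, and label of training rows 1 to 4.

```python
def matches(train_value, test_value, tol=1000):
    """Floats match inside an open window of +/- tol; other values must be equal."""
    if isinstance(train_value, float):
        return test_value - tol < train_value < test_value + tol
    return train_value == test_value


def conditional_probability(column, labels, test_value, target, tol=1000):
    """Estimate P(x | y = target) as count(x matches and y) / count(y)."""
    in_class = [v for v, y in zip(column, labels) if y == target]
    hits = sum(matches(v, test_value, tol) for v in in_class)
    return hits / len(in_class)


# Training rows 1-4 of the dataset: 2014 income, ethnicity, and label.
income_2014 = [3294.0, 3543.57, 3416.61, 3294.0]
ethnicity = ["汉族", "拉祜族", "布朗族", "拉祜族"]
labels = ["监测户", "监测户", "未脱贫户", "脱贫户"]

# Both 监测户 incomes lie within 1000 of 2933.8.
p_income = conditional_probability(income_2014, labels, 2933.8, "监测户")
# With a window of 500 only 3294.0 still matches.
p_narrow = conditional_probability(income_2014, labels, 2933.8, "监测户", tol=500)
# Categorical values are compared exactly.
p_ethnic = conditional_probability(ethnicity, labels, "拉祜族", "监测户")

print(p_income, p_narrow, p_ethnic)
```

Dividing the match count by the class count gives the same value as the program's P_xy / P_y, since both are divided by the same number of training samples. The window width is a tuning choice: a wider window makes more incomes look alike, while a narrower one moves toward exact matching.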

3. Data Analysis

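      When a_j is not applied, that is a=[1,1,1,1,1,1,1,1,1,1,1,1,1], the program produces the following incorrect classification (note: this setting is the list a in classify, on line 17 of the code):

      [Screenshot: program output with every a_j = 1]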

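      When a_j is applied to remove the influence of the index and name features on the classification result, that is a=[0,0,1,1,1,1,1,1,1,1,1,1,1], the program produces the following classification:

      [Screenshot: program output with the index and name columns masked]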

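       On this dataset the program reaches an accuracy of 0.8 on the 5 test samples, that is 4 of 5 classified correctly, which is a fairly good result. The data shows that test 2 is a monitored household (监测户) but is classified as out of poverty (脱贫户). The cause is that the feature values of monitored households and out-of-poverty households are not clearly separated.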
