分类和聚类算法很多,但是对数据进行精准预测的算法不是很多,这里介绍下最常用的线性回归例子
spark官网上给出的例子不够全面,网上有些例子也不够具体,自己整理了一份
我的开发环境是ubuntu+IDEA+spark+scala
训练数据LR_data如下
3615, 3624, 2.1, 69.05, 15.1, 41.3, 20, 50708
365, 6315, 1.5, 69.31, 11.3, 66.7, 152, 566432
2212, 4530, 1.8, 70.55, 7.8, 58.1, 15, 113417
2110, 3378, 1.9, 70.66, 10.1, 39.9, 65, 51945
21198, 5114, 1.1, 71.71, 10.3, 62.6, 20, 156361
2541, 4884, 0.7, 72.06, 6.8, 63.9, 166, 103766
3100, 5348, 1.1, 72.48, 3.1, 56, 139, 4862
579, 4809, 0.9, 70.06, 6.2, 54.6, 103, 1982
8277, 4815, 1.3, 70.66, 10.7, 52.6, 11, 54090
4931, 4091, 2, 68.54, 13.9, 40.6, 60, 58073
868, 4963, 1.9, 73.6, 6.2, 61.9, 0, 6425
813, 4119, 0.6, 71.87, 5.3, 59.5, 126, 82677
11197, 5107, 0.9, 70.14, 10.3, 52.6, 127, 55748
5313, 4458, 0.7, 70.88, 7.1, 52.9, 122, 36097
2861, 4628, 0.5, 72.56, 2.3, 59, 140, 55941
2280, 4669, 0.6, 72.58, 4.5, 59.9, 114, 81787
3387, 3712, 1.6, 70.1, 10.6, 38.5, 95, 39650
3806, 3545, 2.8, 68.76, 13.2, 42.2, 12, 44930
1058, 3694, 0.7, 70.39, 2.7, 54.7, 161, 30920
4122, 5299, 0.9, 70.22, 8.5, 52.3, 101, 9891
5814, 4755, 1.1, 71.83, 3.3, 58.5, 103, 7826
9111, 4751, 0.9, 70.63, 11.1, 52.8, 125, 56817
3921, 4675, 0.6, 72.96, 2.3, 57.6, 160, 79289
2341, 3098, 2.4, 68.09, 12.5, 41, 50, 47296
4767, 4254, 0.8, 70.69, 9.3, 48.8, 108, 68995
746, 4347, 0.6, 70.56, 5, 59.2, 155, 145587
1544, 4508, 0.6, 72.6, 2.9, 59.3, 139, 76483
590, 5149, 0.5, 69.03, 11.5, 65.2, 188, 109889
812, 4281, 0.7, 71.23, 3.3, 57.6, 174, 9027
7333, 5237, 1.1, 70.93, 5.2, 52.5, 115, 7521
1144, 3601, 2.2, 70.32, 9.7, 55.2, 120, 121412
18076, 4903, 1.4, 70.55, 10.9, 52.7, 82, 47831
5441, 3875, 1.8, 69.21, 11.1, 38.5, 80, 48798
637, 5087, 0.8, 72.78, 1.4, 50.3, 186, 69273
10735,