K-Means visualization: https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
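Basic concepts:
- The number of clusters is not learned: it must be given in advance as K
- Centroid: the mean of a cluster's points, i.e. each dimension (coordinate) is averaged separately
- Distance measure: Euclidean distance and cosine similarity are common (standardize the data first)
- Optimization objective: $\min \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(c_i, x)^2$, where $C_i$ is the i-th cluster and $c_i$ its centroid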
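A small NumPy sketch of the concepts above. The toy points and the hand-picked split into K = 2 clusters are made up for illustration. The centroid is computed as the per-dimension mean, and the objective uses Euclidean distance on the raw data. Cosine similarity is shown on z-score standardized data; the source does not say which standardization it means, so z-scores are an assumption.

```python
import numpy as np

# Toy data: 5 points in 2-D with a hand-picked split into K = 2 clusters (illustrative only)
X = np.array([[1.0, 2.0], [2.0, 2.0], [3.0, 2.0], [8.0, 9.0], [10.0, 9.0]])
labels = np.array([0, 0, 0, 1, 1])
K = 2

# Centroid: average each dimension over the points of the cluster
centroids = np.array([X[labels == i].mean(axis=0) for i in range(K)])


def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))


def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


# z-score standardization: every column ends up with mean 0 and standard deviation 1
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
cos_0_3 = cosine_similarity(Xs[0], Xs[3])  # points on opposite sides of the mean -> negative

# Objective: sum over clusters of the squared distances from each point to its centroid
sse = sum(euclidean(x, centroids[i]) ** 2 for i in range(K) for x in X[labels == i])

print(centroids)
print(cos_0_3, sse)
```

In this split, every point that does not sit exactly on its centroid lies 1 unit away from it. Four points are off-centre, so the objective is 4.0.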
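Algorithm:
- Randomly choose k initial centers
- Go through all data points and assign each one to its nearest center
- Compute the mean of each cluster and use it as the new center (centroid)
- Repeat the assignment and update steps until the k centers stop changing (convergence) or a maximum number of iterations is reached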
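The steps above can be sketched as a short vectorized loop. This is a minimal NumPy sketch, not the implementation used later in this post, and it rests on a few assumptions:

- "Randomly choose k centers" is read as picking k distinct data points.
- The loop stops when the centroids stop moving or after `max_iter` passes.
- A centroid stays in place if its cluster becomes empty.

`max_iter`, `seed` and the toy data are illustrative choices.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for it in range(1, max_iter + 1):
        # Step 2: squared distance from every point to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: move each centroid to the mean of its cluster (keep it if the cluster is empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        # Step 4: stop once the centroids no longer change
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment consistent with the returned centroids
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return centroids, labels, it


# Usage on two synthetic blobs centred at (0, 0) and (5, 5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centroids, labels, n_iter = kmeans(X, 2)
print(centroids, n_iter)
```

The `(n, k)` distance matrix is where the per-iteration cost of n·k distance computations comes from.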
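Time complexity: O(I*n*k*m)
Space complexity: O(n*m)
Here m is the number of fields (dimensions) per data point, n the number of data points, k the number of clusters and I the number of iterations. I, k and m can usually be treated as constants, so both the time and space complexity reduce to O(n), i.e. linear in the data size.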
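A worked example with illustrative numbers (assumed, not measured on city.txt): n = 10,000 points, m = 2 fields, k = 5 clusters and I = 20 iterations give

$$I \cdot n \cdot k \cdot m = 20 \times 10\,000 \times 5 \times 2 = 2\,000\,000$$

per-coordinate distance operations for the whole run. With I, k and m fixed, doubling n to 20,000 doubles the count to 4,000,000. This is the linear growth that O(n) expresses.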
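Advantages:
- Simple and fast; works well on ordinary datasets
Disadvantages:
- K is hard to choose
- Sensitive to the choice of the K initial centroids; easily gets stuck in a local minimum
- Cost grows linearly with the number of samples, since every iteration scans all of them
- Struggles to find clusters of arbitrary (non-convex) shape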
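The sensitivity to the initial centroids is easy to reproduce. This NumPy sketch uses 1-D data and two starting positions that are hand-picked for illustration. It runs the same plain K-Means iterations from both starts:

- The first start reaches the natural three-group split, with an objective of 1.5.
- The second stops in a local minimum with an objective of 101.

```python
import numpy as np


def lloyd(X, centroids, max_iter=100):
    """Plain K-Means iterations from the given starting centroids; returns (centroids, SSE)."""
    centroids = np.asarray(centroids, dtype=float)
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(len(centroids))])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.min(axis=1).sum()


# Three well-separated pairs of points on a line (one feature per point)
X = np.array([[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]])

good_c, good_sse = lloyd(X, [[0.5], [10.5], [20.5]])  # one start per pair
bad_c, bad_sse = lloyd(X, [[0.0], [1.0], [15.0]])     # two starts inside the first pair
print(good_sse, bad_sse)
```

Both end states are fixed points of the iteration, so the algorithm alone cannot tell that the second one is worse. The objective also cannot choose K on its own: its optimal value never increases as K grows, and it reaches 0 when K equals the number of points.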
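Test dataset, city.txt. Each record has the form name,longitude,latitude; the records are listed here separated by spaces: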
上海上海,121.48,31.22 上海嘉定,121.24,31.4 上海宝山,121.48,31.41 上海川沙,121.7,31.19 上海南汇,121.76,31.05 上海奉贤,121.46,30.92 上海松江,121.24,31 上海金山,121.16,30.89 上海青浦,121.1,31.15 上海崇明,121.4,31.73 云南昆明,102.73,25.04 云南富民,102.48,25.21 云南晋宁,102.58,24.68 云南呈贡,102.79,24.9 云南安宁,102.44,24.95 云南昭通,103.7,29.32 云南永善,103.63,28.22 云南大关,103.91,27.74 云南彝良,104.06,27.61 云南鲁甸,103.54,27.21 云南绥江,103.97,28.58 云南盐津,104.28,28.08 云南威信,105.05,27.85 云南镇雄,104.86,27.42 云南巧家,102.92,26.9 云南永富,104.38,28.62 云南曲靖,103.79,25.51 云南宣威,104.09,26.24 云南富源,104.24,25.67 云南师宗,103.97,24.85 云南嵩明,103.03,25.35 云南会泽,103.27,26.41 云南沽益,103.82,25.62 云南罗平,104.3,24.88 云南陆良,104.64,25.04 云南宜良,103.12,24.9 云南马龙,103.61,25.41 云南路南,103.24,24.77 云南寻甸,103.25,25.56 云南玉溪,102.52,24.35 云南华宁,102.93,24.26 云南通海,102.75,24.09 云南澄江,102.91,24.68 云南江川,102.73,24.27 云南易门,102.15,24.67 云南元江,102,23.59 云南新平,101.98,24.06 云南峨山,102.38,24.16 云南思茅,101,22.79 云南普洱,101.03,23.07 云南镇沅,100.88,23.9 云南景东,100.82,24.42 云南景谷,100.71,23.5 云南黑江,101.71,23.4 云南澜沦,99.97,22.55 云南西盟,99.47,22.73 云南江城,101.88,22.58 云南孟连,99.55,22.32 云南临沦,100.09,23.88 云南云县,100.12,24.44 云南镇康,99.02,23.92 云南永德,99.25,24.03 云南凤庆,99.92,24.58 云南双江,99.85,23.45 云南沧源,99.24,23.15 云南耿马,99.41,23.56 云南保由,99.18,25.12 云南施甸,99.15,24.69 云南腾冲,98.51,25.01 云南昌宁,99.61,24.82 云南龙陵,98.7,24.58 云南丽江,100.25,26.86 云南华坪,101.24,26.63 云南永胜,100.76,26.71 云南宁蒗,100.82,27.29 云南文山,104.24,23.37 云南广南,105.09,24.05 云南西畴,104.68,23.42 云南麻栗坡,104.71,23.12 云南马关,104.4,23.01 云南丘北,104.19,24.03 云南砚山,104.35,23.62 云南富宁,105.6,23.62 云南个旧,102.43,23.35 云南弥勒,103.43,24.41 云南蒙自,103.41,23.36 云南元阳,102.81,23.17 云南红河,102.42,23.35 云南石屏,102.48,23.73 云南泸西,103.76,24.52 云南金平,103.24,22.77 云南开远,103.23,23.7 云南绿春,102.42,23.01 云南建水,102.79,23.64 云南河口,103.98,22.52 云南屏边,103.67,22.68 云南景淇,100.79,22 云南勐海,100.5,21.95 云南勐腊,101.56,21.48 云南楚雄,101.54,25.01 云南元谋,101.85,25.7 云南武定,102.36,25.55 云南禄丰,102.08,25.15 云南南华,101.26,25.21 云南大姚,101.34,25.73 云南永仁,101.7,26.07 云南禄劝,102.45,25.58 云南牟定,101.58,25.32 云南双柏,101.67,24.68 云南姚安,101.24,25.4 云南下关,100.24,25.45 云南剑川,99.88,26.53 云南洱源,99.94,26.1 
云南宾川,100.55,25.82 云南弥渡,100.52,25.34 云南永平,99.52,25.45 云南鹤庆,100.18,26.55 云南大理,100.19,25.69 云南漾濞,99.98,25.68 云南云龙,99.39,25.9 云南祥云,100.56,25.48 云南巍山,100.33,25.23 云南南涧,100.51,25.04 云南潞西,98.6,24.41 云南陇川,97.96,24.33 云南盈江,97.93,24.69 云南畹町,98.08,24.08 云南瑞丽,97.83,24 云南梁河,98.3,24.78 云南泸水,98.82,25.97 云南碧江,98.95,26.55 云南福贡,98.92,26.89 云南兰坪,99.29,26.49 云南贡山,98.65,27.73 云南中甸,99.72,27.78 云南德钦,98.93,28.49 云南维西,99.27,27.15 北京北京,116.46,39.92 北京平谷,117.1,40.13 北京密云,116.85,40.37 北京顺义,116.65,40.13 北京通县,116.67,39.92 北京怀柔,116.62,40.32 北京大兴,116.33,39.73 北京房山,115.98,39.72 吉林长春,125.35,43.88 吉林吉林,126.57,43.87 吉林农安,125.15,44.45 吉林德惠,125.68,44.52 吉林榆树,126.55,44.83 吉林九台,126.83,44.15 吉林双阳,125.68,43.53 吉林永吉,126.57,43.87 吉林舒兰,126.97,44.4 吉林蛟河,127.33,43.75 吉林桦甸,126.72,42.97 吉林磐石,126.03,42.93 吉林延吉,129.52,42.93 吉林汪清,129.75,43.32 吉林珲春,130.35,42.85 吉林图们,129.83,42.98 吉林和龙,129,42.52 吉林安图,128.3,42.58 吉林敦化,128.18,43.35 吉林通化,125.92,41.49 吉林柳河,125.7,40.88 吉林海龙,125.65,42.53 吉林辉南,126.03,42.68 吉林靖宇,126.8,42.38 吉林浑江,126.4,41.97 吉林抚松,127.27,42.33 吉林集安,126.17,41.15 吉林长白,128.17,41.43 吉林四平,124.37,43.17 吉林梨树,124.33,43.32 吉林怀德,124.82,43.5 吉林伊通,125.32,43.33 吉林辽源,125.15,42.97 吉林东丰,125.5,42.68 吉林双辽,123.5,43.52 吉林白城,122.82,45.63 吉林大安,124.18,45.5 吉林扶余,124.82,45.2 吉林乾安,124.02,45 吉林长岭,123.97,44.3 吉林通榆,123.13,44.82 吉林洮安,122.75,45.35 四川成都,104.06,30.67 四川金堂,104.32,30.88 四川双流,104.94,30.57 四川蒲江,103.29,30.2 四川郫县,103.86,30.8 四川新都,104.13,30.82 四川来易,102.15,26.9 四川盐边,101.56,26.9 四川温江,103.81,30.97 四川灌县,103.61,31.04 四川彭县,103.94,30.99 四川什邡,104.16,31.13 四川广汉,104.25,30.99 四川新津,103.78,30.42 四川邛崃,103.47,30.42 四川大邑,103.53,30.58 四川崇庆,103.69,30.63 四川绵阳,104.73,31.48 四川江油,104.7,31.8 四川青川,105.21,32.59 四川平武,104.52,32.42 四川光元,105.86,32.44 四川旺苍,106.33,32.25 四川剑阁,105.45,32.03 四川梓潼,105.16,31.64 四川三台,105.06,31.1 四川盐亭,105.35,31.23 四川射洪,105.31,30.9 四川遂宁,105.58,30.52 四川蓬溪,105.74,30.78 四川中江,104.68,31.06 四川德阳,104.37,31.13 四川绵竹,104.19,31.32 四川安县,104.41,31.64 四川北川,104.44,31.89 四川内江,105.04,29.59 四川乐至,105.02,30.3 四川安岳,105.3,30.12 四川威远,104.7,29.57 四川资中,104.85,29.81 
四川资阳,104.6,30.19 四川简阳,104.53,30.38 四川隆昌,105.25,29.64 四川宜宾,104.56,29.77 四川富顺,104.97,29.24 四川南溪,104.96,28.87 四川江安,105.06,28.71 四川纳溪,105.38,28.77 四川泸县,105.46,28.96 四川合江,105.78,28.79 四川泸州,105.39,28.91 四川古蔺,105.79,28.03 四川叙水,105.44,28.19 四川长宁,104.91,28.6 四川兴文,105.06,28.36 四川琪县,104.81,28.38 四川高县,104.52,28.4 四川筠连,104.53,28.16 四川屏由,104.15,28.68 四川乐由,103.73,29.59 四川夹江,103.59,29.75 四川洪雅,103.38,29.95 四川丹棱,103.53,30.04 四川青神,103.81,29.86 四川眉由,103.81,30.05 四川彭由,103.83,30.22 四川井研,104.06,29.67 四川仁寿,104.09,30 四川犍为,103.93,29.21 四川沐川,103.98,28.96 四川娥眉,103.5,29.62 四川马边,103.53,28.87 四川峨边,103.25,29.23 四川金口,103.13,29.24 四川涪陵,107.36,29.7 四川垫江,107.34,30.36 四川丰都,107.7,29.89 四川石柱,108.13,29.98 四川秀山,108.97,28.47 四川西阳,108.75,28.85 四川黔江,108.81,29.53 四川彭水,108.19,29.29 四川武隆,108.72,29.29 四川南川,107.13,29.15 四川万县,108.35,30.83 四川开县,108.39,31.23 四川城口,108.67,31.98 四川巫溪,109.6,31.42 四川巫山,109.86,31.1 四川奉节,109.52,31.06 四川云阳,108.89,30.99 四川忠县,108.03,30.33 四川梁平,107.78,30.66 四川南允,106.06,30.8 四川苍溪,105.96,31.75 四川阆中,105.97,31.75 四川仪陇,106.38,31.52 四川南部,106.03,31.34 四川西允,105.84,31.01 四川营山,106.57,31.07 四川蓬安,106.44,31.04 四川广安,106.61,30.48 四川岳池,106.43,30.55 四川武胜,106.3,30.38 四川华云,106.74,30.41 四川达县,107.49,31.23 四川万源,108.06,32.07 四川宜汉,107.71,31.39 四川开江,107.87,31.1 四川邻水,106.91,30.36 四川大竹,107.21,30.75 四川渠县,106.94,30.85 四川南江,106.83,32.36 四川巴中,106.73,31.86 四川平昌,107.11,31.59 四川通江,108.24,31.95 四川百沙,108.18,32 四川雅安,102.97,29.97 四川芦山,102.91,30.17 四川名山,103.06,30.09 四川荣经,102.81,29.79 四川汉源,102.66,29.4 四川石棉,102.38,29.21 四川天全,102.78,30.09 四川宝兴,102.84,30.36 四川马尔康,102.22,31.92 四川红原,102.55,31.79 四川阿坝,101.72,31.93 四川若尔盖,102.94,33.62 四川黑水,102.95,32.06 四川松潘,103.61,32.64 四川南坪,104.19,33.23 四川汶川,103.61,31.46 四川理县,103.16,31.42 四川小金,102.34,30.97 四川金川,102.03,31.48 四川壤塘,100.97,32.3 四川茂汶,103.89,31.67 四川康定,101.95,30.04 四川炉霍,100.65,31.38 四川甘孜,99.96,31.64 四川新龙,100.28,30.96 四川白玉,98.83,32.23 四川德格,98.57,31.81 四川石渠,98.06,33.01 四川色达,100.35,32.3 四川泸定,102.25,29.92 四川丹巴,101.87,30.85 四川九龙,101.53,29.01 四川雅江,101,30.03 四川道孚,101.14,30.99 四川理塘,100.28,30.03 
四川乡城,99.78,28.93 四川稻城,100.31,29.04 四川巴塘,99,30 四川得荣,99.25,28.71 四川西昌,102.29,27.92 四川昭觉,102.83,28.03 四川甘洛,102.74,28.96 四川雷波,103.62,28.21 四川宁南,102.76,27.07 四川会东,102.55,26.74 四川会理,102.21,26.67 四川德昌,102.15,27.4 四川美姑,103.14,28.33 四川金阳,103.22,27.73 四川布拖,102.8,27.7 四川普格,102.52,27.38 四川喜德,102.42,28.33 四川越西,102.49,28.66 四川盐源,101.51,27.42 四川冕宁,102.15,28.58 四川木里,101.25,27.9 天津天津,117.2,39.13 天津宁河,117.83,39.33 天津静海,116.92,38.93 天津蓟县,117.4,40.05 天津宝坻,117.3,39.75 天津武清,117.05,39.4 宁夏回族自治区银川,106.27,38.47 宁夏回族自治区永宁,106.24,38.28 宁夏回族自治区贺兰,106.35,38.55 宁夏回族自治区石嘴山,106.39,39.04 宁夏回族自治区平罗,106.54,38.91 宁夏回族自治区陶乐,106.69,38.82 宁夏回族自治区吴忠,106.21,37.99 宁夏回族自治区同心,105.94,36.97 宁夏回族自治区灵武,106.34,38.1 宁夏回族自治区中宁,105.66,37.48 宁夏回族自治区盐池,107.41,37.78 宁夏回族自治区中卫,105.18,37.51 宁夏回族自治区青铜峡,106.07,38.02 宁夏回族自治区固原,106.28,36.01 宁夏回族自治区西吉,105.7,35.97 宁夏回族自治区泾源,106.33,35.5 宁夏回族自治区海原,105.64,36.56 宁夏回族自治区隆德,106.11,35.63 安徽合肥,117.27,31.86 安徽长丰,117.16,32.47 安徽淮南,116.98,32.62 安徽凤台,116.71,32.68 安徽淮北,116.77,33.97 安徽濉溪,116.76,33.92 安徽芜湖,118.38,31.33 安徽铜陵,117.82,30.93 安徽蚌埠,117.34,32.93 安徽马鞍山,118.48,31.56 安徽安庆,117.03,30.52 安徽宿州,116.97,33.63 安徽宿县,116.97,33.63 安徽砀山,116.34,34.42 安徽萧县,116.93,34.19 安徽吴壁,117.55,33.55 安徽泗县,117.89,33.49 安徽五河,117.87,33.14 安徽固镇,117.32,33.33 安徽怀远,117.19,32.95 安徽滁州,118.31,32.33 安徽嘉山,117.98,32.78 安徽天长,119,32.68 安徽来安,118.44,32.44 安徽全椒,118.27,32.1 安徽定远,117.68,32.52 安徽凤阳,117.4,32.86 安徽巢湖,117.87,31.62 安徽巢县,117.87,31.62 安徽肥东,117.47,31.89 安徽含山,118.11,31.7 安徽和县,118.37,31.7 安徽无为,117.75,31.3 安徽卢江,117.29,31.23 安徽宣城,118.73,31.95 安徽当涂,118.49,31.55 安徽郎溪,119.17,31.14 安徽广德,119.41,30.89 安徽泾县,118.41,30.68 安徽南陵,118.32,30.91 安徽繁昌,118.21,31.07 安徽宁国,118.95,30.62 安徽青阳,117.84,30.64 安徽屯溪,118.31,29.72 安徽休宁,118.19,29.81 安徽旌得,118.53,30.28 安徽绩溪,118.57,30.07 安徽歙县,118.44,29.88 安徽祁门,117.7,29.86 安徽黟县,117.92,29.93 安徽太平,118.13,30.28 安徽石台,117.48,30.19 安徽桐城,116.94,31.04 安徽纵阳,117.21,30.69 安徽怀宁,116.63,30.41 安徽望江,116.69,30.12 安徽宿松,116.13,30.15 安徽太湖,116.27,30.42 安徽岳西,116.36,30.84 安徽潜山,116.53,30.62 安徽东至,116.99,30.08 安徽贵池,117.48,30.66 
安徽六安,116.49,31.73 安徽霍丘,116.27,32.32 安徽寿县,116.78,32.57 安徽肥西,117.15,31.7 安徽舒城,116.94,31.45 安徽霍山,116.32,31.38 安徽金寨,115.87,31.67 安徽阜阳,115.81,32.89 安徽毫县,116.76,33.86 安徽涡阳,116.21,33.49 安徽蒙城,116.55,33.25 安徽利辛,116.19,33.12 安徽颖上,116.26,32.62 安徽阜南,115.6,32.63 安徽临泉,115.24,33.06 安徽界首,115.34,33.24 安徽太和,115.61,33.16 山东济南,117,36.65 山东历城,117.07,36.69 山东长清,116.73,36.55 山东章丘,117.53,36.72 山东青岛,120.33,36.07 山东崂山,120.42,36.15 山东胶南,119.97,35.88 山东即墨,120.45,36.38 山东胶县,120,36.28 山东淄博,118.05,36.78 山东枣庄,117.57,34.86 山东滕县,117.17,35.09 山东东营,118.49,37.46 山东垦利,118.54,37.59 山东利津,118.25,37.49 山东德州,116.29,37.45 山东宁津,116.8,37.64 山东乐陵,117.22,37.74 山东商河,117.15,37.31 山东济阳,117.2,36.97 山东禹城,116.66,36.95 山东夏津,116,36.95 山东陵县,116.58,37.34 山东庆云,117.37,37.37 山东临邑,116.86,37.2 山东齐河,116.76,36.79 山东平原,116.44,37.16 山东武城,116.08,37.2 山东滨州,118.03,37.36 山东滨县,117.97,37.47 山东广饶,118.41,37.04 山东桓台,118.12,36.95 山东邹平,117.75,36.89 山东阳信,117.58,37.65 山东沾化,118.14,37.7 山东博兴,118.12,37.12 山东高青,117.66,37.18 山东惠民,117.51,17.49 山东无棣,117.58,37.73 山东潍坊,119.1,36.62 山东潍县,119.22,36.77 山东平度,119.97,36.77 山东诸城,119.42,35.99 山东安丘,119.2,36.42 山东临朐,118.53,36.5 山东寿光,118.73,36.86 山东昌邑,119.41,36.86 山东高密,119.75,36.38 山东五莲,119.2,35.74 山东昌乐,118.83,36.69 山东高都,118.47,36.69 山东烟台,121.39,37.52 山东牟平,121.59,37.38 山东文登,122.05,37.2 山东海阳,121.17,36.76 山东莱阳,120.71,36.97 山东栖霞,120.83,37.28 山东掖县,119.93,37.18 山东长岛,120.73,37.91 山东威海,122.1,37.5 山东福山,121.27,37.49 山东荣成,122.41,37.16 山东乳山,121.52,36.89 山东莱西,120.53,36.86 山东招远,120.38,37.35 山东黄县,120.51,37.64 山东蓬莱,120.75,37.8 山东临沂,118.35,35.05 山东沂水,118.64,35.78 山东日照,119.46,35.42 山东临沭,118.73,34.89 山东仓山,118.03,34.84 山东平邑,117.63,35.49 山东沂源,118.17,36.18 山东沂南,118.47,35.54 山东营县,118.83,35.57 山东莒南,118.83,35.17 山东郯城,118.35,34.61 山东费县,117.97,35.26 山东蒙阴,117.95,35.7 山东泰安,117.13,36.18 山东莱芜,117.67,36.19 山东肥城,116.76,36.24 山东平阴,116.46,36.29 山东新汶,117.67,35.86 山东新泰,117.76,35.91 山东宁阳,116.8,35.76 山东东平,116.3,35.91 山东济宁,116.59,35.38 山东兖州,116.83,35.54 山东泗水,117.27,35.65 山东鱼台,116.65,35 山东嘉祥,116.34,35.41 山东汶上,116.49,35.71 山东曲阜,116.98,35.59 
山东邹县,116.97,35.39 山东微山,117.12,34.8 山东金乡,116.32,35.07 山东荷泽,115.43,35.24 山东郓城,115.94,35.59 山东巨野,116.08,35.38 山东单县,116.07,34.82 山东曹县,115.53,34.83 山东鄄城,115.5,35.57 山东梁山,116.1,35.8 山东成武,115.88,34.97 山东定陶,115.57,35.07 山东东明,115.08,35.31 山东聊城,115.97,36.45 山东高唐,116.23,36.86 山东东阿,116.23,36.32 山东莘县,115.67,36.24 山东临清,115.72,36.68 山东茌平,116.27,36.58 山东阳谷,115.78,36.11 山东冠县,115.45,35.47
K-Means in Python (NumPy):
```python
# coding=utf-8
import numpy as np


# Load the dataset: each record is "name,longitude,latitude".
# Splitting on any whitespace accepts one record per line as well as
# records separated by spaces.
def load_dataset(fileName):
    dataMat = []
    with open(fileName, 'r', encoding='utf-8') as fr:
        for record in fr.read().split():
            fields = record.split(',')
            dataMat.append([float(fields[1]), float(fields[2])])  # longitude, latitude
    return dataMat


# Euclidean distance between two vectors
def dist_eclud(vecA, vecB):
    return np.sqrt(np.sum(np.power(vecA - vecB, 2)))


# Random initial centroids: k points drawn uniformly inside the bounding box
# of the data (they are generally not actual data points)
def rand_cent(dataSet, k):
    n = dataSet.shape[1]  # number of columns, here 2
    centroids = np.zeros((k, n))
    for j in range(n):
        minJ = dataSet[:, j].min()
        rangeJ = dataSet[:, j].max() - minJ  # range of column j
        centroids[:, j] = minJ + rangeJ * np.random.rand(k)  # k values in [minJ, minJ + rangeJ)
    return centroids


def kmeans(dataSet, k, dist_meas=dist_eclud, create_cent=rand_cent):
    m = dataSet.shape[0]  # number of rows (data points)
    # Column 0: index of the assigned centroid; column 1: squared distance to it
    clusterAssment = np.zeros((m, 2))
    centroids = create_cent(dataSet, k)  # k random initial centroids
    clusterChanged = True
    while clusterChanged:
        clusterChanged = False
        for i in range(m):  # assign each data point to its closest centroid
            minDist = np.inf  # smallest distance so far
            minIndex = -1  # index of the closest centroid so far
            for j in range(k):
                distJI = dist_meas(centroids[j, :], dataSet[i, :])
                if distJI < minDist:
                    minDist = distJI
                    minIndex = j
            if clusterAssment[i, 0] != minIndex:
                clusterChanged = True
            clusterAssment[i, :] = minIndex, minDist ** 2
        for cent in range(k):  # update the centroids
            ptsInClust = dataSet[clusterAssment[:, 0] == cent]  # points in cluster `cent`
            if len(ptsInClust) > 0:  # an empty cluster keeps its old centroid
                # axis=0 averages each column, giving the mean vector of the cluster
                centroids[cent, :] = ptsInClust.mean(axis=0)
    return centroids, clusterAssment


def show(dataSet, k, centroids, clusterAssment):
    from matplotlib import pyplot as plt
    numSamples, dim = dataSet.shape
    mark = ['+r', '+b', '+g', '+k', '+y', '^r', 'or', 'sr', '<r', 'pr']  # up to 10 clusters
    for i in range(numSamples):
        markIndex = int(clusterAssment[i, 0])  # cluster of point i
        plt.plot(dataSet[i, 0], dataSet[i, 1], mark[markIndex])  # draw the data point
    mark = ['Dr', 'Db', 'Dg', 'Dk', 'Dy', '^b', '+b', 'sb', '<b', 'pb']
    for i in range(k):
        plt.plot(centroids[i, 0], centroids[i, 1], mark[i], markersize=12)  # draw the final centroid
    plt.show()


if __name__ == '__main__':
    dataMat = np.array(load_dataset('city.txt'))  # n x 2 array of (longitude, latitude)
    # 5 clusters: returns the centroid positions and, for each point,
    # its cluster index and squared distance to that centroid
    myCentroids, clustAssing = kmeans(dataMat, 5)
    print(myCentroids)
    show(dataMat, 5, myCentroids, clustAssing)
```
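Output: a plot of the cities with one marker color per cluster and the final centroids drawn as large diamonds (figure not preserved).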
K-Means with the scikit-learn library:
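```python
# coding=utf-8
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Read the city coordinates from disk: each record is "name,longitude,latitude"
X = []
with open('city.txt', 'r', encoding='utf-8') as f:
    for record in f.read().split():
        fields = record.split(',')
        X.append([float(fields[1]), float(fields[2])])
# Convert to a numpy array of shape (n, 2)
X = np.array(X)
# Number of clusters
n_clusters = 5
# Fit K-Means; n_init=10 runs 10 initializations and keeps the one with the lowest objective
cls = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
# labels[i] is the cluster index of X[i]
labels = cls.labels_
# Draw a scatter plot with one marker and color per cluster
markers = ['^', 'x', 'o', '*', '+']
cValue = ['r', 'y', 'g', 'b', 'c']
for i in range(n_clusters):
    members = labels == i
    plt.scatter(X[members, 0], X[members, 1], s=60, marker=markers[i], c=cValue[i], alpha=0.5)
plt.title('K-Means clustering of city coordinates')
plt.show()
```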
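Output: a scatter plot with a different marker and color for each of the 5 clusters (figure not preserved).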
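Besides `labels_`, a fitted `KMeans` object exposes a few more results:

- `cluster_centers_`: the final centroids.
- `inertia_`: the objective value, i.e. the sum of squared distances from each point to its closest centroid.
- `predict()`: assigns new points to the nearest centroid.

The sketch below uses randomly generated toy blobs instead of city.txt so that it runs on its own. The blob positions and the two query points are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy 2-D data: three well-separated blobs (stand-ins for the city coordinates)
X = np.vstack([rng.normal(center, 0.5, (40, 2)) for center in ([0, 0], [5, 0], [0, 5])])

cls = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

centers = cls.cluster_centers_  # shape (3, 2): final centroids
sse = cls.inertia_  # sum of squared distances from each point to its closest centroid
new_labels = cls.predict(np.array([[0.2, 0.1], [4.8, 0.3]]))  # clusters for two new points

# predict() picks the nearest centroid by Euclidean distance
d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
same = np.array_equal(cls.predict(X), d2.argmin(axis=1))
print(centers)
print(sse, new_labels, same)
```

Comparing `inertia_` across runs with different `n_clusters` or `random_state` values is a quick way to see the initialization and K-selection issues listed earlier.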