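I won't paste the theory behind DBSCAN here; this page explains it in detail: https://www.cnblogs.com/pinard/p/6208966.html
This afternoon I tried testing it on the iris dataset, but the results were not very good. I then tried tuning the parameters with a fairly simple method, and that did not seem to work well either, so I am recording it here for further study later.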
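As a rough aid for the tuning problem mentioned above, one common heuristic for picking `eps` is to look at each point's distance to its k-th nearest neighbor. The sketch below is not necessarily the method used in these notes; it assumes scikit-learn's bundled iris data (`load_iris`) instead of `iris.csv`, and `k = 5` is chosen only to match `min_samples=5`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors

# Assumption: use scikit-learn's bundled iris features rather than iris.csv
X = load_iris().data

# k matches min_samples=5; when querying the training data, each point
# counts itself as its own nearest neighbor (distance 0)
k = 5

nn = NearestNeighbors(n_neighbors=k).fit(X)
distances, _ = nn.kneighbors(X)

# Distance from every point to its k-th neighbor, sorted ascending
k_dist = np.sort(distances[:, -1])

# A sharp rise in these values suggests a candidate range for eps
for q in (50, 75, 90, 95):
    print(f"{q}th percentile of k-distance: {np.percentile(k_dist, q):.3f}")
```

Plotting `k_dist` and looking for the "elbow" is the usual way to read this; the percentiles are only a quick text substitute.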
- Load data from iris.csv
import pandas as pd
import numpy as np
import math
import operator
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
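data = pd.read_csv("iris.csv")
# Convert to a plain ndarray: np.mat is discouraged, and recent scikit-learn versions reject np.matrix input
data = data.to_numpy()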
- Use sklearn.DBSCAN to do the clustering
# Cluster on columns 1-4 of data; fit_predict labels noise points as -1
y_pred = DBSCAN(eps=0.5, min_samples=5).fit_predict(data[:, 1:5])
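Before plotting, it can help to check how many clusters and noise points DBSCAN actually produced. The following is a minimal sketch with the same `eps` and `min_samples`; it assumes scikit-learn's bundled iris data (`load_iris`) so that it runs without `iris.csv`, and the names `cluster_ids`, `cluster_sizes`, and `n_noise` are only illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled iris features rather than iris.csv
X = load_iris().data

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Noise points carry the label -1; every other label is a cluster id
n_noise = int(np.sum(labels == -1))
cluster_ids = [int(c) for c in np.unique(labels) if c != -1]
cluster_sizes = {c: int(np.sum(labels == c)) for c in cluster_ids}

print("clusters:", len(cluster_ids))
print("cluster sizes:", cluster_sizes)
print("noise points:", n_noise)
```

If the number of clusters differs from the three iris species, or many points end up as noise, that is a sign `eps` and `min_samples` need adjusting.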
- Show the result
colors = 'gbycm'
y_pred_color = []
category = []
for pred in y_pred:
    if pred == -1:
        color = 'r'
    else:
        color = colors[pred]
    y_pred_color.appe