First, install (if you have not already) and import the required libraries: numpy, pandas, matplotlib, and scikit-learn (`sklearn`).
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering, DBSCAN
from sklearn.metrics import silhouette_score, adjusted_rand_score
```
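Next, read the dataset and do the necessary preprocessing. In this example we select only the first two feature columns (columns 1 and 2 of the file) as our features.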
```python
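# yeast.data is whitespace-delimited with no header row; a raw string avoids
# the invalid-escape warning for the '\s+' regex
data = pd.read_csv('yeast.data', sep=r'\s+', header=None)
# Columns 1 and 2 are the first two numeric features
# (assuming the standard UCI layout, where column 0 is a text identifier)
X = data.iloc[:, 1:3].values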
```
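Next, cluster the data with AGNES (agglomerative hierarchical clustering) and DBSCAN, and draw a scatter plot of each result. Different markers represent different clusters.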
```python
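fig, ax = plt.subplots(1, 2, figsize=(10, 5))
# Ground-truth classes: assumed to be the last column (standard UCI yeast layout).
# Column 0 is the sequence name, which is not a class label.
y_true = data.iloc[:, -1]
colors = ['red', 'blue', 'green', 'cyan', 'magenta', 'yellow', 'black']
markers = ['o', 's', '^', 'D', '*', 'P', 'X']

# AGNES
agnes = AgglomerativeClustering(n_clusters=3).fit(X)
labels = agnes.labels_
silhouette_avg = silhouette_score(X, labels)
ari = adjusted_rand_score(y_true, labels)
for i in range(3):
    ax[0].scatter(X[labels == i, 0], X[labels == i, 1], color=colors[i], marker=markers[i])
ax[0].set_title(f'AGNES\nSilhouette score: {silhouette_avg:.2f}\nARI: {ari:.2f}')

# DBSCAN
dbscan = DBSCAN(eps=0.4, min_samples=5).fit(X)
labels = dbscan.labels_
# Label -1 marks noise points and is not counted as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
# silhouette_score raises ValueError with fewer than 2 distinct labels, so report NaN then
silhouette_avg = silhouette_score(X, labels) if len(set(labels)) > 1 else float('nan')
ari = adjusted_rand_score(y_true, labels)
for i in range(n_clusters):
    # Cycle through the palette so that more than 7 clusters does not raise IndexError
    ax[1].scatter(X[labels == i, 0], X[labels == i, 1],
                  color=colors[i % len(colors)], marker=markers[i % len(markers)])
# Draw noise points (label -1) in gray
ax[1].scatter(X[labels == -1, 0], X[labels == -1, 1], color='gray', marker='.')
ax[1].set_title(f'DBSCAN\nSilhouette score: {silhouette_avg:.2f}\nARI: {ari:.2f}')
plt.show()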
```
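Unlike AGNES, DBSCAN is not told how many clusters to produce: the count follows from `eps` and `min_samples`. The sketch below illustrates this on synthetic data, not on the yeast dataset. The helper name `summarize_dbscan` and the blob parameters are assumptions made for illustration; it reports the number of clusters, the number of noise points, and a silhouette score computed only over non-noise points when at least two clusters exist.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def summarize_dbscan(X, eps, min_samples=5):
    """Fit DBSCAN and return (n_clusters, n_noise, silhouette or None)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    clustered = labels != -1                 # mask of points assigned to a cluster
    n_clusters = len(set(labels[clustered]))
    n_noise = int((~clustered).sum())
    # The silhouette coefficient is undefined for fewer than two clusters
    score = None
    if n_clusters >= 2:
        score = silhouette_score(X[clustered], labels[clustered])
    return n_clusters, n_noise, score


# Synthetic stand-in data: three well-separated 2-D blobs of 100 points each
X_demo, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [0, 5]],
                       cluster_std=0.3, random_state=0)

# The same data yields different cluster counts as eps changes
for eps in (0.05, 0.5, 10.0):
    print(eps, summarize_dbscan(X_demo, eps))
```

With a very large `eps` every point becomes reachable from every other, so the whole dataset collapses into one cluster and the silhouette score cannot be computed. Printing these three numbers for a few `eps` values before plotting is one simple way to check that a chosen setting produces a meaningful partition.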
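Finally, compute the silhouette coefficient and the adjusted Rand index (ARI) and print them. Both have a maximum of 1. The closer the silhouette coefficient is to 1, the more compact and better separated the clusters are; the closer the ARI is to 1, the better the clustering agrees with the true classes, while an ARI near 0 means the agreement is no better than chance.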
```python
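# Ground-truth classes: assumed to be the last column (standard UCI yeast layout);
# column 0 is the sequence name, not a class label
y_true = data.iloc[:, -1]

agnes_silhouette_avg = silhouette_score(X, agnes.labels_)
agnes_ari = adjusted_rand_score(y_true, agnes.labels_)
print(f'AGNES\nSilhouette score: {agnes_silhouette_avg:.2f}\nARI: {agnes_ari:.2f}')

# silhouette_score needs at least 2 distinct labels; report NaN otherwise
dbscan_silhouette_avg = (silhouette_score(X, dbscan.labels_)
                         if len(set(dbscan.labels_)) > 1 else float('nan'))
dbscan_ari = adjusted_rand_score(y_true, dbscan.labels_)
print(f'DBSCAN\nSilhouette score: {dbscan_silhouette_avg:.2f}\nARI: {dbscan_ari:.2f}')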
```
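To make the ARI less of a black box, here is a minimal pure-Python sketch of the standard pair-counting formula. It is an illustration, not scikit-learn's implementation, and the function name `adjusted_rand_index` is chosen here for the example. It shows that the score depends only on which samples are grouped together, not on the label values themselves.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index from the pair-counting formula (illustrative sketch)."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: the partitions trivially agree

    # Contingency counts: samples sharing each (true label, predicted label) pair
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of sample pairs placed together in both partitions, and in each one
    together_both = sum(comb(c, 2) for c in pair_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = together_true * together_pred / total_pairs  # chance-level agreement
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (together_both - expected) / (maximum - expected)


# Same grouping with renamed labels: perfect agreement
print(round(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]), 3))  # 1.0
# Every true pair is split by the prediction: worse than chance
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 3))  # -0.5
```

Because only co-membership matters, the true labels can be strings (such as class names) while the predicted labels are integers. It also means that comparing against a column of nearly unique identifiers instead of class labels gives a score that says little about clustering quality.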
The complete code is as follows:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering, DBSCAN
from sklearn.metrics import silhouette_score, adjusted_rand_score

# yeast.data is whitespace-delimited with no header row
data = pd.read_csv('yeast.data', sep=r'\s+', header=None)
# Columns 1 and 2 are the first two numeric features
X = data.iloc[:, 1:3].values
# Ground-truth classes: assumed to be the last column (standard UCI yeast layout).
# Column 0 is the sequence name, which is not a class label.
y_true = data.iloc[:, -1]

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
colors = ['red', 'blue', 'green', 'cyan', 'magenta', 'yellow', 'black']
markers = ['o', 's', '^', 'D', '*', 'P', 'X']

# AGNES
agnes = AgglomerativeClustering(n_clusters=3).fit(X)
labels = agnes.labels_
agnes_silhouette_avg = silhouette_score(X, labels)
agnes_ari = adjusted_rand_score(y_true, labels)
for i in range(3):
    ax[0].scatter(X[labels == i, 0], X[labels == i, 1], color=colors[i], marker=markers[i])
ax[0].set_title(f'AGNES\nSilhouette score: {agnes_silhouette_avg:.2f}\nARI: {agnes_ari:.2f}')

# DBSCAN
dbscan = DBSCAN(eps=0.4, min_samples=5).fit(X)
labels = dbscan.labels_
# Label -1 marks noise points and is not counted as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
# silhouette_score raises ValueError with fewer than 2 distinct labels, so report NaN then
dbscan_silhouette_avg = silhouette_score(X, labels) if len(set(labels)) > 1 else float('nan')
dbscan_ari = adjusted_rand_score(y_true, labels)
for i in range(n_clusters):
    # Cycle through the palette so that more than 7 clusters does not raise IndexError
    ax[1].scatter(X[labels == i, 0], X[labels == i, 1],
                  color=colors[i % len(colors)], marker=markers[i % len(markers)])
# Draw noise points (label -1) in gray
ax[1].scatter(X[labels == -1, 0], X[labels == -1, 1], color='gray', marker='.')
ax[1].set_title(f'DBSCAN\nSilhouette score: {dbscan_silhouette_avg:.2f}\nARI: {dbscan_ari:.2f}')
plt.show()

# Print the same metrics that appear in the plot titles
print(f'AGNES\nSilhouette score: {agnes_silhouette_avg:.2f}\nARI: {agnes_ari:.2f}')
print(f'DBSCAN\nSilhouette score: {dbscan_silhouette_avg:.2f}\nARI: {dbscan_ari:.2f}')
```
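Result analysis:
In the scatter plots, AGNES always yields three clusters because `n_clusters=3` is fixed. DBSCAN is described here as also finding three clusters, but its cluster count is determined by `eps` and `min_samples`, so check the computed `n_clusters` rather than assuming it. With AGNES, the clusters are reported to be well separated, although the points within each cluster are spread widely; with DBSCAN, the points within each cluster are denser, but the separation between clusters is weaker.
Judged by the silhouette coefficient and the ARI, AGNES is reported to cluster this data better than DBSCAN. Note, however, that both metrics are only reference indicators of clustering quality, and the DBSCAN result in particular is sensitive to its parameters; the final judgment should be made according to the actual situation.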