I. Libraries used
numpy, pandas, sklearn (scikit-learn)
II. Steps
1. Import the libraries
The code is as follows (example):
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
import numpy as np
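2. Load the data
The code is as follows (example):
data = pd.read_csv("output.csv", encoding="utf-8")
data
The data-processing step, which produces the cleaned text series music used below, is omitted here.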
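Since the processing step is not shown, here is a minimal sketch of the kind of cleaning such a step might perform. The function name clean_text and the specific rules (dropping line breaks and punctuation) are assumptions for illustration, not the author's actual code.

```python
import re

# Hypothetical helper: the post omits its own cleaning code, so this only
# illustrates one plausible way to normalize raw text before vectorizing.
def clean_text(text: str) -> str:
    # Replace line breaks and tabs with a space.
    text = re.sub(r"[\r\n\t]+", " ", text)
    # Replace anything that is neither a word character nor whitespace
    # (punctuation, symbols) with a space; \w is Unicode-aware in Python 3,
    # so Chinese characters are kept.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()

cleaned = clean_text("Hello, world!\nSecond   line...\n")
print(cleaned)
```

With pandas, a function like this would be applied to a column as data['text'].apply(clean_text).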
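3. Vectorize the text
The code is as follows (example):
# music is the cleaned text column produced by the omitted processing step
vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights; X is a sparse TF-IDF matrix with one row per document
X = vectorizer.fit_transform(music)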
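To make clear what the TF-IDF matrix contains, the sketch below computes the weights by hand with the standard library only. It follows the smoothed formula that scikit-learn documents as its default, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row. The function name tfidf and the toy documents are made up, and whitespace splitting is a simplification of TfidfVectorizer's default tokenizer, which also drops single-character tokens.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, rows) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        # L2-normalize the row so that document length does not dominate.
        norm = math.sqrt(sum(v * v for v in raw)) or 1.0
        rows.append([v / norm for v in raw])
    return vocab, rows

vocab, rows = tfidf(["rain rain love", "love song", "road song"])
for row in rows:
    print([round(v, 3) for v in row])
```

A term that is frequent in one document but rare across the corpus ("rain" in the first document) gets a larger weight than a term shared by several documents ("love").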
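4. K-means clustering
The code is as follows (example):
# Fixed seed (an assumption; the original sets none) so that cluster ids are reproducible
kmeans = KMeans(n_clusters=5, random_state=0)
kmeans.fit(X)
names = data['title']
pred = kmeans.labels_  # cluster id assigned to each song
# K-means numbers clusters arbitrarily, so this mapping must match the clusters of the fit above.
# Labels in order: classic oldies, pop, sad love songs, internet hits, folk
label_map = {0: '经典老歌', 1: '流行', 2: '伤感情歌', 3: '网络热歌', 4: '民谣'}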
5. Complete code
The code is as follows (example):
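def pred(dataX):
    # dataX is the raw text to classify; note that the model is refit on every call
    data = pd.read_csv("output.csv", encoding="utf-8")
    music = data['text']
    # remove is the cleaning function from the omitted data-processing step
    music = music.apply(remove)
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(music)
    # Fixed seed (an assumption; the original sets none) so that cluster ids stay stable across calls
    kmeans = KMeans(n_clusters=5, random_state=0)
    kmeans.fit(X)
    names = data['title']
    # Labels in order: classic oldies, pop, sad love songs, internet hits, folk
    label_map = {0: '经典老歌', 1: '流行', 2: '伤感情歌', 3: '网络热歌', 4: '民谣'}
    # Strip line breaks, then project the input into the fitted TF-IDF space
    dataX = dataX.replace("\n", "")
    dataX = vectorizer.transform([dataX])
    # return kmeans.predict(dataX)
    return label_map[kmeans.predict(dataX)[0]]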
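The function above reads the CSV and refits the model on every call. A possible alternative, sketched here under assumptions, is to fit the vectorizer and k-means once as a scikit-learn Pipeline and reuse the fitted object for each prediction. The names fit_model and predict_label, the toy corpus, and the English labels are all hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

def fit_model(texts, n_clusters):
    """Fit TF-IDF + k-means once and return the fitted pipeline."""
    model = make_pipeline(
        TfidfVectorizer(),
        KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    )
    model.fit(texts)
    return model

def predict_label(model, label_map, text):
    """Map one new text to the readable name of its nearest cluster."""
    # Line breaks become spaces so words on adjacent lines do not merge.
    cluster = int(model.predict([text.replace("\n", " ")])[0])
    return label_map[cluster]

# Toy corpus (made up for illustration).
train_texts = [
    "love heart tears", "love tears night", "heart love night",
    "road guitar train", "guitar road river", "train river guitar",
]
model = fit_model(train_texts, n_clusters=2)

# Cluster ids are arbitrary, so derive the mapping from a known example.
love_id = int(model.predict(["love heart tears"])[0])
label_map = {love_id: "love songs", 1 - love_id: "travel songs"}

print(predict_label(model, label_map, "tears and love\nat night"))
```

Fitting once also means the label mapping only has to be checked against a single set of clusters, rather than against whatever a new fit happens to produce.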