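Project description:
Using the Seattle hotel dataset, recommend the 10 hotels most similar to the one a user selects.
Dataset download link: https://github.com/susanli2016/Machine-Learning-with-Python/blob/master/Seattle_Hotels.csv
The dataset contains three fields: hotel name, address, and a text description.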
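To make the goal concrete, here is a minimal sketch of one common way to rank hotels by description similarity: TF-IDF vectors compared with a linear kernel. It is an illustration under assumptions, not necessarily the exact method used later in this post. The toy DataFrame, the `recommend` helper, and the column name `desc` are hypothetical; only `name` is confirmed by the output shown further down.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy stand-in for the real dataset (hypothetical rows; 'desc' is an assumed column name).
toy = pd.DataFrame({
    "name": ["Hotel A", "Hotel B", "Hotel C", "Hotel D"],
    "desc": [
        "Downtown hotel near Pike Place Market with rooftop bar",
        "Downtown hotel close to Pike Place Market and the waterfront",
        "Quiet airport motel with free shuttle and parking",
        "Airport hotel offering free shuttle, parking and breakfast",
    ],
})

def recommend(df, hotel_name, top_n=10):
    """Return the names of up to top_n hotels most similar to hotel_name."""
    # Find the row position of the selected hotel.
    positions = np.flatnonzero(df["name"].to_numpy() == hotel_name)
    if len(positions) == 0:
        raise ValueError(f"Unknown hotel: {hotel_name}")
    idx = int(positions[0])

    # TF-IDF rows are L2-normalised by default, so the linear kernel
    # (dot product) between rows equals their cosine similarity.
    matrix = TfidfVectorizer(stop_words="english").fit_transform(df["desc"])
    sims = linear_kernel(matrix[idx], matrix).ravel()

    # Sort by descending similarity and drop the selected hotel itself.
    ranked = [int(i) for i in np.argsort(-sims) if i != idx][:top_n]
    return df["name"].iloc[ranked].tolist()

print(recommend(toy, "Hotel A", top_n=2))
```

On the real data, the same function could be called with the loaded DataFrame and `top_n=10`, provided the description column is actually named `desc`.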
Dataset preview:
Method steps:
1. Import the required packages and explore the data:
import pandas as pd
import numpy as np
from nltk.corpus import stopwords
from sklearn.metrics.pairwise import linear_kernel
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import re
import random
pd.options.display.max_columns = 30
import matplotlib.pyplot as plt
%matplotlib inline
# Enable Chinese characters in plots
plt.rcParams['font.sans-serif'] = ['SimHei'] # so that Chinese labels render correctly
df = pd.read_csv('Seattle_Hotels.csv', encoding="latin-1")
# Data exploration
print(df.head())
print('Number of hotels in the dataset:', len(df))
name \
0 Hilton Garden Seattle Downtown
1 Sheraton Grand Seattle
2 Crowne Plaza Seattle Downtown
3 Kimpton Hotel Monaco Seattle
4 The Westin Seattle
address \
0 1821 Boren Avenue, Seattle Washington 98101 USA
1 1400 6th Avenue, Seattle, Washington 98101 USA
2 1113 6th Ave, Seattle, WA 98101
3 1101 4th Ave, Se