import pandas as pd
df = pd.DataFrame(list(results), columns=['movie_num', 'title', 'language', 'area',
                  'director', 'video_type', 'describe', 'duration', 'type', 'crew_name'])
df2 = pd.DataFrame(list(results1),columns=['movie_num','high_light'])
df3 = pd.merge(df,df2,on='movie_num',how='outer')
df4 = df3.fillna("")
df5 = df4.drop('duration',axis=1)
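The merge, fillna and drop steps above can be tried on toy data. This is a minimal sketch: the rows in results and results1 are made up, and only a few of the real columns are kept.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real query results.
results = [(1, "Movie A", "English", 120), (2, "Movie B", "French", 95)]
results1 = [(2, "Award winner"), (3, "Cult classic")]

df = pd.DataFrame(results, columns=["movie_num", "title", "language", "duration"])
df2 = pd.DataFrame(results1, columns=["movie_num", "high_light"])

# how="outer" keeps every movie_num found in either frame;
# cells with no match on the other side become NaN.
merged = pd.merge(df, df2, on="movie_num", how="outer")

# Replace NaN with "" so later string concatenation does not fail,
# then drop the column that is not needed downstream.
clean = merged.fillna("").drop("duration", axis=1)
print(clean)
```

Movie 1 ends up with an empty high_light and movie 3 with an empty title, which is why fillna("") matters before the columns are joined as strings.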
Duplicate index labels after concat
df_all = df_all.reset_index(drop=True)
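A small sketch of where the duplicates come from, using two made-up frames: each piece keeps its own 0..n-1 index, so concat repeats labels. reset_index(drop=True) renumbers them, and passing ignore_index=True to pd.concat avoids the problem in one step.

```python
import pandas as pd

part1 = pd.DataFrame({"title": ["A", "B"]})
part2 = pd.DataFrame({"title": ["C", "D"]})

# Each part has index 0, 1, so the result has labels 0, 1, 0, 1.
df_all = pd.concat([part1, part2])
print(df_all.index.tolist())

# drop=True discards the old labels instead of inserting them as a column.
df_all = df_all.reset_index(drop=True)
print(df_all.index.tolist())

# Equivalent: renumber while concatenating.
df_alt = pd.concat([part1, part2], ignore_index=True)
```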
df_all.iloc[96,:]['title']
df_all['title'].to_list()
def combine(x):
    # Concatenate the text fields of one row, separated by spaces
    return (x['title'] + " " + x['language'] + " " + x['area']
            + " " + x['director'] + " " + x['crew_name'] + " " + x['describe']
            + " " + x['high_light'] + " " + x['type'] + " " + x['video_type'])
df_all['Combined_Data'] = df_all.apply(combine, axis=1)
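The same row-wise join can be written without spelling out every column, by listing the columns once and joining each row with " ".join. A sketch on a one-row made-up frame; astype(str) is an added guard in case a column holds non-string values.

```python
import pandas as pd

# Hypothetical one-row frame with the same column names as df_all.
df_all = pd.DataFrame({
    "title": ["Movie A"], "language": ["English"], "area": ["US"],
    "director": ["Someone"], "crew_name": ["Actor X"], "describe": ["A story"],
    "high_light": [""], "type": ["Drama"], "video_type": ["Film"],
})

text_cols = ["title", "language", "area", "director", "crew_name",
             "describe", "high_light", "type", "video_type"]

# Join the listed columns of each row with a single space, in list order.
df_all["Combined_Data"] = df_all[text_cols].astype(str).apply(" ".join, axis=1)
print(df_all.loc[0, "Combined_Data"])
```

As with combine, an empty field (here high_light) leaves a double space; that is harmless for word-count based similarity.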
df1 = df.drop(['language', 'area', 'director', 'video_type', 'describe', 'duration', 'type'], axis=1)  # names must match df's columns, otherwise drop raises KeyError
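index gets the row label of the chosen movie, enumerate pairs each similarity score with its position, and sorted sorts those pairs; key=lambda x: x[1] sorts by the second element (the score), whereas x[0] would sort by the first element (the position).
user_index = df_all[df_all.title == user_movie].index.values[0]
similar_movies = list(enumerate(cosine_sim[user_index]))
sorted_similar_movies = sorted(similar_movies, key=lambda x: x[1], reverse=True)[1:]  # [1:] drops the top hit, normally the movie itself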
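The notes do not show how cosine_sim is built. As an assumption, the sketch below uses a plain word-count vector per movie and computes cosine similarity with numpy (scikit-learn's CountVectorizer plus cosine_similarity is a common alternative). The titles and texts are made up. Note that index.values[0] is used as a row position in cosine_sim, which only lines up because the index is 0..n-1, one reason for reset_index(drop=True) earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical movies with an already combined text column.
df_all = pd.DataFrame({
    "title": ["A", "B", "C"],
    "Combined_Data": ["action space hero", "action hero city", "romance city love"],
})

# Bag-of-words count matrix: one row per movie, one column per word.
vocab = sorted({w for text in df_all["Combined_Data"] for w in text.split()})
counts = np.array([[text.split().count(w) for w in vocab]
                   for text in df_all["Combined_Data"]], dtype=float)

# Cosine similarity = dot product of L2-normalised rows.
unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)
cosine_sim = unit @ unit.T

user_movie = "A"
user_index = df_all[df_all.title == user_movie].index.values[0]
similar_movies = list(enumerate(cosine_sim[user_index]))
# Highest score first; [1:] skips movie A itself (score 1.0).
sorted_similar_movies = sorted(similar_movies, key=lambda x: x[1], reverse=True)[1:]
print([(df_all.title[i], s) for i, s in sorted_similar_movies])
```

B shares two of A's three words (score 2/3), C shares none (score 0), so B is recommended first.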
Saving the results
pd.DataFrame(rec_all).to_csv('rec_results11.csv')
pd.DataFrame(rec_all).to_excel('rec_results11.xlsx')
Transposing columns into rows in Excel
Copy the whole range, right-click an empty cell, choose Paste Special, and tick the Transpose option.
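If the transpose is wanted before the file is written, pandas can do it directly with .T, which swaps rows and columns like Paste Special > Transpose. A sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"movie": ["A", "B", "C"], "score": [0.9, 0.7, 0.5]})

# .T swaps rows and columns: column names become the row index.
transposed = df.T
print(transposed)
```

The result could then be saved with to_excel as above.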
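When a file is loaded back with read_csv, the first column is an unwanted "Unnamed: 0" column: the index that to_csv wrote has been read in as an ordinary column. How to fix it?
Stack Overflow suggests a good fix: pass index_col=0.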
pd.read_csv(path,index_col=0)
The Unnamed column no longer appears in the result.
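A round trip through an in-memory buffer (io.StringIO stands in for the CSV file) shows where the extra column comes from and two ways to avoid it: read it back as the index with index_col=0, or skip writing it with to_csv(..., index=False).

```python
import io
import pandas as pd

df = pd.DataFrame({"title": ["A", "B"], "score": [0.9, 0.7]})

# By default to_csv writes the index as a first column with an empty header.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
raw_cols = pd.read_csv(buf).columns.tolist()   # includes 'Unnamed: 0'

# Fix 1: tell read_csv that column 0 is the index.
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

# Fix 2: do not write the index at all.
buf2 = io.StringIO()
df.to_csv(buf2, index=False)
buf2.seek(0)
no_index = pd.read_csv(buf2)
print(raw_cols, restored.columns.tolist(), no_index.columns.tolist())
```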
Custom column order for a DataFrame: df[["","",""]] (list the column names in the order you want)
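A concrete version of the pattern with made-up column names: selecting with a list returns the columns in the list's order. The second line moves one column to the front and keeps the rest in their current order.

```python
import pandas as pd

df = pd.DataFrame({"b": [1], "c": [2], "a": [3]})

# Columns come back in exactly the order given in the list.
reordered = df[["a", "b", "c"]]

# Move only "c" to the front, keeping the others as they are.
front = df[["c"] + [col for col in df.columns if col != "c"]]
print(reordered.columns.tolist(), front.columns.tolist())
```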
Saving a dict raised the error "arrays must all be same length"; the fix was to change pd.DataFrame(sims1) to:
pd.DataFrame.from_dict(sims1, orient='index').to_csv("521_15_30_rec.csv")
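Why the fix works, sketched with a hypothetical sims1 whose values are lists of different lengths: pd.DataFrame(dict) turns each key into a column, so all lists must be equally long, while from_dict(..., orient='index') turns each key into a row and pads shorter rows with missing values.

```python
import pandas as pd

# Hypothetical recommendation dict with lists of unequal length.
sims1 = {"A": ["B", "C"], "B": ["A"]}

try:
    pd.DataFrame(sims1)  # keys become columns, so lengths must match
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Keys become rows; the short row for "B" is padded with a missing value.
out = pd.DataFrame.from_dict(sims1, orient="index")
print(out)
```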