[Machine Learning Course, University of Washington]: 1 Case Studies, 1.4 Clustering (2): Clustering Wikipedia Articles

Latest recommended article published 2019-05-22 21:47:56

有石为玉

Article link: https://blog.csdn.net/weixin_41770169/article/details/80812781

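1. Importing the library and loading the data

import graphlab
# Limit the number of lambda worker processes GraphLab Create spawns.
graphlab.set_runtime_config('GRAPHLAB_DEFAULT_NUM_PYLAMBDA_WORKERS', 4)

# Load the Wikipedia people dataset as an SFrame and preview the first rows.
people = graphlab.SFrame('people_wiki.gl/')
people.head()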

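2. word_count

The Obama article

# Select the rows for individual articles by name.
obama = people[people['name'] == 'Barack Obama']
clooney = people[people['name'] == 'George Clooney']

Obama's word_count

# Count how often each word occurs in the article text (one dict per row).
obama['word_count'] = graphlab.text_analytics.count_words(obama['text'])

# Expand the word_count dict into one (word, count) row per word.
obama_word_count_table = obama[['word_count']].stack('word_count', new_column_name = ['word','count'])
obama_word_count_table.head()
# Show the most frequent words first.
obama_word_count_table.sort('count',ascending=False)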
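
To make the two steps above concrete without GraphLab Create installed, here is a minimal library-free sketch of counting words and turning the resulting dict into sorted (word, count) rows. The helpers `count_words` and `stack_counts`, the regex tokenizer, and the sample sentence are all assumptions made for illustration; GraphLab's own tokenization may differ.

```python
from collections import Counter
import re

def count_words(text):
    """Return a {word: count} dict for one document (hypothetical helper).

    Assumption: tokens are lowercased runs of letters, digits and apostrophes.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return dict(Counter(tokens))

def stack_counts(word_count):
    """Turn a {word: count} dict into (word, count) rows, highest count first.

    Ties are broken alphabetically so the order is deterministic.
    """
    return sorted(word_count.items(), key=lambda row: (-row[1], row[0]))

# Made-up sample text, not taken from the dataset.
sample_text = "the president said the senate and the house agreed"
rows = stack_counts(count_words(sample_text))
print(rows[:3])
```

As in the Obama table, the most frequent entries tend to be function words such as "the", which is the motivation for moving on to TF-IDF.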

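3. TF-IDF

# Compute per-article word counts for the whole corpus, then TF-IDF weights.
people['word_count'] = graphlab.text_analytics.count_words(people['text'])
people.head()
tfidf = graphlab.text_analytics.tf_idf(people['word_count'])

# Earlier versions of GraphLab Create returned an SFrame rather than a single SArray
# This notebook was created using Graphlab Create version 1.7.1
# Compare parsed versions; plain string comparison misorders e.g. '1.10' and '1.6.1'.
from distutils.version import LooseVersion
if LooseVersion(graphlab.version) <= LooseVersion('1.6.1'):
    tfidf = tfidf['docs']

tfidf
people['tfidf'] = tfidf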
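
For intuition about what the tfidf column holds, here is a small sketch of one common TF-IDF formulation: the raw count of a word in a document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the word. This is an assumption for illustration; the exact weighting used by GraphLab Create may differ. The function `tf_idf` below and the three toy documents are hypothetical.

```python
import math

def tf_idf(word_counts):
    """Compute TF-IDF weights for a list of {word: count} dicts.

    Assumed formula: count * log(N / df).
    """
    n_docs = len(word_counts)
    # Document frequency: in how many documents does each word appear?
    doc_freq = {}
    for counts in word_counts:
        for word in counts:
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return [
        {word: count * math.log(float(n_docs) / doc_freq[word])
         for word, count in counts.items()}
        for counts in word_counts
    ]

# Toy corpus, made up for this example.
docs = [
    {"the": 3, "president": 2},
    {"the": 2, "football": 4},
    {"the": 1, "senate": 1, "president": 1},
]
weights = tf_idf(docs)
print(weights[0])
```

Under this formula a word that appears in every document, such as "the" here, gets weight 0, while words concentrated in a few documents keep a positive weight.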

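4. Expanding word_count and tf_idf into per-word rows

# Re-select Obama so that the row includes the new tfidf column.
obama = people[people['name'] == 'Barack Obama']
# One (word, tfidf) row per word, highest weight first.
obama[['tfidf']].stack('tfidf',new_column_name=['word','tfidf']).sort('tfidf',ascending=False)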

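5. Cosine distance

clinton = people[people['name'] == 'Bill Clinton']
beckham = people[people['name'] == 'David Beckham']
# Compare Obama's TF-IDF vector with Clinton's and with Beckham's.
# A smaller cosine distance means the two articles are more similar.
graphlab.distances.cosine(obama['tfidf'][0],clinton['tfidf'][0])
graphlab.distances.cosine(obama['tfidf'][0],beckham['tfidf'][0])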
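
The distance used above can be sketched by hand for sparse {word: weight} vectors as 1 minus the cosine similarity. The function `cosine_distance` below, the three toy vectors, and the choice to return 1.0 for an all-zero vector are assumptions for illustration, not GraphLab's implementation.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two sparse {word: weight} vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an all-zero vector as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

# Toy vectors, made up for this example.
politician_a = {"president": 2.0, "senate": 1.0}
politician_b = {"president": 1.0, "senate": 2.0}
footballer = {"football": 3.0, "club": 1.0}

d_politicians = cosine_distance(politician_a, politician_b)
d_mixed = cosine_distance(politician_a, footballer)
print(d_politicians, d_mixed)
```

Vectors that share heavily weighted words end up close together, while vectors with no words in common sit at distance 1.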

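6. Building the model

# Build a nearest-neighbors model on the TF-IDF features, labeling rows by name.
knn_model = graphlab.nearest_neighbors.create(people,features=['tfidf'],label='name')
# Retrieve the articles closest to Obama's.
knn_model.query(obama)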
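
Conceptually, a nearest-neighbors query ranks every article by its distance to the query article and returns the closest few. Below is a brute-force sketch of that idea. It assumes cosine distance as the metric, which may not be the distance GraphLab Create selects by default, and the `query` function and toy corpus are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two sparse {word: weight} vectors."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an all-zero vector is maximally distant
    return 1.0 - dot / (norm_a * norm_b)

def query(corpus, target_name, k=3):
    """Return the k (name, distance) pairs closest to corpus[target_name].

    Brute force: compares the target against every entry, itself included.
    """
    target = corpus[target_name]
    distances = [(name, cosine_distance(target, vec)) for name, vec in corpus.items()]
    # Sort by distance, then by name so that ties are deterministic.
    return sorted(distances, key=lambda pair: (pair[1], pair[0]))[:k]

# Toy corpus of TF-IDF-like vectors, made up for this example.
corpus = {
    "politician_a": {"president": 2.0, "senate": 1.0},
    "politician_b": {"president": 1.0, "senate": 2.0},
    "footballer": {"football": 3.0, "club": 1.0},
}
neighbors = query(corpus, "politician_a", k=2)
print(neighbors)
```

In this sketch the query entry comes back first with a distance of essentially zero, followed by its nearest other entry. A brute-force scan is fine for a toy corpus; a real library would typically use an index structure to avoid comparing against every row.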
