Word Cloud in Python

Word clouds are popular on many web pages.

Today I found a simple way to generate a word cloud with Python.

Here we use the pytagcloud package, which can be found with a Google search.

We ran:

python setup.py install

to install this package for Python 2.7.

Next, we find that this package also needs two other packages, pygame and jsonpickle.

Install these two packages in the same way.

Then we can generate the word cloud for a string as follows:

tags = make_tags(get_tag_counts(contenct),maxsize=120)

imagename = 'H:/Project/NextBuildData/Imageresult/'+venue['Venue_id']+'.png'
print imagename
create_tag_image(tags, imagename, background=(0, 0, 0), fontname='Lobster')

Here, contenct is the text (str) data.

get_tag_counts counts how many times each word occurs in the string.

make_tags builds the tag structure that is used to create the word cloud.

In this function, maxsize is the maximum word size; there is also a minsize parameter for the minimum size of each word.

Finally, create_tag_image generates the word cloud from "tags".

imagename is the output image name.

background is the background color.

fontname is the font of the words in the output image.
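To make the counting and sizing steps more concrete, here is a minimal sketch in plain Python. It is not pytagcloud's own code: count_words and scale_sizes are hypothetical helper names, the default minsize of 20 is made up for the example, and the linear mapping from count to size is only an assumption about how such a scaling could work.

```python
import re
from collections import Counter


def count_words(text):
    """Return (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common()


def scale_sizes(counts, minsize=20, maxsize=120):
    """Map each count linearly onto [minsize, maxsize] (illustrative only)."""
    if not counts:
        return []
    highest = max(c for _, c in counts)
    lowest = min(c for _, c in counts)
    # Avoid dividing by zero when every word has the same count.
    span = float(highest - lowest) or 1.0
    return [(word, int(minsize + (c - lowest) / span * (maxsize - minsize)))
            for word, c in counts]


counts = count_words("good coffee good cake great coffee good")
sizes = scale_sizes(counts)
```

In this sketch the most frequent word gets maxsize and the rarest words get minsize, which is the general idea behind the two parameters.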

My full code is:

'''
Created on Mar 24, 2013

@author: Yang
'''
import nltk
from nltk.stem.lancaster import LancasterStemmer
from nltk.tag.simplify import simplify_wsj_tag
import enchant
from pymongo import Connection

from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts

c = Connection('localhost', 27017)

db = c.FourS2
finf = db.FourInformation
fpho = db.FourPhotos
ftip = db.FourTips
VenueList = db.Venuelist

count = 1

# Create these helpers once instead of once per word.
st = LancasterStemmer()
d = enchant.Dict("en_US")
stopwords = set(nltk.corpus.stopwords.words('english'))

for venue in VenueList.find():
    try:
        # Skip the first 1991 venues.
        if count<=1991:
            count = count+1
            continue

        # Join all tips of this venue into one string.
        contenct = ''
        vneuetips = venue['Tips']
        for tip in vneuetips:
            contenct = contenct+' '+vneuetips[tip]['Tip']

        # Lowercase and stem every token.
        tokens = nltk.word_tokenize(contenct)
        tempword = []
        for i in tokens:
            tempword.append(st.stem(i.lower()))

        # Keep only correctly spelled alphabetic words that are longer
        # than two letters and are not stop words.
        contenct = ''
        for sample in tempword:
            # The POS tag is only needed for the optional noun filter below.
            tags = nltk.pos_tag(nltk.word_tokenize(sample))
            s = [(word, simplify_wsj_tag(tag)) for word, tag in tags]
            atrr = s[0][1]
            if len(sample)>2 and sample not in stopwords:
                if sample.isalpha() and d.check(sample):
                    #if atrr=='N' or atrr=='NP':
                    contenct = contenct+' '+sample

        # Build the tags and write one image per venue.
        tags = make_tags(get_tag_counts(contenct),maxsize=120)

        imagename = 'H:/Project/NextBuildData/Imageresult/'+venue['Venue_id']+'.png'
        print imagename
        create_tag_image(tags, imagename, background=(0, 0, 0), fontname='Lobster')
    except Exception as e:
        # Report the error and move on to the next venue.
        print e

The text data is read from MongoDB. The code also removes stop words and tokens that are not purely alphabetic.
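The filtering step can be tried on its own without MongoDB or NLTK. The sketch below is a simplified stand-in, not the script above: STOP_WORDS is a tiny made-up set (the full script uses NLTK's English stop word list), keep_word and clean_text are hypothetical names, and the stemming and spelling checks are left out.

```python
# Hypothetical, deliberately tiny stop word set for the example.
STOP_WORDS = set(["the", "and", "was", "this", "with"])


def keep_word(word, stop_words=STOP_WORDS):
    """Keep alphabetic words longer than two letters that are not stop words."""
    return len(word) > 2 and word.isalpha() and word not in stop_words


def clean_text(text):
    """Lowercase the text, split on whitespace and drop unwanted words."""
    return " ".join(w for w in text.lower().split() if keep_word(w))


cleaned = clean_text("The coffee was great and 2 cakes with tea :)")
```

The cleaned string can then be passed to get_tag_counts in the same way as contenct.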

Feel free to use this code as a reference.

The demo image is as follows:


