根据地理位置和关键词爬取twitter数据并生成词云

根据地理位置和关键词爬取twitter数据存入MongoDB并生成词云

转载注明出处

  • tweepy获取数据
  • 生成词云

tweepy获取数据

1. 建立model model.py

class twitter_post(Document):
    _id = ObjectIdField(primary_key = True)
    screen_name = StringField(max_length = 128)
    text = StringField(required = True, max_length = 2048)
    text_id = IntField(required = True)
    created_at = DateTimeField(required = True)
    in_reply_to_screen_name = StringField(max_length = 64)
    retweet_count = IntField()
    favorite_count = IntField()
    source = StringField(max_length = 1024)
    longitude = StringField(max_length = 32)
    latitude = StringField(max_length = 32)
    location = StringField(max_length = 256)
    country_code = StringField(max_length = 64)
    lang = StringField(max_length = 4)
    time_zone = StringField(max_length = 64)
    province = StringField(max_length = 64)
    city = StringField(max_length = 64)
    district = StringField(max_length = 64)
    street = StringField(max_length = 64)
    street_number = StringField(max_length = 64)

    meta = {
        'ordering': ['created_at','screen_name'],
        'collection': 'twitter_posts'
    }

2. 访问百度地图接口根据经纬度拿到省市街道信息

import requests
def GetAddress(lon,lat):
    url = 'http://api.map.baidu.com/geocoder/v2/'
    header = {
  'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.79 Safari/537.36'}
    payload = { 'output':'json', 'ak':'pAjezQsQBe8v1c1Lel87r4vprwXiGCEn' }
    payload['location'] = '{0:s},{1:s}'.format(str(lon),str(lat))
    print(lon,lat)
    content = requests.get(url,params=payload,headers=header).json()
    try:
        content = requests.get(url,params=payload,headers=header).json()
        content = content['result']['addressComponent']
        if content['street'] == None:#有一些地理位置街道信息拿不到
            content['street'] = 'NULL'
        if content['street_number'] == None:
            content['street_number'] = 'NULL'
    except:
        content["province"]="NULL"
        content["city"]="NULL"
        content["district"]="NULL"
        content["street"]="NULL"
        content["street_number"]="NULL"
    return content
print(GetAddress(40.07571952, 116.60609467))

下面是三组经纬度拿到的地理位置信息
三个经纬度拿到的信息

3. 访问tweepy开放的接口爬取数据

consumer_key = 'I1XowkiAc72fEp2CXPv0'
consumer_secret = 'drfnZHVUQrq1dyeqepCrbKyGWeYJCeTFQZpkLcXkgKFw3P'
access_key = '936432882482143235-jNLGPsCpZaSqR1D2WarSEshgQcyi'
access_secret = 'YF4ddleSgGxj8BsfmH2DELr7TsNNKAp08ZvqC'

# consumer_key = 'qEgHKHnL55g7k4U9xih'
# consumer_secret= 'QcUDHJS04wK5hrmlxV5C4gweiRPDca9JQoc4gp7ft'
# access_key= '863573499436122112-LA60oJLBzwVnhZjGOUPzRsJc'
# access_secret= '8CKFpp6qyxkAk1KfjWJPoHKloppPrvd7Tjiwllyk'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth_handler
  • 0
    点赞
  • 16
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值