Python: Extracting Tweet Entities from Arbitrary Text

CODE:

#!/usr/bin/python 
# -*- coding: utf-8 -*-

'''
Created on 2014-7-28
@author: guaguastd
@name: extract_from_arbitrary.py
'''

# Note: this script uses Python 2 syntax (raw_input, print statement).
import json

import twitter_text

# search_for_tweet comes from the author's local search module
from search import search_for_tweet

# twitter_login is described at http://blog.csdn.net/guaguastd/article/details/31706155
from login import twitter_login

if __name__ == '__main__':

    # Get an authenticated Twitter API handle
    twitter_api = twitter_login()

    while True:
        query = raw_input('\nInput the query (eg. #MentionSomeoneImportantForYou, exit to quit): ')

        if query == 'exit':
            print 'Exited successfully.'
            break

        # Search for matching tweets and hand the result to the extractor
        statuses = search_for_tweet(twitter_api, query)
        ex = twitter_text.Extractor(statuses)

        # Each call returns a list of dicts holding the entity and its [start, end] indices
        screen_names = ex.extract_mentioned_screen_names_with_indices()
        urls = ex.extract_urls_with_indices()
        hashtags = ex.extract_hashtags_with_indices()

        # Show the first 5 items of each list
        print json.dumps(screen_names[0:5], indent=1)
        print json.dumps(urls[0:5], indent=1)
        print json.dumps(hashtags[0:5], indent=1)
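
To make the shape of the extractor output concrete without Twitter credentials, here is a minimal sketch that produces the same kind of records (entity plus `indices`) using only the standard `re` module. The three patterns and the helper `extract_with_indices` are simplified assumptions written for illustration; they are not the patterns twitter_text uses, which cover many more edge cases. The sample string is made up. The code runs on Python 3.

```python
import json
import re

# Simplified, illustrative patterns (assumptions, NOT twitter_text's own regexes).
MENTION_RE = re.compile(r'(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,20})')
HASHTAG_RE = re.compile(r'(?<![A-Za-z0-9_&])#([A-Za-z0-9_]*[A-Za-z_][A-Za-z0-9_]*)')
URL_RE = re.compile(r'https?://[^\s"\'<>]+')


def extract_with_indices(pattern, text, key, group=0):
    """Return a list of {key: value, 'indices': [start, end]} dicts.

    The indices cover the whole match, including a leading '@' or '#',
    so text[start:end] gives back the entity exactly as it appears.
    """
    results = []
    for match in pattern.finditer(text):
        results.append({
            key: match.group(group),
            'indices': [match.start(), match.end()],
        })
    return results


# Hypothetical sample text, not taken from the search results.
sample = 'hi @alice #python http://example.com/a'

mentions = extract_with_indices(MENTION_RE, sample, 'screen_name', group=1)
hashtags = extract_with_indices(HASHTAG_RE, sample, 'hashtag', group=1)
urls = extract_with_indices(URL_RE, sample, 'url')

print(json.dumps(mentions, indent=1))
print(json.dumps(hashtags, indent=1))
print(json.dumps(urls, indent=1))
```

The indices are offsets into whatever text is handed to the extractor. That matches the original script's output, where each mention span is one character longer than the screen name (the `@`) and offsets run into the thousands because one long text was scanned. The RESULT section that follows is the output of extract_from_arbitrary.py, not of this sketch.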

RESULT:

Input the query (eg. #MentionSomeoneImportantForYou, exit to quit): #MentionSomeoneImportantForYou
Length of statuses 32
[
 {
  "indices": [
   68, 
   78
  ], 
  "screen_name": "ggktyssie"
 }, 
 {
  "indices": [
   113, 
   124
  ], 
  "screen_name": "BE_IBGDRGN"
 }, 
 {
  "indices": [
   180, 
   192
  ], 
  "screen_name": "RMP_IBGDRGN"
 }, 
 {
  "indices": [
   2850, 
   2858
  ], 
  "screen_name": "sdrpxx1"
 }, 
 {
  "indices": [
   2886, 
   2897
  ], 
  "screen_name": "BE_IBGDRGN"
 }
]
[
 {
  "url": "http://twitter.com/download/iphone", 
  "indices": [
   327, 
   361
  ]
 }, 
 {
  "url": "https://abs.twimg.com/images/themes/theme1/bg.png", 
  "indices": [
   1161, 
   1210
  ]
 }, 
 {
  "url": "https://pbs.twimg.com/profile_images/493273088721580035/ITsV9jH-_normal.jpeg", 
  "indices": [
   1297, 
   1373
  ]
 }, 
 {
  "url": "http://pbs.twimg.com/profile_images/493273088721580035/ITsV9jH-_normal.jpeg", 
  "indices": [
   1877, 
   1952
  ]
 }, 
 {
  "url": "https://pbs.twimg.com/profile_banners/2673775045/1406340167", 
  "indices": [
   2024, 
   2083
  ]
 }
]
[
 {
  "indices": [
   149, 
   179
  ], 
  "hashtag": "MentionSomeoneImportantForYou"
 }, 
 {
  "indices": [
   2923, 
   2953
  ], 
  "hashtag": "MentionSomeoneImportantForYou"
 }, 
 {
  "indices": [
   5830, 
   5860
  ], 
  "hashtag": "MentionSomeoneImportantForYou"
 }, 
 {
  "indices": [
   8495, 
   8525
  ], 
  "hashtag": "MentionSomeoneImportantForYou"
 }, 
 {
  "indices": [
   11197, 
   11227
  ], 
  "hashtag": "MentionSomeoneImportantForYou"
 }
]

Input the query (eg. #MentionSomeoneImportantForYou, exit to quit): 
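
The first five mentions above contain BE_IBGDRGN twice, and all five hashtags are the query itself, so a natural next step is to tally the extracted entities. The sketch below does that with `collections.Counter`. The helper name `count_entities` is an assumption for illustration, and the input lists are just the first five records copied from the output above, not the full result set. Values are lowercased before counting because screen names and hashtags are case-insensitive on Twitter.

```python
from collections import Counter

# First five mentions and hashtags, copied from the RESULT output above.
screen_names = [
    {"indices": [68, 78], "screen_name": "ggktyssie"},
    {"indices": [113, 124], "screen_name": "BE_IBGDRGN"},
    {"indices": [180, 192], "screen_name": "RMP_IBGDRGN"},
    {"indices": [2850, 2858], "screen_name": "sdrpxx1"},
    {"indices": [2886, 2897], "screen_name": "BE_IBGDRGN"},
]
hashtags = [
    {"indices": [149, 179], "hashtag": "MentionSomeoneImportantForYou"},
    {"indices": [2923, 2953], "hashtag": "MentionSomeoneImportantForYou"},
    {"indices": [5830, 5860], "hashtag": "MentionSomeoneImportantForYou"},
    {"indices": [8495, 8525], "hashtag": "MentionSomeoneImportantForYou"},
    {"indices": [11197, 11227], "hashtag": "MentionSomeoneImportantForYou"},
]


def count_entities(entities, key):
    """Count how often each value of `key` occurs, ignoring case."""
    return Counter(entity[key].lower() for entity in entities)


mention_counts = count_entities(screen_names, "screen_name")
hashtag_counts = count_entities(hashtags, "hashtag")

# Print entities from most to least frequent.
for name, count in mention_counts.most_common():
    print("@%s: %d" % (name, count))
for tag, count in hashtag_counts.most_common():
    print("#%s: %d" % (tag, count))
```

In the original script, the same helper could be applied to the full `screen_names` and `hashtags` lists instead of the five-item slices.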

