Extracting specific text in Python: extracting Twitter tweet entities from arbitrary text

CODE:

#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
Created on 2014-7-28
@author: guaguastd
@name: extract_from_arbitrary.py
'''

# Note: this is a Python 2 script (it uses raw_input and the print statement).

if __name__ == '__main__':

    import json

    # search_for_tweet is the author's own helper module (not shown here)
    from search import search_for_tweet

    # login helper, see http://blog.csdn.net/guaguastd/article/details/31706155
    from login import twitter_login

    # extractor for mentions, URLs and hashtags
    import twitter_text

    # get the twitter access api
    twitter_api = twitter_login()

    while True:
        query = raw_input('\nInput the query (eg. #MentionSomeoneImportantForYou, exit to quit): ')

        if query == 'exit':
            print 'Successfully exit!'
            break

        # Judging by the offsets and URLs in the result, search_for_tweet
        # presumably returns the statuses as one block of text.
        statuses = search_for_tweet(twitter_api, query)

        ex = twitter_text.Extractor(statuses)
        screen_names = ex.extract_mentioned_screen_names_with_indices()
        urls = ex.extract_urls_with_indices()
        hashtags = ex.extract_hashtags_with_indices()

        # Explore the first 5 items for each...
        print json.dumps(screen_names[0:5], indent=1)
        print json.dumps(urls[0:5], indent=1)
        print json.dumps(hashtags[0:5], indent=1)
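
For readers who do not have twitter_text installed, the idea behind the three `extract_*_with_indices()` calls can be approximated with the standard library. The following is only a minimal sketch written for Python 3: the regular expressions, the helper name `extract_with_indices`, and the sample text are all assumptions made for illustration, and the patterns are far looser than the rules twitter_text itself applies.

```python
import re

# Simplified patterns, assumed for illustration only. The real twitter_text
# rules are stricter (for example about what may precede "@" or "#").
MENTION_RE = re.compile(r"@(\w{1,15})")
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_with_indices(pattern, text, key):
    """Return a list of {key: value, "indices": [start, end]} dicts.

    The value is the first capture group if the pattern has one,
    otherwise the whole match. The indices cover the whole match,
    including a leading "@" or "#".
    """
    results = []
    for match in pattern.finditer(text):
        value = match.group(1) if pattern.groups else match.group(0)
        results.append({key: value, "indices": [match.start(), match.end()]})
    return results


# Hypothetical sample text, not taken from the search result.
sample = "RT @guaguastd: reading #Python docs at http://example.com/a now"

mentions = extract_with_indices(MENTION_RE, sample, "screen_name")
hashtags = extract_with_indices(HASHTAG_RE, sample, "hashtag")
urls = extract_with_indices(URL_RE, sample, "url")

print(mentions)
print(hashtags)
print(urls)
```

Because the function only needs a string, it can be pointed at any text, which is the same property the script above relies on.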

RESULT:

Input the query (eg. #MentionSomeoneImportantForYou, exit to quit): #MentionSomeoneImportantForYou
Length of statuses 32
[
 {
  "indices": [
   68,
   78
  ],
  "screen_name": "ggktyssie"
 },
 {
  "indices": [
   113,
   124
  ],
  "screen_name": "BE_IBGDRGN"
 },
 {
  "indices": [
   180,
   192
  ],
  "screen_name": "RMP_IBGDRGN"
 },
 {
  "indices": [
   2850,
   2858
  ],
  "screen_name": "sdrpxx1"
 },
 {
  "indices": [
   2886,
   2897
  ],
  "screen_name": "BE_IBGDRGN"
 }
]
[
 {
  "url": "http://twitter.com/download/iphone",
  "indices": [
   327,
   361
  ]
 },
 {
  "url": "https://abs.twimg.com/images/themes/theme1/bg.png",
  "indices": [
   1161,
   1210
  ]
 },
 {
  "url": "https://pbs.twimg.com/profile_images/493273088721580035/ITsV9jH-_normal.jpeg",
  "indices": [
   1297,
   1373
  ]
 },
 {
  "url": "http://pbs.twimg.com/profile_images/493273088721580035/ITsV9jH-_normal.jpeg",
  "indices": [
   1877,
   1952
  ]
 },
 {
  "url": "https://pbs.twimg.com/profile_banners/2673775045/1406340167",
  "indices": [
   2024,
   2083
  ]
 }
]
[
 {
  "indices": [
   149,
   179
  ],
  "hashtag": "MentionSomeoneImportantForYou"
 },
 {
  "indices": [
   2923,
   2953
  ],
  "hashtag": "MentionSomeoneImportantForYou"
 },
 {
  "indices": [
   5830,
   5860
  ],
  "hashtag": "MentionSomeoneImportantForYou"
 },
 {
  "indices": [
   8495,
   8525
  ],
  "hashtag": "MentionSomeoneImportantForYou"
 },
 {
  "indices": [
   11197,
   11227
  ]
  ,
  "hashtag": "MentionSomeoneImportantForYou"
 }
]

Input the query (eg. #MentionSomeoneImportantForYou, exit to quit):
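
Judging from the numbers in this result, each `indices` pair appears to be a start offset and an exclusive end offset into the input text, with the leading "@" or "#" included in the span: for example 78 - 68 = 10, which is the length of "@ggktyssie". The sketch below (Python 3) checks that reading against the mention entries printed above and shows how such a pair maps back to a substring. The helper names `entity_text` and `span_matches`, and the short sample text, are assumptions introduced for illustration.

```python
def entity_text(text, entity):
    """Return the slice of text that an entity's indices point at."""
    start, end = entity["indices"]
    return text[start:end]


def span_matches(entity, prefix, key):
    """True if the span length equals len(prefix + entity value)."""
    start, end = entity["indices"]
    return end - start == len(prefix + entity[key])


# Mention entries copied from the result above.
mentions = [
    {"indices": [68, 78], "screen_name": "ggktyssie"},
    {"indices": [113, 124], "screen_name": "BE_IBGDRGN"},
    {"indices": [180, 192], "screen_name": "RMP_IBGDRGN"},
    {"indices": [2850, 2858], "screen_name": "sdrpxx1"},
    {"indices": [2886, 2897], "screen_name": "BE_IBGDRGN"},
]
all_consistent = all(span_matches(m, "@", "screen_name") for m in mentions)
print(all_consistent)

# Hypothetical sample text with one mention starting at offset 7.
sample = "thanks @ggktyssie #MentionSomeoneImportantForYou"
entity = {"indices": [7, 17], "screen_name": "ggktyssie"}
print(entity_text(sample, entity))
```

Slicing with the pair directly, as `entity_text` does, is what makes the offsets useful for highlighting or replacing entities in the original text.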
