Techniques for working with Twitter data (continuously updated)
When downloading data through the Twitter API returns an error
https://developer.twitter.com/en/support/twitter-api/error-troubleshooting
Some of the main errors:
404 V2
- The URI requested is invalid or the resource requested, such as a user, does not exist.
- Check that you are using valid parameters and the correct URI for the endpoint you’re using.
406 V2
- Returned when an invalid format is specified in the request.
- Generally, this occurs where your client fails to properly include the headers to accept gzip encoding, but can occur in other circumstances as well.
429 V2 (the most frustrating error)
- Too Many Requests
- Returned when a request cannot be served due to the App’s rate limit having been exhausted for the resource. See Rate Limiting.
- Be very careful with 429 errors: write the defensive handling code (wait and retry) before starting a long download.
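A minimal sketch of such defensive code, using only the standard library. `RateLimitError` and `call_with_backoff` are hypothetical names made up for this note, not part of any Twitter client; in real code you would raise or catch whatever your client reports for HTTP 429.

```python
import time


class RateLimitError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds to wait, if the server provided one


def call_with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimitError wait and retry, doubling the delay each time."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Prefer the server-provided wait time when there is one.
            wait = err.retry_after if err.retry_after is not None else delay
            sleep(wait)
            delay *= 2
```

If you use Tweepy, passing `wait_on_rate_limit=True` to `tweepy.API` is a built-in alternative that sleeps until the rate-limit window resets.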
Explanations of the causes of other errors: https://developer.twitter.com/en/support/twitter-api/error-troubleshooting
Removing URLs from tweets with regular expressions
https://stackoverflow.com/questions/24399820/expression-to-remove-url-links-from-twitter-tweet
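A rough sketch of the idea with Python's `re` module. The pattern is an assumption chosen for illustration (anything starting with `http://`, `https://`, or `www.` up to the next whitespace), not necessarily the one from the linked answer, and `remove_urls` is a made-up helper name.

```python
import re

# Assumed pattern: a URL runs from its scheme (or "www.") to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def remove_urls(text):
    """Delete URLs, then collapse the leftover runs of whitespace."""
    without_urls = URL_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_urls).strip()


print(remove_urls("check this https://t.co/abc123 now"))  # check this now
```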
Cleaning tweet text with regular expressions
https://github.com/ziishaned/learn-regex/blob/master/translations/README-cn.md
- Remove RT, @user_name, URLs, and emoji
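One possible way to chain these removals, as a sketch. All patterns and the `clean_tweet` name are assumptions for illustration; in particular the emoji ranges cover the common Unicode blocks only and will miss some symbols.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")
RT_PATTERN = re.compile(r"^\s*RT\b\s*")    # leading retweet marker
MENTION_PATTERN = re.compile(r"@\w+:?")    # @user_name, plus the colon after "RT @user:"
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)


def clean_tweet(text):
    """Strip URLs, the RT marker, mentions, and emoji, then tidy whitespace."""
    for pattern in (URL_PATTERN, RT_PATTERN, MENTION_PATTERN, EMOJI_PATTERN):
        text = pattern.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_tweet("RT @user_name: Great news 🎉 https://t.co/xyz"))  # Great news
```

URLs are removed first so that an `@` inside a link is not mistaken for a mention.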
Twython API
https://twython.readthedocs.io/en/latest/api.html?highlight=rate#twython.Twython.get_application_rate_limit_status
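A sketch of how the result of `get_application_rate_limit_status` might be used to decide how long to wait. It assumes the returned dictionary follows the Twitter API v1.1 layout (`resources` -> resource family -> endpoint -> `limit` / `remaining` / `reset`). The `seconds_until_reset` helper and the sample payload are invented for illustration, so no network call is needed to try it.

```python
import time


def seconds_until_reset(rate_status, family, endpoint, now=None):
    """Return 0 if calls remain, otherwise the seconds until the window resets."""
    info = rate_status["resources"][family][endpoint]
    if info["remaining"] > 0:
        return 0
    now = time.time() if now is None else now
    return max(0, info["reset"] - now)


# Made-up sample in the assumed v1.1 shape; real values come from the API.
sample = {
    "resources": {
        "statuses": {
            "/statuses/show/:id": {"limit": 900, "remaining": 0, "reset": 1_700_000_900}
        }
    }
}

print(seconds_until_reset(sample, "statuses", "/statuses/show/:id", now=1_700_000_000))  # 900
```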
Tweepy API
https://docs.tweepy.org/en/v3.5.0/api.html#api-reference
Avoid Twitter API limitation with Tweepy
https://stackoverflow.com/questions/21308762/avoid-twitter-api-limitation-with-tweepy
http://62.234.115.194/ask/127450325.html
tweet = api.get_status(id_of_tweet)  # Tweepy
tweet = api.show_status(id=id_of_tweet)  # Twython (endpoint methods take keyword arguments)