Collecting / Storing Tweets with Python and MySQL

A few days ago, I published Collecting / Storing Tweets with Python and MongoDB. In that post, I describe the steps needed to collect and store tweets gathered via the Twitter Streaming API.

I received a comment on that post asking how to store data into MySQL instead of MongoDB. Here’s what you’d need to do to make that change.

Collecting / Storing Tweets with Python and MySQL

In the previous post, we stored the Twitter data into MongoDB with one line of code:

db.twitter_search.insert(datajson)

Unlike MongoDB, where we can insert the entire JSON object, using MySQL requires some additional work on the ‘datajson’ object before storing it.

Let’s assume that we are interested in capturing just the username, date, Tweet and Tweet ID from Twitter.  This is most likely the bare minimum information you’d want to capture from the API…there are many (many) more fields available, but for now we’ll go with these four.

Note: I’ll hold off on the MySQL-specific changes for now and touch on them shortly.

Once you capture the tweet (line 38 in my original script) and have it stored in your datajson object, create a few variables to store the date, username, Tweet and ID.

text = datajson['text']
screen_name = datajson['user']['screen_name']
tweet_id = datajson['id']
created_at = parser.parse(datajson['created_at'])

Note in the above that we are using the parser.parse() command from the dateutil module to parse the created_at date for storage into MySQL.
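
For example, here is roughly what parser.parse() does with the created_at string Twitter returns (the timestamp below is just a sample value):

from dateutil import parser

# A sample created_at value in Twitter's date format
created_at = parser.parse("Wed Jan 04 17:31:01 +0000 2017")
print(created_at)        # 2017-01-04 17:31:01+00:00
print(type(created_at))  # <class 'datetime.datetime'>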

Now that we have our variables ready to go, we’ll need to store those variables into MySQL.  Before we can do that, we need to set up a MySQL connection. I use the python-mysql connector, but you are free to use whatever you prefer.  You’ll need to do an

import MySQLdb

to get the connector imported into your script (assuming you installed the connector with pip install mysql-python).
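
Before wiring the connection into the stream, you may want a quick sanity check that the connector and your credentials work. A minimal sketch, assuming HOST, USER, PASSWD and DATABASE are placeholders you define yourself (they appear in the full script below):

import MySQLdb

# Connect, ask the server for its version, then clean up
db = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
cursor = db.cursor()
cursor.execute("SELECT VERSION()")
print(cursor.fetchone())   # e.g. ('5.7.17',)
cursor.close()
db.close()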

You’ll need to create a table to store this data. You can use the SQL statement below to do so if you need assistance / guidance.

CREATE TABLE `twitter` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `tweet_id` varchar(250) DEFAULT NULL,
  `screen_name` varchar(128) DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `text` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
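
If you’d rather create the table from Python than from a MySQL client, a minimal sketch using the same connector might look like this (HOST, USER, PASSWD and DATABASE are the placeholders defined in the full script below):

import MySQLdb

# Same table definition as above, run through the connector
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS `twitter` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `tweet_id` varchar(250) DEFAULT NULL,
  `screen_name` varchar(128) DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `text` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
"""

db = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
cursor = db.cursor()
cursor.execute(CREATE_SQL)
cursor.close()
db.close()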

Now, let’s set up the MySQL connection, the insert query and the execute/commit steps for the script. I’ll wrap this in a function so it can be re-used for each tweet captured.

def store_data(created_at, text, screen_name, tweet_id):
    # Open a connection, insert the tweet as a row, commit and clean up
    db=MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
    cursor = db.cursor()
    insert_query = "INSERT INTO twitter (tweet_id, screen_name, created_at, text) VALUES (%s, %s, %s, %s)"
    cursor.execute(insert_query, (tweet_id, screen_name, created_at, text))
    db.commit()
    cursor.close()
    db.close()
    return
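
To sanity-check the function outside the stream, you could call it once with hand-made values (the sample tweet below is made up; it assumes the twitter table above exists):

from dateutil import parser

store_data(
    created_at=parser.parse("Wed Jan 04 17:31:01 +0000 2017"),
    text="Test tweet #bigdata",
    screen_name="example_user",
    tweet_id=817000000000000000,
)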

That’s it.  You are now collecting tweets and storing those tweets into a MySQL database.
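
If you want to confirm rows are landing in the table, a quick read-back sketch (same connector and connection settings as above):

import MySQLdb

# Pull back the five most recent tweets stored in the table
db = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
cursor = db.cursor()
cursor.execute("SELECT screen_name, created_at, text FROM twitter ORDER BY created_at DESC LIMIT 5")
for screen_name, created_at, text in cursor.fetchall():
    print(screen_name, created_at, text)
cursor.close()
db.close()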

Full Script

from __future__ import print_function
import tweepy
import json
import MySQLdb 
from dateutil import parser

WORDS = ['#bigdata', '#AI', '#datascience', '#machinelearning', '#ml', '#iot']

CONSUMER_KEY = "KEY"
CONSUMER_SECRET = "SECRET"
ACCESS_TOKEN = "TOKEN"
ACCESS_TOKEN_SECRET = "TOKEN_SECRET"

HOST = "YOUR_DATABASE_HOST"
USER = "YOUR_DATABASE_USER"
PASSWD = "YOUR_DATABASE_PASSWORD"
DATABASE = "YOUR_DATABASE"

# This function takes the 'created_at', 'text', 'screen_name' and 'tweet_id' and stores it
# into a MySQL database
def store_data(created_at, text, screen_name, tweet_id):
    db=MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
    cursor = db.cursor()
    insert_query = "INSERT INTO twitter (tweet_id, screen_name, created_at, text) VALUES (%s, %s, %s, %s)"
    cursor.execute(insert_query, (tweet_id, screen_name, created_at, text))
    db.commit()
    cursor.close()
    db.close()
    return

class StreamListener(tweepy.StreamListener):    
    #This is a class provided by tweepy to access the Twitter Streaming API. 

    def on_connect(self):
        # Called initially to connect to the Streaming API
        print("You are now connected to the streaming API.")
 
    def on_error(self, status_code):
        # On error - if an error occurs, display the error / status code
        print('An Error has occurred: ' + repr(status_code))
        return False
 
    def on_data(self, data):
        #This is the meat of the script...it connects to your MySQL database and stores the tweet
        try:
           # Decode the JSON from Twitter
            datajson = json.loads(data)
            
            #grab the wanted data from the Tweet
            text = datajson['text']
            screen_name = datajson['user']['screen_name']
            tweet_id = datajson['id']
            created_at = parser.parse(datajson['created_at']) 

            #print out a message to the screen that we have collected a tweet
            print("Tweet collected at " + str(created_at))
            
            #insert the data into the MySQL database
            store_data(created_at, text, screen_name, tweet_id)
        
        except Exception as e:
            print(e)

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
#Set up the listener. The 'wait_on_rate_limit=True' is needed to help with Twitter API rate limiting.
listener = StreamListener(api=tweepy.API(wait_on_rate_limit=True)) 
streamer = tweepy.Stream(auth=auth, listener=listener)
print("Tracking: " + str(WORDS))
streamer.filter(track=WORDS)

Translated from: https://www.pybloggers.com/2017/01/collecting-storing-tweets-with-python-and-mysql/
