Scraping weather data with Python 3 and storing it in a MySQL database

Latest recommended article published on 2024-07-24 14:36:42

青衫故人旧33

Category: Python web scraping. Tags: python, mysql, storing scraped data

Article link: https://blog.csdn.net/JiShun_Wang/article/details/99709521

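The point of a scraper is usually to collect data. When the volume is small, a CSV file is enough. When the volume is large, a database is worth considering: the data is easier to keep, and queries are faster. This post shows how to store scraped data in a database. For more on MySQL, see the author's MySQL学习 (MySQL study) posts.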

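This post uses historical weather data as the example and stores the scraped records in a database. The two key steps are connecting to the database and writing the data into it. First, create a table in one of your MySQL databases. Then connect with the pymysql package by calling pymysql.connect(host="localhost", user="root", passwd="123456", db="asset", charset='utf8'). The function accepts many parameters: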
pymysql.connections.Connection(self,
host=None, # host address to connect to
user=None, # user name to log in as
password='', # the user's password
database=None, # name of the database to use
port=0, # port number; MySQL's default is 3306
unix_socket=None, # use a Unix socket instead of TCP/IP
charset='', # character set
sql_mode=None, # Default SQL_MODE to use.
read_default_file=None, # read parameters from this configuration file (my.ini or my.cnf)
conv=None, # conversion dictionary
use_unicode=None, # whether to use Unicode strings
client_flag=0, # Custom flags to send to MySQL. Find potential values in constants.CLIENT.
cursorclass=<class 'pymysql.cursors.Cursor'>, # cursor class to use
init_command=None, # initial SQL statement to run when the connection is established
connect_timeout=10, # connection timeout in seconds (default: 10, min: 1, max: 31536000)
ssl=None, # A dict of arguments similar to mysql_ssl_set()'s parameters. For now the capath and cipher arguments are not supported.
read_default_group=None, # Group to read from in the configuration file.
compress=None, # not supported
named_pipe=None, # not supported
no_delay=None,
autocommit=False, # whether to commit transactions automatically
db=None, # alias for database, for compatibility with MySQLdb
passwd=None, # alias for password, for compatibility with MySQLdb
local_infile=False, # whether LOAD DATA LOCAL INFILE is allowed
max_allowed_packet=16777216, # maximum packet size in bytes; used to limit `LOAD DATA LOCAL INFILE` packets
defer_connect=False, # Don't explicitly connect on construction - wait for connect call.
auth_plugin_map={},
read_timeout=None, # timeout in seconds for reading from the connection
write_timeout=None, # timeout in seconds for writing to the connection
bind_address=None # when the client has several network interfaces, the one to connect from
)
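
As a rough illustration of how a few of these parameters might be passed, and of the table that the text says must exist beforehand, the sketch below may help. The original post does not show the table definition, so the column types and lengths in `CREATE_TABLE_SQL` and the `id` column are assumptions, and the helper names `connection_settings`, `open_connection` and `ensure_table` are invented for illustration. The credentials are the placeholder values used in this post.

```python
# Hypothetical table definition: the column names follow the INSERT statement
# used in this post, but the types, lengths and the id column are assumptions.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS wether_test (
    id INT AUTO_INCREMENT PRIMARY KEY,
    time_local VARCHAR(64),
    link VARCHAR(255),
    wether_type VARCHAR(64),
    temperature VARCHAR(64),
    wind_power VARCHAR(128)
) DEFAULT CHARSET = utf8
"""


def connection_settings(host="localhost", user="root",
                        password="123456", database="asset"):
    """Collect the keyword arguments for pymysql.connect in one place."""
    return {
        "host": host,
        "user": user,
        "password": password,   # same role as the passwd alias
        "database": database,   # same role as the db alias
        "port": 3306,           # MySQL's usual port
        "charset": "utf8",
        "connect_timeout": 10,  # seconds
        "autocommit": False,    # commit explicitly after writing
    }


def open_connection(settings):
    """Open a connection. pymysql is imported here so the other helpers
    can be used without the driver installed."""
    import pymysql
    return pymysql.connect(**settings)


def ensure_table(connection):
    """Create the target table if it does not exist yet."""
    cursor = connection.cursor()
    try:
        cursor.execute(CREATE_TABLE_SQL)
    finally:
        cursor.close()
    connection.commit()
```

With a MySQL server running, usage would be along the lines of db = open_connection(connection_settings()), then ensure_table(db), then db.close().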

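There are two ways to write data into the database:
1. Put the values into the statement as literals:
sql = """insert into wether_test(time_local, link, wether_type, temperature, wind_power) \
values('value1', 'value2', 'value3', 'value4', 'value5')"""

2. Use %s placeholders and pass the values to cursor.execute, which lets the driver escape them:
sql = """insert into wether_test(time_local, link, wether_type, temperature, wind_power) \
values(%s, %s, %s, %s, %s)"""
cursor.execute(sql,(title,href,wether,wendu,fengli))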
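
To make the difference between the two forms concrete, here is a small sketch. It does not touch a database. It only builds the statement text that the first form would produce and shows how a value containing a quote character corrupts it, while the second form leaves the statement unchanged and hands the values over separately. The helper names `literal_sql` and `parameterized` and the sample row are invented for illustration.

```python
COLUMNS = "time_local, link, wether_type, temperature, wind_power"

# Form 2: the statement text never changes; the values travel separately.
PARAM_SQL = "insert into wether_test(" + COLUMNS + ") values(%s, %s, %s, %s, %s)"


def literal_sql(row):
    """Form 1: paste the values into the statement as quoted literals."""
    quoted = ", ".join("'" + str(value) + "'" for value in row)
    return "insert into wether_test(" + COLUMNS + ") values(" + quoted + ")"


def parameterized(row):
    """Form 2: return the (sql, args) pair that cursor.execute expects."""
    return PARAM_SQL, tuple(row)


# Hypothetical row; the title contains a single quote on purpose.
row = ("Jinan's weather 2017-01-01", "/lishi/jinan/20170101.html",
       "sunny", "5C/-3C", "north wind 1-2")

broken = literal_sql(row)
# Five values should contribute ten quote characters. The extra one inside
# the title makes the count odd, so the statement no longer parses.
print(broken.count("'"))  # 11

sql, args = parameterized(row)
print(sql.count("%s"), len(args))  # 5 5
```

With pymysql, the second pair would be passed as cursor.execute(sql, args), so each value is escaped by the driver instead of by hand.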

The full code follows.

"""
Scrape weather data
"""
import pymysql
import requests
from bs4 import BeautifulSoup

db = pymysql.connect(host="localhost", user="root", passwd="123456", db="asset", charset='utf8' )
cursor = db.cursor()


# Fetch a page and parse it
def get_html(url):
    html = requests.get(url)
    html.encoding = 'gb2312'
    soup = BeautifulSoup(html.text, 'lxml')
    return soup

year = ['2017','2018']

month = ['01', '02', '03', '04','05', '06', '07', '08', '09', '10', '11', '12']


dates = [y + m for y in year for m in month]  # e.g. '201701'
for date in dates:
    url = 'http://www.tianqihoubao.com/lishi/jinan/month/'+ date +'.html'
    soup = get_html(url)
    sup = soup.find('table',attrs={'class':'b'})
    tr = sup.find_all('tr')
    for trl in tr[1:]:
        td = trl.find_all('td')
        href = td[0].find('a')['href']  # link
        title = td[0].find('a')['title']  # title
        wether = td[1].get_text().replace('\r\n','').replace(' ','')  # weather conditions
        wendu = td[2].get_text().strip().replace(' ','').replace('\r\n','')  # temperature
        fengli = td[3].get_text().strip().replace(' ','').replace('\r\n','')  # wind force

        sql = """insert into wether_test(time_local, link, wether_type, temperature, wind_power) \
                values(%s, %s, %s, %s, %s)"""
        cursor.execute(sql,(title,href,wether,wendu,fengli))
        db.commit()
    print("  Finished scraping data for " + date)
db.close()  # close() must be called; a bare db.close does nothing
print('Done')
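
One possible variation on the loop above is to collect each month's rows first and write them with a single `executemany` call and one commit, rolling back if the write fails. The sketch below assumes a pymysql-style connection object. `clean_cell` and `save_month` are hypothetical helper names, and `clean_cell` removes all whitespace, which is slightly broader than the `replace` calls in the original code.

```python
INSERT_SQL = (
    "insert into wether_test"
    "(time_local, link, wether_type, temperature, wind_power) "
    "values(%s, %s, %s, %s, %s)"
)


def clean_cell(text):
    """Remove every whitespace character (spaces, tabs, line breaks)."""
    return "".join(text.split())


def save_month(connection, rows):
    """Insert all rows for one month in a single transaction.

    `connection` is assumed to behave like a pymysql connection:
    cursor(), commit() and rollback() are the only methods used.
    Returns the number of rows handed to the driver.
    """
    rows = [tuple(row) for row in rows]
    if not rows:
        return 0
    cursor = connection.cursor()
    try:
        cursor.executemany(INSERT_SQL, rows)
        connection.commit()
    except Exception:
        connection.rollback()  # leave the table unchanged on failure
        raise
    finally:
        cursor.close()
    return len(rows)
```

In the scraping loop, each parsed row would be appended to a list, and save_month(db, rows) would be called once per month instead of committing after every row.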

Result of running the program: