Scraping News Titles and Links with Python and Storing Them in MySQL (with Source Code)

Latest recommended article published on 2024-06-08 11:00:00

陈同学q

Category: Python    Tags: web crawler, python, mysql, database data persistence

Article link: https://blog.csdn.net/qq_54528857/article/details/122273289

Target page: https://www.tsinghua.edu.cn/news.htm/

I. First, fetch the data and hold it temporarily in a list.
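A minimal sketch of step I, assuming only the standard library: it swaps lxml/XPath for Python's built-in `html.parser` so it runs without third-party packages. The class name `NewsLinkParser` and the HTML snippet are hypothetical and do not reproduce the real Tsinghua page; the point is the shape of the result, a list of `[title, url]` pairs ready for step II.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NewsLinkParser(HTMLParser):
    """Collect [title, url] pairs from every <a href=...> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rows = []      # the temporary list that step II writes to MySQL
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            # Resolve relative links against the page URL
            self.rows.append([title, urljoin(self.base_url, self._href)])
            self._href = None


# Hypothetical sample markup standing in for the real news page
sample_html = '''
<ul>
  <li><a href="info/1177/1001.htm">Sample headline one</a></li>
  <li><a href="https://news.example.edu/2002.htm">Sample headline two</a></li>
</ul>
'''

parser = NewsLinkParser('https://www.tsinghua.edu.cn/news.htm')
parser.feed(sample_html)
rows = parser.rows
print(rows)
```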

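II. Store the data in MySQL:
    1. Create the connection
    2. Create a cursor
    3. Pass in the parameters and execute the statement
    4. Commit the data (to MySQL)
    5. Close the cursor and the connection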
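These five steps follow the Python DB-API 2.0 pattern that pymysql implements. The sketch below uses the standard library's `sqlite3` with an in-memory database as a stand-in so it runs without a MySQL server; the visible difference is the placeholder style (`?` in sqlite3, `%s` in pymysql). The function name `save_rows`, the `UNIQUE` constraint and the sample rows are hypothetical, chosen so that one insert fails and the rollback path is exercised.

```python
import sqlite3


def save_rows(conn, rows):
    """Insert [title, url] pairs one by one; return the 1-based numbers of failed rows."""
    failed = []
    # 2. Create a cursor
    cur = conn.cursor()
    try:
        for n, (title, url) in enumerate(rows, start=1):
            try:
                # 3. Pass in the parameters and execute the statement
                #    (sqlite3 uses ? placeholders; pymysql uses %s)
                cur.execute('INSERT INTO list(title, url) VALUES (?, ?)', (title, url))
                # 4. Commit the data
                conn.commit()
            except sqlite3.Error:
                # Undo the failed insert and remember which row it was
                conn.rollback()
                failed.append(n)
    finally:
        # 5. Close the cursor (the caller closes the connection)
        cur.close()
    return failed


# 1. Create the connection (in-memory SQLite standing in for MySQL)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE list (title TEXT NOT NULL, url TEXT NOT NULL UNIQUE)')
rows = [
    ['Sample headline one', 'https://www.tsinghua.edu.cn/info/1177/1001.htm'],
    ['Duplicate link', 'https://www.tsinghua.edu.cn/info/1177/1001.htm'],  # violates UNIQUE
    ['Sample headline two', 'https://www.tsinghua.edu.cn/info/1177/1002.htm'],
]
failed = save_rows(conn, rows)
stored = conn.execute('SELECT COUNT(*) FROM list').fetchone()[0]
conn.close()
print(failed, stored)
```

Committing after every row keeps one bad record from discarding the others, at the cost of one round trip per commit on a real server.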

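Note: before running the code, create the table in MySQL and make sure the code matches the database in three respects:
    1. The username and password
    2. The database name
    3. The table name and column names used in the insert statement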
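For point 3, the table must exist under the name `list` with the columns `title` and `url` that the insert statement names. A minimal sketch, again using in-memory `sqlite3` as a stand-in for MySQL, that creates such a table and checks the insert's column names before any data is written. The helper `missing_columns` is hypothetical; on MySQL the same column list can be read from `information_schema.COLUMNS`, and types such as `VARCHAR` would replace `TEXT`.

```python
import sqlite3

# Table and column names used by the insert statement in the script
TABLE = 'list'
INSERT_COLUMNS = ('title', 'url')


def missing_columns(conn, table, columns):
    """Return the names in `columns` that the table does not have (hypothetical helper)."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info({table})')}
    return [c for c in columns if c not in existing]


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE list (id INTEGER PRIMARY KEY, title TEXT, url TEXT)')
ok = missing_columns(conn, TABLE, INSERT_COLUMNS)     # names match the table
bad = missing_columns(conn, TABLE, ('title', 'link'))  # 'link' is not a column
conn.close()
print(ok, bad)
```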

Run result: [screenshot]

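Source code:

```python
from urllib.parse import urljoin

import pymysql
import requests
from lxml import etree


# URL to request
url = 'https://www.tsinghua.edu.cn/news.htm'
# Request headers: the header name is "User-Agent"
# ("user_agent" would be sent as an unrelated custom header)
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36'
}
# Get the response; the timeout and status check make network failures visible
response = requests.get(url=url, headers=header, timeout=10)
response.raise_for_status()
# Decode the page bytes as UTF-8
chi = response.content.decode('utf-8')
# Parse the HTML
html = etree.HTML(chi)
# Select the news links with XPath (this absolute path breaks if the page layout changes)
data = html.xpath('/html/body/div[6]/div/div/ul/li/div[3]/a')

# I. Collect every [title, url] pair in one list
#    (named rows so it does not shadow the built-in list)
rows = []
for a in data:
    # Title: the first text node inside the <a> tag
    texts = a.xpath('.//text()')
    title_text = texts[0].strip() if texts else ''
    # URL: resolve relative links against the page URL to get a complete URL
    title_url = urljoin(url, a.xpath('./@href')[0])
    rows.append([title_text, title_url])
    print([title_text, title_url])

# II. Store the data in MySQL
# 1. Create the connection (once for all rows)
conn = pymysql.connect(
    host='127.0.0.1',  # local MySQL
    user='root',  # username
    password='00000',  # password
    port=3306,  # port; 3306 is the default, so it can be omitted
    database='gradem',  # database name
    charset='utf8'  # encoding
)
# 2. Create a cursor
cur = conn.cursor()
try:
    for n, (title, link) in enumerate(rows, start=1):
        try:
            # 3. Pass in the parameters and execute the statement
            cur.execute('insert into list(title, url) values(%s, %s)', (title, link))
            # 4. Commit the data (to MySQL)
            conn.commit()
        except pymysql.MySQLError as e:
            # Roll back the failed insert and report which row failed
            conn.rollback()
            print(f'Row {n} could not be stored: {e}')
finally:
    # 5. Close the cursor and the connection
    cur.close()
    conn.close()
```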
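The original version of the script completed relative links by prepending the site root whenever the string `https` did not appear in the href. The sketch below contrasts that rule with `urllib.parse.urljoin`, which resolves the href against the page it was found on; the function names `concat_url` and `join_url` and the sample hrefs are hypothetical.

```python
from urllib.parse import urljoin

BASE = 'https://www.tsinghua.edu.cn/news.htm'


def concat_url(href):
    # String rule: prepend the site root unless 'https' occurs anywhere in the link
    if 'https' not in href:
        return 'https://www.tsinghua.edu.cn/' + href
    return href


def join_url(href):
    # Standard resolution of the href relative to the page URL
    return urljoin(BASE, href)


for href in ('info/1177/1.htm', '/info/1177/1.htm', 'http://example.com/a'):
    print(href, concat_url(href), join_url(href))
```

For a plain relative link both give the same result. They differ for a root-relative link (the string rule produces a double slash) and for an `http://` link (the string rule prepends the site root to an already absolute URL).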