A few days ago I learned how to store scraped data in a database. Below are some problems I ran into and what I took away from them.
Connect to the database and create the table.
import pymysql

# Connect to the database (keyword arguments; the positional form was removed in pymysql 1.0)
db = pymysql.connect(host="localhost", user="mysql", password="zhylovezyy",
                     database="test", charset="utf8")
# Get an operation cursor with cursor()
cursor = db.cursor()
# If the table already exists, drop it first with execute()
cursor.execute("DROP TABLE IF EXISTS MOVIES")
# SQL statement to create the table
sql = """CREATE TABLE MOVIES (
    VIDEO_NAME CHAR(20) NOT NULL,
    VIDEO_SCORE CHAR(20),
    VIDEO_PLACE CHAR(20),
    VIDEO_TYPE CHAR(20),
    VIDEO_TIME CHAR(20) )"""
cursor.execute(sql)
# Close the cursor
cursor.close()
# Close the database connection
db.close()
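The flow above needs a running MySQL server, which makes it awkward to check quickly. As a local sanity check of the same drop/create/close sequence, the identical statements can be run against SQLite from the standard library (no server needed; SQLite accepts the CHAR(20) declarations but does not enforce the length):

```python
import sqlite3

# In-memory database stands in for the MySQL server (illustration only)
db = sqlite3.connect(":memory:")
cursor = db.cursor()

cursor.execute("DROP TABLE IF EXISTS MOVIES")
cursor.execute("""CREATE TABLE MOVIES (
    VIDEO_NAME CHAR(20) NOT NULL,
    VIDEO_SCORE CHAR(20),
    VIDEO_PLACE CHAR(20),
    VIDEO_TYPE CHAR(20),
    VIDEO_TIME CHAR(20) )""")

# Confirm the table now exists
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(cursor.fetchall())  # [('MOVIES',)]

cursor.close()
db.close()
```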
The code to scrape Tencent Video's movie rankings is as follows:
import requests
from lxml import etree
import pymysql

# Reconnect to the database created above; the loop below needs db and cursor
db = pymysql.connect(host="localhost", user="mysql", password="zhylovezyy",
                     database="test", charset="utf8")
cursor = db.cursor()

for page in range(0, 31, 30):
    url = f'http://v.qq.com/x/list/movie?sort=21&offset={page}'
    req = requests.get(url=url)
    req.encoding = "UTF-8"
    source = req.text
    html = etree.HTML(source)
    # Links to the detail page of each movie in the ranking list
    links = html.xpath("//div[@class='figure_title_score']/strong[@class='figure_title']/a/@href")
    for url1 in links:
        reqs = requests.get(url1)
        reqs.encoding = "UTF-8"
        source = reqs.text
        html = etree.HTML(source)
        video_name = html.xpath("//div[@class='video_base _base']/h1[@class='video_title _video_title']/text()")
        video_score1 = html.xpath("//div[@class='video_base _base']/span[@class='video_score']/span[@class='units']/text()")
        video_score2 = html.xpath("//div[@class='video_base _base']/span[@class='video_score']/span[@class='decimal']/text()")
        video_information = html.xpath("//div[@class='video_tags _video_tags']/a[@class='tag_item']/text()")
        # The score comes in two parts: integer units and the decimal
        score = ''.join(str(s) for s in video_score1) + ''.join(str(s) for s in video_score2)
        name = ''.join(str(n) for n in video_name).strip()
        place = str(video_information[0])
        video_time = str(video_information[1])
        video_type = ','.join(video_information[2:7])
        sql = f'INSERT INTO MOVIES(VIDEO_NAME, VIDEO_SCORE, VIDEO_PLACE, VIDEO_TYPE, VIDEO_TIME) VALUES ("{name}", "{score}", "{place}", "{video_type}", "{video_time}")'
        print(sql)
        try:
            # Execute the SQL statement
            cursor.execute(sql)
            # Commit the change to the database
            db.commit()
        except Exception:
            # Roll back on any error
            db.rollback()

db.close()
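One weakness of the f-string INSERT above: a title that happens to contain a double quote breaks the statement, and untrusted page text flows straight into the SQL. The DB-API answer is placeholders, where the driver escapes the values itself (pymysql uses `%s`; the runnable sketch below uses sqlite3 with `?` since it needs no server, and the row values are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute("""CREATE TABLE MOVIES (
    VIDEO_NAME CHAR(20) NOT NULL, VIDEO_SCORE CHAR(20),
    VIDEO_PLACE CHAR(20), VIDEO_TYPE CHAR(20), VIDEO_TIME CHAR(20))""")

# Hypothetical row; the embedded quotes would break the f-string version
row = ('His "Best" Film', '9.1', 'China', 'Drama', '1994')
cursor.execute(
    "INSERT INTO MOVIES(VIDEO_NAME, VIDEO_SCORE, VIDEO_PLACE, VIDEO_TYPE, VIDEO_TIME) "
    "VALUES (?, ?, ?, ?, ?)",  # with pymysql the placeholders would be %s
    row)
db.commit()

cursor.execute("SELECT VIDEO_NAME FROM MOVIES")
print(cursor.fetchall())  # [('His "Best" Film',)]
db.close()
```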
The result is shown in the figure:
When storing scraped data in the database, watch out for the following:
1. Is the SQL statement itself correct?
2. If the statement is correct, check whether the scraped values match the column types you created.
3. If both of the above are fine, check whether the scraped values contain extra whitespace or are simply too long for the column width, which makes the INSERT fail.
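Point 3 can be handled before the INSERT ever runs: strip surrounding whitespace and truncate each value to the column width. A small helper sketch, assuming the 20-character limit of the CHAR(20) columns above:

```python
def clean(value, max_len=20):
    """Strip surrounding whitespace, then truncate to the column width."""
    return str(value).strip()[:max_len]

print(clean("  Action,Adventure  "))  # "Action,Adventure"
print(len(clean("A" * 30)))           # 20 -- long values no longer overflow CHAR(20)
```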
Hehe, I hope this helps everyone.