[Disclaimer: due to time constraints, parts of the crawler code are borrowed from elsewhere. My main goal is to combine the three parts (scraping, MySQL, pyecharts) for readers to study and discuss. If this causes any offense, please forgive me!]
I. Local environment:
1. Python 3.7
2. MySQL 8.0.17 (managed via Navicat)
3. beautifulsoup4 4.8.0
4. pyecharts 0.5.11
!!! (If you run into hard-to-solve problems installing pyecharts, try Anaconda3, which ships with pyecharts. A quick version-check sketch follows below.)
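As a sanity check before running the code, the following minimal sketch (mine, not from the original post) prints the installed version of each dependency; it assumes the packages were installed from PyPI under their usual distribution names:

# pip install requests beautifulsoup4==4.8.0 PyMySQL pyecharts==0.5.11
import pkg_resources

# Print the installed version of each dependency used in this post
for pkg in ("requests", "beautifulsoup4", "PyMySQL", "pyecharts"):
    print(pkg, pkg_resources.get_distribution(pkg).version)

Note that pyecharts 0.5.x and 1.x+ have incompatible APIs, so the pinned 0.5.11 matters if you follow the charting code later in this post.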
II. Scraping the data into MySQL
import requests
from bs4 import BeautifulSoup
import re
import pymysql

# Connect to the local MySQL database (credentials are the author's own)
db = pymysql.connect(host="localhost", user="root", password="123zwh", database="zwh1")
cursor = db.cursor()
url = 'http://www.china-10.com/news/488659.html'
html = requests.get(url)
soup = BeautifulSoup(html.content, 'html.parser')
# Find every td element whose class matches md_td
aaa = soup.find_all(name="td", attrs={"class": re.compile(r"md_td")})
# Inspect the indices first so the slice below picks out only the data cells
# for n, i in enumerate(aaa):
#     print(n, i.text)
demo_list = []
for i in aaa[4:128]:
    demo_list.append(i.text)
while demo_list:
    # The original snippet is cut off here; the loop below is a hedged
    # completion that assumes four cells per table row (124 cells / 4 = 31 rows)
    print(int(demo_list[0]))  # the first cell of each row should be the numeric rank
    row = demo_list[:4]
    # Table and column names are placeholders; see the setup sketch below
    cursor.execute(
        "INSERT INTO brand_rank (rank_no, col2, col3, col4) VALUES (%s, %s, %s, %s)",
        row)
    del demo_list[:4]
db.commit()
db.close()
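The INSERT above assumes the target table already exists in the zwh1 database. The original post does not show the schema, so the following one-time setup is a hypothetical sketch whose table and column names (brand_rank, rank_no, col2 to col4) simply match the placeholders used in the loop:

import pymysql

# One-time setup sketch; the schema is an assumption, not from the original post
db = pymysql.connect(host="localhost", user="root", password="123zwh", database="zwh1")
cursor = db.cursor()
cursor.execute("""
    CREATE TABLE IF NOT EXISTS brand_rank (
        rank_no INT,           -- numeric rank parsed from the first cell
        col2 VARCHAR(100),     -- remaining cells; rename these once the
        col3 VARCHAR(100),     -- real column meanings are confirmed
        col4 VARCHAR(100)
    ) CHARACTER SET utf8mb4
""")
db.close()

The column is named rank_no rather than rank because RANK is a reserved word in MySQL 8.0.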