Web Scraping Notes and Takeaways

Latest recommended article published on 2022-04-17 21:10:55

m0_56170373

Tags: python, web scraping

Article link: https://blog.csdn.net/m0_56170373/article/details/120867207

Step 1: Import the packages

import pandas as pd
import pymysql  # MySQL driver that SQLAlchemy loads for mysql+pymysql URLs
import requests
from bs4 import BeautifulSoup
from sqlalchemy import create_engine

Step 2: Define the download function

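def download_all_htmls():
    """Download each page and return the HTML texts as a list."""
    htmls = []
    # Only one page is fetched for now; add more URLs here to crawl more pages.
    urls = ["https://data.eastmoney.com/zjlx/000001.html"]
    for url in urls:
        print("crawl html:", url)
        r = requests.get(url, timeout=10)
        if r.status_code != 200:
            raise Exception(f"request failed with status {r.status_code}: {url}")
        htmls.append(r.text)
    return htmls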
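
The download function only ever fetches one page. A minimal sketch of how the URL list could be generated for several stock codes follows. `BASE_URL` and `build_urls` are hypothetical names, and the assumption that other codes follow the same URL pattern is inferred from the single URL above, not confirmed by the site.

```python
# Hypothetical helper: the URL pattern for other stock codes is an assumption,
# inferred from the single URL used in download_all_htmls().
BASE_URL = "https://data.eastmoney.com/zjlx/{code}.html"


def build_urls(codes):
    """Return one page URL per stock code, preserving the input order."""
    return [BASE_URL.format(code=code) for code in codes]


urls = build_urls(["000001"])
print(urls)
```

The resulting list could replace a hard-coded URL list inside the download loop.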

Step 3: Run the crawl

# Run the download function defined in step 2.
htmls = download_all_htmls()


def parse_single_html(html):
    """Return the text of every table cell on the page as one flat list."""
    soup = BeautifulSoup(html, 'html.parser')
    # Locate the rows of the table inside the "sinstock-filter-wrap" container.
    table_rows = (
        soup.find("div", class_="sinstock-filter-wrap")
            .find("table")
            .find("tbody")
            .find_all("tr")
    )
    datas = []
    for row in table_rows:
        # Collect the text of each <td> cell in this row.
        for cell in row.find_all('td'):
            datas.append(cell.get_text())
    print(datas)
    return datas
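
The parser above flattens every cell into one list and raises `AttributeError` if any `find` call returns `None`. A hedged alternative sketch keeps one list per table row and returns an empty list when the container is missing. `parse_rows` and `SAMPLE_HTML` are hypothetical, and the sample values are invented purely to exercise the code.

```python
from bs4 import BeautifulSoup

# Invented sample markup that mimics the structure the parser expects.
SAMPLE_HTML = """
<div class="sinstock-filter-wrap">
  <table>
    <tbody>
      <tr><td>2021-10-19</td><td>19.50</td></tr>
      <tr><td>2021-10-20</td><td>19.62</td></tr>
    </tbody>
  </table>
</div>
"""


def parse_rows(html):
    """Return one list of cell texts per <tr>, or [] if the table is missing."""
    soup = BeautifulSoup(html, "html.parser")
    wrap = soup.find("div", class_="sinstock-filter-wrap")
    tbody = wrap.find("tbody") if wrap is not None else None
    if tbody is None:
        return []
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in tbody.find_all("tr")
    ]


rows = parse_rows(SAMPLE_HTML)
print(rows)
```

Keeping the row structure means each table row can later become one DataFrame row instead of several single-cell rows.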

Step 4: Parse the HTML

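all_datas = []
for html in htmls:
    all_datas.extend(parse_single_html(html))
print(all_datas)
print(len(all_datas))  # number of cells collected
# all_datas is a flat list, so the DataFrame has a single column.
df = pd.DataFrame(all_datas)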
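
Because the cell texts arrive as one flat list, the table layout is lost. If the number of columns per row is known, the list can be regrouped before building the DataFrame. The sketch below is one possible way to do it; `chunk_rows` is a hypothetical helper and the sample values are invented.

```python
def chunk_rows(cells, n_cols):
    """Split a flat list of cell texts into rows of n_cols items each."""
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    if len(cells) % n_cols != 0:
        raise ValueError("cell count is not a multiple of n_cols")
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]


flat = ["2021-10-19", "19.50", "2021-10-20", "19.62"]  # invented sample cells
rows = chunk_rows(flat, n_cols=2)
print(rows)
```

The regrouped rows could then be passed to `pd.DataFrame(rows, columns=[...])` with column names chosen to match the page.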

Step 5: Save to a new MySQL table

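# The password contains '@', which must be percent-encoded as %40 in the URL.
engine = create_engine('mysql+pymysql://root:765195804%40qq@localhost:3306/work1?charset=utf8')
# Write the DataFrame to a new table named '股票' ("stocks").
# With the default if_exists='fail', to_sql raises an error if the table already exists.
df.to_sql(name='股票', con=engine)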
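
Special characters such as `@` in a password clash with the separators of a database URL. One way to avoid encoding them by hand is to build the URL with `urllib.parse.quote_plus`. This is a minimal sketch: `build_mysql_url` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, port, database, charset="utf8"):
    """Build a SQLAlchemy URL for the pymysql driver, escaping user and password."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}?charset={charset}"
    )


# Placeholder credentials, for illustration only.
url = build_mysql_url("root", "p@ss", "localhost", 3306, "work1")
print(url)
```

The returned string can be passed directly to `create_engine`.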

Takeaways:

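1. The crawler fetches stock data from a web page and saves it to a local file; 2. the data from the local file is then stored in a MySQL database. Requests is an HTTP library that improves on the shortcomings of urllib2: it lets users fetch web resources in the simplest way possible and access them with REST-style operations.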
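
The steps above keep the scraped data in memory, so the local-file stage mentioned in the takeaways is not shown. A minimal sketch of that stage with the standard `csv` module could look like the following. `save_rows`, `load_rows`, the column names, and the sample values are all hypothetical.

```python
import csv
import os
import tempfile


def save_rows(path, header, rows):
    """Write a header line followed by the data rows to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def load_rows(path):
    """Read the CSV file back and return (header, rows)."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        return header, list(reader)


# Round trip through a temporary directory with invented sample data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stock.csv")
    save_rows(path, ["date", "close"], [["2021-10-19", "19.50"]])
    header, rows = load_rows(path)
print(header, rows)
```

Rows loaded this way could then be turned into a DataFrame and written to MySQL as in step 5.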
