Web Scraping: Scraping the Problem List from PythonTip

听说不挂科

Article link: https://blog.csdn.net/qq_53029299/article/details/115002512

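PythonTip was the site I used to practice problems while at home over the winter break, and it was a big help. Now let's scrape PythonTip's problem list. The content to be scraped is shown below:

[Figure: screenshot of the PythonTip problem list table to be scraped]

The code is split into two cells.

The first cell defines a function that reads the table header (problem number, challenge problem, number of solvers, pass rate, and difficulty) and writes it to the CSV file. It only needs to run once.

The second cell does the actual scraping of the problem rows. There isn't much to explain here; if anything is unclear, see my earlier articles.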

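```python
# Cell 1: write the table header row to the CSV once.
# driver, BS and csv_write are created in the second cell; they are looked up when mulu() runs.
from selenium.webdriver.common.by import By

def mulu():
    url = 'http://www.pythontip.com/coding/code_oj?page=1'
    # XPath of the <thead> of the problem-list table
    xpath_1 = '//*[@id="__next"]/section/div/div/div[2]/div/div/div/div/div/div/div/table/thead'
    driver.get(url)
    # Selenium 4 removed find_element_by_xpath; use find_element(By.XPATH, ...)
    table_1 = driver.find_element(By.XPATH, xpath_1).get_attribute('innerHTML')

    soup_1 = BS(table_1, 'html.parser')
    for row in soup_1.find_all('tr'):
        cols = [col.text for col in row.find_all('th')]
        cols.pop()  # drop the last header cell, which is not kept
        csv_write.writerow(cols)
```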
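To check the header-parsing step without launching Chrome, the BeautifulSoup part of `mulu()` can be tried on a static string. This is a minimal sketch: `header_row` is a hypothetical helper, and the markup is a made-up stand-in for the `<thead>` innerHTML. The real column text, and in particular what the sixth column dropped by `cols.pop()` contains, are assumptions.

```python
from bs4 import BeautifulSoup as BS

def header_row(thead_html):
    """Return the header cell texts of a <thead> fragment, minus the last column."""
    soup = BS(thead_html, 'html.parser')
    row = soup.find('tr')
    cols = [th.get_text(strip=True) for th in row.find_all('th')]
    return cols[:-1]  # same effect as cols.pop() in mulu()

# Hypothetical markup; "extra" stands in for the unknown last column
sample = ('<tr><th>No.</th><th>Problem</th><th>Solvers</th>'
          '<th>Pass rate</th><th>Difficulty</th><th>extra</th></tr>')
print(header_row(sample))
```

Slicing with `cols[:-1]` returns a new list instead of mutating it, which keeps the helper free of side effects.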

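```python
# Cell 2: open the browser and CSV file, write the header, then scrape every page
from bs4 import BeautifulSoup as BS
from selenium import webdriver
from selenium.webdriver.common.by import By
import csv
import time

driver = webdriver.Chrome()
# utf-8-sig keeps the Chinese titles readable when the file is opened in Excel
out = open('d:/question.csv', 'w', newline='', encoding='utf-8-sig')
csv_write = csv.writer(out, dialect='excel')
mulu()  # header row, written once
for page in range(1, 7):  # the problem list spans pages 1 to 6
    url = f'http://www.pythontip.com/coding/code_oj?page={page}'
    xpath = '//*[@id="__next"]/section/div/div/div[2]/div/div/div/div/div/div/div/table/tbody'
    time.sleep(3)  # pause between page requests
    driver.get(url)
    table = driver.find_element(By.XPATH, xpath).get_attribute('innerHTML')

    soup = BS(table, 'html.parser')
    for row in soup.find_all('tr'):
        cols = [col.text for col in row.find_all('td')]
        cols.pop()  # drop the last cell, matching the header
        csv_write.writerow(cols)

out.close()
driver.quit()  # quit() ends the whole session; close() only closes the window
```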
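The row-parsing half of the second cell can likewise be exercised offline. The sketch below assumes a made-up two-row `<tbody>` fragment (numbers and titles are invented). `body_rows` is a hypothetical helper that mirrors the loop above and also skips rows without `<td>` cells, where `cols.pop()` would raise `IndexError`. Writing to an `io.StringIO` shows the exact CSV text the `excel` dialect produces.

```python
import csv
import io
from bs4 import BeautifulSoup as BS

def body_rows(tbody_html):
    """Turn a <tbody> fragment into lists of cell texts, dropping the last cell."""
    soup = BS(tbody_html, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:  # skip empty rows instead of failing on pop()
            rows.append(cells[:-1])
    return rows

# Hypothetical fragment shaped like the table body
sample = ('<tr><td>1</td><td>Problem A</td><td>100</td><td>50%</td><td>Easy</td><td>x</td></tr>'
          '<tr><td>2</td><td>Problem B</td><td>80</td><td>40%</td><td>Medium</td><td>x</td></tr>')

buf = io.StringIO()
writer = csv.writer(buf, dialect='excel')  # excel dialect ends lines with \r\n
writer.writerows(body_rows(sample))
print(buf.getvalue())
```

Keeping the parsing in a plain function makes it testable separately from the Selenium session.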

Screenshot of the scraped result:

[Figure: screenshot of the resulting question.csv]
