Grinding Through Courses, Grinding Through Courses

Latest recommended article published on 2024-09-23 23:27:05

caisirking

Tags: chrome, web scraping, front end

Article link: https://blog.csdn.net/weixin_42266726/article/details/126839021

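Our company's online university has plenty of good course material for everyone to learn from, and I study there myself from time to time.

But everything has its limits. Lately we have been required to study Tianyi Cloud (天翼云): 13 topics in total, adding up to more than 100 hours of lessons, all to be finished before the 20th. My god! That is going too far, and my colleagues and I made our extreme displeasure and indignation known. Once we calmed down, though, we accepted that it still had to be done. What to do? Let the courses play by themselves.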

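Using what little web-scraping knowledge we had, a colleague and I put together a simple program. The basic idea is as follows:

First, scrape the URL of every lesson in the course and save the URLs to a txt file. Then write a second program that reads the file line by line, opens each URL, stays on it for 50 minutes, and moves on to the next URL.

The URL-scraping program is as follows: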

from selenium import webdriver
import time

index_url = 'https://kc.zhixueyun.com/xxxxxxxx'

browser = webdriver.Chrome()
browser.get(index_url)
time.sleep(30)  # pause for 30 seconds before reading the page
page_text = browser.page_source  # get the HTML text


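def find_all(string, sub):  # search helper: string is the full text, sub is the substring to look for
    start = 0  # position where the next search begins
    pos = []  # start indexes of all matches found so far
    while True:  # loop until no further match exists
        start = string.find(sub, start)  # look for the next match from the current position
        if start == -1:  # no match left, so stop
            return pos  # return every match position
        pos.append(start)  # record the start index of this match
        start += len(sub)  # move past the match before searching again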


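list_index = find_all(page_text, 'data-resource-id=')  # positions of every occurrence of the attribute
ban_url = 'https://kc.zhixueyun.com/xxxxxxx'  # URL prefix that each resource id is appended to
with open('king.txt', 'a') as f:  # open the output file in append mode
    for list_num in list_index:  # walk through the match positions
        # take the 36 characters that start at offset 18, just past the
        # attribute name and (presumably) its opening quote
        str_list = page_text[list_num+18:list_num+54]
        f.write(ban_url + str_list + '\n')  # build the URL and write it on its own line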
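The fixed offsets above (18 and 54) only work while every id is exactly 36 characters long and sits right after the attribute name and a quote. As a rough sketch of an alternative, assuming the page writes the attribute as data-resource-id="..." with double quotes, a regular expression can pull the ids out without counting characters. The names extract_resource_ids and build_urls, the sample HTML, and the example.com prefix are made up for illustration and are not part of the original program.

```python
import re

# Assumption: the attribute value is wrapped in double quotes.
RESOURCE_ID_RE = re.compile(r'data-resource-id="([^"]+)"')


def extract_resource_ids(html):
    """Return every data-resource-id value, in the order it appears."""
    return RESOURCE_ID_RE.findall(html)


def build_urls(html, base_url):
    """Append each resource id to base_url, as the scraper above does."""
    return [base_url + rid for rid in extract_resource_ids(html)]


# Made-up sample markup standing in for browser.page_source.
sample_html = (
    '<li data-resource-id="11111111-2222-3333-4444-555555555555">A</li>'
    '<li data-resource-id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee">B</li>'
)
urls = build_urls(sample_html, 'https://example.com/course/')
for url in urls:
    print(url)
```

For 36-character ids this yields the same substrings as the offset slicing; the difference is that it keeps working if an id has a different length.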

The scraped txt file looks like this:

Then write another Python script:

# -*- coding: utf-8 -*-
import webbrowser as web  # browser module
import time  # time module

with open("king.txt", "r") as file:  # the file is closed automatically
    lines = file.readlines()

for line in lines:
    url = line.strip()  # drop the trailing newline
    if not url:
        continue  # skip blank lines
    web.open_new_tab(url)
    time.sleep(3000)  # stay for 3000 seconds (50 minutes)
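Since each URL holds the script for 50 minutes, it can help to dry-run the loop before leaving it unattended. The sketch below is one possible way to do that, not the script we actually ran: the function visit_all and its open_url and sleep parameters are hypothetical, and they let a test pass in stand-ins so that no browser opens and no time passes.

```python
import time
import webbrowser


def visit_all(lines, dwell_seconds=3000,
              open_url=webbrowser.open_new_tab, sleep=time.sleep):
    """Open each non-blank URL in turn and wait dwell_seconds after it."""
    visited = []
    for line in lines:
        url = line.strip()  # remove the newline left by readlines()
        if not url:
            continue  # ignore empty lines in the txt file
        open_url(url)
        sleep(dwell_seconds)
        visited.append(url)
    return visited


# Dry run: record the calls instead of opening tabs or sleeping.
opened, waits = [], []
result = visit_all(
    ['https://example.com/a\n', '\n', 'https://example.com/b\n'],
    dwell_seconds=3000,
    open_url=opened.append,
    sleep=waits.append,
)
print(result)
```

For the real run, the call would be visit_all(open("king.txt").readlines()) with the defaults left in place.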

Tested and working.

Upload the program, the txt file, and so on to a cloud desktop or cloud host and run it there. Nice and easy.