A simple Python crawler for fetching the latest updates from a certain "Dao" site

Essani

Article link: https://blog.csdn.net/u010834767/article/details/69948774

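Last night I had some time to kill and wanted to practice Python, so I wrote a small crawler that fetches the latest updates from the Xiaodao entertainment site (小刀娱乐网).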

#!/usr/bin/python
# coding: utf-8

import re
import urllib.request

# Host name prepended to each relative link when printing the results
head = "www.xiaodao.la"


# Fetch the home page and return only the fragment that holds the update list
def get():
    data = urllib.request.urlopen('http://www.xiaodao.la').read()
    # Decode, drop the bold styling and normalise the highlight colour
    html = data.decode("gbk").replace("font-weight:bold;", "").replace("#FF0000", "#000000")
    # Remove all whitespace; the regular expression below assumes there is none
    html = re.sub(r"\s+", "", html)
    # Return only the text between the two marker strings
    return html[html.find("好卡售"):html.find("20160303184868786878.gif")]


# Fetch the page source once
page = get()
# print(page)

# Regular expression capturing the link, title, link text and update time
# reg = r'href="(.*?)"style="color:#000000;"title="(.*?)"target="_blank">'
reg = r'href="(.*?)"style="color:#000000;"title="(.*?)"target="_blank">(.*?)</a></div></td><tdwidth=12.5%align=rightnowrap=nowrapstyle="color:#F00;">(.*?)</td>'

pattern = re.compile(reg)  # compile the regular expression
matches = tuple(pattern.findall(page))  # collect every match as a tuple

print("Matched %d items in total" % len(matches))  # number of matches
# print(matches)

for i, item in enumerate(matches, start=1):
    print("Item %d:" % i)
    print("Title: %s\nURL: %s\nUpdated: %s\n" % (item[1], head + item[0], item[3]))
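
The regular expression above only matches after every whitespace character has been stripped from the page, so it is sensitive to small markup changes. As a rough alternative, here is a minimal sketch that reads the same three fields with the standard library's `html.parser` and builds absolute links with `urllib.parse.urljoin`. The `SAMPLE` fragment, `BASE_URL`, `UpdateLinkParser` and `parse_updates` are all hypothetical names made up for this illustration. The sample markup is only modelled on the attributes the regular expression looks for and is not copied from the real site, so the tag checks would likely need adjusting against the actual page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed base URL; the original script only prepends the bare host name.
BASE_URL = "http://www.xiaodao.la/"

# Hypothetical fragment modelled on the attributes the regex looks for.
# It is NOT copied from the real site.
SAMPLE = """
<table>
  <tr>
    <td><div><a href="/post/1.html" style="color:#000000;" title="First post" target="_blank">First post</a></div></td>
    <td width="12.5%" align="right" nowrap="nowrap" style="color:#F00;">04-10</td>
  </tr>
  <tr>
    <td><div><a href="/post/2.html" style="color:#000000;" title="Second post" target="_blank">Second post</a></div></td>
    <td width="12.5%" align="right" nowrap="nowrap" style="color:#F00;">04-09</td>
  </tr>
</table>
"""


class UpdateLinkParser(HTMLParser):
    """Collect (title, absolute URL, update time) tuples from table rows."""

    def __init__(self):
        super().__init__()
        self.items = []        # finished (title, url, time) tuples
        self._link = None      # (title, url) of the most recent titled link
        self._in_time = False  # True while inside the right-aligned time cell
        self._time = []        # text chunks collected from the time cell

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs and "title" in attrs:
            # Resolve the relative href against the assumed base URL
            self._link = (attrs["title"], urljoin(BASE_URL, attrs["href"]))
        elif tag == "td" and attrs.get("align") == "right" and self._link:
            # The time cell follows the link cell in the same row
            self._in_time = True
            self._time = []

    def handle_data(self, data):
        if self._in_time:
            self._time.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_time:
            title, url = self._link
            self.items.append((title, url, "".join(self._time).strip()))
            self._link = None
            self._in_time = False


def parse_updates(html):
    """Return a list of (title, url, updated) tuples found in html."""
    parser = UpdateLinkParser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for title, url, updated in parse_updates(SAMPLE):
        print("Title: %s\nURL: %s\nUpdated: %s\n" % (title, url, updated))
```

To try this on the live page, the decoded text (the result of `data.decode("gbk")`, before any whitespace stripping) could be passed to `parse_updates` in place of `SAMPLE`.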
