Jikexueyuan Python Text Crawler

weixin_39581845

Published 2020-12-11 02:47:00


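# -*- coding: utf-8 -*-

import re

old_url = 'http://www.jikexueyuan.com/course/android/?pageNum=2'
total_page = 20

# Read the saved page source (read-only access is enough here)
with open('1.wenben.txt', 'r') as f:
    html = f.read()

# re.S makes '.' match newlines as well

# Grab the title: search stops at the first match, findall scans the whole string
title = re.search('<title>(.*?)</title>', html, re.S).group(1)

print(title)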
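The difference between search and findall mentioned in the comments can be sketched on a small inline string. The HTML fragment and the link pattern below are made up for illustration and are not taken from the real Jikexueyuan page.

```python
import re

# Hypothetical HTML fragment standing in for the contents of 1.wenben.txt.
html = """<html>
<head><title>
Sample course page
</title></head>
<body>
<a href="/course/1.html">Lesson one</a>
<a href="/course/2.html">Lesson two</a>
</body>
</html>"""

# re.search returns only the first match; re.S lets '.' cross the line
# breaks inside the <title> element.
match = re.search(r'<title>(.*?)</title>', html, re.S)
title = match.group(1).strip() if match else None

# re.findall returns the captured group of every match, in document order.
links = re.findall(r'<a href="(.*?)">', html)

print(title)
print(links)
```

Checking the match for None before calling group(1) avoids an AttributeError when the pattern is not found.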

# Using sub

s = '123adsg123'

# The whole match, including both '123' delimiters, is replaced
output = re.sub('123(.*?)123', 'houzhong%d' % 88, s)

print(output)
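One detail worth seeing side by side: re.sub replaces the entire match, not just the captured group. The sketch below contrasts that with a variant that keeps the delimiters through backreferences. The second pattern is an illustrative addition, not part of the original post.

```python
import re

s = '123adsg123'

# The whole match, delimiters included, is replaced.
replaced_all = re.sub(r'123(.*?)123', 'houzhong%d' % 88, s)

# Capturing the delimiters and referring back to them with \g<1> and \g<2>
# keeps them and swaps only the middle part.
replaced_middle = re.sub(r'(123).*?(123)', r'\g<1>houzhong%d\g<2>' % 88, s)

print(replaced_all)
print(replaced_middle)
```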

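There is no need to call re.compile here: module-level functions such as re.search, re.findall and re.sub compile the pattern and cache it internally.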

# Matching digits

a = 'asdfsf12313dfadfad'

# \d matches one digit at a time
b = re.findall(r'\d', a)

print(b)

Result: ['1', '2', '3', '1', '3']

a = 'asdfsf12313dfadfad2131'

# \d+ matches each run of consecutive digits
b = re.findall(r'\d+', a)

print(b)

Result: ['12313', '2131']
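As a small follow-on to the two results above, the same \d+ pattern can pull the page number out of the course URL used in this post. This is only a sketch; the variable names current_page and runs are chosen here for illustration.

```python
import re

url = 'http://www.jikexueyuan.com/course/android/?pageNum=2'

# Anchor the digits to the parameter name so other numbers in the URL
# would not be picked up by accident.
match = re.search(r'pageNum=(\d+)', url)
current_page = int(match.group(1)) if match else None

# findall returns strings; convert them when numeric values are needed.
runs = [int(n) for n in re.findall(r'\d+', 'asdfsf12313dfadfad2131')]

print(current_page)
print(runs)
```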

Pagination with re.sub

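import re

old_url = 'http://www.jikexueyuan.com/course/android/?pageNum=2'
total_page = 20

# Pages are numbered from 1 to total_page
for i in range(1, total_page + 1):
    # Swap the page number in the URL for the current one
    new_url = re.sub(r'pageNum=\d+', 'pageNum=%d' % i, old_url)
    print(new_url)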
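The same loop can be wrapped in a function that returns the URLs instead of printing them, which is easier to reuse when the pages are fetched later. The function name build_page_urls is hypothetical, and the sketch assumes the URL already contains a pageNum parameter.

```python
import re


def build_page_urls(base_url, total_page):
    """Return one URL per page, numbered 1..total_page.

    Assumes base_url already contains a 'pageNum=<digits>' parameter.
    """
    return [
        re.sub(r'pageNum=\d+', 'pageNum=%d' % page, base_url)
        for page in range(1, total_page + 1)
    ]


old_url = 'http://www.jikexueyuan.com/course/android/?pageNum=2'
urls = build_page_urls(old_url, 20)

print(urls[0])
print(urls[-1])
```

If the URL has no pageNum parameter, re.sub finds nothing to replace and every entry is the unchanged base URL, so a check for that case may be worth adding in real use.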
