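A beginner's Python web scraper in 20 lines using string operations, with an example (scraping one chapter of the novel 《三体》, The Three-Body Problem)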

Latest recommended article published 2024-04-13 00:09:54

weixin_34327223

Tags: python, web scraping

Original link: https://segmentfault.com/a/1190000011193034

The simple methods you will need

1. Import the required package

import urllib.request

2. try...except

try:
    statement 1 ...
except Exception as e:
    statement 2 ...
Python tries to run statement 1; if it raises an exception, statement 2 runs instead.
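
As a small illustration of the pattern, the sketch below converts text to a number and falls back to a default value when the conversion fails. The helper name `to_int` and its `default` parameter are made up for this example; they are not part of the scraper.

```python
def to_int(text, default=0):
    """Return int(text), or default when text cannot be converted."""
    try:
        # Statement 1: may raise ValueError (bad text) or TypeError (e.g. None)
        return int(text)
    except Exception as e:
        # Statement 2: runs only when statement 1 raised an exception
        print('conversion failed:', e)
        return default


print(to_int('174'))  # valid input, so the try branch returns
print(to_int('abc'))  # invalid input, so the except branch returns the default
```

Catching `Exception` keeps the example short; in real code it is usually better to name the specific exceptions you expect.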

3. Fetch content with urlopen

response = urllib.request.urlopen(webList)
# open the page at the URL stored in webList and return a response object

4. Read with read()

response.read()
# read the response body; the result is a bytes object

5. Decode with decode

decode('UTF-8')
# decode the bytes into a string using UTF-8
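
Steps 3 to 5 are usually chained together. The following is a minimal sketch that assumes the page is UTF-8 encoded; the function name `fetch_html` and the `timeout` value are illustrative choices, not something the original article defines.

```python
import urllib.request


def fetch_html(url, encoding='UTF-8', timeout=10):
    """Return the decoded page text, or None if fetching or decoding fails."""
    try:
        # urlopen returns a response object; the with block closes it afterwards
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()      # bytes
        return raw.decode(encoding)    # bytes -> str
    except (OSError, ValueError) as e:
        # OSError covers most network errors (URLError is a subclass);
        # ValueError covers malformed URLs and UnicodeDecodeError
        print('fetch failed:', e)
        return None


# decode() on its own: bytes in, str out
raw = '三体'.encode('UTF-8')
text = raw.decode('UTF-8')
print(type(raw).__name__, type(text).__name__)
```

Calling `fetch_html(webList)` would then give either the page text or `None`, so the caller can check the result before going any further.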

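6. Replacement methods

html = html.expandtabs()
# expand every tab character into spaces (tab stops every 8 columns by default)

html = html.replace(' ', '')
# remove all spaces, including the ones the tabs were just expanded into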
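
To see what each call really does, here is a short sketch on a made-up sample string. Note that `expandtabs()` does not delete tabs by itself; it turns them into spaces, and the later `replace` is what removes them.

```python
sample = 'a\tb c'  # made-up sample: one tab and one space

expanded = sample.expandtabs()          # the tab becomes spaces up to column 8
no_spaces = expanded.replace(' ', '')   # then every space is removed
no_tabs = sample.replace('\t', '')      # removes only the tab, keeps the space

print(repr(expanded))
print(repr(no_spaces))
print(repr(no_tabs))
```

If you only want to drop tabs and keep ordinary spaces, `replace('\t', '')` alone is enough.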

7. Get the length

lenth = len(html)
# number of characters in the document string

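8. Searching with find()

index = html.find('neirong">', 0, lenth)
# index of the first match between positions 0 and lenth, or -1 if there is no match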
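
A quick sketch of `find()` on a made-up HTML fragment. The sample string is invented for illustration; only the marker `neirong">` comes from the example further down. Checking for `-1` matters, because slicing with a failed search result would silently start from the wrong position.

```python
sample = '<div class="neirong"><p>text</p></div>'  # made-up sample
marker = 'neirong">'

start = sample.find(marker)             # index where the marker begins
missing = sample.find('no-such-text')   # find() returns -1 when there is no match

if start != -1:
    # Skip past the marker itself to reach the content that follows it
    content = sample[start + len(marker):]
    print(content)
```

Using `len(marker)` instead of a hard-coded number keeps the offset correct if the marker ever changes.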

9. String slicing

html[0:index2]
# take the substring from position 0 up to, but not including, index2
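
Searching and slicing are usually used as a pair. Below is a minimal sketch of a helper that returns the text between two markers; the name `extract_between` and the sample page are hypothetical and only meant to show the idea.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)            # begin just after the start marker
    end = text.find(end_marker, start)    # search only after the start marker
    if end == -1:
        return None
    return text[start:end]                # the slice excludes position end


page = '<h1>Title</h1><div class="neirong"><p>Hello</p></div>'  # made-up page
print(extract_between(page, 'neirong">', '</div>'))
```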

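10. Writing to a file with open and write

writeFile = open('三体.txt', 'w', encoding='utf-8')
writeFile.write(htm)
writeFile.close()
# write the text to the file, then close it so the data is flushed to disk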
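
A common alternative is a `with` block, which closes the file automatically. The sketch below writes to a temporary folder so it leaves nothing behind; the file name `demo.txt` and the sample text are placeholders chosen for this example.

```python
import os
import tempfile

text = '    第一段\n    第二段\n'  # placeholder text

# Write into a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')
    # encoding='utf-8' avoids depending on the platform default encoding
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # Read the file back to confirm the round trip
    with open(path, 'r', encoding='utf-8') as f:
        read_back = f.read()

print(read_back == text)
```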

Example: scraping one page of the novel 《三体》 (The Three-Body Problem).

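# Import the required package
import urllib.request
# The page to fetch
webList = 'http://www.51shucheng.net/kehuan/santi/santi1/174.html'
# Use try to attempt the connection
try:
    response = urllib.request.urlopen(webList)
    # If the connection succeeds, response is the page we requested
except Exception as e:
    # Otherwise report the failure and stop, because response was never assigned
    print('Fetch failed:', e)
    raise SystemExit(1)
html = response.read().decode('UTF-8')
# Read the content and decode it as UTF-8
html = html.expandtabs()
# Expand every tab into spaces
html = html.replace(' ', '')
# Remove all spaces, including the ones that came from tabs
print(html)
# Print a preview, which makes it easier to pick the markers
lenth = len(html)
# Length of the document
html = html[html.find('neirong">', 0, lenth) + 9:]
# Keep everything after the marker 'neirong">', which is 9 characters long
index = html.find('跟鞋。</p>', 0) + 3
index2 = html.find('眷恋着天空。</p>')
index3 = html.find('<p>“红色联合”的战士们欢呼起来')
# Locate a few key positions and keep their indexes for the slicing below
htm = html[0:index2] + html[index3:index]
# Slice the string and join the two pieces
htm = htm.replace('<p>', '    ')
htm = htm.replace('</p>', '\n')
# Replace the <p> and </p> tags with an indent and a newline
with open('三体.txt', 'w', encoding='utf-8') as writeFile:
    writeFile.write(htm)
# Write to the file; the with block closes it afterwards
print('Write complete')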
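
The cleaning and slicing steps can also be wrapped in a function and tried offline, without downloading anything. This is only a sketch under assumptions: the function name `chapter_text` and the sample fragment are invented here, and because all spaces are stripped first, the markers passed in must not contain spaces.

```python
def chapter_text(html, start_marker, end_marker):
    """Cut the chapter body out of html and turn <p> tags into plain text."""
    html = html.expandtabs().replace(' ', '')   # same clean-up as the example
    start = html.find(start_marker)
    if start == -1:
        raise ValueError('start marker not found')
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        raise ValueError('end marker not found')
    body = html[start:end]
    # Indent each paragraph and end it with a newline
    return body.replace('<p>', '    ').replace('</p>', '\n')


# Made-up page fragment standing in for the downloaded HTML
sample = '<div class="neirong">\t<p>第一段。</p> <p>第二段。</p></div><div>footer</div>'
result = chapter_text(sample, 'neirong">', '</div>')
print(result)
```

Raising `ValueError` when a marker is missing makes a changed page layout obvious right away, instead of quietly writing the wrong text to the file.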
