From Zero to Hired - Extra - Python - Web Crawlers - Two Tiny Examples


J-ADan



Column: From Zero to Hired (从零到入职). Tags: python, web crawler

Article link: https://blog.csdn.net/weixin_43589736/article/details/111313941


Straight to the source code:

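'''
Zongheng novel recommendation ranking
'''

import os
from urllib.request import urlopen


def get_one_page(index):
    # Build the URL of ranking page `index` with str.format()
    url = 'http://www.zongheng.com/rank/details.html?rt=6&d=1&p={}'.format(index)
    #   url = 'http://www.zongheng.com/rank/details.html?rt=6&d=1&p=%d'%(index)
    #   print(url)

    response = urlopen(url)
    # read() returns bytes; decode() turns them into a str (UTF-8 by default)
    return response.read().decode()


def save_one_page(i, html):
    # Windows-style relative path; '\\' in the literal is a single backslash
    file_name = 'zongheng\\recommend_{}.html'.format(i)

    file = open(file_name, 'w', encoding='utf-8')
    file.write(html)
    file.close()


if __name__ == '__main__':
    # open() does not create folders, so make sure the target folder exists
    os.makedirs('zongheng', exist_ok=True)

    # Fetch and save ranking pages 1 through 15
    for i in range(1, 16):
        html = get_one_page(i)
        save_one_page(i, html)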
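Many sites reject or throttle requests that carry urllib's default User-Agent (Python-urllib/3.x), and a single failed page should not crash a 15-page loop. Below is a minimal sketch assuming the site accepts a browser-like header; the header string and the helper names build_request and fetch are hypothetical, not part of the original code. Only the request construction is exercised at the end, so nothing is actually downloaded.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Hypothetical browser-like User-Agent; an assumption, not a value the site requires
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}


def build_request(index):
    """Build (but do not send) a Request for ranking page `index`."""
    url = 'http://www.zongheng.com/rank/details.html?rt=6&d=1&p={}'.format(index)
    return Request(url, headers=HEADERS)


def fetch(index, timeout=10):
    """Send the request; return the page text, or None on an HTTP error or failed connection."""
    try:
        with urlopen(build_request(index), timeout=timeout) as response:
            return response.read().decode('utf-8')
    except (HTTPError, URLError) as exc:
        print('page {} failed: {}'.format(index, exc))
        return None


req = build_request(3)
print(req.full_url)                  # http://www.zongheng.com/rank/details.html?rt=6&d=1&p=3
print(req.get_header('User-agent'))  # urllib stores header names capitalized this way
```

In the main loop, fetch(i) would replace get_one_page(i), and a None result can simply be skipped instead of saved.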

Next, the concepts that appear in the code, going from top to bottom.
String formatting

url = 'http://www.zongheng.com/rank/details.html?rt=6&d=1&p={}'.format(index)
#   url = 'http://www.zongheng.com/rank/details.html?rt=6&d=1&p=%d'%(index)

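There are two approaches here: a {} placeholder filled in by .format(), and a %d placeholder filled in with % (...).
Both are string formatting; in practice we usually prefer the first one.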
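The two styles side by side, plus the f-string form that Python 3.6+ also offers. The page number 3 is just a sample value; all three build the same URL.

```python
index = 3
base = 'http://www.zongheng.com/rank/details.html?rt=6&d=1&p='

url_format = (base + '{}').format(index)   # str.format(), as in the crawler
url_percent = (base + '%d') % (index)      # printf-style % formatting
url_fstring = f'{base}{index}'             # f-string, Python 3.6+

print(url_format)  # http://www.zongheng.com/rank/details.html?rt=6&d=1&p=3
```

One practical reason to avoid % with URLs: percent-encoded text such as %E4 inside the template is itself parsed as a format directive and raises an error, while .format() and f-strings leave it alone.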

The second topic: a few basic file I/O operations in Python.

file_name = 'zongheng\\recommend_{}.html'.format(i)
file = open(file_name, 'w', encoding='utf-8')
file.write(html)
file.close()

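The first line is straightforward: it sets where the file is saved and what it is called. The doubled backslash is an escape sequence: in a normal string literal, '\\' stands for a single backslash, the Windows path separator. Note that open() does not create missing folders, so the zongheng folder must exist before the file is written.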
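A quick check of what the doubled backslash means, plus a portable alternative. The file name mirrors the crawler's; os.path.join picks the separator for the current operating system, and PureWindowsPath parses a Windows-style path on any OS.

```python
import os
from pathlib import PureWindowsPath

# '\\' in a normal literal and '\' in a raw string are the same single character
escaped = 'zongheng\\recommend_1.html'
raw = r'zongheng\recommend_1.html'
print(escaped == raw, len('\\'))  # True 1

# Portable form: '/' on Linux and macOS, '\' on Windows
portable = os.path.join('zongheng', 'recommend_1.html')
print(portable)

# Split the Windows-style path into its components
print(PureWindowsPath(escaped).parts)  # ('zongheng', 'recommend_1.html')
```

On Linux or macOS the backslash version does not create a file inside zongheng; it creates a file whose name literally contains a backslash, which is why os.path.join (or a plain '/') is the safer habit.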

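From the second line on we need some background knowledge.
File I/O modes (the second argument of open()):
r   read only (the file must already exist)
w   write only (creates the file, or empties an existing one)
a   append (creates the file if needed; new data goes at the end)

rb  read only, binary
wb  write only, binary
ab  append, binary
(In the b modes you read and write bytes, and no encoding= argument is allowed.)

r+  read and write (the file must exist; its contents are kept)
w+  write and read (creates the file, or empties an existing one)
a+  append and read (writes always go to the end)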
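The difference between w and a is the one that most often bites crawler code, because rerunning a script in w mode silently replaces earlier output. A small sketch that works in a temporary folder (demo.txt is a throwaway name) so nothing real is overwritten:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')

    with open(path, 'w', encoding='utf-8') as f:  # 'w' creates the file
        f.write('page 1\n')
    with open(path, 'w', encoding='utf-8') as f:  # 'w' again empties it first
        f.write('page 2\n')
    with open(path, 'a', encoding='utf-8') as f:  # 'a' adds to the end
        f.write('page 3\n')

    with open(path, 'r', encoding='utf-8') as f:  # text mode returns str
        text = f.read()
    with open(path, 'rb') as f:                   # binary mode returns bytes
        raw = f.read()

print(repr(text))  # 'page 2\npage 3\n'
print(type(raw))   # <class 'bytes'>
```

The "page 1" line is gone because the second 'w' open truncated the file; the 'a' open kept "page 2" and added after it.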

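open() opens the file, write() writes the string into it,
and every file that is opened must be closed again with close(), which also flushes any buffered data to disk.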

Now for the most important part:

if __name__ == '__main__':

    for i in range(1, 16):
        html = get_one_page(i)

        save_one_page(i, html)

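This block is just an if statement.
__name__ is a special variable that Python sets for every module. When the file is run directly (python file.py), __name__ is '__main__', so the condition is True and the loop runs.
When the file is imported by another module, __name__ is the module's own name (the file name without .py), so the condition is False and the block is skipped. Importing the file therefore does not start a crawl.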
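A small experiment makes the rule concrete. It writes a hypothetical module crawler_demo.py into a temporary folder, runs it once the way `python crawler_demo.py` would (runpy.run_path with run_name='__main__'), and once through a normal import, then reports what each run saw.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Hypothetical module: records its __name__ and whether the guarded block ran
MODULE_SOURCE = (
    "seen_name = __name__\n"
    "ran_main_block = False\n"
    "if __name__ == '__main__':\n"
    "    ran_main_block = True\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'crawler_demo.py')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(MODULE_SOURCE)

    # Like `python crawler_demo.py`: the code runs with __name__ == '__main__'
    as_script = runpy.run_path(path, run_name='__main__')

    # Like `import crawler_demo`: __name__ is the module name
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()  # the file was created after startup
    try:
        crawler_demo = importlib.import_module('crawler_demo')
    finally:
        sys.path.remove(tmp_dir)

print(as_script['seen_name'], as_script['ran_main_block'])  # __main__ True
print(crawler_demo.seen_name, crawler_demo.ran_main_block)  # crawler_demo False
```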

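One more tip: opening and closing files by hand is tedious and easy to get wrong (you can forget close(), or skip it when an exception is raised), so we can change

file = open(file_name, 'w', encoding='utf-8')
file.write(html)
file.close()

into

with open(file_name, 'w', encoding='utf-8') as file:
    file.write(html)

In the with form, the file object closes itself when the block ends, whether the block finishes normally or exits because of an exception; you never call close() yourself.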
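The claim is easy to check with the file object's closed attribute. A small sketch that writes into a temporary folder (the file name just mirrors the crawler's); the second case raises on purpose to show that the file is still closed before the exception leaves the block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, 'recommend_1.html')

    # Normal exit: the with block closes the file
    with open(file_name, 'w', encoding='utf-8') as file:
        file.write('<html></html>')
    closed_after_normal_exit = file.closed

    # Exit through an exception: the file is closed all the same
    try:
        with open(file_name, 'w', encoding='utf-8') as file2:
            raise RuntimeError('simulated failure while writing')
    except RuntimeError:
        pass
    closed_after_error = file2.closed

print(closed_after_normal_exit, closed_after_error)  # True True
```

Written by hand, the same guarantee needs a try/finally around write() with close() in the finally clause, which is exactly the boilerplate with removes.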

The code below saves the first five pages of Baidu Tieba results for a search keyword; work through it on your own.

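import os
from urllib.parse import quote
from urllib.request import urlopen


def __get__one__page(key, index):
    # quote() percent-encodes the keyword (e.g. Chinese text) so the URL is pure
    # ASCII; urlopen() raises UnicodeEncodeError on a raw non-ASCII URL.
    # Each list page holds 50 posts, so page `index` starts at pn = 50 * index.
    url = 'https://tieba.baidu.com/f?kw={}&ie=utf-8&pn={}'.format(quote(key), 50 * index)

    response = urlopen(url)
    return response.read().decode()


def __save__one__page(key, index, html):
    # File names count pages from 1
    file_name = 'Tieba\\{}_{}.html'.format(key, index + 1)

    with open(file_name, 'w', encoding='utf-8') as file:
        file.write(html)


if __name__ == '__main__':
    # Ask for the keyword only when run as a script, not when imported
    key = input('Enter a search term: ')
    os.makedirs('Tieba', exist_ok=True)

    # index 0..4 -> the first five pages
    for index in range(0, 5):
        html = __get__one__page(key, index)
        __save__one__page(key, index, html)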
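The Tieba code imports quote, and the reason matters: URLs must be plain ASCII, so a Chinese keyword has to be percent-encoded before it goes into the URL. The sketch below shows that, plus an alternative that builds the whole query string with urllib.parse.urlencode. build_tieba_url is a hypothetical helper and '中国' is only a sample keyword; the kw/ie/pn parameters and the 50-posts-per-page offset come from the code above.

```python
from urllib.parse import quote, urlencode

key = '中国'  # sample keyword

# quote() percent-encodes the UTF-8 bytes of the keyword
print(quote(key))  # %E4%B8%AD%E5%9B%BD


def build_tieba_url(key, index):
    """Hypothetical helper: encode every query parameter in one call."""
    params = {'kw': key, 'ie': 'utf-8', 'pn': 50 * index}
    return 'https://tieba.baidu.com/f?' + urlencode(params)


print(build_tieba_url(key, 2))
# https://tieba.baidu.com/f?kw=%E4%B8%AD%E5%9B%BD&ie=utf-8&pn=100
```

urlencode() encodes spaces as '+' while quote() uses '%20'; both are accepted in query strings, and urlencode() saves you from escaping each parameter by hand.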
