Python Web Crawler

Latest recommended article published 2022-09-17 15:22:12

Author: sunshine0625


Article link: https://blog.csdn.net/u012680593/article/details/53818792


A crawler that scrapes jokes from Qiushibaike (糗事百科), written for Python 3.5:

1. Download the page

2. Parse it (XPath method)

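# -*- coding: utf-8 -*-
import urllib.error
import urllib.request
import sys
import io
from lxml import etree
from urllib.parse import urljoin

# Re-wrap standard output so Chinese text prints correctly on a GB18030 console
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')


def download(originer_url, p):
    # Build the page URL from the base URL and the page number
    url = str(originer_url) + str(p)
    print(url)
    print(p)
    # Request headers; addheaders expects a list of (name, value) tuples, not a dict
    headers = [('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'),
               ('Connection', 'keep-alive')]
    # Create the opener and attach the headers
    opener = urllib.request.build_opener()
    opener.addheaders = headers
    try:
        # The source listing is cut off at ".dec"; decoding as UTF-8 is an assumption
        page = opener.open(url).read().decode('utf-8')
    except urllib.error.URLError as e:
        # Assumed error handling; the original except clause is not shown in the source
        print('Download failed:', e)
        return None
    return page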
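
The listing stops before step 2, so the parsing code itself is not available. As a rough sketch of what XPath parsing with lxml can look like, the example below extracts joke text from an HTML string. The sample markup, the class names `article` and `content`, and the function name `parse_jokes` are all invented for illustration; they are not taken from the original post and may not match the real Qiushibaike page.

```python
from lxml import etree

# Invented sample markup; the real page structure is not shown in the source.
SAMPLE_HTML = """
<html><body>
  <div class="article">
    <div class="content"><span>First joke,<br/>second line.</span></div>
  </div>
  <div class="article">
    <div class="content"><span>  Another joke.  </span></div>
  </div>
</body></html>
"""


def parse_jokes(page):
    """Return the text of every joke found in an HTML string (hypothetical helper)."""
    tree = etree.HTML(page)  # lenient HTML parser, returns the root element
    jokes = []
    # Select each joke container (assumed class names), then collect its text nodes
    for node in tree.xpath('//div[@class="article"]/div[@class="content"]'):
        parts = [t.strip() for t in node.xpath('.//text()')]
        text = ' '.join(part for part in parts if part)
        if text:
            jokes.append(text)
    return jokes


if __name__ == '__main__':
    for joke in parse_jokes(SAMPLE_HTML):
        print(joke)
```

In a real run, the string returned by `download` would be passed to `parse_jokes` in place of `SAMPLE_HTML`, with the XPath expression adjusted to the page's actual markup.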
