My First Single-Threaded Python Crawler (Using Regular Expressions)

Latest recommended article published on 2020-11-27 11:59:49

FesonX


Category: Python crawlers. Tags: crawler, python

Article link: https://blog.csdn.net/FesonX/article/details/53147619


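My first crawler program

A single-threaded crawler that fetches the page with the requests module and uses zip to iterate over several lists in a single for loop.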
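For readers who have not used zip this way, here is a minimal sketch of the loop pattern on its own. The sample titles, dates and URLs are made up for illustration, and `format_rows` is a hypothetical helper name, not part of the crawler.

```python
# Hypothetical sample data standing in for scraped results.
titles = ['First article', 'Second article', 'Third article']
dates = ['2016-11-12', '2016-11-13', '2016-11-14']
urls = ['http://example.com/1', 'http://example.com/2', 'http://example.com/3']


def format_rows(titles, dates, urls):
    """Pair up the i-th title, date and URL and format each triple as one line."""
    rows = []
    # zip stops at the shortest input, so extra items in a longer list are ignored.
    for title, date, url in zip(titles, dates, urls):
        rows.append('title:%s, created_at:%s, url:%s' % (title, date, url))
    return rows


for row in format_rows(titles, dates, urls):
    print(row)
```

Because zip stops at the shortest list, a page where one pattern matches fewer times than the others will silently drop the trailing entries, so it is worth comparing the list lengths when results look incomplete.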

# coding=utf-8
import re

import requests

# Download the commentary listing page.
response = requests.get('http://money.163.com/special/pinglun/')
text = response.text

# Each entry starts with <div class="item_top">; this short capture ends at
# the first '">' and contains the entry's <a href="..."> link.
link_chunks = re.findall('<div class="item_top">(.*?)">', text, re.S)

# Capture each full entry, up to the <ul class="mod_list"> that follows it.
blocks = re.findall('<div class="item_top">(.*?)<ul class="mod_list">', text, re.S)

# The Python 2 version used str(t2).decode('unicode-escape') here, a workaround
# for output shown as \uXXXX escapes. Python 3 strings are already Unicode,
# so joining the blocks is enough.
joined = ''.join(blocks)

titles = re.findall('title="(.*?)" class=', joined, re.S)
dates = re.findall('<span class="time">(.*?)</span>', joined, re.S)

# Pull the URL out of each link chunk; store a string, not the findall list.
urls = []
for chunk in link_chunks:
    match = re.search('<a href="(.*)', chunk, re.S)
    urls.append(match.group(1) if match else '')

# zip walks the three lists in parallel, one (title, date, url) triple per step.
for title, date, url in zip(titles, dates, urls):
    print('title:%s, created_at:%s, url:%s' % (title, date, url))
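
The extraction step can also be tried offline. The sketch below runs similar non-greedy patterns against a small HTML fragment; the markup is hypothetical, modelled only on the class names that appear in the patterns above, and may not match the real page. `extract_items` and `decode_escapes` are illustrative helper names. The second helper shows one Python 3 way to handle the \uXXXX escapes that the code comment about unicode-escape refers to.

```python
import re

# Hypothetical markup, loosely modelled on the class names used in the patterns.
SAMPLE_HTML = '''
<div class="item_top">
  <h2><a href="http://example.com/a.html" title="Sample title A" class="link">Sample title A</a></h2>
  <span class="time">2016-11-12 10:00:00</span>
</div>
<ul class="mod_list"></ul>
<div class="item_top">
  <h2><a href="http://example.com/b.html" title="Sample title B" class="link">Sample title B</a></h2>
  <span class="time">2016-11-13 09:30:00</span>
</div>
<ul class="mod_list"></ul>
'''


def extract_items(html):
    """Return a (title, date, url) tuple for each item_top block in html."""
    items = []
    # re.S lets "." match newlines; ".*?" is non-greedy, so each block ends
    # at the nearest <ul class="mod_list"> instead of the last one.
    blocks = re.findall(r'<div class="item_top">(.*?)<ul class="mod_list">', html, re.S)
    for block in blocks:
        title = re.search(r'title="(.*?)" class=', block, re.S)
        date = re.search(r'<span class="time">(.*?)</span>', block, re.S)
        url = re.search(r'<a href="(.*?)"', block, re.S)
        if title and date and url:
            items.append((title.group(1), date.group(1), url.group(1)))
    return items


def decode_escapes(text):
    """Turn literal backslash-u escapes in an ASCII string into real characters."""
    return text.encode('ascii').decode('unicode_escape')


for item in extract_items(SAMPLE_HTML):
    print(item)
```

Searching inside each block, instead of building three separate lists, keeps every title with its own date and URL even when one entry is missing a field.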
