Python Web Crawler

ldaokun2006

Article link: https://blog.csdn.net/ldaokun2006/article/details/80240135

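Python 2.7 crawler

A few points to watch out for:

1. The regular-expression functions match, search, and findall all behave differently

match: tries the pattern only at the very beginning of the string; if the pattern does not match there, it returns None
search: scans the whole text but returns only the first match (as a match object), or None if there is no match
findall: scans the whole text and returns every non-overlapping match as a list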
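The difference between the three functions is easiest to see on a single input. The snippet below is a minimal sketch; the sample string and variable names are made up for illustration, and it runs on both Python 2.7 and Python 3.

```python
import re

text = "a1 b22 c333"  # made-up sample input

# match: anchored at the start of the string.
at_start = re.match(r"\d+", text)               # None, the string starts with "a"
prefix = re.match(r"[a-z]\d+", text).group()    # "a1"

# search: scans forward and stops at the first hit.
first = re.search(r"\d+", text).group()         # "1"

# findall: every non-overlapping hit, returned as a list of strings.
every = re.findall(r"\d+", text)                # ["1", "22", "333"]

print(at_start)
print(prefix)
print(first)
print(every)
```

Note that match and search return a match object (or None), so the result should be checked before calling group(), while findall always returns a list, possibly empty.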

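2. Searching for Chinese text

The downloaded page must be decoded to unicode before it can be searched
Chinese characters and punctuation typed into the pattern need a u prefix on the string literal, marking it as unicode, before they can be used in the search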
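Why decoding matters can be shown without any network access. The sketch below assumes the page is UTF-8 encoded and uses a made-up sample string in place of the downloaded bytes; it runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Stand-in for resp.read(): UTF-8 encoded bytes (the sample text is made up).
raw = u'第(一)条 第（二）条'.encode('utf-8')

# On raw bytes, "." matches a single byte, but each Chinese character
# occupies three bytes in UTF-8, so nothing is found.
byte_hits = re.findall(br'\((.)\)', raw)

# After decoding, "." matches one whole character, and the u'' pattern
# can list the full-width parentheses directly.
text = raw.decode('utf-8')
text_hits = re.findall(u'[(（](.)[)）]', text)

print(byte_hits)                       # []
print(text_hits == [u'一', u'二'])      # True
```

Inside a character class the parentheses do not need to be escaped, so `[(（]` and `[\(（]` are equivalent.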

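#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2.7 only: relies on urllib2, unicode(), and the print statement.

import re
import urllib2

if __name__ == '__main__':
    # Download the page; read() returns a raw byte string.
    resp = urllib2.urlopen("http://www.weizhang8.cn/Article/9.html")
    content = resp.read()
    # Decode to unicode (assumes the page is UTF-8) so "." matches one character.
    content = unicode(content, 'utf-8')
    # Capture a single character enclosed in ASCII or full-width parentheses.
    m = re.findall(u'[\(（](.)[\)）]', content)
    for i in m:
        print i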
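For readers on Python 3, where urllib2 and unicode() no longer exist, the same idea might look roughly like the sketch below. The names PAREN_CHAR, extract_parenthesized, and fetch are hypothetical helpers introduced here, not part of the original script, and the page is again assumed to be UTF-8.

```python
import re
from urllib.request import urlopen

# Same pattern as the Python 2.7 script: one character inside ASCII or
# full-width parentheses. PAREN_CHAR is a hypothetical name.
PAREN_CHAR = re.compile('[(（](.)[)）]')


def extract_parenthesized(page_bytes, encoding='utf-8'):
    """Decode the downloaded bytes, then return every parenthesized character."""
    return PAREN_CHAR.findall(page_bytes.decode(encoding))


def fetch(url):
    """Download a page and return its raw bytes (performs a network request)."""
    with urlopen(url) as resp:
        return resp.read()


# Offline check with made-up sample bytes instead of a live download.
sample = '（一）总则 (2)细则'.encode('utf-8')
print(extract_parenthesized(sample))
```

Against the live page, the call would be `extract_parenthesized(fetch("http://www.weizhang8.cn/Article/9.html"))`. Keeping the download separate from the parsing makes the regular expression easy to check on fixed input.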
