February 2018 - Jere_Chen


[Original] Python crawler: scraping the Tencent recruitment site with Scrapy

Background: an Ubuntu 16.04 virtual machine, crawling the job postings at https://hr.tencent.com/. Step 1: create the project: scrapy startproject tencent. Step 2: write the items file: # -*- coding: utf-8 -*- # Define here the models for your scraped items # ...

2018-02-24 23:42:31 4441 2

[Original] Python crawler: scraping text and images together with XPath and re

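Once again the tourism site of my hometown: http://www.patour.cn/site/pananzxw/tcgl/index.html. The goal is to scrape the pictures of the local specialties together with their descriptions. Source: # -*- coding:utf-8 -*- import urllib2 import re from lxml import etree class Spider: ...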

2018-02-16 16:06:26 3387 1
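
The post above is truncated, so the following is only a minimal sketch of the XPath-plus-re idea, not the author's code. It assumes Python 3 (the original uses Python 2's urllib2) and swaps in the standard library's ElementTree, which supports a small XPath subset, for lxml so that it runs without extra packages. The sample markup, the `item` class name and the `extract_items` helper are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; the real page layout is not shown in the post.
SAMPLE = """
<div>
  <div class="item">
    <img src="/images/tea.jpg" alt="tea"/>
    <p>Green tea grown in the hills.</p>
  </div>
  <div class="item">
    <img src="/images/mushroom.png" alt="mushroom"/>
    <p>Dried shiitake mushrooms.</p>
  </div>
</div>
"""

# Regular expression that captures the src attribute of every <img> tag.
IMG_SRC = re.compile(r'<img[^>]*\bsrc="([^"]+)"')


def extract_items(markup):
    """Return (image_url, description) pairs from the markup."""
    root = ET.fromstring(markup)
    # XPath selects the description paragraph inside each item block.
    texts = [p.text.strip() for p in root.findall(".//div[@class='item']/p")]
    # The regular expression pulls the image URLs out of the raw markup.
    images = IMG_SRC.findall(markup)
    return list(zip(images, texts))


if __name__ == "__main__":
    for url, text in extract_items(SAMPLE):
        print(url, "-", text)
```

ElementTree only parses well-formed markup; for real pages a tolerant parser such as lxml's `etree.HTML`, which the post imports, is the more realistic choice.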

[Original] Python crawler: scraping Neihan Duanzi joke text with regular expressions

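Background: an Ubuntu 16.04 virtual machine. Scrape the joke text from Neihan Duanzi, clean up the strings with replace, and download the jokes for however many pages the user asks for. Source: # -*- coding:utf-8 -*- import urllib2 import re class Spider: def __init__(self): ...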

2018-02-13 20:20:21 1781
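
As a rough illustration of the approach in the entry above (a regular expression to grab each joke, `replace` to clean it, and a loop over the requested number of pages), here is a minimal Python 3 sketch. The markup, the `content` class name, and the `clean`, `extract_jokes` and `crawl` helpers are assumptions for illustration; the original source is truncated and its real pattern is not shown. The `fetch` argument stands in for the HTTP request so the example runs offline.

```python
import re

# Hypothetical markup; the real site's structure is not shown in the post.
SAMPLE_PAGE = '''
<div class="content"><p>First joke.<br />Second line.</p></div>
<div class="content"><p>  Another&nbsp;joke.  </p></div>
'''

# re.S lets "." match newlines, so a joke may span several lines.
JOKE = re.compile(r'<div class="content">(.*?)</div>', re.S)


def clean(fragment):
    """Strip leftover tags and entities with plain str.replace calls."""
    text = (fragment.replace("<p>", "")
                    .replace("</p>", "")
                    .replace("<br />", "\n")
                    .replace("&nbsp;", " "))
    return text.strip()


def extract_jokes(page_html):
    """Return the cleaned text of every joke on one page."""
    return [clean(match) for match in JOKE.findall(page_html)]


def crawl(pages, fetch):
    """Collect jokes from pages 1..pages; fetch(page_number) returns HTML."""
    jokes = []
    for number in range(1, pages + 1):
        jokes.extend(extract_jokes(fetch(number)))
    return jokes


if __name__ == "__main__":
    # A stub fetch that returns the same sample page for every page number.
    for joke in crawl(2, lambda number: SAMPLE_PAGE):
        print(joke)
        print("-" * 20)
```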

[Original] Python crawler: scraping images with re and XPath

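Background: an Ubuntu 16.04 virtual machine, using XPath together with re to scrape images from www.uumnt.cc/, specifically its animals section. Approach: walk through the animals section page by page, collect the links to the individual animal pages, then parse each link and fetch its images (4 images per link). Source: # -*- coding:utf-8 -*- # Preparing to crawl https://www.uumn...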

2018-02-05 09:01:43 2018
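
The three-stage flow described in the entry above (listing pages, then detail links, then up to 4 images per link) might look roughly like this in Python 3. This is a sketch under stated assumptions, not the author's code: the URLs, the in-memory `PAGES` dictionary that replaces real HTTP requests, and the `fetch` and `crawl` helpers are all hypothetical, and the standard library's ElementTree stands in for lxml.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical in-memory "site" standing in for HTTP responses.
PAGES = {
    "/animals/list_1.html": '<ul><li><a href="/animals/cat.html">cat</a></li>'
                            '<li><a href="/animals/dog.html">dog</a></li></ul>',
    "/animals/list_2.html": '<ul><li><a href="/animals/owl.html">owl</a></li></ul>',
    "/animals/cat.html": "".join('<img src="/img/cat%d.jpg"/>' % i for i in range(1, 7)),
    "/animals/dog.html": "".join('<img src="/img/dog%d.jpg"/>' % i for i in range(1, 3)),
    "/animals/owl.html": "".join('<img src="/img/owl%d.jpg"/>' % i for i in range(1, 6)),
}

# Regular expression that captures the src attribute of every <img> tag.
IMG_SRC = re.compile(r'<img[^>]*\bsrc="([^"]+)"')


def fetch(url):
    """Stand-in for an HTTP GET; looks the page up in PAGES."""
    return PAGES[url]


def crawl(fetch_page, list_urls, per_link=4):
    """Visit each listing page, follow its links, keep up to per_link images each."""
    images = []
    for list_url in list_urls:
        # XPath finds every link on the listing page.
        root = ET.fromstring(fetch_page(list_url))
        for link in root.findall(".//a"):
            detail_html = fetch_page(link.get("href"))
            # re extracts the image URLs; slicing enforces the per-link limit.
            images.extend(IMG_SRC.findall(detail_html)[:per_link])
    return images


if __name__ == "__main__":
    for url in crawl(fetch, ["/animals/list_1.html", "/animals/list_2.html"]):
        print(url)
```

Passing `fetch_page` in as a parameter keeps the parsing logic testable offline; a real crawler would supply a function that performs the request and decodes the response.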

[Original] Python regular expressions: practice exercises

This post collects some notes on regular expressions and will keep being updated. Extracting link content with re: # -*- coding:utf-8 -*- import re # Method 1 # ret = re.search(r"www.baidu.com","<p>www.baidu.com</p>") # Method 2 # ret = re.search(r".*\Bai\B.*","<p>w

2018-02-01 16:13:48 1538
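
To make the first exercise in the entry above concrete, here is a small Python 3 sketch built around the same `re.search` call. It is not from the post: it adds an escaped variant and a capturing-group variant, both my own additions, to show that the unescaped dots in `www.baidu.com` match any character.

```python
import re

html = "<p>www.baidu.com</p>"

# Unescaped dots match any character, so this pattern is looser than it looks.
loose = re.search(r"www.baidu.com", html)

# Escaping the dots restricts the match to the literal host name.
strict = re.search(r"www\.baidu\.com", html)

# A capturing group extracts whatever sits between the <p> tags.
inner = re.search(r"<p>(.*?)</p>", html)

print(loose.group())
print(strict.group())
print(inner.group(1))

# The loose pattern also accepts a string that is not the host name at all.
print(bool(re.search(r"www.baidu.com", "wwwXbaiduYcom")))
print(bool(re.search(r"www\.baidu\.com", "wwwXbaiduYcom")))
```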
