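Introduction
I used Scrapy to crawl a certain American TV series site. I had not planned to crawl it, but the site carries far too many ads, and it recently split what used to be a single page into six. Every visit meant opening six pages and sitting through a pile of ads, my clunky old computer kept freezing, and it was driving me crazy. So I wrote my own crawler; once it finishes crawling, it generates ad-free pages, and my mood improved instantly ^_^.
Look: nothing but ads, and the resources are split by day across six pages.
So I rolled up my sleeves and customised the site myself. The screenshot below shows the result.
As you can see, the customised page is much cleaner.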
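To make the "crawl six pages, emit one clean page" idea concrete, here is a minimal sketch of the page-generation step using only the standard library. It is not the code from the demo: the helper names `merge_pages` and `render_clean_page`, the `(name, url)` item shape, and the sample data are all assumptions made for illustration.

```python
from html import escape


def merge_pages(pages):
    """Flatten per-page item lists into one list, keeping page order."""
    merged = []
    for page_items in pages:
        merged.extend(page_items)
    return merged


def render_clean_page(title, items):
    """Build a bare HTML page (no ads, no scripts) from (name, url) pairs."""
    rows = "\n".join(
        '<li><a href="{}">{}</a></li>'.format(escape(url, quote=True), escape(name))
        for name, url in items
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>{0}</title></head>\n"
        "<body><h1>{0}</h1>\n<ul>\n{1}\n</ul>\n</body></html>\n"
    ).format(escape(title), rows)


# Hypothetical data standing in for what a spider might scrape from two pages.
scraped_pages = [
    [("Example Show S01E01", "http://example.com/show-s01e01")],
    [("Example Movie <HD>", "http://example.com/movie?id=1&hd=1")],
]
items = merge_pages(scraped_pages)
page = render_clean_page("Latest", items)
```

Writing `page` to a local `.html` file would give a single page to open in the browser. Escaping the scraped text with `html.escape` keeps stray `<` or `&` characters in titles and links from breaking the generated markup.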
Demo
Demo download link:
http://download.csdn.net/detail/juwikuang/9855793
Dependencies: Python, Scrapy
To run it, just double-click run.bat.
Code
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Spider for TTMEIJU.COM
Previously, ttmeiju.com presented all the latest TV shows and movies
on one single page, which was very convenient for users.
However, since maybe last year, ttmeiju has split that single page into
six pages, which is very annoying to me.
I miss the good old days when there was only one page......
Do you? If you do, this script is for you.
Created on Sun May 28 12:09:05 2017
@author: Eric Chow
"""
import scrapy
from scrapy import signals
class LatestSpider(scrapy.Spider):
    name = "latest"
    start_urls = [
        "http://www.ttmeiju.com/latest-0.ht