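Preface: Ever feel like video sites are plastered with ads, or that playback is painfully slow? Don't worry. By the end of this article you'll be a seasoned pro.
1. Install Scrapy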
sudo apt-get install python-scrapy
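Note:
The official Scrapy documentation advises against installing it this way, but in my own testing none of the installation methods on the official site worked for me. The Ubuntu installation guide is here: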
http://doc.scrapy.org/en/latest/intro/install.html#ubuntu-9-10-or-above
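Before moving on, it can help to confirm that the package is actually importable. The snippet below is only a minimal sketch of such a check: it assumes Python 3.4 or later, and the helper name `is_installed` is made up for illustration (it is not part of Scrapy).

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    # find_spec returns None when the module cannot be located
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("scrapy importable:", is_installed("scrapy"))
```

If this prints False, the install step above did not place Scrapy on the interpreter you are running.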
2. Create the project
sudo scrapy startproject Mp4
Once the command finishes, the initial project structure appears in the current directory.
3. Talk is cheap. Show me the code.
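As a rough sanity check on that structure, the sketch below lists the files that `scrapy startproject` typically generates and reports any that are missing. The file list reflects the usual layout and may differ between Scrapy versions; the helper names `expected_files` and `missing_files` are hypothetical, chosen just for this example.

```python
from pathlib import Path


def expected_files(project_name):
    """Relative paths usually created by `scrapy startproject <project_name>`.

    Assumption: this is the typical layout; newer versions may add more files.
    """
    pkg = Path(project_name)
    return [
        Path("scrapy.cfg"),
        pkg / "__init__.py",
        pkg / "items.py",
        pkg / "pipelines.py",
        pkg / "settings.py",
        pkg / "spiders" / "__init__.py",
    ]


def missing_files(project_root, project_name):
    """Return the expected paths (as POSIX strings) that do not exist under project_root."""
    root = Path(project_root)
    return [p.as_posix() for p in expected_files(project_name) if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the directory where `scrapy startproject Mp4` was executed
    print(missing_files("Mp4", "Mp4"))
```

An empty list means every expected file is in place.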
items.py
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
import scrapy


class Mp4Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    mp4name = scrapy.Field()
    mp4url = scrapy.Field()
middlewares.py
# -*-coding:utf-8-*-
import random
# Note: Scrapy 1.0 and later moved this module to
# scrapy.downloadermiddlewares.useragent
from scrapy.contrib.downloadermiddleware.useragent import UserAgentMiddleware
from scrapy import log


class RotateUserAgentMiddleware(UserAgentMiddleware):
    def __init__(self, user_agent=''):
        self.user_agent = user_agent

    def process_request(self, request, spider):
        # Pick a random User-Agent for each outgoing request
        ua = random.choice(self.user_agent_list)
        if ua:
            #print 'Current UserAgent: ' + ua
            # setdefault keeps any User-Agent already set on the request
            request.headers.setdefault('User-Agent', ua)

    # Pool of browser User-Agent strings to rotate through
    user_agent_list = [
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
        "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
        "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
        "(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
        "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
        "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
        "(KHTM