Basic usage of the Scrapy framework
Install the scrapy module, plus beautifulsoup4, which the spider below imports as bs4
pip install scrapy beautifulsoup4
Change to the directory where the project should live and create the project
scrapy startproject scrapyDemo
Go into the project's spiders directory (scrapyDemo/scrapyDemo/spiders) and create scrapy_demo.py
import scrapy
from bs4 import BeautifulSoup


class tsSpider(scrapy.Spider):
    name = "demo"

    def start_requests(self):
        urls = [r'https://www.cnblogs.com/', ]
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
        for url in urls:
            yield scrapy.Request(url=url, headers=headers, callback=self.parse)

    def parse(self, response):
        soup = BeautifulSoup(response.body, "html.parser")
        titles = soup.find_all("a", "titlelnk")
        for title in titles:
            print(title.string)
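To see what the title extraction in parse selects without running a crawl, here is a minimal standalone sketch. It is not the spider's own code: it swaps BeautifulSoup for the standard library's html.parser so it runs with no extra packages, and the names TitleLinkParser, extract_titles and SAMPLE_HTML are hypothetical, as is the sample markup (it is not the real page). Like find_all("a", "titlelnk"), it keeps every a tag whose class list contains titlelnk.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collects the text of every <a> tag whose class list contains css_class."""

    def __init__(self, css_class="titlelnk"):
        super().__init__()
        self.css_class = css_class
        self.titles = []      # finished link texts
        self._buffer = None   # text pieces of the link being read, or None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # The class attribute may be missing or empty; treat both as "no classes".
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


def extract_titles(html, css_class="titlelnk"):
    """Return the text of all matching links found in an HTML string."""
    parser = TitleLinkParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.titles


# Made-up sample markup for illustration only.
SAMPLE_HTML = """
<div class="post_item">
  <a class="titlelnk" href="/p/1">First post</a>
  <a class="other" href="/p/2">Not a title</a>
  <a class="titlelnk extra" href="/p/3">Second post</a>
</div>
"""

if __name__ == "__main__":
    for title in extract_titles(SAMPLE_HTML):
        print(title)
```

One difference from the spider: this sketch joins all text inside a link, whereas BeautifulSoup's .string returns None when a tag has more than one child, so links with nested markup would print as None in the spider.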
From inside the project directory, run the spider by its name attribute (demo):
scrapy crawl demo