python
鲲仔
Handling POST requests in a crawler
Sending a JSON request payload with a crawler's POST request:
import json
import requests
from lxml import etree
url = 'https://tass.com/userApi/categoryNewsList'
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '…
Original · 2021-06-03 18:02:54 · 251 views · 0 comments
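A minimal sketch of the JSON-payload POST this excerpt describes. The payload keys (`categoryId`, `limit`), the helper names `build_json_post` and `post_json`, and the User-Agent value are hypothetical illustrations, not the fields the endpoint actually expects. The request is only built here; `post_json` would send it with `requests` when called.

```python
import json


def build_json_post(payload, user_agent="example-crawler/0.1"):
    """Return (headers, body) for a POST whose body is a JSON request payload."""
    # Serialize once so the exact bytes sent are known; ensure_ascii=False
    # keeps non-ASCII text as UTF-8 instead of \u escapes.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {
        "User-Agent": user_agent,
        # Tells the server to parse the body as JSON rather than form data.
        "Content-Type": "application/json",
    }
    return headers, body


def post_json(url, payload, timeout=10):
    """Send the payload with requests and return the decoded JSON response."""
    import requests  # imported lazily so the builder works without it

    headers, body = build_json_post(payload)
    response = requests.post(url, data=body, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.json()


# Hypothetical payload: the real field names have to be copied from the
# "Request Payload" shown in the browser's network panel.
headers, body = build_json_post({"categoryId": 1, "limit": 20})
```

With `requests`, passing `json=payload` to `requests.post` is an equivalent shortcut: it serializes the dict and sets the Content-Type header itself.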
Scrapy: bouncing back and forth between multiple yields
Multiple yield scrapy.Request statements in Scrapy:
import scrapy
import re
import requests
import json
import time
from ..items import InduspiderItem
from newspaper import Article
from gne import GeneralNewsExtractor
from date_extractor import extract_date
from lxml import etr…
Original · 2021-06-03 10:49:34 · 753 views · 0 comments
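To illustrate how several `yield` statements in one spider hand work back and forth, here is a toy model that runs without Scrapy installed. It is not Scrapy's engine: the `Request` class is a hypothetical stand-in for `scrapy.Request`, the callbacks take `(url, meta)` instead of a response object, and the URLs are made up. It only shows the control flow: a callback yields requests, the engine queues them, and each request's own callback runs later.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    """Stand-in for scrapy.Request: a URL plus the callback that handles it."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


def parse_list(url, meta):
    # A list page yields one request per article; each yield passes a
    # request to the engine loop in crawl().
    for i in (1, 2):
        yield Request(f"{url}/article-{i}", parse_article, {"list_url": url})
    # Then it yields the next list page, handled by this same callback.
    if not url.endswith("page-2"):
        yield Request("site/page-2", parse_list)


def parse_article(url, meta):
    # Anything that is not a Request is treated as a scraped item.
    yield {"url": url, "from": meta["list_url"]}


def crawl(start_url):
    """Toy engine: queue requests, call their callbacks, collect items."""
    queue = deque([Request(start_url, parse_list)])
    items = []
    while queue:
        request = queue.popleft()
        for result in request.callback(request.url, request.meta):
            if isinstance(result, Request):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl("site/page-1")
```

In a real spider the same shape appears as `yield scrapy.Request(url, callback=self.parse_article, meta={...})`, with the per-request data read back from `response.meta` in the callback.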
Setting a proxy IP for newspaper
Configuring a proxy for newspaper:
from newspaper import Article
from newspaper.configuration import Configuration
# add your corporate proxy information and test the connection
PROXIES = {
    'http': "http://ip_address:port_number",
    'https': "https://ip_ad…
Original · 2021-05-20 15:58:53 · 355 views · 0 comments
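A hedged sketch of how the proxy dictionary from this excerpt might be wired into newspaper. It assumes the newspaper3k `Configuration` object exposes a `proxies` attribute and that `Article` accepts `config=`; the helper names `build_proxies` and `fetch_article` and the proxy address are hypothetical. It also assumes one proxy serves both schemes, which may not match the original post.

```python
def build_proxies(proxy_url):
    """Map both URL schemes to one proxy, as a requests-style proxies dict."""
    return {"http": proxy_url, "https": proxy_url}


def fetch_article(url, proxies):
    """Download and parse one article through the proxy."""
    # Imported lazily so build_proxies can be used without newspaper installed.
    from newspaper import Article
    from newspaper.configuration import Configuration

    config = Configuration()
    config.proxies = proxies  # assumed attribute; applied to the download
    article = Article(url, config=config)
    article.download()
    article.parse()
    return article.title, article.text


# Placeholder address from a documentation-only IP range; replace it with
# a real proxy before calling fetch_article.
PROXIES = build_proxies("http://192.0.2.10:8080")
```

Calling `fetch_article(some_url, PROXIES)` would then route the download through the proxy, provided the assumptions above hold for the installed newspaper version.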