Web Crawlers
lgc_
Scraping Baidu Tieba images with Python
# -*- coding: utf-8 -*-
import requests
from lxml import etree
# &pn=50
class Ximage:
    def __init__(self):
        self.baseurl = "http://tieba.baidu.com/f?kw="  # base URL of the forum home page
        ...
Original · 2018-09-13 17:37:55 · 446 views · 0 comments
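The excerpt above only shows the setup (a base URL and a `# &pn=50` hint), so what follows is a minimal sketch of the two steps such a scraper usually needs, not the author's code: building paginated list-page URLs, and pulling image URLs out of a fetched page. It assumes `pn` advances by 50 per page, and it swaps the original's `requests` + lxml `etree` for the standard library (`urllib.parse`, `html.parser`) so it runs without extra packages. `build_page_urls`, `extract_image_urls`, and `ImgSrcParser` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE_URL = "http://tieba.baidu.com/f?"


def build_page_urls(keyword, pages, step=50):
    """List-page URLs for one forum.

    Assumption: pn advances by 50 per page, as the '# &pn=50' comment suggests.
    urlencode percent-encodes non-ASCII keywords as UTF-8.
    """
    return [BASE_URL + urlencode({"kw": keyword, "pn": i * step}) for i in range(pages)]


class ImgSrcParser(HTMLParser):
    """Collects the src of every <img> tag (a stand-in for an lxml XPath like //img/@src)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def extract_image_urls(html):
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    return parser.srcs
```

To use it against the live site you would pass each URL to `requests.get(...)` and feed `response.text` to `extract_image_urls`; on a real forum page you would probably narrow the match to post images rather than every `<img>`.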
Scraping Qiushibaike with Python
import requests
from lxml import etree
import pymongo
class QiushiSpider:
    def __init__(self):
        self.url = "https://www.qiushibaike.com/text/page/8/"  # URL to crawl
        self.headers ...
Original · 2018-09-14 12:03:02 · 176 views · 0 comments
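A hedged sketch of the parse-and-store flow the Qiushibaike excerpt sets up (`requests`, lxml, `pymongo`): generalize the hard-coded page URL, extract each post's text, and shape the results as documents for MongoDB. The `<div class="content">` markup, the `User-Agent` value, and the helper names are assumptions for illustration, and the standard-library `HTMLParser` stands in for lxml so the example runs on its own.

```python
from html.parser import HTMLParser

# Assumed: a browser-like User-Agent; the excerpt's actual headers are cut off.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def page_url(n):
    """The excerpt hard-codes page 8; this generalizes it to page n (assumption)."""
    return f"https://www.qiushibaike.com/text/page/{n}/"


class ContentParser(HTMLParser):
    """Collects the text inside each <div class="content"> (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # div nesting depth inside the current match; 0 = outside
        self.buf = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested div inside a matched block
        elif "content" in (dict(attrs).get("class") or "").split():
            self.depth, self.buf = 1, []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:  # closed the matched div itself
                text = "".join(self.buf).strip()
                if text:
                    self.items.append(text)

    def handle_data(self, data):
        if self.depth:
            self.buf.append(data)


def parse_contents(html):
    parser = ContentParser()
    parser.feed(html)
    parser.close()
    return parser.items


def to_documents(items, page):
    """Shape parsed items as dicts, ready for something like collection.insert_many(docs)."""
    return [{"page": page, "text": text} for text in items]
```

With the real libraries, the HTML would come from `requests.get(page_url(8), headers=HEADERS).text`, and the documents could be written with pymongo's `collection.insert_many(docs)`. Keeping parsing separate from fetching and storage makes each step testable offline.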
Scraping Ajax-loaded dynamic content with Python
import requests
import json
import csv
url = "https://movie.douban.com/j/chart/top_list?"
params = {"type": 17, "interval_id": "100:90", "action": "", "start": 0, "limit": 100
Original · 2018-09-14 12:05:13 · 1042 views · 0 comments
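Pages that load their data via Ajax can often be scraped by calling the underlying JSON endpoint directly, which is what the Douban excerpt does. The sketch below uses only the parameters visible in the truncated excerpt: it builds the query string, turns the JSON array into rows, and writes them as CSV. The field names `title` and `score` are assumptions about the payload, and `build_url`, `rows_from_json`, and `write_csv` are hypothetical helpers; in the original, `requests.get(url, params=params)` would build an equivalent query string.

```python
import csv
import io
import json
from urllib.parse import urlencode

URL = "https://movie.douban.com/j/chart/top_list?"
# Only the parameters visible in the (truncated) excerpt.
PARAMS = {"type": 17, "interval_id": "100:90", "action": "", "start": 0, "limit": 100}


def build_url(params):
    """Append the URL-encoded query string to the endpoint."""
    return URL + urlencode(params)


def rows_from_json(text, fields=("title", "score")):
    """Turn a JSON array of objects into rows; the field names are assumptions."""
    return [[item.get(f, "") for f in fields] for item in json.loads(text)]


def write_csv(rows, fields, fh):
    """Write a header row followed by the data rows."""
    writer = csv.writer(fh)
    writer.writerow(fields)
    writer.writerows(rows)


if __name__ == "__main__":
    # Offline demo with a made-up payload shaped like the assumed response.
    sample = '[{"title": "A", "score": "9.0"}]'
    buf = io.StringIO()
    write_csv(rows_from_json(sample), ("title", "score"), buf)
    print(buf.getvalue())
```

Against the live endpoint you would pass `requests.get(URL, params=PARAMS).text` to `rows_from_json`; when writing to a real file, opening it with `newline=""` (as the `csv` module documentation recommends) avoids blank rows on Windows.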
Crawling with Selenium + BeautifulSoup
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import time
driver = webdriver.PhantomJS()
driver.get("https://www.douyu.com/directory/all")
#while True:
i = 1
while True:
    #htm...
Original · 2018-09-14 17:37:31 · 304 views · 0 comments
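The Douyu excerpt drives a browser with Selenium and then, inside a `while True:` loop, parses the page and moves on through the directory. The sketch below isolates that loop so it runs without a browser: the page source, the next-page check, and the next-page action are passed in as functions, and a small `HTMLParser` subclass stands in for BeautifulSoup's `find_all(class_=...)`. The class name `room-title` and all helper names are hypothetical, and the loop is capped at `max_pages` rather than running unbounded. Note that PhantomJS is no longer maintained and recent Selenium releases no longer include its driver, so a headless Chrome or Firefox driver is the usual substitute.

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collects stripped text of elements whose class list contains `cls`.

    Rough stand-in for soup.find_all(class_=cls); a same-named tag nested
    inside a match would end the capture early (sketch only).
    """

    def __init__(self, cls):
        super().__init__()
        self.cls = cls
        self.tag = None  # tag name of the element currently being captured
        self.buf = []
        self.found = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.tag is None and self.cls in classes:
            self.tag, self.buf = tag, []

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.found.append("".join(self.buf).strip())
            self.tag = None

    def handle_data(self, data):
        if self.tag is not None:
            self.buf.append(data)


def parse_titles(html, cls="room-title"):  # "room-title" is a hypothetical class name
    parser = ClassTextParser(cls)
    parser.feed(html)
    parser.close()
    return parser.found


def crawl_pages(get_source, has_next, go_next, parse, max_pages=50):
    """Mirror the pagination loop: parse the current page, stop when there is
    no next page, otherwise advance and repeat (capped at max_pages)."""
    results = []
    for _ in range(max_pages):
        results.extend(parse(get_source()))
        if not has_next():
            break
        go_next()
    return results
```

Wired to Selenium, `get_source` could be `lambda: driver.page_source`, and `go_next` a function that locates the next-page button with `driver.find_element(By.CSS_SELECTOR, ...)` and calls `.click()` on it; the selector is site-specific and not shown in the excerpt. A short `time.sleep` after each click gives the new page time to render before its source is read.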