Today's case study uses Maoyan cinemas as the example:
scrape all the information for every cinema in each region.
url: https://maoyan.com/cinemas
import requests
from lxml import etree
from selenium import webdriver
from urllib import request,parse
import time
dirver=webdriver.PhantomJS(executable_path=r'D:\ysc桌面\Desktop\phantomjs-2.1.1-windows\bin\phantomjs.exe')
#dirver=webdriver.Chrome()
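# Proxy IPs (requests looks proxies up by the lowercase URL scheme, so the keys must be "http"/"https")
proxy = {
    "http": "113.3.152.88:8118",
    "https": "219.234.5.128:3128",
}
# Spoofed request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
}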
# Target URL
base_url="https://maoyan.com/cinemas"
# Open the page and fetch its content
response= requests.get(url=base_url,headers=headers,proxies=proxy)
html=response.content.decode("utf-8")
with open("maoyan.html", "w", encoding="utf-8") as fb:
    fb.write(html)
# Call etree.HTML to parse the page into an element tree
html_tree = etree.HTML(html)
# Get the brand ids, district ids, and special-hall ids
li_tree=html_tree.xpath('//ul[@class="tags-lines"]/li')
# Get the brand ids
brandId_dict = {}
for i in li_tree[0].xpath('./ul/li')[1

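This article uses Python's Selenium library to demonstrate a deep crawl of cinema information for each region on the Maoyan movie site. The steps covered include setting the URL, parsing the regions, and iterating over the cinemas.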
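
Since the listing above breaks off before the ids are used, here is a minimal sketch of one way the "set the URL" step could look once brand and district ids have been collected. It is not the author's code: the helper name `build_filter_urls`, the query parameter names `brandId` and `districtId`, and the sample ids are all assumptions for illustration and should be checked against the links on the live page.

```python
from itertools import product
from urllib.parse import urlencode

BASE_URL = "https://maoyan.com/cinemas"


def build_filter_urls(brand_ids, district_ids, base_url=BASE_URL):
    """Return one listing URL per (brand, district) pair.

    The parameter names brandId and districtId are assumed, not confirmed
    by the article; verify them against the real filter links.
    """
    urls = []
    for brand_id, district_id in product(brand_ids, district_ids):
        # urlencode escapes the values and joins them with "&"
        query = urlencode({"brandId": brand_id, "districtId": district_id})
        urls.append(f"{base_url}?{query}")
    return urls


# Hypothetical ids; in the script they would come from brandId_dict
# and a similar dict built for the districts.
example_urls = build_filter_urls(["101", "102"], ["7"])
for url in example_urls:
    print(url)
```

Each generated URL could then be passed to `requests.get` with the same `headers` and `proxy` as above, or opened in the Selenium driver if the cinema list is rendered by JavaScript.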



