Crawling WeChat Official Account Articles (via Sogou WeChat Search)

Latest recommended article published 2024-10-09 17:23:38

mr_guo_lei

Category: python notes. Tags: python, cookie, WeChat Sogou crawler

Article link: https://blog.csdn.net/mr_guo_lei/article/details/78570744

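1. Log in through a simulated browser and collect the cookies.

2. Call requests.get() with those cookies attached.

3. Counter the anti-crawling measures (still unsettled; for now I use proxy IPs plus sleeping between requests. Sogou's approach: ban the IP, ban the cookie, and keep a close eye on repeat offenders [I have some choice words about this that I really must say]).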
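Steps 2 and 3 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this post: the helper name `polite_get`, the `User-Agent` value, the timeout, and the delay range are all made up for the example. `session` is expected to be a `requests.Session` (any object with a compatible `get` method will do), and `proxies` uses the usual requests-style dict.

```python
import random
import time

# Hypothetical browser-like header; the post does not say which headers it sends.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def polite_get(session, url, cookies, proxies=None,
               delay_range=(2.0, 5.0), sleep=time.sleep):
    """Fetch `url` with the login cookies after a random pause.

    session     -- a requests.Session (or anything with a compatible .get())
    cookies     -- dict of cookie name -> value, e.g. the result of get_cookies()
    proxies     -- optional requests-style dict such as {"http": "http://host:port"}
    delay_range -- (min, max) seconds to sleep before the request (assumed values)
    sleep       -- injectable sleep function, handy for testing
    """
    # Sleep a random amount so requests do not arrive at a fixed rhythm.
    sleep(random.uniform(*delay_range))
    return session.get(
        url,
        cookies=cookies,
        proxies=proxies,
        headers=DEFAULT_HEADERS,
        timeout=10,
    )
```

With requests installed, a call might look like `polite_get(requests.Session(), "http://weixin.sogou.com/", cookies=get_cookies())`. Tune the delay range to whatever the site tolerates.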

Below is code that barely works (choose the proxy IPs and the sleep interval to suit your own situation).

PS: the code for obtaining proxy IPs, get_ip_pools, is in my pinned post.
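Since the proxy helper is not reproduced here, the following is only a sketch of how a proxy pool is commonly rotated. It assumes the pool is a plain list of `host:port` strings. `pick_proxy` and `drop_proxy` are hypothetical names for this example, not functions from `get_ip_pools`.

```python
import random


def pick_proxy(pool, rng=random):
    """Return a requests-style proxies dict for a random pool entry.

    Returns None when the pool is empty, meaning "connect directly".
    """
    if not pool:
        return None
    address = rng.choice(pool)
    # Route both http and https traffic through the same HTTP proxy.
    return {"http": f"http://{address}", "https": f"http://{address}"}


def drop_proxy(pool, proxies):
    """Remove the proxy behind `proxies` from the pool, e.g. once it is banned."""
    address = proxies["http"].split("://", 1)[1]
    if address in pool:
        pool.remove(address)
```

The dict returned by `pick_proxy` can be passed as the `proxies` argument of `requests.get()`. When a request through it fails or hits a ban page, `drop_proxy` discards that entry before the next attempt.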

from selenium import webdriver
import requests
import time
from bs4 import BeautifulSoup
import re
from mysql_py import *
import threading
from urllib import request
from get_ip_pools import *
import random

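# Get the login cookies
def get_cookies():
    driver = webdriver.Chrome()
    driver.get("http://weixin.sogou.com/")

    # find_element_by_xpath was removed in recent Selenium 4 releases;
    # "xpath" is the string value of By.XPATH
    driver.find_element("xpath", '//*[@id="loginBtn"]').click()
    # Leave time to finish logging in by hand (e.g. scanning the QR code)
    time.sleep(10)

    # Flatten Selenium's list of cookie dicts into a name -> value dict,
    # which is the form requests accepts
    cookies = driver.get_cookies()
    cookie = {}
    for items in cookies:
        cookie[items.get('name')] = items.get('value')
    return cookie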

#url = "http://weixin.sougou.com