Font-Based Anti-Scraping on Anjuke Rental Prices


Author: 傲慢与偏见

Category: web scraping. Tags: python, other

Article link: https://blog.csdn.net/q632655672/article/details/108416024


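This post covers two problems encountered while scraping rental prices from Anjuke: the saved CSV file cannot be read with `pd.read_csv()`, and the code fails when it reaches page 100. The author analyzes both problems.

(Summary generated automatically by CSDN.)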

1. The code runs without errors, but the saved CSV cannot be read with pd.read_csv()

import requests
from lxml import etree
import re
import time
import random
import csv
from fontTools.ttLib import TTFont
import base64
import io
import pandas as pd
# Suppress the InsecureRequestWarning raised by verify=False requests
import urllib3
urllib3.disable_warnings()

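def decode_base64(font_face):
    """Map each obfuscated HTML entity (e.g. "&#x9fa4;") to the digit it renders as."""
    font_bytes = base64.b64decode(font_face)
    font = TTFont(io.BytesIO(font_bytes))
    # getBestCmap() returns {code point: glyph name}.
    bestcmap = font['cmap'].getBestCmap()
    unicode_num_dict = {}
    for code_point, glyph_name in bestcmap.items():
        # Glyph names are "glyph" followed by an index; the digit is that index minus 1.
        num = int(glyph_name.replace("glyph", "")) - 1
        # Rebuild the entity in the form used as the dictionary key, e.g. "&#x9fa4;".
        entity = hex(code_point).replace("0x", "&#x") + ";"
        unicode_num_dict[entity] = str(num)
    return unicode_num_dict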
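
The dictionary returned by `decode_base64` is keyed by hexadecimal character references, so one way to use it is to substitute those references in the raw page text before any HTML parsing. The following is a minimal sketch of that step, not the author's code: the helper name `replace_obfuscated_digits`, the sample mapping, and the sample HTML (including the `strongbox` class) are made up for illustration, and it assumes the references appear in the page source in the same lowercase form as the dictionary keys.

```python
import re

# Matches hexadecimal character references such as "&#x9fa4;".
ENTITY_PATTERN = re.compile(r"&#x[0-9a-fA-F]+;")


def replace_obfuscated_digits(raw_html, unicode_num_dict):
    """Replace every mapped character reference with its digit; leave others alone."""
    def substitute(match):
        # hex() produces lowercase digits, so normalize before the lookup.
        entity = match.group(0).lower()
        return unicode_num_dict.get(entity, match.group(0))

    return ENTITY_PATTERN.sub(substitute, raw_html)


# Hypothetical mapping in the shape decode_base64 returns.
sample_mapping = {"&#x9fa4;": "2", "&#x9fa5;": "5", "&#x9476;": "0"}
# Hypothetical fragment of raw (unparsed) page source.
sample_html = '<b class="strongbox">&#x9fa4;&#x9fa5;&#x9476;&#x9476;</b> yuan/month'

decoded_html = replace_obfuscated_digits(sample_html, sample_mapping)
print(decoded_html)
```

The substitution has to run on the undecoded response text. An HTML parser such as lxml resolves character references into characters during parsing, after which string keys like `&#x9fa4;` no longer match anything.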

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3775.400 QQBrowser/10.6.4208.400",
    "Connection": "close",
}
# Spreadsheet of all street links for Anjuke rentals; it must contain a `link` column.
# A raw string keeps the backslashes in the Windows path literal.
street_links = pd.read_excel(r".\安居客\租房\租房_所有街道链接.xlsx")
# df_汇总=pd.DataFrame(columns=['dis','addr','title','house_scale','house_square','price','house_floor','rent_type1','rent_type2'])

count = 1
for i in street_links.link.tolist():
    # Each street link takes a page suffix, e.g. "https://fs.zu.anjuke.com/fangyuan/chanchengqu/p{}/".
    base_url = i + 'p{}/'

    page = 1
    while page <= 50:
        url = base_url.format(page)
        try:
            res = requests.get(url, headers=headers, verify=False)
            content = res.content.decode("utf-8")
            # Extract the base64-encoded font embedded in the page source.
            font_face = re.findall(r"charset=utf-8;base64,(.*)'\) format", content)[0]
        except Exce
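
The font-extraction step inside the `try` block indexes `[0]` on the result of `re.findall`, which raises `IndexError` whenever the response carries no embedded font. The sketch below isolates that step so a missing font can be detected explicitly. It is an illustration under assumptions rather than the author's code: the helper name `extract_font_face` and the sample CSS string (font name and payload) are invented, and the pattern is a non-greedy variant of the one in the listing.

```python
import re

# Non-greedy capture of the base64 payload inside the embedded font's data URI.
FONT_FACE_PATTERN = re.compile(r"charset=utf-8;base64,(.*?)'\)\s*format")


def extract_font_face(content):
    """Return the base64 font payload, or None if the page has no embedded font."""
    match = FONT_FACE_PATTERN.search(content)
    return match.group(1) if match else None


# Made-up stand-in for the @font-face rule found in a page's source.
sample_css = (
    "@font-face{font-family:'example-font';"
    "src:url('data:application/font-ttf;charset=utf-8;base64,AAEAAAALAIAAAwAw') "
    "format('truetype')}"
)

payload = extract_font_face(sample_css)
missing = extract_font_face("<html>no font here</html>")
print(payload)
print(missing)
```

Returning `None` lets the calling loop decide whether to skip the page, retry, or stop, instead of relying on a broad exception handler to absorb the `IndexError`.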
