Scraping Ctrip Hotel Details and Reviews with Python (2025 Update)

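Before you start:

This article covers the complete scraping approach. If you need further processing (visualization, database storage, CSV export, wrapping the results in an API, and so on) or simply want ready-made code, you can contact the author for paid help.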

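I. Scraping hotel data

1. Open the hotel section of the Ctrip website. It shows a search box; enter your search parameters and click Search to get a list of hotels. Open the browser developer tools (F12) and watch the network requests.

2. After you click Search, the page calls the fetchHotelList endpoint to fetch the hotel data, so replaying that request locally is enough to get the same data.

3. Replay the request with the Python requests library (note: replace the cticket value in cookies with your own):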

import requests

cookies = {
    'cticket': 'REPLACE_WITH_YOUR_OWN'
}

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36 Edg/136.0.0.0'
}

json_data = {
    'hotelIdFilter': {
        'hotelAldyShown': [],
    },
    'destination': {
        'type': 1,
        'geo': {
            'cityId': 1,
            'countryId': 1,
        },
        'keyword': {
            'word': '',
        },
    },
    'date': {
        'dateType': 1,
        'dateInfo': {
            'checkInDate': '20250513',
            'checkOutDate': '20250514',
        },
    },
    'filters': [
        {
            'filterId': '29|1',
            'type': '29',
            'value': '1|1',
            'subType': '2',
        },
    ],
    'extraFilter': {
        'childInfoItems': [],
        'sessionId': '',
    },
    'paging': {
        'pageCode': '102002',
        'pageIndex': 1,
        'pageSize': 10,
    },
    'roomQuantity': 1,
    'recommend': {
        'nearbyHotHotel': {},
    },
    'genk': True,
    'residenceCode': 'CN',
    'head': {
        'platform': 'PC',
        'cid': '09031167313397232673',
        'cver': 'hotels',
        'bu': 'HBU',
        'group': 'ctrip',
        'aid': '4902',
        'sid': '22921635',
        'ouid': '',
        'locale': 'zh-CN',
        'timezone': '8',
        'currency': 'CNY',
        'pageId': '102002',
        'vid': '1747100446387.a7a8My4kdysO',
        'guid': '09031167313397232673',
        'isSSR': False,
    },
    'ServerData': '',
}

response = requests.post(
    'https://m.ctrip.com/restapi/soa2/31454/json/fetchHotelList',
    cookies=cookies,
    headers=headers,
    json=json_data,
)

def crawl_hotel_data(hotel):
    """Flatten one hotelList entry into a dict of the fields of interest."""
    # `or {}` guards against keys that are missing or explicitly null
    hotel_data = hotel.get('hotelInfo') or {}
    name_info = hotel_data.get('nameInfo') or {}
    comment_info = hotel_data.get('commentInfo') or {}
    # 1. Hotel id
    hotel_id = (hotel_data.get('summary') or {}).get('hotelId')
    # 2. Hotel name
    hotel_name = name_info.get('name')
    # 3. Hotel English name
    enName = name_info.get('enName')
    # 4. Star rating
    star = (hotel_data.get('hotelStar') or {}).get('star')
    # 5. Review score
    commentScore = comment_info.get('commentScore')
    # 6. Review score description
    commentDescription = comment_info.get('commentDescription')
    # 7. Number of reviewers
    commenterNumber = comment_info.get('commenterNumber')
    # Sub-scores, read by position: environment, hygiene, service, facilities
    sub_scores = [item.get('number') for item in comment_info.get('subScore') or []]
    sub_scores += [None] * (4 - len(sub_scores))  # pad so unpacking never fails
    environmentalScore, hygieneScore, serveScore, facilityScore = sub_scores[:4]
    # 8. Hotel highlights (several tags; use '&'.join(...) if a single string is needed)
    oneSentenceComment_list = [
        tag.get('tagTitle') for tag in comment_info.get('oneSentenceComment') or []
    ]
    data = {
        'hotel_id': hotel_id,
        'hotel_name': hotel_name,
        'hotel_en_name': enName,
        'star': star,
        'comment_score': commentScore,
        'comment_description': commentDescription,
        'commenter_number': commenterNumber,
        'environment_score': environmentalScore,
        'hygiene_score': hygieneScore,
        'service_score': serveScore,
        'facility_score': facilityScore,
        'highlights': oneSentenceComment_list,
    }
    return data

hotelList = response.json().get('data',{}).get('hotelList',[])

for hotel in hotelList:
    data = crawl_hotel_data(hotel)
    print(data)

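4. The request succeeds and returns Ctrip's hotel data. What comes back depends on the JSON body sent with the request, that is, the json_data defined in the code above, which carries the same values you would type into the search box in step 1.
To search for the hotels you want, change fields in json_data such as checkInDate, pageIndex, and cityId (Ctrip's own city identifier).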
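
Changing those fields by hand for every query gets tedious, so one option is a small helper that copies the template and fills in the city, dates, and page number. The sketch below is only an illustration: `build_search_payload` is a hypothetical name, the trimmed `template` stands in for the full json_data, and city id 2 is an arbitrary example value.

```python
import copy
from datetime import date, timedelta


def build_search_payload(base, city_id, check_in, nights=1, page_index=1):
    """Return a copy of `base` with city, stay dates, and page number replaced."""
    payload = copy.deepcopy(base)  # never mutate the shared template
    check_out = check_in + timedelta(days=nights)
    payload['destination']['geo']['cityId'] = city_id
    payload['date']['dateInfo']['checkInDate'] = check_in.strftime('%Y%m%d')
    payload['date']['dateInfo']['checkOutDate'] = check_out.strftime('%Y%m%d')
    payload['paging']['pageIndex'] = page_index
    return payload


# Trimmed stand-in for the json_data dict defined earlier (only the edited keys).
template = {
    'destination': {'geo': {'cityId': 1, 'countryId': 1}},
    'date': {'dateInfo': {'checkInDate': '20250513', 'checkOutDate': '20250514'}},
    'paging': {'pageCode': '102002', 'pageIndex': 1, 'pageSize': 10},
}

# One payload per result page for city id 2, a two-night stay from 2025-06-01.
pages = [build_search_payload(template, 2, date(2025, 6, 1), nights=2, page_index=n)
         for n in range(1, 4)]
print(pages[2]['date']['dateInfo'], pages[2]['paging']['pageIndex'])
```

Each payload can then be passed as the json argument of the fetchHotelList request. Copying with deepcopy keeps the template unchanged between pages.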

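II. Scraping room data

1. Now that we can collect hotel data, the next step is to collect detailed room data for a given hotel. As before, click into any hotel and use the developer tools (F12) to see which endpoint the page calls.

2. With the hotel detail page open and F12 active, refresh the page. The page calls the getHotelRoomListInland endpoint to fetch the room data.

3. Replay the request locally (replace the cticket value in cookies with your own):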

import requests

cookies = {
    'cticket': 'REPLACE_WITH_YOUR_OWN'
}

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36 Edg/136.0.0.0'
}

json_data = {
    'search': {
        'checkIn': '20250513',
        'checkOut': '20250514',
        'hotelId': 121013047,
        'subHotelId': '121013047',
        'roomId': 0,
        'fixSubhotel': 0,
        'priceType': 2,
        'adult': 1,
        'childInfoItems': [],
        'roomQuantity': 1,
        'mustShowRoomList': [],
        'location': {
            'geo': {
                'cityID': 1,
            },
        },
        'cancelPolicyType': 0,
        'isFirstEnterDetailPage': 'T',
        'extras': {},
        'filters': [],
        'isUserSelectCheckInOut': 'T',
        'hotelUniqueKey': '',
        'hasAidInUrl': False,
        'abResultEntities': [
            {
                'key': '240605_IBU_pricH',
                'value': 'B',
            },
        ],
        'tripSub1': '',
        'preSaleInfo': {
            'productId': 0,
            'nights': 0,
            'productUseStartDate': '',
        },
        'residenceCode': 'CN',
    },
    'genk': True,
    'head': {
        'platform': 'PC',
        'cver': 'hotels',
        'ctok': '',
        'cid': '09031167313397232673',
        'bu': 'HBU',
        'group': 'ctrip',
        'syscode': '09',
        'aid': '4902',
        'sid': '22921635',
        'ouid': '',
        'locale': 'zh-CN',
        'timezone': '8',
        'currency': 'CNY',
        'pageId': '102003',
        'vid': '1747100446387.a7a8My4kdysO',
        'guid': '09031167313397232673',
        'isSSR': False,
    },
    'ServerData': '',
}

response = requests.post(
    'https://m.ctrip.com/restapi/soa2/33278/json/getHotelRoomListInland',
    cookies=cookies,
    headers=headers,
    json=json_data,
)


def crawl_room(hotel_id):
    json_data["search"]["hotelId"] = hotel_id
    json_data["search"]["subHotelId"] = hotel_id

    response = requests.post(
        'https://m.ctrip.com/restapi/soa2/33278/json/getHotelRoomListInland',
        cookies=cookies,
        headers=headers,
        json=json_data,
    )

    # Keys used to label the fields extracted from the returned JSON
    keys_list_physicRoomMap = ["hotel_id", "room_id", "room_name", "pictureInfo_list",
                               "facility_dict", "bed_title", "BedInfo_overall", "windowInfo_title", "smokeInfo_title",
                               "areaInfo_title", "floorInfo_title", "wifiInfo_title"]
    keys_list_saleRoomMap = ["saleRoom_id", "bookingStatusInfo_isBooking",
                             "bookingStatusInfo_isFullRoom", "meal_overall", "cancelInfo_title", "guestCount",
                             "childCount", "price", "deletePrice", "promotionItem_list", "discount_title",
                             "priceLabe_list", "rpInfos"]

    data_all = []
    # physicRoomMap is iterated with .values() below, so the fallback must be a dict, not a list
    physicRoomMaps = (response.json().get("data") or {}).get("physicRoomMap") or {}

    # ---- physicRoomMap layer ----
    for physicRoomMap in physicRoomMaps.values():
        physicRoomMap_list = []
        # Basic room information
        # 3. Room id  room_id
        room_id = physicRoomMap.get("id", "None")
        # Store the hotel id first (repeated for every room)
        physicRoomMap_list.append(hotel_id)

        physicRoomMap_list.append(room_id)
        # 4. Room name  room_name
        room_name = physicRoomMap.get("name", "None")
        physicRoomMap_list.append(room_name)
        # 5. Room pictures  pictureInfo_list
        pictureInfo_list = []
        pictureInfos = physicRoomMap.get("pictureInfo", [])
        for pictureInfo in pictureInfos:
            pictureInfo_list.append(pictureInfo.get("url", "None"))
        physicRoomMap_list.append(pictureInfo_list)
        # Facilities
        faciltityInfos = physicRoomMap.get("faciltityInfo", {}).get("list", [])
        # 6. Facility dict  facility_dict
        facility_dict = {}
        for facilityInfo in faciltityInfos:
            title = facilityInfo.get("title", "None")
            subList = facilityInfo.get("subList", [])
            small_title_list = []
            for sub in subList:
                small_title = sub.get("title", "None")

                additionInfos = sub.get("additionInfo", [])
                infoContent_list = []
                for additionInfo in additionInfos:
                    infoContent = additionInfo.get("infoContent", "None")
                    infoContent_list.append(infoContent)
                # Build the sub-item label
                if len(infoContent_list) != 0:
                    small_title_overall = f"{small_title}({'、'.join(infoContent_list)})"
                else:
                    small_title_overall = small_title
                small_title_list.append(small_title_overall)

            facility_dict[title] = small_title_list
        physicRoomMap_list.append(facility_dict)
        # Bed
        bedInfo = physicRoomMap.get("bedInfo", {})
        # 7. Bed title  bed_title
        bed_title = bedInfo.get("title", "None")
        physicRoomMap_list.append(bed_title)
        # 8. Bed details  BedInfo_overall
        BedInfo_dict = {}
        BedInfo_title = bedInfo.get("title", "None")
        cpxBedInfo = bedInfo.get("cpxBedInfo", {})
        cpxBedInfo_title = cpxBedInfo.get("title", "None")
        bedDetails = cpxBedInfo.get("bedDetail", [])
        bedDetail_list = []
        for bedDetail in bedDetails:
            roomName = bedDetail.get("roomName", "None")
            details = bedDetail.get("detail", [])
            # Build the per-bedroom description
            if len(details) != 0:
                room_overall = f"{roomName}({'、'.join(details)})"
            else:
                room_overall = roomName
            bedDetail_list.append(room_overall)

        # Record details only when there are any; otherwise the check below is always true
        if bedDetail_list:
            BedInfo_dict[cpxBedInfo_title] = bedDetail_list
        if BedInfo_dict:
            BedInfo_overall = f"{BedInfo_title}({BedInfo_dict})"
        else:
            BedInfo_overall = BedInfo_title
        physicRoomMap_list.append(BedInfo_overall)
        # Room notices
        # 9. Window info  windowInfo_title
        windowInfo_title = physicRoomMap.get("windowInfo", {}).get("title", "None")
        physicRoomMap_list.append(windowInfo_title)
        # 10. Smoking policy  smokeInfo_title
        smokeInfo_title = physicRoomMap.get("smokeInfo", {}).get("title", "None")
        physicRoomMap_list.append(smokeInfo_title)
        # 11. Room area  areaInfo_title
        areaInfo_title = physicRoomMap.get("areaInfo", {}).get("title", "None")
        physicRoomMap_list.append(areaInfo_title)
        # 12. Floor info  floorInfo_title
        floorInfo_title = physicRoomMap.get("floorInfo", {}).get("title", "None")
        physicRoomMap_list.append(floorInfo_title)
        # 13. Wi-Fi info  wifiInfo_title
        wifiInfo_title = physicRoomMap.get("wifiInfo", {}).get("title", "None")
        physicRoomMap_list.append(wifiInfo_title)

        # Zip this room's physicRoomMap values with their keys and store them under basicInfo
        dict_physicRoomMap = {key: value for key, value in zip(keys_list_physicRoomMap, physicRoomMap_list)}
        dict_physicRoomMap_json = {"basicInfo": dict_physicRoomMap}
        data_all.append(dict_physicRoomMap_json)

    # ---- saleRoomMap layer ----
    saleRoomMaps = (response.json().get("data") or {}).get("saleRoomMap") or {}
    for saleRoom_id, saleRoomMap in saleRoomMaps.items():
        # Holds every value extracted from this saleRoomMap entry
        saleRoomMap_list = []
        # Room id (the order differs from the loop above, so match entries by room id)
        sale_room_id = saleRoomMap.get("physicalRoomId", "None")
        data_json = None
        for data in data_all:
            if data.get("basicInfo").get("room_id") == sale_room_id:
                data_json = data
        if data_json is None:
            continue  # no matching physical room; skip instead of reusing a stale match

        # 14. saleRoom_id
        saleRoom_id = saleRoom_id.split("_")[1]
        saleRoomMap_list.append(saleRoom_id)
        # 15. Whether the room can be booked  bookingStatusInfo_isBooking
        bookingStatusInfo_isBooking = saleRoomMap.get("bookingStatusInfo", {}).get("isBooking", "None")
        saleRoomMap_list.append(bookingStatusInfo_isBooking)
        # 16. Whether the room is fully booked  bookingStatusInfo_isFullRoom
        bookingStatusInfo_isFullRoom = saleRoomMap.get("bookingStatusInfo", {}).get("isFullRoom", "None")
        saleRoomMap_list.append(bookingStatusInfo_isFullRoom)
        # 17. Meal info  meal_overall
        mealInfo = saleRoomMap.get("mealInfo", {})
        mealInfo_title = mealInfo.get("title", "None")
        mealInfo_hover = mealInfo.get("hover", [])
        # Build the meal description
        if len(mealInfo_hover) != 0:
            meal_overall = f"{mealInfo_title}({'、'.join(mealInfo_hover)})"
        else:
            meal_overall = mealInfo_title
        saleRoomMap_list.append(meal_overall)
        # 18. Cancellation policy  cancelInfo_title
        cancelInfo_title = saleRoomMap.get("cancelInfo", {}).get("title", "None")
        saleRoomMap_list.append(cancelInfo_title)
        # 19. Number of adults allowed  guestCount
        guestCount = saleRoomMap.get("guestCountInfo", {}).get("guestCount", "None")
        saleRoomMap_list.append(guestCount)
        # 20. Number of children allowed  childCount
        childCount = saleRoomMap.get("guestCountInfo", {}).get("childCount", "None")
        saleRoomMap_list.append(childCount)
        # 21. Current price  price
        price = saleRoomMap.get("priceInfo", {}).get("price", "None")
        saleRoomMap_list.append(price)
        # 22. Original (struck-through) price  deletePrice
        deletePrice = saleRoomMap.get("priceInfo", {}).get("deletePricewithOutCurrency", "None")
        if deletePrice == 0:
            deletePrice = "None"
        saleRoomMap_list.append(deletePrice)
        # 23. Promotions  promotionItem_list
        promotionItems = saleRoomMap.get("totalPriceInfo", {}).get("promotionItems", [])
        promotionItem_list = []
        for promotionItem in promotionItems:
            promotionItem_title = promotionItem.get("title", "None")
            promotionItem_content = promotionItem.get("content", "None")
            promotionItem_list.append(promotionItem_title + promotionItem_content)
        saleRoomMap_list.append(promotionItem_list)
        # 24. Discount ratio (current price / original price)  discount_title
        if deletePrice != "None" and price != "None":
            discount_title = round(price / deletePrice, 2)
        else:
            discount_title = "None"
        saleRoomMap_list.append(discount_title)
        # 25. Price labels (for example, special fixed-price offers)  priceLabe_list
        priceLabe_list = []
        priceLabes = saleRoomMap.get("priceLabelList", [])
        for priceLabe in priceLabes:
            priceLabe_list.append(priceLabe.get("text", "None"))
        saleRoomMap_list.append(priceLabe_list)
        # 26. Rate-plan notes (for example, "one bed")  rpInfos
        rpInfo_list = []
        rpInfos = saleRoomMap.get("rpInfos", [])
        for rpInfo in rpInfos:
            rpInfo_title = rpInfo.get("title", "None")
            rpInfo_list.append(rpInfo_title)
        saleRoomMap_list.append(rpInfo_list)

        # Zip this sale entry's values with their keys
        dict_saleRoomMap = {key: value for key, value in zip(keys_list_saleRoomMap, saleRoomMap_list)}
        # data_json is the same object stored in data_all, so appending here updates it in place
        data_json.setdefault("roomInfo", []).append(dict_saleRoomMap)

    # Sort rooms by the price of their first sale entry; rooms without a numeric price go last
    def first_price(room):
        price = (room.get("roomInfo") or [{}])[0].get("price")
        return price if isinstance(price, (int, float)) else 10000000

    data_all = sorted(data_all, key=first_price)

    return data_all

data_all = crawl_room(121013047)
print(data_all)
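
The list returned by crawl_room is nested: each element holds a basicInfo dict and, when sale entries were matched, a roomInfo list. If a flat table is more convenient (for a CSV file, say), it could be flattened along the lines of the sketch below. The helper names and the choice of columns are illustrative assumptions, and the sample data is made up to mirror that shape.

```python
import csv
import io

FIELDS = ['hotel_id', 'room_id', 'room_name', 'saleRoom_id', 'price', 'meal_overall']


def flatten_rooms(data_all):
    """Yield one flat row per sale entry, repeating the physical-room fields."""
    for room in data_all:
        basic = room.get('basicInfo', {})
        for sale in room.get('roomInfo', []):
            yield {
                'hotel_id': basic.get('hotel_id'),
                'room_id': basic.get('room_id'),
                'room_name': basic.get('room_name'),
                'saleRoom_id': sale.get('saleRoom_id'),
                'price': sale.get('price'),
                'meal_overall': sale.get('meal_overall'),
            }


def rooms_to_csv(data_all, fileobj):
    """Write the flattened rows to an open text file object; return the row count."""
    rows = list(flatten_rooms(data_all))
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Made-up sample in the same shape as crawl_room's return value.
sample = [
    {'basicInfo': {'hotel_id': 1, 'room_id': 10, 'room_name': 'Twin Room'},
     'roomInfo': [{'saleRoom_id': '1001', 'price': 300, 'meal_overall': 'No breakfast'},
                  {'saleRoom_id': '1002', 'price': 350, 'meal_overall': 'Breakfast for 2'}]},
    {'basicInfo': {'hotel_id': 1, 'room_id': 11, 'room_name': 'Suite'}},  # no sale entries
]
buffer = io.StringIO()
row_count = rooms_to_csv(sample, buffer)
print(row_count)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with open('rooms.csv', 'w', newline='', encoding='utf-8-sig') instead of the StringIO buffer; the file name is just an example.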

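4. To collect a different hotel, change hotelId and subHotelId under json_data["search"] (crawl_room sets both from its hotel_id argument). Both refer to the hotel id obtained in Part I, so chaining the two steps gives you continuous collection.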
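
Chaining the two steps could look roughly like the sketch below. It is not the author's implementation: `fetch_hotel_ids` and `fetch_rooms` are hypothetical callables standing in for a wrapper around the fetchHotelList request and for crawl_room, and the one-second default pause is an arbitrary choice. Fake fetchers are used so the control flow runs without touching the network.

```python
import time


def crawl_hotels_with_rooms(fetch_hotel_ids, fetch_rooms, delay=1.0, sleep=time.sleep):
    """Return {hotel_id: rooms}, fetching rooms for each hotel id in turn."""
    results = {}
    for hotel_id in fetch_hotel_ids():
        try:
            results[hotel_id] = fetch_rooms(hotel_id)
        except Exception as exc:  # one failing hotel should not stop the run
            print(f'hotel {hotel_id} failed: {exc}')
            results[hotel_id] = []
        sleep(delay)  # pause between hotels to keep the request rate low
    return results


# Hypothetical stand-ins; in real use these would wrap the requests shown earlier.
def fake_hotel_ids():
    return [101, 102, 103]


def fake_rooms(hotel_id):
    if hotel_id == 102:
        raise ValueError('no data')
    return [{'room_id': hotel_id * 10}]


combined = crawl_hotels_with_rooms(fake_hotel_ids, fake_rooms, delay=0, sleep=lambda s: None)
print(combined)
```

Passing the fetchers in as arguments keeps the loop independent of how each request is built, which makes it easy to try out before pointing it at the real endpoints.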

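III. Scraping review data

1. The hotel detail page also issues a request called GetReviewList. Replaying it is how we get the review data.

2. This request carries an encrypted parameter, testab, which is generated with Ctrip's backend encryption logic and stays valid for 1 minute. Producing it requires reverse engineering the JavaScript, which this article does not walk through; the code is given directly.

3. Code that generates the testab parameter. Two things to note: (1) replace the cticket value in cookies with your own; (2) the user-agent in headers here must be identical to the one used for the GetReviewList request, otherwise the generated testab is invalid.

The code below is taken from the CSDN blog post "深入浅出携程testab参数逆向还原(Python版)".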

import base64
import json
import requests

headers = {
    # Must match the user-agent sent later with the GetReviewList request
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36 Edg/136.0.0.0'
}
cookies = {
    'cticket': 'REPLACE_WITH_YOUR_OWN'
}
json_data_r = {
            'callback': 'xWygfkgQCQ',
            'a': 0,
            # These values appear to be fixed and need no changes; the request succeeded on 2025-05-11
            'b': '2025-05-10',
            'c': '2025-05-11',
            'd': 'zh-cn',
            'e': 2,
            'head': {
                'Locale': 'zh-CN',
                'Currency': 'CNY',
                'Device': 'PC',
                'UserIP': '49.5.157.217',
                'Group': 'ctrip',
                'ReferenceID': '',
                'UserRegion': 'CN',
                'AID': '4902',
                'SID': '22921635',
                'Ticket': '',
                'UID': '',
                'IsQuickBooking': '',
                'ClientID': '09031090211285786510',
                'OUID': '',
                'TimeZone': '8',
                'P': '30577338741',
                'PageID': '102003',
                'Version': '',
                'HotelExtension': {
                    'WebpSupport': True,
                    'group': 'CTRIP',
                    'Qid': '959076176927',
                    'hasAidInUrl': False,
                },
                'Frontend': {
                    'vid': '1744940890613.ea72Ce4yaIRS',
                    'sessionID': '27',
                    'pvid': '1',
                },
            },
            'ServerData': '',
        }
json_data_testAb = {
            'PageNo': 1,
            'PageSize': 10,
            'MasterHotelId': 996471,
            'NeedFilter': True,
            'UnUsefulPageNo': 1,
            'UnUsefulPageSize': 5,
            'isHasFold': False,
            'genk': True,
            'genKeyParam': {
                'a': 996471,
                'd': 'zh-cn',
                'e': 2,
            },
            'ssr': False,
            'head': {
                'Locale': 'zh-CN',
                'Currency': 'CNY',
                'Device': 'PC',
                'UserIP': '49.5.157.217',
                'Group': 'ctrip',
                'ReferenceID': '',
                'UserRegion': 'CN',
                'AID': '4902',
                'SID': '22921635',
                'Ticket': '',
                'UID': '',
                'IsQuickBooking': '',
                'ClientID': '09031090211285786510',
                'OUID': '',
                'TimeZone': '8',
                'P': '30577338741',
                'PageID': '102003',
                'Version': '',
                'HotelExtension': {
                    'WebpSupport': True,
                    'group': 'CTRIP',
                    'Qid': '959076176927',
                    'hasAidInUrl': False,
                    'hotelUuidKey': 'HlaiUZK3LW0sY8PKohehAEa4W77yXGebhiHYFZjpmWFDxo5YU6EtAePQIlzWcYd7xPmelhrase64EzNjpTWl6rSSx1UrLYalYgwt5iFtjTGwtlvOXjcJLPeFcWgOvHTjbJqsjplwSbvpgjPJdFvl1YLfy0SedHvgHefSYSXjLQyTHR1LvnMw0YT0jNtwUZrq9yNPj3AvQFEOav49Wtgj0ZYLpKtEsYcj7jzfEpsY0ci5Aw0qRXaE1gWosyQoR70wtYQhEPgymzyZcvQMEMOePYznIt1KLqYf6ikgK3ARSYU8iqdK73yPtvFdYNAy3gj08vs8eFDYScjdQyfJaBvnZYhAyS1jTQvgNeldYHLjnByAJpLYfpvB0WhoWNprfbvaaKMY3SesFwDvMNv4oeNBYHBiLHYMowqqKtkrMYFFKkQi5PiSpeD8RToeSYF5IX9YpmvzQIf3KAGetYaLK49Yg3JTAYg6i8niFUisdjkXxQUYSDxOYQFydpj0TjpzvXNE8mY06wXlRd7JTtY57w4pRZaxP4xzYH4IXBxHwhDRt5yfDEMawOfjBTySFis7Jm1Jz1v7mY4pJ6LvDhK1sIZ8ElY8QK3py66xXXKUticfK3YPowD6EqHe05Rd8yMbEkHEG4vQNEt4WhcjFhvpgWSUwnLRGqimdYbUxfPiOYMLKShKlUel6j51wG5vcfjmgJqdiTQWUYoqRkpvX7xqbR87EQzEdBWD0eLFv4MwBsWAUiB5i06vFYHbrttKbTiqLRhfEUoEFfWFAeL6vGAY7MWlswFfymli3YgtjS4Y1fRd5RPLyGNEaGwb1jtHydligZJP3JUBvDnYA0J6fj6OenQx3DIdYD0JPnvnTj5zEBOj3kW46Wl3WBaYMTYBkY6OROmYz8WMdY5zYb8YXnjM7eHpEBnWtBenow0AeaQjObY7fyzBEHZjszE7Mrq3jhDwq5yUw0AY0By6YtlRXfWp4WTpWBaW14Y9Y68x5BYfaizdvs1E3SWXtys5jLJ5NvG8EHkWOQyPhj6sx4j85JsYTHIhnv8fW7AEc8E83E7qRkdEthWPQRkpiSYBpigLRc3RZkY1pEp0EXpY0FYPXYpgY37YAHWsh',
                },
                'Frontend': {
                    'vid': '1744940890613.ea72Ce4yaIRS',
                    'sessionID': '26',
                    'pvid': '5',
                },
            },
            'ServerData': '',
        }


def getr():
    response = requests.post(
        'https://m.ctrip.com/restapi/soa2/21881/json/getHotelScript',
        headers=headers,
        cookies=cookies,
        json=json_data_r,
    )

    Response = response.json()['Response']
    code = Response.split('window')[1][1:].replace(');})();', '')
    code = json.loads(code)
    testab = get_testAb(code)

    return testab


def decode(j):
    if not j:
        return ""

    # Pad with '=' so the string length is a multiple of 4
    padding = len(j) % 4
    if padding != 0:
        j += '=' * (4 - padding)

    # Decode with the base64 module
    decoded_bytes = base64.b64decode(j)
    decoded_string = decoded_bytes.decode('utf-8', 'ignore')  # decode as UTF-8, ignoring invalid bytes
    return decoded_string


def get_testAb(a):
    _bot_81bbc = a

    _bot_02c0 = 1
    _bot_65beb = 0

    decoded_string = decode(_bot_81bbc['b'])
    _bot_a7a72 = []
    temp = []
    for char in decoded_string:
        if len(temp) == 0 or len(temp[-_bot_02c0]) == 5:
            temp.append([])
        temp[-_bot_02c0].append(-_bot_02c0 * 1 + ord(char))
    _bot_a7a72 = temp

    first_arr = []
    first_opcode = _bot_a7a72[1993:2121]

    for item in first_opcode:
        number = item[2]
        final_number = _bot_81bbc['d'][number]
        if isinstance(final_number, str):
            continue
        if item[0] == 50:
            final_number *= -1
        first_arr.append(final_number)

    b_arr = [0, 8, 16, 24, 32, 40, 48, 56]

    arr8_1 = [first_arr[b] for b in b_arr]
    arr_z = arr8_1
    second_arr = arr_z.copy()

    for p in range(7):
        arr_y = []
        for j in range(8):
            if j == 0:
                arr8_1_1 = (arr_z[j] + arr_z[j + 1]) // 2 + first_arr[b_arr[j] + 1 + p]
            elif j == 7:
                arr8_1_1 = (arr_z[j - 1] + arr_z[j]) // 2 + first_arr[b_arr[j] + 1 + p]
            else:
                arr8_1_1 = (arr_z[j - 1] + arr_z[j] + arr_z[j + 1]) // 3 + first_arr[b_arr[j] + 1 + p]
            arr_y.append(arr8_1_1)
        arr_z = arr_y
        second_arr.extend(arr_z)

    third_arr = []
    for u in range(8):
        t_arr_1 = [second_arr[b_arr[j] + u] for j in range(8)]
        third_arr.extend(t_arr_1)

    testab = ''.join([chr(x) for x in third_arr])

    return testab

testab = getr()
print(testab)
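
Because testab expires after about a minute, calling getr() before every review request works but costs an extra round trip each time. One possible alternative is to cache the value and refresh it shortly before it expires. The sketch below assumes a 50-second lifetime as a safety margin; the `ExpiringToken` class is a hypothetical helper, and a fake fetch function and fake clock stand in for getr() and real time.

```python
import time


class ExpiringToken:
    """Cache a value from `fetch` and refresh it once `ttl` seconds have passed."""

    def __init__(self, fetch, ttl=50.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()          # e.g. getr() in real use
            self._expires_at = now + self._ttl
        return self._value


# Demo with a fake clock and a fake fetch function (no network involved).
fake_now = [0.0]
fetch_times = []


def fake_fetch():
    fetch_times.append(fake_now[0])
    return f'token-{len(fetch_times)}'


cache = ExpiringToken(fake_fetch, ttl=50.0, clock=lambda: fake_now[0])
first = cache.get()    # nothing cached yet, so this fetches
fake_now[0] = 30.0
second = cache.get()   # still within 50 seconds, so the cached value is reused
fake_now[0] = 55.0
third = cache.get()    # past the lifetime, so this fetches again
print(first, second, third)
```

In real use this would be created as ExpiringToken(getr), and cache.get() would supply the testab value for each GetReviewList call.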

4. Replay the GetReviewList request:

import requests

cookies = {
    'cticket': 'REPLACE_WITH_YOUR_OWN'
}

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36 Edg/136.0.0.0',
}

params = {
    # Sample value only: testab expires after about 1 minute, so use a fresh value from getr()
    'testab': '76373670c8bd9239628908d3e776fa65b91b999bc9ee5a70c1d3bd9fb6b3cb15',
}

json_data = {
    'PageNo': 1,
    'PageSize': 10,
    'MasterHotelId': 121013047,
    'NeedFilter': True,
    'UnUsefulPageNo': 1,
    'UnUsefulPageSize': 5,
    'isHasFold': False,
    'genk': True,
    'genKeyParam': {
        'a': 121013047,
        'd': 'zh-cn',
        'e': 2,
    },
    'ssr': False,
    'head': {
        'Locale': 'zh-CN',
        'Currency': 'CNY',
        'Device': 'PC',
        'UserIP': '123.177.53.139',
        'Group': 'ctrip',
        'ReferenceID': '',
        'UserRegion': 'CN',
        'AID': '4902',
        'SID': '22921635',
        'Ticket': '',
        'UID': '',
        'IsQuickBooking': '',
        'ClientID': '09031167313397232673',
        'OUID': '',
        'TimeZone': '8',
        'P': '30577338741',
        'PageID': '102003',
        'Version': '',
        'HotelExtension': {
            'WebpSupport': True,
            'group': 'CTRIP',
            'Qid': '777139300295',
            'hasAidInUrl': False,
            'hotelUuidKey': '8OHR1TKPoeaDE1bimTegpEfsWdDKkGR4EmYh8xUcIGjODYBsEf5ydoJ1orLYpByFPWBayPpeh7Egdj7mWmpKz0RnnxNYQciXyA3xtDjkLwfSv7MJTJHSRBnEF6YZfjQJpFjo8wgFvpQjqJL9vTNYqbysTjS9vMLe6MesQjFQyOTIPJ5cIzYO6KlAjSsRh8yzQjHzvXOEUDvTPWpZj8PW45Epci8YP4WZURSZYq1Yt3i5HwUdRXhEocWkMY0Ge30KqYZAYPpxOkW8jOqEnXEDYfGjh1xpAr95vp9IBUIFYaSvF7iXfybOvszYHUyZAjckvl8emgYpoj8QytJ1lvB6Y6Gy8UjL4vZSeFTYdlj7QybJoPYcdv1sWMPWHPrt8Rf0ynYfvgmrFsidfvQsetSYFZiLHYFsJhPe6vQY3DIt7yAHJZMiQME1DraYlpIMPJHsE18r5FIlkwdYo0xlqvD4wbOYgXi3OioTiQMjg4wBEZwSYGcwstJHlJ1LvkgE9qY0kwXzR6GJg6YqOwb3Ef8YzpY9Yd4EqvtFxXdR38yH0Egcy1hRHMWFOiklY6ZRs4JT5yz6WLGJNmKGOi3lJsYpfIzhi7bwMOrLdJfMR8YL3eaUwFLef1Rbky8PE16yXbRfHW0FWkcEDZESfwpXigZjGUyl6vHUKXpJSY8FyLovs0vUBjTlwNDvpAjtNvMbjt7yLYH4WlorXESTRkovqQY3NW3Ge7MR3pWU7jpkWDtEHUwlgWoYlnJofxSzvfNRAFvH3YA9Wa6eoMRnhW8NE36WPBRXmwsLRAYgni9pwThI1TRDoyT1EFPyknR09WX0iMZYkQRMqJ4myOZWmFi0Qx45wSDR9Yo0xPOeQOwtgELdjXBWUPWgBWhUYo1YX4YFdRhbY5NWA9Yz7YOHYmZjpzeh4EbFW8feOTwl6ezOjtLYmoy1SEt1jcqE6qrU9j9lw8ayXjfXJ5or0Y7TRHPWAQW46WfmWFOYmYBUvGhK3NIs0vXkEmkWntyFMjtJ60vZpEo9WkPy8qjThRanYUUKfYqkIGOE9Br86EMUEN7EomRl3EF3YNnecsJnYlgvAOKDsenqYoME6oETAYHdY8cYo7YN9JDLRN1',
        },
        'Frontend': {
            'vid': '1747100446387.a7a8My4kdysO',
            'sessionID': '1',
            'pvid': '8',
        },
    },
    'ServerData': '',
}

response = requests.post(
    'https://m.ctrip.com/restapi/soa2/21881/json/GetReviewList',
    params=params,
    cookies=cookies,
    headers=headers,
    json=json_data,
)

print(response.json())
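
To fetch more pages, or reviews for another hotel, the request body has to change along with the testab value. The sketch below shows one way to build the arguments per page. `review_request_args` is a hypothetical helper and `template` is a trimmed stand-in for json_data. It is an assumption that editing only these fields is enough: other fields in the sample, such as hotelUuidKey, may also be tied to the hotel or session.

```python
import copy


def review_request_args(base_payload, hotel_id, page_no, testab, page_size=10):
    """Return (params, payload) for one page of reviews of one hotel."""
    payload = copy.deepcopy(base_payload)  # leave the template untouched
    payload['MasterHotelId'] = hotel_id
    payload['genKeyParam']['a'] = hotel_id  # the sample request repeats the hotel id here
    payload['PageNo'] = page_no
    payload['PageSize'] = page_size
    return {'testab': testab}, payload


# Trimmed stand-in for the json_data dict above (only the edited keys).
template = {
    'PageNo': 1,
    'PageSize': 10,
    'MasterHotelId': 121013047,
    'genKeyParam': {'a': 121013047, 'd': 'zh-cn', 'e': 2},
}

params, payload = review_request_args(template, 996471, 3, 'fresh-testab-value')
print(params, payload['MasterHotelId'], payload['PageNo'])
```

The returned pair maps onto the params and json arguments of the requests.post call shown above.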

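IV. Summary

This article provides working code for every request; combining them into a full pipeline is left to you. If you need help, look for the author's contact details in the comments or send a private message.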
