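A Study of Collection Strategies for 每平每屋 Model Components

Note: this article is for learning and reference only and involves no commercial use. If it infringes any rights, contact me and it will be removed.

Collection target: the default parameter values of each model component on the pages of a home-decoration design platform

  • Page entry

Site crawl analysis: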

  • Check how the target data page is requested:
  1. GET request, URL: https://ihome-turbo.oss-cn-beijing.aliyuncs.com/online/2155/entity/1324.json?OSSAccessKeyId=STS.NUbd6KGuciYuKWZaXX7sGnRJc&Expires=1656729629&Signature=EBas7BjL%2FIWxr%2BM1tyAuMkZ6SwE%3D&security-token=CAISkQJ1q6Ft5B2yfSjIr5bXL4z%2FqqpC3pueSXHrhVgNO%2FxrgZfhgTz2IHxMfXZpAu4Ys%2FgznWtT5%2FgZlr9yS5hASAnYcNF66dFX9gaseJbQv8GvtRbsBhBWQTr9MQXy%2BeOPScebJYqvV5XAQlTAkTAJstmeXD6%2BXlujHISUgJp8FLo%2BVRW5ajw0b7U%2FZHEVyqkgOGDWKOymPzPzn2PUFzAIgAdnjn5l4qnNqa%2F1qDim1QGll7RI%2Ftuse8n9NJc0bK0SCYnlgLZEEYPayzNV5hRw86N7sbdJ4z%2BvvKvGXwEIvEzdbbSJroE2d1QoOPMgb6VOoOThj%2Fd%2FuvfXlo7twgxcI%2BBOUjjYXpqnxMbU%2BGoclg%2Fr0twagAE6ZE%2BZhP5K3N%2BR8enlR1uXDpU5lFL7YYl9clreIy1f4nQ5PCROp5%2Fg54ee0c%2FCheBUy2A%2F4kr176Tl9Bepz1D5Ae6X32l7AiERMKjPK6vfFl7MXo620iZN0%2BGKRk0%2Bssrc8xRtzp%2FUuoVy60lZuE5orSk7iUdhWDTHn6nsxTUO6w%3D%3D
  2. Testing shows that only 2155 and 1324 in the URL are required, so the whole query string can be dropped and the URL reduced to 【https://ihome-turbo.oss-cn-beijing.aliyuncs.com/online/2155/entity/1324.json】
  3. 2155 never changes and may be something like an appkey; 1324 is the component ID, and each component is bound to its own ID
  4. Analysis of the headers found no anti-crawling measures such as cookies

    Host: ihome-turbo.oss-cn-beijing.aliyuncs.com
    Connection: keep-alive
    sec-ch-ua: ".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"
    sec-ch-ua-mobile: ?0
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36
    sec-ch-ua-platform: "Windows"
    Accept: */*
    Origin: https://3d.shejijia.com
    Sec-Fetch-Site: cross-site
    Sec-Fetch-Mode: cors
    Sec-Fetch-Dest: empty
    Referer: https://3d.shejijia.com/
    Accept-Encoding: gzip, deflate, br
    Accept-Language: zh-CN,zh;q=0.9

  5. So far, no anti-crawling measures have been found on the component detail page
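
The URL reduction described in the steps above can be sketched in Python. This is a minimal sketch, not the original crawler: the function names (`strip_query`, `entity_url`, `fetch_entity`) are hypothetical, and it assumes the response body is UTF-8 encoded JSON and that the endpoint still accepts unsigned requests as it did during the test.

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

# URL template taken from the trimmed URL in step 2.
BASE = "https://ihome-turbo.oss-cn-beijing.aliyuncs.com/online/{app_id}/entity/{component_id}.json"


def strip_query(url: str) -> str:
    """Drop the query string (OSSAccessKeyId, Expires, Signature, security-token)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def entity_url(component_id: int, app_id: int = 2155) -> str:
    """Build the trimmed detail URL; 2155 is the fixed segment, component_id varies."""
    return BASE.format(app_id=app_id, component_id=component_id)


def fetch_entity(component_id: int, timeout: float = 10.0) -> dict:
    """Fetch one component's JSON. Assumption: the body is UTF-8 encoded JSON."""
    request = Request(
        entity_url(component_id),
        headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                          "(KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
            "Referer": "https://3d.shejijia.com/",
        },
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `entity_url(1324)` yields the trimmed URL from step 2; `fetch_entity` performs a network request and is best run with a polite delay between components.
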
  • Obtaining component IDs:

 Clicking a category in the left-hand menu triggers the following API

URL: https://acs.m.shejijia.com/h5/mtop.homestyler.3d.taurus.model.folder.search/1.0/?jsv=2.6.2&appKey=12574478&t=1656730679098&sign=c56125b16fe8c26078b2fd050aaa2c53&api=mtop.homestyler.3d.taurus.model.folder.search&v=1.0&type=originaljson&dataType=json&data=%7B%22poolIds%22%3A%22%5B464594%2C466584%2C470162%2C467309%2C469180%2C469232%2C469233%2C465619%2C464595%2C467378%2C469243%2C467376%2C465620%2C470224%2C470221%5D%22%2C%22limit%22%3A30%2C%22offset%22%3A30%2C%22sort%22%3A%22desc%22%2C%22requestId%22%3A%22bcf8af6f-c500-4025-9327-d87ffc29c849%22%2C%22tenant%22%3A%22ezhome%22%2C%22traceId%22%3A%22f448885d-426c-4e80-a081-e029ecebb710%22%7D

URL decoded: https://acs.m.shejijia.com/h5/mtop.homestyler.3d.taurus.model.folder.search/1.0/?jsv=2.6.2&appKey=12574478&t=1656730679098&sign=c56125b16fe8c26078b2fd050aaa2c53&api=mtop.homestyler.3d.taurus.model.folder.search&v=1.0&type=originaljson&dataType=json&data={"poolIds":"[464594]","limit":30,"offset":30,"sort":"desc","requestId":"bcf8af6f-c500-4025-9327-d87ffc29c849","tenant":"ezhome","traceId":"f448885d-426c-4e80-a081-e029ecebb710"}

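Here appKey is constant, t is a 13-digit timestamp, sign is a 32-character signature, and poolIds holds the category IDs. The category ID API is not covered in this article; the collected results are in the file tp_model_cat.txt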
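
To inspect these parameters without decoding the URL by hand, the query string can be parsed programmatically. The sketch below uses only the standard library; `decode_search_url` is a hypothetical helper name, and the sample URL is a shortened version of the captured one (fewer query parameters and a shorter `data` payload), kept short for readability.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_search_url(url: str) -> dict:
    """Return the query parameters with the nested `data` JSON decoded."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also undoes the percent-encoding
    params = {name: values[0] for name, values in query.items()}
    params["data"] = json.loads(params["data"])
    # poolIds is a JSON array that was serialized to a string before being embedded.
    params["data"]["poolIds"] = json.loads(params["data"]["poolIds"])
    return params


# Shortened sample; the real request carries more parameters.
sample = (
    "https://acs.m.shejijia.com/h5/mtop.homestyler.3d.taurus.model.folder.search/1.0/"
    "?appKey=12574478&t=1656730679098&sign=c56125b16fe8c26078b2fd050aaa2c53"
    "&data=%7B%22poolIds%22%3A%22%5B464594%5D%22%2C%22limit%22%3A30%2C%22offset%22%3A30%7D"
)
decoded = decode_search_url(sample)
print(decoded["t"], decoded["data"])
```

Note the double encoding of `poolIds`: it is a string containing a JSON array, not an array, which matters later when the same payload has to be rebuilt byte for byte for signing.
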

Headers:

Host: acs.m.shejijia.com
Connection: keep-alive
platform-env: sjj
EagleEye-UserData: w_t_id=f448885d-426c-4e80-a081-e029ecebb710
sec-ch-ua-mobile: ?0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36
Content-type: application/x-www-form-urlencoded
Accept: application/json
env-domain: sjj
sec-ch-ua-platform: "Windows"
sec-ch-ua: ".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"
Origin: https://3d.shejijia.com
Sec-Fetch-Site: same-site
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://3d.shejijia.com/
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Cookie: t=4063b01b2822a7d153699afda59f8b7f; cna=u94QGhEH5xICAdpeC1LEqgZT; gr_user_id=bda7244d-a3a3-4910-80a9-0f3c185a2853; xlly_s=1; user=%7B%22memberId%22%3A%222347860193285898240%22%2C%22memberType%22%3A%22designer%22%2C%22nickName%22%3A%22%E8%AE%BE%E8%AE%A1%E5%B8%882346%22%2C%22avatar%22%3A%22%22%2C%22umsId%22%3A%222e9c7ae8-a405-4af0-b87a-5205494571d5%22%2C%22accessToken%22%3A%22d26bad58-2d67-49bd-84b1-3cd1a2f6c139%22%2C%22site%22%3A46%2C%22domain%22%3Anull%2C%22enterpriseId%22%3Anull%2C%22employeeId%22%3Anull%7D; __user_location_modal_show__=true; _m_h5_tk=d91ea8a9b5d7a235cfc4dfa8c3cfdc8e_1656735670492; _m_h5_tk_enc=a1f39769d85b0e8ca4148e5a0bc7d490; cookie2=1b14c65d042e0f5a064fdbabe588f774; _tb_token_=e1ee73333e5ee; _samesite_flag_=true; csg=8b709201; isg=BD8_QNCwYckpn2Yq0BDFhDuzzhPJJJPGRsXkM9EIRe414FJipPSnFB6yJrAeuGs-

Headers after trimming:

Host: acs.m.shejijia.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36
Content-type: application/x-www-form-urlencoded
Accept: application/json
Origin: https://3d.shejijia.com
Referer: https://3d.shejijia.com/
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Cookie: _m_h5_tk=d91ea8a9b5d7a235cfc4dfa8c3cfdc8e_1656735670492; _m_h5_tk_enc=a1f39769d85b0e8ca4148e5a0bc7d490;
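
The trimming above can be automated so that a freshly captured Cookie header is reduced to the two entries that were kept. This is a rough sketch: `parse_cookie` and `trimmed_headers` are hypothetical names, and Host, Connection and Accept-Encoding are left out on the assumption that the HTTP client sets them itself.

```python
# The only two cookies that survived the trimming above.
KEEP = ("_m_h5_tk", "_m_h5_tk_enc")


def parse_cookie(header_value: str) -> dict:
    """Split a raw Cookie header value into a name -> value mapping."""
    cookies = {}
    for pair in header_value.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty fragments such as the one after a trailing ';'
            cookies[name] = value
    return cookies


def trimmed_headers(full_cookie: str) -> dict:
    """Build the reduced header set, keeping only the cookies listed in KEEP."""
    cookies = parse_cookie(full_cookie)
    cookie = "; ".join(f"{name}={cookies[name]}" for name in KEEP if name in cookies)
    return {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
        "Content-type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
        "Origin": "https://3d.shejijia.com",
        "Referer": "https://3d.shejijia.com/",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Cookie": cookie,
    }
```

Passing the full captured Cookie value to `trimmed_headers` returns a dictionary that can be handed to most Python HTTP clients as request headers.
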

Approach to analyzing the sign signature

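  1. Because the sign value is 32 characters long, I first guessed it was MD5 and searched globally for the keyword MD5 in Chrome DevTools. This led to a method whose function name had been obfuscated, but a breakpoint set there was never hit, so I concluded that this method does not compute sign. It may instead be related to account encryption involving _m_h5_tk and _m_h5_tk_enc in the cookie, so I stopped tracing it
  2. Analyze the JS attached to the front-end click event. The reasoning: clicking a category on the front end sends a request to the back end, so the signing code must be reachable from that handler
  3. Searching that JS for the keyword 【sign】 returned too many matches. Changing the keyword to 【sign 】 and 【sign:】 narrowed the range and eventually located a suspected encryption function
  4. A breakpoint confirmed the encryption function (saved as sign.txt). The input to the function is
    token_str + '&' + 13-digit timestamp + '&' + '12574478' + '&' + {"poolIds":"[464594]","limit":30,"offset":30,"sort":"desc","requestId":"bcf8af6f-c500-4025-9327-d87ffc29c849","tenant":"ezhome","traceId":"f448885d-426c-4e80-a081-e029ecebb710"}
    where token_str matches _m_h5_tk in the cookie, as testing confirmed
  5. At this point the sign value has been successfully cracked. The cookie in the headers can be obtained by packet capture; because this crawler script does not need to run online continuously, updating the cookie by hand each time data is collected is enough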
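
The input string from step 4 can be assembled as follows. This sketch covers only the input: the hashing routine itself lives in sign.txt and is not reproduced here. The helper names are hypothetical, generating `requestId` and `traceId` with `uuid4` is an assumption based on the shape of the captured values, and `token_str` is passed in as a parameter because the text only says that it matches `_m_h5_tk`.

```python
import json
import time
import uuid

APP_KEY = "12574478"  # constant appKey observed in the request URL


def now_ms() -> int:
    """Current time as a 13-digit millisecond timestamp."""
    return int(time.time() * 1000)


def build_data(pool_ids: list, offset: int = 0, limit: int = 30) -> str:
    """Serialize the `data` payload compactly, in the same key order as the capture."""
    payload = {
        # poolIds is a JSON array embedded as a string, e.g. "[464594]"
        "poolIds": json.dumps(pool_ids, separators=(",", ":")),
        "limit": limit,
        "offset": offset,
        "sort": "desc",
        "requestId": str(uuid.uuid4()),  # assumption: any fresh UUID is accepted
        "tenant": "ezhome",
        "traceId": str(uuid.uuid4()),    # assumption: any fresh UUID is accepted
    }
    return json.dumps(payload, separators=(",", ":"))


def sign_input(token_str: str, t: int, data: str, app_key: str = APP_KEY) -> str:
    """Join the four parts with '&' exactly as in step 4."""
    return "&".join([token_str, str(t), app_key, data])


data = build_data([464594], offset=30)
print(sign_input("token_str_placeholder", now_ms(), data))
```

The same `data` string has to be used both for signing and as the `data` query parameter, otherwise the signature will not match the request. Paging through a category would mean calling `build_data` with `offset` increased by `limit` each time.
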

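Key takeaways

  • The site's JS code is obfuscated: function names are random digits or letters, and some key methods are invoked through callbacks
  • The sign value is a 32-character string, which can easily mislead the analysis into assuming it is MD5 encryption
  • When a keyword search returns too many matches, start from the front-end event listeners to narrow down which JS files contain the keyword and reduce the amount of code to analyze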
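
The keyword narrowing used during the analysis (【sign 】 and 【sign:】 rather than the bare 【sign】) can also be applied offline to JS saved from the browser. The following is an illustrative sketch only; `find_keyword` is a hypothetical helper, and the analysis above was done with the search in Chrome DevTools, not with a script.

```python
import re


def find_keyword(js_source: str, keywords=("sign ", "sign:"), context: int = 40) -> list:
    """Return (offset, snippet) pairs for every literal keyword match."""
    pattern = re.compile("|".join(re.escape(keyword) for keyword in keywords))
    hits = []
    for match in pattern.finditer(js_source):
        start = max(0, match.start() - context)
        end = min(len(js_source), match.end() + context)
        hits.append((match.start(), js_source[start:end]))
    return hits


# 'signal' is skipped because neither 'sign ' nor 'sign:' occurs in it.
for offset, snippet in find_keyword("a={t:1,sign:h(x)};signal=1"):
    print(offset, snippet)
```

Because obfuscated bundles are usually minified into very long lines, reporting a character offset with a window of surrounding text is more useful than reporting a line number.
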
