使用RPC对某者web端骑行数据进行爬取

使用RPC技术hook web端JS

骑行app某者web端爬虫

2022.2

1.获取轨迹ID

通过更改年月,可以发现获取每个月轨迹ID 的方式,这个非常简单,只需要拷贝请求头headers就可以直接获取

2.获取轨迹详情

可以发现三个请求,其中,中间的那个获取的是轨迹的聚合信息,比如均速,总里程之类的

第三个请求获取的是轨迹点信息,但是我发现这个轨迹点信息没有包含经纬度?

最后在第三个请求的末尾,找到了一行未解码的字符串encoding_points,非常的长,看名字大概率是编码之后的经纬度信息,到这里,就必须要深入JS 查看是如何解码的了

3.逆向JS,查找解码方式

我查找encoding_points 只有一个JS 文件包含,点进去,定位到 encoding_points 处的代码,

发现是传值给了decode()的函数,于是定位到这个decode, 我是把整个JS文件拷进了IDEA去查找的

这个解码函数,并不复杂,都可以直接用python手写一遍,但是我这里采用了一种基于RPC思想的hook方式来解码

decode 函数上面的那行代码 window.hook = o 就是后面添加用来暴露 decode接口的

超详细教程(原理): RPC 技术及其框架 Sekiro 在爬虫逆向中的应用,加密数据一把梭!

4.准备 Sekiro

virjar/sekiro: 基于长链接和代码注入的Android private API暴露框架 (github.com): https://github.com/virjar/sekiro

Sekiro-windows 服务端

Go HTTP File Server (virjar.com): https://oss.virjar.com/sekiro/sekiro-demo

阿里云:https://www.aliyundrive.com/s/yybi9BWJ44K

运行: Windows:bin/sekiro.bat

sekiro_web_client.js 上半部分

sekiro_web_client 上半部分(无需修改): https://sekiro.virjar.com/sekiro-doc/assets/sekiro_web_client.js

阿里云:https://www.aliyundrive.com/s/YpBvYrSSnNT

如果上面连接失效,文末附录收录有该上半部分代码:[sekiro_web_client 上半部分(无需修改)](#附: sekiro_web_client.js 上半部分)

sekiro_web_client.js 下半部分

需要按照实际情况修改

官方示例:

function guid() {
    function S4() {
        return (((1+Math.random())*0x10000)|0).toString(16).substring(1);
    }
    return (S4()+S4()+"-"+S4()+"-"+S4()+"-"+S4()+"-"+S4()+S4()+S4());
}

var client = new SekiroClient("wss://sekiro.virjar.com/business/register?group=ws-group&clientId="+guid());

client.registerAction("clientTime",function(request, resolve, reject){
    resolve(""+new Date());
})

参数解释:

•wss 链接里,如果是免费版,要将 business 改成 business-demo,解释一下涉及到的名词:•group:业务类型(接口组),每个业务一个 group,group 下面可以注册多个终端(SekiroClient),同时 group 可以挂载多个 Action;•clientId:指代设备,多个设备使用多个机器提供 API 服务,提供群控能力和负载均衡能力;•SekiroClient:服务提供者客户端,主要场景为手机/浏览器等。最终的 Sekiro 调用会转发到 SekiroClient。每个 client 需要有一个唯一的 clientId;•registerAction:接口,同一个 group 下面可以有多个接口,分别做不同的功能;•resolve:将内容传回给客户端的方法;•request:客户端传过来的请求,如果请求里有多个参数,可以以键值对的方式从里面提取参数然后再做处理。

实际示例

下面是该次示例的 sekiro_web_client.js 下半部分代码:

function guid() {
        function S4() {
            return (((1 + Math.random()) * 0x10000) | 0).toString(16).substring(1);
        }

        return (S4() + S4() + "-" + S4() + "-" + S4() + "-" + S4() + "-" + S4() + S4() + S4());
    }

    const client_sekiro = new SekiroClient("ws://127.0.0.1:5620/business-demo/register?group=xing_zhe&clientId=" + guid());

    client_sekiro.registerAction("decode", function (request, resolve, reject) {
        resolve(window.hook.decode(request['encoding_points']));
    })

各参数详解:

const client_sekiro = new SekiroClient(url:

•端口号5620是默认的,如需修改,需要修改服务端 Sekiro\conf\config.properties 文件.•免费版 是 business-demo•group 是分组名称,需要自定义,后面python调用接口时需要传入这个名称

client_sekiro.registerAction( :

•第一个参数 ‘decode’ 是给注册的方法取一个名字,后面python调用接口时需要传入这个名字,需要自定义•resolve() 这里的参数视情况而定:

1.想要调用的函数,在这里可以直接调用,那么就直接调用2.不能直接调用,就需要去源JS代码中,将这个接口暴露出来,这里再去调用,暴露接口的方式我采用的是: window.hook = 调用函数的对象

•request[‘encoding_points’] 这里的 encoding_points 是调用函数时传入的参数,无论python爬虫选择get 或 post 的方式传参,这里取得该参数的方式保持不变

5.向网站中添加hook

如何在web的JS中注入Hook[5]:放在下面那篇文章了,不写在这里

直接在 js源文件的最后 添加以下代码,然后CTRL + S保存

(function (){
    // 这里拷贝 sekiro_web_client.js 上半部分

    // 这里拷贝 sekiro_web_client.js 修改好的下半部分
    // 考虑是否需要暴露接口 查看: 各参数详解 -> resolve() 

})();

6.测试数据

此时刷新一遍网页

然后编写python文件对该hook进行验证:

import requests


host = ''
headers = {}

def parse_encoding_points(encoding_points):
    """
        向 Sekiro的服务端 发送请求,获取hook接口返回的数据
    :param encoding_points:
    :return:
    """
    params = {
        'group': 'xing_zhe',
        'action': 'decode',
    }
    url = 'http://127.0.0.1:5620/business-demo/invoke'

    res = requests.post(url=url, params=params, data={'encoding_points': encoding_points}).json()
    points = res['data']    # 获取前端hook返回的数据
    print(len(points))
    return points


def get_points(workout_id):
    """
        获取轨迹点数据
    :param workout_id: 轨迹ID
    :return:
    """
    url = f'{host}/api/v4/segment_points/?workout_id={workout_id}'
    res = requests.get(url=url, headers=headers).json()
    points = res['points']
    print(len(points))
    points_gps = parse_encoding_points(res['encoding_points'])
    for i, p in enumerate(points):
        p['latitude'], p['longitude'] = points_gps[i]

    return points


if __name__ == '__main__':
    print(get_points(XXXXXX))

可以看到,现在经纬度已经被解析出来了

附: sekiro_web_client.js 上半部分

/*
  Copyright (C) 2020 virjar <virjar@virjar.com> for https://github.com/virjar/sekiro

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
  DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
  LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
  ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/


function SekiroClient(wsURL) {
    this.wsURL = wsURL;
    this.handlers = {};
    this.socket = {};
    // check
    if (!wsURL) {
        throw new Error('wsURL can not be empty!!')
    }
    this.webSocketFactory = this.resolveWebSocketFactory();
    this.connect()
}

SekiroClient.prototype.resolveWebSocketFactory = function () {
    if (typeof window === 'object') {
        var theWebSocket = window.WebSocket ? window.WebSocket : window.MozWebSocket;
        return function (wsURL) {

            function WindowWebSocketWrapper(wsURL) {
                this.mSocket = new theWebSocket(wsURL);
            }

            WindowWebSocketWrapper.prototype.close = function () {
                this.mSocket.close();
            };

            WindowWebSocketWrapper.prototype.onmessage = function (onMessageFunction) {
                this.mSocket.onmessage = onMessageFunction;
            };

            WindowWebSocketWrapper.prototype.onopen = function (onOpenFunction) {
                this.mSocket.onopen = onOpenFunction;
            };
            WindowWebSocketWrapper.prototype.onclose = function (onCloseFunction) {
                this.mSocket.onclose = onCloseFunction;
            };

            WindowWebSocketWrapper.prototype.send = function (message) {
                this.mSocket.send(message);
            };

            return new WindowWebSocketWrapper(wsURL);
        }
    }
    if (typeof weex === 'object') {
        // this is weex env : https://weex.apache.org/zh/docs/modules/websockets.html
        try {
            console.log("test webSocket for weex");
            var ws = weex.requireModule('webSocket');
            console.log("find webSocket for weex:" + ws);
            return function (wsURL) {
                try {
                    ws.close();
                } catch (e) {
                }
                ws.WebSocket(wsURL, '');
                return ws;
            }
        } catch (e) {
            console.log(e);
            //ignore
        }
    }
    //TODO support ReactNative
    if (typeof WebSocket === 'object') {
        return function (wsURL) {
            return new theWebSocket(wsURL);
        }
    }
    // weex 鍜� PC鐜鐨剋ebsocket API涓嶅畬鍏ㄤ竴鑷达紝鎵€浠ュ仛浜嗘娊璞″吋瀹�
    throw new Error("the js environment do not support websocket");
};

SekiroClient.prototype.connect = function () {
    console.log('sekiro: begin of connect to wsURL: ' + this.wsURL);
    var _this = this;
    // 涓峜heck close锛岃
    // if (this.socket && this.socket.readyState === 1) {
    //     this.socket.close();
    // }
    try {
        this.socket = this.webSocketFactory(this.wsURL);
    } catch (e) {
        console.log("sekiro: create connection failed,reconnect after 2s");
        setTimeout(function () {
            _this.connect()
        }, 2000)
    }

    this.socket.onmessage(function (event) {
        _this.handleSekiroRequest(event.data)
    });

    this.socket.onopen(function (event) {
        console.log('sekiro: open a sekiro client connection')
    });

    this.socket.onclose(function (event) {
        console.log('sekiro: disconnected ,reconnection after 2s');
        setTimeout(function () {
            _this.connect()
        }, 2000)
    });
};

SekiroClient.prototype.handleSekiroRequest = function (requestJson) {
    console.log("receive sekiro request: " + requestJson);
    var request = JSON.parse(requestJson);
    var seq = request['__sekiro_seq__'];

    if (!request['action']) {
        this.sendFailed(seq, 'need request param {action}');
        return
    }
    var action = request['action'];
    if (!this.handlers[action]) {
        this.sendFailed(seq, 'no action handler: ' + action + ' defined');
        return
    }

    var theHandler = this.handlers[action];
    var _this = this;
    try {
        theHandler(request, function (response) {
            try {
                _this.sendSuccess(seq, response)
            } catch (e) {
                _this.sendFailed(seq, "e:" + e);
            }
        }, function (errorMessage) {
            _this.sendFailed(seq, errorMessage)
        })
    } catch (e) {
        console.log("error: " + e);
        _this.sendFailed(seq, ":" + e);
    }
};

SekiroClient.prototype.sendSuccess = function (seq, response) {
    var responseJson;
    if (typeof response == 'string') {
        try {
            responseJson = JSON.parse(response);
        } catch (e) {
            responseJson = {};
            responseJson['data'] = response;
        }
    } else if (typeof response == 'object') {
        responseJson = response;
    } else {
        responseJson = {};
        responseJson['data'] = response;
    }


    if (Array.isArray(responseJson)) {
        responseJson = {
            data: responseJson,
            code: 0
        }
    }

    if (responseJson['code']) {
        responseJson['code'] = 0;
    } else if (responseJson['status']) {
        responseJson['status'] = 0;
    } else {
        responseJson['status'] = 0;
    }
    responseJson['__sekiro_seq__'] = seq;
    var responseText = JSON.stringify(responseJson);
    console.log("response :" + responseText);
    this.socket.send(responseText);
};

SekiroClient.prototype.sendFailed = function (seq, errorMessage) {
    if (typeof errorMessage != 'string') {
        errorMessage = JSON.stringify(errorMessage);
    }
    var responseJson = {};
    responseJson['message'] = errorMessage;
    responseJson['status'] = -1;
    responseJson['__sekiro_seq__'] = seq;
    var responseText = JSON.stringify(responseJson);
    console.log("sekiro: response :" + responseText);
    this.socket.send(responseText)
};

SekiroClient.prototype.registerAction = function (action, handler) {
    if (typeof action !== 'string') {
        throw new Error("an action must be string");
    }
    if (typeof handler !== 'function') {
        throw new Error("a handler must be function");
    }
    console.log("sekiro: register action: " + action);
    this.handlers[action] = handler;
    return this;
};

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值