Python code that converts a curl command copied from the browser into Python requests code

Yes, you read that right. It reads a little awkwardly, but it really is useful.
This is code that converts a curl command copied from the browser.
The code comes from the source of the curltopy library; if you need it, you can find the original on GitHub.

After installing the library I found it would not run as its README describes, so a few places have been changed here to make it work.

How to copy the curl command

When copying, choose "Copy as cURL (bash)". The other variant, "Copy as cURL (cmd)", uses ^ as its line-continuation character, which breaks the parsing (a short demonstration follows below).
If you are copying from a tool like Charles, pick the second option, the one containing "request", and move the URL in the copied string to the front, right after curl; otherwise the URL will be parsed incorrectly.
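To see why the cmd variant is a problem: the script below tokenizes the command with shlex.split, which understands bash quoting, but cmd's ^ continuation survives as a stray token. A minimal illustration (the URL is a placeholder):

import shlex

# bash-style copy: shlex understands the quoting
bash_copy = "curl 'https://example.com/api?a=1' -H 'accept: */*'"
print(shlex.split(bash_copy))
# ['curl', 'https://example.com/api?a=1', '-H', 'accept: */*']

# cmd-style copy: the ^ continuation character is kept as its own token,
# and cmd-specific escaping can further corrupt the arguments
cmd_copy = 'curl "https://example.com/api?a=1" ^\n  -H "accept: */*"'
print(shlex.split(cmd_copy))
# ['curl', 'https://example.com/api?a=1', '^', '-H', 'accept: */*']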

Differences between Charles and the browser

One difference is that Charles tends to use a host header, while the browser uses authority.
Loosely speaking, host is a substring of authority: authority may include the port, while host never does.
Reference:
https://blog.csdn.net/qq_26810645/article/details/106853573
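A minimal illustration of the relationship (the value here is hypothetical):

# authority is the host plus an optional port; host never carries the port
authority = 'ditu.amap.com:443'           # hypothetical explicit port
host, _, port = authority.partition(':')
print(host)                  # ditu.amap.com
print(port or '(default)')   # 443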

Code

The full script is below:

#!/usr/bin/env python
# Description: converts curl statements to python code
# Inspired by http://curl.trillworks.com/
import shlex
import math

INDENT = 4
PRINTLINE = 80


def print_key_val(init, value, pre_indent=0, end=','):
    """Print the key and value and insert it into the code list.
    :param init: string to initialize value e.g.
                 "'key': " or "url = "
    :param value: value to print in the dictionary
    :param pre_indent: optional param to set the level of indentation,
                       defaults to 0
    :param end: optional param to set the end, defaults to comma
    """
    indent = INDENT * pre_indent
    # indent is up to the first single quote
    start = indent + len(init)
    # 80 is the print line minus the starting indent
    # minus 2 single quotes, 1 space, and 1 backslash
    left = PRINTLINE - start - 4
    code = []
    code.append("{i}{s}'{v}'".format(i=" " * indent, s=init, v=value[:left]))
    if len(value) > left:
        code[-1] += " \\"
        # figure out lines by taking the length of the value and dividing by
        # chars left to the print line
        lines = int(math.ceil(len(value) / float(left)))
        for i in range(1, lines):
            delim = " \\"
            if i == lines - 1:
                delim = end
            code.append("{i}'{v}'{d}".format(i=" " * start,
                                             v=value[i * left:(i + 1) * left],
                                             d=delim))
    else:
        code[-1] += end
    return code
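
# A hypothetical illustration of print_key_val's wrapping: a value longer
# than the remaining print width is split across backslash-continued string
# literals, producing generated code roughly like:
#     url = 'https://example.com/aaaa(...)' \
#           'aaaa(...)'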


def dict_to_code(name, simple_dict):
    """Converts a dictionary to a python compatible key value pair
    >>> code = dict_to_code("cookies", cookies)
    :param name: name of the variable
    :param simple_dict: dictionary to iterate
    :return: python compatible code in a list
    """
    code = []
    if simple_dict:
        code.append("{} = {{".format(name))
        # dict.items() works on both Python 2 and 3, so no fallback is needed
        for k, v in simple_dict.items():
            init = "'{k}': ".format(k=k)
            code += print_key_val(init, v, 1)
        code.append("}\n")
    return code


def create_request(url, method, cookies, headers, data=None):
    """Create request code from params
    >>> code = create_request("https://localhost:8080", "get", {}, {})
    :param url: url requested
    :param method: method used e.g. get, post, delete, put
    :param cookies: dict of each cookie
    :param headers: dict of each header
    :param data: optional param to provided data to the request
    :return: python compatible code in a list
    """
    code = []
    # check for cookies
    code += dict_to_code("cookies", cookies)
    # check for headers
    code += dict_to_code("headers", headers)
    code += print_key_val("url = ", url, end='')
    resstr = "res = requests.{}(url, ".format(method)
    append = "headers=headers"
    # if there are cookies / data, then attach it to the requests call
    if cookies:
        append += ", cookies=cookies"
    if data:
        code.append("data = '{}'".format(data))
        append += ", data=data"
    code.append(resstr + append + ")")
    code.append("print(res.content)\n")
    return code


def curl_to_python(command):
    """Convert curl command to python script.
    >>> code = curl_to_python(command)
    >>> print('\n'.join(code))
    :param command: curl command exported from Chrome's Dev Tools
    :return: python compatible code in a list
    """
    # remove quotations
    args = shlex.split(command)
    data = None
    # detect the HTTP method (requests' method names are lowercase)
    if '-X' in args:
        method = args[args.index('-X') + 1].lower()
    else:
        method = 'get'
    # detect a request body; Chrome may export --data, --data-raw or --data-binary
    for flag in ('--data', '--data-raw', '--data-binary'):
        if flag in args:
            data = args[args.index(flag) + 1]
            if method == 'get':
                method = 'post'
            break

    url = args[1]
    # gather all the headers
    headers = {}
    for i, v in enumerate(args):
        if v == '-H':
            # split on the first colon only, so values such as
            # 'referer: https://ditu.amap.com/' keep their own colons
            key, _, value = args[i + 1].partition(':')
            headers[key] = value.strip()

    cookies = {}
    # gather all the cookies
    if 'Cookie' in headers or 'cookie' in headers:
        if 'Cookie' in headers:
            cookie_key = 'Cookie'
        else:
            cookie_key = 'cookie'
        cookie = headers[cookie_key]
        # remove cookies from headers because it will be added separately
        del headers[cookie_key]
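        # e.g. 'cna=abc; xlly_s=1' -> {'cna': 'abc', 'xlly_s': '1'}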
        cookies = dict([c.strip().split('=', 1) for c in cookie.split(';')])

    code = []
    code.append("#!/usr/bin/env python")
    code.append("import requests\n")
    code += create_request(url, method, cookies, headers, data)
    return code


def res_to_curl(res):
    """converts a requests response to a curl command
    >>> res = requests.get('http://www.example.com')
    >>> print(res_to_curl(res))
    curl 'http://www.example.com/' -X 'GET' ...
    Source: http://stackoverflow.com/a/17936634
    :param res: request object
    """
    req = res.request
    command = "curl '{uri}' -X '{method}' -H {headers}"
    headers = ["{}: {}".format(k, v) for k, v in req.headers.items()]
    header_line = " -H ".join(['"{}"'.format(h) for h in headers])
    if req.method == "GET":
        return command.format(method=req.method, headers=header_line,
                              uri=req.url)
    else:
        command += " --data-binary '{data}'"
        return command.format(method=req.method, headers=header_line,
                              data=req.body, uri=req.url)


def main(command=None):
    """Main entry point.
    Purposely didn't use argparse or another command line parser to keep this
    script simple.
    """
    if not command:
        command = 'curl "http://www.example.com" ' + \
                  '-H "Pragma: no-cache" ' + \
                  '-H "Accept-Encoding: gzip, deflate" ' + \
                  '-H "Accept-Language: en-US,en;q=0.8"'
    code = curl_to_python(command)
    print('\n'.join(code))
    with open('my_code.py', 'w') as f:
        f.write('\n'.join(code))


if __name__ == "__main__":
    command = """curl 'https://ditu.amap.com/service/poiInfo?query_type=TQUERY&pagesize=20&pagenum=1&qii=true&cluster_state=5&need_utd=true&utd_sceneid=1000&div=PC1000&addr_poi_merge=true&is_classify=true&zoom=9.45&city=310000&geoobj=121.311876%7C30.803731%7C122.276484%7C31.709055&keywords=%E5%8A%A0%E6%B2%B9%E7%AB%99' \
  -H 'authority: ditu.amap.com' \
  -H 'accept: */*' \
  -H 'x-csrf-token: null' \
  -H 'x-requested-with: XMLHttpRequest' \
  -H 'user-agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36' \
  -H 'amapuuid: 6eb20d10-d5ea-4de2-ba08-6d21adaac10e' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-dest: empty' \
  -H 'referer: https://ditu.amap.com/' \
  -H 'accept-language: zh-CN,zh;q=0.9,ja;q=0.8' \
  -H 'cookie: cna=1y6ZGixOx2UCAbfAJTLJF0Nz; UM_distinctid=17f54a67a4e7-04145a72489c9b-3e604809-100200-17f54a67a4f38c; guid=4133-d74b-a13a-7da7; xlly_s=1; _uab_collina=164960964925517921868322; l=eBauGa7ng1lUhHXzBOfwhurza77OOIRf_uPzaNbMiOCP9JC257wCW620BVYyCnGVH6rMk37vCcaaByLpsyIVM74V7WXXH1MmndC..; tfstk=cLRlB7NKNLWWleXDCb1SOOJ-sO1lZraPVOB2g4fDtlzfuURViqe4QNn4ViumiJ1..; isg=BF5e6sfNsNrv5ecfQmPZdftmr_SgHyKZeYrQJAjn5KGcK_8Ffa_xqamJIzcnFBqx' \
  --compressed"""
    main(command)

Usage

Paste the command you copied into the command string at the bottom of the file, then run the file. If the command came from Charles, remember to move the URL to the front, right after curl.
The generated code is written to my_code.py in the working directory; make sure you don't already have a file by that name. A sketch of the generated file follows below.
Running the generated code requires the requests library.
If you hit SSL errors, try setting verify=False in the requests call, and check whether a proxy tool is running.
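
For reference, here is a rough sketch of the my_code.py generated for the amap example above. Header and cookie values are abbreviated, and the real output wraps long values across backslash-continued lines:

#!/usr/bin/env python
import requests

cookies = {
    'cna': '1y6ZGixOx2UCAbfAJTLJF0Nz',
    # ... remaining cookies from the copied command ...
}

headers = {
    'authority': 'ditu.amap.com',
    'accept': '*/*',
    # ... remaining headers from the copied command ...
}

url = 'https://ditu.amap.com/service/poiInfo?query_type=TQUERY&...'  # abbreviated
res = requests.get(url, headers=headers, cookies=cookies)
# if SSL verification fails behind a proxy, a verify=False retry can help:
# res = requests.get(url, headers=headers, cookies=cookies, verify=False)
print(res.content)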
