Python: build string keys and values and output them as a dict; converting key=value pairs back into Python dicts

There's a logfile with text in the form of space-separated key=value pairs, and each line was originally serialized from data in a Python dict, something like:

' '.join([f'{k}={v!r}' for k,v in d.items()])

The keys are always just strings. The values could be anything that ast.literal_eval can successfully parse, no more no less.
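To make the format concrete, here is a minimal sketch of the serialization step, assuming it is exactly the join expression quoted above. The helper name to_line and the sample record are made up for illustration; they are not from the actual logging code.

```python
def to_line(d):
    # Hypothetical helper: mirrors the join expression quoted in the question.
    # Each value is written with repr(), so string values keep their quotes
    # and may themselves contain spaces or '=' characters.
    return ' '.join(f'{k}={v!r}' for k, v in d.items())


record = {'s': '1234', 'n': 1234, 'k5': {'k6': ['potato']}}
line = to_line(record)
print(line)  # s='1234' n=1234 k5={'k6': ['potato']}
```

Note that the space inside the nested dict's repr is indistinguishable, at the character level, from the space separating two pairs. That is why splitting on spaces is not enough.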

How can I process this logfile and turn the lines back into Python dicts? Example:

>>> to_dict("key='hello world'")
{'key': 'hello world'}
>>> to_dict("k1='v1' k2='v2'")
{'k1': 'v1', 'k2': 'v2'}
>>> to_dict("s='1234' n=1234")
{'s': '1234', 'n': 1234}
>>> to_dict("""k4='k5="hello"' k5={'k6': ['potato']}""")
{'k4': 'k5="hello"', 'k5': {'k6': ['potato']}}

Here is some extra context about the data:

Input lines are well-formed (e.g. no dangling brackets)

The data is trusted (unsafe functions such as eval, exec, yaml.load are OK to use)

Order is not important. Performance is not important. Correctness is important.

Edit: As requested in the comments, here is an MCVE with example code that does not work correctly:

>>> def to_dict(s):
...     s = s.replace(' ', ', ')
...     return eval(f"dict({s})")
...
>>> to_dict("k1='v1' k2='v2'")
{'k1': 'v1', 'k2': 'v2'}  # OK
>>> to_dict("s='1234' n=1234")
{'s': '1234', 'n': 1234}  # OK
>>> to_dict("key='hello world'")
{'key': 'hello, world'}  # Incorrect, the value was corrupted

Solution

Your input can't be conveniently parsed by something like ast.literal_eval, but it can be tokenized as a series of Python tokens. This makes things a bit easier than they might otherwise be.

The only place = tokens can appear in your input is as key-value separators; at least for now, ast.literal_eval doesn't accept anything with = tokens in it. We can use the = tokens to determine where the key-value pairs start and end, and most of the rest of the work can be handled by ast.literal_eval. Using the tokenize module also avoids problems with = or backslash escapes in string literals.
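As a quick illustration of that point, the following sketch prints the token stream for a line whose value contains an = sign. The helper name token_strings and the sample line are assumptions made for this example; only the standard io and tokenize modules are used.

```python
import io
import tokenize


def token_strings(line):
    # Hypothetical helper: return (token type name, token text) pairs.
    # tokenize.tokenize expects the readline method of a binary file-like object.
    readline = io.BytesIO(line.encode('utf8')).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.tokenize(readline)]


tokens = token_strings("a='x=1' b=2")
for type_name, text in tokens:
    print(type_name, repr(text))
```

The = inside the quoted value stays inside a single STRING token, so only the two separator = signs show up as OP tokens.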

import ast
import io
import tokenize

def todict(logstring):
    # tokenize.tokenize wants an argument that acts like the readline method of a binary
    # file-like object, so we have to do some work to give it that.
    input_as_file = io.BytesIO(logstring.encode('utf8'))
    tokens = list(tokenize.tokenize(input_as_file.readline))

    eqsign_locations = [i for i, token in enumerate(tokens) if token[1] == '=']

    names = [tokens[i-1][1] for i in eqsign_locations]

    # Values are harder than keys.
    val_starts = [i+1 for i in eqsign_locations]
    val_ends = [i-1 for i in eqsign_locations[1:]] + [len(tokens)]

    # tokenize.untokenize likes to add extra whitespace that ast.literal_eval
    # doesn't like. Removing the row/column information from the token records
    # seems to prevent extra leading whitespace, but the documentation doesn't
    # make enough promises for me to be comfortable with that, so we call
    # strip() as well.
    val_strings = [tokenize.untokenize(tok[:2] for tok in tokens[start:end]).strip()
                   for start, end in zip(val_starts, val_ends)]
    vals = [ast.literal_eval(val_string) for val_string in val_strings]

    return dict(zip(names, vals))

This behaves correctly on your example inputs, as well as on an example with backslashes:

>>> todict("key='hello world'")
{'key': 'hello world'}
>>> todict("k1='v1' k2='v2'")
{'k1': 'v1', 'k2': 'v2'}
>>> todict("s='1234' n=1234")
{'s': '1234', 'n': 1234}
>>> todict("""k4='k5="hello"' k5={'k6': ['potato']}""")
{'k4': 'k5="hello"', 'k5': {'k6': ['potato']}}
>>> s=input()
a='=' b='"\'' c=3
>>> todict(s)
{'a': '=', 'b': '"\'', 'c': 3}

Incidentally, we probably could look for token type NAME instead of = tokens, but that'll break if they ever add set() support to literal_eval. Looking for = could also break in the future, but it doesn't seem as likely to break as looking for NAME tokens.
