Python: set string keys and values and output them as a dict (converting key=value pairs back into a Python dict)

汪国

There's a logfile with text in the form of space-separated key=value pairs, and each line was originally serialized from data in a Python dict, something like:

' '.join([f'{k}={v!r}' for k,v in d.items()])

The keys are always just strings. The values could be anything that ast.literal_eval can successfully parse, no more, no less.
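To make the format concrete, here is a minimal sketch of the serializing side. The helper name `serialize_line` and the sample dict are assumptions for illustration, not part of the original logging code.

```python
def serialize_line(d):
    # Join each key with the repr() of its value, separated by single spaces.
    return ' '.join(f'{k}={v!r}' for k, v in d.items())


# Hypothetical sample data; any values that ast.literal_eval can parse would do.
sample = {'key': 'hello world', 'n': 1234, 'k5': {'k6': ['potato']}}
line = serialize_line(sample)
print(line)  # key='hello world' n=1234 k5={'k6': ['potato']}
```

Note that spaces appear both between pairs and inside values, which is why splitting on whitespace is not enough.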

How to process this logfile and turn the lines back into Python dicts? Example:

>>> to_dict("key='hello world'")
{'key': 'hello world'}
>>> to_dict("k1='v1' k2='v2'")
{'k1': 'v1', 'k2': 'v2'}
>>> to_dict("s='1234' n=1234")
{'s': '1234', 'n': 1234}
>>> to_dict("""k4='k5="hello"' k5={'k6': ['potato']}""")
{'k4': 'k5="hello"', 'k5': {'k6': ['potato']}}

Here is some extra context about the data:

Input lines are well-formed (e.g. no dangling brackets)

The data is trusted (unsafe functions such as eval, exec, yaml.load are OK to use)

Order is not important. Performance is not important. Correctness is important.

Edit: As requested in the comments, here is an MCVE with an example of code that does not work correctly:

>>> def to_dict(s):
...     s = s.replace(' ', ', ')
...     return eval(f"dict({s})")
...
>>> to_dict("k1='v1' k2='v2'")
{'k1': 'v1', 'k2': 'v2'} # OK
>>> to_dict("s='1234' n=1234")
{'s': '1234', 'n': 1234} # OK
>>> to_dict("key='hello world'")
{'key': 'hello, world'} # Incorrect, the value was corrupted
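The corruption comes from the blanket replacement: every space becomes a comma plus space, including spaces inside quoted values. A small sketch of the intermediate string (the variable names are illustrative only):

```python
line = "key='hello world'"

# The naive approach rewrites every space, including the one inside the quotes,
# so the string literal itself is changed before eval ever sees it.
rewritten = line.replace(' ', ', ')
print(rewritten)  # key='hello, world'
```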

Solution

Your input can't be conveniently parsed by something like ast.literal_eval, but it can be tokenized as a series of Python tokens. This makes things a bit easier than they might otherwise be.

The only place = tokens can appear in your input is as key-value separators; at least for now, ast.literal_eval doesn't accept anything with = tokens in it. We can use the = tokens to determine where the key-value pairs start and end, and most of the rest of the work can be handled by ast.literal_eval. Using the tokenize module also avoids problems with = or backslash escapes in string literals.
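As a rough illustration of that idea, the sketch below tokenizes one line and lists where the separator tokens fall. The sample line and variable names are assumptions made for this example.

```python
import io
import tokenize

# Hypothetical sample line: the first value is itself the string '='.
line = "a='=' n=1234"

# tokenize.tokenize expects the readline method of a binary file-like object.
readline = io.BytesIO(line.encode('utf8')).readline
tokens = list(tokenize.tokenize(readline))

# Show each token's type name and text.
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Only real separators are OP tokens whose text is exactly '='.
# The quoted '=' is a single STRING token, so it is not matched.
eq_positions = [i for i, tok in enumerate(tokens)
                if tok.type == tokenize.OP and tok.string == '=']
print(eq_positions)  # [2, 5]
```

Index 0 is the ENCODING token that tokenize.tokenize always emits first, which is why the first separator sits at index 2 rather than 1.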

import ast
import io
import tokenize

def todict(logstring):
    # tokenize.tokenize wants an argument that acts like the readline method of a binary
    # file-like object, so we have to do some work to give it that.
    input_as_file = io.BytesIO(logstring.encode('utf8'))
    tokens = list(tokenize.tokenize(input_as_file.readline))

    eqsign_locations = [i for i, token in enumerate(tokens) if token[1] == '=']

    names = [tokens[i-1][1] for i in eqsign_locations]

    # Values are harder than keys.
    val_starts = [i+1 for i in eqsign_locations]
    val_ends = [i-1 for i in eqsign_locations[1:]] + [len(tokens)]

    # tokenize.untokenize likes to add extra whitespace that ast.literal_eval
    # doesn't like. Removing the row/column information from the token records
    # seems to prevent extra leading whitespace, but the documentation doesn't
    # make enough promises for me to be comfortable with that, so we call
    # strip() as well.
    val_strings = [tokenize.untokenize(tok[:2] for tok in tokens[start:end]).strip()
                   for start, end in zip(val_starts, val_ends)]
    vals = [ast.literal_eval(val_string) for val_string in val_strings]

    return dict(zip(names, vals))

This behaves correctly on your example inputs, as well as on an example with backslashes:

>>> todict("key='hello world'")
{'key': 'hello world'}
>>> todict("k1='v1' k2='v2'")
{'k1': 'v1', 'k2': 'v2'}
>>> todict("s='1234' n=1234")
{'s': '1234', 'n': 1234}
>>> todict("""k4='k5="hello"' k5={'k6': ['potato']}""")
{'k4': 'k5="hello"', 'k5': {'k6': ['potato']}}
>>> s=input()
a='=' b='"\'' c=3
>>> todict(s)
{'a': '=', 'b': '"\'', 'c': 3}

Incidentally, we probably could look for token type NAME instead of = tokens, but that'll break if they ever add set() support to literal_eval. Looking for = could also break in the future, but it doesn't seem as likely to break as looking for NAME tokens.
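One more thing to keep in mind if you try the NAME-based variant (this is an observation added here, not part of the original answer): the tokenize module reports True, False and None as NAME tokens as well, so a NAME token is not necessarily a key. A quick sketch with a made-up input line:

```python
import io
import tokenize

# Hypothetical sample line whose values are keyword constants.
line = "flag=True missing=None"
readline = io.BytesIO(line.encode('utf8')).readline

# Collect the text of every NAME token; keys and keyword constants both show up.
name_tokens = [tok.string for tok in tokenize.tokenize(readline)
               if tok.type == tokenize.NAME]
print(name_tokens)  # ['flag', 'True', 'missing', 'None']
```

A NAME-based approach would therefore still need to check that each candidate key is followed by an = token.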
