Python dictionary output with unquoted keys and values: extracting key-value pairs from a string with quotes

Latest recommended article published on 2023-03-07 08:51:23

weixin_39744316

Article tags: Python dictionary output with unquoted keys and values

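I'm having trouble writing an "elegant" parser for this requirement (one that doesn't look like a C breakfast). The input is a string of key-value pairs separated by ',' in which each key is joined to its value by '='.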

key1=value1,key2=value2

The part that trips me up is that values can be wrapped in double quotes ("), and a ',' inside the quotes does not end the value.

key1=value1,key2="value2,still_value2"

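That last part makes it hard to use split or re.split, and I end up resorting to for i in range loops :(.

Can anyone demonstrate a clean way to do this?

It is safe to assume that quotes appear only in values, and that there is no whitespace and no non-alphanumeric characters.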
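
To make the difficulty concrete, here is a minimal sketch (not code from the original question; the variable names are illustrative) of what a plain split on ',' does to the second example:

```python
# Illustrative only: a plain split on ',' ignores the quotes.
text = 'key1=value1,key2="value2,still_value2"'

pieces = text.split(",")

# The quoted value is cut in two, and the last piece has no '=' at all,
# so unpacking "key, value = piece.split('=')" would fail on it.
broken = [piece for piece in pieces if "=" not in piece]
```

Here pieces has three items instead of two, which is why the comma needs to be treated differently inside quotes.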

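Can you post the expected output?

In the second example, does the value of key2 include the quotes? That is, does key2 map to 'value2,still_value2' or to '"value2,still_value2"'?

Using some regex magic from "Split a string, respect and preserve quotes", we can do: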

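import re

string = 'key1=value1,key2="value2,still_value2"'

# each match is one key=value item; commas inside double quotes do not split
key_value_pairs = re.findall(r'(?:[^\s,"]|"(?:\\.|[^"])*")+', string)

for key_value_pair in key_value_pairs:
    # split on the first '=' only, so an '=' inside a value is kept
    key, value = key_value_pair.split("=", 1)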

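At BioGeek's request, here is my attempt to guess, I mean interpret, the regex Janne Karila used. The pattern breaks the string on commas but respects double-quoted sections (which may themselves contain commas) along the way. It has two alternatives: a single character that is not whitespace, a comma, or a quote; and a double-quoted run of characters, in which a double quote ends the run unless it is escaped with a backslash: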

(?:                  # parentheses for alternation (|), not capture
  [^\s,"]            # any one character except whitespace, comma, or quote
  |                  # or
  "(?:\\.|[^"])*"    # a quoted string containing zero or more characters
                     # other than quotes (unless escaped)
)+                   # one or more of the above
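
The annotated pattern can also be handed to Python as is. The following is a minimal sketch, assuming re.VERBOSE is acceptable here; the names TOKEN and parse_pairs are made up for illustration, and stripping the surrounding quotes from values is an assumption about the desired output rather than something the answer specifies.

```python
import re

# Same pattern as above, compiled with re.VERBOSE so the comments can stay inside it.
TOKEN = re.compile(r'''
    (?:                  # group for alternation, non-capturing
        [^\s,"]          # any one character except whitespace, comma, or quote
      |                  # or
        "(?:\\.|[^"])*"  # a double-quoted run that may hold commas
    )+                   # one or more of the above
''', re.VERBOSE)


def parse_pairs(text):
    """Hypothetical helper: turn 'k=v,k2="v2,more"' into a dict."""
    pairs = {}
    for token in TOKEN.findall(text):
        # Split on the first '=' only, so '=' inside a value is preserved.
        key, value = token.split("=", 1)
        # Assumption: the surrounding double quotes are not wanted in the result.
        pairs[key] = value.strip('"')
    return pairs


result = parse_pairs('key1=value1,key2="value2,still_value2"')
```

With the sample input, result maps key1 to value1 and key2 to value2,still_value2.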

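Could you add some explanation of how the regex works?

@BioGeek, I gave it a try as you requested; let me know whether I succeeded!

cdlane, thanks for the explanation!

I would advise against using regular expressions for this task, because the language you want to parse is not regular.

You have a string containing multiple key-value pairs. The best way to parse it is not to match patterns against it but to tokenize it properly.

The Python standard library has a module called shlex, which mimics the parsing done by POSIX shells and provides a lexer implementation that can easily be customized to your needs.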

from shlex import shlex

def parse_kv_pairs(text, item_sep=",", value_sep="="):
    """Parse key-value pairs from a shell-like text."""
    # initialize a lexer, in POSIX mode (to properly handle escaping)
    lexer = shlex(text, posix=True)
    # set ',' as whitespace for the lexer
    # (the lexer will use this character to separate words)
    lexer.whitespace = item_sep
    # include '=' as a word character
    # (this is done so that the lexer returns a list of key-value pairs)
    # (if your option key or value contains any unquoted special character, you will need to add it here)
    lexer.wordchars += value_sep
    # then we separate option keys and values to build the resulting dictionary
    # (maxsplit is required to make sure that '=' in value will not be a problem)
    return dict(word.split(value_sep, maxsplit=1) for word in lexer)

Example run:

parse_kv_pairs(
    'key1=value1,key2=\'value2,still_value2,not_key1="not_value1"\''
)

Output:

{'key1': 'value1', 'key2': 'value2,still_value2,not_key1="not_value1"'}
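
To see what the lexer hands back before the dictionary is built, the intermediate tokens can be inspected directly. This is a small sketch using the question's own input and the same two lexer settings; the variable names are illustrative.

```python
from shlex import shlex

# Same configuration as parse_kv_pairs: ',' separates words, '=' is part of a word.
lexer = shlex('key1=value1,key2="value2,still_value2"', posix=True)
lexer.whitespace = ","
lexer.wordchars += "="

# Iterating the lexer yields one token per key=value item.
tokens = list(lexer)
```

In POSIX mode the quotes are consumed by the lexer, so the second token carries the comma but not the quote characters.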

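EDIT: I forgot to add that the reason I usually stick with shlex rather than regular expressions (which are faster in this case) is that it gives you fewer surprises, especially if you later need to allow more kinds of input. I have never found a way to parse such key-value pairs properly with regular expressions; there will always be some input (for example: A="B=\"1,2,3\"") that tricks the engine.

If you do not care about such inputs (or, put another way, if you can guarantee that your input follows the definition of a regular language), regular expressions are perfectly fine.

EDIT2: split has a maxsplit argument, which is much cleaner to use than splitting, slicing, and joining. Thanks to @cdlane for the sound input!

I believe shlex is a solid production solution, and this is a good example of how to adapt it to the problem at hand. However, this answer loses all its elegance for me at the return statement: split() the same data twice and then join() to clean up after an excessive split(), just to be able to use a dictionary comprehension? How about return dict(word.split(value_sep, maxsplit=1) for word in lexer)?

Yes, that is a better way. I forgot about the maxsplit argument when writing it, and adding support for = inside values did make it less elegant. Thanks for the suggestion, I have edited the answer.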
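
As an illustration of the kind of input meant here, the sketch below feeds A="B=\"1,2,3\"" to the shlex-based function from this answer and to a simple regular expression. The regex is a two-group simplification of a pattern posted elsewhere in this thread, chosen as an assumption for the comparison; other patterns will fail differently or not at all.

```python
import re
from shlex import shlex


def parse_kv_pairs(text, item_sep=",", value_sep="="):
    """Copy of the shlex-based parser from this answer."""
    lexer = shlex(text, posix=True)
    lexer.whitespace = item_sep
    lexer.wordchars += value_sep
    return dict(word.split(value_sep, maxsplit=1) for word in lexer)


# The tricky input: a quoted value that itself contains escaped quotes.
tricky = r'A="B=\"1,2,3\""'

# shlex understands the backslash escapes inside the double quotes.
with_shlex = parse_kv_pairs(tricky)

# Assumed simple pattern: key, '=', then a quoted run or a run without commas.
simple_regex = re.compile(r'([^=]+)=("[^"]+"|[^,]+),?')
with_regex = dict(simple_regex.findall(tricky))
```

The regex stops the value at the first escaped quote, while the lexer returns the whole inner string with the escapes resolved.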

I came up with this regex solution:

import re

match = re.findall(r'([^=]+)=(("[^"]+")|([^,]+)),?', 'key1=value1,key2=value2,key3="value3,stillvalue3",key4=value4')

This gives match the value:

[('key1', 'value1', '', 'value1'), ('key2', 'value2', '', 'value2'), ('key3', '"value3,stillvalue3"', '"value3,stillvalue3"', ''), ('key4', 'value4', '', 'value4')]

You can then loop over it to get the keys and values:

for m in match:
    key = m[0]    # first group: the key
    value = m[1]  # second group: the whole value (quotes included, if any)
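
If a dictionary is the goal, the loop can be folded into a comprehension. A minimal sketch, assuming the surrounding double quotes should be dropped from the values (the answer itself keeps them); the name pairs is illustrative.

```python
import re

text = 'key1=value1,key2=value2,key3="value3,stillvalue3",key4=value4'
matches = re.findall(r'([^=]+)=(("[^"]+")|([^,]+)),?', text)

# m[0] is the key and m[1] the whole value; the quotes are stripped here
# on the assumption that they are not wanted in the final dictionary.
pairs = {m[0]: m[1].strip('"') for m in matches}
```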

I'm not sure whether this avoids looking like a C breakfast, or whether it is elegant :)

data = {}
original = 'key1=value1,key2="value2,still_value2"'
converted = ''
is_open = False

for c in original:
    if c == ',' and not is_open:
        # a comma outside quotes becomes a newline separator
        c = '\n'
    elif c in ('"', "'"):
        # toggle the "inside quotes" flag on every quote character
        is_open = not is_open
    converted += c

for item in converted.split('\n'):
    k, v = item.split('=')
    data[k] = v
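
The same idea can be written without the intermediate newline-joined string. The sketch below is a variant, not the answer's code: split_outside_quotes is a hypothetical helper that remembers which quote character opened the run (the original toggles on any quote), and partition is used so that an '=' inside a value is kept.

```python
def split_outside_quotes(text, sep=","):
    """Hypothetical helper: split on sep, except inside '...' or "..."."""
    parts, current, quote = [], [], None
    for ch in text:
        if quote is not None:
            # Inside quotes: only the matching quote character closes the run.
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in ('"', "'"):
            quote = ch
            current.append(ch)
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


items = split_outside_quotes('key1=value1,key2="value2,still_value2"')

data = {}
for item in items:
    # partition splits on the first '=' only
    k, _, v = item.partition("=")
    data[k] = v
```

As in the loop above, the quotes remain part of the value.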

Based on several other answers, I came up with the following solution:

import re
import itertools

data = 'key1=value1,key2="value2,still_value2"'

# Based on Alan Moore's answer on http://stackoverflow.com/questions/2785755/how-to-split-but-ignore-separators-in-quoted-strings-in-python
def split_on_non_quoted_equals(string):
    return re.split('''=(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''', string)

def split_on_non_quoted_comma(string):
    return re.split(''',(?=(?:[^'"]|'[^']*'|"[^"]*")*$)''', string)

split1 = split_on_non_quoted_equals(data)
split2 = map(split_on_non_quoted_comma, split1)

# 'Unpack' the sublists into a single list. Based on Alex Martelli's answer on http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
flattened = [item for sublist in split2 for item in sublist]

# Convert alternating elements of a list into keys and values of a dictionary. Based on Sven Marnach's answer on http://stackoverflow.com/questions/6900955/python-convert-list-to-dictionary
# (zip_longest is the Python 3 name; in Python 2 it was itertools.izip_longest)
d = dict(itertools.zip_longest(*[iter(flattened)] * 2, fillvalue=""))

The result d is the following dictionary:

{'key1': 'value1', 'key2': '"value2,still_value2"'}
