JSONPath: Extract single dict with keys and values
I have a U-SQL application that runs in the Azure Data Lake environment. It's supposed to process a file of JSON data that looks like this, except that the real file has far more than two rows.
[
{"reports" : {"direction": "FWD", "drive": "STOPS", "frob_variable": 0}},
{"reports" : {"direction": "FWD", "drive": "CRANKS", "frob_variable": -3}}
]
In that Data Lake job, I have the following statement:
@json =
EXTRACT direction string, drive string, frob_variable int FROM @"/input/file.json"
USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("reports");
When I dump the contents of that @json variable to a text file I get empty values: zero-length strings and zero-valued integers. I do get the correct number of output rows though, so it must be iterating over all my input.
A bit of poking around in the JsonExtractor source code shows me that my specified JsonPath value ("reports") seems to be returning the "reports" key with the embedded dict. If I try a JsonPath value of "reports.*" I do get the embedded values (e.g., { "FWD", "STOPS", 0 }), but I really want the keys to come along with them so that SELECT direction, drive, frob_variable returns something useful.
Long story short, I'm looking for a way to pull the keys and values from that inner dict. Thus my desired output from the EXTRACT would be a rowset whose columns are "direction", "drive", and "frob_variable" and whose values are as indicated in the source data. It seems like there should be a JsonPath solution or a simple workaround in U-SQL.
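To pin down the target shape, here is a minimal Python sketch (not U-SQL; it only models the data locally) built from the two sample records. `values_only` mimics what the question reports for "reports.*" (values without their keys), and `desired_rows` shows the keyed rowset being asked for. Both function names are hypothetical.

```python
import json

# The two sample records from the question.
SAMPLE = """[
{"reports" : {"direction": "FWD", "drive": "STOPS", "frob_variable": 0}},
{"reports" : {"direction": "FWD", "drive": "CRANKS", "frob_variable": -3}}
]"""

COLUMNS = ("direction", "drive", "frob_variable")


def values_only(text):
    """Roughly what "reports.*" yields per the question: inner values, keys lost."""
    return [list(rec["reports"].values()) for rec in json.loads(text)]


def desired_rows(text):
    """The wanted rowset: one row per record, keyed by the inner dict's names."""
    return [{col: rec["reports"][col] for col in COLUMNS} for rec in json.loads(text)]


if __name__ == "__main__":
    print(values_only(SAMPLE))  # [['FWD', 'STOPS', 0], ['FWD', 'CRANKS', -3]]
    for row in desired_rows(SAMPLE):
        print(row)
```

The difference between the two results is exactly the problem: without the keys, a SELECT by column name has nothing to match.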
Source: https://stackoverflow.com/questions/41044273
Updated: 2020-09-20 17:09
Best answer
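// Assumes Newtonsoft.Json and Microsoft.Analytics.Samples.Formats are registered in the catalog.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
// No JsonPath: each element of the top-level array becomes a row, and the
// "reports" column receives the inner dict as raw JSON text.
@extract =
    EXTRACT
        reports string
    FROM @"/input/file.json"
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
// JsonTuple parses that JSON text into a map from key to string value.
@relation =
    SELECT
        Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(reports)
        AS report
    FROM @extract;
// Look up each column by key; map values are strings, so parse the integer.
@fields =
    SELECT
        report["direction"] AS direction,
        report["drive"] AS drive,
        Int32.Parse(report["frob_variable"]) AS frob_variable
    FROM @relation;
// Illustrative output path.
OUTPUT @fields
TO "/output/fields.csv"
USING Outputters.Csv();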
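The answer works in three steps: extract the inner dict as a JSON string, parse it into a key-to-value map with JsonTuple, then select columns by key and convert the integer with Int32.Parse. The Python sketch below models that data flow outside Azure Data Lake. It is not the Microsoft.Analytics.Samples.Formats API: the helper names are hypothetical, and `json_tuple` only approximates JsonTuple (which returns a `SqlMap<string, string>`) by turning every value into a string.

```python
import json

# The two sample records from the question.
SAMPLE = """[
{"reports" : {"direction": "FWD", "drive": "STOPS", "frob_variable": 0}},
{"reports" : {"direction": "FWD", "drive": "CRANKS", "frob_variable": -3}}
]"""


def extract_reports(text):
    """Step 1, like EXTRACT reports string with no JsonPath: one row per array
    element, keeping the inner dict as raw JSON text."""
    return [json.dumps(rec["reports"]) for rec in json.loads(text)]


def json_tuple(raw):
    """Step 2, approximating JsonTuple: parse the JSON text into a map whose
    values are all strings (non-string values are re-serialized)."""
    parsed = json.loads(raw)
    return {k: v if isinstance(v, str) else json.dumps(v) for k, v in parsed.items()}


def select_fields(report):
    """Step 3, like the final SELECT: look up columns by key and parse the
    integer column, as Int32.Parse does."""
    return {
        "direction": report["direction"],
        "drive": report["drive"],
        "frob_variable": int(report["frob_variable"]),
    }


def pipeline(text):
    """Chain the three steps over the whole input."""
    return [select_fields(json_tuple(raw)) for raw in extract_reports(text)]


if __name__ == "__main__":
    for row in pipeline(SAMPLE):
        print(row)
```

Modeling the map values as strings is why the integer column needs an explicit parse in step 3, mirroring the Int32.Parse call in the U-SQL.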
2017-05-23