JSONPath: Extract single dict with keys and values

I have a U-SQL application that runs in the Azure Data Lake environment. It's supposed to process a file full of JSON data that looks like this, except for being a lot more than two rows in real life.

[
{"reports" : {"direction": "FWD", "drive": "STOPS", "frob_variable": 0}},
{"reports" : {"direction": "FWD", "drive": "CRANKS", "frob_variable": -3}}
]

In that Data Lake job, I have the following line:

@json =
    EXTRACT direction string, drive string, frob_variable int
    FROM @"/input/file.json"
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("reports");

When I dump the contents of that @json variable to a text file I get empty values: zero-length strings and zero-valued integers. I do get the correct number of output rows though, so it must be iterating over all my input.

A bit of poking around the source code to the JsonExtractor shows me that my specified JsonPath value ("reports") seems to be returning the "reports" key with the embedded dict. If I try a JsonPath value of "reports.*" I do get the embedded values (e.g., { "FWD", "STOPS", 0 }) but I really wanted the keys to go along with them so SELECT direction, drive, frob_variable would return something useful.

Long story short, I'm looking for a way to pull the keys and values from that inner dict. Thus my desired output from the EXTRACT would be a rowset whose columns are "direction", "drive", and "frob_variable" and whose values are as indicated in the source data. It seems like there should be a JsonPath solution or a simple workaround in U-SQL.
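To make the difference between the two path expressions concrete, here is a small Python sketch. It uses only the standard `json` module, not the U-SQL `JsonExtractor`, so it merely mimics what each path appears to select according to the description above; the variable names are illustrative.

```python
import json

# Sample input copied from the question.
raw = """
[
  {"reports": {"direction": "FWD", "drive": "STOPS", "frob_variable": 0}},
  {"reports": {"direction": "FWD", "drive": "CRANKS", "frob_variable": -3}}
]
"""

records = json.loads(raw)

# Analogue of the path "reports": one inner object per record, keys intact.
by_key = [rec["reports"] for rec in records]

# Analogue of the path "reports.*": the inner values only, keys dropped.
values_only = [list(rec["reports"].values()) for rec in records]

# Desired rowset: one row per record, columns named after the inner keys.
columns = ("direction", "drive", "frob_variable")
rows = [tuple(rec["reports"][col] for col in columns) for rec in records]

print(values_only)
print(rows)
```

The values-only form has nothing left to match against the column names in the `EXTRACT` clause, which is why the keys need to travel with the values.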

Source: https://stackoverflow.com/questions/41044273

Updated: 2020-09-20 17:09

Best answer

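// Assumes both assemblies are already registered in the database the job runs against.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

// Step 1: read each "reports" object as a single string column (no JsonPath argument).
@extract =
    EXTRACT reports String
    FROM @"/input/file.json"
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

// Step 2: turn that string into a map of keys to values.
@relation =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(reports) AS report
    FROM @extract;

// Step 3: look up each key; the values come back as strings, so parse the integer.
@fields =
    SELECT report["direction"] AS direction,
           report["drive"] AS drive,
           Int32.Parse(report["frob_variable"]) AS frob
    FROM @relation;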
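For readers who do not have a Data Lake job handy, the three steps of the answer can be mirrored in plain Python. This is only a sketch of the data flow: `json_tuple` is a hypothetical stand-in for `JsonFunctions.JsonTuple`, and the assumption that every value comes back as a string is inferred from the answer's use of `Int32.Parse`.

```python
import json


def json_tuple(text):
    """Hypothetical stand-in for JsonTuple: map each key of one JSON object to a string value."""
    obj = json.loads(text)
    return {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in obj.items()
    }


# Step 1 (@extract): each row carries the "reports" object as serialized text.
extract = [
    {"reports": '{"direction": "FWD", "drive": "STOPS", "frob_variable": 0}'},
    {"reports": '{"direction": "FWD", "drive": "CRANKS", "frob_variable": -3}'},
]

# Step 2 (@relation): parse that text into a key -> string map.
relation = [{"report": json_tuple(row["reports"])} for row in extract]

# Step 3 (@fields): project the keys as columns and parse the integer.
fields = [
    {
        "direction": row["report"]["direction"],
        "drive": row["report"]["drive"],
        "frob": int(row["report"]["frob_variable"]),
    }
    for row in relation
]

for row in fields:
    print(row)
```

The point of the intermediate map is that the keys stay attached to their values until the final projection, which is what the single `EXTRACT` with a JsonPath argument could not provide.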

2017-05-23
