Unix Fight! Sed, Grep, Awk, Cut and Pulling Groups out of a PowerShell Regular Expression Capture

cunfuxiao7305

Original article: https://www.hanselman.com/blog/unix-fight-sed-grep-awk-cut-and-pulling-groups-out-of-a-powershell-regular-expression-capture

There's a wonderful old programmers joke I've told for years:

"You've got a problem, and you've decided to use regular expressions to solve it.

Ok, now you've got two problems..."

A friend of mine was talking on a social network and said something like:

"That decade I spent in the Windows world stunted my growth. one teeny-tiny unix command grabbed certain values from an XML doc for me."

Now, of course, I took this immediately as a personal challenge and rose up in a rit of fealous jage and defended my employer. Nah, not really, as I worked at Nike on Unix for a number of years and I get the power of sed and awk and whatnot. However, he said XML, and, well, PowerShell rocks XML.

Because it's a dynamic language, you can refer to XML nodes just like this:

$a = ([xml](new-object net.webclient).downloadstring("http://feeds.feedburner.com/Hanselminutes"))
$a.rss.channel.item

The first line gets the feed and the second line gets all the items.

However, it turns out my friend was actually trying to retrieve values from poorly formed XML fragments inside a larger SQL dump file. There are three kinds of XML: well-formed, valid, and crap. He was sifting through crap for some values. Basically he had this crazy text file with some fragments of XML within it and wanted the values in between the element tags: "<FancyPants>He wants this value</FancyPants>."

Something like this:

grep "<FancyPants>.*<.FancyPants>" test.txt | sed -e "s/^.*<FancyPants/<FancyPants/" | cut -f2 -d">"| cut -f1 -d"<" > fancyresults.txt

I'm old, but I'm not an expert in grep and sed so I'm sure there are ways he could have done it more tersely. There always is, right? With regular expressions, sometimes someone just types $@($*@)$(*@)(@*)@*(%@%# and Shakespeare pops out. You never know.
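One terser variant, offered as a sketch rather than as what my friend actually ran: sed can do the filtering and the extraction in a single step, so grep and cut drop out. The sample file contents and the function name `extract_fancy` are made up for illustration, since the real dump isn't shown. It assumes at most one FancyPants element per line.

```shell
# Hypothetical sample input standing in for the SQL dump (not from the original post).
sample=$(mktemp)
cat > "$sample" <<'EOF'
INSERT INTO t VALUES (1, 'junk <FancyPants>first value</FancyPants> more junk');
a line with no XML in it
INSERT INTO t VALUES (2, '<FancyPants>second value</FancyPants>');
EOF

# -n suppresses sed's default printing; the trailing p prints only the lines
# where the substitution succeeded, so non-matching lines are dropped.
# \(.*\) captures the text between the tags and \1 replaces the whole line with it.
extract_fancy() {
  sed -n 's/.*<FancyPants>\(.*\)<\/FancyPants>.*/\1/p' "$1"
}

extract_fancy "$sample"
```

Because `.*` is greedy, a line holding several FancyPants elements would not split cleanly with this pattern; that case would need a different approach.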

There are also a lot of different ways to do this in PowerShell, but since he used RegExes, who am I to disagree?

First, here's the one-line answer.

cat test.txt | foreach-object {if ($_ -match '<FancyPants>(?<x>.*)<.FancyPants>') {$matches.x}}

But I thought I'd also sort them, remove duplicates...

cat test.txt | foreach-object {if ($_ -match '<FancyPants>(?<x>.*)<.FancyPants>') {$matches.x}} | sort | get-unique

But foreach-object can be aliased as % and get-unique can be just "gu" so the final answer is:

cat test.txt | % {if ($_ -match '<FancyPants>(?<x>.*)<.FancyPants>') {$matches.x}} | sort | gu
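For comparison, here is a rough sketch of the same sort-and-dedupe step on the Unix side. The input values and the function names `dedupe_sorted` and `dedupe_short` are invented for illustration. Like get-unique, uniq only compares neighbouring items, which is why the sort comes first in both pipelines.

```shell
# Hypothetical extracted values, one per line (not from the original post).
values=$(printf '%s\n' 'beta' 'alpha' 'beta' 'alpha' 'gamma')

# uniq only collapses adjacent duplicates, so the input has to be sorted first.
dedupe_sorted() {
  sort | uniq
}

# sort -u sorts and removes duplicates in one command.
dedupe_short() {
  sort -u
}

printf '%s\n' "$values" | dedupe_sorted
```

Either function could be tacked onto the end of the grep/sed/cut pipeline in place of a plain redirect to fancyresults.txt.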

I think we can agree that they are both hard to read. I still love PowerShell.
