How to make the 'cut' command treat the same sequential delimiters as one?

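This article discusses how to use the cut command on text streams whose fields are separated by multiple spaces, and compares it with solutions based on awk, sed, and other tools. It also introduces cuts, an enhanced replacement for cut that auto-detects delimiters and handles more complex text-parsing tasks.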


Translated from: How to make the 'cut' command treat same sequental delimiters as one?

I'm trying to extract a certain (the fourth) field from a column-based, space-aligned text stream. I'm trying to use the cut command in the following manner:

cat text.txt | cut -d " " -f 4

Unfortunately, cut doesn't treat several spaces as one delimiter. I could have piped through awk

awk '{ print $4 }'

or sed

sed -E "s/[[:space:]]+/ /g"

to collapse the spaces, but I'd like to know if there is any way to deal with cut and several delimiters natively?


Reply #1

Reference: https://stackoom.com/question/HNqe/如何使-cut-命令将同一个sequental分隔符视为一个


Reply #2

Shortest/friendliest solution

After becoming frustrated with the many limitations of cut, I wrote my own replacement, which I called cuts, for "cut on steroids".

cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems.

One example, out of many, addressing this particular question:

$ cat text.txt
0   1        2 3
0 1          2   3 4

$ cuts 2 text.txt
2
2

cuts supports:

  • auto-detection of the most common field delimiters in files (plus the ability to override the defaults)
  • multi-char, mixed-char, and regex-matched delimiters
  • extracting columns from multiple files with mixed delimiters
  • offsets from the end of the line (using negative numbers) in addition to the start of the line
  • automatic side-by-side pasting of columns (no need to invoke paste separately)
  • support for field reordering
  • a config file where users can change their personal preferences
  • great emphasis on user friendliness and minimal required typing

and much more. None of this is provided by standard cut.

See also: https://stackoverflow.com/a/24543231/1296044

Source and documentation (free software): http://arielf.github.io/cuts/


Reply #3

As you comment in your question, awk is really the way to go. Using cut is possible together with tr -s to squeeze the spaces, as kev's answer shows.

Let me however go through all the possible combinations for future readers. Explanations are in the Tests section.

tr | cut

tr -s ' ' < file | cut -d' ' -f4

awk

awk '{print $4}' file

bash

while read -r _ _ _ myfield _
do
   echo "fourth field: $myfield"
done < file

sed

sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file

Tests

Given this file, let's test the commands:

$ cat a
this   is    line     1 more text
this      is line    2     more text
this    is line 3     more text
this is   line 4            more    text

tr | cut

$ cut -d' ' -f4 a
is
                        # it does not show what we want!


$ tr -s ' ' < a | cut -d' ' -f4
1
2                       # this makes it!
3
4
$

awk

$ awk '{print $4}' a
1
2
3
4
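
For context, awk's default field splitting is what makes this work: runs of spaces and tabs count as a single separator, and leading blanks are ignored. A minimal sketch with made-up input; the bracket-expression separator in the second command is only there to show the contrast with cut-like behaviour:

```shell
# Default splitting: mixed runs of spaces and tabs act as one separator,
# and leading blanks do not create an empty first field.
printf '  a\t\tb   c \t d\n' | awk '{ print $4 }'

# A bracket expression is treated as a regular expression matching one
# space, so every single space separates fields and empty fields reappear.
printf 'a  b\n' | awk -F'[ ]' '{ print "[" $2 "]" }'
```

The first command prints the fourth word; the second prints empty brackets, which is the same empty-field result that cut -d' ' gives.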

bash

This reads the fields sequentially. By using _ we indicate a throwaway ("junk") variable for the fields we want to ignore. This way, we store the 4th field of each line in $myfield (named a in the test below), no matter how many spaces separate the fields.

$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
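
The trailing _ in the read command matters: the last variable named receives the whole remainder of the line. A small sketch of the difference, assuming a POSIX-style shell; the helper names fourth_greedy and fourth_exact are hypothetical and exist only for this illustration:

```shell
# Without a trailing throwaway variable, "field" is the last name,
# so it receives field 4 plus everything after it.
fourth_greedy() { read -r _ _ _ field; printf '%s\n' "$field"; }

# With a trailing throwaway variable, "field" holds exactly field 4.
fourth_exact() { read -r _ _ _ field _; printf '%s\n' "$field"; }

line='this   is    line     1 more text'
printf '%s\n' "$line" | fourth_greedy
printf '%s\n' "$line" | fourth_exact
```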

sed

This matches three groups of non-space characters followed by spaces with ([^ ]*[ ]*){3}. Then it captures everything up to the next space as the 4th field, which is finally printed with \2.

$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
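
The repetition count in that expression is simply the field number minus one, so it can be parameterized. A rough sketch, not part of the original answer: nth_field_sed is a hypothetical helper, it uses -E instead of -r, and it assumes N is 2 or greater and that lines do not start with a space:

```shell
# nth_field_sed N: hypothetical helper, prints field N of each line on stdin.
# It skips N-1 "word plus spaces" groups, then keeps the next word.
nth_field_sed() {
    skip=$(( $1 - 1 ))
    sed -E "s/^([^ ]*[ ]*){$skip}([^ ]*).*/\2/"
}

printf 'this   is    line     1 more text\n' | nth_field_sed 4
printf 'this   is    line     1 more text\n' | nth_field_sed 2
```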

Reply #4

This Perl one-liner shows how closely Perl is related to awk:

perl -lane 'print $F[3]' text.txt

However, the @F autosplit array starts at index $F[0], while awk fields start with $1.


Reply #5

With the versions of cut I know of, no, this is not possible. cut is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd) and that have a fixed number of fields. Two separators in a row mean an empty field, and that goes for whitespace too.
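
The empty-field rule is easy to observe directly. A minimal sketch, assuming a POSIX shell and a standard cut; the sample strings are made up for illustration:

```shell
# Two consecutive spaces delimit an empty field, so field 2 is empty
# and the word "b" only shows up as field 3.
second=$(printf 'a  b\n' | cut -d' ' -f2)
third=$(printf 'a  b\n' | cut -d' ' -f3)
printf 'field 2: [%s]\n' "$second"
printf 'field 3: [%s]\n' "$third"

# With a non-whitespace separator the same rule is what you want:
# the empty second field keeps the third field in position.
printf 'user::1000\n' | cut -d: -f3
```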


Reply #6

Try:

tr -s ' ' <text.txt | cut -d ' ' -f4

From the tr man page:

-s, --squeeze-repeats   replace each input sequence of a repeated character
                        that is listed in SET1 with a single occurrence
                        of that character
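
One caveat that follows from the man page wording: tr -s squeezes repeated spaces but does not remove a leading one, so an indented line still yields an empty first field in cut. A sketch, assuming POSIX tr, sed, and cut; the sample line and the helper name fourth_field are hypothetical:

```shell
# fourth_field: hypothetical helper; reads stdin, strips leading spaces,
# squeezes runs of spaces, then lets cut pick field 4.
fourth_field() {
    sed 's/^ *//' | tr -s ' ' | cut -d' ' -f4
}

line='  this   is  line 1 more'

# Without stripping, the single leading space left by tr -s produces an
# empty field 1, so "field 4" is actually the third word.
printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f4

# With stripping, field 4 is the fourth word.
printf '%s\n' "$line" | fourth_field
```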