sed study notes

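I first ran into sed through the Linux kernel sources. It looked complicated, full of / / constructs, and I could not read any of it. Now I work on Android, and I wanted to trace its build process by inserting lines into the .mk files to show which file the build had reached. There are a lot of .mk files, and I needed one line at the beginning and one at the end of each, so doing it by hand was out of the question. That led me to sed. The book sed & awk seemed well regarded online, so I picked it up. The Chinese copies I found were blurry and unpleasant to read, so I read the English original. Fortunately it is not long, or I would not have made it through.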

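Personally I think sed is mainly used embedded in bash scripts. I have never run into a case that needed a dedicated sed script file, and I have never used it for typesetting work, so I cannot speak to that. Here is how I insert a line at the beginning and end of every .mk file: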

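#!/bin/bash
# Insert a $(info ...) line at the top and bottom of each file given as an argument.

while (($# > 0))
do
    # 1i\ inserts the following text before line 1
    sed -i '1i\
$(info this is the file '"$1"' begin)' "$1"
    # $a\ appends the following text after the last line
    sed -i '$a\
$(info this is the file '"$1"' end)' "$1"
    shift
done

exit 0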

 

Run it like this:

$ ./insert.sh `find . -name '*.mk'`

 

That modified at least 100 files. Editing them by hand one at a time would have been too tedious to finish.

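I won't cover the bash parts. As for sed: the -i option edits the source file in place, although internally sed still writes its output to a temporary copy and then replaces the source file with it. Single quotes stop bash from interpreting characters such as $. The $1 does need to be expanded by bash, so the single-quoted string is closed, "$1" is placed in double quotes, and the single quotes are reopened. 1 is an address meaning the first line; likewise $ means the last line. The i command inserts a line before the current line, and a appends a line after the current line.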
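To try the quoting and the two addresses without touching real files, here is a minimal sketch that runs the same two commands on text from standard input. The variable `name` and the sample lines are made up for illustration, and the behavior shown assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical file name, used only for illustration.
name='demo.mk'

# The single-quoted pieces reach sed untouched; "$name" is expanded by
# the shell because it sits outside the single quotes.
result=$(printf 'line one\nline two\n' | sed -e '1i\
$(info this is the file '"$name"' begin)' -e '$a\
$(info this is the file '"$name"' end)')

printf '%s\n' "$result"
```

Because nothing is written back with -i here, the sketch is safe to rerun while experimenting with the quoting.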

 

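In short, a sed command is an address followed by a command. The variation is that there may be two addresses: the form address1,address2 selects everything from the first address through the second. Some commands do not accept the two-address form. Several commands under the same address are wrapped in {} and written on separate lines, as follows:

address{
command1
command2
command3
}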

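There is another way to write it. Say you want to imitate grep and find the lines in a file that contain the string hello, printing both the line number and the content. You can write:

sed -n '/hello/{=;p}' file

Here the -n option suppresses the default output of every line, = prints the line number, and p prints the line content. My GNU sed accepts two commands on one line like this; a strictly POSIX sed may not. The example itself is not very useful, since grep is more direct and easier to read.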
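As a quick check of the grouped form, the sketch below feeds a few invented lines through the same command and then through grep -n for comparison. The extra ; before the closing brace is an assumption made for portability; GNU sed accepts the command with or without it.

```shell
#!/bin/sh
# Sample input invented for illustration.
input='foo
hello world
bar
say hello'

# = prints the line number, p prints the line; -n silences everything else.
matches=$(printf '%s\n' "$input" | sed -n '/hello/{=;p;}')
printf '%s\n' "$matches"

# grep -n reports the same lines in a "number:line" layout.
printf '%s\n' "$input" | grep -n 'hello'
```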

 

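An address is usually a regular expression placed between slashes. You can also use a number to name a specific line, as in my script above, where 1 means the first line and $ means the last. I expect numeric addresses to be less generally useful, though.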
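The address forms mentioned so far (a line number, $, a regular expression, and a two-address range) can be compared side by side. This is a small sketch on invented input, not taken from the book.

```shell
#!/bin/sh
# Sample input invented for illustration.
input='alpha
begin
one
two
end
omega'

first=$(printf '%s\n' "$input" | sed -n '1p')              # numeric address
last=$(printf '%s\n' "$input" | sed -n '$p')               # last line
range=$(printf '%s\n' "$input" | sed -n '/begin/,/end/p')  # regex range

printf '%s\n' "$first" "$last" "$range"
```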

 

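Generally, sed's processing flow is: read one line of input, match it against the address, and run the command if the address matches. There are commands that alter this flow, such as n.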
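To illustrate how n alters the usual one-line-per-cycle flow, the following sketch deletes the line that follows a heading. The input text and the /^Title$/ address are assumptions chosen for the example.

```shell
#!/bin/sh
# Sample text invented for illustration: a heading followed by an underline.
input='Title
=====
body text'

# On the matching line, n prints it and loads the next line into the
# pattern space; d then deletes that next line.
result=$(printf '%s\n' "$input" | sed '/^Title$/{
n
d
}')
printf '%s\n' "$result"
```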

 

Below are all the sed commands listed in sed & awk. I have changed the order:

=  [ address] =

Write to standard output the line number of addressed line.

p  [ address1[, address2]] p

Print the addressed line(s). Note that this can result in duplicate output unless default output is suppressed by using "#n" or the -n command-line option. Typically used before commands that change flow control (d, n, b) and might prevent the current line from being output.

These are simple printing commands: they print the line number and the line content without modifying anything. The -n option of sed suppresses the default output, and that is when you need p to print the content.

i  [ address1] i\

text

Insert text before each line matched by address. (See a for details on text.)

a  [ address] a\

text

Append text following each line matched by address. If text goes over more than one line, newlines must be "hidden" by preceding them with a backslash. The text will be terminated by the first newline that is not hidden in this way. The text is not available in the pattern space and subsequent commands cannot be applied to it. The results of this command are sent to standard output when the list of editing commands is finished, regardless of what happens to the current line in the pattern space.

c  [ address1[, address2]] c\

text

Replace (change) the lines selected by the address with text. When a range of lines is specified, all lines as a group are replaced by a single copy of text. The newline following each line of text must be escaped by a backslash, except the last line. The contents of the pattern space are, in effect, deleted and no subsequent editing commands can be applied to it (or to text).

These three commands change what is output for the addressed lines (the pattern space is the buffer in which each input line is processed), but the text they insert (i and a) or substitute (c) never enters the pattern space, so later commands cannot modify it.

l  [ address1[, address2]] l

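List the contents of the pattern space, showing nonprinting characters as ASCII codes. Long lines are wrapped.

I have never used this command, but you can use it to reveal invisible characters such as tabs or trailing spaces, provided you remember the ASCII codes.

d  [ address1[, address2]] d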

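Delete line(s) from pattern space. Thus, the line is not passed to standard output. A new line of input is read and editing resumes with first command in script.

n  [ address1[, address2]] n

Read next line of input into pattern space. Current line is sent to standard output. New line becomes current line and increments line counter. Control passes to command following n instead of resuming at the top of the script.

q  [ address] q

Quit when address is encountered. The addressed line is first written to output (if default output is not suppressed), along with any text appended to it by previous a or r commands.

Of these three commands, d and n are mostly used in combinations. d changes the flow: the rest of the script is skipped and a new cycle starts from the top. n does not restart the script (execution continues with the command after n), but the pattern space now holds the next line. q can sometimes be approximated with b, but note that b only skips the remaining commands for the current line, whereas q stops sed from reading any further input.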
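Two common one-liners show d and q on their own. This is a sketch on invented input: the first drops comment lines, the second stops after line 2 much like head -n 2.

```shell
#!/bin/sh
# Sample input invented for illustration.
input='# comment
one
two
three'

# d: lines starting with # never reach the output.
no_comments=$(printf '%s\n' "$input" | sed '/^#/d')

# q: line 2 is printed, then sed quits without reading the rest.
first_two=$(printf '%s\n' "$input" | sed '2q')

printf '%s\n' "$no_comments" "$first_two"
```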
s  [ address1[, address2]] s/ pattern/ replacement/[ flags]

Substitute replacement for pattern on each addressed line. If pattern addresses are used, the pattern // represents the last pattern address specified. The following flags can be specified:

n

Replace nth instance of /pattern/ on each addressed line. n is any number in the range 1 to 512, and the default is 1.

g

Replace all instances of /pattern/ on each addressed line, not just the first instance.

p

Print the line if a successful substitution is done. If several successful substitutions are done, multiple copies of the line will be printed.

w file

Write the line to file if a replacement was done. A maximum of 10 different files can be opened.

y  [ address1[, address2]] y/ abc/ xyz/

Transform each character by position in string abc to its equivalent in string xyz.

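Both of the above are substitution commands. y substitutes character by character: y/abc/xyz/ replaces every a with x, every b with y, and every c with z.

s is used far more often. The empty pattern mentioned above works like this: /hello/s//hai/g means that if the current line contains the string hello, every hello on that line is changed to hai.

The / that follows s is the delimiter, and other characters can be used instead, for example s!hello/me!hai/you! where the delimiter is !. This is meant for regular expressions that contain many / characters. You could escape each of them instead, as in s/hello\/me/hai\/you/, but that is harder to read.

The replacement can contain metacharacters. The main one is &, which stands for whatever the pattern matched: if the pattern is hel*o and it matched helllllllo, & is replaced by that string. \n, where n is a digit, stands for the text matched between \( and \) in the pattern: the first group is \1, the second is \2, and so on.

There is also this case:

s/[tab]/\
/2

Here [tab] means that you type a literal tab character. The command turns the second tab on the line into a newline. With GNU sed it is nicer to write s/\t/\n/2, because typing a tab directly on the command line did not work for me. I do not know whether it works inside a script file.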
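The points about s can each be tried on a one-line input. The sketch below uses invented strings; the comments state what each command is meant to demonstrate.

```shell
#!/bin/sh
# Empty pattern // reuses the last regex, here the address /hello/.
a=$(printf '%s\n' 'hello and hello' | sed '/hello/s//hai/g')

# A different delimiter avoids escaping the slashes.
b=$(printf '%s\n' 'hello/me' | sed 's!hello/me!hai/you!')

# & stands for the whole matched text.
c=$(printf '%s\n' 'helllllo' | sed 's/hel*o/[&]/')

# \1 and \2 refer to the first and second \( \) groups.
d=$(printf '%s\n' 'key=value' | sed 's/\(.*\)=\(.*\)/\2=\1/')

# A numeric flag replaces only that occurrence (here the second).
e=$(printf '%s\n' 'a-b-c' | sed 's/-/+/2')

printf '%s\n' "$a" "$b" "$c" "$d" "$e"
```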

r  [ address] r file

Read contents of file and append after the contents of the pattern space. Exactly one space must be put between r and the filename.

w   [ address1[, address2]] w file

Append contents of pattern space to file. This action occurs when the command is encountered rather than when the pattern space is output. Exactly one space must separate the w and the filename. A maximum of 10 different files can be opened in a script. This command will create the file if it does not exist; if the file exists, its contents will be overwritten each time the script is executed. Multiple write commands that direct output to the same file append to the end of the file.

These read and write files; there is not much more to say about them.
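For completeness, here is a minimal sketch of w and r using a temporary directory. The file names errors.txt and footer.txt are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# w: copy every line containing "error" into errors.txt.
printf 'ok 1\nerror 2\nok 3\nerror 4\n' | sed -n "/error/w $tmpdir/errors.txt"
errors=$(cat "$tmpdir/errors.txt")

# r: append the contents of footer.txt after the last line ($).
printf 'FOOTER\n' > "$tmpdir/footer.txt"
with_footer=$(printf 'a\nb\n' | sed "\$r $tmpdir/footer.txt")

printf '%s\n' "$errors" "$with_footer"
rm -rf "$tmpdir"
```

Double quotes are used around the sed scripts so that the shell expands $tmpdir; the address $ is escaped as \$ for the same reason.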
:  : label

Label a line in the script for the transfer of control by b or t. label may contain up to seven characters. (The POSIX standard says that an implementation can allow longer labels if it wishes to. GNU sed allows labels to be of any length.)

b  [ address1[, address2]] b [ label]

Transfer control unconditionally (branch) to :label elsewhere in script. That is, the command following the label is the next command applied to the current line. If no label is specified, control falls through to the end of the script, so no more commands are applied to the current line.

t  [ address1[, address2]] t [ label]

Test if successful substitutions have been made on addressed lines, and if so, branch to line marked by :label. (See b and :.) If label is not specified, control falls through to bottom of script.

Clearly, :label exists to support the b and t commands, which change the execution flow. t tests whether a substitution succeeded, so it usually follows an s command: if the preceding substitution succeeded, the jump is taken.
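A typical use of a label with t is a loop that repeats a substitution until it stops matching. The sketch below, on invented input, turns each leading space into a dot one at a time; the label name a is arbitrary.

```shell
#!/bin/sh
# Each pass converts one more leading space into a dot; t jumps back to
# :a as long as the substitution succeeded.
result=$(printf '   abc\nx y\n' | sed -e ':a' -e 's/^\(\.*\) /\1./' -e 'ta')
printf '%s\n' "$result"
```

The commands are passed as separate -e fragments because some sed implementations read everything after : up to the end of the line as the label name.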
N  [ address1[, address2]] N

Append next input line to contents of pattern space; the new line is separated from the previous contents of the pattern space by a newline. (This command is designed to allow pattern matches across two lines. Using \n to match the embedded newline, you can match patterns across multiple lines.)

D  [ address1[, address2]] D

Delete first part (up to embedded newline) of multiline pattern space created by N command and resume editing with first command in script. If this command empties the pattern space, then a new line of input is read, as if the d command had been executed.

P  [ address1[, address2]] P

Print first part (up to embedded newline) of multiline pattern space created by N command. Same as p if N has not been applied to a line.

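Used together, these commands can produce some very nice results. For example, this collapses a run of blank lines into a single blank line:

/^$/{
        N
        /^\n$/D
}

And this script:

/UNIX$/{
        N
        /\nSystem/{
        s// Operating &/
        P
        D
        }
}

turns text of the form UNIX\nSystem into UNIX Operating\nSystem.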
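The blank-line script can be tried directly on a short input. This is a sketch with invented text: three consecutive empty lines between a and b should come out as one.

```shell
#!/bin/sh
# N joins a blank line with the line after it; if both are blank
# (the pattern space is just "\n"), D drops the first and starts over.
result=$(printf 'a\n\n\n\nb\n' | sed '/^$/{
N
/^\n$/D
}')
printf '%s\n' "$result"
```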

g  [ address1[, address2]] g

Copy (get) contents of hold space (see h or H command) into the pattern space, wiping out previous contents.

G  [ address1[, address2]] G

Append newline followed by contents of hold space (see h or H command) to contents of the pattern space. If hold space is empty, a newline is still appended to the pattern space.

h  [ address1[, address2]] h

Copy pattern space into hold space, a special temporary buffer. Previous contents of hold space are wiped out.

H  [ address1[, address2]] H

Append newline and contents of pattern space to contents of the hold space. Even if hold space is empty, this command still appends the newline first.

x  [ address1[, address2]] x

Exchange contents of the pattern space with the contents of the hold space.

These commands all work with the hold space. The pattern space is where the current content is processed, while the hold space is more like a storeroom. The hold space starts out empty. Using these commands takes some ingenuity: used well they are very powerful, used poorly they achieve little.
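A well-known small exercise with the hold space is printing a file in reverse line order. The sketch below is one common way to write it, shown on invented input; it is an illustration rather than something taken from the text above.

```shell
#!/bin/sh
# 1!G : on every line but the first, append the hold space (the lines
#       seen so far, already reversed) after the current line.
# h   : save the result back into the hold space.
# $p  : on the last line, print the accumulated text.
reversed=$(printf 'a\nb\nc\n' | sed -n -e '1!G' -e 'h' -e '$p')
printf '%s\n' "$reversed"
```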

Here is a piece of code to admire:

#! /bin/sh
# phrase -- search for words across lines
# $1 = search string; remaining args = filenames

search=$1
shift
for file
do
sed '
/'"$search"'/b
N
h
s/.*\n//
/'"$search"'/b
g
s/ *\n/ /
/'"$search"'/{
g
b
}
g
D' "$file"
done

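This is a shell script, so I will not explain it. Try to guess what it does. There is still room for improvement, for example handling a phrase that spans three lines or more.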

 

OK, that is the whole summary; this is about all there is to sed. When I have time I will write up awk as well.

 

2011-01-08 16:47:10

 
