sed - A Non-interactive Text Editor

bingzhuan

Lee E. McMahon

Bell Laboratories
Murray Hill, New Jersey 07974

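Chinese translation: 寒蝉退士

Translator's statement: the translator gives no warranty of any kind for the translation, holds no rights in it, and accepts no liability or obligation for it.
Original: http://cm.bell-labs.com/7thEdMan/vol2/sed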

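Abstract

Sed is a non-interactive context editor that runs on the UNIX® operating system. Sed is designed to be especially useful in three cases:

1) To edit files too large for comfortable interactive editing.
2) To edit any size file when the sequence of editing commands is too complicated to be comfortably typed in interactive mode.
3) To perform multiple ‘global’ editing functions efficiently in one pass through the input.

This memorandum constitutes a manual for users of sed.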

August 15, 1978

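Introduction

Sed is a non-interactive context editor designed to be especially useful in three cases:

1) To edit files too large for comfortable interactive editing.
2) To edit any size file when the sequence of editing commands is too complicated to be comfortably typed in interactive mode.
3) To perform multiple ‘global’ editing functions efficiently in one pass through the input.

Since only a few lines of the input reside in core at one time, and no temporary files are used, the effective size of file that can be edited is limited only by the requirement that the input and output fit simultaneously into available secondary storage.

Complicated editing scripts can be created separately and given to sed as a command file. For complex edits, this saves considerable typing and its attendant errors. Sed running from a command file is much more efficient than any interactive editor known to the author, even if that editor can be driven by a pre-written script.

The principal losses of function compared to an interactive editor are the lack of relative addressing (because of the line-at-a-time operation) and the lack of immediate verification that a command has done what was intended.

Sed is a lineal descendant of the UNIX editor ed. Because of the differences between interactive and non-interactive operation, considerable changes have been made between ed and sed; even confirmed users of ed will frequently be surprised (and probably chagrined) if they rashly use sed without reading Sections 2 and 3 of this document. The most striking family resemblance between the two editors is in the class of patterns (‘regular expressions’) they recognize; the code for matching patterns is copied almost verbatim from the code for ed, and the description of regular expressions in Section 2 is copied almost verbatim from the UNIX Programmer’s Manual[1]. (Both code and description were written by Dennis M. Ritchie.)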

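1. Overall Operation

Sed by default copies the standard input to the standard output, perhaps performing one or more editing commands on each line before writing it to the output. This behavior may be modified by flags on the command line; see Section 1.1 below.

The general format of an editing command is:

[address1,address2][function][arguments]

One or both addresses may be omitted; the format of addresses is given in Section 2. Any number of blanks or tabs may separate the addresses from the function. The function must be present; the available commands are discussed in Section 3. The arguments may be required or optional, according to which function is given; they are discussed in Section 3 under each individual function.

Tab characters and spaces at the beginning of lines are ignored.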

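1.1. Command-line Flags

Three flags are recognized on the command line:

-n: tells sed not to copy all lines, but only those specified by p functions or p flags after s functions (see Section 3.3).
-e: tells sed to take the next argument as an editing command.
-f: tells sed to take the next argument as a file name; the file should contain editing commands, one to a line.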
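The three flags can be tried from a shell. The following is a minimal sketch, assuming a POSIX shell with sed and mktemp available; the variable names (poem, out_n, out_e, out_f, script) are illustrative and not part of the manual.

```shell
# Sample text from Section 1.4 (held in a shell variable for convenience).
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# -n suppresses the default copy; only the p function writes output.
out_n=$(printf '%s\n' "$poem" | sed -n -e '2p')

# Several -e arguments are taken as commands, applied in the order given.
out_e=$(printf '%s\n' "$poem" | sed -n -e 's/Khan/KHAN/' -e '1p')

# -f reads the editing commands from a file, one to a line.
script=$(mktemp)
printf '%s\n' '1d' '3,$d' > "$script"
out_f=$(printf '%s\n' "$poem" | sed -f "$script")
rm -f "$script"

printf '%s\n' "$out_n" "$out_e" "$out_f"
```

Keeping longer command lists in a file and passing it with -f avoids shell quoting problems that appear when multi-line commands are given with -e.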

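1.2. Order of Application of Editing Commands

Before any editing is done (in fact, before any input file is even opened), all the editing commands are compiled into a form which will be moderately efficient during the execution phase (when the commands are actually applied to lines of the input file). The commands are compiled in the order in which they are encountered; this is generally the order in which they will be attempted at execution time. The commands are applied one at a time; the input to each command is the output of all preceding commands.

The default linear order of application of editing commands can be changed by the flow-of-control commands, t and b (see Section 3). Even when the order of application is changed by these commands, it is still true that the input line to any command is the output of any previously applied command.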
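A small sketch of the rule that each command receives the output of the commands before it. The one-word input and the variable names are made up for illustration.

```shell
# The second command sees "dog", the result of the first, and rewrites it.
first=$(printf '%s\n' 'cat' | sed -e 's/cat/dog/' -e 's/dog/bird/')

# With the order reversed, 's/dog/bird/' runs first and finds nothing to replace.
second=$(printf '%s\n' 'cat' | sed -e 's/dog/bird/' -e 's/cat/dog/')

printf '%s\n' "$first" "$second"
```

The same two commands give different results depending only on their order in the list.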

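1.3. Pattern Space

The range of pattern matches is called the pattern space. Ordinarily, the pattern space is one line of the input text, but more than one line can be read into the pattern space by using the N command (Section 3.4).

1.4. Examples

Examples are scattered throughout the text. Except where otherwise noted, the examples all assume the following input text: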

In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.

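(In no case is the output of the sed commands to be considered an improvement on Coleridge.)

Example:

The command

2q

will quit after copying the first two lines of the input. The output will be: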

In Xanadu did Kubla Khan
A stately pleasure dome decree:

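2. Addresses: Selecting Lines for Editing

Lines in the input file(s) to which editing commands are to be applied can be selected by addresses. Addresses may be either line numbers or context addresses.

The application of a group of commands can be controlled by one address (or address pair) by grouping the commands with curly braces (‘{ }’) (Section 3.6).

2.1. Line-number Addresses

A line number is a decimal integer. As each line is read from the input, a line-number counter is incremented; a line-number address matches (selects) the input line which causes the internal counter to equal the address line number. The counter runs cumulatively through multiple input files; it is not reset when a new input file is opened.

As a special case, the character $ matches the last line of the last input file.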
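The cumulative line counter can be observed with two small files. This is a sketch only; the temporary directory and the file names a.txt and b.txt are hypothetical.

```shell
dir=$(mktemp -d)
printf '%s\n' 'one' 'two'    > "$dir/a.txt"
printf '%s\n' 'three' 'four' > "$dir/b.txt"

# Line 3 is the first line of the second file: the counter is not reset.
third=$(sed -n '3p' "$dir/a.txt" "$dir/b.txt")

# $ selects only the last line of the last input file.
last=$(sed -n '$p' "$dir/a.txt" "$dir/b.txt")

rm -rf "$dir"
printf '%s\n' "$third" "$last"
```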

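2.2. Context Addresses

A context address is a pattern (‘regular expression’) enclosed in slashes (‘/’). The regular expressions recognized by sed are constructed as follows:

1) An ordinary character (not one of those discussed below) is a regular expression, and matches that character.
2) A circumflex ‘^’ at the beginning of a regular expression matches the null character at the beginning of a line.
3) A dollar sign ‘$’ at the end of a regular expression matches the null character at the end of a line.
4) The characters ‘\n’ match an embedded newline character, but not the newline at the end of the pattern space.
5) A period ‘.’ matches any character except the terminal newline of the pattern space.
6) A regular expression followed by an asterisk ‘*’ matches any number (including 0) of adjacent occurrences of the regular expression it follows.
7) A string of characters in square brackets ‘[ ]’ matches any character in the string, and no others. If, however, the first character of the string is a circumflex ‘^’, the regular expression matches any character except the characters in the string and the terminal newline of the pattern space.
8) A concatenation of regular expressions is a regular expression which matches the concatenation of strings matched by the components of the regular expression.
9) A regular expression between the sequences ‘\(’ and ‘\)’ is identical in effect to the unadorned regular expression, but has side effects which are described under the s command below and in specification 10) immediately below.
10) The expression ‘\d’ means the same string of characters matched by an expression enclosed in ‘\(’ and ‘\)’ earlier in the same pattern. Here d is a single digit; the string specified is that beginning with the dth occurrence of ‘\(’ counting from the left. For example, the expression ‘^\(.*\)\1’ matches a line beginning with two repeated occurrences of the same string.
11) The null regular expression standing alone (e.g., ‘//’) is equivalent to the last regular expression compiled.

To use one of the special characters (^ $ . * [ ] \ /) as a literal (to match an occurrence of itself in the input), precede the special character by a backslash ‘\’.

For a context address to ‘match’ the input requires that the whole pattern within the address match some portion of the pattern space.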

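2.3. Number of Addresses

The commands in the next section can have 0, 1, or 2 addresses. Under each command the maximum number of allowed addresses is given. For a command to have more addresses than the maximum allowed is considered an error.

If a command has no addresses, it is applied to every line in the input.

If a command has one address, it is applied to all lines which match that address.

If a command has two addresses, it is applied to the first line which matches the first address, and to all subsequent lines until (and including) the first subsequent line which matches the second address. Then an attempt is made on subsequent lines to again match the first address, and the process is repeated.

Two addresses are separated by a comma.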
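The two-address rule, including the way a range restarts after it has closed, can be sketched as follows. The six-line a/b/c input is invented for the second case; variable names are illustrative.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# From the first line matching /stately/ through the next line matching /man/.
range=$(printf '%s\n' "$poem" | sed -n '/stately/,/man/p')

# After a range closes, the first address is tried again on later lines.
repeat=$(printf '%s\n' a b c a b c | sed -n '/a/,/b/p')

printf '%s\n' "$range" "$repeat"
```

In the second command the range opens and closes twice, so both a/b pairs are printed and both c lines are skipped.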

Examples:

/an/           matches lines 1, 3, 4 in our sample text
/an.*an/       matches line 1
/^an/          matches no lines
/./            matches all lines
/\./           matches line 5
/r*an/         matches lines 1, 3, 4 (number = zero!)
/\(an\).*\1/   matches line 1
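The table above can be checked mechanically with the = function (Section 3.7), which prints the number of each selected line. This is a sketch under the assumption of a POSIX shell with sed and paste; the helper name matches is hypothetical.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# matches ADDRESS: print the numbers of the lines ADDRESS selects, on one line.
matches() {
    printf '%s\n' "$poem" | sed -n "$1=" | paste -sd' ' -
}

for addr in '/an/' '/an.*an/' '/^an/' '/./' '/\./' '/r*an/' '/\(an\).*\1/'; do
    printf '%-14s %s\n' "$addr" "$(matches "$addr")"
done
```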

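3. Functions

All functions are named by a single character. In the following summary, the maximum number of allowable addresses is given enclosed in parentheses, then the single-character function name, possible arguments enclosed in angle brackets (< >), an expanded English translation of the single-character name, and finally a description of what each function does. The angle brackets around the arguments are not part of the argument, and should not be typed in actual editing commands.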

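3.1. Whole-line Oriented Functions

(2)d -- delete lines

The d function deletes from the file (does not write to the output) all those lines matched by its address(es).

It also has the side effect that no further commands are attempted on the corpse of a deleted line; as soon as the d function is executed, a new line is read from the input, and the list of editing commands is restarted from the beginning on the new line.

(2)n -- next line

The n function reads the next line from the input, replacing the current line. The current line is written to the output if it should be. The list of editing commands is continued following the n command.

(1)a\
<text> -- append lines

The a function causes the argument <text> to be written to the output after the line matched by its address. The a command is inherently multi-line; a must appear at the end of a line, and <text> may contain any number of lines. To preserve the one-command-to-a-line fiction, the interior newlines must be hidden by a backslash character (‘\’) immediately preceding the newline. The <text> argument is terminated by the first unhidden newline (the first one not immediately preceded by a backslash).

Once an a function is successfully executed, <text> will be written to the output regardless of what later commands do to the line which triggered it. The triggering line may be deleted entirely; <text> will still be written to the output.

The <text> is not scanned for address matches, and no editing commands are attempted on it. It does not cause any change in the line-number counter.

(1)i\
<text> -- insert lines

The i function behaves identically to the a function, except that <text> is written to the output before the matched line. All other comments about the a function apply to the i function as well.

(2)c\
<text> -- change lines

The c function deletes the lines selected by its address(es), and replaces them with the lines in <text>. Like a and i, c must be followed by a newline hidden by a backslash, and interior newlines in <text> must be hidden by backslashes.

The c command may have two addresses, and therefore select a range of lines. If it does, all the lines in the range are deleted, but only one copy of <text> is written to the output, not one copy per line deleted. As with a and i, <text> is not scanned for address matches, and no editing commands are attempted on it. It does not change the line-number counter.

After a line has been deleted by a c function, no further commands are attempted on the corpse.

If text is appended after a line by a or r functions, and the line is subsequently changed, the text inserted by the c function will be placed before the text of the a or r functions. (The r function is described in Section 3.3.)

Note: Within the text put in the output by these functions, leading blanks and tabs will disappear, as always in sed commands. To get leading blanks and tabs into the output, precede the first desired blank or tab by a backslash; the backslash will not appear in the output.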

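Example:

The list of editing commands:

n
a\
XXXX
d

applied to our standard input, produces:

In Xanadu did Kubla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.

In this particular case, the same effect would be produced by either of the two following command lists:

n         n
i\        c\
XXXX      XXXX
d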
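The three command lists can be run side by side to confirm that they agree on this input. This is a sketch assuming a POSIX shell and a sed that accepts multi-line -e arguments (true of the common GNU and BSD implementations); the helper run and the variable names are illustrative.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# run SCRIPT: apply a multi-line command list to the sample text.
run() { printf '%s\n' "$poem" | sed -e "$1"; }

# Inside single quotes the backslash-newline reaches sed unchanged.
with_a=$(run 'n
a\
XXXX
d')

with_i=$(run 'n
i\
XXXX
d')

with_c=$(run 'n
c\
XXXX')

printf '%s\n' "$with_a"
```

On the last cycle n finds no further input, so line 5 is simply written out and none of the remaining commands run.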

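3.2. Substitute Function

One very important function changes parts of lines selected by a context search within the line.

(2)s<pattern><replacement><flags> -- substitute

The s function replaces part of a line (selected by <pattern>) with <replacement>. It can best be read:

Substitute for <pattern>, <replacement>

The <pattern> argument contains a pattern, exactly like the patterns in addresses (see Section 2.2). The only difference between <pattern> and a context address is that the context address must be delimited by slash (‘/’) characters; <pattern> may be delimited by any character other than space or newline.

By default, only the first string matched by <pattern> is replaced, but see the g flag below.

The <replacement> argument begins immediately after the second delimiting character of <pattern>, and must be followed immediately by another instance of the delimiting character. (Thus there are exactly three instances of the delimiting character.) The <replacement> is not a pattern, and the characters which are special in patterns do not have special meaning in <replacement>. Instead, other characters are special:

& is replaced by the string matched by <pattern>.
\d (where d is a single digit) is replaced by the dth substring matched by parts of <pattern> enclosed in ‘\(’ and ‘\)’. If nested substrings occur in <pattern>, the dth is determined by counting opening delimiters (‘\(’). As in patterns, special characters may be made literal by preceding them with a backslash (‘\’).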
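The special characters of <replacement> are easiest to see on a single line. A short sketch, with illustrative variable names:

```shell
line='In Xanadu did Kubla Khan'

# & stands for the whole string matched by the pattern.
amp=$(printf '%s\n' "$line" | sed 's/Xanadu/[&]/')

# \1 and \2 stand for the strings matched by the first and second \( \) groups.
swap=$(printf '%s\n' "$line" | sed 's/\(Kubla\) \(Khan\)/\2 \1/')

# A backslash makes & literal in the replacement.
literal=$(printf '%s\n' "$line" | sed 's/did/\&/')

# Any character other than space or newline may delimit the pattern.
comma=$(printf '%s\n' "$line" | sed 's,Xanadu,XANADU,')

printf '%s\n' "$amp" "$swap" "$literal" "$comma"
```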

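The <flags> argument may contain the following flags:

g -- substitute <replacement> for all (non-overlapping) instances of <pattern> in the line. After a successful substitution, the scan for the next instance of <pattern> begins just after the end of the inserted characters; characters put into the line from <replacement> are not rescanned.
p -- print the line if a successful replacement was done. The p flag causes the line to be written to the output if and only if a substitution was actually made by the s function. Notice that if several s functions, each followed by a p flag, successfully substitute in the same input line, multiple copies of the line will be written to the output: one for each successful substitution.
w <filename> -- write the line to a file if a successful replacement was done. The w flag causes lines which are actually substituted by the s function to be written to a file named by <filename>. If <filename> exists before sed is run, it is overwritten; if not, it is created.

A single space must separate w and <filename>.

The possibilities of multiple, somewhat different copies of one input line being written are the same as for p.

A maximum of 10 different file names may be mentioned after w flags and w functions (see below), combined.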

Examples:

The following command, applied to our standard input,

s/to/by/w changes

produces, on the standard output:

In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless by man
Down by a sunless sea.

and, on the file ‘changes’:

Through caverns measureless by man
Down by a sunless sea.

If the nocopy option is in effect, the command:

s/[.,;?:]/*P&*/gp

produces:

A stately pleasure dome decree*P:*
Where Alph*P,* the sacred river*P,* ran
Down to a sunless sea*P.*

Finally, to illustrate the effect of the g flag, the command:

/X/s/an/AN/p

produces (assuming nocopy mode):

In XANadu did Kubla Khan

and the command:

/X/s/an/AN/gp

produces:

In XANadu did Kubla KhAN
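These examples can be reproduced from a shell. This sketch assumes a POSIX shell with sed and mktemp; a temporary file stands in for the file ‘changes’, and the nocopy option is given as -n.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# w flag: lines that were actually substituted are also written to the file.
changes=$(mktemp)
printf '%s\n' "$poem" | sed "s/to/by/w $changes" > /dev/null
written=$(cat "$changes")
rm -f "$changes"

# p flag under -n: only lines on which a substitution succeeded are printed.
first=$(printf '%s\n' "$poem" | sed -n '/X/s/an/AN/p')

# g flag: every non-overlapping match on the line is replaced.
every=$(printf '%s\n' "$poem" | sed -n '/X/s/an/AN/gp')

printf '%s\n' "$written" "$first" "$every"
```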

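3.3. Input-output Functions

(2)p -- print

The print function writes the addressed lines to the standard output file. They are written at the time the p function is encountered, regardless of what succeeding editing commands may do to the lines.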
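A brief sketch of how p interacts with the default copy and with later commands; the one- and two-line inputs are invented for the purpose.

```shell
# Without -n the addressed line appears twice: once from p, once from the default copy.
dup=$(printf '%s\n' one two | sed '2p')

# With -n only the copy written by p appears.
only=$(printf '%s\n' one two | sed -n '2p')

# p writes the line as it stands when p is reached; the later s function
# does not alter the copy already written.
early=$(printf '%s\n' one | sed -n -e 'p' -e 's/one/ONE/' -e 'p')

printf '%s\n' "$dup" "$only" "$early"
```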

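(2)w <filename> -- write on <filename>

The write function writes the addressed lines to the file named by <filename>. If the file previously existed, it is overwritten; if not, it is created. The lines are written exactly as they exist when the write function is encountered for each line, regardless of what subsequent editing commands may do to them. Exactly one space must separate the w and <filename>. A maximum of ten different files may be mentioned in write functions and w flags after s functions, combined.

(1)r <filename> -- read the contents of a file

The read function reads the contents of <filename>, and appends them after the line matched by the address. The file is read and appended regardless of what subsequent editing commands do to the line which matched its address. If r and a functions are executed on the same line, the text from the a functions and the r functions is written to the output in the order that the functions are executed. Exactly one space must separate the r and <filename>. If a file mentioned by an r function cannot be opened, it is considered a null file, not an error, and no diagnostic is given.

Note: Since there is a limit to the number of files that can be opened simultaneously, care should be taken that no more than ten (different) files be mentioned in w functions or flags; that number is reduced by one if any r functions are present. (Only one read file is open at one time.)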

Example:

Assume that the file ‘note1’ has the following contents:

	Note: Kubla Khan (more properly Kublai Khan; 1216-1294) was the grandson and most eminent successor of Genghiz (Chingiz) Khan, and founder of the Mongol dynasty in China.

Then the following command:

     /Kubla/r note1

produces:

In Xanadu did Kubla Khan
	Note: Kubla Khan (more properly Kublai Khan; 1216-1294) was the grandson and most eminent successor of Genghiz (Chingiz) Khan, and founder of the Mongol dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
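The r function can be tried with a temporary file in place of ‘note1’. This is a sketch: the shortened note text and the path /nonexistent/note1 are made up for illustration, and the second command relies on the rule that an unreadable file is treated as empty.

```shell
note=$(mktemp)
printf '%s\n' 'Note: Kubla Khan was the grandson of Genghiz Khan.' > "$note"

# The file contents are appended after each line matching /Kubla/.
out=$(printf '%s\n' 'In Xanadu did Kubla Khan' 'A stately pleasure dome decree:' |
      sed "/Kubla/r $note")
rm -f "$note"

# A file that cannot be opened is treated as empty; the input passes through.
missing=$(printf '%s\n' 'In Xanadu did Kubla Khan' | sed '/Kubla/r /nonexistent/note1')

printf '%s\n' "$out" "$missing"
```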

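3.4. Multiple Input-line Functions

Three functions, all spelled with capital letters, deal specially with pattern spaces containing embedded newlines; they are intended principally to provide pattern matches across lines in the input.

(2)N -- Next line

The next input line is appended to the current line in the pattern space; the two input lines are separated by an embedded newline. Pattern matches may extend across the embedded newline(s).

(2)D -- Delete first part of the pattern space

Delete up to and including the first newline character in the current pattern space. If the pattern space becomes empty (the only newline was the terminal newline), read another line from the input. In any case, begin the list of editing commands again from its beginning.

(2)P -- Print first part of the pattern space

Print up to and including the first newline in the pattern space.

The P and D functions are equivalent to their lower-case counterparts if there are no embedded newlines in the pattern space.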
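Two sketches of the capital-letter functions. The first joins a phrase split across two lines; the second is a well-known idiom (not taken from this manual) that drops consecutive duplicate lines by sliding a two-line window over the input with N, P and D. The inputs are invented.

```shell
# N appends the next line, so the s function can match across the embedded newline.
joined=$(printf '%s\n' 'A stately pleasure dome' 'decree:' | sed 'N
s/dome\ndecree/dome decree/')

# $!N : on every line but the last, append the next line.
# P   : print the first line of the window unless both lines are identical.
# D   : drop the first line and restart the command list on the remainder.
dedup=$(printf '%s\n' a a b | sed '$!N
/^\(.*\)\n\1$/!P
D')

printf '%s\n' "$joined" "$dedup"
```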

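3.5. Hold and Get Functions

Five functions save and retrieve part of the input for possible later use.

(2)h -- hold pattern space

The h function copies the contents of the pattern space into a hold area (destroying the previous contents of the hold area).

(2)H -- Hold pattern space

The H function appends the contents of the pattern space to the contents of the hold area; the former and new contents are separated by a newline.

(2)g -- get contents of hold area

The g function copies the contents of the hold area into the pattern space (destroying the previous contents of the pattern space).

(2)G -- Get contents of hold area

The G function appends the contents of the hold area to the contents of the pattern space; the former and new contents are separated by a newline.

(2)x -- exchange

The exchange command interchanges the contents of the pattern space and the hold area.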

Example:

The commands

1h
1s/ did.*//
1x
G
s/\n/ :/

applied to our standard example, produce:

In Xanadu did Kubla Khan :In Xanadu
A stately pleasure dome decree: :In Xanadu
Where Alph, the sacred river, ran :In Xanadu
Through caverns measureless to man :In Xanadu
Down to a sunless sea. :In Xanadu
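The hold-area example can be run as written. A sketch assuming a POSIX shell; the variable names are illustrative.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# Line 1: save the line, cut it down to "In Xanadu", then swap so that the
# short form sits in the hold area and the full line is back in the pattern space.
# Every line: append the hold area (G) and turn the embedded newline into " :".
script='1h
1s/ did.*//
1x
G
s/\n/ :/'

out=$(printf '%s\n' "$poem" | sed -e "$script")
printf '%s\n' "$out"
```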

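3.6. Flow-of-Control Functions

These functions do no editing on the input lines, but control the application of functions to the lines selected by the address part.

(2)! -- Don’t

The Don’t command causes the next command (written on the same line) to be applied to all and only those input lines not selected by the address part.

(2){ -- Grouping

The grouping command ‘{’ causes the next set of commands to be applied (or not applied) as a block to the input lines selected by the addresses of the grouping command. The first of the commands under control of the grouping may appear on the same line as the ‘{’ or on the next line.

The group of commands is terminated by a matching ‘}’ standing on a line by itself.

Groups can be nested.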
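A short sketch of negation and grouping on the sample text; variable names are illustrative.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# ! applies p to exactly the lines NOT selected by the address 2,4.
outside=$(printf '%s\n' "$poem" | sed -n '2,4!p')

# { } applies both commands, as a block, to the lines matching /Alph/.
grouped=$(printf '%s\n' "$poem" | sed -n '/Alph/{
s/sacred/SACRED/
p
}')

printf '%s\n' "$outside" "$grouped"
```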

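(0):<label> -- place a label

The label function marks a place in the list of editing commands which may be referred to by b and t functions. The <label> may be any sequence of eight or fewer characters; if two different colon functions have identical labels, a compile-time diagnostic will be generated, and no execution attempted.

(2)b<label> -- branch to label

The branch function causes the sequence of editing commands being applied to the current input line to be restarted immediately after the place where a colon function with the same <label> was encountered. If no colon function with the same label can be found after all the editing commands have been compiled, a compile-time diagnostic is produced, and no execution is attempted.

A b function with no <label> is taken to be a branch to the end of the list of editing commands; whatever should be done with the current input line is done, and another input line is read; the list of editing commands is restarted from the beginning on the new line.

(2)t<label> -- test substitutions

The t function tests whether any successful substitutions have been made on the current input line; if so, it branches to <label>; if not, it does nothing. The flag which indicates that a successful substitution has been executed is reset by:

1) reading a new input line, or
2) executing a t function.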
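Labels, b and t can be sketched with two one-liners. The label name again and the inputs are invented; each command is passed as its own -e argument so that the label ends at the end of the argument.

```shell
# Loop: replace one pair of spaces at a time; t branches back to the label
# as long as the last s function succeeded.
squeezed=$(printf '%s\n' 'a    b  c' | sed -e ':again' -e 's/  / /' -e 't again')

# b with no label branches to the end of the command list, so lines
# matching /Xanadu/ skip the s function that follows.
skipped=$(printf '%s\n' 'In Xanadu did Kubla Khan' 'A stately pleasure dome decree:' |
          sed -e '/Xanadu/b' -e 's/^/> /')

printf '%s\n' "$squeezed" "$skipped"
```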

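3.7. Miscellaneous Functions

(1)= -- equals

The = function writes to the standard output the line number of the line matched by its address.

(1)q -- quit

The q function causes the current line to be written to the output (if it should be), any appended or read text to be written, and execution to be terminated.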
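Both functions are handy from the command line. A minimal sketch on the sample text, with illustrative variable names:

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# = on the last line prints the total number of input lines.
count=$(printf '%s\n' "$poem" | sed -n '$=')

# = with a context address prints the number of each matching line.
where=$(printf '%s\n' "$poem" | sed -n '/Alph/=')

# q after line 2 copies the first two lines and stops reading.
head2=$(printf '%s\n' "$poem" | sed '2q')

printf '%s\n' "$count" "$where" "$head2"
```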

References

[1] Ken Thompson and Dennis M. Ritchie, The UNIX Programmer’s Manual. Bell Laboratories, 1978.

Posted 2006-06-27; revised 2006-07-04 20:57.

Comment from lgfang (2006-06-28 11:20:13):

The original is a man page and cannot be read directly, so I formatted it with emacs:

SED -- A Non-interactive Text Editor

Lee E. McMahon

Context search Editing

Sed is a non-interactive context editor that runs on the UNIX

operating  system.  Sed is  designed to  be especially  useful in

three cases:



1) To edit files too large for comfortable interactive editing;

2) To edit any size file when the sequence of editing commands is

     too complicated to be comfortably typed in interactive mode.

3) To perform multiple  `global' editing functions efficiently in

     one pass through the input.



     This memorandum constitutes a manual for users of sed.



Introduction



     Sed  is  a non-interactive  context  editor  designed to  be

     especially useful in three cases:



1) To edit files too large for comfortable interactive editing;

2) To edit any size file when the sequence of editing commands is

     too complicated to be comfortably typed in interactive mode;

3) To perform multiple  `global' editing functions efficiently in

     one pass through the input.



     Since only  a few lines of  the input reside in  core at one

     time, and no temporary files are used, the effective size of

     file that can  be edited is limited only  by the requirement

     that the input and  output fit simultaneously into available

     secondary storage.



     Complicated  editing scripts can  be created  separately and

     given to  sed as  a command file.   For complex  edits, this

     saves  considerable typing, and  its attendant  errors.  Sed

     running from a command file  is much more efficient than any

     interactive editor known to  the author, even if that editor

     can be driven by a pre-written script.



     The principal  loss of functions compared  to an interactive

     editor  are  lack of  relative  addressing  (because of  the

     line-at-a-time    operation),   and   lack    of   immediate

     verification that a command has done what was intended.



     Sed is a lineal descendant  of the UNIX editor, ed.  Because

     of the  differences between interactive  and non-interactive

     operation,  considerable changes have  been made  between ed

     and  sed; even  confirmed  users of  ed  will frequently  be

     surprised (and  probably chagrined), if they  rashly use sed

     without reading Sections 2 and 3 of this document.  The most

     striking family  resemblance between  the two editors  is in

     the   class  of   patterns   (`regular  expressions')   they

     recognize; the  code for matching patterns  is copied almost

     verbatim  from  the code  for  ed,  and  the description  of

     regular expressions  in Section 2 is  copied almost verbatim

     from  the  UNIX   Programmer's  Manual[1].  (Both  code  and

     description were written by Dennis M. Ritchie.)



1. Overall Operation



     Sed  by default copies  the standard  input to  the standard

     output, perhaps  performing one or more  editing commands on

     each line  before writing it  to the output.   This behavior

     may be  modified by flags  on the command line;  see Section

     1.1 below.



     The general format of an editing command is:



               [address1,address2][function][arguments]



One or both addresses may  be omitted; the format of addresses is

given in  Section 2.  Any number  of blanks or  tabs may separate

the addresses  from the function.  The function  must be present;

the available commands are discussed in Section 3.  The arguments

may  be required  or  optional, according  to  which function  is

given;  again,  they  are  discussed  in  Section  3  under  each

individual function.



     Tab  characters and  spaces at  the beginning  of  lines are

     ignored.



1.1. Command-line Flags



     Three flags are recognized on the command line:

          -n:

               tells sed  not to copy  all lines, but  only those

               specified  by  p  functions  or p  flags  after  s

               functions (see Section 3.3);

          -e:

               tells sed to take  the next argument as an editing

               command;

          -f:

               tells  sed to  take the  next argument  as  a file

               name;  the file  should contain  editing commands,

               one to a line.



1.2. Order of Application of Editing Commands



     Before any editing  is done (in fact, before  any input file

     is even opened), all  the editing commands are compiled into

     a  form  which  will  be  moderately  efficient  during  the

     execution phase  (when the commands are  actually applied to

     lines of the input file).   The commands are compiled in the

     order in  which they are encountered; this  is generally the

     order  in which they  will be  attempted at  execution time.

     The commands  are applied one at  a time; the  input to each

     command is the output of all preceding commands.



     The default linear order  of application of editing commands

     can be changed by the flow-of-control commands, t and b (see

     Section 3).   Even when the order of  application is changed

     by these commands,  it is still true that  the input line to

     any command is the output of any previously applied command.



1.3.  Pattern-space



     The range  of pattern matches  is called the  pattern space.

     Ordinarily, the pattern space is one line of the input text,

     but more than one line can be read into the pattern space by

     using the N command (Section 3.4).



1.4. Examples



     Examples  are scattered throughout  the text.   Except where

     otherwise noted, the examples all assume the following input

     text:



          In Xanadu did Kubla Khan

          A stately pleasure dome decree:

          Where Alph, the sacred river, ran

          Through caverns measureless to man

          Down to a sunless sea.



     (In  no  case  is the  output  of  the  sed commands  to  be

     considered an improvement on Coleridge.)



Example:



     The command



     2q



     will quit  after copying the  first two lines of  the input.

     The output will be:



          In Xanadu did Kubla Khan

          A stately pleasure dome decree:



2. ADDRESSES: Selecting lines for editing



     Lines in the input file(s)  to which editing commands are to

     be applied  can be selected by addresses.   Addresses may be

     either line numbers or context addresses.



     The application of a group  of commands can be controlled by

     one address (or address-pair)  by grouping the commands with

     curly braces (`{ }')(Sec. 3.6.).



2.1. Line-number Addresses



     A line  number is a decimal  integer.  As each  line is read

     from  the input,  a  line-number counter  is incremented;  a

     line-number address  matches (selects) the  input line which

     causes   the   internal  counter   to   equal  the   address

     line-number.  The counter runs cumulatively through multiple

     input  files; it  is  not reset  when  a new  input file  is

     opened.



     As a special case, the  character $ matches the last line of

     the last input file.



2.2. Context Addresses



     A  context  address  is  a  pattern  (`regular  expression')

     enclosed   in  slashes   (`/').   The   regular  expressions

     recognized by sed are constructed as follows:



1) An ordinary character (not one  of those discussed below) is a

     regular expression, and matches that character.



2) A  circumflex `^'  at the  beginning of  a  regular expression

     matches the null character at the beginning of a line.

3) A dollar-sign `$'  at the end of a  regular expression matches

     the null character at the end of a line.

4) The characters  `\n' match an imbedded  newline character, but

     not the newline at the end of the pattern space.

5) A period `.' matches any character except the terminal newline

     of the pattern space.

6) A regular  expression followed by an asterisk  `*' matches any

     number   (including  0)  of  adjacent   occurrences   of the

     regular expression it follows.

7) A string  of characters in  square brackets `[ ]'  matches any

     character in the  string,  and no others.  If,  however, the

     first character of the string is circumflex `^', the regular

     expression  matches any character  except the  characters in

     the string and the terminal newline of the pattern space.

8) A concatenation of regular expressions is a regular expression

     which  matches  the  concatenation  of  strings  matched  by

     the components of the regular expression.

9) A regular  expression between the  sequences `\(' and  `\)' is

     identical  in effect to   the unadorned  regular expression,

     but  has  side-effects which   are  described  under  the  s

     command  below and specification 10) immediately below.

10) The  expression  `\d' means  the  same  string of  characters

     matched by  an expression enclosed in `\('  and `\)' earlier

     in the same pattern.  Here  d is  a single digit; the string

     specified is that beginning with the dth  occurrence of `\('

     counting  from  the  left.    For example,   the  expression

     `^\(.*\)\1'  matches   a line  beginning  with two  repeated

     occurrences of the same string.

11) The null  regular expression  standing alone (e.g.,  `//') is

     equivalent to the  last regular expression compiled.



     To use one of the special characters (^  $ . * [ ] \ /) as a

     literal  (to match an  occurrence of  itself in  the input),

     precede the special character by a backslash `\'.



     For a context address to `match' the input requires that the

     whole pattern  within the address match some  portion of the

     pattern space.



2.3. Number of Addresses



     The  commands  in the  next  section can  have  0,  1, or  2

     addresses.  Under each command the maximum number of allowed

     addresses is  given.  For a  command to have  more addresses

     than the maximum allowed is considered an error.



     If a command  has no addresses, it is  applied to every line

     in the input.



     If a  command has  one address, it  is applied to  all lines

     which match that address.



     If a command  has two addresses, it is  applied to the first

     line which matches the  first address, and to all subsequent

     lines until (and including)  the first subsequent line which

     matches  the second  address.  Then  an attempt  is  made on

     subsequent lines  to again match the first  address, and the

     process is repeated.



     Two addresses are separated by a comma.



Examples:



          /an/      matches lines 1, 3, 4 in our sample text

          /an.*an/  matches line 1

          /^an/     matches no lines

          /./       matches all lines

          /\./      matches line 5

          /r*an/    matches lines 1,3, 4 (number = zero!)

          /\(an\).*\1/        matches line 1



3. FUNCTIONS



     All  functions are  named  by a  single  character.  In  the

     following summary, the maximum number of allowable addresses

     is given enclosed in  parentheses, then the single character

     function name, possible arguments  enclosed in angles (< >),

     an  expanded  English  translation of  the  single-character

     name, and finally a  description of what each function does.

     The  angles  around  the  arguments  are  not  part  of  the

     argument,  and  should  not   be  typed  in  actual  editing

     commands.



3.1. Whole-line Oriented Functions



          (2)d --  delete lines

               The  d function deletes  from the

               file  (does not write   to the  output) all  those

               lines  matched  by   its  address(es).   It   also

               has  the  side effect  that  no  further  commands

               are attempted   on the  corpse of a  deleted line;

               as soon as the d function is executed, a  new line

               is   read  from  the  input,   and  the  list   of

               editing    commands   is   re-started   from   the

               beginning on the new line.

          (2)n --  next line

               The  n function reads the  next line

               from  the  input,   replacing  the  current  line.

               The current  line is written to  the  output if it

               should  be.   The  list  of  editing  commands  is

               continued following the n command.

          (1)a\

          <text> -- append lines

               The a  function causes  the argument <text>  to be

               written to  the output  after the line  matched by

               its   address.   The   a  command   is  inherently

               multi-line; a  must appear at  the end of  a line,

               and <text>  may contain  any number of  lines.  To

               preserve  the  one-command-to-a-line fiction,  the

               interior  newlines must be  hidden by  a backslash

               character (`\') immediately preceding the newline.

               The  <text> argument  is terminated  by  the first

               unhidden  newline (the  first one  not immediately

               preceded  by backslash).   Once an  a  function is

               successfully executed,  <text> will be  written to

               the output regardless of what later commands do to

               the line which  triggered it.  The triggering line

               may  be  deleted entirely;  <text>  will still  be

               written to the output.   The <text> is not scanned

               for address  matches, and no  editing commands are

               attempted on it.  It  does not cause any change in

               the line-number counter.

          (1)i\

          <text> -- insert lines

               The  i  function   behaves  identically to  the  a

               function,  except that  <text> is  written  to the

               output  before   the  matched  line.    All  other

               comments  about  the a  function  apply  to the  i

               function as well.

          (2)c\

          <text> -- change lines

               The c  function deletes the lines  selected by its

               address(es), and  replaces them with  the lines in

               <text>.  Like  a and  i, c must  be followed  by a

               newline  hidden by a  backslash; and  interior new

               lines  in <text>  must be  hidden  by backslashes.

               The  c   command  may  have   two  addresses,  and

               therefore select  a range  of lines.  If  it does,

               all the  lines in the range are  deleted, but only

               one copy  of <text> is written to  the output, not

               one  copy per  line  deleted.  As  with  a and  i,

               <text> is not scanned  for address matches, and no

               editing commands are attempted on it.  It does not

               change the  line-number counter.  After a line has

               been deleted by a  c function, no further commands

               are attempted on the  corpse.  If text is appended

               after a line by a  or r functions, and the line is

               subsequently changed,  the text inserted  by the c

               function will  be placed before the text  of the a

               or r  functions.  (The r function  is described in

               Section 3.3.)

     Note: Within the text put  in the output by these functions,

     leading  blanks and tabs  will disappear,  as always  in sed

     commands.  To  get leading blanks and tabs  into the output,

     precede the first  desired blank or tab by  a backslash; the

     backslash will not appear in the output.



Example:



     The list of editing commands:



          n

          a\

          XXXX

          d



     applied to our standard input, produces:



          In Xanadu did Kubla Khan

          XXXX

          Where Alph, the sacred river, ran

          XXXX

          Down to a sunless sea.



     In this  particular case, the same effect  would be produced

     by either of the two following command lists:



          n                   n

          i\                  c\

          XXXX                XXXX

          d



3.2. Substitute Function



     One very important function  changes parts of lines selected

     by a context search within the line.

          (2)s<pattern><replacement><flags>  -- substitute

               The s

               function     replaces     part    of     a    line

               (selected   by <pattern>) with  <replacement>.  It

               can best be read:

                         Substitute  for <pattern>, <replacement>

               The <pattern> argument contains a pattern, exactly

               like  the patterns in  addresses (see  2.2 above).

               The  only  difference   between  <pattern>  and  a

               context address  is that the  context address must

               be delimited by  slash (`/') characters; <pattern>

               may be delimited by any character other than space

               or  newline.  By  default, only  the  first string

               matched by  <pattern> is  replaced, but see  the g

               flag  below.   The  <replacement> argument  begins

               immediately after  the second delimiting character

               of <pattern>, and  must be followed immediately by

               another  instance  of  the  delimiting  character.

               (Thus  there are  exactly three  instances  of the

               delimiting character.)  The <replacement> is not a

               pattern, and  the characters which  are special in

               patterns   do   not   have  special   meaning   in

               <replacement>.   Instead,   other  characters  are

               special:

                    &         is  replaced by the  string matched

                         by <pattern>

                    \d (where d is a single digit) is replaced by

                         the    dth     substring   matched    by

                         parts  of  <pattern>  enclosed  in  `\('

                         and  `\)'.    If nested substrings occur

                         in <pattern>, the  dth is determined  by

                         counting opening delimiters (`\(').   As

                         in patterns,  special characters may  be

                         made  literal  by  preceding   them with

                         backslash (`\').

               The  <flags> argument  may  contain the  following

               flags:

                    g   --  substitute   <replacement>   for  all

                         (non-overlapping)       instances     of

                         <pattern>   in  the    line.    After  a

                         successful  substitution, the  scan  for

                         the  next  instance of  <pattern> begins

                         just  after  the  end  of   the inserted

                         characters; characters put into the line

                         from <replacement> are not rescanned.

                    p   --  print  the   line  if   a  successful

                         replacement  was   done.    The  p  flag

                         causes  the line to  be written   to the

                         output if and only if a substitution was

                         actually   made  by   the   s  function.

                         Notice     that     if     several     s

                         functions,   each   followed  by   a   p

                         flag, successfully   substitute  in  the

                         same   input line,  multiple  copies  of

                         the   line  will  be  written   to   the

                         output:    one   for    each  successful

                         substitution.

                    w <filename> -- write the line to a file if a

                         successful replacement was  done.  The w

                         flag  causes  lines which  are  actually

                         substituted  by  the  s function   to be

                         written to  a file named  by <filename>.

                         If  <filename> exists before sed is run,

                         it  is   overwritten;  if  not,   it  is

                         created.   A  single space must separate

                         w       and        <filename>.       The

                         possibilities   of  multiple,   somewhat

                         different   copies   of  one input  line

                         being written  are the same as for p.  A

                         maximum of  10  different file names may

                         be   mentioned  after  w  flags  and   w

                         functions (see below), combined.



Examples:



     The following command, applied to our standard input,



          s/to/by/w changes



     produces, on the standard output:



          In Xanadu did Kubla Khan

          A stately pleasure dome decree:

          Where Alph, the sacred river, ran

          Through caverns measureless by man

          Down by a sunless sea.



     and, on the file `changes':



          Through caverns measureless by man

          Down by a sunless sea.



     If the nocopy option is in effect, the command:



          s/[.,;?:]/*P&*/gp



     produces:



          A stately pleasure dome decree*P:*

          Where Alph*P,* the sacred river*P,* ran

          Down to a sunless sea*P.*



Finally, to illustrate the effect of the g flag, the command:



          /X/s/an/AN/p



     produces (assuming nocopy mode):



          In XANadu did Kubla Khan



     and the command:



          /X/s/an/AN/gp



     produces:



          In XANadu did Kubla KhAN
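
     A minimal sketch of the same comparison, assuming the nocopy
     option is given as -n on the command line and feeding only the
     first line of the standard input:

```shell
line='In Xanadu did Kubla Khan'

# Without g, only the first "an" on the addressed line is replaced.
first=$(printf '%s\n' "$line" | sed -n '/X/s/an/AN/p')

# With g, every non-overlapping "an" on the line is replaced.
every=$(printf '%s\n' "$line" | sed -n '/X/s/an/AN/gp')

printf '%s\n' "$first" "$every"
```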



3.3. Input-output Functions



          (2)p --  print The print function  writes the addressed

               lines  to  the  standard  output file.   They  are

               written   at   the  time    the   p  function   is

               encountered, regardless of what succeeding editing

               commands may do to the lines.

          (2)w  <filename>  --  write  on  <filename>  The  write

               function writes  the  addressed lines to  the file

               named  by  <filename>.    If the  file  previously

               existed,  it  is  overwritten;   if not,   it   is

               created.   The lines  are written exactly  as they

               exist  when  the  write  function is   encountered

               for  each  line,   regardless  of  what subsequent

               editing  commands may  do  to  them.   Exactly one

               space  must  separate  the w  and  <filename>.   A

               maximum    of  ten    different   files   may   be

               mentioned   in write   functions    and  w   flags

               after  s  functions, combined.

          (1)r <filename> -- read the contents of a file The read

               function  reads the   contents of  <filename>, and

               appends  them  after   the  line  matched  by  the

               address.    The  file   is   read    and  appended

               regardless  of  what  subsequent editing  commands

               do to  the  line   which matched  its address.  If

               r and  a functions are executed on  the same line,

               the text from the  a functions and the r functions

               is  written  to  the  output  in  the  order  that

               the  functions   are    executed.    Exactly   one

               space  must separate the  r and  <filename>.  If a

               file mentioned by a  r function cannot be  opened,

               it is considered a null file, not an error, and no

               diagnostic is given.

     NOTE: Since there is a limit to the number of files that can

     be opened simultaneously, care  should be taken that no more

     than ten  files be mentioned  in w functions or  flags; that

     number is  reduced by  one if any  r functions  are present.

     (Only one read file is open at one time.)



Examples



     Assume that the file `note1' has the following contents:



               Note:   Kubla  Khan  (more properly  Kublai  Khan;

               1216-1294)  was  the  grandson  and  most  eminent

               successor of  Genghiz (Chingiz) Khan,  and founder

               of the Mongol dynasty in China.



     Then the following command:



          /Kubla/r note1



     produces:



          In Xanadu did Kubla Khan

               Note:   Kubla  Khan  (more properly  Kublai  Khan;

               1216-1294)  was  the  grandson  and  most  eminent

               successor of  Genghiz (Chingiz) Khan,  and founder

               of the Mongol dynasty in China.

          A stately pleasure dome decree:

          Where Alph, the sacred river, ran

          Through caverns measureless to man

          Down to a sunless sea.
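
     The read function can be tried with the sketch below.  The
     one-line note1 file is a shortened stand-in for the note shown
     above, and no-such-file is a hypothetical name used only to show
     that an unreadable file is treated as empty.

```shell
dir=$(mktemp -d)

# A one-line stand-in for the note1 file (shortened for the example).
printf '%s\n' 'Note: Kubla Khan was the grandson of Genghiz Khan.' > "$dir/note1"

poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:'

# The file contents are appended after the line matched by /Kubla/.
with_note=$(cd "$dir" && printf '%s\n' "$poem" | sed '/Kubla/r note1')

# A read file that cannot be opened is treated as empty, with no diagnostic.
without_note=$(cd "$dir" && printf '%s\n' "$poem" | sed '/Kubla/r no-such-file' 2>&1)

printf '%s\n' "$with_note"
rm -r "$dir"
```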



3.4. Multiple Input-line Functions



     Three  functions,  all spelled  with  capital letters,  deal

     specially with pattern  spaces containing imbedded newlines;

     they  are intended  principally to  provide  pattern matches

     across lines in the input.

          (2)N --  Next line The  next input line is  appended to

               the current  line in  the  pattern space; the  two

               input  lines    are  separated  by    an  imbedded

               newline.   Pattern matches  may extend  across the

               imbedded newline(s).

          (2)D --  Delete first part of the  pattern space Delete

               up to  and including  the  first newline character

               in  the current  pattern  space.  If   the pattern

               space  becomes  empty (the  only  newline was  the

               terminal  newline), read  another  line  from  the

               input.   In  any case,  begin the  list of editing

               commands again from its beginning.

          (2)P -- Print first part  of the pattern space Print up

               to   and  including  the  first   newline   in the

               pattern space.

     The P and D functions are equivalent to their lower-case

     counterparts if there are no imbedded newlines in the pattern

     space.
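
     A small sketch of N and P, using a four-line input invented for
     the example.  An even number of lines is assumed so that N always
     finds a next line to append.

```shell
# N appends the next input line, so each pattern space holds two lines
# separated by an imbedded newline that the s function can match as \n.
pairs=$(printf '%s\n' one two three four | sed 'N
s/\n/ + /')

# P prints only the part of the pattern space up to the first newline;
# -n suppresses the normal copy of the whole pattern space.
heads=$(printf '%s\n' one two three four | sed -n 'N
P')

printf '%s\n' "$pairs" '---' "$heads"
```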



3.5.  Hold and Get Functions



     Five  functions save  and  retrieve part  of  the input  for

     possible later use.

     (2)h  --  hold pattern  space  The  h  function  copies  the

          contents  of   the  pattern  space into  a   hold  area

          (destroying the previous contents of the hold area).

     (2)H  --  Hold pattern  space  The  H  function appends  the

          contents of  the pattern space  to the contents of  the

          hold  area;   the    former   and   new   contents  are

          separated  by  a newline.

     (2)g -- get contents of  hold area The g function copies the

          contents  of  the  hold  area into  the  pattern  space

          (destroying  the  previous   contents  of  the  pattern

          space).

     (2)G -- Get contents of hold area The G function appends the

          contents of   the hold  area  to the  contents  of  the

          pattern  space;  the   former and   new   contents  are

          separated by  a newline.

     (2)x  --  exchange  The  exchange command  interchanges  the

          contents of the pattern space and the hold area.



Example



     The commands

          1h

          1s/ did.*//

          1x

          G

          s/\n/  :/

     applied to our standard example, produce:

          In Xanadu did Kubla Khan  :In Xanadu

          A stately pleasure dome decree:  :In Xanadu

          Where Alph, the sacred river, ran  :In Xanadu

          Through caverns measureless to man  :In Xanadu

          Down to a sunless sea.  :In Xanadu
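
     The hold-area example can be run as the sketch below, assuming
     the last command is meant to replace the imbedded newline
     (written \n) left by G.  Only the first two lines of the standard
     example are used, to keep the output short.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:'

# On line 1: h saves the whole line, s trims it to "In Xanadu", and x
# swaps, so the hold area keeps "In Xanadu" and the line is restored.
# On every line: G appends the hold area after a newline, and the
# final s turns that newline into "  :".
out=$(printf '%s\n' "$poem" | sed '1h
1s/ did.*//
1x
G
s/\n/  :/')

printf '%s\n' "$out"
```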



3.6.  Flow-of-Control Functions



     These  functions  do no  editing  on  the  input lines,  but

     control the  application of functions to  the lines selected

     by the address part.

          (2)! -- Don't The Don't command causes the next command

               (written  on  the  same line),   to be  applied to

               all and only those input lines not selected by the

               address part.

          (2){ --  Grouping The  grouping command `{'  causes the

               next  set  of  commands  to  be  applied  (or  not

               applied) as a block to the input lines selected by

               the addresses of the grouping command.  The  first

               of the commands under  control of the grouping may

               appear on the same line  as the `{' or on the next

               line.



     The  group  of commands  is  terminated  by  a matching  `}'

     standing on a line by itself.



     Groups can be nested.

(0):<label> -- place a label  The label function marks a place in

     the list of editing commands  which  may be referred to by b

     and  t functions.   The  <label>  may  be  any  sequence  of

     eight   or  fewer  characters;   if  two   different   colon

     functions    have   identical   labels,   a   compile   time

     diagnostic will  be  generated, and  no execution attempted.

(2)b<label> --  branch to label  The branch function  causes  the

     sequence of  editing commands  being applied to  the current

     input  line to   be restarted  immediately  after  the place

     where   a  colon  function   with  the   same  <label>   was

     encountered.   If   no colon  function with the   same label

     can  be found  after  all the  editing commands   have  been

     compiled,   a  compile  time   diagnostic  is  produced, and

     no execution  is  attempted.  A b function   with no <label>

     is taken  to be a branch to  the end of the  list of editing

     commands; whatever  should be  done  with the  current input

     line is done,  and  another input  line  is  read;  the list

     of  editing commands is  restarted from the beginning on the

     new line.

(2)t<label> --  test substitutions  The t function  tests whether

     any successful substitutions have   been made on the current

     input line;  if  so,  it  branches to  <label>;  if  not, it

     does nothing.   The flag which indicates   that a successful

     substitution has been executed is reset by:

               1) reading a new input line, or

               2) executing a t function.
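
     The flow-of-control functions can be sketched as follows.  The
     inputs and the label name again are invented for illustration,
     and the branch is written with a space before the label, which is
     an assumption about the sed in use rather than the form shown
     above.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran'

# ! negates the address: indent every line except line 1.
indented=$(printf '%s\n' "$poem" | sed '1!s/^/    /')

# { ... } applies a block of commands to the addressed lines only;
# the closing } stands on a line by itself.
grouped=$(printf '%s\n' "$poem" | sed -n '/stately/{
s/decree/DECREE/
p
}')

# : places a label and t branches to it after a successful substitution,
# so the loop repeats until no double space is left.
squeezed=$(printf '%s\n' 'a    b   c' | sed ':again
s/  / /
t again')

# b with no label branches to the end of the script, so lines starting
# with "#" skip the substitution that follows.
skipped=$(printf '%s\n' '#keep' 'mark' | sed '/^#/b
s/^/> /')

printf '%s\n' "$indented" "$grouped" "$squeezed" "$skipped"
```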



3.7. Miscellaneous Functions



          (1)= --  equals The =  function writes to  the standard

               output  the line  number  of the  line  matched by

               its address.

          (1)q -- quit The q  function causes the current line to

               be  written  to  the  output  (if it  should  be),

               any  appended or   read text  to be  written,  and

               execution to be terminated.
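
     A brief sketch of the = and q functions, using inputs invented
     for the example:

```shell
# = writes the line number of the addressed line on a line of its own,
# ahead of the line itself.
numbered=$(printf '%s\n' alpha beta | sed '/beta/=')

# q writes the current line and stops, so only the first two lines appear.
first_two=$(printf '%s\n' one two three four | sed 2q)

printf '%s\n' "$numbered" '---' "$first_two"
```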



Reference



     [1]  Ken   Thompson  and   Dennis  M.   Ritchie,   The  UNIX

          Programmer's Manual.  Bell Laboratories, 1978.