Sed & Awk Study Notes

weixin_34195546

Tags: awk, shell, database

Original link: http://blog.51cto.com/dchampion/840320

========================Apr.15th, 2012========================

1. What makes it interesting to solve a problem?

The satisfaction of solving a problem is the difference between work and drudgery.

2. What is the difference between sed and vi?

sed is non-interactive and stream-oriented, whereas vi, like most DOS applications, is interactive and not stream-oriented.

3. What is sed used for?

1) To automate editing actions to be performed on one or more files.

2) To simplify the task of performing the same edits on multiple files.

3) To write conversion programs.

4. What is awk used for?

1) View a text file as a textual database made up of records and fields.

2) Use variables to manipulate the database.

3) Use arithmetic and string operators.

4) Use common programming constructs such as loops and conditionals.

5) Generate formatted reports.

6) Define functions.

7) Execute UNIX commands from a script.

8) Process the result of UNIX commands – stream-oriented.

9) Process command-line arguments more gracefully.

10) Work more easily with multiple input streams.

* Awk has been used to write a Lisp interpreter and even a compiler.

5. What is the relationship between the common editors?

ed – sed/grep – awk

ex – vi

6. How to use ed?

num – move to the specified line

p – print current line

d – delete current line

y – copy (yank) the current line into the cut buffer

x – paste the contents of the cut buffer after the current line

. – on a line by itself, end input mode and return to the ed command prompt

u – undo last command

7. What is the major difference between ed and sed?

ed applies a command to the current line by default, while sed applies a command to every line by default.

In other words, in ed we use addressing to expand the number of lines affected by a command, while in sed we use addressing to restrict it.
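A minimal sketch of the sed side of this contrast, using made-up sample input: with no address the substitution touches every line, and adding an address narrows it down.

```shell
# Hypothetical three-line sample input.
input='cat
cat
cat'

# No address: the command applies to every line.
all=$(printf '%s\n' "$input" | sed 's/cat/dog/')

# Address 2: the command is restricted to the second line only.
one=$(printf '%s\n' "$input" | sed '2s/cat/dog/')

printf '%s\n---\n%s\n' "$all" "$one"
```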

8. What is nawk?

It is a revised awk, which can offer more support for writing larger programs and tackling general-purpose programming problems.

Terms:

hard-coded = fixed directly in the code rather than supplied as a parameter

grep = global regular expression print (What the hell…)

'' = single quotes

/ = delimiter

\ = backslash

========================Apr.16th, 2012========================

1. How to specify the script of sed&awk?

Use single quotes if specifying it on the command line.

Use the -f option with a script file if the script is stored in a file.

2. How does sed&awk process the text?

They process the text according to the script. Each instruction in a script has two parts: a pattern and a procedure.

The pattern is a regular expression used to address lines; the procedure specifies the action to perform on them.

sed and awk read the input line by line and apply the script to each line until they have gone through all the lines. sed prints every line by default, whereas awk produces output only when the script asks for it.

3. How to specify multiple instructions once when using sed?

Add the -e option before each instruction.

The syntax is like this:

sed -e 's/ MA/, Massachusetts/' -e 's/ PA/, Pennsylvania/' list

4. How to suppress the automatic display of sed?

Add the -n option, which makes sed "silent/quiet".
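A small sketch of the effect, with made-up input: without -n every line is printed automatically, so the p command shows the matching line twice; with -n only the explicit p output remains.

```shell
# Without -n: automatic output plus the explicit p.
dup=$(printf 'one\ntwo\n' | sed '/two/p')

# With -n: only lines printed explicitly by p.
only=$(printf 'one\ntwo\n' | sed -n '/two/p')

printf '%s\n---\n%s\n' "$dup" "$only"
```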

5. How does awk interpret the input?

It interprets each line as a record and each word delimited by spaces or tabs as a field.

"$" followed by a number refers to a specific field, while $0 refers to the entire input line.

6. How to change the separator in awk?

Use the -F option followed by the separator.
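Items 5 and 6 can be sketched as follows. The sample records are invented for illustration.

```shell
# Default: fields are separated by runs of spaces or tabs.
second=$(printf 'John Robinson 696-0987\n' | awk '{ print $2 }')

# -F, switches the field separator to a comma.
state=$(printf 'Boston,MA,02110\n' | awk -F, '{ print $2 }')

# $0 is the whole input line.
whole=$(printf 'a b c\n' | awk '{ print $0 }')

printf '%s\n%s\n%s\n' "$second" "$state" "$whole"
```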

7. How to separate multiple instructions in the script when using sed&awk?

Separate the instructions with semicolons.

8. What is backslash used for?

It turns metacharacters into ordinary characters (and, in some cases, ordinary characters into metacharacters).

9. How to match a string only when it is not at the end of a line?

Use "String.": the dot does not match the newline, so the string at the end of a line will not be matched.
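A quick check of this behaviour with grep, on two invented lines: only the line where something follows the word is counted by the pattern with the trailing dot.

```shell
text='the end is near
this is the end'

# "end." needs one more character after "end", so the second line fails.
with_dot=$(printf '%s\n' "$text" | grep -c 'end.')

# Plain "end" matches both lines.
plain=$(printf '%s\n' "$text" | grep -c 'end')

echo "with dot: $with_dot, plain: $plain"
```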

10. What does a backslash inside a pair of square brackets mean when using awk?

Characters inside a pair of square brackets are interpreted literally, except the backslash, the hyphen and the circumflex.

In awk, the backslash is special inside brackets and escapes any special character.

The hyphen denotes a range unless it is the first or last character in the class.

The circumflex negates the class only when it is the first character in the class.

11. How to interpret square brackets within a character class?

A left square bracket is taken literally at any position inside a pair of square brackets.

A right square bracket is taken literally only when it is the first character in the class, or the first character after a leading circumflex.
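A short sketch of the bracket rules with sed, on an invented input string: a right bracket placed first in the class (or first after a leading circumflex) is literal, and a left bracket is literal anywhere.

```shell
# The class [][] contains "]" (placed first) and "[".
brackets=$(printf 'a[1]b\n' | sed 's/[][]/_/g')

# The class [^]0-9] matches anything except "]" and digits.
kept=$(printf 'a[1]b\n' | sed 's/[^]0-9]//g')

printf '%s\n%s\n' "$brackets" "$kept"
```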

12. What additional components of character class are included into the POSIX standard?

1) Character classes: identified with [: and :], which means a class of character.

2) Collating symbols: identified with [. and .], which makes a multicharacter sequence treated as a unit.

3) Equivalence classes: identified with [= and =], which list a set of characters that should be considered equivalent, such as e and è.

13. What do the POSIX character classes consist of?

1) [:alnum:] = letters + digits

2) [:alpha:] = letters

3) [:blank:] = space + tab

4) [:cntrl:] = control characters

5) [:digit:] = digits

6) [:graph:] = letters + digits + punctuation (printable and visible, so no space)

7) [:lower:] = lowercase letters

8) [:upper:] = uppercase letters

9) [:print:] = letters + digits + punctuation + space (all printable characters)

10) [:punct:] = punctuation

11) [:space:] = whitespace characters (space, tab, newline and the like)

12) [:xdigit:] = hexadecimal digits (0-9 and A-F, case insensitive)
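A few of these classes in use, as a rough sketch on an invented line. Note that the class name goes inside an outer pair of brackets, as in [[:digit:]].

```shell
line='Room 42B, floor 3!'

# Keep only the digits.
digits=$(printf '%s\n' "$line" | sed 's/[^[:digit:]]//g')

# Strip punctuation.
nopunct=$(printf '%s\n' "$line" | sed 's/[[:punct:]]//g')

# Keep only uppercase letters.
upper=$(printf '%s\n' "$line" | sed 's/[^[:upper:]]//g')

printf '%s\n%s\n%s\n' "$digits" "$nopunct" "$upper"
```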

14. How to match any content between a pair of quotation marks?

Syntax is like this:

".*"

15. How to make sed and grep support the extended set of metacharacters?

Add -E option which represents ERE.

16. How to match a circumflex or a dollar sign in awk?

Since the circumflex and dollar sign are always special in awk, we have to use backslash to escape them when we want to match them literally.

Terms:

{} = braces;curly braces

; = semicolon

() = parentheses

[] = square brackets

<> = angle brackets

^ = circumflex

\n = newline

case insensitive = match a letter regardless of whether it is in uppercase or lowercase.

Error Correction:

Page 106 of 570

Line 12 in byState program

print $1 -> print $1 "\n\t" $2

Others:

Someone let the cat out of the bag = Someone divulges a secret.

========================Apr.17th, 2012========================

1. How to use grouping operations in the extended set of metacharacters?

Use it with "?", in order to match both spelled-out and abbreviated words, like:

Lab(oratorie)?s

Or use it with "|", in order to match both the singular and the plural of a word, like:

compan(y|ies)
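Both patterns can be tried with grep -E. This is only a sketch: the word list is invented, and the ^ and $ anchors are added here so that each pattern must match the whole line.

```shell
words='Labs
Laboratories
company
companies
compan'

# (oratorie)? makes the group optional: matches Labs and Laboratories.
labs=$(printf '%s\n' "$words" | grep -E -c '^Lab(oratorie)?s$')

# (y|ies) accepts either ending: matches company and companies.
comp=$(printf '%s\n' "$words" | grep -E -c '^compan(y|ies)$')

echo "labs: $labs, compan: $comp"
```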

2. How to construct a regular expression when there are both single and double quotes within the character class?

Put a backslash before the double quote (and before the exclamation mark, if there is one) inside the class to escape it, and then enclose the entire regular expression in double quotes.

3. How to match a word in all forms and at all positions in a text?

(^| )[\"[({]*$WORD[]})\"\!?.,;:'es]*( |$)

* "(^| )" and "( |$)" match the word at either end of a line as well as in the middle.

\<$WORD['es]*\>

* A shorter form, thanks to the \< and \> operators of the ex editors.

4. How to address the beginning and end of a word in ex and vi(which is also supported in GNU sed&awk&grep)?

Beginning = \<

End = \>

\<$WORD\> = a definite word

5. How to match a shortest possible extent between two quotes(brackets/braces/same words)?

syntax is like this:

"[^"]*"

6. How to save and recall a portion of a pattern?

Use "\(" and "\)" to save a portion and "\num" to recall it. At most 9 portions can be saved.
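As a small illustration (the input is made up), two saved portions can be swapped around a colon:

```shell
# \1 is the text before the colon, \2 the text after it.
swapped=$(printf 'first:second\n' | sed 's/\(.*\):\(.*\)/\2:\1/')

echo "$swapped"
```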

7. How to interpret two addresses in a sed command?

They specify a range of lines: the first address marks the beginning and the second marks the end.

For instance, "/^test/,/^8/d" deletes every line from the line beginning with "test" through the next line beginning with "8", both inclusive.

8. What should we pay attention to when using grouping commands of sed?

We use "{}" to nest one address inside another or to apply multiple commands at the same address, like this:

/^test/,/^800/{

/^$/d

}

ATTENTION: in traditional sed the structure of grouping commands is pretty rigid: the opening brace must end its line, and the closing brace must be on a line by itself. Also, there must be no spaces after either brace.
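The grouping above can be run on some invented input to see the nesting at work: blank lines are deleted only inside the range from "test" to "800".

```shell
input='a

test

x
800

b'

# The inner address /^$/ is only tried on lines inside the outer range.
out=$(printf '%s\n' "$input" | sed '/^test/,/^800/{
/^$/d
}')

printf '%s\n' "$out"
```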

Terms:

| = vertical bar

Error Correction:

Page 133 of 570

The "grep" instruction to match the word "book"

grep " [\"[{(]*book[]})\"?!.,;:'s]* " bookwords -> grep " [\"[{(]*book[]})\"?\!.,;:'s]* " bookwords

p.s. add a backslash before "!" to escape.

========================Apr.18th, 2012========================

1. How to exclude an address?

Add "!" after the address whose lines you want to exclude.
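For example (with made-up input), deleting every line that does not start with "#" keeps only the comment lines:

```shell
# /^#/!d: apply d to all lines NOT matched by /^#/.
out=$(printf '# note\ncode\n# more\n' | sed '/^#/!d')

printf '%s\n' "$out"
```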

2. How to separate multiple commands on one line?

Add “;” between every two commands.

3. What subtle syntax errors are we inclined to make?

1) Adding spaces after a command (spaces before a command, however, are fine).

4. What is the rule for using delimiters?

In an address, the regular expression is delimited by slashes (another character c can be used by writing \cREGEXc).

In the s command, any character other than newline or backslash can be used as the delimiter in place of the slash.

ATTENTION: if the delimiter itself appears in the regular expression or the replacement, escape it with a backslash.
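A sketch of why an alternative delimiter is handy when the text contains slashes. The paths are invented; the \, form in the last command is the POSIX way of choosing a custom delimiter for an address.

```shell
# With slashes as the delimiter, every slash in the pattern must be escaped.
escaped=$(printf '/usr/local/bin\n' | sed 's/\/usr\/local/\/opt/')

# With a comma as the delimiter, no escaping is needed.
comma=$(printf '/usr/local/bin\n' | sed 's,/usr/local,/opt,')

# In an address, a custom delimiter is introduced with a backslash.
addr=$(printf '/usr/x\n/opt/y\n' | sed -n '\,^/usr,p')

printf '%s\n%s\n%s\n' "$escaped" "$comma" "$addr"
```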

5. What does "&" mean in the replacement field?

It is a replacement metacharacter, representing the text matched by the pattern.

* To comment out lines in a file: sed 's/[pattern]/#&/g' filename
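Another small use of "&", sketched on an invented line: every run of digits is wrapped in parentheses without having to repeat the matched text.

```shell
# & stands for whatever [0-9][0-9]* matched each time.
out=$(printf 'call 555 or 777\n' | sed 's/[0-9][0-9]*/(&)/g')

echo "$out"
```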

6. What are the features of the append/insert/change commands in sed?

1) Their text is output even if automatic output is suppressed.

2) Text added by append/insert does not affect sed's internal line counter.

7. How to replace lowercase letters in a text into uppercase ones by using sed?

y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/

vice versa

8. Can the equal sign command (which prints the line number) be used on a range of lines?

Not in traditional sed, where it takes at most one address; the syntax is simply "[address]=".

9. What is the equal sign command used for?

It is frequently used for debugging, for example to find the lines that contain specific words from an error reported by a compiler.
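A minimal sketch, assuming a small invented log: combined with -n, the equal sign command prints just the numbers of the matching lines.

```shell
# Print the line numbers of all lines containing "error".
nums=$(printf 'ok\nerror here\nok\nerror again\n' | sed -n '/error/=')

printf '%s\n' "$nums"
```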

Terms:

-- = em-dash

`` = grave accents

# = octothorpe

& = ampersand

: = colon

word = in boldface

Error Correction:

Page 161 of 570

Para 2, “Before leaving this script, …”

repetoire -> repertoire

========================Apr.19th, 2012========================

1. How to insert "invisible" characters in the shell?

Press Ctrl+V, then type the character (Ctrl+V followed by <char>).

2. How to print first 10 lines of a large number of text files more efficiently?

…

for file
do
    sed 10q "$file"
done

…

3. Can we append or insert text into a totally empty text file with append or insert command in sed?

To my knowledge, no: an empty file has no line for sed to address, so the commands are never executed.

4. How to comment out all lines in a file?(revised)

/^[^#]/s/^./#&/

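5. What does the N command do in sed?

It appends the next input line to the pattern space, so the end of the first line ($) becomes an embedded newline that can be matched with \n.

6. How to use the N command safely?

Add "$!" before "N" so that it is not executed on the final line of the text, where there is no next line to read. Like: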

$!N
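A rough sketch of $!N in action on three invented lines: lines are joined in pairs, and the unpaired last line passes through untouched because N is skipped on it.

```shell
# Lines 1 and 2 are joined; on line 3 (the last) N is not executed.
joined=$(printf 'a\nb\nc\n' | sed '$!N;s/\n/ /')

printf '%s\n' "$joined"
```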

7. How to combine multiple lines into one line with the N-P-D loop in sed?

# Run with sed -n, otherwise the combined line is printed twice.
# Before reaching the last line, execute the commands below.
$!{
# Append the next line to the pattern space, making a 'two-cell' pattern space.
N
# Remove the embedded newline and put a newline at the front of the pattern space,
# vacating the first cell and preserving the content in the second.
s/\(.*\)\n\(.*\)/\n\1 \2/
# Delete the empty first cell, return to the top of the script and start the loop again.
D
}
# The loop has ended: print the combined content of the pattern space.
P

========================Apr.20th, 2012========================

1. How to combine multiple lines into one without the N-P-D loop?

# Run with sed -n. As written, this assumes an even number of input lines;
# with an odd number, the last line ends up in the hold space and is not printed.
# Before reaching the last line, execute the commands below.
$!{
# Append the next line to the pattern space.
N
# Append the content of the hold space to the pattern space.
G
# Move the hold-space content in front of the two new lines, removing one newline on the way.
s/\(.*\n.*\)\n\(.*\)/\2 \1/
# Remove the other newline.
s/\n/ /
# Overwrite the hold space with the combined text.
h
}
# On the last line of the text:
${
# Fetch the final combination from the hold space (g would also work here, since the combination is already complete).
x
# Print the final combination.
p
}

2. What are the common loops we always use with sed?

1) N-P-D (actually, P is optional)

2) H-d

3) b-loop

4) t-loop

3. Is the length of label limited in sed?

Traditionally, yes. The POSIX standard permits implementations to allow longer labels, and GNU sed accepts labels of any length.

4. How to combine multiple lines into one with a b-loop and a t-loop in sed?

1) b-loop (run with sed -n):

# Label the top of the loop.
:top
# Append the next line to the pattern space, joined by an embedded newline.
$!N
# Replace the newline with a space.
s/\n/ /
# Until the last line of the text is reached, branch back to the label "top".
$!b top
# Output the content of the pattern space.
p

2) t-loop (run with sed -n):

# Label the top of the loop.
:top
# Append the next line to the pattern space, joined by an embedded newline.
$!N
# Replace the newline with a space.
s/\n/ /
# If the s command made a substitution, branch back to the label "top".
t top
# Output the content of the pattern space.
p
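The b-loop can be tried directly from the command line. A minimal sketch with three invented input lines; -n is used because the script prints the result itself with p.

```shell
# Join all input lines into one, separated by spaces.
out=$(printf 'a\nb\nc\n' | sed -n ':top
$!N
s/\n/ /
$!b top
p')

echo "$out"
```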

5. What is the rule when a character in the pattern occurs twice in a line?

sed matches as much text as it can, so a pattern such as .*X extends to the last X on the line.
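The difference between the greedy match and the shortest match between two quotes can be seen side by side. This is a sketch on an invented line.

```shell
line='say "one" and "two" now'

# ".*" runs from the first quote to the LAST quote on the line.
greedy=$(printf '%s\n' "$line" | sed 's/".*"/X/')

# "[^"]*" cannot cross a quote, so it stops at the nearest closing quote.
short=$(printf '%s\n' "$line" | sed 's/"[^"]*"/X/')

printf '%s\n%s\n' "$greedy" "$short"
```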

6. What should I do if I want to delete the first line of the pattern space without passing flow control to the start of the script?

Usually, we use the D command to delete the first line of the pattern space, which also passes flow control to the beginning of the sed script.

However, as we know, the D command recognizes the first line of the pattern space by the newline; in other words, it actually deletes the portion of the pattern space up to and including the first newline.

Thus, when the pattern space holds two lines, we can make the same move with a substitution, simply like this:

s/.*\n//

7. How can I match a single quote in a sed script argument (in bash)?

If you want to match only a single quote, enclose the argument in double quotes (a backslash does nothing for you in this scenario). To match only a double quote, do the reverse and enclose the argument in single quotes.

If you want to match both single and double quotes, double quotes are the only choice for enclosing the argument, in which case the double quote to be matched must be escaped with a backslash.

Additionally, single quotes used in pairs are not matched by sed: they merely quote (protect) regular expressions that may contain characters special to the shell. The same goes for double quotes.
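The three cases can be sketched like this, with invented input strings:

```shell
# Match a single quote: enclose the script in double quotes.
single=$(printf "it's\n" | sed "s/'/_/")

# Match a double quote: enclose the script in single quotes.
double=$(printf 'say "hi"\n' | sed 's/"/_/g')

# Match both: double quotes outside, backslash before the inner double quote.
both=$(printf "it's \"ok\"\n" | sed "s/['\"]/_/g")

printf '%s\n%s\n%s\n' "$single" "$double" "$both"
```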

Error Correction:

Page 245 of 570

The Search program can be improved.

The program misses one scenario: the search phrase is matched within the first two lines, while another occurrence of the phrase spans the second line and the not-yet-read third line.

When the first or second line matches, Search outputs it with a b command that has no argument, which automatically outputs the pattern space and overwrites it by reading the next line. As a result, the scenario mentioned above is missed.

To improve the program, we should analyze it first. The analysis shows that matching the search phrase within the second line is unnecessary: if the phrase cannot be matched in the first line or across the two lines, flow control normally passes to the D command, which deletes the first line of the pattern space and reruns the script on the second one. The second line is then checked at the beginning of the next loop by the simple single-line match. However, we cannot use a b command without a label here, since we have to deal with the missed case above. So we simply output the line, read the next line into the pattern space, and match the phrase across the two lines. The improved program looks like this (assuming the search phrase is "Operating System"):

# Label the top of the loop.
:top
# When the phrase is matched within a line, jump to the label "print".
/Operating System/b print
# If there is no match in the first line, append the next line to the pattern space.
$!N
# Preserve the original two lines in the hold space.
h
# Replace each complete phrase within the two lines with an asterisk.
s/Operating System/*/g
# Convert the newline and any redundant spaces before it into one space.
s/ *\n/ /
# Match the phrase across the two lines.
/Operating System/{
# If there is a match, restore the original content to the pattern space.
g
# The lines have not been printed before, so print them both.
p
}
# Whether or not the match succeeded, overwrite the pattern space with the hold space, since the first line is tossed in either case.
g
# Delete the first line, return to the top and rerun the loop.
D
# Label for lines that have already matched on their own and must still be checked for a match across two lines.
# Normal flow control never reaches this point.
:one
# Append the next line to the pattern space.
$!N
# Preserve the intact two-line content in the hold space, so the pattern space can be manipulated freely.
h
# Only phrases ACROSS the two lines count as a hit here, so phrases WITHIN the lines would interfere.
# The asterisk prevents a phrase from being formed by accident if the matched text were simply removed (it works, though it is not perfect).
s/Operating System/*/g
# Convert the newline and any redundant spaces before it into one space.
s/ *\n/ /
# Match the phrase.
/Operating System/{
# If matched, overwrite the pattern space with the original two-line content from the hold space.
g
# Remove the first line, since it has already been printed.
s/.*\n//
# Output the second line.
p
# Go back to the top and rerun the loop from the current line (the former second line).
b top
}
# If not matched, overwrite the pattern space with the content of the hold space, too.
g
# No phrase spans the two lines, so the search on the first line is finished: delete it and rerun the loop from the former second line.
D
# Label for printing the matched first line. It is outside the normal flow, too.
:print
# Print the matched line.
p
# Jump to the label "one".
b one

========================Apr.22nd, 2012========================

1. Is it advisable to use a single quote inside an awk script?

No. The script is normally enclosed in single quotes on the command line, so an embedded single quote would confuse the shell.

2. How to change the field separator in awk?

Use the -F option on the awk command line, or assign the system variable FS (field separator) in the BEGIN action.

3. How to test a regular expression on a specific field?

field [!]~ /RE/[{action}]
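A sketch of both forms of the test, on invented records whose second field is a state code:

```shell
data='Alice MA
Bob PA
Carol MA'

# ~ : run the action when field 2 matches the regular expression.
ma=$(printf '%s\n' "$data" | awk '$2 ~ /^MA$/ { print $1 }')

# !~ : run the action when field 2 does NOT match.
not_ma=$(printf '%s\n' "$data" | awk '$2 !~ /^MA$/ { print $1 }')

printf '%s\n---\n%s\n' "$ma" "$not_ma"
```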

4. What does a variable in awk consist of?

A variable in awk has both a string value and a numeric value. Awk uses the appropriate one according to the context.

A string that does not begin with a number has a numeric value of 0.
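A small sketch of how the context picks the value. The variable names x and y are arbitrary: arithmetic forces the numeric value, so a non-numeric string counts as 0 and a leading number is used where there is one.

```shell
# "abc" has numeric value 0; "3 apples" has numeric value 3.
out=$(awk 'BEGIN { x = "abc"; y = "3 apples"; print x + 0, y + 1 }')

echo "$out"
```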

5. What are the system variables in awk?

FS=field separator

OFS=output field separator

NF=number of fields

NR=number of the current record

RS=record separator

ORS=output record separator

FILENAME=current input file

FNR=number of the current record within the current input file

CONVFMT=format used to convert a number to a string (a little bit convoluted; to be studied when it becomes indispensable for my use of awk)
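Some of these variables in use, as a rough sketch on two invented records:

```shell
# NR is the record number, NF the number of fields in that record.
counts=$(printf 'a b c\nd e\n' | awk '{ print NR, NF }')

# OFS is placed between the arguments of print; $NF is the last field.
ends=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print $1, $NF }')

printf '%s\n---\n%s\n' "$counts" "$ends"
```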

6. What is the precedence of the boolean operators in awk?

Expressions in parentheses will be evaluated first.

&& has higher precedence than ||
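The effect of the precedence can be checked with a one-liner. The variable names a and b are arbitrary.

```shell
# Without parentheses, 0 && 0 is evaluated first, so a = 1 || 0 = 1.
# With parentheses, (1 || 0) is evaluated first, so b = 1 && 0 = 0.
out=$(awk 'BEGIN { a = 1 || 0 && 0; b = (1 || 0) && 0; print a, b }')

echo "$out"
```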

Terms:

~ = tilde

!~ = bang-tilde

modulo = the remainder after division

right-justified=right-aligned

Reposted from: https://blog.51cto.com/dchampion/840320
