Linux Command Line and Shell Scripting Bible,3rd,Part 3

 







**************************************

Chapter  17  Creating Functions

**************************************


Section :   Basic Script Functions

===================================
  
Creating a function
---------------------
function name { 
commands
}

name()  { 
commands
 }



Using functions
----------------
To use a function in your script, specify the function name on a line, just as you would any other shell command.
The function must be defined before it is used.
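
As a minimal sketch of the define-then-call pattern (the function name greet and its message are made up for illustration):

```shell
#!/bin/bash
# greet is a hypothetical function, used only to illustrate the pattern
greet() {
    echo "Hello from greet"
}

# call the function by name, just like any other command
greet

# or capture what it prints in a variable
output=$(greet)
```

Calling greet before the definition would fail with a "command not found" error, which is why the definition comes first.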



Section : Returning a Value

=============================
The bash shell treats functions like mini-scripts, complete with an exit status.
There are three different ways you can generate an exit status for your functions.

The default exit status
-------------------------
By default, the exit status of a function is the exit status returned by the last command in the function.

After the function executes, 
you use the standard $? variable to determine the exit status of the function:

#!/bin/bash
# testing the exit status of a function
func1() {
echo "trying to display a non-existent file"
ls -l badfile
}
echo "testing the function: "
func1
echo "The exit status is: $?"
$



Using the return command
-------------------------
The bash shell uses the return command to exit a function with a specific exit status.

#!/bin/bash
# using the return command in a function
function dbl {
read -p "Enter a value: " value 
echo "doubling the value" 
return $[ $value * 2 ]
}
dbl
echo "The new value is $?" 
$

■ Remember to retrieve the return value as soon as the function completes.
■ Remember that an exit status must be in the range of 0 to 255
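
The 0 to 255 limit can be seen in a small sketch. The helper name dbl_status is hypothetical; the `|| var=$?` form is used only so the status is captured immediately, even if the script runs with `set -e`:

```shell
#!/bin/bash
# dbl_status is a hypothetical helper that returns twice its argument as the exit status
dbl_status() {
    return $(( $1 * 2 ))
}

# capture $? right away, before any other command overwrites it
dbl_status 100 || small=$?    # 200 fits in the 0-255 range
dbl_status 200 || large=$?    # 400 does not fit; bash keeps only the low 8 bits (400 - 256 = 144)

echo "small=$small large=$large"
```

For values that may exceed 255, capturing the function's output is the safer approach.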

Using function output
----------------------
You can also capture the output of a function in a shell variable, just as you capture the output of a command.
This technique retrieves any type of output from a function, not just an integer from 0 to 255.
The following command assigns the output of the dbl function to the $result shell variable:

result=$(dbl)

#!/bin/bash
# using the echo to return a value
function dbl {
read -p "Enter a value: " value 
echo $[ $value * 2 ]
}
result=$(dbl)
echo "The new value is $result" 
$


Section : Using Variables in Functions

========================================

Passing parameters to a function
--------------------------------
Functions can use the standard parameter environment variables ($1, $2, $#, and so on) to represent any parameters passed to the function on the command line.
Because a function uses these variables for its own parameter values, it can’t directly access the script parameter values from the command line of the script.

#!/bin/bash
# passing parameters to a function
function addem {
 if [ $# -eq 0 ] || [ $# -gt 2 ]
 then
echo -1
elif [ $# -eq 1 ]
 then
echo $[ $1 + $1 ]
else
echo $[ $1 + $2 ]
fi
}

echo -n "Adding 10 and 15: "
value=$(addem 10 15)
echo $value
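
A short sketch of the point about script parameters. The function show_first and the simulated script argument are illustrative assumptions; `set --` stands in for running the script with one argument:

```shell
#!/bin/bash
# show_first is a hypothetical function that reports its own first parameter
show_first() {
    echo "function sees: ${1:-nothing}"
}

# simulate running the script as: ./script script_arg
set -- script_arg

without=$(show_first)        # the function's own $1 is empty
with=$(show_first "$1")      # the script's $1 is passed on explicitly

echo "$without"
echo "$with"
```

The function only sees the script parameter when the script hands it over in the function call.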

Handling variables in a function
-------------------------------
Functions use two types of variables:

■ Global
If you define a global variable in the main section of a script, 
you can retrieve its value inside a function.
Likewise, if you define a global variable inside a function,
you can retrieve its value in the main section of the script.

By default, 
any variables you define in the script are global variables. 
Variables defined outside of a function can be accessed within the function just fine:

#!/bin/bash
# using a global variable to pass a value
function dbl {
value=$[ $value * 2 ]
}
read -p "Enter a value: " value
 dbl
echo "The new value is: $value"





■ Local
Instead of using global variables in functions, any variables that the function uses internally can be declared as local variables.
To do that, just use the local keyword in front of the variable declaration:

local temp
local temp=$[ $value + 5 ]

function func1 {
local temp=$[ $value + 5 ] 
result=$[ $temp * 2 ]
}
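
A small sketch of why the local keyword matters. The starting values are arbitrary, and the modern $(( )) arithmetic form is used here in place of $[ ]:

```shell
#!/bin/bash
temp=4
value=6

func1() {
    local temp=$(( value + 5 ))   # local copy: 11, invisible outside the function
    result=$(( temp * 2 ))        # result is not declared local, so it is global: 22
}

func1
echo "temp is still $temp, result is $result"
```

Without the local keyword, the function would overwrite the script's temp variable with 11.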




Section : Array Variables and Functions

==========================================

Passing arrays to functions
----------------------------
If you try using the array variable as a single function parameter (for example, testit $myarray), the function picks up only the first value of the array variable.

To solve this problem, you must disassemble the array variable into its individual values and use the values as function parameters.
Inside the function, you can reassemble all the parameters into a new array variable.

#!/bin/bash
# array variable to function test
function testit {
local newarray

newarray=("$@")

echo "The new array value is: ${newarray[*]}"
}
myarray=(1 2 3 4 5)
echo "The original array is ${myarray[*]}" 
testit ${myarray[*]}
$

---

function addarray { 
local sum=0
local newarray
newarray=($(echo "$@"))
for value in ${newarray[*]} 
do
sum=$[ $sum + $value ] 
done
echo $sum 
}

myarray=(1 2 3 4 5)
echo "The original array is: ${myarray[*]}" 
arg1=$(echo ${myarray[*]}) 
result=$(addarray $arg1)
echo "The result is $result"
$
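
One caveat with the unquoted ${myarray[*]} form is that elements containing spaces are split apart. The sketch below (count_items is a hypothetical helper) compares it with the quoted "${myarray[@]}" form, assuming the default IFS:

```shell
#!/bin/bash
# count_items is a hypothetical helper that reports how many parameters it received
count_items() {
    local items=("$@")
    echo "${#items[@]}"
}

myarray=("one two" three)

unquoted=$(count_items ${myarray[*]})     # word splitting yields: one, two, three
quoted=$(count_items "${myarray[@]}")     # each element stays intact

echo "unquoted: $unquoted, quoted: $quoted"
```

For arrays of plain numbers, as in the examples above, both forms behave the same.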



Returning arrays from functions
--------------------------------
The function uses an echo statement to output the individual array values in the proper order, 
and the script must reassemble them into a new array variable:

#!/bin/bash
# returning an array value
function arraydblr {
local origarray
local newarray
local elements
local i 

origarray=($(echo "$@")) 
newarray=($(echo "$@")) 
elements=$[ $# - 1 ]

for (( i = 0; i <= $elements; i++ )) {
newarray[$i]=$[ ${origarray[$i]} * 2 ]
 }
 
echo ${newarray[*]}
 }

myarray=(1 2 3 4 5)
echo "The original array is: ${myarray[*]}"
 arg1=$(echo ${myarray[*]}) 
 result=($(arraydblr $arg1))
echo "The new array is: ${result[*]}"
$



Section : Function Recursion

==============================
function factorial {
if [ $1 -eq 1 ]
then
echo 1
else
local temp=$[ $1 - 1 ]
local result=$(factorial $temp)
echo $[ $result * $1 ]
fi
}
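
A runnable sketch of the same recursion, written with $(( )) arithmetic. The -le 1 guard (instead of -eq 1) is an added assumption so that an input of 0 also terminates:

```shell
#!/bin/bash
factorial() {
    if [ "$1" -le 1 ]; then
        echo 1
    else
        local prev
        # each call runs in its own subshell and prints its result
        prev=$(factorial $(( $1 - 1 )))
        echo $(( prev * $1 ))
    fi
}

result=$(factorial 5)
echo "The factorial of 5 is: $result"
```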



Section : Creating a Library
==============================
The bash shell allows you to create a library file for your functions 
and then reference that single library file in as many scripts as you need to.

The first step in the process is to create a common library file :
# my script functions
function addem {
echo $[ $1 + $2 ]
}

The key to using function libraries is the source command. 
The source command executes commands within the current shell context instead of creating a new shell to execute them.
This makes the functions available to the script.

The source command has a shortcut alias, called the dot operator.
 To source the myfuncs library file in a shell script, you just need to add the following line:

. ./myfuncs



#!/bin/bash
# using functions defined in a library file 

. ./myfuncs

value1=10
value2=5
result1=$(addem $value1 $value2)
......
$
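
The library workflow can be tried end to end with a throwaway file. This is only a sketch: the temporary file created by mktemp stands in for the myfuncs library, and the function body uses $(( )) arithmetic:

```shell
#!/bin/bash
# create a temporary library file (a stand-in for ./myfuncs)
libfile=$(mktemp)
cat > "$libfile" <<'EOF'
# my script functions
addem() {
    echo $(( $1 + $2 ))
}
EOF

# source the library with the dot operator so its functions load into this shell
. "$libfile"

sum=$(addem 10 5)
echo "The result is $sum"

rm -f "$libfile"
```

Running the library file as a separate script instead of sourcing it would define addem in a child shell, where this script could not see it.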

Section : Using Functions on the Command Line

===============================================

Creating functions on the command line:
---------------------------------------
The first method defines the function all on one line.
When you define the function on the command line,
you must remember to include a semicolon at the end of each command, so the shell knows where to separate commands:

$ function divem { echo $[ $1 / $2 ]; }
$ divem 100 5
20
$



The other method is to use multiple lines to define the function.

$ function multem { 
> echo $[ $1 * $2 ] 
> }
$ multem 2 5
10 
$

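Defining functions in the .bashrc file:
----------------------------------------
Functions defined directly on the command line are lost when you exit the shell.
A simpler approach is to define the function in a place where it is reloaded by the shell each time you start a new shell.
The best place to do that is the .bashrc file.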

Directly defining functions:
# .bashrc
# Source global definitions
if [ -r /etc/bashrc ]; then
. /etc/bashrc
fi

function addem {
echo $[ $1 + $2 ]
}
$

Sourcing function files
# .bashrc
# Source global definitions
if [ -r /etc/bashrc ]; then
. /etc/bashrc
fi

. /home/rich/libraries/myfuncs
$

The next time you start a shell, all the functions in your library are available at the command line interface.



Section : Following a Practical Example
==========================================
shtool


**************************************

Chapter  19   Introducing sed and gawk

**************************************


Section :    Manipulating Text

===================================
   
Getting to know the sed editor
-------------------------------
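The sed editor is called a stream editor: it edits a stream of data based on a set of rules you supply ahead of time, before the editor processes the data.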


The sed editor does these things:


1. Reads one data line at a time from the input
2. Matches that data with the supplied editor commands
3. Changes data in the stream as specified in the commands 
4. Outputs the new data to STDOUT
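
The four steps above can be watched on a two-line stream. The sample text is made up for illustration:

```shell
#!/bin/bash
# each input line is read, matched against the s command, changed, and written to STDOUT
result=$(printf 'dog one\ndog two\n' | sed 's/dog/cat/')
echo "$result"
```

Both lines come out changed because sed repeats the whole command list for every line it reads.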


Here’s the format for using the sed command: 

sed options script file


The sed Command Options:
-e script, Adds commands specified in the script to the commands run while processing the input
-f file, Adds the commands specified in the file to the commands run while processing the input
-n, Doesn’t produce output for each command, but waits for the print command



Defining an editor command in the command line:
By default, the sed editor applies the specified commands to the STDIN input stream.
This allows you to pipe data directly to the sed editor for processing.

$ echo "This is a test" | sed 's/test/big test/'
This example uses the s (substitute) command: the words big test are substituted for the word test.


Using multiple editor commands in the command line:


The commands must be separated with a semicolon,
and there shouldn’t be any spaces between the end of the command and the semicolon.
 
$ sed -e 's/brown/green/; s/dog/cat/' data1.txt
 

Just enter the first single quotation mark to open the sed program script,
and bash continues to prompt you for more commands until you enter the closing quotation mark:

$ sed -e '
> s/brown/green/
> s/fox/elephant/
> s/dog/cat/' data1.txt



Reading editor commands from a file:
In this case, you don’t put a semicolon after each command. 

$ cat script1.sed 
s/brown/green/ 
s/fox/elephant/ 
s/dog/cat/
$
$ sed -f script1.sed data1.txt



Getting to know the gawk program
--------------------------------
The gawk program takes stream editing one step further than the sed editor by providing a programming language instead of just editor commands.
Within the gawk programming language, you can do the following:

■ Define variables to store data.
■ Use arithmetic and string operators to operate on data.
■ Use structured programming concepts, such as if-then statements and loops, to add logic to your data processing.
■ Generate formatted reports by extracting data elements within the data file and repositioning them in another order or format.

Visiting the gawk command format:
Here’s the basic format of the gawk program: 

gawk options program file




Reading the program script from the command line:
A gawk program script is defined by opening and closing braces.
You must place script commands between the two braces ({}).

If you don’t specify a data file on the command line, the gawk program retrieves data from STDIN.
When you run the program, it just waits for text to come in via STDIN.

The Ctrl+D key combination generates an EOF character in bash. 
Using that key combination terminates the gawk program and returns you to a command line interface prompt.

gawk '{print "Hello World!"}'

Using data field variables:

Each data field is determined in a text line by a field separation character.
The default field separation character in gawk is any whitespace character (for example, the tab or space characters).
The -F option specifies a different field separation character.

By default, gawk assigns the following variables to each data field it detects in the line of text:

■ $0 represents the entire line of text.
■ $1 represents the first data field in the line of text.
■ $2 represents the second data field in the line of text.
■ $n represents the nth data field in the line of text.

$ gawk '{print $1}' data2.txt
$ gawk -F: '{print $1}' /etc/passwd
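
A self-contained sketch of field variables and the -F option. The sample record is made up, and the fallback from gawk to a generic awk is an assumption added so the sketch runs where gawk is not installed:

```shell
#!/bin/bash
# prefer gawk, fall back to any POSIX awk (an assumption for portability)
AWK=$(command -v gawk || command -v awk)

# a made-up record in /etc/passwd format
record="rich:x:500:500:Rich Blum:/home/rich:/bin/bash"

user=$(echo "$record" | "$AWK" -F: '{print $1}')          # first colon-separated field
home=$(echo "$record" | "$AWK" -F: '{print $6}')          # sixth colon-separated field
third_word=$(echo "one two three" | "$AWK" '{print $3}')  # default whitespace separator

echo "$user lives in $home; third word is $third_word"
```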




Using multiple commands in the program script:
$ echo "My name is Rich" | gawk '{$4="Christine"; print $0}'

$ gawk '{
> $4="Christine"
> print $0}'
My name is Rich
My name is Christine 
$





Reading the program from a file:


$ cat script3.gawk
{
text = "'s home directory is "
print $1 text $6
}

$ cat script2.gawk
{print $1 "'s home directory is " $6}
$
$ gawk -F: -f script2.gawk /etc/passwd




Running scripts before processing data:


The BEGIN keyword forces gawk to execute the program script specified after it before gawk reads the data:


$ gawk 'BEGIN {print "Hello World!"}'


$ gawk 'BEGIN {print "The data3 File Contents:"} 
> {print $0}' data3.txt


Running scripts after processing data:


$ gawk 'BEGIN {print "The data3 File Contents:"} 
> {print $0}
> END {print "End of File"}' data3.txt



or

$ cat script4.gawk
BEGIN {
print "The latest list of users and shells"
print " UserID \t Shell"
print "-------- \t -------"
FS=":"
}

{
print $1" \t" $7
}

END {
print "This concludes the listing"
}
$
$ gawk -f script4.gawk /etc/passwd
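
The BEGIN, main, and END sections can be exercised together on inline data. In this sketch the two input records and the count variable are illustrative, and the gawk-or-awk fallback is an assumption for portability:

```shell
#!/bin/bash
AWK=$(command -v gawk || command -v awk)

report=$(printf 'rich:x:/bin/bash\nkatie:x:/bin/csh\n' | "$AWK" '
BEGIN { FS = ":"; print "UserID Shell" }     # runs once, before any data is read
{ print $1, $3; count++ }                    # runs once for each input line
END { print count " users listed" }          # runs once, after the last line
')

echo "$report"
```

Setting FS inside BEGIN has the same effect as passing -F: on the command line.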



Section : Commanding at the sed Editor Basics

===============================================


Introducing more substitution options:
----------------------------------------
By default, the substitute command replaces only the first occurrence of the pattern in each line.
The substitution flag is set after the substitution command strings:

s/pattern/replacement/flags

■ A number, indicating the pattern occurrence for which new text should be substituted
■ g, indicating that new text should be substituted for all occurrences of the existing text
■ p, indicating that the contents of the original line should be printed
■ w file, which means to write the results of the substitution to a file


$ sed -n 's/test/trial/p' data5.txt
$ sed 's/test/trial/g' data4.txt
$ sed 's/test/trial/w test.txt' data5.txt
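
The effect of each flag is easiest to compare on one sample line. The input text here is made up:

```shell
#!/bin/bash
line="test one test two test three"

first=$(echo "$line" | sed 's/test/trial/')      # no flag: first occurrence only
second=$(echo "$line" | sed 's/test/trial/2')    # number flag: second occurrence only
all=$(echo "$line" | sed 's/test/trial/g')       # g flag: every occurrence

# -n with the p flag prints only the lines where a substitution happened
printed=$(printf 'no match here\na test\n' | sed -n 's/test/trial/p')

printf '%s\n' "$first" "$second" "$all" "$printed"
```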



Because escaping forward slashes in pathnames is awkward, the sed editor allows you to select a different character for the string delimiter in the substitute command:
 
$ sed 's!/bin/bash!/bin/csh!' /etc/passwd








Using addresses:
-----------------
If you want to apply a command only to a specific line or a group of lines,
you must use line addressing.


There are two forms of line addressing in the sed editor:
■ A numeric range of lines
The address can be a single line number,
or a range of lines specified by a starting line number, a comma, and an ending line number.
To address a group of lines starting at some point within the text and continuing to the end of the text, you can use the special address, the dollar sign:

$ sed '2s/dog/cat/' data1.txt
$ sed '2,3s/dog/cat/' data1.txt
$ sed '2,$s/dog/cat/' data1.txt



■ A text pattern that filters out a line
This is the format:
/pattern/command
$ sed '/Samantha/s/bash/csh/' /etc/passwd





Both forms use the same format for specifying the address:


[address]command

address { command1
command2
command3
}


The sed editor assigns the first line in the text stream as line number one and continues sequentially for each new line.



Grouping commands:
$ sed '2{
> s/fox/elephant/ 
> s/dog/cat/
> }' data1.txt


$ sed '3,${
> s/brown/green/
> s/lazy/active/
> }' data1.txt
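
The addressing forms can be compared side by side on a small made-up data set:

```shell
#!/bin/bash
data=$(printf 'dog 1\ndog 2\ndog 3\ndog 4\n')

single=$(echo "$data" | sed '2s/dog/cat/')            # line 2 only
to_end=$(echo "$data" | sed '2,$s/dog/cat/')          # line 2 through the last line
by_text=$(echo "$data" | sed '/dog 3/s/dog/cat/')     # only lines matching the pattern

# grouped commands: both substitutions apply to lines 3 through the end
grouped=$(echo "$data" | sed '3,${
s/dog/cat/
s/cat/bird/
}')

printf '%s\n---\n' "$single" "$to_end" "$by_text" "$grouped"
```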




Deleting lines:
------------------
Remember that the sed editor doesn’t touch the original file.


 
If you forget to include an addressing scheme, all the lines are deleted from the stream:

$ sed 'd' data1.txt

$ sed '3d' data6.txt
$ sed '2,3d' data6.txt
$ sed '3,$d' data6.txt
$ sed '/number 1/d' data6.txt



You can also delete a range of lines using two text patterns.
The first pattern you specify “turns on” the line deletion, and the second pattern “turns off” the line deletion.
The sed editor deletes any lines between the two specified lines (including the specified lines).
Be careful: if the start pattern matches but the end pattern is never found afterward, the rest of the data stream is deleted.


$ sed '/1/,/3/d' data6.txt
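
A sketch of both outcomes on made-up data: a range whose end pattern is found, and one whose end pattern never appears:

```shell
#!/bin/bash
data=$(printf 'line 1\nline 2\nline 3\nline 4\n')

# deletion turns on at the line matching 2 and off at the line matching 3
closed=$(echo "$data" | sed '/2/,/3/d')

# the end pattern 9 never matches, so deletion runs to the end of the stream
open=$(echo "$data" | sed '/2/,/9/d')

printf '%s\n---\n%s\n' "$closed" "$open"
```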




Inserting and appending text:
-------------------------------
To insert or append more than one line of text, 
you must use a backslash on each line of new text until you reach the last text line where you want to insert or append text:


sed '[address]command\
new line'



$ echo "Test Line 2" | sed 'i\Test Line 1'


$ echo "Test Line 2" | sed 'i\
> Test Line 1'



$ echo "Test Line 2" | sed 'a\Test Line 1'


$ sed '3i\
> This is an inserted line.' data6.txt



$ sed '3a\
> This is an appended line.' data6.txt



$ sed '$a\
> This is a new line of text.' data6.txt



$ sed '1i\
> This is one line of new text.\
> This is another line of new text.' data6.txt
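
The insert and append commands can be checked in a script as well as at the prompt. This sketch uses the backslash-newline form throughout; the sample lines are made up:

```shell
#!/bin/bash
# i places the new text before the addressed line
inserted=$(echo "Test Line 2" | sed 'i\
Test Line 1')

# a places the new text after the addressed line
appended=$(echo "Test Line 1" | sed 'a\
Test Line 2')

# every new line except the last one ends with a backslash
multi=$(echo "Line 3" | sed '1i\
Line 1\
Line 2')

printf '%s\n---\n' "$inserted" "$appended" "$multi"
```

Note that the backslash must be the last character on its line; a trailing space after it changes how sed reads the command.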





Changing lines:
-----------------
The change command allows you to change the contents of an entire line of text in the data stream. 


$ sed '3c\
> This is a changed line of text.' data6.txt





$ sed '/number 3/c\
> This is a changed' data6.txt



$ sed '/number 1/c\
> This is a changed line of text.' data8.txt



$ sed '2,3c\
> This is a new line of text.' data6.txt 



This is line number 1.
This is a new line of text.
This is line number 4.
$




Transforming characters:
---------------------------
The transform command (y) is the only sed editor command that operates on a single character. 
The transform command uses the format:

[address]y/inchars/outchars/


$ echo "This 1 is a test of 1 try." | sed 'y/123/456/' 
This 4 is a test of 4 try.
$


Printing revisited:
--------------------
■ The p command to print a text line

With the -n option,
you can suppress all the other lines and print only the line that contains the matching text pattern.

$ echo "this is a test" | sed 'p'

$ cat data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
$
$ sed -n '/number 3/p' data6.txt
This is line number 3.
$

$ sed -n '2,3p' data6.txt

$ sed -n '/3/{
> p
> s/line/test/p
> }' data6.txt

■ The equal sign (=) command to print line numbers

$ sed '=' data1.txt

$ sed -n '/number 4/{ 
> =
> p
> }' data6.txt


■ The l (lowercase L) command to list a line
The list command (l) allows you to print both the text and nonprintable characters in a data stream.




Using files with sed:
-----------------------
Writing to a file :
[address]w filename



$ sed '1,2w test.txt' data6.txt


$ sed -n '/Browncoat/w Browncoats.txt' data11.txt


Reading data from a file:
allows you to insert data contained in a separate file.
The sed editor inserts the text from the file after the address.

[address]r filename

$ cat data12.txt
This is an added line.
This is the second added line. 
$
$ sed '3r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is an added line.
This is the second added line.
This is line number 4.
$

$ sed '/number 2/r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is an added line.
This is the second added line.
This is line number 3.
This is line number 4.
$


$ sed '$r data12.txt' data6.txt

You can use the read command together with a delete command to replace a placeholder in a file with data from another file:
$ sed '/LIST/{
> r data11.txt
> d
> }' notice.std
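
The placeholder technique can be tried without existing data files. In this sketch the file names and contents are made up, and everything is created in a temporary directory:

```shell
#!/bin/bash
workdir=$(mktemp -d)

# made-up data file and form letter containing the LIST placeholder
printf 'Alice\nBob\n' > "$workdir/names.txt"
printf 'Attendees:\nLIST\nEnd of notice\n' > "$workdir/notice.std"

# r queues the file contents after the matching line; d then removes the placeholder line
merged=$(cd "$workdir" && sed '/LIST/{
r names.txt
d
}' notice.std)

echo "$merged"
rm -rf "$workdir"
```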







**************************************

Chapter  20 Regular Expressions

**************************************


Section : What Are Regular Expressions?    

===================================
    
The Linux world has two popular regular expression engines:


■ The POSIX Basic Regular Expression (BRE) engine
■ The POSIX Extended Regular Expression (ERE) engine

    
Some utilities (such as the sed editor) conform only to a subset of the BRE engine specifications.
The POSIX ERE engine is often found in programming languages that rely on regular expressions for text filtering.
The gawk program uses the ERE engine to process its regular expression patterns.




Section : Defining BRE Patterns    

===============================


Plain text:
------------
    $ echo "This is a test" | sed -n '/test/p'
    $ echo "This is a test" | gawk '/test/{print $0}'

    
Special characters:
-------------------    
These special characters are recognized by regular expressions:

.*[]^${}\+?|()

You can’t use these characters by themselves in your text pattern.
To use one of them as a text character, you need to escape it.
The special character that does this is the backslash character (\).

    
    
    
    $ cat data2
The cost is $4.00
$ sed -n '/\$/p' data2

$ echo "\ is a special character" | sed -n '/\\/p'
$ echo "3 / 2" | sed -n '/\//p'


Anchor characters:
------------------
You can use two special characters to anchor a pattern to either the beginning or the end of lines in the data stream.

Starting at the beginning:

The caret character (^) defines a pattern that starts at the beginning of a line of text in the data stream.

$ echo "The book store" | sed -n '/^book/p' 
$ echo "Books are great" | sed -n '/^Book/p'

If you position the caret character in any place other than at the beginning of the pattern, 
it acts like a normal character and not as a special character:

$ echo "This^ is a test" | sed -n '/s^/p'

Looking for the ending:

The dollar sign ($) special character defines the end anchor.
   
    $ echo "This is a good book" | sed -n '/book$/p'
   
    $ sed -n '/^this is a test$/p' data4
   
    
 Combining anchors:
    
You can combine the start and end anchors in a pattern with no text between them to filter blank lines from the data stream:

$ cat data5
This is one test line.

This is another test line.
$ sed '/^$/d' data5
This is one test line.
This is another test line.
$
    


The dot character :
---------------------
The dot special character is used to match any single character except a newline character.


$ sed -n '/.at/p' data6


Character classes:
---------------------
What if you want to limit what characters to match? This is called a character class in regular expressions.
To define a character class, you use square brackets. The brackets should contain any character you want to include in the class.


$ sed -n '/[ch]at/p' data6
$ echo "Yes" | sed -n '/[Yy]es/p'
$ echo "yes" | sed -n '/[Yy]es/p'
$ echo "Yes" | sed -n '/[Yy][Ee][Ss]/p' 
    
If you want to ensure that you match against only five numbers, you need to delineate them somehow, either with spaces,
or as in this example, by showing that they’re at the start and end of the line:

$ sed -n '
> /^[0123456789][0123456789][0123456789][0123456789][0123456789]$/p
> ' data8



Negating character classes
----------------------------
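You can also reverse the effect of a character class: instead of looking for a character contained in the class, you can look for any character that’s not in the class.
To do that, just place a caret character at the beginning of the character class range: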


    $ sed -n '/[^ch]at/p' data6
    
Using ranges
------------
Just specify the first character in the range, a dash, and then the last character in the range.


$ sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' data8


The new pattern [c-h]at matches words where the first letter is between the letter c and the letter h.
$ sed -n '/[c-h]at/p' data6


You can also specify multiple, non-continuous ranges in a single character class:
$ sed -n '/[a-ch-m]at/p' data6


Special character classes
--------------------------
In addition to defining your own character classes, the BRE contains special character classes you can use to match against specific types of characters.
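
As a brief sketch, three of the POSIX bracket classes ([[:digit:]], [[:upper:]], and [[:space:]]) are used below with sed; the sample input lines are made up:

```shell
#!/bin/bash
# [[:digit:]] matches any numeric digit
digits=$(printf 'abc\nabc123\n' | sed -n '/[[:digit:]]/p')

# [[:upper:]] matches any uppercase letter; here it is anchored to the start of the line
caps=$(printf 'hello\nHello\n' | sed -n '/^[[:upper:]]/p')

# [[:space:]] matches any whitespace character
spaced=$(printf 'nospace\nhas space\n' | sed -n '/[[:space:]]/p')

printf '%s\n' "$digits" "$caps" "$spaced"
```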




The asterisk
--------------
Placing an asterisk after a character signifies that the character must appear zero or more times in the text to match the pattern:
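
For instance, in this sketch with made-up input, the pattern ie*k accepts zero, one, or several e characters between the i and the k, but nothing else:

```shell
#!/bin/bash
# ik, iek, and ieeek match; ixk does not, because x is not an e
matches=$(printf 'ik\niek\nieeek\nixk\n' | sed -n '/ie*k/p')
echo "$matches"
```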






Section : Extended Regular Expressions

=======================================
The POSIX ERE patterns include a few additional symbols that are used by some Linux applications and utilities.


The gawk program recognizes the ERE patterns, but the sed editor doesn’t by default (GNU sed accepts them only when started with the -E or -r option).
    
The question mark
-------------------
The question mark indicates that the preceding character can appear zero or one time, but no more:
$ echo "bt" | gawk '/be?t/{print $0}' 
bt    


The plus sign
---------------
 The plus sign indicates that the preceding character can appear one or more times, but must be present at least once.
 $ echo "beeet" | gawk '/be+t/{print $0}'
 
 $ echo "beat" | gawk '/b[ae]+t/{print $0}' 
 beat
 
 Using braces
 ---------------
 Curly braces are available in ERE to allow you to specify a limit on a repeatable regular expression.
  This is often referred to as an interval.
■ m: The regular expression appears exactly m times.
■ m,n: The regular expression appears at least m times, but no more than n times.


In gawk versions before 4.0, the program doesn’t recognize regular expression intervals by default.
With those versions, you must specify the --re-interval command line option for gawk to recognize regular expression intervals; newer versions recognize intervals without it.


$ echo "bet" | gawk --re-interval '/be{1}t/{print $0}'
 $ echo "beet" | gawk --re-interval '/be{1,2}t/{print $0}'
 $ echo "bat" | gawk --re-interval '/b[ae]{1,2}t/{print $0}'
 
 
 The pipe symbol
 ----------------
 The pipe symbol allows you to specify two or more patterns that the regular expression engine uses in a logical OR formula when examining the data stream.
 If any of the patterns match the data stream text, the text passes. 
 
 $ echo "The dog is asleep" | gawk '/cat|dog/{print $0}'
 
 Grouping expressions
 ------------------------
 
 Regular expression patterns can also be grouped by using parentheses. 
 When you group a regular expression pattern, the group is treated like a standard character.    
    
    $ echo "Sat" | gawk '/Sat(urday)?/{print $0}'
Sat


$ echo "cat" | gawk '/(c|b)a(b|t)/{print $0}'



Regular Expressions in Action

===================


Counting directory  files
--------------------------
$ cat countfiles
#!/bin/bash
# count number of files in your PATH 
mypath=$(echo $PATH | sed 's/:/ /g')
 count=0
for directory in $mypath
do
check=$(ls $directory) 
for item in $check
do
count=$[ $count + 1 ] 
done
echo "$directory - $count"
count=0 
done  
$




Validating a phone number
--------------------------
#!/bin/bash
# script to filter out bad phone numbers
gawk --re-interval '/^\(?[2-9][0-9]{2}\)?(| |-|\.)[0-9]{3}( |-|\.)[0-9]{4}/{print $0}'
$
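
To see which inputs such a pattern accepts, it can be run against a few made-up numbers. This sketch swaps in grep -E for gawk (both use ERE syntax), writes the optional first separator as ( |-|\.)? instead of an empty alternative, and adds a closing $ anchor; those three changes are assumptions of the sketch, not part of the script above:

```shell
#!/bin/bash
pattern='^\(?[2-9][0-9]{2}\)?( |-|\.)?[0-9]{3}( |-|\.)[0-9]{4}$'

# the first two numbers are rejected because the area code must start with 2-9
valid=$(printf '%s\n' \
    '000-000-0000' \
    '123-456-7890' \
    '(234)567-8901' \
    '234-567-8901' \
    '234.567.8901' | grep -E "$pattern")

echo "$valid"
```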


Parsing an e-mail address
--------------------------


  


**************************************

Chapter  21 Advanced sed

**************************************


Section :  Looking at Multiline Commands

===================================


The sed editor includes three special commands that you can use to process multiline text:
■ N adds the next line in the data stream to create a multiline group for processing.
■ D deletes a single line in a multiline group.
■ P prints a single line in a multiline group.
    


Navigating the next command
---------------------------
The lowercase n command tells the sed editor to move to the next line of text in the data stream, 
without going back to the beginning of the commands.     


The single-line next command moves the next line of text from the data stream into 
the processing space (called the pattern space) of the sed editor.


The multiline version of the next command (which uses a capital N) 
adds the next line of text to the text already in the pattern space.
The lines of text are still separated by a newline character,
but the sed editor can now treat both lines of text as one line.



the single-line next command:
$ sed '/header/{n ; d}' data1.txt
    


Combining lines of text:
    $ sed '/first/{ N ; s/\n/ / }' data2.txt
$ sed 'N ; s/System.Administrator/Desktop User/' data3.txt

$ sed 'N
> s/System\nAdministrator/Desktop\nUser/ 
> s/System Administrator/Desktop User/
> ' data3.txt

$ sed '
> s/System Administrator/Desktop User/ 
> N
> s/System\nAdministrator/Desktop\nUser/ 
> ' data4.txt





Navigating the multiline delete command:
------------------------------------------
When used after the N command, the single-line delete command (d) deletes the entire pattern space:


$ sed 'N ; /System\nAdministrator/d' data4.txt
The delete command looked for the words System and Administrator in separate lines and 
deleted both of the lines in the pattern space.




Compare this with the multiline delete command (D):
$ sed 'N ; /System\nAdministrator/D' data4.txt

The D command deletes only the first line in the pattern space.
It removes all characters up to and including the newline character.


$ sed '/^$/{N ; /header/D}' data5.txt
This removes a blank line that appears immediately before a line containing the word header, while leaving other blank lines alone.



Navigating the multiline print command:
------------------------------------------
The multiline print command (P),
prints only the first line in a multiline pattern space. 

The D command has a unique feature in that it forces the sed editor to return to the beginning of the script and repeat the commands on the same pattern space (it doesn’t read a new line of text from the data stream).


By including the N command in the command script, 
you can effectively single-step through the pattern space, matching multiple lines together.


Next, by using the P command, you can print the first line, and then using the D command,
you can delete the first line and loop back to the beginning of the script. When you are back at the script’s beginning, the N command reads in the next line of text and starts the process all over again. This loop continues until you reach the end of the data stream.




Section : Holding Space

=========================
The pattern space is an active buffer area that holds the text examined by the sed editor while it processes commands.


You can use the hold space to temporarily hold lines of text while working on other lines in the pattern space.


These are the five commands associated with operating with the hold space:


h, Copies pattern space to hold space
H, Appends pattern space to hold space
g, Copies hold space to pattern space
G, Appends hold space to pattern space
x, Exchanges contents of pattern and hold spaces

$ sed -n '/first/ {h ; p ; n ; p ; g ; p }' data2.txt
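
The same command sequence can be traced on made-up input. In this sketch the matching line is saved in the hold space, printed, followed by the next line, and then printed again from the hold space:

```shell
#!/bin/bash
# h: copy "first" to the hold space     p: print "first"
# n: load the next line ("second")      p: print "second"
# g: copy the hold space back           p: print "first" again
swapped=$(printf 'header\nfirst\nsecond\nlast\n' | sed -n '/first/{h;p;n;p;g;p;}')
echo "$swapped"
```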




Section : Negating a Command
==============================
You can also configure a command to not apply to a specific address or address range in the data stream.


The exclamation mark command (!) is used to negate a command.


$ sed -n '/header/!p' data2.txt


$ sed 'N;
> s/System\nAdministrator/Desktop\nUser/ 
> s/System Administrator/Desktop User/
> ' data4.txt




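The script above misses a match on the last line, because the N command finds no next line to read and the editor stops. With the $!N form shown next, when the sed editor reaches the last line, it doesn’t execute the N command, so the remaining commands still run on that line.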


$ sed '$!N;
> s/System\nAdministrator/Desktop\nUser/
> s/System Administrator/Desktop User/
> ' data4.txt
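
A sketch of the $!N form on made-up text, where the phrase is split across the first two lines and also appears whole on the last line. It assumes GNU sed, which accepts \n in the replacement text:

```shell
#!/bin/bash
input=$(printf 'The System\nAdministrator met\nthe System Administrator\n')

# $!N joins the next line on every line except the last,
# so the single-line substitution still runs on the final line
fixed=$(echo "$input" | sed '$!N
s/System\nAdministrator/Desktop\nUser/
s/System Administrator/Desktop User/')

echo "$fixed"
```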

     
 
   
 Reversing the order of a text file using the hold space:
 $ sed -n '{1!G ; h ; $p }' data2.txt
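
The reversal can be wrapped in a small function and checked on a three-line stream. The function name reverse_lines is hypothetical:

```shell
#!/bin/bash
# reverse_lines is a hypothetical wrapper around the hold-space script
reverse_lines() {
    # 1!G: on every line but the first, append the hold space (the lines seen so far, reversed)
    # h:   save the result back to the hold space
    # $p:  print the accumulated text only at the last line
    sed -n '1!G;h;$p'
}

reversed=$(printf '1\n2\n3\n' | reverse_lines)
echo "$reversed"
```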
    
     
     

Section : Changing the Flow     

============================
 Normally, the sed editor processes commands starting at the top and proceeding toward the end of the script
 (the exception is the D command, which forces the sed editor to return to the top of the script without reading a new line of text).
     
 
 Branching:
 -----------
 The branch command (b) provides a way to skip an entire section of commands for the lines selected by an address, an address pattern, or an address range.
 
  the format of the branch command:
  
  [address]b [label]   
    The address parameter determines which line or lines of data trigger the branch command.
    
    The label parameter defines the location to branch to. 
    Labels start with a colon and can be up to seven characters in length:
    :label2

   
    If the label parameter is not present, the branch command proceeds to the end of the script.
    
    
    $ sed '{2,3b ; s/This is/Is this/ ; s/line./test?/}' data2.txt
    
     If the branch command pattern doesn’t match, 
     the sed editor continues processing commands in the script, including the command after the branch label. 
     
    $ sed '{/first/b jump1 ; s/This is the/No jump on/
> :jump1
> s/This is the/Jump here on/}' data2.txt



    
    
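$ echo "This, is, a, test, to, remove, commas." | sed -n '{
> :start
> s/,//1p
> b start
> }'

Because this branch has no address, it is always taken, so the script loops forever and must be interrupted with Ctrl+C.
Adding a pattern address to the branch command ends the loop once no commas remain: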


$ echo "This, is, a, test, to, remove, commas." | sed -n '{ 
> :start
> s/,//1p
> /,/b start
> }'
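
A non-interactive sketch of the terminating loop. It drops the -n option and the p flag so that only the final result is printed:

```shell
#!/bin/bash
# remove one comma per pass; branch back to :start only while a comma remains
cleaned=$(echo "This, is, a, test, to, remove, commas." | sed ':start
s/,//
/,/b start')

echo "$cleaned"
```

For this particular task, s/,//g would do the same job in one pass; the loop is shown only to illustrate branching.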



Testing
-----------
Like the branch command, the test command (t) modifies the flow of the sed script: it jumps to a label based on the outcome of a substitution command.


Like the branch command, if you don’t specify a label, sed branches to the end of the script if the test succeeds.


$ sed '{
> s/first/matched/
> t
> s/This is the/No match on/
> }' data2.txt




$ echo "This, is, a, test, to, remove, commas. " | sed -n '{
> :start
> s/,//1p
> t start
> }'






Section : Replacing via a Pattern

===================================


$ echo "The cat sleeps in his hat." | sed 's/cat/"cat"/' 
The "cat" sleeps in his hat.
$



$ echo "The cat sleeps in his hat." | sed 's/.at/".at"/g' 
The ".at" sleeps in his ".at".
$



Using the ampersand
--------------------


The ampersand symbol (&) is used to represent the matching pattern in the substitution command.
Whatever text matches the pattern defined, you can use the ampersand symbol to recall it in the replacement pattern.


$ echo "The cat sleeps in his hat." | sed 's/.at/"&"/g' 
The "cat" sleeps in his "hat".
$


Replacing individual words
---------------------------
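The sed editor uses parentheses to define a substring component within the substitution pattern.
In the pattern, you must escape the grouping parentheses with backslashes to mark them as grouping characters rather than normal parentheses.
You can then reference each substring component using a special character in the replacement pattern.
The replacement character consists of a backslash and a number.
The number indicates the substring component’s position.
The sed editor assigns the first component the character \1, the second component the character \2, and so on.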




$ echo "That furry cat is pretty" | sed 's/furry \(.at\)/\1/' 
That cat is pretty
$
$ echo "That furry hat is pretty" | sed 's/furry \(.at\)/\1/' 
That hat is pretty
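
A short sketch with two substring components, to show that \1 and \2 refer to the groups by position. The input text is made up for illustration.

```shell
# \1 holds the first word matching .at, \2 the second; the replacement swaps them.
swapped=$(echo "cat hat" | sed 's/\(.at\) \(.at\)/\2 \1/')
echo "$swapped"
```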



$ echo "1234567" | sed '{
> :start
> s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/ 
> t start
> }'
1,234,567
$


.*[0-9]      any number of characters, ending in a digit
[0-9]\{3\}   a series of three digits






Section : Placing sed Commands in Scripts

=================================


Using wrappers
---------------
$ cat reverse.sh
#!/bin/bash
# Shell wrapper for sed editor script.
# to reverse text file lines.
#
sed -n '{ 1!G ; h ; $p }' "$1"
#
$
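
A sketch of the same wrapper idea as a shell function, so it can be tried without creating a script file. The function name reverse_lines and the sample lines are hypothetical.

```shell
# reverse_lines is a hypothetical stand-in for reverse.sh.
reverse_lines() {
    # 1!G appends the hold space to every line except the first,
    # h copies the pattern space into the hold space,
    # $p prints the accumulated text once the last line is reached.
    sed -n '1!G ; h ; $p' "$1"
}

tmp=$(mktemp)
printf '%s\n' "line one" "line two" "line three" > "$tmp"
reversed=$(reverse_lines "$tmp")
rm -f "$tmp"
printf '%s\n' "$reversed"
```

Quoting "$1" keeps filenames that contain spaces intact.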




Redirecting sed output
------------------------
You can use the dollar sign/parenthesis format, $(), 
to redirect the output of your sed editor command to a variable for use later in the script. 






result=$(echo $factorial | sed '{ 
:start 
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t start
}')
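
The fragment above does not show where $factorial comes from. A minimal sketch, assuming it is computed in a simple loop from a hypothetical input named number:

```shell
# number is a hypothetical input value chosen for illustration.
number=10
factorial=1
counter=1
while [ "$counter" -le "$number" ]; do
    factorial=$(( factorial * counter ))
    counter=$(( counter + 1 ))
done

# Capture the sed output: insert a comma before each trailing group of three digits.
result=$(echo "$factorial" | sed '
:start
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t start
')
echo "The result is $result"
```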












Section : Creating sed Utilities

=================================


Spacing with double lines
---------------------------


The key to this trick is the default value of the hold space.
When you start the sed editor, the hold space contains an empty line. 


$ sed 'G' data2.txt
This is the header line.


This is the first data line.


$
----
$ sed '$!G' data2.txt
This is the header line.


This is the first data line.
$


Spacing files that may have blanks
------------------------------------


$ cat data6.txt
 This is line one. 
 This is line two.
 
 
This is line three.
 This is line four.
$


$ sed '/^$/d ; $!G' data6.txt
This is line one.


This is line two. 


This is line three.


This is line four. 
$


Numbering lines in a file
--------------------------


$ sed '=' data2.txt | sed 'N; s/\n/ /'


$ nl data2.txt


$ cat -n data2.txt
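
A sketch of the two-stage sed approach on made-up input. The first sed prints each line number on its own line; the second joins each number with the line that follows it.

```shell
# '=' emits the line number before each line; N joins the pair,
# and the substitution turns the embedded newline into a space.
numbered=$(printf '%s\n' "first line" "second line" | sed '=' | sed 'N; s/\n/ /')
printf '%s\n' "$numbered"
```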




Printing last lines
---------------------


$ sed -n '$p' data2.txt


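To print more than just the last line, create a rolling window: keep a fixed number of lines in the pattern space and drop the oldest line each time a new one is added. The following script displays the last 10 lines of the file: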


$ sed '{
> :start
> $q ; N ; 11,$D
> b start
> }' data7.txt
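
A sketch of the rolling window on generated input, assuming the seq utility is available. Fifteen numbered lines go in and the last ten come out.

```shell
# N appends the next line; once line 11 is reached, D drops the oldest
# line from the window; $q prints the window and quits at the last line.
last_ten=$(seq 1 15 | sed '
:start
$q
N
11,$D
b start
')
printf '%s\n' "$last_ten"
```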


Deleting lines
---------------


Deleting consecutive blank lines:
$ sed '/./,/^$/!d' data8.txt




Deleting leading blank lines:
$ sed '/./,$!d' data9.txt


Deleting trailing blank lines:


$ sed '{
> :start
> /^\n*$/{$d ; N ; b start } 
> }' data10.txt
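
A sketch comparing the three deletion scripts on one made-up file. The show helper is hypothetical; it marks blank lines as (blank) and joins everything onto one line so the differences are easy to see.

```shell
# show is a hypothetical helper that makes blank lines visible.
show() { sed 's/^$/(blank)/' | paste -s -d ' ' -; }

sample=$(mktemp)
# Two leading blanks, two blanks in the middle, two trailing blanks.
printf '\n\nline one\n\n\nline two\n\n\n' > "$sample"

# Keep only ranges that start at a non-blank line and end at the next blank line.
squeezed=$(sed '/./,/^$/!d' "$sample" | show)
# Delete everything before the first non-blank line.
no_leading=$(sed '/./,$!d' "$sample" | show)
# Collect runs of blank lines and delete the run that reaches the end of the file.
no_trailing=$(sed '
:start
/^\n*$/{
$d
N
b start
}
' "$sample" | show)
rm -f "$sample"

printf '%s\n' "$squeezed" "$no_leading" "$no_trailing"
```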




Removing HTML tags
---------------------


$ sed 's/<.*>//g' data11.txt
$ sed 's/<[^>]*>//g' data11.txt
$ sed 's/<[^>]*>//g ; /^$/d' data11.txt
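
A sketch showing why the first pattern fails, using a made-up line of HTML. The pattern <.*> is greedy and matches from the first < to the last >, which removes the text between tags as well.

```shell
html='<p>This is the <b>first</b> line.</p>'

# Greedy: one match spans the whole line, so nothing is left.
greedy=$(echo "$html" | sed 's/<.*>//g')
# [^>]* cannot cross a closing bracket, so each tag is matched on its own.
careful=$(echo "$html" | sed 's/<[^>]*>//g')

echo "greedy:  [$greedy]"
echo "careful: [$careful]"
```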
    






**************************************

Chapter  22 Advanced gawk

**************************************


Section :  Using Variables 

===================================
  The gawk programming language supports two different types of variables:
  
■ Built-in variables
 
  The field and record separator variables,
 
FIELDWIDTHS, A space-separated list of numbers defining the exact width (in spaces) of each data field
FS , Input field separator character
RS , Input record separator character
OFS , Output field separator character
ORS , Output record separator character

By default, gawk sets the OFS variable to a space.
By default, gawk sets the RS and ORS variables to the newline character.

$ gawk 'BEGIN{FS=","} {print $1,$2,$3}' data1
$ gawk 'BEGIN{FS=","; OFS="-"} {print $1,$2,$3}' data1
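
A sketch of the FS and OFS variables with sample input, assuming data1 holds comma-separated fields like those in the data1 variable. The sketch calls awk because it uses only POSIX features; gawk accepts the same programs.

```shell
# Assumed contents of data1; not shown in the notes above.
data1='data11,data12,data13,data14,data15
data21,data22,data23,data24,data25'

# FS splits the input on commas; the commas in print insert OFS (a space by default).
with_space=$(printf '%s\n' "$data1" | awk 'BEGIN{FS=","} {print $1,$2,$3}')
# Setting OFS changes the separator used between printed fields.
with_dash=$(printf '%s\n' "$data1" | awk 'BEGIN{FS=","; OFS="-"} {print $1,$2,$3}')

printf '%s\n' "$with_space" "$with_dash"
```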
 
$ cat data1b
1005.3247596.37
$ gawk 'BEGIN{FIELDWIDTHS="3 5 2 5"}{print $1,$2,$3,$4}' data1b 
100 5.324 75 96.37

$ gawk 'BEGIN{FS="\n"; RS=""} {print $1,$4}' data2
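
A sketch of multi-line records, assuming data2 holds address blocks separated by blank lines (the sample records here are illustrative). Setting RS to an empty string makes each blank-line-separated block one record, and FS="\n" makes each line a field. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
# Two four-line records separated by a blank line (illustrative data).
phones=$(printf '%s\n' \
    "Riley Mullen" "123 Main Street" "Chicago, IL 60601" "(312)555-1234" \
    "" \
    "Frank Williams" "456 Oak Street" "Indianapolis, IN 46201" "(317)555-9876" |
    awk 'BEGIN{FS="\n"; RS=""} {print $1,$4}')
printf '%s\n' "$phones"
```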

 
  Data variables,
 
  ARGC , The number of command line parameters present
  ARGIND , The index in ARGV of the current file being processed
  ARGV , An array of command line parameters
  FILENAME, The filename of the data file used for input to the gawk program
  FNR , The current record number in the current data file
  NF , The total number of data fields in the current record
  NR , The number of input records processed
  RLENGTH , The length of the substring matched by the match function
  RSTART , The start index of the substring matched by the match function
  ENVIRON , An associative array of the current shell environment variables and their values
 
 
  Remember that the program script doesn’t count as a parameter: ARGV[0] holds the command name (gawk), and the first parameter after the script is ARGV[1].
 
  $ gawk 'BEGIN{print ARGC,ARGV[1]}' data1
  2 data1
$
 
 
  $ gawk '
> BEGIN{
> print ENVIRON["HOME"]
> print ENVIRON["PATH"]
> }'
/home/rich 
/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin 
$
 
  $ gawk 'BEGIN{FS=":"; OFS=":"} {print $1,$NF}' /etc/passwd 
  rich:/bin/bash
  $
 
  $ gawk 'BEGIN{FS=","}{print $1,"FNR="FNR}' data1 data1
data11 FNR=1
data21 FNR=2
data31 FNR=3
data11 FNR=1
data21 FNR=2
data31 FNR=3
$




$ gawk '
> BEGIN {FS=","}
> {print $1,"FNR="FNR,"NR="NR}
> END{print "There were",NR,"records processed"}' data1 data1 
data11 FNR=1 NR=1
data21 FNR=2 NR=2
data31 FNR=3 NR=3
data11 FNR=1 NR=4
data21 FNR=2 NR=5
data31 FNR=3 NR=6
There were 6 records processed
$



 
 
■ User-defined variables
---------------------------
A gawk user-defined variable name can be any number of letters, digits, and underscores, but it can’t begin with a digit. 
It is also important to remember that gawk variable names are case sensitive.


Assigning variables in scripts:


$ gawk '
> BEGIN{
> testing="This is a test" 
> print testing
> }'
This is a test
$


$ gawk 'BEGIN{x=4; x= x * 2 + 3; print x}'







Assigning variables on the command line:


$ cat script1 
BEGIN{FS=","}
{print $n}

$ gawk -f script1 n=2 data1




$ cat script2
BEGIN{print "The starting value is",n; FS=","} 
{print $n}

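Variables assigned as command line parameters are set only after the BEGIN section has run, so they aren’t available there.
The -v command line parameter lets you specify variables that are set before the BEGIN section of code.
It must be placed before the script code in the command line: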
$ gawk -f script2 n=3 data1
$ gawk -v n=3 -f script2 data1
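
A sketch contrasting the two ways of assigning n, with the script2 program passed inline instead of through -f and a made-up data file. The sketch calls awk because it uses only POSIX features; gawk accepts the same command lines.

```shell
tmp=$(mktemp)
printf '%s\n' 'data11,data12,data13' 'data21,data22,data23' > "$tmp"
script='BEGIN{print "The starting value is",n; FS=","} {print $n}'

# n=3 as a parameter: assigned after BEGIN has run, so BEGIN sees an empty n.
late=$(awk "$script" n=3 "$tmp")
# -v n=3: assigned before BEGIN runs.
early=$(awk -v n=3 "$script" "$tmp")
rm -f "$tmp"

printf '%s\n' "--- parameter ---" "$late" "--- -v ---" "$early"
```

In both runs the main section prints the third field, because n is set by the time the first record is read.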







Section : Working with Arrays

===============================


The gawk programming language provides the array feature using associative arrays.
In an associative array, the index value can be any text string.
This is the same concept as hash maps or dictionaries in other programming languages.



Defining array variables
-------------------------


$ gawk 'BEGIN{
> capital["Illinois"] = "Springfield"
> print capital["Illinois"] 
> }'
Springfield
$





$ gawk 'BEGIN{
> var[1] = 34
> var[2] = 3
> total = var[1] + var[2]
> print total
> }'
37
$




Iterating through array variables
-----------------------------------
for (var in array) {
statements
}


$ gawk 'BEGIN{
> var["a"] = 1
> var["g"] = 2
> var["m"] = 3
> var["u"] = 4
> for (test in var)
> {
> print "Index:",test," - Value:",var[test]
> }
> }'
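
A sketch of the same loop with its output captured. The for (var in array) statement does not return indexes in any guaranteed order, so the output is piped through sort here to make it predictable. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
pairs=$(awk 'BEGIN{
    var["a"] = 1
    var["g"] = 2
    var["m"] = 3
    var["u"] = 4
    # test receives each index in turn, not each value.
    for (test in var) {
        print "Index:", test, "- Value:", var[test]
    }
}' | sort)
printf '%s\n' "$pairs"
```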






Deleting array variables
---------------------------
Removing an array index removes both the associative index value and its associated data element from the array:


delete array[index]
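
A small sketch of delete, using the in operator to check whether an index still exists. The array contents are made up. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
remaining=$(awk 'BEGIN{
    var["a"] = 1
    var["g"] = 2
    delete var["g"]
    # The in operator tests for an index without creating it.
    if ("g" in var) print "g is still present"; else print "g was removed"
    print "a =", var["a"]
}')
printf '%s\n' "$remaining"
```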










Section : Using Patterns

=============================


Regular expressions
----------------------
You can use either a Basic Regular Expression (BRE) or an Extended Regular Expression (ERE) to filter which lines in the data stream the program script applies to.


The regular expression must appear before the left brace of the program script that it controls:
  $ gawk 'BEGIN{FS=","} /11/{print $1}' data1
 
 
 


The matching operator
----------------------
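The matching operator allows you to restrict a regular expression to a specific data field in the records.


The matching operator is the tilde symbol (~). You specify it along with the data field variable and the regular expression to match:


$1 ~ /^data/
This expression filters records where the first data field starts with the text data.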

$ gawk 'BEGIN{FS=","} $2 ~ /^data2/{print $0}' data1


$ gawk -F: '$1 ~ /rich/{print $1,$NF}' /etc/passwd


$ gawk -F: '$1 !~ /rich/{print $1,$NF}' /etc/passwd
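
A sketch of the matching operator and its negation (!~) on a made-up excerpt in /etc/passwd format, so the result does not depend on the local system. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
# Hypothetical entries in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
rich:x:500:500:Rich Blum:/home/rich:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin'

# ~ keeps records whose first field matches the expression.
matched=$(printf '%s\n' "$passwd_sample" | awk -F: '$1 ~ /rich/{print $1,$NF}')
# !~ keeps records whose first field does not match.
unmatched=$(printf '%s\n' "$passwd_sample" | awk -F: '$1 !~ /rich/{print $1,$NF}')

printf '%s\n' "$matched" "$unmatched"
```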







Mathematical expressions
--------------------------
You can also use mathematical expressions in the matching pattern. 




■ x == y: Value x is equal to y.
■ x <= y: Value x is less than or equal to y.
■ x < y: Value x is less than y.
■ x >= y: Value x is greater than or equal to y.
■ x > y: Value x is greater than y.






For example, to display all the system users who belong to the root users group (group number 0):
$ gawk -F: '$4 == 0{print $1}' /etc/passwd
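
A sketch of mathematical patterns on a made-up excerpt in /etc/passwd format. The fourth field is the group number and the third is the user ID. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
# Hypothetical entries in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
rich:x:500:500:Rich Blum:/home/rich:/bin/bash'

# Equality test on the group number.
group_zero=$(printf '%s\n' "$passwd_sample" | awk -F: '$4 == 0{print $1}')
# Comparison test on the user ID.
high_uid=$(printf '%s\n' "$passwd_sample" | awk -F: '$3 >= 500{print $1}')

printf '%s\n' "$group_zero" "$high_uid"
```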






Section : Structured Commands

==================================


The if statement
-----------------
if (condition) statement1


if (condition) 
statement1


$ gawk '{if ($1 > 20) print $1}' data4




$ gawk '{
> if ($1 > 20)
> {
> x = $1 * 2
> print x
> }
> }' data4




You can use the else clause on a single line, but you must use a semicolon after the if statement section:
if (condition) statement1; else statement2
$ gawk '{if ($1 > 20) print $1 * 2; else print $1 / 2}' data4
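
A sketch of the single-line if/else with input supplied inline, assuming data4 holds one number per line (the values here are illustrative). The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
# Values above 20 are doubled; all others are halved.
results=$(printf '%s\n' 10 5 13 50 34 |
    awk '{if ($1 > 20) print $1 * 2; else print $1 / 2}')
printf '%s\n' "$results"
```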






The while statement
--------------------


while (condition) 
{
statements 
}
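
A sketch of a while loop that walks the fields of each record and prints their average. The input values are made up. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
averages=$(printf '%s\n' "10 20 30" "2 4 6" | awk '{
    total = 0
    i = 1
    # NF is the number of fields in the current record.
    while (i <= NF) {
        total += $i
        i++
    }
    print "Average:", total / NF
}')
printf '%s\n' "$averages"
```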
 
 
 
The do-while statement
------------------------

do
{
statements
} while (condition)




The for statement
--------------------
for( variable assignment; condition; iteration process)
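
A sketch of the for statement summing the fields of each record. The loop counter is set up, tested, and incremented in one place. The input values are made up. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
totals=$(printf '%s\n' "130 120 135" "160 113 140" | awk '{
    total = 0
    for (i = 1; i <= NF; i++) {
        total += $i
    }
    print "Total:", total
}')
printf '%s\n' "$totals"
```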




Section : Formatted Printing

===============================
The print statement doesn’t give you much control over how gawk displays your data.


The formatted printing command, called printf, lets you specify detailed instructions on how to display data:


printf "format string", var1, var2 . . .



The format specifiers use the following format:
%[modifier]control-letter
 
 
 printf "The answer is: %e\n", x
 
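 In addition to the control letters, you can use three modifiers:
  ■ width: A numeric value that specifies the minimum width of the output field
  ■ prec: A numeric value that specifies the number of digits to the right of the decimal place in floating-point numbers, or the maximum number of characters displayed in a text string
  ■ - (minus sign): Indicates that left justification should be used instead of right justification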
 
$ gawk 'BEGIN{FS="\n"; RS=""} {print $1,$4}' data2
$ gawk 'BEGIN{FS="\n"; RS=""} {printf "%s %s\n", $1, $4}' data2
$ gawk 'BEGIN{FS=","} {printf "%s ", $1} END{printf "\n"}' data1
$ gawk 'BEGIN{FS="\n"; RS=""} {printf "%16s %s\n", $1, $4}' data2
$ gawk 'BEGIN{FS="\n"; RS=""} {printf "%-16s %s\n", $1, $4}' data2


> printf "Average: %5.1f\n",avg
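
A sketch of the width, minus sign, and precision modifiers. The values are illustrative, and the brackets are there only to make the padding visible. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
formatted=$(awk 'BEGIN{
    # Width 16, right-justified by default.
    printf "[%16s]\n", "Riley Mullen"
    # The minus sign left-justifies within the same width.
    printf "[%-16s]\n", "Riley Mullen"
    # Width 5 with one digit after the decimal point.
    printf "Average: %5.1f\n", 128.333
}')
printf '%s\n' "$formatted"
```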








Section : Built-In Functions

===============================


Mathematical functions
------------------------
$ gawk 'BEGIN{x=exp(100); print x}'


String functions
----------------


The gawk String Functions.....




$ gawk 'BEGIN{x = "testing"; print toupper(x); print length(x) }'
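
A sketch exercising a few of the string functions with their output captured. Only functions defined by POSIX awk are used, so the sketch calls awk; gawk accepts the same program.

```shell
strings=$(awk 'BEGIN{
    x = "testing"
    print toupper(x)          # uppercase copy of x
    print length(x)           # number of characters
    print substr(x, 1, 4)     # four characters starting at position 1
    print index(x, "ing")     # position where "ing" starts
    n = split("a,b,c", parts, ",")
    print n, parts[2]         # element count and second element
}')
printf '%s\n' "$strings"
```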




Time functions
----------------






Section :  User-Defined Functions
===================================
You can create your own functions for use in gawk programs.


Defining a function
--------------------


function name([variables]) 
{
statements
}




function printthird() 
{
print $3 
}



function myrand(limit) 
{
return int(limit * rand()) 
}


x = myrand(100)
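
A sketch that defines myrand and checks that its results stay in the expected range. Because rand() returns a value from 0 up to but not including 1, myrand(100) should return an integer from 0 to 99. The seed and the loop count are arbitrary choices for illustration. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
in_range=$(awk '
function myrand(limit) {
    return int(limit * rand())
}
BEGIN{
    srand(42)    # arbitrary seed
    ok = 1
    for (i = 0; i < 1000; i++) {
        x = myrand(100)
        if (x < 0 || x > 99) ok = 0
    }
    print ok
}')
echo "all values in range: $in_range"
```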






Using your functions
-----------------------


When you define a function, 
it must appear by itself before you define any programming sections (including the BEGIN section).


$ gawk '
> function myprint()
> {
> printf "%-16s - %s\n", $1, $4
> }
> BEGIN{FS="\n"; RS=""}
> {
> myprint()
> }' data2





Creating a function library
------------------------------
 gawk provides a way for you to combine your functions into a single library file that you can use in all your gawk programming.
 
 
 $ cat funclib 
 
function myprint() 
{
printf "%-16s - %s\n", $1, $4 
}


function myrand(limit) 
{
return int(limit * rand()) 
}


function printthird() 
{
print $3 
}
$




$ cat script4
BEGIN{ FS="\n"; RS=""} 
{
myprint() 
}




$ gawk -f funclib -f script4 data2


The funclib file contains three function definitions. 
To use them, you need to use the -f command line parameter. 
Unfortunately, you can’t combine the -f command line parameter with an inline gawk script, 
but you can use multiple -f parameters on the same command line.
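
A self-contained sketch of the library approach. It writes a one-function funclib, the script4 program, and a made-up data2 record into a temporary directory, then loads both files with two -f parameters. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
workdir=$(mktemp -d)

cat > "$workdir/funclib" <<'EOF'
function myprint()
{
    printf "%-16s - %s\n", $1, $4
}
EOF

cat > "$workdir/script4" <<'EOF'
BEGIN{ FS="\n"; RS="" }
{
    myprint()
}
EOF

# One illustrative four-line record.
printf '%s\n' "Riley Mullen" "123 Main Street" "Chicago, IL 60601" "(312)555-1234" > "$workdir/data2"

# The library is listed first, then the program that calls it.
listing=$(awk -f "$workdir/funclib" -f "$workdir/script4" "$workdir/data2")
rm -rf "$workdir"
printf '%s\n' "$listing"
```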




Section : Working through a Practical Example

================================================
$ cat scores.txt
Rich Blum,team1,100,115,95
Barbara Blum,team1,110,115,100 
Christine Bresnahan,team2,120,115,118 
Tim Bresnahan,team2,125,112,116
$



#!/bin/bash
# Report the total and average score for each team.
for team in $(gawk -F, '{print $2}' scores.txt | uniq)
do
   gawk -v team="$team" 'BEGIN{FS=","; total=0}
   {
      if ($2==team)
      {
         total += $3 + $4 + $5;
      }
   }
   END {
      # 6 = two players per team times three scores each
      avg = total / 6;
      print "Total for", team, "is", total, ",the average is",avg
   }' scores.txt
done

$
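
An alternative sketch, not the script above: it makes a single pass over the data, keeps per-team totals in associative arrays, and counts the scores instead of assuming six per team. The array names total and count are illustrative. The sketch calls awk, using only POSIX features that gawk also accepts.

```shell
scores='Rich Blum,team1,100,115,95
Barbara Blum,team1,110,115,100
Christine Bresnahan,team2,120,115,118
Tim Bresnahan,team2,125,112,116'

report=$(printf '%s\n' "$scores" | awk -F, '
{
    # Fields 3 onward are scores; field 2 is the team name.
    for (i = 3; i <= NF; i++) {
        total[$2] += $i
        count[$2]++
    }
}
END {
    for (team in total) {
        print "Total for", team, "is", total[team], ",the average is", total[team] / count[team]
    }
}' | sort)
printf '%s\n' "$report"
```

The output is sorted because for (team in total) does not guarantee any order.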


