awk Tutorial - Chapter 2

2 Running awk

2.1 Invoking awk

This section explains the command-line options:

-F fs: sets the field separator used to split each record into fields. The default is whitespace (spaces and tabs).

-f source-file: reads the awk program from source-file. This option may be given more than once.

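-i source-file: loads source-file as a library, which is equivalent to writing @include in the program. It works much like -f, with two differences:

1) if the source-file passed with -i has already been loaded, it is not loaded again, whereas -f loads the same file every time it is given;

2) a file loaded with -i is treated as a library, not as the main program, so to actually run something you still need to supply the main program, either with -f or as program text on the command line.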

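-v var=val: assigns a value to a variable before the program starts, so it is already available in BEGIN. Each -v sets exactly one variable; to set several, repeat the option: -v var1=val1 -v var2=val2 ...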

-c: compatibility mode, which disables the GNU extensions so that gawk behaves like traditional awk.

-C: prints the short version of the GNU General Public License and exits.

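-E file: like -f, reads the program text from file, but with two differences:

       (1) it ends option processing: everything after it on the command line is passed directly to the awk program (in ARGV) instead of being interpreted as gawk options;

       (2) command-line variable assignments of the form var=val are not allowed.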
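To make -F, -v and -f concrete, here is a small sketch that assumes a POSIX awk on the PATH. The file names people.txt and report.awk are made up for the demo.

```shell
#!/bin/sh
# Sketch of -F, -v and -f; people.txt and report.awk are hypothetical demo files.
tmp=$(mktemp -d)
printf 'alice:30\nbob:25\n' > "$tmp/people.txt"

# -F ':' splits each record on colons; two separate -v options set two variables.
out_fv=$(awk -F ':' -v prefix='name=' -v suffix=';' \
    '{ print prefix $1 suffix, $2 }' "$tmp/people.txt")
printf '%s\n' "$out_fv"

# -f reads the program text from a file instead of the command line.
printf '{ total += $2 }\nEND { print "total age:", total }\n' > "$tmp/report.awk"
out_f=$(awk -F ':' -f "$tmp/report.awk" "$tmp/people.txt")
printf '%s\n' "$out_f"

rm -r "$tmp"
```

The first command prints "name=alice; 30" and "name=bob; 25"; the second prints "total age: 55".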

 

For the remaining options, see gawk --help or the gawk manual; I may add more here later if needed.

 

2.2 Reading from standard input

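awk can read data from files, and it can also read from standard input through a pipe. To do both in one run, use a single dash - as a file name; it stands for standard input.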

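Example 1: print the contents of data1 followed by the output of ls (NR is the number of records read so far, used here as a line counter).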

$ ls | awk '{print NR,$0}' data1 -

1 hello world.
2 I am edited by ylf.
3 ylf is a boy,who now is a student in the university.
4 he is so fool.
5 I hope I can learn awk program, and in this way, another purpose, learn Englesh.
6
7 bbs-list
8 data1
9 example02.awk

 

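What happens if you drop the pipe and just run awk '{print NR,$0}' data1 - ? awk prints data1 and then waits for you to type lines on the terminal (press Ctrl-D to finish). Try it and the purpose of - becomes clear.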
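A minimal sketch of the same idea with a throwaway file, assuming any POSIX awk. It also prints FNR, which restarts at 1 for each input, while NR keeps counting across the file and standard input.

```shell
#!/bin/sh
# Sketch: read a named file and then standard input ("-") in one awk run.
# NR counts records across all inputs; FNR restarts at 1 for each input.
tmp=$(mktemp -d)
printf 'first line of file\nsecond line of file\n' > "$tmp/data1"

out=$(printf 'from stdin 1\nfrom stdin 2\n' | awk '{ print NR, FNR, $0 }' "$tmp/data1" -)
printf '%s\n' "$out"

rm -r "$tmp"
```

This prints "1 1 first line of file", "2 2 second line of file", "3 1 from stdin 1" and "4 2 from stdin 2".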

 

2.3 gawk environment variables

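1) First, AWKPATH. This variable is used to find source files: when the source-file given to -f or -i is a bare file name with no directory part, gawk looks for it in the current directory first and then in the directories listed in AWKPATH. Note that inside an awk program, environment variables are read through the ENVIRON array: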

$ awk 'BEGIN{print ENVIRON["AWKPATH"]}'

.:/usr/share/awk

 

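The output shows the search path on this system: the current directory (.) followed by /usr/share/awk. If AWKPATH is not set in the environment, gawk stores its default path in ENVIRON["AWKPATH"], which is what is printed here.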

 

2) AWKLIBPATH

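This variable gives the search path for dynamic extension libraries, that is, the files loaded with the -l option (on Linux these are .so files). Extensions can also be loaded with the @load keyword.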

$ awk 'BEGIN{print ENVIRON["AWKLIBPATH"]}'

/usr/lib/gawk

 

3) I won't explain the other variables here; the description from the gawk manual is quoted below.

A number of other environment variables affect gawk's behavior, but they are more specialized. Those in the following list are meant to be used by regular users.

 

POSIXLY_CORRECT

Causes gawk to switch to POSIX compatibility mode, disabling all traditional and GNU extensions. See Options.

GAWK_SOCK_RETRIES

Controls the number of times gawk will attempt to retry a two-way TCP/IP (socket) connection before giving up. See TCP/IP Networking.

GAWK_MSEC_SLEEP

Specifies the interval between connection retries, in milliseconds. On systems that do not support the usleep() system call, the value is rounded up to an integral number of seconds.

GAWK_READ_TIMEOUT

Specifies the time, in milliseconds, for gawk to wait for input before returning with an error. See Read Timeout.

The environment variables in the following list are meant for use by the gawk developers for testing and tuning. They are subject to change. The variables are:

 

AVG_CHAIN_MAX

The average number of items gawk will maintain on a hash chain for managing arrays.

AWK_HASH

If this variable exists with a value of ‘gst’, gawk will switch to using the hash function from GNU Smalltalk for managing arrays. This function may be marginally faster than the standard function.

AWKREADFUNC

If this variable exists, gawk switches to reading source files one line at a time, instead of reading in blocks. This exists for debugging problems on filesystems on non-POSIX operating systems where I/O is performed in records, not in blocks.

GAWK_NO_DFA

If this variable exists, gawk does not use the DFA regexp matcher for “does it match” kinds of tests. This can cause gawk to be slower. Its purpose is to help isolate differences between the two regexp matchers that gawk uses internally. (There aren't supposed to be differences, but occasionally theory and practice don't coordinate with each other.)

GAWK_STACKSIZE

This specifies the amount by which gawk should grow its internal evaluation stack, when needed.

TIDYMEM

If this variable exists, gawk uses the mtrace() library calls from GNU LIBC to help track down possible memory leaks.

 

2.4 Exit status

exit [return-code] ends the program with the given exit code; by convention 0 means success and any other value indicates an error.

If exit is executed inside BEGIN{}, the program stops and no input is read; however, if there is an END{} rule, its actions are still executed.

$ awk 'BEGIN{n=10
           exit 0}
     {print "In the body\n"}
    END{printf "In the end %d\n",n}'
In the end 10
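To see that no records are read after exit in BEGIN, and that its exit code survives END, here is a sketch that pipes input in anyway and captures the status (assuming a POSIX awk; the status is captured with || so the snippet also works under set -e).

```shell
#!/bin/sh
# Sketch: exit in BEGIN skips all input, END still runs, and the code
# passed to exit becomes awk's exit status.
status=0
out=$(printf 'a\nb\nc\n' | awk 'BEGIN { exit 2 } { n++ } END { print "records read:", n + 0 }') || status=$?
printf '%s (exit status %d)\n' "$out" "$status"
```

This prints "records read: 0 (exit status 2)".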

 

$ ls | awk 'BEGIN{}
     {print "in the body\n" ;exit 1}
     END{print "in the end\n"}
    '
in the body

in the end

 

As we can see, END still runs after exit, so you can think of it as a cleanup handler. But what if END itself calls exit? Compare the following two examples:

$ ls | awk '
          {exit 1;
           print "in the body\n"
          }
          END{
              print "in the end\n"
             }
         '
in the end

 

$ ls | awk '
          {exit 1;
           print "in the body\n"
          }
          END{print "clear OK\n"; exit 1;
              print "in the end\n"
             }
         '
clear OK

 

So when exit is executed inside END, the program terminates immediately: the statements after it in END are skipped.

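How do you check the exit code from the shell? Most readers probably know already, but for completeness (and, honestly, to fill a gap in my own knowledge), here is a quick demonstration :)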

$ awk '
     BEGIN{
           exit 127;
          }
    '
$ echo $?
127
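The exit code is most useful when a script branches on it. A minimal sketch, assuming a POSIX awk: the program exits 0 if some line matches /needle/ and 1 otherwise, so it can sit directly in an if. The function name has_needle is hypothetical.

```shell
#!/bin/sh
# Sketch: use awk's exit status as a yes/no answer in the shell.
# has_needle is a hypothetical helper name for this example.
has_needle() {
    # On a match: remember it and stop reading; END then exits with !found (0).
    # No match: found is unset, so !found is 1.
    awk '/needle/ { found = 1; exit } END { exit !found }'
}

if printf 'hay\nneedle\nhay\n' | has_needle; then r1=found; else r1=missing; fi
if printf 'hay\nhay\n' | has_needle; then r2=found; else r2=missing; fi
printf '%s %s\n' "$r1" "$r2"
```

This prints "found missing".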

 

2.5 Including other source files

Including here means pulling in files with the @include keyword or the -i option.

An example says it best:

$ cat example031
BEGIN{
        print "my No is 1\n"
     }

$ cat example032
@include "example031"
BEGIN{
        print "my No is 2\n"
     }

$ awk -f example032
my No is 1

my No is 2

 

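As we can see, it is as if the code of example031 had been pasted in place of the @include line; that is what including means. This is very handy for splitting a large collection of functions into separate files, and includes can be nested, which makes it even more flexible. Note the syntax: @include "path/filename". If only a file name is given, gawk searches the current directory and the directories listed in AWKPATH.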

 

2.6 Loading dynamic extension libraries

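These are the files loaded with @load or -l, which are searched for in AWKLIBPATH. For example, here we load the fork extension and use it to create a child process: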

$ cat example04
@load "fork"
BEGIN{
        if((pid=fork())<0)
                print "fork error: " ERRNO
        else if(pid==0){
                # in the child, fork() returns 0
                printf "in the child process;fork() returned %d\n",pid;
            }
        else{
                # in the parent, fork() returns the child's PID
                printf "parent wait for child exit\n";
                ret=wait();   # returns the PID of the child that exited, not its exit status
                printf "in the parent process;child pid=%d\n",pid;
                printf " wait() returned %d\n",ret;
            }
     }

$ awk -f example04
parent wait for child exit
in the child process;fork() returned 0
in the parent process;child pid=9616
 wait() returned 9616

 

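The main benefit is reusing shared functions, which reduces code complexity. It resembles -i, but where -i/@include pulls in awk source code, @load/-l loads compiled extensions (shared libraries). These can add functions that plain awk code cannot implement, such as fork(), and can be reused across many projects. Note also that the parent's and child's output may appear in a different order from run to run.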

--------------------------------- Reprint notice ------------------------------

Thanks to the official GNU tutorial; copyright belongs to GNU.                                                        元子 (元子_speed on Sina Weibo)

----------------------------------------------------------------------------
