Embedding AWK in a C Program

[Image: comic by Julia Evans introducing AWK]

Introduction

AWK is a language of considerable influence and prestigious lineage. For the shortest (and funniest) of introductions, check the above comic by fellow Montrealer Julia Evans. AWK is an easy-to-use scripting language created by Alfred Aho (of Dragon Book fame), Peter Weinberger, and Brian Kernighan (of K&R fame). With such illustrious parents, it's not surprising that AWK quickly became one of the most popular scripting languages. It also promoted innovative concepts, like associative arrays, long before they became mainstream.

AWK is at its best when you have some data that you need to "massage" to fit into a database. Of course, you could run the data through AWK before passing it to the database engine. But if you want to use a small embedded database engine like SQLite and keep everything in the same program, it would be nice to have the scripting language embedded in your C/C++ program. I couldn't find an AWK interpreter that could be embedded in C, so I decided to make my own. This article describes the resulting code. A passing familiarity with AWK helps in following along, but it's not required. If you want to brush up on your AWK skills, you can check this tutorial.

Sample Usage

Let's start with a short example as an "executive summary" of what you can do with the code.

The popular Unix wc program counts the lines, words and bytes in a file. Here is a stripped-down implementation using my AWK library:

#include <awklib.h>

int main (int argc, char **argv)
{
  AWKINTERP* interp = awk_init (NULL);   //initialize an interpreter object
  awk_setprog (interp,                   //set the AWK program
    "{wc += NF; bc += length($0)}\n"
    "END {print NR, wc, bc, ARGV[1]}");
  awk_compile (interp);                  //compile the program
  if (argc > 1)
    awk_addarg (interp, argv[1]);        //add our argument as an interpreter argument
  awk_exec (interp);                     //execute AWK code
  awk_end (interp);                      //cleanup everything
}

The program outlines the basic steps you have to follow to use the AWK library:

  • First, you must create an interpreter object. All the other API functions take an interpreter as the first argument.
  • You pass a program to the interpreter. As we will see, there are other ways of passing programs to the interpreter, but this is the simplest one.
  • The program must be "compiled". The interpreter doesn't really compile it, but it produces a syntax tree that will be used in the execution phase.
  • You can add input files to be processed by the AWK program. In this case, we add argv[1], the first argument that was received by the C program.
  • The AWK program gets executed. By default, the output is sent to stdout.
  • In the end, the interpreter object is deleted.

Design Decisions

The first question I had to answer was which version of the AWK code to use. There are many variants of AWK: gawk, mawk, etc. In the end, I decided to use Dr. Kernighan's original onetrueawk. The code can be found at https://github.com/onetrueawk/awk and it gives the sensation of being in a computer science museum. Just one example: one of the test files seems to be the /etc/passwd file of a Unix system. It goes like this:

/dev/rrp3:

17379	mel
16693	bwk	me
16116	ken	him	someone else
...
8895	dmr

Assuming bwk stands for Brian W. Kernighan, I'll let you figure out who ken and dmr might be 😊. For added antique flavor, /dev/rrp3: indicates the file was residing on a DEC RP03, RP04 or RP06. The RP06 were huge disks for their time with 128MB of data. Yay!

Given the historical quality of the code, I would have liked to keep it as a pure C project. Unfortunately, this was not possible mainly due to error handling issues. For a stand-alone program, it is perfectly acceptable to bail out in case of error; an embedded interpreter doesn't have this option. My solution was to wrap large portions of code in try...catch blocks. This idea of keeping it as a C product, at least on the outside, is the reason why I didn't organize the API as a C++ object. One part that had to be completely rewritten was the function call mechanism.
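
The error-handling change described above can be pictured with a minimal sketch. It is not the library's code: sketch_parse, parse_number and SKETCH_ERR_FATAL are hypothetical names invented for this illustration. The internal routine throws where a stand-alone tool would simply exit, and the C-style entry point turns the exception into a negative return code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical error code; the library's real codes are not listed in this article.
constexpr int SKETCH_ERR_FATAL = -1;

// Stand-in for an internal routine that, in a stand-alone tool,
// would print a message and call exit() on bad input.
static int parse_number (const std::string& text)
{
  if (text.empty ())
    throw std::runtime_error ("empty input");
  return std::stoi (text);   // throws std::invalid_argument or std::out_of_range
}

// C-style entry point: exceptions are caught here and never reach the host.
extern "C" int sketch_parse (const char *text, int *result)
{
  assert (result != nullptr);
  try {
    *result = parse_number (text ? text : "");
    return 0;
  }
  catch (const std::exception&) {
    return SKETCH_ERR_FATAL;
  }
}
```

The host program only ever sees an integer status, which is what lets the outside of the API stay plain C even though the inside relies on C++ exceptions.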

The API is not very large; I've tried to keep it to a minimum and the plan is to add new functions only if really needed.

API Description

The API is structured around an opaque AWKINTERP object representing the AWK language interpreter. Its life cycle follows a series of irreversible state transitions: initialization, program compilation, program execution and destruction.
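
One way to picture the one-way life cycle is a small state check like the sketch below. The Stage names and both helper functions are hypothetical and only illustrate the idea; whether the real library allows a stage to be skipped (for example, destroying an interpreter right after initialization) is an assumption here, and this sketch accepts any forward move.

```cpp
#include <cassert>

// Hypothetical stage names mirroring the life cycle described above.
enum class Stage { Initialized, Compiled, Executed, Destroyed };

// A transition is accepted only if it moves forward; there is no way back.
bool can_move (Stage from, Stage to)
{
  return static_cast<int> (to) > static_cast<int> (from);
}

// Advance 'current' when allowed; report whether the move happened.
bool advance (Stage *current, Stage to)
{
  assert (current != nullptr);
  if (!can_move (*current, to))
    return false;
  *current = to;
  return true;
}
```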

Access to the interpreter's variables is done through an awksymb structure:

struct awksymb {	
   const char *name;        //variable name
   const char *index;       //array index
   unsigned int flags;      //variable type flags
   double fval;             //numerical value
   char *sval;              //string value
};

The same structure is used to pass parameters to an AWK callable C function (see awk_setfunc API call).

In AWK, variables don't have a defined type or, better said, they are all strings and sometimes get converted to numbers if they are needed for a numerical operation. In the awksymb structure, the flags member indicates if the symbol is a string, in which case the sval member is valid, or a number, in which case the value is in fval.

Arrays are also special in AWK. As I said before, all arrays are associative and are "indexed" by a character string. If the flag AWK_ARRAY is set in the flags member, the variable is an array and the index member represents the array index.
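
To make the role of the flags member more concrete, here is a minimal sketch of how a host program might read such a structure. It uses a local copy of the layout (sketch_symb) and made-up flag values (SK_NUM, SK_STR, SK_ARR); the real constants live in the library header and may differ. The string-to-number fallback uses strtod, which is only an approximation of AWK's own conversion rules.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Local mirror of the awksymb layout shown above.
struct sketch_symb {
  const char *name;
  const char *index;
  unsigned int flags;
  double fval;
  char *sval;
};

// Hypothetical flag values, chosen only for this sketch.
constexpr unsigned int SK_NUM = 1, SK_STR = 2, SK_ARR = 4;

// "name" for a scalar, "name[index]" for an array element.
std::string symbol_key (const sketch_symb& s)
{
  assert (s.name != nullptr);
  std::string key = s.name;
  if ((s.flags & SK_ARR) && s.index)
    key += "[" + std::string (s.index) + "]";
  return key;
}

// Numeric view of a symbol: use fval when it is valid, otherwise
// convert the leading numeric part of the string.
double as_number (const sketch_symb& s)
{
  if (s.flags & SK_NUM)
    return s.fval;
  if ((s.flags & SK_STR) && s.sval)
    return std::strtod (s.sval, nullptr);
  return 0.0;
}
```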

Following is a brief description of each API function.

awk_init

AWKINTERP* awk_init (const char **vars);

The function initializes a new AWK interpreter object. It takes an array of variable definitions with the same format as the -v command-line arguments of the stand-alone AWK interpreter. The array is terminated with a NULL pointer.

awk_setprog

int awk_setprog (AWKINTERP* pi, const char *prog);

Set the program text for an interpreter. This function can be called only once for an interpreter.

awk_addprogfile

int awk_addprogfile (AWKINTERP* pi, const char *progfile);

Adds the content of a file as an AWK program. The functionality is equivalent to the -f switch on the command line of the stand-alone interpreter. Just like the -f switch, this function can be called repeatedly to add multiple programs.

awk_compile

int awk_compile (AWKINTERP* pi);

Compiles the AWK language program(s) that have been specified using awk_setprog or awk_addprogfile functions.

awk_addarg

int awk_addarg (AWKINTERP* pi, const char *arg);

Add a new argument to the interpreter. The argument can be an input file name or a variable definition, if it has the syntax var=value. Arguments can be added at any time before starting execution of the AWK program.

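(Example)
AWKINTERP *pi = awk_init (NULL);
awk_setprog (pi, "{print pass+1 \"-\" NR, $0}");
awk_compile (pi);
awk_addarg (pi, "infile.txt");   //first pass, pass is still unset
awk_addarg (pi, "pass=1");       //assignment applied before the next file is read
awk_addarg (pi, "infile.txt");   //second pass
awk_exec (pi);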


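Assuming infile.txt has 25 lines, the file is processed twice. With standard AWK semantics, NR keeps counting across input files, so each output line starts with one of the following prefixes, followed by the text of the record:

1-1
...
1-25
2-26
...
2-50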
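
How an argument is recognized as a variable definition rather than a file name is not shown in this article. A minimal sketch of the usual AWK convention is given below; looks_like_assignment is a hypothetical helper, and the exact rule used by the library is an assumption.

```cpp
#include <cassert>
#include <cctype>

// Returns true when 'arg' looks like name=value, where name starts with a
// letter or underscore and continues with letters, digits or underscores.
bool looks_like_assignment (const char *arg)
{
  assert (arg != nullptr);
  const unsigned char *p = reinterpret_cast<const unsigned char *> (arg);
  if (!std::isalpha (*p) && *p != '_')
    return false;
  while (std::isalnum (*p) || *p == '_')
    ++p;
  return *p == '=';
}
```

Under this rule, "pass=1" is treated as an assignment, while "infile.txt" is treated as an input file because the character that ends the name part is a dot, not an equals sign.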

awk_exec

int awk_exec (AWKINTERP* pi);

Execute a compiled program. The function returns the value returned by exit statement or a negative error code if something went wrong. If a program terminates without an exit statement, the returned value is 0. Small negative values should be considered reserved for error conditions.

(Example)
AWKINTERP *pi = awk_init (NULL);
awk_setprog (pi, "{print NR, $0}");
awk_compile (pi);
awk_addarg (pi, "infile.txt");
awk_exec (pi);


awk_run

int awk_run (AWKINTERP* pi, const char *progfile);

This function combines in one call the calls to awk_setprog, awk_compile and awk_exec functions.

If a program terminates without an exit statement, the returned value is 0. Otherwise, the function returns the value specified in the exit statement. Small negative values should be considered reserved for error conditions. If the program requires any arguments, they can be added using the awk_addarg function before calling awk_run.

(Example)
AWKINTERP *pi = awk_init (NULL);
awk_addarg (pi, "infile.txt");
awk_run (pi, "{print NR, $0}");


awk_end

void awk_end (AWKINTERP* pi);

Releases all memory allocated by the interpreter object.

awk_setinput

int awk_setinput (AWKINTERP* pi, const char *fname);

Forces the interpreter to read input from a file. By default, an interpreter reads from stdin. This function redirects the input to another file.

awk_infunc

Replaces the input function with a user-defined function.

void awk_infunc (AWKINTERP* pi, inproc fn);

The default input function is getc or fgetc. The inproc function takes no arguments and, like getc, returns the next character as an int:

typedef int (*inproc)();

It returns the next character or EOF if there are no more characters.

Here is an example of how to use AWK to process some in-memory data:

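std::istringstream instr{
    "Record 1\n"
    "Record 2\n"
};

AWKINTERP *pi = awk_init (NULL);
awk_setprog (pi, "{print NR, $0}");
awk_compile (pi);
//instr needs static storage (e.g. a global): a capturing lambda does not convert to inproc
awk_infunc (pi, []()->int {return instr.get (); });
awk_exec (pi);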
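
The same idea can be written without streams or lambdas. The sketch below is a stand-alone illustration of the inproc contract: g_text, g_pos, set_text, next_char and count_records are hypothetical names, and count_records merely stands in for whatever loop the interpreter runs over its input function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>    // EOF
#include <string>

typedef int (*inproc)();         // same shape as the typedef shown above

static std::string g_text;       // data to feed to the reader
static std::size_t g_pos = 0;    // current read position

// Load new data and rewind.
void set_text (const std::string& text)
{
  g_text = text;
  g_pos = 0;
}

// Returns the next character as a non-negative int, or EOF when exhausted.
int next_char ()
{
  if (g_pos >= g_text.size ())
    return EOF;
  return static_cast<unsigned char> (g_text[g_pos++]);
}

// Stand-in consumer: counts newline-terminated records read through 'fn'.
int count_records (inproc fn)
{
  assert (fn != nullptr);
  int records = 0;
  for (int c = fn (); c != EOF; c = fn ())
    if (c == '\n')
      ++records;
  return records;
}
```

The cast to unsigned char matters: without it, a byte above 127 could come back negative on platforms where char is signed and be mistaken for EOF.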


awk_setoutput

int awk_setoutput (AWKINTERP* pi, const char *fname);

Redirect interpreter output to a file. By default, the interpreter output goes to stdout. Using this function, you can redirect it to a different file.

(Example)
AWKINTERP *pi = awk_init (NULL);
awk_setprog (pi, "BEGIN {print \"Output redirected\"}");
awk_compile (pi);
awk_setoutput (pi, "results.txt");
awk_exec (pi);


awk_outfunc

void awk_outfunc (AWKINTERP* pi, outproc fn);

Replaces the output function with a user-defined function. The outproc function signature is:

typedef int (*outproc)(const char *buf, size_t len);

(Example)
std::ostringstream out;

int strout (const char *buf, size_t sz)
{
  out.write (buf, sz);
  return out.bad () ? -1 : 1;
}
...
AWKINTERP *pi = awk_init (NULL);
awk_setprog (pi, "BEGIN {print \"Output redirected\"}");
awk_compile (pi);
awk_outfunc (pi, strout);
awk_exec (pi);   //out.str () now holds the program output

awk_getvar

int awk_getvar (AWKINTERP *pi, awksymb* var);

Retrieves the value of an AWK variable.

The function returns 1 if successful or a negative error code otherwise.

If the variable is an array and the index member is NULL, the function returns the AWK_ERR_ARRAY error code.

For string variables, the AWKSYMB_STR flag is set and the function allocates the memory needed for the string by calling malloc. The user has to release the memory by calling free.

(Example)
AWKINTERP *pi = awk_init (NULL);
awksymb var{ "NR" };

awk_setprog (pi, "{print NR, $0}\n");
awk_compile (pi);
awk_getvar (pi, &var);
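
Because the caller owns the malloc'ed string, a C++ host may want to hand it to a smart pointer right away. The sketch below shows one way to do that; FreeDeleter, malloc_string, sketch_make_string and take_string are hypothetical helpers, and sketch_make_string only stands in for an API call that returns malloc'ed memory.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Deleter that releases memory with free(), matching a malloc() allocation.
struct FreeDeleter {
  void operator() (char *p) const { std::free (p); }
};
using malloc_string = std::unique_ptr<char, FreeDeleter>;

// Stand-in for an API that returns a malloc'ed copy of a string.
char *sketch_make_string (const char *value)
{
  assert (value != nullptr);
  std::size_t n = std::strlen (value) + 1;
  char *copy = static_cast<char *> (std::malloc (n));
  if (copy)
    std::memcpy (copy, value, n);
  return copy;
}

// Takes ownership of 'raw', copies it into a std::string and frees the original.
std::string take_string (char *raw)
{
  malloc_string owner (raw);
  return owner ? std::string (owner.get ()) : std::string ();
}
```

With the real API, the pointer passed to take_string would be the sval member filled in by awk_getvar.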


awk_setvar

int awk_setvar (AWKINTERP *pi, awksymb* var);

Changes the value of an AWK variable. The function takes a pointer to an awksymb structure with information about the variable. The user must set the flags member of the awksymb structure to indicate which values are valid (string or numerical). In addition, for array members, the user must specify the index and set the AWKSYMB_ARR flag.

If the variable does not exist, it is created.

(Example)
AWKINTERP *pi = awk_init (NULL);
awksymb v{ "myvar", NULL, AWKSYMB_NUM, 25.0 };
awk_setprog (pi, "{myvar++; print myvar}\n");
awk_compile (pi);
awk_setvar (pi, &v);
awk_exec (pi);  //prints "26" for the first input record


awk_addfunc

Adds a user-defined function to the interpreter.

int awk_addfunc (AWKINTERP *pi, const char *name, awkfunc fn, int nargs);

Parameters:

  • pi - pointer to an interpreter object
  • name - function name
  • fn - pointer to the C function, of type awkfunc
  • nargs - number of function arguments

The function returns 1 if successful or a negative error code otherwise.

External user-defined functions can be called from AWK code just like any AWK user-defined function. The nargs parameter specifies the expected number of parameters but, as with any AWK function, the number of actual arguments can be different. The interpreter will provide null values for any missing parameters. The function prototype is:

typedef void (*awkfunc)(AWKINTERP *pinter, awksymb* ret, int nargs, awksymb* args);

The function can return a value by setting it into the ret variable and setting the appropriate flags. String values must be allocated using malloc.

awk_addfunc should be called only after the AWK program has been compiled.

(Example)
void fact (AWKINTERP *pi, awksymb* ret, int nargs, awksymb* args)
{
  int prod = 1;
  for (int i = 2; i <= args[0].fval; i++)
    prod *= i;
  ret->fval = prod;
  ret->flags = AWKSYMB_NUM;
}
...
awk_setprog (pi, "BEGIN {n = factorial(3); print n}");
awk_compile (pi);
awk_addfunc (pi, "factorial", fact, 1);
awk_exec (pi);   //prints 6
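
The example above returns a number. For a string result, the rule that values must be allocated with malloc comes into play. The following sketch shows the shape of such a callback using local stand-ins: sketch_symb mirrors the awksymb layout, sketch_interp replaces AWKINTERP, and the SK_NUM and SK_STR values are made up for the illustration. It has not been run against the real library.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Local mirror of the awksymb layout from the API description.
struct sketch_symb {
  const char *name;
  const char *index;
  unsigned int flags;
  double fval;
  char *sval;
};

struct sketch_interp;   // opaque stand-in for AWKINTERP

// Hypothetical flag values, chosen only for this sketch.
constexpr unsigned int SK_NUM = 1, SK_STR = 2;

// Same parameter list as awkfunc: returns args[0] repeated args[1] times.
void repeat (sketch_interp *, sketch_symb *ret, int nargs, sketch_symb *args)
{
  assert (ret != nullptr);
  const char *text = (nargs > 0 && args[0].sval) ? args[0].sval : "";
  int count = (nargs > 1 && (args[1].flags & SK_NUM))
                ? static_cast<int> (args[1].fval) : 1;
  if (count < 0)
    count = 0;
  std::size_t len = std::strlen (text);
  char *buf = static_cast<char *> (std::malloc (len * count + 1));
  if (!buf)
    return;                        // leave ret untouched on allocation failure
  for (int i = 0; i < count; i++)
    std::memcpy (buf + i * len, text, len);
  buf[len * count] = '\0';
  ret->sval = buf;                 // ownership passes to whoever reads ret
  ret->flags = SK_STR;
}
```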

Final Thoughts

The source code has been compiled with Visual Studio 2017. There is also a small makefile for gcc. The syntax analyzer uses YACC, so you will need a YACC compiler if you want to do a full rebuild. However, I have included the files generated by YACC (ytab.cpp and ytab.h), so you can build it even if you don't have a YACC compiler.

This concludes the presentation of my embedded AWK interpreter. It can be easily incorporated into a C/C++ program and communicates well with the host program. The host can access any interpreter variable and the interpreter can call external functions defined by the host program. Size-wise, the interpreter is very small. You can expect an overhead of about 100 KB, which is a decent number when compared with other interpreters (Lua takes about twice as much).

I will continue to improve the embedded AWK interpreter. If you want to contribute to this project or just get the latest version, you can find it at https://github.com/neacsum/awk.

Source: https://www.codeproject.com/Articles/5264205/Embedding-AWK-in-a-C-program
