Developing PHP .so Extension Modules in C on Linux/FreeBSD, by Example
2008-02-17 17:27:25 | Category: PHP
When quoting this article, please credit the source: Just Do IT (http://www.toplee.com) < Michael Lee @ toplee.com >
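I first got involved in web development in 1997, and nine years have passed since then. I started with HTML pages built in FrontPage, then learned ASP + Access + IIS, and have been doing web development ever since. After that I worked in turn with ASP + SQL Server + IIS, JSP + Oracle + JRun (Resin/Tomcat), and PHP + Sybase (MySQL) + Apache. In the end I settled on PHP + MySQL + Apache + Linux (BSD), the stack commonly called LAMP. There are many reasons for this, and plenty of people online debate the merits of the various stacks and languages, so I will not go into that. Briefly, these are the main reasons I like LAMP:
1. It is a fully open, free platform;
2. It is simple to pick up, and resources of every kind are plentiful;
3. PHP, MySQL and Apache integrate seamlessly with the underlying Linux (BSD) system and with each other, which makes the stack very efficient;
4. All of them are written in C/C++, the most efficient languages, so performance is dependable;
5. PHP's style is largely the same as C's, and it also borrows many architectural strengths from Java and C++;
6. Most important of all, PHP makes it very easy to write extension modules in C/C++, which gives PHP unlimited extensibility!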
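For these reasons I am very fond of PHP-based architectures, and the last point is the decisive one. I promoted this platform at both Yahoo and mop, and I have some experience extending PHP in C, which I would like to share here in the hope that it prompts others to contribute better ideas.
There are several ways to write a PHP extension module in C. By final form there are two: the module is either compiled directly into PHP, or built as a .so extension module that PHP loads. By build method there are also two: using the phpize tool (available once PHP has been built) or using the ext_skel tool (shipped with PHP). The approach we use most, and the most convenient one, is to write a .so extension module with ext_skel, and that is what this article mainly covers.
In the PHP source tree there is an ext directory (everything I say about PHP here refers to PHP on Linux, not on Windows). Inside ext there is a tool called ext_skel, which makes developing a PHP extension module straightforward: it provides a generic development procedure and a template for extension modules. As an example, we will develop an extension module that converts among the utf8/gbk/gb2312 encodings in PHP. The module will ultimately provide the following function interfaces: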
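(1) string toplee_big52gbk(string s)
Converts the input string from BIG5 to GBK.
(2) string toplee_gbk2big5(string s)
Converts the input string from GBK to BIG5.
(3) string toplee_normalize_name(string s)
Normalizes the input string: converts full-width characters to half-width, trims whitespace, and converts uppercase to lowercase.
(4) string toplee_fan2jian(int code, string s)
Converts the input GBK Traditional Chinese string to Simplified Chinese.
(5) string toplee_decode_utf(string s)
Converts a UTF-encoded string to Unicode.
(6) string toplee_decode_utf_gb(string s)
Converts a UTF-encoded string to GB.
(7) string toplee_decode_utf_big5(string s)
Converts a UTF-encoded string to BIG5.
(8) string toplee_encode_utf_gb(string s)
Converts the input GBK-encoded string to UTF encoding.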
First, change into the ext directory and run the following command:
# ./ext_skel --extname=toplee
This automatically generates a toplee directory under ext, containing the following files:
.cvsignore
CREDITS
EXPERIMENTAL
config.m4
php_toplee.h
tests
toplee.c
toplee.php
The most useful of these are config.m4 and toplee.c.
Next, we edit config.m4:
# vi ./config.m4
Find the lines that look like this:
dnl PHP_ARG_WITH(toplee, for toplee support,
dnl Make sure that the comment is aligned:
dnl [  --with-toplee             Include toplee support])
dnl Otherwise use enable:
dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,
dnl Make sure that the comment is aligned:
dnl [  --enable-toplee           Enable toplee support])
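These lines tell the PHP build which mechanism to use to enable our toplee extension module. We use the --with-toplee form, so we remove the leading dnl from the PHP_ARG_WITH line and from its option line, leaving the following: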
PHP_ARG_WITH(toplee, for toplee support,
dnl Make sure that the comment is aligned:
[  --with-toplee             Include toplee support])
dnl Otherwise use enable:
dnl PHP_ARG_ENABLE(toplee, whether to enable toplee support,
dnl Make sure that the comment is aligned:
dnl [  --enable-toplee           Enable toplee support])
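The key task now is to write toplee.c, the main file of the module. Even if you change nothing, you already have a complete PHP extension module; the file contains code similar to the following: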
PHP_FUNCTION(confirm_toplee_compiled)
{
	char *arg = NULL;
	int arg_len, len;
	char string[256];

	/* "s" means: one string argument, returned as pointer plus length */
	if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &arg, &arg_len) == FAILURE) {
		return;
	}
	len = sprintf(string, "Congratulations! You have successfully modified ext/%.78s/config.m4. Module %.78s is now compiled into PHP.", "toplee", arg);
	RETURN_STRINGL(string, len, 1);
}
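If we later build PHP with the new module included, a PHP script can call the skeleton's test function (ext_skel names it confirm_toplee_compiled()), and it returns the string "Congratulations! You have successfully modified ext/toplee/config.m4. Module toplee is now compiled into PHP."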
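The `%.78s` precision in the skeleton's format string is what keeps the 256-byte buffer safe: the fixed text is 97 bytes long, and two arguments capped at 78 bytes each bring the worst case to 253 bytes plus the terminating NUL. The following is a minimal standalone sketch of that bound. The helper name `format_confirmation` is hypothetical, and it uses `snprintf` rather than the skeleton's `sprintf`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the generated skeleton): formats the
 * confirmation message into buf and returns the number of bytes the
 * message needs, excluding the terminating NUL. */
int format_confirmation(char *buf, size_t size, const char *ext, const char *arg)
{
    assert(buf != NULL && ext != NULL && arg != NULL);
    /* Each %.78s copies at most 78 bytes of its argument, so the result
     * can never exceed 97 + 78 + 78 = 253 bytes. */
    return snprintf(buf, size,
        "Congratulations! You have successfully modified ext/%.78s/config.m4. "
        "Module %.78s is now compiled into PHP.", ext, arg);
}
```

Even when both arguments are far longer than 78 bytes, the formatted message still fits in a 256-byte buffer.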
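Below is our modified toplee.c, which supports the functions and interfaces planned above. Here is the source code of toplee.c (note that the listing names the module gbk internally):
/*
+----------------------------------------------------------------------+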
| PHP Version 4 |
+----------------------------------------------------------------------+
| Copyright (c) 1997-2002 The PHP Group |
+----------------------------------------------------------------------+
| This source file is subject to version 2.02 of the PHP license, |
| that is bundled with this package in the file LICENSE, and is |
| available at through the world-wide-web at |
| http://www.php.net/license/2_02.txt. |
| If you did not receive a copy of the PHP license and are unable to |
| obtain it through the world-wide-web, please send a note to |
| license@php.net so we can mail you a copy immediately. |
+----------------------------------------------------------------------+
| Author: |
+----------------------------------------------------------------------+
$Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "php.h"
#include "php_ini.h"
#include "ext/standard/info.h"
#include "php_gbk.h"
#include "toplee_util.h"
/* If you declare any globals in php_gbk.h uncomment this:
ZEND_DECLARE_MODULE_GLOBALS(gbk)
*/
/* True global resources - no need for thread safety here */
static int le_gbk ;
/* {{{ gbk_functions[]
*
* Every user visible function must have an entry in gbk_functions[].
*/
function_entry gbk_functions [] = {
PHP_FE ( toplee_decode_utf , NULL )
PHP_FE ( toplee_decode_utf_gb , NULL )
PHP_FE ( toplee_decode_utf_big5 , NULL )
PHP_FE ( toplee_encode_utf_gb , NULL )
PHP_FE ( toplee_big52gbk , NULL )
PHP_FE ( toplee_gbk2big5 , NULL )
PHP_FE ( toplee_fan2jian , NULL )
PHP_FE ( toplee_normalize_name , NULL )
{ NULL , NULL , NULL } /* Must be the last line in gbk_functions[] */
} ;
/* }}} */
/* {{{ gbk_module_entry
*/
zend_module_entry gbk_module_entry = {
#if ZEND_MODULE_API_NO >= 20010901
STANDARD_MODULE_HEADER ,
#endif
"gbk",
gbk_functions ,
PHP_MINIT ( gbk ) ,
PHP_MSHUTDOWN ( gbk ) ,
PHP_RINIT ( gbk ) , /* Replace with NULL if there's nothing to do at request start */
PHP_RSHUTDOWN ( gbk ) , /* Replace with NULL if there's nothing to do at request end */
PHP_MINFO ( gbk ) ,
#if ZEND_MODULE_API_NO >= 20010901
"0.1", /* Replace with version number for your extension */
#endif
STANDARD_MODULE_PROPERTIES
} ;
/* }}} */
#ifdef COMPILE_DL_GBK
ZEND_GET_MODULE ( gbk )
#endif
/* {{{ PHP_INI
*/
/* Remove comments and fill if you need to have entries in php.ini*/
PHP_INI_BEGIN ()
PHP_INI_ENTRY("gbk2uni", "", PHP_INI_SYSTEM, NULL)
PHP_INI_ENTRY("uni2gbk", "", PHP_INI_SYSTEM, NULL)
PHP_INI_ENTRY("uni2big5", "", PHP_INI_SYSTEM, NULL)
PHP_INI_ENTRY("big52uni", "", PHP_INI_SYSTEM, NULL)
PHP_INI_ENTRY("big52gbk", "", PHP_INI_SYSTEM, NULL)
PHP_INI_ENTRY("gbk2big5", "", PHP_INI_SYSTEM, NULL)
// STD_PHP_INI_ENTRY("gbk.global_value", "42", PHP_INI_ALL, OnUpdateInt, global_value, zend_gbk_globals, gbk_globals)
// STD_PHP_INI_ENTRY("gbk.global_string", "foobar", PHP_INI_ALL, OnUpdateString, global_string, zend_gbk_globals, gbk_globals)
PHP_INI_END ()
/* }}} */
/* {{{ php_gbk_init_globals
*/
/* Uncomment this function if you have INI entries
static void php_gbk_init_globals(zend_gbk_globals *gbk_globals)
{
gbk_globals->global_value = 0;
gbk_globals->global_string = NULL;
}
*/
/* }}} */
char gbk2uni_file [ 256 ] ;
char uni2gbk_file [ 256 ] ;
char big52uni_file [ 256 ] ;
char uni2big5_file [ 256 ] ;
char gbk2big5_file [ 256 ] ;
char big52gbk_file [ 256 ] ;
//utf file init flag
static int initutf = 0 ;
/* {{{ PHP_MINIT_FUNCTION
*/
PHP_MINIT_FUNCTION ( gbk )
{
/* If you have INI entries, uncomment these lines
ZEND_INIT_MODULE_GLOBALS(gbk, php_gbk_init_globals, NULL);*/
REGISTER_INI_ENTRIES () ;
memset ( gbk2uni_file , 0 , sizeof ( gbk2uni_file )) ;
memset ( uni2gbk_file , 0 , sizeof ( uni2gbk_file )) ;
memset ( big52uni_file , 0 , sizeof ( big52uni_file )) ;
memset ( uni2big5_file , 0 , sizeof ( uni2big5_file )) ;
memset ( gbk2big5_file , 0 , sizeof ( gbk2big5_file )) ;
memset ( big52gbk_file , 0 , sizeof ( big52gbk_file )) ;
/* Copy each configured table path; every buffer was zeroed above, so the result stays NUL-terminated. */
strncpy(gbk2uni_file, INI_STR("gbk2uni"), sizeof(gbk2uni_file) - 1);
strncpy(uni2gbk_file, INI_STR("uni2gbk"), sizeof(uni2gbk_file) - 1);
strncpy(big52uni_file, INI_STR("big52uni"), sizeof(big52uni_file) - 1);
strncpy(uni2big5_file, INI_STR("uni2big5"), sizeof(uni2big5_file) - 1);
strncpy(gbk2big5_file, INI_STR("gbk2big5"), sizeof(gbk2big5_file) - 1);
strncpy(big52gbk_file, INI_STR("big52gbk"), sizeof(big52gbk_file) - 1);
//InitMMResource();
InitResource () ;
if ((uni2gbk_file[0] == '\0') || (uni2big5_file[0] == '\0')
|| (gbk2big5_file[0] == '\0') || (big52gbk_file[0] == '\0')
|| (gbk2uni_file[0] == '\0') || (big52uni_file[0] == '\0'))
{
return FAILURE ;
}
if (gbk2uni_file[0] != '\0')
{
if ( LoadOneCodeTable ( CODE_GBK2UNI , gbk2uni_file ) != NULL )
{
toplee_cleanup_mmap ( NULL ) ;
return FAILURE ;
}
}
if (uni2gbk_file[0] != '\0')
{
if ( LoadOneCodeTable ( CODE_UNI2GBK , uni2gbk_file ) != NULL )
{
toplee_cleanup_mmap ( NULL ) ;
return FAILURE ;
}
}
if (big52uni_file[0] != '\0')
{
if ( LoadOneCodeTable ( CODE_BIG52UNI , big52uni_file ) != NULL )
{
toplee_cleanup_mmap ( NULL ) ;
return FAILURE ;
}
}
if (uni2big5_file[0] != '\0')
{
if ( LoadOneCodeTable ( CODE_UNI2BIG5 , uni2big5_file ) != NULL )
{
toplee_cleanup_mmap ( NULL ) ;
return FAILURE ;
}
}
if (gbk2big5_file[0] != '\0')
{
if ( LoadOneCodeTable ( CODE_GBK2BIG5 , gbk2big5_file ) != NULL )
{
toplee_cleanup_mmap ( NULL ) ;
return FAILURE ;
}
}
if (big52gbk_file[0] != '\0')
{
if ( LoadOneCodeTable ( CODE_BIG52GBK , big52gbk_file ) != NULL )
{
toplee_cleanup_mmap ( NULL ) ;
return FAILURE ;
}
}
initutf = 1 ;
return SUCCESS ;
}
/* }}} */
/* {{{ PHP_MSHUTDOWN_FUNCTION
*/
PHP_MSHUTDOWN_FUNCTION ( gbk )
{
/* uncomment this line if you have INI entries*/
UNREGISTER_INI_ENTRIES () ;
toplee_cleanup_mmap ( NULL ) ;
return SUCCESS ;
}
/* }}} */
/* Remove if there's nothing to do at request start */
/* {{{ PHP_RINIT_FUNCTION
*/
PHP_RINIT_FUNCTION ( gbk )
{
return SUCCESS ;
}
/* }}} */
/* Remove if there's nothing to do at request end */
/* {{{ PHP_RSHUTDOWN_FUNCTION
*/
PHP_RSHUTDOWN_FUNCTION ( gbk )
{
return SUCCESS ;
}
/* }}} */
/* {{{ PHP_MINFO_FUNCTION
*/
PHP_MINFO_FUNCTION ( gbk )
{
php_info_print_table_start () ;
php_info_print_table_header(2, "gbk support", "enabled");
php_info_print_table_end () ;
/* Remove comments if you have entries in php.ini*/
DISPLAY_INI_ENTRIES () ;
}
/* }}} */
/* Remove the following function when you have succesfully modified config.m4
so that your module can be compiled into PHP, it exists only for testing
purposes. */
/* {{{ proto toplee_decode_utf(string s)
*/
PHP_FUNCTION ( toplee_decode_utf )
{
char * s = NULL , * t = NULL ;
int argc = ZEND_NUM_ARGS () ;
int s_len ;
if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
return ;
if ( ! initutf )
RETURN_FALSE
t = strdup ( s ) ;
if ( t == NULL )
RETURN_FALSE
DecodePureUTF ( t , KEEP_UNICODE ) ;
RETVAL_STRING ( t , 1 ) ;
free ( t ) ;
return ;
}
/* }}} */
/* {{{ proto toplee_decode_utf_gb(string s)
*/
PHP_FUNCTION ( toplee_decode_utf_gb )
{
char * s = NULL , * t = NULL ;
int argc = ZEND_NUM_ARGS () ;
int s_len ;
if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
return ;
if ( ! initutf )
RETURN_FALSE
t = strdup ( s ) ;
if ( t == NULL )
RETURN_FALSE
DecodePureUTF ( t , DECODE_UNICODE ) ;
RETVAL_STRING ( t , 1 ) ;
free ( t ) ;
return ;
}
/* }}} */
/* {{{ proto toplee_decode_utf_big5(string s)
*/
PHP_FUNCTION ( toplee_decode_utf_big5 )
{
char * s = NULL , * t = NULL ;
int argc = ZEND_NUM_ARGS () ;
int s_len ;
if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
return ;
if ( ! initutf )
RETURN_FALSE
t = strdup ( s ) ;
if ( t == NULL )
RETURN_FALSE
DecodePureUTF ( t , DECODE_UNICODE | DECODE_BIG5 ) ;
RETVAL_STRING ( t , 1 ) ;
free ( t ) ;
return ;
}
/* }}} */
int EncodePureUTF(unsigned char *strSrc,
		unsigned char *strDst, int nDstLen, int nFlag)
{
	int nRet;
	int i;
	unsigned short c;
	unsigned short *uBuf;
	int nSize;
	int nLen;
	int nReturn;

	nLen = strlen((const char *)strSrc);
	if (nDstLen < nLen * 2 + 1)
		return 0;
	nSize = nLen + 1;
	uBuf = (unsigned short *)emalloc(sizeof(unsigned short) * nSize);
	/* Code page 936 is GBK: convert the input to 16-bit Unicode first. */
	nRet = MultiByteToWideChar(936, 0, (char *)strSrc, nLen, uBuf, nSize);
	nReturn = 0;
	/* Index into uBuf instead of advancing it, so it can be released below. */
	for (i = 0; i < nRet; i++) {
		c = uBuf[i];
		if (c < 0x80) {
			strDst[nReturn++] = (unsigned char)c;
		} else if (c < 0x800) {
			strDst[nReturn++] = (0xc0 | (c >> 6));
			strDst[nReturn++] = (0x80 | (c & 0x3f));
		} else {
			/* c is 16 bits wide, so three bytes is the longest encoding needed. */
			strDst[nReturn++] = (0xe0 | (c >> 12));
			strDst[nReturn++] = (0x80 | ((c >> 6) & 0x3f));
			strDst[nReturn++] = (0x80 | (c & 0x3f));
		}
	}
	strDst[nReturn] = '\0';
	efree(uBuf);
	return nReturn;
}
/* {{{ proto toplee_encode_utf_gb(string s)
*/
PHP_FUNCTION(toplee_encode_utf_gb)
{
	char *s = NULL;
	int argc = ZEND_NUM_ARGS();
	int s_len;
	char *sRet;

	if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
		return;
	if (!initutf)
		RETURN_FALSE;
	sRet = emalloc(strlen(s) * 2 + 1);
	EncodePureUTF((unsigned char *)s, (unsigned char *)sRet, strlen(s) * 2 + 1, 0);
	RETVAL_STRING(sRet, 1);
	efree(sRet); /* RETVAL_STRING(..., 1) made its own copy */
	return;
}
/* }}} */
/* {{{ proto toplee_big52gbk(string s)
*/
PHP_FUNCTION(toplee_big52gbk)
{
	char *s = NULL;
	int argc = ZEND_NUM_ARGS();
	int s_len;
	char *sRet = NULL;

	if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
		return;
	if (!initutf)
		RETURN_FALSE;
	sRet = estrdup(s); /* convert a copy, not the caller's string */
	if (NULL == sRet)
		RETURN_FALSE;
	BIG52GBK(sRet, strlen(sRet));
	RETVAL_STRING(sRet, 1);
	efree(sRet); /* RETVAL_STRING(..., 1) made its own copy */
	return;
}
/* }}} */
/* {{{ proto toplee_gbk2big5(string s)
*/
PHP_FUNCTION(toplee_gbk2big5)
{
	char *s = NULL;
	int argc = ZEND_NUM_ARGS();
	int s_len;
	char *sRet = NULL;

	if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
		return;
	if (!initutf)
		RETURN_FALSE;
	sRet = estrdup(s); /* convert a copy, not the caller's string */
	if (NULL == sRet)
		RETURN_FALSE;
	GBK2BIG5(sRet, strlen(sRet));
	RETVAL_STRING(sRet, 1);
	efree(sRet); /* RETVAL_STRING(..., 1) made its own copy */
	return;
}
/* }}} */
/* {{{ proto toplee_normalize_name(string s)
*/
PHP_FUNCTION(toplee_normalize_name)
{
	char *s = NULL;
	int argc = ZEND_NUM_ARGS();
	int s_len;
	char *sRet = NULL;

	if (zend_parse_parameters(argc TSRMLS_CC, "s", &s, &s_len) == FAILURE)
		return;
	if (!initutf)
		RETURN_FALSE;
	/* NormalizeName edits its argument in place, so work on a copy
	   rather than on the caller's string. */
	sRet = estrdup(s);
	NormalizeName(sRet);
	RETVAL_STRING(sRet, 1);
	efree(sRet);
	return;
}
/* }}} */
/* {{{ proto toplee_fan2jian(int code, string s)
*/
PHP_FUNCTION(toplee_fan2jian)
{
	char *s = NULL;
	int argc = ZEND_NUM_ARGS();
	int s_len;
	long code; /* the "l" specifier stores a long, not an int */
	char *pSource;
	char *pDest1 = NULL, *pDest2 = NULL;
	int nSourceLen, nDestLen;

	if (zend_parse_parameters(argc TSRMLS_CC, "ls", &code, &s, &s_len) == FAILURE)
		return;
	if (!initutf)
		RETURN_FALSE;
	pSource = s;
	nSourceLen = s_len;
	pDest1 = malloc(nSourceLen * 2);
	pDest2 = malloc(nSourceLen + 1);
	if (NULL == pDest1 || NULL == pDest2)
		goto _f2j_err;
	memset(pDest1, 0, nSourceLen * 2);
	memset(pDest2, 0, nSourceLen + 1);
	/* pDest1 holds nSourceLen * 2 bytes, that is, nSourceLen wide characters. */
	nDestLen = MultiByteToWideChar((unsigned int)code, 0, pSource, nSourceLen, (unsigned short *)pDest1, nSourceLen);
	if (0 >= nDestLen)
		goto _f2j_err;
	nDestLen = WideCharToMultiByte((unsigned int)code, 0, (unsigned short *)pDest1, nDestLen, pDest2, nSourceLen, NULL, NULL);
	if (0 >= nDestLen)
		goto _f2j_err;
	RETVAL_STRING(pDest2, 1);
	free(pDest1);
	free(pDest2);
	return;
_f2j_err:
	free(pDest1); /* free(NULL) is a no-op */
	free(pDest2);
	RETURN_FALSE;
}
/* }}} */
/*
* Local variables:
* tab-width: 4
* c-basic-offset: 4
* End:
* vim600: noet sw=4 ts=4 fdm=marker
* vim<600: noet sw=4 ts=4
*/
.
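The byte-packing step inside EncodePureUTF in the listing above can be tried on its own, without PHP. The following is a minimal sketch, assuming 16-bit code units with no surrogate handling; the function name `ucs2_to_utf8` is hypothetical and not part of the module.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encodes one 16-bit Unicode code unit as UTF-8.
 * dst must have room for 3 bytes. Returns the number of bytes written.
 * Surrogate pairs are not handled. */
size_t ucs2_to_utf8(unsigned short c, unsigned char *dst)
{
    assert(dst != NULL);
    if (c < 0x80) {                      /* 0xxxxxxx */
        dst[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                     /* 110xxxxx 10xxxxxx */
        dst[0] = (unsigned char)(0xC0 | (c >> 6));
        dst[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    dst[0] = (unsigned char)(0xE0 | (c >> 12));
    dst[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    dst[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

Because a 16-bit value is always below 0x10000, a four-byte branch would never be reached, which is why the sketch stops at three bytes.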
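In fact, this file defines all the interfaces we set out to implement. What remains is to write the C code that actually implements them; the technical details of that implementation are not discussed here. One key point: you can place all the C source and header files that implement the interfaces defined in toplee.c in the ext/toplee directory, so that toplee.c can use them at compile time (the extra .c files also have to be named in the extension's build configuration so that they are compiled and linked). All of this is standard C. Without further comment, here are several of the files we implemented: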
chn_util.h
#ifndef __CHN_UTIL_H__
#define __CHN_UTIL_H__
#include "common.h"
#define LANG_GB 1
#define LANG_B5 2
#define GB_FULL_COUNT ( 20 + 26 * 2 + 5 + 4 + 26 )
#define B5_FULL_COUNT ( 20 + 26 * 2 + 5 + 4 + 24 )
BOOL FullToHalf ( char * str , int nLang ) ;
void LowerString ( char * str ) ;
void TrimString ( char * str ) ;
#endif // __CHN_UTIL_H__
.
chn_util.c
#include <assert.h>
#include <string.h>
#include "common.h"
#include "chn_util.h"
// 0123456789 !@()-_+'<>
/* Full-width characters; this file must be saved in GBK so that each entry is a two-byte sequence. */
static char *GBFull[GB_FULL_COUNT] =
{ "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
"　", "@", "(", ")", "-", "_", "+", "'", "<", ">",
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k",
"l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v",
"w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G",
"H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R",
"S", "T", "U", "V", "W", "X", "Y", "Z",
"。", "·", ".", "﹒", "&",
"《", "〈", "〉", "》",
"﹐", ",", "﹔", ";", "﹕", ":", "﹖", "?", "﹗", "!", "—",
"‘", "’", "“", "”", "~", "∶", "`", "|", "[", "]", "{",
"}", "#", "$", "%"
};
/* Half-width replacement for the entry at the same index in GBFull. */
static char GBEnHalf[GB_FULL_COUNT + 1] =
"0123456789 @()-_+\'<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
"....&<<>>,,;;::\?\?!!-\'\'\"\"~:`|[]{}#$%";
// ⒈⒉⒊⒋⒌⒍⒎⒏⌒∨∠ˇ≌≈
static char * B5Full [ B5_FULL_COUNT ] =
{ " " , " " , " ⒈ " , " ⒉ " , " ⒊ " , " ⒋ " , " ⒌ " , " ⒍ " , " ⒎ " , " ⒏ " ,
" " , " " , " " , " " , " ⌒ " , " ∨ " , " ∠ " , " ˇ " , " ≌ " , " ≈ " ,
" ㈤ " , " ㈥ " , " ㈦ " , " ㈧ " , " ㈨ " , " ㈩ " , " " , " " , " Ⅰ " , " Ⅱ " , " Ⅲ " ,
" Ⅳ " , " Ⅴ " , " Ⅵ " , " Ⅶ " , " Ⅷ " , " Ⅸ " , " Ⅹ " , " Ⅺ " , " Ⅻ " , " " , " " ,
" " , " " , " " , " " , " ⑾ " , " ⑿ " , " ⒀ " , " ⒁ " , " ⒂ " , " ⒃ " , " ⒄ " ,
" ⒅ " , " ⒆ " , " ⒇ " , " ① " , " ② " , " ③ " , " ④ " , " ⑤ " , " ⑥ " , " ⑦ " , " ⑧ " ,
" ⑨ " , " ⑩ " , " " , " " , " ㈠ " , " ㈡ " , " ㈢ " , " ㈣ " ,
" " , " " , " " , " ?? " , " ‘ " ,
" " , " " , " " , " " ,
" " , " " , " " , " " , " " , " " , " " , " " , " " , " " , " " ,
" ˉ " , " ˇ " , " ¨ " , " 〃 " , " ° " , " " , " " , " " , " " , " " , " … " ,
" " , " "
} ;
static char B5EnHalf[B5_FULL_COUNT + 1] =
"0123456789 @()-_+\'<>abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
"....&<<>>,,;;::\?\?!!-\'\'\"\"~|[]{}#$%";
static int _bFHSortFlag = 0 ;
static void _sorttable ( char * tableFull [] , char * tableHalf , int nSize )
{
int i , j ;
char * p ;
char cTemp ;
for ( i = 0 ; i < nSize ; i ++ )
{
for ( j = i + 1 ; j < nSize ; j ++ )
{
if ( strcmp ( tableFull [ i ] , tableFull [ j ]) < 0 )
{
p = tableFull [ i ] ;
tableFull [ i ] = tableFull [ j ] ;
tableFull [ j ] = p ;
cTemp = tableHalf [ i ] ;
tableHalf [ i ] = tableHalf [ j ] ;
tableHalf [ j ] = cTemp ;
}
}
}
}
BOOL FullToHalf ( char * str , int nCodePage )
{
char * pSrc = str ;
char * pDest = str ;
char ** pFull ;
char * pEnHalf ;
int nCount ;
BOOL bContinue = FALSE ;
int nHigh , nLow , nMid , nResult ;
if ( ! _bFHSortFlag )
{
_sorttable ( GBFull , GBEnHalf , GB_FULL_COUNT ) ;
_sorttable ( B5Full , B5EnHalf , B5_FULL_COUNT ) ;
_bFHSortFlag = 1 ;
}
assert ( NULL != str ) ;
if (( LANG_GB == nCodePage ) || ( 936 == nCodePage ))
{
pFull = GBFull ;
pEnHalf = GBEnHalf ;
nCount = GB_FULL_COUNT ;
}
else if (( LANG_B5 == nCodePage ) || ( 950 == nCodePage ))
{
pFull = B5Full ;
pEnHalf = B5EnHalf ;
nCount = B5_FULL_COUNT ;
}
else
{
assert ( FALSE ) ;
return FALSE ;
}
while ('\0' != *pSrc)
{
if ( 0x81 <= ( BYTE ) * pSrc )
{
// use binary search instead, which is far more efficient (the tables are sorted in descending order)
nLow = 0 ;
nHigh = nCount - 1 ;
while ( nLow <= nHigh )
{
nMid = ( nLow + nHigh ) / 2 ;
nResult = strncmp(pSrc, pFull[nMid], 2);
if ( 0 == nResult )
{
* pDest ++ = pEnHalf [ nMid ] ;
pSrc += 2 ;
bContinue = TRUE ;
break ;
}
if ( nResult > 0 )
nHigh = nMid - 1 ;
else
nLow = nMid + 1 ;
}
if ( ! bContinue )
{
// check for other symbols
if ( ( 0xA1 <= ( BYTE ) * pSrc ) &&
( 0xA9 >= ( BYTE ) * pSrc ) )
{
* pDest ++ = ' ';
pSrc += 2 ;
bContinue = TRUE ;
}
}
/* for (nIndex = 0; nIndex < nCount; nIndex++)
{
assert(NULL != pFull[nIndex]);
if (NULL != pFull[nIndex])
{
if (0 == strncmp(pSrc, pFull[nIndex], 2))
{
*pDest++ = pEnHalf[nIndex]; // convert full to half
pSrc += 2;
bContinue = TRUE;
break;
}
}
}*/
if ( bContinue )
{
bContinue = FALSE ;
continue ;
}
* pDest ++ = * pSrc ++; // copy head char, and the next statement copy tail char
if (*pSrc == '\0')
break ;
}
* pDest ++ = * pSrc ++; // ascii code
}
*pDest = '\0';
return TRUE ;
}
BOOL MyIsDBCSLeadByte ( BYTE TestChar )
{
if (( TestChar > 0X80 ) && ( TestChar < 0xFF ))
return TRUE ;
else
return FALSE ;
}
void LowerString ( char * str )
{
while ( * str )
{
if ( ! MyIsDBCSLeadByte ( * str ))
{
if ((*str >= 'A') && (*str <= 'Z'))
*str = (char)(*str + ('a' - 'A'));
}
else
{
str ++;
if ( !* str )
break ;
}
str ++;
}
return ;
}
BOOL myisspace ( char c )
{
return ((c == ' ') || (c == '\t') || (c == '\r') || (c == '\n'));
}
void TrimString ( char * str )
{
char * pDst ;
char * pSrc ;
char * pLast ;
char cCurrent ;
int nState ;
pLast = pDst = pSrc = str ;
nState = 0 ;
while ( * pSrc )
{
cCurrent =* pSrc ;
switch ( nState )
{
case 0 :
if ( ! myisspace ( cCurrent ))
{
nState = 1 ;
continue ;
}
break ;
case 1 :
if ( myisspace ( cCurrent ))
{
nState = 2 ;
* pDst = cCurrent ;
}
else
{
* pDst = cCurrent ;
pLast = pDst + 1 ;
}
pDst ++;
break ;
case 2 :
if ( myisspace ( cCurrent ))
{
* pDst = cCurrent ;
}
else
{
* pDst = cCurrent ;
pLast = pDst + 1 ;
}
pDst ++;
break ;
}
pSrc ++;
}
*pLast = '\0';
return ;
}
.
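FullToHalf in the listing above relies on `_sorttable` ordering the tables from largest to smallest, so its binary search moves toward lower indexes when the key compares greater than the middle entry. A minimal standalone sketch of that lookup follows; the function name `find_pair_desc` and the ASCII sample keys used to exercise it are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: looks up the two-byte key at the start of src in a
 * table sorted in DESCENDING strcmp order. Returns the index of the match,
 * or -1 if the key is not in the table. */
int find_pair_desc(const char *src, const char *const table[], int count)
{
    int low = 0;
    int high = count - 1;

    assert(src != NULL && table != NULL);
    while (low <= high) {
        int mid = low + (high - low) / 2;
        int cmp = strncmp(src, table[mid], 2);

        if (cmp == 0)
            return mid;
        if (cmp > 0)
            high = mid - 1;   /* larger keys sit at lower indexes */
        else
            low = mid + 1;
    }
    return -1;
}
```

Since `strncmp` compares bytes as unsigned values, the same ordering holds for two-byte GBK or BIG5 sequences as for the ASCII keys used to try it out.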
toplee_util.c
int ToBase64 ( void * pSrc , int nSrcLen , char * strBase64 , int * nBase64Len )
{
static char *v = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
.......... (more than 3000 lines of code in between are omitted from this article) ........
void NormalizeName ( char * p )
{
FullToHalf ( p , CODE_PAGE_GBK ) ;
TrimString ( p ) ;
LowerString ( p ) ;
}
.
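The last two steps of NormalizeName (trimming and lowercasing) can be sketched without the code tables. The sketch below assumes, as MyIsDBCSLeadByte does, that any byte from 0x81 to 0xFE starts a two-byte character whose second byte must be left untouched. The names `trim_blanks` and `lower_dbcs` are hypothetical, and the trimming is written more simply than the state machine in TrimString.

```c
#include <assert.h>
#include <string.h>

static int is_blank(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Hypothetical helper: removes leading and trailing blanks in place. */
void trim_blanks(char *s)
{
    char *start = s;
    char *end;

    assert(s != NULL);
    while (is_blank(*start))
        start++;
    end = start + strlen(start);
    while (end > start && is_blank(end[-1]))
        end--;
    memmove(s, start, (size_t)(end - start));
    s[end - start] = '\0';
}

/* Hypothetical helper: lowercases ASCII letters in place and skips both
 * bytes of every double-byte character. */
void lower_dbcs(char *s)
{
    assert(s != NULL);
    while (*s) {
        unsigned char b = (unsigned char)*s;

        if (b > 0x80 && b < 0xFF) {
            s++;                /* step over the lead byte */
            if (!*s)
                break;          /* truncated character at end of string */
        } else if (b >= 'A' && b <= 'Z') {
            *s = (char)(b + ('a' - 'A'));
        }
        s++;
    }
}
```

Skipping the trail byte matters because a GBK trail byte can fall in the ASCII letter range; lowercasing it would corrupt the character.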
toplee_util.h
#ifndef __TOPLEE_UTIL_INCLUDE__
#define __TOPLEE_UTIL_INCLUDE__ 1
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <string.h>
#include <stdlib.h>
#ifdef LINUX
#include <time.h>
#endif
#include "common.h"
//#include "euc2uni.h"
/*
typedef int BOOL;
*/
#ifndef TRUE
#define TRUE 1
#define FALSE 0
#endif
#define ASCII 0
#define HZ_HEAD 1
#define HZ_TAIL 2
#ifdef BIG_ENDDING
#define DEFAULT_UNICODE 0x3000
#define DEFAULT_GBK_CODE 0xA1A1
#define DEFAULT_BIG5_CODE 0xA140
#else
#define DEFAULT_UNICODE 0x0030
#define DEFAULT_GBK_CODE 0xA1A1
#define DEFAULT_BIG5_CODE 0x40A1
#endif
#define CODE_PAGE_GBK 936
#define CODE_PAGE_BIG5 950
#define CODE_PAGE_EUC 932
#define CHARSET_DEFAULT 0
#define CHARSET_UNICODE 1
#define CHARSET_UTF8 2
// 24066 = ( 0xFE - 0x81 + 1 ) * ( 0xFE - 0x40 + 1)
#define GBK_COUNT 24066
// 16999 = ( 0xF9 - 0xA1 + 1 ) * ( 0xFE - 0x40 + 1)
#define BIG5_COUNT 16999
typedef struct tagMMapFile2
{
BOOL bUsed ;
struct stat finfo ;
void * mm ;
} MMapFile ;
//int LoadEuc2UniTable(char *strFileName);
//void FreeEuc2UniTable(void);
int ToBase64 ( void * pSrc , int nSrcLen , char * strBase64 , int * nBase64Len ) ;
int FromBase64 ( char * strSrc , int nSrcLen , void * pDest , int * nDestLen ) ;
int htmlencode ( char * strInput , int nInputLen , char * strOutBuf , int nOutBufLen ) ;
int MultiByteToWideChar ( unsigned int uCodePage , unsigned long lFlags ,
char * pMultiByteStr , int nMultiByte ,
unsigned short * pWideChar , int nWideChar ) ;
int WideCharToMultiByte ( unsigned int uCodePage , unsigned long dwFlags ,
unsigned short * pWideCharStr , int nWideChar ,
char * pMultiByteStr , int nMultiByte ,
const char * lpDefaultChar , int * lpUseDefaultChar ) ;
#define ASCII 0
#define HZ_HEAD 1
#define HZ_TAIL 2
void GBK2BIG5 ( char * lpString , int cbString ) ;
void BIG52GBK ( char * lpString , int cbString ) ;
void LowerString ( char * str ) ;
void TrimString ( char * str ) ;
void DecodeFormString ( char * str ) ;
void DecodeUTF ( char * str ) ;
#define DECODE_UNICODE 0
#define KEEP_UNICODE 1
#define DECODE_GBK 0
#define DECODE_BIG5 2
int DecodePureUTF ( unsigned char * str , int nFlag ) ;
#define LANG_GB 1 // used by httpstrtoint and FullToHalf
#define LANG_B5 2
#define LANG_ENG 3
#define LANG_UNKNOWN 4
int httpstrtoint ( char * strHttp ) ;
void lowerhttpprefix ( char * strUrl ) ;
#define FULL_COUNT ( 21 + 26 * 2 + 5 )
BOOL FullToHalf ( char * str , int nLang ) ;
#define URLDESCSEPCHAR '|'
char * DescriptFromUrl ( char * strUrl ) ;
#define CODE_GBK2UNI 1
#define CODE_UNI2GBK 2
#define CODE_BIG52UNI 3
#define CODE_UNI2BIG5 4
#define CODE_GBK2BIG5 5
#define CODE_BIG52GBK 6
const char * mmapOneFile ( char * pFileName , MMapFile * mmapfile ) ;
void toplee_cleanup_mmap ( void * dummy ) ;
void InitMMResource ( void ) ;
const char * LoadOneCodeTable ( int nType , char * strFileName ) ;
int getcuryear () ;
char * mstrncpy ( char * strDest , char * strSrc , size_t nCount ) ;
int formurlencode ( char * strInput , int nInputLen , char * strOutBuf , int nOutBufLen ) ;
int wmlencode ( char * strInput , int nInputLen , char * strOutBuf , int nOutBufLen ) ;
int htmlencode ( char * strInput , int nInputLen , char * strOutBuf , int nOutBufLen ) ;
#define MAX_INTERNAL_BUFF 16384
int gb2uni_encode ( char * strInput , int nInputLen , char * strOutBuf , int nOutBufLen ) ;
int unicodeencode ( char * strInput , int nInputLen , char * strOutBuf , int nOutBufLen ) ;
char * stristr ( const char * big , const char * little ) ;
typedef struct auto_string
{
int len , inc_len ;
char * strval ;
} struAutoString ;
#define DEF_INC_LEN ( 1024 )
#define DEF_INT_LEN 12
void init_auto_string ( struAutoString * astr , int inc_len ) ;
int add_auto_string ( struAutoString * astr , char * new_str ) ;
void free_auto_string ( struAutoString * astr ) ;
int unistrcmp ( const char * str1 , int str1len , const char * str2 , int str2len ) ;
void NormalizeName ( char * p ) ;
#endif // __TOPLEE_UTIL_INCLUDE__
.
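The header above declares a growable string type (struAutoString with init_auto_string, add_auto_string and free_auto_string), but its implementation falls in the omitted part of toplee_util.c. Purely as an illustration of how such a buffer is commonly built, here is a sketch under my own assumptions: the `sketch_` names, the extra `cap` field and the use of `realloc` are all hypothetical and are not taken from the author's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; not the author's struAutoString. */
typedef struct sketch_auto_string {
    int len;        /* bytes in use, excluding the terminating NUL */
    int cap;        /* bytes allocated */
    int inc_len;    /* growth step in bytes */
    char *strval;
} sketch_auto_string;

void sketch_init(sketch_auto_string *a, int inc_len)
{
    assert(a != NULL && inc_len > 0);
    a->len = 0;
    a->cap = 0;
    a->inc_len = inc_len;
    a->strval = NULL;
}

/* Appends s. Returns 0 on success, -1 if memory could not be allocated. */
int sketch_add(sketch_auto_string *a, const char *s)
{
    int n;

    assert(a != NULL && s != NULL);
    n = (int)strlen(s);
    if (a->len + n + 1 > a->cap) {
        int need = a->len + n + 1;
        /* Round the capacity up to the next multiple of inc_len. */
        int new_cap = ((need + a->inc_len - 1) / a->inc_len) * a->inc_len;
        char *p = realloc(a->strval, (size_t)new_cap);

        if (p == NULL)
            return -1;
        a->strval = p;
        a->cap = new_cap;
    }
    memcpy(a->strval + a->len, s, (size_t)n + 1);  /* copies the NUL too */
    a->len += n;
    return 0;
}

void sketch_free(sketch_auto_string *a)
{
    assert(a != NULL);
    free(a->strval);
    a->strval = NULL;
    a->len = 0;
    a->cap = 0;
}
```

Growing in fixed steps keeps the number of reallocations small when many short pieces are appended, which is presumably why the header defines DEF_INC_LEN.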
php_toplee.h
/*
+----------------------------------------------------------------------+
| PHP Version 4 |
+----------------------------------------------------------------------+
| Copyright (c) 1997-2002 The PHP Group |
+----------------------------------------------------------------------+
| This source file is subject to version 2.02 of the PHP license, |
| that is bundled with this package in the file LICENSE, and is |
| available at through the world-wide-web at |
| http://www.php.net/license/2_02.txt. |
| If you did not receive a copy of the PHP license and are unable to |
| obtain it through the world-wide-web, please send a note to |
| license@php.net so we can mail you a copy immediately. |
+----------------------------------------------------------------------+
| Author: |
+----------------------------------------------------------------------+
$Id: header,v 1.10 2002/02/28 08:25:27 sebastian Exp $
*/
#ifndef PHP_GBK_H
#define PHP_GBK_H
extern zend_module_entry gbk_module_entry ;
#define phpext_gbk_ptr & gbk_module_entry
#ifdef PHP_WIN32
#define PHP_GBK_API __declspec ( dllexport )
#else
#define PHP_GBK_API
#endif
#ifdef ZTS
#include "TSRM.h"
#endif
PHP_MINIT_FUNCTION ( gbk ) ;
PHP_MSHUTDOWN_FUNCTION ( gbk ) ;
PHP_RINIT_FUNCTION ( gbk ) ;
PHP_RSHUTDOWN_FUNCTION ( gbk ) ;
PHP_MINFO_FUNCTION ( gbk ) ;
PHP_FUNCTION ( confirm_gbk_compiled ) ; /* For testing, remove later. */
PHP_FUNCTION ( toplee_decode_utf ) ;
PHP_FUNCTION ( toplee_decode_utf_gb ) ;
PHP_FUNCTION ( toplee_decode_utf_big5 ) ;
PHP_FUNCTION ( toplee_encode_utf_gb ) ;
PHP_FUNCTION ( toplee_big52gbk ) ;
PHP_FUNCTION ( toplee_gbk2big5 ) ;
PHP_FUNCTION ( toplee_fan2jian ) ;
PHP_FUNCTION ( toplee_normalize_name ) ;
/*
Declare any global variables you may need between the BEGIN
and END macros here:
ZEND_BEGIN_MODULE_GLOBALS(gbk)
int global_value;
char *global_string;
ZEND_END_MODULE_GLOBALS(gbk)
*/
/* In every utility function you add that needs to use variables
in php_gbk_globals, call TSRM_FETCH(); after declaring other
variables used by that function, or better yet, pass in TSRMLS_CC
after the last function argument and declare your utility function
with TSRMLS_DC after the last declared argument. Always refer to
the globals in your function as GBK_G(variable). You are
encouraged to rename these macros something shorter, see
examples in any other php module directory.
*/
#ifdef ZTS
#define GBK_G(v) TSRMG(gbk_globals_id, zend_gbk_globals *, v)
#else
#define GBK_G(v) (gbk_globals.v)
#endif
#endif /* PHP_GBK_H */
/*
* Local variables:
* tab-width: 4
* c-basic-offset: 4
* indent-tabs-mode: t
* End:
*/
.
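That completes all the C code. The module also needs several code table files, such as gb2b5.tab and uni2gb.tab. Their paths are read from the php.ini entries registered in toplee.c (gbk2uni, uni2gbk, uni2big5, big52uni, big52gbk and gbk2big5), and module startup fails if any of them is empty. I am not providing the table files themselves; you can look up documentation on how to generate them, and many such .tab tables can be downloaded online.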
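toplee_util.h declares mmapOneFile and an MMapFile structure, which suggests the code tables are mapped into memory rather than read into a buffer; the implementation itself is in the omitted code. The following is only a sketch of that idea using POSIX `mmap`. The `sketch_` names are hypothetical. Returning NULL on success and an error string on failure mirrors how toplee.c tests the result of LoadOneCodeTable.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical mapping record; not the author's MMapFile. */
typedef struct sketch_mapped_file {
    void *mm;       /* start of the mapping, or NULL when not mapped */
    size_t size;    /* length of the mapping in bytes */
} sketch_mapped_file;

/* Maps the file at path read-only. Returns NULL on success or a static
 * error message on failure. */
const char *sketch_map_file(const char *path, sketch_mapped_file *out)
{
    struct stat st;
    int fd;
    void *p;

    assert(path != NULL && out != NULL);
    out->mm = NULL;
    out->size = 0;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return "cannot open file";
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return "cannot stat file, or file is empty";
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);      /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return "mmap failed";
    out->mm = p;
    out->size = (size_t)st.st_size;
    return NULL;
}

/* Releases a mapping created by sketch_map_file. */
void sketch_unmap_file(sketch_mapped_file *f)
{
    assert(f != NULL);
    if (f->mm != NULL) {
        munmap(f->mm, f->size);
        f->mm = NULL;
        f->size = 0;
    }
}
```

Mapping the tables once at module startup lets every request read them without copying, and the kernel can share the pages between server processes.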
Next, we can build and test the module.
Go back to the root of the PHP source tree and run:
# ./buildconf
# ./configure --with-toplee=shared ...
# make
# make install
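This completes building the module into PHP. Because of the shared argument, the toplee module is compiled to toplee.so, which can be loaded with extension=toplee.so in php.ini or extensions.ini, or loaded dynamically from a PHP script with the dl() function. After that, the function interfaces defined earlier can be used from PHP.
Michael's technical skill is limited, so please point out any mistakes in this article. I also hope it encourages more PHP enthusiasts to share their own valuable experience!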