Modifying HTML with a Native HttpModule in IIS7+

Original article: http://blog.csdn.net/wangjia184/article/details/17919667


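A task landed on my desk unexpectedly: users visiting from countries outside Europe should be served through CDNetworks, but because that CDN's traffic fees are too expensive, users visiting from Europe should be served through a different, local European CDN. To avoid changing too much existing code, the simplest approach is an HttpModule that, before the IIS server returns the HTTP response, finds the domain names of resource URLs in the content and replaces them.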


The problem looks simple at first glance, but it actually involves several difficulties.


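1. Split chunks

When the server sends a response, it splits it into multiple chunks, so the target string may end up divided between two chunks.

For example, if //static.xxxxxxxx.com is the target string, it may straddle a chunk boundary (the original post showed this in a figure), which makes searching quite awkward.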
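To make the difficulty concrete, here is a minimal, portable sketch that models chunks as plain std::string values rather than IIS HTTP_DATA_CHUNK structures. The helper names countOccurrences, countPerChunk and countJoined are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Count non-overlapping occurrences of `pattern` in `text`.
std::size_t countOccurrences(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());
    std::size_t count = 0;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + pattern.size())) {
        ++count;
    }
    return count;
}

// Naive approach: search every chunk in isolation.
std::size_t countPerChunk(const std::vector<std::string>& chunks,
                          const std::string& pattern) {
    std::size_t total = 0;
    for (const std::string& chunk : chunks) {
        total += countOccurrences(chunk, pattern);
    }
    return total;
}

// Reference approach: join all chunks first, then search once.
std::size_t countJoined(const std::vector<std::string>& chunks,
                        const std::string& pattern) {
    std::string joined;
    for (const std::string& chunk : chunks) {
        joined += chunk;
    }
    return countOccurrences(joined, pattern);
}
```

If the chunks are `<img src="//static.xx` and `xxxxxx.com/a.png">`, the per-chunk search finds nothing while the joined search finds one occurrence. Joining is correct but copies the whole response, which is the cost the rest of the article tries to avoid.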



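2. Performance

Because this processing is triggered for every response, it should be implemented as efficiently as possible; otherwise the impact on performance is severe.

In his article "Capturing and Transforming ASP.NET Output with Response.Filter", Rick Strahl solves the first problem with a managed HttpModule that merges the chunks together before processing them. This solves the problem, but it hurts performance considerably. In fact, a chunk in which the target string does not appear does not need to be processed at all.

Second, the string search can use the more efficient Boyer-Moore algorithm.

Finally, I decided to implement the module in unmanaged code, which gives the best efficiency.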
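As a rough illustration of a Boyer-Moore search, the sketch below uses std::boyer_moore_searcher from the C++17 standard library. This is a stand-in chosen to keep the example dependency-free; the article itself uses the Boost implementation. The helper name findAll is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Return the start offsets of all non-overlapping occurrences of
// `pattern` in `text`, located with a Boyer-Moore searcher.
std::vector<std::size_t> findAll(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());
    std::vector<std::size_t> hits;

    // The searcher preprocesses the pattern once and can be reused
    // for any number of searches.
    const std::boyer_moore_searcher<std::string::const_iterator> searcher(
        pattern.begin(), pattern.end());

    std::string::const_iterator it = text.begin();
    for (;;) {
        it = std::search(it, text.end(), searcher);
        if (it == text.end()) {
            break;  // no further match
        }
        hits.push_back(static_cast<std::size_t>(it - text.begin()));
        it += static_cast<std::ptrdiff_t>(pattern.size());  // continue after the match
    }
    return hits;
}
```

Building the searcher once and reusing it for every chunk mirrors what the article does with its single `bm` object.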


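Project setup

Starting with IIS7, unmanaged modules changed from the old ISAPI to C++ modules. First, download the IIS7 Native API (C++) Starter Kit.

The downloaded code is a project for the simplest possible HttpModule. Note that if the server runs something like Windows 2008 R2, IIS basically runs in x64 mode (unless configured otherwise), so the project properties should be changed to x64. Also link statically when compiling, so that no dependency turns out to be missing on the server.

main.cpp contains the exported function RegisterModule: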
HRESULT
__stdcall
RegisterModule(
    DWORD                           dwServerVersion,
    IHttpModuleRegistrationInfo *   pModuleInfo,
    IHttpServer *                   pHttpServer
)
{

    HRESULT                             hr = S_OK;
    CPostProcessHttpModuleFactory  *             pFactory = NULL;

    if ( pModuleInfo == NULL || pHttpServer == NULL )
    {
        hr = HRESULT_FROM_WIN32( ERROR_INVALID_PARAMETER );
        goto Finished;
    }

    // step 1: save the IHttpServer and the module context id for future use
    g_pModuleContext = pModuleInfo->GetId();
    g_pHttpServer = pHttpServer;

    // step 2: create the module factory
    pFactory = new CPostProcessHttpModuleFactory();
    if ( pFactory == NULL )
    {
        hr = HRESULT_FROM_WIN32( ERROR_NOT_ENOUGH_MEMORY );
        goto Finished;
    }

    // step 3: register for server events
    // TODO: register for more server events here
    hr = pModuleInfo->SetRequestNotifications( pFactory, /* module factory */
                                               RQ_SEND_RESPONSE /* server event mask */,
                                               0 /* server post event mask */);
    if ( FAILED( hr ) )
    {
        goto Finished;
    }

    pFactory = NULL;

Finished:
    
    if ( pFactory != NULL )
    {
        delete pFactory;
        pFactory = NULL;
    }   

    return hr;
}

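The most important part is the call to SetRequestNotifications, which registers the events to be handled. You need to understand the meaning and firing order of the events in integrated pipeline mode. Since the goal is to modify the response, registering for the RQ_SEND_RESPONSE notification is enough.

Then override the OnSendResponse method in the class derived from CHttpModule: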

class CPostProcessHttpModule : public CHttpModule
{
public:


	REQUEST_NOTIFICATION_STATUS
	OnSendResponse(
		IN IHttpContext *                       pHttpContext,
		IN ISendResponseProvider *              pProvider
	);


private:
	BOOL StringStartsWith(LPCSTR szText, LPCSTR szPrefix, int nMaxLength = 1024000);
};

REQUEST_NOTIFICATION_STATUS
CPostProcessHttpModule::OnSendResponse(
    IN IHttpContext *                       pHttpContext,
    IN ISendResponseProvider *              pProvider
)
{
    UNREFERENCED_PARAMETER( pHttpContext );
    UNREFERENCED_PARAMETER( pProvider );

	
        
    return RQ_NOTIFICATION_CONTINUE;
}

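At this point the project is set up.


Installation and removal

In a command prompt elevated to administrator privileges, use the following command to install the HTTP module.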
%systemroot%\system32\inetsrv\APPCMD.EXE install module /name:HtmlPostProcessModule /image:G:\IISPostProcessModule\bin\PostProcessModule_x64.dll /add:false
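/image: absolute path of the DLL
/name: name under which the HTTP module is installed
/add: false means install only, without enabling

Then open inetmgr, find the site where the module should be enabled, and go to Modules.


In Modules, click Configure Native Modules and tick the newly installed module in the dialog that pops up. That is all.


To uninstall, use the following command: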

%systemroot%\system32\inetsrv\APPCMD.EXE uninstall module HtmlPostProcessModule
Here HtmlPostProcessModule is the module name given at installation.


Getting the HTTP response

When the server returns a response, the OnSendResponse function is called. The following code iterates over all the chunks:

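REQUEST_NOTIFICATION_STATUS
CPostProcessHttpModule::OnSendResponse(
    IN IHttpContext *                       pHttpContext,
    IN ISendResponseProvider *              pProvider
){
	UNREFERENCED_PARAMETER( pProvider );

	if( pHttpContext ){
		// fetch the response only after the context pointer has been checked
		IHttpResponse * pHttpResponse = pHttpContext->GetResponse();
		HTTP_RESPONSE *pResponseStruct = pHttpResponse ? pHttpResponse->GetRawHttpResponse() : NULL;
		if (pResponseStruct){
			for( int i = 0; i < pResponseStruct->EntityChunkCount; i++) {
				// pEntityChunks is an array; take a pointer to the i-th element
				PHTTP_DATA_CHUNK pChunk = &(pResponseStruct->pEntityChunks[i]);

				if( pChunk->DataChunkType == HttpDataChunkFromMemory ){
					// TODO : process pChunk->FromMemory.pBuffer
				}
			}
		}
	}

	return RQ_NOTIFICATION_CONTINUE;
}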

Note that there are actually several chunk types. Only the memory type is handled here; to handle static files or cached content, add the corresponding handling code.

typedef enum _HTTP_DATA_CHUNK_TYPE
{
    HttpDataChunkFromMemory,
    HttpDataChunkFromFileHandle,
    HttpDataChunkFromFragmentCache,
    HttpDataChunkFromFragmentCacheEx,

    HttpDataChunkMaximum

} HTTP_DATA_CHUNK_TYPE, *PHTTP_DATA_CHUNK_TYPE;

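Searching for the target string

The target string is searched with the highly efficient BM algorithm. The quasi-standard Boost library already has a ready-made implementation that can be used directly.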

#include <boost\algorithm\searching\boyer_moore.hpp>

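// the search target (string literals are const, so use const char *)
const char * szPattern = "//static.xxxxxxxx.com";
const int PATTERN_SIZE = (int)strlen(szPattern);

// Note: the loop below assumes a Boost release in which operator() returns
// an iterator to the match. Newer Boost releases return a pair of iterators
// instead; with those, take .first of the result.
boost::algorithm::boyer_moore<const char*> bm( szPattern, szPattern + PATTERN_SIZE );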

char * pStart = (char *)pChunk->FromMemory.pBuffer;
char * pEnd = pStart + pChunk->FromMemory.BufferLength;

// find out all the appearances
char * pMatch = pStart;
for(;;){
	pMatch = bm( pMatch, pEnd);
	if( !pMatch || pMatch >= pEnd )
		break;

	// TO DO: 
	// pMatch is the matched address of the string

	pMatch += PATTERN_SIZE;
	if( pMatch >= pEnd )
		break;
}

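Handling multiple chunks

When processing multiple chunks, the special case where the target string is split across two chunks has to be considered. The strategy: while processing the first (n-1) of n chunks, check whether the end of the chunk matches some leading part of the target string; if it does, hold that part back and match again before processing the next chunk.
// detect if there could be an incomplete pattern at the end of this chunk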
int nChunkRemaingChars = 0;
if( i < pResponseStruct->EntityChunkCount - 1 ){
	int j = PATTERN_SIZE - 1;
	for( ; j > 0; j--){
		char * pFirst = &pStart[pChunk->FromMemory.BufferLength - j];
		if( StringStartsWith( pFirst, szPattern, j) ){
			nChunkRemaingChars = j;
			dwNewSize -= nChunkRemaingChars; // the end part is moved to next chunk to process
			break;
		}
	}
}
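The tail check above can be sketched in portable C++ as follows. Chunks are modelled as std::string, and the helper name pendingPrefixLength is hypothetical. Unlike the fragment above, this version also clamps the candidate length to the chunk size, so a chunk shorter than the pattern is not read out of bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Return the length of the longest proper prefix of `pattern` that the
// chunk ends with, or 0 if the chunk's tail cannot start a match.
std::size_t pendingPrefixLength(const std::string& chunk, const std::string& pattern) {
    assert(!pattern.empty());
    // A full match would already have been found inside the chunk, so only
    // prefixes shorter than the pattern are interesting here.
    const std::size_t maxLen = std::min(chunk.size(), pattern.size() - 1);
    for (std::size_t len = maxLen; len > 0; --len) {
        // Compare the last `len` characters of the chunk with the
        // first `len` characters of the pattern.
        if (chunk.compare(chunk.size() - len, len, pattern, 0, len) == 0) {
            return len;
        }
    }
    return 0;
}
```

For the chunk `<img src="//static.xx` and the pattern `//static.xxxxxxxx.com` this returns 11, the length of `//static.xx`. Those 11 bytes are what would be held back for the next chunk.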

Modifying a chunk

When a chunk needs to be modified, allocate new memory with IHttpContext::AllocateRequestMemory, then directly change the chunk's pointer and size.
LPBYTE pBuffer = (LPBYTE)pHttpContext->AllocateRequestMemory(dwNewSize);

// TODO : modify the new chunk

// Point the chunk at the new buffer
pChunk->FromMemory.pBuffer = pBuffer;
pChunk->FromMemory.BufferLength = dwNewSize;
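The WriteEntityChunks method is not used here to write the new chunk; the chunk is modified directly instead. According to MSDN, a single chunk written with that method can be at most 65534 in size, whereas with direct modification I have tested writing 650K in one go without any problem.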
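The size passed to AllocateRequestMemory depends on how many replacements are made and on what is carried between chunks. The following sketch isolates that bookkeeping; it is an interpretation of the dwNewSize arithmetic in the complete listing, not code from the article. The function name newChunkSize and its parameters are hypothetical, and signed arithmetic is used so that a replacement shorter than the pattern is easy to follow.

```cpp
#include <cassert>
#include <cstddef>

// Compute the size of the rewritten chunk.
//   oldSize            bytes in the original chunk
//   fullMatches        complete matches found inside this chunk
//   patternLen         length of the search pattern
//   replaceLen         length of the replacement text
//   carriedIn          bytes held back from the end of the previous chunk
//   carriedInCompleted true if carriedIn + the start of this chunk form a match
//   heldBack           bytes at the end of this chunk deferred to the next one
std::size_t newChunkSize(std::size_t oldSize, std::size_t fullMatches,
                         std::size_t patternLen, std::size_t replaceLen,
                         std::size_t carriedIn, bool carriedInCompleted,
                         std::size_t heldBack) {
    long long size = static_cast<long long>(oldSize);
    const long long pat = static_cast<long long>(patternLen);
    const long long rep = static_cast<long long>(replaceLen);
    const long long carried = static_cast<long long>(carriedIn);

    if (carried > 0) {
        if (carriedInCompleted) {
            // The rest of the pattern at the start of this chunk is dropped
            // and the whole replacement is written instead.
            size += rep - (pat - carried);
        } else {
            // False alarm: the held-back bytes are emitted unchanged.
            size += carried;
        }
    }
    size += static_cast<long long>(fullMatches) * (rep - pat);
    size -= static_cast<long long>(heldBack);

    assert(size >= 0);
    return static_cast<std::size_t>(size);
}
```

With a 21-byte pattern and an 18-byte replacement, a 100-byte chunk containing two complete matches and nothing carried in or held back shrinks to 94 bytes.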

Complete code for the core part

BOOL CPostProcessHttpModule::StringStartsWith(LPCSTR szText, LPCSTR szPrefix, int nMaxLength /* = 1024000 */)
{
	for( int i = 0; i < nMaxLength; i++){
		if( szPrefix[i] == 0 )
			return TRUE;
		if( szText[i] != szPrefix[i] )
			return FALSE;
	}
	return TRUE;
}

 
REQUEST_NOTIFICATION_STATUS
CPostProcessHttpModule::OnSendResponse(
    IN IHttpContext *                       pHttpContext,
    IN ISendResponseProvider *              pProvider
)
{
    UNREFERENCED_PARAMETER( pProvider );

	if( pHttpContext )
	{
		// fetch the response only after the context pointer has been checked
		IHttpResponse * pHttpResponse = pHttpContext->GetResponse();
		HTTP_RESPONSE *pResponseStruct = pHttpResponse ? pHttpResponse->GetRawHttpResponse() : NULL;
		if (pResponseStruct)
        {
			PCSTR pszContentType;
			USHORT cchContentType;
			pszContentType = pHttpResponse->GetHeader( HttpHeaderContentType, &cchContentType);
			if( pszContentType )
			{

				if( StringStartsWith( pszContentType, "application/json")  ||
					StringStartsWith( pszContentType, "text/html") )
				{
					// string literals are const, so use const char *
					const char * szPattern = "//static.xxxxxxxx.com";
					const char * szReplace = "//cdn.xxxxxxxx.com";
					const int PATTERN_SIZE = (int)strlen(szPattern);
					const int REPLACE_SIZE = (int)strlen(szReplace);

					int nLastChunkChars = 0;

					boost::algorithm::boyer_moore<const char*> bm( szPattern, szPattern + PATTERN_SIZE );
					for( int i = 0; i < pResponseStruct->EntityChunkCount; i++)
					{
						std::vector<int> lstAppearance;
						char * pStart = NULL; char * pEnd = NULL;
						PHTTP_DATA_CHUNK pChunk = &(pResponseStruct->pEntityChunks[i]);
						if( pChunk->DataChunkType == HttpDataChunkFromMemory )
						{
							if( pChunk->FromMemory.BufferLength > 0 ) {

								pStart = (char *)pChunk->FromMemory.pBuffer;
								pEnd = pStart + pChunk->FromMemory.BufferLength;

								// calculate the new buffer size
								BOOL bHasUncompletedPartner = FALSE;
								DWORD dwNewSize = pChunk->FromMemory.BufferLength;
								BOOL bRequireModification = FALSE; // flag indicating whether this chunk needs to be modified

								// if an incomplete pattern was held back from the end of the last chunk
								if( nLastChunkChars > 0 ){
									// detect if (the end of last chunk + start of this chunk) matches the pattern
									if( StringStartsWith( pStart, szPattern + nLastChunkChars, PATTERN_SIZE - nLastChunkChars) ){
										bHasUncompletedPartner = TRUE;
										dwNewSize = dwNewSize - (PATTERN_SIZE - nLastChunkChars) + REPLACE_SIZE;
									}
									else{
										dwNewSize += nLastChunkChars;
									}
									bRequireModification = TRUE;
								}

								// find out all the appearances
								char * pMatch = pStart;
								if( bHasUncompletedPartner )
									pMatch = pMatch + (PATTERN_SIZE - nLastChunkChars); // skip the rest of the pattern that was completed at the start of this chunk
								for(;;){
									pMatch = bm( pMatch, pEnd);
									if( !pMatch || pMatch >= pEnd )
										break;

									lstAppearance.push_back( (int)(pMatch - pStart) );

									pMatch += PATTERN_SIZE;
									if( pMatch >= pEnd )
										break;
								}
								
								if( !lstAppearance.empty() ){
									dwNewSize += lstAppearance.size() * ( REPLACE_SIZE - PATTERN_SIZE);
									bRequireModification = TRUE;
								}

								// detect if there could be an incomplete pattern at the end of this chunk
								int nChunkRemaingChars = 0;
								if( i < pResponseStruct->EntityChunkCount - 1 ){
									int j = PATTERN_SIZE - 1;
									if( lstAppearance.size() > 0 ){
										std::vector<int>::iterator iter = lstAppearance.end();
										iter--; // the last matched position
										int nRemaining = pChunk->FromMemory.BufferLength - ( *iter + PATTERN_SIZE );
										if( j > nRemaining )
											j = nRemaining;
									}

									for( ; j > 0; j--){
										char * pFirst = &pStart[pChunk->FromMemory.BufferLength - j];
										if( StringStartsWith( pFirst, szPattern, j) ){
											nChunkRemaingChars = j;
											dwNewSize -= nChunkRemaingChars; // the end part is moved to next chunk to process
											bRequireModification = TRUE;
											break;
										}
									}
								}

								if( bRequireModification ) {
									LPBYTE pBuffer = (LPBYTE)pHttpContext->AllocateRequestMemory(dwNewSize);
									ATLASSERT(pBuffer);
									int nOffset = 0; // store the written range of the new buffer
									int nLastEnd = 0; // the end position of the last match in the original buffer
									if( pBuffer ) {

									
										if( bHasUncompletedPartner ){ 
											// insert the replacement text if (the end of last chunk + start of this chunk) matches the pattern
											memcpy_s( pBuffer, dwNewSize, szReplace, REPLACE_SIZE);
											nOffset = REPLACE_SIZE;
											nLastEnd = PATTERN_SIZE - nLastChunkChars;
										}
										else if( nLastChunkChars > 0 ) {
											memcpy_s( pBuffer, dwNewSize, szPattern, nLastChunkChars);
											nOffset = nLastChunkChars;
										}
										nLastChunkChars = nChunkRemaingChars;
									
									
										if( !lstAppearance.empty() ) {
											std::vector<int>::iterator iter;
											for( iter = lstAppearance.begin(); iter != lstAppearance.end(); iter++) {
												int nPos = *iter;
												if( nPos > nLastEnd ) {
													memcpy_s( pBuffer + nOffset, dwNewSize - nOffset, pStart + nLastEnd, nPos - nLastEnd);
													nOffset += nPos - nLastEnd;
												}
												memcpy_s( pBuffer + nOffset, dwNewSize - nOffset, szReplace, REPLACE_SIZE);
												nOffset += REPLACE_SIZE;

												nLastEnd = nPos + PATTERN_SIZE;
											}
										}
										if( nOffset < dwNewSize ){
											memcpy_s( pBuffer + nOffset, dwNewSize - nOffset, pStart + nLastEnd, dwNewSize - nOffset);
										}
										pChunk->FromMemory.pBuffer = pBuffer;
										pChunk->FromMemory.BufferLength = dwNewSize;
									}
								}// if( bRequireModification )
							}// if( pChunk->FromMemory.BufferLength > 0 )
						}// if( pChunk->DataChunkType == HttpDataChunkFromMemory )

					}// for( int i = 0; i < pResponseStruct->EntityChunkCount; i++)
				}
			}

        }
	}

        
    return RQ_NOTIFICATION_CONTINUE;
}
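For readers who want to experiment with the chunk-boundary logic outside IIS, the following is a simplified, portable sketch of the same idea. It is not the listing above: chunks are plain std::string values, std::string::find stands in for the Boyer-Moore search, and every chunk is rebuilt rather than modified only when needed. The function name replaceAcrossChunks is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replace every occurrence of `pattern` with `replacement`, including
// occurrences that straddle chunk boundaries. The number of chunks is kept.
std::vector<std::string> replaceAcrossChunks(const std::vector<std::string>& chunks,
                                             const std::string& pattern,
                                             const std::string& replacement) {
    assert(!pattern.empty());
    std::vector<std::string> out;
    std::string carry;  // tail of the previous chunk that may begin a match

    for (std::size_t i = 0; i < chunks.size(); ++i) {
        const std::string work = carry + chunks[i];
        carry.clear();

        // Replace all complete matches in carry + current chunk.
        std::string result;
        std::size_t pos = 0;
        for (;;) {
            const std::size_t hit = work.find(pattern, pos);
            if (hit == std::string::npos) break;
            result.append(work, pos, hit - pos);
            result += replacement;
            pos = hit + pattern.size();
        }
        std::string rest = work.substr(pos);

        // Except for the last chunk, hold back a tail that is a proper
        // prefix of the pattern so it can be re-examined with the next chunk.
        if (i + 1 < chunks.size()) {
            const std::size_t maxLen = std::min(rest.size(), pattern.size() - 1);
            for (std::size_t len = maxLen; len > 0; --len) {
                if (rest.compare(rest.size() - len, len, pattern, 0, len) == 0) {
                    carry = rest.substr(rest.size() - len);
                    rest.resize(rest.size() - len);
                    break;
                }
            }
        }
        result += rest;
        out.push_back(result);
    }
    return out;
}
```

Given the chunks `<a href="//static.xx` and `xxxxxx.com/x">`, the first output chunk is `<a href="` and the second is `//cdn.xxxxxxxx.com/x">`. If the held-back tail turns out to be a false alarm, it is simply emitted at the start of the next chunk.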

