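This is a brief introduction; the technical details will be covered in a later post.
When the indexing service catalogs a file, it calls the filter that matches the file's type, in the same way that opening a file of a given type requires the matching application.
Windows itself only catalogs a limited set of file types (txt, doc, and so on). To have a new file type cataloged and indexed, you need to register a filter for that type.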
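A filter is an implementation of the IFilter interface. IFilter has five methods to implement: Init, GetChunk, GetText, GetValue, and BindRegion. When the indexing service uses a filter, the first call it makes is Load(LPCWSTR pszFileName, DWORD dwMode). Load belongs to IPersistFile rather than IFilter, so the filter has to implement that interface as well.
After Load, the service goes on to call GetChunk and GetText.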
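In the implementation, GetText(ULONG * pcwcBuffer, WCHAR * awcBuffer) has to copy the file's content into awcBuffer. On entry *pcwcBuffer is the capacity of awcBuffer in wide characters; on return it must hold the number of characters actually written. A file usually consists of several chunks, and each chunk is handed back in several pieces, each small enough to fit in awcBuffer. So there are two jobs to do: 1. in GetChunk, divide the file into chunks of a suitable size; 2. in GetText, divide each chunk into awcBuffer-sized pieces.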
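When the indexing service calls Load, pszFileName is the name of the file currently being indexed. Load is followed by Init, and Init by the first GetChunk. GetText is then called repeatedly until the current chunk has been split, at most *pcwcBuffer characters at a time, into awcBuffer substrings. GetChunk is then called again, and the cycle repeats until the whole file has been split into awcBuffer pieces, at which point GetChunk reports FILTER_E_END_OF_CHUNKS.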
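The two-level splitting described above can be sketched in portable C++, independent of the Windows headers. This is only an illustration of the idea, not the filter's actual code: SplitBySize and SplitFile are hypothetical helper names, and the text is assumed to be already converted to wide characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the IFilter API): cut `text` into
// consecutive pieces of at most `maxLen` characters.
std::vector<std::wstring> SplitBySize(const std::wstring& text, std::size_t maxLen)
{
    assert(maxLen > 0);
    std::vector<std::wstring> pieces;
    for (std::size_t pos = 0; pos < text.size(); pos += maxLen)
        pieces.push_back(text.substr(pos, maxLen));  // substr clamps at the end
    return pieces;
}

// Level 1 plays the role of GetChunk (file -> chunks),
// level 2 the role of GetText (chunk -> buffer-sized pieces).
std::vector<std::vector<std::wstring>> SplitFile(const std::wstring& fileText,
                                                 std::size_t chunkSize,
                                                 std::size_t bufferSize)
{
    std::vector<std::vector<std::wstring>> result;
    for (const std::wstring& chunk : SplitBySize(fileText, chunkSize))
        result.push_back(SplitBySize(chunk, bufferSize));
    return result;
}
```

For example, SplitFile(L"abcdefghij", 4, 3) yields three chunks, the first of which is split into the pieces "abc" and "d". A real filter does this lazily, one GetChunk or GetText call at a time, rather than building the whole structure up front.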
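// IPersistFile::Load (member of the filter class): remember the name of the file to filter.
HRESULT Load(LPCWSTR pszFileName, DWORD dwMode)
{
    ULONG ulChars = (ULONG)wcslen( pszFileName ) + 1;   // +1 for the terminating null
    delete [] m_pwszFileName;
    // std::nothrow (from <new>) makes a failed allocation return a null pointer instead of throwing
    m_pwszFileName = new (std::nothrow) WCHAR [ulChars];
    if ( 0 != m_pwszFileName )
        wcscpy( m_pwszFileName, pszFileName );
    else
        return E_OUTOFMEMORY;
    return S_OK;
}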
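// IFilter::GetChunk (member of the filter class): cut the next chunk out of the file and describe it in *pStat.
SCODE GetChunk(STAT_CHUNK * pStat)
{
    // m_AllText holds the whole file. ReadFromBuffer is this filter's own helper; it is assumed to copy up to
    // TEXT_FILTER_CHUNK_SIZE bytes, starting at m_nPos, into cszBuffer (one chunk) and to store the byte count in m_ulBufferLen.
    ReadFromBuffer(m_AllText, cszBuffer, TEXT_FILTER_CHUNK_SIZE, m_ulBufferLen, m_nPos);
    if ( 0 == m_ulBufferLen )
        return FILTER_E_END_OF_CHUNKS;   // nothing left in the file
    // Convert the chunk to wide characters: query the required length first, then convert.
    // Both calls pass the same byte count so that the two lengths agree.
    nLenOfWideCharStr = MultiByteToWideChar(m_ulCodePage, 0, cszBuffer, (int)m_ulBufferLen, NULL, 0);
    btrans = MultiByteToWideChar(m_ulCodePage, 0, cszBuffer, (int)m_ulBufferLen, m_wcsBuffer, nLenOfWideCharStr);
    if ( 0 == btrans )
        return E_FAIL;                   // conversion failed
    m_ulBufferLen = (ULONG)btrans;       // from here on GetText counts WCHARs, not bytes
    // Fill in the STAT_CHUNK structure
    pStat->idChunk = m_ulChunkID;
    pStat->breakType = CHUNK_NO_BREAK;
    pStat->flags = CHUNK_TEXT;
    pStat->locale = GetSystemDefaultLCID();
    pStat->attribute.guidPropSet = guidStorage;
    pStat->attribute.psProperty.ulKind = PRSPEC_PROPID;
    pStat->attribute.psProperty.propid = PID_STG_CONTENTS;
    pStat->idChunkSource = m_ulChunkID;
    pStat->cwcStartSource = 0;
    pStat->cwcLenSource = 0;
    m_ulCharsRead = 0;                   // GetText starts at the beginning of the new chunk
    m_ulChunkID++;
    return S_OK;
}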
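// IFilter::GetText (member of the filter class): hand back the next piece of the current chunk.
SCODE GetText(ULONG * pcwcBuffer, WCHAR * awcBuffer)
{
    if ( m_ulCharsRead >= m_ulBufferLen )
        return FILTER_E_NO_MORE_TEXT;    // the current chunk has already been returned in full
    // Copy as much as fits: the caller's capacity or what is left of the chunk, whichever is smaller
    ULONG ulToCopy = min( *pcwcBuffer, m_ulBufferLen - m_ulCharsRead );
    memset( awcBuffer, 0, sizeof(WCHAR) * (*pcwcBuffer) );
    memcpy( awcBuffer, m_wcsBuffer + m_ulCharsRead, sizeof(WCHAR) * ulToCopy );
    m_ulCharsRead += ulToCopy;
    *pcwcBuffer = ulToCopy;              // report how many characters were written
    if ( m_ulBufferLen == m_ulCharsRead )
    {
        m_ulCharsRead = 0;
        m_ulBufferLen = 0;
        return FILTER_S_LAST_TEXT;       // this was the last piece of the chunk
    }
    return S_OK;                         // more text remains in this chunk
}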
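To see the call sequence from the indexing service's side, here is a small simulation that compiles without the Windows headers. It is a sketch under stated assumptions: ToyFilter, DrainFilter and the Status enum are made-up stand-ins for the real filter, the indexer, and the S_OK / FILTER_S_LAST_TEXT / FILTER_E_NO_MORE_TEXT / FILTER_E_END_OF_CHUNKS codes, and the file content is simply a string in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Illustrative stand-ins for the real status codes.
enum class Status { Ok, LastText, NoMoreText, EndOfChunks };

// Toy filter over an in-memory string (hypothetical, for illustration only).
class ToyFilter {
public:
    ToyFilter(std::wstring allText, std::size_t chunkSize)
        : allText_(std::move(allText)), chunkSize_(chunkSize) { assert(chunkSize_ > 0); }

    // Advance to the next chunk, or report that the file is exhausted.
    Status GetChunk() {
        if (pos_ >= allText_.size()) return Status::EndOfChunks;
        chunk_ = allText_.substr(pos_, chunkSize_);
        pos_ += chunk_.size();
        charsRead_ = 0;
        return Status::Ok;
    }

    // On entry *count is the buffer capacity; on exit, the characters copied.
    Status GetText(std::size_t* count, wchar_t* buffer) {
        if (charsRead_ >= chunk_.size()) { *count = 0; return Status::NoMoreText; }
        std::size_t toCopy = std::min(*count, chunk_.size() - charsRead_);
        std::copy_n(chunk_.data() + charsRead_, toCopy, buffer);
        charsRead_ += toCopy;
        *count = toCopy;
        return charsRead_ == chunk_.size() ? Status::LastText : Status::Ok;
    }

private:
    std::wstring allText_;
    std::size_t chunkSize_;
    std::size_t pos_ = 0;        // start of the next chunk within allText_
    std::wstring chunk_;         // current chunk
    std::size_t charsRead_ = 0;  // characters of chunk_ already returned
};

// Plays the indexer's role: GetChunk, then GetText until the chunk is drained.
std::wstring DrainFilter(ToyFilter& filter, std::size_t bufferSize) {
    assert(bufferSize > 0);
    std::wstring out;
    std::wstring buffer(bufferSize, L'\0');
    while (filter.GetChunk() == Status::Ok) {
        for (;;) {
            std::size_t count = bufferSize;
            Status s = filter.GetText(&count, &buffer[0]);
            if (s == Status::NoMoreText) break;
            out.append(buffer.data(), count);
            if (s == Status::LastText) break;
        }
    }
    return out;
}
```

Whatever chunk size and buffer size are chosen, the pieces collected by DrainFilter should concatenate back to the original text. That makes it a convenient check for the splitting logic before it is wired into a real filter.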