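Saving:
Every resultItem has its original primary key. When the index is built, each item is also assigned an auto-incrementing LabelIndex, which identifies the record inside the index system. In addition, a table mapping each record's label to its offset is stored at the end of the master file.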
public bool WriteResultItem(InternalResultItem item)
{
    if (item == null)
        throw new ArgumentNullException("item");
    // Don't add the object if it already exists
    if (pointersTable.ContainsKey(item.LabelIndex))
        return false;
    // Record where this item starts before writing it
    pointersTable.Add(item.LabelIndex, stream.Position);
    BinaryWriter writer = new BinaryWriter(stream);
    WriteResultItem(writer, item);
    // If we've exceeded the allowed number of bytes per file, then close the current search master,
    // and open a new one.
    if (!maintainBackwardCompatibility && stream.Length >= bytesPerFile)
    {
        OpenNewFile();
    }
    return true;
}
Writing a single record:
private void WriteResultItem(BinaryWriter writer, InternalResultItem item)
{
    // Write index
    writer.Write(item.LabelIndex);
    // Write Type
    writer.Write(item.Type);
    // Write number of attributes
    writer.Write(item.Attributes.Count);
    // Write attributes
    foreach (AttributeItem attrib in item.Attributes)
    {
        writer.Write(attrib.Key);
        writer.Write(attrib.Normalize);
        writer.Write(attrib.AllowEmptyValue);
        writer.Write(attrib.Culture);
        writer.Write(attrib.Value);
    }
}
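The listings here do not show how the label-to-offset table itself reaches the end of the master file. The following is a minimal sketch of one way that could work, not the actual MDS format: `PointerTableSketch`, `WriteTable`, `ReadTable`, and the trailer layout (entry count, label/offset pairs, then an 8-byte footer pointing at the start of the table) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original code base.
public static class PointerTableSketch
{
    // Appends the table at the end of the stream, followed by an 8-byte
    // footer holding the position where the table starts (assumed layout).
    public static void WriteTable(Stream stream, IDictionary<int, long> table)
    {
        stream.Seek(0, SeekOrigin.End);
        long tableStart = stream.Position;
        // leaveOpen: true, so disposing the writer does not close the stream
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, true))
        {
            writer.Write(table.Count);
            foreach (KeyValuePair<int, long> entry in table)
            {
                writer.Write(entry.Key);   // LabelIndex
                writer.Write(entry.Value); // offset of the record in this file
            }
            writer.Write(tableStart);      // footer
        }
    }

    // Reads the footer first, then jumps back to the table and loads it.
    public static Dictionary<int, long> ReadTable(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, true))
        {
            stream.Seek(-sizeof(long), SeekOrigin.End);
            long tableStart = reader.ReadInt64();
            stream.Position = tableStart;
            int count = reader.ReadInt32();
            var table = new Dictionary<int, long>(count);
            for (int i = 0; i < count; i++)
            {
                int label = reader.ReadInt32();
                long offset = reader.ReadInt64();
                table.Add(label, offset);
            }
            return table;
        }
    }
}
```

Putting the table after the records means the writer can append records as they arrive and only needs to emit the table once, when the file is closed. The cost is that a reader has to seek to the end of the file before it can locate anything.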
Reading:
Every master file has its own label-to-offset mapping table. To read a record, the LabelIndex is looked up in each master file in turn.
public InternalResultItem GetResultItem(int index)
{
    // Go through all our master files, and find the requested index.
    long offset;
    foreach (MasterData masterData in masterDataList)
    {
        // A miss can happen; with deltas, it's not guaranteed that an asset will be there.
        if (masterData.PointersTable.TryGetValue(index, out offset))
        {
            if (masterData.Stream.Length < offset)
                throw new EndOfStreamException(string.Format("The offset ({0}) exceeds the stream length.", offset));
            masterData.Stream.Position = offset;
            BinaryReader reader = new BinaryReader(masterData.Stream);
            return ReadResultItem(reader);
        }
    }
    return null;
}
Reading a single record:
private InternalResultItem ReadResultItem(BinaryReader reader)
{
    InternalResultItem item = new InternalResultItem();
    // Read index
    item.LabelIndex = reader.ReadInt32();
    // Read Type
    item.Type = reader.ReadString();
    // Read number of attributes
    int nrAttributes = reader.ReadInt32();
    if (nrAttributes > 0)
    {
        for (int i = 0; i < nrAttributes; i++)
        {
            // Fields must be read in exactly the order they were written
            string key = reader.ReadString();
            bool normalize = reader.ReadBoolean();
            bool allowEmptyValue = reader.ReadBoolean();
            string culture = reader.ReadString();
            string value = reader.ReadString();
            item.Attributes.Add(new AttributeItem(key, normalize, allowEmptyValue, value, culture));
        }
    }
    return item;
}
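The record carries no explicit length field, yet the reader always knows where each string ends. That works because `BinaryWriter.Write(string)` emits a length prefix before the encoded bytes and `BinaryReader.ReadString()` consumes it. The sketch below round-trips a cut-down record through a `MemoryStream` to show the symmetry. `RecordLayoutSketch` and its reduced field set (one key/value pair, no flags or culture) are assumptions for illustration, not the real `InternalResultItem` layout.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical, simplified stand-in for the record layout shown above.
public static class RecordLayoutSketch
{
    // Writes label, type, attribute count, and a single key/value pair.
    public static byte[] Encode(int labelIndex, string type, string key, string value)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream, Encoding.UTF8))
        {
            writer.Write(labelIndex); // 4 bytes, little-endian
            writer.Write(type);       // length prefix + UTF-8 bytes
            writer.Write(1);          // attribute count
            writer.Write(key);
            writer.Write(value);
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the fields back in the same order and joins them for display.
    public static string Decode(byte[] bytes)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            int labelIndex = reader.ReadInt32();
            string type = reader.ReadString();
            int count = reader.ReadInt32();
            string key = reader.ReadString();
            string value = reader.ReadString();
            return labelIndex + "|" + type + "|" + count + "|" + key + "=" + value;
        }
    }
}
```

For example, `RecordLayoutSketch.Encode(7, "doc", "title", "hello")` produces 24 bytes (4 for the label, 1 + 3 for the type, 4 for the count, and 1 + 5 for each of the two strings), and `Decode` turns them back into `7|doc|1|title=hello`. Because records are variable-length, a reader cannot jump to the n-th record without the offset table.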
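Summary: This part handles identifying and storing the original records (docs), and resolving an identifier to a storage location. The labelIndex plays the role of a database row locator, or of the DocId in a search engine. An index query ultimately yields this docId; in other words, the index stores only docIds, and fetching the doc for a given docId is the storage system's job.
The resultItems are not physically sorted; they are simply appended to the master in the order they arrive. In other words, MDS data is non-clustered: it supports only bookmark lookups, not efficient range queries.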
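To make the non-clustered point concrete, the following sketch simulates appending records in arrival order and then lists the offsets that a range query over the original keys would have to visit. It is only an illustration: `NonClusteredSketch`, `OffsetsForKeyRange`, and the fixed record size are assumptions (real records are variable-length), and nothing here comes from the MDS code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical simulation, not part of the original code base.
public static class NonClusteredSketch
{
    // Appends one fixed-size record per key in arrival order, then returns
    // the offsets for keys in [low, high], listed in key order.
    public static List<long> OffsetsForKeyRange(
        string[] keysInArrivalOrder, string low, string high, int recordSize)
    {
        var offsetByKey = new Dictionary<string, long>();
        for (int label = 0; label < keysInArrivalOrder.Length; label++)
        {
            // The label grows with arrival order, and so does the offset.
            offsetByKey[keysInArrivalOrder[label]] = (long)label * recordSize;
        }
        return offsetByKey
            .Where(e => string.CompareOrdinal(e.Key, low) >= 0
                     && string.CompareOrdinal(e.Key, high) <= 0)
            .OrderBy(e => e.Key, StringComparer.Ordinal)
            .Select(e => e.Value)
            .ToList();
    }
}
```

With keys arriving as `d, a, c, b`, a record size of 100, and the range `a` to `c`, the call returns the offsets 100, 300, 200. The keys are adjacent in sort order but their records are scattered in the file, so each one costs a separate bookmark lookup and seek instead of one sequential read.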