while ( read a document D ) {
    while ( read a term T in D ) {
        if ( Find( Dictionary, T ) == false )
            Insert( Dictionary, T );
        Get T’s posting list;
        Insert a node to T’s posting list;
    }
}
Write the inverted index to disk;
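The loop above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: documents arrive as `(doc_id, text)` pairs, terms are lowercase whitespace-separated tokens, and a posting list stores document IDs only. The name `build_index` and the tokenization are hypothetical choices, not part of the original pseudocode, and the final step of writing the index to disk is omitted.

```python
from collections import defaultdict


def build_index(docs):
    """Build an in-memory inverted index: term -> list of doc IDs in arrival order.

    `docs` is an iterable of (doc_id, text) pairs (an assumed input format).
    """
    index = defaultdict(list)  # the dictionary; an unseen term gets an empty posting list
    for doc_id, text in docs:              # while ( read a document D )
        for term in text.lower().split():  # while ( read a term T in D )
            postings = index[term]         # Find / Insert, then get T's posting list
            # Insert a node, skipping repeats of the term within the same document.
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
    return dict(index)


docs = [(1, "gold silver truck"), (2, "shipment of gold"), (3, "silver truck truck")]
index = build_index(docs)
print(index["gold"])   # [1, 2]
print(index["truck"])  # [1, 3]
```

Using `defaultdict(list)` folds the `Find` and `Insert` steps into a single lookup, which is one possible design and not the only one.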
BlockCnt = 0;
while ( read a document D ) {
    while ( read a term T in D ) {
        if ( out of memory ) {
            Write BlockIndex[BlockCnt] to disk;
            BlockCnt ++;
            FreeMemory;
        }
        if ( Find( Dictionary, T ) == false )
            Insert( Dictionary, T );
        Get T’s posting list;
        Insert a node to T’s posting list;
    }
}
for ( i=0; i<BlockCnt; i++ )
    Merge( InvertedIndex, BlockIndex[i] );
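A rough Python sketch of this block-based construction is given below. It rests on several assumptions that are not in the original: the "out of memory" check is simulated by a hypothetical `max_postings` limit, finished blocks are kept in a Python list instead of being written to disk, and the function names are illustrative. The sketch also flushes the last, partially filled block before merging, a step the pseudocode leaves implicit.

```python
from collections import defaultdict


def build_blocks(docs, max_postings):
    """Index `docs` in blocks holding at most `max_postings` postings each."""
    blocks = []
    block = defaultdict(list)
    used = 0
    for doc_id, text in docs:
        for term in text.lower().split():
            if used >= max_postings:        # if ( out of memory )
                blocks.append(dict(block))  # Write BlockIndex[BlockCnt] to disk
                block = defaultdict(list)   # FreeMemory
                used = 0
            postings = block[term]
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
                used += 1
    if block:                               # flush the final partial block (assumed step)
        blocks.append(dict(block))
    return blocks


def merge_blocks(blocks):
    """Merge block indexes into one index: term -> sorted list of doc IDs."""
    merged = defaultdict(set)
    for block in blocks:
        for term, postings in block.items():
            merged[term].update(postings)   # sets drop duplicates across blocks
    return {term: sorted(ids) for term, ids in merged.items()}


docs = [(1, "gold silver truck"), (2, "shipment of gold"), (3, "silver truck truck")]
blocks = build_blocks(docs, max_postings=4)
merged = merge_blocks(blocks)
print(len(blocks))      # 3
print(merged["truck"])  # [1, 3]
```

A production merge would typically stream sorted blocks from disk rather than hold them all in memory; the set-based merge here only shows the logical result.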
Distributed Index
- Term-partitioned index
- Document-partitioned index
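The two schemes differ in what is split across nodes: a term-partitioned index places each term's complete posting list on one node, while a document-partitioned index gives each node its own index over a subset of the documents. The Python sketch below illustrates the difference under assumptions of its own: integer document IDs, terms assigned to nodes by a CRC32 hash, and documents assigned by `doc_id % num_nodes`. All function names are hypothetical.

```python
from collections import defaultdict
import zlib


def node_for(term, num_nodes):
    """Map a term to a node with a stable hash (crc32 is an illustrative choice)."""
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def term_partition(index, num_nodes):
    """Each node holds the complete posting lists for a subset of the terms."""
    nodes = [dict() for _ in range(num_nodes)]
    for term, postings in index.items():
        nodes[node_for(term, num_nodes)][term] = list(postings)
    return nodes


def document_partition(index, num_nodes):
    """Each node holds an index over a subset of the documents."""
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for term, postings in index.items():
        for doc_id in postings:
            nodes[doc_id % num_nodes][term].append(doc_id)
    return [dict(node) for node in nodes]


index = {"gold": [1, 2], "silver": [1, 3], "truck": [1, 3]}
by_term = term_partition(index, num_nodes=2)
by_doc = document_partition(index, num_nodes=2)
print(by_doc[0])  # {'gold': [2]}
print(by_doc[1])  # {'gold': [1], 'silver': [1, 3], 'truck': [1, 3]}
```

In this sketch a single-term query touches one node under term partitioning, whereas under document partitioning it is sent to every node and the partial results are combined.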
Measures for a Search Engine
- How fast does it index
- How fast does it search
- Expressiveness of query language
Data Retrieval Performance Evaluation (after establishing correctness)
- Response time
- Index space
Information Retrieval Performance Evaluation
- How relevant is the answer set?
| | Relevant | Irrelevant |
|---|---|---|
| Retrieved | $R_R$ | $I_R$ |
| Not Retrieved | $R_N$ | $I_N$ |
Precision: $P = R_R / (R_R + I_R)$
Recall: $R = R_R / (R_R + R_N)$
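As a small worked example, the two measures can be computed from a retrieved set and a relevant set. The Python sketch below assumes both are given as collections of document IDs; the function name `precision_recall` and the convention of returning 0.0 when a denominator would be zero are assumptions, not part of the definitions above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    r_r = len(retrieved & relevant)  # R_R: relevant and retrieved
    i_r = len(retrieved - relevant)  # I_R: irrelevant but retrieved
    r_n = len(relevant - retrieved)  # R_N: relevant but not retrieved
    # Returning 0.0 for an empty set is an assumed convention.
    precision = r_r / (r_r + i_r) if retrieved else 0.0
    recall = r_r / (r_r + r_n) if relevant else 0.0
    return precision, recall


p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7])
print(p, r)  # 0.5 0.4
```

Here two of the four retrieved documents are relevant, so precision is 2/4, and two of the five relevant documents were retrieved, so recall is 2/5.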
Exercises
Reference