https://github.com/LongxingTan/open-retrievals
- https://colab.research.google.com/drive/1w2dRoRThG6DnUW46swqEUuWySKS1AXCp?usp=sharing
“BAAI/bge-base-zh-v1.5” on T2Ranking
Original score (no fine-tuning): bge-base-zh-v1.5: “map”: 0.6569549236524207, “mrr”: 0.7683207806932297
Fine-tuning baseline
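For reference, the “map” and “mrr” figures in these notes are mean average precision and mean reciprocal rank. The sketch below shows one common way to compute them from ranked relevance labels. It is a minimal illustration, not the evaluation code used for these runs: the function names are hypothetical, and average precision is normalized by the number of relevant passages found in the ranked list, which is an assumption.

```python
from typing import Dict, List, Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """AP for one query; ranked_labels[i] is 1 if the passage at rank i+1 is relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    # Assumption: normalize by the relevant passages present in the list.
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(ranked_labels: Sequence[int]) -> float:
    """1 / rank of the first relevant passage, or 0.0 if none is relevant."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def evaluate(rankings: List[Sequence[int]]) -> Dict[str, float]:
    """Average both metrics over all queries."""
    n = len(rankings)
    return {
        "map": sum(average_precision(r) for r in rankings) / n,
        "mrr": sum(reciprocal_rank(r) for r in rankings) / n,
    }


if __name__ == "__main__":
    # Two toy queries: labels are listed in the order the model ranked the passages.
    print(evaluate([[1, 0, 1], [0, 1, 0]]))
```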
- pairwise / with negative / TripletLoss
- “map”: 0.6867783615886847, “mrr”: 0.8076847802613109
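As a rough illustration of the triplet objective used for this baseline, the sketch below computes a margin-based triplet loss on a single (query, positive, negative) triple. It is a sketch under stated assumptions: the cosine distance, the margin value of 0.3, and the function names are all illustrative and may differ from what open-retrievals actually uses.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def triplet_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.3,  # assumed value, for illustration only
) -> float:
    """Zero once the positive is closer to the query than the negative by at least `margin`."""
    return max(
        0.0,
        cosine_distance(query, positive) - cosine_distance(query, negative) + margin,
    )


if __name__ == "__main__":
    q, p, n = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
    print(triplet_loss(q, p, n))  # positive already closer than the negative
    print(triplet_loss(q, n, p))  # roles swapped, so the loss is positive
```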
pairwise-infonce-with in-batch-negative
- pairwise / with negative / infonce / with in-batch-negative
- “map”: 0.6870901839321426, “mrr”: 0.8006553018032609
pairwise-infonce-without in-batch-negative
- pairwise / with negative / infonce
- “map”: 0.7037224977430665, “mrr”: 0.824252915451895
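The two InfoNCE runs above differ only in which passages enter the softmax denominator. The sketch below makes that difference concrete in plain Python. It is an illustration under assumptions, not the open-retrievals implementation: embeddings are taken to be L2-normalized so the dot product acts as cosine similarity, the temperature value is arbitrary, and the `info_nce` name and `in_batch` flag are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(
    queries: List[Vector],
    positives: List[Vector],
    negatives: List[List[Vector]],  # negatives[i] = hard negatives for query i
    temperature: float = 0.05,      # assumed value, for illustration only
    in_batch: bool = False,
) -> float:
    """Mean InfoNCE loss over the batch."""
    losses = []
    for i, q in enumerate(queries):
        # Index 0 is always the positive for query i.
        candidates = [positives[i]] + list(negatives[i])
        if in_batch:
            # Passages that belong to other queries also act as negatives.
            for j in range(len(queries)):
                if j != i:
                    candidates.append(positives[j])
                    candidates.extend(negatives[j])
        logits = [_dot(q, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[0])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[1.0, 0.0], [0.0, 1.0]]
    negatives = [[[0.5, 0.5]], [[0.5, 0.5]]]
    print(info_nce(queries, positives, negatives, temperature=1.0))
    print(info_nce(queries, positives, negatives, temperature=1.0, in_batch=True))
```

With `in_batch=True` each query is contrasted against more passages, so for the same embeddings the loss value is never lower than in the hard-negative-only setting.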
pairwise-SimCSE
- pairwise / simcse / with negative
- “map”: 0.687526271681094, “mrr”: 0.8006553018032609
Qwen2-1.5
- Original: “map”: 0.5441969624058484, “mrr”: 0.6443661591620775
- After fine-tuning:
pointwise-arcface
- todo
“BAAI/bge-reranker-base” on T2Ranking
Original score (no fine-tuning): BAAI/bge-reranker-base: “map”: 0.6660360850586858, “mrr”: 0.76091472303207
Fine-tuning baseline
- cross encoder
- “map”: 0.6987828592129784, “mrr”: 0.7999730050750459
ColBERT fine-tuning
- colbert
Dataset experiments
scifact
Training data: query, positive, negative
dev and test contain only the query; the other fields are empty