MS MARCO Cross-Encoders
MS MARCO is a large-scale information retrieval corpus created from real user search queries issued to the Bing search engine. The provided models can be used for semantic search, i.e., given keywords / a search phrase / a question, the model will find passages that are relevant for the search query.
The training data consists of over 500k examples, while the complete corpus consists of over 8.8 million passages.
Pre-trained models can be used like this:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder('model_name', max_length=512)
scores = model.predict([('Query', 'Paragraph1'), ('Query', 'Paragraph2'), ('Query', 'Paragraph3')])
```
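For semantic search, these scores can be used to re-rank candidate passages so the most relevant one comes first. A minimal sketch (the query, passages, and score values below are illustrative stand-ins; in practice `scores` would come from `model.predict` as shown above):

```python
# Illustrative re-ranking: pair each passage with its cross-encoder score
# and sort descending, so the most relevant passage ranks first.
# The scores are made-up stand-ins for CrossEncoder.predict output.
query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of about 3.7 million.",
    "The Eiffel Tower is located in Paris.",
    "New York City is the largest city in the US.",
]
scores = [9.2, -4.1, -3.8]  # hypothetical model outputs (higher = more relevant)

ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
top_passage, top_score = ranked[0]
print(top_passage)
```

Note that cross-encoder scores are not normalized probabilities; only their relative order matters for ranking.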
Models & Performance
| Model-Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec |
| --- | --- | --- | --- |
Note: Runtime was computed on a V100 GPU with Hugging Face Transformers v4.