Pretrained Cross-Encoders

This page lists available pretrained Cross-Encoders. Cross-Encoders require the input of a text pair and output a score 0…1. They do not work for individual sentences and they don’t compute embeddings for individual texts.

BiEncoder

MS MARCO

MS MARCO Passage Retrieval is a large dataset with real user queries from Bing search engine with annotated relevant text passages.

These models can be used like this:

from sentence_transformers import CrossEncoder
model = CrossEncoder('model_name', max_length=512)
scores = model.predict([('Query1', 'Paragraph1'), ('Query1', 'Paragraph2')])

#For Example
scores = model.predict([('How many people live in Berlin?', 'Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.'), 
                        ('How many people live in Berlin?', 'Berlin is well known for its museums.')])
  • cross-encoder/ms-marco-TinyBERT-L-2-v2 - MRR@10 on MS Marco Dev Set: 32.56

  • cross-encoder/ms-marco-MiniLM-L-2-v2 - MRR@10 on MS Marco Dev Set: 34.85

  • cross-encoder/ms-marco-MiniLM-L-4-v2 - MRR@10 on MS Marco Dev Set: 37.70

  • cross-encoder/ms-marco-MiniLM-L-6-v2 - MRR@10 on MS Marco Dev Set: 39.01

  • cross-encoder/ms-marco-MiniLM-L-12-v2 - MRR@10 on MS Marco Dev Set: 39.02

For details on the usage, see Applications - Information Retrieval

MS MARCO Cross-Encoders - More details

SQuAD (QNLI)

QNLI is based on the SQuAD dataset and was introduced by the GLUE Benchmark. Given a passage from Wikipedia, annotators created questions that are answerable by that passage.

  • cross-encoder/qnli-distilroberta-base - Accuracy on QNLI dev set: 90.96

  • cross-encoder/qnli-electra-base - Accuracy on QNLI dev set: 93.21

STSbenchmark

The following models can be used like this:

from sentence_transformers import CrossEncoder
model = CrossEncoder('model_name')
scores = model.predict([('Sent A1', 'Sent B1'), ('Sent A2', 'Sent B2')])

They return a score 0…1 indicating the semantic similarity of the given sentence pair.

  • cross-encoder/stsb-TinyBERT-L-4 - STSbenchmark test performance: 85.50

  • cross-encoder/stsb-distilroberta-base - STSbenchmark test performance: 87.92

  • cross-encoder/stsb-roberta-base - STSbenchmark test performance: 90.17

  • cross-encoder/stsb-roberta-large - STSbenchmark test performance: 91.47

Quora Duplicate Questions

These models have been trained on the Quora duplicate questions dataset. They can used like the STSb models and give a score 0…1 indicating the probability that two questions are duplicate questions.

  • cross-encoder/quora-distilroberta-base - Average Precision dev set: 87.48

  • cross-encoder/quora-roberta-base - Average Precision dev set: 87.80

  • cross-encoder/quora-roberta-large - Average Precision dev set: 87.91

Note: The model don’t work for question similarity. The question How to learn Java and How to learn Python will get a low score, as these questions are not duplicates. For question similarity, the respective bi-encoder trained on the Quora dataset yields much more meaningful results.

NLI

Given two sentences, are these contradicting each other, entailing one the other or are these netural? The following models were trained on the SNLI and MultiNLI datasets.

  • cross-encoder/nli-distilroberta-base - Accuracy on MNLI mismatched set: 83.98

  • cross-encoder/nli-MiniLM2-L6-H768 - Accuracy on MNLI mismatched set: 86.89

  • cross-encoder/nli-roberta-base - Accuracy on MNLI mismatched set: 87.47

  • cross-encoder/nli-deberta-base - Accuracy on MNLI mismatched set: 88.08

from sentence_transformers import CrossEncoder
model = CrossEncoder('model_name')
scores = model.predict([('A man is eating pizza', 'A man eats something'), ('A black race car starts up in front of a crowd of people.', 'A man is driving down a lonely road.')])

#Convert scores to labels
label_mapping = ['contradiction', 'entailment', 'neutral']
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]