Pretrained Models

We have released various pre-trained Cross Encoder models via our Cross Encoder Hugging Face organization. Additionally, numerous community Cross Encoder models have been publicly released on the Hugging Face Hub.

Original models: Cross Encoder Hugging Face organization.
Community models: All Cross Encoder models on Hugging Face.

Each of these models can be downloaded and used as follows:

from sentence_transformers import CrossEncoder
import torch

# Load https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", activation_fn=torch.nn.Sigmoid())
scores = model.predict([
    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
    ("How many people live in Berlin?", "Berlin is well known for its museums."),
])
# => array([0.9998173 , 0.01312432], dtype=float32)

Cross-Encoders require pairs as inputs and output one score per pair (between 0 and 1 if the Sigmoid activation function is used). Most models work with text pairs, but some also support non-text inputs such as images (see Multimodal Rerankers). Cross-Encoders cannot score a single sentence on its own, and they do not compute embeddings for individual texts.

MS MARCO

MS MARCO Passage Retrieval is a large dataset of real user queries from the Bing search engine, annotated with relevant text passages. Models trained on this dataset are very effective as rerankers for search systems.

Note

You can initialize these models with activation_fn=torch.nn.Sigmoid() to force the model to return scores between 0 and 1. Otherwise, the model returns raw scores, which typically range from roughly -10 to 10.
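
To illustrate what the Sigmoid activation does to the raw scores, here is a minimal standalone sketch in plain Python. It does not call the library: the `sigmoid` helper and the sample raw scores are illustrative assumptions, not values produced by any of the models listed here.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw score (logit) into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical raw scores spanning the typical -10 to 10 range
raw_scores = [-10.0, 0.0, 10.0]
probabilities = [sigmoid(score) for score in raw_scores]
```

Because the sigmoid is strictly increasing, applying it changes the scale of the scores but not the order in which passages are ranked.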

Model Name	NDCG@10 (TREC DL 19)	MRR@10 (MS Marco Dev)	Docs / Sec
cross-encoder/ms-marco-TinyBERT-L2-v2	69.84	32.56	9000
cross-encoder/ms-marco-MiniLM-L2-v2	71.01	34.85	4100
cross-encoder/ms-marco-MiniLM-L4-v2	73.04	37.70	2500
cross-encoder/ms-marco-MiniLM-L6-v2	74.30	39.01	1800
cross-encoder/ms-marco-MiniLM-L12-v2	74.31	39.02	960
cross-encoder/ms-marco-electra-base	71.99	36.41	340

For details on the usage, see Retrieve & Re-Rank.
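
As a rough illustration of the reranking step, the sketch below scores every (query, passage) pair and sorts the passages by score. The `rerank` helper and the `toy_scorer` stand-in are hypothetical names introduced for this example only. With a real model, one would presumably pass `model.predict` as the scoring function, since it accepts a list of pairs and returns one score per pair.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    passages: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return the top_k passages, best first."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked[:top_k]]


def toy_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer for illustration only: counts whitespace-separated words shared by both texts."""
    return [
        float(len(set(query.lower().split()) & set(passage.lower().split())))
        for query, passage in pairs
    ]


results = rerank(
    "How many people live in Berlin?",
    ["Berlin is well known for its museums.", "About 3.5 million people live in Berlin."],
    toy_scorer,
    top_k=1,
)
```

Keeping the scorer as a parameter means the sorting logic can be tried without downloading a model, and a Cross Encoder can be swapped in later without changing `rerank`.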

SQuAD (QNLI)

QNLI is based on the SQuAD dataset (HF) and was introduced by the GLUE Benchmark (HF). Given a passage from Wikipedia, annotators created questions that are answerable by that passage. These models output higher scores if a passage answers a question.

Model Name	Accuracy on QNLI dev set
cross-encoder/qnli-distilroberta-base	90.96
cross-encoder/qnli-electra-base	93.21

STSbenchmark

The following models can be used like this:

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-roberta-base")
scores = model.predict([
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
])
# => array([0.60443085, 0.00240758], dtype=float32)

They return a score between 0 and 1 indicating the semantic similarity of the given sentence pair.

Model Name	STSbenchmark Test Performance
cross-encoder/stsb-TinyBERT-L4	85.50
cross-encoder/stsb-distilroberta-base	87.92
cross-encoder/stsb-roberta-base	90.17
cross-encoder/stsb-roberta-large	91.47

Quora Duplicate Questions

These models have been trained on the Quora duplicate questions dataset. They can be used like the STSb models and give a score between 0 and 1 indicating the probability that two questions are duplicates.

Model Name	Average Precision dev set
cross-encoder/quora-distilroberta-base	87.48
cross-encoder/quora-roberta-base	87.80
cross-encoder/quora-roberta-large	87.91

Note

These models do not work for question similarity. The questions “How to learn Java?” and “How to learn Python?” will get a low score, as they are not duplicates. For question similarity, a SentenceTransformer trained on the Quora dataset will yield much more meaningful results.

NLI

Given two sentences, does one contradict the other, does one entail the other, or are they neutral? The following models were trained on the SNLI and MultiNLI datasets.

Model Name	Accuracy on MNLI mismatched set
cross-encoder/nli-deberta-v3-base	90.04
cross-encoder/nli-deberta-base	88.08
cross-encoder/nli-deberta-v3-xsmall	87.77
cross-encoder/nli-deberta-v3-small	87.55
cross-encoder/nli-roberta-base	87.47
cross-encoder/nli-MiniLM2-L6-H768	86.89
cross-encoder/nli-distilroberta-base	83.98

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/nli-deberta-v3-base")
scores = model.predict([
    ("A man is eating pizza", "A man eats something"),
    ("A black race car starts up in front of a crowd of people.", "A man is driving down a lonely road."),
])

# Convert scores to labels
label_mapping = ["contradiction", "entailment", "neutral"]
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
# => ['entailment', 'contradiction']
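
If per-label probabilities are wanted rather than just the top label, a softmax over the three scores is one common option. The sketch below is standalone plain Python and assumes the model returns one unnormalized score per label, in the order of the label mapping above. The `softmax` helper and the sample scores are hypothetical, not output from any listed model.

```python
import math
from typing import List

LABELS = ["contradiction", "entailment", "neutral"]


def softmax(logits: List[float]) -> List[float]:
    """Convert unnormalized scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for one sentence pair, in LABELS order
example_logits = [-2.0, 4.0, 0.5]
probs = softmax(example_logits)
predicted = LABELS[probs.index(max(probs))]
```

The softmax preserves the ordering of the scores, so the predicted label matches what `argmax` on the raw scores would give.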

Multimodal Rerankers

Multimodal rerankers can score pairs involving different modalities such as images, video, audio, and text. These models use the same Transformer + LogitScore architecture as text-only decoder rerankers, but with a multimodal backbone that can process non-text inputs. You can check whether a model supports a given modality using modalities and supports().

Community Models

Some notable models from the community include: