Pretrained Models
We have released various pre-trained Cross Encoder models via our Cross Encoder Hugging Face organization. Additionally, numerous community Cross Encoder models have been publicly released on the Hugging Face Hub.
Original models: Cross Encoder Hugging Face organization.
Community models: All Cross Encoder models on Hugging Face.
Each of these models can be easily downloaded and used like so:
```python
from sentence_transformers import CrossEncoder
import torch

# Load https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", activation_fn=torch.nn.Sigmoid())
scores = model.predict([
    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."),
    ("How many people live in Berlin?", "Berlin is well known for its museums."),
])
# => array([0.9998173 , 0.01312432], dtype=float32)
```
Cross-Encoders require text pairs as input and output a score 0…1 (if the Sigmoid activation function is used). They do not work on individual sentences, and they do not compute embeddings for individual texts.
MS MARCO
MS MARCO Passage Retrieval is a large dataset of real user queries from the Bing search engine, paired with annotated relevant text passages. Models trained on this dataset are very effective as rerankers for search systems.
Note
You can initialize these models with activation_fn=torch.nn.Sigmoid() to force the model to return scores between 0 and 1. Otherwise, the raw scores can reasonably range from -10 to 10.
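To see why the Sigmoid activation maps scores into the 0…1 range, here is a minimal NumPy sketch that applies the logistic function to a few raw scores. The input values are hypothetical, chosen to span the -10 to 10 range mentioned in the note; they are not outputs of any model listed here.

```python
import numpy as np


def sigmoid(raw_scores: np.ndarray) -> np.ndarray:
    """Logistic function: maps any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-raw_scores))


# Hypothetical raw scores covering the typical -10..10 range
raw_scores = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
probabilities = sigmoid(raw_scores)
print(probabilities.round(4))
```

Because the logistic function is strictly increasing, applying it never changes the order of candidates, so a ranking computed from raw scores is the same as one computed from Sigmoid scores.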
Model Name | NDCG@10 (TREC DL 19) | MRR@10 (MS Marco Dev) | Docs / Sec |
---|---|---|---|
cross-encoder/ms-marco-TinyBERT-L2-v2 | 69.84 | 32.56 | 9000 |
cross-encoder/ms-marco-MiniLM-L2-v2 | 71.01 | 34.85 | 4100 |
cross-encoder/ms-marco-MiniLM-L4-v2 | 73.04 | 37.70 | 2500 |
cross-encoder/ms-marco-MiniLM-L6-v2 | 74.30 | 39.01 | 1800 |
cross-encoder/ms-marco-MiniLM-L12-v2 | 74.31 | 39.02 | 960 |
cross-encoder/ms-marco-electra-base | 71.99 | 36.41 | 340 |
For details on the usage, see Retrieve & Re-Rank.
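At its core, re-ranking means scoring every (query, passage) pair and sorting the passages by score. The sketch below assumes a scoring function that takes a list of pairs and returns one score per pair, which is how model.predict is called in the examples on this page. The rerank helper and the word-overlap scorer are hypothetical illustrations; the toy scorer only stands in for model.predict so the snippet runs without downloading a model.

```python
import re
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank(
    query: str,
    passages: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each (query, passage) pair and return the top_k passages, best first."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked[:top_k]]


def toy_overlap_scorer(pairs: List[Pair]) -> List[float]:
    """Hypothetical stand-in for model.predict: share of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        query_words = set(re.findall(r"\w+", query.lower()))
        passage_words = set(re.findall(r"\w+", passage.lower()))
        scores.append(len(query_words & passage_words) / max(len(query_words), 1))
    return scores


query = "How many people live in Berlin?"
passages = [
    "Berlin is well known for its museums.",
    "Around 3.5 million people live in Berlin.",
    "Paris is the capital of France.",
]
results = rerank(query, passages, toy_overlap_scorer, top_k=2)
for passage, score in results:
    print(f"{score:.2f}  {passage}")
```

In practice you would pass model.predict from the CrossEncoder example above in place of toy_overlap_scorer. Recent sentence-transformers releases also ship a CrossEncoder.rank method that wraps this pattern; check the API reference of your installed version before relying on it.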
SQuAD (QNLI)
QNLI is based on the SQuAD dataset (HF) and was introduced by the GLUE Benchmark (HF). Given a passage from Wikipedia, annotators created questions that are answerable by that passage. These models output higher scores if a passage answers a question.
Model Name | Accuracy on QNLI dev set |
---|---|
cross-encoder/qnli-distilroberta-base | 90.96 |
cross-encoder/qnli-electra-base | 93.21 |
STSbenchmark
The following models can be used like this:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-roberta-base")
scores = model.predict([
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
])
# => array([0.60443085, 0.00240758], dtype=float32)
```
They return a score 0…1 indicating the semantic similarity of the given sentence pair.
Model Name | STSbenchmark Test Performance |
---|---|
| | 85.50 |
| | 87.92 |
| | 90.17 |
| | 91.47 |
Quora Duplicate Questions
These models have been trained on the Quora duplicate questions dataset. They can be used like the STSb models and give a score 0…1 indicating the probability that two questions are duplicates.
Model Name | Average Precision dev set |
---|---|
cross-encoder/quora-distilroberta-base | 87.48 |
cross-encoder/quora-roberta-base | 87.80 |
cross-encoder/quora-roberta-large | 87.91 |
Note
These models do not work for question similarity. The questions "How to learn Java?" and "How to learn Python?" will get a low score, as they are not duplicates. For question similarity, a SentenceTransformer trained on the Quora dataset will yield much more meaningful results.
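Since these scores are probabilities that two questions are duplicates, a simple way to turn them into a yes/no decision is to apply a threshold. This is a minimal sketch, assuming a cutoff of 0.5; that value is an illustrative assumption rather than a recommendation from the model authors, and it is worth tuning on labeled data. The flag_duplicates helper is hypothetical, and the scores are placeholders standing in for model.predict output.

```python
from typing import List, Sequence, Tuple


def flag_duplicates(
    question_pairs: Sequence[Tuple[str, str]],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff; tune it on your own dev data
) -> List[Tuple[str, str, bool]]:
    """Attach a duplicate / not-duplicate decision to each scored question pair."""
    if len(question_pairs) != len(scores):
        raise ValueError("Expected exactly one score per question pair")
    return [(q1, q2, score >= threshold) for (q1, q2), score in zip(question_pairs, scores)]


pairs = [
    ("How do I learn Java?", "What is the best way to learn Java?"),
    ("How to learn Java?", "How to learn Python?"),
]
hypothetical_scores = [0.93, 0.04]  # placeholder values, not real model output
for q1, q2, is_duplicate in flag_duplicates(pairs, hypothetical_scores):
    print(f"{is_duplicate!s:5}  {q1} | {q2}")
```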
NLI
Given two sentences, do they contradict each other, does one entail the other, or are they neutral? The following models were trained on the SNLI and MultiNLI datasets.
Model Name | Accuracy on MNLI mismatched set |
---|---|
| | 90.04 |
| | 88.08 |
| | 87.77 |
| | 87.55 |
| | 87.47 |
| | 86.89 |
| | 83.98 |
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/nli-deberta-v3-base")
scores = model.predict([
    ("A man is eating pizza", "A man eats something"),
    ("A black race car starts up in front of a crowd of people.", "A man is driving down a lonely road."),
])

# Convert scores to labels
label_mapping = ["contradiction", "entailment", "neutral"]
labels = [label_mapping[score_max] for score_max in scores.argmax(axis=1)]
# => ['entailment', 'contradiction']
```
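In the example above, each row of scores holds one value per label, and argmax only picks the winning label. If you also want a per-label probability, one option is a row-wise softmax. This is a minimal NumPy sketch, assuming the scores are raw, unnormalized values (no activation function configured); the logits below are hypothetical and not actual outputs of cross-encoder/nli-deberta-v3-base, and the column order follows label_mapping from the example above.

```python
import numpy as np

label_mapping = ["contradiction", "entailment", "neutral"]


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting each row's max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical logits for two sentence pairs (one row per pair, one column per label)
logits = np.array([
    [-2.1, 4.3, -1.5],
    [3.8, -2.9, -0.7],
])
probabilities = softmax(logits)
for row in probabilities:
    best = int(row.argmax())
    print(f"{label_mapping[best]:<13} p={row[best]:.3f}")
```

Softmax preserves the order of values within each row, so the predicted labels are the same as the argmax of the raw scores; it only rescales each row into values that sum to 1.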
Community Models
Some notable models from the community include: