Rerankers

Reranker models are often CrossEncoder models with 1 output class, i.e. given a pair of texts (query, answer), the model outputs one score. This score, either a float score that reasonably ranges between -10.0 and 10.0, or a score that’s bound to 0…1, denotes to what extent the answer can help answer the query.

Many reranker models are trained on MS MARCO:

But most likely, you will get the best results when training on your dataset. Because of this, this page includes some examples training scripts that you can adopt for your own data:

BinaryCrossEntropyLoss

The BinaryCrossEntropyLoss is a very strong yet simple loss. Given pairs of texts (e.g. (query, answer) pairs), this loss uses the CrossEncoder model to compute prediction scores. It compares these against the gold (or silver, a.k.a. determined with some model) labels, and computes a lower loss the better the model is doing.

CachedMultipleNegativesRankingLoss

The CachedMultipleNegativesRankingLoss (a.k.a. InfoNCE with GradCache) is more complex than the common BinaryCrossEntropyLoss. It accepts positive pairs (i.e. (query, answer) pairs) or triplets (i.e. (query, right_answer, wrong_answer) triplets), and will then randomly find num_negatives extra incorrect answers per query by taking answers from other questions in the batch. This is often referred to as “in-batch negatives”.

The loss will then compute scores for all (query, answer) pairs, including the incorrect answers ones it just selected. The loss will then use a Cross Entropy Loss to ensure that the score of the (query, correct_answer) is higher than (query, wrong_answer) for all (randomly selected) wrong answers.

The CachedMultipleNegativesRankingLoss uses an approach called GradCache to allow computing the scores in mini-batches without increasing the memory usage excessively. This loss is recommended over the “standard” MultipleNegativesRankingLoss (a.k.a. InfoNCE) loss, which does not have this clever mini-batching support and thus requires a lot of memory.

Experimentation with an activation_fn and scale is warranted for this loss. torch.nn.Sigmoid with scale=10.0 works okay, torch.nn.Identity` with scale=1.0 also works, and the mGTE paper authors suggest using torch.nn.Tanh with scale=10.0.

Inference

The tomaarsen/reranker-ModernBERT-base-gooaq-bce model was trained with the first script. If you want to try out the model before training something yourself, feel free to use this script:

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-ModernBERT-base-gooaq-bce")

# Get scores for pairs of texts
pairs = [
    ["how to obtain a teacher's certificate in texas?", 'Some aspiring educators may be confused about the difference between teaching certification and teaching certificates. Teacher certification is another term for the licensure required to teach in public schools, while a teaching certificate is awarded upon completion of an academic program.'],
    ["how to obtain a teacher's certificate in texas?", '["Step 1: Obtain a Bachelor\'s Degree. One of the most important Texas teacher qualifications is a bachelor\'s degree. ... ", \'Step 2: Complete an Educator Preparation Program (EPP) ... \', \'Step 3: Pass Texas Teacher Certification Exams. ... \', \'Step 4: Complete a Final Application and Background Check.\']'],
    ["how to obtain a teacher's certificate in texas?", "Washington Teachers Licensing Application Process Official transcripts showing proof of bachelor's degree. Proof of teacher program completion at an approved teacher preparation school. Passing scores on the required examinations. Completed application for teacher certification in Washington."],
    ["how to obtain a teacher's certificate in texas?", 'Teacher education programs may take 4 years to complete after which certification plans are prepared for a three year period. During this plan period, the teacher must obtain a Standard Certification within 1-2 years. Learn how to get certified to teach in Texas.'],
    ["how to obtain a teacher's certificate in texas?", 'In Texas, the minimum age to work is 14. Unlike some states, Texas does not require juvenile workers to obtain a child employment certificate or an age certificate to work. A prospective employer that wants one can request a certificate of age for any minors it employs, obtainable from the Texas Workforce Commission.'],
]
scores = model.predict(pairs)
print(scores)
# [0.00121048 0.97105724 0.00536712 0.8632406  0.00168043]

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    "how to obtain a teacher's certificate in texas?",
    [
        "[\"Step 1: Obtain a Bachelor's Degree. One of the most important Texas teacher qualifications is a bachelor's degree. ... \", 'Step 2: Complete an Educator Preparation Program (EPP) ... ', 'Step 3: Pass Texas Teacher Certification Exams. ... ', 'Step 4: Complete a Final Application and Background Check.']",
        "Teacher education programs may take 4 years to complete after which certification plans are prepared for a three year period. During this plan period, the teacher must obtain a Standard Certification within 1-2 years. Learn how to get certified to teach in Texas.",
        "Washington Teachers Licensing Application Process Official transcripts showing proof of bachelor's degree. Proof of teacher program completion at an approved teacher preparation school. Passing scores on the required examinations. Completed application for teacher certification in Washington.",
        "Some aspiring educators may be confused about the difference between teaching certification and teaching certificates. Teacher certification is another term for the licensure required to teach in public schools, while a teaching certificate is awarded upon completion of an academic program.",
        "In Texas, the minimum age to work is 14. Unlike some states, Texas does not require juvenile workers to obtain a child employment certificate or an age certificate to work. A prospective employer that wants one can request a certificate of age for any minors it employs, obtainable from the Texas Workforce Commission.",
    ],
)
print(ranks)
# [
#     {'corpus_id': 0, 'score': 0.97105724},
#     {'corpus_id': 1, 'score': 0.8632406},
#     {'corpus_id': 2, 'score': 0.0053671156},
#     {'corpus_id': 4, 'score': 0.0016804343},
#     {'corpus_id': 3, 'score': 0.0012104829},
# ]