Evaluation
CrossEncoder models have their own evaluation classes, located in sentence_transformers.cross_encoder.evaluation.
CEBinaryAccuracyEvaluator
- class sentence_transformers.cross_encoder.evaluation.CEBinaryAccuracyEvaluator(sentence_pairs: list[list[str]], labels: list[int], name: str = '', threshold: float = 0.5, write_csv: bool = True)
This evaluator can be used with the CrossEncoder class.
It is designed for CrossEncoders with a single output. It measures the accuracy of the predicted class vs. the gold labels, using a fixed threshold to map the score to a label (0 vs. 1).
See CEBinaryClassificationEvaluator for an evaluator that automatically determines the optimal threshold.
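A minimal usage sketch (the model name and the toy pairs/labels are placeholders; like other Sentence Transformers evaluators, the instance is called with the model and returns the metric):

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CEBinaryAccuracyEvaluator

# Placeholder: any CrossEncoder with a single output score
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

sentence_pairs = [
    ["A man is eating food.", "A man eats something."],
    ["A man is eating food.", "The girl is carrying a baby."],
]
labels = [1, 0]  # gold binary labels

evaluator = CEBinaryAccuracyEvaluator(sentence_pairs, labels, name="dev", threshold=0.5)
accuracy = evaluator(model)  # fraction of pairs where the thresholded score matches the label
print(accuracy)
```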
CEBinaryClassificationEvaluator
- class sentence_transformers.cross_encoder.evaluation.CEBinaryClassificationEvaluator(sentence_pairs: list[list[str]], labels: list[int], name: str = '', show_progress_bar: bool = False, write_csv: bool = True)
This evaluator can be used with the CrossEncoder class. Given sentence pairs and binary labels (0 and 1), it computes the average precision and the best possible F1 score.
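A sketch under the same conventions (placeholder model and data); the returned value is the average precision:

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CEBinaryClassificationEvaluator

model = CrossEncoder("cross-encoder/quora-distilroberta-base")  # placeholder duplicate-question model

sentence_pairs = [
    ["How do I learn Python?", "What is the best way to learn Python?"],
    ["How do I learn Python?", "How do I bake bread?"],
]
labels = [1, 0]

evaluator = CEBinaryClassificationEvaluator(sentence_pairs, labels, name="quora-dev")
average_precision = evaluator(model)  # the best-F1 threshold is searched internally and logged
```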
CECorrelationEvaluator
- class sentence_transformers.cross_encoder.evaluation.CECorrelationEvaluator(sentence_pairs: list[list[str]], scores: list[float], name: str = '', write_csv: bool = True)
This evaluator can be used with the CrossEncoder class. Given sentence pairs and continuous gold scores, it computes the Pearson and Spearman correlation between the predicted scores and the gold scores.
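A sketch with placeholder STS-style data (gold scores assumed normalized to [0, 1]); the call returns the Spearman correlation, with Pearson also written to the CSV:

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

model = CrossEncoder("cross-encoder/stsb-roberta-base")  # placeholder STS regression model

sentence_pairs = [
    ["A plane is taking off.", "An air plane is taking off."],
    ["A man is playing a flute.", "A man is eating pasta."],
]
scores = [0.95, 0.05]  # placeholder gold similarity scores

evaluator = CECorrelationEvaluator(sentence_pairs, scores, name="sts-dev")
spearman = evaluator(model)
print(spearman)
```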
CEF1Evaluator
- class sentence_transformers.cross_encoder.evaluation.CEF1Evaluator(sentence_pairs: list[list[str]], labels: list[int], *, batch_size: int = 32, show_progress_bar: bool = False, name: str = '', write_csv: bool = True)
CrossEncoder F1 score based evaluator for binary and multiclass tasks.
The task type (binary or multiclass) is inferred from the labels array. For binary tasks the returned metric is the binary F1 score; for multiclass tasks it is the macro F1 score.
- Parameters:
sentence_pairs (List[List[str]]) – A list of sentence pairs, where each pair is a list of two strings.
labels (List[int]) – A list of integer labels corresponding to each sentence pair.
batch_size (int, optional) – Batch size for prediction. Defaults to 32.
show_progress_bar (bool, optional) – Whether to show a tqdm progress bar. Defaults to False.
name (str, optional) – An optional name for the CSV file with stored results. Defaults to an empty string.
write_csv (bool, optional) – Flag to determine if the data should be saved to a CSV file. Defaults to True.
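A sketch for the multiclass case (the model name and class indices are placeholders; with more than two distinct labels the evaluator reports macro F1):

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CEF1Evaluator

model = CrossEncoder("cross-encoder/nli-deberta-v3-base")  # placeholder 3-class model

sentence_pairs = [
    ["A man is eating food.", "A man eats something."],
    ["A man is eating food.", "A man is driving a car."],
    ["A man is eating food.", "The food is very tasty."],
]
labels = [1, 0, 2]  # placeholder class indices

evaluator = CEF1Evaluator(sentence_pairs, labels, batch_size=32, name="nli-dev")
f1 = evaluator(model)  # macro F1 here, since there are 3 classes
```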
CESoftmaxAccuracyEvaluator
- class sentence_transformers.cross_encoder.evaluation.CESoftmaxAccuracyEvaluator(sentence_pairs: list[list[str]], labels: list[int], name: str = '', write_csv: bool = True)
This evaluator can be used with the CrossEncoder class.
It is designed for CrossEncoders with 2 or more outputs. It measures the accuracy of the predicted class (the argmax over the outputs) vs. the gold labels.
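A sketch with a placeholder multi-output model and toy labels:

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CESoftmaxAccuracyEvaluator

model = CrossEncoder("cross-encoder/nli-deberta-v3-base")  # placeholder model with 3 output logits

sentence_pairs = [
    ["A man is eating food.", "A man eats something."],
    ["A man is eating food.", "A man is driving a car."],
]
labels = [1, 0]  # placeholder gold class indices

evaluator = CESoftmaxAccuracyEvaluator(sentence_pairs, labels, name="nli-dev")
accuracy = evaluator(model)  # argmax over the logits vs. the gold labels
```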
CERerankingEvaluator
- class sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator(samples, at_k: int = 10, name: str = '', write_csv: bool = True, mrr_at_k: int | None = None)
This class evaluates a CrossEncoder model for the task of re-ranking.
Given a query and a list of documents, it computes the score for each [query, doc_i] pair and sorts the documents in decreasing order of score. MRR@10 and NDCG@10 are then computed to measure the quality of the ranking.
- Parameters:
samples (List[Dict[str, Union[str, List[str]]]]) – Must be a list where each element is of the form: {'query': '', 'positive': [], 'negative': []}. Query is the search query, positive is a list of positive (relevant) documents, and negative is a list of negative (irrelevant) documents.
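A sketch of the expected samples format (the model name, query, and documents are placeholders); the call returns MRR at the configured cutoff, with NDCG also logged:

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CERerankingEvaluator

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # placeholder reranking model

samples = [
    {
        "query": "how do I learn python",
        "positive": ["The official Python tutorial is a good starting point."],
        "negative": [
            "The weather will be sunny tomorrow.",
            "Bread is baked at around 230 degrees Celsius.",
        ],
    },
]

evaluator = CERerankingEvaluator(samples, at_k=10, name="rerank-dev")
mrr = evaluator(model)
```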