Model

BaseModel

This is the base class for all models in the Sentence Transformers library.

class sentence_transformers.base.model.BaseModel(model_name_or_path: str | None = None, *, modules: list[Module] | OrderedDict[str, Module] | None = None, device: str | None = None, prompts: dict[str, str] | None = None, default_prompt_name: str | None = None, cache_folder: str | None = None, trust_remote_code: bool = False, revision: str | None = None, local_files_only: bool = False, token: bool | str | None = None, model_kwargs: dict[str, Any] | None = None, processor_kwargs: dict[str, Any] | None = None, config_kwargs: dict[str, Any] | None = None, model_card_data: CardData | None = None, backend: Literal['torch', 'onnx', 'openvino'] = 'torch')[source]

Base class for SentenceTransformer, SparseEncoder, and CrossEncoder models.

This class provides common functionality for:

  • Model loading (from Hub, local paths, or creating new models)

  • Model saving (to disk and Hub)

  • Device management

  • Module architecture (sequential composition)

  • Configuration management

  • Tokenizer/processor access

All models inherit from nn.Sequential and are composed of a sequence of modules that are called sequentially in the forward pass.

Initialize a BaseModel instance.

Parameters:
  • model_name_or_path (str, optional) – If a filepath on disk, loads the model from that path. Otherwise, tries to download a pre-trained model. If that fails, tries to construct a model from the Hugging Face Hub with that name. Defaults to None.

  • modules (list[nn.Module], optional) – A list of torch modules that are called sequentially. Can be used to create custom models from scratch. Defaults to None.

  • device (str, optional) – Device (like "cuda", "cpu", "mps", "npu") that should be used for computation. If None, checks if a GPU can be used. Defaults to None.

  • prompts (dict[str, str], optional) – A dictionary with prompts for the model. The key is the prompt name, the value is the prompt text. The prompt text will be prepended before any text during inference. For example: {"query": "query: ", "passage": "passage: "}. If a model has saved prompts, you can override them by passing your own, or pass {"query": "", "document": ""} to disable them. Defaults to None.

  • default_prompt_name (str, optional) – The name of the prompt that should be used by default. If not set, no prompt will be applied. Defaults to None.

  • cache_folder (str, optional) – Path to store models. Can also be set by the SENTENCE_TRANSFORMERS_HOME environment variable. Defaults to None.

  • trust_remote_code (bool, optional) – Whether to allow for custom models defined on the Hub in their own modeling files. Only set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. Defaults to False.

  • revision (str, optional) – The specific model version to use. It can be a branch name, a tag name, or a commit id, for a stored model on Hugging Face. Defaults to None.

  • local_files_only (bool, optional) – Whether to only look at local files (i.e., do not try to download the model). Defaults to False.

  • token (bool or str, optional) – Hugging Face authentication token to download private models. Defaults to None.

  • model_kwargs (dict[str, Any], optional) –

    Keyword arguments passed to the underlying Hugging Face Transformers model via AutoModel.from_pretrained. Particularly useful options include:

    • torch_dtype: Override the default torch.dtype and load the model under a specific dtype. Can be torch.float16, torch.bfloat16, torch.float32, or "auto" to use the dtype from the model’s config.json.

    • attn_implementation: The attention implementation to use. For example "eager", "sdpa", or "flash_attention_2". If you pip install kernels, then "flash_attention_2" should work without having to install flash_attn. It is frequently the fastest option. Defaults to "sdpa" when available (torch>=2.1.1).

    • device_map: Device map for model parallelism, e.g. "auto".

    • provider: For backend="onnx", the ONNX execution provider (e.g. "CUDAExecutionProvider").

    • file_name: For backend="onnx" or "openvino", the filename to load (e.g. for optimized or quantized models).

    • export: For backend="onnx" or "openvino", whether to export the model to the backend format. Also set automatically if the exported file doesn’t exist.

    See the PreTrainedModel.from_pretrained documentation for more details. Defaults to None.

  • processor_kwargs (dict[str, Any], optional) – Keyword arguments passed to the Hugging Face Transformers processor/tokenizer via AutoProcessor.from_pretrained. See the AutoTokenizer.from_pretrained documentation for more details. Defaults to None.

  • config_kwargs (dict[str, Any], optional) – Keyword arguments passed to the Hugging Face Transformers config via AutoConfig.from_pretrained. See the AutoConfig.from_pretrained documentation for more details. Defaults to None.

  • model_card_data (CardData, optional) – A model card data object that contains information about the model. Used to generate a model card when saving the model. If not set, a default model card data object is created. Defaults to None.

  • backend (str, optional) – The backend to use for inference. Can be "torch" (default), "onnx", or "openvino". Defaults to "torch".

property device: device

Get torch.device from module, assuming that the whole module has one device. In case there are no PyTorch parameters, fall back to CPU.

property dtype: dtype | None

The dtype of the module (assuming that all the module parameters have the same dtype).

Type:

torch.dtype

evaluate(evaluator: BaseEvaluator, output_path: str | None = None) dict[str, float] | float[source]

Evaluate the model using the given evaluator.

Parameters:
  • evaluator (BaseEvaluator) – The evaluator used to evaluate the model.

  • output_path (str, optional) – The path where the evaluator can write the results. Defaults to None.

Returns:

The evaluation results.

get_backend() Literal['torch', 'onnx', 'openvino'][source]

Return the backend used for inference, which can be one of "torch", "onnx", or "openvino".

Returns:

The backend used for inference.

Return type:

str

get_max_seq_length() int | None[source]

Deprecated: use the max_seq_length property instead.

Returns the maximal sequence length that the first module of the model accepts. Longer inputs will be truncated.

Returns:

The maximal sequence length that the model accepts, or None if it is not defined.

Return type:

Optional[int]

get_model_kwargs() list[str][source]

Get the keyword arguments specific to this model for inference methods like encode or predict.

Example

>>> from sentence_transformers import SentenceTransformer, SparseEncoder
>>> SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").get_model_kwargs()
[]
>>> SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True).get_model_kwargs()
['task', 'truncate_dim']
>>> SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill").get_model_kwargs()
['task']
Returns:

A list of keyword arguments for the forward pass.

Return type:

list[str]

gradient_checkpointing_enable(gradient_checkpointing_kwargs: dict[str, Any] | None = None) None[source]

Enable gradient checkpointing for the model.

is_singular_input(inputs: Any) bool[source]

Check if the input represents a single example or a batch of examples.

Parameters:

inputs – The input to check.

Returns:

True if the input is a single example, False if it is a batch.

Return type:

bool

property max_seq_length: int | None

Returns the maximal input sequence length for the model. Longer inputs will be truncated.

Returns:

The maximal input sequence length, or None if not defined.

Return type:

Optional[int]

property modalities: list[Literal['text', 'image', 'audio', 'video', 'message'] | tuple[Literal['text', 'image', 'audio', 'video'], ...]]

Return the list of modalities supported by this model, e.g. ["text"] or ["text", "image", "message"].

model_card_data_class[source]

alias of BaseModelCardData

preprocess(inputs: list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | MessageDict | list[MessageDict] | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict] | tuple[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict], str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]] | list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]]], prompt: str | None = None, **kwargs) dict[str, Tensor | Any][source]

Preprocesses the inputs for the model.

Parameters:
  • inputs (list[SingleInput | PairInput]) – A list of inputs to be preprocessed. Each input can be a string, dict, tuple, PIL Image, numpy array, torch Tensor, or other supported modality. If a single input is provided, it must be wrapped in a list.

  • prompt (str, optional) – A prompt string to prepend to text inputs. Defaults to None. If the model supports the message modality, the prompt will be added as a system message to the input messages instead of being prepended to text.

Returns:

A dictionary of tensors with the preprocessed inputs.

Return type:

dict[str, Tensor | Any]

property processor: Any

Property to get the processor used by this model.

push_to_hub(repo_id: str, token: str | None = None, private: bool | None = None, safe_serialization: bool = True, commit_message: str | None = None, local_model_path: str | None = None, exist_ok: bool = False, replace_model_card: bool = False, train_datasets: list[str] | None = None, revision: str | None = None, create_pr: bool = False) str[source]

Uploads all elements of this model to a Hugging Face Hub repository, creating it if it doesn’t exist.

Parameters:
  • repo_id (str) – Repository name for your model in the Hub, including the user or organization.

  • token (str, optional) – An authentication token (see https://huggingface.co/settings/token).

  • private (bool, optional) – Set to True to host the model as a private repository.

  • safe_serialization (bool, optional) – If True, save the model using safetensors. If False, save it the traditional PyTorch way.

  • commit_message (str, optional) – Message to commit while pushing.

  • local_model_path (str, optional) – Local path of the model. If set, the files at this path are uploaded. Otherwise, the current model is uploaded.

  • exist_ok (bool, optional) – If True, pushing to an existing repository is allowed. If False, only pushing to a new repository is possible.

  • replace_model_card (bool, optional) – If True, replace an existing model card on the Hub with the automatically created model card. If False (default), keep the existing model card if one exists in the repository.

  • train_datasets (List[str], optional) – Datasets used to train the model. If set, the datasets will be added to the model card on the Hub.

  • revision (str, optional) – Branch to push the uploaded files to.

  • create_pr (bool, optional) – If True, create a pull request instead of pushing directly to the main branch.

Returns:

The url of the commit of your model in the repository on the Hugging Face Hub.

Return type:

str

save_pretrained(path: str, model_name: str | None = None, create_model_card: bool = True, train_datasets: list[str] | None = None, safe_serialization: bool = True) None[source]

Saves a model and its configuration files to a directory, so that it can be loaded again.

Parameters:
  • path (str) – Path on disk where the model will be saved.

  • model_name (str, optional) – Optional model name.

  • create_model_card (bool, optional) – If True, create a README.md with basic information about this model.

  • train_datasets (List[str], optional) – Optional list with the names of the datasets used to train the model.

  • safe_serialization (bool, optional) – If True, save the model using safetensors. If False, save the model the traditional (but unsafe) PyTorch way.

save_to_hub(repo_id: str, organization: str | None = None, token: str | None = None, private: bool | None = None, safe_serialization: bool = True, commit_message: str = 'Add new model.', local_model_path: str | None = None, exist_ok: bool = False, replace_model_card: bool = False, train_datasets: list[str] | None = None) str[source]

DEPRECATED, use push_to_hub instead.

Uploads all elements of this model to a new Hugging Face Hub repository.

Parameters:
  • repo_id (str) – Repository name for your model in the Hub, including the user or organization.

  • token (str, optional) – An authentication token (see https://huggingface.co/settings/token).

  • private (bool, optional) – Set to True to host the model as a private repository.

  • safe_serialization (bool, optional) – If True, save the model using safetensors. If False, save it the traditional PyTorch way.

  • commit_message (str, optional) – Message to commit while pushing.

  • local_model_path (str, optional) – Local path of the model. If set, the files at this path are uploaded. Otherwise, the current model is uploaded.

  • exist_ok (bool, optional) – If True, pushing to an existing repository is allowed. If False, only pushing to a new repository is possible.

  • replace_model_card (bool, optional) – If True, replace an existing model card on the Hub with the automatically created model card.

  • train_datasets (List[str], optional) – Datasets used to train the model. If set, the datasets will be added to the model card on the Hub.

Returns:

The url of the commit of your model in the repository on the Hugging Face Hub.

Return type:

str

start_multi_process_pool(target_devices: list[str] | None = None) dict[Literal['input', 'output', 'processes'], Any][source]

Starts a multi-process pool to infer with several independent processes.

This method is recommended if you want to predict on multiple GPUs or CPUs. It is advised to start only one process per GPU. This method works together with predict and stop_multi_process_pool.

Parameters:

target_devices (List[str], optional) – PyTorch target devices, e.g. ["cuda:0", "cuda:1", ...], ["npu:0", "npu:1", ...], or ["cpu", "cpu", "cpu", "cpu"]. If target_devices is None and CUDA/NPU is available, then all available CUDA/NPU devices will be used. If target_devices is None and CUDA/NPU is not available, then 4 CPU devices will be used.

Returns:

A dictionary with the target processes, an input queue, and an output queue.

Return type:

Dict[str, Any]

static stop_multi_process_pool(pool: dict[Literal['input', 'output', 'processes'], Any]) None[source]

Stops all processes started with start_multi_process_pool.

Parameters:

pool (Dict[str, object]) – A dictionary containing the input queue, output queue, and process list.

Returns:

None

supports(modality: Literal['text', 'image', 'audio', 'video', 'message'] | tuple[Literal['text', 'image', 'audio', 'video'], ...]) bool[source]

Check if the model supports the given modality.

A modality is supported if:

  1. It is directly listed in modalities (including tuple modalities that are explicitly listed), or

  2. It is a tuple of modalities (e.g. ("image", "text")) where each part is individually supported and the model also supports "message" format, which is used to combine multiple modalities into a single input.

Parameters:

modality – A single modality string (e.g. "text", "image") or a tuple of modality strings (e.g. ("image", "text")).

Returns:

Whether the model supports the given modality.

Return type:

bool

Example:

>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
>>> model.supports("text")
True
>>> model.supports("image")
False
tokenize(texts: list[str] | list[dict] | list[tuple[str, str]], **kwargs) dict[str, Tensor][source]

Deprecated: tokenize is deprecated; use preprocess instead.

property tokenizer: Any

Property to get the tokenizer used by this model.

property transformers_model: PreTrainedModel | None

Property to get the underlying transformers PreTrainedModel instance, if it exists. Note that it’s possible for a model to have multiple underlying transformers models, but this property will return the first one it finds in the module hierarchy.

Note

This property can also return e.g. ORTModelForFeatureExtraction or OVModelForFeatureExtraction instances from the optimum-intel and optimum-onnx libraries, if the model is loaded using backend="onnx" or backend="openvino".

Returns:

The underlying transformers model or None if not found.

Return type:

PreTrainedModel or None

BaseModelCardData

class sentence_transformers.base.model_card.BaseModelCardData(language: str | list[str] | None = <factory>, license: str | None = None, model_name: str | None = None, model_id: str | None = None, train_datasets: list[dict[str, str]] = <factory>, eval_datasets: list[dict[str, str]] = <factory>, task_name: str | None = 'retrieval', tags: list[str] = <factory>, local_files_only: bool = False, generate_widget_examples: bool = True)[source]

A dataclass storing data used in the model card.

Parameters:
  • language (Optional[Union[str, List[str]]]) – The model language, either a string or a list, e.g. "en" or ["en", "de", "nl"].

  • license (Optional[str]) – The license of the model, e.g. "apache-2.0", "mit", or "cc-by-nc-sa-4.0".

  • model_name (Optional[str]) – The pretty name of the model.

  • model_id (Optional[str]) – The model ID when pushing the model to the Hub.

  • train_datasets (List[Dict[str, str]]) – A list of the names and/or Hugging Face dataset IDs of the training datasets, e.g. [{"name": "SNLI", "id": "stanfordnlp/snli"}, {"name": "MultiNLI", "id": "nyu-mll/multi_nli"}, {"name": "STSB"}].

  • eval_datasets (List[Dict[str, str]]) – A list of the names and/or Hugging Face dataset IDs of the evaluation datasets, e.g. [{"name": "SNLI", "id": "stanfordnlp/snli"}, {"id": "mteb/stsbenchmark-sts"}].

  • task_name (str) – The human-readable task the model is trained on.

  • tags (Optional[List[str]]) – A list of tags for the model.

  • local_files_only (bool) – If True, don’t attempt to find dataset or base model information on the Hub. Defaults to False.

  • generate_widget_examples (bool) – If True, generate widget examples from the evaluation or training dataset, and compute their similarities. Defaults to True.

Tip

Install codecarbon to automatically track carbon emission usage and include it in your model cards.