Modules

sentence_transformers.base.modules defines different building blocks, a.k.a. Modules, that can be used to create models from scratch.

Common Modules

class sentence_transformers.base.modules.Transformer(model_name_or_path: str, *, transformer_task: Literal['feature-extraction', 'sequence-classification', 'text-generation', 'any-to-any', 'fill-mask'] = 'feature-extraction', model_kwargs: dict[str, Any] | None = None, processor_kwargs: dict[str, Any] | None = None, config_kwargs: dict[str, Any] | None = None, processing_kwargs: ProcessingKwargs | None = None, backend: Literal['torch', 'onnx', 'openvino'] = 'torch', modality_config: dict[Literal['text', 'image', 'audio', 'video', 'message'] | tuple[Literal['text', 'image', 'audio', 'video'], ...], ModalityParams] | None = None, module_output_name: str | None = None, unpad_inputs: bool | None = None, max_seq_length: int | None = None, do_lower_case: bool = False, tokenizer_name_or_path: str | None = None)[source]

Hugging Face AutoModel wrapper that handles loading, preprocessing, and inference.

Loads the appropriate model class (e.g. BERT, RoBERTa, CLIP, Whisper) based on the model configuration and the specified transformer_task. Supports text, image, audio, and video modalities depending on the underlying model. This module is typically the first module in a SentenceTransformer, SparseEncoder, or CrossEncoder pipeline.

Parameters:
  • model_name_or_path (str) – Hugging Face model name or path to a local model directory.

  • transformer_task (str, optional) –

    The task determining which AutoModel-like class to load. Supported values are "feature-extraction", "sequence-classification", "text-generation", "any-to-any", and "fill-mask".

    Defaults to "feature-extraction".

  • model_kwargs (dict[str, Any], optional) –

    Keyword arguments forwarded to AutoModel.from_pretrained when loading the model. Particularly useful options include:

    • torch_dtype: Override the default torch.dtype and load the model under a specific dtype. Can be torch.float16, torch.bfloat16, torch.float32, or "auto" to use the dtype from the model’s config.json.

    • attn_implementation: The attention implementation to use. For example "eager", "sdpa", or "flash_attention_2". If you pip install kernels, then "flash_attention_2" should work without having to install flash_attn. It is frequently the fastest option. Defaults to "sdpa" when available (torch>=2.1.1).

    • device_map: Device map for model parallelism, e.g. "auto".

    • provider: For backend="onnx", the ONNX execution provider (e.g. "CUDAExecutionProvider").

    • file_name: For backend="onnx" or "openvino", the filename to load (e.g. for optimized or quantized models).

    • export: For backend="onnx" or "openvino", whether to export the model to the backend format. Also set automatically if the exported file doesn’t exist.

    See the PreTrainedModel.from_pretrained documentation for more details. Defaults to None.

  • processor_kwargs (dict[str, Any], optional) – Keyword arguments forwarded to AutoProcessor.from_pretrained when loading the processor/tokenizer. See the AutoTokenizer.from_pretrained documentation for more details. Defaults to None.

  • config_kwargs (dict[str, Any], optional) – Keyword arguments forwarded to AutoConfig.from_pretrained when loading the config. See the AutoConfig.from_pretrained documentation for more details. Defaults to None.

  • processing_kwargs (dict[str, dict[str, Any]], optional) – Keyword arguments applied when calling the processor during preprocessing. This is a nested dict whose keys are modality names ("text", "audio", "image", "video"), "common" for kwargs shared across all modalities, or "chat_template" for kwargs forwarded to apply_chat_template (e.g. {"add_generation_prompt": True}). Modality and common kwargs override the built-in defaults. Saved to and loaded from the model configuration file. Defaults to None.

  • backend (str, optional) – Backend used for model inference. Can be "torch" (default), "onnx", or "openvino". Defaults to "torch".

  • modality_config (dict, optional) – Custom modality configuration mapping modality names to method and output name dicts. When provided, module_output_name must also be set. The "message" modality entry may include a "format" key ("structured", "flat", or "auto") to control how chat-template inputs are formatted. Defaults to None.

  • module_output_name (str, optional) – The name of the output feature this module creates (e.g. "token_embeddings", "scores"). Required when modality_config is provided. Defaults to None.

  • unpad_inputs (bool, optional) – Controls whether text-only inputs are concatenated without padding for faster inference using flash attention’s variable-length functions. Non-text inputs (images, audio, video) are always padded normally. If None (default), unpadding is enabled automatically when all prerequisites are met (flash attention with variable-length support, "torch" backend, "feature-extraction" task). Set to False to force padding, which is needed for architectures that don’t support unpadded inputs (e.g. qwen2_vl). Set to True to request unpadding explicitly; a warning is logged if the prerequisites are not met. Defaults to None.

  • max_seq_length (int, optional) – Truncate any inputs longer than this value. Prefer setting model_max_length via processor_kwargs instead. Defaults to None.

  • do_lower_case (bool, optional) – If True, lowercases the input (independent of whether the model is cased or not). Rarely needed. Defaults to False.

  • tokenizer_name_or_path (str, optional) – Name or path of the tokenizer. When None, model_name_or_path is used. Deprecated. Defaults to None.

get_embedding_dimension() int[source]

Get the output embedding dimension from the transformer model.

Returns:

The hidden dimension size of the model’s embeddings.

Return type:

int

Raises:

ValueError – If the embedding dimension cannot be determined from the model config.

property max_seq_length: int | None

The maximum input sequence length. Reads from the tokenizer if available, otherwise falls back to max_position_embeddings from the model config.

property modalities: list[Literal['text', 'image', 'audio', 'video', 'message'] | tuple[Literal['text', 'image', 'audio', 'video'], ...]]

The list of supported input modalities (e.g. "text", "image", ("image", "text")).

preprocess(inputs: list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | MessageDict | list[MessageDict] | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict] | tuple[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict], str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]] | list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]]], prompt: str | None = None, **kwargs) dict[str, Any][source]

Preprocess inputs into model-ready features.

Parameters:
  • inputs – List of inputs. Can contain strings, dicts with modality keys, PIL images, or numpy/torch arrays for audio/video.

  • prompt – Optional prompt to prepend to text inputs or inject as a system message.

  • **kwargs – Additional keyword arguments forwarded to prompt length computation (e.g. task). Only used when prompt is provided for text inputs.

Returns:

Dictionary containing preprocessed tensors with a modality key indicating the input type and optionally a prompt_length key for prompt-aware pooling.

class sentence_transformers.base.modules.Dense(in_features: int, out_features: int, bias: bool = True, activation_function: Callable[[Tensor], Tensor] | None = Tanh(), init_weight: Tensor | None = None, init_bias: Tensor | None = None, module_input_name: str = 'sentence_embedding', module_output_name: str | None = None)[source]

Applies a linear transformation with an optional activation function.

Passes the embedding through a feed-forward layer (nn.Linear + activation), useful for dimensionality reduction or projecting embeddings into a different space.

Parameters:
  • in_features – Size of the input dimension.

  • out_features – Size of the output dimension.

  • bias – Whether to include a bias vector in the linear layer.

  • activation_function – Activation function applied after the linear layer. If None, uses nn.Identity(). Defaults to nn.Tanh().

  • init_weight – Initial value for the weight matrix of the linear layer.

  • init_bias – Initial value for the bias vector of the linear layer.

  • module_input_name – The key in the features dictionary to read the input from. Defaults to "sentence_embedding".

  • module_output_name – The key in the features dictionary to store the output in. If None, uses the same key as module_input_name.

class sentence_transformers.base.modules.Router(sub_modules: dict[str, list[Module]], default_route: str | None = None, allow_empty_key: bool = True, route_mappings: dict[tuple[str | None, str | tuple[str, ...] | None], str] | None = None)[source]

This module allows creating flexible SentenceTransformer models that dynamically route inputs to different processing modules based on:

  1. Task type (e.g., “query” or “document”) for asymmetric retrieval models

  2. Modality (e.g., “text”, “image”, or (“text”, “image”)) for crossmodal or multimodal models

  3. Combination of both for complex routing scenarios

Tips:

  • The task argument in model.encode() specifies which route to use

  • model.encode_query() and model.encode_document() are convenient shorthands for task="query" and task="document"

  • Modality is automatically inferred from input data (text strings, PIL Images, etc.)

  • You can override automatic inference by passing modality in model.encode() (and its variants) explicitly

Route Priority:

  1. Exact match: (task, modality) - e.g., ("query", "text")

  2. Task with any modality: (task, None) - e.g., ("query", None)

  3. Any task with modality: (None, modality) - e.g., (None, "image")

  4. Catch-all: (None, None)

  5. Direct lookup by task name in sub_modules

  6. Direct lookup by modality name in sub_modules

  7. Fall back to default_route if set

In the examples below, the Router module is used to create asymmetric models with different encoders for queries and documents. The “query” route is lightweight (e.g. a SparseStaticEmbedding module), while the “document” route uses a more powerful model (e.g. a Transformer module). This allows efficient query encoding while still using a strong document encoder, though the routes are not limited to this combination.

Example

from sentence_transformers import SentenceTransformer
from sentence_transformers.sentence_transformer.modules import Router, Normalize

# Use a regular SentenceTransformer for the document embeddings, and a static embedding model for the query embeddings
document_embedder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
query_embedder = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")
router = Router.for_query_document(
    query_modules=list(query_embedder.children()),
    document_modules=list(document_embedder.children()),
)
normalize = Normalize()

# Create an asymmetric model with different encoders for queries and documents
model = SentenceTransformer(
    modules=[router, normalize],
)

# ... requires more training to align the vector spaces

# Use the query & document routes
query_embedding = model.encode_query("What is the capital of France?")
document_embedding = model.encode_document("Paris is the capital of France.")

Sparse Encoder Example:

from sentence_transformers.sparse_encoder.modules import Router, SparseStaticEmbedding, SpladePooling, Transformer
from sentence_transformers.sparse_encoder import SparseEncoder

# Load an asymmetric model with different encoders for queries and documents
doc_encoder = Transformer("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill", transformer_task="fill-mask")
router = Router.for_query_document(
    query_modules=[
        SparseStaticEmbedding.from_json(
            "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill",
            tokenizer=doc_encoder.tokenizer,
            frozen=True,
        ),
    ],
    document_modules=[
        doc_encoder,
        SpladePooling(pooling_strategy="max", activation_function="log1p_relu"),
    ],
)

model = SparseEncoder(modules=[router], similarity_fn_name="dot")

query = "What's the weather in ny now?"
document = "Currently New York is rainy."

query_embed = model.encode_query(query)
document_embed = model.encode_document(document)

sim = model.similarity(query_embed, document_embed)
print(f"Similarity: {sim}")

# Visualize top tokens for each text
top_k = 10
print(f"Top tokens {top_k} for each text:")

decoded_query = model.decode(query_embed, top_k=top_k)
decoded_document = model.decode(document_embed)

for i in range(min(top_k, len(decoded_query))):
    query_token, query_score = decoded_query[i]
    doc_score = next((score for token, score in decoded_document if token == query_token), 0)
    if doc_score != 0:
        print(f"Token: {query_token}, Query score: {query_score:.4f}, Document score: {doc_score:.4f}")

'''
Similarity: tensor([[11.1105]], device='cuda:0')
Top tokens 10 for each text:
Token: ny, Query score: 5.7729, Document score: 0.8049
Token: weather, Query score: 4.5684, Document score: 0.9710
Token: now, Query score: 3.5895, Document score: 0.4720
Token: ?, Query score: 3.3313, Document score: 0.0286
Token: what, Query score: 2.7699, Document score: 0.0787
Token: in, Query score: 0.4989, Document score: 0.0417
'''

Multimodal Example:

from PIL import Image
from sentence_transformers import SentenceTransformer
from sentence_transformers.sentence_transformer.modules import Dense, Pooling, Router, Transformer

# Create separate encoders for different modalities
text_encoder = Transformer("sentence-transformers/all-MiniLM-L6-v2")
# Project to 768 dims to match image encoder
text_dense = Dense(text_encoder.get_embedding_dimension(), 768, module_input_name="token_embeddings")
image_encoder = Transformer(
    "ModernVBERT/modernvbert",
    model_kwargs={"trust_remote_code": True},
    processor_kwargs={"trust_remote_code": True},
    config_kwargs={"trust_remote_code": True},
)
pooling = Pooling(768)  # matches the 768-dim outputs of both routes

# Route based on modality
router = Router(
    sub_modules={
        "text": [text_encoder, text_dense],
        "image": [image_encoder],
    },
    route_mappings={
        (None, "text"): "text",  # Any task with text goes to text encoder
        (None, ("text", "image")): "image",  # Any task with text-image together goes to image encoder
    },
)

model = SentenceTransformer(modules=[router, pooling])

# Modality is automatically inferred
text_embedding = model.encode("A photo of a cat")
multimodal_embedding = model.encode({"text": "A photo of a <image>", "image": Image.open("cat.jpg")})

# Compute the similarity; it'll be poor as the model hasn't yet been trained
similarity = model.similarity(text_embedding, multimodal_embedding)

Hybrid Asymmetric + Multimodal Example:

from PIL import Image

from sentence_transformers import SentenceTransformer
from sentence_transformers.sentence_transformer.modules import Router

# Different encoders for query text, document text, and images.
# query_text_modules, document_text_modules, and image_modules are
# placeholder lists of modules defined elsewhere.
router = Router(
    sub_modules={
        "query_text": query_text_modules,
        "doc_text": document_text_modules,
        "image": image_modules,
    },
    route_mappings={
        ("query", "text"): "query_text",        # Query text uses efficient encoder
        ("document", "text"): "doc_text",       # Document text uses powerful encoder
        (None, ("text", "image")): "image",     # Any text-image together goes to image encoder
    },
)

model = SentenceTransformer(modules=[router])

# Explicit task + automatic modality inference
query_embedding = model.encode_query("Find images of cats")
doc_embedding = model.encode_document("Article about cats")
multimodal_embedding = model.encode({"text": "A photo of a cat", "image": Image.open("cat.jpg")})

Note

When training models with the Router module, you must use the router_mapping argument in the SentenceTransformerTrainingArguments or SparseEncoderTrainingArguments to map the training dataset columns to the correct route (“query” or “document”). For example, if your training dataset(s) have ["question", "positive", "negative"] columns, then you can use the following mapping:

args = SparseEncoderTrainingArguments(
    ...,
    router_mapping={
        "question": "query",
        "positive": "document",
        "negative": "document",
    }
)

Additionally, it is common to use different learning rates for the different routes. For this, use the learning_rate_mapping argument in the SentenceTransformerTrainingArguments or SparseEncoderTrainingArguments to map parameter name patterns to learning rates. For example, to use a learning rate of 1e-3 for a SparseStaticEmbedding module and 2e-5 for the rest of the model:

args = SparseEncoderTrainingArguments(
    ...,
    learning_rate=2e-5,
    learning_rate_mapping={
        r"SparseStaticEmbedding\.*": 1e-3,
    }
)

Parameters:
  • sub_modules – Mapping of route keys to lists of modules. Each key corresponds to a specific route name (e.g., “text_query”, “text_document”, “image”, “multimodal”). Each route contains a list of modules that will be applied sequentially when that route is selected.

  • default_route – The default route to use if no task type or modality is specified. If None, an exception will be thrown if no task type is specified. If allow_empty_key is True, the first key in sub_modules will be used as the default route. Defaults to None.

  • allow_empty_key – If True, allows the default route to be set to the first key in sub_modules if default_route is None. Defaults to True.

  • route_mappings

    Optional dictionary mapping (task, modality) tuples to route keys in sub_modules. This enables sophisticated routing logic based on combinations of task and modality:

    • Use None as a wildcard for either task or modality to create catch-all rules

    • Modality can be a string (e.g., "text", "image") or tuple (e.g., ("text", "image"))

    • Routes are resolved with a priority order (see Route Priority above)

    • All mapped routes must exist in sub_modules (validated at initialization)

    Example mappings:

    {
        # Exact matches (highest priority)
        ("query", "text"): "efficient_text_encoder",
        ("document", "text"): "powerful_text_encoder",
    
        # Task with any modality
        ("query", None): "query_encoder",  # All query tasks
    
        # Any task with specific modality
        (None, "image"): "image_encoder",  # All image inputs
        (None, ("text", "image")): "multimodal_encoder",  # Multimodal inputs
    
        # Catch-all (lowest priority)
        (None, None): "default_encoder",
    }
    

    If not provided, the router will attempt direct lookup using the task or modality as the route key in sub_modules, then fall back to default_route.

classmethod for_query_document(query_modules: list[Module], document_modules: list[Module], default_route: str | None = 'document', allow_empty_key: bool = True) Self[source]

Creates a Router model specifically for query and document modules, allowing convenient usage via model.encode_query and model.encode_document.

Parameters:
  • query_modules – List of modules to be applied for the “query” task type.

  • document_modules – List of modules to be applied for the “document” task type.

  • default_route – The default route to use if no task type is specified. If None, an exception will be thrown if no task type is specified. If allow_empty_key is True, the first key in sub_modules will be used as the default route. Defaults to “document”.

  • allow_empty_key – If True, allows the default route to be set to the first key in sub_modules if default_route is None. Defaults to True.

Returns:

An instance of the Router model with the specified query and document modules.

Return type:

Router

Base Modules

class sentence_transformers.base.modules.Module(*args, **kwargs)[source]

Base class for all modules in the Sentence Transformers library.

This class provides a common interface for all modules, including methods for loading and saving the module’s configuration and weights. It also provides a method for performing the forward pass of the module.

Two abstract methods are defined in this class and must be implemented by subclasses: forward and save.

To assist with loading and saving the module, several utility methods are provided, including load, load_config, load_dir_path, load_file_path, load_torch_weights, save_config, and save_torch_weights.

Several class variables are also defined to assist with loading and saving: config_file_name, config_keys, and save_in_root.

config_file_name: str = 'config.json'

The name of the configuration file used to save the module’s configuration. This file is used to initialize the module when loading it from a pre-trained model.

config_keys: list[str] = []

A list of keys used to save the module’s configuration. These keys are used to save the module’s configuration when saving the model to disk.

abstract forward(features: dict[str, Tensor | Any], **kwargs) dict[str, Tensor | Any][source]

Forward pass of the module. This method should be overridden by subclasses to implement the specific behavior of the module.

The forward method takes a dictionary of features as input and returns a dictionary of features as output. The keys in the features dictionary depend on the position of the module in the model pipeline, as the features dictionary is passed from one module to the next. Common keys in the features dictionary are:

  • input_ids: The input IDs of the tokens in the input text.

  • attention_mask: The attention mask for the input tokens.

  • token_type_ids: The token type IDs for the input tokens.

  • token_embeddings: The token embeddings for the input tokens.

  • sentence_embedding: The sentence embedding for the input text, i.e. pooled token embeddings.

Optionally, the forward method can accept additional keyword arguments (**kwargs) that can be used to pass additional information from model.encode to this module.

Parameters:
  • features (dict[str, torch.Tensor | Any]) – A dictionary of features to be processed by the module.

  • **kwargs – Additional keyword arguments that can be used to pass additional information from model.encode.

Returns:

A dictionary of features after processing by the module.

Return type:

dict[str, torch.Tensor | Any]

get_config_dict() dict[str, Any][source]

Returns a dictionary of the configuration parameters of the module.

These parameters are used to save the module’s configuration when saving the model to disk, and again used to initialize the module when loading it from a pre-trained model. The keys used in the dictionary are defined in the config_keys class variable.

Returns:

A dictionary of the configuration parameters of the module.

Return type:

dict[str, Any]

classmethod load(model_name_or_path: str, subfolder: str = '', token: bool | str | None = None, cache_folder: str | None = None, revision: str | None = None, local_files_only: bool = False, **kwargs) Self[source]

Load this module from a model checkpoint. The checkpoint can be either a local directory or a model id on Hugging Face.

Parameters:
  • model_name_or_path (str) – The path to the model directory or the name of the model on Hugging Face.

  • subfolder (str, optional) – The subfolder within the model directory to load from, e.g. "1_Pooling". Defaults to "".

  • token (bool | str | None, optional) – The token to use for authentication when loading from Hugging Face. If None, tries to use a token saved using huggingface-cli login or the HF_TOKEN environment variable. Defaults to None.

  • cache_folder (str | None, optional) – The folder to use for caching the model files. If None, uses the default cache folder for Hugging Face, ~/.cache/huggingface. Defaults to None.

  • revision (str | None, optional) – The revision of the model to load. If None, uses the latest revision. Defaults to None.

  • local_files_only (bool, optional) – Whether to only load local files. Defaults to False.

  • **kwargs – Additional module-specific arguments used in an overridden load method, such as trust_remote_code, model_kwargs, processor_kwargs, config_kwargs, backend, etc.

Returns:

The loaded module.

Return type:

Self

classmethod load_config(model_name_or_path: str, subfolder: str = '', config_filename: str | None = None, token: bool | str | None = None, cache_folder: str | None = None, revision: str | None = None, local_files_only: bool = False) dict[str, Any][source]

Load the configuration of the module from a model checkpoint. The checkpoint can be either a local directory or a model id on Hugging Face. The configuration is loaded from a JSON file, which contains the parameters used to initialize the module.

Parameters:
  • model_name_or_path (str) – The path to the model directory or the name of the model on Hugging Face.

  • subfolder (str, optional) – The subfolder within the model directory to load from, e.g. "1_Pooling". Defaults to "".

  • config_filename (str | None, optional) – The name of the configuration file to load. If None, uses the default configuration file name defined in the config_file_name class variable. Defaults to None.

  • token (bool | str | None, optional) – The token to use for authentication when loading from Hugging Face. If None, tries to use a token saved using huggingface-cli login or the HF_TOKEN environment variable. Defaults to None.

  • cache_folder (str | None, optional) – The folder to use for caching the model files. If None, uses the default cache folder for Hugging Face, ~/.cache/huggingface. Defaults to None.

  • revision (str | None, optional) – The revision of the model to load. If None, uses the latest revision. Defaults to None.

  • local_files_only (bool, optional) – Whether to only load local files. Defaults to False.

Returns:

A dictionary of the configuration parameters of the module.

Return type:

dict[str, Any]

static load_dir_path(model_name_or_path: str, subfolder: str = '', token: bool | str | None = None, cache_folder: str | None = None, revision: str | None = None, local_files_only: bool = False) str | None[source]

A utility function to load a directory from a model checkpoint. The checkpoint can be either a local directory or a model id on Hugging Face.

Parameters:
  • model_name_or_path (str) – The path to the model directory or the name of the model on Hugging Face.

  • subfolder (str, optional) – The subfolder within the model directory to load from, e.g. "1_Pooling". Defaults to "".

  • token (bool | str | None, optional) – The token to use for authentication when loading from Hugging Face. If None, tries to use a token saved using huggingface-cli login or the HF_TOKEN environment variable. Defaults to None.

  • cache_folder (str | None, optional) – The folder to use for caching the model files. If None, uses the default cache folder for Hugging Face, ~/.cache/huggingface. Defaults to None.

  • revision (str | None, optional) – The revision of the model to load. If None, uses the latest revision. Defaults to None.

  • local_files_only (bool, optional) – Whether to only load local files. Defaults to False.

Returns:

The path to the loaded directory.

Return type:

str | None

static load_file_path(model_name_or_path: str, filename: str, subfolder: str = '', token: bool | str | None = None, cache_folder: str | None = None, revision: str | None = None, local_files_only: bool = False) str | None[source]

A utility function to load a file from a model checkpoint. The checkpoint can be either a local directory or a model id on Hugging Face. The file is loaded from the specified subfolder within the model directory.

Parameters:
  • model_name_or_path (str) – The path to the model directory or the name of the model on Hugging Face.

  • filename (str) – The name of the file to load.

  • subfolder (str, optional) – The subfolder within the model directory to load from, e.g. "1_Pooling". Defaults to "".

  • token (bool | str | None, optional) – The token to use for authentication when loading from Hugging Face. If None, tries to use a token saved using huggingface-cli login or the HF_TOKEN environment variable. Defaults to None.

  • cache_folder (str | None, optional) – The folder to use for caching the model files. If None, uses the default cache folder for Hugging Face, ~/.cache/huggingface. Defaults to None.

  • revision (str | None, optional) – The revision of the model to load. If None, uses the latest revision. Defaults to None.

  • local_files_only (bool, optional) – Whether to only load local files. Defaults to False.

Returns:

The path to the loaded file, or None if the file was not found.

Return type:

str | None

classmethod load_torch_weights(model_name_or_path: str, subfolder: str = '', token: bool | str | None = None, cache_folder: str | None = None, revision: str | None = None, local_files_only: bool = False, model: Self | None = None)[source]

A utility function to load the PyTorch weights of a model from a checkpoint. The checkpoint can be either a local directory or a model id on Hugging Face. The weights are loaded from either a model.safetensors file or a pytorch_model.bin file, depending on which one is available. This method either loads the weights into the model or returns the weights as a state dictionary.

Parameters:
  • model_name_or_path (str) – The path to the model directory or the name of the model on Hugging Face.

  • subfolder (str, optional) – The subfolder within the model directory to load from, e.g. "2_Dense". Defaults to "".

  • token (bool | str | None, optional) – The token to use for authentication when loading from Hugging Face. If None, tries to use a token saved using huggingface-cli login or the HF_TOKEN environment variable. Defaults to None.

  • cache_folder (str | None, optional) – The folder to use for caching the model files. If None, uses the default cache folder for Hugging Face, ~/.cache/huggingface. Defaults to None.

  • revision (str | None, optional) – The revision of the model to load. If None, uses the latest revision. Defaults to None.

  • local_files_only (bool, optional) – Whether to only load local files. Defaults to False.

  • model (Self | None, optional) – The model to load the weights into. If None, returns the weights as a state dictionary. Defaults to None.

Raises:

ValueError – If neither a model.safetensors file nor a pytorch_model.bin file is found in the model checkpoint in the subfolder.

Returns:

The model with the loaded weights, or the weights as a state dictionary, depending on the value of the model argument.

Return type:

Self | dict[str, torch.Tensor]

abstract save(output_path: str, *args, safe_serialization: bool = True, **kwargs) None[source]

Save the module to disk. This method should be overridden by subclasses to implement the specific behavior of the module.

Parameters:
  • output_path (str) – The path to the directory where the module should be saved.

  • *args – Additional arguments that can be used to pass additional information to the save method.

  • safe_serialization (bool, optional) – Whether to use the safetensors format for saving the model weights. Defaults to True.

  • **kwargs – Additional keyword arguments that can be used to pass additional information to the save method.

save_config(output_path: str, filename: str | None = None) None[source]

Save the configuration of the module to a JSON file.

Parameters:
  • output_path (str) – The path to the directory where the configuration file should be saved.

  • filename (str | None, optional) – The name of the configuration file. If None, uses the default configuration file name defined in the config_file_name class variable. Defaults to None.

Returns:

None

save_in_root: bool = False

Whether to save the module’s configuration in the root directory of the model or in a subdirectory named after the module.

save_torch_weights(output_path: str, safe_serialization: bool = True) None[source]

Save the PyTorch weights of the module to disk.

Parameters:
  • output_path (str) – The path to the directory where the weights should be saved.

  • safe_serialization (bool, optional) – Whether to use the safetensors format for saving the model weights. Defaults to True.

Returns:

None

class sentence_transformers.base.modules.InputModule(*args, **kwargs)[source]

Subclass of sentence_transformers.base.modules.Module and base class for all input modules in the Sentence Transformers library, i.e. modules that preprocess raw inputs and may additionally perform processing in the forward pass.

This class provides a common interface for all input modules, including methods for loading and saving the module’s configuration and weights, as well as input processing. It also provides a method for performing the forward pass of the module.

Two abstract methods are inherited from Module and must be implemented by subclasses. Additionally, subclasses should override the input-processing methods, and may optionally need to override further methods. To assist with loading and saving the module, several utility methods and class variables are provided.

property modalities: list[Literal['text', 'image', 'audio', 'video', 'message'] | tuple[Literal['text', 'image', 'audio', 'video'], ...]]

The list of supported input modalities. Defaults to ["text"].
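A multimodal input module would typically override this property; a hypothetical sketch (the class name and its behavior are assumptions, only the property's return type follows the annotation above):

```python
class ImageTextInputModule:
    """Hypothetical stand-in overriding the modalities property."""

    @property
    def modalities(self):
        # Supports text inputs, image inputs, and text-image pairs,
        # expressed as literals and tuples per the annotated return type.
        return ["text", "image", ("text", "image")]


module = ImageTextInputModule()
print(module.modalities)  # ['text', 'image', ('text', 'image')]
```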

preprocess(inputs: list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | MessageDict | list[MessageDict] | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict] | tuple[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict], str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]] | list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]]], prompt: str | None = None, **kwargs) dict[str, Tensor | Any][source]

Preprocesses the inputs (texts, images, audio, video, or messages, depending on the module’s supported modalities) and returns a dictionary of preprocessed features.

Parameters:
  • inputs (list[SingleInput | PairInput]) – List of inputs to preprocess.

  • prompt (str | None) – Optional prompt to prepend to text inputs.

  • **kwargs – Additional keyword arguments for preprocessing, e.g. task.

Returns:

Dictionary containing preprocessed features, e.g. {"input_ids": ..., "attention_mask": ...}, depending on what keys the module’s forward method expects.

Return type:

dict[str, torch.Tensor | Any]

save_in_root: bool = True

Whether to save the module’s configuration in the root directory of the model or in a subdirectory named after the module.

save_tokenizer(output_path: str, **kwargs) None[source]

Saves the tokenizer to the specified output path.

Parameters:
  • output_path (str) – Path to save the tokenizer.

  • **kwargs – Additional keyword arguments for saving the tokenizer.

Returns:

None

tokenize(texts: list[str], **kwargs) dict[str, Tensor | Any][source]

Deprecated: tokenize is deprecated. Use preprocess instead.

Tokenizes the input texts and returns a dictionary of tokenized features.

Parameters:
  • texts (list[str]) – List of input texts to tokenize.

  • **kwargs – Additional keyword arguments for tokenization, e.g. task.

Returns:

Dictionary containing tokenized features, e.g. {"input_ids": ..., "attention_mask": ...}

Return type:

dict[str, torch.Tensor | Any]
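The deprecation above follows a common pattern: a legacy tokenize method that emits a DeprecationWarning and delegates to preprocess, so both entry points stay consistent during a transition period. A hedged sketch of that pattern (a stand-in class, not the library’s actual code):

```python
import warnings


class LegacyInputModule:
    """Hypothetical sketch of the tokenize -> preprocess deprecation pattern."""

    def preprocess(self, inputs, prompt=None, **kwargs):
        # Real modules return tensors; a dict of token lists stands in here.
        return {"input_ids": [text.split() for text in inputs]}

    def tokenize(self, texts, **kwargs):
        warnings.warn(
            "tokenize is deprecated. Use preprocess instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Delegate to the replacement method so behavior cannot drift.
        return self.preprocess(texts, **kwargs)


module = LegacyInputModule()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    features = module.tokenize(["a b"])
print(features["input_ids"])  # [['a', 'b']]
```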

tokenizer: PreTrainedTokenizerBase | Tokenizer

The tokenizer used for tokenizing the input texts. It can be either a transformers.PreTrainedTokenizerBase subclass or a Tokenizer from the tokenizers library.