This page documents the properties and methods available once you have loaded a SentenceTransformer model:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('model-name')
class sentence_transformers.SentenceTransformer(model_name_or_path: str = None, modules: Iterable[torch.nn.modules.module.Module] = None, device: str = None)

Loads or creates a SentenceTransformer model that can be used to map sentences / text to embeddings.

  • model_name_or_path – If it is a filepath on disk, the model is loaded from that path. If it is not a path, it first tries to download a pre-trained SentenceTransformer model. If that fails, it tries to construct a model from the Hugging Face models repository with that name.

  • modules – This parameter can be used to create custom SentenceTransformer models from scratch.

  • device – Device (like ‘cuda’ / ‘cpu’) that should be used for computation. If None, checks if a GPU can be used.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

property device

Get torch.device from module, assuming that the whole module has one device.

encode(sentences: Union[str, List[str], List[int]], batch_size: int = 32, show_progress_bar: bool = None, output_value: str = 'sentence_embedding', convert_to_numpy: bool = True, convert_to_tensor: bool = False, is_pretokenized: bool = False, device: str = None, num_workers: int = 0) → Union[List[torch.Tensor], numpy.ndarray, torch.Tensor]

Computes sentence embeddings.

  • sentences – The sentences to embed.

  • batch_size – The batch size used for the computation.

  • show_progress_bar – Output a progress bar while encoding sentences.

  • output_value – Default sentence_embedding, to get sentence embeddings. Can be set to token_embeddings to get wordpiece token embeddings.

  • convert_to_numpy – If true, the output is a list of numpy vectors. Else, it is a list of PyTorch tensors.

  • convert_to_tensor – If true, you get one large tensor as return. Overrides any setting from convert_to_numpy.

  • is_pretokenized – If is_pretokenized=True, sentences must be a list of integers, containing the tokenized sentences with each token converted to its respective int.

  • device – Which torch.device to use for the computation.

  • num_workers – Number of background workers used to tokenize data. Set to a positive number to increase tokenization speed.

Returns: By default, a list of tensors is returned. If convert_to_tensor, a stacked tensor is returned. If convert_to_numpy, a numpy matrix is returned.
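A minimal sketch of calling encode(), assuming a model loaded as shown at the top of this page. The helper name embed_sentences is hypothetical and not part of the library; it only forwards parameters that appear in the signature above.

```python
def embed_sentences(model, sentences, batch_size=32):
    """Embed `sentences` with an already loaded SentenceTransformer model.

    `embed_sentences` is a hypothetical helper, not part of the library.
    """
    # Accept a single string for convenience; encode() is always given a list here.
    if isinstance(sentences, str):
        sentences = [sentences]
    # convert_to_tensor=True requests one stacked tensor and, per the
    # parameter description, takes precedence over convert_to_numpy.
    return model.encode(
        sentences,
        batch_size=batch_size,
        show_progress_bar=False,
        convert_to_tensor=True,
    )
```

With a real model this would be called as embed_sentences(SentenceTransformer('model-name'), ['First sentence.', 'Second sentence.']), where 'model-name' is the placeholder used on this page.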

encode_multi_process(sentences: List[str], pool: Dict[str, object], is_pretokenized: bool = False, chunk_size=None)

This method runs encode() on multiple GPUs. The sentences are chunked into smaller packages and sent to individual processes, which encode them on the different GPUs. This method is only suitable for encoding large sets of sentences.

  • sentences – List of sentences

  • pool – A pool of workers started with SentenceTransformer.start_multi_process_pool

  • is_pretokenized – If true, no tokenization will be applied. It is expected that the input sentences are lists of ints.

  • chunk_size – Sentences are chunked and sent to the individual processes. If None, a sensible size is determined automatically.


Returns: Numpy matrix with all embeddings
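The chunking step described above can be sketched in plain Python. This is only an illustration: the function name chunk_sentences, the num_processes argument, and the fallback rule used when chunk_size is None are assumptions, not the library's actual heuristic.

```python
import math


def chunk_sentences(sentences, num_processes, chunk_size=None):
    """Split `sentences` into consecutive chunks for worker processes.

    Hypothetical helper: the default chunk size below is an illustrative
    placeholder, not necessarily what the library computes.
    """
    if chunk_size is None:
        # Assumed rule: aim for roughly ten chunks per process, at least 1 item.
        chunk_size = max(1, math.ceil(len(sentences) / num_processes / 10))
    # Slice the input into consecutive chunks; the last one may be shorter.
    return [
        sentences[start:start + chunk_size]
        for start in range(0, len(sentences), chunk_size)
    ]
```

Because the chunks are consecutive slices, concatenating them in order restores the original sentence order, which is what lets the embeddings be reassembled into one matrix.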

evaluate(evaluator: sentence_transformers.evaluation.SentenceEvaluator.SentenceEvaluator, output_path: str = None)

Evaluate the model

  • evaluator – the evaluator

  • output_path – the evaluator can write the results to this path

fit(train_objectives: Iterable[Tuple[, torch.nn.modules.module.Module]], evaluator: sentence_transformers.evaluation.SentenceEvaluator.SentenceEvaluator = None, epochs: int = 1, steps_per_epoch=None, scheduler: str = 'WarmupLinear', warmup_steps: int = 10000, optimizer_class: Type[torch.optim.optimizer.Optimizer] = <class 'transformers.optimization.AdamW'>, optimizer_params: Dict[str, object] = {'correct_bias': False, 'eps': 1e-06, 'lr': 2e-05}, weight_decay: float = 0.01, evaluation_steps: int = 0, output_path: str = None, save_best_model: bool = True, max_grad_norm: float = 1, use_amp: bool = False, callback: Callable[[float, int, int], None] = None, output_path_ignore_not_empty: bool = False)

Train the model with the given training objectives. Each training objective is sampled in turn for one batch. We sample only as many batches from each objective as there are in the smallest one, so that each dataset contributes equally to training.

  • train_objectives – Tuples of (DataLoader, LossFunction). Pass more than one for multi-task learning

  • evaluator – An evaluator (sentence_transformers.evaluation) evaluates the model performance during training on held-out dev data. It is used to determine the best model that is saved to disk.

  • epochs – Number of epochs for training

  • steps_per_epoch – Number of training steps per epoch. If set to None (default), one epoch is equal to the DataLoader size from train_objectives.

  • scheduler – Learning rate scheduler. Available schedulers: constantlr, warmupconstant, warmuplinear, warmupcosine, warmupcosinewithhardrestarts

  • warmup_steps – Behavior depends on the scheduler. For WarmupLinear (default), the learning rate is increased from 0 up to the maximal learning rate. After this many training steps, the learning rate is decreased linearly back to zero.

  • optimizer_class – Optimizer

  • optimizer_params – Optimizer parameters

  • weight_decay – Weight decay for model parameters

  • evaluation_steps – If > 0, evaluate the model using evaluator after every evaluation_steps training steps

  • output_path – Storage path for the model and evaluation files

  • save_best_model – If true, the best model (according to evaluator) is stored at output_path

  • max_grad_norm – Used for gradient normalization.

  • use_amp – Use Automatic Mixed Precision (AMP). Only for Pytorch >= 1.6.0

  • callback – Callback function that is invoked after each evaluation. It must accept the following three parameters in this order: score, epoch, steps

  • output_path_ignore_not_empty – deprecated, no longer used
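The WarmupLinear behavior described for warmup_steps can be written out as a plain function of the training step. This is a sketch of the schedule shape only; warmup_linear_lr and total_steps are names introduced here for illustration, and the default max_lr simply mirrors the lr value shown in optimizer_params.

```python
def warmup_linear_lr(step, warmup_steps, total_steps, max_lr=2e-5):
    """Learning rate at `step` for a linear warmup followed by linear decay.

    Hypothetical helper illustrating the schedule shape; it is not the
    scheduler object the library builds internally.
    """
    if step < warmup_steps:
        # Warmup phase: rise linearly from 0 to max_lr.
        return max_lr * (step / max(1, warmup_steps))
    # Decay phase: fall linearly from max_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return max_lr * (remaining / max(1, total_steps - warmup_steps))
```

With warmup_steps=100 and total_steps=1000, the rate is 0 at step 0, reaches max_lr at step 100, and returns to 0 at step 1000.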


Returns the maximal input sequence length the model accepts. Longer inputs will be truncated.

property max_seq_length

Property to get the maximal input sequence length for the model. Longer inputs will be truncated.


Saves all elements of this sequential sentence embedder into different sub-folders.


Transforms a batch from a SmartBatchingDataset to a batch of tensors for the model. Here, batch is a list of tuples: [(tokens, label), …]


  • batch – a batch from a SmartBatchingDataset

Returns: a batch of tensors for the model


Transforms a batch from a SmartBatchingDataset to a batch of tensors for the model. Here, batch is a list of texts


  • batch – a batch from a SmartBatchingDataset

Returns: a batch of tensors for the model

start_multi_process_pool(target_devices: List[str] = None, encode_batch_size: int = 32)

Starts a pool of several independent processes for encoding. This method is recommended if you want to encode on multiple GPUs. It is advised to start only one process per GPU. This method works together with encode_multi_process.

  • target_devices – PyTorch target devices, e.g. cuda:0, cuda:1… If None, all available CUDA devices will be used

  • encode_batch_size – Batch size for each process when calling encode


Returns a dict with the target processes, an input queue and an output queue.

static stop_multi_process_pool(pool)

Stops all processes started with start_multi_process_pool
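The three multi-process methods are meant to be used together. A minimal sketch of that lifecycle follows, assuming model is an already loaded SentenceTransformer; the wrapper name encode_on_all_gpus is hypothetical, and the try/finally is a design choice made here so that worker processes are stopped even if encoding fails.

```python
def encode_on_all_gpus(model, sentences, target_devices=None):
    """Encode `sentences` with a temporary multi-process pool.

    Hypothetical wrapper around the three methods documented on this page.
    """
    # target_devices=None means all available CUDA devices, per the docs above.
    pool = model.start_multi_process_pool(target_devices=target_devices)
    try:
        # Returns a numpy matrix with all embeddings.
        return model.encode_multi_process(sentences, pool)
    finally:
        # Always shut the workers down, even when encoding raises.
        model.stop_multi_process_pool(pool)
```

Since the pool is started and stopped on every call, this wrapper only pays off for large sets of sentences, in line with the note on encode_multi_process.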

tokenize(text: str)

Tokenizes the text

property tokenizer

Property to get the tokenizer that is used by this model