Models

sentence_transformers.models defines different building blocks that can be used to create SentenceTransformer networks from scratch. For more details, see Training Overview.

Main Classes

class sentence_transformers.models.Transformer(model_name_or_path: str, max_seq_length: int = 128, model_args: Dict = {}, cache_dir: Optional[str] = None, tokenizer_args: Dict = {}, do_lower_case: Optional[bool] = None)

Huggingface AutoModel to generate token embeddings. Loads the correct class for the given model, e.g. BERT or RoBERTa.

Parameters
  • model_name_or_path – Huggingface model name or path (https://huggingface.co/models)

  • max_seq_length – Truncate any inputs longer than max_seq_length

  • model_args – Arguments (key, value pairs) passed to the Huggingface Transformers model

  • cache_dir – Cache dir for Huggingface Transformers to store/load models

  • tokenizer_args – Arguments (key, value pairs) passed to the Huggingface Tokenizer model

  • do_lower_case – Lowercase the input

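Example (a minimal sketch; the model name and max_seq_length value are only illustrative):

    from sentence_transformers import SentenceTransformer, models

    # Transformer produces contextualized token embeddings
    word_embedding_model = models.Transformer('bert-base-uncased', max_seq_length=256)

    # Pooling turns the token embeddings into a fixed-size sentence embedding
    pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())

    model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
    embeddings = model.encode(['This is an example sentence'])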

class sentence_transformers.models.Pooling(word_embedding_dimension: int, pooling_mode_cls_token: bool = False, pooling_mode_max_tokens: bool = False, pooling_mode_mean_tokens: bool = True, pooling_mode_mean_sqrt_len_tokens: bool = False)

Performs pooling (max or mean) on the token embeddings.

Using pooling, it generates a fixed-sized sentence embedding from a variable-sized sentence. This layer also allows using the CLS token if it is returned by the underlying word embedding model. Multiple pooling modes can be combined; their outputs are concatenated.

Parameters
  • word_embedding_dimension – Dimension of the word embeddings

  • pooling_mode_cls_token – Use the first token (CLS token) as the text representation

  • pooling_mode_max_tokens – Use max in each dimension over all tokens.

  • pooling_mode_mean_tokens – Perform mean-pooling

  • pooling_mode_mean_sqrt_len_tokens – Perform mean-pooling, but divide by sqrt(input_length).

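Example (a sketch of concatenating CLS-token and mean pooling; the model name is only illustrative, and the sentence embedding dimension doubles because both pooling outputs are concatenated):

    from sentence_transformers import SentenceTransformer, models

    word_embedding_model = models.Transformer('bert-base-uncased')

    # Enable both CLS-token and mean pooling; their outputs are concatenated
    pooling_model = models.Pooling(
        word_embedding_model.get_word_embedding_dimension(),
        pooling_mode_cls_token=True,
        pooling_mode_mean_tokens=True,
    )

    model = SentenceTransformer(modules=[word_embedding_model, pooling_model])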

class sentence_transformers.models.Dense(in_features: int, out_features: int, bias: bool = True, activation_function=Tanh())

Feed-forward layer with an activation function.

This layer takes a fixed-sized sentence embedding and passes it through a feed-forward layer. Can be used to generate deep averaging networks (DAN).

Parameters
  • in_features – Size of the input dimension

  • out_features – Output size

  • bias – Add a bias vector

  • activation_function – PyTorch activation function applied to the output

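Example (a sketch of projecting the pooled sentence embedding down to 256 dimensions; model name and output size are only illustrative):

    from torch import nn
    from sentence_transformers import SentenceTransformer, models

    word_embedding_model = models.Transformer('bert-base-uncased')
    pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())

    # Dense layer maps the pooled embedding to a smaller output dimension
    dense_model = models.Dense(
        in_features=pooling_model.get_sentence_embedding_dimension(),
        out_features=256,
        activation_function=nn.Tanh(),
    )

    model = SentenceTransformer(modules=[word_embedding_model, pooling_model, dense_model])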

Further Classes

class sentence_transformers.models.BoW(vocab: List[str], word_weights: Dict[str, float] = {}, unknown_word_weight: float = 1, cumulative_term_frequency: bool = True)

Implements a Bag-of-Words (BoW) model to derive sentence embeddings.

A weighting can be added to allow the generation of tf-idf vectors. The output vector has the size of the vocabulary.

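Example (a minimal sketch, assuming BoW can serve as the sole module of a SentenceTransformer; the toy vocabulary is only illustrative):

    from sentence_transformers import SentenceTransformer, models

    # In practice the vocabulary would be derived from your corpus
    vocab = ['this', 'is', 'a', 'sentence', 'another', 'example']
    bow = models.BoW(vocab=vocab)

    model = SentenceTransformer(modules=[bow])
    embeddings = model.encode(['this is a sentence'])  # vectors of size len(vocab)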

class sentence_transformers.models.CNN(in_word_embedding_dimension: int, out_channels: int = 256, kernel_sizes: List[int] = [1, 3, 5])

CNN layer with multiple kernel sizes over the word embeddings.

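Example (a sketch of placing the CNN between the token embeddings and the pooling layer; the model name and hyperparameters are only illustrative, and the CNN is assumed to expose its output dimension via get_word_embedding_dimension()):

    from sentence_transformers import SentenceTransformer, models

    word_embedding_model = models.Transformer('bert-base-uncased')

    # One convolution per kernel size; the per-token outputs are concatenated
    cnn = models.CNN(
        in_word_embedding_dimension=word_embedding_model.get_word_embedding_dimension(),
        out_channels=256,
        kernel_sizes=[1, 3, 5],
    )

    pooling_model = models.Pooling(cnn.get_word_embedding_dimension())
    model = SentenceTransformer(modules=[word_embedding_model, cnn, pooling_model])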

class sentence_transformers.models.LSTM(word_embedding_dimension: int, hidden_dim: int, num_layers: int = 1, dropout: float = 0, bidirectional: bool = True)

Bidirectional LSTM running over word embeddings.

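Example (a sketch of a BiLSTM over static word embeddings; it assumes the WordEmbeddings.from_text_file loader is available, and the GloVe file name is only illustrative):

    from sentence_transformers import SentenceTransformer, models

    word_embedding_model = models.WordEmbeddings.from_text_file('glove.6B.300d.txt.gz')

    lstm = models.LSTM(
        word_embedding_dimension=word_embedding_model.get_word_embedding_dimension(),
        hidden_dim=1024,
    )

    pooling_model = models.Pooling(lstm.get_word_embedding_dimension())
    model = SentenceTransformer(modules=[word_embedding_model, lstm, pooling_model])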

class sentence_transformers.models.Normalize

This layer normalizes embeddings to unit length.

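Example (a sketch of adding Normalize as the last module, so encoded sentence embeddings have unit L2 norm and the dot product equals the cosine similarity; the model name is only illustrative):

    from sentence_transformers import SentenceTransformer, models

    word_embedding_model = models.Transformer('bert-base-uncased')
    pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
    normalize = models.Normalize()

    model = SentenceTransformer(modules=[word_embedding_model, pooling_model, normalize])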

class sentence_transformers.models.WeightedLayerPooling(word_embedding_dimension, num_hidden_layers: int = 12, layer_start: int = 4, layer_weights=None)

Token embeddings are the weighted mean of their different hidden layer representations.

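Example (a sketch, assuming the Transformer is loaded with output_hidden_states=True so that all hidden layers are available; WeightedLayerPooling is placed between the Transformer and the Pooling module, and the model name is only illustrative):

    from sentence_transformers import SentenceTransformer, models

    word_embedding_model = models.Transformer(
        'bert-base-uncased',
        model_args={'output_hidden_states': True},  # all hidden layers are needed
    )

    weighted_pooling = models.WeightedLayerPooling(
        word_embedding_model.get_word_embedding_dimension(),
        num_hidden_layers=12,
        layer_start=4,
    )

    pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
    model = SentenceTransformer(modules=[word_embedding_model, weighted_pooling, pooling_model])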

class sentence_transformers.models.WKPooling(word_embedding_dimension, layer_start: int = 4, context_window_size: int = 2)

Pooling based on the paper: “SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models”, https://arxiv.org/pdf/2002.06652.pdf

Note: SBERT-WK uses QR decomposition. The torch QR decomposition is currently extremely slow when run on the GPU, so the tensor is first transferred to the CPU before it is applied. This makes this pooling method rather slow.

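Example (a sketch, assuming the Transformer is loaded with output_hidden_states=True; WKPooling produces the sentence embedding directly, so no additional Pooling module is added, and the model name is only illustrative):

    from sentence_transformers import SentenceTransformer, models

    word_embedding_model = models.Transformer(
        'bert-base-uncased',
        model_args={'output_hidden_states': True},  # WKPooling needs all hidden layers
    )

    wk_pooling = models.WKPooling(word_embedding_model.get_word_embedding_dimension())
    model = SentenceTransformer(modules=[word_embedding_model, wk_pooling])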

class sentence_transformers.models.WordEmbeddings(tokenizer: sentence_transformers.models.tokenizer.WordTokenizer.WordTokenizer, embedding_weights, update_embeddings: bool = False, max_seq_length: int = 1000000)

Static, pre-trained word embeddings (e.g. GloVe vectors) combined with a word tokenizer to produce token embeddings.
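
Example (a sketch of an average word embedding model, assuming the WordEmbeddings.from_text_file loader can read embeddings such as GloVe from a text file; the file name is only illustrative):

    from sentence_transformers import SentenceTransformer, models

    word_embedding_model = models.WordEmbeddings.from_text_file('glove.6B.300d.txt.gz')
    pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())

    model = SentenceTransformer(modules=[word_embedding_model, pooling_model])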

class sentence_transformers.models.WordWeights(vocab: List[str], word_weights: Dict[str, float], unknown_word_weight: float = 1)

This model can weight word embeddings, for example, with idf-values.

Parameters
  • vocab – Vocabulary of the tokenizer

  • word_weights – Mapping of tokens to a float weight. Word embeddings are multiplied by this value. The keys of word_weights need not match the vocab exactly (it can contain more or fewer entries)

  • unknown_word_weight – Weight for words in the vocab that do not appear in the word_weights lookup, for example rare words for which no weight exists.
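
Example (a sketch of idf-weighted average word embeddings; the idf dictionary is hypothetical, the tokenizer's get_vocab() accessor and the from_text_file loader are assumptions, and the file name is only illustrative):

    from sentence_transformers import SentenceTransformer, models

    word_embedding_model = models.WordEmbeddings.from_text_file('glove.6B.300d.txt.gz')

    # Hypothetical idf values; in practice computed from your own corpus
    idf = {'sentence': 2.3, 'example': 1.7}

    word_weights = models.WordWeights(
        vocab=word_embedding_model.tokenizer.get_vocab(),  # assumed accessor on the word tokenizer
        word_weights=idf,
        unknown_word_weight=1.0,
    )

    pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
    model = SentenceTransformer(modules=[word_embedding_model, word_weights, pooling_model])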