
Semantic Search

Semantic search refers to search techniques that go beyond traditional keyword-based search. Instead of relying solely on exact matches of keywords, semantic search aims to understand the meaning and context of the query and the documents being searched. This allows for more relevant and accurate search results, even when the exact keywords may not match.

Sparse embeddings are a type of representation where most of the values are zero, and only a small number of dimensions contain non-zero (a.k.a. active) values. This is in contrast to dense embeddings, where all dimensions typically have non-zero values. Traditional sparse embedding solutions are often lexical, meaning they rely on exact matches of terms or phrases. However, modern sparse encoder models like SPLADE can generate embeddings that capture semantic meaning while still being sparse.
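For instance, you can inspect just how sparse these embeddings are. Below is a minimal sketch using the same SPLADE model as the examples further down this page; it assumes the sparsity statistics helper on SparseEncoder, which reports active dimension counts:

from sentence_transformers import SparseEncoder

# Load the same SPLADE model used in the examples below
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
embeddings = model.encode(["Sparse embeddings keep only a handful of active dimensions."])

# Report how many of the vocabulary-sized dimensions are non-zero; for SPLADE
# models, typically well over 99% of the dimensions are zero
stats = model.sparsity(embeddings)
print(stats)  # e.g. {'active_dims': ..., 'sparsity_ratio': ...}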

These embeddings enable extremely efficient semantic search, as long as the search solution takes advantage of the fact that the vast majority of sparse embedding dimensions are zero. This page shows how to perform semantic search manually, as well as how to integrate a SparseEncoder model with popular vector databases and search engines.

If you aren’t familiar with semantic search, see Sentence Transformers > Semantic Search for a broader explanation using dense embedding models.

Manual Search

Manually performing semantic search with sparse encoders is straightforward and consists of only a few steps:

  1. Load a SparseEncoder model: Load a pretrained sparse encoder model from the Hugging Face Hub or your local directory.

  2. Encode the corpus: Use the model to encode a set of documents (the corpus) into sparse embeddings.

  3. Encode the queries: Encode the user queries into sparse embeddings using the same model.

  4. Compute similarity: Calculate the similarity between the query embeddings and the corpus embeddings using a suitable similarity function (e.g., cosine similarity, dot product).

  5. Retrieve results: Sort the results based on similarity scores and return the most relevant documents.

  6. Analyze results: Optionally, analyze the results to understand which tokens contributed most to the similarity scores.

Documentation

  1. SparseEncoder

  2. SparseEncoder.encode_query

  3. SparseEncoder.encode_document

  4. util.semantic_search

  5. SparseEncoder.similarity

  6. naver/splade-cocondenser-ensembledistil

from sentence_transformers import SparseEncoder, util

# 1. Load a pretrained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# 2. Encode a corpus of texts using the SparseEncoder model
corpus = [
    "Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed.",
    "Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning.",
    "Neural networks are computing systems vaguely inspired by the biological neural networks that constitute animal brains.",
    "Mars rovers are robotic vehicles designed to travel on the surface of Mars to collect data and perform experiments.",
    "The James Webb Space Telescope is the largest optical telescope in space, designed to conduct infrared astronomy.",
    "SpaceX's Starship is designed to be a fully reusable transportation system capable of carrying humans to Mars and beyond.",
    "Global warming is the long-term heating of Earth's climate system observed since the pre-industrial period due to human activities.",
    "Renewable energy sources include solar, wind, hydro, and geothermal power that naturally replenish over time.",
    "Carbon capture technologies aim to collect CO2 emissions before they enter the atmosphere and store them underground.",
]

# Use "convert_to_tensor=True" to keep the tensors on GPU (if available)
corpus_embeddings = model.encode_document(corpus, convert_to_tensor=True)

# 3. Encode the user queries using the same SparseEncoder model
queries = [
    "How do artificial neural networks work?",
    "What technology is used for modern space exploration?",
    "How can we address climate change challenges?",
]
query_embeddings = model.encode_query(queries, convert_to_tensor=True)

# 4. Use the similarity function to compute the similarity scores between the query and corpus embeddings
top_k = min(5, len(corpus))  # Find at most 5 sentences of the corpus for each query sentence
results = util.semantic_search(query_embeddings, corpus_embeddings, top_k=top_k, score_function=model.similarity)

# 5. Sort the results and print the top 5 most similar sentences for each query
for query_id, query in enumerate(queries):
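    # 6. Analyze the results: intersect the query embedding with each corpus
    # embedding to find the token overlaps that drive the similarity scores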
    pointwise_scores = model.intersection(query_embeddings[query_id], corpus_embeddings)

    print(f"Query: {query}")
    for res in results[query_id]:
        corpus_id, score = res.values()
        sentence = corpus[corpus_id]

        pointwise_score = model.decode(pointwise_scores[corpus_id], top_k=10)

        token_scores = ", ".join([f'("{token.strip()}", {value:.2f})' for token, value in pointwise_score])

        print(f"Score: {score:.4f} - Sentence: {sentence} - Top influential tokens: {token_scores}")
    print("")
"""
Query: How do artificial neural networks work?
Score: 16.9053 - Sentence: Neural networks are computing systems vaguely inspired by the biological neural networks that constitute animal brains. - Top influential tokens: ("neural", 5.71), ("networks", 3.24), ("network", 2.93), ("brain", 2.10), ("computer", 0.50), ("##uron", 0.32), ("artificial", 0.27), ("technology", 0.27), ("communication", 0.27), ("connection", 0.21)
Score: 13.6119 - Sentence: Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. - Top influential tokens: ("artificial", 3.71), ("neural", 3.15), ("networks", 1.78), ("brain", 1.22), ("network", 1.12), ("ai", 1.07), ("machine", 0.39), ("robot", 0.20), ("technology", 0.20), ("algorithm", 0.18)
Score: 2.7373 - Sentence: Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. - Top influential tokens: ("machine", 0.78), ("computer", 0.50), ("technology", 0.32), ("artificial", 0.22), ("robot", 0.21), ("ai", 0.20), ("process", 0.16), ("theory", 0.11), ("technique", 0.11), ("fuzzy", 0.06)
Score: 2.1430 - Sentence: Carbon capture technologies aim to collect CO2 emissions before they enter the atmosphere and store them underground. - Top influential tokens: ("technology", 0.42), ("function", 0.41), ("mechanism", 0.21), ("sensor", 0.21), ("device", 0.18), ("process", 0.18), ("generator", 0.13), ("detection", 0.10), ("technique", 0.10), ("tracking", 0.05)
Score: 2.0195 - Sentence: Mars rovers are robotic vehicles designed to travel on the surface of Mars to collect data and perform experiments. - Top influential tokens: ("robot", 0.67), ("function", 0.34), ("technology", 0.29), ("device", 0.23), ("experiment", 0.20), ("machine", 0.10), ("artificial", 0.08), ("design", 0.04), ("useful", 0.03), ("they", 0.02)

Query: What technology is used for modern space exploration?
Score: 10.4748 - Sentence: SpaceX's Starship is designed to be a fully reusable transportation system capable of carrying humans to Mars and beyond. - Top influential tokens: ("space", 4.40), ("technology", 1.15), ("nasa", 1.06), ("mars", 0.63), ("exploration", 0.52), ("spacecraft", 0.44), ("robot", 0.32), ("rocket", 0.28), ("astronomy", 0.27), ("travel", 0.26)
Score: 9.3818 - Sentence: The James Webb Space Telescope is the largest optical telescope in space, designed to conduct infrared astronomy. - Top influential tokens: ("space", 3.89), ("nasa", 1.09), ("astronomy", 0.93), ("discovery", 0.48), ("instrument", 0.47), ("technology", 0.35), ("device", 0.26), ("spacecraft", 0.25), ("invented", 0.22), ("equipment", 0.22)
Score: 8.5147 - Sentence: Mars rovers are robotic vehicles designed to travel on the surface of Mars to collect data and perform experiments. - Top influential tokens: ("technology", 1.39), ("mars", 0.79), ("exploration", 0.78), ("robot", 0.67), ("used", 0.66), ("nasa", 0.52), ("spacecraft", 0.44), ("device", 0.39), ("explore", 0.38), ("travel", 0.25)
Score: 7.6993 - Sentence: Carbon capture technologies aim to collect CO2 emissions before they enter the atmosphere and store them underground. - Top influential tokens: ("technology", 1.99), ("tech", 1.76), ("technologies", 1.74), ("equipment", 0.32), ("device", 0.31), ("technological", 0.28), ("mining", 0.22), ("sensor", 0.19), ("tool", 0.18), ("software", 0.11)
Score: 2.5526 - Sentence: Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. - Top influential tokens: ("technology", 1.52), ("machine", 0.27), ("robot", 0.21), ("computer", 0.18), ("engineering", 0.12), ("technique", 0.11), ("science", 0.05), ("technological", 0.05), ("techniques", 0.02), ("innovation", 0.01)

Query: How can we address climate change challenges?
Score: 9.5587 - Sentence: Global warming is the long-term heating of Earth's climate system observed since the pre-industrial period due to human activities. - Top influential tokens: ("climate", 3.21), ("warming", 2.87), ("weather", 1.58), ("change", 0.46), ("global", 0.41), ("environmental", 0.39), ("storm", 0.19), ("pollution", 0.15), ("environment", 0.11), ("adaptation", 0.08)
Score: 1.3191 - Sentence: Carbon capture technologies aim to collect CO2 emissions before they enter the atmosphere and store them underground. - Top influential tokens: ("warming", 0.39), ("pollution", 0.34), ("environmental", 0.15), ("goal", 0.12), ("strategy", 0.07), ("monitoring", 0.07), ("protection", 0.06), ("greenhouse", 0.05), ("safety", 0.02), ("escape", 0.01)
Score: 1.0774 - Sentence: Renewable energy sources include solar, wind, hydro, and geothermal power that naturally replenish over time. - Top influential tokens: ("conservation", 0.39), ("sustainability", 0.18), ("environmental", 0.18), ("sustainable", 0.13), ("agriculture", 0.13), ("alternative", 0.07), ("recycling", 0.00)
Score: 0.2401 - Sentence: Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. - Top influential tokens: ("strategy", 0.10), ("success", 0.06), ("foster", 0.04), ("engineering", 0.03), ("innovation", 0.00), ("research", 0.00)
Score: 0.1516 - Sentence: Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. - Top influential tokens: ("strategy", 0.09), ("foster", 0.04), ("research", 0.01), ("approach", 0.01), ("engineering", 0.01)
"""

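If you only need the raw scores from step 4, you can also compute the full similarity matrix directly instead of going through util.semantic_search; a minimal continuation of the example above:

# Compute the full (num_queries, num_corpus) similarity matrix in one call
scores = model.similarity(query_embeddings, corpus_embeddings)
print(scores.shape)  # e.g. torch.Size([3, 9]) for 3 queries and 9 documents

For larger corpora, util.semantic_search remains preferable, as it chunks the computation and returns only the top_k hits per query.
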
Vector Database Search

Alternatively, some vector databases and search engines can be used to perform semantic search with sparse encoders. These systems are designed to efficiently handle large-scale vector data and provide fast retrieval of relevant documents. They can leverage the sparsity of the embeddings to optimize storage and search operations.

The overall structure is similar to the manual search, but the vector database handles the indexing and retrieval of documents. The steps are approximately as follows:

  1. Encode the corpus: Load your data and encode the documents using a pretrained sparse encoder.

  2. Indexing: The documents and their sparse embeddings are indexed in the vector database.

  3. Encode the query: User queries are encoded with the same sparse encoder.

  4. Retrieval: The vector database performs a similarity search to find the most relevant documents.

  5. Results: Search results are returned with their similarity scores and document content.
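In each of the integrations below, the first call to the search function receives the encoded corpus, builds the index, and returns it (via output_index=True); subsequent iterations of the query loop pass that corpus_index back in, so only the new queries need to be encoded and searched.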

The advantages of sparse vectors for search are:

  • Efficiency: Sparse vectors (where most values are zero) can be stored and searched more efficiently than dense vectors.

  • Interpretability: Non-zero dimensions in sparse embeddings often correspond to specific tokens, allowing you to understand which tokens contributed to the similarity score; see the short sketch after this list.

  • Exact Matching: Sparse vectors can preserve exact term matching signals that might be lost in dense embeddings.
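To make the interpretability point concrete, here is a minimal sketch that decodes a sparse query embedding back into its most heavily weighted tokens, using the same model and decode method as the manual example above:

from sentence_transformers import SparseEncoder

model = SparseEncoder("naver/splade-cocondenser-ensembledistil")
embedding = model.encode_query("How can we address climate change challenges?")

# Each non-zero dimension corresponds to a vocabulary token; decode the
# embedding into its highest-weighted (token, weight) pairs
for token, weight in model.decode(embedding, top_k=5):
    print(f"{token}: {weight:.2f}")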

Qdrant Integration

This example demonstrates how to set up Qdrant for sparse vector search: it shows how to efficiently encode and index documents with sparse encoders, formulate search queries with sparse vectors, and run an interactive query interface. See semantic_search_qdrant.py or below:

Prerequisites:

  • Qdrant running locally (or accessible), see the Qdrant Quickstart for more details.

  • Python Qdrant Client installed:

    pip install qdrant-client
    

Documentation

  1. SparseEncoder

  2. SparseEncoder.encode_query

  3. SparseEncoder.encode_document

  4. semantic_search_qdrant

  5. naver/splade-cocondenser-ensembledistil

  6. sentence-transformers/natural-questions

import time

from datasets import load_dataset
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.search_engines import semantic_search_qdrant

# 1. Load the natural-questions dataset with 100K answers
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
num_docs = 10_000
corpus = dataset["answer"][:num_docs]

# 2. Come up with some queries
queries = dataset["query"][:2]

# 3. Load the model
sparse_model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# 4. Encode the corpus
corpus_embeddings = sparse_model.encode_document(
    corpus, convert_to_sparse_tensor=True, batch_size=16, show_progress_bar=True
)

# Initially, we don't have a qdrant index yet
corpus_index = None
while True:
    # 5. Encode the queries using the full precision
    start_time = time.time()
    query_embeddings = sparse_model.encode_query(queries, convert_to_sparse_tensor=True)
    print(f"Encoding time: {time.time() - start_time:.6f} seconds")

    # 6. Perform semantic search using qdrant
    results, search_time, corpus_index = semantic_search_qdrant(
        query_embeddings,
        corpus_index=corpus_index,
        corpus_embeddings=corpus_embeddings if corpus_index is None else None,
        top_k=5,
        output_index=True,
    )

    # 7. Output the results
    print(f"Search time: {search_time:.6f} seconds")
    for query, result in zip(queries, results):
        print(f"Query: {query}")
        for entry in result:
            print(f"(Score: {entry['score']:.4f}) {corpus[entry['corpus_id']]}, corpus_id: {entry['corpus_id']}")
        print("")

    # 8. Prompt for more queries
    queries = [input("Please enter a question: ")]

OpenSearch Integration

This example demonstrates how to set up OpenSearch for sparse vector search: it shows how to efficiently encode and index documents with sparse encoders, formulate search queries with sparse vectors, and run an interactive query interface. See semantic_search_opensearch.py or below:

Prerequisites:

  • OpenSearch running locally (or accessible), see OpenSearch locally for more details.

  • The Python OpenSearch client installed (see https://docs.opensearch.org/docs/latest/clients/python-low-level/), e.g.:

      pip install opensearch-py
    
  • This script was created for OpenSearch v2.15.0+.

Documentation

  1. SparseEncoder

  2. SparseEncoder.encode_query

  3. SparseEncoder.encode_document

  4. semantic_search_opensearch

  5. opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill

  6. sentence-transformers/natural-questions

import time

from datasets import load_dataset

from sentence_transformers import SparseEncoder
from sentence_transformers.models import Router
from sentence_transformers.sparse_encoder.models import MLMTransformer, SparseStaticEmbedding, SpladePooling
from sentence_transformers.sparse_encoder.search_engines import semantic_search_opensearch

# 1. Load the natural-questions dataset with 100K answers
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
num_docs = 10_000
corpus = dataset["answer"][:num_docs]
print(f"Finish loading data. Corpus size: {len(corpus)}")

# 2. Come up with some queries
queries = dataset["query"][:2]

# 3. Load the model
model_id = "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill"
doc_encoder = MLMTransformer(model_id)
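# The Router gives queries and documents separate encoding paths: queries use a
# frozen, inference-free SparseStaticEmbedding with precomputed token weights,
# while documents run through the full MLM transformer plus SPLADE pooling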
router = Router.for_query_document(
    query_modules=[
        SparseStaticEmbedding.from_json(
            model_id,
            tokenizer=doc_encoder.tokenizer,
            frozen=True,
        ),
    ],
    document_modules=[
        doc_encoder,
        SpladePooling("max", activation_function="log1p_relu"),
    ],
)

sparse_model = SparseEncoder(modules=[router], similarity_fn_name="dot")

print("Start encoding corpus...")
start_time = time.time()
# 4. Encode the corpus
corpus_embeddings = sparse_model.encode_document(
    corpus, convert_to_sparse_tensor=True, batch_size=32, show_progress_bar=True
)
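# The OpenSearch search function below expects decoded (token, weight)
# representations rather than raw sparse tensors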
corpus_embeddings_decoded = sparse_model.decode(corpus_embeddings)
print(f"Corpus encoding time: {time.time() - start_time:.6f} seconds")

corpus_index = None
while True:
    # 5. Encode the queries using inference-free mode
    start_time = time.time()
    query_embeddings = sparse_model.encode_query(queries, convert_to_sparse_tensor=True)
    query_embeddings_decoded = sparse_model.decode(query_embeddings)
    print(f"Query encoding time: {time.time() - start_time:.6f} seconds")

    # 6. Perform semantic search using OpenSearch
    results, search_time, corpus_index = semantic_search_opensearch(
        query_embeddings_decoded,
        corpus_embeddings_decoded=corpus_embeddings_decoded if corpus_index is None else None,
        corpus_index=corpus_index,
        top_k=5,
        output_index=True,
    )

    # 7. Output the results
    print(f"Search time: {search_time:.6f} seconds")
    for query, result in zip(queries, results):
        print(f"Query: {query}")
        for entry in result:
            print(f"(Score: {entry['score']:.4f}) {corpus[entry['corpus_id']]}, corpus_id: {entry['corpus_id']}")
        print("")

    # 8. Prompt for more queries
    queries = [input("Please enter a question: ")]

Seismic Integration

This example demonstrates how to use Seismic for extremely performant sparse vector search. It does not require running a separate search server; instead, it performs the search directly in memory. The Seismic library was introduced in Bruch et al. (2024), where it is shown to outperform the common inverted file (IVF) approach by an order of magnitude. For more information on building your Seismic index, see the Seismic Guidelines. See semantic_search_seismic.py or below:

Prerequisites:

  • The Seismic Python package installed:

    pip install pyseismic-lsr
    

Documentation

  1. SparseEncoder

  2. SparseEncoder.encode_query

  3. SparseEncoder.encode_document

  4. semantic_search_seismic

  5. naver/splade-cocondenser-ensembledistil

  6. sentence-transformers/natural-questions

import time

from datasets import load_dataset

from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.search_engines import semantic_search_seismic

# 1. Load the natural-questions dataset with 100K answers
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
num_docs = 10_000
corpus = dataset["answer"][:num_docs]

# 2. Come up with some queries
queries = dataset["query"][:2]

# 3. Load the model
sparse_model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# 4. Encode the corpus
print("Start encoding corpus...")
start_time = time.time()
corpus_embeddings = sparse_model.encode_document(
    corpus, convert_to_sparse_tensor=True, batch_size=16, show_progress_bar=True
)
corpus_embeddings_decoded = sparse_model.decode(corpus_embeddings)
print(f"Corpus encoding time: {time.time() - start_time:.6f} seconds")

corpus_index = None
while True:
    # 5. Encode the queries using the full precision
    start_time = time.time()
    query_embeddings = sparse_model.encode_query(queries, convert_to_sparse_tensor=True)
    query_embeddings_decoded = sparse_model.decode(query_embeddings)
    print(f"Encoding time: {time.time() - start_time:.6f} seconds")

    # 6. Perform semantic search using Seismic
    results, search_time, corpus_index = semantic_search_seismic(
        query_embeddings_decoded,
        corpus_embeddings_decoded=corpus_embeddings_decoded if corpus_index is None else None,
        corpus_index=corpus_index,
        top_k=5,
        output_index=True,
    )

    # 7. Output the results
    print(f"Search time: {search_time:.6f} seconds")
    for query, result in zip(queries, results):
        print(f"Query: {query}")
        for entry in result:
            print(f"(Score: {entry['score']:.4f}) {corpus[entry['corpus_id']]}, corpus_id: {entry['corpus_id']}")
        print("")

    # 8. Prompt for more queries
    queries = [input("Please enter a question: ")]

Elasticsearch Integration

This example demonstrates how to set up Elasticsearch for sparse vector search: it shows how to efficiently encode and index documents with sparse encoders, formulate search queries with sparse vectors, and run an interactive query interface. See semantic_search_elasticsearch.py or below:

Prerequisites:

  • Elasticsearch running locally (or accessible), see Elasticsearch locally for more details.

  • Python Elasticsearch client installed:

    pip install elasticsearch
    

Documentation

  1. SparseEncoder

  2. SparseEncoder.encode_query

  3. SparseEncoder.encode_document

  4. semantic_search_elasticsearch

  5. naver/splade-cocondenser-ensembledistil

  6. sentence-transformers/natural-questions

import time

from datasets import load_dataset

from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.search_engines import semantic_search_elasticsearch

# 1. Load the natural-questions dataset with 100K answers
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
num_docs = 10_000
corpus = dataset["answer"][:num_docs]

# 2. Come up with some queries
queries = dataset["query"][:2]

# 3. Load the model
sparse_model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# 4. Encode the corpus
print("Start encoding corpus...")
start_time = time.time()
corpus_embeddings = sparse_model.encode_document(
    corpus, convert_to_sparse_tensor=True, batch_size=16, show_progress_bar=True
)
corpus_embeddings_decoded = sparse_model.decode(corpus_embeddings)
print(f"Corpus encoding time: {time.time() - start_time:.6f} seconds")

corpus_index = None
while True:
    # 5. Encode the queries using the full precision
    start_time = time.time()
    query_embeddings = sparse_model.encode_query(queries, convert_to_sparse_tensor=True)
    query_embeddings_decoded = sparse_model.decode(query_embeddings)
    print(f"Encoding time: {time.time() - start_time:.6f} seconds")

    # 6. Perform semantic search using Elasticsearch
    results, search_time, corpus_index = semantic_search_elasticsearch(
        query_embeddings_decoded,
        corpus_embeddings_decoded=corpus_embeddings_decoded if corpus_index is None else None,
        corpus_index=corpus_index,
        top_k=5,
        output_index=True,
    )

    # 7. Output the results
    print(f"Search time: {search_time:.6f} seconds")
    for query, result in zip(queries, results):
        print(f"Query: {query}")
        for entry in result:
            print(f"(Score: {entry['score']:.4f}) {corpus[entry['corpus_id']]}, corpus_id: {entry['corpus_id']}")
        print("")

    # 8. Prompt for more queries
    queries = [input("Please enter a question: ")]