Publications
If you find this repository helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "http://arxiv.org/abs/1908.10084",
}
If you use one of the multilingual models, feel free to cite our publication Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation:
@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}
If you use the code for data augmentation, feel free to cite our publication Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks:
@inproceedings{thakur-2020-AugSBERT,
    title = "Augmented {SBERT}: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks",
    author = "Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = "6",
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2010.08240",
    pages = "296--310",
}
If you use the models for MS MARCO, feel free to cite the paper: The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes
@inproceedings{reimers-2020-Curse_Dense_Retrieval,
    title = "The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
    month = "8",
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2012.14210",
    pages = "605--611",
}
When you use the unsupervised learning example, please have a look at: TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning:
@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}
When you use the GenQ learning example, please have a look at: BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models:
@inproceedings{thakur-2021-BEIR,
    title = "BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models",
    author = {Thakur, Nandan and Reimers, Nils and R{\"{u}}ckl{\'{e}}, Andreas and Srivastava, Abhishek and Gurevych, Iryna},
    booktitle = {Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021) - Datasets and Benchmarks Track (Round 2)},
    month = "4",
    year = "2021",
    url = "https://arxiv.org/abs/2104.08663",
}
When you use GPL, please have a look at: GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval:
@article{wang-2021-GPL,
    title = "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval",
    author = "Wang, Kexin and Thakur, Nandan and Reimers, Nils and Gurevych, Iryna",
    journal = "arXiv preprint arXiv:2112.07577",
    month = "12",
    year = "2021",
    url = "https://arxiv.org/abs/2112.07577",
}
Repositories using SentenceTransformers
haystack - Neural Search / Q&A
Top2Vec - Topic modeling
txtai - AI-powered search engine
BERTopic - Topic model using SBERT embeddings
KeyBERT - Key phrase extraction using SBERT
contextualized-topic-models - Cross-Lingual Topic Modeling
covid-papers-browser - Semantic Search for Covid-19 papers
backprop - Natural Language Engine that makes using state-of-the-art language models easy, accessible and scalable.
SentenceTransformers in Articles
The following is a (selective) list of articles and applications that use SentenceTransformers to do amazing things. Feel free to contact me (info@nils-reimers.de) to add your application here.
December 2021 - Sentence Transformer Fine-Tuning (SetFit): Outperforming GPT-3 on few-shot Text-Classification while being 1600 times smaller
October 2021 - Natural Language Processing (NLP) for Semantic Search
January 2021 - Advance BERT model via transferring knowledge from Cross-Encoders to Bi-Encoders
November 2020 - How to Build a Semantic Search Engine With Transformers and Faiss
October 2020 - Topic Modeling with BERT
September 2020 - Elastic Transformers - Making BERT stretchy - Scalable Semantic Search on a Jupyter Notebook
July 2020 - Simple Sentence Similarity Search with SentenceBERT
May 2020 - HN Time Machine: finally some Hacker News history!
March 2020 - Building a k-NN Similarity Search Engine using Amazon Elasticsearch and SageMaker
February 2020 - Semantic Search Engine with Sentence BERT
SentenceTransformers used in Research
SentenceTransformers is used in hundreds of research projects. For a list of publications, see Google Scholar or Semantic Scholar.