Natural Questions Models
Google’s Natural Questions dataset consists of about 100k real search queries issued to Google, each paired with a relevant passage from Wikipedia. Models trained on this dataset work well for question-answer retrieval.
Usage
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nq-distilbert-base-v1")

query_embedding = model.encode("How many people live in London?")

# The passages are encoded as [[title1, text1], [title2, text2], ...]
passage_embedding = model.encode(
    [["London", "London has 9,787,426 inhabitants at the 2011 census."]]
)

print("Similarity:", util.cos_sim(query_embedding, passage_embedding))
Note: Each passage must be encoded as a pair of the Wikipedia article title and a text paragraph from that article.
Performance
The models are evaluated on the Natural Questions development dataset using MRR@10 (Mean Reciprocal Rank at 10).
Approach | MRR@10 (NQ dev set small) |
---|---|
nq-distilbert-base-v1 | 72.36 |
Other models | |
DPR | 58.96 |
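MRR@10 scores each query by the reciprocal rank of the first relevant passage among the top-10 retrieved results, averaged over all queries. A minimal sketch of the metric (the function name and inputs are illustrative, not part of the library):

```python
def mrr_at_10(rankings):
    """Mean Reciprocal Rank @ 10.

    rankings: for each query, a list of booleans marking whether the
    passage at each rank position (best first) is relevant.
    """
    total = 0.0
    for ranking in rankings:
        for position, is_relevant in enumerate(ranking[:10], start=1):
            if is_relevant:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(rankings)

# First query: relevant passage at rank 2 -> 0.5
# Second query: relevant passage at rank 1 -> 1.0
print(mrr_at_10([[False, True, False], [True, False]]))  # -> 0.75
```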