
MS MARCO

MS MARCO is a foundational collection of datasets for deep learning in search, best known for its question-answer pairs, which are well suited to fine-tuning embedding models. It is a go-to resource for building robust ranking systems.


Questions & Answers

What is MS MARCO?
MS MARCO is a collection of large-scale datasets developed by Microsoft, primarily focused on deep learning in search. It includes datasets for question answering, passage ranking, document ranking, natural language generation, and keyphrase extraction, often derived from real Bing query logs.
Who should use the MS MARCO datasets?
MS MARCO is intended for researchers and developers working on information retrieval, natural language processing, and deep learning, particularly those focused on improving search relevance, question answering systems, and fine-tuning neural embedding models. The datasets are available for non-commercial research purposes.
What are the key features of the MS MARCO dataset collection?
The MS MARCO collection stands out due to its large scale, including millions of real Bing questions and passages, and its focus on various deep learning in search tasks. It provides human-generated answers and relevance labels, offering a robust foundation for training and evaluating models in document ranking, passage ranking, and question answering.
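The relevance labels mentioned above feed directly into evaluation: the official metric for the MS MARCO passage ranking leaderboard is MRR@10 (mean reciprocal rank over the top 10 results). A minimal sketch in pure Python, assuming you already have per-query ranked passage IDs and sets of relevant IDs (the names here are illustrative, not from any MS MARCO tooling):

```python
def mrr_at_10(ranked_pids, relevant_pids):
    """MRR@10: for each query, take the reciprocal rank of the first
    relevant passage within the top 10 (0 if none), then average."""
    total = 0.0
    for pids, relevant in zip(ranked_pids, relevant_pids):
        for rank, pid in enumerate(pids[:10], start=1):
            if pid in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_pids)

# Toy example: two queries; the relevant passage appears at rank 1
# for the first query and rank 3 for the second.
rankings = [[7, 2, 9], [4, 8, 5]]
relevant = [{7}, {5}]
print(mrr_at_10(rankings, relevant))  # (1/1 + 1/3) / 2 ≈ 0.6667
```

Because only the first relevant hit counts, MRR@10 rewards systems that surface one good answer early, which matches the single-answer flavor of the original QnA data.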
When should I use MS MARCO for my research?
You should use MS MARCO if you are developing or evaluating deep learning models for tasks such as question answering, document ranking, passage re-ranking, or natural language generation, especially if your goal is to improve real-world search engine performance. It is particularly useful for fine-tuning embedding models.
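Fine-tuning an embedding model on MS MARCO commonly uses (query, positive passage, negative passage) triples with a contrastive or triplet objective. A minimal sketch of the loss in pure Python, assuming the three texts have already been embedded as vectors (a real setup would use a neural encoder and a training framework; this only illustrates the objective):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_loss(query_vec, pos_vec, neg_vec, margin=0.2):
    """Hinge loss that pushes the query at least `margin` closer
    (in cosine similarity) to the relevant passage than to the
    non-relevant one. Zero loss once the margin is satisfied."""
    return max(0.0, margin - cosine(query_vec, pos_vec) + cosine(query_vec, neg_vec))

# Toy vectors: the positive aligns with the query, the negative is orthogonal.
q, pos, neg = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
print(triplet_loss(q, pos, neg))  # 0.0 — margin already satisfied
```

During training, the gradient of this loss with respect to the encoder's parameters is what pulls relevant query-passage pairs together in embedding space.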
What kind of data is included in MS MARCO's ranking tasks?
For document and passage ranking, MS MARCO provides large corpora (e.g., 3.2 million documents or 8.8 million passages) along with relevance labels derived from the passages marked as containing the answer in the original QnA dataset. These tasks often involve re-ranking a candidate set (e.g., the top 100 BM25 results).
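The re-ranking setup described above can be sketched in a few lines: a cheap first-stage retriever (BM25) supplies a candidate pool, and a stronger scorer reorders it. Here the "neural" scorer is stood in for by a simple word-overlap function, purely for illustration; all names are hypothetical:

```python
def rerank(query, candidates, score_fn, top_k=100):
    """Re-rank a first-stage candidate list (e.g., top-100 BM25 hits)
    with a stronger scorer. `score_fn(query, passage)` stands in for
    a neural relevance model."""
    pool = candidates[:top_k]
    return sorted(pool, key=lambda p: score_fn(query, p), reverse=True)

def overlap_score(query, passage):
    """Toy scorer: fraction of query terms appearing in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / max(len(q_terms), 1)

candidates = [
    "deep learning for search",
    "capital of france is paris",
    "france exports wine",
]
print(rerank("what is the capital of france", candidates, overlap_score, top_k=3))
# The passage sharing the most query terms moves to the top.
```

Restricting the expensive scorer to a small candidate pool is what makes this two-stage pipeline tractable at the scale of 8.8 million passages.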