How a Consensus Search Works & Other FAQs
This article will lift the hood on how we deliver results in the current Consensus product and answer a few frequently asked questions.
Our product will always be a work in progress, and this is just the first iteration.
Let’s start with a few FAQs…
What is Consensus:
Consensus is a search engine that uses language models to surface relevant papers and synthesize insights from academic research.
Consensus is not a chatbot, but we use the same technology throughout the product to help make the research process more efficient.
What does Consensus search over:
The current source material used in Consensus comes from the Semantic Scholar database, which includes over 200M papers across all domains of science.
We will continue to add more data to the product over time, and our dataset is updated on a monthly cadence.
How should I format my query:
Glad you asked! Check out our blog on the subject: Consensus Best Practices
Now to the fun stuff…
How a Consensus search works:
Before a search is run:
We run a custom fine-tuned language model over our entire corpus of research papers and extract the “key takeaway” from every paper.
A user enters their query into the search bar.
We then remove “stop words” (things like “what”, “is”, “are” etc.) from the query and run a combination of keyword search + vector search over the abstract and title of ALL our papers.
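The query-preparation and hybrid-search step above can be sketched in Python. This is a simplified illustration, not the actual Consensus implementation: the stop-word list, the toy embedding vectors, and the blending weight `alpha` are all hypothetical stand-ins for what a production system would use.

```python
# Hypothetical sketch of stop-word removal plus keyword + vector search.
STOP_WORDS = {"what", "is", "are", "the", "of", "a", "do", "does"}


def clean_query(query: str) -> list[str]:
    """Lowercase the query and drop stop words, keeping content terms."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]


def keyword_score(terms: list[str], document: str) -> float:
    """Fraction of query terms that appear in the document text."""
    doc_tokens = set(document.lower().split())
    if not terms:
        return 0.0
    return sum(t in doc_tokens for t in terms) / len(terms)


def vector_score(query_vec: list[float], doc_vec: list[float]) -> float:
    """Cosine similarity between (toy) embedding vectors."""
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norm = (sum(q * q for q in query_vec) ** 0.5) * (sum(d * d for d in doc_vec) ** 0.5)
    return dot / norm if norm else 0.0


def hybrid_score(terms, document, query_vec, doc_vec, alpha=0.5):
    """Blend keyword relevance and vector similarity into one score."""
    return alpha * keyword_score(terms, document) + (1 - alpha) * vector_score(query_vec, doc_vec)


terms = clean_query("What are the benefits of mindfulness")
print(terms)  # ['benefits', 'mindfulness']
```

In a real system the keyword side would typically be a BM25-style index and the vector side a learned embedding model; the blend here just illustrates how the two signals combine into a single relevance score.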
This gives us a strong measure of how relevant a document is to the user’s query.
This relevance score is then combined with many other pieces of metadata, including but not limited to citation count, velocity of citations, study design, and publication date, to re-rank the results and produce the top 20 candidate results to surface.
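The metadata re-ranking step can be sketched as a weighted scoring function. The weights, the log-dampening of citation counts, and the recency formula below are hypothetical illustrations of how such signals might be combined; the actual Consensus ranking logic is not public.

```python
import math


def rerank_score(relevance: float, citation_count: int, year: int,
                 current_year: int = 2024,
                 w_rel: float = 0.7, w_cite: float = 0.2, w_recency: float = 0.1) -> float:
    """Combine text relevance with citation and recency signals (weights hypothetical)."""
    cite_signal = math.log1p(citation_count) / 10  # dampen very large citation counts
    recency = max(0.0, 1 - (current_year - year) / 50)  # newer papers score higher
    return w_rel * relevance + w_cite * cite_signal + w_recency * recency


candidates = [
    {"id": "A", "relevance": 0.9, "citations": 10, "year": 2021},
    {"id": "B", "relevance": 0.6, "citations": 5000, "year": 2005},
]

# Re-rank and keep the top 20 candidates (here only 2 exist).
top = sorted(candidates,
             key=lambda p: rerank_score(p["relevance"], p["citations"], p["year"]),
             reverse=True)[:20]
```

Note how the highly relevant recent paper outranks the heavily cited but less relevant older one: relevance dominates, while metadata signals break ties and nudge the ordering.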
If the user entered a question or a “phrase” (like “benefits of mindfulness”) into the search engine, we then run a custom fine-tuned language model that generates a “question-relevant conclusion” based on the user query and the abstract of the given paper.
If the user entered a keyword search or any other unclassified query type, we use the “key takeaway” that we extracted ahead of time (see above).
With this new list of 20 results (either the generated conclusion or the “key takeaway” for each paper), we then run a final custom fine-tuned language model built for question answering that ranks the results according to how well they address the user’s query.
This language model determines the final order in which the results appear on the search screen.
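The final ordering step can be sketched as follows. Here we assume (hypothetically) that the question-answering model has already attached an answer-quality score to each result; the field name `qa_score` is an illustration, not the actual API.

```python
def final_order(results: list[dict]) -> list[dict]:
    """Sort results by the QA model's answer-quality score, best answer first."""
    return sorted(results, key=lambda r: r["qa_score"], reverse=True)


results = [
    {"title": "Paper A", "qa_score": 0.42},
    {"title": "Paper B", "qa_score": 0.91},
]
print([r["title"] for r in final_order(results)])  # ['Paper B', 'Paper A']
```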
If you have input a question or a “phrase” into the search engine, and the results are deemed relevant enough, we then run OpenAI’s GPT-4 model over the top 10 results to produce a simple, one-sentence summary of how the top studies address your question. This generated summary can be seen in the “Summary” box in the top left corner.
If you have input a yes/no question into the search engine, and the results are deemed relevant enough, we then run a custom fine-tuned LLM that classifies the results as suggesting either “yes”, “no”, or “possibly” to your question.
The aggregated results of this model can be seen in the “Consensus Meter” in the top right corner of the results page.
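The aggregation behind a meter like this can be sketched simply: count the per-paper labels and convert them to percentages. The function name and label set below mirror the description above, but the code is an illustrative sketch, not the actual Consensus Meter implementation.

```python
from collections import Counter


def consensus_meter(classifications: list[str]) -> dict[str, float]:
    """Aggregate per-paper yes/no/possibly labels into percentages."""
    counts = Counter(classifications)
    total = len(classifications)
    return {label: round(100 * counts[label] / total, 1)
            for label in ("yes", "no", "possibly")}


print(consensus_meter(["yes", "yes", "possibly", "no", "yes"]))
# {'yes': 60.0, 'no': 20.0, 'possibly': 20.0}
```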
After the search:
You now have a list of the papers most relevant to your question, along with a key insight or conclusion from each paper.
Want to learn more about a result? Try out our study snapshot feature that extracts key details from the study like population, sample size and more!
This page was created based on feedback from our first users. If there is more you want to learn, please email us with your feedback!
Additionally, if you want to see other content we have created to help improve our user experience, please check out our piece on Consensus Best Practices.