Introducing: GPT-4-powered scientific summaries

Today, we are thrilled to officially launch GPT-4-powered scientific summaries.

The first version of Consensus allowed for faster, more efficient searching of scientific research and got rid of the list of blue links.

Earlier this year, the Consensus Meter allowed users to see the landscape of evidence in one click, but only for a specific type of question.

Today, our new GPT-4-powered feature allows this for any research question.

The summary is generated from the first 5 to 10 results and only includes answers that our model judges relevant enough to your question.

And importantly, unlike some well-known chatbots, the results that power the summary will appear directly below the summary box and are all from peer-reviewed sources.

Important: our summary feature has limitations you should be aware of. Please see the Limitations section below for details.

To see it in action, try asking a research question like:

Guardrails:

We are as excited as anybody about the future of the AI-powered information space. However, we also want to ensure that the powerful features we release are built thoughtfully, ethically, and with guardrails in place to further assist in delivering good information.

Here are a few ways that we are trying to put guardrails on our summary feature:

No black boxes – all the results that power the summary can be seen directly below its interface. Our model is explicitly instructed to only include information from these results in the summary output.
Our underlying results are extracted, not generated – most LLM-powered products are fully generative, meaning the AI is creating all of the text. Unfortunately, these models sometimes outright hallucinate. To help mitigate this, the results that power the generative summary are all extracted word-for-word quotes from papers.
Relevancy and confidence thresholding – if our models do not think the results are relevant enough to answer your question, we will exclude them from our analysis.
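
The guardrails above can be sketched roughly in code. This is a minimal illustration under stated assumptions, not how Consensus actually implements the feature: the ExtractedResult type, the select_results and build_summary_prompt functions, the 0.6 threshold, and the prompt wording are all hypothetical. It shows results below a relevance threshold being dropped, the remainder capped at the top results, and a prompt that restricts the model to the quoted extracts.

```python
from dataclasses import dataclass


@dataclass
class ExtractedResult:
    quote: str        # word-for-word extract from a paper (never generated)
    paper_title: str
    relevance: float  # hypothetical relevance score in [0, 1]


def select_results(results, min_relevance=0.6, max_results=10):
    """Keep results at or above the threshold, best first, capped at max_results.

    The threshold and cap are assumed values chosen for illustration.
    """
    relevant = [r for r in results if r.relevance >= min_relevance]
    relevant.sort(key=lambda r: r.relevance, reverse=True)
    return relevant[:max_results]


def build_summary_prompt(question, results):
    """Build a prompt that limits the model to the quoted results only."""
    numbered = "\n".join(
        f'[{i}] "{r.quote}" ({r.paper_title})'
        for i, r in enumerate(results, start=1)
    )
    return (
        "Answer the question using ONLY the numbered quotes below.\n"
        f"Question: {question}\n"
        f"Quotes:\n{numbered}"
    )


if __name__ == "__main__":
    # Placeholder data, made up purely to exercise the functions.
    results = [
        ExtractedResult("Placeholder finding one.", "Placeholder paper A", 0.91),
        ExtractedResult("Placeholder off-topic text.", "Placeholder paper B", 0.20),
        ExtractedResult("Placeholder finding two.", "Placeholder paper C", 0.74),
    ]
    kept = select_results(results)
    print(build_summary_prompt("Placeholder research question?", kept))
```

Because the same filtered list would feed both the prompt and the results shown below the summary box, a reader can always check the summary against the exact quotes the model was given.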

Limitations:

However, there are still plenty of limitations, and this feature (like all of our features) will remain a work in progress.

A few limitations to call out:

Research quality is not a part of the analysis – a limitation that we cannot wait to address! Currently, each claim counts the same toward the summary regardless of whether it comes from a meta-analysis or an n = 1 case report. Not all research is created equal, and future versions of this feature will take into account both paper and journal quality. Remember, sometimes the most relevant answers come from junk research.
The summary is only as good as our search logic – the summary feature synthesizes the top 5 to 10 results that we surface. There will be times when the summary does not answer your question particularly well even though it did what it was “supposed to do”: our search models simply did not do a good job of surfacing claims that answered your research question.
We do not have access to all research – the Consensus database includes north of 150 million peer-reviewed papers. While this represents significant coverage, there is plenty of amazing research that we do not have access to. The summary is just a snapshot of some of the relevant research that we have access to, not a fully comprehensive look into all of the research regarding your question.
Hallucinations can occur – based on our testing, GPT-4, when given the right instructions, is significantly less prone than previous iterations to generating content that does not represent the underlying source material. However, any time a generative model is used, there is the possibility of it creating an answer that is not based on reality. Always read the papers that power the summary before coming to a final answer to your question!

Feedback:

The summary feature is in Beta and we need your help to improve it! If you see any answers misclassified, please let us know by either:

Opening a support ticket in the chat widget
Or sending an email to support@consensus.app