
Community voices

May 8, 2026

Powering Literature Retrieval for Agentic Biology

Kexin Huang, Co-founder & CEO, Phylo

#1

Top-ranked retrieval quality in Phylo’s internal benchmark

~20K × 100s

Targets × diseases screened via the Consensus API

~0 errors

Production reliability inside an agentic pipeline


Phylo, an AI-native IDE for biology built by researchers from Stanford, ran a head-to-head benchmark of literature retrieval providers on biomedical queries, each paired with a set of known core papers.

Consensus came out on top, and now powers literature search across Phylo’s agent workflows, including a high-throughput screen across ~20,000 targets and hundreds of diseases.

“It’s very robust, with almost zero errors. It gives you genuinely useful results, and it embeds into a high-throughput agentic workflow easily.”

Kexin Huang

Co-founder & CEO, Phylo


The company

An integrated biology environment for scientists and AI agents

Kexin Huang is co-founder and CEO of Phylo, a startup building what the team calls an integrated biology environment, an IDE for biology where scientists and AI agents work together to do biomedical research with rigor and at scale. Phylo integrates across biomedical data sources, tools, and software, and provides the agentic infrastructure to automate the kinds of tasks biologists have done manually for the past hundred years.

The problem

Literature search sits underneath almost everything Phylo does

In Phylo’s environment, literature search is the cornerstone task, present in nearly every workflow a biologist runs.

“Even in analysis tasks, there’s literature research underneath,” Huang explains. “For single-cell annotation, you need to find marker genes. To compare a paper’s findings against your own dataset, you need literature. To brainstorm a hypothesis or understand a drug’s mechanism of action in a particular disease, you need a large volume of literature. That’s why we spent so much time making sure our literature search was the highest quality.”

The benchmark

Phylo measured whether providers surfaced the right papers

So the team ran a benchmark. Phylo curated a set of biomedical queries spanning several disease areas, each with a ground-truth list of papers known to be highly relevant.

They evaluated retrieval quality across providers by measuring how many of the truly relevant papers each system surfaced in its top 20 results. Consensus came out on top.

Curated biomedical queries

Questions spanned multiple disease areas and research contexts.

Ground-truth papers

Each query had papers already known to be highly relevant.

Top 20 retrieval check

Providers were compared by how many truly relevant papers appeared in their top 20 results.

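The case study doesn’t spell out the exact scoring formula. One natural reading of “how many of the truly relevant papers appeared in the top 20” is recall@20: for each query, the fraction of its ground-truth papers found in the first 20 results, averaged across queries. The Python sketch below assumes that definition; the provider names, query keys, and paper IDs are made-up placeholders, not Phylo’s benchmark data.

```python
from statistics import mean


def recall_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of ground-truth papers that appear in the first k results."""
    if not relevant_ids:
        raise ValueError("each query needs at least one ground-truth paper")
    top_k = set(ranked_ids[:k])
    hits = sum(1 for pid in relevant_ids if pid in top_k)
    return hits / len(relevant_ids)


def score_provider(results, ground_truth, k=20):
    """Mean recall@k over all benchmark queries.

    results: query -> ranked list of paper IDs returned by one provider
    ground_truth: query -> set of paper IDs known to be highly relevant
    A query the provider returned nothing for scores 0.
    """
    return mean(recall_at_k(results.get(q, []), rel, k) for q, rel in ground_truth.items())


def rank_providers(all_results, ground_truth, k=20):
    """Return (provider, score) pairs sorted from best to worst."""
    scores = {name: score_provider(res, ground_truth, k) for name, res in all_results.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two queries, two providers.
ground_truth = {
    "q1": {"p1", "p2", "p3", "p4"},
    "q2": {"p7", "p8"},
}
all_results = {
    "provider_a": {"q1": ["p1", "x1", "p2", "p3"], "q2": ["p7", "x2"]},
    "provider_b": {"q1": ["x3", "p4"], "q2": ["p8", "x4"]},
}
print(rank_providers(all_results, ground_truth))
# → [('provider_a', 0.625), ('provider_b', 0.375)]
```

Recall@k ignores where within the top 20 a paper lands; a rank-sensitive metric such as nDCG would additionally reward systems that place relevant papers higher, which may matter for an agent that reads only the first few hits.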

In production

Literature retrieval now runs inside Phylo’s agent workflows

That benchmark result is what put the Consensus API into Phylo’s product. Today, it powers literature search across agent workflows where researchers move from a hypothesis or dataset signal into evidence-backed next steps.

To maximize coverage, Phylo’s agents issue parallel queries that approach a question from multiple angles, gather the relevant papers, and distill the results into a summary for the user.

Biology question defined

A hypothesis, GWAS hit, or dataset signal starts the workflow.

Parallel search angles generated

Phylo queries multiple angles of the question for better coverage.

Evidence returned by Consensus

Relevant papers come back across those retrieval paths.

Summary delivered to the user

Phylo distills the evidence into a usable research answer.
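The four steps above follow a fan-out/fan-in pattern: generate several queries, run them concurrently, then merge and deduplicate before summarizing. Here is a minimal Python sketch using only the standard library. `search`, `generate_angles`, and `summarize` are hypothetical placeholders; the sketch does not reproduce the Consensus API’s real interface or Phylo’s agent logic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical paper record: a dict with at least "id" and "title" keys.
Paper = dict
SearchFn = Callable[[str], list[Paper]]


def generate_angles(question: str) -> list[str]:
    """Hypothetical angle generator; an LLM agent might write these instead."""
    return [
        question,
        f"mechanism of action: {question}",
        f"clinical evidence: {question}",
        f"genetic association: {question}",
    ]


def fan_out_search(question: str, search: SearchFn, max_workers: int = 4) -> list[Paper]:
    """Run one search per angle in parallel, then merge, dropping duplicate papers."""
    angles = generate_angles(question)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in angle order, so the merge is deterministic.
        per_angle = list(pool.map(search, angles))
    seen: set[str] = set()
    merged: list[Paper] = []
    for papers in per_angle:
        for paper in papers:
            if paper["id"] not in seen:
                seen.add(paper["id"])
                merged.append(paper)
    return merged


def summarize(question: str, papers: list[Paper]) -> str:
    """Placeholder for the LLM summarization step: it just lists the evidence."""
    lines = [f"Evidence for: {question}"]
    lines += [f"- {p['title']}" for p in papers]
    return "\n".join(lines)


# Usage with a fake search function standing in for a real retrieval API.
def fake_search(query: str) -> list[Paper]:
    corpus = {
        "mechanism": [{"id": "p1", "title": "Paper on mechanism"}],
        "clinical": [{"id": "p2", "title": "Trial results"},
                     {"id": "p1", "title": "Paper on mechanism"}],
    }
    return [paper for key, hits in corpus.items() if key in query for paper in hits]


question = "role of GENE_X in DISEASE_Y"  # placeholder question
papers = fan_out_search(question, fake_search)
print(summarize(question, papers))
```

Threads suit this step because each search is I/O-bound. Merging in angle order keeps the output reproducible, and deduplicating by paper ID stops a paper that several angles retrieve from appearing more than once in the summary.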

The same infrastructure also runs at high throughput. One internal Phylo team uses the Consensus API to screen roughly 20,000 targets across hundreds of diseases, pulling literature evidence for every target-disease pair.
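At this scale the pair count, not any single query, drives the engineering. With an illustrative 200 diseases (the source says only “hundreds”), 20,000 targets yield 20,000 × 200 = 4,000,000 target-disease pairs. A sketch of a bounded-concurrency screen with retries in Python’s asyncio follows; the `search` coroutine, the query format, and the concurrency, batch, and backoff values are all hypothetical choices, not Phylo’s configuration.

```python
import asyncio
import itertools
import random


async def with_retries(search, query, attempts=3, base_delay=0.5):
    """Await search(query), retrying with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return await search(query)
        except Exception:  # in practice, retry only transient errors (timeouts, 429s, 5xx)
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


async def screen(targets, diseases, search, max_concurrency=50,
                 batch_size=10_000, attempts=3, base_delay=0.5):
    """Query every target-disease pair with bounded concurrency.

    Returns (results, failures): results maps (target, disease) -> papers;
    failures lists pairs that still failed after all retries.
    """
    semaphore = asyncio.Semaphore(max_concurrency)  # caps in-flight requests
    results, failures = {}, []

    async def run_pair(target, disease):
        async with semaphore:
            try:
                # Hypothetical query format: "<target> <disease>".
                results[(target, disease)] = await with_retries(
                    search, f"{target} {disease}", attempts, base_delay)
            except Exception:
                failures.append((target, disease))

    # Consume pairs lazily in batches so millions of pairs never become
    # millions of simultaneously scheduled coroutines.
    pairs = itertools.product(targets, diseases)
    while batch := list(itertools.islice(pairs, batch_size)):
        await asyncio.gather(*(run_pair(t, d) for t, d in batch))
    return results, failures


# Usage with a hypothetical flaky search that fails once per query, then succeeds.
calls = {}


async def flaky_search(query):
    calls[query] = calls.get(query, 0) + 1
    if calls[query] == 1:
        raise ConnectionError("transient error")
    return [f"paper about {query}"]


results, failures = asyncio.run(
    screen(["TARGET_1", "TARGET_2"], ["DISEASE_A", "DISEASE_B", "DISEASE_C"],
           flaky_search, base_delay=0)
)
print(len(results), len(failures))  # → 6 0
```

The semaphore keeps request volume under a provider’s rate limits, and the `failures` list makes it straightforward to re-run only the pairs that never succeeded.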

The bigger picture

Building the infrastructure for a new way of doing biology

Phylo’s ambition stretches well beyond literature search. Huang wants the company to be the agentic infrastructure for a new way of doing biology, one that compresses discovery timelines by 100x or more, accelerates drug development, and surfaces hidden findings already buried in pharma’s underutilized data. But the path there runs through getting the foundational pieces right, and for Phylo, literature retrieval was one of those pieces.

“We benchmarked across many different providers. We ended up choosing Consensus.”

Kexin Huang

Co-founder & CEO, Phylo

