Powering Literature Retrieval for Agentic Biology

Kexin Huang, Co-founder & CEO, Phylo
Phylo, an AI-native IDE for biology built by researchers from Stanford, ran a head-to-head benchmark of literature retrieval providers on biomedical queries, each paired with a ground-truth set of core papers.
Consensus came out on top, and now powers literature search across Phylo’s agent workflows, including a high-throughput screen across ~20,000 targets and hundreds of diseases.
1
The company
An integrated biology environment for scientists and AI agents
Kexin Huang is co-founder and CEO of Phylo, a startup building what the team calls an integrated biology environment, an IDE for biology where scientists and AI agents work together to do biomedical research with rigor and at scale. Phylo integrates across biomedical data sources, tools, and software, and provides the agentic infrastructure to automate the kinds of tasks biologists have done manually for the past hundred years.
2
The problem
Literature search sits underneath almost everything Phylo does
Literature search sits underneath almost everything that environment does. It’s the cornerstone task, present in nearly every workflow a biologist runs.
“Even in analysis tasks, there’s literature research underneath,” Huang explains. “For single-cell annotation, you need to find marker genes. To compare a paper’s findings against your own dataset, you need literature. To brainstorm a hypothesis, to understand a mechanism of action for a particular drug and disease, all of this requires tons of literature. That’s why we spent so much time making sure our literature search was the highest quality.”
3
The benchmark
Phylo measured whether providers surfaced the right papers
So the team ran a benchmark. Phylo curated a set of biomedical queries spanning several disease areas, each with a ground-truth list of papers known to be highly relevant.
They evaluated retrieval quality across providers by measuring how many of the truly relevant papers each system surfaced in its top 20 results. Consensus came out on top.
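One common way to formalize a measurement like this is recall at k: the fraction of a query's ground-truth papers that appear in the first k results. The article does not state the exact metric or data format Phylo used, so the sketch below is an assumption, and the function names and paper identifiers are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of ground-truth papers that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(results_by_query, truth_by_query, k=20):
    """Average recall@k over every query that has a ground-truth list."""
    scores = [
        recall_at_k(results_by_query.get(query, []), truth, k)
        for query, truth in truth_by_query.items()
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical provider output and ground truth for two queries.
    results = {"q1": ["a", "b"], "q2": ["x"]}
    truth = {"q1": ["a"], "q2": ["y", "x"]}
    print(mean_recall_at_k(results, truth))
```

Running the same function over each provider's results for the same query set gives one comparable score per provider.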
4
In production
Literature retrieval now runs inside Phylo’s agent workflows
That benchmark result is what put the Consensus API into Phylo’s product. Today, it powers literature search across agent workflows where researchers move from a hypothesis or dataset signal into evidence-backed next steps.
To maximize coverage, Phylo’s agents make parallel calls across multiple angles of a question, gather relevant papers, and distill the results into a summary for the user.

Biology question defined
A hypothesis, GWAS hit, or dataset signal starts the workflow.

Parallel search angles generated
Phylo queries multiple angles of the question for better coverage.

Evidence returned by Consensus
Relevant papers come back across those retrieval paths.

Summary delivered to the user
Phylo distills the evidence into a usable research answer.
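The fan-out in the middle of this workflow can be sketched with concurrent calls followed by de-duplication. This is a minimal illustration under stated assumptions, not Phylo's implementation or the Consensus API: `search_papers` is a hypothetical stand-in that returns canned data, and the query strings and paper records are invented. The final summarization step is left out.

```python
import asyncio

# Canned results standing in for a real literature search backend (hypothetical).
_CANNED = {
    "BRCA1 breast cancer mechanism": [
        {"id": "p1", "title": "Paper one"},
        {"id": "p2", "title": "Paper two"},
    ],
    "BRCA1 DNA repair pathway": [
        {"id": "p2", "title": "Paper two"},
        {"id": "p3", "title": "Paper three"},
    ],
}


async def search_papers(query):
    """Hypothetical search call; a real one would do network I/O here."""
    await asyncio.sleep(0)
    return _CANNED.get(query, [])


async def gather_evidence(angles, search=search_papers):
    """Query every angle concurrently, then merge results by paper id."""
    results = await asyncio.gather(*(search(angle) for angle in angles))
    merged = {}
    for angle, papers in zip(angles, results):
        for paper in papers:
            # Keep one record per paper and note which angles surfaced it.
            entry = merged.setdefault(paper["id"], {**paper, "angles": []})
            entry["angles"].append(angle)
    return list(merged.values())


if __name__ == "__main__":
    angles = ["BRCA1 breast cancer mechanism", "BRCA1 DNA repair pathway"]
    for paper in asyncio.run(gather_evidence(angles)):
        print(paper["id"], len(paper["angles"]))
```

Tracking which angles returned each paper is a design choice in this sketch: a paper surfaced by several angles may deserve more weight in the summary.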
The same infrastructure also runs at high throughput. One internal Phylo team uses the Consensus API to screen roughly 20,000 targets across hundreds of diseases, pulling literature evidence for every target-disease pair.
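At roughly 20,000 targets and hundreds of diseases, the number of target-disease pairs runs into the millions, so a screen like this is usually enumerated lazily and processed in batches. The sketch below shows one way to do that; the helper names, identifiers, and batch size are hypothetical, and the article does not describe how Phylo actually schedules its screen.

```python
from itertools import islice, product


def target_disease_pairs(targets, diseases):
    """Lazily yield every (target, disease) combination."""
    return product(targets, diseases)


def batched(iterable, size):
    """Yield lists of up to `size` items without materializing the input."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


if __name__ == "__main__":
    # Tiny illustrative inputs; a real screen would load full target and disease lists.
    targets = ["T1", "T2"]
    diseases = ["D1", "D2", "D3"]
    for batch in batched(target_disease_pairs(targets, diseases), 4):
        print(len(batch), batch[0])
```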
5
The bigger picture
Building the infrastructure for a new way of doing biology
Phylo’s ambition stretches well beyond literature search. Huang wants the company to be the agentic infrastructure for a new way of doing biology, one that compresses discovery timelines by 100x or more, accelerates drug development, and surfaces hidden findings already buried in pharma’s underutilized data. But the path there runs through getting the foundational pieces right, and for Phylo, literature retrieval was one of those pieces.








