A 10-Agent Research Pipeline with Human-in-the-Loop Review
How a medical researcher uses the Consensus MCP + Claude to rapidly identify PhD-worthy research niches in an unfamiliar field

Researcher snapshot
| Name | Alex Eponon |
|---|---|
| Role | PhD researcher |
| Project | Basis Research Agents, an open-source multi-agent research pipeline |
| Tools | Claude Sonnet 4.5 (primary), Claude Haiku 4.5 and Ollama fallbacks, Consensus MCP, 9 additional academic APIs, ConceptNet |
| License | MIT |
| GitHub Repository | |
Alex wanted to use AI to do fundamental research, but he kept running into a control problem. A one-shot Claude prompt will happily write a plausible literature review, and a single search tool will return a list of papers. Neither gives the researcher visibility into which claims are grounded, where ideas come from historically, or where the real gaps in a field sit. As more steps get chained together, it becomes harder to trace any individual claim back to a specific paper.
The second problem was coverage. Alex pulled from a wide set of sources including OpenAlex, arXiv, PubMed, Semantic Scholar, CORE, PhilPapers, PhilArchive, PhilSci-Archive, Google Books, and Open Library. Each one covers a different corner of the literature. Philosophy of science lives in PhilPapers. Preprints live in arXiv. Biomedical work lives in PubMed. Even with all of these sources plugged in, there were still research questions where relevant papers did not surface. The keyword-based APIs were good at finding exact matches, but they missed papers that used different terminology for the same underlying idea.
That gap is what drove Alex to add Consensus to the pipeline.
This solution involves code
Alex wrote code to build a custom research agent. You can access it in the GitHub repository linked at the start. If you are not comfortable with code, you can still take the lessons learned here and apply them to your own projects.
Alex built a pipeline that runs 10 specialized agents in sequence. Each agent does one job, writes its own markdown output, and hands context to the next. Three mandatory human review breaks are baked into the flow so the researcher stays in control of direction before the system spends more tokens.
| Agent | Job |
|---|---|
| Social | Collects current papers and books from 8 academic sources including Consensus |
| Grounder | Decomposes the research question into sub-questions and excavates intellectual origins and seminal works |
| Historian | Builds a chronological development map of the field, including abandoned research directions |
| Gaper | Identifies and classifies gaps as empirical, conceptual, methodological, or theoretical |
| Vision | Draws logical implications from established findings |
| Theorist | Proposes concrete, scoped, falsifiable research approaches anchored in the identified gaps |
| Rude | Adversarial critic. Evaluates proposals with empirical rigor and identifies the weakest links |
| Synthesizer | Produces a unified research narrative and sharpens the original problem statement |
| Thinker | Opens genuinely new research directions beyond the existing proposals |
| Scribe | Writes the final document in the format the researcher specifies |
Alex has provided access to the repository he created below:
PHASE 1
Collection
Agent in this phase: Social
Goal: Cast the widest possible net. Pull current papers and books from every relevant academic source, expanded by semantic concept mapping so the pool covers adjacent themes the researcher did not name explicitly.
CHECKPOINT
The researcher reviews the themes the pipeline identified and the sources it plans to search. This is the cheapest moment to redirect. Confirming direction here prevents wasted compute on synthesis downstream.
PHASE 2
Foundations
Agents in this phase: Grounder, Historian, Gaper
Goal: Understand the field before proposing anything. Trace where the ideas came from, how they evolved over time, and where the real gaps sit. By the end of this phase, the researcher has a map of the intellectual territory and a classified list of open problems.
CHECKPOINT
The researcher reviews the foundations produced in this phase (the intellectual origins, the chronological map, and the classified gaps) and confirms or corrects direction before the pipeline starts generating proposals.
PHASE 3
Proposals and Synthesis
Agents in this phase: Vision, Theorist, Rude, Synthesizer
Goal: Turn the foundations into concrete, falsifiable research proposals that survive adversarial review, then synthesize everything upstream into a unified narrative. Rude exists specifically to poke holes so weak proposals get filtered out before they reach the researcher.
CHECKPOINT
The researcher reviews the proposals and synthesis, then specifies the final output format (blog post, literature review, research brief, paper section, grant background, or internal memo). This is where the researcher tells the pipeline what artifact they actually need.
PHASE 4
Expansion and Writing
Agents in this phase: Thinker, Scribe
Goal: Open directions beyond the existing proposals and produce the final deliverable in the chosen format. Thinker looks for angles the upstream agents did not catch. Scribe pulls from every upstream markdown output and produces the polished artifact.

1. Split the work across specialized agents, not one big prompt
Alex gave each agent a narrow job and its own markdown output file. This keeps the context clean for the next agent and makes every step auditable. A researcher can open any agent's file and see what it produced without sifting through one giant context window.
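To make the pattern concrete, here is a minimal sketch of a sequential pipeline in which every agent writes its own markdown file and receives the outputs of the agents before it. It is not taken from Alex's repository: the `Agent` signature, the `run_pipeline` function, the file naming scheme, and the two stand-in agents are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable

# Hypothetical agent signature: (question, upstream outputs by agent name) -> markdown.
Agent = Callable[[str, dict[str, str]], str]


def run_pipeline(question: str, agents: list[tuple[str, Agent]], out_dir: Path) -> dict[str, str]:
    """Run agents in order; each writes its own markdown file and sees prior outputs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    context: dict[str, str] = {}
    for index, (name, agent) in enumerate(agents, start=1):
        # Pass a copy so an agent cannot rewrite what earlier agents produced.
        markdown = agent(question, dict(context))
        (out_dir / f"{index:02d}_{name}.md").write_text(markdown, encoding="utf-8")
        context[name] = markdown
    return context


# Stand-in agents. A real agent would call a model or a search API here.
def social(question: str, upstream: dict[str, str]) -> str:
    return f"# Sources\n- placeholder paper about {question}\n"


def grounder(question: str, upstream: dict[str, str]) -> str:
    return "# Sub-questions\nBuilt on: " + ", ".join(upstream) + "\n"
```

Calling `run_pipeline("sleep and memory", [("social", social), ("grounder", grounder)], Path("runs/demo"))` would leave one numbered markdown file per agent in the output directory, which is what makes each step easy to open and audit on its own.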

2. Use an adversarial agent to stress-test proposals
The Rude agent exists specifically to poke holes. Its job is to find the weakest empirical claim and call it out. Most pipelines converge on a single direction because every agent is trying to be helpful. Alex added friction on purpose so the output survives real scrutiny.
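One possible shape for this kind of filter is sketched below. It assumes the critic returns the weakest claim plus a severity score, and that proposals above a threshold are set aside. The `Critique` fields, the 0 to 5 severity scale, and `toy_critic` are hypothetical; in a real pipeline the critic would be a model call rather than a string check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Critique:
    proposal: str
    weakest_claim: str
    severity: int  # hypothetical scale: 0 (minor) to 5 (fatal)


def stress_test(
    proposals: list[str],
    critic: Callable[[str], Critique],
    max_severity: int = 2,
) -> tuple[list[Critique], list[Critique]]:
    """Split proposals into survivors and rejects based on the critic's verdict."""
    survivors: list[Critique] = []
    rejected: list[Critique] = []
    for proposal in proposals:
        critique = critic(proposal)
        if critique.severity <= max_severity:
            survivors.append(critique)
        else:
            rejected.append(critique)
    return survivors, rejected


def toy_critic(proposal: str) -> Critique:
    """Stand-in critic: flags proposals that assert proof without naming evidence."""
    if "proves" in proposal and "dataset" not in proposal:
        return Critique(proposal, "claims proof with no named evidence", 4)
    return Critique(proposal, "sample size not stated", 1)
```

Keeping the rejected critiques, rather than discarding them, lets the researcher see why a proposal was filtered out.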

3. Put the human back in the loop at three precise moments
Rather than letting your agent run wild, create checkpoints throughout to ensure that it is researching in the right direction. In this use case, Alex created three checkpoints at critical points in the agents' process, allowing the end researcher to use their own expertise to guide the pipeline.
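A checkpoint can be as simple as a function that pauses the run, shows a summary, and waits for the researcher. The sketch below is an assumption about how such a pause might look, not the repository's implementation; `checkpoint`, `PipelineStopped`, and the prompt wording are illustrative names.

```python
from typing import Callable


class PipelineStopped(Exception):
    """Raised when the researcher declines to continue at a checkpoint."""


def checkpoint(title: str, summary: str, ask: Callable[[str], str] = input) -> str:
    """Show a summary, then return the researcher's guidance or stop the run."""
    print(f"--- CHECKPOINT: {title} ---")
    print(summary)
    answer = ask("Press Enter to continue, type notes to redirect, or 'stop' to halt: ").strip()
    if answer.lower() == "stop":
        raise PipelineStopped(title)
    # An empty string means "continue unchanged"; anything else is guidance
    # that the caller can pass to the next agent.
    return answer
```

Passing `ask` as a parameter keeps the function testable and lets the same checkpoint be driven from a terminal, a notebook, or a web form.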
Alex had already plugged nine other academic sources into the Social agent before Consensus was added. Most of those sources run on keyword matching. They return what you ask for, and they miss what you did not phrase correctly. When Alex connected Consensus through the MCP, he was looking for a different retrieval behavior. Semantic search ranks papers by meaning rather than by literal string match, which meant it could find relevant work even when the terminology did not overlap with the original query.
Once Alex ran the same research questions through Consensus alongside the other sources, he saw the difference in what came back. Consensus surfaced papers the other APIs had missed entirely, and it filled in the conceptual corners of the field that keyword search could not reach.
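When several sources return overlapping papers, the results need to be merged without losing the ranking of the semantically ranked source. The following sketch shows one way to do that, assuming each result is a dictionary with hypothetical `title` and `doi` fields; real API responses use their own schemas and would need mapping first.

```python
import re


def paper_key(paper: dict) -> str:
    """Prefer the DOI as identity; fall back to a normalized title."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    return "title:" + re.sub(r"[^a-z0-9]+", " ", paper["title"].lower()).strip()


def merge_results(primary: list[dict], *others: list[dict]) -> list[dict]:
    """Keep the primary source's ranking, then append unseen papers from other sources."""
    seen: set[str] = set()
    merged: list[dict] = []
    for source in (primary, *others):
        for paper in source:
            key = paper_key(paper)
            if key not in seen:
                seen.add(key)
                merged.append(paper)
    return merged


# Made-up records for illustration only.
semantic = [{"title": "Memory consolidation during sleep", "doi": "10.1/abc"}]
keyword = [
    {"title": "Memory Consolidation During Sleep", "doi": "10.1/ABC"},
    {"title": "Sleep spindles", "doi": None},
]
merged = merge_results(semantic, keyword)
```

One limitation of this sketch: a paper that carries a DOI in one source and none in another will not be recognized as a duplicate, so a production version would likely need a fuzzier match.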
Alex shipped this as version 1.0.0 and plans to keep iterating. The near-term priorities include tighter anti-hallucination guardrails between agents (for example, an extended reference section that pins every claim to a specific quote in a specific paper), broader gray-literature coverage for social-science work that does not sit in peer-reviewed journals, and integration with the Consensus citation graph once it lands in the MCP so the pipeline can crawl references forward and backward from a seed paper.
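As an illustration of what pinning a claim to a quote could involve, the sketch below checks that each claim's quote actually appears in the text of the paper it cites. This is a guess at one possible guardrail, not a description of the planned feature; `PinnedClaim`, `unsupported_claims`, and the `papers` mapping of paper ID to full text are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PinnedClaim:
    claim: str
    paper_id: str
    quote: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not cause false misses."""
    return " ".join(text.lower().split())


def unsupported_claims(claims: list[PinnedClaim], papers: dict[str, str]) -> list[PinnedClaim]:
    """Return claims whose quote is empty or not found in the cited paper's text."""
    failures: list[PinnedClaim] = []
    for claim in claims:
        quote = normalize(claim.quote)
        source = normalize(papers.get(claim.paper_id, ""))
        if not quote or quote not in source:
            failures.append(claim)
    return failures
```

A check like this only confirms that the quote exists in the source. It does not confirm that the quote supports the claim, which still needs a human or a separate review step.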
Start by defining the narrow jobs your pipeline actually needs. Alex landed on 10 agents. Yours might need 4 or 5. Name them and give each one a single output.
Insert human review breaks at the points where direction matters most: before search, before synthesis, and before final writing.
Add an adversarial agent whose only job is to find the weakest claim. This is the single highest-leverage addition most builders skip.
Connect Consensus via the MCP and treat it as a primary relevance-ranking source. Layer keyword sources (OpenAlex, arXiv, PubMed) on top for coverage.
Persist every intermediate artifact to local storage so you can resume, audit, and re-use past runs.
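The last step, persisting artifacts so a run can be resumed, can be sketched with a small helper that reuses a saved file when it exists and only does the expensive work otherwise. The function name `run_or_resume` and the one-file-per-step layout are assumptions for illustration.

```python
from pathlib import Path
from typing import Callable


def run_or_resume(name: str, produce: Callable[[], str], run_dir: Path) -> str:
    """Return the saved markdown for a step if present; otherwise produce and save it."""
    artifact = run_dir / f"{name}.md"
    if artifact.exists():
        # A previous run already paid for this step, so reuse its output.
        return artifact.read_text(encoding="utf-8")
    run_dir.mkdir(parents=True, exist_ok=True)
    text = produce()
    artifact.write_text(text, encoding="utf-8")
    return text
```

Wrapping each agent call this way means an interrupted run restarts from the first missing file, and deleting a single file forces just that step to be regenerated.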