A 10-Agent Research Pipeline with Human-in-the-Loop Review


How a medical researcher uses the Consensus MCP + Claude to rapidly identify PhD-worthy research niches in an unfamiliar field

Researcher snapshot

Name	Alex Eponon
Role	PhD researcher
Project	Basis Research Agents, an open-source multi-agent research pipeline
Tools	Claude Sonnet 4.5 (primary), Claude Haiku 4.5 and Ollama fallbacks, Consensus MCP, 9 additional academic APIs, ConceptNet
License	MIT
Github Repository	github.com/anvix9/basis_research_agents

The Problem

Alex wanted to use AI to do fundamental research, but he kept running into a control problem. A one-shot Claude prompt will happily write a plausible literature review, and a single search tool will return a list of papers. Neither gives the researcher visibility into which claims are grounded, where ideas come from historically, or where the real gaps in a field sit. As more steps get chained together, it becomes harder to trace any individual claim back to a specific paper.

"When you do research with AI, it is very easy to lose control of the information that appears, whether it should be trusted, how solid it is, or what its real motivation is."

Alex

PhD researcher

The second problem was coverage. Alex pulled from a wide set of sources including OpenAlex, arXiv, PubMed, Semantic Scholar, CORE, PhilPapers, PhilArchive, PhilSci-Archive, Google Books, and Open Library. Each one covers a different corner of the literature. Philosophy of science lives in PhilPapers. Preprints live in arXiv. Biomedical work lives in PubMed. Even with all of these sources plugged in, there were still research questions where relevant papers did not surface. The keyword-based APIs were good at finding exact matches, but they missed papers that used different terminology for the same underlying idea.

That gap is what drove Alex to add Consensus to the pipeline.

The Solution

This solution involves code

Alex wrote code to build a custom research pipeline. It is available in the GitHub repository listed in the researcher snapshot above. If you are not comfortable with code, you can still take the lessons learned here and apply them to your own projects.

Alex built a pipeline that runs 10 specialized agents in sequence. Each agent does one job, writes its own markdown output, and hands context to the next. Three mandatory human review breaks are baked into the flow so the researcher stays in control of direction before the system spends more tokens.

Agent	Job
Social	Collects current papers and books from 8 academic sources including Consensus
Grounder	Decomposes the research question into sub-questions and excavates intellectual origins and seminal works
Historian	Builds a chronological development map of the field, including abandoned research directions
Gaper	Identifies and classifies gaps as empirical, conceptual, methodological, or theoretical
Vision	Draws logical implications from established findings
Theorist	Proposes concrete, scoped, falsifiable research approaches anchored in the identified gaps
Rude	Adversarial critic. Evaluates proposals with empirical rigor and identifies the weakest links
Synthesizer	Produces a unified research narrative and sharpens the original problem statement
Thinker	Opens genuinely new research directions beyond the existing proposals
Scribe	Writes the final document in the format the researcher specifies
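The hand-off described above (each agent does one job, writes its own markdown file, and passes context to the next) can be sketched roughly as follows. This is a minimal illustration, not the code from Alex's repository: the `Agent` signature, the `run_pipeline` function, and the two stand-in agents are all assumptions made for the example, and a real agent would call a language model instead of returning a fixed string.

```python
from pathlib import Path
from typing import Callable

# Hypothetical agent signature: takes the research question plus all
# upstream outputs (agent name -> markdown) and returns its own markdown.
Agent = Callable[[str, dict[str, str]], str]


def run_pipeline(question: str, agents: list[tuple[str, Agent]], out_dir: Path) -> dict[str, str]:
    """Run agents in order; each writes <name>.md and passes context on."""
    out_dir.mkdir(parents=True, exist_ok=True)
    context: dict[str, str] = {}
    for name, agent in agents:
        # Pass a copy so an agent cannot alter what earlier agents produced.
        markdown = agent(question, dict(context))
        (out_dir / f"{name}.md").write_text(markdown, encoding="utf-8")
        context[name] = markdown
    return context


# Stand-in agents for illustration only.
def social(question: str, upstream: dict[str, str]) -> str:
    return f"# Social\n\nSources collected for: {question}\n"


def grounder(question: str, upstream: dict[str, str]) -> str:
    return f"# Grounder\n\nBuilt on {len(upstream)} upstream file(s).\n"
```

Calling `run_pipeline("my question", [("social", social), ("grounder", grounder)], Path("runs/demo"))` would leave one markdown file per agent in `runs/demo`, which is what makes each step easy to inspect on its own.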

The repository is available at github.com/anvix9/basis_research_agents.

How It Works

PHASE 1

Collection

Agent in this phase: Social

Goal: Cast the widest possible net. Pull current papers and books from every relevant academic source, expanded by semantic concept mapping so the pool covers adjacent themes the researcher did not name explicitly.
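Pulling from many sources means the same paper often comes back more than once. One plausible way to merge the pool while keeping track of which source returned what is sketched below. The `Paper` record, the `dedup_key` rule (DOI first, normalized title otherwise), and `merge_results` are assumptions for illustration, not the repository's actual logic.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Paper:
    title: str
    source: str                # e.g. "arXiv", "PubMed", "Consensus"
    doi: Optional[str] = None


def dedup_key(paper: Paper) -> str:
    """Prefer the DOI; otherwise fall back to a normalized title."""
    if paper.doi:
        return "doi:" + paper.doi.strip().lower()
    return "title:" + re.sub(r"[^a-z0-9]+", " ", paper.title.lower()).strip()


def merge_results(batches: list[list[Paper]]) -> dict[str, list[Paper]]:
    """Group records from all sources by dedup key, keeping every provenance."""
    merged: dict[str, list[Paper]] = {}
    for batch in batches:
        for paper in batch:
            merged.setdefault(dedup_key(paper), []).append(paper)
    return merged
```

A known limitation of this simple rule is that a record with a DOI and a record of the same paper without one land under different keys, so a real pipeline would likely need a second matching pass.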

CHECKPOINT

The researcher reviews the themes the pipeline identified and the sources it plans to search. This is the cheapest moment to redirect. Confirming direction here prevents wasted compute on synthesis downstream.

PHASE 2

Foundations

Agents in this phase: Grounder, Historian, Gaper

Goal: Understand the field before proposing anything. Trace where the ideas came from, how they evolved over time, and where the real gaps sit. By the end of this phase, the researcher has a map of the intellectual territory and a classified list of open problems.

CHECKPOINT

The researcher reviews the origins, the development map, and the classified gaps before the pipeline moves on to proposals.

PHASE 3

Proposals and Synthesis

Agents in this phase: Vision, Theorist, Rude, Synthesizer

Goal: Turn the foundations into concrete, falsifiable research proposals that survive adversarial review, then synthesize everything upstream into a unified narrative. Rude exists specifically to poke holes so weak proposals get filtered out before they reach the researcher.

CHECKPOINT

The researcher reviews the proposals and synthesis, then specifies the final output format (blog post, literature review, research brief, paper section, grant background, or internal memo). This is where the researcher tells the pipeline what artifact they actually need.

PHASE 4

Expansion and Writing

Agents in this phase: Thinker, Scribe

Goal: Open directions beyond the existing proposals and produce the final deliverable in the chosen format. Thinker looks for angles the upstream agents did not catch. Scribe pulls from every upstream markdown output and produces the polished artifact.
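The four phases and the review breaks between them can be written down as plain data. The sketch below uses the phase and agent names from this article; the `PHASES` structure and the `schedule` function are hypothetical and only show one way to lay out the order of work.

```python
PHASES = [
    ("Collection", ["Social"]),
    ("Foundations", ["Grounder", "Historian", "Gaper"]),
    ("Proposals and Synthesis", ["Vision", "Theorist", "Rude", "Synthesizer"]),
    ("Expansion and Writing", ["Thinker", "Scribe"]),
]


def schedule(phases):
    """Flatten phases into ordered steps, with a checkpoint between phases."""
    steps = []
    for index, (phase, agents) in enumerate(phases):
        steps.extend(("agent", name) for name in agents)
        # No checkpoint after the last phase: the final document is the output.
        if index < len(phases) - 1:
            steps.append(("checkpoint", f"after {phase}"))
    return steps
```

With the data above, `schedule(PHASES)` yields 13 steps: the 10 agents in order, with 3 checkpoints interleaved at the phase boundaries.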

Key Insights & Learnings

Split the work across specialized agents, not one big prompt

Alex gave each agent a narrow job and its own markdown output file. This keeps the context clean for the next agent and makes every step auditable. A researcher can open any agent's file and see what it produced without sifting through one giant context window.

Use an adversarial agent to stress-test proposals

The Rude agent exists specifically to poke holes. Its job is to find the weakest empirical claim and call it out. Most pipelines converge on a single direction because every agent is trying to be helpful. Alex added friction on purpose so the output survives real scrutiny.
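In the pipeline, the critic is a language-model agent, so its judgment cannot be reduced to a formula. Still, the "find the weakest link" idea can be illustrated with a deliberately crude stand-in: count how many papers support each claim and flag the least supported one. Everything here (`Claim`, `Proposal`, `weakest_claim`, `survives_review`, and the support-count heuristic itself) is an assumption for the example, not how Rude actually scores proposals.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    supporting_papers: list[str] = field(default_factory=list)  # hypothetical paper IDs


@dataclass
class Proposal:
    title: str
    claims: list[Claim]


def weakest_claim(proposal: Proposal) -> Claim:
    """Return the claim with the fewest supporting papers (first one on ties)."""
    return min(proposal.claims, key=lambda c: len(c.supporting_papers))


def survives_review(proposal: Proposal, min_support: int = 1) -> bool:
    """A proposal passes only if even its weakest claim meets the threshold."""
    return len(weakest_claim(proposal).supporting_papers) >= min_support
```

The design point is that a proposal is judged by its weakest claim rather than its average, which is the friction the adversarial step is meant to add.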

Put the human back in the loop at three precise moments

Rather than letting your agent run wild, create checkpoints throughout to ensure that it is researching in the right direction. In this use case, Alex created three checkpoints at critical points in the pipeline, allowing the researcher to use their own expertise to guide the agents.
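A checkpoint can be as simple as a function that shows a summary and waits for the researcher to continue, stop, or redirect. The sketch below is one possible shape, assuming a terminal workflow; the `checkpoint` function and its return convention are hypothetical. The `ask` parameter defaults to Python's built-in `input` and can be swapped out for testing or for a different interface.

```python
from typing import Callable, Optional


def checkpoint(summary: str, ask: Callable[[str], str] = input) -> Optional[str]:
    """Pause for human review.

    Returns None to stop the run, "" to continue unchanged, or the
    researcher's free-text guidance to pass to the next phase.
    """
    print(summary)
    answer = ask("Continue? [y = yes, n = stop, or type guidance]: ").strip()
    if answer.lower() in {"n", "no"}:
        return None
    if answer.lower() in {"", "y", "yes"}:
        return ""
    return answer
```

Returning the researcher's guidance as text means the next phase can receive it as extra context, instead of the checkpoint being a simple yes/no gate.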

How Consensus Fits In

Alex had already plugged nine other academic sources into the Social agent before Consensus was added. Most of those sources run on keyword matching. They return what you ask for, and they miss what you did not phrase correctly. When Alex connected Consensus through the MCP, he was looking for a different retrieval behavior. Semantic search ranks papers by meaning rather than by literal string match, which meant it could find relevant work even when the terminology did not overlap with the original query.

Once Alex ran the same research questions through Consensus alongside the other sources, he saw the difference in what came back. Consensus surfaced papers the other APIs had missed entirely, and it filled in the conceptual corners of the field that keyword search could not reach.
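The terminology gap is easy to reproduce with a toy example. The code below does not show how Consensus ranks papers. It only contrasts a literal keyword match with a match that also accepts related terms, using a small hand-written map as a stand-in for semantic matching. The function names, the titles, and the `RELATED` map are all made up for illustration.

```python
def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def keyword_search(query: str, titles: list[str]) -> list[str]:
    """Return titles that contain every query token (literal match only)."""
    q = tokens(query)
    return [t for t in titles if q <= tokens(t)]


def expanded_search(query: str, titles: list[str], related: dict[str, set[str]]) -> list[str]:
    """Match each query token against itself or any of its related terms."""
    result = []
    for title in titles:
        words = tokens(title)
        if all(words & ({tok} | related.get(tok, set())) for tok in tokens(query)):
            result.append(title)
    return result


TITLES = [
    "myocardial infarction outcomes in older adults",
    "heart attack recovery and exercise",
]
# Hand-written map for illustration; a real system would not rely on this.
RELATED = {"heart": {"myocardial"}, "attack": {"infarction"}}
```

Here `keyword_search("heart attack", TITLES)` returns only the second title, while `expanded_search("heart attack", TITLES, RELATED)` returns both, because the first title describes the same idea in different words.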

"Consensus was returning papers that the other sources were not finding at all. It filled in the gaps I still had after plugging in all the other APIs."

Alex

PhD researcher

What Comes Next

Alex shipped this as version 1.0.0 and plans to keep iterating. The near-term priorities include tighter anti-hallucination guardrails between agents (for example, an extended reference section that pins every claim to a specific quote in a specific paper), broader gray-literature coverage for social-science work that does not sit in peer-reviewed journals, and integration with the Consensus citation graph once it lands in the MCP so the pipeline can crawl references forward and backward from a seed paper.
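The planned guardrail, pinning every claim to a specific quote in a specific paper, lends itself to a mechanical check. The sketch below is a guess at what such a check might look like, not a description of the planned feature: `PinnedClaim`, `normalize`, and `unpinned_claims` are hypothetical names, and the check assumes the full text of each paper is available as a string.

```python
import re
from dataclasses import dataclass


@dataclass
class PinnedClaim:
    claim: str
    paper_id: str   # hypothetical identifier, e.g. a DOI
    quote: str      # passage the claim is supposed to rest on


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unpinned_claims(claims: list[PinnedClaim], papers: dict[str, str]) -> list[PinnedClaim]:
    """Return claims whose quote is not found verbatim in the cited paper's text."""
    failures = []
    for c in claims:
        body = normalize(papers.get(c.paper_id, ""))
        quote = normalize(c.quote)
        # An empty quote or an unknown paper counts as a failure.
        if not quote or quote not in body:
            failures.append(c)
    return failures
```

A check like this only confirms that the quote exists in the paper. Whether the quote actually supports the claim still needs a human or a separate reviewing step.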

Try It Yourself

Start by defining the narrow jobs your pipeline actually needs. Alex landed on 10 agents. Yours might need 4 or 5. Name them and give each one a single output.

Insert human review breaks at the points where direction matters most: before search, before synthesis, and before final writing.
Add an adversarial agent whose only job is to find the weakest claim. This is the single highest-leverage addition most builders skip.
Connect Consensus via the MCP and treat it as a primary relevance-ranking source. Layer keyword sources (OpenAlex, arXiv, PubMed) on top for coverage.
Persist every intermediate artifact to local storage so you can resume, audit, and re-use past runs.
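The last step, persisting intermediate artifacts so a run can be resumed, can be sketched as a small wrapper that reuses an agent's markdown file if it already exists. The `run_or_resume` function and the one-file-per-agent layout are assumptions for this example.

```python
from pathlib import Path
from typing import Callable


def run_or_resume(name: str, produce: Callable[[], str], run_dir: Path) -> tuple[str, bool]:
    """Return (markdown, reused). Reuse <name>.md if a past run already wrote it."""
    path = run_dir / f"{name}.md"
    if path.exists():
        return path.read_text(encoding="utf-8"), True
    run_dir.mkdir(parents=True, exist_ok=True)
    markdown = produce()  # only pay for the agent call when no artifact exists
    path.write_text(markdown, encoding="utf-8")
    return markdown, False
```

If a run stops partway through, calling the same wrapper again with the same `run_dir` skips the agents that already finished and continues from the first missing file.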

The full pipeline is available at github.com/anvix9/basis_research_agents under the MIT license.
