A 10-Agent Research Pipeline with Human-in-the-Loop Review - Consensus

How a medical researcher uses the Consensus MCP + Claude to rapidly identify PhD-worthy research niches in an unfamiliar field

Researcher snapshot

Name: Alex Eponon
Role: PhD researcher
Project: Basis Research Agents, an open-source multi-agent research pipeline
Tools: Claude Sonnet 4.5 (primary), Claude Haiku 4.5 and Ollama fallbacks, Consensus MCP, 9 additional academic APIs, ConceptNet
License: MIT
GitHub Repository: github.com/anvix9/basis_research_agents

01

The Problem

Alex wanted to use AI to do fundamental research, but he kept running into a control problem. A one-shot Claude prompt will happily write a plausible literature review, and a single search tool will return a list of papers. Neither gives the researcher visibility into which claims are grounded, where ideas come from historically, or where the real gaps in a field sit. As more steps get chained together, it becomes harder to trace any individual claim back to a specific paper.

"When you do research with AI, it is very easy to lose control of the information that appears, whether it should be trusted, how solid it is, or what its real motivation is."

Alex

PhD researcher

The second problem was coverage. Alex pulled from a wide set of sources including OpenAlex, arXiv, PubMed, Semantic Scholar, CORE, PhilPapers, PhilArchive, PhilSci-Archive, Google Books, and Open Library. Each one covers a different corner of the literature. Philosophy of science lives in PhilPapers. Preprints live in arXiv. Biomedical work lives in PubMed. Even with all of these sources plugged in, there were still research questions where relevant papers did not surface. The keyword-based APIs were good at finding exact matches, but they missed papers that used different terminology for the same underlying idea.

That gap is what drove Alex to add Consensus to the pipeline.


02

The Solution

This solution involves code

Alex wrote code to build a custom multi-agent research pipeline. You can access it in the GitHub repository linked at the start. If you are not comfortable with code, you can still take the lessons learned here and apply them to your own projects.

Alex built a pipeline that runs 10 specialized agents in sequence. Each agent does one job, writes its own markdown output, and hands context to the next. Three mandatory human review breaks are baked into the flow so the researcher stays in control of direction before the system spends more tokens.

Agent: Job

Social: Collects current papers and books from 8 academic sources including Consensus
Grounder: Decomposes the research question into sub-questions and excavates intellectual origins and seminal works
Historian: Builds a chronological development map of the field, including abandoned research directions
Gaper: Identifies and classifies gaps as empirical, conceptual, methodological, or theoretical
Vision: Draws logical implications from established findings
Theorist: Proposes concrete, scoped, falsifiable research approaches anchored in the identified gaps
Rude: Adversarial critic. Evaluates proposals with empirical rigor and identifies the weakest links
Synthesizer: Produces a unified research narrative and sharpens the original problem statement
Thinker: Opens genuinely new research directions beyond the existing proposals
Scribe: Writes the final document in the format the researcher specifies
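The sequential hand-off described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from Alex's repository: the `Stage` class, the `run_pipeline` function, and the stub agents are all hypothetical names, and a real agent would call a model instead of returning a placeholder string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An agent here is just a function: it receives the outputs of all earlier
# agents (name -> markdown) and returns its own markdown.
AgentFn = Callable[[Dict[str, str]], str]
# A reviewer receives the outputs so far and returns True to continue.
ReviewFn = Callable[[Dict[str, str]], bool]


@dataclass
class Stage:
    name: str
    run: AgentFn
    checkpoint_after: bool = False  # pause for human review after this agent


def run_pipeline(stages: List[Stage], review: ReviewFn) -> Dict[str, str]:
    """Run agents in order; stop early if the reviewer rejects a checkpoint."""
    outputs: Dict[str, str] = {}
    for stage in stages:
        # Each agent sees a copy of everything produced upstream.
        outputs[stage.name] = stage.run(dict(outputs))
        if stage.checkpoint_after and not review(outputs):
            break
    return outputs


def make_stub(name: str) -> AgentFn:
    """Placeholder agent (assumption): reports how much context it received."""
    return lambda context: f"# {name}\n\nSaw {len(context)} upstream outputs."


stages = [
    Stage("social", make_stub("social"), checkpoint_after=True),
    Stage("grounder", make_stub("grounder")),
    Stage("historian", make_stub("historian")),
    Stage("gaper", make_stub("gaper"), checkpoint_after=True),
]

results = run_pipeline(stages, review=lambda outputs: True)
print(list(results))
```

Keeping each agent behind the same small function signature is one way to make every step swappable and auditable on its own.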

Alex has published the pipeline at github.com/anvix9/basis_research_agents.


2.1

How It Works

PHASE 1

Collection

Agent in this phase: Social

Goal: Cast the widest possible net. Pull current papers and books from every relevant academic source, expanded by semantic concept mapping so the pool covers adjacent themes the researcher did not name explicitly.
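To make the collection step concrete, here is a rough sketch of query expansion followed by a de-duplicating merge across sources. Everything in it is an assumption for illustration: the `RELATED` map stands in for whatever concept mapping the pipeline uses, the record fields (`title`, `doi`) are a guess at a common shape, and the sample records are made-up placeholders rather than real papers.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical concept map: term -> related terms.
RELATED: Dict[str, List[str]] = {
    "burnout": ["occupational stress", "emotional exhaustion"],
}


def expand_query(terms: Iterable[str], related: Dict[str, List[str]]) -> List[str]:
    """Return the original terms plus related terms, in order, without repeats."""
    expanded: List[str] = []
    for term in terms:
        for candidate in [term, *related.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def normalize_title(title: str) -> str:
    """Lowercase and drop punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_results(*batches: List[dict]) -> List[dict]:
    """Merge result lists from several sources, keeping the first copy of each paper."""
    seen = set()
    merged = []
    for batch in batches:
        for paper in batch:
            # A paper counts as a duplicate if its title or its DOI was seen before.
            keys = {normalize_title(paper["title"])}
            if paper.get("doi"):
                keys.add(paper["doi"].lower())
            if keys & seen:
                continue
            seen |= keys
            merged.append(paper)
    return merged


# Made-up placeholder records from two imaginary sources.
source_a = [{"title": "Placeholder Paper One", "doi": "10.0000/placeholder-1"}]
source_b = [
    {"title": "Placeholder paper one.", "doi": None},
    {"title": "Placeholder Paper Two", "doi": None},
]

queries = expand_query(["burnout", "nurses"], RELATED)
papers = merge_results(source_a, source_b)
print(queries)
print([paper["title"] for paper in papers])
```

Matching on either a DOI or a normalized title is a deliberately simple rule. It will miss retitled preprints, so a production pipeline would likely need something stricter.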


CHECKPOINT

The researcher reviews the themes the pipeline identified and the sources it plans to search. This is the cheapest moment to redirect. Confirming direction here prevents wasted compute on synthesis downstream.
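A checkpoint like this can be as small as a function that shows the plan and waits for an answer. The sketch below is an assumption about how one might build it, not the repository's implementation. The `checkpoint` function and `Decision` class are hypothetical, and the `ask` parameter defaults to Python's built-in `input` so the prompt can be replaced with a fake in tests.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Decision:
    approved: bool
    notes: str = ""  # free-text redirect from the researcher, if any


def checkpoint(title: str, items: List[str], ask: Callable[[str], str] = input) -> Decision:
    """Show what the pipeline plans to do and ask the researcher whether to continue."""
    summary = "\n".join(f"  - {item}" for item in items)
    prompt = f"{title}\n{summary}\nContinue? [y to approve, or type a redirect note] "
    answer = ask(prompt).strip()
    if answer.lower() in {"y", "yes"}:
        return Decision(approved=True)
    # Anything else is treated as a redirect and passed back to the caller.
    return Decision(approved=False, notes=answer)


# Example with a scripted answer instead of a live prompt.
decision = checkpoint(
    "Themes identified",
    ["theme A", "theme B"],
    ask=lambda prompt: "drop theme B",
)
print(decision)
```

Returning the researcher's note instead of just a yes or no lets the caller feed the redirect into the next run.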


PHASE 2

Foundations

Agents in this phase: Grounder, Historian, Gaper

Goal: Understand the field before proposing anything. Trace where the ideas came from, how they evolved over time, and where the real gaps sit. By the end of this phase, the researcher has a map of the intellectual territory and a classified list of open problems.
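One way to represent the "classified list of open problems" is a small typed record per gap. The sketch below assumes the four gap categories named in the agent table. The `Gap` fields, the `group_by_type` helper, and the sample entries are hypothetical placeholders, not the schema the repository uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class GapType(Enum):
    EMPIRICAL = "empirical"
    CONCEPTUAL = "conceptual"
    METHODOLOGICAL = "methodological"
    THEORETICAL = "theoretical"


@dataclass(frozen=True)
class Gap:
    description: str
    gap_type: GapType
    evidence: Tuple[str, ...] = ()  # ids of collected papers that motivate the gap


def group_by_type(gaps: List[Gap]) -> Dict[GapType, List[Gap]]:
    """Bucket gaps by category so the researcher can review one kind at a time."""
    grouped: Dict[GapType, List[Gap]] = defaultdict(list)
    for gap in gaps:
        grouped[gap.gap_type].append(gap)
    return dict(grouped)


# Placeholder entries for illustration only.
gaps = [
    Gap("placeholder gap 1", GapType.EMPIRICAL, ("paper-1",)),
    Gap("placeholder gap 2", GapType.METHODOLOGICAL),
    Gap("placeholder gap 3", GapType.EMPIRICAL, ("paper-2", "paper-3")),
]

for gap_type, members in group_by_type(gaps).items():
    print(gap_type.value, len(members))
```

Carrying paper identifiers on each gap keeps the claim traceable back to the sources that motivated it.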


CHECKPOINT

The researcher reviews the map of the field and the classified list of gaps, and corrects course if needed, before the pipeline moves on to proposals and synthesis.


PHASE 3

Proposals and Synthesis

Agents in this phase: Vision, Theorist, Rude, Synthesizer

Goal: Turn the foundations into concrete, falsifiable research proposals that survive adversarial review, then synthesize everything upstream into a unified narrative. Rude exists specifically to poke holes so weak proposals get filtered out before they reach the researcher.
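The filtering idea can be illustrated with a deliberately crude stand-in for the critic. In the real pipeline Rude is a language-model agent. Here, as an assumption, each claim simply carries a count of supporting sources, and a proposal is rejected when its weakest claim falls below a threshold. All names (`Claim`, `Proposal`, `adversarial_review`) and the threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Claim:
    text: str
    supporting_sources: int  # how many collected papers back this claim


@dataclass
class Proposal:
    title: str
    claims: List[Claim]  # assumed to contain at least one claim


def weakest_claim(proposal: Proposal) -> Claim:
    """Return the claim with the least support, the one a critic attacks first."""
    return min(proposal.claims, key=lambda claim: claim.supporting_sources)


def adversarial_review(
    proposals: List[Proposal], min_support: int = 2
) -> Tuple[List[Proposal], List[Tuple[Proposal, Claim]]]:
    """Split proposals into survivors and rejects, reporting the weakest link of each reject."""
    survivors: List[Proposal] = []
    rejected: List[Tuple[Proposal, Claim]] = []
    for proposal in proposals:
        weakest = weakest_claim(proposal)
        if weakest.supporting_sources >= min_support:
            survivors.append(proposal)
        else:
            rejected.append((proposal, weakest))
    return survivors, rejected


proposals = [
    Proposal("proposal A", [Claim("claim A1", 4), Claim("claim A2", 3)]),
    Proposal("proposal B", [Claim("claim B1", 5), Claim("claim B2", 0)]),
]

survivors, rejected = adversarial_review(proposals)
print([p.title for p in survivors])
print([(p.title, c.text) for p, c in rejected])
```

Judging a proposal by its weakest claim instead of its average is what makes the review adversarial: one unsupported step is enough to send it back.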


CHECKPOINT

The researcher reviews the proposals and synthesis, then specifies the final output format (blog post, literature review, research brief, paper section, grant background, or internal memo). This is where the researcher tells the pipeline what artifact they actually need.


PHASE 4

Expansion and Writing

Agents in this phase: Thinker, Scribe

Goal: Open directions beyond the existing proposals and produce the final deliverable in the chosen format. Thinker looks for angles the upstream agents did not catch. Scribe pulls from every upstream markdown output and produces the polished artifact.
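As a rough sketch of how a final writing agent might gather its inputs, the function below reads one markdown file per upstream agent, in pipeline order, and joins them into a single context string. The file layout (`<agent>.md` inside a run directory) and the function name `build_scribe_context` are assumptions for illustration. The agent names come from the table above.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence

# Upstream agents in pipeline order (Scribe itself is excluded).
AGENT_ORDER: List[str] = [
    "social", "grounder", "historian", "gaper", "vision",
    "theorist", "rude", "synthesizer", "thinker",
]


def build_scribe_context(run_dir: Path, order: Sequence[str] = AGENT_ORDER) -> str:
    """Concatenate each agent's markdown output, labeled by agent, skipping missing files."""
    sections = []
    for name in order:
        path = run_dir / f"{name}.md"  # assumed naming convention
        if path.exists():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n\n{body}")
    return "\n\n".join(sections)


# Demo with two placeholder files written in the "wrong" order.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "historian.md").write_text("History notes\n", encoding="utf-8")
    (run_dir / "social.md").write_text("Collected sources\n", encoding="utf-8")
    context = build_scribe_context(run_dir)

print(context)
```

Labeling each section with the agent that produced it means a claim in the final document can still be traced to the step it came from.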


03

Key Insights & Learnings

1

Split the work across specialized agents, not one big prompt

Alex gave each agent a narrow job and its own markdown output file. This keeps the context clean for the next agent and makes every step auditable. A researcher can open any agent's file and see what it produced without sifting through one giant context window.

2

Use an adversarial agent to stress-test proposals

The Rude agent exists specifically to poke holes. Its job is to find the weakest empirical claim and call it out. Most pipelines converge on a single direction because every agent is trying to be helpful. Alex added friction on purpose so the output survives real scrutiny.

3

Put the human back in the loop at three precise moments

Rather than letting your agent run wild, create checkpoints throughout to ensure that it is researching in the right direction. In this use case, Alex created three checkpoints at critical points in the agents' process, allowing the researcher to use their own expertise to guide the agents.


04

How Consensus Fits In

Alex had already plugged nine other academic sources into the Social agent before Consensus was added. Most of those sources run on keyword matching. They return what you ask for, and they miss what you did not phrase correctly. When Alex connected Consensus through the MCP, he was looking for a different retrieval behavior. Semantic search ranks papers by meaning rather than by literal string match, which meant it could find relevant work even when the terminology did not overlap with the original query.

Once Alex ran the same research questions through Consensus alongside the other sources, he saw the difference in what came back. Consensus surfaced papers the other APIs had missed entirely, and it filled in the conceptual corners of the field that keyword search could not reach.
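The difference between the two retrieval behaviors can be shown with a toy example. This is not how Consensus works internally, which the case study does not describe. The titles are invented, and the three-dimensional vectors are hand-picked stand-ins for the embeddings a real model would produce. The only point is that literal matching misses a document that a similarity score still finds.

```python
import math
from typing import Dict, List, Sequence


def keyword_hits(query: str, docs: Dict[str, str]) -> List[str]:
    """Return ids of documents that contain the query as a literal substring."""
    needle = query.lower()
    return [doc_id for doc_id, text in docs.items() if needle in text.lower()]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_hits(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> List[str]:
    """Return ids of documents whose vector is close to the query, best match first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score >= threshold]


# Invented titles: "a" and "b" describe the same idea in different words.
docs = {
    "a": "Heart attack outcomes in older adults",
    "b": "Prognosis after myocardial infarction in the elderly",
    "c": "Crop yields under drought",
}
# Hand-picked toy vectors (assumption), not output from any real embedding model.
doc_vecs = {"a": [0.8, 0.2, 0.1], "b": [0.85, 0.15, 0.05], "c": [0.0, 0.1, 0.9]}
query_vec = [0.9, 0.1, 0.0]

print(keyword_hits("heart attack", docs))
print(semantic_hits(query_vec, doc_vecs))
```

In this toy setup the keyword search returns only the document that literally says "heart attack", while the similarity search also returns the one phrased as "myocardial infarction".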

"Consensus was returning papers that the other sources were not finding at all. It filled in the gaps I still had after plugging in all the other APIs."

Alex

PhD researcher

05

What Comes Next

Alex shipped this as version 1.0.0 and plans to keep iterating. The near-term priorities include tighter anti-hallucination guardrails between agents (for example, an extended reference section that pins every claim to a specific quote in a specific paper), broader gray-literature coverage for social-science work that does not sit in peer-reviewed journals, and integration with the Consensus citation graph once it lands in the MCP so the pipeline can crawl references forward and backward from a seed paper.
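The quote-pinning guardrail mentioned above could look something like the following check. It is a sketch of the general idea, not the planned feature: `PinnedClaim` and `unverified_claims` are hypothetical names, the paper texts are placeholders, and matching is a plain substring test after normalizing case and whitespace.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PinnedClaim:
    claim: str     # the statement made in the generated document
    paper_id: str  # which paper it is attributed to
    quote: str     # the exact passage that is supposed to support it


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not cause false misses."""
    return " ".join(text.lower().split())


def unverified_claims(claims: List[PinnedClaim], paper_texts: Dict[str, str]) -> List[PinnedClaim]:
    """Return claims whose quote cannot be found in the text of the cited paper."""
    failures = []
    for item in claims:
        source = normalize(paper_texts.get(item.paper_id, ""))
        quote = normalize(item.quote)
        if not quote or quote not in source:
            failures.append(item)
    return failures


# Placeholder paper text and claims for illustration.
paper_texts = {"paper-1": "We observed a large effect\nin the treatment group."}
claims = [
    PinnedClaim("Effect was large", "paper-1", "a large effect in the treatment group"),
    PinnedClaim("Effect was durable", "paper-1", "the effect persisted for a year"),
    PinnedClaim("Replicated elsewhere", "paper-9", "replicated in a second cohort"),
]

for failed in unverified_claims(claims, paper_texts):
    print("UNVERIFIED:", failed.claim)
```

A check like this cannot tell whether a quote actually supports a claim, only whether the quote exists in the cited source, which is still enough to catch fabricated passages and missing papers.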


Try It Yourself

Start by defining the narrow jobs your pipeline actually needs. Alex landed on 10 agents. Yours might need 4 or 5. Name them and give each one a single output.

  1. Insert human review breaks at the points where direction matters most: before search, before synthesis, and before final writing.

  2. Add an adversarial agent whose only job is to find the weakest claim. This is the single highest-leverage addition most builders skip.

  3. Connect Consensus via the MCP and treat it as a primary relevance-ranking source. Layer keyword sources (OpenAlex, arXiv, PubMed) on top for coverage.

  4. Persist every intermediate artifact to local storage so you can resume, audit, and re-use past runs.
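For the persistence step, a small "run or load" helper is often enough to make a pipeline resumable. The sketch below assumes one markdown file per agent inside a run directory. The `run_or_load` name, the file layout, and the stub agent are hypothetical and not taken from the repository.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def run_or_load(run_dir: Path, name: str, produce: Callable[[], str]) -> str:
    """Return the saved output for `name` if it exists; otherwise run the agent and save it."""
    path = run_dir / f"{name}.md"  # assumed naming convention
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = produce()
    run_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text


calls: List[str] = []


def fake_agent() -> str:
    """Stub standing in for an expensive model call; records each invocation."""
    calls.append("gaper")
    return "# Gaps\n\nPlaceholder output."


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run-001"
    first = run_or_load(run_dir, "gaper", fake_agent)   # runs the agent and saves the file
    second = run_or_load(run_dir, "gaper", fake_agent)  # served from disk, no second call

print(len(calls))  # 1
```

Because the saved files are plain markdown, the same directory doubles as the audit trail: deleting one file and re-running regenerates only that step.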

Alex's full pipeline is available at github.com/anvix9/basis_research_agents.

