
MARVEL: A Multi Agent-based Research Validator and Enabler using Large Language Models

Astrophysics

Authors

Nikhil Mukund, Yifang Luo, Fan Zhang, Lisa Barsotti, Erik Katsavounidis

Abstract

We present MARVEL (https://ligogpt.mit.edu/marvel), a locally deployable, open-source framework for domain-aware question answering and assisted scientific research. It is designed to address the increasing demands of a digital assistant for scientific groups that can read highly technical data, cite precisely, and operate within authenticated networks. MARVEL combines a fast path for straightforward queries with a more deliberate DeepSearch mode that integrates retrieval-augmented generation and Monte Carlo Tree Search. It explores complementary subqueries, allocates more compute to promising branches, and maintains a global evidence ledger that preserves sources during drafting. We applied this framework in the context of gravitational-wave research related to the Laser Interferometer Gravitational-wave Observatory. Answers are grounded in a curated semantic index of research literature, doctoral theses, LIGO documents, and long-running detector electronic logbooks, with targeted web searches when appropriate. Because direct benchmarking against commercial LLMs cannot be performed on private data, we evaluated MARVEL on two publicly available surrogate datasets that capture comparable semantic and technical characteristics. On these benchmarks, MARVEL matches a GPT-4o mini baseline on literature-centric queries and substantially outperforms it on detector-operations content, where domain retrieval and guided reasoning are decisive. By making the complete framework and evaluation datasets openly available, we aim to provide a reproducible foundation for developing domain-specific scientific assistants.

Concepts

retrieval-augmented generation, Monte Carlo methods, multi-agent orchestration, domain-specific QA, embeddings, scientific workflows, test-time scaling, gravitational waves, interpretability, active learning, signal detection

The Big Picture

Imagine joining one of the most complex scientific collaborations on Earth: the LIGO gravitational-wave observatory. Your job spans a thousand interlocking systems: laser optics, vibration isolation, digital control systems, decades of detector logs. The institutional knowledge you need is scattered across hundreds of thousands of documents: technical reports, PhD theses, electronic logbooks from two detector sites, a mountain of published literature. Where do you even start?

This is the daily reality for LIGO researchers. The collaboration has detected hundreds of gravitational-wave events (ripples in spacetime from colliding black holes and neutron stars) since its first detection in 2015. People rotate between roles, expertise walks out the door, and the knowledge base keeps growing. A new researcher asking “why did the detector glitch last Tuesday?” might need to trace years of logbook entries written by engineers who’ve long since moved on.

The bottleneck isn’t intelligence. It’s finding the right information at all.

Researchers at MIT’s Kavli Institute and IAIFI have built MARVEL, a locally deployable AI assistant that doesn’t just search this knowledge base but reasons through it. It cites sources, handles technical jargon, and allocates its own computational effort based on question difficulty.

Key Insight: MARVEL combines retrieval-augmented generation with Monte Carlo Tree Search to give scientific collaborations a domain-expert AI assistant that operates on private, institutional knowledge, not just publicly available data.

How It Works

MARVEL routes every incoming query along one of two tracks based on complexity. Straightforward factual questions take a fast path: rapid retrieval and synthesis. Harder queries, ones involving multi-step reasoning, conflicting sources, or deeply technical detector operations, trigger a DeepSearch mode.
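The paper does not publish the router itself, but the two-track dispatch can be pictured with a minimal sketch. Everything here is hypothetical: the `looks_complex` heuristic and its marker words are illustrative stand-ins for whatever complexity classifier the real system uses.

```python
def looks_complex(query: str) -> bool:
    """Toy complexity heuristic (hypothetical): long or operations-heavy
    queries are routed to the slower, more deliberate DeepSearch track."""
    markers = ("why", "compare", "glitch", "logbook", "trend")
    return len(query.split()) > 12 or any(m in query.lower() for m in markers)

def route(query: str) -> str:
    """Dispatch a query to one of the two tracks described above."""
    return "deepsearch" if looks_complex(query) else "fast_path"
```

In practice the decision would likely come from a learned classifier or the LLM itself, but the shape is the same: a cheap gate in front of an expensive reasoning mode.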

Figure 1

DeepSearch is where things get interesting. It uses Monte Carlo Tree Search (MCTS), the algorithm best known from game-playing AI like AlphaGo, repurposed here for scientific reasoning. Rather than exhaustively exploring every line of investigation, MCTS lets MARVEL probe multiple sub-questions at once, then concentrate effort on the branches turning up real evidence. Think of it as a detective who pursues several leads in parallel, then doubles down on the ones that pay off.
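The detective analogy maps directly onto the standard MCTS loop: select a promising branch, expand it into new sub-questions, score the evidence it turns up, and propagate that reward back up the tree. The sketch below is a generic MCTS skeleton, not MARVEL's implementation; `expand` and `score` are hypothetical plug-ins standing in for the LLM-driven subquery generation and evidence scoring the paper describes.

```python
import math

class Node:
    def __init__(self, subquery, parent=None):
        self.subquery = subquery
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb1(node, c=1.4):
    # Unvisited children are always explored first.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts(root, expand, score, budget=50):
    """Generic MCTS loop: select by UCB1, expand one child,
    score the evidence it retrieves, and back up the reward."""
    for _ in range(budget):
        node = root
        # Selection: walk down to a leaf along the best-UCB1 path.
        while node.children:
            node = max(node.children, key=ucb1)
        # Expansion: generate follow-up subqueries (e.g. via an LLM).
        if node.visits > 0:
            for sq in expand(node.subquery):
                node.children.append(Node(sq, parent=node))
            if node.children:
                node = node.children[0]
        # Simulation: reward = quality of evidence found on this branch.
        reward = score(node.subquery)
        # Backpropagation: more compute flows to branches that paid off.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits) if root.children else root
```

The key property for this application is the budget: compute is a fixed resource, and UCB1 steers most of it toward branches whose retrieved evidence keeps scoring well.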

The retrieval layer draws from a curated semantic index (a database organized by meaning, not just keywords) built from four source types:

  • Published arXiv papers and preprints
  • Doctoral theses from LIGO-affiliated researchers
  • Internal LIGO technical documents
  • Detector electronic logbooks
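The idea of a semantic index, retrieval by meaning rather than keyword match, can be sketched in a few lines. This toy uses bag-of-words vectors and cosine similarity purely for illustration; the real system would use a neural sentence encoder and a vector database, and the corpus entries here are invented examples.

```python
from collections import Counter
import math

def embed(text):
    """Toy stand-in for a neural sentence embedding: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical mini-corpus spanning the four source types.
CORPUS = {
    "paper:arXiv": "gravitational wave detection with laser interferometry",
    "thesis:ch3": "seismic isolation suspension design for test masses",
    "logbook:2021-04": "glitch traced to HVAC vibration coupling into optics",
}

def retrieve(query, k=2):
    """Rank documents by semantic similarity to the query."""
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda d: cosine(q, embed(CORPUS[d])),
                    reverse=True)
    return ranked[:k]
```

Even this crude version shows the point: a query about a glitch surfaces the logbook entry, not the most keyword-dense paper.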

When those sources fall short, MARVEL triggers targeted web searches to fill gaps. It keeps a running evidence ledger throughout, recording every source consulted and every claim grounded, so the final answer arrives with full citations.
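The evidence ledger is conceptually simple: an append-only record pairing each claim with the source that grounds it, so citations survive all the way through drafting. A minimal sketch, with hypothetical class and method names (the paper describes the ledger's role, not its interface):

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    claim: str
    source: str  # e.g. a logbook entry ID or an arXiv number

@dataclass
class EvidenceLedger:
    """Append-only record of every claim and the source grounding it,
    carried through drafting so the final answer keeps its citations."""
    entries: list = field(default_factory=list)

    def record(self, claim, source):
        self.entries.append(Evidence(claim, source))

    def citations(self):
        # Deduplicate sources while preserving first-seen order.
        seen = []
        for e in self.entries:
            if e.source not in seen:
                seen.append(e.source)
        return seen
```

Because the ledger is global across the whole search tree, evidence found on one branch remains citable even if the final answer is drafted from another.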

Figure 2

The whole system runs on open-weight language models (AI systems whose parameters are publicly available, unlike proprietary services like GPT-4) rather than commercial APIs. It operates entirely within an institution’s private network. For collaborations handling sensitive engineering data or proprietary experimental results, this is a big deal.

Why It Matters

Benchmarking a system grounded in private institutional data is tricky: there’s no shared test set to compare against public models. The team’s solution was to construct two surrogate datasets with semantic and technical characteristics comparable to LIGO’s internal documents. One was drawn from published scientific literature; the other mirrored the operational, log-style content of detector logbooks.

On literature-centric queries, MARVEL matches a GPT-4o mini baseline. On detector-operations content, MARVEL substantially outperforms it. This is the messy, domain-specific material that makes up the real working knowledge of a gravitational-wave observatory, and it’s exactly where guided reasoning pays off: when the answer lives in a logbook entry from three years ago, written in engineering shorthand, that no general-purpose model has ever seen.

Figure 3

The approach extends well beyond LIGO. Every large scientific collaboration, in particle physics, astronomy, genomics, generates vast institutional knowledge that gradually becomes inaccessible as people and projects move on. The default response has been to reach for commercial AI tools. These are powerful, but they operate on public data, can’t be customized to specific document collections, and require sending sensitive information off-site.

MARVEL offers something different: a reproducible, open-source framework where scientific groups control their own knowledge base, their own models, and their own reasoning pipeline. Adapting it to a new domain requires swapping datasets and adjusting prompts, not rebuilding from scratch. The full framework and evaluation datasets are publicly available.

Open questions remain. How does MCTS-guided reasoning scale as knowledge bases grow to millions of documents? What happens when logbook entries from different time periods contradict each other? Can benchmark validity be maintained as private institutional data keeps evolving? These problems will shape the next generation of scientific AI assistants.

Bottom Line: MARVEL shows that domain-specific AI assistants, built on private institutional knowledge and guided by tree-search reasoning, can outperform general-purpose commercial LLMs on the queries that matter most for working scientists. Its open-source release gives every major collaboration the tools to build their own.

IAIFI Research Highlights

Interdisciplinary Research Achievement
MARVEL applies Monte Carlo Tree Search, a technique from game-playing AI, to scientific question answering over LIGO's decades of detector engineering knowledge, connecting AI systems research directly with gravitational-wave physics.
Impact on Artificial Intelligence
The system introduces a compute-aware DeepSearch mode that allocates LLM inference across the most promising reasoning branches, outperforming GPT-4o mini on technical operational content.
Impact on Fundamental Interactions
By making LIGO's dispersed institutional knowledge (logbooks, theses, technical documents) searchable through intelligent retrieval, MARVEL supports the operation of instruments probing the universe's most extreme events.
Outlook and References
Future extensions could apply the framework to other large physics collaborations and improve handling of temporally evolving knowledge bases. The paper is available at [arXiv:2601.03436](https://arxiv.org/abs/2601.03436), and the live system at [ligogpt.mit.edu/marvel](https://ligogpt.mit.edu/marvel).

Original Paper Details

Title
MARVEL: A Multi Agent-based Research Validator and Enabler using Large Language Models
arXiv ID
[arXiv:2601.03436](https://arxiv.org/abs/2601.03436)
Authors
Nikhil Mukund, Yifang Luo, Fan Zhang, Lisa Barsotti, Erik Katsavounidis