How Do I Run a 'Decision Sanity-Check' With Five Models in One Thread?


I’ve spent a decade in boardrooms and due diligence suites, watching senior leadership make multimillion-dollar bets based on "consensus" summaries that were essentially hallucinations dressed in a suit. The modern strategy workflow is broken: we spend half our time tab-switching between Claude, ChatGPT, and Perplexity, trying to reconcile contradictory data points. It’s an exercise in cognitive load management, not high-stakes analysis.

The solution isn’t another aggregation tool that hides the messy process under a "game-changing" UI. The solution is forcing a rigorous, auditable, multi-perspective read using orchestrated model workflows. Here is how I perform a "decision sanity-check" using five models in a single thread to ensure I’m not just getting an average, but getting the truth.

The Problem: The "Dropdown Aggregator" Trap

Most AI platforms give you a dropdown menu. You switch models, ask a question, lose the thread, and start over in another tab. This creates workflow friction. By the time you get to the third model, the initial context has drifted. You lose the nuance of the debate because the AI doesn't "know" what the previous model found unless you manually copy-paste the logs. This is a massive risk. If you cannot trace the provenance of a data point, an auditor will tear your memo apart in five minutes.

When I look at a decision memo, I need to know: Where did that number come from? If you are using isolated dropdowns, you are essentially flying blind, hoping the models are hallucinating in the same direction.

Defining the Framework: Sequential vs. Parallel

To run a true sanity-check, you have to decide whether you want to build a narrative (Sequential) or test a hypothesis (Parallel).

Sequential Mode: Each model acts as the next peer reviewer in a chain. Model A generates the logic, Model B critiques the assumptions, Model C audits the math, Model D looks for the "quiet" risks, and Model E synthesizes the final sanity-check.
Parallel Mode: You prompt all five models simultaneously with the same "ground truth" data set. You are looking for divergence. If four models agree and one model diverges, that divergence is your signal.
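
The two modes can be sketched in a few lines of Python. This is a minimal illustration, not any platform's actual API: a "model" here is assumed to be any function that takes a prompt string and returns a reply string, and the names run_sequential, run_parallel, stub_architect, and stub_auditor are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Assumption: a "model" is any callable mapping a prompt string to a reply string.
Model = Callable[[str], str]


def run_sequential(models: Dict[str, Model], question: str) -> List[str]:
    """Each model sees the question plus every earlier reply, like a chain of reviewers."""
    transcript: List[str] = []
    for name, model in models.items():
        prompt = question if not transcript else question + "\n\n" + "\n".join(transcript)
        transcript.append(f"{name}: {model(prompt)}")
    return transcript


def run_parallel(models: Dict[str, Model], question: str) -> Dict[str, str]:
    """Every model sees the same prompt and nothing else, so the answers stay independent."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(model, question) for name, model in models.items()}
        return {name: future.result() for name, future in futures.items()}


# Stub models for illustration only; real ones would call a model provider.
def stub_architect(prompt: str) -> str:
    return "TAM is $5B"


def stub_auditor(prompt: str) -> str:
    # Reports whether the Architect's reply was visible in the prompt.
    return "saw prior" if "TAM is $5B" in prompt else "no prior"


team = {"Architect": stub_architect, "Auditor": stub_auditor}
sequential_result = run_sequential(team, "Estimate the TAM.")
parallel_result = run_parallel(team, "Estimate the TAM.")
```

In sequential mode the Auditor stub sees the Architect's reply; in parallel mode it does not, which is what makes any divergence between the two an independent signal.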

The "Super Mind" Mode Approach

Using a "Super Mind" mode (in effect, a system-prompt orchestrator) lets you bake the skepticism into the system. Instead of asking one model for an answer, you assign roles. I explicitly prompt the system to act as a "Chief Audit Officer" or a "Market Skeptic."

The Five-Model Audit Table

| Role | Task | Risk Type |
| --- | --- | --- |
| The Architect | Establish the baseline logic and primary assumptions. | Loud (Visible error) |
| The Auditor | Verify every number and source claim. | Loud (Numerical) |
| The Contrarian | Find the "quiet" risks (hidden variables). | Quiet (Structural) |
| The Historian | Compare to previous, similar deal outcomes. | Quiet (Strategic) |
| The Synthesis Lead | Final sanity check on the "disagreement signal." | N/A |
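
One way to turn the five roles above into reusable system prompts is sketched here. The role names, tasks, and risk types come from the audit table; the Role class, the system_prompt function, and the exact prompt wording are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Role:
    name: str
    task: str
    risk_type: str


# Rows taken from the five-model audit table.
ROLES: List[Role] = [
    Role("The Architect", "Establish the baseline logic and primary assumptions.", "Loud (Visible error)"),
    Role("The Auditor", "Verify every number and source claim.", "Loud (Numerical)"),
    Role("The Contrarian", 'Find the "quiet" risks (hidden variables).', "Quiet (Structural)"),
    Role("The Historian", "Compare to previous, similar deal outcomes.", "Quiet (Strategic)"),
    Role("The Synthesis Lead", 'Final sanity check on the "disagreement signal."', "N/A"),
]


def system_prompt(role: Role) -> str:
    """Build a system prompt for one role. The wording is a hypothetical template."""
    lines = [f"You are {role.name}.", f"Your task: {role.task}"]
    if role.risk_type != "N/A":
        lines.append(f"Risk focus: {role.risk_type}.")
    lines.append("Cite a source for every figure, or say that you cannot.")
    return "\n".join(lines)


prompts = {role.name: system_prompt(role) for role in ROLES}
```

Keeping the roles in data rather than in hand-typed prompts means every run of the sanity-check uses the same wording, which makes two threads comparable.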

Why Disagreement is a Signal, Not a Failure

One of the most annoying habits of amateur prompt-engineering is trying to force the models to reach a consensus. That is the quickest way to get "fluffy," middle-of-the-road outputs. Disagreement is data.

If your Architect says the TAM (Total Addressable Market) is $5B and your Auditor says $3.2B, you don't pick the middle. You stop. You force the models to cross-examine their own logic. Ask: "Model A and Model B, you have a delta of $1.8B. List your data sources for the CAGR calculations. Which variable is causing the divergence?"

This is where you find the assumptions that aren't actually assumptions—they are guesses. If an AI can’t point to a specific source for a market growth percentage, mark it as a "Loud Risk" and pull it from the final deck.
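
The TAM example can be worked through in code. This is a rough sketch: the 10% tolerance is an arbitrary assumption, and the divergence and cross_examination_prompt functions are hypothetical helpers, with the prompt text adapted from the question quoted above.

```python
from typing import Dict


def divergence(estimates: Dict[str, float], tolerance: float = 0.10) -> dict:
    """Compare the highest and lowest estimate; flag when the relative gap exceeds tolerance."""
    low_name = min(estimates, key=estimates.get)
    high_name = max(estimates, key=estimates.get)
    delta = estimates[high_name] - estimates[low_name]
    relative = delta / estimates[high_name]  # gap as a share of the highest estimate
    return {
        "low": low_name,
        "high": high_name,
        "delta": delta,
        "relative": relative,
        "flag": relative > tolerance,
    }


def cross_examination_prompt(result: dict) -> str:
    """Ask both models to expose the sources behind the gap instead of averaging it away."""
    return (
        f"{result['high']} and {result['low']}, you have a delta of ${result['delta']:.1f}B. "
        "List your data sources for the CAGR calculations. "
        "Which variable is causing the divergence?"
    )


# TAM estimates in billions of dollars, from the example in the text.
tam = divergence({"Architect": 5.0, "Auditor": 3.2})
follow_up = cross_examination_prompt(tam) if tam["flag"] else None
```

Here the gap is $1.8B, or 36% of the higher estimate, so the check flags it and produces the follow-up question rather than a midpoint.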

My Personal Checklist: "What would an auditor ask?"

Before any decision memo hits a board desk, I run this checklist against the entire thread. If I can’t answer these with the outputs from my orchestrated models, the work isn't done:

Provenance Check: Did I ask the model to cite the exact page/report for this figure?
Edge Case Stress: Did we test the "What if?" scenario for a 20% decline in revenue?
Correlation Audit: Did the models identify any hidden dependencies? (e.g., "This growth depends on Interest Rate X staying below Y.")
Tone Check: Does the output contain fluff like "game-changing" or "next-gen"? If yes, delete it and rewrite the sentence in plain, empirical language.
Divergence Report: Did we map every instance where the models disagreed?
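
Two of these checks are mechanical enough to automate. The sketch below covers only the Tone Check and the Provenance Check; the claim format (a dict with "figure" and "source" keys) and the function names are assumptions, and the other three items still need a human read of the thread.

```python
from typing import Dict, List, Optional

# Both terms come from the Tone Check item; extend the tuple as needed.
FLUFF_TERMS = ("game-changing", "next-gen")


def find_fluff(text: str) -> List[str]:
    """Return the fluff terms that appear in the text, ignoring case."""
    lowered = text.lower()
    return [term for term in FLUFF_TERMS if term in lowered]


def unsourced_figures(claims: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return every figure whose source is missing or empty."""
    return [claim["figure"] for claim in claims if not claim.get("source")]


# Illustrative data only; the report name and page are made up.
draft = "A game-changing, Next-Gen platform with a $3.2B TAM."
claims = [
    {"figure": "TAM $3.2B", "source": "Example market report, p. 14"},
    {"figure": "CAGR 12%", "source": None},
]

fluff_found = find_fluff(draft)
missing_sources = unsourced_figures(claims)
```

Any figure returned by unsourced_figures is a candidate for removal from the final deck until someone can point to where it came from.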

Orchestration vs. Dropdown: The Efficiency Frontier

Stop treating AI like a chatbot. Start treating it like a team of analysts. When you keep everyone in one thread (using the "Super Mind" style of orchestration), you create a shared state. If the "Auditor" model corrects the "Architect" model in the middle of the thread, the "Synthesis" model needs to be aware of that correction.

In a standard dropdown approach, the synthesis model is unaware of the correction. That is a failure of workflow. By keeping the context shared, you build an automated paper trail. When a board member asks, "How did you arrive at this risk profile?" you can simply export the thread—a chronological log of AI-to-AI critique—and demonstrate the rigor of your analysis.
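
A shared thread with an exportable paper trail might look like the following. This is a sketch under assumptions: the Thread and Entry classes, the corrects field, and the JSON export format are illustrative, not the format of any particular orchestration tool.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    turn: int
    role: str
    text: str
    corrects: Optional[int] = None  # turn number of the entry this one corrects, if any


@dataclass
class Thread:
    entries: List[Entry] = field(default_factory=list)

    def post(self, role: str, text: str, corrects: Optional[int] = None) -> int:
        """Append an entry to the shared log and return its turn number."""
        turn = len(self.entries)
        self.entries.append(Entry(turn, role, text, corrects))
        return turn

    def context(self) -> str:
        """Full chronological log, so a later model sees every earlier correction."""
        return "\n".join(f"[{e.turn}] {e.role}: {e.text}" for e in self.entries)

    def export(self) -> str:
        """Serialize the thread as JSON for an audit trail."""
        return json.dumps([asdict(e) for e in self.entries], indent=2)


thread = Thread()
first = thread.post("Architect", "TAM is $5B.")
thread.post("Auditor", "Cited sources support $3.2B, not $5B.", corrects=first)
synthesis_input = thread.context()  # what the Synthesis model would be given
audit_log = json.loads(thread.export())
```

Because the Synthesis model is fed the whole context, the Auditor's correction is in front of it by construction, and the exported log records which entry corrected which.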

Closing Thoughts: Move From Synthesis to Verification

The goal of a decision sanity-check is not to have an AI write the memo for you. It is to use these models to stress-test your own thinking. If your conclusion survives five models—each acting as an adversarial auditor—you have a high-conviction position. If the models find holes, you’ve saved yourself from a costly blunder.

Think about it: stop looking for "next-gen" magic and start looking for the friction. If the models aren't arguing with each other, your prompt is too soft. Push them. Make them show their work. Because when the auditor asks, "Where did that number come from?" you need to be able to point to the exact chain of verification, not a hazy, AI-generated summary.

Last updated: 2026-05-20 09:05:45 AM