Scratchpad 3Q Reasoning: Surfacing Assumptions to Mitigate Hallucination and Improve Truthfulness in Large Language Models

Author: — Oct 2025

Abstract

Large language models (LLMs) make impressive predictions but remain prone to hallucinations—confident yet incorrect statements—when answering factual or adversarial queries. We propose the 3Q scratchpad framework: a lightweight prompting and logging method that asks the model to explicitly produce three short internal reasoning sections for each response ("What I Know", "What I Need", and "What I Am Assuming") and then to emit a concise final answer that is shown to the user, while the scratchpad is saved for analysis. This approach does not change the model architecture or require additional human annotations; it augments the interaction protocol to make latent reasoning explicit and auditable. We evaluate the method on TruthfulQA, a benchmark designed to elicit model falsehoods. Using task-wise results for a representative model (Phi-4-mini), the 3Q scratchpad substantially reduces hallucination in categories such as logical falsehoods (raising the non-hallucinated rate from 21.43% to 47.64%) and produces modest gains in other categories. We analyze why the intervention works, catalogue failure modes, discuss privacy and logging trade-offs, and propose extensions. The 3Q framework is a practical, low-cost intervention that meaningfully improves model truthfulness by forcing localized transparency into model outputs. The source code is available at github.com/Pro-GenAI/S3Q-Reasoning.
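The protocol summarized above can be sketched as a prompt template plus a response splitter. The three section names come from the paper; the prompt wording and the `parse_response` helper are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import re

# Section names taken from the paper; the "### Name:" header format is an assumption.
SCRATCHPAD_SECTIONS = ("What I Know", "What I Need", "What I Am Assuming")

def build_3q_prompt(question: str) -> str:
    """Ask the model to fill in the three scratchpad sections before answering."""
    headers = "\n".join(f"### {name}:" for name in SCRATCHPAD_SECTIONS)
    return (
        f"Question: {question}\n\n"
        "Before answering, fill in these sections:\n"
        f"{headers}\n"
        "### Final Answer:"
    )

def parse_response(text: str) -> dict:
    """Split a model response into its scratchpad sections and final answer.

    Per the protocol, only "Final Answer" is shown to the user; the
    scratchpad sections are logged for later analysis."""
    names = list(SCRATCHPAD_SECTIONS) + ["Final Answer"]
    sections = {}
    for i, name in enumerate(names):
        # Capture lazily up to the next section header (or end of text).
        nxt = "|".join(re.escape(n) for n in names[i + 1:]) or r"\Z"
        m = re.search(
            rf"### {re.escape(name)}:\s*(.*?)(?=### (?:{nxt}):|\Z)",
            text, re.S,
        )
        sections[name] = m.group(1).strip() if m else ""
    return sections
```

In this sketch the scratchpad is generated in the same completion as the answer, so the split into "shown" and "logged" text happens entirely client-side, with no extra model calls.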

Keywords: Large language models, LLMs, reasoning, hallucination, truthfulness, scratchpad, interpretability, prompting, TruthfulQA, Artificial Intelligence, AI
