Large Supervisor Models: Real-Time LLM Output Stream Supervision for Interruption

Author: — February 2026

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks, yet they remain susceptible to generating harmful, misleading, or otherwise unsafe content. Existing safety mechanisms, including post-hoc filtering, prompt engineering, and reinforcement learning from human feedback, are fundamentally reactive and operate either before or after the generation process, leaving a critical temporal gap during streaming output. This paper introduces the Large Supervisor Model (LSM), a novel architecture in which a lightweight, independently trained transformer-based model runs concurrently with an LLM, monitoring its token output stream in real time and issuing structured intervention signals (classified as abstain, feedback, or interrupt) at the moment unsafe content is detected. The LSM does not rewrite or alter model outputs; instead, it interrupts the generation stream and notifies the client with a structured JSON payload, allowing the partial response to be immediately cleared. We describe the design principles, training methodology, dual-path inference architecture combining an embedding-based neural classifier with a fine-tuned transformer, and an evaluation framework grounded in adversarially curated test sets. Experiments demonstrate that the combined LSM architecture achieves substantially higher precision and recall on harmful content detection than either component alone, while maintaining sufficiently low latency to operate transparently in production streaming pipelines. The source code is available at github.com/Pro-GenAI/Large-Supervisor-Models.

Keywords: Large Language Models (LLMs), LLM safety, AI safety, real-time content moderation, streaming supervision, harmful content detection
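To make the supervision loop described in the abstract concrete, the following is a minimal sketch of a supervisor that monitors a token stream and, on detection, halts generation and returns a structured JSON payload. All names, field names, and the toy classifier here are illustrative assumptions, not the paper's actual implementation.

```python
import json

# Toy stand-in for the LSM's dual-path classifier (illustrative only).
# The real system combines an embedding-based neural classifier with a
# fine-tuned transformer; here a keyword check plays that role.
UNSAFE_MARKERS = {"unsafe"}

def classify(partial_text: str) -> str:
    """Return an intervention signal for the partial response so far."""
    if any(marker in partial_text.lower() for marker in UNSAFE_MARKERS):
        return "interrupt"
    return "abstain"  # no intervention needed

def supervise(token_stream):
    """Forward tokens until the supervisor signals an interrupt.

    On interrupt, generation stops and a structured JSON payload is
    returned so the client can clear the partial response. Field names
    are assumptions for illustration.
    """
    emitted = []
    for token in token_stream:
        emitted.append(token)
        if classify("".join(emitted)) == "interrupt":
            return {
                "status": "interrupted",
                "payload": json.dumps({
                    "action": "interrupt",
                    "tokens_emitted": len(emitted),
                    "reason": "unsafe content detected",
                }),
            }
    return {"status": "complete", "text": "".join(emitted)}
```

In this sketch the supervisor inspects the accumulated partial response after every token, mirroring the paper's point that the LSM never rewrites output: it only decides whether the stream may continue.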
