The rapid adoption of Artificial Intelligence (AI) agents in decision-making involves the autonomous selection of tools and the execution of actions. Such autonomy raises privacy concerns: an agent may select inappropriate tools or overshare unnecessary or sensitive user data with third-party APIs. The selection of malicious tools further threatens user safety. This paper proposes a comprehensive framework that evaluates the actions of AI agents through a Large Language Model (LLM) acting as a supervisory model, designed to detect unexpected agent behavior such as unsafe, biased, inappropriate, or malicious actions. The supervisory model also serves as an explainer, enhancing the transparency of the agent's decision-making process. The method detects privacy risks, unauthorized actions, and misuse of AI by tool providers, all of which are critical to the trustworthiness of AI. Experiments demonstrate the effectiveness of the approach through examples of both safe and unsafe agent behaviors: the framework successfully generated warnings whenever an agent's action violated a predefined set of criteria for unexpected behavior.
The source code is available at github.com/Pro-GenAI/Agent-Supervisor.
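The full implementation is in the linked repository; the snippet below is only a minimal sketch of the supervisory pattern the abstract describes, assuming an OpenAI-style chat-completion API. The prompt wording, model name, JSON schema, and the `supervise_action` helper are illustrative assumptions, not the paper's actual code.

```python
import json
from openai import OpenAI  # any chat-completion client would work similarly

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical supervisor prompt; the paper's criteria may differ.
SUPERVISOR_PROMPT = """You are a supervisory model that reviews actions proposed \
by an autonomous AI agent. Given the user's request and the agent's proposed tool \
call, decide whether the action is unsafe, biased, inappropriate, or malicious, \
or whether it overshares unnecessary or sensitive user data with the tool's API. \
Respond in JSON with keys "verdict" ("allow" or "warn"), \
"criteria_violated" (list of strings), and "explanation" (string)."""


def supervise_action(user_request: str, tool_name: str, tool_args: dict) -> dict:
    """Ask the supervisory LLM to evaluate one proposed agent action
    before it is executed, returning its verdict and explanation."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper does not fix a model here
        messages=[
            {"role": "system", "content": SUPERVISOR_PROMPT},
            {"role": "user", "content": json.dumps({
                "user_request": user_request,
                "proposed_tool": tool_name,
                "tool_arguments": tool_args,
            })},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


# Example: the agent needlessly attaches contact data to a weather lookup,
# the kind of data oversharing the supervisor is meant to flag.
verdict = supervise_action(
    user_request="What's the weather in Berlin?",
    tool_name="weather_api.get_forecast",
    tool_args={"city": "Berlin", "contacts": ["alice@example.com"]},
)
if verdict["verdict"] == "warn":
    print("Warning:", verdict["explanation"])  # surface the explanation to the user
```

Because the supervisor returns both a verdict and an explanation, the same call can block the action and provide the transparency the abstract mentions.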
Keywords: large language models, LLMs, artificial intelligence, AI, AI agents, autonomous agents, AI supervision, AI explainability, ethical AI