There has been much discussion about AI agents: pivotal, self-contained units capable of performing tasks autonomously, driven by specific instructions and contextual understanding. In fact, the topic has become almost as widely discussed as LLMs themselves. In this article, I consider AI agents and, more specifically, the concept of Multi-Agents-as-a-Service from the perspective of the lead engineers, architects, and site reliability engineers (SREs) who must deal with AI agents in production systems going forward.

Context: What Problems Can AI Agents Solve?

AI agents are adept at tasks that benefit from human-friendly interactions:

  1. E-Commerce: Agents powered by technologies like LLM-based RAG or text-to-SQL respond to user inquiries with accurate answers grounded in company policies, allowing for a more tailored shopping experience and customer journey that can revolutionize e-commerce.
  2. Customer Service: This is another ideal application. Many of us have experienced long waits to speak with representatives for simple queries like order status updates. Some startups — Decagon for example — are making strides in addressing these inefficiencies through AI agents.
  3. Personalized Product and Content Creation: A prime example of this is Wix. For low-code or no-code website building, Wix developed a chatbot that, through interactive Q&A sessions, creates an initial website for customers according to their description and requirements.

“Humans set goals, but an AI agent independently chooses the best actions it needs to perform to achieve those goals.”

Overall, LLM-based agents work well at mimicking natural human dialogue and simple business workflows, often producing results that are both effective and impressively satisfying.

An Engineer’s View: AI Agents & Enterprise Production Environments

Considering the benefits mentioned, have you ever wondered how AI agents would function within enterprise production environments? What architecture patterns and infrastructure components best support them? What do we do when things inevitably go wrong and the agents hallucinate, crash or (arguably even worse) carry out incorrect reasoning/planning when performing a critical task?

As senior engineers, we need to carefully consider the above. Moreover, we must ask an even more important question: how do we define what a successful deployment of a multi-agent platform looks like in the first place?

To answer this question, let’s borrow a concept from another software engineering field: Service Level Objectives (SLOs) from Reliability Engineering. SLOs are a critical component in measuring the performance and reliability of services. Simply put, SLOs define the acceptable ratio of “successful” measurements to “all” measurements and their impact on the user journeys. These objectives help us determine the required and expected levels of service from our agents and the broader workflows they support.
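The ratio described above can be made concrete with a few lines of code. The sketch below computes SLO attainment and the remaining error budget; the 99.5% target and the request counts are illustrative assumptions, not recommendations.

```python
# Minimal sketch: SLO attainment and remaining error budget.
# The 99.5% target and the counts below are illustrative assumptions.

def slo_attainment(successful: int, total: int) -> float:
    """Fraction of 'successful' measurements over 'all' measurements."""
    return successful / total if total else 1.0

def remaining_error_budget(successful: int, total: int, target: float) -> float:
    """Error budget left: allowed failure ratio minus observed failure ratio."""
    observed_failure_ratio = 1.0 - slo_attainment(successful, total)
    allowed_failure_ratio = 1.0 - target
    return allowed_failure_ratio - observed_failure_ratio

# Example: 9,970 successful agent responses out of 10,000 requests
attainment = slo_attainment(9_970, 10_000)
budget = remaining_error_budget(9_970, 10_000, target=0.995)
```

A positive remaining budget means the platform can tolerate further failures (or risk, such as a new agent rollout) before breaching the objective; a negative budget signals that the objective has already been missed for the period.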

So, how are SLOs relevant to our AI Agent discussion?

Using a simplified view, let’s consider two important objectives — “Availability” and “Accuracy” — for the agents and identify some more granular SLOs that contribute to these:

  1. Availability: this refers to the percentage of requests that receive some successful response (think HTTP 200 status code) from the agents or platform. Historically, the uptime and ping success of the underlying servers (i.e. temporal measures) were key correlated indicators of availability. But with the rise of microservices, notional uptime has become less relevant. Modern systems instead focus on the number of successful versus unsuccessful responses to user requests as a more accurate proxy for availability. Related metrics include latency and throughput.
  2. Accuracy: this, on the other hand, is less about how quickly and consistently the agents return responses to clients, and more about how correctly, from a business perspective, they perform their tasks and return data without a human in the loop to verify their work. Traditional systems track similar SLOs, such as data correctness and quality.

The act of measuring the two objectives above normally occurs through submission of internal application metrics at runtime, either at set time intervals (e.g. every 10 minutes), or in response to events (user requests, upstream calls etc.). Synthetic probing, for instance, can be used to mimic user requests, trigger relevant events and monitor the numbers. The key idea to explore here is this: traditional systems are deterministic to a large extent and, therefore, it’s generally more straightforward to instrument, probe and evaluate them. On the other hand, in our beautiful yet non-deterministic world of GenAI agents, this is not necessarily the case.
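A synthetic probe of the kind described above can be sketched in a few lines. The endpoint URL and request payload below are placeholder assumptions; a real probe would mimic an actual user query and feed its pass/fail result into the availability metric.

```python
# Illustrative synthetic probe: send a canned request to an agent endpoint
# and report success or failure. URL and payload are placeholder assumptions.
import json
import urllib.request

def probe_agent(url: str, payload: dict, timeout: float = 5.0) -> bool:
    """Return True if the agent answers with HTTP 200 within the timeout."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Network errors, timeouts and non-2xx responses all count as failures.
        return False
```

In practice, a scheduler (a cron job, a Kubernetes CronJob, or a monitoring platform's built-in probing) would call this on an interval and emit the boolean result as a metric, from which the availability ratio is computed.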

Note: the focus of this post is on the former of the two objectives: availability. This includes determining acceptance criteria that set up baseline cloud/environmental stability to help agents respond to user queries. For a deeper dive into accuracy (i.e. defining a sensible task scope for the agents, optimizing the performance of few-shot methods, and evaluation frameworks), this blog post acts as a wonderful primer.

Now, back to the things engineers need to get right to ensure infrastructure readiness when deploying agents. In order to achieve our target SLOs and provide a reliable and secure platform, senior engineers consistently take into account the following elements:

  1. Scalability: when the number of requests increases (at times suddenly), can the system handle them efficiently?
  2. Cost-Effectiveness: LLM usage is expensive, so how can we monitor and control the cost?
  3. High Availability: how can we keep the system always-available and responsive to customers? Can agents self-heal and recover from errors/crashes?
  4. Security: How can we ensure data is secure at rest and in transit, perform security audits, vulnerability assessments, etc.?
  5. Compliance & Regulatory: a major topic for AI, what are the relevant data privacy regulations and other industry-specific standards to which we must adhere?
  6. Observability: how can we gain real-time visibility into AI agents’ activities, health, and resource utilization levels in order to identify and resolve problems before they impact the user experience?

Sound familiar? These are similar to the challenges that modern web applications, the microservices pattern, and cloud infrastructure aim to address.
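To make one of the elements above concrete, consider cost-effectiveness: LLM spend is usually a direct function of token usage. The sketch below tracks per-request cost; the per-1K-token prices are placeholder assumptions, since real rates vary by provider and model.

```python
# Illustrative sketch: tracking LLM spend from token usage.
# Prices per 1K tokens are assumed values, not real provider rates.
PRICE_PER_1K_INPUT = 0.0005   # assumed USD per 1K input tokens
PRICE_PER_1K_OUTPUT = 0.0015  # assumed USD per 1K output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one LLM call from its token counts."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# Aggregate cost over a batch of (input, output) token counts per request
total = sum(request_cost(i, o) for i, o in [(1200, 300), (800, 150)])
```

Emitting this figure as a metric alongside availability and accuracy lets teams set budget alerts the same way they set SLO alerts.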

So, now what? We propose an AI agent development and maintenance framework that adheres to best practices developed over the years across a range of engineering and software disciplines.

Multi-Agent-as-a-Service (MAaaS)

This time, let us borrow some of the best practices for cloud-based applications to redefine how agents are designed in production systems:

  • Clear Bounded Context: Each agent should have a well-defined and small scope of responsibility with clear functionality boundaries. This modular approach ensures that agents are more accurate, easier to manage and scale independently.
  • RESTful and Asynchronous Inter-Service Communication: Usage of RESTful APIs for communication between users and agents, and leveraging message brokers for asynchronous communication. This decouples agents, improving scalability and fault tolerance.
  • Isolated Data Storage per Agent: Each agent should have its own data storage to ensure data encapsulation and reduce dependencies. Utilize distributed data storage solutions where necessary to support scalability.
  • Containerization and Orchestration: Using containers (e.g. Docker) to package and deploy agents consistently across different environments, simplifying deployment and scaling. Employ container orchestration platforms like Kubernetes to manage the deployment, scaling, and operational lifecycle of agent services.
  • Testing and CI/CD: Implementing automated testing (unit, integration, contract, and end-to-end tests) to ensure the reliable change management for agents. Use CI tools to automatically build and test agents whenever code changes are committed. Establish CD pipelines to deploy changes to production seamlessly, reducing downtime and ensuring rapid iteration cycles.
  • Observability: Implementing robust observability instrumentation such as metrics, tracing and logging for the agents and their supporting infrastructure to build a real-time view of the platform’s reliability (tracing could be of particular interest here if a given user request goes through multiple agents). Calculating and tracking SLOs and error budgets for the agents and the aggregate request flow. Synthetic probing and efficient alerting on warnings and failures to make sure agent health issues are detected before they widely impact end users.
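Two of the principles above, bounded context and asynchronous communication through a broker, can be sketched together in a few lines. Here, `queue.Queue` stands in for a real message broker (e.g. RabbitMQ or Kafka), and the agent, its narrow "order status" scope, and the hardcoded reply are all illustrative assumptions.

```python
# Minimal in-process sketch: a narrowly scoped agent consuming requests from
# a broker and replying asynchronously. queue.Queue stands in for a real
# message broker; the agent logic and message shapes are illustrative.
import queue
import threading

broker: "queue.Queue[dict | None]" = queue.Queue()
results: "queue.Queue[dict]" = queue.Queue()

def order_status_agent() -> None:
    """Bounded context: this agent only answers order-status requests."""
    while True:
        msg = broker.get()
        if msg is None:          # shutdown sentinel
            break
        # A real agent would call an LLM / database here; reply is canned.
        results.put({"order_id": msg["order_id"], "status": "shipped"})

worker = threading.Thread(target=order_status_agent, daemon=True)
worker.start()

broker.put({"order_id": 42})       # producer enqueues a request ...
reply = results.get(timeout=2)     # ... and consumes the reply asynchronously
broker.put(None)                   # stop the worker
```

Because the producer and the agent only share the broker, either side can be scaled, restarted, or replaced independently, which is exactly the fault-tolerance property the bullet on asynchronous inter-service communication describes.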

By applying these principles, we can create a robust framework for AI agents, transforming the concept into “Multi-Agent as a Service” (MAaaS). This approach leverages the best practices of cloud-based applications to redefine how agents are designed, deployed, and managed.
