Agents, MCP, and Kubernetes, Part 4

Learn how to observe and operate AI agent platforms on Kubernetes using Istio Ambient Mode, Kiali, AgentGateway, and Prometheus, with practical guidance on performance, reliability, and governance.

  • Prashant Ramhit & Yevhen Khvastunov
  • 6 min read

Introduction

In the first three parts of this series, we built out the core capabilities of an agent platform on Kubernetes.

Part 1 covered deploying agents and MCP servers. Part 2 brought in zero-trust connectivity with Istio Ambient Mode. Part 3 added application-level governance through AgentGateway.

At this point, the platform can handle real workloads within defined security and policy boundaries. Now we need operational visibility.

Once agents start running production workflows, we need to understand how requests flow through the system, where latency creeps in, how failures propagate, and whether enforcement policies are actually working. Without this visibility, diagnosing incidents and maintaining performance gets complicated fast.

This post walks through how we use Kiali, AgentGateway’s management interface, Prometheus, and Istio telemetry together to monitor service behavior, resource usage, and overall system health.

What Needs to Be Observed

When we’re running agent platforms at scale, certain questions come up repeatedly: which services are talking to each other and how much, where latency is building up, which requests are failing and why, whether governance policies are actually being enforced, how much external API usage we’re generating, and which components are getting close to their limits.

These questions span infrastructure, network, and application layers. No single tool gives us all the answers.

Observability Architecture

We’ve built the observability stack around five core components:

  • Istio Ambient Mode for transport telemetry
  • Kiali for mesh visualization
  • AgentGateway UI and metrics for application insight
  • Prometheus for metric collection
  • Grafana for dashboards

Each layer gives us a different view of the system. Together, they create a coherent operational model.

Kiali and Istio Ambient Mode

In our platform, Kiali is the primary visibility layer for Istio Ambient Mode.

Unlike sidecar-based meshes, Ambient Mode moves interception into shared ztunnel and waypoint proxies. This simplifies application deployment but centralizes where we observe traffic.

Kiali reconstructs service relationships using telemetry from these components.

Topology in Ambient Mode

The topology view shows logical service relationships rather than pod-level wiring. In a typical deployment, we see:

  • Agent workloads communicating with AgentGateway through ztunnel
  • Traffic passing through waypoint proxies when L7 policies apply
  • AgentGateway forwarding requests to LLM providers and MCP servers

Because interception is centralized, the graph stays readable even as the cluster grows.

When we see unexpected paths, it usually means we’re missing AuthorizationPolicy rules, have misconfigured ServiceEntries, or incomplete waypoint coverage.

kiali

Authorization and Policy Visibility

Kiali overlays authorization results directly on service edges. In Ambient deployments, we can immediately see which connections are allowed or denied at L4, where L7 enforcement is active, and which routes are experiencing retries or failures. Policy violations show up right on the graph, so we spend less time digging through logs.

Latency Analysis

Latency heatmaps pull in timing data from ztunnel and waypoint proxies. This lets us tell the difference between gateway-induced delays, backend processing latency, and network-level congestion. For agent systems that rely on external models and tools, being able to make this distinction matters a lot.

Observing Behavior with the AgentGateway UI

AgentGateway has an administrative interface that complements Kiali’s network view.

While Kiali shows us how traffic moves through the mesh, the AgentGateway UI shows us how requests are handled at the application layer.

Request and Route Visibility

The UI gives us insight into active listeners and routes, request volume per endpoint, backend health, response codes, and routing decisions. This is often the fastest way to verify whether traffic is being processed as expected.

AgentGateway Routes

Policy and Security Feedback

AgentGateway enforces CEL-based authorization and filtering rules. The UI surfaces policy evaluation outcomes, rejected requests, secret filtering events, and header and payload validation failures. This makes it straightforward to confirm that governance rules are working correctly without having to parse logs.

AgentGateway policies

Operational Debugging

During incidents, the UI comes in handy for identifying overloaded listeners, detecting misrouted traffic, verifying credential injection, and watching real-time request patterns. It gives you a control-plane view of application behavior.

Prometheus and Metric Collection

Prometheus pulls in telemetry from Istio, AgentGateway, and application workloads. At minimum, it should scrape Istio telemetry endpoints, AgentGateway /metrics, and application metrics endpoints. All critical services should show up as scrape targets. If they’re missing, we’ve usually got deployment or annotation issues.

Core Istio Metrics

Istio exports a standard set of transport-level metrics:

  • istio_requests_total
  • istio_request_duration_seconds_bucket
  • istio_requests_failed_total
  • istio_request_bytes
  • istio_response_bytes

These metrics help us analyze traffic volume, reliability, and latency across services.

AgentGateway Metrics

AgentGateway publishes detailed application telemetry:

  • agentgateway_requests_total
  • agentgateway_request_duration_seconds_bucket
  • agentgateway_rate_limited_total
  • agentgateway_cel_policy_deny_total

These metrics show listener utilization, rate limit enforcement, policy rejections, backend failure rates, and route-level latency. They give us direct visibility into how governance is affecting real workloads.

Building Operational Dashboards

Good dashboards combine network, application, and infrastructure views.

Here’s a practical layout with the template availble from Grafana Agentgateway Overview

Traffic and Reliability

  • Requests per second
  • Success and failure ratios
  • Retry volume

Performance

  • P95 and P99 latency
  • Gateway-to-backend timing
  • Queue depth

Governance

  • Policy denials
  • Secret filtering events
  • Authorization failures

Capacity

  • CPU and memory for AgentGateway
  • Node utilization
  • Network throughput

When we can see these signals together, correlations become obvious.

Grafana Dashboard

Example: Diagnosing a Latency Increase

Here’s how a typical investigation plays out.

  1. Kiali shows rising latency between agents and AgentGateway. At the same time, error ratios are climbing on that path.

  2. The AgentGateway UI confirms that requests are being throttled upstream.

  3. From there, we verify upstream limits and adjust backoff policies.

  4. Prometheus shows a growing number of rate-limited requests on the LLM proxy listener.

This workflow helps us figure out whether the root cause is in external providers, gateway configuration, policy enforcement, or resource exhaustion.

Distributed Tracing: When we integrate with systems like Jaeger or Tempo, Istio propagates trace context across services. Tracing lets us inspect end-to-end workflows, identify slow spans, detect retry loops, and correlate failures with backend behavior. It’s especially valuable for intermittent issues.

Platform Health KPIs: In production, we track several categories of indicators continuously.

Reliability: Key reliability indicators include error rate per route, policy denials, and retry volume.

Performance: Critical performance metrics are P95/P99 latency, backend response time, and queue buildup.

Cost: Cost tracking focuses on token usage, provider request volume, and API call distribution.

Security:

  • Security monitoring covers unexpected traffic paths, abnormal request patterns, and policy violations.
  • Sustained changes in these KPIs often show up before visible incidents.

Conclusion

Observability makes the difference between an agent platform that we can actually operate and one that becomes a black box as complexity increases.

By combining Istio Ambient telemetry, Kiali visualization, AgentGateway UI and metrics, Prometheus aggregation, and Grafana dashboards, we get visibility into both infrastructure behavior and application intent.

This layered approach helps us respond to incidents faster, maintain predictable performance, and plan capacity with confidence.

With governance, security, and observability in place, we can run agent-based systems as stable production platforms instead of experimental pipelines.

References

Recommended for You

Agents, MCP, and Kubernetes, Part 3

Agents, MCP, and Kubernetes, Part 3

Learn how to implement application-level security and intelligent routing for AI agents and MCP servers using kGateway and AgentGateway. Part 3 of a three-part series exploring production-ready agentic architecture

Prashant Ramhit & Yevhen Khvastunov

Agents, MCP, and Kubernetes, Part 2

Agents, MCP, and Kubernetes, Part 2

Learn how to Secure AI agents and Model Context Protocol (MCP) servers on Kubernetes with Istio Ambient Mode. Part 2 of a three-part series exploring production-ready agentic architecture.

Prashant Ramhit & Yevhen Khvastunov