18 Feb 2026

Route MCP and LLM Traffic from Claude Desktop and Claude Code Through AgentGateway

Introduction

Claude Desktop and Claude Code are powerful AI tools, but out of the box they talk directly to backend services with no visibility, no security controls, and no usage governance. Every MCP tool call from Claude Desktop hits your servers unmonitored. Every LLM API call from Claude Code goes straight to the provider with no rate limiting or audit trail.

What if you could put a gateway in front of all that traffic?

This guide shows you how to route both MCP server traffic from Claude Desktop and LLM API calls from Claude Code through Solo AgentGateway β€” giving you JWT authentication, observability traces, rate limiting, and centralized API key management. We’ll use Anthropic as the LLM provider.

What You Get

By routing traffic through AgentGateway, you gain:

  • Security: JWT authentication on MCP endpoints, RBAC for tool access, prompt injection guards
  • Observability: OpenTelemetry traces for every MCP tool call and LLM request
  • Rate Limiting: Token-based and request-based limits per user or team
  • Centralized Secrets: API keys live in Kubernetes secrets, not on developer laptops
  • Audit Trail: Full logging of every interaction for compliance

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Claude Desktop  β”‚          β”‚                          β”‚
β”‚  (MCP traffic)   │─────────▢│                          │──▢ MCP Servers
β”‚                  β”‚          β”‚                          β”‚    (math, github, etc.)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚    Solo AgentGateway     β”‚
                              β”‚    (Gateway API)         β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”‚                          β”‚
β”‚  Claude Code     β”‚          β”‚  β€’ JWT Auth              β”‚
β”‚  (LLM traffic)   │─────────▢│  β€’ Rate Limiting         │──▢ Anthropic API
β”‚                  β”‚          β”‚  β€’ OTel Tracing          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β”‚  β€’ Prompt Guards         β”‚
                              β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Claude Desktop connects to MCP servers through the gateway. Claude Code sends its LLM API calls through the gateway to Anthropic. Both streams get the same security and observability treatment.

Prerequisites

  • Kubernetes cluster with AgentGateway deployed (quickstart)
  • kubectl and helm installed
  • Anthropic API key
  • Claude Desktop installed (for MCP routing)
  • Claude Code CLI installed (for LLM routing): npm install -g @anthropic-ai/claude-code

Part 1: Deploy an MCP Server

We’ll deploy a simple math MCP server that Claude Desktop can call through the gateway.

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-math-script
  namespace: default
data:
  server.py: |
    import uvicorn
    from mcp.server.fastmcp import FastMCP
    from starlette.applications import Starlette
    from starlette.routing import Route
    from starlette.requests import Request
    from starlette.responses import JSONResponse, Response

    mcp = FastMCP("Math-Service")

    @mcp.tool()
    def add(a: int, b: int) -> int:
        return a + b

    @mcp.tool()
    def multiply(a: int, b: int) -> int:
        return a * b

    async def handle_mcp(request: Request):
        try:
            data = await request.json()
            method = data.get("method")
            msg_id = data.get("id")
            result = None

            if method == "initialize":
                result = {
                    "protocolVersion": "2024-11-05",
                    "capabilities": {"tools": {}},
                    "serverInfo": {"name": "Math-Service", "version": "1.0"}
                }
            elif method == "notifications/initialized":
                return Response(status_code=202)
            elif method == "tools/list":
                tools_list = await mcp.list_tools()
                result = {
                    "tools": [
                        {"name": t.name, "description": t.description, "inputSchema": t.inputSchema}
                        for t in tools_list
                    ]
                }
            elif method == "tools/call":
                params = data.get("params", {})
                name = params.get("name")
                args = params.get("arguments", {})
                tool_result = await mcp.call_tool(name, args)
                serialized = []
                for content in tool_result:
                    if hasattr(content, "type") and content.type == "text":
                        serialized.append({"type": "text", "text": content.text})
                    else:
                        serialized.append(content if isinstance(content, dict) else str(content))
                result = {"content": serialized, "isError": False}
            elif method == "ping":
                result = {}
            else:
                return JSONResponse(
                    {"jsonrpc": "2.0", "id": msg_id, "error": {"code": -32601, "message": "Method not found"}},
                    status_code=404
                )
            return JSONResponse({"jsonrpc": "2.0", "id": msg_id, "result": result})
        except Exception as e:
            import traceback
            traceback.print_exc()
            return JSONResponse(
                {"jsonrpc": "2.0", "id": None, "error": {"code": -32603, "message": str(e)}},
                status_code=500
            )

    app = Starlette(routes=[
        Route("/mcp", handle_mcp, methods=["POST"]),
        Route("/", lambda r: JSONResponse({"status": "ok"}), methods=["GET"])
    ])

    if __name__ == "__main__":
        uvicorn.run(app, host="0.0.0.0", port=8000)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-math-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mcp-math-server
  template:
    metadata:
      labels:
        app: mcp-math-server
    spec:
      containers:
      - name: math
        image: python:3.11-slim
        command: ["/bin/sh", "-c"]
        args:
        - |
          pip install "mcp[cli]" uvicorn starlette &&
          python /app/server.py
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: script-volume
          mountPath: /app
        readinessProbe:
          httpGet:
            path: /
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: script-volume
        configMap:
          name: mcp-math-script
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-math-server
  namespace: default
spec:
  selector:
    app: mcp-math-server
  ports:
  - port: 80
    targetPort: 8000
EOF

Wait for the pod to be ready:

kubectl wait --for=condition=ready pod -l app=mcp-math-server --timeout=120s

Part 2: Create the Gateway and Routes

We need two things routed through AgentGateway: MCP tool traffic and LLM API traffic to Anthropic.

Create the Anthropic API Key Secret

kubectl create secret generic anthropic-api-key \
  -n agentgateway-system \
  --from-literal=Authorization="Bearer $ANTHROPIC_API_KEY"

Create the Gateway

A single Gateway with one listener handles both MCP and LLM traffic:

# gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: ai-gateway
  namespace: agentgateway-system
spec:
  gatewayClassName: agentgateway
  listeners:
  - name: http
    port: 8080
    protocol: HTTPS
    allowedRoutes:
      namespaces:
        from: All

Create the MCP Backend and Route

Point the gateway at our math MCP server:

# mcp-backend.yaml
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: math-mcp
  namespace: agentgateway-system
spec:
  mcp:
    targets:
    - name: math-service
      static:
        host: mcp-math-server.default.svc.cluster.local
        port: 80
        path: /mcp
        protocol: StreamableHTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: mcp-route
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: ai-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /mcp
    backendRefs:
    - name: math-mcp
      group: agentgateway.dev
      kind: AgentgatewayBackend

Create the Anthropic LLM Backend and Route

# anthropic-backend.yaml
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: anthropic-llm
  namespace: agentgateway-system
spec:
  type: llm
  llm:
    provider:
      anthropic:
        authToken:
          secretRef:
            name: anthropic-api-key
            namespace: agentgateway-system
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: anthropic-route
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: ai-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /anthropic
    backendRefs:
    - name: anthropic-llm
      group: agentgateway.dev
      kind: AgentgatewayBackend

Apply everything:

kubectl apply -f gateway.yaml
kubectl apply -f mcp-backend.yaml
kubectl apply -f anthropic-backend.yaml

Get the Gateway Address

export GATEWAY_IP=$(kubectl get svc ai-gateway -n agentgateway-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "Gateway: http://$GATEWAY_IP:8080"

For local clusters (kind/minikube), use port-forward instead:

kubectl port-forward -n agentgateway-system svc/ai-gateway 8080:8080 &
export GATEWAY_IP=localhost

Part 3: Configure Claude Desktop (MCP Traffic)

Claude Desktop can route its MCP tool calls through AgentGateway using supergateway as a local bridge.

Config File Location

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Basic Configuration

Create or update the config file:

{
  "mcpServers": {
    "math-service": {
      "command": "npx",
      "args": ["-y", "supergateway", "--streamableHttp", "http://GATEWAY_IP:8080/mcp"]
    }
  }
}

Replace GATEWAY_IP with your actual gateway IP or localhost if using port-forward.

With JWT Authentication

If you’ve configured JWT auth on the gateway:

{
  "mcpServers": {
    "math-service": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "http://GATEWAY_IP:8080/mcp"],
      "env": {
        "MCP_HEADERS": "Authorization: Bearer <your-jwt-token>"
      }
    }
  }
}

Restart Claude Desktop after saving the config. You should see the math tools available in the tools menu.

Part 4: Configure Claude Code (LLM Traffic via Anthropic)

Claude Code can route its LLM API traffic through AgentGateway to Anthropic. This means your API keys stay in Kubernetes secrets β€” not on developer machines.

Set the Base URL

Point Claude Code at the gateway’s Anthropic route:

export ANTHROPIC_BASE_URL=http://$GATEWAY_IP:8080/anthropic

For persistence, add it to your shell profile:

echo "export ANTHROPIC_BASE_URL=http://$GATEWAY_IP:8080/anthropic" >> ~/.zshrc

Run Claude Code

claude

All LLM traffic now flows through AgentGateway to Anthropic. You can verify by checking the gateway logs:

kubectl logs -n agentgateway-system -l gateway.networking.k8s.io/gateway-name=ai-gateway -f

Part 5: Add Security Policies

Now that traffic flows through the gateway, you can layer on security controls.

Rate Limiting

Prevent runaway costs with token-based rate limiting:

apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: rate-limit
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: anthropic-route
  default:
    rateLimiting:
      tokenBucket:
        maxTokens: 10000
        refillRate: 1000
        refillInterval: 60s

Prompt Guards

Block prompt injection attempts before they reach Anthropic:

apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: prompt-guard
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: anthropic-route
  default:
    promptGuard:
      request:
        matches:
        - action: REJECT
          regex: "ignore (previous|all) instructions"

Apply the policies:

kubectl apply -f rate-limit.yaml
kubectl apply -f prompt-guard.yaml

Part 6: Enable Observability

Add OpenTelemetry tracing to see every request in detail. If you followed our Langfuse observability guide, you already have this set up. Otherwise, configure tracing via Helm values:

# values-tracing.yaml
gateway:
  envs:
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://your-otel-collector:4317"
    OTEL_EXPORTER_OTLP_PROTOCOL: "grpc"
helm upgrade agentgateway oci://cr.agentgateway.dev/helm/agentgateway \
  --version 2.1.0 \
  --namespace agentgateway-system \
  -f values-tracing.yaml

With tracing enabled, every Claude Desktop tool call and every Claude Code LLM request shows up as a trace with full metadata: model, tokens, latency, route, and any security policy actions.

Testing the Full Setup

Test MCP (Claude Desktop)

Open Claude Desktop and ask it to use the math tools:

“What’s 42 multiplied by 17?”

Claude should invoke the multiply tool through AgentGateway. Check the gateway logs to confirm the request was proxied.

Test LLM (Claude Code)

ANTHROPIC_BASE_URL=http://$GATEWAY_IP:8080/anthropic claude

# In Claude Code, type any prompt β€” it routes through the gateway to Anthropic

Test with MCP Inspector

You can also verify MCP connectivity directly:

npx @modelcontextprotocol/inspector

Enter the URL: http://GATEWAY_IP:8080/mcp

You should see the math tools listed and be able to invoke them.

Important Limitations

  • Claude Desktop’s core LLM traffic (conversations with Claude itself) cannot be routed through AgentGateway β€” only MCP server traffic is proxied
  • Claude Code LLM traffic can be fully routed through the gateway via the ANTHROPIC_BASE_URL environment variable
  • For local development (kind/minikube), you’ll need port-forwarding since there’s no external load balancer

Troubleshooting

MCP tools not showing in Claude Desktop?

  • Restart Claude Desktop after config changes
  • Verify the gateway is reachable: curl http://GATEWAY_IP:8080/mcp
  • Check that supergateway/mcp-remote is installed: npx -y supergateway --help

Claude Code connection refused?

  • Verify the gateway service has an external IP: kubectl get svc -n agentgateway-system
  • Check firewall rules allow traffic on port 8080
  • Ensure ANTHROPIC_BASE_URL is set correctly: echo $ANTHROPIC_BASE_URL

Timeout errors?

  • LLM requests can take time β€” ensure gateway timeouts are configured for AI workloads
  • Check gateway pod resources are sufficient for the traffic volume

Authentication errors?

  • Verify the Anthropic API key secret exists: kubectl get secret anthropic-api-key -n agentgateway-system
  • Check the key is valid with a direct curl to Anthropic’s API

Cleanup

kubectl delete -f gateway.yaml -f mcp-backend.yaml -f anthropic-backend.yaml
kubectl delete deployment mcp-math-server
kubectl delete svc mcp-math-server
kubectl delete configmap mcp-math-script
kubectl delete secret anthropic-api-key -n agentgateway-system

Resources