Route MCP and LLM Traffic from Claude Desktop and Claude Code Through AgentGateway
Introduction
Claude Desktop and Claude Code are powerful AI tools, but out of the box they talk directly to backend services with no visibility, no security controls, and no usage governance. Every MCP tool call from Claude Desktop hits your servers unmonitored. Every LLM API call from Claude Code goes straight to the provider with no rate limiting or audit trail.
What if you could put a gateway in front of all that traffic?
This guide shows you how to route both MCP server traffic from Claude Desktop and LLM API calls from Claude Code through Solo AgentGateway β giving you JWT authentication, observability traces, rate limiting, and centralized API key management. We’ll use Anthropic as the LLM provider.
What You Get
By routing traffic through AgentGateway, you gain:
- Security: JWT authentication on MCP endpoints, RBAC for tool access, prompt injection guards
- Observability: OpenTelemetry traces for every MCP tool call and LLM request
- Rate Limiting: Token-based and request-based limits per user or team
- Centralized Secrets: API keys live in Kubernetes secrets, not on developer laptops
- Audit Trail: Full logging of every interaction for compliance
Architecture
ββββββββββββββββββββ ββββββββββββββββββββββββββββ
β Claude Desktop β β β
β (MCP traffic) βββββββββββΆβ ββββΆ MCP Servers
β β β β (math, github, etc.)
ββββββββββββββββββββ β Solo AgentGateway β
β (Gateway API) β
ββββββββββββββββββββ β β
β Claude Code β β β’ JWT Auth β
β (LLM traffic) βββββββββββΆβ β’ Rate Limiting ββββΆ Anthropic API
β β β β’ OTel Tracing β
ββββββββββββββββββββ β β’ Prompt Guards β
ββββββββββββββββββββββββββββ
Claude Desktop connects to MCP servers through the gateway. Claude Code sends its LLM API calls through the gateway to Anthropic. Both streams get the same security and observability treatment.
Prerequisites
- Kubernetes cluster with AgentGateway deployed (quickstart)
kubectlandhelminstalled- Anthropic API key
- Claude Desktop installed (for MCP routing)
- Claude Code CLI installed (for LLM routing):
npm install -g @anthropic-ai/claude-code
Part 1: Deploy an MCP Server
We’ll deploy a simple math MCP server that Claude Desktop can call through the gateway.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: mcp-math-script
namespace: default
data:
server.py: |
import uvicorn
from mcp.server.fastmcp import FastMCP
from starlette.applications import Starlette
from starlette.routing import Route
from starlette.requests import Request
from starlette.responses import JSONResponse, Response
mcp = FastMCP("Math-Service")
@mcp.tool()
def add(a: int, b: int) -> int:
return a + b
@mcp.tool()
def multiply(a: int, b: int) -> int:
return a * b
async def handle_mcp(request: Request):
try:
data = await request.json()
method = data.get("method")
msg_id = data.get("id")
result = None
if method == "initialize":
result = {
"protocolVersion": "2024-11-05",
"capabilities": {"tools": {}},
"serverInfo": {"name": "Math-Service", "version": "1.0"}
}
elif method == "notifications/initialized":
return Response(status_code=202)
elif method == "tools/list":
tools_list = await mcp.list_tools()
result = {
"tools": [
{"name": t.name, "description": t.description, "inputSchema": t.inputSchema}
for t in tools_list
]
}
elif method == "tools/call":
params = data.get("params", {})
name = params.get("name")
args = params.get("arguments", {})
tool_result = await mcp.call_tool(name, args)
serialized = []
for content in tool_result:
if hasattr(content, "type") and content.type == "text":
serialized.append({"type": "text", "text": content.text})
else:
serialized.append(content if isinstance(content, dict) else str(content))
result = {"content": serialized, "isError": False}
elif method == "ping":
result = {}
else:
return JSONResponse(
{"jsonrpc": "2.0", "id": msg_id, "error": {"code": -32601, "message": "Method not found"}},
status_code=404
)
return JSONResponse({"jsonrpc": "2.0", "id": msg_id, "result": result})
except Exception as e:
import traceback
traceback.print_exc()
return JSONResponse(
{"jsonrpc": "2.0", "id": None, "error": {"code": -32603, "message": str(e)}},
status_code=500
)
app = Starlette(routes=[
Route("/mcp", handle_mcp, methods=["POST"]),
Route("/", lambda r: JSONResponse({"status": "ok"}), methods=["GET"])
])
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: mcp-math-server
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: mcp-math-server
template:
metadata:
labels:
app: mcp-math-server
spec:
containers:
- name: math
image: python:3.11-slim
command: ["/bin/sh", "-c"]
args:
- |
pip install "mcp[cli]" uvicorn starlette &&
python /app/server.py
ports:
- containerPort: 8000
volumeMounts:
- name: script-volume
mountPath: /app
readinessProbe:
httpGet:
path: /
port: 8000
initialDelaySeconds: 5
periodSeconds: 5
volumes:
- name: script-volume
configMap:
name: mcp-math-script
---
apiVersion: v1
kind: Service
metadata:
name: mcp-math-server
namespace: default
spec:
selector:
app: mcp-math-server
ports:
- port: 80
targetPort: 8000
EOF
Wait for the pod to be ready:
kubectl wait --for=condition=ready pod -l app=mcp-math-server --timeout=120s
Part 2: Create the Gateway and Routes
We need two things routed through AgentGateway: MCP tool traffic and LLM API traffic to Anthropic.
Create the Anthropic API Key Secret
kubectl create secret generic anthropic-api-key \
-n agentgateway-system \
--from-literal=Authorization="Bearer $ANTHROPIC_API_KEY"
Create the Gateway
A single Gateway with one listener handles both MCP and LLM traffic:
# gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: ai-gateway
namespace: agentgateway-system
spec:
gatewayClassName: agentgateway
listeners:
- name: http
port: 8080
protocol: HTTPS
allowedRoutes:
namespaces:
from: All
Create the MCP Backend and Route
Point the gateway at our math MCP server:
# mcp-backend.yaml
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
name: math-mcp
namespace: agentgateway-system
spec:
mcp:
targets:
- name: math-service
static:
host: mcp-math-server.default.svc.cluster.local
port: 80
path: /mcp
protocol: StreamableHTTP
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: mcp-route
namespace: agentgateway-system
spec:
parentRefs:
- name: ai-gateway
rules:
- matches:
- path:
type: PathPrefix
value: /mcp
backendRefs:
- name: math-mcp
group: agentgateway.dev
kind: AgentgatewayBackend
Create the Anthropic LLM Backend and Route
# anthropic-backend.yaml
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
name: anthropic-llm
namespace: agentgateway-system
spec:
type: llm
llm:
provider:
anthropic:
authToken:
secretRef:
name: anthropic-api-key
namespace: agentgateway-system
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: anthropic-route
namespace: agentgateway-system
spec:
parentRefs:
- name: ai-gateway
rules:
- matches:
- path:
type: PathPrefix
value: /anthropic
backendRefs:
- name: anthropic-llm
group: agentgateway.dev
kind: AgentgatewayBackend
Apply everything:
kubectl apply -f gateway.yaml
kubectl apply -f mcp-backend.yaml
kubectl apply -f anthropic-backend.yaml
Get the Gateway Address
export GATEWAY_IP=$(kubectl get svc ai-gateway -n agentgateway-system \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "Gateway: http://$GATEWAY_IP:8080"
For local clusters (kind/minikube), use port-forward instead:
kubectl port-forward -n agentgateway-system svc/ai-gateway 8080:8080 &
export GATEWAY_IP=localhost
Part 3: Configure Claude Desktop (MCP Traffic)
Claude Desktop can route its MCP tool calls through AgentGateway using supergateway as a local bridge.
Config File Location
- macOS:
~/Library/Application Support/Claude/claude_desktop_config.json - Windows:
%APPDATA%\Claude\claude_desktop_config.json
Basic Configuration
Create or update the config file:
{
"mcpServers": {
"math-service": {
"command": "npx",
"args": ["-y", "supergateway", "--streamableHttp", "http://GATEWAY_IP:8080/mcp"]
}
}
}
Replace GATEWAY_IP with your actual gateway IP or localhost if using port-forward.
With JWT Authentication
If you’ve configured JWT auth on the gateway:
{
"mcpServers": {
"math-service": {
"command": "npx",
"args": ["-y", "mcp-remote", "http://GATEWAY_IP:8080/mcp"],
"env": {
"MCP_HEADERS": "Authorization: Bearer <your-jwt-token>"
}
}
}
}
Restart Claude Desktop after saving the config. You should see the math tools available in the tools menu.
Part 4: Configure Claude Code (LLM Traffic via Anthropic)
Claude Code can route its LLM API traffic through AgentGateway to Anthropic. This means your API keys stay in Kubernetes secrets β not on developer machines.
Set the Base URL
Point Claude Code at the gateway’s Anthropic route:
export ANTHROPIC_BASE_URL=http://$GATEWAY_IP:8080/anthropic
For persistence, add it to your shell profile:
echo "export ANTHROPIC_BASE_URL=http://$GATEWAY_IP:8080/anthropic" >> ~/.zshrc
Run Claude Code
claude
All LLM traffic now flows through AgentGateway to Anthropic. You can verify by checking the gateway logs:
kubectl logs -n agentgateway-system -l gateway.networking.k8s.io/gateway-name=ai-gateway -f
Part 5: Add Security Policies
Now that traffic flows through the gateway, you can layer on security controls.
Rate Limiting
Prevent runaway costs with token-based rate limiting:
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
name: rate-limit
namespace: agentgateway-system
spec:
targetRefs:
- group: gateway.networking.k8s.io
kind: HTTPRoute
name: anthropic-route
default:
rateLimiting:
tokenBucket:
maxTokens: 10000
refillRate: 1000
refillInterval: 60s
Prompt Guards
Block prompt injection attempts before they reach Anthropic:
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
name: prompt-guard
namespace: agentgateway-system
spec:
targetRefs:
- group: gateway.networking.k8s.io
kind: HTTPRoute
name: anthropic-route
default:
promptGuard:
request:
matches:
- action: REJECT
regex: "ignore (previous|all) instructions"
Apply the policies:
kubectl apply -f rate-limit.yaml
kubectl apply -f prompt-guard.yaml
Part 6: Enable Observability
Add OpenTelemetry tracing to see every request in detail. If you followed our Langfuse observability guide, you already have this set up. Otherwise, configure tracing via Helm values:
# values-tracing.yaml
gateway:
envs:
OTEL_EXPORTER_OTLP_ENDPOINT: "http://your-otel-collector:4317"
OTEL_EXPORTER_OTLP_PROTOCOL: "grpc"
helm upgrade agentgateway oci://cr.agentgateway.dev/helm/agentgateway \
--version 2.1.0 \
--namespace agentgateway-system \
-f values-tracing.yaml
With tracing enabled, every Claude Desktop tool call and every Claude Code LLM request shows up as a trace with full metadata: model, tokens, latency, route, and any security policy actions.
Testing the Full Setup
Test MCP (Claude Desktop)
Open Claude Desktop and ask it to use the math tools:
“What’s 42 multiplied by 17?”
Claude should invoke the multiply tool through AgentGateway. Check the gateway logs to confirm the request was proxied.
Test LLM (Claude Code)
ANTHROPIC_BASE_URL=http://$GATEWAY_IP:8080/anthropic claude
# In Claude Code, type any prompt β it routes through the gateway to Anthropic
Test with MCP Inspector
You can also verify MCP connectivity directly:
npx @modelcontextprotocol/inspector
Enter the URL: http://GATEWAY_IP:8080/mcp
You should see the math tools listed and be able to invoke them.
Important Limitations
- Claude Desktop’s core LLM traffic (conversations with Claude itself) cannot be routed through AgentGateway β only MCP server traffic is proxied
- Claude Code LLM traffic can be fully routed through the gateway via the
ANTHROPIC_BASE_URLenvironment variable - For local development (kind/minikube), you’ll need port-forwarding since there’s no external load balancer
Troubleshooting
MCP tools not showing in Claude Desktop?
- Restart Claude Desktop after config changes
- Verify the gateway is reachable:
curl http://GATEWAY_IP:8080/mcp - Check that supergateway/mcp-remote is installed:
npx -y supergateway --help
Claude Code connection refused?
- Verify the gateway service has an external IP:
kubectl get svc -n agentgateway-system - Check firewall rules allow traffic on port 8080
- Ensure
ANTHROPIC_BASE_URLis set correctly:echo $ANTHROPIC_BASE_URL
Timeout errors?
- LLM requests can take time β ensure gateway timeouts are configured for AI workloads
- Check gateway pod resources are sufficient for the traffic volume
Authentication errors?
- Verify the Anthropic API key secret exists:
kubectl get secret anthropic-api-key -n agentgateway-system - Check the key is valid with a direct curl to Anthropic’s API
Cleanup
kubectl delete -f gateway.yaml -f mcp-backend.yaml -f anthropic-backend.yaml
kubectl delete deployment mcp-math-server
kubectl delete svc mcp-math-server
kubectl delete configmap mcp-math-script
kubectl delete secret anthropic-api-key -n agentgateway-system