Proxying All Your LLM Traffic Through agentgateway with Grok Build
Introduction
If you’re like me, your AI tooling has probably grown into a sprawling mess of API keys: Claude Code reads .claude/.env, Cursor uses OpenAI keys, Langflow points at its own Anthropic endpoint, and your local agents each have their own hardcoded tokens scattered across config files. Every new tool means another key to manage, another endpoint to remember, and another blind spot in your observability stack.
The alternative is elegant: route every LLM request through a single gateway that handles authentication, observability, rate limiting, and routing, then point all your tools at that one endpoint.
In this guide, I’ll walk you through exactly how I set up this architecture using Solo.io Enterprise agentgateway as the LLM traffic proxy and Grok Build as one of the clients. My production cluster routes traffic to three backends:
- OpenAI (gpt-4o): cloud API via api.openai.com
- xAI Grok (grok-4.3): cloud API via api.x.ai
- DGX Spark (Qwen/Qwen3.6-35B-A3B-FP8): local on-prem inference server at 172.16.10.173:8000
The key insight is that agentgateway speaks the OpenAI API contract. Every backend, whether it’s OpenAI, xAI, Anthropic, or a local Qwen, is exposed as a standard /v1/chat/completions endpoint. Your tools don’t need to know or care what’s behind the proxy.
Architecture Overview
Here’s the high-level picture:
┌─────────────┐      ┌─────────────────────────────────────────────┐
│ Grok Build  │      │                                             │
│ Claude Code │─────▶│       agentgateway Proxy (Kubernetes)       │
│ Cursor      │      │                                             │
│ Langflow    │      │  Gateway → HTTPRoute → AgentgatewayBackend  │
│ Your CLI    │      │                                             │
└─────────────┘      └──────┬──────────────┬──────────────┬────────┘
                            │              │              │
                      ┌─────▼────┐   ┌─────▼────┐   ┌─────▼──────┐
                      │ OpenAI   │   │ xAI Grok │   │ DGX Spark  │
                      │ gpt-4o   │   │ grok-4.3 │   │ Qwen 3.6   │
                      └──────────┘   └──────────┘   └────────────┘
All LLM requests flow through the same agentgateway deployment, on the same node IP and through the same observability pipeline, while HTTP path prefixes route them to different backends:
| Path Prefix | Backend | Model | Location |
|---|---|---|---|
| /openai | OpenAI | gpt-4o | Cloud (api.openai.com) |
| /grok | xAI | grok-4.3 | Cloud (api.x.ai) |
| /spark | DGX Spark | Qwen3.6-35B-A3B-FP8 | On-prem (172.16.10.173) |
The Problem with Scattered LLM Configuration
Before agentgateway, my setup looked like this:
- Grok Build: base_url: http://172.16.10.173:8000/v1, model Qwen/Qwen3.6-35B-A3B-FP8
- Claude Code: ANTHROPIC_API_KEY in .env, points at api.anthropic.com
- Cursor: OpenAI key hardcoded in settings, points at api.openai.com
- Custom agents: various scripts with keys in environment variables, .env files, or inline
The pain points were obvious:
- API key sprawl: every tool had its own key stored somewhere different
- No unified observability: I couldn’t see total LLM spend, compare model performance, or debug issues across providers
- Local model isolation: my on-prem Qwen model on DGX Spark was only accessible to tools that could reach 172.16.10.173, with no unified routing
- Secret management: no centralized rotation, no audit trail
After agentgateway, it’s all one URL with path-based routing. Every request goes through the same gateway that logs, traces, and authenticates, regardless of which LLM ultimately serves it.
The Architecture: Three Layers
agentgateway uses three Kubernetes resource types to define each backend connection:
1. AgentgatewayBackend: “Where is the LLM?”
The backend resource tells agentgateway about an LLM provider. Here are all three of mine:
OpenAI backend (cloud API with API key auth):
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: gpt-4o
  policies:
    auth:
      secretRef:
        name: openai-secret
xAI Grok backend (cloud API with TLS and SNI):
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: xai-grok
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: grok-4.3
      host: api.x.ai
      port: 443
      pathPrefix: /v1
  policies:
    auth:
      secretRef:
        name: xai-secret
    tls:
      sni: api.x.ai
DGX Spark backend (local on-prem Qwen model, no auth needed):
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: dgx-spark-llm
  namespace: agentgateway-system
spec:
  ai:
    provider:
      openai:
        model: Qwen/Qwen3.6-35B-A3B-FP8
      host: 172.16.10.173
      port: 8000
      pathPrefix: /v1
Key observations:
- The OpenAI provider in agentgateway isn’t just for OpenAI: it implements the OpenAI-compatible API contract. That’s why it works for xAI (which exposes an OpenAI-compatible endpoint) and for the DGX Spark’s vLLM server.
- The DGX Spark backend points directly at the on-prem inference server. No cloud egress, no internet required.
- Cloud backends (OpenAI, xAI) use secretRef for API key auth. The local DGX Spark has no auth because it’s on the internal network.
2. Gateway: “Where does the traffic enter?”
A Gateway resource defines the proxy listener. I run one main gateway plus dedicated gateways for individual backends such as the local model (useful when you want separate NodePort allocations or network policies):
Main gateway (cloud LLMs):
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agentgateway-proxy
  namespace: agentgateway-system
spec:
  gatewayClassName: enterprise-agentgateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
Dedicated DGX Spark gateway (separate NodePort for the on-prem model):
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: dgx-spark-gateway
  namespace: agentgateway-system
spec:
  gatewayClassName: enterprise-agentgateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All
Both use the enterprise-agentgateway GatewayClass, which is created automatically by the Enterprise agentgateway Helm chart. On my bare-metal Talos cluster, these are exposed via NodePort:
| Gateway | NodePort |
|---|---|
| agentgateway-proxy | 30160 |
| dgx-spark-gateway | 31944 |
3. HTTPRoute: “Which backend handles this path?”
HTTPRoute resources connect the Gateway to the Backend, defining which URL path prefix routes to which LLM:
OpenAI route (goes to main gateway, path /openai):
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: agentgateway-proxy
      namespace: agentgateway-system
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /openai
      backendRefs:
        - name: openai
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
xAI Grok route (goes to dedicated gateway, path /grok):
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: xai-grok
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: xai-grok-gateway
      namespace: agentgateway-system
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /grok
      backendRefs:
        - name: xai-grok
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
DGX Spark route (goes to dedicated gateway, path /spark):
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: dgx-spark-llm
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: dgx-spark-gateway
      namespace: agentgateway-system
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /spark
      backendRefs:
        - name: dgx-spark-llm
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
The complete flow for a request:
Client → http://172.16.10.149:30160/openai/v1/chat/completions
  → Gateway "agentgateway-proxy" (port 80, NodePort 30160)
  → HTTPRoute matches path "/openai"
  → AgentgatewayBackend "openai"
  → api.openai.com/v1/chat/completions
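To make the matching step concrete, here is a minimal Python sketch of path-prefix routing as this flow describes it. It is an illustration only, not agentgateway’s implementation: the ROUTES table and the resolve helper are hypothetical names, and the prefix stripping mirrors the flow shown here rather than any documented rewrite rule.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: str    # HTTPRoute PathPrefix value
    backend: str   # AgentgatewayBackend name
    upstream: str  # scheme and host the backend forwards to


# Hypothetical in-memory copy of the three routes from this article.
ROUTES = [
    Route("/openai", "openai", "https://api.openai.com"),
    Route("/grok", "xai-grok", "https://api.x.ai"),
    Route("/spark", "dgx-spark-llm", "http://172.16.10.173:8000"),
]


def matches(prefix: str, path: str) -> bool:
    """Gateway API PathPrefix semantics: match whole path segments only."""
    return path == prefix or path.startswith(prefix + "/")


def resolve(path: str) -> Optional[tuple[str, str]]:
    """Return (backend name, upstream URL) for a request path, or None."""
    # Try longer prefixes first so the most specific route wins.
    for route in sorted(ROUTES, key=lambda r: len(r.prefix), reverse=True):
        if matches(route.prefix, path):
            # Drop the routing prefix; the remainder is the OpenAI-style path.
            return route.backend, route.upstream + path[len(route.prefix):]
    return None
```

Note that a path such as /openaix/v1 does not match /openai, because prefix matching works on whole path segments.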
Secret Management with Vault
Cloud LLM providers require API keys. I use HashiCorp Vault with the External Secrets Operator to keep all keys out of Git:
Vault (KV v2)                    ESO                K8s Secret        Backend
agentgateway/llm-keys/openai  →  ExternalSecret  →  openai-secret  →  AgentgatewayBackend
agentgateway/llm-keys/xai     →  ExternalSecret  →  xai-secret     →  AgentgatewayBackend
Vault stores the actual API keys. ESO syncs them into Kubernetes Secrets. The AgentgatewayBackend resources reference those Secrets by name, never the key itself. ArgoCD manages all of this declaratively, and no secret ever touches Git.
# ClusterSecretStore: connects ESO to Vault (cluster-scoped, so it takes no namespace)
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: vault
spec:
  provider:
    vault:
      server: "http://vault-vault.vault.svc:8200"
      path: "agentgateway/llm-keys"
      version: "v2"  # KV v2 secrets engine
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "eso-role"
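The ExternalSecret step of that chain is not shown in this article. As a rough sketch, the Python below builds one such manifest as a plain dict and prints it as JSON, which kubectl accepts in place of YAML. Several values are assumptions rather than settings from my cluster: the Secret key name Authorization, the Vault property api-key, the one-hour refresh interval, and the helper name external_secret itself.

```python
import json


def external_secret(provider: str, namespace: str = "agentgateway-system") -> dict:
    """Build an ExternalSecret that syncs one provider key from Vault."""
    return {
        "apiVersion": "external-secrets.io/v1",
        "kind": "ExternalSecret",
        "metadata": {"name": f"{provider}-secret", "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",  # assumption: choose your own interval
            # Points at the ClusterSecretStore named "vault".
            "secretStoreRef": {"name": "vault", "kind": "ClusterSecretStore"},
            # Name of the Kubernetes Secret that the backend's secretRef uses.
            "target": {"name": f"{provider}-secret"},
            "data": [
                {
                    "secretKey": "Authorization",  # assumption: key the gateway reads
                    "remoteRef": {
                        "key": provider,        # assumption: Vault entry per provider
                        "property": "api-key",  # assumption: field inside that entry
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(external_secret("openai"), indent=2))
```

Calling external_secret("xai") yields a manifest whose target is xai-secret, matching the name referenced by the xAI backend.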
Distributed Tracing
Every request through agentgateway emits OpenTelemetry traces to the Solo UI’s telemetry collector. This gives you a unified view of all LLM traffic (OpenAI, xAI, local Qwen) in the same dashboard:
apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayPolicy
metadata:
  name: tracing
  namespace: agentgateway-system
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: agentgateway-proxy
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: dgx-spark-gateway
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: xai-grok-gateway
  frontend:
    tracing:
      backendRef:
        name: solo-enterprise-telemetry-collector
        namespace: agentgateway-system
        kind: Service
        port: 4317
      randomSampling: "true"
Connecting Grok Build
Now for the exciting part: pointing Grok Build at the gateway. The Grok Build config file lives at ~/.grok/config.toml. Here’s mine:
[cli]
installer = "internal"
[model.qwen]
model = "Qwen/Qwen3.6-35B-A3B-FP8"
base_url = "http://172.16.10.173:8000/v1"
[model.agw]
model = "agw"
base_url = "http://172.16.10.149:31944/spark"
[ui]
max_thoughts_width = 120
fork_secondary_model = "grok-build"
yolo = true
compact_mode = true
permission_mode = "always-approve"
theme = "groknight"
[models]
default = "agw"
The [model.agw] section defines a model called agw that points at the DGX Spark gateway URL, http://172.16.10.149:31944/spark. This is the agentgateway endpoint for my local Qwen model.
The [models] section sets agw as the default model, so every Grok Build conversation goes through agentgateway by default.
The [model.qwen] entry is a fallback direct-to-backend configuration. If I need to bypass the gateway for debugging, I can switch to the qwen model and it points directly at the vLLM server.
Here’s what happens when I type a prompt in Grok Build:
Grok Build → base_url: http://172.16.10.149:31944/spark
  → Gateway "dgx-spark-gateway" (NodePort 31944)
  → HTTPRoute matches "/spark"
  → AgentgatewayBackend "dgx-spark-llm"
  → http://172.16.10.173:8000/v1/chat/completions
  → Qwen3.6-35B-A3B-FP8 (local DGX)
The request never leaves my network. The gateway still logs it, traces it, and enforces policies, but the payload never traverses the public internet.
Connecting Other Tools
The same pattern works for any tool that supports custom API endpoints:
Claude Code: set the API base to the agentgateway URL for Anthropic (this assumes an /anthropic route and backend, which are not among the three shown above):
ANTHROPIC_BASE_URL=http://172.16.10.149:30160/anthropic
Cursor: in settings, set the API endpoint to:
http://172.16.10.149:30160/openai
Then set the model to gpt-4o. agentgateway strips the path prefix and forwards to the OpenAI backend.
Langflow, LlamaIndex, and custom scripts work the same way:
import openai

# Point the client at the gateway's /openai route instead of api.openai.com.
client = openai.OpenAI(
    api_key="not-needed-proxy",  # placeholder; the client requires a value
    base_url="http://172.16.10.149:30160/openai/v1",
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
The api_key doesn’t need to be a real OpenAI key; it is only a placeholder that the client library requires. The real auth happens server-side via the Vault-synced Secret.
Testing the Endpoints
You can verify each backend works through the gateway:
OpenAI (main gateway, path /openai):
curl http://172.16.10.149:30160/openai/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}' | jq
xAI Grok (dedicated gateway, path /grok):
curl http://172.16.10.149:31500/grok/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model":"grok-4.3","messages":[{"role":"user","content":"Hello!"}]}' | jq
DGX Spark (dedicated gateway, path /spark):
curl http://172.16.10.149:31944/spark/v1/chat/completions \
  -H "content-type: application/json" \
  -d '{"model":"Qwen/Qwen3.6-35B-A3B-FP8","messages":[{"role":"user","content":"Hello!"}]}' | jq
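The same three checks can be scripted. Below is a minimal smoke-test sketch that uses only the Python standard library; the ENDPOINTS list copies the URLs and models from the curl commands, while the names check and run_all and the injectable opener parameter are my own additions so the logic can be exercised without a live gateway.

```python
import json
import urllib.request

# (name, base URL, model) taken from the curl commands in this section.
ENDPOINTS = [
    ("openai", "http://172.16.10.149:30160/openai", "gpt-4o"),
    ("grok", "http://172.16.10.149:31500/grok", "grok-4.3"),
    ("spark", "http://172.16.10.149:31944/spark", "Qwen/Qwen3.6-35B-A3B-FP8"),
]


def check(base_url, model, opener=urllib.request.urlopen):
    """Send one short prompt; return True if the reply contains a choice."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": "Hello!"}]}
    )
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body.encode("utf-8"),  # a request body makes this a POST
        headers={"Content-Type": "application/json"},
    )
    try:
        with opener(request, timeout=30) as response:
            return bool(json.load(response).get("choices"))
    except (OSError, ValueError):
        # Network and HTTP errors are OSError subclasses; bad JSON is ValueError.
        return False


def run_all(opener=urllib.request.urlopen):
    """Check every endpoint and return a name -> healthy mapping."""
    return {name: check(url, model, opener) for name, url, model in ENDPOINTS}


if __name__ == "__main__":
    for name, healthy in run_all().items():
        print(f"{name}: {'ok' if healthy else 'FAILED'}")
```

A boolean per backend is deliberately coarse; for debugging a failure, the curl commands with jq show the full response.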
Deploying with GitOps
All of these manifests live in a Git repository and are deployed via ArgoCD using sync waves:
- Wave 1: Gateway API CRDs
- Wave 2: agentgateway CRDs (AgentgatewayBackend, EnterpriseAgentgatewayPolicy types)
- Wave 3: Control plane (controller + proxy)
- Wave 4: HashiCorp Vault (secrets store)
- Wave 5: External Secrets Operator (Vault → K8s Secret sync)
- Wave 6: Solo UI (dashboard + telemetry collector)
- Wave 7: Config (gateways, backends, routes, policies)
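apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: agentgateway-config
  namespace: argocd
  annotations:
    # Argo CD reads the sync wave from this annotation, not from a spec field
    argocd.argoproj.io/sync-wave: "7"
spec:
  # ... (points to config/ directory in Git)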
When I add a new LLM backend, I create three YAML files (backend, gateway if needed, route), commit them, and ArgoCD deploys everything automatically. No imperative commands, no manual kubectl apply.
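Since the backend and route files follow the same shape every time, they can be generated. The sketch below is one hedged way to do that in Python, emitting JSON (which kubectl and ArgoCD accept like YAML). The field layout follows the manifests in this article, with host, port, and pathPrefix placed next to openai under provider as an assumption; the helper names and the example backend, host, and model are hypothetical.

```python
import json

NAMESPACE = "agentgateway-system"


def backend(name, model, host, port):
    """AgentgatewayBackend for an OpenAI-compatible server (no auth policy)."""
    return {
        "apiVersion": "agentgateway.dev/v1alpha1",
        "kind": "AgentgatewayBackend",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {
            "ai": {
                "provider": {
                    "openai": {"model": model},
                    # Assumption: these sit beside "openai", as in this article.
                    "host": host,
                    "port": port,
                    "pathPrefix": "/v1",
                }
            }
        },
    }


def route(name, gateway, prefix):
    """HTTPRoute that sends one path prefix on a Gateway to the backend."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {
            "parentRefs": [{"name": gateway, "namespace": NAMESPACE}],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": prefix}}],
                    "backendRefs": [
                        {
                            "name": name,
                            "namespace": NAMESPACE,
                            "group": "agentgateway.dev",
                            "kind": "AgentgatewayBackend",
                        }
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical new backend; replace every value with your own.
    manifests = [
        backend("local-llm", "example-model", "llm.example.internal", 8000),
        route("local-llm", "agentgateway-proxy", "/local"),
    ]
    for manifest in manifests:
        print(json.dumps(manifest, indent=2))
```

Cloud backends would also need the policies.auth.secretRef block shown earlier; it is left out here to keep the sketch short.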
The Benefits
After months of running this setup, the benefits are clear:
One URL, all LLMs: Every tool connects to the same gateway. No scattered keys, no config files spread across machines.
Local-first by default: My Grok Build conversations go to the local Qwen model by default (free, fast, private). I switch to GPT-4o or Grok when I need the heavier models. The gateway makes the switch seamless.
Unified observability: The Solo UI shows traces from all three backends in one place. I can see latency, error rates, and token usage across providers without logging into three different dashboards.
Local models get cloud treatment: The DGX Spark runs behind the same Gateway API routing, tracing, and policy system as the cloud providers. It’s a first-class citizen, not an afterthought.
Secrets never touch Git: Vault + ESO keeps API keys out of the repository. Adding a new provider means storing a key in Vault and applying three YAML files.
Network isolation: The on-prem DGX Spark is on an internal network (172.16.10.173). agentgateway exposes it to the outside world through the Gateway API, so no firewall rules are needed on the DGX itself.
Conclusion
agentgateway turns a fragmented mess of per-tool configurations into unified, observable, centrally managed infrastructure. Grok Build is just one client: any tool that speaks the OpenAI API contract can connect to the gateway and reach all your LLMs.
The three-resource pattern (Backend → Gateway → Route) is simple enough to understand in minutes but powerful enough to handle complex multi-provider, multi-model, multi-cluster setups. And because it’s all Kubernetes-native, using Gateway API CRDs managed by ArgoCD, it fits into any GitOps workflow you already have.
The full configuration for this setup is open source in the k8s-goose repository, which includes the ArgoCD GitOps pipeline, all YAML manifests, and scripts for deploying on any Kubernetes cluster.
For a step-by-step walkthrough of the agentgateway setup itself, see Setting Up Enterprise agentgateway on Kind Clusters.