Advanced Routing Patterns for AI Models with AgentGateway
Introduction
One of the most powerful features of AgentGateway is its ability to intelligently route requests to different AI models or providers based on various criteria. This enables sophisticated scenarios like model selection based on request characteristics, A/B testing different models, and cost optimization through intelligent routing.
In this guide, we’ll explore multiple routing patterns that transform your AgentGateway from a simple proxy into an intelligent AI traffic management system.
What You’ll Learn
- Path-based routing for different models and use cases
- Header-based routing for tenant isolation and testing
- Query parameter routing for flexible client control
- Weighted routing for A/B testing and gradual rollouts
- Content-based routing using request body analysis
- Fallback and failover routing strategies
- Cost-optimized routing patterns
Prerequisites
- Completed previous blog posts (AgentGateway setup, observability, OpenAI integration)
- Understanding of Kubernetes Gateway API concepts
- Knowledge of HTTP routing principles
- OpenAI API key for testing (we’ll add Anthropic optionally)
Environment Setup
Ensure your environment is ready:
# Verify environment variables
export OPENAI_API_KEY="sk-your-openai-api-key-here"
export SOLO_TRIAL_LICENSE_KEY="your-license-key-here"
# Verify AgentGateway is running
kubectl get pods -n enterprise-agentgateway
# Verify existing OpenAI configuration
kubectl get agentgatewaybackend openai-all-models -n enterprise-agentgateway
Pattern 1: Path-Based Routing
Model-Specific Paths
Create different paths for different models, allowing clients to choose the right model for their use case:
kubectl apply -f- <<'EOF'
# GPT-4o Mini route (fast, cost-effective)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-gpt4o-mini
  namespace: enterprise-agentgateway
  labels:
    provider: openai
    model: gpt-4o-mini
    use-case: general
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /ai/fast
    - path:
        type: PathPrefix
        value: /ai/gpt4o-mini
    backendRefs:
    - name: openai-gpt4o-mini
      group: agentgateway.dev
      kind: AgentgatewayBackend
    timeouts:
      request: "60s"
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    # Tag responses with the model that served them
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Model-Used
          value: gpt-4o-mini
        - name: X-Use-Case
          value: fast-general
---
# GPT-4o route (premium quality)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-gpt4o
  namespace: enterprise-agentgateway
  labels:
    provider: openai
    model: gpt-4o
    use-case: premium
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /ai/premium
    - path:
        type: PathPrefix
        value: /ai/gpt4o
    backendRefs:
    - name: openai-gpt4o
      group: agentgateway.dev
      kind: AgentgatewayBackend
    timeouts:
      request: "120s"
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Model-Used
          value: gpt-4o
        - name: X-Use-Case
          value: premium-quality
---
# Embeddings route
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-embeddings-route
  namespace: enterprise-agentgateway
  labels:
    provider: openai
    endpoint: embeddings
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /ai/embeddings
    backendRefs:
    - name: openai-all-models
      group: agentgateway.dev
      kind: AgentgatewayBackend
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/embeddings
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Service-Type
          value: embeddings
EOF
Create Model-Specific Backends
kubectl apply -f- <<'EOF'
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai-gpt4o-mini
  namespace: enterprise-agentgateway
spec:
  ai:
    provider:
      openai:
        config:
          model: "gpt-4o-mini" # Force specific model
  policies:
    auth:
      secretRef:
        name: openai-secret
    timeout:
      request: "60s"
---
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai-gpt4o
  namespace: enterprise-agentgateway
spec:
  ai:
    provider:
      openai:
        config:
          model: "gpt-4o" # Force premium model
  policies:
    auth:
      secretRef:
        name: openai-secret
    timeout:
      request: "120s"
EOF
Test Path-Based Routing
export GATEWAY_IP="${GATEWAY_IP:-localhost}"
export GATEWAY_PORT="${GATEWAY_PORT:-8080}"
# Test fast/general route (gpt-4o-mini)
echo "=== Testing Fast Route (GPT-4o Mini) ==="
curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/fast/completions" \
-H "content-type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "What model are you and what are your strengths?"
}
],
"max_tokens": 100
}' | jq '{
model: .model,
content: .choices[0].message.content,
usage: .usage
}'
echo ""
echo "=== Testing Premium Route (GPT-4o) ==="
curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/premium/completions" \
-H "content-type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "What model are you and what are your strengths?"
}
],
"max_tokens": 100
}' | jq '{
model: .model,
content: .choices[0].message.content,
usage: .usage
}'
Pattern 2: Header-Based Routing
Tenant Isolation and Testing
Use headers to route requests to different models or providers based on client characteristics:
kubectl apply -f- <<'EOF'
# Development environment route
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-dev-environment
  namespace: enterprise-agentgateway
  labels:
    environment: development
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /ai/chat
      headers:
      - name: X-Environment
        value: development
    - path:
        type: PathPrefix
        value: /ai/chat
      headers:
      - name: X-Environment
        value: dev
    backendRefs:
    - name: openai-gpt4o-mini # Use cheaper model for dev
      group: agentgateway.dev
      kind: AgentgatewayBackend
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Environment-Used
          value: development
        - name: X-Model-Tier
          value: economy
---
# Production environment route
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-prod-environment
  namespace: enterprise-agentgateway
  labels:
    environment: production
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /ai/chat
      headers:
      - name: X-Environment
        value: production
    - path:
        type: PathPrefix
        value: /ai/chat
      headers:
      - name: X-Environment
        value: prod
    backendRefs:
    - name: openai-gpt4o # Use premium model for prod
      group: agentgateway.dev
      kind: AgentgatewayBackend
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Environment-Used
          value: production
        - name: X-Model-Tier
          value: premium
---
# A/B Testing route
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-ab-test
  namespace: enterprise-agentgateway
  labels:
    purpose: ab-testing
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  # Variant A - GPT-4o Mini
  - matches:
    - path:
        type: PathPrefix
        value: /ai/chat
      headers:
      - name: X-AB-Test
        value: variant-a
    backendRefs:
    - name: openai-gpt4o-mini
      group: agentgateway.dev
      kind: AgentgatewayBackend
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-AB-Variant
          value: variant-a
        - name: X-Model-Used
          value: gpt-4o-mini
  # Variant B - GPT-4o
  - matches:
    - path:
        type: PathPrefix
        value: /ai/chat
      headers:
      - name: X-AB-Test
        value: variant-b
    backendRefs:
    - name: openai-gpt4o
      group: agentgateway.dev
      kind: AgentgatewayBackend
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-AB-Variant
          value: variant-b
        - name: X-Model-Used
          value: gpt-4o
EOF
Test Header-Based Routing
# Test development environment routing
echo "=== Testing Development Environment ==="
curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/chat/completions" \
-H "content-type: application/json" \
-H "X-Environment: development" \
-d '{
"messages": [
{"role": "user", "content": "Hello from development!"}
],
"max_tokens": 50
}' | jq '{
model: .model,
content: .choices[0].message.content
}'
echo ""
echo "=== Testing Production Environment ==="
curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/chat/completions" \
-H "content-type: application/json" \
-H "X-Environment: production" \
-d '{
"messages": [
{"role": "user", "content": "Hello from production!"}
],
"max_tokens": 50
}' | jq '{
model: .model,
content: .choices[0].message.content
}'
echo ""
echo "=== Testing A/B Test Variant A ==="
curl -i "$GATEWAY_IP:$GATEWAY_PORT/ai/chat/completions" \
-H "content-type: application/json" \
-H "X-AB-Test: variant-a" \
-d '{
"messages": [
{"role": "user", "content": "A/B test message"}
],
"max_tokens": 30
}' | grep -iE "(X-AB-Variant|X-Model-Used)"
echo ""
echo "=== Testing A/B Test Variant B ==="
curl -i "$GATEWAY_IP:$GATEWAY_PORT/ai/chat/completions" \
-H "content-type: application/json" \
-H "X-AB-Test: variant-b" \
-d '{
"messages": [
{"role": "user", "content": "A/B test message"}
],
"max_tokens": 30
}' | grep -iE "(X-AB-Variant|X-Model-Used)"
Pattern 3: Query Parameter Routing
Flexible Client-Side Model Selection
Allow clients to specify routing preferences via query parameters:
kubectl apply -f- <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-query-param-routing
  namespace: enterprise-agentgateway
  labels:
    routing-type: query-parameter
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  # Route for speed preference
  - matches:
    - path:
        type: PathPrefix
        value: /ai/flexible
      queryParams:
      - name: speed
        value: fast
    backendRefs:
    - name: openai-gpt4o-mini
      group: agentgateway.dev
      kind: AgentgatewayBackend
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Speed-Optimized
          value: "true"
  # Route for quality preference
  - matches:
    - path:
        type: PathPrefix
        value: /ai/flexible
      queryParams:
      - name: quality
        value: premium
    backendRefs:
    - name: openai-gpt4o
      group: agentgateway.dev
      kind: AgentgatewayBackend
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Quality-Optimized
          value: "true"
  # Route for cost preference
  - matches:
    - path:
        type: PathPrefix
        value: /ai/flexible
      queryParams:
      - name: cost
        value: minimal
    backendRefs:
    - name: openai-gpt4o-mini
      group: agentgateway.dev
      kind: AgentgatewayBackend
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Cost-Optimized
          value: "true"
EOF
Test Query Parameter Routing
# Test speed-optimized routing
echo "=== Testing Speed-Optimized Routing ==="
curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/flexible/completions?speed=fast" \
-H "content-type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Fast response needed!"}
],
"max_tokens": 50
}' | jq '{model: .model, usage: .usage}'
# Test quality-optimized routing
echo "=== Testing Quality-Optimized Routing ==="
curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/flexible/completions?quality=premium" \
-H "content-type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "High quality response needed!"}
],
"max_tokens": 50
}' | jq '{model: .model, usage: .usage}'
# Test cost-optimized routing
echo "=== Testing Cost-Optimized Routing ==="
curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/flexible/completions?cost=minimal" \
-H "content-type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Cost-effective response needed!"}
],
"max_tokens": 50
}' | jq '{model: .model, usage: .usage}'
Pattern 4: Weighted Routing for Gradual Rollouts
Implement Traffic Splitting
Use multiple backends with different weights for gradual model rollouts:
kubectl apply -f- <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-weighted-routing
  namespace: enterprise-agentgateway
  labels:
    routing-type: weighted
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /ai/gradual-rollout
    # Split traffic between models
    backendRefs:
    # 80% traffic to stable model
    - name: openai-gpt4o-mini
      group: agentgateway.dev
      kind: AgentgatewayBackend
      weight: 80
    # 20% traffic to new model for testing
    - name: openai-gpt4o
      group: agentgateway.dev
      kind: AgentgatewayBackend
      weight: 20
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Routing-Type
          value: weighted-rollout
EOF
Test Weighted Routing
cat <<'EOF' > test-weighted-routing.sh
#!/bin/bash
GATEWAY_IP="${GATEWAY_IP:-localhost}"
GATEWAY_PORT="${GATEWAY_PORT:-8080}"
REQUESTS=${1:-20}

echo "Testing weighted routing with $REQUESTS requests..."
echo "Expected: ~80% gpt-4o-mini, ~20% gpt-4o"
echo ""

declare -A model_counts

for ((i=1; i<=REQUESTS; i++)); do
  response=$(curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/gradual-rollout/completions" \
    -H "content-type: application/json" \
    -d '{
      "messages": [
        {"role": "user", "content": "What model are you?"}
      ],
      "max_tokens": 10
    }')
  model=$(echo "$response" | jq -r '.model')
  ((model_counts[$model]++))
  printf "Request %2d: %s\n" "$i" "$model"
done

echo ""
echo "=== Results ==="
for model in "${!model_counts[@]}"; do
  count=${model_counts[$model]}
  percentage=$(( count * 100 / REQUESTS ))
  echo "$model: $count requests (${percentage}%)"
done
EOF
chmod +x test-weighted-routing.sh
./test-weighted-routing.sh 10
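Because the split is probabilistic, small samples will drift from the configured 80/20. A quick arithmetic check can flag when an observed split falls outside an acceptable tolerance (the check_split helper below is ours, not part of AgentGateway):

```shell
# Check an observed traffic split against a configured weight.
# Usage: check_split <observed_count> <total_requests> <expected_pct> <tolerance_pct>
# Exits 0 when the observed percentage is within the tolerance band.
check_split() {
  local observed=$1 total=$2 expected=$3 tolerance=$4
  local pct=$(( observed * 100 / total ))
  local diff=$(( pct - expected ))
  [ "${diff#-}" -le "$tolerance" ]   # ${diff#-} strips the sign, giving an absolute value
}

# Example: 17 of 20 requests hit gpt-4o-mini (85%), expected 80% with ±15% slack.
check_split 17 20 80 15 && echo "split within tolerance"
```

With only 10-20 requests, swings of 10-20 percentage points are normal; run a few hundred requests before concluding the weights are misconfigured.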
Pattern 5: Content-Based Routing
Route Based on Request Content
Use AgentGateway’s request analysis capabilities to route based on content characteristics:
kubectl apply -f- <<'EOF'
apiVersion: enterpriseagentgateway.solo.io/v1alpha1
kind: EnterpriseAgentgatewayPolicy
metadata:
  name: content-based-routing
  namespace: enterprise-agentgateway
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: agentgateway
  traffic:
    # Simple content analysis - route long requests to premium model
    requestTransformation:
      inline:
        # Add request length as a header for routing decisions
        headers:
          add:
            X-Content-Length: 'string(len(json(request.body).messages[0].content))'
            X-Content-Type: 'if len(json(request.body).messages[0].content) > 100 then "long" else "short"'
---
# Route short content to fast model
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-short-content
  namespace: enterprise-agentgateway
  labels:
    routing-type: content-based
    content-type: short
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /ai/smart
      headers:
      - name: X-Content-Type
        value: short
    backendRefs:
    - name: openai-gpt4o-mini
      group: agentgateway.dev
      kind: AgentgatewayBackend
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Content-Routing
          value: short-content-fast-model
---
# Route long content to premium model
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-long-content
  namespace: enterprise-agentgateway
  labels:
    routing-type: content-based
    content-type: long
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /ai/smart
      headers:
      - name: X-Content-Type
        value: long
    backendRefs:
    - name: openai-gpt4o
      group: agentgateway.dev
      kind: AgentgatewayBackend
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Content-Routing
          value: long-content-premium-model
EOF
Test Content-Based Routing
# Test with short content
echo "=== Testing Short Content (should use gpt-4o-mini) ==="
curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/smart/completions" \
-H "content-type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "Hi!"}
],
"max_tokens": 20
}' | jq '{model: .model, usage: .usage}'
echo ""
echo "=== Testing Long Content (should use gpt-4o) ==="
curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/smart/completions" \
-H "content-type: application/json" \
-d '{
"messages": [
{
"role": "user",
"content": "This is a much longer prompt that contains significantly more content and should trigger the routing logic to use the premium model because it requires more sophisticated processing and understanding. The content-based routing should detect this as a long request and route it to GPT-4o for better handling of complex queries that require more nuanced responses and deeper analysis."
}
],
"max_tokens": 50
}' | jq '{model: .model, usage: .usage}'
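You can predict which backend the policy will choose before sending a request: the classification below mirrors the expression in the transformation policy (the 100-character threshold comes from that policy; the classify_content helper itself is ours):

```shell
# Mirror the routing policy's classification: content over 100 characters is "long".
classify_content() {
  local content="$1"
  if [ "${#content}" -gt 100 ]; then
    echo "long"    # routed to gpt-4o via the openai-long-content route
  else
    echo "short"   # routed to gpt-4o-mini via the openai-short-content route
  fi
}

classify_content "Hi!"   # prints: short
```

Note the boundary: a message of exactly 100 characters still classifies as "short", matching the strict greater-than comparison in the policy.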
Observing Routing Patterns
Create Routing Dashboard
cat <<'EOF' > routing-analysis.sh
#!/bin/bash
echo "Analyzing routing patterns from AgentGateway logs..."
echo ""

# Extract routing information from recent logs.
# fromjson? skips any non-JSON log lines instead of aborting the pipeline.
kubectl logs deploy/agentgateway -n enterprise-agentgateway --tail=100 | \
  jq -R -r 'fromjson? | select(.gen_ai) | [
    .timestamp,
    .route_name // "unknown",
    .gen_ai.request.model,
    (.gen_ai.usage.total_tokens // 0),
    (.request.headers["x-environment"] // "none"),
    (.request.headers["x-ab-test"] // "none")
  ] | @csv' | \
  awk -F',' '
    BEGIN {
      print "Route Analysis Report"
      print "===================="
      print ""
      total_requests = 0
    }
    {
      total_requests++
      # Strip the quotes added by @csv
      route = $2; model = $3; tokens = $4; env = $5
      gsub(/"/, "", route)
      gsub(/"/, "", model)
      gsub(/"/, "", env)
      route_counts[route]++
      model_counts[model]++
      if (env != "none") env_counts[env]++
      total_tokens += tokens
    }
    END {
      print "Total Requests: " total_requests
      print "Total Tokens: " total_tokens
      print ""
      print "Routes:"
      for (route in route_counts) printf "  %s: %d requests\n", route, route_counts[route]
      print ""
      print "Models:"
      for (model in model_counts) printf "  %s: %d requests\n", model, model_counts[model]
      print ""
      print "Environments:"
      for (env in env_counts) printf "  %s: %d requests\n", env, env_counts[env]
    }'
EOF
chmod +x routing-analysis.sh
./routing-analysis.sh
Monitor Routing in Grafana
Open Grafana Dashboard:
kubectl port-forward -n monitoring svc/grafana-prometheus 3000:3000 &
Create custom routing queries:
# Request rate by route
sum(rate(agentgateway_requests_total[5m])) by (route_name)
# Request rate by model
sum(rate(agentgateway_requests_total[5m])) by (model)
# Token usage by routing pattern
sum(rate(agentgateway_tokens_total[5m])) by (route_name, token_type)
Watch the routing patterns change as you send test requests through the different routes.
Pattern 6: Fallback and Failover Routing
Primary-Secondary Backend Configuration
kubectl apply -f- <<'EOF'
# Primary backend with health checking
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai-primary
  namespace: enterprise-agentgateway
spec:
  ai:
    provider:
      openai:
        config:
          model: "gpt-4o"
  policies:
    auth:
      secretRef:
        name: openai-secret
    # Health checking configuration
    healthCheck:
      interval: "30s"
      timeout: "5s"
      unhealthyThreshold: 2
      healthyThreshold: 2
---
# Fallback backend
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: openai-fallback
  namespace: enterprise-agentgateway
spec:
  ai:
    provider:
      openai:
        config:
          model: "gpt-4o-mini"
  policies:
    auth:
      secretRef:
        name: openai-secret
---
# Failover route configuration
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openai-failover
  namespace: enterprise-agentgateway
  labels:
    routing-type: failover
spec:
  parentRefs:
  - name: agentgateway
    namespace: enterprise-agentgateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /ai/reliable
    backendRefs:
    # Primary backend
    - name: openai-primary
      group: agentgateway.dev
      kind: AgentgatewayBackend
      weight: 100
    # Fallback backend (only used if primary fails)
    - name: openai-fallback
      group: agentgateway.dev
      kind: AgentgatewayBackend
      weight: 0
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplacePrefixMatch
          replacePrefixMatch: /v1/chat/completions
    - type: ResponseHeaderModifier
      responseHeaderModifier:
        add:
        - name: X-Routing-Type
          value: failover-enabled
EOF
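To see which tier actually served a request on the failover route, read the model name back from the response body. The tier_for_model helper and its mapping are ours, derived from the two backends defined above:

```shell
GATEWAY_IP="${GATEWAY_IP:-localhost}"
GATEWAY_PORT="${GATEWAY_PORT:-8080}"

# Map a reported model name to the backend tier defined above.
# gpt-4o-mini* must be checked first, since gpt-4o* would also match it.
tier_for_model() {
  case "$1" in
    gpt-4o-mini*) echo "fallback" ;;
    gpt-4o*)      echo "primary"  ;;
    *)            echo "unknown"  ;;
  esac
}

model=$(curl -s "$GATEWAY_IP:$GATEWAY_PORT/ai/reliable/completions" \
  -H "content-type: application/json" \
  -d '{"messages":[{"role":"user","content":"ping"}],"max_tokens":5}' | jq -r '.model')
echo "Served by: $model ($(tier_for_model "$model") tier)"
```

Under normal conditions every request should report the primary tier; a stream of fallback responses here is a signal that the primary backend is failing its health checks.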
Creating Routing Test Suite
Comprehensive Routing Test
cat <<'EOF' > comprehensive-routing-test.sh
#!/bin/bash
GATEWAY_IP="${GATEWAY_IP:-localhost}"
GATEWAY_PORT="${GATEWAY_PORT:-8080}"

echo "Comprehensive Routing Pattern Test Suite"
echo "========================================"
echo ""

test_endpoint() {
  local name="$1"
  local endpoint="$2"
  local header="$3"          # optional extra header, e.g. "X-Environment: development"
  local expected_model="$4"

  echo "Testing: $name"
  echo "Endpoint: $endpoint"

  local response
  response=$(curl -s "$GATEWAY_IP:$GATEWAY_PORT$endpoint" \
    ${header:+-H "$header"} \
    -H "content-type: application/json" \
    -d '{
      "messages": [
        {"role": "user", "content": "What model are you?"}
      ],
      "max_tokens": 20
    }')

  local actual_model=$(echo "$response" | jq -r '.model')
  local tokens=$(echo "$response" | jq -r '.usage.total_tokens')

  # Strip the date suffix (e.g. gpt-4o-2024-08-06 -> gpt-4o) so that a
  # gpt-4o-mini response never passes a check that expects plain gpt-4o.
  local base_model="${actual_model%-20*}"

  if [[ "$base_model" == "$expected_model" ]]; then
    echo "✓ Success: $actual_model ($tokens tokens)"
  else
    echo "✗ Failed: Expected $expected_model, got $actual_model"
  fi
  echo ""
}

echo "1. Path-based routing tests"
echo "---------------------------"
test_endpoint "Fast Route" "/ai/fast/completions" "" "gpt-4o-mini"
test_endpoint "Premium Route" "/ai/premium/completions" "" "gpt-4o"

echo "2. Header-based routing tests"
echo "-----------------------------"
test_endpoint "Dev Environment" "/ai/chat/completions" "X-Environment: development" "gpt-4o-mini"
test_endpoint "Prod Environment" "/ai/chat/completions" "X-Environment: production" "gpt-4o"
test_endpoint "A/B Test Variant A" "/ai/chat/completions" "X-AB-Test: variant-a" "gpt-4o-mini"
test_endpoint "A/B Test Variant B" "/ai/chat/completions" "X-AB-Test: variant-b" "gpt-4o"

echo "3. Query parameter routing tests"
echo "--------------------------------"
test_endpoint "Speed Optimized" "/ai/flexible/completions?speed=fast" "" "gpt-4o-mini"
test_endpoint "Quality Optimized" "/ai/flexible/completions?quality=premium" "" "gpt-4o"
test_endpoint "Cost Optimized" "/ai/flexible/completions?cost=minimal" "" "gpt-4o-mini"

echo "Routing test suite complete!"
echo ""
echo "Check the Grafana dashboard to see routing patterns and metrics."
EOF
chmod +x comprehensive-routing-test.sh
./comprehensive-routing-test.sh
Best Practices and Considerations
Routing Strategy Guidelines
- Performance vs Cost: Balance response quality with token costs
- Gradual Rollouts: Use weighted routing for safe model updates
- Environment Separation: Use headers for dev/staging/prod isolation
- Content Awareness: Route based on request complexity
- Fallback Planning: Always have backup routes for reliability
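These guidelines map directly onto the routes built in this post. A client-side helper can encode the tradeoff explicitly (the select_ai_route function is ours; the paths come from the HTTPRoutes above):

```shell
# Choose a gateway path for a given priority; paths match the routes defined earlier.
select_ai_route() {
  case "$1" in
    speed)    echo "/ai/fast/completions" ;;
    quality)  echo "/ai/premium/completions" ;;
    cost)     echo "/ai/flexible/completions?cost=minimal" ;;
    reliable) echo "/ai/reliable/completions" ;;
    *)        echo "/ai/gradual-rollout/completions" ;;  # default: weighted rollout
  esac
}

# Example: a batch job that prioritizes cost over latency
endpoint=$(select_ai_route cost)
echo "POST ${GATEWAY_IP:-localhost}:${GATEWAY_PORT:-8080}$endpoint"
```

Centralizing the choice in one function means a changed routing strategy (say, moving batch jobs from cost to reliable) is a one-line edit on the client side.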
Monitoring Routing Health
cat <<'EOF' > routing-health-check.sh
#!/bin/bash
echo "Routing Health Check"
echo "===================="
echo ""

# Check route statuses
echo "Route Status:"
kubectl get httproute -n enterprise-agentgateway -o custom-columns=\
"NAME:.metadata.name,\
ACCEPTED:.status.parents[0].conditions[?(@.type=='Accepted')].status,\
AGE:.metadata.creationTimestamp"
echo ""

# Check backend health
echo "Backend Status:"
kubectl get agentgatewaybackend -n enterprise-agentgateway -o custom-columns=\
"NAME:.metadata.name,\
ACCEPTED:.status.conditions[?(@.type=='Accepted')].status,\
AGE:.metadata.creationTimestamp"
echo ""

# Test each route with a quick health check
echo "Route Connectivity Tests:"
routes=(
  "/ai/fast/completions"
  "/ai/premium/completions"
  "/ai/flexible/completions?speed=fast"
  "/ai/gradual-rollout/completions"
)

for route in "${routes[@]}"; do
  printf "%-35s: " "$route"
  response_code=$(curl -s -o /dev/null -w "%{http_code}" \
    "${GATEWAY_IP:-localhost}:${GATEWAY_PORT:-8080}$route" \
    -H "content-type: application/json" \
    -d '{"messages":[{"role":"user","content":"test"}],"max_tokens":1}')
  if [ "$response_code" = "200" ]; then
    echo "✓ OK"
  else
    echo "✗ $response_code"
  fi
done
EOF
chmod +x routing-health-check.sh
./routing-health-check.sh
Cleanup
When you want to clean up the routing configurations:
# Remove all routing test configurations (--ignore-not-found avoids errors for
# routes already deleted by the label selector)
kubectl delete httproute -n enterprise-agentgateway -l routing-type --ignore-not-found
kubectl delete httproute -n enterprise-agentgateway --ignore-not-found \
  openai-gpt4o-mini openai-gpt4o openai-embeddings-route \
  openai-dev-environment openai-prod-environment openai-ab-test \
  openai-query-param-routing openai-weighted-routing \
  openai-short-content openai-long-content openai-failover
# Remove test backends (keep original openai-all-models)
kubectl delete agentgatewaybackend -n enterprise-agentgateway \
openai-gpt4o-mini openai-gpt4o openai-primary openai-fallback
# Remove policy
kubectl delete enterpriseagentgatewaypolicy content-based-routing -n enterprise-agentgateway
# Clean up test scripts
rm -f test-weighted-routing.sh routing-analysis.sh comprehensive-routing-test.sh routing-health-check.sh
Next Steps
With advanced routing patterns mastered, you’re ready to:
- Add security layers - Authentication, authorization, and rate limiting
- Implement guardrails - Content filtering and safety policies
- Multi-provider routing - Add Anthropic, AWS Bedrock, and others
- Production optimization - Performance tuning and cost management
In our next blog post, we’ll explore security features including JWT authentication, API key management, and role-based access control.
Key Takeaways
- Intelligent routing transforms AgentGateway into a sophisticated AI traffic manager
- Multiple routing criteria enable complex decision-making logic
- A/B testing and gradual rollouts provide safe model deployment strategies
- Content-based routing optimizes cost and performance automatically
- Observability provides insights into routing effectiveness and patterns
- Fallback strategies ensure reliability even when primary models fail
Your AgentGateway now has enterprise-grade routing capabilities that can handle complex production scenarios while optimizing for cost, performance, and reliability!