Production Deployment on Linux
📑 Table of Contents
- Introduction
- Production Architecture Overview
- Part 1: Dockerizing Your AI Agent
- Project Structure
- Dockerfile for AI Agent
- FastAPI Application (app/api.py)
- docker-compose.yml for Local Testing
- Build and Run
- Part 2: Kubernetes Deployment
- Kubernetes Manifests
- Deploy to Kubernetes
- Part 3: Monitoring with Prometheus & Grafana
- Prometheus Configuration
- Key Metrics to Track
- Part 4: Logging with Loki
- Part 5: CI/CD Pipeline with GitLab CI
- Part 6: Security Best Practices
- 1. Secrets Management with Vault
- 2. Network Policies
- 3. Pod Security Context
- Performance Benchmarks
- Disaster Recovery
- Conclusion
Production AI Agent Deployment on Linux: Complete DevOps Guide
Last Updated: November 5, 2024 | Reading Time: 22 minutes | Difficulty: Advanced
Introduction
Building an AI agent is one thing. Deploying it to production with high availability, scalability, and security is another. This comprehensive guide covers everything you need to deploy AI agents in production environments on Linux.
What you’ll learn:
- Docker containerization for AI agents
- Kubernetes orchestration and auto-scaling
- Production monitoring and logging
- Security hardening and secrets management
- High availability and disaster recovery
- CI/CD pipelines for agent deployment
Production Architecture Overview
┌─────────────────────────────────────────────────┐
│              Load Balancer (Nginx)              │
│         SSL Termination & Rate Limiting         │
└────────────┬────────────────────────────────────┘
             │
    ┌────────┴────────┐
    │                 │
┌───▼────┐       ┌────▼───┐
│ Agent  │       │ Agent  │   ← Kubernetes Pods
│  API   │       │  API   │     (Auto-scaling)
└───┬────┘       └────┬───┘
    │                 │
    └────────┬────────┘
             │
     ┌───────▼────────┐
     │ Redis Cluster  │   ← Caching & Sessions
     │ (Master/Slave) │
     └───────┬────────┘
             │
     ┌───────▼────────┐
     │  Vector Store  │   ← ChromaDB/Pinecone
     │  (Persistent)  │
     └───────┬────────┘
             │
     ┌───────▼────────┐
     │   PostgreSQL   │   ← Application DB
     │ (Primary/Rep)  │
     └────────────────┘
Monitoring Stack:
- Prometheus (Metrics)
- Grafana (Dashboards)
- Loki (Logs)
- Alert Manager
Part 1: Dockerizing Your AI Agent
Project Structure
ai-agent-prod/
├── app/
│   ├── __init__.py
│   ├── agent.py          # Main agent logic
│   ├── memory.py         # Memory systems
│   ├── tools.py          # Agent tools
│   └── api.py            # FastAPI endpoints
├── config/
│   ├── config.yaml
│   └── logging.conf
├── tests/
│   └── test_agent.py
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .dockerignore
└── README.md
Dockerfile for AI Agent
# Dockerfile
FROM python:3.11-slim

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1

# Create non-root user
RUN useradd -m -u 1000 agent && \
    mkdir -p /app /data && \
    chown -R agent:agent /app /data

WORKDIR /app

# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for layer caching)
COPY --chown=agent:agent requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY --chown=agent:agent . .

# Switch to non-root user
USER agent

# Health check (curl -f exits non-zero on HTTP errors;
# a bare requests.get would report healthy even on a 500)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "app.api:app", "--host", "0.0.0.0", "--port", "8000"]
FastAPI Application (app/api.py)
#!/usr/bin/env python3
"""
Production API for AI Agent
"""
from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional
import logging
import time
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST
from starlette.responses import Response

# Initialize app
app = FastAPI(
    title="AI Agent API",
    version="1.0.0",
    docs_url="/docs",
    redoc_url="/redoc"
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Restrict to known origins in production
    allow_methods=["*"],
    allow_headers=["*"],
)

# Prometheus metrics
REQUEST_COUNT = Counter('agent_requests_total', 'Total requests', ['endpoint', 'status'])
REQUEST_DURATION = Histogram('agent_request_duration_seconds', 'Request duration', ['endpoint'])
AGENT_EXECUTIONS = Counter('agent_executions_total', 'Agent executions', ['status'])

# Request/Response models
class QueryRequest(BaseModel):
    query: str
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    context: Optional[dict] = None

class QueryResponse(BaseModel):
    response: str
    sources: Optional[list] = None
    execution_time: float
    session_id: str

# Health check
@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {
        "status": "healthy",
        "timestamp": time.time(),
        "version": "1.0.0"
    }

# Readiness check
@app.get("/ready")
async def readiness_check():
    """Readiness check for Kubernetes"""
    # Check dependencies (DB, Redis, etc.)
    try:
        # Check vector store
        # Check Redis
        # Check database
        return {"status": "ready"}
    except Exception as e:
        logging.error(f"Readiness check failed: {e}")
        raise HTTPException(status_code=503, detail="Not ready")

# Metrics endpoint
@app.get("/metrics")
async def metrics():
    """Prometheus metrics endpoint"""
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

# Main agent endpoint
@app.post("/query", response_model=QueryResponse)
async def query_agent(request: QueryRequest, background_tasks: BackgroundTasks):
    """Process agent query"""
    start_time = time.time()
    try:
        # Execute agent (implement your agent logic)
        result = execute_agent(request.query, request.context)
        execution_time = time.time() - start_time

        # Log in the background so the response isn't delayed
        background_tasks.add_task(log_query, request, result, execution_time)

        # Update metrics
        REQUEST_COUNT.labels(endpoint='query', status='success').inc()
        REQUEST_DURATION.labels(endpoint='query').observe(execution_time)
        AGENT_EXECUTIONS.labels(status='success').inc()

        return QueryResponse(
            response=result['answer'],
            sources=result.get('sources'),
            execution_time=execution_time,
            session_id=request.session_id or "default"
        )
    except Exception as e:
        REQUEST_COUNT.labels(endpoint='query', status='error').inc()
        AGENT_EXECUTIONS.labels(status='error').inc()
        logging.error(f"Agent error: {e}")
        raise HTTPException(status_code=500, detail=str(e))

def execute_agent(query: str, context: Optional[dict] = None):
    """Execute agent logic"""
    # Implement your agent execution here
    return {
        "answer": "Agent response",
        "sources": ["source1", "source2"]
    }

def log_query(request: QueryRequest, result: dict, execution_time: float):
    """Background task to log query"""
    logging.info(f"Query: {request.query} | Time: {execution_time:.2f}s")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
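The /ready stub above leaves the dependency checks as comments. Below is a minimal sketch of what they might look like, assuming redis-py and psycopg2 are in requirements.txt and reusing the REDIS_URL and POSTGRES_URL environment variables from the compose file in the next section; call check_dependencies() inside the try block of readiness_check so a failed dependency surfaces as a 503. A vector-store check would follow the same pattern.

import os
import redis
import psycopg2

def check_dependencies() -> None:
    """Raise on the first unreachable dependency; the caller returns 503."""
    # Redis: a PING round-trip proves the connection works
    r = redis.Redis.from_url(os.environ["REDIS_URL"], socket_connect_timeout=2)
    r.ping()
    # PostgreSQL: open and immediately close a connection
    conn = psycopg2.connect(os.environ["POSTGRES_URL"], connect_timeout=2)
    conn.close()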
docker-compose.yml for Local Testing
version: '3.8'

services:
  agent-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - REDIS_URL=redis://redis:6379
      - POSTGRES_URL=postgresql://user:pass@postgres:5432/agentdb
    depends_on:
      - redis
      - postgres
    volumes:
      - ./data:/data
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    restart: unless-stopped

  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: agentdb
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8001:8000"
    volumes:
      - chroma-data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus
    restart: unless-stopped

volumes:
  redis-data:
  postgres-data:
  chroma-data:
  prometheus-data:
  grafana-data:
Build and Run
# Build image
docker build -t ai-agent:latest .

# Run with docker-compose
docker-compose up -d

# Check logs
docker-compose logs -f agent-api

# Test endpoints
curl http://localhost:8000/health

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Docker?"}'
Part 2: Kubernetes Deployment
Kubernetes Manifests
# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ai-agent
---
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
  namespace: ai-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
        - name: agent-api
          image: your-registry/ai-agent:latest
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-agent-secrets
                  key: openai-api-key
            - name: REDIS_URL
              value: "redis://redis-service:6379"
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
---
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ai-agent-service
  namespace: ai-agent
spec:
  selector:
    app: ai-agent
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer
---
# k8s/hpa.yaml (Horizontal Pod Autoscaler)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-agent-hpa
  namespace: ai-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-agent
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-agent-ingress
  namespace: ai-agent
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.yourdomain.com
      secretName: ai-agent-tls
  rules:
    - host: api.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ai-agent-service
                port:
                  number: 80
Deploy to Kubernetes
# Create namespace
kubectl apply -f k8s/namespace.yaml
# Create secrets
kubectl create secret generic ai-agent-secrets \
  --from-literal=openai-api-key=$OPENAI_API_KEY \
  -n ai-agent
# Deploy application
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/hpa.yaml
kubectl apply -f k8s/ingress.yaml
# Check status
kubectl get pods -n ai-agent
kubectl get svc -n ai-agent
kubectl get hpa -n ai-agent
# View logs
kubectl logs -f deployment/ai-agent -n ai-agent
# Scale manually
kubectl scale deployment ai-agent --replicas=5 -n ai-agent
Part 3: Monitoring with Prometheus & Grafana
Prometheus Configuration
# config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'ai-agent'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - ai-agent
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: ai-agent
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: ${1}:8000

  # Redis and PostgreSQL don't expose Prometheus metrics natively;
  # scrape redis_exporter and postgres_exporter (default ports shown),
  # deployed alongside the databases.
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']
Key Metrics to Track
- Request Rate: Requests per second
- Response Time: P50, P95, P99 latencies
- Error Rate: 4xx/5xx errors
- Agent Execution Time: How long agents take to respond
- Memory Usage: RAM consumption per pod
- CPU Usage: CPU utilization
- Cache Hit Rate: Redis cache effectiveness (see the sketch after this list)
- Database Connections: Active DB connections
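As a concrete example of the cache hit rate item, here is a minimal sketch of exposing it as a Prometheus gauge, assuming redis-py; keyspace_hits and keyspace_misses are standard fields in the Redis INFO stats section. In practice you would reuse a module-level Redis client and update the gauge periodically.

import os
import redis
from prometheus_client import Gauge

CACHE_HIT_RATE = Gauge('agent_cache_hit_rate', 'Redis cache hit rate (0-1)')

def update_cache_hit_rate() -> None:
    """Compute the hit rate from Redis's own server-side counters."""
    r = redis.Redis.from_url(os.environ["REDIS_URL"])
    stats = r.info("stats")
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    CACHE_HIT_RATE.set(hits / total if total else 0.0)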
Part 4: Logging with Loki
# Install Loki stack
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace=monitoring \
  --create-namespace \
  --set grafana.enabled=true

# Configure JSON logging in the agent (pip install python-json-logger)
import logging
from pythonjsonlogger import jsonlogger

logger = logging.getLogger()
logHandler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter()
logHandler.setFormatter(formatter)
logger.addHandler(logHandler)
logger.setLevel(logging.INFO)

# Structured logging
logger.info("Agent query", extra={
    "user_id": "user123",
    "query": "What is Kubernetes?",
    "execution_time": 1.23,
    "status": "success"
})
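Loki becomes far more useful when related log lines share a correlation ID you can filter on. Below is a sketch of per-request IDs as FastAPI middleware, assuming the app object from app/api.py and the JSON logger configured above; the X-Request-ID header name is a common convention, not a requirement.

import uuid
import logging
from fastapi import Request

logger = logging.getLogger()

@app.middleware("http")
async def add_request_id(request: Request, call_next):
    # Reuse an upstream ID (e.g. from Nginx) or mint a new one
    request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))
    response = await call_next(request)
    response.headers["X-Request-ID"] = request_id
    # Every line logged with this extra dict is queryable in Loki
    logger.info("request completed", extra={
        "request_id": request_id,
        "path": request.url.path,
        "status_code": response.status_code,
    })
    return response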
Part 5: CI/CD Pipeline with GitLab CI
# .gitlab-ci.yml
stages:
  - test
  - build
  - deploy

variables:
  DOCKER_REGISTRY: registry.gitlab.com/yourgroup/ai-agent
  K8S_NAMESPACE: ai-agent

test:
  stage: test
  image: python:3.11
  script:
    - pip install -r requirements.txt
    - pytest tests/ --cov=app --cov-report=term

build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  script:
    # Build once, then tag — no need to build the same image twice
    - docker build -t $DOCKER_REGISTRY:$CI_COMMIT_SHA .
    - docker tag $DOCKER_REGISTRY:$CI_COMMIT_SHA $DOCKER_REGISTRY:latest
    - docker push $DOCKER_REGISTRY:$CI_COMMIT_SHA
    - docker push $DOCKER_REGISTRY:latest
  only:
    - main

deploy-staging:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config set-cluster k8s --server="$K8S_SERVER"
    - kubectl config set-credentials gitlab --token="$K8S_TOKEN"
    - kubectl config set-context default --cluster=k8s --user=gitlab
    - kubectl config use-context default
    - kubectl set image deployment/ai-agent agent-api=$DOCKER_REGISTRY:$CI_COMMIT_SHA -n $K8S_NAMESPACE
    - kubectl rollout status deployment/ai-agent -n $K8S_NAMESPACE
  environment:
    name: staging
  only:
    - main

deploy-production:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    # Configure kubectl credentials for the production cluster here,
    # following the same pattern as deploy-staging
    - kubectl set image deployment/ai-agent agent-api=$DOCKER_REGISTRY:$CI_COMMIT_SHA -n production
    - kubectl rollout status deployment/ai-agent -n production
  environment:
    name: production
  when: manual
  only:
    - main
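The test stage assumes tests/test_agent.py exists (see the project structure in Part 1). A minimal sketch of what it might contain, using FastAPI's TestClient against the API from Part 1; TestClient requires the httpx package.

# tests/test_agent.py
from fastapi.testclient import TestClient
from app.api import app

client = TestClient(app)

def test_health():
    resp = client.get("/health")
    assert resp.status_code == 200
    assert resp.json()["status"] == "healthy"

def test_query_returns_response():
    resp = client.post("/query", json={"query": "ping"})
    assert resp.status_code == 200
    body = resp.json()
    assert "response" in body
    assert body["execution_time"] >= 0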
Part 6: Security Best Practices
1. Secrets Management with Vault
# Install Vault
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install vault hashicorp/vault

# Store secrets
vault kv put secret/ai-agent \
  openai_api_key="sk-..." \
  anthropic_api_key="sk-ant-..."

# Retrieve in Python
import os
import hvac

client = hvac.Client(url='http://vault:8200', token=os.getenv('VAULT_TOKEN'))
secrets = client.secrets.kv.v2.read_secret_version(path='ai-agent')
openai_key = secrets['data']['data']['openai_api_key']
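One way to wire this lookup into the agent is at application startup, so a missing or unreachable Vault fails the deployment early instead of surfacing on the first query. A sketch using the hvac client above; the VAULT_ADDR and VAULT_TOKEN variable names are illustrative.

import os
import hvac

def load_secrets_from_vault() -> dict:
    """Fetch agent secrets once at process start; fail fast if unreachable."""
    client = hvac.Client(
        url=os.getenv("VAULT_ADDR", "http://vault:8200"),
        token=os.environ["VAULT_TOKEN"],
    )
    if not client.is_authenticated():
        raise RuntimeError("Vault authentication failed")
    secret = client.secrets.kv.v2.read_secret_version(path="ai-agent")
    return secret["data"]["data"]

@app.on_event("startup")
async def startup():
    # A failed fetch crashes the pod, so Kubernetes never marks it ready
    app.state.secrets = load_secrets_from_vault()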
2. Network Policies
# k8s/network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-agent-network-policy
  namespace: ai-agent
spec:
  podSelector:
    matchLabels:
      app: ai-agent
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8000
  egress:
    # DNS must be allowed explicitly, or service-name resolution breaks
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 6379  # Redis
        - protocol: TCP
          port: 5432  # PostgreSQL
    # namespaceSelector only matches in-cluster pods; external HTTPS
    # (e.g. LLM provider APIs) needs an ipBlock rule
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443  # HTTPS outbound
3. Pod Security Context
(PodSecurityPolicy was removed in Kubernetes 1.25; set security contexts directly in the pod spec instead. Note that runAsNonRoot/fsGroup are pod-level fields, while capabilities and readOnlyRootFilesystem belong on the container.)
# Run as non-root and drop capabilities
spec:
  securityContext:          # pod-level
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
  containers:
    - name: agent-api
      securityContext:      # container-level
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true
Performance Benchmarks
| Configuration | RPS | P95 Latency | Cost/Month |
|---|---|---|---|
| Single Pod (1 CPU) | 50 | 200ms | $30 |
| 3 Pods (HPA) | 200 | 150ms | $90 |
| 10 Pods (High Traffic) | 800 | 100ms | $300 |
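These numbers depend heavily on model latency, payload size, and cluster hardware, so treat them as indicative. Below is a rough asyncio script to measure RPS and P95 against your own deployment; httpx is an assumption, and a dedicated tool such as k6 or Locust is a better fit for sustained runs.

import asyncio
import time
import httpx

async def load_test(url: str, total: int = 500, concurrency: int = 50):
    latencies = []
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests

    async def one_request(client):
        async with sem:
            start = time.perf_counter()
            await client.post(url, json={"query": "What is Docker?"})
            latencies.append(time.perf_counter() - start)

    async with httpx.AsyncClient(timeout=30) as client:
        t0 = time.perf_counter()
        await asyncio.gather(*[one_request(client) for _ in range(total)])
        elapsed = time.perf_counter() - t0

    latencies.sort()
    p95 = latencies[max(0, int(len(latencies) * 0.95) - 1)]
    print(f"RPS: {total / elapsed:.1f}  P95: {p95 * 1000:.0f}ms")

if __name__ == "__main__":
    asyncio.run(load_test("http://localhost:8000/query"))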
Disaster Recovery
# Backup Kubernetes resources
# Note: 'get all' skips ConfigMaps, Secrets, and PVCs — export those separately
kubectl get all -n ai-agent -o yaml > backup.yaml

# Backup databases
pg_dump -h postgres-service -U user agentdb > backup.sql

# Backup vector store (exec into the compose service)
docker-compose exec chromadb tar czf /backup/chroma.tar.gz /chroma/chroma

# Restore
kubectl apply -f backup.yaml
psql -h postgres-service -U user agentdb < backup.sql
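For scheduled backups (for example from a Kubernetes CronJob), the pg_dump above can be driven from Python. A sketch; the /backups path and POSTGRES_PASSWORD variable name are illustrative, and the container image must include the pg_dump binary.

import os
import subprocess
from datetime import datetime, timezone

def backup_postgres(out_dir: str = "/backups") -> str:
    """Write a timestamped SQL dump and return its path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_file = os.path.join(out_dir, f"agentdb-{stamp}.sql")
    with open(out_file, "w") as fh:
        subprocess.run(
            ["pg_dump", "-h", "postgres-service", "-U", "user", "agentdb"],
            stdout=fh,
            check=True,  # raise if the dump fails
            env={**os.environ, "PGPASSWORD": os.environ["POSTGRES_PASSWORD"]},
        )
    return out_file

if __name__ == "__main__":
    print(f"Backup written to {backup_postgres()}")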
Conclusion
Production deployment of AI agents requires careful planning and implementation of DevOps best practices. This guide covered containerization, orchestration, monitoring, security, and CI/CD – everything needed for a robust production system.
Next: See real-world case studies in Article 8 to learn from production deployments.
About Ramesh Sundararamaiah
Red Hat Certified Architect
Expert in Linux system administration, DevOps automation, and cloud infrastructure. Specializing in Red Hat Enterprise Linux, CentOS, Ubuntu, Docker, Ansible, and enterprise IT solutions.