Production AI Agent Deployment on Linux: Complete DevOps Guide

Last Updated: November 5, 2024 | Reading Time: 22 minutes | Difficulty: Advanced

Introduction

Building an AI agent is one thing. Deploying it to production with high availability, scalability, and security is another. This comprehensive guide covers everything you need to deploy AI agents in production environments on Linux.

What you’ll learn:

  • Docker containerization for AI agents
  • Kubernetes orchestration and auto-scaling
  • Production monitoring and logging
  • Security hardening and secrets management
  • High availability and disaster recovery
  • CI/CD pipelines for agent deployment

Production Architecture Overview


┌─────────────────────────────────────────────────┐
│              Load Balancer (Nginx)               │
│           SSL Termination & Rate Limiting        │
└────────────┬────────────────────────────────────┘
             │
    ┌────────┴────────┐
    │                 │
┌───▼────┐      ┌────▼───┐
│ Agent  │      │ Agent  │  ← Kubernetes Pods
│  API   │      │  API   │     (Auto-scaling)
└───┬────┘      └────┬───┘
    │                │
    └────────┬───────┘
             │
     ┌───────▼────────┐
     │  Redis Cluster │  ← Caching & Session
     │  (Primary/Rep) │
     └───────┬────────┘
             │
     ┌───────▼────────┐
     │  Vector Store  │  ← ChromaDB/Pinecone
     │   (Persistent) │
     └───────┬────────┘
             │
     ┌───────▼────────┐
     │   PostgreSQL   │  ← Application DB
     │  (Primary/Rep) │
     └────────────────┘

     Monitoring Stack:
     - Prometheus (Metrics)
     - Grafana (Dashboards)
     - Loki (Logs)
     - Alert Manager

Part 1: Dockerizing Your AI Agent

Project Structure

ai-agent-prod/
├── app/
│   ├── __init__.py
│   ├── agent.py           # Main agent logic
│   ├── memory.py          # Memory systems
│   ├── tools.py           # Agent tools
│   └── api.py             # FastAPI endpoints
├── config/
│   ├── config.yaml
│   └── logging.conf
├── tests/
│   └── test_agent.py
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .dockerignore
└── README.md
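
The Dockerfile below installs dependencies from requirements.txt, whose contents aren't shown above. A minimal sketch for this stack might look like the following; the package list is an assumption based on the code in this guide, and you should pin the versions you have actually tested:

# requirements.txt (illustrative sketch; pin tested versions)
fastapi
uvicorn[standard]
pydantic
prometheus-client
python-json-logger
redis
psycopg2-binary
chromadb
hvac
pytest            # tests
httpx             # needed by FastAPI's TestClient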

Dockerfile for AI Agent

# Dockerfile
FROM python:3.11-slim

# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1

# Create non-root user
RUN useradd -m -u 1000 agent && \
    mkdir -p /app /data && \
    chown -R agent:agent /app /data

WORKDIR /app

# Install system dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for layer caching)
COPY --chown=agent:agent requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY --chown=agent:agent . .

# Switch to non-root user
USER agent

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "app.api:app", "--host", "0.0.0.0", "--port", "8000"]

FastAPI Application (app/api.py)

#!/usr/bin/env python3
"""
Production API for AI Agent
"""

from fastapi import FastAPI, HTTPException, Depends, BackgroundTasks
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional
import logging
import time
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST
from starlette.responses import Response

# Initialize app
app = FastAPI(
    title="AI Agent API",
    version="1.0.0",
    docs_url="/docs",
    redoc_url="/redoc"
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Configure properly in production
    allow_methods=["*"],
    allow_headers=["*"],
)

# Prometheus metrics
REQUEST_COUNT = Counter('agent_requests_total', 'Total requests', ['endpoint', 'status'])
REQUEST_DURATION = Histogram('agent_request_duration_seconds', 'Request duration', ['endpoint'])
AGENT_EXECUTIONS = Counter('agent_executions_total', 'Agent executions', ['status'])

# Request/Response models
class QueryRequest(BaseModel):
    query: str
    user_id: Optional[str] = None
    session_id: Optional[str] = None
    context: Optional[dict] = None

class QueryResponse(BaseModel):
    response: str
    sources: Optional[list] = None
    execution_time: float
    session_id: str

# Health check
@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {
        "status": "healthy",
        "timestamp": time.time(),
        "version": "1.0.0"
    }

# Readiness check
@app.get("/ready")
async def readiness_check():
    """Readiness check for Kubernetes"""
    # Check dependencies (DB, Redis, etc.); raising here makes Kubernetes
    # stop routing traffic to this pod until it recovers
    try:
        # Check vector store
        # Check Redis
        # Check database
        return {"status": "ready"}
    except Exception as e:
        logging.error(f"Readiness check failed: {e}")
        raise HTTPException(status_code=503, detail="Not ready")

# Metrics endpoint
@app.get("/metrics")
async def metrics():
    """Prometheus metrics endpoint"""
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

# Main agent endpoint
@app.post("/query", response_model=QueryResponse)
async def query_agent(request: QueryRequest, background_tasks: BackgroundTasks):
    """Process agent query"""

    start_time = time.time()

    try:
        # Execute agent (implement your agent logic)
        result = execute_agent(request.query, request.context)

        execution_time = time.time() - start_time

        # Log to background
        background_tasks.add_task(log_query, request, result, execution_time)

        # Update metrics
        REQUEST_COUNT.labels(endpoint='query', status='success').inc()
        REQUEST_DURATION.labels(endpoint='query').observe(execution_time)
        AGENT_EXECUTIONS.labels(status='success').inc()

        return QueryResponse(
            response=result['answer'],
            sources=result.get('sources'),
            execution_time=execution_time,
            session_id=request.session_id or "default"
        )

    except Exception as e:
        REQUEST_COUNT.labels(endpoint='query', status='error').inc()
        AGENT_EXECUTIONS.labels(status='error').inc()
        logging.error(f"Agent error: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

def execute_agent(query: str, context: Optional[dict] = None):
    """Execute agent logic"""
    # Implement your agent execution here
    return {
        "answer": "Agent response",
        "sources": ["source1", "source2"]
    }

def log_query(request, result, execution_time):
    """Background task to log query"""
    logging.info(f"Query: {request.query} | Time: {execution_time:.2f}s")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
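
The project structure includes tests/test_agent.py but its contents aren't shown. A minimal sketch using FastAPI's TestClient against the stub endpoints above:

# tests/test_agent.py (minimal sketch against the stub endpoints above)
from fastapi.testclient import TestClient

from app.api import app

client = TestClient(app)

def test_health():
    resp = client.get("/health")
    assert resp.status_code == 200
    assert resp.json()["status"] == "healthy"

def test_query_returns_default_session():
    # No session_id supplied, so the API falls back to "default"
    resp = client.post("/query", json={"query": "What is Docker?"})
    assert resp.status_code == 200
    body = resp.json()
    assert body["response"]
    assert body["session_id"] == "default"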

docker-compose.yml for Local Testing

version: '3.8'

services:
  agent-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - REDIS_URL=redis://redis:6379
      - POSTGRES_URL=postgresql://user:pass@postgres:5432/agentdb
    depends_on:
      - redis
      - postgres
    volumes:
      - ./data:/data
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes
    restart: unless-stopped

  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: agentdb
    ports:
      - "5432:5432"
    volumes:
      - postgres-data:/var/lib/postgresql/data
    restart: unless-stopped

  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8001:8000"
    volumes:
      - chroma-data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
    restart: unless-stopped

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    depends_on:
      - prometheus
    restart: unless-stopped

volumes:
  redis-data:
  postgres-data:
  chroma-data:
  prometheus-data:
  grafana-data:

Build and Run

# Build image
docker build -t ai-agent:latest .

# Run with docker-compose
docker-compose up -d

# Check logs
docker-compose logs -f agent-api

# Test endpoint
curl http://localhost:8000/health
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Docker?"}'

Part 2: Kubernetes Deployment

Kubernetes Manifests

# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ai-agent
---
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
  namespace: ai-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
      - name: agent-api
        image: your-registry/ai-agent:latest
        ports:
        - containerPort: 8000
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: ai-agent-secrets
              key: openai-api-key
        - name: REDIS_URL
          value: "redis://redis-service:6379"
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
---
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ai-agent-service
  namespace: ai-agent
spec:
  selector:
    app: ai-agent
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer
---
# k8s/hpa.yaml (Horizontal Pod Autoscaler)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-agent-hpa
  namespace: ai-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-agent
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-agent-ingress
  namespace: ai-agent
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.yourdomain.com
    secretName: ai-agent-tls
  rules:
  - host: api.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: ai-agent-service
            port:
              number: 80
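
One addition worth considering alongside the HPA: a PodDisruptionBudget, so node drains and cluster upgrades never take the deployment below one available pod. A minimal sketch, applied with the other manifests:

# k8s/pdb.yaml (optional sketch: keeps at least 1 pod up during voluntary disruptions)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ai-agent-pdb
  namespace: ai-agent
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: ai-agent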

Deploy to Kubernetes

# Create namespace
kubectl apply -f k8s/namespace.yaml

# Create secrets
kubectl create secret generic ai-agent-secrets \
  --from-literal=openai-api-key=$OPENAI_API_KEY \
  -n ai-agent

# Deploy application
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/hpa.yaml
kubectl apply -f k8s/ingress.yaml

# Check status
kubectl get pods -n ai-agent
kubectl get svc -n ai-agent
kubectl get hpa -n ai-agent

# View logs
kubectl logs -f deployment/ai-agent -n ai-agent

# Scale manually
kubectl scale deployment ai-agent --replicas=5 -n ai-agent
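
Before pointing DNS at the ingress, you can sanity-check the Service defined above with a port-forward and the smoke test from Part 1:

# In one terminal: forward local port 8000 to the service
kubectl port-forward svc/ai-agent-service 8000:80 -n ai-agent

# In another terminal: reuse the Part 1 smoke test
curl http://localhost:8000/health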

Part 3: Monitoring with Prometheus & Grafana

Prometheus Configuration

# config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'ai-agent'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - ai-agent
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: ai-agent
      - source_labels: [__meta_kubernetes_pod_ip]
        target_label: __address__
        replacement: ${1}:8000

  # Redis and PostgreSQL don't expose Prometheus metrics natively;
  # scrape them via redis_exporter and postgres_exporter instead
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']

  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

Key Metrics to Track

  • Request Rate: Requests per second
  • Response Time: P50, P95, P99 latencies
  • Error Rate: 4xx/5xx errors
  • Agent Execution Time: How long agents take to respond
  • Memory Usage: RAM consumption per pod
  • CPU Usage: CPU utilization
  • Cache Hit Rate: Redis cache effectiveness (instrumentation sketch below)
  • Database Connections: Active DB connections
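
Most of these come straight from the /metrics endpoint and cAdvisor; cache hit rate is the one that needs explicit instrumentation. A minimal sketch, where the metric and helper names are illustrative rather than taken from app/api.py:

# Illustrative sketch: instrumenting Redis cache hit rate
# (metric and helper names are hypothetical, not from app/api.py)
from prometheus_client import Counter

CACHE_HITS = Counter('agent_cache_hits_total', 'Redis cache hits')
CACHE_MISSES = Counter('agent_cache_misses_total', 'Redis cache misses')

def cached_get(redis_client, key):
    """Fetch from Redis, counting hits and misses for Prometheus."""
    value = redis_client.get(key)
    if value is not None:
        CACHE_HITS.inc()
    else:
        CACHE_MISSES.inc()
    return value

# Hit rate in Grafana:
#   rate(agent_cache_hits_total[5m]) /
#   (rate(agent_cache_hits_total[5m]) + rate(agent_cache_misses_total[5m]))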

Part 4: Logging with Loki

# Install Loki stack
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring --create-namespace \
  --set grafana.enabled=true

# Configure logging in agent
import logging
from pythonjsonlogger import jsonlogger

logger = logging.getLogger()
logHandler = logging.StreamHandler()
formatter = jsonlogger.JsonFormatter()
logHandler.setFormatter(formatter)
logger.addHandler(logHandler)
logger.setLevel(logging.INFO)

# Structured logging
logger.info("Agent query", extra={
    "user_id": "user123",
    "query": "What is Kubernetes?",
    "execution_time": 1.23,
    "status": "success"
})
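
Once the pods log JSON to stdout, Promtail ships the lines to Loki and these fields become queryable in Grafana's Explore view with LogQL, e.g. {app="ai-agent"} | json | status="error" (assuming Promtail attaches the pod's app label, which the loki-stack defaults do).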

Part 5: CI/CD Pipeline with GitLab CI

# .gitlab-ci.yml
stages:
  - test
  - build
  - deploy

variables:
  DOCKER_REGISTRY: registry.gitlab.com/yourgroup/ai-agent
  K8S_NAMESPACE: ai-agent

test:
  stage: test
  image: python:3.11
  script:
    - pip install -r requirements.txt
    - pytest tests/ --cov=app --cov-report=term

build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $DOCKER_REGISTRY:$CI_COMMIT_SHA .
    - docker tag $DOCKER_REGISTRY:$CI_COMMIT_SHA $DOCKER_REGISTRY:latest
    - docker push $DOCKER_REGISTRY:$CI_COMMIT_SHA
    - docker push $DOCKER_REGISTRY:latest
  only:
    - main

deploy-staging:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl config set-cluster k8s --server="$K8S_SERVER"
    - kubectl config set-credentials gitlab --token="$K8S_TOKEN"
    - kubectl config set-context default --cluster=k8s --user=gitlab
    - kubectl config use-context default
    - kubectl set image deployment/ai-agent agent-api=$DOCKER_REGISTRY:$CI_COMMIT_SHA -n $K8S_NAMESPACE
    - kubectl rollout status deployment/ai-agent -n $K8S_NAMESPACE
  environment:
    name: staging
  only:
    - main

deploy-production:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl set image deployment/ai-agent agent-api=$DOCKER_REGISTRY:$CI_COMMIT_SHA -n production
    - kubectl rollout status deployment/ai-agent -n production
  environment:
    name: production
  when: manual
  only:
    - main
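
Because images are pinned to $CI_COMMIT_SHA rather than latest, a bad release can be reverted deterministically with kubectl rollout undo deployment/ai-agent -n production, or by re-running the deploy job with a known-good SHA.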

Part 6: Security Best Practices

1. Secrets Management with Vault

# Install Vault (add the HashiCorp chart repo first)
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install vault hashicorp/vault

# Store secrets
vault kv put secret/ai-agent \
  openai_api_key="sk-..." \
  anthropic_api_key="sk-ant-..."

# Retrieve in Python
import os

import hvac

client = hvac.Client(url='http://vault:8200', token=os.getenv('VAULT_TOKEN'))
secrets = client.secrets.kv.v2.read_secret_version(path='ai-agent')
openai_key = secrets['data']['data']['openai_api_key']
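
In-cluster, prefer Vault's Kubernetes auth method over a static VAULT_TOKEN, so each pod authenticates with its service account token instead of a long-lived secret.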

2. Network Policies

# k8s/network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-agent-network-policy
  namespace: ai-agent
spec:
  podSelector:
    matchLabels:
      app: ai-agent
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8000
  egress:
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 6379  # Redis
    - protocol: TCP
      port: 5432  # PostgreSQL
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: TCP
      port: 443  # HTTPS outbound (LLM APIs)
  # DNS must be allowed explicitly once egress is restricted,
  # otherwise in-cluster name resolution breaks
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

3. Pod Security Context

PodSecurityPolicy was removed in Kubernetes 1.25; enforce these settings per workload with a securityContext (and cluster-wide via Pod Security Admission):

# Container-level securityContext in k8s/deployment.yaml
# (fsGroup, if you need it, goes in the pod-level securityContext)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL

Performance Benchmarks

Configuration             RPS    P95 Latency    Cost/Month
Single Pod (1 CPU)         50    200ms          $30
3 Pods (HPA)              200    150ms          $90
10 Pods (High Traffic)    800    100ms          $300

Disaster Recovery

# Backup Kubernetes resources ('get all' misses Secrets, ConfigMaps, and PVCs;
# consider a dedicated tool such as Velero for full backups)
kubectl get all -n ai-agent -o yaml > backup.yaml

# Backup databases
pg_dump -h postgres-service -U user agentdb > backup.sql

# Backup vector store
docker exec chromadb tar czf /backup/chroma.tar.gz /chroma/chroma

# Restore
kubectl apply -f backup.yaml
psql -h postgres-service -U user agentdb < backup.sql
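
To run the database dump on a schedule (for example from a Kubernetes CronJob), a minimal Python sketch with dated files and simple rotation; the paths, host, and retention count are assumptions, and PGPASSWORD should be supplied via a mounted secret:

# backup_db.py (illustrative sketch: dated pg_dump with simple rotation)
import datetime
import pathlib
import subprocess

BACKUP_DIR = pathlib.Path("/backups")   # assumed mount point
KEEP = 7                                # assumed retention count

stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d-%H%M%S")
target = BACKUP_DIR / f"agentdb-{stamp}.sql"

# PGPASSWORD is expected in the pod's environment (mounted secret)
subprocess.run(
    ["pg_dump", "-h", "postgres-service", "-U", "user", "-f", str(target), "agentdb"],
    check=True,
)

# Keep only the newest KEEP dumps
for old in sorted(BACKUP_DIR.glob("agentdb-*.sql"))[:-KEEP]:
    old.unlink()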

Conclusion

Production deployment of AI agents requires careful planning and implementation of DevOps best practices. This guide covered containerization, orchestration, monitoring, security, and CI/CD – everything needed for a robust production system.

Next: See real-world case studies in Article 8 to learn from production deployments.

About Ramesh Sundararamaiah

Red Hat Certified Architect

Expert in Linux system administration, DevOps automation, and cloud infrastructure. Specializing in Red Hat Enterprise Linux, CentOS, Ubuntu, Docker, Ansible, and enterprise IT solutions.