Scaling and Production Deployment

Robyn is designed to scale efficiently from development to production. This guide covers scaling strategies, production deployment patterns, and performance optimization techniques.

Understanding Robyn's Scaling Model

Multi-Process Architecture

Robyn uses a shared-nothing multi-process model optimized for modern multi-core systems. Each process:

  • Runs independently with its own Python interpreter and memory space
  • Has its own Global Interpreter Lock (GIL), so there is no cross-process GIL contention
  • Can handle multiple concurrent requests via worker threads
  • Scales across CPU cores for true parallelism
  • Provides fault isolation: a crash in one process doesn't affect the others
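
A minimal application makes the model concrete: the same file is executed once per process, so anything created at module level (connections, caches) exists separately in each process. A short sketch (the route and port are illustrative):

from robyn import Robyn

app = Robyn(__file__)

# Module-level state like this dict is duplicated per process, never shared.
process_local_cache = {}

@app.get("/")
def index():
    return "Hello from an independent Robyn process"

app.start(host="0.0.0.0", port=8080)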

Worker Threads

Within each process, multiple worker threads provide concurrency:

  • Share the same Python interpreter and memory space
  • Are subject to the GIL for CPU-bound tasks (use more processes for CPU work)
  • Excel at I/O-bound operations where threads can release the GIL
  • Handle database calls, HTTP requests, file operations efficiently
  • Provide request-level concurrency within the same process
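
In practice this means handlers that wait on I/O scale well with extra workers, while pure-Python computation does not. A sketch contrasting the two, where the sleep stands in for a database query or HTTP call:

import asyncio

from robyn import Robyn

app = Robyn(__file__)

@app.get("/io")
async def io_bound():
    # While this coroutine waits, the worker is free to serve other requests.
    await asyncio.sleep(0.1)  # stand-in for a DB query or HTTP call
    return "finished waiting"

@app.get("/cpu")
def cpu_bound():
    # Pure-Python work holds the GIL; add processes, not workers, for this.
    return str(sum(i * i for i in range(1_000_000)))

app.start(port=8080)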

Scaling Configuration

Basic Scaling

Single Process Development: Start simple during development with auto-reload enabled.

Development Scaling

# Development mode (single process, single worker)
python app.py --dev

# Development with custom port
python app.py --dev --port 3000

Multi-Core Production: Scale across all available CPU cores for maximum performance.

Production Scaling

# Automatic optimization (recommended)
python app.py --fast

# Manual configuration for 8-core system
python app.py --processes 8 --workers 2

# I/O heavy workloads
python app.py --processes 4 --workers 4

# CPU heavy workloads  
python app.py --processes 8 --workers 1

Advanced Configuration

Hardware-Specific Tuning: Optimize based on your specific hardware and application characteristics.

Advanced Tuning

# High-traffic web API (4-core system)
python app.py --processes 4 --workers 3 --log-level INFO

# Data processing service (8-core system)
python app.py --processes 8 --workers 1 --log-level WARNING

# Mixed workload (balanced approach)
python app.py --processes 6 --workers 2 --log-level INFO

# Maximum concurrency (16-core system)
python app.py --processes 8 --workers 4

Configuration Guidelines

Choosing the Right Configuration: Use these guidelines to optimize for your specific use case.

Workload-Based Configuration

# For CPU-intensive applications
# Rule: processes = CPU cores, workers = 1
# Example: 8-core machine
python app.py --processes 8 --workers 1

# For I/O-intensive applications  
# Rule: processes = CPU cores / 2, workers = 2-4
# Example: 8-core machine
python app.py --processes 4 --workers 3

# For balanced applications
# Rule: processes = CPU cores / 2, workers = 2
# Example: 8-core machine
python app.py --processes 4 --workers 2

# Memory considerations
# Each process uses ~50-100MB base memory
# Monitor with: ps aux | grep python
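
These rules are easy to compute at deploy time. A small helper along these lines can print a starting configuration; the thresholds mirror the guidelines above and are a starting point to benchmark, not Robyn defaults:

import os

def suggest_config(workload: str) -> tuple[int, int]:
    """Suggest a (processes, workers) pair per the guidelines above."""
    cores = os.cpu_count() or 1
    if workload == "cpu":
        return cores, 1                   # CPU-bound: one worker per process
    if workload == "io":
        return max(cores // 2, 1), 3      # I/O-bound: fewer processes, more workers
    return max(cores // 2, 1), 2          # balanced default

processes, workers = suggest_config("io")
print(f"python app.py --processes {processes} --workers {workers}")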

Production Deployment Strategies

Container Deployment

Docker Containerization: Deploy Robyn applications using Docker for consistent environments and easy scaling.

Docker Deployment

# Dockerfile
FROM python:3.11-slim

# Install system dependencies (curl is required by the HEALTHCHECK below
# and is not included in the slim base image)
RUN apt-get update && apt-get install -y \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

# Run application with optimized settings
CMD ["python", "app.py", "--fast", "--host", "0.0.0.0", "--port", "8080"]
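
Build and run the image locally with docker build -t robyn-app . followed by docker run -p 8080:8080 robyn-app; the same image can then be pushed to a registry and reused by the Kubernetes deployment later in this guide.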

Load Balancer Configuration

Nginx Load Balancing: Use Nginx as a reverse proxy and load balancer for multiple Robyn instances.

Nginx Configuration

# /etc/nginx/sites-available/robyn-app
upstream robyn_backend {
    least_conn;
    server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8081 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8082 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:8083 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name yourdomain.com;
    
    # Security headers
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    
    # Gzip compression
    gzip on;
    gzip_types text/plain application/json application/javascript text/css;
    
    # Static files (if any)
    location /static/ {
        alias /var/www/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
    
    # API routes
    location / {
        proxy_pass http://robyn_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        # Timeouts
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
        
        # Passive failover: retry the next upstream on errors
        proxy_next_upstream error timeout http_500 http_502 http_503;
    }
    
    # Health check endpoint
    location /health {
        access_log off;
        proxy_pass http://robyn_backend;
    }
}
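
This configuration expects four Robyn instances listening on ports 8080-8083 (for example, one instance per port managed by your process supervisor). On Debian-style layouts, enable the site by symlinking it into /etc/nginx/sites-enabled/ and reloading Nginx.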

Kubernetes Deployment

Kubernetes Orchestration: Deploy and scale Robyn applications in Kubernetes clusters.

Kubernetes Manifests

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: robyn-app
  labels:
    app: robyn-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: robyn-app
  template:
    metadata:
      labels:
        app: robyn-app
    spec:
      containers:
      - name: robyn-app
        image: your-registry/robyn-app:latest
        ports:
        - containerPort: 8080
        env:
        - name: ROBYN_HOST
          value: "0.0.0.0"
        - name: ROBYN_PORT
          value: "8080"
        - name: ROBYN_LOG_LEVEL
          value: "INFO"
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5

---
apiVersion: v1
kind: Service
metadata:
  name: robyn-service
spec:
  selector:
    app: robyn-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: LoadBalancer

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: robyn-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: robyn-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
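
Apply everything with kubectl apply -f deployment.yaml. The HorizontalPodAutoscaler keeps between 3 and 10 replicas, adding pods when average CPU utilization exceeds 70% or average memory utilization exceeds 80%.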

Monitoring and Observability

Application Metrics

Built-in Monitoring: Add comprehensive monitoring and metrics collection to your Robyn application for production observability.

Advanced Monitoring Setup

from robyn import Robyn
import contextvars
import time
import psutil
import logging
import threading
from collections import defaultdict, deque

app = Robyn(__file__)

# Configure structured logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('app.log')
    ]
)
logger = logging.getLogger(__name__)

# Advanced metrics collection
class MetricsCollector:
    def __init__(self):
        self.request_count = 0
        self.total_response_time = 0
        self.status_codes = defaultdict(int)
        self.endpoint_stats = defaultdict(lambda: {"count": 0, "total_time": 0})
        self.response_times = deque(maxlen=1000)  # Last 1000 requests
        self.lock = threading.Lock()
    
    def record_request(self, method, path, status_code, duration):
        with self.lock:
            self.request_count += 1
            self.total_response_time += duration
            self.status_codes[status_code] += 1
            endpoint_key = f"{method} {path}"
            self.endpoint_stats[endpoint_key]["count"] += 1
            self.endpoint_stats[endpoint_key]["total_time"] += duration
            self.response_times.append(duration)
    
    def get_metrics(self):
        with self.lock:
            avg_response_time = self.total_response_time / max(self.request_count, 1)
            p95_response_time = sorted(self.response_times)[int(len(self.response_times) * 0.95)] if self.response_times else 0
            
            return {
                "requests_total": self.request_count,
                "avg_response_time": avg_response_time,
                "p95_response_time": p95_response_time,
                "status_codes": dict(self.status_codes),
                "top_endpoints": dict(sorted(
                    self.endpoint_stats.items(),
                    key=lambda x: x[1]["count"],
                    reverse=True
                )[:10])
            }

metrics = MetricsCollector()
start_time = time.time()

# Per-request context used to correlate the before and after middleware.
# Assumption: Robyn runs both middlewares for a given request in the same
# execution context; verify this against the Robyn version you deploy.
request_info = contextvars.ContextVar("request_info", default=None)

@app.before_request()
def monitor_request_start(request):
    request_info.set((time.time(), request.method, request.url.path))
    return request

@app.after_request()
def monitor_request_end(response):
    info = request_info.get()
    if info is None:
        return response
    start, method, path = info
    duration = time.time() - start
    status_code = getattr(response, "status_code", 200)

    # Record metrics
    metrics.record_request(method, path, status_code, duration)

    # Log slow requests
    if duration > 1.0:
        logger.warning(
            f"Slow request: {method} {path} - "
            f"{duration:.3f}s (status: {status_code})"
        )

    # Add response headers
    response.headers.set("X-Response-Time", f"{duration:.3f}s")
    response.headers.set("X-Request-ID", str(time.time_ns()))
    return response

# Health endpoint with detailed status
# (not a const route: const=True would cache the first response,
# freezing uptime_seconds)
@app.get("/health")
def health_check():
    return {
        "status": "healthy",
        "version": "1.0.0",
        "uptime_seconds": time.time() - start_time
    }

# Comprehensive metrics endpoint
@app.get("/metrics")
def get_metrics():
    # interval=None reports usage since the previous call without
    # blocking the request for a full second
    cpu_percent = psutil.cpu_percent(interval=None)
    memory = psutil.virtual_memory()
    disk = psutil.disk_usage('/')
    
    app_metrics = metrics.get_metrics()
    
    return {
        "system": {
            "cpu_usage_percent": cpu_percent,
            "memory_usage_percent": memory.percent,
            "memory_available_mb": memory.available / 1024 / 1024,
            "disk_usage_percent": disk.percent,
            "load_average": psutil.getloadavg()
        },
        "application": app_metrics,
        "timestamp": time.time()
    }

# Readiness endpoint for k8s
@app.get("/ready")
def readiness_check():
    # Add your readiness checks here (DB connection, etc.)
    return {"ready": True}

Performance Optimization

Scaling Best Practices

  1. Start Simple: Begin with --fast mode and measure performance
  2. Profile Your Application: Identify CPU vs I/O bound operations (see the profiling sketch after this list)
  3. Test Different Configurations: Benchmark various process/worker combinations
  4. Monitor Resource Usage: Watch CPU, memory, and I/O utilization
  5. Scale Horizontally: Use multiple instances behind a load balancer
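
For step 2, the standard library's cProfile gives a quick read on where handler time goes. A minimal offline sketch, where process_order is a hypothetical function backing one of your routes:

import cProfile
import io
import pstats

def process_order():
    # Hypothetical handler logic under test.
    return sum(i * i for i in range(1_000_000))

profiler = cProfile.Profile()
profiler.enable()
process_order()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())  # large tottime in Python frames points to CPU-bound work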

Performance Testing

Load Testing: Use tools like wrk, Apache Bench, or Locust to test different configurations and find optimal settings.

Load Testing with wrk

# Install wrk for load testing
# macOS: brew install wrk
# Ubuntu: sudo apt install wrk

# Test baseline performance
wrk -t4 -c100 -d30s http://localhost:8080/health

# Test with different configurations
# (each run binds port 8080, so stop the previous server before starting the next)

# Configuration 1: Single process
python app.py --processes 1 --workers 1 &
sleep 2  # give the server a moment to start
wrk -t4 -c100 -d30s http://localhost:8080/api/endpoint
kill $!

# Configuration 2: Multi-process
python app.py --processes 4 --workers 2 &
sleep 2
wrk -t4 -c100 -d30s http://localhost:8080/api/endpoint
kill $!

# Configuration 3: Fast mode
python app.py --fast &
sleep 2
wrk -t4 -c100 -d30s http://localhost:8080/api/endpoint
kill $!

# Compare results and choose the best configuration

Environment Variables

Configuration via Environment: Use environment variables for deployment flexibility.

Environment Configuration

# Production environment variables
export ROBYN_HOST=0.0.0.0
export ROBYN_PORT=8080
export ROBYN_LOG_LEVEL=INFO
export ROBYN_PROCESSES=4
export ROBYN_WORKERS=2

# Database configuration
export DATABASE_URL=postgresql://user:pass@localhost/db
export REDIS_URL=redis://localhost:6379/0

# Application settings
export SECRET_KEY=your-secret-key
export DEBUG=false
export ENVIRONMENT=production

# Start application with environment config
python app.py
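
Which ROBYN_* variables are consumed automatically can vary between Robyn versions, so it is safest to read configuration explicitly in app.py. A sketch of that pattern (the defaults shown are illustrative):

import os

from robyn import Robyn

app = Robyn(__file__)

# Read deployment settings explicitly rather than relying on auto-loading.
HOST = os.environ.get("ROBYN_HOST", "127.0.0.1")
PORT = int(os.environ.get("ROBYN_PORT", "8080"))
DEBUG = os.environ.get("DEBUG", "false").lower() == "true"
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///dev.db")

@app.get("/config")
def show_config():
    return {"debug": DEBUG, "environment": os.environ.get("ENVIRONMENT", "development")}

app.start(host=HOST, port=PORT)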