# Deployment Guide

This guide covers deploying the Predictor Service using Docker and Docker Compose, with an optional Kubernetes setup.

## Prerequisites

- Docker Engine 20.10+
- Docker Compose 2.0+
- At least 2GB RAM available
- 10GB free disk space

## Quick Deployment

### 1. Clone and Setup

```bash
# Replace <repository-url> with the clone URL of the repository
git clone <repository-url>
cd predictor
```

### 2. Validate Configuration

```bash
# Validate Docker configuration
./scripts/validate-docker.sh
```

### 3. Deploy

```bash
# Build and start services
make up-build

# Check status
make ps

# View logs
make logs
```

## Production Deployment

### Environment Configuration

1. **Copy environment template:**

   ```bash
   cp cmd/api/.env cmd/api/.env.production
   ```

2. **Edit production environment:**

   ```bash
   nano cmd/api/.env.production
   ```

3. **Key production settings:**

   ```bash
   # Security
   GSN_PREDICTOR_REDIS_PASSWORD=your_secure_password

   # Performance
   GSN_PREDICTOR_GRIB_PARALLEL=8
   GSN_PREDICTOR_GRIB_CACHE_TTL=2h

   # Monitoring
   GSN_PREDICTOR_GRIB_UPDATER_INTERVAL=3h
   ```

### Production Docker Compose

Create `docker-compose.prod.yml`:

```yaml
version: '3.8'

services:
  predictor:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: predictor-prod
    ports:
      - "8080:8080"
    env_file:
      - cmd/api/.env.production
    volumes:
      - grib_data:/tmp/grib
    depends_on:
      redis:
        condition: service_healthy
    networks:
      - predictor-network
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: '0.5'
        reservations:
          memory: 512M
          cpus: '0.25'
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  redis:
    image: redis:7.2-alpine
    container_name: predictor-redis-prod
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    networks:
      - predictor-network
    restart: unless-stopped
    command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru --requirepass ${GSN_PREDICTOR_REDIS_PASSWORD}
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${GSN_PREDICTOR_REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5
      start_period: 10s

volumes:
  grib_data:
    driver: local
  redis_data:
    driver: local

networks:
  predictor-network:
    driver: bridge
```

### Deploy to Production

```bash
# Deploy with production config.
# Compose substitutes ${GSN_PREDICTOR_REDIS_PASSWORD} from the shell environment
# or from --env-file, not from the service's env_file entry.
docker-compose --env-file cmd/api/.env.production -f docker-compose.prod.yml up -d

# Monitor deployment
docker-compose -f docker-compose.prod.yml logs -f

# Check health
curl http://localhost:8080/health
```

## Kubernetes Deployment

### Create Namespace

```yaml
# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: predictor
```

### Redis Deployment

```yaml
# k8s/redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: predictor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.2-alpine
          ports:
            - containerPort: 6379
          command: ["redis-server", "--appendonly", "yes", "--maxmemory", "512mb", "--maxmemory-policy", "allkeys-lru"]
          volumeMounts:
            - name: redis-data
              mountPath: /data
          resources:
            limits:
              memory: "512Mi"
              cpu: "250m"
            requests:
              memory: "256Mi"
              cpu: "100m"
          livenessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: redis-data
          persistentVolumeClaim:
            claimName: redis-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: predictor
spec:
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc
  namespace: predictor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

### Predictor Deployment

```yaml
# k8s/predictor.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: predictor
  namespace: predictor
spec:
  replicas: 2
  selector:
    matchLabels:
      app: predictor
  template:
    metadata:
      labels:
        app: predictor
    spec:
      containers:
        - name: predictor
          image: predictor:latest
          ports:
            - containerPort: 8080
          env:
            - name: GSN_PREDICTOR_REDIS_HOST
              value: "redis"
            - name: GSN_PREDICTOR_REDIS_PORT
              value: "6379"
            - name: GSN_PREDICTOR_GRIB_DIR
              value: "/tmp/grib"
            - name: GSN_PREDICTOR_SCHEDULER_ENABLED
              value: "true"
            - name: GSN_PREDICTOR_GRIB_UPDATER_INTERVAL
              value: "6h"
            - name: GSN_PREDICTOR_GRIB_UPDATER_TIMEOUT
              value: "45m"
          volumeMounts:
            - name: grib-data
              mountPath: /tmp/grib
          resources:
            limits:
              memory: "1Gi"
              cpu: "500m"
            requests:
              memory: "512Mi"
              cpu: "250m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 40
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
      volumes:
        - name: grib-data
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: predictor
  namespace: predictor
spec:
  selector:
    app: predictor
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
```

### Deploy to Kubernetes

```bash
# Apply namespace
kubectl apply -f k8s/namespace.yaml

# Apply Redis
kubectl apply -f k8s/redis.yaml

# Wait for Redis to be ready
kubectl wait --for=condition=ready pod -l app=redis -n predictor

# Apply Predictor
kubectl apply -f k8s/predictor.yaml

# Check status
kubectl get pods -n predictor
kubectl get services -n predictor
```

## Monitoring and Logging

### Health Checks

The service includes built-in health checks:

```bash
# Application health
curl http://localhost:8080/health

# Docker health
docker inspect predictor | jq '.[0].State.Health'

# Kubernetes health
kubectl describe pod -l app=predictor -n predictor
```

### Logging

```bash
# Docker logs
docker-compose logs -f predictor

# Kubernetes logs
kubectl logs -f deployment/predictor -n predictor
```

### Metrics

Consider adding Prometheus metrics:

```yaml
# Add to docker-compose.yml
prometheus:
  image: prom/prometheus:latest
  ports:
    - "9090:9090"
  volumes:
    - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
  networks:
    - predictor-network
```

## Backup and Recovery

### Redis Backup

```bash
# Create backup
docker exec predictor-redis redis-cli BGSAVE

# Copy backup file
docker cp predictor-redis:/data/dump.rdb ./backup/redis-$(date +%Y%m%d).rdb
```

### GRIB Data Backup

```bash
# Backup GRIB data
docker run --rm -v predictor_grib_data:/data -v $(pwd)/backup:/backup alpine tar czf /backup/grib-$(date +%Y%m%d).tar.gz -C /data .
```

### Automated Backup Script

```bash
#!/bin/bash
# scripts/backup.sh

BACKUP_DIR="./backup/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Redis backup (BGSAVE runs in the background, so give it time to finish)
docker exec predictor-redis redis-cli BGSAVE
sleep 5
docker cp predictor-redis:/data/dump.rdb "$BACKUP_DIR/redis.rdb"

# GRIB data backup
docker run --rm -v predictor_grib_data:/data -v "$(pwd)/$BACKUP_DIR:/backup" alpine tar czf /backup/grib.tar.gz -C /data .

echo "Backup completed: $BACKUP_DIR"
```

## Troubleshooting

### Common Issues

1. **Redis Connection Issues:**

   ```bash
   # Check Redis status
   docker-compose exec redis redis-cli ping

   # Check network connectivity
   # (Redis is not an HTTP server; this only shows whether redis:6379 accepts connections)
   docker-compose exec predictor wget -O- http://redis:6379
   ```

2. **GRIB Download Failures:**

   ```bash
   # Check disk space
   docker-compose exec predictor df -h /tmp/grib

   # Check internet connectivity
   docker-compose exec predictor wget -O- https://nomads.ncep.noaa.gov/
   ```

3. **Memory Issues:**

   ```bash
   # Check memory usage
   docker stats

   # Check container logs
   docker-compose logs predictor | grep -i memory
   ```

### Performance Tuning

1. **Redis Optimization:**

   ```bash
   # Increase Redis memory
   GSN_PREDICTOR_REDIS_MAXMEMORY=1gb

   # Optimize Redis settings
   redis-server --maxmemory 1gb --maxmemory-policy allkeys-lru
   ```

2. **GRIB Processing:**

   ```bash
   # Increase parallel workers
   GSN_PREDICTOR_GRIB_PARALLEL=8

   # Optimize cache TTL
   GSN_PREDICTOR_GRIB_CACHE_TTL=2h
   ```

3. **Container Resources:**

   ```yaml
   # In docker-compose.yml
   deploy:
     resources:
       limits:
         memory: 2G
         cpus: '1.0'
       reservations:
         memory: 1G
         cpus: '0.5'
   ```

## Security Considerations

1. **Network Security:**
   - Use internal networks for service communication
   - Expose only necessary ports
   - Use a reverse proxy for external access

2. **Container Security:**
   - Run as a non-root user
   - Use minimal base images
   - Apply security updates regularly

3. **Data Security:**
   - Encrypt sensitive environment variables
   - Use secrets management for passwords
   - Take regular backups

4. **Access Control:**
   - Implement API authentication
   - Use HTTPS in production
   - Monitor access logs
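
The network and container recommendations above could be applied to the production Compose file roughly as follows. This is a minimal sketch, not the project's shipped configuration: it shows only the keys that would change in `docker-compose.prod.yml`, the `1000:1000` user is a hypothetical UID/GID, and it assumes the predictor image runs correctly without root and that a reverse proxy on the host handles HTTPS.

```yaml
# Hypothetical hardening of the services in docker-compose.prod.yml.
# Only the keys that differ from the earlier file are shown.
services:
  predictor:
    # Publish the API on loopback only, so a reverse proxy on the host
    # is the single external entry point.
    ports:
      - "127.0.0.1:8080:8080"
    # Assumption: the image works as this unprivileged UID:GID and that
    # user can write to the /tmp/grib volume.
    user: "1000:1000"
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL

  redis:
    # No "ports:" entry: Redis stays reachable as redis:6379 on
    # predictor-network but is not published on the host.
    expose:
      - "6379"
```

Leaving out the Redis `ports:` mapping keeps the password-protected instance off the host's network interfaces, while the predictor container can still reach it by service name over `predictor-network`.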