# Deployment Guide

This guide covers deploying the Predictor Service with Docker and Docker Compose, and on Kubernetes.

## Prerequisites

- Docker Engine 20.10+
- Docker Compose 2.0+
- At least 2 GB of RAM available
- 10 GB of free disk space
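
You can check the version requirements before deploying. The sketch below is hypothetical (it is not part of the repository). It assumes GNU `sort` for version ordering (`sort -V`) and the Compose v2 plugin (`docker compose version --short`).

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check (not shipped with the repository).

# version_ge A B: succeed if version A >= version B (needs GNU sort -V).
version_ge() {
  # If B sorts first (or the two are equal), then A >= B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_prereqs: compare the installed Docker Engine and Compose versions
# against the minimums listed under Prerequisites.
check_prereqs() {
  local engine compose ok=0
  engine=$(docker version --format '{{.Server.Version}}') || return 1
  compose=$(docker compose version --short) || return 1
  compose=${compose#v}   # strip a leading "v" if present

  if version_ge "$engine" 20.10; then
    echo "Docker Engine $engine: OK"
  else
    echo "Docker Engine $engine: need 20.10+"; ok=1
  fi
  if version_ge "$compose" 2.0; then
    echo "Docker Compose $compose: OK"
  else
    echo "Docker Compose $compose: need 2.0+"; ok=1
  fi
  return "$ok"
}

# Usage: check_prereqs
```

This check does not cover RAM or disk space. Those depend on where the volumes live.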

## Quick Deployment

### 1. Clone and Setup

```bash
git clone <repository-url>
cd predictor
```

### 2. Validate Configuration

```bash
# Validate Docker configuration
./scripts/validate-docker.sh
```

### 3. Deploy

```bash
# Build and start services
make up-build

# Check status
make ps

# View logs
make logs
```

## Production Deployment

### Environment Configuration

1. **Copy the environment template:**

   ```bash
   cp cmd/api/.env cmd/api/.env.production
   ```

2. **Edit the production environment:**

   ```bash
   nano cmd/api/.env.production
   ```

3. **Key production settings:**

   ```bash
   # Security (replace with a strong, unique password)
   GSN_PREDICTOR_REDIS_PASSWORD=your_secure_password

   # Performance
   GSN_PREDICTOR_GRIB_PARALLEL=8
   GSN_PREDICTOR_GRIB_CACHE_TTL=2h

   # GRIB update schedule
   GSN_PREDICTOR_GRIB_UPDATER_INTERVAL=3h
   ```
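
A pre-deploy check for this file could look like the hypothetical sketch below, which is not part of the repository. It fails if any of the four settings above is missing or empty, or if the password is still the example value. The key list is an assumption taken from this section. `openssl rand -hex 32` is one way to generate a password.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy check for cmd/api/.env.production.
# The list of required keys is an assumption based on the settings above.

check_env_file() {
  local file=$1 key value status=0
  [ -f "$file" ] || { echo "missing $file"; return 1; }
  for key in GSN_PREDICTOR_REDIS_PASSWORD GSN_PREDICTOR_GRIB_PARALLEL \
             GSN_PREDICTOR_GRIB_CACHE_TTL GSN_PREDICTOR_GRIB_UPDATER_INTERVAL; do
    # Take the last KEY=value line for this key; commented-out lines never match.
    value=$(grep -E "^${key}=" "$file" | tail -n1 | cut -d= -f2-)
    if [ -z "$value" ]; then
      echo "$key is not set"; status=1
    fi
  done
  # Refuse to deploy with the placeholder password from this guide.
  if grep -qE '^GSN_PREDICTOR_REDIS_PASSWORD=your_secure_password$' "$file"; then
    echo "GSN_PREDICTOR_REDIS_PASSWORD still uses the example value"; status=1
  fi
  return "$status"
}

# Usage: check_env_file cmd/api/.env.production
```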

### Production Docker Compose

Create `docker-compose.prod.yml`:

```yaml
version: '3.8'

services:
  predictor:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: predictor-prod
    ports:
      - "8080:8080"
    env_file:
      - cmd/api/.env.production
    volumes:
      - grib_data:/tmp/grib
    depends_on:
      redis:
        condition: service_healthy
    networks:
      - predictor-network
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: '0.5'
        reservations:
          memory: 512M
          cpus: '0.25'
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  redis:
    image: redis:7.2-alpine
    container_name: predictor-redis-prod
    ports:
      # Bound to localhost only; drop this mapping if nothing on the host needs Redis.
      - "127.0.0.1:6379:6379"
    volumes:
      - redis_data:/data
    networks:
      - predictor-network
    restart: unless-stopped
    # Compose interpolates ${GSN_PREDICTOR_REDIS_PASSWORD} from the shell or an
    # --env-file, not from the predictor service's env_file entry.
    command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru --requirepass ${GSN_PREDICTOR_REDIS_PASSWORD}
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${GSN_PREDICTOR_REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5
      start_period: 10s

volumes:
  grib_data:
    driver: local
  redis_data:
    driver: local

networks:
  predictor-network:
    driver: bridge
```

### Deploy to Production

```bash
# Deploy with production config. --env-file lets Compose substitute
# ${GSN_PREDICTOR_REDIS_PASSWORD} in the redis command and healthcheck.
docker-compose --env-file cmd/api/.env.production -f docker-compose.prod.yml up -d

# Monitor deployment
docker-compose --env-file cmd/api/.env.production -f docker-compose.prod.yml logs -f

# Check health
curl http://localhost:8080/health
```
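
The predictor's healthcheck has a 40-second `start_period`, so a `curl` run immediately after `up -d` may fail while the container is still starting. The hypothetical retry loop below (not part of the repository) polls the endpoint until curl reports a non-error HTTP status (below 400). It assumes curl is installed on the host.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the health endpoint until it succeeds or the
# attempt budget runs out. Requires curl on the host.

wait_for_health() {
  local url=${1:-http://localhost:8080/health}
  local attempts=${2:-20} interval=${3:-3} i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl fail on HTTP >= 400; --max-time bounds each attempt.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "no healthy response from $url after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_health http://localhost:8080/health 20 3
```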

## Kubernetes Deployment

### Create Namespace

```yaml
# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: predictor
```

### Redis Deployment

```yaml
# k8s/redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: predictor
spec:
  replicas: 1
  # Recreate avoids two pods contending for the ReadWriteOnce volume during updates.
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.2-alpine
          ports:
            - containerPort: 6379
          # --maxmemory 512mb equals the 512Mi limit below. Redis needs headroom
          # beyond its dataset, so lower maxmemory or raise the limit to avoid OOM kills.
          command: ["redis-server", "--appendonly", "yes", "--maxmemory", "512mb", "--maxmemory-policy", "allkeys-lru"]
          volumeMounts:
            - name: redis-data
              mountPath: /data
          resources:
            limits:
              memory: "512Mi"
              cpu: "250m"
            requests:
              memory: "256Mi"
              cpu: "100m"
          livenessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            exec:
              command: ["redis-cli", "ping"]
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: redis-data
          persistentVolumeClaim:
            claimName: redis-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: predictor
spec:
  selector:
    app: redis
  ports:
    - port: 6379
      targetPort: 6379
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc
  namespace: predictor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

### Predictor Deployment

```yaml
# k8s/predictor.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: predictor
  namespace: predictor
spec:
  replicas: 2
  selector:
    matchLabels:
      app: predictor
  template:
    metadata:
      labels:
        app: predictor
    spec:
      containers:
        - name: predictor
          # Push this image to a registry the cluster can pull from; with the
          # :latest tag, Kubernetes defaults imagePullPolicy to Always.
          image: predictor:latest
          ports:
            - containerPort: 8080
          env:
            - name: GSN_PREDICTOR_REDIS_HOST
              value: "redis"
            - name: GSN_PREDICTOR_REDIS_PORT
              value: "6379"
            - name: GSN_PREDICTOR_GRIB_DIR
              value: "/tmp/grib"
            - name: GSN_PREDICTOR_SCHEDULER_ENABLED
              value: "true"
            - name: GSN_PREDICTOR_GRIB_UPDATER_INTERVAL
              value: "6h"
            - name: GSN_PREDICTOR_GRIB_UPDATER_TIMEOUT
              value: "45m"
          volumeMounts:
            - name: grib-data
              mountPath: /tmp/grib
          resources:
            limits:
              memory: "1Gi"
              cpu: "500m"
            requests:
              memory: "512Mi"
              cpu: "250m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 40
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
      volumes:
        # emptyDir is per pod, so each replica keeps its own copy of the GRIB data.
        - name: grib-data
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: predictor
  namespace: predictor
spec:
  selector:
    app: predictor
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
```

### Deploy to Kubernetes

```bash
# Apply namespace
kubectl apply -f k8s/namespace.yaml

# Apply Redis
kubectl apply -f k8s/redis.yaml

# Wait for Redis to be ready (the default timeout is 30s, which volume
# provisioning can exceed)
kubectl wait --for=condition=ready pod -l app=redis -n predictor --timeout=120s

# Apply Predictor
kubectl apply -f k8s/predictor.yaml

# Check status
kubectl get pods -n predictor
kubectl get services -n predictor
```
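
After `kubectl apply`, a short smoke test can confirm the rollout and call `/health` through the LoadBalancer Service. This is a sketch under several assumptions: `kubectl` points at the target cluster, the provider assigns an external IP or hostname, and curl is installed. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical post-deploy smoke test for the manifests above.

k8s_smoke_test() {
  local ns=${1:-predictor} addr
  # Block until the Deployment's pods are updated and ready (or time out).
  kubectl rollout status deployment/predictor -n "$ns" --timeout=180s || return 1

  # LoadBalancer ingress carries either an IP or a hostname, depending on the provider.
  addr=$(kubectl get service predictor -n "$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}')
  if [ -z "$addr" ]; then
    echo "service/predictor has no external address yet" >&2
    return 1
  fi

  # The Service maps port 80 to the container's port 8080.
  curl -fsS --max-time 5 "http://$addr/health"
}

# Usage: k8s_smoke_test predictor
```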

## Monitoring and Logging

### Health Checks

The service exposes a `/health` endpoint, which the Docker and Kubernetes health checks above also probe:

```bash
# Application health
curl http://localhost:8080/health

# Docker health (the production container is named predictor-prod)
docker inspect predictor | jq '.[0].State.Health'

# Kubernetes health
kubectl describe pod -l app=predictor -n predictor
```
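
For scripted checks (for example, from cron), you can read the Docker health status directly instead of piping it through `jq`. The helper below is hypothetical and defaults to the container names from `docker-compose.prod.yml`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print Docker's health status for each container and
# fail if any of them is not "healthy".

check_containers() {
  local name status rc=0
  [ "$#" -gt 0 ] || set -- predictor-prod predictor-redis-prod
  for name in "$@"; do
    # .State.Health only exists when the container defines a healthcheck.
    status=$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}no-healthcheck{{end}}' "$name" 2>/dev/null) || status=missing
    echo "$name: $status"
    [ "$status" = healthy ] || rc=1
  done
  return "$rc"
}

# Usage: check_containers || echo "predictor stack unhealthy"
```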

### Logging

```bash
# Docker logs
docker-compose logs -f predictor

# Kubernetes logs
kubectl logs -f deployment/predictor -n predictor
```

### Metrics

Consider adding Prometheus metrics. The snippet below only runs Prometheus. The predictor must also expose a metrics endpoint, and `monitoring/prometheus.yml` must list it as a scrape target.

```yaml
# Add to docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
    networks:
      - predictor-network
```
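
## Backup and Recovery

### Redis Backup

```bash
# Create backup. BGSAVE returns immediately and writes the snapshot in the
# background; check that `redis-cli LASTSAVE` has changed before copying.
# (Use predictor-redis-prod for the production stack, and add -a <password>
# when requirepass is set.)
docker exec predictor-redis redis-cli BGSAVE

# Copy backup file
mkdir -p backup
docker cp predictor-redis:/data/dump.rdb ./backup/redis-$(date +%Y%m%d).rdb
```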
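
Before you rely on a copied snapshot, check that the file starts with the RDB magic string. The helper below is a hypothetical sketch, not part of the repository. It only inspects the header, so it catches empty or wrong files but not corruption deeper in the file.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for a copied snapshot: an RDB file starts with
# the ASCII magic "REDIS" followed by a four-digit format version.

verify_rdb() {
  local file=$1 header
  [ -s "$file" ] || { echo "$file is missing or empty" >&2; return 1; }
  header=$(head -c 9 "$file")
  case "$header" in
    REDIS[0-9][0-9][0-9][0-9]) echo "$file: RDB version ${header#REDIS}" ;;
    *) echo "$file does not look like an RDB snapshot" >&2; return 1 ;;
  esac
}

# Usage: verify_rdb "./backup/redis-$(date +%Y%m%d).rdb"
```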

### GRIB Data Backup

```bash
# Backup GRIB data (the volume name is the Compose project name, "predictor", plus "_grib_data")
docker run --rm -v predictor_grib_data:/data -v "$(pwd)/backup":/backup alpine tar czf /backup/grib-$(date +%Y%m%d).tar.gz -C /data .
```
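
The guide has no restore step for GRIB data. Below is a minimal sketch of one. It assumes the same `predictor_grib_data` volume and `alpine` image used for the backup, and the function name is hypothetical. Stop the predictor first so it does not read half-extracted files.

```bash
#!/usr/bin/env bash
# Hypothetical restore counterpart to the GRIB backup command above.
# Checks that the archive is readable before touching the volume.

restore_grib() {
  local archive=$1 volume=${2:-predictor_grib_data}
  # tar -t lists the contents; it fails on a truncated or non-gzip file.
  tar -tzf "$archive" >/dev/null || { echo "cannot read $archive" >&2; return 1; }
  # Mount the archive's directory read-only and extract into the volume.
  docker run --rm \
    -v "$volume":/data \
    -v "$(cd "$(dirname "$archive")" && pwd)":/backup:ro \
    alpine tar xzf "/backup/$(basename "$archive")" -C /data
}

# Usage: restore_grib ./backup/grib-20240101.tar.gz
```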
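
### Automated Backup Script

```bash
#!/usr/bin/env bash
# scripts/backup.sh
set -euo pipefail

BACKUP_DIR="backup/$(date +%Y%m%d)"
mkdir -p "$BACKUP_DIR"

# Redis backup. BGSAVE returns immediately, so wait for LASTSAVE to change
# instead of sleeping a fixed time. Add -a <password> to redis-cli if
# requirepass is set (use predictor-redis-prod for the production stack).
before=$(docker exec predictor-redis redis-cli LASTSAVE)
docker exec predictor-redis redis-cli BGSAVE
for _ in $(seq 1 60); do
  [ "$(docker exec predictor-redis redis-cli LASTSAVE)" != "$before" ] && break
  sleep 1
done
[ "$(docker exec predictor-redis redis-cli LASTSAVE)" != "$before" ] || { echo "BGSAVE did not finish" >&2; exit 1; }
docker cp predictor-redis:/data/dump.rdb "$BACKUP_DIR/redis.rdb"

# GRIB data backup
docker run --rm -v predictor_grib_data:/data -v "$(pwd)/$BACKUP_DIR":/backup \
  alpine tar czf /backup/grib.tar.gz -C /data .

echo "Backup completed: $BACKUP_DIR"
```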
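
## Troubleshooting

### Common Issues

1. **Redis Connection Issues:**

   ```bash
   # Check Redis status (add -a <password> if requirepass is set)
   docker-compose exec redis redis-cli ping

   # Check that the predictor container can resolve and reach redis:6379.
   # Redis does not speak HTTP, so wget errors even when the port is reachable;
   # only a DNS failure or "Connection refused" points to a network problem.
   docker-compose exec predictor wget -O- http://redis:6379
   ```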

2. **GRIB Download Failures:**

   ```bash
   # Check disk space
   docker-compose exec predictor df -h /tmp/grib

   # Check internet connectivity
   docker-compose exec predictor wget -O- https://nomads.ncep.noaa.gov/
   ```

3. **Memory Issues:**

   ```bash
   # Check memory usage
   docker stats

   # Check container logs
   docker-compose logs predictor | grep -i memory
   ```
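
The three checks above can be scripted with the hypothetical helpers below, none of which ship with the repository:

- `redis_ping` uses bash's `/dev/tcp` redirection. It only works where port 6379 is reachable from the host, which the production Compose file provides by publishing it.
- `check_free_mb` parses POSIX `df -P -k` output. It assumes the predictor image's `df` supports those flags.
- `high_memory_containers` reads a single `docker stats` sample.

```bash
#!/usr/bin/env bash
# Hypothetical troubleshooting helpers, one per issue above.

# 1. Redis: interpret the first line Redis sends back to PING.
describe_redis_reply() {
  case "$1" in
    +PONG)    echo "redis reachable" ;;
    -NOAUTH*) echo "redis reachable, password required" ;;
    *)        echo "unexpected reply: $1" >&2; return 1 ;;
  esac
}

# Send PING over bash's /dev/tcp (no redis-cli needed on the host).
redis_ping() {
  local host=${1:-127.0.0.1} port=${2:-6379} reply
  reply=$(
    exec 3<>"/dev/tcp/$host/$port" || exit 1
    printf 'PING\r\n' >&3          # inline-command form of PING
    IFS= read -r -t 3 line <&3 || exit 1
    printf '%s' "${line%$'\r'}"    # drop the trailing CR of the RESP line
  ) || { echo "cannot connect to $host:$port" >&2; return 1; }
  describe_redis_reply "$reply"
}

# 2. Disk: read `df -P -k` output on stdin; succeed if at least MIN_MB is free.
check_free_mb() {
  local min_mb=$1 avail_kb
  avail_kb=$(awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || { echo "no df output on stdin" >&2; return 1; }
  echo "$((avail_kb / 1024)) MB free (need $min_mb MB)"
  [ "$((avail_kb / 1024))" -ge "$min_mb" ]
}

# 3. Memory: print containers at or above THRESHOLD percent of their limit.
high_memory_containers() {
  local threshold=${1:-80}
  docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' |
    awk -v t="$threshold" '{ p = $2; sub(/%$/, "", p); if (p + 0 >= t) print $1, $2 }'
}

# Usage:
#   redis_ping 127.0.0.1 6379
#   docker-compose exec -T predictor df -P -k /tmp/grib | check_free_mb 10240
#   high_memory_containers 80
```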

### Performance Tuning

1. **Redis Optimization:**

   ```bash
   # Increase Redis memory
   GSN_PREDICTOR_REDIS_MAXMEMORY=1gb

   # Optimize Redis settings (in the manifests above, edit the redis-server
   # command, and raise the container or pod memory limit to match)
   redis-server --maxmemory 1gb --maxmemory-policy allkeys-lru
   ```

2. **GRIB Processing:**

   ```bash
   # Increase parallel workers
   GSN_PREDICTOR_GRIB_PARALLEL=8

   # Optimize cache TTL
   GSN_PREDICTOR_GRIB_CACHE_TTL=2h
   ```

3. **Container Resources:**

   ```yaml
   # In docker-compose.yml, under the predictor service
   deploy:
     resources:
       limits:
         memory: 2G
         cpus: '1.0'
       reservations:
         memory: 1G
         cpus: '0.5'
   ```

## Security Considerations

1. **Network Security:**
   - Use internal networks for service communication
   - Expose only necessary ports
   - Use a reverse proxy for external access

2. **Container Security:**
   - Run as a non-root user
   - Use minimal base images
   - Apply security updates regularly

3. **Data Security:**
   - Encrypt sensitive environment variables
   - Use secrets management for passwords
   - Take regular backups

4. **Access Control:**
   - Implement API authentication
   - Use HTTPS in production
   - Monitor access logs
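
You can spot-check two of these items from the host: containers running as root, and ports published on all interfaces. The audit helper below is hypothetical (not part of the repository). The usage line assumes the container names from `docker-compose.prod.yml`.

```bash
#!/usr/bin/env bash
# Hypothetical audit for two of the items above: root containers and
# ports published on all host interfaces.

audit_container() {
  local name=$1 user ports rc=0
  user=$(docker inspect --format '{{.Config.User}}' "$name") || return 1
  # An empty User means no USER was configured, so the process runs as root.
  case "$user" in
    ""|root|0|0:*|root:*) echo "$name: runs as root"; rc=1 ;;
  esac
  # `docker port` prints lines such as "6379/tcp -> 0.0.0.0:6379".
  ports=$(docker port "$name")
  if printf '%s\n' "$ports" | grep -qE -e '-> (0\.0\.0\.0|\[::\]):'; then
    echo "$name: port published on all interfaces"; rc=1
  fi
  return "$rc"
}

# Usage: for c in predictor-prod predictor-redis-prod; do audit_container "$c"; done
```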