# Deploying stratoflights-predictor

The predictor is a single static Go binary with no database and no required external services. It downloads NOAA GFS/GEFS wind data to **node-local disk** and serves the REST API (see `/docs` or `api/rest/predictor.swagger.yml`).

It is an **internal backend**: the public entrypoint is the stratoflights API gateway, which calls the predictor over an internal overlay network. The predictor enforces no auth of its own.

## Environments

| Environment | File | Notes |
|---|---|---|
| Local dev | `docker-compose.yml` | one instance, metrics off, named volume |
| Staging (single host) | `docker-compose.staging.yml` | all features + bundled Prometheus |
| Production (Swarm) | `docker-compose.swarm.yml` | node-pinned, replicated, metrics |

```bash
# Local
docker compose up --build
curl localhost:8080/ready

# Staging (single host, exercises the metrics pipeline)
docker compose -f docker-compose.staging.yml up --build
# Prometheus at :9090, predictor target should be UP

# Production: see below
```

## Production (Docker Swarm)

### Storage and node placement: the important part

The wind dataset is ~8.9 GiB (0.5°) and must live on **local disk, never NFS**. To bound the number of copies, the service is pinned to nodes carrying the `predictor.data=true` label; **label at most two nodes**. Each labelled node keeps exactly one copy under a node-local bind mount.

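Because the dataset must never sit on NFS, it can be worth checking which filesystem backs the data path on a candidate node before labelling it. The following is a minimal sketch, not part of the repository: `is_network_fs`, `fstype_of`, and `check_local_disk` are hypothetical helper names, the list of network filesystem types is illustrative rather than exhaustive, and `findmnt` (util-linux) is assumed to be installed on the host.

```bash
# Hypothetical helper: succeeds (exit 0) when the given filesystem type
# looks like a network filesystem. The pattern list is an assumption.
is_network_fs() {
  case "$1" in
    nfs*|cifs|smb*|ceph|fuse.glusterfs|fuse.sshfs|9p) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints the filesystem type backing an existing path.
fstype_of() {
  findmnt -n -o FSTYPE -T "$1"
}

# Hypothetical helper: reports whether a path is on local disk.
check_local_disk() {
  path="$1"
  if ! fstype="$(fstype_of "$path")"; then
    echo "cannot determine filesystem for $path" >&2
    return 2
  fi
  if is_network_fs "$fstype"; then
    echo "FAIL: $path is on $fstype (network filesystem)"
    return 1
  fi
  echo "OK: $path is on $fstype"
}

# Usage (the path must already exist; otherwise check its parent):
# check_local_disk /srv
```

The check only inspects the mount type; it says nothing about free space, so confirming that the disk can hold the ~8.9 GiB dataset remains a separate step.
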
On **each** labelled node, provision the local directories and a writable owner for the non-root container (uid:gid `65532:65532`):

```bash
sudo mkdir -p /srv/predictor/data /srv/predictor/elevation
sudo chown -R 65532:65532 /srv/predictor
# (optional) seed the elevation dataset so descent terminates at ground level:
#   python3 scripts/build_elevation.py /srv/predictor/elevation/ruaumoko-dataset
```

Label the two storage nodes:

```bash
docker node update --label-add predictor.data=true
docker node update --label-add predictor.data=true
```

Replicas are spread one-per-node by default (redundancy across both copies). Scaling to multiple replicas **per** node is safe: they share the node-local volume and coordinate the download with an exclusive `flock`, so only one process per node fetches the dataset while the others wait and load the committed file. To scale: `docker service scale predictor_predictor=4` (≤2 per node).

### Network

The gateway and Prometheus reach the predictor over a shared overlay. Create it once and have the gateway stack join the same external network:

```bash
docker network create -d overlay --attachable stratoflights-net
```

The service is published only on that network under the alias `predictor` (`http://predictor:8080`). No public Traefik router: the gateway is the edge.

### Deploy

Via the CI pipeline (recommended): push a `v*` tag → the image is built and the stack is deployed through the Swarmpit API.

Manually:

```bash
TAG=v1.0.0 docker stack deploy -c docker-compose.swarm.yml --with-registry-auth predictor
```

or import `docker-compose.swarm.yml` into Swarmpit and set `TAG`.

### Configuration

All settings are env vars (file/env/flag precedence; see README).
Production defaults are in `docker-compose.swarm.yml`:

| Variable | Purpose |
|---|---|
| `PREDICTOR_DATA_DIR=/data` | node-local dataset dir (bind mount) |
| `PREDICTOR_ELEVATION_DATASET=/srv/ruaumoko-dataset` | optional terrain data |
| `PREDICTOR_SOURCE=gfs-0p50-3h` | `gfs-0p50-3h`, `gfs-0p25-3h`, `gfs-0p25-1h`, `gefs-0p50-3h` |
| `PREDICTOR_DOWNLOAD_PARALLEL=16` | concurrent GRIB downloads |
| `PREDICTOR_UPDATE_INTERVAL=6h` | forecast refresh cadence |
| `PREDICTOR_METRICS_ENABLED=true` | expose `/metrics` |

No Docker secrets are needed: the predictor has no database or credentials.

### Health

- `GET /health`: liveness (always 200 while the process runs). The container `HEALTHCHECK` calls the binary's `-healthcheck` mode (no curl in the image).
- `GET /ready`: readiness (200 only once a dataset is loaded). The gateway should gate traffic on this; Swarm does **not** kill a container that is still performing its first download thanks to the 120s `start_period`.

### Metrics

`/metrics` exposes Prometheus counters (`predictor_predictions_total`, `predictor_downloads_total`, `predictor_download_bytes_total`) and the `predictor_active_dataset_epoch_seconds` gauge. The service carries `prometheus.scrape/port/path` deploy labels for Swarm service discovery; point your central Prometheus at the `stratoflights-net` network.

## CI/CD (Forgejo → Swarmpit)

`.forgejo/workflows/ci-cd.yml`:

1. **test** (every push/PR): `gofmt` check, `go vet`, `go build`, `go test -race`.
2. **build** (develop branch and `v*` tags): buildx `linux/amd64` image pushed to `git.intra.yksa.space/web/predictor` (`:develop`, or `:` + `:latest`).
3. **deploy-staging** (develop) / **deploy-production** (`v*` tags): deploy `docker-compose.swarm.yml` to the environment's Swarmpit stack via `deploy/swarmpit-deploy.sh`.

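The checks in the **test** job can be approximated locally before pushing. This is a minimal sketch under stated assumptions: `run_checks` is a hypothetical helper name, the commands are assumed to run from the repository root over all packages (`./...`), and the exact invocations in `.forgejo/workflows/ci-cd.yml` may differ.

```bash
# Hypothetical local approximation of the CI "test" job.
# Assumption: all packages are checked via the ./... pattern.
run_checks() {
  # gofmt -l lists files whose formatting differs; any output is a failure.
  unformatted="$(gofmt -l .)" || return 1
  if [ -n "$unformatted" ]; then
    echo "gofmt needed on:" >&2
    echo "$unformatted" >&2
    return 1
  fi
  # Stop at the first failing step, as a CI job would.
  go vet ./... && go build ./... && go test -race ./...
}
```

Calling `run_checks` from the repository root returns non-zero as soon as one step fails, so it can also serve as a Git pre-push hook body.
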
Configure runner secrets (scope staging/production via Forgejo environments):

- `REGISTRY_USERNAME`, `REGISTRY_PASSWORD`: container registry
- `SWARMPIT_URL`, `SWARMPIT_TOKEN`, `STACK_NAME`: Swarmpit deploy target
- `CA_CERTIFICATES`: optional PEM bundle if Swarmpit uses a private CA

Cut a release:

```bash
git tag v1.0.0 && git push origin v1.0.0
```

## Operations

```bash
docker service ls --filter label=com.docker.stack.namespace=predictor
docker service logs -f predictor_predictor
docker service scale predictor_predictor=2   # ≤2 per labelled node
docker service rollback predictor_predictor
```

Trigger a dataset refresh or inspect jobs through the admin API:

```bash
curl -X POST http://predictor:8080/api/v1/admin/datasets -d '{"latest":true}'
curl http://predictor:8080/api/v1/admin/jobs
curl http://predictor:8080/api/v1/admin/status
```
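
After a scale, rollback, or dataset refresh, it can be useful to block until `GET /ready` returns 200 before resuming traffic. The following is a minimal sketch, not a tool shipped with the predictor: `wait_until` and `probe_ready` are hypothetical helper names, and it assumes the not-ready state is reported with an HTTP error status (400 or above) so that `curl -f` exits non-zero. It must run from a host or container attached to `stratoflights-net`.

```bash
# Hypothetical helper: run a command up to ATTEMPTS times (at least once),
# sleeping INTERVAL seconds between failed tries.
wait_until() {
  attempts="$1"
  interval="$2"
  shift 2
  i=1
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$interval"
  done
}

# Hypothetical helper: probe the readiness endpoint of the given base URL.
# -f makes curl exit non-zero on HTTP status 400 or above.
probe_ready() {
  curl -fsS -o /dev/null --max-time 5 "${1:-http://predictor:8080}/ready"
}

# Usage: poll every 5 seconds, give up after 60 attempts.
# wait_until 60 5 probe_ready http://predictor:8080
```

Keeping the retry loop separate from the probe means the same `wait_until` can wrap other checks, such as a query against `/api/v1/admin/status`.
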