Deploying stratoflights-predictor
The predictor is a single static Go binary with no database and no required
external services. It downloads NOAA GFS/GEFS wind data to node-local disk
and serves the REST API (see /docs or api/rest/predictor.swagger.yml).
It is an internal backend: the public entrypoint is the stratoflights API gateway, which calls the predictor over an internal overlay network. The predictor enforces no auth of its own.
Environments
| Environment | File | Notes |
|---|---|---|
| Local dev | docker-compose.yml | one instance, metrics off, named volume |
| Staging (single host) | docker-compose.staging.yml | all features + bundled Prometheus |
| Production (Swarm) | docker-compose.swarm.yml | node-pinned, replicated, metrics |
# Local
docker compose up --build
curl localhost:8080/ready
# Staging (single host, exercises the metrics pipeline)
docker compose -f docker-compose.staging.yml up --build
# Prometheus at :9090, predictor target should be UP
# Production — see below
Production (Docker Swarm)
Storage and node placement — the important part
The wind dataset is ~8.9 GiB (0.5°) and must live on local disk, never NFS.
To bound the number of copies, the service is pinned to nodes carrying the
predictor.data=true label; label at most two nodes. Each labelled node
keeps exactly one copy under a node-local bind mount.
On each labelled node, create the local directories and make them writable
by the non-root container user (uid:gid 65532:65532):
sudo mkdir -p /srv/predictor/data /srv/predictor/elevation
sudo chown -R 65532:65532 /srv/predictor
# (optional) seed the elevation dataset so descent terminates at ground level:
# python3 scripts/build_elevation.py /srv/predictor/elevation/ruaumoko-dataset
Label the two storage nodes:
docker node update --label-add predictor.data=true <node-a>
docker node update --label-add predictor.data=true <node-b>
Replicas are spread one-per-node by default (redundancy across both copies).
Scaling to multiple replicas per node is safe: they share the node-local
volume and coordinate the download with an exclusive flock, so only one
process per node fetches the dataset — the others wait and load the committed
file. To scale: docker service scale predictor_predictor=4 (≤2 per node).
Network
The gateway and Prometheus reach the predictor over a shared overlay. Create it once and have the gateway stack join the same external network:
docker network create -d overlay --attachable stratoflights-net
The service is published only on that network under the alias predictor
(http://predictor:8080). No public Traefik router — the gateway is the edge.
Deploy
Via the CI pipeline (recommended): push a v* tag → the image is built and the
stack is deployed through the Swarmpit API. Manually:
TAG=v1.0.0 docker stack deploy -c docker-compose.swarm.yml --with-registry-auth predictor
or import docker-compose.swarm.yml into Swarmpit and set TAG.
Configuration
Every setting can be supplied as an environment variable; config file and
flags are also supported, with the precedence described in the README.
Production defaults are in docker-compose.swarm.yml:
| Variable | Purpose |
|---|---|
| PREDICTOR_DATA_DIR=/data | node-local dataset dir (bind mount) |
| PREDICTOR_ELEVATION_DATASET=/srv/ruaumoko-dataset | optional terrain data |
| PREDICTOR_SOURCE=gfs-0p50-3h | source dataset: gfs-0p50-3h, gfs-0p25-3h, gfs-0p25-1h, gefs-0p50-3h |
| PREDICTOR_DOWNLOAD_PARALLEL=16 | concurrent GRIB downloads |
| PREDICTOR_UPDATE_INTERVAL=6h | forecast refresh cadence |
| PREDICTOR_METRICS_ENABLED=true | expose /metrics |
No Docker secrets are needed — the predictor has no database or credentials.
Health
- GET /health: liveness (always 200 while the process runs). The container HEALTHCHECK calls the binary's -healthcheck mode (no curl in the image).
- GET /ready: readiness (200 only once a dataset is loaded). The gateway should gate traffic on this; Swarm does not kill a container that is still performing its first download thanks to the 120s start_period.
Metrics
/metrics exposes Prometheus counters (predictor_predictions_total,
predictor_downloads_total, predictor_download_bytes_total) and the
predictor_active_dataset_epoch_seconds gauge. The service carries
prometheus.scrape/port/path deploy labels for Swarm service discovery; point
your central Prometheus at the stratoflights-net network.
CI/CD (Forgejo → Swarmpit)
.forgejo/workflows/ci-cd.yml:
- test (every push/PR): gofmt check, go vet, go build, go test -race.
- build (develop branch and v* tags): buildx linux/amd64 image pushed to git.intra.yksa.space/web/predictor (:develop, or :<version> + :latest).
- deploy-staging (develop) / deploy-production (v* tags): deploy docker-compose.swarm.yml to the environment's Swarmpit stack via deploy/swarmpit-deploy.sh.
Configure runner secrets (scope staging/production via Forgejo environments):
- REGISTRY_USERNAME, REGISTRY_PASSWORD: container registry
- SWARMPIT_URL, SWARMPIT_TOKEN, STACK_NAME: Swarmpit deploy target
- CA_CERTIFICATES: optional PEM bundle if Swarmpit uses a private CA
Cut a release:
git tag v1.0.0 && git push origin v1.0.0
Operations
docker service ls --filter label=com.docker.stack.namespace=predictor
docker service logs -f predictor_predictor
docker service scale predictor_predictor=2 # ≤2 per labelled node
docker service rollback predictor_predictor
Trigger a dataset refresh or inspect jobs through the admin API:
curl -X POST -H 'Content-Type: application/json' http://predictor:8080/api/v1/admin/datasets -d '{"latest":true}'
curl http://predictor:8080/api/v1/admin/jobs
curl http://predictor:8080/api/v1/admin/status