stratoflights-predictor

High-altitude balloon trajectory prediction service. Forecasts ascent, descent, and float trajectories from NOAA GFS and GEFS wind data, exposed as a REST API.

The trajectory engine is a propagator-and-constraint system: any flight profile can be expressed as a chain of propagators (constant-rate ascent, parachute descent, piecewise rates with absolute / profile-relative / propagator-relative timing, wind drift) with attached constraints (scalar comparisons over altitude or time, terrain contact, geographic polygons). Constraints can stop the profile, hand off to a fallback propagator, or clip the violated coordinate to the boundary. The legacy Tawhiri request shape is kept as a compatibility endpoint so existing clients work unchanged.
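The chain-of-propagators idea can be illustrated with a deliberately tiny sketch. It is not the engine's code: the `State`, `run_profile`, and stage tuple names are hypothetical, wind drift is omitted, and fixed Euler steps stand in for the real integrator. Each stage runs until its constraint reports a violation, then hands the state to the next stage.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class State:
    t: float         # seconds since launch
    altitude: float  # metres


# Hypothetical shapes: a propagator returns a vertical rate in m/s,
# a constraint returns True once it is violated.
Propagator = Callable[[State], float]
Constraint = Callable[[State], bool]


def run_profile(start: State,
                stages: List[Tuple[str, Propagator, Constraint]],
                dt: float = 1.0) -> List[Tuple[str, State]]:
    """Run each stage with fixed Euler steps until its constraint is violated."""
    state, log = start, []
    for name, rate, violated in stages:
        while not violated(state):
            state = State(state.t + dt, state.altitude + rate(state) * dt)
        log.append((name, state))  # state at which the stage stopped
    return log


stages = [
    ("ascent", lambda s: 5.0, lambda s: s.altitude >= 30000),
    ("descent", lambda s: -5.0, lambda s: s.altitude <= 0),
]
log = run_profile(State(0.0, 0.0), stages)
```

Here the only constraint action modelled is `stop`; `fallback` and `clip` would need the loop to swap the propagator or clamp the coordinate instead of ending the stage.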

Quick start

make build           # produces bin/{predictor,predictor-cli,compare-tawhiri}
./bin/predictor      # downloads ~9 GB of GFS data on first start

./bin/predictor-cli ready
./bin/predictor-cli predict \
  launch_latitude=52.2 launch_longitude=0.1 \
  launch_datetime=2026-03-28T12:00:00Z \
  ascent_rate=5 burst_altitude=30000 descent_rate=5

Configuration

Configuration is layered, and each later source overrides the earlier ones: built-in defaults < YAML file < env vars < CLI flags.

| Setting | Env var | CLI flag | Default |
|---|---|---|---|
| HTTP port | PREDICTOR_PORT | -port | 8080 |
| Data directory | PREDICTOR_DATA_DIR | -data-dir | /tmp/predictor-data |
| Elevation dataset | PREDICTOR_ELEVATION_DATASET | -elevation | /srv/ruaumoko-dataset |
| Source variant | PREDICTOR_SOURCE | | gfs-0p50-3h |
| Download parallelism | PREDICTOR_DOWNLOAD_PARALLEL | -download-parallel | 8 |
| Download bandwidth (bytes/s; 0 = unlimited) | PREDICTOR_DOWNLOAD_BANDWIDTH | -download-bandwidth | 0 |
| Scheduler interval | PREDICTOR_UPDATE_INTERVAL | -update-interval | 6h |
| Dataset freshness TTL | PREDICTOR_DATASET_TTL | -freshness-ttl | 48h |
| Metrics enabled | PREDICTOR_METRICS_ENABLED | -metrics | true |
| Metrics HTTP path | PREDICTOR_METRICS_PATH | -metrics-path | /metrics |
| Log level | PREDICTOR_LOG_LEVEL | -log-level | info |

YAML config mirrors the same structure; see internal/config/config.go.
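The precedence order can be sketched as a chain of dictionary overrides. This is an illustration only: the internal key names (`port`, `data_dir`, `log_level`) and the `resolve` function are hypothetical, while the environment variable names and defaults come from the table above.

```python
from typing import Dict, Mapping

# Defaults and env var names are taken from the configuration table;
# the dictionary keys are hypothetical.
DEFAULTS = {"port": "8080", "data_dir": "/tmp/predictor-data", "log_level": "info"}
ENV_NAMES = {
    "port": "PREDICTOR_PORT",
    "data_dir": "PREDICTOR_DATA_DIR",
    "log_level": "PREDICTOR_LOG_LEVEL",
}


def resolve(yaml_cfg: Mapping[str, str],
            env: Mapping[str, str],
            flags: Mapping[str, str]) -> Dict[str, str]:
    """Apply the layers in order so that each later layer wins."""
    cfg = dict(DEFAULTS)           # 1. built-in defaults
    cfg.update(yaml_cfg)           # 2. YAML file
    for key, name in ENV_NAMES.items():
        if name in env:            # 3. environment variables
            cfg[key] = env[name]
    cfg.update(flags)              # 4. CLI flags
    return cfg
```

For example, `resolve({"port": "9000"}, {"PREDICTOR_PORT": "9100"}, {})` yields port `9100`, because the environment layer sits above the YAML file.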

Supported source variants:

| source | Resolution | Cadence | Notes |
|---|---|---|---|
| gfs-0p50-3h | 0.5° | 3h to 192h | historical Tawhiri default |
| gfs-0p25-3h | 0.25° | 3h to 192h | |
| gfs-0p25-1h | 0.25° | 1h to 120h | |
| gefs-0p50-3h | 0.5° | 3h to 192h | 21-member ensemble; each member is a separate dataset |
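The variant names appear to encode model, grid resolution, and step cadence. The sketch below decodes that naming pattern and lists the forecast hours a variant would span. Both helper names are hypothetical, and including hour 0 in the step list is an assumption rather than something the table states.

```python
from typing import List, Tuple


def parse_variant(name: str) -> Tuple[str, float, int]:
    """Split e.g. 'gfs-0p50-3h' into ('gfs', 0.5, 3)."""
    model, resolution, cadence = name.split("-")
    degrees = float(resolution.replace("p", "."))  # '0p50' -> 0.5
    step_hours = int(cadence.rstrip("h"))          # '3h'   -> 3
    return model, degrees, step_hours


def forecast_hours(step_hours: int, horizon_hours: int) -> List[int]:
    """Forecast hours from 0 up to and including the horizon (assumed inclusive)."""
    return list(range(0, horizon_hours + 1, step_hours))
```

Under that assumption, a 3-hourly variant out to 192 h has 65 forecast steps and the hourly variant out to 120 h has 121.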

REST API

Tawhiri-compatible (legacy)

GET /api/v1/prediction — preserves the exact request and response shape of the upstream Cambridge University Spaceflight predictor.

GET /ready — returns {"status":"ok", "dataset_time":"..."} once a dataset is loaded.

Profile-driven (synchronous)

POST /api/v2/prediction — execute a profile synchronously and return the trajectory. Request shape:

{
  "launch": { "time": "2026-03-28T12:00:00Z", "latitude": 52.2, "longitude": 0.1, "altitude": 0 },
  "direction": "forward",
  "profile": [
    {
      "name": "ascent",
      "model": { "type": "constant_rate", "rate": 5, "include_wind": true },
      "constraints": [{ "type": "altitude", "op": ">=", "limit": 30000 }]
    },
    {
      "name": "descent",
      "model": { "type": "parachute_descent", "sea_level_rate": 5, "include_wind": true },
      "constraints": [{ "type": "terrain_contact" }]
    }
  ],
  "globals": [{ "type": "time", "op": ">", "limit": 1799999999 }]
}

Model types: constant_rate, parachute_descent, piecewise, wind. Constraint types: altitude, time, terrain_contact, polygon. Operators: <, <=, >, >=, ==. Actions: stop (default), fallback, clip. Direction: forward (default) or reverse.
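A client may want to check a request against this vocabulary before sending it. The following is a minimal sketch, not the server's validation logic. The `validate` and `check_constraint` helpers are hypothetical, and so is the `action` key name: the README lists the action values but not the field that carries them.

```python
MODEL_TYPES = {"constant_rate", "parachute_descent", "piecewise", "wind"}
CONSTRAINT_TYPES = {"altitude", "time", "terrain_contact", "polygon"}
OPERATORS = {"<", "<=", ">", ">=", "=="}
ACTIONS = {"stop", "fallback", "clip"}
DIRECTIONS = {"forward", "reverse"}


def check_constraint(c: dict, where: str, errors: list) -> None:
    if c.get("type") not in CONSTRAINT_TYPES:
        errors.append(f"{where}: unknown constraint type {c.get('type')!r}")
    # Not every constraint carries an operator (terrain_contact does not).
    if "op" in c and c["op"] not in OPERATORS:
        errors.append(f"{where}: unknown operator {c['op']!r}")
    # "action" is an assumed key name; stop is the documented default.
    if c.get("action", "stop") not in ACTIONS:
        errors.append(f"{where}: unknown action {c['action']!r}")


def validate(request: dict) -> list:
    """Return a list of human-readable problems; empty means no issue found."""
    errors = []
    if request.get("direction", "forward") not in DIRECTIONS:
        errors.append(f"unknown direction {request['direction']!r}")
    for stage in request.get("profile", []):
        where = f"stage {stage.get('name', '?')}"
        model_type = stage.get("model", {}).get("type")
        if model_type not in MODEL_TYPES:
            errors.append(f"{where}: unknown model type {model_type!r}")
        for c in stage.get("constraints", []):
            check_constraint(c, where, errors)
    for c in request.get("globals", []):
        check_constraint(c, "globals", errors)
    return errors
```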

Piecewise segments support a reference field (absolute, profile_start, or propagator_start) so a single rate schedule can be reused across profiles with different launch times.
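One way to read the three reference modes is as a choice of time origin for the segment offsets. The sketch below assumes a schedule of `(offset_seconds, rate)` pairs and a hypothetical `active_rate` helper; the actual segment field names are not shown in this README.

```python
from bisect import bisect_right
from typing import List, Tuple


def active_rate(schedule: List[Tuple[float, float]],
                reference: str,
                t: float,
                profile_start: float,
                propagator_start: float) -> float:
    """Return the rate of the segment active at absolute time t.

    schedule holds (offset_seconds, rate) pairs sorted by offset.
    Before the first segment starts, the first rate is returned.
    """
    origin = {
        "absolute": 0.0,                       # offsets are already absolute times
        "profile_start": profile_start,        # offsets count from launch
        "propagator_start": propagator_start,  # offsets count from stage start
    }[reference]
    starts = [origin + offset for offset, _ in schedule]
    index = bisect_right(starts, t) - 1
    return schedule[max(index, 0)][1]
```

With relative references the same schedule shifts along with the launch or stage start, which is what makes it reusable across profiles.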

The response includes per-stage trajectories, detailed termination info (violation state + refined state + constraint name), an events array of non-fatal observations (e.g. above_model when the altitude exceeds the dataset's topmost pressure level), and dataset metadata.
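The difference between the violation state and the refined state can be pictured as follows: the integrator overshoots the constraint boundary by up to one step, and a refinement pass narrows the crossing down. This is a sketch under assumptions, using a hypothetical `refine_crossing` helper and an `altitude_at` callback in place of real state interpolation.

```python
from typing import Callable


def refine_crossing(t_ok: float,
                    t_bad: float,
                    altitude_at: Callable[[float], float],
                    limit: float,
                    tol: float = 1e-3) -> float:
    """Bisect [t_ok, t_bad] for the time at which altitude reaches limit.

    t_ok is the last time the constraint held, t_bad the first time it was
    violated. The returned time is at most tol seconds past the crossing.
    """
    while t_bad - t_ok > tol:
        mid = 0.5 * (t_ok + t_bad)
        if altitude_at(mid) >= limit:
            t_bad = mid   # still violated: crossing is earlier
        else:
            t_ok = mid    # still valid: crossing is later
    return t_bad


# A 5 m/s ascent stepped every 7 s overshoots 30000 m between t=5999 and t=6006.
refined_t = refine_crossing(5999.0, 6006.0, lambda t: 5.0 * t, 30000.0)
```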

Profile-driven (asynchronous)

POST /api/v1/predictions — enqueue a prediction. Returns 202 with a job ID:

{"id":"842107d9-…","status":"pending","created_at":"…"}

GET /api/v1/predictions/{id} — poll status. When status == "complete", the response includes a result field with the full v2 PredictionResponse.

DELETE /api/v1/predictions/{id} — cancel a queued job.

A worker pool (http.async_workers, default 4) services the queue; completed results are retained for http.async_result_ttl (default 1h).
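A client-side polling loop for this flow might look like the sketch below. The `http_fetch` and `wait_for_result` helpers are hypothetical. Only the `pending` and `complete` statuses are documented above, so any other status is simply treated as not yet finished until the timeout expires; a real client would want to handle failure and cancellation explicitly.

```python
import json
import time
import urllib.request


def http_fetch(base_url: str):
    """Build a fetch function that GETs /api/v1/predictions/{id} as JSON."""
    def fetch(job_id: str) -> dict:
        url = f"{base_url}/api/v1/predictions/{job_id}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch


def wait_for_result(job_id: str, fetch, interval_s: float = 1.0,
                    timeout_s: float = 60.0,
                    sleep=time.sleep, clock=time.monotonic) -> dict:
    """Poll until the job reports status 'complete', then return its result."""
    deadline = clock() + timeout_s
    while True:
        body = fetch(job_id)
        if body.get("status") == "complete":
            return body["result"]
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not complete after {timeout_s} s")
        sleep(interval_s)
```

Against a local server this would be called as `wait_for_result(job_id, http_fetch("http://localhost:8080"))`. The `fetch`, `sleep`, and `clock` parameters are injectable so the loop can be exercised without a network.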

Dataset admin

GET    /api/v1/admin/datasets                  list stored datasets (epoch, subset, coverage, loaded?)
POST   /api/v1/admin/datasets                  trigger a download
DELETE /api/v1/admin/datasets/{filename}       delete by filename (DatasetID.Filename())
GET    /api/v1/admin/jobs                      list every download job
GET    /api/v1/admin/jobs/{id}                 fetch one job
DELETE /api/v1/admin/jobs/{id}                 cancel a running download
GET    /api/v1/admin/status                    consolidated status (uptime, mem, goroutines, jobs, datasets)

Trigger-download body:

{
  "epoch": "2026-03-28T06:00:00Z",
  "subset": {
    "region": { "min_lat": -10, "max_lat": 10, "min_lng": 0, "max_lng": 30 },
    "hour_range": { "min_hour": 0, "max_hour": 72 },
    "members": [5]
  }
}

{"latest": true} is a shortcut that refreshes the latest global dataset for the configured source. Each (epoch, subset) combination is a separate dataset; the loader auto-selects which loaded dataset covers a given prediction query.
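Coverage-based selection can be sketched as a containment test over region and forecast-hour range. Everything here is illustrative: the `Dataset` record, the `select_dataset` function, and the first-match policy are assumptions, the query is a single point in time, and longitude wrap-around at the antimeridian is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dataset:
    name: str
    min_lat: float
    max_lat: float
    min_lng: float
    max_lng: float
    min_hour: int
    max_hour: int

    def covers(self, lat: float, lng: float, hour: int) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lng <= lng <= self.max_lng
                and self.min_hour <= hour <= self.max_hour)


def select_dataset(loaded: List[Dataset], lat: float, lng: float,
                   hour: int) -> Optional[str]:
    """Return the first loaded dataset covering the query, or None."""
    for ds in loaded:
        if ds.covers(lat, lng, hour):
            return ds.name
    return None


regional = Dataset("regional", -10, 10, 0, 30, 0, 72)   # matches the body above
worldwide = Dataset("global", -90, 90, -180, 180, 0, 192)
```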

Metrics

GET /metrics — Prometheus text exposition. Counters: predictor_predictions_total{profile,status}, predictor_downloads_total, predictor_download_bytes_total, and a gauge predictor_active_dataset_epoch_seconds.
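For readers unfamiliar with the Prometheus text format, the sketch below renders one metric family the way a scrape response lays it out. The `render` helper is hypothetical and the label values used in the usage note are made up; only the metric names come from this README.

```python
from typing import Dict, List, Tuple


def escape(value: str) -> str:
    """Escape a label value per the text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(name: str, kind: str,
           samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one metric family: a TYPE line followed by one line per sample."""
    lines = [f"# TYPE {name} {kind}"]
    for labels, value in samples:
        if labels:
            inner = ",".join(f'{k}="{escape(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{inner}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Calling `render("predictor_predictions_total", "counter", [({"profile": "v2", "status": "ok"}, 12)])` produces a `# TYPE` line followed by `predictor_predictions_total{profile="v2",status="ok"} 12`.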

Architecture

cmd/
  predictor/                       main server
  predictor-cli/                   HTTP client
  compare-tawhiri/                 end-to-end validation against the public Tawhiri instance
internal/
  numerics/                        pure numerical primitives (interp, bisect, RK4, refinement)
  engine/                          propagator + constraint system + concrete models + registry
  weather/                         WindField interface; gfs/ — variant-parameterized GFS file format + WindField
  datasets/                        Source / Storage / Manager + transactional, resumable, subsettable downloads
                                   grib/  — shared GRIB downloader skeleton (idx parser, HTTP, parallel blit)
                                   gfs/   — GFS Source (URL templating only)
                                   gefs/  — GEFS Source (URL templating + member resolution)
  elevation/                       ruaumoko-format ground elevation reader
  config/                          layered file+env+CLI config
  metrics/                         Sink interface + Prometheus text impl
  api/                             HTTP transport
                                   tawhiri/   — legacy v1 endpoint via ogen
                                   v2/        — synchronous profile-driven endpoint
                                   async/     — asynchronous prediction jobs
                                   admin/     — dataset + service-status endpoints
                                   httpjson/  — tiny JSON response helpers
                                   middleware/
api/rest/predictor.swagger.yml     OpenAPI 3 spec for v1 + /ready
pkg/rest/                          ogen-generated code (regenerate via `make generate-ogen`)
docs/numerics.tex                  end-to-end mathematical reference
scripts/build_elevation.py         ETOPO 2022 → ruaumoko converter

Subsetting and ensembles

Each stored dataset is identified by DatasetID = (epoch, subset). A subset restricts the data fetched by region, forecast-hour range, or ensemble member. The downloader honours the subset by skipping out-of-range forecast steps and, for GEFS, using member-selecting URLs. Storage tracks each subset as a separate file whose filename includes a deterministic subset key, and the Manager exposes coverage so that per-query dataset selection picks the right one.
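A deterministic key means the same subset always maps to the same filename, however the request was spelled. The sketch below shows one common way to get that property, by hashing a canonical JSON encoding. It is not the scheme used by DatasetID.Filename(): the `subset_key` and `dataset_filename` helpers, the hash choice, and the filename layout are all assumptions.

```python
import hashlib
import json


def subset_key(subset: dict) -> str:
    """Hash a canonical encoding so key order in the input does not matter."""
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def dataset_filename(epoch: str, subset: dict) -> str:
    """Hypothetical layout: compact epoch stamp, underscore, subset key."""
    stamp = epoch.replace("-", "").replace(":", "")
    return f"{stamp}_{subset_key(subset)}.dat"
```

List order still matters in this sketch, so `"members": [3, 5]` and `"members": [5, 3]` would hash differently unless the list is sorted first.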

Deployment

The service can run as a local single instance, in a Docker container, or as a load-balanced cluster whose nodes share a filesystem for the dataset cache. The async API stores results in memory only, so cluster deployments need sticky sessions (or an equivalent routing rule) to ensure clients poll the same node they submitted to.

Validation

./bin/compare-tawhiri --server http://localhost:8080 runs an identical prediction against the local server and the public SondeHub Tawhiri instance, reporting the great-circle distance between landing points.
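For reference, a great-circle distance between two landing points can be computed with the haversine formula, as sketched below. Whether compare-tawhiri uses this exact formula or Earth radius is not stated here; the function name and the 6371 km mean radius are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def great_circle_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Haversine distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

One degree of longitude along the equator comes out at roughly 111.2 km with this radius.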

Numerical methods

docs/numerics.tex is the complete mathematical reference: state vector, equations of motion (constant rate, parachute drag, piecewise, wind transport), numerical methods (multilinear interpolation, bisection, classical RK4, binary-search termination refinement), constraint geometry (scalar comparisons, point-in-polygon with antimeridian handling), and design notes on the deferred items (WGS84/ECEF coordinate system, mass-aware drift, Monte Carlo).
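As a quick orientation before reading the reference, the classical RK4 step has the shape sketched below for a state stored as a list of floats. This is the textbook scheme, not an excerpt from internal/numerics; the `rk4_step` name and the list-based state are assumptions.

```python
import math
from typing import Callable, List

Derivative = Callable[[float, List[float]], List[float]]


def rk4_step(f: Derivative, t: float, y: List[float], h: float) -> List[float]:
    """Advance y' = f(t, y) by one step of size h with classical RK4."""
    def shifted(scale: float, k: List[float]) -> List[float]:
        return [yi + scale * ki for yi, ki in zip(y, k)]

    k1 = f(t, y)
    k2 = f(t + h / 2, shifted(h / 2, k1))
    k3 = f(t + h / 2, shifted(h / 2, k2))
    k4 = f(t + h, shifted(h, k3))
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


# Sanity check on y' = y, y(0) = 1, whose exact solution at t = 1 is e.
t, y, h = 0.0, [1.0], 0.1
for _ in range(10):
    y = rk4_step(lambda _t, s: [s[0]], t, y, h)
    t += h
```

After ten steps of 0.1 the result agrees with `math.e` to within about 1e-5, which reflects the method's fourth-order accuracy.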

References