predictor/README.md
2026-05-23 00:55:35 +09:00


# stratoflights-predictor
High-altitude balloon trajectory prediction service. Forecasts ascent, descent,
and float trajectories from NOAA GFS and GEFS wind data, exposed as a REST API.
The trajectory engine is a propagator-and-constraint system: any flight
profile can be expressed as a chain of propagators (constant-rate ascent,
parachute descent, piecewise rates with absolute / profile-relative /
propagator-relative timing, wind drift) with attached constraints
(scalar comparisons over altitude or time, terrain contact, geographic
polygons). Constraints can stop the profile, hand off to a fallback
propagator, or clip the violated coordinate to the boundary. The legacy
Tawhiri request shape is kept as a compatibility endpoint so existing
clients work unchanged.
## Quick start
```bash
make build           # produces bin/{predictor,predictor-cli,compare-tawhiri}
./bin/predictor      # downloads ~9 GB of GFS data on first start
./bin/predictor-cli ready
./bin/predictor-cli predict \
  launch_latitude=52.2 launch_longitude=0.1 \
  launch_datetime=2026-03-28T12:00:00Z \
  ascent_rate=5 burst_altitude=30000 descent_rate=5
```
## Configuration
Configuration is layered, with each later source overriding the earlier ones:
built-in defaults < YAML file < env vars < CLI flags.

| Setting | Env var | CLI flag | Default |
|---|---|---|---|
| HTTP port | `PREDICTOR_PORT` | `-port` | `8080` |
| Data directory | `PREDICTOR_DATA_DIR` | `-data-dir` | `/tmp/predictor-data` |
| Elevation dataset | `PREDICTOR_ELEVATION_DATASET` | `-elevation` | `/srv/ruaumoko-dataset` |
| Source variant | `PREDICTOR_SOURCE` | | `gfs-0p50-3h` |
| Download parallelism | `PREDICTOR_DOWNLOAD_PARALLEL` | `-download-parallel` | `8` |
| Download bandwidth (bytes/s; 0 = unlimited) | `PREDICTOR_DOWNLOAD_BANDWIDTH` | `-download-bandwidth` | `0` |
| Scheduler interval | `PREDICTOR_UPDATE_INTERVAL` | `-update-interval` | `6h` |
| Dataset freshness TTL | `PREDICTOR_DATASET_TTL` | `-freshness-ttl` | `48h` |
| Metrics enabled | `PREDICTOR_METRICS_ENABLED` | `-metrics` | `true` |
| Metrics HTTP path | `PREDICTOR_METRICS_PATH` | `-metrics-path` | `/metrics` |
| Log level | `PREDICTOR_LOG_LEVEL` | `-log-level` | `info` |
YAML config mirrors the same structure; see `internal/config/config.go`.
Supported source variants:
| `source` | Resolution | Cadence | Notes |
|---|---|---|---|
| `gfs-0p50-3h` | 0.5° | 3h to 192h | historical Tawhiri default |
| `gfs-0p25-3h` | 0.25° | 3h to 192h | |
| `gfs-0p25-1h` | 0.25° | 1h to 120h | |
| `gefs-0p50-3h` | 0.5° | 3h to 192h | 21-member ensemble; each member is a separate dataset |
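
The layering order at the top of this section can be illustrated with a minimal sketch. It is not the code in `internal/config/config.go`; the dictionary keys (`port`, `data_dir`, `source`) and the `resolve` function are hypothetical names chosen for the example, while the env var names and defaults are taken from the settings table.

```python
import os

# Built-in defaults, copied from the settings table.
DEFAULTS = {
    "port": "8080",
    "data_dir": "/tmp/predictor-data",
    "source": "gfs-0p50-3h",
}

# Env var names from the settings table, keyed by the hypothetical
# setting names used in this sketch.
ENV_VARS = {
    "port": "PREDICTOR_PORT",
    "data_dir": "PREDICTOR_DATA_DIR",
    "source": "PREDICTOR_SOURCE",
}


def resolve(yaml_values, env, cli_flags):
    """Merge the four layers; each later layer overrides the earlier ones."""
    merged = dict(DEFAULTS)           # layer 1: built-in defaults
    merged.update(yaml_values)        # layer 2: YAML file
    for key, var in ENV_VARS.items():
        if var in env:                # layer 3: environment variables
            merged[key] = env[var]
    merged.update(cli_flags)          # layer 4: CLI flags win
    return merged


if __name__ == "__main__":
    print(resolve({"port": "9000"}, os.environ, {"data_dir": "/var/lib/predictor"}))
```

With a YAML port of `9000` and `PREDICTOR_PORT=9100` in the environment, the sketch resolves the port to `9100`, because env vars sit above the YAML file in the order.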
## REST API
### Tawhiri-compatible (legacy)
`GET /api/v1/prediction` preserves the exact request and response shape of
the upstream Cambridge University Spaceflight predictor.

`GET /ready` returns `{"status":"ok", "dataset_time":"..."}` once a dataset
is loaded.
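
As a rough illustration of calling the compatibility endpoint, the sketch below builds the request URL with the Python standard library. It assumes the server listens on the default port `8080` and that the query parameters match the `key=value` arguments shown in the quick start; `prediction_url` is a hypothetical helper, not part of this repository.

```python
from urllib.parse import urlencode


def prediction_url(base_url, **params):
    """Build a GET /api/v1/prediction URL from Tawhiri-style query parameters."""
    return f"{base_url.rstrip('/')}/api/v1/prediction?{urlencode(params)}"


url = prediction_url(
    "http://localhost:8080",  # assumed: default HTTP port from the settings table
    launch_latitude=52.2,
    launch_longitude=0.1,
    launch_datetime="2026-03-28T12:00:00Z",
    ascent_rate=5,
    burst_altitude=30000,
    descent_rate=5,
)
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen(url)` and decoded with `json.load`.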
### Profile-driven (synchronous)
`POST /api/v2/prediction` executes a profile synchronously and returns the
trajectory. Request shape:
```json
{
  "launch": { "time": "2026-03-28T12:00:00Z", "latitude": 52.2, "longitude": 0.1, "altitude": 0 },
  "direction": "forward",
  "profile": [
    {
      "name": "ascent",
      "model": { "type": "constant_rate", "rate": 5, "include_wind": true },
      "constraints": [{ "type": "altitude", "op": ">=", "limit": 30000 }]
    },
    {
      "name": "descent",
      "model": { "type": "parachute_descent", "sea_level_rate": 5, "include_wind": true },
      "constraints": [{ "type": "terrain_contact" }]
    }
  ],
  "globals": [{ "type": "time", "op": ">", "limit": 1799999999 }]
}
```
- Model types: `constant_rate`, `parachute_descent`, `piecewise`, `wind`.
- Constraint types: `altitude`, `time`, `terrain_contact`, `polygon`.
- Operators: `<`, `<=`, `>`, `>=`, `==`.
- Actions: `stop` (default), `fallback`, `clip`.
- Direction: `forward` (default) or `reverse`.

Piecewise segments support a `reference` field (`absolute`, `profile_start`, or
`propagator_start`) so a single rate schedule can be reused across profiles
with different launch times.
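
One plausible reading of the three `reference` values is sketched below. The semantics are an assumption inferred from the names: the engine's real resolution logic may differ, and `resolve_segment_time` and its parameters are hypothetical. Under this reading, an offset of 600 s relative to `profile_start` lands 10 minutes after whatever launch time the profile is run with, which is what makes one schedule reusable.

```python
def resolve_segment_time(value, reference, profile_start, propagator_start):
    """Convert a segment time to absolute Unix seconds.

    value            -- the segment's time value, in seconds
    reference        -- "absolute", "profile_start", or "propagator_start"
    profile_start    -- Unix time at which the whole profile starts
    propagator_start -- Unix time at which the current propagator took over
    """
    if reference == "absolute":
        return value                      # already an absolute timestamp
    if reference == "profile_start":
        return profile_start + value      # offset from the start of the profile
    if reference == "propagator_start":
        return propagator_start + value   # offset from this propagator's start
    raise ValueError(f"unknown reference: {reference!r}")
```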
The response includes per-stage trajectories, detailed termination info
(violation state + refined state + constraint name), an `events` array of
non-fatal observations (e.g. `above_model` when altitude exceeded the dataset's
highest pressure level), and dataset metadata.
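
The relationship between the violation state and the refined state can be pictured with a small bisection sketch. It is an illustration only: it assumes states are `(time, altitude)` pairs and that the path between two integration steps is linear, whereas the engine's own refinement lives in `internal/numerics/` and may work differently. `triggered` and `refine` are hypothetical names.

```python
import operator

# The scalar comparison operators accepted in a constraint's "op" field.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}


def triggered(value, op, limit):
    """True when the scalar constraint `value op limit` fires."""
    return OPS[op](value, limit)


def refine(prev, violation, op, limit, iterations=40):
    """Bisect between the last good state and the violating state.

    Returns the earliest interpolated (time, altitude) found at which the
    constraint still fires, assuming it is clear at `prev` and fires at
    `violation`.
    """
    lo, hi = 0.0, 1.0  # step fraction: constraint clear at lo, firing at hi
    for _ in range(iterations):
        mid = (lo + hi) / 2
        alt = prev[1] + mid * (violation[1] - prev[1])
        if triggered(alt, op, limit):
            hi = mid
        else:
            lo = mid
    return (prev[0] + hi * (violation[0] - prev[0]),
            prev[1] + hi * (violation[1] - prev[1]))


# A 5 m/s ascent sampled every 60 s overshoots the 30000 m limit.
prev, violation = (5970.0, 29850.0), (6030.0, 30150.0)
refined = refine(prev, violation, ">=", 30000)
print(refined)
```

In this example the violation state is the overshooting sample at 30150 m, and the refined state converges on the crossing at roughly t = 6000 s and 30000 m.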
### Profile-driven (asynchronous)
`POST /api/v1/predictions` enqueues a prediction and returns `202` with a job ID:
```json
{"id":"842107d9-…","status":"pending","created_at":"…"}
```
`GET /api/v1/predictions/{id}` polls the job status. When `status == "complete"`,
the response includes a `result` field with the full v2 `PredictionResponse`.

`DELETE /api/v1/predictions/{id}` cancels a queued job.

A worker pool (`http.async_workers`, default 4) services the queue; completed
results are retained for `http.async_result_ttl` (default 1h).
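
A client-side polling loop for the flow above might look like the following sketch. The transport is injected as a `fetch` callable returning the decoded JSON body of `GET /api/v1/predictions/{id}`, so the loop itself makes no assumptions about the HTTP library. `wait_for_result`, `fetch`, and `sleep` are hypothetical names. Only the `pending` and `complete` statuses appear in this README; a real client should also handle whatever failure or cancellation statuses the service reports.

```python
import time


def wait_for_result(fetch, job_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll a job until it completes and return its `result` field.

    fetch(job_id) must return the decoded JSON body of the status endpoint.
    Raises TimeoutError if the job is not complete after `timeout` seconds
    of waiting.
    """
    waited = 0.0
    while True:
        body = fetch(job_id)
        if body["status"] == "complete":
            return body["result"]
        if waited >= timeout:
            raise TimeoutError(
                f"job {job_id} still {body['status']!r} after {timeout}s")
        sleep(interval)
        waited += interval
```

Keeping the interval well below `http.async_result_ttl` matters little in practice, but a client that stops polling for longer than the TTL should expect the result to be gone.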
### Dataset admin
```
GET    /api/v1/admin/datasets             list stored datasets (epoch, subset, coverage, loaded?)
POST   /api/v1/admin/datasets             trigger a download
DELETE /api/v1/admin/datasets/{filename}  delete by filename (DatasetID.Filename())
GET    /api/v1/admin/jobs                 list every download job
GET    /api/v1/admin/jobs/{id}            fetch one job
DELETE /api/v1/admin/jobs/{id}            cancel a running download
GET    /api/v1/admin/status               consolidated status (uptime, mem, goroutines, jobs, datasets)
```
Trigger-download body:
```json
{
  "epoch": "2026-03-28T06:00:00Z",
  "subset": {
    "region": { "min_lat": -10, "max_lat": 10, "min_lng": 0, "max_lng": 30 },
    "hour_range": { "min_hour": 0, "max_hour": 72 },
    "members": [5]
  }
}
```
`{"latest": true}` is a shortcut that refreshes the latest global dataset
for the configured source. Each `(epoch, subset)` combination is a
separate dataset; the loader auto-selects which loaded dataset covers a
given prediction query.
### Metrics
`GET /metrics` serves metrics in the Prometheus text exposition format: the
counters `predictor_predictions_total{profile,status}`,
`predictor_downloads_total`, and `predictor_download_bytes_total`, plus the
gauge `predictor_active_dataset_epoch_seconds`.
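
For quick checks without a Prometheus server, the exposition text can be parsed directly. The sketch below handles only plain `name{labels} value` lines. The sample values and the label values (`standard`, `ok`) are made up for illustration and are not real output from the service.

```python
SAMPLE = """\
# TYPE predictor_predictions_total counter
predictor_predictions_total{profile="standard",status="ok"} 12
predictor_downloads_total 3
predictor_active_dataset_epoch_seconds 1.7746776e+09
"""


def parse_metrics(text):
    """Parse Prometheus text exposition into {series: value}.

    Handles only simple `name{labels} value` lines; trailing timestamps
    and label values containing spaces are out of scope.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
print(metrics["predictor_active_dataset_epoch_seconds"])
```

Subtracting the gauge from the current Unix time gives the age of the active dataset, which can be compared against the configured freshness TTL.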
## Architecture
```
cmd/
  predictor/          main server
  predictor-cli/      HTTP client
  compare-tawhiri/    end-to-end validation against the public Tawhiri instance
internal/
  numerics/           pure numerical primitives (interp, bisect, RK4, refinement)
  engine/             propagator + constraint system + concrete models + registry
  weather/            WindField interface; gfs/: variant-parameterized GFS file format + WindField
  datasets/           Source / Storage / Manager + transactional, resumable, subsettable downloads
    grib/             shared GRIB downloader skeleton (idx parser, HTTP, parallel blit)
    gfs/              GFS Source (URL templating only)
    gefs/             GEFS Source (URL templating + member resolution)
  elevation/          ruaumoko-format ground elevation reader
  config/             layered file+env+CLI config
  metrics/            Sink interface + Prometheus text impl
  api/                HTTP transport
    tawhiri/          legacy v1 endpoint via ogen
    v2/               synchronous profile-driven endpoint
    async/            asynchronous prediction jobs
    admin/            dataset + service-status endpoints
    httpjson/         tiny JSON response helpers
    middleware/
api/rest/predictor.swagger.yml   OpenAPI 3 spec for v1 + /ready
pkg/rest/                        ogen-generated code (regenerate via `make generate-ogen`)
docs/numerics.tex                end-to-end mathematical reference
scripts/build_elevation.py       ETOPO 2022 → ruaumoko converter
```
## Subsetting and ensembles
Each stored dataset is identified by `DatasetID = (epoch, subset)`. A subset
restricts the data fetched by region, forecast-hour range, or ensemble
member. The downloader honours the subset (skipping out-of-range
forecast steps; member-selecting URLs for GEFS), the storage tracks each
subset as a separate file (filename includes a deterministic subset key),
and the Manager exposes coverage so per-query dataset selection picks the
right one.
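
Coverage-based selection can be sketched as a containment test over the subset bounds. The field names follow the trigger-download body. Everything else is an assumption: the `Subset` class and `select` function are hypothetical, the first matching dataset wins, and epoch choice, ensemble members, and antimeridian wrapping are ignored. The Manager's real selection policy may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    min_lat: float
    max_lat: float
    min_lng: float
    max_lng: float
    min_hour: int
    max_hour: int

    def covers(self, lat, lng, hour):
        """True when the point and forecast hour fall inside this subset."""
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lng <= lng <= self.max_lng
                and self.min_hour <= hour <= self.max_hour)


def select(datasets, lat, lng, hour):
    """Return the name of the first dataset covering the query, else None."""
    for name, subset in datasets:
        if subset.covers(lat, lng, hour):
            return name
    return None


regional = Subset(-10, 10, 0, 30, 0, 72)       # bounds from the example body
world = Subset(-90, 90, -180, 180, 0, 192)     # assumed global extent
stored = [("regional", regional), ("global", world)]
print(select(stored, 5.0, 12.0, 24))    # inside the regional subset
print(select(stored, 52.2, 0.1, 24))    # outside it, falls back to global
```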
## Deployment
Supported deployments: a local single instance, a Docker container, or a
load-balanced cluster sharing a filesystem for the dataset cache. The async
API stores results in memory only, so cluster deployments need sticky
sessions (or equivalent routing) so that clients poll the same node they
submitted to.
## Validation
`./bin/compare-tawhiri --server http://localhost:8080` runs an identical
prediction against the local server and the public SondeHub Tawhiri
instance, reporting the great-circle distance between landing points.
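
For reference, a great-circle distance between two landing points can be computed with the haversine formula. This is a generic sketch on a spherical Earth with the mean radius of 6371 km; whether `compare-tawhiri` uses haversine or another formula is not stated here, and `great_circle_km` is a hypothetical name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def great_circle_km(lat1, lng1, lat2, lng2):
    """Haversine distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator.
print(great_circle_km(0.0, 0.0, 0.0, 1.0))
```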
## Numerical methods
`docs/numerics.tex` is the complete mathematical reference: state vector,
equations of motion (constant rate, parachute drag, piecewise, wind
transport), numerical methods (multilinear interpolation, bisection,
classical RK4, binary-search termination refinement), constraint
geometry (scalar comparisons, point-in-polygon with antimeridian
handling), and design notes on the deferred items (WGS84/ECEF
coordinate system, mass-aware drift, Monte Carlo).
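
As a companion to the reference document, a classical RK4 step can be written in a few lines. The sketch below is generic: the `(lat, lng, alt)` state layout and the toy `constant_ascent` derivative are assumptions made for the example and do not reproduce the engine's state vector or models.

```python
def rk4_step(f, t, y, h):
    """Advance the state y (a list of floats) from t to t + h with classical RK4."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def constant_ascent(t, state):
    """Toy derivative: a 5 m/s climb with no horizontal wind drift."""
    return [0.0, 0.0, 5.0]


state = [52.2, 0.1, 0.0]  # assumed layout: latitude, longitude, altitude (m)
for step in range(10):
    state = rk4_step(constant_ascent, step * 60.0, state, 60.0)
print(state)  # ten 60 s steps at 5 m/s put the altitude at 3000 m
```

With a constant derivative all four slopes are identical, so RK4 reduces to a plain Euler step; the extra stages only pay off once wind interpolation makes the derivative vary with position and time.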
## References
- [Tawhiri](https://github.com/cuspaceflight/tawhiri): reference Python/Cython predictor
- [ruaumoko](https://github.com/cuspaceflight/ruaumoko): global elevation dataset format
- [NOAA GFS](https://www.ncei.noaa.gov/products/weather-climate-models/global-forecast)
- [NOAA GEFS](https://www.ncei.noaa.gov/products/weather-climate-models/global-ensemble-forecast)
- [ETOPO 2022](https://www.ncei.noaa.gov/products/etopo-global-relief-model)
- [SondeHub Tawhiri API](https://api.v2.sondehub.org/tawhiri): public Tawhiri instance