2026-05-18 03:17:17 +09:00


# stratoflights-predictor
High-altitude balloon trajectory prediction service. Forecasts ascent, descent,
and float trajectories from NOAA GFS wind data, exposed as a REST API.
The trajectory engine is a propagator-and-constraint system: any flight
profile can be expressed as a chain of propagators (constant-rate ascent,
parachute descent, piecewise rates, wind drift) with attached constraints
(altitude, time, terrain contact). The legacy Tawhiri request shape is kept
as a compatibility endpoint so existing clients work unchanged.
## Quick start
```bash
# Build all three binaries (server, CLI, validation tool)
make build
# Run the server (first start downloads ~9 GB of GFS data over 30-60 min)
./bin/predictor
# Check readiness
./bin/predictor-cli ready
# Run a Tawhiri-style prediction
./bin/predictor-cli predict \
  launch_latitude=52.2 launch_longitude=0.1 \
  launch_datetime=2026-03-28T12:00:00Z \
  ascent_rate=5 burst_altitude=30000 descent_rate=5
```
## Configuration
Configuration is layered: built-in defaults, then a YAML file
(`--config path.yml` or `PREDICTOR_CONFIG_FILE=path.yml`), then environment
variables, then CLI flags. Each layer overrides the ones before it, so flags
beat env vars, env vars beat file values, and file values beat defaults.
| Setting | Env var | CLI flag | Default |
|---|---|---|---|
| HTTP port | `PREDICTOR_PORT` | `-port` | `8080` |
| Data directory | `PREDICTOR_DATA_DIR` | `-data-dir` | `/tmp/predictor-data` |
| Elevation dataset | `PREDICTOR_ELEVATION_DATASET` | `-elevation` | `/srv/ruaumoko-dataset` |
| Source | `PREDICTOR_SOURCE` | — | `noaa-gfs-0p50` |
| Download parallelism | `PREDICTOR_DOWNLOAD_PARALLEL` | `-download-parallel` | `8` |
| Download bandwidth (bytes/s; 0 = unlimited) | `PREDICTOR_DOWNLOAD_BANDWIDTH` | `-download-bandwidth` | `0` |
| Scheduler interval | `PREDICTOR_UPDATE_INTERVAL` | `-update-interval` | `6h` |
| Dataset freshness TTL | `PREDICTOR_DATASET_TTL` | `-freshness-ttl` | `48h` |
| Metrics enabled | `PREDICTOR_METRICS_ENABLED` | `-metrics` | `true` |
| Metrics HTTP path | `PREDICTOR_METRICS_PATH` | `-metrics-path` | `/metrics` |
| Log level | `PREDICTOR_LOG_LEVEL` | `-log-level` | `info` |

The YAML config file groups the same settings by section:
```yaml
http:
  port: 8080
data:
  dir: /var/lib/predictor
  elevation_path: /var/lib/predictor/elevation
  source: noaa-gfs-0p50
download:
  parallel: 8
  bandwidth_bytes_per_second: 0
  update_interval: 6h
  freshness_ttl: 48h
metrics:
  enabled: true
  path: /metrics
log:
  level: info
```
## REST API
### Tawhiri-compatible
`GET /api/v1/prediction` — preserves the exact request and response shape of
the upstream Cambridge University Spaceflight predictor. Query parameters:
| Parameter | Required | Description |
|---|---|---|
| `launch_latitude` | yes | Degrees, -90 to 90 |
| `launch_longitude` | yes | Degrees, -180 to 180 or 0 to 360 |
| `launch_datetime` | yes | RFC 3339 |
| `launch_altitude` | no | Metres ASL (default 0) |
| `profile` | no | `standard_profile` (default) or `float_profile` |
| `ascent_rate` | no | m/s (default 5) |
| `burst_altitude` | no | Metres (default 28000) |
| `descent_rate` | no | m/s (default 5) |
| `float_altitude` | no | Metres (float profile only) |
| `stop_datetime` | no | Float-profile end time |

`GET /ready` — returns `{"status": "ok", "dataset_time": "..."}` once a
dataset is loaded; `{"status": "not_ready", ...}` before then.
### Profile-driven (new primary)
`POST /api/v2/prediction` — accepts an arbitrary chain of propagators with
optional constraints. Useful when the frontend wants flight profiles the
Tawhiri shape can't express (e.g. piecewise rates, fallback on constraint
violation).
```json
{
  "launch": {
    "time": "2026-03-28T12:00:00Z",
    "latitude": 52.2,
    "longitude": 0.1,
    "altitude": 0
  },
  "profile": [
    {
      "name": "ascent",
      "model": {"type": "constant_rate", "rate": 5, "include_wind": true},
      "constraints": [{"type": "max_altitude", "limit": 30000}]
    },
    {
      "name": "descent",
      "model": {"type": "parachute_descent", "sea_level_rate": 5, "include_wind": true},
      "constraints": [{"type": "terrain_contact"}]
    }
  ]
}
```
- Model types: `constant_rate`, `parachute_descent`, `piecewise`, `wind`.
- Constraint types: `max_altitude`, `min_altitude`, `max_time`,
  `terrain_contact`.
- Constraint actions: `stop` (default), `fallback`, `clip`.
- Set `"direction": "reverse"` to integrate backward from a known landing.
### Dataset admin
```
GET /api/v1/admin/datasets list stored epochs
POST /api/v1/admin/datasets {epoch | latest} trigger a download
DELETE /api/v1/admin/datasets/{epoch} delete a stored dataset
GET /api/v1/admin/jobs list every job
GET /api/v1/admin/jobs/{id} fetch one job
DELETE /api/v1/admin/jobs/{id} cancel a running job
```
Returns `JobInfo`:
```json
{"id":"…","source":"noaa-gfs-0p50","epoch":"…","status":"running",
"started_at":"…","total_units":130,"done_units":47,"bytes":510000000}
```
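
A sketch of decoding the `JobInfo` payload and turning `done_units`/`total_units` into a progress fraction. Field names come from the example above; `ParseJob` and `Progress` are hypothetical helpers, and the time-valued fields are kept as strings because their exact format is not documented here.

```go
package main

import "encoding/json"

// JobInfo mirrors the admin job payload.
type JobInfo struct {
	ID         string `json:"id"`
	Source     string `json:"source"`
	Epoch      string `json:"epoch"`
	Status     string `json:"status"`
	StartedAt  string `json:"started_at"`
	TotalUnits int    `json:"total_units"`
	DoneUnits  int    `json:"done_units"`
	Bytes      int64  `json:"bytes"`
}

// ParseJob decodes one JobInfo from a response body.
func ParseJob(body []byte) (JobInfo, error) {
	var j JobInfo
	err := json.Unmarshal(body, &j)
	return j, err
}

// Progress returns the completed fraction in [0, 1], or 0 if the total is unknown.
func (j JobInfo) Progress() float64 {
	if j.TotalUnits <= 0 {
		return 0
	}
	return float64(j.DoneUnits) / float64(j.TotalUnits)
}
```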
### Metrics
`GET /metrics` — Prometheus text exposition format (path configurable via
`PREDICTOR_METRICS_PATH`). Exposed series:

- counter `predictor_predictions_total{profile,status}`
- counter `predictor_downloads_total{source,status}`
- counter `predictor_download_bytes_total{source}`
- gauge `predictor_active_dataset_epoch_seconds`
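
Each sample in the Prometheus text format is one line of the form `name{label="value",...} value`. The hypothetical `SampleLine` helper below renders that shape and escapes backslashes, double quotes and newlines in label values as the format requires. It is not the `internal/metrics` implementation, it omits the `# HELP`/`# TYPE` lines, and label values such as `status="ok"` are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// Label is one name/value pair; order is preserved in the output.
type Label struct{ Name, Value string }

// labelEscaper escapes backslash, double quote and newline in label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// SampleLine renders one sample line in the Prometheus text exposition format.
func SampleLine(name string, labels []Label, value float64) string {
	var b strings.Builder
	b.WriteString(name)
	if len(labels) > 0 {
		b.WriteByte('{')
		for i, l := range labels {
			if i > 0 {
				b.WriteByte(',')
			}
			fmt.Fprintf(&b, `%s="%s"`, l.Name, labelEscaper.Replace(l.Value))
		}
		b.WriteByte('}')
	}
	fmt.Fprintf(&b, " %g", value)
	return b.String()
}
```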
## Architecture
```
cmd/
  predictor/main.go               main server entry point
  predictor-cli/main.go           HTTP client
  compare-tawhiri/main.go         end-to-end validation against the public Tawhiri instance
internal/
  numerics/                       pure numerical primitives (interp, bisect, RK4, refinement)
  engine/                         propagator + constraint system + concrete models
  weather/                        WindField interface
    gfs/                          NOAA GFS file format + impl
  datasets/                       Source/Storage/Manager + transactional, resumable downloads
    gfs/                          NOAA GFS source impl
  elevation/                      ruaumoko-format ground elevation reader
  config/                         layered file+env+CLI config
  metrics/                        Sink interface + Prometheus text impl
  api/                            HTTP transport
    tawhiri/                      legacy v1 endpoint via ogen
    v2/                           profile-driven endpoint
    admin/                        dataset/job admin endpoints
    middleware/
api/rest/predictor.swagger.yml    OpenAPI 3 spec for v1 + /ready
pkg/rest/                         ogen-generated code (regenerate via `make generate-ogen`)
docs/numerics.tex                 LaTeX math reference for the numerics package
scripts/build_elevation.py        ETOPO 2022 → ruaumoko converter
```
## Deployment
### Local single instance
```bash
./bin/predictor --data-dir /var/lib/predictor
```
No external dependencies beyond the NOAA S3 mirror.
### Docker single container
```dockerfile
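# Build stage: compile the server binary with the Go toolchain.
FROM golang:1.25 AS build
WORKDIR /src
COPY . .
RUN go build -o /predictor ./cmd/predictor

# Runtime stage: distroless base includes glibc, so a cgo-enabled build still runs.
FROM gcr.io/distroless/base
COPY --from=build /predictor /predictor
EXPOSE 8080
ENTRYPOINT ["/predictor"]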
```
Mount a volume at `/data` and set `PREDICTOR_DATA_DIR=/data`.
### Load-balanced cluster
The server is stateless apart from the on-disk dataset cache and the in-memory
job table. To run multiple replicas, point all of them at a shared filesystem
(NFS or similar) for the data directory (`PREDICTOR_DATA_DIR`); each replica
memory-maps the dataset read-only. Download coordination across replicas is
not implemented: either run downloads on one node, or accept that two nodes
may download the same epoch concurrently (only one commit wins, via atomic
rename).
## Elevation dataset
Without elevation data, descent terminates at sea level. With elevation data,
descent terminates at ground level, matching upstream Tawhiri. To build the
ruaumoko-format dataset from ETOPO 2022:
```bash
pip install xarray netcdf4 numpy
python3 scripts/build_elevation.py /var/lib/predictor/elevation
```
Then start the server with the dataset path:
`PREDICTOR_ELEVATION_DATASET=/var/lib/predictor/elevation ./bin/predictor`
## Numerical methods
The numerics package (`internal/numerics`) provides:
- regular-grid multilinear interpolation,
- monotone bisection,
- classical RK4 (forward and reverse time),
- binary-search refinement of a termination point.

The detailed math reference is `docs/numerics.tex`. The package has no
domain dependencies and is small enough (~300 lines of Go) to verify by
hand, which keeps a future C or Rust port possible without changes to the
trajectory engine.
## Wind data
| Property | Value |
|---|---|
| Source | NOAA GFS, S3 mirror (`noaa-gfs-bdp-pds.s3.amazonaws.com`) |
| Resolution | 0.5° |
| Grid | 361 × 720 (lat × lng) |
| Forecast steps | 65 (every 3 hours, 0–192 h) |
| Pressure levels | 47 (1000 → 1 hPa) |
| Variables | Geopotential height, U-wind, V-wind |
| File size | ~8.87 GiB (float32 flat binary, mmap-backed) |
| Update cadence | every 6 hours |

Downloads read the `.idx` index files to locate the needed GRIB messages,
then fetch only those byte ranges with HTTP Range requests. Downloads are
transactional (temp file, manifest, atomic rename on commit) and resumable:
an interrupted download picks up where it left off via the manifest.
## Validation
`./bin/compare-tawhiri --server http://localhost:8080` runs an identical
prediction against the local server and against the public SondeHub Tawhiri
instance, reporting the great-circle distance between landing points.
## References
- [Tawhiri](https://github.com/cuspaceflight/tawhiri) — reference Python/Cython predictor
- [ruaumoko](https://github.com/cuspaceflight/ruaumoko) — global elevation dataset format
- [NOAA GFS](https://www.ncei.noaa.gov/products/weather-climate-models/global-forecast)
- [ETOPO 2022](https://www.ncei.noaa.gov/products/etopo-global-relief-model)
- [SondeHub Tawhiri API](https://api.v2.sondehub.org/tawhiri) — public Tawhiri instance