engine refactor

2026-05-23 00:55:35 +09:00 · 81b8e763bd
commit 81b8e763bd
parent 9e663db9dc
37 changed files with 3532 additions and 1639 deletions
--- a/README.md
+++ b/README.md
@@ -1,27 +1,25 @@
 # stratoflights-predictor

 High-altitude balloon trajectory prediction service. Forecasts ascent, descent,
-and float trajectories from NOAA GFS wind data, exposed as a REST API.
+and float trajectories from NOAA GFS and GEFS wind data, exposed as a REST API.

 The trajectory engine is a propagator-and-constraint system: any flight
 profile can be expressed as a chain of propagators (constant-rate ascent,
-parachute descent, piecewise rates, wind drift) with attached constraints
-(altitude, time, terrain contact). The legacy Tawhiri request shape is kept
-as a compatibility endpoint so existing clients work unchanged.
+parachute descent, piecewise rates with absolute / profile-relative /
+propagator-relative timing, wind drift) with attached constraints
+(scalar comparisons over altitude or time, terrain contact, geographic
+polygons). Constraints can stop the profile, hand off to a fallback
+propagator, or clip the violated coordinate to the boundary. The legacy
+Tawhiri request shape is kept as a compatibility endpoint so existing
+clients work unchanged.

 ## Quick start

 ```bash
-# Build all three binaries (server, CLI, validation tool)
-make build
+make build           # produces bin/{predictor,predictor-cli,compare-tawhiri}
+./bin/predictor      # downloads ~9 GB of GFS data on first start

-# Run the server (first start downloads ~9 GB of GFS data over 30-60 min)
-./bin/predictor
-
-# Check readiness
 ./bin/predictor-cli ready
-
-# Run a Tawhiri-style prediction
 ./bin/predictor-cli predict \
  launch_latitude=52.2 launch_longitude=0.1 \
  launch_datetime=2026-03-28T12:00:00Z \
@@ -30,16 +28,14 @@ make build

 ## Configuration

-Configuration is layered: built-in defaults, then a YAML file
-(`--config path.yml` or `PREDICTOR_CONFIG_FILE=path.yml`), then env vars,
-then CLI flags. Flags override env vars override file values override defaults.
+Layered configuration, lowest to highest precedence: built-in defaults < YAML file < env vars < CLI flags.
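As a rough illustration of that precedence order, the layers can be thought of as maps merged from lowest to highest priority. This is a minimal sketch only: the `Layer` type, the `Resolve` function, and the dotted key names are assumptions made for the example, not the contents of `internal/config/config.go`.

```go
package main

// Layer maps a setting name (hypothetical dotted keys such as "http.port")
// to its raw string value. A missing key means the layer does not set it.
type Layer map[string]string

// Resolve merges layers in order of increasing precedence, so a value in a
// later layer overrides the same key in any earlier layer.
func Resolve(layers ...Layer) Layer {
	out := Layer{}
	for _, layer := range layers {
		for key, value := range layer {
			out[key] = value
		}
	}
	return out
}
```

Calling `Resolve(defaults, file, env, flags)` with the layers in that order would reproduce the documented behaviour: a CLI flag wins over an env var, which wins over the YAML file, which wins over the built-in default.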

 | Setting | Env var | CLI flag | Default |
 |---|---|---|---|
 | HTTP port | `PREDICTOR_PORT` | `-port` | `8080` |
 | Data directory | `PREDICTOR_DATA_DIR` | `-data-dir` | `/tmp/predictor-data` |
 | Elevation dataset | `PREDICTOR_ELEVATION_DATASET` | `-elevation` | `/srv/ruaumoko-dataset` |
-| Source | `PREDICTOR_SOURCE` | — | `noaa-gfs-0p50` |
+| Source variant | `PREDICTOR_SOURCE` | — | `gfs-0p50-3h` |
 | Download parallelism | `PREDICTOR_DOWNLOAD_PARALLEL` | `-download-parallel` | `8` |
 | Download bandwidth (bytes/s; 0 = unlimited) | `PREDICTOR_DOWNLOAD_BANDWIDTH` | `-download-bandwidth` | `0` |
 | Scheduler interval | `PREDICTOR_UPDATE_INTERVAL` | `-update-interval` | `6h` |
@@ -48,227 +44,188 @@ then CLI flags. Flags override env vars override file values override defaults.
 | Metrics HTTP path | `PREDICTOR_METRICS_PATH` | `-metrics-path` | `/metrics` |
 | Log level | `PREDICTOR_LOG_LEVEL` | `-log-level` | `info` |

-A YAML config file mirrors the same structure:
+YAML config mirrors the same structure; see `internal/config/config.go`.

-```yaml
-http:
-  port: 8080
-data:
-  dir: /var/lib/predictor
-  elevation_path: /var/lib/predictor/elevation
-  source: noaa-gfs-0p50
-download:
-  parallel: 8
-  bandwidth_bytes_per_second: 0
-  update_interval: 6h
-  freshness_ttl: 48h
-metrics:
-  enabled: true
-  path: /metrics
-log:
-  level: info
-```
+Supported source variants:
+
+| `source` | Resolution | Cadence | Notes |
+|---|---|---|---|
+| `gfs-0p50-3h` | 0.5° | 3h to 192h | historical Tawhiri default |
+| `gfs-0p25-3h` | 0.25° | 3h to 192h | |
+| `gfs-0p25-1h` | 0.25° | 1h to 120h | |
+| `gefs-0p50-3h` | 0.5° | 3h to 192h | 21-member ensemble; each member is a separate dataset |

 ## REST API

-### Tawhiri-compatible
+### Tawhiri-compatible (legacy)

 `GET /api/v1/prediction` — preserves the exact request and response shape of
-the upstream Cambridge University Spaceflight predictor. Query parameters:
+the upstream Cambridge University Spaceflight predictor.

-| Parameter | Required | Description |
-|---|---|---|
-| `launch_latitude` | yes | Degrees, -90 to 90 |
-| `launch_longitude` | yes | Degrees, -180 to 180 or 0 to 360 |
-| `launch_datetime` | yes | RFC 3339 |
-| `launch_altitude` | no | Metres ASL (default 0) |
-| `profile` | no | `standard_profile` (default) or `float_profile` |
-| `ascent_rate` | no | m/s (default 5) |
-| `burst_altitude` | no | Metres (default 28000) |
-| `descent_rate` | no | m/s (default 5) |
-| `float_altitude` | no | Metres (float profile only) |
-| `stop_datetime` | no | Float-profile end time |
+`GET /ready` — returns `{"status":"ok", "dataset_time":"..."}` once a dataset
+is loaded.

-`GET /ready` — returns `{"status": "ok", "dataset_time": "..."}` once a
-dataset is loaded; `{"status": "not_ready", ...}` before then.
+### Profile-driven (synchronous)

-### Profile-driven (new primary)
-
-`POST /api/v2/prediction` — accepts an arbitrary chain of propagators with
-optional constraints. Useful when the frontend wants flight profiles the
-Tawhiri shape can't express (e.g. piecewise rates, fallback on constraint
-violation).
+`POST /api/v2/prediction` — execute a profile synchronously and return the
+trajectory. Request shape:

 ```json
 {
-  "launch": {
-    "time": "2026-03-28T12:00:00Z",
-    "latitude": 52.2,
-    "longitude": 0.1,
-    "altitude": 0
-  },
+  "launch": { "time": "2026-03-28T12:00:00Z", "latitude": 52.2, "longitude": 0.1, "altitude": 0 },
+  "direction": "forward",
  "profile": [
    {
      "name": "ascent",
-      "model": {"type": "constant_rate", "rate": 5, "include_wind": true},
-      "constraints": [{"type": "max_altitude", "limit": 30000}]
+      "model": { "type": "constant_rate", "rate": 5, "include_wind": true },
+      "constraints": [{ "type": "altitude", "op": ">=", "limit": 30000 }]
    },
    {
      "name": "descent",
-      "model": {"type": "parachute_descent", "sea_level_rate": 5, "include_wind": true},
-      "constraints": [{"type": "terrain_contact"}]
+      "model": { "type": "parachute_descent", "sea_level_rate": 5, "include_wind": true },
+      "constraints": [{ "type": "terrain_contact" }]
    }
-  ]
+  ],
+  "globals": [{ "type": "time", "op": ">", "limit": 1799999999 }]
 }
 ```

 Model types: `constant_rate`, `parachute_descent`, `piecewise`, `wind`.
-Constraint types: `max_altitude`, `min_altitude`, `max_time`,
-`terrain_contact`. Constraint actions: `stop` (default), `fallback`, `clip`.
-Set `"direction": "reverse"` to integrate backward from a known landing.
+Constraint types: `altitude`, `time`, `terrain_contact`, `polygon`.
+Operators: `<`, `<=`, `>`, `>=`, `==`. Actions: `stop` (default), `fallback`, `clip`.
+Direction: `forward` (default) or `reverse`.
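The scalar constraint form above (`type`, `op`, `limit`) can be sketched in Go as follows. The `Scalar` type and `Triggered` method are hypothetical names for illustration, and the sketch assumes a constraint fires when the comparison `value op limit` holds, which is how the `altitude >= 30000` ascent example reads; the engine's real types may differ.

```go
package main

import "fmt"

// Scalar is a hypothetical in-memory form of a scalar constraint such as
// {"type": "altitude", "op": ">=", "limit": 30000}.
type Scalar struct {
	Type  string // "altitude" or "time"
	Op    string // one of "<", "<=", ">", ">=", "=="
	Limit float64
}

// Triggered reports whether "value Op Limit" holds. The assumption here is
// that a true comparison means the constraint has been hit.
func (c Scalar) Triggered(value float64) (bool, error) {
	switch c.Op {
	case "<":
		return value < c.Limit, nil
	case "<=":
		return value <= c.Limit, nil
	case ">":
		return value > c.Limit, nil
	case ">=":
		return value >= c.Limit, nil
	case "==":
		return value == c.Limit, nil
	default:
		return false, fmt.Errorf("unknown operator %q", c.Op)
	}
}
```

Under this reading, the ascent stage in the request example would end on the first integration step whose altitude is at or above 30000 m, after which the configured action (`stop`, `fallback`, or `clip`) applies.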
+
+Piecewise segments support a `reference` field (`absolute`, `profile_start`, or
+`propagator_start`) so a single rate schedule can be reused across profiles
+with different launch times.
+
+The response includes per-stage trajectories, detailed termination info
+(violation state + refined state + constraint name), an `events` array of
+non-fatal observations (e.g. `above_model` when altitude exceeded the dataset's
+highest pressure level), and dataset metadata.
+
+### Profile-driven (asynchronous)
+
+`POST /api/v1/predictions` — enqueue a prediction. Returns `202` with a job ID:
+
+```json
+{"id":"842107d9-…","status":"pending","created_at":"…"}
+```
+
+`GET /api/v1/predictions/{id}` — poll status. When `status == "complete"`,
+the response includes a `result` field with the full v2 PredictionResponse.
+
+`DELETE /api/v1/predictions/{id}` — cancel a queued job.
+
+A worker pool (`http.async_workers`, default 4) services the queue; completed
+results are retained for `http.async_result_ttl` (default 1h).

 ### Dataset admin

 ```
-GET    /api/v1/admin/datasets                  list stored epochs
-POST   /api/v1/admin/datasets {epoch | latest} trigger a download
-DELETE /api/v1/admin/datasets/{epoch}          delete a stored dataset
-GET    /api/v1/admin/jobs                      list every job
+GET    /api/v1/admin/datasets                  list stored datasets (epoch, subset, coverage, loaded?)
+POST   /api/v1/admin/datasets                  trigger a download
+DELETE /api/v1/admin/datasets/{filename}       delete by filename (DatasetID.Filename())
+GET    /api/v1/admin/jobs                      list every download job
 GET    /api/v1/admin/jobs/{id}                 fetch one job
-DELETE /api/v1/admin/jobs/{id}                 cancel a running job
+DELETE /api/v1/admin/jobs/{id}                 cancel a running download
+GET    /api/v1/admin/status                    consolidated status (uptime, mem, goroutines, jobs, datasets)
 ```

-Returns `JobInfo`:
+Trigger-download body:

 ```json
-{"id":"…","source":"noaa-gfs-0p50","epoch":"…","status":"running",
- "started_at":"…","total_units":130,"done_units":47,"bytes":510000000}
+{
+  "epoch": "2026-03-28T06:00:00Z",
+  "subset": {
+    "region": { "min_lat": -10, "max_lat": 10, "min_lng": 0, "max_lng": 30 },
+    "hour_range": { "min_hour": 0, "max_hour": 72 },
+    "members": [5]
+  }
+}
 ```

+`{"latest": true}` is a shortcut that refreshes the latest global dataset
+for the configured source. Each `(epoch, subset)` combination is a
+separate dataset; the loader auto-selects which loaded dataset covers a
+given prediction query.
+
 ### Metrics

 `GET /metrics` — Prometheus text exposition. Counters:
-`predictor_predictions_total{profile,status}`,
-`predictor_downloads_total{source,status}`,
-`predictor_download_bytes_total{source}`,
-and a gauge `predictor_active_dataset_epoch_seconds`.
+`predictor_predictions_total{profile,status}`, `predictor_downloads_total`,
+`predictor_download_bytes_total`, and a gauge
+`predictor_active_dataset_epoch_seconds`.

 ## Architecture

 ```
 cmd/
-  predictor/main.go                main server entry point
-  predictor-cli/main.go            HTTP client
-  compare-tawhiri/main.go          end-to-end validation against the public Tawhiri instance
+  predictor/                       main server
+  predictor-cli/                   HTTP client
+  compare-tawhiri/                 end-to-end validation against the public Tawhiri instance
 internal/
  numerics/                        pure numerical primitives (interp, bisect, RK4, refinement)
-  engine/                          propagator + constraint system + concrete models
-  weather/                         WindField interface; gfs/ — NOAA GFS file format + impl
-  datasets/                        Source/Storage/Manager + transactional, resumable downloads
-                                   gfs/ — NOAA GFS source impl
+  engine/                          propagator + constraint system + concrete models + registry
+  weather/                         WindField interface; gfs/ — variant-parameterized GFS file format + WindField
+  datasets/                        Source / Storage / Manager + transactional, resumable, subsettable downloads
+                                   grib/  — shared GRIB downloader skeleton (idx parser, HTTP, parallel blit)
+                                   gfs/   — GFS Source (URL templating only)
+                                   gefs/  — GEFS Source (URL templating + member resolution)
  elevation/                       ruaumoko-format ground elevation reader
  config/                          layered file+env+CLI config
  metrics/                         Sink interface + Prometheus text impl
  api/                             HTTP transport
-                                   tawhiri/ — legacy v1 endpoint via ogen
-                                   v2/      — profile-driven endpoint
-                                   admin/   — dataset/job admin endpoints
+                                   tawhiri/   — legacy v1 endpoint via ogen
+                                   v2/        — synchronous profile-driven endpoint
+                                   async/     — asynchronous prediction jobs
+                                   admin/     — dataset + service-status endpoints
+                                   httpjson/  — tiny JSON response helpers
                                   middleware/
 api/rest/predictor.swagger.yml     OpenAPI 3 spec for v1 + /ready
 pkg/rest/                          ogen-generated code (regenerate via `make generate-ogen`)
-docs/numerics.tex                  LaTeX math reference for the numerics package
+docs/numerics.tex                  end-to-end mathematical reference
 scripts/build_elevation.py         ETOPO 2022 → ruaumoko converter
 ```

+## Subsetting and ensembles
+
+Each stored dataset is identified by `DatasetID = (epoch, subset)`. A subset
+restricts the data fetched by region, forecast-hour range, or ensemble
+member. The downloader honours the subset by skipping out-of-range
+forecast steps and, for GEFS, requesting member-specific URLs. Storage keeps
+each subset as a separate file whose name includes a deterministic subset
+key, and the Manager exposes each dataset's coverage so that per-query
+selection can pick a dataset that covers the request.
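A coverage check of the kind described above might look like the sketch below. The field names mirror the `region` and `hour_range` keys of the trigger-download body; the `Subset` type, the `Covers` method, the nil-means-unrestricted convention, and the lack of antimeridian wrapping are all assumptions for illustration rather than the Manager's actual logic.

```go
package main

// Region and HourRange mirror the "region" and "hour_range" objects of the
// trigger-download request body.
type Region struct{ MinLat, MaxLat, MinLng, MaxLng float64 }
type HourRange struct{ MinHour, MaxHour int }

// Subset is a hypothetical coverage description. A nil field is assumed to
// mean "unrestricted" (global region, all forecast hours).
type Subset struct {
	Region    *Region
	HourRange *HourRange
}

// Covers reports whether a query point and forecast hour fall inside the
// subset. Longitude wrapping across the antimeridian is not handled here.
func (s Subset) Covers(lat, lng float64, hour int) bool {
	if r := s.Region; r != nil {
		if lat < r.MinLat || lat > r.MaxLat || lng < r.MinLng || lng > r.MaxLng {
			return false
		}
	}
	if h := s.HourRange; h != nil {
		if hour < h.MinHour || hour > h.MaxHour {
			return false
		}
	}
	return true
}
```

A selection routine could then iterate over loaded datasets and keep the first whose subset covers every point a prediction is expected to visit; how the real loader ranks competing candidates is not stated in the README.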
+
 ## Deployment

-### Local single instance
-
-```bash
-./bin/predictor --data-dir /var/lib/predictor
-```
-
-No external dependencies beyond the NOAA S3 mirror.
-
-### Docker single container
-
-```dockerfile
-FROM golang:1.25 AS build
-WORKDIR /src
-COPY . .
-RUN go build -o /predictor ./cmd/predictor
-
-FROM gcr.io/distroless/base
-COPY --from=build /predictor /predictor
-EXPOSE 8080
-ENTRYPOINT ["/predictor"]
-```
-
-Mount a volume at `/data` and set `PREDICTOR_DATA_DIR=/data`.
-
-### Load-balanced cluster
-
-The server is stateless apart from the on-disk dataset cache and in-memory
-job table. For multiple replicas, point all replicas at a shared filesystem
-(NFS or similar) for `data_dir`; each replica reads-only its own mmap. Active
-download coordination across replicas is not implemented — run downloads on
-one node, or accept that two nodes may download the same epoch concurrently
-(only one Commit wins via atomic rename).
-
-## Elevation dataset
-
-Without elevation data, descent terminates at sea level. With elevation,
-descent terminates at ground level, matching upstream Tawhiri.
-
-```bash
-pip install xarray netcdf4 numpy
-python3 scripts/build_elevation.py /var/lib/predictor/elevation
-```
-
-`PREDICTOR_ELEVATION_DATASET=/var/lib/predictor/elevation ./bin/predictor`
-
-## Numerical methods
-
-The numerics package (`internal/numerics`) provides:
-
- regular-grid multilinear interpolation,
- monotone bisection,
- classical RK4 (forward and reverse time),
- binary-search refinement of a termination point.
-
-Detailed math reference: `docs/numerics.tex`. The package has no
-domain dependencies and is small enough for manual verification (~300
-lines of Go), enabling a future C or Rust port without changes to the
-trajectory engine.
-
-## Wind data
-
-| Property | Value |
-|---|---|
-| Source | NOAA GFS, S3 mirror (`noaa-gfs-bdp-pds.s3.amazonaws.com`) |
-| Resolution | 0.5° |
-| Grid | 361 × 720 (lat × lng) |
-| Forecast steps | 65 (every 3 hours, 0–192h) |
-| Pressure levels | 47 (1000 → 1 hPa) |
-| Variables | Geopotential height, U-wind, V-wind |
-| File size | ~8.87 GiB (float32 flat binary, mmap-backed) |
-| Update cadence | every 6 hours |
-
-Downloads use HTTP Range requests against `.idx` index files to fetch only
-the needed GRIB messages. Downloads are transactional (temp file, manifest,
-atomic rename on commit) and resumable: interrupted downloads pick up where
-they left off via the manifest.
+Local single instance, Docker container, or load-balanced cluster behind a
+shared filesystem for the dataset cache. The async API stores results
+in memory only, so cluster deployments need sticky sessions (or equivalent
+routing) so that clients poll the same node they submitted to.

 ## Validation

 `./bin/compare-tawhiri --server http://localhost:8080` runs an identical
-prediction against the local server and against the public SondeHub Tawhiri
+prediction against the local server and the public SondeHub Tawhiri
 instance, reporting the great-circle distance between landing points.

+## Numerical methods
+
+`docs/numerics.tex` is the complete mathematical reference: state vector,
+equations of motion (constant rate, parachute drag, piecewise, wind
+transport), numerical methods (multilinear interpolation, bisection,
+classical RK4, binary-search termination refinement), constraint
+geometry (scalar comparisons, point-in-polygon with antimeridian
+handling), and design notes on the deferred items (WGS84/ECEF
+coordinate system, mass-aware drift, Monte Carlo).
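As a concrete reminder of what "classical RK4" refers to, here is a minimal single-step sketch over a generic state vector. The `Deriv` type, `RK4Step`, and the helper `axpy` are hypothetical names for this example; the actual signatures in `internal/numerics` are not shown in this diff and may differ.

```go
package main

// Deriv returns dy/dt at time t for state y.
type Deriv func(t float64, y []float64) []float64

// axpy returns y + a*k as a new slice, leaving its inputs untouched.
func axpy(y []float64, a float64, k []float64) []float64 {
	out := make([]float64, len(y))
	for i := range y {
		out[i] = y[i] + a*k[i]
	}
	return out
}

// RK4Step advances y by one classical fourth-order Runge-Kutta step of
// size h. A negative h integrates backward in time.
func RK4Step(f Deriv, t float64, y []float64, h float64) []float64 {
	k1 := f(t, y)
	k2 := f(t+h/2, axpy(y, h/2, k1))
	k3 := f(t+h/2, axpy(y, h/2, k2))
	k4 := f(t+h, axpy(y, h, k3))
	out := make([]float64, len(y))
	for i := range y {
		out[i] = y[i] + h/6*(k1[i]+2*k2[i]+2*k3[i]+k4[i])
	}
	return out
}
```

In this sketch the reverse direction falls out of the same routine: stepping with a negative `h` from a known landing state walks the trajectory backward.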
+
 ## References

 - [Tawhiri](https://github.com/cuspaceflight/tawhiri) — reference Python/Cython predictor
 - [ruaumoko](https://github.com/cuspaceflight/ruaumoko) — global elevation dataset format
 - [NOAA GFS](https://www.ncei.noaa.gov/products/weather-climate-models/global-forecast)
+- [NOAA GEFS](https://www.ncei.noaa.gov/products/weather-climate-models/global-ensemble-forecast)
 - [ETOPO 2022](https://www.ncei.noaa.gov/products/etopo-global-relief-model)
 - [SondeHub Tawhiri API](https://api.v2.sondehub.org/tawhiri) — public Tawhiri instance