Anatoly Antonov 2026-05-18 03:17:17 +09:00
parent 7a8d5d13fa
commit 9e663db9dc
68 changed files with 5647 additions and 2958 deletions

415
README.md

@@ -1,261 +1,274 @@
# Balloon Trajectory Predictor
# stratoflights-predictor
High-altitude balloon trajectory prediction service. Predicts ascent, burst, and descent trajectories using GFS wind forecast data from NOAA.
High-altitude balloon trajectory prediction service. Forecasts ascent, descent,
and float trajectories from NOAA GFS wind data, exposed as a REST API.
The prediction algorithms are an exact port of [Tawhiri](https://github.com/cuspaceflight/tawhiri) (Cambridge University Spaceflight) to Go, verified to produce identical results.
The trajectory engine is a propagator-and-constraint system: any flight
profile can be expressed as a chain of propagators (constant-rate ascent,
parachute descent, piecewise rates, wind drift) with attached constraints
(altitude, time, terrain contact). The legacy Tawhiri request shape is kept
as a compatibility endpoint so existing clients work unchanged.
## Quick Start
## Quick start
```bash
# Build
# Build all three binaries (server, CLI, validation tool)
make build
# Run (downloads ~9 GB of GFS data on first start, takes 30-60 min)
PREDICTOR_DATA_DIR=/tmp/predictor-data go run ./cmd/api
# Run the server (first start downloads ~9 GB of GFS data over 30-60 min)
./bin/predictor
# Check readiness
curl http://localhost:8080/ready
./bin/predictor-cli ready
# Run a prediction
curl 'http://localhost:8080/api/v1/prediction?launch_latitude=52.2&launch_longitude=0.1&launch_datetime=2026-03-28T12:00:00Z&launch_altitude=0&ascent_rate=5&burst_altitude=30000&descent_rate=5'
# Run a Tawhiri-style prediction
./bin/predictor-cli predict \
launch_latitude=52.2 launch_longitude=0.1 \
launch_datetime=2026-03-28T12:00:00Z \
ascent_rate=5 burst_altitude=30000 descent_rate=5
```
## Configuration
All configuration is via environment variables.
Configuration is layered: built-in defaults, then a YAML file
(`--config path.yml` or `PREDICTOR_CONFIG_FILE=path.yml`), then env vars,
then CLI flags. Each layer overrides the one before it: flags beat env vars,
env vars beat file values, and file values beat defaults.
| Variable | Default | Description |
|---|---|---|
| `PREDICTOR_PORT` | `8080` | HTTP server port |
| `PREDICTOR_DATA_DIR` | `/tmp/predictor-data` | Directory for wind datasets and temp files |
| `PREDICTOR_DOWNLOAD_PARALLEL` | `8` | Max concurrent GRIB download goroutines |
| `PREDICTOR_UPDATE_INTERVAL` | `6h` | How often to check for new forecasts |
| `PREDICTOR_DATASET_TTL` | `48h` | Max age before a dataset is considered stale |
| `PREDICTOR_ELEVATION_DATASET` | `/srv/ruaumoko-dataset` | Path to elevation dataset (optional) |
| Setting | Env var | CLI flag | Default |
|---|---|---|---|
| HTTP port | `PREDICTOR_PORT` | `-port` | `8080` |
| Data directory | `PREDICTOR_DATA_DIR` | `-data-dir` | `/tmp/predictor-data` |
| Elevation dataset | `PREDICTOR_ELEVATION_DATASET` | `-elevation` | `/srv/ruaumoko-dataset` |
| Source | `PREDICTOR_SOURCE` | — | `noaa-gfs-0p50` |
| Download parallelism | `PREDICTOR_DOWNLOAD_PARALLEL` | `-download-parallel` | `8` |
| Download bandwidth (bytes/s; 0 = unlimited) | `PREDICTOR_DOWNLOAD_BANDWIDTH` | `-download-bandwidth` | `0` |
| Scheduler interval | `PREDICTOR_UPDATE_INTERVAL` | `-update-interval` | `6h` |
| Dataset freshness TTL | `PREDICTOR_DATASET_TTL` | `-freshness-ttl` | `48h` |
| Metrics enabled | `PREDICTOR_METRICS_ENABLED` | `-metrics` | `true` |
| Metrics HTTP path | `PREDICTOR_METRICS_PATH` | `-metrics-path` | `/metrics` |
| Log level | `PREDICTOR_LOG_LEVEL` | `-log-level` | `info` |
## API
A YAML config file mirrors the same structure:
### `GET /api/v1/prediction`
```yaml
http:
port: 8080
data:
dir: /var/lib/predictor
elevation_path: /var/lib/predictor/elevation
source: noaa-gfs-0p50
download:
parallel: 8
bandwidth_bytes_per_second: 0
update_interval: 6h
freshness_ttl: 48h
metrics:
enabled: true
path: /metrics
log:
level: info
```
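The layering order from the Configuration section can be sketched in a few lines of Go. This is a minimal illustration, not the project's `config` package: the `resolve` helper and the "empty string means unset" convention are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
)

// resolve returns the highest-precedence value that is set, following the
// order described in this README: CLI flag, env var, config file, default.
// Treating "" as "not set" is an assumption of this sketch.
func resolve(flagVal, envVal, fileVal, def string) string {
	for _, v := range []string{flagVal, envVal, fileVal} {
		if v != "" {
			return v
		}
	}
	return def
}

func main() {
	// No flag given and a hypothetical file value of 9090: the env var wins
	// if PREDICTOR_PORT is set, otherwise the file value is used.
	port := resolve("", os.Getenv("PREDICTOR_PORT"), "9090", "8080")
	fmt.Println("port:", port)
}
```

A real implementation would need a separate "was this set" signal for values where the empty string or zero is meaningful, such as a bandwidth limit of `0`.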
Run a balloon trajectory prediction.
## REST API
**Parameters** (query string):
### Tawhiri-compatible
`GET /api/v1/prediction` — preserves the exact request and response shape of
the upstream Cambridge University Spaceflight predictor. Query parameters:
| Parameter | Required | Description |
|---|---|---|
| `launch_latitude` | yes | Launch latitude in degrees (-90 to 90) |
| `launch_longitude` | yes | Launch longitude in degrees (-180 to 180 or 0 to 360) |
| `launch_datetime` | yes | Launch time in RFC 3339 format |
| `launch_altitude` | no | Launch altitude in metres ASL (default: 0) |
| `launch_latitude` | yes | Degrees, -90 to 90 |
| `launch_longitude` | yes | Degrees, -180 to 180 or 0 to 360 |
| `launch_datetime` | yes | RFC 3339 |
| `launch_altitude` | no | Metres ASL (default 0) |
| `profile` | no | `standard_profile` (default) or `float_profile` |
| `ascent_rate` | no | Ascent rate in m/s (default: 5) |
| `burst_altitude` | no | Burst altitude in metres (default: 28000) |
| `descent_rate` | no | Sea-level descent rate in m/s (default: 5) |
| `float_altitude` | no | Float altitude in metres (float_profile only) |
| `stop_datetime` | no | Float end time (float_profile only, default: +24h) |
| `ascent_rate` | no | m/s (default 5) |
| `burst_altitude` | no | Metres (default 28000) |
| `descent_rate` | no | m/s (default 5) |
| `float_altitude` | no | Metres (float profile only) |
| `stop_datetime` | no | Float-profile end time |
**Response** (Tawhiri-compatible):
`GET /ready` — returns `{"status": "ok", "dataset_time": "..."}` once a
dataset is loaded; `{"status": "not_ready", ...}` before then.
### Profile-driven (new primary)
`POST /api/v2/prediction` — accepts an arbitrary chain of propagators with
optional constraints. Useful when the frontend wants flight profiles the
Tawhiri shape can't express (e.g. piecewise rates, fallback on constraint
violation).
```json
{
"prediction": [
"launch": {
"time": "2026-03-28T12:00:00Z",
"latitude": 52.2,
"longitude": 0.1,
"altitude": 0
},
"profile": [
{
"stage": "ascent",
"trajectory": [
{"datetime": "2026-03-28T12:00:00Z", "latitude": 52.2, "longitude": 0.1, "altitude": 0},
...
]
"name": "ascent",
"model": {"type": "constant_rate", "rate": 5, "include_wind": true},
"constraints": [{"type": "max_altitude", "limit": 30000}]
},
{
"stage": "descent",
"trajectory": [...]
"name": "descent",
"model": {"type": "parachute_descent", "sea_level_rate": 5, "include_wind": true},
"constraints": [{"type": "terrain_contact"}]
}
],
"metadata": {
"start_datetime": "...",
"complete_datetime": "..."
},
"request": {
"dataset": "2026-03-28T06:00:00Z",
"launch_latitude": 52.2,
...
}
]
}
```
### `GET /ready`
Model types: `constant_rate`, `parachute_descent`, `piecewise`, `wind`.
Constraint types: `max_altitude`, `min_altitude`, `max_time`,
`terrain_contact`. Constraint actions: `stop` (default), `fallback`, `clip`.
Set `"direction": "reverse"` to integrate backward from a known landing.
Health check. Returns `{"status": "ok"}` when a dataset is loaded.
### Dataset admin
## Elevation Dataset
Without elevation data, descent terminates at sea level (altitude <= 0). With elevation data, descent terminates at ground level, matching Tawhiri's behaviour.
### Building the elevation dataset
The elevation dataset uses ETOPO 2022 at 30 arc-second resolution, converted to a ruaumoko-compatible binary format (21601 x 43200 grid of int16 little-endian elevation values in metres).
**Requirements**: Python 3, xarray, netcdf4, numpy.
```bash
pip install xarray netcdf4 numpy
# Downloads ~1.1 GB from NOAA, produces ~1.74 GB binary file
python3 scripts/build_elevation.py /tmp/predictor-data/ruaumoko-dataset
```
GET    /api/v1/admin/datasets                   list stored epochs
POST   /api/v1/admin/datasets {epoch | latest}  trigger a download
DELETE /api/v1/admin/datasets/{epoch}           delete a stored dataset
GET    /api/v1/admin/jobs                       list every job
GET    /api/v1/admin/jobs/{id}                  fetch one job
DELETE /api/v1/admin/jobs/{id}                  cancel a running job
```
To skip the download if you already have the ETOPO NetCDF file:
Returns `JobInfo`:
```bash
ETOPO_NC_PATH=/path/to/ETOPO_2022_v1_30s_N90W180_surface.nc \
python3 scripts/build_elevation.py /tmp/predictor-data/ruaumoko-dataset
```json
{"id":"…","source":"noaa-gfs-0p50","epoch":"…","status":"running",
"started_at":"…","total_units":130,"done_units":47,"bytes":510000000}
```
The ETOPO 2022 NetCDF can be manually downloaded from:
https://www.ncei.noaa.gov/products/etopo-global-relief-model
### Metrics
### Using the elevation dataset
```bash
PREDICTOR_ELEVATION_DATASET=/tmp/predictor-data/ruaumoko-dataset go run ./cmd/api
```
If the file doesn't exist or can't be read, the service starts normally with a warning and falls back to sea-level termination.
`GET /metrics` — Prometheus text exposition. Counters:
`predictor_predictions_total{profile,status}`,
`predictor_downloads_total{source,status}`,
`predictor_download_bytes_total{source}`,
and a gauge `predictor_active_dataset_epoch_seconds`.
## Architecture
```
cmd/api/main.go Entry point, config, scheduler, HTTP server
cmd/
predictor/main.go main server entry point
predictor-cli/main.go HTTP client
compare-tawhiri/main.go end-to-end validation against the public Tawhiri instance
internal/
dataset/
dataset.go Shape constants, pressure levels, S3 URLs
file.go mmap-backed dataset file (read/write/blit)
downloader/
downloader.go S3 partial GRIB download (idx + range requests)
idx.go NOAA .idx file parser
config.go Environment-based configuration
elevation/
elevation.go Ruaumoko-compatible elevation dataset (mmap int16)
prediction/
interpolate.go 4D wind interpolation (time, lat, lon, altitude)
solver.go RK4 integrator with binary search termination
models.go Ascent, descent, wind models; flight profiles
warnings.go Prediction warning counters
service/
service.go Dataset lifecycle, concurrent-safe access
transport/
middleware/log.go Request logging middleware
rest/
handler/handler.go ogen API handler implementation
handler/deps.go Service interface
transport.go ogen HTTP server, CORS
api/rest/predictor.swagger.yml OpenAPI 3.0 spec
pkg/rest/ Generated ogen code (17 files)
scripts/
build_elevation.py ETOPO 2022 to ruaumoko converter
numerics/ pure numerical primitives (interp, bisect, RK4, refinement)
engine/ propagator + constraint system + concrete models
weather/ WindField interface; gfs/ — NOAA GFS file format + impl
datasets/ Source/Storage/Manager + transactional, resumable downloads
gfs/ — NOAA GFS source impl
elevation/ ruaumoko-format ground elevation reader
config/ layered file+env+CLI config
metrics/ Sink interface + Prometheus text impl
api/ HTTP transport
tawhiri/ — legacy v1 endpoint via ogen
v2/ — profile-driven endpoint
admin/ — dataset/job admin endpoints
middleware/
api/rest/predictor.swagger.yml OpenAPI 3 spec for v1 + /ready
pkg/rest/ ogen-generated code (regenerate via `make generate-ogen`)
docs/numerics.tex LaTeX math reference for the numerics package
scripts/build_elevation.py ETOPO 2022 → ruaumoko converter
```
## Wind Dataset
## Deployment
The service downloads GFS 0.5-degree forecast data from NOAA S3:
### Local single instance
```bash
./bin/predictor --data-dir /var/lib/predictor
```
No external dependencies beyond the NOAA S3 mirror.
### Docker single container
```dockerfile
FROM golang:1.25 AS build
WORKDIR /src
COPY . .
RUN go build -o /predictor ./cmd/predictor
FROM gcr.io/distroless/base
COPY --from=build /predictor /predictor
EXPOSE 8080
ENTRYPOINT ["/predictor"]
```
Mount a volume at `/data` and set `PREDICTOR_DATA_DIR=/data`.
### Load-balanced cluster
The server is stateless apart from the on-disk dataset cache and in-memory
job table. For multiple replicas, point all replicas at a shared filesystem
(NFS or similar) for `data_dir`; each replica opens its own read-only mmap. Active
download coordination across replicas is not implemented — run downloads on
one node, or accept that two nodes may download the same epoch concurrently
(only one Commit wins via atomic rename).
## Elevation dataset
Without elevation data, descent terminates at sea level. With elevation,
descent terminates at ground level, matching upstream Tawhiri.
```bash
pip install xarray netcdf4 numpy
python3 scripts/build_elevation.py /var/lib/predictor/elevation
```
`PREDICTOR_ELEVATION_DATASET=/var/lib/predictor/elevation ./bin/predictor`
## Numerical methods
The numerics package (`internal/numerics`) provides:
- regular-grid multilinear interpolation,
- monotone bisection,
- classical RK4 (forward and reverse time),
- binary-search refinement of a termination point.
Detailed math reference: `docs/numerics.tex`. The package has no
domain dependencies and is small enough for manual verification (~300
lines of Go), enabling a future C or Rust port without changes to the
trajectory engine.
## Wind data
| Property | Value |
|---|---|
| Source | `noaa-gfs-bdp-pds.s3.amazonaws.com` |
| Resolution | 0.5 degrees |
| Grid | 361 lat x 720 lon |
| Time steps | 65 (every 3 hours, 0-192h) |
| Pressure levels | 47 (1000 to 1 hPa) |
| Source | NOAA GFS, S3 mirror (`noaa-gfs-bdp-pds.s3.amazonaws.com`) |
| Resolution | 0.5° |
| Grid | 361 × 720 (lat × lng) |
| Forecast steps | 65 (every 3 hours, 0-192h) |
| Pressure levels | 47 (1000 to 1 hPa) |
| Variables | Geopotential height, U-wind, V-wind |
| Dataset size | 9,528,667,200 bytes (~8.87 GiB) |
| Update cadence | Every 6 hours (GFS runs at 00, 06, 12, 18 UTC) |
| File size | ~8.87 GiB (float32 flat binary, mmap-backed) |
| Update cadence | every 6 hours |
Data is downloaded using HTTP Range requests against `.idx` index files, fetching only the needed GRIB messages (HGT, UGRD, VGRD at 47 pressure levels). Full download takes 30-60 minutes depending on bandwidth.
Downloads use HTTP Range requests against `.idx` index files to fetch only
the needed GRIB messages. Downloads are transactional (temp file, manifest,
atomic rename on commit) and resumable: interrupted downloads pick up where
they left off via the manifest.
The dataset is stored as a memory-mapped flat binary file of float32 values in C-order with shape `(65, 47, 3, 361, 720)`.
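The C-order layout makes the position of any value a matter of arithmetic. The sketch below derives the flat index and the total file size from the shape quoted above; the axis names (hour, level, variable, lat, lon) follow the order of the shape and the grid description, and `flatIndex` is an illustrative helper, not the project's code.

```go
package main

import "fmt"

// Shape from the README, in C (row-major) order.
const (
	nHours        = 65
	nLevels       = 47
	nVars         = 3
	nLat          = 361
	nLon          = 720
	bytesPerValue = 4 // float32
)

// flatIndex returns the element index of (hour, level, variable, lat, lon).
// In C-order the last axis varies fastest.
func flatIndex(h, l, v, lat, lon int) int64 {
	return (((int64(h)*nLevels+int64(l))*nVars+int64(v))*nLat+int64(lat))*nLon + int64(lon)
}

func main() {
	total := int64(nHours) * nLevels * nVars * nLat * nLon * bytesPerValue
	fmt.Println("file size in bytes:", total) // 9528667200, as in the table

	// Byte offset of the first value of forecast step 1.
	fmt.Println("offset of step 1:", flatIndex(1, 0, 0, 0, 0)*bytesPerValue)
}
```

One forecast step holds 47 × 3 × 361 × 720 = 36,648,720 values, and 65 such steps at 4 bytes each give the 9,528,667,200-byte file.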
## Validation
## Prediction Algorithms
All algorithms are exact ports of the reference implementations in Tawhiri. The following sections describe the key components.
### Interpolation (`internal/prediction/interpolate.go`)
4D wind interpolation from the dataset grid to arbitrary coordinates.
1. **Trilinear weights** (`pick3`): compute 8 interpolation weights for the (hour, lat, lon) cube corners.
2. **Altitude search** (`search`): binary search on interpolated geopotential height to find the two pressure levels bracketing the target altitude.
3. **Wind extraction** (`interp4`): 8-point weighted sum at each bracket level, then linear interpolation between levels.
Reference: `tawhiri/interpolate.pyx`
### Solver (`internal/prediction/solver.go`)
4th-order Runge-Kutta integrator with dt = 60 seconds.
- State vector: (latitude, longitude, altitude) in degrees and metres.
- Time: UNIX timestamp in seconds.
- Longitude is kept in [0, 360) via Python-style modulo after each `vecadd`.
- When a terminator fires, binary search refinement (tolerance 0.01) finds the precise termination point between the last good step and the first terminated step.
- Longitude interpolation (`lngLerp`) handles the 0/360 wrap-around.
Reference: `tawhiri/solver.pyx`
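The refinement step can be pictured as a bisection over the fraction of the final time step. This is a sketch under assumptions: it treats the 0.01 tolerance as a fraction of the step and returns the terminated end of the bracket, neither of which is confirmed by this README; `refine` is a hypothetical name.

```go
package main

import "fmt"

// refine narrows the fraction s in [0, 1] between the last good state (s = 0)
// and the first terminated state (s = 1) until the bracket is narrower than
// tol. terminated(s) must be false at 0 and true at 1.
func refine(terminated func(s float64) bool, tol float64) float64 {
	lo, hi := 0.0, 1.0
	for hi-lo > tol {
		mid := (lo + hi) / 2
		if terminated(mid) {
			hi = mid
		} else {
			lo = mid
		}
	}
	return hi // smallest fraction known to be terminated
}

func main() {
	// Hypothetical final step: altitude rises linearly from 29800 m to
	// 30100 m while the burst altitude is 30000 m.
	alt := func(s float64) float64 { return 29800 + 300*s }
	s := refine(func(s float64) bool { return alt(s) >= 30000 }, 0.01)
	fmt.Printf("burst at step fraction %.4f, altitude %.1f m\n", s, alt(s))
}
```

In the real solver the state at fraction `s` would come from interpolating between the two integrator states rather than from a closed-form function.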
### Models (`internal/prediction/models.go`)
- **Constant ascent**: vertical velocity = ascent_rate m/s.
- **Drag descent**: NASA atmosphere density model with drag coefficient = sea_level_rate * 1.1045. Descent rate increases with altitude due to thinner air.
- **Wind velocity**: u, v components from interpolation converted to degrees/second: `dlat = (180/pi) * v / (R)`, `dlng = (180/pi) * u / (R * cos(lat))` where R = 6371009 + altitude.
- **Linear model**: sum of component models (e.g., wind + ascent).
- **Elevation termination**: `ground_elevation > altitude` using ruaumoko dataset.
Reference: `tawhiri/models.py`
### Profiles
- **standard_profile**: ascent (constant rate + wind) until burst altitude, then descent (drag + wind) until ground level.
- **float_profile**: ascent to float altitude, then drift at constant altitude until stop time.
## Verification
The predictor has been verified against the reference Tawhiri implementation:
| Test | Result |
|---|---|
| Dataset (step 0): 36.6M float32 values vs Python/cfgrib | 0 mismatches, max diff = 0.0 |
| Prediction burst point vs public Tawhiri API | Identical (lat, lon, alt all match) |
| Prediction landing point vs public Tawhiri API | Identical lat/lon, 5m altitude diff (different elevation datasets) |
| Descent point count | Identical (46 points) |
| Ascent point count | Identical (101 points) |
## Development
```bash
# Regenerate ogen API code after modifying the swagger spec
make generate-ogen
# Run tests
make test
# Format
make fmt
```
### Comparison tools
```bash
# Compare single dataset step against Python/cfgrib reference
go run ./cmd/compare_step0 <run_YYYYMMDDHH> <output_path>
# Run prediction and compare against public Tawhiri API
go run ./cmd/compare_prediction
```
`./bin/compare-tawhiri --server http://localhost:8080` runs an identical
prediction against the local server and against the public SondeHub Tawhiri
instance, reporting the great-circle distance between landing points.
## References
- [Tawhiri](https://github.com/cuspaceflight/tawhiri) — Reference Python/Cython predictor (Cambridge University Spaceflight)
- [tawhiri-downloader](https://github.com/cuspaceflight/tawhiri-downloader) — OCaml dataset downloader
- [ruaumoko](https://github.com/cuspaceflight/ruaumoko) — Global elevation dataset
- [NOAA GFS](https://www.ncei.noaa.gov/products/weather-climate-models/global-forecast) — Global Forecast System
- [NOAA GFS on S3](https://noaa-gfs-bdp-pds.s3.amazonaws.com/index.html) — Public S3 bucket
- [ETOPO 2022](https://www.ncei.noaa.gov/products/etopo-global-relief-model) — Global relief model for elevation data
- [SondeHub Tawhiri API](https://api.v2.sondehub.org/tawhiri) — Public Tawhiri instance for comparison
- [Tawhiri](https://github.com/cuspaceflight/tawhiri) — reference Python/Cython predictor
- [ruaumoko](https://github.com/cuspaceflight/ruaumoko) — global elevation dataset format
- [NOAA GFS](https://www.ncei.noaa.gov/products/weather-climate-models/global-forecast)
- [ETOPO 2022](https://www.ncei.noaa.gov/products/etopo-global-relief-model)
- [SondeHub Tawhiri API](https://api.v2.sondehub.org/tawhiri) — public Tawhiri instance