predictor/README.md
2026-03-28 03:07:13 +09:00

261 lines
9.9 KiB
Markdown

# Balloon Trajectory Predictor
High-altitude balloon trajectory prediction service. Predicts ascent, burst, and descent trajectories using GFS wind forecast data from NOAA.
The prediction algorithms are an exact port of [Tawhiri](https://github.com/cuspaceflight/tawhiri) (Cambridge University Spaceflight) to Go, verified to produce identical results.
## Quick Start
```bash
# Build
make build
# Run (downloads ~9 GB of GFS data on first start, takes 30-60 min)
PREDICTOR_DATA_DIR=/tmp/predictor-data go run ./cmd/api
# Check readiness
curl http://localhost:8080/ready
# Run a prediction
curl 'http://localhost:8080/api/v1/prediction?launch_latitude=52.2&launch_longitude=0.1&launch_datetime=2026-03-28T12:00:00Z&launch_altitude=0&ascent_rate=5&burst_altitude=30000&descent_rate=5'
```
## Configuration
All configuration is via environment variables.
| Variable | Default | Description |
|---|---|---|
| `PREDICTOR_PORT` | `8080` | HTTP server port |
| `PREDICTOR_DATA_DIR` | `/tmp/predictor-data` | Directory for wind datasets and temp files |
| `PREDICTOR_DOWNLOAD_PARALLEL` | `8` | Max concurrent GRIB download goroutines |
| `PREDICTOR_UPDATE_INTERVAL` | `6h` | How often to check for new forecasts |
| `PREDICTOR_DATASET_TTL` | `48h` | Max age before a dataset is considered stale |
| `PREDICTOR_ELEVATION_DATASET` | `/srv/ruaumoko-dataset` | Path to elevation dataset (optional) |
## API
### `GET /api/v1/prediction`
Run a balloon trajectory prediction.
**Parameters** (query string):
| Parameter | Required | Description |
|---|---|---|
| `launch_latitude` | yes | Launch latitude in degrees (-90 to 90) |
| `launch_longitude` | yes | Launch longitude in degrees (-180 to 180 or 0 to 360) |
| `launch_datetime` | yes | Launch time in RFC 3339 format |
| `launch_altitude` | no | Launch altitude in metres ASL (default: 0) |
| `profile` | no | `standard_profile` (default) or `float_profile` |
| `ascent_rate` | no | Ascent rate in m/s (default: 5) |
| `burst_altitude` | no | Burst altitude in metres (default: 28000) |
| `descent_rate` | no | Sea-level descent rate in m/s (default: 5) |
| `float_altitude` | no | Float altitude in metres (float_profile only) |
| `stop_datetime` | no | Float end time (float_profile only, default: +24h) |
**Response** (Tawhiri-compatible):
```json
{
"prediction": [
{
"stage": "ascent",
"trajectory": [
{"datetime": "2026-03-28T12:00:00Z", "latitude": 52.2, "longitude": 0.1, "altitude": 0},
...
]
},
{
"stage": "descent",
"trajectory": [...]
}
],
"metadata": {
"start_datetime": "...",
"complete_datetime": "..."
},
"request": {
"dataset": "2026-03-28T06:00:00Z",
"launch_latitude": 52.2,
...
}
}
```
### `GET /ready`
Health check. Returns `{"status": "ok"}` when a dataset is loaded.
## Elevation Dataset
Without elevation data, descent terminates at sea level (altitude <= 0). With elevation data, descent terminates at ground level, matching Tawhiri's behaviour.
### Building the elevation dataset
The elevation dataset uses ETOPO 2022 at 30 arc-second resolution, converted to a ruaumoko-compatible binary format (21601 x 43200 grid of int16 little-endian elevation values in metres).
**Requirements**: Python 3, xarray, netcdf4, numpy.
```bash
pip install xarray netcdf4 numpy
# Downloads ~1.1 GB from NOAA, produces ~1.74 GB binary file
python3 scripts/build_elevation.py /tmp/predictor-data/ruaumoko-dataset
```
To skip the download if you already have the ETOPO NetCDF file:
```bash
ETOPO_NC_PATH=/path/to/ETOPO_2022_v1_30s_N90W180_surface.nc \
python3 scripts/build_elevation.py /tmp/predictor-data/ruaumoko-dataset
```
The ETOPO 2022 NetCDF can be manually downloaded from:
https://www.ncei.noaa.gov/products/etopo-global-relief-model
### Using the elevation dataset
```bash
PREDICTOR_ELEVATION_DATASET=/tmp/predictor-data/ruaumoko-dataset go run ./cmd/api
```
If the file doesn't exist or can't be read, the service starts normally with a warning and falls back to sea-level termination.
## Architecture
```
cmd/api/main.go Entry point, config, scheduler, HTTP server
internal/
dataset/
dataset.go Shape constants, pressure levels, S3 URLs
file.go mmap-backed dataset file (read/write/blit)
downloader/
downloader.go S3 partial GRIB download (idx + range requests)
idx.go NOAA .idx file parser
config.go Environment-based configuration
elevation/
elevation.go Ruaumoko-compatible elevation dataset (mmap int16)
prediction/
interpolate.go 4D wind interpolation (time, lat, lon, altitude)
solver.go RK4 integrator with binary search termination
models.go Ascent, descent, wind models; flight profiles
warnings.go Prediction warning counters
service/
service.go Dataset lifecycle, concurrent-safe access
transport/
middleware/log.go Request logging middleware
rest/
handler/handler.go ogen API handler implementation
handler/deps.go Service interface
transport.go ogen HTTP server, CORS
api/rest/predictor.swagger.yml OpenAPI 3.0 spec
pkg/rest/ Generated ogen code (17 files)
scripts/
build_elevation.py ETOPO 2022 to ruaumoko converter
```
## Wind Dataset
The service downloads GFS 0.5-degree forecast data from NOAA S3:
| Property | Value |
|---|---|
| Source | `noaa-gfs-bdp-pds.s3.amazonaws.com` |
| Resolution | 0.5 degrees |
| Grid | 361 lat x 720 lon |
| Time steps | 65 (every 3 hours, 0-192h) |
| Pressure levels | 47 (1000 to 1 hPa) |
| Variables | Geopotential height, U-wind, V-wind |
| Dataset size | 9,528,667,200 bytes (~8.87 GiB) |
| Update cadence | Every 6 hours (GFS runs at 00, 06, 12, 18 UTC) |
Data is downloaded using HTTP Range requests against `.idx` index files, fetching only the needed GRIB messages (HGT, UGRD, VGRD at 47 pressure levels). Full download takes 30-60 minutes depending on bandwidth.
The dataset is stored as a memory-mapped flat binary file of float32 values in C-order with shape `(65, 47, 3, 361, 720)`.
## Prediction Algorithms
All algorithms are exact ports of the reference implementations in Tawhiri. The following sections describe the key components.
### Interpolation (`internal/prediction/interpolate.go`)
4D wind interpolation from the dataset grid to arbitrary coordinates.
1. **Trilinear weights** (`pick3`): compute 8 interpolation weights for the (hour, lat, lon) cube corners.
2. **Altitude search** (`search`): binary search on interpolated geopotential height to find the two pressure levels bracketing the target altitude.
3. **Wind extraction** (`interp4`): 8-point weighted sum at each bracket level, then linear interpolation between levels.
Reference: `tawhiri/interpolate.pyx`
### Solver (`internal/prediction/solver.go`)
4th-order Runge-Kutta integrator with dt = 60 seconds.
- State vector: (latitude, longitude, altitude) in degrees and metres.
- Time: UNIX timestamp in seconds.
- Longitude is kept in [0, 360) via Python-style modulo after each `vecadd`.
- When a terminator fires, binary search refinement (tolerance 0.01) finds the precise termination point between the last good step and the first terminated step.
- Longitude interpolation (`lngLerp`) handles the 0/360 wrap-around.
Reference: `tawhiri/solver.pyx`
### Models (`internal/prediction/models.go`)
- **Constant ascent**: vertical velocity = ascent_rate m/s.
- **Drag descent**: NASA atmosphere density model with drag coefficient = sea_level_rate * 1.1045. Descent rate increases with altitude due to thinner air.
- **Wind velocity**: u, v components from interpolation converted to degrees/second: `dlat = (180/pi) * v / (R)`, `dlng = (180/pi) * u / (R * cos(lat))` where R = 6371009 + altitude.
- **Linear model**: sum of component models (e.g., wind + ascent).
- **Elevation termination**: `ground_elevation > altitude` using ruaumoko dataset.
Reference: `tawhiri/models.py`
### Profiles
- **standard_profile**: ascent (constant rate + wind) until burst altitude, then descent (drag + wind) until ground level.
- **float_profile**: ascent to float altitude, then drift at constant altitude until stop time.
## Verification
The predictor has been verified against the reference Tawhiri implementation:
| Test | Result |
|---|---|
| Dataset (step 0): 36.6M float32 values vs Python/cfgrib | 0 mismatches, max diff = 0.0 |
| Prediction burst point vs public Tawhiri API | Identical (lat, lon, alt all match) |
| Prediction landing point vs public Tawhiri API | Identical lat/lon, 5m altitude diff (different elevation datasets) |
| Descent point count | Identical (46 points) |
| Ascent point count | Identical (101 points) |
## Development
```bash
# Regenerate ogen API code after modifying the swagger spec
make generate-ogen
# Run tests
make test
# Format
make fmt
```
### Comparison tools
```bash
# Compare single dataset step against Python/cfgrib reference
go run ./cmd/compare_step0 <run_YYYYMMDDHH> <output_path>
# Run prediction and compare against public Tawhiri API
go run ./cmd/compare_prediction
```
## References
- [Tawhiri](https://github.com/cuspaceflight/tawhiri) — Reference Python/Cython predictor (Cambridge University Spaceflight)
- [tawhiri-downloader](https://github.com/cuspaceflight/tawhiri-downloader) — OCaml dataset downloader
- [ruaumoko](https://github.com/cuspaceflight/ruaumoko) — Global elevation dataset
- [NOAA GFS](https://www.ncei.noaa.gov/products/weather-climate-models/global-forecast) — Global Forecast System
- [NOAA GFS on S3](https://noaa-gfs-bdp-pds.s3.amazonaws.com/index.html) — Public S3 bucket
- [ETOPO 2022](https://www.ncei.noaa.gov/products/etopo-global-relief-model) — Global relief model for elevation data
- [SondeHub Tawhiri API](https://api.v2.sondehub.org/tawhiri) — Public Tawhiri instance for comparison