fbmc-chronos2 / HANDOVER_GUIDE.md
Evgueni Poloukarov
docs: add comprehensive handover guide and archive test scripts
a321b61

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

FBMC Chronos-2 Zero-Shot Forecasting - Handover Guide

Version: 1.0.0 Date: 2025-11-18 Status: Production-Ready MVP Maintainer: Quantitative Analyst


Executive Summary

This project delivers a zero-shot multivariate forecasting system for FBMC cross-border electricity flows using Amazon's Chronos-2 model. The system forecasts 38 European borders with 15.92 MW mean D+1 MAE - 88% better than the 134 MW target.

Key Achievement: Zero-shot learning (no model training) achieves production-quality accuracy using 615 covariate features.


Quick Start

Running Forecasts via API

from gradio_client import Client

# Connect to HuggingFace Space
client = Client("evgueni-p/fbmc-chronos2")

# Run forecast
result_file = client.predict(
    run_date="2024-09-30",          # YYYY-MM-DD format
    forecast_type="full_14day",      # or "smoke_test"
    api_name="/forecast"
)

# Load results
import polars as pl
forecast = pl.read_parquet(result_file)
print(forecast.head())

Forecast Types:

  • smoke_test: Quick validation (1 border × 7 days, ~30 seconds)
  • full_14day: Production forecast (38 borders × 14 days, ~4 minutes)

Output Format

Parquet file with columns:

  • timestamp: Hourly timestamps (D+1 to D+7 or D+14)
  • {border}_median: Median forecast (MW)
  • {border}_q10: 10th percentile uncertainty bound (MW)
  • {border}_q90: 90th percentile uncertainty bound (MW)

Example:

shape: (336, 115)
┌─────────────────────┬──────────────┬───────────┬───────────┐
│ timestamp           ┆ AT_CZ_median ┆ AT_CZ_q10 ┆ AT_CZ_q90 │
├─────────────────────┼──────────────┼───────────┼───────────┤
│ 2024-10-01 01:00:00 ┆ 287.0        ┆ 154.0     ┆ 334.0     │
│ 2024-10-01 02:00:00 ┆ 290.0        ┆ 157.0     ┆ 337.0     │
└─────────────────────┴──────────────┴───────────┴───────────┘

System Architecture

Components

┌─────────────────────┐
│  HuggingFace Space  │  GPU: A100-large (40-80 GB VRAM)
│  (Gradio API)       │  Cost: ~$500/month
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Chronos-2 Pipeline │  Model: amazon/chronos-2 (710M params)
│  (Zero-Shot)        │  Precision: bfloat16
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Feature Dataset    │  Storage: HuggingFace Datasets
│  (615 covariates)   │  Size: ~25 MB (24 months hourly)
└─────────────────────┘

Multivariate Features (615 total)

  1. Weather (520 features): Temperature, wind speed across 52 grid points × 10 vars
  2. Generation (52 features): Solar, wind, hydro, nuclear per zone
  3. CNEC Outages (34 features): Critical Network Element & Contingency availability
  4. Market (9 features): Day-ahead prices, LTA allocations

Data Flow

  1. User calls API with run_date
  2. System extracts 128-hour context window (historical data up to run_date 23:00)
  3. Chronos-2 forecasts 336 hours ahead (14 days) using 615 future covariates
  4. Returns probabilistic forecasts (3 quantiles: 0.1, 0.5, 0.9)

Performance Metrics

October 2024 Evaluation Results

Metric Value Target Achievement
D+1 MAE (Mean) 15.92 MW ≤134 MW 88% better
D+1 MAE (Median) 0.00 MW - ✅ Excellent
Borders ≤150 MW 36/38 (94.7%) - ✅ Very good
Forecast time 3.56 min <5 min ✅ Fast

MAE Degradation Over Forecast Horizon

D+1:  15.92 MW  (baseline)
D+2:  17.13 MW  (+7.6%)
D+7:  28.98 MW  (+82%)
D+14: 30.32 MW  (+90%)

Interpretation: Forecast accuracy degrades gracefully. Even at D+14, errors remain reasonable.

Border-Level Performance

Best Performers (D+1 MAE = 0.0 MW):

  • AT_CZ, AT_HU, AT_SI, BE_DE, CZ_DE (perfect forecasts!)
  • 15 additional borders with <1 MW error

Outliers (Require Phase 2 attention):

  • AT_DE: 266 MW (bidirectional flow complexity)
  • FR_DE: 181 MW (high volatility, large capacity)

Infrastructure & Costs

HuggingFace Space

Why A100 GPU?

The multivariate model with 615 features requires:

  • Baseline memory: 18 GB (model + dataset + PyTorch cache)
  • Attention computation: 11 GB per border
  • Total: ~29 GB → L4 (22 GB) insufficient, A100 (40 GB) comfortable

Memory Optimizations Applied:

  • batch_size=32 (from default 256) → 87% memory reduction
  • quantile_levels=[0.1, 0.5, 0.9] (from 9) → 67% reduction
  • context_hours=128 (from 512) → 50% reduction
  • torch.inference_mode() → disables gradient tracking

Dataset Storage

  • Location: HuggingFace Datasets (evgueni-p/fbmc-features-24month)
  • Size: 25 MB (17,544 hours × 2,514 features)
  • Access: Public read, authenticated write
  • Update Frequency: Monthly (recommended)

Known Limitations & Phase 2 Roadmap

Current Limitations

  1. Zero-shot only: No model fine-tuning (deliberate MVP scope)
  2. Two outlier borders: AT_DE (266 MW), FR_DE (181 MW) exceed targets
  3. Fixed context window: 128 hours (reduced from 256h for memory)
  4. No real-time updates: Forecast runs are on-demand via API
  5. No automated retraining: Model parameters are frozen

Phase 2 Recommendations

Priority 1: Fine-Tuning for Outlier Borders

  • Objective: Reduce AT_DE and FR_DE MAE below 150 MW
  • Approach: LoRA (Low-Rank Adaptation) fine-tuning on 6 months of border-specific data
  • Expected Improvement: 40-60% MAE reduction for outliers
  • Timeline: 2-3 weeks

Priority 2: Extend Context Window

  • Objective: Increase from 128h to 512h for better pattern learning
  • Requires: Code change + verify no OOM on A100
  • Expected Improvement: 10-15% overall MAE reduction
  • Timeline: 1 week

Priority 3: Feature Engineering Enhancements

  • Add: Scheduled outages, cross-border ramping constraints
  • Refine: CNEC weighting based on binding frequency
  • Expected Improvement: 5-10% MAE reduction
  • Timeline: 2 weeks

Priority 4: Automated Daily Forecasting

  • Objective: Scheduled daily runs at 23:00 CET
  • Approach: GitHub Actions + HF Space API
  • Storage: Results in HF Datasets or S3
  • Timeline: 1 week

Priority 5: Probabilistic Calibration

  • Objective: Ensure 80% of actuals fall within [q10, q90] bounds
  • Approach: Conformal prediction or quantile calibration
  • Expected Improvement: Better uncertainty quantification
  • Timeline: 2 weeks

Troubleshooting

Common Issues

1. Space Shows "PAUSED" Status

Cause: GPU tier requires manual approval or billing issue

Solution:

  1. Check Space settings: https://huggingface.co/spaces/evgueni-p/fbmc-chronos2/settings
  2. Verify account tier supports A100-large
  3. Click "Factory Reboot" to restart

2. CUDA Out of Memory Errors

Symptoms: Returns debug_*.txt file instead of parquet, error shows OOM

Solution:

  1. Verify suggested_hardware: a100-large in README.md
  2. Check Space logs for actual GPU allocated
  3. If downgraded to L4, file GitHub issue for GPU upgrade

Fallback: Reduce context_hours from 128 to 64 in src/forecasting/chronos_inference.py:117

3. Forecast Returns Empty/Invalid Data

Check:

  1. Verify run_date is within dataset range (2023-10-01 to 2025-09-30)
  2. Check dataset accessibility: https://huggingface.co/datasets/evgueni-p/fbmc-features-24month
  3. Review debug file for specific errors

4. Slow Inference (>10 minutes)

Normal Range: 3-5 minutes for 38 borders × 14 days

If Slower:

  1. Check Space GPU allocation (should be A100)
  2. Verify batch_size=32 in code (not reverted to 256)
  3. Check HF Space region (US-East faster than EU)

Development Workflow

Local Development

# Clone repository
git clone https://github.com/evgspacdmy/fbmc_chronos2.git
cd fbmc_chronos2

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies with uv (faster than pip)
.venv/Scripts/uv.exe pip install -r requirements.txt

# Run local tests
pytest tests/ -v

Deploying Changes to HF Space

CRITICAL: HF Space uses main branch, local uses master

# Make changes locally
git add .
git commit -m "feat: your description"

# Push to BOTH remotes
git push origin master           # GitHub (version control)
git push hf-new master:main      # HF Space (deployment)

Wait 3-5 minutes for Space rebuild. Check logs for successful deployment.

Adding New Features

  1. Create feature branch: git checkout -b feature/name
  2. Implement changes with tests
  3. Run evaluation: python scripts/evaluate_october_2024.py
  4. Merge to master if MAE doesn't degrade
  5. Push to both remotes

API Reference

Gradio API Endpoints

/forecast

Parameters:

  • run_date (str): Forecast run date in YYYY-MM-DD format
  • forecast_type (str): "smoke_test" or "full_14day"

Returns:

  • File path to parquet forecast or debug txt (if errors)

Example:

result = client.predict(
    run_date="2024-09-30",
    forecast_type="full_14day",
    api_name="/forecast"
)

Python SDK (Gradio Client)

from gradio_client import Client
import polars as pl

# Initialize client
client = Client("evgueni-p/fbmc-chronos2")

# Run forecast
result = client.predict(
    run_date="2024-09-30",
    forecast_type="full_14day",
    api_name="/forecast"
)

# Load and process results
df = pl.read_parquet(result)

# Extract specific border
at_cz_median = df.select(["timestamp", "AT_CZ_median"])

Data Schema

Feature Dataset Columns

Total: 2,514 columns (1 timestamp + 603 target borders + 12 actuals + 1,899 features)

Target Columns (603):

  • target_border_{BORDER}: Historical flow values (MW)
  • Example: target_border_AT_CZ, target_border_FR_DE

Actual Columns (12):

  • actual_{ZONE}_price: Day-ahead electricity price (EUR/MWh)
  • Example: actual_DE_price, actual_FR_price

Feature Categories (1,899 total):

  1. Weather Future (520 features)

    • weather_future_{zone}_{var}: temperature, wind_speed, etc.
    • Zones: AT, BE, CZ, DE, FR, HU, HR, NL, PL, RO, SI, SK
    • Variables: temperature, wind_u, wind_v, pressure, humidity, etc.
  2. Generation Future (52 features)

    • generation_future_{zone}_{type}: solar, wind, hydro, nuclear
    • Example: generation_future_DE_solar
  3. CNEC Outages (34 features)

    • cnec_outage_{cnec_id}: Binary availability (0=outage, 1=available)
    • Tier-1 CNECs (most binding)
  4. Market (9 features)

    • lta_{border}: Long-term allocation (MW)
    • Day-ahead price forecasts

Forecast Output Schema

Columns: 115 (1 timestamp + 38 borders × 3 quantiles)

timestamp: datetime
{border}_median: float64  (50th percentile forecast)
{border}_q10: float64     (10th percentile, lower bound)
{border}_q90: float64     (90th percentile, upper bound)

Borders: AT_CZ, AT_HU, AT_SI, BE_DE, CZ_AT, ..., NL_DE (38 total)


Contact & Support

Project Repository

Key Documentation

  • doc/activity.md: Development log and session history
  • DEPLOYMENT_NOTES.md: HF Space deployment troubleshooting
  • CLAUDE.md: Development rules and conventions
  • README.md: Project overview and quick start

Getting Help

  1. Check documentation first (this guide, README.md, activity.md)
  2. Review recent commits for similar issues
  3. Check HF Space logs for runtime errors
  4. File GitHub issue with detailed error description

Appendix: Technical Details

Model Specifications

  • Architecture: Chronos-2 (T5-based encoder-decoder)
  • Parameters: 710M
  • Precision: bfloat16 (memory efficient)
  • Context: 128 hours (reduced from 512h for GPU memory)
  • Horizon: 336 hours (14 days)
  • Batch Size: 32 (optimized for A100 GPU)
  • Quantiles: 3 [0.1, 0.5, 0.9]

Inference Configuration

pipeline.predict_df(
    context_data,          # 128h × 2,514 features
    future_df=future_data, # 336h × 615 features
    prediction_length=336,
    batch_size=32,
    quantile_levels=[0.1, 0.5, 0.9]
)

Memory Footprint

  • Model weights: ~2 GB (bfloat16)
  • Dataset: ~1 GB (in-memory)
  • PyTorch cache: ~15 GB (workspace)
  • Attention (per batch): ~11 GB
  • Total: ~29 GB (peak)

GPU Requirements

GPU VRAM Status
T4 16 GB ❌ Insufficient (18 GB baseline)
L4 22 GB ❌ Insufficient (29 GB peak)
A10G 24 GB ⚠️ Marginal (tight fit)
A100 40-80 GB Recommended

Document Version: 1.0.0 Last Updated: 2025-11-18 Status: Production Ready