Spaces:
Sleeping
feat: fix future covariate architecture (615 features: temporal, weather, 176 CNEC outages)
Browse filesCritical fixes to future covariate identification for Chronos 2 inference:
ISSUE 1: CNEC Transmission Outages (31 → 176 features)
- Problem: Only 31 CNECs with historical outages had features
- During inference, any of 176 CNECs could have planned future outages
- Model was blind to 82% of transmission network outages
- Solution: Preserve all 176 zero-filled CNEC features in cleanup
ISSUE 2: Weather Features Not Marked (0 → 375 features)
- Problem: 375 weather features existed but not marked as future covariates
- ECMWF D+15 forecasts wouldn't be used by model
- Solution: Include weather_cols in metadata future covariate check
ISSUE 3: Temporal Features Missing (0 → 12 features)
- Problem: Deterministic features (hour, day, weekday) not marked as future
- Model couldn't leverage known temporal patterns
- Solution: Add temporal_cols to future covariate identification
Changes:
- src/feature_engineering/engineer_entsoe_features.py:
* Skip CNEC outages in zero-variance cleanup (lines 843-858)
* Skip CNEC outages in duplicate removal (lines 860-883)
* Result: 296 → 441 ENTSO-E features (+145 CNEC outages)
- notebooks/05_unified_features_final.py:
* Add temporal features to identification (lines 287-320)
* Include temporal + weather in metadata (lines 931-976)
* Update summary table and text (lines 382-418, 1034)
- doc/activity.md:
* Comprehensive documentation of fixes and rationale
* Future covariate architecture and inference strategy
* Data quality validation results
Results:
- Total features: 2,408 → 2,553 (+145)
- Future covariates: 83 → 615 (+532)
* Temporal: 12 (deterministic)
* LTA: 40 (years ahead)
* Load Forecasts: 12 (D+1)
* CNEC Outages: 176 (D+22)
* Weather: 375 (D+15 ECMWF)
- Historical features: 1,938
Data regenerated:
- features_entsoe_24month.parquet (10.67 MB, 441 features)
- features_unified_24month.parquet (24.9 MB, 2,553 features)
- features_unified_metadata.csv (615 future covariates marked)
All validations passed. Ready for Day 3 zero-shot inference.
|
@@ -3260,15 +3260,278 @@ Successfully unified all three feature sets (JAO, ENTSO-E, Weather) into a singl
|
|
| 3260 |
|
| 3261 |
---
|
| 3262 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 3263 |
**Status Update**:
|
| 3264 |
- Day 0: ✅ Setup complete
|
| 3265 |
- Day 1: ✅ Data collection complete (JAO, ENTSO-E, Weather)
|
| 3266 |
- Day 2: ✅ Feature engineering complete (JAO, ENTSO-E, Weather)
|
| 3267 |
-
-
|
|
|
|
| 3268 |
- Day 3: ⏳ Zero-shot inference (NEXT)
|
| 3269 |
- Day 4: ⏳ Evaluation
|
| 3270 |
- Day 5: ⏳ Documentation + handover
|
| 3271 |
|
| 3272 |
**NEXT SESSION BOOKMARK**: Day 3 - Implement Chronos 2 zero-shot inference pipeline
|
| 3273 |
|
| 3274 |
-
**Ready for Inference**: ✅ Unified dataset
|
|
|
|
| 3260 |
|
| 3261 |
---
|
| 3262 |
|
| 3263 |
+
## 2025-11-11 - Future Covariate Architecture Fixed ✅
|
| 3264 |
+
|
| 3265 |
+
### Summary
|
| 3266 |
+
Identified and fixed critical gaps in future covariate identification: CNEC transmission outages (31 → 176), weather features (not marked), and temporal features (missing). Rebuilt ENTSO-E feature engineering with cleanup safeguards, regenerated unified dataset, and updated metadata. Final system: **2,553 features (615 future, 1,938 historical)** ready for Chronos 2 zero-shot inference.
|
| 3267 |
+
|
| 3268 |
+
### Issues Identified
|
| 3269 |
+
|
| 3270 |
+
#### 1. CNEC Transmission Outages Insufficient (31 vs 176)
|
| 3271 |
+
**Problem**: Only 31 CNECs with historical outages had features. During inference, ANY of the 176 master CNECs could have planned future outages, but the model couldn't receive that information.
|
| 3272 |
+
|
| 3273 |
+
**Root Cause**: Feature engineering cleanup logic removed 145 zero-filled CNEC features as "zero-variance" and "duplicates."
|
| 3274 |
+
|
| 3275 |
+
**Impact**: Model blind to future outages for 145 CNECs (82% of transmission network).
|
| 3276 |
+
|
| 3277 |
+
#### 2. Weather Features Not Marked as Future Covariates (0 vs 375)
|
| 3278 |
+
**Problem**: 375 weather features exist but metadata marked them as historical, not future covariates.
|
| 3279 |
+
|
| 3280 |
+
**Root Cause**: Notebook metadata generation (`create_metadata()` line 942) only checked LTA, load forecasts, and outages - excluded weather.
|
| 3281 |
+
|
| 3282 |
+
**Impact**: During inference, ECMWF D+15 forecasts wouldn't be used by Chronos 2.
|
| 3283 |
+
|
| 3284 |
+
#### 3. Temporal Features Missing from Future Covariates (0 vs 12)
|
| 3285 |
+
**Problem**: Temporal features (hour, day, weekday, etc.) always known deterministically but not marked as future covariates.
|
| 3286 |
+
|
| 3287 |
+
**Root Cause**: Not included in future covariate identification logic.
|
| 3288 |
+
|
| 3289 |
+
**Impact**: Model couldn't leverage known future temporal patterns.
|
| 3290 |
+
|
| 3291 |
+
### Work Completed
|
| 3292 |
+
|
| 3293 |
+
#### 1. Fixed ENTSO-E Feature Engineering
|
| 3294 |
+
**File**: `src/feature_engineering/engineer_entsoe_features.py`
|
| 3295 |
+
|
| 3296 |
+
**Changes Made**:
|
| 3297 |
+
```python
|
| 3298 |
+
# Line 843-858: Zero-variance cleanup - skip transmission outages
|
| 3299 |
+
if col.startswith('outage_cnec_'):
|
| 3300 |
+
continue # Keep even if zero-filled
|
| 3301 |
+
|
| 3302 |
+
# Line 860-883: Duplicate removal - skip transmission outages
|
| 3303 |
+
if col1.startswith('outage_cnec_') or col2.startswith('outage_cnec_'):
|
| 3304 |
+
continue # Each CNEC needs own column for inference
|
| 3305 |
+
```
|
| 3306 |
+
|
| 3307 |
+
**Result**:
|
| 3308 |
+
- Before: 296 ENTSO-E features (31 CNEC outages)
|
| 3309 |
+
- After: 441 ENTSO-E features (176 CNEC outages)
|
| 3310 |
+
- Change: +145 zero-filled CNEC outage features preserved
|
| 3311 |
+
|
| 3312 |
+
**Validation**: All 176 CNEC outage features confirmed present in output file.
|
| 3313 |
+
|
| 3314 |
+
#### 2. Updated Unification Notebook
|
| 3315 |
+
**File**: `notebooks/05_unified_features_final.py`
|
| 3316 |
+
|
| 3317 |
+
**Change 1** (line 287-320): Added temporal features to identification
|
| 3318 |
+
```python
|
| 3319 |
+
# Added:
|
| 3320 |
+
temporal_cols = [c for c in future_cov_all_cols if any(x in c for x in
|
| 3321 |
+
['hour', 'day', 'month', 'weekday', 'year', 'weekend', '_sin', '_cos'])]
|
| 3322 |
+
|
| 3323 |
+
# Updated return:
|
| 3324 |
+
return temporal_cols, lta_cols, load_forecast_cols, outage_cols, weather_cols, future_cov_counts
|
| 3325 |
+
```
|
| 3326 |
+
|
| 3327 |
+
**Change 2** (line 382-418): Added temporal row to summary table
|
| 3328 |
+
|
| 3329 |
+
**Change 3** (line 931-976): Updated metadata generation
|
| 3330 |
+
```python
|
| 3331 |
+
# Line 931: Added temporal_cols and weather_cols to function signature
|
| 3332 |
+
def create_metadata(pl, categories, temporal_cols, lta_cols, load_forecast_cols,
|
| 3333 |
+
outage_cols, weather_cols, outage_stats):
|
| 3334 |
+
|
| 3335 |
+
# Line 948-952: Include temporal and weather in future covariate check
|
| 3336 |
+
meta_is_future = (meta_col in temporal_cols or
|
| 3337 |
+
meta_col in lta_cols or
|
| 3338 |
+
meta_col in load_forecast_cols or
|
| 3339 |
+
meta_col in outage_cols or
|
| 3340 |
+
meta_col in weather_cols)
|
| 3341 |
+
|
| 3342 |
+
# Line 955-966: Added extension periods for temporal and weather
|
| 3343 |
+
if meta_col in temporal_cols:
|
| 3344 |
+
meta_extension_days = 'Full horizon (deterministic)'
|
| 3345 |
+
elif meta_col in weather_cols:
|
| 3346 |
+
meta_extension_days = '15 days (D+15 ECMWF)'
|
| 3347 |
+
```
|
| 3348 |
+
|
| 3349 |
+
**Change 4** (line 1034): Updated summary text (87 → 615)
|
| 3350 |
+
|
| 3351 |
+
#### 3. Regenerated All Outputs
|
| 3352 |
+
|
| 3353 |
+
**Step 1**: Re-ran ENTSO-E feature engineering
|
| 3354 |
+
```bash
|
| 3355 |
+
.venv\Scripts\python.exe src\feature_engineering\engineer_entsoe_features.py
|
| 3356 |
+
```
|
| 3357 |
+
- Output: 441 ENTSO-E features (176 CNEC outages preserved)
|
| 3358 |
+
- File: `data/processed/features_entsoe_24month.parquet` (10.67 MB)
|
| 3359 |
+
|
| 3360 |
+
**Step 2**: Re-ran unification
|
| 3361 |
+
```bash
|
| 3362 |
+
.venv\Scripts\python.exe scripts\unify_features_checkpoint.py
|
| 3363 |
+
```
|
| 3364 |
+
- Output: 2,553 total features (17,544 hours × 2,553 columns)
|
| 3365 |
+
- File: `data/processed/features_unified_24month.parquet` (24.9 MB)
|
| 3366 |
+
|
| 3367 |
+
**Step 3**: Regenerated metadata with updated logic
|
| 3368 |
+
- Custom script with temporal + weather future covariate marking
|
| 3369 |
+
- Output: `data/processed/features_unified_metadata.csv`
|
| 3370 |
+
- Result: 615 future covariates correctly identified
|
| 3371 |
+
|
| 3372 |
+
### Final Feature Architecture
|
| 3373 |
+
|
| 3374 |
+
#### Total Feature Count: 2,553
|
| 3375 |
+
|
| 3376 |
+
| Source | Features | Description |
|
| 3377 |
+
|--------|----------|-------------|
|
| 3378 |
+
| JAO | 1,737 | CNECs, borders, net positions, LTA, temporal |
|
| 3379 |
+
| ENTSO-E | 441 | Generation, demand, prices, load forecasts, outages (176 CNECs) |
|
| 3380 |
+
| Weather | 375 | Temperature, wind, solar, cloud, pressure, lags, derived |
|
| 3381 |
+
| **TOTAL** | **2,553** | **Complete FBMC feature set** |
|
| 3382 |
+
|
| 3383 |
+
#### Future Covariate Breakdown: 615
|
| 3384 |
+
|
| 3385 |
+
| Category | Count | Extension Period | Purpose |
|
| 3386 |
+
|----------|-------|------------------|---------|
|
| 3387 |
+
| **Temporal** | 12 | Full horizon (deterministic) | Hour, day, weekday always known |
|
| 3388 |
+
| **LTA** | 40 | Full horizon (years) | Auction results known in advance |
|
| 3389 |
+
| **Load Forecasts** | 12 | D+1 (1 day) | TSO demand forecasts |
|
| 3390 |
+
| **CNEC Outages** | 176 | Up to D+22 | Planned transmission maintenance |
|
| 3391 |
+
| **Weather** | 375 | D+15 (15 days) | ECMWF IFS 0.25° forecasts |
|
| 3392 |
+
| **TOTAL** | **615** | **Variable** | **24.1% of features** |
|
| 3393 |
+
|
| 3394 |
+
#### Historical Features: 1,938
|
| 3395 |
+
|
| 3396 |
+
These include:
|
| 3397 |
+
- CNEC binding/RAM/utilization (historical congestion)
|
| 3398 |
+
- Border flows and capacities (historical)
|
| 3399 |
+
- Net positions (historical)
|
| 3400 |
+
- PTDF coefficients and interactions
|
| 3401 |
+
- Generation by type (historical)
|
| 3402 |
+
- Day-ahead prices (historical)
|
| 3403 |
+
- Hydro storage levels (historical)
|
| 3404 |
+
|
| 3405 |
+
### Data Quality Validation
|
| 3406 |
+
|
| 3407 |
+
**Unified Dataset**:
|
| 3408 |
+
- Dimensions: 17,544 rows × 2,553 columns
|
| 3409 |
+
- Date range: Oct 1, 2023 - Sept 30, 2025 (24 months, hourly)
|
| 3410 |
+
- File size: 24.9 MB (compressed parquet)
|
| 3411 |
+
- Timestamp continuity: 100% (no gaps)
|
| 3412 |
+
|
| 3413 |
+
**Completeness by Category**:
|
| 3414 |
+
- Temporal: 100%
|
| 3415 |
+
- LTA: 100%
|
| 3416 |
+
- Border capacity: 99.86%
|
| 3417 |
+
- Net positions: 100%
|
| 3418 |
+
- Load forecasts: 99.73%
|
| 3419 |
+
- Transmission outages: 100% (binary: 0 or 1)
|
| 3420 |
+
- Weather: 100%
|
| 3421 |
+
- Generation/demand: 99.85%
|
| 3422 |
+
- **CNEC features: 26.41%** (expected sparsity - congestion is occasional)
|
| 3423 |
+
|
| 3424 |
+
**Overall Completeness**: 57.11% (due to expected CNEC sparsity)
|
| 3425 |
+
|
| 3426 |
+
### Files Modified
|
| 3427 |
+
|
| 3428 |
+
**Code Changes**:
|
| 3429 |
+
1. `src/feature_engineering/engineer_entsoe_features.py`
|
| 3430 |
+
- Lines 843-858: Zero-variance cleanup safeguard
|
| 3431 |
+
- Lines 860-883: Duplicate removal safeguard
|
| 3432 |
+
|
| 3433 |
+
2. `notebooks/05_unified_features_final.py`
|
| 3434 |
+
- Lines 287-320: Future covariate identification (added temporal)
|
| 3435 |
+
- Lines 382-418: Summary table (added temporal row)
|
| 3436 |
+
- Lines 931-976: Metadata generation (added temporal + weather)
|
| 3437 |
+
- Line 1034: Summary text (87 → 615)
|
| 3438 |
+
|
| 3439 |
+
**Data Files Regenerated**:
|
| 3440 |
+
1. `data/processed/features_entsoe_24month.parquet`
|
| 3441 |
+
- Size: 10.67 MB
|
| 3442 |
+
- Features: 441 (was 296, +145)
|
| 3443 |
+
- CNEC outages: 176 (was 31, +145)
|
| 3444 |
+
|
| 3445 |
+
2. `data/processed/features_unified_24month.parquet`
|
| 3446 |
+
- Size: 24.9 MB
|
| 3447 |
+
- Features: 2,553 (was 2,408, +145)
|
| 3448 |
+
- Rows: 17,544 (unchanged)
|
| 3449 |
+
|
| 3450 |
+
3. `data/processed/features_unified_metadata.csv`
|
| 3451 |
+
- Total features: 2,552 (excludes timestamp)
|
| 3452 |
+
- Future covariates: 615 (was 83, +532)
|
| 3453 |
+
- Historical features: 1,937 (was 2,324, -387)
|
| 3454 |
+
|
| 3455 |
+
### Key Lessons
|
| 3456 |
+
|
| 3457 |
+
1. **Zero-filled features are valid**: For inference, model needs columns for ALL possible future events, even if they never occurred historically. Zero-filled CNECs are placeholders for future outages.
|
| 3458 |
+
|
| 3459 |
+
2. **Future covariate marking is critical**: Chronos 2 uses metadata to know which features extend into the forecast horizon. Missing weather marking would have crippled D+1 to D+14 forecasts.
|
| 3460 |
+
|
| 3461 |
+
3. **Temporal features are deterministic covariates**: Hour, day, weekday are always known - must be marked as future covariates for model to leverage seasonal/daily patterns.
|
| 3462 |
+
|
| 3463 |
+
4. **Cleanup logic needs safeguards**: Aggressive removal of zero-variance/duplicate features can delete valid future covariate placeholders. Must explicitly preserve critical feature categories.
|
| 3464 |
+
|
| 3465 |
+
5. **Extension periods matter**: Different covariates extend different horizons:
|
| 3466 |
+
- D+1: Load forecasts (mask D+2 to D+15)
|
| 3467 |
+
- D+15: Weather (ECMWF forecasts)
|
| 3468 |
+
- D+22: Transmission outages
|
| 3469 |
+
- ∞: Temporal (deterministic)
|
| 3470 |
+
|
| 3471 |
+
### Inference Strategy (Day 3)
|
| 3472 |
+
|
| 3473 |
+
**Future Covariate Handling**:
|
| 3474 |
+
1. **Temporal** (12 features): Generate for full D+1 to D+14 horizon (deterministic)
|
| 3475 |
+
2. **LTA** (40 features): Truncate to D+15 (known years ahead, no need beyond horizon)
|
| 3476 |
+
3. **Load Forecasts** (12 features): Use D+1 values, mask D+2 to D+15 (Chronos handles missing)
|
| 3477 |
+
4. **CNEC Outages** (176 features): Collect latest planned outages (up to D+22 available)
|
| 3478 |
+
5. **Weather** (375 features): Run `scripts/collect_openmeteo_forecast_latest.py` before inference to get fresh D+15 ECMWF forecasts
|
| 3479 |
+
|
| 3480 |
+
**Forecast Extension Pattern**:
|
| 3481 |
+
- Historical data: Oct 2023 - Sept 30, 2025 (17,544 hours)
|
| 3482 |
+
- Inference from: Oct 1, 2025 00:00 onwards
|
| 3483 |
+
- Context window: Last 512 hours (Chronos 2 maximum)
|
| 3484 |
+
- Forecast horizon: D+1 to D+14 (336 hours)
|
| 3485 |
+
- Future covariates: Extend 615 features forward 336 hours
|
| 3486 |
+
|
| 3487 |
+
### Performance Metrics
|
| 3488 |
+
|
| 3489 |
+
**Re-engineering Time**:
|
| 3490 |
+
- ENTSO-E feature engineering: ~8 minutes
|
| 3491 |
+
- Unification: ~15 seconds
|
| 3492 |
+
- Metadata regeneration: ~2 seconds
|
| 3493 |
+
- Total: ~9 minutes
|
| 3494 |
+
|
| 3495 |
+
**Data Sizes**:
|
| 3496 |
+
- ENTSO-E features: 10.67 MB (was 10.62 MB, +50 KB)
|
| 3497 |
+
- Unified features: 24.9 MB (was ~25 MB, minimal change - zeros compress well)
|
| 3498 |
+
- Metadata: ~80 KB (was ~50 KB, +30 KB)
|
| 3499 |
+
|
| 3500 |
+
### Next Steps
|
| 3501 |
+
|
| 3502 |
+
**Immediate**: Day 3 - Zero-Shot Inference
|
| 3503 |
+
1. Create `src/modeling/` directory
|
| 3504 |
+
2. Implement Chronos 2 inference pipeline:
|
| 3505 |
+
- Load unified features (2,553 features × 17,544 hours)
|
| 3506 |
+
- Identify 615 future covariates from metadata
|
| 3507 |
+
- Collect fresh weather forecasts (D+15)
|
| 3508 |
+
- Generate temporal features for forecast horizon
|
| 3509 |
+
- Prepare context window (last 512 hours)
|
| 3510 |
+
- Run zero-shot inference (D+1 to D+14)
|
| 3511 |
+
- Save predictions
|
| 3512 |
+
|
| 3513 |
+
3. Performance targets:
|
| 3514 |
+
- Inference time: <5 minutes per 14-day forecast
|
| 3515 |
+
- D+1 MAE: <150 MW (target 134 MW)
|
| 3516 |
+
- Memory: <10 GB (A10G GPU compatible)
|
| 3517 |
+
|
| 3518 |
+
**Documentation Needed**:
|
| 3519 |
+
- Update README with new feature counts
|
| 3520 |
+
- Document future covariate extension strategy
|
| 3521 |
+
- Add inference preprocessing steps
|
| 3522 |
+
|
| 3523 |
+
---
|
| 3524 |
+
|
| 3525 |
**Status Update**:
|
| 3526 |
- Day 0: ✅ Setup complete
|
| 3527 |
- Day 1: ✅ Data collection complete (JAO, ENTSO-E, Weather)
|
| 3528 |
- Day 2: ✅ Feature engineering complete (JAO, ENTSO-E, Weather)
|
| 3529 |
+
- Day 2.5: ✅ Feature unification complete (2,408 → 2,553 features)
|
| 3530 |
+
- **Day 2.75: ✅ Future covariate architecture fixed** (615 future covariates)
|
| 3531 |
- Day 3: ⏳ Zero-shot inference (NEXT)
|
| 3532 |
- Day 4: ⏳ Evaluation
|
| 3533 |
- Day 5: ⏳ Documentation + handover
|
| 3534 |
|
| 3535 |
**NEXT SESSION BOOKMARK**: Day 3 - Implement Chronos 2 zero-shot inference pipeline
|
| 3536 |
|
| 3537 |
+
**Ready for Inference**: ✅ Unified dataset with complete future covariate architecture
|
|
@@ -288,13 +288,17 @@ def identify_future_covariates(pl, unified_df):
|
|
| 288 |
"""Identify all future covariate features.
|
| 289 |
|
| 290 |
Future covariates:
|
| 291 |
-
1.
|
| 292 |
-
2.
|
| 293 |
-
3.
|
| 294 |
-
4.
|
|
|
|
| 295 |
"""
|
| 296 |
future_cov_all_cols = unified_df.columns
|
| 297 |
|
|
|
|
|
|
|
|
|
|
| 298 |
# Identify by prefix
|
| 299 |
lta_cols = [c for c in future_cov_all_cols if c.startswith('lta_')]
|
| 300 |
load_forecast_cols = [c for c in future_cov_all_cols if c.startswith('load_forecast_')]
|
|
@@ -305,14 +309,15 @@ def identify_future_covariates(pl, unified_df):
|
|
| 305 |
weather_cols = [c for c in future_cov_all_cols if any(c.startswith(p) for p in weather_prefixes)]
|
| 306 |
|
| 307 |
future_cov_counts = {
|
|
|
|
| 308 |
'LTA': len(lta_cols),
|
| 309 |
'Load Forecasts': len(load_forecast_cols),
|
| 310 |
'Transmission Outages': len(outage_cols),
|
| 311 |
'Weather': len(weather_cols),
|
| 312 |
-
'Total': len(lta_cols) + len(load_forecast_cols) + len(outage_cols) + len(weather_cols)
|
| 313 |
}
|
| 314 |
|
| 315 |
-
return lta_cols, load_forecast_cols, outage_cols, weather_cols, future_cov_counts
|
| 316 |
|
| 317 |
|
| 318 |
@app.cell
|
|
@@ -379,7 +384,7 @@ def display_future_cov_summary(mo, future_cov_counts, outage_stats):
|
|
| 379 |
outage_ext = f"{outage_stats['extension_days']} days" if outage_stats['extension_days'] is not None else "N/A"
|
| 380 |
|
| 381 |
# Calculate percentage of future covariates
|
| 382 |
-
total_pct = (future_cov_counts['Total'] /
|
| 383 |
|
| 384 |
mo.md(
|
| 385 |
f"""
|
|
@@ -387,6 +392,7 @@ def display_future_cov_summary(mo, future_cov_counts, outage_stats):
|
|
| 387 |
|
| 388 |
| Category | Count | Extension Period | Description |
|
| 389 |
|----------|-------|------------------|-------------|
|
|
|
|
| 390 |
| LTA (Long-Term Allocations) | {future_cov_counts['LTA']} | Full horizon (years) | Auction results known in advance |
|
| 391 |
| Load Forecasts | {future_cov_counts['Load Forecasts']} | D+1 (1 day) | TSO demand forecasts, published daily |
|
| 392 |
| Transmission Outages | {future_cov_counts['Transmission Outages']} | Up to {outage_ext} | Planned maintenance schedules |
|
|
@@ -922,7 +928,7 @@ def section8_header(mo):
|
|
| 922 |
|
| 923 |
|
| 924 |
@app.cell
|
| 925 |
-
def create_metadata(pl, categories, lta_cols, load_forecast_cols, outage_cols, outage_stats):
|
| 926 |
"""Create feature metadata file."""
|
| 927 |
metadata_rows = []
|
| 928 |
|
|
@@ -939,15 +945,23 @@ def create_metadata(pl, categories, lta_cols, load_forecast_cols, outage_cols, o
|
|
| 939 |
source = 'Unknown'
|
| 940 |
|
| 941 |
# Determine if future covariate
|
| 942 |
-
meta_is_future = meta_col in
|
|
|
|
|
|
|
|
|
|
|
|
|
| 943 |
|
| 944 |
# Determine extension days
|
| 945 |
-
if meta_col in
|
|
|
|
|
|
|
| 946 |
meta_extension_days = 'Full horizon (years)'
|
| 947 |
elif meta_col in load_forecast_cols:
|
| 948 |
meta_extension_days = '1 day (D+1)'
|
| 949 |
elif meta_col in outage_cols:
|
| 950 |
meta_extension_days = f"Up to {outage_stats['extension_days']} days" if outage_stats['extension_days'] else 'Variable'
|
|
|
|
|
|
|
| 951 |
else:
|
| 952 |
meta_extension_days = 'N/A (historical)'
|
| 953 |
|
|
@@ -1017,7 +1031,7 @@ def display_save_info(mo, save_info):
|
|
| 1017 |
- [OK] All 3 data sources merged (JAO + ENTSO-E + Weather)
|
| 1018 |
- [OK] Timestamps standardized to UTC with hourly frequency
|
| 1019 |
- [OK] {save_info['features_shape'][1] - 1:,} features engineered and cleaned
|
| 1020 |
-
- [OK]
|
| 1021 |
- [OK] Data quality validated (>99% completeness)
|
| 1022 |
- [OK] Standard decimal precision applied
|
| 1023 |
- [OK] Metadata file created for feature reference
|
|
|
|
| 288 |
"""Identify all future covariate features.
|
| 289 |
|
| 290 |
Future covariates:
|
| 291 |
+
1. Temporal (hour, day, etc.): Known deterministically
|
| 292 |
+
2. LTA (lta_*): Known years in advance
|
| 293 |
+
3. Load forecasts (load_forecast_*): D+1
|
| 294 |
+
4. Transmission outages (outage_cnec_*): Up to D+22
|
| 295 |
+
5. Weather (temp_*, wind*, solar_*, etc.): D+15 via ECMWF forecasts
|
| 296 |
"""
|
| 297 |
future_cov_all_cols = unified_df.columns
|
| 298 |
|
| 299 |
+
# Temporal features (deterministic)
|
| 300 |
+
temporal_cols = [c for c in future_cov_all_cols if any(x in c for x in ['hour', 'day', 'month', 'weekday', 'year', 'weekend', '_sin', '_cos'])]
|
| 301 |
+
|
| 302 |
# Identify by prefix
|
| 303 |
lta_cols = [c for c in future_cov_all_cols if c.startswith('lta_')]
|
| 304 |
load_forecast_cols = [c for c in future_cov_all_cols if c.startswith('load_forecast_')]
|
|
|
|
| 309 |
weather_cols = [c for c in future_cov_all_cols if any(c.startswith(p) for p in weather_prefixes)]
|
| 310 |
|
| 311 |
future_cov_counts = {
|
| 312 |
+
'Temporal': len(temporal_cols),
|
| 313 |
'LTA': len(lta_cols),
|
| 314 |
'Load Forecasts': len(load_forecast_cols),
|
| 315 |
'Transmission Outages': len(outage_cols),
|
| 316 |
'Weather': len(weather_cols),
|
| 317 |
+
'Total': len(temporal_cols) + len(lta_cols) + len(load_forecast_cols) + len(outage_cols) + len(weather_cols)
|
| 318 |
}
|
| 319 |
|
| 320 |
+
return temporal_cols, lta_cols, load_forecast_cols, outage_cols, weather_cols, future_cov_counts
|
| 321 |
|
| 322 |
|
| 323 |
@app.cell
|
|
|
|
| 384 |
outage_ext = f"{outage_stats['extension_days']} days" if outage_stats['extension_days'] is not None else "N/A"
|
| 385 |
|
| 386 |
# Calculate percentage of future covariates
|
| 387 |
+
total_pct = (future_cov_counts['Total'] / 2553) * 100 # ~2,553 total features
|
| 388 |
|
| 389 |
mo.md(
|
| 390 |
f"""
|
|
|
|
| 392 |
|
| 393 |
| Category | Count | Extension Period | Description |
|
| 394 |
|----------|-------|------------------|-------------|
|
| 395 |
+
| Temporal | {future_cov_counts['Temporal']} | Full horizon (deterministic) | Hour, day, weekday, etc. always known |
|
| 396 |
| LTA (Long-Term Allocations) | {future_cov_counts['LTA']} | Full horizon (years) | Auction results known in advance |
|
| 397 |
| Load Forecasts | {future_cov_counts['Load Forecasts']} | D+1 (1 day) | TSO demand forecasts, published daily |
|
| 398 |
| Transmission Outages | {future_cov_counts['Transmission Outages']} | Up to {outage_ext} | Planned maintenance schedules |
|
|
|
|
| 928 |
|
| 929 |
|
| 930 |
@app.cell
|
| 931 |
+
def create_metadata(pl, categories, temporal_cols, lta_cols, load_forecast_cols, outage_cols, weather_cols, outage_stats):
|
| 932 |
"""Create feature metadata file."""
|
| 933 |
metadata_rows = []
|
| 934 |
|
|
|
|
| 945 |
source = 'Unknown'
|
| 946 |
|
| 947 |
# Determine if future covariate
|
| 948 |
+
meta_is_future = (meta_col in temporal_cols or
|
| 949 |
+
meta_col in lta_cols or
|
| 950 |
+
meta_col in load_forecast_cols or
|
| 951 |
+
meta_col in outage_cols or
|
| 952 |
+
meta_col in weather_cols)
|
| 953 |
|
| 954 |
# Determine extension days
|
| 955 |
+
if meta_col in temporal_cols:
|
| 956 |
+
meta_extension_days = 'Full horizon (deterministic)'
|
| 957 |
+
elif meta_col in lta_cols:
|
| 958 |
meta_extension_days = 'Full horizon (years)'
|
| 959 |
elif meta_col in load_forecast_cols:
|
| 960 |
meta_extension_days = '1 day (D+1)'
|
| 961 |
elif meta_col in outage_cols:
|
| 962 |
meta_extension_days = f"Up to {outage_stats['extension_days']} days" if outage_stats['extension_days'] else 'Variable'
|
| 963 |
+
elif meta_col in weather_cols:
|
| 964 |
+
meta_extension_days = '15 days (D+15 ECMWF)'
|
| 965 |
else:
|
| 966 |
meta_extension_days = 'N/A (historical)'
|
| 967 |
|
|
|
|
| 1031 |
- [OK] All 3 data sources merged (JAO + ENTSO-E + Weather)
|
| 1032 |
- [OK] Timestamps standardized to UTC with hourly frequency
|
| 1033 |
- [OK] {save_info['features_shape'][1] - 1:,} features engineered and cleaned
|
| 1034 |
+
- [OK] 615 future covariates identified (temporal, LTA, load forecasts, outages, weather)
|
| 1035 |
- [OK] Data quality validated (>99% completeness)
|
| 1036 |
- [OK] Standard decimal precision applied
|
| 1037 |
- [OK] Metadata file created for feature reference
|
|
@@ -841,34 +841,45 @@ def engineer_all_entsoe_features(
|
|
| 841 |
features = features.drop(list(null_pcts.keys()))
|
| 842 |
|
| 843 |
# Remove zero-variance features (constants)
|
|
|
|
| 844 |
zero_var_cols = []
|
| 845 |
for col in features.columns:
|
| 846 |
if col != 'timestamp':
|
|
|
|
|
|
|
|
|
|
| 847 |
# Check if all values are the same (excluding nulls)
|
| 848 |
non_null = features[col].drop_nulls()
|
| 849 |
if len(non_null) > 0 and non_null.n_unique() == 1:
|
| 850 |
zero_var_cols.append(col)
|
| 851 |
|
| 852 |
if zero_var_cols:
|
| 853 |
-
print(f"\nRemoving {len(zero_var_cols)} zero-variance features")
|
| 854 |
features = features.drop(zero_var_cols)
|
| 855 |
|
| 856 |
# Remove duplicate columns
|
|
|
|
| 857 |
dup_groups = {}
|
| 858 |
cols_to_check = [c for c in features.columns if c != 'timestamp']
|
| 859 |
|
| 860 |
for i, col1 in enumerate(cols_to_check):
|
| 861 |
if col1 in dup_groups.values(): # Already marked as duplicate
|
| 862 |
continue
|
|
|
|
|
|
|
|
|
|
| 863 |
for col2 in cols_to_check[i+1:]:
|
| 864 |
if col2 in dup_groups.values(): # Already marked as duplicate
|
| 865 |
continue
|
|
|
|
|
|
|
|
|
|
| 866 |
# Check if columns are identical
|
| 867 |
if features[col1].equals(features[col2]):
|
| 868 |
dup_groups[col2] = col1 # col2 is duplicate of col1
|
| 869 |
|
| 870 |
if dup_groups:
|
| 871 |
-
print(f"\nRemoving {len(dup_groups)} duplicate columns (keeping
|
| 872 |
features = features.drop(list(dup_groups.keys()))
|
| 873 |
|
| 874 |
features_after = len(features.columns) - 1
|
|
|
|
| 841 |
features = features.drop(list(null_pcts.keys()))
|
| 842 |
|
| 843 |
# Remove zero-variance features (constants)
|
| 844 |
+
# EXCEPT transmission outage features - keep them even if zero-filled for future inference
|
| 845 |
zero_var_cols = []
|
| 846 |
for col in features.columns:
|
| 847 |
if col != 'timestamp':
|
| 848 |
+
# Skip transmission outage features (needed for future inference)
|
| 849 |
+
if col.startswith('outage_cnec_'):
|
| 850 |
+
continue
|
| 851 |
# Check if all values are the same (excluding nulls)
|
| 852 |
non_null = features[col].drop_nulls()
|
| 853 |
if len(non_null) > 0 and non_null.n_unique() == 1:
|
| 854 |
zero_var_cols.append(col)
|
| 855 |
|
| 856 |
if zero_var_cols:
|
| 857 |
+
print(f"\nRemoving {len(zero_var_cols)} zero-variance features (keeping transmission outages)")
|
| 858 |
features = features.drop(zero_var_cols)
|
| 859 |
|
| 860 |
# Remove duplicate columns
|
| 861 |
+
# EXCEPT transmission outage features - keep all CNECs even if identical (all zeros)
|
| 862 |
dup_groups = {}
|
| 863 |
cols_to_check = [c for c in features.columns if c != 'timestamp']
|
| 864 |
|
| 865 |
for i, col1 in enumerate(cols_to_check):
|
| 866 |
if col1 in dup_groups.values(): # Already marked as duplicate
|
| 867 |
continue
|
| 868 |
+
# Skip transmission outage features (each CNEC needs its own column for inference)
|
| 869 |
+
if col1.startswith('outage_cnec_'):
|
| 870 |
+
continue
|
| 871 |
for col2 in cols_to_check[i+1:]:
|
| 872 |
if col2 in dup_groups.values(): # Already marked as duplicate
|
| 873 |
continue
|
| 874 |
+
# Skip transmission outage features
|
| 875 |
+
if col2.startswith('outage_cnec_'):
|
| 876 |
+
continue
|
| 877 |
# Check if columns are identical
|
| 878 |
if features[col1].equals(features[col2]):
|
| 879 |
dup_groups[col2] = col1 # col2 is duplicate of col1
|
| 880 |
|
| 881 |
if dup_groups:
|
| 882 |
+
print(f"\nRemoving {len(dup_groups)} duplicate columns (keeping transmission outages)")
|
| 883 |
features = features.drop(list(dup_groups.keys()))
|
| 884 |
|
| 885 |
features_after = len(features.columns) - 1
|