Evgueni Poloukarov committed on
Commit c6bf910 · 1 Parent(s): 10c4205

docs: update activity log with HF Space deployment milestone

  1. doc/activity.md +176 -0
doc/activity.md CHANGED
@@ -5124,3 +5124,179 @@ load_forecast_cols[timestamp > d1_cutoff] = np.nan
  **Next Session**: Deploy to HF Space, run time-travel validation tests
 
  ---
+
+ ## Session 7: HuggingFace Space Deployment (Nov 14, 2025)
+
+ ### Objectives
+ 1. Extend dataset to Oct 14, 2025 for multivariate forecasting
+ 2. Create production-ready Jupyter notebooks for HF Space
+ 3. Deploy to HuggingFace Space with Docker and GPU support
+ 4. Enable zero-shot inference testing on A10G GPU
+
+ ### 1. Dataset Extension (Oct 2025 Data Processing)
+
+ **Problem**: The dataset ended Sept 30, 2025, but dynamic forecasting with `run_date=Sept 30, 23:00` requires Oct 1-14 future covariates (336 hours) for a 14-day forecast.
+
+ **Solution**: Process the October raw data and extend the unified dataset.
+
+ #### Scripts Created
+ - **`process_october_features.py`** (341 lines)
+   - Processes weather and ENTSO-E raw data for Oct 1-14
+   - Applies existing feature engineering modules
+   - Output: 336 rows × 840 features (weather + ENTSO-E)
+
+ - **`extend_dataset.py`** (195 lines)
+   - Merges October features with 24-month baseline
+   - Handles dtype mismatches (176 columns fixed: Float64 → Int64)
+   - Adds missing JAO features (1,736 columns) filled with nulls
+   - Output: 17,880 rows × 2,553 features (Oct 2023 - Oct 14, 2025)
+
+ - **`upload_to_hf.py`** (146 lines)
+   - Uploads extended dataset to HuggingFace
+   - Replaces 24-month dataset with 24.5-month version
+   - Dataset: `evgueni-p/fbmc-features-24month`
+
+ **Results**:
+ - Dataset extended: 17,544 → 17,880 rows (+336 hours)
+ - Date range: Oct 1, 2023 - Oct 14, 2025 (24.5 months)
+ - Upload verified: 17,880 rows × 2,553 columns on HuggingFace
+ - No data leakage: October data only available as future covariates
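
The window arithmetic behind the 336-hour extension can be checked with a short standard-library sketch. It is illustrative only and is not the code in `extend_dataset.py`: the helper names and the `target` key are hypothetical, and masking targets after the run date is an assumption about how leakage is avoided.

```python
from datetime import datetime, timedelta

RUN_DATE = datetime(2025, 9, 30, 23, 0)  # run date quoted in the log
HORIZON_HOURS = 14 * 24                  # 14-day forecast = 336 hourly steps


def future_window(run_date, horizon_hours):
    """Return the hourly timestamps that need future covariates."""
    return [run_date + timedelta(hours=h) for h in range(1, horizon_hours + 1)]


def mask_targets_after(rows, run_date):
    """Null the (hypothetical) 'target' value on rows after run_date.

    Covariate values on those rows are left untouched, so they remain
    available to the model as future covariates.
    """
    return [
        {**row, "target": None} if row["timestamp"] > run_date else row
        for row in rows
    ]


window = future_window(RUN_DATE, HORIZON_HOURS)
print(len(window), window[0], window[-1])
# -> 336 2025-10-01 00:00:00 2025-10-14 23:00:00

# Row count reported in the log: 24-month baseline plus the new window.
extended_rows = 17_544 + len(window)
```

The first and last timestamps match the Oct 1-14 range above, and `extended_rows` reproduces the 17,880 rows reported after the upload.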
+
+ ### 2. Production Jupyter Notebooks
+
+ Created 3 notebooks for HuggingFace Space:
+
+ #### **`inference_smoke_test.ipynb`**
+ - **Purpose**: Quick validation (1 border × 7 days, ~1 min)
+ - **Configuration**:
+   - Run date: Sept 30, 2025 23:00
+   - Forecast: Oct 1-7 (168 hours)
+   - Context: 512 hours
+   - Single border test
+ - **Features**:
+   - Environment setup with GPU detection
+   - Dataset loading from HuggingFace
+   - Dynamic forecast system integration
+   - Chronos-2 model loading on GPU
+   - Zero-shot inference with visualization
+
+ #### **`inference_full_14day.ipynb`**
+ - **Purpose**: Production run (38 borders × 14 days, ~5 min)
+ - **Configuration**:
+   - Run date: Sept 30, 2025 23:00
+   - Forecast: Oct 1-14 (336 hours)
+   - Context: 512 hours
+   - All 38 borders
+ - **Features**:
+   - Batch processing with progress tracking
+   - Per-border inference timing
+   - Forecast export to parquet
+   - Sample visualizations (4 borders)
+   - Performance summary statistics
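
Per-border timing and the summary statistics listed above could be collected along the following lines. This is a minimal sketch, not the notebook's code: `fake_predict` is a hypothetical stand-in for the real model call, the border names are placeholders, and the 300-second budget is simply the ~5 minute figure from the purpose line.

```python
import statistics
import time


def time_per_border(borders, predict):
    """Call predict(border) for each border and record wall-clock seconds."""
    timings = {}
    for border in borders:
        start = time.perf_counter()
        predict(border)
        timings[border] = time.perf_counter() - start
    return timings


def summarize(timings, budget_seconds=300.0):
    """Summary statistics plus a check against the ~5 minute target."""
    values = list(timings.values())
    total = sum(values)
    return {
        "borders": len(values),
        "total_s": total,
        "mean_s": statistics.mean(values),
        "max_s": max(values),
        "within_budget": total <= budget_seconds,
    }


def fake_predict(border):
    """Hypothetical stand-in for inference: returns 336 dummy hourly values."""
    return [0.0] * 336


borders = [f"border_{i:02d}" for i in range(38)]  # placeholder names
timings = time_per_border(borders, fake_predict)
summary = summarize(timings)
print(summary["borders"], summary["within_budget"])
```

Swapping `fake_predict` for the real inference call would give the per-border timings without changing the surrounding loop.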
+
+ #### **`evaluation.ipynb`**
+ - **Purpose**: Performance analysis vs Oct 1-14 actuals
+ - **Metrics**:
+   - D+1 MAE (first 24 hours) - Target: <150 MW
+   - 14-day MAE (full horizon)
+   - RMSE, MAPE across all borders
+   - Best/worst border identification
+ - **Outputs**:
+   - Performance distribution histogram
+   - Forecast vs actual comparison charts
+   - CSV export of results
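
The metrics listed for the evaluation notebook can be written out in plain Python as below. This is a sketch under stated assumptions rather than the notebook's implementation: the function names are hypothetical, MAPE skips hours whose actual value is zero, and the data is a toy series, not a real forecast.

```python
import math


def mae(actual, forecast):
    """Mean absolute error, in the same units as the inputs (MW here)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


def mape(actual, forecast):
    """Mean absolute percentage error; hours with a zero actual are skipped."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        return float("nan")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


def evaluate_border(actual, forecast, d1_hours=24):
    """D+1 MAE over the first 24 hours, plus full-horizon MAE, RMSE and MAPE."""
    return {
        "d1_mae": mae(actual[:d1_hours], forecast[:d1_hours]),
        "full_mae": mae(actual, forecast),
        "rmse": rmse(actual, forecast),
        "mape": mape(actual, forecast),
    }


# Toy series: 336 hourly actuals with a constant 100 MW forecast error.
actual = [1000.0] * 336
forecast = [1100.0] * 24 + [900.0] * 312
result = evaluate_border(actual, forecast)
meets_d1_target = result["d1_mae"] < 150.0  # target quoted in the log
```

Applying `evaluate_border` to each of the 38 borders and sorting by `d1_mae` would give the best and worst borders.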
+
+ ### 3. HuggingFace Space Configuration
+
+ #### **Dockerfile** (Docker SDK for GPU)
+ ```dockerfile
+ FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
+ WORKDIR /app
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+ COPY src/ ./src/
+ COPY inference_smoke_test.ipynb .
+ COPY inference_full_14day.ipynb .
+ COPY evaluation.ipynb .
+ EXPOSE 7860
+ CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=7860", "--no-browser", \
+      "--allow-root", "--NotebookApp.token=''", "--NotebookApp.password=''"]
+ ```
+
+ #### **README.md** (Space Metadata)
+ - SDK: `docker` (changed from `jupyterlab` - not supported)
+ - Hardware: `a10g-small` (NVIDIA A10G, 24GB VRAM)
+ - License: MIT
+ - Features: 2,553 engineered features, 38 borders
+ - Model: Amazon Chronos-2 Large (710M params)
+
+ #### **requirements.txt** (GPU Dependencies)
+ - Core ML: torch>=2.0.0, transformers>=4.35.0, chronos-forecasting>=1.2.0
+ - Data: polars>=0.19.0, datasets>=2.14.0, pyarrow>=13.0.0
+ - Viz: altair>=5.0.0
+ - Jupyter: ipykernel, jupyter, jupyterlab
+
+ ### 4. Deployment
+
+ **Git Operations**:
+ ```bash
+ git add README.md requirements.txt Dockerfile inference_smoke_test.ipynb \
+     inference_full_14day.ipynb evaluation.ipynb src/forecasting/
+ git commit -m "feat: add HF Space deployment with Docker and Jupyter notebooks"
+ git push origin master        # GitHub repo
+ git push hf-space master:main # HuggingFace Space
+ ```
+
+ **Results**:
+ - HuggingFace Space: https://huggingface.co/spaces/evgueni-p/fbmc-chronos2-forecast
+ - GitHub Repo: https://github.com/evgspacdmy/fbmc_chronos2
+ - Dataset: https://huggingface.co/datasets/evgueni-p/fbmc-features-24month
+
+ ### Files Created
+ - `process_october_features.py` (341 lines)
+ - `extend_dataset.py` (195 lines)
+ - `upload_to_hf.py` (146 lines)
+ - `Dockerfile` (17 lines)
+ - `inference_smoke_test.ipynb` (16 cells)
+ - `inference_full_14day.ipynb` (8 cells)
+ - `evaluation.ipynb` (8 cells)
+
+ ### Files Modified
+ - `README.md` - Changed SDK from jupyterlab to docker
+ - `requirements.txt` - Renamed from requirements_hf_space.txt
+
+ ### Key Decisions
+ 1. **Docker SDK**: Required for Jupyter deployment on HF Spaces (jupyterlab SDK not supported)
+ 2. **No Gradio**: User confirmed Jupyter notebooks only (previous Gradio app archived)
+ 3. **October extension**: Essential for multivariate forecasting with Sept 30 run date
+ 4. **JAO features**: Filled with nulls for October (no API data available)
+ 5. **Dataset naming**: Kept `fbmc-features-24month` (backwards compatible)
+
+ ### Technical Challenges Resolved
+ 1. **Datetime precision mismatch**: Fixed μs vs ns precision and timezone mismatches in Polars datetime columns
+ 2. **Dtype mismatches**: Cast 176 Float64 columns to Int64 to match schema
+ 3. **HF SDK error**: Changed from unsupported `jupyterlab` to `docker`
+ 4. **Missing October JAO**: Filled 1,736 columns with nulls (expected behavior)
+ 5. **Forward-fill**: Oct 14 ENTSO-E data missing, forward-filled from Oct 13
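
The null-padding and forward-fill steps above can be illustrated on plain Python rows. This is a minimal sketch and not the Polars code in `extend_dataset.py`: the helper names and the column names `entsoe_load` and `jao_ram` are hypothetical.

```python
from datetime import datetime


def pad_missing_columns(rows, all_columns):
    """Give every row the full column set; absent columns become None (null)."""
    return [{col: row.get(col) for col in all_columns} for row in rows]


def forward_fill(rows, columns):
    """Replace None in the given columns with the last non-null value seen."""
    last_seen = {}
    filled = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if new_row.get(col) is None:
                # Stays None when no earlier value exists for this column.
                new_row[col] = last_seen.get(col)
            else:
                last_seen[col] = new_row[col]
        filled.append(new_row)
    return filled


# Hypothetical October rows: the Oct 14 ENTSO-E value is missing, and the
# JAO column is absent from the October data altogether.
october = [
    {"timestamp": datetime(2025, 10, 13, 23), "entsoe_load": 5000.0},
    {"timestamp": datetime(2025, 10, 14, 0), "entsoe_load": None},
]
schema = ["timestamp", "entsoe_load", "jao_ram"]

padded = pad_missing_columns(october, schema)
result = forward_fill(padded, ["entsoe_load"])  # JAO columns stay null
```

Only the ENTSO-E column is forward-filled here. The JAO column is deliberately left null, matching item 4 above.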
+
+ ### Testing Status
+ - Dataset upload: ✅ Verified 17,880 rows on HuggingFace
+ - Git deployment: ✅ Pushed to both GitHub and HF Space
+ - Docker build: ⏳ Pending (Space is building)
+ - GPU inference: ⏳ Pending (awaiting Space startup)
+ - MAE validation: ⏳ Pending (requires running evaluation notebook)
+
+ ### Next Steps
+ 1. **Configure HF Space**: Set `HF_TOKEN` secret for private dataset access
+ 2. **Run smoke test**: Execute on the A10G GPU and verify 1-border inference works
+ 3. **Run full inference**: Process all 38 borders and validate the 5-minute target
+ 4. **Run evaluation**: Compare vs Oct 1-14 actuals, document MAE
+ 5. **Update activity.md**: Record final results and handover documentation
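
For the first step, Space secrets are exposed to the running container as environment variables, so a notebook can read `HF_TOKEN` and fail early if it is missing. The sketch below is a hedged illustration: `get_hf_token` is a hypothetical helper, not a function from the repository.

```python
import os


def get_hf_token(env=None):
    """Return the HF_TOKEN secret, or raise a clear error if it is not set."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret in the Space settings."
        )
    return token


# Demonstration with an explicit mapping instead of the real environment.
demo_token = get_hf_token({"HF_TOKEN": "  hf_example_value  "})
print(len(demo_token))
```

Inside the Space, calling `get_hf_token()` with no argument reads the real environment. The returned value can then be passed to the dataset loader.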
+
+ ---
+
+ **Status**: [IN PROGRESS] HF Space deployed, awaiting build completion
+ **Timestamp**: 2025-11-14 12:30 UTC
+ **Next Session**: Configure Space secrets, test notebooks on GPU, evaluate MAE
+
+ ---