Evgueni Poloukarov, Claude committed on
Commit 10c4205 · 1 Parent(s): 3c8562f

feat: add HF Space deployment with Docker and Jupyter notebooks

- Created Dockerfile for GPU-enabled JupyterLab environment
- Added 3 inference notebooks: smoke test, full 14-day, evaluation
- Configured Docker SDK for HF Spaces compatibility
- Extended dataset to Oct 14 for multivariate forecasting (17,880 rows)
- Includes dynamic forecast system with time-aware data extraction

Co-Authored-By: Claude <[email protected]>

Files changed (4)
  1. Dockerfile +32 -0
  2. README.md +1 -4
  3. evaluation.ipynb +319 -0
  4. inference_full_14day.ipynb +361 -0
Dockerfile ADDED
@@ -0,0 +1,32 @@
+ # HuggingFace Space Dockerfile for FBMC Chronos-2 Zero-Shot Forecasting
+ # GPU-enabled JupyterLab environment
+
+ FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     git \
+     curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements and install Python dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy source code and notebooks
+ COPY src/ ./src/
+ COPY inference_smoke_test.ipynb .
+ COPY inference_full_14day.ipynb .
+ COPY evaluation.ipynb .
+
+ # Expose JupyterLab port
+ EXPOSE 7860
+
+ # Set environment variables
+ ENV JUPYTER_ENABLE_LAB=yes
+
+ # Start JupyterLab (HF Spaces expects port 7860)
+ CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=7860", "--no-browser", "--allow-root", "--NotebookApp.token=''", "--NotebookApp.password=''"]
README.md CHANGED
@@ -3,12 +3,9 @@ title: FBMC Chronos-2 Zero-Shot Forecasting
  emoji: ⚡
  colorFrom: blue
  colorTo: green
- sdk: jupyterlab
- sdk_version: "4.0.0"
- app_file: inference_smoke_test.ipynb
+ sdk: docker
  pinned: false
  license: mit
- hardware: a10g-small
  ---

  # FBMC Flow-Based Market Coupling Forecasting
evaluation.ipynb ADDED
@@ -0,0 +1,319 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# FBMC Chronos-2 Zero-Shot Evaluation\n",
+ "\n",
+ "**Performance analysis**: Compare 14-day forecasts vs actual flows (Oct 1-14, 2025)\n",
+ "\n",
+ "This notebook evaluates zero-shot forecast accuracy against ground truth."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Environment Setup"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import polars as pl\n",
+ "import numpy as np\n",
+ "from datetime import datetime\n",
+ "from datasets import load_dataset\n",
+ "import altair as alt\n",
+ "from pathlib import Path\n",
+ "\n",
+ "print(\"Environment setup complete\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Load Forecasts and Actuals"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Load forecasts from full inference run\n",
+ "forecast_path = Path('/home/user/app/forecasts_14day.parquet')\n",
+ "if not forecast_path.exists():\n",
+ "    raise FileNotFoundError(\"Run inference_full_14day.ipynb first to generate forecasts\")\n",
+ "\n",
+ "forecasts = pl.read_parquet(forecast_path)\n",
+ "print(f\"Forecasts loaded: {forecasts.shape}\")\n",
+ "print(f\" Forecast period: {forecasts['timestamp'].min()} to {forecasts['timestamp'].max()}\")\n",
+ "\n",
+ "# Load actual values from dataset\n",
+ "hf_token = os.getenv(\"HF_TOKEN\")\n",
+ "dataset = load_dataset(\n",
+ "    \"evgueni-p/fbmc-features-24month\",\n",
+ "    split=\"train\",\n",
+ "    token=hf_token\n",
+ ")\n",
+ "df = pl.from_arrow(dataset.data.table)\n",
+ "\n",
+ "# Extract Oct 1-14 actuals\n",
+ "actuals = df.filter(\n",
+ "    (pl.col('timestamp') >= datetime(2025, 10, 1, 0, 0)) &\n",
+ "    (pl.col('timestamp') <= datetime(2025, 10, 14, 23, 0))\n",
+ ")\n",
+ "\n",
+ "# Select only target columns\n",
+ "target_cols = [col for col in actuals.columns if col.startswith('target_border_')]\n",
+ "actuals = actuals.select(['timestamp'] + target_cols)\n",
+ "\n",
+ "print(f\"Actuals loaded: {actuals.shape}\")\n",
+ "print(f\" Actual period: {actuals['timestamp'].min()} to {actuals['timestamp'].max()}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3. Calculate Error Metrics"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Align forecasts and actuals\n",
+ "borders = [col.replace('target_border_', '') for col in target_cols]\n",
+ "\n",
+ "results = []\n",
+ "\n",
+ "for border in borders:\n",
+ "    forecast_col = f'forecast_{border}'\n",
+ "    actual_col = f'target_border_{border}'\n",
+ "\n",
+ "    if forecast_col not in forecasts.columns:\n",
+ "        print(f\"Warning: No forecast for {border}\")\n",
+ "        continue\n",
+ "\n",
+ "    # Get forecast and actual values\n",
+ "    y_pred = forecasts[forecast_col].to_numpy()\n",
+ "    y_true = actuals[actual_col].to_numpy()\n",
+ "\n",
+ "    # Skip if any nulls\n",
+ "    if np.isnan(y_pred).any() or np.isnan(y_true).any():\n",
+ "        print(f\"Warning: Nulls detected for {border}\")\n",
+ "        continue\n",
+ "\n",
+ "    # Calculate metrics\n",
+ "    mae = np.abs(y_pred - y_true).mean()\n",
+ "    rmse = np.sqrt(((y_pred - y_true) ** 2).mean())\n",
+ "    mape = (np.abs((y_true - y_pred) / (y_true + 1e-8)) * 100).mean()\n",
+ "\n",
+ "    # D+1 metrics (first 24 hours)\n",
+ "    mae_d1 = np.abs(y_pred[:24] - y_true[:24]).mean()\n",
+ "\n",
+ "    results.append({\n",
+ "        'border': border,\n",
+ "        'mae_14day': mae,\n",
+ "        'mae_d1': mae_d1,\n",
+ "        'rmse_14day': rmse,\n",
+ "        'mape_14day': mape,\n",
+ "        'actual_mean': y_true.mean(),\n",
+ "        'actual_std': y_true.std()\n",
+ "    })\n",
+ "\n",
+ "results_df = pl.DataFrame(results).sort('mae_d1')\n",
+ "\n",
+ "print(f\"\\nEvaluation complete for {len(results)} borders\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 4. Overall Performance Summary"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(\"=\"*60)\n",
+ "print(\"ZERO-SHOT PERFORMANCE SUMMARY\")\n",
+ "print(\"=\"*60)\n",
+ "print(f\"\\nD+1 MAE (First 24 hours):\")\n",
+ "print(f\" Mean: {results_df['mae_d1'].mean():.1f} MW\")\n",
+ "print(f\" Median: {results_df['mae_d1'].median():.1f} MW\")\n",
+ "print(f\" Best: {results_df['mae_d1'].min():.1f} MW ({results_df.filter(pl.col('mae_d1') == pl.col('mae_d1').min())['border'][0]})\")\n",
+ "print(f\" Worst: {results_df['mae_d1'].max():.1f} MW ({results_df.filter(pl.col('mae_d1') == pl.col('mae_d1').max())['border'][0]})\")\n",
+ "\n",
+ "print(f\"\\n14-Day MAE (Full horizon):\")\n",
+ "print(f\" Mean: {results_df['mae_14day'].mean():.1f} MW\")\n",
+ "print(f\" Median: {results_df['mae_14day'].median():.1f} MW\")\n",
+ "\n",
+ "print(f\"\\n14-Day RMSE:\")\n",
+ "print(f\" Mean: {results_df['rmse_14day'].mean():.1f} MW\")\n",
+ "print(f\" Median: {results_df['rmse_14day'].median():.1f} MW\")\n",
+ "\n",
+ "print(f\"\\n14-Day MAPE:\")\n",
+ "print(f\" Mean: {results_df['mape_14day'].mean():.1f}%\")\n",
+ "print(f\" Median: {results_df['mape_14day'].median():.1f}%\")\n",
+ "\n",
+ "# Target check\n",
+ "target_mae = 150 # MW\n",
+ "borders_meeting_target = results_df.filter(pl.col('mae_d1') <= target_mae)\n",
+ "print(f\"\\nBorders meeting D+1 MAE target (<= {target_mae} MW):\")\n",
+ "print(f\" {len(borders_meeting_target)}/{len(results_df)} ({len(borders_meeting_target)/len(results_df)*100:.1f}%)\")\n",
+ "\n",
+ "print(\"\\n\" + \"=\"*60)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 5. Top 10 Best and Worst Borders"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(\"Top 10 Best Performers (D+1 MAE):\")\n",
+ "print(results_df.head(10).select(['border', 'mae_d1', 'mae_14day', 'rmse_14day']))\n",
+ "\n",
+ "print(\"\\nTop 10 Worst Performers (D+1 MAE):\")\n",
+ "print(results_df.tail(10).select(['border', 'mae_d1', 'mae_14day', 'rmse_14day']))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 6. Visualize Performance Distribution"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# MAE distribution histogram\n",
+ "mae_hist = alt.Chart(results_df.to_pandas()).mark_bar().encode(\n",
+ "    x=alt.X('mae_d1:Q', bin=alt.Bin(maxbins=20), title='D+1 MAE (MW)'),\n",
+ "    y=alt.Y('count()', title='Number of Borders')\n",
+ ").properties(\n",
+ "    width=600,\n",
+ "    height=300,\n",
+ "    title='D+1 MAE Distribution Across Borders'\n",
+ ")\n",
+ "\n",
+ "# Add target line (converted to pandas like the other charts)\n",
+ "target_line = alt.Chart(pl.DataFrame({'target': [150]}).to_pandas()).mark_rule(color='red', strokeDash=[5, 5]).encode(\n",
+ "    x='target:Q'\n",
+ ")\n",
+ "\n",
+ "mae_hist + target_line"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7. Compare Best vs Worst Border"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Select best and worst border\n",
+ "best_border = results_df.head(1)['border'][0]\n",
+ "worst_border = results_df.tail(1)['border'][0]\n",
+ "\n",
+ "# Create comparison charts\n",
+ "charts = []\n",
+ "for border in [best_border, worst_border]:\n",
+ "    # Combine forecast and actual\n",
+ "    viz_data = pl.DataFrame({\n",
+ "        'timestamp': forecasts['timestamp'],\n",
+ "        'Forecast': forecasts[f'forecast_{border}'],\n",
+ "        'Actual': actuals[f'target_border_{border}']\n",
+ "    }).unpivot(index='timestamp', variable_name='type', value_name='flow')\n",
+ "\n",
+ "    mae = results_df.filter(pl.col('border') == border)['mae_d1'][0]\n",
+ "\n",
+ "    chart = alt.Chart(viz_data.to_pandas()).mark_line().encode(\n",
+ "        x=alt.X('timestamp:T', title='Date'),\n",
+ "        y=alt.Y('flow:Q', title='Flow (MW)'),\n",
+ "        color='type:N',\n",
+ "        strokeDash='type:N'\n",
+ "    ).properties(\n",
+ "        width=600,\n",
+ "        height=250,\n",
+ "        title=f'{border} (D+1 MAE: {mae:.1f} MW)'\n",
+ "    )\n",
+ "    charts.append(chart)\n",
+ "\n",
+ "alt.vconcat(*charts).properties(\n",
+ "    title='Best vs Worst Performing Border'\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 8. Export Results"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Save results to CSV\n",
+ "output_path = Path('/home/user/app/evaluation_results.csv')\n",
+ "results_df.write_csv(output_path)\n",
+ "\n",
+ "print(f\"✓ Results saved to {output_path}\")\n",
+ "print(f\"\\nEvaluation complete!\")\n",
+ "print(f\" Borders evaluated: {len(results_df)}\")\n",
+ "print(f\" Mean D+1 MAE: {results_df['mae_d1'].mean():.1f} MW\")\n",
+ "print(f\" Target (<= 150 MW): {'ACHIEVED' if results_df['mae_d1'].mean() <= 150 else 'NOT MET'}\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.10.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+ }
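
The per-border metrics in section 3 of `evaluation.ipynb` reduce to a small pure-numpy helper. A minimal sketch (the function name `border_metrics` is illustrative, not from the repo; the epsilon in the MAPE denominator matches the notebook, so near-zero actual flows can still inflate the percentage error):

```python
import numpy as np

def border_metrics(y_pred: np.ndarray, y_true: np.ndarray) -> dict:
    """Error metrics mirroring evaluation.ipynb section 3."""
    mae = np.abs(y_pred - y_true).mean()
    rmse = np.sqrt(((y_pred - y_true) ** 2).mean())
    mape = (np.abs((y_true - y_pred) / (y_true + 1e-8)) * 100).mean()
    mae_d1 = np.abs(y_pred[:24] - y_true[:24]).mean()  # first 24 hours only
    return {'mae': mae, 'rmse': rmse, 'mape': mape, 'mae_d1': mae_d1}

# Toy example: constant 100 MW actuals, forecast off by +10 MW everywhere,
# so MAE, RMSE, and D+1 MAE are all 10 MW and MAPE is ~10%
y_true = np.full(336, 100.0)
y_pred = y_true + 10.0
print(border_metrics(y_pred, y_true))
```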
inference_full_14day.ipynb ADDED
@@ -0,0 +1,361 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# FBMC Chronos-2 Zero-Shot Inference - Full Production Forecast\n",
+ "\n",
+ "**Production run**: 38 borders × 14 days (336 hours)\n",
+ "\n",
+ "This notebook runs complete zero-shot forecasts for all FBMC borders on HuggingFace Space with GPU."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Environment Setup"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import time\n",
+ "import os\n",
+ "import numpy as np\n",
+ "import polars as pl\n",
+ "import torch\n",
+ "from datetime import datetime, timedelta\n",
+ "from datasets import load_dataset\n",
+ "from chronos import ChronosPipeline\n",
+ "import altair as alt\n",
+ "from pathlib import Path\n",
+ "\n",
+ "# Add src to path for imports\n",
+ "import sys\n",
+ "sys.path.append('/home/user/app/src') # HF Space path\n",
+ "\n",
+ "from forecasting.dynamic_forecast import DynamicForecast\n",
+ "from forecasting.feature_availability import FeatureAvailability\n",
+ "\n",
+ "print(\"Environment setup complete\")\n",
+ "print(f\"PyTorch version: {torch.__version__}\")\n",
+ "print(f\"GPU available: {torch.cuda.is_available()}\")\n",
+ "if torch.cuda.is_available():\n",
+ "    print(f\"GPU device: {torch.cuda.get_device_name(0)}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Load Extended Dataset from HuggingFace"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(\"Loading dataset from HuggingFace...\")\n",
+ "start_time = time.time()\n",
+ "\n",
+ "# Load dataset\n",
+ "hf_token = os.getenv(\"HF_TOKEN\")\n",
+ "dataset = load_dataset(\n",
+ "    \"evgueni-p/fbmc-features-24month\",\n",
+ "    split=\"train\",\n",
+ "    token=hf_token\n",
+ ")\n",
+ "\n",
+ "# Convert to Polars\n",
+ "df = pl.from_arrow(dataset.data.table)\n",
+ "\n",
+ "print(f\"✓ Loaded: {df.shape}\")\n",
+ "print(f\" Date range: {df['timestamp'].min()} to {df['timestamp'].max()}\")\n",
+ "print(f\" Load time: {time.time() - start_time:.1f}s\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3. Configure Dynamic Forecast System"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Categorize features by availability\n",
+ "categories = FeatureAvailability.categorize_features(df.columns)\n",
+ "\n",
+ "print(\"Feature categorization:\")\n",
+ "print(f\" Full-horizon D+14: {len(categories['full_horizon_d14'])} features\")\n",
+ "print(f\" Partial D+1: {len(categories['partial_d1'])} features\")\n",
+ "print(f\" Historical only: {len(categories['historical'])} features\")\n",
+ "print(f\" Total: {sum(len(v) for v in categories.values())} features\")\n",
+ "\n",
+ "# Identify target borders\n",
+ "target_cols = [col for col in df.columns if col.startswith('target_border_')]\n",
+ "borders = [col.replace('target_border_', '') for col in target_cols]\n",
+ "print(f\"\\n✓ Found {len(borders)} borders\")\n",
+ "print(f\" Borders: {', '.join(borders[:5])}...\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 4. Load Chronos-2 Model on GPU"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(\"Loading Chronos-2 Large model...\")\n",
+ "start_time = time.time()\n",
+ "\n",
+ "pipeline = ChronosPipeline.from_pretrained(\n",
+ "    \"amazon/chronos-t5-large\",\n",
+ "    device_map=\"cuda\",\n",
+ "    torch_dtype=torch.bfloat16\n",
+ ")\n",
+ "\n",
+ "print(f\"✓ Model loaded in {time.time() - start_time:.1f}s\")\n",
+ "print(f\" Device: {next(pipeline.model.parameters()).device}\")\n",
+ "print(f\" Dtype: {next(pipeline.model.parameters()).dtype}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 5. Run Zero-Shot Inference for All Borders"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Production configuration\n",
+ "prediction_hours = 336 # 14 days\n",
+ "context_hours = 512 # Context window\n",
+ "run_date = datetime(2025, 9, 30, 23, 0) # Sept 30 11 PM\n",
+ "\n",
+ "print(\"Production forecast configuration:\")\n",
+ "print(f\" Run date: {run_date}\")\n",
+ "print(f\" Context: {context_hours} hours\")\n",
+ "print(f\" Forecast: {prediction_hours} hours (14 days)\")\n",
+ "print(f\" Forecast range: Oct 1 00:00 to Oct 14 23:00\")\n",
+ "print(f\" Borders: {len(borders)}\")\n",
+ "print()\n",
+ "\n",
+ "# Initialize dynamic forecast\n",
+ "forecaster = DynamicForecast(\n",
+ "    df=df,\n",
+ "    feature_categories=categories\n",
+ ")\n",
+ "\n",
+ "# Storage for all forecasts\n",
+ "all_forecasts = {}\n",
+ "inference_times = {}\n",
+ "\n",
+ "# Run inference for each border\n",
+ "total_start = time.time()\n",
+ "\n",
+ "for i, border in enumerate(borders, 1):\n",
+ "    print(f\"[{i}/{len(borders)}] Processing {border}...\", end=\" \")\n",
+ "\n",
+ "    try:\n",
+ "        # Extract data\n",
+ "        context_data, future_data = forecaster.prepare_forecast_data(\n",
+ "            run_date=run_date,\n",
+ "            border=border\n",
+ "        )\n",
+ "\n",
+ "        # Get context (last 512 hours)\n",
+ "        context = context_data.select([border]).to_numpy()[-context_hours:].flatten()\n",
+ "\n",
+ "        # Run inference (Chronos expects a torch tensor; 336 h exceeds the\n",
+ "        # model's default horizon, so disable the length limit)\n",
+ "        start_time = time.time()\n",
+ "        forecast = pipeline.predict(\n",
+ "            context=torch.tensor(context, dtype=torch.float32),\n",
+ "            prediction_length=prediction_hours,\n",
+ "            num_samples=20,\n",
+ "            limit_prediction_length=False\n",
+ "        )\n",
+ "        elapsed = time.time() - start_time\n",
+ "\n",
+ "        # Store median forecast across the 20 sample paths\n",
+ "        forecast_median = np.median(forecast[0].float().cpu().numpy(), axis=0)\n",
+ "        all_forecasts[border] = forecast_median\n",
+ "        inference_times[border] = elapsed\n",
+ "\n",
+ "        print(f\"✓ {elapsed:.1f}s\")\n",
+ "\n",
+ "    except Exception as e:\n",
+ "        print(f\"✗ ERROR: {str(e)}\")\n",
+ "        all_forecasts[border] = None\n",
+ "        inference_times[border] = 0.0\n",
+ "\n",
+ "total_time = time.time() - total_start\n",
+ "\n",
+ "print(\"\\n\" + \"=\"*60)\n",
+ "print(\"INFERENCE COMPLETE\")\n",
+ "print(\"=\"*60)\n",
+ "print(f\"Total time: {total_time/60:.1f} minutes\")\n",
+ "print(f\"Avg per border: {total_time/len(borders):.1f}s\")\n",
+ "print(f\"Successful: {sum(1 for v in all_forecasts.values() if v is not None)}/{len(borders)}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 6. Save Forecasts to Parquet"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Create timestamp range for forecasts\n",
+ "forecast_timestamps = pl.datetime_range(\n",
+ "    datetime(2025, 10, 1, 0, 0),\n",
+ "    datetime(2025, 10, 14, 23, 0),\n",
+ "    interval='1h',\n",
+ "    eager=True\n",
+ ")\n",
+ "\n",
+ "# Build forecast DataFrame\n",
+ "forecast_data = {'timestamp': forecast_timestamps}\n",
+ "for border, forecast in all_forecasts.items():\n",
+ "    if forecast is not None:\n",
+ "        forecast_data[f'forecast_{border}'] = forecast.tolist()\n",
+ "    else:\n",
+ "        forecast_data[f'forecast_{border}'] = [None] * len(forecast_timestamps)\n",
+ "\n",
+ "forecast_df = pl.DataFrame(forecast_data)\n",
+ "\n",
+ "# Save to parquet\n",
+ "output_path = Path('/home/user/app/forecasts_14day.parquet')\n",
+ "forecast_df.write_parquet(output_path)\n",
+ "\n",
+ "print(f\"✓ Forecasts saved: {forecast_df.shape}\")\n",
+ "print(f\" File: {output_path}\")\n",
+ "print(f\" Size: {output_path.stat().st_size / 1024 / 1024:.1f} MB\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7. Visualize Sample Borders"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Select 4 representative borders for visualization\n",
+ "sample_borders = borders[:4]\n",
+ "\n",
+ "charts = []\n",
+ "for border in sample_borders:\n",
+ "    if all_forecasts[border] is not None:\n",
+ "        viz_data = pl.DataFrame({\n",
+ "            'timestamp': forecast_timestamps,\n",
+ "            'forecast': all_forecasts[border].tolist()\n",
+ "        })\n",
+ "\n",
+ "        chart = alt.Chart(viz_data.to_pandas()).mark_line().encode(\n",
+ "            x=alt.X('timestamp:T', title='Date'),\n",
+ "            y=alt.Y('forecast:Q', title='Flow (MW)'),\n",
+ "            tooltip=['timestamp:T', alt.Tooltip('forecast:Q', format='.0f')]\n",
+ "        ).properties(\n",
+ "            width=400,\n",
+ "            height=200,\n",
+ "            title=f'{border}'\n",
+ "        )\n",
+ "        charts.append(chart)\n",
+ "\n",
+ "# Combine into 2x2 grid\n",
+ "combined = alt.vconcat(\n",
+ "    alt.hconcat(charts[0], charts[1]),\n",
+ "    alt.hconcat(charts[2], charts[3])\n",
+ ").properties(\n",
+ "    title='Sample Zero-Shot Forecasts (Oct 1-14, 2025)'\n",
+ ")\n",
+ "\n",
+ "combined"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 8. Performance Summary"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Create performance summary\n",
+ "perf_data = pl.DataFrame({\n",
+ "    'border': list(inference_times.keys()),\n",
+ "    'inference_time_s': list(inference_times.values()),\n",
+ "    'status': ['SUCCESS' if all_forecasts[b] is not None else 'FAILED' for b in inference_times.keys()]\n",
+ "}).sort('inference_time_s', descending=True)\n",
+ "\n",
+ "print(\"\\nTop 10 Slowest Borders:\")\n",
+ "print(perf_data.head(10))\n",
+ "\n",
+ "print(\"\\nPerformance Statistics:\")\n",
+ "print(f\" Mean: {perf_data['inference_time_s'].mean():.1f}s\")\n",
+ "print(f\" Median: {perf_data['inference_time_s'].median():.1f}s\")\n",
+ "print(f\" Min: {perf_data['inference_time_s'].min():.1f}s\")\n",
+ "print(f\" Max: {perf_data['inference_time_s'].max():.1f}s\")\n",
+ "\n",
+ "print(\"\\n\" + \"=\"*60)\n",
+ "print(\"PRODUCTION FORECAST COMPLETE\")\n",
+ "print(\"=\"*60)\n",
+ "print(f\"Borders processed: {len(borders)}\")\n",
+ "print(f\"Forecast horizon: 14 days (336 hours)\")\n",
+ "print(f\"Total runtime: {total_time/60:.1f} minutes\")\n",
+ "print(f\"Output: forecasts_14day.parquet\")\n",
+ "print(f\"\\n✓ Ready for evaluation against Oct 1-14 actuals\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.10.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+ }
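
The sample-to-point reduction in step 5 of `inference_full_14day.ipynb` (a per-hour median over the 20 sampled paths per border) can be sketched in isolation. The shapes and the random data below are illustrative only; note that this is `np.median` over a numpy array, since `ndarray` has no `.median()` method:

```python
import numpy as np

# Chronos-style probabilistic output: predict() yields sample paths, and the
# point forecast is the per-hour median across samples.
# Illustrative shape: (num_samples=20, prediction_hours=336).
rng = np.random.default_rng(0)
samples = rng.normal(loc=500.0, scale=50.0, size=(20, 336))

# Median across the sample axis gives one value per forecast hour
forecast_median = np.median(samples, axis=0)

# Uncertainty bands (e.g. 10%/90%) come from the same sample array
lo, hi = np.quantile(samples, [0.1, 0.9], axis=0)

print(forecast_median.shape)  # (336,)
```

The same array of samples therefore supports both the point forecast saved to parquet and quantile bands, without rerunning the model.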