Evgueni Poloukarov, Claude committed
Commit 3b607e3 · 1 Parent(s): 13db9d8

fix: add GPU cache clearing for multi-border forecasts


Prevents GPU memory accumulation across sequential forecasts by clearing the CUDA
cache after each border completes. This enables multi-border forecasting within
the 24 GB VRAM limit of an L4 GPU.

Technical details:
- Add torch.cuda.empty_cache() after each border forecast (line 241)
- Releases intermediate tensors without affecting model weights (710M params)
- Does NOT impact forecast accuracy (each border processes independently)
- Solves OOM errors in full_14day forecasts (38 borders)

Memory before: 17.71 GB allocated + 10.75 GB needed = OOM
Memory after: Cache cleared between borders, enabling sequential processing

Co-Authored-By: Claude <[email protected]>

src/forecasting/chronos_inference.py CHANGED
@@ -234,6 +234,12 @@ class ChronosInferencePipeline:
 
                 print(f"  [OK] Complete in {inference_time:.1f}s (WITH {len(future_data.columns)-2} covariates)", flush=True)
 
+                # Release GPU memory cache before processing next border
+                # This prevents memory accumulation across sequential forecasts
+                # Does NOT affect model weights (710M params stay loaded)
+                # Does NOT affect forecast accuracy (each border is independent)
+                torch.cuda.empty_cache()
+
             except Exception as e:
                 import traceback
                 error_msg = f"{type(e).__name__}: {str(e)}"
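The pattern the diff applies can be sketched in isolation. Note that `run_forecast` and the border identifiers below are hypothetical stand-ins, not names from the pipeline; only the placement of `torch.cuda.empty_cache()` at the end of each loop iteration mirrors the commit. The import guard is there so the sketch also runs on machines without PyTorch or a GPU.

```python
# Sketch: clear the CUDA caching allocator between sequential forecasts
# so intermediate tensors from one border don't pile up and OOM the next.
try:
    import torch
    _CUDA = torch.cuda.is_available()
except ImportError:  # let the sketch run where PyTorch isn't installed
    _CUDA = False


def forecast_all_borders(run_forecast, borders):
    """Run one forecast per border, releasing cached GPU blocks in between.

    `empty_cache()` returns unused cached blocks to the driver; model
    weights and live tensors are untouched, so accuracy is unaffected.
    """
    results = {}
    for border in borders:
        results[border] = run_forecast(border)  # hypothetical inference call
        if _CUDA:
            # Free the cache held for the previous border's intermediates
            torch.cuda.empty_cache()
    return results
```

The design point is that `empty_cache()` only releases memory the allocator is caching for reuse; anything still referenced (the 710M-parameter model, the results just collected) stays allocated, which is why per-border outputs are identical with or without the call.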