Evgueni Poloukarov, Claude committed
Commit 3b607e3 · 1 Parent(s): 13db9d8

fix: add GPU cache clearing for multi-border forecasts


Prevents GPU memory accumulation across sequential forecasts by clearing the CUDA
cache after each border completes. This enables multi-border forecasting within
the 24 GB VRAM limit of an L4 GPU.

Technical details:
- Add torch.cuda.empty_cache() after each border forecast (line 241)
- Releases intermediate tensors without affecting model weights (710M params)
- Does NOT impact forecast accuracy (each border processes independently)
- Solves OOM errors in full_14day forecasts (38 borders)

Memory before: 17.71 GB allocated + 10.75 GB needed = OOM
Memory after: Cache cleared between borders, enabling sequential processing

Co-Authored-By: Claude <[email protected]>

src/forecasting/chronos_inference.py CHANGED
@@ -234,6 +234,12 @@ class ChronosInferencePipeline:
 
                 print(f"  [OK] Complete in {inference_time:.1f}s (WITH {len(future_data.columns)-2} covariates)", flush=True)
 
+                # Release GPU memory cache before processing next border
+                # This prevents memory accumulation across sequential forecasts
+                # Does NOT affect model weights (710M params stay loaded)
+                # Does NOT affect forecast accuracy (each border is independent)
+                torch.cuda.empty_cache()
+
             except Exception as e:
                 import traceback
                 error_msg = f"{type(e).__name__}: {str(e)}"
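The pattern the diff applies can be sketched in isolation. Note that `run_forecast` and the border identifiers below are hypothetical stand-ins, not names from the pipeline; only the placement of `torch.cuda.empty_cache()` at the end of each loop iteration mirrors the commit. The import guard is there so the sketch also runs on machines without PyTorch or a GPU.

```python
# Sketch: clear the CUDA caching allocator between sequential forecasts
# so intermediate tensors from one border don't pile up and OOM the next.
try:
    import torch
    _CUDA = torch.cuda.is_available()
except ImportError:  # let the sketch run where PyTorch isn't installed
    _CUDA = False


def forecast_all_borders(run_forecast, borders):
    """Run one forecast per border, releasing cached GPU blocks in between.

    `empty_cache()` returns unused cached blocks to the driver; model
    weights and live tensors are untouched, so accuracy is unaffected.
    """
    results = {}
    for border in borders:
        results[border] = run_forecast(border)  # hypothetical inference call
        if _CUDA:
            # Free the cache held for the previous border's intermediates
            torch.cuda.empty_cache()
    return results
```

The design point is that `empty_cache()` only releases memory the allocator is caching for reuse; anything still referenced (the 710M-parameter model, the results just collected) stays allocated, which is why per-border outputs are identical with or without the call.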