Evgueni Poloukarov and Claude committed
Commit c8d76da · 1 Parent(s): 572e6a8

perf: switch to bfloat16 precision for memory efficiency

Changes:
- Default dtype: float32 → bfloat16 (50% reduction in weight memory)
- Model memory: 16GB → ~8GB expected
- Enables 615-feature inference on L4 GPU (24GB VRAM)
- torch.inference_mode() + model.eval() + bfloat16 = full optimization stack (see the sketch below)
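
A minimal sketch of how the three pieces combine. The nn.Linear module is a
stand-in for the chronos-2 model, which the pipeline loads for real; nothing
below is this repo's code:

    import torch

    # Stand-in model; the actual pipeline loads chronos-2 weights instead
    model = torch.nn.Linear(512, 64, dtype=torch.bfloat16, device="cuda")
    model.eval()                  # disable dropout / other training-mode behavior

    with torch.inference_mode():  # no autograd graph, no gradient buffers kept
        batch = torch.randn(1, 512, dtype=torch.bfloat16, device="cuda")
        out = model(batch)        # forward pass allocates activations only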

Memory calculation:
- Model (bfloat16): ~8GB
- Attention forward pass: 12.44GB
- Total: ~8GB + 12.44GB ≈ 20.5GB < 24GB L4 capacity (arithmetic check below)
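
A quick check of that budget. The ~4B parameter count is inferred from the
16GB float32 footprint, not a measured value:

    n_params = 16e9 / 4              # float32 footprint => ~4e9 parameters
    weights_gb = n_params * 2 / 1e9  # bfloat16: 2 bytes/param -> ~8.0 GB
    total_gb = weights_gb + 12.44    # plus the profiled attention forward pass
    assert total_gb < 24             # ~20.44 GB fits the L4's 24 GB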

Related commits: 572e6a8 (torch.inference_mode + model.eval)

Co-Authored-By: Claude <[email protected]>

src/forecasting/chronos_inference.py CHANGED
@@ -32,7 +32,7 @@ class ChronosInferencePipeline:
         self,
         model_name: str = "amazon/chronos-2",
         device: str = "cuda",
-        dtype: str = "float32"
+        dtype: str = "bfloat16"
     ):
         """
         Initialize inference pipeline.
@@ -40,7 +40,7 @@ class ChronosInferencePipeline:
         Args:
             model_name: HuggingFace model identifier (chronos-2 supports covariates)
             device: Device for inference ('cuda' or 'cpu')
-            dtype: Data type for model weights (float32 for chronos-2)
+            dtype: Data type for model weights (bfloat16 for memory efficiency)
         """
         self.model_name = model_name
         self.device = device
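
For reference, a usage sketch of the new default. The import path and the
string→torch-dtype resolution are assumptions, not code from this diff:

    import torch
    from forecasting.chronos_inference import ChronosInferencePipeline  # path assumed

    pipeline = ChronosInferencePipeline(
        model_name="amazon/chronos-2",
        device="cuda",
        dtype="bfloat16",  # new default; pass "float32" to restore old behavior
    )

    # The dtype string presumably resolves to a torch dtype along these lines:
    torch_dtype = getattr(torch, "bfloat16")  # -> torch.bfloat16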