prov-gigatime

community

Activity Feed

AI & ML interests

None defined yet.

alvarobartt

posted an update 10 days ago

Post

274

Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker

alvarobartt

posted an update 14 days ago

Post

3270

Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL;DR: MoEs can be misleading to reason about from active parameters alone. Each token only activates a subset of experts, but the serving setup still needs to account for the full resident memory footprint.

🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ The active parameter count isn't the same as the memory footprint, especially for sparse architectures
📦 Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving
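
To make the gap between active parameters and resident memory concrete, here is a minimal Python sketch. It is not how hf-mem computes its breakdown; the `MoEConfig` class, its field names, and every number in it are hypothetical and chosen only for illustration. It assumes experts are sharded evenly under EP and ignores KV cache and activations.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # All values are hypothetical; they do not describe any real model.
    base_params: int          # shared weights: attention, embeddings, router, norms
    params_per_expert: int    # weights of one routed expert, summed over all layers
    num_experts: int          # routed experts that must be resident in memory
    experts_per_token: int    # experts each token is actually routed to
    bytes_per_param: int = 2  # assumes 16-bit weights (bf16 / fp16)


def resident_weight_bytes(cfg: MoEConfig) -> int:
    """Weights needed to load the model: every expert has to be resident."""
    total = cfg.base_params + cfg.params_per_expert * cfg.num_experts
    return total * cfg.bytes_per_param


def active_weight_bytes(cfg: MoEConfig) -> int:
    """Weights touched per token: base weights plus only the routed experts."""
    active = cfg.base_params + cfg.params_per_expert * cfg.experts_per_token
    return active * cfg.bytes_per_param


def expert_bytes_per_device(cfg: MoEConfig, ep_size: int) -> int:
    """Expert weights per accelerator when experts are split evenly with EP."""
    experts_per_device = -(-cfg.num_experts // ep_size)  # ceiling division
    return experts_per_device * cfg.params_per_expert * cfg.bytes_per_param


cfg = MoEConfig(
    base_params=2_000_000_000,
    params_per_expert=500_000_000,
    num_experts=64,
    experts_per_token=4,
)

GIB = 1024**3
print(f"resident weights: {resident_weight_bytes(cfg) / GIB:.1f} GiB")
print(f"active weights:   {active_weight_bytes(cfg) / GIB:.1f} GiB")
print(f"experts per device (EP=8): {expert_bytes_per_device(cfg, 8) / GIB:.1f} GiB")
```

With these made-up numbers the resident weights are several times larger than the active weights, which is why sizing a deployment from active parameters alone underestimates the memory needed.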

Check the repository at https://github.com/alvarobartt/hf-mem

alvarobartt

posted an update 3 months ago

Post

3742

Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr

alvarobartt

posted an update 4 months ago

Post

3271

💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

Running uvx hf-mem --model-id ... --experimental automatically pulls the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
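
For intuition on how context length and batch size drive the estimate, here is a minimal Python sketch of the commonly used KV cache formula for standard multi-head or grouped-query attention. It is an assumption-laden illustration, not hf-mem's implementation: the `kv_cache_bytes` function is hypothetical, its parameter names only mirror the flags mentioned above, and it does not cover architectures such as sliding-window or latent attention.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    max_model_len: int,
    batch_size: int,
    kv_cache_dtype_bytes: int = 2,  # assumes a 16-bit cache dtype
) -> int:
    """Estimate KV cache size for standard MHA/GQA attention.

    The leading factor of 2 accounts for storing both keys and values
    for every layer, KV head, and token position.
    """
    return (
        2
        * num_layers
        * num_kv_heads
        * head_dim
        * max_model_len
        * batch_size
        * kv_cache_dtype_bytes
    )


# Hypothetical model shape, chosen only for illustration.
size = kv_cache_bytes(
    num_layers=32,
    num_kv_heads=8,
    head_dim=128,
    max_model_len=8192,
    batch_size=1,
)
print(f"{size / 1024**3:.2f} GiB")
```

The estimate grows linearly with both context length and batch size, so doubling either one doubles the KV cache.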


jmjose

updated a model 6 months ago

prov-gigatime/GigaTIME

Image-to-Image • Updated Dec 11, 2025 • 227 • 64

jmjose

published a model 6 months ago

prov-gigatime/GigaTIME

Image-to-Image • Updated Dec 11, 2025 • 227 • 64

naotous

updated a model 6 months ago

prov-gigatime/GigaTIME

Image-to-Image • Updated Dec 11, 2025 • 227 • 64

hoifung

authored a paper about 1 year ago

X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

Paper • 2505.03981 • Published May 6, 2025 • 15

tnaumann

authored a paper about 1 year ago

X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains

Paper • 2505.03981 • Published May 6, 2025 • 15

tnaumann

authored 5 papers over 1 year ago

Diagnosing Transformers: Illuminating Feature Spaces for Clinical Decision-Making

Paper • 2305.17588 • Published May 27, 2023

Self-Verification Improves Few-Shot Clinical Information Extraction

Paper • 2306.00024 • Published May 30, 2023

BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once

Paper • 2405.12971 • Published May 21, 2024 • 2

Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

Paper • 2403.08002 • Published Mar 12, 2024 • 2

Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning

Paper • 2502.19655 • Published Feb 27, 2025 • 1

alvarobartt

posted an update over 1 year ago

Post

3647

🔥 Agents can do anything! @microsoft Research just announced the release of Magma 8B!

Magma is a new Visual Language Model (VLM) with 8B parameters for multi-modal agents, designed to handle complex interactions across virtual and real environments, and it's MIT licensed!

Magma comes with exciting new features such as:
- Introduces the Set-of-Mark and Trace-of-Mark techniques for fine-tuning
- Leverages a large amount of unlabeled video data to learn spatial-temporal grounding and planning
- Shows strong generalization and can be fine-tuned for other agentic tasks
- Achieves SOTA results on multi-modal benchmarks spanning UI navigation, robotics manipulation, image/video understanding, and spatial understanding and reasoning
- Generates goal-driven visual plans and actions for agentic use cases

Model: microsoft/Magma-8B
Technical Report: Magma: A Foundation Model for Multimodal AI Agents (2502.13130)

naotous

authored 4 papers over 1 year ago

Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing

Paper • 2303.00915 • Published Mar 2, 2023 • 6

Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

Paper • 2311.16452 • Published Nov 28, 2023 • 2

BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once

Paper • 2405.12971 • Published May 21, 2024 • 2

From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond

Paper • 2411.03590 • Published Nov 6, 2024 • 10

alvarobartt

posted an update almost 2 years ago

Post

3043

🤗 Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)

In this post, we showcase how to deploy https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI

Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we’re not going to stop here – stay tuned as we enable more experiences to build AI with open models on Google Cloud!

Read the full post at https://huggingface.co/blog/llama31-on-vertex-ai


Team members 7
