Update README.md
pipeline_tag: text-ranking
---

<div align="center">

# Contextual AI Reranker v2 6B-NVFP4

<img src="Contextual_AI_Brand_Mark_Dark.png" width="10%" alt="Contextual_AI"/>

[Blog post](https://contextual.ai/blog/rerank-v2)
[Model collection](https://huggingface.co/collections/ContextualAI/contextual-ai-reranker-v2)

</div>

<hr>

## Highlights

Contextual AI's reranker is the **first instruction-following reranker** capable of handling retrieval conflicts and ranking with custom instructions (e.g., prioritizing recent information). It achieves state-of-the-art performance on BEIR and sits on the cost/performance Pareto frontier across:

- Instruction following
- Question answering
- Multilinguality (100+ languages)
- Product search & recommendation
- Real-world use cases

<p align="center">
<img src="main_benchmark.png" width="1200"/>
</p>

For detailed benchmarks, see our [blog post](https://contextual.ai/blog/rerank-v2).

## Overview

- **Model Type**: Text Reranking
- **Supported Languages**: 100+
- **Parameters**: 6B
- **Precision**: NVFP4 (4-bit floating point)
- **Context Length**: up to 32K
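
These specs map directly onto how the checkpoint is loaded. The snippet below is a minimal loading sketch, not the card's reference code; it assumes vLLM picks up the NVFP4/ModelOpt quantization settings from the checkpoint config, so only the context window is stated explicitly.

```python
# Minimal loading sketch. Assumptions: vLLM reads the NVFP4 quantization
# config from the checkpoint, so no explicit quantization flag is passed;
# max_model_len is capped at the 32K context length listed above.
from vllm import LLM

llm = LLM(
    model="ContextualAI/ctxl-rerank-v2-instruct-multilingual-6b-nvfp4",
    max_model_len=32768,
)
```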

## When to Use This Model

Use this reranker when you need to:

- Re-rank retrieved documents with custom instructions
- Handle conflicting information in retrieval results
- Prioritize documents by recency or other criteria
- Support multilingual search (100+ languages)
- Process long contexts (up to 32K tokens)
- **Maximize efficiency with 4-bit precision (NVFP4)**

## Quickstart

### Basic Usage

```python
# Requires vLLM==0.10.0 for NVFP4 support
# See full implementation below

model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-6b-nvfp4"

query = "What are the health benefits of exercise?"
instruction = "Prioritize recent medical research"
documents = [
    "Regular exercise reduces risk of heart disease and improves mental health.",
    "A 2024 study shows exercise enhances cognitive function in older adults.",
    "Ancient Greeks valued physical fitness for military training."
]

infer_w_vllm(model_path, query, instruction, documents)
```

**Expected Output:**

```
Query: What are the health benefits of exercise?
Instruction: Prioritize recent medical research
Score: 0.8542 | Doc: A 2024 study shows exercise enhances cognitive function in older adults.
Score: 0.7891 | Doc: Regular exercise reduces risk of heart disease and improves mental health.
Score: 0.4123 | Doc: Ancient Greeks valued physical fitness for military training.
```

Results are sorted by score in descending order, so the document that best satisfies the query under the given instruction is listed first.

### vLLM Usage

Requires `vllm==0.10.0` for NVFP4 support.

```python
import os
os.environ['VLLM_USE_V1'] = '0'  # v1 engine doesn't support logits processor yet

import torch
from vllm import LLM, SamplingParams

# ... format_prompts and infer_w_vllm are defined here; their bodies are not
# shown in this excerpt. The lines below are the tail of infer_w_vllm; a
# hedged sketch of the omitted helpers follows this block.

    print(f"Instruction: {instruction}")
    for score, doc_id, doc in results:
        print(f"Score: {score:.4f} | Doc: {doc}")


# Example usage
if __name__ == "__main__":
    model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-6b-nvfp4"
    query = "What are the health benefits of exercise?"
    instruction = "Prioritize recent medical research"
    documents = [
        "Regular exercise reduces risk of heart disease and improves mental health.",
        "A 2024 study shows exercise enhances cognitive function in older adults.",
        "Ancient Greeks valued physical fitness for military training."
    ]

    infer_w_vllm(model_path, query, instruction, documents)
```
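
The bodies of `format_prompts` and `infer_w_vllm` are not shown in the excerpt above. The sketch below is an assumption about what they might look like, not the card's reference implementation: it reuses the prompt template from the Transformers example further down and reads the relevance score from the logit of token id 0 at the last prompt position through a vLLM logits processor (the reason `VLLM_USE_V1` is set to `'0'`).

```python
# Hedged sketch only; the actual helpers in the full card may differ.
from vllm import LLM, SamplingParams


def format_prompts(query: str, instruction: str, documents: list[str]) -> list[str]:
    # Same template as the Transformers example below.
    prompts = []
    for doc in documents:
        prompts.append(
            "Check whether a given document contains information helpful to answer the query."
            f"\n<Document> {doc}\n<Query> {query}{instruction} ??"
        )
    return prompts


def infer_w_vllm(model_path: str, query: str, instruction: str, documents: list[str]):
    llm = LLM(model=model_path)
    scores = []

    def capture_score(token_ids, logits):
        # With max_tokens=1 this runs once per prompt, on the logits at the
        # last prompt position; token id 0 is assumed to carry the score.
        scores.append(logits[0].item())
        return logits

    params = SamplingParams(temperature=0.0, max_tokens=1, logits_processors=[capture_score])
    for prompt in format_prompts(query, instruction, documents):
        # One prompt at a time keeps the captured scores aligned with the
        # document order; a batched version would need per-request processors.
        llm.generate(prompt, params)

    results = sorted(
        [(s, i, documents[i]) for i, s in enumerate(scores)],
        key=lambda x: x[0],
        reverse=True,
    )
    print(f"Query: {query}")
    print(f"Instruction: {instruction}")
    for score, doc_id, doc in results:
        print(f"Score: {score:.4f} | Doc: {doc}")
```

Scoring through a logits processor means no text is actually generated; only one dummy token is sampled per document to expose the final-position logits.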
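
### Transformers Usage

An earlier revision of this card also included a Hugging Face Transformers example; a lightly cleaned version is kept below for environments where vLLM is not an option. The imports and the `format_prompts` signature are filled in here and should be treated as assumptions, and depending on your Transformers setup the NVFP4 checkpoint may not load through this path.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def format_prompts(query: str, instruction: str, documents: list[str]) -> list[str]:
    prompts = []
    for doc in documents:
        prompt = f"Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{instruction} ??"
        prompts.append(prompt)
    return prompts


def infer_w_hf(model_path: str, query: str, instruction: str, documents: list[str]):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"  # so -1 is the real last token for all prompts

    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=dtype).to(device)
    model.eval()

    prompts = format_prompts(query, instruction, documents)
    enc = tokenizer(
        prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    input_ids = enc["input_ids"].to(device)
    attention_mask = enc["attention_mask"].to(device)

    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=attention_mask)

    next_logits = out.logits[:, -1, :]  # [batch, vocab]

    scores_bf16 = next_logits[:, 0].to(torch.bfloat16)
    scores = scores_bf16.float().tolist()

    # Sort by score (descending)
    results = sorted([(s, i, documents[i]) for i, s in enumerate(scores)], key=lambda x: x[0], reverse=True)

    print(f"Query: {query}")
    print(f"Instruction: {instruction}")
    for score, doc_id, doc in results:
        print(f"Score: {score:.4f} | Doc: {doc}")
```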

## Citation

If you use this model, please cite:

```bibtex
@misc{ctxl_rerank_v2_instruct_multilingual,
  title={Contextual AI Reranker v2},
  author={Halal, George and Agrawal, Sheshansh},
  year={2025},
  url={https://contextual.ai/blog/rerank-v2},
}
```

## License

Creative Commons Attribution Non Commercial Share Alike 4.0 (cc-by-nc-sa-4.0)

## Contact

For questions or issues, please open an issue on the model repository.