# RTriever-4B
RTriever-4B is a 4-billion-parameter dense retriever based on Qwen/Qwen3-Embedding-4B, specialized for reasoning-intensive information retrieval.
## Model details
| | |
|---|---|
| Base model | Qwen/Qwen3-Embedding-4B |
| Parameters | 4 B |
| Hidden size | 2,560 |
| Layers | 36 |
| Attention heads | 32 |
| Embedding dimension | 2,560 |
| Pooling | Last-token + L2 normalization |
| Max sequence length | 32,768 tokens |
| Tokenizer | Qwen3 (vocab size 151,665) |
| Format | safetensors (sharded, fp16); also exposes a sentence-transformers interface |
| License | MIT |
The model can be loaded either through `sentence-transformers` or through plain `transformers` (`AutoModel`) with manual last-token pooling.
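A quick way to confirm that a loaded checkpoint matches the table above (a minimal sketch; the expected shape and norm follow from the embedding-dimension and pooling rows):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("yale-nlp/RTriever-4B")
emb = model.encode("sanity check")
print(emb.shape)            # (2560,) -- embedding dimension from the table
print(np.linalg.norm(emb))  # ~1.0 -- last-token pooling + L2 normalization
```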
## Quick start
### Option A: sentence-transformers (recommended)
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("yale-nlp/RTriever-4B")

# Queries should use a query prompt; documents should NOT have a prompt.
query = "Why are insects attracted to light at night?"
docs = [
    "Recent flight-tracking studies show insects orient their dorsal axis toward "
    "the brightest visual region; near a point light source, this dorsal-light "
    "response disrupts flight stability and traps the insect.",
    "Fluorescent lamps emit in the UV range, which can be perceived by some "
    "nocturnal insects as a navigational cue similar to moonlight.",
]

q_emb = model.encode(query, prompt_name="query")
d_emb = model.encode(docs)

# Cosine similarity (the embeddings are L2-normalized, so a dot product suffices)
scores = q_emb @ d_emb.T
print(scores)
```
The `prompt_name="query"` flag prepends the default query instruction stored in `config_sentence_transformers.json`:
```
Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:
```
For task-specific instructions (e.g. a domain-tuned prompt template), pass `prompt=...` directly:
```python
custom_prompt = (
    "Given a Biology post, retrieve relevant passages that help answer the post\nPost: "
)
q_emb = model.encode(query, prompt=custom_prompt)
```
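To see what `prompt_name="query"` expands to, you can pass the stored instruction explicitly; the two calls below should produce identical embeddings (a quick check, assuming the default prompt string shown above):

```python
import numpy as np

default_prompt = (
    "Instruct: Given a web search query, retrieve relevant passages "
    "that answer the query\nQuery:"
)
a = model.encode(query, prompt_name="query")
b = model.encode(query, prompt=default_prompt)
assert np.allclose(a, b)  # prompt_name is just a lookup into the stored prompts
```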
### Option B: transformers with manual pooling
```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yale-nlp/RTriever-4B")
model = AutoModel.from_pretrained(
    "yale-nlp/RTriever-4B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

QUERY_PROMPT = "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery:"

def last_token_pool(last_hidden_states, attention_mask):
    # Right-padding-aware last-token pooling. Works whether or not the
    # tokenizer left-pads.
    left_padding = attention_mask[:, -1].sum() == attention_mask.shape[0]
    if left_padding:
        return last_hidden_states[:, -1]
    seq_lens = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(last_hidden_states.size(0), device=last_hidden_states.device)
    return last_hidden_states[batch_idx, seq_lens]

def encode(texts, prompt: str = ""):
    if prompt:
        texts = [prompt + t for t in texts]  # match sentence-transformers behavior: no separator
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=8192, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        out = model(**batch)
    pooled = last_token_pool(out.last_hidden_state, batch["attention_mask"])
    return F.normalize(pooled, p=2, dim=1)

queries = ["Why are insects attracted to light at night?"]
docs = [
    "Recent flight-tracking studies show insects orient their dorsal axis toward "
    "the brightest visual region; near a point light source, this dorsal-light "
    "response disrupts flight stability and traps the insect.",
]

q_emb = encode(queries, prompt=QUERY_PROMPT)
d_emb = encode(docs, prompt="")
scores = (q_emb @ d_emb.T).cpu().tolist()
print(scores)
```
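As a consistency check, the manual pipeline should closely match the sentence-transformers path (a minimal sketch reusing `queries` and `q_emb` from above; small numeric differences are expected because the example runs in bfloat16):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

st_model = SentenceTransformer("yale-nlp/RTriever-4B")
st_q = st_model.encode(queries, prompt_name="query")

manual_q = q_emb.float().cpu().numpy()
print(np.abs(st_q - manual_q).max())  # expected to be small (dtype noise only)
```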
### Option C: batched retrieval over a corpus
```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("yale-nlp/RTriever-4B")

corpus = [...]  # list[str], thousands to millions of docs
doc_emb = model.encode(corpus, batch_size=16, show_progress_bar=True)

queries = [...]  # list[str]
q_emb = model.encode(queries, prompt_name="query", batch_size=16)

# Top-k retrieval (cosine similarity == dot product, since both sides are L2-normalized)
scores = q_emb @ doc_emb.T  # (n_query, n_doc)
top_k = np.argsort(-scores, axis=1)[:, :100]
```
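Once the corpus no longer fits comfortably in a single dense matrix multiply, a vector index is the usual next step. A minimal sketch with FAISS (an assumption on our part: the `faiss-cpu` package is installed; `IndexFlatIP` does exact inner-product search, which equals cosine here because the embeddings are unit-normalized):

```python
import faiss

dim = doc_emb.shape[1]  # 2560
index = faiss.IndexFlatIP(dim)        # exact inner-product (== cosine) search
index.add(doc_emb.astype("float32"))  # FAISS expects float32
scores, top_k = index.search(q_emb.astype("float32"), 100)  # (n_query, 100)
```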
## Notes on inputs
- Query prompt is required. Use `prompt_name="query"` (sentence-transformers) or prepend `QUERY_PROMPT` manually (transformers). Documents are encoded without a prompt.
- Domain-specific prompts typically improve retrieval quality on reasoning-intensive queries; a generic web-search prompt is provided as the default.
- Long inputs are supported up to 32 K tokens; for retrieval you usually want to truncate documents to 4-8 K tokens to keep encoding cost manageable.
- Embeddings are L2-normalized, so cosine similarity reduces to a dot product. Both sentence-transformers' `util.cos_sim` and a plain `q @ d.T` work, as the snippet after this list verifies.
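The last point is easy to verify directly (a small sketch reusing `q_emb` and `doc_emb` from Option C; `util.cos_sim` returns a torch tensor, hence the conversion):

```python
import numpy as np
from sentence_transformers import util

cos = util.cos_sim(q_emb, doc_emb).numpy()
dot = q_emb @ doc_emb.T
print(np.abs(cos - dot).max())  # ~0: cosine == dot product for unit vectors
```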
## Intended use
RTriever-4B is intended for reasoning-intensive retrieval: queries that require multi-step inference and the integration of complementary evidence rather than surface-level keyword or paraphrase matching. It can also be used as a drop-in replacement for any general-purpose dense retriever in retrieval-augmented generation, scholarly search, and agentic search pipelines.
## License
Released under the MIT License. The base model (Qwen/Qwen3-Embedding-4B) retains its original license; consult the Qwen3-Embedding model card for upstream attribution.