# Context Engineering 🧠
> Keeping long-running agents "forever young" by managing their memory.
## The Problem
LLMs have finite context windows. As conversations grow, you eventually hit the token limit and the agent breaks. Simply truncating old messages loses valuable context.
## The Solution: Compaction via Summarization
Instead of truncating, we **summarize** old conversation history into a compact narrative, preserving the essential context while freeing up tokens.
```
┌──────────────────────────────────────────────────────────┐
│ Before Compaction (500+ tokens)                          │
├──────────────────────────────────────────────────────────┤
│ [System] You are an HR assistant...                      │
│ [Human] Show me all candidates                           │
│ [AI] Here are 5 candidates: Alice, Bob...                │
│ [Human] Tell me about Alice                              │
│ [AI] Alice is a senior engineer with 5 years...          │
│ [Human] Schedule an interview with her                   │
│ [Tool] Calendar event created...                         │
│ [AI] Done! Interview scheduled for Monday.               │
│ [Human] Now check Bob's CV                         ← new │
└──────────────────────────────────────────────────────────┘
                       ↓ COMPACTION ↓
┌──────────────────────────────────────────────────────────┐
│ After Compaction (~200 tokens)                           │
├──────────────────────────────────────────────────────────┤
│ [System] You are an HR assistant...                      │
│ [AI Summary] User reviewed candidates, focused on        │
│              Alice (senior engineer), scheduled          │
│              interview for Monday.                       │
│ [Human] Now check Bob's CV                        ← kept │
└──────────────────────────────────────────────────────────┘
```
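
Conceptually, compaction hands the older messages to an LLM and replaces them with a single summary message. Below is a minimal sketch of that idea, assuming a generic chat model; the helper name, prompt wording, and `summarizer_llm` argument are illustrative assumptions, not the project's actual code:

```python
from langchain_core.messages import AIMessage

def summarize_messages(old_messages, summarizer_llm):
    """Condense a slice of old messages into one compact summary message."""
    transcript = "\n".join(f"[{m.type}] {m.content}" for m in old_messages)
    prompt = (
        "Summarize this conversation so an assistant can continue it "
        "without losing key facts, decisions, and names:\n\n" + transcript
    )
    summary = summarizer_llm.invoke(prompt)
    return AIMessage(content=f"[Conversation summary] {summary.content}")
```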
## Architecture
```
┌────────────────────────────────────────────────────────────┐
│                    CompactingSupervisor                    │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ 1. Intercept agent execution                         │  │
│  │ 2. Run agent normally                                │  │
│  │ 3. Count tokens after response                       │  │
│  │ 4. If over limit → trigger compaction                │  │
│  └──────────────────────────────────────────────────────┘  │
│                              │                             │
│                              ▼                             │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                    HistoryManager                    │  │
│  │  • compact_messages() → LLM summarization            │  │
│  │  • replace_thread_history() → checkpoint update      │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
```
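
In code, this flow reduces to a thin wrapper around the agent. A minimal sketch, assuming a token counter like the one in `token_counter.py` and `HistoryManager` methods that behave as described in this document (the exact signatures, e.g. the `ratio=` keyword, are assumptions):

```python
def count_tokens(messages):
    # Stand-in for token_counter.py: a crude character-based estimate
    # keeps this sketch self-contained.
    return sum(len(str(m.content)) for m in messages) // 4


class CompactingSupervisor:
    """Wraps a LangGraph agent and compacts its history when it grows too large."""

    def __init__(self, agent, history_manager, token_limit=500, compaction_ratio=0.5):
        self.agent = agent
        self.history_manager = history_manager
        self.token_limit = token_limit
        self.compaction_ratio = compaction_ratio

    def invoke(self, inputs, config):
        # 1.-2. Intercept the call and run the wrapped agent normally
        response = self.agent.invoke(inputs, config)
        # 3. Count tokens in the resulting message history
        if count_tokens(response["messages"]) > self.token_limit:
            # 4. Over the limit → summarize the oldest slice and rewrite the thread
            compacted = self.history_manager.compact_messages(
                response["messages"], ratio=self.compaction_ratio
            )
            self.history_manager.replace_thread_history(config, compacted)
        return response
```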
## Subagents and Memory Safety
Compaction affects **only the supervisor's `messages` channel** inside LangGraph's checkpoint.
This includes:
- User messages
- Supervisor AI messages
- **Tool call and Tool result messages** (because these are part of the supervisor's visible conversation history)
This does **not** include:
- Sub-agent internal reasoning
- Sub-agent private memory
- Hidden chain-of-thought
- Any messages stored in sub-agent-specific channels
Only the messages that the supervisor itself receives are ever compacted.
No internal sub-agent state leaks into the compacted summary.
## Key Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `token_limit` | 500 | Trigger compaction when exceeded |
| `compaction_ratio` | 0.5 | Fraction of messages to summarize |
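
Both can be overridden when constructing the wrapper, as in the integration example further below; the keyword names follow the table, but treat the exact constructor signature as an assumption:

```python
compacting_supervisor = CompactingSupervisor(
    agent=supervisor_agent,
    history_manager=HistoryManager(memory_saver=memory),
    token_limit=500,        # trigger compaction once the history exceeds this
    compaction_ratio=0.5,   # summarize the oldest half of the messages
)
```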
### Compaction Ratio Explained
The `compaction_ratio` controls how aggressively we summarize:
```
compaction_ratio = 0.5 (Default)
├── Summarizes: oldest 50% of messages
└── Keeps verbatim: newest 50% of messages

compaction_ratio = 0.8 (Aggressive)
├── Summarizes: oldest 80% of messages
└── Keeps verbatim: only newest 20%
    → Use when context is very tight

compaction_ratio = 0.2 (Gentle)
├── Summarizes: only oldest 20%
└── Keeps verbatim: newest 80%
    → Use when you want more history preserved
```
**Example with 10 messages:**
- `ratio=0.5` → Summarize messages 1-5, keep 6-10 verbatim
- `ratio=0.8` → Summarize messages 1-8, keep 9-10 verbatim
- `ratio=0.2` → Summarize messages 1-2, keep 3-10 verbatim
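
The split point behind those examples is simple arithmetic. A sketch (the helper name is illustrative; the real code may round differently or keep the system message out of the summarized slice):

```python
def split_for_compaction(messages, compaction_ratio=0.5):
    """Split a history into (to_summarize, to_keep) by compaction_ratio."""
    split = int(len(messages) * compaction_ratio)
    return messages[:split], messages[split:]

# With 10 messages: ratio=0.5 -> (messages[0:5], messages[5:10]),
# ratio=0.8 -> (messages[0:8], messages[8:10]), ratio=0.2 -> (messages[0:2], messages[2:10])
```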
## Usage
```python
from langchain_core.messages import HumanMessage

from src.backend.context_eng import compacting_supervisor
# Just use it like a normal agent - compaction is automatic!
response = compacting_supervisor.invoke(
{"messages": [HumanMessage(content="Hello")]},
config={"configurable": {"thread_id": "my-thread"}}
)
# Streaming works too
for chunk in compacting_supervisor.stream(...):
if chunk["type"] == "token":
print(chunk["content"], end="")
```
## LangGraph Integration
### How It Wraps the Agent
The `CompactingSupervisor` uses the **Interceptor Pattern** - it wraps the existing LangGraph agent without modifying it:
```python
# In compacting_supervisor.py
from src.backend.agents.supervisor.supervisor_v2 import supervisor_agent, memory
compacting_supervisor = CompactingSupervisor(
    agent=supervisor_agent,  # ← Original LangGraph agent
    history_manager=HistoryManager(memory_saver=memory),  # ← LangGraph's MemorySaver
...
)
```
The agent itself is **unchanged**. We just intercept `invoke()` and `stream()` calls.
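
The streaming path is intercepted the same way: chunks are forwarded to the caller as they arrive, and the token check runs once the stream is exhausted. A sketch continuing the wrapper class above (the private helper name is hypothetical):

```python
class CompactingSupervisor:  # continuation of the sketch above
    ...

    def stream(self, inputs, config):
        # Forward every chunk from the wrapped agent unchanged
        yield from self.agent.stream(inputs, config)
        # After the stream is exhausted, run the same over-limit check as invoke()
        self._compact_if_over_limit(config)  # hypothetical helper shared with invoke()
```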
### How It Manipulates LangGraph Memory
LangGraph uses **checkpoints** to persist conversation state. Normally, messages are append-only. Our `HistoryManager.replace_thread_history()` bypasses this to force a rewrite:
```
Normal LangGraph flow:
┌───────────────────────────────────────┐
│   Checkpoint Storage (MemorySaver)    │
│  ┌─────────────────────────────────┐  │
│  │ messages: [m1, m2, m3, m4...]   │  │  ← Append-only
│  └─────────────────────────────────┘  │
└───────────────────────────────────────┘

After compaction (we override):
┌───────────────────────────────────────┐
│   Checkpoint Storage (MemorySaver)    │
│  ┌─────────────────────────────────┐  │
│  │ messages: [sys, summary, m4]    │  │  ← Force-replaced!
│  └─────────────────────────────────┘  │
└───────────────────────────────────────┘
```
**Key mechanism in `replace_thread_history()`:**
1. Get current checkpoint via `memory.get_tuple(config)`
2. Build new checkpoint with compacted messages
3. Increment version + update timestamps
4. Write directly via `memory.put(...)` - bypassing normal reducers
This is a **low-level override** of LangGraph's internal checkpoint format. It works because we maintain the expected checkpoint structure (`channel_versions`, `channel_values`, etc.).
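
A rough sketch of those four steps against LangGraph's checkpointer API; the checkpoint field layout and the exact `put()` signature vary across LangGraph versions, so treat every key used here as an assumption:

```python
from copy import deepcopy

def replace_thread_history(memory, config, compacted_messages):
    # 1. Get the current checkpoint for this thread
    current = memory.get_tuple(config)
    checkpoint = deepcopy(current.checkpoint)

    # 2. Build the new checkpoint with the compacted messages
    checkpoint["channel_values"]["messages"] = compacted_messages

    # 3. Version/timestamp bookkeeping is elided here; the real method bumps
    #    channel_versions and ts so this becomes the latest checkpoint

    # 4. Write it back directly, bypassing the normal append-only reducers
    memory.put(config, checkpoint, current.metadata, checkpoint["channel_versions"])
```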
## Files
| File | Purpose |
|------|---------|
| `token_counter.py` | Count tokens in message lists |
| `history_manager.py` | Summarization + checkpoint manipulation |
| `compacting_supervisor.py` | Agent wrapper (Interceptor Pattern) |