# Context Engineering 🧠

> Keeping long-running agents "forever young" by managing their memory.

## The Problem

LLMs have finite context windows. As conversations grow, you eventually hit the token limit and the agent breaks. Simply truncating old messages loses valuable context.

## The Solution: Compacting Summarization

Instead of truncating, we **summarize** old conversation history into a compact narrative, preserving the essential context while freeing up tokens.

```
┌─────────────────────────────────────────────────────────┐
│  Before Compaction (500+ tokens)                        │
├─────────────────────────────────────────────────────────┤
│  [System] You are an HR assistant...                    │
│  [Human] Show me all candidates                         │
│  [AI] Here are 5 candidates: Alice, Bob...              │
│  [Human] Tell me about Alice                            │
│  [AI] Alice is a senior engineer with 5 years...        │
│  [Human] Schedule an interview with her                 │
│  [Tool] Calendar event created...                       │
│  [AI] Done! Interview scheduled for Monday.             │
│  [Human] Now check Bob's CV                      ← new  │
└─────────────────────────────────────────────────────────┘
                      ↓ COMPACTION ↓
┌─────────────────────────────────────────────────────────┐
│  After Compaction (~200 tokens)                         │
├─────────────────────────────────────────────────────────┤
│  [System] You are an HR assistant...                    │
│  [AI Summary] User reviewed candidates, focused on      │
│       Alice (senior engineer), scheduled interview      │
│       for Monday.                                       │
│  [Human] Now check Bob's CV                      ← kept │
└─────────────────────────────────────────────────────────┘
```
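
The summarization step itself is a single LLM call over the transcript being condensed. Here is a minimal sketch of the idea; the model, prompt, and function name are illustrative, not the project's actual ones:

```python
from langchain_core.messages import AIMessage, BaseMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # any chat model works; this choice is an example

SUMMARY_PROMPT = (
    "Condense the following conversation into a short narrative. "
    "Preserve decisions, named entities, and open tasks:\n\n{transcript}"
)

def summarize(messages: list[BaseMessage]) -> AIMessage:
    """Collapse a list of messages into one compact summary message."""
    transcript = "\n".join(f"[{m.type}] {m.content}" for m in messages)
    result = llm.invoke(SUMMARY_PROMPT.format(transcript=transcript))
    return AIMessage(content=f"Summary of earlier conversation: {result.content}")
```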

## Architecture

```
┌──────────────────────────────────────────────────────────┐
│                  CompactingSupervisor                    │
│  ┌────────────────────────────────────────────────────┐  │
│  │  1. Intercept agent execution                      │  │
│  │  2. Run agent normally                             │  │
│  │  3. Count tokens after response                    │  │
│  │  4. If over limit → trigger compaction             │  │
│  └────────────────────────────────────────────────────┘  │
│                          │                               │
│                          ▼                               │
│  ┌────────────────────────────────────────────────────┐  │
│  │              HistoryManager                        │  │
│  │  • compact_messages() → LLM summarization          │  │
│  │  • replace_thread_history() → checkpoint update    │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
```
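
In code, the four steps reduce to a small wrapper. The sketch below is illustrative (the real class lives in `compacting_supervisor.py`, and the `HistoryManager` call signatures here are assumptions); the crude character-based count stands in for `token_counter.py`:

```python
def count_tokens(messages) -> int:
    # Stand-in for token_counter.py: roughly 4 characters per token.
    return sum(len(str(m.content)) for m in messages) // 4

class CompactingSupervisorSketch:
    def __init__(self, agent, history_manager, token_limit=500, compaction_ratio=0.5):
        self.agent = agent
        self.history_manager = history_manager
        self.token_limit = token_limit
        self.compaction_ratio = compaction_ratio

    def invoke(self, inputs, config):
        result = self.agent.invoke(inputs, config)     # steps 1-2: run agent normally
        messages = result["messages"]
        if count_tokens(messages) > self.token_limit:  # step 3: count tokens
            # step 4: summarize the oldest slice and rewrite the checkpoint
            compacted = self.history_manager.compact_messages(
                messages, ratio=self.compaction_ratio
            )
            self.history_manager.replace_thread_history(config, compacted)
        return result
```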

## 🔒 Subagents and Memory Safety

Compaction affects **only the supervisor’s `messages` channel** inside LangGraph’s checkpoint.

This includes:

- User messages  
- Supervisor AI messages  
- **Tool call and Tool result messages** (because these are part of the supervisor’s visible conversation history)

This does **not** include:

- Sub-agent internal reasoning  
- Sub-agent private memory  
- Hidden chain-of-thought  
- Any messages stored in sub-agent–specific channels

Only the messages that the supervisor itself receives are ever compacted.  
No internal sub-agent state leaks into the compacted summary.
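
One way to see why this holds: the compactor only ever touches one named channel in the checkpoint. A sketch of the read side, using LangGraph's checkpointer API:

```python
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
config = {"configurable": {"thread_id": "my-thread"}}

tup = memory.get_tuple(config)  # latest checkpoint for this thread (None if empty)
if tup is not None:
    # Only this channel is read, summarized, and written back; any
    # sub-agent-specific channels in channel_values are left untouched.
    supervisor_messages = tup.checkpoint["channel_values"].get("messages", [])
```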


## Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `token_limit` | 500 | Trigger compaction when exceeded |
| `compaction_ratio` | 0.5 | Fraction of messages to summarize |

### Compaction Ratio Explained

The `compaction_ratio` controls how aggressively we summarize:

```
compaction_ratio = 0.5 (Default)
β”œβ”€β”€ Summarizes: oldest 50% of messages
└── Keeps verbatim: newest 50% of messages

compaction_ratio = 0.8 (Aggressive)
β”œβ”€β”€ Summarizes: oldest 80% of messages  
└── Keeps verbatim: only newest 20%
    β†’ Use when context is very tight

compaction_ratio = 0.2 (Gentle)
β”œβ”€β”€ Summarizes: only oldest 20%
└── Keeps verbatim: newest 80%
    β†’ Use when you want more history preserved
```

**Example with 10 messages:**
- `ratio=0.5` → Summarize messages 1-5, keep 6-10 verbatim
- `ratio=0.8` → Summarize messages 1-8, keep 9-10 verbatim
- `ratio=0.2` → Summarize messages 1-2, keep 3-10 verbatim
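
The split itself is a one-line index computation; a sketch (the function name is illustrative):

```python
def split_for_compaction(messages: list, ratio: float = 0.5) -> tuple[list, list]:
    """Return (to_summarize, to_keep) for a given compaction_ratio."""
    cut = int(len(messages) * ratio)
    return messages[:cut], messages[cut:]

old, recent = split_for_compaction(list(range(1, 11)), ratio=0.8)
# old    -> [1, 2, 3, 4, 5, 6, 7, 8]  (summarized)
# recent -> [9, 10]                   (kept verbatim)
```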

## Usage

```python
from langchain_core.messages import HumanMessage

from src.backend.context_eng import compacting_supervisor

# Just use it like a normal agent - compaction is automatic!
response = compacting_supervisor.invoke(
    {"messages": [HumanMessage(content="Hello")]},
    config={"configurable": {"thread_id": "my-thread"}}
)

# Streaming works too
for chunk in compacting_supervisor.stream(...):
    if chunk["type"] == "token":
        print(chunk["content"], end="")
```

## LangGraph Integration

### How It Wraps the Agent

The `CompactingSupervisor` uses the **Interceptor Pattern** - it wraps the existing LangGraph agent without modifying it:

```python
# In compacting_supervisor.py
from src.backend.agents.supervisor.supervisor_v2 import supervisor_agent, memory

compacting_supervisor = CompactingSupervisor(
    agent=supervisor_agent,      # ← Original LangGraph agent
    history_manager=HistoryManager(memory_saver=memory),  # ← LangGraph's MemorySaver
    ...
)
```

The agent itself is **unchanged**. We just intercept `invoke()` and `stream()` calls.
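
Stripped of the compaction details, the pattern is just a forwarding wrapper with a post-call hook:

```python
class Interceptor:
    """Forwards calls to the wrapped agent and adds behavior around them."""

    def __init__(self, agent):
        self.agent = agent  # the wrapped object is never modified

    def invoke(self, *args, **kwargs):
        result = self.agent.invoke(*args, **kwargs)
        self._after(result)  # hook: compaction happens here in our case
        return result

    def _after(self, result):
        pass  # no-op by default; subclasses override
```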

### How It Manipulates LangGraph Memory

LangGraph uses **checkpoints** to persist conversation state. Normally, messages are append-only. Our `HistoryManager.replace_thread_history()` bypasses this to force a rewrite:

```
Normal LangGraph flow:
┌─────────────────────────────────────┐
│  Checkpoint Storage (MemorySaver)   │
│  ┌───────────────────────────────┐  │
│  │ messages: [m1, m2, m3, m4...] │  │  ← Append-only
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘

After compaction (we override):
┌─────────────────────────────────────┐
│  Checkpoint Storage (MemorySaver)   │
│  ┌───────────────────────────────┐  │
│  │ messages: [sys, summary, m4]  │  │  ← Force-replaced!
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
```

**Key mechanism in `replace_thread_history()`:**
1. Get current checkpoint via `memory.get_tuple(config)`
2. Build new checkpoint with compacted messages
3. Increment version + update timestamps
4. Write directly via `memory.put(...)` - bypassing normal reducers

This is a **low-level override** of LangGraph's internal checkpoint format. It works because we maintain the expected checkpoint structure (`channel_versions`, `channel_values`, etc.).
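
A hedged sketch of those four steps; checkpoint field names vary across langgraph versions, so treat the details below as illustrative rather than the actual `replace_thread_history()` body:

```python
import uuid
from datetime import datetime, timezone

def replace_thread_history(memory, config, new_messages):
    tup = memory.get_tuple(config)            # 1. fetch the current checkpoint
    checkpoint = dict(tup.checkpoint)
    checkpoint["channel_values"] = {          # 2. swap in the compacted messages
        **checkpoint["channel_values"],
        "messages": new_messages,
    }
    checkpoint["id"] = str(uuid.uuid4())      # 3. fresh id + timestamp
    checkpoint["ts"] = datetime.now(timezone.utc).isoformat()
    # (the real code also bumps channel_versions["messages"]; the version
    #  format is saver-specific, so it is omitted from this sketch)
    memory.put(tup.config, checkpoint, tup.metadata,   # 4. direct write,
               checkpoint["channel_versions"])         #    bypassing reducers
```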

## Files

| File | Purpose |
|------|---------|
| `token_counter.py` | Count tokens in message lists |
| `history_manager.py` | Summarization + checkpoint manipulation |
| `compacting_supervisor.py` | Agent wrapper (Interceptor Pattern) |