how-affect-v1 — Bridge-Grounded Affect Detector

A DistilBERT-based affect-valence classifier fine-tuned on non-circular author-narrated affect labels (mined from public-domain novel narration via BookNLP), rather than on LLM-generated personality scores.

Why this exists

Production personality and emotion classifiers in companion AI are commonly trained on LLM-generated labels (e.g. Claude or GPT scores). Evaluating against those same LLM labels is circular: a high score shows only that the model has learned to imitate the labeling LLM. We needed a HOW (affect) detector grounded in an independent, human-written signal about how characters speak. Solution: harvest dialogue-tag adverbs and WordNet emotion supersenses from BookNLP-processed novels (~1000 books, roughly 30k labeled quotes), bind them to the speaker's actual utterances, and train a probe.

Metrics

Held-out test set (5,971 quotes, balanced neg/pos author affect):

Model	Held-out AUC
Existing circular "emotion" dim (177-dim model trained on Claude scores)	0.557
Frozen-embedding probe (sentence-transformer + linear head)	0.637
This model — DistilBERT end-to-end on bridge labels	0.678

Honest ceiling: an AUC of ~0.68 is a real but modest improvement over both baselines above. Narrated affect ("said bitterly") often lives in prosody rather than in lexical content, so text-only affect detection has a structural ceiling. A voice/prosody channel is the likely path to higher AUC.
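
For anyone re-deriving the numbers in the table, AUC has a simple rank interpretation: it is the probability that a randomly chosen quote from the positive class gets a higher score than a randomly chosen quote from the other class, with ties counted as half. The sketch below is a minimal pure-Python version of that definition. The function name `roc_auc` and the toy scores are illustrative assumptions, not code or data from this repository.

```python
def roc_auc(scores, labels):
    """Pairwise AUC: P(score_pos > score_neg), ties count 0.5. O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores (label 1 = negative affect, as in this model).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(roc_auc(toy_scores, toy_labels))  # 0.75
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect ranking, which is the scale on which the 0.557 / 0.637 / 0.678 figures should be read.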

Files

model.pt — full state-dict: DistilBERT encoder + mean-pool + Linear(hidden→1) head.
metrics.json — final held-out AUC + baseline comparison.

Usage

The head is custom (DistilBERT + mean-pool + 1-logit), so you can't use AutoModelForSequenceClassification.from_pretrained directly. Load like this:

import torch
from transformers import AutoTokenizer, AutoModel

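class AffectNet(torch.nn.Module):
    """DistilBERT encoder + masked mean-pool + single-logit linear head."""
    def __init__(self):
        super().__init__()
        self.enc = AutoModel.from_pretrained("distilbert-base-uncased")
        self.head = torch.nn.Linear(self.enc.config.hidden_size, 1)

    def forward(self, ids, mask):
        # Token embeddings, shape (batch, seq_len, hidden).
        h = self.enc(input_ids=ids, attention_mask=mask).last_hidden_state
        # Average over real tokens only; padding positions are zeroed by the mask.
        m = mask.unsqueeze(-1).float()
        pooled = (h * m).sum(1) / m.sum(1).clamp(min=1e-6)
        # One logit per input, shape (batch,).
        return self.head(pooled).squeeze(1)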

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AffectNet()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

text = "I can't bear this any longer."
enc = tok(text, padding="max_length", truncation=True, max_length=48, return_tensors="pt")
with torch.no_grad():
    valence = torch.sigmoid(model(enc["input_ids"], enc["attention_mask"]))[0].item()
print(valence)  # ~1.0 = negative/distressed affect, ~0.0 = positive

Training data (non-circular)

Bridge corpus built from ~1000 BookNLP-processed novels (corpus/booknlp_output/). For each character quote, the narration window (±7 tokens around the quote) was scanned for emotion supersense spans (verb.emotion, noun.feeling) and for manner adverbs anchored to a speech verb ("said bitterly"). Each quote was then mapped to net-negative or net-positive author affect, yielding a roughly balanced 17,749 negative and 16,375 positive labels. Of these, 29,852 quotes were used: 23,881 for training and 5,971 for testing.
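
The labeling rule described above can be sketched in a few lines. This is a simplified illustration only: it covers the "speech verb + manner adverb" cue and ignores the supersense spans, which in the real pipeline come from BookNLP output. The word lists, the function name `label_quote`, and the example sentences are all hypothetical, chosen for the demo rather than taken from the corpus.

```python
# Hypothetical mini-lexicons, for illustration only.
SPEECH_VERBS = {"said", "replied", "cried", "whispered", "answered"}
NEG_ADVERBS = {"bitterly", "angrily", "sadly", "coldly"}
POS_ADVERBS = {"cheerfully", "warmly", "gladly", "merrily"}


def label_quote(tokens, quote_start, quote_end, window=7):
    """Label one quote from its surrounding narration.

    tokens[quote_start:quote_end] is the quote itself (end exclusive).
    Returns 1 for net-negative affect, 0 for net-positive, and None when
    the narration window carries no signal or the cues cancel out.
    """
    left = tokens[max(0, quote_start - window):quote_start]
    right = tokens[quote_end:quote_end + window]
    score = 0
    for side in (left, right):
        # Look for an adverb immediately after a speech verb ("said bitterly").
        for prev, word in zip(side, side[1:]):
            if prev.lower() in SPEECH_VERBS:
                if word.lower() in NEG_ADVERBS:
                    score -= 1
                elif word.lower() in POS_ADVERBS:
                    score += 1
    if score == 0:
        return None
    return 1 if score < 0 else 0


tokens = '" I will not go , " she said bitterly .'.split()
print(label_quote(tokens, 0, 7))  # 1 (net-negative)
```

Returning 1 for negative affect matches the output convention in the Usage section, where scores near 1.0 mean negative or distressed affect. Quotes with no cue in the window are skipped rather than given a default label.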

Architecture

Base encoder: distilbert-base-uncased (~66M params).
Head: Dropout-free Linear(hidden_size, 1) over mean-pooled token embeddings.
Loss: BCEWithLogitsLoss on binary affect-valence.
Training: 1-2 epochs on CPU; the best epoch was saved by held-out AUC, with early stopping once AUC stopped improving.
Max input length: 48 tokens (quotes are short).
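
To make the loss line concrete: BCEWithLogitsLoss applies the sigmoid and the binary cross-entropy in one numerically stable step, which is why the head emits a raw logit and the sigmoid only appears at inference time. The sketch below reproduces the per-example formula in plain Python. The helper names `bce_with_logits` and `sigmoid` are illustrative, not part of this repository, and PyTorch by default averages this per-example value over the batch.

```python
import math


def bce_with_logits(logit, target):
    """Stable binary cross-entropy on a raw logit; target is 0.0 or 1.0.

    Algebraically equal to
        -[t * log(sigmoid(x)) + (1 - t) * log(1 - sigmoid(x))]
    but rearranged so that exp() never overflows for large |x|.
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def sigmoid(logit):
    """Stable logistic function, used to turn a logit into a valence score."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# An uninformative logit of 0 costs log(2) whatever the label is.
print(round(bce_with_logits(0.0, 1.0), 4))  # 0.6931
```

A confident wrong prediction is penalized roughly linearly in the logit (for example, a logit of 1000 against target 0 costs 1000), so gradients stay finite where a naive log(1 - sigmoid(x)) would hit log(0).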

License

Trained on derivatives of public-domain (Project Gutenberg) novels processed via BookNLP. The model weights are released for research use; please consult your jurisdiction's rules around derivative works for production deployment.

Model tree for Mozat/how-affect-v1: base model distilbert/distilbert-base-uncased, fine-tuned to produce this model.