Jim Crow Law Classifier (ModernBERT-base)

A text classification model fine-tuned on biglam/on_the_books to identify Jim Crow laws in historical US legislative text.

Model Description

This model classifies sections of US state legislation as either Jim Crow laws (discriminatory laws targeting racial minorities) or non-Jim Crow laws. It was fine-tuned from answerdotai/ModernBERT-base, which supports up to 8,192 tokens of context, although fine-tuning here used a maximum sequence length of 1,024 tokens (see Training Details).

Performance

Evaluated on a stratified 15% held-out test set (268 samples):

Metric	Score
F1	0.9487
Accuracy	0.9701
Precision	0.9367
Recall	0.9610
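
The four scores are consistent with a single confusion matrix on the 268-sample test set. As a sanity check, the sketch below assumes 74 true positives, 5 false positives, 3 false negatives, and 186 true negatives. These counts are not reported in this card; they were inferred from the rounded scores, so treat them as an assumption rather than the actual evaluation output.

```python
# Hypothetical confusion-matrix counts for the jim_crow class.
# Inferred from the reported scores; NOT taken from the model card.
tp, fp, fn, tn = 74, 5, 3, 186

total = tp + fp + fn + tn                 # size of the test set
precision = tp / (tp + fp)                # share of flagged sections that are correct
recall = tp / (tp + fn)                   # share of Jim Crow sections that were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

print(f"n={total} F1={f1:.4f} acc={accuracy:.4f} "
      f"precision={precision:.4f} recall={recall:.4f}")
```

Under these assumed counts the model would miss 3 Jim Crow sections and wrongly flag 5 others out of 268.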

Training Details

Base model: answerdotai/ModernBERT-base (149M parameters)
Dataset: biglam/on_the_books (1,785 samples total; 1,517 train / 268 test)
Max sequence length: 1024 tokens
Epochs: 5 (best checkpoint at epoch 5 by F1)
Batch size: 16
Learning rate: 2e-5 with linear decay
Warmup: 6% of training steps
Weight decay: 0.01
Hardware: NVIDIA T4 GPU
Training time: ~8 minutes
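
To make the schedule concrete, the sketch below works out the step counts implied by these settings and a linear warmup followed by linear decay. It assumes a single GPU, no gradient accumulation, a final partial batch that is kept, and warmup that starts from zero; none of these are stated in the card, and `learning_rate` is a hypothetical helper, not the training script.

```python
import math

train_samples = 1517
batch_size = 16
epochs = 5
base_lr = 2e-5
warmup_ratio = 0.06

# Assumes no gradient accumulation and that the last partial batch is kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def learning_rate(step: int) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
print(learning_rate(0), learning_rate(warmup_steps), learning_rate(total_steps))
```

With so few optimizer steps overall, the warmup phase under these assumptions lasts well under one epoch.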

Usage

from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="davanstrien/jim-crow-laws-ml-agent")

# Example section of legislative text to classify.
text = "The Commission shall provide separate sleeping quarters and separate eating space for the different races."
result = classifier(text)
print(result)
# Example output: [{'label': 'jim_crow', 'score': 0.99...}]
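
For screening many sections at once, a small wrapper around the pipeline above may be convenient. The sketch below is illustrative only: `flag_jim_crow` and its `threshold` parameter are hypothetical names, and passing `truncation=True, max_length=1024` is an assumption chosen to match the training sequence length rather than a documented requirement of this model.

```python
from typing import Callable, Dict, List, Sequence


def flag_jim_crow(
    classifier: Callable[..., List[Dict]],
    sections: Sequence[str],
    threshold: float = 0.5,
) -> List[Dict]:
    """Return the sections predicted as jim_crow with score >= threshold."""
    # The text-classification pipeline forwards these kwargs to the tokenizer.
    predictions = classifier(list(sections), truncation=True, max_length=1024)
    flagged = []
    for text, pred in zip(sections, predictions):
        if pred["label"] == "jim_crow" and pred["score"] >= threshold:
            flagged.append({"text": text, "score": pred["score"]})
    return flagged
```

Calling `flag_jim_crow(classifier, sections, threshold=0.9)` with the pipeline object from the usage snippet would keep only high-confidence predictions; raising the threshold trades recall for precision.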

Dataset

The On the Books dataset contains 1,785 sections of North Carolina state legislation from the Jim Crow era, annotated by historians as either Jim Crow laws or non-Jim Crow laws. The dataset is imbalanced: 71% non-Jim Crow, 29% Jim Crow.
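
The imbalance is why F1 is more informative than accuracy here. As a rough illustration, the sketch below scores a trivial baseline that labels every section as non-Jim Crow. The class counts are derived from the rounded 29% figure, so they are approximate.

```python
total = 1785
positive_share = 0.29                      # Jim Crow share, rounded in the card

positives = round(total * positive_share)  # approximate Jim Crow count
negatives = total - positives

# Baseline: predict no_jim_crow for every section.
tp, fp = 0, 0
fn, tn = positives, negatives

baseline_accuracy = (tp + tn) / total
baseline_recall = tp / (tp + fn)
# F1 is conventionally taken as 0 when there are no true positives.
baseline_f1 = 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

print(positives, negatives, round(baseline_accuracy, 2), baseline_recall, baseline_f1)
```

Such a baseline reaches about 71% accuracy while finding no Jim Crow laws at all, so accuracy alone would overstate its usefulness.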

Labels

no_jim_crow (0): Non-discriminatory legislation
jim_crow (1): Jim Crow law (racially discriminatory legislation)

Limitations

Trained only on North Carolina legislation; may not generalize to other states
Historical language patterns may not transfer to modern legal text
The model may be biased toward the specific annotation criteria used in the dataset
