Publication Date

2026

Document Type

Forthcoming Work

Abstract

For seventy years, research has shown that actuarial methods outperform clinical judgment. Yet actuarial approaches have limitations: they generally rely on structured data; cannot exploit rare, case-specific details; have limited accuracy where outcome data are scarce or incomplete; and cannot offer case-level justifications. Large language models (LLMs) offer a different approach. Like actuarial methods, they aggregate information algorithmically, but like clinicians, they bring general knowledge to bear and can provide case-level justifications. We prompted seven LLMs to assess rearrest risk from 113 parole hearing transcripts and compared their predictions to those of a machine learning model trained on 4,000 cases with 91 administrative variables. GPT-5 outperformed this actuarial baseline despite no task-specific training on arrest outcomes; however, its stated justifications did not reliably explain its own predictions.
