TITLE:
A Counterfactual Explainability Framework for Transparent, Actionable, and Clinician Validated Psychiatric Treatment Decision Support
AUTHORS:
Rocco de Filippis, Abdullah Al Foysal
KEYWORDS:
Counterfactual Explanations, Explainable AI, Psychiatric Treatment, Clinical Decision Support, Algorithmic Recourse, Treatment Response Prediction, Bayesian Deep Learning, Interpretable Machine Learning, XAI, What-If Scenarios, Clinician Evaluation, Causal Consistency
JOURNAL NAME:
Open Access Library Journal, Vol.13 No.5, May 29, 2026
ABSTRACT: Psychiatric treatment decisions are among the most consequential and least transparent clinical choices a physician makes. A machine learning model that predicts treatment non-response is clinically useful only if it can also answer the question every psychiatrist immediately asks: what would need to change for this patient to respond? Standard black-box models cannot answer this question. Counterfactual explanation methods aim to fill this gap, but existing approaches generate scenarios that are mathematically optimal yet clinically implausible: they change features that cannot be acted upon, violate causal constraints between clinical variables, or ignore the patient’s circumstances and preferences. We introduce CounterPsych, an end-to-end counterfactual explainability framework designed specifically for psychiatric treatment decision support. CounterPsych combines a Bayesian outcome predictor (a ten-member deep ensemble with calibrated uncertainty; expected calibration error, ECE = 0.021) with a constrained counterfactual generator that produces what-if treatment scenarios satisfying four simultaneous validity criteria: clinical plausibility, medical actionability, causal consistency, and patient-preference alignment. The counterfactual generator is built on a novel proximity-constrained gradient search with clinical validity filtering and diversity regularization, producing sparse, realistic recourse plans with a mean of 2.3 feature changes per counterfactual. Trained and validated on a retrospective-prospective cohort of 2,480 psychiatric outpatients across five diagnostic categories and four clinical sites, CounterPsych achieves a treatment outcome prediction accuracy of 94.1%, an AUC-ROC of 0.977, and a macro-F1 of 0.919.
In a prospective clinician evaluation with 24 consultant psychiatrists, CounterPsych counterfactuals received mean ratings of 4.42/5 for clinical plausibility and 4.51/5 for trustworthiness, substantially outperforming the best prior counterfactual method (DiCE: 3.21/5 and 3.14/5). CounterPsych is the first counterfactual explanation framework validated for psychiatric treatment decisions through direct clinician evaluation, establishing a new standard for clinically meaningful machine learning explainability in psychiatry.
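
To make the idea of a proximity-constrained, actionability-aware counterfactual search concrete, the following is a minimal illustrative sketch and not the CounterPsych implementation. It assumes a toy logistic model in place of the Bayesian ensemble, an L1 proximity penalty to encourage sparse changes, a binary mask marking which features may be changed, and box bounds standing in for clinical plausibility limits. Causal consistency, patient preferences, and diversity regularization are not modeled. All names, weights, and values (`counterfactual`, `actionable`, `lam`, the example patient vector) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def counterfactual(x, w, b, actionable, lo, hi,
                   target=0.5, lam=0.1, lr=0.1, steps=500):
    """Search for x_cf close to x with predicted response >= target.

    Assumptions (illustrative only): the predictor is sigmoid(w @ x + b),
    `actionable` is a 0/1 mask of features that may change, and [lo, hi]
    is a plausibility range applied to every feature.
    """
    x_cf = x.astype(float).copy()
    for _ in range(steps):
        p = sigmoid(w @ x_cf + b)
        if p >= target:
            break  # stop at the first point that flips the prediction
        # Gradient of -log(p) + lam * ||x_cf - x||_1 with respect to x_cf.
        grad = -(1.0 - p) * w + lam * np.sign(x_cf - x)
        grad = grad * actionable          # non-actionable features stay fixed
        x_cf = np.clip(x_cf - lr * grad, lo, hi)  # keep values in range
    return x_cf


# Hypothetical patient with three normalized features; the third is
# treated as non-actionable (for example, a fixed history variable).
x = np.array([0.2, 0.9, 0.5])
w = np.array([2.0, -1.5, 1.0])
b = -1.0
actionable = np.array([1.0, 1.0, 0.0])

x_cf = counterfactual(x, w, b, actionable, lo=0.0, hi=1.0)
print("original response probability:", sigmoid(w @ x + b))
print("counterfactual features:", x_cf)
print("counterfactual response probability:", sigmoid(w @ x_cf + b))
```

In this sketch the mask enforces actionability, the clipping step enforces a crude plausibility range, and the L1 term pulls the search back toward the original patient so that few features move. A framework of the kind described in the abstract would additionally need to filter candidates against causal relationships between clinical variables and generate several diverse alternatives.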