TITLE:
Detecting Bias in AI: A Multi-Label RoBERTa Classification Model to Detect Bias in LLM-Generated Diversity Reports
AUTHORS:
Mathew Sunil Abraham, Nicole Lee, Rezza Moieni
KEYWORDS:
Artificial Intelligence (AI), Ethical AI, Bias Detection, Large Language Models (LLMs), Diversity Data Analysis
JOURNAL NAME:
Open Journal of Social Sciences, Vol.13 No.10, September 29, 2025
ABSTRACT: With the increasing use of generative AI to create textual summaries and dashboard reports, there is significant concern about diverse forms of bias, such as gender, geographical, religious, cultural, and language bias. This research investigates the presence of bias in AI-generated diversity reports and presents a sentence-level bias detection model to quantify and classify different types of bias. The study focuses on five key bias categories: gender, religion, age, disability, and sexuality. We train and evaluate a multi-label bias classifier built on a RoBERTa-based deep learning model, integrating manual confidence-weighted annotation practices to ensure reliable labelling. Synthetic diversity reports were generated using the Gemini 1.5-flash language model to simulate real-world corporate content. We then use the classifier to analyse around 1,000 reports (over 10,000 sentences) for bias and to assess the nature and distribution of different bias types. The classifier demonstrated high accuracy and recall, effectively detecting both overt and subtle biases across categories. Analysis of the 10,000+ sentences revealed measurable bias in the generated reports, with disability, gender, and religion biases the most frequently detected. These findings highlight that even when given inclusive prompts, large language models can produce biased content. The model’s strong performance and its ability to detect nuanced and intersectional biases make it a practical tool for organisations aiming to audit AI-generated communications. We expect this work to contribute to the growing field of ethical AI by supporting more transparent, fair, and inclusive corporate reporting.
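As a rough illustration of the pipeline the abstract describes, the sketch below shows how a sentence-level, multi-label RoBERTa classifier over the five bias categories might be set up with the Hugging Face transformers library. The base checkpoint, decision threshold, and example sentences are illustrative assumptions, not details from the paper, and the fine-tuning on confidence-weighted annotations described above is omitted.

```python
# Minimal sketch of a sentence-level, multi-label bias classifier on RoBERTa.
# The checkpoint ("roberta-base"), the 0.5 threshold, and the sample sentences
# are assumptions for illustration; the paper fine-tunes its own model.
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification

BIAS_LABELS = ["gender", "religion", "age", "disability", "sexuality"]

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=len(BIAS_LABELS),
    problem_type="multi_label_classification",  # sigmoid per label, BCE loss
)

def classify_sentence(sentence: str, threshold: float = 0.5) -> dict:
    """Return per-category probabilities and the labels exceeding the threshold."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze(0).tolist()
    return {
        "probabilities": dict(zip(BIAS_LABELS, probs)),
        "detected": [label for label, p in zip(BIAS_LABELS, probs) if p >= threshold],
    }

# Example: scan each sentence of a generated diversity report.
report_sentences = [
    "Older employees struggled to adapt to the new digital tools.",
    "The team celebrated a wide range of cultural holidays this year.",
]
for sentence in report_sentences:
    print(sentence, "->", classify_sentence(sentence)["detected"])
```

In this setup each category gets an independent sigmoid output, so a single sentence can be flagged for several bias types at once, which is what allows the intersectional biases mentioned in the abstract to be surfaced.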