<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research-article">
 <front>
  <journal-meta>
   <journal-id journal-id-type="publisher-id">
    jss
   </journal-id>
   <journal-title-group>
    <journal-title>
     Open Journal of Social Sciences
    </journal-title>
   </journal-title-group>
   <issn pub-type="epub">
    2327-5952
   </issn>
    <issn pub-type="ppub">
    2327-5960
   </issn>
   <publisher>
    <publisher-name>
     Scientific Research Publishing
    </publisher-name>
   </publisher>
  </journal-meta>
  <article-meta>
   <article-id pub-id-type="doi">
    10.4236/jss.2024.127025
   </article-id>
   <article-id pub-id-type="publisher-id">
    jss-134674
   </article-id>
   <article-categories>
    <subj-group subj-group-type="heading">
     <subject>
      Articles
     </subject>
    </subj-group>
    <subj-group subj-group-type="Discipline-v2">
     <subject>
      Business 
     </subject>
     <subject>
       Economics, Social Sciences 
     </subject>
     <subject>
       Humanities
     </subject>
    </subj-group>
   </article-categories>
    <title-group>
     <article-title>
      Overview of Machine Learning Algorithms for Detecting Microaggression in Written Text
     </article-title>
   <contrib-group>
    <contrib contrib-type="author" xlink:type="simple">
     <name name-style="western">
      <surname>
       Asif
      </surname>
      <given-names>
       Tareque
      </given-names>
     </name>
    </contrib>
    <contrib contrib-type="author" xlink:type="simple">
     <name name-style="western">
      <surname>
       Harshith Hullakere
      </surname>
      <given-names>
       Siddegowda
      </given-names>
     </name>
    </contrib>
    <contrib contrib-type="author" xlink:type="simple">
     <name name-style="western">
      <surname>
       Denster Joseph
      </surname>
      <given-names>
       Frank
      </given-names>
     </name>
    </contrib>
    <contrib contrib-type="author" xlink:type="simple">
     <name name-style="western">
      <surname>
       Lee
      </surname>
      <given-names>
       Nicole
      </given-names>
     </name>
    </contrib>
    <contrib contrib-type="author" xlink:type="simple">
     <name name-style="western">
      <surname>
       Moieni
      </surname>
      <given-names>
       Rezza
      </given-names>
     </name>
    </contrib>
   </contrib-group> 
   <aff id="affnull">
    <addr-line>
      Diversity Atlas, Melbourne, Australia
    </addr-line> 
   </aff> 
   <pub-date pub-type="epub">
    <day>
     09
    </day> 
    <month>
     07
    </month>
    <year>
     2024
    </year>
   </pub-date> 
   <volume>
    12
   </volume> 
   <issue>
    07
   </issue>
   <fpage>
    347
   </fpage>
   <lpage>
    358
   </lpage>
   <history>
    <date date-type="received">
     <day>
       1
     </day>
     <month>
      December
     </month>
     <year>
      2023
     </year>
    </date>
    <date date-type="published">
     <day>
       19
     </day>
     <month>
      December
     </month>
     <year>
      2023
     </year> 
    </date> 
    <date date-type="accepted">
     <day>
       19
     </day>
     <month>
      July
     </month>
     <year>
      2024
     </year> 
    </date>
   </history>
   <permissions>
     <copyright-statement>
      © Copyright 2024 by authors and Scientific Research Publishing Inc.
     </copyright-statement>
     <copyright-year>
      2024
     </copyright-year>
    <license>
     <license-p>
      This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/
     </license-p>
    </license>
   </permissions>
   <abstract>
     Microaggressions are brief, everyday verbal, behavioral, or environmental actions that convey negative, demeaning, or hostile racial undertones. They can be unintentional and often go unnoticed by the offender, yet they can significantly impact the mental health of victims, leading to stress, low self-esteem, and feelings of invalidation. This research aims to detect microaggressions in written communication using machine learning. The study tackles the scarcity of annotated microaggression data by collecting text from microaggressions.com, ChatGPT, Reddit, and office workplaces, and annotating it with the GPT-3.5 language model. Multiple machine learning algorithms were trained to detect microaggressive language in text and evaluated across appropriate metrics. Long Short-Term Memory (LSTM) with BERT embeddings was found to be the most stable model for detecting microaggression. This work advances the field of microaggression detection by leveraging deep learning techniques, which could potentially be extended to eliminating microaggressions in text.
   </abstract>
   <kwd-group> 
    <kwd>
     Natural Language Processing
    </kwd> 
    <kwd>
      Inclusivity
    </kwd> 
    <kwd>
      Microaggression
    </kwd> 
    <kwd>
      Diversity
    </kwd> 
    <kwd>
      Machine Learning
    </kwd> 
    <kwd>
      Deep Learning
    </kwd>
   </kwd-group>
  </article-meta>
 </front>
 <body>
  <sec id="s1">
   <title>1. Introduction</title>
   <p>Microaggressions are short, everyday verbal, behavioral, or environmental actions, whether deliberate or not, that convey negative, demeaning, or hostile racial undertones (<xref ref-type="bibr" rid="scirp.134674-24">
     Sue et al., 2007
     </xref>). Initially, it was believed that victims existed exclusively among racial minorities or people of colour; however, microaggressions can happen to anyone based on race, gender, sexual orientation, or any other protected characteristic (<xref ref-type="bibr" rid="scirp.134674-22">
     Sue, 2010
    </xref>).</p>
    <p>The subtle nature of microaggressions makes them insidiously dangerous, as offenders often commit them unintentionally and are unaware of the consequences for the victim. Microaggressions have been compared to carbon monoxide gas, which is potentially lethal but undetectable (<xref ref-type="bibr" rid="scirp.134674-23">
      Sue &amp; Sue, 2003
     </xref>). When offenders are not conscious of such microaggressive behaviour, they may be less inclined to correct it and may even justify it to themselves. Research, however, suggests that microaggressions greatly impact victims and are thus worthy of further study.</p>
    <p>Correlation and regression analyses reveal the link between racial microaggressions and mental health, concluding that the more microaggressions a person experiences, the poorer their mental health outcomes tend to be. Notably, microaggressions were significantly correlated with negative mental health outcomes, particularly depression, lack of positive mood, and lack of behavioral control (<xref ref-type="bibr" rid="scirp.134674-15">
     Nadal et al., 2014
     </xref>). Furthermore, while individual microaggressions might seem minor in the wider scheme of things, their cumulative effect is potentially substantial; over time they may lead to stress, low self-esteem, and loneliness. Microaggressions may also instill in victims a sense of invalidation, further impacting mental health outcomes. Lastly, within the context of clinical therapy, the occurrence of microaggressions can deteriorate the therapeutic alliance between client and therapist (<xref ref-type="bibr" rid="scirp.134674-24">
     Sue et al., 2007
    </xref>).</p>
    <p>Utilizing the Racial and Ethnic Microaggressions Scale (REMS) and the Mental Health Inventory (MHI), it was found that racial microaggressions were experienced significantly more by people with general mental health issues. Furthermore, factors such as geographic location, education, and age can also influence the likelihood of experiencing microaggressions (<xref ref-type="bibr" rid="scirp.134674-14">
     Nadal, 2018
     </xref>). Microaggressions can also manifest as algorithmic bias in language models; as a result, companies hold an ethical responsibility for the damage inflicted by their algorithms (<xref ref-type="bibr" rid="scirp.134674-11">
     McClure &amp; Wald, 2022
    </xref>).</p>
   <p>While a lot of work has been done in classifying toxic written text, such as comments on social media (<xref ref-type="bibr" rid="scirp.134674-1">
     Aken et al., 2018
    </xref>), little research has been completed on using machine learning to detect microaggressions. Therefore, this study aims to answer the following research questions:</p>
    <p>1) How can microaggressions in written communication be detected using machine learning?</p>
    <p>2) Can synthetic data address the scarcity of microaggression data?</p>
  </sec><sec id="s2">
   <title>2. Literature Review</title>
   <p>Natural Language Processing (NLP) has been used previously to overcome some of the challenges that the diversity and inclusion field faces such as promoting gender equity (<xref ref-type="bibr" rid="scirp.134674-20">
     Raichur, Lee, &amp; Moieni, 2023
    </xref>). Previous research has also found that there is a lack of work completed about detecting microaggressions and proposes an automated racial microaggression detection tool using Random Forest and IBk (KNN classifier of Weka Software) classification algorithms (<xref ref-type="bibr" rid="scirp.134674-2">
     Ali et al., 2020
     </xref>). The results seem consistent in detecting non-microaggression and promising in detecting racial microaggression. However, the tool only performs well when explicitly microaggressive text is the input; it does not perform well on general texts such as newspaper articles. This shortcoming arises from the lack of data and from class imbalance, as almost two-thirds of the dataset was labelled microaggressive.</p>
    <p>Furthermore, studies have researched the effectiveness of machine learning models in detecting microaggressions in various contexts such as workplaces, social media, and general conversations. These studies examine whether such models can detect microaggressions in scripted TV shows when trained on real-life conversations (<xref ref-type="bibr" rid="scirp.134674-16">
     Ngueajio et al., 2023
     </xref>). This research implements a Support Vector Machine (SVM) with N-grams for feature representation in one model and a Robustly Optimized BERT approach (RoBERTa) for context-based feature representation in another. The paper concludes that the contextual model clearly outperforms the N-gram model, and that models trained on real-life conversations were able to detect microaggressions in scripted TV settings at comparable rates.</p>
    <p>Moreover, unsupervised machine learning algorithms have also been applied to detecting microaggressions (<xref ref-type="bibr" rid="scirp.134674-18">
     Ògúnrèmí, Basile, &amp; Caselli, 2022
     </xref>). The research shows that inherent biases present in pre-trained word embeddings can be used to pinpoint subtle, offensive language patterns, particularly microaggressions. While unsupervised algorithms do not require labeled data to operate, they are unable to detect the implicit, othering phrasing commonly associated with microaggressions, such as “your kind” or “you people”. Additionally, polysemy, where a word can have multiple meanings, presents another obstacle to unsupervised machine learning. Hence, we propose a supervised approach to address these challenges.</p>
   <p>Nonetheless, this research highlights a key challenge in microaggression research: a lack of annotated data. The challenge lies in the lack of real-world data specifically annotated as microaggression (<xref ref-type="bibr" rid="scirp.134674-3">
     Breitfeller et al., 2019
     </xref>), as opposed to general aggression or hate speech. While the traditional solution involves crowdsourcing data through platforms such as MTurk, in the era of AI we propose annotation and synthetic data generation using GPT to overcome this data shortage.</p>
    <p>Recent research has shown that GPT achieves a 70% accuracy rate on content-moderation tweets, 81% on news articles, and 83% on US Congress tweets. In terms of intercoder agreement, ChatGPT performs significantly better than the alternatives: MTurk averages 56%, trained annotators 79%, ChatGPT (temperature = 1) 91%, and ChatGPT (temperature = 0.2) a remarkable 97%. ChatGPT’s temperature parameter controls the degree of randomness of the output (<xref ref-type="bibr" rid="scirp.134674-7">
     Gilardi, Alizadeh, &amp; Kubli, 2023
    </xref>). Thus, large generative language models could be considered in performing data annotation with significant reliability.</p>
    <p>Furthermore, NLP has seen a rise in end-to-end machine learning, where raw data is inserted as input and the encoder and model are trained at once, as opposed to manually engineering features. One way of achieving this is to implement Bidirectional Encoder Representations from Transformers (BERT) as an embedding layer on top of which another neural network architecture is set (<xref ref-type="bibr" rid="scirp.134674-10">
     Li et al., 2019
     </xref>). Word2Vec- or GloVe-based embedding layers generate a context-independent representation for each token, whereas a BERT embedding layer calculates token-level representations using information from the overall input sentence. Such an approach therefore removes the challenge of feature engineering otherwise required for the task. Another study (<xref ref-type="bibr" rid="scirp.134674-12">
     Miaschi &amp; Dell’Orletta, 2020
     </xref>) compares the probing scores of contextual BERT word embeddings and non-contextual Word2Vec embeddings: BERT captures features related to basic text and sentence structure, while Word2Vec is better at predicting word-formation and sentence-structure details.</p>
  </sec><sec id="s3">
   <title>3. Methodology</title>
   <p>
    <xref ref-type="fig" rid="fig1">
     Figure 1
     </xref> shows an outline of this paper’s methodology, which is elaborated in detail in the upcoming subsections.</p>
   <fig id="fig1" position="float">
    <label>Figure 1</label>
    <caption>
      <title>Figure 1. Overall architecture of data collection, data annotation, model evaluation, model selection, and microaggression detection.</title>
    </caption>
    <graphic mimetype="image" position="float" xlink:type="simple" xlink:href="https://html.scirp.org/file/1768077-rId12.jpeg?20240722030506" />
   </fig>
   <sec id="s3_1">
    <title>3.1. Data Collection</title>
    <p>To compile a diverse and comprehensive dataset for the research, we employed various data collection techniques from different sources as listed below. By employing diverse methods, we aimed to construct a well-rounded dataset that encompasses a wide range of perspectives and situations, enabling a comprehensive analysis for our research objectives.</p>
     <p>1) Synthetic Data Generation from ChatGPT (GPT-3.5 Turbo 16k):</p>
     <p>Due to ethical considerations and limitations arising from the updated policies of ChatGPT, the study resorted to generating synthetic data using the GPT-3.5 Turbo 16k model. This, however, gives rise to challenges pertaining to both ethical concerns and the quantity of generated data. To mitigate these issues, the generation process was conducted in batches. Ultimately, approximately 1700 data points were collected, some of which are shown in the Appendix.</p>
    <p>2) Reddit API Data Collection:</p>
    <p>To incorporate real-world perspectives and opinions on controversial topics, we utilized the Reddit API. We focused on a selection of topics known for their contentious nature, including social justice, Black Lives Matter, feminism, LGBTQI+ issues, Asian American, Native American, Latino America, disability, Muslim Lounge, Jewish, Climate Action, Technology, Science, Mental Health, Personal Finance, Parenting, Travel, Books, Fitness, and Art. From each of these categories, we extracted the first 10 hot topics. In total, 725 data points were collected. Subsequently, the data underwent a thorough cleaning and preprocessing process.</p>
    <p>3) Data Scraping from microaggressions.com:</p>
    <p>Employing web scraping techniques, we gathered data from microaggressions.com (<xref ref-type="bibr" rid="scirp.134674-13">
      Microaggressions Project, 2023
     </xref>). This source provided valuable insights into instances of microaggressions, which are often critical in understanding subtle forms of discrimination and bias.</p>
    <p>4) Email Conversations from Office Workplace:</p>
    <p>Additionally, the data was sourced from email conversations within office workplaces (<xref ref-type="bibr" rid="scirp.134674-4">
      Civil Research Data, 2018
     </xref>). This particular mode of communication provides a unique perspective on professional interactions and sheds light on various workplace dynamics.</p>
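     <p>As an illustration of the Reddit collection step (method 2 above), the sketch below keeps the first 10 “hot” posts per topic and flattens them into one dataset. The PRAW client shown in the comments is an assumption, as the paper does not name its Reddit API client, and the credential placeholders and subreddit names are purely illustrative.</p>

```python
def first_hot_posts(posts_by_topic, n=10):
    """Keep the first n 'hot' posts per topic, flattening into one dataset."""
    dataset = []
    for topic, posts in posts_by_topic.items():
        for post in posts[:n]:
            dataset.append({"topic": topic, "text": post})
    return dataset

# With PRAW (hypothetical credentials), posts_by_topic could be built
# roughly like this -- subreddit names here are illustrative only:
#
# import praw
# reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="...")
# posts_by_topic = {
#     name: [s.title for s in reddit.subreddit(name).hot(limit=10)]
#     for name in ["socialjustice", "Feminism", "science"]
# }

if __name__ == "__main__":
    demo = {"feminism": ["post a", "post b"], "science": ["post c"]}
    print(len(first_hot_posts(demo, n=10)))  # 3 rows
```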
   </sec>
   <sec id="s3_2">
    <title>3.2. Data Annotation Using GPT 3.5</title>
     <p>Following the initial data collection phase, deliberation arose regarding the choice between employing human annotators or utilizing Large Language Models (LLMs) for data annotation, as shown in <xref ref-type="fig" rid="fig2">
      Figure 2
     </xref>. Subsequent to an extensive review of the existing literature (<xref ref-type="bibr" rid="scirp.134674-7">
      Gilardi, Alizadeh, &amp; Kubli, 2023
      </xref>), it was determined that ChatGPT exhibited superior performance compared to human annotators in the task of text annotation. Consequently, the GPT-3.5 Turbo model with a capacity of 16k tokens was employed to annotate the data into the binary classes of “Yes Microaggression” and “No Microaggression”.</p>
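     <p>A minimal sketch of this annotation step, assuming the OpenAI chat completions API; the prompt wording, the low temperature setting, and the parse_label helper are illustrative assumptions, not the authors’ exact configuration.</p>

```python
# Hypothetical prompt template for binary microaggression annotation.
ANNOTATION_PROMPT = (
    "Decide whether the following text contains a microaggression. "
    "Answer with exactly 'Yes Microaggression' or 'No Microaggression'.\n\n"
    "Text: {text}"
)

def parse_label(reply):
    """Map a model reply onto the paper's binary labels."""
    reply = reply.strip().lower()
    if reply.startswith("yes"):
        return "Yes Microaggression"
    if reply.startswith("no"):
        return "No Microaggression"
    return None  # unparseable reply; re-query or discard

# A hypothetical call with the openai client (not run here):
#
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-3.5-turbo-16k",
#     temperature=0.2,  # low randomness, per Gilardi et al. (2023)
#     messages=[{"role": "user",
#                "content": ANNOTATION_PROMPT.format(text="You speak English so well!")}],
# )
# label = parse_label(resp.choices[0].message.content)
```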
    <fig id="fig2" position="float">
     <label>Figure 2</label>
     <caption>
      <title>
        Figure 2. Architecture of data annotation using Large Language Models (LLMs).</title>
     </caption>
     <graphic mimetype="image" position="float" xlink:type="simple" xlink:href="https://html.scirp.org/file/1768077-rId13.jpeg?20240722030506" />
    </fig>
   </sec>
   <sec id="s3_3">
    <title>3.3. Data Preprocessing</title>
    <p>In the data cleaning phase, a series of NLP techniques were applied to enhance the quality and consistency of the dataset. These procedures encompassed the following steps:</p>
    <p>1) Converting to Lowercase: All text entries were transformed into lowercase, ensuring uniformity and simplifying subsequent analyses.</p>
    <p>2) Whitespace Removal: Extraneous spaces within the text were systematically eliminated, further streamlining the dataset.</p>
    <p>3) Special Character Removal: Any non-alphanumeric characters were excised from the text, eliminating potential sources of noise or irregularities.</p>
    <p>4) URL Elimination: Uniform Resource Locators (URLs) were systematically removed to prevent them from influencing subsequent analyses.</p>
    <p>5) Emoji Removal: Emoticons and other non-textual symbols were purged from the dataset, focusing the analysis exclusively on textual content.</p>
    <p>6) Non-English Language Exclusion: Texts not in English were identified and subsequently excluded from the dataset, ensuring linguistic homogeneity.</p>
    <p>7) Lemmatization: Lemmatization was preferred over stemming due to its ability to produce linguistically valid root words. Unlike stemming, which often results in non-standard or even non-existent words, lemmatization preserves the semantic integrity of the text. This ensures that the derived root words maintain their meaningfulness within the context of the language.</p>
    <p>8) Retention of Stop Words: As per the findings of our comprehensive literature review in section 2, the decision was made to retain stop words. Removal of these common linguistic elements can lead to a loss of crucial context and nuance, potentially impeding subsequent analyses.</p>
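     <p>Steps 1 - 5 above can be sketched with regular expressions as below. The emoji ranges are a simplification, and steps 6 - 7 (language filtering and lemmatization) would rely on external tools such as langdetect and spaCy, named here as plausible choices rather than the paper’s actual stack.</p>

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002700-\U000027BF]")
SPECIAL_RE = re.compile(r"[^a-z0-9\s]")

def clean_text(text):
    """Apply cleaning steps 1-5; language filtering and lemmatization
    (steps 6-7) would use e.g. langdetect and spaCy and are omitted here."""
    text = text.lower()                 # 1) lowercase
    text = URL_RE.sub(" ", text)        # 4) drop URLs (before stripping punctuation)
    text = EMOJI_RE.sub(" ", text)      # 5) drop emojis
    text = SPECIAL_RE.sub(" ", text)    # 3) drop special characters
    text = re.sub(r"\s+", " ", text)    # 2) collapse whitespace
    return text.strip()
```

Note that URL removal must precede special-character removal, since stripping punctuation first would leave URL fragments behind.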
   </sec>
   <sec id="s3_4">
    <title>3.4. Model Selection</title>
     <p>This research experiments with four classification techniques, including both non-neural models and deep neural networks:</p>
    <p>1) Logistic regression (LR):</p>
     <p>LR is a traditional statistical tool that has been gaining attention in the realm of machine learning, particularly in the domain of text classification (<xref ref-type="bibr" rid="scirp.134674-26">
      Zhang et al., 2003
     </xref>; <xref ref-type="bibr" rid="scirp.134674-6">
      Genkin, Lewis, &amp; Madigan, 2007
     </xref>; <xref ref-type="bibr" rid="scirp.134674-9">
      Ifrim, Bakir, &amp; Weikum, 2008
     </xref>).</p>
    <p>2) Support Vector Machine (SVM):</p>
    <p>SVM is a linear model that is commonly applied in binary classification (<xref ref-type="bibr" rid="scirp.134674-21">
      Steinwart &amp; Christmann, 2008
      </xref>). It served as an essential baseline model, providing valuable insights into the task’s initial complexities and setting the foundation for subsequent investigations in this study.</p>
    <p>3) Long Short-Term Memory (LSTM) with BERT embedding layer:</p>
    <p>LSTM is a deep neural network that is applicable to text classification and prediction (<xref ref-type="bibr" rid="scirp.134674-17">
      Nowak, Taspinar, &amp; Scherer, 2017
      </xref>), while BERT is a language model capable of capturing nuanced contextual information (<xref ref-type="bibr" rid="scirp.134674-5">
      Devlin et al., 2019
     </xref>). The combination of both is a sophisticated approach in NLP to understand the subtle nuances of language and context (<xref ref-type="bibr" rid="scirp.134674-19">
      Pandey &amp; Singh, 2023
     </xref>), which is a critical requirement for accurately identifying microaggressions in text.</p>
    <p>4) Gated Recurrent Units (GRU):</p>
     <p>GRU is another deep neural network capable of learning long sequences of text. It is gaining popularity due to its simplicity and fewer parameters compared to LSTM models (<xref ref-type="bibr" rid="scirp.134674-27">
      Zulqarnain et al., 2020
     </xref>).</p>
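     <p>To make the LSTM mechanics above concrete, the following numpy sketch runs a single LSTM pass over token embeddings and scores the final hidden state. The random array standing in for BERT output, the hidden size, and the untrained random weights are all illustrative assumptions; in practice the 768-dimensional embeddings would come from a pre-trained BERT encoder.</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_classify(embeddings, hidden=32, seed=0):
    """Run one LSTM pass over a (seq_len, emb_dim) array of token
    embeddings and return a microaggression probability from the final
    hidden state. Weights are random stand-ins, not trained parameters."""
    rng = np.random.default_rng(seed)
    d = embeddings.shape[1]
    W = rng.normal(0, 0.1, (4, hidden, d + hidden))  # gate weights
    b = np.zeros((4, hidden))
    w_out = rng.normal(0, 0.1, hidden)

    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in embeddings:
        z = np.concatenate([x_t, h])
        i = sigmoid(W[0] @ z + b[0])   # input gate
        f = sigmoid(W[1] @ z + b[1])   # forget gate
        o = sigmoid(W[2] @ z + b[2])   # output gate
        g = np.tanh(W[3] @ z + b[3])   # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return sigmoid(w_out @ h)          # probability of microaggression

# Stand-in for BERT output: 12 tokens x 768 dimensions.
tokens = np.random.default_rng(1).normal(size=(12, 768))
score = lstm_classify(tokens)
```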
   </sec>
   <sec id="s3_5">
    <title>3.5. Evaluation Metrics</title>
     <p>This study incorporates the assessment of accuracy and loss, which is relevant in the context of deep learning models. The degree of fitting of each model was assessed in terms of training and testing accuracy. Overfitting exists when the model performs well on training data but badly on testing data. Underfitting, on the other hand, exists when the model performs badly on both training and testing data. A model is considered stable when it generalises well to both seen (training) and unseen (testing) data, with neither overfitting nor underfitting (<xref ref-type="bibr" rid="scirp.134674-25">
      Ying, 2019
     </xref>).</p>
    <p>In addition, criteria to evaluate the models involve three key metrics, which were assessed on both microaggressive (1) and non-microaggressive (0) text:</p>
     <p>1) Precision: The proportion of observations predicted as a class that truly belong to that class.</p>
     <p>2) Recall: The proportion of actual observations of a class that are correctly classified.</p>
    <p>3) F1-score: The harmonic balance of precision and recall (<xref ref-type="bibr" rid="scirp.134674-8">
      Hossin &amp; Sulaiman, 2015
     </xref>).</p>
     <p>When comparing models, the F1 score suggests the better model and standardizes comparison with relevant previous works. We expect higher recall from the model in the microaggressive class and high precision in the non-microaggressive class. This preference arises from the importance of correctly detecting microaggressions and not mislabeling microaggressive texts, as failing to do so, leaving numerous microaggressive texts undetected, could have serious consequences.</p>
    <p>Together, these metrics provide a multifaceted evaluation framework, enabling a thorough understanding of the model’s performance over the classification task and model optimization.</p>
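     <p>The three metrics can be computed per class from raw counts, as in the self-contained sketch below mirroring the per-class evaluation (1 = microaggressive, 0 = non-microaggressive); the toy labels are illustrative only.</p>

```python
def class_metrics(y_true, y_pred, cls):
    """Precision, recall, and F1 for one class, from scratch."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t != cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != cls and t == cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 6 labels, one false negative and one false positive on class 1.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(class_metrics(y_true, y_pred, cls=1))  # each value is 2/3
```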
   </sec>
  </sec><sec id="s4">
   <title>4. Results</title>
   <p>All four models have roughly the same test accuracies, as described in <xref ref-type="table" rid="table1">
     Table 1
     </xref>; among them, GRU under-performed with the lowest test accuracy of 72%. Meanwhile, LSTM has a training accuracy of 73% and a test accuracy of 74%, showing no overfitting and remaining consistent on both seen and unseen data, which makes it the most stable model.</p>
   <table-wrap id="table1">
    <label>
     <xref ref-type="table" rid="table1">
      Table 1
     </xref></label>
    <caption>
     <title>
       Table 1. Training and test accuracy values of different models.</title>
    </caption>
    <table class="MsoTableGrid custom-table" border="0" cellspacing="0" cellpadding="0"> 
     <tr> 
      <td class="custom-bottom-td acenter" width="27.18%"><p style="text-align:center">Model</p></td> 
      <td class="custom-bottom-td acenter" width="27.18%"><p style="text-align:center">Training accuracy</p></td> 
      <td class="custom-bottom-td acenter" width="27.18%"><p style="text-align:center">Test accuracy</p></td> 
     </tr> 
     <tr> 
      <td class="custom-top-td acenter" width="27.18%"><p style="text-align:center">LR</p></td> 
      <td class="custom-top-td acenter" width="27.18%"><p style="text-align:center">0.8313</p></td> 
      <td class="custom-top-td acenter" width="27.18%"><p style="text-align:center">0.7505</p></td> 
     </tr> 
     <tr> 
      <td class="acenter" width="27.18%"><p style="text-align:center">SVM</p></td> 
      <td class="acenter" width="27.18%"><p style="text-align:center">0.9852</p></td> 
      <td class="acenter" width="27.18%"><p style="text-align:center">0.7474</p></td> 
     </tr> 
     <tr> 
      <td class="acenter" width="27.18%"><p style="text-align:center">LSTM</p></td> 
      <td class="acenter" width="27.18%"><p style="text-align:center">0.7320</p></td> 
      <td class="acenter" width="27.18%"><p style="text-align:center">0.7408</p></td> 
     </tr> 
     <tr> 
      <td class="acenter" width="27.18%"><p style="text-align:center">GRU</p></td> 
      <td class="acenter" width="27.18%"><p style="text-align:center">0.7929</p></td> 
      <td class="acenter" width="27.18%"><p style="text-align:center">0.7197</p></td> 
     </tr> 
    </table>
   </table-wrap>
   <p>Furthermore, the overall performance of LSTM over multiple epochs is presented in <xref ref-type="fig" rid="fig3">
     Figure 3
     </xref>. The figure on the left shows the expected behaviour, wherein both training and validation accuracy exhibit steady improvement before eventually plateauing. Meanwhile, the figure on the right shows loss convergence, where training and validation loss decrease steadily over epochs before reaching a point at which they are not reduced any further. Therefore, we are assured that the model learns as much information as possible from the available data without overfitting.</p>
   <fig id="fig3" position="float">
    <label>Figure 3</label>
    <caption>
     <title>
       Figure 3. Trend of training and validation accuracy (left) and training and validation loss (right) over multiple epochs.</title>
    </caption>
    <graphic mimetype="image" position="float" xlink:type="simple" xlink:href="https://html.scirp.org/file/1768077-rId14.jpeg?20240722030507" />
   </fig>
    <p>Comparing F1 scores, all models perform better, with higher F1 scores, when classifying non-microaggression. Summarized evaluation results can be found in <xref ref-type="table" rid="table2">
     Table 2
    </xref>. Focusing on the LSTM model, when it predicts the text as non-microaggressive, the precision value of “No Microaggression” indicates a 72% chance that the text is genuinely non-microaggressive. On the other hand, the recall value of the “Microaggression” class indicates the model could correctly detect 59% of the microaggressions. Thus, the LSTM model is more reliable in predicting text with no microaggression.</p>
   <p>
     Nonetheless, the recall value for detecting microaggressions in this study is higher than that of previous research (<xref ref-type="bibr" rid="scirp.134674-2">
      Ali et al., 2020
     </xref>). A key factor contributing to this improved performance could be the use of a less imbalanced dataset, achieved through the generation of synthetic data and data annotation using the language model GPT.</p>
   <table-wrap id="table2">
    <label>
     <xref ref-type="table" rid="table2">
      Table 2
     </xref></label>
    <caption>
     <title>
       Table 2. Evaluation results of different models across three metrics.</title>
    </caption>
    <table class="MsoTableGrid custom-table" border="0" cellspacing="0" cellpadding="0"> 
     <tr> 
      <td class="custom-bottom-td acenter" width="12.30%"><p style="text-align:center"></p></td> 
      <td class="custom-bottom-td acenter" width="44.70%" colspan="3"><p style="text-align:center">No microaggression</p></td> 
      <td class="acenter" width="43.01%" colspan="3"><p style="text-align:center">Yes Microaggression</p></td> 
     </tr> 
     <tr> 
      <td class="custom-bottom-td custom-top-td acenter" width="12.30%"><p style="text-align:center">Model</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="16.87%"><p style="text-align:center">Precision</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="14.73%"><p style="text-align:center">Recall</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="13.10%"><p style="text-align:center">F1 score</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="16.15%"><p style="text-align:center">Precision</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="13.76%"><p style="text-align:center">Recall</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="13.10%"><p style="text-align:center">F1 score</p></td> 
     </tr> 
     <tr> 
      <td class="custom-top-td acenter" width="12.30%"><p style="text-align:center">LR</p></td> 
      <td class="custom-top-td acenter" width="16.87%"><p style="text-align:center">0.72</p></td> 
      <td class="custom-top-td acenter" width="14.73%"><p style="text-align:center">0.90</p></td> 
      <td class="custom-top-td acenter" width="13.10%"><p style="text-align:center">0.80</p></td> 
      <td class="custom-top-td acenter" width="16.15%"><p style="text-align:center">0.82</p></td> 
      <td class="custom-top-td acenter" width="13.76%"><p style="text-align:center">0.57</p></td> 
      <td class="custom-top-td acenter" width="13.10%"><p style="text-align:center">0.67</p></td> 
     </tr> 
     <tr> 
      <td class="acenter" width="12.30%"><p style="text-align:center">SVM</p></td> 
      <td class="acenter" width="16.87%"><p style="text-align:center">0.76</p></td> 
      <td class="acenter" width="14.73%"><p style="text-align:center">0.78</p></td> 
      <td class="acenter" width="13.10%"><p style="text-align:center">0.77</p></td> 
      <td class="acenter" width="16.15%"><p style="text-align:center">0.73</p></td> 
      <td class="acenter" width="13.76%"><p style="text-align:center">0.70</p></td> 
      <td class="acenter" width="13.10%"><p style="text-align:center">0.71</p></td> 
     </tr> 
     <tr> 
      <td class="acenter" width="12.30%"><p style="text-align:center">LSTM</p></td> 
      <td class="acenter" width="16.87%"><p style="text-align:center">0.72</p></td> 
      <td class="acenter" width="14.73%"><p style="text-align:center">0.84</p></td> 
      <td class="acenter" width="13.10%"><p style="text-align:center">0.77</p></td> 
      <td class="acenter" width="16.15%"><p style="text-align:center">0.74</p></td> 
      <td class="acenter" width="13.76%"><p style="text-align:center">0.59</p></td> 
      <td class="acenter" width="13.10%"><p style="text-align:center">0.66</p></td> 
     </tr> 
     <tr> 
      <td class="acenter" width="12.30%"><p style="text-align:center">GRU</p></td> 
      <td class="acenter" width="16.87%"><p style="text-align:center">0.72</p></td> 
      <td class="acenter" width="14.73%"><p style="text-align:center">0.84</p></td> 
      <td class="acenter" width="13.10%"><p style="text-align:center">0.78</p></td> 
      <td class="acenter" width="16.15%"><p style="text-align:center">0.75</p></td> 
      <td class="acenter" width="13.76%"><p style="text-align:center">0.59</p></td> 
      <td class="acenter" width="13.10%"><p style="text-align:center">0.66</p></td> 
     </tr> 
    </table>
   </table-wrap>
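The F1 scores reported in the table above are the harmonic mean of precision and recall. As an illustration (not part of the original study's code), the following minimal Python sketch recomputes F1 from the precision/recall pairs of the first dataset; the `f1_score` helper and the `results` dictionary are our own constructions, with values copied from the table:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision/recall pairs for the first dataset, taken from the table above.
results = {
    "LR":   (0.72, 0.90),
    "SVM":  (0.76, 0.78),
    "LSTM": (0.72, 0.84),
    "GRU":  (0.72, 0.84),
}

for model, (p, r) in results.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}")
```

Note that recomputed values may differ from the table in the last digit, since the tabulated precision and recall are themselves rounded to two decimals.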
  </sec><sec id="s5">
   <title>5. Conclusion</title>
    <p>This study hypothesized that machine learning and artificial intelligence have the potential to overcome barriers to diversity and inclusion by detecting microaggressions in text. It addresses the scarcity of annotated microaggression data by using LLM-based annotation, which improved performance through better data balance. The results show that the LSTM model exhibited the best performance in microaggression detection.</p>
    <p>While detection is one key aspect of the microaggression problem, future work could be enhanced by using language models to improve text, for example by paraphrasing microaggressive passages into non-microaggressive alternatives, with the existing detection models serving as a feedback loop for such a generative model. We therefore leave the following research question for future work: “Can Artificial Intelligence (AI) reduce microaggression in written communication?”</p>
  </sec><sec id="s6">
   <title>Acknowledgements</title>
    <p>This research was supported by Diversity Atlas, whose provision of data has been instrumental in shaping the findings of this study. We would also like to express our special thanks to RMIT University for supporting the internship program that allowed the authors to conduct this research.</p>
  </sec><sec id="s7">
   <title>Appendix</title>
    <p>Some examples of the synthetic data, generated using the GPT language model, can be found below.</p>
  </sec>
 </body><back>
  <ref-list>
   <title>References</title>
   <ref id="scirp.134674-ref1">
    <label>1</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Aken, B. V., Risch, J., Krestel, R., &amp; Löser, A. (2018). Challenges for Toxic Comment Classification: An In-Depth Error Analysis. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2) (pp. 33-42). Association for Computational Linguistics.
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref2">
    <label>2</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Ali, O., Scheidt, N., Gegov, A., Haig, E., Adda, M., &amp; Aziz, B. (2020). Automated Detection of Racial Microaggressions Using Machine Learning. In 2020 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 2477-2484). IEEE. https://doi.org/10.1109/SSCI47803.2020.9308569
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref3">
    <label>3</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Breitfeller, L., Ahn, E., Jurgens, D., &amp; Tsvetkov, Y. (2019). Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 1664-1674). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1176
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref4">
    <label>4</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Civil Research Data (2018). Data.json. https://figshare.com/articles/dataset/data_json/7376747
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref5">
    <label>5</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Devlin, J., Chang, M. W., Lee, K., &amp; Toutanova, K. (2019). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 4171-4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref6">
    <label>6</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Genkin, A., Lewis, D. D., &amp; Madigan, D. (2007). Large-Scale Bayesian Logistic Regression for Text Categorization. Technometrics, 49, 291-304. https://doi.org/10.1198/004017007000000245
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref7">
    <label>7</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Gilardi, F., Alizadeh, M., &amp; Kubli, M. (2023). ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks. Proceedings of the National Academy of Sciences of the United States of America, 120, e2305016120. https://doi.org/10.1073/pnas.2305016120
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref8">
    <label>8</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Hossin, M., &amp; Sulaiman, M. N. (2015). A Review on Evaluation Metrics for Data Classification Evaluations. International Journal of Data Mining &amp; Knowledge Management Process, 5. https://doi.org/10.5121/ijdkp.2015.5201
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref9">
    <label>9</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Ifrim, G., Bakir, G., &amp; Weikum, G. (2008). Fast Logistic Regression for Text Categorization with Variable-Length N-Grams. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 354-362). Association for Computing Machinery. https://doi.org/10.1145/1401890.1401936
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref10">
    <label>10</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Li, X., Bing, L., Zhang, W., &amp; Lam, W. (2019). Exploiting BERT for End-to-End Aspect-Based Sentiment Analysis. In Proceedings of the 5th Workshop on Noisy User-Generated Text (W-NUT 2019) (pp. 34-41). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-5505
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref11">
    <label>11</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      McClure, E., &amp; Wald, B. (2022). Algorithmic Microaggressions. Feminist Philosophy Quarterly, 8, Article 5. https://doi.org/10.5206/fpq/2022.3/4.14276
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref12">
    <label>12</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Miaschi, A., &amp; Dell’Orletta, F. (2020). Contextual and Non-Contextual Word Embeddings: An In-Depth Linguistic Investigation. In Proceedings of the 5th Workshop on Representation Learning for NLP (pp. 110-119). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.repl4nlp-1.15
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref13">
    <label>13</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Microaggressions Project (2023). Microaggressions in Everyday Life. https://www.microaggressions.com/
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref14">
    <label>14</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Nadal, K. L. (2018). Microaggressions and Traumatic Stress: Theory, Research, and Clinical Treatment. American Psychological Association. https://doi.org/10.1037/0000073-000
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref15">
    <label>15</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Nadal, K. L., Griffin, K. E., Wong, Y., Hamit, S., &amp; Rasmus, M. (2014). The Impact of Racial Microaggressions on Mental Health: Counseling Implications for Clients of Color. Journal of Counseling &amp; Development, 92, 57-66. https://doi.org/10.1002/j.1556-6676.2014.00130.x
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref16">
    <label>16</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Ngueajio, M. K., Hernandez, I., Cornett, K., Washington, G., &amp; Parsons, D. (2023). Towards Identification of Microaggressions in Real-Life and Scripted Conversations, Using Context-Aware Machine Learning Techniques. https://openreview.net/forum?id=z7FfWq2iaW4
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref17">
    <label>17</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Nowak, J., Taspinar, A., &amp; Scherer, R. (2017). LSTM Recurrent Neural Networks for Short Text and Sentiment Classification. In L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L. Zadeh, &amp; J. Zurada (Eds.), Artificial Intelligence and Soft Computing (pp. 553-562). Springer International Publishing. https://doi.org/10.1007/978-3-319-59060-8_50
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref18">
    <label>18</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Ògúnrèmí, T., Basile, V., &amp; Caselli, T. (2022). Leveraging Bias in Pre-Trained Word Embeddings for Unsupervised Microaggression Detection. Italian Journal of Computational Linguistics, 8. https://doi.org/10.4000/ijcol.1066
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref19">
    <label>19</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Pandey, R., &amp; Singh, J. P. (2023). BERT-LSTM Model for Sarcasm Detection in Code-Mixed Social Media Post. Journal of Intelligent Information Systems, 60, 235-254. https://doi.org/10.1007/s10844-022-00755-z
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref20">
    <label>20</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Raichur, A., Lee, N., &amp; Moieni, R. (2023). A Natural Language Processing Approach to Promote Gender Equality: Analysing the Progress of Gender-Inclusive Language on the Victorian Government Website. Open Journal of Social Sciences, 11, 513-529. https://doi.org/10.4236/jss.2023.119033
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref21">
    <label>21</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Steinwart, I., &amp; Christmann, A. (2008). Support Vector Machines. Springer Science &amp; Business Media.
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref22">
    <label>22</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Sue, D. W. (2010). Microaggressions in Everyday Life: Race, Gender, and Sexual Orientation. John Wiley &amp; Sons.
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref23">
    <label>23</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Sue, D. W., &amp; Sue, D. (2003). Counseling the Culturally Diverse: Theory and Practice. John Wiley &amp; Sons.
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref24">
    <label>24</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Sue, D. W., Capodilupo, C. M., Torino, G. C., Bucceri, J. M., Holder, A., Nadal, K. L., &amp; Esquilin, M. (2007). Racial Microaggressions in Everyday Life: Implications for Clinical Practice. American Psychologist, 62, 271-286. https://doi.org/10.1037/0003-066X.62.4.271
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref25">
    <label>25</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Ying, X. (2019). An Overview of Overfitting and Its Solutions. Journal of Physics: Conference Series, 1168, Article ID: 022022. https://doi.org/10.1088/1742-6596/1168/2/022022
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref26">
    <label>26</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Zhang, J., Jin, R., Yang, Y., &amp; Hauptmann, A. (2003). Modified Logistic Regression: An Approximation to SVM and Its Applications in Large-Scale Text Categorization. In Machine Learning, Proceedings of the Twentieth International Conference (ICML 2003) (pp. 888-895). AAAI Press.
    </mixed-citation>
   </ref>
   <ref id="scirp.134674-ref27">
    <label>27</label>
    <mixed-citation publication-type="other" xlink:type="simple">
      Zulqarnain, M., Ghazali, R., Hassim, Y. M., &amp; Rehan, M. (2020). Text Classification Based on Gated Recurrent Unit Combines with Support Vector Machine. International Journal of Electrical and Computer Engineering, 10, 3734-3742. https://doi.org/10.11591/ijece.v10i4.pp3734-3742
    </mixed-citation>
   </ref>
  </ref-list>
 </back>
</article>