Small Scale Predictive Analysis of Gender Balance in Australia Using Grey Models: Integrating Labour Force and Migration Data

Abstract

Gender balance is a key part of the Australian identity, essential for creating diverse workplaces and fostering social cohesion throughout Australia. This study aims to provide a comprehensive understanding of gender balance in Australia by examining the labour force and overseas migration from the perspective of equality, diversity, and inclusion. The research proposes a Gender Balance (GB) index based on Becker’s coefficient that combines labour force and migration data. A small dataset comprising a total of 16 data points for each Australian state, covering 2004 to 2022, was used to forecast the GB index for the next five years. ARIMA, Grey Model GM (1, 1) and GM (2, 1) were used as forecasting models. The research found GM (1, 1) to be the optimal model for forecasting gender balance index trends. The findings can inform policy decisions and interventions to promote greater gender equality and equity nationwide.

Share and Cite:

Rios, E. , Hou, S. , Lee, N. and Moieni, R. (2024) Small Scale Predictive Analysis of Gender Balance in Australia Using Grey Models: Integrating Labour Force and Migration Data. Open Journal of Social Sciences, 12, 448-469. doi: 10.4236/jss.2024.127033.

1. Introduction

Promoting gender equality in the workplace is a key driver of organisational success and societal progress. Understanding the dynamics of gender balance within the workforce is crucial for companies to effectively manage diversity and foster inclusive cultures. Creating a diverse working environment still remains a significant challenge for companies in Australia, especially in the context of global STEM fields where women represent just 29.2% of the workforce despite constituting 49.3% of non-STEM employment (World Economic Forum, 2023). In response to this imperative, this academic paper presents a comprehensive study aimed at predicting gender trends in the Australian labour market and overseas migration.

This paper addresses the challenge of predicting gender balance in the Australian workplace by employing forecasting models based on small datasets. This study aims to provide actionable insights for companies seeking to enhance their understanding of workforce dynamics and promote gender equality initiatives.

The methodology employed in this study draws on two primary datasets: census data on the labour force (Australian Bureau of Statistics, n.d.-a) and overseas migration (Australian Bureau of Statistics, n.d.-b), categorised by sex and state. The forecasting models, including ARIMA (AutoRegressive Integrated Moving Average), Grey Models (GM), specifically GM (1, 1), and GM (2, 1), have been developed to predict the trends of gender balance at both the state and national levels.

Furthermore, Becker’s formula of discrimination (D) (Becker, 1971) has been adapted to fit within the forecasting framework, assisting in evaluating the quality of equal opportunity within the Australian employment market. Building on the diversity index proposed by Moieni & Mousaferiadis (2022), this paper introduces a novel approach to integrating Becker’s discrimination indices into the forecasting models. This involves aggregating discrimination indices into a single coefficient, which serves as a key parameter for the integrated forecasting model. This approach enables a comprehensive assessment of gender parity trends in Australia.

By elucidating the relationship between overseas migration patterns, employment trends, and gender balance, this research contributes to the ongoing discourse on gender equality and diversity in the workplace. The findings and insights derived from this study have significant implications for policy formulation, organisational practices, and societal initiatives aimed at fostering inclusive and equitable employment environments in Australia.

Subsequent sections of this paper explore the theoretical foundations of forecasting models, discuss the methodology employed in data analysis, present empirical findings, and offer recommendations for enhancing gender parity and equality in the Australian labour market. Through rigorous academic inquiry, this paper seeks to advance the understanding of gender dynamics in employment and inform evidence-based interventions for promoting inclusivity and fairness in the workplace.

Problem Statement

The relationship between the Australian labour force and overseas migration is vital for equality, diversity, and inclusion, specifically for creating diverse workplaces and fostering social cohesion across Australia. Nonetheless, the gender balance in Australia, considering these factors, remains insufficiently understood and studied. Though confined to the Australian context, this research emphasizes the necessity of developing a metric to measure gender balance based on labour force participation and migration patterns across Australia. This metric is essential for measuring gender balance and promoting equal opportunities, creating diverse workplaces, and fostering social cohesion in Australia’s multicultural landscape.

2. Literature Review

A) Importance of Gender Balance

Gender balance is a crucial issue that has far-reaching implications for societies across the globe. Achieving gender equality is not only a fundamental human right but also a prerequisite for sustainable development, economic growth, and social cohesion. Moreover, gender balance is essential for promoting equal opportunities, empowering both genders, and fostering inclusive societies. Numerous studies have demonstrated the positive impacts of gender equality on various aspects of societal well-being. For instance, gender balance in the workforce leads to increased productivity, innovation, and economic growth (Gray et al., 2022). Women’s participation in the labor market contributes to a more diverse and skilled talent pool, driving economic prosperity. Gender equality in education and healthcare has been shown to improve outcomes for both women and children, leading to better overall societal well-being (Gakidou et al., 2010). Increased representation of women in leadership and decision-making roles fosters more inclusive and diverse perspectives, contributing to better governance and policy-making (Torchia et al., 2011). Nonetheless, gender imbalances persist worldwide. Despite progress towards gender equality, significant disparities remain in the Australian labor market. The Workplace Gender Equality Agency’s 2022 report found that women hold only 19.4% of CEO positions and 32.5% of key management roles in non-public sector organizations. The gender pay gap also remains, with women earning on average 13.8% less than men (Workplace Gender Equality Agency, 2022). These imbalances are often attributed to societal attitudes, lack of affordable childcare, and inadequate policies to support work-life balance. Women continue to bear a disproportionate share of unpaid domestic and caregiving responsibilities, impacting their workforce participation and career progression (Wilkins, 2017).
Additionally, migration has been identified as a potential solution to address labor shortages in Australia, but migrant women and men often face additional barriers such as language issues, lack of local experience, and non-recognition of overseas qualifications, leading to underemployment or employment in lower-skilled jobs (Australian Government Department of Home Affairs, 2023). Researching gender balance in Australia’s labor force and migration is crucial for identifying systemic barriers and societal attitudes contributing to gender imbalances, informing policies that promote equal opportunities (Clemens, 2023). Ensuring the full participation of both genders, including migrants, maximizes the available talent pool and addresses skill shortages, leading to improved economic outcomes and reduced poverty (Productivity Commission, 2022). Providing insights into the challenges faced by migrants helps inform targeted support services, facilitating their labor market integration (Sharma et al., 2024). Therefore, this study can help to identify gender balance opportunities across Australia and promote a more inclusive and equitable labour force, benefiting both genders and the country’s economy.

B) The Rise of Small Data

Given the constraints imposed by the limited availability of online data sources, this research project will concentrate on applying forecasting models to small datasets. This focus aligns with the growing recognition in the era of big data of the significant value and importance of “small data”—relatively small samples of qualitative or quantitative data that can provide rich insights (Lindstrom, 2016). While big data focuses on finding patterns across massive datasets, small data allows for a deeper dive into specifics and context.

Small data has several key advantages. It is typically easier, less expensive, and more feasible to collect than big data, especially for resource-constrained organisations (Kitchin & Lauriault, 2014; Hekler et al., 2019). Small datasets also lend themselves better to human analysis and sense-making rather than being overly reliant on algorithms and machine learning models that can miss nuances (Mayer-Schonberger & Cukier, 2013). Additionally, small datasets are present in diverse domains. In business, companies use small data from customer surveys, social media comments, and other sources to gain targeted insights into consumer needs, brand perception, and product issues (Bose & Mahapatra, 2001). In healthcare, analysing small datasets such as medical records and patient-reported outcomes can reveal treatment effects, disease progression patterns, and other insights complementary to large clinical trials (Baro et al., 2015). In the social sciences, qualitative small data such as interviews, ethnographies, and focus groups remain vital for understanding human behaviours, motivations, and lived experiences (Kitchin, 2014). In government, small datasets on constituents, localities, and specific issues can inform policymaking in a contextual, actionable manner (Verhulst et al., 2019).

Therefore, the size of the dataset available can significantly impact the choice and performance of forecasting models. While large datasets enable more complex models, small datasets require simpler but robust approaches. For example:

1) Large Datasets: Various modelling techniques cater to large datasets, including machine learning models and deep learning methods. Machine learning techniques like neural networks, random forests, and gradient boosting effectively capture complex non-linear patterns for forecasting, offering flexibility in modelling high-dimensional data and intricate interactions between predictors (Cerqueira et al., 2022). Deep learning, exemplified by recurrent neural networks (RNNs) such as LSTMs, RNN-based probabilistic models like DeepAR, and transformer-based models, demonstrates state-of-the-art performance in large-scale time series forecasting tasks by automatically learning long-range temporal dependencies (Brownlee, 2018). These techniques excel at capturing complex patterns, handling high-dimensional data, automating modelling, reducing the need for manual feature engineering, and improving accuracy through reduced overfitting on massive data (Cerqueira et al., 2022). However, despite these powerful capabilities, forecasting models tailored for large datasets may encounter challenges when applied to smaller datasets. These models, accustomed to abundant data inputs, may struggle with overfitting on limited samples, where the model’s complexity surpasses the available information (Cerqueira, 2022).

2) Small Datasets: When working with limited data, choosing an appropriate forecasting model is crucial to obtain accurate and reliable predictions. Two families of techniques are commonly used for small sample sizes: classical statistical models and Bayesian methods. Classical statistical models such as ARIMA, linear regression, and the Grey Model are commonly favoured. Despite lower flexibility, these models offer greater transparency, ease of specification, and reduced susceptibility to overfitting. Alternatively, Bayesian structural time series models like Prophet integrate domain knowledge through prior distributions, thereby enhancing forecast accuracy and providing more reliable uncertainty estimates for small datasets (Taylor & Letham, 2018).

C) Forecasting Models for Small Datasets

In alignment with the study of diversity prediction using machine learning on small datasets (Moieni et al., 2023), this study opts for forecasting models adept at handling limited data. GM(1, 1) and GM(2, 1), alongside ARIMA, are selected for gender balance research. The performance of each model is evaluated through metrics such as the Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), and Mean Absolute Error (MAE), ensuring a comprehensive analysis of model performance in predicting gender balance within the Australian labour market.

The ARIMA model is a classical statistical technique that combines three components: Autoregressive (AR), Integrated (I), and Moving Average (MA). The general form of the ARIMA model is represented as ARIMA (p, d, q), where p is the order of the AR component, d is the degree of differencing required to make the time series stationary, and q is the order of the MA component (Abu-Bakar & Rosbi, 2017). The ARIMA model is well suited to small datasets due to its simplicity, transparency, and reduced susceptibility to overfitting. Despite its lower flexibility compared to more complex models, ARIMA offers reliable forecasts when working with limited data (Pan, Zhang et al., 2016). On the other hand, GM(1, 1) is a time series forecasting technique that is particularly useful for small datasets and can effectively handle incomplete or uncertain information (Caro et al., 2020). The GM(1, 1) model is based on the accumulation of the original time series data and uses a first-order differential equation to describe the behavior of the system (Wang et al., 2018). This model is known for its simplicity, as it requires only a few data points to build the model. However, it may struggle to capture complex patterns or non-linear relationships in the data. Lastly, GM(2, 1) is an extension of the GM(1, 1) model, designed to handle data with non-linear characteristics or fluctuations. The GM(2, 1) model uses a second-order differential equation to describe the system’s behavior, allowing for more flexibility in capturing non-linear patterns. It is more complex than the GM(1, 1) model, involves additional terms to capture non-linear behavior (Shao et al., 2012), and requires more data points than GM(1, 1) to estimate the additional parameters.
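The GM(1, 1) procedure described above (accumulate the series, fit a first-order equation, then difference the fitted curve) can be sketched as follows. This is a minimal illustration of the textbook formulation under stated assumptions, not the implementation used in this study: the function name `gm11_forecast` and the sample series are hypothetical, and the two parameters are estimated by ordinary least squares written out by hand so that the sketch needs only the standard library.

```python
import math


def gm11_forecast(x0, steps):
    """Fit a GM(1, 1) model to a positive series and extend it by `steps` points.

    Returns (a, b, values), where `values` holds the fitted values for the
    original points followed by the forecasts.
    """
    x0 = [float(v) for v in x0]
    n = len(x0)
    if n < 4:
        raise ValueError("this sketch expects at least four observations")

    # Accumulated generating operation (1-AGO): running sum of the series.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)

    # Background values: mean of consecutive accumulated points.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]

    # Least squares for the grey equation x0(k) = -a * z(k) + b.
    u = [-zk for zk in z]
    m = len(y)
    su, sy = sum(u), sum(y)
    suu = sum(uk * uk for uk in u)
    suy = sum(uk * yk for uk, yk in zip(u, y))
    a = (m * suy - su * sy) / (m * suu - su * su)
    b = (sy - a * su) / m
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is too close to zero")

    # Time response of the whitening equation, then inverse accumulation.
    def x1_hat(k):
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    values = [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, n + steps)]
    return a, b, values


# Illustrative input only (a smooth, made-up growth series, not GB index data).
series = [2.0 * 1.1 ** k for k in range(8)]
a, b, values = gm11_forecast(series, steps=3)
print(values[-3:])  # the three forecast points
```

Because the fitted curve is a single exponential, the sketch tracks smooth trends closely but cannot follow abrupt changes, which is consistent with the limitation noted above.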

D) Challenges of Forecasting with Small Datasets

While large datasets have enabled powerful forecasting models like deep learning, many real-world applications involve working with limited data samples. To evaluate the performance and suitability of various forecasting techniques on such limited data, this study employs a small dataset and applies multiple forecasting models for comparison. Forecasting with small datasets poses several key challenges. This literature review examines the key problems associated with forecasting using small datasets and discusses potential strategies to address these challenges.

One prominent issue with small datasets is the increased risk of model overfitting. With fewer data points available for training, forecasting models may capture noise or random fluctuations in the data, leading to overly complex models that perform poorly on unseen data. Overfitting not only compromises the accuracy of forecasts but also undermines the model’s generalisability to new scenarios or data distributions (Hastie et al., 2009). Moreover, small datasets often lack the diversity and representativeness necessary for capturing the full range of variability in the underlying data generating process. This limitation can result in biased or unreliable forecasts, as the model may fail to capture important patterns or relationships present in the broader population (Gelman et al., 2013). Another challenge stems from the limited feature space available in small datasets. Forecasting models rely on a set of input features to make predictions, and small datasets may not encompass all relevant variables or factors influencing the target variable. As a result, the model’s predictive power may be constrained, hindering its ability to generate accurate forecasts (West & Harrison, 2006).

E) Application of Becker’s Formula of Discrimination

Given the constraints of current datasets, this study aims to develop a predictive model using this limited data to forecast future trends in gender balance within the Australian labour market. At this stage, Becker’s discrimination formula (D), as introduced by Becker (1971), serves to quantify taste-based discrimination by employers, encompassing gender biases. Chen (2020) successfully utilised this coefficient (D) to evaluate opportunity equality in both higher education and the labour market, demonstrating its efficacy as a measure of gender balance. This index allows for the assessment of employment opportunity disparities between males and females. By incorporating variables related to overseas migration and the employment market in Australia, alongside the application of Becker’s formula, the study seeks to determine the level of gender discrimination.

F) Development of a Gender Balance (GB) Index

The next step is the development of the GB index, inspired by the cultural diversity index proposed by Moieni and Mousaferiadis (2022). Employing their methodology to calculate the weights of each attribute allows for the creation of the GB index. This index is designed to quantify the discrimination level between genders. Subsequently, the forecasting models are built to identify the most fitting approach for assessing discrimination levels and predicting gender equality in the workplace.

3. Methodology

This section describes how the time series data were collected and transformed. This research focuses mainly on predicting gender equality by sex and state/territory. Therefore, several factors that contribute to measuring gender equality were taken into consideration, including overseas migration by state/territory classified by sex from 2004 onwards (Australian Bureau of Statistics, n.d.-b) and labour force status by sex, state and territory from 1978 onwards (Australian Bureau of Statistics, n.d.-a). The two datasets cover different periods. Consequently, only the period from 2004 to 2022 was considered (see Appendix).

It was assumed that the count of employees encompasses individuals from diverse backgrounds, including migrants, citizens, and Aboriginal peoples. Moreover, Becker’s coefficient formula (Chen, 2020) was applied to obtain DM and DE, which correspond to Overseas Migration Discrimination and Employment Discrimination, respectively (see Equations (1) and (2)). The next stage introduces a GB index. Finally, this study applied time series analysis, including ARIMA, Grey Model (1, 1) and Grey Model (2, 1), to predict gender balance from 2016 to 2019.

A) Becker’s Coefficient Calculations

This section is divided into two steps. The first step is Becker’s Coefficient Calculations. The second step is coefficients interpretation.

  • Step 1: Becker’s Discrimination formula (D)

This study applied Becker’s Discrimination formula (D) to measure the equality of opportunity in the Australian employment market and overseas migration in Australia.

The Migration Discrimination (DM) was calculated using the total migration figures for males and females across Australian states over various years. For instance, in Victoria in 2004, the numbers of male (MigM) and female (MigF) migrants were 43,860 and 40,920, respectively. The ratio (MigM/MigF) was then calculated for each state and territory by year using the respective data (see Equation (1)). Hence, 17 annual DM values covering 2004 to 2022 were obtained for each Australian state.

Migration Discrimination (DM):

$$D_M = \frac{\text{MigM}}{\text{MigF}} - 1 \tag{1}$$

  • MigM = Number of males migrating.

  • MigF = Number of females migrating.

The Employment Discrimination (DE) was calculated using the total employment figures for males and females across Australian states over various years. For instance, in Victoria in 2004, the numbers of male (EmpM) and female (EmpF) employees were 15,809,000 and 12,743,700, respectively. The ratio (EmpM/EmpF) was then calculated for each state and territory by year using the respective data (see Equation (2)). Hence, 17 annual DE values covering 2004 to 2022 were obtained for each Australian state.

Employment Discrimination (DE):

$$D_E = \frac{\text{EmpM}}{\text{EmpF}} - 1 \tag{2}$$

  • EmpM = Number of male employees.

  • EmpF = Number of female employees.

  • Step 2: Interpret DM and DE values

DM and DE range from −1 to 1. A positive DM or DE suggests fewer opportunities for females in either overseas migration or employment. A negative DM or DE indicates favoritism towards females in migration or employment opportunities. A DM or DE close to 0 implies gender-neutral overseas migration or employment opportunities.
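As a worked illustration of Steps 1 and 2, the short sketch below evaluates Equations (1) and (2) for the Victoria 2004 figures quoted in this section and maps each coefficient to its qualitative reading. The function names and the `tolerance` used for "close to 0" are hypothetical; no threshold is specified in the text.

```python
def becker_coefficient(male_count, female_count):
    """Return D = male_count / female_count - 1, as in Equations (1) and (2)."""
    if female_count <= 0:
        raise ValueError("female_count must be positive")
    return male_count / female_count - 1.0


def interpret(d, tolerance=0.01):
    """Map a coefficient to the reading given in Step 2.

    `tolerance` is an assumed cut-off for "close to 0", chosen for illustration.
    """
    if abs(d) <= tolerance:
        return "approximately gender-neutral"
    if d > 0:
        return "fewer opportunities for females"
    return "favours females"


# Victoria, 2004, using the migration and employment figures quoted in the text.
d_m = becker_coefficient(43_860, 40_920)
d_e = becker_coefficient(15_809_000, 12_743_700)
print(f"DM = {d_m:.4f} ({interpret(d_m)})")
print(f"DE = {d_e:.4f} ({interpret(d_e)})")
```

With these inputs DM is roughly 0.07 and DE roughly 0.24, so both coefficients are positive and point to fewer opportunities for females in that state and year.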

B) Gender Balance Index

A definition of gender balance might encompass a variety of factors such as age and nationality. However, the scope of this analysis is shaped by the available data, which determines the specific factors considered.

Gender balance, both in a multicultural country and in the workplace, is a key factor in the success of organisations: it implies a diverse and inclusive place to work with equal opportunities for both genders. On that account, this section shows how the coefficients for each dataset are obtained and introduces the gender balance formulation (Equation (5)).

  • Aggregation of Discrimination Indexes

To integrate the discrimination indices (Equations (1) and (2)) into the forecasting models, they are aggregated into a single index. This process involves calculating the proportions of overseas migration and of total employed people across all Australian states, which serve as coefficients for the integrated formula (Equations (3) and (4)), using data from 2004 to 2022. These proportions are based on the total overseas migration (OMigrationT) and the total labour force population (LabourFT) during the same period. The coefficients are calculated as follows:

Overseas Migration Coefficient (PM):

$$P_M = \frac{\text{OMigrationT}}{\text{OMigrationT} + \text{LabourFT}} \tag{3}$$

  • OMigrationT = Total population of overseas migration across Australia.

  • LabourFT = Total population of labour force across Australia.

Employment Coefficient (PE):

$$P_E = 1 - P_M \tag{4}$$

  • Gender Balance Formulation

The coefficients derived from overseas migration (PM) and employment (PE) data allow the time series to take both attributes into account at the same time (see Equations (3) and (4)). These coefficients provide insights into gender parity in overseas migration and employment trends in Australia. By analysing these indices, organisations can gain valuable insights into the equality of opportunities for males and females in the Australian labour market and overseas migration in a multicultural country. Because these coefficients summarise the information from all years and all states as a whole for migration and employment, respectively, the “Gender Balance” formula (Equation (5)) incorporates them (PM and PE) to measure equality across genders.

$$\text{Gender Balance} = P_M D_M + P_E D_E$$

$$\text{Gender Balance} = 0.003\, D_M + 0.997\, D_E \tag{5}$$
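The aggregation in Equations (3) to (5) can be sketched as follows. This is a hedged illustration rather than the study's code: the function names are hypothetical, the default weight 0.003 is the value reported in Equation (5), and the totals passed to `migration_weight` in the example are made-up numbers chosen only to reproduce that weight.

```python
def migration_weight(total_migration, total_labour_force):
    """Equation (3): share of overseas migration in the combined total."""
    return total_migration / (total_migration + total_labour_force)


def gender_balance(d_m, d_e, p_m=0.003):
    """Equations (4) and (5): weighted sum of the two discrimination coefficients."""
    p_e = 1.0 - p_m  # Equation (4)
    return p_m * d_m + p_e * d_e


# Illustrative totals only; they are not the study's data.
print(migration_weight(3, 997))  # weight of the migration coefficient

# Victoria, 2004, using the figures quoted earlier in this section.
d_m = 43_860 / 40_920 - 1          # Equation (1)
d_e = 15_809_000 / 12_743_700 - 1  # Equation (2)
print(round(gender_balance(d_m, d_e), 4))
```

Because PE is so much larger than PM, the index is dominated by the employment coefficient: with these inputs it comes out at roughly 0.24, almost identical to DE.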

The final dataset includes the year (2004 to 2022) and 8 columns, each corresponding to an Australian state or territory. Table 1 summarises the data after applying the new “Gender Balance” index proposed above (Equation (5)).

Table 1. Final dataset after applying the GB index; each column is an Australian state/territory.

Years | NSW | VIC | QLD | SA | WA | TAS | NT | ACT

C) Forecasting Gender Balance Trends through Time Series Analysis

This research endeavors to leverage the recently developed “Gender Balance” index (GB) alongside three distinct time series forecasting methodologies: Autoregressive Integrated Moving Average (ARIMA) (Hyndman & Athanasopoulos, 2018), Grey Model (1, 1) (Julong et al., 1989), and Grey Model (2, 1) (Gan et al., 2015) for projecting GB trends over the next 5 years. To ensure the reliability of the forecasting models, comprehensive adequacy checks are conducted prior to their application. By comparing the three methodologies, the most effective approach has been selected to forecast GB trends for the forthcoming 5 years.

Evaluating the performance of a forecasting model constitutes a pivotal aspect of this study, given its direct impact on decision-making processes. By analyzing the disparity between predicted and actual values, this study applied the most common metrics including Mean Absolute Percentage Error (MAPE) (Hyndman & Athanasopoulos, 2018), Mean Absolute Error (MAE) (Hyndman & Athanasopoulos, 2018), and Mean Squared Error (MSE). These evaluation criteria facilitate a comprehensive assessment of the forecasting accuracy, providing invaluable insights into the performance of the selected models.
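For reference, the three metrics can be written out in a few lines. The sketch below uses their standard definitions; the function names and the sample values are hypothetical and are not results from this study. Note that MAPE divides by the actual value, so it becomes unstable when the actual GB index is close to zero.

```python
def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent. Undefined if any actual value is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values for a four-year test window, for illustration only.
actual = [0.20, 0.25, 0.10, 0.40]
predicted = [0.22, 0.20, 0.10, 0.44]
print(mae(actual, predicted), mse(actual, predicted), mape(actual, predicted))
```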

4. Data Analysis and Results

This section encapsulates the findings derived from the research following Exploratory Data Analysis (EDA), examination of GB index, and time series analysis.

A) Anomaly Detection

Using Exploratory Data Analysis (EDA), this study identified outliers within overseas migration trends that could influence the forecast models. Overseas migration plots (Figure 1) highlighted two data points deviating from the general trend in 2020 and 2021 across genders. Specifically, in New South Wales (NSW), male and female overseas migration experienced marginal declines to 56.6% and 56%, respectively, in 2020. By comparison, Victoria (VIC) witnessed a significant drop to 68% (male) and 67% (female), marking the highest decline since the onset of the pandemic. Similarly, Queensland (QLD) reported decreases to 63% and 62.7% for male and female migrants, respectively. These fluctuations are likely attributable to COVID-19 restrictions (Australian Government Department of Health and Aged Care, 2024), under which the number of travelers entering Australia dropped during these two years.

Figure 1. Overseas Migration and Labour Force original data. Two outliers were found, in 2020 and 2021, in both datasets and for both genders.

In 2020, the labour force, as depicted in Figure 1, experienced significant impacts. Specifically, within New South Wales (NSW), there was a noticeable decrease in the number of jobs, with a decline of approximately 2.9% for males and 1.4% for females compared to the preceding year, coinciding with the onset of COVID-19 cases. Meanwhile, the participation of female labour force in Victoria exhibited the most pronounced reduction, plummeting by 3.1%. The predominant cause of this decline in the labour force can be attributed to the disruptive effects of the COVID-19 pandemic. These effects encompassed various factors, including enforced shutdowns, widespread business closures, and diminished consumer demand, collectively contributing to the contraction observed in labour force participation rates.

Consequently, data from 2020 and 2021 were excluded from statistical calculations. These years were identified as anomalies in the data, which could significantly skew statistical analyses.

B) Trend of Gender Balance Index

Figure 2 illustrates the GB index across Australian states, considering labour force participation and overseas migration, to elucidate gender-based opportunities over time. Early in the 2000s, Western Australia (WA) exhibited the highest GB index, suggesting fewer opportunities for females compared to males. However, since 2016, the gap in the GB index between genders has narrowed in WA, indicating an increasing presence of females in both the labour force and overseas migration patterns within the state. Furthermore, from 2004 to 2022, New South Wales (NSW), Victoria (VIC), Queensland (QLD), and South Australia (SA) have progressively closed the gap in gender-based opportunities, with GB index improvements of approximately 56.5%, 53.3%, 74%, and 65.2%, respectively. In contrast, the Australian Capital Territory (ACT) displayed a consistently lower GB index since 2004, suggesting a relatively stable representation of gender-based opportunities over the years. This observation aligns with data from the Australian Bureau of Statistics (2023), indicating a balanced gender composition in the ACT without a significant prevalence of males over females.

Figure 2. GB index plots across Australian states/territories. This index aggregates both genders.

5. Data Forecasting

This section comprises two stages. The initial stage compares the models to determine the optimal forecasting model. This phase is subdivided into subsections, each presenting the application of one of the three forecasting models. The subsequent stage predicts gender balance over the next five years using the best model identified in the initial stage.

A) Model Comparison and Selection

Overall, the dataset underwent a partition into training data spanning from 2004 to 2015 and testing data covering the years 2016 to 2019. This partitioning strategy was implemented across three forecasting models: ARIMA, GM(1, 1), and GM(2, 1). Subsequent evaluation of each model was conducted based on performance metrics and median across all states.
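The partition can be sketched as below. The function name and the placeholder values are hypothetical; the sketch assumes, in line with the anomaly detection step, that 2020 and 2021 are dropped before splitting. It leaves 12 training points (2004 to 2015) and 4 test points (2016 to 2019); how 2022 is used is not stated here, so the sketch simply leaves it out of both sets.

```python
def split_by_year(series, excluded=(2020, 2021), train_end=2015, test_end=2019):
    """Split a {year: value} mapping into chronological train and test lists."""
    kept = sorted(
        ((year, value) for year, value in series.items() if year not in excluded),
        key=lambda item: item[0],
    )
    train = [(year, value) for year, value in kept if year <= train_end]
    test = [(year, value) for year, value in kept if train_end < year <= test_end]
    return train, test


# Placeholder values; the real inputs would be one state's GB index per year.
gb_by_year = {year: 0.0 for year in range(2004, 2023)}
train, test = split_by_year(gb_by_year)
print(len(train), len(test))
```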

1) ARIMA model: Using Python’s itertools module, an exhaustive search was conducted over the parameters p, d, and q within a range of 0 to 5, aiming to identify the optimal parameters for each state/territory. Notably, none of the states/territories exhibited seasonality in their data. Table 2 shows the parameters (p, d, q) selected for each Australian state/territory. The Australian Capital Territory (ACT) parameters, (0, 0, 0), indicate that errors are uncorrelated over time. However, this does not in itself indicate whether the model is effective or ineffective.

Table 2. ARIMA parameter for each state.

ARIMA parameter | NSW | VIC | QLD | SA | WA | TAS | NT | ACT
(p, d, q) | (4, 1, 0) | (1, 0, 0) | (0, 1, 0) | (1, 0, 0) | (1, 0, 0) | (4, 0, 0) | (0, 0, 1) | (0, 0, 0)
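An exhaustive (p, d, q) search of this kind can be sketched with `itertools.product`. The sketch makes several assumptions that are not stated in the text: the bounds 0 to 5 are treated as inclusive, the selection criterion is taken to be the lowest score returned by a user-supplied function, and `aic_score` (which relies on the `statsmodels` package being installed) is only one possible choice of criterion. The function names are hypothetical.

```python
import itertools


def grid_search_orders(series, score_fn, max_order=5):
    """Try every (p, d, q) with each value in 0..max_order; keep the lowest score."""
    best_order, best_score = None, float("inf")
    for order in itertools.product(range(max_order + 1), repeat=3):
        try:
            score = score_fn(series, order)
        except Exception:
            # Some orders cannot be fitted on very short series; skip them.
            continue
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score


def aic_score(series, order):
    """Assumed criterion: AIC of an ARIMA fit (requires statsmodels)."""
    from statsmodels.tsa.arima.model import ARIMA

    return ARIMA(series, order=order).fit().aic


# Stand-in scoring function so the sketch runs without statsmodels.
def toy_score(series, order):
    p, d, q = order
    return abs(p - 1) + d + q


print(grid_search_orders([0.2, 0.21, 0.19], toy_score))
```

In practice the search would be run once per state/territory, for example `grid_search_orders(train_values, aic_score)` with `train_values` holding that state's 12 training points.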

2) Grey Model (1, 1): Figure 3 presents a comparison between ARIMA and Grey Model (1, 1) forecasts. GM(1, 1) predictions closely align with actual values, indicating its precision in capturing the underlying trends. However, discrepancies between predicted and actual values are notable for Western Australia (WA) and the Northern Territory (NT). One contributing factor to these deviations is the population dynamics within these states/territories, where fluctuations in overseas migration and employment rates occurred notably between 2014 and 2016. Consequently, WA experienced a substantial decrease of 23.5%, while NT observed a pronounced drop of 30% during these years. These variations significantly impacted the accuracy of the forecast models. Mathematically, GM(1, 1) operates by accumulating the original data sequence to generate a new sequence that adheres to an approximate exponential law (Wang et al., 2020). This method proves effective when the changes in the original data are not substantial. However, in scenarios with significant fluctuations, such as those observed in WA and NT during this period, GM (1, 1) may exhibit limitations in accurately capturing the dynamic patterns inherent in the data. In contrast, this study utilised only 12 data points to build the ARIMA model. This limited dataset raises concerns regarding the potential impact of random fluctuations (Hyndman & Athanasopoulos, 2018), as illustrated in Figure 3. Such fluctuations can affect the forecasting performance, highlighting the inherent risks associated with using a smaller number of data points.

Figure 3. Comparative analysis of ARIMA and GM(1, 1) in predicting the GB index by state/territory from 2016 to 2019.
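The accumulation mechanism of GM(1, 1) described above can be illustrated with a short sketch. It follows the standard textbook formulation (accumulated generating operation, background values, least-squares estimation, inverse accumulation) and is not taken from the authors' repository. The function names `gm11_fit` and `gm11_predict` are hypothetical, and the demonstration series is synthetic.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1, 1) development coefficient a and grey input b."""
    n = len(x0)
    # Accumulated generating operation (1-AGO): running sum of the series.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated terms.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Least squares for x0(k) = -a * z(k) + b (closed form, two unknowns).
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)  # slope equals -a
    b = (sy - slope * sz) / m
    return -slope, b


def gm11_predict(x0, steps):
    """Return fitted values for x0 followed by `steps` forecasts."""
    a, b = gm11_fit(x0)  # note: a == 0 (a flat series) is not handled here

    def x1_hat(k):
        # Time response of the accumulated series at index k = 0, 1, ...
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    fitted = [x0[0]]
    for k in range(1, len(x0) + steps):
        fitted.append(x1_hat(k) - x1_hat(k - 1))  # inverse AGO
    return fitted


# Synthetic, smoothly decaying 12-point series (not the paper's data).
demo = [100 * 0.95 ** k for k in range(12)]
a, b = gm11_fit(demo)
forecast = gm11_predict(demo, steps=5)
print(round(a, 4), [round(v, 2) for v in forecast[-5:]])
```

On a smooth series like this one the fitted curve tracks the data closely. An abrupt change in the training window shifts the estimated coefficient `a` and therefore every extrapolated value, which is the sensitivity discussed for WA and NT.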

3) Grey Model (2, 1): Figure 4 depicts the application of the Grey Model (2, 1), revealing a conspicuous exponential pattern across all Australian states/territories. This observation is consistent with the theoretical form of the GM(2, 1) model equations. However, GM(2, 1) may struggle to achieve a precise fit because of structural differences between its estimation and prediction equations, as well as the inherent limitations of grey modelling approaches, which may not fully capture the intricacies of real-world data.

Figure 4. Comparative Analysis of ARIMA, GM(1, 1) and GM(2, 1) to predict GB Index by states/territories from 2016 to 2019.

4) Model Selection: For GM(2, 1), higher Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Mean Squared Error (MSE) values were observed in New South Wales (NSW), the Australian Capital Territory (ACT) and South Australia (SA) than in other states, as depicted in Figure 5. These outcomes are consistent with the discernible exponential trend shown in Figure 4. Conversely, in the Northern Territory (NT) and NSW, GM(1, 1) exhibited elevated metrics, potentially attributable to the model’s sensitivity to abrupt fluctuations in the data. The fluctuations in NSW’s real data, particularly in 2019, could be associated with the Grey Model’s response to drastic changes. Specifically, the relatively low number of COVID-19 cases within Australia in 2019 impacted working hours across various occupations and led to a decline in overseas migration numbers prior to the declaration of the global pandemic (Australian Bureau of Statistics, 2020).

Figure 5. Bar chart comparing the MAPE, MAE and MSE metrics for the ARIMA, GM(1, 1) and GM(2, 1) models.

In Figure 5, a comprehensive comparison of metrics highlights the suitability of the GM(1, 1) model across the majority of states. In particular, the median metrics for each model (Table 3) suggest that the GM(1, 1) model generally outperforms the others in predicting the GB index across Australia. Consequently, given its lower metric values and medians, GM(1, 1) emerges as the most favourable choice for forecasting GB trends in Australia.
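For reference, the three evaluation metrics can be computed as in the sketch below. It uses the standard definitions, with MAPE expressed in percent; the scaling used in the paper's own code is not stated in this section and may differ. The function names and the four-point example values are hypothetical.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical four-year test window (illustrative values, not from Table 3).
actual = [0.20, 0.25, 0.22, 0.24]
predicted = [0.22, 0.24, 0.22, 0.21]
print(mape(actual, predicted), mae(actual, predicted), mse(actual, predicted))
```

Because MAPE divides by the actual value, it grows quickly when the index is close to zero, so it is worth reading alongside MAE and MSE rather than on its own.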

Table 3. Metric results by state/territory, with the median value for each model.

| State/Territory | MAPE ARIMA | MAPE GM(1, 1) | MAPE GM(2, 1) | MAE ARIMA | MAE GM(1, 1) | MAE GM(2, 1) | MSE ARIMA | MSE GM(1, 1) | MSE GM(2, 1) |
|---|---|---|---|---|---|---|---|---|---|
| NSW | 8.082 | 16.082 | 1607872.726 | 0.011 | 0.022 | 2280.085 | 0.013 | 0.023 | 2992.351 |
| VIC | 19.441 | 5.645 | 1454830.008 | 0.027 | 0.008 | 2056.829 | 0.033 | 0.011 | 2687.721 |
| QLD | 22.053 | 19.684 | 41512.257 | 0.021 | 0.019 | 40.206 | 0.024 | 0.020 | 45.272 |
| SA | 48.251 | 24.081 | 4322971.240 | 0.050 | 0.0251 | 4345.863 | 0.053 | 0.027 | 5891.595 |
| WA | 39.047 | 31.347 | 8882.081 | 0.071 | 0.057 | 16.195 | 0.073 | 0.058 | 17.271 |
| TAS | 42.545 | 17.500 | 1448670.712 | 0.039 | 0.016 | 1402.649 | 0.045 | 0.019 | 1826.356 |
| NT | 43.133 | 65.896 | 11984.144 | 0.047 | 0.075 | 13.946 | 0.055 | 0.082 | 14.959 |
| ACT | 105.897 | 58.248 | 3855149.642 | 0.030 | 0.014 | 1582.997 | 0.031 | 0.019 | 2144.968 |
| Median | 40.796 | 21.882 | 1451750.360 | 0.034 | 0.021 | 1492.823 | 0.039 | 0.022 | 1985.662 |
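The median row of Table 3 can be reproduced directly from the state-level values. The sketch below does this for the MAPE columns only, using values transcribed from the table; the variable names are illustrative.

```python
from statistics import median

# MAPE values transcribed from Table 3,
# in the order NSW, VIC, QLD, SA, WA, TAS, NT, ACT.
mape_by_model = {
    "ARIMA": [8.082, 19.441, 22.053, 48.251, 39.047, 42.545, 43.133, 105.897],
    "GM(1, 1)": [16.082, 5.645, 19.684, 24.081, 31.347, 17.500, 65.896, 58.248],
    "GM(2, 1)": [1607872.726, 1454830.008, 41512.257, 4322971.240,
                 8882.081, 1448670.712, 11984.144, 3855149.642],
}

# With eight regions, the median is the mean of the 4th and 5th sorted values.
medians = {model: median(values) for model, values in mape_by_model.items()}
best_model = min(medians, key=medians.get)

for model, value in medians.items():
    print(f"{model}: {value:.3f}")
print("Lowest median MAPE:", best_model)
```

The median is a reasonable summary here because a single extreme region, such as ACT under ARIMA, would dominate a mean across only eight regions.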

B) Predicting the next 5 years

Grey Model (1, 1) shows the optimal performance for predicting GB trends from 2016 to 2019, as evidenced by the findings presented in Table 3 and the metrics evaluation depicted in Figure 5. Additionally, Figure 6 indicates a narrowing of the gap between male and female GB projections over the next five years. Significant disparities, however, are observed in NT, where the predictions diverge considerably from real data. This discrepancy in NT can be attributed to the operational mechanism of GM(1, 1), which relies on accumulating the original data sequence. In this instance, the model was trained with only 12 data points, which showed a distinct upward trend. The model extrapolates future values by following this upward trend, potentially leading to deviations from actual observations.

Figure 6. Projection results for the next five years applying GM(1, 1).

Table 4 presents the forecast GB gap, with the Australian Capital Territory (ACT) positioned closest to parity by 2026, followed by TAS and QLD, which show promising trajectories towards closing the gap. By contrast, the populous states of NSW and VIC show a concerningly slow pace of progress, highlighting the need for both further research and policy action. This variation across regions can be attributed to a multitude of factors, including job opportunities, visa and immigration policies, and cultural and social norms.

Table 4. Forecast GB trends across Australian states/territories over the next five years.

| Year | NSW | VIC | QLD | SA | WA | TAS | NT | ACT |
|---|---|---|---|---|---|---|---|---|
| 2022 | 0.147 | 0.127 | 0.101 | 0.118 | 0.230 | 0.092 | 0.225 | 0.042 |
| 2023 | 0.143 | 0.122 | 0.096 | 0.114 | 0.227 | 0.088 | 0.229 | 0.040 |
| 2024 | 0.140 | 0.118 | 0.092 | 0.111 | 0.225 | 0.084 | 0.232 | 0.039 |
| 2025 | 0.136 | 0.114 | 0.088 | 0.108 | 0.222 | 0.080 | 0.236 | 0.038 |
| 2026 | 0.133 | 0.110 | 0.084 | 0.105 | 0.219 | 0.077 | 0.240 | 0.036 |
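Because GM(1, 1) forecasts follow an approximately geometric sequence, the direction and pace of each projection in Table 4 can be summarised by an average year-on-year ratio. The sketch below computes that ratio from the tabulated values; the helper name `mean_annual_ratio` is hypothetical, and the calculation is an illustrative summary rather than part of the original analysis.

```python
# Forecast GB index values transcribed from Table 4 (2022 to 2026).
forecast = {
    "NSW": [0.147, 0.143, 0.140, 0.136, 0.133],
    "VIC": [0.127, 0.122, 0.118, 0.114, 0.110],
    "QLD": [0.101, 0.096, 0.092, 0.088, 0.084],
    "SA": [0.118, 0.114, 0.111, 0.108, 0.105],
    "WA": [0.230, 0.227, 0.225, 0.222, 0.219],
    "TAS": [0.092, 0.088, 0.084, 0.080, 0.077],
    "NT": [0.225, 0.229, 0.232, 0.236, 0.240],
    "ACT": [0.042, 0.040, 0.039, 0.038, 0.036],
}


def mean_annual_ratio(values):
    """Geometric mean of year-on-year ratios: (last / first) ** (1 / steps)."""
    steps = len(values) - 1
    return (values[-1] / values[0]) ** (1.0 / steps)


ratios = {state: mean_annual_ratio(vals) for state, vals in forecast.items()}
# A ratio above 1 means the projected index rises; below 1 means it falls.
rising = [state for state, r in ratios.items() if r > 1.0]

for state, r in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{state}: {r:.4f}")
print("Projected to rise:", rising)
```

On these figures the ratio is below one for every state and territory except NT, which matches the upward extrapolation discussed for NT above.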

6. Conclusion

This study evaluated the performance of three forecasting models—ARIMA, GM(1, 1), and GM(2, 1)—in predicting Gender Balance (GB) trends across Australian states and territories. The dataset consisted of 16 data points, which were partitioned into training data from 2004-2015 and testing data from 2016-2019.

The ARIMA model was optimised through an exhaustive search of parameter combinations, and none of the states/territories exhibited seasonality in their data. GM(1, 1) closely aligned with actual values, demonstrating its effectiveness in capturing trends. However, discrepancies were observed for WA and the NT, likely due to significant population dynamics and fluctuations between 2014 and 2016. By contrast, GM(2, 1) exhibited a conspicuous exponential pattern across all states/territories, in line with its theoretical underpinnings. However, this model faced challenges in achieving a precise fit, potentially due to structural inconsistencies between the estimation and prediction equations, as well as inherent limitations of grey modelling approaches.

Evaluation of performance metrics revealed the widespread suitability of GM(1, 1), which outperformed the other models in predicting GB across the majority of states. The median prediction results further reinforced the model as the most favourable choice for forecasting GB trends in Australia. While GM(1, 1) showed strong results, discrepancies were observed in WA and NT, where the model’s reliance on a limited training set of 12 data points led to extrapolation of an upward trend that diverged from actual observations.

GB index gap projections for the next five years, based on GM(1, 1), underscore the persistent nature of gender inequality in workforce participation across Australia. However, the degree of this disparity varies considerably among states and territories. It is anticipated that by 2026, the ACT, TAS, and QLD will achieve greater gender parity and diversity in employment opportunities. Nonetheless, in the populous states of NSW and VIC, the pace of closing this gap appears sluggish. This can be attributed to various factors including job market dynamics, visa and immigration policies, and cultural and social norms. These factors can significantly influence workforce participation rates, and certain visa categories may impose restrictions on employment opportunities for women, exacerbating gender disparities within the labour market.

In conclusion, GM(1, 1) emerges as the optimal choice for predicting GB trends in Australia from 2016 to 2019, as evidenced by its superior performance across various evaluation metrics. Nevertheless, caution is warranted when applying this model to regions with significant population dynamics and limited data availability, as demonstrated by the case of the Northern Territory. Additionally, the GB index, which can be applied within a company to measure gender balance, may prove effective provided there is a sufficient amount of accessible data, and the coefficient may vary depending on the volume of available data.

7. Limitations and Future Research

This study was limited by data availability, data collection and time constraints. The data scope was confined to the Australian Bureau of Statistics (ABS), the only source providing consistent data classified by gender. This limitation also restricted the number of factors that could be considered, as other data sources were not consistent in time frame or state coverage, and most of their information was not classified by gender. Moreover, finding consistent state-level data online proved challenging, as state government data availability is discretionary and not always open source. For instance, attempts to integrate education data classified by state and gender into the forecasting model were impeded by insufficient data aligned in date with the other attributes. This highlights the difficulty of incorporating other factors into the measurement of gender balance using open-source data, especially across all states with gender classification.

Future research stands to gain considerable value from the integration of additional attributes such as birth country, language, and identification of Aboriginal and Torres Strait Islanders. Such endeavors hold promise for yielding deeper insights into gender equality, thereby furnishing companies and organisations with invaluable knowledge to cultivate diverse workplaces characterised by equal opportunities and inclusive cultures.

Acknowledgements

This research was supported by Diversity Atlas and their provision of data has been instrumental in shaping the findings of this study. We thank the entire Diversity Atlas team for their support, although they may not agree with all the interpretations/conclusions of the results.

Appendix

A. GitHub Repository

The provided hyperlink grants access to the datasets utilised in this study, along with the associated results: Gender Balance Small Dataset.

Please note that this dataset is referenced throughout the paper for analysis and findings.

https://github.com/eloisjr/Small-scale-predictive-analysis-of-gender-balance-in-Australia-using-Grey-models-Integrating-Labour

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Australian Bureau of Statistics (2020, February). Labour Force, Australia, Detailed, Quarterly.
https://www.abs.gov.au/statistics/labour/employment-and-unemployment/labour-force-australia-detailed-quarterly/latest-release
[2] Australian Bureau of Statistics (2023). Regional Population by Age and Sex.
https://www.abs.gov.au/statistics/people/population/regional-population-age-and-sex/latest-release#australian-capital-territory
[3] Australian Bureau of Statistics (n.d.-a). Labour Force Status by Sex, State and Territory—Number of People Employed, Unemployed and Not in the Labour Force, Monthly, February 1978 and Onwards.
https://explore.data.abs.gov.au/vis?tm=labour%20force&pg=0&df[ds]=ABS_ABS_TOPICS&df[id]=LF&df[ag]=ABS&df[vs]=1.0.0&hc[Measure]=Labour%20Force&hc[ABS%20Topics]=LABOUR&pd=2004-01%2C2023-12&dq=M3.3%2B2%2B1.1599.30.1%2B2%2B3%2B4%2B5%2B6%2B7%2B8.M&ly[cl]=TIME_PERIOD&vw=tb
[4] Australian Bureau of Statistics (n.d.-b). Net Overseas Migration, Arrivals, Departures and Net, State/Territory, Age and Sex—Calendar Years, 2004 Onwards.
https://explore.data.abs.gov.au/vis?tm=Migration&pg=0&df[ds]=ABSABSTOPICS&df[id]=NOMCY&df[ag]=ABS&df[vs]=1.0.0&pd=2004%2C&dq=1%2B2%2B3.TOT.1%2B2%2B3..A&ly[cl]=TIMEPERIOD&ly[rw]=REGION&ly[rs]=SEX%2CMEASURE
[5] Australian Government Department of Health and Aged Care (2024). COVID-19 and Travel.
https://www.health.gov.au/topics/covid-19/travel
[6] Australian Government Department of Home Affairs (2023). Migration Strategy.
https://immi.homeaffairs.gov.au/programs-subsite/migration-strategy/Documents/migration-strategy.pdf
[7] Bakar, N. A., & Rosbi, S. (2017). Autoregressive Integrated Moving Average (ARIMA) Model for Forecasting Cryptocurrency Exchange Rate in High Volatility Environment: A New Insight of Bitcoin Transaction. International Journal of Advanced Engineering Research and Science, 4, 130-137.
[8] Baro, E., Degoul, S., Beuscart, R., & Chazard, E. (2015). Toward a Literature-Driven Definition of Big Data in Healthcare. BioMed Research International, 2015, Article ID: 639021.
[9] Becker, G. S. (1971). The Economics of Discrimination. University of Chicago Press.
[10] Bose, I., & Mahapatra, R. K. (2001). Business Data Mining—A Machine Learning Perspective. Information & Management, 39, 211-225.
[11] Brownlee, J. (2018). Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python. Machine Learning Mastery.
[12] Caro, E., Juan, J., & Cara, J. (2020). Periodically Correlated Models for Short-Term Electricity Load Forecasting. Applied Mathematics and Computation, 364, Article ID: 124642.
[13] Cerqueira, V. (2022). Machine Learning for Forecasting: Size Matters.
https://towardsdatascience.com/machine-learning-for-forecasting-size-matters-b5271ec784dc
[14] Cerqueira, V., Torgo, L., & Soares, C. (2022). A Case Study Comparing Machine Learning with Statistical Methods for Time Series Forecasting: Size Matters. Journal of Intelligent Information Systems, 59, 415-433.
[15] Chen, T. (2020). Forecasting Gender Parity in Higher Education System and Labor Market in Japan and Korea. International Journal of Social Science and Humanity, 10, 96-100.
[16] Clemens, M. (2023). Helping Tackle Labor Shortages with Data: How the Department of Labor Can and Should Update Schedule A. Center for Global Development.
https://www.cgdev.org/blog/helping-tackle-labor-shortages-data-how-department-labor-can-and-should-update-schedule
[17] Gakidou, E., Cowling, K., Lozano, R., & Murray, C. J. (2010). Increased Educational Attainment and Its Effect on Child Mortality in 175 Countries between 1970 and 2009: A Systematic Analysis. The Lancet, 376, 959-974.
[18] Gan, R., Chen, X., Yan, Y., & Huang, D. (2015). Application of a Hybrid Method Combining Grey Model and Back Propagation Artificial Neural Networks to Forecast Hepatitis B in China. Computational and Mathematical Methods in Medicine, 2015, Article ID: 328273.
[19] Gelman, A. et al. (2013). Bayesian Data Analysis. Chapman; Hall/CRC.
[20] Gray, C., Crawford, G., Maycock, B., & Lobo, R. (2022). Exploring the Intersections of Migration, Gender, and Sexual Health with Indonesian Women in Perth, Western Australia. International Journal of Environmental Research and Public Health, 19, Article No. 13707.
[21] Hastie, T. et al. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science Business Media.
[22] Hekler, E. B., Klasnja, P., Chevance, G., Golaszewski, N. M., Lewis, D., & Sim, I. (2019). Why We Need a Small Data Paradigm. BMC Medicine, 17, Article No. 133.
[23] Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice.
[24] Julong, D. et al. (1989). Introduction to Grey System Theory. The Journal of Grey System, 1, 1-24.
[25] Kitchin, R. (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures & Their Consequences. SAGE Publications Ltd.
[26] Kitchin, R., & Lauriault, T. P. (2014). Small Data in the Era of Big Data. GeoJournal, 80, 463-475.
[27] Lindstrom, M. (2016). Small Data: The Tiny Clues That Uncover Huge Trends. St. Martin’s Press.
[28] Mayer-Schonberger, V., & Cukier, K. (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt.
[29] Moieni, R., & Mousaferiadis, P. (2022). Analysis of Cultural Diversity Concept in Different Countries Using Fractal Analysis. The International Journal of Organizational Diversity, 22, 43-62.
[30] Moieni, R., Mousaferiadis, P., & Roohi, L. (2023). A Study on Diversity Prediction with Machine Learning and Small Data. Open Journal of Social Sciences, 11, 18-31.
[31] Pan, Y., Zhang, M., Chen, Z., Zhou, M., & Zhang, Z. (2016). An ARIMA Based Model for Forecasting the Patient Number of Epidemic Disease. In 2016 13th International Conference on Service Systems and Service Management (ICSSSM) (pp. 1-4). IEEE.
[32] Productivity Commission (2022). Overcoming Australia’s Labour and Skills Shortages.
https://www.pc.gov.au/__data/assets/pdf_file/0007/338083/sub047-productivity-attachment.pdf
[33] Shao, Y., & Su, H. (2012). On Approximating Grey Model DGM(2,1). AASRI Procedia, 1, 8-13.
[34] Sharma, D. B., Chandrakanth Reddy, D. I., Ahmad, D. F., Sujata, D., & Singh, C. (2024). Examining the Role of Gender in the Social Dynamics of Migration. Educational Administration: Theory and Practice, 30, 1058-1066.
[35] Taylor, S. J., & Letham, B. (2018). Forecasting at Scale. The American Statistician, 72, 37-45.
[36] Torchia, M., Calabrò, A., & Huse, M. (2011). Women Directors on Corporate Boards: From Tokenism to Critical Mass. Journal of Business Ethics, 102, 299-317.
[37] Verhulst, S. G., Engin, Z., & Crowcroft, J. (2019). Data & Policy: A New Venue to Study and Explore Policy—Data Interaction. Data & Policy, 1, e1.
[38] Wang, C., Dang, T., Nguyen, N., & Le, T. (2020). Supporting Better Decision-Making: A Combined Grey Model and Data Envelopment Analysis for Efficiency Evaluation in E-Commerce Marketplaces. Sustainability, 12, Article No. 10385.
[39] Wang, Y., Shen, Z., & Jiang, Y. (2018). Comparison of ARIMA and GM(1,1) Models for Prediction of Hepatitis B in China. PLOS ONE, 13, e0201987.
[40] West, M., & Harrison, J. (2006). Bayesian Forecasting and Dynamic Models. Springer Science & Business Media.
[41] Wilkins, R. (2017). Household Economic Decision-Making: Spousal Perspectives on Their Working Lives. In Household Economic Decision-Making (pp. 1-28). Academic Press.
https://melbourneinstitute.unimelb.edu.au/__data/assets/pdf_file/0009/3963249/HILDA-Statistical-Report-2021.pdf
[42] Workplace Gender Equality Agency (2022). Gender Equality Workplace Statistics at a Glance 2022.
https://www.wgea.gov.au/publications/gender-equality-workplace-statistics-at-a-glance-2022
[43] World Economic Forum (2023). Global Gender Gap Report.
https://www.weforum.org/publications/global-gender-gap-report-2023/

Copyright © 2025 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.