Small Scale Predictive Analysis of Gender Balance in Australia Using Grey Models: Integrating Labour Force and Migration Data
1. Introduction
Promoting gender equality in the workplace is a key driver of organisational success and societal progress. Understanding the dynamics of gender balance within the workforce is crucial for companies to effectively manage diversity and foster inclusive cultures. Creating a diverse working environment remains a significant challenge for companies in Australia, especially in STEM fields: globally, women represent just 29.2% of the STEM workforce, compared with 49.3% of non-STEM employment (World Economic Forum, 2023). In response, this paper presents a comprehensive study aimed at predicting gender trends in the Australian labour market and in overseas migration.
This paper addresses the challenge of predicting gender balance in the Australian workplace by employing forecasting models based on small datasets. This study aims to provide actionable insights for companies seeking to enhance their understanding of workforce dynamics and promote gender equality initiatives.
The methodology draws on two primary datasets, both categorised by sex and state: labour force data (Australian Bureau of Statistics, n.d.-a) and overseas migration data (Australian Bureau of Statistics, n.d.-b). Three forecasting models, ARIMA (AutoRegressive Integrated Moving Average) and the Grey Models (GM) GM(1, 1) and GM(2, 1), are developed to predict gender balance trends at both the state and national levels.
Furthermore, Becker’s formula of discrimination (D) (Becker, 1971) is adapted to the forecasting framework to evaluate equality of opportunity in the Australian employment market. Building on the diversity index proposed by Moieni & Mousaferiadis (2022), this paper introduces a novel approach to integrating Becker’s discrimination indices into the forecasting models: the indices are aggregated into a single coefficient, which serves as a key parameter of the integrated forecasting model. This approach enables a comprehensive assessment of gender parity trends in Australia.
By elucidating the relationship between overseas migration patterns, employment trends, and gender balance, this research contributes to the ongoing discourse on gender equality and diversity in the workplace. The findings and insights derived from this study have significant implications for policy formulation, organisational practices, and societal initiatives aimed at fostering inclusive and equitable employment environments in Australia.
Subsequent sections of this paper explore the theoretical foundations of forecasting models, discuss the methodology employed in data analysis, present empirical findings, and offer recommendations for enhancing gender parity and equality in the Australian labour market. Through rigorous academic inquiry, this paper seeks to advance the understanding of gender dynamics in employment and to inform evidence-based interventions for promoting inclusivity and fairness in the workplace.
Problem Statement
The relationship between the Australian labour force and overseas migration is vital for equality, diversity, and inclusion, in particular for creating diverse workplaces and fostering social cohesion across Australia. Nonetheless, gender balance in Australia with respect to these factors remains insufficiently understood and studied. Although confined to the Australian context, this research emphasizes the need for a metric of gender balance based on labour force participation and migration patterns across Australia. Such a metric is essential for measuring gender balance, promoting equal opportunities, creating diverse workplaces, and fostering social cohesion in Australia’s multicultural landscape.
2. Literature Review
A) Importance of Gender Balance
Gender balance is a crucial issue with far-reaching implications for societies across the globe. Achieving gender equality is not only a fundamental human right but also a prerequisite for sustainable development, economic growth, and social cohesion. Gender balance is also essential for promoting equal opportunities, empowering both genders, and fostering inclusive societies. Numerous studies have demonstrated the positive impacts of gender equality on various aspects of societal well-being. For instance, gender balance in the workforce leads to increased productivity, innovation, and economic growth (Gray et al., 2022). Women’s participation in the labour market contributes to a more diverse and skilled talent pool, driving economic prosperity. Gender equality in education and healthcare has been shown to improve outcomes for both women and children, leading to better overall societal well-being (Gakidou et al., 2010). Increased representation of women in leadership and decision-making roles fosters more inclusive and diverse perspectives, contributing to better governance and policy-making (Torchia et al., 2011). Nonetheless, gender imbalances persist worldwide, and despite progress towards gender equality, significant disparities remain in the Australian labour market. The Workplace Gender Equality Agency’s 2022 report found that women hold only 19.4% of CEO positions and 32.5% of key management roles in non-public sector organisations, and that women earn on average 13.8% less than men (Workplace Gender Equality Agency, 2022). These imbalances are often attributed to societal attitudes, a lack of affordable childcare, and inadequate policies to support work-life balance. Women continue to bear a disproportionate share of unpaid domestic and caregiving responsibilities, which affects their workforce participation and career progression (Wilkins, 2017).
Additionally, migration has been identified as a potential solution to labour shortages in Australia, but migrant women and men often face additional barriers such as language difficulties, lack of local experience, and non-recognition of overseas qualifications, leading to underemployment or employment in lower-skilled jobs (Australian Government Department of Home Affairs, 2023). Researching gender balance in Australia’s labour force and migration is crucial for identifying the systemic barriers and societal attitudes that contribute to gender imbalances, and for informing policies that promote equal opportunities (Clemens, 2023). Ensuring the full participation of both genders, including migrants, maximises the available talent pool and addresses skill shortages, leading to improved economic outcomes and reduced poverty (Productivity Commission, 2022). Insights into the challenges faced by migrants help inform targeted support services that facilitate their labour market integration (Sharma et al., 2024). This study can therefore help identify gender balance opportunities across Australia and promote a more inclusive and equitable labour force, benefiting both genders and the national economy.
B) The Rise of Small Data
Given the constraints imposed by the limited availability of online data sources, this research project will concentrate on applying forecasting models to small datasets. This focus aligns with the growing recognition in the era of big data of the significant value and importance of “small data”—relatively small samples of qualitative or quantitative data that can provide rich insights (Lindstrom, 2016). While big data focuses on finding patterns across massive datasets, small data allows for a deeper dive into specifics and context.
Small data has several key advantages. It is typically easier, cheaper, and more feasible to collect than big data, especially for resource-constrained organisations (Kitchin & Lauriault, 2014; Hekler et al., 2019). Small datasets also lend themselves better to human analysis and sense-making, rather than relying heavily on algorithms and machine learning models that can miss nuances (Mayer-Schonberger & Cukier, 2013). Small datasets are found in diverse domains. In business, companies use small data from customer surveys, social media comments, and other sources to gain targeted insights into consumer needs, brand perception, and product issues (Bose & Mahapatra, 2001). In healthcare, analysing small datasets such as medical records and patient-reported outcomes can reveal treatment effects, disease progression patterns, and other insights complementary to large clinical trials (Baro et al., 2015). In the social sciences, small qualitative datasets such as interviews, ethnographies, and focus groups remain vital for understanding human behaviours, motivations, and lived experiences (Kitchin, 2014). In government, small datasets about constituents, localities, and specific issues can inform policymaking in a contextual, actionable manner (Verhulst et al., 2019).
Therefore, the size of the dataset available can significantly impact the choice and performance of forecasting models. While large datasets enable more complex models, small datasets require simpler but robust approaches. For example:
1) Large Datasets: Various modelling techniques cater to large datasets, including machine learning and deep learning methods. Machine learning techniques such as neural networks, random forests, and gradient boosting effectively capture complex non-linear patterns for forecasting, offering flexibility in modelling high-dimensional data and intricate interactions between predictors (Cerqueira et al., 2022). Deep learning methods, including recurrent neural networks (RNNs) such as LSTMs and RNN-based models like DeepAR, as well as transformer-based models, demonstrate state-of-the-art performance in large-scale time series forecasting tasks by automatically learning long-range temporal dependencies (Brownlee, 2018). These techniques excel at capturing complex patterns in high-dimensional data, automate much of the modelling process, reduce the need for manual feature engineering, and on massive datasets achieve improved accuracy with reduced overfitting (Cerqueira et al., 2022). However, they also have limitations: forecasting models tailored for large datasets may encounter challenges when applied to smaller ones. Accustomed to abundant inputs, these models may overfit limited samples, where the model’s complexity exceeds the available information (Cerqueira et al., 2022).
2) Small Datasets: When working with limited data, choosing an appropriate forecasting model is crucial to obtain accurate and reliable predictions. Two families of techniques are commonly used for small sample sizes: classical statistical models and Bayesian methods. Classical statistical models such as ARIMA, linear regression, and the Grey Model are commonly favoured; despite lower flexibility, these models offer greater transparency, ease of specification, and reduced susceptibility to overfitting. Alternatively, Bayesian approaches such as Prophet integrate domain knowledge through prior distributions, which can enhance forecast accuracy and provide more reliable uncertainty estimates for small datasets (Taylor & Letham, 2018).
C) Forecasting Models for Small Datasets
In line with the study of diversity prediction using machine learning on small datasets (Moieni et al., 2023), this study opts for forecasting models suited to limited data: GM(1, 1) and GM(2, 1), alongside ARIMA, are selected for the gender balance analysis. The performance of each model is evaluated using the Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), and Mean Absolute Error (MAE), ensuring a comprehensive analysis of model performance in predicting gender balance within the Australian labour market.
The ARIMA model is a classical statistical technique that combines three components: Autoregressive (AR), Integrated (I), and Moving Average (MA). Its general form is written ARIMA(p, d, q), where p is the order of the AR component, d is the degree of differencing required to make the time series stationary, and q is the order of the MA component (Abu-Bakar & Rosbi, 2017). Because of its simplicity, transparency, and reduced susceptibility to overfitting, ARIMA is well suited to small datasets; despite its lower flexibility compared with more complex models, it offers reliable forecasts when working with limited data (Pan, Zhang et al., 2016). GM(1, 1), on the other hand, is a time series forecasting technique that is particularly useful for small datasets and can effectively handle incomplete or uncertain information (Caro et al., 2020). It is based on the accumulation of the original time series and uses a first-order differential equation to describe the behavior of the system (Wang et al., 2018). The model is known for its simplicity, as it requires only a few data points to build; however, it may struggle to capture complex patterns or non-linear relationships in the data. Lastly, GM(2, 1) extends GM(1, 1) to handle data with non-linear characteristics or fluctuations. It uses a second-order differential equation to describe the system’s behavior, allowing more flexibility in capturing non-linear patterns. GM(2, 1) is therefore more complex than GM(1, 1), with additional terms to capture non-linear behavior (Shao et al., 2012), and it requires more data points than GM(1, 1) to estimate the additional parameters.
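The GM(1, 1) procedure described above can be sketched in a few lines of NumPy. This is a minimal sketch of the common textbook formulation, assuming mean background values (weight 0.5) and least-squares estimation of the development coefficient a and the grey input b; the function name `gm11_fit_predict` is hypothetical, and the study's own implementation may differ in such details.

```python
import numpy as np


def gm11_fit_predict(x0, horizon):
    """Fit a textbook GM(1, 1) model to x0; return (fitted, forecast, a, b).

    x1 = cumsum(x0), background values z1(k) = 0.5 * (x1(k) + x1(k - 1)),
    and the grey equation x0(k) + a * z1(k) = b is solved for (a, b)
    by least squares.
    """
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    if n < 4:
        # Common rule of thumb: GM(1, 1) needs at least four observations.
        raise ValueError("GM(1, 1) needs at least 4 observations")

    x1 = np.cumsum(x0)                          # accumulated (1-AGO) sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])               # mean background values
    B = np.column_stack((-z1, np.ones(n - 1)))  # design matrix
    y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, y, rcond=None)

    # Time response of the whitening equation dx1/dt + a * x1 = b (assumes a != 0).
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a

    # Inverse accumulation returns the series to its original scale.
    x0_hat = np.empty_like(x1_hat)
    x0_hat[0] = x0[0]
    x0_hat[1:] = np.diff(x1_hat)
    return x0_hat[:n], x0_hat[n:], a, b
```

Because the fitted curve is a single exponential, the forecast continues whatever trend dominates the training window; this makes the model data-efficient but slow to react to abrupt shifts.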
D) Challenges of Forecasting with Small Datasets
While large datasets have enabled powerful forecasting models like deep learning, many real-world applications involve working with limited data samples. To evaluate the performance and suitability of various forecasting techniques on such limited data, this study employs a small dataset and applies multiple forecasting models for comparison. Forecasting with small datasets poses several key challenges. This literature review examines the key problems associated with forecasting using small datasets and discusses potential strategies to address these challenges.
One prominent issue with small datasets is the increased risk of model overfitting. With fewer data points available for training, forecasting models may capture noise or random fluctuations in the data, leading to overly complex models that perform poorly on unseen data. Overfitting not only compromises the accuracy of forecasts but also undermines the model’s generalisability to new scenarios or data distributions (Hastie et al., 2009). Moreover, small datasets often lack the diversity and representativeness necessary for capturing the full range of variability in the underlying data generating process. This limitation can result in biased or unreliable forecasts, as the model may fail to capture important patterns or relationships present in the broader population (Gelman et al., 2013). Another challenge stems from the limited feature space available in small datasets. Forecasting models rely on a set of input features to make predictions, and small datasets may not encompass all relevant variables or factors influencing the target variable. As a result, the model’s predictive power may be constrained, hindering its ability to generate accurate forecasts (West & Harrison, 2006).
E) Application of Becker’s Formula of Discrimination
Given the constraints of current datasets, this study aims to develop a predictive model using this limited data to forecast future trends in gender balance within the Australian labour market. At this stage, Becker’s discrimination formula (D), as introduced by Becker (1971), serves to quantify taste-based discrimination by employers, encompassing gender biases. Chen (2020) successfully utilised this coefficient (D) to evaluate opportunity equality in both higher education and the labour market, demonstrating its efficacy as a measure of gender balance. This index allows for the assessment of employment opportunity disparities between males and females. By incorporating variables related to overseas migration and the employment market in Australia, alongside the application of Becker’s formula, the study seeks to determine the level of gender discrimination.
F) Development of a Gender Balance (GB) Index
The GB index developed in this study is inspired by the cultural diversity index proposed by Moieni and Mousaferiadis (2022). Applying their methodology to calculate the weight of each attribute allows the GB index to be constructed. This index is designed to quantify the level of discrimination between genders. Forecasting models are then built to identify the most suitable approach for assessing discrimination levels and predicting gender equality in the workplace.
3. Methodology
This section describes how the time series data were collected and transformed. The research focuses mainly on predicting gender equality by sex and state/territory. Accordingly, several factors that contribute to measuring gender equality were considered, including overseas migration by state/territory and sex from 2004 onwards (Australian Bureau of Statistics, n.d.-b) and labour force status by sex and state/territory from 1978 onwards (Australian Bureau of Statistics, n.d.-a). Because the two datasets cover different periods, the analysis uses their common period, 2004 to 2022 (see Appendix).
It was assumed that the count of employees encompasses individuals from diverse backgrounds, including migrants, citizens, and Aboriginal peoples. Becker’s coefficient formula (Chen, 2020) was then applied to obtain DM and DE, which correspond to Overseas Migration Discrimination and Employment Discrimination, respectively (see Equations (1) and (2)). The next stage introduces a GB index. Finally, time series models, namely ARIMA, grey model (1, 1), and grey model (2, 1), were applied to predict gender balance from 2016 to 2019.
A) Becker’s Coefficient Calculations
This section is divided into two steps. The first step is the calculation of Becker’s coefficients. The second step is the interpretation of the coefficients.
This study applied Becker’s Discrimination formula (D) to measure the equality of opportunity in the Australian employment market and overseas migration in Australia.
The Migration Discrimination (DM) was calculated from the total migration figures for males and females across Australian states over various years. For instance, in Victoria in 2004, male (MigM) and female (MigF) migration numbered 43,860 and 40,920, respectively. The proportion MigM/MigF was then calculated for each state and territory by year (see Equation (1)). This yields 17 annual DM values for each state and territory between 2004 and 2022 (2020 and 2021 are excluded as anomalous; see the anomaly detection in Section 4).
Migration Discrimination (DM):
(1)
The Employment Discrimination (DE) was calculated from the total employment figures for males and females across Australian states over various years. For instance, in Victoria in 2004, the male (EmpM) and female (EmpF) employment figures were 15,809,000 and 12,743,700, respectively. The proportion EmpM/EmpF was then calculated for each state and territory by year (see Equation (2)). This yields 17 annual DE values for each state and territory between 2004 and 2022 (2020 and 2021 are excluded as anomalous; see the anomaly detection in Section 4).
Employment Discrimination (DE):
(2)
EmpM = Number of Male employees.
EmpF = Number of Female employees.
Step 2: Interpret DM and DE values
DM and DE range from −1 to 1. A positive DM or DE suggests fewer opportunities for females in overseas migration or employment, respectively. A negative value indicates favouritism towards females in migration or employment opportunities. A value close to 0 implies gender-neutral overseas migration or employment opportunities.
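As a hedged illustration of how such a coefficient behaves, the sketch below assumes a normalised-difference form, D = (M − F)/(M + F), chosen only because it lies in [−1, 1] and follows the sign convention described above; the study's Equations (1) and (2) may be defined differently. The function names and the `tolerance` used for "close to 0" are hypothetical. The example applies the assumed form to the Victoria 2004 counts quoted earlier.

```python
def discrimination_coefficient(male: float, female: float) -> float:
    """Assumed normalised-difference form of D, bounded in [-1, 1].

    Illustrative stand-in only; not necessarily the study's Equation (1) or (2).
    """
    total = male + female
    if total <= 0:
        raise ValueError("male + female must be positive")
    return (male - female) / total


def interpret(d: float, tolerance: float = 0.01) -> str:
    """Map D to the reading given in the text; `tolerance` is hypothetical."""
    if abs(d) <= tolerance:
        return "gender-neutral"
    if d > 0:
        return "fewer opportunities for females"
    return "favours females"


# Victoria, 2004, using the counts quoted in the text.
dm_vic_2004 = discrimination_coefficient(43_860, 40_920)          # DM
de_vic_2004 = discrimination_coefficient(15_809_000, 12_743_700)  # DE
print(f"DM = {dm_vic_2004:.4f} ({interpret(dm_vic_2004)})")
print(f"DE = {de_vic_2004:.4f} ({interpret(de_vic_2004)})")
```

Under this assumed form both Victorian values are positive, which on the reading above points to fewer opportunities for females in 2004, more markedly in employment than in migration.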
B) Gender Balance Index
A definition of gender balance might encompass a variety of factors such as age and nationality. However, the scope of this analysis is shaped by the available data, which determines the specific factors considered.
These factors contribute to overall gender balance in a multicultural country, and gender balance in the workplace is a key factor in organisational success, implying a diverse place to work with inclusive and equal opportunities for both genders. Accordingly, this section shows how the coefficients for each dataset are obtained and introduces a gender balance formulation (Equation (5)).
To integrate the discrimination indices (Equations (1) and (2)) into the forecasting models, they are aggregated into a single index. This involves calculating the proportions of overseas migration and of total employed people across all Australian states over 2004 to 2022, which serve as coefficients in the integrated formula (Equations (3) and (4)). These proportions are based on the total overseas migration (OMigrationT) and the total labour force population (LabourFT) over the same period. The coefficients are calculated as follows:
Overseas Migration Coefficient (PM):
(3)
Employment Coefficient (PE):
(4)
The overseas migration coefficient PM and the employment coefficient PE (Equations (3) and (4)) allow the time series to take both attributes into account at the same time. They provide insights into gender parity in overseas migration and employment trends in Australia; by analysing these indices, organisations can gain valuable insights into the equality of opportunities for males and females in the Australian labour market and in overseas migration in a multicultural country. The “Gender Balance” formula (Equation (5)) therefore measures equality across genders by incorporating PM and PE into the “Gender Balance” index. Because these coefficients aggregate information over all years and all states, they summarise migration and employment, respectively, as a whole.
(5)
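One plausible reading of Equations (3) to (5), assumed here rather than taken from the study, treats PM and PE as the shares of overseas migration and labour force in their combined 2004 to 2022 total, and GB as the PM/PE-weighted sum of DM and DE, in the spirit of the weighted diversity index of Moieni and Mousaferiadis (2022). Under that assumption the weights are non-negative and sum to 1, so GB stays within the same [−1, 1] range as DM and DE. The function names and the example numbers are hypothetical.

```python
def gb_weights(omigration_total: float, labourf_total: float) -> tuple[float, float]:
    """Hypothetical PM and PE: each total's share of the combined total."""
    combined = omigration_total + labourf_total
    if combined <= 0:
        raise ValueError("totals must sum to a positive number")
    return omigration_total / combined, labourf_total / combined


def gender_balance(dm: float, de: float, pm: float, pe: float) -> float:
    """Hypothetical GB index: PM/PE-weighted sum of DM and DE."""
    return pm * dm + pe * de


# Made-up illustrative numbers, not study data.
pm, pe = gb_weights(omigration_total=1_000, labourf_total=3_000)
gb = gender_balance(dm=0.2, de=0.1, pm=pm, pe=pe)
print(pm, pe, gb)
```

Applied year by year to one state's DM and DE series, the same call produces one GB column of Table 1 under this reading.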
The final dataset contains one row per year from 2004 to 2022 and eight columns, one for each Australian state or territory. Table 1 summarises the data after applying the new “Gender Balance” index (Equation (5)).
Table 1. Final dataset after applying the GB index; each column is an Australian state/territory.
Years | NSW | VIC | QLD | SA | WA | TAS | NT | ACT
C) Forecasting Gender Balance Trends through Time Series Analysis
This research endeavors to leverage the recently developed “Gender Balance” index (GB) alongside three distinct time series forecasting methodologies: Autoregressive Integrated Moving Average (ARIMA) (Hyndman & Athanasopoulos, 2018), Grey Model (1, 1) (Julong et al., 1989), and Grey Model (2, 1) (Gan et al., 2015) for projecting GB trends over the next 5 years. To ensure the reliability of the forecasting models, comprehensive adequacy checks are conducted prior to their application. By comparing the three methodologies, the most effective approach has been selected to forecast GB trends for the forthcoming 5 years.
Evaluating the performance of a forecasting model is a pivotal aspect of this study, given its direct impact on decision-making. To quantify the disparity between predicted and actual values, this study applies the most common metrics: Mean Absolute Percentage Error (MAPE) (Hyndman & Athanasopoulos, 2018), Mean Absolute Error (MAE) (Hyndman & Athanasopoulos, 2018), and Mean Squared Error (MSE). Together, these criteria provide a comprehensive assessment of forecasting accuracy and of the performance of the selected models.
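The three metrics follow their standard definitions; a minimal NumPy sketch is shown below. The function name is hypothetical, and expressing MAPE as a percentage is a common convention assumed here rather than a detail confirmed by the text.

```python
import numpy as np


def forecast_errors(actual, predicted):
    """Return MAPE (in %), MAE and MSE between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    err = actual - predicted
    return {
        "MAPE": 100.0 * np.mean(np.abs(err / actual)),  # relative error, in %
        "MAE": np.mean(np.abs(err)),                    # same units as the data
        "MSE": np.mean(err ** 2),                       # squared units
    }
```

MAPE divides by the actual values, so it can become very large when the series being forecast is close to zero, while MSE, being in squared units, weights large errors more heavily than MAE.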
4. Data Analysis and Results
This section encapsulates the findings derived from the research following Exploratory Data Analysis (EDA), examination of GB index, and time series analysis.
A) Anomaly Detection
Using Exploratory Data Analysis (EDA), this study identified outliers in the overseas migration trends that influence the forecast models. The overseas migration plots (Figure 1) highlight two data points, in 2020 and 2021, that deviate from the general trend for both genders. In New South Wales (NSW), male and female overseas migration fell by 56.6% and 56%, respectively, in 2020. Victoria (VIC) saw an even larger drop, of 68% (male) and 67% (female), the steepest decline since the onset of the pandemic. Similarly, Queensland (QLD) reported decreases of 63% and 62.7% for male and female migrants, respectively. These fluctuations are likely attributable to COVID-19 restrictions (Australian Government Department of Health and Aged Care, 2024), under which the number of travellers entering Australia dropped during these two years.
Figure 1. Overseas Migration and Labour Force original data. Two outliers, in 2020 and 2021, appear in both datasets for both genders.
In 2020, the labour force, as depicted in Figure 1, was significantly affected. In New South Wales (NSW), the number of jobs decreased by approximately 2.9% for males and 1.4% for females compared with the preceding year, coinciding with the onset of COVID-19 cases. Female labour force participation in Victoria showed the most pronounced reduction, falling by 3.1%. The predominant cause of this decline was the disruption caused by the COVID-19 pandemic, including enforced shutdowns, widespread business closures, and diminished consumer demand, which together contributed to the contraction in labour force participation.
Consequently, data from 2020 and 2021 were excluded from the statistical calculations: these years are anomalous and could significantly skew the analyses.
B) Trend of Gender Balance Index
Figure 2 illustrates the GB index across Australian states, considering labour force participation and overseas migration, to show gender-based opportunities over time. In the early 2000s, Western Australia (WA) exhibited the highest GB index, suggesting fewer opportunities for females than for males. Since 2016, however, WA’s GB index has declined, narrowing the gap between genders and indicating an increasing presence of females in both the labour force and overseas migration within the state. Furthermore, from 2004 to 2022, New South Wales (NSW), Victoria (VIC), Queensland (QLD), and South Australia (SA) progressively closed the gap in gender-based opportunities, with GB index improvements of approximately 56.5%, 53.3%, 74%, and 65.2%, respectively. In contrast, the Australian Capital Territory (ACT) has had a consistently lower GB index since 2004, suggesting a relatively stable balance of gender-based opportunities over the years. This observation aligns with data from the Australian Bureau of Statistics (2023), which indicate a balanced gender composition in the ACT without a significant prevalence of males over females.
Figure 2. GB index plots across Australian states/territories. The index combines both genders.
5. Data Forecasting
This section comprises two stages. The first stage compares models to determine the optimal forecasting model; it is subdivided into subsections, each presenting the application of one of the three forecasting models. The second stage predicts gender balance over the next five years using the best model identified in the first stage.
A) Model Comparison and Selection
The dataset was partitioned into training data from 2004 to 2015 and testing data from 2016 to 2019. This partition was applied to all three forecasting models: ARIMA, GM(1, 1), and GM(2, 1). Each model was then evaluated on the performance metrics and on their median across all states.
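A minimal pandas sketch of this partition, assuming each state's GB index is held as a Series indexed by year; the helper name and the placeholder values are hypothetical. Dropping 2020 and 2021 first leaves 17 annual values, of which 12 (2004 to 2015) train the models and 4 (2016 to 2019) test them; the 2022 value falls outside both windows.

```python
import pandas as pd

# Years treated as COVID-19 anomalies and excluded from the analysis.
EXCLUDED_YEARS = [2020, 2021]


def split_gb_series(gb: pd.Series) -> tuple[pd.Series, pd.Series]:
    """Drop anomalous years, then split a yearly GB series into train/test."""
    gb = gb.drop(index=EXCLUDED_YEARS, errors="ignore").sort_index()
    train = gb.loc[2004:2015]  # label-based slicing: both ends inclusive
    test = gb.loc[2016:2019]
    return train, test


# Placeholder values, not study data: one GB value per year, 2004 to 2022.
example = pd.Series(range(19), index=range(2004, 2023), dtype=float)
train, test = split_gb_series(example)
print(len(train), len(test))
```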
1) ARIMA model: Using Python’s itertools module, an exhaustive search over the parameters p, d, and q, each ranging from 0 to 5, was conducted to identify the optimal parameters for each state/territory. None of the states/territories exhibited seasonality in their data. Table 2 shows the selected parameters (p, d, q) for each Australian state/territory. For the Australian Capital Territory (ACT), the order (0, 0, 0) corresponds to white noise around a constant mean, meaning that no autocorrelation structure was identified; this alone does not indicate whether the model is effective.
Table 2. ARIMA parameters (p, d, q) for each state/territory.
ARIMA parameter | NSW | VIC | QLD | SA | WA | TAS | NT | ACT
(p, d, q) | (4, 1, 0) | (1, 0, 0) | (0, 1, 0) | (1, 0, 0) | (1, 0, 0) | (4, 0, 0) | (0, 0, 1) | (0, 0, 0)
2) Grey Model (1, 1): Figure 3 compares ARIMA and Grey Model (1, 1) forecasts. GM(1, 1) predictions closely align with actual values, indicating that it captures the underlying trends precisely. There are, however, notable discrepancies between predicted and actual values for Western Australia (WA) and the Northern Territory (NT). One contributing factor is the population dynamics of these jurisdictions, where overseas migration and employment fluctuated notably between 2014 and 2016: WA experienced a substantial decrease of 23.5%, while NT saw a pronounced drop of 30% during these years. These variations significantly affected the accuracy of the forecast models. Mathematically, GM(1, 1) accumulates the original data sequence to generate a new sequence that follows an approximately exponential law (Wang et al., 2020). This works well when changes in the original data are modest, but in scenarios with significant fluctuations, such as those observed in WA and NT during this period, GM(1, 1) may fail to capture the dynamic patterns in the data. The ARIMA model, for its part, was built on only 12 data points. Such a limited dataset raises concerns about the impact of random fluctuations (Hyndman & Athanasopoulos, 2018), as illustrated in Figure 3; these fluctuations can degrade forecasting performance, highlighting the risks of using a small number of data points.
Figure 3. Comparative Analysis of ARIMA, GM(1, 1) to predict GB Index by states/territories from 2016 to 2019.
3) Grey Model (2, 1): Figure 4 shows the application of the Grey Model (2, 1), revealing a conspicuous exponential pattern across all Australian states/territories. This is consistent with the form of the GM(2, 1) model equations. However, GM(2, 1) may struggle to achieve a precise fit because of structural differences between its estimation and prediction equations, and because of the inherent limitations of grey modelling approaches, which may not fully capture the intricacies of real-world data.
Figure 4. Comparative Analysis of ARIMA, GM(1, 1) and GM(2, 1) to predict GB Index by states/territories from 2016 to 2019.
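A corresponding sketch for GM(2, 1) illustrates the structural point above: the parameters a1, a2 and b are estimated from a discrete grey difference equation, but predictions come from the continuous solution of the second-order whitening equation, so the two need not agree. This follows a common textbook formulation; the boundary conditions, the function name `gm21_fit_predict` and the omission of the repeated-root case are assumptions of the sketch, not details taken from the study.

```python
import numpy as np


def gm21_fit_predict(x0, horizon):
    """Fit a textbook GM(2, 1) model to x0; return (fitted, forecast).

    Grey difference equation, for k = 2..n:
        (x0(k) - x0(k - 1)) + a1 * x0(k) + a2 * z1(k) = b
    Whitening equation used for prediction:
        x1'' + a1 * x1' + a2 * x1 = b
    Boundary conditions (an assumed, common choice): x1(1) = x0(1) and
    x1(n) equals the last accumulated observation.
    """
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    if n < 4:
        raise ValueError("need at least 4 observations for 3 parameters")

    x1 = np.cumsum(x0)                 # accumulated (1-AGO) sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])      # mean background values
    dx0 = np.diff(x0)                  # inverse accumulation of x0
    B = np.column_stack((-x0[1:], -z1, np.ones(n - 1)))
    (a1, a2, b), *_ = np.linalg.lstsq(B, dx0, rcond=None)

    # Roots of r^2 + a1*r + a2 = 0; complex roots give oscillating solutions.
    # Assumes distinct roots and a2 != 0 (the repeated-root case is omitted).
    r1, r2 = np.roots([1.0, a1, a2]).astype(complex)
    particular = b / a2

    # Solve for C1, C2 from the boundary conditions at t = 1 and t = n.
    M = np.array([[np.exp(r1), np.exp(r2)],
                  [np.exp(r1 * n), np.exp(r2 * n)]])
    rhs = np.array([x1[0] - particular, x1[-1] - particular], dtype=complex)
    c1, c2 = np.linalg.solve(M, rhs)

    t = np.arange(1, n + horizon + 1)
    x1_hat = (c1 * np.exp(r1 * t) + c2 * np.exp(r2 * t) + particular).real

    # Inverse accumulation returns the series to its original scale.
    x0_hat = np.empty_like(x1_hat)
    x0_hat[0] = x0[0]
    x0_hat[1:] = np.diff(x1_hat)
    return x0_hat[:n], x0_hat[n:]
```

When a characteristic root has a positive real part, the solution grows exponentially beyond the training window, which is one way forecasts with a strong exponential shape can arise.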
4) Model Selection: As depicted in Figure 5, GM(2, 1) produced higher Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Mean Squared Error (MSE) values in New South Wales (NSW), the Australian Capital Territory (ACT) and South Australia (SA) than in the other states. These outcomes reflect the pronounced exponential trend of the GM(2, 1) forecasts shown in Figure 4. Conversely, in the Northern Territory (NT) and New South Wales (NSW), GM(1, 1) produced higher error metrics than ARIMA, potentially because of the model’s sensitivity to abrupt fluctuations in the data. The fluctuations in NSW’s real data, particularly in 2019, may explain this result, since the Grey Model responds poorly to drastic changes. Specifically, the relatively low number of COVID-19 cases within Australia in 2019 affected working hours across various occupations and led to a decline in overseas migration numbers prior to the declaration of the global pandemic (Australian Bureau of Statistics, 2020).
Figure 5. Bar chart comparing the MAPE, MAE and MSE of the ARIMA, GM(1, 1) and GM(2, 1) models.
In Figure 5, the comparison of metrics highlights the broad suitability of the GM(1, 1) model across the majority of states. In particular, the median metrics for each model (Table 3) show that GM(1, 1) achieves the lowest median MAPE, MAE and MSE in predicting the GB index across Australia. Consequently, given its lower metric values and medians, GM(1, 1) emerges as the most favourable choice for forecasting GB trends in Australia.
Table 3. Error metrics by state/territory, with the median value for each model.
| State/Territory | MAPE (%) ARIMA | MAPE (%) GM(1, 1) | MAPE (%) GM(2, 1) | MAE ARIMA | MAE GM(1, 1) | MAE GM(2, 1) | MSE ARIMA | MSE GM(1, 1) | MSE GM(2, 1) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| NSW | 8.082 | 16.082 | 1607872.726 | 0.011 | 0.022 | 2280.085 | 0.013 | 0.023 | 2992.351 |
| VIC | 19.441 | 5.645 | 1454830.008 | 0.027 | 0.008 | 2056.829 | 0.033 | 0.011 | 2687.721 |
| QLD | 22.053 | 19.684 | 41512.257 | 0.021 | 0.019 | 40.206 | 0.024 | 0.020 | 45.272 |
| SA | 48.251 | 24.081 | 4322971.240 | 0.050 | 0.0251 | 4345.863 | 0.053 | 0.027 | 5891.595 |
| WA | 39.047 | 31.347 | 8882.081 | 0.071 | 0.057 | 16.195 | 0.073 | 0.058 | 17.271 |
| TAS | 42.545 | 17.500 | 1448670.712 | 0.039 | 0.016 | 1402.649 | 0.045 | 0.019 | 1826.356 |
| NT | 43.133 | 65.896 | 11984.144 | 0.047 | 0.075 | 13.946 | 0.055 | 0.082 | 14.959 |
| ACT | 105.897 | 58.248 | 3855149.642 | 0.030 | 0.014 | 1582.997 | 0.031 | 0.019 | 2144.968 |
| Median | 40.796 | 21.882 | 1451750.360 | 0.034 | 0.021 | 1492.823 | 0.039 | 0.022 | 1985.662 |
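The metrics in Table 3 and the median-based model selection can be reproduced with a short sketch. The helper functions use the conventional definitions of MAPE (in percent), MAE and MSE, and the final lines recompute the MAPE medians from values transcribed from Table 3 and pick the model with the lowest median. An RMSE helper is also included: RMSE is never smaller than MAE, whereas the MSE of errors well below 1 is smaller than the corresponding MAE, so it may help readers interpret the MSE column, whose entries exceed the matching MAE values. The function names are illustrative.

```python
from statistics import median


def mape(actual, predicted):
    """Mean absolute percentage error in percent (undefined if an actual value is 0)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; always >= MAE for the same errors."""
    return mse(actual, predicted) ** 0.5


# MAPE values transcribed from Table 3, ordered NSW, VIC, QLD, SA, WA, TAS, NT, ACT
table3_mape = {
    "ARIMA": [8.082, 19.441, 22.053, 48.251, 39.047, 42.545, 43.133, 105.897],
    "GM(1, 1)": [16.082, 5.645, 19.684, 24.081, 31.347, 17.500, 65.896, 58.248],
    "GM(2, 1)": [1607872.726, 1454830.008, 41512.257, 4322971.240,
                 8882.081, 1448670.712, 11984.144, 3855149.642],
}

# With eight states, the median is the mean of the 4th and 5th sorted values
medians = {model: median(values) for model, values in table3_mape.items()}
best_model = min(medians, key=medians.get)
print(medians, "->", best_model)
```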
B) Predicting the next 5 years
Grey Model (1, 1) performed best in predicting GB trends from 2016 to 2019, as evidenced by the findings presented in Table 3 and the metrics evaluation depicted in Figure 5. Additionally, Figure 6 indicates a narrowing of the gap between male and female GB projections over the next five years. Significant disparities, however, are observed in NT, where the predictions diverge considerably from real data. This discrepancy can be attributed to the operational mechanism of GM(1, 1), which relies on accumulating the original data sequence. In this instance, the model was trained on only 12 data points that show a distinct upward trend, and it extrapolates future values by following this positive trend, potentially leading to deviations from actual observations.
Figure 6. Projection results for the next five years applying GM(1, 1).
Table 4 presents the forecasted GB index for each state and territory. The Australian Capital Territory (ACT) is projected to come closest to parity by 2026, followed by TAS and QLD, which show promising trajectories towards closing the gap. By contrast, the populous states of NSW and VIC show a concerningly slow pace of progress, highlighting the need for both further research and policy action. These regional differences may be attributable to a multitude of factors, including job opportunities, visa and immigration policies, and cultural and social norms.
Table 4. Forecasted GB trends across Australian states/territories over the next five years.
| Year | NSW | VIC | QLD | SA | WA | TAS | NT | ACT |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| 2022 | 0.147 | 0.127 | 0.101 | 0.118 | 0.230 | 0.092 | 0.225 | 0.042 |
| 2023 | 0.143 | 0.122 | 0.096 | 0.114 | 0.227 | 0.088 | 0.229 | 0.040 |
| 2024 | 0.140 | 0.118 | 0.092 | 0.111 | 0.225 | 0.084 | 0.232 | 0.039 |
| 2025 | 0.136 | 0.114 | 0.088 | 0.108 | 0.222 | 0.080 | 0.236 | 0.038 |
| 2026 | 0.133 | 0.110 | 0.084 | 0.105 | 0.219 | 0.077 | 0.240 | 0.036 |
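Under GM(1, 1), successive forecasts differ by a constant factor e^(-a), so each column of Table 4 can be summarised by an implied annual ratio. The sketch below estimates that ratio from the 2022 and 2026 values (the table is rounded to three decimals, so the results are approximate) and, for a hypothetical target threshold of 0.05 chosen purely for illustration, the number of years of constant-ratio change needed to reach it. A ratio below 1 indicates a shrinking gap, while NT's ratio above 1 reflects the upward extrapolation discussed earlier; note also that pure exponential decay approaches zero without ever reaching it. The helper names are hypothetical.

```python
import math

# GM(1, 1) projections transcribed from Table 4 (2022 and 2026 rows)
gb_2022 = {"NSW": 0.147, "VIC": 0.127, "QLD": 0.101, "SA": 0.118,
           "WA": 0.230, "TAS": 0.092, "NT": 0.225, "ACT": 0.042}
gb_2026 = {"NSW": 0.133, "VIC": 0.110, "QLD": 0.084, "SA": 0.105,
           "WA": 0.219, "TAS": 0.077, "NT": 0.240, "ACT": 0.036}


def implied_annual_ratio(start, end, years=4):
    """Geometric-mean year-on-year ratio, an estimate of e^(-a) in GM(1, 1)."""
    return (end / start) ** (1 / years)


def years_to_threshold(value, ratio, threshold):
    """Years of constant-ratio change until the index falls to `threshold`.

    Returns math.inf when the projected gap is not shrinking (ratio >= 1).
    """
    if value <= threshold:
        return 0.0
    if ratio >= 1:
        return math.inf
    return math.log(threshold / value) / math.log(ratio)


ratios = {s: implied_annual_ratio(gb_2022[s], gb_2026[s]) for s in gb_2022}
pct_change = {s: 100 * (gb_2026[s] / gb_2022[s] - 1) for s in gb_2022}

HYPOTHETICAL_THRESHOLD = 0.05   # illustrative target, not a value used in the study
for state in sorted(ratios, key=ratios.get):
    years = years_to_threshold(gb_2026[state], ratios[state], HYPOTHETICAL_THRESHOLD)
    print(f"{state}: ratio {ratios[state]:.4f}, change {pct_change[state]:+.1f}%, "
          f"years after 2026 to reach {HYPOTHETICAL_THRESHOLD}: {years:.1f}")
```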
6. Conclusion
This study evaluated the performance of three forecasting models—ARIMA, GM(1, 1), and GM(2, 1)—in predicting Gender Balance (GB) trends across Australian states and territories. The dataset consisted of 16 data points, which were partitioned into training data from 2004-2015 and testing data from 2016-2019.
The ARIMA model was optimised through an exhaustive search of parameter combinations, and none of the state/territory series exhibited seasonality. GM(1, 1) closely aligned with actual values, demonstrating its effectiveness in capturing trends. However, discrepancies were observed for WA and NT, likely due to significant population dynamics and fluctuations between 2014 and 2016. By contrast, GM(2, 1) exhibited a conspicuous exponential pattern across all states/territories, in line with its theoretical underpinnings. However, this model struggled to achieve a precise fit, potentially due to structural inconsistencies between the estimation and prediction equations, as well as inherent limitations of grey modelling approaches.
Evaluation of performance metrics revealed the widespread suitability of GM(1, 1), which outperformed the other models in predicting GB across the majority of states. The median metric values further reinforced this model as the most favourable choice for forecasting GB trends in Australia. While GM(1, 1) showed strong results, discrepancies were observed in WA and NT, where the model's reliance on a limited training set of 12 data points led it to extrapolate an upward trend that diverged from actual observations. GB index gap projections for the next five years, obtained with GM(1, 1), underscore the persistent nature of gender inequality in workforce participation across Australia. However, the degree of this disparity varies considerably among states and territories. By 2026, the ACT, TAS and QLD are projected to achieve greater gender parity and diversity in employment opportunities. In the populous states of NSW and VIC, however, the pace of closing this gap appears sluggish. This may be attributable to various factors, including job market dynamics, visa and immigration policies, and cultural and social norms. These factors can significantly influence workforce participation rates, and certain visa categories may impose restrictions on employment opportunities for women, exacerbating gender disparities within the labour market. In conclusion, GM(1, 1) emerged as the best-performing of the three models for predicting GB trends in Australia over the 2016 to 2019 test period, as evidenced by its superior performance across the evaluation metrics. Nevertheless, caution is warranted when applying this model to regions with significant population dynamics and limited data availability, as the case of the Northern Territory demonstrates.
Additionally, the GB index can be applied within a company to measure gender balance and may prove effective provided that a sufficient amount of data is accessible; the coefficient may vary depending on the volume of available data.
7. Limitations and Future Research
This study was limited by data availability, data collection and time constraints. The data scope was confined to the Australian Bureau of Statistics (ABS), the only source providing consistent data classified by gender. This limitation also restricted the number of factors that could be considered, as other data sources were inconsistent with the time frame or the state breakdown, and most were not classified by gender. Moreover, finding consistent state-level data online proved challenging, as the release of state government data is discretionary and is not openly accessible in all cases. For instance, attempts to integrate education data classified by state and gender into the forecasting model were impeded by insufficient data aligned with the other attributes. This highlights the difficulty of incorporating additional factors when measuring gender balance with open-source data, especially across all states and with gender classification.
Future research stands to gain considerable value from the integration of additional attributes such as birth country, language, and Aboriginal and Torres Strait Islander identification. Such endeavours hold promise for yielding deeper insights into gender equality, thereby furnishing companies and organisations with invaluable knowledge to cultivate diverse workplaces characterised by equal opportunities and inclusive cultures.
Acknowledgements
This research was supported by Diversity Atlas and their provision of data has been instrumental in shaping the findings of this study. We thank the entire Diversity Atlas team for their support, although they may not agree with all the interpretations/conclusions of the results.
Appendix
A. GitHub Repository
The datasets used in this study, along with the associated results, are available in the repository Gender Balance Small Dataset at the link below.
This dataset is referenced throughout the paper for analysis and findings.
https://github.com/eloisjr/Small-scale-predictive-analysis-of-gender-balance-in-Australia-using-Grey-models-Integrating-Labour