Semantic Diversification in Equity Portfolios
ABSTRACT
In the race to harness the power of Artificial Intelligence (AI) in virtually every field, researchers and practitioners face an ever-increasing supply of novel tools that have not undergone domain-specific testing. This paper informs the methodological choices of researchers in economics and finance by comparing the performance of three Natural Language Processing (NLP) methods on an important task: using text analysis for portfolio diversification. Portfolio management can benefit from analysing text data in the form of company descriptions, since the returns of companies with similar descriptions tend to be correlated; consequently, portfolios of dissimilar companies should have lower risk. In this paper, three NLP methods are used to construct so-called minimum semantic concentration portfolios, which are designed to leverage the semantic diversity of the constituent companies' business descriptions to reduce portfolio volatility. Two widely used large language models (BERT and GPT) and an alternative AI solution inspired by neuroscience, called semantic fingerprinting, are tested on their ability to meaningfully compare the business descriptions of the constituents of the S&P 500 and, separately, of the Europe 600, in order to derive actionable investment insights. The results show that all three NLP methods are able to extract relevant information from company descriptions: the minimum semantic concentration portfolios have significantly lower volatility than portfolios constructed with randomly chosen weights. While no NLP method can claim absolute superiority over its peers, semantic fingerprinting appears to be the most consistent and robust performer, whereas BERT and GPT demonstrate not only their potential but also a caveat: their performance is volatile even across very similar tasks.
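The abstract does not spell out the optimisation behind a minimum semantic concentration portfolio, so the following is only a minimal Python sketch under stated assumptions, not the paper's method. It assumes each company description has already been turned into a vector by some NLP model (BERT, GPT or a semantic-fingerprint representation; that step is not shown), that semantic concentration is the quadratic form w'Sw over the cosine-similarity matrix S of those vectors (by analogy with a minimum-variance portfolio), and that weights are long-only and sum to one. All function names are hypothetical, and the random-weight baseline (uniform Dirichlet draws) is an illustrative choice that may differ from the one used in the paper.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between company embeddings (one row per company).

    Assumes no all-zero rows, otherwise the normalisation divides by zero.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T


def semantic_concentration(weights: np.ndarray, sim: np.ndarray) -> float:
    """Hypothetical concentration measure: w' S w, the weight-averaged pairwise
    similarity (self-pairs included)."""
    return float(weights @ sim @ weights)


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]  # last index where the condition holds
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def min_semantic_concentration_weights(sim: np.ndarray, n_iter: int = 5000) -> np.ndarray:
    """Long-only, fully invested weights minimising w' S w by projected gradient descent."""
    n = sim.shape[0]
    w = np.full(n, 1.0 / n)  # start from equal weights
    # Step size 1/L, where L = 2 * largest eigenvalue of S is the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(sim).max())
    for _ in range(n_iter):
        grad = 2.0 * sim @ w
        w = project_to_simplex(w - step * grad)
    return w


def random_long_only_weights(n: int, rng: np.random.Generator) -> np.ndarray:
    """Illustrative random baseline: weights drawn uniformly from the simplex."""
    return rng.dirichlet(np.ones(n))


def portfolio_volatility(returns: np.ndarray, weights: np.ndarray) -> float:
    """Sample standard deviation of portfolio returns; `returns` has shape (T, n)."""
    return float(np.std(returns @ weights, ddof=1))


if __name__ == "__main__":
    # Toy embeddings: companies 0 and 1 have identical descriptions, 2 and 3 are unrelated.
    emb = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    sim = cosine_similarity_matrix(emb)
    w = min_semantic_concentration_weights(sim)
    w_rand = random_long_only_weights(len(w), np.random.default_rng(0))
    print("optimised weights:", np.round(w, 4))
    print("concentration (optimised vs random):",
          semantic_concentration(w, sim), semantic_concentration(w_rand, sim))
```

On the toy input, the two companies with identical descriptions end up sharing about one third of the total weight between them, because the optimiser spreads weight equally across the three distinct "themes". For a real universe one would substitute the actual embedding matrix and evaluate `portfolio_volatility` on historical returns; a general-purpose quadratic-programming solver could replace the hand-written projected gradient loop.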
Pungulescu, C. (2025) Semantic Diversification in Equity Portfolios. Theoretical Economics Letters, 15, 187-198. doi: 10.4236/tel.2025.151011.