Ramya Balakrishnan
6 min read · Jun 7, 2022

Different Techniques in Topic Modeling: LDA, Mallet LDA, STM & HDP

In natural language processing, topic modeling is a type of statistical modeling used to discover abstract topics in a collection of documents. Although several techniques are available for implementing topic models, evaluating them is challenging because training is unsupervised: there is no standard set of output metrics that can be compared across every corpus. Even so, it is important to be able to tell whether a trained model is good or bad and to compare different models and techniques. In this blog, we will explore different techniques and evaluation methods for topic modeling through four of the most popular techniques: LDA, Mallet LDA, STM, and HDP.
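To make the evaluation problem a little more concrete, here is a minimal sketch of one commonly used metric, UMass topic coherence, which scores a topic by how often its top words appear together in the same documents. The toy documents and word lists below are hypothetical and are not taken from the corpus in this study; the function name `umass_coherence` is ours. It is only meant to illustrate the idea of comparing topics numerically, not to reproduce the evaluation used later on.

```python
import math

def umass_coherence(top_words, documents):
    """UMass coherence for one topic.

    top_words: the topic's top words, most probable first.
    documents: list of tokenized documents (lists of strings).
    Sums log((D(w_m, w_l) + 1) / D(w_l)) over every pair where w_l
    ranks above w_m; D counts the documents containing the word(s).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            w_m, w_l = top_words[m], top_words[l]
            denom = doc_freq(w_l)
            if denom == 0:
                raise ValueError(f"{w_l!r} does not occur in the corpus")
            score += math.log((doc_freq(w_m, w_l) + 1) / denom)
    return score

# Hypothetical toy corpus (already tokenized), for illustration only.
docs = [
    ["racism", "justice", "commit", "donate"],
    ["commit", "donate", "fund", "community"],
    ["justice", "racism", "equality"],
    ["fund", "donate", "community"],
]

# Two hypothetical topics: one whose words co-occur often, one that mixes themes.
coherent_topic = ["donate", "community", "fund"]
mixed_topic = ["donate", "racism", "equality"]

coherent_score = umass_coherence(coherent_topic, docs)
mixed_score = umass_coherence(mixed_topic, docs)
print(coherent_score, mixed_score)
```

A higher (less negative) score means the topic's words co-occur more often, so the first topic should score above the second on this toy data. In practice, libraries such as gensim provide coherence implementations (for example `CoherenceModel`), so a hand-rolled version like this is mainly useful for understanding what the number means.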

Problem Statement:

In response to the murder of George Floyd, today’s leaders are starting new and different conversations across their organizations about systemic oppression and accountability. To understand whether corporate EDI statements act as a cosmetic covering, a conversation starter, or a commitment indicator, our research team set out to answer the following question:

  1. What do corporations commit to doing? Are there differences in themes in those commitments?

Data Collection:

We built a web scraping application to collect the statements that Fortune 100 companies and the CEO Action group issued in 2020 in response to systemic racism. We confirmed and analyzed 202 available statements from 228 organizations. The sample count is…
