Topic Modelling

What is it?

I came across this technique while working with text data. I was analysing tweets from Twitter and Facebook page posts published after the Reliance Jio launch.

The analysis involved:

  1. Data Collection
  2. Data Cleaning
  3. Word Cloud creation
  4. Sentiment Analysis
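
To give an idea of what the data cleaning step can look like, here is a minimal Python sketch. It is not the code used in this analysis: the `clean_post` function, the tiny `STOP_WORDS` set and the sample post are all made up for illustration, and a real project would likely use a fuller stop-word list.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "it", "this"}

def clean_post(text):
    """Lower-case a tweet or post and return its content words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters only ('#jio' becomes 'jio')
    # Drop stop words and very short tokens.
    return [w for w in text.split() if len(w) > 2 and w not in STOP_WORDS]

# Made-up sample post.
print(clean_post("Loving the #Jio speed!! @friend see https://t.co/abc"))
```

The cleaned word lists can then feed the word cloud, the sentiment analysis, and later the topic model.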

After this I wanted to try something else, and while searching the net I found something new to me: “Topic Modelling” and “LDA”.

I started reading about it and found that, in layman’s terms, Topic Modelling is a way to find out which topics people are talking about, and Latent Dirichlet Allocation (LDA) is a statistical algorithm for doing Topic Modelling.

How does LDA work?

It is an unsupervised algorithm, loosely comparable to a probabilistic version of k-means clustering. It takes two inputs:
1. The number of topics (k) into which we want to classify the words in each document.
2. A corpus of documents containing words.

The algorithm assumes that there are exactly k topics. It treats each document as a mixture of topics and each topic as a distribution over words, and tries to assign documents and words to topics accordingly.

[Figure: topic modelling]

By iterating this many times, it gives the final output: the topics and, for each topic, the bag of words that best defines it.
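
The iteration described above can be sketched in plain Python. This is only a rough illustration, assuming collapsed Gibbs sampling as the fitting method (one common way to fit LDA, not necessarily what any particular library or this analysis used). The function names `lda_gibbs` and `top_words`, the default values of `alpha` and `beta`, and the toy documents are all assumptions made for the example.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, k, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists, k: number of topics.
    Returns (doc_topic counts, topic_word counts, vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]                 # tokens of doc d in topic t
    topic_word = [defaultdict(int) for _ in range(k)]   # count of word w in topic t
    topic_total = [0] * k                               # all tokens in topic t
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            t = rng.randrange(k)
            zs.append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assignments[d][i]
                # Take the token out of the counts before re-sampling it.
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # Weight of topic j: how much the document uses j
                # times how much j uses this word.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta) / (topic_total[j] + beta * v)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return doc_topic, topic_word, vocab

def top_words(topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(counts, key=counts.get, reverse=True)
        result.append([w for w in ranked if counts[w] > 0][:n])
    return result

# Toy corpus, made up for illustration.
docs = [
    ["airtel", "vodafone", "idea"],
    ["speed", "fast", "download"],
    ["vodafone", "airtel", "idea"],
    ["fast", "speed", "download"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, k=2)
print(top_words(topic_word))
```

Each pass re-assigns every word to a topic based on the current counts, so the topics sharpen as the iterations go on. On real data a library implementation would be the more practical choice; the sketch is only meant to show the two inputs (k and the corpus) and the two outputs (topics per document, words per topic).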

I applied this to Reliance Jio user review data and found that people were mostly talking about:

Topic 1: Competitors (words such as Vodafone, Airtel, Idea; 30% of people)

Topic 2: Amazing Internet (words such as jiodigitallife, internet, download, myjio; 14% of people)

Topic 3: Super Fast Speed (words such as super fast, watching, switching, speed; 26% of people)

Topic 4: Jio Launch (words such as India, Mukesh, Launch, Tariff; 30% of people)
