What is it?
I came across this technique while working with text data. I was analysing tweets from Twitter and posts from Facebook pages after the Reliance Jio launch.
The analysis involves:
- Data Collection
- Data Cleaning
- Word Cloud creation
- Sentiment Analysis
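The post does not show how the cleaning step was done. A rough sketch could rely on a few regex rules and a small hand-picked stopword list. Both are illustrative assumptions, not the author's code. Cleaning tweets and counting word frequencies (the numbers a word cloud typically sizes words by) could look like this:

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real pipelines use a much fuller list).
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of",
             "in", "on", "for", "it", "this", "my"}


def clean_text(text):
    """Lowercase a tweet/post, strip URLs, @mentions, '#' signs and punctuation,
    then drop stopwords. Returns a list of tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = text.replace("#", " ")              # keep hashtag words, drop the '#'
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # remove punctuation (also drops non-ASCII scripts)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def word_frequencies(texts):
    """Count cleaned tokens across all texts, e.g. as input for a word cloud."""
    counts = Counter()
    for text in texts:
        counts.update(clean_text(text))
    return counts
```

Note that the ASCII-only punctuation rule would also strip Hindi or other non-Latin text, so a real Indian-market dataset might need a different rule.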
After this I wanted to try something else, and while searching online I came across a new topic: "Topic Modelling" and "LDA".
I started reading about it and found that, in layman's terms, topic modelling is a way to find out which topics people are talking about, and latent Dirichlet allocation (LDA) is a statistical algorithm for doing topic modelling.
How Does LDA Work?
It is an unsupervised algorithm, loosely similar to a probabilistic (soft) form of k-means clustering, except that each document can belong to several topics at once. It takes two inputs:
1. The number of topics (k) into which we want to classify the words in each document.
2. A corpus of documents containing words.
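For reference, the standard LDA generative model connects these two inputs as follows. The Dirichlet priors $\alpha$ and $\beta$ are hyperparameters that the description above does not mention. Here $\phi_t$ is topic $t$'s distribution over words, $\theta_d$ is document $d$'s mixture of topics, and $z_{d,n}$ is the topic of the $n$-th word $w_{d,n}$ in document $d$:

```latex
\begin{aligned}
\phi_t &\sim \operatorname{Dirichlet}(\beta), && t = 1, \dots, k \\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), && \text{for each document } d \\
z_{d,n} &\sim \operatorname{Categorical}(\theta_d), && \text{for each word position } n \text{ in } d \\
w_{d,n} &\sim \operatorname{Categorical}(\phi_{z_{d,n}})
\end{aligned}
```

One common way to fit this model, collapsed Gibbs sampling, repeatedly resamples each word's topic from the conditional below. $V$ is the vocabulary size. $n_{d,t}$, $n_{t,w}$ and $n_t$ count the words in $d$ assigned to $t$, the times word $w$ is assigned to $t$, and all words assigned to $t$. Each count excludes the word being resampled:

```latex
P(z_{d,n} = t \mid \text{all other assignments}) \;\propto\;
\left(n_{d,t} + \alpha\right)\,
\frac{n_{t,w_{d,n}} + \beta}{n_t + V\beta}
```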
The algorithm assumes that there are exactly k topics and tries to assign each word to a topic, which in turn gives each document a mixture of topics.
By iterating this assignment many times, it produces the final output: the topics, each described by the bag of words that best defines it.
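The iterate-until-stable process above can be sketched with collapsed Gibbs sampling, one standard way to fit LDA. The post does not say which implementation was used, so this is not the author's code. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count and the toy reviews are illustrative assumptions. Each pass removes a word from the counts and picks a new topic for it. The choice is weighted by how much the document already uses each topic and how strongly that topic favours the word.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, iterations=200, alpha=0.1, beta=0.01, top_n=5, seed=0):
    """Fit LDA with collapsed Gibbs sampling (a minimal, unoptimised sketch).

    docs: list of documents, each a list of word tokens.
    k: number of topics.
    Returns (topics, theta): the top_n words of each topic, and each
    document's topic proportions (each row sums to 1).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * k for _ in docs]                # words in doc d assigned to topic t
    topic_word = [defaultdict(int) for _ in range(k)]  # times word w is assigned to topic t
    topic_total = [0] * k                              # total words assigned to topic t
    assignments = []                                   # topic of the n-th word in doc d

    # Start from a random topic for every word.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            t = rng.randrange(k)
            z_doc.append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        assignments.append(z_doc)

    topic_ids = list(range(k))
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Take the word out of the counts before resampling its topic.
                t = assignments[d][n]
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1

                # Document's use of topic j times topic j's preference for word w.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in topic_ids
                ]
                t = rng.choices(topic_ids, weights=weights)[0]

                # Put the word back under its newly sampled topic.
                assignments[d][n] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1

    # The "bag of words" for each topic: its most frequently assigned words.
    topics = []
    for t in topic_ids:
        ranked = sorted(topic_word[t].items(), key=lambda item: item[1], reverse=True)
        topics.append([w for w, count in ranked[:top_n] if count > 0])

    # Smoothed topic proportions for each document.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + k * alpha) for t in topic_ids]
        for d, doc in enumerate(docs)
    ]
    return topics, theta


if __name__ == "__main__":
    # Made-up toy reviews, only to show the call; not the Jio dataset.
    reviews = [
        "airtel vodafone idea tariff airtel".split(),
        "speed download internet fast speed".split(),
        "vodafone idea airtel switching".split(),
        "download fast internet speed watching".split(),
    ]
    topics, theta = lda_gibbs(reviews, k=2)
    for i, words in enumerate(topics, 1):
        print(f"Topic{i}:", words)
```

Library implementations use the same inputs (k and a tokenised corpus) but are far faster. This version trades speed for readability.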
I applied this to Reliance Jio user review data and found that people are mostly talking about:
Topic 1: Competitors, 30% of people (words such as Vodafone, Airtel, Idea)
Topic 2: Amazing Internet, 14% of people (words such as jiodigitallife, internet, download, myjio)
Topic 3: Super Fast Speed, 26% of people (words such as super fast, watching, switching, speed)
Topic 4: Jio Launch, 30% of people (words such as India, Mukesh, Launch, Tariff)
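The post does not say how the percentages were derived. One plausible approach would be to assign each review to its most probable topic and report each topic's share of reviews. The sketch below assumes `theta` holds per-review topic proportions, such as those an LDA fit returns. The function name `dominant_topic_shares` is hypothetical.

```python
from collections import Counter


def dominant_topic_shares(theta):
    """Assign each document to its most probable topic and return the
    percentage of documents per topic. theta: one row of topic proportions
    per document (hypothetical input format)."""
    if not theta:
        return {}
    dominant = [max(range(len(row)), key=row.__getitem__) for row in theta]
    counts = Counter(dominant)
    return {t: 100.0 * counts[t] / len(theta) for t in range(len(theta[0]))}
```

An alternative is to average each topic's proportion across all reviews. That credits reviews that mix several topics, whereas the dominant-topic count gives each review to a single topic.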