Jokes Classification
Introduction
Text classification has many commercial applications. For example, in the article Multi-Class Text Classification with Scikit-Learn, Susan Li describes a machine learning method for assigning a new "Consumer Complaint Narrative" to one of 12 categories.
For this project, I used mostly the same methodology to assign a joke to one of 34 categories (according to the Central Comedy page).
Methodology:
The Details:
We have 34 imbalanced classes.
- Some Tf-idf term weighting results:
- Model results:
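For readers unfamiliar with the Tf-idf term weighting mentioned above, here is a minimal pure-Python sketch of the idea. It uses the smoothed idf formula that scikit-learn documents for its TfidfVectorizer, but whether the project kept those defaults is an assumption. The function name `tfidf` and the toy jokes are made up for illustration and do not come from the project's dataset.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using smoothed idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf, ln((1 + n) / (1 + df)) + 1, so no term gets weight zero.
        raw = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # L2-normalise so long and short jokes are comparable.
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        weights.append({term: w / norm for term, w in raw.items()})
    return weights


# Made-up toy jokes, for illustration only (not from the project's data).
toy_jokes = [
    "my wallet is so empty the bank sends me sympathy cards",
    "your haircut is so bad the barber sends sympathy cards",
    "my bank account is so empty it echoes",
]
for doc_weights in tfidf(toy_jokes):
    top = sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print([term for term, _ in top])
```

Terms that appear in only one joke get the highest weights, which is what makes them useful features for separating categories.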
Applying the same approach to 5 balanced classes.
- Exploring some Logistic Regression results:
- The model apparently does not distinguish "money jokes" from "looking good jokes" very well.
- "Dirty jokes" and "insults jokes" are often classified as "miscellaneous jokes", probably because "miscellaneous jokes" has a broad definition. This needs more investigation.
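One way to investigate confusions like the ones above is to count how often each true category is predicted as another. The sketch below is a hedged illustration: the helper `confused_pairs` is a hypothetical name, and the labels are made-up toy data, not the project's results. scikit-learn's `confusion_matrix` gives the same information as a full matrix.

```python
from collections import Counter


def confused_pairs(y_true, y_pred):
    """Count (true label, predicted label) pairs where the prediction was wrong."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)


# Made-up labels for illustration only; these are not the project's results.
y_true = ["money", "money", "dirty", "insults", "looking good", "miscellaneous"]
y_pred = [
    "looking good", "money", "miscellaneous",
    "miscellaneous", "looking good", "miscellaneous",
]

# List the misclassified pairs, most frequent first.
for (true_label, predicted), count in confused_pairs(y_true, y_pred).most_common():
    print(f"{true_label!r} predicted as {predicted!r}: {count}")
```

Sorting the pairs by count points to the category pairs worth reading through by hand first.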
Improvements:
Tuning the models' parameters properly: as we can see in my GitHub repository, the models are flat.
Finding the appropriate metrics to compare the model's performance.
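On the choice of metric, here is a small sketch of why plain accuracy can mislead when classes are imbalanced, and how a macro-averaged F1 score treats every category equally. This is one candidate metric, not necessarily the one the project should settle on. The data is made up for illustration, and the functions `macro_f1` and `accuracy` are hand-written stand-ins (scikit-learn offers `f1_score` with `average="macro"` and `accuracy_score`).

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so every class counts equally."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up imbalanced example: 9 "miscellaneous" jokes and 1 "money" joke,
# scored against a model that always predicts the majority class.
y_true = ["miscellaneous"] * 9 + ["money"]
y_pred = ["miscellaneous"] * 10

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the accuracy looks high even though the "money" category is never predicted, while the macro F1 is pulled down by the missed class.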