Jokes Classification

Felipe da Silva Santos

Posted on May 20, 2018

Introduction

Text classification has many commercial applications. For example, in the article "Multi-Class Text Classification with Scikit-Learn", Susan Li describes a machine learning method that assigns each new "Consumer Complaint Narrative" to one of 12 categories.

For this project, I used largely the same methodology to assign each joke to one of 34 categories (as defined on the Central Comedy page).
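Below is a minimal sketch of that methodology: tf-idf features fed into a linear classifier. The jokes, the three category names, and the choice of scikit-learn's LogisticRegression are hypothetical stand-ins for illustration. This is not the project's actual code or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: (joke text, category) pairs standing in for the scraped jokes.
jokes = [
    ("I told my wallet a joke but it had no cents", "money jokes"),
    ("My bank account and I are taking a break", "money jokes"),
    ("Money talks, but mine only ever says goodbye", "money jokes"),
    ("My mirror and I agree that I look great today", "looking good jokes"),
    ("I am so good looking my reflection asks for selfies", "looking good jokes"),
    ("Good looks run in my family, then they ran away", "looking good jokes"),
    ("The scarecrow won an award for being outstanding in his field", "miscellaneous jokes"),
    ("I used to play piano by ear, now I use my hands", "miscellaneous jokes"),
    ("Parallel lines have so much in common, shame they never meet", "miscellaneous jokes"),
]
texts, labels = zip(*jokes)

# Tf-idf turns each joke into a weighted bag of words and bigrams;
# the linear model then learns one weight vector per category.
model = make_pipeline(
    TfidfVectorizer(sublinear_tf=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Classify a new, unseen joke.
prediction = model.predict(["My wallet is empty again, no money left"])
print(prediction)
```

Wrapping the vectorizer and the classifier in one pipeline ensures new jokes go through exactly the same tf-idf transformation that was fitted on the training set.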

Methodology:

The Details:

We have 34 imbalanced classes.
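One way to quantify the imbalance, and to compensate for it, is sketched below. The label list is made up, and the post does not say whether class weighting was used. scikit-learn's compute_class_weight with class_weight="balanced" gives each class the weight n_samples / (n_classes * class_count), the same weighting that estimators such as LogisticRegression apply when passed class_weight="balanced".

```python
from collections import Counter

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels; the real dataset has 34 categories of very different sizes.
labels = ["miscellaneous"] * 6 + ["money"] * 3 + ["insults"] * 1

counts = Counter(labels)
print(counts.most_common())

# Largest class size divided by smallest class size: a simple measure of imbalance.
imbalance_ratio = max(counts.values()) / min(counts.values())
print("imbalance ratio:", imbalance_ratio)

# "balanced" weights = n_samples / (n_classes * count_per_class),
# so rare classes get proportionally larger weights during training.
classes = np.array(sorted(counts))
weights = compute_class_weight(class_weight="balanced", classes=classes, y=np.array(labels))
print(dict(zip(classes, weights)))
```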

Some tf-idf term weighting results:

Model results:
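A cross-validated comparison of several classifiers could look like the sketch below. The four models mirror those compared in Susan Li's article. Whether this project used the same set is an assumption, and the toy corpus is hypothetical. Keeping TfidfVectorizer inside the pipeline refits the vocabulary and idf weights on each training fold, so the held-out fold never leaks into the features.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus, three jokes per category (real data is far larger).
texts = [
    "I told my wallet a joke but it had no cents",
    "My bank account and I are taking a break",
    "Money talks, but mine only ever says goodbye",
    "My mirror and I agree that I look great today",
    "I am so good looking my reflection asks for selfies",
    "Good looks run in my family, then they ran away",
    "The scarecrow won an award for being outstanding in his field",
    "I used to play piano by ear, now I use my hands",
    "Parallel lines have so much in common, shame they never meet",
]
labels = ["money jokes"] * 3 + ["looking good jokes"] * 3 + ["miscellaneous jokes"] * 3

models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "LinearSVC": LinearSVC(),
    "MultinomialNB": MultinomialNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

mean_accuracy = {}
for name, clf in models.items():
    pipeline = make_pipeline(TfidfVectorizer(sublinear_tf=True), clf)
    # cv=3 uses stratified folds for classifiers, so each fold holds one joke per class here.
    scores = cross_val_score(pipeline, texts, labels, cv=3, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```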

Applying the models to 5 balanced classes:
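The post does not say how the 5 classes were chosen or balanced. One common option is random undersampling: keep only the selected categories and draw the same number of jokes from each, equal to the size of the smallest one. The sketch below shows that option with a hypothetical DataFrame and hypothetical category names.

```python
import pandas as pd

# Hypothetical DataFrame of scraped jokes: one row per joke with its category.
categories = (
    ["misc"] * 8 + ["money"] * 5 + ["dirty"] * 4
    + ["insults"] * 3 + ["looking good"] * 3 + ["blonde"] * 2
)
df = pd.DataFrame({
    "joke": [f"joke {i}" for i in range(len(categories))],
    "category": categories,
})

# Hypothetical choice of 5 classes to keep.
chosen = ["misc", "money", "dirty", "insults", "looking good"]
subset = df[df["category"].isin(chosen)]

# Undersample: draw n_min jokes at random from each chosen class.
n_min = subset["category"].value_counts().min()
balanced = subset.groupby("category").sample(n=n_min, random_state=0)
print(balanced["category"].value_counts())
```

Undersampling discards data from the larger classes. With few jokes per category, oversampling or class weights are alternatives worth comparing.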

Exploring some Logistic Regression results:
- The model apparently does not differentiate "money jokes" from "looking good jokes" very well.
- "dirty jokes" and "insults jokes" are often classified as "miscellaneous jokes", probably because the "miscellaneous jokes" category is broadly defined. This needs more investigation.
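Confusions like these are usually read off a confusion matrix. The sketch below builds one from out-of-fold predictions (cross_val_predict), so every joke is scored by a model that never trained on it. It then lists the off-diagonal cells, showing which true category was predicted as which. The corpus and the tf-idf + LogisticRegression pipeline are hypothetical stand-ins.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, three jokes per category.
texts = [
    "I told my wallet a joke but it had no cents",
    "My bank account and I are taking a break",
    "Money talks, but mine only ever says goodbye",
    "My mirror and I agree that I look great today",
    "I am so good looking my reflection asks for selfies",
    "Good looks run in my family, then they ran away",
    "The scarecrow won an award for being outstanding in his field",
    "I used to play piano by ear, now I use my hands",
    "Parallel lines have so much in common, shame they never meet",
]
labels = ["money jokes"] * 3 + ["looking good jokes"] * 3 + ["miscellaneous jokes"] * 3
classes = sorted(set(labels))

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
# Out-of-fold predictions: each joke is predicted by a model that never saw it.
predicted = cross_val_predict(pipeline, texts, labels, cv=3)
cm = confusion_matrix(labels, predicted, labels=classes)

# Rows are true classes, columns are predicted classes; off-diagonal cells are errors.
for i, true_class in enumerate(classes):
    for j, predicted_class in enumerate(classes):
        if i != j and cm[i, j] > 0:
            print(f"{cm[i, j]} '{true_class}' joke(s) predicted as '{predicted_class}'")
```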

Improvements:

Tuning the models' parameters properly: as can be seen on my GitHub, the models are still flat (untuned).
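Below is a minimal tuning sketch with scikit-learn's GridSearchCV, assuming a tf-idf + LogisticRegression pipeline. The grid values and toy data are illustrative guesses, not settings from the project. Parameters of pipeline steps are addressed as <step name>__<parameter>.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus, three jokes per category.
texts = [
    "I told my wallet a joke but it had no cents",
    "My bank account and I are taking a break",
    "Money talks, but mine only ever says goodbye",
    "My mirror and I agree that I look great today",
    "I am so good looking my reflection asks for selfies",
    "Good looks run in my family, then they ran away",
    "The scarecrow won an award for being outstanding in his field",
    "I used to play piano by ear, now I use my hands",
    "Parallel lines have so much in common, shame they never meet",
]
labels = ["money jokes"] * 3 + ["looking good jokes"] * 3 + ["miscellaneous jokes"] * 3

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical search space over both the features and the classifier.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__sublinear_tf": [False, True],
    "clf__C": [0.1, 1.0, 10.0],
}

# Macro F1 weights every category equally, which suits imbalanced classes.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1_macro")
search.fit(texts, labels)
print(search.best_params_)
print(f"best macro F1: {search.best_score_:.3f}")
```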

Finding appropriate metrics to compare the models' performance.
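With imbalanced classes, plain accuracy can look good while rare categories are ignored entirely. Take a small worked example with made-up labels: a classifier that always predicts "miscellaneous" on 8 miscellaneous jokes and 2 dirty jokes scores 0.80 accuracy. Its per-class F1 is 2 * 0.8 * 1 / (0.8 + 1) ≈ 0.889 for "miscellaneous" and 0 for "dirty". The macro-averaged F1 (the unweighted mean over classes) is therefore about 0.44.

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

# Made-up imbalanced example: a lazy classifier that always predicts "miscellaneous".
y_true = ["miscellaneous"] * 8 + ["dirty"] * 2
y_pred = ["miscellaneous"] * 10

accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 scores the never-predicted "dirty" class as 0 instead of warning.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy: {accuracy:.2f}")  # 0.80
print(f"macro F1: {macro_f1:.2f}")  # 0.44
print(classification_report(y_true, y_pred, zero_division=0))
```

The per-class precision and recall in the report show where a model fails, which a single accuracy number hides.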

NYC Data Science Academy
