Bonsai: Optimizing Forum Queries and How To Save It

Gregory Fortunato

Posted on Mar 22, 2019

Project GitHub | LinkedIn: Niki Moritz Hao-Wei Matthew Oren

The skills we demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Recently, I've been trying my hand at a new hobby: Bonsai, the art of cultivating trees in small pots. After some initial success in the humid summer months, I soon found myself staring at the wilted leaves of a small dying tree, and, realizing that I might need some help, searched for advice as to how I might revive my poor tree.

[Image: A healthy bonsai!]

I turned to Bonsai Empire, a large forum with over 6,000 postings, to see what fellow enthusiasts had been discussing. I quickly noticed a large disparity in the number of responses each post received, and decided to scrape the multi-layered forum with a Scrapy spider to investigate which topics were getting the most responses.

With the scraped data in hand, I reduced every word in each post's topic to its stem, so that words with the same root (e.g., "choose" and "choosing") count as the same word, and removed uninformative "stopwords" such as "your" and "is". The result is a list of meaningful topic stems and the responses their posts received.
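The stemming and stopword step can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: the stopword list is a tiny made-up sample, and `crude_stem` is a hypothetical suffix-stripper standing in for a real stemmer (the stems in the tables look like the output of a Porter-style stemmer, but the post does not say which one was used).

```python
import re

# Assumption: a tiny illustrative stopword list; the post does not name the list used.
STOPWORDS = {"a", "an", "and", "are", "for", "how", "i", "in",
             "is", "my", "of", "the", "to", "your"}


def crude_stem(word):
    """Strip one common suffix. A hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "es", "e", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def topic_stems(title):
    """Lowercase a topic title, drop stopwords, and stem the remaining words."""
    words = re.findall(r"[a-z]+", title.lower())
    return [crude_stem(w) for w in words if w not in STOPWORDS]


stems = topic_stems("Is your juniper choosing to die?")
```

With this sketch, "choose" and "choosing" both reduce to the same stem, which is the property the analysis relies on. A real stemmer handles many more cases than these four suffixes.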

Thus, the top ten topics with the most responses are as follows:

STEM	TOTAL RESPONSES
bonsai	1324
tree	749
help	620
new	366
junip	282
ficu	239
thi	219
need	218
elm	214
leav	213

This list is, of course, biased towards topics with the most postings, so we look to the average number of responses for better insight, and obtain the following list of topics:

STEM	AVERAGE RESPONSES
introduc	83.4
wisconsin	55.0
nonsai	54.0
recommendations	48.0
corkscrew	48.0
challenge	46.0
competit	46.0
halp	44.0
concretec	44.0
aggress	44.0
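The total and average tallies above can be computed in one pass over the posts. The sketch below uses invented sample records, and it assumes each stem is counted once per post; the post does not say how repeated words within a single topic were handled.

```python
from collections import defaultdict

# Hypothetical records: (stems found in the topic title, number of responses).
posts = [
    (["help", "junip"], 10),
    (["help", "elm"], 4),
    (["introduc"], 80),
]


def response_stats(posts):
    """Return {stem: (total responses, number of posts, average responses)}."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for stems, responses in posts:
        for stem in set(stems):  # assumption: count each stem once per post
            totals[stem] += responses
            counts[stem] += 1
    return {s: (totals[s], counts[s], totals[s] / counts[s]) for s in totals}


stats = response_stats(posts)
by_total = sorted(stats, key=lambda s: stats[s][0], reverse=True)
by_average = sorted(stats, key=lambda s: stats[s][2], reverse=True)
```

Even in this toy data, a stem that appears in a single well-answered post ("introduc") tops the average ranking, which is the bias the single-post averages point to.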

However, most of these averages are whole numbers, which suggests that they come from single posts, so we filter the list for topics that appear in at least two posts, and obtain the list below:

STEM	AVERAGE RESPONSES
introduc	83.4
competit	46.0
gnarl	41.5
alps	40.0
walmart	36.25
heaven	33.0
guid	30.0
fusion	29.75
monster	29.5
pics	29.0
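The minimum-post filter behind this last table can be sketched as below. The totals and post counts are invented for illustration and are not the scraped values; `top_average` and its parameters are hypothetical names.

```python
# Hypothetical per-stem summary: stem -> (total responses, number of posts).
summary = {
    "introduc": (40, 4),
    "wisconsin": (55, 1),
    "gnarl": (30, 2),
}


def top_average(summary, min_posts=2, n=10):
    """Rank stems by average responses, keeping only stems seen in enough posts."""
    rows = [
        (stem, total / posts)
        for stem, (total, posts) in summary.items()
        if posts >= min_posts
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


ranked = top_average(summary)
```

Raising `min_posts` trades coverage for stability: fewer stems survive, but their averages are less likely to reflect a single unusually popular thread.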

It should be noted that many of the stems above appear in relatively few posts, so their high response counts may simply be anomalies. For the highest likelihood of a response, one should write a topic that includes words near the top of both the total-response and average-response lists.
