Mind nutrition station: 100 Best books of all time

Miaozhi Yu

Posted on Aug 22, 2016

If you are a bookworm, you have probably heard of the website Goodreads. Goodreads is an Amazon company and "social cataloging" website founded in December 2006 and launched in January 2007 by Otis Chandler, II, a software engineer and entrepreneur, and Elizabeth Chandler. (Otis Chandler II is the grandson of Otis Chandler, who was the publisher of the Los Angeles Times, and is the great-great-great grandson of Harrison Gray Otis, the newspaper's founder.) The website allows individuals to freely search Goodreads' extensive user-populated database of books, annotations, and reviews. Users can sign up and register books to generate library catalogs and reading lists. They can also create their own groups of book suggestions, surveys/polls, blogs, and discussions. By July 2012, the site reported 10 million members, 20 million monthly visits, and 30 employees. On July 23, 2013, Goodreads announced on its website that the user base had grown to 20 million members, having doubled in close to 11 months.

Based on its members' reviews, Goodreads has compiled a list of the 100 best books of all time (see http://www.goodreads.com/list/show/9440.100_Best_Books_of_All_Time_The_World_Library_List). In this project, I scraped information about each book on the list (including title, author, publish date, pages, cover image, type, etc.) along with Goodreads members' reviews of it. The packages I used for web scraping are Beautiful Soup and Selenium.
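A minimal sketch of the list-page step follows. It uses the standard library's `html.parser` in place of Beautiful Soup so it runs without extra packages, and it assumes (this post does not confirm it) that each title sits in an `<a class="bookTitle">` link and each author in an `<a class="authorName">` link; the real page markup may differ. The `ListPageParser` class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class ListPageParser(HTMLParser):
    """Collect (title, author, href) rows from a list page.

    ASSUMPTION: titles live in <a class="bookTitle" href=...> and authors in
    <a class="authorName">, each possibly wrapping a <span> with the text.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # finished (title, author, href) tuples
        self._field = None    # "title" or "author" while inside a matching <a>
        self._current = {}    # the row being built

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "bookTitle" in classes:
            self._field = "title"
            self._current = {"href": attrs.get("href", ""), "title": "", "author": ""}
        elif "authorName" in classes and self._current:
            # Only the first author after a title is kept.
            self._field = "author"

    def handle_endtag(self, tag):
        if tag == "a" and self._field:
            if self._field == "author":
                c = self._current
                # Collapse newlines/indentation inside the link text.
                title = " ".join(c["title"].split())
                author = " ".join(c["author"].split())
                self.rows.append((title, author, c["href"]))
                self._current = {}
            self._field = None

    def handle_data(self, data):
        if self._field:
            self._current[self._field] += data


# Made-up sample markup, not copied from Goodreads.
html = """
<td><a class="bookTitle" href="/book/show/example-odyssey">
      <span itemprop="name">The Odyssey</span></a>
    by <a class="authorName" href="/author/show/example-homer"><span itemprop="name">Homer</span></a></td>
"""
parser = ListPageParser()
parser.feed(html)
print(parser.rows)  # [('The Odyssey', 'Homer', '/book/show/example-odyssey')]
```

Each `href` can then be followed to the book's own page to collect the detailed fields.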


Methodology:

First, I scraped basic information for each book, including title, author, pages, publish date, rating score, number of ratings, language, type, and country. The earliest book on the list, the Odyssey, dates back to around 800 BC, while the latest was published in the 20th century. The longest book runs to thousands of pages, while the shortest is around 80 pages.
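Scraped dates and page counts arrive as strings, so they have to become numbers before "earliest" or "shortest" can be computed. The sketch below assumes (hypothetically) that a date string ends in a year optionally followed by "BC", and that a page count looks like "1,178 pages"; `parse_year` and `parse_pages` are illustrative helpers, not the original code.

```python
import re


def parse_year(text):
    """Turn a date string into a signed year; BC years become negative.

    ASSUMPTION: the string ends in a 1-4 digit year, optionally followed by
    "BC", e.g. "800 BC", "1605", "first published March 1925".
    """
    match = re.search(r"(\d{1,4})\s*(BC|B\.C\.)?\s*$", text.strip(), re.IGNORECASE)
    if match is None:
        return None
    year = int(match.group(1))
    return -year if match.group(2) else year


def parse_pages(text):
    """Extract a page count, e.g. "Paperback, 1,178 pages" -> 1178."""
    match = re.search(r"(\d[\d,]*)\s+pages", text)
    return int(match.group(1).replace(",", "")) if match else None


dates = ["800 BC", "first published 1605", "1925"]  # made-up sample strings
years = [parse_year(d) for d in dates]
print(min(years), max(years))  # -800 1925
```

Storing BC years as negative integers keeps ordinary numeric sorting and `min`/`max` correct across the whole 800 BC to 20th-century span.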

Then I used Selenium to scrape 20 pages of reviews for each book.
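The paging logic can be separated from the browser so it is testable offline. In this sketch, `fetch_page(book_url, page)` is a hypothetical callable; in a Selenium scraper it would load one page of reviews and return their texts. The loop caps collection at 20 pages and stops early if a book has fewer pages of reviews.

```python
from typing import Callable, List


def collect_reviews(
    book_url: str,
    fetch_page: Callable[[str, int], List[str]],
    max_pages: int = 20,
) -> List[str]:
    """Gather up to `max_pages` pages of review text for one book.

    `fetch_page` is HYPOTHETICAL: a browser-driven function that returns the
    review texts on page `page`, or an empty list when there are no more.
    """
    reviews: List[str] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(book_url, page)
        if not batch:  # ran out of review pages before the cap
            break
        reviews.extend(batch)
    return reviews


def fake_fetch(url, page):
    # Stand-in for the browser: 3 pages of reviews, 2 reviews per page.
    return [f"review {page}-{i}" for i in range(2)] if page <= 3 else []


print(len(collect_reviews("/book/show/example", fake_fetch)))  # 6
```

Passing the fetcher in as a parameter keeps the 20-page cap and early-stop rule independent of how the browser is driven.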

Shiny App:

Based on the dataset scraped from the website, I created the following Shiny app. It focuses on three things:

1. How do literary hot spots change over the centuries?
2. Recommending books from the list to readers based on their choices of publish date, type, and country.
3. Exploring the dataset itself.
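The recommendation step amounts to filtering on the reader's choices and ranking what remains. The app itself is written in Shiny, so this is only a Python sketch of that filter under assumptions: publish date is a year range, type and country are exact matches (`None` meaning "any"), and ranking by rating is an illustrative choice. The `Book` fields and `recommend` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Book:
    title: str
    year: int      # signed year; BC dates are negative
    genre: str     # the scraped "type" field
    country: str
    rating: float


def recommend(
    books: List[Book],
    year_from: int,
    year_to: int,
    genre: Optional[str] = None,
    country: Optional[str] = None,
    top_n: int = 5,
) -> List[Book]:
    """Return up to `top_n` books matching the choices, highest rating first."""
    hits = [
        b for b in books
        if year_from <= b.year <= year_to
        and (genre is None or b.genre == genre)
        and (country is None or b.country == country)
    ]
    return sorted(hits, key=lambda b: b.rating, reverse=True)[:top_n]


# Made-up sample records, not the scraped data.
books = [
    Book("A", 1850, "Novel", "France", 4.0),
    Book("B", 1900, "Novel", "France", 4.3),
    Book("C", 1900, "Poetry", "France", 4.5),
    Book("D", -800, "Poetry", "Greece", 3.9),
]
print([b.title for b in recommend(books, 1800, 1950, genre="Novel")])  # ['B', 'A']
```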

Based on the interactive map, we can see that the literary hot spots start in Europe, and after the Renaissance, literature becomes even more prosperous there. Over time, the hot spots spread to other continents, including Asia, South America, and North America.
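A map of "hot spots over centuries" needs, underneath, a count of books per century and country. This is a hedged Python sketch of that aggregation, assuming signed years (BC negative) and one country per book; `century_start`, `hot_spots`, and `top_country_by_century` are hypothetical helpers, and the records are made up.

```python
from collections import Counter


def century_start(year):
    """Bucket a signed year by floor division: 1605 -> 1600, -800 -> -800."""
    return (year // 100) * 100


def hot_spots(records):
    """Count books per (century bucket, country); records are (year, country)."""
    return Counter((century_start(year), country) for year, country in records)


def top_country_by_century(records):
    """For each century bucket, the country with the most books (first wins ties)."""
    best = {}
    for (century, country), n in hot_spots(records).items():
        if century not in best or n > best[century][1]:
            best[century] = (country, n)
    return best


records = [  # made-up sample data
    (-800, "Country A"),
    (1605, "Country B"),
    (1615, "Country B"),
    (1620, "Country C"),
    (1925, "Country D"),
]
print(top_country_by_century(records)[1600])  # ('Country B', 2)
```

Floor division keeps BC years in consistent 100-year buckets, so the same grouping works across the whole timeline.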

The second dashboard recommends books to users based on their choices. Clicking on a book opens a small pop-up window containing a word cloud generated from that book's reviews.
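A word cloud is driven by word frequencies, which are the sizes the renderer uses. This Python sketch shows only the counting step, not the app's own implementation: it lower-cases the reviews, splits them into words, and skips a small stop-word list. The stop-word list and `word_frequencies` are hypothetical; a real word cloud would use a fuller list.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the app's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "this", "in", "was"}


def word_frequencies(reviews, top_n=50):
    """Count lower-cased words across one book's reviews, skipping stop words
    and words of two letters or fewer; returns the `top_n` (word, count) pairs."""
    counts = Counter()
    for review in reviews:
        for word in re.findall(r"[a-z']+", review.lower()):
            if word not in STOP_WORDS and len(word) > 2:
                counts[word] += 1
    return counts.most_common(top_n)


print(word_frequencies(["The story is beautiful", "A beautiful, sad story"]))
```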

You can also explore the dataset on the third dashboard.

The link to the Shiny app is here: https://yumiaomiao0908.shinyapps.io/wiki_app/

Please feel free to play with it and leave any comments.

About Author

Miaozhi Yu

Miaozhi recently received her Master’s degree in Mathematics from New York University. Before that she received a Bachelor’s Degree in both Mathematics and Statistics with a minor in Physics from UIUC. Her research interests lie in random graphs...
