Mind nutrition station: 100 Best books of all time

Miaozhi Yu
Posted on Aug 22, 2016

If you are a bookworm, you have probably heard of Goodreads. Goodreads is an Amazon company and a "social cataloging" website, founded in December 2006 and launched in January 2007 by Otis Chandler II, a software engineer and entrepreneur, and Elizabeth Chandler. (Otis Chandler II is the grandson of Otis Chandler, who was the publisher of the Los Angeles Times, and is the great-great-great grandson of Harrison Gray Otis, the newspaper's founder.) The website lets individuals freely search its extensive user-populated database of books, annotations, and reviews. Users can sign up and register books to generate library catalogs and reading lists. They can also create their own groups of book suggestions, surveys/polls, blogs, and discussions. By July 2012, the site reported 10 million members, 20 million monthly visits, and 30 employees. On July 23, 2013, Goodreads announced on its website that the user base had grown to 20 million members, having doubled in close to 11 months.

Based on its members' reviews, the website has created a list of the 100 best books of all time (see http://www.goodreads.com/list/show/9440.100_Best_Books_of_All_Time_The_World_Library_List). In this project, I scraped information on each book on the list (including title, author, publish date, number of pages, cover image, type, etc.) along with its reviews from Goodreads members. The packages I used for the scraping are Beautiful Soup and Selenium.

Here is the website:

Capture    Capture1

 

Methodology:

Below is the code I used to scrape basic information, including book title, author, number of pages, publish date, rating score, number of ratings, language, type, and country. The earliest book on the list, the Odyssey, dates back to about 800 BC, while the latest are from the 20th century. The longest books run to thousands of pages, while the shortest is around 80 pages.

Capture2

Capture3
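For readers who cannot see the screenshots, here is a minimal sketch of this kind of scraping with Beautiful Soup. It is not the original code: the HTML snippet, the class names (book, title, author, rating, pages), and the placeholder values are all made up for illustration, and the real Goodreads markup is different and changes over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for entries on a list page.
# The values are placeholders, not data from the scraped dataset.
SAMPLE_HTML = """
<div class="book">
  <a class="title" href="/book/show/1">Example Book One</a>
  <span class="author">Author A</span>
  <span class="rating">4.10</span>
  <span class="pages">320 pages</span>
</div>
<div class="book">
  <a class="title" href="/book/show/2">Example Book Two</a>
  <span class="author">Author B</span>
  <span class="rating">3.95</span>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_books(html):
    """Turn the (assumed) list markup into one dict per book."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for entry in soup.select("div.book"):
        rating = text_or_none(entry, "span.rating")
        pages = text_or_none(entry, "span.pages")
        link = entry.select_one("a.title")
        books.append({
            "title": text_or_none(entry, "a.title"),
            "author": text_or_none(entry, "span.author"),
            "rating": float(rating) if rating else None,
            # "320 pages" -> 320; missing fields stay None
            "pages": int(pages.split()[0]) if pages else None,
            "url": link["href"] if link else None,
        })
    return books


books = parse_books(SAMPLE_HTML)
for book in books:
    print(book)
```

In a real run the HTML would first be downloaded from the list URL, and the selectors would be chosen by inspecting the page. Missing fields are kept as None so that incomplete entries do not stop the scrape.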

I then used Selenium to scrape 20 pages of reviews for each book.

Capture4

Capture5

Capture6

 

Shiny App:

Based on the dataset scraped from the website, I created the following Shiny app. The app does three things: 1. It shows how literature hot spots change over the centuries. 2. It recommends books from the list to readers based on their choices of publish date, type, and country. 3. It lets readers explore the dataset.

The interactive map shows that the hot spots start in Europe, and that after the Renaissance literature becomes even more prosperous there. As time goes by, the hot spots spread to other continents, including Asia, South America, and North America.

app-map
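The counts behind a map like this can be sketched in a few lines. The app itself is written in R Shiny, so the Python below is only an illustration of the idea: group books by century and country, then count. The records are illustrative placeholders (negative years stand for BC), and the field names year and country are assumptions.

```python
from collections import Counter, defaultdict

# Illustrative records only, not the scraped dataset.
books = [
    {"title": "Book A", "year": -800, "country": "Greece"},
    {"title": "Book B", "year": 1605, "country": "Spain"},
    {"title": "Book C", "year": 1869, "country": "Russia"},
    {"title": "Book D", "year": 1877, "country": "Russia"},
    {"title": "Book E", "year": 1967, "country": "Colombia"},
]


def century(year):
    """Map a year to its century: 1605 -> 17, -800 -> -8 (8th century BC)."""
    if year > 0:
        return (year - 1) // 100 + 1
    return -((-year - 1) // 100 + 1)


def counts_by_century(records):
    """Return {century: Counter({country: number of books})}."""
    table = defaultdict(Counter)
    for record in records:
        table[century(record["year"])][record["country"]] += 1
    return table


table = counts_by_century(books)
for c in sorted(table):
    print(c, dict(table[c]))
```

Each century's counter can then drive the size or color of that country's marker when the user moves a time slider.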

The next dashboard recommends books to users based on their choices. Clicking on a book opens a small pop-up window containing a word cloud generated from that book's reviews.

app-recommendation
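The filtering behind such a recommendation can be sketched as below. Again, this is Python for illustration, not the app's R code. The field names, the sample records, and the choice to sort the matches by rating are all assumptions.

```python
# Illustrative records only, not the scraped dataset.
books = [
    {"title": "Book A", "year": 1605, "type": "Novel", "country": "Spain", "rating": 3.9},
    {"title": "Book B", "year": 1869, "type": "Novel", "country": "Russia", "rating": 4.1},
    {"title": "Book C", "year": 1877, "type": "Novel", "country": "Russia", "rating": 4.0},
    {"title": "Book D", "year": 1855, "type": "Poetry", "country": "United States", "rating": 4.2},
]


def recommend(records, year_range=None, book_type=None, country=None):
    """Keep books matching every filter that was given, best rated first."""
    matches = []
    for book in records:
        if year_range and not (year_range[0] <= book["year"] <= year_range[1]):
            continue
        if book_type and book["type"] != book_type:
            continue
        if country and book["country"] != country:
            continue
        matches.append(book)
    # Sorting by rating is an assumption about how results might be ordered.
    return sorted(matches, key=lambda book: book["rating"], reverse=True)


picks = recommend(books, year_range=(1800, 1900), book_type="Novel", country="Russia")
print([book["title"] for book in picks])
```

Leaving a filter as None skips it, which matches a dashboard where the user may set only some of the three choices.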

app-review
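A word cloud is driven by word frequencies, so the step from raw reviews to the cloud can be sketched as a simple count. The stop-word list and the sample reviews below are made up for illustration, and the app may well clean the text differently.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "this", "that", "was", "with", "for", "but"}


def word_frequencies(reviews, top_n=50):
    """Count words across reviews, skipping stop words and very short words."""
    counts = Counter()
    for review in reviews:
        words = re.findall(r"[a-z']+", review.lower())
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)


sample_reviews = [
    "A timeless epic, truly timeless.",
    "The epic journey is long.",
]
freqs = word_frequencies(sample_reviews)
print(freqs)
```

The resulting (word, count) pairs are what a word cloud library scales into font sizes.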

You can also explore the dataset on the third dashboard.

The link to the shiny app is here: https://yumiaomiao0908.shinyapps.io/wiki_app/

Please feel free to play with it and leave any comments.

About Author

Miaozhi Yu

Miaozhi recently received her Master's degree in Mathematics from New York University. Before that she received a Bachelor's Degree in both Mathematics and Statistics with a minor in Physics from UIUC. Her research interests lie in random graphs...