Mind nutrition station: 100 Best books of all time

Posted on Aug 22, 2016

If you are a bookworm, you have probably heard of Goodreads. Goodreads is an Amazon company and "social cataloging" website founded in December 2006 and launched in January 2007 by Otis Chandler II, a software engineer and entrepreneur, and Elizabeth Chandler. (Otis Chandler II is the grandson of Otis Chandler, who was the publisher of the Los Angeles Times, and the great-great-great-grandson of Harrison Gray Otis, the newspaper's founder.) The website lets individuals freely search Goodreads' extensive user-populated database of books, annotations, and reviews. Users can sign up and register books to generate library catalogs and reading lists. They can also create their own groups of book suggestions, surveys and polls, blogs, and discussions. By July 2012, the site reported 10 million members, 20 million monthly visits, and 30 employees. On July 23, 2013, the site announced that its user base had grown to 20 million members, having doubled in close to 11 months.

Based on its members' reviews, the website has compiled a list of the 100 best books of all time (see http://www.goodreads.com/list/show/9440.100_Best_Books_of_All_Time_The_World_Library_List). In this project, I scraped information about each book on the list (including title, author, publish date, pages, cover image, type, etc.) along with the reviews written by Goodreads members. The packages I used for the web scraping are Beautiful Soup and Selenium.

Here is the website:

Capture    Capture1

 

Methodology:

Below is the code I used to scrape basic information about each book: title, author, pages, publish date, rating score, number of ratings, language, type, and country. The earliest book on the list, the Odyssey, dates back to around 800 BC, while the most recent are from the 20th century. The longest books run to thousands of pages, while the shortest are around 80 pages.

Capture2

Capture3
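As a rough illustration of this step (not the code actually used), a Beautiful Soup parser for one book page might look like the sketch below. The HTML fragment, the CSS selectors, the field values, and the `parse_book` helper are all hypothetical; the real Goodreads markup is different and changes over time, so the selectors would need to be adapted.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one book page. The ids, classes, and
# values are made up for illustration and do not match real Goodreads markup.
SAMPLE_HTML = """
<div class="book">
  <h1 id="bookTitle"> Example Title </h1>
  <a class="authorName">Example Author</a>
  <span id="pages">123 pages</span>
  <span id="rating">4.0</span>
</div>
"""


def parse_book(html):
    """Extract a few basic fields from the HTML of one book page."""
    soup = BeautifulSoup(html, "html.parser")

    def text_of(selector):
        # Return the stripped text of the first match, or None if absent.
        node = soup.select_one(selector)
        return node.get_text(strip=True) if node else None

    pages_text = text_of("#pages")
    rating_text = text_of("#rating")
    return {
        "title": text_of("#bookTitle"),
        "author": text_of("a.authorName"),
        # "123 pages" -> 123
        "pages": int(pages_text.split()[0]) if pages_text else None,
        "rating": float(rating_text) if rating_text else None,
    }


book = parse_book(SAMPLE_HTML)
print(book)
```

Returning `None` for a missing field keeps one incomplete page from stopping a run over all 100 books.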

I then used Selenium to scrape 20 pages of reviews for each book.

Capture4

Capture5

Capture6
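The review pagination can be sketched roughly as follows. This is an assumption about how such a loop could be organized, not the code actually used: the names `scrape_pages` and `scrape_book_reviews`, the two CSS selectors, and the fixed pause are all hypothetical. The paging loop is kept separate from the browser calls so it can be tried without launching a browser.

```python
import time

# Hypothetical selectors; the real page structure must be inspected first.
REVIEW_SELECTOR = "div.review"
NEXT_SELECTOR = "a.next_page"
PAUSE_SECONDS = 2  # crude wait for the next page of reviews to load


def scrape_pages(get_reviews, go_to_next, max_pages=20):
    """Collect reviews page by page until max_pages or no next page."""
    reviews = []
    for page in range(max_pages):
        reviews.extend(get_reviews())
        # Stop after the last wanted page, or when there is no next page.
        if page == max_pages - 1 or not go_to_next():
            break
    return reviews


def scrape_book_reviews(url, max_pages=20):
    """Open a book page in Chrome and gather up to max_pages of reviews."""
    # Imported here so scrape_pages can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)

        def get_reviews():
            elements = driver.find_elements(By.CSS_SELECTOR, REVIEW_SELECTOR)
            return [element.text for element in elements]

        def go_to_next():
            try:
                driver.find_element(By.CSS_SELECTOR, NEXT_SELECTOR).click()
            except NoSuchElementException:
                return False
            # An explicit wait (WebDriverWait) would be more robust.
            time.sleep(PAUSE_SECONDS)
            return True

        return scrape_pages(get_reviews, go_to_next, max_pages)
    finally:
        driver.quit()
```

Selenium is useful here because the next page of reviews is reached by clicking a link in a live browser rather than by requesting a new URL.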

 

Shiny App:

Based on the dataset scraped from the website, I created the following Shiny app. The app focuses on the following questions:

1. How do literary hot spots change over the centuries?
2. Which books from the list should be recommended to readers, based on their chosen publish date, type, and country?

The interactive map shows that the hot spots start in Europe and that, after the Renaissance, literature became even more prosperous there. As time goes by, the hot spots spread to other continents, including Asia, South America, and North America.

app-map
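The aggregation behind a map like this can be sketched as counting books per century and country. The app itself is written in Shiny, so the Python below only illustrates the logic under stated assumptions: the records are made up, and `century_of` and `hot_spots` are hypothetical helper names.

```python
from collections import Counter

# Made-up records for illustration only; negative years mean BC.
BOOKS = [
    {"title": "Book A", "year": -800, "country": "Country X"},
    {"title": "Book B", "year": 1605, "country": "Country Y"},
    {"title": "Book C", "year": 1869, "country": "Country Z"},
    {"title": "Book D", "year": 1877, "country": "Country Z"},
]


def century_of(year):
    """Signed century: 1999 -> 20, 1900 -> 19, -800 -> -8 (8th century BC)."""
    if year == 0:
        raise ValueError("there is no year 0 in BC/AD dating")
    magnitude = (abs(year) - 1) // 100 + 1
    return magnitude if year > 0 else -magnitude


def hot_spots(books):
    """Count books per (century, country) pair."""
    return Counter((century_of(b["year"]), b["country"]) for b in books)


counts = hot_spots(BOOKS)
for (century, country), n in sorted(counts.items()):
    print(century, country, n)
```

Each (century, country) count could then drive the size or color of a marker on the map for the selected period.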

The next dashboard gives recommendations to users based on their choices. Clicking on a book pops up a small window containing a word cloud generated from that book's reviews.

app-recommendation

app-review
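Both parts of this dashboard reduce to simple operations: filtering the book table by the reader's choices, and counting words in a book's reviews to size them in a word cloud. The Python sketch below illustrates that logic only; it is not the app's Shiny code. The sample data, the short stop-word list, and the helpers `recommend` and `word_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Tiny stop-word list for illustration; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "it", "in", "this"}


def recommend(books, year_range=None, book_type=None, country=None):
    """Return books matching every filter supplied (None means no filter)."""
    matches = []
    for book in books:
        if year_range and not (year_range[0] <= book["year"] <= year_range[1]):
            continue
        if book_type and book["type"] != book_type:
            continue
        if country and book["country"] != country:
            continue
        matches.append(book)
    return matches


def word_frequencies(reviews, top_n=50):
    """Count non-stop-words across reviews, most frequent first."""
    counts = Counter()
    for review in reviews:
        for word in re.findall(r"[a-z']+", review.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)


# Made-up data for illustration only.
BOOKS = [
    {"title": "Book A", "year": 1869, "type": "Fiction", "country": "Country Z"},
    {"title": "Book B", "year": 1605, "type": "Fiction", "country": "Country Y"},
    {"title": "Book C", "year": 1320, "type": "Poetry", "country": "Country W"},
]
REVIEWS = ["A moving story", "The story is moving and long"]

picks = recommend(BOOKS, year_range=(1600, 1900), book_type="Fiction")
print([book["title"] for book in picks])
print(word_frequencies(REVIEWS, top_n=3))
```

The word counts returned by `word_frequencies` are the kind of input a word cloud library expects, with larger counts drawn as larger words.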

You can also explore the dataset on the third dashboard.

The link to the shiny app is here: https://yumiaomiao0908.shinyapps.io/wiki_app/

Please feel free to play with it and leave any comments.

About Author

Miaozhi Yu

Miaozhi recently received her Master’s degree in Mathematics from New York University. Before that she received a Bachelor’s Degree in both Mathematics and Statistics with a minor in Physics from UIUC. Her research interests lie in random graphs...