Does a Book’s Length Influence Its Popularity?
Goodreads is an Amazon-owned review and discussion website for books. I scraped data on books and their authors, starting from Listopia, which is essentially a home directory for the site’s most popular book lists. Each list has a theme, or topic, and the books are ranked within each list. Rather than hand-selecting lists, I scraped the most popular ones to avoid selection bias. Goodreads has more than 7,000 book lists; I used the Scrapy framework in Python to scrape about 300 of them, which amounted to 22,000 unique books in my dataset.
The popularity of a book can be measured in a few different ways. I focused on two dependent variables: a book’s average rating and Goodreads’ index called Score. Ultimately, I determined that Score better represents popularity, because this single statistic determines where a book is placed in a list’s rankings, whereas a book’s average rating does not change drastically from the top of a list to the bottom. Most books have a Score below 2,000, which is a relatively low value. This makes intuitive sense, because a ranking index should ideally single out only a small proportion of the books.
To explore the differences between genres, I created a new data frame that kept only books with a genre specified as ‘Fiction’ or ‘Nonfiction’; only 3,000 books remained after filtering. As the graph below displays, the score of fiction books tends to increase as page count increases. For nonfiction books, however, the score tends to stay the same, on average, as page count increases.
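The comparison above can be sketched in a few lines of plain Python. This is only an illustration, not the code used for this analysis: the records, the `(genre, pages, score)` layout, and the helper names `ols_slope` and `slope_by_genre` are all assumptions, and the numbers are made up. It keeps only the two genres of interest and fits a least-squares line of score against page count for each one.

```python
from collections import defaultdict

# Hypothetical records in the form (genre, pages, score).
# These values are invented for illustration; they are not from the scraped data.
books = [
    ("Fiction", 200, 900),
    ("Fiction", 400, 1500),
    ("Fiction", 600, 2100),
    ("Nonfiction", 200, 1000),
    ("Nonfiction", 400, 1000),
    ("Nonfiction", 600, 1000),
    ("Poetry", 120, 700),  # dropped by the genre filter below
]


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_by_genre(records, genres=("Fiction", "Nonfiction")):
    """Keep only the requested genres, then fit score vs. pages per genre."""
    grouped = defaultdict(lambda: ([], []))
    for genre, pages, score in records:
        if genre in genres:
            grouped[genre][0].append(pages)
            grouped[genre][1].append(score)
    return {genre: ols_slope(xs, ys) for genre, (xs, ys) in grouped.items()}


slopes = slope_by_genre(books)
for genre, slope in slopes.items():
    print(f"{genre}: {slope:.2f} score points per additional page")
```

A clearly positive slope for fiction alongside a slope near zero for nonfiction would match the pattern described above. A slope alone says nothing about how noisy the relationship is, so on real data it would be worth looking at the scatter as well.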
The findings of this analysis offer insight into the market for literature for almost every party in the supply chain. Young writers, for example, can use them as guidance on the approximate ideal length for the genre they intend to write in: long books appear more likely to be rewarded with higher scores if they are fiction. Similarly, publishers can use this information to guide writers in order to maximize a book’s score and thus, potentially, its revenue. That said, this rests on a basic assumption that was not proven in this study, but that at the very least provides an opportunity for further research: that higher scores are correlated with increased revenue. This study only provided evidence that longer fiction books are more likely to rank higher on Goodreads. However, it is likely that a higher ranking is correlated with increased revenue from sales. Lastly, consumers can benefit as well by steering clear of lengthy nonfiction books, unless the book happens to cover their absolute favorite topic.