"they've all got the same covers, and I thought they were all o' one sample, as you may say. But it seems one mustn't judge by th' outside. This is a puzzlin' world." The Mill on the Floss by George EliotWhat is the correlation between peoples ratings of a books cover and the ratings the book receives? This post is about a game devised to get people to rate book covers and gives some great visualisations comparing a books goodreads rating to its cover rating. They gathered over 3 million ratings of 100 covers.
I took their data and got the average rating for each of the covers they tested. I then scraped these 100 books Goodreads average ratings, number of ratings and number of reviews. The Data table and the code I used to scrape and aggregate is here. There are all sorts of accuracy warnings you can imagine around these results. The main ones being that the books and their covers all look pretty good to me. They are not on the self published fan fiction end of the market. The variables here are. num_ratings: Number of Goodreads ratings. rating: average rating of the book. num_reviews: Number of people who have actually written a review. cover_rating: The average rating people gave the cover of the book.
> cor(rating,cover_rating)So no you can't judge a book by its cover the correlation in ratings is only .16. You can guess the number of ratings by the number of reviews. You can't guess how highly rated a book is by the number of ratings. Having a good cover might increase the number of reviews your book gets by a bit.[1] 0.1609114
> cor(num_ratings,num_reviews)
[1] 0.9597442
> cor(rating,num_ratings)
[1] 0.2141307
> cor(rating,num_reviews)
[1] 0.2658916
> cor(num_ratings,cover_rating)
[1] 0.3059627
> cor(num_reviews,cover_rating)
[1] 0.3307553
The conclusion is you shouldn't judge a book by its cover. Or by its number of sales (ratings). But people probably do judge books by their cover a bit.