Tuesday, 13 October 2015

What do Goodreads ratings say about sales?

I have long maintained from my own data that there is a pretty close relationship between the number of Goodreads ratings a book has and its total sales, PROVIDING that the books are of a similar age and the same genre.

Others have disputed me. So I've been out to gather data.

Many authors don't like to talk about sales figures just as many don't like to talk about money. That's a position I respect and understand. I don't share it.

The scatter-plot below shows 'number of Goodreads ratings' along the bottom and 'sales in English' along the side.




Red circle = 2011 book. Pink square = 2012. Blue diamond = 2013. Green triangle = 2014. Asterisk = 2015.

Black circles are anonymous data from 2011 onwards.
Black squares are anonymous data from before 2011.


They are all for fantasy books.

The ratings to sales ratio is pretty stable. If you multiply the number of Goodreads ratings by 7.7 you come pretty close to the number of books sold in English (all formats).
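For anyone who wants to try the rule of thumb on their own numbers, here is a minimal sketch in Python. The function name, the rounding to whole copies, and the example ratings count are assumptions made for illustration; only the 7.7 multiplier comes from the data above, and it applies to traditionally published fantasy.

```python
# Multiplier quoted in the post (fantasy, traditionally published, English sales).
RATINGS_TO_SALES = 7.7


def estimate_sales(goodreads_ratings: int, multiplier: float = RATINGS_TO_SALES) -> int:
    """Rough estimate of English-language sales (all formats) from a ratings count.

    `estimate_sales` is a hypothetical helper, not part of any Goodreads tool.
    """
    if goodreads_ratings < 0:
        raise ValueError("ratings count cannot be negative")
    # Round to whole copies; the estimate is far too coarse for decimals to matter.
    return round(goodreads_ratings * multiplier)


if __name__ == "__main__":
    # Illustrative ratings count, not a real book.
    print(estimate_sales(2000))
```

Passing a different `multiplier` lets you test another ratio against the same ratings count.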

About half of the authors I invited to supply their data decided not to. So this graph may change for the higher numbers. Perhaps if a book breaks out of the genre sales bin it finds a demographic less (or more) likely to rate on Goodreads.

One noticeable outlier (blue diamond in the lower section) has the possible explanation (suggested by the author) that it sold a lot of very cheap e-copies that may be sitting on Kindles waiting to be read. Books on special offer are sometimes snapped up and saved for later. Also, most of his sales were in the UK. Perhaps the British use Goodreads less.


So there you have it. Some evidence that when the book is traditionally published, out for a year or more, and has a decent number of ratings (say several hundred to make the effects of any manipulation insignificant) you appear to be able to guesstimate sales pretty well!

This will of course vary from genre to genre (with the likelihood of the demographic to rate on Goodreads) and with time: books published before Goodreads, or when Goodreads was smaller, will be under-represented by ratings.

Although the ratio will change with genre and age, it seems very likely that if you take two books from the same genre and the same period, and one has twice as many ratings as the other, it will also have about twice as many sales, and that prediction will be pretty accurate.
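The comparison between two similar books does not need the multiplier at all, because it cancels out. A small sketch in Python, with a hypothetical function name and made-up ratings counts, assuming both books share genre and period:

```python
def relative_sales(ratings_a: int, ratings_b: int) -> float:
    """Estimated sales of book A as a multiple of book B's sales.

    Assumes both books share a genre and period, so whatever the
    ratings-to-sales multiplier is, it is the same for both and cancels.
    """
    if ratings_a < 0 or ratings_b <= 0:
        raise ValueError("need a non-negative count for A and a positive count for B")
    return ratings_a / ratings_b


if __name__ == "__main__":
    # Illustrative counts only: A has twice the ratings of B.
    print(relative_sales(4000, 2000))
```

A result of 2.0 reads as "book A has probably sold about twice as many copies as book B", whatever the true multiplier for that genre and period turns out to be.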

The annotated points in easier-to-read form.





     


16 comments:

  1. Very interesting. It would be nice to have even more books to verify this, but as a reader who has often wondered about sales, this is a great way to estimate the sales of my favorite books -- specifically English sales.

  2. Thanks for sharing this Mark. Interesting stuff -- I've often wondered about this very question. The multiplier for my books seemed more like 6 than 7.7, but admittedly, I was just eyeballing...

  4. Interesting article. I guess it gets a bit chicken and egg but it would be interesting to see if Goodreads ratings correspond with sales too, e.g. if the rating goes up is there an upswing in sales and vice versa?

    Curious to see the possible effect of an ebook promo. I know I have stacks of ebooks that may potentially never be read but the impulse to buy a cheap book that doesn't take up physical space is always hard to resist.

  5. I looked at this and thought, 'it probably doesn't work for newly released books with a lower number of ratings.'

    Just got a look at an early Royalty Statement (drastically different than a Royalty Cheque) and it's spot on.

    Damn.

  6. I'd really like to see how this relates across audiences and what the difference is between genre and literary fiction, or maybe more specifically, I have a gut feel that books for smaller, more 'specialised' audiences might be more affected by this and the big middle of the road mainstream best sellers might be less affected by this.
    Also would be quite interesting to see how mainstream advertising affects this.

  7. This doesn't work for me, Mark. Using books (all fantasy epic) published between 2011 and 2015, the ratio was between x11 and x22...and that was just for US sales, not English language worldwide. I have often wondered if being a woman writer affects ratings and reviews.

    Replies
    1. Interesting. Out of more than a dozen authors you're the first to report a significant difference. Even the outlier reported in the blog turned out to have over-estimated his sales.

      It would be interesting to have more points and to check any male / female difference (Kameron Hurley provided data that fit the trend and she's female). When dealing with smaller numbers (and your post 2011 books have hundreds of ratings rather than thousands) we can expect more volatile behaviour, though I expected any bias on top of that volatility to be in the opposite direction to the one you indicate.

  8. Nice work. The ratio is way lower than I expected. I would expect that far fewer than 10% of readers even know that Goodreads exists. And of those, only a fraction will bother rating any given book. Comparing with computer games, the ratio is closer to 100 for Steam reviews, which most players know about and use to actually buy the game. I've also seen the ratio of copies sold to Amazon book reviews estimated at around 100. It would be interesting to see a deeper analysis of why these are an order of magnitude different.

  9. Doesn't work for me either, Mark – maybe because most of my sales are in the UK and Australia. For my first book (1998) the ratio is 55. For my 2011 book it's 30. And for last year's (2016) book it's 122. They're all epic fantasy.

    Replies
    1. Well, 1998 is clearly outside the bounds of the study and significantly predates Goodreads itself (2006). So no surprise there.

      And your 2016 book has fewer than 100 ratings, and as noted the statistics are volatile for small numbers. The article says several hundred is the minimum point at which one should consider using the ratio.

      For the only book to which I would advise applying the method you have a 30. I can't explain that, though as a scientist it _greatly_ surprises me and I would love to have access to a larger body of ground truth data.

      Your readers presumably read other fantasy and are drawn without bias from the general population of fantasy readers. Why then would they be statistically far less likely to register their opinion on Goodreads for your books than their fellows (& quite possibly they themselves) are for the books of other authors? ... a mystery.

      It would be nice to get raw data from a range of publishers over a much larger number of books. But that's never going to happen.

      One important point is that for the data in my post I approached authors. It was a fairly unbiased sample. For the data volunteered in the comments it is authors approaching me ... and who is most likely to take the effort to comment? Well, one group is authors for whom the formula doesn't seem to work. They are far more likely to want to comment than authors for whom it does work and who nod and move on (I have had a body of feedback to this effect in face to face conversations, on forums, etc). So what we see in these comments is a self-selected collection of outliers. Which can't of course be sensibly included in the data, but which might motivate the collection of more (non self-selected) data.

    2. An additional thought... It might be that you have a body of loyal readers who formed an attachment to you in the late 90s and as an older generation are statistically less likely to be internet/Goodreads users.

      The authors in my study are (as far as I know) all first published in the last ten years and so have first recruited their readers in the Goodreads age / a period of far greater internet use.

  10. Thanks Mark. Your second thought is bang on. My 11 Three Worlds epic fantasy novels were published between 1998 and 2008 and sold very well in Australia and the UK, where most of my fans still are. From talking to some other fantasy writers, the 7.7 factor doesn't seem to work so well for AU and UK sales. Interesting. But anyway, great article! Cheers, Ian.

    Replies
    1. I sell as many books in the UK as in the US, and do pretty well in Australia, so I think it's likely to be more about when you got most of your readership rather than where you sell.

  11. I think you aren't differentiating the "types" of releases well enough. For instance, Myke Cole, who is published in mass market paperbacks, really can't be compared to your books that are released in hardcover and paperback (because your books have essentially two bites at the apple, as it were). Likewise, the price of the ebook is going to be a huge factor. Orbit has my Riyria books at $9.99 - a price they've been at since day one. I think this is too high, but Orbit's argument is they sell well at that price so no reason to change things. Still, a $9.99 ebook is going to sell less than a $4.95 ebook - which is the price for many authors who have hardcover followed by paperback release.

    Also, how is each author determining their sales numbers? If using an author portal - you can get a pretty good idea of books shipped - but early on those numbers will be misleading. After the bulk of returns occur, data from that source is pretty good, but I'd say you need a book to be out at least a year before you count those chickens.

    If using BookScan data, that's notoriously unreliable. Scalzi says his BookScan numbers show little relation to his own sales. For me, they run around 60% of actual sales (for books that have been out a long time).

    Royalty reports are the third way of determining sales but we get them so infrequently that they lag by quite a deal. April data shows Jul - Dec of the previous year and October data shows Jan - June... unless you are talking about audio data - which in a subsidiary deal can lag by as much as a year and a half!

    There are other factors. Whether the book is a first in series, last in series, or something in the middle. What kind of distribution the author has in their non-native country -- Riyria has a UK publisher, Legends does not. Also some books are pirated more than others. Those books would have lower ratios since they have more reviews on fewer sales.

    For some data points: Age of Myth (which has been out for more than a year) is at 11.2. Death of Dulgath 8.9 (two years old). Age of Swords (6 months) -- assuming similar level of returns as Age of Myth would put it at 13.2. The Rose and the Thorn would be 5.6. That's pretty much all over the board.

    I appreciate what you are trying to do, but I don't think, given the number of variables that come into play and the very limited amount of data, that you can really draw any conclusions other than if a book has 10,000+ ratings, it's probably sold well. But "how well" is anyone's guess.

    Replies
    1. Several points to make here.

      i) I specifically make the point that these data are all for traditionally published books, which really means authors whose novels are solely traditionally published. Your hybrid career does not fit that brief.

      ii) The data in these graphs was canvassed randomly (albeit steered by association). It would be non-scientific to add volunteered data since these are biased (i.e. far more likely to come from outliers prompted to volunteer the data because it is an exception).

      iii) On the scale of the graph your figures, despite their non-traditional and varied origins, are actually not very much at odds with the data already there and, rather than being all over the place, fit fairly well, leading to almost no change to the line of best fit. If we ignore the book that is 6 months old (far too recent for GR ratings to have fully caught up with initial Kickstarter sales that preload the process), then your average ratio is 8.6. It's also worth bearing in mind that your unusual sales profile extends into a much larger % of audio sales than most, which may skew results.


      In any event, to quote figures that are self-selected, from an unusual background, *AND* still turn out to be broadly in line with the trend ... and then conclude that the relationship between sales and GR ratings numbers is "anyone's guess" is simply not correct.
