from Hacker News

How Not to Sort by Average Rating (2009)

by Aqwis on 8/30/17, 12:46 PM with 156 comments

  • by paulgb on 8/30/17, 2:39 PM

    Averages (even with the post's approach) still have the problem of not being "honest" in the game theory sense. For example, if something is rated 4 stars with 100 reviews, a reviewer who believes its true rating should be 3 stars is motivated to give it 1 star because that will move the average rating closer to his desired outcome. A look at rating distributions shows that this is in fact how many people behave.

    Median ratings are "honest" in this sense, as long as ties are broken arbitrarily rather than by averaging. Math challenge: is there a way of combining the desirable properties mentioned in the post with the property of honesty? I suspect there is but I haven't tried it.
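
    A quick numeric sketch of that incentive (the review counts here are hypothetical):

      # Hypothetical: 100 reviews averaging 4.0 stars, plus one new reviewer
      # who believes the "true" rating is 3 stars.
      import statistics

      ratings = [4] * 100  # stand-in for 100 reviews with a 4.0 mean

      honest = statistics.mean(ratings + [3])     # rate what you believe
      strategic = statistics.mean(ratings + [1])  # exaggerate to drag the mean

      print(round(honest, 3))     # 3.99
      print(round(strategic, 3))  # 3.97 -- three times the pull toward 3

      # The median is immune: either vote leaves it at 4.
      print(statistics.median(ratings + [3]), statistics.median(ratings + [1]))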

  • by kstenerud on 8/30/17, 2:46 PM

    That's what's always annoyed me with Amazon's "sort by average rating" setting. I want to see the top 10 or so items by rating to give me a baseline to investigate from, but instead I get page after page of cheap Chinese crap with one 5-star review each from the resident fake reviewer.

    Worse than useless.

    Even a simple change like adding a "show only items with a minimum of X reviews" option would be a godsend.

  • by toniprada on 8/30/17, 6:01 PM

    Another approach for non-binary ratings is to use a true Bayesian estimate, which uses all of the platform's ratings as the prior. This is what IMDb uses for its Top 250:

    "The following formula is used to calculate the Top Rated 250 titles. This formula provides a true 'Bayesian estimate', which takes into account the number of votes each title has received, minimum votes required to be on the list, and the mean vote for all titles:

    weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

    Where:

    R = average for the movie (mean) = (Rating)
    v = number of votes for the movie = (votes)
    m = minimum votes required to be listed in the Top 250
    C = the mean vote across the whole report"

    http://www.imdb.com/help/show_leaf?votestopfaq&pf_rd_m=A2FGE...
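
    That formula translates directly into code; a minimal sketch in Python (the m and C values below are made-up stand-ins, not IMDb's actual numbers):

      def weighted_rating(R, v, m, C):
          """IMDb-style Bayesian estimate: shrink a title's mean rating R
          toward the global mean C, in proportion to its vote count v."""
          return (v / (v + m)) * R + (m / (v + m)) * C

      # With a hypothetical global mean C = 6.9 and threshold m = 25000,
      # a 9.0-rated movie with few votes ranks below an 8.5 with many:
      print(weighted_rating(R=9.0, v=5000, m=25000, C=6.9))    # ~7.25
      print(weighted_rating(R=8.5, v=500000, m=25000, C=6.9))  # ~8.42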

  • by poorman on 8/30/17, 4:07 PM

    I reference this article constantly at Untappd.

    When we were building the NextGlass app, I took much of this into consideration for giving wine and beer recommendations.

    We recently ran the query on the Untappd database of 500 million check-ins, and it yielded some interesting results. The "whales" (rare beers) bubbled to the top. I assume this is because users who have to trade for and hunt down rare beers are unlikely to rate them poorly. The movie industry doesn't have to worry about users rating "rare movies", but I would think Amazon might have the same issue with rare products.

  • by intenscia on 8/30/17, 2:43 PM

    Implemented this after discovering it via https://www.gamasutra.com/blogs/LarsDoucet/20141006/227162/F...

    Works amazingly well, and it's so much easier to calculate than, say, the way IMDb rates things.

  • by loisaidasam on 8/30/17, 4:40 PM

    Here's an SO post with a Python implementation:

    https://stackoverflow.com/questions/10029588/python-implemen...

    The accepted answer uses a hard-coded z-value.

    In the event that you want a dynamic z-value like the Ruby solution offers, I just submitted the following solution:

    https://stackoverflow.com/questions/10029588/python-implemen...
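
    For reference, a minimal sketch of the Wilson lower bound with a dynamic z-value computed via scipy (norm.ppf yields the same 1.96 the hard-coded answer uses at 95% confidence):

      from math import sqrt
      from scipy.stats import norm

      def wilson_lower_bound(pos, total, confidence=0.95):
          """Lower bound of the Wilson score interval on the fraction
          of positive ratings; returns 0 when there are no ratings."""
          if total == 0:
              return 0.0
          z = norm.ppf(1 - (1 - confidence) / 2)  # ~1.96 at 95%
          phat = pos / total
          return ((phat + z * z / (2 * total)
                   - z * sqrt((phat * (1 - phat) + z * z / (4 * total)) / total))
                  / (1 + z * z / total))

      print(wilson_lower_bound(1, 1))     # ~0.21: one lone positive review
      print(wilson_lower_bound(90, 100))  # ~0.83: ranks far above it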

  • by dperfect on 8/30/17, 3:51 PM

    What's the best way to apply the suggested solution to a numeric 5-star rating system? (The author mentions Amazon's 5-star system using the wrong approach, yet the solution is specific to binary positive/negative ratings.)

    I suppose one could arbitrarily assign ratings above a certain threshold to "positive" and those below to "negative", and use the same algorithm, but I imagine there's probably a similar algorithm that works directly on numeric ratings. Anyone know? Or if you must convert the numeric ratings to positive/negative, how does one find the best cutoff value?
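
    For what it's worth, a sketch of that thresholding idea (the cutoff of 4 stars is an arbitrary choice, and fractional credit is noted as an alternative):

      def to_binary_counts(star_ratings, positive_cutoff=4):
          """Map star ratings to (positive, total) counts, treating ratings
          at or above the cutoff as positive. An alternative is fractional
          credit, e.g. scoring each rating as (stars - 1) / 4."""
          pos = sum(1 for s in star_ratings if s >= positive_cutoff)
          return pos, len(star_ratings)

      print(to_binary_counts([5, 4, 4, 3, 2, 5]))  # (4, 6)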

  • by jbochi on 8/30/17, 4:23 PM

    It's very common to see a "Most Popular" section on a website, but the way it's usually done is not optimized for clicks.

    Inspired by Evan's post, I wrote "How Not to Sort by Popularity" a few weeks ago: https://medium.com/@jbochi/how-not-to-sort-by-popularity-927...

  • by kuharich on 8/30/17, 7:02 PM

  • by hood_syntax on 8/30/17, 2:34 PM

    I've read this article before, and I really liked how to-the-point it is. More than anything, can I just say how infuriating Amazon's rating system is?

  • by eeZah7Ux on 8/30/17, 3:46 PM

    This is computationally very heavy, but, more importantly, for practical purposes you want to have a tunable parameter to balance between sorting by pure rating average and sorting by pure popularity.

    Often you also want to give a configurable advantage or handicap to new entries.
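
    One hypothetical shape for such a knob (the linear blend, the log dampening, and the scale constant are all illustrative choices):

      from math import log1p

      def blended_score(avg_rating, num_ratings, alpha=0.5,
                        max_rating=5.0, count_scale=1000.0):
          """Interpolate between pure average quality (alpha=0) and pure
          popularity (alpha=1). Both terms are squashed into [0, 1]; a
          configurable bonus for new entries could be added the same way."""
          quality = avg_rating / max_rating
          popularity = min(1.0, log1p(num_ratings) / log1p(count_scale))
          return (1 - alpha) * quality + alpha * popularity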

  • by amelius on 8/30/17, 4:19 PM

    > What we want to ask is: Given the ratings I have, there is a 95% chance that the “real” fraction of positive ratings is at least what? Wilson gives the answer.

    Well, you can't answer that question without making assumptions, and those seem to be missing from the article.

  • by thanatropism on 8/30/17, 3:22 PM

    Arguably, what Urban Dictionary is doing is weighting by some sense of "net favorability" and by quantity of votes. Quantity of votes correlates with relevance, particularly because UD is meant to represent popular usage.

  • by agentgt on 8/31/17, 12:55 AM

    This sort of reminds me of "voting theory": if I recall correctly, a Nobel Prize winner proved that there cannot be a fair winner.

    Obviously it's not entirely analogous but I would not be surprised if it mapped over to this domain.

    Edit: on mobile so late on the link to Kenneth Arrow https://en.m.wikipedia.org/wiki/Arrow%27s_impossibility_theo...

  • by gesman on 8/30/17, 7:37 PM

    I think ratings need to be normalized to personal beliefs and preferences of the viewer.

    In other words, I couldn't care less how Joe Blow rated the product, but it's important to me how like-minded people rated it.

    Also, Amazon is not making a mistake in its ratings.

    Amazon is less interested in selling you the product that's most relevant to you.

    Amazon is more interested in boosting its bottom line, moving stalled inventory, or moving higher-margin inventory.

  • by alexvay on 8/31/17, 2:41 PM

    I think the article is missing something visual to demonstrate the actual scoring at work.

    I've made a simple plot in Excel here: http://i.imgur.com/adjaLQ9.png

    The number of up-votes remains the same, while the number of down-votes increases linearly. The declining grey line is the score.
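
    A plot like that is straightforward to reproduce (this assumes the Wilson lower bound with z = 1.96 as the score; the vote counts are made up):

      from math import sqrt
      import matplotlib.pyplot as plt

      def wilson_lower_bound(pos, total, z=1.96):
          if total == 0:
              return 0.0
          phat = pos / total
          return ((phat + z * z / (2 * total)
                   - z * sqrt((phat * (1 - phat) + z * z / (4 * total)) / total))
                  / (1 + z * z / total))

      ups = 50            # up-votes held constant
      downs = range(201)  # down-votes grow linearly
      scores = [wilson_lower_bound(ups, ups + d) for d in downs]

      plt.plot(downs, scores, color="grey")
      plt.xlabel("down-votes (up-votes fixed at 50)")
      plt.ylabel("Wilson score lower bound")
      plt.show()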

  • by tabtab on 8/31/17, 12:24 AM

    What about having a scaling factor to adjust the impact of quantity (total) of individual ratings as needed? Rough draft:

      sort_score = (pos / total) + (W * log(total))
    
    Here, W is the weighting (scaling) factor, and total = positive + negative.
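
    As runnable Python, with a guard for zero ratings (the default W is just a starting point for tuning):

      from math import log

      def sort_score(pos, total, W=0.05):
          """Positive fraction plus a tunable popularity bonus; W controls
          how much sheer volume of ratings matters."""
          if total == 0:
              return 0.0
          return pos / total + W * log(total)

      print(sort_score(1, 1))     # 1.00: a single positive rating
      print(sort_score(90, 100))  # ~1.13: volume now outranks it
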
  • by alexpetralia on 8/30/17, 2:37 PM

    Chris Stucchio and Evan Miller have amazing statistics blogs.

  • by phunge on 8/30/17, 9:20 PM

    Classic post! This post is like a gentle gateway to the world of Bayesian statistics -- check out Cameron Davidson-Pilon's free book if you want to go deeper.

  • by bradbeattie on 8/31/17, 5:52 AM

    I think this article is missing the next step: collaborative filtering. I only care about the ratings an item received from people who rate things like I do.

  • by larkeith on 8/30/17, 6:00 PM

    This article is useful, but the author's tone really rubs me the wrong way - to the point I'm dubious about trusting the information without further sources. Cutting the entire first part ("not calculating the average is not how to calculate the average") would help, as would more accurately titling the piece - no matter how effective this method is, it is NOT sorting by average, strictly speaking.

  • by ignawin on 8/30/17, 3:23 PM

    Any blog posts/papers on what the best general approach to online reviews is?

  • by Animats on 8/30/17, 8:41 PM

    Mandatory XKCD: https://xkcd.com/937/

  • by autokad on 8/31/17, 6:09 PM

    A gamma-Poisson model might more accurately calculate the rating, based on the uncertainty of the data.

  • by donatj on 8/30/17, 3:05 PM

    Does a decent Fortran implementation exist?