from Hacker News

Normalizing Ratings

by Symmetry on 5/2/25, 12:39 AM with 52 comments

  • by nlh on 5/2/25, 10:17 PM

    Similarly - one of my biggest complaints about almost every rating system in production is how just absolutely lazy they are. And by that, I mean everyone seems to think "the object's collective rating is an average of all the individual ratings" is good enough. It's not.

    Take any given Yelp / Google / Amazon page and you'll see some distribution like this:

    User 1: "5 stars. Everything was great!"

    User 2: "5 stars. I'd go here again!"

    User 3: "1 star. The food was delicious but the waiter was so rude!!!one11!! They forgot it was my cousin's sister's mother's birthday and they didn't kiss my hand when I sat down!! I love the food here but they need to fire that one waiter!!"

    Yelp: 3.6 stars average rating.

    One thing I always liked about FourSquare was that they did NOT use this lazy method. Their score was actually intelligent - it checked things like how often someone returned, how much time they spent there, etc., and weighted each review accordingly.
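    (FourSquare's exact formula was never public, but the idea is easy to sketch: weight each rating by behavioral signals like return visits and dwell time. A toy version in Python - the fields, weights, and numbers are all invented for illustration:)

      from dataclasses import dataclass

      @dataclass
      class Review:
          stars: float          # the 1-5 rating the user left
          visits: int           # how many times this user came back
          minutes_spent: float  # total dwell time at the venue

      def weighted_score(reviews: list[Review]) -> float:
          # Repeat visits and dwell time act as "revealed preference";
          # the +1 terms keep one-off visitors from being zeroed out.
          def weight(r: Review) -> float:
              return (1 + r.visits) * (1 + r.minutes_spent / 60)
          total = sum(weight(r) for r in reviews)
          return sum(r.stars * weight(r) for r in reviews) / total

      reviews = [
          Review(stars=5, visits=4, minutes_spent=300),  # happy regular
          Review(stars=5, visits=2, minutes_spent=120),
          Review(stars=1, visits=0, minutes_spent=45),   # one-off angry review
      ]
      print(f"{weighted_score(reviews):.2f}")  # ~4.83 rather than the naive 3.67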

  • by tibbar on 5/2/25, 10:57 PM

    One of my favorite algorithms for this is Expectation Maximization [0].

    You would start by estimating each driver's rating as the average of the ratings they receive - and then estimate the bias of each rider by comparing the average rating they give to the estimated scores of their drivers. Then you repeat the process iteratively until both quantities (driver rating and rider bias) converge; a rough sketch follows below.

    [0] https://en.wikipedia.org/wiki/Expectation%E2%80%93maximizati...
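    A toy version of that alternating update in Python (made-up data; this is the alternating-estimation loop described above rather than full EM with explicit likelihoods):

      import numpy as np

      # ratings[i, j] = rating rider i gave driver j; NaN = no ride (toy data)
      ratings = np.array([
          [5.0, 4.0, np.nan],
          [3.0, np.nan, 2.0],   # a harsh rater, consistently below consensus
          [5.0, 5.0, 4.0],
      ])

      rider_bias = np.zeros(ratings.shape[0])
      for _ in range(100):
          # estimate each driver's score after removing rider bias
          driver_score = np.nanmean(ratings - rider_bias[:, None], axis=0)
          # re-estimate each rider's bias against those driver scores
          new_bias = np.nanmean(ratings - driver_score[None, :], axis=1)
          new_bias -= new_bias.mean()  # pin the average rider at zero bias
          if np.allclose(new_bias, rider_bias, atol=1e-9):
              break
          rider_bias = new_bias

      # the harsh rater ends up with a clearly negative bias, and the
      # drivers they rated are no longer dragged down by it
      print("driver scores:", driver_score.round(2))
      print("rider biases: ", rider_bias.round(2))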

  • by theendisney on 5/15/25, 10:05 PM

    Rating systems should really mature to exclude non-customers and list the customer's purchase history.

    Weighting by amount spent could be interesting.

    Big vendors/companies should probably be required to have per-product ratings rather than making them optional. Rating Adobe or Alibaba in general is probably not all that useful.

    The EU almost requires it, but Google (for example) still hasn't found a nice technical solution.

  • by stevage on 5/2/25, 11:33 PM

    I like rating systems from -2 to +2 for this reason.

    The big rating problem I have is with sites like BoardGameGeek, where different people treat a rating as either an objective measure of how good the game is within its category or a subjective measure of how much they like (or approve of) it. Those are two very different things, and it makes the ratings much less useful than they could be.

    BGG ratings also suffer a compression problem: most games score 7 out of 10, an 8 is exceptional, a 6 is bad, and a 5 is disastrous.

  • by homeonthemtn on 5/2/25, 11:50 PM

    I'd rather we just used a 3-point rating: 1. Bad, 2. Fine, 3. Great.

    On a 5-point scale, 2 and 4 are irrelevant - at best a wild guess, at worst user-defined and idiosyncratic.

    Most of the time our rating systems devolve into roughly this state anyways.

    E.g.

    5 is excellent, 4.x is fine, <4 is problematic.

    And then there's a sub-domain between 4 and 5 where 4.1 is questionable, 4.5 is fine, and 4.7+ is excellent.

    In the end, it's just 3 parts nested within 3 parts nested within 3 parts nested within....

    Let's just do 3 stars (no decimal) and call it a day
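    (To make the nesting concrete, here's the de facto mapping as Python - thresholds lifted from the comment above, everything else is just illustration:)

      def tier(avg: float) -> str:
          # first cut: 5 is excellent, 4.x is fine, <4 is problematic
          if avg < 4.0:
              return "problematic"
          # second cut inside the 4-5 band: 4.7+ excellent, ~4.5 fine, 4.1 questionable
          if avg >= 4.7:
              return "excellent"
          if avg >= 4.5:
              return "fine"
          return "questionable"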

  • by Retr0id on 5/2/25, 10:31 PM

    > I'm genuinely mystified why its not applied anywhere I can see.

    I wonder if companies are afraid of being accused of "cooking the books", especially in contexts where the individual ratings are visible.

    If I saw a product with 3x 5-star reviews and 1x 3-star review, I'd be suspicious if the overall rating was still a perfect 5 stars.

  • by mzmzmzm on 5/2/25, 11:36 PM

    A problem with accounting for "above average" service is that sometimes I don't want it. If a driver goes above and beyond - offering a water bottle or something else exceptional - occasionally I'd rather just be left alone for a quiet, impersonal ride.

  • by parrit on 5/2/25, 11:28 PM

    For Uber you don't need a rating at all. The tracking system knows if they were late, if they took a good route, and if they dropped you off at the wrong location.

    Anything really bad can be dealt with via a complaint system.

    Anything exceptional could be captured with a free-text field when giving a tip.

    Who is going to read all those text fields and classify them? AI!

  • by pbronez on 5/2/25, 11:43 PM

    One formal measure of this is Inter-Rater Reliability

    https://en.wikipedia.org/wiki/Inter-rater_reliability
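    A common instance is Cohen's kappa - agreement between two raters, corrected for the agreement you'd expect by chance. A minimal sketch in Python (toy data; for ordinal star ratings a weighted kappa would be the better fit):

      from collections import Counter

      def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
          n = len(rater_a)
          # observed agreement: fraction of items where both raters matched
          observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
          # expected agreement: chance that two independent raters coincide
          freq_a, freq_b = Counter(rater_a), Counter(rater_b)
          expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a | freq_b)
          return (observed - expected) / (1 - expected)

      # two people rating the same five restaurants on a 1-5 scale
      print(cohens_kappa([5, 4, 5, 2, 3], [5, 4, 4, 2, 3]))  # ~0.74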

  • by rossdavidh on 5/2/25, 11:19 PM

    I have often had the same thought, and I have to believe the reason is that the companies' bottom line is not impacted the tiniest bit by their rating systems. It wouldn't be that hard to do better, but anything that takes a non-zero amount of attention and effort to improve has to compete with all of those other priorities. As far as I can tell, they just don't care at all about how useful their rating system is.

    Alternatively, there might be some hidden reason why a broken rating system is better than a good one, but if so I don't know it.

  • by adrmtu on 5/3/25, 12:28 AM

    Isn't this basically a de-biasing problem? Treat each rider’s ratings as a random variable with its own mean μᵤ and variance σᵤ², then normalize. Basically compute z = (r – μᵤ)/σᵤ, then remap z back onto a 1–5 scale so “normal” always centers around ~3. You could also add a time decay to weight recent rides higher to adapt when someone’s rating habits drift.

    Has anyone seen a live system (Uber, Goodreads, etc.) implement per-user z-score normalization?
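    (A minimal sketch of that normalization in Python - assuming, arbitrarily, that one standard deviation maps to one star; the clamp and the σ=0 guard are additions not in the formula above:)

      import statistics

      def normalize_user_ratings(ratings: list[float]) -> list[float]:
          mu = statistics.mean(ratings)
          sigma = statistics.pstdev(ratings) or 1.0  # guard: user rates everything the same
          out = []
          for r in ratings:
              z = (r - mu) / sigma                     # z = (r - mu_u) / sigma_u
              out.append(min(5.0, max(1.0, 3.0 + z)))  # recenter on 3, clamp to 1-5
          return out

      # a generous rater: everything gets 4s and 5s
      print(normalize_user_ratings([5, 5, 4, 5, 4]))
      # their 5s land near 3.8 ("typical-good"), their 4s near 1.8 ("below par")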

  • by nmstoker on 5/3/25, 12:15 AM

    Does anyone else get that survey-rating effect where you start off thinking the company is reasonable and give a 4 or 5, then the next page asks why you chose that, and as you think it through you realise more and more shitty things they did, so you go back and bring them down to a 2 or 3? Effectively, by asking in detail they undermine your perception of them.

  • by enaaem on 5/3/25, 12:13 AM

    Check the bad reviews. If the 1-2 star reviews are mostly about the rude owner, then you know the food is good.

  • by lordnacho on 5/3/25, 12:50 AM

    Has anyone done a forced ranking rating?

    "Here's your last 5 drivers, please rank them"

  • by xnx on 5/2/25, 10:30 PM

    I don't understand why letter grades aren't more popular for rating things in the US.

    "A+" "B" "C-" "F", etc. feel a lot more intuitive than how stars are used.

  • by JSR_FDED on 5/2/25, 11:32 PM

    A++++ article!

  • by jonstewart on 5/3/25, 12:06 AM

    I give five stars always because I’m not a rat.

  • by User23 on 5/3/25, 12:42 AM

    Same for peer reviews. Giving anything less than a four is saying fire this person. And even too many fours is PIP territory.