by Symmetry on 5/2/25, 12:39 AM with 52 comments
by nlh on 5/2/25, 10:17 PM
Take any given Yelp / Google / Amazon page and you'll see some distribution like this:
User 1: "5 stars. Everything was great!"
User 2: "5 stars. I'd go here again!"
User 3: "1 star. The food was delicious but the waiter was so rude!!!one11!! They forgot it was my cousin's sister's mother's birthday and they didn't kiss my hand when I sat down!! I love the food here but they need to fire that one waiter!!"
Yelp: 3.6 stars average rating.
One thing I always liked about Foursquare was that they did NOT use this lazy method. Their score was actually intelligent - it looked at things like how often someone returned, how much time they spent there, etc., and weighted each review accordingly.
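A rough sketch of what that kind of behaviour-weighted average might look like (the signals and weights here are made up for illustration, not Foursquare's actual formula):

    def weighted_rating(reviews):
        """Each review is (stars, return_visits, minutes_spent)."""
        total, weight_sum = 0.0, 0.0
        for stars, return_visits, minutes_spent in reviews:
            # Reviewers who keep coming back and stay longer count for more.
            weight = 1.0 + 0.5 * return_visits + 0.01 * minutes_spent
            total += stars * weight
            weight_sum += weight
        return total / weight_sum if weight_sum else None

    # The drive-by 1-star rant barely moves the needle:
    print(weighted_rating([(5, 4, 90), (5, 2, 60), (1, 0, 15)]))  # ~4.4 vs 3.67 unweighted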
by tibbar on 5/2/25, 10:57 PM
You would start by estimating each driver's rating as the average of their ratings, and then estimate the bias of each rider by comparing the average rating they give to the estimated scores of their drivers. Then you repeat the process iteratively until you see both scores (driver rating and rider bias) converge.
[0] https://en.wikipedia.org/wiki/Expectation%E2%80%93maximizati...
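Roughly, in Python (the data layout and the fixed iteration count are assumptions; a real implementation would stop once both sets of estimates stop moving):

    from collections import defaultdict

    def fit(ratings, iters=50):
        """ratings: list of (rider_id, driver_id, stars)."""
        by_driver, by_rider = defaultdict(list), defaultdict(list)
        for r, d, s in ratings:
            by_driver[d].append((r, s))
            by_rider[r].append((d, s))
        # Start: each driver's score is the plain average of ratings received.
        driver_score = {d: sum(s for _, s in v) / len(v) for d, v in by_driver.items()}
        rider_bias = {r: 0.0 for r in by_rider}
        for _ in range(iters):
            # Rider bias: how far their ratings sit above/below the current
            # estimates of the drivers they rated.
            for r, v in by_rider.items():
                rider_bias[r] = sum(s - driver_score[d] for d, s in v) / len(v)
            # Driver score: average received rating, debiased per rater.
            for d, v in by_driver.items():
                driver_score[d] = sum(s - rider_bias[r] for r, s in v) / len(v)
        return driver_score, rider_bias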
by theendisney on 5/15/25, 10:05 PM
Weighting by amount spent could be interesting.
Big vendors/companies should probably be required to have per-product ratings rather than leaving it optional. Rating Adobe or Alibaba in general is probably not all that useful.
The EU almost requires it, but Google (for example) still hasn't found a nice technical solution.
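A spend-weighted average itself is simple, assuming the platform can tie order value to each review (a hypothetical signal):

    def spend_weighted(reviews):
        """reviews: list of (stars, amount_spent)."""
        spend = sum(spent for _, spent in reviews)
        return sum(stars * spent for stars, spent in reviews) / spend if spend else None

    print(spend_weighted([(5, 120.0), (5, 80.0), (1, 9.99)]))  # ~4.81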
by stevage on 5/2/25, 11:33 PM
The big rating problem I have is with sites like boardgamegeek where ratings are treated by different people as either an objective rating of how good the game is within its category, or subjectively how much they like (or approve of) the game. They're two very different things and it makes the ratings much less useful than they could be.
They also suffer a similar problem in that most games score 7 out of 10. 8 is exceptional, 6 is bad, and 5 is disastrous.
by homeonthemtn on 5/2/25, 11:50 PM
On a 5-point scale, 2 and 4 are irrelevant and/or a wild guess, or user-defined/specific.
Most of the time our rating systems devolve into roughly this state anyways.
E.g.
5 is excellent, 4.x is fine, <4 is problematic.
And then there's a sub-domain of the area between 4 and 5, where a 4.1 is questionable, a 4.5 is fine, and 4.7+ is excellent.
In the end, it's just 3 parts nested within 3 parts nested within 3 parts nested within....
Let's just do 3 stars (no decimal) and call it a day
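That collapse of the 4-to-5 band into three effective buckets is easy to make explicit (cutoffs are taken from the comment above; where it doesn't give a boundary, the value is a guess):

    def bucket(avg):
        if avg < 4.0:
            return "problematic"
        if avg < 4.5:
            return "questionable"
        if avg < 4.7:
            return "fine"
        return "excellent"

    print([bucket(x) for x in (3.9, 4.1, 4.5, 4.8)])
    # ['problematic', 'questionable', 'fine', 'excellent']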
by Retr0id on 5/2/25, 10:31 PM
I wonder if companies are afraid of being accused of "cooking the books", especially in contexts where the individual ratings are visible.
If I saw a product with 3x 5-star reviews and 1x 3-star review, I'd be suspicious if the overall rating was still a perfect 5 stars.
by parrit on 5/2/25, 11:28 PM
Anything really bad can be dealt with via a complaint system.
Anything exceptional could be captured via a free-text field when giving a tip.
Who is going to read all those text fields and classify them? AI!
by rossdavidh on 5/2/25, 11:19 PM
Alternatively, there might be some hidden reason why a broken rating system is better than a good one, but if so I don't know it.
by adrmtu on 5/3/25, 12:28 AM
Has anyone seen a live system (Uber, Goodreads, etc.) implement per-user z-score normalization?
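Per-user z-score normalization itself is straightforward; whether any production system actually ships it is the open question. A minimal sketch for illustration (not taken from Uber or Goodreads):

    from statistics import mean, pstdev

    def normalize(ratings):
        """ratings: list of (user_id, item_id, stars) -> (user_id, item_id, z)."""
        by_user = {}
        for u, _, s in ratings:
            by_user.setdefault(u, []).append(s)
        # Re-express each rating in units of that user's own mean and spread.
        stats = {u: (mean(v), pstdev(v) or 1.0) for u, v in by_user.items()}
        return [(u, i, (s - stats[u][0]) / stats[u][1]) for u, i, s in ratings]

    # A habitual 5-star giver and a harsh grader land on the same scale.
    print(normalize([("a", "x", 5), ("a", "y", 5), ("a", "z", 4),
                     ("b", "x", 3), ("b", "y", 2), ("b", "z", 4)]))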
by lordnacho on 5/3/25, 12:50 AM
"Here's your last 5 drivers, please rank them"
by xnx on 5/2/25, 10:30 PM
"A+" "B" "C-" "F", etc. feel a lot more intuitive than how stars are used.